<a href="README_CN.md">中文</a>&nbsp ｜ &nbspEnglish&nbsp ｜ &nbsp<a href="README_JA.md">日本語</a> ｜ &nbsp<a href="README_FR.md">Français</a> ｜ &nbsp<a href="README_ES.md">Español</a> <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/logo_qwen.jpg" width="400"/> 🤗 <a href="https://huggingface.co/Qwen">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/qwen">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://arxiv.org/abs/2309.16609">Paper</a> &nbsp&nbsp ｜ &nbsp&nbsp🖥️ <a href="https://modelscope.cn/studios/qwen/Qwen-72B-Chat-Demo/summary">Demo</a> <a href="assets/wechat.png">WeChat (微信)</a>&nbsp&nbsp | &nbsp&nbsp<a href="https://discord.gg/CV4E9rpNSD">Discord</a>&nbsp&nbsp ｜ &nbsp&nbsp<a href="https://dashscope.aliyun.com">API</a>

[!Important] Qwen2 is here! You are welcome to follow QwenLM/Qwen2 and share your experience there.

This repo (QwenLM/Qwen) is no longer actively maintained, due to substantial codebase differences.

| | Qwen-Chat | Qwen-Chat (Int4) | Qwen-Chat (Int8) | Qwen | |-----|:------------------------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------:| | 1.8B | <a href="https://modelscope.cn/models/qwen/Qwen-1_8B-Chat/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-1_8B-Chat">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-1_8B-Chat-Int4/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-1_8B-Chat-Int4">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-1_8B-Chat-Int8/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-1_8B-Chat-Int8">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-1_8B/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-1_8B">🤗</a> | | 7B | <a href="https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-7B-Chat">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-7B-Chat-Int4">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-7B-Chat-Int8">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-7B/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-7B">🤗</a> | | 14B | <a href="https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-14B-Chat">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int4/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-14B-Chat-Int4">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int8/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-14B-Chat-Int8">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-14B/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-14B">🤗</a> | | 72B | <a href="https://modelscope.cn/models/qwen/Qwen-72B-Chat/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-72B-Chat">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-72B-Chat-Int4/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-72B-Chat-Int4">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-72B-Chat-Int8/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-72B-Chat-Int8">🤗</a> | <a href="https://modelscope.cn/models/qwen/Qwen-72B/summary">🤖</a> <a href="https://huggingface.co/Qwen/Qwen-72B">🤗</a> |

We opensource our Qwen series, now including Qwen, the base language models, namely Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B, as well as Qwen-Chat, the chat models, namely Qwen-1.8B-Chat, Qwen-7B-Chat, Qwen-14B-Chat, and Qwen-72B-Chat. Links are on the above table. Click them and check the model cards. Also, we release the technical report. Please click the paper link and check it out!

In brief, we have strong base language models, which have been stably pretrained for up to 3 trillion tokens of multilingual data with a wide coverage of domains, languages (with a focus on Chinese and English), etc. They are able to achieve competitive performance on benchmark datasets. Additionally, we have chat models that are aligned with human preference based on SFT and RLHF (not released yet), which are able to chat, create content, extract information, summarize, translate, code, solve math problems, and so on, and are able to use tools, play as agents, or even play as code interpreters, etc.

| Model | Release Date | Max Length | System Prompt Enhancement | # of Pretrained Tokens | Minimum GPU Memory Usage of Finetuning (Q-Lora) | Minimum GPU Usage of Generating 2048 Tokens (Int4) | Tool Usage | |:----------|:------------:|:----------:|:-------------------------:|:----------------------:|:-----------------------------------------------:|:--------------------------------------------------:|:----------:| | Qwen-1.8B | 23.11.30 | 32K | ✅ | 2.2T | 5.8GB | 2.9GB | ✅ |
| Qwen-7B | 23.08.03 | 32K | ❎ | 2.4T | 11.5GB | 8.2GB | ✅ |
| Qwen-14B | 23.09.25 | 8K | ❎ | 3.0T | 18.7GB | 13.0GB | ✅ | | Qwen-72B | 23.11.30 | 32K | ✅ | 3.0T | 61.4GB | 48.9GB | ✅ |

In this repo, you can figure out:

Quickstart with Qwen, and enjoy the simple inference.
Details about the quantization models, including GPTQ and KV cache quantization.
Statistics of inference performance, including speed and memory.
Tutorials on finetuning, including full-parameter tuning, LoRA, and Q-LoRA.
Instructions on deployment, with the example of vLLM and FastChat.
Instructions on building demos, including WebUI, CLI demo, etc.
Introduction to DashScope API service, as well as the instructions on building an OpenAI-style API for your model.
Information about Qwen for tool use, agent, and code interpreter
Statistics of long-context understanding evaluation
License agreement
...

Also, if you meet problems, turn to FAQ for help first. Still feeling struggled? Feel free to shoot us issues (better in English so that more people can understand you)! If you would like to help us, send us pull requests with no hesitation! We are always excited about PR!

Would like to chat with us or date us coffee time? Welcome to our Discord or WeChat!

News and Updates

2023.11.30 🔥 We release Qwen-72B and Qwen-72B-Chat, which are trained on 3T tokens and support 32k context, along with Qwen-1.8B, and Qwen-1.8B-Chat, on ModelScope and Hugging Face. We have also strengthened the System Prompt capabilities of the Qwen-72B-Chat and Qwen-1.8B-Chat, see example documentation. Additionally, support the inference on Ascend 910 and Hygon DCU. Check ascend-support and dcu-support for more details.
2023.10.17 We release the Int8 quantized model Qwen-7B-Chat-Int8 and Qwen-14B-Chat-Int8.
2023.9.25 🔥 We release Qwen-14B and Qwen-14B-Chat on ModelScope and Hugging Face, along with qwen.cpp and Qwen-Agent. Codes and checkpoints of Qwen-7B and Qwen-7B-Chat are also updated. PLEASE PULL THE LATEST VERSION!
- Compared to Qwen-7B (original), Qwen-7B uses more training tokens, increasing from 2.2T tokens to 2.4T tokens, while the context length extends from 2048 to 8192. The Chinese knowledge and coding ability of Qwen-7B have been further improved.
2023.9.12 We now support finetuning on the Qwen-7B models, including full-parameter finetuning, LoRA and Q-LoRA.
2023.8.21 We release the Int4 quantized model for Qwen-7B-Chat, Qwen-7B-Chat-Int4, which requires low memory costs but achieves improved inference speed. Besides, there is no significant performance degradation on the benchmark evaluation.
2023.8.3 We release both Qwen-7B and Qwen-7B-Chat on ModelScope and Hugging Face. We also provide a technical memo for more details about the model, including training details and model performance.

Performance

Qwen models outperform the baseline models of similar model sizes on a series of benchmark datasets, e.g., MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, etc., which evaluate the models’ capabilities on natural language understanding, mathematic problem solving, coding, etc. Qwen-72B achieves better performance than LLaMA2-70B on all tasks and outperforms GPT-

Qwen

Install / Use

README

News and Updates

Performance