# MiniCPM

**MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3x+ generation speedup on reasoning tasks**
> [!NOTE]
> 🏆 **2026 Sparse Operator Acceleration & Race (SOAR) is Now Live!**
>
> The MiniCPM-SALA architecture is just the beginning. Realizing its full potential requires deep system-level synergy and cross-layer compilation optimization.
>
> OpenBMB, in collaboration with SGLang and NVIDIA, invites global geeks to tackle the limits of 9B-scale, 1M-token inference on a dedicated NVIDIA 6000D environment.
>
> - 💰 Prize Pool: >$100,000 USD (Top Prize: $89,000)
> - 🚀 Goal: Optimize single- and multi-batch performance via cross-layer compilation.
## Changelog🔥
- [2026.02.11] MiniCPM-SALA is released! This is the first large-scale hybrid model effectively integrating sparse and linear attention for million-token context modeling. 🔥🔥🔥
- [2025.09.29] InfLLM-V2 paper is released! We can train a sparse attention model with only 5B long-text tokens. 🔥🔥🔥
- [2025.09.05] MiniCPM4.1 series are released! This series is a hybrid reasoning model with trainable sparse attention, which can be used in both deep reasoning mode and non-reasoning mode. 🔥🔥🔥
- [2025.06.06] Released MiniCPM4! This model achieves ultimate efficiency improvements while maintaining optimal performance at the same scale! It can achieve over 5x generation acceleration on typical end-side chips!
- [2024.09.05] Released MiniCPM3-4B! This model outperforms Phi-3.5-mini-instruct and GPT-3.5-Turbo-0125 and is comparable to several 7B-9B parameter models like Llama3.1-8B-Instruct, Qwen2-7B-Instruct, and GLM-4-9B-Chat.
- [2024.07.05] Released MiniCPM-S-1B! This model achieves an average sparsity of 87.89% in the FFN layer, reducing FFN FLOPs by 84%, while maintaining downstream task performance.
- [2024.04.11] Released MiniCPM-2B-128k, MiniCPM-MoE-8x2B and MiniCPM-1B! Click here to read our technical blog.
- [2024.02.01] Released MiniCPM-2B! This model performs similarly to Mistral-7B on public benchmarks (with better performance in Chinese, math, and code abilities) and overall outperforms models like Llama2-13B, MPT-30B, and Falcon-40B.
## Quick Links
- Changelog🔥
- Quick Links
- Model Downloads
- MiniCPM-SALA
- MiniCPM4 and MiniCPM4.1 Series
- LICENSE
- Institutions
- Citation
## Model Downloads
| HuggingFace | ModelScope |
|-------------|------------|
| MiniCPM-SALA | MiniCPM-SALA |
| MiniCPM4.1-8B | MiniCPM4.1-8B |
| MiniCPM4.1-8B-GPTQ | MiniCPM4.1-8B-GPTQ |
| MiniCPM4.1-8B-AutoAWQ | MiniCPM4.1-8B-AutoAWQ |
| MiniCPM-4.1-8B-Marlin | MiniCPM-4.1-8B-Marlin |
| MiniCPM4.1-8B-GGUF | MiniCPM4.1-8B-GGUF |
| MiniCPM4.1-8B-MLX | MiniCPM4.1-8B-MLX |
| MiniCPM4.1-8B-Eagle3 | MiniCPM4.1-8B-Eagle3 |
| MiniCPM4-8B | MiniCPM4-8B |
| MiniCPM4-0.5B | MiniCPM4-0.5B |
| BitCPM4-1B | BitCPM4-1B |
| BitCPM4-0.5B | BitCPM4-0.5B |
| MiniCPM4-Survey | MiniCPM4-Survey |
| MiniCPM4-MCP | MiniCPM4-MCP |
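As a quick-start sketch, the instruct models in the table above can be loaded with the standard Hugging Face Transformers chat-template API. The repo id `openbmb/MiniCPM4.1-8B` and the generation settings below are illustrative assumptions, not an official recipe; check the model card for recommended parameters.

```python
# Minimal loading/generation sketch (assumption: repo id "openbmb/MiniCPM4.1-8B"
# and standard Transformers usage; see the model card for official settings).
model_id = "openbmb/MiniCPM4.1-8B"

def build_messages(user_text: str) -> list[dict]:
    # Single-turn chat in the format expected by tokenizer.apply_chat_template.
    return [{"role": "user", "content": user_text}]

RUN_DEMO = False  # flip to True to actually download the weights and generate

if RUN_DEMO:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
    )
    prompt = tokenizer.apply_chat_template(
        build_messages("Hello"), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The quantized variants (GPTQ, AutoAWQ, Marlin, GGUF, MLX) each need their own runtime; the sketch above applies to the full-precision checkpoints.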
<details> <summary>📋 Click to view all MiniCPM series models</summary>

| HuggingFace | ModelScope |
|-------------|------------|
| MiniCPM4-8B-Eagle-FRSpec | MiniCPM4-8B-Eagle-FRSpec |
| MiniCPM4-8B-Eagle-FRSpec-QAT | MiniCPM4-8B-Eagle-FRSpec-QAT |
| MiniCPM4-8B-Eagle-vLLM | MiniCPM4-8B-Eagle-vLLM |
| MiniCPM4-8B-marlin-Eagle-vLLM | MiniCPM4-8B-marlin-Eagle-vLLM |
| MiniCPM4-0.5B-QAT-Int4-unquantized | MiniCPM4-0.5B-QAT-Int4-unquantized |
| MiniCPM4-0.5B-QAT-Int4-GPTQ-format | [MiniCPM4-0.5B-QAT-Int4-GPTQ-format](https://modelscope.cn/models
