# MiniCPM

**MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3x+ generation speedup on reasoning tasks**
> [!NOTE]
> 🏆 **2026 Sparse Operator Acceleration & Race (SOAR) is Now Live!**
>
> The MiniCPM-SALA architecture is just the beginning. Realizing its full potential requires deep system-level synergy and cross-layer compilation optimization.
>
> OpenBMB, in collaboration with SGLang and NVIDIA, invites global geeks to tackle the limits of 9B-scale, 1M-token inference on a dedicated NVIDIA 6000D environment.
>
> - 💰 Prize Pool: >$100,000 USD (Top Prize: $89,000)
> - 🚀 Goal: Optimize single- and multi-batch performance via cross-layer compilation.
## Changelog🔥
- [2026.02.11] MiniCPM-SALA is released! This is the first large-scale hybrid model effectively integrating sparse and linear attention for million-token context modeling. 🔥🔥🔥
- [2025.09.29] InfLLM-V2 paper is released! We can train a sparse attention model with only 5B long-text tokens. 🔥🔥🔥
- [2025.09.05] MiniCPM4.1 series are released! This series is a hybrid reasoning model with trainable sparse attention, which can be used in both deep reasoning mode and non-reasoning mode. 🔥🔥🔥
- [2025.06.06] Released MiniCPM4! This model achieves ultimate efficiency improvements while maintaining optimal performance at the same scale! It can achieve over 5x generation acceleration on typical end-side chips!
- [2024.09.05] Released MiniCPM3-4B! This model outperforms Phi-3.5-mini-instruct and GPT-3.5-Turbo-0125 and is comparable to several 7B-9B parameter models like Llama3.1-8B-Instruct, Qwen2-7B-Instruct, and GLM-4-9B-Chat.
- [2024.07.05] Released MiniCPM-S-1B! This model achieves an average sparsity of 87.89% in the FFN layer, reducing FFN FLOPs by 84%, while maintaining downstream task performance.
- [2024.04.11] Released MiniCPM-2B-128k, MiniCPM-MoE-8x2B and MiniCPM-1B! Click here to read our technical blog.
- [2024.02.01] Released MiniCPM-2B! This model performs similarly to Mistral-7B on public benchmarks (with better performance in Chinese, math, and code abilities) and overall outperforms models like Llama2-13B, MPT-30B, and Falcon-40B.
## Quick Links
- Changelog🔥
- Quick Links
- Model Downloads
- MiniCPM-SALA
- MiniCPM4 and MiniCPM4.1 Series
- LICENSE
- Institutions
- Citation
## Model Downloads
| HuggingFace | ModelScope |
|-------------|------------|
| MiniCPM-SALA | MiniCPM-SALA |
| MiniCPM4.1-8B | MiniCPM4.1-8B |
| MiniCPM4.1-8B-GPTQ | MiniCPM4.1-8B-GPTQ |
| MiniCPM4.1-8B-AutoAWQ | MiniCPM4.1-8B-AutoAWQ |
| MiniCPM-4.1-8B-Marlin | MiniCPM-4.1-8B-Marlin |
| MiniCPM4.1-8B-GGUF | MiniCPM4.1-8B-GGUF |
| MiniCPM4.1-8B-MLX | MiniCPM4.1-8B-MLX |
| MiniCPM4.1-8B-Eagle3 | MiniCPM4.1-8B-Eagle3 |
| MiniCPM4-8B | MiniCPM4-8B |
| MiniCPM4-0.5B | MiniCPM4-0.5B |
| BitCPM4-1B | BitCPM4-1B |
| BitCPM4-0.5B | BitCPM4-0.5B |
| MiniCPM4-Survey | MiniCPM4-Survey |
| MiniCPM4-MCP | MiniCPM4-MCP |
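As a quick-start sketch, the instruct models in the table above can be loaded with the standard Hugging Face Transformers chat-template API. The repo id `openbmb/MiniCPM4.1-8B` and the generation settings below are illustrative assumptions, not an official recipe; check the model card for recommended parameters.

```python
# Minimal loading/generation sketch (assumption: repo id "openbmb/MiniCPM4.1-8B"
# and standard Transformers usage; see the model card for official settings).
model_id = "openbmb/MiniCPM4.1-8B"

def build_messages(user_text: str) -> list[dict]:
    # Single-turn chat in the format expected by tokenizer.apply_chat_template.
    return [{"role": "user", "content": user_text}]

RUN_DEMO = False  # flip to True to actually download the weights and generate

if RUN_DEMO:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
    )
    prompt = tokenizer.apply_chat_template(
        build_messages("Hello"), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The quantized variants (GPTQ, AutoAWQ, Marlin, GGUF, MLX) each need their own runtime; the sketch above applies to the full-precision checkpoints.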
<details> <summary>📋 Click to view all MiniCPM series models</summary>

| HuggingFace | ModelScope |
|-------------|------------|
| MiniCPM4-8B-Eagle-FRSpec | MiniCPM4-8B-Eagle-FRSpec |
| MiniCPM4-8B-Eagle-FRSpec-QAT | MiniCPM4-8B-Eagle-FRSpec-QAT |
| MiniCPM4-8B-Eagle-vLLM | MiniCPM4-8B-Eagle-vLLM |
| MiniCPM4-8B-marlin-Eagle-vLLM | MiniCPM4-8B-marlin-Eagle-vLLM |
| MiniCPM4-0.5B-QAT-Int4-unquantized | MiniCPM4-0.5B-QAT-Int4-unquantized |
| MiniCPM4-0.5B-QAT-Int4-GPTQ-format | [MiniCPM4-0.5B-QAT-Int4-GPTQ-format](https://modelscope.cn/models
