<p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/logos/angelslim_logo_light.png"> <img alt="AngelSlim" src="./docs/source/assets/logos/angelslim_logo.png" width=55%> </picture> </p> <h3 align="center"> A more accessible, comprehensive, and efficient toolkit for large model compression. </h3> <p align="center"> ✒️ <a href="https://arxiv.org/abs/2602.21233">Technical Report</a>   |    📖 <a href="https://angelslim.readthedocs.io/">Documentation</a>   |   🤗 <a href="https://huggingface.co/AngelSlim">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/AngelSlim">ModelScope</a> <br> </p> <p align="center"> 💬 <a href="./docs/source/assets/angel_slim_wechat.png">WeChat</a> |   🫨 <a href="https://discord.com/invite/dHVNeuNdFt">Discord</a> <br> </p>

## 📣 Latest News
- [26/03/25] We have released DAQ, a quantization algorithm that preserves the knowledge acquired during post-training, where parameter updates are relatively small. [Paper] | [Docs]
- [26/02/09] We have released HY-1.8B-2Bit, a 2-bit on-device large language model. [Huggingface]
- [26/01/13] We have released v0.3, which supports training and deployment of Eagle3 for LLMs/VLMs/audio models at all scales, as detailed in the guidance documentation. We also released Sherry, a hardware-efficient 1.25-bit quantization algorithm. [Paper] | [Code] 🔥🔥🔥
- [25/11/05] We have released v0.2, which adds quantization support for new models such as `GLM-4.6`, `Qwen3-VL`, and `Qwen3-Omni`, open-sources the Eagle3 speculative decoding training framework, and updates the diffusion model quantization tools.
- [25/09/30] We have released SpecExit, the reasoning early-exit algorithm. [Paper] | [Docs] | [vLLM Code]
- [25/09/26] We have released TEQUILA, the ternary quantization algorithm. [Paper] | [Code]
- [25/09/24] We now support NVFP4 PTQ quantization for the Qwen3 series models. We also open-source Qwen3-32B-NVFP4 and Qwen3-235B-A22B-NVFP4 weights.
- [25/09/01] We now support FP8 quantization of the Hunyuan-MT-7B translation model, enable Torch inference and benchmark evaluation for Eagle3, add quantization and cache support for FLUX, and support quantization for Seed-OSS.
- [25/08/06] We now support quantization for `Hunyuan 0.5B/1.8B/4B/7B` and the multimodal model `Qwen2.5VL 3B/7B/32B/72B`, including `FP8`/`INT4` algorithms, as well as quantization for `DeepSeek-R1/V3` and `Kimi-K2`, including `FP8-Static` and `W4A8-FP8` algorithms. We also open-source `Hunyuan 1.8B/4B/7B` series Eagle3 model weights.
- [25/07/04] We now support quantization for `Hunyuan`/`Qwen2.5`/`Qwen3`/`DeepSeek-R1-Distill-Qwen` and other models, including `INT8`/`FP8`/`INT4` algorithms. We also open-source `Qwen3` series Eagle3 model weights.
## 🌟 Key Features
- Highly Integrated: This toolkit integrates mainstream compression algorithms into a unified framework, offering developers one-click access with exceptional ease of use.
- Continuous Innovation: Beyond integrating widely-used industry algorithms, we are continuously researching better compression algorithms, which will be gradually open-sourced in the future.
- Performance-Driven: We continuously optimize end-to-end performance in model compression workflows and algorithm deployment, such as enabling quantization of models like Qwen3-235B and DeepSeek-R1 on a single GPU.
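
Compression runs in AngelSlim are typically driven by a YAML config (see the `configs/` directory linked below for real examples). The fragment here is only an illustrative sketch: the field names, paths, and values are assumptions, not the toolkit's exact schema, so consult the documentation before adapting it.

```yaml
# Illustrative PTQ config sketch -- field names and values are assumptions,
# not AngelSlim's actual schema; see configs/ for real, working examples.
model:
  name: Qwen3-8B               # model family to compress
  model_path: Qwen/Qwen3-8B    # Hugging Face repo ID or local checkpoint path
compression:
  name: PTQ                    # post-training quantization
  quantization:
    name: fp8_static           # scheme, e.g. FP8/INT8/INT4/NVFP4 variants
dataset:
  max_samples: 128             # small calibration set for static quantization
```

A config like this would then be passed to the toolkit's launch script, which handles calibration, weight conversion, and export in one pass.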
## 💼 Technical Overview
<table> <thead> <tr> <th rowspan="2" style="text-align: center; vertical-align: middle;">Scenario</th> <th rowspan="2" style="text-align: center; vertical-align: middle;">Model</th> <th colspan="3" style="text-align: center; vertical-align: middle;">Compression Strategy</th> </tr> <tr> <th style="text-align: center; vertical-align: middle;">Quantization</th> <th style="text-align: center; vertical-align: middle;">Speculative Decoding</th> <th style="text-align: center; vertical-align: middle;">Other Techniques</th> </tr> </thead> <tbody> <tr> <td><strong>Large Language Models (LLMs)</strong></td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://huggingface.co/collections/tencent/hunyuan-dense-model">Hunyuan-Dense</a></li> <li><a href="https://huggingface.co/collections/tencent/hunyuan-a13b">Hunyuan-MoE</a></li> <li><a href="https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8">Qwen3</a></li> <li><a href="https://huggingface.co/AngelSlim/DeepSeek-R1-0528_w4a8_fp8">DeepSeek-V3/R1</a></li> <li><a href="https://huggingface.co/AngelSlim/Glm4_6-fp8_static">GLM-4.6</a></li> <li><a href="https://huggingface.co/collections/AngelSlim/qwen2-25-quant-68652d6cbdf5c0d4b1c4499a">Qwen2.5</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen3">FP8-Static/Dynamic</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen3">INT8-Dynamic</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen3">INT4-GPTQ/AWQ/GPTAQ</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/d55b06aeffc53e31f485044c5026e754f4e27b74/configs/qwen3/nvfp4">NVFP4</a></li> <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/quantization/fp8_lepto.html">LeptoQuant</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant">Tequila</a> | <a 
href="https://github.com/Tencent/AngelSlim/tree/sherry/Sherry">Sherry</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html">Eagle3</a></li> <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html">SpecExit</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li> <strong>Sparse Attention</strong> <ul style="padding-left: 1.5rem"> <li>Under Development</li> </ul> </li> </ul> </td> </tr> <tr> <td><strong>Vision Language Models (VLMs)</strong></td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="">Hunyuan-VL</a></li> <li><a href="https://huggingface.co/tencent/HunyuanOCR">HunyuanOCR</a></li> <li><a href="https://huggingface.co/collections/Qwen/qwen3-vl">Qwen3-VL</a></li> <li><a href="https://huggingface.co/collections/Qwen/qwen25-vl">Qwen2.5-VL</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen3_vl">FP8-Static/Dynamic</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen2_5_vl">INT8-Dynamic</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen2_5_vl">INT4-GPTQ/AWQ/GPTAQ</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html">Eagle3</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li> <strong>Token Pruning</strong> <ul style="padding-left: 1.5rem"> <li>Under Development</li> </ul> </li> </ul> </td> </tr> <tr> <td><strong>Diffusion Models</strong></td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a 
href="https://huggingface.co/collections/tencent/hunyuanimage">Hunyuan-Image</a></li> <li><a href="https://huggingface.co/tencent/HunyuanVideo">Hunyuan-Video</a></li> <li><a href="https://huggingface.co/collections/tencent/hunyuan3d">Hunyuan-3D</a></li> <li><a href="https://huggingface.co/collections/Qwen/qwen-image">Qwen-Image</a></li> <li><a href="https://huggingface.co/collections/black-forest-labs/flux1">FLUX</a></li> <li><a href="https://huggingface.co/collections/Wan-AI/wan21">Wan</a></li> <li><a href="https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0">SDXL</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/diffusion/quantization.html">FP8</a></li> </ul> </td> <td></td> <td></td> </tr> </tbody> </table>
