
English | 简体中文

<p align="center"> <picture> <source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/logos/angelslim_logo_light.png"> <img alt="AngelSlim" src="./docs/source/assets/logos/angelslim_logo.png" width=55%> </picture> </p> <h3 align="center"> A more accessible, comprehensive, and efficient toolkit for large model compression. </h3> <p align="center"> ✒️ <a href="https://arxiv.org/abs/2602.21233">Technical Report</a>&nbsp;&nbsp; | &nbsp;&nbsp;📖 <a href="https://angelslim.readthedocs.io/">Documentation</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/AngelSlim">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/AngelSlim">ModelScope</a> <br> </p> <p align="center"> 💬 <a href="./docs/source/assets/angel_slim_wechat.png">WeChat</a> | &nbsp;&nbsp;🫨 <a href="https://discord.com/invite/dHVNeuNdFt">Discord</a> <br> </p>

📣Latest News

  • [26/03/25] We have released DAQ, a quantization algorithm that preserves knowledge acquired during post-training, where parameter updates are relatively small. [Paper] | [Docs]
  • [26/02/09] We have released HY-1.8B-2Bit, a 2-bit on-device large language model. [Hugging Face]
  • [26/01/13] We have released v0.3, which supports the training and deployment of Eagle3 for LLMs/VLMs/Audio models at all scales, as detailed in the guidance documentation. We also released Sherry, a hardware-efficient 1.25-bit quantization algorithm. [Paper] | [Code]🔥🔥🔥
  • [25/11/05] We have released v0.2, which adds quantization support for new models such as GLM-4.6, Qwen3-VL, and Qwen3-Omni, open-sources the Eagle3 speculative decoding training framework, and updates the Diffusion model quantization tools.
  • [25/09/30] We have released SpecExit, the reasoning early-exit algorithm. [Paper] | [Docs] | [vLLM Code]
  • [25/09/26] We have released TEQUILA, the ternary quantization algorithm. [Paper] | [Code]
  • [25/09/24] We now support NVFP4 PTQ quantization for the Qwen3 series models. We also open-source Qwen3-32B-NVFP4 and Qwen3-235B-A22B-NVFP4 weights.
<details> <summary>Previous News</summary>
  • [25/09/01] We now support FP8 quantization of the Hunyuan-MT-7B translation model, enabled Torch inference and benchmark evaluation for Eagle3, implemented quantization and cache support for FLUX, and added quantization support for Seed-OSS.
  • [25/08/06] We now support quantization for Hunyuan 0.5B/1.8B/4B/7B and the multimodal models Qwen2.5VL 3B/7B/32B/72B, including FP8/INT4 algorithms, as well as quantization for DeepSeek-R1/V3 and Kimi-K2, including FP8-Static and W4A8-FP8 algorithms. We also open-source Hunyuan 1.8B/4B/7B series Eagle3 model weights.
  • [25/07/04] We now support quantization for Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen and other models, including INT8/FP8/INT4 algorithms. We also open-source Qwen3 series Eagle3 model weights.
</details>

🌟Key Features

  • Highly Integrated: This toolkit integrates mainstream compression algorithms into a unified framework, offering developers one-click access with exceptional ease of use.
  • Continuous Innovation: Beyond integrating widely-used industry algorithms, we are continuously researching better compression algorithms, which will be gradually open-sourced in the future.
  • Performance-Driven: We continuously optimize end-to-end performance in model compression workflows and algorithm deployment, such as enabling quantization of models like Qwen3-235B and DeepSeek-R1 on a single GPU.
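
To make the quantization workflows above concrete, here is a minimal, self-contained sketch of symmetric per-tensor INT4 weight quantization, the basic idea underlying PTQ algorithms such as GPTQ and AWQ. This is an illustration only, not AngelSlim's actual API (its entry points and helper names differ; see the documentation):

```python
# Toy symmetric per-tensor INT4 quantization: map the largest weight
# magnitude to the top of the signed 4-bit range [-8, 7], round the
# rest, and store only integer codes plus one float scale.

def quantize_int4(weights):
    """Quantize floats to signed INT4 codes with a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 7.0   # largest magnitude -> 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from codes and scale."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.99, -1.4, 0.07]
q, scale = quantize_int4(weights)
deq = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, deq))
print(q)        # → [1, -3, 5, -7, 0]
print(scale)    # → 0.2
```

The reconstruction error per weight is bounded by half the quantization step (`scale / 2`) except where clamping occurs, which is why PTQ methods focus on choosing scales (and weight groupings) that keep this step small for the weights that matter.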

💼Technical Overview

<table> <thead> <tr> <th rowspan="2" style="text-align: center; vertical-align: middle;">Scenario</th> <th rowspan="2" style="text-align: center; vertical-align: middle;">Model</th> <th colspan="3" style="text-align: center; vertical-align: middle;">Compression Strategy</th> </tr> <tr> <th style="text-align: center; vertical-align: middle;">Quantization</th> <th style="text-align: center; vertical-align: middle;">Speculative Decoding</th> <th style="text-align: center; vertical-align: middle;">Other Techniques</th> </tr> </thead> <tbody> <tr> <td><strong>Large Language Models (LLMs)</strong></td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://huggingface.co/collections/tencent/hunyuan-dense-model">Hunyuan-Dense</a></li> <li><a href="https://huggingface.co/collections/tencent/hunyuan-a13b">Hunyuan-MoE</a></li> <li><a href="https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8">Qwen3</a></li> <li><a href="https://huggingface.co/AngelSlim/DeepSeek-R1-0528_w4a8_fp8">DeepSeek-V3/R1</a></li> <li><a href="https://huggingface.co/AngelSlim/Glm4_6-fp8_static">GLM-4.6</a></li> <li><a href="https://huggingface.co/collections/AngelSlim/qwen2-25-quant-68652d6cbdf5c0d4b1c4499a">Qwen2.5</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen3">FP8-Static/Dynamic</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen3">INT8-Dynamic</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen3">INT4-GPTQ/AWQ/GPTAQ</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/d55b06aeffc53e31f485044c5026e754f4e27b74/configs/qwen3/nvfp4">NVFP4</a></li> <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/quantization/fp8_lepto.html">LeptoQuant</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant">Tequila</a> | <a 
href="https://github.com/Tencent/AngelSlim/tree/sherry/Sherry">Sherry</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html">Eagle3</a></li> <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html">SpecExit</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li> <strong>Sparse Attention</strong> <ul style="padding-left: 1.5rem"> <li>Under Development</li> </ul> </li> </ul> </td> </tr> <tr> <td><strong>Vision Language Models (VLMs)</strong></td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="">Hunyuan-VL</a></li> <li><a href="https://huggingface.co/tencent/HunyuanOCR">HunyuanOCR</a></li> <li><a href="https://huggingface.co/collections/Qwen/qwen3-vl">Qwen3-VL</a></li> <li><a href="https://huggingface.co/collections/Qwen/qwen25-vl">Qwen2.5-VL</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen3_vl">FP8-Static/Dynamic</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen2_5_vl">INT8-Dynamic</a></li> <li><a href="https://github.com/Tencent/AngelSlim/tree/main/configs/qwen2_5_vl">INT4-GPTQ/AWQ/GPTAQ</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html">Eagle3</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li> <strong>Token Pruning</strong> <ul style="padding-left: 1.5rem"> <li>Under Development</li> </ul> </li> </ul> </td> </tr> <tr> <td><strong>Diffusion Models</strong></td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a 
href="https://huggingface.co/collections/tencent/hunyuanimage">Hunyuan-Image</a></li> <li><a href="https://huggingface.co/tencent/HunyuanVideo">Hunyuan-Video</a></li> <li><a href="https://huggingface.co/collections/tencent/hunyuan3d">Hunyuan-3D</a></li> <li><a href="https://huggingface.co/collections/Qwen/qwen-image">Qwen-Image</a></li> <li><a href="https://huggingface.co/collections/black-forest-labs/flux1">FLUX</a></li> <li><a href="https://huggingface.co/collections/Wan-AI/wan21">Wan</a></li> <li><a href="https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0">SDXL</a></li> </ul> </td> <td> <ul style="padding-left: 0; list-style-position: inside;"> <li><a href="https://angelslim.readthedocs.io/zh-cn/latest/features/diffusion/quantization.html">FP8</a></li> </ul> </td> </tr> </tbody> </table>
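
The Eagle3 and SpecExit entries above are speculative decoding techniques: a cheap draft model proposes several tokens, and the target model verifies them in a single pass, accepting the longest agreeing prefix. The sketch below shows only the generic greedy draft-and-verify loop with toy stand-in "models" (Eagle3 itself drafts from the target's hidden states, which this deliberately omits):

```python
# Generic greedy speculative decoding step (toy illustration):
# the draft proposes k tokens; the target checks each one and the
# first mismatch is replaced by the target's own choice.

def speculative_step(target, draft, prefix, k=4):
    """Return the tokens accepted in one draft-and-verify round."""
    # 1. Draft model proposes k tokens autoregressively.
    ctx = list(prefix)
    proposal = []
    for _ in range(k):
        t = draft(ctx)
        proposal.append(t)
        ctx.append(t)
    # 2. Target verifies; accept the agreeing prefix, fix first mismatch.
    ctx = list(prefix)
    accepted = []
    for t in proposal:
        expected = target(ctx)          # target's greedy token here
        if t == expected:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)   # correction token, then stop
            return accepted
    accepted.append(target(ctx))        # bonus token: all drafts accepted
    return accepted

# Deterministic toy "models" over integer tokens: the target follows a
# fixed rule; the draft agrees except at every 3rd position.
target = lambda ctx: (len(ctx) * 7) % 5
draft = lambda ctx: target(ctx) if len(ctx) % 3 else (target(ctx) + 1) % 5

print(speculative_step(target, draft, [0], k=4))   # → [2, 4, 1]
```

In this run a single target pass yields three tokens instead of one, which is where the speedup comes from; a perfect draft would yield `k + 1` tokens per step.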
