
GPTQModel

LLM model quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.

Install / Use

/learn @ModelCloud/GPTQModel
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

<p align=center> <div align=center> <img src="https://github.com/user-attachments/assets/ab70eb1e-06e7-4dc9-83e5-bd562e1a78b2" width=500> </div> <h1 align="center">GPT-QModel</h1> </p> <p align="center">LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.</p> <p align="center"> <a href="https://github.com/ModelCloud/GPTQModel/releases" style="text-decoration:none;"><img alt="GitHub release" src="https://img.shields.io/github/release/ModelCloud/GPTQModel.svg"></a> <a href="https://pypi.org/project/gptqmodel/" style="text-decoration:none;"><img alt="PyPI - Version" src="https://img.shields.io/pypi/v/gptqmodel"></a> <a href="https://pepy.tech/projects/gptqmodel" style="text-decoration:none;"><img src="https://static.pepy.tech/badge/gptqmodel" alt="PyPI Downloads"></a> <a href="https://github.com/ModelCloud/GPTQModel/blob/main/LICENSE"><img src="https://img.shields.io/pypi/l/gptqmodel"></a> <a href="https://huggingface.co/modelcloud/"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-ModelCloud-%23ff8811.svg"></a> <a href="https://huggingface.co/models?search=gptq"> <img alt="Huggingface - Models" src="https://img.shields.io/badge/🤗_6.7K_gptq_models-8A2BE2"> </a> <a href="https://huggingface.co/models?search=awq"> <img alt="Huggingface - Models" src="https://img.shields.io/badge/🤗_8.2K_awq_models-8A2BE2"> </a> </p>
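At its core, this kind of toolkit stores weights as low-bit integers plus per-group scales. A minimal sketch of group-wise symmetric quantization in plain Python (illustrative only, not GPTQModel's implementation; GPTQ additionally corrects quantization error using second-order, Hessian-based information):

```python
# Illustrative sketch of group-wise symmetric weight quantization,
# the basic compression idea behind GPTQ-style toolkits.
# NOT GPTQModel's implementation.

def quantize_group(weights, bits=4):
    """Map one group of float weights to signed ints plus one shared scale."""
    qmax = 2 ** (bits - 1) - 1             # e.g. 7 for 4-bit signed
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax else 1.0
    return [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights], scale

def dequantize_group(q, scale):
    """Recover approximate float weights from the integers and scale."""
    return [v * scale for v in q]

group = [0.12, -0.7, 0.33, 0.04]
q, scale = quantize_group(group, bits=4)
restored = dequantize_group(q, scale)
# Round-trip error is bounded by half a quantization step.
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(group, restored))
```

With a group size of 128 (a common default in quantization configs), each block of 128 weights shares one scale, trading a little accuracy for much smaller storage.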

Latest News

  • 03/22/2026 [6.0-dev main]: ✨New quantization methods: ParoQuant, GGUF, FP8, EXL3. main is currently undergoing a major refactor and the API is unstable.
  • 03/19/2026 5.8.0: ✨HF Transformers 5.3.0 support with auto-defusing of fused models via the PyPI package Defuser. Qwen 3.5 family support added. New fast HF CPU kernels added for GPTQ/AWQ. Experimental INT8 CPU kernel added for GPTQ.
  • 03/09/2026 [main]: ✨Qwen 3.5 MoE model support added. New HF Kernel support added for AWQ. HF Kernels for both GPTQ/AWQ are now used by default on CPU devices for best performance. New INT8 kernel for GPTQ ported from Intel.
  • 02/28/2026 [main]: ✨Qwen 3.5 model support added.
  • 02/09/2026 5.7.0: ✨New MoE.Routing config with Bypass and Override options to allow multiple brute-force MoE routing controls for higher-quality quantization of MoE experts. Combined with FailSafeStrategy, GPTQModel now has three separate control settings for efficient MoE expert quantization. AWQ's qcfg.zero_point property has been merged into a unified sym symmetry property; zero_point=True is now sym=False. Fixed AWQ sym=True packing/inference and quantization compatibility with some Qwen3 models. Exaone 4.0 support.
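The zero_point change above reflects the standard distinction between symmetric and asymmetric quantization: an asymmetric scheme adds an integer zero point so the quantization grid covers exactly the [min, max] range of the group. A plain-Python sketch (illustrative, not the library's code):

```python
# Symmetric vs. asymmetric (zero-point) quantization of one weight group.
# Illustrative only: AWQ's former zero_point=True corresponds to the
# asymmetric case below, i.e. the unified sym=False setting.

def quant_symmetric(ws, bits=4):
    qmax = 2 ** (bits - 1) - 1        # signed grid, zero stays at integer 0
    scale = max(abs(w) for w in ws) / qmax
    return [round(w / scale) for w in ws], scale

def quant_asymmetric(ws, bits=4):
    qmax = 2 ** bits - 1              # unsigned grid shifted by a zero point
    lo, hi = min(ws), max(ws)
    scale = (hi - lo) / qmax
    zero = round(-lo / scale)         # integer offset mapping lo near 0
    return [round(w / scale) + zero for w in ws], scale, zero

ws = [0.1, 0.5, 0.9]                  # all-positive, skewed group
_, s_sym = quant_symmetric(ws)
_, s_asym, zero = quant_asymmetric(ws)
# The asymmetric step is smaller: its grid spans only [lo, hi], while the
# symmetric grid must span [-absmax, absmax] even when no weight is negative.
assert s_asym < s_sym
```

For groups with skewed or one-sided weight distributions, the asymmetric grid wastes no levels on unused range, which is why zero-point quantization can recover more accuracy there.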
<details> <summary>Archived News</summary>

  • 12/31/2025 5.7.0-dev: ✨New `FailSafe` config and `FailSafeStrategy`, auto-enabled by default, to address uneven routing of MoE experts that causes quantization issues for some MoE modules. `Smooth` operations are introduced to `FailSafeStrategy` to reduce the impact of outliers in `FailSafe` quantization, using `RTN` by default. Different `FailSafeStrategy` and `Smoother` options can be selected, and the `Threshold` that activates `FailSafe` can be customized. New Voxtral and Glm-4v model support, plus audio dataset calibration for Qwen2-Omni. `AWQ` compatibility fix for `GLM 4.5-Air`.
  • 12/17/2025 5.6.2-12 Patch: Fixed uv compatibility. Both uv and pip installs now show UI progress for external wheel/dependency downloads. Fixed macOS and AWQMarlin kernel loading import regressions. Resolved most multi-arch compile issues on Ubuntu, Arch, RedHat and other distros. Fixed multi-arch build issues and a Tritonv2 kernel launch bug on multi-GPU. Fixed 3-bit Triton GPTQ kernel dequant/inference and a license-property compatibility issue with the latest pip/setuptools.

  • 12/9/2025 5.6.0: ✨New HF Kernel for CPU optimized for AMX, AVX2 and AVX512. Auto module tree for auto-model support. Added Afmoe and Dosts1 model support. Fixed pre-layer pass quantization speed regression. Improved HF Transformers, Peft and Optimum support for both GPTQ and AWQ. Fixed many AWQ compatibility bugs and regressions.

  • 11/9/2025 5.4.0: ✨New Intel CPU and XPU hardware-optimized AWQ TorchFusedAWQ kernel. Torch Fused kernels now compatible with torch.compile. Fixed AWQ MoE model compatibility and reduced VRAM usage.

  • 11/3/2025 5.2.0: ✨Minimax M2 support with ModelCloud BF16 M2 Model. New VramStrategy.Balanced quantization property for reduced memory usage for large MoE on multi-3090 (24GB) devices. ✨Marin model. New AWQ Torch reference kernel. Fixed AWQ Marlin kernel for bf16. Fixed GLM 4.5/4.6 MoE missing MTP layers on model save (HF bug). Modular refactor. 🎉AWQ support out of beta with full feature support including multi-GPU quant and MoE VRAM saving. ✨Brumby (attention-free) model support. ✨IBM Granite Nano support. New calibration_concat_separator config option.

  • 10/24/2025 5.0.0: 🎉 Data-parallel quant support for MoE models on multi-GPU using nogil Python. offload_to_disk support enabled by default to massively reduce CPU RAM usage. New Intel and AMD CPU hardware-accelerated TorchFused kernel. Packing stage is now 4x faster and inlined with quantization. VRAM pressure for large models reduced during quantization. act_group_aware is 16k+ times faster and is now the default when desc_act=False, giving higher-quality recovery without the inference penalty of desc_act=True. New beta-quality AWQ support with full gemm, gemm_fast, and marlin kernel support. LFM, Ling, Qwen3 Omni model support. Bitblas kernel updated to support the Bitblas 0.1.0.post1 release. Quantization is now faster with reduced VRAM usage. Enhanced logging support with LogBar.

  • 09/16/2025 4.2.5: hyb_act renamed to act_group_aware. Removed finicky torch import within setup.py. Fixed a packing bug and added prebuilt PyTorch 2.8 wheels.

  • 09/12/2025 4.2.0: ✨ New Models Support: Qwen3-Next, Apertus, Kimi K2, Klear, FastLLM, Nemotron H. New fail_safe boolean toggle to .quantize() to patch-fix non-activated MoE modules due to highly uneven MoE model training. Fixed LavaQwen2 compatibility. Patch-fixed GIL=0 CUDA error for multi-GPU. Fixed compatibility with autoround + new transformers.

  • 09/04/2025 4.1.0: ✨ Meituan LongCat Flash Chat, Llama 4, GPT-OSS (BF16), and GLM-4.5-Air support. New experimental mock_quantization config to skip complex computational code paths during quantization to accelerate model quant testing.

  • 08/21/2025 4.0.0: 🎉 New Group Aware Reordering (GAR) support. New models support: Bytedance Seed-OSS, Baidu Ernie, Huawei PanGu, Gemma3, Xiaomi Mimo, Qwen 3/MoE, Falcon H1, GPT-Neo. Memory leak and multiple model compatibility fixes related to Transformers >= 4.54. Python >= 3.13t free-threading support added with near N x GPU linear scaling for quantization of MoE models and also linear N x CPU core scaling of packing stage. Early access PyTorch 2.8 fused-ops on Intel XPU for up to 50% speedup.

  • 10/17/2025 5.0.0-dev main: 👀: EoRA is now multi-GPU compatible. Improved quality stability in multi-GPU quantization and reduced VRAM usage. New LFM and Ling model support.

  • 09/30/2025 5.0.0-dev main: 👀: New Data Parallel + Multi-GPU + Python 3.13T (PYTHON_GIL=0) yields an 80%+ overall quantization-time reduction for large MoE models vs v4.2.5.

  • 09/29/2025 5.0.0-dev main: 🎉 New Qwen3 Omni model support. AWQ Marlin kernel integrated + many disk offload, threading, and memory usage fixes.

  • 09/24/2025 5.0.0-dev main: 🎉 Up to 90% CPU memory saving for large MoE models with faster/inline packing! 26% quant time reduction for Qwen3 MoE! AWQ Marlin kernel added. AWQ Gemm loading bug fixes. act_group_aware now faster and auto enabled for GPTQ when desc_act is False for higher quality recovery.

  • 09/19/2025 5.0.0-dev main: 👀 CPU memory saving of ~73.5% during the quantization stage with the new offload_to_disk quantization config property, which defaults to True.

  • 09/18/2025 5.0.0-dev main: 🎉 AWQ quantization support! Complete refactor and simplification of model definitions in preparation for future quantization formats.

  • 08/19/2025 4.0.0-dev main: Fixed quantization memory usage due to some models' incorrect application of config.use_cache during inference. Fixed Transformers >= 4.54.0 compatibility which changed layer forward return signature for some models.

  • 08/18/2025 4.0.0-dev main: GPT-Neo model support. Memory leak fix in error capture (stack trace) and fixed lm_head quantization compatibility for many models.

  • 07/31/2025 4.0.0-dev main: New Group Aware Reordering (GAR) support and preliminary PyTorch 2.8 fused-ops for Intel XPU for up to 50% speedup.

  • 07/03/2025 4.0.0-dev main: New Baidu Ernie and Huawei PanGu model support.

  • 07/02/2025 4.0.0-dev main: Gemma3 4B model compatibility fix.

  • 05/29/2025 4.0.0-dev main: Falcon H1 model support. Fixed Transformers 4.52+ compatibility with Qwen 2.5 VL models.

  • 05/19/2025 4.0.0-dev main: Qwen 2.5 Omni model support.

  • 05/05/2025 4.0.0-dev main: Python 3.13t free-threading support added with near N x GPU linear scaling for quantization of MoE models.
</details>
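The FailSafe mechanism described in the archived notes targets MoE experts that receive too few calibration tokens for GPTQ's data-dependent updates. One simplified reading of it, with hypothetical names (the real strategy is configurable, with smoothers and thresholds, and more involved), is a per-expert coverage check that falls back to data-free round-to-nearest (RTN) quantization:

```python
# Sketch of a FailSafe-style decision rule for MoE expert quantization.
# Hypothetical helper names, NOT the library's actual code.

def choose_method(tokens_per_expert, total_tokens, threshold=0.01):
    """Pick a quantization method per expert based on calibration coverage."""
    plan = {}
    for expert, hits in tokens_per_expert.items():
        share = hits / total_tokens if total_tokens else 0.0
        # Under-routed experts lack the calibration signal GPTQ needs for
        # its error-correction updates, so fall back to data-free RTN.
        plan[expert] = "gptq" if share >= threshold else "rtn"
    return plan

routing = {"expert_0": 900, "expert_1": 95, "expert_2": 5}
plan = choose_method(routing, total_tokens=1000, threshold=0.01)
assert plan == {"expert_0": "gptq", "expert_1": "gptq", "expert_2": "rtn"}
```

This illustrates why highly uneven MoE routing matters for quantization quality: experts that are rarely activated during calibration cannot be reliably quantized with activation-dependent methods.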

View on GitHub
GitHub Stars: 1.1k
Category: Customer
Updated: 17h ago
Forks: 168

Languages

Python

Security Score

85/100

Audited on Mar 27, 2026

No findings