FOEM

(AAAI 2026) First-Order Error Matters: Accurate Compensation for Quantized Large Language Models



FOEM has been accepted at AAAI 2026.

We have completed the integration with GPTQModel.

Parts of this repository are now outdated, but we keep it available for developers who wish to debug or experiment with the algorithm.

The code snippets and results below were all obtained using GPTQModel.
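For developers who want to experiment with the algorithm itself, the second-order compensation loop that FOEM builds on (the GPTQ-style column-wise update) can be sketched in NumPy. This is an illustrative sketch, not this repository's implementation: the single shared scale, the damping constant, and the `quantize_rtn` helper are simplifications introduced here, and FOEM's first-order correction to the compensation step is described in the paper rather than reproduced below.

```python
import numpy as np

def quantize_rtn(col, scale):
    """Round-to-nearest onto a symmetric 4-bit grid (simplified)."""
    return np.clip(np.round(col / scale), -8, 7) * scale

def gptq_compensate(W, X, damp=0.01):
    """Column-wise quantization with second-order error compensation.

    W: (rows, cols) layer weights; X: (cols, n_samples) calibration inputs.
    Each column is quantized in turn, and its quantization error is pushed
    into the not-yet-quantized columns via the inverse Hessian -- the
    compensation step that FOEM augments with a first-order term.
    """
    W = W.astype(np.float64).copy()
    cols = W.shape[1]
    H = X @ X.T                                      # proxy Hessian X X^T
    H += damp * np.mean(np.diag(H)) * np.eye(cols)   # damping for invertibility
    Hinv = np.linalg.inv(H)
    Q = np.zeros_like(W)
    scale = np.abs(W).max() / 7                      # one shared scale, for brevity
    for i in range(cols):
        Q[:, i] = quantize_rtn(W[:, i], scale)
        err = (W[:, i] - Q[:, i]) / Hinv[i, i]
        # propagate the error into the remaining columns
        W[:, i + 1:] -= np.outer(err, Hinv[i, i + 1:])
    return Q, scale
```

Production implementations (including GPTQModel's) use per-group scales and a Cholesky factorization of the inverse Hessian instead of a dense inverse; the greedy structure is the same.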

Quant

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig, FOEMConfig

size = "8B"
model_id = f"Qwen/Qwen3-{size}"
quant_path = f"models/gptqmodel/Qwen3-{size}-foem-4bit"

# 256 calibration samples from the C4 English training split
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train"
).select(range(256))["text"]

# 4-bit weights with group size 128; FOEM compensation is enabled via FOEMConfig
# (alpha=0 disables the GPTAQ term, beta=0.2 weights the first-order correction)
quant_config = QuantizeConfig(bits=4, group_size=128, foem=FOEMConfig(alpha=0, beta=0.2, device="auto"))

model = GPTQModel.load(model_id, quant_config)

model.quantize(calibration_dataset, batch_size=4)

model.save(quant_path)
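After saving, the checkpoint can be reloaded for a quick generation sanity check. A minimal sketch assuming GPTQModel's inference API (a string prompt passed directly to `generate`, decoded with the attached tokenizer); the prompt itself is arbitrary:

```python
from gptqmodel import GPTQModel

quant_path = "models/gptqmodel/Qwen3-8B-foem-4bit"

model = GPTQModel.load(quant_path)  # load the saved 4-bit FOEM checkpoint
result = model.generate("Quantization error compensation works by")[0]
print(model.tokenizer.decode(result))
```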

Eval

lm-eval --model vllm --model_args pretrained=models/gptqmodel/Qwen3-8B-foem-4bit,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.6 --tasks wikitext --batch_size auto

Result

Note: the WikiText PPL measured with lm-eval differs from the values reported in our original paper.
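One common source of such gaps is the normalization: lm-eval's wikitext task reports word-level perplexity (total negative log-likelihood divided by the number of words), whereas many papers report token-level perplexity (divided by the number of model tokens). A toy illustration with made-up numbers:

```python
import math

# made-up totals for a toy evaluation corpus
total_nll = 2000.0   # summed negative log-likelihood, in nats
n_tokens = 1500      # model tokens in the corpus
n_words = 1000       # whitespace-separated words (fewer than tokens)

token_ppl = math.exp(total_nll / n_tokens)   # token-level perplexity
word_ppl = math.exp(total_nll / n_words)     # word-level perplexity (lm-eval style)

# fewer normalization units -> larger perplexity for the same total NLL
print(f"token PPL = {token_ppl:.2f}, word PPL = {word_ppl:.2f}")
# prints: token PPL = 3.79, word PPL = 7.39
```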

| Model | Method | Bits | Hyperparameters | Wikitext PPL |
| ---------- | ---------------- | ---- | -------------------- | ------------ |
| Qwen3-0.6B | GPTQ | 4 | \ | 30.0372 |
| | GPTAQ | 4 | alpha=0.25 | 30.5776 |
| | FOEM (w/o GPTAQ) | 4 | alpha=0, beta=0.2 | 29.6199 |
| | FOEM (w/ GPTAQ) | 4 | alpha=0.25, beta=0.2 | 29.3823 |
| Qwen3-8B | GPTQ | 4 | \ | 12.5488 |
| | GPTAQ | 4 | alpha=0.25 | 12.7152 |
| | FOEM (w/o GPTAQ) | 4 | alpha=0, beta=0.2 | 12.5128 |
| | FOEM (w/ GPTAQ) | 4 | alpha=0.25, beta=0.2 | 12.6172 |


Citation

If you find this work useful, please cite:

@inproceedings{zheng2026first,
  title={First-order error matters: Accurate compensation for quantized large language models},
  author={Zheng, Xingyu and Qin, Haotong and Li, Yuye and Chu, Haoran and Wang, Jiakai and Guo, Jinyang and Magno, Michele and Liu, Xianglong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={34},
  pages={28883--28891},
  year={2026}
}
