# FOEM

(AAAI 2026) First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
FOEM has been accepted at AAAI 2026.
FOEM is now integrated into GPTQModel. Parts of this standalone repository are outdated, but we keep it available for developers who want to debug or experiment with the algorithm. All code snippets and results below were obtained with GPTQModel.
## Quant

```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig, FOEMConfig

size = "8B"
model_id = f"Qwen/Qwen3-{size}"
quant_path = f"models/gptqmodel/Qwen3-{size}-foem-4bit"

# 256 calibration samples from the C4 English training split
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train"
).select(range(256))["text"]

# alpha is the GPTAQ coefficient (0 disables GPTAQ); beta is the FOEM coefficient
quant_config = QuantizeConfig(bits=4, group_size=128, foem=FOEMConfig(alpha=0, beta=0.2, device="auto"))

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=4)
model.save(quant_path)
```
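After saving, the quantized checkpoint can be loaded back for a quick generation smoke test. This is a sketch assuming GPTQModel's standard load/generate interface; the prompt is arbitrary and the path matches the quant snippet above.

```python
from gptqmodel import GPTQModel

# Load the 4-bit checkpoint produced by the quant snippet above
model = GPTQModel.load("models/gptqmodel/Qwen3-8B-foem-4bit")

# One short generation as a sanity check before running full evals
tokens = model.generate("The capital of France is")[0]
print(model.tokenizer.decode(tokens))
```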
## Eval

```shell
lm-eval --model vllm \
  --model_args pretrained=models/gptqmodel/Qwen3-8B-foem-4bit,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.6 \
  --tasks wikitext \
  --batch_size auto
```
## Result
Note: The PPL evaluation on WikiText using lm-eval differs from that reported in our original paper.
| Model      | Method           | Bits | Hyperparameters      | WikiText PPL |
| ---------- | ---------------- | ---- | -------------------- | ------------ |
| Qwen3-0.6B | GPTQ             | 4    | \                    | 30.0372      |
|            | GPTAQ            | 4    | alpha=0.25           | 30.5776      |
|            | FOEM (w/o GPTAQ) | 4    | alpha=0, beta=0.2    | 29.6199      |
|            | FOEM (w/ GPTAQ)  | 4    | alpha=0.25, beta=0.2 | 29.3823      |
| Qwen3-8B   | GPTQ             | 4    | \                    | 12.5488      |
|            | GPTAQ            | 4    | alpha=0.25           | 12.7152      |
|            | FOEM (w/o GPTAQ) | 4    | alpha=0, beta=0.2    | 12.5128      |
|            | FOEM (w/ GPTAQ)  | 4    | alpha=0.25, beta=0.2 | 12.6172      |
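For readers experimenting with the algorithm itself, the error-compensation idea these methods build on can be illustrated with a small NumPy toy. The sketch below quantizes a weight matrix row by row and, after each row, folds the resulting output error into the still-unquantized rows via least squares against calibration inputs. This is generic GPTQ-style compensation under illustrative shapes and a uniform grid, not the repo's implementation or the FOEM first-order correction itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated calibration inputs (compensation helps most when inputs correlate)
X = rng.normal(size=(128, 1)) + 0.1 * rng.normal(size=(128, 8))
W = rng.normal(size=(8, 4))  # layer weight; the layer computes X @ W

def quantize(v, step=0.5):
    """Round to a uniform grid -- a stand-in for low-bit quantization."""
    return np.round(v / step) * step

W_q = W.copy()
for i in range(W_q.shape[0]):
    err_row = W_q[i] - quantize(W_q[i])  # error introduced by quantizing row i
    W_q[i] = quantize(W_q[i])
    if i + 1 < W_q.shape[0]:
        # Choose an update to the not-yet-quantized rows that best cancels
        # the output error X[:, i] * err_row, in the least-squares sense.
        delta, *_ = np.linalg.lstsq(X[:, i + 1:], X[:, [i]] @ err_row[None, :], rcond=None)
        W_q[i + 1:] += delta

err_rtn = np.linalg.norm(X @ W - X @ quantize(W))  # plain round-to-nearest
err_comp = np.linalg.norm(X @ W - X @ W_q)         # sequential + compensation
print(f"RTN output error: {err_rtn:.3f}, compensated: {err_comp:.3f}")
```

With correlated inputs the compensated error should come out well below round-to-nearest; with nearly orthogonal inputs the gap shrinks, which is one reason the calibration data matters.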
## Citation
If you find this work useful, please cite:
```bibtex
@inproceedings{zheng2026first,
  title={First-order error matters: Accurate compensation for quantized large language models},
  author={Zheng, Xingyu and Qin, Haotong and Li, Yuye and Chu, Haoran and Wang, Jiakai and Guo, Jinyang and Magno, Michele and Liu, Xianglong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={34},
  pages={28883--28891},
  year={2026}
}
```