# FOEM

(AAAI 2026) First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
FOEM has been accepted at AAAI 2026.
FOEM is now integrated into GPTQModel. Parts of this standalone repository are outdated, but we keep it available for developers who want to debug or experiment with the algorithm. All code snippets and results below were obtained with GPTQModel.
## Quant

```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig, FOEMConfig

size = "8B"
model_id = f"Qwen/Qwen3-{size}"
quant_path = f"models/gptqmodel/Qwen3-{size}-foem-4bit"

# 256 calibration samples from the C4 English training split
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train"
).select(range(256))["text"]

# alpha is the GPTAQ coefficient (0 disables GPTAQ); beta is the FOEM coefficient
quant_config = QuantizeConfig(bits=4, group_size=128, foem=FOEMConfig(alpha=0, beta=0.2, device="auto"))

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=4)
model.save(quant_path)
```
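After saving, the quantized checkpoint can be loaded back for a quick generation smoke test. This is a sketch assuming GPTQModel's standard load/generate interface; the prompt is arbitrary and the path matches the quant snippet above.

```python
from gptqmodel import GPTQModel

# Load the 4-bit checkpoint produced by the quant snippet above
model = GPTQModel.load("models/gptqmodel/Qwen3-8B-foem-4bit")

# One short generation as a sanity check before running full evals
tokens = model.generate("The capital of France is")[0]
print(model.tokenizer.decode(tokens))
```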
## Eval

```shell
lm-eval --model vllm \
  --model_args pretrained=models/gptqmodel/Qwen3-8B-foem-4bit,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.6 \
  --tasks wikitext \
  --batch_size auto
```
## Result
Note: The PPL evaluation on WikiText using lm-eval differs from that reported in our original paper.
| Model      | Method           | Bits | Hyperparameters      | WikiText PPL |
| ---------- | ---------------- | ---- | -------------------- | ------------ |
| Qwen3-0.6B | GPTQ             | 4    | \                    | 30.0372      |
|            | GPTAQ            | 4    | alpha=0.25           | 30.5776      |
|            | FOEM (w/o GPTAQ) | 4    | alpha=0, beta=0.2    | 29.6199      |
|            | FOEM (w/ GPTAQ)  | 4    | alpha=0.25, beta=0.2 | 29.3823      |
| Qwen3-8B   | GPTQ             | 4    | \                    | 12.5488      |
|            | GPTAQ            | 4    | alpha=0.25           | 12.7152      |
|            | FOEM (w/o GPTAQ) | 4    | alpha=0, beta=0.2    | 12.5128      |
|            | FOEM (w/ GPTAQ)  | 4    | alpha=0.25, beta=0.2 | 12.6172      |
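For readers experimenting with the algorithm itself, the error-compensation idea these methods build on can be illustrated with a small NumPy toy. The sketch below quantizes a weight matrix row by row and, after each row, folds the resulting output error into the still-unquantized rows via least squares against calibration inputs. This is generic GPTQ-style compensation under illustrative shapes and a uniform grid, not the repo's implementation or the FOEM first-order correction itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated calibration inputs (compensation helps most when inputs correlate)
X = rng.normal(size=(128, 1)) + 0.1 * rng.normal(size=(128, 8))
W = rng.normal(size=(8, 4))  # layer weight; the layer computes X @ W

def quantize(v, step=0.5):
    """Round to a uniform grid -- a stand-in for low-bit quantization."""
    return np.round(v / step) * step

W_q = W.copy()
for i in range(W_q.shape[0]):
    err_row = W_q[i] - quantize(W_q[i])  # error introduced by quantizing row i
    W_q[i] = quantize(W_q[i])
    if i + 1 < W_q.shape[0]:
        # Choose an update to the not-yet-quantized rows that best cancels
        # the output error X[:, i] * err_row, in the least-squares sense.
        delta, *_ = np.linalg.lstsq(X[:, i + 1:], X[:, [i]] @ err_row[None, :], rcond=None)
        W_q[i + 1:] += delta

err_rtn = np.linalg.norm(X @ W - X @ quantize(W))  # plain round-to-nearest
err_comp = np.linalg.norm(X @ W - X @ W_q)         # sequential + compensation
print(f"RTN output error: {err_rtn:.3f}, compensated: {err_comp:.3f}")
```

With correlated inputs the compensated error should come out well below round-to-nearest; with nearly orthogonal inputs the gap shrinks, which is one reason the calibration data matters.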
## Citation
If you find this work useful, please cite:
```bibtex
@inproceedings{zheng2026first,
  title={First-order error matters: Accurate compensation for quantized large language models},
  author={Zheng, Xingyu and Qin, Haotong and Li, Yuye and Chu, Haoran and Wang, Jiakai and Guo, Jinyang and Magno, Michele and Liu, Xianglong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={34},
  pages={28883--28891},
  year={2026}
}
```