# CommVQ

[ICML 2025] CommVQ: Commutative Vector Quantization for KV Cache Compression
This repository contains the official implementation of CommVQ, a method for memory-efficient long-context inference that compresses the KV cache via quantization with learned codebooks. It achieves strong performance across a wide range of benchmarks while significantly reducing memory overhead.
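The core idea of codebook-based KV cache quantization can be illustrated with a toy sketch (illustrative only; this is plain nearest-neighbor vector quantization, not the commutative CommVQ algorithm): each cached vector is stored as the index of its nearest codeword, and dequantization is a single codebook lookup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 1,000 cached key/value vectors of dimension 128,
# and a codebook with 256 entries (so indices fit in one byte).
head_dim = 128
cache = rng.standard_normal((1000, head_dim)).astype(np.float32)
codebook = rng.standard_normal((256, head_dim)).astype(np.float32)

# Quantize: keep only the index of the nearest codeword per vector.
dists = ((cache[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
indices = dists.argmin(axis=1).astype(np.uint8)

# Dequantize: a codebook lookup reconstructs an approximation.
reconstructed = codebook[indices]

print(indices.nbytes, cache.nbytes)  # -> 1000 512000
```

A random codebook reconstructs poorly, of course; the point of the training recipe below is to learn codebooks that minimize this reconstruction error on real KV cache data.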
## Table of Contents

- [News](#news)
- [Model Checkpoints](#model-checkpoints)
- [Installation](#installation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Memory Measurement](#memory-measurement)
- [Citation](#citation)
## News
- [June, 2025]: Released code and model weights.
- [May, 2025]: CommVQ is accepted to ICML 2025! See you in Vancouver, BC.
## Model Checkpoints
We release the following LLaMA-3.1 8B checkpoints with CommVQ 1-bit and 2-bit compression. Both value codebooks and key codebooks are provided below. The value codebooks are used together with the original (unchanged) model weights.
| Model Variant | Value Codebook | Key Codebook |
|---------------|----------------|--------------|
| LLaMA-3.1 8B + CommVQ 1-bit | 🤗 Hugging Face | 🤗 Hugging Face |
| LLaMA-3.1 8B + CommVQ 2-bit | 🤗 Hugging Face | 🤗 Hugging Face |
## Installation

```bash
conda create -n commvq python=3.10
conda activate commvq
pip install -e .
pip install flash-attn --no-build-isolation
```
## Training

```bash
cd training
# Step 1: Collect KV cache
bash collect_kv.sh
# Step 2: Prepare scaling factors
python make_scale.py
# Step 3: Train the codebook for key cache
bash quantize_key_cache.sh
# Step 4: Train the codebook for value cache
bash finetune/llama3.1_8b_int1.sh
```
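Conceptually, codebook training fits codewords to the cache vectors collected in Step 1. A minimal sketch of that idea is k-means clustering (a hedged illustration only; the scripts above train CommVQ's commutative codebooks, not plain k-means):

```python
import numpy as np

def learn_codebook(vectors, num_codes=256, iters=10, seed=0):
    """Toy k-means codebook learner over collected cache vectors.

    Illustration of fitting a codebook to data; the actual CommVQ
    training objective and codebook structure differ.
    """
    rng = np.random.default_rng(seed)
    # Initialize codewords from random data points.
    codebook = vectors[rng.choice(len(vectors), num_codes, replace=False)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest codeword.
        d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # Move each codeword to the mean of its assigned vectors.
        for k in range(num_codes):
            members = vectors[assign == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

kv = np.random.default_rng(1).standard_normal((2000, 64)).astype(np.float32)
cb = learn_codebook(kv, num_codes=16)
```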
## Evaluation

### LongBench

```bash
cd evaluation/longbench
python pred.py --model $CHECKPOINT
python eval.py --model $RESULT_DIR
```
### InfiniteBench

```bash
cd evaluation/infiniteBench/src
# Download the evaluation datasets
bash scripts/download_dataset.sh
# Evaluate each task
bash run_passkey.sh
# Merge the per-task result shards into one jsonl file
cat ../results/commvq/preds_passkey_*.jsonl > ../results/commvq/preds_passkey.jsonl
# Compute the task score
python compute_scores.py --task all --model_name commvq
```
### NIAH (Needle-in-a-Haystack)

```bash
cd evaluation/niah
bash run.sh $CHECKPOINT
```
## Memory Measurement

We implement Triton-based kernels to further optimize memory usage and enable real memory savings with CommVQ. (Currently supports LLaMA-3.1 8B with 1-bit quantization; ongoing development for broader model support.)

```bash
cd evaluation/memory_measurement
pip install -e ../../transformers_triton_infer
bash eval_memory.sh $CHECKPOINT
```
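The headline saving is easy to estimate from the model shape. Assuming LLaMA-3.1 8B's published configuration (32 layers, 8 KV heads via GQA, head dim 128) and ignoring codebook overhead, a back-of-the-envelope calculation:

```python
# Hedged estimate of KV cache memory for LLaMA-3.1 8B; config values
# taken from the public model card, codebook overhead ignored.
layers, kv_heads, head_dim = 32, 8, 128

# Per token: K and V (factor 2), 2 bytes per element in fp16.
bytes_per_token_fp16 = 2 * layers * kv_heads * head_dim * 2
context = 128 * 1024  # 128K-token context

fp16_gib = bytes_per_token_fp16 * context / 2**30
one_bit_gib = fp16_gib / 16  # 16 bits -> 1 bit per element

print(f"fp16 KV cache: {fp16_gib:.1f} GiB, 1-bit: {one_bit_gib:.1f} GiB")
# -> fp16 KV cache: 16.0 GiB, 1-bit: 1.0 GiB
```

So at a 128K context the fp16 KV cache alone would roughly match the 8B model's weights in size, which is why the 1-bit kernels translate into real end-to-end memory savings.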
## Citation

If you find CommVQ useful in your research or applications, please consider citing:

```bibtex
@inproceedings{li2025commvq,
  title     = {CommVQ: Commutative Vector Quantization for KV Cache Compression},
  author    = {Junyan Li and Yang Zhang and Muhammad Yusuf Hassan and Talha Chafekar and Tianle Cai and Zhile Ren and Pengsheng Guo and Binazir Karimzadeh and Colorado J Reed and Chong Wang and Chuang Gan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning (ICML)},
  year      = {2025}
}
```
