QuaRot
Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models.
<img src="img/carrot.png" alt="Your Image" width="40" height="45">QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
This repository contains the code for QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs.
Abstract
We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache, in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden state without changing the output, making quantization easier. This computational invariance is applied to the hidden state (residual) of the LLM, as well as to the activations of the feed-forward components, aspects of the attention mechanism, and the KV cache. The result is a quantized model where all matrix multiplications are performed in 4 bits, without any channels identified for retention in higher precision. Our quantized LLaMa2-70B model has losses of at most 0.29 WikiText perplexity and retains 99% of the zero-shot performance.
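The computational invariance described above can be illustrated on a single linear layer. The following is a minimal NumPy sketch, not the code in this repository: the helper names `hadamard` and `fake_quant_int4`, the toy dimensions, the injected outlier channel, and the choice of a normalized Hadamard matrix as the rotation are all assumptions made for illustration. It checks that rotating the activations by an orthogonal matrix `Q` and the weights by `Q.T` leaves the output unchanged, and that the rotated activations lose less accuracy under simulated 4-bit quantization.

```python
import numpy as np

def hadamard(n):
    """Normalized n x n Hadamard matrix via the Sylvester construction (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def fake_quant_int4(x):
    """Simulated symmetric per-row 4-bit quantization (round to nearest, then dequantize)."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

rng = np.random.default_rng(0)
d = 64                                # hidden size (toy value)
x = rng.standard_normal((8, d))       # 8 token activations
x[:, 3] *= 50.0                       # inject one outlier channel
W = rng.standard_normal((d, d))       # weights of a linear layer y = x @ W

Q = hadamard(d)                       # orthogonal: Q @ Q.T == I
x_rot = x @ Q                         # rotated activations
W_rot = Q.T @ W                       # counter-rotated weights

y = x @ W
y_rot = x_rot @ W_rot                 # equal to y up to floating-point error

# Quantize only the activations and compare the output error in both bases.
err_plain = np.linalg.norm(fake_quant_int4(x) @ W - y)
err_rot = np.linalg.norm(fake_quant_int4(x_rot) @ W_rot - y)

print("outputs match:", np.allclose(y, y_rot))
print("largest |activation| before / after rotation:", np.abs(x).max(), np.abs(x_rot).max())
print("4-bit output error before / after rotation:", err_plain, err_rot)
```

In this toy setting the rotation spreads the outlier channel evenly across all coordinates, so the per-row quantization scale shrinks and the remaining entries are no longer rounded to zero.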

Usage
Compile the QuaRot kernels using the following commands:
git clone https://github.com/spcl/QuaRot.git
cd QuaRot
pip install -e . # or pip install .
For simulation results, see the fake_quant directory.
Citation
The full citation is:
@article{ashkboos2024quarot,
title={QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs},
author={Ashkboos, Saleh and Mohtashami, Amirkeivan and Croci, Maximilian L and Li, Bo and Jaggi, Martin and Alistarh, Dan and Hoefler, Torsten and Hensman, James},
journal={arXiv preprint arXiv:2404.00456},
year={2024}
}
