HAWQ: Hessian AWare Quantization

HAWQ is an advanced quantization library written for PyTorch. HAWQ enables low-precision and mixed-precision uniform quantization, with direct hardware implementation through TVM.

For more details please see:

Installation

PyTorch version >= 1.4.0
Python version >= 3.6
For training new models, you'll also need NVIDIA GPUs and NCCL
To install HAWQ and develop locally:

git clone https://github.com/Zhen-Dong/HAWQ.git
cd HAWQ
pip install -r requirements.txt

Getting Started

Quantization-Aware Training

The following example runs uniform 8-bit quantization-aware training for ResNet50 on ImageNet:

export CUDA_VISIBLE_DEVICES=0
python quant_train.py -a resnet50 --epochs 1 --lr 0.0001 --batch-size 128 --data /path/to/imagenet/ --pretrained --save-path /path/to/checkpoints/ --act-range-momentum=0.99 --wd 1e-4 --data-percentage 0.0001 --fix-BN --checkpoint-iter -1 --quant-scheme uniform8

The commands for other quantization schemes and for other networks are shown in the model zoo.
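As background for the uniform8 scheme used above, the following is a minimal sketch of symmetric uniform quantization in plain Python. It is an illustration of the general technique only, not HAWQ's implementation; the function names `quantize_symmetric` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale factor.

    Illustrative helper (not part of HAWQ): the largest magnitude in
    `values` is mapped to the largest representable integer.
    """
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from integer levels."""
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.003, 1.0]           # made-up example weights
q, scale = quantize_symmetric(weights)
recovered = dequantize(q, scale)
print(q)
print(recovered)
```

Each recovered value differs from the original by at most half a quantization step (`scale / 2`), which is the rounding error that quantization-aware training lets the network adapt to.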

Inference Acceleration

Instructions on Hardware Implementation through TVM

Experimental Results

Table I and Table II in HAWQ-V3: Dyadic Neural Network Quantization

ResNet18 on ImageNet

Model | Quantization | Model Size (MB) | BOPS (G) | Accuracy (%) | Inference Speed (batch=8, ms) | Download
---|---|---|---|---|---|---
ResNet18 | Floating Points | 44.6 | 1858 | 71.47 | 9.7 (1.0x) | resnet18_baseline
ResNet18 | W8A8 | 11.1 | 116 | 71.56 | 3.3 (3.0x) | resnet18_uniform8
ResNet18 | Mixed Precision | 6.7 | 72 | 70.22 | 2.7 (3.6x) | resnet18_bops0.5
ResNet18 | W4A4 | 5.8 | 34 | 68.45 | 2.2 (4.4x) | resnet18_uniform4

ResNet50 on ImageNet

Model | Quantization | Model Size (MB) | BOPS (G) | Accuracy (%) | Inference Speed (batch=8, ms) | Download
---|---|---|---|---|---|---
ResNet50 | Floating Points | 97.8 | 3951 | 77.72 | 26.2 (1.0x) | resnet50_baseline
ResNet50 | W8A8 | 24.5 | 247 | 77.58 | 8.5 (3.1x) | resnet50_uniform8
ResNet50 | Mixed Precision | 18.7 | 154 | 75.39 | 6.9 (3.8x) | resnet50_bops0.5
ResNet50 | W4A4 | 13.1 | 67 | 74.24 | 5.8 (4.5x) | resnet50_uniform4
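
The size and speed columns in the two tables can be cross-checked with simple arithmetic. The sketch below copies the reported model sizes and latencies and computes the compression and speedup ratios relative to the floating-point baseline. The dictionary layout and the `ratios` helper are illustrative, not part of HAWQ.

```python
# (model size in MB, latency in ms at batch=8), copied from the tables above.
results = {
    "ResNet18": {
        "Floating Points": (44.6, 9.7),
        "W8A8": (11.1, 3.3),
        "Mixed Precision": (6.7, 2.7),
        "W4A4": (5.8, 2.2),
    },
    "ResNet50": {
        "Floating Points": (97.8, 26.2),
        "W8A8": (24.5, 8.5),
        "Mixed Precision": (18.7, 6.9),
        "W4A4": (13.1, 5.8),
    },
}


def ratios(model):
    """Return {scheme: (compression ratio, speedup)} versus the baseline."""
    base_size, base_ms = results[model]["Floating Points"]
    return {
        scheme: (base_size / size, base_ms / ms)
        for scheme, (size, ms) in results[model].items()
    }


for model in results:
    for scheme, (compression, speedup) in ratios(model).items():
        print(f"{model} {scheme}: {compression:.2f}x smaller, {speedup:.2f}x faster")
```

The W8A8 checkpoints come out close to 4x smaller than the baselines, which is what one would expect if the baseline stores 32-bit weights. The computed speedups agree with the figures in parentheses in the tables to within rounding of the reported latencies.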

More results for other quantization schemes and models, along with the corresponding commands and important notes, are available in the model zoo.
To download the quantized models with wget, see the example command in the model zoo.
Checkpoints in the model zoo are saved in floating-point precision. To reduce their size, BitPack can be applied to the weight_integer tensors, or directly to the quantized_checkpoint.pth.tar file.
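
To illustrate why packing helps, here is a minimal sketch of the general idea of bit packing: two 4-bit values are stored in each byte instead of one value per wider container. This is not BitPack's API or file format; the helper names `pack_uint4` and `unpack_uint4` are hypothetical, and the sketch assumes the values have already been shifted into the unsigned range 0 to 15.

```python
def pack_uint4(values):
    """Pack unsigned 4-bit integers (0..15) two per byte, low nibble first."""
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("all values must fit in 4 bits (0..15)")
    # Pad with a zero so that the values pair up evenly.
    padded = list(values) + [0] * (len(values) % 2)
    return bytes((hi << 4) | lo for lo, hi in zip(padded[0::2], padded[1::2]))


def unpack_uint4(packed, count):
    """Inverse of pack_uint4; `count` drops the padding value if one was added."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)   # low nibble
        out.append(byte >> 4)     # high nibble
    return out[:count]


values = [1, 15, 7, 0, 9]         # made-up 4-bit weight levels
packed = pack_uint4(values)
restored = unpack_uint4(packed, len(values))
print(len(values), "values stored in", len(packed), "bytes")
```

Signed integer weights would first need an offset (for example, adding 8 to values in the range -8 to 7) before being packed this way.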

Related Works

License

THIS SOFTWARE WAS DEPOSITED IN THE BAIR OPEN RESEARCH COMMONS REPOSITORY ON FEB 1, 2023.

HAWQ is released under the MIT license.
