Nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
Neural Network Compression Framework (NNCF)
Key Features • Installation • Documentation • Usage • Tutorials and Samples • Third-party integration • Model Zoo
Neural Network Compression Framework (NNCF) provides a suite of post-training and training-time algorithms for optimizing inference of neural networks in OpenVINO™ with a minimal accuracy drop.
NNCF is designed to work with models from PyTorch, TorchFX, ONNX and OpenVINO™.
NNCF provides samples that demonstrate the usage of compression algorithms for different use cases and models. See compression results achievable with the NNCF-powered samples on the NNCF Model Zoo page.
The framework is organized as a Python* package that can be built and used in standalone mode. Its architecture is unified to make it easy to add different compression algorithms for the supported deep learning frameworks.
<a id="key-features"></a>
Key Features
Post-Training Compression Algorithms
| Compression algorithm      | OpenVINO      | PyTorch      | TorchFX       | ONNX          |
| :------------------------- | :-----------: | :----------: | :-----------: | :-----------: |
| Post-Training Quantization | Supported     | Supported    | Experimental  | Supported     |
| Weights Compression        | Supported     | Supported    | Experimental  | Supported     |
| Activation Sparsity        | Not supported | Experimental | Not supported | Not supported |
Training-Time Compression Algorithms
| Compression algorithm                                     | PyTorch   |
| :-------------------------------------------------------- | :-------: |
| Quantization Aware Training                               | Supported |
| Weight-Only Quantization Aware Training with LoRA and NLS | Supported |
| Pruning                                                   | Supported |
- Automatic, configurable model graph transformation to obtain the compressed model.
- Common interface for compression methods.
- GPU-accelerated layers for faster compressed model fine-tuning.
- Distributed training support.
- Git patch for prominent third-party repository (huggingface-transformers) demonstrating the process of integrating NNCF into custom training pipelines.
- Exporting PyTorch compressed models to ONNX* checkpoints, ready to use with OpenVINO™ toolkit.
<a id="documentation"></a>
Documentation
This documentation covers detailed information about the NNCF algorithms and functions needed to contribute to NNCF.
The latest user documentation for NNCF is available here.
NNCF API documentation can be found here.
<a id="usage"></a>
Usage
Post-Training Quantization
The NNCF PTQ is the simplest way to apply 8-bit quantization. To run the algorithm you only need your model and a small (~300 samples) calibration dataset.
OpenVINO is the preferred backend to run PTQ with, while PyTorch and ONNX are also supported.
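To make the role of the calibration dataset concrete, here is a minimal pure-Python sketch of 8-bit quantization driven by calibration statistics. It is a simplified min/max scheme written for illustration, not NNCF's actual algorithm; the function names and the generated sample data are assumptions.

```python
# Illustrative only: a simplified min/max calibration, not NNCF's actual algorithm.

def calibrate_range(samples):
    """Collect the global min and max over all calibration samples."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    # Widen the range to contain zero so that 0.0 maps to an integer level.
    return min(lo, 0.0), max(hi, 0.0)


def quantization_params(lo, hi, num_bits=8):
    """Derive scale and zero point for unsigned asymmetric quantization."""
    levels = 2 ** num_bits - 1  # 255 steps for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a float to an integer level, clamping to the representable range."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Map an integer level back to an approximate float value."""
    return (q - zero_point) * scale


# Hypothetical calibration data: 300 small "activation" vectors.
calibration_samples = [[(i % 7) * 0.5 - 1.0, (i % 5) * 0.25] for i in range(300)]
lo, hi = calibrate_range(calibration_samples)
scale, zero_point = quantization_params(lo, hi)

value = 0.8
restored = dequantize(quantize(value, scale, zero_point), scale, zero_point)
print(f"range=[{lo}, {hi}] scale={scale:.5f} zero_point={zero_point}")
print(f"{value} -> {restored:.5f} (error {abs(value - restored):.5f})")
```

In this sketch the round-trip error for any in-range value is at most half a quantization step, which is why a representative calibration set matters: it determines the range, and therefore the step size.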
<details open><summary><b>OpenVINO</b></summary>

import nncf
import openvino as ov
import torch
from torchvision import datasets, transforms

# Instantiate your uncompressed model
model = ov.Core().read_model("/model_path")

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1)

# Step 1: Initialize transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)

# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)

</details>
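The pattern of pairing a data source with a transformation function can be sketched without any framework. The `CalibrationDataset` class, its `iter_inputs` method, and the `subset_size` parameter below are hypothetical stand-ins written for illustration; they are not part of the NNCF API and only mimic the role `nncf.Dataset` plays in the example above.

```python
from itertools import islice


class CalibrationDataset:
    """Hypothetical stand-in for the role nncf.Dataset plays in the examples."""

    def __init__(self, data_source, transform_fn=None):
        self._data_source = data_source
        # Without a transform, items are passed to the model unchanged.
        self._transform_fn = transform_fn or (lambda item: item)

    def iter_inputs(self, subset_size=None):
        """Yield model inputs lazily, optionally stopping after subset_size items."""
        for item in islice(self._data_source, subset_size):
            yield self._transform_fn(item)


def transform_fn(data_item):
    # The loader yields (images, label) pairs; calibration needs only the images.
    images, _ = data_item
    return images


# Hypothetical data source: 1000 (image, label) pairs with tiny fake "images".
data_source = [([float(i), float(i + 1)], i % 10) for i in range(1000)]
dataset = CalibrationDataset(data_source, transform_fn)

# Only a small subset (about 300 samples) is needed to collect statistics.
inputs = list(dataset.iter_inputs(subset_size=300))
print(len(inputs), inputs[0])
```

Keeping the transformation separate from the data source means the same validation loader can be reused unchanged, while labels and other fields the model does not consume are dropped at calibration time.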
<details><summary><b>PyTorch</b></summary>
import nncf
import torch
from torchvision import datasets, models, transforms

# Instantiate your uncompressed model
model = models.mobilenet_v2()

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset)

# Step 1: Initialize the transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)

# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)

NOTE If the Post-Training Quantization algorithm does not meet quality requirements, you can fine-tune the quantized PyTorch model. You can find an example of the Quantization-Aware Training pipeline for a PyTorch model here.
</details>

<details><summary><b>TorchFX</b></summary>

import nncf
import torch.fx
from torchvision import datasets, models, transforms

# Instantiate your uncompressed model
model = models.mobilenet_v2()

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset)

# Step 1: Initialize the transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)

# Step 3: Export model to TorchFX
input_shape = (1, 3, 224, 224)
ex_input = torch.ones(input_shape)  # example input used only to trace the model
fx_model = torch.export.export_for_training(model, args=(ex_input,)).module()
# or
# fx_model = torch.export.export(model, args=(ex_input,)).module()

# Step 4: Run the quantization pipeline
quantized_fx_model = nncf.quantize(fx_model, calibration_dataset)

</details>
<details><summary><b>ONNX</b></summary>
import onnx
import nncf
import torch
from torchvision import datasets, transforms

# Instantiate your uncompressed model
onnx_model = onnx.load_model("/model_path")

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1)

# Step 1: Initialize transformation function
# ONNX models take a dict that maps input names to NumPy arrays
input_name = onnx_model.graph.input[0].name
def transform_fn(data_item):
    images, _ = data_item
    return {input_name: images.numpy()}

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)

# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(onnx_model, calibration_dataset)

</details>
Training-Time Quantization
Here is an example of an Accuracy-Aware Quantization pipeline in which model weights and compression parameters can be fine-tuned to achieve higher accuracy.
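The idea of fine-tuning weights while quantization is active can be sketched with a toy example. The code below trains a single weight through a quantize-dequantize ("fake quantization") step and updates the full-precision copy using a straight-through estimator. It is a pure-Python illustration under simplifying assumptions (fixed scale, one parameter, made-up data), not NNCF's implementation.

```python
# Illustrative only: a toy quantization-aware training loop, not NNCF's implementation.

def fake_quantize(w, scale=0.1, qmin=-128, qmax=127):
    """Quantize then immediately dequantize, so w snaps to the integer grid."""
    q = max(qmin, min(qmax, round(w / scale)))
    return q * scale


def loss_and_grad(w_q, xs, ys):
    """Mean squared error of y = w_q * x and its gradient with respect to w_q."""
    n = len(xs)
    loss = sum((w_q * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (w_q * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad


# Hypothetical training data generated by a "true" weight of 0.37.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [0.37 * x for x in xs]

w = 0.0  # latent full-precision weight
learning_rate = 0.05
initial_loss, _ = loss_and_grad(fake_quantize(w), xs, ys)

for _ in range(200):
    w_q = fake_quantize(w)  # the forward pass sees the quantized weight
    _, grad = loss_and_grad(w_q, xs, ys)
    # Straight-through estimator: treat d(w_q)/d(w) as 1 and update the latent weight.
    w -= learning_rate * grad

final_loss, _ = loss_and_grad(fake_quantize(w), xs, ys)
print(f"latent w={w:.4f}, quantized w={fake_quantize(w):.1f}")
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because the target 0.37 is not on the quantization grid, the latent weight settles near the boundary between two grid points, so the quantized weight ends up within one step of the target. In this sketch the scale is fixed; making it trainable as well would correspond to fine-tuning the compression parameters mentioned above.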
<details><summary><b>PyTorch</b></summary>

import nncf
import torch
from torchvision import datasets, models, transforms

# Instantiate your uncompressed model
model = models.mobilenet_v2()

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset)

# Step 1: Initialize the transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)

# Step 3: Run the quantization pipeline
quantized_model = nncf
