Micronet

micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), High-Bit (>2b) (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and Low-Bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, regular, and group-convolution channel pruning; 3. group convolution structure; 4. batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.


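micronet

"Deep learning currently has two camps. The academic camp studies powerful, complex network models and experimental methods in pursuit of higher performance. The engineering camp aims to deploy algorithms on hardware platforms more stably and efficiently, with efficiency as its goal. Complex models do perform better, but their high storage and compute costs are a major reason they are hard to deploy effectively across hardware platforms. The ever-growing size of deep neural networks therefore poses a huge challenge for deployment on mobile devices, and model compression and deployment has become a research area of major interest to both academia and industry."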

Project Overview


micronet, a model compression and deployment library.

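Compression

Quantization: High-Bit (>2b): QAT, PTQ, QAFT; Low-Bit (≤2b)/ternary and binary: QAT
Pruning: normal, regular, and group-convolution structure pruning
BN fusion for binary activation (A) quantization (after quantization training, BN parameters → conv bias b)
BN fusion for High-Bit quantization (during quantization training, fuse first and then quantize; fusion: BN parameters → conv weight w and bias b)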
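The two BN-fusion modes above can be illustrated with a small pure-Python sketch. The function names, the flattened per-channel kernel layout, and the eps value are assumptions for illustration, not micronet's bn_fuse code. In the High-Bit case, BN(conv(x)) = γ(w·x + b − μ)/σ + β is folded into new weights and a new bias. In the binary-activation case, only the sign of the BN output matters, so BN reduces to a bias shift. The handling of γ < 0 (negating the kernel) is this sketch's own derivation.

```python
import math


def fold_bn_into_conv(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """High-Bit case: fold BN into the conv weight w and bias b.

    weight: one flattened kernel (list of floats) per output channel.
    bias:   one value per output channel (pass zeros if the conv has no bias).
    Returns (fused_weight, fused_bias) so that conv'(x) == BN(conv(x)).
    """
    fused_w, fused_b = [], []
    for c, kernel in enumerate(weight):
        scale = gamma[c] / math.sqrt(var[c] + eps)  # per-channel BN multiplier
        fused_w.append([k * scale for k in kernel])
        fused_b.append(beta[c] + (bias[c] - mean[c]) * scale)
    return fused_w, fused_b


def fold_bn_for_binary_activation(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Binary-A case: sign(BN(conv(x))) == sign(conv'(x)) using only a bias shift.

    For gamma > 0 the kernel is unchanged. For gamma < 0 the comparison flips,
    so the kernel and bias are negated (binary/ternary weights stay binary/ternary).
    """
    fused_w, fused_b = [], []
    for c, kernel in enumerate(weight):
        if gamma[c] == 0:
            raise ValueError("gamma == 0: BN output is constant, threshold undefined")
        std = math.sqrt(var[c] + eps)
        shift = bias[c] - mean[c] + beta[c] * std / gamma[c]
        if gamma[c] > 0:
            fused_w.append(list(kernel))
            fused_b.append(shift)
        else:
            fused_w.append([-k for k in kernel])
            fused_b.append(-shift)
    return fused_w, fused_b
```

This is why BN can be removed from a fused model. In the High-Bit case both w and b change. In the binary-activation case the BN layer becomes a per-channel threshold that is absorbed into b.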

Deployment

TensorRT (fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape, etc.)

Code Structure


micronet
├── __init__.py
├── base_module
│   ├── __init__.py
│   └── op.py
├── compression
│   ├── README.md
│   ├── __init__.py
│   ├── pruning
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── gc_prune.py
│   │   ├── main.py
│   │   ├── models_save
│   │   │   └── models_save.txt
│   │   └── normal_regular_prune.py
│   └── quantization
│       ├── README.md
│       ├── __init__.py
│       ├── wbwtab
│       │   ├── __init__.py
│       │   ├── bn_fuse
│       │   │   ├── bn_fuse.py
│       │   │   ├── bn_fused_model_test.py
│       │   │   └── models_save
│       │   │       └── models_save.txt
│       │   ├── main.py
│       │   ├── models_save
│       │   │   └── models_save.txt
│       │   └── quantize.py
│       └── wqaq
│           ├── __init__.py
│           ├── dorefa
│           │   ├── __init__.py
│           │   ├── main.py
│           │   ├── models_save
│           │   │   └── models_save.txt
│           │   ├── quant_model_test
│           │   │   ├── models_save
│           │   │   │   └── models_save.txt
│           │   │   ├── quant_model_para.py
│           │   │   └── quant_model_test.py
│           │   └── quantize.py
│           └── iao
│               ├── __init__.py
│               ├── bn_fuse
│               │   ├── bn_fuse.py
│               │   ├── bn_fused_model_test.py
│               │   └── models_save
│               │       └── models_save.txt
│               ├── main.py
│               ├── models_save
│               │   └── models_save.txt
│               └── quantize.py
├── data
│   └── data.txt
├── deploy
│   ├── README.md
│   ├── __init__.py
│   └── tensorrt
│       ├── README.md
│       ├── __init__.py
│       ├── calibrator.py
│       ├── eval_trt.py
│       ├── models
│       │   ├── __init__.py
│       │   └── models_trt.py
│       ├── models_save
│       │   └── calibration_seg.cache
│       ├── test_trt.py
│       └── util_trt.py
├── models
│   ├── __init__.py
│   ├── nin.py
│   ├── nin_gc.py
│   └── resnet.py
└── readme_imgs
    ├── code_structure.jpg
    └── micronet.xmind

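Changelog

2019.12.4, initial commit
12.8, DoReFa activation (A) quantization now scales (* 0.1) before clipping, to reduce clipping error
12.11, added a code structure diagram
12.12, improved usage examples
12.14, added: 1. selectable W ternary/binary for the BN-fusion quantization case (choose the same W ternary/binary setting used in quantization training); 2. handling of convolutions (conv) without bias during BN fusion
12.17, added a before/after model-compression comparison (example)
12.20, added device selection (cpu, gpu (single GPU, multi-GPU))
12.27, added related papers
12.29, removed the 8-bit limit on High-Bit quantization; quantization to 10-bit, 16-bit, etc. is now possible
2020.2.17, 1. simplified the W ternary/binary quantization code; 2. sped up W ternary quantization training
2.18, optimized BN fusion for binary activations (A): removed the restriction on the BN layer's gamma parameter, so BN now trains normally in this case when fusion is used
2.24, reorganized the ternary/binary quantization code again to improve portability (the old version was indeed hard to port). To port: replace the Conv layers you want to quantize with QuantConv2d from compression/quantization/wbwtab/models/util_wbwtab.py; see nin_gc.py under that path for usage
3.1, added: 1. Google's High-Bit quantization method; 2. BN fusion for High-Bit quantization during training
3.2, 3.3, standardized the overall structure of the quantization code; all quantization methods can now be ported the same way: replace the Conv (or FC; currently supported by dorefa, and other methods can be written similarly) you want to quantize with QuantConv2d (or QuantLinear) from models/util_wxax.py; see nin_gc.py under that path for usage (applies to classification, detection, segmentation, etc., but needs debugging for each specific case)
3.4, cleaned up and optimized the code for "BN fusion for binary activations (A)" in wbwtab/bn_fuse; it supports BN fusion and comparing models before and after fusion (accuracy/speed/(size))
3.11, changed the BN layer momentum parameter in compression/wqaq/iao (0.1 → 0.01) to reduce the weight of batch statistics and partly suppress the jitter caused by quantization. In experiments, quantization training became more stable and accuracy improved by about 1%
3.13, updated the code structure diagram
4.6, fixed an issue with W_clip in binary quantization training (it had kept binary quantization training accuracy from improving; it now works normally) (also fixed some modules, such as models/util_wxax.py, not being found)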
12.14, 1. improve code structure; 2. add deploy-tensorrt (main module, not yet runnable)
12.18, 1. improve code structure/module references/module names; 2. add transfer-use demo
12.21, improve pruning-quantization pipeline and code
2021.1.4, add other quant_op
1.5, add per-channel and per-layer selection for quant_weight
1.7, fix iao's loss-NaN bug, which was caused by a per-channel min/max error
1.8, 1. improve quant_para saving: only scale and zero_point are saved now; 2. add optional weight_observer (MinMaxObserver or MovingAverageMinMaxObserver)
1.11, fix bug in binary_a (1/0) and binary_w preprocessing
1.12, add "pip install"
1.22, add auto_insert_quant_op (still needs improvement)
1.27, improve auto_insert_quant_op (quantization is now easy to use, as in quant_test_auto)
1.28, 1. fix prune-quantization pipeline and code; 2. improve code structure
2.1, improve wbwtab_bn_fuse
2.4, 1. add wqaq_bn_fuse; 2. add quant_model_inference_simulation; 3. improve code format
4.30, 1. update code_structure image; 2. fix iao's quant_weight_range, quant_contrans, and quant_bn_fuse_conv pretrained_model BN-parameter loading bugs
5.4, add QAFT, which helps improve quantization accuracy
5.6, add PTQ; its quantization accuracy is also good
5.11, add bn_fuse_calib flag
5.14, 1. change ste to clip_ste, which improves quantization training; 2. remove quant_relu and add quant_leaky_relu
5.15, fix bug in quant_model_para post-processing
6.7, add quant_add (requires base_module's op) and quant_resnet demo
6.9, iao_quant supports multiple GPUs
6.16, fix quant_round() and quant_binary()
10.6, format

Requirements

python >= 3.5
torch >= 1.1.0
torchvision >= 0.3.0
numpy
onnx == 1.6.0
tensorrt == 7.0.0.11

Installation

PyPI

pip install micronet -i https://pypi.org/simple

GitHub

git clone https://github.com/666DZY666/micronet.git
cd micronet
python setup.py install

Verification

python -c "import micronet; print(micronet.__version__)"

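Testing

Install from GitHub

Compression

Quantization

--refine, loads pretrained floating-point model parameters and quantizes on top of them

wbwtab

--W --A, quantization levels of weights W and activations A (2: binary, 3: ternary, 32: full precision)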

cd micronet/compression/quantization/wbwtab

WbAb

python main.py --W 2 --A 2

WbA32

python main.py --W 2 --A 32

WtAb

python main.py --W 3 --A 2

WtA32

python main.py --W 3 --A 32
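The binary and ternary settings above (--W/--A of 2 or 3) correspond to the BNN/XNOR-Net and TWN schemes named in the overview. The sketch below is a minimal pure-Python illustration of those published rules. It uses one scaling factor α per tensor, XNOR-Net's α = mean(|w|), TWN's threshold Δ = 0.7·mean(|w|), and a ±1 sign activation. These are assumptions; micronet's quantize.py may differ, for example in its activation encoding or its per-channel scaling.

```python
def binarize_weights(w):
    """XNOR-Net style: alpha * sign(w), with alpha = mean(|w|)."""
    alpha = sum(abs(v) for v in w) / len(w)
    return [alpha if v >= 0 else -alpha for v in w]


def ternarize_weights(w, delta_factor=0.7):
    """TWN style: zero out |w| <= delta; the rest become +/- alpha.

    delta = 0.7 * mean(|w|); alpha = mean of the |w| values above delta.
    """
    delta = delta_factor * sum(abs(v) for v in w) / len(w)
    large = [abs(v) for v in w if abs(v) > delta]
    alpha = sum(large) / len(large) if large else 0.0
    return [alpha if v > delta else -alpha if v < -delta else 0.0 for v in w]


def binarize_activations(a):
    """BNN-style sign activation (mapping 0 to +1 is an assumption)."""
    return [1.0 if v >= 0 else -1.0 for v in a]
```

For w = [0.9, -0.1, 0.05, -1.2, 0.4], binarization gives ±0.53. Ternarization keeps only 0.9, -1.2, and 0.4, each scaled to ±2.5/3.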

wqaq

--w_bits --a_bits, bit widths of weights W and activations A

dorefa

cd micronet/compression/quantization/wqaq/dorefa

W16A16

python main.py --w_bits 16 --a_bits 16

W8A8

python main.py --w_bits 8 --a_bits 8

W4A4

python main.py --w_bits 4 --a_bits 4

Other bit widths work analogously.
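The --w_bits/--a_bits options set k in DoReFa's quantizer. The sketch below follows the DoReFa paper's formulas:

- quantize_k(r) = round((2^k − 1)·r)/(2^k − 1)
- weights are mapped through tanh into [0, 1], quantized, then rescaled to [−1, 1]
- activations are clipped to [0, 1] before quantizing

The default pre_scale=0.1 reflects the 12.8 changelog entry (scale by 0.1 before clipping). Whether the current micronet code still does this is an assumption. The function names are hypothetical.

```python
import math


def quantize_k(x, k):
    """Uniformly quantize x in [0, 1] to k bits."""
    n = 2 ** k - 1
    return round(x * n) / n


def dorefa_weights(weights, k):
    """Map real weights to k-bit levels in [-1, 1]."""
    t = [math.tanh(w) for w in weights]
    max_abs = max(abs(v) for v in t)
    if max_abs == 0:
        return [0.0 for _ in weights]  # all-zero tensor: nothing to scale
    return [2 * quantize_k(v / (2 * max_abs) + 0.5, k) - 1 for v in t]


def dorefa_activations(acts, k, pre_scale=0.1):
    """Scale (assumed 0.1, per the changelog), clip to [0, 1], then quantize."""
    return [quantize_k(min(max(a * pre_scale, 0.0), 1.0), k) for a in acts]
```

With k = 2, weights land on {-1, -1/3, 1/3, 1} and activations on {0, 1/3, 2/3, 1}. During training, the round() would be paired with a straight-through estimator so that gradients can flow.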

iao

cd micronet/compression/quantization/wqaq/iao

Bit-width selection is the same as for dorefa.
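iao implements the integer-arithmetic-only scheme from "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference". In that scheme a real value is r = S·(q − Z), and a layer output is computed as q_out = Z_out + M·Σ(q_a − Z_a)(q_w − Z_w), with M = S_a·S_w/S_out.

The sketch below checks this identity on one dot product. It uses uint8 ranges and a float M; the paper replaces the float M with a fixed-point multiply and shift. The function names are hypothetical.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine quantization: r = scale * (q - zero_point), q clamped to uint8."""
    return min(max(round(x / scale) + zero_point, qmin), qmax)


def int_dot_requant(q_a, z_a, s_a, q_w, z_w, s_w, s_out, z_out, qmin=0, qmax=255):
    """Integer dot product followed by requantization to the output scale."""
    # Integer accumulation (an int32 accumulator on real hardware).
    acc = sum((a - z_a) * (w - z_w) for a, w in zip(q_a, q_w))
    # Real multiplier M = S_a * S_w / S_out; kept as a float here for clarity.
    m = s_a * s_w / s_out
    return min(max(z_out + round(m * acc), qmin), qmax)
```

For a = [0.5, 1.0, 0.25] (S = 0.01, Z = 0) and w = [0.2, -0.4, 0.8] (S = 0.01, Z = 128), the float result is -0.1. With S_out = 0.005 and Z_out = 128, the integer path returns 108, which dequantizes to (108 − 128)·0.005 = -0.1.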

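Single GPU

QAT/PTQ → QAFT

! Note: QAFT must be run after QAT/PTQ !

--q_type, quantization type (0: symmetric, 1: asymmetric)

--q_level, weight quantization granularity (0: per-channel, 1: per-layer)

--weight_observer, weight observer choice (0: MinMaxObserver, 1: MovingAverageMinMaxObserver)

--bn_fuse, flag for BN fusion during quantization

--bn_fuse_calib, flag for BN fusion calibration during quantization

--pretrained_model, pretrained floating-point model

--qaft, QAFT flag

--ptq, PTQ flag (ptq_observer)

--ptq_control, ptq_control

--ptq_batch, number of batches used for PTQ

--percentile, calibration percentile for PTQ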
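--q_type and --q_level decide how scale and zero_point are derived from the observed range. The sketch below uses the standard affine-quantization formulas. It assumes a restricted symmetric range [-(2^(b−1)−1), 2^(b−1)−1] and an unsigned asymmetric range [0, 2^b−1]. micronet's iao code may use different ranges (for example [-2^(b−1), 2^(b−1)−1]). The function names are hypothetical.

```python
def compute_qparams(min_val, max_val, bits=8, symmetric=True, eps=1e-8):
    """Scale/zero_point from an observed [min_val, max_val] range."""
    # The representable range must contain 0 so that 0.0 quantizes exactly.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    if symmetric:
        qmax = 2 ** (bits - 1) - 1  # restricted range, e.g. [-127, 127] for 8 bits
        scale = max(max(-min_val, max_val) / qmax, eps)
        zero_point = 0
    else:
        qmax = 2 ** bits - 1  # [0, 255] for 8 bits
        scale = max((max_val - min_val) / qmax, eps)
        zero_point = int(round(-min_val / scale))
    return scale, zero_point


def weight_qparams(weight, bits=8, symmetric=True, per_channel=True):
    """One (scale, zero_point) per output channel; per-layer shares a single pair."""
    if per_channel:
        return [compute_qparams(min(ch), max(ch), bits, symmetric) for ch in weight]
    flat = [v for ch in weight for v in ch]
    return [compute_qparams(min(flat), max(flat), bits, symmetric)] * len(weight)


def fake_quantize(x, scale, zero_point, bits=8, symmetric=True):
    """Quantize then dequantize one value, as in a QAT forward pass."""
    if symmetric:
        qmin, qmax = -(2 ** (bits - 1) - 1), 2 ** (bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bits - 1
    q = min(max(round(x / scale) + zero_point, qmin), qmax)
    return (q - zero_point) * scale
```

Per-channel quantization gives a channel with small weights (e.g. [0.1, -0.05, 0.02]) its own finer scale (0.1/127) instead of the layer-wide 1/127. That is why per-channel weight quantization usually loses less accuracy.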

QAT

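Default: symmetric, per-channel (weight) quantization, no BN fusion, weight_observer MinMaxObserver, no pretrained floating-point model loaded, run QAT

python main.py --q_type 0 --q_level 0 --weight_observer 0

Symmetric, per-channel (weight) quantization, no BN fusion, weight_observer MovingAverageMinMaxObserver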

python main.py --q_type 0 --q_level 0 --weight_observer 1
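The two --weight_observer choices differ in how the range evolves across training steps. The sketch is modeled on the PyTorch observers of the same names. MinMaxObserver keeps the running min/max over all batches. MovingAverageMinMaxObserver applies an exponential moving average; PyTorch's default averaging constant is 0.01. micronet's own observer classes may differ in detail, so treat this as an assumption-labeled illustration.

```python
class MinMaxObserver:
    """Tracks the running min/max over every batch seen so far."""

    def __init__(self):
        self.min_val = None
        self.max_val = None

    def observe(self, values):
        lo, hi = min(values), max(values)
        if self.min_val is None:
            self.min_val, self.max_val = lo, hi
        else:
            self.min_val = min(self.min_val, lo)
            self.max_val = max(self.max_val, hi)


class MovingAverageMinMaxObserver(MinMaxObserver):
    """Exponential moving average of per-batch min/max."""

    def __init__(self, averaging_constant=0.01):
        super().__init__()
        self.c = averaging_constant

    def observe(self, values):
        lo, hi = min(values), max(values)
        if self.min_val is None:
            self.min_val, self.max_val = lo, hi
        else:
            self.min_val += self.c * (lo - self.min_val)
            self.max_val += self.c * (hi - self.max_val)
```

The moving average reacts slowly to a single outlier batch. This smooths the scale during QAT, whereas the plain min/max range can only grow.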

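Symmetric, per-layer (weight) quantization, no BN fusion

python main.py --q_type 0 --q_level 1

Asymmetric, per-channel (weight) quantization, no BN fusion

python main.py --q_type 1 --q_level 0

Asymmetric, per-layer (weight) quantization, no BN fusion

python main.py --q_type 1 --q_level 1

Symmetric, per-channel (weight) quantization, BN fusion

python main.py --q_type 0 --q_level 0 --bn_fuse

Symmetric, per-layer (weight) quantization, BN fusion

python main.py --q_type 0 --q_level 1 --bn_fuse

Asymmetric, per-channel (weight) quantization, BN fusion

python main.py --q_type 1 --q_level 0 --bn_fuse

Asymmetric, per-layer (weight) quantization, BN fusion

python main.py --q_type 1 --q_level 1 --bn_fuse

Symmetric, per-channel (weight) quantization, BN fusion calibration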

python main.py --q_type 0 --q_level 0 --bn_fuse --bn_fuse_calib

PTQ

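Requires a pretrained floating-point model; in this project it can be obtained from normal training in the pruning module

Symmetric, per-channel (weight) quantization, BN fusion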

python main.py --refine ../../../pruning/models_save/nin_gc.pth --q_level 0 --bn_fuse --pretrained_model --ptq_control --ptq --batch_size 32 --ptq_batch 200 --percentile 0.999999
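One common reading of --percentile is percentile clipping: take the range from a high percentile of |activation| instead of the absolute max, so that rare outliers do not inflate the scale. The sketch assumes this interpretation and a symmetric 8-bit scale. How micronet's PTQ observer actually uses --percentile and --ptq_batch is not confirmed here, and the function names are hypothetical.

```python
import math


def percentile_clip_value(values, percentile=0.999999):
    """Return the |value| below which a `percentile` fraction of samples falls."""
    mags = sorted(abs(v) for v in values)
    idx = min(len(mags) - 1, max(0, math.ceil(percentile * len(mags)) - 1))
    return mags[idx]


def symmetric_scale_from_percentile(values, bits=8, percentile=0.999999):
    """Symmetric scale that clips the rare outliers beyond the percentile."""
    clip = percentile_clip_value(values, percentile)
    return max(clip, 1e-8) / (2 ** (bits - 1) - 1)
```

Take calibration data 1, 2, ..., 1000 plus one outlier of 1e6. With percentile=0.999 the clip value is 1000, not 1e6, so the bulk of the values keeps fine resolution.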

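Other configurations work analogously.

QAFT

! Note: QAFT must be run after QAT/PTQ !

QAT → QAFT

Symmetric, per-channel (weight) quantization, BN fusion

python main.py --resume models_save/nin_gc_bn_fused.pth --q_type 0 --q_level 0 --bn_fuse --qaft --lr 0.00001

Other configurations work analogously.

PTQ → QAFT

Symmetric, per-channel (weight) quantization, BN fusion

python main.py --resume models_save/nin_gc_bn_fused.pth --q_level 0 --bn_fuse --qaft --lr 0.00001 --ptq

Other configurations work analogously.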

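Pruning

Sparse training → pruning → fine-tuning

cd micronet/compression/pruning

Sparse training

-sr, sparsity flag

--s, sparsity rate (tune for the specific dataset and model)

--model_type, model type (0: nin, 1: nin_gc)

nin (standard convolution structure)

python main.py -sr --s 0.0001 --model_type 0

nin_gc (with group convolution structure)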

python main.py -sr --s 0.001 --model_type 1
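The sparse-training → pruning → fine-tuning pipeline matches the network-slimming recipe. This sketch assumes -sr/--s add an L1 penalty s·Σ|γ| on BN scale factors γ; the README does not spell this out. The penalty's subgradient s·sign(γ) is added to each γ's gradient, which drives unimportant channels toward zero. In PyTorch, network-slimming code typically does `m.weight.grad.data.add_(s * torch.sign(m.weight.data))` for each BatchNorm2d before `optimizer.step()`. Below is a framework-free version of one SGD step; the function names are hypothetical.

```python
def sign(x):
    """Return 1, 0, or -1."""
    return (x > 0) - (x < 0)


def sparse_bn_step(gammas, grads, s, lr):
    """One SGD step on BN scale factors with an L1 penalty s * sum(|gamma|).

    grads are the task-loss gradients; s * sign(gamma) is the L1 subgradient.
    """
    return [g - lr * (dg + s * sign(g)) for g, dg in zip(gammas, grads)]
```

A larger --s pushes more γ values toward zero. That is why the sparsity rate has to be tuned per dataset and model: too large, and accuracy drops before pruning even starts.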

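Pruning

--percent, pruning ratio

--normal_regular, normal/regular pruning flag and regular pruning base (if set to N, the number of filters in each layer after pruning is a multiple of N)

--model, path of the model after sparse training

--save, path to save the pruned model (a default path is provided; change it as needed)

Normal pruning (nin)

python normal_regular_prune.py --percent 0.5 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth

Regular pruning (nin)

python normal_regular_prune.py --percent 0.5 --normal_regular 8 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth

or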

python normal_regular_prune.py --percent 0.5 --normal_regular 16 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth
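The --percent and --normal_regular options can be illustrated as follows, assuming channels are ranked by |γ| of their BN layer (as in network slimming). A global threshold is taken at the --percent quantile of all |γ| values. Each layer keeps the channels at or above it. For regular pruning, the kept count is rounded up to a multiple of N. Rounding up, and keeping at least N channels per layer, are assumptions of this sketch; micronet may round differently. The function names are hypothetical.

```python
import math


def global_threshold(layers, percent):
    """|gamma| value at the `percent` quantile over all layers."""
    all_g = sorted(abs(g) for layer in layers for g in layer)
    idx = min(int(len(all_g) * percent), len(all_g) - 1)
    return all_g[idx]


def prune_masks(layers, percent, regular_base=1):
    """Per-layer keep masks; regular_base=1 is normal pruning, N > 1 is regular."""
    thr = global_threshold(layers, percent)
    masks = []
    for layer in layers:
        keep = sum(1 for g in layer if abs(g) >= thr)
        # Round up to a multiple of regular_base, at least one group, at most all channels.
        keep = math.ceil(keep / regular_base) * regular_base
        keep = min(max(keep, regular_base), len(layer))
        # Keep the channels with the largest |gamma|.
        order = sorted(range(len(layer)), key=lambda i: abs(layer[i]), reverse=True)
        kept = set(order[:keep])
        masks.append([i in kept for i in range(len(layer))])
    return masks
```

Regular pruning trades a little extra capacity for channel counts such as multiples of 8 or 16, which many inference backends handle more efficiently.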

Group convolution structure pruning (nin_gc)


Author: 666DZY666 (GitHub repository 666DZY666/micronet)
GitHub stars: 2.3k · Forks: 476 · Language: Python · Category: Operations
Security score: 100/100 (audited on Mar 18, 2026; no findings)