BlockConv

This repository is the official code release for the paper Block Convolution: Toward Memory-Efficient Inference of Large-Scale CNNs on FPGA (published in TCAD 2021).

Block convolution is a hardware-friendly, simple, yet efficient convolution operation that can completely avoid the off-chip transfer of intermediate feature maps at runtime. The fundamental idea of block convolution is to eliminate the dependency of feature map tiles in the spatial dimension when spatial tiling is used, which is realized by splitting a feature map into independent blocks so that convolution can be performed separately on individual blocks.
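To make the idea concrete, here is a minimal, dependency-free sketch in plain Python. It is not the repository's PyTorch implementation: the helper names `conv2d_same` and `block_conv2d` are hypothetical, and the sketch assumes a single channel, stride 1, an odd-sized kernel, and zero padding applied to each block independently.

```python
def conv2d_same(fmap, kernel):
    """'Same' 2-D convolution (cross-correlation) with zero padding."""
    h, w = len(fmap), len(fmap[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    # Positions outside the map contribute zero (zero padding).
                    if 0 <= y < h and 0 <= x < w:
                        acc += fmap[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def block_conv2d(fmap, kernel, block_h, block_w):
    """Split the map into blocks and convolve each block on its own."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for top in range(0, h, block_h):
        for left in range(0, w, block_w):
            # Each block only sees its own pixels plus its own padding.
            block = [row[left:left + block_w] for row in fmap[top:top + block_h]]
            result = conv2d_same(block, kernel)
            for i, row in enumerate(result):
                out[top + i][left:left + len(row)] = row
    return out


fmap = [[1.0] * 4 for _ in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]
full = conv2d_same(fmap, kernel)            # conventional convolution
blocked = block_conv2d(fmap, kernel, 2, 2)  # four independent 2x2 blocks
print(full[1][1], blocked[1][1])            # 9.0 4.0
```

The outputs differ near block borders because no block reads pixels from its neighbors. That independence is what removes the tile dependency, and it is also why a network is trained or fine-tuned with blocking enabled rather than blocked after the fact.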

Installation

Python version >= 3.8
PyTorch version >= 1.8

# create conda environment
conda create -n BlockConv python=3.8
conda activate BlockConv
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
pip install torchnet tqdm tabulate gitpython tensorboard

# install from source code
git clone https://github.com/zejiangp/BlockConv.git
cd BlockConv

Training from scratch

--arch: block_resnet18, block_resnet50, block_vgg16, block_mobilenet
--padding_mode: constant (equal to zero padding), replicate, reflect
--type: 0 (fixed blocking), 1 (hierarchical blocking)
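The two blocking types can be illustrated with a small sketch. It rests on an assumption drawn from the F28 / H2x2 naming used in the accuracy tables: fixed blocking keeps the block size constant across layers, while hierarchical blocking keeps the number of blocks constant. The helper `block_grid` is hypothetical and is not part of `classification.py`; how the script encodes hierarchical settings on the command line is not shown here.

```python
def block_grid(fmap_size, mode, value):
    """Return (blocks_per_side, block_size) for a square feature map.

    mode="fixed":        `value` is the block size (e.g. 28 for F28).
    mode="hierarchical": `value` is the number of blocks per side (e.g. 2 for H2x2).
    """
    if mode == "fixed":
        # Assumption: a map no larger than the block is left unsplit.
        block = min(value, fmap_size)
        return (fmap_size + block - 1) // block, block
    if mode == "hierarchical":
        blocks = min(value, fmap_size)
        return blocks, (fmap_size + blocks - 1) // blocks
    raise ValueError(f"unknown mode: {mode}")


# Typical feature-map resolutions for a 224x224 ImageNet input.
for size in (112, 56, 28, 14, 7):
    print(size, block_grid(size, "fixed", 28), block_grid(size, "hierarchical", 2))
```

Under this reading, F28 splits a 112x112 map into a 4x4 grid but leaves a 14x14 map whole, whereas H2x2 always produces a 2x2 grid whose blocks shrink with the resolution.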

For example, to train a resnet18 from scratch with block size 28, fixed blocking, and zero padding, use the following command:

python classification.py \
    ./data/ilsvrc12   \
    --dataset=imagenet   \
    --out_dir=logs/  \
    --gpus=0,1,2,3   \
    --arch=block_resnet18 \
    --name=resnet18_F28_constant_scratch   \
    --batch_size=128   \
    -j=32   \
    --epochs=90 \
    --lr=0.1   \
    --wd=1e-4  \
    --momentum=0.9 \
    --milestones=30,60 \
    --block_size 28,28 \
    --padding_mode constant \
    --type 0 \
    --do_train \
    --do_eval

Fine-tuning

Another way to obtain a model that uses block convolution is to fine-tune a pre-trained model:

python classification.py \
    ./data/ilsvrc12   \
    --dataset=imagenet   \
    --out_dir=logs/  \
    --gpus=0,1,2,3   \
    --arch=block_resnet18 \
    --name=resnet18_F28_constant   \
    --batch_size=128   \
    -j=32   \
    --epochs=30 \
    --lr=0.001   \
    --wd=1e-4  \
    --momentum=0.9 \
    --milestones=10,20 \
    --block_size 28,28 \
    --padding_mode constant \
    --type 0 \
    --resume_from logs/resnet18_baseline.pth.tar \
    --reset_optimizer \
    --do_train \
    --do_eval

Hyperparameters

| strategy | model | epochs | batch size | learning rate | weight decay | milestones |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| Training from scratch | resnet18 | 90 | 128 | 0.1 | 1e-4 | 30, 60 |
| Training from scratch | resnet50 | 90 | 128 | 0.1 | 1e-4 | 30, 60 |
| Training from scratch | vgg16 | 105 | 256 | 0.01 | 5e-4 | 30, 60, 90 |
| Training from scratch | mobilenet | 300 | 128 | 0.0001 | 5e-5 | - |
| Fine-tuning | resnet18 | 30 | 128 | 0.001 | 1e-4 | 10, 20 |
| Fine-tuning | resnet50 | 30 | 128 | 0.001 | 1e-4 | 10, 20 |
| Fine-tuning | vgg16 | 20 | 256 | 0.001 | 1e-4 | 8, 16 |
| Fine-tuning | mobilenet | 50 | 128 | 0.0001 | 5e-5 | - |
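The milestones column lists the epochs at which the learning rate is stepped down. As a rough sketch of what such a schedule looks like, the hypothetical helper below assumes a decay factor of 0.1 at each milestone (the default `gamma` of PyTorch's `MultiStepLR`); the scheduler and factor actually used by `classification.py` are not stated in this README.

```python
def lr_at_epoch(base_lr, milestones, epoch, gamma=0.1):
    """Learning rate at a 0-based epoch under step decay (gamma is an assumption)."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


# resnet18 trained from scratch: lr=0.1, milestones=30,60
for epoch in (0, 29, 30, 59, 60, 89):
    print(epoch, lr_at_epoch(0.1, [30, 60], epoch))
```

With an empty milestone list (the `-` entries for mobilenet), the helper simply returns the base learning rate for every epoch.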

Evaluation

python classification.py \
    ./data/ilsvrc12   \
    --dataset=imagenet   \
    --out_dir=test_logs/  \
    --gpus=0   \
    --arch=block_vgg16    \
    --name=test_vgg   \
    --batch_size=128   \
    -j=32   \
    --block_size 28,28 \
    --padding_mode constant \
    --type 0 \
    --do_eval \
    --resume_from logs/vgg16_finetune_F28_zero.pth.tar

Model Accuracy

We provide pre-trained models for evaluations here.

Top-1 accuracy on the ImageNet classification task.

| model | baseline | scratch | fine-tuning |
|:-:|:-:|:-:|:-:|
| vgg16 | 71.59% | 70.47% | 71.45% |
| resnet18 | 70.60% | 69.94% | 71.21% |
| resnet50 | 75.86% | 75.42% | 76.67% |
| mobilenetv1 | 72.29% | 72.05% | 71.76% |

Top-1 accuracy of blocked networks with respect to blocking ratio under fixed blocking (F) and hierarchical blocking (H).

| model | H2x2 | H4x4 | H8x8 | H16x16 | F112 | F56 | F28 | F14 |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| vgg16 | 70.14% | 70.28% | 70.76% | 71.18% | 71.81% | 71.74% | 71.45% | 70.48% |
| resnet18 | 70.06% | 70.67% | 71.12% | 70.82% | 71.60% | 71.37% | 71.21% | 70.20% |
| mobilenetv1 | 69.96% | 71.49% | 71.53% | 71.50% | 72.16% | 71.89% | 71.76% | 71.13% |

Impact of block padding on classification accuracy.

| model | zero | replicate | reflect |
|:-:|:-:|:-:|:-:|
| vgg16 | 71.45% | 70.90% | 70.22% |
| resnet18 | 71.21% | 70.92% | 70.61% |
| resnet50 | 76.67% | 76.71% | 76.47% |
| mobilenetv1 | 71.76% | 71.92% | 71.58% |
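For readers unfamiliar with the three padding modes, the sketch below shows what each one places around a block, using a single row in plain Python. The helper `pad_row` is hypothetical; the mode names follow those of `torch.nn.functional.pad`, which the `--padding_mode` values appear to mirror.

```python
def pad_row(row, pad, mode="constant"):
    """Pad a 1-D list with `pad` elements on each side."""
    if mode == "constant":       # zero padding
        left = right = [0] * pad
    elif mode == "replicate":    # repeat the edge value
        left, right = [row[0]] * pad, [row[-1]] * pad
    elif mode == "reflect":      # mirror around the edge, excluding the edge itself
        if pad >= len(row):
            raise ValueError("reflect padding must be smaller than the row length")
        left = row[pad:0:-1]
        right = row[-2:-pad - 2:-1]
    else:
        raise ValueError(f"unknown padding mode: {mode}")
    return left + row + right


row = [1, 2, 3, 4]
print(pad_row(row, 2, "constant"))   # [0, 0, 1, 2, 3, 4, 0, 0]
print(pad_row(row, 2, "replicate"))  # [1, 1, 1, 2, 3, 4, 4, 4]
print(pad_row(row, 2, "reflect"))    # [3, 2, 1, 2, 3, 4, 3, 2]
```

In block convolution this padding is applied at every block border, not only at the border of the full feature map, which is why the choice of mode can affect accuracy.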

Citation

If you find this library useful in your work, please cite our paper:

@article{Gangli2022BlockConv,  
    author={Li, Gang and Liu, Zejian and Li, Fanrong and Cheng, Jian},  
    journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},   
    title={Block Convolution: Toward Memory-Efficient Inference of Large-Scale CNNs on FPGA},   
    year={2022},  
    volume={41},  
    number={5},  
    pages={1436-1447},  
    doi={10.1109/TCAD.2021.3082868}
}
