SkillAgentSearch skills...

Retinaface

Reimplement RetinaFace use C++ and TensorRT

Install / Use

/learn @clancylian/Retinaface
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

RetinaFace C++ Reimplement

source

Reference resources RetinaFace in insightface with python code.

model transformation tool

MXNet2Caffe

you need to add some layers yourself, and in caffe there is not upsample,you can replace with deconvolution,and maybe slight accuracy loss.

the origin model reference from mobilenet25,and I have retrain it.

Demo

$ mkdir build
$ cd build/
$ cmake ../
$ make

you need to modify dependency path in CmakeList file.

Speed

test hardware:1080Ti

test1:

| model | speed | input size | preprocess time | inference | postprocess time | | :------: | :----: | :--------: | :-------------: | :-------: | :--------------: | | mxnet | 44.8ms | 1280x896 | 19.0ms | 8.0ms | 16.0ms | | caffe | 46.9ms | 1280x896 | 5.8ms | 24.1ms | 16.0ms | | tensorrt | 29.3ms | 1280x896 | 6.9ms | 5.4ms | 15.0ms |

test2:

| model | speed | inputsize | preprocess time | inference | postprocess time | | :------: | :----: | :-------: | :-------------: | :-------: | :--------------: | | mxnet | 6.4ms | 320x416 | 1.3ms | 0.1ms | 4.2ms | | caffe | 30.8ms | 320x416 | 1.2ms | 27ms | 2.3ms | | tensorrt | 4.7ms | 320x416 | 0.7ms | 1.9ms | 1.8ms |

tensorrt batch test:

| batchsize | inputsize | maxbatchsize | preprocess time | inference | postprocess time | all | GPU | | :-------: | :-------: | :----------: | :-------------: | :-------: | :--------------: | :-----: | :--: | | 1 | 448x448 | 8 | 1.0ms | 2.3ms | 2.6ms | 6.7ms | 35% | | 2 | 448x448 | 8 | 2.5ms | 3.3ms | 5.2ms | 11.8ms | 33% | | 4 | 448x448 | 8 | 4.1ms | 4.6ms | 10.0ms | 21.8ms | 28% | | 8 | 448x448 | 8 | 8.7ms | 7.0ms | 20.3ms | 40.7ms | 23% | | 16 | 448x448 | 32 | 28.1 | 14.7 | 38.7ms | 92.0ms | - | | 32 | 448x448 | 32 | 36.2ms | 26.3 | 75.7ms | 163.5ms | - |

note: batch size have some advantage in inference but can't speed up preprocess and postprocess.

optimize post process:

| batchsize | inputsize | maxbatchsize | preprocess time | inference | postprocess time | all | GPU | | :-------: | :-------: | :----------: | :-------------: | :-------: | :--------------: | :----: | :--: | | 1 | 448x448 | 8 | 1.0ms | 2.3ms | 0.09ms | 3.5ms | 70% | | 2 | 448x448 | 8 | 2.2ms | 2.8ms | 0.2ms | 5.3ms | 60% | | 4 | 448x448 | 8 | 3.7ms | 5.0ms | 0.3ms | 8.4ms | 55% | | 8 | 448x448 | 8 | 7.5ms | 6.5ms | 0.67ms | 14.9ms | 50% | | 16 | 448x448 | 32 | 26ms | 13ms | 1.3ms | 41ms | 40% | | 32 | 448x448 | 32 | 32ms | 22ms | 2.7ms | 56.6ms | 50% |

use nvidia npp library to speed up preprocess:

| batchsize | inputsize | maxbatchsize | preprocess time | inference | postprocess time | all | GPU | | :-------: | :-------: | :----------: | :-------------: | :-------: | :--------------: | :----: | :--: | | 1 | 448x448 | 8 | 0.2ms | 2.3ms | 0.1ms | 2.6ms | 91% | | 2 | 448x448 | 8 | 0.3ms | 3.0ms | 0.2ms | 3.5ms | 85% | | 4 | 448x448 | 8 | 0.5ms | 4.1ms | 0.32ms | 5.0ms | 82% | | 8 | 448x448 | 8 | 1.2ms | 6.3ms | 0.77ms | 8.3ms | 79% | | 16 | 448x448 | 32 | 2.2ms | 14ms | 1.3ms | 16.7ms | 80% | | 32 | 448x448 | 32 | 5.0ms | 22ms | 2.8ms | 29.3ms | 77% |

INT8 inference

INT8 calibration table can generate by INT8-Calibration-Tool.

Accuracy

https://raw.githubusercontent.com/clancylian/retinaface/master/data/retinaface-widerface%E6%B5%8B%E8%AF%95.png

View on GitHub
GitHub Stars298
CategoryDevelopment
Updated4mo ago
Forks89

Languages

C++

Security Score

97/100

Audited on Dec 2, 2025

No findings