DeepLearningBenchmarks
Benchmarks across Deep Learning Frameworks in Julia and Python
Popular Computer Vision Model Benchmarks
Input Dimensions
- Batch Size = 8, Image = 3 x 224 x 224 (default when nothing else is specified, and for all CPU runs)
- Batch Size = 4, Image = 3 x 224 x 224, used for:
  - Resnet 101
  - Resnet 152
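
As a rough illustration of how forward and backward timings of this kind could be collected, here is a minimal, framework-agnostic sketch. It is not the code used to produce the numbers in this README; `time_call`, `benchmark`, and all parameter names are hypothetical, and the warm-up and repeat counts are arbitrary assumptions.

```python
import time
from statistics import mean


def time_call(fn, setup=None, sync=None, warmup=3, repeats=10):
    """Return the mean wall-clock time of fn(state) in seconds.

    setup() runs untimed before every call and its result is passed to fn.
    sync() is called right before starting and right before stopping the clock.
    """
    samples = []
    for i in range(warmup + repeats):
        state = setup() if setup is not None else None
        if sync is not None:
            sync()  # make sure earlier work has finished before timing
        start = time.perf_counter()
        fn(state)
        if sync is not None:
            sync()  # wait for asynchronous work launched by fn
        elapsed = time.perf_counter() - start
        if i >= warmup:  # discard warm-up iterations
            samples.append(elapsed)
    return mean(samples)


def benchmark(forward, backward, backward_setup, sync=None, **kwargs):
    """Time a forward and a backward callable; total is their sum."""
    fwd = time_call(forward, sync=sync, **kwargs)
    # The untimed setup rebuilds whatever the backward pass needs,
    # for example a fresh loss value from a new forward pass.
    bwd = time_call(backward, setup=backward_setup, sync=sync, **kwargs)
    return {"forward": fwd, "backward": bwd, "total": fwd + bwd}
```

With PyTorch on a GPU, `sync` would typically be `torch.cuda.synchronize`, because CUDA kernels are launched asynchronously and the clock would otherwise stop before the work has finished. Assuming a model `model` and an input batch `x`, `forward` could be `lambda _: model(x)`, `backward_setup` could be `lambda: model(x).sum()`, and `backward` could be `lambda loss: loss.backward()`.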
GPU USED --- Titan 1080Ti 12 GB
|Model|Framework|Forward Pass|Backward Pass|Total Time|Inference|
|:---:|:---:|:---:|:---:|:---:|:---:|
|VGG16|Pytorch 0.4.1|0.0245 s|0.0606 s|0.0852 s|0.0234 s|
||Flux 0.6.8+|0.0287 s|0.0760 s|0.1047 s|0.0288 s|
|VGG16 BN|Pytorch 0.4.1|0.0271 s|0.0672 s|0.0943 s|0.0273 s|
||Flux 0.6.8+|0.0333 s|0.0818 s|0.1151 s|0.0327 s|
|VGG19|Pytorch 0.4.1|0.0281 s|0.0741 s|0.1021 s|0.0280 s|
||Flux 0.6.8+|0.0355 s|0.0923 s|0.1278 s|0.0356 s|
|VGG19 BN|Pytorch 0.4.1|0.0321 s|0.0812 s|0.1134 s|0.0325 s|
||Flux 0.6.8+|0.0377 s|0.0965 s|0.1342 s|0.0371 s|
|Resnet18|Pytorch 0.4.1|0.0064 s|0.0125 s|0.0190 s|0.0050 s|
||Flux 0.6.8+|0.0079 s|0.0218 s|0.0297 s|0.0079 s|
|Resnet34|Pytorch 0.4.1|0.0092 s|0.0216 s|0.0307 s|0.0092 s|
||Flux 0.6.8+|0.0137 s|0.0313 s|0.0450 s|0.0151 s|
|Resnet50|Pytorch 0.4.1|0.0155 s|0.0351 s|0.0506 s|0.0152 s|
||Flux 0.6.8+|0.0205 s|0.1795 s|0.2000 s|-|
|Resnet101|Pytorch 0.4.1|0.0297 s|0.0379 s|0.0676 s|0.0298 s|
||Flux 0.6.8+|0.0215 s|0.0616 s|0.0831 s|0.0208 s|
|Resnet152|Pytorch 0.4.1|0.0431 s|0.05337 s|0.0965 s|0.0429 s|
||Flux 0.6.8+|0.0308 s|0.0807 s|0.1115 s|0.0298 s|
CPU USED --- Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
|Model|Framework|Forward Pass|Backward Pass|Total Time|Inference|
|:---:|:---:|:---:|:---:|:---:|:---:|
|VGG16|Pytorch 0.4.1|6.6024 s|9.4336 s|16.036 s|6.4216 s|
||Flux 0.6.8+|10.458 s|10.245 s|20.703 s|10.111 s|
|VGG16 BN|Pytorch 0.4.1|7.0793 s|9.0536 s|16.132 s|6.7909 s|
||Flux 0.6.8+|29.633 s|18.649 s|49.282 s|24.047 s|
|VGG19|Pytorch 0.4.1|8.3075 s|10.899 s|19.207 s|8.0593 s|
||Flux 0.6.8+|12.226 s|12.457 s|24.683 s|12.029 s|
|VGG19 BN|Pytorch 0.4.1|8.7794 s|12.739 s|21.519 s|8.4044 s|
||Flux 0.6.8+|28.518 s|21.464 s|49.982 s|22.649 s|
Individual Layer Benchmarks
Layer Descriptions
- Conv3x3/1 = Conv2d, 3x3 Kernel, 1x1 Padding, 1x1 Stride
- Conv5x5/1 = Conv2d, 5x5 Kernel, 2x2 Padding, 1x1 Stride
- Conv3x3/2 = Conv2d, 3x3 Kernel, 1x1 Padding, 2x2 Stride
- Conv5x5/2 = Conv2d, 5x5 Kernel, 2x2 Padding, 2x2 Stride
- Dense = 1024 => 512
- BatchNorm = BatchNorm2d
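
To make the convolution configurations above concrete, the following sketch applies the standard output-size formula for a convolution without dilation, `floor((size + 2 * padding - kernel) / stride) + 1`, to each of them. The helper `conv_output_size` is a hypothetical name, and the 224-pixel input is an assumption for illustration only; the spatial size used for the layer benchmarks is not stated here.

```python
def conv_output_size(size, kernel, padding, stride):
    """Output size along one spatial dimension of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# Kernel, padding and stride as listed in the layer descriptions.
LAYERS = {
    "Conv3x3/1": dict(kernel=3, padding=1, stride=1),
    "Conv5x5/1": dict(kernel=5, padding=2, stride=1),
    "Conv3x3/2": dict(kernel=3, padding=1, stride=2),
    "Conv5x5/2": dict(kernel=5, padding=2, stride=2),
}

# Assumed input size, chosen only to illustrate the formula.
INPUT_SIZE = 224

for name, cfg in LAYERS.items():
    print(name, conv_output_size(INPUT_SIZE, **cfg))
```

Under that assumption, the two stride-1 layers keep the spatial size at 224, while the two stride-2 layers reduce it to 112.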
GPU USED --- Titan 1080Ti 12 GB
|Layer|Framework|Forward Pass|Backward Pass|Total Time|
|:---:|:---:|:---:|:---:|:---:|
|Conv3x3/1|Pytorch 0.4.1|0.2312 ms|0.5359 ms|0.7736 ms|
||Flux 0.6.8+|0.1984 ms|0.7640 ms|0.9624 ms|
|Conv5x5/1|Pytorch 0.4.1|0.2667 ms|0.5345 ms|0.8299 ms|
||Flux 0.6.8+|0.2065 ms|0.8075 ms|1.014 ms|
|Conv3x3/2|Pytorch 0.4.1|0.1170 ms|0.2203 ms|0.3376 ms|
||Flux 0.6.8+|0.0927 ms|0.5988 ms|0.6915 ms|
|Conv5x5/2|Pytorch 0.4.1|0.1233 ms|0.2162 ms|0.3407 ms|
||Flux 0.6.8+|0.0941 ms|0.6515 ms|0.7456 ms|
|Dense|Pytorch 0.4.1|0.0887 ms|0.1523 ms|0.2411 ms|
||Flux 0.6.8+|0.0432 ms|0.2044 ms|0.2476 ms|
|BatchNorm|Pytorch 0.4.1|0.1096 ms|0.1999 ms|0.3095 ms|
||Flux 0.6.8+|0.2211 ms|0.2849 ms|0.5060 ms|
NOTE
To reproduce the benchmarks, check out the avik-pal/cudnn_batchnorm branch of Flux 0.6.8+ and CuArrays master.
GPU BatchNorm is broken on Flux 0.6.8+ master, so the benchmarks cannot be run against it.
