# TLCBench

Benchmark scripts for TVM
## Content
## Requirements

Tested with:

- TVM commit id: 91e07e1f3a7 (Feb. 5, 2021)
- mxnet==1.7.0
- gluonnlp==0.10.0
## Intel CPU

### Results on AWS c5.9xlarge (Intel Xeon Platinum 8124m @ 3.00GHz, 18-core)
#### AutoTVM

| Network Name | Batch size | Mean Inference Time (std dev) |
|--------------|------------|-------------------------------|
| resnet_50    | 1          | 5.40 ms (0.08 ms)             |
| mobilenet_v2 | 1          | 1.33 ms (0.05 ms)             |
| bert         | 1          | 31.31 ms (0.11 ms)            |

#### AutoScheduler

| Network Name | Batch size | Mean Inference Time (std dev) |
|--------------|------------|-------------------------------|
| resnet_50    | 1          | 5.30 ms (0.05 ms)             |
| mobilenet_v2 | 1          | 0.91 ms (0.02 ms)             |
| bert         | 1          | 16.52 ms (0.16 ms)            |
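The mean and standard deviation columns above summarize repeated timed runs of each network. As a rough illustration only (this is not the repository's actual measurement code), the summary string in the tables can be produced from a list of per-run latencies like this:

```python
import statistics

def summarize_latency(latencies_ms):
    """Summarize per-run latencies (in milliseconds) into the
    'mean (std dev)' format used in the result tables above."""
    mean = statistics.mean(latencies_ms)
    std = statistics.pstdev(latencies_ms)  # population std dev
    return f"{mean:.2f} ms ({std:.2f} ms)"

# Hypothetical timings, chosen only for illustration
print(summarize_latency([5.32, 5.45, 5.40, 5.48, 5.35]))
```

TVM's runtime time evaluator reports similar per-run statistics; the function above just shows how the table entries should be read.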
### Benchmark All Networks

The following commands read pre-tuned logs from the directory `saved_logs/latest` and benchmark the latency of all networks.

Commands for AutoTVM:

```bash
python3 benchmark_autotvm.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest
```

Commands for AutoScheduler:

```bash
python3 benchmark_autoscheduler.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest
```
### Benchmark One Network

The following commands read pre-tuned logs from the directory `saved_logs/latest` and benchmark the latency of one network.
You can replace `resnet_50` below with `mobilenet_v2` or `bert`.

Commands for AutoTVM:

```bash
python3 benchmark_autotvm.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest
```

Commands for AutoScheduler:

```bash
python3 benchmark_autoscheduler.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m" --logdir saved_logs/latest
```
### Tuning

The following commands perform auto-tuning for one or all networks and save the tuning logs to the directory `tmp_logs`.
After tuning, you can benchmark with these logs by rerunning the benchmark commands above with the last argument replaced by `--logdir tmp_logs`.

Commands for AutoTVM:

```bash
# Tune one network
python3 tune_autotvm.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"

# Tune all networks
python3 tune_autotvm.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
```

Commands for AutoScheduler:

```bash
# Tune one network
python3 tune_autoscheduler.py --network resnet_50 --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"

# Tune all networks
python3 tune_autoscheduler.py --network all --target "llvm -mcpu=skylake-avx512 -model=platinum-8124m"
```
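The `--target` strings used throughout these commands follow TVM's target format: a backend kind (`llvm` for CPU, `cuda` for GPU) followed by `-key=value` attributes such as the CPU microarchitecture (`-mcpu`) and device model (`-model`). As a simplified sketch (not TVM's actual target parser), the structure can be read like this:

```python
def parse_target(target):
    """Split a TVM-style target string into its backend kind and
    -key=value attributes. Simplified illustration only; TVM's real
    parser handles quoting, nesting, and typed attributes."""
    kind, *options = target.split()
    attrs = {}
    for opt in options:
        key, _, value = opt.lstrip("-").partition("=")
        attrs[key] = value
    return kind, attrs

kind, attrs = parse_target("llvm -mcpu=skylake-avx512 -model=platinum-8124m")
```

Passing an accurate `-mcpu` matters on CPU: it tells the LLVM backend which vector instructions (here, AVX-512) it may generate.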
## Nvidia GPU

### Results on AWS g4dn.4xlarge (NVIDIA T4)
#### AutoTVM

| Network Name | Batch size | Mean Inference Time (std dev) |
|--------------|------------|-------------------------------|
| resnet_50    | 1          | 3.54 ms (0.02 ms)             |
| mobilenet_v2 | 1          | 0.74 ms (0.00 ms)             |
| bert         | 1          | 89.06 ms (1.22 ms)            |

#### AutoScheduler

| Network Name | Batch size | Mean Inference Time (std dev) |
|--------------|------------|-------------------------------|
| resnet_50    | 1          | 2.90 ms (0.01 ms)             |
| mobilenet_v2 | 1          | 0.57 ms (0.00 ms)             |
| bert         | 1          | 9.95 ms (0.01 ms)             |
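Comparing the two GPU tables, AutoScheduler's largest gain is on bert. The per-network speedups can be sanity-checked directly from the mean latencies reported above:

```python
# Mean latencies (ms) copied from the T4 result tables above
autotvm = {"resnet_50": 3.54, "mobilenet_v2": 0.74, "bert": 89.06}
autoscheduler = {"resnet_50": 2.90, "mobilenet_v2": 0.57, "bert": 9.95}

for net in autotvm:
    speedup = autotvm[net] / autoscheduler[net]
    print(f"{net}: {speedup:.2f}x")
```

On this instance AutoScheduler is roughly 1.2-1.3x faster on the vision models and close to 9x faster on bert.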
### Benchmark All Networks

The following commands read pre-tuned logs from the directory `saved_logs/latest` and benchmark the latency of all networks.

Commands for AutoTVM:

```bash
python3 benchmark_autotvm.py --network all --target "cuda -model=t4" --logdir saved_logs/latest
```

Commands for AutoScheduler:

```bash
python3 benchmark_autoscheduler.py --network all --target "cuda -model=t4" --logdir saved_logs/latest
```
### Benchmark One Network

The following commands read pre-tuned logs from the directory `saved_logs/latest` and benchmark the latency of one network.
You can replace `resnet_50` below with `mobilenet_v2` or `bert`.

Commands for AutoTVM:

```bash
python3 benchmark_autotvm.py --network resnet_50 --target "cuda -model=t4" --logdir saved_logs/latest
```

Commands for AutoScheduler:

```bash
python3 benchmark_autoscheduler.py --network resnet_50 --target "cuda -model=t4" --logdir saved_logs/latest
```
### Tuning

The following commands perform auto-tuning for one or all networks and save the tuning logs to the directory `tmp_logs`.
After tuning, you can benchmark with these logs by rerunning the benchmark commands above with the last argument replaced by `--logdir tmp_logs`.

Commands for AutoTVM:

```bash
# Tune one network
python3 tune_autotvm.py --network resnet_50 --target "cuda -model=t4"

# Tune all networks
python3 tune_autotvm.py --network all --target "cuda -model=t4"
```

Commands for AutoScheduler:

```bash
# Tune one network
python3 tune_autoscheduler.py --network resnet_50 --target "cuda -model=t4"

# Tune all networks
python3 tune_autoscheduler.py --network all --target "cuda -model=t4"
```
