HELP
Official PyTorch Implementation of HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning (NeurIPS 2021 Spotlight)
[NeurIPS 2021 Spotlight] HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning [Paper]
This is the official PyTorch implementation of HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning.
@inproceedings{lee2021help,
title = {HELP: Hardware-Adaptive Efficient Latency Prediction for NAS via Meta-Learning},
author = {Lee, Hayeon and Lee, Sewoong and Chong, Song and Hwang, Sung Ju},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2021}
}
Overview
<img align="middle" width="700" src="data/help_concept.png"> For deployment, neural architecture search should be hardware-aware, in order to satisfy device-specific constraints (e.g., memory usage, latency, and energy consumption) and enhance model efficiency. Existing methods for hardware-aware NAS collect a large number of samples (e.g., accuracy and latency) from a target device to build either a lookup table or a latency estimator. However, such an approach is impractical in real-world scenarios, as there exist numerous devices with different hardware specifications, and collecting samples from such a large number of devices would incur prohibitive computational and monetary costs. To overcome these limitations, we propose the Hardware-adaptive Efficient Latency Predictor (HELP), which formulates device-specific latency estimation as a meta-learning problem, so that we can estimate the latency of a model for a given task on an unseen device from only a few samples. To this end, we introduce novel hardware embeddings that embed any device by treating it as a black-box function that outputs latencies, and we use these embeddings to meta-learn the hardware-adaptive latency predictor in a device-dependent manner. We validate HELP's latency estimation performance on unseen platforms, where it achieves high estimation performance with as few as 10 measurement samples, outperforming all relevant baselines. We also compare end-to-end NAS frameworks with and without HELP, and show that HELP largely reduces the total time cost of the base NAS method in latency-constrained settings.
Prerequisites
- Python 3.8 (Anaconda)
- PyTorch 1.8.1
- CUDA 10.2
Hardware spec used for meta-training the proposed HELP model
- GPU: A single Nvidia GeForce RTX 2080Ti
- CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Installation
$ conda create --name help python=3.8
$ conda activate help
$ conda install pytorch==1.8.1 torchvision cudatoolkit=10.2 -c pytorch
$ pip install nas-bench-201
$ pip install tqdm
$ conda install scipy
$ conda install pyyaml
$ conda install tensorboard
Contents
1. Experiments on NAS-Bench-201 Search Space
2. Experiments on FBNet Search Space
3. Experiments on OFA Search Space
4. Experiments on HAT Search Space
1. Reproduce Main Results on NAS-Bench-201 Search Space
We provide the code to reproduce the main results on NAS-Bench-201 search space as follows:
- Computing architecture ranking correlation between latencies estimated by HELP and true measured latencies on unseen devices (Table 3).
- Latency-constrained NAS results with MetaD2A + HELP on unseen devices (Table 4).
- Meta-Training HELP model.
1.1. Data Preparation and Model Checkpoint
All required datasets and checkpoints are included in this GitHub repository.
1.2. [Meta-Test] Architecture ranking correlation
You can compute architecture ranking correlation between latencies estimated by HELP and true measured latencies on unseen devices on NAS-Bench-201 search space (Table 3):
$ python main.py --search_space nasbench201 \
--mode 'meta-test' \
--num_samples 10 \
--num_meta_train_sample 900 \
--load_path [Path of Checkpoint File] \
--meta_train_devices '1080ti_1,1080ti_32,1080ti_256,silver_4114,silver_4210r,samsung_a50,pixel3,essential_ph_1,samsung_s7' \
--meta_valid_devices 'titanx_1,titanx_32,titanx_256,gold_6240' \
--meta_test_devices 'titan_rtx_256,gold_6226,fpga,pixel2,raspi4,eyeriss'
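The three device flags split the devices into meta-training, meta-validation, and meta-test sets; the meta-test devices are the unseen devices reported in Table 3. The sketch below is illustrative only and is not part of the repository: it parses these comma-separated flag values and checks that no device appears in more than one split. `parse_devices` and `check_disjoint` are hypothetical helper names, and whether `main.py` itself enforces disjoint splits is an assumption we do not rely on.

```python
"""Hypothetical helpers (not from the HELP repo): parse the comma-separated
device flags passed to main.py and check that the splits do not overlap."""
from typing import List


def parse_devices(flag_value: str) -> List[str]:
    # Split a flag such as '1080ti_1,1080ti_32' into ['1080ti_1', '1080ti_32'],
    # stripping whitespace and dropping empty entries.
    return [d.strip() for d in flag_value.split(",") if d.strip()]


def check_disjoint(train: List[str], valid: List[str], test: List[str]) -> None:
    # Raise ValueError if any device name appears in more than one split.
    splits = {"meta_train": set(train), "meta_valid": set(valid), "meta_test": set(test)}
    names = list(splits)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            overlap = splits[a] & splits[b]
            if overlap:
                raise ValueError(f"{a} and {b} share devices: {sorted(overlap)}")


# Device lists copied from the command above.
train = parse_devices("1080ti_1,1080ti_32,1080ti_256,silver_4114,silver_4210r,"
                      "samsung_a50,pixel3,essential_ph_1,samsung_s7")
valid = parse_devices("titanx_1,titanx_32,titanx_256,gold_6240")
test = parse_devices("titan_rtx_256,gold_6226,fpga,pixel2,raspi4,eyeriss")
check_disjoint(train, valid, test)
print(len(train), len(valid), len(test))  # 9 4 6
```

Running a check like this before launching a meta-test run is a cheap way to confirm that the "unseen" devices really were held out from meta-training.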
You can use the checkpoint file provided in this repository, ./data/nasbench201/checkpoint/help_max_corr.pt, as follows:
$ python main.py --search_space nasbench201 \
--mode 'meta-test' \
--num_samples 10 \
--num_meta_train_sample 900 \
--load_path './data/nasbench201/checkpoint/help_max_corr.pt' \
--meta_train_devices '1080ti_1,1080ti_32,1080ti_256,silver_4114,silver_4210r,samsung_a50,pixel3,essential_ph_1,samsung_s7' \
--meta_valid_devices 'titanx_1,titanx_32,titanx_256,gold_6240' \
--meta_test_devices 'titan_rtx_256,gold_6226,fpga,pixel2,raspi4,eyeriss'
Alternatively, you can use the provided script:
$ bash script/run_meta_test_nasbench201.sh [GPU_NUM]
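At meta-test time, each unseen device contributes only a handful of latency measurements (`--num_samples 10` in the commands above). The Overview describes devices as black-box functions that output latencies, embedded through hardware embeddings. The following is a minimal sketch of one plausible construction under stated assumptions: query the device on a fixed set of reference architectures and min-max normalize the resulting latency vector. The `hardware_embedding` function, the `Arch` type, the reference architectures, and the normalization are all hypothetical choices for illustration, not the repository's implementation.

```python
"""Minimal sketch (assumption, not the HELP code): embed a device as the
normalized latencies it returns for a fixed set of reference architectures."""
from typing import Callable, List, Sequence

Arch = str  # hypothetical: an architecture identifier


def hardware_embedding(measure: Callable[[Arch], float],
                       reference_archs: Sequence[Arch]) -> List[float]:
    # Treat the device as a black box: one latency query per reference architecture.
    lats = [measure(a) for a in reference_archs]
    # Min-max normalize so that devices with different absolute speeds but the
    # same relative ordering map to comparable embeddings.
    lo, hi = min(lats), max(lats)
    if hi == lo:
        return [0.0 for _ in lats]  # degenerate case: all latencies identical
    return [(x - lo) / (hi - lo) for x in lats]


# Usage with two made-up devices: "slow" is uniformly 10x slower than "fast".
refs = [f"arch_{i}" for i in range(10)]
fast = {a: 1.0 + i for i, a in enumerate(refs)}
slow = {a: 10.0 * (1.0 + i) for i, a in enumerate(refs)}
print(hardware_embedding(fast.__getitem__, refs) == hardware_embedding(slow.__getitem__, refs))  # True
```

Normalization here is a design choice that makes the embedding reflect how a device ranks architectures rather than its absolute speed; a real predictor may also want absolute scale, so treat this as one option among several.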
Architecture Ranking Correlation Results (Table 3)

| Method | # of Training Samples <br> from Target Device | Desktop GPU <br> (Titan RTX Batch 256) | Desktop CPU <br> (Intel Gold 6226) | Mobile <br> Pixel2 | Raspi4 | ASIC | FPGA | Mean |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| FLOPS | - | 0.950 | 0.826 | 0.765 | 0.846 | 0.437 | 0.900 | 0.787 |
| Layer-wise Predictor | - | 0.667 | 0.866 | - | - | - | - | 0.767 |
| BRP-NAS | 900 | 0.814 | 0.796 | 0.666 | 0.847 | 0.811 | 0.801 | 0.789 |
| BRP-NAS <br> (+extra samples) | 3200 | 0.822 | 0.805 | 0.693 | 0.853 | 0.830 | 0.828 | 0.805 |
| HELP (Ours) | 10 | 0.987 | 0.989 | 0.802 | 0.890 | 0.940 | 0.985 | 0.932 |
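The entries in Table 3 are rank correlations between estimated and measured latencies over a set of architectures. Assuming the metric is Spearman's rank correlation, the sketch below computes it with the standard library only (tied values receive average ranks). `average_ranks` and `spearman` are our own illustrative names; `scipy.stats.spearmanr` from SciPy, which the installation steps above include, performs the same computation.

```python
"""Spearman's rank correlation between predicted and measured latencies,
implemented with the standard library (ties get average ranks)."""
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    # Assign 1-based ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of this tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(pred: Sequence[float], true: Sequence[float]) -> float:
    # Pearson correlation computed on the two rank vectors.
    if len(pred) != len(true) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rp, rt = average_ranks(pred), average_ranks(true)
    n = len(rp)
    mp, mt = sum(rp) / n, sum(rt) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_t = sum((b - mt) ** 2 for b in rt)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation undefined when all values are equal")
    return cov / (var_p * var_t) ** 0.5


# Made-up latencies (ms): the predictor gets the ordering exactly right.
print(spearman([1.2, 3.4, 2.0, 5.1], [1.0, 3.0, 2.5, 6.0]))  # 1.0
```

Because only ranks matter, a predictor can score 1.0 even if its absolute latency estimates are off, as long as it orders architectures correctly, which is what a NAS search needs when comparing candidates.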
1.3. [Meta-Test] Efficient Latency-constrained NAS combined with MetaD2A
You can reproduce latency-constrained NAS results with MetaD2A + HELP on unseen devices on NAS-Bench-201 search space (Table 4):
$ python main.py --search_space nasbench201 --mode 'nas' \
--load_path [Path of Checkpoint File] \
--sampled_arch_path 'data/nasbench201/arch_generated_by_metad2a.txt' \
--nas_target_device [Device] \
--latency_constraint [Latency Constraint]
For example, to use the checkpoint file provided in this repository (./data/nasbench201/checkpoint/help_max_corr.pt), with CPU Intel Gold 6226 (gold_6226) at batch size 256 as the target device and a target latency constraint of 11.0 ms, run:
$ python main.py --search_space nasbench201 --mode 'nas' \
--load_path './data/nasbench201/checkpoint/help_max_corr.pt' \
--sampled_arch_path 'data/nasbench201/arch_generated_by_metad2a.txt' \
--nas_target_device gold_6226 \
--latency_constraint 11.0
Alternatively, you can use the provided script:
$ bash script/run_nas_metad2a.sh [GPU_NUM]
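Conceptually, latency-constrained NAS with a latency predictor comes down to a filter-then-rank step over candidate architectures (here, presumably the MetaD2A-generated candidates listed in data/nasbench201/arch_generated_by_metad2a.txt). The sketch below assumes a predicted latency and an accuracy estimate are already available for each candidate; `select_architecture`, its dictionary inputs, and the example numbers are hypothetical and do not describe how `main.py` is actually implemented.

```python
"""Hypothetical sketch of latency-constrained selection: keep candidates whose
predicted latency meets the budget, then pick the most accurate one."""
from typing import Dict, Optional


def select_architecture(pred_latency_ms: Dict[str, float],
                        pred_accuracy: Dict[str, float],
                        constraint_ms: float) -> Optional[str]:
    # Candidates the latency predictor believes satisfy the constraint.
    feasible = [a for a, lat in pred_latency_ms.items() if lat <= constraint_ms]
    if not feasible:
        return None  # no candidate fits the latency budget
    # Among feasible candidates, return the one with the highest accuracy estimate.
    return max(feasible, key=lambda a: pred_accuracy[a])


# Made-up numbers for illustration only (not results from the paper).
lat = {"arch_a": 9.5, "arch_b": 10.8, "arch_c": 12.3}
acc = {"arch_a": 68.0, "arch_b": 70.1, "arch_c": 72.0}
print(select_architecture(lat, acc, 11.0))  # arch_b
```

Since feasibility is decided by predicted rather than measured latency, the measured latency of the chosen architecture can still land slightly above or below the constraint, so it is worth re-measuring the final pick on the target device.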
Efficient Latency-constrained NAS Results (Table 4)
| Device | # of Training Samples <br> from Target Device | Latency <br> Constraint (ms) | Latency <br> (ms) | Accuracy <br> (%) | Neural Architecture <br> Config |
|:---------------------------------------------------: |:------------------------------: |:----------------------------: |:------------------------: |:------------------------: |:------------------------: |
| GPU Titan RTX <br> (Batch 256) <br> titan_rtx_256 | 10 | 18.0 <br> 21.0 <br> 25.0 | 17.8 <br> 18.9 <br> 24.2 | 69.7 <br> 71.5 <br> 71.8 | link <br> link <br> link |
| CPU Intel Gold 6226 <br> gold_6226 | 10 | 8.0 <br> 11.0 <br> 14.0 | 8.0 <br> 10.7 <br> 14.3 | 67.3 <br> 70.2 <br> 72.1 | link <br> link <br> link |
| Mobile Pixel2 <br> pixel2 | 10 | 14.0 <br> 18.0 <br> 22.0 | 13.0 <br> 19.0 <br> 25.0 | 69.7 <br> 71.8 <br> 73.2 | link <br> link <br> [link](https://github.com/HayeonLee/HELP/blob/main/data/nasbench201/a