
EfficientViT

Efficient vision foundation models for high-resolution generation and perception.

Install / Use

/learn @mit-han-lab/Efficientvit

README

Efficient Vision Foundation Models for High-Resolution Generation and Perception


News

  • (🔥 New) [2025/09/05] We will no longer maintain this codebase. All future updates and announcements will be made on DC-Gen.
  • (🔥 New) [2025/01/24] We released DC-AE-SANA-1.1: doc.
  • (🔥 New) [2025/01/23] DC-AE and SANA are accepted by ICLR 2025.
  • (🔥 New) [2025/01/14] We released DC-AE+USiT models: model, training. Using the default training settings and sampling strategy, DC-AE+USiT-2B achieves 1.72 FID on ImageNet 512x512, surpassing the SOTA diffusion model EDM2-XXL and SOTA auto-regressive image generative models (MAGVIT-v2 and MAR-L).

Content

[ICLR 2025] Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models [paper] [readme] [poster]

Deep Compression Autoencoder (DC-AE) is a new family of high spatial-compression autoencoders with a spatial compression ratio of up to 128 while maintaining reconstruction quality. It accelerates all latent diffusion models regardless of the diffusion model architecture.
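To make the compression ratio concrete, here is a small sketch of how the latent grid shrinks. The helper function is ours, not part of the repo; it assumes the `fXcY` naming convention used by the released checkpoints (e.g. `dc-ae-f64c128` meaning 64x spatial downsampling with 128 latent channels):

```python
def latent_shape(image_size: int, spatial_ratio: int, latent_channels: int):
    """Latent tensor shape (C, H, W) produced by an autoencoder that
    downsamples each spatial dimension by `spatial_ratio`."""
    assert image_size % spatial_ratio == 0, "image size must be divisible by the ratio"
    side = image_size // spatial_ratio
    return (latent_channels, side, side)

# A conventional f8 autoencoder (e.g. the SD-VAE): 512x512 image -> 64x64 latent grid
print(latent_shape(512, 8, 4))     # (4, 64, 64)

# DC-AE f64c128: the same image becomes an 8x8 grid, so a diffusion
# transformer operating on the latent sees 64x fewer spatial positions
print(latent_shape(512, 64, 128))  # (128, 8, 8)
```

The token count per image drops from 4096 to 64, which is where the diffusion-model speedup comes from.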

Demo


<p align="center"> <b> Figure 1: We address the reconstruction accuracy drop of high spatial-compression autoencoders. </b> </p>


<p align="center"> <b> Figure 2: DC-AE speeds up latent diffusion models. </b> </p> <p align="center"> <img src="https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0/resolve/main/assets/dc_ae_sana.jpg" width="1200"> </p> <p align="center"> <b> Figure 3: DC-AE enables efficient text-to-image generation on the laptop: <a href="https://nvlabs.github.io/Sana/">SANA</a>. </b> </p>

[CVPR 2024 eLVM Workshop] EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss [paper] [online demo] [readme]

EfficientViT-SAM is a new family of accelerated Segment Anything models, built by replacing SAM's heavy image encoder with EfficientViT. It delivers a measured 48.9x TensorRT speedup over SAM-ViT-H on an A100 GPU without sacrificing accuracy.
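The design idea, keeping SAM's prompt encoder and mask decoder while swapping only the image encoder, can be sketched as follows. This is a conceptual illustration with made-up classes, not the repo's actual code; it assumes the SAM convention of a 256-channel 64x64 image embedding as the interface between encoder and decoder:

```python
from typing import Protocol, Tuple

class ImageEncoder(Protocol):
    """Any encoder producing a SAM-compatible image embedding."""
    def embed(self, image_hw: Tuple[int, int]) -> Tuple[int, int, int]: ...

class ViTHEncoder:
    """Stand-in for SAM's original heavy ViT-H image encoder."""
    def embed(self, image_hw):
        return (256, 64, 64)  # SAM image embedding: 256 channels on a 64x64 grid

class EfficientViTEncoder:
    """Stand-in for the lightweight EfficientViT encoder."""
    def embed(self, image_hw):
        return (256, 64, 64)  # same embedding shape, so the decoder is reused as-is

def segment(encoder: ImageEncoder, image_hw=(1024, 1024)):
    # The prompt encoder and mask decoder only ever see the embedding,
    # so any encoder producing the right shape is a drop-in replacement.
    embedding = encoder.embed(image_hw)
    assert embedding == (256, 64, 64)
    return embedding

segment(ViTHEncoder())          # original SAM path
segment(EfficientViTEncoder())  # accelerated variant, same interface
```

Because the embedding interface is unchanged, the heavy encoder can be retrained or distilled independently of the rest of the pipeline.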

<p align="left"> <img src="https://huggingface.co/mit-han-lab/efficientvit-sam/resolve/main/sam_zero_shot_coco_mAP.png" width="500"> </p>

[ICCV 2023] EfficientViT-Classification [paper] [readme]

Efficient image classification models with EfficientViT backbones.

<p align="left"> <img src="https://huggingface.co/han-cai/efficientvit-cls/resolve/main/efficientvit_cls_results.png" width="600"> </p>

[ICCV 2023] EfficientViT-Segmentation [paper] [readme]

Efficient semantic segmentation models with EfficientViT backbones.


EfficientViT-GazeSAM [readme]

Gaze-prompted image segmentation models capable of running in real time with TensorRT on an NVIDIA RTX 4070.


Getting Started

conda create -n efficientvit python=3.10
conda activate efficientvit
pip install -U -r requirements.txt

Third-Party Implementation/Integration

Contact

Han Cai

Reference

If EfficientViT, EfficientViT-SAM, or DC-AE is useful or relevant to your research, please recognize our contributions by citing our papers:

@inproceedings{cai2023efficientvit,
  title={Efficientvit: Lightweight multi-scale attention for high-resolution dense prediction},
  author={Cai, Han and Li, Junyan and Hu, Muyan and Gan, Chuang and Han, Song},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={17302--17313},
  year={2023}
}
@article{zhang2024efficientvit,
  title={EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss},
  author={Zhang, Zhuoyang and Cai, Han and Han, Song},
  journal={arXiv preprint arXiv:2402.05008},
  year={2024}
}
@article{chen2024deep,
  title={Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models},
  journal={arXiv preprint},
  year={2024}
}
