SPViT

[TPAMI 2024] This is the official repository for our paper: "Pruning Self-attentions into Convolutional Layers in Single Path".


<h1 align="center">[TPAMI 2024] Pruning Self-attentions into Convolutional Layers in Single Path</h1>

This is the official repository for our paper: Pruning Self-attentions into Convolutional Layers in Single Path by Haoyu He, Jianfei Cai, Jing Liu, Zizheng Pan, Jing Zhang, Dacheng Tao and Bohan Zhuang.

<h3><strong><i>🚀 News</i></strong></h3>
[2023-12-29]: Accepted by TPAMI!

[2023-06-09]: Update distillation configurations and pre-trained checkpoints.

[2021-12-04]: Release pre-trained models.

[2021-11-25]: Release code.

Introduction:

To reduce the heavy computational cost of ViTs and introduce convolutional inductive bias, SPViT prunes pre-trained ViT models into accurate and compact hybrid models by pruning self-attention layers into convolutional layers. The proposed weight-sharing scheme between self-attention and convolutional layers casts the search problem as finding which subset of parameters to use, which significantly reduces the search cost.
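As a rough illustration of why attention parameters can be reused as a convolution, the toy sketch below compares two paths that share the same weights. It is not the repository's implementation. It assumes 1D tokens, hard one-hot attention maps (each head attends to one fixed relative offset), and hypothetical names (`w_v`, `w_o`, `attention_path`, `conv_path`) chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 6, 4                # number of tokens, embedding dimension
offsets = [-1, 0, 1]       # one head per relative offset (kernel size 3)
heads = len(offsets)

x = rng.standard_normal((n, d))
# Hypothetical per-head value projections and output-projection slices.
w_v = rng.standard_normal((heads, d, d))
w_o = rng.standard_normal((heads, d, d))


def one_hot_attention(num_tokens, offset):
    """Attention map where token i attends only to token i + offset."""
    a = np.zeros((num_tokens, num_tokens))
    for i in range(num_tokens):
        j = i + offset
        if 0 <= j < num_tokens:   # out-of-range neighbours contribute zero
            a[i, j] = 1.0
    return a


def attention_path(tokens):
    """Multi-head attention with fixed one-hot attention maps."""
    out = np.zeros_like(tokens)
    for h, off in enumerate(offsets):
        out += one_hot_attention(len(tokens), off) @ tokens @ w_v[h] @ w_o[h]
    return out


def conv_path(tokens):
    """1D convolution whose kernel is built from the same w_v and w_o."""
    kernel = np.einsum("hij,hjk->hik", w_v, w_o)   # shape (heads, d, d)
    padded = np.pad(tokens, ((1, 1), (0, 0)))      # zero padding on the token axis
    out = np.zeros_like(tokens)
    for i in range(len(tokens)):
        for k in range(heads):
            out[i] += padded[i + k] @ kernel[k]
    return out


same = np.allclose(attention_path(x), conv_path(x))
print("paths agree:", same)
```

Under these assumptions the two paths produce the same output, so choosing between them amounts to choosing which subset of a shared parameter set to use rather than training separate candidate operations.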

Experimental results:

We provide experimental results and pre-trained models for SPViT:

| Name | Acc@1 | Acc@5 | # parameters | FLOPs | Model |
| :------------ | :---: | :---: | ------------ | ----- | ----- |
| SPViT-DeiT-Ti | 70.7 | 90.3 | 4.9M | 1.0G | Model |
| SPViT-DeiT-Ti* | 73.2 | 91.4 | 4.9M | 1.0G | Model |
| SPViT-DeiT-S | 78.3 | 94.3 | 16.4M | 3.3G | Model |
| SPViT-DeiT-S* | 80.3 | 95.1 | 16.4M | 3.3G | Model |
| SPViT-DeiT-B | 81.5 | 95.7 | 46.2M | 8.3G | Model |
| SPViT-DeiT-B* | 82.4 | 96.1 | 46.2M | 8.3G | Model |

| Name | Acc@1 | Acc@5 | # parameters | FLOPs | Model |
| :------------ | :---: | :---: | ------------ | ----- | ----- |
| SPViT-Swin-Ti | 80.1 | 94.9 | 26.3M | 3.3G | Model |
| SPViT-Swin-Ti* | 81.0 | 95.3 | 26.3M | 3.3G | Model |
| SPViT-Swin-S | 82.4 | 96.0 | 39.2M | 6.1G | Model |
| SPViT-Swin-S* | 83.0 | 96.4 | 39.2M | 6.1G | Model |

* indicates knowledge distillation.

Getting started:

In this repository, we provide code for pruning two representative ViT models.

- SPViT-DeiT, which prunes DeiT. Please see SPViT_DeiT/README.md for details.
- SPViT-Swin, which prunes Swin. Please see SPViT_Swin/README.md for details.

If you find our paper useful, please consider citing it:

@article{he2024Pruning,
  title={Pruning Self-attentions into Convolutional Layers in Single Path},
  author={He, Haoyu and Liu, Jing and Pan, Zizheng and Cai, Jianfei and Zhang, Jing and Tao, Dacheng and Zhuang, Bohan},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  publisher={IEEE}
}
