GreenMIM

[NeurIPS2022] Official implementation of the paper 'Green Hierarchical Vision Transformer for Masked Image Modeling'.


This is the official PyTorch implementation of the NeurIPS 2022 paper Green Hierarchical Vision Transformer for Masked Image Modeling. GreenMIM consists of two key designs: Group Window Attention and Sparse Convolution. It offers 2.7x faster pre-training and competitive performance on hierarchical vision transformers, e.g., Swin/Twins Transformers.

<p align="center">
<img src="figs/GroupAttention.png" >
</p>
<p align="center">
Group Attention Scheme.
</p>
<p align="center">
<img src="figs/GreenMIM.png" >
</p>
<p align="center">
Method Overview.
</p>
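
As a rough illustration of the Group Window Attention idea named above, the sketch below builds the attention mask for one group that packs the visible tokens of several local windows together, so that a token may attend only to tokens from its own window. The function name `group_attention_mask` and the toy window sizes are hypothetical; the actual implementation in this repository, including how windows are assigned to groups, may differ.

```python
from typing import List


def group_attention_mask(window_ids: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    window_ids[i] is the index of the local window that visible token i
    came from; the tokens of several windows are concatenated into one group.
    (Hypothetical helper for illustration only.)
    """
    n = len(window_ids)
    return [[window_ids[i] == window_ids[j] for j in range(n)] for i in range(n)]


# Toy group: 3 visible tokens from window 0 and 2 visible tokens from window 1.
window_ids = [0, 0, 0, 1, 1]
mask = group_attention_mask(window_ids)
for row in mask:
    # Print one row per query token: 1 = attention allowed, 0 = masked out.
    print("".join("1" if allowed else "0" for allowed in row))
```

The result is block-diagonal: one block per window. In an attention layer, the masked-out positions would typically be set to a large negative value before the softmax.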

Citation

If you find our work interesting or use our code/models, please cite:

@article{huang2022green,
  title={Green Hierarchical Vision Transformer for Masked Image Modeling},
  author={Huang, Lang and You, Shan and Zheng, Mingkai and Wang, Fei and Qian, Chen and Yamasaki, Toshihiko},
  journal={Thirty-Sixth Conference on Neural Information Processing Systems},
  year={2022}
}

News

2023.01: We have refactored the structure of this codebase, which now supports most, if not all, vision transformer backbones with various input resolutions. Check out our implementation of GreenMIM with Twins Transformer here.

Catalogs

- [x] Pre-trained checkpoints
- [x] Pre-training code for Swin Transformer and Twins Transformer
- [x] Fine-tuning code

Pre-trained Models

<table><tbody>
<tr>
<th valign="bottom"></th>
<th valign="bottom">Swin-Base (Window 7x7)</th>
<th valign="bottom">Swin-Base (Window 14x14)</th>
<th valign="bottom">Swin-Large (Window 14x14)</th>
</tr>
<tr>
<td align="left">pre-trained checkpoint</td>
<td align="center"><a href="https://drive.google.com/file/d/1vCt7QN3rNC7hmWlWYomqfhjUqN-PvR7a/view?usp=sharing">Download</a></td>
<td align="center"><a href="https://drive.google.com/file/d/1P1dAdcZtSEGWFQy5GeeJdfGTqesSAES9/view?usp=sharing">Download</a></td>
<td align="center"><a href="https://drive.google.com/file/d/1Tw1KeGviVWxbVt3h1TT7BxTX1aLeT-Nm/view?usp=sharing">Download</a></td>
</tr>
</tbody></table>

Pre-training

The pre-training scripts are in the scripts/ folder. Scripts whose names start with 'run' are for non-Slurm users; the others are for Slurm users.

For Non-Slurm Users

To pre-train a Swin-B on a single node with 8 GPUs:

PORT=23456 NPROC=8 bash scripts/run_greenmim_swin_base.sh

For Slurm Users

To pre-train a Swin-B on a single node with 8 GPUs:

bash scripts/srun_greenmim_swin_base.sh [Partition] [NUM_GPUS]

Fine-tuning on ImageNet-1K

| Model | #Params | Pre-train Resolution | Fine-tune Resolution | Config | Acc@1 (%) |
| :---- | ------- | -------------------- | -------------------- | ------ | --------- |
| Swin-B (Window 7x7) | 88M | 224x224 | 224x224 | Config | 83.8 |
| Swin-L (Window 14x14) | 197M | 224x224 | 224x224 | Config | 85.1 |

Currently, we directly use the code of SimMIM for fine-tuning; please follow their instructions to use the configs. Note that, due to limited computing resources, we use a batch size of 768 (48 x 16) for fine-tuning.
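
The batch-size note above is simple arithmetic, spelled out in the sketch below. It reads `48 x 16` as 48 images per GPU on 16 GPUs, which is an assumption: the note does not say which factor is which. The helper name `effective_batch_size` and its gradient-accumulation parameter are hypothetical and not part of this repository or of SimMIM.

```python
def effective_batch_size(per_gpu: int, num_gpus: int, accum_steps: int = 1) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_gpu * num_gpus * accum_steps


# Assumed reading of "48 x 16": 48 images per GPU on 16 GPUs.
print(effective_batch_size(48, 16))  # 768

# The same effective batch size on 8 GPUs with 2 gradient-accumulation steps.
print(effective_batch_size(48, 8, accum_steps=2))  # 768
```

If you fine-tune with a different number of GPUs, keeping this product at 768 is one way to stay close to the setting reported here.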

Acknowledgement

This code is based on the implementations of MAE, SimMIM, BEiT, SwinTransformer, Twins Transformer, and DeiT.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
