RegionProxy

[CVPR22] Official codebase of Semantic Segmentation by Early Region Proxy.


<div align="center"> <img src="./.github/perf-gflops-param.jpg" height="400"> </div> <p align="center"> <b>Figure 2.</b> Performance vs. GFLOPs on ADE20K val split. </p>

Semantic Segmentation by Early Region Proxy

Yifan Zhang, Bo Pang, Cewu Lu

CVPR 2022 (Poster) [arXiv]

Installation

Note: we recommend using the exact package versions listed below to avoid compatibility issues.

Install PyTorch 1.7.1 and torchvision 0.8.2 following the official guide.
Install timm 0.4.12 and einops:
```
pip install timm==0.4.12 einops
```
This project depends on mmsegmentation 0.17 and mmcv 1.3.13, so you may follow the mmsegmentation instructions to set up the environment and prepare the datasets.
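Because the pinned versions matter, it can help to check them before running anything. The following is a minimal sketch using Python's importlib.metadata (Python 3.8+); it is not part of the repository. The distribution names are assumptions about how the packages were installed, in particular "mmcv-full" for mmcv 1.3.13.

```python
from importlib import metadata

# Pinned versions from the installation notes above.
# Assumption: mmcv 1.3.13 is installed as the "mmcv-full" distribution.
PINNED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "timm": "0.4.12",
    "mmsegmentation": "0.17",
    "mmcv-full": "1.3.13",
}


def matches(installed: str, pinned: str) -> bool:
    """True if `installed` equals `pinned` or extends it (e.g. 0.17.0 for 0.17).

    Local build tags such as "+cu110" are ignored.
    """
    base = installed.split("+", 1)[0]
    return base == pinned or base.startswith(pinned + ".")


def find_mismatches(installed: dict) -> dict:
    """Map each pinned package that is missing or mismatched to (found, pinned)."""
    problems = {}
    for name, pinned in PINNED.items():
        found = installed.get(name)
        if found is None or not matches(found, pinned):
            problems[name] = (found, pinned)
    return problems


def installed_versions() -> dict:
    """Collect installed versions of the pinned packages, skipping missing ones."""
    versions = {}
    for name in PINNED:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return versions


if __name__ == "__main__":
    for name, (found, pinned) in find_mismatches(installed_versions()).items():
        print(f"{name}: found {found}, expected {pinned}")
```

An empty output means every pinned package was found at the expected version.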

Models

ADE20K

| Backbone | Resolution | FLOPs | #Params | mIoU | mIoU (ms+flip) | FPS | Download |
| ------------ | ---------- | ----- | -------- | ---- | -------------- | ---- | -------- |
| ViT-Ti/16 | 512x512 | 3.9G | 5.8M | 42.1 | 43.1 | 38.9 | [model] |
| ViT-S/16 | 512x512 | 15G | 22M | 47.6 | 48.5 | 32.1 | [model] |
| R26+ViT-S/32 | 512x512 | 16G | 36M | 47.8 | 49.1 | 28.5 | [model] |
| ViT-B/16 | 512x512 | 59G | 87M | 49.8 | 50.5 | 20.1 | [model] |
| R50+ViT-L/32 | 640x640 | 82G | 323M | 51.0 | 51.7 | 12.7 | [model] |
| ViT-L/16 | 640x640 | 326G | 306M | 52.9 | 53.4 | 6.6 | [model] |
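When picking a checkpoint under a compute budget, the ADE20K table can be read as a simple lookup. The sketch below copies the FLOPs and single-scale mIoU columns from that table; the helper name is hypothetical, and note that the two largest models were evaluated at 640x640 rather than 512x512, so their FLOPs are not directly comparable.

```python
# (backbone, GFLOPs, single-scale mIoU) copied from the ADE20K table above.
ADE20K_MODELS = [
    ("ViT-Ti/16", 3.9, 42.1),
    ("ViT-S/16", 15.0, 47.6),
    ("R26+ViT-S/32", 16.0, 47.8),
    ("ViT-B/16", 59.0, 49.8),
    ("R50+ViT-L/32", 82.0, 51.0),
    ("ViT-L/16", 326.0, 52.9),
]


def best_under_budget(max_gflops, models=ADE20K_MODELS):
    """Return the backbone with the highest mIoU whose FLOPs fit the budget, or None."""
    candidates = [m for m in models if m[1] <= max_gflops]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[2])[0]


print(best_under_budget(20))  # R26+ViT-S/32
```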

Cityscapes

| Backbone | Resolution | FLOPs | #Params | mIoU | mIoU (ms+flip) | Download |
| --------- | ---------- | ----- | -------- | ---- | -------------- | -------- |
| ViT-Ti/16 | 768x768 | 69G | 6M | 76.5 | 77.7 | [model] |
| ViT-S/16 | 768x768 | 270G | 23M | 79.8 | 81.5 | [model] |
| ViT-B/16 | 768x768 | 1064G | 88M | 81.0 | 82.2 | [model] |
| ViT-L/16 | 768x768 | - | 307M | 81.4 | 82.7 | [model] |

Evaluation

You may evaluate the model on a single GPU by running:

```
python test.py \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--checkpoint /path/to/ckpt \
	--eval mIoU
```
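The --eval mIoU option reports mean intersection-over-union: for each class, IoU is the number of pixels where prediction and ground truth agree on that class divided by the number of pixels where either assigns it, and mIoU averages this over classes. The pure-Python sketch below illustrates the computation; it is not mmsegmentation's implementation. The ignore label 255 and skipping classes that appear in neither prediction nor ground truth (similar to a nan-mean) are assumptions of this sketch.

```python
def mean_iou(preds, gts, num_classes, ignore_index=255):
    """Mean IoU over classes, accumulating intersection and union across all pixels.

    preds, gts: flat sequences of integer labels of equal length; predicted labels
    must lie in [0, num_classes). Pixels whose ground truth is ignore_index are skipped.
    """
    intersect = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(preds, gts):
        if g == ignore_index:
            continue
        if p == g:
            # Correct pixel: counted once in both the intersection and the union.
            intersect[g] += 1
            union[g] += 1
        else:
            # Wrong pixel: enlarges the union of both the predicted and the true class.
            union[p] += 1
            union[g] += 1
    ious = [i / u for i, u in zip(intersect, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))  # (1/2 + 2/3) / 2
```

Accumulating counts over the whole dataset before dividing, as here, gives a different number than averaging per-image IoUs.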

To evaluate on multiple GPUs, run:

```
python -m torch.distributed.launch --nproc_per_node 8 test.py \
	--launcher pytorch \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--checkpoint /path/to/ckpt \
	--eval mIoU
```

Add --aug-test to enable multi-scale + flip evaluation. The test.py script is largely copied from mmsegmentation; please refer to this link for further usage (e.g., visualization).
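Roughly, mmsegmentation-style multi-scale + flip testing runs the model on several rescaled copies of the image (and their horizontal mirrors), maps each score map back to the input resolution, averages the per-class scores, and takes the argmax. The pure-Python sketch below illustrates that idea only; it uses nearest-neighbour resizing instead of bilinear interpolation, and the model interface, helper names, and scale values are hypothetical (the real scales come from the config's test pipeline).

```python
def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grid (list of rows); cells may hold any value."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[min(i * in_h // out_h, in_h - 1)][min(j * in_w // out_w, in_w - 1)]
         for j in range(out_w)]
        for i in range(out_h)
    ]


def hflip(grid):
    """Mirror a 2-D grid left to right."""
    return [row[::-1] for row in grid]


def aug_test(image, model, scales=(1.0,), flip=True):
    """Sum per-pixel class scores over scales (and optional flips), then take argmax.

    model(image) must return an H x W grid of per-class score lists for an H x W input.
    """
    h, w = len(image), len(image[0])
    total = [[None] * w for _ in range(h)]
    for s in scales:
        scaled = resize_nearest(image, max(1, round(h * s)), max(1, round(w * s)))
        views = [(scaled, False)] + ([(hflip(scaled), True)] if flip else [])
        for view, flipped in views:
            scores = model(view)
            if flipped:
                scores = hflip(scores)  # undo the flip so pixels line up again
            scores = resize_nearest(scores, h, w)  # back to the input resolution
            for i in range(h):
                for j in range(w):
                    acc, cell = total[i][j], scores[i][j]
                    total[i][j] = list(cell) if acc is None else [a + b for a, b in zip(acc, cell)]
    # Summed scores have the same argmax as averaged scores, so no division is needed.
    return [[max(range(len(cell)), key=cell.__getitem__) for cell in row] for row in total]


def toy_model(img):
    """Hypothetical two-class model: pixel value v gives scores [1 - v, v]."""
    return [[[1.0 - v, v] for v in row] for row in img]


print(aug_test([[0.0, 1.0], [1.0, 0.0]], toy_model, scales=(1.0, 2.0)))  # [[0, 1], [1, 0]]
```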

Training

The first step is to prepare the pre-trained weights. Following Segmenter, we use AugReg pre-trained weights for our tiny, small and large models, and DeiT pre-trained weights for our base models. Follow these steps to prepare the pre-trained weights for model initialization:

For DeiT weights, simply download them from this link. For AugReg weights, first obtain the timm-style models:
```
import timm

# Download the AugReg pre-trained ViT-Ti/16 (384px) weights through timm
m = timm.create_model('vit_tiny_patch16_384', pretrained=True)
```
The full list of entries can be found here (vanilla ViTs) and here (hybrid models).
Convert the timm models to mmsegmentation style using this script.
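Conceptually, such a conversion rewrites parameter names from timm's layout to the one the segmentation backbone expects and drops the classification head. The sketch below shows that kind of key renaming on a plain dict; the helper, the rule list, and the dropped "head." prefix are hypothetical placeholders, and the actual mapping is whatever the linked script defines.

```python
def rename_keys(state_dict, rules, drop_prefixes=()):
    """Return a new dict with keys rewritten by ordered (old, new) substring rules.

    Keys starting with any prefix in drop_prefixes (e.g. a classification head)
    are removed. Values are passed through unchanged.
    """
    converted = {}
    for key, value in state_dict.items():
        if key.startswith(tuple(drop_prefixes)):
            continue
        new_key = key
        for old, new in rules:
            new_key = new_key.replace(old, new)
        converted[new_key] = value
    return converted


# Hypothetical rule, for illustration only; not the repository's real mapping.
example = {"blocks.0.attn.qkv.weight": 1, "head.weight": 2, "cls_token": 3}
print(rename_keys(example, [("blocks.", "layers.")], drop_prefixes=("head.",)))
```

With real weights, the input would be m.state_dict() from the timm model above, and the result could be written with torch.save.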

We train all models on 8 V100 GPUs. For example, to train RegProxy-Ti/16, run:

```
python -m torch.distributed.launch --nproc_per_node 8 train.py \
	--launcher pytorch \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--work-dir /path/to/workdir \
	--options model.pretrained=/path/to/pretrained/model
```

You may need to adjust data.samples_per_gpu if you plan to train on fewer GPUs. Please refer to this link for more training options.
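A common way to choose the new value is to keep the total batch size (num_gpus * samples_per_gpu) equal to the 8-GPU setting, so the learning-rate schedule in the config still applies. The sketch below assumes that heuristic; the helper name and the reference value samples_per_gpu=2 are hypothetical, so read the actual value from your config. A larger per-GPU batch may also exceed GPU memory.

```python
def samples_per_gpu_for(total_batch, num_gpus):
    """Per-GPU batch size that keeps total_batch = num_gpus * samples_per_gpu."""
    if total_batch % num_gpus:
        raise ValueError(
            f"total batch {total_batch} is not divisible by {num_gpus} GPUs")
    return total_batch // num_gpus


# Hypothetical reference: samples_per_gpu=2 on 8 GPUs, i.e. a total batch of 16.
reference_total = 8 * 2
print(samples_per_gpu_for(reference_total, 4))  # 4
```

The result could then be passed alongside model.pretrained, e.g. --options model.pretrained=/path/to/pretrained/model data.samples_per_gpu=4, assuming train.py's --options accepts several key=value pairs as in mmsegmentation.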

Citation

@article{zhang2022semantic,
  title={Semantic Segmentation by Early Region Proxy},
  author={Zhang, Yifan and Pang, Bo and Lu, Cewu},
  journal={arXiv preprint arXiv:2203.14043},
  year={2022}
}
