ETRIS

[ICCV-2023] The official code of Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

This is an official PyTorch implementation of Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation.

News

:triangular_flag_on_post: Updates

:fire::fire: Our new referring image segmentation work, DETRIS, was accepted to AAAI 2025. It improves average IoU by more than 10 while using adapters with fewer parameters!

Overall Architecture

Preparation

Environment

PyTorch (e.g. 1.8.1+cu111)

Other dependencies in requirements.txt

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt

Datasets
- Detailed instructions are in prepare_datasets.md.

Pretrained weights

Download the pretrained CLIP weights for ResNet-50/101 and ViT-B/16 into the pretrain directory:

mkdir pretrain && cd pretrain
# ResNet-50
wget https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt
# ResNet-101
wget https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt
# ViT-B
wget https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt
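Large checkpoint downloads are sometimes truncated, so it can help to check the files before training. Each URL above carries a 64-character hexadecimal string in the path segment just before the file name. The sketch below assumes that string is the file's SHA-256 digest and compares it with the digest of the downloaded file. The helper names (expected_digest, sha256_of, verify) are illustrative and not part of the ETRIS repository.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse

# URLs copied from the download commands above.
WEIGHT_URLS = [
    "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt",
    "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt",
    "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt",
]


def expected_digest(url: str) -> str:
    """Return the path segment before the file name (assumed to be a SHA-256 digest)."""
    return urlparse(url).path.split("/")[-2]


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large checkpoints are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(url: str, directory: Path) -> bool:
    """True if the file named in the URL exists in directory and matches the digest."""
    target = directory / Path(urlparse(url).path).name
    return target.is_file() and sha256_of(target) == expected_digest(url)


if __name__ == "__main__":
    # Run from the repository root, where the pretrain directory was created.
    for url in WEIGHT_URLS:
        name = Path(urlparse(url).path).name
        status = "ok" if verify(url, Path("pretrain")) else "missing or corrupted"
        print(f"{name}: {status}")
```

A file reported as "missing or corrupted" can simply be downloaded again with the matching wget command.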

Quick Start

To train ETRIS, adjust the script to your requirements and run:

bash run_scripts/train.sh

For multi-GPU training, change the gpu setting in run_scripts/train.sh. Note that the script must be executed from the repository's top-level directory (the one containing train.py).
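The working-directory requirement above can be checked before launching. The following is a minimal sketch that relies only on the two paths this README names (train.py and run_scripts/train.sh). The helpers missing_launch_files and launch_training are hypothetical and not part of the repository.

```python
import subprocess
from pathlib import Path
from typing import List

# Paths named in this README, relative to the repository root.
REQUIRED = ("train.py", "run_scripts/train.sh")


def missing_launch_files(root: Path) -> List[str]:
    """Return the required paths that do not exist under root."""
    return [rel for rel in REQUIRED if not (root / rel).is_file()]


def launch_training(root: Path) -> None:
    """Run the training script from root, refusing to start if root looks wrong."""
    missing = missing_launch_files(root)
    if missing:
        raise FileNotFoundError(
            f"{root} does not look like the repository root; missing: {', '.join(missing)}"
        )
    # Same as typing `bash run_scripts/train.sh` inside the repository root.
    subprocess.run(["bash", "run_scripts/train.sh"], cwd=root, check=True)
```

Calling launch_training(Path.cwd()) from the wrong directory then fails immediately with a clear message instead of partway through the shell script.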

To evaluate ETRIS, adjust the script to your requirements and run:

bash run_scripts/test.sh

To visualize the results, set visualize to True in the config file.

Weights

Our model weights are open-sourced and can be downloaded directly from Hugging Face.

Acknowledgements

The code is based on CRIS. We thank the authors for open-sourcing their code and encourage users to cite their work when applicable.

Citation

If ETRIS is useful for your research, please consider citing:

@inproceedings{xu2023bridging,
  title={Bridging vision and language encoders: Parameter-efficient tuning for referring image segmentation},
  author={Xu, Zunnan and Chen, Zhihong and Zhang, Yong and Song, Yibing and Wan, Xiang and Li, Guanbin},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={17503--17512},
  year={2023}
}
