# ETRIS

[ICCV 2023] The official code of *Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation*.
This is an official PyTorch implementation of Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation.
<div align="center" width="300px" height="400px"> <img src="img/demo.gif" alt="teaser" height="280px" /> </div>

## :triangular_flag_on_post: Updates
- :fire::fire: Our new referring image segmentation work DETRIS was accepted to AAAI 2025; it achieves an average IoU improvement of over 10 points using adapters with fewer parameters!
## Overall Architecture

<img src="img/arch.png">

## Preparation
- Environment
  - PyTorch (e.g. 1.8.1+cu111)
  - Other dependencies in `requirements.txt`

  ```shell
  pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
  pip install -r requirements.txt
  ```
- Datasets
  - Detailed instructions are in `prepare_datasets.md`.
- Pretrained weights
  - Download the pretrained weights of ResNet-50/101 and ViT-B to `pretrain`:

  ```shell
  mkdir pretrain && cd pretrain
  # ResNet-50
  wget https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt
  # ResNet-101
  wget https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt
  # ViT-B
  wget https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt
  ```
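After downloading, it can be worth confirming that all three checkpoints landed where the training configs expect them. The helper below is a minimal sketch, not part of the repository; the filenames follow the `wget` commands above.

```shell
# Hypothetical helper (not part of the repo): report any of the three CLIP
# checkpoints that are missing from the given directory (default: pretrain/).
check_ckpts() {
    dir="${1:-pretrain}"; missing=0
    for ckpt in RN50.pt RN101.pt ViT-B-16.pt; do
        if [ ! -f "$dir/$ckpt" ]; then
            echo "missing: $dir/$ckpt"
            missing=1
        fi
    done
    return "$missing"
}

check_ckpts pretrain || echo "some checkpoints are missing; re-run the wget commands above"
```

The function returns non-zero if any checkpoint is absent, so it can also be used as a guard in your own launch scripts.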
## Quick Start
To train ETRIS, modify the script to your requirements and run:

```shell
bash run_scripts/train.sh
```

For multi-GPU training, simply change the GPU setting in `run_scripts/train.sh`. Note that the script must be executed from the top-level directory (the one containing `train.py`).
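As a generic sketch of the GPU selection step (the variable actually edited inside `run_scripts/train.sh` may be named differently), restricting a run to specific GPUs is usually done through `CUDA_VISIBLE_DEVICES`:

```shell
# Hypothetical sketch: expose only GPUs 0 and 1 to the training process.
# The authoritative GPU setting lives inside run_scripts/train.sh.
export CUDA_VISIBLE_DEVICES=0,1
echo "visible GPUs: $CUDA_VISIBLE_DEVICES"
```

With the variable exported, launching `bash run_scripts/train.sh` from the repo root will make only the listed GPUs visible to PyTorch.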
To evaluate ETRIS, modify the script to your requirements and run:

```shell
bash run_scripts/test.sh
```

To visualize the results, set `visualize` to `True` in the config file.
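For instance, the relevant entry in your experiment config might look like the excerpt below. Only the `visualize` key comes from the instructions above; the comment and surrounding file layout are illustrative.

```yaml
# excerpt from an experiment config (path and neighboring keys are illustrative)
visualize: True   # dump predicted segmentation masks during evaluation
```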
## Weights

Our model weights are open-sourced and can be downloaded directly from Hugging Face.
## Acknowledgements
The code is based on CRIS. We thank the authors for their open-sourced code and encourage users to cite their works when applicable.
## Citation
If ETRIS is useful for your research, please consider citing:
```bibtex
@inproceedings{xu2023bridging,
  title={Bridging vision and language encoders: Parameter-efficient tuning for referring image segmentation},
  author={Xu, Zunnan and Chen, Zhihong and Zhang, Yong and Song, Yibing and Wan, Xiang and Li, Guanbin},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={17503--17512},
  year={2023}
}
```
