SCS
"Visual Prompt Selection for In-Context Learning Segmentation Framework"
Visual Prompt Selection for In-Context Learning Segmentation 
This repository contains the implementation of the paper. For more information about this work, see the Project Page.
<div align="center"> <img src="assets/model.png"> </div>
Abstract
As a fundamental and extensively studied task in computer vision, image segmentation aims to locate and identify different semantic concepts at the pixel level. Recently, inspired by In-Context Learning (ICL), several generalist segmentation frameworks have been proposed, providing a promising paradigm for segmenting specific objects. However, existing works mostly ignore the value of visual prompts or simply apply similarity sorting to select contextual examples. In this paper, we focus on rethinking and improving the example selection strategy. By comprehensive comparisons, we first demonstrate that ICL-based segmentation models are sensitive to different contexts. Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation. Based on the above insights, we propose a new stepwise context search method. Different from previous works, we construct a small yet rich candidate pool and adaptively search the well-matched contexts. More importantly, this method effectively reduces the annotation cost by compacting the search space. Extensive experiments show that our method is an effective strategy for selecting examples and enhancing segmentation performance.
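The idea of a small yet diverse candidate pool can be illustrated with a minimal sketch. This is not the paper's stepwise context search. It only shows one simple way to get diversity: cluster image features and keep a few examples from each cluster. The function names, the plain k-means routine, and the random per-cluster sampling are all assumptions made for illustration. The parameter names `cluster_num` and `sample_num` mirror the command-line flags used later in this README, but whether the scripts use them in this way is also an assumption.

```python
import numpy as np


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows (fancy indexing makes a copy).
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every row to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels


def build_candidate_pool(features, cluster_num, sample_num, seed=0):
    """Hypothetical helper: pick up to sample_num indices per cluster."""
    features = np.asarray(features, dtype=float)
    labels = kmeans(features, cluster_num, seed=seed)
    rng = np.random.default_rng(seed)
    pool = []
    for c in range(cluster_num):
        members = np.flatnonzero(labels == c)
        take = min(sample_num, len(members))
        if take == 0:
            continue
        pool.extend(rng.choice(members, size=take, replace=False).tolist())
    return sorted(pool)


if __name__ == "__main__":
    # Random vectors stand in for real image features.
    feats = np.random.default_rng(1).normal(size=(200, 16))
    pool = build_candidate_pool(feats, cluster_num=5, sample_num=3)
    print(len(pool), "candidates selected out of", len(feats))
```

The pool holds at most `cluster_num * sample_num` examples, so only those examples would need annotated masks, which is the sense in which a compact search space reduces annotation cost.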
Dataset preparation
Our evaluation pipeline is based on the Volumetric Aggregation Transformer. Please follow the dataset preparation steps for the PASCAL-5i dataset in that repository.
Prerequisites
Install PyTorch, setting cudatoolkit to match your CUDA version, or choose an installation using these instructions.
conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.3 -c pytorch -c conda-forge
Other dependencies are provided in requirements.txt, install them by:
pip install -r requirements.txt
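After installing, it can help to confirm that the environment matches the pinned versions. The snippet below is a small sketch using only the standard library (`importlib.metadata`, which assumes Python 3.8 or newer). The helper name `check_versions` is hypothetical, and the distribution name `torch` is assumed to be what the conda `pytorch` package registers in the environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the conda command above.
EXPECTED = {"torch": "1.9.0", "torchvision": "0.10.0", "torchaudio": "0.9.0"}


def check_versions(expected):
    """Return {package: (installed_or_None, matches_pin)} for each pin."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Local build tags (a "+..." suffix) are ignored in the comparison.
        matches = installed is not None and installed.split("+")[0] == pinned
        report[name] = (installed, matches)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        print(f"{name}: installed={installed} pinned={EXPECTED[name]} ok={ok}")
```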
Training
The following command trains the visual prompt selection model. Note that the visual backbone is initialized with the weights of "vit_large_patch14_clip_224.laion2b_ft_in12k_in1k":
python train.py \
--model seggpt_vit_large_patch16_input896x448 \
--engine_ckpt_path <ICL_model_ckpt_path> \
--cluster_num <cluster_num> \
--sample_num <sample_num> \
--fold <data_fold> \
--BENCHMARK <dataset_name> \
--output_root <outputs_dir>
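PASCAL-5i is split into four folds, so training is usually repeated once per fold. The sketch below assembles the command above for each fold using only the flags shown in this README. The helper `build_train_command` is hypothetical, and every value passed to it (the checkpoint path, `cluster_num=8`, `sample_num=4`, the benchmark name `pascal`, and fold indices 0 to 3) is a placeholder assumption, not a recommended setting.

```python
import shlex


def build_train_command(engine_ckpt, cluster_num, sample_num, fold,
                        benchmark, output_root):
    """Assemble the train.py argument list from the flags documented above."""
    return [
        "python", "train.py",
        "--model", "seggpt_vit_large_patch16_input896x448",
        "--engine_ckpt_path", engine_ckpt,
        "--cluster_num", str(cluster_num),
        "--sample_num", str(sample_num),
        "--fold", str(fold),
        "--BENCHMARK", benchmark,
        "--output_root", output_root,
    ]


if __name__ == "__main__":
    for fold in range(4):  # assumed fold indices 0..3
        cmd = build_train_command(
            engine_ckpt="path/to/icl_model_ckpt.pth",  # placeholder path
            cluster_num=8,                             # placeholder value
            sample_num=4,                              # placeholder value
            fold=fold,
            benchmark="pascal",                        # assumed dataset name
            output_root=f"outputs/fold{fold}",         # placeholder directory
        )
        # Print instead of executing so the commands can be reviewed first.
        print(shlex.join(cmd))
```

To launch the runs directly rather than print them, each list can be passed to `subprocess.run(cmd, check=True)`.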
Inference
python test.py \
--model seggpt_vit_large_patch16_input896x448 \
--engine_ckpt_path <ICL_model_ckpt_path> \
--cluster_num <cluster_num> \
--shot <prompt_num> \
--fold <data_fold> \
--BENCHMARK <dataset_name> \
--output_root <outputs_dir> \
--ckpt_projector_path <project_ckpt_relative_path> \
--ckpt_predictor_path <predictor_ckpt_relative_path>
Citations
If you find our work useful, please consider citing our paper:
@inproceedings{Suo2024VisualPS,
title={Visual Prompt Selection for In-Context Learning Segmentation},
author={Wei Suo and Lanqing Lai and Mengyang Sun and Hanwang Zhang and Peng Wang and Yanning Zhang},
year={2024},
url={https://api.semanticscholar.org/CorpusID:271213205}
}
Acknowledgement
We acknowledge the following projects: visual_prompting, SegGPT, SAM, Matcher, and Personalize-SAM.
License
This project is licensed under the terms of the Apache 2.0 open source license. Please refer to LICENSE for the full terms.
