# InMeMo
[WACV 2024] Instruct Me More! Random Prompting for Visual In-Context Learning (InMeMo)

## News
- [2025/04/29] Please check the new version of E-InMeMo!
## Environment Setup

```shell
conda create -n inmemo python=3.8 -y
conda activate inmemo
```

PyTorch must be >= 1.8.0 and compatible with the CUDA version supported by your GPU. For an NVIDIA GeForce RTX 4090, the installation command is:

```shell
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt
```
## Preparation

### Dataset

Download the Pascal-5<sup>i</sup> dataset from Volumetric-Aggregation-Transformer, place it under the InMeMo/ path, and rename it to pascal-5i.
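Pascal-5<sup>i</sup> splits Pascal VOC's 20 classes into 4 folds of 5 classes each, and the `--fold` flag used below selects one of those splits. A minimal sketch of the conventional fold-to-class mapping (the contiguous 5-class grouping is the standard Pascal-5<sup>i</sup> convention, not code from this repo):

```python
def fold_classes(fold, n_classes=20, per_fold=5):
    """Return the VOC class indices belonging to one Pascal-5i fold."""
    assert 0 <= fold < n_classes // per_fold, "fold must be 0..3 for Pascal-5i"
    start = fold * per_fold
    return list(range(start, start + per_fold))

# Fold 3 covers the last five of the 20 Pascal VOC classes.
print(fold_classes(3))  # -> [15, 16, 17, 18, 19]
```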
### Pre-trained Weights for the Large-scale Vision Model

Please follow Visual Prompting to prepare the model, and download the CVF checkpoint pre-trained for 1000 epochs.

### Prompt Retriever

- Foreground Segmentation Prompt Retriever
- Single Object Detection Prompt Retriever
## Training

For foreground segmentation:

```shell
# Change the fold for training each split.
python train_vp_segmentation.py --mode spimg_spmask --output_dir output_samples --fold 3 --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
```
For single object detection:

```shell
python train_vp_detection.py --mode spimg_spmask --output_dir output_samples --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
```
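The `--vp-model pad` option refers to a padding-style learnable prompt in the spirit of Visual Prompting: a trainable perturbation is applied only on a border of fixed width around the image, leaving the interior untouched. A minimal NumPy sketch of that idea (the pad width, the additive application, and the function name are illustrative assumptions, not this repo's implementation):

```python
import numpy as np

def apply_pad_prompt(image, prompt, pad=30):
    """Add a learnable prompt only on a border of width `pad` of an (H, W, C) image."""
    h, w, _ = image.shape
    mask = np.zeros((h, w, 1), dtype=image.dtype)
    mask[:pad, :] = 1    # top border
    mask[-pad:, :] = 1   # bottom border
    mask[:, :pad] = 1    # left border
    mask[:, -pad:] = 1   # right border
    return image + mask * prompt

img = np.zeros((224, 224, 3), dtype=np.float32)
prm = np.ones((224, 224, 3), dtype=np.float32)   # stands in for the trained prompt
out = apply_pad_prompt(img, prm, pad=30)
# The border receives the prompt; the center pixel is unchanged.
```

During training only the prompt tensor would be optimized, with the large vision model kept frozen.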
## Inference

### For foreground segmentation

With prompt enhancer:

```shell
# Change the fold for testing each split.
python val_vp_segmentation.py --mode spimg_spmask --batch-size 16 --fold 3 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH
```

Without prompt enhancer:

```shell
python val_vp_segmentation.py --mode no_vp --batch-size 16 --fold 3 --arr a1 --output_dir visual_examples
```
### For single object detection

With prompt enhancer:

```shell
python val_vp_detection.py --mode spimg_spmask --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH
```

Without prompt enhancer:

```shell
python val_vp_detection.py --mode no_vp --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples
```
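Foreground segmentation quality in this setting is typically reported as mean IoU of the foreground class. A minimal sketch of a per-image foreground IoU helper on binary masks (illustrative only, not the repo's evaluation code; the empty-mask convention here is an assumption):

```python
import numpy as np

def foreground_iou(pred, gt):
    """Intersection-over-union of the foreground class for two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0  # two empty masks count as a perfect match

pred = np.array([[1, 1], [0, 0]])
gt   = np.array([[1, 0], [0, 0]])
print(foreground_iou(pred, gt))  # -> 0.5
```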
## Performance

## Visual Examples

## Citation
If you find this work useful, please consider citing us as:
```bibtex
@inproceedings{zhang2024instruct,
  title={Instruct Me More! Random Prompting for Visual In-Context Learning},
  author={Zhang, Jiahao and Wang, Bowen and Li, Liangzhi and Nakashima, Yuta and Nagahara, Hajime},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2597--2606},
  year={2024}
}
```
## Acknowledgments

Part of the code is borrowed from Visual Prompting, visual_prompt_retrieval, timm, and ILM-VP.