# InMeMo
[WACV 2024] Instruct Me More! Random Prompting for Visual In-Context Learning (InMeMo)

## News
- [2025/04/29] Please check the new version of E-InMeMo!
## Environment Setup

```shell
conda create -n inmemo python=3.8 -y
conda activate inmemo
```

PyTorch must be >= 1.8.0 and compatible with the CUDA version supported by your GPU. For an NVIDIA GeForce RTX 4090, the installation command is:

```shell
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt
```
## Preparation

### Dataset

Download the Pascal-5<sup>i</sup> dataset from Volumetric-Aggregation-Transformer, place it under the InMeMo/ path, and rename it to pascal-5i.
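Pascal-5<sup>i</sup> splits Pascal VOC's 20 classes into 4 folds of 5 classes each, and the `--fold` flag used below selects one of those splits. A minimal sketch of the conventional fold-to-class mapping (the contiguous 5-class grouping is the standard Pascal-5<sup>i</sup> convention, not code from this repo):

```python
def fold_classes(fold, n_classes=20, per_fold=5):
    """Return the VOC class indices belonging to one Pascal-5i fold."""
    assert 0 <= fold < n_classes // per_fold, "fold must be 0..3 for Pascal-5i"
    start = fold * per_fold
    return list(range(start, start + per_fold))

# Fold 3 covers the last five of the 20 Pascal VOC classes.
print(fold_classes(3))  # -> [15, 16, 17, 18, 19]
```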
### Pre-trained Weights for the Large-scale Vision Model

Please follow Visual Prompting to prepare the model, and download the CVF checkpoint pre-trained for 1000 epochs.

### Prompt Retriever

- Foreground Segmentation Prompt Retriever
- Single Object Detection Prompt Retriever
## Training

For foreground segmentation:

```shell
# Change the fold for training each split.
python train_vp_segmentation.py --mode spimg_spmask --output_dir output_samples --fold 3 --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
```
For single object detection:

```shell
python train_vp_detection.py --mode spimg_spmask --output_dir output_samples --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
```
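The `--vp-model pad` option refers to a padding-style learnable prompt in the spirit of Visual Prompting: a trainable perturbation is applied only on a border of fixed width around the image, leaving the interior untouched. A minimal NumPy sketch of that idea (the pad width, the additive application, and the function name are illustrative assumptions, not this repo's implementation):

```python
import numpy as np

def apply_pad_prompt(image, prompt, pad=30):
    """Add a learnable prompt only on a border of width `pad` of an (H, W, C) image."""
    h, w, _ = image.shape
    mask = np.zeros((h, w, 1), dtype=image.dtype)
    mask[:pad, :] = 1    # top border
    mask[-pad:, :] = 1   # bottom border
    mask[:, :pad] = 1    # left border
    mask[:, -pad:] = 1   # right border
    return image + mask * prompt

img = np.zeros((224, 224, 3), dtype=np.float32)
prm = np.ones((224, 224, 3), dtype=np.float32)   # stands in for the trained prompt
out = apply_pad_prompt(img, prm, pad=30)
# The border receives the prompt; the center pixel is unchanged.
```

During training only the prompt tensor would be optimized, with the large vision model kept frozen.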
## Inference

### For foreground segmentation

With prompt enhancer:

```shell
# Change the fold for testing each split.
python val_vp_segmentation.py --mode spimg_spmask --batch-size 16 --fold 3 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH
```

Without prompt enhancer:

```shell
python val_vp_segmentation.py --mode no_vp --batch-size 16 --fold 3 --arr a1 --output_dir visual_examples
```
### For single object detection

With prompt enhancer:

```shell
python val_vp_detection.py --mode spimg_spmask --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH
```

Without prompt enhancer:

```shell
python val_vp_detection.py --mode no_vp --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples
```
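Foreground segmentation quality in this setting is typically reported as mean IoU of the foreground class. A minimal sketch of a per-image foreground IoU helper on binary masks (illustrative only, not the repo's evaluation code; the empty-mask convention here is an assumption):

```python
import numpy as np

def foreground_iou(pred, gt):
    """Intersection-over-union of the foreground class for two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0  # two empty masks count as a perfect match

pred = np.array([[1, 1], [0, 0]])
gt   = np.array([[1, 0], [0, 0]])
print(foreground_iou(pred, gt))  # -> 0.5
```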
## Performance

## Visual Examples

## Citation
If you find this work useful, please consider citing us as:
```bibtex
@inproceedings{zhang2024instruct,
  title={Instruct Me More! Random Prompting for Visual In-Context Learning},
  author={Zhang, Jiahao and Wang, Bowen and Li, Liangzhi and Nakashima, Yuta and Nagahara, Hajime},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2597--2606},
  year={2024}
}
```
## Acknowledgments

Part of the code is borrowed from Visual Prompting, visual_prompt_retrieval, timm, and ILM-VP.