RefLDMSeg
[AAAI 2025] Explore In-Context Segmentation via Latent Diffusion Models
<div align="center">
<p align="center" style="font-size: larger;"><strong>AAAI 2025</strong></p>
<p align="center"><img src="assets/teaser.png" width="95%"></p>
</div>

Requirements
- Install `torch==2.1.0`.
- Install the pip packages via `pip install -r requirements.txt`, and install `alpha_clip`.
- Our model is based on Stable Diffusion; download it and put it into `datasets/pretrain`. Put the alpha_clip checkpoints into `datasets/pretrain/alpha-clip`.
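The checkpoint layout above can be prepared with a couple of commands. This is a minimal sketch, assuming the repo root as the working directory; the actual model downloads (Stable Diffusion and alpha_clip) still have to be placed into these folders by hand.

```shell
# Create the checkpoint directory skeleton expected by the configs.
# Assumption: run from the repository root.
mkdir -p datasets/pretrain/alpha-clip
# Put the Stable Diffusion weights under datasets/pretrain and the
# alpha_clip checkpoints under datasets/pretrain/alpha-clip.
ls datasets/pretrain
```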
Data Preparation
Please download the following datasets: COCO 2014, DAVIS16, VSPW, and PASCAL (PASCAL VOC 2012 plus SBD). Then download the meta files. Put everything under `datasets` and arrange it as follows.
```
datasets
├── pascal
│   ├── JPEGImages
│   ├── SegmentationClassAug
│   └── metas
├── davis16
│   ├── JPEGImages
│   ├── Annotations
│   └── metas
├── vspw
│   ├── images
│   ├── masks
│   └── metas
└── coco20i
    ├── annotations
    │   ├── train2014
    │   └── val2014
    ├── metas
    ├── train2014
    └── val2014
```
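Before training, it can help to sanity-check that the tree above is in place. A minimal sketch, assuming the script is run from the repo root (the `missing_dirs` helper and `EXPECTED` list are illustrative, not part of the codebase):

```python
import os

# Expected dataset layout from the README (relative to the datasets root).
EXPECTED = [
    "pascal/JPEGImages", "pascal/SegmentationClassAug", "pascal/metas",
    "davis16/JPEGImages", "davis16/Annotations", "davis16/metas",
    "vspw/images", "vspw/masks", "vspw/metas",
    "coco20i/annotations/train2014", "coco20i/annotations/val2014",
    "coco20i/metas", "coco20i/train2014", "coco20i/val2014",
]

def missing_dirs(root="datasets"):
    """Return the expected sub-directories that are absent under root."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]

if __name__ == "__main__":
    gaps = missing_dirs()
    print("all datasets present" if not gaps else f"missing: {gaps}")
```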
Train
The scripts in `scripts/` are launched with `accelerate`. The save path is set by `--output_dir`, defined in the args.
```shell
# ldis1
accelerate launch --multi_gpu --num_processes [GPUS] scripts/modelf.py --config configs/cfg.py
# ldisn
accelerate launch --multi_gpu --num_processes [GPUS] scripts/modeln.py --config configs/cfg.py --mask_alpha 0.4
```
Inference
```shell
# ldis1
accelerate launch --multi_gpu --num_processes [GPUS] scripts/modelf.py --config configs/cfg.py --only_val 1 --val_dataset pascal --output_dir [the path of ckpt]
# ldisn
accelerate launch --multi_gpu --num_processes [GPUS] scripts/modeln.py --config configs/cfg.py --only_val 1 --val_dataset pascal --output_dir [the path of ckpt] --mask_alpha 0.4
```
The pretrained models can be found here.
Citation
If you find our work useful, please consider citing our paper:
```bibtex
@article{wang2024explore,
  title={Explore In-Context Segmentation via Latent Diffusion Models},
  author={Wang, Chaoyang and Li, Xiangtai and Ding, Henghui and Qi, Lu and Zhang, Jiangning and Tong, Yunhai and Loy, Chen Change and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2403.09616},
  year={2024}
}
```
License
MIT license