CLOUDS
[CVPR 2024] Official Implementation of Collaborating Foundation models for Domain Generalized Semantic Segmentation
Collaborating Foundation models for Domain Generalized Semantic Segmentation
This repository contains the code for the paper: Collaborating Foundation models for Domain Generalized Semantic Segmentation.
Overview
Domain Generalized Semantic Segmentation (DGSS) deals with training a model on a labeled source domain with the aim of generalizing to unseen domains during inference. Existing DGSS methods typically obtain robust features by means of Domain Randomization (DR). Such an approach is often limited, as it can only account for style diversification and not content. In this work, we take an orthogonal approach to DGSS and propose an assembly of CoLlaborative FOUndation models for Domain Generalized Semantic Segmentation (CLOUDS). In detail, CLOUDS is a framework that integrates foundation models of various kinds: (i) a CLIP backbone for its robust feature representation, (ii) text-to-image generative models to diversify the content, thereby covering various modes of the possible target distribution, and (iii) the Segment Anything Model (SAM) for iteratively refining the predictions of the segmentation model. Extensive experiments show that CLOUDS excels in adapting from synthetic to real DGSS benchmarks and under varying weather conditions, notably outperforming prior methods by 5.6% and 6.7% averaged mIoU, respectively.
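The three-way collaboration described above can be sketched as a toy training step. This is purely illustrative and not the repository's API: `generate_target_like`, `segment`, and `refine_with_sam` are hypothetical stand-ins for the text-to-image model, the CLIP-backbone segmenter, and SAM, respectively.

```python
# Illustrative sketch of the CLOUDS collaboration loop (not the actual API).
# All function names below are hypothetical stand-ins for the components
# described in the overview.

def generate_target_like(prompt):
    """Stand-in for the text-to-image model diversifying content."""
    return {"image": f"gen({prompt})", "label": None}

def segment(image):
    """Stand-in for the CLIP-backbone segmentation model."""
    return {"pred": f"mask({image})"}

def refine_with_sam(image, pred):
    """Stand-in for SAM refining coarse predictions into pseudo-labels."""
    return f"refined({pred['pred']})"

def training_step(source_batch, prompts):
    # 1) Supervised term: predictions vs. ground truth on the labeled source.
    sup = [(segment(x["image"]), x["label"]) for x in source_batch]
    # 2) Self-training term: generate target-like images from text prompts,
    #    predict, then refine the prediction with SAM to get a pseudo-label.
    pseudo = []
    for p in prompts:
        gen = generate_target_like(p)
        pred = segment(gen["image"])
        pseudo.append((pred, refine_with_sam(gen["image"], pred)))
    return sup, pseudo

sup, pseudo = training_step([{"image": "src0", "label": "y0"}], ["a rainy street"])
```

In the actual model, the supervised pairs feed a standard segmentation loss, while the SAM-refined masks serve as pseudo-labels for the generated images.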
<img src="imgs/main_figure.png" width="1000">

Installation
See installation instructions.
Getting Started
See Preparing Datasets for CLOUDS.
See Getting Started with CLOUDS.
Relevant Files
train_net.py: the training script of CLOUDS
clouds/clouds.py: defines the model class and its forward function, the core of the model's architecture and forward-pass logic
generate_txt_im.py: the script to generate a dataset using Stable Diffusion
prompt_llama70b.txt: the text file containing 100 prompts generated with Llama70b-Chat
Checkpoints & Generated dataset
We provide the following checkpoints for CLOUDS:
Citation
If you find our work useful in your research, please consider citing:
@InProceedings{Benigmim_2024_CVPR,
author = {Benigmim, Yasser and Roy, Subhankar and Essid, Slim and Kalogeiton, Vicky and Lathuili\`ere, St\'ephane},
title = {Collaborating Foundation Models for Domain Generalized Semantic Segmentation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {3108-3119}
}
Acknowledgements
CLOUDS builds on the following open-source projects, and we would like to thank their authors for making their source code available:
