DAFormer
[CVPR22] Official Implementation of DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
by Lukas Hoyer, Dengxin Dai, and Luc Van Gool
[CVPR22 Paper] [Extension Paper]
:bell: News:
- [2024-07-03] We are happy to announce that our work SemiVL on semi-supervised semantic segmentation with vision-language guidance was accepted at ECCV24.
- [2024-07-03] We are happy to announce that our follow-up work DGInStyle on image diffusion for domain-generalizable semantic segmentation was accepted at ECCV24.
- [2023-09-26] We are happy to announce that our Extension Paper on domain generalization and clear-to-adverse-weather UDA was accepted at PAMI.
- [2023-08-25] We are happy to announce that our follow-up work EDAPS on panoptic segmentation UDA was accepted at ICCV23.
- [2023-04-23] We further extend DAFormer to domain generalization and clear-to-adverse-weather UDA in the Extension Paper.
- [2023-02-28] We are happy to announce that our follow-up work MIC on context-enhanced UDA was accepted at CVPR23.
- [2022-07-06] We are happy to announce that our follow-up work HRDA on high-resolution UDA was accepted at ECCV22.
- [2022-03-09] We are happy to announce that DAFormer was accepted at CVPR22.
Overview
As acquiring pixel-wise annotations of real-world images for semantic segmentation is a costly process, a model can instead be trained with more accessible synthetic data and adapted to real images without requiring their annotations. This process is studied in Unsupervised Domain Adaptation (UDA).
Even though a large number of methods propose new UDA strategies, they are mostly based on outdated network architectures. In this work, we particularly study the influence of the network architecture on UDA performance and propose DAFormer, a network architecture tailored for UDA. It consists of a Transformer encoder and a multi-level context-aware feature fusion decoder.
DAFormer is enabled by three simple but crucial training strategies to stabilize the training and to avoid overfitting the source domain: While the Rare Class Sampling on the source domain improves the quality of pseudo-labels by mitigating the confirmation bias of self-training towards common classes, the Thing-Class ImageNet Feature Distance and a Learning Rate Warmup promote feature transfer from ImageNet pretraining.
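As a rough illustration of two of these strategies, the following minimal sketch shows how Rare Class Sampling can turn per-class pixel frequencies into sampling probabilities with a temperature-scaled softmax, and how a linear learning rate warmup scales the learning rate. The function names, the class frequencies, the temperature, and the learning rate values are assumptions chosen for illustration; they are not taken from the repository code or its configs.

```python
import math


def rare_class_sampling_probs(class_freq, temperature):
    """Softmax over (1 - f_c) / T, so rarer classes get a higher probability."""
    logits = [(1.0 - f) / temperature for f in class_freq.values()]
    max_logit = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - max_logit) for logit in logits]
    total = sum(weights)
    return {name: w / total for name, w in zip(class_freq, weights)}


def warmup_lr(base_lr, iteration, warmup_iters):
    """Linear warmup: scale the learning rate from 0 up to base_lr."""
    if iteration >= warmup_iters:
        return base_lr
    return base_lr * iteration / warmup_iters


if __name__ == "__main__":
    # Hypothetical pixel frequencies of four classes in the source dataset.
    freq = {"road": 0.60, "building": 0.30, "bus": 0.07, "train": 0.03}
    for name, prob in rare_class_sampling_probs(freq, temperature=0.1).items():
        print(f"{name}: {prob:.3f}")
    # Hypothetical schedule: halfway through a 1500-iteration warmup.
    print(warmup_lr(6e-5, iteration=750, warmup_iters=1500))
```

A smaller temperature concentrates the sampling on the rarest classes, while a larger one approaches uniform sampling. The sketch only covers the warmup phase; any decay schedule applied afterwards is omitted.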
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA→Cityscapes and by 5.4 mIoU for Synthia→Cityscapes and enables learning even difficult classes such as train, bus, and truck well.

The strengths of DAFormer, compared to the previous state-of-the-art UDA method ProDA, can also be observed in qualitative examples from the Cityscapes validation set.

DAFormer can be further extended to domain generalization, lifting the requirement of access to target images. In domain generalization, too, DAFormer significantly improves the state-of-the-art performance, by 6.5 mIoU.
For more information on DAFormer, please check our [CVPR Paper] and the [Extension Paper].
If you find this project useful in your research, please consider citing:
@InProceedings{hoyer2022daformer,
title={{DAFormer}: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation},
author={Hoyer, Lukas and Dai, Dengxin and Van Gool, Luc},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={9924--9935},
year={2022}
}
@Article{hoyer2024domain,
title={Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation},
author={Hoyer, Lukas and Dai, Dengxin and Van Gool, Luc},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)},
year={2024},
volume={46},
number={1},
pages={220-235},
doi={10.1109/TPAMI.2023.3320613}
}
Comparison with State-of-the-Art UDA
DAFormer significantly outperforms previous works on several UDA benchmarks. This includes synthetic-to-real adaptation on GTA→Cityscapes and Synthia→Cityscapes as well as clear-to-adverse-weather adaptation on Cityscapes→ACDC and Cityscapes→DarkZurich.
| | GTA→CS(val) | Synthia→CS(val) | CS→ACDC(test) | CS→DarkZurich(test) |
|-----------------|------|------|-------|-------|
| ADVENT [1] | 45.5 | 41.2 | 32.7 | 29.7 |
| BDL [2] | 48.5 | -- | 37.7 | 30.8 |
| FDA [3] | 50.5 | -- | 45.7 | -- |
| DACS [4] | 52.1 | 48.3 | -- | -- |
| ProDA [5] | 57.5 | 55.5 | -- | -- |
| MGCDA [6] | -- | -- | 48.7 | 42.5 |
| DANNet [7] | -- | -- | 50.0 | 45.2 |
| DAFormer (Ours) | 68.3 | 60.9 | 55.4* | 53.8* |
* New results of our extension paper
References:
1. Vu et al. "Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation" in CVPR, 2019.
2. Li et al. "Bidirectional learning for domain adaptation of semantic segmentation" in CVPR, 2019.
3. Yang et al. "Fda: Fourier domain adaptation for semantic segmentation" in CVPR, 2020.
4. Tranheden et al. "Dacs: Domain adaptation via cross-domain mixed sampling" in WACV, 2021.
5. Zhang et al. "Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation" in CVPR, 2021.
6. Sakaridis et al. "Map-guided curriculum domain adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation" in TPAMI, 2020.
7. Wu et al. "DANNet: A one-stage domain adaptation network for unsupervised nighttime semantic segmentation" in CVPR, 2021.
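The scores in the comparison above are mIoU values in percent. For readers unfamiliar with the metric, here is a minimal sketch of how mIoU can be computed from flat lists of predicted and ground-truth class indices. The function name `mean_iou` and the `ignore_index` default of 255 are assumptions for illustration; this is not the evaluation code of the repository. Benchmark numbers are typically accumulated over the whole evaluation set rather than averaged per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union (in %) over classes that occur."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no labeled pixels to evaluate")
    return 100.0 * sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy example with two classes and one ignored pixel.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2))
```

Classes that appear in neither the prediction nor the ground truth are skipped here, so they do not pull the average down.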
Comparison with State-of-the-Art Domain Generalization (DG)
DAFormer significantly outperforms previous works on domain generalization from GTA to real street scenes.
| DG Method | Cityscapes | BDD100K | Mapillary | Avg. |
|-----------------|--------|--------|--------|--------|
| IBN-Net [1,5] | 37.37 | 34.21 | 36.81 | 36.13 |
| DRPC [2] | 42.53 | 38.72 | 38.05 | 39.77 |
| ISW [3,5] | 37.20 | 33.36 | 35.57 | 35.38 |
| SAN-SAW [4] | 45.33 | 41.18 | 40.77 | 42.43 |
| SHADE [5] | 46.66 | 43.66 | 45.50 | 45.27 |
| DAFormer (Ours) | 52.65* | 47.89* | 54.66* | 51.73* |
* New results of our extension paper
References:
1. Pan et al. "Two at once: Enhancing learning and generalization capacities via IBN-Net" in ECCV, 2018.
2. Yue et al. "Domain randomization and pyramid consistency: Simulation-to-real generalization without accessing target domain data" in ICCV, 2019.
3. Choi et al. "RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening" in CVPR, 2021.
4. Peng et al. "Semantic-aware domain generalized segmentation" in CVPR, 2022.
5. Zhao et al. "Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation" in ECCV, 2022.
Setup Environment
For this project, we used Python 3.8.5. We recommend setting up a new virtual environment:
python -m venv ~/venv/daformer
source ~/venv/daformer/bin/activate
In that environment, the requirements can be installed with:
pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.3.7 # requires the other packages to be installed first
Please download the MiT ImageNet weights (b3-b5) provided by SegFormer from their OneDrive and put them in the folder pretrained/.
Further, download the checkpoint of DAFormer on GTA→Cityscapes and extract it to the folder work_dirs/.
All experiments were executed on an NVIDIA RTX 2080 Ti.
Inference Demo
Already at this point, the provided DAFormer model can be applied to a demo image:
python -m demo.image_demo demo/demo.png work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/211108_1622_gta2cs_daformer_s0_7f24c.json work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/latest.pth
When judging the predictions, please keep in mind that DAFormer had no access to real-world labels during the training.
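If the demo fails to start, a missing download is a likely cause. The following sketch checks that the files referenced by the demo command and the setup steps are in place. The helper name `missing_demo_files` is hypothetical, and the assumption that the MiT weights are stored as `.pth` files directly inside pretrained/ is ours; adjust it if your download differs.

```python
from pathlib import Path


def missing_demo_files(repo_root="."):
    """Return the paths the inference demo needs but that do not exist."""
    root = Path(repo_root)
    run = "211108_1622_gta2cs_daformer_s0_7f24c"
    required = [
        root / "demo" / "demo.png",
        root / "work_dirs" / run / f"{run}.json",
        root / "work_dirs" / run / "latest.pth",
    ]
    missing = [str(path) for path in required if not path.is_file()]
    # Assumption: the MiT ImageNet weights are *.pth files in pretrained/.
    pretrained = root / "pretrained"
    if not any(pretrained.glob("*.pth")):
        missing.append(str(pretrained / "*.pth"))
    return missing


if __name__ == "__main__":
    problems = missing_demo_files()
    if problems:
        print("Missing files:")
        for path in problems:
            print(f"  {path}")
    else:
        print("All demo files found.")
```

Run it from the repository root, or pass the path of the checkout as `repo_root`.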
Setup Datasets
Cityscapes: Please, download leftImg8bit_trainvaltest.zip and gt_train
