AttnFD
Attention as Geometric Transformation: Revisiting Feature Distillation for Semantic Segmentation (WACV 2026 - LENS Workshop)
The source code of "Attention as Geometric Transformation: Revisiting Feature Distillation for Semantic Segmentation".
Also, see our other works:
- Adaptive Inter-Class Similarity Distillation for Semantic Segmentation
- A Comprehensive Survey on Knowledge Distillation
Requirements
- Python3
- PyTorch (> 0.4.1)
- torchvision
- numpy
- scipy
- tqdm
- matplotlib
- pillow
Datasets and Models
- Datasets: [Pascal VOC] [Cityscapes]
- Teacher model: [ResNet101-DeepLabV3+]
Download the datasets and teacher models. Put the teacher model in pretrained/ and set the path to the datasets in mypath.py.
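The actual `mypath.py` in this repository may be structured differently; as a rough sketch of the kind of dataset-name-to-root mapping it needs (the class name, method name, and paths below are placeholders, not the repo's real contents):

```python
# Hypothetical sketch of a mypath.py-style config: one place that maps
# a dataset name (as passed via --dataset) to its root directory.
class Path:
    @staticmethod
    def db_root_dir(dataset):
        if dataset == 'pascal':
            return '/path/to/VOCdevkit/VOC2012/'   # placeholder path
        elif dataset == 'cityscapes':
            return '/path/to/cityscapes/'          # placeholder path
        raise NotImplementedError(f'Dataset {dataset} not configured.')
```

Edit the returned paths to point at wherever you extracted the datasets.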
Training
Without distillation:

python train.py --backbone resnet18 --dataset pascal --nesterov --epochs 120 --batch-size 6

Distillation:

python train_kd.py --backbone resnet18 --dataset pascal --nesterov --epochs 120 --batch-size 6 --attn_lambda 2
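The `--attn_lambda` flag weights the feature-distillation term against the segmentation loss. The exact AttnFD objective is defined in the paper and in `train_kd.py`; purely as an illustration of the general attention-map distillation family it belongs to (the function names and the sum-of-squares attention definition below are assumptions, in the style of classic attention transfer, not the repo's actual code):

```python
import numpy as np

def attention_map(feat):
    """Collapse a (C, H, W) feature tensor into a spatial attention map
    by summing squared activations over channels, then L2-normalizing."""
    amap = (feat ** 2).sum(axis=0)                 # (H, W)
    return amap / (np.linalg.norm(amap) + 1e-8)

def attn_distill_loss(f_student, f_teacher, attn_lambda=2.0):
    """Weighted MSE between the normalized attention maps of a matching
    student/teacher feature pair; attn_lambda mirrors --attn_lambda."""
    a_s = attention_map(f_student)
    a_t = attention_map(f_teacher)
    return attn_lambda * float(np.mean((a_s - a_t) ** 2))

# Example: random features with matching spatial size but different widths
# (the channel dimension is collapsed, so no projection layer is needed).
rng = np.random.default_rng(0)
f_s = rng.standard_normal((64, 33, 33))    # student: 64 channels
f_t = rng.standard_normal((256, 33, 33))   # teacher: 256 channels
loss = attn_distill_loss(f_s, f_t)
```

Because the attention map is channel-free, student and teacher features of different widths can be compared directly.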
Experimental Results
Comparison of results on the PascalVOC dataset.
| Method                            | mIoU (%)     | Params (M) |
| --------------------------------- | ------------ | ---------- |
| Teacher: DeepLab-V3+ (ResNet-101) | 77.85        | 59.3       |
| Student: DeepLab-V3+ (ResNet-18)  | 67.50        | 16.6       |
| Student + KD                      | 69.13 ± 0.11 | 16.6       |
| Student + Overhaul                | 70.67 ± 0.25 | 16.6       |
| Student + DistKD                  | 69.84 ± 0.11 | 5.9        |
| Student + CIRKD                   | 71.02 ± 0.11 | 5.9        |
| Student + LAD                     | 71.42 ± 0.09 | 5.9        |
| Student + AttnFD (ours)           | 73.09 ± 0.06 | 5.9        |
Comparison of results on the Cityscapes dataset.
| Method                  | mIoU (%)      | Accuracy (%)  |
| ----------------------- | ------------- | ------------- |
| Teacher: ResNet-101     | 77.66         | 84.05         |
| Student: ResNet-18      | 64.09         | 74.8          |
| Student + KD            | 65.21 (+1.12) | 76.32 (+1.74) |
| Student + Overhaul      | 70.31 (+6.22) | 80.10 (+5.3)  |
| Student + DistKD        | 71.81 (+7.72) | 80.73 (+5.93) |
| Student + CIRKD         | 70.49 (+6.40) | 79.99 (+5.19) |
| Student + LAD           | 71.37 (+7.28) | 80.93 (+6.13) |
| Student + AttnFD (ours) | 73.04 (+8.95) | 83.01 (+8.21) |
Citation
If you use this repository for your research or wish to refer to our distillation method, please use the following BibTeX entries:
@inproceedings{mansourian2026attention,
title={Attention as Geometric Transformation: Revisiting Feature Distillation for Semantic Segmentation},
author={Mansourian, Amirmohammad and Jalali, Arya and Ahmadi, Rozhan and Kasaei, Shohreh},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1287--1297},
year={2026}
}
@article{mansourian2025enriching,
title={Enriching Knowledge Distillation with Cross-Modal Teacher Fusion},
author={Mansourian, Amir M and Babaei, Amir Mohammad and Kasaei, Shohreh},
journal={arXiv preprint arXiv:2511.09286},
year={2025}
}
@article{mansourian2025a,
title={A Comprehensive Survey on Knowledge Distillation},
author={Amir M. Mansourian and Rozhan Ahmadi and Masoud Ghafouri and Amir Mohammad Babaei and Elaheh Badali Golezani and Zeynab Yasamani Ghamchi and Vida Ramezanian and Alireza Taherian and Kimia Dinashi and Amirali Miri and Shohreh Kasaei},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2025}
}
@article{mansourian2025aicsd,
title={AICSD: Adaptive inter-class similarity distillation for semantic segmentation},
author={Mansourian, Amir M and Ahmadi, Rozhan and Kasaei, Shohreh},
journal={Multimedia Tools and Applications},
pages={1--20},
year={2025},
publisher={Springer}
}
Acknowledgement
This codebase borrows heavily from A Comprehensive Overhaul of Feature Distillation. Thanks for their excellent work.
