FastSegFormer
[ISSN 0168-1699, COMPUT ELECTRON AGR 2024] FastSegFormer: A knowledge distillation-based method for real-time semantic segmentation of surface defects in navel oranges.
This is the official repository for our work: FastSegFormer (PDF)
News
This work was accepted for publication in the journal Computers and Electronics in Agriculture on December 29, 2023.
Highlights
- Performance of different models on navel orange dataset (test set) against their detection speed on RTX3060:
- Performance of different models on navel orange dataset (test set) against their parameters:
Updates
- [x] The training and testing code is available here. (April/25/2023)
- [x] Created a PyQt interface for navel orange defect segmentation. (May/10/2023)
- [x] Produced a 30 FPS navel orange assembly-line simulation video. (May/13/2023)
- [x] Added yolov8n-seg and yolov8-seg instance segmentation training, testing, and prediction results. Jump to (December/10/2023)
Demos
- Some demos of the segmentation performance of our proposed FastSegFormer: original image (left), label image (middle), and FastSegFormer-P prediction (right). The original images include enhanced images.
- A demo of navel orange video segmentation: original video (left) and detection video (right). The actual detection reaches 45~55 FPS using half-precision (FP16) weight quantization and multi-threaded processing. (The reported speed includes the total latency of image pre-processing, inference, and post-processing.) The navel orange defect image and video detection UI is available at FastSegFormer-pyqt.
Overview
- An overview of the architecture of our proposed FastSegFormer-P. The architecture of FastSegFormer-E is derived from FastSegFormer-P by replacing the backbone network with EfficientFormerV2-S0.

- An overview of the proposed multi-resolution knowledge distillation. (To resolve the mismatch in size and channel count between the teacher and student networks' feature maps, the teacher's feature maps are down-sampled by bilinear interpolation, and the student's feature maps are passed through point-wise convolutions to increase the number of channels.)

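The alignment step above can be sketched in PyTorch. This is an illustrative sketch, not the repository's code: the tensor shapes are hypothetical, and only the two operations named in the text (bilinear down-sampling of the teacher map, a point-wise 1×1 convolution on the student map) are shown.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Hypothetical shapes: the teacher feature map is larger spatially and wider
# in channels than the student's, as described for multi-resolution KD.
t_feat = torch.randn(2, 128, 64, 64)   # teacher: B x C_t x H_t x W_t
s_feat = torch.randn(2, 32, 28, 28)    # student: B x C_s x H_s x W_s

# Teacher side: bilinear down-sampling to the student's spatial size.
t_aligned = F.interpolate(t_feat, size=s_feat.shape[2:], mode="bilinear",
                          align_corners=False)

# Student side: point-wise (1x1) convolution to raise the channel count.
proj = nn.Conv2d(s_feat.shape[1], t_feat.shape[1], kernel_size=1)
s_aligned = proj(s_feat)

# Both are now B x C_t x H_s x W_s, so a feature-distillation loss can compare them.
assert t_aligned.shape == s_aligned.shape
```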
P&KL loss:
$$ L_{logits}(\text{S}) = \frac{1}{W_{s}\times H_{s}}(k_1t^2 \sum_{i \in R}\text{KL}(q_i^s, q_i^t) + (1 - k_1)\sum_{i \in R}\text{MSE}(p_i^s, p_i^t)) $$
Where $q_{i}^s$ represents the class probability of the $i$-th pixel output by the simple network S, $q_{i}^t$ represents the class probability of the $i$-th pixel output by the complex network T, $\text{KL}(\cdot)$ represents the Kullback-Leibler divergence, $p_{i}^s$ represents the $i$-th pixel output of the simple network S, $p_{i}^t$ represents the $i$-th pixel output of the complex network T, $\text{MSE}(\cdot)$ represents the mean squared error, $R=\{1,2,\ldots,W_s\times H_s\}$ represents the set of all pixels, and $t$ represents the temperature coefficient. In our experiments, $t=2$ and $k_1=0.5$.
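A minimal PyTorch sketch of this logits-distillation loss, assuming (as is standard in KD, though not spelled out here) that the softened probabilities $q$ are temperature-scaled softmaxes of the raw logits $p$; the function name and the per-batch averaging are our own:

```python
import torch
import torch.nn.functional as F

def pkl_loss(s_logits, t_logits, temp=2.0, k1=0.5):
    """Hypothetical sketch of the P&KL loss.

    s_logits, t_logits: B x C x H x W raw outputs of student S and teacher T.
    """
    b, c, h, w = s_logits.shape
    # Class probabilities softened by the temperature t (softmax over classes).
    log_q_s = F.log_softmax(s_logits / temp, dim=1)
    q_t = F.softmax(t_logits / temp, dim=1)
    # Sum of per-pixel KL divergences, averaged over the W_s x H_s pixels.
    kl = F.kl_div(log_q_s, q_t, reduction="sum") / (b * h * w)
    # Sum of per-pixel squared errors on the raw logits p.
    mse = F.mse_loss(s_logits, t_logits, reduction="sum") / (b * h * w)
    # k1 balances the two terms; t^2 rescales the KL gradient as usual in KD.
    return k1 * temp ** 2 * kl + (1.0 - k1) * mse
```

Both terms vanish when the student exactly matches the teacher, so the loss is zero at that point and non-negative elsewhere.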
NFD loss:
$$ L_{n}^{NFD} = \sum_{i=1}^n \frac{1}{W_s\times H_s} L_2(\text{Normal}(F_{i}^t), \text{Normal}(F_{i}^s)) $$
Where $n$ represents the number of intermediate feature maps, $W_s$ and $H_s$ represent the width and height of the simple model's feature maps, $L_2(\cdot)$ represents the Euclidean distance between feature maps, $F_{i}^t$ represents the $i$-th feature map generated by the complex network T, $F_{i}^s$ represents the $i$-th feature map generated by the simple network S, and $\text{Normal}$ represents the normalization of the feature maps over $(W, H)$; $\text{Normal}(\cdot)$ is given as follows:
$$ \bar{F} = \frac{1}{\sigma}(F - u) $$
where $F$ represents the original feature map, $\bar{F}$ represents the transformed feature map, and $u$ and $\sigma$ represent the mean and standard deviation of the features.
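The NFD loss can be sketched as follows. This is an illustrative reading of the two equations above, not the repository's implementation: we normalize each map to zero mean and unit standard deviation over $(H, W)$, take the squared L2 distance scaled by $1/(W_s\times H_s)$, and sum over the $n$ pairs of (already spatially and channel-wise aligned) feature maps. The `eps` term and function names are our own.

```python
import torch

def normalize_feat(feat, eps=1e-6):
    """Normal(.): zero-mean, unit-std normalization of each map over (H, W)."""
    mu = feat.mean(dim=(2, 3), keepdim=True)
    sigma = feat.std(dim=(2, 3), keepdim=True)
    return (feat - mu) / (sigma + eps)

def nfd_loss(s_feats, t_feats):
    """Hypothetical sketch of the normalized feature distillation (NFD) loss.

    s_feats / t_feats: lists of n aligned feature maps, each B x C x H_s x W_s.
    """
    loss = 0.0
    for f_s, f_t in zip(s_feats, t_feats):
        h, w = f_s.shape[2:]
        # Squared L2 distance between normalized maps, scaled by 1/(W_s * H_s).
        diff = normalize_feat(f_t) - normalize_feat(f_s)
        loss = loss + diff.pow(2).sum() / (h * w)
    return loss
```

Because both maps are normalized first, the loss compares the *shape* of the activations rather than their absolute scale, which is what makes teacher and student features comparable.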
Models
- Pretrained backbone network:
| Model (ImageNet-1K) | Input size | ckpt |
|:--------------------:|:---------------:|:--------:|
| EfficientFormerV2-S0 | $224\times 224$ | download |
| EfficientFormerV2-S1 | $224\times 224$ | download |
| PoolFormer-S12 | $224\times 224$ | download |
| PoolFormer-S24 | $224\times 224$ | download |
| PoolFormer-S36 | $224\times 224$ | download |
| PIDNet-S | $224\times 224$ | download |
| PIDNet-M | $224\times 224$ | download |
| PIDNet-L | $224\times 224$ | download |
- Teacher network:
| Model | Input size | mIoU(%) | mPA(%) | Params | GFLOPs | ckpt |
|:---------------:|:---------------:|:-------:|:------:|:------:|:------:|:--------:|
| Swin-T-Att-UNet | $512\times 512$ | 90.53 | 94.65 | 49.21M | 77.80 | download |
- FastSegFormer after fine-tuning and knowledge distillation:
| Model | Input size | mIoU(%) | mPA(%) | Params | GFLOPs | RTX3060(FPS) | RTX3050Ti(FPS) | ckpt | onnx |
|:---------------:|:---------------:|:-------:|:------:|:------:|:------:|:------------:|:--------------:|:--------:|:--------:|
| FastSegFormer-E | $224\times 224$ | 88.78 | 93.33 | 5.01M | 0.80 | 61 | 54 | download | download |
| FastSegFormer-P | $224\times 224$ | 89.33 | 93.78 | 14.87M | 2.70 | 108 | 93 | download | download |
Ablation study
All results and logs of our experiments are available in the logs dir, including the ablation study and comparisons with other lightweight models.
- The accuracy (mIoU) of FastSegFormer models with different network structures (PPM, MSP, and the image reconstruction branch) on the validation set:
- Knowledge distillation (KD) and fine-tuning (†):
| Model | mIoU(%) | mPA(%) | mPrecision(%) | Params | GFLOPs |
|:---------------:|:-------:|:------:|:-------------:|:------:|:------:|
| FastSegFormer-E | 86.51 | 91.63 | | | |
