LMFFNet
Real-time semantic segmentation is widely used in autonomous driving and robotics. Most previous networks achieved high accuracy with complicated models that require massive computation, while existing lightweight networks generally reduce parameter counts by sacrificing segmentation accuracy. Balancing parameters and accuracy is therefore critical for real-time semantic segmentation. In this paper, we introduce a Lightweight Multiscale-Feature-Fusion Network (LMFFNet) composed of three main components: the Split-Extract-Merge Bottleneck (SEM-B) block, the Feature Fusion Module (FFM), and the Multiscale Attention Decoder (MAD). The SEM-B block extracts sufficient features with fewer parameters, the FFMs fuse multiscale semantic features to effectively improve segmentation accuracy, and the MAD recovers the details of the input images through an attention mechanism. Two networks with different component configurations are built on the LMFFNet model. Without pretraining, the smaller LMFFNet-S achieves 72.7% mIoU on the Cityscapes test set at 512×1024 resolution with only 1.1 M parameters, at an inference speed of 98.9 FPS on a GTX 1080Ti GPU, while the larger LMFFNet-L achieves 74.7% mIoU with 1.4 M parameters at 89.6 FPS. On the CamVid test set, LMFFNet-S achieves 67.7% mIoU at 208.9 FPS (360×480) and 70.3% mIoU at 72.4 FPS (720×960), while LMFFNet-L achieves 68.1% mIoU at 182.9 FPS and 71.0% mIoU at 66.5 FPS, respectively. The proposed LMFFNets strike a good trade-off between accuracy and parameter size for real-time semantic segmentation.
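The fusion idea behind the FFM can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the tensor shapes, channel counts, and nearest-neighbor upsampling below are illustrative assumptions only; it merely shows the generic "upsample the deep features, concatenate with the shallow ones" pattern that multiscale fusion modules build on.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_multiscale(shallow, deep):
    """Fuse a high-resolution shallow feature map with a low-resolution
    deep one: upsample the deep map to match, then concatenate channels."""
    factor = shallow.shape[1] // deep.shape[1]
    deep_up = upsample_nearest(deep, factor)
    return np.concatenate([shallow, deep_up], axis=0)

# Illustrative shapes: the deep map has half the resolution, more channels.
shallow = np.random.rand(16, 64, 128)
deep = np.random.rand(32, 32, 64)
fused = fuse_multiscale(shallow, deep)
print(fused.shape)  # (48, 64, 128)
```

In a real network the concatenation would typically be followed by a convolution (and, in LMFFNet's MAD, an attention mechanism) to mix the fused channels.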
Segmentation performance of LMFFNet
<table class="tg"> <thead> <tr> <th class="tg-amwm">Crop Size*</th> <th class="tg-amwm">Dataset</th> <th class="tg-amwm">Pretrained</th> <th class="tg-amwm">Train type</th> <th class="tg-amwm">mIoU (%)</th> <th class="tg-amwm">Params (M)</th> <th class="tg-amwm">Speed (FPS)</th> </tr> </thead> <tbody> <tr> <td class="tg-baqh">512,1024</td> <td class="tg-baqh">Cityscapes</td> <td class="tg-baqh">No</td> <td class="tg-baqh">trainval</td> <td class="tg-baqh">75.1</td> <td class="tg-baqh">1.35</td> <td class="tg-baqh">118.9</td> </tr> <tr> <td class="tg-c3ow">1024,1024</td> <td class="tg-c3ow">Cityscapes</td> <td class="tg-c3ow">No</td> <td class="tg-c3ow">trainval</td> <td class="tg-c3ow">76.1</td> <td class="tg-baqh">1.35</td> <td class="tg-baqh">-</td> </tr> <tr> <td class="tg-c3ow">360,480</td> <td class="tg-c3ow">CamVid</td> <td class="tg-c3ow">No</td> <td class="tg-c3ow">trainval</td> <td class="tg-c3ow">69.1</td> <td class="tg-baqh">1.35</td> <td class="tg-baqh">116.4</td> </tr> <tr> <td class="tg-c3ow">720,960</td> <td class="tg-c3ow">CamVid</td> <td class="tg-c3ow">No</td> <td class="tg-c3ow">trainval</td> <td class="tg-c3ow">72.0</td> <td class="tg-baqh">1.35</td> <td class="tg-baqh">120.8</td> </tr> </tbody> </table>* The resolution to which input images are randomly cropped during the training phase. We found that the network performs better when images are randomly cropped to 1024×1024 during training: with 1024×1024 crops, LMFFNet achieves 76.1% mIoU on Cityscapes.
Preparation
You need to download the Cityscapes and CamVid datasets and place them (or symbolic links to them) in the dataset directory. Our directory layout follows DABNet (https://github.com/Reagan1311/DABNet).
dataset
├── camvid
| ├── train
| ├── test
| ├── val
| ├── trainannot
| ├── testannot
| ├── valannot
| ├── camvid_trainval_list.txt
| ├── camvid_train_list.txt
| ├── camvid_test_list.txt
| └── camvid_val_list.txt
├── cityscapes
| ├── gtCoarse
| ├── gtFine
| ├── leftImg8bit
| ├── cityscapes_trainval_list.txt
| ├── cityscapes_train_list.txt
| ├── cityscapes_test_list.txt
| └── cityscapes_val_list.txt
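Before training, it can save time to confirm the layout above is in place. The following helper is our own sketch (not shipped with the repo): it walks the expected entries and reports anything missing.

```python
import os

# Expected entries under dataset/, mirroring the tree above.
EXPECTED = {
    "camvid": ["train", "test", "val", "trainannot", "testannot", "valannot",
               "camvid_trainval_list.txt", "camvid_train_list.txt",
               "camvid_test_list.txt", "camvid_val_list.txt"],
    "cityscapes": ["gtCoarse", "gtFine", "leftImg8bit",
                   "cityscapes_trainval_list.txt", "cityscapes_train_list.txt",
                   "cityscapes_test_list.txt", "cityscapes_val_list.txt"],
}

def missing_entries(root="dataset"):
    """Return the expected dataset paths that do not exist under root."""
    missing = []
    for subdir, entries in EXPECTED.items():
        for entry in entries:
            path = os.path.join(root, subdir, entry)
            if not os.path.exists(path):
                missing.append(path)
    return missing

if __name__ == "__main__":
    for path in missing_entries():
        print("missing:", path)
```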
How to run
1 Training
1.1 Cityscapes
python train.py
1.2 CamVid
python train.py --dataset camvid --train_type trainval --max_epochs 1000 --lr 1e-3 --batch_size 8
2 Testing
2.1 Cityscapes
python predict.py --dataset cityscapes --checkpoint ${CHECKPOINT_FILE}
To convert the training labels (trainIDs) to class labels (labelIDs):
python trainID2labelID.py
Package the converted files into xxx.zip and submit the archive to https://www.cityscapes-dataset.com/submit/, where the evaluation results will be reported.
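The conversion that trainID2labelID.py performs presumably follows the standard Cityscapes trainId-to-labelId mapping defined in cityscapesScripts. The sketch below is our own illustration of that lookup, not the repo's script:

```python
import numpy as np

# Standard Cityscapes trainId -> labelId table (from cityscapesScripts):
# trainIds 0..18; the ignore value 255 is handled separately below.
TRAIN_TO_LABEL = np.array(
    [7, 8, 11, 12, 13, 17, 19, 20, 21, 22, 23, 24,
     25, 26, 27, 28, 31, 32, 33], dtype=np.uint8)

def trainid_to_labelid(pred):
    """Convert a predicted trainId map to labelIds for the test server.
    Pixels with the ignore value 255 map to labelId 0 (unlabeled)."""
    out = np.zeros_like(pred, dtype=np.uint8)
    valid = pred < len(TRAIN_TO_LABEL)
    out[valid] = TRAIN_TO_LABEL[pred[valid]]
    return out

pred = np.array([[0, 13], [255, 18]], dtype=np.uint8)
print(trainid_to_labelid(pred))  # [[ 7 26] [ 0 33]]
```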
2.2 CamVid
python test.py --dataset camvid --checkpoint ${CHECKPOINT_FILE}
3 FPS
python eval_forward_time.py --size 512,1024
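The exact contents of eval_forward_time.py are not shown here, but a typical FPS measurement loop looks like the sketch below (our own assumptions, with a dummy workload standing in for the network's forward pass):

```python
import time

def measure_fps(forward, n_warmup=10, n_iters=100):
    """Average frames per second of a forward-pass callable.
    Warm-up iterations are excluded so one-time setup costs
    (e.g. CUDA kernel compilation) do not skew the result."""
    for _ in range(n_warmup):
        forward()
    start = time.perf_counter()
    for _ in range(n_iters):
        forward()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed

# Dummy stand-in; replace with the real model's forward pass.
fps = measure_fps(lambda: sum(range(10000)))
print(f"{fps:.1f} fps")
```

For GPU models, the forward callable should also synchronize the device before timing stops, or the measured time will only cover kernel launch.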
To be continued...
Citation
@article{shi2022lmffnet,
title={LMFFNet: A Well-Balanced Lightweight Network for Fast and Accurate Semantic Segmentation},
author={Shi, Min and Shen, Jialin and Yi, Qingming and Weng, Jian and Huang, Zunkai and Luo, Aiwen and Zhou, Yicong},
journal={IEEE Transactions on Neural Networks and Learning Systems},
year={2022},
publisher={IEEE}
}
Reference
https://github.com/xiaoyufenfei/Efficient-Segmentation-Networks
https://github.com/Reagan1311/DABNet
