# SaliencyMamba

> [AAAI'2025] SalM²: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention
## 🔥Update
- 2025/08/02: We have added multiple download options for the datasets.
  - Baidu: TrafficGaze, DrFixD-rainy, BDDA
  - Hugging Face: TrafficGaze, DrFixD-rainy
- 2025/07/24: The official trained weights have been uploaded. Details, Download
- 2025/03/03: Completed the contents of the code repository.
  - Datasets upload: TrafficGaze✅, DrFixD-rainy✅, BDDA✅
  - Environment configuration: environment✅
  - Visualization code: our code in repository. visualization✅
  - Evaluation metrics code: our code in repository. ~~python✅~~, Matlab (official)✅
- 2024/12/10: Our paper is accepted by AAAI🎉🎉🎉. <a href="https://arxiv.org/pdf/2502.16214" ><img src="fig/arxiv_.png" alt="arxiv" width="50" height="auto" /></a>
- 2024/11/08: Updated supplementary materials. Details
- 2024/10/23: We release the uniform saliency dataset loader. You can use it simply via `from utils.datasets import build_dataset`.
- 2024/07/25: How to use our model (SalM²).
- 2024/07/24: All the code and models are completed.
- 2024/07/05: We collected the candidate datasets and built a uniform dataloader.
- 2024/06/14: Our model is proposed!
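The uniform loader above is imported as `from utils.datasets import build_dataset`; its actual signature lives in this repository. As a self-contained illustration of what a split-file-driven saliency loader can look like, here is a minimal sketch — the JSON schema (a list of relative frame paths) and the `fixdata` pairing are assumptions for illustration, not the repo's real format:

```python
# Hypothetical sketch of a uniform saliency-dataset split loader.
# Assumption: <root>/<split>.json holds a JSON list of relative frame paths,
# and each frame has a same-named ground-truth fixation map under fixdata/.
import json
from pathlib import Path


def build_split(root, split):
    """Pair each frame listed in <root>/<split>.json with its fixation map."""
    root = Path(root)
    frames = json.loads((root / f"{split}.json").read_text())
    samples = []
    for rel in frames:
        frame = root / "trafficframe" / rel   # input image (layout from the tables below)
        fixmap = root / "fixdata" / rel       # ground-truth saliency (assumed pairing)
        samples.append((frame, fixmap))
    return samples
```

In a real pipeline each `(frame, fixmap)` pair would then be read and transformed inside a `torch.utils.data.Dataset`.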
## 💬Motivation
**(1) Using semantic information to guide driver attention.**

<div align="center"> <img src="fig/Motivation1.png" width="auto" height="auto" /> </div>

<b>Solution:</b> We propose a dual-branch network that separately extracts semantic information and image information. The semantic information guides the image information at the deepest level of image feature extraction.

**(2) Reducing model parameters and computational complexity.**

<div align="center"> <img src="fig/para_s.png" style="zoom: 100%;"><img src="fig/flops_s.png" style="zoom: 100%;"> </div>

<b>Solution:</b> We develop a highly lightweight saliency prediction network based on the recent Mamba framework, with only <b>0.0785M</b> parameters (<b>an 88% reduction compared to SOTA</b>) and <b>4.45G</b> FLOPs (<b>a 37% reduction compared to SOTA</b>).

## ⚡Proposed Model
We propose a saliency Mamba model, named SalM², which uses "Top-down" driving scene semantic information to guide "Bottom-up" driving scene image information, simulating how human drivers allocate attention.
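The core idea — top-down semantics guiding bottom-up image features at the deepest stage — can be illustrated with a generic channel-gating sketch. This is intuition only, not SalM²'s actual fusion operator: a semantic descriptor is softmaxed into per-channel weights that rescale the deepest image feature channels.

```python
# Illustrative "top-down guides bottom-up" fusion: semantic scores become
# channel weights that rescale image features. Generic gating for intuition
# only; the real SalM2 operator is defined in the paper/repository.
import math


def semantic_gate(image_feats, semantic_vec):
    """image_feats: list of C channel maps (lists of floats);
    semantic_vec: C raw scores from the semantic branch."""
    # Softmax the semantic scores into channel weights summing to 1.
    m = max(semantic_vec)
    exps = [math.exp(s - m) for s in semantic_vec]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Channel-wise rescaling: semantics decide which image channels matter.
    return [[w * v for v in chan] for w, chan in zip(weights, image_feats)]
```

With uniform semantic scores every channel is weighted equally; a dominant semantic score (e.g. a pedestrian class) amplifies its channel relative to the rest.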
<img src="fig/overview.jpg" style="zoom: 100%;">

## 📖Datasets
<div align="center"> <table> <thead> <tr> <th>Name</th> <th>Train (frames)</th> <th>Valid (frames)</th> <th>Test (frames)</th> <th>Dataset example</th> </tr> </thead> <tbody> <tr> <td>TrafficGaze</td> <td>49080</td> <td>6655</td> <td>19135</td> <td><img src="fig/TrafficGaze-example.gif" alt="BDDA-3" style="zoom:100%;" /></td> </tr> <tr> <td>DrFixD-rainy</td> <td>52291</td> <td>9816</td> <td>19154</td> <td><img src="fig/DrFixD-rainy-example.gif" alt="BDDA-1" style="zoom:100%;" /></td> </tr> <tr> <td>BDDA</td> <td>286251</td> <td>63036</td> <td>93260</td> <td><img src="fig/BDDA-example.gif" alt="BDDA-0" style="zoom:100%;" /></td> </tr> </tbody> </table> </div>

【note】 For every dataset we provide our own download link alongside the official link. Please choose according to your needs.

(1) TrafficGaze: This dataset is available on BaiduYun (code: SALM) <a href="https://pan.baidu.com/s/1MJaNCcVe7vLSbcDSG0A3-w?pwd=SALM"><img src="fig/baiduyun.jpg" alt="baiduyun" width="50" /></a> or on Hugging Face <a href="https://huggingface.co/datasets/springyu/TrafficGaze"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HuggingFace" width="35" /></a>. We crop the first and last 5 frames of each video. Official website: link.
(2) DrFixD-rainy: This dataset is available on BaiduYun (code: SALM) <a href="https://pan.baidu.com/s/1wYqS7ZrkKbxfOHZlczvSUA?pwd=SALM"><img src="fig/baiduyun.jpg" alt="baiduyun" width="50" /></a> or on Hugging Face <a href="https://huggingface.co/datasets/springyu/DrFixD_rainy"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HuggingFace" width="35" /></a>. We crop the first and last 5 frames of each video. Official website: link.
(3) BDDA: We uploaded this dataset to BaiduYun (code: BDDA) <a href="https://pan.baidu.com/s/1JDUejLifqF3vFOx-3izYdw?pwd=BDDA" ><img src="fig/baiduyun.jpg" alt="baiduyun" width="50" height="auto" /></a>. Some camera videos and gazemap videos have inconsistent frame rates; we matched and cropped them. Some camera videos do not correspond to any gazemap video; we filtered them out. Official website: link.

<div align="center"> <table style="width: 100%; table-layout: auto;"> <tr> <th>TrafficGaze</th> <th>DrFixD-rainy</th> <th>BDDA</th> </tr> <tr> <td> ./TrafficGaze<br>   |——fixdata<br>   |  |——fixdata1.mat<br>   |  |——fixdata2.mat<br>   |  |—— ... ...<br>   |  |——fixdata16.mat<br>   |——trafficframe<br>   |  |——01<br>   |  |  |——000001.jpg<br>   |  |  |—— ... ...<br>   |  |——02<br>   |  |—— ... ...<br>   |  |——16<br>   |——test.json<br>   |——train.json<br>   |——valid.json </td> <td> ./DrFixD-rainy<br>   |——fixdata<br>   |  |——fixdata1.mat<br>   |  |——fixdata2.mat<br>   |  |—— ... ...<br>   |  |——fixdata16.mat<br>   |——trafficframe<br>   |  |——01<br>   |  |  |——000001.jpg<br>   |  |  |—— ... ...<br>   |  |——02<br>   |  |—— ... ...<br>   |  |——16<br>   |——test.json<br>   |——train.json<br>   |——valid.json </td> <td> ./BDDA<br>   |——c ... ... </td> </tr> </table> </div>
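Given the `trafficframe/<video_id>/<frame>.jpg` layout shown in the directory tables, enumerating frames per video is a small amount of standard-library code. The sketch below assumes only that layout; folder names come from the tables, and the dataset root path is up to the user:

```python
# Sketch: index frames per video folder for the TrafficGaze-style layout
# (trafficframe/<video_id>/<frame>.jpg). Assumes only the layout shown
# in the directory tables above.
from pathlib import Path


def index_frames(root):
    """Map each video folder name (e.g. '01') to its sorted frame paths."""
    frame_dir = Path(root) / "trafficframe"
    index = {}
    for video in sorted(p for p in frame_dir.iterdir() if p.is_dir()):
        index[video.name] = sorted(video.glob("*.jpg"))
    return index
```

Sorting both the video folders and the frame files keeps temporal order, since the zero-padded names (`000001.jpg`, `000002.jpg`, ...) sort lexicographically in frame order.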
