
SaliencyMamba

[AAAI’2025] SalM²: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention


<div align="center"> <a name="start-anchor"></a> </div> <div align="center"> <img src="fig\title_logo.jpg" alt="logo" width="800" height="auto" /> </div> <div align="center">

arXiv · AAAI 2025 Paper · License: MIT · GitHub

Baidu TrafficGaze Baidu DrFixD(Rainy) Baidu BDDA HF TrafficGaze HF DrFixD(Rainy)

</div> <div align="center"> <b>Authors: <a href="https://scholar.google.com.hk/citations?user=IOeG3ygAAAAJ&hl=zh-CN" target="_blank">Chunyu Zhao</a>, Wentao Mu, Xian Zhou, <a href="https://scholar.google.com.hk/citations?user=evBOeoAAAAAJ&hl=zh-CN" target="_blank">Wenbo Liu</a>, Fei Yan, <a href="https://scholar.google.com.hk/citations?user=WQ2hfUYAAAAJ&hl=zh-CN" target="_blank">Tao Deng</a><sup>📧</sup> </b> </div> <div align="center"> <b>Contact: springyu.zhao@foxmail.com&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;📧: corresponding author</b> </div> <div align="center"> <img src="fig/demo-example1.gif" alt="BDDA-1" width="200" height="auto" /> <img src="fig/demo-example2.gif" alt="BDDA-2" width="200" height="auto" /> <img src="fig/demo-example3.gif" alt="BDDA-2" width="200" height="auto" /> </div>

🔥Update

  • 2025/08/02: We have added multiple download options for datasets.

    • Baidu: TrafficGaze, DrFixD-rainy, BDDA
    • Hugging Face: TrafficGaze, DrFixD-rainy
  • 2025/07/24: The official trained weights have been uploaded. Details, Download

  • 2025/03/03: Completed the contents of the code repository.

    • Dataset uploads: TrafficGaze✅, DrFixD-rainy✅, BDDA
    • Environment configuration: environment
    • Visualization code: available in the repository (visualization)
    • Evaluation metrics code: available in the repository. ~~Python✅~~, MATLAB (official)
  • 2024/12/10: Our paper was accepted by AAAI 2025🎉🎉🎉. <a href="https://arxiv.org/pdf/2502.16214" ><img src="fig/arxiv_.png" alt="arxiv" width="50" height="auto" /></a>

  • 2024/11/08: Updated the supplementary materials. Details

  • 2024/10/23: We released the uniform saliency dataset loader. You can use it via `from utils.datasets import build_dataset`.

  • 2024/07/25: Added instructions on how to use our model (SalM²).

  • 2024/07/24: All the code and models are completed.

  • 2024/07/05: We collected the candidate datasets and built a uniform dataloader.

  • 2024/06/14: Our model is proposed!

💬Motivation 🔁

(1) Using semantic information to guide driver attention.

<div align="center"> <img src="fig\Motivation1.png" width="auto" height="auto" /> </div> <b>Solution:</b> We propose a dual-branch network that separately extracts semantic information and image information. The semantic information is used to guide the image information at the deepest level of image feature extraction.

(2) Reducing model parameters and computational complexity.

<div align="center"> <img src="fig\para_s.png" style="zoom: 100%;"><img src="fig\flops_s.png" style="zoom: 100%;"> </div> <b>Solution:</b> We develop a highly lightweight saliency prediction network based on the latest Mamba framework, with only <b>0.0785M</b> parameters (<b>an 88% reduction compared to SOTA</b>) and <b>4.45G FLOPs</b> (<b>a 37% reduction compared to SOTA</b>).

⚡Proposed Model 🔁

We propose a saliency Mamba model, named SalM², that uses "top-down" driving scene semantic information to guide "bottom-up" driving scene image information, simulating how human drivers allocate attention.

<img src="fig\overview.jpg" style="zoom: 100%;">

📖Datasets 🔁

<div align="center"> <table> <thead> <tr> <th>Name</th> <th>Train (video/frame)</th> <th>Valid (video/frame)</th> <th>Test (video/frame)</th> <th>Dataset example</th> </tr> </thead> <tbody> <tr> <td>TrafficGaze</td> <td>49080</td> <td>6655</td> <td>19135</td> <td><img src="fig/TrafficGaze-example.gif" alt="BDDA-3" style="zoom:100%;" /></td> </tr> <tr> <td>DrFixD-rainy</td> <td>52291</td> <td>9816</td> <td>19154</td> <td><img src="fig/DrFixD-rainy-example.gif" alt="BDDA-1" style="zoom:100%;" /></td> </tr> <tr> <td>BDDA</td> <td>286251</td> <td>63036</td> <td>93260</td> <td><img src="fig/BDDA-example.gif" alt="BDDA-0" style="zoom:100%;" /></td> </tr> </tbody> </table> </div> <b>Note:</b> For every dataset we provide our own download link alongside the official link. Please choose according to your needs.
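To compare how the three datasets are partitioned, the split sizes in the table can be turned into totals and proportions. The snippet below is only a small arithmetic sketch: the counts are copied from the table, while the helper name `split_shares` is hypothetical and not part of the repository.

```python
# Split sizes copied from the dataset table in this README.
splits = {
    "TrafficGaze": {"train": 49080, "valid": 6655, "test": 19135},
    "DrFixD-rainy": {"train": 52291, "valid": 9816, "test": 19154},
    "BDDA": {"train": 286251, "valid": 63036, "test": 93260},
}


def split_shares(counts):
    """Return the total count and each split's share of that total."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


for dataset, counts in splits.items():
    total, shares = split_shares(counts)
    pretty = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
    print(f"{dataset}: {total} total ({pretty})")
```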

(1) TrafficGaze: This dataset is available on BaiduYun (code: SALM) <a href="https://pan.baidu.com/s/1MJaNCcVe7vLSbcDSG0A3-w?pwd=SALM"><img src="fig/baiduyun.jpg" alt="baiduyun" width="50" /></a> or on Hugging Face <a href="https://huggingface.co/datasets/springyu/TrafficGaze"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HuggingFace" width="35" /></a>. We crop 5 frames from the beginning and end of each video. Official website: link.

(2) DrFixD-rainy: This dataset is available on BaiduYun (code: SALM) <a href="https://pan.baidu.com/s/1wYqS7ZrkKbxfOHZlczvSUA?pwd=SALM"><img src="fig/baiduyun.jpg" alt="baiduyun" width="50" /></a> or on Hugging Face <a href="https://huggingface.co/datasets/springyu/DrFixD_rainy"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HuggingFace" width="35" /></a>. We crop 5 frames from the beginning and end of each video. Official website: link.
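The frame cropping mentioned for TrafficGaze and DrFixD-rainy can be sketched as follows. This is a minimal illustration, assuming that cropping means dropping the first and last five frames of each video's sorted frame list. The function name `trim_frames` is hypothetical and is not taken from the repository.

```python
from typing import List


def trim_frames(frames: List[str], margin: int = 5) -> List[str]:
    """Drop `margin` frames from the start and from the end of one video.

    Assumption: `frames` holds the frame file names of a single video.
    Videos with 2 * margin frames or fewer yield an empty list.
    """
    ordered = sorted(frames)
    if len(ordered) <= 2 * margin:
        return []
    return ordered[margin:len(ordered) - margin]


# Hypothetical 20-frame video using zero-padded names (000001.jpg, ...).
video = [f"{i:06d}.jpg" for i in range(1, 21)]
kept = trim_frames(video)
print(len(kept), kept[0], kept[-1])
```

The downloads described above are already cropped, so a step like this would only matter when starting from the official releases.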

(3) BDDA: We uploaded this dataset to BaiduYun (code: BDDA) <a href="https://pan.baidu.com/s/1JDUejLifqF3vFOx-3izYdw?pwd=BDDA" ><img src="fig/baiduyun.jpg" alt="baiduyun" width="50" height="auto" /></a>. Some camera videos and gazemap videos have inconsistent frame rates; we have matched and cropped them. Some camera videos have no corresponding gazemap video; we have filtered them out. Official website: link.
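The README does not describe how the BDDA camera and gazemap frames were matched, so the following is only one plausible sketch: nearest-timestamp pairing, cropped to the shorter clip's duration. The function name `match_frames` and the frame rates in the example are hypothetical.

```python
from typing import List, Tuple


def match_frames(n_cam: int, fps_cam: float,
                 n_gaze: int, fps_gaze: float) -> List[Tuple[int, int]]:
    """Pair each gazemap frame with the camera frame nearest in time.

    Assumption: frames are 0-indexed and frame i is captured at i / fps.
    Gazemap frames beyond the shorter clip's duration are cropped away.
    """
    duration = min(n_cam / fps_cam, n_gaze / fps_gaze)
    pairs = []
    for gaze_idx in range(n_gaze):
        t = gaze_idx / fps_gaze
        if t >= duration:
            break
        # Nearest camera frame, clamped to the last valid index.
        cam_idx = min(round(t * fps_cam), n_cam - 1)
        pairs.append((cam_idx, gaze_idx))
    return pairs


# Hypothetical clip: 1 s of camera video at 30 fps, 1 s of gazemaps at 10 fps.
pairs = match_frames(n_cam=30, fps_cam=30.0, n_gaze=10, fps_gaze=10.0)
print(pairs[:3])
```

Clips with no counterpart at all cannot be paired this way and would simply be skipped, which corresponds to the filtering step mentioned above.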

<div align="center"> <table style="width: 100%; table-layout: auto;"> <tr> <th>TrafficGaze</th> <th>DrFixD-rainy</th> <th>BDDA</th> </tr> <tr> <td> ./TrafficGaze<br> &emsp;&emsp;|——fixdata<br> &emsp;&emsp;|&emsp;&emsp;|——fixdata1.mat<br> &emsp;&emsp;|&emsp;&emsp;|——fixdata2.mat<br> &emsp;&emsp;|&emsp;&emsp;|—— ... ...<br> &emsp;&emsp;|&emsp;&emsp;|——fixdata16.mat<br> &emsp;&emsp;|——trafficframe<br> &emsp;&emsp;|&emsp;&emsp;|——01<br> &emsp;&emsp;|&emsp;&emsp;|&emsp;&emsp;|——000001.jpg<br> &emsp;&emsp;|&emsp;&emsp;|&emsp;&emsp;|—— ... ...<br> &emsp;&emsp;|&emsp;&emsp;|——02<br> &emsp;&emsp;|&emsp;&emsp;|—— ... ...<br> &emsp;&emsp;|&emsp;&emsp;|——16<br> &emsp;&emsp;|——test.json<br> &emsp;&emsp;|——train.json<br> &emsp;&emsp;|——valid.json </td> <td> ./DrFixD-rainy<br> &emsp;&emsp;|——fixdata<br> &emsp;&emsp;|&emsp;&emsp;|——fixdata1.mat<br> &emsp;&emsp;|&emsp;&emsp;|——fixdata2.mat<br> &emsp;&emsp;|&emsp;&emsp;|—— ... ...<br> &emsp;&emsp;|&emsp;&emsp;|——fixdata16.mat<br> &emsp;&emsp;|——trafficframe<br> &emsp;&emsp;|&emsp;&emsp;|——01<br> &emsp;&emsp;|&emsp;&emsp;|&emsp;&emsp;|——000001.jpg<br> &emsp;&emsp;|&emsp;&emsp;|&emsp;&emsp;|—— ... ...<br> &emsp;&emsp;|&emsp;&emsp;|——02<br> &emsp;&emsp;|&emsp;&emsp;|—— ... ...<br> &emsp;&emsp;|&emsp;&emsp;|——16<br> &emsp;&emsp;|——test.json<br> &emsp;&emsp;|——train.json<br> &emsp;&emsp;|——valid.json </td> <td> ./BDDA<br> &emsp;&emsp;|——c