EventSAM

Code for CVPR'24 Paper: Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens


<p align="right">English | <a href="./README_CN.md">简体中文</a></p> <div align="center"> <img src="assets/Logo02.PNG" width="100%" height="100%"> <h3 align="center"><strong>Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens [CVPR '24]</strong></h3> <p align="center"> <a>Zhiwen Chen</a><sup>1</sup>   <a>Zhiyu Zhu</a><sup>2</sup>   <a>Yifan Zhang</a><sup>2</sup>   <a>Junhui Hou</a><sup>2</sup>   <a>Guangming Shi</a><sup>1</sup>   <a>Jinjian Wu</a><sup>1</sup> <br> <sup>1</sup>Xidian University    <sup>2</sup>City University of Hong Kong </p> </div> <p align="center"> <a href="https://arxiv.org/abs/2312.16222" target='_blank'> <img src="https://img.shields.io/badge/Paper-%F0%9F%93%83-purple"> </a> <a href="" target='_blank'> <img src="https://visitor-badge.laobi.icu/badge?page_id=zhiwen-xdu.EventSAM&left_color=gray&right_color=purple"> </a> </p>

About

Official code for "Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens". The paper addresses the challenge of adapting Segment Anything Models (SAMs) to event data, with the goal of robust and universal object segmentation in the event domain.

Getting Started

Installation

Install the package from the repository with pip:

pip install git+https://github.com/happychenpipi/EventSAM.git

Create and activate a conda environment and install the required packages:

conda create -n eventsam python=3.8
conda activate eventsam
bash install_eventsam.sh

📈 Data Preparation

In this work, we collect RGBE-SEG, a large-scale RGB-Event dataset for event-centric segmentation, built from currently available pixel-level aligned datasets (VisEvent and COESOT). To explore the zero-shot performance of our method, we show additional segmentation results on the MVSEC, DDD17 and DSEC datasets. We also provide the corresponding ground-truth masks and prediction results for comparison. Please download the data from the links below and put it in ./data.

Baidu Pan: <a href="https://pan.baidu.com/s/19ruTHhwtzzVlFG0j-cO19A?pwd=4ek2" target='_blank'><img src="https://img.shields.io/badge/Datasets-purple"></a> <a href="https://pan.baidu.com/s/19-JwiJsMWxz4czaxNwXeSQ?pwd=uq4x" target='_blank'><img src="https://img.shields.io/badge/Groundtruths-blue"></a> <a href="https://pan.baidu.com/s/1kh_6hFgyuDw04bDQLc9O_w?pwd=hn6m" target='_blank'><img src="https://img.shields.io/badge/Predictions-yellow"></a>

Google Drive: <a href="https://drive.google.com/drive/folders/1XftvgcyvTulyql5Uoj9rDIPihP3Yx7On?usp=sharing" target='_blank'><img src="https://img.shields.io/badge/Datasets+Groundtruths+Predictions-red"></a>

Format of All Datasets:

├── RGBE_SEG dataset
    ├── Training Subset (472 sequences)
        ├── dvSave-2021_09_01_06_59_10
            ├── event          # Event source file: [N,4] array with columns [x,y,t,p].
            ├── rgb_image      # RGB images, the input of the teacher network.
            ├── event_image    # Event-oriented binary images, used for event visualization.
            ├── voxel_image    # Event-oriented voxel-like images, the input of the student network.
        ├── ... 
    ├── Testing Subset For Normal Scenes (104 sequences) # Easy, Medium, Hard
        ├── dvSave-2021_07_30_11_04_12
            ├── event
            ├── rgb_image
            ├── event_image
            ├── voxel_image 
        ├── ...
    ├── Testing Subset For Degraded Scenes (28 sequences) # Low Light, Over Exposure, Motion Blur
        ├── video_0078
            ├── event
            ├── rgb_image
            ├── event_image
            ├── voxel_image 
        ├── ...

├── MVSEC_SEG/DDD17_SEG/DSEC_SEG dataset
    ├── Testing Subset
        ├── seq_name
            ├── event
            ├── rgb_image
            ├── event_image
            ├── voxel_image 
        ├── ...
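
The `event` entry above is described as an [N,4] array with columns [x,y,t,p]. As a rough illustration of how such an array could be turned into a binary event image and a voxel-like grid, here is a minimal NumPy sketch. The function names, the number of time bins and the polarity convention are assumptions made for illustration; the repository's own preprocessing may differ.

```python
import numpy as np


def events_to_binary_image(events, height, width):
    """Mark every pixel that received at least one event (hypothetical helper)."""
    image = np.zeros((height, width), dtype=np.uint8)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    image[y, x] = 1
    return image


def events_to_voxel_grid(events, height, width, num_bins=3):
    """Accumulate signed polarities into num_bins temporal slices (hypothetical helper)."""
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return voxel
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2].astype(np.float64)
    # Assumption: polarity is stored as 0/1 or -1/+1; map it to -1/+1.
    p = np.where(events[:, 3] > 0, 1.0, -1.0).astype(np.float32)
    # Normalize timestamps to [0, num_bins] and clamp the last event into the final bin.
    span = max(t.max() - t.min(), 1e-9)
    bins = np.minimum(((t - t.min()) / span * num_bins).astype(np.int64), num_bins - 1)
    # np.add.at accumulates correctly when several events hit the same cell.
    np.add.at(voxel, (bins, y, x), p)
    return voxel


# Toy stream: four events on a 4x4 sensor, columns [x, y, t, p].
events = np.array([
    [0, 0, 0.0, 1],
    [1, 2, 1.0, 0],
    [1, 2, 2.0, 1],
    [3, 3, 4.0, 1],
])
binary = events_to_binary_image(events, height=4, width=4)
voxel = events_to_voxel_grid(events, height=4, width=4, num_bins=2)
```

In this toy stream the pixel (x=1, y=2) receives a negative event in the first time bin and a positive one in the second, so the two slices keep them apart while the binary image only records that the pixel fired.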

Format of Groundtruth Masks or Prediction Masks:

├── RGBE_SEG dataset
    ├── Testing Subset For Normal Scenes (108 sequences) # Easy, Medium, Hard
        ├── dvSave-2021_07_30_11_04_12
            ├── **.png     # Groundtruth Masks/Prediction Masks.
        ├── ...

├── MVSEC_SEG/DDD17_SEG/DSEC_SEG dataset
    ├── Testing Subset
        ├── seq_name
            ├── **.png     # Groundtruth Masks/Prediction Masks.
        ├── ...

🚀 Training

First download a pre-trained SAM checkpoint (e.g. sam_vit_b.pth) and put it in ./pretrained. The model then serves as the teacher for RGB-event knowledge distillation:

python ./event_encoder/train.py
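
To illustrate what RGB-to-event knowledge distillation means in general, the sketch below computes a feature-matching loss between token embeddings from a frozen teacher (fed RGB images) and a student (fed event voxel images). It is a generic mean-squared-error formulation with optional per-token weights, written in NumPy for clarity. It is not the loss implemented in `train.py`; the function name, array shapes and weighting scheme are assumptions.

```python
import numpy as np


def token_distillation_loss(teacher_tokens, student_tokens, token_weights=None):
    """Weighted mean-squared error between teacher and student token embeddings.

    teacher_tokens, student_tokens: arrays of shape [num_tokens, dim].
    token_weights: optional non-negative array of shape [num_tokens];
    uniform weights are used when it is omitted. (Hypothetical helper.)
    """
    teacher_tokens = np.asarray(teacher_tokens, dtype=np.float64)
    student_tokens = np.asarray(student_tokens, dtype=np.float64)
    if teacher_tokens.shape != student_tokens.shape:
        raise ValueError("teacher and student tokens must have the same shape")
    # Squared error averaged over the embedding dimension, one value per token.
    per_token = ((teacher_tokens - student_tokens) ** 2).mean(axis=1)
    if token_weights is None:
        token_weights = np.ones(per_token.shape[0])
    token_weights = np.asarray(token_weights, dtype=np.float64)
    # Normalize by the weight sum so the loss scale does not depend on the weights.
    return float((token_weights * per_token).sum() / token_weights.sum())


# Two tokens of dimension 2; the student matches the first token only.
teacher = np.array([[1.0, 0.0], [0.0, 1.0]])
student = np.array([[1.0, 0.0], [0.0, 0.0]])
uniform_loss = token_distillation_loss(teacher, student)
weighted_loss = token_distillation_loss(teacher, student, token_weights=[3.0, 1.0])
```

Putting more weight on the token the student already matches lowers the loss, which shows how per-token weights shift the training signal toward selected tokens.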

Pre-trained Model

Download the pre-trained EventSAM model (e.g. rgbe_encoder.pth) and put it in ./checkpoints. <a href="https://pan.baidu.com/s/1mFtvLAkHFpnGmx_8Ky85kQ?pwd=3c3e" target='_blank'><img src="https://img.shields.io/badge/Baidu Pan-purple"></a> <a href="https://drive.google.com/drive/folders/1XftvgcyvTulyql5Uoj9rDIPihP3Yx7On?usp=sharing" target='_blank'><img src="https://img.shields.io/badge/Google Drive-red"></a>

Evaluation

Predict the segmentation masks of event images:

python ./evaluate/predict_mask.py

Calculate metrics of predicted masks:

python ./evaluate/calculate_metric.py
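
This README does not state which metrics `calculate_metric.py` reports. Intersection-over-union (IoU) is a common choice for comparing a predicted mask against a ground-truth mask, so the following sketch shows that computation with NumPy under that assumption. The function name and the convention for two empty masks are illustrative.

```python
import numpy as np


def mask_iou(pred_mask, gt_mask):
    """Intersection-over-union of two binary masks with identical shapes."""
    pred = np.asarray(pred_mask) > 0
    gt = np.asarray(gt_mask) > 0
    if pred.shape != gt.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
gt = np.array([[1, 0, 0],
               [0, 1, 1]])
iou = mask_iou(pred, gt)  # 2 shared pixels out of 4 in the union
```

Masks loaded from PNG files often store foreground as 255 rather than 1; thresholding with `> 0` handles both encodings.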

Visualization

EventSAM&LLM

To further validate the strong zero-shot object recognition ability of our event-adapted SAM, we integrate it with LISA, a vision-language object segmentation framework. This further unlocks the rich semantics inherent in SAM for interactive, universal object segmentation with event data. Some visualizations of this integration are provided.

Acknowledgments

Thanks to VisEvent, COESOT, MVSEC, DDD17, DSEC datasets, SAM and LISA projects.

Contact

Feedback and comments are welcome! Feel free to contact us via zhiwen.chen@stu.xidian.edu.cn and zhiyuzhu2-c@my.cityu.edu.hk.

📚 Citation

If you use EventSAM in your research, please use the following BibTeX entry.

@InProceedings{Chen_2024_CVPR,
    author    = {Chen, Zhiwen and Zhu, Zhiyu and Zhang, Yifan and Hou, Junhui and Shi, Guangming and Wu, Jinjian},
    title     = {Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {3890-3900}
}
