# SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception
<div> <a href="https://arxiv.org/abs/2404.10540" target="_blank">Paper</a> | <a href="https://eventbasedvision.github.io/SEVD/" target="_blank">Website</a> | <a href="https://docs.google.com/forms/d/e/1FAIpQLSdOhlegSlpzW78DsPSqNCDdfg7IVXsbcKD-BgBnbj_YdjojQg/viewform?usp=sf_link" target="_blank">Data</a> | <a href="https://eventbasedvision.github.io/SEVD/documents/SEVD%20Dataset%20Documentation.pdf" target="_blank">Dataset Documentation</a> </div>
<hr>
<img src="./images/teaser.png"/>

In recent years, there has been an increasing focus on neuromorphic, or event-based, vision due to its ability to excel under high-dynamic-range conditions, offer high temporal resolution, and consume less power than conventional frame-based vision sensors such as RGB cameras. Event cameras, also known as dynamic vision sensors (DVS), mimic the behavior of biological retinas by continuously sampling incoming light and generating signals only when the light intensity changes. This results in an event data stream represented as a sequence of ⟨x, y, p, t⟩ tuples, where (x, y) denotes the pixel position, t the timestamp, and p the polarity (positive or negative contrast).
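The ⟨x, y, p, t⟩ representation above can be sketched, for instance, as a NumPy structured array. This is only an illustration of the event-stream layout; the field names and dtypes are assumptions, not SEVD's exact on-disk schema.

```python
import numpy as np

# Illustrative event record: pixel position (x, y), polarity p
# (+1 / -1 for positive / negative contrast), timestamp t in microseconds.
event_dtype = np.dtype([("x", np.uint16),
                        ("y", np.uint16),
                        ("p", np.int8),
                        ("t", np.int64)])

events = np.zeros(3, dtype=event_dtype)
events["x"] = [120, 121, 640]
events["y"] = [45, 45, 300]
events["p"] = [1, -1, 1]
events["t"] = [1000, 1250, 1251]  # timestamps are non-decreasing

# Polarity tells us whether brightness increased (+1) or decreased (-1)
# at that pixel; here we select only positive-contrast events.
positive = events[events["p"] > 0]
print(len(positive))  # 2
```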
Although event-based sensing is a novel area, research efforts to fully exploit event cameras for perception tasks have been limited in recent years. Researchers have predominantly used event-based cameras such as iniVation's DAVIS346 and Prophesee's IMX636 / EVK4 HD to construct automotive datasets, and have employed frame-to-event simulators such as ESIM and v2e to generate synthetic event-based data. However, such simulated data is only converted from existing RGB frames, e.g., of an outdoor scene from MVSEC. This highlights the significant scarcity of readily available synthetic event-based datasets in the field. To bridge this gap and leverage the potential of synthetic data to generate diverse, high-quality vision data tailored for traffic monitoring, we present SEVD, a Synthetic Event-based Vision Dataset designed for autonomous driving and traffic monitoring tasks.
<img src="./images/related-datasets.png"/>

## Download
The dataset can be downloaded via the [data request form](https://docs.google.com/forms/d/e/1FAIpQLSdOhlegSlpzW78DsPSqNCDdfg7IVXsbcKD-BgBnbj_YdjojQg/viewform?usp=sf_link).
## Dataset Overview
SEVD provides a multi-view (360°) dataset comprising 27 hr of fixed and 31 hr of ego perception data, with over 9M bounding boxes, recorded across diverse conditions and varying parameters. The event cameras are complemented by five additional camera types (RGB, depth, optical flow, semantic segmentation, and instance segmentation) as well as GNSS and IMU sensors, resulting in a diverse array of data.
## Folder Structure

```
SEVD
├── LICENSE
├── images/
├── rvt/
├── ultralytics/
├── README.md
├── carla/        # data generation pipeline
```
## Baseline

<img src="./images/fixed-perception-baselines.png"/>
<img src="./images/ego-perception-baselines.png"/>
## Converting .npz (xytp) Event Stream Files for RVT and RED Training

This guide explains how to convert .npz (xytp) event stream files for training the RVT (Recurrent Vision Transformers) and RED (Recurrent Event-camera Detector) models.
### Step 1: Compile .npz Files

Compile the individual .npz files into a single file.
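A minimal sketch of this compilation step, assuming each per-chunk .npz archive stores `x`, `y`, `t`, `p` arrays (the key names are an assumption — adjust them to match your files):

```python
import numpy as np

def compile_npz(paths, out_path="events_compiled.npz"):
    """Merge per-chunk .npz event files into one time-sorted archive."""
    xs, ys, ts, ps = [], [], [], []
    for path in paths:
        with np.load(path) as chunk:  # assumed keys: x, y, t, p
            xs.append(chunk["x"])
            ys.append(chunk["y"])
            ts.append(chunk["t"])
            ps.append(chunk["p"])
    x = np.concatenate(xs)
    y = np.concatenate(ys)
    t = np.concatenate(ts)
    p = np.concatenate(ps)
    order = np.argsort(t, kind="stable")  # keep the stream time-ordered
    np.savez(out_path, x=x[order], y=y[order], t=t[order], p=p[order])
    return out_path
```

Sorting by timestamp matters because downstream event-camera tooling generally assumes a monotonically ordered stream.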
### Step 2: Convert to .hdf5 for RVT Training

Convert the compiled .npz file to an .h5 file for further preprocessing into tensors, as detailed in the [Metavision SDK documentation](https://docs.prophesee.ai/stable/index.html).
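One way to sketch this conversion with `h5py`, assuming the compiled archive holds `x`, `y`, `t`, `p` arrays. The HDF5 dataset names (`events/x`, etc.) are illustrative assumptions — match them to the layout your RVT preprocessing scripts expect:

```python
import numpy as np
import h5py

def npz_to_h5(npz_path, h5_path):
    """Copy the x/y/t/p event arrays from a .npz archive into an HDF5 file."""
    with np.load(npz_path) as ev, h5py.File(h5_path, "w") as f:
        for key in ("x", "y", "t", "p"):
            # gzip keeps large event streams compact on disk
            f.create_dataset(f"events/{key}", data=ev[key],
                             compression="gzip")
    return h5_path
```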
### Step 3: Convert to .csv Files

Convert the .npz file to .csv files for further processing.
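A sketch of the CSV dump, writing one `x,y,p,t` row per event. The column order here is an assumption — check the expected input format of the converter you feed the .csv files into:

```python
import numpy as np

def npz_to_csv(npz_path, csv_path):
    """Write the event stream as integer CSV rows: x,y,p,t."""
    with np.load(npz_path) as ev:  # assumed keys: x, y, t, p
        rows = np.column_stack((ev["x"], ev["y"], ev["p"], ev["t"]))
    np.savetxt(csv_path, rows, fmt="%d", delimiter=",")
    return csv_path
```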
### Step 4: Convert to Metavision HDF5 Format

Use the Metavision SDK (Software Development Kit) to convert the .csv files to Metavision's proprietary .raw format for further preprocessing into tensors. Refer to the [Metavision SDK documentation](https://docs.prophesee.ai/stable/index.html) for detailed instructions.
### Step 5: Train RED

Use the .hdf5 tensor files generated in the previous step to train the RED model.

Following these steps will prepare your event stream data for training the RVT and RED models.
## License
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
## Citation

```bibtex
@article{aliminati2024sevd,
  title={SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception},
  author={Aliminati, Manideep Reddy and Chakravarthi, Bharatesh and Verma, Aayush Atul and Vaghela, Arpitsinh and Wei, Hua and Zhou, Xuesong and Yang, Yezhou},
  journal={arXiv preprint arXiv:2404.10540},
  year={2024}
}
```
