# SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception
<div> <a href="https://arxiv.org/abs/2404.10540" target="_blank">Paper</a> | <a href="https://eventbasedvision.github.io/SEVD/" target="_blank">Website</a> | <a href="https://docs.google.com/forms/d/e/1FAIpQLSdOhlegSlpzW78DsPSqNCDdfg7IVXsbcKD-BgBnbj_YdjojQg/viewform?usp=sf_link" target="_blank">Data</a> | <a href="https://eventbasedvision.github.io/SEVD/documents/SEVD%20Dataset%20Documentation.pdf" target="_blank">Dataset Documentation</a> </div>
<hr>
<img src="./images/teaser.png"/>

In recent years, there has been an increasing focus on neuromorphic, or event-based, vision due to its ability to excel under high-dynamic-range conditions, offer high temporal resolution, and consume less power than conventional frame-based vision sensors such as RGB cameras. Event cameras, also known as dynamic vision sensors (DVS), mimic the behavior of biological retinas by continuously sampling incoming light and generating signals only when the light intensity changes. This results in an event data stream represented as a sequence of ⟨x, y, p, t⟩ tuples, where (x, y) denotes the pixel position, t the timestamp, and p the polarity (positive or negative contrast).
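The ⟨x, y, p, t⟩ representation above can be sketched, for instance, as a NumPy structured array. This is only an illustration of the event-stream layout; the field names and dtypes are assumptions, not SEVD's exact on-disk schema.

```python
import numpy as np

# Illustrative event record: pixel position (x, y), polarity p
# (+1 / -1 for positive / negative contrast), timestamp t in microseconds.
event_dtype = np.dtype([("x", np.uint16),
                        ("y", np.uint16),
                        ("p", np.int8),
                        ("t", np.int64)])

events = np.zeros(3, dtype=event_dtype)
events["x"] = [120, 121, 640]
events["y"] = [45, 45, 300]
events["p"] = [1, -1, 1]
events["t"] = [1000, 1250, 1251]  # timestamps are non-decreasing

# Polarity tells us whether brightness increased (+1) or decreased (-1)
# at that pixel; here we select only positive-contrast events.
positive = events[events["p"] > 0]
print(len(positive))  # 2
```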
Although event-based sensing is a novel area, research efforts to fully exploit event cameras for perception tasks have been limited in recent years. Researchers have predominantly used event-based cameras such as iniVation's DAVIS346 and Prophesee's IMX636 / EVK4 HD to construct automotive datasets, and have employed frame-to-event simulators such as ESIM and v2e to generate synthetic event-based data. However, such simulated data is only converted from existing RGB frames, e.g., of an outdoor scene from MVSEC. This highlights the significant scarcity of readily available synthetic event-based datasets in the field. To bridge this gap and leverage the potential of synthetic data to generate diverse, high-quality vision data tailored for traffic monitoring, we present SEVD, a Synthetic Event-based Vision Dataset designed for autonomous driving and traffic monitoring tasks.
<img src="./images/related-datasets.png"/>

## Download
The dataset can be downloaded via the [data request form](https://docs.google.com/forms/d/e/1FAIpQLSdOhlegSlpzW78DsPSqNCDdfg7IVXsbcKD-BgBnbj_YdjojQg/viewform?usp=sf_link).
## Dataset Overview
SEVD provides a multi-view (360°) dataset comprising 27 hr of fixed and 31 hr of ego perception data, with over 9M bounding boxes, recorded across diverse conditions and varying parameters. The event cameras are complemented by five additional camera types (RGB, depth, optical flow, semantic segmentation, and instance segmentation) as well as GNSS and IMU sensors, resulting in a diverse array of data.
## Folder Structure

```
SEVD
├── LICENSE
├── images/
├── rvt/
├── ultralytics/
├── README.md
├── carla/        # data generation pipeline
```
## Baseline

<img src="./images/fixed-perception-baselines.png"/>
<img src="./images/ego-perception-baselines.png"/>
## Converting .npz (xytp) Event Stream Files for RVT and RED Training

This guide explains how to convert .npz (xytp) event stream files for training the RVT (Recurrent Vision Transformers) and RED (Recurrent Event-camera Detector) models.
### Step 1: Compile .npz Files

Compile the individual .npz files into a single file.
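A minimal sketch of this compilation step, assuming each per-chunk .npz archive stores `x`, `y`, `t`, `p` arrays (the key names are an assumption — adjust them to match your files):

```python
import numpy as np

def compile_npz(paths, out_path="events_compiled.npz"):
    """Merge per-chunk .npz event files into one time-sorted archive."""
    xs, ys, ts, ps = [], [], [], []
    for path in paths:
        with np.load(path) as chunk:  # assumed keys: x, y, t, p
            xs.append(chunk["x"])
            ys.append(chunk["y"])
            ts.append(chunk["t"])
            ps.append(chunk["p"])
    x = np.concatenate(xs)
    y = np.concatenate(ys)
    t = np.concatenate(ts)
    p = np.concatenate(ps)
    order = np.argsort(t, kind="stable")  # keep the stream time-ordered
    np.savez(out_path, x=x[order], y=y[order], t=t[order], p=p[order])
    return out_path
```

Sorting by timestamp matters because downstream event-camera tooling generally assumes a monotonically ordered stream.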
### Step 2: Convert to .hdf5 for RVT Training

Convert the compiled .npz file to an .h5 file for further preprocessing into tensors, as detailed in the [Metavision SDK documentation](https://docs.prophesee.ai/stable/index.html).
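One way to sketch this conversion with `h5py`, assuming the compiled archive holds `x`, `y`, `t`, `p` arrays. The HDF5 dataset names (`events/x`, etc.) are illustrative assumptions — match them to the layout your RVT preprocessing scripts expect:

```python
import numpy as np
import h5py

def npz_to_h5(npz_path, h5_path):
    """Copy the x/y/t/p event arrays from a .npz archive into an HDF5 file."""
    with np.load(npz_path) as ev, h5py.File(h5_path, "w") as f:
        for key in ("x", "y", "t", "p"):
            # gzip keeps large event streams compact on disk
            f.create_dataset(f"events/{key}", data=ev[key],
                             compression="gzip")
    return h5_path
```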
### Step 3: Convert to .csv Files

Convert the .npz file to .csv files for further processing.
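A sketch of the CSV dump, writing one `x,y,p,t` row per event. The column order here is an assumption — check the expected input format of the converter you feed the .csv files into:

```python
import numpy as np

def npz_to_csv(npz_path, csv_path):
    """Write the event stream as integer CSV rows: x,y,p,t."""
    with np.load(npz_path) as ev:  # assumed keys: x, y, t, p
        rows = np.column_stack((ev["x"], ev["y"], ev["p"], ev["t"]))
    np.savetxt(csv_path, rows, fmt="%d", delimiter=",")
    return csv_path
```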
### Step 4: Convert to Metavision HDF5 Format

Use the Metavision SDK (Software Development Kit) to convert the .csv files to Metavision's proprietary .raw format for further preprocessing into tensors. Refer to the [Metavision SDK documentation](https://docs.prophesee.ai/stable/index.html) for detailed instructions.
### Step 5: Train RED

Use the .hdf5 tensor files generated in the previous step to train the RED model.

Following these steps will prepare your event stream data for training the RVT and RED models.
## License
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
## Citation

```bibtex
@article{aliminati2024sevd,
  title={SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception},
  author={Aliminati, Manideep Reddy and Chakravarthi, Bharatesh and Verma, Aayush Atul and Vaghela, Arpitsinh and Wei, Hua and Zhou, Xuesong and Yang, Yezhou},
  journal={arXiv preprint arXiv:2404.10540},
  year={2024}
}
```
