DriveAGI
A collection of foundation driving models by OpenDriveLab, notably GenAD and the DriveData Survey. For Vista and DriveLM, please refer to their individual pages.
> [!IMPORTANT]
> 🌟 Stay up to date at opendrivelab.com!
Table of Contents
- NEWS
- ⭐ GenAD: OpenDV Dataset (CVPR 2024 Highlight)
- Vista (NeurIPS 2024)
- DriveLM (ECCV 2024 Oral)
- DriveData Survey (SCIENTIA SINICA Informationis 2024) <!-- - [Abstract](#abstract) - [Related Work Collection](#related-work-collection) -->
- OpenScene
- OpenLane-V2 Update
NEWS
<font color="red">[ NEW❗️]</font> 2024/09/08 We released a mini version of OpenDV-YouTube, containing 25 hours of driving videos. Feel free to try the mini subset by following the instructions at OpenDV-mini!
2024/05/28 We released our latest research, Vista, a generalizable driving world model. It's capable of predicting high-fidelity and long-horizon futures, executing multi-modal actions, and serving as a generalizable reward function to assess driving behaviors.
2024/03/24 OpenDV-YouTube Update: Full suite of toolkits for OpenDV-YouTube is now available, including data downloading and processing scripts, as well as language annotations. Please refer to OpenDV-YouTube.
2024/03/15 We released the complete video list of OpenDV-YouTube, a large-scale driving video dataset, for GenAD project. Data downloading and processing script, as well as language annotations, will be released next week. Stay tuned.
2024/01/24 We are excited to announce some updates to our survey, and we thank John Lambert and Klemens Esterle from the community for their advice on improving the manuscript.
GenAD: OpenDV Dataset <a name="opendv"></a>

Examples of real-world driving scenarios in the OpenDV dataset, including urban, highway, rural scenes, etc.
⭐ Generalized Predictive Model for Autonomous Driving (CVPR 2024, Highlight)
Paper | Video | Poster | Slides
🎦 The largest driving video dataset to date, containing more than 1700 hours of real-world driving videos, 300 times larger than the widely used nuScenes dataset.
- Complete video list (under YouTube license): OpenDV Videos.
- The downloaded raw videos (mostly 1080P) consume about **3 TB** of storage space. However, these hour-long videos cannot be used directly for model training, as they are extremely memory-consuming to load.
- Therefore, we preprocess them into consecutive images, which are more flexible and efficient to load during training. The processed images consume about **24 TB** of storage space in total.
- It's recommended to set up your experiments on a small subset, say 1/20 of the whole dataset. An official mini subset is also provided; refer to OpenDV-mini for details. After stabilizing the training, you can then apply your method to the whole dataset and hope for the best 🤞.
- <font color="red">[ New❗️]</font> Mini subset: OpenDV-mini.
  - A mini version of OpenDV-YouTube. The raw videos consume about **44 GB** of storage space, and the processed images consume about **390 GB**.
- Step-by-step instruction for data preparation: OpenDV-YouTube.
- Language annotation for OpenDV-YouTube: OpenDV-YouTube-Language.
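The official OpenDV-YouTube toolkit defines the actual download and preprocessing pipeline; as a rough illustration of the subsampling idea (converting hour-long videos into a sparser sequence of frames), here is a minimal sketch. The frame rates and the function name are assumptions for illustration, not the toolkit's API:

```python
def frames_to_keep(total_frames: int, src_fps: float, dst_fps: float) -> list[int]:
    """Pick evenly spaced frame indices so a src_fps video is stored at dst_fps.

    Hypothetical rates for illustration only; the OpenDV-YouTube scripts
    define the real sampling scheme used for the released dataset.
    """
    if dst_fps >= src_fps:
        return list(range(total_frames))
    step = src_fps / dst_fps
    return [round(i * step) for i in range(int(total_frames / step))]

# A 30 fps clip of 300 frames (10 s) kept at 10 fps -> 100 frames
idx = frames_to_keep(300, 30.0, 10.0)
print(len(idx), idx[:4])  # 100 [0, 3, 6, 9]
```

Storing decoded frames like this trades disk space (hence the ~24 TB figure above) for fast random access during training, which is why the preprocessing step exists at all.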
Quick facts:
- Task: large-scale video prediction for driving scenes.
- Data source: YouTube, with a careful collection and filtering process.
- Diversity highlights: 1700 hours of driving videos, covering more than 244 cities in 40 countries.
- Related work: GenAD (accepted at CVPR 2024, Highlight)

Note: Annotations for other public datasets in OpenDV-2K will not be released, since we randomly sampled a subset of them for training; the samples are incomplete and hard to trace back to their origins (i.e., file names). Nevertheless, it's easy to reproduce the collection and annotation process on your own by following our paper.
```bibtex
@inproceedings{yang2024genad,
  title={Generalized Predictive Model for Autonomous Driving},
  author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}
```
Vista
<div id="top" align="center"> <p align="center"> <img src="assets/vista-teaser.gif" width="1000px" > </p> </div>

Simulated futures in a wide range of driving scenarios by Vista. Best viewed on the demo page.
🌏 A Generalizable Driving World Model with High Fidelity and Versatile Controllability (NeurIPS 2024)
Quick facts:
- Introducing the world's first generalizable driving world model.
- Task: High-fidelity, action-conditioned, and long-horizon future prediction for driving scenes in the wild.
- Dataset: OpenDV-YouTube, nuScenes
- Code and model: https://github.com/OpenDriveLab/Vista
- Video Demo: https://vista-demo.github.io
- Related work: Vista, GenAD
```bibtex
@inproceedings{gao2024vista,
  title={Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability},
  author={Shenyuan Gao and Jiazhi Yang and Li Chen and Kashyap Chitta and Yihang Qiu and Andreas Geiger and Jun Zhang and Hongyang Li},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2024}
}

@inproceedings{yang2024genad,
  title={Generalized Predictive Model for Autonomous Driving},
  author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
```
DriveLM
Introducing the first benchmark on **Language Prompt for Driving**.

Quick facts:
- Task: given language prompts as input, predict the trajectory in the scene.
- Origin dataset: nuScenes, CARLA (to be released)
- Repo: https://github.com/OpenDriveLab/DriveLM, https://github.com/OpenDriveLab/ELM
- Related work: DriveLM, ELM
- Related challenge: Driving with Language AGC Challenge 2024
DriveData Survey
Abstract
With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem. In this survey, we provide a comprehensive analysis of more than 70 papers on the timeline, impact, challenges, and future trends of autonomous driving datasets.
Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future
- English Version
- Chinese Version
Accepted at SCIENTIA SINICA Informationis (Chinese version)
```bibtex
@article{li2024_driving_dataset_survey,
  title = {Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future},
  author = {Hongyang Li and Yang Li and Huijie Wang and Jia Zeng and Huilin Xu and Pinlong Cai and Li Chen and Junchi Yan and Feng Xu and Lu Xiong and Jingdong Wang and Futang Zhu and Chunjing Xu and Tiancai Wang and Fei Xia and Beipeng Mu and Zhihui Peng and Dahua Lin and Yu Qiao},
  journal = {SCIENTIA SINICA Informationis},
  year = {2024},
  doi = {10.1360/SSI-2023-0313}
}
```
<!-- > [Hongyang Li](https://lihongyang.info/)<sup>1</sup>, Yang Li<sup>1</sup>, [Huijie Wang](https://faikit.github.io/)<sup>1</sup>, [Jia Zeng](https://scholar.google.com/citations?user=kYrUfMoAAAAJ)<sup>1</sup>, Pinlong Cai<sup>1</sup>, Dahua Lin<sup>1</sup>, Junchi Yan<sup>2</sup>, Feng Xu<sup>3</sup>, Lu Xiong<sup>4</sup>, Jingdong Wang<sup>5</sup>, Futang Zhu<sup>6</sup>, Kai Yan<sup>7</sup>, Chunjing Xu<sup>8</sup>, Tiancai Wang<sup>9</sup>, Beipeng Mu<sup>10</sup>, Shaoqing Ren<sup>11</sup>, Zhihui Peng<sup>12</sup>, Yu Qiao<sup>1</sup>
>
> <sup>1</sup> Shanghai AI Lab, <sup>2</sup> Shanghai Jiao Tong University, <sup>3</sup> Fudan University, <sup>4</sup> Tongji University, <sup>5</sup> Baidu, <sup>6</sup> BYD, <sup>7</sup> Changan, <sup>8</sup> Huawei, <sup>9</sup> Megvii Technology, <sup>10</sup> Meituan, <sup>11</sup> Nio Automotive, <sup>12</sup> Agibot
> -->

Current autonomous driving datasets can broadly be categorized into two generations since the 2010s. We define the Impact (y-axis) of a dataset based on sensor configuration, input modality, task category, data