<div id="top" align="center">
  <p align="center">
    <img src="assets/images/repo/title_v2.jpg">
  </p>
</div>

> [!IMPORTANT]
> 🌟 Stay up to date at [opendrivelab.com](https://opendrivelab.com)!
# [ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
<!-- Download dataset [**HERE**](docs/data_prep_nus.md) (serves as the official source for `Autonomous Driving Challenge 2024`) -->

Autonomous Driving Challenge 2024 **Driving-with-Language** Leaderboard.
https://github.com/OpenDriveLab/DriveLM/assets/54334254/cddea8d6-9f6e-4e7e-b926-5afb59f8dce2
<!-- > above is new demo video. demo scene token: cc8c0bf57f984915a77078b10eb33198 -->

## Highlights <a name="highlight"></a>
🔥 We instantiate datasets (DriveLM-Data) built upon nuScenes and CARLA, and propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
<!-- 🔥 **The key insight** is that with our proposed suite, we obtain a suitable proxy task to mimic the human reasoning process during driving. -->

🏁 DriveLM serves as a main track in the CVPR 2024 Autonomous Driving Challenge. Everything you need for the challenge is HERE, including the baseline, test data, submission format, and evaluation pipeline!
## News <a name="news"></a>
- **[2025/01/08]** Drive-Bench released! An in-depth analysis of what DriveLM is really benchmarking. Take a look at arXiv.
- **[2024/07/16]** DriveLM official leaderboard reopened!
- **[2024/07/01]** DriveLM accepted to ECCV 2024! Congrats to the team!
- **[2024/06/01]** Challenge has ended! See the final leaderboard.
- **[2024/03/25]** Challenge test server is online and the test questions are released. Check it out!
- **[2024/02/29]** Challenge repo released: baseline, data and submission format, and evaluation pipeline. Have a look!
- **[2023/12/22]** DriveLM-nuScenes full `v1.0` and paper released.
- **[2023/08/25]** DriveLM-nuScenes demo released.
## Table of Contents
- Highlights
- Getting Started
- Current Endeavors and Future Directions
- TODO List
- DriveLM-Data
- License and Citation
- Other Resources
## Getting Started <a name="gettingstarted"></a>
To get started with DriveLM:
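Once the dataset is downloaded, a quick way to poke at DriveLM-nuScenes is to load the annotation JSON directly. The sketch below is a minimal example, not official usage: the file name is a placeholder, and the nesting (scene → `key_frames` → `QA` → category → `Q`/`A`) reflects the released v1.0 format as we understand it, so verify the exact keys against the data preparation docs.

```python
import json

# Placeholder file name -- see the data preparation docs for the real
# download location and file name.
with open("v1_0_train_nuscenes.json") as f:
    drivelm = json.load(f)

# Assumed nesting: scene token -> key frames -> QA categories; double-check
# these key names against the dataset documentation before relying on them.
scene_token, scene = next(iter(drivelm.items()))
frame_token, frame = next(iter(scene["key_frames"].items()))

for category in ("perception", "prediction", "planning", "behavior"):
    for qa in frame["QA"].get(category, []):
        print(f"[{category}] Q: {qa['Q']}")
        print(f"[{category}] A: {qa['A']}")
```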
<p align="right">(<a href="#top">back to top</a>)</p>

## Current Endeavors and Future Directions <a name="timeline"></a>
<p align="center"> <img src="assets/images/repo/drivelm_timeline_v3.jpg"> </p>
- The advent of GPT-style multimodal models in real-world applications motivates the study of the role of language in driving.
- Dates below reflect the arXiv submission date of each work.
- If there is any missing work, please reach out to us!
DriveLM attempts to address some of the challenges faced by the community.
- Lack of data: DriveLM-Data serves as a comprehensive benchmark for driving with language.
- Embodiment: GVQA provides a potential direction for embodied applications of LLMs / VLMs.
- Closed-loop: DriveLM-CARLA attempts to explore closed-loop planning with language.
## TODO List <a name="newsandtodolist"></a>
- [x] DriveLM-Data
- [x] DriveLM-nuScenes
- [x] DriveLM-CARLA
- [x] DriveLM-Metrics
- [x] GPT-score
- [ ] DriveLM-Agent
- [x] Inference code on DriveLM-nuScenes
- [ ] Inference code on DriveLM-CARLA
## DriveLM-Data <a name="drivelmdata"></a>
We facilitate the Perception, Prediction, Planning, Behavior, and Motion tasks with human-written reasoning logic as the connection between them, and propose the task of Graph Visual Question Answering (GVQA) on DriveLM-Data.
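As an illustration only (none of these class or field names come from the DriveLM codebase), the graph structure behind GVQA can be pictured as QA nodes linked by their logical prerequisites, so that an answer at a later stage is conditioned on the QAs it depends on:

```python
from dataclasses import dataclass, field

# Illustrative-only data structure for Graph VQA; DriveLM-Agent may
# represent the graph differently.
@dataclass
class QANode:
    stage: str      # "perception" | "prediction" | "planning" | "behavior" | "motion"
    question: str
    answer: str
    parents: list["QANode"] = field(default_factory=list)  # logical prerequisites

    def context(self) -> str:
        """Concatenate parent QAs so a VLM can condition on prior reasoning."""
        prior = "\n".join(f"Q: {p.question}\nA: {p.answer}" for p in self.parents)
        return f"{prior}\nQ: {self.question}" if prior else f"Q: {self.question}"

# A tiny two-edge chain: perception -> prediction -> planning.
percept = QANode("perception", "What objects are ahead?", "A pedestrian at the crosswalk.")
predict = QANode("prediction", "Will the pedestrian cross?", "Yes, likely.", parents=[percept])
plan = QANode("planning", "What should the ego vehicle do?", "Slow down and yield.", parents=[predict])
print(plan.context())
```

The chain above mirrors the Perception → Prediction → Planning ordering; in DriveLM-Data the dependency edges are written by human annotators rather than derived automatically.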
### 📊 Comparison and Stats <a name="comparison"></a>
DriveLM-Data is the first language-driving dataset facilitating the full stack of driving tasks with graph-structured logical dependencies.
<!-- <center>

| Language Dataset | Base Dataset | Language Form | Perspectives | Scale | Release? |
|:---------:|:-------------:|:-------------:|:------:|:--------------------------------------------:|:----------:|
| [BDD-X 2018](https://github.com/JinkyuKimUCB/explainable-deep-driving) | [BDD](https://bdd-data.berkeley.edu/) | Description | Perception & Reasoning | 8M frames, 20k text strings | **:heavy_check_mark:** |
| [HAD 2019](https://usa.honda-ri.com/had) | [HDD](https://usa.honda-ri.com/hdd) | Advice | Goal-oriented & stimulus-driven advice | 5,675 video clips, 45k text strings | **:heavy_check_mark:** |
| [DRAMA 2022](https://usa.honda-ri.com/drama) | - | Description | Perception & Planning results | 18k frames, 100k text strings | **:heavy_check_mark:** |
| [Rank2Tell 2023](https://arxiv.org/abs/2309.06597) | - | QA + Captions | Perception & Planning results | 5k frames | :x: |
| [nuScenes-QA 2023](https://arxiv.org/abs/2305.14836) |
-->