ResWorld
This is the implementation of the paper "ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving" (ICLR 2026)
Main Results
UniAD-style metrics
| Method | L2<sub>MAX</sub> (m) 1s | L2<sub>MAX</sub> (m) 2s | L2<sub>MAX</sub> (m) 3s | L2<sub>MAX</sub> (m) Avg. | CR<sub>MAX</sub> (%) 1s | CR<sub>MAX</sub> (%) 2s | CR<sub>MAX</sub> (%) 3s | CR<sub>MAX</sub> (%) Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResWorld | 0.19 | 0.50 | 1.08 | 0.59 | 0.02 | 0.06 | 0.43 | 0.17 |
VAD-style metrics
| Method | L2<sub>AVG</sub> (m) 1s | L2<sub>AVG</sub> (m) 2s | L2<sub>AVG</sub> (m) 3s | L2<sub>AVG</sub> (m) Avg. | CR<sub>AVG</sub> (%) 1s | CR<sub>AVG</sub> (%) 2s | CR<sub>AVG</sub> (%) 3s | CR<sub>AVG</sub> (%) Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResWorld | 0.14 | 0.27 | 0.49 | 0.30 | 0.01 | 0.03 | 0.14 | 0.06 |
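The two tables differ in how per-timestep errors are aggregated: UniAD-style metrics are commonly read at the exact horizon timestep, while VAD-style metrics average over all timesteps up to the horizon. A minimal sketch of the two aggregations, using hypothetical per-timestep L2 values (the real numbers come from the planner's predicted trajectories):

```python
import numpy as np

# Hypothetical per-timestep L2 errors (m) at 0.5 s intervals up to 3 s.
l2_per_step = np.array([0.10, 0.19, 0.33, 0.50, 0.78, 1.08])
horizons = {"1s": 2, "2s": 4, "3s": 6}  # number of 0.5 s steps per horizon

# UniAD-style: error at the exact horizon timestep.
uniad = {h: l2_per_step[i - 1] for h, i in horizons.items()}

# VAD-style: mean error over all timesteps up to the horizon.
vad = {h: l2_per_step[:i].mean() for h, i in horizons.items()}

print(uniad)
print(vad)
```

The same per-timestep predictions therefore yield smaller VAD-style numbers, which is why the two tables above are reported separately.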
Get Started
1. Please follow these steps to install ResWorld.
a. Create a conda virtual environment and activate it.
```shell
conda create -n resworld python=3.8 -y
conda activate resworld
```
b. Install PyTorch and torchvision following the official instructions.
```shell
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
```
c. Install mmcv-full, mmdet and mmseg.
```shell
pip install mmcv-full==1.4.0
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1
```
d. Install mmdet3d
```shell
git clone https://github.com/open-mmlab/mmdetection3d.git
cd /path/to/mmdetection3d
git checkout -f v0.17.1
python setup.py develop
```
e. Install nuscenes-devkit.
```shell
pip install nuscenes-devkit==1.1.9
```
2. Prepare the nuScenes dataset in the following folder structure:

```
ResWorld
├── data
│   ├── nuscenes
│   │   ├── lidarseg
│   │   ├── maps
│   │   ├── samples
│   │   ├── samples_point_label
│   │   ├── sweeps
│   │   ├── v1.0-test
│   │   ├── v1.0-trainval
│   │   ├── vad_nuscenes_infos_temporal_train.pkl
│   │   ├── vad_nuscenes_infos_temporal_val.pkl
```
a. Download nuScenes 3D detection data HERE and unzip all zip files.
b. Download the train file and val file generated by VAD.
c. Download the nuScenes-lidarseg annotations HERE and put them under data/nuscenes/. Create the depth labels used by GeoBEV from the point clouds by running:

```shell
python tools/generate_point_label.py
```
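At its core, generating depth labels from lidar means projecting the point cloud through the camera calibration onto the image plane to get a sparse depth map. A minimal sketch of that projection with hypothetical function and parameter names, not the repo's actual API:

```python
import numpy as np

def project_points_to_depth(points_cam, K, height, width):
    """Project 3D points in camera coordinates to a sparse depth map.

    points_cam: (N, 3) array in the camera frame (x right, y down, z forward).
    K: (3, 3) camera intrinsics. Returns an (height, width) map with depth
    in meters where a lidar point lands and 0 elsewhere.
    """
    # Keep only points in front of the camera.
    mask = points_cam[:, 2] > 1e-3
    pts = points_cam[mask]
    # Perspective projection: u = fx*x/z + cx, v = fy*y/z + cy.
    uv = (K @ pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.zeros((height, width), dtype=np.float32)
    # Write far points first so the nearest point wins per pixel.
    order = np.argsort(-pts[inside, 2])
    depth[v[inside][order], u[inside][order]] = pts[inside, 2][order]
    return depth
```

The real script additionally handles the lidar-to-camera extrinsics and writes the results under samples_point_label/.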
3. Train ResWorld model on nuScenes:
Download the backbones pretrained by GeoBEV HERE and put them under ckpts/. Then train the ResWorld model with:

```shell
bash tools/dist_train.sh projects/configs/resworld/resworld_config.py 4
```
4. Evaluate the ResWorld model with:

```shell
bash tools/dist_test.sh projects/configs/resworld/resworld_config.py work_dirs/resworld_config/epoch_12_ema.pth 4 --eval bbox
```
The trained model is available HERE.
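The evaluated checkpoint name `epoch_12_ema.pth` indicates the weights are an exponential moving average (EMA) of the training weights. A minimal sketch of the EMA update rule with per-parameter floats; the decay value here is a typical choice, not necessarily the one used by ResWorld:

```python
def ema_update(ema_params, model_params, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * current.

    Frameworks apply the same rule tensor-wise after every optimizer
    step; evaluating the EMA weights usually smooths out late-training
    noise in the raw weights.
    """
    return {name: decay * ema_params[name] + (1.0 - decay) * model_params[name]
            for name in ema_params}

# Usage: over many steps the EMA converges toward the (here constant) weights.
ema = {"w": 0.0}
for step in range(1000):
    ema = ema_update(ema, {"w": 1.0}, decay=0.99)
```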
Acknowledgement
This project would not be possible without many great open-source codebases, notably GeoBEV, VAD, UniAD, and mmdetection3d.
