UniOcc

This is the official implementation of UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving

UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving

License arXiv HuggingFace

Alternative: Google Drive Baidu

Autonomous driving researchers, have you ever been bothered that popular datasets each use a different format, and that standardizing them is a pain? Have you ever been frustrated by how hard it is just to understand the file semantics? The problem is even worse in the occupancy domain. UniOcc is here to help.

UniOcc is a unified framework for occupancy forecasting, single-frame occupancy prediction, and occupancy flow estimation in autonomous driving. By integrating multiple real-world (nuScenes, Waymo) and synthetic (CARLA, OpenCOOD) datasets, UniOcc enables multi-domain training, seamless cross-dataset evaluation, and robust benchmarking across diverse driving environments.

Yuping Wang<sup>1,2</sup>*, Xiangyu Huang<sup>3</sup>*, Xiaokang Sun<sup>1</sup>*, Mingxuan Yan<sup>1</sup>, Shuo Xing<sup>4</sup>, Zhengzhong Tu<sup>4</sup>, Jiachen Li<sup>1</sup>

<sup>1</sup>University of California, Riverside; <sup>2</sup>University of Michigan; <sup>3</sup>University of Wisconsin-Madison; <sup>4</sup>Texas A&M University


Supported Tasks

  • Occupancy Forecasting: Predict future 3D occupancy grids over time given historical occupancies or camera inputs.
  • Occupancy Prediction: Generate detailed 3D occupancy grids from camera inputs.
  • Flow Estimation: Provides forward and backward voxel-level flow fields for more accurate motion modeling and object tracking.
  • Multi-Domain Dataset Integration: Supports major autonomous driving datasets (nuScenes, Waymo, CARLA, etc.) with consistent annotation and evaluation pipelines.
  • Ground-Truth-Free Metrics: Beyond standard IoU, introduces shape and dimension plausibility checks for generative or multi-modal tasks.
  • Cooperative Autonomous Driving: Enables multi-agent occupancy fusion and forecasting, leveraging viewpoint diversity from multiple vehicles.

Pre-requisites

We simplify our benchmark so you only need:

  • Python 3.9 or higher
    pip install torch torchvision pillow tqdm numpy open3d shapely matplotlib scikit-learn
    
  • HuggingFace CLI
    pip install "huggingface_hub[cli]"
    

You do not need:

  • nuscenes-devkit
  • waymo-open-dataset
  • tensorflow

Dataset Download

The UniOcc dataset is available on HuggingFace. The size of each dataset is as follows:

| Dataset Name                         | Number of Scenes | Training Instances | Size (GB) |
|--------------------------------------|-----------------:|-------------------:|----------:|
| NuScenes-via-Occ3D-2Hz-mini          |               10 |                404 |       0.6 |
| NuScenes-via-OpenOccupancy-2Hz-mini  |                ~ |                  ~ |       0.4 |
| NuScenes-via-SurroundOcc-2Hz-mini    |                ~ |                  ~ |       0.4 |
| NuScenes-via-OpenOccupancy-2Hz-val   |              150 |              6,019 |       6.2 |
| NuScenes-via-Occ3D-2Hz-val           |                ~ |                  ~ |       9.1 |
| NuScenes-via-SurroundOcc-2Hz-val     |                ~ |                  ~ |       6.2 |
| NuScenes-via-Occ3D-2Hz-train         |              700 |             28,130 |      41.2 |
| NuScenes-via-OpenOccupancy-2Hz-train |                ~ |                  ~ |      28.3 |
| NuScenes-via-SurroundOcc-2Hz-train   |                ~ |                  ~ |      28.1 |
| Waymo-via-Occ3D-2Hz-mini             |               10 |                397 |      0.84 |
| Waymo-via-Occ3D-2Hz-val              |              200 |              8,069 |      15.4 |
| Waymo-via-Occ3D-2Hz-train            |              798 |             31,880 |      59.5 |
| Waymo-via-Occ3D-10Hz-mini            |               10 |              1,967 |       4.0 |
| Waymo-via-Occ3D-10Hz-val             |              200 |             39,987 |      74.4 |
| Waymo-via-Occ3D-10Hz-train           |              798 |            158,081 |     286.6 |
| Carla-2Hz-mini                       |                2 |                840 |       1.0 |
| Carla-2Hz-val                        |                4 |              2,500 |       2.9 |
| Carla-2Hz-train                      |               11 |              8,400 |       9.3 |
| Carla-10Hz-mini                      |                2 |              4,200 |       5.0 |
| Carla-10Hz-val                       |                4 |             12,500 |      15.0 |
| Carla-10Hz-train                     |               11 |             42,200 |      46.5 |
| OPV2V-10Hz-val                       |                9 |              8,035 |      23.5 |
| OPV2V-10Hz-train                     |               43 |             18,676 |      49.8 |
| OPV2V-10Hz-test                      |               16 |              3,629 |       9.6 |

To download a dataset, use the following commands (we recommend downloading only the folders you need):

huggingface-cli download tasl-lab/uniocc --include "NuScenes-via-Occ3D-2Hz-mini*" --repo-type dataset --local-dir ./datasets
huggingface-cli download tasl-lab/uniocc --include "Carla-2Hz-train*" --repo-type dataset --local-dir ./datasets
...
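
When several folders are needed, the same command can be generated in a loop. The sketch below only wraps the `huggingface-cli download` invocation shown above; the helper names `build_download_command` and `download_all` and the `FOLDERS` selection are illustrative and not part of UniOcc.

```python
import shlex
import subprocess

REPO_ID = "tasl-lab/uniocc"

# Hypothetical selection; pick folder names from the dataset table.
FOLDERS = ("NuScenes-via-Occ3D-2Hz-mini", "Carla-2Hz-mini")


def build_download_command(folder, local_dir="./datasets"):
    """Return the argv list that downloads one dataset folder."""
    return [
        "huggingface-cli", "download", REPO_ID,
        "--include", f"{folder}*",
        "--repo-type", "dataset",
        "--local-dir", local_dir,
    ]


def download_all(folders=FOLDERS, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for folder in folders:
        cmd = build_download_command(folder)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    download_all(dry_run=True)  # set dry_run=False to actually download
```

Running with `dry_run=True` first lets you check the commands against the sizes in the table before committing disk space.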

Contents

Inside each dataset, you will find the following files:

datasets
├── NuScenes-via-Occ3D-2Hz-mini
│   ├── scene_infos.pkl
│   ├── scene_001           <-- Scene Name
│   │   ├── 1.npz           <-- Time Step
│   │   ├── 2.npz
│   │   ├── ...
│   ├── scene_002
│   ...
├── OpenCOOD-via-OpV2V-10Hz-val
│   ├── scene_infos.pkl
│   ├── scene_001           <-- Scene Name
│   │   ├── 1061            <-- CAV ID
│   │   │   ├── 1.npz       <-- Time Step
│   │   │   ├── 2.npz
│   │   │   ├── ...
│   ├── scene_002
│   ...
  • scene_infos.pkl: A list of dictionaries, each containing the scene name, start and end frame, and other metadata.
  • scene_XXX: A directory containing the data for a single scenario.
  • YYY.npz: A NumPy file containing the following data for a single time step.
    • occ_label: A 3D occupancy grid (L x W x H) with semantic labels.
    • occ_mask_camera: A binary 3D grid (L x W x H), where 1 means the voxel is in the camera FOV and 0 means it is not.
    • occ_flow_forward: A 3D flow field (L x W x H x 3) whose vectors point from each voxel to its coordinate in the next frame. In the last frame, the flow is 0. The flow is measured in voxels (num_voxels).
    • occ_flow_backward: A 3D flow field (L x W x H x 3) whose vectors point from each voxel to its coordinate in the previous frame. In the first frame, the flow is 0. The flow is measured in voxels (num_voxels).
    • ego_to_world_transformation: A 4x4 transformation matrix from the ego vehicle to the world coordinate system.
    • cameras: A list of camera objects with intrinsic and extrinsic parameters.
      • name: The camera name (e.g. CAM_FRONT in nuScenes).
      • filename: The relative path to the camera image in the original data source (e.g. nuScenes).
      • intrinsics: A 3x3 intrinsic matrix.
      • extrinsics: A 4x4 extrinsic matrix from the camera to the ego vehicle's LiDAR.
    • annotations: A list of objects with bounding boxes and class labels.
      • token: The object token, consistent with the original data source.
      • agent_to_ego: A 4x4 transformation matrix from the object to the ego vehicle.
      • agent_to_world: A 4x4 transformation matrix from the object to the world coordinate system.
      • size: The size of the agent's bounding box in meters (length, width, height).
      • category_id: The object category (e.g. 1 for car, 4 for pedestrian).
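
As a rough illustration of the layout above, the sketch below writes a tiny synthetic time-step file with the documented array keys and reads it back with NumPy. The grid shape, file location, and all values are invented for the example. Real files also contain `cameras` and `annotations`, and passing `allow_pickle=True` to read those list-valued fields is an assumption.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for one time-step file; shape and values are invented.
L, W, H = 4, 4, 2
frame = {
    "occ_label": np.zeros((L, W, H), dtype=np.uint8),
    "occ_mask_camera": np.ones((L, W, H), dtype=np.uint8),
    "occ_flow_forward": np.zeros((L, W, H, 3), dtype=np.float32),
    "occ_flow_backward": np.zeros((L, W, H, 3), dtype=np.float32),
    "ego_to_world_transformation": np.eye(4),
}
# Place the ego vehicle at (10, 5, 0) in world coordinates.
frame["ego_to_world_transformation"][:3, 3] = [10.0, 5.0, 0.0]

path = os.path.join(tempfile.mkdtemp(), "1.npz")
np.savez(path, **frame)

# allow_pickle=True is assumed to be needed for list-valued fields in real files.
data = np.load(path, allow_pickle=True)
for key in data.files:
    print(key, data[key].shape, data[key].dtype)

# Map a point from ego coordinates to world coordinates (homogeneous form).
point_ego = np.array([1.0, 0.0, 0.0, 1.0])
point_world = data["ego_to_world_transformation"] @ point_ego
print(point_world[:3])
```

The same matrix product applies to `agent_to_ego` and `agent_to_world`, since they are also described as 4x4 transformations.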
<img src="figures/flow.png" alt="Alt Text" style="width:80%; height:auto;">

Note: we provide flow annotations for both dynamic voxels (agents) and static voxels (environment) in the scene.
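
To make the flow convention concrete, here is a small sketch that applies a forward flow field to occupied voxels. It assumes, based on the field descriptions, that adding a voxel's flow vector to its index gives its next-frame coordinate in voxel units. The toy grid, the `warp_forward` helper, and the rounding to the nearest voxel are choices made for this example only.

```python
import numpy as np

# Toy 5 x 5 x 1 grid with a single occupied voxel at index (1, 2, 0).
occ = np.zeros((5, 5, 1), dtype=np.uint8)
occ[1, 2, 0] = 1

# Forward flow: that voxel moves +2 voxels along the first axis.
flow_forward = np.zeros((5, 5, 1, 3), dtype=np.float32)
flow_forward[1, 2, 0] = [2.0, 0.0, 0.0]


def warp_forward(occ, flow):
    """Move each occupied voxel by its flow vector (nearest-voxel rounding)."""
    warped = np.zeros_like(occ)
    idx = np.argwhere(occ > 0)  # (N, 3) indices of occupied voxels
    target = np.rint(idx + flow[tuple(idx.T)]).astype(int)
    # Keep only targets that stay inside the grid.
    inside = np.all((target >= 0) & (target < np.array(occ.shape)), axis=1)
    src, dst = idx[inside], target[inside]
    warped[tuple(dst.T)] = occ[tuple(src.T)]
    return warped


next_occ = warp_forward(occ, flow_forward)
print(np.argwhere(next_occ > 0))
```

Voxels whose target falls outside the grid are dropped here; a real pipeline might handle them differently.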


Visualizing the Dataset

You can visualize the dataset using the provided uniocc_viz.py script. For example:

python uniocc_viz.py --file_path datasets/NuScenes-via-Occ3D-2Hz-mini/scene-0061/0.npz

In this script, we also provide an API to visualize any 3D occupancy grid, with or without a flow field.


Usage

Without Camera Images

If you only need the occupancy data, you can use the provided uniocc_dataset.py script to load the dataset.

import torch

from uniocc_dataset import UniOcc

# Synthetic (CARLA) mini split
dataset_carla_mini = UniOcc(
    data_root="datasets/Carla-2Hz-mini",
    obs_len=8,
    fut_len=12
)

# Real-world (nuScenes) mini split
dataset_nusc_mini = UniOcc(
    data_root="datasets/NuScenes-via-Occ3D-2Hz-mini",
    obs_len=8,
    fut_len=12
)

# Concatenate both splits into a single multi-domain dataset
dataset = torch.utils.data.ConcatDataset([dataset_carla_mini, dataset_nusc_mini])
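
The role of `obs_len` and `fut_len` can be pictured as a sliding window over a scene's time steps. The sketch below is not the UniOcc loader. It assumes both arguments count frames and that one sample is a contiguous run of `obs_len + fut_len` frames, which at 2 Hz would correspond to 4 s of history and 6 s of future. The `split_windows` helper and the 40-frame scene are hypothetical.

```python
def split_windows(num_frames, obs_len=8, fut_len=12):
    """Return (observed, future) frame-index lists for each sliding window."""
    total = obs_len + fut_len
    windows = []
    for start in range(num_frames - total + 1):
        observed = list(range(start, start + obs_len))
        future = list(range(start + obs_len, start + total))
        windows.append((observed, future))
    return windows


# Hypothetical 40-frame scene sampled at 2 Hz.
windows = split_windows(num_frames=40)
first_obs, first_fut = windows[0]
print(len(windows), first_obs[0], first_obs[-1], first_fut[0], first_fut[-1])

hz = 2
print(f"history: {8 / hz:.0f} s, horizon: {12 / hz:.0f} s")
```

Under this assumption, a scene shorter than `obs_len + fut_len` frames yields no samples, which is worth keeping in mind when choosing window lengths for the shorter splits.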

With Camera Images

To use camera images from nuScenes, Waymo, or OpV2V, you need to download them from the original datasets.
