Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
David Yifan Yao, Albert J. Zhai, Shenlong Wang
University of Illinois at Urbana-Champaign
CVPR 2025 (Highlight)
4D Reconstruction from a Single Casual Video

Setup
Uni4D relies on several visual foundation model repositories for preprocessing, included as submodules:
git clone --recursive https://github.com/Davidyao99/uni4d_dev.git
An installation script is provided and tested with CUDA Toolkit 12.1:
conda create -n uni4d python=3.10
conda activate uni4d
bash scripts/install.sh
Download model weights for the visual foundation models:
bash scripts/download_weights.sh
Our preprocessing calls GPT via the OpenAI API, which requires an account with credits (usage will be minimal). Find your API key in your OpenAI account settings and set it as an environment variable:
echo "OPENAI_API_KEY=sk-your_api_key_here" > .env
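As a quick sanity check that the key will be visible to the preprocessing scripts, you can parse the .env file yourself. This is a stdlib-only sketch; the actual scripts may load the file differently (e.g. via python-dotenv):

```python
import os
import tempfile

def load_dotenv(path=".env"):
    """Copy simple KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip()

# Self-contained demo with a throwaway file; from the repo root you would
# simply call load_dotenv(".env").
with tempfile.TemporaryDirectory() as d:
    env_path = os.path.join(d, ".env")
    with open(env_path, "w") as f:
        f.write("OPENAI_API_KEY=sk-your_api_key_here\n")
    load_dotenv(env_path)

key_ok = os.environ.get("OPENAI_API_KEY", "").startswith("sk-")
```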
Demo
We've included the lady-running sequence from DAVIS as a demonstration:
bash scripts/demo.sh
The output directory follows this structure:
data/demo/
└── lady-running/                  # Demo sequence name
    ├── rgb/                       # Original RGB images
    │   ├── 00000.jpg              # Sequential RGB frames
    │   └── ...
    │
    ├── deva/                      # DEVA model outputs
    │   ├── pred.json              # Prediction data
    │   ├── args.txt               # Arguments/parameters
    │   ├── Annotations/           # Annotation masks
    │   │   ├── 00000.png
    │   │   └── ...
    │   └── Visualizations/        # Visual results
    │       ├── 00000.jpg
    │       └── ...
    │
    ├── ram/                       # RAM model detection results
    │   └── tags.json              # Contains RAM detected classes, GPT output, and filtered classes
    │
    ├── unidepth/                  # Depth estimation results
    │   ├── depth_vis/             # Visualizations of depth maps
    │   │   ├── 00000.png
    │   │   └── ...
    │   ├── depth.npy              # Raw depth data
    │   └── intrinsics.npy         # Camera intrinsic parameters
    │
    ├── gsm2/                      # GSM2 model outputs
    │   ├── mask/                  # Segmentation masks
    │   │   ├── 00000.png          # Binary mask image
    │   │   ├── 00000.json         # Metadata/parameters
    │   │   └── ...
    │   └── vis/                   # Mask visualizations
    │       ├── 00000.jpg
    │       └── ...
    │
    ├── cotrackerv3_F_G/           # Cotracker outputs (F=frames, G=grid size)
    │   ├── results.npz            # Tracklet data
    │   └── vis/                   # Cotracker visualizations
    │       ├── 00000.png
    │       └── ...
    │
    └── uni4d/                     # Uni4D model outputs
        └── experiment_name/       # Experiment name
            ├── fused_4d.npz       # Fused 4D representation data
            ├── timer.txt          # Runtime for each stage
            ├── training_info.log  # Training logs
            └── *.npz              # Raw results
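After the demo finishes, the fused result can be inspected with NumPy. The path below follows the tree above; the archive's key names are not documented here, so this sketch simply lists whatever arrays it contains:

```python
import os
import tempfile
import numpy as np

def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    with np.load(path) as data:
        return {k: data[k].shape for k in data.files}

# Real usage after the demo run:
#   summarize_npz("data/demo/lady-running/uni4d/experiment_name/fused_4d.npz")
# Self-contained check with a stand-in archive (array names are placeholders):
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "fused_4d.npz")
    np.savez(path, points=np.zeros((10, 3)), colors=np.zeros((10, 3)))
    shapes = summarize_npz(path)
```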
Customization
Uni4D is modular, and any component can be swapped for the outputs of other visual foundation models. To use custom video depth estimates and dynamic masks, save them in the following format:
data/demo/
└── lady-running/                  # Demo sequence name
    ├── rgb/                       # Original RGB images
    │   ├── 00000.jpg              # Sequential RGB frames
    │   └── ...
    ├── custom_depth/              # Custom depth estimation results
    │   ├── depth.npy              # Raw depth data saved as F x 1 x H x W
    │   └── intrinsics.npy         # Camera intrinsics used to initialize Uni4D, stored as a 3x3 matrix
    ├── custom_segmentation/       # Custom segmentation results
    │   ├── Annotations/           # Annotation masks
    │   │   ├── 00000.png          # Tracklet id if 1-channel; unique RGB per tracklet if 3-channel
    ...
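Given the layout above, custom depth can be written with NumPy as follows. This is a sketch: the shapes come from the spec above, while the focal length, resolution, and depth values are placeholders for your own estimates:

```python
import os
import tempfile
import numpy as np

def save_custom_depth(out_dir, depth, intrinsics):
    """Write depth.npy (F x 1 x H x W) and intrinsics.npy (3 x 3) in the expected layout."""
    assert depth.ndim == 4 and depth.shape[1] == 1, "depth must be F x 1 x H x W"
    assert intrinsics.shape == (3, 3), "intrinsics must be a 3x3 matrix"
    os.makedirs(out_dir, exist_ok=True)
    np.save(os.path.join(out_dir, "depth.npy"), depth.astype(np.float32))
    np.save(os.path.join(out_dir, "intrinsics.npy"), intrinsics.astype(np.float32))

# Real usage: save_custom_depth("data/demo/lady-running/custom_depth", depth, K)
F, H, W = 4, 480, 640
depth = np.ones((F, 1, H, W), dtype=np.float32)  # placeholder depth maps
K = np.array([[525.0,   0.0, W / 2],
              [  0.0, 525.0, H / 2],
              [  0.0,   0.0,   1.0]])            # placeholder pinhole intrinsics

with tempfile.TemporaryDirectory() as d:
    save_custom_depth(os.path.join(d, "custom_depth"), depth, K)
    loaded = np.load(os.path.join(d, "custom_depth", "depth.npy"))
```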
When running the Uni4D optimization, pass the following arguments to use your custom preprocessed estimates:
python ./uni4d/run.py --config ./uni4d/config/config_demo.yaml --depth_net <custom_depth> --dyn_mask_dir <custom_segmentation>
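For the 3-channel Annotations variant, each tracklet id only needs a unique RGB colour. One simple collision-free encoding (illustrative, not necessarily the repo's own scheme) packs the id's bytes into the three channels; saving the resulting array as 00000.png (e.g. with Pillow) is left out to keep this NumPy-only:

```python
import numpy as np

def ids_to_rgb(mask):
    """Map an H x W array of integer tracklet ids to unique RGB colours (id 0 = background)."""
    rgb = np.zeros((*mask.shape, 3), dtype=np.uint8)
    rgb[..., 0] = mask & 0xFF          # low byte in R
    rgb[..., 1] = (mask >> 8) & 0xFF   # middle byte in G
    rgb[..., 2] = (mask >> 16) & 0xFF  # high byte in B
    return rgb

# Toy mask with four tracklet ids, including one above 255
mask = np.array([[0, 1],
                 [2, 300]], dtype=np.int32)
rgb = ids_to_rgb(mask)
```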
Evaluation
We provide scripts to prepare all datasets, run preprocessing, and run Uni4D to reproduce Tables 1 and 2 of our main paper.
Datasets
We provide scripts for downloading and preparing the datasets used in our paper (Bonn, TUM-Dynamics, Sintel, KITTI). Run:
bash ./scripts/prepare_datasets.sh
Preprocessing
You can either preprocess all datasets from scratch or download the preprocessed data we used for the quantitative results in our paper:
Option 1: Preprocess from scratch
bash ./scripts/prepare_datasets.sh
Option 2: Download our preprocessed data
cd data
wget https://uofi.box.com/shared/static/jkbbb0u2oacubd2qhquoweqx9xecydmi.zip -O preprocessed.zip
unzip preprocessed.zip
Running Uni4D
bash ./scripts/run_experiment.sh
Metrics Evaluation
We provide evaluation scripts for pose and depth metrics:
bash ./scripts/eval.sh
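For reference, the depth side of such evaluations typically includes the absolute relative error. The sketch below is the textbook formula, not necessarily the exact implementation inside eval.sh:

```python
import numpy as np

def abs_rel(pred, gt, mask=None):
    """Mean of |pred - gt| / gt over valid (positive ground-truth) pixels."""
    if mask is None:
        mask = gt > 0
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

# Toy example: uniform ground truth at 2.0 m, prediction off by 0.2 m
gt = np.full((4, 4), 2.0)
pred = np.full((4, 4), 2.2)
err = abs_rel(pred, gt)  # |2.2 - 2.0| / 2.0 = 0.1
```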
Acknowledgements
Our codebase is based on CasualSAM. Our evaluation and dataset preparation are based on MonST3R. Our preprocessing relies on Tracking-Anything-with-DEVA, Grounded-Sam-2, CotrackerV3, Unidepth, and Recognize-Anything. We thank the authors for their excellent work!
