VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
<a href="https://arxiv.org/pdf/2601.05138"><img src='https://img.shields.io/badge/arXiv-Paper-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a> <a href="https://github.com/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/GitHub-Code-blue?style=flat&logo=GitHub' alt='github'></a> <a href="https://huggingface.co/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/Hugging Face-ckpts-orange?style=flat&logo=HuggingFace&logoColor=orange' alt='huggingface'></a> <a href="https://sixiaozheng.github.io/VerseCrafter_page/"><img src='https://img.shields.io/badge/Project-Page-Green' alt='GitHub'></a>
<p align="center"> <a href="https://sixiaozheng.github.io/">Sixiao Zheng</a><sup>1,2</sup> <a href="#">Minghao Yin</a><sup>3</sup> <a href="https://wbhu.github.io/">Wenbo Hu</a><sup>4†</sup> <a href="https://xiaoyu258.github.io/">Xiaoyu Li</a><sup>4</sup> <a href="https://www.linkedin.com/in/YingShanProfile">Ying Shan</a><sup>4</sup> <a href="https://yanweifu.github.io/">Yanwei Fu</a><sup>1,2†</sup> </p>
<p align="center"> <sup>1</sup>Fudan University <sup>2</sup>Shanghai Innovation Institute <sup>3</sup>HKU <sup>4</sup>ARC Lab, Tencent PCG </p>
<p align="center"> <sup>†</sup>Corresponding authors </p>
<p align="center"> <b>CVPR 2026</b> </p>

✨ A controllable video world model with explicit 4D geometric control over camera and multi-object motion.
🔥 News
- [Feb 21, 2026] 🎉 VerseCrafter is accepted to CVPR 2026!
- [Jan 9, 2026] 🚀 VerseCrafter is released! We publish the arXiv preprint, inference code, and model checkpoints.
✅ TODO
- [x] Inference code
- [ ] Training code
- [ ] Data processing code
TL;DR
- Dynamic Realistic Video World Model: VerseCrafter learns a realistic and controllable video world prior from large-scale in-the-wild data, handling challenging dynamic scenes with strong spatial-temporal coherence.
- 4D Geometric Control: A unified 4D control state provides direct, interpretable control over camera motion, multi-object motion, and their joint coordination, improving geometric faithfulness.
- Frozen Video Prior + GeoAdapter: We attach a geometry-aware GeoAdapter to a frozen Wan2.1 backbone, injecting 4D controls into diffusion blocks for precise control without sacrificing video quality.
- VerseControl4D Dataset: We introduce a large-scale real-world dataset with automatically rendered camera trajectories and multi-object 3D Gaussian trajectories to supervise 4D controllable generation.
Installation
1. Clone the repository:

   ```bash
   git clone --recursive https://github.com/TencentARC/VerseCrafter.git
   # If you have already cloned the repo, you can update the submodules manually:
   git submodule update --init --recursive
   cd VerseCrafter
   ```

2. Create and activate the Conda environment:

   ```bash
   conda create -n versecrafter python=3.11 -y
   conda activate versecrafter

   # Install PyTorch
   conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y

   # Install Python dependencies
   pip install -r requirements.txt

   # Install MoGe
   pip install git+https://github.com/microsoft/MoGe.git

   # Install Grounded-SAM-2
   cd third_party/Grounded-SAM-2
   pip install -e .
   pip install --no-build-isolation -e grounding_dino

   # Install flash attention
   pip install flash-attn --no-build-isolation

   # Install pytorch3d
   cd ../../
   git clone https://github.com/facebookresearch/pytorch3d.git
   cd pytorch3d
   pip install --no-build-isolation .
   cd ../VerseCrafter
   ```
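After installation, a quick sanity check can confirm that the GPU stack and the optional extensions installed above are importable. This is a minimal sketch, not part of the repo; the `moge` module name is our assumption based on the MoGe package.

```python
# sanity_check.py -- minimal post-install check (not part of the repo)
import importlib

import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device: {torch.cuda.get_device_name(0)}")

# Optional extras installed above; report which ones import cleanly.
# Module names (notably "moge") are assumptions, adjust if your install differs.
for name in ("torchvision", "pytorch3d", "flash_attn", "moge"):
    try:
        mod = importlib.import_module(name)
        print(f"{name}: OK ({getattr(mod, '__version__', 'unknown version')})")
    except ImportError as err:
        print(f"{name}: MISSING ({err})")
```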
Download Checkpoints
1. Download VerseCrafter and Wan2.1 models:

   ```bash
   pip install --upgrade huggingface_hub
   mkdir -p model
   hf download --local-dir model/VerseCrafter sxzheng/VerseCrafter
   hf download --local-dir model/Wan2.1-T2V-14B Wan-AI/Wan2.1-T2V-14B
   ```

2. Download Grounded-SAM-2 and Grounding DINO checkpoints:

   ```bash
   cd third_party/Grounded-SAM-2/checkpoints
   bash download_ckpts.sh
   cd ../gdino_checkpoints
   bash download_ckpts.sh
   cd ../../../
   ```
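Optionally, you can verify the downloads before running inference. The sketch below only checks that each target directory exists and is non-empty, since the exact file lists inside may change across releases.

```python
# check_ckpts.py -- rough download check (not part of the repo)
from pathlib import Path

expected_dirs = [
    "model/VerseCrafter",
    "model/Wan2.1-T2V-14B",
    "third_party/Grounded-SAM-2/checkpoints",
    "third_party/Grounded-SAM-2/gdino_checkpoints",
]

for d in expected_dirs:
    path = Path(d)
    n_files = sum(1 for p in path.rglob("*") if p.is_file()) if path.is_dir() else 0
    status = "OK" if n_files > 0 else "MISSING / EMPTY"
    print(f"{d}: {status} ({n_files} files)")
```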
Usage
We provide two ways to use VerseCrafter:
| Method | Description | Pros | Cons |
|--------|-------------|------|------|
| Blender Addon | Deploy API server on GPU machine, call models directly from Blender | One-stop workflow, no context switching, visual trajectory editing | Requires network access to GPU server |
| Script Pipeline | Run each step manually via command line | Works offline, full control over each step | Requires manual switching between terminal and Blender |
💡 Tip: We recommend the Blender Addon for most users. It supports proxy authentication for secure server access. If you cannot connect to a remote GPU server, use the Script Pipeline instead.
Option 1: Blender Addon (Recommended)


For detailed instructions, see README_BLENDER.md.
Prerequisites
- Blender 4.0+ (4.5+ recommended)
- A remote GPU server running the VerseCrafter API
Quick Start
1. Install the addon:

   ```bash
   cd VerseCrafter
   zip -r blender_addon.zip blender_addon/
   ```

   In Blender: Edit → Preferences → Add-ons → ↓ → Install from Disk... → Select `blender_addon.zip` → Enable "VerseCrafter Workflow"

2. Start the API server (on the GPU server):

   ```bash
   python api_server.py --port 8188 --num_gpus 8
   ```

3. Configure the connection in Blender:

   - Press `N` to open the sidebar → VerseCrafter tab
   - Set Server URL (e.g., `http://<server-ip>:8188`)
   - Click Test Connection (a command-line reachability sketch follows this list)

4. Run the workflow:

   - Step 1: Select input image, set workflow directory, enter object prompt (e.g., "person . car ."), click "Run Preprocessing"
   - Step 2: Edit camera and object trajectories visually, click "Export Trajectories"
   - Step 3: Enter video prompt, click "Generate Video"
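If you want to check from the command line that the Server URL configured in step 3 is reachable, the sketch below uses only the Python standard library. The addon's actual health-check endpoint may differ; any HTTP response here only proves that the host and port answer at all.

```python
# ping_server.py -- rough reachability check for the VerseCrafter API server
# (a sketch; the endpoint used by the addon's "Test Connection" button may differ)
import sys
import urllib.error
import urllib.request

server_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:8188"

try:
    with urllib.request.urlopen(server_url, timeout=5) as resp:
        print(f"Server reachable: HTTP {resp.status}")
except urllib.error.HTTPError as err:
    # Any HTTP response (even 404) still proves the server is up.
    print(f"Server reachable: HTTP {err.code}")
except (urllib.error.URLError, TimeoutError) as err:
    print(f"Cannot reach {server_url}: {err}")
```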
Option 2: Script Pipeline
The inference.sh script provides a complete pipeline for generating videos. You can run the steps individually or use the script as a reference.
1. Configuration
Edit inference.sh to set your input image, output directory, and prompt.
```bash
INPUT_IMAGE=demo_data/y57HgqX1uGc_0039750_0041550_0000635_0000716/0001.jpg
OUTPUT_DIR=demo_data/y57HgqX1uGc_0039750_0041550_0000635_0000716
MODEL_PATH="model/VerseCrafter"
```
2. Run the Pipeline
The pipeline consists of the following steps:
Step 1: Depth Estimation
Generate depth maps using MoGe-2.

```bash
python inference/moge-v2_infer.py -i $INPUT_IMAGE -o $OUTPUT_DIR/estimated_depth --maps
```
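To inspect what this step wrote (for example the `depth_intrinsics.npz` archive consumed by Step 3), a quick NumPy peek is enough. This sketch makes no assumption about the exact array keys; it simply lists whatever the archive contains.

```python
# peek_depth.py -- list the arrays written by the depth estimation step
import sys

import numpy as np

npz_path = sys.argv[1]  # e.g. $OUTPUT_DIR/estimated_depth/depth_intrinsics.npz
with np.load(npz_path) as data:
    for key in data.files:
        arr = data[key]
        print(f"{key}: shape={arr.shape}, dtype={arr.dtype}")
```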
Step 2: Segmentation
Segment objects using Grounded-SAM-2.
```bash
python inference/grounded_sam2_infer.py \
    --image_path "$INPUT_IMAGE" \
    --text_prompt "person . car ." \
    --output_dir "$OUTPUT_DIR/object_mask" \
    --min_area_ratio 0.003 \
    --max_area_ratio 0.2
```
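The `--min_area_ratio` and `--max_area_ratio` flags filter detections by mask area relative to the whole image. To see which exported masks would pass those thresholds, a small check like the sketch below works; it assumes the masks are saved as binary PNGs in the `masks/` directory that Step 3 reads.

```python
# mask_area.py -- report each mask's area ratio (sketch, assumes binary PNG masks)
from pathlib import Path

import numpy as np
from PIL import Image

# Same directory that Step 3 consumes via --masks_dir.
masks_dir = Path("demo_data/y57HgqX1uGc_0039750_0041550_0000635_0000716/object_mask/masks")
min_ratio, max_ratio = 0.003, 0.2

for mask_path in sorted(masks_dir.glob("*.png")):
    mask = np.array(Image.open(mask_path).convert("L")) > 0
    ratio = mask.mean()  # fraction of foreground pixels in the image
    verdict = "kept" if min_ratio <= ratio <= max_ratio else "filtered"
    print(f"{mask_path.name}: area ratio {ratio:.4f} ({verdict})")
```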
Step 3: Fit 3D Gaussian
Fit 3D Gaussians to the segmented objects.
```bash
python inference/fit_3D_gaussian.py \
    --image_path $INPUT_IMAGE \
    --npz_path $OUTPUT_DIR/estimated_depth/depth_intrinsics.npz \
    --masks_dir $OUTPUT_DIR/object_mask/masks \
    --output_dir $OUTPUT_DIR/fitted_3D_gaussian
```
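Conceptually, fitting a 3D Gaussian to an object means back-projecting its masked pixels into 3D with the estimated depth and camera intrinsics, then taking the mean and covariance of those points as the Gaussian's center and shape. The repo's `fit_3D_gaussian.py` defines the actual I/O format; the sketch below only illustrates the idea with illustrative argument names.

```python
# gaussian_from_mask.py -- conceptual sketch of fitting one 3D Gaussian
# (illustrative only; the repo's fit_3D_gaussian.py is the actual implementation)
import numpy as np

def fit_gaussian(depth: np.ndarray, K: np.ndarray, mask: np.ndarray):
    """depth: (H, W) metric depth, K: (3, 3) intrinsics, mask: (H, W) bool."""
    v, u = np.nonzero(mask)              # pixel coordinates inside the mask
    z = depth[v, u]
    valid = z > 0
    u, v, z = u[valid], v[valid], z[valid]
    # Back-project pixels to camera-space 3D points.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    points = np.stack([x, y, z], axis=1)  # (N, 3)
    mean = points.mean(axis=0)            # Gaussian center
    cov = np.cov(points, rowvar=False)    # Gaussian shape (3x3 covariance)
    return mean, cov
```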
The table below shows an example input image and its corresponding intermediate results:
| Input Image | Depth Map | Segmentation Mask | 3D Gaussian |
|-------------|-----------|-------------------|-------------|
|  |  |  |  |
Step 4: Customize Trajectory (Manual Operation in Blender)
This step requires Blender to interactively edit the 4D control scene. We also provide a demonstration video that shows step-by-step Blender operations for this process:
Watch the Blender operation video here
1. Prepare Scripts:

   - Open `inference/blender_script/build_4d_control_scene.py` and `inference/blender_script/export_blender_custom_trajectories.py`.
   - Crucial: Update the `ROOT_DIR` variable in both scripts to the absolute path of your input directory (e.g., `/absolute/path/to/demo_data/your_folder`).

2. Build Scene:

   - Open Blender.
   - Go to the Scripting tab.
   - Open or paste the content of `build_4d_control_scene.py`.
   - Run the script to load the scene (point cloud, camera, objects).

3. Customize Trajectories:

   - Switch to the Layout tab.
   - Camera Trajectory (a scripted equivalent is sketched after this list):
     - Create a curve (e.g., `Shift+A` → Curve → Bezier).
     - Switch to Edit Mode to draw or adjust the curve.
     - Select the Camera, add a Follow Path constraint targeting the curve.
     - Check Fixed Position.
     - Set the animation duration to 81 frames.
   - 3D Gaussian (Object) Trajectory:
     - Select the object (Ellipsoid).
     - Use the same Follow Path method as the camera, or insert Keyframes (`I` key) for location/rotation/scale.

4. Export Trajectories:

   - Go back to the Scripting tab.
   - Open or paste the content of `export_blender_custom_trajectories.py`.
   - Run the script to export `custom_camera_trajectory.npz` and `custom_3D_gaussian_trajectory.json`.
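For reference, the manual camera setup in step 3 can also be scripted from Blender's Scripting tab. The sketch below reproduces the Follow Path steps with the `bpy` API; the object and curve names are assumptions for illustration, and the repo's own scripts remain the reference.

```python
# Scripted version of the manual "Follow Path" camera setup above.
# Object names ("Camera", "BezierCurve") are whatever your scene uses -- adjust as needed.
import bpy

camera = bpy.data.objects["Camera"]
curve = bpy.data.objects["BezierCurve"]

constraint = camera.constraints.new(type="FOLLOW_PATH")
constraint.target = curve
constraint.use_fixed_location = True   # equivalent to checking "Fixed Position"

# Animate the camera from the start to the end of the curve over 81 frames.
scene = bpy.context.scene
scene.frame_start, scene.frame_end = 1, 81
constraint.offset_factor = 0.0
constraint.keyframe_insert(data_path="offset_factor", frame=1)
constraint.offset_factor = 1.0
constraint.keyframe_insert(data_path="offset_factor", frame=81)
```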
This is an animation of custom trajectories in Blender:

