[SIGGRAPH 2025] Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

<a href='https://arxiv.org/abs/2501.03847'><img src='https://img.shields.io/badge/arXiv-2501.03847-b31b1b.svg'></a>   <a href='https://igl-hkust.github.io/das/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>   <a href='https://huggingface.co/EXCAI/Diffusion-As-Shader'>HuggingFace Model</a>   HuggingFace Spaces


NEWS:

  • Jun 5, 2025: We released our script and Blender project for creating synthetic datasets.

  • Jun 2, 2025: We added inference code based on Wan2.1Fun 1.3B fine-tuning to the Wanfun branch.

  • Apr 2, 2025: Added functionality for complex and precise camera control for videos, based on VGGT. The --override_extrinsics hyperparameter can be adjusted to append or override camera motion in videos.

  • Apr 1, 2025: Added support for cotracker.

  • Feb 17, 2025: We uploaded a validation dataset to Google Drive, containing 4 tasks.

Quickstart

Create environment

  1. Clone the repository and create a conda environment:

    git clone https://github.com/IGL-HKUST/DiffusionAsShader.git
    cd DiffusionAsShader
    conda create -n das python=3.10
    conda activate das
    
  2. Install PyTorch; we recommend PyTorch 2.5.1 with CUDA 11.8:

    pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118
    
  3. Make sure the submodules and requirements are installed:

    mkdir -p submodules
    git submodule update --init --recursive
    pip install -r requirements.txt
    

    If the submodules were not fetched automatically, download them manually into submodules/ by running:

    # MoGe
    git clone https://github.com/microsoft/MoGe.git submodules/MoGe
    # VGGT
    git clone https://github.com/facebookresearch/vggt.git submodules/vggt
    
  4. Manually download these checkpoints to checkpoints/:

    • SpatialTracker checkpoint: Google Drive.
    • Our Diffusion as Shader checkpoint: https://huggingface.co/EXCAI/Diffusion-As-Shader

Inference

The inference code was tested on

  • Ubuntu 20.04
  • Python 3.10
  • PyTorch 2.5.1
  • 1 NVIDIA H800 with CUDA version 11.8. (32GB GPU memory is sufficient for generating videos with our code.)

We provide an inference script for our tasks; you can run the demo.py script directly as follows. We also provide a validation dataset on Google Drive for our 4 tasks; run scripts/evaluate_DaS.sh to evaluate the performance of our model.

We also release a Gradio interface for our tasks. You can run the webui.py script directly as follows.

python webui.py --gpu <gpu_id>

Or you can run these tasks one by one as follows.

1. Motion Transfer

<table border="1"> <tr> <th>Original</th> <th>Object Replacement</th> <th>Style Transfer</th> </tr> <tr> <td><img src="assets/videos/rocket1.gif" alt="Original"></td> <td><img src="assets/videos/rocket2.gif" alt="Object Replacement"></td> <td><img src="assets/videos/rocket3.gif" alt="Style Transfer"></td> </tr> <tr> <th>Original</th> <th>Character Replacement 1</th> <th>Character Replacement 2</th> </tr> <tr> <td><img src="assets/videos/roof_girl.gif" alt="Original"></td> <td><img src="assets/videos/roof_anime.gif" alt="Character Replacement 1"></td> <td><img src="assets/videos/roof_robot.gif" alt="Character Replacement 2"></td> </tr> </table> (The above results must be generated with the assistance of FLUX.1 Kontext.)
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path <model_path> \ # checkpoint path (e.g. checkpoints/Diffusion-As-Shader)
    --output_dir <output_dir> \ # output directory
    --input_path <input_path> \ # the reference video path
    --repaint <true/repaint_path> \ # path to a repainted first frame of the source video, or 'true' to repaint the first frame with FLUX
    --gpu <gpu_id> \ # the gpu id

2. Camera Control

<table border="1"> <tr> <th>Arc Right + Zoom out</th> <th>Arc Left + Zoom out</th> <th>Arc Up + Zoom out</th> </tr> <tr> <td><img src="assets/videos/panright+out.gif" alt="Pans Right + Zoom out"></td> <td><img src="assets/videos/panleft+out.gif" alt="Pans Left + Zoom out"></td> <td><img src="assets/videos/panup+out.gif" alt="Pans Up + Zoom out"></td> </tr> <tr> <th>Pans Left + Yaw Left</th> <th>Static</th> <th>Zoom out</th> </tr> <tr> <td><img src="assets/videos/car_panright.gif" alt="Pans Right"></td> <td><img src="assets/videos/car_static.gif" alt="Static"></td> <td><img src="assets/videos/car_zoomout.gif" alt="Zoom out"></td> </tr> </table>

We provide several template camera motion types; you can choose one of them. In practice, we find that also describing the camera motion in the prompt gives better results.

python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path <model_path> \ # checkpoint path (e.g. checkpoints/Diffusion-As-Shader)
    --output_dir <output_dir> \ # output directory
    --input_path <input_path> \ # the reference image or video path
    --camera_motion <camera_motion> \ # the camera motion type, see examples below
    --tracking_method <tracking_method> \ # the tracking method (moge, spatracker, cotracker). For image input, 'moge' is necessary.
    --override_extrinsics <override/append> \ # how to apply camera motion: "override" to replace original camera, "append" to build upon it
    --gpu <gpu_id> \ # the gpu id

Here are some tips for camera motion:

  • trans: translation motion, the camera will move in the direction of the vector (dx, dy, dz) with range [-1, 1]
    • Positive X: Move left, Negative X: Move right
    • Positive Y: Move down, Negative Y: Move up
    • Positive Z: Zoom in, Negative Z: Zoom out
    • e.g., 'trans -0.1 -0.1 -0.1' moving right, up and zooming out
    • e.g., 'trans -0.1 0.0 0.0 5 45' moving right 0.1 from frame 5 to 45
  • rot: rotation motion, the camera will rotate around the axis (x, y, z) by the angle
    • X-axis rotation: positive X: pitch down, negative X: pitch up
    • Y-axis rotation: positive Y: yaw left, negative Y: yaw right
    • Z-axis rotation: positive Z: roll counter-clockwise, negative Z: roll clockwise
    • e.g., 'rot y 25' rotating 25 degrees around y-axis (yaw left)
    • e.g., 'rot x -30 10 40' rotating -30 degrees around x-axis (pitch up) from frame 10 to 40
  • spiral: spiral motion, the camera will move in a spiral path with the given radius
    • e.g., 'spiral 2' spiral motion with radius 2
    • e.g., 'spiral 2 15 35' spiral motion with radius 2 from frame 15 to 35

Multiple transformations can be combined using semicolon (;) as separator:

  • e.g., "trans 0 0 -0.5 0 30; rot x -25 0 30; trans -0.1 0 0 30 48" This will:
    1. Zoom out (z -0.5) from frame 0 to 30
    2. Pitch up (rotate -25 degrees around x-axis) from frame 0 to 30
    3. Move right (x-0.1) from frame 30 to 48

Notes:

  • Frame range is 0-48 (49 frames in total)
  • If start_frame and end_frame are not specified, the motion will be applied to all frames (0-48)
  • Frames after end_frame will maintain the final transformation
  • For combined transformations, they are applied in sequence
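The semantics above (linear application over [start_frame, end_frame], holding the final transformation afterwards, and sequential composition of semicolon-separated parts) can be sketched as a small helper that turns a motion string into 49 per-frame 4x4 pose matrices. This is an illustrative numpy sketch under assumed sign conventions, not the repository's actual implementation (the real pose construction lives in demo.py), and it handles only trans and rot:

```python
import numpy as np

NUM_FRAMES = 49  # frames 0-48

def _rot(axis, deg):
    """4x4 homogeneous rotation about the x, y, or z axis by deg degrees."""
    r = np.radians(deg)
    c, s = np.cos(r), np.sin(r)
    m = np.eye(4)
    i, j = {"x": (1, 2), "y": (0, 2), "z": (0, 1)}[axis]
    m[i, i], m[i, j], m[j, i], m[j, j] = c, -s, s, c
    return m

def camera_poses(spec):
    """Turn e.g. 'trans 0 0 -0.5 0 30; rot x -25 0 30' into a list of
    49 per-frame 4x4 pose matrices."""
    poses = [np.eye(4) for _ in range(NUM_FRAMES)]
    for part in spec.split(";"):
        tok = part.split()
        kind = tok[0]
        if kind == "trans":
            vec = np.array([float(v) for v in tok[1:4]])
            rng = tok[4:6]
        else:  # "rot"
            axis, deg = tok[1], float(tok[2])
            rng = tok[3:5]
        start, end = (int(rng[0]), int(rng[1])) if rng else (0, NUM_FRAMES - 1)
        for f in range(NUM_FRAMES):
            # progress ramps 0 -> 1 over [start, end]; frames after `end`
            # keep t = 1, i.e. they hold the final transformation
            t = min(max((f - start) / (end - start), 0.0), 1.0)
            step = np.eye(4)
            if kind == "trans":
                step[:3, 3] = t * vec
            else:
                step = _rot(axis, t * deg)
            poses[f] = step @ poses[f]  # combined parts apply in sequence
    return poses
```

For instance, camera_poses("trans 0 0 -0.5 0 30; rot x -25 0 30; trans -0.1 0 0 30 48") applies the three transformations of the combined example above in sequence.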

3. Object Manipulation

<table border="1"> <tr> <th>Move Left</th> <th>Move Up</th> <th>Rotate</th> </tr> <tr> <td><img src="assets/videos/move left.gif" alt="Move Left"></td> <td><img src="assets/videos/move up.gif" alt="Move Up"></td> <td><img src="assets/videos/rotate.gif" alt="Rotate"></td> </tr> </table>

We provide several template object manipulation types; you can choose one of them. In practice, we find that also describing the object motion in the prompt gives better results.

python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path <model_path> \ # checkpoint path (e.g. checkpoints/Diffusion-As-Shader)
    --output_dir <output_dir> \ # output directory
    --input_path <input_path> \ # the reference image path
    --object_motion <object_motion> \ # the object motion type (up, down, left, right)
    --object_mask <object_mask_path> \ # the object mask path
    --tracking_method <tracking_method> \ # the tracking method (moge, spatracker). For image input, 'moge' is necessary.
    --gpu <gpu_id> \ # the gpu id

Or you can create your own object motion and camera motion as follows, replacing the related code in demo.py:

  1. object motion
    dict: Motion dictionary containing:
      - mask (torch.Tensor): Binary mask for the selected object
      - motions (torch.Tensor): Per-frame motion matrices [49, 4, 4] (49 frames, 4x4 homogeneous object motion matrices)
    
  2. camera motion
    list: CameraMotion list containing:
      - camera_motion (list): Per-frame camera pose matrices [49, 4, 4] (49 frames, 4x4 homogeneous camera pose matrices)
    
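As a concrete illustration of the two structures above, the following sketch builds a motion dictionary and a camera pose list of the described shapes. The helper names and the linear-drift motion are assumptions for illustration; the repository works with torch.Tensor (convert with torch.from_numpy):

```python
import numpy as np

NUM_FRAMES = 49  # the model generates 49 frames (0-48)

def make_object_motion(mask, direction=(0.1, 0.0, 0.0)):
    """Build a motion dictionary: a binary object mask plus per-frame
    4x4 homogeneous motion matrices of shape [49, 4, 4]."""
    motions = np.tile(np.eye(4), (NUM_FRAMES, 1, 1))
    for f in range(NUM_FRAMES):
        # accumulate the translation linearly so the object drifts over time
        motions[f, :3, 3] = (f / (NUM_FRAMES - 1)) * np.asarray(direction)
    return {"mask": mask.astype(bool), "motions": motions}

def make_camera_motion():
    """Build a per-frame camera pose list of shape [49, 4, 4] (static camera)."""
    return [np.eye(4) for _ in range(NUM_FRAMES)]
```

A non-identity camera path can be produced the same way by composing per-frame rotations and translations into each pose matrix.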
