# Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

[SIGGRAPH 2025]
<a href='https://arxiv.org/abs/2501.03847'><img src='https://img.shields.io/badge/arXiv-2501.03847-b31b1b.svg'></a>
<a href='https://igl-hkust.github.io/das/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>

## News

- **Jun 5, 2025:** We released our script and Blender project for creating synthetic datasets.
- **Jun 2, 2025:** We added inference code based on Wan2.1-Fun 1.3B fine-tuning to the `Wanfun` branch.
- **Apr 2, 2025:** We added complex and precise camera control for videos, based on VGGT. The `--override_extrinsics` hyperparameter can be set to append to or override the camera motion in videos.
- **Apr 1, 2025:** We added support for CoTracker.
- **Feb 17, 2025:** We uploaded a validation dataset containing 4 tasks to Google Drive.
## Quickstart

### Create environment

- Clone the repository and create the conda environment:

  ```shell
  git clone https://github.com/IGL-HKUST/DiffusionAsShader.git
  cd DiffusionAsShader
  conda create -n das python=3.10
  conda activate das
  ```
- Install PyTorch; we recommend PyTorch 2.5.1 with CUDA 11.8:

  ```shell
  pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118
  ```
- Make sure the submodules and requirements are installed:

  ```shell
  mkdir -p submodules
  git submodule update --init --recursive
  pip install -r requirements.txt
  ```

  If the submodules were not fetched, download them manually and move them to `submodules/`:

  ```shell
  # MoGe
  git clone https://github.com/microsoft/MoGe.git submodules/MoGe
  # VGGT
  git clone https://github.com/facebookresearch/vggt.git submodules/vggt
  ```
- Manually download these checkpoints to `checkpoints/`:
  - SpatialTracker checkpoint: Google Drive.
  - Our Diffusion as Shader checkpoint: https://huggingface.co/EXCAI/Diffusion-As-Shader
## Inference

The inference code was tested on:

- Ubuntu 20.04
- Python 3.10
- PyTorch 2.5.1
- 1× NVIDIA H800 with CUDA 11.8 (32 GB of GPU memory is sufficient for generating videos with our code)
We provide an inference script for our tasks; you can run the `demo.py` script directly as shown below. We also provide a validation dataset on Google Drive covering our 4 tasks; run `scripts/evaluate_DaS.sh` to evaluate the model's performance.

We also release a Gradio interface for our tasks. Launch it with:

```shell
python webui.py --gpu <gpu_id>
```

Or run the tasks one by one as follows.
### 1. Motion Transfer
<table border="1"> <tr> <th>Original</th> <th>Object Replacement</th> <th>Style Transfer</th> </tr> <tr> <td><img src="assets/videos/rocket1.gif" alt="Original"></td> <td><img src="assets/videos/rocket2.gif" alt="Object Replacement"></td> <td><img src="assets/videos/rocket3.gif" alt="Style Transfer"></td> </tr> <tr> <th>Original</th> <th>Character Replacement 1</th> <th>Character Replacement 2</th> </tr> <tr> <td><img src="assets/videos/roof_girl.gif" alt="Original"></td> <td><img src="assets/videos/roof_anime.gif" alt="Character Replacement 1"></td> <td><img src="assets/videos/roof_robot.gif" alt="Character Replacement 2"></td> </tr> </table>

(The above results must be generated with the assistance of FLUX.1 Kontext.)

```shell
python demo.py \
    --prompt <"prompt text"> \        # prompt text
    --checkpoint_path <model_path> \  # checkpoint path (e.g. checkpoints/Diffusion-As-Shader)
    --output_dir <output_dir> \       # output directory
    --input_path <input_path> \       # reference video path
    --repaint <True/repaint_path> \   # path to a repainted first frame of the source video, or True to repaint it with FLUX
    --gpu <gpu_id>                    # GPU id
```
### 2. Camera Control
<table border="1"> <tr> <th>Arc Right + Zoom out</th> <th>Arc Left + Zoom out</th> <th>Arc Up + Zoom out</th> </tr> <tr> <td><img src="assets/videos/panright+out.gif" alt="Pans Right + Zoom out"></td> <td><img src="assets/videos/panleft+out.gif" alt="Pans Left + Zoom out"></td> <td><img src="assets/videos/panup+out.gif" alt="Pans Up + Zoom out"></td> </tr> <tr> <th>Pans Left + Yaw Left</th> <th>Static</th> <th>Zoom out</th> </tr> <tr> <td><img src="assets/videos/car_panright.gif" alt="Pans Right"></td> <td><img src="assets/videos/car_static.gif" alt="Static"></td> <td><img src="assets/videos/car_zoomout.gif" alt="Zoom out"></td> </tr> </table>

We provide several template camera motion types; you can choose one of them. In practice, we find that describing the camera motion in the prompt yields better results.

```shell
python demo.py \
    --prompt <"prompt text"> \                # prompt text
    --checkpoint_path <model_path> \          # checkpoint path (e.g. checkpoints/Diffusion-As-Shader)
    --output_dir <output_dir> \               # output directory
    --input_path <input_path> \               # reference image or video path
    --camera_motion <camera_motion> \         # camera motion type, see examples below
    --tracking_method <tracking_method> \     # tracking method (moge, spatracker, cotracker); for image input, 'moge' is necessary
    --override_extrinsics <override/append> \ # how to apply camera motion: "override" replaces the original camera, "append" builds upon it
    --gpu <gpu_id>                            # GPU id
```
Here are some tips for camera motion:

- `trans`: translation; the camera moves along the vector (dx, dy, dz), each component in the range [-1, 1]
  - Positive X: move left; negative X: move right
  - Positive Y: move up; negative Y: move down
  - Positive Z: zoom out; negative Z: zoom in
  - e.g. `trans -0.1 -0.1 -0.1` moves right, down, and zooms in
  - e.g. `trans -0.1 0.0 0.0 5 45` moves right by 0.1 from frame 5 to 45
- `rot`: rotation; the camera rotates around the given axis (x, y, or z) by the given angle
  - X-axis rotation: positive X: pitch down; negative X: pitch up
  - Y-axis rotation: positive Y: yaw left; negative Y: yaw right
  - Z-axis rotation: positive Z: roll counter-clockwise; negative Z: roll clockwise
  - e.g. `rot y 25` rotates 25 degrees around the y-axis (yaw left)
  - e.g. `rot x -30 10 40` rotates -30 degrees around the x-axis (pitch up) from frame 10 to 40
- `spiral`: spiral motion; the camera moves along a spiral path with the given radius
  - e.g. `spiral 2` spiral motion with radius 2
  - e.g. `spiral 2 15 35` spiral motion with radius 2 from frame 15 to 35
Multiple transformations can be combined using a semicolon (`;`) as the separator:

- e.g. `trans 0 0 -0.5 0 30; rot x -25 0 30; trans -0.1 0 0 30 48`

This will:

- zoom in (z -0.5) from frame 0 to 30
- pitch up (rotate -25 degrees around the x-axis) from frame 0 to 30
- move right (x -0.1) from frame 30 to 48
Notes:

- The frame range is 0-48 (49 frames in total)
- If start_frame and end_frame are not specified, the motion is applied across all frames (0-48)
- Frames after end_frame maintain the final transformation
- Combined transformations are applied in sequence
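To make these rules concrete, here is a minimal sketch of a parser for such motion strings. This is not the repository's implementation: the function names, the linear ramp between start and end frames, and the left-multiplied composition order are assumptions, and `spiral` is omitted for brevity. It only illustrates the "one 4x4 homogeneous transform per frame" interpretation described above.

```python
import math
import numpy as np

NUM_FRAMES = 49  # frames 0-48

def _trans(dx, dy, dz):
    m = np.eye(4)
    m[:3, 3] = [dx, dy, dz]
    return m

def _rot(axis, deg):
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    m = np.eye(4)
    if axis == "x":
        m[1:3, 1:3] = [[c, -s], [s, c]]
    elif axis == "y":
        m[0, 0], m[0, 2], m[2, 0], m[2, 2] = c, s, -s, c
    else:  # z
        m[:2, :2] = [[c, -s], [s, c]]
    return m

def camera_poses(motion_str):
    """Parse e.g. 'trans 0 0 -0.5 0 30; rot x -25 0 30' into 49 4x4 poses."""
    poses = [np.eye(4) for _ in range(NUM_FRAMES)]
    for part in motion_str.split(";"):
        tokens = part.split()
        if tokens[0] == "trans":
            dx, dy, dz = map(float, tokens[1:4])
            frames, make = tokens[4:], lambda t: _trans(dx * t, dy * t, dz * t)
        elif tokens[0] == "rot":
            axis, deg = tokens[1], float(tokens[2])
            frames, make = tokens[3:], lambda t: _rot(axis, deg * t)
        else:
            raise ValueError(f"unsupported motion: {tokens[0]}")
        start, end = (int(frames[0]), int(frames[1])) if frames else (0, NUM_FRAMES - 1)
        for f in range(NUM_FRAMES):
            # ramp linearly over [start, end]; frames past end hold the final transform
            if f >= end:
                t = 1.0
            elif f <= start:
                t = 0.0
            else:
                t = (f - start) / (end - start)
            poses[f] = make(t) @ poses[f]  # combined motions applied in sequence
    return poses
```

For instance, `camera_poses("trans -0.1 0 0 5 45")` ramps the x-translation from 0 at frame 5 to -0.1 at frame 45 and holds it through frame 48.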
### 3. Object Manipulation
<table border="1"> <tr> <th>Move Left</th> <th>Move Up</th> <th>Rotate</th> </tr> <tr> <td><img src="assets/videos/move left.gif" alt="Move Left"></td> <td><img src="assets/videos/move up.gif" alt="Move Up"></td> <td><img src="assets/videos/rotate.gif" alt="Rotate"></td> </tr> </table>

We provide several template object manipulation types; you can choose one of them. In practice, we find that describing the object motion in the prompt yields better results.

```shell
python demo.py \
    --prompt <"prompt text"> \            # prompt text
    --checkpoint_path <model_path> \      # checkpoint path (e.g. checkpoints/Diffusion-As-Shader)
    --output_dir <output_dir> \           # output directory
    --input_path <input_path> \           # reference image path
    --object_motion <object_motion> \     # object motion type (up, down, left, right)
    --object_mask <object_mask_path> \    # object mask path
    --tracking_method <tracking_method> \ # tracking method (moge, spatracker); for image input, 'moge' is necessary
    --gpu <gpu_id>                        # GPU id
```
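The `--object_mask` flag expects a binary mask image marking the object in the first frame. One minimal way to produce such a mask from a bounding box is sketched below; the image size, box coordinates, and the 0/255 convention are illustrative assumptions, and any tool that outputs a black-and-white mask of the object works just as well.

```python
import numpy as np

def box_mask(height, width, top, left, bottom, right):
    """Return a uint8 mask that is 255 inside the box and 0 elsewhere."""
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[top:bottom, left:right] = 255
    return mask

# Hypothetical example: mark a region of a 480x720 first frame, then save
# it with Pillow as the file passed to --object_mask:
#   from PIL import Image
#   Image.fromarray(box_mask(480, 720, 100, 200, 300, 500)).save("object_mask.png")
```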
Or you can create your own object motion and camera motion as follows and replace the related code in demo.py:

- Object motion — a `dict` containing:
  - `mask` (torch.Tensor): binary mask for the selected object
  - `motions` (torch.Tensor): per-frame motion of shape [49, 4, 4] (49 frames, one 4x4 homogeneous object-motion matrix per frame)
- Camera motion — a `list` containing:
  - `camera_motion` (list): per-frame camera poses of shape [49, 4, 4] (49 frames, one 4x4 homogeneous camera-pose matrix per frame)
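As a concrete sketch of these two structures (using NumPy arrays for illustration; convert with `torch.from_numpy` when wiring into demo.py — the function names and the per-frame drift value are invented for this example):

```python
import numpy as np

NUM_FRAMES = 49

def make_object_motion(mask, dx_per_frame=0.01):
    """Motion dict: binary object mask plus one 4x4 homogeneous
    transform per frame, sliding the object steadily along +x."""
    motions = np.tile(np.eye(4), (NUM_FRAMES, 1, 1))  # shape (49, 4, 4)
    motions[:, 0, 3] = dx_per_frame * np.arange(NUM_FRAMES)  # cumulative x offset
    return {"mask": mask.astype(bool), "motions": motions}

def make_static_camera():
    """Camera motion list: one 4x4 homogeneous pose per frame (here static)."""
    return [np.eye(4) for _ in range(NUM_FRAMES)]
```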