# PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation (ECCV 2024)
This repository contains the PyTorch implementation of the ECCV 2024 paper *PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation*. PhysGen is a training-free image-to-video generation pipeline that integrates rigid-body physical simulation with a generative video diffusion prior.
## Overview

## 📄 Table of Contents
- Installation
- Colab Notebook
- Quick Demo
- Perception
- Simulation
- Rendering
- All-in-One command
- Evaluation
- Custom Image Video Generation
- Citation
## Installation

- Clone this repository:

  ```bash
  git clone --recurse-submodules https://github.com/stevenlsw/physgen.git
  cd physgen
  ```

- Install the requirements:

  ```bash
  conda create -n physgen python=3.9
  conda activate physgen
  pip install -r requirements.txt
  ```
## Colab Notebook
Run our Colab notebook for quick start!
## Quick Demo

- Run the image-space dynamics simulation in just 3 seconds, with no GPU, no display device, and no additional setup required!

  ```bash
  export PYTHONPATH=$(pwd)
  name="pool"
  python simulation/animate.py --data_root data --save_root outputs --config data/${name}/sim.yaml
  ```

- The output video is saved to `outputs/${name}/composite.mp4`. Set `name` to `domino`, `balls`, `pig_ball`, or `car` to explore the other scenes. Example outputs are shown below:

| Input Image | Simulation | Output Video |
|:---------------:|:--------------:|:----------------:|
| <img src="data/pool/original.png" alt="Pool Original Image" width="200"> | <img src="assets/pool_sim.gif" alt="Pool Simulation GIF" width="200"> | <img src="assets/pool_composite.gif" alt="Pool Composite GIF" width="200"> |
| <img src="data/domino/original.png" alt="Domino Original Image" width="200"> | <img src="assets/domino_sim.gif" alt="Domino Simulation GIF" width="200"> | <img src="assets/domino_composite.gif" alt="Domino Composite GIF" width="200"> |
## Perception

- Please see perception/README.md for details.

| Input | Segmentation | Normal | Albedo | Shading | Inpainting |
|:---------:|:----------------:|:----------:|:----------:|:-----------:|:--------------:|
| <img src="data/pig_ball/original.png" alt="input" width="100"/> | <img src="data/pig_ball/vis.png" alt="segmentation" width="100"/> | <img src="data/pig_ball/intermediate/normal_vis.png" alt="normal" width="100"/> | <img src="data/pig_ball/intermediate/albedo_vis.png" alt="albedo" width="100"/> | <img src="data/pig_ball/intermediate/shading_vis.png" alt="shading" width="100"/> | <img src="data/pig_ball/inpaint.png" alt="inpainting" width="100"/> |
## Simulation

- Simulation requires the following inputs for each image:

  ```
  image folder/
  ├── original.png
  ├── mask.png     # segmentation mask
  ├── inpaint.png  # background inpainting
  ├── sim.yaml     # simulation configuration file
  ```

- `sim.yaml` specifies the physical properties of each object and the initial conditions (force and speed applied to each object). Please see `data/pig_ball/sim.yaml` for an example. Set `display` to `true` to visualize the simulation process on a display device, and set `save_snapshot` to `true` to save simulation snapshots.

- Run the simulation:

  ```bash
  cd simulation
  python animate.py --data_root ../data --save_root ../outputs --config ../data/${name}/sim.yaml
  ```

- The outputs are saved in `outputs/${name}` as follows:

  ```
  output folder/
  ├── history.pkl    # simulation history
  ├── composite.mp4  # composite video
  ├── composite.pt   # composite video tensor
  ├── mask_video.pt  # foreground masked video tensor
  ├── trans_list.pt  # objects transformation list tensor
  ```
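To give a feel for what a scene configuration might contain, here is a hedged sketch. Only `display` and `save_snapshot` are mentioned above; every other field name is illustrative, not the actual schema — consult `data/pig_ball/sim.yaml` for the real file.

```yaml
# Hypothetical sketch of a simulation config; field names below
# (objects, mass, init_velocity, ...) are illustrative only.
display: false        # set true to visualize on a display device
save_snapshot: true   # save simulation snapshots
objects:
  - name: ball
    mass: 1.0
    elasticity: 0.8
    friction: 0.5
    init_velocity: [2.0, 0.0]   # initial speed on this object
    init_force: [0.0, 0.0]      # initial force on this object
```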
## Rendering

### Relighting

- Relighting requires the following inputs:

  ```
  image folder/
  ├── normal.npy   # normal map
  ├── shading.npy  # shading map from intrinsic decomposition

  previous output folder/
  ├── composite.pt   # composite video tensor
  ├── mask_video.pt  # foreground masked video tensor
  ├── trans_list.pt  # objects transformation list tensor
  ```

- `perception_input` is the image folder containing the perception results. `previous_output` is the output folder from the previous simulation step.

- Run the relighting:

  ```bash
  cd relight
  python relight.py --perception_input ../data/${name} --previous_output ../outputs/${name}
  ```

- The outputs `relight.mp4` and `relight.pt` are the relighted video and its tensor.

- Comparison between the composite video and the relighted video:

| Input Image | Composite Video | Relight Video |
|:---------------:|:-------------------:|:-----------------:|
| <img src="data/pig_ball/original.png" alt="Original Input Image" width="200"/> | <img src="assets/pig_ball_composite.gif" alt="Pig Ball Composite GIF" width="200"/> | <img src="assets/pig_ball_relight.gif" alt="Pig Ball Relight GIF" width="200"/> |
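The relighting step uses the normal and shading maps from intrinsic decomposition. As a minimal, self-contained sketch of the underlying idea only — Lambertian shading recombined with albedo under a chosen light direction, not the repository's actual implementation — consider:

```python
import numpy as np

def lambertian_shading(normals, light_dir):
    """Per-pixel Lambertian shading max(0, n . l).
    normals: (H, W, 3) unit normal map; light_dir: (3,) light direction."""
    l = light_dir / np.linalg.norm(light_dir)
    return np.clip(normals @ l, 0.0, None)

def relight(albedo, normals, light_dir):
    """Recompose an image as albedo * shading (intrinsic image model)."""
    shading = lambertian_shading(normals, light_dir)
    return albedo * shading[..., None]

# Toy example: a flat surface facing the camera, lit head-on.
H, W = 4, 4
normals = np.zeros((H, W, 3))
normals[..., 2] = 1.0                      # all normals point toward +z
albedo = np.full((H, W, 3), 0.5)
out = relight(albedo, normals, np.array([0.0, 0.0, 1.0]))
```

Because every normal faces the light here, the shading is 1 everywhere and the relit image equals the albedo.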
### Video Diffusion Rendering

- Download the SEINE model (install git-lfs beforehand):

  ```bash
  mkdir -p diffusion/SEINE/pretrained
  git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 diffusion/SEINE/pretrained/stable-diffusion-v1-4
  wget -P diffusion/SEINE/pretrained https://huggingface.co/Vchitect/SEINE/resolve/main/seine.pt
  ```

- Video diffusion rendering requires the following inputs:

  ```
  image folder/
  ├── original.png  # input image
  ├── sim.yaml      # simulation configuration file (optional)

  previous output folder/
  ├── relight.pt     # relighted video tensor
  ├── mask_video.pt  # foreground masked video tensor
  ```

- Run the video diffusion rendering:

  ```bash
  cd diffusion
  python video_diffusion.py --perception_input ../data/${name} --previous_output ../outputs/${name}
  ```

  `denoise_strength` and `prompt` can be adjusted in the script above. `denoise_strength` controls the amount of noise added: 0 means no denoising, while 1 denoises from scratch, producing large variation from the input image. `prompt` is the input prompt for the video diffusion model; by default we use the foreground object names from the perception model.

- The output `final_video.mp4` is the rendered video.

- Comparison between the relighted video and the diffusion-rendered video:

| Input Image | Relight Video | Final Video |
|:--------------------------------------:|:--------------------------------------------:|:--------------------------------------------:|
| <img src="data/car/original.png" alt="Original Input Image" width="200"/> | <img src="assets/car_relight.gif" alt="Car Relight GIF" width="200"/> | <img src="assets/car_final.gif" alt="Car Final GIF" width="200"/> |
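Conceptually, `denoise_strength` behaves like SDEdit-style partial denoising: the input video is noised up to an intermediate diffusion step and then denoised from there. A hedged sketch of that mapping (the function names and the 50-step default below are illustrative, not the repository's API):

```python
def sdedit_start_step(denoise_strength, num_steps=50):
    """Map denoise_strength in [0, 1] to the diffusion step to start from.

    0 -> start at step 0, i.e. the input passes through nearly unchanged;
    1 -> start from pure noise, giving maximum variation from the input.
    """
    assert 0.0 <= denoise_strength <= 1.0, "denoise_strength must be in [0, 1]"
    return int(round(denoise_strength * num_steps))
```

Intermediate values trade fidelity to the simulated video against the diffusion model's freedom to clean up compositing artifacts.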
## All-in-One Command

We integrate simulation, relighting, and video diffusion rendering into a single script. Please follow the Video Diffusion Rendering section above to download the SEINE model first.

```bash
bash scripts/run_demo.sh ${name}
```
## Evaluation

We compare ours against the open-source image-to-video models DynamiCrafter, I2VGen-XL, and SEINE, as well as collected ground-truth reference videos (GT), in Sec. 4.3 of the paper.

- Install pytorch-fid:

  ```bash
  pip install pytorch-fid
  ```

- Download the evaluation data from here for all comparisons and unzip it into the `evaluation` directory. Choose `${method name}` from `DynamiCrafter`, `I2VGen-XL`, `SEINE`, and `ours`.

- Evaluate image FID:

  ```bash
  python -m pytorch_fid evaluation/${method name}/all evaluation/GT/all
  ```

- Evaluate motion FID:

  ```bash
  python -m pytorch_fid evaluation/${method nam
  ```
