[NeurIPS 2024] DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos

DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
Carnegie Mellon University
Wen-Hsuan Chu*, Lei Ke*, Katerina Fragkiadaki

[Project Page] | [Paper]

<table> <tr> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/demo-1.gif" width="99%"></td> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/demo-2.gif" width="99%"></td> </tr> </table>

News and Todos

  • (07.07.2024) v1 of the code has been released!
  • (TODO) Release scripts for obtaining evaluation data splits on DAVIS and for reproducing results.

Installation

We recommend using conda to create separate Python environments.

# Create new conda env
conda create -n dreamscene4d python=3.8.18
conda activate dreamscene4d

# Install PyTorch
# Any version from 2.0 to 2.3 should work; higher versions are not guaranteed to work
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia

# Install customized diffusers
pip install ./diffusers

# Install other dependencies
pip install -r requirements.txt

# simple-knn and nvdiffrast
pip install ./simple-knn
pip install git+https://github.com/NVlabs/nvdiffrast/

# GMFlow and Gaussian Splatting

# IMPORTANT: Download weights manually to gmflow/pretrained/gmflow_kitti-285701a8.pth from https://drive.google.com/file/d/1d5C5cgHIxWGsFR1vYs5XrQbbUiZl9TX2/view

# You can check the GMFlow readme for more detailed instructions
git submodule update --init --recursive
# Install modified gaussian splatting (+ depth, alpha rendering)
pip install ./diff-gaussian-rasterization
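
The installation notes above have two easy-to-miss requirements: a PyTorch version in the 2.0 to 2.3 range and a manually downloaded GMFlow checkpoint. The following is a minimal sanity-check sketch, not part of the repository. The helper names (`torch_version_in_range`, `check_setup`) are hypothetical, and the script assumes it is run from the repository root.

```python
import re
from pathlib import Path

# Checkpoint location quoted in the installation notes above.
GMFLOW_WEIGHTS = Path("gmflow/pretrained/gmflow_kitti-285701a8.pth")


def torch_version_in_range(version, low=(2, 0), high=(2, 3)):
    """Return True if a version string such as '2.2.0+cu118' lies in [low, high]."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    major_minor = (int(match.group(1)), int(match.group(2)))
    return low <= major_minor <= high


def check_setup(repo_root="."):
    """Collect human-readable problems; an empty list means both checks passed."""
    problems = []
    try:
        import torch  # imported lazily so this helper loads without PyTorch
        if not torch_version_in_range(torch.__version__):
            problems.append(f"PyTorch {torch.__version__} is outside the 2.0-2.3 range")
    except ImportError:
        problems.append("PyTorch is not installed in this environment")
    if not (Path(repo_root) / GMFLOW_WEIGHTS).is_file():
        problems.append(f"Missing GMFlow weights at {GMFLOW_WEIGHTS}")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("WARNING:", problem)
```

Running it prints one warning per failed check and nothing when both checks pass.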

Demo Run

We provide two ways to run our model: a fully automatic script that executes each stage sequentially, and stage-by-stage scripts. The stage-by-stage scripts allow you to visualize the outputs of each stage to make sure nothing is broken before you run the next one. We also include some parsed videos that can be used for quick demos. If you want to run your own videos, please take a look at the Data Format section to see what is required.

For demo 1 and demo 2, we will demonstrate the usage of the fully automatic scripts.

Demo 1

<table> <tr> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/dogs-jump_composed_no_orbit.gif" width="99%"></td> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/dogs-jump_composed_hor_orbit.gif" width="99%"></td> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/dogs-jump_composed_elev_orbit.gif" width="99%"></td> </tr> </table> This example contains a man playing with two dogs, where parts of the person undergo occlusions. We have provided the inpainted frames already. To run the fully automatic script on the inpainted video sequence, use:
# This version loads the per-object inpainted frames. Recommended for videos with occlusions.
# When using custom videos, you should first check if off-the-shelf inpainting methods give reasonable results.

python run_with_inpaint.py --lite --data_dir ./data/JPEGImages/dogs-jump --mask_dir ./data/Annotations/dogs-jump --inpainted_dir ./data/InpaintedImages/dogs-jump --inpainted_mask_dir ./data/InpaintedMasks/dogs-jump --save_name dogs-jump
  • --lite: (Optional) Runs the script with fewer optimization steps. Trade-off between speed and performance.
  • --data_dir: Directory where the RGB frames are saved.
  • --mask_dir: Directory where the mask annotations are saved.
  • --inpainted_dir: Directory where the inpainted RGB frames are saved.
  • --inpainted_mask_dir: Directory where the inpainted mask annotations are saved.
  • --save_name: Identifier name for the saved files.

You can also run python run_with_inpaint.py --help to see other optional arguments along with their descriptions.

Demo 2

<table> <tr> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/stroller_composed_no_orbit.gif" width="99%"></td> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/stroller_composed_hor_orbit.gif" width="99%"></td> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/stroller_composed_elev_orbit.gif" width="99%"></td> </tr> </table> This example contains two objects (a person pushing a stroller). To run the fully automatic script on the original video sequence, use:
# This version loads the original un-inpainted frames. Recommended for videos without occlusions.
python run_no_inpaint.py --lite --data_dir ./data/JPEGImages/stroller --mask_dir ./data/Annotations/stroller --save_name stroller
  • --lite: (Optional) Runs the script with fewer optimization steps. Trade-off between speed and performance.
  • --data_dir: Directory where the RGB frames are saved.
  • --mask_dir: Directory where the mask annotations are saved.
  • --save_name: Identifier name for the saved files.

As in demo 1, you can run python run_no_inpaint.py --help to see other optional arguments along with their descriptions.
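
The two fully automatic invocations above differ only in the script name and the two inpainting directories. The sketch below assembles either command from a sequence name. It assumes your data follows the same `./data/JPEGImages`, `./data/Annotations`, `./data/InpaintedImages`, and `./data/InpaintedMasks` layout as the bundled demos. The function `build_auto_command` is a hypothetical helper, not part of the repository.

```python
import shlex


def build_auto_command(sequence, inpainted, lite=True, data_root="./data"):
    """Build the argument list for one of the fully automatic scripts.

    Assumes the directory layout used by the bundled demo sequences.
    """
    script = "run_with_inpaint.py" if inpainted else "run_no_inpaint.py"
    cmd = ["python", script]
    if lite:
        cmd.append("--lite")  # fewer optimization steps, faster run
    cmd += [
        "--data_dir", f"{data_root}/JPEGImages/{sequence}",
        "--mask_dir", f"{data_root}/Annotations/{sequence}",
    ]
    if inpainted:
        cmd += [
            "--inpainted_dir", f"{data_root}/InpaintedImages/{sequence}",
            "--inpainted_mask_dir", f"{data_root}/InpaintedMasks/{sequence}",
        ]
    cmd += ["--save_name", sequence]
    return cmd


if __name__ == "__main__":
    # Print the commands for the two demo sequences shown above.
    print(shlex.join(build_auto_command("dogs-jump", inpainted=True)))
    print(shlex.join(build_auto_command("stroller", inpainted=False)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` if you prefer to launch the run from Python.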


For demo 3 and demo 4, we will demonstrate the usage of the stage-by-stage optimization scripts. This is solely for explanation purposes; you can use the fully automatic scripts to produce the same results.

Demo 3

<table> <tr> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/rollerblade_composed_no_orbit.gif" width="99%"></td> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/rollerblade_composed_hor_orbit.gif" width="99%"></td> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/rollerblade_composed_elev_orbit.gif" width="99%"></td> </tr> </table> This example contains a person jumping on rollerblades. To run the stage-by-stage scripts on the video sequence, use:
# For custom videos, you need to run stage 1 and stage 2 for every object of interest in the video,
# incrementing the numeric suffix of save_path by 1 for each object (e.g. XXX_1, XXX_2, ...).

# Stage 1: 3D Gaussian optimization
python main.py --config configs/image.yaml input=./data/JPEGImages/rollerblade/00000.png input_mask=./data/Annotations/rollerblade/001/00000.png outdir=./gaussians visdir=./vis save_path=rollerblade_1

# Stage 2: Deformation optimization. For fast results, try iters=(30 * number of frames).
# On custom videos, you can try iters=(50 * number of frames) or even iters=(100 * number of frames)
# if the results from fewer optimization iterations don't look good.
python main_4d.py --config configs/4d.yaml iters=1000 input=./data/JPEGImages/rollerblade input_mask=./data/Annotations/rollerblade/001 outdir=./gaussians visdir=./vis save_path=rollerblade_1

# Stage 3: Scene composition
python main_4d_compose.py --config configs/4d.yaml input=./data/JPEGImages/rollerblade input_mask=[./data/Annotations/rollerblade/001/00000.png] outdir=./gaussians visdir=./vis save_path=rollerblade
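
The Stage 2 comments above suggest scaling `iters` with the number of frames (30, 50, or 100 iterations per frame). A minimal sketch of that arithmetic follows. The preset labels, the accepted image suffixes, and the assumption that every image in the frame directory counts as one frame are illustrative choices, not something the repository defines.

```python
from pathlib import Path

# Per-frame multipliers quoted in the Stage 2 comments; the preset names are made up here.
ITERS_PER_FRAME = {"fast": 30, "better": 50, "best": 100}
FRAME_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed frame file formats


def count_frames(frame_dir):
    """Count image files in a frame directory (assumes one image per frame)."""
    return sum(1 for p in Path(frame_dir).iterdir() if p.suffix.lower() in FRAME_SUFFIXES)


def suggest_iters(num_frames, preset="fast"):
    """Return a Stage 2 iteration budget following the per-frame heuristic."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return ITERS_PER_FRAME[preset] * num_frames
```

For example, `suggest_iters(count_frames("./data/JPEGImages/rollerblade"))` gives a starting value for `iters=`; move to the `"better"` or `"best"` preset only if the fast result looks poor.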

Demo 4

<table> <tr> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/kid-football_composed_no_orbit.gif" width="99%"></td> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/kid-football_composed_hor_orbit.gif" width="99%"></td> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/kid-football_composed_elev_orbit.gif" width="99%"></td> </tr> </table> This example contains two objects: a person kicking a ball. To run the stage-by-stage scripts on the video sequence, use:
# For custom videos, you need to run stage 1 and stage 2 for every object of interest in the video,
# incrementing the numeric suffix of save_path by 1 for each object (e.g. XXX_1, XXX_2, ...).

# Stage 1: 3D Gaussian optimization
python main.py --config configs/image.yaml input=./data/JPEGImages/kid-football/00054.png input_mask=./data/Annotations/kid-football/001/00054.png outdir=./gaussians visdir=./vis save_path=kid-football_1
python main.py --config configs/image.yaml input=./data/JPEGImages/kid-football/00054.png input_mask=./data/Annotations/kid-football/002/00054.png outdir=./gaussians visdir=./vis save_path=kid-football_2

# Stage 2: Deformation optimization. For fast results, try iters=(30 * number of frames).
# On custom videos, you can try iters=(50 * number of frames) or even iters=(100 * number of frames)
# if the results from fewer optimization iterations don't look good.
python main_4d.py --config configs/4d.yaml iters=300 input=./data/JPEGImages/kid-football input_mask=./data/Annotations/kid-football/001 outdir=./gaussians visdir=./vis save_path=kid-football_1
python main_4d.py --config configs/4d.yaml iters=300 input=./data/JPEGImages/kid-football input_mask=./data/Annotations/kid-football/002 outdir=./gaussians visdir=./vis save_path=kid-football_2

# Stage 3: Scene composition
python main_4d_compose.py --config configs/4d.yaml input=./data/JPEGImages/kid-football input_mask=[./data/Annotations/kid-football/001/00054.png,./data/Annotations/kid-football/002/00054.png] outdir=./gaussians visdir=./vis save_path=kid-football
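
With several objects, the stage-by-stage commands above follow a regular pattern: one Stage 1 and one Stage 2 call per object, then a single composition call that lists every object's first-frame mask. The sketch below generates those command strings. It assumes per-object mask folders are named with zero-padded three-digit indices (`001`, `002`, ...) as in the demos. `stage_commands` is a hypothetical helper, not part of the repository.

```python
def stage_commands(sequence, frame, num_objects, iters, data_root="./data"):
    """Return the Stage 1, Stage 2, and Stage 3 command strings for one sequence.

    `frame` is the file name of the frame used for Stage 1 (e.g. '00054.png').
    Mask folders are assumed to be named 001, 002, ... as in the demos.
    """
    frames = f"{data_root}/JPEGImages/{sequence}"
    common = "outdir=./gaussians visdir=./vis"
    stage1, stage2, masks = [], [], []
    for obj in range(1, num_objects + 1):
        mask_dir = f"{data_root}/Annotations/{sequence}/{obj:03d}"
        save_path = f"{sequence}_{obj}"  # suffix increments per object
        stage1.append(
            f"python main.py --config configs/image.yaml input={frames}/{frame} "
            f"input_mask={mask_dir}/{frame} {common} save_path={save_path}"
        )
        stage2.append(
            f"python main_4d.py --config configs/4d.yaml iters={iters} input={frames} "
            f"input_mask={mask_dir} {common} save_path={save_path}"
        )
        masks.append(f"{mask_dir}/{frame}")
    compose = (
        f"python main_4d_compose.py --config configs/4d.yaml input={frames} "
        f"input_mask=[{','.join(masks)}] {common} save_path={sequence}"
    )
    return stage1 + stage2 + [compose]


if __name__ == "__main__":
    # Reproduces the five kid-football commands shown above.
    for command in stage_commands("kid-football", "00054.png", num_objects=2, iters=300):
        print(command)
```

Printing the commands first, rather than executing them, makes it easy to inspect each stage's output before moving on, which is the point of the stage-by-stage workflow.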

For demo 5, we include a long video sequence with 78 frames, consisting of 5 objects. In theory, our method can handle arbitrarily long video sequences (we have tested videos with 120 frames) with an arbitrary number of objects. However, you should expect longer optimization times for longer and more complex videos.

Demo 5

<table> <tr> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/gold-fish_composed_no_orbit.gif" width="99%"></td> <td style="text-align: center; vertical-align: middle;"><img src="demo_visuals/gold-fish_composed
