StableV2V
The official implementation of the paper titled "StableV2V: Stablizing Shape Consistency in Video-to-Video Editing".
Install / Use
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
TCSVT 2025
Chang Liu, Rui Li, Kaidong Zhang, Yunwei Lan, Dong Liu
[Paper (arXiv)] / Paper (TCSVT) / [Project] / [Models (Huggingface)] / [DAVIS-Edit (HuggingFace)] / [Models (wisemodel)] / [DAVIS-Edit (wisemodel)] / [Models (ModelScope)] / [DAVIS-Edit (ModelScope)]
Table of Contents
- <u>1. Overview of StableV2V</u>
- <u>2. News</u>
- <u>3. To-Do Lists</u>
- <u>4. Code Structure</u>
- <u>5. Prerequisites</u>
- <u>6. Inference of StableV2V (Command Lines)</u>
- <u>7. Inference of StableV2V (Gradio Demo)</u>
- <u>8. Details of DAVIS-Edit</u>
- <u>9. Training the Shape-guided Depth Refinement Network</u>
- <u>10. Citation</u>
- <u>11. Results</u>
- <u>12. Star History</u>
- <u>13. Acknowledgements</u>
If you have any questions about this work, please feel free to open a new issue or propose a PR.
Overview of StableV2V
StableV2V presents a novel paradigm for performing video editing in a shape-consistent manner, particularly in scenarios where user prompts cause significant shape changes to the edited contents.
Besides, StableV2V shows superior flexibility in handling a wide range of downstream applications, accommodating user prompts from various modalities.
<u><small><🎯Back to Table of Contents></small></u>
News
- [Dec. 11th, 2025] Our paper is accepted by TCSVT 2025!
- [Nov. 27th] We uploaded our model weights and the proposed testing benchmark `DAVIS-Edit` to ModelScope.
- [Nov. 21st] We added a Gradio demo for interactive use of `StableV2V`, with detailed illustrations presented in this section.
- [Nov. 20th] We uploaded our model weights and the proposed testing benchmark `DAVIS-Edit` to wisemodel.cn.
- [Nov. 19th] We uploaded `DAVIS-Edit` to our HuggingFace dataset repo, and uploaded all the required model weights of `StableV2V` to our HuggingFace model repo.
- [Nov. 19th] Our arXiv paper is released.
- [Nov. 18th] We updated the codebase of `StableV2V`.
- [Nov. 17th] We updated our project page.
To-Do List
- [x] Update the codebase of `StableV2V`
- [x] Upload the curated testing benchmark `DAVIS-Edit` to our HuggingFace repo
- [x] Upload all required model weights of `StableV2V` to our HuggingFace repo
- [x] Update a Gradio demo
- [ ] Regular maintenance
<u><small><🎯Back to Table of Contents></small></u>
Code Structure
```
StableV2V
├── LICENSE
├── README.md
├── assets
├── datasets                 <----- Code of datasets for training the depth refinement network
├── models                   <----- Code of model definitions of different components
├── runners                  <----- Code of engines to run different components
├── inference.py             <----- Script to run inference with StableV2V
├── train_completion_net.py  <----- Script to train the shape-guided depth completion network
└── utils                    <----- Code of toolkit functions
```
<u><small><🎯Back to Table of Contents></small></u>
Prerequisites
1. Install the Dependencies
We offer a one-click command line to install all the dependencies that the code requires.
First, create the virtual environment with conda:
```bash
conda create -n stablev2v python=3.10
```
Then, execute the following line to install the dependencies with pip:
```bash
bash install_pip.sh
```
Alternatively, you can install the dependencies with conda:
```bash
bash install_conda.sh
```
Then, you are ready to go with `conda activate stablev2v`.
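Once the environment is active, a quick sanity check that the core packages resolved might look like the sketch below. The package names `torch`, `diffusers`, and `transformers` are typical for this stack and are an assumption here, not read from the install scripts:

```python
import importlib.util


def check_imports(packages=("torch", "diffusers", "transformers")):
    """Map each package name to whether it is importable in this environment."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}


if __name__ == "__main__":
    for pkg, found in check_imports().items():
        print(f"{pkg}: {'found' if found else 'MISSING'}")
```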
2. Pre-trained Model Weights
Before you start the inference process, you need to prepare the model weights that StableV2V requires.
|Model|Component|Link|
|-|-|-|
|Paint-by-Example|PFE|Fantasy-Studio/Paint-by-Example|
|InstructPix2Pix|PFE|timbrooks/instruct-pix2pix|
|SD Inpaint|PFE|botp/stable-diffusion-v1-5-inpainting|
|ControlNet + SD Inpaint|PFE|ControlNet models at lllyasviel|
|AnyDoor|PFE|xichenhku/AnyDoor|
|RAFT|ISA|Google Drive|
|MiDaS|ISA|Link|
|U2-Net|ISA|Link|
|Depth Refinement Network|ISA|Link|
|SD v1.5|CIG|stable-diffusion-v1-5/stable-diffusion-v1-5|
|ControlNet (depth)|CIG|lllyasviel/control_v11f1p_sd15_depth|
|Ctrl-Adapter|CIG|hanlincs/Ctrl-Adapter (i2vgenxl_depth)|
|I2VGen-XL|CIG|ali-vilab/i2vgen-xl|
Once you have downloaded all the model weights, put them in the `checkpoints` folder.
> [!NOTE]
> If your network environment can access HuggingFace, you can use the HuggingFace repo ID directly to download the models. Otherwise, we highly recommend preparing the model weights locally.
Specifically, make sure you modify the configuration file of AnyDoor at `models/anydoor/configs/anydoor.yaml` with the path of the DINO-v2 pre-trained weights:
```yaml
# at line 83
cond_stage_config:
  target: models.anydoor.ldm.modules.encoders.modules.FrozenDinoV2Encoder
  weight: /path/to/dinov2_vitg14_pretrain.pth
```
<u><small><🎯Back to Table of Contents></small></u>
Inference of StableV2V (Command Lines)
You may refer to the following command line to run StableV2V:
```bash
python inference.py \
    --raft-checkpoint-path checkpoints/raft-things.pth \
    --midas-checkpoint-path checkpoints/dpt_swin2_large_384.pt \
    --u2net-checkpoint-path checkpoints/u2net.pth \
    --stable-diffusion-checkpoint-path stable-diffusion-v1-5/stable-diffusion-v1-5 \
    --controlnet-checkpoint-path lllyasviel/control_v11f1p_sd15_depth \
    --i2vgenxl-checkpoint-path ali-vilab/i2vgen-xl \
    --ctrl-adapter-checkpoint-path hanlincs/Ctrl-Adapter \
    --completion-net-checkpoint-path checkpoints/depth-refinement/50000.ckpt \
    --image-editor-type paint-by-example \
    --image-editor-checkpoint-path /path/to/image/editor \
    --source-video-frames examples/frames/bear \
    --external-guidance examples/reference-images/raccoon.jpg \
    --prompt "a raccoon" \
    --outdir results
```
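If you sweep over many editors or prompts, the flags above can be assembled programmatically; the wrapper below is illustrative and not part of the repo:

```python
import shlex


def build_cmd(args):
    """Turn a {'flag-name': value} dict into an `inference.py` command line."""
    parts = ["python", "inference.py"]
    for flag, value in args.items():
        parts += [f"--{flag}", str(value)]
    # shlex.quote protects values that contain spaces, e.g. multi-word prompts.
    return " ".join(shlex.quote(p) for p in parts)
```

For example, `build_cmd({"prompt": "a raccoon", "outdir": "results"})` yields a properly quoted command string that can be passed to a shell.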
<details><summary> For detailed illustrations of the arguments, please refer to the table below. </summary>
|Argument|Default Setting|Required or Not|Explanation|
|-|-|-|-|
|Model arguments|-|-|-|
|--image-editor-type|-|Yes|Argument to define the image editor type.|
|--image-editor-checkpoint-path|-|Yes|Path of model weights for the image editor, required by PFE.|
|--raft-checkpoint-path|checkpoints/raft-things.pth|Yes|Path of model weights for RAFT, required by ISA.|
|--midas-checkpoint-path|checkpoints/dpt_swin2_large_384.pt|Yes|Path of model weights for MiDaS, required by ISA.|
|--u2net-checkpoint-path|checkpoints/u2net.pth|Yes|Path of model weights for U2-Net, required by ISA to obtain the segmen
