StableV2V
The official implementation of the paper titled "StableV2V: Stablizing Shape Consistency in Video-to-Video Editing".
Install / Use
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
TCSVT 2025
Chang Liu, Rui Li, Kaidong Zhang, Yunwei Lan, Dong Liu
[Paper (arXiv)] / Paper (TCSVT) / [Project] / [Models (Huggingface)] / [DAVIS-Edit (HuggingFace)] / [Models (wisemodel)] / [DAVIS-Edit (wisemodel)] / [Models (ModelScope)] / [DAVIS-Edit (ModelScope)]
Table of Contents
- <u>1. Overview of StableV2V</u>
- <u>2. News</u>
- <u>3. To-Do Lists</u>
- <u>4. Code Structure</u>
- <u>5. Prerequisites</u>
- <u>6. Inference of StableV2V (Command Lines)</u>
- <u>7. Inference of StableV2V (Gradio Demo)</u>
- <u>8. Details of DAVIS-Edit</u>
- <u>9. Training the Shape-guided Depth Refinement Network</u>
- <u>10. Citation</u>
- <u>11. Results</u>
- <u>12. Star History</u>
- <u>13. Acknowledgements</u>
If you have any questions about this work, please feel free to open a new issue or propose a PR.
Overview of StableV2V
StableV2V presents a novel paradigm for performing video editing in a shape-consistent manner, particularly in scenarios where user prompts cause significant shape changes to the edited contents.
Besides, StableV2V shows superior flexibility in handling a wide range of downstream applications, accommodating user prompts from various modalities.
<u><small><🎯Back to Table of Contents></small></u>
News
- [Dec. 11th, 2025] Our paper is accepted by TCSVT 2025!
- [Nov. 27th] We uploaded our model weights and the proposed testing benchmark `DAVIS-Edit` to ModelScope.
- [Nov. 21st] We added a Gradio demo for interactive use of `StableV2V`, with detailed illustrations presented in this section.
- [Nov. 20th] We uploaded our model weights and the proposed testing benchmark `DAVIS-Edit` to wisemodel.cn.
- [Nov. 19th] We uploaded `DAVIS-Edit` to our HuggingFace dataset repo, and uploaded all the required model weights of `StableV2V` to our HuggingFace model repo.
- [Nov. 19th] Our arXiv paper is released.
- [Nov. 18th] We updated the codebase of `StableV2V`.
- [Nov. 17th] We updated our project page.
To-Do List
- [x] Update the codebase of `StableV2V`
- [x] Upload the curated testing benchmark `DAVIS-Edit` to our HuggingFace repo
- [x] Upload all required model weights of `StableV2V` to our HuggingFace repo
- [x] Update a Gradio demo
- [ ] Regular maintenance
<u><small><🎯Back to Table of Contents></small></u>
Code Structure
```
StableV2V
├── LICENSE
├── README.md
├── assets
├── datasets                 <----- Code of datasets for training the depth refinement network
├── models                   <----- Code of model definitions of different components
├── runners                  <----- Code of engines to run different components
├── inference.py             <----- Script to run inference with StableV2V
├── train_completion_net.py  <----- Script to train the shape-guided depth completion network
└── utils                    <----- Code of toolkit functions
```
<u><small><🎯Back to Table of Contents></small></u>
Prerequisites
1. Install the Dependencies
We offer a one-click command line to install all the dependencies that the code requires.
First, create the virtual environment with conda:
```bash
conda create -n stablev2v python=3.10
```
Then, execute the following line to install the dependencies with pip:
```bash
bash install_pip.sh
```
Alternatively, you can install the dependencies with conda:
```bash
bash install_conda.sh
```
Then, you are ready to go with `conda activate stablev2v`.
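Once the environment is active, a quick sanity check that the core packages resolved might look like the sketch below. The package names `torch`, `diffusers`, and `transformers` are typical for this stack and are an assumption here, not read from the install scripts:

```python
import importlib.util


def check_imports(packages=("torch", "diffusers", "transformers")):
    """Map each package name to whether it is importable in this environment."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}


if __name__ == "__main__":
    for pkg, found in check_imports().items():
        print(f"{pkg}: {'found' if found else 'MISSING'}")
```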
2. Pre-trained Model Weights
Before you start the inference process, you need to prepare the model weights that StableV2V requires.
|Model|Component|Link|
|-|-|-|
|Paint-by-Example|PFE|Fantasy-Studio/Paint-by-Example|
|InstructPix2Pix|PFE|timbrooks/instruct-pix2pix|
|SD Inpaint|PFE|botp/stable-diffusion-v1-5-inpainting|
|ControlNet + SD Inpaint|PFE|ControlNet models at lllyasviel|
|AnyDoor|PFE|xichenhku/AnyDoor|
|RAFT|ISA|Google Drive|
|MiDaS|ISA|Link|
|U2-Net|ISA|Link|
|Depth Refinement Network|ISA|Link|
|SD v1.5|CIG|stable-diffusion-v1-5/stable-diffusion-v1-5|
|ControlNet (depth)|CIG|lllyasviel/control_v11f1p_sd15_depth|
|Ctrl-Adapter|CIG|hanlincs/Ctrl-Adapter (i2vgenxl_depth)|
|I2VGen-XL|CIG|ali-vilab/i2vgen-xl|
Once you have downloaded all the model weights, put them in the `checkpoints` folder.
> [!NOTE]
> If your network environment can access HuggingFace, you can use the HuggingFace repo ID directly to download the models. Otherwise, we highly recommend preparing the model weights locally.
Specifically, make sure you modify the configuration file of AnyDoor at `models/anydoor/configs/anydoor.yaml` with the path of the DINO-v2 pre-trained weights:
```yaml
# at line 83
cond_stage_config:
  target: models.anydoor.ldm.modules.encoders.modules.FrozenDinoV2Encoder
  weight: /path/to/dinov2_vitg14_pretrain.pth
```
<u><small><🎯Back to Table of Contents></small></u>
Inference of StableV2V (Command Lines)
You may refer to the following command line to run StableV2V:
```bash
python inference.py \
    --raft-checkpoint-path checkpoints/raft-things.pth \
    --midas-checkpoint-path checkpoints/dpt_swin2_large_384.pt \
    --u2net-checkpoint-path checkpoints/u2net.pth \
    --stable-diffusion-checkpoint-path stable-diffusion-v1-5/stable-diffusion-v1-5 \
    --controlnet-checkpoint-path lllyasviel/control_v11f1p_sd15_depth \
    --i2vgenxl-checkpoint-path ali-vilab/i2vgen-xl \
    --ctrl-adapter-checkpoint-path hanlincs/Ctrl-Adapter \
    --completion-net-checkpoint-path checkpoints/depth-refinement/50000.ckpt \
    --image-editor-type paint-by-example \
    --image-editor-checkpoint-path /path/to/image/editor \
    --source-video-frames examples/frames/bear \
    --external-guidance examples/reference-images/raccoon.jpg \
    --prompt "a raccoon" \
    --outdir results
```
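If you sweep over many editors or prompts, the flags above can be assembled programmatically; the wrapper below is illustrative and not part of the repo:

```python
import shlex


def build_cmd(args):
    """Turn a {'flag-name': value} dict into an `inference.py` command line."""
    parts = ["python", "inference.py"]
    for flag, value in args.items():
        parts += [f"--{flag}", str(value)]
    # shlex.quote protects values that contain spaces, e.g. multi-word prompts.
    return " ".join(shlex.quote(p) for p in parts)
```

For example, `build_cmd({"prompt": "a raccoon", "outdir": "results"})` yields a properly quoted command string that can be passed to a shell.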
<details><summary> For detailed illustrations of the arguments, please refer to the table below. </summary>
|Argument|Default Setting|Required or Not|Explanation|
|-|-|-|-|
|Model arguments|-|-|-|
|--image-editor-type|-|Yes|Argument to define the image editor type.|
|--image-editor-checkpoint-path|-|Yes|Path of model weights for the image editor, required by PFE.|
|--raft-checkpoint-path|checkpoints/raft-things.pth|Yes|Path of model weights for RAFT, required by ISA.|
|--midas-checkpoint-path|checkpoints/dpt_swin2_large_384.pt|Yes|Path of model weights for MiDaS, required by ISA.|
|--u2net-checkpoint-path|checkpoints/u2net.pth|Yes|Path of model weights for U2-Net, required by ISA to obtain the segmen
