The official implementation of the paper titled "StableV2V: Stablizing Shape Consistency in Video-to-Video Editing".
<div align="center">

StableV2V: Stablizing Shape Consistency in Video-to-Video Editing

TCSVT 2025

Chang Liu, Rui Li, Kaidong Zhang, Yunwei Lan, Dong Liu

[Paper (arXiv)] / [Paper (TCSVT)] / [Project] / [Models (HuggingFace)] / [DAVIS-Edit (HuggingFace)] / [Models (wisemodel)] / [DAVIS-Edit (wisemodel)] / [Models (ModelScope)] / [DAVIS-Edit (ModelScope)]

</div> <!-- omit in toc -->

Table of Contents

If you have any questions about this work, please feel free to start a new issue or propose a PR.

<!-- omit in toc -->

Overview of StableV2V

StableV2V presents a novel paradigm for performing video editing in a shape-consistent manner, especially in editing scenarios where user prompts cause significant shape changes to the edited contents. Besides, StableV2V shows superior flexibility in handling a wide range of downstream applications, accepting user prompts from different modalities.

<div align="center"> <video width="500" src="https://alonzoleeeooo.github.io/assets/github-teasor-comparison.mp4" autoplay loop muted></video> <video width="500" src="https://alonzoleeeooo.github.io/assets/github-teasor-applications.mp4" autoplay loop muted></video> </div>

<u><small><🎯Back to Table of Contents></small></u>

<!-- omit in toc -->

News

<!-- omit in toc -->

To-Do List

  • [x] Update the codebase of StableV2V
  • [x] Upload the curated testing benchmark DAVIS-Edit to our HuggingFace repo
  • [x] Upload all required model weights of StableV2V to our HuggingFace repo
  • [x] Update a Gradio demo
  • [ ] Regular Maintenance

<u><small><🎯Back to Table of Contents></small></u>

<!-- omit in toc -->

Code Structure

```
StableV2V
├── LICENSE
├── README.md
├── assets
├── datasets                       <----- Code of datasets for training of the depth refinement network
├── models                         <----- Code of model definitions in different components
├── runners                        <----- Code of engines to run different components
├── inference.py                   <----- Script to inference StableV2V
├── train_completion_net.py        <----- Script to train the shape-guided depth completion network
└── utils                          <----- Code of toolkit functions
```

<u><small><🎯Back to Table of Contents></small></u>

<!-- omit in toc -->

Prerequisites

<!-- omit in toc -->

1. Install the Dependencies

We offer a one-click command line to install all the dependencies that the code requires. First, create the virtual environment with conda:

```bash
conda create -n stablev2v python=3.10
```

Then, you can execute the following line to install the dependencies with pip:

```bash
bash install_pip.sh
```

Alternatively, you can install the dependencies with conda:

```bash
bash install_conda.sh
```

Then, you are ready to go with `conda activate stablev2v`.
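As a quick sanity check after activating the environment, a small stdlib-only script like the one below can report which packages failed to install. The package names listed are assumptions about StableV2V's dependencies; adjust them to match `install_pip.sh`.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # Assumed core dependencies of StableV2V; adjust to match install_pip.sh.
    required = ["torch", "torchvision", "diffusers", "transformers", "cv2"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```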

<!-- omit in toc -->

2. Pre-trained Model Weights

Before you start the inference process, you need to prepare the model weights that StableV2V requires.

<details> <summary> We have uploaded all model weights that `StableV2V` requires to our HuggingFace repo. Besides, you can also get access to them from their official releases; we provide the corresponding details in the following table. </summary>

|Model|Component|Link|
|-|-|-|
|Paint-by-Example|PFE|Fantasy-Studio/Paint-by-Example|
|InstructPix2Pix|PFE|timbrooks/instruct-pix2pix|
|SD Inpaint|PFE|botp/stable-diffusion-v1-5-inpainting|
|ControlNet + SD Inpaint|PFE|ControlNet models at lllyasviel|
|AnyDoor|PFE|xichenhku/AnyDoor|
|RAFT|ISA|Google Drive|
|MiDaS|ISA|Link|
|U2-Net|ISA|Link|
|Depth Refinement Network|ISA|Link|
|SD v1.5|CIG|stable-diffusion-v1-5/stable-diffusion-v1-5|
|ControlNet (depth)|CIG|lllyasviel/control_v11f1p_sd15_depth|
|Ctrl-Adapter|CIG|hanlincs/Ctrl-Adapter (i2vgenxl_depth)|
|I2VGen-XL|CIG|ali-vilab/i2vgen-xl|

</details>

Once you have downloaded all the model weights, put them in the `checkpoints` folder.

> [!NOTE]
> If your network environment can access HuggingFace, you can directly use the HuggingFace repo ID to download the models. Otherwise, we highly recommend preparing the model weights locally.
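For the HuggingFace-hosted weights, pre-downloading them into local folders can be sketched with `huggingface_hub` as below. The repo IDs follow the table above; the local subfolder names under `checkpoints` are assumptions, so point the corresponding `inference.py` arguments at whatever folders you choose.

```python
# Sketch: pre-download HuggingFace-hosted weights into local folders under
# `checkpoints`. Requires `pip install huggingface_hub` and network access.
from pathlib import Path

# Repo IDs from the model-weights table; local subfolder names are assumptions.
HF_REPOS = {
    "stable-diffusion-v1-5": "stable-diffusion-v1-5/stable-diffusion-v1-5",
    "controlnet-depth": "lllyasviel/control_v11f1p_sd15_depth",
    "ctrl-adapter": "hanlincs/Ctrl-Adapter",
    "i2vgen-xl": "ali-vilab/i2vgen-xl",
}


def local_dir(name: str, root: str = "checkpoints") -> Path:
    """Target folder under the checkpoints directory for a given component."""
    return Path(root) / name


def download_all(root: str = "checkpoints") -> None:
    # Imported lazily so the mapping above can be inspected without network access.
    from huggingface_hub import snapshot_download

    for name, repo_id in HF_REPOS.items():
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir(name, root)))


if __name__ == "__main__":
    download_all()
```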

Specifically, make sure you modify the configuration file of AnyDoor at `models/anydoor/configs/anydoor.yaml` with the path of the DINO-v2 pre-trained weights:

```yaml
# (at line 83)
cond_stage_config:
  target: models.anydoor.ldm.modules.encoders.modules.FrozenDinoV2Encoder
  weight: /path/to/dinov2_vitg14_pretrain.pth
```
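If you prefer to patch this entry from a script, a minimal stdlib-only sketch is shown below. The function name `set_dino_weight` is hypothetical (not part of StableV2V); it rewrites only the value after `weight:` with a textual substitution, which leaves the rest of the YAML file's formatting untouched.

```python
import re
from pathlib import Path


def set_dino_weight(config_path: str, weight_path: str) -> str:
    """Rewrite the `weight:` entry in AnyDoor's YAML config; return the new text."""
    text = Path(config_path).read_text()
    # Replace only the value after `weight:`, keeping the original indentation.
    new_text, count = re.subn(
        r"^(\s*weight:\s*).*$",
        lambda m: m.group(1) + weight_path,
        text,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError(f"No `weight:` entry found in {config_path}")
    Path(config_path).write_text(new_text)
    return new_text
```

For example: `set_dino_weight("models/anydoor/configs/anydoor.yaml", "checkpoints/dinov2_vitg14_pretrain.pth")`.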

<u><small><🎯Back to Table of Contents></small></u>

<!-- omit in toc -->

Inference of StableV2V (Command Lines)

You may refer to the following command line to run StableV2V:

```bash
python inference.py \
    --raft-checkpoint-path checkpoints/raft-things.pth \
    --midas-checkpoint-path checkpoints/dpt_swin2_large_384.pt \
    --u2net-checkpoint-path checkpoints/u2net.pth \
    --stable-diffusion-checkpoint-path stable-diffusion-v1-5/stable-diffusion-v1-5 \
    --controlnet-checkpoint-path lllyasviel/control_v11f1p_sd15_depth \
    --i2vgenxl-checkpoint-path ali-vilab/i2vgen-xl \
    --ctrl-adapter-checkpoint-path hanlincs/Ctrl-Adapter \
    --completion-net-checkpoint-path checkpoints/depth-refinement/50000.ckpt \
    --image-editor-type paint-by-example \
    --image-editor-checkpoint-path /path/to/image/editor \
    --source-video-frames examples/frames/bear \
    --external-guidance examples/reference-images/raccoon.jpg \
    --prompt "a raccoon" \
    --outdir results
```
<details><summary> For detailed illustrations of the arguments, please refer to the table below. </summary>

|Argument|Default Setting|Required or Not|Explanation|
|-|-|-|-|
|*Model arguments*|-|-|-|
|`--image-editor-type`|-|Yes|Argument to define the image editor type.|
|`--image-editor-checkpoint-path`|-|Yes|Path of model weights for the image editor, required by PFE.|
|`--raft-checkpoint-path`|`checkpoints/raft-things.pth`|Yes|Path of model weights for RAFT, required by ISA.|
|`--midas-checkpoint-path`|`checkpoints/dpt_swin2_large_384.pt`|Yes|Path of model weights for MiDaS, required by ISA.|
|`--u2net-checkpoint-path`|`checkpoints/u2net.pth`|Yes|Path of model weights for U2-Net, required by ISA to obtain the segmentation masks.|

</details>
