VideoPainter

[SIGGRAPH 2025] Official code of the paper "VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control"

Keywords: Video Inpainting, Video Editing, Video Generation

Yuxuan Bian<sup>1,2</sup>, Zhaoyang Zhang<sup>1‡</sup>, Xuan Ju<sup>2</sup>, Mingdeng Cao<sup>3</sup>, Liangbin Xie<sup>4</sup>, Ying Shan<sup>1</sup>, Qiang Xu<sup>2✉</sup><br> <sup>1</sup>ARC Lab, Tencent PCG <sup>2</sup>The Chinese University of Hong Kong <sup>3</sup>The University of Tokyo <sup>4</sup>University of Macau <sup>‡</sup>Project Lead <sup>✉</sup>Corresponding Author

<p align="center"> <a href='https://yxbian23.github.io/project/video-painter'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp; <a href="https://arxiv.org/abs/2503.05639"><img src="https://img.shields.io/badge/arXiv-2503.05639-b31b1b.svg"></a> &nbsp; <a href="https://github.com/TencentARC/VideoPainter"><img src="https://img.shields.io/badge/GitHub-Code-black?logo=github"></a> &nbsp; <a href="https://youtu.be/HYzNfsD3A0s"><img src="https://img.shields.io/badge/YouTube-Video-red?logo=youtube"></a> &nbsp; <a href='https://huggingface.co/datasets/TencentARC/VPData'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue'></a> &nbsp; <a href='https://huggingface.co/datasets/TencentARC/VPBench'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Benchmark-blue'></a> &nbsp; <a href="https://huggingface.co/TencentARC/VideoPainter"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue"></a> </p>

Your star means a lot to us in developing this project! ⭐⭐⭐

VPData and VPBench have been fully uploaded (containing 390K mask sequences and video captions). Welcome to use VPData, our largest video segmentation dataset with video captions! 🔥🔥🔥


🔥 Update Log

  • [2025/3/09] 📢 📢 VideoPainter is released: an efficient, any-length video inpainting & editing framework with plug-and-play context control.
  • [2025/3/09] 📢 📢 VPData and VPBench are released, the largest video inpainting dataset with precise segmentation masks and dense video captions (>390K clips).
  • [2025/3/25] 📢 📢 The 390K+ high-quality video segmentation masks of VPData have been fully released.
  • [2025/3/25] 📢 📢 The raw videos of the Videovo subset have been uploaded to VPData to resolve the raw-video link expiration issue.
  • [2025/4/08] 📢 📢 VideoPainter has been accepted by SIGGRAPH 2025!

TODO

  • [x] Release training and inference code
  • [x] Release evaluation code
  • [x] Release VideoPainter checkpoints (based on CogVideoX-5B)
  • [x] Release VPData and VPBench for large-scale training and evaluation.
  • [x] Release gradio demo
  • [ ] Data preprocessing code

🛠️ Method Overview

We propose VideoPainter, a novel dual-stream paradigm that incorporates an efficient context encoder (comprising only 6% of the backbone parameters) to process masked videos and inject backbone-aware background contextual cues into any pre-trained video DiT, producing semantically consistent content in a plug-and-play manner. This architectural separation significantly reduces the model's learning complexity while enabling nuanced integration of crucial background context. We also introduce a novel target-region ID resampling technique that enables any-length video inpainting, greatly enhancing practical applicability. Additionally, we establish a scalable dataset pipeline leveraging current vision-understanding models, contributing VPData and VPBench, the largest video inpainting dataset and benchmark to date with over 390K diverse clips, to facilitate segmentation-based inpainting training and assessment. Using inpainting as a pipeline basis, we also explore downstream applications including video editing and video-editing pair-data generation, demonstrating competitive performance and significant practical potential.
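The dual-stream idea above can be sketched in a few lines: a lightweight context encoder (here a toy two-layer projection, standing in for the ~6% of backbone parameters) embeds the masked video plus its mask and additively injects the result into the frozen backbone's hidden states. All shapes, weights, and the additive-injection detail below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
T, H, W, C = 8, 16, 16, 64          # frames, height, width, channels
backbone_dim, context_dim = 64, 16  # context encoder is a small fraction of the backbone

def context_encoder(masked_video, mask, w_in, w_out):
    """Toy stand-in for the lightweight context encoder: embeds the
    masked video + mask and projects back to the backbone width."""
    x = np.concatenate([masked_video, mask], axis=-1)  # (T, H, W, C+1)
    h = np.maximum(x @ w_in, 0.0)                      # (T, H, W, context_dim)
    return h @ w_out                                   # (T, H, W, backbone_dim)

# Masked input: background kept, target region zeroed out.
video = rng.normal(size=(T, H, W, C))
mask = np.zeros((T, H, W, 1))
mask[:, 4:12, 4:12, :] = 1.0
masked_video = video * (1.0 - mask)

w_in = rng.normal(size=(C + 1, context_dim)) * 0.1
w_out = rng.normal(size=(context_dim, backbone_dim)) * 0.1

backbone_hidden = rng.normal(size=(T, H, W, backbone_dim))  # frozen DiT features
injected = backbone_hidden + context_encoder(masked_video, mask, w_in, w_out)
print(injected.shape)  # (8, 16, 16, 64)
```

The point of the separation is visible even in this toy: the backbone weights never change, and all mask-conditioned information enters through the small side branch.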

🚀 Getting Started

<details> <summary><b>Environment Requirement 🌍</b></summary>

Clone the repo:

git clone https://github.com/TencentARC/VideoPainter.git

We recommend first creating a conda virtual environment and installing the required libraries. For example:

conda create -n videopainter python=3.10 -y
conda activate videopainter
pip install -r requirements.txt

Then, you can install diffusers (the modified version included in this repo) with:

cd ./diffusers
pip install -e .

After that, you can install the required ffmpeg through:

conda install -c conda-forge ffmpeg -y

Optionally, you can install SAM2 for the gradio demo through:

cd ./app
pip install -e .
</details> <details> <summary><b>VPBench and VPData Download ⬇️</b></summary>

You can download VPBench and VPData (as well as the Davis data we re-processed) from Hugging Face; they are used for training and testing VideoPainter. By downloading the data, you agree to the terms and conditions of the license. The data structure should look like:

|-- data
    |-- davis
        |-- JPEGImages_432_240
        |-- test_masks
        |-- davis_caption
        |-- test.json
        |-- train.json
    |-- videovo/raw_video
        |-- 000005000
            |-- 000005000000.0.mp4
            |-- 000005000001.0.mp4
            |-- ...
        |-- 000005001
        |-- ...
    |-- pexels/pexels/raw_video
        |-- 000000000
            |-- 000000000000_852038.mp4
            |-- 000000000001_852057.mp4
            |-- ...
        |-- 000000001
        |-- ...
    |-- video_inpainting
        |-- videovo
            |-- 000005000000/all_masks.npz
            |-- 000005000001/all_masks.npz
            |-- ...
        |-- pexels
            |-- ...
    |-- pexels_videovo_train_dataset.csv
    |-- pexels_videovo_val_dataset.csv
    |-- pexels_videovo_test_dataset.csv
    |-- our_video_inpaint.csv
    |-- our_video_inpaint_long.csv
    |-- our_video_edit.csv
    |-- our_video_edit_long.csv
    |-- pexels.csv
    |-- videovo.csv
    

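As a sketch of how one clip's annotations might be consumed, the snippet below writes and then reads a tiny in-memory stand-in for an all_masks.npz file. The "masks" key name and the per-frame boolean layout are assumptions for illustration; check the released files for the actual keys.

```python
import io
import numpy as np

# Tiny stand-in for one clip's all_masks.npz. The frame count is arbitrary;
# 432x240 matches the JPEGImages_432_240 layout above.
T, H, W = 6, 240, 432
masks = (np.arange(T * H * W).reshape(T, H, W) % 7 == 0)

buf = io.BytesIO()
np.savez_compressed(buf, masks=masks)  # "masks" key is an assumed name
buf.seek(0)

# Load it back the way you would load
# data/video_inpainting/videovo/<clip>/all_masks.npz
with np.load(buf) as npz:
    loaded = npz["masks"]

print(loaded.shape, loaded.dtype)
```

Compressed .npz keeps the boolean mask sequences small on disk, which is why the annotations can be shipped separately from the raw videos.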
You can download VPBench and put the benchmark into the data folder by:

git lfs install
git clone https://huggingface.co/datasets/TencentARC/VPBench
mv VPBench data
cd data
unzip pexels.zip
unzip videovo.zip
unzip davis.zip
unzip video_inpainting.zip

You can download VPData (only the mask and text annotations, due to space limits) and put the dataset into the data folder by:

git lfs install
git clone https://huggingface.co/datasets/TencentARC/VPData
mv VPData data

# 1. unzip the masks in VPData
python data_utils/unzip_folder.py --source_dir ./data/videovo_masks --target_dir ./data/video_inpainting/videovo
python data_utils/unzip_folder.py --source_dir ./data/pexels_masks --target_dir ./data/video_inpainting/pexels

# 2. unzip the raw videos in Videovo subset in VPData
python data_utils/unzip_folder.py --source_dir ./data/videovo_raw_videos --target_dir ./data/videovo/raw_video

Note: due to space limits, you need to run the following script to download the raw videos of the Pexels subset of VPData. The resulting format is consistent with VPData/VPBench above (after you download VPData/VPBench, the script automatically places the raw videos into the corresponding dataset directories created by VPBench).

cd data_utils
python VPData_download.py
</details> <details> <summary><b>Checkpoints</b></summary>

Checkpoints of VideoPainter can be downloaded from Hugging Face. The ckpt folder contains:

  • VideoPainter pretrained checkpoints for CogVideoX-5b-I2V
  • VideoPainter IP Adapter pretrained checkpoints for CogVideoX-5b-I2V
  • the pretrained CogVideoX-5b-I2V checkpoint from Hugging Face.

You can download the checkpoints and put them into the ckpt folder by:

git lfs install
git clone https://huggingface.co/TencentARC/VideoPainter
mv VideoPainter ckpt

You also need to download the base model CogVideoX-5B-I2V by:

git lfs install
cd ckpt
git clone https://huggingface.co/THUDM/CogVideoX-5b-I2V

[Optional] You need to download FLUX.1-Fill-dev for first-frame inpainting:

git lfs install
cd ckpt
git clone https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev
mv FLUX.1-Fill-dev flux_inp

[Optional] You need to download SAM2 for video segmentation in the gradio demo:

git lfs install
cd ckpt
wget https://huggingface.co/facebook/sam2-hiera-large/resolve/main/sam2_hiera_large.pt
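After the downloads above, a quick sanity check of the ckpt layout can save a failed run. The helper below is a sketch: the entry names mirror the commands above (flux_inp and the SAM2 weights are optional), and the demonstration runs against a throwaway directory rather than a real checkout.

```python
import os
import tempfile

# Expected entries under ckpt/ after the steps above.
EXPECTED = [
    "VideoPainter",
    "CogVideoX-5b-I2V",
    "flux_inp",             # optional: renamed FLUX.1-Fill-dev
    "sam2_hiera_large.pt",  # optional: SAM2 weights for the gradio demo
]

def missing_checkpoints(ckpt_root, entries=EXPECTED):
    """Return the expected entries not yet present under ckpt_root."""
    return [e for e in entries if not os.path.exists(os.path.join(ckpt_root, e))]

# Demonstration against a temporary directory with only one entry present:
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "VideoPainter"))
    print(missing_checkpoints(root))  # the three entries not yet downloaded
```

Run it as `missing_checkpoints("ckpt")` from the repo root; an empty list means every (including optional) checkpoint is in place.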

</details>