DiffuEraser

DiffuEraser is a diffusion model for video inpainting that delivers strong content completeness and temporal consistency while maintaining acceptable efficiency.


<div align="center"> <h1>DiffuEraser: A Diffusion Model for Video Inpainting</h1> <div> Xiaowen Li&emsp; Haolan Xue&emsp; Peiran Ren&emsp; Liefeng Bo </div> <div> Tongyi Lab, Alibaba Group&emsp; </div> <div> <strong>TECHNICAL REPORT</strong> </div> <div> <h4 align="center"> <a href="https://lixiaowen-xw.github.io/DiffuEraser-page" target='_blank'><img src="https://img.shields.io/badge/%F0%9F%8C%B1-Project%20Page-blue"></a> <a href="https://arxiv.org/abs/2501.10018" target='_blank'><img src="https://img.shields.io/badge/arXiv-2501.10018-B31B1B.svg"></a> <a href="https://huggingface.co/lixiaowen/diffuEraser"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue"></a> <a href="https://www.modelscope.cn/xingzi/diffuEraser"><img src="https://img.shields.io/badge/%20Modelscope-Model-orange"></a> <a href="https://www.modelscope.cn/studios/iic/DiffuEraser"><img src="https://img.shields.io/badge/%20Modelscope-Demo-purple"></a> </h4> </div> </div>

DiffuEraser is a diffusion model for video inpainting that outperforms the state-of-the-art model ProPainter in both content completeness and temporal consistency while maintaining acceptable efficiency.

If you find our work helpful, please star this repo 🌟. Thanks! 😃

If you're using DiffuEraser in your applications, tools, or creative ideas, feel free to share the URL on Discussions.

Update Log

2025.01.20: Release inference code.
2025.03.26: Release training code.
2025.04.08: Release online demo on ModelScope.

TODO

[x] Release training code.
[x] Release HuggingFace/ModelScope demo.
[ ] Release gradio demo.

Results

More results will be displayed on the project page.

https://github.com/user-attachments/assets/b59d0b88-4186-4531-8698-adf6e62058f8

Method Overview

Our network is inspired by BrushNet and AnimateDiff. The architecture comprises a primary denoising UNet and an auxiliary BrushNet branch. Features extracted by the BrushNet branch are integrated into the denoising UNet layer by layer, each after passing through a zero convolution block. The denoising UNet performs the denoising process to generate the final output. To enhance temporal consistency, temporal attention layers are added after both the self-attention and cross-attention layers. After denoising, the generated images are blended with the input masked images using blurred masks.

Figure: overall structure of DiffuEraser.
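The final blending step can be illustrated with a minimal, dependency-free sketch. It is not the repository's implementation: the function names `box_blur` and `blend` are hypothetical, frames are plain 2-D lists of floats, and a simple box filter stands in for whatever blur the actual code applies to the mask.

```python
def box_blur(mask, radius=1):
    """Blur a 2-D mask (list of rows of floats) with a (2r+1)x(2r+1) box filter.

    Windows are clipped at the image border, so a constant mask stays constant.
    """
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += mask[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def blend(generated, original, mask, radius=1):
    """Composite generated pixels into the original frame with a softened mask.

    mask is 1.0 where content is inpainted and 0.0 where the input is kept.
    """
    soft = box_blur(mask, radius)
    return [
        [soft[y][x] * generated[y][x] + (1.0 - soft[y][x]) * original[y][x]
         for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]
```

Softening the mask before compositing avoids a hard seam at the mask boundary: pixels near the edge become a weighted mix of generated and original content.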

We incorporate prior information to provide initialization and weak conditioning, which helps mitigate noisy artifacts and suppress hallucinations. Additionally, to improve temporal consistency during long-sequence inference, we expand the temporal receptive fields of both the prior model and DiffuEraser, and further enhance consistency by leveraging the temporal smoothing capabilities of Video Diffusion Models. Please read the paper for details.
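As a rough illustration of using a prior result for initialization, the sketch below applies the standard diffusion forward-noising formula to a prior latent instead of starting from pure Gaussian noise. This is a generic technique, shown under assumptions: the function name `noise_prior_latent`, the flat-list latent, and the choice of `alpha_bar_t` are all hypothetical, and the paper should be consulted for how DiffuEraser actually injects the prior.

```python
import math
import random


def noise_prior_latent(prior_latent, alpha_bar_t, rng=None):
    """Return sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.

    prior_latent: flat list of floats standing in for the prior model's latent.
    alpha_bar_t:  cumulative noise-schedule product at the starting timestep
                  (1.0 keeps the prior unchanged, 0.0 yields pure noise).
    """
    if rng is None:
        rng = random.Random(0)
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [signal * v + sigma * rng.gauss(0.0, 1.0) for v in prior_latent]
```

Starting the sampler from a noised prior keeps coarse structure from the prior model while leaving the diffusion model room to regenerate detail.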

Getting Started

Installation

Clone Repo

git clone https://github.com/lixiaowen-xw/DiffuEraser.git

Create Conda Environment and Install Dependencies

# create new anaconda env
conda create -n diffueraser python=3.9.19  
conda activate diffueraser
# install python dependencies
pip install -r requirements.txt

Prepare pretrained models

Weights will be placed under the ./weights directory.

Download our pretrained models from Hugging Face or ModelScope to the weights folder.
Download the pretrained weights of the base models and other components:
- stable-diffusion-v1-5. The full folder is over 30 GB. To save storage space, you can download only the necessary folders and files (feature_extractor, model_index.json, safety_checker, scheduler, text_encoder, and tokenizer), about 4 GB. For training, the unet folder is also required.
- PCM_Weights
- propainter
- sd-vae-ft-mse
Download the motion adapter for training (optional):
- animatediff-motion-adapter-v1-5-2

The directory structure will be arranged as:

weights
   |- diffuEraser
      |-brushnet
      |-unet_main
   |- stable-diffusion-v1-5
      |-feature_extractor
      |-...
   |- PCM_Weights
      |-sd15
   |- propainter
      |-ProPainter.pth
      |-raft-things.pth
      |-recurrent_flow_completion.pth
   |- sd-vae-ft-mse
      |-diffusion_pytorch_model.bin
      |-...
   |- README.md
   |- animatediff-motion-adapter-v1-5-2 (Optional)
      |- diffusion_pytorch_model.safetensors
      |- ...
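Before running inference, it can help to confirm that the layout above is in place. The following is a small helper sketch, not part of the repository: `missing_weights` and `EXPECTED` are hypothetical names, and the list covers only the entries named in this README (entries shown as `...` in the tree are not checked).

```python
from pathlib import Path

# Paths taken from the directory tree and the stable-diffusion-v1-5 note above.
EXPECTED = [
    "diffuEraser/brushnet",
    "diffuEraser/unet_main",
    "stable-diffusion-v1-5/feature_extractor",
    "stable-diffusion-v1-5/model_index.json",
    "stable-diffusion-v1-5/safety_checker",
    "stable-diffusion-v1-5/scheduler",
    "stable-diffusion-v1-5/text_encoder",
    "stable-diffusion-v1-5/tokenizer",
    "PCM_Weights/sd15",
    "propainter/ProPainter.pth",
    "propainter/raft-things.pth",
    "propainter/recurrent_flow_completion.pth",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
]


def missing_weights(root="weights"):
    """Return the expected paths (relative to root) that do not exist yet."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_weights()` from the repository root returns an empty list once every listed file and folder is present.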

Main Inference

We provide some examples in the examples folder. Run the following commands to try it out:

cd DiffuEraser
python run_diffueraser.py

The results will be saved in the results folder. To test your own videos, replace input_video and input_mask in run_diffueraser.py. The first inference may take a long time.

The frame rates of input_video and input_mask must match. Currently only mp4 video is supported as input, not split frames; you can convert frames to a video using ffmpeg:

ffmpeg -i image%03d.jpg -c:v libx264 -r 25 output.mp4

Note: If the frame rate of the mask video differs from that of the input video, do not convert the mask's frame rate to match; the resulting misalignment between mask and frames leads to errors.
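One way to check the frame rates before running inference is to query both files with ffprobe. The sketch below assumes ffprobe is installed and on the PATH; the helper names `parse_rate`, `probe_rate`, and `same_frame_rate` are hypothetical and not part of the repository.

```python
import subprocess
from fractions import Fraction


def parse_rate(text):
    """Convert an ffprobe rate string such as '25/1' or '30000/1001' to a Fraction."""
    return Fraction(text.strip())


def probe_rate(path):
    """Read the average frame rate of the first video stream using ffprobe."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=avg_frame_rate",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_rate(out.splitlines()[0])


def same_frame_rate(video_path, mask_path, probe=probe_rate):
    """Return True when both files report exactly the same frame rate."""
    return probe(video_path) == probe(mask_path)
```

Comparing exact fractions rather than floats avoids false mismatches for rates such as 30000/1001.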

The table below shows the estimated GPU memory requirements and inference time at different resolutions:

| Resolution | GPU Memory | Inference Time (250 frames (~10 s), L20) |
| :--------- | :--------: | :--------------------------------------: |
| 1280 x 720 | 33G | 314s |
| 960 x 540 | 20G | 175s |
| 640 x 360 | 12G | 92s |
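As a convenience, the figures above can be turned into a quick lookup for choosing an input resolution. This is only a sketch: `largest_resolution_for` is a hypothetical helper, and the numbers are the estimates from the table (250 frames on an L20), so actual usage on other hardware or clip lengths may differ.

```python
# (width, height) and estimated GPU memory in GB, largest resolution first.
REQUIREMENTS = [
    ((1280, 720), 33),
    ((960, 540), 20),
    ((640, 360), 12),
]


def largest_resolution_for(gpu_memory_gb):
    """Return the largest listed resolution whose estimate fits, or None."""
    for resolution, needed_gb in REQUIREMENTS:
        if gpu_memory_gb >= needed_gb:
            return resolution
    return None
```

For example, a 24 GB card maps to 960 x 540, while anything below 12 GB returns `None`.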

Training & Evaluation

Data Preparation

Organize the data directory as follows (please check it carefully):

data
   |- train
      |- dataset1
         |- video
            |- video1.mp4
            |- ...
         |- metadata.csv
      |- dataset2
         |- video
            |- video1.mp4
            |- ...
         |- metadata.csv
      |- ...
   |- eval
      |- DAVIS
         |- JPEGImages
            |- 480p
               |- <video_name>
                  |- 00000.jpg
                  |- 00001.jpg
         |- Annotations
            |- 480p
               |- <video_name>
                  |- 00000.png
                  |- 00001.png   
         |- ...

For the format of metadata.csv, see the example metadata.csv. The video_path column is the path to the target video, located under the parent directory train_data_dir defined in the training code; the caption column is the caption of the video.
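A metadata.csv with these two columns could be generated along the following lines. This is a sketch under assumptions: `write_metadata` is a hypothetical helper, paths are written relative to `train_data_dir` (check the example metadata.csv for the exact form the training code expects), and captions default to an empty string when none are supplied.

```python
import csv
from pathlib import Path


def write_metadata(train_data_dir, dataset_name, captions=None):
    """Write <train_data_dir>/<dataset_name>/metadata.csv for the dataset's mp4 files.

    captions: optional dict mapping a video file name to its caption.
    """
    captions = captions or {}
    root = Path(train_data_dir)
    dataset_dir = root / dataset_name
    videos = sorted((dataset_dir / "video").glob("*.mp4"))
    out_path = dataset_dir / "metadata.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["video_path", "caption"])
        writer.writeheader()
        for video in videos:
            writer.writerow({
                # Assumption: paths are stored relative to train_data_dir.
                "video_path": video.relative_to(root).as_posix(),
                "caption": captions.get(video.name, ""),
            })
    return out_path
```

With the layout above, `write_metadata("data/train", "dataset1")` would produce one row per file in `data/train/dataset1/video`.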

The evaluation dataset can be downloaded from DAVIS or other sources.
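Since the evaluation layout pairs each frame with a mask of the same name, a quick consistency check can catch incomplete downloads. The sketch below assumes every frame is expected to have an annotation, as the tree above suggests; `unpaired_frames` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def unpaired_frames(davis_root, resolution="480p"):
    """List frames under JPEGImages that lack a same-named PNG under Annotations."""
    root = Path(davis_root)
    frames = root / "JPEGImages" / resolution
    masks = root / "Annotations" / resolution
    missing = []
    for jpg in sorted(frames.glob("*/*.jpg")):
        png = masks / jpg.parent.name / (jpg.stem + ".png")
        if not png.exists():
            missing.append(jpg.relative_to(frames).as_posix())
    return missing
```

An empty result from `unpaired_frames("data/eval/DAVIS")` means every frame has a matching mask.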

Stage1

Use the following scripts to train and evaluate the model:

# train
sh train_DiffuEraser_stage1.sh
# save checkpoint
python save_checkpoint_stage1.py
# eval
python eval_DiffuEraser_stage1.py

Stage2

Specify the converted Stage 1 weights in train_DiffuEraser_stage2.sh, then run:

# train
sh train_DiffuEraser_stage2.sh
# save checkpoint
python save_checkpoint_stage2.py
# eval
python eval_DiffuEraser_stage2.py

Citation

If you find our repo useful for your research, please consider citing our paper:

@misc{li2025diffueraserdiffusionmodelvideo,
   title={DiffuEraser: A Diffusion Model for Video Inpainting}, 
   author={Xiaowen Li and Haolan Xue and Peiran Ren and Liefeng Bo},
   year={2025},
   eprint={2501.10018},
   archivePrefix={arXiv},
   primaryClass={cs.CV},
   url={https://arxiv.org/abs/2501.10018}, 
}

License

This repository uses ProPainter as the prior model. Users must comply with ProPainter's license when using this code. Alternatively, you can replace it with another model.

This project is licensed under the Apache License Version 2.0 except for the third-party components listed below.

Acknowledgement

This code is based on BrushNet, ProPainter and AnimateDiff. The example videos come from Pexels, DAVIS, SA-V and DanceTrack. Thanks to the authors for their awesome work.
