E2FGVI

Official code for "Towards An End-to-End Framework for Flow-Guided Video Inpainting" (CVPR2022)


E<sup>2</sup>FGVI (CVPR 2022)

English | Simplified Chinese

This repository contains the official implementation of the following paper:

Towards An End-to-End Framework for Flow-Guided Video Inpainting<br> Zhen Li<sup>#</sup>, Cheng-Ze Lu<sup>#</sup>, Jianhua Qin, Chun-Le Guo<sup>*</sup>, Ming-Ming Cheng<br> IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022<br>

[Paper] [Demo Video (YouTube)] [Demo Video (Bilibili)] [MindSpore Implementation] [Project Page (TBD)] [Poster (TBD)]

You can try our colab demo here:

:star: News

- 2022.05.15: We release E<sup>2</sup>FGVI-HQ, which can handle videos with arbitrary resolution. The model generalizes well to much higher resolutions even though it was trained only on 432x240 videos. It also outperforms our original model on both PSNR and SSIM. :link: Download links: [Google Drive] [Baidu Disk] :movie_camera: Demo video: [YouTube] [Bilibili]
- 2022.04.06: Our code is publicly available.

Demo

teaser

More examples (click for details):

<table> <tr> <td> <details> <summary> <strong>Coco (click me)</strong> </summary> <img src="https://user-images.githubusercontent.com/21050959/159160822-8ed5947c-e91d-4597-8e20-4b443a2244ed.gif"> </details> </td> <td> <details> <summary> <strong>Tennis </strong> </summary> <img src="https://user-images.githubusercontent.com/21050959/159160843-4b167115-e338-4e0b-9ca4-b564233c2c7a.gif"> </details> </td> </tr> <tr> <td> <details> <summary> <strong>Space </strong> </summary> <img src="https://user-images.githubusercontent.com/21050959/159171328-1222c70e-9bb9-47e3-b765-4b1baaf631f5.gif"> </details> </td> <td> <details> <summary> <strong>Motocross </strong> </summary> <img src="https://user-images.githubusercontent.com/21050959/159163010-ed78b4bd-c8dd-472c-ad3e-82bc8baca43a.gif"> </details> </td> </tr> </table>

Overview

overall_structure

:rocket: Highlights:

- SOTA performance: The proposed E<sup>2</sup>FGVI achieves significant improvements on all quantitative metrics in comparison with SOTA methods.
- High efficiency: Our method processes 432 × 240 videos at 0.12 seconds per frame on a Titan XP GPU, which is nearly 15× faster than previous flow-based methods. In addition, our method has the lowest FLOPs among all compared SOTA methods.
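
As a rough illustration of what the reported speed means in practice, the arithmetic can be sketched as below. The 0.12 seconds per frame figure is the one quoted above for 432 × 240 video on a Titan XP; the 100-frame clip length is only an assumed example, and actual runtime will depend on hardware and resolution.

```python
# Reported above for 432 x 240 video on a Titan XP GPU.
SECONDS_PER_FRAME = 0.12


def estimated_runtime(num_frames, seconds_per_frame=SECONDS_PER_FRAME):
    """Estimate total processing time in seconds for a clip of num_frames frames."""
    return num_frames * seconds_per_frame


# Throughput implied by the reported per-frame time.
print(f"{1 / SECONDS_PER_FRAME:.1f} frames per second")

# Hypothetical 100-frame clip, used only as an example length.
print(f"{estimated_runtime(100):.0f} s for a 100-frame clip")
```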

Work in Progress

- [ ] Update website page
- [ ] Hugging Face demo
- [ ] Efficient inference

Dependencies and Installation

Clone Repo

git clone https://github.com/MCG-NKU/E2FGVI.git

Create Conda Environment and Install Dependencies
```
conda env create -f environment.yml
conda activate e2fgvi
```
- Python >= 3.7
- PyTorch >= 1.5
- CUDA >= 9.2
- mmcv-full (follow its installation pipeline)

If the environment.yml file does not work for you, please refer to this issue for a solution.

Get Started

Prepare pretrained models

Before performing the following steps, please download our pretrained model first.

<table> <thead> <tr> <th>Model</th> <th>:link: Download Links </th> <th>Support Arbitrary Resolution ?</th> <th> PSNR / SSIM / VFID (DAVIS) </th> </tr> </thead> <tbody> <tr> <td>E<sup>2</sup>FGVI</td> <th> [<a href="https://drive.google.com/file/d/1tNJMTJ2gmWdIXJoHVi5-H504uImUiJW9/view?usp=sharing">Google Drive</a>] [<a href="https://pan.baidu.com/s/1qXAErbilY_n_Fh9KB8UF7w?pwd=lsjw">Baidu Disk</a>] </th> <th>:x:</th> <th>33.01 / 0.9721 / 0.116</th> </tr> <tr> <td>E<sup>2</sup>FGVI-HQ</td> <th> [<a href="https://drive.google.com/file/d/10wGdKSUOie0XmCr8SQ2A2FeDe-mfn5w3/view?usp=sharing">Google Drive</a>] [<a href="https://pan.baidu.com/s/1jfm1oFU1eIy-IRfuHP8YXw?pwd=ssb3">Baidu Disk</a>] </th> <th>:o:</th> <th>33.06 / 0.9722 / 0.117</th> </tr> </tbody> </table>

Then, unzip the file and place the models in the release_model directory.

The directory structure will be arranged as:

release_model
   |- E2FGVI-CVPR22.pth
   |- E2FGVI-HQ-CVPR22.pth
   |- i3d_rgb_imagenet.pt (for evaluating VFID metric)
   |- README.md
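
Before running anything, it can help to confirm that the files ended up where the layout above expects them. The following is a minimal sketch, not part of the repository; the helper name `missing_models` is hypothetical, and the file names are copied from the listing above.

```python
from pathlib import Path

# File names taken from the directory listing above. The I3D weights are
# only needed when evaluating the VFID metric.
EXPECTED = {
    "E2FGVI-CVPR22.pth": "E2FGVI checkpoint",
    "E2FGVI-HQ-CVPR22.pth": "E2FGVI-HQ checkpoint",
    "i3d_rgb_imagenet.pt": "I3D weights for VFID evaluation",
}


def missing_models(root="release_model"):
    """Return the expected file names that are not present under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_file()]


if __name__ == "__main__":
    for name in missing_models():
        print(f"missing: release_model/{name} ({EXPECTED[name]})")
```

If you only plan to use one of the two models, a missing entry for the other checkpoint can be ignored.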

Quick test

We provide two examples in the examples directory.

Run the following command to enjoy them:

# The first example (using split video frames)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/tennis --mask examples/tennis_mask  --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)
# The second example (using mp4 format video)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/schoolgirls.mp4 --mask examples/schoolgirls_mask  --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)

The inpainted video will be saved in the results directory. To test more cases, prepare your own mp4 video (or split frames) and frame-wise masks.
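
The commands above list two alternatives for `--model` and `--ckpt`, and the two must match. A small wrapper can make that pairing explicit. This is only a sketch: `build_test_command` is a hypothetical helper, and it uses only the flags shown in the commands above.

```python
import sys

# Checkpoint paths as listed in the release_model directory.
CHECKPOINTS = {
    "e2fgvi": "release_model/E2FGVI-CVPR22.pth",
    "e2fgvi_hq": "release_model/E2FGVI-HQ-CVPR22.pth",
}


def build_test_command(model, video, mask):
    """Return the argument list for test.py with the checkpoint matching model."""
    if model not in CHECKPOINTS:
        raise ValueError(f"unknown model: {model!r}")
    return [
        sys.executable, "test.py",
        "--model", model,
        "--video", video,
        "--mask", mask,
        "--ckpt", CHECKPOINTS[model],
    ]


cmd = build_test_command("e2fgvi_hq", "examples/tennis", "examples/tennis_mask")
print(" ".join(cmd[1:]))

# To actually run it from the repository root:
# import subprocess
# subprocess.run(cmd, check=True)
```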

Note: E<sup>2</sup>FGVI always rescales the input video to a fixed resolution (432x240), while E<sup>2</sup>FGVI-HQ does not change the resolution of the input video. If you want to customize the output resolution, use the --set_size flag and set the values of --width and --height.

Example:

# Use this command to output a 720p video
python test.py --model e2fgvi_hq --video <video_path> --mask <mask_path>  --ckpt release_model/E2FGVI-HQ-CVPR22.pth --set_size --width 1280 --height 720
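
If your source video is not 16:9, you may want to pick a width that keeps its aspect ratio for a given target height. The sketch below computes the `--set_size`, `--width` and `--height` arguments; `size_flags` is a hypothetical helper, and rounding the width up to an even number is an assumption made here for codec friendliness, not a documented requirement of test.py.

```python
def size_flags(src_width, src_height, target_height):
    """Return --set_size flags that scale to target_height, keeping the aspect ratio."""
    if min(src_width, src_height, target_height) <= 0:
        raise ValueError("dimensions must be positive")
    width = round(src_width * target_height / src_height)
    # Assumption: keep the width even, which many video codecs prefer.
    width += width % 2
    return ["--set_size", "--width", str(width), "--height", str(target_height)]


print(size_flags(1920, 1080, 720))
# ['--set_size', '--width', '1280', '--height', '720']
```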

Prepare dataset for training and evaluation

<table> <thead> <tr> <th>Dataset</th> <th>YouTube-VOS</th> <th>DAVIS</th> </tr> </thead> <tbody> <tr> <td>Details</td> <td>For training (3,471) and evaluation (508)</td> <td>For evaluation (50 in 90)</td> </tr> <tr> <td>Images</td> <td> [<a href="https://competitions.codalab.org/competitions/19544#participate-get-data">Official Link</a>] (Download train and test all frames) </td> <td> [<a href="https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip">Official Link</a>] (2017, 480p, TrainVal) </td> </tr> <tr> <td>Masks</td> <td colspan="2"> [<a href="https://drive.google.com/file/d/1dFTneS_zaJAHjglxU10gYzr1-xALgHa4/view?usp=sharing">Google Drive</a>] [<a href="https://pan.baidu.com/s/1JC-UKmlQfjhVtD81196cxA?pwd=87e3">Baidu Disk</a>] (For reproducing paper results) </td> </tr> </tbody> </table>

The training and test split files are provided in datasets/<dataset_name>.

For each dataset, place JPEGImages in datasets/<dataset_name>.

Then, run sh datasets/zip_dir.sh (Note: edit the folder path accordingly) to compress each video in datasets/<dataset_name>/JPEGImages.

Unzip the downloaded mask files to datasets.

The datasets directory structure will be arranged as: (Note: please check it carefully)

datasets
   |- davis
      |- JPEGImages
         |- <video_name>.zip
         |- <video_name>.zip
      |- test_masks
         |- <video_name>
            |- 00000.png
            |- 00001.png   
      |- train.json
      |- test.json
   |- youtube-vos
      |- JPEGImages
         |- <video_id>.zip
         |- <video_id>.zip
      |- test_masks
         |- <video_id>
            |- 00000.png
            |- 00001.png
      |- train.json
      |- test.json   
   |- zip_dir.sh
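
If you cannot run the shell script (for example on Windows), the per-video compression step can be approximated in Python. This is a sketch under stated assumptions, not a port of the repository's script: `zip_video_folders` is a hypothetical helper, and it stores frames at the root of each archive, which may differ from the layout the repository's data loader expects. Compare one archive against the output of the provided script before relying on it.

```python
import zipfile
from pathlib import Path


def zip_video_folders(jpeg_root):
    """Zip each sub-folder of jpeg_root into <folder_name>.zip next to it."""
    jpeg_root = Path(jpeg_root)
    created = []
    # sorted() lists the folders up front, before any archive is created.
    for folder in sorted(p for p in jpeg_root.iterdir() if p.is_dir()):
        archive = jpeg_root / f"{folder.name}.zip"
        # ZIP_STORED: JPEG frames are already compressed.
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
            for frame in sorted(folder.iterdir()):
                if frame.is_file():
                    # Assumption: frames sit at the archive root.
                    zf.write(frame, arcname=frame.name)
        created.append(archive)
    return created


# Example (path follows the layout above):
# zip_video_folders("datasets/davis/JPEGImages")
```

The sketch leaves the original frame folders in place, so you can remove them yourself once the archives are verified.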

Evaluation

Run one of the following commands for evaluation:

 # For evaluating E2FGVI model
 python evaluate.py --model e2fgvi --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-CVPR22.pth
 # For evaluating E2FGVI-HQ model
 python evaluate.py --model e2fgvi_hq --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-HQ-CVPR22.pth

Evaluating E<sup>2</sup>FGVI should reproduce the scores reported in the paper. The scores of E<sup>2</sup>FGVI-HQ can be found in [Prepare pretrained models].

The scores will also be saved in the results/<model_name>_<dataset_name> directory.

Add the --save_results flag to save the results for further evaluation of the temporal warping error.
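
For readers unfamiliar with the PSNR metric reported above, here is the standard definition as a minimal pure-Python sketch. It is not the repository's evaluation code. The function operates on flat sequences of pixel values and assumes 8-bit images (`max_value=255`).

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the restored frame is closer to the reference. Identical inputs give infinity.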

Training

Our training configurations are provided in train_e2fgvi.json (for E<sup>2</sup>FGVI) and [train_e2fgvi_hq.json](./configs/train_e2fgvi_hq.jso
