[ICCV 2025] TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
<p align="center"> <img src="images/visual1.png" width=95%> </p>

## Overview
We take advantage of temporal information extraction in the pixel space (3D wavelet) and the latent space (3D convolution and attention) to improve the temporal consistency of our model.
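As a toy illustration of the pixel-space idea, a one-level temporal Haar transform splits two consecutive frames into a low-frequency (average) band and a high-frequency (difference) band. This is only a scalar sketch of the principle; the model itself applies a full 3D wavelet over space and time:

```python
# Toy 1-level temporal Haar transform over two frames (illustrative only;
# the actual model uses a 3D wavelet over space and time).
def temporal_haar(frame0, frame1):
    # Low band: temporal average; high band: temporal difference.
    low = [(a + b) / 2 for a, b in zip(frame0, frame1)]
    high = [(a - b) / 2 for a, b in zip(frame0, frame1)]
    return low, high

def inverse_temporal_haar(low, high):
    # Perfect reconstruction: frame0 = low + high, frame1 = low - high.
    f0 = [l + h for l, h in zip(low, high)]
    f1 = [l - h for l, h in zip(low, high)]
    return f0, f1

low, high = temporal_haar([1.0, 2.0], [3.0, 6.0])
f0, f1 = inverse_temporal_haar(low, high)
```

The high band isolates temporal change between the two frames, which is exactly the information frame interpolation needs to model.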
<p align="center"> <img src="images/overview.jpg" width=95%> </p>

## Quantitative Results
Our method achieves state-of-the-art performance in LPIPS/FloLPIPS/FID among all recent SOTAs.
<p align="center"> <img src="images/quant.png" width=95%> </p>

## Qualitative Results
Our method achieves the best visual quality among all recent SOTAs.
<p align="center"> <img src="images/visual3.png" width=95%> </p>

For more visualizations, please refer to our <a href="https://zonglinl.github.io/tlbvfi_page/">project page</a>.
## Preparation

### Package Installation

To install the necessary packages, run:

```bash
pip install pip==23.2
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
### Trained Model
The weights of our model are available on <a href="https://huggingface.co/ucfzl/TLBVFI">Hugging Face</a>. vimeo_unet.pth is the full model, and vimeo_new.ckpt is the VQ Model (autoencoder).
We will keep the Google Drive links until July 31, 2025: full model <a href="https://drive.google.com/file/d/1e_v32r6dxRXzjQXo6XDALiO9PM-w6aJS/view?usp=sharing">here</a> and autoencoder <a href="https://drive.google.com/file/d/11HOW6LOwxOae2ET63Fqzs9Dzg3-F9pw9/view?usp=sharing">here</a>.
## Inference
Leave model.VQGAN.params.dd_config.load_VFI and model.VQGAN.params.ckpt_path in configs/Template-LBBDM-video.yaml empty. Otherwise, you need to download the VFIformer weights from <a href="https://drive.google.com/drive/folders/140bDl6LXPMlCqG8DZFAXB3IBCvZ7eWyv">here</a> together with our VQ Model, and set load_VFI and ckpt_path to the paths of the downloaded VFIformer weights and our VQGAN, respectively.
Download our trained model, then run:

```bash
python interpolate.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
```
This will interpolate 7 frames in between; you may modify the code to interpolate a different number of frames with a bisection-like method.
```bash
python interpolate_one.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
```

This will interpolate 1 frame in between.
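The bisection-like scheme mentioned above can be sketched as follows: recursively interpolate the midpoint of every adjacent frame pair, so k rounds produce 2^k - 1 intermediate frames. This is a hypothetical sketch, not code from this repository; `interpolate_pair` stands in for a call to the model as in interpolate_one.py:

```python
def bisection_interpolate(frame0, frame1, rounds, interpolate_pair):
    # interpolate_pair(a, b) returns the middle frame between a and b;
    # in practice this would invoke the model as in interpolate_one.py.
    frames = [frame0, frame1]
    for _ in range(rounds):
        filled = []
        for a, b in zip(frames, frames[1:]):
            filled.extend([a, interpolate_pair(a, b)])
        filled.append(frames[-1])
        frames = filled
    return frames[1:-1]  # only the interpolated in-between frames

# With numeric stand-ins for frames, 3 rounds yield the 7 midpoints:
mids = bisection_interpolate(0.0, 1.0, 3, lambda a, b: (a + b) / 2)
```

With rounds=3 this reproduces the 7-frame behavior of interpolate.py, and rounds=1 matches interpolate_one.py.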
## Prepare datasets

### Training set

### Evaluation set
Xiph is downloaded automatically when you run Xiph_eval.py.
The DAVIS dataset is preprocessed with the dataset code from LDMVFI and saved in a structured file. Feel free to use it directly, or use the dataloader from LDMVFI.
Data should be in the following structure:

```
└──── <data directory>/
      ├──── DAVIS/
      |     ├──── bear/
      |     ├──── ...
      |     └──── walking/
      ├──── SNU-FILM/
      |     ├──── test-easy.txt
      |     ├──── ...
      |     └──── test/...
      └──── vimeo_triplet/
            ├──── sequences/
            ├──── tri_testlist.txt
            └──── tri_trainlist.txt
```
You can either rename your folders to match this structure or change the paths in the code.
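A quick way to confirm your layout matches is to check for the expected entries under the data directory. This helper is hypothetical (not part of this repository) and only checks the named entries from the tree above:

```python
from pathlib import Path

# Expected entries relative to <data directory>, following the tree above.
EXPECTED = [
    "DAVIS",
    "SNU-FILM/test-easy.txt",
    "vimeo_triplet/sequences",
    "vimeo_triplet/tri_testlist.txt",
    "vimeo_triplet/tri_trainlist.txt",
]

def missing_entries(data_dir):
    # Return the expected paths that do not exist under data_dir.
    root = Path(data_dir)
    return [p for p in EXPECTED if not (root / p).exists()]
```

An empty return value means the top-level layout is in place; it does not validate the contents of each dataset.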
## Training and Evaluating
Edit the config file configs/Template-LBBDM-video.yaml:

- Change data.dataset_config.dataset_path to the path to your dataset (the path up to <data directory> above).
- Change model.VQGAN.params.dd_config.load_VFI to the path of your downloaded VFIformer weights.
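The relevant fields look roughly like this. The nesting is a sketch inferred from the key paths above; check the actual configs/Template-LBBDM-video.yaml for the exact structure:

```yaml
data:
  dataset_config:
    dataset_path: /path/to/<data directory>    # parent of DAVIS/, SNU-FILM/, vimeo_triplet/
model:
  VQGAN:
    params:
      ckpt_path: results/VQGAN/vimeo_new.ckpt  # trained VQ Model (autoencoder)
      dd_config:
        load_VFI: /path/to/VFIformer/weights   # downloaded VFIformer weights
```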
### Train your autoencoder

Run:

```bash
python3 Autoencoder/main.py --base configs/vqflow-f32.yaml -t --gpus 0,1,2,3 --resume "logs/...."
```
You may remove --resume if you do not need it, and reduce the number of GPUs accordingly.
After training, move the saved VQ Model checkpoint from logs to results/VQGAN/vimeo_new.ckpt, or change model.VQGAN.params.ckpt_path in configs/Template-LBBDM-video.yaml to point to your checkpoint.
### Train the UNet

Make sure that model.VQGAN.params.ckpt_path in configs/Template-LBBDM-video.yaml is set correctly, then run:

```bash
python3 main.py --config configs/Template-LBBDM-video.yaml --train --save_top --gpu_ids 0
```
You may use --resume_model /path/to/ckpt to resume training. The model will be saved in results/<dataset_name in the config>/<model_name in the config>. For simplicity, you can leave dataset_name and model_name unchanged as DAVIS and LBBDM-f32 during training.
### Evaluate
Edit the config file configs/Template-LBBDM-video.yaml:

- Change data.eval and data.mode to select the dataset to evaluate. eval is chosen from {"DAVIS", "FILM"} and mode from {"easy", "medium", "hard", "extreme"}.
- Change data.dataset_name to create a folder for the sampled images. You will need to distinguish difficulty levels when evaluating SNU-FILM; for example, our implementation chooses from {"DAVIS", "FILM_{difficulty level}"}. The saved images will be in results/dataset_name.

Then run:
```bash
python3 main.py --config configs/Template-LBBDM-video.yaml --gpu_ids 0 --resume_model /path/to/vimeo_unet --sample_to_eval
```
To evaluate on the Xiph dataset, run:

```bash
python3 Xiph_eval.py --resume_model 'path to vimeo_unet.pth'
```
The commands above save sampled images and print PSNR/SSIM.
Then, to get LPIPS/FloLPIPS/FID, run:

```bash
python3 batch_to_entire.py --latent --dataset dataset_name --step 10
python3 copy_GT.py --latent --dataset dataset_name
python3 eval.py --latent --dataset dataset_name --step 10
```
dataset_name is one of DAVIS, FILM_{difficulty level}, or Xiph_{4K/2K}.
## Acknowledgement

We gratefully appreciate the source code from BBDM, LDMVFI, and VFIformer.
## Citation

If you find this repository helpful for your research, please cite:

```bibtex
@article{lyu2025tlbvfitemporalawarelatentbrownian,
  title={TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation},
  author={Zonglin Lyu and Chen Chen},
  year={2025},
  eprint={2507.04984},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
}
```