[ICCV 2025] TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
<p align="center"> <img src="images/visual1.png" width=95%> </p>

## Overview
We take advantage of temporal information extraction in the pixel space (3D wavelet) and the latent space (3D convolution and attention) to improve the temporal consistency of our model.
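As a toy illustration of the pixel-space idea, a one-level temporal Haar transform splits two consecutive frames into a low-frequency (average) band and a high-frequency (difference) band. This is only a scalar sketch of the principle; the model itself applies a full 3D wavelet over space and time:

```python
# Toy 1-level temporal Haar transform over two frames (illustrative only;
# the actual model uses a 3D wavelet over space and time).
def temporal_haar(frame0, frame1):
    # Low band: temporal average; high band: temporal difference.
    low = [(a + b) / 2 for a, b in zip(frame0, frame1)]
    high = [(a - b) / 2 for a, b in zip(frame0, frame1)]
    return low, high

def inverse_temporal_haar(low, high):
    # Perfect reconstruction: frame0 = low + high, frame1 = low - high.
    f0 = [l + h for l, h in zip(low, high)]
    f1 = [l - h for l, h in zip(low, high)]
    return f0, f1

low, high = temporal_haar([1.0, 2.0], [3.0, 6.0])
f0, f1 = inverse_temporal_haar(low, high)
```

The high band isolates temporal change between the two frames, which is exactly the information frame interpolation needs to model.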
<p align="center"> <img src="images/overview.jpg" width=95%> </p>

## Quantitative Results
Our method achieves state-of-the-art performance in LPIPS/FloLPIPS/FID among all recent SOTAs.
<p align="center"> <img src="images/quant.png" width=95%> </p>

## Qualitative Results
Our method achieves the best visual quality among all recent SOTAs.
<p align="center"> <img src="images/visual3.png" width=95%> </p>

For more visualizations, please refer to our <a href="https://zonglinl.github.io/tlbvfi_page/">project page</a>.
## Preparation

### Package Installation

To install the necessary packages, run:

```bash
pip install pip==23.2
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
### Trained Model
The weights of our model are available on <a href="https://huggingface.co/ucfzl/TLBVFI">Hugging Face</a>. vimeo_unet.pth is the full model, and vimeo_new.ckpt is the VQ Model (autoencoder).
We will keep the Google Drive links until July 31, 2025: full model <a href="https://drive.google.com/file/d/1e_v32r6dxRXzjQXo6XDALiO9PM-w6aJS/view?usp=sharing">here</a> and autoencoder <a href="https://drive.google.com/file/d/11HOW6LOwxOae2ET63Fqzs9Dzg3-F9pw9/view?usp=sharing">here</a>.
## Inference
Leave model.VQGAN.params.dd_config.load_VFI and model.VQGAN.params.ckpt_path in configs/Template-LBBDM-video.yaml empty. Otherwise, you need to download the VFIformer weights from <a href="https://drive.google.com/drive/folders/140bDl6LXPMlCqG8DZFAXB3IBCvZ7eWyv">here</a> together with our VQ Model, and set load_VFI and ckpt_path to the paths of the downloaded VFIformer weights and our VQGAN, respectively.
Download our trained model, then run:

```bash
python interpolate.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
```
This will interpolate 7 frames in between; you may modify the code to interpolate a different number of frames with a bisection-like method.
```bash
python interpolate_one.py --resume_model path_to_model_weights --frame0 path_to_the_previous_frame --frame1 path_to_the_next_frame
```

This will interpolate 1 frame in between.
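The bisection-like scheme mentioned above can be sketched as follows: recursively interpolate the midpoint of every adjacent frame pair, so k rounds produce 2^k - 1 intermediate frames. This is a hypothetical sketch, not code from this repository; `interpolate_pair` stands in for a call to the model as in interpolate_one.py:

```python
def bisection_interpolate(frame0, frame1, rounds, interpolate_pair):
    # interpolate_pair(a, b) returns the middle frame between a and b;
    # in practice this would invoke the model as in interpolate_one.py.
    frames = [frame0, frame1]
    for _ in range(rounds):
        filled = []
        for a, b in zip(frames, frames[1:]):
            filled.extend([a, interpolate_pair(a, b)])
        filled.append(frames[-1])
        frames = filled
    return frames[1:-1]  # only the interpolated in-between frames

# With numeric stand-ins for frames, 3 rounds yield the 7 midpoints:
mids = bisection_interpolate(0.0, 1.0, 3, lambda a, b: (a + b) / 2)
```

With rounds=3 this reproduces the 7-frame behavior of interpolate.py, and rounds=1 matches interpolate_one.py.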
## Prepare datasets

### Training set

### Evaluation set
Xiph is downloaded automatically when you run Xiph_eval.py.
The DAVIS dataset is preprocessed with the dataset code from LDMVFI and saved in a structured file. Feel free to use it directly, or use the dataloader from LDMVFI.
Data should be in the following structure:

```
└──── <data directory>/
      ├──── DAVIS/
      |     ├──── bear/
      |     ├──── ...
      |     └──── walking/
      ├──── SNU-FILM/
      |     ├──── test-easy.txt
      |     ├──── ...
      |     └──── test/...
      └──── vimeo_triplet/
            ├──── sequences/
            ├──── tri_testlist.txt
            └──── tri_trainlist.txt
```
You can either rename your folders to match this structure or change the paths in the code.
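A quick way to confirm your layout matches is to check for the expected entries under the data directory. This helper is hypothetical (not part of this repository) and only checks the named entries from the tree above:

```python
from pathlib import Path

# Expected entries relative to <data directory>, following the tree above.
EXPECTED = [
    "DAVIS",
    "SNU-FILM/test-easy.txt",
    "vimeo_triplet/sequences",
    "vimeo_triplet/tri_testlist.txt",
    "vimeo_triplet/tri_trainlist.txt",
]

def missing_entries(data_dir):
    # Return the expected paths that do not exist under data_dir.
    root = Path(data_dir)
    return [p for p in EXPECTED if not (root / p).exists()]
```

An empty return value means the top-level layout is in place; it does not validate the contents of each dataset.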
## Training and Evaluating
Edit the config file configs/Template-LBBDM-video.yaml:

- Change data.dataset_config.dataset_path to the path to your dataset (the path up to <data directory> above).
- Change model.VQGAN.params.dd_config.load_VFI to the path of your downloaded VFIformer weights.
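The relevant fields look roughly like this. The nesting is a sketch inferred from the key paths above; check the actual configs/Template-LBBDM-video.yaml for the exact structure:

```yaml
data:
  dataset_config:
    dataset_path: /path/to/<data directory>    # parent of DAVIS/, SNU-FILM/, vimeo_triplet/
model:
  VQGAN:
    params:
      ckpt_path: results/VQGAN/vimeo_new.ckpt  # trained VQ Model (autoencoder)
      dd_config:
        load_VFI: /path/to/VFIformer/weights   # downloaded VFIformer weights
```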
### Train your autoencoder

Run:

```bash
python3 Autoencoder/main.py --base configs/vqflow-f32.yaml -t --gpus 0,1,2,3 --resume "logs/...."
```
You may remove --resume if you do not need it, and reduce the number of GPUs accordingly.
After training, move the saved VQ Model checkpoint from logs to results/VQGAN/vimeo_new.ckpt, or change model.VQGAN.params.ckpt_path in configs/Template-LBBDM-video.yaml to point to your checkpoint.
### Train the UNet

Make sure that model.VQGAN.params.ckpt_path in configs/Template-LBBDM-video.yaml is set correctly, then run:

```bash
python3 main.py --config configs/Template-LBBDM-video.yaml --train --save_top --gpu_ids 0
```
You may use --resume_model /path/to/ckpt to resume training. The model will be saved in results/<dataset_name in the config>/<model_name in the config>. For simplicity, you can leave dataset_name and model_name unchanged as DAVIS and LBBDM-f32 during training.
### Evaluate
Edit the config file configs/Template-LBBDM-video.yaml:

- Change data.eval and data.mode to select the dataset to evaluate. eval is chosen from {"DAVIS", "FILM"} and mode from {"easy", "medium", "hard", "extreme"}.
- Change data.dataset_name to create a folder for the sampled images. You will need to distinguish difficulty levels when evaluating SNU-FILM; for example, our implementation chooses from {"DAVIS", "FILM_{difficulty level}"}. The saved images will be in results/dataset_name.

Then run:
```bash
python3 main.py --config configs/Template-LBBDM-video.yaml --gpu_ids 0 --resume_model /path/to/vimeo_unet --sample_to_eval
```
To evaluate on the Xiph dataset, run:

```bash
python3 Xiph_eval.py --resume_model 'path to vimeo_unet.pth'
```
The commands above save sampled images and print PSNR/SSIM.
Then, to get LPIPS/FloLPIPS/FID, run:

```bash
python3 batch_to_entire.py --latent --dataset dataset_name --step 10
python3 copy_GT.py --latent --dataset dataset_name
python3 eval.py --latent --dataset dataset_name --step 10
```
dataset_name is one of DAVIS, FILM_{difficulty level}, or Xiph_{4K/2K}.
## Acknowledgement

We gratefully appreciate the source code from BBDM, LDMVFI, and VFIformer.
## Citation

If you find this repository helpful for your research, please cite:

```bibtex
@article{lyu2025tlbvfitemporalawarelatentbrownian,
  title={TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation},
  author={Zonglin Lyu and Chen Chen},
  year={2025},
  eprint={2507.04984},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
}
```