# VFIMamba

[NeurIPS 2024] VFIMamba: Video Frame Interpolation with State Space Models
<div align="center"> <img src="figs/main0.png" width="1000"/> </div>

**VFIMamba: Video Frame Interpolation with State Space Models**<br>
Guozhen Zhang, Chunxu Liu, Yutao Cui, Xiaotong Zhao, Kai Ma, Limin Wang
## :boom: News
- [2024.09.26] Accepted by NeurIPS 2024!
- [2024.09.03] Support for directly importing model weights from HuggingFace. Thanks to the HuggingFace team for their efforts!
- [2024.07.03] Demo and evaluation code released.
## :satisfied: Highlights
In this work, we introduce VFIMamba, the first approach to adapt the SSM model to the video frame interpolation task. We devise the Mixed-SSM Block (MSB) for efficient inter-frame modeling using S6. We also explore various rearrangement methods to convert two frames into a sequence, discovering that interleaved rearrangement is more suitable for VFI tasks. Additionally, we propose a curriculum learning strategy to further leverage the potential of the S6 model. Experimental results demonstrate that VFIMamba achieves state-of-the-art performance across various datasets, highlighting in particular the potential of the SSM model for high-resolution VFI tasks.
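The interleaved rearrangement mentioned above can be sketched as follows: tokens from the two frames are alternated along the scan order, so that the S6 scan visits corresponding positions of both frames adjacently. This is a minimal NumPy sketch on raw grids; the actual MSB operates on feature tokens inside the network.

```python
import numpy as np

def interleave_frames(f0, f1):
    """Interleave two token grids of shape (H, W, C) into one sequence
    of length 2*H*W, alternating tokens from each frame."""
    assert f0.shape == f1.shape
    h, w, c = f0.shape
    t0 = f0.reshape(h * w, c)   # row-major scan of frame 0
    t1 = f1.reshape(h * w, c)   # row-major scan of frame 1
    seq = np.empty((2 * h * w, c), dtype=f0.dtype)
    seq[0::2] = t0              # even positions: frame 0 tokens
    seq[1::2] = t1              # odd positions:  frame 1 tokens
    return seq

f0 = np.zeros((2, 2, 1))
f1 = np.ones((2, 2, 1))
seq = interleave_frames(f0, f1)
print(seq[:, 0])  # [0. 1. 0. 1. 0. 1. 0. 1.]
```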
<div align="center"> <img src="figs/main.png" width="800" /> </div>

## :two_hearts: Installation
- CUDA 11.7
- torch 1.13.1
- python 3.10.6
- causal_conv1d 1.0.0
- mamba_ssm 1.0.1
- skimage 0.19.2
- numpy
- opencv-python
- timm
- tqdm
- tensorboard
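The pinned versions above can be collected into a requirements file, sketched below. Note that `causal_conv1d` and `mamba_ssm` need a CUDA toolchain at build time, and `skimage` is installed from PyPI as `scikit-image`.

```shell
cat > requirements.txt <<'EOF'
torch==1.13.1
causal_conv1d==1.0.0
mamba_ssm==1.0.1
scikit-image==0.19.2
numpy
opencv-python
timm
tqdm
tensorboard
EOF
# then, inside a Python 3.10 environment:
# pip install -r requirements.txt
cat requirements.txt
```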
## :sunglasses: Play with Demos
- Download the model checkpoints and put the `ckpt` folder into the root dir. We also support directly importing model weights from HuggingFace; please refer to hf_demo_2x.py.
- Run the following commands to generate 2x and Nx (arbitrary) frame interpolation demos:
We provide two models: an efficient version (VFIMamba-S) and a stronger one (VFIMamba). You can choose the one you need via the `--model` parameter.
### Hugging Face Demo

For the Hugging Face demo, please refer to the code in hf_demo_2x.py.
```shell
python hf_demo_2x.py --model [VFIMamba_S/VFIMamba]  # for 2x interpolation
```
### Manually Load
```shell
python demo_2x.py --model [VFIMamba_S/VFIMamba]        # for 2x interpolation
python demo_Nx.py --n 8 --model [VFIMamba_S/VFIMamba]  # for 8x interpolation
```
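For the Nx demo, `--n 8` produces the seven intermediate frames between the two inputs. Assuming uniform spacing (the usual convention for arbitrary-timestep interpolation), the timestep schedule can be sketched as:

```python
def nx_timesteps(n):
    """Timesteps of the intermediate frames for n-x interpolation:
    n - 1 evenly spaced values strictly between 0 and 1."""
    return [i / n for i in range(1, n)]

print(nx_timesteps(8))  # [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875]
```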
By running the above commands with the model VFIMamba, you should get the following examples by default:
<p float="left"> <img src="figs/out_2x.gif" width="340" /> <img src="figs/out_8x.gif" width="340" /> </p>

You can also use the `scale` parameter to improve performance at higher resolutions: we downsample the input to `scale * shape` to predict the optical flow, then resize it to the original size to perform the other operations. We recommend setting `scale` to 0.5 for 2K frames and 0.25 for 4K frames.
```shell
python demo_2x.py --model VFIMamba --scale 0.5  # for 2K inputs with VFIMamba
```
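The `scale` behaviour described above can be sketched as: estimate flow on frames downsampled by `scale`, resize the flow back to the original resolution, and rescale its values by `1/scale`. This is a NumPy sketch with naive strided downsampling, nearest-neighbour upsampling, and a dummy flow estimator; the real pipeline predicts flow with the model and uses proper interpolation.

```python
import numpy as np

def upsample_flow(flow_lo, factor):
    """Nearest-neighbour upsample a (h, w, 2) flow field by an integer
    factor and rescale the vectors to the new resolution."""
    flow_hi = np.repeat(np.repeat(flow_lo, factor, axis=0), factor, axis=1)
    return flow_hi * factor  # displacements grow with resolution

def predict_flow_at_scale(f0, f1, estimator, scale):
    """Estimate flow at reduced resolution (e.g. scale=0.5 for 2K
    inputs) and bring it back to full resolution."""
    factor = int(round(1 / scale))
    lo0 = f0[::factor, ::factor]  # naive downsampling stand-in
    lo1 = f1[::factor, ::factor]
    return upsample_flow(estimator(lo0, lo1), factor)

# dummy estimator: constant 1-pixel shift at low resolution
est = lambda a, b: np.ones((*a.shape[:2], 2))
f = np.zeros((8, 8))
flow = predict_flow_at_scale(f, f, est, scale=0.5)
print(flow.shape, flow[0, 0])  # (8, 8, 2) [2. 2.]
```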
## :runner: Evaluation
- Download the dataset you need.
- Download the model checkpoints and put the `ckpt` folder into the root dir. We also support directly importing model weights from HuggingFace; please refer to hf_demo_2x.py.

For all benchmarks:
```shell
python benchmark/[dataset].py --model [VFIMamba_S/VFIMamba] --path /where/is/your/[dataset]
```
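The benchmarks report PSNR among their metrics. For reference, a minimal PSNR helper over images in the [0, 255] range; this is a sketch, not the repository's evaluation code.

```python
import numpy as np

def psnr(pred, gt, data_range=255.0):
    """Peak signal-to-noise ratio in dB between two uint8-range images."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10 * np.log10(data_range ** 2 / mse)

gt = np.zeros((4, 4))
pred = np.full((4, 4), 16.0)     # constant error of 16 -> MSE = 256
print(round(psnr(pred, gt), 2))  # 10*log10(255^2/256) ≈ 24.05
```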
You can also test the inference time of our method on an $H\times W$ image with the following command:
```shell
python benchmark/TimeTest.py --model [VFIMamba_S/VFIMamba] --H [SIZE] --W [SIZE]
```
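Inference-time measurements of this kind usually warm the model up first and then average over repeated runs. A minimal timing-harness sketch with a stand-in workload instead of the model (for CUDA models, synchronize the device before reading the clock):

```python
import time

def time_fn(fn, warmup=3, runs=10):
    """Average wall-clock time of fn() in milliseconds, after warmup."""
    for _ in range(warmup):
        fn()  # warmup: caches, allocator, lazy initialization
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1e3

work = lambda: sum(i * i for i in range(10_000))
print(f"{time_fn(work):.3f} ms per run")
```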
## :muscle: Citation
If you find this project helpful in your research or applications, please feel free to leave a star ⭐️ and cite our paper:
```bibtex
@misc{zhang2024vfimambavideoframeinterpolation,
  title={VFIMamba: Video Frame Interpolation with State Space Models},
  author={Guozhen Zhang and Chunxu Liu and Yutao Cui and Xiaotong Zhao and Kai Ma and Limin Wang},
  year={2024},
  eprint={2407.02315},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.02315},
}
```
## :heartpulse: License and Acknowledgement
This project is released under the Apache 2.0 license. The code is based on RIFE, EMA-VFI, MambaIR, and SGM-VFI; please also follow their licenses. Thanks for their awesome work.
