VideoMaMa

Official implementation of "VideoMaMa: Mask-Guided Video Matting via Generative Prior", CVPR 2026

<div align="center"> <h1>VideoMaMa: Mask-Guided Video Matting via Generative Prior</h1>

Sangbeom Lim<sup>1</sup> · Seoung Wug Oh<sup>2</sup> · Jiahui Huang<sup>2</sup> · Heeji Yoon<sup>3</sup>
Seungryong Kim<sup>3</sup> · Joon-Young Lee<sup>2</sup>

<sup>1</sup>Korea University    <sup>2</sup>Adobe Research    <sup>3</sup>KAIST AI

CVPR 2026

<a href="https://arxiv.org/abs/2601.14255"><img src='https://img.shields.io/badge/arXiv-VideoMaMa-red' alt='Paper PDF'></a> <a href='https://cvlab-kaist.github.io/VideoMaMa/'><img src='https://img.shields.io/badge/Project_Page-VideoMaMa-green' alt='Project Page'></a> <a href="https://huggingface.co/spaces/SammyLim/VideoMaMa" target='_blank'> <img src="https://img.shields.io/badge/Demo-%F0%9F%A4%97%20Hugging%20Face-blue"> </a>

<strong>VideoMaMa is a mask-guided video matting framework that leverages a video generative prior. By utilizing this prior, it supports stable performance across diverse video domains with fine-grained matting quality.</strong>

<div style="width: 100%; text-align: center; margin:auto;"> <img style="width:100%" src="assets/teaser.jpg"> </div>

For more visual results, check out our <a href="https://cvlab-kaist.github.io/VideoMaMa/" target="_blank">project page</a>.

</div>
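Conceptually, a video matte is a per-frame alpha map: each pixel's alpha in [0, 1] says how to blend the foreground over any background, via I = αF + (1 − α)B. A minimal NumPy sketch of this compositing step (illustrative only, not part of this repository):

```python
import numpy as np

def composite(fg, bg, alpha):
    """Composite fg over bg using an alpha matte.

    fg, bg: (H, W, 3) float arrays; alpha: (H, W) in [0, 1].
    Implements I = alpha * F + (1 - alpha) * B per pixel.
    """
    alpha = alpha[..., None]  # broadcast over the channel axis
    return alpha * fg + (1.0 - alpha) * bg

fg = np.full((2, 2, 3), 255.0)   # white foreground
bg = np.zeros((2, 2, 3))         # black background
alpha = np.array([[1.0, 0.5],
                  [0.0, 0.25]])  # fractional values, e.g. hair or motion blur
out = composite(fg, bg, alpha)   # out[0, 1] blends to 127.5
```

Fractional alpha values (hair strands, motion blur, translucency) are exactly what distinguishes matting from the binary segmentation masks used as guidance.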

📰 News

VideoMaMa is an open-source project. If you find our work helpful, please consider giving this repository a ⭐.

🔥 TODO

  • [x] Release Demo & Model checkpoint. (Jan 19, 2026)
  • [x] Release ArXiv paper. (Jan 19, 2026)
  • [x] Release Training Code. (Mar 14, 2026)
  • [ ] Release Evaluation Code.
  • [ ] Release MA-V dataset.

⚙️ Setup

Please run

bash scripts/setup.sh

This downloads the Stable Video Diffusion weights and sets up the virtual environment needed to run the code. Activate the environment with `conda activate videomama`.

It also downloads SAM2, which is needed for training sam2-matte.

🎮 Demo

Please check the demo README.

🎯 Inference

Hugging Face Model Card

The VideoMaMa model checkpoint is available on the Hugging Face Hub: SammyLim/VideoMaMa.

To run inference on videos, use this command:

python inference_onestep_folder.py \
--base_model_path "<stabilityai/stable-video-diffusion-img2vid-xt_path>" \
--unet_checkpoint_path "<videomama_checkpoint_path>" \
--image_root_path "assets/example/image" \
--mask_root_path "assets/example/mask" \
--output_dir "assets/example" \
[--optional_arguments]

For example, if you have completed setup with the command above, the following will work:

python inference_onestep_folder.py \
    --base_model_path "checkpoints/stable-video-diffusion-img2vid-xt" \
    --unet_checkpoint_path "checkpoints/VideoMaMa" \
    --image_root_path "assets/example/image" \
    --mask_root_path "assets/example/mask" \
    --output_dir "assets/example" \
    --keep_aspect_ratio 

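Judging from the example paths, image and mask frames appear to share filenames under their respective root folders (this pairing convention is an assumption, and `check_pairs` below is a hypothetical helper, not part of the repo). A quick stdlib check can catch mismatched folders before launching a long inference run:

```python
from pathlib import Path

def check_pairs(image_root, mask_root, exts=(".png", ".jpg", ".jpeg")):
    """Return frame stems missing a mask, and mask stems missing an image.

    Assumes image and mask frames share filename stems (e.g. 0001.png in
    both folders); adjust if your naming scheme differs.
    """
    images = {p.stem for p in Path(image_root).iterdir()
              if p.suffix.lower() in exts}
    masks = {p.stem for p in Path(mask_root).iterdir()
             if p.suffix.lower() in exts}
    return sorted(images - masks), sorted(masks - images)
```

If either returned list is non-empty, fix the folders before passing them to `--image_root_path` and `--mask_root_path`.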
For more information about inference settings, please check the inference README.

🚂🚃🚃🚃🚃 Training

Generating training dataset

Please check the data pipeline README.

Model Training

Please check the training README.

<!-- # Evaluation Please check [evaluation readme](evaluation/README.md). -->

🎓 Citation

@article{lim2026videomama,
  title={VideoMaMa: Mask-Guided Video Matting via Generative Prior},
  author={Lim, Sangbeom and Oh, Seoung Wug and Huang, Jiahui and Yoon, Heeji and Kim, Seungryong and Lee, Joon-Young},
  journal={arXiv preprint arXiv:2601.14255},
  year={2026}
}

🙏 Acknowledgments

  • SAM2: Meta AI's Segment Anything 2
  • Stable Video Diffusion: Stability AI's video generation model
  • Gradio: For the amazing UI framework

📧 Contact

For questions or issues, please open an issue on our GitHub repository.

We welcome any feedback, questions, or opportunities for collaboration. If you are interested in using this model for industrial applications, or have specific questions about the architecture and training, please feel free to reach out.

📄 License

The code in this repository is released under the CC BY-NC 4.0 license, unless otherwise specified.

This repository builds on implementations and ideas from the Hugging Face ecosystem and the diffusion-e2e-ft project. Many thanks to the original authors and contributors for their open-source work.

The VideoMaMa model checkpoints (specifically VideoMama/unet/* and dino_projection_mlp.pth) are subject to the Stability AI Community License.
