VideoMaMa
Official implementation of "VideoMaMa: Mask-Guided Video Matting via Generative Prior", CVPR 2026
Sangbeom Lim<sup>1</sup> · Seoung Wug Oh<sup>2</sup> · Jiahui Huang<sup>2</sup> · Heeji Yoon<sup>3</sup>
Seungryong Kim<sup>3</sup> · Joon-Young Lee<sup>2</sup>
<sup>1</sup>Korea University <sup>2</sup>Adobe Research <sup>3</sup>KAIST AI
CVPR 2026
<a href="https://arxiv.org/abs/2601.14255"><img src='https://img.shields.io/badge/arXiv-VideoMaMa-red' alt='Paper PDF'></a> <a href='https://cvlab-kaist.github.io/VideoMaMa/'><img src='https://img.shields.io/badge/Project_Page-VideoMaMa-green' alt='Project Page'></a> <a href="https://huggingface.co/spaces/SammyLim/VideoMaMa" target='_blank'> <img src="https://img.shields.io/badge/Demo-%F0%9F%A4%97%20Hugging%20Face-blue"> </a>
<strong>VideoMaMa is a mask-guided video matting framework that leverages a video generative prior. This prior enables stable performance across diverse video domains while preserving fine-grained matting quality.</strong>
<div style="width: 100%; text-align: center; margin:auto;"> <img style="width:100%" src="assets/teaser.jpg"> </div>For more visual results, check out our <a href="https://cvlab-kaist.github.io/VideoMaMa/" target="_blank">project page</a>.
📰 News
VideoMaMa is an open-source project. If you find our work helpful, please consider giving this repository a ⭐.
- 2026-01-19: Our GitHub repo is now open!
- 2026-02-07: ComfyUI-VideoMaMa is now available! (Thanks to @okdalto)
- 2026-03-13: VideoMaMa is now integrated into CorridorKey! (Thanks to @nikopueringer for the incredible work!)
- 2026-04-01: VideoMaMa is now integrated into Sammie-Roto-2! (Thanks to @Zarxrax for using our model!)
🔥 TODO
- [x] Release Demo & Model checkpoint. (Jan 19, 2026)
- [x] Release ArXiv paper. (Jan 19, 2026)
- [x] Release Training Code. (Mar 14, 2026)
- [ ] Release Evaluation Code.
- [ ] Release MA-V dataset.
⚙️ Setup
Please run

```bash
bash scripts/setup.sh
```

This downloads the Stable Video Diffusion weights and sets up the virtual environment needed to run the code; activate it with `conda activate videomama`. The script also downloads SAM2, which is required for training SAM2-Matte.
🎮 Demo
Please check the demo README.
🎯 Inference
The VideoMaMa model checkpoint is available on the Hugging Face Hub: SammyLim/VideoMaMa.
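If you prefer to fetch the checkpoint programmatically, a minimal sketch using `huggingface_hub` might look like the following. The `ensure_checkpoint` helper and the local directory layout are our own illustration, not part of the official scripts:

```python
import os


def ensure_checkpoint(repo_id: str = "SammyLim/VideoMaMa",
                      local_dir: str = "checkpoints/VideoMaMa") -> str:
    """Download the checkpoint from the Hugging Face Hub unless it is already present."""
    if not os.path.isdir(local_dir):
        # Imported lazily so this snippet loads even without huggingface_hub installed.
        from huggingface_hub import snapshot_download
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return local_dir
```

The returned path can then be passed as `--unet_checkpoint_path` to the inference script below.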
To run inference on a folder of video frames, use the following command:

```bash
python inference_onestep_folder.py \
    --base_model_path "<stabilityai/stable-video-diffusion-img2vid-xt_path>" \
    --unet_checkpoint_path "<videomama_checkpoint_path>" \
    --image_root_path "assets/example/image" \
    --mask_root_path "assets/example/mask" \
    --output_dir "assets/example" \
    [--optional_arguments]
```
For example, if you completed setup with the command above, the following will work:

```bash
python inference_onestep_folder.py \
    --base_model_path "checkpoints/stable-video-diffusion-img2vid-xt" \
    --unet_checkpoint_path "checkpoints/VideoMaMa" \
    --image_root_path "assets/example/image" \
    --mask_root_path "assets/example/mask" \
    --output_dir "assets/example" \
    --keep_aspect_ratio
```
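To batch inference over many clips from Python, the same command can be assembled programmatically. The sketch below simply mirrors the flags shown above; the `build_inference_cmd` helper is our own and the paths are illustrative:

```python
import subprocess


def build_inference_cmd(image_root: str, mask_root: str, output_dir: str,
                        base_model: str = "checkpoints/stable-video-diffusion-img2vid-xt",
                        unet_ckpt: str = "checkpoints/VideoMaMa",
                        keep_aspect_ratio: bool = True) -> list[str]:
    """Assemble the argument list for inference_onestep_folder.py."""
    cmd = [
        "python", "inference_onestep_folder.py",
        "--base_model_path", base_model,
        "--unet_checkpoint_path", unet_ckpt,
        "--image_root_path", image_root,
        "--mask_root_path", mask_root,
        "--output_dir", output_dir,
    ]
    if keep_aspect_ratio:
        cmd.append("--keep_aspect_ratio")
    return cmd


# Example (loop over clip folders for batch processing):
# subprocess.run(build_inference_cmd("assets/example/image",
#                                    "assets/example/mask",
#                                    "assets/example"), check=True)
```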
For more information about inference settings, please check the inference README.
🚂🚃🚃🚃🚃 Training
Generating training dataset
Please check the data pipeline README.
Model Training
Please check the training README.
<!-- # Evaluation Please check [evaluation readme](evaluation/README.md). -->

🎓 Citation
```bibtex
@article{lim2026videomama,
  title={VideoMaMa: Mask-Guided Video Matting via Generative Prior},
  author={Lim, Sangbeom and Oh, Seoung Wug and Huang, Jiahui and Yoon, Heeji and Kim, Seungryong and Lee, Joon-Young},
  journal={arXiv preprint arXiv:2601.14255},
  year={2026}
}
```
🙏 Acknowledgments
- SAM2: Meta AI's Segment Anything 2
- Stable Video Diffusion: Stability AI's video generation model
- Gradio: For the amazing UI framework
📧 Contact
For questions or issues, please open an issue on our GitHub repository.
We welcome any feedback, questions, or opportunities for collaboration. If you are interested in using this model for industrial applications, or have specific questions about the architecture and training, please feel free to reach out.
📄 License
The code in this repository is released under the CC BY-NC 4.0 license, unless otherwise specified.
This repository builds on implementations and ideas from the Hugging Face ecosystem and the diffusion-e2e-ft project. Many thanks to the original authors and contributors for their open-source work.
The VideoMaMa model checkpoints (specifically VideoMama/unet/* and dino_projection_mlp.pth) are subject to the Stability AI Community License.
