MatAnyone
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
<strong>MatAnyone is a practical human video matting framework supporting target assignment, with stable performance in both semantics of core regions and fine-grained boundary details.</strong>
<div style="width: 100%; text-align: center; margin:auto;"> <img style="width:100%" src="assets/teaser.jpg"> </div>

:movie_camera: For more visual results, go check out our <a href="https://pq-yang.github.io/projects/MatAnyone/" target="_blank">project page</a>
🚀 Please check out our new release on MatAnyone 2
📮 Update
- [2026.03] Release the training code (see TRAIN.md).
- [2025.07] Update the Evaluation section with scripts and instructions.
- [2025.03] Release our evaluation benchmark - YouTubeMatte.
- [2025.03] Integrate MatAnyone with Hugging Face 🤗
- [2025.02] Release inference codes and gradio demo.
- [2025.02] This repo is created.
🔎 Overview

🔧 Installation
1. Clone Repo

   ```shell
   git clone https://github.com/pq-yang/MatAnyone
   cd MatAnyone
   ```

2. Create Conda Environment and Install Dependencies

   ```shell
   # create new conda env
   conda create -n matanyone python=3.8 -y
   conda activate matanyone

   # install python dependencies
   pip install -e .

   # [optional] install python dependencies for gradio demo
   pip3 install -r hugging_face/requirements.txt
   ```
🤗 Load from Hugging Face
Alternatively, models can be loaded directly from Hugging Face for inference.

```shell
pip install -q git+https://github.com/pq-yang/MatAnyone
```
To extract the foreground and the alpha video you can directly run the following lines. Please refer to inference_hf.py for more arguments.
```python
from matanyone import InferenceCore

processor = InferenceCore("PeiqingYang/MatAnyone")

foreground_path, alpha_path = processor.process_video(
    input_path="inputs/video/test-sample1.mp4",
    mask_path="inputs/mask/test-sample1.png",
    output_path="outputs",
)
```
🔥 Inference
Download Model
Download our pretrained model from MatAnyone v1.0.0 to the pretrained_models folder (the pretrained model can also be downloaded automatically during the first inference run).
The directory structure will be arranged as:
```
pretrained_models
|- matanyone.pth
```
Quick Test
We provide some examples in the inputs folder. For each run, we take a video and its first-frame segmentation mask as input. <u>The segmentation mask could be obtained from interactive segmentation models such as the SAM2 demo</u>. For example, the directory structure can be arranged as:
```
inputs
|- video
   |- test-sample0        # folder containing all frames
   |- test-sample1.mp4    # .mp4, .mov, .avi
|- mask
   |- test-sample0_1.png  # mask for person 1
   |- test-sample0_2.png  # mask for person 2
   |- test-sample1.png
```
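The first-frame masks are expected to be single-channel binary images (foreground white, background black). Segmentation tools often export anti-aliased edges, so a quick cleanup can help; here is a minimal NumPy sketch (`binarize_mask` is a hypothetical helper name, not part of the repo):

```python
import numpy as np

def binarize_mask(gray, thresh=127):
    # Hypothetical helper: force a grayscale mask to strict {0, 255} values,
    # e.g. to clean up anti-aliased edges exported by a segmentation tool.
    return np.where(gray > thresh, 255, 0).astype(np.uint8)

# A 4x4 grayscale mask with soft (anti-aliased) edge values
soft = np.array([[0, 40, 200, 255],
                 [0, 90, 230, 255],
                 [0, 10, 180, 255],
                 [0,  0, 140, 255]], dtype=np.uint8)
clean = binarize_mask(soft)
assert set(np.unique(clean)) <= {0, 255}
```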
Run the following command to try it out:
```shell
## single target
# short video; 720p
python inference_matanyone.py -i inputs/video/test-sample1.mp4 -m inputs/mask/test-sample1.png
# short video; 1080p
python inference_matanyone.py -i inputs/video/test-sample2.mp4 -m inputs/mask/test-sample2.png
# long video; 1080p
python inference_matanyone.py -i inputs/video/test-sample3.mp4 -m inputs/mask/test-sample3.png

## multiple targets (control by mask)
# obtain matte for target 1
python inference_matanyone.py -i inputs/video/test-sample0 -m inputs/mask/test-sample0_1.png --suffix target1
# obtain matte for target 2
python inference_matanyone.py -i inputs/video/test-sample0 -m inputs/mask/test-sample0_2.png --suffix target2
```
The results will be saved in the results folder, including the foreground output video and the alpha output video.
- If you want to save the results as per-frame images, you can set `--save_image`.
- If you want to set a limit for the maximum input resolution, you can set `--max_size`; the video will be downsampled if min(w, h) exceeds that value. By default, we do not set a limit.
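The `--max_size` behavior described above can be sketched as follows (a simplified re-implementation for illustration only; the actual script may round or align dimensions differently):

```python
def apply_max_size(w, h, max_size=None):
    # If min(w, h) exceeds max_size, scale the frame down so that the
    # shorter side equals max_size, keeping the aspect ratio.
    # By default (max_size=None), no limit is applied.
    if max_size is None or min(w, h) <= max_size:
        return w, h
    scale = max_size / min(w, h)
    return round(w * scale), round(h * scale)

# A 4K input with --max_size 1080 is downsampled to 1080p:
assert apply_max_size(3840, 2160, max_size=1080) == (1920, 1080)
# Inputs already within the limit are left untouched:
assert apply_max_size(1280, 720, max_size=1080) == (1280, 720)
```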
🎪 Interactive Demo
To save you the preparation of a first-frame segmentation mask, we provide a Gradio demo on Hugging Face, which can also be launched locally. Just drop in your video/image, assign the target masks with a few clicks, and get the matting results!
```shell
cd hugging_face

# install python dependencies
pip3 install -r requirements.txt # FFmpeg required

# launch the demo
python app.py
```
Once launched, an interactive interface will appear as follows:

👩🏻💻 Training
Please refer to TRAIN.md for instructions.
📊 Evaluation
YouTubeMatte Dataset
We provide a synthetic benchmark, YouTubeMatte, to enlarge the commonly used VideoMatte240K-Test. A comparison between them is summarized in the table below.
| Dataset             | #Foregrounds |       Source      | Harmonized |
| :------------------ | :----------: | :---------------: | :--------: |
| VideoMatte240K-Test | 5            | Purchased Footage | ❌         |
| YouTubeMatte        | 32           | YouTube Videos    | ✅         |
It is noteworthy that we applied harmonization (using Harmonizer) when compositing the foreground on a background. Such an operation effectively makes YouTubeMatte a more challenging benchmark that is closer to the real distribution. As shown in the figure below, while RVM is confused by the harmonized frame, our method still yields robust performance.
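The compositing step itself is standard alpha blending; harmonization then adjusts the foreground colors to match the background before blending (Harmonizer is a learned model and is not reproduced here). The blending can be sketched with NumPy:

```python
import numpy as np

# Toy 2x2 RGB foreground/background and an alpha matte in [0, 1]
fg = np.full((2, 2, 3), 200.0)
bg = np.full((2, 2, 3), 50.0)
alpha = np.array([[1.0, 0.0],
                  [0.5, 0.25]])

# Standard compositing equation: C = alpha * F + (1 - alpha) * B
comp = alpha[..., None] * fg + (1.0 - alpha[..., None]) * bg

assert np.allclose(comp[0, 0], 200.0)  # pure foreground
assert np.allclose(comp[0, 1], 50.0)   # pure background
assert np.allclose(comp[1, 0], 125.0)  # 50/50 blend
```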

Metric Calculation
📦 We provide the inference results with MatAnyone on the YouTubeMatte benchmark here.
To reproduce the quantitative results of YouTubeMatte reported in the paper, we provide the batch inference scripts and evaluation scripts under the ./evaluation folder. We also provide the first-frame segmentation masks we used for evaluation here. To run the evaluation scripts, your files should be arranged as:
```
data
|- YouTubeMatte_first_frame_seg_mask  # for inference only
|- YouTubeMatte
   |- youtubematte_512x288
   |- youtubematte_1920x1080
|- results
   |- youtubematte_512x288
   |- youtubematte_1920x1080
```
Empirically, for low-resolution (youtubematte_512x288) and high-resolution (youtubematte_1920x1080) data, we set different hyperparameter values for --warmup, --erode_kernel, and --dilate_kernel.
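For intuition, `--erode_kernel` and `--dilate_kernel` shrink or grow the foreground region of the first-frame mask before propagation. Their effect can be sketched with a minimal NumPy-only binary morphology (a simplified stand-in; the actual implementation may differ):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _morph(mask, k, op):
    # Minimal binary morphology with a k x k square kernel:
    # pad with background, then take min (erode) or max (dilate)
    # over each k x k neighborhood.
    pad = k // 2
    padded = np.pad(mask, pad, constant_values=0)
    windows = sliding_window_view(padded, (k, k))
    return op(windows, axis=(-1, -2))

def erode(mask, k):   # shrink the foreground region
    return _morph(mask, k, np.min)

def dilate(mask, k):  # grow the foreground region
    return _morph(mask, k, np.max)

mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:4, 1:4] = 1                   # 3x3 foreground block
assert erode(mask, 3).sum() == 1     # only the center pixel survives
assert dilate(mask, 3).sum() == 25   # grows to fill the 5x5 grid
```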
```shell
# lr: youtubematte_512x288
bash evaluation/infer_batch_lr.sh
python evaluation/eval_yt_lr.py \
    --pred-dir ./data/results/youtubematte_512x288 \
    --true-dir ./data/YouTubeMatte/youtubematte_512x288

# hr: youtubematte_1920x1080
bash evaluation/infer_batch_hr.sh
python evaluation/eval_yt_hr.py \
    --pred-dir ./data/results/youtubematte_1920x1080 \
    --true-dir ./data/YouTubeMatte/youtubematte_1920x1080
```
Similarly, the quantitative results on VideoMatte reported in the paper can be reproduced in the same way, using the first-frame segmentation masks here.
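For reference, the core matting metrics (e.g. MAD and MSE between predicted and ground-truth alpha) reduce to simple per-pixel statistics. A sketch below; the actual evaluation scripts may apply scaling (e.g. ×1e3) and report additional metrics:

```python
import numpy as np

def mad(pred, true):
    # Mean Absolute Difference between predicted and ground-truth alpha
    return np.mean(np.abs(pred - true))

def mse(pred, true):
    # Mean Squared Error between predicted and ground-truth alpha
    return np.mean((pred - true) ** 2)

pred = np.full((4, 4), 0.25)
true = np.full((4, 4), 0.75)
assert np.isclose(mad(pred, true), 0.5)
assert np.isclose(mse(pred, true), 0.25)
```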
📦 We also provide the inference results with MatAnyone on the VideoMatte benchmark [here](https://drive.google.com/drive/folders/1SN_7J9P-YxuI-e6QP6AO).
