SnapMoGen: Human Motion Generation from Expressive Texts [NeurIPS 2025]
<p align="left"> <a href='https://www.arxiv.org/abs/2507.09122'> <img src='https://img.shields.io/badge/Arxiv-Pdf-A42C25?style=flat&logo=arXiv&logoColor=white'></a> <a href='https://snap-research.github.io/SnapMoGen/'> <img src='https://img.shields.io/badge/Project-Page-green?style=flat&logo=Google%20chrome&logoColor=white'></a> <a href='https://huggingface.co/datasets/Ericguo5513/SnapMoGen'> <img src='https://img.shields.io/badge/Dataset-SnapMoGen-blue'></a> </p>
If you find our code or paper helpful, please consider starring this repository and citing the following:
    @misc{snapmogen2025,
      title={SnapMoGen: Human Motion Generation from Expressive Texts},
      author={Chuan Guo and Inwoo Hwang and Jian Wang and Bing Zhou},
      year={2025},
      eprint={2507.09122},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.09122},
    }
:postbox: News
📢 2025-09-25 --- SnapMoGen was accepted to NeurIPS 2025.
📢 2025-07-29 --- Released code and dataset.
📢 2025-07-20 --- Initialized the webpage and git project.
:round_pushpin: Getting Started
1.1 Set Up Conda Environment
conda env create -f environment.yml
conda activate momask-plus
🔁 Alternative: Pip Installation
If you encounter issues with Conda, you can install the dependencies using pip:
pip install -r requirements.txt
✅ Tested on Python 3.8.20.
1.2 Models and Dependencies
Download Pre-trained Models
bash prepare/download_models.sh
Download Evaluation Models and Gloves
(For evaluation only.)
bash prepare/download_evaluators.sh
bash prepare/download_glove.sh
Troubleshooting
If gdown fails with the error "Cannot retrieve the public link of the file. You may need to change the permission to 'Anyone with the link', or have had many accesses", try running `pip install --upgrade --no-cache-dir gdown`, as suggested in https://github.com/wkentaro/gdown/issues/43.
(Optional) Download Manually
Visit [Google Drive] to download the models and evaluators manually.
1.3 Download the Datasets
HumanML3D - Follow the instructions in HumanML3D, then copy the dataset to your data folder:
cp -r ./HumanML3D/ your_data_folder/HumanML3D
SnapMoGen - Download the data from huggingface, then place it in the following directory:
cp -r ./SnapMoGen your_data_folder/SnapMoGen
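After both copies, your data folder should contain the two dataset subfolders (`your_data_folder` is whatever path you choose; the contents of each subfolder follow the respective download instructions):

```
your_data_folder/
├── HumanML3D/
└── SnapMoGen/
```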
:rocket: Play with MoMask++
Remember to update `data.root_dir` in all the `config/*.yaml` files with your own data directory path.
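For example, the entry to edit looks roughly like this (the surrounding structure is a sketch; only the `data.root_dir` key is named in this README):

```yaml
# Sketch of the relevant part of a config/*.yaml file -- surrounding keys are illustrative
data:
  root_dir: /path/to/your_data_folder   # point this at your own data directory
```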
2.1 Motion Generation
To generate motion from your own text prompts, use:
python gen_momask_plus.py
You can modify the inference configuration (e.g., number of diffusion steps, guidance scale, etc.) in config/eval_momaskplus.yaml.
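As a rough illustration (the key names below are hypothetical; check `config/eval_momaskplus.yaml` for the actual schema), the kind of settings you would tweak look like:

```yaml
# Hypothetical excerpt -- the real key names in config/eval_momaskplus.yaml may differ
num_inference_steps: 50   # number of diffusion steps
guidance_scale: 4.0       # classifier-free guidance scale
```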
2.2 Evaluation
Run the following scripts for quantitative evaluation:
python eval_momask_plus_hml.py # Evaluate on HumanML3D dataset
python eval_momask_plus.py # Evaluate on SnapMoGen dataset
2.3 Training
MoMask++ has two main components: a multi-scale residual motion VQVAE and a generative masked Transformer.
All checkpoints will be stored under `/checkpoint_dir`.
Multi-scale Motion RVQVAE
python train_rvq_hml.py # Train RVQVAE on HumanML3D
python train_rvq.py # Train RVQVAE on SnapMoGen
Configuration files:
- `config/residual_vqvae_hml.yaml` (for HumanML3D)
- `config/residual_vqvae.yaml` (for SnapMoGen)
Generative Masked Transformer
python train_momask_plus_hml.py # Train on HumanML3D
python train_momask_plus.py # Train on SnapMoGen
Configuration files:
- `config/train_momaskplus_hml.yaml` (for HumanML3D)
- `config/train_momaskplus.yaml` (for SnapMoGen)
Remember to change `vq_name` and `vq_ckpt` to your VQ name and VQ checkpoint in these two configuration files. A training accuracy of around 0.25 is normal.
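For example, in `config/train_momaskplus.yaml` (the values below are placeholders; use the experiment name and checkpoint path from your own RVQVAE training run):

```yaml
# Placeholder values -- point these at the RVQVAE you trained in the previous step
vq_name: my_rvq_snapmogen              # name of your RVQVAE experiment (placeholder)
vq_ckpt: path/to/your_rvq_checkpoint   # path to its checkpoint (placeholder)
```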
Global Motion Refinement
We use a separate lightweight root motion regressor to refine the root trajectory. The regressor is trained to predict root linear velocities from local motion features. During motion generation, we use it to re-predict the root trajectory of the generated motion, which effectively reduces foot sliding.
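A minimal sketch of what such a regressor could look like (the actual architecture, feature dimensions, and training code live in this repo; the layer sizes and names below are illustrative only):

```python
import torch
import torch.nn as nn

class RootVelocityRegressor(nn.Module):
    """Illustrative sketch: predicts per-frame root linear velocity from local motion features."""

    def __init__(self, local_feat_dim=256, hidden_dim=128):
        super().__init__()
        # Small temporal convolutional model; the regressor used in this repo may differ.
        self.net = nn.Sequential(
            nn.Conv1d(local_feat_dim, hidden_dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(hidden_dim, 2, kernel_size=1),  # root linear velocity on the ground plane
        )

    def forward(self, local_feats):
        # local_feats: (batch, frames, local_feat_dim)
        vel = self.net(local_feats.transpose(1, 2)).transpose(1, 2)  # (batch, frames, 2)
        # Integrate the predicted velocities to re-predict the root trajectory.
        traj = torch.cumsum(vel, dim=1)
        return vel, traj

# Usage example with dummy features
model = RootVelocityRegressor()
feats = torch.randn(1, 120, 256)   # (batch, frames, local feature dim)
vel, traj = model(feats)
```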
:clapper: Visualization
All animations were manually rendered in Blender using Bitmoji characters.
An example character is available here, and we use this Blender scene for animation rendering.
Retargeting
We recommend using the Rokoko Blender add-on (v1.4.1) for seamless motion retargeting.
⚠️ Note: All motions in SnapMoGen use T-Pose as the rest pose.
If your character rig is in A-Pose, use rest_pose_retarget.py to convert between T-Pose and A-Pose rest poses.
Acknowledgements
We sincerely thank the authors of the following open-source works, on which our code is based:
MoMask, VAR, deep-motion-editing, Muse, vector-quantize-pytorch, T2M-GPT, MDM and MLD
Misc
Contact guochuan5513@gmail.com for further questions.
