SnapMoGen: Human Motion Generation from Expressive Texts [NeurIPS 2025]
<p align="left"> <a href='https://www.arxiv.org/abs/2507.09122'> <img src='https://img.shields.io/badge/Arxiv-Pdf-A42C25?style=flat&logo=arXiv&logoColor=white'></a> <a href='https://snap-research.github.io/SnapMoGen/'> <img src='https://img.shields.io/badge/Project-Page-green?style=flat&logo=Google%20chrome&logoColor=white'></a> <a href='https://huggingface.co/datasets/Ericguo5513/SnapMoGen'> <img src='https://img.shields.io/badge/Dataset-SnapMoGen-blue'></a> </p>
If you find our code or paper helpful, please consider starring this repository and citing the following:
    @misc{snapmogen2025,
      title={SnapMoGen: Human Motion Generation from Expressive Texts},
      author={Chuan Guo and Inwoo Hwang and Jian Wang and Bing Zhou},
      year={2025},
      eprint={2507.09122},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.09122},
    }
:postbox: News
📢 2025-09-25 --- SnapMoGen was accepted to NeurIPS 2025.
📢 2025-07-29 --- Released code and dataset.
📢 2025-07-20 --- Initialized the webpage and git project.
:round_pushpin: Getting Started
1.1 Set Up Conda Environment
conda env create -f environment.yml
conda activate momask-plus
🔁 Alternative: Pip Installation
If you encounter issues with Conda, you can install the dependencies using pip:
pip install -r requirements.txt
✅ Tested on Python 3.8.20.
1.2 Models and Dependencies
Download Pre-trained Models
bash prepare/download_models.sh
Download Evaluation Models and Gloves
(For evaluation only.)
bash prepare/download_evaluators.sh
bash prepare/download_glove.sh
Troubleshooting
If gdown fails with the error "Cannot retrieve the public link of the file. You may need to change the permission to 'Anyone with the link', or have had many accesses", try running `pip install --upgrade --no-cache-dir gdown`, as suggested in https://github.com/wkentaro/gdown/issues/43.
(Optional) Download Manually
Visit [Google Drive] to download the models and evaluators manually.
1.3 Download the Datasets
HumanML3D - Follow the instructions in HumanML3D, then copy the dataset to your data folder:
cp -r ./HumanML3D/ your_data_folder/HumanML3D
SnapMoGen - Download the data from huggingface, then place it in the following directory:
cp -r ./SnapMoGen your_data_folder/SnapMoGen
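After both copies, your data folder should contain the two dataset subfolders (`your_data_folder` is whatever path you choose; the contents of each subfolder follow the respective download instructions):

```
your_data_folder/
├── HumanML3D/
└── SnapMoGen/
```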
:rocket: Play with MoMask++
Remember to update `data.root_dir` in all the `config/*.yaml` files with your own data directory path.
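For example, the entry to edit looks roughly like this (the surrounding structure is a sketch; only the `data.root_dir` key is named in this README):

```yaml
# Sketch of the relevant part of a config/*.yaml file -- surrounding keys are illustrative
data:
  root_dir: /path/to/your_data_folder   # point this at your own data directory
```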
2.1 Motion Generation
To generate motion from your own text prompts, use:
python gen_momask_plus.py
You can modify the inference configuration (e.g., number of diffusion steps, guidance scale, etc.) in config/eval_momaskplus.yaml.
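As a rough illustration (the key names below are hypothetical; check `config/eval_momaskplus.yaml` for the actual schema), the kind of settings you would tweak look like:

```yaml
# Hypothetical excerpt -- the real key names in config/eval_momaskplus.yaml may differ
num_inference_steps: 50   # number of diffusion steps
guidance_scale: 4.0       # classifier-free guidance scale
```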
2.2 Evaluation
Run the following scripts for quantitative evaluation:
python eval_momask_plus_hml.py # Evaluate on HumanML3D dataset
python eval_momask_plus.py # Evaluate on SnapMoGen dataset
2.3 Training
MoMask++ has two main components: a multi-scale residual motion VQVAE and a generative masked Transformer.
All checkpoints will be stored under `/checkpoint_dir`.
Multi-scale Motion RVQVAE
python train_rvq_hml.py # Train RVQVAE on HumanML3D
python train_rvq.py # Train RVQVAE on SnapMoGen
Configuration files:
- `config/residual_vqvae_hml.yaml` (for HumanML3D)
- `config/residual_vqvae.yaml` (for SnapMoGen)
Generative Masked Transformer
python train_momask_plus_hml.py # Train on HumanML3D
python train_momask_plus.py # Train on SnapMoGen
Configuration files:
- `config/train_momaskplus_hml.yaml` (for HumanML3D)
- `config/train_momaskplus.yaml` (for SnapMoGen)
Remember to change `vq_name` and `vq_ckpt` to your VQ name and VQ checkpoint in these two configuration files. A training accuracy of around 0.25 is normal.
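For example, in `config/train_momaskplus.yaml` (the values below are placeholders; use the experiment name and checkpoint path from your own RVQVAE training run):

```yaml
# Placeholder values -- point these at the RVQVAE you trained in the previous step
vq_name: my_rvq_snapmogen              # name of your RVQVAE experiment (placeholder)
vq_ckpt: path/to/your_rvq_checkpoint   # path to its checkpoint (placeholder)
```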
Global Motion Refinement
We use a separate lightweight root motion regressor to refine the root trajectory. The regressor is trained to predict root linear velocities from local motion features. During motion generation, we use it to re-predict the root trajectory of the generated motion, which effectively reduces foot sliding.
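A minimal sketch of what such a regressor could look like (the actual architecture, feature dimensions, and training code live in this repo; the layer sizes and names below are illustrative only):

```python
import torch
import torch.nn as nn

class RootVelocityRegressor(nn.Module):
    """Illustrative sketch: predicts per-frame root linear velocity from local motion features."""

    def __init__(self, local_feat_dim=256, hidden_dim=128):
        super().__init__()
        # Small temporal convolutional model; the regressor used in this repo may differ.
        self.net = nn.Sequential(
            nn.Conv1d(local_feat_dim, hidden_dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(hidden_dim, 2, kernel_size=1),  # root linear velocity on the ground plane
        )

    def forward(self, local_feats):
        # local_feats: (batch, frames, local_feat_dim)
        vel = self.net(local_feats.transpose(1, 2)).transpose(1, 2)  # (batch, frames, 2)
        # Integrate the predicted velocities to re-predict the root trajectory.
        traj = torch.cumsum(vel, dim=1)
        return vel, traj

# Usage example with dummy features
model = RootVelocityRegressor()
feats = torch.randn(1, 120, 256)   # (batch, frames, local feature dim)
vel, traj = model(feats)
```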
:clapper: Visualization
All animations were manually rendered in Blender using Bitmoji characters.
An example character is available here, and we use this Blender scene for animation rendering.
Retargeting
We recommend using the Rokoko Blender add-on (v1.4.1) for seamless motion retargeting.
⚠️ Note: All motions in SnapMoGen use T-Pose as the rest pose.
If your character rig is in A-Pose, use rest_pose_retarget.py to convert between T-Pose and A-Pose rest poses.
Acknowledgements
We sincerely thank the authors of the following open-source works, on which our code is based:
MoMask, VAR, deep-motion-editing, Muse, vector-quantize-pytorch, T2M-GPT, MDM and MLD
Misc
Contact guochuan5513@gmail.com for further questions.
