# MotionGPT
[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
| Teaser Video | Demo Video |
| :---: | :---: |
| <video src="https://github.com/OpenMotionLab/MotionGPT/assets/120085716/a741e162-b2f4-4f65-af8e-aa19c4115a9e" /> | <video src="https://github.com/OpenMotionLab/MotionGPT/assets/120085716/ae966d17-6326-43e6-8d5b-8562cf3ffd52" /> |
</div>

## 🏃 Intro MotionGPT
MotionGPT is a unified and user-friendly motion-language model to learn the semantic coupling of two modalities and generate high-quality motions and text descriptions on multiple motion tasks.
<details> <summary><b>Technical details</b></summary>

Though pre-trained large language models continue to advance, building a unified model for language and other multi-modal data, such as motion, remains challenging and largely unexplored. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that enhances the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ discrete vector quantization for human motion and transfer 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performance on multiple motion tasks including text-driven motion generation, motion captioning, motion prediction, and motion in-between.
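The "motion vocabulary" idea above amounts to nearest-neighbor vector quantization: each motion frame (or encoded feature vector) is mapped to the index of its closest codebook entry, turning continuous motion into discrete tokens that a language model can consume. A minimal sketch in plain Python with a hand-made toy codebook (MotionGPT learns its codebook with a VQ-VAE):

```python
# Toy illustration of vector-quantizing motion features into discrete tokens.
# The 2-D codebook below is hand-made for illustration only.

def quantize(frame, codebook):
    """Return the index of the codebook entry closest to `frame` (squared L2)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(frame, codebook[i]))

def motion_to_tokens(frames, codebook):
    """Map a sequence of motion feature vectors to a sequence of token ids."""
    return [quantize(f, codebook) for f in frames]

codebook = [
    [0.0, 0.0],   # token 0
    [1.0, 0.0],   # token 1
    [0.0, 1.0],   # token 2
]

frames = [[0.1, -0.1], [0.9, 0.2], [0.1, 1.1]]
print(motion_to_tokens(frames, codebook))  # → [0, 1, 2]
```

Once motion is a token sequence, it can share a vocabulary with text tokens, which is what lets a single language model handle both modalities.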
<img width="1194" alt="pipeline" src="./assets/images/pipeline.png"> </details>

## 🚩 News
- [2025/06/30] Release 🔥<a href="https://motiongpt3.github.io/"> MotionGPT3 </a>🔥 A bimodal motion-language framework using MoT architecture.
- [2023/09/22] MotionGPT got accepted by NeurIPS 2023
- [2023/09/11] Release the <a href="https://huggingface.co/spaces/OpenMotionLab/MotionGPT">huggingface demo</a> 🔥🔥🔥
- [2023/09/09] Release the training of MotionGPT V1.0 🔥🔥🔥
- [2023/06/20] Upload paper and init project
## ⚡ Quick Start
<details> <summary><b>Setup and download</b></summary>

1. Conda environment

```bash
conda create python=3.10 --name mgpt
conda activate mgpt
```

Install the packages in requirements.txt and install PyTorch 2.0:

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

We test our code on Python 3.10.6 and PyTorch 2.0.0.
2. Dependencies

Run the scripts to download dependency materials:

```bash
bash prepare/download_smpl_model.sh
bash prepare/prepare_t5.sh
```

For text-to-motion evaluation:

```bash
bash prepare/download_t2m_evaluators.sh
```
3. Pre-trained model

Run the script to download the pre-trained model:

```bash
bash prepare/download_pretrained_models.sh
```
4. (Optional) Download manually

Visit the Google Drive to download the previous dependencies.
Visit the Hugging Face to download the pretrained models.
</details>

## ▶️ Demo

<details> <summary><b>Webui</b></summary>

Run the following script to launch the webui, then visit 0.0.0.0:8888:

```bash
python app.py
```
</details>
<details>
<summary><b>Batch demo</b></summary>
We support txt file input; the output motions are npy files and the output texts are txt files. Please check configs/assets.yaml for the path configuration; TEST.FOLDER sets the output folder.
Then, run the following script:

```bash
python demo.py --cfg ./configs/config_h3d_stage3.yaml --example ./demos/t2m.txt
```
Some parameters:

- `--example=./demo/t2m.txt`: input file as text prompts
- `--task=t2m`: evaluation tasks including t2m, m2t, pred, inbetween
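For reference, a batch input file is just plain text with one prompt per line. The prompts below are hypothetical examples, not the contents of the shipped demos/t2m.txt:

```text
a person walks forward and then turns around.
a man jumps over an obstacle and lands on both feet.
```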
The outputs:

- npy file: the generated motions with the shape of (nframe, 22, 3)
- txt file: the input text prompt or text output
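The generated npy files can be inspected directly with numpy. A minimal sketch, using a zero-filled stand-in array in place of a real output file (the file name below is hypothetical; check TEST.FOLDER in configs/assets.yaml for the actual output location):

```python
import numpy as np

# Stand-in for a generated motion; in practice load the real file, e.g.:
#   motion = np.load("YOUR_OUTPUT_FOLDER/your_prompt.npy")
motion = np.zeros((60, 22, 3))  # 60 frames, 22 joints, xyz coordinates

nframe, njoints, ndims = motion.shape
assert (njoints, ndims) == (22, 3), "expected 22 joints with xyz coordinates"

# Example: extract the root joint trajectory (joint 0) across all frames
root_traj = motion[:, 0, :]
print(root_traj.shape)  # → (60, 3)
```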
</details>

## 💻 Train your own models
<details> <summary><b>Training guidance</b></summary>

1. Prepare the datasets

- Please refer to HumanML3D for text-to-motion dataset setup.
- Put the instructions data in `prepare/instructions` in the same folder as the HumanML3D dataset.
2.1. Train the motion tokenizer model

Please first check the parameters in configs/config_h3d_stage1.yaml, e.g. NAME and DEBUG.
Then, run the following command:

```bash
python -m train --cfg configs/config_h3d_stage1.yaml --nodebug
```
2.2. Pretrain the MotionGPT model

Please update the parameters in configs/config_h3d_stage2.yaml, e.g. NAME, DEBUG, and PRETRAINED_VAE (change it to your latest checkpoint path from the previous step).
Then, run the following command to store all motion tokens of the training set for convenience:

```bash
python -m scripts.get_motion_code --cfg configs/config_h3d_stage2.yaml
```

After that, run the following command:

```bash
python -m train --cfg configs/config_h3d_stage2.yaml --nodebug
```
2.3. Instruction-tune the MotionGPT model

Please update the parameters in configs/config_h3d_stage3.yaml, e.g. NAME, DEBUG, and PRETRAINED (change it to your latest checkpoint path from the previous step).
Then, run the following command:

```bash
python -m train --cfg configs/config_h3d_stage3.yaml --nodebug
```
3. Evaluate the model

Please first put the trained model checkpoint path in TEST.CHECKPOINT in configs/config_h3d_stage3.yaml.
Then, run the following command:

```bash
python -m test --cfg configs/config_h3d_stage3.yaml --task t2m
```
Some parameters:

- `--task`: evaluation tasks including t2m (text-to-motion), m2t (motion translation), pred (motion prediction), inbetween (motion in-between)

Due to a Python package conflict, the released implementation of linguistic metrics in the motion translation task uses nlg-metricverse, which may not be consistent with results from nlg-eval. We will fix this in the future.
</details>

## 👀 Visualization
<details> <summary><b>Render SMPL</b></summary>

1. Set up blender - WIP

Refer to TEMOS-Rendering motions for blender setup, then install the following dependencies:

```bash
YOUR_BLENDER_PYTHON_PATH/python -m pip install -r prepare/requirements_render.txt
```
2. (Optional) Render rigged cylinders

Run the following command using blender:

```bash
YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video
```
3. Create SMPL meshes with:

```bash
python -m fit --dir YOUR_NPY_FOLDER --save_folder TEMP_PLY_FOLDER --cuda
```

This outputs:

- mesh npy file: the generated SMPL vertices with the shape of (nframe, 6893, 3)
- ply files: the ply mesh files for blender or meshlab
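For a quick look at a single frame without blender, the vertices can also be dumped as an ASCII PLY point cloud (vertices only, no faces). A minimal sketch with a hypothetical output file name and a 3-vertex toy frame standing in for the 6893 vertices of a real frame:

```python
def write_ply_points(path, vertices):
    """Write an ASCII PLY point cloud from a list of (x, y, z) vertices."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(vertices)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    with open(path, "w") as f:
        f.write("\n".join(header) + "\n")
        for x, y, z in vertices:
            f.write(f"{x} {y} {z}\n")

# Toy frame with 3 vertices; a real frame from the mesh npy has 6893.
frame = [(0.0, 0.0, 0.0), (0.1, 1.7, 0.0), (0.0, 0.9, 0.1)]
write_ply_points("frame_000.ply", frame)
```

The resulting file opens in meshlab as a point cloud; full face connectivity requires the SMPL topology, which the fit script above already bakes into its own ply output.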
4. Render SMPL meshes

Run the following command to render SMPL using blender:

```bash
YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video
```

Optional parameters:

- `--mode=video`: render an mp4 video
- `--mode=sequence`: render the whole motion in a png image
</details>

## ⚠️ FAQ
<details> <summary><b>Question-and-Answer</b></summary>

**The purpose and ability of MotionGPT**
<details> <summary>The motivation of MotionGPT.</summary>

Answer: We present MotionGPT to address various human motion-related tasks within one single unified model, by unifying motion modeling with language through a shared vocabulary. To train this unified model, we propose an instructional training scheme under the protocols for multiple motion-language tasks, which further reveals the potential of Large Language Models (LLMs) in motion tasks beyond the success of language generation. However, this combination is non-trivial, since it needs to model and generate two distinct modalities from scratch. Contrary to previous work that leverages CLIP to extract text embeddings as motion generation conditions, like T2M-GPT, MotionGPT introduces motion-language pre-training on an LLM, so it can leverage the strong language generation and zero-shot transfer abilities of pre-trained language models, as well as generate human language