AnimationGPT

AnimationGPT is a project focused on generating combat-style character animations from text. The model is trained with MotionGPT, and the project has produced CombatMotion, the first character animation dataset dedicated to combat styles, in which every animation comes with textual descriptions.

Note: Our online server has expired. To use AnimationGPT, please set up the environment locally.

Comparison with existing text-to-motion datasets

| Dataset   | Motions | Texts  | Style      | Source               |
| --------- | ------- | ------ | ---------- | -------------------- |
| KIT-ML    | 3,911   | 6,278  | Daily Life | Motion Capture       |
| HumanML3D | 14,616  | 44,970 | Daily Life | Motion Capture       |
| Motion-X  | 81,084  | 81,084 | Daily Life | Video Reconstruction |
| CMP       | 8,700   | 26,100 | Combat     | Game                 |
| CMR       | 14,883  | 14,883 | Combat     | Game                 |

Compared to the current text-to-motion datasets, CombatMotion has the following characteristics:

Derived from game assets.
Features a fighting style: animations from action games are stylistically concentrated, and the distribution of action types is skewed.
More detailed textual annotations.
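
As a small worked example, the counts in the comparison table can be turned into a texts-per-motion ratio. The numbers are copied from the table; the variable names are only illustrative.

```python
# Motion and text counts copied from the comparison table above.
datasets = {
    "KIT-ML": (3911, 6278),
    "HumanML3D": (14616, 44970),
    "Motion-X": (81084, 81084),
    "CMP": (8700, 26100),
    "CMR": (14883, 14883),
}

# Average number of text annotations per motion clip.
ratios = {name: texts / motions for name, (motions, texts) in datasets.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} texts per motion")
```

CMP comes out at exactly three texts per motion, which matches the three annotations per animation described in the CMP section.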

Combat Motion Dataset

Pipeline

Obtain game assets in FBX format, retarget them to the SMPL skeleton, and extract the coordinates of the human body joints (see Fbx2SMPL).
Add textual annotations. Each animation is annotated manually along the following aspects: action type, weapon type, attack type, locative words, power descriptors, speed descriptors, and fuzzy descriptors. A partial list of terms is shown below:

| Action type | Weapon type | Attack type  | Locative words | Power          | Speed         | Fuzzy    |
| ----------- | ----------- | ------------ | -------------- | -------------- | ------------- | -------- |
| Idle        | Bare Hand   | Left-Handed  | In-Place       | Light-Weighted | Swift         | Piercing |
| Get Hit     | Sacred Seal | Right-Handed | Towards Left   | Steady         | Relative Fast | Slash    |
| Death       | Fist        | One-Handed   | Towards Right  | Heavy-Weighted | Uniform Speed | Blunt    |
| …           | …           | …            | …              | …              | …             | …        |

Then, use GPT-4 to combine these annotations into sentences.

[Figure: annotation process diagram]

The diagram above outlines our annotation process. Initially, we fill in seven key descriptive words based on the characteristics of the animation, followed by writing posture description sentences. Subsequently, we use a large language model to integrate these elements into several complete natural language sentences. Finally, we select the sentence that best meets our requirements as the annotation result.
Process the animation and annotated data into a format compatible with HumanML3D.
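
The annotation step in this pipeline can be sketched as follows. This is a minimal illustration, not the project's actual tooling: `MotionTags`, `build_prompt`, the prompt wording, and the posture sentence are all hypothetical, and the tag values are simply the first row of the term table used as placeholders. The resulting string would then be sent to a large language model, which is left out here.

```python
from dataclasses import dataclass, fields


@dataclass
class MotionTags:
    """The seven annotation aspects (hypothetical container, not from the repo)."""
    action_type: str
    weapon_type: str
    attack_type: str
    locative: str
    power: str
    speed: str
    fuzzy: str


def build_prompt(tags: MotionTags, posture: str, n_candidates: int = 3) -> str:
    """Assemble an LLM prompt from the seven tags and a posture description."""
    tag_lines = "\n".join(
        f"- {f.name.replace('_', ' ')}: {getattr(tags, f.name)}"
        for f in fields(tags)
    )
    return (
        f"Combine the following annotations into {n_candidates} natural English "
        "sentences describing one character animation.\n"
        f"{tag_lines}\n"
        f"- posture: {posture}\n"
    )


# Placeholder values taken from the first row of the term table.
example = MotionTags(
    action_type="Idle",
    weapon_type="Bare Hand",
    attack_type="Left-Handed",
    locative="In-Place",
    power="Light-Weighted",
    speed="Swift",
    fuzzy="Piercing",
)
# The posture sentence is made up for illustration.
prompt = build_prompt(example, "The character stands still with raised fists.")
print(prompt)
```

Asking for several candidate sentences mirrors the process described above, where the best of the generated sentences is selected by hand as the final annotation.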

CombatMotionProcessed Dataset (CMP)

Download: google drive

CombatMotionProcessed (CMP) is a refined dataset that retains 8,700 high-quality animations with a strong fighting style. Each animation comes with three text annotations: a concise description, a concise description with sensory details, and a detailed description.

Taking CMP008388 as an example, its corresponding text annotations are:

weapon attack a man holding a Katana,executing a Charged Heavy Attack,Dual Wielding,root motion get Forward, Steady,Powerful and Relative Slow,First slow then fast,Cleanly.
weapon attack a man holding a Katana,executing a Charged Heavy Attack,Dual Wielding,root motion get Forward, Steady,Powerful and Relative Slow,First slow then fast,Cleanly,which make a sense of Piercing,Wide Open,Charged,Accumulating strength.
The character grips the wedge with both hands and charges for a powerful strike. They firmly lower their body, twist to the left, lunge forward with a bow step, and stab with the sword held in both hands.

CombatMotionRaw Dataset (CMR)

Download: google drive

CombatMotionRaw (CMR) is an unrefined dataset containing 14,883 animations (CMP is a subset of CMR), each with only one textual annotation. These annotations are simple concatenations of the annotated words. During development, models trained on this type of annotation performed poorly, so this format was ultimately not adopted.

Example of textual annotation:

weapon attack curved sword curved greatsword right-handed one-handed charged heavy attack forward steady powerful charged accumulating strength cleanly first slow then fast slash smooth and coherent wide open featherlike roundabout lean over and twist your waist to the left step forward with your right leg store your right hand from the left back swing it diagonally downward and swing two circles.

CMR offers a richer set of animation data, but its annotations are not detailed enough. You can read the textual annotations from the dataset and refine them yourself.

Model and Evaluation

Here are models trained on the CMP dataset using different algorithms:

MotionGPT Model: google drive
MLD Model: google drive
MDM Model: google drive

Download evaluator: google drive

Evaluation on CMP

| Methods      | MultiModal Distance ↓ | R-Precision (top 1) ↑ | R-Precision (top 2) ↑ | R-Precision (top 3) ↑ | FID ↓       | Diversity →  | MultiModality ↑ |
| ------------ | --------------------- | --------------------- | --------------------- | --------------------- | ----------- | ------------ | --------------- |
| Ground Truth | 3.850±0.018           | 0.335±0.004           | 0.513±0.005           | 0.628±0.005           | 0.006±0.003 | 10.098±0.102 | /               |
| T2M          | 4.962±0.031           | 0.252±0.006           | 0.406±0.005           | 0.508±0.006           | 1.898±0.059 | 8.975±0.113  | 4.470±0.112     |
| T2M-GPT      | 3.701±0.027           | 0.353±0.005           | 0.545±0.006           | 0.663±0.005           | 0.177±0.016 | 10.128±0.132 | 1.798±0.041     |
| MDM          | 8.414±0.048           | 0.049±0.003           | 0.098±0.005           | 0.148±0.005           | 9.467±0.217 | 7.608±0.100  | 5.682±0.203     |
| MLD          | 4.331±0.029           | 0.293±0.004           | 0.459±0.003           | 0.568±0.004           | 0.628±0.038 | 9.741±0.093  | 3.035±0.138     |
| MMM          | 3.621±0.020           | 0.353±0.004           | 0.545±0.004           | 0.667±0.005           | 0.151±0.013 | 10.091±0.086 | 0.757±0.042     |
| MoMask       | 4.138±0.025           | 0.301±0.005           | 0.481±0.004           | 0.597±0.005           | 0.383±0.018 | 9.689±0.092  | 1.968±0.049     |
| MotionGPT    | 4.228±0.032           | 0.306±0.004           | 0.486±0.006           | 0.605±0.006           | 0.267±0.017 | 9.357±0.133  | 2.210±0.137     |
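
To make the R-Precision columns easier to read, here is a simplified sketch of how a top-k retrieval score of this kind can be computed. It is not the evaluator's code: the real text and motion embeddings come from the downloaded evaluator, whereas random vectors stand in here, and the pool is simply the whole batch (every other sample acts as a distractor). The function name `r_precision` is an assumption.

```python
import numpy as np


def r_precision(text_emb: np.ndarray, motion_emb: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Top-1..top-k retrieval accuracy for matched (text, motion) pairs.

    Row i of `text_emb` is assumed to describe row i of `motion_emb`.
    """
    # Pairwise Euclidean distances between every text and every motion: (N, N).
    dists = np.linalg.norm(text_emb[:, None, :] - motion_emb[None, :, :], axis=-1)
    # For each text, motion indices sorted from nearest to farthest.
    ranking = np.argsort(dists, axis=1)
    ground_truth = np.arange(len(text_emb))[:, None]
    # hits[i, j] is True when the matching motion sits at rank j for text i.
    hits = ranking[:, :top_k] == ground_truth
    # A hit at rank j also counts for every larger k, hence the cumulative sum.
    return np.cumsum(hits, axis=1).astype(bool).mean(axis=0)


# Random stand-in embeddings: motions plus noise play the role of text features.
rng = np.random.default_rng(0)
motion_emb = rng.normal(size=(32, 16))
text_emb = motion_emb + 0.5 * rng.normal(size=(32, 16))
print(r_precision(text_emb, motion_emb, top_k=3))
```

The mean of the diagonal of `dists` would give a matched-pair distance in the same spirit as the MultiModal Distance column, again under the assumption that both modalities share one embedding space.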

Tutorial

If you need to train a model, please download the CMP dataset. Then, follow the tutorials for MotionGPT or other text-to-motion algorithms to set up the environment and train your model.
If you only need to use the AGPT model trained on the CMP dataset, please follow these steps:
1. Set up the environment
  
  Our experimental environment is Ubuntu 22.04, NVIDIA GeForce RTX 4090, and CUDA 11.8
```
# Clone MotionGPT and create the conda environment
git clone https://github.com/OpenMotionLab/MotionGPT.git
cd MotionGPT
conda create python=3.10 --name mgpt
conda activate mgpt
pip install -r requirements.txt
python -m spacy download en_core_web_sm
# Run the prepare scripts from the repository root, where prepare/ lives
mkdir -p deps
bash prepare/prepare_t5.sh
bash prepare/download_t2m_evaluators.sh
```
2. Download the CMP dataset
  
  Unzip the dataset into the datasets/humanml3d directory.
```
.
└── humanml3d
    ├── new_joint_vecs
    ├── new_joints
    └── texts
```
3. Generate animations using the model
  - git clone https://github.com/fyyakaxyy/AnimationGPT.git
  - Copy the tools folder and config_AGPT.yaml into the MotionGPT directory
  - Download
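
As a quick sanity check for step 2, the following sketch verifies that the unzipped dataset has the three subdirectories shown in the tree above. The helper name `missing_subdirs` is hypothetical, and the script assumes it is run from the MotionGPT directory.

```python
from pathlib import Path

# Subdirectories expected under datasets/humanml3d, per the tree in step 2.
EXPECTED_SUBDIRS = ("new_joint_vecs", "new_joints", "texts")


def missing_subdirs(root) -> list:
    """Return the expected subdirectories that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


missing = missing_subdirs("datasets/humanml3d")
print("missing:", missing if missing else "none")
```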
