# PHUMA: Physically-Grounded Humanoid Locomotion Dataset

Kyungmin Lee\*, Sibeen Kim\*, Minho Park, Hyunseung Kim, Dongyoon Hwang, Hojoon Lee, and Jaegul Choo

DAVIAN Robotics, KAIST AI

arXiv 2025. (\* indicates equal contribution)
PHUMA leverages large-scale human motion data while overcoming physical artifacts through careful data curation and physics-constrained retargeting to create a high-quality humanoid locomotion dataset.
## 🚀 Quick Start

### Prerequisites
- Python 3.9
- CUDA 12.4 (recommended)
- Conda package manager
### Installation

- Clone the repository:

  ```bash
  git clone https://github.com/DAVIAN-Robotics/PHUMA.git
  cd PHUMA
  ```

- Set up the environment:

  ```bash
  conda create -n phuma python=3.9 -y
  conda activate phuma
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  pip install -e .
  ```

- Setup PHUMA:

  If you just want to download and use the dataset directly:

  ```bash
  bash setup_phuma.sh
  ```

  This will download the pre-built PHUMA dataset (G1 and H1-2), and you're ready to go.
If you want to modify or add custom data, or retarget to a custom robot, continue with the full pipeline below.
## 📊 Dataset Pipeline

### 1. Physics-Aware Motion Curation
Our physics-aware curation pipeline filters out problematic motions from human motion data to ensure physical plausibility.
**1-1) Starting Point:**
We begin with the Humanoid-X collection as described in our paper. For more details, refer to the Humanoid-X repository. If you want to reproduce the PHUMA dataset, a practical starting point is Motion-X, which provides excellent documentation on SMPL-X pose data collection.
<details>
<summary><strong>ⅰ) Preprocess SMPL-X Data Format</strong></summary>

Motion-X produces SMPL-X data in (N, 322) `.npy` format or raw `.npz` format (stageii), but PHUMA requires the (N, 69) format, which keeps only the body pose and excludes the face, hands, etc. Our preprocessing script handles both formats automatically:

- Recursively finds all `.npy` and `.npz` files in the input folder
- For `.npz` files: converts Motion-X stageii format to (N, 322), applying a Y-up to Z-up coordinate transformation and FPS downsampling
- Converts Motion-X format (N, 322) to PHUMA format (N, 69) by extracting `[transl, global_orient, body_pose]`
- Preserves the directory structure (e.g., `aist/subset_0008/`) in the output folder

```bash
python src/curation/preprocess_motionx_format.py \
    --human_pose_folder /path_to_motionx_folder/subfolder \
    --output_dir data/human_pose \
    --target_fps 30  # Target FPS for npz downsampling (default: 30)
```

</details>
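As a concrete sketch of the (N, 322) → (N, 69) extraction, the snippet below slices out `[transl, global_orient, body_pose]`. The channel offsets (root orientation at columns 0:3, body pose at 3:66, translation at 309:312) follow the commonly cited Motion-X layout and are an assumption here; the authoritative conversion lives in `src/curation/preprocess_motionx_format.py`.

```python
import numpy as np

# Assumed Motion-X (N, 322) channel layout:
#   [0:3]     root_orient (global_orient)
#   [3:66]    body_pose (21 joints x 3)
#   [66:309]  hand pose, jaw pose, face expression/shape
#   [309:312] trans, [312:322] betas
def motionx_to_phuma(motion_322: np.ndarray) -> np.ndarray:
    """Convert Motion-X (N, 322) pose data to PHUMA (N, 69) format:
    [transl, global_orient, body_pose]."""
    transl = motion_322[:, 309:312]
    global_orient = motion_322[:, 0:3]
    body_pose = motion_322[:, 3:66]
    return np.concatenate([transl, global_orient, body_pose], axis=1)
```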
<details>
<summary><strong>ⅱ) Download SMPL-X Models</strong></summary>

Before running the curation pipeline, you need to download the SMPL-X model files:

- Visit the SMPL-X official website
- Register and download the following files:
  - `SMPLX_FEMALE.npz` and `SMPLX_FEMALE.pkl`
  - `SMPLX_MALE.npz` and `SMPLX_MALE.pkl`
  - `SMPLX_NEUTRAL.npz` and `SMPLX_NEUTRAL.pkl`
- Place all downloaded files in the `asset/human_model/smplx/` directory

</details>
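A quick sanity check that all six model files are in place before running the pipeline (the filenames match the list above; `asset/human_model/smplx/` is the directory the scripts expect):

```python
from pathlib import Path

def missing_smplx_models(model_dir: str) -> list[str]:
    """Return the SMPL-X model files that are not yet present in model_dir."""
    expected = [
        f"SMPLX_{gender}.{ext}"
        for gender in ("FEMALE", "MALE", "NEUTRAL")
        for ext in ("npz", "pkl")
    ]
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_smplx_models("asset/human_model/smplx")
    print("All SMPL-X models found." if not missing else f"Missing: {missing}")
```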
**1-2) Tuning Curation Thresholds:**
The default thresholds are tuned to preserve motions with airborne phases (e.g., jumping) while filtering out physically implausible motions. This means some motions in PHUMA may contain minor penetration or floating artifacts. If you need stricter filtering for specific locomotion types (e.g., walking only), you can adjust the thresholds:
- For a single file:

  ```bash
  # Set your project directory
  PROJECT_DIR="[REPLACE_WITH_YOUR_WORKING_DIRECTORY]/PHUMA"
  cd $PROJECT_DIR

  # We provide an example clip: data/human_pose/example/kick.npy
  human_pose_file="example/kick"
  python src/curation/preprocess_smplx.py \
      --project_dir $PROJECT_DIR \
      --human_pose_file $human_pose_file \
      --foot_contact_threshold 0.8 \
      --visualize 0
  # foot_contact_threshold: Default 0.6. Increase to filter out more floating/penetration.
  ```
- For a folder:

  ```bash
  # Set your project directory
  PROJECT_DIR="[REPLACE_WITH_YOUR_WORKING_DIRECTORY]/PHUMA"
  cd $PROJECT_DIR

  human_pose_folder='data/human_pose/example'
  python src/curation/preprocess_smplx_folder.py \
      --project_dir $PROJECT_DIR \
      --human_pose_folder $human_pose_folder \
      --foot_contact_threshold 0.8 \
      --visualize 0
  ```
<details>
<summary>Output Details</summary>

- Preprocessed motion chunks: `example/kick_chunk_0000.npy` and `example/kick_chunk_0001.npy` under `data/human_pose_preprocessed/`
- If you set `--visualize 1`, it will also save `example/kick_chunk_0000.mp4` and `example/kick_chunk_0001.mp4` under `data/video/human_pose_preprocessed/`

</details>
For a complete list of tunable parameters, see src/curation/preprocess_smplx.py.
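To build intuition for what `--foot_contact_threshold` gates, here is a simplified, hypothetical version of a foot-contact check. The actual metric is defined in `src/curation/preprocess_smplx.py`; this sketch only assumes curation keeps a clip when a sufficient fraction of frames have at least one foot near the ground.

```python
import numpy as np

def foot_contact_ratio(foot_heights: np.ndarray, contact_tol: float = 0.03) -> float:
    """Fraction of frames where at least one foot is within contact_tol
    meters of the ground plane (z = 0). foot_heights: (N, n_feet) array.
    Simplified illustration, not the repository's exact metric."""
    in_contact = (foot_heights <= contact_tol).any(axis=1)
    return float(in_contact.mean())

def keep_clip(foot_heights: np.ndarray, foot_contact_threshold: float = 0.6) -> bool:
    """Keep the clip if the contact ratio meets the threshold. Raising the
    threshold (e.g., to 0.8) filters more floating/penetrating motions but
    also rejects clips with long airborne phases, such as jumps."""
    return foot_contact_ratio(foot_heights) >= foot_contact_threshold
```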
### 2. Physics-Constrained Motion Retargeting
To address artifacts introduced during the retargeting process, we employ PhySINK, our physics-constrained retargeting method that adapts curated human motion to humanoid robots while enforcing physical plausibility.
**2-0) Custom Robot Setup (Optional):**

If you want to retarget to a custom humanoid robot (beyond G1 and H1-2), you first need to generate the required configuration files. Our setup script automatically:

- Adds heel/toe keypoint bodies to the robot model
- Computes a T-pose with ground-adjusted root height
- Generates `custom.xml`, `scene.xml`, and `config.yaml`
```bash
# From a MuJoCo XML file
python src/utils/setup_humanoid.py \
    --input /path/to/your_robot.xml \
    --humanoid_type your_robot_name

# From a URDF file
python src/utils/setup_humanoid.py \
    --input /path/to/your_robot.urdf \
    --humanoid_type your_robot_name

# If foot bodies are not auto-detected, specify them manually
python src/utils/setup_humanoid.py \
    --input /path/to/your_robot.xml \
    --humanoid_type your_robot_name \
    --left_foot_body left_ankle_roll_link \
    --right_foot_body right_ankle_roll_link
```
Output: Configuration files saved to `asset/humanoid_model/<your_robot_name>/`

- `custom.xml` — Robot model with heel/toe keypoints
- `scene.xml` — MuJoCo scene file
- `config.yaml` — Robot configuration (keypoints, bone mapping, joint info, T-pose)

Note: After generation, review `config.yaml` and verify that the bone mappings and keypoints are correct for your robot.
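For intuition about the heel/toe keypoint step, here is an illustrative, stdlib-only sketch of attaching keypoint bodies to a foot body in a MuJoCo XML. The offsets are placeholder values and the body naming is hypothetical; the real script (`src/utils/setup_humanoid.py`) derives everything from the robot's foot geometry.

```python
import xml.etree.ElementTree as ET

def add_foot_keypoints(tree: ET.ElementTree, foot_body: str) -> ET.ElementTree:
    """Attach 'heel' and 'toe' keypoint bodies to the named foot body.
    Illustrative only: offsets are placeholders, not computed from geometry."""
    for body in tree.iter("body"):
        if body.get("name") == foot_body:
            # Placeholder offsets (meters) relative to the foot body frame
            ET.SubElement(body, "body", name=f"{foot_body}_heel", pos="-0.05 0 -0.03")
            ET.SubElement(body, "body", name=f"{foot_body}_toe", pos="0.12 0 -0.03")
    return tree
```

Usage would look like `add_foot_keypoints(ET.parse("your_robot.xml"), "left_ankle_roll_link")`, followed by `tree.write(...)`.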
**2-1) Shape Adaptation (One-time Setup):**

```bash
# Find the SMPL-X shape that best fits a given humanoid robot
# This only needs to be done once and can be reused for all motion files
python src/retarget/shape_adaptation.py \
    --project_dir $PROJECT_DIR \
    --robot_name g1
```
Output: Shape parameters saved to `asset/humanoid_model/g1/betas.npy`

**2-2) Motion Adaptation:**
This step retargets human motion to robot motion using PhySINK optimization. You can process either a single file or an entire folder.
- For a single file:

  ```bash
  # Using the curated data from the previous step for the Unitree G1 humanoid robot
  human_pose_preprocessed_file="example/kick_chunk_0000"
  python src/retarget/motion_adaptation.py \
      --project_dir $PROJECT_DIR \
      --robot_name g1 \
      --human_pose_file $human_pose_preprocessed_file
  ```
- For a folder (with multiprocessing support):

  ```bash
  human_pose_preprocessed_folder="data/human_pose_preprocessed/example"
  python src/retarget/motion_adaptation_multiprocess.py \
      --project_dir $PROJECT_DIR \
      --robot_name g1 \
      --human_pose_folder $human_pose_preprocessed_folder \
      --gpu_ids 0,1,2,3 \
      --processes_per_gpu 2
  ```
<details>
<summary>Details</summary>

Multiprocessing Parameters:

- `--gpu_ids`: Comma-separated GPU IDs (e.g., `0,1,2,3`). If not specified, uses `--device` (default: `cuda:0`).
- `--processes_per_gpu`: Number of parallel processes per GPU (default: 1).
  - Recommended: 1-2 for RTX 3090 (24GB), 2-4 for A100 (40GB+)
  - Total workers = `len(gpu_ids) × processes_per_gpu`
  - Example: `--gpu_ids 0,1,2,3 --processes_per_gpu 2` → 8 workers total
- `--num_workers`: Manual override for the total number of workers (default: auto-calculated from the GPU settings)
  - Use `-1` to use all available CPU cores (for CPU-only processing)

Additional Options:

- `--visualize`: Set to `1` to generate visualization videos (default: `0`)
- `--fps`: Frame rate for output videos (default: `30`)
- `--num_iter_dof`: Number of optimization iterations (default: `3001`)
- `--lr_dof`: Learning rate for DOF optimization (default: `0.005`)
- See `python src/retarget/motion_adaptation_multiprocess.py --help` for all available options

Output:

- Retargeted humanoid motion data: `data/humanoid_pose/g1/kick_chunk_0000.npy`
  - Format: Dictionary containing `root_trans`, `root_ori`, `dof_pos`, and `fps`
- If you set `--visualize 1`, it will also save `data/video/humanoid_pose/g1/kick_chunk_0000.mp4`

</details>
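The retargeted `.npy` files can then be consumed directly. Assuming the dictionary layout above (`root_trans`, `root_ori`, `dof_pos`, `fps`) is serialized as a pickled dict (an assumption about the on-disk format), a minimal loader looks like:

```python
import numpy as np

def load_humanoid_motion(path: str) -> dict:
    """Load a retargeted motion clip saved as a pickled-dict .npy file."""
    data = np.load(path, allow_pickle=True).item()
    n_frames = data["dof_pos"].shape[0]
    duration = n_frames / data["fps"]
    print(f"{path}: {n_frames} frames at {data['fps']} FPS ({duration:.2f}s)")
    return data
```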
**✩ Custom Robot Support:**

We support Unitree G1 and H1-2, but you can also retarget to custom humanoid robots. See 2-0) Custom Robot Setup above to generate the required configuration files, then run the shape and motion adaptation steps as usual with your `--robot_name`.
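At a high level, motion adaptation is an iterative optimization over robot trajectories (cf. `--num_iter_dof` and `--lr_dof` above). The toy below is a generic, heavily simplified stand-in, not PhySINK itself: plain gradient descent on a 1-D foot-height trajectory, trading off matching the human target against a ground-penetration penalty. The actual objective and kinematics live in `src/retarget/motion_adaptation.py`.

```python
import numpy as np

def physics_constrained_fit(target_z: np.ndarray, lr: float = 0.005,
                            num_iter: int = 3001, penalty: float = 10.0) -> np.ndarray:
    """Toy sketch of physics-constrained retargeting: fit a foot-height
    trajectory z to a (possibly penetrating) human target while penalizing
    frames below the ground plane (z < 0)."""
    z = target_z.copy()
    for _ in range(num_iter):
        grad = 2.0 * (z - target_z)                  # match the human motion
        grad += 2.0 * penalty * np.minimum(z, 0.0)   # push penetrating frames up
        z -= lr * grad
    return z
```

Frames that already respect the ground are left untouched, while penetrating frames are pulled toward (but not hard-clamped to) the ground plane, which is the general flavor of a soft physics constraint.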
## 🎯 Motion Tracking and Evaluation
To repr
