<p align="center"> <h1 align="center"><strong>TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization</strong></h1> <p align="center"> <a href='https://liangpan99.github.io/' target='_blank'>Liang Pan</a><sup>1,2</sup> · <a href='https://zeshiyang.github.io/' target='_blank'>Zeshi Yang</a> <sup>3</sup> · <a href='https://frank-zy-dou.github.io/' target='_blank'>Zhiyang Dou</a><sup>2</sup> · <a href='https://wenjiawang0312.github.io/' target='_blank'>Wenjia Wang</a><sup>2</sup> · <a href='https://www.buzhenhuang.com/about/' target='_blank'>Buzhen Huang</a><sup>4</sup> · <a href='https://scholar.google.com/citations?user=KNWTvgEAAAAJ&hl=en' target='_blank'>Bo Dai</a><sup>2,5</sup> · <a href='https://i.cs.hku.hk/~taku/' target='_blank'>Taku Komura</a><sup>2</sup> · <a href='https://scholar.google.com/citations?user=GStTsxAAAAAJ&hl=en&oi=ao' target='_blank'>Jingbo Wang</a><sup>1</sup> <br> <sup>1</sup>Shanghai AI Lab <sup>2</sup>The University of Hong Kong <sup>3</sup>Independent Researcher <sup>4</sup>Southeast University <sup>5</sup>Feeling AI <br> <strong>CVPR 2025</strong> <br> <strong>🏆️ Oral Presentation (Top 3.3%)</strong> <br> Also <strong>Spotlight</strong> in the 1st Workshop on Humanoid Agents at CVPR 2025 </p> </p> <p align="center"> <a href='https://arxiv.org/abs/2503.19901'> <img src='https://img.shields.io/badge/arXiv-2503.19901-A42C25?style=flat&logo=arXiv&logoColor=A42C25'></a> <a href='https://arxiv.org/pdf/2503.19901'> <img src='https://img.shields.io/badge/Paper-PDF-yellow?style=flat&logo=arXiv&logoColor=yellow'></a> <a href='https://liangpan99.github.io/TokenHSI/'> <img src='https://img.shields.io/badge/Project-Page-green?style=flat&logo=Google%20chrome&logoColor=green'></a> </p>

🏠 About

<div style="text-align: center;"> <img src="https://github.com/liangpan99/TokenHSI/blob/page/static/images/teaser.png" width=100% > </div> Introducing TokenHSI, a unified model that enables physics-based characters to perform diverse human-scene interaction tasks. It excels at seamlessly unifying multiple <b>foundational HSI skills</b> within a single transformer network and flexibly adapting learned skills to <b>challenging new tasks</b>, including skill composition, object/terrain shape variation, and long-horizon task completion. <br>

📹 Demo

<p align="center"> <img src="assets/longterm_demo_isaacgym.gif" align="center" width=60% > <br> Long-horizon Task Completion in a Complex Dynamic Environment </p> <!-- ## 🕹 Pipeline <div style="text-align: center;"> <img src="https://github.com/liangpan99/TokenHSI/blob/page/static/images/pipeline.jpg" width=100% > </div> -->

🔥 News

  • [2025-04-07] <b>Released full code. Please download the latest datasets and models from Hugging Face.</b>
  • [2025-04-06] Released three skill composition tasks with pre-trained models.
  • [2025-04-05] TokenHSI has been selected as an oral paper at CVPR 2025! 🎉
  • [2025-04-03] Released long-horizon task completion with a pre-trained model.
  • [2025-04-01] We just updated the Getting Started section. You can play TokenHSI now!
  • [2025-03-31] We've released the codebase and checkpoint for the foundational skill learning part.

📝 TODO List

  • [x] Release foundational skill learning
  • [x] Release policy adaptation - skill composition
  • [x] Release policy adaptation - object shape variation
  • [x] Release policy adaptation - terrain shape variation
  • [x] Release policy adaptation - long-horizon task completion

📖 Getting Started

Dependencies

Follow these instructions:

  1. Create a new conda environment and install PyTorch

    conda create -n tokenhsi python=3.8
    conda activate tokenhsi
    conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
    pip install -r requirements.txt
    
  2. Install IsaacGym Preview 4

    cd IsaacGym_Preview_4_Package/isaacgym/python
    pip install -e .
    
    # add your conda env path to ~/.bashrc
    export LD_LIBRARY_PATH="your_conda_env_path/lib:$LD_LIBRARY_PATH"
    
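While the tokenhsi env is active, conda exposes its path as `CONDA_PREFIX`, so the `export` line for `~/.bashrc` can be generated rather than typed by hand. A minimal sketch (the placeholder fallback is ours, not part of the repo):

```shell
# Build the LD_LIBRARY_PATH line from the active conda env.
# Assumes `conda activate tokenhsi` has been run; otherwise a placeholder is used.
ENV_PATH="${CONDA_PREFIX:-your_conda_env_path}"
echo "export LD_LIBRARY_PATH=\"${ENV_PATH}/lib:\$LD_LIBRARY_PATH\""
```

Append the printed line to `~/.bashrc` (as in step 2) and re-source the file.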
  3. Install pytorch3d (optional, if you want to run the long-horizon task completion demo)

    We use pytorch3d to rapidly render height maps of dynamic objects for thousands of simulation environments.

    conda install -c fvcore -c iopath -c conda-forge fvcore iopath
    pip install git+https://github.com/facebookresearch/pytorch3d.git@v0.7.7
    
  4. Download SMPL body models and organize them as follows:

    |-- assets
    |-- body_models
        |-- smpl
            |-- SMPL_FEMALE.pkl
            |-- SMPL_MALE.pkl
            |-- SMPL_NEUTRAL.pkl
            |-- ...
    |-- lpanlib
    |-- tokenhsi
    

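The layout in step 4 can be sanity-checked before training with a short stdlib-only script. This is a sketch based on the tree above; the helper name `check_smpl_layout` and the strictly-required file list are our assumptions:

```python
from pathlib import Path

# The three SMPL model files shown in the layout above.
REQUIRED = ["SMPL_FEMALE.pkl", "SMPL_MALE.pkl", "SMPL_NEUTRAL.pkl"]

def check_smpl_layout(root="."):
    """Return the list of SMPL model files missing under root/body_models/smpl."""
    smpl_dir = Path(root) / "body_models" / "smpl"
    return [f for f in REQUIRED if not (smpl_dir / f).is_file()]

if __name__ == "__main__":
    missing = check_smpl_layout()
    if missing:
        print("Missing SMPL files:", ", ".join(missing))
    else:
        print("SMPL body models look complete.")
```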
Motion & Object Data

We provide two methods to generate the motion and object data.

  • Download pre-processed data from Hugging Face. Please follow the instructions on the dataset page.

  • Generate data from source:

    1. Download AMASS (SMPL-X Neutral), SAMP, and OMOMO.

    2. Modify the dataset paths in the tokenhsi/data/dataset_cfg.yaml file.

      # Motion datasets, please use your own paths
      amass_dir: "/YOUR_PATH/datasets/AMASS"
      samp_pkl_dir: "/YOUR_PATH/datasets/samp"
      omomo_dir: "/YOUR_PATH/datasets/OMOMO/data"
      
    3. You still need to download the pre-processed data from Hugging Face, but in this case only the object data is required.

    4. Run the following script:

      bash tokenhsi/scripts/gen_data.sh
      
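Before running the generation script, the three dataset paths in `tokenhsi/data/dataset_cfg.yaml` can be checked without installing PyYAML, assuming the flat `key: "path"` format shown in step 2 (the function name and regex are ours):

```python
import re
from pathlib import Path

# Matches flat entries like: amass_dir: "/YOUR_PATH/datasets/AMASS"
ENTRY = re.compile(r'^(\w+)\s*:\s*"([^"]+)"\s*$')
DATASET_KEYS = {"amass_dir", "samp_pkl_dir", "omomo_dir"}

def missing_dataset_dirs(cfg_text):
    """Return {key: path} for configured dataset dirs that do not exist on disk."""
    missing = {}
    for line in cfg_text.splitlines():
        m = ENTRY.match(line.strip())
        if m and m.group(1) in DATASET_KEYS:
            key, path = m.groups()
            if not Path(path).is_dir():
                missing[key] = path
    return missing
```

Typical usage: `missing_dataset_dirs(Path("tokenhsi/data/dataset_cfg.yaml").read_text())` returns an empty dict when all three directories exist.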

Checkpoints

Download checkpoints from Hugging Face. Please follow the instructions on the model page.

🕹️ Play TokenHSI!

  • Single task policy trained with AMP

    • Path-following

      # test
      sh tokenhsi/scripts/single_task/traj_test.sh
      # train
      sh tokenhsi/scripts/single_task/traj_train.sh
      
    • Sitting

      # test
      sh tokenhsi/scripts/single_task/sit_test.sh
      # train
      sh tokenhsi/scripts/single_task/sit_train.sh
      
    • Climbing

      # test
      sh tokenhsi/scripts/single_task/climb_test.sh
      # train
      sh tokenhsi/scripts/single_task/climb_train.sh
      
    • Carrying

      # test
      sh tokenhsi/scripts/single_task/carry_test.sh
      # train
      sh tokenhsi/scripts/single_task/carry_train.sh
      
  • TokenHSI's unified transformer policy

    • Foundational Skill Learning

      # test
      sh tokenhsi/scripts/tokenhsi/stage1_test.sh
      # eval
      sh tokenhsi/scripts/tokenhsi/stage1_eval.sh carry # we need to specify a task to eval, e.g., traj, sit, climb, or carry.
      # train
      sh tokenhsi/scripts/tokenhsi/stage1_train.sh
      

      If you successfully run the test command, you will see:

      <p align="center"> <img src="assets/stage1_demo.gif" align="center" width=60% > </p>
    • Policy Adaptation - Skill Composition

      • Traj + Carry

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_comp_traj_carry_train.sh
        

        If you successfully run the test command, you will see:

      <p align="center"> <img src="assets/stage2_comp_traj_carry.gif" align="center" width=60% > </p>
      • Sit + Carry

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_comp_sit_carry_train.sh
        

        If you successfully run the test command, you will see:

      <p align="center"> <img src="assets/stage2_comp_sit_carry.gif" align="center" width=60% > </p>
      • Climb + Carry

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_comp_climb_carry_train.sh
        

        If you successfully run the test command, you will see:

      <p align="center"> <img src="assets/stage2_comp_climb_carry.gif" align="center" width=60% > </p>
    • Policy Adaptation - Object Shape Variation

      • Carrying: Box-2-Chair

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_object_chair_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_object_chair_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_object_chair_train.sh
        

        If you successfully run the test command, you will see:

      <p align="center"> <img src="assets/stage2_object_chair.gif" align="center" width=60% > </p>
      • Carrying: Box-2-Table

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_object_table_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_object_table_eval.sh
        # train
        sh tokenhsi/scripts/tokenhsi/stage2_object_table_train.sh
        

        If you successfully run the test command, you will see:

      <p align="center"> <img src="assets/stage2_object_table.gif" align="center" width=60% > </p>
    • Policy Adaptation - Terrain Shape Variation

      • Path-following

        # test
        sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_test.sh
        # eval
        sh tokenhsi/scripts/tokenhsi/stage2_terrain_traj_eval.sh
        # train
        sh 
        
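As a convenience, the stage-1 evaluation script above (`stage1_eval.sh`) takes the task name as its only argument, so a one-line loop can emit the command for every foundational skill; pipe the output to `sh` to actually run them. The loop is our sketch, not part of the repo:

```shell
# Print one stage-1 eval command per foundational skill (traj, sit, climb, carry).
for task in traj sit climb carry; do
  echo "sh tokenhsi/scripts/tokenhsi/stage1_eval.sh $task"
done
```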
