(3DV 2026) InterPose: Learning to Generate Human-Object Interactions from Large-Scale Web Videos
This is the official implementation of the paper "InterPose: Learning to Generate Human-Object Interactions from Large-Scale Web Videos".
[Project Page] [InterPose data] [Paper]

TODO
- [x] Release InterPose dataset
- [x] Data collection framework
- [ ] Spatial control experiments (Training and evaluation)
- [x] Physics-based model: MaskedMimic
- [ ] Kinematics-based model: OmniControl
- [x] Zero-shot human-object interaction experiments (Evaluation)
- [x] Application: HOI-Agent (integrate LLM to enable zero-shot HOI generation in 3D scenes)
Data collection framework
The automatic human motion data collection and annotation pipeline is now released in the repo InterPose-data-collection.
Environment Setup
Note: This code was developed with Python 3.8, CUDA 11.7 and PyTorch 2.0.0.
Clone the repo.
git clone git@github.com:Mael-zys/InterPose.git --recursive
cd InterPose/
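If the repo was cloned without `--recursive`, the submodules (e.g. `third-party/ProtoMotion_for_InterPose`) can still be fetched afterwards with a standard git command (safe to re-run; a no-op when submodules are already present):

```shell
# Fetch submodules after a non-recursive clone.
# The guard makes this safe to run from anywhere, not just inside a repo.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git submodule update --init --recursive
fi
```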
Install environment
bash scripts/install_InterPose.sh
Prerequisites
- Please download SMPL-X and put the model in `data/smpl_all_models/`.
- Please download all the processed data and put it in `processed_data`.
- Install the ProtoMotion (MaskedMimic) environment and download the pretrained models according to the README in `third-party/ProtoMotion_for_InterPose`.
- (Optional) If you would like to generate visualizations, please download Blender first.
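After these steps, the working tree should look roughly like the sketch below. This is an assumption for orientation only; the exact contents of `processed_data/` and the pretrained-model locations depend on the downloads and are not listed in this README.

```
InterPose/
├── data/
│   └── smpl_all_models/             # SMPL-X model files
├── processed_data/                  # downloaded processed data
├── third-party/
│   └── ProtoMotion_for_InterPose/   # MaskedMimic environment + models
├── results/
│   └── masked_mimic_merged/
│       └── last.ckpt                # checkpoint used by the scripts below
└── scripts/
```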
Evaluation: zero-shot generation on the OMOMO and BEHAVE datasets
bash scripts/eval_zero_shot_HOI_generation.sh results/masked_mimic_merged/last.ckpt
Here is an example visualization script:
bash scripts/visualization.sh
HOI-Agent: integrate LLM to enable zero-shot generation in 3D scenes
First, set `OPENAI_API_KEY` in `run_HOI_agent.py`.
Then run the following script:
bash scripts/run_HOI_agent.sh results/masked_mimic_merged/last.ckpt
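The two steps above can be wrapped in a small pre-flight guard. This is a sketch, not part of the repo: the `check_ckpt` helper is hypothetical, and the checkpoint path and script name are taken from the command above.

```shell
# Hypothetical pre-flight guard: only launch the agent when the checkpoint
# exists and an OpenAI key is available in the environment.
check_ckpt() {
  if [ ! -f "$1" ]; then
    echo "Checkpoint not found: $1" >&2
    return 1
  fi
}

CKPT="results/masked_mimic_merged/last.ckpt"
if check_ckpt "$CKPT" && [ -n "${OPENAI_API_KEY:-}" ]; then
  bash scripts/run_HOI_agent.sh "$CKPT"
fi
```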
Citation
@article{zhang2025interpose,
title={InterPose: Learning to Generate Human-Object Interactions from Large-Scale Web Videos},
author={Zhang, Yangsong and Butt, Abdul Ahad and Varol, G{\"u}l and Laptev, Ivan},
journal={arXiv},
year={2025},
}
Related Repos
We adapted code from other repositories for data processing, training, and evaluation. Please check out these useful repos:
https://github.com/lijiaman/chois_release
https://github.com/NVlabs/ProtoMotions/tree/main
https://github.com/lijiaman/omomo_release
