DIMO: Diverse 3D Motion Generation for Arbitrary Objects
Paper | Project Page | Demo | Poster
<div align="center"> <img src="assets/pipeline.png" alt="DIMO Pipeline" style="max-width: 100%;" /> </div>
Linzhan Mou, Jiahui Lei, Chen Wang, Lingjie Liu, Kostas Daniilidis
University of Pennsylvania
ICCV 2025 (Highlight)
📜 News
- [2026-01-04] Code and data are pre-released!
- [2025-07-24] DIMO is selected as Highlight Paper!
- [2025-06-26] DIMO is accepted to ICCV 2025! 🎉 We will release code in this repo.
⚙️ Installation
We use Python 3.10 with PyTorch 2.1.1 and CUDA 11.8. The environment and packages can be installed as follows:
git clone --recursive https://github.com/Friedrich-M/DIMO.git && cd DIMO
conda create -y -n dimo -c nvidia/label/cuda-11.8.0 -c defaults cuda-toolkit=11.8 cuda-compiler=11.8 cudnn=8 python=3.10
conda activate dimo
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt --no-build-isolation
pip install --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu118_pyt211/download.html
pip install git+https://github.com/rahul-goel/fused-ssim/ --no-build-isolation
pip install submodules/diff-gauss submodules/diff-gaussian-rasterization submodules/KNN_CUDA submodules/simple-knn --no-build-isolation
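After installing, a quick sanity check can confirm that the CUDA build of PyTorch and the prebuilt PyTorch3D wheel are working. This probe is not part of the repo, just a generic check:

```python
# Generic environment probe (not part of the DIMO repo):
import torch
import pytorch3d

print(torch.__version__)          # expect 2.1.1+cu118
print(torch.cuda.is_available())  # expect True on a CUDA 11.8 machine
print(pytorch3d.__version__)      # confirms the prebuilt wheel installed
```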
📂 Data Preparation
Intuition: distill rich motion priors from video models, using generated videos as a form of diverse motion capture.
- Motion Priors Distillation
  We use text-conditioned monocular video models ([CogVideoX], [Wan2.2], [HunyuanVideo], etc.) to distill rich motion priors. We will add detailed instructions soon; see the sketch at the end of this section.
- Geometry Priors Distillation
  You can skip this step and download our processed example data (51 Trump motions) from Google Drive:
mkdir data && cd data && gdown 1b0_2t_KKhOyKlJsYncUcQm6URecAS6M6 && tar -zxvf data_trump_n51_step20.tar.gz && cd ..
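For the motion priors step, the sketch below shows what the video-generation side looks like with CogVideoX through Hugging Face diffusers. This is generic diffusers usage, not the repo's distillation script; the prompt and output filename are placeholders:

```python
# Minimal CogVideoX generation sketch via Hugging Face diffusers.
# NOT the repo's distillation pipeline -- only the video-generation step.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")

frames = pipe(
    prompt="Trump is walking",  # placeholder text condition
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

export_to_video(frames, "motion_prior.mp4", fps=8)  # placeholder output path
```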
🚀 Training
Intuition: jointly model diverse 3D motions in a shared latent space. To train DIMO, simply run:
sh run_train_latent.sh
- NOTE: You can modify the hyperparameters in `run_train_latent.sh` as needed. Check `configs/train_config.yaml` to view all configurable parameters and default settings.
- NOTE: Set the `vae_latent` flag to `True` to enforce a Gaussian distribution on the motion latent code, which also enables the KL divergence loss during training.
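For intuition on what the `vae_latent` flag implies, here is a generic sketch of a reparameterized Gaussian latent and its KL term against a standard normal prior. The repo's actual loss lives in the training code; the function and argument names here are illustrative:

```python
import torch

def sample_motion_latent(mu: torch.Tensor, logvar: torch.Tensor):
    """Reparameterization trick: z ~ N(mu, sigma^2), differentiable in (mu, logvar)."""
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)
    # KL( N(mu, sigma^2) || N(0, I) ), averaged over batch and latent dims
    kl = -0.5 * torch.mean(1.0 + logvar - mu.pow(2) - logvar.exp())
    return z, kl
```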
✨ Testing
You can also skip training and download our pre-trained model from Google Drive for testing:
mkdir ckpts && cd ckpts && gdown 1-a9JxXvoGRV_qy5ontRShc4mgDgVkrsd && tar -zxvf ckpt_trump_n51_step20.tar.gz && cd ..
Once trained, you can perform 4D rendering and visualize key point trajectories by running:
sh run_test_motion.sh
- NOTE: You can specify which motions to render by modifying the `render_videos` list in `run_test_motion.sh`; uncomment the corresponding lines to render all motions (this may take some time).
The rendered key point trajectories will look like this (Trump is walking):
https://github.com/user-attachments/assets/6b51b897-ed89-470a-b5e9-b6cb01ccecf0
The 4D rendering results should look like this (reference, fixed view, orbit views):
https://github.com/user-attachments/assets/b7a5c7fd-4d35-4d66-b284-092398f6a29c
- NOTE: Since the video models we used for motion prior distillation were imperfect at the time, the generated videos may contain artifacts. We will update the code and models with more advanced video models like Veo3 and SV4D2.0 in the future.
If you have any questions, please feel free to open an issue or email linzhan@princeton.edu.
🚦 Applications
With the learned motion latent space, we provide scripts to test the following applications. Simply add the corresponding flags in `run_test_motion.sh` and run it. We also provide some visualization results below. More instructions will be added soon.
- Latent Space Motion Interpolation
Add `test_interpolation=True` in `run_test_motion.sh`.
https://github.com/user-attachments/assets/1d2d1173-cfbd-420d-96fb-eb806ab62c33
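Conceptually, latent interpolation blends two learned motion codes before decoding them into 3D motions. A minimal sketch, with the decoder and latent names as placeholders rather than the repo's API:

```python
import torch

def interpolate_latents(z0: torch.Tensor, z1: torch.Tensor, steps: int = 8):
    """Linearly interpolate between two motion latent codes."""
    return [(1.0 - t) * z0 + t * z1 for t in torch.linspace(0.0, 1.0, steps)]

# Each intermediate code would then be decoded into a 3D motion, e.g.:
#   motions = [decode_motion(z) for z in interpolate_latents(z_walk, z_wave)]
# (decode_motion, z_walk, z_wave are placeholders for the model's decoder
#  and two learned motion codes.)
```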
- Language-Guided Motion Generation
Add `test_language=True` in `run_test_motion.sh`.
https://github.com/user-attachments/assets/9cbadd77-2b39-48b9-b73d-4d71fcf5b2fb
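One simple way to realize language guidance is to retrieve the motion latent whose associated text embedding best matches the prompt. The sketch below uses CLIP text features purely for illustration; DIMO's actual conditioning mechanism is described in the paper, and `motion_text_feats` is an assumed precomputed matrix:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def text_feature(prompt: str) -> torch.Tensor:
    """Encode a prompt into a normalized CLIP text embedding."""
    inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    with torch.no_grad():
        feat = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feat, dim=-1)

# Assumed: motion_text_feats is an (N, D) matrix of normalized CLIP features,
# one per learned motion. Pick the motion closest to the prompt:
#   scores = motion_text_feats @ text_feature("Trump is waving").T
#   best = scores.argmax()
```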
- Motion Reconstruction
Add `test_motion=True` in `run_test_motion.sh`.
https://github.com/user-attachments/assets/e2e3c1aa-a47b-4cee-8301-12ae9be804eb
🌸 Acknowledgement
Our code is built on top of DreamGaussian, CogVideoX, and SV4D. Many thanks to the authors for sharing their code. We also greatly appreciate the help from Yiming Xie.
📝 Citation
If you find this paper useful for your research, please consider citing:
@inproceedings{mou2025dimo,
title={DIMO: Diverse 3D Motion Generation for Arbitrary Objects},
author={Mou, Linzhan and Lei, Jiahui and Wang, Chen and Liu, Lingjie and Daniilidis, Kostas},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={14357--14368},
year={2025}
}