DIMO: Diverse 3D Motion Generation for Arbitrary Objects
Paper | Project Page | Demo | Poster
<div align="center"> <img src="assets/pipeline.png" alt="DIMO Pipeline" style="max-width: 100%;" /> </div>
Linzhan Mou, Jiahui Lei, Chen Wang, Lingjie Liu, Kostas Daniilidis
University of Pennsylvania
ICCV 2025 (Highlight)
📜 News
- [2026-01-04] Code and data are pre-released!
- [2025-07-24] DIMO is selected as Highlight Paper!
- [2025-06-26] DIMO is accepted to ICCV 2025! 🎉 We will release code in this repo.
⚙️ Installation
We use Python 3.10 with PyTorch 2.1.1 and CUDA 11.8. The environment and packages can be installed as follows:
git clone --recursive https://github.com/Friedrich-M/DIMO.git && cd DIMO
conda create -y -n dimo -c nvidia/label/cuda-11.8.0 -c defaults cuda-toolkit=11.8 cuda-compiler=11.8 cudnn=8 python=3.10
conda activate dimo
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt --no-build-isolation
pip install --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu118_pyt211/download.html
pip install git+https://github.com/rahul-goel/fused-ssim/ --no-build-isolation
pip install submodules/diff-gauss submodules/diff-gaussian-rasterization submodules/KNN_CUDA submodules/simple-knn --no-build-isolation
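After installing, a quick sanity check can confirm that the CUDA build of PyTorch and the prebuilt PyTorch3D wheel are working. This probe is not part of the repo, just a generic check:

```python
# Generic environment probe (not part of the DIMO repo):
import torch
import pytorch3d

print(torch.__version__)          # expect 2.1.1+cu118
print(torch.cuda.is_available())  # expect True on a CUDA 11.8 machine
print(pytorch3d.__version__)      # confirms the prebuilt wheel installed
```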
📂 Data Preparation
Intuition: distill rich motion priors from video models, using generated videos as a form of diverse motion capture.
- Motion Priors Distillation
  We use text-conditioned monocular video models ([CogVideoX], [Wan2.2], [HunyuanVideo], etc.) to distill rich motion priors. We will add detailed instructions soon; see the sketch at the end of this section.
- Geometry Priors Distillation
  You can skip this step and download our processed example data (51 Trump motions) from Google Drive:
mkdir data && cd data && gdown 1b0_2t_KKhOyKlJsYncUcQm6URecAS6M6 && tar -zxvf data_trump_n51_step20.tar.gz && cd ..
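For the motion priors step, the sketch below shows what the video-generation side looks like with CogVideoX through Hugging Face diffusers. This is generic diffusers usage, not the repo's distillation script; the prompt and output filename are placeholders:

```python
# Minimal CogVideoX generation sketch via Hugging Face diffusers.
# NOT the repo's distillation pipeline -- only the video-generation step.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")

frames = pipe(
    prompt="Trump is walking",  # placeholder text condition
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

export_to_video(frames, "motion_prior.mp4", fps=8)  # placeholder output path
```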
🚀 Training
Intuition: jointly model diverse 3D motions in a shared latent space. To train DIMO, simply run:
sh run_train_latent.sh
- NOTE: You can modify the hyperparameters in `run_train_latent.sh` as needed. Check `configs/train_config.yaml` to view all configurable parameters and default settings.
- NOTE: Set the `vae_latent` flag to `True` to enforce a Gaussian distribution on the motion latent code, which also enables the KL divergence loss during training.
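For intuition on what the `vae_latent` flag implies, here is a generic sketch of a reparameterized Gaussian latent and its KL term against a standard normal prior. The repo's actual loss lives in the training code; the function and argument names here are illustrative:

```python
import torch

def sample_motion_latent(mu: torch.Tensor, logvar: torch.Tensor):
    """Reparameterization trick: z ~ N(mu, sigma^2), differentiable in (mu, logvar)."""
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)
    # KL( N(mu, sigma^2) || N(0, I) ), averaged over batch and latent dims
    kl = -0.5 * torch.mean(1.0 + logvar - mu.pow(2) - logvar.exp())
    return z, kl
```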
✨ Testing
You can also skip training and download our pre-trained model from Google Drive for testing:
mkdir ckpts && cd ckpts && gdown 1-a9JxXvoGRV_qy5ontRShc4mgDgVkrsd && tar -zxvf ckpt_trump_n51_step20.tar.gz && cd ..
Once trained, you can perform 4D rendering and visualize key point trajectories by running:
sh run_test_motion.sh
- NOTE: You can specify which motions to render by modifying the `render_videos` list in `run_test_motion.sh`; uncomment the corresponding lines to render all motions (this may take some time).
The rendered key point trajectories will look like this (Trump is walking):
https://github.com/user-attachments/assets/6b51b897-ed89-470a-b5e9-b6cb01ccecf0
The 4D rendering results should look like this (reference, fixed view, orbit views):
https://github.com/user-attachments/assets/b7a5c7fd-4d35-4d66-b284-092398f6a29c
- NOTE: Since the video models we used for motion prior distillation were imperfect at the time, the generated videos may contain artifacts. We will update the code and models with more advanced video models like Veo3 and SV4D2.0 in the future.
If you have any questions, please feel free to open an issue or email linzhan@princeton.edu.
🚦 Applications
With the learned motion latent space, we provide scripts to test the following applications. Simply add the corresponding flags in `run_test_motion.sh` and run it. We also provide some visualization results below. More instructions will be added soon.
- Latent Space Motion Interpolation
Add `test_interpolation=True` in `run_test_motion.sh`.
https://github.com/user-attachments/assets/1d2d1173-cfbd-420d-96fb-eb806ab62c33
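Conceptually, latent interpolation blends two learned motion codes before decoding them into 3D motions. A minimal sketch, with the decoder and latent names as placeholders rather than the repo's API:

```python
import torch

def interpolate_latents(z0: torch.Tensor, z1: torch.Tensor, steps: int = 8):
    """Linearly interpolate between two motion latent codes."""
    return [(1.0 - t) * z0 + t * z1 for t in torch.linspace(0.0, 1.0, steps)]

# Each intermediate code would then be decoded into a 3D motion, e.g.:
#   motions = [decode_motion(z) for z in interpolate_latents(z_walk, z_wave)]
# (decode_motion, z_walk, z_wave are placeholders for the model's decoder
#  and two learned motion codes.)
```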
- Language-Guided Motion Generation
Add `test_language=True` in `run_test_motion.sh`.
https://github.com/user-attachments/assets/9cbadd77-2b39-48b9-b73d-4d71fcf5b2fb
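One simple way to realize language guidance is to retrieve the motion latent whose associated text embedding best matches the prompt. The sketch below uses CLIP text features purely for illustration; DIMO's actual conditioning mechanism is described in the paper, and `motion_text_feats` is an assumed precomputed matrix:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def text_feature(prompt: str) -> torch.Tensor:
    """Encode a prompt into a normalized CLIP text embedding."""
    inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    with torch.no_grad():
        feat = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feat, dim=-1)

# Assumed: motion_text_feats is an (N, D) matrix of normalized CLIP features,
# one per learned motion. Pick the motion closest to the prompt:
#   scores = motion_text_feats @ text_feature("Trump is waving").T
#   best = scores.argmax()
```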
- Motion Reconstruction
Add `test_motion=True` in `run_test_motion.sh`.
https://github.com/user-attachments/assets/e2e3c1aa-a47b-4cee-8301-12ae9be804eb
🌸 Acknowledgement
Our code is built on top of DreamGaussian, CogVideoX, and SV4D. Many thanks to the authors for sharing their code. We also greatly appreciate the help from Yiming Xie.
📝 Citation
If you find this paper useful for your research, please consider citing:
@inproceedings{mou2025dimo,
title={DIMO: Diverse 3D Motion Generation for Arbitrary Objects},
author={Mou, Linzhan and Lei, Jiahui and Wang, Chen and Liu, Lingjie and Daniilidis, Kostas},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={14357--14368},
year={2025}
}