
Kimodo

Official implementation of Kimodo, a kinematic motion diffusion model for high-quality human(oid) motion generation.


<p align="center"> <img src="./assets/banner.png" alt="Banner" width="100%"> <a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-76B900.svg" alt="License"></a> <a href="https://research.nvidia.com/labs/sil/projects/kimodo/"><img src="https://img.shields.io/badge/Project-Page-blue" alt="Project Page"></a> <a href="https://research.nvidia.com/labs/sil/projects/kimodo/docs/index.html"><img src="https://img.shields.io/badge/docs-online-green.svg" alt="Documentation"></a> </p>

Overview

Kimodo is a kinematic motion diffusion model trained on a large-scale (700 hours) commercially-friendly optical motion capture dataset. The model generates high-quality 3D human and robot motions, and is controlled through text prompts and an extensive set of constraints such as full-body pose keyframes, end-effector positions/rotations, 2D paths, and 2D waypoints. Full details of the model architecture and training are available in the technical report.

This repository provides:

  • Inference: code and CLI to generate motions on both human and robot skeletons
  • Interactive Demo: easily author motions with a timeline interface of text prompts and kinematic controls
  • Annotations: additional text descriptions for the BONES-SEED dataset, including fine-grained temporal descriptions
  • [Coming Soon] Benchmark: test cases and evaluation code built on the BONES-SEED dataset to evaluate motion generation models based on text and constraint-following abilities
<div align="center"> <img src="assets/teaser.gif" width="1280"> </div>

News

See the full changelog for a detailed list of all changes.

  • [2026-03-19] Breaking: Model inputs/outputs now use the SOMA 77-joint skeleton (somaskel77).
  • [2026-03-16] Initial open-source release of Kimodo with five model variants (SOMA, G1, SMPL-X), CLI, interactive demo, and timeline annotations for BONES-SEED.

Kimodo Models

Several variants of Kimodo-v1 are available, each trained on a particular skeleton and dataset. All models support text-to-motion generation and kinematic controls.

Note: models are downloaded automatically when generating from the CLI or Interactive Demo, so there is no need to download them manually.

| Model | Skeleton | Training Data | Release Date | Hugging Face | License |
|:-------|:-------------|:------:|:------:|:-------------:|:-------------:|
| Kimodo-SOMA-RP-v1 | SOMA | Bones Rigplay 1 | March 16, 2026 | Link | NVIDIA Open Model |
| Kimodo-G1-RP-v1 | Unitree G1 | Bones Rigplay 1 | March 16, 2026 | Link | NVIDIA Open Model |
| Kimodo-SOMA-SEED-v1 | SOMA | BONES-SEED | March 16, 2026 | Link | NVIDIA Open Model |
| Kimodo-G1-SEED-v1 | Unitree G1 | BONES-SEED | March 16, 2026 | Link | NVIDIA Open Model |
| Kimodo-SMPLX-RP-v1 | SMPL-X | Bones Rigplay 1 | March 16, 2026 | Link | NVIDIA R&D Model |

By default, we recommend the models trained on the full Bones Rigplay 1 dataset (700 hours of mocap) for your motion generation needs. The models trained on BONES-SEED use only 288 hours of publicly available mocap data and are therefore less capable, but they are useful for comparison against your own models trained on the same dataset. We will soon release a benchmark to make such comparisons on BONES-SEED easy.

Getting Started

Please see the full documentation for detailed installation instructions, how to use the CLI and Interactive Demo, and other practical tips for generating motions with Kimodo:

Full Documentation

Some notes on installation environment:

  • Kimodo requires ~17GB of VRAM to generate locally, primarily due to the text embedding model
  • The model has been most extensively tested on GeForce RTX 3090, GeForce RTX 4090, and NVIDIA A100 GPUs, but should work on other recent cards with sufficient VRAM
  • This repo was developed on Linux; Windows should also work, especially when using Docker

Before getting started with motion generation, please review the best practices and be aware of model limitations.

Interactive Motion Authoring Demo

<div align="center"> <img src="assets/demo_screenshot.png" width="1000"> </div> <br>

Demo Documentation and Tutorial

The web-based interactive demo provides an intuitive interface for generating motions with any of the Kimodo model variations. After installation, the demo can be launched with the kimodo_demo command. It runs locally on http://127.0.0.1:7860. Open this URL in your browser to access the interface (or use port forwarding if set up on a server).
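The launch-and-open flow above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in this README (that `kimodo_demo` is on `PATH` after installation and serves on the default address); the fixed sleep is a crude stand-in for a real readiness check:

```python
import subprocess
import time
import webbrowser

DEMO_URL = "http://127.0.0.1:7860"  # default local address from this README


def launch_demo(wait_s=5.0):
    """Start the demo server and open it in the default browser.

    Assumes `kimodo_demo` is on PATH after installation; `wait_s` is a
    crude wait for the server to start listening before opening the URL.
    """
    proc = subprocess.Popen(["kimodo_demo"])
    time.sleep(wait_s)
    webbrowser.open(DEMO_URL)
    return proc  # caller can proc.terminate() to stop the demo
```

When running on a remote server, forward port 7860 to your machine and open the same URL locally.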

Demo Features

  • Multiple Characters: Supports generating with the SOMA, G1, and SMPL-X versions of Kimodo
  • Text Prompts: Enter one or more natural language descriptions of desired motions on the timeline
  • Timeline Editor: Add and edit keyframes and constrained intervals on multiple constraint tracks
  • Constraint Types:
    • Full-Body: Complete joint position constraints at specific frames
    • 2D Root: Define waypoints or full paths to follow on the ground plane
    • End-Effectors: Control hands and feet positions/rotations
  • Constraint Editing: Editing mode allows for re-posing of constraints or adjusting waypoints
  • 3D Visualization: Real-time rendering of generated motions with skeleton and skinned mesh options
  • Playback Controls: Preview generated motions with adjustable playback speed
  • Multiple Samples: Generate and compare multiple motion variations
  • Examples: Load pre-existing examples to better understand Kimodo's capabilities
  • Export: Save constraints and generated motions for later use

Command-Line Interface

CLI Documentation and Examples

Motions can also be generated directly from the command line with the kimodo_gen command or by running python -m kimodo.scripts.generate directly.

Key Arguments:

  • prompt: A single text description or sequence of texts for the desired motion (required)
  • --model: Which Kimodo model to use for generation
  • --duration: Motion duration in seconds
  • --num_samples: Number of motion variations to generate
  • --constraints: Constraint file to control the generated motion (e.g., saved from the web demo)
  • --diffusion_steps: Number of denoising steps
  • --cfg_type / --cfg_weight: Classifier-free guidance (nocfg, regular with one weight, or separated with two weights for text vs. constraints); see the CLI docs
  • --no-postprocess: Flag to disable foot skate and constraint cleanup post-processing
  • --seed: Random seed for reproducible results
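The arguments above compose into an ordinary command line. As a hedged sketch, the helper below assembles a `kimodo_gen` invocation from the flags listed in this README (the model name and prompt are illustrative; check the CLI docs for exact value formats before running):

```python
import subprocess


def build_kimodo_cmd(prompt, model=None, duration=None, num_samples=None,
                     seed=None, extra=()):
    """Assemble a kimodo_gen command line.

    Flag names follow the argument list in this README; `extra` passes
    through any additional flags (e.g. constraint or CFG options).
    """
    cmd = ["kimodo_gen", prompt]
    if model is not None:
        cmd += ["--model", model]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    if num_samples is not None:
        cmd += ["--num_samples", str(num_samples)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    cmd += list(extra)
    return cmd


if __name__ == "__main__":
    cmd = build_kimodo_cmd("a person walks forward and waves",
                           model="Kimodo-SOMA-RP-v1",  # illustrative name
                           duration=4.0, num_samples=2, seed=0)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment once Kimodo is installed
```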

The script supports different output formats depending on which skeleton is used. By default, a custom NPZ format compatible with the web demo is saved. For Kimodo-G1 models, the motion can instead be saved in the standard MuJoCo qpos CSV format, and for Kimodo-SMPLX models in the standard AMASS NPZ format for compatibility with existing pipelines.

Default NPZ Output Format

Generated motions are saved as NPZ files containing:

  • posed_joints: Global joint positions [T, J, 3]
  • global_rot_mats: Global joint rotation matrices [T, J, 3, 3]
  • local_rot_mats: Local (parent-relative) joint rotation matrices [T, J, 3, 3]
  • foot_contacts: Foot contact labels [left heel, left toe, right heel, right toe] [T, 4]
  • `smooth_root_
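The documented arrays can be read back with NumPy. A minimal sketch, assuming the key names and shapes listed above (`motion.npz` is a hypothetical output filename):

```python
import numpy as np


def load_kimodo_motion(path):
    """Load a generated motion NPZ and return the documented arrays.

    Key names and shapes follow the field list in this README; any keys
    not listed there are left untouched in the archive.
    """
    data = np.load(path)
    motion = {
        "posed_joints": data["posed_joints"],        # [T, J, 3]
        "global_rot_mats": data["global_rot_mats"],  # [T, J, 3, 3]
        "local_rot_mats": data["local_rot_mats"],    # [T, J, 3, 3]
        "foot_contacts": data["foot_contacts"],      # [T, 4]
    }
    num_frames = motion["posed_joints"].shape[0]
    assert all(v.shape[0] == num_frames for v in motion.values()), \
        "all arrays should share the same frame count T"
    return motion
```

For example, `load_kimodo_motion("motion.npz")["posed_joints"][0]` would give the global joint positions of the first frame.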
