Video2robot

End-to-end pipeline converting generative videos (Veo, Sora) to humanoid robot motions

video2robot

End-to-end pipeline: Video (or Prompt) → Human Pose Extraction → Robot Motion Conversion

Demo

<video src="https://github.com/user-attachments/assets/a0f1bfb1-7e06-4672-8f6a-320ab60b0bfe" width="800" controls></video>
Demo Video
<table>
  <tr>
    <td align="center" width="50%">
      <video src="https://github.com/user-attachments/assets/1d58bac8-173c-499d-b245-65013371d50f" width="400" controls></video>
      Backflip
    </td>
    <td align="center" width="50%">
      <video src="https://github.com/user-attachments/assets/94e6d12d-afae-4300-8c5c-c244ad208bdb" width="400" controls></video>
      Dance Motion
    </td>
  </tr>
</table>

Pipeline

[Prompt] → Veo → [Video] → PromptHMR → [SMPL-X] → GMR → [Robot Motion]
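
The diagram above can be read as a chain of stages, each consuming one artifact and producing the next. The following is a minimal sketch of that reading, not the project's actual `pipeline.py`: the names `STAGES` and `stages_from` and the artifact labels are hypothetical and exist only for illustration. It shows why starting from an existing video skips the generation stage.

```python
from typing import List, Tuple

# (tool, input artifact, output artifact), in the order shown in the diagram.
# The artifact labels are illustrative names, not identifiers from the repo.
STAGES: List[Tuple[str, str, str]] = [
    ("Veo", "prompt", "video"),
    ("PromptHMR", "video", "smplx"),
    ("GMR", "smplx", "robot_motion"),
]


def stages_from(artifact: str) -> List[str]:
    """Return the tools that still have to run when `artifact` already exists."""
    if artifact == STAGES[-1][2]:
        return []  # final output already present, nothing left to do
    inputs = [stage_input for _, stage_input, _ in STAGES]
    if artifact not in inputs:
        raise ValueError(f"unknown artifact: {artifact!r}")
    start = inputs.index(artifact)
    return [tool for tool, _, _ in STAGES[start:]]


if __name__ == "__main__":
    print(stages_from("prompt"))  # all three stages
    print(stages_from("video"))   # video generation is skipped
```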

Project Structure

video2robot/
├── video2robot/            # Main package
│   ├── config.py           # Configuration management
│   ├── pipeline.py         # (Optional) Python API pipeline
│   ├── cli.py              # Console entry point (installed with the package)
│   ├── video/              # Video generation/processing
│   │   └── veo_client.py   # Google Veo API
│   ├── pose/               # Pose extraction (PromptHMR wrapper)
│   │   └── extractor.py
│   └── robot/              # Robot conversion (GMR wrapper)
│       └── retargeter.py
│
├── scripts/                # CLI scripts
│   ├── run_pipeline.py     # Full pipeline
│   ├── generate_video.py   # Veo video generation
│   ├── extract_pose.py     # Pose extraction
│   ├── convert_to_robot.py # Robot conversion
│   └── visualize.py        # Result visualization
│
├── configs/                # Configuration files
├── data/                   # Data (gitignored)
│
└── third_party/            # External dependencies (submodules)
    ├── PromptHMR/          # Pose extraction model
    └── GMR/                # Motion retargeting

Installation

This project requires two conda environments: gmr and phmr.

# Clone repo (with submodules)
git clone --recursive https://github.com/AIM-Intelligence/video2robot.git
cd video2robot

# Or initialize submodules after cloning
git submodule update --init --recursive

1. GMR Environment (Robot Retargeting)

conda create -n gmr python=3.10 -y
conda activate gmr
pip install -e .

For details, see the GMR README.

2. PromptHMR Environment (Pose Extraction)

For Blackwell GPU (sm_120) users:

conda create -n phmr python=3.11 -y
conda activate phmr
cd third_party/PromptHMR
bash scripts/install_blackwell.sh

For other GPUs (Ampere, Hopper, etc.):

conda create -n phmr python=3.10 -y
conda activate phmr
cd third_party/PromptHMR
pip install -e .

For details, see the PromptHMR README.

Usage

Note: The scripts switch to the appropriate conda environment (gmr or phmr) automatically. Both environments must be installed, but you do not need to activate either one manually.

# Full pipeline (action → robot motion) - BASE_PROMPT auto-applied
python scripts/run_pipeline.py --action "Action sequence:
The subject walks forward with four steps."

# Use Sora
python scripts/run_pipeline.py --action "..." --provider sora

# Start from existing video (video.mp4 → robot motion)
python scripts/run_pipeline.py --video /path/to/video.mp4

# Resume from existing project
python scripts/run_pipeline.py --project data/video_001

# Run individual steps
python scripts/generate_video.py --action "Action sequence: The subject walks forward."
python scripts/extract_pose.py --project data/video_001
python scripts/convert_to_robot.py --project data/video_001

# Visualization (auto env switching)
python scripts/visualize.py --project data/video_001
python scripts/visualize.py --project data/video_001 --pose
python scripts/visualize.py --project data/video_001 --robot
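
How the scripts switch environments is not documented here. One common way to get that behavior is to launch each step through `conda run -n <env>`, as in the sketch below. The `SCRIPT_ENV` mapping and both helper functions are assumptions for illustration (derived from the two environments described under Installation), not code from this repository.

```python
import subprocess
from typing import List, Sequence

# Assumed mapping, based on the two environments described under Installation.
SCRIPT_ENV = {
    "scripts/extract_pose.py": "phmr",     # pose extraction (PromptHMR)
    "scripts/convert_to_robot.py": "gmr",  # robot retargeting (GMR)
}


def conda_command(script: str, args: Sequence[str] = ()) -> List[str]:
    """Build a `conda run` command that executes `script` in its environment."""
    env = SCRIPT_ENV[script]
    return ["conda", "run", "-n", env, "--no-capture-output", "python", script, *args]


def run_in_env(script: str, args: Sequence[str] = ()) -> int:
    """Run the script and return its exit code (requires conda on PATH)."""
    return subprocess.run(conda_command(script, args), check=False).returncode


if __name__ == "__main__":
    # Only prints the command; call run_in_env(...) to actually execute it.
    print(conda_command("scripts/extract_pose.py", ["--project", "data/video_001"]))
```

Building the command as a list rather than a shell string avoids quoting problems with multi-line `--action` prompts.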

Web UI

# Run server (from video2robot root)
uvicorn web.app:app --host 0.0.0.0 --port 8000

# Access in browser
# http://localhost:8000

Features:

- Automatic pipeline: prompt input → video generation → pose extraction → robot conversion
- Video upload support
- Veo/Sora model selection
- 3D visualization (viser)
- Synchronized video and 3D playback

Environment Setup

# Create .env file
cp .env.example .env

# Set API key
echo "GOOGLE_API_KEY=your-api-key" >> .env
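
To confirm that the key was written correctly before starting a long run, a small check like the following can help. It is a sketch that parses simple `KEY=value` lines with the standard library only. The project's own loading logic in `config.py` may differ, and `read_env_file` and `require_key` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_key(values: Dict[str, str], name: str = "GOOGLE_API_KEY") -> str:
    """Return the key, or fail early with a readable message."""
    key = values.get(name, "")
    if not key or key == "your-api-key":
        raise RuntimeError(f"{name} is missing or still set to the placeholder")
    return key


if __name__ == "__main__":
    require_key(read_env_file(".env"))
    print("GOOGLE_API_KEY found")
```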

Supported Robots

| Robot | ID | DOF |
|-------|-----|-----|
| Unitree G1 | unitree_g1 | 29 |
| Unitree H1 | unitree_h1 | 19 |
| Booster T1 | booster_t1 | 23 |

See the GMR README for the full list.

Output Format

# robot_motion.pkl
{
    "fps": 30.0,
    "robot_type": "unitree_g1",
    "num_frames": 240,
    "root_pos": np.ndarray,    # (N, 3)
    "root_rot": np.ndarray,    # (N, 4) quaternion xyzw
    "dof_pos": np.ndarray,     # (N, DOF)
}
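
A quick sanity check of a generated file might look like the sketch below. It assumes that N in the shapes above equals `num_frames` and that the DOF column of the Supported Robots table gives the width of `dof_pos`. `load_motion` and `check_motion` are hypothetical helpers, not part of the package. The check only reads each array's `.shape`, so the code does not import NumPy itself, although NumPy must be installed for the pickle to load.

```python
import pickle
from typing import Any, Dict

# DOF counts copied from the Supported Robots table.
ROBOT_DOF = {"unitree_g1": 29, "unitree_h1": 19, "booster_t1": 23}


def load_motion(path: str) -> Dict[str, Any]:
    """Unpickle a motion file. Only load pickles from sources you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def check_motion(motion: Dict[str, Any]) -> None:
    """Raise ValueError if array shapes disagree with the documented layout."""
    n = motion["num_frames"]
    dof = ROBOT_DOF.get(motion["robot_type"])
    expected = {"root_pos": (n, 3), "root_rot": (n, 4)}
    if dof is not None:  # robots outside the table are not checked for DOF
        expected["dof_pos"] = (n, dof)
    for key, shape in expected.items():
        actual = tuple(motion[key].shape)
        if actual != shape:
            raise ValueError(f"{key}: expected shape {shape}, got {actual}")


if __name__ == "__main__":
    # Path is an example; adjust it to where your project wrote the file.
    check_motion(load_motion("robot_motion.pkl"))
    print("robot_motion.pkl matches the documented layout")
```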

TODO

- [ ] lastFrame (start/end frame interpolation), Veo 3.1 only
  - Start image + end image → generate a video that smoothly connects the two
  - Useful for "Pose A → Pose B" robot motion videos
- [ ] referenceImages (reference images), Veo 3.1 only
  - Up to 3 reference images to maintain character/style
  - Generate videos of a specific character performing actions

Acknowledgements

This project builds upon the following excellent open source projects:

- PromptHMR: 3D human mesh recovery from video
- GMR: general motion retargeting framework

License

This project depends on third-party libraries with their own licenses:

- GMR: MIT License
- PromptHMR: Non-Commercial Scientific Research Use Only

Please review both licenses before use.

The core video2robot code is MIT-licensed, but using this repository end-to-end (including PromptHMR) is subject to PromptHMR's Non-Commercial Scientific Research Use Only restriction. Commercial use requires permission from the PromptHMR authors.
