Video2robot
End-to-end pipeline converting generative videos (Veo, Sora) to humanoid robot motions
video2robot
End-to-end pipeline: Video (or Prompt) → Human Pose Extraction → Robot Motion Conversion
Demo
<p align="center"> <video src="https://github.com/user-attachments/assets/a0f1bfb1-7e06-4672-8f6a-320ab60b0bfe" width="800" controls></video> </p> <p align="center"><b>Demo Video</b></p> <table> <tr> <td align="center" width="50%"> <video src="https://github.com/user-attachments/assets/1d58bac8-173c-499d-b245-65013371d50f" width="400" controls></video> <br><b>Backflip</b> </td> <td align="center" width="50%"> <video src="https://github.com/user-attachments/assets/94e6d12d-afae-4300-8c5c-c244ad208bdb" width="400" controls></video> <br><b>Dance Motion</b> </td> </tr> </table>
Pipeline
[Prompt] → Veo → [Video] → PromptHMR → [SMPL-X] → GMR → [Robot Motion]
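The chain above implies that the starting artifact decides which stages still have to run: a prompt needs all three, an existing video skips generation. A minimal sketch of that selection logic follows. The `Stage` enum and `stages_to_run` helper are hypothetical names chosen to mirror the script names in `scripts/`; they are not taken from the repository's code.

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names, mirroring the scripts in scripts/.
    GENERATE_VIDEO = "generate_video"      # prompt -> video (Veo or Sora)
    EXTRACT_POSE = "extract_pose"          # video -> SMPL-X (PromptHMR)
    CONVERT_TO_ROBOT = "convert_to_robot"  # SMPL-X -> robot motion (GMR)


ORDER = [Stage.GENERATE_VIDEO, Stage.EXTRACT_POSE, Stage.CONVERT_TO_ROBOT]


def stages_to_run(has_video: bool = False, has_pose: bool = False) -> list[Stage]:
    """Return the stages still needed, given which artifacts already exist."""
    if has_pose:
        return ORDER[2:]   # only retargeting is left
    if has_video:
        return ORDER[1:]   # skip video generation
    return list(ORDER)     # start from the prompt


for stage in stages_to_run(has_video=True):
    print(stage.value)
```

Starting from an existing video, this prints `extract_pose` and then `convert_to_robot`.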
Project Structure
video2robot/
├── video2robot/              # Main package
│   ├── config.py             # Configuration management
│   ├── pipeline.py           # (Optional) Python API pipeline
│   ├── cli.py                # Console entrypoint for installation
│   ├── video/                # Video generation/processing
│   │   └── veo_client.py     # Google Veo API
│   ├── pose/                 # Pose extraction (PromptHMR wrapper)
│   │   └── extractor.py
│   └── robot/                # Robot conversion (GMR wrapper)
│       └── retargeter.py
│
├── scripts/                  # CLI scripts
│   ├── run_pipeline.py       # Full pipeline
│   ├── generate_video.py     # Veo video generation
│   ├── extract_pose.py       # Pose extraction
│   ├── convert_to_robot.py   # Robot conversion
│   └── visualize.py          # Result visualization
│
├── configs/                  # Configuration files
├── data/                     # Data (gitignored)
│
└── third_party/              # External dependencies (submodules)
    ├── PromptHMR/            # Pose extraction model
    └── GMR/                  # Motion retargeting
Installation
This project requires two conda environments: gmr and phmr.
# Clone repo (with submodules)
git clone --recursive https://github.com/AIM-Intelligence/video2robot.git
cd video2robot
# Or initialize submodules after cloning
git submodule update --init --recursive
1. GMR Environment (Robot Retargeting)
conda create -n gmr python=3.10 -y
conda activate gmr
pip install -e .
For details, see GMR README.
2. PromptHMR Environment (Pose Extraction)
For Blackwell GPU (sm_120) users:
conda create -n phmr python=3.11 -y
conda activate phmr
cd third_party/PromptHMR
bash scripts/install_blackwell.sh
For other GPUs (Ampere, Hopper, etc.):
conda create -n phmr python=3.10 -y
conda activate phmr
cd third_party/PromptHMR
pip install -e .
For details, see PromptHMR README.
Usage
Note: Scripts automatically switch to the appropriate conda environment (gmr or phmr) as needed. Just ensure both environments are installed; there is no need to activate them manually.
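How the scripts switch environments is not described here. One common way to get this behavior is to launch each step through `conda run -n <env>`, and the sketch below only illustrates that idea. The `STEP_ENV` mapping and `conda_run_command` helper are hypothetical, and the environment assigned to each script is inferred from the installation headings (PromptHMR for pose extraction, GMR for robot retargeting).

```python
# Hypothetical mapping from step script to conda environment,
# inferred from the installation section rather than from the repo's code.
STEP_ENV = {
    "scripts/extract_pose.py": "phmr",
    "scripts/convert_to_robot.py": "gmr",
}


def conda_run_command(script: str, *args: str) -> list[str]:
    """Build an argv that runs `script` inside its conda environment."""
    env = STEP_ENV[script]
    return ["conda", "run", "-n", env, "python", script, *args]


cmd = conda_run_command("scripts/extract_pose.py", "--project", "data/video_001")
print(" ".join(cmd))
# To execute it for real: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems when an argument, such as a multi-line `--action` prompt, contains spaces or newlines.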
# Full pipeline (action → robot motion) - BASE_PROMPT auto-applied
python scripts/run_pipeline.py --action "Action sequence:
The subject walks forward with four steps."
# Use Sora
python scripts/run_pipeline.py --action "..." --provider sora
# Start from existing video (video.mp4 → robot motion)
python scripts/run_pipeline.py --video /path/to/video.mp4
# Resume from existing project
python scripts/run_pipeline.py --project data/video_001
# Run individual steps
python scripts/generate_video.py --action "Action sequence: The subject walks forward."
python scripts/extract_pose.py --project data/video_001
python scripts/convert_to_robot.py --project data/video_001
# Visualization (auto env switching)
python scripts/visualize.py --project data/video_001
python scripts/visualize.py --project data/video_001 --pose
python scripts/visualize.py --project data/video_001 --robot
Web UI
# Run server (from video2robot root)
uvicorn web.app:app --host 0.0.0.0 --port 8000
# Access in browser
# http://localhost:8000
Features:
- Automatic pipeline: prompt input → video generation → pose extraction → robot conversion
- Video upload support
- Veo/Sora model selection
- 3D visualization (viser)
- Video-3D synchronized playback
Environment Setup
# Create .env file
cp .env.example .env
# Set API key
echo "GOOGLE_API_KEY=your-api-key" >> .env
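For readers who want to check that the key is picked up, here is a minimal standard-library sketch of reading `GOOGLE_API_KEY` from `.env` text. It is not the project's own loader (which may use a dotenv library); `parse_env` and `get_api_key` are hypothetical helpers. Later assignments override earlier ones, which matches the `>>` append in the commands above.

```python
import os


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Later lines win, so an appended key overrides a template entry.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(env_text: str, environ=os.environ) -> str:
    """Prefer the process environment, then fall back to the .env contents."""
    key = environ.get("GOOGLE_API_KEY") or parse_env(env_text).get("GOOGLE_API_KEY", "")
    if not key or key == "your-api-key":
        raise RuntimeError("GOOGLE_API_KEY is missing or still the placeholder")
    return key


sample = "# copied from .env.example\nGOOGLE_API_KEY=\nGOOGLE_API_KEY=abc123\n"
print(get_api_key(sample, environ={}))
```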
Supported Robots
| Robot | ID | DOF |
|-------|-----|-----|
| Unitree G1 | unitree_g1 | 29 |
| Unitree H1 | unitree_h1 | 19 |
| Booster T1 | booster_t1 | 23 |
See the GMR README for the full list.
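The table can be mirrored as a small lookup when checking a motion file against its robot. The sketch below copies only the three rows shown above; `ROBOT_DOF` and `expected_dof` are hypothetical names, and the robots listed in the GMR README but not here are deliberately left out.

```python
# Robot IDs and DOF counts copied from the table above.
ROBOT_DOF = {
    "unitree_g1": 29,
    "unitree_h1": 19,
    "booster_t1": 23,
}


def expected_dof(robot_id: str) -> int:
    """Return the DOF count for a known robot ID, or raise ValueError."""
    try:
        return ROBOT_DOF[robot_id]
    except KeyError:
        known = ", ".join(sorted(ROBOT_DOF))
        raise ValueError(f"unknown robot id {robot_id!r}; known ids: {known}") from None


print(expected_dof("unitree_g1"))  # 29
```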
Output Format
# robot_motion.pkl
{
"fps": 30.0,
"robot_type": "unitree_g1",
"num_frames": 240,
"root_pos": np.ndarray, # (N, 3)
"root_rot": np.ndarray, # (N, 4) quaternion xyzw
"dof_pos": np.ndarray, # (N, DOF)
}
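A short sketch of consuming this file follows, assuming that `num_frames` equals the N in the shapes above. The helper names are hypothetical. The checks use `len()` so they work with NumPy arrays as well as nested lists. The quaternion reorder is included because some consumers (MuJoCo, for example) expect wxyz rather than the xyzw order stored here.

```python
import pickle


def load_motion(path: str) -> dict:
    """Load robot_motion.pkl. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def check_motion(motion: dict) -> float:
    """Validate the documented layout and return the clip duration in seconds."""
    n = motion["num_frames"]
    for key, width in (("root_pos", 3), ("root_rot", 4)):
        rows = motion[key]
        if len(rows) != n:
            raise ValueError(f"{key}: expected {n} frames, got {len(rows)}")
        if any(len(row) != width for row in rows):
            raise ValueError(f"{key}: every frame must have {width} values")
    if len(motion["dof_pos"]) != n:
        raise ValueError("dof_pos: frame count does not match num_frames")
    return n / motion["fps"]


def xyzw_to_wxyz(q):
    """Reorder one quaternion from (x, y, z, w) to (w, x, y, z)."""
    x, y, z, w = q
    return (w, x, y, z)


# Synthetic stand-in with the values from the example above (not real output).
demo = {
    "fps": 30.0,
    "robot_type": "unitree_g1",
    "num_frames": 240,
    "root_pos": [[0.0, 0.0, 0.8]] * 240,
    "root_rot": [[0.0, 0.0, 0.0, 1.0]] * 240,
    "dof_pos": [[0.0] * 29] * 240,
}
print(check_motion(demo))  # 8.0
```

With 240 frames at 30 fps the clip lasts 8 seconds. For a real file, `check_motion(load_motion("robot_motion.pkl"))` runs the same checks.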
TODO
- [ ] lastFrame (Start/End Frame Interpolation) - Veo 3.1 only
  - Start image + End image → Generate video smoothly connecting the two
  - Useful for "Pose A → Pose B" robot motion videos
- [ ] referenceImages (Reference Images) - Veo 3.1 only
  - Up to 3 reference images to maintain character/style
  - Generate videos with specific character performing actions
Acknowledgements
This project builds upon excellent open-source projects, including PromptHMR (pose extraction) and GMR (motion retargeting).
License
This project depends on third-party libraries with their own licenses: PromptHMR and GMR.
Please review both licenses before use.
The core video2robot code is MIT-licensed, but using this repository end-to-end (including PromptHMR) inherits PromptHMR's Non-Commercial Scientific Research Only restriction. Commercial use requires obtaining appropriate permission from the PromptHMR authors.
