EmbodiedGen
Towards a Generative 3D World Engine for Embodied Intelligence
<img src="docs/assets/overall.jpg" alt="Overall Framework" width="700"/>

EmbodiedGen is a generative engine that creates diverse, interactive 3D worlds composed of high-quality 3D assets (mesh & 3DGS) with plausible physics, leveraging generative AI to address the generalization challenges of embodied-intelligence research. It is composed of six key modules:
Image-to-3D, Text-to-3D, Texture Generation, Articulated Object Generation, Scene Generation, and Layout Generation.
✨ Table of Contents of EmbodiedGen
Follow the documentation to get started!
- 🖼️ Image-to-3D
- 📝 Text-to-3D
- 🎨 Texture Generation
- 🌍 3D Scene Generation
- ⚙️ Articulated Object Generation
- 🏞️ Layout (Interactive 3D Worlds) Generation
- 🎮 Any Simulators
💬 Feedback Wanted: How Do You Use EmbodiedGen & What’s Missing?
🚀 Quick Start
✅ Setup Environment
```bash
git clone https://github.com/HorizonRobotics/EmbodiedGen.git
cd EmbodiedGen
git checkout v0.1.7
git submodule update --init --recursive --progress
conda create -n embodiedgen python=3.10.13 -y  # recommended to use a new env
conda activate embodiedgen
bash install.sh basic  # takes around 20 mins
# Optional: `bash install.sh extra` for scene3d-cli
```
✅ Starting from Docker
We provide a pre-built Docker image on Docker Hub with a configured environment for your convenience. For more details, please refer to the Docker documentation.
Note: Model checkpoints are not included in the image; they will be downloaded automatically on the first run. You still need to set up the GPT Agent manually.
```bash
IMAGE=wangxinjie/embodiedgen:env_v0.1.x
CONTAINER=EmbodiedGen-docker-${USER}
docker pull ${IMAGE}
docker run -itd --shm-size="64g" --gpus all --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --privileged --net=host --name ${CONTAINER} ${IMAGE}
docker exec -it ${CONTAINER} bash
```
✅ Setup GPT Agent
Update the API key in file: embodied_gen/utils/gpt_config.yaml.
You can choose between two backends for the GPT agent:
- gpt-4o (recommended) – use this if you have access to Azure OpenAI.
- qwen2.5-vl – an alternative with free usage via OpenRouter; apply for a free key and update api_key in embodied_gen/utils/gpt_config.yaml (50 free requests per day).
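For orientation, the config might look roughly like the fragment below. Only the `api_key` field is confirmed by this README; the other field names are illustrative assumptions, not the actual schema, so check the file shipped in the repo for the real keys.

```yaml
# Illustrative sketch only: besides `api_key` (mentioned above),
# these field names are assumptions, not the real gpt_config.yaml schema.
backend: gpt-4o          # or qwen2.5-vl via OpenRouter
api_key: "YOUR_KEY_HERE"
```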
📸 Directly use EmbodiedGen All-Simulators-Ready Assets
Explore EmbodiedGen-generated assets that are ready for simulation across simulators (SAPIEN, Isaac Sim, MuJoCo, PyBullet, Genesis, Isaac Gym, etc.). Details in the any-simulators chapter.
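Because each exported asset is a standard URDF plus mesh files, any simulator's URDF loader can consume it directly. As a minimal, simulator-agnostic sketch using only Python's standard library (the URDF string below is a hypothetical stand-in mimicking the exported structure, not an actual EmbodiedGen output), you can sanity-check the physical properties before loading:

```python
import xml.etree.ElementTree as ET

# Hypothetical URDF content mimicking an exported asset:
# a single link with mesh geometry and inertial data.
URDF = """<robot name="sample_00">
  <link name="base_link">
    <inertial>
      <mass value="0.35"/>
      <inertia ixx="0.001" ixy="0" ixz="0" iyy="0.001" iyz="0" izz="0.001"/>
    </inertial>
    <visual><geometry><mesh filename="mesh.obj"/></geometry></visual>
    <collision><geometry><mesh filename="mesh.obj"/></geometry></collision>
  </link>
</robot>"""

def summarize_urdf(urdf_text: str) -> dict:
    """Return the link names and total mass declared in a URDF string."""
    root = ET.fromstring(urdf_text)
    links = [link.get("name") for link in root.iter("link")]
    mass = sum(float(m.get("value", 0)) for m in root.iter("mass"))
    return {"links": links, "total_mass_kg": mass}

print(summarize_urdf(URDF))  # → {'links': ['base_link'], 'total_mass_kg': 0.35}
```

A quick check like this catches assets with missing inertial data before they reach a physics engine, where such problems surface as unstable or non-physical behavior.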
<h2 id="image-to-3d">🖼️ Image-to-3D</h2>
Generate a physically plausible 3D asset with URDF from a single input image, offering high-quality support for digital twin systems. (The HF space is a simplified demonstration; for the full functionality, please refer to img3d-cli.)
☁️ Service
Run the image-to-3D generation service locally. Models are downloaded automatically on the first run; please be patient.
```bash
# Run in the foreground
python apps/image_to_3d.py

# Or run in the background
CUDA_VISIBLE_DEVICES=0 nohup python apps/image_to_3d.py > /dev/null 2>&1 &
```
⚡ API
Generate physically plausible 3D assets from image input via the command-line API.
```bash
img3d-cli --image_path apps/assets/example_image/sample_00.jpg apps/assets/example_image/sample_01.jpg \
    --n_retry 2 --output_root outputs/imageto3d
# See results (.urdf / mesh.obj / mesh.glb / gs.ply) in ${output_root}/sample_xx/result
```
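Given the `sample_xx/result` output layout above, a small helper can gather all generated asset files for downstream use. A sketch using only the standard library (the helper name and the demo directory contents are illustrative, not part of EmbodiedGen):

```python
from pathlib import Path

def collect_results(output_root) -> dict:
    """Map each sample directory under output_root to the asset files
    (.urdf, mesh.obj, mesh.glb, gs.ply) found in its result/ folder."""
    assets = {}
    for result_dir in sorted(Path(output_root).glob("sample_*/result")):
        assets[result_dir.parent.name] = sorted(
            p.name for p in result_dir.iterdir() if p.is_file()
        )
    return assets

# Demo on a temporary directory that mimics the expected layout.
import tempfile
root = Path(tempfile.mkdtemp())
(root / "sample_00" / "result").mkdir(parents=True)
for name in ("asset.urdf", "mesh.obj", "mesh.glb", "gs.ply"):
    (root / "sample_00" / "result" / name).touch()

print(collect_results(root))
```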
Both SAM3D and TRELLIS are supported as the 3D generation model; modify IMAGE3D_MODEL in embodied_gen/scripts/imageto3d.py to switch between them.
<h2 id="text-to-3d">📝 Text-to-3D</h2>
Create 3D assets from text descriptions covering a wide range of geometries and styles. (The HF space is a simplified demonstration; for the full functionality, please refer to text3d-cli.)
☁️ Service
Deploy the text-to-3D generation service locally.
The text-to-image stage is based on the Kolors model and supports Chinese and English prompts. Models are downloaded automatically on the first run; please be patient.
```bash
python apps/text_to_3d.py
```
⚡ API
The text-to-image stage is based on SD3.5 Medium and accepts English prompts only. Usage requires accepting the model license (click accept); models are downloaded automatically.
For large-scale 3D asset generation, set --n_image_retry=4 --n_asset_retry=3 --n_pipe_retry=2: slower but better, thanks to automatic checking and retries. For more diverse results, omit --seed_img.
```bash
text3d-cli --prompts "small bronze figurine of a lion" "A globe with wooden base" "wooden table with embroidery" \
    --n_image_retry 1 --n_asset_retry 1 --n_pipe_retry 1 --seed_img 0 \
    --output_root outputs/textto3d
```
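The retry flags above implement a check-and-retry loop: each stage validates its output and reruns on failure until it passes or the retry budget is exhausted. A generic sketch of that pattern (the `generate` and `check` callables are placeholders, not EmbodiedGen APIs):

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def run_with_retries(generate: Callable[[], T],
                     check: Callable[[T], bool],
                     n_retry: int) -> T:
    """Run `generate` up to n_retry times and return the first result
    that passes `check`; raise if every attempt fails."""
    last = None
    for _ in range(n_retry):
        last = generate()
        if check(last):
            return last
    raise RuntimeError(f"All {n_retry} attempts failed (last: {last!r})")

# Toy demo: an unreliable generator that only succeeds on its 3rd call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    return calls["n"]

print(run_with_retries(flaky, lambda x: x >= 3, n_retry=4))  # → 3
```

Higher retry budgets trade throughput for yield, which is why the large-scale settings above are slower but produce more usable assets.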
Alternatively, use the Kolors-based text-to-image model:
```bash
# The second prompt is Chinese ("orange electric hand drill, with worn
# details"), demonstrating Kolors' Chinese prompt support.
bash embodied_gen/scripts/textto3d.sh \
    --prompts "A globe with wooden base and latitude and longitude lines" "橙色电动手钻,有磨损细节" \
    --output_root outputs/textto3d_k
```
P.S.: models with more permissive licenses can be found in embodied_gen/models/image_comm_model.py.
<h2 id="texture-generation">🎨 Texture Generation</h2>
Generate visually rich textures for 3D meshes.
☁️ Service
Run the texture generation service locally. Models are downloaded automatically on the first run; see download_kolors_weights and geo_cond_mv.
```bash
python apps/texture_edit.py
```
⚡ API
Chinese and English prompts are supported.
```bash
# The first prompt is Chinese ("a realistic-style robot holding a sign,
# with big eyes; the sign reads 'Hello'").
texture-cli --mesh_path "apps/assets/example_texture/meshes/robot_text.obj" \
    "apps/assets/example_texture/meshes/horse.obj" \
    --prompt "举着牌子的写实风格机器人,大眼睛,牌子上写着“Hello”的文字" \
    "A gray horse head with flying mane and brown eyes" \
    --output_root "outputs/texture_gen" \
    --seed 0
```
<h2 id="3d-scene-generation">🌍 3D Scene Generation</h2> <img src="docs/assets/scene3d.gif" alt="scene3d" style="width: 600px;">
⚡ API
Run `bash install.sh extra` to install additional requirements if you need to use scene3d-cli.
It takes ~30 minutes to generate a color mesh and 3DGS per scene.
```bash
CUDA_VISIBLE_DEVICES=0 scene3d-cli \
    --prompts "Art studio with easel and canvas" \
    --output_dir outputs/bg_scenes/ \
    --seed 0 \
    --gs3d.max_steps 4000 \
    --disable_pano_check
```
<h2 id="articulated-object-generation">⚙️ Articulated Object Generation</h2>
See our paper published in NeurIPS 2025. [Arxiv Paper] | [Gradio Demo] | [Code]
<img src="docs/assets/articulate.gif" alt="articulate" style="width: 500px;"><h2 id="layout-generation">🏞️ Layout (Interactive 3D Worlds) Generation</h2>
💬 Generate a layout from a task description
<table> <tr> <td><img src="docs/assets/layout1.gif" alt="layout1" width="320"/></td> <td><img src="docs/assets/layout2.gif" alt="layout2" width="320"/></td> </tr> <tr> <td><img src="docs/assets/layout3.gif" alt="layout3" width="320"/></td> <td><img src="docs/assets/layout4.gif" alt="layout4" width="320"/></td> </tr> </table>Text-to-
