EmbodiedGen
Towards a Generative 3D World Engine for Embodied Intelligence
<img src="docs/assets/overall.jpg" alt="Overall Framework" width="700"/>

EmbodiedGen is a generative engine that creates diverse, interactive 3D worlds composed of high-quality 3D assets (mesh & 3DGS) with plausible physics, leveraging generative AI to address the generalization challenges of embodied-intelligence research. It is composed of six key modules:
Image-to-3D, Text-to-3D, Texture Generation, Articulated Object Generation, Scene Generation, and Layout Generation.
✨ Table of Contents of EmbodiedGen
Follow the documentation to get started!
- 🖼️ Image-to-3D
- 📝 Text-to-3D
- 🎨 Texture Generation
- 🌍 3D Scene Generation
- ⚙️ Articulated Object Generation
- 🏞️ Layout (Interactive 3D Worlds) Generation
- 🎮 Any Simulators
💬 Feedback Wanted: How Do You Use EmbodiedGen & What’s Missing?
🚀 Quick Start
✅ Setup Environment
```bash
git clone https://github.com/HorizonRobotics/EmbodiedGen.git
cd EmbodiedGen
git checkout v0.1.7
git submodule update --init --recursive --progress
conda create -n embodiedgen python=3.10.13 -y  # recommended to use a new env
conda activate embodiedgen
bash install.sh basic  # takes around 20 mins
# Optional: `bash install.sh extra` for scene3d-cli
```
✅ Starting from Docker
We provide a pre-built Docker image on Docker Hub with a configured environment for your convenience. For more details, please refer to the Docker documentation.
Note: Model checkpoints are not included in the image; they will be downloaded automatically on the first run. You still need to set up the GPT Agent manually.
```bash
IMAGE=wangxinjie/embodiedgen:env_v0.1.x
CONTAINER=EmbodiedGen-docker-${USER}
docker pull ${IMAGE}
docker run -itd --shm-size="64g" --gpus all --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --privileged --net=host --name ${CONTAINER} ${IMAGE}
docker exec -it ${CONTAINER} bash
```
✅ Setup GPT Agent
Update the API key in file: embodied_gen/utils/gpt_config.yaml.
You can choose between two backends for the GPT agent:
- gpt-4o (recommended) – use this if you have access to Azure OpenAI.
- qwen2.5-vl – an alternative with free usage via OpenRouter; apply for a free key and update api_key in embodied_gen/utils/gpt_config.yaml (50 free requests per day).
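For orientation, the config might look roughly like the fragment below. Only the `api_key` field is confirmed by this README; the other field names are illustrative assumptions, not the actual schema, so check the file shipped in the repo for the real keys.

```yaml
# Illustrative sketch only: besides `api_key` (mentioned above),
# these field names are assumptions, not the real gpt_config.yaml schema.
backend: gpt-4o          # or qwen2.5-vl via OpenRouter
api_key: "YOUR_KEY_HERE"
```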
📸 Directly use EmbodiedGen All-Simulators-Ready Assets
Explore EmbodiedGen-generated assets that are ready for simulation across simulators (SAPIEN, Isaac Sim, MuJoCo, PyBullet, Genesis, Isaac Gym, etc.). Details in the any-simulators chapter.
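Because each exported asset is a standard URDF plus mesh files, any simulator's URDF loader can consume it directly. As a minimal, simulator-agnostic sketch using only Python's standard library (the URDF string below is a hypothetical stand-in mimicking the exported structure, not an actual EmbodiedGen output), you can sanity-check the physical properties before loading:

```python
import xml.etree.ElementTree as ET

# Hypothetical URDF content mimicking an exported asset:
# a single link with mesh geometry and inertial data.
URDF = """<robot name="sample_00">
  <link name="base_link">
    <inertial>
      <mass value="0.35"/>
      <inertia ixx="0.001" ixy="0" ixz="0" iyy="0.001" iyz="0" izz="0.001"/>
    </inertial>
    <visual><geometry><mesh filename="mesh.obj"/></geometry></visual>
    <collision><geometry><mesh filename="mesh.obj"/></geometry></collision>
  </link>
</robot>"""

def summarize_urdf(urdf_text: str) -> dict:
    """Return the link names and total mass declared in a URDF string."""
    root = ET.fromstring(urdf_text)
    links = [link.get("name") for link in root.iter("link")]
    mass = sum(float(m.get("value", 0)) for m in root.iter("mass"))
    return {"links": links, "total_mass_kg": mass}

print(summarize_urdf(URDF))  # → {'links': ['base_link'], 'total_mass_kg': 0.35}
```

A quick check like this catches assets with missing inertial data before they reach a physics engine, where such problems surface as unstable or non-physical behavior.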
<h2 id="image-to-3d">🖼️ Image-to-3D</h2>
Generate a physically plausible 3D asset with URDF from a single input image, offering high-quality support for digital twin systems. (The HF space is a simplified demonstration; for the full functionality, please refer to img3d-cli.)
☁️ Service
Run the image-to-3D generation service locally. Models are downloaded automatically on the first run; please be patient.
```bash
# Run in the foreground
python apps/image_to_3d.py

# Or run in the background
CUDA_VISIBLE_DEVICES=0 nohup python apps/image_to_3d.py > /dev/null 2>&1 &
```
⚡ API
Generate physically plausible 3D assets from image input via the command-line API.
```bash
img3d-cli --image_path apps/assets/example_image/sample_00.jpg apps/assets/example_image/sample_01.jpg \
    --n_retry 2 --output_root outputs/imageto3d
# See results (.urdf / mesh.obj / mesh.glb / gs.ply) in ${output_root}/sample_xx/result
```
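Given the `sample_xx/result` output layout above, a small helper can gather all generated asset files for downstream use. A sketch using only the standard library (the helper name and the demo directory contents are illustrative, not part of EmbodiedGen):

```python
from pathlib import Path

def collect_results(output_root) -> dict:
    """Map each sample directory under output_root to the asset files
    (.urdf, mesh.obj, mesh.glb, gs.ply) found in its result/ folder."""
    assets = {}
    for result_dir in sorted(Path(output_root).glob("sample_*/result")):
        assets[result_dir.parent.name] = sorted(
            p.name for p in result_dir.iterdir() if p.is_file()
        )
    return assets

# Demo on a temporary directory that mimics the expected layout.
import tempfile
root = Path(tempfile.mkdtemp())
(root / "sample_00" / "result").mkdir(parents=True)
for name in ("asset.urdf", "mesh.obj", "mesh.glb", "gs.ply"):
    (root / "sample_00" / "result" / name).touch()

print(collect_results(root))
```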
Both SAM3D and TRELLIS are supported as the 3D generation model; modify IMAGE3D_MODEL in embodied_gen/scripts/imageto3d.py to switch between them.
<h2 id="text-to-3d">📝 Text-to-3D</h2>
Create 3D assets from text descriptions covering a wide range of geometries and styles. (The HF space is a simplified demonstration; for the full functionality, please refer to text3d-cli.)
☁️ Service
Deploy the text-to-3D generation service locally.
The text-to-image stage is based on the Kolors model and supports Chinese and English prompts. Models are downloaded automatically on the first run; please be patient.
```bash
python apps/text_to_3d.py
```
⚡ API
The text-to-image stage is based on SD3.5 Medium and accepts English prompts only. Usage requires accepting the model license (click accept); models are downloaded automatically.
For large-scale 3D asset generation, set --n_image_retry=4 --n_asset_retry=3 --n_pipe_retry=2: slower but better, thanks to automatic checking and retries. For more diverse results, omit --seed_img.
```bash
text3d-cli --prompts "small bronze figurine of a lion" "A globe with wooden base" "wooden table with embroidery" \
    --n_image_retry 1 --n_asset_retry 1 --n_pipe_retry 1 --seed_img 0 \
    --output_root outputs/textto3d
```
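The retry flags above implement a check-and-retry loop: each stage validates its output and reruns on failure until it passes or the retry budget is exhausted. A generic sketch of that pattern (the `generate` and `check` callables are placeholders, not EmbodiedGen APIs):

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def run_with_retries(generate: Callable[[], T],
                     check: Callable[[T], bool],
                     n_retry: int) -> T:
    """Run `generate` up to n_retry times and return the first result
    that passes `check`; raise if every attempt fails."""
    last = None
    for _ in range(n_retry):
        last = generate()
        if check(last):
            return last
    raise RuntimeError(f"All {n_retry} attempts failed (last: {last!r})")

# Toy demo: an unreliable generator that only succeeds on its 3rd call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    return calls["n"]

print(run_with_retries(flaky, lambda x: x >= 3, n_retry=4))  # → 3
```

Higher retry budgets trade throughput for yield, which is why the large-scale settings above are slower but produce more usable assets.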
Alternatively, use the Kolors-based text-to-image model:
```bash
# The second prompt is Chinese ("orange electric hand drill, with worn
# details"), demonstrating Kolors' Chinese prompt support.
bash embodied_gen/scripts/textto3d.sh \
    --prompts "A globe with wooden base and latitude and longitude lines" "橙色电动手钻,有磨损细节" \
    --output_root outputs/textto3d_k
```
P.S.: models with more permissive licenses can be found in embodied_gen/models/image_comm_model.py.
<h2 id="texture-generation">🎨 Texture Generation</h2>
Generate visually rich textures for 3D meshes.
☁️ Service
Run the texture generation service locally. Models are downloaded automatically on the first run; see download_kolors_weights and geo_cond_mv.
```bash
python apps/texture_edit.py
```
⚡ API
Chinese and English prompts are supported.
```bash
# The first prompt is Chinese ("a realistic-style robot holding a sign,
# with big eyes; the sign reads 'Hello'").
texture-cli --mesh_path "apps/assets/example_texture/meshes/robot_text.obj" \
    "apps/assets/example_texture/meshes/horse.obj" \
    --prompt "举着牌子的写实风格机器人,大眼睛,牌子上写着“Hello”的文字" \
    "A gray horse head with flying mane and brown eyes" \
    --output_root "outputs/texture_gen" \
    --seed 0
```
<h2 id="3d-scene-generation">🌍 3D Scene Generation</h2> <img src="docs/assets/scene3d.gif" alt="scene3d" style="width: 600px;">
⚡ API
Run `bash install.sh extra` to install additional requirements if you need to use scene3d-cli.
It takes ~30 minutes to generate a color mesh and 3DGS per scene.
```bash
CUDA_VISIBLE_DEVICES=0 scene3d-cli \
    --prompts "Art studio with easel and canvas" \
    --output_dir outputs/bg_scenes/ \
    --seed 0 \
    --gs3d.max_steps 4000 \
    --disable_pano_check
```
<h2 id="articulated-object-generation">⚙️ Articulated Object Generation</h2>
See our paper published in NeurIPS 2025. [Arxiv Paper] | [Gradio Demo] | [Code]
<img src="docs/assets/articulate.gif" alt="articulate" style="width: 500px;"><h2 id="layout-generation">🏞️ Layout (Interactive 3D Worlds) Generation</h2>
💬 Generate a layout from a task description
<table> <tr> <td><img src="docs/assets/layout1.gif" alt="layout1" width="320"/></td> <td><img src="docs/assets/layout2.gif" alt="layout2" width="320"/></td> </tr> <tr> <td><img src="docs/assets/layout3.gif" alt="layout3" width="320"/></td> <td><img src="docs/assets/layout4.gif" alt="layout4" width="320"/></td> </tr> </table>Text-to-
