Steer3D
Code implementation for "Feedforward 3D Editing via Text-Steerable Image-to-3D"
Feedforward 3D Editing via Text-Steerable Image-to-3D
Steer3D: Injecting text control into image-to-3D models via a ControlNet-inspired design and a synthetic data engine.
[Ziqi Ma][zm], [Hongqiao Chen][hc], [Yisong Yue][yy], [Georgia Gkioxari][gg]
[Project Page] [arXiv]

Table of Contents:
- Overview
- Environment Setup
- Data Engine
- Benchmark
- Inference on the Benchmark
- Inference in the Wild
- Training
- Citing
Overview <a name="overview"></a>
Steer3D adapts the ControlNet architecture to add text steerability to image-to-3D models. Steer3D is trained on only 100k-scale synthetic data generated by our data engine. We share code for both the data engine and the model. Scripts for the various steps of the data engine are explained in dataengine/README.md. Below we demonstrate how to perform inference both on benchmarks and on user-provided images and editing text. We also release training code and training data, detailed in the Training section.
Environment Setup <a name="environment"></a>
The data engine and the model require different environments. The data engine environment setup can be found in dataengine/README.md. For the model environment setup, please follow environment.yml. The pinned versions work with CUDA 12.8; if your CUDA version is different, please adjust accordingly.
conda env create -f environment.yml
conda activate steer3d
Note that libraries kaolin, nvdiffrast, diffoctreerast, mip-splatting, and vox2seq might need manual installation. Please refer to this setup script from TRELLIS for installation of these dependencies.
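Before running the model, it can help to check which of these manually installed dependencies are actually importable in your environment. The sketch below uses only the standard library; the module names are assumptions based on the package names (e.g. mip-splatting's import name may differ from its pip name), so adjust them as needed.

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if find_spec(n) is None]

# Import names here are assumptions; verify against each package's docs.
deps = ["kaolin", "nvdiffrast", "diffoctreerast", "vox2seq"]
print("missing:", missing_packages(deps))
```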
Data Engine <a name="dataengine"></a>
The environment setup for the data engine, as well as scripts for various steps, are detailed in dataengine/README.md.
Benchmark <a name="benchmark"></a>
Edit3D-Bench is available on HuggingFace. The README.md for this dataset details the data format.
Inference on the Benchmark <a name="infbench"></a>
Model checkpoints can be downloaded from HuggingFace. Please specify the correct paths for the base model and ControlNet weights. Note that the stage 1 base model checkpoint we use differs from the TRELLIS-released checkpoint (discussed in Appendix D of the paper).
Once the benchmark is downloaded, it should work directly with the dataloaders specified in the configs provided in configs/. Evaluation configurations can be changed via flags: --num_examples specifies the number of benchmark examples to evaluate on, and --num_seeds specifies the number of seeds for evaluation. --export_glb is optional: when set, each predicted (as well as source and target) 3D asset will be exported as a glb and saved in the output directory. The glb export requires post-processing that takes time, so this flag makes the script slower. --output_dir specifies the output directory, which will contain a visualization grid (as .png files) and optionally the glb files.
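The flags described above can be summarized as an argparse interface. This is a hypothetical mirror of the evaluation CLI for quick reference, not the actual parser in inference/; defaults here are assumptions.

```python
import argparse

def build_eval_parser():
    # Hypothetical mirror of the evaluation flags described above.
    p = argparse.ArgumentParser(description="Steer3D benchmark evaluation (sketch)")
    p.add_argument("--num_examples", type=int, default=50,
                   help="number of benchmark examples to evaluate on")
    p.add_argument("--num_seeds", type=int, default=3,
                   help="number of seeds per example")
    p.add_argument("--export_glb", action="store_true",
                   help="also export source/target/prediction assets as .glb (slower)")
    p.add_argument("--output_dir", default="visualizations/output",
                   help="where visualization grids (.png) and optional .glb files go")
    p.add_argument("--split", default="val")
    return p

args = build_eval_parser().parse_args(
    ["--num_examples", "150", "--num_seeds", "3", "--export_glb"]
)
print(args.num_examples, args.num_seeds, args.export_glb)
```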
Please first set PYTHONPATH=[path to Steer3D]. For the different benchmark evaluations (texture, removal, addition), please use the commands below.
Texture:
Please first set the metadata csv path of val_dataset in configs/stage3_controlnet.json to [path-of-Edit3D-Bench]/metadata/texture.csv, and run the following:
python inference/inference_texture.py \
--stage1_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/output \
--num_examples 150 \
--num_seeds 3 \
--split val
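Setting the metadata csv path in the config (as above) can be done by hand or programmatically. The sketch below assumes the config stores the path under a val_dataset entry; the exact key names are hypothetical, so check the actual field in configs/stage3_controlnet.json before using it.

```python
import json

def set_val_metadata(config_path, csv_path, key="metadata_csv"):
    """Point the val_dataset metadata csv of a JSON config at csv_path.
    The 'val_dataset'/key names are assumptions; adjust to the real config."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg.setdefault("val_dataset", {})[key] = csv_path
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```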
Removal:
Please first set the metadata csv path of val_dataset in configs/stage1_controlnet.json to [path-of-Edit3D-Bench]/metadata/remove.csv, and run the following:
python inference/inference_geometry_texture.py \
--stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_remove.pt \
--stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/output \
--num_examples 50 \
--num_seeds 3 \
--split val
Addition:
Please first set the metadata csv path of val_dataset in configs/stage1_controlnet.json to [path-of-Edit3D-Bench]/metadata/add.csv, and run the following:
python inference/inference_geometry_texture.py \
--stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
--stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/output \
--num_examples 50 \
--num_seeds 3 \
--split val
To evaluate 3D (Chamfer distance, F1) and 2D (LPIPS) metrics, please use the rendering and evaluation scripts provided in evaluation/. Detailed instructions can be found in evaluation/README.md.
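For reference, Chamfer distance and F1 on point clouds follow the standard definitions sketched below. This is a brute-force pure-Python illustration, not the evaluation/ implementation, which may differ in normalization, sampling, and thresholds.

```python
import math

def _min_sq_dists(src, dst):
    # For each point in src, squared distance to its nearest neighbor in dst.
    return [min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst) for p in src]

def chamfer_distance(pred, gt):
    # Symmetric mean of nearest-neighbor squared distances.
    return (sum(_min_sq_dists(pred, gt)) / len(pred)
            + sum(_min_sq_dists(gt, pred)) / len(gt))

def f1_score(pred, gt, tau=0.05):
    # Precision: predicted points within tau of GT; recall: the reverse.
    precision = sum(math.sqrt(d) < tau for d in _min_sq_dists(pred, gt)) / len(pred)
    recall = sum(math.sqrt(d) < tau for d in _min_sq_dists(gt, pred)) / len(gt)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
```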
We observed inference variations across PyTorch versions (2.4 vs. 2.8). Please contact the authors if you run into issues with exact numerical reproducibility.
Inference in the Wild <a name="infwild"></a>
Evaluation on in-the-wild image and text. The flags are similar to Inference on the Benchmark. You can directly pass in an image path via --image_path and editing text via --text (it could also be a .txt file containing multiple editing texts, separated by linebreaks). --texture_only can be set to ensure better geometry consistency for texture-only edits. A visualization png will be generated. If --export_glb is set, glbs of 3D objects will additionally be generated and saved in the output directory. Here we demonstrate 3 example edits: removal, addition, and texture edits for a traffic cone based on a natural photo.
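The --text behavior described above (a literal string, or a .txt file with one editing text per line) can be resolved as in this sketch. This is a hypothetical helper, not necessarily how inference_wild.py handles it.

```python
from pathlib import Path

def resolve_edit_texts(text_arg):
    """Return a list of editing texts from a literal string or a .txt file."""
    p = Path(text_arg)
    if text_arg.endswith(".txt") and p.is_file():
        # One editing text per line; blank lines are skipped.
        return [line.strip() for line in p.read_text().splitlines() if line.strip()]
    return [text_arg]
```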
Removal:
python inference/inference_wild.py \
--image_path media/cone.jpg \
--text "Remove the entire bottom base" \
--stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_remove.pt \
--stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/single_image \
--num_seeds 1
Texture:
python inference/inference_wild.py \
--image_path media/cone.jpg \
--text "Turn the entire cone into a metallic silver texture" \
--stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
--stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/single_image \
--texture_only \
--num_seeds 1
Addition:
python inference/inference_wild.py \
--image_path media/cone.jpg \
--text "Add a cap shaped light on top of the cone" \
--stage1_controlnet_checkpoint [path-to-checkpoints]/stage1/controlnet_add.pt \
--stage1_base_checkpoint [path-to-checkpoints]/stage1/base.pt \
--stage1_config configs/stage1_controlnet.json \
--stage2_controlnet_checkpoint [path-to-checkpoints]/stage2/controlnet.pt \
--stage2_base_checkpoint [path-to-checkpoints]/stage2/base.pt \
--stage2_config configs/stage2_controlnet.json \
--output_dir visualizations/single_image \
--num_seeds 1
Training
Please first set PYTHONPATH=[path to Steer3D].
Flow-matching training of stage 1
python trainers/train_stage1.py --config configs/stage1_controlnet.json --output_dir outputs/stage1
Flow-matching training of stage 2
python trainers/train_stage2.py --config configs/stage2_controlnet.json --output_dir outputs/stage2
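As a toy illustration of the flow-matching objective used in these training stages: the model is trained to predict the velocity of a straight-line path from a source sample to a target sample. This is a conditional flow-matching sketch on plain vectors, not the actual trainers, which operate on latent 3D representations.

```python
def lerp(x0, x1, t):
    # Linear interpolation x_t = (1 - t) * x0 + t * x1 along the probability path.
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def flow_matching_loss(x0, x1, t, velocity_fn):
    # Rectified-flow target: the model should predict the constant velocity x1 - x0.
    xt = lerp(x0, x1, t)
    v_pred = velocity_fn(xt, t)
    target = [b - a for a, b in zip(x0, x1)]
    return sum((vp - vt) ** 2 for vp, vt in zip(v_pred, target)) / len(x0)
```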
DPO of stage 2
python trainers/train_stage2.py --config configs/stage2_dpo.json --output_dir outputs/stage2dpo
Finetuning stage 1 backbone
python trainers/train_stage1_backbone.py \
--config configs/stage1_base_sft.json \
--output_dir outputs/stage1_sft
Citing <a name="citing"></a>
Please use the following BibTeX entry if you find our work helpful!
@misc{ma2025feedforward3deditingtextsteerable,
  title={Feedforward 3D Editing via Text-Steerable Image-to-3D},
  author={Ziqi Ma and Hongqiao Chen and Yisong Yue and Georgia Gkioxari},
  year={2025}
}