TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

TripoSG is an image-to-3D generation foundation model designed for high fidelity, high quality, and strong generalization. It combines a large-scale rectified flow transformer, hybrid supervised training, and a high-quality dataset to achieve state-of-the-art performance in 3D shape generation.

✨ Key Features

High-Fidelity Generation: Produces meshes with sharp geometric features, fine surface details, and complex structures
Semantic Consistency: Generated shapes accurately reflect input image semantics and appearance
Strong Generalization: Handles diverse input styles including photorealistic images, cartoons, and sketches
Robust Performance: Creates coherent shapes even for challenging inputs with complex topology

🔬 Technical Highlights

Large-Scale Rectified Flow Transformer: Combines RF's linear trajectory modeling with transformer architecture for stable, efficient training
Advanced VAE Architecture: Uses Signed Distance Functions (SDFs) with hybrid supervision combining SDF loss, surface normal guidance, and eikonal loss
High-Quality Dataset: Trained on 2 million meticulously curated Image-SDF pairs, ensuring superior output quality
Efficient Scaling: Implements architecture optimizations for high performance even at smaller model scales
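
To make the "linear trajectory" idea behind rectified flow concrete, here is a minimal NumPy sketch. It is not the TripoSG training code: the time convention (noise at t=0, data at t=1), the function names, and the toy `oracle_velocity` are assumptions chosen for illustration, and the real model predicts velocities with a transformer over latent tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def velocity_target(x0, x1):
    """Regression target: the constant velocity of the straight path."""
    return x1 - x0

def rf_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    return np.mean((predict_velocity(x_t, t) - velocity_target(x0, x1)) ** 2)

def euler_sample(predict_velocity, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * predict_velocity(x, t)
    return x

# Toy setup (hypothetical): every sample maps to the same target point,
# so the ideal velocity field has the closed form (target - x) / (1 - t).
target = np.array([1.0, -2.0, 0.5])

def oracle_velocity(x, t):
    return (target - x) / (1.0 - t)

x0 = rng.standard_normal((4, 3))          # "noise" samples
x1 = np.broadcast_to(target, x0.shape)    # "data" samples
t = rng.uniform(0.0, 0.9, size=(4, 1))    # one time per sample

print("loss with oracle velocity:", rf_loss(oracle_velocity, x0, x1, t))
print("sampling reaches target:", np.allclose(euler_sample(oracle_velocity, x0), x1))
```

Because the target velocity is constant along each straight path, a well-trained model can be sampled with relatively few integration steps, which is the usual argument for the efficiency of rectified flow.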

🔥 Updates

[2025-04] Released TripoSG-scribble, a CFG-distilled, 512-token model for fast shape prototyping from a scribble and a text prompt. An online demo is available.
[2025-03] Released the TripoSG 1.5B-parameter rectified flow model and the VAE trained on 2048 latent tokens, along with inference code and an interactive demo.

🔨 Installation

Clone the repo:

git clone https://github.com/VAST-AI-Research/TripoSG.git
cd TripoSG

Create a conda environment (optional):

conda create -n tripoSG python=3.10
conda activate tripoSG

Install dependencies:

# PyTorch (replace {your-cuda-version} with the tag for your CUDA version, e.g. cu121)
pip install torch torchvision --index-url https://download.pytorch.org/whl/{your-cuda-version}

# other dependencies
pip install -r requirements.txt

💡 Quick Start

Generate a 3D mesh from an image:

python -m scripts.inference_triposg --image-input assets/example_data/hjswed.png --output-path ./output.glb

Limit the number of faces in the output mesh:

python -m scripts.inference_triposg --image-input assets/example_data/hjswed.png --faces 5000 --output-path ./output.glb

Generate a 3D mesh from a scribble and a text prompt:

python -m scripts.inference_triposg_scribble --image-input assets/example_scribble_data/cat_with_wings.png --prompt "a cat with wings" --scribble-conf 0.3 --output-path output.glb
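
If you need to process a folder of images, the image command above can be wrapped in a small script. The following is a sketch, not part of the repository: `build_inference_command` and `run_batch` are hypothetical helpers that only use the `--image-input`, `--output-path`, and `--faces` flags shown above, and they assume you run them from the repository root. Each call starts a new process and reloads the model, so this favors simplicity over speed.

```python
import subprocess
import sys
from pathlib import Path

def build_inference_command(image, output, faces=None):
    """Build the argument list for one scripts.inference_triposg run."""
    cmd = [
        sys.executable, "-m", "scripts.inference_triposg",
        "--image-input", str(image),
        "--output-path", str(output),
    ]
    if faces is not None:
        # Optional cap on the number of faces in the output mesh.
        cmd += ["--faces", str(faces)]
    return cmd

def run_batch(image_dir, output_dir, faces=None):
    """Run inference on every .png in image_dir, writing one .glb per image."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(image_dir).glob("*.png")):
        output = output_dir / f"{image.stem}.glb"
        # check=True stops the batch at the first failing image.
        subprocess.run(build_inference_command(image, output, faces), check=True)

# Example (run from the repository root):
# run_batch("assets/example_data", "outputs", faces=5000)
```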

The required model weights will be automatically downloaded:

TripoSG (image condition) model from VAST-AI/TripoSG → pretrained_weights/TripoSG
TripoSG-scribble (scribble+prompt condition) model from VAST-AI/TripoSG-scribble → pretrained_weights/TripoSG-scribble
RMBG model from briaai/RMBG-1.4 → pretrained_weights/RMBG-1.4
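
If you prefer to fetch the weights ahead of time (for example, before moving to a machine without network access), something like the sketch below may work. It assumes the sources listed above are Hugging Face Hub repositories and that the scripts accept weights already present in the target folders; `WEIGHTS`, `missing_weights`, and `prefetch` are hypothetical names, and `huggingface_hub` must be installed separately.

```python
from pathlib import Path

# Source repository -> local folder, as listed above.
WEIGHTS = {
    "VAST-AI/TripoSG": "pretrained_weights/TripoSG",
    "VAST-AI/TripoSG-scribble": "pretrained_weights/TripoSG-scribble",
    "briaai/RMBG-1.4": "pretrained_weights/RMBG-1.4",
}

def missing_weights(root="."):
    """Return the repositories whose local folder does not exist yet."""
    return [repo for repo, rel in WEIGHTS.items()
            if not (Path(root) / rel).is_dir()]

def prefetch(root="."):
    """Download every missing repository into its expected folder."""
    # Imported lazily so the helpers above work without the package.
    from huggingface_hub import snapshot_download
    for repo_id in missing_weights(root):
        snapshot_download(repo_id=repo_id,
                          local_dir=str(Path(root) / WEIGHTS[repo_id]))

# Example (run from the repository root):
# prefetch()
```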

💻 System Requirements

CUDA-enabled GPU with at least 8GB VRAM

📝 Tips

To use the full VAE module (including the encoder), uncomment line 15 in triposg/models/autoencoders/autoencoder_kl_triposg.py, install torch-cluster, and then run:

python -m scripts.inference_vae --surface-input assets/example_data_point/surface_point_demo.npy

🤝 Community & Support

Issues & Discussions: Use GitHub Issues for bug reports and feature requests.
Contributing: We welcome contributions!

📚 Citation

@article{li2025triposg,
  title={TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models},
  author={Li, Yangguang and Zou, Zi-Xin and Liu, Zexiang and Wang, Dehu and Liang, Yuan and Yu, Zhipeng and Liu, Xingchao and Guo, Yuan-Chen and Liang, Ding and Ouyang, Wanli and others},
  journal={arXiv preprint arXiv:2502.06608},
  year={2025}
}

⭐ Acknowledgements

We would like to thank the following open-source projects and research works that made TripoSG possible:

DINOv2 for their powerful visual features
RMBG-1.4 for background removal
🤗 Diffusers for their excellent diffusion model framework
HunyuanDiT for DiT
FlashVDM for their lightning vecset decoder
3DShape2VecSet for 3D shape representation

We are grateful to the broader research community for their open exploration and contributions to the field of 3D generation.
