TRELLIS

Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).


<img src="assets/logo.webp" width="100%" align="center"> <h1 align="center">Structured 3D Latents<br>for Scalable and Versatile 3D Generation</h1> <p align="center"><a href="https://arxiv.org/abs/2412.01506"><img src='https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'></a> <a href='https://microsoft.github.io/TRELLIS/'><img src='https://img.shields.io/badge/Project_Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'></a> <a href='https://huggingface.co/spaces/Microsoft/TRELLIS'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Live_Demo-blue'></a> </p> <p align="center"><img src="assets/teaser.png" width="100%"></p>

<span style="font-size: 16px; font-weight: 600;">T</span><span style="font-size: 12px; font-weight: 700;">RELLIS</span> is a large 3D asset generation model. It takes in text or image prompts and generates high-quality 3D assets in various formats, such as Radiance Fields, 3D Gaussians, and meshes. The cornerstone of <span style="font-size: 16px; font-weight: 600;">T</span><span style="font-size: 12px; font-weight: 700;">RELLIS</span> is a unified Structured LATent (<span style="font-size: 16px; font-weight: 600;">SL</span><span style="font-size: 12px; font-weight: 700;">AT</span>) representation that allows decoding to different output formats and Rectified Flow Transformers tailored for <span style="font-size: 16px; font-weight: 600;">SL</span><span style="font-size: 12px; font-weight: 700;">AT</span> as the powerful backbones. We provide large-scale pre-trained models with up to 2 billion parameters on a large 3D asset dataset of 500K diverse objects. <span style="font-size: 16px; font-weight: 600;">T</span><span style="font-size: 12px; font-weight: 700;">RELLIS</span> significantly surpasses existing methods, including recent ones at similar scales, and showcases flexible output format selection and local 3D editing capabilities which were not offered by previous models.

Check out our Project Page for more videos and interactive demos!

<!-- Features -->

🌟 Features

  • High Quality: It produces diverse 3D assets at high quality with intricate shape and texture details.
  • Versatility: It takes text or image prompts and can generate various final 3D representations including but not limited to Radiance Fields, 3D Gaussians, and meshes, accommodating diverse downstream requirements.
  • Flexible Editing: It allows easy editing of generated 3D assets, such as generating variants of the same object or locally editing parts of the 3D asset.
<!-- Updates -->

⏩ Updates

03/25/2025

  • Release training code.
  • Release TRELLIS-text models and asset variants generation.
    • Examples are provided as example_text.py and example_variant.py.
    • Gradio demo is provided as app_text.py.
    • Note: It is always recommended to perform text-to-3D generation in two stages: first generate an image with a text-to-image model, then use the TRELLIS-image models for 3D generation. The text-conditioned models are less creative and less detailed due to data limitations.
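
The recommended two-stage route can be sketched as follows. This is a hedged sketch, not code from the repo: it assumes the `diffusers` library for the text-to-image stage, and the checkpoint name and prompt are illustrative, not part of TRELLIS.

```python
# Sketch: two-stage text -> image -> 3D, per the recommendation above.
# Assumes the `diffusers` package; "stabilityai/stable-diffusion-2-1" and the
# prompt are illustrative placeholders, not part of the TRELLIS repo.
import torch
from diffusers import StableDiffusionPipeline
from trellis.pipelines import TrellisImageTo3DPipeline

# Stage 1: text -> image with any text-to-image model of your choice.
t2i = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
image = t2i("a weathered bronze dragon statue, studio lighting").images[0]

# Stage 2: image -> 3D with the TRELLIS-image model.
pipeline = TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large")
pipeline.cuda()
outputs = pipeline.run(image, seed=1)  # same output dictionary as the image example
```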

12/26/2024

  • Release TRELLIS-500K dataset and toolkits for data preparation.

12/18/2024

  • Implementation of multi-image conditioning for the TRELLIS-image model. (#7). This is based on a tuning-free algorithm that does not require training a specialized model, so it may not give the best results for all input images.
  • Add Gaussian export in app.py and example.py. (#40)
<!-- Installation -->

📦 Installation

Prerequisites

  • System: The code is currently tested only on Linux. For Windows setup, you may refer to #3 (not fully tested).
  • Hardware: An NVIDIA GPU with at least 16GB of memory is necessary. The code has been verified on NVIDIA A100 and A6000 GPUs.
  • Software:
    • The CUDA Toolkit is needed to compile certain submodules. The code has been tested with CUDA versions 11.8 and 12.2.
    • Conda is recommended for managing dependencies.
    • Python version 3.8 or higher is required.

Installation Steps

  1. Clone the repo:

    git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git
    cd TRELLIS
    
  2. Install the dependencies:

    Before running the following command, there are some things to note:

    • By adding --new-env, a new conda environment named trellis will be created. If you want to use an existing conda environment, please remove this flag.
    • By default the trellis environment will use pytorch 2.4.0 with CUDA 11.8. If you want to use a different version of CUDA (e.g., if you have CUDA Toolkit 12.2 installed and do not want to install another 11.8 version for submodule compilation), you can remove the --new-env flag and manually install the required dependencies. Refer to PyTorch for the installation command.
    • If you have multiple CUDA Toolkit versions installed, PATH should be set to the correct version before running the command. For example, if you have CUDA Toolkit 11.8 and 12.2 installed, you should run export PATH=/usr/local/cuda-11.8/bin:$PATH before running the command.
    • By default, the code uses the flash-attn backend for attention. For GPUs that do not support flash-attn (e.g., NVIDIA V100), you can remove the --flash-attn flag to install xformers only, and set the ATTN_BACKEND environment variable to xformers before running the code. See the Minimal Example for more details.
    • The installation may take a while due to the large number of dependencies. Please be patient. If you encounter any issues, you can try to install the dependencies one by one, specifying one flag at a time.
    • If you encounter any issues during the installation, feel free to open an issue or contact us.
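
The backend selection mentioned above is done via environment variables, which must be set before `trellis` is imported. The variable names and values come from the Minimal Example below; a minimal sketch:

```python
import os

# These must be set BEFORE `import trellis` (or any trellis submodule),
# since the backends are chosen at import time.
os.environ['ATTN_BACKEND'] = 'xformers'   # fallback for GPUs without flash-attn (e.g., V100)
os.environ['SPCONV_ALGO'] = 'native'      # skip spconv's startup benchmarking

# import trellis  # safe to import only after the variables are set
```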

    Create a new conda environment named trellis and install the dependencies:

    . ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast
    

    The detailed usage of setup.sh can be found by running . ./setup.sh --help.

    Usage: setup.sh [OPTIONS]
    Options:
        -h, --help              Display this help message
        --new-env               Create a new conda environment
        --basic                 Install basic dependencies
        --train                 Install training dependencies
        --xformers              Install xformers
        --flash-attn            Install flash-attn
        --diffoctreerast        Install diffoctreerast
        --spconv                Install spconv
        --mipgaussian           Install mip-splatting
        --kaolin                Install kaolin
        --nvdiffrast            Install nvdiffrast
        --demo                  Install all dependencies for demo
    
<!-- Pretrained Models -->

🤖 Pretrained Models

We provide the following pretrained models:

| Model | Description | #Params | Download |
| --- | --- | --- | --- |
| TRELLIS-image-large | Large image-to-3D model | 1.2B | Download |
| TRELLIS-text-base | Base text-to-3D model | 342M | Download |
| TRELLIS-text-large | Large text-to-3D model | 1.1B | Download |
| TRELLIS-text-xlarge | Extra-large text-to-3D model | 2.0B | Download |

Note: It is always recommended to use the image-conditioned versions of the models for better performance.

Note: All VAEs are included in TRELLIS-image-large model repo.

The models are hosted on Hugging Face. You can directly load the models with their repository names in the code:

TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large")

If you prefer loading the model from a local folder, you can download the model files from the links above and load the model with the folder path (the folder structure should be maintained):

TrellisImageTo3DPipeline.from_pretrained("/path/to/TRELLIS-image-large")
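
For scripted downloads, one option (our suggestion, not from the README) is `snapshot_download` from the `huggingface_hub` package, which fetches the whole repository while preserving the folder structure:

```python
# Sketch: download the model repo to a local cache, then load from the folder path.
# Assumes the `huggingface_hub` package is installed; this downloads several GB.
from huggingface_hub import snapshot_download
from trellis.pipelines import TrellisImageTo3DPipeline

local_dir = snapshot_download("microsoft/TRELLIS-image-large")  # keeps repo structure
pipeline = TrellisImageTo3DPipeline.from_pretrained(local_dir)
```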
<!-- Usage -->

💡 Usage

Minimal Example

Here is an example of how to use the pretrained models for 3D asset generation.

import os
# os.environ['ATTN_BACKEND'] = 'xformers'   # Can be 'flash-attn' or 'xformers', default is 'flash-attn'
os.environ['SPCONV_ALGO'] = 'native'        # Can be 'native' or 'auto', default is 'auto'.
                                            # 'auto' is faster but will do benchmarking at the beginning.
                                            # Recommended to set to 'native' if run only once.

import imageio
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import render_utils, postprocessing_utils

# Load a pipeline from a model folder or a Hugging Face model hub.
pipeline = TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large")
pipeline.cuda()

# Load an image
image = Image.open("assets/example_image/T.png")

# Run the pipeline
outputs = pipeline.run(
    image,
    seed=1,
    # Optional parameters
    # sparse_structure_sampler_params={
    #     "steps": 12,
    #     "cfg_strength": 7.5,
    # },
    # slat_sampler_params={
    #     "steps": 12,
    #     "cfg_strength": 3,
    # },
)
# outputs is a dictionary containing generated 3D assets in different formats:
# - outputs['gaussian']: a list of 3D Gaussians
# - outputs['radiance_field']: a list of radiance fields
# - outputs['mesh']: a list of meshes
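
The example above can be continued with rendering and export steps using the `render_utils` and `postprocessing_utils` helpers already imported. This is a hedged sketch of that continuation (it assumes the `outputs` dictionary from the pipeline run above; parameter values are illustrative):

```python
# Continuation sketch: render turntable videos of each representation.
video = render_utils.render_video(outputs['gaussian'][0])['color']
imageio.mimsave("sample_gs.mp4", video, fps=30)
video = render_utils.render_video(outputs['radiance_field'][0])['color']
imageio.mimsave("sample_rf.mp4", video, fps=30)
video = render_utils.render_video(outputs['mesh'][0])['normal']
imageio.mimsave("sample_mesh.mp4", video, fps=30)

# Extract a textured GLB from the Gaussians and mesh.
glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0],
    outputs['mesh'][0],
    simplify=0.95,      # fraction of triangles removed during simplification
    texture_size=1024,  # resolution of the baked texture
)
glb.export("sample.glb")

# Save the Gaussians as a PLY file.
outputs['gaussian'][0].save_ply("sample.ply")
```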
