[ICLR 2025] DiffSplat

<h4 align="center">

DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation

Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu

arXiv Project page Model

<p> <img width="144" src="./assets/_demo/1.gif"> <img width="144" src="./assets/_demo/2.gif"> <img width="144" src="./assets/_demo/3.gif"> <img width="144" src="./assets/_demo/4.gif"> <img width="144" src="./assets/_demo/5.gif"> </p> <p> <img width="144" src="./assets/_demo/6.gif"> <img width="144" src="./assets/_demo/7.gif"> <img width="144" src="./assets/_demo/8.gif"> <img width="144" src="./assets/_demo/9.gif"> <img width="144" src="./assets/_demo/10.gif"> </p> <p> <img width="730" src="./assets/_demo/overview.png"> </p> </h4>

This repository contains the official implementation of the paper DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation, accepted to ICLR 2025. DiffSplat is a generative framework that synthesizes 3D Gaussian Splats from text prompts and single-view images in 1~2 seconds. It is fine-tuned directly from a pretrained text-to-image diffusion model.

Feel free to contact me (chenguolin@stu.pku.edu.cn) or open an issue if you have any questions or suggestions.

🔥 See Also

You may also be interested in our other works:

📢 News

  • 2025-03-06: Training instructions for DiffSplat and ControlNet are provided.
  • 2025-02-11: Training instructions for GSRecon and GSVAE are provided.
  • 2025-02-02: Inference instructions (text-conditioned & image-conditioned & controlnet) are provided.
  • 2025-01-29: The source code and pretrained models are released. Happy 🐍 Chinese New Year 🎆!
  • 2025-01-22: DiffSplat is accepted to ICLR 2025.

📋 TODO

  • [x] Provide detailed instructions for inference.
  • [x] Provide detailed instructions for GSRecon & GSVAE training.
  • [x] Provide detailed instructions for DiffSplat training.

🔧 Installation

You may need to modify the specific version of torch in settings/setup.sh according to your CUDA version. There are no restrictions on the torch version; feel free to use your preferred one.

git clone https://github.com/chenguolin/DiffSplat.git
cd DiffSplat
bash settings/setup.sh
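For instance, the torch line in settings/setup.sh could be adapted along these lines. This is a sketch, not project code: the CUDA-to-wheel mapping below is illustrative, so check the official PyTorch install matrix for the versions that match your driver.

```shell
# Pick a matching PyTorch wheel index for your CUDA version.
# CUDA_VERSION can be read from `nvidia-smi` or `nvcc --version`.
CUDA_VERSION="12.1"
case "${CUDA_VERSION}" in
  11.8) TORCH_INDEX="https://download.pytorch.org/whl/cu118" ;;
  12.1) TORCH_INDEX="https://download.pytorch.org/whl/cu121" ;;
  *)    TORCH_INDEX="https://download.pytorch.org/whl/cpu" ;;  # CPU-only fallback
esac
echo "pip install torch --index-url ${TORCH_INDEX}"
```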

📊 Dataset

  • We use G-Objaverse with about 265K 3D objects and 10.6M rendered images (265K x 40 views, including RGB, normal and depth maps) for GSRecon and GSVAE training. Its subset with about 83K 3D objects provided by LGM is used for DiffSplat training. Their text descriptions are provided by the latest version of Cap3D (i.e., refined by DiffuRank).
  • We find that data filtering is crucial for the generation quality of DiffSplat, while a larger dataset is beneficial for the performance of GSRecon and GSVAE.
  • We store the dataset in an internal HDFS cluster in this project. Thus, the training code can NOT be directly run on your local machine. Please implement your own dataloading logic referring to our provided dataset & dataloader code.
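As a starting point for that custom dataloading logic, here is a minimal local stand-in for the HDFS loader. Everything here is an assumption for illustration: the class name, and especially the directory layout (`<root>/<obj_id>/<view:05d>_{rgb,normal,depth}.png`), which should be adapted to however you store your G-Objaverse renderings.

```python
from pathlib import Path

class MultiViewIndex:
    """Index per-object, per-view RGB / normal / depth file paths on local disk.

    Hypothetical layout: <root>/<obj_id>/<view:05d>_{rgb,normal,depth}.png
    """

    def __init__(self, root, obj_ids, num_views=40):
        self.root = Path(root)
        self.obj_ids = list(obj_ids)
        self.num_views = num_views

    def __len__(self):
        return len(self.obj_ids)

    def __getitem__(self, idx):
        obj_dir = self.root / self.obj_ids[idx]
        # Collect the three rendered modalities for every view of this object.
        views = [
            {
                "rgb": obj_dir / f"{v:05d}_rgb.png",
                "normal": obj_dir / f"{v:05d}_normal.png",
                "depth": obj_dir / f"{v:05d}_depth.png",
            }
            for v in range(self.num_views)
        ]
        return {"obj_id": self.obj_ids[idx], "views": views}

index = MultiViewIndex("/data/gobjaverse", ["000-001"], num_views=4)
sample = index[0]
```

Wrapping such an index in a `torch.utils.data.Dataset` that loads and stacks the images is then straightforward.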

🚀 Usage

📷 Camera Conventions

The camera and world coordinate systems in this project are both defined in the OpenGL convention, i.e., X: right, Y: up, Z: backward. The camera is located at (0, 0, 1.4) in the world coordinate system, and the camera looks at the origin (0, 0, 0). Please refer to kiuikit camera doc for visualizations of the camera and world coordinate systems.
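As a concrete check of this convention, a minimal look-at helper (a sketch of my own, not project code) reproduces the default pose described above; with the camera at (0, 0, 1.4) looking at the origin, the camera-to-world rotation should come out as the identity.

```python
import numpy as np

def look_at_opengl(eye, target, up=(0.0, 1.0, 0.0)):
    """Camera-to-world pose in the OpenGL convention (X: right, Y: up, Z: backward).

    The camera looks down its -Z axis, i.e. -Z points from `eye` toward `target`.
    """
    eye, target, up = (np.asarray(v, dtype=np.float64) for v in (eye, target, up))
    forward = target - eye
    forward /= np.linalg.norm(forward)              # viewing direction
    z_axis = -forward                               # camera +Z points backward
    x_axis = np.cross(up, z_axis)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)
    c2w = np.eye(4)
    c2w[:3, 0], c2w[:3, 1], c2w[:3, 2], c2w[:3, 3] = x_axis, y_axis, z_axis, eye
    return c2w

# Default camera in this project: at (0, 0, 1.4), looking at the origin.
pose = look_at_opengl((0.0, 0.0, 1.4), (0.0, 0.0, 0.0))
```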

🤗 Pretrained Models

All pretrained models are available at HuggingFace🤗.

| Model Name | Fine-tuned From | #Param. | Link | Note |
|---|---|---|---|---|
| GSRecon | From scratch | 42M | gsrecon_gobj265k_cnp_even4 | Feed-forward reconstruct per-pixel 3DGS from 4-view (RGB, normal, coordinate) maps |
| GSVAE (SD) | SD1.5 VAE | 84M | gsvae_gobj265k_sd | |
| GSVAE (SDXL) | SDXL fp16 VAE | 84M | gsvae_gobj265k_sdxl_fp16 | fp16-fixed SDXL VAE is more robust |
| GSVAE (SD3) | SD3 VAE | 84M | gsvae_gobj265k_sd3 | |
| DiffSplat (SD1.5) | SD1.5 | 0.86B | Text-cond: gsdiff_gobj83k_sd15__render<br> Image-cond: gsdiff_gobj83k_sd15_image__render | Best efficiency |
| DiffSplat (PixArt-Sigma) | PixArt-Sigma | 0.61B | Text-cond: gsdiff_gobj83k_pas_fp16__render<br> Image-cond: gsdiff_gobj83k_pas_fp16_image__render | Best trade-off |
| DiffSplat (SD3.5m) | SD3.5 medium | 2.24B | Text-cond: gsdiff_gobj83k_sd35m__render<br> Image-cond: gsdiff_gobj83k_sd35m_image__render | Best performance |
| DiffSplat ControlNet (SD1.5) | From scratch | 361M | Depth: gsdiff_gobj83k_sd15__render__depth<br> Normal: gsdiff_gobj83k_sd15__render__normal<br> Canny: gsdiff_gobj83k_sd15__render__canny | |
| (Optional) ElevEst | dinov2_vitb14_reg | 86M | elevest_gobj265k_b_C25 | (Optional) Single-view image elevation estimation |

⚡ Inference

0. Download Pretrained Models

Note that:

  • Pretrained weights will be downloaded from HuggingFace and stored in ./out.
  • Other pretrained models (such as CLIP, T5, image VAE, etc.) will be downloaded automatically and stored in your HuggingFace cache directory.
  • If you face problems in visiting HuggingFace Hub, you can try to set the environment variable export HF_ENDPOINT=https://hf-mirror.com.
  • GSRecon pretrained weights are NOT actually used during inference; only its rendering function is used for visualization.
python3 download_ckpt.py --model_type [MODEL_TYPE] [--image_cond]

# `MODEL_TYPE`: choose from "sd15", "pas", "sd35m", "depth", "normal", "canny", "elevest"
# `--image_cond`: add this flag for downloading image-conditioned models

For example, to download the text-cond SD1.5-based DiffSplat:

python3 download_ckpt.py --model_type sd15

To download the image-cond PixArt-Sigma-based DiffSplat:

python3 download_ckpt.py --model_type pas --image_cond

1. Text-conditioned 3D Object Generation

Note that:

  • Model differences may not be significant for simple text prompts. We recommend using DiffSplat (SD1.5) for better efficiency, DiffSplat (SD3.5m) for better performance, and DiffSplat (PixArt-Sigma) for a better trade-off.
  • By default, export HF_HOME=~/.cache/huggingface and export TORCH_HOME=~/.cache/torch; you can change these paths in scripts/infer.sh. SD3-related models require a HuggingFace token for downloading, which is expected to be stored in HF_HOME.
  • Outputs will be stored in ./out/<MODEL_NAME>/inference.
  • Prompt is specified by --prompt (e.g.