MTN: Progressive Text-to-3D Generation for Automatic 3D Prototyping (ACM TOMM)
MTN (Multi-Scale Triplane Network)
This repository contains the official implementation of Progressive Text-to-3D Generation for Automatic 3D Prototyping (https://arxiv.org/abs/2309.14600).
Video results
https://github.com/Texaser/MTN/assets/50570271/bdc776a6-ee2d-43ff-9ee3-21784799d3cb
https://github.com/Texaser/MTN/assets/50570271/197fa808-154b-4671-8446-8350b1e166d6
For more videos, please refer to https://www.youtube.com/watch?v=LH6-wKg30FQ
Instructions:
- Make sure cuda-toolkit is exported correctly:
nvcc -V
The output should look like:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
- Install the requirements:
conda create --name MTN python=3.9
conda activate MTN
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
conda install -c conda-forge gcc=11.2.0 gxx=11.2.0
git clone https://github.com/Texaser/MTN.git
cd MTN
pip install -r requirements.txt --no-build-isolation
If compilation fails, reinstall torch so that its CUDA version matches your nvcc version (installation instructions for previous torch versions: https://pytorch.org/get-started/previous-versions/).
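As a quick sanity check before building, you can compare the toolkit version reported by `nvcc -V` against the CUDA version torch was built with. The snippet below is an illustrative sketch: it only parses the `nvcc` output format shown above, and the torch-side comparison is left as a comment since it requires torch to be installed.

```python
import re

def cuda_release(nvcc_output: str) -> str:
    """Extract the 'release X.Y' version from `nvcc -V` output."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    if not m:
        raise ValueError("could not find a CUDA release in nvcc output")
    return m.group(1)

# example, using the sample nvcc output above:
print(cuda_release("Cuda compilation tools, release 12.8, V12.8.93"))  # -> 12.8
# the version torch was built against is available as torch.version.cuda:
#   python -c "import torch; print(torch.version.cuda)"   # e.g. 11.7
```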
To use DeepFloyd-IF, you need to accept the usage conditions on Hugging Face and log in with `huggingface-cli login` on the command line.
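One quick way to verify that the login succeeded is to look for the saved access token file. The default path below is an assumption (recent `huggingface_hub` versions write the token to `~/.cache/huggingface/token` after `huggingface-cli login`), so treat this as a sketch rather than an official check:

```python
from pathlib import Path

def has_hf_token(token_path=None):
    """Return True if a non-empty Hugging Face token file exists.

    The default location is an assumption: recent huggingface_hub
    versions store the token at ~/.cache/huggingface/token.
    """
    if token_path is None:
        token_path = Path.home() / ".cache" / "huggingface" / "token"
    return token_path.is_file() and token_path.read_text().strip() != ""
```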
- Start training!
# choose stable-diffusion version
python main.py --text "a rabbit, animated movie character, high detail 3d model" --workspace trial -O --sd_version 2.1
# use DeepFloyd-IF for guidance:
python main.py --text "a rabbit, animated movie character, high detail 3d model" --workspace trial -O --IF
python main.py --text "a rabbit, animated movie character, high detail 3d model" --workspace trial -O --IF --vram_O # requires ~24G GPU memory
python main.py -O --text "a rabbit, animated movie character, high detail 3d model" --workspace trial_perpneg_if_rabbit --iters 6000 --IF --batch_size 1 --perpneg
python main.py -O --text "a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes" --workspace trial_perpneg_if_bunny --iters 6000 --IF --batch_size 1 --perpneg
python main.py -O --text "A high quality photo of a toy motorcycle" --workspace trial_perpneg_if_motorcycle --iters 6000 --IF --batch_size 1 --perpneg
# a larger absolute value of negative_w is used for the following command because the default negative weight of -2 is not enough to make the diffusion model produce the desired views
python main.py -O --text "a DSLR photo of a tiger dressed as a doctor" --workspace trial_perpneg_if_tiger --iters 6000 --IF --batch_size 1 --perpneg --negative_w -3.0
# after the training is finished:
# test (exporting 360 degree video)
python main.py --workspace trial -O --test
# also save a mesh (with obj, mtl, and png texture)
python main.py --workspace trial -O --test --save_mesh
# test with a GUI (free view control!)
python main.py --workspace trial -O --test --gui
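The commands above all follow the same pattern. Purely as a convenience, the flags can be assembled programmatically; the helper below is hypothetical (not part of the MTN code base) and only mirrors the command lines documented in this README:

```python
import shlex

def mtn_command(text=None, workspace="trial", iters=None, use_if=False,
                perpneg=False, negative_w=None, test=False,
                save_mesh=False, gui=False):
    """Build a main.py invocation from the flags shown above.

    Hypothetical helper: it only reproduces the command-line patterns
    documented in this README.
    """
    args = ["python", "main.py", "-O", "--workspace", workspace]
    if text is not None:
        args += ["--text", text]
    if iters is not None:
        args += ["--iters", str(iters)]
    if use_if:
        args.append("--IF")
    if perpneg:
        args.append("--perpneg")
    if negative_w is not None:
        args += ["--negative_w", str(negative_w)]
    if test:
        args.append("--test")
    if save_mesh:
        args.append("--save_mesh")
    if gui:
        args.append("--gui")
    return shlex.join(args)

# e.g. the toy-motorcycle run from above:
print(mtn_command(text="A high quality photo of a toy motorcycle",
                  workspace="trial_perpneg_if_motorcycle",
                  iters=6000, use_if=True, perpneg=True))
```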
Tested environments
- python 3.9 & torch 1.13 & CUDA 11.5 on a V100.
- python 3.9 & torch 1.13 & CUDA 11.7 on a 3090/4090.
Tips
The training process can sometimes be unstable due to the original code pipeline (Stable DreamFusion). In such cases, try adjusting the learning rate to 3e-4 or 5e-4; a learning rate of 1e-5 is too small for the model to converge effectively. If the model still fails, consider using a different prompt or a different random seed.
Known Issue: CUDA version mismatch when building nvdiffrast
In some environments, the build process of nvdiffrast may fail due to a strict CUDA version check when the detected system CUDA Toolkit version (e.g., CUDA 12.8) does not match the CUDA version PyTorch was compiled against (e.g., CUDA 11.7).
A temporary workaround that has been used by some developers (not recommended as a permanent solution) is to bypass the CUDA version check inside PyTorch.
Locate the cpp_extension.py file in your PyTorch installation, usually at: ~/miniconda3/envs/<your_env_name>/lib/python3.9/site-packages/torch/utils/cpp_extension.py
You can confirm the path by running:
python -c "import torch; print(torch.__file__)"
Open the file and find the function _check_cuda_version (typically around lines 380–400). Locate the line that raises the error:
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
Replace it with a no-op statement, for example:
# raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
print("Warning: CUDA version check temporarily bypassed (build only)")
# or simply:
# pass
Save the file and retry the installation:
pip install -r requirements.txt --no-build-isolation
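The manual edit above can also be scripted. The sketch below comments out the offending line in the text of `cpp_extension.py`; it assumes the raise statement matches the line quoted above, and is offered as an illustration, not a supported patch (always keep a backup of the original file before writing the result back):

```python
def bypass_cuda_check(source: str) -> str:
    """Comment out the CUDA-mismatch raise in cpp_extension.py source text.

    Assumes the raise line looks like the one quoted above.
    """
    patched = []
    for line in source.splitlines():
        if "raise RuntimeError(CUDA_MISMATCH_MESSAGE" in line:
            indent = line[: len(line) - len(line.lstrip())]
            # keep the original line as a comment, then emit the warning print
            patched.append(indent + "# " + line.lstrip())
            patched.append(indent + 'print("Warning: CUDA version check '
                           'temporarily bypassed (build only)")')
        else:
            patched.append(line)
    return "\n".join(patched)
```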
Star History
If you like this code, please give it a star~ <img width="749" alt="image" src="https://github.com/user-attachments/assets/edcaf836-ac45-49f9-9a64-59fbf599a28d" />
Citation
If you find this work useful, please consider citing:
@article{yi2026progressive,
title={Progressive Text-to-3D Generation for Automatic 3D Prototyping},
author={Yi, Han and Zheng, Zhedong and Xu, Xiangyu and Chua, Tat-seng},
journal={ACM TOMM},
year={2026}
}
Acknowledgement
This code base is built upon the following awesome open-source projects: Stable DreamFusion, threestudio
Thanks to the authors for their remarkable work!