MTN: Progressive Text-to-3D Generation for Automatic 3D Prototyping (ACM TOMM)
MTN (Multi-Scale Triplane Network)
This repository contains the official implementation of Progressive Text-to-3D Generation for Automatic 3D Prototyping (https://arxiv.org/abs/2309.14600).
Video results
https://github.com/Texaser/MTN/assets/50570271/bdc776a6-ee2d-43ff-9ee3-21784799d3cb
https://github.com/Texaser/MTN/assets/50570271/197fa808-154b-4671-8446-8350b1e166d6
For more videos, please refer to https://www.youtube.com/watch?v=LH6-wKg30FQ
Instructions:
- Make sure cuda-toolkit is exported correctly:
nvcc -V
The output should look like:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
- Install the requirements:
conda create --name MTN python=3.9
conda activate MTN
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
conda install -c conda-forge gcc=11.2.0 gxx=11.2.0
git clone https://github.com/Texaser/MTN.git
cd MTN
pip install -r requirements.txt --no-build-isolation
If compilation fails, reinstall torch so that its CUDA version matches your nvcc version (installation instructions for previous torch versions: https://pytorch.org/get-started/previous-versions/).
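As a quick sanity check before building, you can compare the toolkit version reported by `nvcc -V` against the CUDA version torch was built with. The snippet below is an illustrative sketch: it only parses the `nvcc` output format shown above, and the torch-side comparison is left as a comment since it requires torch to be installed.

```python
import re

def cuda_release(nvcc_output: str) -> str:
    """Extract the 'release X.Y' version from `nvcc -V` output."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    if not m:
        raise ValueError("could not find a CUDA release in nvcc output")
    return m.group(1)

# example, using the sample nvcc output above:
print(cuda_release("Cuda compilation tools, release 12.8, V12.8.93"))  # -> 12.8
# the version torch was built against is available as torch.version.cuda:
#   python -c "import torch; print(torch.version.cuda)"   # e.g. 11.7
```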
To use DeepFloyd-IF, you need to accept the usage conditions on Hugging Face and log in with `huggingface-cli login` on the command line.
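One quick way to verify that the login succeeded is to look for the saved access token file. The default path below is an assumption (recent `huggingface_hub` versions write the token to `~/.cache/huggingface/token` after `huggingface-cli login`), so treat this as a sketch rather than an official check:

```python
from pathlib import Path

def has_hf_token(token_path=None):
    """Return True if a non-empty Hugging Face token file exists.

    The default location is an assumption: recent huggingface_hub
    versions store the token at ~/.cache/huggingface/token.
    """
    if token_path is None:
        token_path = Path.home() / ".cache" / "huggingface" / "token"
    return token_path.is_file() and token_path.read_text().strip() != ""
```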
- Start training!
# choose stable-diffusion version
python main.py --text "a rabbit, animated movie character, high detail 3d model" --workspace trial -O --sd_version 2.1
# use DeepFloyd-IF for guidance:
python main.py --text "a rabbit, animated movie character, high detail 3d model" --workspace trial -O --IF
python main.py --text "a rabbit, animated movie character, high detail 3d model" --workspace trial -O --IF --vram_O # requires ~24G GPU memory
python main.py -O --text "a rabbit, animated movie character, high detail 3d model" --workspace trial_perpneg_if_rabbit --iters 6000 --IF --batch_size 1 --perpneg
python main.py -O --text "a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes" --workspace trial_perpneg_if_bunny --iters 6000 --IF --batch_size 1 --perpneg
python main.py -O --text "A high quality photo of a toy motorcycle" --workspace trial_perpneg_if_motorcycle --iters 6000 --IF --batch_size 1 --perpneg
# a larger absolute value of negative_w is used for the following command because the default negative weight of -2 is not enough to make the diffusion model produce the desired views
python main.py -O --text "a DSLR photo of a tiger dressed as a doctor" --workspace trial_perpneg_if_tiger --iters 6000 --IF --batch_size 1 --perpneg --negative_w -3.0
# after the training is finished:
# test (exporting 360 degree video)
python main.py --workspace trial -O --test
# also save a mesh (with obj, mtl, and png texture)
python main.py --workspace trial -O --test --save_mesh
# test with a GUI (free view control!)
python main.py --workspace trial -O --test --gui
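The commands above all follow the same pattern. Purely as a convenience, the flags can be assembled programmatically; the helper below is hypothetical (not part of the MTN code base) and only mirrors the command lines documented in this README:

```python
import shlex

def mtn_command(text=None, workspace="trial", iters=None, use_if=False,
                perpneg=False, negative_w=None, test=False,
                save_mesh=False, gui=False):
    """Build a main.py invocation from the flags shown above.

    Hypothetical helper: it only reproduces the command-line patterns
    documented in this README.
    """
    args = ["python", "main.py", "-O", "--workspace", workspace]
    if text is not None:
        args += ["--text", text]
    if iters is not None:
        args += ["--iters", str(iters)]
    if use_if:
        args.append("--IF")
    if perpneg:
        args.append("--perpneg")
    if negative_w is not None:
        args += ["--negative_w", str(negative_w)]
    if test:
        args.append("--test")
    if save_mesh:
        args.append("--save_mesh")
    if gui:
        args.append("--gui")
    return shlex.join(args)

# e.g. the toy-motorcycle run from above:
print(mtn_command(text="A high quality photo of a toy motorcycle",
                  workspace="trial_perpneg_if_motorcycle",
                  iters=6000, use_if=True, perpneg=True))
```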
Tested environments
- python 3.9 & torch 1.13 & CUDA 11.5 on a V100.
- python 3.9 & torch 1.13 & CUDA 11.7 on a 3090/4090.
Tips
The training process can sometimes be unstable due to the original code pipeline (Stable DreamFusion). In such cases, try adjusting the learning rate to 3e-4 or 5e-4; a learning rate of 1e-5 is too small for the model to converge effectively. If the model still fails, consider using a different prompt or a different random seed.
Known Issue: CUDA version mismatch when building nvdiffrast
In some environments, the build process of nvdiffrast may fail due to a strict CUDA version check when the detected system CUDA Toolkit version (e.g., CUDA 12.8) does not match the CUDA version PyTorch was compiled against (e.g., CUDA 11.7).
A temporary workaround that has been used by some developers (not recommended as a permanent solution) is to bypass the CUDA version check inside PyTorch.
Locate the cpp_extension.py file in your PyTorch installation, usually at: ~/miniconda3/envs/<your_env_name>/lib/python3.9/site-packages/torch/utils/cpp_extension.py
You can confirm the path by running:
python -c "import torch; print(torch.__file__)"
Open the file and find the function _check_cuda_version (typically around lines 380–400). Locate the line that raises the error:
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
Replace it with a no-op statement, for example:
# raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
print("Warning: CUDA version check temporarily bypassed (build only)")
# or simply:
# pass
Save the file and retry the installation:
pip install -r requirements.txt --no-build-isolation
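The manual edit above can also be scripted. The sketch below comments out the offending line in the text of `cpp_extension.py`; it assumes the raise statement matches the line quoted above, and is offered as an illustration, not a supported patch (always keep a backup of the original file before writing the result back):

```python
def bypass_cuda_check(source: str) -> str:
    """Comment out the CUDA-mismatch raise in cpp_extension.py source text.

    Assumes the raise line looks like the one quoted above.
    """
    patched = []
    for line in source.splitlines():
        if "raise RuntimeError(CUDA_MISMATCH_MESSAGE" in line:
            indent = line[: len(line) - len(line.lstrip())]
            # keep the original line as a comment, then emit the warning print
            patched.append(indent + "# " + line.lstrip())
            patched.append(indent + 'print("Warning: CUDA version check '
                           'temporarily bypassed (build only)")')
        else:
            patched.append(line)
    return "\n".join(patched)
```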
Star History
If you like this code, please give it a star~ <img width="749" alt="image" src="https://github.com/user-attachments/assets/edcaf836-ac45-49f9-9a64-59fbf599a28d" />
Citation
If you find this work useful, please consider citing:
@article{yi2026progressive,
title={Progressive Text-to-3D Generation for Automatic 3D Prototyping},
author={Yi, Han and Zheng, Zhedong and Xu, Xiangyu and Chua, Tat-seng},
journal={ACM TOMM},
year={2026}
}
Acknowledgement
This code base is built upon the following awesome open-source projects: Stable DreamFusion, threestudio
Thanks to the authors for their remarkable work!