<div align='center'> <h2><a href="https://mabaorui.github.io/GeoDream_page/">GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation</a></h2>

Baorui Ma1*, Haoge Deng2,1*, Junsheng Zhou3,1, Yu-Shen Liu3, Tiejun Huang1,4, Xinlong Wang1

1BAAI, 2BUPT, 3THU, 4PKU * Equal Contribution

Paper | Project page

</div> We present GeoDream, a 3D generation method that incorporates explicit generalized 3D priors with 2D diffusion priors to enhance the capability of obtaining unambiguous 3D consistent geometric structures without sacrificing diversity or fidelity. Our numerical and visual comparisons demonstrate that GeoDream generates more 3D consistent textured meshes with high-resolution realistic renderings (i.e., 1024 &times 1024) and adheres more closely to semantic coherence. To comprehensively evaluate semantic coherence, to our knowledge, we are the first to propose Uni3D-score metric, lifting the measurement from 2D to 3D. You can find detailed usage instructions for training GeoDream and evaluation code of 3D metric <a href="https://github.com/baaivision/Uni3D">Uni3D</a>-score below.

GeoDream alleviates the Janus problems by incorporating explicit 3D priors with 2D diffusion priors. GeoDream generates consistent multi-view rendered images and rich details textured meshes.

Qualitative comparison with baselines. Back views are highlighted with red rectangles for distinct observation of multiple faces.

News

[1/5/2024] Add support for GeoDream extension of ThreeStudio. The extension implementation can be found <a href="https://github.com/baaivision/GeoDream/tree/threestudio">at the threestudio branch</a>.

[12/20/2023] Add support for Stable-Zero123. Follow the instructions here to give it a try.

[12/2/2023] Code released.

Installation

git clone https://github.com/baaivision/GeoDream.git
cd GeoDream

Due to environmental conflicts between the pre-trained multi-view diffusion for predicting source views and the code for constructing cost volume, we currently have to use two separate virtual environments. This is inconvenient for researchers, and we are working hard to resolve the conflicts as part of our future update plans.

Install for predicting source views

See mv-diffusion/README.md for additional information for install.

Installation for constructing cost volume

See installation.md for additional information, including installation via Docker.

The following steps have been tested on Ubuntu20.04.

You must have an NVIDIA graphics card with at least 24GB VRAM and have CUDA installed.
Install Python >= 3.8.
(Optional, Recommended) Create a virtual environment:

python3 -m virtualenv venv
. venv/bin/activate

# Newer pip versions, e.g. pip-23.x, can be much faster than old versions, e.g. pip-20.x.
# For instance, it caches the wheels of git packages to avoid unnecessarily rebuilding them later.
python3 -m pip install --upgrade pip

Install PyTorch >= 1.12. We have tested on torch1.12.1+cu113 and torch2.0.0+cu118, but other versions should also work fine.

# torch1.12.1+cu113
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# or torch2.0.0+cu118
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

(Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:

pip install ninja

Install dependencies:

pip install -r requirements.txt
pip install inplace_abn
FORCE_CUDA=1 pip install --no-cache-dir git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0

If you are experiencing unstable connections with Hugging Face, we suggest you either (1) setting environment variable TRANSFORMERS_OFFLINE=1 DIFFUSERS_OFFLINE=1 HF_HUB_OFFLINE=1 before your running command after all needed files have been fetched on the first run, to prevent from connecting to Hugging Face each time you run, or (2) downloading the guidance model you used to a local folder following here and here, and set pretrained_model_name_or_path and pretrained_model_name_or_path_lora in configs/geodream-neus.yaml, configs/geodream-dmtet-geometry.yaml, and configs/geodream-dmtet-texture.yaml to the local path.

Quickstart

Two steps required for 3D generation are as follows.

Predict source views and construct cost volume (Choose one of the following two.)
- Driven by a given prompt
- Driven by a given reference view with a prompt
GeoDream Training

Predict source views and construct cost volume

Choose one of the two.

Predict source views and construct cost volume driven by a given reference view with a prompt

conda activate geodream_mv
cd ./mv-diffusion
sh run-volume-by-zero123.sh "An astronaut riding a horse" "ref_imges/demo.png"
# Defaulting to use Zero123. If the generated results are not satisfactory, consider using Stable Zero123.
sh run-volume-by-sd-zero123.sh "An astronaut riding a horse" "ref_imges/demo.png"
conda deactivate
cd ..

Predict source views and construct cost volume driven by a given prompt

conda activate geodream_mv
cd ./mv-diffusion
sh step1-run-mv.sh "An astronaut riding a horse"
conda deactivate
. venv/bin/activate
sh step2-run-volume.sh "An astronaut riding a horse"
cd ..

GeoDream Training

Rendered images obtained from Stage1

Rendered images obtained from Stage2+3

Note: we compress these motion pictures for faster previewing.

conda deactivate
. venv/bin/activate
# --------- Stage 1 (NeuS) --------- #
# object generation with 512x512 NeuS rendering, ~25GB VRAM
python launch.py --config configs/geodream-neus.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse" system.geometry.init_volume_path="mv-diffusion/volume/An_astronaut_riding_a_horse/con_volume_lod_150.pth"
# if you don't have enough VRAM, try training with 64x64 NeuS rendering, ~15GB VRAM
python launch.py --config configs/geodream-neus.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse" system.geometry.init_volume_path="data/con_volume_lod_150.pth" data.width=64 data.height=64
# using the same model for pretrained and LoRA enables 64x64 training with <10GB VRAM
# but the quality is worse due to the use of an epsilon prediction model for LoRA training
python launch.py --config configs/geodream-neus.yaml --train --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse" system.geometry.init_volume_path="data/con_volume_lod_150.pth" data.width=64 data.height=64 system.guidance.pretrained_model_name_or_path_lora="stabilityai/stable-diffusion-2-1-base"

# --------- Stage 2 (DMTet Geometry Refinement) --------- #
# refine geometry
python launch.py --config configs/geodream-dmtet-geometry.yaml --train system.geometry_convert_from=path/to/stage1/trial/dir/ckpts/last.ckpt --gpu 0 system.prompt_processor.prompt="an astronaut riding a horse" system.renderer.context_type=cuda system.geometry_convert_override.isosurface_threshold=0.0

# --------- Stage 3 (DMTet Texturing) --------- #
# texturing with 1024x1024 rasterization, Stable Diffusion VSD guidance, ~20GB VRAM
python launch.py --config configs/geodream-dmtet-texture.yaml system.geometry.isosurface_resolution=256 --train data.batch_size=2 system.renderer.context_type=cuda --gpu 0 system.geometry_convert_from=path/to/stage2/trial/dir/ckpts/last.ckpt system.prompt_processor.prompt="an astronaut riding a horse"
# if you don't have enough VRAM, try training with batch_size=1, ~10GB VRAM
python launch.py --config configs/geodream-dmtet-texture.yaml system.geometry.isosurface_resolution=256 --train data.batch_size=1 system.renderer.context_type=cuda --gpu 0 system.geometry_convert_from=path/to/stage2/trial/dir/ckpts/last.ckpt system.prompt_processor.prompt="an astronaut riding a horse"

We also provide corresponding scripts for researchers to reference: neus-train.sh corresponds to stage 1, mesh-finetuning-geo.sh and mesh-finetuning-texture.sh corresponds to stage 2 and 3.

GeoDream

Install / Use

README