
HeadArtist

[SIGGRAPH'24] Official code of HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation


<p align="center"> <picture> <img alt="headartist" src="docs/HeadArtistLogo.png" width="30%"> </picture> </p>

HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation, SIGGRAPH 2024 (Official Implementation)

Project Page | Paper (ArXiv)

Hongyu Liu<sup>1,2</sup>, Xuan Wang<sup>2</sup>, Ziyu Wan<sup>3</sup>, Yujun Shen<sup>2</sup>, Yibing Song<sup>4</sup>, Jing Liao<sup>3</sup>, Qifeng Chen<sup>1</sup><br> <sup>1</sup>HKUST, <sup>2</sup>Ant Group, <sup>3</sup>City University of Hong Kong, <sup>4</sup>AI<sup>3</sup> Institute, Fudan University

<p align="center"> <img alt="headartist" src="docs/HeadArtistTeaser.gif" width="60%"> </p> <p align="center"><b> 👆 Results obtained from HeadArtist 👆 <br/> </b></p>

Installation

See installation.md for additional information, including installation via Docker.

The following steps have been tested on Ubuntu 20.04.

  • You must have an NVIDIA graphics card with at least 6GB VRAM and have CUDA installed.
  • Install Python >= 3.8.
  • (Optional, Recommended) Create a virtual environment:

```shell
python3 -m virtualenv venv
. venv/bin/activate

# Newer pip versions, e.g. pip-23.x, can be much faster than older versions, e.g. pip-20.x.
# For instance, newer pip caches the wheels of git packages to avoid unnecessarily rebuilding them later.
python3 -m pip install --upgrade pip
```
  • Install PyTorch >= 1.12. We have tested torch1.12.1+cu113 and torch2.0.0+cu118, but other versions should also work fine.

```shell
# torch1.12.1+cu113
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# or torch2.0.0+cu118
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```
  • (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:

```shell
pip install ninja
```

  • Install dependencies:

```shell
pip install -r requirements.txt
```

Training for HeadArtist

If you experience unstable connections to Hugging Face, we suggest you either (1) set the environment variables TRANSFORMERS_OFFLINE=1 DIFFUSERS_OFFLINE=1 HF_HUB_OFFLINE=1 before your run command once all needed files have been fetched on the first run, to avoid connecting to Hugging Face on every run; (2) download the guidance model you use to a local folder following here and here, and set pretrained_model_name_or_path and controlnet_name_or_path of the guidance and the prompt processor to the local path; or (3) if you cannot connect to Hugging Face from within China, use the Hugging Face mirror to download the models.
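For option (1), the offline variables can be set in the shell before running (a minimal sketch; the `launch.py` invocation is the one used throughout this README and is shown commented out):

```shell
# After the first successful run has cached all model files locally,
# force the Hugging Face libraries to use only the local cache:
export TRANSFORMERS_OFFLINE=1
export DIFFUSERS_OFFLINE=1
export HF_HUB_OFFLINE=1

# Subsequent runs will no longer contact Hugging Face:
# python launch.py configs/headartist-geometry.yaml --train --gpu 0 ...
```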

```shell
# Step 1. HeadArtist first generates the geometry. For an unreal-domain head (e.g., the T800 in Terminator),
# set the prompt to "a head of T800 in Terminator" and do not use "photorealistic, flawless face".
python launch.py --config configs/headartist-geometry.yaml --train --gpu 0 system.prompt_processor.prompt="a DSLR portrait of elderly woman with deep wrinkles, wearing a knitted hat, photorealistic, flawless face"
# Step 2. To generate the texture, the default negative prompts in the config are set for the real-domain head.
# For an unreal-domain head, you may need to delete "sketch", "cartoon", or "drawing". Set geometry_convert_from to the output of Step 1.
python launch.py --config configs/headartist-texture.yaml --train --gpu 0 system.prompt_processor.prompt="a DSLR portrait of elderly woman with deep wrinkles, wearing a knitted hat, photorealistic, flawless face" system.geometry_convert_from=path/to/geometry/stage/trial/dir/ckpts/last.ckpt
# Step 3. We refine the texture further with a perceptual loss. We find that 700 additional steps give the best quality,
# so we set trainer.max_steps=20700. Set the config and resume to the output of Step 2.
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt trainer.max_steps=20700 system.guidance.use_perceptual=True system.loss.lambda_ssd=0 system.loss.lambda_perceptual=100
```

threestudio uses OmegaConf for flexible configurations. You can easily change any configuration in the YAML file by specifying arguments without `--`, as with the prompt in the examples above. For all supported configurations, please see our documentation.
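The dot-separated override syntax works roughly like the sketch below (illustrative only; the real parsing and merging is done by OmegaConf, and `apply_override` is a hypothetical helper):

```python
def apply_override(cfg: dict, override: str) -> None:
    """Apply a single 'a.b.c=value' style override to a nested dict, in place."""
    keys, value = override.split("=", 1)          # split off the value once
    *parents, leaf = keys.split(".")              # walk the dotted key path
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})           # create missing levels
    node[leaf] = value

config = {"system": {"prompt_processor": {"prompt": "a DSLR portrait of a man"}}}
apply_override(config, "system.prompt_processor.prompt=a head of T800 in Terminator")
# config["system"]["prompt_processor"]["prompt"] is now the new prompt
```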

The training lasts for 10,000 iterations. You can find visualizations of the current status in the trial directory, which defaults to [exp_root_dir]/[name]/[tag]@[timestamp], where exp_root_dir (outputs/ by default), name, and tag can be set in the configuration file. A 360-degree video is generated after training completes. During training, pressing Ctrl+C once stops training and proceeds directly to the test stage, which generates the video; press Ctrl+C a second time to fully quit the program.
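The default trial-directory layout can be reproduced with a short sketch (the function name and example values are illustrative; threestudio builds this path internally):

```python
from datetime import datetime
from typing import Optional

def trial_dir(exp_root_dir: str, name: str, tag: str,
              timestamp: Optional[str] = None) -> str:
    """Build [exp_root_dir]/[name]/[tag]@[timestamp], the trial output path."""
    timestamp = timestamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    return f"{exp_root_dir}/{name}/{tag}@{timestamp}"

print(trial_dir("outputs", "headartist-geometry", "elderly_woman", "20240101-120000"))
# → outputs/headartist-geometry/elderly_woman@20240101-120000
```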

Resume from checkpoints

If you want to resume from a checkpoint, do:

```shell
# resume training from the last checkpoint; you may replace last.ckpt with any other checkpoint
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
# if the training has completed, you can still continue training for longer by setting trainer.max_steps
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt trainer.max_steps=20000
# you can also perform testing using resumed checkpoints
python launch.py --config path/to/trial/dir/configs/parsed.yaml --test --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
# note that the above commands use parsed configuration files from previous trials,
# which will continue using the same trial directory;
# if you want to save to a new trial directory, replace parsed.yaml with raw.yaml in the command

# only load weights from a saved checkpoint but don't resume training (i.e., don't load optimizer state):
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 system.weights=path/to/trial/dir/ckpts/last.ckpt
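The difference between `resume` and `system.weights` can be illustrated with a simplified checkpoint sketch (the keys mimic a typical PyTorch Lightning checkpoint; the helper functions are hypothetical, not part of the codebase):

```python
checkpoint = {
    "state_dict": {"geometry.encoding": "..."},  # model weights
    "optimizer_states": [{"lr": 0.01}],          # optimizer state
    "global_step": 10000,                        # training progress
}

def load_for_resume(ckpt):
    """resume=...: restore weights, optimizer state, and the step counter."""
    return ckpt["state_dict"], ckpt["optimizer_states"], ckpt["global_step"]

def load_weights_only(ckpt):
    """system.weights=...: restore weights only; training state starts fresh."""
    return ckpt["state_dict"]
```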

Export Meshes

To export the scene to texture meshes, use the --export option. We currently support exporting to obj+mtl, or obj with vertex colors.

```shell
# this uses the default mesh-exporter configuration, which exports obj+mtl
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter
# specify system.exporter.fmt=obj to get obj with vertex colors;
# you may also add system.exporter.save_uv=false to accelerate the process, suitable for a quick peek at the result
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.exporter.fmt=obj
# for NeRF-based methods (DreamFusion, Magic3D coarse, Latent-NeRF, SJC),
# you may need to adjust the isosurface threshold (25 by default) to get satisfying outputs:
# decrease the threshold if the extracted model is incomplete, increase it if it is extruded
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_threshold=10.
# use marching cubes at higher resolutions to get more detailed models
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_method=mc-cpu system.geometry.isosurface_resolution=256
```
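The effect of the isosurface threshold can be seen in a toy density-field sketch (illustrative only; the actual extraction runs marching cubes over the learned 3D density):

```python
def occupied_voxels(density, threshold):
    """Count voxels whose density exceeds the isosurface threshold.

    Lowering the threshold keeps more voxels (a fuller, possibly extruded
    surface); raising it keeps fewer (a tighter, possibly incomplete one).
    """
    return sum(1 for d in density if d > threshold)

field = [5.0, 12.0, 30.0, 80.0, 8.0]       # toy per-voxel densities
assert occupied_voxels(field, 25.0) == 2    # near the default threshold
assert occupied_voxels(field, 10.0) == 3    # lower threshold -> more surface
```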

For all the options you can specify when exporting, see the documentation.

See here for example running commands of all our supported models. Please refer to here for tips on getting higher-quality results, and here for reducing VRAM usage.

Code Structure

Here we briefly introduce the code structure of this project. More detailed documentation will follow in the future.

  • All methods are implemented as subclasses of BaseSystem (in systems/base.py). There are typically six modules inside a system: geometry, material, background, renderer, guidance, and prompt_processor. All modules are subclasses of BaseModule (in utils/base.py) except for guidance and prompt_processor, which are subclasses of BaseObject to prevent them from being treated as model parameters and to better control their behavior in multi-GPU settings.
  • All systems, modules, and data modules have their configurations in their own dataclasses.
  • Base configurations for the whole project can be found in utils/config.py. In the ExperimentConfig dataclass, data, system, and module configurations under system are parsed to configurations of each class mentioned above. These configurations are strictly typed, which means you can only use defined properties in the dataclass and stick to the defined type of each property. This configuration paradigm
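The strictly-typed dataclass configuration style described above can be sketched as follows (the class and field names here are simplified illustrations, and `parse` is a hypothetical helper, not the project's actual implementation):

```python
from dataclasses import dataclass, fields

@dataclass
class PromptProcessorConfig:
    # Only declared fields with declared types are legal configuration keys.
    prompt: str = ""
    negative_prompt: str = ""

def parse(cfg_cls, raw: dict):
    """Reject any key not declared on the dataclass (strict typing)."""
    allowed = {f.name for f in fields(cfg_cls)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return cfg_cls(**raw)

cfg = parse(PromptProcessorConfig, {"prompt": "a DSLR portrait"})
```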