Text2Human

Code for Text2Human (SIGGRAPH 2022). Paper: Text2Human: Text-Driven Controllable Human Image Generation

Text2Human - Official PyTorch Implementation

This repository provides the official PyTorch implementation for the following paper:

Text2Human: Text-Driven Controllable Human Image Generation
Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy and Ziwei Liu
In ACM Transactions on Graphics (Proceedings of SIGGRAPH), 2022.

From MMLab@NTU, affiliated with S-Lab, Nanyang Technological University, and SenseTime Research.

<table> <tr> <td><img src="assets/1.png" width="100%"/></td> <td><img src="assets/2.png" width="100%"/></td> <td><img src="assets/3.png" width="100%"/></td> <td><img src="assets/4.png" width="100%"/></td> </tr> <tr> <td align='center' width='24%'>The lady wears a short-sleeve T-shirt with pure color pattern, and a short and denim skirt.</td> <td align='center' width='24%'>The man wears a long and floral shirt, and long pants with the pure color pattern.</td> <td align='center' width='24%'>A lady is wearing a sleeveless pure-color shirt and long jeans</td> <td align='center' width='24%'>The man wears a short-sleeve T-shirt with the pure color pattern and a short pants with the pure color pattern.</td> </tr> </table>

[Project Page] | [Paper] | [Dataset] | [Demo Video] | [Gradio Web Demo]

Updates

[09/2022] :fire::fire::fire:We have released a high-quality 3D human generative model EVA3D!:fire::fire::fire:
[07/2022] Release the model trained on SHHQ dataset!
[07/2022] Try out the web demo of drawings-to-human!
[06/2022] Integrated into Huggingface Spaces 🤗 using Gradio. Try out the Web Demo:
[05/2022] Paper and demo video are released.
[05/2022] Code is released.
[05/2022] This website is created.

Installation

Clone this repo:

git clone https://github.com/yumingj/Text2Human.git
cd Text2Human

Dependencies:

All dependencies for the environment are listed in environment/text2human_env.yaml. We recommend using Anaconda to manage the Python environment:

conda env create -f ./environment/text2human_env.yaml
conda activate text2human
pip install mmcv-full==1.2.1 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.7.0/index.html
pip install mmsegmentation==0.9.0
conda install -c huggingface tokenizers=0.9.4
conda install -c huggingface transformers=4.0.0
conda install -c conda-forge sentence-transformers=2.0.0

If the commands above fail, you may need to install the following packages manually:

Python 3.6
PyTorch 1.7.1
CUDA 10.1
sentence-transformers 2.0.0
tokenizers 0.9.4
transformers 4.0.0

(1) Dataset Preparation

In this work, we contribute a large-scale high-quality dataset with rich multi-modal annotations named DeepFashion-MultiModal Dataset. Here we pre-processed the raw annotations of the original dataset for the task of text-driven controllable human image generation. The pre-processing pipeline consists of:

align the human body in the center of the images according to the human pose
fuse the clothing color and clothing fabric annotations into one texture annotation
do some annotation cleaning and image filtering
split the whole dataset into the training set and testing set

You can download our processed dataset from this Google Drive. If you want to access the raw annotations, please refer to the DeepFashion-MultiModal Dataset.

After downloading the dataset, unzip the file and put its contents under the datasets folder with the following structure:

./datasets
├── train_images
    ├── xxx.png
    ...
    ├── xxx.png
    └── xxx.png
├── test_images
    % the same structure as in train_images
├── densepose
    % the same structure as in train_images
├── segm
    % the same structure as in train_images
├── shape_ann
    ├── test_ann_file.txt
    ├── train_ann_file.txt
    └── val_ann_file.txt
└── texture_ann
    ├── test
        ├── lower_fused.txt
        ├── outer_fused.txt
        └── upper_fused.txt
    ├── train
        % the same files as in test
    └── val
        % the same files as in test
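
Before sampling or training, it can be useful to verify that the unzipped dataset matches the layout above. The following is a minimal sketch under that assumption; the function names are hypothetical and the check only tests that paths exist, not that the files are valid.

```python
import os

# Folder and file names are taken from the directory tree above.
IMAGE_DIRS = ["train_images", "test_images", "densepose", "segm"]
SHAPE_FILES = ["test_ann_file.txt", "train_ann_file.txt", "val_ann_file.txt"]
TEXTURE_SPLITS = ["test", "train", "val"]
TEXTURE_FILES = ["lower_fused.txt", "outer_fused.txt", "upper_fused.txt"]


def expected_dataset_paths(root="./datasets"):
    """List every folder and annotation file the layout above calls for."""
    paths = [os.path.join(root, d) for d in IMAGE_DIRS]
    paths += [os.path.join(root, "shape_ann", f) for f in SHAPE_FILES]
    for split in TEXTURE_SPLITS:
        paths += [os.path.join(root, "texture_ann", split, f) for f in TEXTURE_FILES]
    return paths


def missing_dataset_paths(root="./datasets", exists=os.path.exists):
    """Return the expected paths that are absent (hypothetical helper)."""
    return [p for p in expected_dataset_paths(root) if not exists(p)]
```

An empty list from `missing_dataset_paths()` means all expected folders and annotation files are in place.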

(2) Sampling

HuggingFace Demo

Full Web Demo

Drawing-to-human

Colab

Unofficial Demo implemented by @neverix.

Pretrained Models

Pretrained models can be downloaded from the model zoo. Unzip the file and put its contents under the pretrained_models folder with the following structure:

pretrained_models
├── index_pred_net.pth
├── parsing_gen.pth
├── parsing_token.pth
├── sampler.pth
├── vqvae_bottom.pth
└── vqvae_top.pth
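
A quick way to see which checkpoints are still missing after unzipping is sketched below. The helper `missing_checkpoints` is hypothetical; it only compares file names against the list above and does not load or validate the weights.

```python
import os

# File names are taken from the pretrained_models listing above.
EXPECTED_CHECKPOINTS = {
    "index_pred_net.pth",
    "parsing_gen.pth",
    "parsing_token.pth",
    "sampler.pth",
    "vqvae_bottom.pth",
    "vqvae_top.pth",
}


def missing_checkpoints(folder="pretrained_models", listdir=os.listdir):
    """Return the sorted names of expected checkpoints not found in `folder`."""
    try:
        present = set(listdir(folder))
    except FileNotFoundError:
        # A missing folder means every checkpoint is missing.
        present = set()
    return sorted(EXPECTED_CHECKPOINTS - present)
```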

Model Zoo

Remark: For fair research comparisons, it is suggested to use the standard model.

Generation from Parsing Maps

You can generate images from given parsing maps and pre-defined texture annotations:

python sample_from_parsing.py -opt ./configs/sample_from_parsing.yml

The results are saved in the folder ./results/sampling_from_parsing.

Generation from Poses

You can generate images from given human poses and pre-defined clothing shape and texture annotations:

python sample_from_pose.py -opt ./configs/sample_from_pose.yml

Remarks: The two scripts above generate images without text input. To generate images from text, use the notebook or our user interface.

User Interface

python ui_demo.py

Shape descriptions should use the following format:

<gender>, <sleeve length>, <length of lower clothing>, <outer clothing type>, <other accessories1>, ...

Note: The outer clothing type and accessories can be omitted.

Examples:
man, sleeveless T-shirt, long pants
woman, short-sleeve T-shirt, short jeans
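
To make the comma-separated format concrete, the sketch below splits a shape description into named fields. It is an illustration only and not how ui_demo.py handles text; the function name and dictionary keys are hypothetical. It also assumes fields are positional, so an accessory given without an outer clothing type would land in the outer clothing slot.

```python
def parse_shape_description(text):
    """Split a shape description into fields (hypothetical helper and keys)."""
    parts = [p.strip() for p in text.split(",") if p.strip()]
    if len(parts) < 3:
        raise ValueError(
            "expected at least <gender>, <sleeve length>, <length of lower clothing>"
        )
    return {
        "gender": parts[0],
        "sleeve_length": parts[1],
        "lower_length": parts[2],
        # Optional fields: outer clothing type, then any accessories.
        "outer_clothing": parts[3] if len(parts) > 3 else None,
        "accessories": parts[4:],
    }
```

For the first example above, `parse_shape_description("man, sleeveless T-shirt, long pants")` yields `gender="man"`, `sleeve_length="sleeveless T-shirt"`, `lower_length="long pants"`, no outer clothing, and an empty accessory list.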

Texture descriptions should use the following format:

<upper clothing texture>, <lower clothing texture>, <outer clothing texture>

Note: Currently, we only support 5 types of textures, i.e., pure color, stripe/spline, plaid/lattice, floral, denim. Your inputs should be restricted to these textures.
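
Since only five textures are supported, a texture description can be validated before it is passed on. The sketch below is an illustration under two assumptions that the text above does not confirm: "stripe/spline" and "plaid/lattice" are treated as pairs of aliases, and the outer clothing texture may be left out. The function name and dictionary keys are hypothetical.

```python
# Maps each accepted word to one canonical texture name.
# Treating "spline" and "lattice" as aliases is an assumption.
TEXTURE_ALIASES = {
    "pure color": "pure color",
    "stripe": "stripe",
    "spline": "stripe",
    "plaid": "plaid",
    "lattice": "plaid",
    "floral": "floral",
    "denim": "denim",
}


def parse_texture_description(text):
    """Validate '<upper>, <lower>[, <outer>]' against the supported textures."""
    parts = [p.strip().lower() for p in text.split(",")]
    if len(parts) not in (2, 3):
        raise ValueError("expected <upper>, <lower> and optionally <outer>")
    for part in parts:
        if part not in TEXTURE_ALIASES:
            raise ValueError("unsupported texture: %r" % part)
    canonical = [TEXTURE_ALIASES[p] for p in parts]
    return {
        "upper": canonical[0],
        "lower": canonical[1],
        "outer": canonical[2] if len(canonical) == 3 else None,
    }
```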

(3) Training Text2Human

Stage I: Pose to Parsing

Train the parsing generation network. If you want to skip the training of this network, you can download our pretrained model from here.

python train_parsing_gen.py -opt ./configs/parsing_gen.yml

Stage II: Parsing to Human

Step 1: Train the top level of the hierarchical VQVAE. We provide our pretrained model here. This model is trained by:

python train_vqvae.py -opt ./configs/vqvae_top.yml

Step 2: Train the bottom level of the hierarchical VQVAE. We provide our pretrained model here. This model is trained by:

python train_vqvae.py -opt ./configs/vqvae_bottom.yml
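
The three training commands documented so far can be collected in one place and run in the order listed. The following is a minimal sketch, assuming it is run from the repository root; the step labels and the helpers `build_commands` and `run_all` are hypothetical, and only the script and config paths come from the commands above.

```python
import subprocess
import sys

# (label, script, config) for each documented training command, in order.
TRAINING_STEPS = [
    ("Stage I: pose to parsing", "train_parsing_gen.py", "./configs/parsing_gen.yml"),
    ("Stage II, step 1: top-level VQVAE", "train_vqvae.py", "./configs/vqvae_top.yml"),
    ("Stage II, step 2: bottom-level VQVAE", "train_vqvae.py", "./configs/vqvae_bottom.yml"),
]


def build_commands(python=sys.executable):
    """Return one argument list per training step."""
    return [[python, script, "-opt", config] for _, script, config in TRAINING_STEPS]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for (label, _, _), cmd in zip(TRAINING_STEPS, build_commands()):
        print(label + ": " + " ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a step exits with an error.
            subprocess.run(cmd, check=True)
```

`run_all()` only prints the commands; `run_all(dry_run=False)` launches them one after another.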

Step 3 & 4: Train the sampler with mixture-of-experts. To train the sampler, we first need to train a model to tokenize the parsing maps. You can access our pretrained parsing maps [here](https://drive.go
