HumanSD

[ICCV 2023] The official implementation of the paper "HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation"

This repository contains the implementation of the ICCV2023 paper:

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation
[Project Page] [Paper] [Code] [Video] [Data]
Xuan Ju*¹,², Ailing Zeng*¹, Chenchen Zhao*², Jianan Wang¹, Lei Zhang¹, Qiang Xu²
* Equal contribution. ¹ International Digital Economy Academy, ² The Chinese University of Hong Kong

In this work, we propose HumanSD, a native skeleton-guided diffusion model for controllable human image generation (HIG). Instead of performing image editing with dual-branch diffusion, we fine-tune the original Stable Diffusion (SD) model using a novel heatmap-guided denoising loss. This strategy effectively and efficiently strengthens the given skeleton condition during model training while mitigating catastrophic forgetting. HumanSD is fine-tuned on a combination of three large-scale human-centric datasets with text-image-pose information, two of which are established in this work.

(a) a generation by the pre-trained pose-less text-guided stable diffusion (SD)
(b) pose skeleton images as the condition to ControlNet and our proposed HumanSD
(c) a generation by ControlNet
(d) a generation by HumanSD (ours). ControlNet and HumanSD receive both text and pose conditions.

HumanSD shows advantages in (I) challenging poses, (II) accurate painting styles, (III) pose control capability, (IV) multi-person scenarios, and (V) delicate details.

TODO

News!! Our paper has been accepted by ICCV 2023! The training code is released.

[x] Release inference code and pretrained models
[x] Release Gradio UI demo
[x] Public training data (LAION-Human)
[x] Release training code

Model Overview

Getting Started

Environment Requirement

HumanSD has been implemented and tested on PyTorch 1.12.1 with Python 3.9.

Clone the repo:

git clone git@github.com:IDEA-Research/HumanSD.git

We recommend that you first install PyTorch following the official instructions. For example:

# conda
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

Then, you can install the required packages through:

pip install -r requirements.txt

You also need to install MMPose following here. Note that you only need to install MMPose as a Python package. Because MMPose has since been updated, we recommend installing version 0.29.0.
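As a quick sanity check after installation, the versions named above can be compared against what is actually installed. This is a minimal sketch, not part of the repository; it assumes the packages are published under the distribution names `torch` and `mmpose`, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions named in this README. The distribution names are assumptions
# (PyTorch is published as "torch", MMPose as "mmpose").
EXPECTED = {"torch": "1.12.1", "mmpose": "0.29.0"}


def check_versions(expected):
    """Return {package: (installed_or_None, expected, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "1.12.1+cu113" still count as a match.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, wanted, matches)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "check"
        print(f"{name}: installed={installed} expected={wanted} [{status}]")
```

A package reported as `installed=None` is not installed in the active environment.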

Model and Checkpoints

Download the necessary HumanSD checkpoints, which can be found here. The directory structure should look like:

|-- humansd_data
    |-- checkpoints
        |-- higherhrnet_w48_humanart_512x512_udp.pth
        |-- v2-1_512-ema-pruned.ckpt
        |-- humansd-v1.ckpt

Note that v2-1_512-ema-pruned.ckpt should be downloaded from Stable Diffusion.
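Before running the demo, it can help to confirm that the three checkpoint files listed in the tree above are in place. The following is a small sketch under the assumption that `humansd_data` sits in the current working directory; `missing_checkpoints` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# File names taken from the directory tree above.
REQUIRED_CHECKPOINTS = (
    "higherhrnet_w48_humanart_512x512_udp.pth",
    "v2-1_512-ema-pruned.ckpt",
    "humansd-v1.ckpt",
)


def missing_checkpoints(root="humansd_data"):
    """Return the required checkpoint files not found under <root>/checkpoints."""
    ckpt_dir = Path(root) / "checkpoints"
    return [name for name in REQUIRED_CHECKPOINTS if not (ckpt_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```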

Quick Demo

You can run the demo either from the command line or through Gradio.

To run the demo from the command line:

python scripts/pose2img.py --prompt "oil painting of girls dancing on the stage" --pose_file assets/pose/demo.npz

To also generate ControlNet and T2I-Adapter results for comparison, run:

python scripts/pose2img.py --prompt "oil painting of girls dancing on the stage" --pose_file assets/pose/demo.npz --controlnet --t2i
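To try several prompts against the same pose file, the command above can be wrapped in a short script. This is only a sketch: it uses the `--prompt`, `--pose_file`, `--controlnet`, and `--t2i` flags exactly as shown above, while `build_pose2img_command` and `run_prompts` are hypothetical helpers that are not part of the repository.

```python
import subprocess
import sys


def build_pose2img_command(prompt, pose_file, controlnet=False, t2i=False):
    """Build the argument list for scripts/pose2img.py using the flags shown above."""
    cmd = [sys.executable, "scripts/pose2img.py", "--prompt", prompt, "--pose_file", pose_file]
    if controlnet:
        cmd.append("--controlnet")
    if t2i:
        cmd.append("--t2i")
    return cmd


def run_prompts(prompts, pose_file, **flags):
    """Run the demo once per prompt; call this from the repository root."""
    for prompt in prompts:
        subprocess.run(build_pose2img_command(prompt, pose_file, **flags), check=True)
```

For example, `run_prompts(["oil painting of girls dancing on the stage"], "assets/pose/demo.npz")` issues the same call as the first command above.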

To launch the Gradio demo:

python scripts/gradio/pose2img.py

We also provide a comparison with ControlNet and T2I-Adapter, so you can run all three methods in one demo. To do so, you first need to download the corresponding models and checkpoints as follows:

<details> <summary>To compare with ControlNet and T2I-Adapter results.</summary> (1) Initialize ControlNet and T2I-Adapter as submodules using

git submodule init
git submodule update

(2) Download the checkpoints for (a) T2I-Adapter and (b) ControlNet, and put them into humansd_data/checkpoints.

Then, run:

python scripts/gradio/pose2img.py --controlnet --t2i

Note that you may have to modify some import paths in T2I-Adapter because of a path conflict.

For example, use

from comparison_models.T2IAdapter.ldm.models.diffusion.ddim import DDIMSampler

instead of

from T2IAdapter.ldm.models.diffusion.ddim import DDIMSampler

</details>

Dataset

You may refer to the code here for loading the data.

Laion-Human

You may apply for access to Laion-Human here. Note that we provide the pose annotations, the images' .parquet files, and the mapping file; please download the images according to the .parquet files. The key in a .parquet file is the corresponding image index. For example, the image with key=338717 in 00033.parquet corresponds to images/00000/000338717.jpg.

After downloading the images and poses, extract the zip files so that the directory structure looks like:

|-- humansd_data
    |-- datasets
        |-- Laion 
            |-- Aesthetics_Human
                |-- images
                    |-- 00000.parquet
                    |-- 00001.parquet
                    |-- ...
                |-- pose
                    |-- 00000
                        |-- 000000000.npz
                        |-- 000000001.npz
                        |-- ...
                    |-- 00001
                    |-- ... 
                |-- mapping_file_training.json

Then, you can use python utils/download_data.py to download all images.

After the download, the directory structure should look like:

|-- humansd_data
    |-- datasets
        |-- Laion 
            |-- Aesthetics_Human
                |-- images
                    |-- 00000.parquet
                    |-- 00001.parquet
                    |-- ...
                    |-- 00000
                        |-- 000000000.jpg
                        |-- 000000001.jpg
                        |-- ...
                    |-- 00001
                    |-- ...
                |-- pose
                    |-- 00000
                        |-- 000000000.npz
                        |-- 000000001.npz
                        |-- ...
                    |-- 00001
                    |-- ... 
                |-- mapping_file_training.json
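Once the images are downloaded, a quick consistency check can list pose files that have no matching image, and the reverse. This is a minimal sketch based only on the tree above (`pose/<shard>/<id>.npz` alongside `images/<shard>/<id>.jpg`); `unpaired_files` is a hypothetical helper, and whether every image is expected to have a pose file is an assumption, not something the repository states.

```python
from pathlib import Path


def unpaired_files(dataset_root):
    """Compare pose/<shard>/<id>.npz against images/<shard>/<id>.jpg.

    Returns (poses_without_image, images_without_pose) as sorted lists of
    "<shard>/<id>" strings.
    """
    root = Path(dataset_root)
    poses = {p.relative_to(root / "pose").with_suffix("").as_posix()
             for p in (root / "pose").glob("*/*.npz")}
    images = {p.relative_to(root / "images").with_suffix("").as_posix()
              for p in (root / "images").glob("*/*.jpg")}
    return sorted(poses - images), sorted(images - poses)


if __name__ == "__main__":
    no_image, no_pose = unpaired_files("humansd_data/datasets/Laion/Aesthetics_Human")
    print(f"{len(no_image)} pose files without an image")
    print(f"{len(no_pose)} images without a pose file")
```

The .parquet files that sit directly under `images` are ignored, since only files one level down are matched.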

If you downloaded LAION-Aesthetics as tar files, whose layout differs from our data structure, we recommend extracting each tar file with the following code:

import os
import tarfile

import cv2
import numpy as np
from PIL import Image, TarIO

tar_name = "00000.tar"  # 00000.tar - 00286.tar
present_tar_path = f"xxxxxx/{tar_name}"
save_dir = "humansd_data/datasets/Laion/Aesthetics_Human/images"
# Keep the file name (tar_name) separate from the open archive (tar),
# so the name is still available as a string inside the loop.
with tarfile.open(present_tar_path, "r") as tar:
    for present_file in tar.getmembers():
        if present_file.name.endswith(".jpg"):
            print(f"     image:- {present_file.name} -")
            # Images from 00000.tar are saved under images/00000/.
            image_save_path = os.path.join(save_dir, tar_name.replace(".tar", ""), present_file.name)
            present_image_fp = TarIO.TarIO(present_tar_path, present_file.name)
            # Convert to RGB so grayscale or RGBA images also pass the color conversion below.
            present_image = Image.open(present_image_fp).convert("RGB")
            present_image_numpy = cv2.cvtColor(np.array(present_image), cv2.COLOR_RGB2BGR)
            os.makedirs(os.path.dirname(image_save_path), exist_ok=True)
            cv2.imwrite(image_save_path, present_image_numpy)

Human-Art

You may download the Human-Art dataset here.

The directory structure should look like:

|-- humansd_data
    |-- datasets
        |-- HumanArt 
            |-- images
                |-- 2D_virtual_human
                    |-- cartoon
                        |-- 000000000007.jpg
                        |-- 000000000019.jpg
                        |-- ...
                    |-- digital_art
                    |-- ...
                |-- 3D_virtual_human
                |-- real_human
            |-- pose
                |-- 2D_virtual_human
                    |-- cartoon
                        |-- 000000000007.npz
