Pose2Img
A warping based image translation model focusing on upper body synthesis.
Upper body image synthesis from a skeleton (keypoints). This is the Pose2Img module from the ICCV 2021 paper "Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates". [arxiv / github]
This is a modified implementation of Synthesizing Images of Humans in Unseen Poses.
Setup
To install dependencies, run
pip install -r requirements.txt
To run this module, you need two NVIDIA GPUs with at least 11 GB of memory each. Our code is tested on Ubuntu 18.04 LTS with Python 3.6.
Demo Dataset and Checkpoint
- We provide the dataset and pretrained model for Oliver here.
- According to $ROOT/configs/yaml/Oliver.yaml:
  - Unzip the data and put it in $ROOT/data/Oliver
  - Put the pretrained model at $ROOT/ckpt/Oliver/ckpt_final.pth
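Before training or running the demo, it can help to confirm that the files landed where the config expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_demo_files` is hypothetical, and the paths are simply the ones listed above, taken relative to `$ROOT`.

```python
from pathlib import Path

# Paths from the list above, relative to the repository root ($ROOT).
EXPECTED = (
    "configs/yaml/Oliver.yaml",
    "data/Oliver",
    "ckpt/Oliver/ckpt_final.pth",
)


def missing_demo_files(root):
    """Return the expected demo paths that do not exist under `root`.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_demo_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Demo dataset and checkpoint are in place.")
```

Run it from the repository root; an empty result means the layout matches the paths above.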
Train on the Demo dataset
- Train Script:
python main.py \
--name Oliver \
--config_path configs/yaml/Oliver.yaml \
--batch_size 1
- Run Tensorboard for training visualization.
tensorboard --logdir ./log --port=${Port} --bind_all
Demo
Generate a realistic video for Oliver from {keypoints}.npz.
python inference.py \
--cfg_path configs/yaml/Oliver.yaml \
--name demo \
--npz_path target_pose/Oliver/varying_tmplt.npz \
--wav_path target_pose/Oliver/varying_tmplt.mp4
- In the result directory, you can find the jpg files that correspond to the npz.
Train on the custom dataset
- For your own dataset, you need to modify a custom config.yaml.
- Prepare the keypoints using OpenPose.
- The raw keypoints for each frame have shape (3, 137), composed of [pose(3,25), face(3,70), left_hand(3,21), right_hand(3,21)]. The definition is as follows:
<img src="statics/keypoints.png" alt="definition" />
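The layout above can be sketched with NumPy. This is an illustrative example rather than the repository's own preprocessing code: the function names are hypothetical, the JSON keys are the ones OpenPose writes per detected person, and the row order (x, y, confidence) is an assumption that follows OpenPose's flat `x, y, c` output.

```python
import numpy as np

# Part sizes in the order given above: pose, face, left hand, right hand.
PARTS = (("pose", 25), ("face", 70), ("left_hand", 21), ("right_hand", 21))

# Keys of one entry in the "people" list of an OpenPose JSON file.
OPENPOSE_KEYS = {
    "pose": "pose_keypoints_2d",
    "face": "face_keypoints_2d",
    "left_hand": "hand_left_keypoints_2d",
    "right_hand": "hand_right_keypoints_2d",
}


def person_to_array(person):
    """Stack one OpenPose person dict into a (3, 137) array.

    Assumption: rows are (x, y, confidence), matching OpenPose's flat
    [x0, y0, c0, x1, y1, c1, ...] lists.
    """
    blocks = []
    for name, count in PARTS:
        flat = np.asarray(person[OPENPOSE_KEYS[name]], dtype=np.float32)
        blocks.append(flat.reshape(count, 3).T)  # (count, 3) -> (3, count)
    return np.concatenate(blocks, axis=1)


def split_parts(keypoints):
    """Split a (3, 137) array back into its four named parts."""
    if keypoints.shape != (3, 137):
        raise ValueError(f"expected shape (3, 137), got {keypoints.shape}")
    parts, start = {}, 0
    for name, count in PARTS:
        parts[name] = keypoints[:, start:start + count]
        start += count
    return parts
```

For a single frame, `person` would typically be `json.load(f)["people"][0]` from the OpenPose output file for that frame; how the per-frame arrays are then stored in the npz depends on your config and is not covered here.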
Citation
If you find this code useful for your research, please use the following BibTeX entry.
@inproceedings{qian2021speech,
title={Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates},
author={Qian, Shenhan and Tu, Zhi and Zhi, YiHao and Liu, Wen and Gao, Shenghua},
booktitle={International Conference on Computer Vision (ICCV)},
year={2021}
}
