UniHCP
Official PyTorch implementation of UniHCP
UniHCP: A Unified Model for Human-Centric Perceptions
Usage
Preparation
- Install all required dependencies in requirements.txt.
- Replace all path...to... in the .yaml configuration files with the absolute paths of the corresponding dataset locations.
- Place the MAE pretrained weight <a href="https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth">mae_pretrain_vit_base.pth</a> under the core\models\backbones\pretrain_weights folder.
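Before launching a job, it can help to confirm that both preparation steps took effect. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and `marker` is whatever placeholder substring your .yaml files still contain (you supply it; it is not assumed here). The weight location is the one given in the list above.

```python
from pathlib import Path


def find_unreplaced_paths(config_dir, marker):
    """Return (file, line_number, line) for every .yaml line that still contains `marker`.

    `marker` is a caller-supplied placeholder substring (hypothetical helper).
    """
    hits = []
    for yaml_file in sorted(Path(config_dir).rglob("*.yaml")):
        with open(yaml_file, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                if marker in line:
                    hits.append((str(yaml_file), lineno, line.strip()))
    return hits


def mae_weight_present(repo_root):
    """Check that the MAE weight sits in the folder named in the preparation steps."""
    weight = (
        Path(repo_root)
        / "core" / "models" / "backbones" / "pretrain_weights"
        / "mae_pretrain_vit_base.pth"
    )
    return weight.is_file()
```

Calling `find_unreplaced_paths("experiments/unihcp/release", marker)` from the repository root should return an empty list once every dataset path has been filled in.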
*Only Slurm-based distributed training and single-GPU testing are implemented in this repo.
Experiments
All experiment configuration files and launch scripts are located in the experiments/unihcp/release folder.
To perform full multi-task training for UniHCP, replace <your partition> in the train.sh launch script and run:
sh train.sh 88 coslr1e3_104k_b4324g88_h256_I2k_1_10_001_2I_fairscale_m256
To perform evaluations, keep the test_info_list assignments corresponding to the tests you want to perform, replace <your partition>, then run:
sh batch_test.sh 1 coslr1e3_104k_b4324g88_h256_I2k_1_10_001_2I_fairscale_m256
Note that in this case, the program looks for checkpoints located at experiments/unihcp/release/checkpoints/coslr1e3_104k_b4324g88_h256_I2k_1_10_001_2I_fairscale_m256.
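The checkpoint location above follows from the experiment name passed to the launch script. As a rough sketch (the helper names are hypothetical and not part of the repository), the expected directory can be derived and inspected before submitting a test job:

```python
from pathlib import Path

# Release folder named in this README, relative to the repository root.
RELEASE_DIR = Path("experiments/unihcp/release")


def expected_checkpoint_dir(config_name, release_dir=RELEASE_DIR):
    """Build the checkpoint directory the README describes for an experiment name."""
    return Path(release_dir) / "checkpoints" / config_name


def list_checkpoints(config_name, release_dir=RELEASE_DIR):
    """Return the file names found in the expected directory, or [] if it is missing."""
    ckpt_dir = expected_checkpoint_dir(config_name, release_dir)
    if not ckpt_dir.is_dir():
        return []
    return sorted(p.name for p in ckpt_dir.iterdir() if p.is_file())
```

An empty result from `list_checkpoints("coslr1e3_104k_b4324g88_h256_I2k_1_10_001_2I_fairscale_m256")` suggests the checkpoints have not yet been placed where the test script will look.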
Pretrained Models
Please send the signed <a href="https://drive.google.com/file/d/1O4Z7d5b1w0Vh4T8jvQ1tj_WzX12KWnT9/view?usp=share_link">agreement</a> to mail@yuanzheng.ci to get the download link.
