Sapiens

High-resolution models for human tasks.


<img src="./assets/sapiens_animation.gif" alt="Sapiens" title="Sapiens" width="500"/>
<h2 align="center">Foundation for Human Vision Models</h2>
<a href="https://rawalkhirodkar.github.io/">Rawal Khirodkar</a> ·
<a href="https://scholar.google.ch/citations?user=oLi7xJ0AAAAJ&hl=en">Timur Bagautdinov</a> ·
<a href="https://una-dinosauria.github.io/">Julieta Martinez</a> ·
<a href="https://about.meta.com/realitylabs/">Su Zhaoen</a> ·
<a href="https://about.meta.com/realitylabs/">Austin James</a> ·
<a href="https://www.linkedin.com/in/peter-selednik-05036499/">Peter Selednik</a> ·
<a href="https://scholar.google.fr/citations?user=8orqBsYAAAAJ&hl=ja">Stuart Anderson</a> ·
<a href="https://shunsukesaito.github.io/">Shunsuke Saito</a>
<h3 align="center">ECCV 2024 - Best Paper Candidate</h3>
<a href='https://about.meta.com/realitylabs/codecavatars/sapiens/'> <img src='https://img.shields.io/badge/Sapiens-Page-azure?style=for-the-badge&logo=Google%20chrome&logoColor=white&labelColor=000080&color=007FFF' alt='Project Page'> </a>
<a href="https://arxiv.org/abs/2408.12569"> <img src='https://img.shields.io/badge/Paper-PDF-green?style=for-the-badge&logo=adobeacrobatreader&logoWidth=20&logoColor=white&labelColor=66cc00&color=94DD15' alt='Paper PDF'> </a>
<a href='https://huggingface.co/collections/facebook/sapiens-66d22047daa6402d565cb2fc'> <img src='https://img.shields.io/badge/HuggingFace-Demo-orange?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FF5500&color=orange' alt='Spaces'> </a>
<a href='https://rawalkhirodkar.github.io/sapiens/'> <img 
src='https://img.shields.io/badge/More-Results-ffffff?style=for-the-badge&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCIgZmlsbD0id2hpdGUiIHdpZHRoPSIxOCIgaGVpZ2h0PSIxOCI+PHBhdGggZD0iTTAgMGgyNHYyNEgweiIgZmlsbD0ibm9uZSIvPjxwYXRoIGQ9Ik0xOSAzSDVjLTEuMSAwLTIgLjktMiAydjE0YzAgMS4xLjkgMiAyIDJoMTRjMS4xIDAgMi0uOSAyLTJWNWMwLTEuMS0uOS0yLTItMnpNOSAxN0g3di01aDJ2NXptNCAwaC0ydi03aDJ2N3ptNCAwaC0yVjhoMnY5eiIvPjwvc3ZnPg==&logoColor=white&labelColor=8A2BE2&color=9370DB' alt='Results'> </a>

Sapiens offers a comprehensive suite of models for human-centric vision tasks such as 2D pose estimation, body-part segmentation, depth estimation, and surface normal estimation. The model family is pretrained on 300 million in-the-wild human images and generalizes well to unconstrained conditions. The models are also designed for extracting high-resolution features: they are natively trained at a 1024 x 1024 image resolution with a 16-pixel patch size.
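
To make the resolution and patch-size figures concrete, the sketch below computes how many patch tokens a 1024 x 1024 input yields with a 16-pixel patch. It assumes a standard non-overlapping, ViT-style patch grid, which the description above suggests but does not spell out. `patch_grid` is a hypothetical helper for illustration and is not part of the Sapiens code.

```python
def patch_grid(height: int, width: int, patch_size: int) -> tuple[int, int, int]:
    """Return (rows, cols, total) for a non-overlapping patch grid.

    Assumption: patches tile the image exactly, as in a standard ViT.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    rows = height // patch_size
    cols = width // patch_size
    return rows, cols, rows * cols


rows, cols, total = patch_grid(1024, 1024, 16)
print(f"{rows} x {cols} grid = {total} patch tokens")  # 64 x 64 grid = 4096 patch tokens
```

Under this assumption, each image becomes a 64 x 64 grid, so dense outputs such as depth or segmentation maps start from 4096 feature locations per image.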

🚀 Getting Started

Clone the Repository

git clone https://github.com/facebookresearch/sapiens.git
export SAPIENS_ROOT=/path/to/sapiens

Recommended: Lite Installation (Inference-only)

For users setting up their own environment primarily for running existing models in inference mode, we recommend the Sapiens-Lite installation.
This setup offers optimized inference (4x faster) with minimal dependencies (only PyTorch + numpy + cv2).
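
Before running Sapiens-Lite, it can help to confirm that the three dependencies named above are importable. The following is a minimal sketch using only the standard library. The import names `torch`, `numpy`, and `cv2` are the usual ones for PyTorch, NumPy, and OpenCV, and `missing_modules` is a hypothetical helper, not something shipped with Sapiens.

```python
import importlib.util

# Import names for the three dependencies listed above.
# PyTorch is imported as "torch" and OpenCV as "cv2".
LITE_DEPENDENCIES = ("torch", "numpy", "cv2")


def missing_modules(names=LITE_DEPENDENCIES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Sapiens-Lite dependencies are importable.")
```

The check only looks up each module without importing it, so it stays fast even when PyTorch is installed.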

Full Installation

To replicate our complete training setup, run the provided installation script.
This will create a new conda environment named sapiens and install all necessary dependencies.

cd $SAPIENS_ROOT/_install
./conda.sh

Please download the original checkpoints from Hugging Face.
You can download only the checkpoints you need.
Set $SAPIENS_CHECKPOINT_ROOT to the path of the sapiens_host folder, and place the checkpoints in the following directory structure:

sapiens_host/
├── detector/
│   └── checkpoints/
│       └── rtmpose/
├── pretrain/
│   └── checkpoints/
│       ├── sapiens_0.3b/
│       │   └── sapiens_0.3b_epoch_1600_clean.pth
│       ├── sapiens_0.6b/
│       │   └── sapiens_0.6b_epoch_1600_clean.pth
│       ├── sapiens_1b/
│       └── sapiens_2b/
├── pose/
│   └── checkpoints/
│       └── sapiens_0.3b/
├── seg/
├── depth/
└── normal/
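
As a rough illustration of how this layout can be used from Python, the sketch below builds a checkpoint path from $SAPIENS_CHECKPOINT_ROOT. It assumes the `<task>/checkpoints/<model>/<file>` pattern shown for `pretrain/` also applies to the other task folders, whose contents are not listed here. `checkpoint_path` is a hypothetical helper, not a function provided by the repository.

```python
import os
from pathlib import Path


def checkpoint_path(task: str, model: str, filename: str, root=None) -> Path:
    """Build <root>/<task>/checkpoints/<model>/<filename>.

    If root is not given, it is read from $SAPIENS_CHECKPOINT_ROOT.
    """
    if root is None:
        root = os.environ.get("SAPIENS_CHECKPOINT_ROOT")
        if not root:
            raise RuntimeError("SAPIENS_CHECKPOINT_ROOT is not set")
    return Path(root) / task / "checkpoints" / model / filename


if __name__ == "__main__":
    # Placeholder root; replace with your own sapiens_host location.
    ckpt = checkpoint_path(
        "pretrain",
        "sapiens_0.3b",
        "sapiens_0.3b_epoch_1600_clean.pth",
        root="/path/to/sapiens_host",
    )
    print(ckpt, "(found)" if ckpt.is_file() else "(not downloaded yet)")
```

Keeping the path logic in one place makes it easier to download only the checkpoints of interest and still fail early, with a clear message, when one is missing.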

🌟 Human-Centric Vision Tasks

We fine-tune Sapiens for multiple human-centric vision tasks. Please check out the list below.

Image Encoder <a href="lite/docs/PRETRAIN_README.md" style="color: #FFA500;">[lite]</a>
Pose Estimation <a href="lite/docs/POSE_README.md" style="color: #FFA500;">[lite]</a>
Body Part Segmentation <a href="lite/docs/SEG_README.md" style="color: #FFA500;">[lite]</a>
Depth Estimation <a href="lite/docs/DEPTH_README.md" style="color: #FFA500;">[lite]</a>
Surface Normal Estimation <a href="lite/docs/NORMAL_README.md" style="color: #FFA500;">[lite]</a>

🎯 Easy Steps to Finetuning Sapiens

Fine-tuning our models is straightforward. Detailed training guides are available for the following tasks.

Pose Estimation
Body-Part Segmentation
Depth Estimation
Surface Normal Estimation

📈 Quantitative Evaluations

Pose Estimation

🤝 Acknowledgements & Support & Contributing

We would like to acknowledge the work of OpenMMLab, from which this project benefits.
For any questions or issues, please open an issue in the repository.
See contributing and the code of conduct.

License

This project is licensed under LICENSE.
Portions derived from open-source projects are licensed under Apache 2.0.

📚 Citation

If you use Sapiens in your research, please consider citing us.

@article{khirodkar2024sapiens,
  title={Sapiens: Foundation for Human Vision Models},
  author={Khirodkar, Rawal and Bagautdinov, Timur and Martinez, Julieta and Zhaoen, Su and James, Austin and Selednik, Peter and Anderson, Stuart and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2408.12569},
  year={2024}
}
