BBoxMaskPose

[ICCV 25] The official repository of paper 'Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle'


<div align="center">

<h1 style="margin-bottom: 0.0em;">BBoxMaskPose v2</h1>

<h2 style="margin-bottom: 0.2em;">CVPR 2025 + ICCV 2025</h2>

<img src="data/assets/BMP_043+076+174.gif" alt="BBoxMaskPose v2 loop" height="500px">

Website · License · Video · Paper

<!-- Papers with code: [![2D Pose AP on OCHuman: 42.5](https://img.shields.io/badge/OCHuman-2D_Pose:_49.2_AP-blue)](https://paperswithcode.com/sota/2d-human-pose-estimation-on-ochuman?p=detection-pose-estimation-and-segmentation-1) &nbsp;&nbsp; [![Human Instance Segmentation AP on OCHuman: 34.0](https://img.shields.io/badge/OCHuman-Human_Instance_Segmentation:_34.0_AP-blue)](https://paperswithcode.com/sota/human-instance-segmentation-on-ochuman?p=detection-pose-estimation-and-segmentation-1) --> </div>

> [!IMPORTANT]
> The new version of <b>BBox-Mask-Pose (BMPv2)</b> is now available on <b>arXiv</b>. BMPv2 significantly improves performance; see the quantitative results reported in the preprint. One of the key contributions is <b>PMPose</b>, a new top-down pose estimation model that is already strong on standard benchmarks and in crowded scenes. The code is integrated in the <code>main</code> branch and was released in Release 2.0.0. Because of repository changes, version 2.0.0 is not backward compatible with previous versions.

📢 News

  • Mar 2025: HuggingFace Image Demo is up-to-date with BMPv2. Check out the 3D generation!
  • Mar 2026: Version 2.0 released, with improved (1) pose, (2) SAM, and (3) wiring to 3D prediction.
  • Feb 2026: SAM-pose2seg won a Best Paper Award at CVWW 2026 🎉
  • Jan 2026: BMPv2 paper is available on arXiv
  • Aug 2025: HuggingFace Image Demo is out! 🎮
  • Jul 2025: Version 1.1 with an easy-to-run image demo released
  • Jun 2025: BMPv1 paper accepted to ICCV 2025! 🎉
  • Dec 2024: BMPv1 code is available
  • Nov 2024: The project website is online

📋 Project Overview

Bounding boxes, masks, and poses capture complementary aspects of the human body. BBoxMaskPose links detection, segmentation, and pose estimation iteratively, where each prediction refines the others. PMPose combines probabilistic modeling with mask conditioning for robust pose estimation in crowds. Together, these components achieve state-of-the-art results on COCO and OCHuman, being the first method to exceed 50 AP on OCHuman.
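The iterative loop described above can be sketched in a few lines. This is a toy illustration of the control flow only; `detect`, `estimate_pose`, and `segment` below are hypothetical stand-ins, not the repository's actual APIs (see the demos for those).

```python
def bmp_loop(image, detect, estimate_pose, segment, max_iters=3):
    """Toy sketch of the BBoxMaskPose control flow: boxes condition poses,
    poses condition masks, and the accumulated masks let the detector find
    people it previously missed under occlusion."""
    instances = []  # accumulated (bbox, pose, mask) triples
    for _ in range(max_iters):
        new_boxes = detect(image, ignore=[m for _, _, m in instances])
        if not new_boxes:
            break  # nothing new detected; the loop has converged
        for box in new_boxes:
            pose = estimate_pose(image, box)  # top-down pose from the box
            mask = segment(image, pose)       # pose-guided segmentation
            instances.append((box, pose, mask))
    return instances

# Toy stand-ins: the "image" is just a list of person ids, and the detector
# finds at most one not-yet-masked person per pass (mimicking occlusion).
def detect(image, ignore):
    return [p for p in image if p not in ignore][:1]

def estimate_pose(image, box):
    return f"pose-{box}"

def segment(image, pose):
    return pose.removeprefix("pose-")  # the "mask" is the person id again

people = bmp_loop(["a", "b", "c"], detect, estimate_pose, segment)
```

Each pass masks out everyone found so far, so the toy detector surfaces one new person per iteration until all three are recovered.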

Repository Structure

The repository is organized into two main packages with stable public APIs:

BBoxMaskPose/
├── pmpose/                    # PMPose package (pose estimation)
│   └── pmpose/
│       ├── api.py             # PUBLIC API: PMPose class
│       ├── mm_utils.py        # Internal utilities
│       └── posevis_lite.py    # Visualization
├── mmpose/                    # MMPose fork with our edits
├── bboxmaskpose/              # BBoxMaskPose package (full pipeline)
│   └── bboxmaskpose/
│       ├── api.py             # PUBLIC API: BBoxMaskPose class
│       ├── sam2/              # SAM2 implementation
│       ├── configs/           # BMP configurations
│       └── *_utils.py         # Internal utilities
├── demos/                     # Public API demos
│   ├── PMPose_demo.py         # PMPose usage example
│   ├── BMP_demo.py            # BBoxMaskPose usage example
│   └── quickstart.ipynb       # Interactive notebook
└── demo/                      # Legacy demo (still functional)

Key contributions:

  1. MaskPose: a pose estimation model conditioned on segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
    • Download pre-trained weights below
  2. PMPose: a pose estimation model that models the full keypoint probability distribution and, like MaskPose, is conditioned on segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
    • Download pre-trained weights below
  3. BBox-MaskPose (BMP): method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation and pose estimation
    • Try the demo!
  4. SAM-pose2seg: fine-tuned SAM2 model for pose-guided instance segmentation
    • Try the demo!
  5. Fine-tuned RTMDet adapted for iterative detection (ignoring 'holes')
    • Download pre-trained weights below
  6. Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent in MMPose.
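To make the mask-conditioning idea behind MaskPose/PMPose concrete, here is an assumption-level sketch (not the repository's actual preprocessing): given a binary instance mask, derive its tight bounding box and zero out background pixels, so the pose model sees only the target person even when others overlap the box.

```python
def mask_to_input(image, mask):
    """Tight bounding box of a binary mask, plus the crop with background
    zeroed. `image` and `mask` are row-major 2D lists of equal shape."""
    ys = [r for r, row in enumerate(mask) if any(row)]
    xs = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs) + 1, max(ys) + 1
    # Zero every pixel outside the mask, then crop to the tight box.
    masked = [[px if m else 0 for px, m in zip(irow, mrow)]
              for irow, mrow in zip(image, mask)]
    crop = [row[x0:x1] for row in masked[y0:y1]]
    return (x0, y0, x1, y1), crop

bbox, crop = mask_to_input(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # tiny "image"
    [[0, 1, 0], [0, 1, 1], [0, 0, 0]],   # binary mask of one person
)
```

Compared with plain box cropping, the zeroed background means overlapping bodies inside the same box cannot distract the model, which is the point of conditioning on masks in dense scenes.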

For more details, please visit our project website.

🎮 HuggingFace Demos

If you want to try our models without any installation, check out the free HuggingFace demos.

BBoxMaskPose Demo showcases the whole loop, including 3D pose estimation. You can generate GIFs similar to the one at the top of this README. Due to 3D rendering, this demo takes approximately 30–60 seconds per image.

PMPose Demo showcases our family of PMPose models. It is not an iterative method but a standard feed-forward top-down 2D pose estimation method. Check it out if you're interested in fast pose estimation.

<p float="left"> <img src="data/assets/BMP-demo_screenshot.png" height="150" /> &nbsp;&nbsp;&nbsp; <img src="data/assets/PMPose-demo_screenshot.png" height="150" /> </p>

🚀 Installation

Docker Installation (Recommended)

The fastest way to get started with GPU support:

# Clone and build
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git
cd BBoxMaskPose
docker-compose build

# Run the demo
docker-compose up

Requires: Docker Engine 19.03+, NVIDIA Container Toolkit, NVIDIA GPU with CUDA 12.1 support.

Manual Installation

This project is built on top of MMPose and SAM 2.1. Please refer to the MMPose installation guide or SAM installation guide for detailed setup instructions.

Basic installation steps:

# Clone the repository
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git BBoxMaskPose/
cd BBoxMaskPose

# Install your version of torch, torchvision, OpenCV and NumPy
pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
pip install numpy==1.25.1 opencv-python==4.9.0.80

# Install MMLibrary
pip install -U openmim
mim install mmengine "mmcv==2.1.0" "mmdet==3.3.0" "mmpretrain==1.2.0"

# Install dependencies
pip install -r requirements.txt
pip install -e .
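After the steps above, a quick stdlib-only check can confirm the core packages resolve before you run a demo. This is a convenience snippet, not part of the repository; the package names are taken from the install commands above.

```python
import importlib.util

# Report which of the core dependencies are importable in this environment.
def check_deps(names=("torch", "torchvision", "mmcv", "mmdet", "mmengine")):
    return {n: importlib.util.find_spec(n) is not None for n in names}

missing = [n for n, ok in check_deps().items() if not ok]
print("all core deps found" if not missing else f"missing: {missing}")
```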

🎮 Demo

PMPose Demo (Pose Estimation Only)

python demos/PMPose_demo.py --image data/004806.jpg --device cuda

BBoxMaskPose Demo (Full Pipeline)

python demos/BMP_demo.py --image data/004806.jpg --device cuda

After running the demo, outputs are in outputs/004806/. The expected output should look like this:

<div align="center"> <a href="data/assets/004806_mask.jpg" target="_blank"> <img src="data/assets/004806_mask.jpg" alt="Detection results" width="200" /> </a> &nbsp;&nbsp;&nbsp;&nbsp; <a href="data/assets/004806_pose.jpg" target="_blank"> <img src="data/assets/004806_pose.jpg" alt="Pose results" width="200" style="margin-right:10px;" /> </a> </div>

BBoxMaskPose v2 Demo (Full Pipeline + 3D Mesh Recovery)

This demo extends BMP with SAM-3D-Body for 3D human mesh recovery:

# Basic usage (auto-downloads checkpoint from HuggingFace)
python demos/BMPv2_demo.py --image data/004806.jpg --device cuda

# With local checkpoint
python demos/BMPv2_demo.py --image data/004806.jpg --device cuda \
    --sam3d_checkpoint checkpoints/sam-3d-body-dinov3/model.ckpt \
    --mhr_path checkpoints/sam-3d-body-dinov3/assets/mhr_model.pt

SAM-3D-Body Installation (Optional): BMPv2 requires SAM-3D-Body for 3D mesh recovery. Install it separately:

# 1. Install dependencies
pip install -r requirements/sam3d.txt

# 2. Install detectron2
pip install 'git+https://github.com/facebookresearch/detectron2.git@a1ce2f9' --no-build-isolation --no-deps

# 3. Install MoGe (optional, for FOV estimation)
