BBoxMaskPose

[ICCV 25] The official repository of paper 'Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle'


<div align="center">

<h1 style="margin-bottom: 0.0em;">BBoxMaskPose v2</h1>

<h2 style="margin-bottom: 0.2em;">CVPR 2025 + ICCV 2025</h2>

<img src="data/assets/BMP_043+076+174.gif" alt="BBoxMaskPose v2 loop" height="500px">

Website · License · Video · Paper

<!-- Papers with code: [![2D Pose AP on OCHuman: 42.5](https://img.shields.io/badge/OCHuman-2D_Pose:_49.2_AP-blue)](https://paperswithcode.com/sota/2d-human-pose-estimation-on-ochuman?p=detection-pose-estimation-and-segmentation-1) &nbsp;&nbsp; [![Human Instance Segmentation AP on OCHuman: 34.0](https://img.shields.io/badge/OCHuman-Human_Instance_Segmentation:_34.0_AP-blue)](https://paperswithcode.com/sota/human-instance-segmentation-on-ochuman?p=detection-pose-estimation-and-segmentation-1) --> </div>

> [!IMPORTANT]
> The new version of <b>BBox-Mask-Pose (BMPv2)</b> is now available on <b>arXiv</b>. BMPv2 significantly improves performance; see the quantitative results reported in the preprint. One of the key contributions is <b>PMPose</b>, a new top-down pose estimation model that is already strong on standard benchmarks and in crowded scenes. The code is integrated in the <code>main</code> branch and was released in Release 2.0.0. Because of repository changes, version 2.0.0 is not backward compatible with previous versions.

📢 News

  • Mar 2025: HuggingFace Image Demo is up-to-date with BMPv2. Check out the 3D generation!
  • Mar 2026: Version 2.0 released, with improved (1) pose, (2) SAM, and (3) wiring to 3D prediction.
  • Feb 2026: SAM-pose2seg won a Best Paper Award at CVWW 2026 🎉
  • Jan 2026: BMPv2 paper is available on arXiv
  • Aug 2025: HuggingFace Image Demo is out! 🎮
  • Jul 2025: Version 1.1 with an easy-to-run image demo released
  • Jun 2025: BMPv1 paper accepted to ICCV 2025! 🎉
  • Dec 2024: BMPv1 code is available
  • Nov 2024: The project website is online

📋 Project Overview

Bounding boxes, masks, and poses capture complementary aspects of the human body. BBoxMaskPose links detection, segmentation, and pose estimation iteratively, where each prediction refines the others. PMPose combines probabilistic modeling with mask conditioning for robust pose estimation in crowds. Together, these components achieve state-of-the-art results on COCO and OCHuman, being the first method to exceed 50 AP on OCHuman.
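The iterative loop described above can be sketched in a few lines. This is a toy illustration of the control flow only; `detect`, `estimate_pose`, and `segment` below are hypothetical stand-ins, not the repository's actual APIs (see the demos for those).

```python
def bmp_loop(image, detect, estimate_pose, segment, max_iters=3):
    """Toy sketch of the BBoxMaskPose control flow: boxes condition poses,
    poses condition masks, and the accumulated masks let the detector find
    people it previously missed under occlusion."""
    instances = []  # accumulated (bbox, pose, mask) triples
    for _ in range(max_iters):
        new_boxes = detect(image, ignore=[m for _, _, m in instances])
        if not new_boxes:
            break  # nothing new detected; the loop has converged
        for box in new_boxes:
            pose = estimate_pose(image, box)  # top-down pose from the box
            mask = segment(image, pose)       # pose-guided segmentation
            instances.append((box, pose, mask))
    return instances

# Toy stand-ins: the "image" is just a list of person ids, and the detector
# finds at most one not-yet-masked person per pass (mimicking occlusion).
def detect(image, ignore):
    return [p for p in image if p not in ignore][:1]

def estimate_pose(image, box):
    return f"pose-{box}"

def segment(image, pose):
    return pose.removeprefix("pose-")  # the "mask" is the person id again

people = bmp_loop(["a", "b", "c"], detect, estimate_pose, segment)
```

Each pass masks out everyone found so far, so the toy detector surfaces one new person per iteration until all three are recovered.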

Repository Structure

The repository is organized into two main packages with stable public APIs:

BBoxMaskPose/
├── pmpose/                    # PMPose package (pose estimation)
│   └── pmpose/
│       ├── api.py             # PUBLIC API: PMPose class
│       ├── mm_utils.py        # Internal utilities
│       └── posevis_lite.py    # Visualization
├── mmpose/                    # MMPose fork with our edits
├── bboxmaskpose/              # BBoxMaskPose package (full pipeline)
│   └── bboxmaskpose/
│       ├── api.py             # PUBLIC API: BBoxMaskPose class
│       ├── sam2/              # SAM2 implementation
│       ├── configs/           # BMP configurations
│       └── *_utils.py         # Internal utilities
├── demos/                     # Public API demos
│   ├── PMPose_demo.py         # PMPose usage example
│   ├── BMP_demo.py            # BBoxMaskPose usage example
│   └── quickstart.ipynb       # Interactive notebook
└── demo/                      # Legacy demo (still functional)

Key contributions:

  1. MaskPose: a pose estimation model conditioned on segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
    • Download pre-trained weights below
  2. PMPose: a pose estimation model that models the full keypoint probability distribution and, like MaskPose, is conditioned on segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
    • Download pre-trained weights below
  3. BBox-MaskPose (BMP): method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation and pose estimation
    • Try the demo!
  4. SAM-pose2seg: fine-tuned SAM2 model for pose-guided instance segmentation
    • Try the demo!
  5. Fine-tuned RTMDet adapted for iterative detection (ignoring 'holes')
    • Download pre-trained weights below
  6. Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent in MMPose.
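To make the mask-conditioning idea behind MaskPose/PMPose concrete, here is an assumption-level sketch (not the repository's actual preprocessing): given a binary instance mask, derive its tight bounding box and zero out background pixels, so the pose model sees only the target person even when others overlap the box.

```python
def mask_to_input(image, mask):
    """Tight bounding box of a binary mask, plus the crop with background
    zeroed. `image` and `mask` are row-major 2D lists of equal shape."""
    ys = [r for r, row in enumerate(mask) if any(row)]
    xs = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs) + 1, max(ys) + 1
    # Zero every pixel outside the mask, then crop to the tight box.
    masked = [[px if m else 0 for px, m in zip(irow, mrow)]
              for irow, mrow in zip(image, mask)]
    crop = [row[x0:x1] for row in masked[y0:y1]]
    return (x0, y0, x1, y1), crop

bbox, crop = mask_to_input(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # tiny "image"
    [[0, 1, 0], [0, 1, 1], [0, 0, 0]],   # binary mask of one person
)
```

Compared with plain box cropping, the zeroed background means overlapping bodies inside the same box cannot distract the model, which is the point of conditioning on masks in dense scenes.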

For more details, please visit our project website.

🎮 HuggingFace Demos

If you want to try our models without any installation, check out the free HuggingFace demos.

BBoxMaskPose Demo showcases the whole loop, including 3D pose estimation. You can generate GIFs similar to the one at the top of this README. Due to 3D rendering, this demo takes approximately 30–60 seconds per image.

PMPose Demo showcases our family of PMPose models. It is not an iterative method but a standard feed-forward top-down 2D pose estimation method. Check it out if you're interested in fast pose estimation.

<p float="left"> <img src="data/assets/BMP-demo_screenshot.png" height="150" /> &nbsp;&nbsp;&nbsp; <img src="data/assets/PMPose-demo_screenshot.png" height="150" /> </p>

🚀 Installation

Docker Installation (Recommended)

The fastest way to get started with GPU support:

# Clone and build
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git
cd BBoxMaskPose
docker-compose build

# Run the demo
docker-compose up

Requires: Docker Engine 19.03+, NVIDIA Container Toolkit, NVIDIA GPU with CUDA 12.1 support.

Manual Installation

This project is built on top of MMPose and SAM 2.1. Please refer to the MMPose installation guide or SAM installation guide for detailed setup instructions.

Basic installation steps:

# Clone the repository
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git BBoxMaskPose/
cd BBoxMaskPose

# Install your version of torch, torchvision, OpenCV and NumPy
pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
pip install numpy==1.25.1 opencv-python==4.9.0.80

# Install MMLibrary
pip install -U openmim
mim install mmengine "mmcv==2.1.0" "mmdet==3.3.0" "mmpretrain==1.2.0"

# Install dependencies
pip install -r requirements.txt
pip install -e .
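After the steps above, a quick stdlib-only check can confirm the core packages resolve before you run a demo. This is a convenience snippet, not part of the repository; the package names are taken from the install commands above.

```python
import importlib.util

# Report which of the core dependencies are importable in this environment.
def check_deps(names=("torch", "torchvision", "mmcv", "mmdet", "mmengine")):
    return {n: importlib.util.find_spec(n) is not None for n in names}

missing = [n for n, ok in check_deps().items() if not ok]
print("all core deps found" if not missing else f"missing: {missing}")
```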

🎮 Demo

PMPose Demo (Pose Estimation Only)

python demos/PMPose_demo.py --image data/004806.jpg --device cuda

BBoxMaskPose Demo (Full Pipeline)

python demos/BMP_demo.py --image data/004806.jpg --device cuda

After running the demo, outputs are in outputs/004806/. The expected output should look like this:

<div align="center"> <a href="data/assets/004806_mask.jpg" target="_blank"> <img src="data/assets/004806_mask.jpg" alt="Detection results" width="200" /> </a> &nbsp;&nbsp;&nbsp;&nbsp; <a href="data/assets/004806_pose.jpg" target="_blank"> <img src="data/assets/004806_pose.jpg" alt="Pose results" width="200" style="margin-right:10px;" /> </a> </div>

BBoxMaskPose v2 Demo (Full Pipeline + 3D Mesh Recovery)

This demo extends BMP with SAM-3D-Body for 3D human mesh recovery:

# Basic usage (auto-downloads checkpoint from HuggingFace)
python demos/BMPv2_demo.py --image data/004806.jpg --device cuda

# With local checkpoint
python demos/BMPv2_demo.py --image data/004806.jpg --device cuda \
    --sam3d_checkpoint checkpoints/sam-3d-body-dinov3/model.ckpt \
    --mhr_path checkpoints/sam-3d-body-dinov3/assets/mhr_model.pt

SAM-3D-Body Installation (Optional): BMPv2 requires SAM-3D-Body for 3D mesh recovery. Install it separately:

# 1. Install dependencies
pip install -r requirements/sam3d.txt

# 2. Install detectron2
pip install 'git+https://github.com/facebookresearch/detectron2.git@a1ce2f9' --no-build-isolation --no-deps

# 3. Install MoGe (optional, for FOV estimation)
