[CVPR'24 Highlight] HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

<p align="center"> <img src="./docs/static/logo.png" alt="Image" width="300" height="100%" /> </p>

[ Project Page ] [ Paper ] [ SupMat ] [ ArXiv ] [ Video ] [ HOLD Account ] [ ICCV'25 HOLD+ARCTIC Challenge ]

Authors: Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Muhammed Kocabas, Xu Chen, Michael J. Black, Otmar Hilliges

News

✨3DV 2026: Looking for hand scan data? PALM is a large-scale dataset containing 13k high-quality registered 3dMD hand scans of 263 subjects and 90k calibrated multi-view RGB images. See PALM for details.

<p align="center"> <img src="https://github.com/facebookresearch/PALM/blob/main/docs/static/dataset-teaser.jpg" alt="PALM Teaser" width="80%"/> </p>

🚀 Register a HOLD account here for news such as code release, downloads, and future updates!

  • 2025.07.04: Join our ICCV'25 competition: two-hand + rigid-object reconstruction using HOLD on ARCTIC!
  • 2024.07.04: Join our ECCV'24 competition: two-hand + rigid-object reconstruction using HOLD on ARCTIC!
  • 2024.07.04: HOLD beta is released!
  • 2024.04.04: HOLD is awarded a CVPR highlight!
  • 2024.02.27: HOLD is accepted to CVPR'24! Working on code release!

<p align="center"> <img src="./docs/static/teaser.jpeg" alt="Image" width="80%"/> </p>

This is the official repository for HOLD, the first method that jointly reconstructs articulated hands and objects from monocular videos without assuming a pre-scanned object template or 3D hand-object training data.

HOLD can reconstruct 3D geometries of novel objects and hands:

<p align="center"> <img src="./docs/static/360/mug_ours.gif" alt="Image" width="80%"/> <img src="./docs/static/ananas1_itw.jpg" alt="Image" width="80%"/> </p>

Potential directions from HOLD

<p align="center"> <img src="./docs/static/sushi.gif" alt="Image" width="80%"/> </p>

Features

  • Instructions to download in-the-wild videos from HOLD as well as preprocessed data
  • Scripts to preprocess and train on custom videos
  • A volumetric rendering framework to reconstruct dynamic hand-object interaction (a minimal sketch of the core idea follows this list)
  • A generalized codebase for single- and two-hand interaction with objects
  • A viewer to interact with the predictions
  • Code to evaluate and compare with HOLD on HO3D
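
The volumetric renderer follows the standard quadrature used in NeRF/VolSDF-style pipelines: densities and colors sampled along each camera ray are alpha-composited into a pixel. As a minimal sketch of that core idea only (this is not HOLD's actual code; the function and tensor names are assumptions), the per-ray compositing step looks roughly like this:

```python
import torch

def composite_along_ray(densities, colors, deltas):
    """Alpha-composite N samples along one camera ray (NeRF-style quadrature).

    densities: (N,)  non-negative volume densities sigma_i at the samples
    colors:    (N,3) per-sample RGB values
    deltas:    (N,)  distances between consecutive samples
    """
    # Per-segment opacity: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - torch.exp(-densities * deltas)
    # Transmittance T_i: probability the ray reaches sample i unoccluded
    trans = torch.cumprod(
        torch.cat([alphas.new_ones(1), 1.0 - alphas + 1e-10]), dim=0
    )[:-1]
    weights = alphas * trans                      # contribution of each sample
    rgb = (weights[:, None] * colors).sum(dim=0)  # final pixel color
    return rgb, weights
```

In an SDF-based framework such as VolSDF (credited in the acknowledgments below), the densities would be derived from a learned signed-distance field rather than predicted directly.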

TODOs

  • [ ] Tips on good reconstruction
  • [ ] Clean the code further
  • [X] Support ARCTIC for the two-hand + rigid-object setting

Documentation

Getting started

Get a copy of the code:

```bash
git clone https://github.com/zc-alexfan/hold.git
cd hold; git submodule update --init --recursive
```

  1. Set up environments

    • Follow the instructions here: docs/setup.md.
    • You may skip external dependencies for now.
  2. Train on a preprocessed sequence

    • Start with one of our preprocessed in-the-wild sequences, such as hold_bottle1_itw.
    • Familiarize yourself with the usage guidelines in docs/usage.md for this preprocessed sequence.
    • This will enable you to train HOLD, render results, and experiment with our interactive viewer (a minimal visualization sketch follows this list).
    • At this stage, you can also explore the HOLD code in the ./code directory.
  3. Set up external dependencies and process custom videos

    • After understanding the initial tools, set up the "external dependencies" as outlined in docs/setup.md.
    • Preprocess the images from the hold_bottle1_itw sequence by following the instructions in docs/custom.md.
    • Train on this sequence to learn how to build a custom dataset.
    • You can capture your own custom video and reconstruct it in 3D at this point.
    • Most preprocessing artifact files are documented in docs/data_doc.md, which you can use as a reference.
  4. Two-hand setting: Bimanual category-agnostic reconstruction

    • At this point, you can preprocess and train on a custom single-hand sequence.
    • Now you can take on the bimanual category-agnostic reconstruction challenge!
    • Follow the instructions in docs/arctic.md to reconstruct two-hand manipulation of ARCTIC sequences.
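
To get a feel for what interacting with the predictions involves, here is a minimal mesh-visualization sketch built on aitviewer, one of the libraries credited in the acknowledgments. HOLD ships its own viewer (see docs/usage.md), so treat this as a generic illustration; the mesh file names below are hypothetical placeholders, not paths produced by HOLD:

```python
import trimesh
from aitviewer.renderables.meshes import Meshes
from aitviewer.viewer import Viewer

# Hypothetical exported meshes; HOLD's actual output artifacts are
# documented in docs/data_doc.md.
hand = trimesh.load("hand.obj", force="mesh")
obj = trimesh.load("object.obj", force="mesh")

viewer = Viewer()
# aitviewer expects vertices with a leading frame dimension: (frames, V, 3)
viewer.scene.add(Meshes(hand.vertices[None], hand.faces, name="hand"))
viewer.scene.add(Meshes(obj.vertices[None], obj.faces, name="object"))
viewer.run()
```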

Official Citation

```bibtex
@inproceedings{fan2024hold,
  title={{HOLD}: Category-agnostic 3d reconstruction of interacting hands and objects from video},
  author={Fan, Zicong and Parelli, Maria and Kadoglou, Maria Eleni and Kocabas, Muhammed and Chen, Xu and Black, Michael J and Hilliges, Otmar},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={494--504},
  year={2024}
}
```

Star History

*(Star history chart)*

Contact

For technical questions, please create an issue. For other questions, please contact the first author.

Acknowledgments

The authors would like to thank: Benjamin Pellkofer for IT/web support; Chen Guo, Egor Zakharov, Yao Feng, Artur Grigorev for insightful discussion; Yufei Ye for DiffHOI code release.

Our code benefits greatly from Vid2Avatar, aitviewer, VolSDF, NeRF++, and SNARF. If you find our work useful, consider checking out their work as well.
