POEM
[CVPR 2023] POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo
POEM is designed for reconstructing hand geometry from multi-view images. It combines the structure-aware MANO mesh with an unstructured point cloud in the intersected cameras' frustum space. To infer an accurate 3D hand mesh from multi-view images, POEM introduces cross point set attention. It achieves state-of-the-art performance on three multi-view hand-object datasets: HO3D, DexYCB, and OakInk.
:joystick: Instructions
- See docs/installation.md to set up the environment and install all the required packages.
- See docs/datasets.md to download all the datasets and data assets.
:runner: Training and Evaluation
Available models
- set ${MODEL} as one in [POEM, MVP, PEMeshTR, FTLMeshTR]
- set ${DATASET} as one in [DexYCBMV, HO3Dv3MV, OakInkMV]
Download the pretrained checkpoints at :link: ckpt and move the contents to ./checkpoint.
Command line arguments
- -g, --gpu_id: visible GPUs for training, e.g. -g 0,1,2,3. Evaluation only supports a single GPU.
- -w, --workers: num_workers for reading data, e.g. -w 4. On HO3Dv3MV, setting -w equal to -g is recommended.
- -p, --dist_master_port: port for distributed training, e.g. -p 60011. Use a different -p for each training process.
- -b, --batch_size: e.g. -b 32. The default is specified in the config file and is overridden when -b is provided.
- --cfg: config file for this experiment, e.g. --cfg config/release/${MODEL}_${DATASET}.yaml.
- --exp_id: name of the experiment, e.g. --exp_id ${EXP_ID}. When --exp_id is provided, the code requires that the git repo contains no uncommitted changes. When it is omitted, it defaults to 'default' for training and 'eval_{cfg}' for evaluation. All results are saved in exp/${EXP_ID}_{timestamp}.
- --reload: path to the checkpoint (.pth.tar) to be loaded.
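The flags above can be assembled programmatically when launching several runs. The following is a minimal sketch, not part of the repository: `config_path` and `build_train_command` are hypothetical helper names, and the only facts taken from this README are the model and dataset names, the `config/release/${MODEL}_${DATASET}.yaml` pattern, and the flags themselves.

```python
import shlex

# Names copied from the "Available models" list in this README.
MODELS = ("POEM", "MVP", "PEMeshTR", "FTLMeshTR")
DATASETS = ("DexYCBMV", "HO3Dv3MV", "OakInkMV")


def config_path(model: str, dataset: str) -> str:
    """Return config/release/${MODEL}_${DATASET}.yaml after validating both names."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    return f"config/release/{model}_{dataset}.yaml"


def build_train_command(model, dataset, gpus, workers,
                        port=None, batch_size=None, exp_id=None):
    """Assemble the argument list for scripts/train_ddp.py (hypothetical helper)."""
    cmd = [
        "python", "scripts/train_ddp.py",
        "--cfg", config_path(model, dataset),
        "-g", ",".join(str(g) for g in gpus),
        "-w", str(workers),
    ]
    # Optional flags are appended only when a value is given, so the
    # defaults described above (config batch size, default exp_id) apply.
    if port is not None:
        cmd += ["-p", str(port)]
    if batch_size is not None:
        cmd += ["-b", str(batch_size)]
    if exp_id is not None:
        cmd += ["--exp_id", exp_id]
    return cmd


if __name__ == "__main__":
    cmd = build_train_command("POEM", "DexYCBMV", gpus=[0, 1, 2, 3], workers=16)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` directly; `shlex.join` is used here only to print a copy-pasteable command line.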
Evaluation
Set ${PATH_TO_CKPT} to ./checkpoint/${MODEL}_${DATASET}/checkpoint/{xxx}.pth.tar. Then, run:
# use "--eval_extra" for extra evaluation.
# "auc" computes the AUC of the predicted mesh.
# "draw" draws the predicted mesh of each batch.
$ python scripts/eval.py --cfg config/release/${MODEL}_${DATASET}.yaml -g 0 -b 8 --reload ${PATH_TO_CKPT}
The evaluation results will be saved at exp/${EXP_ID}_{timestamp}/evaluations.
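The evaluation command needs ${PATH_TO_CKPT} to point at an existing file, and the {xxx} part of the filename depends on the downloaded checkpoint. As a minimal sketch (the helper name `find_checkpoints` is hypothetical and not part of the repository), the candidates can be listed from the directory layout given above rather than typed by hand:

```python
from pathlib import Path


def find_checkpoints(model: str, dataset: str, root: str = "./checkpoint") -> list:
    """List *.pth.tar files under ${root}/${MODEL}_${DATASET}/checkpoint/, sorted by name."""
    ckpt_dir = Path(root) / f"{model}_{dataset}" / "checkpoint"
    if not ckpt_dir.is_dir():
        # Nothing downloaded yet (or a different layout): report no candidates.
        return []
    return sorted(ckpt_dir.glob("*.pth.tar"))


if __name__ == "__main__":
    for path in find_checkpoints("POEM", "DexYCBMV"):
        print(path)
```

Any printed path can then be passed to --reload.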
Training
$ python scripts/train_ddp.py --cfg config/release/${MODEL}_${DATASET}.yaml -g 0,1,2,3 -w 16
Tensorboard
$ cd exp/${EXP_ID}_{timestamp}/runs/
$ tensorboard --logdir .
Checkpoint
All the training checkpoints are saved at exp/${EXP_ID}_{timestamp}/checkpoints/
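Because each run writes to its own exp/${EXP_ID}_{timestamp} directory, it can help to look up the newest one before opening its runs/, evaluations/ or checkpoints/ subfolder. The sketch below is an assumption-laden helper, not repository code: `latest_exp_dir` is a hypothetical name, and it orders directories by modification time because the exact timestamp format is not stated here. Note that the prefix match would also pick up experiments whose ID merely starts with the same string followed by an underscore.

```python
from pathlib import Path
from typing import Optional


def latest_exp_dir(exp_id: str, root: str = "exp") -> Optional[Path]:
    """Return the most recently modified directory matching ${root}/${EXP_ID}_*, or None."""
    candidates = [p for p in Path(root).glob(f"{exp_id}_*") if p.is_dir()]
    # max(..., default=None) avoids an exception when no run exists yet.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    run_dir = latest_exp_dir("default")
    if run_dir is None:
        print("no experiment directory found")
    else:
        print(run_dir / "checkpoints")
```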
License
The code and model provided herein are available for usage as specified in the LICENSE file. By downloading and using the code and model, you agree to the terms in the LICENSE.
Citation
@inproceedings{yang2023poem,
author = {Yang, Lixin and Xu, Jian and Zhong, Licheng and Zhan, Xinyu and Wang, Zhicheng and Wu, Kejian and Lu, Cewu},
title = {POEM: Reconstructing Hand in a Point Embedded Multi-View Stereo},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {21108-21117}
}
For more questions, please contact Lixin Yang: siriusyang@sjtu.edu.cn
