# HeadPosePlus
Comparable Head Pose Estimation
https://user-images.githubusercontent.com/35331054/216019397-2aa3de5f-ea02-4f64-898f-7f787a65cd35.mp4
Our head pose predictions on the Biwi [8] dataset.
## Updates
- 29.03.2023: Added a webcam demo ([here])
- 20.03.2023: Fixed a bug in the 6DRepNet evaluation. 6DRepNet was trained with BGR images, but we used RGB images for the evaluation. With this update, BGR images are used instead, and the performance is now similar to the paper.
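As a minimal illustration of the channel-order pitfall behind this fix (plain NumPy, not the repo's actual evaluation code): a model trained on BGR input sees systematically swapped colors when fed RGB images.

```python
import numpy as np

def rgb_to_bgr(image: np.ndarray) -> np.ndarray:
    """Swap the channel order of an H x W x 3 image in place of a color convert.

    Reversing the last axis turns RGB into BGR (and back); feeding a
    BGR-trained model RGB input silently degrades accuracy, as in the
    bug described above.
    """
    return image[..., ::-1].copy()

# A 1x1 "image" makes the channel swap explicit.
pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # pure red, stored as RGB
print(rgb_to_bgr(pixel)[0, 0].tolist())  # [0, 0, 255] -> the same red as BGR
```

The swap is its own inverse, so applying it twice recovers the original image.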
## HeadPose+
Are head pose estimation results comparable? Not really.
We provide a comprehensive analysis of factors associated with the evaluation of head pose estimation methods.
We focus on the popular Biwi Kinect Head Pose Database (Biwi) [8] and show that different processing leads to incomparable test sets (Biwi variants).
What you can find here:
- Comprehensive evaluation of head pose estimation methods on Biwi variants
- Models, checkpoints and test code for our works
- Code to reproduce and evaluate on the different Biwi variants
- Biwi+, file [3]:
  - Manually checked face bounding boxes for all frames of Biwi [8]
  - Pose labels in the RGB camera frame and z-y'-x'' rotation sequence
- Face bounding boxes and test sets (subsets) for Biwi [8] used by other authors; we call these "Biwi variants"
- A PyTorch Biwi variant dataset (file) to easily load the Biwi variants

Biwi variants: Image of different face bounding boxes used by different methods for cropping Biwi [8].
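As a rough sketch of what loading such a variant involves (the class name, dictionary keys, and sample values below are illustrative, not the repo's actual API; the real loader subclasses `torch.utils.data.Dataset` and reads images from disk):

```python
# Hypothetical sketch of a Biwi-variant dataset wrapper. The key point:
# each variant only provides boxes for the frames its face detector
# found, so different variants imply different test subsets.

class BiwiVariantDataset:
    """Pairs Biwi frame paths with variant-specific boxes and pose labels."""

    def __init__(self, frames, boxes, poses):
        # Keep only frames for which this variant provides a bounding box.
        self.samples = [(f, boxes[f], poses[f]) for f in frames if f in boxes]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, box, pose = self.samples[idx]
        return {"path": path, "box": box, "pose_pyr": pose}

# Two frames, but this "detector" only found a face in one of them:
frames = ["01/frame_00003_rgb.png", "01/frame_00004_rgb.png"]
boxes = {"01/frame_00004_rgb.png": (100, 80, 220, 200)}
poses = {"01/frame_00004_rgb.png": (5.0, -10.0, 2.0)}
ds = BiwiVariantDataset(frames, boxes, poses)
print(len(ds))  # 1 -> the skipped frame shrinks the test set
```

This is exactly how the FSA-Net variant ends up with fewer images than Biwi+ in the result tables below.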
Takeaways:
- **Do different face detectors result in different test sets?**
  Yes: the differences are quite drastic, as the face detector determines which subset of the original Biwi files is used as the test set (e.g., over 15% of Biwi images are skipped for the FSA-Net variant).
- **Do different test sets change head pose estimation performance?**
  Yes: performance differences between test sets sometimes seem bigger than method-related gains.
- **Is it important to use the same face detector for training and testing?**
  No/depends: we can achieve similar performance if we post-process the detections of a different face detection algorithm to have bounding boxes similar to the ones used during training (i.e., produce a similar face crop), which requires a known mapping.
  Sometimes similar performance can be achieved with boxes from a detector not used during training (this depends on the method).
  However, we notice that even changing the box size by one pixel can change the results.
- **Does it matter in which rotation representation (Euler angle rotation order), e.g., z-y'-x'' (we call it pyr) or x-y'-z'' (we call it ypr), we evaluate our methods?**
  Yes: the results can be quite different and are not comparable.
- **Does correcting the pose from the depth camera to the RGB camera frame improve results for Biwi? (Why do we need this? See Biwi Calibration below.)**
  No: using no calibration seems to give better results.
  A possible explanation could be a global offset of the center pose (0, 0, 0) between datasets. (Let us know if you find an explanation.)
- **Is SOTA performance for Biwi on the current paperswithcode leaderboard meaningful?**
  Not really: e.g., Hopenet (2018) reported an MAE of 4.89 but achieves an MAE of 3.82 on the Biwi variant used by FSA-Net (2019).

Therefore, we suggest evaluating and comparing results with precisely defined evaluation protocols, and reporting those protocols. Biwi+ is a step in this direction: it provides a fixed test set with face bounding boxes for all Biwi images.
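The "known mapping" between detectors mentioned in the takeaways could, for instance, be a simple linear rescaling and shift of the boxes. A hypothetical sketch (the scale and offset values are made up for illustration and would have to be fitted on frames where both detectors fire):

```python
# Hypothetical box remapping: rescale and shift boxes from detector A
# so they approximate the crop style a model saw during training with
# detector B. The default scale/offset values are illustrative only.
def remap_box(box, scale=1.15, dy_ratio=-0.05):
    """Map an (x1, y1, x2, y2) box from one detector's style toward another's."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx = x1 + w / 2.0
    cy = y1 + h / 2.0 + dy_ratio * h   # shift the box center vertically
    half_w = scale * w / 2.0           # grow (or shrink) the box around it
    half_h = scale * h / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

print(remap_box((100.0, 80.0, 220.0, 200.0)))
```

With `scale=1.0` and `dy_ratio=0.0` the mapping is the identity; in practice such parameters would be estimated per detector pair, and the one-pixel sensitivity noted above suggests they must be fitted carefully.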
## Compilation of Results
All results can be found [here]. This section is a compilation of results on Biwi processed like Biwi+ [3] and on Biwi processed like FSA-Net [3], except that we generously selected the best-performing face crop for each method. Some methods achieve better results than reported in their publications.
### Results on the Biwi FSA-Net variant (the Biwi test set used by FSA-Net and others)
| Method | MAE | Pitch | Yaw | Roll | Format | Test Set | Num Images | Training Set | Crop | Unsup. Training on Test Set | Calibrated Biwi |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| WHENet 2020 | 4.79 | 5.06 | 6.00 | 3.33 | ypr | Biwi (FSA-Net) | 13219 | 300W_LP | Biwi+ (DLIB+manual) | ✖ | ✖ |
| FSA-Net 2019 | 3.91 | 4.78 | 4.29 | 2.66 | ypr | Biwi (FSA-Net) | 13219 | 300W_LP | Biwi+ (DLIB+manual) | ✖ | ✖ |
| Hopenet 2018 | 3.82 | 4.75 | 3.98 | 2.73 | ypr | Biwi (FSA-Net) | 13219 | 300W_LP | Biwi+ -> Dockerface, Hopenet | ✖ | ✖ |
| RCRw (proposed) 2023 | 3.63 | 4.51 | 3.78 | 2.60 | ypr | Biwi (FSA-Net) | 13219 | 300W-LP | Biwi+ (DLIB+manual) | ✔ | ✖ |
| 6DRepNet 2022 | 3.41 | 3.92 | 3.70 | 2.60 | ypr | Biwi (FSA-Net) | 13219 | 300W-LP | Biwi+ -> MTCNN, FSA-Net | ✖ | ✖ |
| | | | | | | | | | | | |
| PADACO 2019 | 3.69 | 4.20 | 3.31 | 3.56 | ypr | Biwi (FSA-Net) | 13219 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✖ |
| RCRw (proposed) 2023 | 3.34 | 3.91 | 3.43 | 2.68 | ypr | Biwi (FSA-Net) | 13219 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✖ |
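For reading the tables: the MAE column is the mean of the three per-axis errors (up to rounding of the unrounded values). Checking the FSA-Net 2019 row above:

```python
# The MAE column is the mean of the per-axis (pitch, yaw, roll) errors.
# Values taken from the FSA-Net 2019 row of the table above.
pitch, yaw, roll = 4.78, 4.29, 2.66
mae = (pitch + yaw + roll) / 3.0
print(round(mae, 2))  # 3.91, matching the MAE column
```

Rows where the check is off by 0.01 (e.g., WHENet: 14.39/3 ≈ 4.80 vs 4.79) reflect rounding of the per-axis values before printing.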
### Results on Biwi+
Except for Hopenet, all methods perform best using the Biwi+ face bounding boxes.
| Method | MAE | Pitch | Yaw | Roll | Format | Test Set | Num Images | Training Set | Crop | Unsup. Training on Test Set | Calibrated Biwi |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| WHENet | 7.25 | 8.00 | 8.05 | 5.72 | pyr | Biwi+ | 15678 | 300W_LP | Biwi+ (DLIB+manual) | ✖ | ✔ |
| FSA-Net | 5.75 | 6.43 | 6.27 | 4.55 | pyr | Biwi+ | 15678 | 300W_LP | Biwi+ (DLIB+manual) | ✖ | ✔ |
| Hopenet | 5.73 | 7.65 | 5.32 | 4.21 | pyr | Biwi+ | 15678 | 300W_LP | Biwi+ -> Dockerface, Hopenet | ✖ | ✔ |
| RCRw (proposed) | 4.55 | 6.34 | 4.55 | 2.74 | pyr | Biwi+ | 15678 | 300W-LP | Biwi+ (DLIB+manual) | ✔ | ✔ |
| 6DRepNet | 4.39 | 5.19 | 4.62 | 3.37 | pyr | Biwi+ | 15678 | 300W-LP | Biwi+ (DLIB+manual) | ✖ | ✔ |
| | | | | | | | | | | | |
| PADACO | 4.13 | 4.51 | 4.11 | 3.78 | pyr | Biwi+ | 15678 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✔ |
| RCRw (proposed) | 3.86 | 4.73 | 3.95 | 2.89 | pyr | Biwi+ | 15678 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✔ |
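Note the Format column: the two tables use different Euler sequences (ypr vs pyr), and per-axis numbers are not transferable between them. A small self-contained check (plain NumPy; the angle values are arbitrary) that the same rotation matrix yields different per-axis angles under the two sequences:

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def euler_zyx(R):
    # Angles for R = Rz(a) @ Ry(b) @ Rx(c), i.e. the z-y'-x'' sequence.
    return np.array([np.arctan2(R[1, 0], R[0, 0]),
                     -np.arcsin(R[2, 0]),
                     np.arctan2(R[2, 1], R[2, 2])])

def euler_xyz(R):
    # Angles for R = Rx(a) @ Ry(b) @ Rz(c), i.e. the x-y'-z'' sequence.
    return np.array([np.arctan2(-R[1, 2], R[2, 2]),
                     np.arcsin(R[0, 2]),
                     np.arctan2(-R[0, 1], R[0, 0])])

# One head pose, defined via the z-y'-x'' sequence:
a, b, c = np.deg2rad([30.0, 20.0, 10.0])
R = rot_z(a) @ rot_y(b) @ rot_x(c)

print(np.rad2deg(euler_zyx(R)))  # recovers [30, 20, 10]
print(np.rad2deg(euler_xyz(R)))  # different per-axis numbers, same rotation
```

Both angle triples describe the identical rotation matrix, so mean absolute errors computed per axis in one convention cannot be compared against errors reported in the other.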
## How To
### Setup
Clone the repository:

```
git clone --recurse-submodules https://github.com/kuhnkeF/headposeplus.git HeadPosePlus
cd HeadPosePlus
```

- We assume a working Anaconda distribution; we use Anaconda's virtual environment manager.
- Point the code to your copy of Biwi:
  - Download the Biwi Kinect Head Pose Database from the official website.
  - Change "path_biwi" in hpp/BiwiDataset.py to your path containing the folders 01, 02, 03, ...
- Download the model/checkpoint files, see here.

Then create the environments and run the evaluation:

```
chmod +x create_pytorch_env.sh
chmod +x create_tensorflow_env.sh
chmod +x eval_all.sh
./create_pytorch_env.sh
./create_tensorflow_env.sh
./eval_all.sh
```
### Virtual Environments
To run the code, we use two environments:

- a PyTorch environment to evaluate PADACO, RCRw, Hopenet, and 6DRepNet
- a TensorFlow environment (with Keras) to evaluate FSA-Net and WHENet

The following scripts set up the environments and install the dependencies:

```
create_pytorch_env.sh
create_tensorflow_env.sh
```
### Run Code
Run this script (or check out the eval_* Python files) to compute the results:

```
eval_all.sh
```

Precomputed results can be found in the /results folder.
## Remarks and Issues
- Unsupervised validation/model selection (when to stop training?) is another factor that leads to incomparable/unfair results (this is the case for many UDA works and cross-dataset evaluations).
- Why is the original Biwi+ missing one image (15677 instead of 15678)?
  It is the first image of the dataset (01/frame_00003_rgb.png): the file frame_00003_pose.bin is missing from the annotations. In this updated version we simply copied the bounding box from 01/frame_00004. For our work, this does not change the results, as the change in error is smaller than 0.005.
- In [1] we report the mean result of 10 different models. We only provide and evaluate one of them here.
## Biwi Calibration
Biwi was intended for developing algorithms that work on depth images alone. The annotated poses (ground truth) are therefore given in the coordinate frame of the depth camera. The parameters (intrinsic, extrinsic) of the RGB camera in relation to the depth camera are provided by the authors, so it is possible to transform the ground truth into the RGB camera coordinate frame. A simple test to validate a pose is to render the provided head models and overlay them on top of the RGB images: only with this "calibration" do the face in the image and the rendered head overlap correctly.
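A minimal sketch of this transform, assuming extrinsics (R_ext, t_ext) that map depth-camera coordinates into the RGB camera frame (the numeric values are made up for illustration and are not Biwi's actual calibration):

```python
import numpy as np

def pose_depth_to_rgb(R_head, t_head, R_ext, t_ext):
    """Re-express a head pose given in the depth camera frame in the RGB frame.

    R_head: 3x3 head rotation, t_head: head position, both in depth-camera
    coordinates. R_ext/t_ext: extrinsics mapping depth -> RGB coordinates.
    """
    R_rgb = R_ext @ R_head           # compose the rotations
    t_rgb = R_ext @ t_head + t_ext   # rotate and translate the position
    return R_rgb, t_rgb

# Illustrative case: no relative rotation, just a small camera baseline.
R_head = np.eye(3)
t_head = np.array([0.0, 0.0, 1000.0])   # head 1 m in front of the depth cam
R_ext = np.eye(3)
t_ext = np.array([-25.0, 0.0, 0.0])     # made-up 25 mm horizontal baseline
R_rgb, t_rgb = pose_depth_to_rgb(R_head, t_head, R_ext, t_ext)
print(t_rgb.tolist())  # [-25.0, 0.0, 1000.0]
```

With a non-identity R_ext the rotation itself changes, which is why the Euler angles reported with and without calibration differ in the tables above.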
## Citing
Please acknowledge our effort by citing the corresponding papers in your publications.
We hope our code and data help your research.
[1] <div id="RCRwTBIOM"></div>
@ARTICLE{kuhnke23_RelativePose_TBIOM,
author={Kuhnke, Felix and Ostermann, Jörn},
journal={IEEE Transactions on Biometrics, Behavior, and Identity Science},
title={Domain Adaptation for Head Pose Estimation Using Relative Pose Consistency},
year={2023},
volume={},
number={},
pages={1-1},
doi={10.1109/TBIOM.2023.3237039}}
[2] <div id="RCRwFG"></div>
@INPROCEEDINGS{kuhnke21_RelativePose_FG,
t
