# HeadPosePlus
Comparable Head Pose Estimation
https://user-images.githubusercontent.com/35331054/216019397-2aa3de5f-ea02-4f64-898f-7f787a65cd35.mp4
Our head pose predictions on the Biwi [8] dataset.
## Updates
- 29.03.2023: Added a webcam demo ([here])
- 20.03.2023: Fixed a bug in the 6DRepNet evaluation. 6DRepNet was trained with BGR images, but we used RGB images for the evaluation. With this update, BGR images are used instead, and the performance is now similar to the paper.
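As a minimal illustration of the channel-order pitfall behind this fix (plain NumPy, not the repo's actual evaluation code): a model trained on BGR input sees systematically swapped colors when fed RGB images.

```python
import numpy as np

def rgb_to_bgr(image: np.ndarray) -> np.ndarray:
    """Swap the channel order of an H x W x 3 image in place of a color convert.

    Reversing the last axis turns RGB into BGR (and back); feeding a
    BGR-trained model RGB input silently degrades accuracy, as in the
    bug described above.
    """
    return image[..., ::-1].copy()

# A 1x1 "image" makes the channel swap explicit.
pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # pure red, stored as RGB
print(rgb_to_bgr(pixel)[0, 0].tolist())  # [0, 0, 255] -> the same red as BGR
```

The swap is its own inverse, so applying it twice recovers the original image.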
## HeadPose+
Are head pose estimation results comparable? Not really.
We provide a comprehensive analysis of factors associated with the evaluation of head pose estimation methods.
We focus on the popular Biwi Kinect Head Pose Database (Biwi) [8] and show that different processing leads to incomparable test sets (Biwi variants).
What you can find here:
- Comprehensive evaluation of head pose estimation methods on Biwi variants
- Models, checkpoints and test code for our works
- Code to reproduce and evaluate on the different Biwi variants
- Biwi+, file [3]:
  - Manually checked face bounding boxes for all frames of Biwi [8]
  - Pose labels in the RGB camera frame and z-y'-x'' rotation sequence
- Face bounding boxes and test sets (subsets) for Biwi [8] used by other authors; we call these "Biwi variants"
- A PyTorch Biwi variant dataset (file) to easily load the Biwi variants

Biwi variants: Image of different face bounding boxes used by different methods for cropping Biwi [8].
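As a rough sketch of what loading such a variant involves (the class name, dictionary keys, and sample values below are illustrative, not the repo's actual API; the real loader subclasses `torch.utils.data.Dataset` and reads images from disk):

```python
# Hypothetical sketch of a Biwi-variant dataset wrapper. The key point:
# each variant only provides boxes for the frames its face detector
# found, so different variants imply different test subsets.

class BiwiVariantDataset:
    """Pairs Biwi frame paths with variant-specific boxes and pose labels."""

    def __init__(self, frames, boxes, poses):
        # Keep only frames for which this variant provides a bounding box.
        self.samples = [(f, boxes[f], poses[f]) for f in frames if f in boxes]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, box, pose = self.samples[idx]
        return {"path": path, "box": box, "pose_pyr": pose}

# Two frames, but this "detector" only found a face in one of them:
frames = ["01/frame_00003_rgb.png", "01/frame_00004_rgb.png"]
boxes = {"01/frame_00004_rgb.png": (100, 80, 220, 200)}
poses = {"01/frame_00004_rgb.png": (5.0, -10.0, 2.0)}
ds = BiwiVariantDataset(frames, boxes, poses)
print(len(ds))  # 1 -> the skipped frame shrinks the test set
```

This is exactly how the FSA-Net variant ends up with fewer images than Biwi+ in the result tables below.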
Takeaways:
- **Do different face detectors result in different test sets?**
  Yes: the differences are quite drastic, as the face detector determines which subset of the original Biwi files is used as the test set (e.g., over 15% of Biwi images are skipped for the FSA-Net variant).
- **Do different test sets change head pose estimation performance?**
  Yes: performance differences between test sets sometimes seem bigger than method-related gains.
- **Is it important to use the same face detector for training and testing?**
  No/depends: we can achieve similar performance if we post-process the detections of a different face detection algorithm to have bounding boxes similar to the ones used during training (i.e., produce a similar face crop), which requires a known mapping.
  Sometimes similar performance can be achieved with boxes from a detector not used during training (this depends on the method).
  However, we notice that even changing the box size by one pixel can change the results.
- **Does it matter in which rotation representation (Euler angle rotation order), e.g., z-y'-x'' (we call it pyr) or x-y'-z'' (we call it ypr), we evaluate our methods?**
  Yes: the results can be quite different and are not comparable.
- **Does correcting the pose from the depth camera to the RGB camera frame improve results for Biwi? (Why do we need this? See Biwi Calibration below.)**
  No: using no calibration seems to give better results.
  A possible explanation could be a global offset of the center pose (0, 0, 0) between datasets. (Let us know if you find an explanation.)
- **Is SOTA performance for Biwi on the current paperswithcode leaderboard meaningful?**
  Not really: e.g., Hopenet (2018) reported an MAE of 4.89 but achieves an MAE of 3.82 on the Biwi variant used by FSA-Net (2019).

Therefore, we suggest evaluating and comparing results with precisely defined evaluation protocols, and reporting those protocols. Biwi+ is a step in this direction: it provides a fixed test set with face bounding boxes for all Biwi images.
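The "known mapping" between detectors mentioned in the takeaways could, for instance, be a simple linear rescaling and shift of the boxes. A hypothetical sketch (the scale and offset values are made up for illustration and would have to be fitted on frames where both detectors fire):

```python
# Hypothetical box remapping: rescale and shift boxes from detector A
# so they approximate the crop style a model saw during training with
# detector B. The default scale/offset values are illustrative only.
def remap_box(box, scale=1.15, dy_ratio=-0.05):
    """Map an (x1, y1, x2, y2) box from one detector's style toward another's."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx = x1 + w / 2.0
    cy = y1 + h / 2.0 + dy_ratio * h   # shift the box center vertically
    half_w = scale * w / 2.0           # grow (or shrink) the box around it
    half_h = scale * h / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

print(remap_box((100.0, 80.0, 220.0, 200.0)))
```

With `scale=1.0` and `dy_ratio=0.0` the mapping is the identity; in practice such parameters would be estimated per detector pair, and the one-pixel sensitivity noted above suggests they must be fitted carefully.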
## Compilation of Results
All results can be found [here]. This section is a compilation of results on Biwi processed like Biwi+ [3] and on Biwi processed like FSA-Net [3], except that we generously selected the best-performing face crop for each method. Some methods achieve better results than reported in their publications.
### Results on the Biwi FSA-Net variant (the Biwi test set used by FSA-Net and others)
| Method | MAE | Pitch | Yaw | Roll | Format | Test Set | Num Images | Training Set | Crop | Unsup. Training on Test Set | Calibrated Biwi |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| WHENet 2020 | 4.79 | 5.06 | 6.00 | 3.33 | ypr | Biwi (FSA-Net) | 13219 | 300W_LP | Biwi+ (DLIB+manual) | ✖ | ✖ |
| FSA-Net 2019 | 3.91 | 4.78 | 4.29 | 2.66 | ypr | Biwi (FSA-Net) | 13219 | 300W_LP | Biwi+ (DLIB+manual) | ✖ | ✖ |
| Hopenet 2018 | 3.82 | 4.75 | 3.98 | 2.73 | ypr | Biwi (FSA-Net) | 13219 | 300W_LP | Biwi+ -> Dockerface, Hopenet | ✖ | ✖ |
| RCRw (proposed) 2023 | 3.63 | 4.51 | 3.78 | 2.60 | ypr | Biwi (FSA-Net) | 13219 | 300W-LP | Biwi+ (DLIB+manual) | ✔ | ✖ |
| 6DRepNet 2022 | 3.41 | 3.92 | 3.70 | 2.60 | ypr | Biwi (FSA-Net) | 13219 | 300W-LP | Biwi+ -> MTCNN, FSA-Net | ✖ | ✖ |
| | | | | | | | | | | | |
| PADACO 2019 | 3.69 | 4.20 | 3.31 | 3.56 | ypr | Biwi (FSA-Net) | 13219 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✖ |
| RCRw (proposed) 2023 | 3.34 | 3.91 | 3.43 | 2.68 | ypr | Biwi (FSA-Net) | 13219 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✖ |
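For reading the tables: the MAE column is the mean of the three per-axis errors (up to rounding of the unrounded values). Checking the FSA-Net 2019 row above:

```python
# The MAE column is the mean of the per-axis (pitch, yaw, roll) errors.
# Values taken from the FSA-Net 2019 row of the table above.
pitch, yaw, roll = 4.78, 4.29, 2.66
mae = (pitch + yaw + roll) / 3.0
print(round(mae, 2))  # 3.91, matching the MAE column
```

Rows where the check is off by 0.01 (e.g., WHENet: 14.39/3 ≈ 4.80 vs 4.79) reflect rounding of the per-axis values before printing.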
### Results on Biwi+
Except for Hopenet, all methods perform best using the Biwi+ face bounding boxes.
| Method | MAE | Pitch | Yaw | Roll | Format | Test Set | Num Images | Training Set | Crop | Unsup. Training on Test Set | Calibrated Biwi |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| WHENet | 7.25 | 8.00 | 8.05 | 5.72 | pyr | Biwi+ | 15678 | 300W_LP | Biwi+ (DLIB+manual) | ✖ | ✔ |
| FSA-Net | 5.75 | 6.43 | 6.27 | 4.55 | pyr | Biwi+ | 15678 | 300W_LP | Biwi+ (DLIB+manual) | ✖ | ✔ |
| Hopenet | 5.73 | 7.65 | 5.32 | 4.21 | pyr | Biwi+ | 15678 | 300W_LP | Biwi+ -> Dockerface, Hopenet | ✖ | ✔ |
| RCRw (proposed) | 4.55 | 6.34 | 4.55 | 2.74 | pyr | Biwi+ | 15678 | 300W-LP | Biwi+ (DLIB+manual) | ✔ | ✔ |
| 6DRepNet | 4.39 | 5.19 | 4.62 | 3.37 | pyr | Biwi+ | 15678 | 300W-LP | Biwi+ (DLIB+manual) | ✖ | ✔ |
| | | | | | | | | | | | |
| PADACO | 4.13 | 4.51 | 4.11 | 3.78 | pyr | Biwi+ | 15678 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✔ |
| RCRw (proposed) | 3.86 | 4.73 | 3.95 | 2.89 | pyr | Biwi+ | 15678 | SynHead++ | Biwi+ (DLIB+manual) | ✔ | ✔ |
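Note the Format column: the two tables use different Euler sequences (ypr vs pyr), and per-axis numbers are not transferable between them. A small self-contained check (plain NumPy; the angle values are arbitrary) that the same rotation matrix yields different per-axis angles under the two sequences:

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def euler_zyx(R):
    # Angles for R = Rz(a) @ Ry(b) @ Rx(c), i.e. the z-y'-x'' sequence.
    return np.array([np.arctan2(R[1, 0], R[0, 0]),
                     -np.arcsin(R[2, 0]),
                     np.arctan2(R[2, 1], R[2, 2])])

def euler_xyz(R):
    # Angles for R = Rx(a) @ Ry(b) @ Rz(c), i.e. the x-y'-z'' sequence.
    return np.array([np.arctan2(-R[1, 2], R[2, 2]),
                     np.arcsin(R[0, 2]),
                     np.arctan2(-R[0, 1], R[0, 0])])

# One head pose, defined via the z-y'-x'' sequence:
a, b, c = np.deg2rad([30.0, 20.0, 10.0])
R = rot_z(a) @ rot_y(b) @ rot_x(c)

print(np.rad2deg(euler_zyx(R)))  # recovers [30, 20, 10]
print(np.rad2deg(euler_xyz(R)))  # different per-axis numbers, same rotation
```

Both angle triples describe the identical rotation matrix, so mean absolute errors computed per axis in one convention cannot be compared against errors reported in the other.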
## How To
### Setup
Clone the repository:

```
git clone --recurse-submodules https://github.com/kuhnkeF/headposeplus.git HeadPosePlus
cd HeadPosePlus
```

- We assume a working Anaconda distribution; we use Anaconda's virtual environment manager.
- Point the code to your copy of Biwi:
  - Download the Biwi Kinect Head Pose Database from the official website.
  - Change "path_biwi" in hpp/BiwiDataset.py to your path containing the folders 01, 02, 03, ...
- Download the model/checkpoint files, see here.

Then create the environments and run the evaluation:

```
chmod +x create_pytorch_env.sh
chmod +x create_tensorflow_env.sh
chmod +x eval_all.sh
./create_pytorch_env.sh
./create_tensorflow_env.sh
./eval_all.sh
```
### Virtual Environments
To run the code, we use two environments:

- a PyTorch environment to evaluate PADACO, RCRw, Hopenet, and 6DRepNet
- a TensorFlow environment (with Keras) to evaluate FSA-Net and WHENet

The following scripts set up the environments and install the dependencies:

```
create_pytorch_env.sh
create_tensorflow_env.sh
```
### Run Code
Run this script (or check out the eval_* Python files) to compute the results:

```
eval_all.sh
```

Precomputed results can be found in the /results folder.
## Remarks and Issues
- Unsupervised validation/model selection (when to stop training?) is another factor that leads to incomparable/unfair results (this is the case for many UDA works and cross-dataset evaluations).
- Why is the original Biwi+ missing one image (15677 instead of 15678)?
  It is the first image of the dataset (01/frame_00003_rgb.png): the file frame_00003_pose.bin is missing from the annotations. In this updated version we simply copied the bounding box from 01/frame_00004. For our work, this does not change the results, as the change in error is smaller than 0.005.
- In [1] we report the mean result of 10 different models. We only provide and evaluate one of them here.
## Biwi Calibration
Biwi was intended for developing algorithms that work on depth images alone. The annotated poses (ground truth) are therefore given in the coordinate frame of the depth camera. The parameters (intrinsic, extrinsic) of the RGB camera in relation to the depth camera are provided by the authors, so it is possible to transform the ground truth into the RGB camera coordinate frame. A simple test to validate a pose is to render the provided head models and overlay them on top of the RGB images: only with this "calibration" do the face in the image and the rendered head overlap correctly.
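A minimal sketch of this transform, assuming extrinsics (R_ext, t_ext) that map depth-camera coordinates into the RGB camera frame (the numeric values are made up for illustration and are not Biwi's actual calibration):

```python
import numpy as np

def pose_depth_to_rgb(R_head, t_head, R_ext, t_ext):
    """Re-express a head pose given in the depth camera frame in the RGB frame.

    R_head: 3x3 head rotation, t_head: head position, both in depth-camera
    coordinates. R_ext/t_ext: extrinsics mapping depth -> RGB coordinates.
    """
    R_rgb = R_ext @ R_head           # compose the rotations
    t_rgb = R_ext @ t_head + t_ext   # rotate and translate the position
    return R_rgb, t_rgb

# Illustrative case: no relative rotation, just a small camera baseline.
R_head = np.eye(3)
t_head = np.array([0.0, 0.0, 1000.0])   # head 1 m in front of the depth cam
R_ext = np.eye(3)
t_ext = np.array([-25.0, 0.0, 0.0])     # made-up 25 mm horizontal baseline
R_rgb, t_rgb = pose_depth_to_rgb(R_head, t_head, R_ext, t_ext)
print(t_rgb.tolist())  # [-25.0, 0.0, 1000.0]
```

With a non-identity R_ext the rotation itself changes, which is why the Euler angles reported with and without calibration differ in the tables above.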
## Citing
Please acknowledge our effort by citing the corresponding papers in your publications.
We hope our code and data help your research.
[1] <div id="RCRwTBIOM"></div>
@ARTICLE{kuhnke23_RelativePose_TBIOM,
author={Kuhnke, Felix and Ostermann, Jörn},
journal={IEEE Transactions on Biometrics, Behavior, and Identity Science},
title={Domain Adaptation for Head Pose Estimation Using Relative Pose Consistency},
year={2023},
volume={},
number={},
pages={1-1},
doi={10.1109/TBIOM.2023.3237039}}
[2] <div id="RCRwFG"></div>
@INPROCEEDINGS{kuhnke21_RelativePose_FG,
t
