Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild
Introduction
This is the code of the paper Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild. We propose a novel facial landmark detector, PIPNet, that is fast, accurate, and robust. PIPNet can be trained under two settings: (1) supervised learning; (2) generalizable semi-supervised learning (GSSL). With GSSL, PIPNet achieves better cross-domain generalization by utilizing massive amounts of unlabeled data across domains.
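The core idea of the PIP detection head (compared against other heads in Figure 2 below) is to predict, for each landmark, a score map on the low-resolution feature map plus x/y offsets within the best-scoring cell, so no upsampling layers are needed. Below is a minimal PyTorch sketch of that decoding step with hypothetical tensor names and shapes; the actual head additionally regresses neighboring landmarks, which is omitted here.

```python
import torch

def decode_pip(score_maps, offset_x, offset_y, net_stride=32):
    # score_maps: (B, N, H, W) per-landmark score map on the low-res grid
    # offset_x, offset_y: (B, N, H, W) within-cell offsets in grid units
    # Returns landmark coordinates in input-image pixels, shape (B, N, 2).
    b, n, h, w = score_maps.shape
    flat = score_maps.view(b, n, -1)
    idx = flat.argmax(dim=2)                          # best cell per landmark
    grid_y = (idx // w).float()
    grid_x = (idx % w).float()
    pick = idx.unsqueeze(2)
    ox = offset_x.view(b, n, -1).gather(2, pick).squeeze(2)
    oy = offset_y.view(b, n, -1).gather(2, pick).squeeze(2)
    x = (grid_x + ox) * net_stride                    # cell index + offset, back to input scale
    y = (grid_y + oy) * net_stride
    return torch.stack([x, y], dim=2)
```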
<img src="images/speed.png" alt="speed" width="640px">
Figure 1. Comparison to existing methods on the speed-accuracy tradeoff, tested on the WFLW full test set (closer to the bottom-right corner is better).

<img src="images/detection_heads.png" alt="det_heads" width="512px">
Figure 2. Comparison of different detection heads.

Installation
- Install Python3 and PyTorch >= v1.1
- Clone this repository.
git clone https://github.com/jhb86253817/PIPNet.git
- Install the dependencies in requirements.txt.
pip install -r requirements.txt
Demo
- We use a modified version of FaceBoxes as the face detector, so go to folder FaceBoxesV2/utils and run sh make.sh to build for NMS.
- Back to folder PIPNet, create two empty folders logs and snapshots. For PIPNets, you can download our trained models from here and put them under folder snapshots/DATA_NAME/EXPERIMENT_NAME/.
- Edit run_demo.sh to choose the config file and input source you want, then run sh run_demo.sh. We support image, video, and camera as the input. Some sample predictions are shown below.
- PIPNet-ResNet18 trained on WFLW, with image images/1.jpg as the input:
<img src="images/1_out_WFLW_model.jpg" alt="1_out_WFLW_model" width="400px">
- PIPNet-ResNet18 trained on WFLW, with a snippet from Shaolin Soccer as the input:
<img src="videos/shaolin_soccer.gif" alt="shaolin_soccer" width="400px">
- PIPNet-ResNet18 trained on WFLW, with video videos/002.avi as the input:
<img src="videos/002_out_WFLW_model.gif" alt="002_out_WFLW_model" width="512px">
- PIPNet-ResNet18 trained on 300W+CelebA (GSSL), with video videos/007.avi as the input:
<img src="videos/007_out_300W_CELEBA_model.gif" alt="007_out_300W_CELEBA_model" width="512px">
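For reference, the demo pipeline is roughly: detect a face with FaceBoxes, crop and resize it to the network input, normalize, run PIPNet, and map the predicted landmarks back to the original frame. Below is a hedged sketch of those last steps using the hypothetical decode_pip helper from above; the 256x256 input size, ImageNet normalization, and model output order are assumptions, and the real entry point remains run_demo.sh.

```python
import cv2
import numpy as np
import torch

INPUT_SIZE = 256  # assumed network input resolution; the real value comes from the config file

def predict_landmarks(model, frame_bgr, bbox, device="cpu"):
    # bbox: (x1, y1, x2, y2) from the face detector.
    x1, y1, x2, y2 = [int(v) for v in bbox]
    crop = cv2.resize(frame_bgr[y1:y2, x1:x2], (INPUT_SIZE, INPUT_SIZE))
    img = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    img = (img - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]   # ImageNet statistics
    tensor = torch.from_numpy(img.transpose(2, 0, 1)).unsqueeze(0).float().to(device)

    with torch.no_grad():
        score_maps, offset_x, offset_y = model(tensor)            # hypothetical output order
    lms = decode_pip(score_maps, offset_x, offset_y)[0]           # (N, 2) in network input pixels

    # Map from the resized crop back to original frame coordinates.
    lms[:, 0] = lms[:, 0] * (x2 - x1) / INPUT_SIZE + x1
    lms[:, 1] = lms[:, 1] * (y2 - y1) / INPUT_SIZE + y1
    return lms.cpu().numpy()
```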
Training
Supervised Learning
Datasets: 300W, COFW, WFLW, AFLW, LaPa
- Download the datasets from official sources, then put them under folder data. The folder structure should look like this:
PIPNet
-- FaceBoxesV2
-- lib
-- experiments
-- logs
-- snapshots
-- data
   |-- data_300W
       |-- afw
       |-- helen
       |-- ibug
       |-- lfpw
   |-- COFW
       |-- COFW_train_color.mat
       |-- COFW_test_color.mat
   |-- WFLW
       |-- WFLW_images
       |-- WFLW_annotations
   |-- AFLW
       |-- flickr
       |-- AFLWinfo_release.mat
   |-- LaPa
       |-- train
       |-- val
       |-- test
- Go to folder lib and preprocess a dataset by running python preprocess.py DATA_NAME. For example, to process 300W:
python preprocess.py data_300W
- Back to folder PIPNet, edit run_train.sh to choose the config file you want. Then, train the model by running:
sh run_train.sh
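For context on what run_train.sh optimizes: the PIP head is trained with a multi-task objective combining a regression loss on the score maps and a regression loss on the within-cell offsets, with the exact loss types and weights set per experiment by the config files. A rough sketch under those assumptions (the weight value and positive-cell masking here are illustrative placeholders, not the repository's exact configuration):

```python
import torch
import torch.nn.functional as F

def pip_loss(pred_score, pred_ox, pred_oy, gt_score, gt_ox, gt_oy, cls_weight=10.0):
    # Score maps: L2 (MSE) regression; offsets: L1 regression at positive cells only.
    loss_cls = F.mse_loss(pred_score, gt_score)
    pos = gt_score > 0                                # supervise offsets only where a landmark lands
    loss_off = F.l1_loss(pred_ox[pos], gt_ox[pos]) + F.l1_loss(pred_oy[pos], gt_oy[pos])
    return cls_weight * loss_cls + loss_off
```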
Generalizable Semi-supervised Learning
Datasets:
- data_300W_COFW_WFLW: 300W + COFW-68 (unlabeled) + WFLW-68 (unlabeled)
- data_300W_CELEBA: 300W + CelebA (unlabeled)
- Download 300W, COFW, and WFLW as in the supervised learning setting. Download the COFW-68 test annotations from here. For 300W+CelebA, you also need to download the in-the-wild CelebA images from here, along with the face bounding boxes we detected. The folder structure should look like this:
PIPNet
-- FaceBoxesV2
-- lib
-- experiments
-- logs
-- snapshots
-- data
   |-- data_300W
       |-- afw
       |-- helen
       |-- ibug
       |-- lfpw
   |-- COFW
       |-- COFW_train_color.mat
       |-- COFW_test_color.mat
   |-- WFLW
       |-- WFLW_images
       |-- WFLW_annotations
   |-- data_300W_COFW_WFLW
       |-- cofw68_test_annotations
       |-- cofw68_test_bboxes.mat
   |-- CELEBA
       |-- img_celeba
       |-- celeba_bboxes.txt
   |-- data_300W_CELEBA
       |-- cofw68_test_annotations
       |-- cofw68_test_bboxes.mat
- Go to folder lib and preprocess a dataset by running python preprocess_gssl.py DATA_NAME.
To process data_300W_COFW_WFLW, run
python preprocess_gssl.py data_300W_COFW_WFLW
To process data_300W_CELEBA, run
python preprocess_gssl.py CELEBA
and
python preprocess_gssl.py data_300W_CELEBA
- Back to folder PIPNet, edit run_train.sh to choose the config file you want. Then, train the model by running:
sh run_train.sh
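At a high level, the GSSL setting mixes labeled 300W with unlabeled cross-domain images by letting the current model produce pseudo landmark labels on the unlabeled data. The snippet below is a deliberately generic self-training step for illustration only; the actual GSSL procedure (curriculum, sample selection, and loss weighting) is controlled by run_train.sh and the GSSL config files, and the model output order is an assumption.

```python
import torch

def pseudo_label_batch(model, unlabeled_images, net_stride=32):
    # Generate pseudo landmark labels for unlabeled images with the current model.
    # Generic self-training step; the paper's GSSL adds curriculum and weighting
    # that are not reproduced here.
    model.eval()
    with torch.no_grad():
        score_maps, offset_x, offset_y = model(unlabeled_images)  # hypothetical output order
        pseudo_lms = decode_pip(score_maps, offset_x, offset_y, net_stride)
    model.train()
    return pseudo_lms  # used as targets for the unlabeled part of the next training batch
```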
Evaluation
- Edit run_test.sh to choose the config file you want. Then, test the model by running:
sh run_test.sh
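The metric reported for these benchmarks is the normalized mean error (NME): the mean per-landmark Euclidean error divided by a normalizing distance such as the inter-ocular distance. A minimal sketch of that computation follows; the eye-corner indices are the 98-point WFLW outer-corner convention and are assumptions for illustration, since the repo's test script already computes the metric.

```python
import numpy as np

def nme(pred, gt, left_eye_idx=60, right_eye_idx=72):
    # pred, gt: (N, 2) landmark arrays for one face.
    # Indices 60/72 follow the WFLW outer eye corners (assumed); other datasets differ.
    norm = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    return np.mean(np.linalg.norm(pred - gt, axis=1)) / norm
```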
Community
- lite.ai.toolkit: Provides MNN C++, NCNN C++, TNN C++, and ONNXRuntime C++ versions of PIPNet.
- torchlm: Provides a PyTorch re-implementation of PIPNet with ONNX export; installable via pip.
Citation
@article{JLS21,
title={Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild},
author={Haibo Jin and Shengcai Liao and Ling Shao},
journal={International Journal of Computer Vision},
publisher={Springer Science and Business Media LLC},
ISSN={1573-1405},
url={http://dx.doi.org/10.1007/s11263-021-01521-4},
DOI={10.1007/s11263-021-01521-4},
year={2021},
month={Sep}
}
Acknowledgement
We thank the following great works: