# FaceBench

[CVPR 2025] FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
## Overview

In this work, we introduce FaceBench, a dataset featuring hierarchical multi-view and multi-level attributes, designed to assess the comprehensive face perception abilities of multimodal large language models (MLLMs). We construct a hierarchical facial attribute structure that spans five views with up to three levels of attributes, totaling over 210 attributes and 700 attribute values. Based on this structure, FaceBench provides 49,919 visual question-answering (VQA) pairs for evaluation and 23,841 pairs for fine-tuning. We further develop a strong face perception MLLM baseline, Face-LLaVA, by training on our face VQA data.
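Each split is distributed as a JSON-Lines file (e.g. `test.jsonl`, one VQA pair per line). A minimal loading sketch follows; note that the field names `image`, `question`, `answer`, and `question_type` are assumptions for illustration, not the confirmed schema:

```python
import json
from collections import defaultdict

def load_vqa_pairs(jsonl_path):
    """Load VQA pairs from a JSON-Lines file (one JSON object per line)."""
    pairs = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                pairs.append(json.loads(line))
    return pairs

def group_by_type(pairs):
    """Bucket pairs by question type (TFQ/SCQ/MCQ/OEQ).

    NOTE: 'question_type' is an assumed field name; check the released
    test.jsonl for the actual schema.
    """
    groups = defaultdict(list)
    for p in pairs:
        groups[p.get("question_type", "unknown")].append(p)
    return dict(groups)
```

Grouping by question type mirrors how the benchmark reports results, since each of the four question formats is scored separately.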
<div align="center"><img src="./assets/overview.png" width="100%" height="100%"></div>

Distribution of visual question-answer pairs:

<div align="center"><img src="./assets/VQAs.jpg" width="100%" height="100%"></div>

Some samples from our dataset:

<div align="center"><img src="./assets/example.png" width="100%" height="100%"></div>

## News
- [2025-08-20] The Face-LLaVA model is released on HuggingFace 🤗.
- [2025-03-27] The paper is released on arXiv 🔥.
## TODO
- [X] Release the Face-LLaVA model.
- [X] Release the evaluation code.
- [X] Release the dataset.
## Evaluation

### Model inference

```shell
OMP_NUM_THREADS=8 CUDA_VISIBLE_DEVICES=0 python evaluation/inference.py \
    --data-dir ./datasets/example/test.jsonl \
    --images-dir ./datasets/example/images/ \
    --model-name face_llava_1_5_13b \
    --question-type "TFQ, SCQ, MCQ, OEQ" \
    --save-dir "./responses-and-results/"
```
### Calculate metrics

```shell
OMP_NUM_THREADS=8 CUDA_VISIBLE_DEVICES=5 python evaluation/evaluation.py \
    --data-path ./responses-and-results/face_llava_1_5_13b_test_responses.jsonl
```
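The evaluation script scores the saved responses file. As a rough sketch of what per-question-type accuracy could look like, the snippet below assumes each response line carries `question_type`, a ground-truth `answer`, and a model `response` field; both the field names and the exact-match scoring are simplifying assumptions, and the official `evaluation.py` defines the real schema and matching rules:

```python
import json
from collections import defaultdict

def accuracy_by_type(responses_path):
    """Compute exact-match accuracy per question type from a responses
    JSONL file.

    NOTE: the field names ('question_type', 'answer', 'response') are
    assumptions for illustration; the official evaluation.py likely uses
    more tolerant matching, especially for open-ended (OEQ) answers.
    """
    correct, total = defaultdict(int), defaultdict(int)
    with open(responses_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            qt = rec.get("question_type", "unknown")
            total[qt] += 1
            # Case- and whitespace-insensitive exact match.
            if rec.get("response", "").strip().lower() == rec.get("answer", "").strip().lower():
                correct[qt] += 1
    return {qt: correct[qt] / total[qt] for qt in total}
```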
## Results

Experimental results of various MLLMs and our Face-LLaVA across five facial attribute views:

<div align="center"><img src="./assets/five-view-results.jpg" width="100%" height="100%"></div>

Experimental results of various MLLMs and our Face-LLaVA across Level 1 facial attributes:

<div align="center"><img src="./assets/level-1-results.jpg" width="100%" height="100%"></div>

## Citation
If you find this work useful for your research, please consider citing our paper:

```bibtex
@inproceedings{wang2025facebench,
  title={FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs},
  author={Wang, Xiaoqin and Ma, Xusen and Hou, Xianxu and Ding, Meidan and Li, Yudong and Chen, Junliang and Chen, Wenting and Peng, Xiaoyang and Shen, Linlin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

@article{wang2025facebench_arxiv,
  title={FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs},
  author={Wang, Xiaoqin and Ma, Xusen and Hou, Xianxu and Ding, Meidan and Li, Yudong and Chen, Junliang and Chen, Wenting and Peng, Xiaoyang and Shen, Linlin},
  journal={arXiv preprint arXiv:2503.21457},
  year={2025}
}
```
If you have any questions, feel free to open an issue or contact me by email at wangxiaoqin2022@email.szu.edu.cn.
## Acknowledgments

This work is heavily based on LLaVA. Thanks to the authors for their great work.
