UniFER: Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models
🌟 Official repository for the paper "Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models"
[📖 Paper] [🤗 Dataset] [🤗 Model]
👀 About UniFER
Multimodal Large Language Models (MLLMs) have revolutionized numerous research fields, including computer vision and affective computing. As a pivotal challenge in this interdisciplinary domain, facial expression recognition (FER) has evolved from separate, domain-specific models to more unified approaches. One promising avenue to unify FER tasks is converting conventional FER datasets into visual question-answering (VQA) formats, enabling the direct application of powerful generalist MLLMs for inference. However, despite the success of cutting-edge MLLMs in various tasks, their performance on FER tasks remains largely unexplored. To address this gap, we provide FERBench, a systematic benchmark that evaluates 20 state-of-the-art MLLMs on four widely used FER datasets. Our results reveal that, while MLLMs exhibit good classification performance, they still face significant limitations in reasoning and interpretability.
<p align="center"> <img src="figs/ferbench.png" width="90%"> <br> </p>

To this end, we introduce post-training strategies aimed at enhancing the facial expression reasoning capabilities of MLLMs. Specifically, we curate two high-quality, large-scale datasets: UniFER-CoT-230K for cold-start initialization and UniFER-RLVR-360K for reinforcement learning with verifiable rewards (RLVR). Building upon them, we develop a unified and interpretable FER foundation model, UniFER-7B, which outperforms many open-source and closed-source generalist MLLMs (e.g., Gemini-2.5-Pro and Qwen2.5-VL-72B).
<p align="center"> <img src="figs/unifer_framework.png" width="90%"> <br> </p>

🔥 Datasets
Our curated datasets are built from four widely used FER datasets: RAF-DB, FERPlus, AffectNet, and SFEW 2.0. Please download the corresponding images from their official websites before use.
Installation
Clone the repository:
git clone https://github.com/zfkarl/UniFER.git
cd UniFER
Create a conda environment:
conda create -n r1-v python=3.11
conda activate r1-v
Please follow the official instructions here to install both PyTorch and additional dependencies.
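After installation, a quick sanity check (a minimal sketch, not a script from this repository) can confirm that PyTorch imports correctly and CUDA is visible:

```python
# Minimal post-install sanity check; safe to run even if PyTorch is absent.
try:
    import torch
    status = f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"
except ImportError:
    status = "PyTorch is not installed yet; follow the install instructions above."
print(status)
```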
FERBench
The four proposed subsets of FERBench are stored in the following JSON files:
eval_rafdb/data/rafdb_qa.json
eval_ferplus/data/ferplus_qa.json
eval_affectnet/data/affectnet_qa.json
eval_sfew_2.0/data/sfew_2.0_qa.json
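Each file stores the benchmark as VQA pairs. As a rough illustration (the field names "image", "question", and "answer" below are assumptions, not the repository's confirmed schema), such a file can be loaded and its label distribution inspected with standard tooling:

```python
import json
from collections import Counter

# Toy stand-in for one FERBench QA file; the keys are hypothetical
# field names used only for illustration.
qa_json = json.dumps([
    {"image": "img_001.jpg", "question": "What expression is shown?", "answer": "happiness"},
    {"image": "img_002.jpg", "question": "What expression is shown?", "answer": "sadness"},
    {"image": "img_003.jpg", "question": "What expression is shown?", "answer": "happiness"},
])

qa = json.loads(qa_json)  # with a real file: json.load(open(path))
dist = Counter(item["answer"] for item in qa)
print(dist.most_common())  # [('happiness', 2), ('sadness', 1)]
```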
UniFER-CoT-230K
Download our dataset, and put the json file UniFER_CoT_230K.json in:
data/UniFER_CoT_230K.json
UniFER-RLVR-360K
Download our dataset, and put the json file UniFER_RLVR_360K.json in:
data/UniFER_RLVR_360K.json
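Before launching training, it can help to verify that both JSON files are in place. A small sketch, assuming it is run from the repository root:

```python
from pathlib import Path

# Paths where the training data is expected, per the steps above.
expected = [
    Path("data/UniFER_CoT_230K.json"),
    Path("data/UniFER_RLVR_360K.json"),
]

missing = [str(p) for p in expected if not p.exists()]
if missing:
    print("Missing dataset files:", ", ".join(missing))
else:
    print("All training data files found.")
```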
🚀 Training
Stage 1: Cold Start SFT
cd train_unifer/src/scripts
bash run_sft_fer.sh
Stage 2: RLVR GRPO Training
cd train_unifer/src/scripts
bash run_grpo_vllm.sh
💫 Evaluation
After the two-stage post-training above, the resulting model UniFER-7B can be used for inference and evaluation. Rename the output directory Qwen2.5-VL-7B-FER-GRPO-VLLM-8GPU to UniFER-7B before running inference. Alternatively, you can download our provided checkpoints and use them directly.
Inference and Evaluation
On RAFDB:
cd eval_rafdb/code
python infer_unifer.py
python eval_unifer.py
On FERPlus:
cd eval_ferplus/code
python infer_unifer.py
python eval_unifer.py
On AffectNet:
cd eval_affectnet/code
python infer_unifer.py
python eval_unifer.py
On SFEW2.0:
cd eval_sfew_2.0/code
python infer_unifer.py
python eval_unifer.py
Overall Performance:
cd eval_total/code
python eval_unifer.py
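At its core, this evaluation reduces to comparing predicted labels against ground truth. A minimal accuracy computation of this kind (the prediction format here is an assumption; eval_unifer.py may differ in detail):

```python
# Toy predictions and labels; accuracy is the fraction of exact matches.
preds  = ["happiness", "anger",    "sadness", "neutral"]
labels = ["happiness", "surprise", "sadness", "neutral"]

accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)
print(f"Accuracy: {accuracy:.2%}")  # Accuracy: 75.00%
```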
🥳 Acknowledgements
We would like to thank R1-V and Video-R1, which served as the foundation for our repository.
✅ Citation
If you find UniFER useful for your research and applications, please kindly cite using this BibTeX:
@misc{zhang2025rethinkingfacialexpressionrecognition,
  title={Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond},
  author={Fan Zhang and Haoxuan Li and Shengju Qian and Xin Wang and Zheng Lian and Hao Wu and Zhihong Zhu and Yuan Gao and Qiankun Li and Yefeng Zheng and Zhouchen Lin and Pheng-Ann Heng},
  year={2025},
  eprint={2511.00389},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.00389},
}
🔥 Please contact fzhang@link.cuhk.edu.hk if you would like to contribute to the leaderboard or encounter any problems.
