
UniFER: Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models

VQA Facial Expression Recognition Emotion Reasoning UniFER-7B

<p align="center"> <img src="./figs/logo.png" width="100%" height="100%"> </p>

🌟 Official repository for the paper "Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models"

[📖 Paper] [🤗 Dataset] [🤗 Model]

👀 About UniFER

Multimodal Large Language Models (MLLMs) have revolutionized numerous research fields, including computer vision and affective computing. As a pivotal challenge in this interdisciplinary domain, facial expression recognition (FER) has evolved from separate, domain-specific models to more unified approaches. One promising avenue to unify FER tasks is converting conventional FER datasets into visual question-answering (VQA) formats, enabling the direct application of powerful generalist MLLMs for inference. However, despite the success of cutting-edge MLLMs in various tasks, their performance on FER tasks remains largely unexplored. To address this gap, we provide FERBench, a systematic benchmark that incorporates 20 state-of-the-art MLLMs across four widely used FER datasets. Our results reveal that, while MLLMs exhibit good classification performance, they still face significant limitations in reasoning and interpretability.

<p align="center"> <img src="figs/ferbench.png" width="90%"> <br> </p>

To this end, we introduce post-training strategies aimed at enhancing the facial expression reasoning capabilities of MLLMs. Specifically, we curate two high-quality, large-scale datasets: UniFER-CoT-230K for cold-start initialization and UniFER-RLVR-360K for reinforcement learning with verifiable rewards (RLVR). Building upon them, we develop a unified and interpretable FER foundation model termed UniFER-7B, which outperforms many open-source and closed-source generalist MLLMs (e.g., Gemini-2.5-Pro and Qwen2.5-VL-72B).

<p align="center"> <img src="figs/unifer_framework.png" width="90%"> <br> </p>

🔥 Datasets

Our curated datasets are built from four widely used FER datasets: RAF-DB, FERPlus, AffectNet, and SFEW 2.0. Please download the corresponding images from their official websites before use.

Installation

Clone the repository:

git clone https://github.com/zfkarl/UniFER.git
cd UniFER

Create a conda environment:

conda create -n r1-v python=3.11
conda activate r1-v

Please follow the official instructions here to install both PyTorch and additional dependencies.

FERBench

The four subsets of FERBench are stored in the following JSON files:

eval_rafdb/data/rafdb_qa.json
eval_ferplus/data/ferplus_qa.json
eval_affectnet/data/affectnet_qa.json
eval_sfew_2.0/data/sfew_2.0_qa.json
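
As a quick sanity check, a subset can be loaded and its label distribution summarized with a few lines of Python. Note that the field names (`image`, `question`, `answer`) below are a hypothetical VQA-style schema for illustration; the actual keys in the released JSON files may differ.

```python
import json
import os
import tempfile
from collections import Counter

def load_qa(path):
    """Load a FERBench QA subset and return (entries, label distribution)."""
    with open(path) as f:
        entries = json.load(f)
    return entries, Counter(e["answer"] for e in entries)

# Hypothetical entries mimicking e.g. eval_rafdb/data/rafdb_qa.json.
sample = [
    {"image": "rafdb/test_0001.jpg",
     "question": "What is the facial expression of the person in the image?",
     "answer": "happiness"},
    {"image": "rafdb/test_0002.jpg",
     "question": "What is the facial expression of the person in the image?",
     "answer": "sadness"},
]

# Round-trip through a temp file to mimic reading a real subset from disk.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)
entries, dist = load_qa(f.name)
os.unlink(f.name)
print(len(entries), dist["happiness"])  # 2 1
```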

UniFER-CoT-230K

Download our dataset and place the JSON file UniFER_CoT_230K.json at:

data/UniFER_CoT_230K.json

UniFER-RLVR-360K

Download our dataset and place the JSON file UniFER_RLVR_360K.json at:

data/UniFER_RLVR_360K.json
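
Before launching training, it can help to verify both files are in place. A minimal sketch (the `missing_files` helper is ours, not part of the repository):

```python
import os
import tempfile

# Expected locations of the curated training data, per the paths above.
REQUIRED = ["data/UniFER_CoT_230K.json", "data/UniFER_RLVR_360K.json"]

def missing_files(root, paths):
    """Return the subset of relative paths that do not exist under root."""
    return [p for p in paths if not os.path.isfile(os.path.join(root, p))]

# Demo against a temp directory where only the CoT file is present.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "data"))
open(os.path.join(root, REQUIRED[0]), "w").close()
print(missing_files(root, REQUIRED))  # ['data/UniFER_RLVR_360K.json']
```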

🚀 Training

Stage 1: Cold Start SFT

cd train_unifer/src/scripts
bash run_sft_fer.sh

Stage 2: RLVR GRPO Training

cd train_unifer/src/scripts
bash run_grpo_vllm.sh

💫 Evaluation

After the two-stage post-training above, the resulting model UniFER-7B can be used for inference and evaluation. Rename the output directory Qwen2.5-VL-7B-FER-GRPO-VLLM-8GPU to UniFER-7B before running inference. Alternatively, you can directly download our provided checkpoints.

Inference and Evaluation

On RAF-DB:

cd eval_rafdb/code
python infer_unifer.py 
python eval_unifer.py

On FERPlus:

cd eval_ferplus/code
python infer_unifer.py 
python eval_unifer.py

On AffectNet:

cd eval_affectnet/code
python infer_unifer.py 
python eval_unifer.py

On SFEW 2.0:

cd eval_sfew_2.0/code
python infer_unifer.py 
python eval_unifer.py

Overall Performance:

cd eval_total/code
python eval_unifer.py
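
The core metric is classification accuracy over the predicted expression labels. A minimal sketch of the kind of comparison eval_unifer.py performs (the actual prediction file format and any label normalization in the repository may differ):

```python
def accuracy(preds, golds):
    """Fraction of predicted labels matching ground truth, case-insensitive."""
    assert len(preds) == len(golds)
    correct = sum(p.strip().lower() == g.strip().lower()
                  for p, g in zip(preds, golds))
    return correct / len(golds)

# Toy example: 3 of 4 predictions match the ground-truth labels.
preds = ["Happiness", "sadness", "anger", "neutral"]
golds = ["happiness", "sadness", "fear", "neutral"]
print(accuracy(preds, golds))  # 0.75
```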

🥳 Acknowledgements

We would like to thank R1-V and video-r1, which served as the foundations for our repository.

:white_check_mark: Citation

If you find UniFER useful for your research and applications, please kindly cite using this BibTeX:

@misc{zhang2025rethinkingfacialexpressionrecognition,
      title={Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond}, 
      author={Fan Zhang and Haoxuan Li and Shengju Qian and Xin Wang and Zheng Lian and Hao Wu and Zhihong Zhu and Yuan Gao and Qiankun Li and Yefeng Zheng and Zhouchen Lin and Pheng-Ann Heng},
      year={2025},
      eprint={2511.00389},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.00389}, 
}

🔥 Please contact fzhang@link.cuhk.edu.hk if you would like to contribute to the leaderboard or run into any problems.
