
S2D

[TAFFC 2024] The official implementation of paper: From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos


From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos

🔥 [Recognized as a Highly Cited Paper by Web of Science (Top 1%)]

From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos<br> Yin Chen<sup>†</sup>, Jia Li<sup>†*</sup>, Shiguang Shan, Meng Wang, and Richang Hong

<img width="1024" alt="image" src="https://github.com/user-attachments/assets/c629e924-cec2-46c9-9e9a-369b4e6d0aef">

📰 News

[2025.9.15] Our new paper S4D has been accepted by IEEE Transactions on Affective Computing.

[2025.9.15] S2D has been recognized as a Highly Cited Paper by Clarivate.

[2024.9.5] The fine-tuned checkpoints are available.

[2024.9.2] The code and pre-trained models are available.

[2024.8.28] The paper is accepted by IEEE Transactions on Affective Computing.

[2023.12.5] ~~Code and pre-trained models will be released here~~.

🚀 Main Results

Dynamic Facial Expression Recognition

<img width="1024" alt="image" src="https://github.com/user-attachments/assets/2144837d-9fd5-4f88-8447-1f6049b38e9a"> <img width="1024" alt="image" src="https://github.com/user-attachments/assets/4a80731e-666e-4cef-9f74-5f794eea7116">

Static Facial Expression Recognition

<img width="1024" alt="image" src="https://github.com/user-attachments/assets/89a47ea3-1036-4124-927c-563af8007d1f">

Visualization

<img width="1024" alt="image" src="https://github.com/user-attachments/assets/aea1385d-0d1b-4f5e-8775-087a30363751">

Fine-tune with pre-trained weights

1. Download the pre-trained weights from Baidu Drive, Google Drive, or Hugging Face, and move them to the `ckpts` directory.

2. Run the following commands to fine-tune the model on the target dataset.

```shell
conda create -n s2d python=3.9
conda activate s2d
pip install -r requirements.txt
bash run.sh
```

📋 Reported Results and Fine-tuned Weights

The fine-tuned checkpoints can be downloaded from Baidu Drive or Huggingface.

<table border="1" cellspacing="0" cellpadding="5">
  <tr>
    <th rowspan="2">Datasets</th>
    <th colspan="2">w/o oversampling</th>
    <th colspan="2">w/ oversampling</th>
  </tr>
  <tr>
    <th>UAR</th> <th>WAR</th> <th>UAR</th> <th>WAR</th>
  </tr>
  <tr><td colspan="5" style="text-align: center;">FERV39K</td></tr>
  <tr> <td>FERV39K</td> <td>41.28</td> <td>52.56</td> <td>43.97</td> <td>46.21</td> </tr>
  <tr><td colspan="5" style="text-align: center;">DFEW</td></tr>
  <tr> <td>DFEW01</td> <td>61.56</td> <td>76.16</td> <td>64.80</td> <td>75.35</td> </tr>
  <tr> <td>DFEW02</td> <td>59.93</td> <td>73.99</td> <td>62.54</td> <td>72.53</td> </tr>
  <tr> <td>DFEW03</td> <td>61.33</td> <td>76.41</td> <td>66.47</td> <td>75.87</td> </tr>
  <tr> <td>DFEW04</td> <td>62.75</td> <td>76.31</td> <td>66.03</td> <td>74.48</td> </tr>
  <tr> <td>DFEW05</td> <td>63.51</td> <td>77.27</td> <td>67.43</td> <td>76.80</td> </tr>
  <tr> <td>DFEW</td> <td>61.82</td> <td>76.03</td> <td>65.45</td> <td>74.81</td> </tr>
  <tr><td colspan="5" style="text-align: center;">MAFW</td></tr>
  <tr> <td>MAFW01</td> <td>32.78</td> <td>46.76</td> <td>36.16</td> <td>44.21</td> </tr>
  <tr> <td>MAFW02</td> <td>40.43</td> <td>55.96</td> <td>41.94</td> <td>51.22</td> </tr>
  <tr> <td>MAFW03</td> <td>47.01</td> <td>62.08</td> <td>48.08</td> <td>61.48</td> </tr>
  <tr> <td>MAFW04</td> <td>45.66</td> <td>62.61</td> <td>47.67</td> <td>60.64</td> </tr>
  <tr> <td>MAFW05</td> <td>43.45</td> <td>59.42</td> <td>43.16</td> <td>58.55</td> </tr>
  <tr> <td>MAFW</td> <td>41.86</td> <td>57.37</td> <td>43.40</td> <td>55.22</td> </tr>
</table>
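The table reports UAR (unweighted average recall, i.e. the macro average of per-class recalls) and WAR (weighted average recall, i.e. overall accuracy); the `DFEW` and `MAFW` rows without a fold suffix are plain means over the five folds. A minimal sketch of both metrics in plain Python (function names are ours for illustration, not from this repo):

```python
from collections import defaultdict

def uar(labels, preds):
    """Unweighted Average Recall: mean of per-class recalls, in percent."""
    correct, total = defaultdict(int), defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += (p == y)
    return 100 * sum(correct[c] / total[c] for c in total) / len(total)

def war(labels, preds):
    """Weighted Average Recall: overall accuracy, in percent."""
    return 100 * sum(p == y for y, p in zip(labels, preds)) / len(labels)

# The cross-fold rows are simple means over the five folds,
# e.g. DFEW UAR without oversampling:
dfew_uar_folds = [61.56, 59.93, 61.33, 62.75, 63.51]
print(round(sum(dfew_uar_folds) / 5, 2))  # → 61.82, matching the DFEW row
```

Because UAR weights every class equally, it is the more informative metric on class-imbalanced datasets such as MAFW, which is why oversampling tends to raise UAR while lowering WAR in the table.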

✏️ Citation

If you find this work helpful, please consider citing:

@ARTICLE{10663980,
  author={Chen, Yin and Li, Jia and Shan, Shiguang and Wang, Meng and Hong, Richang},
  journal={IEEE Transactions on Affective Computing}, 
  title={From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos}, 
  year={2024},
  volume={},
  number={},
  pages={1-15},
  keywords={Adaptation models;Videos;Computational modeling;Feature extraction;Transformers;Task analysis;Face recognition;Dynamic facial expression recognition;emotion ambiguity;model adaptation;transfer learning},
  doi={10.1109/TAFFC.2024.3453443}}

@ARTICLE{11207542,
  author={Chen, Yin and Li, Jia and Zhang, Yu and Hu, Zhenzhen and Shan, Shiguang and Wang, Meng and Hong, Richang},
  journal={IEEE Transactions on Affective Computing}, 
  title={Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using Static Expression Data}, 
  year={2025},
  volume={},
  number={},
  pages={1-15},
  keywords={Videos;Adaptation models;Face recognition;Transformers;Semantics;Multitasking;Computer vision;Spatiotemporal phenomena;Correlation;Emotion recognition;Dynamic facial expression recognition;mixture of experts;self-supervised learning;vision transformer},
  doi={10.1109/TAFFC.2025.3623135}}


Star History

*(Star History chart image)*
