SPECTRE: Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos
<a href='https://youtu.be/P1kqrxWNizI'>
<img src='https://img.shields.io/badge/Youtube-Video-red?style=flat&logo=youtube&logoColor=red' alt='Youtube Video'>
</a>
This is the official PyTorch implementation of the paper:
Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos
Panagiotis P. Filntisis, George Retsinas, Foivos Paraperas-Papantoniou, Athanasios Katsamanis, Anastasios Roussos, and Petros Maragos
arXiv 2022
News
- 🔧 SPECTRE has been integrated into EMOCAv2!
- 🆕 Check out our new work on 3D face reconstruction, SMIRK, which achieves state-of-the-art results on facial expressions! Learn more on the project page and in the paper.
Installation
Clone the repo and its submodules:
git clone --recurse-submodules -j4 https://github.com/filby89/spectre
cd spectre
You need a working installation of PyTorch with Python 3.6 or higher, as well as PyTorch3D. You can use the following commands to create a working environment:
conda create -n "spectre" python=3.8
conda install -c pytorch pytorch=1.11.0 torchvision torchaudio # you might need to select cudatoolkit version here by adding e.g. cudatoolkit=11.3
conda install -c conda-forge -c fvcore fvcore iopath
conda install pytorch3d -c pytorch3d
pip install -r requirements.txt # install the rest of the requirements
Installing a compatible combination of PyTorch3D and PyTorch can be a bit tricky. For development we used PyTorch3D 0.6.1 with PyTorch 1.10.0; PyTorch3D 0.6.2 with PyTorch 1.11.0 is also compatible.
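As a quick sanity check of an installed environment, the two reported combinations can be checked with a small helper (illustrative only; `is_known_good` is not part of the repo, and the known-good set is just the two pairs mentioned above):

```python
# Known-good (PyTorch, PyTorch3D) pairs reported in this README -- illustrative only.
KNOWN_GOOD = {("1.10.0", "0.6.1"), ("1.11.0", "0.6.2")}

def is_known_good(torch_version: str, pytorch3d_version: str) -> bool:
    """Check an installed version pair against the combinations reported to work."""
    base = torch_version.split("+")[0]  # drop local build tags like "+cu113"
    return (base, pytorch3d_version) in KNOWN_GOOD
```

You would feed it `torch.__version__` and `pytorch3d.__version__`; other pairs may well work, they just have not been reported.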
Install the face_alignment and face_detection packages:
cd external/face_alignment
pip install -e .
cd ../face_detection
git lfs pull
pip install -e .
cd ../..
You may need to install git-lfs to run the above commands. More details:
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
Download the FLAME model and the pretrained SPECTRE model:
pip install gdown
bash quick_install.sh
Demo
Samples are included in the samples folder. You can run the demo with:
python demo.py --input samples/LRS3/0Fi83BHQsMA_00002.mp4 --audio
The --audio flag extracts the audio from the input video and adds it to the output shape video for visualization purposes (ffmpeg is required for video creation).
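The muxing step can be sketched as a single ffmpeg invocation (a hypothetical reconstruction, not the actual demo.py internals; `build_mux_command` and the file names are illustrative):

```python
def build_mux_command(input_video: str, rendered_video: str, output_video: str) -> list:
    """Build an ffmpeg command that keeps the rendered shape frames and
    adds the original clip's audio track to them."""
    return [
        "ffmpeg", "-y",
        "-i", rendered_video,          # silent rendered shape video
        "-i", input_video,             # original clip, used as the audio source
        "-map", "0:v", "-map", "1:a",  # video from input 0, audio from input 1
        "-c:v", "copy",                # copy the rendered frames without re-encoding
        "-shortest",
        output_video,
    ]
```

The resulting list can be passed directly to `subprocess.run`.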
Training and Testing
In order to train the model you need to download the trainval and test sets of the LRS3 dataset. After downloading the dataset, run the following command to extract frames and audio from the videos (audio is not needed for training, but it is useful for visualizing the results):
python utils/extract_frames_and_audio.py --dataset_path ./data/LRS3
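The per-video work of that script can be sketched as two ffmpeg commands (a sketch under assumptions about the output layout; the actual paths and options are decided by utils/extract_frames_and_audio.py):

```python
from pathlib import Path

def build_extract_commands(video_path: str):
    """Build ffmpeg commands dumping a video's frames as PNGs and its audio as WAV.
    The output layout (frames in a sibling folder, audio as a sibling .wav)
    is an assumption for illustration."""
    stem = Path(video_path).with_suffix("")
    frames_cmd = ["ffmpeg", "-i", video_path, f"{stem}/%06d.png"]  # one PNG per frame
    audio_cmd = ["ffmpeg", "-i", video_path, f"{stem}.wav"]        # audio track
    return frames_cmd, audio_cmd
```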
After downloading and preprocessing the dataset, download the rest needed assets:
bash get_training_data.sh
This command downloads the original DECA pretrained model, the ResNet50 emotion recognition model provided by EMOCA, the pretrained lipreading model and detected landmarks for the videos of the LRS3 dataset provided by Visual_Speech_Recognition_for_Multiple_Languages.
Finally, you need to create a texture model using the BFM_to_FLAME repository. Due to licensing restrictions we are not allowed to share it.
Now, you can run the following command to train the model:
python main.py --output_dir logs --landmark 50 --relative_landmark 25 --lipread 2 --expression 0.5 --epochs 6 --LRS3_path data/LRS3 --LRS3_landmarks_path data/LRS3_landmarks
and then test it on the LRS3 dataset test set:
python main.py --test --output_dir logs --model_path logs/model.tar --LRS3_path data/LRS3 --LRS3_landmarks_path data/LRS3_landmarks
and run lipreading with AV-hubert:
# the glob is quoted so that the script, not the shell, expands it
python utils/run_av_hubert.py --videos "logs/test_videos_000000/*_mouth.avi" --LRS3_path data/LRS3
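The training command above balances several loss terms through CLI weights (--landmark 50, --relative_landmark 25, --lipread 2, --expression 0.5). How such weights combine into a single objective can be sketched as follows (illustrative; the actual weighting lives inside main.py and the trainer):

```python
def total_loss(losses: dict, weights: dict) -> float:
    """Weighted sum of the individual loss terms, one weight per term."""
    return sum(weights[name] * value for name, value in losses.items())

# Weights matching the training command above.
weights = {"landmark": 50, "relative_landmark": 25, "lipread": 2, "expression": 0.5}
```

Raising a weight (e.g. --lipread) makes the optimizer trade other terms for that one, which is how SPECTRE emphasizes mouth fidelity.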
Acknowledgements
This repo has been heavily based on the original implementation of DECA. We also acknowledge the following repositories, from which we have benefited greatly:
Citation
If your research benefits from this repository, please consider citing the following:
@misc{filntisis2022visual,
title = {Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos},
author = {Filntisis, Panagiotis P. and Retsinas, George and Paraperas-Papantoniou, Foivos and Katsamanis, Athanasios and Roussos, Anastasios and Maragos, Petros},
publisher = {arXiv},
year = {2022},
}
