Lip2Speech [PDF]
A pipeline for lip reading a silent speaking face in a video and generating speech for the lip-read content, i.e., Lip to Speech Synthesis.
<p align="center"> <img src="images/overview.png" alt="overview" width="600"/></br> </p>

Figure: sample Video Input, Processed Input, and Speech Output.
Architecture Overview
<p align="center"> <img src="images/method_overview.png" alt="method" width="600"/></br> </p>

LRW
Figure: Alignment Plot and Mel-spectrogram Output.
Usage
Demo
The pretrained model is available here [265.12 MB]
Download the pretrained model and place it inside the savedmodels directory. To visualize the results, run demo.py:
```bash
python3 demo.py
```
Default arguments
- dataset: LRW (10 Samples)
- root: Datasets/SAMPLE_LRW
- model_path: savedmodels/lip2speech_final.pth
- encoding: voice
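Before running demo.py, it can help to confirm that the files named in the default arguments are in place. The following is a minimal sketch, assuming the default paths listed above and that they are resolved relative to the repository root; the helper `missing_demo_files` is hypothetical and not part of the repository.

```python
from __future__ import annotations

from pathlib import Path

# Default locations taken from the demo's default arguments
# (assumed to be relative to the repository root).
DEFAULT_PATHS = {
    "model_path": Path("savedmodels/lip2speech_final.pth"),
    "root": Path("Datasets/SAMPLE_LRW"),
}


def missing_demo_files(base_dir: Path = Path(".")) -> list[str]:
    """Hypothetical helper: return the names of default arguments
    whose paths do not exist under base_dir."""
    missing = []
    for name, rel_path in DEFAULT_PATHS.items():
        if not (base_dir / rel_path).exists():
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_demo_files()
    if problems:
        print("Missing:", ", ".join(f"{n} ({DEFAULT_PATHS[n]})" for n in problems))
    else:
        print("All default paths found; run: python3 demo.py")
```

Run it from the repository root; if it reports `model_path`, the downloaded checkpoint has not yet been placed in savedmodels.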
Evaluate
Computes the ESTOI (Extended Short-Time Objective Intelligibility) score of the given Lip2Speech model (higher is better):
```bash
python3 evaluate.py --dataset LRW --root Datasets/LRW --model_path savedmodels/lip2speech_final.pth
```
Train
To train the model, run train.py:
```bash
python3 train.py --dataset LRW --root Datasets/LRW --finetune_model_path savedmodels/lip2speech_final.pth
```
- finetune_model_path (optional): path to a model used as the base for fine-tuning on the dataset.
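When launching several training or evaluation runs from a script, the train and evaluate commands can be assembled programmatically. Below is a minimal sketch that uses only the flags shown in this README (--dataset, --root, --model_path, --finetune_model_path); the helper `build_command` is hypothetical. An optional flag such as --finetune_model_path is left out when its value is None.

```python
from __future__ import annotations


def build_command(script: str, **flags: str | None) -> list[str]:
    """Hypothetical helper: build an argv list such as
    ['python3', 'train.py', '--dataset', 'LRW', ...].

    Flags whose value is None are skipped, so optional
    arguments can simply be omitted.
    """
    argv = ["python3", script]
    for name, value in flags.items():
        if value is not None:
            argv += [f"--{name}", value]
    return argv


if __name__ == "__main__":
    # Train without a base model: finetune_model_path is None, so it is dropped.
    cmd = build_command(
        "train.py",
        dataset="LRW",
        root="Datasets/LRW",
        finetune_model_path=None,
    )
    print(" ".join(cmd))
    # Pass cmd to subprocess.run(cmd, check=True) to actually launch the job.
```

For example, the evaluate command above corresponds to `build_command("evaluate.py", dataset="LRW", root="Datasets/LRW", model_path="savedmodels/lip2speech_final.pth")`. Returning an argv list instead of a single string avoids shell quoting issues when paths contain spaces.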
Acknowledgement
Citation
If you use this research in your work, please cite it using the following metadata.
```bibtex
@misc{millerdurai2022faceilltellspeak,
  title={Show Me Your Face, And I'll Tell You How You Speak},
  author={Christen Millerdurai and Lotfy Abdel Khaliq and Timon Ulrich},
  year={2022},
  eprint={2206.14009},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2206.14009},
}

@software{Millerdurai_Lip2Speech_2021,
  author = {Millerdurai, Christen and Abdel Khaliq, Lotfy and Ulrich, Timon},
  month = {8},
  title = {{Lip2Speech}},
  url = {https://github.com/Chris10M/Lip2Speech},
  version = {1.0.0},
  year = {2021}
}
```