SpeechTransProgress
Tracking the progress in end-to-end speech translation
End-to-End Speech Translation Progress
Tutorial
- EACL 2021 tutorial: Speech Translation
- Blog: Getting Started with End-to-End Speech Translation
- ACL 2020 Theme paper: Speech Translation and the End-to-End Promise: Taking Stock of Where We Are
- INTERSPEECH 2019 survey talk: Spoken Language Translation
Data
| Corpus | Direction | Target | Duration | License |
|---------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------:|:--------:|-------------------------------------------------------------------------------------------------------------------:|
| CoVoST 2 | {Fr, De, Es, Ca, It, Ru, Zh, Pt, Fa, Et, Mn, Nl, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} -> En and En -> {De, Ca, Zh, Fa, Et, Mn, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} | Text | 2880h | CC0 |
| CVSS | {Fr, De, Es, Ca, It, Ru, Zh, Pt, Fa, Et, Mn, Nl, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} -> En | Text & Speech | 1900h | CC BY 4.0 |
| mTEDx | {Es, Fr, Pt, It, Ru, El} -> En, {Fr, Pt, It} -> Es, Es -> {Fr, It}, {Es, Fr} -> Pt | Text | 765h | CC BY-NC-ND 4.0 |
| CoVoST | {Fr, De, Nl, Ru, Es, It, Tr, Fa, Sv, Mn, Zh} -> En | Text | 700h | CC0 |
| MuST-C & MuST-Cinema | En -> {De, Es, Fr, It, Nl, Pt, Ro, Ru, Ar, Cs, Fa, Tr, Vi, Zh} | Text | 504h | CC BY-NC-ND 4.0 |
| How2 | En -> Pt | Text | 300h | YouTube & CC BY-SA 4.0 |
| Augmented LibriSpeech | En -> Fr | Text | 236h | CC BY 4.0 |
| Europarl-ST | {En, Fr, De, Es, It, Pt, Pl, Ro, Nl} -> {En, Fr, De, Es, It, Pt, Pl, Ro, Nl} | Text | 280h | CC BY-NC 4.0 |
| Kosp2e | Ko -> En | Text | 198h | Mixed CC |
| Fisher + Callhome | Es -> En | Text | 160h+20h | LDC |
| MaSS | parallel among En, Es, Eu, Fi, Fr, Hu, Ro and Ru | Text & Speech | 172h | Bible.is |
| LibriVoxDeEn | De -> En | Text | 110h | CC BY-NC-SA 4.0 |
| Prabhupadavani | parallel among En, Fr, De, Gu, Hi, Hu, Id, It, Lv, Lt, Ne, Fa, Pl, Pt, Ru, Sl, Sk, Es, Se, Ta, Te, Tr, Bg, Hr, Da and Nl | Text | 94h | |
| BSTC | Zh -> En | Text | 68h | |
| LibriS2S | De <-> En | Text & Speech | 52h/57h | CC BY-NC-SA 4.0 |
Toolkit
Paper
2023
- [arXiv] Tuning Large language model for End-to-end Speech Translation
- [arXiv] Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
- [arXiv] Multilingual Speech-to-Speech Translation into Multiple Target Languages
- [ICCV] MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
- [INTERSPEECH] MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
- [INTERSPEECH] Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer
- [INTERSPEECH] Joint Speech Translation and Named Entity Recognition
- [INTERSPEECH] StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
- [INTERSPEECH] Knowledge Distillation on Joint Task End-to-End Speech Translation
- [INTERSPEECH] GigaST: A 10,000-hour Pseudo Speech Translation Corpus
- [INTERSPEECH] Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation
- [INTERSPEECH] AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation
- [INTERSPEECH]
