End-to-End Speech Translation Progress

Tutorial

EACL 2021 tutorial: Speech Translation
Blog: Getting Started with End-to-End Speech Translation
ACL 2020 Theme paper: Speech Translation and the End-to-End Promise: Taking Stock of Where We Are
INTERSPEECH 2019 survey talk: Spoken Language Translation

Data

| Corpus | Direction | Target | Duration | License |
|---|:---:|:---:|:---:|---:|
| CoVoST 2 | {Fr, De, Es, Ca, It, Ru, Zh, Pt, Fa, Et, Mn, Nl, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} -> En and En -> {De, Ca, Zh, Fa, Et, Mn, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} | Text | 2880h | CC0 |
| CVSS | {Fr, De, Es, Ca, It, Ru, Zh, Pt, Fa, Et, Mn, Nl, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} -> En | Text & Speech | 1900h | CC BY 4.0 |
| mTEDx | {Es, Fr, Pt, It, Ru, El} -> En, {Fr, Pt, It} -> Es, Es -> {Fr, It}, {Es, Fr} -> Pt | Text | 765h | CC BY-NC-ND 4.0 |
| CoVoST | {Fr, De, Nl, Ru, Es, It, Tr, Fa, Sv, Mn, Zh} -> En | Text | 700h | CC0 |
| MUST-C & MUST-Cinema | En -> {De, Es, Fr, It, Nl, Pt, Ro, Ru, Ar, Cs, Fa, Tr, Vi, Zh} | Text | 504h | CC BY-NC-ND 4.0 |
| How2 | En -> Pt | Text | 300h | YouTube & CC BY-SA 4.0 |
| Augmented LibriSpeech | En -> Fr | Text | 236h | CC BY 4.0 |
| Europarl-ST | {En, Fr, De, Es, It, Pt, Pl, Ro, Nl} -> {En, Fr, De, Es, It, Pt, Pl, Ro, Nl} | Text | 280h | CC BY-NC 4.0 |
| Kosp2e | Ko -> En | Text | 198h | Mixed CC |
| Fisher + Callhome | Es -> En | Text | 160h+20h | LDC |
| MaSS | parallel among En, Es, Eu, Fi, Fr, Hu, Ro and Ru | Text & Speech | 172h | Bible.is |
| LibriVoxDeEn | De -> En | Text | 110h | CC BY-NC-SA 4.0 |
| Prabhupadavani | parallel among En, Fr, De, Gu, Hi, Hu, Id, It, Lv, Lt, Ne, Fa, Pl, Pt, Ru, Sl, Sk, Es, Se, Ta, Te, Tr, Bg, Hr, Da and Nl | Text | 94h | |
| BSTC | Zh -> En | Text | 68h | |
| LibriS2S | De <-> En | Text & Speech | 52h/57h | CC BY-NC-SA 4.0 |

Toolkit

Paper

2023

[arXiv] Tuning Large language model for End-to-end Speech Translation
[arXiv] Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
[arXiv] Multilingual Speech-to-Speech Translation into Multiple Target Languages
[ICCV] MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
[INTERSPEECH] MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
[INTERSPEECH] Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer
[INTERSPEECH] Joint Speech Translation and Named Entity Recognition
[INTERSPEECH] StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
[INTERSPEECH] Knowledge Distillation on Joint Task End-to-End Speech Translation
[INTERSPEECH] GigaST: A 10,000-hour Pseudo Speech Translation Corpus
[INTERSPEECH] Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation
[INTERSPEECH] AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation
[INTERSPEECH]
