
PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.


<p align="center"> <img src="./docs/images/PaddleSpeech_logo.png" /> </p> <p align="center"> <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-red.svg"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleSpeech?color=ffa"></a> <a href="support os"><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a> <a href=""><img src="https://img.shields.io/badge/python-3.8+-aff.svg"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSpeech?color=9ea"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleSpeech?color=3af"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSpeech?color=9cc"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleSpeech?color=ccf"></a> <a href="https://pypi.org/project/paddlespeech/"><img src="https://img.shields.io/pypi/dm/PaddleSpeech"></a> <a href="https://pypi.org/project/paddlespeech/"><img src="https://static.pepy.tech/badge/paddlespeech"></a> <a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a> </p> <div align="center"> <h4> <a href="#quick-start"> Quick Start </a> | <a href="#documents"> Documents </a> | <a href="#model-list"> Model List </a> | <a href="https://aistudio.baidu.com/aistudio/course/introduce/25130"> AIStudio Courses </a> | <a href="https://arxiv.org/abs/2205.12007"> NAACL2022 Best Demo Award Paper </a> | <a href="https://gitee.com/paddlepaddle/PaddleSpeech"> Gitee </a> </h4> </div>

PaddleSpeech is an open-source toolkit built on the PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.

PaddleSpeech won the NAACL 2022 Best Demo Award; please check out our paper on arXiv.

Speech Recognition
<div align = "center"> <table style="width:100%"> <thead> <tr> <th> Input Audio </th> <th width="550"> Recognition Result </th> </tr> </thead> <tbody> <tr> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> <td >I knocked at the door on the ancient side of the building.</td> </tr> <tr> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> <td>我认为跑步最重要的就是给我带来了身体健康。</td> </tr> </tbody> </table> </div>
Speech Translation (English to Chinese)
<div align = "center"> <table style="width:100%"> <thead> <tr> <th> Input Audio </th> <th width="550"> Translation Result </th> </tr> </thead> <tbody> <tr> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> <td >我 在 这栋 建筑 的 古老 门上 敲门。</td> </tr> </tbody> </table> </div>
Text-to-Speech
<div align = "center"> <table style="width:100%"> <thead> <tr> <th width="550" > Input Text</th> <th>Synthetic Audio</th> </tr> </thead> <tbody> <tr> <td>Life was like a box of chocolates, you never know what you're gonna get.</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/tacotron2_ljspeech_waveflow_samples_0.2/sentence_1.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> <tr> <td>早上好,今天是2020/10/29,最低温度是-3°C。</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> <tr> <td>季姬寂,集鸡,鸡即棘鸡。棘鸡饥叽,季姬及箕稷济鸡。鸡既济,跻姬笈,季姬忌,急咭鸡,鸡急,继圾几,季姬急,即籍箕击鸡,箕疾击几伎,伎即齑,鸡叽集几基,季姬急极屐击鸡,鸡既殛,季姬激,即记《季姬击鸡记》。</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/jijiji.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> <tr> <td>大家好,我是 parrot 虚拟老师,我们来读一首诗,我与春风皆过客,I and the spring breeze are passing by,你携秋水揽星河,you take the autumn water to take the galaxy。</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/labixiaoxin.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> <tr> <td>宜家唔系事必要你讲,但系你所讲嘅说话将会变成呈堂证供。</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/chengtangzhenggong.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> <tr> <td>各个国家有各个国家嘅国歌</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/gegege.wav" rel="nofollow"> <img align="center" 
src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> </tbody> </table> </div>

For more synthesized audio, please refer to the PaddleSpeech Text-to-Speech samples.
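The second Text-to-Speech input above contains a date (`2020/10/29`) and a negative temperature (`-3°C`). A Chinese text frontend has to rewrite these as readable Chinese before grapheme-to-phoneme conversion. The following is a minimal, rule-based sketch of that text normalization step, assuming only two hypothetical regex rules. The names `normalize`, `read_date`, `read_temperature`, and `number_to_chinese` are illustrative and are not PaddleSpeech APIs. Reading `°C` as 度 is also just one plausible choice.

```python
import re

DIGITS = "零一二三四五六七八九"


def number_to_chinese(n: int) -> str:
    """Read an integer in [0, 99] as a Chinese number, e.g. 29 -> 二十九."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    prefix = "十" if tens == 1 else DIGITS[tens] + "十"
    return prefix + (DIGITS[ones] if ones else "")


def read_date(match: "re.Match[str]") -> str:
    year, month, day = match.groups()
    # Years are read digit by digit; month and day are read as numbers.
    year_text = "".join(DIGITS[int(d)] for d in year)
    return f"{year_text}年{number_to_chinese(int(month))}月{number_to_chinese(int(day))}日"


def read_temperature(match: "re.Match[str]") -> str:
    sign, value = match.groups()
    # A leading minus sign is read as 零下 ("below zero").
    return ("零下" if sign else "") + number_to_chinese(int(value)) + "度"


def normalize(text: str) -> str:
    """Apply the (hypothetical) date rule first, then the temperature rule."""
    text = re.sub(r"(\d{4})/(\d{1,2})/(\d{1,2})", read_date, text)
    text = re.sub(r"(-?)(\d+)°C", read_temperature, text)
    return text


print(normalize("早上好,今天是2020/10/29,最低温度是-3°C。"))
```

Real frontends need many more rules (phone numbers, fractions, units, ranges), and rule order matters because one pattern can consume digits that another pattern expects.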

Punctuation Restoration
<div align = "center"> <table style="width:100%"> <thead> <tr> <th width="390"> Input Text </th> <th width="390"> Output Text </th> </tr> </thead> <tbody> <tr> <td>今天的天气真不错啊你下午有空吗我想约你一起去吃饭</td> <td>今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。</td> </tr> </tbody> </table> </div>
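One common way to frame punctuation restoration is as per-character classification: a model predicts, for each character, which punctuation mark (if any) should follow it. The minimal sketch below assumes that framing and rebuilds the example above. It hard-codes hypothetical labels in place of real model output. The `restore_punctuation` function and the `"O"` ("no punctuation") label are illustrative only and are not part of PaddleSpeech's API.

```python
from typing import List


def restore_punctuation(text: str, labels: List[str]) -> str:
    """Insert punctuation after each character according to its label.

    "O" means no punctuation follows the character; any other label is
    the punctuation mark to insert after it.
    """
    if len(text) != len(labels):
        raise ValueError("expected exactly one label per character")
    pieces = []
    for char, label in zip(text, labels):
        pieces.append(char)
        if label != "O":
            pieces.append(label)
    return "".join(pieces)


raw = "今天的天气真不错啊你下午有空吗我想约你一起去吃饭"
# Hand-written labels standing in for a model's per-character predictions.
labels = ["O"] * len(raw)
labels[raw.index("啊")] = "!"
labels[raw.index("吗")] = "?"
labels[-1] = "。"
print(restore_punctuation(raw, labels))
```

Keeping the labels aligned one-to-one with the input characters makes decoding a simple merge. The modeling difficulty lies entirely in predicting the labels.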

Features

Through an easy-to-use, efficient, flexible, and scalable implementation, our vision is to empower both industrial applications and academic research, covering training, inference and testing modules, and the deployment process. Specifically, this toolkit features:

  • 📦 Ease of Use: low barriers to installation; a CLI, a Server, and a Streaming Server are available to quick-start your journey.
  • 🏆 Align to the State-of-the-Art: we provide high-speed and ultra-lightweight models, and also cutting-edge technology.
  • 🏆 Streaming ASR and TTS System: we provide production-ready streaming ASR and streaming TTS systems.
  • 💯 Rule-based Chinese frontend: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.
  • 📦 Varieties of Functions that Vitalize both Industry and Academia:
    • 🛎️ Implementation of critical audio tasks: this toolkit contains audio functions like Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation, etc.
    • 🔬 Integration of mainstream models and datasets: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also the model list for more details.
    • 🧩 Cascaded models application: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP) and Computer Vision (CV).
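The Chinese frontend bullet mentions G2P with polyphone handling and tone sandhi. The toy sketch below illustrates both ideas under explicit assumptions. The input is already word-segmented. The tiny lexicons are hypothetical. Polyphones are resolved by letting word-level entries override per-character defaults. Only two simplified sandhi rules are applied within each word. None of the names (`CHAR_PINYIN`, `WORD_PINYIN`, `apply_tone_sandhi`, `g2p`) come from PaddleSpeech.

```python
from typing import Dict, List

# Toy lexicons (hypothetical, for illustration only). A real frontend uses
# large pronunciation dictionaries plus a word segmenter.
CHAR_PINYIN: Dict[str, str] = {
    "你": "ni3", "好": "hao3", "不": "bu4", "是": "shi4",
    "银": "yin2", "行": "xing2",
}
# Word-level entries override per-character defaults; this is how the
# polyphone 行 gets "hang2" inside 银行 but keeps "xing2" on its own.
WORD_PINYIN: Dict[str, List[str]] = {
    "银行": ["yin2", "hang2"],
}


def apply_tone_sandhi(word: str, syllables: List[str]) -> List[str]:
    """Apply two simplified Mandarin sandhi rules within one word.

    1. 不 (bu4) becomes bu2 before a fourth-tone syllable.
    2. A third tone becomes a second tone before another third tone.
    Tones are checked on the original syllables, so 3-3-3 becomes 2-2-3
    (a simplification; real systems also consider prosodic structure).
    """
    result = list(syllables)
    for i in range(len(syllables) - 1):
        tone, next_tone = syllables[i][-1], syllables[i + 1][-1]
        if word[i] == "不" and syllables[i] == "bu4" and next_tone == "4":
            result[i] = "bu2"
        elif tone == "3" and next_tone == "3":
            result[i] = syllables[i][:-1] + "2"
    return result


def g2p(words: List[str]) -> List[str]:
    """Convert pre-segmented words to tone-numbered pinyin."""
    pinyin: List[str] = []
    for word in words:
        if word in WORD_PINYIN:
            syllables = list(WORD_PINYIN[word])
        else:
            # Raises KeyError for characters missing from the toy lexicon.
            syllables = [CHAR_PINYIN[char] for char in word]
        pinyin.extend(apply_tone_sandhi(word, syllables))
    return pinyin


print(g2p(["你好", "银行"]))  # ['ni2', 'hao3', 'yin2', 'hang2']
```

Resolving polyphones at the word level before applying sandhi matters. Sandhi rules read the tones of neighboring syllables, so a wrong base pronunciation would propagate into the tone changes.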

Recent Update
