PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.





<p align="center"> <img src="./docs/images/PaddleSpeech_logo.png" /> </p> <p align="center"> <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-red.svg"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleSpeech?color=ffa"></a> <a href="support os"><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a> <a href=""><img src="https://img.shields.io/badge/python-3.8+-aff.svg"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSpeech?color=9ea"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleSpeech?color=3af"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSpeech?color=9cc"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleSpeech?color=ccf"></a> <a href="https://pypi.org/project/paddlespeech/"><img src="https://img.shields.io/pypi/dm/PaddleSpeech"></a> <a href="https://pypi.org/project/paddlespeech/"><img src="https://static.pepy.tech/badge/paddlespeech"></a> <a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a> </p> <div align="center"> <h4> <a href="#quick-start"> Quick Start </a> | <a href="#documents"> Documents </a> | <a href="#model-list"> Models List </a> | <a href="https://aistudio.baidu.com/aistudio/course/introduce/25130"> AIStudio Courses </a> | <a href="https://arxiv.org/abs/2205.12007"> NAACL2022 Best Demo Award Paper </a> | <a href="https://gitee.com/paddlepaddle/PaddleSpeech"> Gitee </a> </h4> </div>

PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.

PaddleSpeech won the NAACL2022 Best Demo Award; please check out our paper on arXiv (https://arxiv.org/abs/2205.12007).

Speech Recognition

<div align = "center"> <table style="width:100%"> <thead> <tr> <th> Input Audio </th> <th width="550"> Recognition Result </th> </tr> </thead> <tbody> <tr> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> <td >I knocked at the door on the ancient side of the building.</td> </tr> <tr> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/zh.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> <td>我认为跑步最重要的就是给我带来了身体健康。</td> </tr> </tbody> </table> </div>
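The recognition demo above can be approximated from Python. The following is a minimal sketch, assuming the installed release exposes `ASRExecutor` at `paddlespeech.cli.asr.infer` and accepts an `audio_file` keyword; the wrapper name `transcribe` is hypothetical and not part of the toolkit.

```python
def transcribe(audio_path):
    """Transcribe one audio file with the toolkit's default ASR model."""
    # Imported lazily so this module still loads when paddlespeech is absent.
    # Assumption: the installed release exposes ASRExecutor at this path.
    from paddlespeech.cli.asr.infer import ASRExecutor

    asr = ASRExecutor()
    # Pretrained weights are fetched on first use, so the first call is slow.
    return asr(audio_file=audio_path)
```

Calling `transcribe("zh.wav")` on the Mandarin sample should yield text close to the table entry above. The default model targets Mandarin; English input likely needs extra `model` and `lang` arguments, so check the executor's signature in your installed version.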

Speech Translation (English to Chinese)

<div align = "center"> <table style="width:100%"> <thead> <tr> <th> Input Audio </th> <th width="550"> Translation Result </th> </tr> </thead> <tbody> <tr> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/PaddleAudio/en.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> <td >我在这栋建筑的古老门上敲门。</td> </tr> </tbody> </table> </div>
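A comparable sketch for the translation demo, assuming `STExecutor` is importable from `paddlespeech.cli.st.infer` and that its defaults translate English speech into Chinese text. The wrapper name `translate_speech` is hypothetical, and platform support for this task may be narrower than for ASR.

```python
def translate_speech(audio_path):
    """Translate English speech to Chinese text with the default ST model."""
    # Lazy import keeps this module importable without paddlespeech installed.
    # Assumption: the installed release exposes STExecutor at this path.
    from paddlespeech.cli.st.infer import STExecutor

    st = STExecutor()
    # The return type (string or list of strings) may differ between releases.
    return st(audio_file=audio_path)
```

The result is returned unchanged because its exact shape is not specified here; inspect it before post-processing.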

Text-to-Speech

<div align = "center"> <table style="width:100%"> <thead> <tr> <th width="550" > Input Text</th> <th>Synthetic Audio</th> </tr> </thead> <tbody> <tr> <td>Life was like a box of chocolates, you never know what you're gonna get.</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/tacotron2_ljspeech_waveflow_samples_0.2/sentence_1.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> <tr> <td>早上好，今天是2020/10/29，最低温度是-3°C。</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> <tr> <td>季姬寂，集鸡，鸡即棘鸡。棘鸡饥叽，季姬及箕稷济鸡。鸡既济，跻姬笈，季姬忌，急咭鸡，鸡急，继圾几，季姬急，即籍箕击鸡，箕疾击几伎，伎即齑，鸡叽集几基，季姬急极屐击鸡，鸡既殛，季姬激，即记《季姬击鸡记》。</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/jijiji.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> <tr> <td>大家好，我是 parrot 虚拟老师，我们来读一首诗，我与春风皆过客，I and the spring breeze are passing by，你携秋水揽星河，you take the autumn water to take the galaxy。</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/labixiaoxin.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> <tr> <td>宜家唔系事必要你讲，但系你所讲嘅说话将会变成呈堂证供。</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/chengtangzhenggong.wav" rel="nofollow"> <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> <tr> <td>各个国家有各个国家嘅国歌</td> <td align = "center"> <a href="https://paddlespeech.cdn.bcebos.com/Parakeet/docs/demos/gegege.wav" rel="nofollow"> <img align="center" 
src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br> </td> </tr> </tbody> </table> </div>

For more synthesized audio, please refer to the PaddleSpeech Text-to-Speech samples.
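Synthesis can be scripted in the same style. This is a minimal sketch, assuming `TTSExecutor` lives at `paddlespeech.cli.tts.infer` and takes `text` and `output` keywords; the wrapper name `synthesize` and the default output path are illustrative choices, not toolkit conventions.

```python
def synthesize(text, wav_path="output.wav"):
    """Synthesize `text` into a WAV file using the default TTS models."""
    # Lazy import keeps this module importable without paddlespeech installed.
    # Assumption: the installed release exposes TTSExecutor at this path.
    from paddlespeech.cli.tts.infer import TTSExecutor

    tts = TTSExecutor()
    # The executor writes the waveform to `wav_path`.
    tts(text=text, output=wav_path)
    return wav_path
```

The default acoustic model and vocoder are assumed to target Mandarin; other languages or voices likely require additional arguments.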

Punctuation Restoration

<div align = "center"> <table style="width:100%"> <thead> <tr> <th width="390"> Input Text </th> <th width="390"> Output Text </th> </tr> </thead> <tbody> <tr> <td>今天的天气真不错啊你下午有空吗我想约你一起去吃饭</td> <td>今天的天气真不错啊！你下午有空吗？我想约你一起去吃饭。</td> </tr> </tbody> </table> </div>
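Punctuation restoration follows the same pattern. The sketch below assumes `TextExecutor` is importable from `paddlespeech.cli.text.infer` and that calling it with only `text` selects the punctuation task by default; the wrapper name `restore_punctuation` is hypothetical.

```python
def restore_punctuation(text):
    """Add punctuation to unpunctuated Chinese text."""
    # Lazy import keeps this module importable without paddlespeech installed.
    # Assumption: the installed release exposes TextExecutor at this path.
    from paddlespeech.cli.text.infer import TextExecutor

    punc = TextExecutor()
    return punc(text=text)
```

Feeding ASR output into this function is one way to chain the two demos: recognition produces raw text, and this step adds sentence boundaries.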

Features

With an easy-to-use, efficient, flexible, and scalable implementation, our vision is to empower both industrial applications and academic research, covering training, inference and testing modules, and the deployment process. More specifically, this toolkit features:

- 📦 Ease of Use: low barrier to installation; a CLI, Server, and Streaming Server are available to quick-start your journey.
- 🏆 Aligned with the State-of-the-Art: we provide high-speed and ultra-lightweight models as well as cutting-edge technology.
- 🏆 Streaming ASR and TTS System: we provide production-ready streaming ASR and streaming TTS systems.
- 💯 Rule-based Chinese frontend: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.
- 📦 Varieties of Functions that Vitalize both Industry and Academia:
  - 🛎️ Implementation of critical audio tasks: this toolkit contains audio functions like Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation.
  - 🔬 Integration of mainstream models and datasets: the toolkit implements modules that cover the whole pipeline of speech tasks and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, and CSMSC. See also the model list for more details.
  - 🧩 Cascaded models application: as an extension of typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP) and Computer Vision (CV).
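To make the rule-based text normalization idea concrete, here is a toy sketch that rewrites a slash-separated date and a Celsius temperature into spoken Chinese. It is not the toolkit's frontend: the two regular expressions, the function names, and the 0 to 99 number range are all illustrative assumptions.

```python
import re

DIGITS = "零一二三四五六七八九"


def read_digits(s):
    """Read a digit string one digit at a time (used here for years)."""
    return "".join(DIGITS[int(c)] for c in s)


def read_number(n):
    """Read an integer in the range 0..99 as a Chinese cardinal."""
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    head = "十" if tens == 1 else DIGITS[tens] + "十"
    return head + (DIGITS[ones] if ones else "")


def normalize(text):
    """Expand YYYY/MM/DD dates and one- or two-digit Celsius temperatures."""

    def date(m):
        year, month, day = m.groups()
        return (read_digits(year) + "年" + read_number(int(month)) + "月"
                + read_number(int(day)) + "日")

    def temperature(m):
        sign, value = m.groups()
        return ("零下" if sign else "") + read_number(int(value)) + "摄氏度"

    text = re.sub(r"(\d{4})/(\d{1,2})/(\d{1,2})", date, text)
    text = re.sub(r"(-?)(\d{1,2})°C", temperature, text)
    return text
```

Applying the date rule before the temperature rule keeps the digits of the date from being consumed by a later pattern. A production frontend needs many more rules plus G2P and tone sandhi handling, which this sketch omits.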

Recent Update

- 🎉 2025.09.01: Add Whisper large v3 and turbo model.
- 🤗 2025.08.11: Add code-switch online model and server demo.
- 👑 2023.05.31: Add WavLM ASR-en, WavLM fine-tuning for ASR on LibriSpeech.
- 🎉 2023.05.18: Add Squeezeformer, Squeezeformer training for ASR on Aishell.
- 👑 2023.05.04: Add HuBERT ASR-en, HuBERT fine-tuning for ASR on LibriSpeech.
- ⚡ 2023.04.28: Fix 0-d tensor: with the upgrade to paddlepaddle==2.5, the problem of modifying 0-d tensors has been solved.
- 👑 2023.04.25: Add AMP for U2 conformer.
- 🔥 2023.04.06: Add subtitle file (.srt format) generation example.
- 🔥 2023.03.14: Add SVS (Singing Voice Synthesis) examples with the Opencpop dataset, including DiffSinger, PWGAN, and HiFiGAN; quality is being continuously improved.
- 👑 2023.03.09: Add Wav2vec2ASR-zh.
- 🎉 2023.03.07: Add
