AdaSpeech
An implementation of Microsoft's "AdaSpeech: Adaptive Text to Speech for Custom Voice"
AdaSpeech - PyTorch Implementation
This is an unofficial PyTorch implementation of AdaSpeech: Adaptive Text to Speech for Custom Voice.
This project is based on ming024's implementation of FastSpeech 2.

Note:
- Supports multi-language training. The default phoneme set covers Vietnamese and English; customize it for other languages.
- Utterance-level encoder and phoneme-level encoder to improve acoustic generalization.
- Conditional layer norm, which is the core idea of the AdaSpeech paper.
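The conditional layer norm mentioned above can be sketched as follows. In the AdaSpeech paper, the scale and bias of layer normalization are not fixed learned vectors but are predicted from the speaker embedding by two linear layers. This is a minimal plain-Python sketch of that arithmetic, not the code used in this repository; the function names, the `(weight, bias)` tuple layout, and the toy values are all assumptions made for illustration.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one hidden vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(weight, bias, vec):
    """Matrix-vector product plus bias; weight is laid out as [out][in]."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def conditional_layer_norm(x, speaker_emb, scale_proj, bias_proj, eps=1e-5):
    """Layer norm whose scale and bias are predicted from the speaker embedding.

    scale_proj and bias_proj are hypothetical (weight, bias) tuples standing in
    for the two small linear layers.
    """
    scale = linear(*scale_proj, speaker_emb)
    bias = linear(*bias_proj, speaker_emb)
    normed = layer_norm(x, eps)
    return [s * n + b for s, n, b in zip(scale, normed, bias)]

# Toy example: hidden size 4, speaker embedding size 2 (illustrative values).
hidden = [1.0, 2.0, 3.0, 4.0]
speaker = [0.5, -0.5]
identity_scale = ([[0.0, 0.0]] * 4, [1.0] * 4)  # always predicts scale = 1
zero_bias = ([[0.0, 0.0]] * 4, [0.0] * 4)       # always predicts bias = 0
speaker_bias = ([[1.0, 0.0]] * 4, [0.0] * 4)    # bias = first embedding value

plain = conditional_layer_norm(hidden, speaker, identity_scale, zero_bias)
shifted = conditional_layer_norm(hidden, speaker, identity_scale, speaker_bias)
```

With a scale of 1 and a bias of 0 the result equals ordinary layer norm; changing only the speaker-dependent projections shifts the output. This is why the paper can adapt to a new voice by updating very few parameters.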

Requirements:
- Install PyTorch
Before installing PyTorch, check your CUDA version by running the following command:
nvcc --version
- Install the remaining dependencies:
pip install -r requirements.txt
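If the CUDA check needs to happen inside a setup script, the `nvcc --version` step above could be wrapped as below. This is a small sketch using only the standard library; the helper names are hypothetical, and it assumes the usual `release X.Y` wording in the nvcc banner.

```python
import re
import shutil
import subprocess

def parse_nvcc_release(output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def detect_cuda_release():
    """Run nvcc if it is on PATH; return None when it is missing."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)
```

Calling `detect_cuda_release()` returns a tuple such as `(major, minor)` that can be matched against the PyTorch build you plan to install, or `None` if no CUDA toolkit is found.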
Training
Preprocessing
- First, align the corpus with the MFA (Montreal Forced Aligner) tool to get TextGrid files. Run each language separately, then move all speakers' TextGrid files into a single folder named "textgrid".
- Copy the textgrid folder into the preprocessed path.
- Run the preprocessing script:
python preprocess.py config/pretrain/preprocess.yaml
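The step of collecting per-language MFA output into one "textgrid" folder could be scripted roughly as follows. This is a sketch under an assumed layout: each language's MFA output directory contains one sub-folder per speaker holding `*.TextGrid` files. The function name and directory names are hypothetical, so adjust them to your actual alignment output.

```python
import shutil
from pathlib import Path

def merge_textgrids(language_dirs, target_dir):
    """Copy <language_dir>/<speaker>/*.TextGrid into <target_dir>/<speaker>/.

    Returns the number of files copied.
    """
    target = Path(target_dir)
    copied = 0
    for language_dir in map(Path, language_dirs):
        # Assumed layout: one sub-folder per speaker inside each language dir.
        for textgrid in sorted(language_dir.glob("*/*.TextGrid")):
            speaker_dir = target / textgrid.parent.name
            speaker_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(textgrid, speaker_dir / textgrid.name)
            copied += 1
    return copied
```

For example, `merge_textgrids(["mfa_out/vi", "mfa_out/en"], "textgrid")` would gather both languages into a single folder, which can then be copied into the preprocessed path. Files are copied rather than moved so the original alignments stay intact.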
Training
Train the baseline model with:
python train.py [-h] [-p PREPROCESS_CONFIG_PATH] [-m MODEL_CONFIG_PATH] [-t TRAIN_CONFIG_PATH] [--vocoder_checkpoint VOCODER_CHECKPOINT_PATH] [--vocoder_config VOCODER_CONFIG_PATH]
Finetune
Preprocessing
- First, align the corpus with the MFA tool to get TextGrid files (for best quality, fine-tune on only one speaker).
- Run the preprocessing script:
python preprocess.py config/finetune/preprocess.yaml
Finetune
Fine-tune on the target speaker's voice with:
python finetune.py [-h] [--pretrain_dir BASE_LINE_MODEL_PATH] [-p PREPROCESS_CONFIG_PATH] [-m MODEL_CONFIG_PATH] [-t TRAIN_CONFIG_PATH] [--vocoder_checkpoint VOCODER_CHECKPOINT_PATH] [--vocoder_config VOCODER_CONFIG_PATH]
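When training and fine-tuning are launched from a script, the two command lines can be assembled from the flags listed above. The sketch below uses only those flags. The helper name is hypothetical, and the upper-case values are the same placeholders the usage strings use, so replace them with real paths.

```python
import shlex

def build_command(script, preprocess_cfg, model_cfg, train_cfg,
                  vocoder_checkpoint, vocoder_config, pretrain_dir=None):
    """Assemble the argument list for train.py or finetune.py."""
    cmd = ["python", script]
    if pretrain_dir is not None:
        # Only finetune.py takes the baseline model directory.
        cmd += ["--pretrain_dir", pretrain_dir]
    cmd += [
        "-p", preprocess_cfg,
        "-m", model_cfg,
        "-t", train_cfg,
        "--vocoder_checkpoint", vocoder_checkpoint,
        "--vocoder_config", vocoder_config,
    ]
    return cmd

pretrain_cmd = build_command(
    "train.py", "config/pretrain/preprocess.yaml",
    "MODEL_CONFIG_PATH", "TRAIN_CONFIG_PATH",
    "VOCODER_CHECKPOINT_PATH", "VOCODER_CONFIG_PATH",
)
finetune_cmd = build_command(
    "finetune.py", "config/finetune/preprocess.yaml",
    "MODEL_CONFIG_PATH", "TRAIN_CONFIG_PATH",
    "VOCODER_CHECKPOINT_PATH", "VOCODER_CONFIG_PATH",
    pretrain_dir="BASE_LINE_MODEL_PATH",
)
print(shlex.join(pretrain_cmd))
print(shlex.join(finetune_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)` once the placeholders are filled in. Building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.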
TensorBoard
Use
tensorboard [--logdir LOG_PATH]
- TensorBoard for the pretrained model
- TensorBoard for fine-tuning with only 5 sentences

References
- AdaSpeech: Adaptive text to speech for custom voice.
- ming024's implementation
- rishikksh20's AdaSpeech implementation
Citation
@misc{chen2021adaspeech,
title={AdaSpeech: Adaptive Text to Speech for Custom Voice},
author={Mingjian Chen and Xu Tan and Bohan Li and Yanqing Liu and Tao Qin and Sheng Zhao and Tie-Yan Liu},
year={2021},
eprint={2103.00993},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
