XTTSv2Scripts
Scripts for creating an LJSpeech-format dataset for a TTS task.
Repository: zuverschenken/XTTSv2Scripts
A collection of helper scripts for creating an LJSpeech-format dataset for TTS. The main Kaggle notebook that references these scripts is here.
Steps you must complete before using these scripts:
- Make sure your audio files are appropriate (see my main Kaggle notebook, mentioned above, for guidance on this).
- Install pyannote. Simple instructions are here, under the TL;DR heading.
I'm assuming you're starting with one or more long WAV audio files that each contain at least some speech from your target speaker, and that you want to turn them into an LJSpeech-style dataset.
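What counts as "appropriate" audio is covered in the Kaggle notebook. As a quick first sanity check, though, a sketch like the one below (standard library only; the script name `describe_wavs.py` is hypothetical) prints the channel count, sample rate, sample width, and duration of each source file. Python's `wave` module only reads uncompressed PCM WAV, so a file it rejects with `wave.Error` probably needs converting before you go further.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic properties of a PCM WAV file (standard library only)."""
    with wave.open(str(path), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "file": Path(path).name,
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python describe_wavs.py source_audio/*.wav
    for arg in sys.argv[1:]:
        print(describe_wav(arg))
```

Long source files should show durations in minutes or hours; mismatched sample rates or channel counts across files are worth noticing at this stage rather than after chunking.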
Using these scripts:
Follow along with the "#NOTE:" comments in each .py file.
1. Give your source audio to diarize.py to create diarization files.
2. Give the diarization files and your source audio files to createchunks.py to create short clips of speech.
3. Give your short clips of speech to transcribeaudio.py to create the text transcriptions you will use for training.
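The exact file formats used by diarize.py and createchunks.py are defined in the scripts themselves. As a rough, hypothetical sketch of what the diarization and chunking steps do together, the code below assumes the diarization files are RTTM (the format pyannote's `Annotation.write_rttm` produces) and that you already know which speaker label (for example `SPEAKER_00`) is your target speaker. It keeps only that speaker's turns, merges turns separated by short pauses, and cuts each resulting span out of the source WAV. The function names and the `max_gap_s`, `max_len_s`, and `min_len_s` defaults are illustrative assumptions, not values taken from these scripts.

```python
import wave
from pathlib import Path


def read_rttm(rttm_path, speaker):
    """Return sorted (start_s, end_s) turns for one speaker label.

    Assumption: the diarization file is RTTM, one turn per line:
    SPEAKER <file-id> <channel> <start> <duration> <NA> <NA> <speaker> <NA> <NA>
    """
    turns = []
    with open(rttm_path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) < 8 or fields[0] != "SPEAKER":
                continue  # skip blank lines and non-speaker records
            if fields[7] == speaker:
                start, duration = float(fields[3]), float(fields[4])
                turns.append((start, start + duration))
    return sorted(turns)


def merge_turns(turns, max_gap_s=0.3, max_len_s=11.0):
    """Join consecutive turns separated by pauses of at most max_gap_s,
    as long as the joined span stays within max_len_s.
    (Hypothetical defaults; a single turn longer than max_len_s is kept as-is.)
    """
    merged = []
    for start, end in turns:
        if merged:
            prev_start, prev_end = merged[-1]
            if start - prev_end <= max_gap_s and end - prev_start <= max_len_s:
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


def cut_clip(src_wav, dst_wav, start_s, end_s):
    """Copy the [start_s, end_s) span of a PCM WAV file into dst_wav."""
    with wave.open(str(src_wav), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        first = min(int(start_s * rate), src.getnframes())
        last = min(int(end_s * rate), src.getnframes())
        src.setpos(first)
        frames = src.readframes(max(0, last - first))
    with wave.open(str(dst_wav), "wb") as dst:
        # Same channels/rate/width; the header's frame count is corrected on write.
        dst.setparams(params)
        dst.writeframes(frames)


def make_clips(src_wav, rttm_path, speaker, out_dir, min_len_s=1.0):
    """Cut one clip per merged turn of `speaker`, skipping very short ones."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    turns = merge_turns(read_rttm(rttm_path, speaker))
    for i, (start, end) in enumerate(turns):
        if end - start < min_len_s:
            continue
        dst = out_dir / f"{Path(src_wav).stem}_{i:04d}.wav"
        cut_clip(src_wav, dst, start, end)
        written.append(dst)
    return written
```

For example, `make_clips("source.wav", "source.rttm", "SPEAKER_00", "wavs")` would write `wavs/source_0000.wav`, `wavs/source_0001.wav`, and so on. Merging on short gaps trades a few stray breaths for fewer clips that cut off mid-sentence; the right thresholds depend on your audio.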
The output of step 2 (createchunks.py) is your wavs folder, and the output of step 3 (transcribeaudio.py) is your metadata.csv. Together, they are everything you need for an LJSpeech-style dataset.
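In an LJSpeech-style dataset, metadata.csv has no header and one line per clip in the form `<clip id>|<transcription>|<normalized transcription>`, where the clip id is the clip's filename in `wavs/` without the `.wav` extension. A minimal sketch of writing that file, assuming you already have a dict mapping clip filenames to transcribed text (the function name `write_ljspeech_metadata` is hypothetical, and it does no real text normalization; it simply repeats the cleaned text in the third column):

```python
from pathlib import Path


def write_ljspeech_metadata(transcripts, out_path):
    """Write an LJSpeech-style metadata.csv and return the number of lines.

    transcripts: dict mapping a clip filename (e.g. "clip_0000.wav") to its text.
    Each line is: <clip id without .wav>|<text>|<text>
    Assumption: the third column just repeats the text (no normalization).
    """
    lines = []
    for name in sorted(transcripts):
        # "|" is the field separator, so drop it; then collapse whitespace/newlines.
        text = " ".join(transcripts[name].replace("|", " ").split())
        if not text:
            continue  # skip clips that produced no transcription
        clip_id = Path(name).stem
        lines.append(f"{clip_id}|{text}|{text}")
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

Some training loaders read the text from the third column, which is why it is filled in rather than left empty. Any clip you skip here should also be removed from `wavs/` so the two stay in sync.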
