A collection of helper scripts for creating an LJSpeech-format dataset for TTS. Here is the main notebook referencing this.

Steps you must complete before using these scripts:

1. Make sure your audio files are appropriate (refer to my main Kaggle notebook on this, linked above).
2. Install pyannote. Simple instructions are here, under the TL;DR heading.

I'm assuming you're starting with one or more long WAV audio files that each contain at least some speech from your target speaker, and that you want to turn them into an LJSpeech-style dataset.

Using these scripts:

Follow along with the "#NOTE:" comments in each .py file

1. Give your source audio to diarize.py to create diarization files.
2. Give the diarization files and your source audio files to createchunks.py to create short clips of speech.
3. Give your short clips of speech to transcribeaudio.py to create the text transcription you will use for training.
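
As a rough illustration of what sits between diarization and chunking, the sketch below turns diarization output into clip boundaries for one speaker. It assumes the diarization files are in RTTM format (one `SPEAKER` line per speech turn), which is an assumption and not something these scripts guarantee. `parse_rttm`, `segments_for_speaker`, the speaker label `SPEAKER_00`, and the length limits are all hypothetical and chosen only for this example.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_sec, end_sec, speaker_label)


def parse_rttm(text: str) -> List[Segment]:
    """Parse RTTM text into (start, end, speaker) tuples."""
    segments: List[Segment] = []
    for line in text.splitlines():
        fields = line.split()
        # RTTM turn: SPEAKER <file> <chan> <start> <dur> <NA> <NA> <speaker> ...
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank or non-speech lines
        start = float(fields[3])
        duration = float(fields[4])
        segments.append((start, start + duration, fields[7]))
    return segments


def segments_for_speaker(
    segments: List[Segment],
    speaker: str,
    min_len: float = 1.0,   # assumed lower bound, in seconds
    max_len: float = 12.0,  # assumed upper bound, in seconds
) -> List[Tuple[float, float]]:
    """Keep only the target speaker's turns whose length is within bounds."""
    return [
        (start, end)
        for start, end, spk in segments
        if spk == speaker and min_len <= end - start <= max_len
    ]
```

Each surviving `(start, end)` pair marks one candidate clip to cut from the source audio. Follow the "#NOTE:" comments in createchunks.py for how the real script selects the speaker and clip lengths.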

The output of step 2 (createchunks.py) is your wavs folder, and the output of step 3 (transcribeaudio.py) is your metadata.csv. Together, these are everything you need for an LJSpeech-style dataset.
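
A quick sanity check on the finished dataset can catch mismatches between the two outputs. The sketch below assumes the usual LJSpeech layout: a `wavs` folder next to a pipe-delimited `metadata.csv` whose first field is the clip name without the `.wav` extension. Whether transcribeaudio.py writes exactly that layout is an assumption, and `check_dataset` is a hypothetical helper, not part of these scripts.

```python
import csv
from pathlib import Path


def check_dataset(root):
    """Compare clip names in metadata.csv against the files in wavs/."""
    root = Path(root)
    wav_ids = {p.stem for p in (root / "wavs").glob("*.wav")}

    listed = set()
    with open(root / "metadata.csv", newline="", encoding="utf-8") as f:
        # Pipe-delimited, no quoting, so transcripts may contain quote marks.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row:  # ignore empty lines
                listed.add(row[0])

    return {
        "missing_wav": sorted(listed - wav_ids),    # transcribed but no audio
        "untranscribed": sorted(wav_ids - listed),  # audio but no transcript
    }
```

Calling `check_dataset("path/to/dataset")` returns two lists that should both be empty before you start training.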
