VoiceCloning
Generative voice cloning model using TTS synthesis with state-of-the-art zero-shot multi-speaker functionality. A web API built on the YourTTS model to clone voices and generate realistic audio waveforms.
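This README describes the project as a web API around YourTTS but does not document its endpoints. The sketch below shows one way such a service could be structured with Python's standard-library `http.server`. The `/clone` route, the JSON fields `text`, `speaker_wav`, and `language`, and the injected `synthesize` callable are all hypothetical, not this project's actual interface.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

# Hypothetical backend signature: (text, reference_wav_path, language) -> WAV bytes.
Synthesizer = Callable[[str, str, str], bytes]


def make_handler(synthesize: Synthesizer) -> type:
    """Build a request handler class bound to a synthesis backend (hypothetical API)."""

    class CloneHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            if self.path != "/clone":  # hypothetical endpoint name
                self.send_error(404, "unknown endpoint")
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                body = json.loads(self.rfile.read(length))
                text = body["text"]
                reference = body["speaker_wav"]
                language = body.get("language", "en")
            except (ValueError, KeyError, TypeError):
                # Malformed JSON, missing fields, or a non-object body.
                self.send_error(400, "expected JSON with 'text' and 'speaker_wav'")
                return
            audio = synthesize(text, reference, language)
            self.send_response(200)
            self.send_header("Content-Type", "audio/wav")
            self.send_header("Content-Length", str(len(audio)))
            self.end_headers()
            self.wfile.write(audio)

        def log_message(self, format: str, *args) -> None:
            pass  # keep the sketch quiet

    return CloneHandler


def serve(synthesize: Synthesizer, host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Create (but do not start) an HTTP server for the cloning endpoint."""
    return HTTPServer((host, port), make_handler(synthesize))
```

Calling `serve(my_backend).serve_forever()` starts the server. In practice the `synthesize` backend would wrap a YourTTS model (for example through Coqui's `TTS` package) that is loaded once at startup rather than on every request, since loading the checkpoint is far slower than a single synthesis call.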
Voice Cloning Model with Zero-Shot Attention-Based TTS
The AI used in this API is YourTTS, a zero-shot multi-speaker TTS model for generative audio modeling.
The paper that proposed YourTTS was a central building block of the API. YourTTS takes a multilingual approach to zero-shot multi-speaker TTS: it can be trained on multilingual audio data, and it builds on the earlier VITS model.
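One way to run YourTTS zero-shot cloning locally is through Coqui's open-source `TTS` Python package. The sketch below assumes that package is installed and that your installed version ships the released checkpoint under the model name `tts_models/multilingual/multi-dataset/your_tts`. The wrapper `clone_voice`, its input checks, and the language codes in `SUPPORTED_LANGUAGES` are illustrative assumptions, not this project's code.

```python
from pathlib import Path

# Model identifier for the released YourTTS checkpoint in Coqui's TTS package
# (assumption: available in your installed TTS version).
YOURTTS_MODEL = "tts_models/multilingual/multi-dataset/your_tts"
# Language codes assumed for that checkpoint (English, French, Portuguese).
SUPPORTED_LANGUAGES = {"en", "fr-fr", "pt-br"}


def clone_voice(text: str, reference_wav: str, output_wav: str, language: str = "en") -> str:
    """Synthesize `text` in the voice heard in `reference_wav` (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if not Path(reference_wav).is_file():
        raise FileNotFoundError(reference_wav)

    # Imported lazily so the input checks above work without the TTS package installed.
    from TTS.api import TTS

    tts = TTS(YOURTTS_MODEL)
    # Zero-shot cloning: the speaker embedding is computed from the reference clip,
    # so no fine-tuning on the target speaker is required.
    tts.tts_to_file(text=text, speaker_wav=reference_wav, language=language, file_path=output_wav)
    return output_wav
```

For example, `clone_voice("Hello there.", "speaker.wav", "cloned.wav")` would write the cloned speech to `cloned.wav`. This helper reloads the model on every call for simplicity; a long-running service should construct `TTS(...)` once and reuse it.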
Reference implementations used to study TTS concepts can be found here.
Models studied, released as open source by Coqui:
| Model | URL |
|------------------------------|------------------------------------------------------------------------------------------------|
| Speaker Encoder | link |
| Exp 1. YourTTS-EN(VCTK) | link |
| Exp 1. YourTTS-EN(VCTK) + SCL | link |
| Exp 2. YourTTS-EN(VCTK)-PT | link |
| Exp 2. YourTTS-EN(VCTK)-PT + SCL | link |
| Exp 3. YourTTS-EN(VCTK)-PT-FR | link |
| Exp 3. YourTTS-EN(VCTK)-PT-FR + SCL | link |
| Exp 4. YourTTS-EN(VCTK+LibriTTS)-PT-FR + SCL | link |
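Several checkpoints above are trained with the speaker consistency loss (SCL) described in the YourTTS paper, which rewards high cosine similarity between speaker-encoder embeddings of ground-truth audio $g_i$ and generated audio $h_i$: $L_{SCL} = -\frac{\alpha}{n}\sum_{i=1}^{n} \cos\_sim(\phi(g_i), \phi(h_i))$, where $\phi$ is the speaker encoder and $\alpha$ is a positive weight. The sketch below computes that quantity on precomputed embedding vectors in plain Python, purely to illustrate the formula; real training computes it on framework tensors so it can be backpropagated, and the value of `alpha` is left to the caller.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def speaker_consistency_loss(
    ground_truth: Sequence[Vector], generated: Sequence[Vector], alpha: float
) -> float:
    """L_SCL = -(alpha / n) * sum_i cos_sim(phi(g_i), phi(h_i)).

    ground_truth[i] and generated[i] are the speaker-encoder embeddings
    phi(g_i) and phi(h_i) for the i-th utterance in a batch.
    """
    if not ground_truth or len(ground_truth) != len(generated):
        raise ValueError("need the same, non-zero number of embeddings on both sides")
    n = len(ground_truth)
    total = sum(cosine_similarity(g, h) for g, h in zip(ground_truth, generated))
    return -(alpha / n) * total
```

Because the loss is the negative mean similarity scaled by α, minimizing it pulls the generated speech toward the reference speaker's embedding; a batch whose embeddings match perfectly reaches the minimum value of -α.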
TTS Retraining Data
The audio samples used for the MOS (mean opinion score) evaluation are available here, and the MOS ratings for those samples are available here.
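MOS is the average of listener ratings, typically on a 1 to 5 scale, and is commonly reported with a 95% confidence interval. The sketch below computes both from a list of ratings using a normal approximation (1.96 times the standard error); that choice of interval is an assumption, and the procedure behind the published scores may differ.

```python
import math
import statistics
from typing import Sequence, Tuple


def mos_with_ci(ratings: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Uses the sample standard deviation and a normal approximation, which is
    reasonable when there are many ratings.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings are expected on a 1-5 scale")
    mean = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width
```

For example, `mos, ci = mos_with_ci([4, 5, 3, 4])` gives a MOS of 4.00 ± 0.80 when formatted to two decimals. With only a handful of ratings, a t-distribution critical value would give a wider, more honest interval than `z = 1.96`.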
Default TTS Audio Sources (speaker IDs):
LibriTTS (test clean): 1188, 1995, 260, 1284, 2300, 237, 908, 1580, 121 and 1089
VCTK: p261, p225, p294, p347, p238, p234, p248, p335, p245, p326 and p302
MLS Portuguese: 12710, 5677, 12249, 12287, 9351, 11995, 7925, 3050, 4367 and 1306
Citation
@ARTICLE{2021arXiv211202418C,
author = {{Casanova}, Edresson and {Weber}, Julian and {Shulby}, Christopher and {Junior}, Arnaldo Candido and {G{\"o}lge}, Eren and {Antonelli Ponti}, Moacir},
title = "{YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone}",
journal = {arXiv e-prints},
keywords = {Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing},
year = 2021,
month = dec,
eid = {arXiv:2112.02418},
pages = {arXiv:2112.02418},
archivePrefix = {arXiv},
eprint = {2112.02418},
primaryClass = {cs.SD},
adsurl = {https://ui.adsabs.harvard.edu/abs/2021arXiv211202418C},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
