MockingBird
🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time
🚧 While I no longer actively update this repo, I am continuously pushing this tech forward for good and in the open. I'm also building an optimized, cloud-hosted version at https://noiz.ai/, and we're hiring.
<a href="https://trendshift.io/repositories/3869" target="_blank"><img src="https://trendshift.io/api/badge/repositories/3869" alt="babysor%2FMockingBird | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
Features
- 🌍 **Chinese** supports Mandarin and is tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, data_aishell, etc.
- 🤩 **PyTorch** works with PyTorch, tested with version 1.9.0 (latest as of August 2021), on Tesla T4 and RTX 2060 GPUs
- 🌍 **Windows + Linux** runs on both Windows and Linux (even on M1 macOS)
- 🤩 **Easy & Awesome** good results with only a newly trained synthesizer, by reusing the pretrained encoder/vocoder
- 🌍 **Webserver Ready** serve your results with remote calls
DEMO VIDEO
Quick Start
1. Install Requirements
1.1 General Setup
Follow the original repo to check that your environment is ready. **Python 3.7 or higher** is needed to run the toolbox.
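As a quick sanity check (a minimal sketch, not part of the repo), you can verify the interpreter version before installing anything:

```python
import sys

# The toolbox needs Python 3.7+; 3.9 is recommended to avoid
# PyTorch wheel resolution errors.
if sys.version_info < (3, 7):
    raise SystemExit(f"Python 3.7+ required, found {sys.version.split()[0]}")
print("Python version OK:", sys.version.split()[0])
```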
- Install PyTorch.
If you get an error like `ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu102 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)`, it is probably due to a low Python version; try 3.9 and it should install successfully.
- Install ffmpeg.
Run `pip install -r requirements.txt` to install the remaining necessary packages.
The recommended environment here is repo tag `0.0.1` with PyTorch `1.9.0`, Torchvision `0.10.0`, cudatoolkit `10.2`, `requirements.txt`, and `webrtcvad-wheels`, because `requirements.txt` was exported a few months ago and doesn't work with newer versions.
- Install webrtcvad with `pip install webrtcvad-wheels` (if you need it).

or

- Install dependencies with `conda` or `mamba`:

  `conda env create -n env_name -f env.yml`

  `mamba env create -n env_name -f env.yml`

  This creates a virtual environment with the necessary dependencies installed. Switch to the new environment with `conda activate env_name` and enjoy it.

  Note that `env.yml` only includes the dependencies necessary to run the project, temporarily without `monotonic-align`. You can check the official website to install the GPU version of PyTorch.
1.2 Setup on an M1 Mac

The following steps are a workaround to use the original `demo_toolbox.py` directly, without changing the code. Since the major issue is that the PyQt5 package used in `demo_toolbox.py` is not compatible with M1 chips, anyone attempting to train models on an M1 chip can either forgo `demo_toolbox.py` or try `web.py` in the project.
1.2.1 Install PyQt5, with ref here.

- Create and open a Rosetta Terminal, with ref here.
- Use the system Python to create a virtual environment for the project:

  `/usr/bin/python3 -m venv /PathToMockingBird/venv`

  `source /PathToMockingBird/venv/bin/activate`

- Upgrade pip and install PyQt5:

  `pip install --upgrade pip`

  `pip install pyqt5`
1.2.2 Install pyworld and ctc-segmentation
Both packages seem to be unique to this project and are not present in the original Real-Time Voice Cloning project. When installing with `pip install`, both packages lack wheels, so pip tries to compile them directly from C code and cannot find `Python.h`.
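Independent of the brew-specific fix below, you can ask any interpreter where its own C headers live; a minimal sketch (not the project's code):

```python
import sysconfig

# sysconfig knows where this interpreter's C headers (Python.h) live;
# pointing CPLUS_INCLUDE_PATH at this directory helps source builds find them.
include_dir = sysconfig.get_paths()["include"]
print("Python.h should be under:", include_dir)
```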
- Install pyworld:

  - `brew install python` (`Python.h` comes with the Python installed by brew)
  - `export CPLUS_INCLUDE_PATH=/opt/homebrew/Frameworks/Python.framework/Headers` (the file path of the brew-installed `Python.h` is unique to M1 macOS and listed above; you need to manually add it to the environment variables)
  - `pip install pyworld`, and that should do it.
- Install ctc-segmentation:

  The same method does not apply to `ctc-segmentation`, which needs to be compiled from its source code on GitHub:

  - `git clone https://github.com/lumaku/ctc-segmentation.git`
  - `cd ctc-segmentation`
  - `source /PathToMockingBird/venv/bin/activate` (activate the virtual environment if it hasn't been already)
  - `cythonize -3 ctc_segmentation/ctc_segmentation_dyn.pyx`
  - `/usr/bin/arch -x86_64 python setup.py build` (build with x86 architecture)
  - `/usr/bin/arch -x86_64 python setup.py install --optimize=1 --skip-build` (install with x86 architecture)
1.2.3 Other dependencies
- `/usr/bin/arch -x86_64 pip install torch torchvision torchaudio` (taking PyTorch as an example, this installs it with x86 architecture)
- `pip install ffmpeg` (install ffmpeg)
- `pip install -r requirements.txt` (install the other requirements)
1.2.4 Run the Inference Time (with Toolbox)
To run the project on x86 architecture, see the ref.

- `vim /PathToMockingBird/venv/bin/pythonM1` (create an executable file `pythonM1` to condition the Python interpreter at `/PathToMockingBird/venv/bin`)
- Write in the following content:

  `#!/usr/bin/env zsh`

  `mydir=${0:a:h}`

  `/usr/bin/arch -x86_64 $mydir/python "$@"`

- `chmod +x pythonM1` (set the file as executable)
- If using the PyCharm IDE, configure the project interpreter to `pythonM1` (steps here); if using command-line Python, run `/PathToMockingBird/venv/bin/pythonM1 demo_toolbox.py`
2. Prepare your models
Note that we are using the pretrained encoder/vocoder but not the pretrained synthesizer, since the original model is incompatible with Chinese symbols. This means `demo_cli` does not work at the moment, so additional synthesizer models are required.
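For orientation, the data flow through the three models can be sketched as below. Every function body and size here is a hypothetical stand-in, not the repo's actual API; only the shape of the pipeline (encoder, then synthesizer, then vocoder) matches the description above.

```python
import numpy as np

# Stand-in stages of the SV2TTS pipeline; the real models replace these stubs.
def encoder(ref_wav: np.ndarray) -> np.ndarray:
    """Pretrained speaker encoder: waveform -> fixed-size speaker embedding."""
    return np.random.rand(256)  # 256-dim embedding, a common choice

def synthesizer(text: str, embed: np.ndarray) -> np.ndarray:
    """Newly trained (Chinese) synthesizer: text + embedding -> mel frames."""
    n_frames = 10 * len(text)            # fake length, proportional to text
    return np.random.rand(n_frames, 80)  # 80 mel bands, a common choice

def vocoder(mel: np.ndarray) -> np.ndarray:
    """Pretrained vocoder: mel frames -> waveform samples."""
    hop = 200                            # fake hop size
    return np.random.rand(mel.shape[0] * hop)

ref = np.random.rand(16000)              # 1 s of "reference" audio at 16 kHz
wav = vocoder(synthesizer("你好", encoder(ref)))
print(wav.shape)  # (4000,)
```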
You can either train your models or use existing ones:
2.1 Train encoder with your dataset (Optional)
- Preprocess the audios and the mel spectrograms:

  `python encoder_preprocess.py <datasets_root>`

  Pass `--dataset {dataset}` to select the datasets you want to preprocess. Only the train set of these datasets will be used. Possible names: librispeech_other, voxceleb1, voxceleb2. Use commas to separate multiple datasets.
- Train the encoder: `python encoder_train.py my_run <datasets_root>/SV2TTS/encoder`
For training, the encoder uses visdom. You can disable it with `--no_visdom`, but it's nice to have. Run `visdom` in a separate CLI/process to start your visdom server.
2.2 Train synthesizer with your dataset
- Download a dataset and unzip it: make sure you can access all the .wav files in the folder.
- Preprocess the audios and the mel spectrograms:

  `python pre.py <datasets_root>`

  Pass `--dataset {dataset}` to select aidatatang_200zh, magicdata, aishell3, data_aishell, etc. If this parameter is not passed, the default dataset is aidatatang_200zh.
- Train the synthesizer: `python train.py --type=synth mandarin <datasets_root>/SV2TTS/synthesizer`
- Go to the next step when you see the attention line appear and the loss meets your need, in the training folder `synthesizer/saved_models/`.
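The mel spectrograms produced in preprocessing are FFT magnitude frames projected through triangular mel filters. As an illustration only (this is not the code in `pre.py`, and the sizes are common defaults, not the repo's), a minimal numpy sketch of such a filterbank:

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy mel scale, used by most speech toolkits
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=16000, n_fft=512, n_mels=40):
    """Triangular mel filters mapping an FFT magnitude frame to mel bands."""
    # Band edges equally spaced on the mel scale, then mapped to FFT bins
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sr).astype(int)

    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising slope of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

fb = mel_filterbank()
print(fb.shape)  # (40, 257): one row per mel band, one column per FFT bin
```

A mel frame is then just `fb @ np.abs(fft_frame)` for each short-time FFT frame.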
2.3 Use pretrained model of synthesizer
Thanks to the community, some models will be shared:
| author | Download link | Preview Video | Info |
| --- | --- | --- | --- |
| @author | https://pan.baidu.com/s/1iONvRxmkI-t1nHqxKytY3g (Baidu code: 4j5d) | | 75k steps trained on multiple datasets |
| @author | https://pan.baidu.com/s/1fMh9IlgKJlL2PIiRTYDUvw (Baidu code: om7f) | | 25k steps trained on multiple datasets, only works under version 0.0.1 |
| @FawenYo | https://yisiou-my.sharepoint.com/:u:/g/personal/lawrence_cheng_fawenyo_onmicrosoft_com/EWFWDHzee-NNg9TWdKckCc4BC7bK2j9cCbOWn0-_tK0nOg?e=n0gGgC | input / output | 200k steps with local accent of Taiwan, only works under version 0.0.1 |
| @miven | https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ (code: 2021), https://www.aliyundrive.com/s/AwPsbo8mcSP (code: z2m0) | https://www.bilibili.com/video/BV1uh411B7AD/ | only works under version 0.0.1 |
2.4 Train vocoder (Optional)
Note: the vocoder makes little difference to the result, so you may not need to train a new one.
- Preprocess the data:
python vocoder_preprocess.py <datasets_root> -m <synthesizer_model_path>
Replace `<datasets_root>` with your dataset root and `<synthesizer_model_path>` with the directory of your best trained synthesizer model, e.g. `synthesizer/saved_models/xxx`.
- Train the wavernn vocoder: `python vocoder_train.py mandarin <datasets_root>`
- Train the hifigan vocoder: `python vocoder_train.py mandarin <datasets_root> hifigan`
3. Launch
3.1 Using the web server
You can then try to run `python web.py` and open it in a browser, by default at http://localhost:8080.
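For remote calling, a client only needs to make an HTTP request against the server. A hedged sketch: the `/api/synthesize` endpoint and JSON payload below are illustrative assumptions, not the server's documented API; check `web.py` for the actual routes.

```python
import json
import urllib.request

def build_synthesis_request(text, host="http://localhost:8080"):
    """Build a POST request for a hypothetical synthesis endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        host + "/api/synthesize",   # assumed endpoint, see web.py for real ones
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_synthesis_request("你好，世界")
print(req.get_method(), req.full_url)
# Send with urllib.request.urlopen(req) once the server is running.
```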
3.2 Using the Toolbox
You can then try the toolbox:
python demo_toolbox.py -d <datasets_root>
3.3 Using the command line
You can then try the command line: `python gen_voice.py <text_file.txt> your_wav_file`
