# Contrastive Voice Conversion (CVC)

PyTorch implementation of "CVC: Contrastive Learning for Non-Parallel Voice Conversion" (INTERSPEECH 2021).
Video (3m) | Website | Paper
<br> <img src='figs/CVC.jpg' align="center" width=800><br><br>
This implementation is based on CUT; thanks to Taesung and Jun-Yan for sharing their code.

We provide a PyTorch implementation of non-parallel voice conversion based on patch-wise contrastive learning and adversarial learning. Compared to the CycleGAN-VC baseline, CVC requires only one-way GAN training for non-parallel one-to-one voice conversion, while improving speech quality and reducing training time.
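The patch-wise contrastive objective can be illustrated with a minimal InfoNCE sketch. This is a plain-Python toy, not the repo's implementation: the actual model computes this over learned feature embeddings in PyTorch, and the function name, cosine similarity, and temperature value here are illustrative assumptions.

```python
import math

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query patch: pull the query toward its
    positive (the patch at the same position in the source) and push
    it away from negative patches. Vectors are plain lists of floats;
    this is an illustrative sketch only."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cos(a, b):
        na = math.sqrt(dot(a, a)) or 1.0
        nb = math.sqrt(dot(b, b)) or 1.0
        return dot(a, b) / (na * nb)

    # Similarity logits: positive first, then all negatives.
    logits = [cos(query, positive) / temperature]
    logits += [cos(query, n) / temperature for n in negatives]

    # Cross-entropy with the positive as the target class
    # (numerically stable log-sum-exp).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

When the query matches its positive, the loss is near zero; when it matches a negative instead, the loss grows, which is what drives the converted patch to keep the content of the corresponding input patch.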
## Prerequisites
- Linux or macOS
- Python 3
- CPU or NVIDIA GPU + CUDA CuDNN
## Kick Start

- Clone this repo:

  ```bash
  git clone https://github.com/Tinglok/CVC
  cd CVC
  ```

- Install PyTorch 1.6 and other dependencies.

  For pip users, run `pip install -r requirements.txt`. For Conda users, create a new environment with `conda env create -f environment.yaml`.

- Download the pre-trained Parallel WaveGAN vocoder to `./checkpoints/vocoder`.
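Before training, it can save time to verify the vocoder files actually landed in place. A small sanity check, assuming the directory name from the step above; the expected file extensions (`.pkl` checkpoint plus a YAML config, as Parallel WaveGAN releases typically ship) are an assumption:

```python
from pathlib import Path

def check_vocoder(checkpoint_dir="./checkpoints/vocoder"):
    """Sanity-check that the pre-trained vocoder files are present.
    Returns (ok, message). The extensions checked are assumptions
    about the Parallel WaveGAN release layout."""
    ckpt = Path(checkpoint_dir)
    if not ckpt.is_dir():
        return False, f"missing directory: {ckpt}"
    files = (list(ckpt.glob("*.pkl"))
             + list(ckpt.glob("*.yml"))
             + list(ckpt.glob("*.yaml")))
    return bool(files), f"found {len(files)} vocoder file(s) in {ckpt}"
```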
## CVC Training and Test

- Download the VCTK dataset:

  ```bash
  cd dataset
  wget http://datashare.is.ed.ac.uk/download/DS_10283_2651.zip
  unzip DS_10283_2651.zip
  unzip VCTK-Corpus.zip
  cp -r ./VCTK-Corpus/wav48/p* ./voice/trainA
  cp -r ./VCTK-Corpus/wav48/p* ./voice/trainB
  ```

  where the speaker folders can be any speakers (e.g. `p256` and `p270`).
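After copying, a quick count of utterances per speaker folder helps confirm the layout is what the training script expects. A stdlib-only sketch; the path layout follows the copy commands above:

```python
import os

def count_utterances(root):
    """Count .wav files per speaker folder under a train directory
    (e.g. dataset/voice/trainA). Returns {folder_name: n_wavs}.
    Purely a sanity check on the dataset layout."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        wavs = [f for f in filenames if f.lower().endswith(".wav")]
        if wavs:
            counts[os.path.basename(dirpath)] = len(wavs)
    return counts
```

An empty result for `trainA` or `trainB` usually means the `cp` globs did not match any speaker folders.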
- Train the CVC model:

  ```bash
  python train.py --dataroot ./datasets/voice --name CVC
  ```

  The checkpoints will be stored at `./checkpoints/CVC/`.
- Test the CVC model:

  ```bash
  python test.py --dataroot ./datasets/voice --validation_A_dir ./datasets/voice/trainA --output_A_dir ./checkpoints/CVC/converted_sound
  ```

  The converted utterances will be saved at `./checkpoints/CVC/converted_sound`.
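To inspect a converted utterance without extra dependencies, the stdlib `wave` module can report its sample rate and duration (the output rate depends on the vocoder config, so no specific value is assumed here):

```python
import wave

def wav_info(path):
    """Return (sample_rate, duration_seconds) for a WAV file,
    e.g. one from ./checkpoints/CVC/converted_sound."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        frames = w.getnframes()
        return rate, frames / float(rate)
```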
## Baseline CycleGAN-VC Training and Test

- Train the CycleGAN-VC model:

  ```bash
  python train.py --dataroot ./datasets/voice --name CycleGAN --model cycle_gan
  ```

- Test the CycleGAN-VC model:

  ```bash
  python test.py --dataroot ./datasets/voice --validation_A_dir ./datasets/voice/trainA --output_A_dir ./checkpoints/CycleGAN/converted_sound --model cycle_gan
  ```

  The converted utterances will be saved at `./checkpoints/CycleGAN/converted_sound`.
## Pre-trained CVC Model

Pre-trained models on p270-to-p256 and many-to-p249 are available at this URL.
## TensorBoard Visualization

To view loss plots, run `tensorboard --logdir=./checkpoints` and open http://localhost:6006/ in your browser.
## Citation

If you use this code for your research, please cite our paper:

```bibtex
@inproceedings{li2021cvc,
  author={Tingle Li and Yichen Liu and Chenxu Hu and Hang Zhao},
  title={{CVC: Contrastive Learning for Non-Parallel Voice Conversion}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1324--1328}
}
```