NeuralSVB

Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code

Learning the Beauty in Songs: Neural Singing Voice Beautifier


<div align="center"> <a href="https://neuralsvb.github.io" target="_blank">Demo&nbsp;Page</a> </div>

This repository is the official PyTorch implementation of our ACL-2022 paper.

0. Dataset (PopBuTFy) Acquisition

Audio samples

  • You can download the dataset from here. Please email us to register (see apply_form).
  • Dataset preview.

Text labels

NeuralSVB does not need text as input, but the ASR model used to extract PPGs (phonetic posteriorgrams) does need text. We therefore also provide the text labels of PopBuTFy.

<!-- We recommend mixing [LibriTTS](https://www.openslr.org/60/) with PopBuTFy to train the ASR model. -->

1. Preparation

Environment Preparation

Most of the required packages are listed in https://github.com/NATSpeech/NATSpeech/blob/main/requirements.txt

Alternatively, you can set up the environment with the Requirements.txt file in the repository directory:

pip install -r Requirements.txt

Data Preparation

  1. Extract embeddings of vocal timbre:
    CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config egs/datasets/audio/PopBuTFy/save_emb.yaml
    
  2. Pack the dataset:
    CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config egs/datasets/audio/PopBuTFy/para_bin.yaml
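
The two steps above must run in order, since the packing step presumably relies on the embeddings written by step 1. A minimal sketch of a wrapper that chains them, assuming it is launched from the repository root; the helper names `build_binarize_cmd` and `run_data_prep` are hypothetical and not part of this repository:

```python
import os
import subprocess
import sys

# Config paths copied from the two steps above.
PREP_CONFIGS = [
    "egs/datasets/audio/PopBuTFy/save_emb.yaml",  # step 1: timbre embeddings
    "egs/datasets/audio/PopBuTFy/para_bin.yaml",  # step 2: pack the dataset
]


def build_binarize_cmd(config, python=sys.executable):
    """Return the argv list for one binarization step (hypothetical helper)."""
    return [python, "data_gen/tts/bin/binarize.py", "--config", config]


def run_data_prep(gpu="0", dry_run=True):
    """Build both commands and, unless dry_run is set, run them in order.

    With dry_run=True nothing is executed; the commands are only returned
    so they can be inspected first.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    cmds = [build_binarize_cmd(cfg) for cfg in PREP_CONFIGS]
    if not dry_run:
        for cmd in cmds:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so step 2 never runs on top of a failed step 1.
            subprocess.run(cmd, env=env, check=True)
    return cmds
```

Calling `run_data_prep(dry_run=False)` from the repository root would then execute both steps on GPU 0 and stop at the first failure.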
    

Vocoder Preparation

We provide a pre-trained HifiGAN-Singing model, which is specially designed for singing voice synthesis (SVS) and uses the NSF mechanism.

Please unzip the pre-trained vocoder into checkpoints before training your acoustic model.

This singing vocoder was trained on more than 100 hours of singing data (including Chinese and English songs).

PPG Extractor Preparation

We provide a pre-trained PPG extractor.

Please unzip the pre-trained PPG extractor into checkpoints before training your acoustic model.

After completing the steps above, the directory structure should be as follows:

.
|--data
    |--processed
        |--PopBuTFy (unzip PopBuTFy.zip)
            |--data
                |--directories containing wavs
    |--binary
        |--PopBuTFyENSpkEM
|--checkpoints
    |--1009_pretrain_asr_english
        |--
        |--config.yaml
    |--1012_hifigan_all_songs_nsf
        |--
        |--config.yaml
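
Before starting training, it can help to confirm that this layout is in place. The following is a minimal sketch, not part of the repository: `EXPECTED` lists only the paths named in the tree above (the checkpoint file names left blank in the tree are not checked), and `missing_paths` is a hypothetical helper.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the repository root.
EXPECTED = [
    "data/processed/PopBuTFy/data",
    "data/binary/PopBuTFyENSpkEM",
    "checkpoints/1009_pretrain_asr_english/config.yaml",
    "checkpoints/1012_hifigan_all_songs_nsf/config.yaml",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list from `missing_paths(".")` means every path named in the tree was found; anything returned points to a step that still needs to be done.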

2. Training Example

CUDA_VISIBLE_DEVICES=0,1 python tasks/run.py --config egs/datasets/audio/PopBuTFy/vae_global_mle_eng.yaml --exp_name exp_name --reset

3. Inference

Inference from packed test set

CUDA_VISIBLE_DEVICES=0,1 python tasks/run.py --config egs/datasets/audio/PopBuTFy/vae_global_mle_eng.yaml --exp_name exp_name --reset --infer

Inference results will be saved in ./checkpoints/EXP_NAME/generated_ by default.
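
The training and inference commands above differ only in the trailing `--infer` flag. A small sketch that builds either command from one place, assuming you want to launch `tasks/run.py` from a Python script; `build_run_cmd` and `gpu_env` are hypothetical helpers, not part of this repository:

```python
import sys

# Config path copied from the training and inference commands above.
CONFIG = "egs/datasets/audio/PopBuTFy/vae_global_mle_eng.yaml"


def build_run_cmd(exp_name, infer=False, python=sys.executable):
    """Return the argv list for tasks/run.py in training or inference mode."""
    cmd = [python, "tasks/run.py", "--config", CONFIG,
           "--exp_name", exp_name, "--reset"]
    if infer:
        cmd.append("--infer")  # the only difference from the training command
    return cmd


def gpu_env(gpus=(0, 1)):
    """Format GPU ids as a CUDA_VISIBLE_DEVICES value, comma-separated."""
    return ",".join(str(g) for g in gpus)
```

The result can be passed to `subprocess.run(cmd, env=dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_env()), check=True)`. Use the same `exp_name` for training and inference so that inference loads the checkpoint written during training.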

We provided:

Remember to put the pre-trained models in the checkpoints directory.

Inference from raw inputs

WIP.

Limitations

See Appendix D "Limitations and Solutions" in our paper.

Citation

If this repository helps your research, please cite:

@inproceedings{liu-etal-2022-learning-beauty,
    title = "Learning the Beauty in Songs: Neural Singing Voice Beautifier",
    author = "Liu, Jinglin  and
      Li, Chengxi  and
      Ren, Yi  and
      Zhu, Zhiying  and
      Zhao, Zhou",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.549",
    pages = "7970--7983",
}

Issues

  • Before raising an issue, please check our README and existing issues for possible solutions.
  • We will try to handle your problem promptly, but we cannot guarantee a satisfying solution.
  • Please be friendly.

Acknowledgements

The framework of this repository is based on DiffSinger, and is a predecessor of NATSpeech.
