# TSNetVocoder
- This software is distributed under BSD 3-Clause license. Please see LICENSE for more details.
- Paper : http://arxiv.org/abs/1810.11945
- Speech samples : https://nii-yamagishilab.github.io/TSNetVocoder/index.html
## Reference
- Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi, "STFT spectral loss for training a neural speech waveform model," arXiv preprint arXiv:1810.11945, 2018.
## Requirements
- See Dockerfile.
## Usage
- Wav files need to be placed in the 'data/wav_trn' (training), 'data/wav_val' (validation), and 'data/wav_test' (analysis-by-synthesis) directories.
- The following file format is supported:
- Sampling rate : 16000
    - Quantization : 16-bit (signed integer)
- Number of channels : 1
- Each utterance should be stored in one wav file.
- By running 00_run.py, you can find a trained model and analysis-by-synthesis wav files in the 'model' and 'gen' directories, respectively.

```
python3 00_run.py
```
## Using alpha (option)
- The `alphadir` entry in Config.py needs to be modified:

```
alphadir = {'trn' : datadir + '/alpha_trn',
            'val' : datadir + '/alpha_val',
            'test' : None}
```
- Alpha files (format: float, extension: .alpha) need to be put in 'data/alpha_trn' and 'data/alpha_val'.
- For example, you can use voiced/unvoiced flags as alpha and extract them from a speech waveform using SPTK (http://sp-tk.sourceforge.net/) as follows:

```
wav2raw -d ./ hoge.wav
x2x +sf hoge.raw | pitch -p 80 -o 1 | sopr -c 1.0 | interpolate -l 1 -p 257 -d > hoge.alpha
```
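If SPTK is not available, a rough Python substitute can produce a comparable sample-level voiced/unvoiced alpha track. This is a sketch, not the authors' method: it replaces SPTK's F0-based detection with a crude energy threshold (the `threshold` value is a hypothetical stand-in), while the frame shift of 80 and upsampling factor of 257 mirror the `pitch -p 80` and `interpolate -p 257` options above:

```python
import numpy as np

def vuv_alpha(wav, frame_shift=80, upsample=257, threshold=0.01):
    """Crude energy-based voiced/unvoiced flags as an alpha signal.

    Each frame of `frame_shift` samples gets a 0/1 flag, which is
    then held for `upsample` samples (zero-order hold), roughly
    matching the SPTK pipeline's frame-to-sample interpolation.
    """
    n_frames = len(wav) // frame_shift
    flags = np.zeros(n_frames, dtype=np.float32)
    for i in range(n_frames):
        frame = wav[i * frame_shift:(i + 1) * frame_shift]
        rms = np.sqrt(np.mean(frame.astype(np.float64) ** 2))
        flags[i] = 1.0 if rms > threshold else 0.0
    return np.repeat(flags, upsample)
```

The result can then be written out as a raw float file with the '.alpha' extension.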
## Who we are
- Shinji Takaki (https://researchmap.jp/takaki/?lang=english)
- Toru Nakashika (http://www.sd.is.uec.ac.jp/nakashika/)
- Xin Wang (https://researchmap.jp/wangxin/?lang=english)
- Junichi Yamagishi (https://researchmap.jp/read0205283/?lang=english)