# TSNetVocoder
- This software is distributed under BSD 3-Clause license. Please see LICENSE for more details.
- Paper : http://arxiv.org/abs/1810.11945
- Speech samples : https://nii-yamagishilab.github.io/TSNetVocoder/index.html
## Reference
- Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi, "STFT spectral loss for training a neural speech waveform model," arXiv preprint arXiv:1810.11945, 2018.
## Requirements
- See Dockerfile.
## Usage
- Wav files need to be placed in the 'data/wav_trn' (training), 'data/wav_val' (validation), and 'data/wav_test' (analysis-by-synthesis) directories.
- The following file format is supported:
- Sampling rate : 16000
    - Quantization : 16-bit (signed integer)
- Number of channels : 1
- Each utterance should be stored in one wav file.
- By running 00_run.py, you can find a trained model and analysis-by-synthesis wav files in the 'model' and 'gen' directories, respectively.

```
python3 00_run.py
```
## Using alpha (option)
- The `alphadir` entry in Config.py needs to be modified:

```
alphadir = {'trn' : datadir + '/alpha_trn',
            'val' : datadir + '/alpha_val',
            'test' : None}
```
- Alpha files (format: float, extension: .alpha) need to be put in 'data/alpha_trn' and 'data/alpha_val'.
- For example, you can use voiced/unvoiced flags as alpha and extract them from a speech waveform using SPTK (http://sp-tk.sourceforge.net/) as follows:

```
wav2raw -d ./ hoge.wav
x2x +sf hoge.raw | pitch -p 80 -o 1 | sopr -c 1.0 | interpolate -l 1 -p 257 -d > hoge.alpha
```
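If SPTK is not available, a rough Python substitute can produce a comparable sample-level voiced/unvoiced alpha track. This is a sketch, not the authors' method: it replaces SPTK's F0-based detection with a crude energy threshold (the `threshold` value is a hypothetical stand-in), while the frame shift of 80 and upsampling factor of 257 mirror the `pitch -p 80` and `interpolate -p 257` options above:

```python
import numpy as np

def vuv_alpha(wav, frame_shift=80, upsample=257, threshold=0.01):
    """Crude energy-based voiced/unvoiced flags as an alpha signal.

    Each frame of `frame_shift` samples gets a 0/1 flag, which is
    then held for `upsample` samples (zero-order hold), roughly
    matching the SPTK pipeline's frame-to-sample interpolation.
    """
    n_frames = len(wav) // frame_shift
    flags = np.zeros(n_frames, dtype=np.float32)
    for i in range(n_frames):
        frame = wav[i * frame_shift:(i + 1) * frame_shift]
        rms = np.sqrt(np.mean(frame.astype(np.float64) ** 2))
        flags[i] = 1.0 if rms > threshold else 0.0
    return np.repeat(flags, upsample)
```

The result can then be written out as a raw float file with the '.alpha' extension.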
## Who we are
- Shinji Takaki (https://researchmap.jp/takaki/?lang=english)
- Toru Nakashika (http://www.sd.is.uec.ac.jp/nakashika/)
- Xin Wang (https://researchmap.jp/wangxin/?lang=english)
- Junichi Yamagishi (https://researchmap.jp/read0205283/?lang=english)