# Neural Formant Synthesis with Differentiable Resonant Filters
Neural formant synthesis using differentiable resonant filters and a source-filter model structure.
Authors: Lauri Juvela, Pablo Pérez Zarazaga, Gustav Eje Henter, Zofia Malisz
## Table of contents
- [Model overview](#model_struct)
- [Sound samples](#sound_samples)
- [Repository installation](#install)
- [Inference](#inference)
- [Model training](#training)
- [Citation information](#citation)
## Model overview <a name="model_struct"></a>
We present a model that performs neural speech synthesis using the structure of the source-filter model, which allows the spectral envelope and the glottal excitation to be inspected and manipulated independently:

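To illustrate the underlying idea (this is a conceptual sketch, not code from this repository), the classical source-filter model generates speech by passing a glottal excitation signal through resonant (formant) filters. A minimal NumPy/SciPy example with hypothetical parameter values — an impulse-train source and a single two-pole resonator standing in for one formant:

```python
import numpy as np
from scipy.signal import lfilter

sr = 16000          # sample rate (Hz); illustrative value
f0 = 120.0          # fundamental frequency of the excitation (Hz)
n = sr              # one second of signal

# Source: an impulse train standing in for the glottal excitation.
excitation = np.zeros(n)
excitation[:: int(sr / f0)] = 1.0

# Filter: one resonance (formant) modelled as a two-pole resonator.
f_formant, bandwidth = 700.0, 100.0          # hypothetical F1 and bandwidth (Hz)
r = np.exp(-np.pi * bandwidth / sr)          # pole radius from the bandwidth
theta = 2.0 * np.pi * f_formant / sr         # pole angle from the centre frequency
a = [1.0, -2.0 * r * np.cos(theta), r * r]   # all-pole (IIR) denominator

# Output: excitation filtered by the resonator; the spectrum now peaks
# near the formant frequency while the harmonic structure follows f0.
speech = lfilter([1.0], a, excitation)
```

Scaling `f0` changes the pitch while the spectral envelope (the resonator) stays fixed, and scaling `f_formant` moves the formant while the pitch stays fixed — the independence the model above exploits.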
## Sound samples <a name="sound_samples"></a>
A description of the presented model, together with sound samples and comparisons to other synthesis/manipulation systems, can be found on the project's demo webpage.
## Repository installation <a name="install"></a>
### Conda environment <a name="conda"></a>
First, create a conda environment with the required dependencies. Use mamba to speed up the process if possible:
```shell
mamba env create -n neuralformants -f environment.yml
conda activate neuralformants
```
Pre-trained models are available on HuggingFace and can be downloaded using git-lfs. If you don't have git-lfs installed (it is included in environment.yml), you can find it here. Use the following command to download the pre-trained models:
```shell
git submodule update --init --recursive
```
Install the package in development mode:
```shell
pip install -e .
```
### GlotNet <a name="glotnet"></a>
GlotNet is partially included for its WaveNet models and DSP functions. The full repository is available here.
### HiFi-GAN <a name="hifi"></a>
HiFi-GAN is included in the hifi_gan subdirectory. The original source code is available here.
## Inference <a name="inference"></a>
We provide scripts to run inference on the end-to-end architecture: an audio file is given as input, and a wav file with the manipulated features is stored as output.
Change the feature scaling to modify the pitch (via F0) or the formants. The scales are provided as a list of five elements in the following order:
```
[F0, F1, F2, F3, F4]
```
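For instance, a scale of `1.2` on F0 with unity scales on F1–F4 raises the pitch by 20% while leaving the formant frequencies unchanged. A small sketch of this interpretation (the feature names and values below are hypothetical; the real features are extracted internally by the model):

```python
import json

# The scale list as passed on the command line, parsed as JSON.
feature_scale = json.loads("[1.2, 1.0, 1.0, 1.0, 1.0]")  # raise F0 by 20%

# Hypothetical per-frame feature values (Hz), in the documented order.
features = {"F0": 120.0, "F1": 500.0, "F2": 1500.0, "F3": 2500.0, "F4": 3500.0}

# Element-wise scaling: each feature is multiplied by its scale factor.
scaled = {name: value * s for (name, value), s in zip(features.items(), feature_scale)}
# F0 becomes 144 Hz; F1-F4 are unchanged.
```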
An example with the provided audio samples from the VCTK dataset can be run using:
### HiFi-Glot
```shell
python inference_hifiglot.py \
    --input_path "./Samples" \
    --output_path "./output/hifi-glot" \
    --config "./checkpoints/HiFi-Glot/config_hifigan.json" \
    --fm_config "./checkpoints/HiFi-Glot/config_feature_map.json" \
    --checkpoint_path "./checkpoints/HiFi-Glot" \
    --feature_scale "[1.0, 1.0, 1.0, 1.0, 1.0]"
```
### NFS
```shell
python inference_hifigan.py \
    --input_path "./Samples" \
    --output_path "./output/nfs" \
    --config "./checkpoints/NFS/config_hifigan.json" \
    --fm_config "./checkpoints/NFS/config_feature_map.json" \
    --checkpoint_path "./checkpoints/NFS" \
    --feature_scale "[1.0, 1.0, 1.0, 1.0, 1.0]"
```
### NFS-E2E
```shell
python inference_hifigan.py \
    --input_path "./Samples" \
    --output_path "./output/nfs-e2e" \
    --config "./checkpoints/NFS-E2E/config_hifigan.json" \
    --fm_config "./checkpoints/NFS-E2E/config_feature_map.json" \
    --checkpoint_path "./checkpoints/NFS-E2E" \
    --feature_scale "[1.0, 1.0, 1.0, 1.0, 1.0]"
```
## Model training <a name="training"></a>
The HiFi-GAN and HiFi-Glot models can be trained with the end-to-end architecture using the scripts train_e2e_hifigan.py and train_e2e_hifiglot.py.
## Citation information <a name="citation"></a>
Citation information will be added when a pre-print is available.