24 skills found
yl4579 / StyleTTS2 - StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
s3prl / S3prl - Self-Supervised Speech Pre-training and Representation Learning Toolkit
wenet-e2e / Wespeaker - Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
ASR-project / Multilingual PR - Phoneme recognition using the pre-trained models Wav2vec2, HuBERT and WavLM. This project compares three self-supervised models, Wav2vec (2019, 2020), HuBERT (2021) and WavLM (2022), each pretrained on a corpus of English speech, and uses them in various ways to perform phoneme recognition for different languages with a network trained with the Connectionist Temporal Classification (CTC) algorithm.
lucadellalib / Focalcodec - A low-bitrate single-codebook 16 / 24 kHz speech codec based on focal modulation
sinhat98 / Adapter Wavlm - No description available
lucadellalib / Audiocodecs - A collection of audio codecs with a standardized API
mjhydri / Singing Vocal Beat Tracking - This repo contains the source code of the first deep learning-based singing voice beat tracking system. It leverages the WavLM and DistilHuBERT pre-trained speech models to create vocal embeddings and trains linear multi-head self-attention layers on top of them to extract vocal beat activations. It then uses an HMM decoder to infer singing beats and tempo.
lucadellalib / Discrete Wavlm Codec - A neural speech codec based on discrete WavLM representations
hi-paris / Wavlm Vocoder French - WavLM-to-Audio neural vocoder for French speech reconstruction: layer ablation study and adversarial supervision as a foundation for continuous voice conversion (JEP 2026)
Amir-Ivry / MAPSS Measures - The code for the MAPSS measures for source separation evaluation (ICLR 2026)
bunyaminergen / WavLMMSDD - This repository combines `WavLM`, a powerful speech representation model from Microsoft, with `MSDD` (Multi-Scale Diarization Decoder), a state-of-the-art approach for speaker diarization from Nvidia.
AnshKapadia / TS VAD Plus - TS-VAD+: Transformer-based speaker diarization system developed as part of my MS thesis at NTU Singapore, improving diarization in overlapping speech using ECAPA-TDNN, WavLM, VBx, and memory-aware attention.
Sarasadeghii / Sharif WavLM - In this repository, the WavLM model is applied to high-quality and poor-quality data for the speaker verification task, and the PyCM library is used for evaluation.
sadPororo / L TDNN - Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models, to appear in ICAIIC 2026
sadPororo / LAP - Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification, ISCA Interspeech 2025
alessandropec / Data Driven AI Voice Cloning - This repository contains the code for the main part of my master's thesis in Data Science & Engineering at Politecnico di Torino.
theolepage / Wavlm Ssl Sv - SOTA method for self-supervised speaker verification leveraging a large-scale pretrained ASR model.
WhiteDogz / Wavlm Sv - Speaker encoder based on the speaker verification task, using pre-trained WavLM.
TheKOG / Voice Extractor Based On Whisper And WavLM - No description available