39 repositories found · Page 1 of 2
facebookresearch / Denoiser: Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The model is based on an encoder-decoder architecture with skip connections and is optimized in both the time and frequency domains using multiple loss functions. Empirical evidence shows that it removes various kinds of background noise, including stationary and non-stationary noise as well as room reverb. Additionally, we propose a set of data augmentation techniques applied directly to the raw waveform that further improve the model's performance and generalization.
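Raw-waveform augmentations of the kind the Denoiser description mentions can be sketched in NumPy. The function name, parameter defaults, and the particular combination of random gain, circular time shift, and SNR-controlled noise mixing below are illustrative assumptions, not the repo's actual pipeline:

```python
import numpy as np

def augment_waveform(wav, noise, rng, snr_db=10.0, max_gain_db=3.0, max_shift=1600):
    """Apply simple raw-waveform augmentations: random gain, random
    circular time shift, and additive noise at a target SNR (in dB)."""
    # Random gain drawn uniformly in [-max_gain_db, +max_gain_db].
    gain = 10.0 ** (rng.uniform(-max_gain_db, max_gain_db) / 20.0)
    out = wav * gain
    # Random circular time shift of up to max_shift samples.
    out = np.roll(out, rng.integers(-max_shift, max_shift + 1))
    # Scale the noise so the mixture hits the requested SNR, then mix.
    sig_pow = np.mean(out ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(sig_pow / (noise_pow * 10.0 ** (snr_db / 10.0)))
    return out + scale * noise[: len(out)]
```

Because each transform operates sample-wise on the signal, such augmentations compose freely and can be applied on the fly during training.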
facebookresearch / WavAugment: A library for speech data augmentation in the time domain
zcaceres / Spec Augment: 🔦 A PyTorch implementation of Google Brain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
emexlabs / WearableIntelligenceSystem: Wearable computing software framework for intelligence-augmentation research and applications. Easily build smart glasses apps using built-in voice commands, speech recognition, computer vision, UI, sensors, smartphone connection, NLP, facial recognition, database, cloud connection, and more. This repo is in beta.
glam-imperial / EmotionalConversionStarGAN: This repository contains code to replicate results from the ICASSP 2020 paper "StarGAN for Emotional Speech Conversion: Validated by Data Augmentation of End-to-End Emotion Recognition".
pyyush / SpecAugment: SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
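Several of the SpecAugment implementations listed here reduce to the paper's masking steps. A minimal NumPy sketch of frequency and time masking (omitting the paper's time-warp step; parameter names are hypothetical) might look like:

```python
import numpy as np

def spec_augment(spec, rng, num_freq_masks=2, num_time_masks=2, F=10, T=20):
    """Zero out random frequency bands and time spans of a (freq, time)
    log-mel spectrogram, as in SpecAugment's masking steps."""
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(num_freq_masks):
        f = rng.integers(0, F + 1)             # mask width in [0, F]
        f0 = rng.integers(0, n_freq - f + 1)   # mask start bin
        out[f0:f0 + f, :] = 0.0
    for _ in range(num_time_masks):
        t = rng.integers(0, T + 1)             # mask width in [0, T]
        t0 = rng.integers(0, n_time - t + 1)   # mask start frame
        out[:, t0:t0 + t] = 0.0
    return out
```

Because the masks are applied to features rather than audio, this augmentation is cheap enough to run per batch during training.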
a-n-rose / Python Sound Tool: SoundPy (alpha stage) is a research-based Python package for speech and sound. Applications include deep learning, filtering, speech enhancement, audio augmentation, feature extraction and visualization, dataset and audio file conversion, and beyond.
freds0 / Data Augmentation For Asr: A set of audio augmentation techniques for noise insertion into datasets used for Automatic Speech Recognition.
alicank / Translation Augmented LibriSpeech Corpus: Large-scale (>200h), publicly available read-audiobook corpus. This corpus augments the LibriSpeech ASR corpus (1000h) with English utterances (from audiobooks) automatically aligned with French text. Our dataset offers ~236h of speech aligned to translated text.
felixchenfy / Speech Commands Classification By LSTM PyTorch: Classification of 11 types of audio clips using MFCC features and an LSTM. Pretrained on the Speech Commands Dataset with intensive data augmentation.
bobchennan / Sparse Image Warp Pytorch: PyTorch implementation of sparse_image_warp, with an example of Google Brain's SpecAugment (A Simple Data Augmentation Method for Automatic Speech Recognition, https://arxiv.org/abs/1904.08779).
guglielmocamporese / Learning Invariances In Speech Recognition: In this work I investigate the speech command task by developing and analyzing deep learning models. The state of the art uses convolutional neural networks (CNNs) because of their intrinsic ability to learn correlated representations such as speech. In particular, I develop different CNNs trained on the Google Speech Commands Dataset and test them in different scenarios. A main problem in speech recognition is the variation in how different people pronounce words; one way to build a model invariant to this variability is to augment the dataset by perturbing the input. In this work I study two kinds of augmentation: Vocal Tract Length Perturbation (VTLP) and Synchronous Overlap and Add (SOLA), which locally perturb the input in frequency and time, respectively. The models trained on augmented data outperform those trained on the unaugmented dataset in accuracy, precision, and recall. The design of the CNN also affects the learning of invariances: the Inception architecture helps learn features invariant to speech variability by using different kernel sizes for convolution, intuitively because this lets the model detect speech patterns of different lengths in the audio features.
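As a rough illustration of the frequency-axis perturbation in the spirit of the VTLP mentioned above, the sketch below applies a simple linear warp to a spectrogram's frequency bins via interpolation. Real VTLP uses a two-segment piecewise-linear mapping with a boundary frequency, so treat this (and the function name) as a simplified stand-in:

```python
import numpy as np

def vtlp_warp(spec, alpha):
    """Linearly warp the frequency axis of a (freq, time) spectrogram:
    output bin i takes its value from input position i / alpha (clipped
    to the valid range), via linear interpolation between adjacent bins.
    alpha > 1 stretches the spectrum upward; alpha < 1 compresses it."""
    n_freq = spec.shape[0]
    src = np.clip(np.arange(n_freq) / alpha, 0, n_freq - 1)
    lo = np.floor(src).astype(int)            # lower neighbouring bin
    hi = np.minimum(lo + 1, n_freq - 1)       # upper neighbouring bin
    w = (src - lo)[:, None]                   # interpolation weight
    return (1 - w) * spec[lo, :] + w * spec[hi, :]
```

With `alpha = 1.0` the warp is the identity, which makes it easy to sanity-check; drawing `alpha` randomly per utterance (e.g. from [0.9, 1.1]) yields the augmentation.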
Bartelds / Asr Augmentation: Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation
meelement / Noise Adversarial Tacotron: Reproduction of the paper "Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization"
sagniklp / Disfluency Removal API: Disfluency Detection, Removal & Correction: Increase Apparent Public Speaking Fluency by Speech Augmentation (ICASSP '19)
gfdb / Wav2aug: A general-purpose, task-agnostic speech augmentation policy
viig99 / Esolafast: Fast C++ implementation of ESOLA using KFRLib; can be used for online time-stretch augmentation during speech-to-text training.
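(E)SOLA-style time stretching is built on overlap-add. The naive fixed-hop sketch below shows only that overlap-add core, with an assumed Hann window and made-up default frame and hop sizes; ESOLA's defining step, aligning frames at glottal epochs to avoid phase artifacts, is deliberately omitted:

```python
import numpy as np

def ola_time_stretch(wav, rate, frame=1024, hop=256):
    """Naive overlap-add time stretch: read frames from the input at
    hop * rate samples apart, write them hop samples apart, and divide
    by the summed window to undo the overlap weighting.
    rate > 1 shortens the signal (speed-up); rate < 1 lengthens it."""
    win = np.hanning(frame)
    ana_hop = int(round(hop * rate))
    n_frames = max(1, (len(wav) - frame) // ana_hop + 1)
    out_len = (n_frames - 1) * hop + frame
    out = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i in range(n_frames):
        a = i * ana_hop                     # analysis (read) position
        s = i * hop                         # synthesis (write) position
        out[s:s + frame] += wav[a:a + frame] * win
        norm[s:s + frame] += win
    return out / np.maximum(norm, 1e-8)
```

Without epoch (or waveform-similarity) alignment, overlapping frames can cancel and produce audible artifacts on voiced speech; that alignment search is exactly what the SOLA/ESOLA family adds on top of this core.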
irebai / SpecAugment KALDI: A Kaldi/C++ implementation of Google Brain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
salesforce / Speech Datasets: Simplified recipes for preparing commonly used speech datasets, and a PyTorch-compatible Python data loader that can perform standard feature computations & data augmentations.
bigdatasciencegroup / AlterEgo: A wearable, non-invasive silent-speech interface with auditory feedback via bone conduction, for assistive use and intelligence augmentation