SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Park, Daniel S. and Chan, William and Zhang, Yu and Chiu, Chung-Cheng and Zoph, Barret and Cubuk, Ekin D. and Le, Quoc V.
Interspeech 2019 [Paper]

About

This repository contains an implementation of the augmentation methodology proposed in the above paper.

Base Input

SpecAugmented Output (Policy = 'LB')
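
For orientation, the masking part of the method can be sketched in plain NumPy. This is a minimal illustration, not the code in this repository: the function name `mask_spectrogram`, the `POLICIES` dictionary, and the (frequency bins, time frames) input layout are assumptions made for the example. The parameter values are believed to match Table 1 of the paper and should be checked against it. Time warping (parameter W in the paper) is left out.

```python
import numpy as np

# Masking parameters per policy, believed to match Table 1 of the paper
# (verify against the paper). The time-warp parameter W is omitted.
#   F   : maximum width of a frequency mask
#   m_F : number of frequency masks
#   T   : maximum width of a time mask
#   p   : upper bound on a time mask as a fraction of the number of frames
#   m_T : number of time masks
POLICIES = {
    "LB": dict(F=27, m_F=1, T=100, p=1.0, m_T=1),
    "LD": dict(F=27, m_F=2, T=100, p=1.0, m_T=2),
    "SM": dict(F=15, m_F=2, T=70, p=0.2, m_T=2),
    "SS": dict(F=27, m_F=2, T=70, p=0.2, m_T=2),
}


def mask_spectrogram(spec, policy="LD", rng=None):
    """Return a copy of `spec` with frequency and time masks applied.

    `spec` is assumed to be a 2-D array of shape (freq_bins, time_frames).
    Masked regions are set to 0, which equals the mean if the input has
    been mean-normalized.
    """
    rng = np.random.default_rng() if rng is None else rng
    cfg = POLICIES[policy]
    out = np.array(spec, dtype=float, copy=True)
    n_freq, n_time = out.shape

    # Frequency masking: zero out f consecutive bins, f drawn from [0, F].
    for _ in range(cfg["m_F"]):
        f = int(rng.integers(0, min(cfg["F"], n_freq) + 1))
        # The start index is drawn so that the mask always fits.
        f0 = int(rng.integers(0, n_freq - f + 1))
        out[f0:f0 + f, :] = 0.0

    # Time masking: zero out t consecutive frames, t drawn from [0, T],
    # additionally capped at p * time_frames.
    max_t = min(cfg["T"], int(cfg["p"] * n_time))
    for _ in range(cfg["m_T"]):
        t = int(rng.integers(0, max_t + 1))
        t0 = int(rng.integers(0, n_time - t + 1))
        out[:, t0:t0 + t] = 0.0

    return out
```

Calling `mask_spectrogram(log_mel, policy="LB")` on a log-mel spectrogram would produce one horizontal and one vertical masked band at most, similar in kind to the output illustrated above.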

Requirements:

python3
librosa
libsndfile
audioread
ffmpeg
numpy
tensorflow
tensorflow_addons

Usage:

main.py [--dir][--policy]

--dir | path/to/dataset | default='./LibriSpeech/'
--policy | augmentation policy to use from {'LB', 'LD', 'SS', 'SM'} | default='LD'
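
The two options above map naturally onto `argparse`. The following is a sketch of how such a command line could be parsed, assuming only the flag names, defaults, and policy choices listed here; the helper name `build_parser` is hypothetical and the actual `main.py` may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument(
        "--dir",
        default="./LibriSpeech/",
        help="path/to/dataset",
    )
    parser.add_argument(
        "--policy",
        default="LD",
        choices=["LB", "LD", "SS", "SM"],
        help="augmentation policy to use",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--policy", "LB"])
print(args.dir, args.policy)
```

Restricting `--policy` with `choices` makes an unsupported policy name fail at parse time rather than later in the augmentation step.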

Refer to demo/demo.ipynb for a Jupyter notebook demo.

References:

@article{Park_2019,
  title={SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2680},
  DOI={10.21437/interspeech.2019-2680},
  journal={Interspeech 2019},
  publisher={ISCA},
  author={Park, Daniel S. and Chan, William and Zhang, Yu and Chiu, Chung-Cheng and Zoph, Barret and Cubuk, Ekin D. and Le, Quoc V.},
  year={2019},
  month={Sep}
}
