nlpaug

This python library helps you with augmenting nlp for your machine learning projects. Visit this introduction to understand about Data Augmentation in NLP. Augmenter is the basic element of augmentation while Flow is a pipeline to orchestra multi augmenter together.

Features

Generate synthetic data for improving model performance without manual effort
Simple, easy-to-use and lightweight library. Augment data in 3 lines of code
Plug and play to any machine leanring/ neural network frameworks (e.g. scikit-learn, PyTorch, TensorFlow)
Support textual and audio input

<h3 align="center">Textual Data Augmentation Example</h3> <br><p align="center"><img src="https://github.com/makcedward/nlpaug/blob/master/res/textual_example.png"/></p> <h3 align="center">Acoustic Data Augmentation Example</h3> <br><p align="center"><img src="https://github.com/makcedward/nlpaug/blob/master/res/audio_example.png"/></p>

Quick Demo

Quick Example
Example of Augmentation for Textual Inputs
Example of Augmentation for Multilingual Textual Inputs
Example of Augmentation for Spectrogram Inputs
Example of Augmentation for Audio Inputs
Example of Orchestra Multiple Augmenters
Example of Showing Augmentation History
How to train TF-IDF model
How to train LAMBADA model
How to create custom augmentation
API Documentation

Augmenter

Flow

Installation

The library supports python 3.5+ in linux and window platform.

To install the library:

pip install numpy requests nlpaug

or install the latest version (include BETA features) from github directly

pip install numpy git+https://github.com/makcedward/nlpaug.git

or install over conda

conda install -c makcedward nlpaug

If you use BackTranslationAug, ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug and AbstSummAug, installing the following dependencies as well

pip install torch>=1.6.0 transformers>=4.11.3 sentencepiece

If you use LambadaAug, installing the following dependencies as well

pip install simpletransformers>=0.61.10

If you use AntonymAug, SynonymAug, installing the following dependencies as well

pip install nltk>=3.4.5

If you use WordEmbsAug (word2vec, glove or fasttext), downloading pre-trained model first and installing the following dependencies as well

from nlpaug.util.file.download import DownloadUtil
DownloadUtil.download_word2vec(dest_dir='.') # Download word2vec model
DownloadUtil.download_glove(model_name='glove.6B', dest_dir='.') # Download GloVe model
DownloadUtil.download_fasttext(model_name='wiki-news-300d-1M', dest_dir='.') # Download fasttext model

pip install gensim>=4.1.2

If you use SynonymAug (PPDB), downloading file from the following URI. You may not able to run the augmenter if you get PPDB file from other website

http://paraphrase.org/#/download

If you use PitchAug, SpeedAug and VtlpAug, installing the following dependencies as well

pip install librosa>=0.9.1 matplotlib

Recent Changes

1.1.11 Jul 6, 2022

See changelog for more details.

Extension Reading

Data Augmentation library for Text
[Does your NLP model able to prevent adversarial attack?](https://medium.com/hackernoon/does-your-nlp-model-able-

Nlpaug

Install / Use

README