DataGenerator
Tabular data generation using Generative Adversarial Networks (GANs) and Wasserstein GANs (WGANs)
GANs and WGANs are two state-of-the-art families of algorithms for generating data from scratch. Generated data allows us, to some extent, to train machine learning (ML) models to better performance when we suffer from a lack of original data. While GANs are widely used for image data, here we utilize the power of GANs to handle tabular data. This project aims to provide a clear and simple PyTorch implementation of GANs and WGANs. The former follows the basic idea presented in the paper "Generative adversarial networks" (Goodfellow et al., 2014). The latter, "Wasserstein generative adversarial networks" (Arjovsky et al., 2017), extends the GAN and demonstrates better training stability, less sensitivity to hyperparameters and model architecture, and the ability to handle categorical data (as in our case). As an example, we will use the "Pima Indians Diabetes Database" collected from Kaggle.
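To make the adversarial setup concrete, here is a minimal, hypothetical generator/discriminator pair for tabular data in PyTorch. The layer sizes, the latent dimension, and the feature count (8 columns plus "Outcome" in the diabetes dataset) are assumptions for illustration; the repository's actual architectures may differ.

```python
import torch
import torch.nn as nn

NOISE_DIM = 32   # assumed latent size, not taken from the repository
N_FEATURES = 9   # diabetes.csv: 8 numeric features + the "Outcome" column

# Generator: maps a noise vector to one synthetic data row.
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, N_FEATURES),
)

# Discriminator: scores a data row with the probability that it is real.
discriminator = nn.Sequential(
    nn.Linear(N_FEATURES, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),
)

noise = torch.randn(16, NOISE_DIM)   # batch of 16 noise vectors
fake_rows = generator(noise)         # 16 synthetic rows
scores = discriminator(fake_rows)    # 16 real/fake probabilities in [0, 1]
```

During training the two networks play a minimax game: the discriminator learns to tell real rows from `fake_rows`, while the generator learns to produce rows that fool it.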
Getting Started
Clone the repository to your local machine.
Prerequisites
Install the dependencies from the requirements file with
pip install -r requirements.txt
Running example
As an example, we will use the popular "diabetes" dataset downloaded from Kaggle. Run the main.py file with one of the following configurations, via the terminal or your IDE's run configuration:
Running the GAN model
python main.py --algorithm GAN --data-set data/diabetes_dataset/diabetes.csv --epochs 500 --train
Running the WGAN model
python main.py --algorithm WGAN --data-set data/diabetes_dataset/diabetes.csv --epochs 200 --train
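A command-line interface like the one above could be parsed with argparse roughly as follows. The flag names come from the commands shown; the choices, defaults, and types here are assumptions, not necessarily what main.py actually declares.

```python
import argparse

# Hypothetical sketch of main.py's argument parsing.
parser = argparse.ArgumentParser(description="Tabular GAN/WGAN data generator")
parser.add_argument("--algorithm", choices=["GAN", "WGAN"], default="GAN")
parser.add_argument("--data-set", dest="data_set",
                    default="data/diabetes_dataset/diabetes.csv")
parser.add_argument("--epochs", type=int, default=200)
parser.add_argument("--train", action="store_true")

# Parsing the WGAN command from above:
args = parser.parse_args(
    ["--algorithm", "WGAN",
     "--data-set", "data/diabetes_dataset/diabetes.csv",
     "--epochs", "200", "--train"]
)
```

With this setup, `args.algorithm` selects which model to build, `args.epochs` controls training length, and `--train` toggles training versus (presumably) generation from a saved model.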
Results and further discussions
As mentioned above, the WGAN manages to handle categorical variables, which the original GAN was not designed for; the GAN instead suffers from what is called "mode collapse". In the following histograms we can see both algorithms' performance: the WGAN succeeds in reproducing the categorical feature "Outcome", where the simple GAN fails.
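The key differences in the WGAN critic update from Arjovsky et al. (2017) can be sketched as follows. This is an illustration of the technique, not the repository's exact code: the critic outputs an unbounded score (no sigmoid), the loss is the difference of mean scores (an estimate of the Wasserstein distance), and the critic's weights are clipped to [-c, c] after each step to enforce the Lipschitz constraint.

```python
import torch
import torch.nn as nn

# Critic: like a discriminator, but with no sigmoid on the output.
critic = nn.Sequential(nn.Linear(9, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)  # optimizer from the paper
CLIP = 0.01  # clipping constant c from the paper

real = torch.randn(16, 9)   # stand-in for a batch of real rows
fake = torch.randn(16, 9)   # stand-in for (detached) generator output

# Minimizing this loss maximizes the gap between real and fake mean scores.
loss = critic(fake).mean() - critic(real).mean()
opt.zero_grad()
loss.backward()
opt.step()

# Weight clipping keeps the critic approximately Lipschitz.
for p in critic.parameters():
    p.data.clamp_(-CLIP, CLIP)
```

Because the Wasserstein loss gives useful gradients even when the critic confidently separates real from fake, the generator keeps receiving a learning signal instead of collapsing to a single mode, which is why the WGAN recovers both values of the binary "Outcome" feature.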
[Figure: side-by-side histograms of the generated feature distributions, GAN on the left and WGAN on the right.]
