Fastai
R interface to fast.ai
The fastai package provides R wrappers to fastai.
The fastai library simplifies training fast and accurate neural nets using
modern best practices. See the
fastai website to get started. The library
is based on research into deep learning best practices undertaken at fast.ai,
and includes "out of the box" support for vision, text, tabular, and collab
(collaborative filtering) models.
Continuous Build Status
CI runs on Ubuntu Bionic, Ubuntu Focal, Mac OS, and Windows (status badges omitted).
Installation
1. Install miniconda and activate environment:
reticulate::install_miniconda()
reticulate::conda_create('r-reticulate')
2. Install the dev version:
devtools::install_github('eagerai/fastai')
3. Then install the fastai python module:
reticulate::use_condaenv('r-reticulate', required = TRUE)
fastai::install_fastai(gpu = FALSE, cuda_version = '11.6', overwrite = FALSE)
4. Restart RStudio!
fast.ai extensions:
Kaggle
We are currently preparing examples of using fastai from R in Kaggle competitions:
- Introduction
- MNIST with Pytorch and fastai
- NLP Binary Classification
- Audio classification
- CycleGAN
- Fastai on Colab TPUs
Contributions are very welcome!
Tabular data
library(magrittr)
library(fastai)
# download
URLs_ADULT_SAMPLE()
# read data
df = data.table::fread('adult_sample/adult.csv')
Variables:
dep_var = 'salary'
cat_names = c('workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race')
cont_names = c('age', 'fnlwgt', 'education-num')
Preprocess strategy:
procs = list(FillMissing(),Categorify(),Normalize())
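As a rough base-R sketch of what two of these steps do (FillMissing imputes missing values, with the median by default; Normalize standardizes to zero mean and unit variance; Categorify, not shown, converts strings to categorical codes) — an illustration only, not the package's implementation:

```r
# Rough base-R sketch of the preprocessing steps above (illustration only):
x <- c(25, NA, 38, 52)                   # a continuous column with a missing value
x[is.na(x)] <- median(x, na.rm = TRUE)   # FillMissing: impute with the median
z <- (x - mean(x)) / sd(x)               # Normalize: zero mean, unit variance
```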
Prepare:
dls = TabularDataTable(df, procs, cat_names, cont_names,
y_names = dep_var, splits = list(c(1:32000),c(32001:32561))) %>%
dataloaders(bs = 64)
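The splits argument above hard-codes the row indices of the training and validation sets. An alternative proportional 80/20 split over the same 32,561 rows could be built in base R like this (an illustrative sketch, not the package API):

```r
# Sketch: an 80/20 train/validation split by row index (1-based, as in R)
n <- 32561                             # rows in adult.csv
train_idx <- 1:round(0.8 * n)
valid_idx <- setdiff(1:n, train_idx)
splits <- list(train_idx, valid_idx)
```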
Summary:
model = dls %>% tabular_learner(layers=c(200,100), metrics=accuracy)
model %>% summary()
TabularModel (Input shape: ['64 x 7', '64 x 3'])
================================================================
Layer (type) Output Shape Param # Trainable
================================================================
Embedding 64 x 6 60 True
________________________________________________________________
Embedding 64 x 8 136 True
________________________________________________________________
Embedding 64 x 5 40 True
________________________________________________________________
Embedding 64 x 8 136 True
________________________________________________________________
Embedding 64 x 5 35 True
________________________________________________________________
Embedding 64 x 4 24 True
________________________________________________________________
Embedding 64 x 3 9 True
________________________________________________________________
Dropout 64 x 39 0 False
________________________________________________________________
BatchNorm1d 64 x 3 6 True
________________________________________________________________
BatchNorm1d 64 x 42 84 True
________________________________________________________________
Linear 64 x 200 8,400 True
________________________________________________________________
ReLU 64 x 200 0 False
________________________________________________________________
BatchNorm1d 64 x 200 400 True
________________________________________________________________
Linear 64 x 100 20,000 True
________________________________________________________________
ReLU 64 x 100 0 False
________________________________________________________________
Linear 64 x 2 202 True
________________________________________________________________
Total params: 29,532
Total trainable params: 29,532
Total non-trainable params: 0
Optimizer used: <function Adam at 0x7fa246283598>
Loss function: FlattenedLoss of CrossEntropyLoss()
Callbacks:
- TrainEvalCallback
- Recorder
- ProgressCallback
Before fitting, try to find an optimal learning rate:
model %>% lr_find()
model %>% plot_lr_find(dpi = 200)
<img src="files/plot_lr.png" height=500 align=center alt="lr"/>
Run:
model %>% fit(5, lr = 10^-1)
epoch train_loss valid_loss accuracy time
0 0.360149 0.329587 0.846702 00:04
1 0.352106 0.345761 0.828877 00:04
2 0.368743 0.340913 0.844920 00:05
3 0.347277 0.333084 0.852050 00:04
4 0.348969 0.350707 0.830660 00:04
Plot loss history:
model %>% plot_loss(dpi = 200)
<img src="files/plot_loss.png" height=500 align=center alt="loss"/>
See training process:
<p align="center"> <img src="files/fastai.gif" height=350 align=center alt="train"/> </p>
Get confusion matrix:
model %>% get_confusion_matrix()
<50k >=50k
<50k 407 22
>=50k 68 64
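The overall accuracy can be read off this matrix directly; a quick base-R check:

```r
# Accuracy from the 2x2 confusion matrix shown above
conf <- matrix(c(407, 68, 22, 64), nrow = 2,
               dimnames = list(c("<50k", ">=50k"), c("<50k", ">=50k")))
acc <- sum(diag(conf)) / sum(conf)   # correct predictions / all predictions
acc  # ~0.84
```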
Plot it:
interp = ClassificationInterpretation_from_learner(model)
interp %>% plot_confusion_matrix(dpi = 90,figsize = c(6,6))
<img src="files/conf_.png" height=500 align=center alt="Pets"/>
Get predictions on new data:
model %>% predict(df[10:15,])
<50k >=50k classes
1 0.5108562 0.4891439 0
2 0.4827824 0.5172176 1
3 0.4873166 0.5126833 1
4 0.5013804 0.4986197 0
5 0.4964157 0.5035844 1
6 0.5111378 0.4888622 0
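The classes column is just the probabilities thresholded at 0.5: a row is labelled 1 (>=50k) when its >=50k probability exceeds 0.5. Reproducing it from the table above in base R:

```r
# P(>=50k) for rows 1-6 of the prediction table above
p_ge50k <- c(0.4891439, 0.5172176, 0.5126833, 0.4986197, 0.5035844, 0.4888622)
classes <- as.integer(p_ge50k > 0.5)   # threshold at 0.5
classes  # 0 1 1 0 1 0
```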
Image data
Get Pets dataset:
URLs_PETS()
Define path to folders:
path = 'oxford-iiit-pet'
path_anno = 'oxford-iiit-pet/annotations'
path_img = 'oxford-iiit-pet/images'
fnames = get_image_files(path_img)
See one of the examples:
fnames[1]
oxford-iiit-pet/images/american_pit_bull_terrier_129.jpg
Dataloader:
dls = ImageDataLoaders_from_name_re(
path, fnames, pat='(.+)_\\d+.jpg$',
item_tfms=Resize(size = 460), bs = 10,
batch_tfms=list(Normalize_from_stats( imagenet_stats() )
)
)
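The pat argument is a regular expression whose first capture group becomes the class label: everything in the file name before the trailing underscore and number. Its effect can be checked with base R:

```r
# What the pattern above extracts from an example file name
fname <- "american_pit_bull_terrier_129.jpg"
label <- sub("(.+)_\\d+.jpg$", "\\1", fname)
label  # "american_pit_bull_terrier"
```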
Show batch for visualization:
dls %>% show_batch()
<img src="files/pets.png" height=500 align=center alt="Pets"/>
Model architecture:
learn = cnn_learner(dls, resnet34(), metrics = error_rate)
And fit:
learn %>% fit_one_cycle(n_epoch = 2)
epoch train_loss valid_loss error_rate time
0 0.904872 0.317927 0.105548 00:35
1 0.694395 0.239520 0.083897 00:36
Get confusion matrix and plot:
conf = learn %>% get_confusion_matrix()
library(highcharter)
hchart(conf, label = TRUE) %>%
hc_yAxis(title = list(text = 'Actual')) %>%
hc_xAxis(title = list(text = 'Predicted'),
labels = list(rotation = -90))
<img src="files/conf.png" height=500 align=center alt="Pets"/>
Note that the plot is built with highcharter.
Plot top losses:
interp = ClassificationInterpretation_from_learner(learn)
interp %>% plot_top_losses(k = 9, figsize = c(15,11))
<img src="files/top_loss.png" height=500 align=center alt="Pets"/>
Alternatively, load images from folders:
# get sample data
URLs_MNIST_SAMPLE()
# transformations
path = 'mnist_sample'
bs = 20
# load into memory
data = ImageDataLoaders_from_folder(path, size = 26, bs = bs)
# Visualize and train
data %>% show_batch(dpi = 150)
learn = cnn_learner(data, resnet18(), metrics = accuracy)
learn %>% fit(2)
<img src="files/mnist.png" height=500 align=center alt="Mnist"/>
What about the implementation of the latest Computer Vision models?
The fastai package includes a function, timm_learner, originally written by
Zachary Mueller.
It helps to quickly load pretrained models from the
timm library.
First, let's see the list of available models:
