Fastai
R interface to fast.ai
The fastai package provides R wrappers to fastai.
The fastai library simplifies training fast and accurate neural nets using
modern best practices. See the
fastai website to get started. The library
is based on research into deep learning best practices undertaken at fast.ai,
and includes "out of the box" support for vision, text, tabular, and collab
(collaborative filtering) models.
Continuous Build Status
CI runs on Ubuntu Bionic, Ubuntu Focal, Mac OS, and Windows (status badges omitted).
Installation
1. Install miniconda and activate environment:
reticulate::install_miniconda()
reticulate::conda_create('r-reticulate')
2. Install the dev version:
devtools::install_github('eagerai/fastai')
3. Then install the fastai python module:
reticulate::use_condaenv('r-reticulate', required = TRUE)
fastai::install_fastai(gpu = FALSE, cuda_version = '11.6', overwrite = FALSE)
4. Restart RStudio!
fast.ai extensions:
Kaggle
We are currently preparing examples of using fastai from R in Kaggle competitions:
- Introduction
- MNIST with Pytorch and fastai
- NLP Binary Classification
- Audio classification
- CycleGAN
- Fastai on Colab TPUs
Contributions are very welcome!
Tabular data
library(magrittr)
library(fastai)
# download
URLs_ADULT_SAMPLE()
# read data
df = data.table::fread('adult_sample/adult.csv')
Variables:
dep_var = 'salary'
cat_names = c('workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race')
cont_names = c('age', 'fnlwgt', 'education-num')
Preprocess strategy:
procs = list(FillMissing(),Categorify(),Normalize())
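As a rough base-R sketch of what two of these steps do (FillMissing imputes missing values, with the median by default; Normalize standardizes to zero mean and unit variance; Categorify, not shown, converts strings to categorical codes) — an illustration only, not the package's implementation:

```r
# Rough base-R sketch of the preprocessing steps above (illustration only):
x <- c(25, NA, 38, 52)                   # a continuous column with a missing value
x[is.na(x)] <- median(x, na.rm = TRUE)   # FillMissing: impute with the median
z <- (x - mean(x)) / sd(x)               # Normalize: zero mean, unit variance
```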
Prepare:
dls = TabularDataTable(df, procs, cat_names, cont_names,
y_names = dep_var, splits = list(c(1:32000),c(32001:32561))) %>%
dataloaders(bs = 64)
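The splits argument above hard-codes the row indices of the training and validation sets. An alternative proportional 80/20 split over the same 32,561 rows could be built in base R like this (an illustrative sketch, not the package API):

```r
# Sketch: an 80/20 train/validation split by row index (1-based, as in R)
n <- 32561                             # rows in adult.csv
train_idx <- 1:round(0.8 * n)
valid_idx <- setdiff(1:n, train_idx)
splits <- list(train_idx, valid_idx)
```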
Summary:
model = dls %>% tabular_learner(layers=c(200,100), metrics=accuracy)
model %>% summary()
TabularModel (Input shape: ['64 x 7', '64 x 3'])
================================================================
Layer (type) Output Shape Param # Trainable
================================================================
Embedding 64 x 6 60 True
________________________________________________________________
Embedding 64 x 8 136 True
________________________________________________________________
Embedding 64 x 5 40 True
________________________________________________________________
Embedding 64 x 8 136 True
________________________________________________________________
Embedding 64 x 5 35 True
________________________________________________________________
Embedding 64 x 4 24 True
________________________________________________________________
Embedding 64 x 3 9 True
________________________________________________________________
Dropout 64 x 39 0 False
________________________________________________________________
BatchNorm1d 64 x 3 6 True
________________________________________________________________
BatchNorm1d 64 x 42 84 True
________________________________________________________________
Linear 64 x 200 8,400 True
________________________________________________________________
ReLU 64 x 200 0 False
________________________________________________________________
BatchNorm1d 64 x 200 400 True
________________________________________________________________
Linear 64 x 100 20,000 True
________________________________________________________________
ReLU 64 x 100 0 False
________________________________________________________________
Linear 64 x 2 202 True
________________________________________________________________
Total params: 29,532
Total trainable params: 29,532
Total non-trainable params: 0
Optimizer used: <function Adam at 0x7fa246283598>
Loss function: FlattenedLoss of CrossEntropyLoss()
Callbacks:
- TrainEvalCallback
- Recorder
- ProgressCallback
Before fitting, try to find an optimal learning rate:
model %>% lr_find()
model %>% plot_lr_find(dpi = 200)
<img src="files/plot_lr.png" height=500 align=center alt="lr"/>
Run:
model %>% fit(5, lr = 10^-1)
epoch train_loss valid_loss accuracy time
0 0.360149 0.329587 0.846702 00:04
1 0.352106 0.345761 0.828877 00:04
2 0.368743 0.340913 0.844920 00:05
3 0.347277 0.333084 0.852050 00:04
4 0.348969 0.350707 0.830660 00:04
Plot loss history:
model %>% plot_loss(dpi = 200)
<img src="files/plot_loss.png" height=500 align=center alt="loss"/>
See training process:
<p align="center"> <img src="files/fastai.gif" height=350 align=center alt="train"/> </p>
Get confusion matrix:
model %>% get_confusion_matrix()
<50k >=50k
<50k 407 22
>=50k 68 64
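The overall accuracy can be read off this matrix directly; a quick base-R check:

```r
# Accuracy from the 2x2 confusion matrix shown above
conf <- matrix(c(407, 68, 22, 64), nrow = 2,
               dimnames = list(c("<50k", ">=50k"), c("<50k", ">=50k")))
acc <- sum(diag(conf)) / sum(conf)   # correct predictions / all predictions
acc  # ~0.84
```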
Plot it:
interp = ClassificationInterpretation_from_learner(model)
interp %>% plot_confusion_matrix(dpi = 90,figsize = c(6,6))
<img src="files/conf_.png" height=500 align=center alt="Pets"/>
Get predictions on new data:
model %>% predict(df[10:15,])
<50k >=50k classes
1 0.5108562 0.4891439 0
2 0.4827824 0.5172176 1
3 0.4873166 0.5126833 1
4 0.5013804 0.4986197 0
5 0.4964157 0.5035844 1
6 0.5111378 0.4888622 0
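The classes column is just the probabilities thresholded at 0.5: a row is labelled 1 (>=50k) when its >=50k probability exceeds 0.5. Reproducing it from the table above in base R:

```r
# P(>=50k) for rows 1-6 of the prediction table above
p_ge50k <- c(0.4891439, 0.5172176, 0.5126833, 0.4986197, 0.5035844, 0.4888622)
classes <- as.integer(p_ge50k > 0.5)   # threshold at 0.5
classes  # 0 1 1 0 1 0
```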
Image data
Get Pets dataset:
URLs_PETS()
Define path to folders:
path = 'oxford-iiit-pet'
path_anno = 'oxford-iiit-pet/annotations'
path_img = 'oxford-iiit-pet/images'
fnames = get_image_files(path_img)
See one of the examples:
fnames[1]
oxford-iiit-pet/images/american_pit_bull_terrier_129.jpg
Dataloader:
dls = ImageDataLoaders_from_name_re(
path, fnames, pat='(.+)_\\d+.jpg$',
item_tfms=Resize(size = 460), bs = 10,
batch_tfms=list(Normalize_from_stats( imagenet_stats() )
)
)
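The pat argument is a regular expression whose first capture group becomes the class label: everything in the file name before the trailing underscore and number. Its effect can be checked with base R:

```r
# What the pattern above extracts from an example file name
fname <- "american_pit_bull_terrier_129.jpg"
label <- sub("(.+)_\\d+.jpg$", "\\1", fname)
label  # "american_pit_bull_terrier"
```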
Show batch for visualization:
dls %>% show_batch()
<img src="files/pets.png" height=500 align=center alt="Pets"/>
Model architecture:
learn = cnn_learner(dls, resnet34(), metrics = error_rate)
And fit:
learn %>% fit_one_cycle(n_epoch = 2)
epoch train_loss valid_loss error_rate time
0 0.904872 0.317927 0.105548 00:35
1 0.694395 0.239520 0.083897 00:36
Get confusion matrix and plot:
conf = learn %>% get_confusion_matrix()
library(highcharter)
hchart(conf, label = TRUE) %>%
hc_yAxis(title = list(text = 'Actual')) %>%
hc_xAxis(title = list(text = 'Predicted'),
labels = list(rotation = -90))
<img src="files/conf.png" height=500 align=center alt="Pets"/>
Note that the plot is built with highcharter.
Plot top losses:
interp = ClassificationInterpretation_from_learner(learn)
interp %>% plot_top_losses(k = 9, figsize = c(15,11))
<img src="files/top_loss.png" height=500 align=center alt="Pets"/>
Alternatively, load images from folders:
# get sample data
URLs_MNIST_SAMPLE()
# transformations
path = 'mnist_sample'
bs = 20
# load into memory
data = ImageDataLoaders_from_folder(path, size = 26, bs = bs)
# Visualize and train
data %>% show_batch(dpi = 150)
learn = cnn_learner(data, resnet18(), metrics = accuracy)
learn %>% fit(2)
<img src="files/mnist.png" height=500 align=center alt="Mnist"/>
What about the implementation of the latest Computer Vision models?
The fastai package includes a function, timm_learner, originally written by
Zachary Mueller.
It helps to quickly load pretrained models from the
timm library.
First, let's see the list of available models:
