Getting Started

Named after the fastest transformer (well, at least of the Autobots), BLURR provides a comprehensive and extensible framework for training and deploying 🤗 Hugging Face transformer models with fastai >= 2.0.

Using features like fastai's @typedispatch and @patch decorators, along with a simple class hierarchy, BLURR gives fastai developers the ability to train and deploy transformers on a variety of tasks. It includes high-, mid-, and low-level APIs, so developers can use much of it out of the box or customize it as needed.
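To give a feel for what these two decorators do, here is a minimal sketch in plain Python. It is not fastcore's implementation: `patch` below is a simplified imitation, `functools.singledispatch` stands in for `@typedispatch` (it dispatches on the first argument only, whereas `@typedispatch` can consider more than one), and `Tokenizer`, `split`, and `describe` are hypothetical names used only for illustration.

```python
from functools import singledispatch


def patch(func):
    """Simplified imitation: attach `func` to the class named in the
    annotation of its first parameter."""
    first_param = next(iter(func.__annotations__))
    cls = func.__annotations__[first_param]
    setattr(cls, func.__name__, func)
    return func


class Tokenizer:
    """Hypothetical class standing in for a library type we do not own."""

    def __init__(self, sep=" "):
        self.sep = sep


@patch
def split(self: Tokenizer, text):
    # Becomes Tokenizer.split without subclassing or editing Tokenizer.
    return text.split(self.sep)


@singledispatch
def describe(x):
    # Fallback used when no more specific type has been registered.
    return "object"


@describe.register
def _(x: str):
    # Chosen when the first argument is a str.
    return "text"


@describe.register
def _(x: list):
    # Chosen when the first argument is a list.
    return "batch"
```

The same two ideas, adding behavior to existing classes and choosing an implementation by argument type, are what let a library extend fastai methods such as `show_batch` and `show_results` for new input types.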

Supported Text/NLP Tasks:

- Sequence Classification
- Token Classification
- Question Answering
- Summarization
- Translation
- Language Modeling (Causal and Masked)

Supported Vision Tasks:

- In progress

Supported Audio Tasks:

- In progress

Install

You can now install blurr with pip:

pip install ohmeow-blurr

Or, even better as this library is under very active development, create an editable install like this:

git clone https://github.com/ohmeow/blurr.git
cd blurr
pip install -e ".[dev]"

How to use

Please check the documentation for more thorough examples of how to use this package.

The following two packages need to be installed for blurr to work: fastai (>= 2.0) and 🤗 Hugging Face transformers.

Imports

import os, warnings

import torch
from transformers import *
from transformers.utils import logging as hf_logging
from fastai.text.all import *

from blurr.text.data.all import *
from blurr.text.modeling.all import *

warnings.simplefilter("ignore")
hf_logging.set_verbosity_error()

os.environ["TOKENIZERS_PARALLELISM"] = "false"

Get your data

path = untar_data(URLs.IMDB_SAMPLE)

model_path = Path("models")
imdb_df = pd.read_csv(path / "texts.csv")

Get `n_labels` from data for config later

n_labels = len(imdb_df["label"].unique())
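If your data is a list of dictionaries rather than a DataFrame, the same count can be derived in plain Python. This is a minimal sketch: the `records` below are made-up toy rows that reuse the `text` and `label` column names from the IMDB sample, and `label2id` is a hypothetical helper mapping, not something blurr requires.

```python
# Toy rows for illustration only; they are not taken from the IMDB sample.
records = [
    {"text": "a sweet, feel-good comedy", "label": "positive"},
    {"text": "warm and gooey, but not quite right", "label": "negative"},
    {"text": "hard to put into words", "label": "positive"},
]

# Sort the unique labels so their order is stable between runs.
labels = sorted({row["label"] for row in records})

# This is the number that ends up in `config.num_labels`.
n_labels = len(labels)

# Optional: a label-to-index mapping, handy when inspecting predictions.
label2id = {label: i for i, label in enumerate(labels)}
```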

Get your 🤗 objects

model_cls = AutoModelForSequenceClassification

pretrained_model_name = "bert-base-uncased"

config = AutoConfig.from_pretrained(pretrained_model_name)
config.num_labels = n_labels

hf_arch, hf_config, hf_tokenizer, hf_model = get_hf_objects(
    pretrained_model_name,
    model_cls=model_cls, 
    config=config
)

Build your Data 🧱 and your DataLoaders

# single input
blocks = (
    TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model), 
    CategoryBlock
)
dblock = DataBlock(
    blocks=blocks, 
    get_x=ColReader("text"), 
    get_y=ColReader("label"), 
    splitter=ColSplitter()
)

dls = dblock.dataloaders(imdb_df, bs=4)

dls.show_batch(dataloaders=dls, max_n=2, trunc_at=250)

<table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>text</th> <th>target</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>raising victor vargas : a review < br / > < br / > you know, raising victor vargas is like sticking your hands into a big, steaming bowl of oatmeal. it's warm and gooey, but you're not sure if it feels right. try as i might, no matter how warm and go</td> <td>negative</td> </tr> <tr> <th>1</th> <td>the shop around the corner is one of the sweetest and most feel - good romantic comedies ever made. there's just no getting around that, and it's hard to actually put one's feeling for this film into words. it's not one of those films that tries too</td> <td>positive</td> </tr> </tbody> </table>
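The `ColSplitter()` used in the DataBlock above decides which rows go to training and which to validation by reading a boolean column, `is_valid` by default, which the IMDB sample CSV provides. The idea can be sketched in plain Python as follows. `col_splitter` and the toy `rows` are hypothetical stand-ins for illustration, not fastai's implementation.

```python
def col_splitter(rows, col="is_valid"):
    """Return (train_idxs, valid_idxs) according to a boolean column."""
    train_idxs = [i for i, row in enumerate(rows) if not row[col]]
    valid_idxs = [i for i, row in enumerate(rows) if row[col]]
    return train_idxs, valid_idxs


# Toy rows for illustration only.
rows = [
    {"text": "first review", "label": "negative", "is_valid": False},
    {"text": "second review", "label": "positive", "is_valid": False},
    {"text": "third review", "label": "positive", "is_valid": True},
]

train_idxs, valid_idxs = col_splitter(rows)
```

If your own data has no such column, you would need to add one or pick a different fastai splitter, such as `RandomSplitter`.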

… and 🚂

model = BaseModelWrapper(hf_model)

learn = Learner(
    dls,
    model,
    opt_func=partial(Adam, decouple_wd=True),
    loss_func=CrossEntropyLossFlat(),
    metrics=[accuracy],
    cbs=[BaseModelCallback],
    splitter=blurr_splitter,
)

learn.freeze()

learn.fit_one_cycle(3, lr_max=1e-3)

learn.show_results(learner=learn, max_n=2, trunc_at=250)

Using the high-level Blurr API

Using the high-level API, we can reduce DataBlock, DataLoaders, and Learner creation to a single line of code.

Included in the high-level API is a general Blearner class (pronounced “Blurrner”) that you can use with hand-crafted DataLoaders, as well as task-specific Blearners like BlearnerForSequenceClassification that will handle everything given your raw data sourced from a pandas DataFrame, CSV file, or list of dictionaries (for example, a Hugging Face datasets dataset).

learn = BlearnerForSequenceClassification.from_data(
    imdb_df, 
    pretrained_model_name, 
    dl_kwargs={"bs": 4}
)

learn.fit_one_cycle(1, lr_max=1e-3)

learn.show_results(learner=learn, max_n=2, trunc_at=250)
