Blurr
A library that integrates huggingface transformers with the world of fastai, giving fastai devs everything they need to train, evaluate, and deploy transformer-specific models.
Getting Started
Named after the fastest transformer (well, at least of the Autobots), BLURR provides a comprehensive and extensible framework for training and deploying 🤗 huggingface transformer models with fastai >= 2.0.
Using fastai features such as the @typedispatch and @patch decorators, along with a simple class hierarchy, BLURR gives fastai developers the ability to train and deploy transformers on a variety of tasks. It includes high-, mid-, and low-level APIs, so developers can use much of it out of the box or customize it as needed.
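To give a feel for the two decorator ideas mentioned above, here is a minimal sketch that uses only the Python standard library. It is an analogy, not fastai's implementation: the `patch` helper, the `Review` class, and the `describe` function are all hypothetical names invented for illustration. fastai's own @patch reads the target class from a type annotation, and its @typedispatch can dispatch on more than one argument, whereas `functools.singledispatch` looks only at the first.

```python
from functools import singledispatch


def patch(cls):
    """Toy stand-in for a patch decorator: attach a function to an existing class."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator


class Review:
    def __init__(self, text):
        self.text = text


@patch(Review)
def word_count(self):
    # Added to Review after the class was defined
    return len(self.text.split())


@singledispatch
def describe(item):
    # Fallback for types without a dedicated implementation
    return f"object: {item!r}"


@describe.register
def _(item: str):
    return f"text of {len(item)} characters"


@describe.register
def _(item: list):
    return f"batch of {len(item)} items"


print(Review("a short review").word_count())  # 3
print(describe("hello"))                      # text of 5 characters
print(describe(["a", "b"]))                   # batch of 2 items
```

Both patterns let a library extend behavior for new types without editing the original class or function, which is roughly why they suit a framework that has to cover many tasks.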
Supported Text/NLP Tasks:
- Sequence Classification
- Token Classification
- Question Answering
- Summarization
- Translation
- Language Modeling (Causal and Masked)
Supported Vision Tasks:
- In progress
Supported Audio Tasks:
- In progress
Install
You can install blurr with pip: pip install ohmeow-blurr
Or, even better as this library is under very active development, create an editable install like this:
git clone https://github.com/ohmeow/blurr.git
cd blurr
pip install -e ".[dev]"
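After either install route, a quick check like the following can confirm what Python sees. This is a small sketch using only the standard library; the helper names are hypothetical, while the distribution name `ohmeow-blurr` and the module name `blurr` are taken from the install command and the imports in this README.

```python
from importlib import metadata, util


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name):
    """Return True if a top-level module can be found without importing it."""
    return util.find_spec(module_name) is not None


print("ohmeow-blurr version:", installed_version("ohmeow-blurr"))
print("blurr importable:", is_importable("blurr"))
```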
How to use
Please check the documentation for more thorough examples of how to use this package.
The following two packages need to be installed for blurr to work: fastai and huggingface transformers.
Imports
import os, warnings
import torch
from transformers import *
from transformers.utils import logging as hf_logging
from fastai.text.all import *
from blurr.text.data.all import *
from blurr.text.modeling.all import *
warnings.simplefilter("ignore")
hf_logging.set_verbosity_error()
os.environ["TOKENIZERS_PARALLELISM"] = "false"
Get your data
path = untar_data(URLs.IMDB_SAMPLE)
model_path = Path("models")
imdb_df = pd.read_csv(path / "texts.csv")
Get n_labels from data for config later
n_labels = len(imdb_df["label"].unique())
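As a tiny illustration of that count, the sketch below uses a made-up list of labels in place of the `label` column. The sorted label-to-index mapping is an extra shown for orientation, not something the line above computes.

```python
# Toy values standing in for the `label` column of imdb_df (made up)
labels = ["negative", "positive", "positive", "negative", "positive"]

# Same count as len(imdb_df["label"].unique())
n_labels = len(set(labels))

# A sorted vocabulary gives each class a stable integer index
vocab = sorted(set(labels))
label_to_index = {label: i for i, label in enumerate(vocab)}

print(n_labels)        # 2
print(label_to_index)  # {'negative': 0, 'positive': 1}
```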
Get your 🤗 objects
model_cls = AutoModelForSequenceClassification
pretrained_model_name = "bert-base-uncased"
config = AutoConfig.from_pretrained(pretrained_model_name)
config.num_labels = n_labels
hf_arch, hf_config, hf_tokenizer, hf_model = get_hf_objects(
    pretrained_model_name,
    model_cls=model_cls,
    config=config
)
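For orientation, the sketch below shows roughly what loading the same kinds of objects looks like with the transformers Auto classes alone. It is not how get_hf_objects is implemented: `load_hf_objects` is a hypothetical helper, and returning `config.model_type` as the architecture name is an assumption made for illustration.

```python
def load_hf_objects(pretrained_model_name, n_labels):
    """Hypothetical helper: load config, tokenizer, and model with transformers alone."""
    # Imported lazily so that defining the helper does not require transformers
    from transformers import (
        AutoConfig,
        AutoModelForSequenceClassification,
        AutoTokenizer,
    )

    config = AutoConfig.from_pretrained(pretrained_model_name, num_labels=n_labels)
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        pretrained_model_name, config=config
    )
    # Treating config.model_type (e.g. "bert") as the architecture name is an assumption
    return config.model_type, config, tokenizer, model


# Usage (downloads weights on first call):
# arch, config, tokenizer, model = load_hf_objects("bert-base-uncased", n_labels=2)
```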
Build your Data 🧱 and your DataLoaders
# single input
blocks = (
    TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model),
    CategoryBlock
)
dblock = DataBlock(
    blocks=blocks,
    get_x=ColReader("text"),
    get_y=ColReader("label"),
    splitter=ColSplitter()
)
dls = dblock.dataloaders(imdb_df, bs=4)
dls.show_batch(dataloaders=dls, max_n=2, trunc_at=250)
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>text</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>raising victor vargas : a review < br / > < br / > you know, raising victor vargas is like sticking your hands into a big, steaming bowl of oatmeal. it's warm and gooey, but you're not sure if it feels right. try as i might, no matter how warm and go</td>
<td>negative</td>
</tr>
<tr>
<th>1</th>
<td>the shop around the corner is one of the sweetest and most feel - good romantic comedies ever made. there's just no getting around that, and it's hard to actually put one's feeling for this film into words. it's not one of those films that tries too</td>
<td>positive</td>
</tr>
</tbody>
</table>
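The `ColSplitter()` in the DataBlock above decides which rows go to training and which to validation by reading a boolean column, `is_valid` by default. The following is a plain-Python sketch of that idea, not fastai's code: `column_split` is a hypothetical function and the rows are made up.

```python
def column_split(rows, col="is_valid"):
    """Return (train_indices, valid_indices) based on a boolean column."""
    train_idxs = [i for i, row in enumerate(rows) if not row[col]]
    valid_idxs = [i for i, row in enumerate(rows) if row[col]]
    return train_idxs, valid_idxs


# Made-up rows with label, text, and is_valid fields
rows = [
    {"label": "negative", "text": "dull and slow", "is_valid": False},
    {"label": "positive", "text": "a delight", "is_valid": False},
    {"label": "positive", "text": "charming", "is_valid": True},
]

train_idxs, valid_idxs = column_split(rows)
print(train_idxs, valid_idxs)  # [0, 1] [2]
```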
… and 🚂
model = BaseModelWrapper(hf_model)
learn = Learner(
    dls,
    model,  # pass the wrapped model, not the raw hf_model
    opt_func=partial(Adam, decouple_wd=True),
    loss_func=CrossEntropyLossFlat(),
    metrics=[accuracy],
    cbs=[BaseModelCallback],
    splitter=blurr_splitter,
)
learn.freeze()
learn.fit_one_cycle(3, lr_max=1e-3)
<table border="1" class="dataframe">
<thead>
<tr style="text-align: left;">
<th>epoch</th>
<th>train_loss</th>
<th>valid_loss</th>
<th>accuracy</th>
<th>time</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0.628744</td>
<td>0.453862</td>
<td>0.780000</td>
<td>00:21</td>
</tr>
<tr>
<td>1</td>
<td>0.367063</td>
<td>0.294906</td>
<td>0.895000</td>
<td>00:22</td>
</tr>
<tr>
<td>2</td>
<td>0.238181</td>
<td>0.279067</td>
<td>0.900000</td>
<td>00:22</td>
</tr>
</tbody>
</table>
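The `fit_one_cycle(3, lr_max=1e-3)` call that produced the table above varies the learning rate over the course of training. The sketch below reproduces the general shape of that schedule in plain Python, assuming fastai's default settings of `div=25`, `div_final=1e5`, and `pct_start=0.25`. The function names are hypothetical, and the momentum schedule that fastai runs alongside it is left out.

```python
import math


def cos_anneal(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 - math.cos(math.pi * pos)) * (end - start) / 2


def one_cycle_lr(pos, lr_max=1e-3, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pos in [0, 1]."""
    if pos < pct_start:
        # Warm-up: lr_max / div -> lr_max
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    # Annealing: lr_max -> lr_max / div_final
    return cos_anneal(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))


for pos in (0.0, 0.25, 0.5, 1.0):
    print(f"{pos:.2f} -> {one_cycle_lr(pos):.2e}")
```

Under these assumptions the rate starts at `lr_max / 25`, peaks at `lr_max` a quarter of the way through, and then decays to a very small value by the end.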
learn.show_results(learner=learn, max_n=2, trunc_at=250)
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>text</th>
<th>target</th>
<th>prediction</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>the trouble with the book, " memoirs of a geisha " is that it had japanese surfaces but underneath the surfaces it was all an american man's way of thinking. reading the book is like watching a magnificent ballet with great music, sets, and costumes</td>
<td>negative</td>
<td>negative</td>
</tr>
<tr>
<th>1</th>
<td>< br / > < br / > i'm sure things didn't exactly go the same way in the real life of homer hickam as they did in the film adaptation of his book, rocket boys, but the movie " october sky " ( an anagram of the book's title ) is good enough to stand al</td>
<td>positive</td>
<td>positive</td>
</tr>
</tbody>
</table>
Using the high-level Blurr API
Using the high-level API, we can reduce DataBlock, DataLoaders, and Learner creation to a single line of code.
Included in the high-level API is a general Blearner class (pronounced “Blurrner”) that you can use with hand-crafted DataLoaders, as well as task-specific Blearners like BlearnerForSequenceClassification that handle everything given your raw data sourced from a pandas DataFrame, CSV file, or list of dictionaries (for example, a huggingface datasets dataset).
learn = BlearnerForSequenceClassification.from_data(
    imdb_df,
    pretrained_model_name,
    dl_kwargs={"bs": 4}
)
learn.fit_one_cycle(1, lr_max=1e-3)
<table border="1" class="dataframe">
<thead>
<tr style="text-align: left;">
<th>epoch</th>
<th>train_loss</th>
<th>valid_loss</th>
<th>f1_score</th>
<th>accuracy</th>
<th>time</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0.530218</td>
<td>0.484683</td>
<td>0.789189</td>
<td>0.805000</td>
<td>00:22</td>
</tr>
</tbody>
</table>
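Since the high-level API is described as accepting a list of dictionaries as well as a DataFrame or CSV file, here is a small sketch of what that shape looks like when built from CSV text with the standard library. The rows are made up, and whether `from_data` expects exactly these keys (`text`, `label`, `is_valid`) is an assumption based on the column names used earlier in this README.

```python
import csv
import io

# Made-up CSV content with label, text, and is_valid columns
csv_text = """label,text,is_valid
negative,"dull, slow, and far too long",False
positive,a delight from start to finish,True
"""

# csv.DictReader yields one dictionary per row, keyed by the header names
records = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values are read as strings, so convert the flag to a real boolean
for record in records:
    record["is_valid"] = record["is_valid"] == "True"

print(records[0]["label"], records[1]["is_valid"])  # negative True
```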
learn.show_results(learner=learn, max_n=2, trunc_at=250)
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>text</th>
<th>target</th>
<th>prediction</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>the trouble with the book, " memoirs of a geisha " is that it had japanese surfaces but underneath the surfaces it was all an american man's way of thinking. reading the book is like watching a magnificent ballet with great music, sets, and costumes</td>
<td>negative</td>
<td>negative</td>
</tr>
<tr>
<th>1</th>
<td>< br / > < br / > i'm sure things didn't exactly go the same way in the r</td>
</tr>
</tbody>
</table>