LightAutoML

LAMA - automatic model creation framework

<img src=https://github.com/sberbank-ai-lab/LightAutoML/raw/master/imgs/LightAutoML_logo_big.png width=600 />

LightAutoML - automatic model creation framework

LightAutoML (LAMA) is an AutoML framework by Sber AI Lab.

It provides automatic model creation for the following tasks:

  • binary classification
  • multiclass classification
  • regression

The current version of the package handles datasets with independent samples in each row, i.e., each row is an object with its own features and target. Multitable datasets and sequences are a work in progress :)

Note: we use the AutoWoE library to automatically create interpretable models.
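To make the expected data layout concrete, here is a minimal sketch in plain Python. The rows, the column names, and the helpers `split_features_target` and `guess_task_name` are hypothetical and not part of LightAutoML. The returned strings assume the task names `'binary'`, `'multiclass'`, and `'reg'` for the three supported task types.

```python
# Each row is one independent object: its features plus its target.
# Column names and values here are made up for illustration.
rows = [
    {"age": 22.0, "fare": 7.25, "survived": 0},
    {"age": 38.0, "fare": 71.28, "survived": 1},
    {"age": 26.0, "fare": 7.92, "survived": 1},
]


def split_features_target(rows, target):
    """Separate every row into a feature dict and a target value."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    targets = [row[target] for row in rows]
    return features, targets


def guess_task_name(targets):
    """Hypothetical heuristic: pick a task name from the target values."""
    unique = set(targets)
    if any(isinstance(t, float) and not t.is_integer() for t in unique):
        return "reg"  # non-integer targets: treat as regression
    return "binary" if len(unique) == 2 else "multiclass"


features, targets = split_features_target(rows, target="survived")
task_name = guess_task_name(targets)
print(task_name)
```

The heuristic is only a rough guide; in practice the task type is a modelling decision you state explicitly.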

Authors: Alexander Ryzhkov, Anton Vakhrushev, Dmitry Simakov, Vasilii Bunakov, Rinchin Damdinov, Pavel Shvets, Alexander Kirilin.

Documentation of LightAutoML is available here; you can also generate it.

(New feature) GPU pipeline

A full GPU pipeline for LightAutoML is currently available for developer testing (still in progress). The code and tutorials are available here.

<a name="toc"></a>

Table of Contents

<a name="installation"></a>

Installation

To install the LAMA framework on your machine from PyPI, execute the following commands:


# Install base functionality:

pip install -U lightautoml

# For a partial installation, use the corresponding option.
# Extra dependencies: [nlp, cv, report]
# Or you can use 'all' to install everything

pip install -U lightautoml[nlp]
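After installing, it can be handy to confirm from Python that the package is visible to the current interpreter. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is hypothetical, and the distribution name is assumed to be `lightautoml`, as in the pip commands above.

```python
from importlib import metadata


def installed_version(package="lightautoml"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("lightautoml is not installed; run: pip install -U lightautoml")
else:
    print(f"lightautoml {version} is installed")
```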

Additionally, run the following commands to enable PDF report generation:

# MacOS
brew install cairo pango gdk-pixbuf libffi

# Debian / Ubuntu
sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info

# Fedora
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2

# Windows
# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows
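As a rough way to see whether the native libraries from the commands above are discoverable, the standard library's `ctypes.util.find_library` can be queried. This is only a sketch: the library names in `REPORT_LIBS` are assumptions derived from the package names and may differ between platforms, and `check_report_libs` is a hypothetical helper.

```python
from ctypes.util import find_library

# Library names are assumptions and may differ between platforms.
REPORT_LIBS = ["cairo", "pango-1.0", "gdk_pixbuf-2.0", "ffi"]


def check_report_libs(names=REPORT_LIBS):
    """Map each library name to True if the system loader can locate it."""
    return {name: find_library(name) is not None for name in names}


status = check_report_libs()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" result does not prove the library is absent, since lookup rules vary by operating system; treat it as a hint to recheck the installation step.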

Back to top

<a name="quicktour"></a>

Quick tour

Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML:

  • Use a ready-made preset for tabular data:
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

# Load the Kaggle Titanic train and test sets
df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# Binary classification task scored with F1 at a 0.5 probability threshold
automl = TabularAutoML(
    task=Task(
        name='binary',
        metric=lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5) * 1),
    )
)

# Fit on the training data and get out-of-fold predictions
oof_pred = automl.fit_predict(
    df_train,
    roles={'target': 'Survived', 'drop': ['PassengerId']},
)
test_pred = automl.predict(df_test)

# Write the submission file with thresholded class labels
pd.DataFrame({
    'PassengerId': df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5) * 1,
}).to_csv('submit.csv', index=False)

The LightAutoML framework has many ready-to-use parts and extensive customization options. To learn more, check out the Resources section.
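The custom metric in the preset example converts predicted probabilities to class labels with a 0.5 threshold before computing F1. The sketch below reproduces that computation without any dependencies, to show what the metric measures. `thresholded_f1` is a hypothetical helper, not a scikit-learn or LightAutoML function, and the sample values are made up.

```python
def thresholded_f1(y_true, y_prob, threshold=0.5):
    """F1 score after converting probabilities to 0/1 labels."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.6, 0.4, 0.8, 0.1]
score = thresholded_f1(y_true, y_prob)
print(round(score, 3))
```

Here two positives are found, one is missed, and one negative is flagged by mistake, so precision and recall are both 2/3 and so is F1.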

Back to top

<a name="examples"></a>

Resources

Kaggle kernel examples of LightAutoML usage:

Google Colab tutorials and other examples:

  • Tutorial_1_basics.ipynb Open In Colab - get started with LightAutoML on tabular data.
  • Tutorial_2_WhiteBox_AutoWoE.ipynb Open In Colab - creating interpretable models.
  • Tutorial_3_sql_data_source.ipynb Open In Colab - shows how to use LightAutoML presets (both standalone and time-utilized variants) for solving ML tasks on tabular data from an SQL database instead of CSV.
  • Tutorial_4_NLP_Interpretation.ipynb Open In Colab - example of using TabularNLPAutoML preset, LimeTextExplainer.
  • Tutorial_5_uplift.ipynb Open In Colab - shows how to use LightAutoML for an uplift-modeling task.
  • Tutorial_6_custom_pipeline.ipynb Open In Colab - shows how to create your own pipeline from specified blocks: pipelines for feature generation and feature selection, ML algorithms, hyperparameter optimization etc.
  • Tutorial_7_ICE_and_PDP_interpretation.ipynb Open In Colab - shows how to obtain local and global interpretation of model results using ICE and PDP approaches.
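For readers unfamiliar with the ICE and PDP approaches mentioned in the last tutorial, here is a minimal, dependency-free sketch of the idea. It does not use LightAutoML's interpretation API: `toy_model`, `ice_curves`, and `pdp_curve` are hypothetical names, and the model is a made-up linear function standing in for any fitted predictor.

```python
def toy_model(row):
    """Stand-in for a fitted model: any function from a row to a prediction."""
    return 0.5 * row["age"] + 2.0 * row["fare"]


def ice_curves(model, rows, feature, grid):
    """One curve per row: predictions as `feature` sweeps over `grid`."""
    return [[model({**row, feature: value}) for value in grid] for row in rows]


def pdp_curve(model, rows, feature, grid):
    """Partial dependence: the pointwise average of the ICE curves."""
    curves = ice_curves(model, rows, feature, grid)
    return [sum(col) / len(col) for col in zip(*curves)]


rows = [{"age": 20.0, "fare": 1.0}, {"age": 40.0, "fare": 3.0}]
grid = [0.0, 10.0]
ice = ice_curves(toy_model, rows, "age", grid)
pdp = pdp_curve(toy_model, rows, "age", grid)
print(ice, pdp)
```

The ICE curves give the local view (how one object's prediction responds to a feature), while their average, the PDP curve, gives the global view.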

Note 1: in production there is no need to use the profiler, which increases run time and memory consumption, so please do not turn it on. It is off by default.

Note 2: to take a look at this report after the run, please comment out the last line of the demo, which contains the report deletion command.

Courses, videos and papers

Back to top
