📘 Documentation | 🛠️ Installation | 🤖 Models | 🤔 Report Issues
deeptab: Tabular Deep Learning Made Simple
deeptab is a Python library for tabular deep learning. It includes models that leverage the Mamba (state space model) architecture, as well as other popular models such as TabTransformer, FTTransformer, TabM, and tabular ResNets. Check out our paper Mambular: A Sequential Model for Tabular Deep Learning, available here. Also check out our paper introducing TabulaRNN and analyzing the efficiency of NLP-inspired tabular models.
- 🏃 Quickstart
- 📖 Introduction
- 🤖 Models
- 📚 Documentation
- 🛠️ Installation
- 🚀 Usage
- 💻 Implement Your Own Model
- 🏷️ Citation
- License
🏃 Quickstart
Like any sklearn model, deeptab models can be fit as easily as this:
from deeptab.models import MambularClassifier
# Initialize and fit your model
model = MambularClassifier()
# X can be a pd.DataFrame, or anything that converts cleanly to one, such as a np.array
model.fit(X, y, max_epochs=150, lr=1e-04)
📖 Introduction
deeptab is a Python package that brings the power of advanced deep learning architectures to tabular data, offering a suite of models for regression, classification, and distributional regression tasks. Designed with ease of use in mind, deeptab models adhere to scikit-learn's BaseEstimator interface, making them highly compatible with the familiar scikit-learn ecosystem. This means you can fit, predict, and evaluate using deeptab models just as you would with any traditional scikit-learn model, but with the added performance and flexibility of deep learning.
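The scikit-learn-style contract that deeptab models follow can be sketched with a minimal, purely illustrative estimator (the `MeanRegressor` class and its internals below are hypothetical stand-ins, not part of deeptab):

```python
# Hypothetical sketch of the sklearn-style interface deeptab models expose.
# MeanRegressor is an illustration, not a deeptab class.
class MeanRegressor:
    def fit(self, X, y, max_epochs=100, lr=1e-3):
        # A real deeptab model would train a neural network here;
        # this stand-in just memorizes the target mean.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # One prediction per input row, as in scikit-learn.
        return [self.mean_ for _ in X]

model = MeanRegressor()
model.fit([[1.0], [2.0], [3.0]], [10.0, 20.0, 30.0])
preds = model.predict([[4.0], [5.0]])
```

Because deeptab models expose this same fit/predict surface (plus predict_proba for classifiers), they slot into familiar scikit-learn workflows such as cross-validation and pipelines.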
🤖 Models
| Model | Description |
| ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| Mambular | A sequential model using Mamba blocks specifically designed for various tabular data tasks introduced here. |
| TabM | Batch ensembling for an MLP, as introduced by Gorishniy et al. |
| NODE | Neural Oblivious Decision Ensembles as introduced by Popov et al. |
| FTTransformer | A model leveraging transformer encoders, as introduced by Gorishniy et al., for tabular data. |
| MLP | A classical Multi-Layer Perceptron (MLP) model for handling tabular data tasks. |
| ResNet | An adaptation of the ResNet architecture for tabular data applications. |
| TabTransformer | A transformer-based model for tabular data introduced by Huang et al., enhancing feature learning capabilities. |
| MambaTab | A tabular model using a Mamba block on a joint input representation, described here. Not a sequential model. |
| TabulaRNN | A Recurrent Neural Network for Tabular data, introduced here. |
| MambAttention | A combination between Mamba and Transformers, also introduced here. |
| NDTF | A neural decision forest using soft decision trees. See Kontschieder et al. for inspiration. |
| SAINT | Improves neural networks via row attention and contrastive pre-training, introduced here. |
| AutoInt | Automatic Feature Interaction Learning via Self-Attentive Neural Networks introduced here. |
| Trompt | Trompt: Towards a Better Deep Neural Network for Tabular Data introduced here. |
| Tangos | Tangos: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization introduced here. |
| ModernNCA | Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later introduced here. |
| TabR | TabR: Tabular Deep Learning Meets Nearest Neighbors in 2023, introduced here. |
All models are available for regression, classification, and distributional regression (denoted LSS).
Hence, each model comes in three variants, e.g. MambularRegressor, MambularClassifier, and MambularLSS.
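The naming scheme above can be expressed as a simple string pattern (the helper below is illustrative; only the Regressor/Classifier/LSS suffixes come from the text):

```python
# Sketch of the deeptab naming convention: <ModelName><TaskSuffix>.
# The suffixes come from the README; the helper itself is hypothetical.
TASK_SUFFIXES = {
    "regression": "Regressor",
    "classification": "Classifier",
    "distributional": "LSS",
}

def class_name(model: str, task: str) -> str:
    return model + TASK_SUFFIXES[task]

names = [class_name("Mambular", t)
         for t in ("regression", "classification", "distributional")]
```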
📚 Documentation
You can find the deeptab API documentation here.
🛠️ Installation
Install deeptab using pip:
pip install deeptab
If you want to use the original mamba and mamba2 implementations, additionally install mamba-ssm via:
pip install mamba-ssm
Be careful to install torch and CUDA versions that are compatible with each other, e.g.:
pip install torch==2.0.0+cu118 torchvision==0.15.0+cu118 torchaudio==2.0.0+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
pip install mamba-ssm
🚀 Usage
Preprocessing
deeptab uses pretab for preprocessing: https://github.com/OpenTabular/PreTab
Hence, datatypes etc. are detected automatically, and all preprocessing methods from pretab as well as from sklearn.preprocessing are available.
Additionally, you can preprocess each feature differently, according to your requirements, by setting the feature_preprocessing={} argument during model initialization.
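The per-feature argument is just a mapping from column name to a preprocessing method. The column names and method strings below are made up for illustration; check the pretab documentation for the exact accepted identifiers:

```python
# Hypothetical per-feature preprocessing spec: keys are your column names,
# values are method identifiers accepted by pretab (strings here are guesses).
feature_preprocessing = {
    "age": "standardization",
    "income": "quantile",
    "zip_code": "one-hot",
}
```

A dict like this would then be passed at model initialization, e.g. something along the lines of `MambularRegressor(feature_preprocessing=feature_preprocessing)`.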
For an overview of all available methods, see: pretab
- Ordinal & One-Hot Encoding: Automatically transforms categorical data into numerical formats using continuous ordinal encoding or one-hot encoding. Includes options for transforming outputs to float for compatibility with downstream models.
- Binning: Discretizes numerical features into bins, with support for both fixed binning strategies and optimal binning derived from decision tree models.
- MinMax: Scales numerical data to a specific range, such as [-1, 1], using Min-Max scaling or similar techniques.
- Standardization: Centers and scales numerical features to have a mean of zero and unit variance for better compatibility with certain models.
- Quantile Transformations: Normalizes numerical data to follow a uniform or normal distribution, handling distributional shifts effectively.
- Spline Transformations: Captures nonlinearity in numerical features using spline-based transformations, ideal for complex relationships.
- Piecewise Linear Encodings (PLE): Captures complex numerical patterns by applying piecewise linear encoding, suitable for data with periodic or nonlinear structures.
- Polynomial Features: Automatically generates polynomial and interaction terms for numerical features, enhancing the ability to capture higher-order relationships.
- Box-Cox & Yeo-Johnson Transformations: Performs power transformations to stabilize variance and normalize distributions.
- Custom Binning: Enables user-defined bin edges for precise discretization of numerical data.
- Pre-trained Encoding: Use sentence transformers to encode categorical features.
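To make the piecewise linear encoding (PLE) idea concrete, here is a minimal standalone sketch; the bin edges and the function are illustrative, not deeptab's or pretab's actual implementation:

```python
def ple_encode(x, edges):
    """Piecewise linear encoding: one output slot per bin.
    Bins entirely below x are 1.0, bins entirely above are 0.0,
    and the bin containing x gets its fractional fill."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if x >= hi:
            out.append(1.0)
        elif x <= lo:
            out.append(0.0)
        else:
            out.append((x - lo) / (hi - lo))
    return out

# Illustrative edges; a real preprocessor would derive them from data
# (e.g. from quantiles or decision-tree splits).
encoding = ple_encode(2.5, edges=[0.0, 1.0, 2.0, 3.0, 4.0])
```

The value 2.5 fills the first two bins completely, half of the third, and none of the fourth, giving the network a smooth, monotone representation of the raw number.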