# TabPFN

⚡ TabPFN: Foundation Model for Tabular Data ⚡
<img src="https://github.com/PriorLabs/tabpfn-extensions/blob/main/tabpfn_summary.webp" width="80%" alt="TabPFN Summary">

## Quick Start
### Interactive Notebook Tutorial
> [!TIP]
> Dive right in with our interactive Colab notebook! It's the best way to get a hands-on feel for TabPFN, walking you through installation, classification, and regression examples.
⚡ GPU Recommended: For optimal performance, use a GPU (even older ones with ~8GB VRAM work well; 16GB needed for some large datasets). On CPU, only small datasets (≲1000 samples) are feasible. No GPU? Use our free hosted inference via TabPFN Client.
## Installation
Official installation (pip):

```bash
pip install tabpfn
```

OR installation from source:

```bash
pip install "tabpfn @ git+https://github.com/PriorLabs/TabPFN.git"
```

OR local development installation: first install `uv`, which we use for development, then run

```bash
git clone https://github.com/PriorLabs/TabPFN.git --depth 1
cd TabPFN
uv sync
```
## Basic Usage
To use our default TabPFN-2.6 model, trained purely on synthetic data:

```python
from tabpfn import TabPFNClassifier, TabPFNRegressor

clf = TabPFNClassifier()
clf.fit(X_train, y_train)  # downloads checkpoint on first use
predictions = clf.predict(X_test)

reg = TabPFNRegressor()
reg.fit(X_train, y_train)  # downloads checkpoint on first use
predictions = reg.predict(X_test)
```
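TabPFN estimators follow the scikit-learn `fit`/`predict` convention, so an end-to-end workflow is just split, fit, predict, score. The sketch below is a self-contained illustration of that pattern: `CentroidClassifier` is a hypothetical stand-in with the same call surface, used only so the example runs offline without downloading a checkpoint; swap in `TabPFNClassifier` for real use.

```python
import numpy as np

# Hypothetical stand-in exposing the same fit/predict surface as
# TabPFNClassifier (a nearest-centroid rule, so the sketch runs offline).
class CentroidClassifier:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # distance from each row to each class centroid
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(3.0, 1.0, (50, 4))])
y = np.repeat([0, 1], 50)

idx = rng.permutation(len(X))  # shuffle, then 80/20 split
train, test = idx[:80], idx[80:]

clf = CentroidClassifier().fit(X[train], y[train])  # TabPFNClassifier() here
accuracy = (clf.predict(X[test]) == y[test]).mean()
print(f"test accuracy: {accuracy:.2f}")
```

Everything except the estimator class carries over unchanged when you use the real model.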
To use other model versions (e.g. TabPFN-2.5):

```python
from tabpfn import TabPFNClassifier, TabPFNRegressor
from tabpfn.constants import ModelVersion

classifier = TabPFNClassifier.create_default_for_version(ModelVersion.V2_5)
regressor = TabPFNRegressor.create_default_for_version(ModelVersion.V2_5)
```
For complete examples, see the `tabpfn_for_binary_classification.py`, `tabpfn_for_multiclass_classification.py`, and `tabpfn_for_regression.py` files.
## Usage Tips
- **Use batch prediction mode:** Each `predict` call recomputes the training-set context, so calling `predict` on 100 samples separately is almost 100 times slower and more expensive than a single call. If the test set is very large, split it into chunks of 1000 samples each.
- **Avoid data preprocessing:** Do not apply data scaling or one-hot encoding when feeding data to the model.
- **Use a GPU:** TabPFN is slow to execute on a CPU. Ensure a GPU is available for better performance.
- **Mind the dataset size:** TabPFN works best on datasets with fewer than 100,000 samples and 2,000 features. For larger datasets, see the Large datasets guide.
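The batch-prediction tip above can be sketched as a small helper that splits a large test set into fixed-size chunks and concatenates the per-chunk predictions. `predict_in_chunks` and `MeanModel` are illustrative names, not part of the TabPFN API; any fitted estimator with a `predict` method (including `TabPFNClassifier`/`TabPFNRegressor`) would work in their place.

```python
import numpy as np

def predict_in_chunks(model, X, chunk_size=1000):
    """Call model.predict once per chunk instead of once per sample."""
    parts = [model.predict(X[i:i + chunk_size])
             for i in range(0, len(X), chunk_size)]
    return np.concatenate(parts)

# Illustrative stand-in; with TabPFN you would pass a fitted
# TabPFNClassifier or TabPFNRegressor instead.
class MeanModel:
    def predict(self, X):
        return X.mean(axis=1)

X_test = np.arange(10_000, dtype=float).reshape(2_500, 4)
y_pred = predict_in_chunks(MeanModel(), X_test, chunk_size=1000)
print(y_pred.shape)  # (2500,)
```

For a one-million-row test set this issues 1,000 `predict` calls instead of one million, while keeping the memory footprint of each call bounded.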
## TabPFN Ecosystem
Choose the right TabPFN implementation for your needs:
- **TabPFN Client**: Simple API client for using TabPFN via cloud-based inference.
- **TabPFN Extensions**: A powerful companion repository packed with advanced utilities, integrations, and features, and a great place to contribute:
  - `interpretability`: Gain insights with SHAP-based explanations, feature importance, and selection tools.
  - `unsupervised`: Tools for outlier detection and synthetic tabular data generation.
  - `embeddings`: Extract and use TabPFN's internal learned embeddings for downstream tasks or analysis.
  - `many_class`: Handle multi-class classification problems that exceed TabPFN's built-in class limit.
  - `rf_pfn`: Combine TabPFN with traditional models like Random Forests for hybrid approaches.
  - `hpo`: Automated hyperparameter optimization tailored to TabPFN.
  - `post_hoc_ensembles`: Boost performance by ensembling multiple TabPFN models post-training.

  To install:

  ```bash
  git clone https://github.com/priorlabs/tabpfn-extensions.git
  pip install -e tabpfn-extensions
  ```

- **TabPFN (this repo)**: Core implementation for fast and local inference with PyTorch and CUDA support.
- **TabPFN UX**: No-code graphical interface to explore TabPFN capabilities, ideal for business users and prototyping.
## TabPFN Workflow at a Glance
Follow this decision tree to build your model and choose the right extensions from our ecosystem. It walks you through critical questions about your data, hardware, and performance needs, guiding you to the best solution for your specific use case.
```mermaid
---
config:
  theme: 'default'
  themeVariables:
    edgeLabelBackground: 'white'
---
graph LR
    %% 1. DEFINE COLOR SCHEME & STYLES
    classDef default fill:#fff,stroke:#333,stroke-width:2px,color:#333;
    classDef start_node fill:#e8f5e9,stroke:#43a047,stroke-width:2px,color:#333;
    classDef process_node fill:#e0f2f1,stroke:#00796b,stroke-width:2px,color:#333;
    classDef decision_node fill:#fff8e1,stroke:#ffa000,stroke-width:2px,color:#333;
    style Infrastructure fill:#fff,stroke:#ccc,stroke-width:5px;
    style Unsupervised fill:#fff,stroke:#ccc,stroke-width:5px;
    style Data fill:#fff,stroke:#ccc,stroke-width:5px;
    style Performance fill:#fff,stroke:#ccc,stroke-width:5px;
    style Interpretability fill:#fff,stroke:#ccc,stroke-width:5px;

    %% 2. DEFINE GRAPH STRUCTURE
    subgraph Infrastructure
        start((Start)) --> gpu_check["GPU available?"];
        gpu_check -- Yes --> local_version["Use TabPFN<br/>(local PyTorch)"];
        gpu_check -- No --> api_client["Use TabPFN-Client<br/>(cloud API)"];
        task_type["What is<br/>your task?"]
    end
    local_version --> task_type
    api_client --> task_type
    end_node((Workflow<br/>Complete));

    subgraph Unsupervised
        unsupervised_type["Select<br/>Unsupervised Task"];
        unsupervised_type --> imputation["Imputation"]
        unsupervised_type --> data_gen["Data<br/>Generation"];
        unsupervised_type --> tabebm["Data<br/>Augmentation"];
        unsupervised_type --> density["Outlier<br/>Detection"];
        unsupervised_type --> embedding["Get<br/>Embeddings"];
    end

    subgraph Data
        data_check["Data Checks"];
        model_choice["Samples > 50k or<br/>Classes > 10?"];
        data_check -- "Table Contains Text Data?" --> api_backend_note["Note: API client has<br/>native text support"];
        api_backend_note --> model_choice;
        data_check -- "Time-Series Data?" --> ts_features["Use Time-Series<br/>Features"];
        ts_features --> model_choice;
        data_check -- "Purely Tabular" --> model_choice;
        model_choice -- "No" --> finetune_check;
        model_choice -- "Yes, 50k-100k samples" --> ignore_limits["Set<br/>ignore_pretraining_limits=True"];
        model_choice -- "Yes, >100k samples" --> subsample["Large Datasets Guide<br/>"];
        model_choice -- "Yes, >10 classes" --> many_class["Many-Class<br/>Method"];
    end

    subgraph Performance
        finetune_check["Need<br/>Finetuning?"];
        performance_check["Need Even Better Performance?"];
        speed_check["Need faster inference<br/>at prediction time?"];
        kv_cache["Enable KV Cache<br/>(fit_mode='fit_with_cache')<br/><small>Faster predict; +Memory ~O(N×F)</small>"];
        tuning_complete["Tuning Complete"];
        finetune_check -- Yes --> finetuning["Finetuning"];
        finetune_check -- No --> performance_check;
        finetuning --> performance_check;
        performance_check -- No --> tuning_complete;
        performance_check -- Yes --> hpo["HPO"];
        performance_check -- Yes --> post_hoc["Post-Hoc<br/>Ensembling"];
        performance_check -- Yes --> more_estimators["More<br/>Estimators"];
        performance_check -- Yes --> speed_check;
        speed_check -- Yes --> kv_cache;
        speed_check -- No --> tuning_complete;
        hpo --> tuning_complete;
        post_hoc --> tuning_complete;
        more_estimators --> tuning_complete;
        kv_cache --> tuning_complete;
    end

    subgraph Interpretability
        tuning_complete --> interpretability_check;
        interpretability_check["Need<br/>Interpretability?"];
        interpretability_check --> feature_selection["Feature Selection"];
        interpretability_check --> partial_dependence["Partial Dependence Plots"];
        interpretability_check --> shapley["Explain with<br/>SHAP"];
        interpretability_check --> shap_iq["Explain with<br/>SHAP IQ"];
        interpretability_check -- No --> end_node;
        feature_selection --> end_node;
        partial_dependence --> end_node;
        shapley --> end_node;
        shap_iq --> end_node;
    end

    %% 3. LINK SUBGRAPHS AND PATHS
    task_type -- "Classification or
```