# TabPFN

⚡ TabPFN: Foundation Model for Tabular Data ⚡
<img src="https://github.com/PriorLabs/tabpfn-extensions/blob/main/tabpfn_summary.webp" width="80%" alt="TabPFN Summary">

## Quick Start
### Interactive Notebook Tutorial
> [!TIP]
> Dive right in with our interactive Colab notebook! It's the best way to get a hands-on feel for TabPFN, walking you through installation, classification, and regression examples.
⚡ GPU Recommended: For optimal performance, use a GPU (even older ones with ~8GB VRAM work well; 16GB needed for some large datasets). On CPU, only small datasets (≲1000 samples) are feasible. No GPU? Use our free hosted inference via TabPFN Client.
## Installation
Official installation (pip):

```bash
pip install tabpfn
```

OR installation from source:

```bash
pip install "tabpfn @ git+https://github.com/PriorLabs/TabPFN.git"
```

OR local development installation: first install `uv`, which we use for development, then run

```bash
git clone https://github.com/PriorLabs/TabPFN.git --depth 1
cd TabPFN
uv sync
```
## Basic Usage
To use our default TabPFN-2.6 model, trained purely on synthetic data:

```python
from tabpfn import TabPFNClassifier, TabPFNRegressor

clf = TabPFNClassifier()
clf.fit(X_train, y_train)  # downloads checkpoint on first use
predictions = clf.predict(X_test)

reg = TabPFNRegressor()
reg.fit(X_train, y_train)  # downloads checkpoint on first use
predictions = reg.predict(X_test)
```
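TabPFN estimators follow the scikit-learn `fit`/`predict` convention, so an end-to-end workflow is just split, fit, predict, score. The sketch below is a self-contained illustration of that pattern: `CentroidClassifier` is a hypothetical stand-in with the same call surface, used only so the example runs offline without downloading a checkpoint; swap in `TabPFNClassifier` for real use.

```python
import numpy as np

# Hypothetical stand-in exposing the same fit/predict surface as
# TabPFNClassifier (a nearest-centroid rule, so the sketch runs offline).
class CentroidClassifier:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # distance from each row to each class centroid
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(3.0, 1.0, (50, 4))])
y = np.repeat([0, 1], 50)

idx = rng.permutation(len(X))  # shuffle, then 80/20 split
train, test = idx[:80], idx[80:]

clf = CentroidClassifier().fit(X[train], y[train])  # TabPFNClassifier() here
accuracy = (clf.predict(X[test]) == y[test]).mean()
print(f"test accuracy: {accuracy:.2f}")
```

Everything except the estimator class carries over unchanged when you use the real model.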
To use other model versions (e.g. TabPFN-2.5):

```python
from tabpfn import TabPFNClassifier, TabPFNRegressor
from tabpfn.constants import ModelVersion

classifier = TabPFNClassifier.create_default_for_version(ModelVersion.V2_5)
regressor = TabPFNRegressor.create_default_for_version(ModelVersion.V2_5)
```
For complete examples, see the `tabpfn_for_binary_classification.py`, `tabpfn_for_multiclass_classification.py`, and `tabpfn_for_regression.py` files.
## Usage Tips
- **Use batch prediction mode:** Each `predict` call recomputes the training-set context, so calling `predict` on 100 samples separately is almost 100 times slower and more expensive than a single call. If the test set is very large, split it into chunks of 1000 samples each.
- **Avoid data preprocessing:** Do not apply data scaling or one-hot encoding when feeding data to the model.
- **Use a GPU:** TabPFN is slow to execute on a CPU. Ensure a GPU is available for better performance.
- **Mind the dataset size:** TabPFN works best on datasets with fewer than 100,000 samples and 2,000 features. For larger datasets, see the Large datasets guide.
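The batch-prediction tip above can be sketched as a small helper that splits a large test set into fixed-size chunks and concatenates the per-chunk predictions. `predict_in_chunks` and `MeanModel` are illustrative names, not part of the TabPFN API; any fitted estimator with a `predict` method (including `TabPFNClassifier`/`TabPFNRegressor`) would work in their place.

```python
import numpy as np

def predict_in_chunks(model, X, chunk_size=1000):
    """Call model.predict once per chunk instead of once per sample."""
    parts = [model.predict(X[i:i + chunk_size])
             for i in range(0, len(X), chunk_size)]
    return np.concatenate(parts)

# Illustrative stand-in; with TabPFN you would pass a fitted
# TabPFNClassifier or TabPFNRegressor instead.
class MeanModel:
    def predict(self, X):
        return X.mean(axis=1)

X_test = np.arange(10_000, dtype=float).reshape(2_500, 4)
y_pred = predict_in_chunks(MeanModel(), X_test, chunk_size=1000)
print(y_pred.shape)  # (2500,)
```

For a one-million-row test set this issues 1,000 `predict` calls instead of one million, while keeping the memory footprint of each call bounded.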
## TabPFN Ecosystem
Choose the right TabPFN implementation for your needs:
- **TabPFN Client**: Simple API client for using TabPFN via cloud-based inference.
- **TabPFN Extensions**: A powerful companion repository packed with advanced utilities, integrations, and features, and a great place to contribute:
  - `interpretability`: Gain insights with SHAP-based explanations, feature importance, and selection tools.
  - `unsupervised`: Tools for outlier detection and synthetic tabular data generation.
  - `embeddings`: Extract and use TabPFN's internal learned embeddings for downstream tasks or analysis.
  - `many_class`: Handle multi-class classification problems that exceed TabPFN's built-in class limit.
  - `rf_pfn`: Combine TabPFN with traditional models like Random Forests for hybrid approaches.
  - `hpo`: Automated hyperparameter optimization tailored to TabPFN.
  - `post_hoc_ensembles`: Boost performance by ensembling multiple TabPFN models post-training.

  To install:

  ```bash
  git clone https://github.com/priorlabs/tabpfn-extensions.git
  pip install -e tabpfn-extensions
  ```

- **TabPFN (this repo)**: Core implementation for fast and local inference with PyTorch and CUDA support.
- **TabPFN UX**: No-code graphical interface to explore TabPFN capabilities, ideal for business users and prototyping.
## TabPFN Workflow at a Glance
Follow this decision tree to build your model and choose the right extensions from our ecosystem. It walks you through critical questions about your data, hardware, and performance needs, guiding you to the best solution for your specific use case.
```mermaid
---
config:
  theme: 'default'
  themeVariables:
    edgeLabelBackground: 'white'
---
graph LR
    %% 1. DEFINE COLOR SCHEME & STYLES
    classDef default fill:#fff,stroke:#333,stroke-width:2px,color:#333;
    classDef start_node fill:#e8f5e9,stroke:#43a047,stroke-width:2px,color:#333;
    classDef process_node fill:#e0f2f1,stroke:#00796b,stroke-width:2px,color:#333;
    classDef decision_node fill:#fff8e1,stroke:#ffa000,stroke-width:2px,color:#333;
    style Infrastructure fill:#fff,stroke:#ccc,stroke-width:5px;
    style Unsupervised fill:#fff,stroke:#ccc,stroke-width:5px;
    style Data fill:#fff,stroke:#ccc,stroke-width:5px;
    style Performance fill:#fff,stroke:#ccc,stroke-width:5px;
    style Interpretability fill:#fff,stroke:#ccc,stroke-width:5px;

    %% 2. DEFINE GRAPH STRUCTURE
    subgraph Infrastructure
        start((Start)) --> gpu_check["GPU available?"];
        gpu_check -- Yes --> local_version["Use TabPFN<br/>(local PyTorch)"];
        gpu_check -- No --> api_client["Use TabPFN-Client<br/>(cloud API)"];
        task_type["What is<br/>your task?"]
    end
    local_version --> task_type
    api_client --> task_type
    end_node((Workflow<br/>Complete));

    subgraph Unsupervised
        unsupervised_type["Select<br/>Unsupervised Task"];
        unsupervised_type --> imputation["Imputation"]
        unsupervised_type --> data_gen["Data<br/>Generation"];
        unsupervised_type --> tabebm["Data<br/>Augmentation"];
        unsupervised_type --> density["Outlier<br/>Detection"];
        unsupervised_type --> embedding["Get<br/>Embeddings"];
    end

    subgraph Data
        data_check["Data Checks"];
        model_choice["Samples > 50k or<br/>Classes > 10?"];
        data_check -- "Table Contains Text Data?" --> api_backend_note["Note: API client has<br/>native text support"];
        api_backend_note --> model_choice;
        data_check -- "Time-Series Data?" --> ts_features["Use Time-Series<br/>Features"];
        ts_features --> model_choice;
        data_check -- "Purely Tabular" --> model_choice;
        model_choice -- "No" --> finetune_check;
        model_choice -- "Yes, 50k-100k samples" --> ignore_limits["Set<br/>ignore_pretraining_limits=True"];
        model_choice -- "Yes, >100k samples" --> subsample["Large Datasets Guide<br/>"];
        model_choice -- "Yes, >10 classes" --> many_class["Many-Class<br/>Method"];
    end

    subgraph Performance
        finetune_check["Need<br/>Finetuning?"];
        performance_check["Need Even Better Performance?"];
        speed_check["Need faster inference<br/>at prediction time?"];
        kv_cache["Enable KV Cache<br/>(fit_mode='fit_with_cache')<br/><small>Faster predict; +Memory ~O(N×F)</small>"];
        tuning_complete["Tuning Complete"];
        finetune_check -- Yes --> finetuning["Finetuning"];
        finetune_check -- No --> performance_check;
        finetuning --> performance_check;
        performance_check -- No --> tuning_complete;
        performance_check -- Yes --> hpo["HPO"];
        performance_check -- Yes --> post_hoc["Post-Hoc<br/>Ensembling"];
        performance_check -- Yes --> more_estimators["More<br/>Estimators"];
        performance_check -- Yes --> speed_check;
        speed_check -- Yes --> kv_cache;
        speed_check -- No --> tuning_complete;
        hpo --> tuning_complete;
        post_hoc --> tuning_complete;
        more_estimators --> tuning_complete;
        kv_cache --> tuning_complete;
    end

    subgraph Interpretability
        tuning_complete --> interpretability_check;
        interpretability_check["Need<br/>Interpretability?"];
        interpretability_check --> feature_selection["Feature Selection"];
        interpretability_check --> partial_dependence["Partial Dependence Plots"];
        interpretability_check --> shapley["Explain with<br/>SHAP"];
        interpretability_check --> shap_iq["Explain with<br/>SHAP IQ"];
        interpretability_check -- No --> end_node;
        feature_selection --> end_node;
        partial_dependence --> end_node;
        shapley --> end_node;
        shap_iq --> end_node;
    end

    %% 3. LINK SUBGRAPHS AND PATHS
    task_type -- "Classification or
```