
TALENT

A comprehensive toolkit and benchmark for tabular data learning, featuring 35+ deep methods, more than 10 classical methods, and 300 diverse tabular datasets.

Install / Use

/learn @LAMDA-Tabular/TALENT

README

<p align="center"> <img src="./resources/TALENT-LOGO.png" width="1000px"> </p> <p align="center"> <a href='https://arxiv.org/abs/2407.04057'><img src='https://img.shields.io/badge/Arxiv-2407.04057-b31b1b.svg?logo=arXiv'></a> <a href='https://zhuanlan.zhihu.com/p/708721145'><img src='https://img.shields.io/badge/中文解读-b0.svg?logo=zhihu'></a> <a href='https://huggingface.co/datasets/LAMDA-Tabular/TALENT'><img src='https://img.shields.io/badge/%F0%9F%A4%97-TALENT-green'></a> <a href=""><img src="https://img.shields.io/github/stars/qile2000/LAMDA-TALENT?color=4fb5ee"></a> <a href=""><img src="https://img.shields.io/github/last-commit/qile2000/LAMDA-TALENT?color=blue"></a> <br> <img src="https://img.shields.io/badge/PYTORCH-2.0.1-red?style=for-the-badge&logo=pytorch" alt="PyTorch - Version" height="21"> <img src="https://img.shields.io/badge/PYTHON-3.10-red?style=for-the-badge&logo=python&logoColor=white" alt="Python - Version" height="21"> <a href=""> <a href='https://lamda-talent.readthedocs.io/en/latest/?badge=latest'> <img src='https://readthedocs.org/projects/lamda-talent/badge/?version=latest' alt='Documentation Status' /> </a><img src="https://black.readthedocs.io/en/stable/_static/license.svg"></a> </p> <div align="center"> <p> TALENT: A Tabular Analytics and Learning Toolbox <p> <p> <a href="https://arxiv.org/abs/2407.04057">[Paper]</a> <a href="https://zhuanlan.zhihu.com/p/708721145">[中文解读]</a> <a href="https://lamda-talent.readthedocs.io/en/latest">[Docs]</a> <p> </div>

🎉 Introduction

Welcome to TALENT, a comprehensive machine learning toolbox and benchmark designed to enhance model performance on tabular data. TALENT integrates advanced deep learning models, classical algorithms, and efficient hyperparameter tuning, offering robust preprocessing capabilities to optimize learning from tabular datasets. The toolbox is user-friendly and adaptable, catering to both novice and expert data scientists.

TALENT offers the following advantages:

  • Diverse Methods: Includes various classical methods, tree-based methods, and the latest popular deep learning methods.
  • Extensive Dataset Collection: Equipped with 300 datasets, covering a wide range of task types, size distributions, and dataset domains.
  • Customizability: Easily allows the addition of datasets and methods.
  • Versatile Support: Supports diverse normalization, encoding, and metrics.

📚 Citing TALENT

If you use any content of this repo for your work, please cite the following BibTeX entries:

@article{ye2024closerlookdeeplearning,
  title   = {A Closer Look at Deep Learning on Tabular Data},
  author  = {Han-Jia Ye and Si-Yang Liu and Hao-Run Cai and Qi-Le Zhou and De-Chuan Zhan},
  journal = {arXiv preprint arXiv:2407.00956},
  year    = {2024}
}

@article{JMLR:v26:25-0512,
  author  = {Si-Yang Liu and Hao-Run Cai and Qi-Le Zhou and Huai-Hong Yin and Tao Zhou and Jun-Peng Jiang and Han-Jia Ye},
  title   = {Talent: A Tabular Analytics and Learning Toolbox},
  journal = {Journal of Machine Learning Research},
  year    = {2025},
  volume  = {26},
  number  = {226},
  pages   = {1--16},
  url     = {http://jmlr.org/papers/v26/25-0512.html}
}

📰 What's New

  • [2026-03]🌟 We have updated the TALENT-extension datasets and results. Link
  • [2025-11]🌟 Add RFM (Science).
  • [2025-11]🌟 Add Real-TabPFN.
  • [2025-11]🌟 Add LimiX.
  • [2025-09]🌟 Add xRFM.
  • [2025-08]🌟 Add Mitra.
  • [2025-06]🌟 Add TabAutoPNPNet (Electronics 2025).
  • [2025-06]🌟 Add TabICL (ICML 2025). The current code is based on TabICL v0.1.2.
  • [2025-05]🌟 Check out our three papers MMTU, Tabular-Temporal-Shift, and BETA accepted at ICML 2025!
  • [2025-04]🌟 Check out our new survey Representation Learning for Tabular Data: A Comprehensive Survey (Repo). We organize existing methods into three main categories according to their generalization capabilities: specialized, transferable, and general models, which provides a comprehensive taxonomy for deep tabular representation methods.🚀🚀🚀
  • [2025-02]🌟 Add T2Gformer (AAAI 2023).
  • [2025-02]🌟 Add TabPFN v2 (Nature).
  • [2025-02]🌟 Thanks to Hengzhe Zhang for providing a Scikit-Learn compatible wrapper for TALENT!
  • [2025-01]🌟 Check out our new baseline ModernNCA (ICLR 2025), inspired by traditional Neighbor Component Analysis, which outperforms both tree-based and other deep tabular models, while also reducing training time and model size!🚀🚀🚀
  • [2025-01]🌟 Check out our latest version of the benchmark paper for updated and expanded results and analysis!
  • [2025-01]🌟 We have curated and released new benchmark datasets, along with updated results across a broader range of methods. This update improves dataset quality by removing duplicates and correcting tasks where binary classification had mistakenly been treated as regression. We have also separated out the larger datasets, forming the basic benchmark (300 datasets: 120 binary classification, 80 multi-class, and 100 regression) and the large benchmark (22 datasets).
  • [2024-12]🌟 Add TabM (ICLR 2025).
  • [2024-09]🌟 Add Trompt (ICML 2023).
  • [2024-09]🌟 Add AMFormer (AAAI 2024).
  • [2024-08]🌟 Add GRANDE (ICLR 2024).
  • [2024-08]🌟 Add Excelformer (KDD 2024).
  • [2024-08]🌟 Add MLP_PLR (NeurIPS 2022).
  • [2024-07]🌟 Add RealMLP (NeurIPS 2024).
  • [2024-07]🌟 Add ProtoGate (ICML 2024).
  • [2024-07]🌟 Add BiSHop (ICML 2024).
  • [2024-06]🌟 Check out our new baseline ModernNCA, inspired by traditional Neighbor Component Analysis, which outperforms both tree-based and other deep tabular models, while also reducing training time and model size!
  • [2024-06]🌟 Check out our benchmark paper about tabular data, which provides comprehensive evaluations of classical and deep tabular methods based on our toolbox in a fair manner!

🌟 Methods

TALENT integrates an extensive array of 30+ deep learning architectures for tabular data, including but not limited to:

  1. MLP: A multi-layer neural network, which is implemented according to RTDL.
  2. ResNet: A DNN that uses skip connections across many layers, which is implemented according to RTDL.
  3. SNN: An MLP-like architecture utilizing the SELU activation, which facilitates the training of deeper neural networks.
  4. DANets: A neural network designed to enhance tabular data processing by grouping correlated features and reducing computational complexity.
  5. TabCaps: A capsule network that encapsulates all feature values of a record into vectorial features.
  6. DCNv2: Consists of an MLP-like module combined with a feature crossing module, which includes both linear layers and multiplications.
  7. NODE: A tree-mimic method that generalizes oblivious decision trees, combining gradient-based optimization with hierarchical representation learning.
  8. GrowNet: A gradient boosting framework that uses shallow neural networks as weak learners.
  9. TabNet: A tree-mimic method using sequential attention for feature selection, offering interpretability and self-supervised learning capabilities.
  10. TabR: A deep learning model that integrates a KNN component to enhance tabular data predictions through an efficient attention-like mechanism.
  11. ModernNCA: A deep tabular model inspired by traditional Neighbor Component Analysis, which makes predictions based on the relationships with neighbors in a learned embedding space.
  12. DNNR: Enhances KNN by using local gradients and Taylor approximations for more accurate and interpretable predictions.
  13. AutoInt: A token-based method that uses a multi-head self-attentive neural network to automatically learn high-order feature interactions.
  14. Saint: A token-based method that leverages row and column attention mechanisms for tabular data.
  15. TabTransformer: A token-based method that uses Transformer layers to learn contextual embeddings of categorical features.
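For intuition on the simplest entry above, the following is a minimal PyTorch MLP for tabular inputs. It is an illustrative toy assuming already-preprocessed numerical features, not TALENT's RTDL-based implementation; the class name and all dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class TabularMLP(nn.Module):
    """A plain multi-layer perceptron for preprocessed tabular features."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int,
                 n_layers: int = 2, dropout: float = 0.1):
        super().__init__()
        layers = []
        d = d_in
        for _ in range(n_layers):
            # Each hidden block: linear projection, nonlinearity, dropout.
            layers += [nn.Linear(d, d_hidden), nn.ReLU(), nn.Dropout(dropout)]
            d = d_hidden
        layers.append(nn.Linear(d, d_out))  # prediction head (logits or regression output)
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TabularMLP(d_in=10, d_hidden=64, d_out=2)
logits = model(torch.randn(32, 10))  # batch of 32 rows, 10 features each
print(logits.shape)  # torch.Size([32, 2])
```

Most of the deeper architectures in the list (ResNet, token-based Transformers, neighbor-based models) can be seen as replacing or augmenting the hidden blocks of such an MLP.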