MultiTab

[AAAI 2026] MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data

Environment Setup

Follow these steps to set up the conda environment for the MultiTab project:

Prerequisites

Anaconda or Miniconda installed on your system
NVIDIA GPU with CUDA support (recommended for training)

Installation Steps

Create the conda environment:
```
conda env create -f environment.yml
```
Activate the environment:
```
conda activate multitab
```

Verify the installation:

```
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
```
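The one-liner above prints only the PyTorch version and a CUDA flag. A slightly more detailed check can also list the visible GPUs, which helps when choosing GPU_ID later (presumably a CUDA device index). This is a minimal sketch, not part of the repository: the function name check_environment is hypothetical, and the script skips the PyTorch checks instead of crashing if PyTorch is not importable in the active environment.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Report the Python version, the PyTorch version, CUDA status and visible GPUs."""
    report = {
        "python": sys.version.split()[0],
        "torch": None,
        "cuda_available": False,
        "gpus": [],
    }
    # Only import torch if it is installed, so the check itself never fails.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            # One entry per visible CUDA device, in device-index order.
            report["gpus"] = [
                torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
            ]
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If "torch" is reported as None, the multitab environment is probably not active.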

Dataset Setup Instructions

This project supports three datasets for multitask learning experiments:

Higgs Dataset: High-energy physics dataset for binary classification with additional regression targets
ACS Income Dataset: American Community Survey data for income prediction and demographic analysis
AliExpress Dataset: E-commerce dataset for click-through rate and conversion prediction
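Each dataset pairs several prediction targets, so training optimizes more than one loss at once. A common baseline, shown here purely as an illustration, is a weighted sum of per-task losses: binary cross-entropy for classification targets and mean squared error for regression targets. This is a minimal pure-Python sketch under that assumption. It is not necessarily the objective MultiTab uses, and the task names (signal, aux_target) and the equal default weights are hypothetical.

```python
import math
from typing import Dict, List, Optional


def mean_bce(probs: List[float], labels: List[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def mean_squared_error(preds: List[float], targets: List[float]) -> float:
    """Mean squared error for a regression task."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def multitask_loss(
    preds: Dict[str, List[float]],
    targets: Dict[str, List[float]],
    task_types: Dict[str, str],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-task losses (each task weighted 1.0 by default)."""
    total = 0.0
    for task, kind in task_types.items():
        w = 1.0 if weights is None else weights.get(task, 1.0)
        if kind == "binary":
            loss = mean_bce(preds[task], targets[task])
        elif kind == "regression":
            loss = mean_squared_error(preds[task], targets[task])
        else:
            raise ValueError(f"unknown task type {kind!r} for task {task!r}")
        total += w * loss
    return total


# Hypothetical Higgs-style setup: one binary label plus one regression target.
task_types = {"signal": "binary", "aux_target": "regression"}
preds = {"signal": [0.5, 0.5], "aux_target": [1.0, 2.0]}
targets = {"signal": [1, 0], "aux_target": [3.0, 2.0]}
print(multitask_loss(preds, targets, task_types))  # log(2) + 2.0, about 2.693
```

Fixed weights are the simplest choice; in practice the relative scale of each loss matters, since a regression target with large values can dominate the sum.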

Quick Setup (Recommended)

The easiest way to set up all datasets is to use our preprocessed H5 files available on Hugging Face:

Configure the data root directory: Edit download_data.sh and set your desired data root:
```
DATA_ROOT="/path/to/your/data/"
```
Make the script executable:
```
chmod +x download_data.sh
```
Run the download script:
```
./download_data.sh
```
This will automatically download all three preprocessed datasets (Higgs, AliExpress, and ACS Income) in H5 format and organize them in the correct directory structure.
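After the download finishes, it can be useful to confirm which H5 files actually landed under DATA_ROOT. The sketch below walks the directory recursively and groups files by folder. It assumes the files use the .h5 extension and makes no assumption about the exact folder names that download_data.sh creates; find_h5_files is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Dict, List


def find_h5_files(data_root: str) -> Dict[str, List[str]]:
    """Map each folder under data_root (relative path) to the .h5 files it contains."""
    root = Path(data_root)
    if not root.is_dir():
        raise FileNotFoundError(f"DATA_ROOT does not exist or is not a directory: {root}")
    layout: Dict[str, List[str]] = {}
    # rglob searches recursively; sorting keeps the report stable across runs.
    for path in sorted(root.rglob("*.h5")):
        folder = path.parent.relative_to(root).as_posix()
        layout.setdefault(folder, []).append(path.name)
    return layout
```

For example, calling find_h5_files("/path/to/your/data/") with the value set in download_data.sh should report entries for all three datasets if the download completed; an empty result suggests the files were written elsewhere or the download failed.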

Manual Dataset Setup (Optional)

If you prefer to download the datasets from their original sources and perform the preprocessing yourself, please refer to the manual dataset setup instructions.

Running Experiments

Once you have set up your datasets, you can run experiments using the provided training script:

Make the script executable:
```
chmod +x run.sh
```

Configure the experiment parameters: Edit the variables at the top of run.sh:

```
DATA_ROOT="/path/to/data/"  # Path to your processed datasets
MODEL_NAME="mtt"            # Model to use (mtt, mmoe, ple, etc.)
DATASET="acs_income"        # Dataset name (acs_income, higgs, etc.)
GPU_ID=0                    # GPU ID for training
SEED=42                     # Random seed for reproducibility
PATIENCE=5                  # Early stopping patience
```
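PATIENCE controls early stopping: training halts once the validation metric has gone a fixed number of evaluations without improving. The class below is a minimal sketch of that rule, not the repository's implementation. It assumes a lower-is-better metric such as validation loss and one check per epoch; EarlyStopping and min_delta are hypothetical names, and if the training script monitors a higher-is-better metric (for example AUC) the comparison would need to be flipped.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive checks without improvement.

    Assumes a lower-is-better metric such as validation loss.
    """

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as an improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, metric: float) -> bool:
        """Record one validation result; return True when training should stop."""
        if metric < self.best - self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# With PATIENCE=5 the best loss (0.80) is reached at epoch 1, and training stops
# five non-improving epochs later, before the 0.79 at epoch 7 is ever seen.
stopper = EarlyStopping(patience=5)
val_losses = [1.00, 0.80, 0.90, 0.85, 0.81, 0.82, 0.83, 0.79]
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}, best loss {stopper.best}")  # prints: stopping after epoch 6, best loss 0.8
        break
```

A larger patience tolerates noisier validation curves at the cost of longer runs.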

Run the experiment:
```
./run.sh
```

The script will automatically start training with the specified configuration and save results to the logs directory.

📚 Citation

If you use this code or find our work helpful, please cite:

```
@inproceedings{sinodinos2026multitab,
  title={MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data},
  author={Sinodinos, Dimitrios and Wei, Jack Yi and Armanfard, Narges},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={30},
  pages={25499--25507},
  year={2026}
}
```
