MultiTab
[AAAI 2026] MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data
Environment Setup
Follow these steps to set up the conda environment for the MultiTab project:
Prerequisites
- Anaconda or Miniconda installed on your system
- NVIDIA GPU with CUDA support (recommended for training)
Installation Steps
- Create the conda environment:
    conda env create -f environment.yml
- Activate the environment:
    conda activate multitab
- Verify the installation:
    python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
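If the one-line check fails with an import error, a slightly longer script can show whether PyTorch is missing from the active environment or only CUDA is unavailable. The following is a minimal sketch, not part of the repository; the function name `check_environment` is hypothetical, and it assumes only that `torch` is the package of interest.

```python
import importlib.util
import sys


def check_environment(packages=("torch",)):
    """Report the Python version and whether each top-level package is installed."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    if report.get("torch"):
        # Import torch only once we know it is installed
        import torch
        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated `multitab` environment. A `torch: False` line suggests the environment was not created or activated correctly, while `cuda_available: False` points to a driver or CUDA build mismatch rather than a missing package.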
Dataset Setup Instructions
This project supports three datasets for multitask learning experiments:
- Higgs Dataset: High-energy physics dataset for binary classification with additional regression targets
- ACS Income Dataset: American Community Survey data for income prediction and demographic analysis
- AliExpress Dataset: E-commerce dataset for click-through rate and conversion prediction
Quick Setup (Recommended)
The easiest way to set up all datasets is to use our preprocessed H5 files available on Hugging Face:
- Configure the data root directory: Edit download_data.sh and set your desired data root:
    DATA_ROOT="/path/to/your/data/"
- Make the script executable:
    chmod +x download_data.sh
- Run the download script:
    ./download_data.sh
This will automatically download all three preprocessed datasets (Higgs, AliExpress, and ACS Income) in H5 format and organize them in the correct directory structure.
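After the download finishes, it can be useful to confirm that the H5 files actually landed under the data root. The sketch below uses only the standard library and makes two assumptions that the README does not state: that the files end in `.h5` or `.hdf5`, and that they may sit in subdirectories. The helper name `find_h5_files` is hypothetical.

```python
from pathlib import Path


def find_h5_files(data_root):
    """Return (relative path, size in MB) for every H5 file under data_root."""
    root = Path(data_root)
    if not root.is_dir():
        raise FileNotFoundError(f"Data root does not exist: {root}")
    found = []
    # rglob walks all subdirectories, so nested dataset folders are included
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in {".h5", ".hdf5"}:
            size_mb = path.stat().st_size / 1e6
            found.append((path.relative_to(root).as_posix(), size_mb))
    return found
```

Calling `find_h5_files` with the same path you set as `DATA_ROOT` should list one or more files per dataset. An empty list suggests the download did not complete or wrote to a different directory.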
Manual Dataset Setup (Optional)
If you prefer to download the datasets from their original sources and perform the preprocessing yourself, please refer to the manual dataset setup instructions.
Running Experiments
Once you have set up your datasets, you can run experiments using the provided training script:
- Make the script executable:
    chmod +x run.sh
- Configure the experiment parameters: Edit the variables at the top of run.sh:
    DATA_ROOT="/path/to/data/"   # Path to your processed datasets
    MODEL_NAME="mtt"             # Model to use (mtt, mmoe, ple, etc.)
    DATASET="acs_income"         # Dataset name (acs_income, higgs, etc.)
    GPU_ID=0                     # GPU ID for training
    SEED=42                      # Random seed for reproducibility
    PATIENCE=5                   # Early stopping patience
- Run the experiment:
    ./run.sh
The script will automatically start training with the specified configuration and save results to the logs directory.
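For readers unfamiliar with the early stopping patience setting, the sketch below illustrates the general idea: training stops once the monitored value has failed to improve for a given number of consecutive epochs. This is a generic illustration, not the repository's implementation. The class name `EarlyStopping`, the `min_delta` parameter, and the choice to monitor a validation loss (lower is better) are all assumptions.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=3)
    for epoch, loss in enumerate([0.9, 0.8, 0.85, 0.84, 0.83]):
        if stopper.step(loss):
            print(f"Stopping at epoch {epoch}, best loss {stopper.best}")
            break
```

Under this scheme a larger patience value tolerates longer plateaus at the cost of more training time, and a smaller one stops sooner but may end training before a late improvement.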
📚 Citation
If you use this code or find our work helpful, please cite:
@inproceedings{sinodinos2026multitab,
title={MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data},
author={Sinodinos, Dimitrios and Wei, Jack Yi and Armanfard, Narges},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={40},
number={30},
pages={25499--25507},
year={2026}
}
