# NeuroCard
<p>
  <a href="http://arxiv.org/abs/2006.08109"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2006.08109-blue"></a>
  <a href="https://github.com/neurocard/neurocard/blob/master/LICENSE"><img alt="LICENSE" src="https://img.shields.io/github/license/neurocard/neurocard.svg?color=brightgreen"></a>
</p>

NeuroCard is a neural cardinality estimator for multi-table join queries.
<p align="center">
  <img src="assets/neurocard-concept.png" width="450"/>
</p>

NeuroCard's philosophy is to learn as much correlation as possible across tables, thereby achieving high accuracy.
Technical details can be found in the VLDB 2021 paper, *NeuroCard: One Cardinality Estimator for All Tables* ([bibtex](#citation)).
[Quick start](#quick-start) | [Main modules](#main-modules) | [Running experiments](#running-experiments) | [Contributors](#contributors) | [Citation](#citation)
## Quick start
Set up a conda environment with dependencies installed:
```bash
# On Ubuntu/Debian
sudo apt install build-essential

# Install Python environment
conda env create -f environment.yml
conda activate neurocard

# Run commands below inside this directory.
cd neurocard
```
Download the IMDB dataset as CSV files and place them under `datasets/job`:

```bash
# Download size 1.2GB.
bash scripts/download_imdb.sh

# If you already have the CSVs or can export from a
# database, simply link to an existing directory:
# ln -s <existing_dir_with_csvs> datasets/job

# Run the following if the existing CSVs are without headers:
# python scripts/prepend_imdb_headers.py
```
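As an optional sanity check (not part of the repo), you can confirm the CSVs landed with headers; `title` is one of the IMDB tables used by the JOB benchmark:

```python
import csv
from pathlib import Path

# Hypothetical sanity check: confirm a known IMDB table exists under
# datasets/job and that its first row is a header.
path = Path('datasets/job/title.csv')
with path.open(newline='') as f:
    header = next(csv.reader(f))
print(f'{path.name}: {len(header)} columns, header starts with {header[:3]}')
```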
Launch a short test run:
```bash
python run.py --run test-job-light
```
## Main modules
| Module | Description |
|---|---|
| `run` | Main script to train and evaluate |
| `experiments` | Registry of experiment configurations |
| `common` | Abstractions for columns, tables, and joined relations; column factorization |
| `factorized_sampler` | Unbiased join sampler |
| `estimators` | Cardinality estimators: probabilistic inference for density models; inference for column factorization |
| `datasets` | Registry of datasets and schemas |
| Models: `made`, `transformer` | Deep autoregressive models: ResMADE & Transformer |
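To illustrate the column factorization that `common` implements: a high-cardinality column can be split into several narrow subcolumns by writing each value in a smaller base, so the autoregressive model only ever sees small per-column domains. A minimal sketch under that assumption, not the repo's actual code:

```python
import numpy as np

def factorize_column(values: np.ndarray, word_bits: int = 8):
    """Split dictionary-encoded integer values into base-2**word_bits digits.

    Illustrative only: the real factorization in common.py differs in
    details (dictionary encoding, NULL handling, etc.).
    """
    base = 2 ** word_bits
    num_subcols = max(1, int(np.ceil(np.log(values.max() + 1) / np.log(base))))
    subcols = []
    for i in range(num_subcols):
        subcols.append((values // base**i) % base)  # i-th least-significant digit
    return subcols  # each subcolumn has domain size <= base

# A column with 1M distinct values becomes 3 subcolumns of domain <= 256.
ids = np.random.randint(0, 10**6, size=10)
print([s.tolist() for s in factorize_column(ids)])
```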
## Running experiments
Launch training and evaluation using a single script:
```bash
# 'name' is a config registered in experiments.py.
python run.py --run <name>
```
**Registered configs.** Hyperparameters are statically declared in `experiments.py`. New experiments (e.g., changing query files; running hyperparameter tuning) can be specified there, as in the sketch below.
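As an illustration only, a new experiment might be registered as a name-to-hyperparameters mapping. The names and schema here (`BASE_CONFIG`, `EXPERIMENT_CONFIGS`, `queries_csv`) are assumptions; consult `experiments.py` for the real structure:

```python
# Hypothetical sketch of registering a new config in experiments.py.
BASE_CONFIG = dict(
    dataset='imdb',
    epochs=10,
    bs=2048,   # batch size
    lr=5e-4,
)

EXPERIMENT_CONFIGS = {
    # Derive a new run from a base config, overriding a few knobs.
    'job-light-tuned': dict(
        BASE_CONFIG,
        queries_csv='queries/job-light.csv',  # hypothetical path
        lr=1e-3,
    ),
}
```

A config registered this way would then be launched with `python run.py --run job-light-tuned`.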
Configs for evaluation on pretrained checkpoints and full training runs:
| Benchmark | Config (reload pretrained ckpt) | Config (re-train) | Model | Num Params |
|------------------|---------------------------------|----------------------------------------------------------------|-------------|------------|
| JOB-light | job-light-reload | job-light | ResMADE | 1.0M |
| JOB-light-ranges | job-light-ranges-reload | job-light-ranges | ResMADE | 1.1M |
| | job-light-ranges-large-reload | job-light-ranges-large | Transformer | 5.4M |
| JOB-M | job-m-reload | job-m | ResMADE | 7.2M |
| | - | job-m-large (launch with --gpus=4 or lower the batch size) | Transformer | 107M |
The reload configs load pretrained checkpoints and run evaluation only. Normal configs start training afresh and also run evaluation.
**Metrics & Monitoring.** The key metrics to track are:

- Cardinality estimation accuracy (Q-errors): `fact_psample_<num_psamples>_<quantile>`.
- Quality of the density model: `train_bits` (negative log-likelihood in bits-per-tuple; lower is better).
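For reference, a query's Q-error is the symmetric ratio between the estimated and true cardinalities, and the `fact_psample_*` metrics report quantiles of these per-query errors over a workload. A minimal sketch of the computation, not the repo's code:

```python
import numpy as np

def q_error(est_card: float, true_card: float) -> float:
    """Symmetric relative error: max(est/true, true/est). 1.0 is perfect."""
    est, true = max(est_card, 1.0), max(true_card, 1.0)  # guard against zeros
    return max(est / true, true / est)

# Quantiles over a (toy) workload of (estimate, truth) pairs.
errs = np.array([q_error(e, t) for e, t in [(90, 100), (1200, 400), (5, 50)]])
for q in (0.5, 0.95, 0.99):
    print(f'p{int(q * 100)}: {np.quantile(errs, q):.2f}')
```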
The standard output prints these metrics and can be piped into a log file. If TensorBoard is installed, use the following to visualize:
```bash
python -m tensorboard.main --logdir ~/ray_results/
```
## Contributors
This repo was written by the authors of the NeuroCard paper (see Citation below).
## Citation
```bibtex
@article{neurocard,
  title={{NeuroCard}: One Cardinality Estimator for All Tables},
  author={Yang, Zongheng and Kamsetty, Amog and Luan, Sifei and Liang, Eric and Duan, Yan and Chen, Xi and Stoica, Ion},
  journal={Proceedings of the VLDB Endowment},
  volume={14},
  number={1},
  pages={61--73},
  year={2021},
  publisher={VLDB Endowment}
}
```
**Related projects.** NeuroCard builds on top of Naru and Variable Skipping.