TabCSDI
Code for the NeurIPS 2022 Table Representation Learning Workshop paper "Diffusion models for missing value imputation in tabular data".
TabCSDI: Diffusion models for missing value imputation in tabular data
This is the repository for the workshop paper "Diffusion models for missing value imputation in tabular data" (OpenReview).
Setup
pip install -r requirements.txt
Running experiments
We provide three datasets: Breast (original), Breast (diagnostic), and Census. For the Census dataset, three methods for handling categorical variables are provided.
Run experiments on the purely numerical datasets:
- Breast (original) dataset
python exe_breast.py
- Breast (diagnostic) dataset
python exe_breastD.py
Run mixed-data-type experiments with the Census dataset:
- Using feature tokenization for categorical variables
python exe_census_ft.py
- Using analog bits encoding for categorical variables
python exe_census_analog.py
- Using one-hot encoding for categorical variables
python exe_census_onehot.py
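The three Census scripts differ in how a categorical column is turned into real-valued inputs for the model. The following is a minimal, self-contained sketch of two of the named encodings, one-hot and analog bits, assuming a column whose categories have already been mapped to integer indices 0..K-1. All function names are hypothetical and this is not the repository's implementation; feature tokenization is left out because it relies on learned embeddings.

```python
from typing import List, Sequence


def one_hot_encode(index: int, num_categories: int) -> List[float]:
    """Map a category index to a one-hot vector of length num_categories."""
    if not 0 <= index < num_categories:
        raise ValueError("category index out of range")
    vec = [0.0] * num_categories
    vec[index] = 1.0
    return vec


def one_hot_decode(vec: Sequence[float]) -> int:
    """Recover a category index from a (possibly noisy) vector via argmax."""
    return max(range(len(vec)), key=lambda i: vec[i])


def num_analog_bits(num_categories: int) -> int:
    """Bits needed to represent indices 0..num_categories-1 (at least 1)."""
    return max(1, (num_categories - 1).bit_length())


def analog_bits_encode(index: int, num_categories: int) -> List[float]:
    """Write the index in binary (most significant bit first) and map bit 0 -> -1.0, bit 1 -> +1.0."""
    if not 0 <= index < num_categories:
        raise ValueError("category index out of range")
    n_bits = num_analog_bits(num_categories)
    bits = [(index >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
    return [2.0 * b - 1.0 for b in bits]


def analog_bits_decode(values: Sequence[float], num_categories: int) -> int:
    """Threshold each real value at 0 to get a bit, then read the bits back as an integer."""
    index = 0
    for v in values:
        index = (index << 1) | (1 if v > 0 else 0)
    # Assumption: a noisy output may decode past the last category; clamp it.
    return min(index, num_categories - 1)


if __name__ == "__main__":
    # A hypothetical categorical column with 5 categories.
    print(one_hot_encode(3, 5))                      # [0.0, 0.0, 0.0, 1.0, 0.0]
    print(analog_bits_encode(3, 5))                  # 3 = 0b011 -> [-1.0, 1.0, 1.0]
    print(analog_bits_decode([-0.2, 0.7, 1.3], 5))   # 3
```

With K categories, analog bits use ceil(log2 K) real values per column instead of K for one-hot, which keeps the input narrower for high-cardinality columns. Clamping out-of-range decodes to the last category is just one simple choice made for this sketch.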
Acknowledgements
This code is built upon the CSDI repository.
Reference
If you find our code useful or use it in your work, please cite the following paper:
@inproceedings{zheng2022diffusion,
title={Diffusion models for missing value imputation in tabular data},
author={Zheng, Shuhan and Charoenphakdee, Nontawat},
booktitle={NeurIPS Table Representation Learning (TRL) Workshop},
year={2022}
}
