Tablepedia
Author: Wenhao Yu (wyu1@nd.edu). WWW'20. Tabular data extraction. Data science experimental evidence (dataset).
Install / Use
/learn @DM2-ND/TablepediaREADME
Tablepedia: Code and dataset for WWW 2020 paper
Experimental Evidence Extraction in Data Science with Hybrid Table Features and Ensemble Learning <br> Authors: Wenhao Yu (ND), Wei Peng (ZJU), Yu Shu (SCU), Qingkai Zeng (ND), Meng Jiang (ND)
This paper propose a novel system that extracts experimental evidences from data science literature in PDF format and builds up the first experimental database for related research.
Workflow and Example DB constructed by Tablepedia
<img src="img/workflow.png" width="400" align=center> <img src="img/example.png" width="340" align=center>
-
The left figure is the workflow of Tablepedia system: (1) PDF collection, (2)table extraction, (3) experimental evidence database construction, (4)database operations and visualization.
-
The right figure is an example DB constructed by Tablepedia from data science paper PDFs. For a dataset and an evaluation metric, one can use the database to check what the state-of-the-art (highlighted in yellow) is and whether the reported numbers in existing research are consistent (green box) or conflicting (red box).
Code Usage
1. Load data
python data/load_data.py
2. Load annotation
python anno/load_anno.py
2. Extract your own PDF files
python tabula/tabula-java.py
Reference
@inproceedings{yu2020experimental,
title={Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning},
author={Yu, Wenhao and Peng, Wei and Shu, Yu and Zeng, Qingkai and Jiang, Meng},
booktitle={Proceedings of The Web Conference 2020},
pages={951--961},
year={2020}
}
Contact
<img src="img/ack.png" width="20" align=center> Please contact Wenhao Yu (wyu1@nd.edu) if you have any questions.
