
Tablepedia

Author: Wenhao Yu (wyu1@nd.edu). WWW'20. Tabular data extraction. Data science experimental evidence (dataset).


Tablepedia: Code and dataset for WWW 2020 paper

Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning <br> Authors: Wenhao Yu (ND), Wei Peng (ZJU), Yu Shu (SCU), Qingkai Zeng (ND), Meng Jiang (ND)

This paper proposes a novel system that extracts experimental evidence from data science literature in PDF format and builds the first experimental database for related research.

Workflow and Example DB constructed by Tablepedia

<img src="img/workflow.png" width="400" align=center> <img src="img/example.png" width="340" align=center>

  • The left figure shows the workflow of the Tablepedia system: (1) PDF collection, (2) table extraction, (3) experimental evidence database construction, (4) database operations and visualization.

  • The right figure is an example DB constructed by Tablepedia from data science paper PDFs. For a dataset and an evaluation metric, one can use the database to check what the state-of-the-art (highlighted in yellow) is and whether the reported numbers in existing research are consistent (green box) or conflicting (red box).
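The kind of lookup described for the example DB can be sketched in a few lines of Python. This is only an illustration: the record layout `(paper, method, dataset, metric, value)`, the sample rows, the `summarize` function, and the assumption that a higher metric value is better are all hypothetical and are not taken from the Tablepedia code or database.

```python
from collections import defaultdict

# Hypothetical records: (paper, method, dataset, metric, value).
# Illustrative rows only; they are not from the Tablepedia database.
records = [
    ("paper_A", "method_X", "dataset_1", "accuracy", 0.81),
    ("paper_B", "method_X", "dataset_1", "accuracy", 0.81),
    ("paper_B", "method_Y", "dataset_1", "accuracy", 0.85),
    ("paper_C", "method_Y", "dataset_1", "accuracy", 0.79),
]


def summarize(records, dataset, metric, tol=1e-9):
    """Return the best reported result and a consistency label per method.

    Assumes that a higher metric value is better.
    """
    by_method = defaultdict(list)
    for paper, method, ds, m, value in records:
        if ds == dataset and m == metric:
            by_method[method].append((paper, value))

    if not by_method:
        return None, {}

    # Best single reported number across all methods and papers.
    best = max(
        ((method, value, paper)
         for method, rows in by_method.items()
         for paper, value in rows),
        key=lambda item: item[1],
    )

    # A method is "conflicting" when papers report different numbers for it.
    status = {}
    for method, rows in by_method.items():
        values = [value for _, value in rows]
        if len(values) < 2:
            status[method] = "single report"
        elif max(values) - min(values) <= tol:
            status[method] = "consistent"
        else:
            status[method] = "conflicting"
    return best, status


best, status = summarize(records, "dataset_1", "accuracy")
```

Here `best` holds the highest reported number with its method and source paper, and `status` marks each method as consistent or conflicting across papers, mirroring the yellow, green, and red annotations in the figure.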

Code Usage

1. Load data

python data/load_data.py

2. Load annotation

python anno/load_anno.py

3. Extract tables from your own PDF files

python tabula/tabula-java.py
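The script name suggests a wrapper around the tabula-java command-line tool, although the README does not say how it is invoked. As a rough sketch under that assumption, the helper below builds a tabula-java command for one PDF. The function name `build_tabula_command`, the default jar path `tabula.jar`, and the file names are hypothetical; only the `--pages`, `--format`, and `--outfile` options come from the tabula-java CLI.

```python
def build_tabula_command(pdf_path, out_path, jar_path="tabula.jar", pages="all"):
    """Build a tabula-java command that writes the tables of one PDF as CSV.

    jar_path is a hypothetical location; point it at your tabula-java jar.
    """
    return [
        "java", "-jar", str(jar_path),
        "--pages", str(pages),       # page list or ranges, or "all"
        "--format", "CSV",           # tabula-java also supports TSV and JSON
        "--outfile", str(out_path),  # where the extracted tables are written
        str(pdf_path),
    ]


cmd = build_tabula_command("paper.pdf", "paper.csv")
```

With Java and the jar available, the resulting list can be passed to `subprocess.run(cmd, check=True)`.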

Reference

@inproceedings{yu2020experimental,
  title={Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning},
  author={Yu, Wenhao and Peng, Wei and Shu, Yu and Zeng, Qingkai and Jiang, Meng},
  booktitle={Proceedings of The Web Conference 2020},
  pages={951--961},
  year={2020}
}

Contact

<img src="img/ack.png" width="20" align=center> Please contact Wenhao Yu (wyu1@nd.edu) if you have any questions.
