WikiTabGen
A benchmark dataset for LLM-based generation of tabular data
Generating Tables from the Parametric Knowledge of Language Models
This repository contains WikiTabGen, a benchmark for evaluating LLM capabilities in on-demand table generation.
The benchmark includes 100 tables curated and processed from the WikiTables Project. The tables span a diverse range of properties: length (number of rows), width (number of columns), amount of numerical data, and popularity.
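The size and numeric-content properties above can be measured directly from a table's CSV. The sketch below shows one plausible way to do this. It is not necessarily how the benchmark's properties were computed. It makes these assumptions:

- The first CSV row is a header.
- A non-empty cell counts as numerical if it parses as a float after thousands separators and a trailing percent sign are stripped.

Popularity cannot be derived from the table contents, so the sketch leaves it out. `table_profile` and `is_number` are hypothetical helpers.

```python
import csv
import io


def is_number(text: str) -> bool:
    """Assumed rule: a cell is numerical if it parses as a float after removing ',' and a trailing '%'."""
    cleaned = text.strip().replace(",", "").rstrip("%")
    try:
        float(cleaned)
        return True
    except ValueError:
        return False


def table_profile(csv_text: str) -> dict:
    """Length (data rows), width (header columns) and share of numerical non-empty cells."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    cells = [cell for row in body for cell in row if cell.strip()]
    numeric = sum(is_number(cell) for cell in cells)
    return {
        "length": len(body),
        "width": len(header),
        "numeric_share": numeric / len(cells) if cells else 0.0,
    }


# Toy table for illustration only.
example = "Country,Capital,Population (M)\nFrance,Paris,68.2\nJapan,Tokyo,124.5\n"
print(table_profile(example))
```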
LeaderBoard
This is our current leaderboard. It ranks each LLM and prompting method by Overall F1. It also reports the F1 score for generating the correct data in key columns and in non-key columns:
| Rank | LLM | Method | Keys F1 | Non-Keys F1 | Overall F1 |
|------|--------------|--------------|---------|-------------|------------|
| 1 | GPT-4o | Row-by-row | 53.5% | 13.8% | 20.8% |
| 2 | LLama3.1-70B | Full-Table | 49.9% | 13.1% | 20.0% |
| 3 | GPT-4 | Row-by-row | 53.7% | 12.2% | 19.6% |
| 4 | LLama3.1-70B | Row-by-row | 50.0% | 12.2% | 19.0% |
| 5 | GPT-4 | Cell-by-cell | 53.7% | 11.1% | 18.6% |
| 6 | GPT-4 | Full-Table | 43.8% | 11.5% | 17.5% |
| 7 | GPT-4o | Full-Table | 40.3% | 10.5% | 16.3% |
| 8 | GPT-3.5 | Full-Table | 46.4% | 9.6% | 16.1% |
| 9 | GPT-3.5 | Cell-by-cell | 49.4% | 7.6% | 14.6% |
| 10 | GPT-3.5 | Row-by-row | 49.4% | 7.2% | 14.3% |
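The leaderboard reports F1 scores. As a rough illustration of what F1 measures here, the sketch below compares generated and reference values as sets:

- Precision is the fraction of generated values found in the reference.
- Recall is the fraction of reference values that were generated.
- F1 is their harmonic mean.

Two parts of the sketch are assumptions for illustration: the matching rule (case-insensitive, whitespace-normalized exact match) and the encoding of non-key cells as (key, column, value) triples. The actual rules are implemented in example_notebooks/Metrics_calculation.ipynb and may differ, for example in how numbers are compared.

```python
from typing import Iterable


def normalize(value):
    """Assumed matching rule: lowercase and collapse whitespace, element-wise for tuples."""
    if isinstance(value, tuple):
        return tuple(normalize(v) for v in value)
    return " ".join(str(value).lower().split())


def f1_score(predicted: Iterable, reference: Iterable) -> float:
    """Set-based F1 between generated values and reference values."""
    pred = {normalize(v) for v in predicted}
    ref = {normalize(v) for v in reference}
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0  # also covers empty predictions or an empty reference
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Key columns: compare the sets of key values (toy example).
reference_keys = ["France", "Japan", "Brazil", "Kenya"]
generated_keys = ["france", "Japan ", "Chile"]
print(round(f1_score(generated_keys, reference_keys), 3))

# Non-key cells: compare (key, column, value) triples the same way.
reference_cells = [("France", "Capital", "Paris"), ("Japan", "Capital", "Tokyo")]
generated_cells = [("France", "Capital", "Paris"), ("Japan", "Capital", "Kyoto")]
print(f1_score(generated_cells, reference_cells))
```

In the key example, 2 of 3 generated keys are correct (precision 2/3) and 2 of 4 reference keys are recovered (recall 1/2), so F1 is 4/7.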
Usage
Example notebooks for GPT-3.5 covering all three prompting methods (full table, row-by-row, and cell-by-cell) are available in the example_notebooks folder. Before running them, set your openai.api_key in the Imports section. On successful execution, the notebooks create a results folder. It contains a tables subfolder with the generated tables in CSV format, and a result.json file that logs the prompts and LLM responses.
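The three prompting methods differ mainly in how many LLM requests they make per table. The sketch below illustrates that difference under these assumptions:

- `ask` is a hypothetical stand-in for a single LLM call, for example a wrapper around the OpenAI chat API.
- The prompt wording is invented for illustration.
- The first column is the key column.
- Row-by-row and cell-by-cell both start by asking for the list of keys.

The notebooks' actual prompts and response parsing may differ.

```python
from typing import Callable, List

Ask = Callable[[str], str]  # hypothetical: sends one prompt to an LLM, returns its text reply


def full_table(ask: Ask, topic: str, columns: List[str]) -> str:
    # One request: the model writes the entire table at once.
    return ask(f"List all {topic} as a CSV table with columns: {', '.join(columns)}.")


def list_keys(ask: Ask, topic: str, key_column: str) -> List[str]:
    # Shared first step: ask only for the key column, one value per line.
    reply = ask(f"List the {key_column} of all {topic}, one per line.")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def row_by_row(ask: Ask, topic: str, columns: List[str]) -> List[str]:
    # One request for the keys, then one request per key for the rest of its row.
    return [
        ask(f"For the {topic} entry with {columns[0]} = {key}, "
            f"give {', '.join(columns[1:])} as one CSV line.")
        for key in list_keys(ask, topic, columns[0])
    ]


def cell_by_cell(ask: Ask, topic: str, columns: List[str]) -> List[List[str]]:
    # One request for the keys, then one request per (key, non-key column) cell.
    return [
        [key] + [ask(f"In the table of {topic}, what is the {col} of {key}?")
                 for col in columns[1:]]
        for key in list_keys(ask, topic, columns[0])
    ]


# Usage with a fake model that records prompts instead of calling an API.
prompts: List[str] = []


def fake_ask(prompt: str) -> str:
    prompts.append(prompt)
    return "France\nJapan" if "one per line" in prompt else "..."


columns = ["Country", "Capital", "Population"]
for method in (full_table, row_by_row, cell_by_cell):
    prompts.clear()
    method(fake_ask, "countries", columns)
    print(method.__name__, len(prompts))
```

With two keys and three columns, the request counts are:

- full_table makes 1 request.
- row_by_row makes 1 + 2 = 3 requests.
- cell_by_cell makes 1 + 2 × 2 = 5 requests.

The finer-grained methods trade more API calls for smaller, more focused prompts.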
Evaluation
To compute the evaluation metrics for your experiment, run the notebook example_notebooks/Metrics_calculation.ipynb. Set tables_folder to the path of the LLM-generated CSV files and result_folder to the folder where the metrics report should be saved. The notebook calculates the metrics and saves the report as a CSV file in result_folder.
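The folder plumbing that the metrics notebook relies on can be pictured as follows. This is a simplified sketch, not the notebook's code. It makes these assumptions:

- Each generated CSV in tables_folder has a same-named ground-truth CSV in a hypothetical reference_folder.
- The first column is the key column.
- Only key F1 is scored, using exact, case-insensitive matching.
- The report is written to a file named metrics_report.csv (also an assumption).

```python
import csv
from pathlib import Path


def read_keys(path: Path) -> set:
    # Assumption: the first row is a header and the first column holds the keys.
    with path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    return {row[0].strip().lower() for row in rows[1:] if row and row[0].strip()}


def key_f1(pred: set, ref: set) -> float:
    tp = len(pred & ref)
    if tp == 0:
        return 0.0  # also covers empty sets
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(tables_folder: str, reference_folder: str, result_folder: str) -> Path:
    """Score each generated CSV against a same-named reference CSV and write a CSV report."""
    out_dir = Path(result_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    report = out_dir / "metrics_report.csv"  # hypothetical file name
    with report.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["table", "keys_f1"])
        for generated in sorted(Path(tables_folder).glob("*.csv")):
            reference = Path(reference_folder) / generated.name
            if not reference.exists():
                continue  # no ground truth for this file; skip it
            score = key_f1(read_keys(generated), read_keys(reference))
            writer.writerow([generated.stem, f"{score:.3f}"])
    return report
```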
More
If you encounter any errors or observe unexpected behavior, please report the issue to us.
