WikiTabGen

A benchmark dataset for LLM-based generation of tabular data

Generating Tables from the Parametric Knowledge of Language Models

This repository contains WikiTabGen - a benchmark for evaluating LLM capabilities in on-demand table generation.

The benchmark includes 100 tables curated and processed from the WikiTables Project. The tables feature a diverse set of properties: length, width, amount of numerical data, and popularity.

Leaderboard

This is our current leaderboard, which evaluates each LLM's ability to generate the correct data in the key columns, in the non-key columns, and overall:

| Rank | LLM | Method | Keys F1 | Non-Keys F1 | Overall F1 |
|------|------------|--------------|---------|-------------|------------|
| 1 | GPT-4o | Row-by-row | 53.5% | 13.8% | 20.8% |
| 2 | LLama3.1-70B | Full-Table | 49.9% | 13.1% | 20.0% |
| 3 | GPT-4 | Row-by-row | 53.7% | 12.2% | 19.6% |
| 4 | LLama3.1-70B | Row-by-row | 50.0% | 12.2% | 19.0% |
| 5 | GPT-4 | Cell-by-cell | 53.7% | 11.1% | 18.6% |
| 6 | GPT-4 | Full-Table | 43.8% | 11.5% | 17.5% |
| 7 | GPT-4o | Full-Table | 40.3% | 10.5% | 16.3% |
| 8 | GPT-3.5 | Full-Table | 46.4% | 9.6% | 16.1% |
| 9 | GPT-3.5 | Cell-by-cell | 49.4% | 7.6% | 14.6% |
| 10 | GPT-3.5 | Row-by-row | 49.4% | 7.2% | 14.3% |
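
To give a feel for what an F1 figure in the leaderboard means, here is a minimal sketch of a set-based F1 score between the key values of a reference table and those of a generated table. This is an illustration only: the function name `f1_score` and the example data are hypothetical, and the benchmark's actual metric definitions are the ones implemented in the evaluation notebook, which may differ (for example, in how cell values are normalized or matched).

```python
def f1_score(generated: set, reference: set) -> float:
    """Set-based F1 between generated items and reference items.

    Illustrative only; not necessarily the benchmark's exact metric.
    """
    if not generated or not reference:
        return 0.0
    true_positives = len(generated & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(generated)  # share of generated items that are correct
    recall = true_positives / len(reference)     # share of reference items that were recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: key column values of a reference table vs. a generated one.
reference_keys = {"France", "Germany", "Italy", "Spain"}
generated_keys = {"France", "Germany", "Portugal"}

keys_f1 = f1_score(generated_keys, reference_keys)
print(f"Keys F1: {keys_f1:.1%}")  # prints "Keys F1: 57.1%"
```

In this example the generated table recovers two of the four reference keys and adds one wrong key, so precision is 2/3, recall is 1/2, and F1 is 4/7.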

Usage

Examples for GPT-3.5 covering all prompting methods (full table, row-by-row, and cell-by-cell) are available in the example_notebooks folder. Before running a notebook, set your openai.api_key in its Imports section. Upon successful execution, a results folder is created, with a tables subfolder containing the generated tables in CSV format and a result.json file with the logs of prompts and LLM responses.
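
Once a notebook has finished, the output can be inspected with a few lines of Python. The sketch below is not part of the repository: `load_run` is a hypothetical helper, and it assumes that result.json sits directly inside the results folder next to the tables subfolder. Adjust the paths if your run lays the files out differently.

```python
import csv
import json
from pathlib import Path


def load_run(results_dir):
    """Load generated tables and the prompt/response log from a results folder.

    Assumed layout (not confirmed by the repository):
        <results_dir>/tables/*.csv
        <results_dir>/result.json
    """
    results_dir = Path(results_dir)

    # Read every generated CSV into a list of rows, keyed by file name without extension.
    tables = {}
    for csv_path in sorted((results_dir / "tables").glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as f:
            tables[csv_path.stem] = list(csv.reader(f))

    # The log is optional here so the helper still works if it is stored elsewhere.
    log_path = results_dir / "result.json"
    log = None
    if log_path.exists():
        log = json.loads(log_path.read_text(encoding="utf-8"))

    return tables, log
```

For example, `tables, log = load_run("results")` returns a dictionary mapping each table name to its rows, plus the parsed log (or `None` if no result.json is found at the assumed location).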

Evaluation

To produce the evaluation metrics for your experiment, run the notebook example_notebooks/Metrics_calculation.ipynb. Set tables_folder (the path to the CSV files generated by the LLM) and result_folder (the path to the folder where the metrics report should be saved). The notebook calculates the metrics and saves the report in CSV format in result_folder.

More

If you encounter any errors or observe unexpected behavior, please report the issue to us.
