Spider2

[ICLR 2025 Oral] Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

<p align="center"> <a href="https://spider2-sql.github.io/">Website</a> • <a href="https://arxiv.org/abs/2411.07763">Paper</a> • <a href="https://docs.google.com/document/d/1a69mxO7m1nMndXp8H_-aggvYDbcbiS3rV9GPXEw-DeM/edit?usp=sharing">Data Update Log</a> • <a href="https://docs.google.com/document/d/1sCobAqJZcko-Vl3biOycwvCIR7kTwBPrhsgVfvaX1Fg/edit?usp=sharing">Submission Guidance</a> </p>

📰 News

2025-11-06: We apologize for the recent Snowflake login and credential issues caused by Snowflake’s password & MFA policy upgrade. Both Web UI login and Python credential access behaviors have changed.
Please carefully review the updated Snowflake guideline before continuing: https://github.com/xlang-ai/Spider2/blob/main/assets/Snowflake_Guideline.md
Thank you for your patience and understanding!
2025-10-29: Major update!
1. We fixed the evaluation-suite issue, so scores are now more accurate and stable. We also refreshed the affected methods on the leaderboard.
2. If you are willing to cover the Snowflake hosting cost (spider2-snow is free by default, but queries are queued), we can share the Spider2 Snowflake data directly to your own Snowflake project. See Spider2_Data_Host.md for details.
3. If you run into MFA connection errors (meaning your credentials cannot access the Snowflake warehouse), please see the Snowflake Guideline.
2025-07-13: We updated spider2-snow.jsonl to resolve ambiguities; the previous version has been renamed to spider2-snow-0713.jsonl for reference.
2025-06-10: We implemented a tool-call-based Spider-Agent for Spider 2.0-Snow that requires no Docker and significantly improves runtime performance.
2025-05-22: We have created a new task setting, Spider2-DBT, and removed the original Spider2 setting. spider2-dbt consists of only 68 tasks, enabling quick and smooth benchmarking with spider-agent-dbt. It is a comprehensive, repository-level text-to-SQL task.
2025-04-20: We provide the ground-truth tables for spider2-lite and spider2-snow to support quick benchmarking and analysis. If you use this setting, you must state that you are using oracle tables.
2025-01-10: Please refer to the data update log to track changes in the evaluation examples. The leaderboard results will also change dynamically accordingly.

2024-12-24: Considering the many evaluation requirements, we have decided to release all examples and gold answers for self-evaluation. However, only a small amount of gold SQL is available. The leaderboard is still active. To have your method officially validated and upload your scores to the leaderboard, please follow the submission guidance.

👋 Overview

<div style="width: 10%; margin: auto;"> <table style="font-size: 12px; width: 100%;"> <tr> <th>Setting</th> <th>Task Type</th> <th>#Examples</th> <th>Databases</th> <th>Cost</th> </tr> <tr> <td><strong>Spider 2.0-Snow</strong></td> <td>Text-to-SQL task</td> <td>547</td> <td>Snowflake(547)</td> <td><span style="color: red;">NO COST!😊</span></td> </tr> <tr> <td><strong>Spider 2.0-Lite</strong></td> <td>Text-to-SQL task</td> <td>547</td> <td>BigQuery(214), Snowflake(198), SQLite(135)</td> <td>Some cost incurred</td> </tr> <tr> <td><strong>Spider 2.0-DBT</strong></td> <td>Code agent task</td> <td>68</td> <td>DuckDB (DBT)(68)</td> <td>NO COST!😊</td> </tr> </table> </div>

Data

The questions/instructions are in spider2-lite.jsonl and spider2-snow.jsonl.
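
As a starting point for exploring these files, here is a minimal sketch of a JSON Lines reader. It assumes only that each non-empty line is one JSON object; the helper names `load_jsonl` and `field_names` are hypothetical and not part of the Spider 2.0 codebase, and the sketch makes no assumption about which fields the files contain.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file and return one parsed object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so a damaged file is easy to locate.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
    return records


def field_names(records):
    """Return the sorted set of keys that appear across all records."""
    names = set()
    for record in records:
        names.update(record)
    return sorted(names)
```

Calling `load_jsonl("spider2-snow.jsonl")` (the path depends on where you cloned the repository) and then `field_names(...)` on the result shows which fields are available before you write any prompt-building code.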

We also release some gold SQL queries to help users design prompts and methods. Note that we do not recommend using the released Spider 2.0 gold SQL for fine-tuning.

🚀 Quickstart (Spider2-lite/snow)

Sign Up for Your Own BigQuery and Snowflake Accounts

1. To sign up for a BigQuery account, please follow this guideline and obtain your own credentials.
2. Follow this guideline and fill out the Spider2 Snowflake Access form. We will then send you an account sign-up email that lets you access the Snowflake database.

Important Notes:

If you want access to the FULL dataset of Spider 2.0-Lite, you must complete both Step 1 and Step 2.
If you only want access to the FULL dataset of Spider 2.0-Snow, you only need to complete Step 2.

Spider 2.0-Snow (Tool-call Format, UPDATE 2025-06-10)

A Docker-free and ultra-fast Spider-Agent implementation for rapid benchmarking of any model.

spider-agent-tool-call

Spider 2.0-Snow and Spider 2.0-Lite (Based on Docker)

We highly recommend that you directly use Spider2-snow and Spider2-lite for benchmarking and research. First, run the Spider-Agent Framework!!

For more details, please refer to the following links:

🚀 Quickstart (Spider2-dbt)

For more details, please refer to the following links:

spider2-dbt (The Data)
spider-agent-dbt (The Method)

📋 Leaderboard Submission

We release gold answers for only a subset of the examples in Spider 2.0-Lite, Spider 2.0-Snow, and Spider 2.0-DBT. To upload your score to the leaderboard, you must follow this submission guidance.

🙇‍♂️ Acknowledgement

We thank Snowflake for their generous support in hosting the Spider 2.0 Challenge. We also thank Minghang Deng, Tianbao Xie, Yiheng Xu, Fan Zhou, Yuting Lan, Per Jacobsson, Yiming Huang, Canwen
