BamTwoogle

The BamTwoogle dataset accompanies the paper "ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent" (https://arxiv.org/abs/2312.10003). It was written as a complementary, slightly more challenging sequel to the Bamboogle dataset, and it addresses some shortcomings of Bamboogle that we discovered while performing human evaluations for the paper.

BamTwoogle dataset

This repository contains the BamTwoogle dataset for the paper ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent.

Overview

BamTwoogle is a small (100 questions in total), handcrafted collection of information-seeking questions.

Dataset Description

The topics and question formats vary, but in general, BamTwoogle adheres to the following guidelines.

Questions

The majority of questions require two searches or reasoning steps (like Bamboogle), but some need three or four.
Each question must have been manually checked to ensure that its answer does not appear on the first page of Google search results.

Expected answers

Should not be ambiguous.
Should not be prone to change over time, either due to the phrasing of the question or to the nature of the answer.
Should account for multiple versions of proper names, etc., where appropriate.
Should prefer Wikipedia as the source of truth for facts (with preference given to topics/articles not flagged for incompleteness, lack of sources, etc.).
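The guideline about multiple versions of proper names implies that a prediction should count as correct if it matches any accepted variant of the answer. The Python sketch below illustrates one way to check that. It assumes each question stores a list of accepted answer strings (a hypothetical layout, since this README does not describe the file format), and its normalization steps (lowercasing, accent folding, ASCII punctuation removal, dropping English articles) are a common convention, not the evaluation procedure used in the paper.

```python
import re
import string
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, fold accents, strip ASCII punctuation, drop articles, collapse spaces."""
    # Decompose accented characters (e.g. "é" -> "e" + combining mark), then drop the marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Remove ASCII punctuation only; other symbols are left untouched.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Drop standalone English articles, then collapse runs of whitespace.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def matches_any(prediction: str, accepted_answers: list[str]) -> bool:
    """Return True if the normalized prediction equals any normalized accepted variant.

    `accepted_answers` is a hypothetical per-question list of answer variants.
    """
    pred = normalize(prediction)
    return any(pred == normalize(ans) for ans in accepted_answers)
```

For example, with illustrative values not taken from the dataset, `matches_any("bob smith.", ["Robert Smith", "Bob Smith"])` returns `True`, while `"Rob Smith"` would not match either variant. Exact match after normalization is deliberately strict; it only credits variants that were listed explicitly, which is why the guidelines ask for those variants to be recorded.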

Citation

@misc{aksitov2023restmeetsreactselfimprovement,
      title={ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent}, 
      author={Renat Aksitov and Sobhan Miryoosefi and Zonglin Li and Daliang Li and Sheila Babayan and Kavya Kopparapu and Zachary Fisher and Ruiqi Guo and Sushant Prakash and Pranesh Srinivasan and Manzil Zaheer and Felix Yu and Sanjiv Kumar},
      year={2023},
      eprint={2312.10003},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2312.10003}, 
}
