BamTwoogle
The BamTwoogle dataset accompanies the paper "ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent" (https://arxiv.org/abs/2312.10003). It was written as a complementary, slightly more challenging sequel to the Bamboogle dataset, and it addresses some of the shortcomings of Bamboogle that we discovered while performing human evals for the paper.
BamTwoogle dataset
This repository contains the BamTwoogle dataset for the paper ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent.
Overview
BamTwoogle is a small, handcrafted collection of 100 information-seeking questions. It was written as a complementary, slightly more challenging sequel to the Bamboogle dataset.
Dataset Description
The topics and question formats vary, but in general, BamTwoogle adheres to the following guidelines.
Questions
- Most questions require two searches or reasoning steps (like Bamboogle), but some need three or four.
- Each question must have been manually checked to ensure that the answer doesn’t appear on the first page of Google search results.
Expected answers
- Should not be ambiguous.
- Should not be prone to change over time, either due to the phrasing of the question or to the nature of the answer.
- Should account for multiple versions of proper names, etc., where appropriate.
- Should prefer Wikipedia as the source of truth for facts, with preference given to topics/articles not flagged for incompleteness, lack of sources, etc.
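The guideline about multiple versions of proper names suggests that a prediction should be compared against every accepted variant of an answer rather than a single string. The following is only a minimal sketch of that idea, not the evaluation procedure used in the paper. The helper names `normalize` and `matches_any`, the normalization rules, and the example answers are all assumptions made for illustration; none of them come from the dataset itself.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def matches_any(prediction: str, accepted_answers: list[str]) -> bool:
    """Return True if the prediction equals any accepted variant after normalization."""
    normalized_prediction = normalize(prediction)
    return any(
        normalized_prediction == normalize(answer) for answer in accepted_answers
    )


# Hypothetical example (not taken from BamTwoogle): one expected answer
# listed under two versions of the same person's name.
accepted = ["Samuel Clemens", "Mark Twain"]
print(matches_any("mark twain.", accepted))
print(matches_any("Charles Dickens", accepted))
```

Exact matching after normalization is deliberately strict. It will reject correct answers that are phrased as full sentences, so a human or model-based check may still be needed for free-form responses.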
Citation
@misc{aksitov2023restmeetsreactselfimprovement,
  title={ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent},
  author={Renat Aksitov and Sobhan Miryoosefi and Zonglin Li and Daliang Li and Sheila Babayan and Kavya Kopparapu and Zachary Fisher and Ruiqi Guo and Sushant Prakash and Pranesh Srinivasan and Manzil Zaheer and Felix Yu and Sanjiv Kumar},
  year={2023},
  eprint={2312.10003},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2312.10003},
}
