
MacGyver

Code and Data for the NAACL 24 paper: MacGyver: Are Large Language Models Creative Problem Solvers?


MacGyver: Are Large Language Models Creative Problem Solvers?

<p align="left"> <a href='https://arxiv.org/abs/2311.09682'> <img src='https://img.shields.io/badge/Arxiv-2311.09682-A42C25?style=flat&logo=arXiv&logoColor=A42C25'> </a> <a href='https://arxiv.org/pdf/2311.09682.pdf'> <img src='https://img.shields.io/badge/Paper-PDF-yellow?style=flat&logo=arXiv&logoColor=yellow'> </a> <a href='https://github.com/allenai/MacGyver'> <img src='https://img.shields.io/badge/GitHub-Code-black?style=flat&logo=github&logoColor=white'></a> </p>

MacGyver is a dataset of over 1,600 real-world verbal problems deliberately designed to trigger innovative usage of objects and to necessitate out-of-the-box thinking. The problems cover diverse settings: indoors/household, neutral, and outdoors. Figure 1 shows some examples.

Figure 1. Examples of the problems in our MacGyver dataset with the GPT-4 and human answers. (Pictures, drawn by DALL·E 3, are solely for illustration purposes and may not accurately reflect the text.)


Data

[1. MacGyver Dataset]

The MacGyver dataset is available in data/MacGyver. In addition to the problem setup and the corresponding solution, each data point in problem_solution_pair.xlsx records the problem's solvability status and whether solving it requires using tools unconventionally.

additional_human_solutions.xlsx contains additional human solutions to our solvable subset.
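As a minimal sketch of how the spreadsheet might be filtered by these two fields: the column names `Solvable?` and `Unconventional?`, the yes/no encoding, and the helper names below are all assumptions for illustration, so check the actual header row of problem_solution_pair.xlsx before relying on them. Loading uses `pandas.read_excel`, which needs `openpyxl` installed for .xlsx files.

```python
from typing import Any, Dict, List, Optional

# Hypothetical column names: inspect the real header row and adjust.
SOLVABLE_COL = "Solvable?"
UNCONVENTIONAL_COL = "Unconventional?"


def load_rows(path: str) -> List[Dict[str, Any]]:
    """Read a spreadsheet into a list of row dicts (requires pandas and openpyxl)."""
    import pandas as pd  # imported lazily so the helpers below work without pandas

    return pd.read_excel(path).to_dict(orient="records")


def is_yes(value: Any) -> bool:
    """Treat common truthy spellings as 'yes'. The real encoding may differ."""
    return str(value).strip().lower() in {"yes", "y", "true", "1"}


def select(
    rows: List[Dict[str, Any]],
    solvable: Optional[bool] = None,
    unconventional: Optional[bool] = None,
) -> List[Dict[str, Any]]:
    """Keep rows matching the requested flags; None means 'do not filter on this'."""
    kept = []
    for row in rows:
        if solvable is not None and is_yes(row.get(SOLVABLE_COL)) != solvable:
            continue
        if (
            unconventional is not None
            and is_yes(row.get(UNCONVENTIONAL_COL)) != unconventional
        ):
            continue
        kept.append(row)
    return kept
```

Assuming the file sits directly under data/MacGyver, usage might look like `rows = load_rows("data/MacGyver/problem_solution_pair.xlsx")` followed by `select(rows, solvable=True, unconventional=True)`.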

[2. Additional Annotated Solutions]

In addition to the problem statements and correct solutions, we release additional solution-annotation pairs (e.g., human annotations for all the machine/human solutions tested in benchmarking) in data/Benchmark_results. We hope these additional 4,700 answer-annotation pairs, which span a full gradient of correctness (completely wrong, partially correct, correct but less efficient, and perfect), will facilitate future work on automatic evaluation.
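One way these graded annotations could be used to check an automatic evaluator is to map the four correctness levels to ordinal scores and compare predicted labels against the human ones. The sketch below is an illustration under assumptions: the exact label strings stored in data/Benchmark_results may differ from the wording above, and the function names are hypothetical.

```python
from typing import List, Sequence

# The four levels named above, ordered worst to best.
# The exact strings used in the released files are an assumption.
LEVELS = [
    "completely wrong",
    "partially correct",
    "correct but less efficient",
    "perfect",
]
SCORE = {label: rank for rank, label in enumerate(LEVELS)}


def to_scores(labels: Sequence[str]) -> List[int]:
    """Map label strings to ordinal scores 0..3 (raises KeyError on unknown labels)."""
    return [SCORE[label.strip().lower()] for label in labels]


def _check(human: Sequence[str], predicted: Sequence[str]) -> None:
    if len(human) != len(predicted) or not human:
        raise ValueError("need two non-empty label lists of equal length")


def exact_agreement(human: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of items where the evaluator picks the same level as the human."""
    _check(human, predicted)
    pairs = zip(to_scores(human), to_scores(predicted))
    return sum(h == p for h, p in pairs) / len(human)


def mean_abs_distance(human: Sequence[str], predicted: Sequence[str]) -> float:
    """Average ordinal distance, so near misses count less than far misses."""
    _check(human, predicted)
    pairs = zip(to_scores(human), to_scores(predicted))
    return sum(abs(h - p) for h, p in pairs) / len(human)
```

Reporting the ordinal distance alongside exact agreement distinguishes an evaluator that confuses "perfect" with "correct but less efficient" from one that confuses "perfect" with "completely wrong".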

Code

We release:

  • the code to curate the dataset in code/progressive_data_creation
  • the prompts used to collect LLM solutions in code/collect_solutions
  • the prompts used for iterative self-reflection and divergent-convergent thinking in code/progressive_data_creation

Contact yufeit@g.ucla.edu if you have questions.

Citation

If you find our paper/dataset/code helpful, please cite us using:

@inproceedings{tian2023macgyver,
  title = {MacGyver: Are Large Language Models Creative Problem Solvers?},
  author = {Tian, Yufei and Ravichander, Abhilasha and Qin, Lianhui and Le Bras, Ronan and Marjieh, Raja and Peng, Nanyun and Choi, Yejin and Griffiths, Thomas L. and Brahman, Faeze},
  year = {2024},
  booktitle = {Proceedings of NAACL},
  eprint = {2311.09682},
  url = {https://arxiv.org/abs/2311.09682},
  primaryclass = {cs.CL},
}