# BigOBench

BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend the time-space computational complexity of input or generated code.
## 👋 Overview

> [!NOTE]
> Significant refactoring efforts have been made to enhance the usability and clarity of our codebase for public users. As we continue to identify and address any bugs, we will be pushing regular patches. If you encounter any issues or spot a bug, please don't hesitate to reach out: we would be delighted to investigate and resolve it promptly.
## 🧐 Introduction <sub><sup>(back to top)</sup></sub>

<span style="font-variant: small-caps;"><b>BigO(Bench)</b></span> is a benchmark of ~300 code problems to be solved in Python, which evaluates whether LLMs can find the time-space complexity of code solutions or generate code solutions themselves that respect a time-space complexity requirement. This benchmark addresses a gap in current evaluations, which often overlook the ability of models to comprehend and produce code constrained by computational complexity. <span style="font-variant: small-caps;"><b>BigO(Bench)</b></span> includes a complexity inference framework that can run any Python code snippet, measure multiple runtime and memory footprint values, and infer its algorithmic time-space complexity. It also includes a set of 3,105 coding problems and 1,190,250 solutions from Code Contests, annotated with inferred (synthetic) time and space complexity labels from the complexity framework, as well as corresponding runtime and memory footprint values for a large set of input sizes.
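To illustrate the idea behind empirical complexity inference, here is a minimal sketch (not the actual BigOBench framework): it times a function on inputs of increasing size, then fits the slope of log(runtime) against log(size), which approximates the exponent k in O(n^k). The function names below are illustrative only.

```python
import math
import time


def estimate_polynomial_degree(func, sizes):
    """Empirically estimate the polynomial degree of func's runtime.

    Times func on inputs of increasing size, then fits the least-squares
    slope of log(runtime) vs log(size), which approximates k in O(n^k).
    """
    times = []
    for n in sizes:
        data = list(range(n))
        start = time.perf_counter()
        func(data)
        times.append(time.perf_counter() - start)
    # Least-squares slope of log(time) against log(size)
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope


def quadratic(data):
    """Toy O(n^2) workload: nested loop over the input."""
    total = 0
    for a in data:
        for b in data:
            total += a * b
    return total
```

Running `estimate_polynomial_degree(quadratic, [200, 400, 800, 1600])` yields a slope close to 2, matching the quadratic behavior. The real framework is far more robust, fitting multiple complexity classes over many measured runtime and memory values.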
## 🙌 Project Overview <sub><sup>(back to top)</sup></sub>

Our project contains the following modules, each documented in its own README file!
- `BigOBench/data`
- `BigOBench/docs`
- `BigOBench/src`
- `BigOBench/src/complexity`
- `BigOBench/src/eval`
- `BigOBench/src/inference`
## 📋 Getting Started with the CODE <sub><sup>(back to top)</sup></sub>
To clone the repository, run

```bash
git clone git@github.com:facebookresearch/BigOBench.git
cd BigOBench
```
If you want to install everything at once and run the BigOBench benchmark:

```bash
cd src
bash create_bigobench_env.sh
```

Then navigate to `src/README.md` to read about how to run the full BigOBench evaluation pipeline.
Otherwise, if you are specifically interested in one of our modules, you can install the dependencies of each module separately:
- For the complexity framework:

  ```bash
  cd src/complexity
  bash create_complexity_env.sh
  ```

  You can then navigate to `src/complexity/README.md` to get to know the complexity framework.
- For the inference engine:

  ```bash
  cd src/inference
  bash create_vllm_env.sh
  ```

  You can then navigate to `src/inference/README.md` to get to know the inference engine.
- For the evaluation harness:

  ```bash
  cd src/eval
  bash create_eval_env.sh
  ```

  You can then navigate to `src/eval/README.md` to get to know the evaluation harness.
## 📚 Getting Started with the DATA <sub><sup>(back to top)</sup></sub>
The data is available as 5 `.jsonl` files, hosted on HuggingFace Datasets.
You can directly download them from the HuggingFace website, or use the CLI.
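Once downloaded, each `.jsonl` file holds one JSON object per line and can be read with the standard library alone. A minimal sketch follows; the field names in the sample records are illustrative only and may not match the actual dataset schema.

```python
import json

# Write a tiny JSONL sample in the same line-per-record format as the
# BigOBench data files (field names here are illustrative only).
sample = [
    {"problem_id": "p1", "time_complexity": "o(n)", "space_complexity": "o(1)"},
    {"problem_id": "p2", "time_complexity": "o(n^2)", "space_complexity": "o(n)"},
]
with open("sample.jsonl", "w") as f:
    for record in sample:
        f.write(json.dumps(record) + "\n")

# Read it back: parse each line as an independent JSON object
with open("sample.jsonl") as f:
    records = [json.loads(line) for line in f]
```

Reading line by line keeps memory usage flat even for the largest of the 5 files, since no record depends on any other.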