# BigOBench

BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend the time-space computational complexity of input or generated code.
## 👋 Overview

> [!NOTE]
> Significant refactoring efforts have been made to enhance the usability and clarity of our codebase for public users. As we continue to identify and address any bugs, we will be pushing regular patches. If you encounter any issues or spot a bug, please don't hesitate to reach out: we would be delighted to investigate and resolve it promptly.
## 🧐 Introduction <sub><sup>(back to top)</sup></sub>

<span style="font-variant: small-caps;"><b>BigO(Bench)</b></span> is a benchmark of ~300 code problems to be solved in Python, which evaluates whether LLMs can find the time-space complexity of code solutions or generate code solutions themselves that respect a time-space complexity requirement. This benchmark addresses a gap in current evaluations, which often overlook the ability of models to comprehend and produce code constrained by computational complexity. <span style="font-variant: small-caps;"><b>BigO(Bench)</b></span> includes a complexity inference framework that can run any Python code snippet, measure multiple runtime and memory footprint values, and infer its algorithmic time-space complexity. It also includes a set of 3,105 coding problems and 1,190,250 solutions from Code Contests, annotated with inferred (synthetic) time and space complexity labels from the complexity framework, as well as corresponding runtime and memory footprint values for a large set of input sizes.
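To illustrate the idea behind empirical complexity inference, here is a minimal sketch (not the actual BigOBench framework): it times a function on inputs of increasing size, then fits the slope of log(runtime) against log(size), which approximates the exponent k in O(n^k). The function names below are illustrative only.

```python
import math
import time


def estimate_polynomial_degree(func, sizes):
    """Empirically estimate the polynomial degree of func's runtime.

    Times func on inputs of increasing size, then fits the least-squares
    slope of log(runtime) vs log(size), which approximates k in O(n^k).
    """
    times = []
    for n in sizes:
        data = list(range(n))
        start = time.perf_counter()
        func(data)
        times.append(time.perf_counter() - start)
    # Least-squares slope of log(time) against log(size)
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope


def quadratic(data):
    """Toy O(n^2) workload: nested loop over the input."""
    total = 0
    for a in data:
        for b in data:
            total += a * b
    return total
```

Running `estimate_polynomial_degree(quadratic, [200, 400, 800, 1600])` yields a slope close to 2, matching the quadratic behavior. The real framework is far more robust, fitting multiple complexity classes over many measured runtime and memory values.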
## 🙌 Project Overview <sub><sup>(back to top)</sup></sub>

Our project contains the following modules, each documented in its own README file!
- `BigOBench/data`
- `BigOBench/docs`
- `BigOBench/src`
- `BigOBench/src/complexity`
- `BigOBench/src/eval`
- `BigOBench/src/inference`
## 📋 Getting Started with the CODE <sub><sup>(back to top)</sup></sub>
To clone the repository, run

```bash
git clone git@github.com:facebookresearch/BigOBench.git
cd BigOBench
```
If you want to install everything at once and run the BigOBench benchmark:

```bash
cd src
bash create_bigobench_env.sh
```

Then navigate to `src/README.md` to read about how to run the full BigOBench evaluation pipeline.
Otherwise, if you are specifically interested in one of our modules, you can install the dependencies of each module separately:
- For the complexity framework:

  ```bash
  cd src/complexity
  bash create_complexity_env.sh
  ```

  You can then navigate to `src/complexity/README.md` to get to know the complexity framework.
- For the inference engine:

  ```bash
  cd src/inference
  bash create_vllm_env.sh
  ```

  You can then navigate to `src/inference/README.md` to get to know the inference engine.
- For the evaluation harness:

  ```bash
  cd src/eval
  bash create_eval_env.sh
  ```

  You can then navigate to `src/eval/README.md` to get to know the evaluation harness.
## 📚 Getting Started with the DATA <sub><sup>(back to top)</sup></sub>
The data is available as 5 `.jsonl` files, hosted on HuggingFace Datasets.
You can directly download them from the HuggingFace website, or use the CLI.
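Once downloaded, each `.jsonl` file holds one JSON object per line and can be read with the standard library alone. A minimal sketch follows; the field names in the sample records are illustrative only and may not match the actual dataset schema.

```python
import json

# Write a tiny JSONL sample in the same line-per-record format as the
# BigOBench data files (field names here are illustrative only).
sample = [
    {"problem_id": "p1", "time_complexity": "o(n)", "space_complexity": "o(1)"},
    {"problem_id": "p2", "time_complexity": "o(n^2)", "space_complexity": "o(n)"},
]
with open("sample.jsonl", "w") as f:
    for record in sample:
        f.write(json.dumps(record) + "\n")

# Read it back: parse each line as an independent JSON object
with open("sample.jsonl") as f:
    records = [json.loads(line) for line in f]
```

Reading line by line keeps memory usage flat even for the largest of the 5 files, since no record depends on any other.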