CodeScientist: An automated scientific discovery system for code-based experiments

This is the repository for CodeScientist, an end-to-end semi-automated scientific discovery system that designs, iterates, and analyzes scientific experiments that can be expressed as (Python) code. CodeScientist generates novel ideas to explore by applying genetic-style mutations (an LLM-as-a-mutator paradigm) to combinations of scientific articles and code examples, where the code examples show, for instance, how to prompt an LLM, make a plot, or use a specific benchmark. The experiment ideas can then be implemented using the Experiment Builder, which automatically creates, runs, and debugs the experiment code in a container. When completed, CodeScientist writes a report on the results. Usually, CodeScientist makes several (for example, 5) independent attempts at creating experiments for a given idea, and can create a meta-analysis describing the overall results across those attempts.
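The overall flow described above (combine sources, mutate them into an idea, make several independent experiment attempts, then summarize) can be illustrated with a small, purely conceptual sketch. Every name here (`Idea`, `propose_idea`, `run_attempts`, `meta_analysis`, and the stub functions) is hypothetical and is not part of the CodeScientist codebase; the mutator and the experiment runner are stand-in stubs where the real system would use an LLM and the Experiment Builder.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Idea:
    papers: List[str]
    codeblocks: List[str]
    description: str


@dataclass
class AttemptResult:
    attempt: int
    succeeded: bool


def propose_idea(
    papers: List[str],
    codeblocks: List[str],
    mutator: Callable[[List[str], List[str]], str],
    rng: random.Random,
) -> Idea:
    """Sample a small combination of sources and ask the mutator for an idea."""
    chosen_papers = rng.sample(papers, k=min(2, len(papers)))
    chosen_blocks = rng.sample(codeblocks, k=min(2, len(codeblocks)))
    return Idea(chosen_papers, chosen_blocks, mutator(chosen_papers, chosen_blocks))


def run_attempts(
    idea: Idea,
    build_and_run: Callable[[Idea, int], bool],
    n_attempts: int = 5,
) -> List[AttemptResult]:
    """Make several independent attempts at implementing the same idea."""
    return [AttemptResult(i, build_and_run(idea, i)) for i in range(n_attempts)]


def meta_analysis(results: List[AttemptResult]) -> Dict[str, int]:
    """Summarize the outcomes across all attempts."""
    return {
        "attempts": len(results),
        "successes": sum(r.succeeded for r in results),
    }


# Hypothetical stand-ins: a real mutator would be an LLM call, and a real
# runner would build, run, and debug experiment code in a container.
def stub_mutator(papers: List[str], codeblocks: List[str]) -> str:
    return f"Combine {' + '.join(papers)} using {', '.join(codeblocks)}"


def stub_build_and_run(idea: Idea, attempt: int) -> bool:
    return attempt % 2 == 0  # pretend every other attempt succeeds


rng = random.Random(0)
idea = propose_idea(
    ["paper_a", "paper_b", "paper_c"],
    ["llm_prompt", "plot", "benchmark"],
    stub_mutator,
    rng,
)
results = run_attempts(idea, stub_build_and_run)
summary = meta_analysis(results)
print(idea.description)
print(summary)
```

The sketch only mirrors the shape of the pipeline; the actual idea format, builder behavior, and meta-analysis contents are documented in the later sections of this README.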
CodeScientist can be run in two modes:
- Human-in-the-loop: A human helps build code examples, filter experiment ideas to run, and provides short comments on the ideas that might help their implementation. This is the primary mode we report in the paper.
- Fully-automatic: You can run CodeScientist in fully automatic mode with a few clicks, though it is less efficient at producing scientific results.
What you'll find in this repository:
- CodeScientist Software: CodeScientist is open source, and this repository includes the full set of software and installation instructions.
- Reports: The CodeScientist paper highlights a set of 20 candidate discoveries (in Table 4). These are readily available here: Example CodeScientist-Generated Experiment Reports and Code
- Raw Data: The repository also includes a great deal of raw data: full experiment code, logs, ideas, external reviewer ratings, etc.
Table of Contents
- 0. Paper
- 1. Quick Start
- 1.1. I want to read about CodeScientist
- 1.2. I want to examine the papers, code, and other results created by CodeScientist
- 1.3. I want to run CodeScientist on my local machine
- 1.4. I would like to use CodeScientist in my own domain
- 1.5. I want to manually provide CodeScientist an idea to create an experiment for
- 1.6. I want to feed ideas into CodeScientist from another system
- 1.7. How do I use a specific aspect of CodeScientist?
- 1.8. I have a question not answered here
- 2. Example CodeScientist Generated Experiment Reports and Code
- (Experiment 1 – Experiment 6)
- 3. Installation and Running
- 4. Using CodeScientist
- 4.1. Create New Ideas (from Papers)
- 4.2. Create New Experiment (Manual)
- 4.3. Ideation List
- 4.4. Batch Autonomous Experiments
- 4.5. Run Benchmark
- 4.6. Experiment Monitor
- 4.7. Bulk Reporting and Meta-Analysis
- 4.8. Pregenerated Ideas/Filtering Ideas Externally
- 4.9. Your First Experiment -- "Hello World"
- 5. Adding Codeblocks to the Codeblock Library
- 6. The Experiment Builder Sandbox
- 7. Data
- 8. Benchmark
- 9. Using parts of CodeScientist in your own codebase
- 10. Prompts
- 11. Frequently Asked Questions
- 12. Danger Zone
- 13. Citation
- 14. License
- 15. Contact
0. Paper
CodeScientist is described in the following paper: CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation (ACL Findings 2025).

1. Quick Start
<span id="1-1-i-want-to-read-about-codescientist"/>1.1. I want to read about CodeScientist
The CodeScientist paper is available here: Section 0. Paper
<span id="1-2-i-want-to-examine-the-papers-code-and-other-results-created-by-codescientist"/>1.2. I want to examine the papers, code, and other results created by CodeScientist
- Appendix: A number of the highest quality experimental results (as rated by humans) are in the paper's Appendix.
- Example Papers: You can also see the above highly rated experimental results (and a number of rejected papers) here: example_papers/
- Lots of Papers: If you'd like all the details (high-quality and low-quality experiments, including their papers, code, results, and logs), they are available in bulk here: generated_experments/
1.3. I want to run CodeScientist on my local machine
Please see the installation instructions in Section 3.1. Installation
<span id="1-4-i-would-like-to-use-codescientist-in-my-own-domain"/>1.4. I would like to use CodeScientist in my own domain.
To use CodeScientist in a subdomain other than the provided domain (i.e. agents and environments), there are two steps:
- Add papers in the subdomain: This is as easy as pasting arXiv links into the Create New Ideas (from Papers) menu item.
- Add codeblocks: If you need specialized codeblocks for your domain other than the general ones provided in this repository, simply add them to the codeblocks directory in the required format.
More information on these steps is provided in Section 3. Installation and Running and Section 4. Using CodeScientist
<span id="1-5-i-want-to-manually-provide-codescientist-an-idea-to-create-an-experiment-for"/>1.5. I want to manually provide CodeScientist an idea to create an experiment for, instead of using LLM-generated ideas.
You can do this by pressing the Create New Experiment (Manual) button on the main menu.
More detailed instructions on running CodeScientist are provided in Section 4.2. Create New Experiment (Manual)
1.6. I want to feed ideas into CodeScientist that were made from some other system.
You can do this in bulk using the Run Benchmark button. For an example of the format CodeScientist expects, see Section 4.8. Pregenerated Ideas/Filtering Ideas Externally, followed by Section 4.5. Run Benchmark
1.7. How do I use [specific aspect of CodeScientist]?
Please see the detailed usage instructions for each component in Section 4. Using CodeScientist
<span id="1-8-i-have-a-question-not-answered-here"/>1.8. I have a question not answered here.
Please see the documentation below. If your question isn't answered, please add an issue.
<span id="2-example-codescientist-generated-experiment-reports-and-code"/>2. Example CodeScientist Generated Experiment Reports and Code
Below are six example experiment reports (and s
