This is the repository for CodeScientist, an end-to-end semi-automated scientific discovery system that designs, iterates, and analyzes scientific experiments that can be expressed as (Python) code. CodeScientist generates novel ideas to explore by applying genetic-search-style mutations (using an LLM-as-a-mutator paradigm) to combinations of scientific articles and code examples; the code examples cover tasks such as prompting an LLM, making a plot, or using a specific benchmark. The experiment ideas can then be implemented using the Experiment Builder, which automatically creates, runs, and debugs the experiment code in a container. When an experiment completes, CodeScientist writes a report on the results. Usually, CodeScientist makes several (for example, 5) independent attempts at creating experiments for a given idea, and can create a meta-analysis describing the overall results across those attempts.
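To make the LLM-as-a-mutator idea concrete, here is a minimal sketch of how combinations of papers and code examples might be sampled and turned into idea-generation prompts. It is not CodeScientist's actual implementation: the function names (`build_mutation_prompt`, `generate_ideas`), the prompt wording, the sampling scheme, and the `stub_llm` stand-in are all assumptions made for illustration.

```python
import random
from typing import Callable, List


def build_mutation_prompt(papers: List[str], codeblocks: List[str]) -> str:
    """Assemble a prompt asking an LLM to propose a new experiment idea.

    The prompt wording is hypothetical, not taken from CodeScientist.
    """
    paper_lines = "\n".join(f"- {p}" for p in papers)
    code_lines = "\n".join(f"- {c}" for c in codeblocks)
    return (
        "Propose a novel experiment idea that can be implemented in Python.\n"
        f"Source papers:\n{paper_lines}\n"
        f"Available code examples:\n{code_lines}\n"
    )


def generate_ideas(
    papers: List[str],
    codeblocks: List[str],
    llm: Callable[[str], str],
    num_ideas: int = 3,
    papers_per_idea: int = 2,
    codeblocks_per_idea: int = 2,
    seed: int = 0,
) -> List[str]:
    """Sample random paper/codeblock combinations and ask the LLM to mutate each into an idea."""
    rng = random.Random(seed)
    ideas = []
    for _ in range(num_ideas):
        sampled_papers = rng.sample(papers, min(papers_per_idea, len(papers)))
        sampled_code = rng.sample(codeblocks, min(codeblocks_per_idea, len(codeblocks)))
        ideas.append(llm(build_mutation_prompt(sampled_papers, sampled_code)))
    return ideas


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): returns a placeholder idea string."""
    return f"IDEA derived from a prompt of {len(prompt)} characters"


if __name__ == "__main__":
    example_ideas = generate_ideas(
        papers=["Paper A", "Paper B", "Paper C"],
        codeblocks=["llm_prompting", "plotting", "benchmark_loader"],
        llm=stub_llm,
    )
    for idea in example_ideas:
        print(idea)
```

Passing the LLM in as a plain callable keeps the sketch runnable without any API key; a real model client could be swapped in for `stub_llm`.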

CodeScientist can be run in two modes:

Human-in-the-loop: A human helps build code examples, filters the experiment ideas to run, and provides short comments on the ideas that might help their implementation. This is the primary mode we report in the paper.
Fully-automatic: You can run CodeScientist in fully automatic mode with a few clicks, though it is less efficient at producing scientific results.

What you'll find in this repository:

CodeScientist Software: CodeScientist is open source, and this repository includes the full set of software and installation instructions.
Reports: The CodeScientist paper highlights a set of 20 candidate discoveries (in Table 4). These are readily available here: Example CodeScientist-Generated Experiment Reports and Code
Raw Data: The repository also includes a great deal of raw data: full experiment code, logs, ideas, external reviewer ratings, etc.

0. Paper
1. Quick Start
2. Example CodeScientist Generated Experiment Reports and Code
- (Experiment 1 – Experiment 6)
3. Installation and Running
- 3.1. Installation
- 3.2. Running the Server and GUI
4. Using CodeScientist
5. Adding Codeblocks to the Codeblock Library
- 5.1. Browsing existing codeblocks
- 5.2. Adding a new codeblock
6. The Experiment Builder Sandbox
7. Data
8. Benchmark
9. Using parts of CodeScientist in your own codebase
10. Prompts
11. Frequently Asked Questions
12. Danger Zone
13. Citation
14. License
15. Contact

0. Paper

CodeScientist is described in the following paper: CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation (ACL Findings 2025).

1. Quick Start

1.1. I want to read about CodeScientist

The CodeScientist paper is available here: Section 0. Paper

1.2. I want to examine the papers, code, and other results created by CodeScientist

Appendix: A number of the highest quality experimental results (as rated by humans) are in the paper's Appendix.
Example Papers: You can also see the above highly rated experimental results (and a number of rejected papers) here: example_papers/
Lots of Papers: If you'd like all the details, both high-quality and low-quality experiments (including their papers, code, results, and logs) are available in bulk here: generated_experments/

1.3. I want to run CodeScientist on my local machine

Please see the installation instructions in Section 3.1. Installation

1.4. I would like to use CodeScientist in my own domain.

To use CodeScientist in a subdomain other than the provided domain (i.e. agents and environments), there are two steps:

Add papers in the subdomain: This is as easy as pasting Arxiv links into the Create New Ideas (from Papers) menu item.
Add codeblocks: If you need specialized codeblocks for your domain other than the general ones provided in this repository, simply add them to the codeblocks directory in the required format.

More information on these steps is provided in Section 3. Installation and Running and Section 4. Using CodeScientist

1.5. I want to manually provide CodeScientist an idea to create an experiment for, instead of using LLM-generated ideas.

You can do this by pressing the Create New Experiment (Manual) button on the main menu. More detailed instructions on running CodeScientist are provided in Section 4.2. Create New Experiment (Manual)

1.6. I want to feed ideas into CodeScientist that were made from some other system.

You can do this in bulk using the Run Benchmark button. For an example of the format CodeScientist expects, see Section 4.8. Pregenerated Ideas/Filtering Ideas Externally, followed by Section 4.5. Run Benchmark

1.7. How do I use [specific aspect of CodeScientist]?

Please see the detailed usage instructions for the various components in Section 4. Using CodeScientist

1.8. I have a question not answered here.

Please see the documentation below. If your question isn't answered, please add an issue.

2. Example CodeScientist Generated Experiment Reports and Code

Below are six example experiment reports (and s
