GenExp
Code and data for "Agentic AI Integrated with Scientific Knowledge: Laboratory Validation in Systems Biology"
Install and setup
Option 1: using prebuilt image
A prebuilt image is provided.
docker run -it --platform=linux/amd64 --rm --user=vscode docker.io/alecgower/genexp:prebuilt /bin/bash
sudo mkdir /workspaces && sudo chown vscode /workspaces && cd /workspaces
git clone -b experimental https://github.com/DanielBrunnsaker/GenExp.git && cd GenExp
pip3 install --user -r requirements.txt && Rscript scripts/install_requirements.R
Option 2: using Development Container files
To build yourself, clone the Git repository and use the provided configuration in the .devcontainer directory to build and run a container with all software and packages installed. A list of supporting tools and services for Development Containers is provided at https://containers.dev/supporting.
Option 3: manual installation
Install Python and R dependencies
Using Python (3.10.13) and R (4.2.3), the dependencies can be installed from the requirements.txt file, e.g. using conda and the following commands:
$ conda create --name genExp python=3.10.13 r-base=4.2.3 && \
conda activate genExp && \
pip install -r requirements.txt && \
Rscript scripts/install_requirements.R
Install SWI-Prolog
Follow the download and install instructions on the SWI-Prolog website. It can also be installed using package managers such as apt, snap, and brew.
Install MiniZinc
During parts of the experiment-planning process, the workflow makes use of PLAID [1]. To run this, you need MiniZinc (tested with v2.8.7) installed. Instructions can be found on the PLAID GitHub page. Note that you will need to define the path to the minizinc executable in scripts/config.py.
Install RMLMapper
Run scripts/install_rml.sh to install the RMLMapper JAR in /opt/tools. If you decide to install this somewhere else in your filesystem, you will need to change the RMLMAPPER_JAR variable in map_protocols.sh.
Setting up a ChEBI SPARQL endpoint
Currently the database creation programmes rely on a privately hosted SPARQL endpoint for ChEBI to look up compounds for inclusion in the graph database.
We recommend these steps.
- Follow the instructions for our (recently tested) containerised Fuseki server implementation to build the Docker image locally.
- Download the chebi.ttl file from the Zenodo store and save it in the data/ directory.
- Again following the instructions, run DATA=chebi.ttl docker compose up load-data.
- Once the data loading is complete, run docker compose up start-server -d.
We are investigating publicly hosted options with the same functionality, which should simplify this process.
Setting up an API-key
Add a key.txt file (see .gitignore) containing only the API key ("sk-proj---XXXXXXXX...") for OpenAI in the root folder.
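The scripts presumably read this file at startup; as an illustrative sketch only (the helper name and validation below are assumptions, not the repository's actual loading code), reading such a key file might look like:

```python
from pathlib import Path


def load_api_key(path: str = "key.txt") -> str:
    """Read an OpenAI API key from a single-line text file.

    Hypothetical helper for illustration -- the repository's own
    key-loading code may differ.
    """
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key.startswith("sk-"):
        raise ValueError("key.txt does not look like an OpenAI API key")
    return key
```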
Usage
Prompts
The template prompts used for the examples in the manuscript are available in /context. Investigation-specific contexts/prompts (i.e. the ones actually used for the different investigations) can be found inside the experiments folder, e.g. experiments/completed_experiments/arginine_202503131539/versions.
Generation and experiment execution parameters
Settings regarding the runs can be found in scripts/config.py.
Note that you will need to change paths to relevant executables in the config-file.
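For illustration, the executable paths in scripts/config.py might look like the fragment below. The variable names here are hypothetical assumptions; check the shipped config file for the real ones.

```python
# Hypothetical sketch of executable paths in scripts/config.py.
# Variable names are illustrative -- consult the actual file.
MINIZINC_PATH = "/usr/local/bin/minizinc"   # MiniZinc (tested with v2.8.7), used by PLAID
SWIPL_PATH = "/usr/bin/swipl"               # SWI-Prolog executable
RMLMAPPER_JAR = "/opt/tools/rmlmapper.jar"  # see scripts/install_rml.sh
```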
Generating a hypothesis, experimental design, and liquid handling scripts
From the scripts folder, run the following command in the terminal (fill in the blanks):
python genexp.py --target <string> --N <integer> --alpha <float>
- target denotes the metabolite observable used for learning the association (an amino acid, in this case).
- N denotes the number of patterns passed to the hypothesis generation step. If more than 1, an LLM agent will select the most reasonable one. Defaults to 1.
- alpha is a float between 0 and 1 used to penalize patterns that are not unique to the specific metabolite observable (a value closer to 1 ensures that patterns deemed important only for your specific target rank higher). Defaults to 0.0.
- override is an optional parameter for manually overriding the selected logic program with one of your own.
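As a toy illustration of how an alpha-style penalty can work (this is not the repository's actual scoring code), think of each pattern as having a base importance and a uniqueness fraction for the chosen target; alpha interpolates between ignoring uniqueness and weighting it fully:

```python
def penalized_score(base_score: float, uniqueness: float, alpha: float) -> float:
    """Down-weight patterns that are not unique to the target observable.

    uniqueness is the fraction (0 < uniqueness <= 1) of a pattern's
    importance attributable to the chosen target. alpha = 0 leaves scores
    unchanged; alpha closer to 1 increasingly favours target-specific
    patterns. Toy model only -- the repository's ranking may differ.
    """
    return base_score * uniqueness ** alpha

# With alpha = 0, a widely shared but high-scoring pattern can outrank
# a target-specific one; with alpha = 1, the specific pattern wins.
shared = penalized_score(0.9, uniqueness=0.2, alpha=1.0)
specific = penalized_score(0.6, uniqueness=1.0, alpha=1.0)
```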
This will create a folder in /experiments with all of the details regarding the experiments (e.g. hypothesis, protocol, liquid handling scripts, ...). Note that you will be prompted for stock concentrations during the run if compounds are not present in the library. Once the scripts have been run on the Hamilton, EVE, and via AutonoMS [2], and the data has been acquired and saved in data/growth/raw and data/metabolomics/raw, run the following command to process and analyse all of the data. For details regarding data acquisition, see protocol/hamilton/scripts, protocol/overlord/scripts and protocol/mass_spectrometry. output_folder denotes the folder created by the prior step.
python analysis.py --output_folder <string> --metabolomics_analysis <bool>
This will automatically run outlier curation, processing, normalization and statistical testing on the growth data and metabolomics data. It will also generate a basic result report.
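Purely to illustrate the kind of steps such a pipeline chains together (outlier curation followed by normalization), here is a generic sketch with a robust modified z-score filter. This is not the code in analysis.py, which uses its own methods and thresholds:

```python
from statistics import median


def curate_and_normalize(values, mad_cutoff=3.5):
    """Drop outliers by modified z-score (median absolute deviation),
    then min-max normalize the surviving points to [0, 1].

    Generic sketch of an outlier-curation + normalization step; the
    actual pipeline in analysis.py may differ.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        kept = list(values)
    else:
        # 0.6745 rescales MAD to be comparable to a standard deviation
        kept = [v for v in values if 0.6745 * abs(v - med) / mad <= mad_cutoff]
    lo, hi = min(kept), max(kept)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in kept]
```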
Creating graph database
To process the generated hypotheses, and the associated experimental plans and data, run scripts/db_creation.sh and follow the prompts. This will create TriG files for each study, and one for all hypotheses, in the experiments directory. A merged file with all quads from each file, plus those from the ontology, is created at experiments/merged_dataset.trig.
Querying the database
For one-off queries to the database file (experiments/merged_dataset.trig), pipe the SPARQL query to the scripts/pyoxi_query.py script. For example:
echo "SELECT * WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 10" | python scripts/pyoxi_query.py
It is also possible to host a SPARQL endpoint using Jena Fuseki by following the same steps outlined above for the ChEBI endpoint.
The sample query examples-for-manuscript/sparql/hypotheses_test_support.rq is the one used for the manuscript example of efficient reuse of experimental data.
References:
[1] https://www.sciencedirect.com/science/article/pii/S266731852300017X?via%3Dihub
[2] https://pubs.acs.org/doi/10.1021/jasms.3c00396