BEELINE: Benchmarking gEnE reguLatory network Inference from siNgle-cEll transcriptomic data

BEELINE is a benchmarking framework for evaluating gene regulatory network (GRN) inference algorithms on single-cell RNA-seq data. It runs algorithms via Docker containers, evaluates their output against a ground truth network, and produces summary plots.
Full documentation: https://murali-group.github.io/Beeline/
Setup
1. Install the conda environment
```shell
bash utils/setupAnacondaVENV.sh
```
2. Pull algorithm Docker images
utils/initialize.sh manages Docker images for all supported BEELINE algorithms. By default it pulls pre-built images from the grnbeeline DockerHub organisation. Pass --build to build images locally from source in Algorithms/ instead.
```shell
bash utils/initialize.sh [OPTIONS]
```
| Flag | Description |
|------|-------------|
| -b / --build | Build images locally from source instead of pulling from DockerHub. |
| -v / --verbose | Enable verbose Docker output. |
| --remove-local-images | Remove locally built BEELINE images. If combined with --build, images are removed then rebuilt. |
| --remove-grnbeeline-images | Remove pulled DockerHub (grnbeeline) images. If combined with --build, images are removed then rebuilt. |
| -h / --help | Display usage information and exit. |
3. Activate the environment
```shell
source ~/miniconda3/etc/profile.d/conda.sh
conda activate BEELINE
```
Usage
All three pipeline stages take a YAML configuration file via -c/--config.
1. Run algorithms — BLRunner.py
Runs one or more GRN inference algorithms on the specified datasets.
```shell
python BLRunner.py -c config-files/Curated/VSC.yaml
```
Each algorithm's output is written to outputs/<dataset_id>/<run_id>/<algorithm_id>/rankedEdges.csv.
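A ranked edge list can be consumed directly with the standard library. The tab-separated three-column layout (`Gene1`, `Gene2`, `EdgeWeight`) shown here is an assumption about the output format; inspect an actual `rankedEdges.csv` before relying on it:

```python
import csv
import io

# Parse a small inline sample standing in for a rankedEdges.csv file.
sample = "Gene1\tGene2\tEdgeWeight\ng1\tg2\t0.9\ng2\tg3\t0.4\n"
edges = [(row["Gene1"], row["Gene2"], float(row["EdgeWeight"]))
         for row in csv.DictReader(io.StringIO(sample), delimiter="\t")]

# Sort by descending weight so downstream top-k metrics can slice the list.
edges.sort(key=lambda e: e[2], reverse=True)
```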
2. Evaluate results — BLEvaluator.py
Computes evaluation metrics by comparing each algorithm's ranked edge list to the ground truth network.
```shell
python BLEvaluator.py -c config-files/Curated/VSC.yaml [flags]
```
| Flag | Metric |
|------|--------|
| -a / --auc | AUPRC and AUROC |
| -e / --epr | Early precision ratio |
| -s / --sepr | Signed early precision (activation / inhibition) |
| -r / --spearman | Spearman correlation of predicted edge ranks |
| -j / --jaccard | Jaccard index of top-k predicted edges |
| -t / --time | Algorithm runtime |
| -m / --motifs | Network motif counts in top-k predicted networks |
| -p / --paths | Path length statistics on top-k predicted networks |
| -b / --borda | Borda-count edge aggregation across algorithms |
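The early precision ratio can be illustrated with a small sketch. This is an illustration of the idea, not BEELINE's implementation: take the top-k predicted edges (k = number of ground-truth edges) and compare their precision with that of a random predictor:

```python
# Sketch of the early precision ratio (EPR). n_possible is the number of
# possible edges in the network; k / n_possible is the ground-truth density,
# i.e. the expected precision of a random predictor.
def early_precision_ratio(ranked, truth, n_possible):
    k = len(truth)
    top_k = set(ranked[:k])
    early_precision = len(top_k & truth) / k
    random_precision = k / n_possible
    return early_precision / random_precision

# The top-2 predictions contain one of the two true edges -> early precision 0.5.
ranked = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
truth = {("a", "b"), ("a", "c")}
```
An EPR above 1 means the algorithm's top-ranked edges beat random guessing.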
3. Plot results — BLPlotter.py
Generates publication-style figures from evaluation output.
```shell
python BLPlotter.py -c config-files/Curated/VSC.yaml -o ./plots [flags]
```
| Flag | Output | Description |
|------|--------|-------------|
| -a / --auprc | AUPRC/<dataset>-AUPRC.{pdf,png} | Per-dataset AUPRC plots. One run: precision-recall curve. Multiple runs: box plots. |
| -r / --auroc | AUROC/<dataset>-AUROC.{pdf,png} | Per-dataset AUROC plots. One run: ROC curve. Multiple runs: box plots. |
| -e / --epr | EPR/<dataset>-EPR.{pdf,png} | Per-dataset box plot of early precision values per algorithm. |
| --summary | Summary.{pdf,png} | Heatmap of median AUPRC ratio and Spearman stability. |
| --epr-summary | EPRSummary.{pdf,png} | Heatmap of AUPRC ratio, EPR ratio, and signed EPR ratios. |
| --all | all of the above | Run all plots. |
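The "median AUPRC ratio" reported by the summary heatmap can be sketched as follows. The values and the random-predictor AUPRC here are made up for illustration; BEELINE derives the latter from the ground-truth network density:

```python
from statistics import median

# Per-algorithm AUPRC values across runs (hypothetical numbers).
auprc = {"GENIE3": [0.30, 0.34, 0.32], "PPCOR": [0.20, 0.22, 0.18]}
random_auprc = 0.10  # assumed density of the ground-truth network

# Median AUPRC across runs, normalised by the random baseline.
ratios = {alg: median(vals) / random_auprc for alg, vals in auprc.items()}
```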
Configuration
Config files are YAML and follow this structure:
```yaml
input_settings:
  input_dir: "inputs/Curated"
  datasets:
    - dataset_id: "mHSC"
      nickname: "mHSC-E"  # optional: overrides dataset_id in plot labels
      groundTruthNetwork: "GroundTruthNetwork.csv"
      runs:
        - run_id: "mHSC-500-1"
        - run_id: "mHSC-500-2"
  algorithms:
    - algorithm_id: "GENIE3"
      image: "grnbeeline/arboreto:base"
      should_run: True
      params: {}
    - algorithm_id: "PPCOR"
      image: "grnbeeline/ppcor:base"
      should_run: True
      params:
        pVal: 0.01
output_settings:
  output_dir: "outputs"
```
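Once parsed (e.g. with PyYAML's `yaml.safe_load`), a config with this structure can be sanity-checked against the required fields documented in the tables below. `validate_config` is a hypothetical helper, not part of BEELINE:

```python
# Sketch: check the required fields of a parsed BEELINE config dict.
def validate_config(config):
    inp, out = config["input_settings"], config["output_settings"]
    assert "input_dir" in inp and "output_dir" in out
    for ds in inp["datasets"]:
        assert "dataset_id" in ds
        # Runs must be listed explicitly unless scan_run_subdirectories is set.
        assert ds.get("runs") or ds.get("scan_run_subdirectories")
    for alg in inp["algorithms"]:
        assert {"algorithm_id", "image", "should_run"} <= alg.keys()
    return True

config = {
    "input_settings": {
        "input_dir": "inputs/Curated",
        "datasets": [{"dataset_id": "mHSC", "runs": [{"run_id": "mHSC-500-1"}]}],
        "algorithms": [{"algorithm_id": "PPCOR",
                        "image": "grnbeeline/ppcor:base",
                        "should_run": True}],
    },
    "output_settings": {"output_dir": "outputs"},
}
```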
input_settings
| Field | Required | Description |
|-------|----------|-------------|
| input_dir | Yes | Base directory containing all input datasets. Can be absolute or relative to the working directory. |
| datasets | Yes | List of dataset groups. See Dataset fields below. |
| algorithms | Yes | List of algorithms to run. See Algorithm fields below. |
Dataset fields
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| dataset_id | Yes | — | Name of the dataset group. Used as a subdirectory under input_dir. |
| should_run | No | [True] | Set to [False] to skip this dataset entirely. |
| groundTruthNetwork | No | GroundTruthNetwork.csv | Filename of the ground truth edge list CSV, located in the dataset group directory (shared across all runs). |
| nickname | No | dataset_id | Short display label used by the plotter for plot titles and heatmap column headers. Does not affect any file paths. |
| scan_run_subdirectories | No | false | When true, runs are discovered automatically by scanning all subdirectories of input_dir/dataset_id/. Mutually exclusive with runs; an error is raised if no subdirectories are found. |
| runs | No* | — | List of individual run variants. Required unless scan_run_subdirectories is set. See Run fields below. |
Run fields
Each entry under runs represents one replicate or condition variant. Input files are expected at input_dir/dataset_id/run_id/.
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| run_id | Yes | — | Identifier for this run. Used as the subdirectory name within the dataset group directory. |
| exprData | No | ExpressionData.csv | Expression data filename, located in the run directory. |
| pseudoTimeData | No | PseudoTime.csv | Pseudotime data filename, located in the run directory. |
Algorithm fields
| Field | Required | Description |
|-------|----------|-------------|
| algorithm_id | Yes | Algorithm name. Must match one of the supported identifiers (see Supported Algorithms). |
| image | Yes | Docker image name to run for this algorithm (e.g., "grnbeeline/genie3:base"). Use "local" for algorithms that run directly in the conda environment without Docker. See the Supported Algorithms table for default image names. |
| should_run | Yes | Set to True to run this algorithm, False to skip it. |
| params | No | Dict of algorithm-specific parameters. Values are typically wrapped in a single-element list (e.g., pVal: [0.01]); the runner unwraps them automatically. |
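The single-element-list unwrapping described for params can be sketched in a couple of lines (an assumption about the runner's internals, shown for clarity):

```python
# Single-element lists collapse to their value; everything else passes through.
params = {"pVal": [0.01], "nTrees": [500], "flags": ["a", "b"]}
unwrapped = {k: v[0] if isinstance(v, list) and len(v) == 1 else v
             for k, v in params.items()}
```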
output_settings
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| output_dir | Yes | — | Base directory for all output files. Can be absolute or relative to the working directory. |
| experiment_id | No | — | When set, inserts an extra path segment between output_dir and the dataset path. Useful for keeping outputs from separate experiment runs (e.g., different parameter sweeps) in the same base directory without overwriting each other. |
Output files are written to:
```
output_dir/[experiment_id/]dataset_id/run_id/algorithm_id/rankedEdges.csv
```
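A minimal sketch of how this path is assembled, with `experiment_id` contributing one optional segment (`output_path` is a hypothetical helper):

```python
from pathlib import PurePosixPath

def output_path(output_dir, dataset_id, run_id, algorithm_id, experiment_id=None):
    parts = [output_dir]
    if experiment_id:
        parts.append(experiment_id)  # optional extra segment
    parts += [dataset_id, run_id, algorithm_id, "rankedEdges.csv"]
    return str(PurePosixPath(*parts))
```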
Preparing Inputs — generateExpInputs.py
generateExpInputs.py is a preprocessing utility for filtering real scRNA-seq expression data down to a biologically meaningful gene subset before running the BEELINE pipeline. It reads a full expression matrix and a gene-ordering file (containing per-gene p-values and optionally variance), retains only genes that pass a significance threshold, and writes a filtered expression matrix and (optionally) a filtered ground truth network.
Basic usage
```shell
python generateExpInputs.py \
  -e ExpressionData.csv \
  -g GeneOrdering.csv \
  -f STRING-network.csv \
  -i human-tfs.csv \
  -p 0.01 \
  -n 500 \
  -o my-dataset
```
This produces my-dataset-ExpressionData.csv and my-dataset-network.csv in the working directory.
Arguments
| Flag | Default | Description |
|------|---------|-------------|
| -e / --expFile | ExpressionData.csv | Full expression matrix (genes × cells). Rows are genes (index column), columns are cells. |
| -g / --geneOrderingFile | GeneOrdering.csv | Gene ordering file indexed by gene name. First column must be a p-value; second column (optional) is per-gene variance used when --sort-variance is active. |
| -f / --netFile | (omit to skip) | Ground truth network CSV with Gene1 and Gene2 columns. When provided, the network is filtered to the retained gene set, self-loops and duplicate edges are removed, and the result is written alongside the expression output. |
| -i / --TFFile | human-tfs.csv | Single-column CSV of transcription factor names. Used to force-include significantly varying TFs regardless of the non-TF gene count limit. |
| -p / --pVal | 0.01 | Nominal p-value cutoff. Genes with a p-value at or above this threshold are excluded. Set to 0 to disable p-value filtering entirely. |
| -n / --numGenes | 500 | Number of non-TF genes to include after TFs have been separated out. Set to 0 to include TFs only. |
| -o / --outPrefix | BL- | Prefix for output filenames. Outputs are written as <prefix>-ExpressionData.csv and <prefix>-network.csv. |
| -c / --BFcorr | enabled | Apply Bonferroni correction to the p-value cutoff (divides -p by the number of tested genes). Disable with --no-BFcorr. |
| -t / --TFs | enabled | Include significantly varying TFs in the output gene set (disable with --no-TFs). |
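The filtering strategy can be summarised with an illustrative sketch (not the script's actual code): keep genes passing the (optionally Bonferroni-corrected) p-value cutoff, force-include TFs, and cap the remaining non-TF genes at numGenes:

```python
# Sketch: select genes given per-gene p-values and a TF list.
def select_genes(pvals, tfs, p_cutoff=0.01, num_genes=500, bf_correct=True):
    # Bonferroni correction divides the cutoff by the number of tested genes.
    cutoff = p_cutoff / len(pvals) if bf_correct else p_cutoff
    significant = [g for g, p in sorted(pvals.items(), key=lambda kv: kv[1])
                   if p < cutoff]
    tf_hits = [g for g in significant if g in tfs]       # always kept
    non_tfs = [g for g in significant if g not in tfs][:num_genes]
    return tf_hits + non_tfs
```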