PloverDB

Plover is a fully in-memory Python-based platform for hosting/serving Biolink-compliant knowledge graphs as TRAPI web APIs.

Plover was developed by Amy Glen. It is currently maintained by:

  • Stephen Ramsey,
  • Frankie Hodges,
  • Adilbek Bazarkulov

In answering queries, Plover abides by all Translator Knowledge Provider reasoning requirements; it can also normalize the underlying graph and convert query node IDs to the equivalent identifiers appropriate for the given knowledge graph.

Plover accepts TRAPI query graphs at its /query endpoint; supported query graph types include:

  1. Single-hop query graphs: (>=1 ids)--[>=0 predicates]--(>=0 categories, >=0 ids)
  2. Edge-less query graphs: Consist only of query nodes (all of which must have ids specified)
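As an illustration of the first type, a minimal single-hop TRAPI query graph that could be POSTed to /query might look like the following sketch (the CURIE, category, and predicate are illustrative examples, not required values):

```python
import json

# Minimal single-hop TRAPI query graph: one pinned node (ids specified)
# connected to an unpinned node constrained only by category.
trapi_query = {
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {"ids": ["CHEMBL.COMPOUND:CHEMBL112"]},  # >=1 ids
                "n1": {"categories": ["biolink:Disease"]},      # >=0 categories
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:treats"],           # >=0 predicates
                }
            },
        }
    }
}

print(json.dumps(trapi_query, indent=2))
```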

The knowledge graph to be hosted needs to be in a Biolink-compliant, KGX-style format with separate nodes and edges files; both TSV and JSON Lines formats are supported. See this section for more info.
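As a sketch of the JSON Lines flavor of that format (the core property names follow the KGX convention of id/category for nodes and subject/predicate/object for edges; the extra properties shown, such as name and primary_knowledge_source, are illustrative):

```python
import json

# Illustrative KGX-style node and edge records.
nodes = [
    {"id": "CHEMBL.COMPOUND:CHEMBL112", "category": "biolink:SmallMolecule",
     "name": "acetaminophen"},
    {"id": "MONDO:0005148", "category": "biolink:Disease",
     "name": "type 2 diabetes mellitus"},
]
edges = [
    {"subject": "CHEMBL.COMPOUND:CHEMBL112",
     "predicate": "biolink:treats",
     "object": "MONDO:0005148",
     "primary_knowledge_source": "infores:example-kp"},
]

# JSON Lines format: one JSON object per line.
nodes_jsonl = "\n".join(json.dumps(n) for n in nodes)
edges_jsonl = "\n".join(json.dumps(e) for e in edges)
print(nodes_jsonl)
print(edges_jsonl)
```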

You must provide publicly accessible URLs for the nodes/edges files in a config JSON file in PloverDB/app/ (e.g., config.json), or provide your graph files locally (see this section for more info). The config file includes a number of customizable settings and also defines how node/edge properties should be loaded into TRAPI Attributes. See this section for more info.

Note that a single Plover app can host/serve multiple KPs - each KP is exposed at its own endpoint (e.g., /ctkp, /dakp), and has its own Plover config file. See this section for more info.

PloverDB returns nodes and edges in TRAPI format in response to a query request, but it does not include each returned node's "equivalent IDs" CURIEs, even though (for most types of graph index builds) it has that information available while building its indexes. This is a speed optimization: lists of equivalent IDs can be quite lengthy, and a client can easily obtain them directly by calling the NodeNorm API.
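For instance, a client can look up equivalent identifiers itself via NodeNorm's get_normalized_nodes endpoint. The sketch below builds such a request and parses a response of the shape NodeNorm returns (the response excerpt shown is an illustrative stand-in, not live data):

```python
from urllib.parse import urlencode

# The SRI Node Normalizer (NodeNorm) service.
NODENORM_URL = "https://nodenormalization-sri.renci.org/get_normalized_nodes"

def build_nodenorm_request(curies):
    """Build the GET URL for looking up one or more CURIEs."""
    return NODENORM_URL + "?" + urlencode([("curie", c) for c in curies])

def extract_equivalent_ids(response_json, curie):
    """Pull the equivalent-identifier CURIEs for one node out of a
    NodeNorm response (returns [] if the node was not recognized)."""
    entry = response_json.get(curie) or {}
    return [eq["identifier"] for eq in entry.get("equivalent_identifiers", [])]

# Illustrative excerpt in the shape of a NodeNorm response:
sample_response = {
    "MONDO:0005148": {
        "id": {"identifier": "MONDO:0005148",
               "label": "type 2 diabetes mellitus"},
        "equivalent_identifiers": [
            {"identifier": "MONDO:0005148"},
            {"identifier": "DOID:9352"},
        ],
    }
}

url = build_nodenorm_request(["MONDO:0005148"])
print(url)
print(extract_equivalent_ids(sample_response, "MONDO:0005148"))
```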

Typical EC2 instance type used for building & hosting PloverDB

The PloverDB software has been tested with the following EC2 configuration:

  • AMI: Ubuntu Server 18.04.6 LTS (HVM), SSD Volume Type – ami-01d4b5043e089efa9 (64-bit x86)
  • Instance type: r5a.8xlarge (32 vCPUs, 256 GiB RAM)
  • Storage: 300 GiB root EBS volume
  • Security Group: plover-sg, ingress TCP on ports
    • 22 (SSH)
    • 80 (HTTP)
    • 443 (HTTPS)
    • 8000 (alternate API/UI)
    • 9990 (PloverDB API)

Host environment

  • Architecture: x86_64 (AMD EPYC 7571)
  • Kernel: Linux 5.4.0-1103-aws
  • Python (host): CPython 3.6.9+ (used for tests only)
  • Docker (host): 24.0.2+

Docker container

  • Base image: python:3.12.12 (pinned for reproducibility; see Migration Notes)
  • Python (in-container): CPython 3.12.12
  • WSGI server: uWSGI 2.0.31 (8-16 workers via cheaper algorithm)
  • Container user: plover (uid 1000, gid 1000) for ITRB Kubernetes compatibility
  • Exposed port: 80 (container), mapped to 9991 (host) by default
  • SSL/HTTPS: Handled externally (nginx on host or Kubernetes Ingress)
  • Python dependencies: pinned in requirements.txt

Cost estimate (us-west-2 on-demand):

  • r5a.4xlarge @ $0.904/hr -> build (~1 hr) ≈ $0.90

Table of Contents

  1. How to run
    1. How to run Plover locally (dev)
    2. How to deploy Plover
    3. Memory and space requirements
  2. How to test
  3. Provided endpoints
  4. Input files
    1. Nodes and edges files
    2. Config file
  5. Debugging
  6. Migration Notes
  7. Developer notes

How to run

First, you need to install Docker if you don't already have it.

  • For Ubuntu 20.04, try sudo apt-get install -y docker.io
  • For Mac, try brew install --cask docker

How to run Plover locally (dev)

To run Plover locally for development (assuming you have installed Docker), simply:

  1. Clone/fork this repo and navigate into it (cd PloverDB/)
  2. Edit the config file at /app/config.json for your particular graph (more info in this section)
  3. Run the following command:
    • bash -x run.sh

This will build a Plover Docker image and run a container off of it, publishing it at port 9991 (http://localhost:9991).

WARNING: run.sh does not appear to check for, or stop, an existing PloverDB container that might already be running on your system. If one is running, stop it first with sudo docker stop plovercontainer before running run.sh.

Note on ports:

  • The container serves HTTP on port 80 internally
  • By default, run.sh maps this to host port 9991
  • For HTTPS access, configure a reverse proxy (e.g., nginx) on the host to handle SSL on port 9990 and proxy to 9991

See this section for details on using/testing your Plover.

How to deploy Plover

NOTE: For more deployment info specific to the RTX-KG2/ARAX team, see this page in the Plover wiki.

Because Plover is Dockerized, it can be run on any machine with Docker installed. Our deployment instructions below assume you're using a Linux host machine.

The amount of memory and disk space your host instance will need depends on the size/contents of your graph. See this section for more info on the memory/space requirements.

Steps to be done once, at initial setup for a new instance:

  1. Make sure ports 9990, 80, and 443 on the host instance are open. If you're planning to use the rebuild functionality, also open port 8000.
  2. Install SSL certificates on the host instance and set them up for auto-renewal:
    1. sudo snap install --classic certbot
    2. sudo ln -s /snap/bin/certbot /usr/bin/certbot
    3. sudo certbot certonly --standalone
      1. Enter your instance's domain name (e.g., multiomics.rtx.ai) as the domain to be certified. You can optionally also list any CNAMEs for the instance separated by commas (e.g., multiomics.rtx.ai,ctkp.rtx.ai).
    4. Verify the autorenewal setup by doing a dry run of certificate renewal:
      1. sudo certbot renew --dry-run
  3. Fork the PloverDB repo (or create a new branch, if you have permissions)
  4. Create a domain_name.txt file in PloverDB/app/ like so:
    • echo "multiomics.rtx.ai" > PloverDB/app/domain_name.txt
    • (plug in your domain name in place of multiomics.rtx.ai - needs to be the same domain name entered in the step above when configuring certbot)

Steps to build Plover after initial setup is complete:

  1. Edit the config file at PloverDB/app/config.json for your graph
    1. Most notably, you need to point to nodes/edges files for your graph in TSV or JSON Lines KGX format
    2. We suggest also changing the name of this file for your KP (e.g., config_mykp.json); just ensure that the file name starts with config and ends with .json
    3. Push this change to your PloverDB fork/branch
    4. More info on the config file contents is provided in this section
  2. Run bash -x PloverDB/run.sh

After the build completes and the container finishes loading, your Plover will be accessible at something like https://multiomics.rtx.ai:9990 (plug in your own domain name).
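A quick smoke test at this point is to POST a small TRAPI query to the /query endpoint. A sketch using only the standard library (the domain and CURIE below are placeholders; substitute your own domain and an identifier present in your graph):

```python
import json
import urllib.request

# Placeholders: substitute your own domain and a CURIE from your graph.
PLOVER_URL = "https://multiomics.rtx.ai:9990/query"
query = {
    "message": {
        "query_graph": {
            "nodes": {"n0": {"ids": ["MONDO:0005148"]},
                      "n1": {}},
            "edges": {"e0": {"subject": "n0", "object": "n1"}},
        }
    }
}

request = urllib.request.Request(
    PLOVER_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# To actually send it (requires the service to be reachable):
#   with urllib.request.urlopen(request) as resp:
#       print(json.loads(resp.read())["message"].keys())
```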

See this section for details on using/testing your Plover.

Automatic deployment methods

There are a couple of options for automatic or semi-automatic deployment of your Plover service:

For an NCATS Translator ITRB deployment, ask ITRB to set up continuous deployment for your fork/branch of the Plover repo, such that committing code to that branch (e.g., updating your config file(s)) will automatically trigger a rebuild of the ITRB application.

Otherwise, for a self-hosted deployment, you can use Plover's built-in remote deployment server. You can do this like so:

  1. On the host instance:
    1. Add a config_secrets.json file in the root PloverDB/ directory. Its contents should look something like this (where you plug in the usernames/API keys that should have deployment permissions):
      1. {"api-keys": {"my-secret-api-key": "myusername"}}
      2. Note that you can make the key and username whatever you would like.
    2. Start up a Python environment and do pip install -r PloverDB/requirements.txt
    3. Start the rebuild server by running fastapi run PloverDB/rebuild_main.py (you may want to do this in a screen session or the like)
  2. From any machine, you can then trigger a deployment/rebuild by submitting a request to the /rebuild endpoint like the following, adapted for your own instance name/username/API key/branch:
curl -X 'POST' \
   'http://multiomics.rtx.ai:8000/rebuild' \
   -H 'accept: application/json' \
   -H 'Content-Type: application/json' \
   -H 'Authorization: Bearer my-secret-api-key'
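The secrets-file setup above can also be sketched in Python. Here a random API key is generated and written in the {"api-keys": {<key>: <username>}} layout shown earlier, along with the request headers that would accompany a /rebuild call (the temporary-directory path and "myusername" are illustrative; in a real deployment the file goes in the root PloverDB/ directory):

```python
import json
import secrets
import tempfile
from pathlib import Path

# Generate a random API key for the deployment server.
api_key = secrets.token_urlsafe(32)

# config_secrets.json in the layout shown above.
secrets_doc = {"api-keys": {api_key: "myusername"}}

# Written to a temporary directory here for illustration only.
out_path = Path(tempfile.mkdtemp()) / "config_secrets.json"
out_path.write_text(json.dumps(secrets_doc, indent=2))

# Headers for triggering /rebuild, mirroring the curl request above:
headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}
```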
