
ROMOP

R package to easily interface with OMOP-formatted EHR data.


ROMOP Readme

Benjamin S. Glicksberg 9/14/2018

ROMOP

ROMOP is a flexible R package to interface with the Observational Health Data Sciences and Informatics (OHDSI) OMOP Common Data Model. Briefly, OMOP is a standardized relational database schema for Electronic Health Record (EHR) or Electronic Medical Record (EMR) data (i.e., patient data collected during clinical visits to a health system). The main benefit of a standardized schema is that it allows for interoperability between institutions, even if the underlying EHR vendors are disparate.

For a detailed description of the OMOP common data model, please visit this helpful wiki.

In its backend, OMOP relies on standardized data ontologies and metathesauri, such as the Unified Medical Language System (UMLS), so the queries within ROMOP rely heavily on these vocabularies. Athena is a great tool for understanding the concepts in these ontologies and identifying suitable search terms.

Features of ROMOP

Manuscript information:
Glicksberg BS, Oskotsky B, Giangreco N, Thangaraj PM, Rudrapatna V, Datta D, Frazier R, Lee N, Larsen R, Tatonetti NP, Butte AJ. ROMOP: a light-weight R package for interfacing with OMOP-formatted electronic health record data. JAMIA open. 2019 Apr;2(1):10-4.

Sandbox Server

The Centers for Medicare and Medicaid Services (CMS) have released a synthetic clinical dataset (DE-SynPUF) in the public domain, intended to reflect the patient population while containing no protected health information. The OHDSI group has undertaken the task of converting these data into the OMOP CDM format. Users can set up this configuration on their own system by following the instructions on the GitHub page. We obtained all data files from the OHDSI FTP server (accessed June 17th, 2018) and created the CDM (DDL and indexes) according to their official instructions, modified for MySQL. For space considerations, we uploaded only one million rows of each data file. The sandbox server is an R Shiny server running on an Elastic Compute Cloud (EC2) instance on Amazon Web Services (AWS), querying a MySQL database server (AWS Aurora MySQL).

Requirements

Clinical Data

ROMOP requires EHR data to be in OMOP format and on a server accessible to the user. In its current form, ROMOP can connect to MySQL databases using the RMySQL driver, or to many other database systems, including Oracle, PostgreSQL, Microsoft SQL Server, Amazon Redshift, Google BigQuery, and Microsoft Parallel Data Warehouse, through the DatabaseConnector and SqlRender packages developed by the OHDSI group (see below).

Users without access to EHR data might consider using synthetic public data following the instructions provided by the OHDSI group here.

Programming Language

ROMOP is built in the R environment and developed on version 3.4.4 (2018-03-15).

ROMOP requires the following R packages:

  • DBI (developed on version 1.0.0)
  • data.table (developed on version 1.10.4-3)
  • dplyr (developed on version 0.7.4)

Driver-specific:

Installation

Download

ROMOP can be installed from GitHub using the devtools package:

library(devtools)
install_github("BenGlicksberg/ROMOP")

Alternatively, the package can be downloaded directly from the GitHub page and installed with the following steps:

  1. Unzip ROMOP-master.zip
  2. R CMD INSTALL ROMOP-master

Please see the Setup section to configure the package before use.

 

Setup

Credentials

In accordance with best practices for storing sensitive information, credentials are kept out of scripts and stored in the .Renviron file. A formatted .Renviron file is provided with the package with the following fields to fill in:

driver = ""
host = ""
username = ""
password = ""
dbname = ""
port = "3306" 
  • driver (case insensitive): “mysql” for MySQL or (according to OHDSI DatabaseConnector package) “postgresql” for PostgreSQL, “oracle” for Oracle, “sql server” for Microsoft SQL Server, “redshift” for Amazon Redshift, “pdw” for Microsoft Parallel Data Warehouse, or “bigquery” for Google BigQuery.
  • host (or server depending on database format)
  • dbname: OMOP EHR database name (or schema depending on database format)

Note that this .Renviron file has to be in the same directory where R is launched. If already using an .Renviron file, add this information to it.
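As a rough illustration of how these fields can be read back in an R session, the sketch below uses base R's Sys.getenv. The helper name read_romop_credentials is hypothetical and is not part of ROMOP; only the six field names are taken from the template above.

```r
# Hypothetical helper (not part of ROMOP): read the six .Renviron fields
# listed above and fail early if any of them is empty or unset.
read_romop_credentials <- function(
    fields = c("driver", "host", "username", "password", "dbname", "port")) {
  values <- Sys.getenv(fields, unset = "")  # named character vector
  missing <- fields[values == ""]
  if (length(missing) > 0) {
    stop("Missing .Renviron fields: ", paste(missing, collapse = ", "))
  }
  creds <- as.list(values)
  creds$driver <- tolower(creds$driver)     # driver is case insensitive
  creds
}
```

Calling read_romop_credentials() in a session started from the directory containing the .Renviron file should return a named list. An error naming the empty fields suggests the file was not picked up or was not filled in.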

Checks

With credentials correctly configured, the package can be loaded. On load, ROMOP checks that three conditions are met:

  1. Check that the credentials exist and can be retrieved from .Renviron file:
    requires driver, host, username, password, dbname, and port exist

  2. Check that connection to OMOP EHR server and database can be made:
    uses the above credentials

  3. Check to ensure all required OMOP tables exist and contain (any) data:
    the required tables are:

"concept","concept_ancestor","concept_relationship","condition_occurrence","death",
"device_exposure","drug_exposure","measurement","observation","person","procedure_occurrence","visit_occurrence"
  • if any of the above tables are missing, a warning message will be produced and the package will not be able to load properly.
  • if any of the above tables exist, but do not contain any data, a warning message will be produced but the package will still be able to function.
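The third check can be approximated outside the package as follows. This is a minimal sketch, not ROMOP's own implementation: check_omop_tables and its row_counts argument are hypothetical, and the list of available tables is assumed to come from something like DBI::dbListTables(con) on an open connection.

```r
# Required OMOP tables, as listed above.
required_tables <- c(
  "concept", "concept_ancestor", "concept_relationship",
  "condition_occurrence", "death", "device_exposure", "drug_exposure",
  "measurement", "observation", "person", "procedure_occurrence",
  "visit_occurrence"
)

# Hypothetical helper (not part of ROMOP).
# available:  character vector of table names found in the database,
#             e.g. the result of DBI::dbListTables(con)
# row_counts: optional named numeric vector of row counts per table
check_omop_tables <- function(available, row_counts = NULL) {
  missing <- setdiff(required_tables, tolower(available))
  empty <- character(0)
  if (!is.null(row_counts)) {
    present <- intersect(required_tables, names(row_counts))
    empty <- present[row_counts[present] == 0]
  }
  if (length(missing) > 0) {
    warning("Missing tables: ", paste(missing, collapse = ", "))
  }
  if (length(empty) > 0) {
    warning("Empty tables: ", paste(empty, collapse = ", "))
  }
  list(missing = missing, empty = empty, ok = length(missing) == 0)
}
```

The ok flag depends only on missing tables, mirroring the behaviour described above: empty tables produce a warning but do not block use.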

On start

Successfully passing all checks will allow the user to begin using ROMOP.

  1. Set an output directory to use with the changeOutDirectory function (note: the default output directory will be declared on package load).
  2. Create/load the data ontology (required to decode data types) using the makeDataOntology function. The first time the package is run, the concept ontology has to be built, but if the store_ontology option is selected, the ontology will be saved as an .rds file for subsequent loading.

 

Functions

Utility

getDemographics

Description:  Retrieves and formats patient demographic data from the person and death tables. Optionally restricts the search to a patient list of interest.

Usage:  ptDemo <- getDemographics(patient_list=NULL,declare=TRUE)

Arguments:

  patient_list         comma-separated string of patient ids
         a provided patient list restricts the search to those ids. NULL returns demographic data for all available patients

  declare         TRUE/FALSE
         if TRUE, outputs status and updates to the screen

Value:

  Returns a data.table with demographic data: person_id, birth_datetime, age, Gender, Race, Ethnicity, death_date, Status (Alive/Deceased)

Details:

  • patient_list should be in the following format: “patient_id_1, patient_id_2, …”
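If patient ids are held in an R vector, a small helper can build the comma-separated string in the format described above. The helper make_patient_list is hypothetical and not part of ROMOP; this is a sketch assuming ids are either numeric or character.

```r
# Hypothetical helper (not part of ROMOP): collapse a vector of patient ids
# into the "patient_id_1, patient_id_2, ..." string format.
make_patient_list <- function(ids) {
  ids <- unique(ids[!is.na(ids)])  # drop missing values and duplicates
  if (is.numeric(ids)) {
    # avoid scientific notation such as "1e+05" for large ids
    ids <- format(ids, scientific = FALSE, trim = TRUE)
  }
  paste(ids, collapse = ", ")
}

# Usage with the documented signature (requires a configured ROMOP session):
# ptDemo <- getDemographics(patient_list = make_patient_list(c(1, 22, 100000)),
#                           declare = TRUE)
```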

 

getEncounters

Description:  Retrieves and formats patient encounter data from the visit_occurrence table. Requires a patient list as input.

Usage:  ptEncs <- getEncounters(patient_list,declare=TRUE)

Arguments:

  patient_list         comma-separated string of patient ids
         searches for all encounter data for the patient list provided.

  declare         TRUE/FALSE
         if TRUE, outputs status and updates to the screen

Value:

  Returns a data.table with encounter data: person_id, visit_occurrence_id, visit_start_datetime, visit_end_datetime, visit_source_value, visit_concept, visit_source_concept, admitting_concept, discharge_concept

Details:

  • patient_list should be in the following format: “patient_id_1, patient_id_2, …”
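To illustrate one way the returned encounter table might be used, the sketch below derives visit length from the visit_start_datetime and visit_end_datetime columns. The rows are invented stand-ins for getEncounters output, and the datetime columns are assumed to be parseable by as.POSIXct; a plain data.frame is used here so the example runs with base R alone.

```r
# Mock rows standing in for getEncounters() output; all values are invented.
ptEncs <- data.frame(
  person_id = c(1, 1, 2),
  visit_occurrence_id = c(10, 11, 12),
  visit_start_datetime = c("2010-01-01 08:00:00", "2010-03-05 09:30:00",
                           "2011-07-20 14:00:00"),
  visit_end_datetime = c("2010-01-03 08:00:00", "2010-03-05 15:30:00",
                         "2011-07-21 14:00:00"),
  stringsAsFactors = FALSE
)

# Parse the datetimes and compute the length of each visit in days.
visit_start <- as.POSIXct(ptEncs$visit_start_datetime, tz = "UTC")
visit_end <- as.POSIXct(ptEncs$visit_end_datetime, tz = "UTC")
ptEncs$length_of_stay_days <- as.numeric(
  difftime(visit_end, visit_start, units = "days")
)

# Count visits per patient.
visits_per_patient <- table(ptEncs$person_id)
```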

 

getClinicalData

Description:  Retrieves all relevant clinical data for individuals in a patient list. Wrapper for domain-specific getData functions (which can also be used separately).

Usage:  ptClinicalData <- getClinicalData(patient_list, declare=TRUE)

Arguments:

  patient_list         comma-separated string of patient ids
         a provided patient list restricts the search to those ids. NULL will return demographic data for all available patients

  declare         TRUE/FALSE
         if TRUE, outputs status and updates to the screen

Value:
  Returns a list of data.tables stratified by domain type (e.g., ptClinicalData$Condition, ptClinicalData$Observation, etc…)

Details:

  • patient_list should be in the following format: “patient_id_1, patient_id_2, …”
  • getClinicalData calls domain-specific getData functions for the following domains: Observation, Conditi
