ROMOP
R package to easily interface with OMOP-formatted EHR data.
ROMOP Readme
Benjamin S. Glicksberg 9/14/2018
ROMOP is a flexible R package for interfacing with the Observational Health Data Sciences and Informatics (OHDSI) OMOP Common Data Model. Briefly, OMOP is a standardized relational database schema for Electronic Health Record (EHR) or Electronic Medical Record (EMR) data (i.e., patient data collected during clinical visits to a health system). The main benefit of a standardized schema is interoperability between institutions, even when those institutions use different EHR vendors.
For a detailed description of the OMOP common data model, please visit this helpful wiki.
Under the hood, OMOP relies on standardized data ontologies and metathesauri, such as the Unified Medical Language System (UMLS), so the queries within ROMOP depend heavily on these vocabularies. Athena is a useful tool for exploring the concepts in these ontologies and identifying good search terms.

Manuscript information:
Glicksberg BS, Oskotsky B, Giangreco N, Thangaraj PM, Rudrapatna V, Datta D, Frazier R, Lee N, Larsen R, Tatonetti NP, Butte AJ. ROMOP: a light-weight R package for interfacing with OMOP-formatted electronic health record data. JAMIA Open. 2019 Apr;2(1):10-4.
Sandbox Server
The Centers for Medicare and Medicaid Services (CMS) have released a synthetic clinical dataset (DE-SynPUF) into the public domain; it is designed to reflect the patient population while containing no protected health information. The OHDSI group has converted these data into the OMOP CDM format, and users can set up this configuration on their own system by following the instructions on the GitHub page. We obtained all data files from the OHDSI FTP server (accessed June 17th, 2018) and created the CDM (DDL and indexes) according to the official instructions, modified for MySQL. For space considerations, we uploaded only one million rows of each data file. The sandbox server is an R Shiny server running on an Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance, querying a MySQL database server (AWS Aurora MySQL).
Requirements
Clinical Data
ROMOP requires EHR data to be in OMOP format and on a server accessible to the user. In its current form, ROMOP can connect to MySQL databases using the RMySQL driver, or to many other platforms, including Oracle, PostgreSQL, Microsoft SQL Server, Amazon Redshift, Google BigQuery, and Microsoft Parallel Data Warehouse, through the DatabaseConnector and SqlRender packages developed by the OHDSI group (see below).
Users without access to EHR data might consider using synthetic public data following the instructions provided by the OHDSI group here.
Programming Language
ROMOP is built in the R environment and developed on version 3.4.4 (2018-03-15).
ROMOP requires the following R packages:
- DBI (developed on version 1.0.0)
- data.table (developed on version 1.10.4-3)
- dplyr (developed on version 0.7.4)
Driver-specific:
- RMySQL (developed on version 0.10.14)
- DatabaseConnector (developed on version 2.2.0)
- DatabaseConnectorJars (developed on version 1.0.0)
- SqlRender (developed on version 1.5.2)
Installation
Download
ROMOP can be installed from GitHub using the devtools package:
library(devtools)
install_github("BenGlicksberg/ROMOP")
Alternatively, download the package directly from the GitHub page and install it with the following steps:
- Unzip ROMOP-master.zip
- R CMD INSTALL ROMOP-master
See the Setup section to configure the package before use.
Setup
Credentials
In accordance with best practices for storing sensitive information, credentials are not hard-coded in R scripts but stored in an .Renviron file. A formatted .Renviron file is provided with the package, with the following fields to fill in:
driver = ""
host = ""
username = ""
password = ""
dbname = ""
port = "3306"
- driver (case insensitive): “mysql” for MySQL; otherwise, following the OHDSI DatabaseConnector package conventions, “postgresql” for PostgreSQL, “oracle” for Oracle, “sql server” for Microsoft SQL Server, “redshift” for Amazon Redshift, “pdw” for Microsoft Parallel Data Warehouse, or “bigquery” for Google BigQuery.
- host: database host (or server, depending on the database platform)
- dbname: OMOP EHR database name (or schema, depending on the database platform)
Note that this .Renviron file must be in the directory from which R is launched. If you already use an .Renviron file, add these fields to it.
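ROMOP's own credential check is internal to the package. As a rough illustration of the mechanism only, the base R sketch below writes a throwaway .Renviron-style file, loads it with readRenviron(), and verifies that all six fields are set and that the driver is one of the values listed above. The helper check_credentials() and the demo values are hypothetical and not part of ROMOP.

```r
# Hypothetical helper for illustration only; it is not ROMOP's internal code.
check_credentials <- function() {
  fields <- c("driver", "host", "username", "password", "dbname", "port")
  # Sys.getenv() on several names returns a named character vector;
  # fields that are not set come back as ""
  values <- Sys.getenv(fields, unset = "")
  missing <- fields[values == ""]
  if (length(missing) > 0) {
    stop("Missing credentials in .Renviron: ", paste(missing, collapse = ", "))
  }
  # driver is case insensitive, so lower-case it before comparing
  supported <- c("mysql", "postgresql", "oracle", "sql server",
                 "redshift", "pdw", "bigquery")
  if (!tolower(values[["driver"]]) %in% supported) {
    stop("Unsupported driver: ", values[["driver"]])
  }
  as.list(values)
}

# Demo: write a throwaway .Renviron-style file and load it into this session.
# readRenviron() strips the outer quotes around each value.
demo_renviron <- tempfile(fileext = ".Renviron")
writeLines(c(
  'driver="MySQL"',
  'host="localhost"',
  'username="omop_user"',
  'password="change_me"',
  'dbname="omop"',
  'port="3306"'
), demo_renviron)
readRenviron(demo_renviron)
creds <- check_credentials()
```

Keeping the values in environment variables means scripts that call Sys.getenv() can be shared without exposing the password.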
Checks
With credentials correctly configured, the package can be loaded. On load, ROMOP checks that three conditions are met:
- Check that the credentials exist and can be retrieved from the .Renviron file: requires that driver, host, username, password, dbname, and port all exist.
- Check that a connection to the OMOP EHR server and database can be made, using the above credentials.
- Check that all required OMOP tables exist and contain (any) data. The required tables are:
"concept","concept_ancestor","concept_relationship","condition_occurrence","death",
"device_exposure","drug_exposure","measurement","observation","person","procedure_occurrence","visit_occurrence"
  - If any of the above tables are missing, a warning message is produced and the package will not load properly.
  - If a required table exists but contains no data, a warning message is produced, but the package will still function.
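The table check above can be sketched in base R as follows. This is a hypothetical reimplementation for illustration, not ROMOP's code: check_omop_tables() takes a named vector of row counts for the tables that exist. In a live session those names and counts would come from the database, for example via DBI::dbListTables() and a SELECT COUNT(*) per table.

```r
# Hypothetical sketch of the third load-time check (not ROMOP's code).
required_tables <- c("concept", "concept_ancestor", "concept_relationship",
                     "condition_occurrence", "death", "device_exposure",
                     "drug_exposure", "measurement", "observation", "person",
                     "procedure_occurrence", "visit_occurrence")

# row_counts: named integer vector, table name -> number of rows,
# containing only the tables that exist in the database
check_omop_tables <- function(row_counts) {
  missing <- setdiff(required_tables, names(row_counts))
  empty <- intersect(required_tables, names(row_counts)[row_counts == 0])
  if (length(empty) > 0) {
    # empty tables only warn; they do not block loading
    warning("Empty OMOP tables: ", paste(empty, collapse = ", "))
  }
  if (length(missing) > 0) {
    warning("Missing OMOP tables: ", paste(missing, collapse = ", "))
    return(FALSE)  # the package cannot load properly
  }
  TRUE
}

# Demo with made-up counts: every table present, but "death" is empty
counts <- setNames(rep(100L, length(required_tables)), required_tables)
counts[["death"]] <- 0L
ok <- suppressWarnings(check_omop_tables(counts))                 # TRUE
ok_missing <- suppressWarnings(check_omop_tables(counts[-1]))     # FALSE: "concept" dropped
```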
On start
Once all checks pass, the user can begin using ROMOP:
- Set an output directory with the changeOutDirectory function (a default output directory is declared on package load).
- Create or load the data ontology (required to decode data types) using the makeDataOntology function. The first time the package is run, the concept ontology must be built; if the store_ontology option is selected, the ontology is saved as an .rds file and loaded from that file on subsequent runs.
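The store_ontology behavior follows a common build-once, load-later pattern. The sketch below shows that pattern in base R with saveRDS() and readRDS(); it is not makeDataOntology's actual implementation, and load_or_build(), build_toy_ontology(), and the file name are hypothetical stand-ins for the expensive ontology build.

```r
# Hypothetical illustration of caching an expensive result as an .rds file.
load_or_build <- function(rds_path, build_fn, store = TRUE) {
  if (file.exists(rds_path)) {
    return(readRDS(rds_path))      # later runs: load the saved copy
  }
  result <- build_fn()             # first run: build from scratch
  if (store) saveRDS(result, rds_path)
  result
}

# Demo: count how many times the (toy) build actually runs
path <- file.path(tempdir(), "ontology_demo.rds")
unlink(path)                       # start from a clean state
build_calls <- 0
build_toy_ontology <- function() {
  build_calls <<- build_calls + 1
  data.frame(concept_id = c(1L, 2L), concept_name = c("A", "B"))
}
first  <- load_or_build(path, build_toy_ontology)
second <- load_or_build(path, build_toy_ontology)  # read from disk, no rebuild
```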
Functions
Utility
getDemographics
Description: Retrieves and formats patient demographic data from the person and death tables. Can optionally be restricted to a patient list of interest.
Usage: ptDemo <- getDemographics(patient_list=NULL,declare=TRUE)
Arguments:
patient_list comma-separated string of patient ids
a provided patient list restricts the search to those ids; NULL
returns demographic data for all available patients
declare TRUE/FALSE
if TRUE, outputs status and updates to the screen
Value:
Returns a data.table with demographic data: person_id, birth_datetime, age, Gender, Race, Ethnicity, death_date, Status (Alive/Deceased)
Details:
- patient_list should be in the following format: “patient_id_1, patient_id_2, …”
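Since patient_list is a single comma-separated string rather than a vector, a small helper can build it from a vector of ids. The sketch below is hypothetical (make_patient_list() is not part of ROMOP); the same string format is used by getEncounters and getClinicalData. The ROMOP call is left as a comment because it needs a configured database connection.

```r
# Hypothetical helper: turn a vector of person ids into the
# "patient_id_1, patient_id_2, ..." string used by patient_list.
make_patient_list <- function(ids) {
  if (is.numeric(ids)) {
    # avoid scientific notation such as "1e+05" for large numeric ids
    ids <- format(ids, scientific = FALSE, trim = TRUE)
  }
  # drop duplicate ids, then join with ", "
  paste(unique(as.character(ids)), collapse = ", ")
}

patient_list <- make_patient_list(c(101, 102, 103))
# With a configured ROMOP session, the string could then be passed on:
# ptDemo <- getDemographics(patient_list = patient_list, declare = TRUE)
```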
getEncounters
Description: Retrieves and formats patient encounter data from the visit_occurrence table. Requires a patient list as input.
Usage: ptEncs <- getEncounters(patient_list,declare=TRUE)
Arguments:
patient_list comma-separated string of patient ids
searches for all encounter data for the patients in the patient list
declare TRUE/FALSE
if TRUE, outputs status and updates to the screen
Value:
Returns a data.table with encounter data: person_id, visit_occurrence_id, visit_start_datetime, visit_end_datetime, visit_source_value, visit_concept, visit_source_concept, admitting_concept, discharge_concept
Details:
- patient_list should be in the following format: “patient_id_1, patient_id_2, …”
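As a rough sketch of downstream use, the base R code below computes visit length and encounter counts per patient from a getEncounters-style table. It assumes the datetime columns arrive as POSIXct values (an assumption; check the actual column types), and the table itself is made up with only a subset of the documented columns so it runs without a database.

```r
# Made-up stand-in for a getEncounters() result (subset of columns only)
ptEncs <- data.frame(
  person_id = c(1L, 1L, 2L),
  visit_occurrence_id = c(10L, 11L, 12L),
  visit_start_datetime = as.POSIXct(c("2018-01-01 08:00:00",
                                      "2018-02-10 09:00:00",
                                      "2018-03-05 12:00:00"), tz = "UTC"),
  visit_end_datetime = as.POSIXct(c("2018-01-03 08:00:00",
                                    "2018-02-10 17:00:00",
                                    "2018-03-06 12:00:00"), tz = "UTC")
)

# Length of each visit in days
ptEncs$los_days <- as.numeric(difftime(ptEncs$visit_end_datetime,
                                       ptEncs$visit_start_datetime,
                                       units = "days"))

# Number of encounters per patient
visits_per_patient <- table(ptEncs$person_id)
```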
getClinicalData
Description: Retrieves all relevant clinical data for the individuals in a patient list. Wrapper for the domain-specific getData functions (which can also be used separately).
Usage: ptClinicalData <- getClinicalData(patient_list, declare=TRUE)
Arguments:
patient_list comma-separated string of patient ids
a provided patient list restricts the search to those ids
declare TRUE/FALSE
if TRUE, outputs status and updates to the screen
Value:
Returns a list of data.tables stratified by domain type (e.g., ptClinicalData$Condition, ptClinicalData$Observation, etc…)
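Because the return value is a named list with one table per domain, standard list tools apply. The sketch below uses a made-up list of data.frames in place of data.tables so it runs without a database; the domain names Condition, Observation, and Measurement are illustrative and may not match the exact set ROMOP returns.

```r
# Made-up stand-in for a getClinicalData() result: one table per domain
ptClinicalData <- list(
  Condition   = data.frame(person_id = c(1L, 2L, 2L)),
  Observation = data.frame(person_id = 1L),
  Measurement = data.frame(person_id = integer(0))
)

# Row count per domain (nrow() returns an integer for each table)
rows_per_domain <- vapply(ptClinicalData, nrow, integer(1))

# Keep only the domains that returned at least one row
non_empty <- ptClinicalData[rows_per_domain > 0]
```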
Details:
- patient_list should be in the following format: “patient_id_1, patient_id_2, …”
- getClinicalData calls domain-specific getData functions for the following domains: Observation, Conditi
