Medicalcoder
R package for working with medical coding schemas and identifying records with specific comorbidities
medicalcoder: A Unified and Longitudinally Aware Framework for ICD-Based Comorbidity Assessment <img src="man/figures/hex.svg" width="200px" align="right" alt = "medicalcoder hex logo"/>
medicalcoder is a lightweight, base-R package for working with ICD-9 and ICD-10 diagnosis and procedure codes. It implements widely used comorbidity algorithms such as Charlson, Elixhauser, and the Pediatric Complex Chronic Conditions (PCCC), supports longitudinal comorbidity flagging across encounters, and provides fast, dependency-free utilities to look up, validate, and manipulate ICD codes. Designed for portability and reproducibility, the package avoids external dependencies (it requires only R ≥ 3.5.0), yet it offers a rich set of curated ICD code libraries from the United States' Centers for Medicare & Medicaid Services (CMS), the Centers for Disease Control and Prevention (CDC), and the World Health Organization (WHO).
The package balances performance with elegance: its internal caching, efficient joins, and compact data structures make it practical for large-scale health data analyses, while its clean design makes it easy to extend or audit. Whether you need to flag comorbidities, explore ICD hierarchies, or standardize clinical coding workflows, medicalcoder provides a robust, transparent foundation for research and applied work in biomedical informatics.
The primary objectives of medicalcoder are:
- Fully self-contained
  - Minimal dependencies
    - No dependencies other than base R.
    - Requires R version ≥ 3.5.0 due to a change in data serialization. R 3.5.0 was released in April 2018; the initial public release of medicalcoder was in 2025.
    - Several packages are listed in the Suggests section of the DESCRIPTION file. These are needed only for building vignettes and other documentation, and for testing. They are not required to install the package.
  - No Imports
    - medicalcoder does not import any non-base namespaces, which makes it easier to maintain and to use.
    - Suggested packages are needed only for development work and building vignettes. They are not required for installation or use.
    - That said, there are non-trivial performance gains when passing a data.table to the comorbidities() function. Passing a tibble is typically faster than passing a base data.frame, but slower than passing a data.table. (See benchmarking.)
  - Internal lookup tables
    - All required data are included in the package. If you have the .tar.gz source file and R ≥ 3.5.0, that is all you need to install and use the package.
- Efficient implementation of multiple comorbidity algorithms
  - Implements three general algorithms, each with multiple variants. Details are provided below.
  - Supports flagging of subconditions within PCCC.
  - Supports longitudinal flagging of comorbidities. medicalcoder flags comorbidities based on present-on-admission indicators for the current encounter, and it can look back in time to flag a comorbidity for a patient if the comorbidity was reported in a prior encounter. See examples.
- Tools for working with ICD codes
  - Lookup tables.
  - Ability to work with both full codes (ICD codes with decimal points) and compact codes (ICD codes with decimal points omitted).
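To make the full/compact distinction concrete, here is a minimal base-R sketch. The helpers `to_compact()` and `icd10cm_dx_to_full()` are hypothetical and are not part of medicalcoder's API; the sketch assumes ICD-10-CM diagnosis codes only, where the decimal point follows the third character.

```r
# Hypothetical helpers for illustration only; these are NOT medicalcoder
# functions. Full codes carry a decimal point; compact codes omit it.
to_compact <- function(x) {
  gsub(".", "", x, fixed = TRUE)
}

# ICD-10-CM diagnosis codes place the decimal point after the third
# character. (ICD-9 E codes and ICD-9 procedure codes follow other rules,
# which this sketch does not handle.)
icd10cm_dx_to_full <- function(x) {
  x <- to_compact(x)
  ifelse(nchar(x) > 3L,
         paste0(substr(x, 1L, 3L), ".", substr(x, 4L, nchar(x))),
         x)
}

full <- c("Z77.22", "J95.851", "V87.7XXA", "Z93.0", "I10")
compact <- to_compact(full)
print(compact)
print(icd10cm_dx_to_full(compact))  # round-trips back to the full codes
```

Going from full to compact is a simple character deletion, but going back requires knowing the code system, which is why a package-level lookup table is the more robust approach.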
Why use medicalcoder
There are several tools for working with ICD codes and comorbidity algorithms. medicalcoder provides novel features:
- Unified access to multiple comorbidity algorithms through a single function: comorbidities().
- Support for both ICD-9 and ICD-10 diagnostic and procedure codes.
- Longitudinal patient-level comorbidity flagging using present-on-admission indicators.
- Fully self-contained package (no external dependencies).
Install
From CRAN
```r
install.packages("medicalcoder")
```
From GitHub (requires the remotes package)
```r
remotes::install_github("dewittpe/medicalcoder")
```
From source
If you have the .tar.gz file for version X.Y.Z, e.g., medicalcoder_X.Y.Z.tar.gz,
you can install it from within R via:
```r
install.packages(pkgs = "medicalcoder_X.Y.Z.tar.gz", repos = NULL, type = "source")
```
From the command line:
```sh
R CMD INSTALL medicalcoder_X.Y.Z.tar.gz
```
Quick Start
Example Data
Input data for comorbidities() is expected to be in a 'long' format: each row
is one code, with additional columns for the patient and/or encounter id. There are
two example data sets in the package: mdcr and mdcr_longitudinal.
```r
data(mdcr, mdcr_longitudinal, package = "medicalcoder")
```
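If your data are 'wide' (one row per patient with columns such as dx1, dx2, ...), they need to be stacked into the long format first. The sketch below uses base R's `reshape()`; the wide data and the column names dx1, dx2, and slot are hypothetical, while the output columns mirror those of mdcr.

```r
# Hypothetical "wide" data: one row per patient, one column per code slot.
wide <- data.frame(
  patid = c(1L, 2L),
  icdv  = c(10L, 10L),
  dx1   = c("Z77.22", "J95.851"),
  dx2   = c("Z93.0", NA),
  stringsAsFactors = FALSE
)

# Stack the code slots into a single 'code' column (one row per code).
long <- reshape(
  wide,
  direction = "long",
  varying   = list(c("dx1", "dx2")),
  v.names   = "code",
  timevar   = "slot",
  times     = 1:2,
  idvar     = "patid"
)

# Drop empty slots, mark every code as a diagnosis (dx = 1), and tidy up.
long <- long[!is.na(long$code), ]
long$dx <- 1L
long <- long[order(long$patid, long$slot), c("patid", "icdv", "code", "dx")]
rownames(long) <- NULL
print(long)
```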
The mdcr data set consists of 319 856 rows.
Each row contains one ICD code (code). The icdv column denotes whether
each code is ICD-9 or ICD-10, and the dx column denotes whether it is a
diagnostic (1) or procedure (0) code. The data set contains diagnostic and
procedure codes for 38 262 patients.
```r
str(mdcr)
#> 'data.frame': 319856 obs. of 4 variables:
#> $ patid: int 71412 71412 71412 71412 71412 17087 64424 64424 84361 84361 ...
#> $ icdv : int 9 9 9 9 9 10 9 9 9 9 ...
#> $ code : chr "99931" "75169" "99591" "V5865" ...
#> $ dx : int 1 1 1 1 1 1 1 0 1 1 ...
head(mdcr)
#>   patid icdv  code dx
#> 1 71412    9 99931  1
#> 2 71412    9 75169  1
#> 3 71412    9 99591  1
#> 4 71412    9 V5865  1
#> 5 71412    9  V427  1
#> 6 17087   10  V441  1
```
The mdcr_longitudinal data set is distinct from the mdcr data set. The major
differences are that it contains only diagnostic codes (there is no dx column)
and only 3 patients. The date column denotes the date of the diagnosis, which
allows us to look at changes in comorbidities over time.
```r
str(mdcr_longitudinal)
#> 'data.frame': 60 obs. of 4 variables:
#> $ patid: int 9663901 9663901 9663901 9663901 9663901 9663901 9663901 9663901 9663901 9663901 ...
#> $ date : IDate, format: "2016-03-18" "2016-03-24" ...
#> $ icdv : int 10 10 10 10 10 10 10 10 10 10 ...
#> $ code : chr "Z77.22" "IMO0002" "V87.7XXA" "J95.851" ...
head(mdcr_longitudinal)
#>     patid       date icdv     code
#> 1 9663901 2016-03-18   10   Z77.22
#> 2 9663901 2016-03-24   10  IMO0002
#> 3 9663901 2016-03-24   10 V87.7XXA
#> 4 9663901 2016-03-25   10  J95.851
#> 5 9663901 2016-03-30   10  IMO0002
#> 6 9663901 2016-03-30   10    Z93.0
```
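The idea behind longitudinal flagging can be sketched in base R. This is a simplified illustration, not medicalcoder's implementation: it assumes one code per encounter, a hypothetical poa column, and a hypothetical one-code "condition". A code flags the current encounter only if it is present on admission, but once reported it flags every later encounter for that patient.

```r
# Hypothetical encounter data: one code per encounter to keep it simple.
enc <- data.frame(
  patid = c(1L, 1L, 1L, 2L, 2L),
  date  = as.Date(c("2016-03-18", "2016-03-24", "2016-03-30",
                    "2016-01-05", "2016-02-10")),
  code  = c("Z77.22", "Z93.0", "J95.851", "I10", "Z93.0"),
  poa   = c(1L, 0L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical code set for one illustrative condition (not a real
# medicalcoder definition).
condition_codes <- c("Z93.0")

enc <- enc[order(enc$patid, enc$date), ]
in_set <- enc$code %in% condition_codes

# Was the condition reported at any *earlier* encounter of the same patient?
prior <- ave(as.integer(in_set), enc$patid,
             FUN = function(x) c(0L, cummax(x)[-length(x)]))

# Flag the current encounter if the code is present on admission now,
# or if it was reported at a prior encounter (look-back).
enc$flag <- as.integer((in_set & enc$poa == 1L) | prior == 1L)
print(enc)
```

For patient 1 the Z93.0 code is not present on admission, so that encounter is not flagged, but the following encounter is. The package's actual rules (for example, how several codes on the same date are handled) may differ; see the comorbidities vignette.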
Comorbidity Algorithms
There are three comorbidity methods, each with several variants, available in
medicalcoder. All of them are accessible through the comorbidities()
function by specifying the method argument.
General examples, and explanations of when conditions are flagged, are in the vignette:
```r
vignette(topic = "comorbidities", package = "medicalcoder")
```
Pediatric Complex Chronic Conditions (PCCC)
- Version 2.0
  - BMC Pediatrics: Feudtner et al. (2014)
  - Consistent with the R package pccc
- Version 2.1
  - Updated code base with the same assessment algorithm as version 2.0.
- Version 3.0
  - JAMA Network Open: Feinstein et al. (2024)
  - Children's Hospital Association Toolkit
- Version 3.1
  - Updated code base with the same assessment algorithm as version 3.0.

All variants can flag conditions and subconditions.
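Conceptually, flagging conditions and subconditions amounts to joining long-format codes against a lookup table and rolling subconditions up to conditions. The base-R sketch below illustrates that idea only; the lookup table, the placeholder codes "c1" to "c4", and the labels are hypothetical and are not medicalcoder's PCCC definitions, and the package's internals may differ.

```r
# Hypothetical lookup table mapping codes to conditions and subconditions.
# Codes "c1".."c4" and the labels are placeholders, NOT medicalcoder data.
lookup <- data.frame(
  code         = c("c1", "c2", "c3", "c4"),
  condition    = c("cvd", "cvd", "cvd", "gi"),
  subcondition = c("congenital", "congenital", "cardiomyopathy", "device"),
  stringsAsFactors = FALSE
)

# Long-format patient data (one row per code).
pt <- data.frame(
  patid = c(1L, 1L, 2L, 3L),
  code  = c("c1", "c4", "c3", "c9"),
  stringsAsFactors = FALSE
)

# Inner join: keep only codes that appear in the lookup table.
hits <- merge(pt, lookup, by = "code")

# Subcondition flags: one row per patient/condition/subcondition found.
sub_flags <- unique(hits[, c("patid", "condition", "subcondition")])

# Condition flags as a patient-by-condition 0/1 table; a condition is
# flagged when any of its codes (and hence subconditions) is found.
cond <- unique(hits[, c("patid", "condition")])
cond_flags <- table(
  patid     = factor(cond$patid, levels = sort(unique(pt$patid))),
  condition = factor(cond$condition, levels = sort(unique(lookup$condition)))
)
print(sub_flags)
print(cond_flags)
```

Patient 3 has no matching codes, so it appears in cond_flags with all zeros because the patid factor keeps every patient as a level.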
```r
# PCCC v2.1 and v3.1 example
library(medicalcoder)
cmrbs2 <-
  comorbidities(
    data = mdcr,
    id.vars = "patid", # can use more than one column, e.g., site, patient, encounter
    icd.codes = "code",
    dx.var = "dx",
    poa = 1, # consider all codes to be present on admission
    method = "pccc_v2.1"
  )
cmrbs3 <-
  comorbidities(
    data = mdcr,
    id.vars = "patid",
    icd.codes = "code",
    dx.var = "dx",
    poa = 1, # consider all codes to be present on admission
    method = "pccc_v3.1"
  )
str(cmrbs2, max.level = 0)
#> Classes 'medicalcoder_comorbidities' and 'data.frame': 38262 obs. of 16 variables:
#> - attr(*, "method")= chr "pccc_v2.1"
#> - attr(*, "id.vars")= chr "patid"
#> - attr(*, "flag.method")= chr "current"
str(cmrbs3, max.level = 0)
#> Classes 'medicalcoder_comorbidities' and 'data.frame': 38262 obs. of 49 variables:
#> - attr(*, "method")= chr "pccc_v3.1"
#> - attr(*, "id.vars")= chr "patid"
#> - attr(*, "flag.method")= chr "current"
```
A summary of the flagged conditions is generated with a call to summary().
```r
s2 <- summary(cmrbs2)
str(s2)
```
For pccc_v2.0 and pccc_v2.1, the data.frame returned by summary()
reports the count (the number of unique id.vars with the condition) and the percentage.
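The count and percentage can be illustrated with a base-R sketch on a hypothetical table of patient-level 0/1 flags. The column names are illustrative, and the sketch assumes the percentage is taken relative to the number of unique ids; this is not the code behind summary().

```r
# Hypothetical patient-level 0/1 flags (column names are illustrative).
flags <- data.frame(
  patid       = c(1L, 2L, 3L, 4L),
  cvd         = c(1L, 0L, 1L, 0L),
  respiratory = c(0L, 0L, 1L, 0L)
)
conds <- c("cvd", "respiratory")
n_ids <- length(unique(flags$patid))

smry <- data.frame(
  condition = conds,
  # count = number of unique ids with the condition flagged
  count = vapply(conds,
                 function(cnd) length(unique(flags$patid[flags[[cnd]] == 1L])),
                 integer(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
# percentage of unique ids with the condition
smry$percent <- 100 * smry$count / n_ids
print(smry)
```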
s3 <- summary(cmrbs3)
st
