dtms - An R package for discrete-time multistate models
<!-- badges: start --> <!-- badges: end -->
Authors
Christian Dudel, dudel@demogr.mpg.de
Disclaimer
This package is currently undergoing development and many functions are experimental. The package comes with no warranty. The content of this repository will change in the future, and functions and features might be removed or changed without warning.
Acknowledgements
We thank Peng Li, Alessandro Feraldi, Aapo Hiilamo, Daniel Schneider, Donata Stonkute, Marcus Ebeling, Flavia Mazzeo, and Angelo Lorenti for helpful comments, suggestions, and code snippets. All errors remain our own.
Citation
If you use this package in your work, please use the following citation (or a variation):
Dudel, C. (2026). dtms: discrete-time multistate models in R. R package version 0.4.2, available at https://CRAN.R-project.org/package=dtms
Overview
The package dtms implements discrete-time multistate models in R. It
comes with many tools to analyze the results of multistate models. The
workflow mainly consists of estimating a discrete-time multistate model
and then applying methods for absorbing Markov chains.
Currently, the following features are implemented:
- Data handling: functions for reshaping and aggregating data, cleaning data, editing states, generating indicators of duration and number of occurrences of a state, indicators of censoring, descriptive information on different types of censoring, and other general descriptive statistics.
- Estimation of transition probabilities: nonparametric estimation; semiparametric estimation (VGAM), random effects and random intercepts (mclogit), and neural networks (nnet); all possible for constrained and unconstrained/fully interacted models. Functions for descriptive statistics on transition probabilities and for plotting them are also available.
- Markov chain methods: survivorship function; (partial) state/life expectancy; (partial) lifetime risk; (partial) distributions of occupation time, waiting time to first visit, waiting time to last exit, and waiting time to absorption; variance/standard deviation and median derived from these (partial) distributions; Markov chains with rewards.
- Inference: analytic standard errors and variance-covariance matrix for transition probabilities; for all quantities inference using the resampling bootstrap, the block bootstrap, and a parametric bootstrap, supporting parallel computing.
- Other features: simulation of Markov chains using the package markovchain; survey weights (experimental); irregular time intervals (experimental).
- Examples: the package comes with two simulated data sets which are used for examples. These are described further below. The input data and code for the simulated data is available at https://github.com/christiandudel/dtms_data/.
The documentation provided below does not currently describe all features of the package.
Content
Currently, the following topics are covered in this documentation:
- Installation
- General workflow and basic principles
  - Model setup
  - Preparing and handling data
  - Estimating transition probabilities
  - Markov chain methods
- Example 1: artificial data
  - Data description
  - Model setup
  - Preparing and handling data
  - Estimating transition probabilities
  - Markov chain methods
- Example 2: simulated working trajectories
  - Data description
  - Analysis
- Variance estimation
- Irregular intervals
- Splines, random effects, and random slopes
- Combining dtms with other software
- Using dtms with secure data environments
- Using dtms with large data sets
- References
Installation
You can install dtms from CRAN:
```r
install.packages("dtms")
```
Alternatively, you can install the development version of dtms from
GitHub:
```r
install.packages("remotes")
remotes::install_github("christiandudel/dtms")
```
The development version from GitHub might include fixes and changes which are not on CRAN yet, in particular if they are relatively minor.
General workflow and basic principles
The basic workflow consists of four main steps. First, the multistate model is defined in a general way which describes the states included in the model and its timescale (model setup). Second, the input data has to be reshaped and cleaned. Third, transition probabilities are estimated, either via multinomial logistic regression or with nonparametric methods. Fourth, Markov chain methods are applied to calculate statistics to describe the model. These steps and the corresponding functions are described below. Note that not all arguments of each function are described, and in many cases the help files for the individual functions contain useful additional information.
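To make the fourth step more concrete, the following base R sketch applies a standard absorbing Markov chain result to a small time-homogeneous chain. The transition probabilities are made-up numbers chosen for illustration, and the sketch neither uses dtms nor reflects its internals (dtms works with transition probabilities that may depend on the time scale). It only shows the kind of quantity that Markov chain methods deliver.

```r
## Hypothetical transition probabilities among the transient states A and B.
## The probability mass missing from each row is the transition to the
## absorbing state X.
U <- matrix(c(0.7, 0.2,
              0.3, 0.5),
            nrow = 2, byrow = TRUE,
            dimnames = list(from = c("A", "B"), to = c("A", "B")))

## Probability of moving to the absorbing state X in one step
to_absorbing <- 1 - rowSums(U)

## Fundamental matrix: expected number of time steps spent in each transient
## state (columns) before absorption, by starting state (rows), counting
## the starting period itself
N <- solve(diag(nrow(U)) - U)

## Expected total number of time steps before absorption, by starting state
expected_total <- rowSums(N)

print(round(N, 3))
print(round(expected_total, 3))
```

With time-dependent transition probabilities the same idea carries over, but the matrix of transient transitions becomes larger because each combination of state and time is treated as its own state.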
Model setup
To use the dtms package, discrete-time multistate models are first
defined in an abstract way using three components: the names of the
transient states, the names of the absorbing states, and the values of
the time scale. Moreover, there are two additional components which the
user does not necessarily need to specify: the step length of the
timescale, and a separator.
To define these components, the function dtms() is used. It has an
argument for each of the components, but only three are necessary: the
names of the transient states, the names of the absorbing states, and
the values of the time scale. The step length of the timescale is
implicitly defined by the values of the timescale, and the separator
uses a default value which users likely do not want to change in a
majority of applications. In the first example provided further below,
the function dtms() is called like this:
```r
## Load package
library(dtms)

## Define model: Absorbing and transient states, time scale
simple <- dtms(transient=c("A","B"),
               absorbing="X",
               timescale=0:19)
```
The arguments transient and absorbing take the names of the
transient and absorbing states, respectively, which are specified as
character vectors. In the example above, there are two transient states
called A and B, and one absorbing state called X. Each model needs
at least one transient state and one absorbing state. The argument
timescale takes the values of the timescale which are specified with a
numeric vector. In this example, the time scale starts at 0 and stops at
19, with a step length of 1. The step length does not need to equal 1;
e.g., 0, 2, 4, 6, … would be fine. Moreover, using the argument
timestep, several values for the step length can be specified.
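As a small illustration of a step length other than 1, the sketch below builds a time scale running from 0 to 18 in steps of 2 and checks the implied step length in base R. The call to dtms() uses only the three arguments shown above and is skipped if the package is not installed. The object names timescale2, step, and every_other are chosen here for illustration.

```r
## Time scale with a step length of 2: 0, 2, 4, ..., 18
timescale2 <- seq(0, 18, by = 2)

## The step length is implied by the differences between consecutive values
step <- unique(diff(timescale2))
print(step)

## Define the model only if dtms is available
if (requireNamespace("dtms", quietly = TRUE)) {
  every_other <- dtms::dtms(transient = c("A", "B"),
                            absorbing = "X",
                            timescale = timescale2)
}
```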
The separator is a character string used to construct what we call long
state names. Its default is _. Long state names consist of a
combination of the names of the transient states with values of the time
scale. They are used internally to handle that transition probabilities
might depend on values of the time scale. Long state names are never
constructed for absorbing states. For instance, if the transient states
are called A and B, the time scale can take on values 0, 1, and 2,
and the separator is _, then the following long state names will be
used: A_0, A_1, A_2, B_0, B_1, and B_2. Due to the temporal
ordering of states not all transitions between these states are
possible; e.g., it is not possible to transition from A_2 to B_0.
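The naming scheme can be reproduced with a few lines of base R. This is only a sketch of the convention described above, not the code dtms uses internally. It also assumes, for illustration, a regular time scale on which a transition always leads from one value of the time scale to the next.

```r
## Components from the example in the text
transient <- c("A", "B")
timescale <- 0:2
sep <- "_"

## Long state names: every transient state combined with every time value
long_names <- paste(rep(transient, each = length(timescale)),
                    rep(timescale, times = length(transient)),
                    sep = sep)
print(long_names)

## Recover the time value from each long state name
time_of <- as.numeric(sub(paste0("^.*", sep), "", long_names))

## Assumption for this sketch: a transition is only possible if it moves
## exactly one step forward on the time scale
possible <- outer(time_of, time_of, function(from, to) to == from + 1)
dimnames(possible) <- list(from = long_names, to = long_names)

possible["A_2", "B_0"]  # moving backwards in time
possible["A_0", "B_1"]  # moving one step forward
```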
Preparing and handling data
The input data has to be panel data in long format. If your data is not in this shape, there are many tools already available in R which allow you to reshape it. An example of data in long format could look like this:
| idvar | timevar | statevar | X   | Y    |
|:------|:--------|:---------|:----|:-----|
| 1     | 0       | A        | 2   | 1020 |
| 1     | 1       | A        | 2   | 1025 |
| 1     | 2       | B        | 2   | 1015 |
| 1     | 3       | A        | 2   | 1000 |
| 2     | 0       | B        | 1   | 2300 |
| 2     | 1       | A        | 1   | 2321 |
| …     | …       | …        | …   | …    |
The first variable, idvar, contains a unit identifier. The first four
rows of the data belong to unit 1. The variable timevar has the
values of the timescale. statevar shows the state each unit is
occupying at a given time. Ideally, the states are provided as character
strings; numeric values will also work. Factors are, however, currently
not supported. X and Y are additional covariates.
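If the state variable arrives as a factor, it can be converted to character strings before any further processing. The following base R sketch builds the six example rows shown in the table, deliberately storing the states as a factor, and then converts them. The object name raw is chosen here for illustration.

```r
## Example rows from the table, with the states stored as a factor
raw <- data.frame(
  idvar    = c(1, 1, 1, 1, 2, 2),
  timevar  = c(0, 1, 2, 3, 0, 1),
  statevar = factor(c("A", "A", "B", "A", "B", "A")),
  X        = c(2, 2, 2, 2, 1, 1),
  Y        = c(1020, 1025, 1015, 1000, 2300, 2321)
)

## Factors are not supported, so convert the states to character strings
if (is.factor(raw$statevar)) {
  raw$statevar <- as.character(raw$statevar)
}

## Sort by unit and time so that each unit's observations are in order
raw <- raw[order(raw$idvar, raw$timevar), ]

str(raw)
```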
The dtms package provides tools to reshape this data into what we call
transition format. For the example data shown above the transition
format looks like this:
| idvar | timevar | fromvar | tovar | X   | Y    |
|:------|:--------|:--------|:------|:----|:-----|
| 1     | 0       | A       | A     | 2   | 1020 |
| 1     | 1       | A       | B     | 2   | 1025 |
| 1     | 2       | B       | A     | 2   | 1015 |
| 2     | 0       | B       | A     | 1   | 2300 |
| …     | …       | …       | …     | …   | …    |
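The mapping from the long-format table to the transition-format table can be sketched in base R as follows. This is an illustration of the idea, not the reshaping code provided by dtms. The helper to_transition() is hypothetical, and it assumes a regular step length and simply drops observations that have no follow-up observation one step later.

```r
## Example rows in long format (the six rows shown in the first table)
long <- data.frame(
  idvar    = c(1, 1, 1, 1, 2, 2),
  timevar  = c(0, 1, 2, 3, 0, 1),
  statevar = c("A", "A", "B", "A", "B", "A"),
  X        = c(2, 2, 2, 2, 1, 1),
  Y        = c(1020, 1025, 1015, 1000, 2300, 2321),
  stringsAsFactors = FALSE
)

## Hypothetical helper: pair each observation with the next one of the same
## unit, provided the two are exactly one step apart on the time scale
to_transition <- function(data, step = 1) {
  data <- data[order(data$idvar, data$timevar), ]
  n <- nrow(data)
  same_unit <- data$idvar[-n] == data$idvar[-1]
  one_step  <- (data$timevar[-1] - data$timevar[-n]) == step
  keep <- which(same_unit & one_step)
  data.frame(
    idvar   = data$idvar[keep],
    timevar = data$timevar[keep],
    fromvar = data$statevar[keep],      # state currently occupied
    tovar   = data$statevar[keep + 1],  # state at the next time value
    X       = data$X[keep],
    Y       = data$Y[keep],
    stringsAsFactors = FALSE
  )
}

transitions <- to_transition(long)
print(transitions)
```

Covariates are taken from the starting observation of each transition, which matches the values of X and Y shown in the transition-format table.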
Each row shows for each unit (idvar) and given time (timevar) the
state currently occupied (fromvar) and the state the unit will
transition to at the next value of the time scale (tovar). For unit 1,
the last observation in long format is at time 3. However, this is the
final observation and there is no transition to another state after
this. Thi
