dtms - An R package for discrete-time multistate models

Authors

Disclaimer

This package is currently undergoing development and many functions are experimental. The package comes with no warranty. The content of this repository will change in the future, and functions and features might be removed or changed without warning.

Acknowledgements

We thank Peng Li, Alessandro Feraldi, Aapo Hiilamo, Daniel Schneider, Donata Stonkute, Marcus Ebeling, Flavia Mazzeo, and Angelo Lorenti for helpful comments, suggestions, and code snippets. All errors remain our own.

Citation

If you use this package in your work, please use the following citation (or a variation):

Dudel, C. (2026). dtms: discrete-time multistate models in R. R package version 0.4.2, available at https://CRAN.R-project.org/package=dtms

Overview

The package dtms implements discrete-time multistate models in R. It comes with many tools to analyze the results of multistate models. The workflow mainly consists of estimating a discrete-time multistate model and then applying methods for absorbing Markov chains.

Currently, the following features are implemented:

Data handling: functions for reshaping and aggregating data, cleaning data, editing states, generating indicators of duration and number of occurrences of a state, indicators of censoring, descriptive information on different types of censoring, and other general descriptive statistics.
Estimation of transition probabilities: nonparametric estimation; semiparametric estimation (VGAM), random effects and random intercepts (mclogit), and neural networks (nnet); all possible for constrained and unconstrained/fully interacted models. Functions for descriptive statistics on transition probabilities and for plotting them are also available.
Markov chain methods: survivorship function; (partial) state/life expectancy; (partial) lifetime risk; (partial) distributions of occupation time, waiting time to first visit, waiting time to last exit, and waiting time to absorption; variance/standard deviation and median based on these (partial) distributions; Markov chains with rewards.
Inference: analytic standard errors and variance-covariance matrix for transition probabilities; for all quantities inference using the resampling bootstrap, the block bootstrap, and a parametric bootstrap, supporting parallel computing.
Other features: simulation of Markov chains using the package markovchain; survey weights (experimental); irregular time intervals (experimental).
Examples: the package comes with two simulated data sets which are used for examples. These are described further below. The input data and code for the simulated data is available at https://github.com/christiandudel/dtms_data/.

The documentation provided below currently does not describe all features of the package.

Content

Currently, the following topics are covered in this documentation:

Installation
General workflow and basic principles
- Model setup
- Preparing and handling data
- Estimating transition probabilities
- Markov chain methods
Example 1: artificial data
- Data description
- Model setup
- Preparing and handling data
- Estimating transition probabilities
- Markov chain methods
Example 2: simulated working trajectories
- Data description
- Analysis
- Variance estimation
Irregular intervals
Splines, random effects, and random slopes
Combining dtms with other software
Using dtms with secure data environments
Using dtms with large data sets
References

Installation

You can install dtms from CRAN:

install.packages("dtms")

Alternatively, you can install the development version of dtms from GitHub:

install.packages("remotes")
remotes::install_github("christiandudel/dtms")

The development version from GitHub might include fixes and changes which are not on CRAN yet, in particular if they are relatively minor.

General workflow and basic principles

The basic workflow consists of four main steps. First, the multistate model is defined in a general way which describes the states included in the model and its timescale (model setup). Second, the input data has to be reshaped and cleaned. Third, transition probabilities are estimated, either via multinomial logistic regression or with nonparametric methods. Fourth, Markov chain methods are applied to calculate statistics to describe the model. These steps and the corresponding functions are described below. Note that not all arguments of each function are described, and in many cases the help files for the individual functions contain useful additional information.
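
To make the fourth step more concrete, the following is a minimal base R sketch of a standard absorbing Markov chain calculation. It is not the code dtms uses internally. It assumes a time-homogeneous chain with two transient states and one absorbing state, and the transition probabilities are made-up numbers chosen purely for illustration.

```r
# Hypothetical transition probabilities (rows: from, columns: to).
# These numbers are invented for illustration and not estimated from data.
P <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.7, 0.2,
              0.0, 0.0, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "X"), c("A", "B", "X")))

# Each row of a transition matrix must sum to one
stopifnot(isTRUE(all.equal(unname(rowSums(P)), rep(1, 3))))

# Transitions among the transient states only
transient <- c("A", "B")
U <- P[transient, transient]

# Fundamental matrix: expected number of time steps spent in each
# transient state (columns), counting the starting period, by
# starting state (rows)
N <- solve(diag(length(transient)) - U)

# Expected number of time steps until absorption, by starting state
time_to_absorption <- rowSums(N)

print(N)
print(time_to_absorption)
```

In the models handled by the package, transition probabilities may additionally depend on the time scale, so the sketch above only conveys the general idea behind the Markov chain methods.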

Model setup

To use the dtms package, discrete-time multistate models are first defined in an abstract way using three components: the names of the transient states; the names of the absorbing states; and the values of the time scale. Moreover, there are two additional components which the user does not necessarily need to specify: the step length of the timescale, and a separator.

To define these components, the function dtms() is used. It has an argument for each of the components, but only three are necessary: the names of the transient states, the names of the absorbing states, and the values of the time scale. The step length of the timescale is implicitly defined by the values of the timescale, and the separator uses a default value which users are unlikely to want to change in most applications. In the first example provided further below, the function dtms() is called like this:

## Load package
library(dtms)
## Define model: Absorbing and transient states, time scale
simple <- dtms(transient=c("A","B"),
               absorbing="X",
               timescale=0:19)

The arguments transient and absorbing take the names of the transient and absorbing states, respectively, which are specified as character vectors. In the example above, there are two transient states called A and B, and one absorbing state called X. Each model needs at least one transient state and one absorbing state. The argument timescale takes the values of the timescale, which are specified as a numeric vector. In this example, the time scale starts at 0 and stops at 19, with a step length of 1. The step length does not need to equal 1; e.g., 0, 2, 4, 6, … would be fine. Moreover, using the argument timestep, several values for the step length can be specified.
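
As a small illustration of how a step length follows from the values of a timescale, the base R sketch below computes the differences between consecutive values. It only illustrates the idea and is not taken from the internals of dtms; all variable names are hypothetical.

```r
# Timescale from the example above: step length 1
timescale_a <- 0:19
step_a <- unique(diff(timescale_a))

# Timescale with step length 2
timescale_b <- seq(0, 18, by = 2)
step_b <- unique(diff(timescale_b))

# Hypothetical irregular timescale: more than one step length occurs
timescale_c <- c(0, 1, 3, 4, 6)
step_c <- sort(unique(diff(timescale_c)))

print(step_a)
print(step_b)
print(step_c)
```

The third case corresponds to a situation in which several step lengths would be needed.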

The separator is a character string used to construct what we call long state names. Its default is _. Long state names consist of a combination of the names of the transient states with values of the time scale. They are used internally to handle that transition probabilities might depend on values of the time scale. Long state names are never constructed for absorbing states. For instance, if the transient states are called A and B, the time scale can take on values 0, 1, and 2, and the separator is _, then the following long state names will be used: A_0, A_1, A_2, B_0, B_1, and B_2. Due to the temporal ordering of states not all transitions between these states are possible; e.g., it is not possible to transition from A_2 to B_0.
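
The construction of long state names described above can be sketched in base R as follows. This is an illustration under the stated example (transient states A and B, time values 0, 1, and 2, separator _), not the package's internal code, and the object names are hypothetical. The second part assumes a step length of 1.

```r
transient <- c("A", "B")
timescale <- 0:2
sep <- "_"

# Long state names: every transient state combined with every time value
long_names <- paste(rep(transient, each = length(timescale)),
                    rep(timescale, times = length(transient)),
                    sep = sep)
print(long_names)

# Time value attached to each long state name
long_time <- rep(timescale, times = length(transient))

# With a step length of 1 (assumed here), a transition is only possible
# from a state at time t to a state at time t + 1
possible <- outer(long_time, long_time,
                  function(from, to) to == from + 1)
dimnames(possible) <- list(from = long_names, to = long_names)

possible["A_2", "B_0"]  # FALSE: moving backwards in time is impossible
possible["A_0", "B_1"]  # TRUE
```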

Preparing and handling data

The input data has to be panel data in long format. If your data is not in this shape, there are many tools already available in R which allow you to reshape it. An example of data in long format could look like this:

| idvar | timevar | statevar | X | Y |
|:------|:--------|:---------|:----|:-----|
| 1 | 0 | A | 2 | 1020 |
| 1 | 1 | A | 2 | 1025 |
| 1 | 2 | B | 2 | 1015 |
| 1 | 3 | A | 2 | 1000 |
| 2 | 0 | B | 1 | 2300 |
| 2 | 1 | A | 1 | 2321 |
| … | … | … | … | … |

The first variable, idvar, contains a unit identifier. The first four rows of the data belong to unit 1. The variable timevar contains the values of the timescale. The variable statevar shows the state each unit is occupying at a given time. Ideally, the states are provided as character strings; numeric values will also work. Factors, however, are currently not supported. X and Y are additional covariates.
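
Before reshaping, it can be useful to check that the data matches this description. The base R sketch below uses a few rows modelled on the table above, with statevar deliberately stored as a factor. The checks and object names are our own suggestions and not functions of the package.

```r
# Hypothetical long-format data; statevar is a factor on purpose
dat <- data.frame(
  idvar    = c(1, 1, 1, 2, 2),
  timevar  = c(0, 1, 2, 0, 1),
  statevar = factor(c("A", "A", "B", "B", "A"))
)

# Factors are not supported, so convert the state variable to character
if (is.factor(dat$statevar)) {
  dat$statevar <- as.character(dat$statevar)
}

# Each unit should appear at most once per value of the time scale
no_duplicates <- anyDuplicated(dat[, c("idvar", "timevar")]) == 0

# All observed states should be among the states of the model
# (here assumed to be A, B, and the absorbing state X)
model_states <- c("A", "B", "X")
known_states <- all(dat$statevar %in% model_states)

print(no_duplicates)
print(known_states)
```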

The dtms package provides tools to reshape this data into what we call transition format. For the example data shown above the transition format looks like this:

| idvar | timevar | fromvar | tovar | X | Y |
|:------|:--------|:--------|:------|:----|:-----|
| 1 | 0 | A | A | 2 | 1020 |
| 1 | 1 | A | B | 2 | 1025 |
| 1 | 2 | B | A | 2 | 1015 |
| 2 | 0 | B | A | 1 | 2300 |
| … | … | … | … | … | … |
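
The step from long format to transition format can be sketched in base R as follows. This is only an illustration of the idea, using the rows shown in the tables, and not the function the package provides for this purpose. It assumes a step length of 1, and the object names are hypothetical.

```r
# Long-format example data from the table above
long <- data.frame(
  idvar    = c(1, 1, 1, 1, 2, 2),
  timevar  = c(0, 1, 2, 3, 0, 1),
  statevar = c("A", "A", "B", "A", "B", "A"),
  X        = c(2, 2, 2, 2, 1, 1),
  Y        = c(1020, 1025, 1015, 1000, 2300, 2321),
  stringsAsFactors = FALSE
)

# For each observation, the state at the next time value becomes 'tovar'.
# Shifting time by one step (assumed step length: 1) lets us merge on time.
nxt <- data.frame(
  idvar   = long$idvar,
  timevar = long$timevar - 1,
  tovar   = long$statevar,
  stringsAsFactors = FALSE
)

# Inner join: observations without a successor drop out
trans <- merge(long, nxt, by = c("idvar", "timevar"))

# Rename and reorder to match the transition format
names(trans)[names(trans) == "statevar"] <- "fromvar"
trans <- trans[order(trans$idvar, trans$timevar),
               c("idvar", "timevar", "fromvar", "tovar", "X", "Y")]
rownames(trans) <- NULL

print(trans)
```

The result has four rows because the last observation of each unit has no following state to transition to.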

Each row shows for each unit (idvar) and given time (timevar) the state currently occupied (fromvar) and the state the unit will transition to at the next value of the time scale (tovar). For unit 1, the last observation in long format is at time 3. However, this is the final observation and there is no transition to another state after this. Thi
