<!-- README.md is generated from README.Rmd. Edit that file instead. -->

slendr: a simulation framework for population genetics

<!-- badges: start -->


<!-- badges: end -->

Overview

slendr is a toolbox for running population genomic simulations entirely in R. Our original motivation for developing it was to provide a framework for simulating spatially-explicit genomic data on real geographic landscapes. It has since grown to be much more than that: slendr can now simulate data from traditional, non-spatial demographic models using msprime as a simulation engine, and it even supports selection scenarios via user-defined SLiM extension snippets. In addition to defining models and simulating data from them, slendr also provides a set of functions for computing population genetic statistics, using the tskit module for the underlying computation.

This page briefly summarizes slendr's most important features. A more detailed description of the slendr architecture and an extensive set of practical code examples can be found in our paper in the PCI journal and on our website.


Citing slendr

The slendr paper is now published in the Peer Community Journal!

If you use slendr in your work, please cite it as:

Petr, Martin; Haller, Benjamin C.; Ralph, Peter L.; Racimo, Fernando. slendr: a framework for spatio-temporal population genomic simulations on geographic landscapes. Peer Community Journal, Volume 3 (2023), article no. e121. doi: 10.24072/pcjournal.354.

Citations help me justify further development and fixing bugs! Thank you!


Main features

Here is a brief summary of slendr's most important features. The R package allows you to:

  • Program demographic models, including population splits, population size changes, and gene-flow events using an extremely simple declarative language entirely in R (see this vignette for an example of the model-definition interface). Even complex models can be written with very little code and require only a bare minimum of R programming knowledge (the only thing the user needs to know is how to call an R function and what an R data frame looks like).

  • Execute slendr models using efficient, tailor-made SLiM or msprime simulation scripts which are bundled with the R package. Both of these simulation engines save outputs in the form of an efficient tree-sequence data structure. No SLiM or msprime programming is needed!

  • Read and manipulate tree-sequence data and compute population genetic statistics from them via slendr's built-in R interface to the tree-sequence library tskit. slendr provides a library of functions for computing population genetic statistics using an easy-to-use R interface, without having to convert files to other formats (VCF, EIGENSTRAT) for analysis in different software.

  • Although it originally assumed neutrality of simulated models, slendr now provides a simple extension mechanism for customizing SLiM-based models with user-defined SLiM code. Simulations of selection models of arbitrary complexity are now entirely possible.

  • Schedule sampling events which specify how many individuals' genomes, from which populations, and at which times (optionally, at which locations) should be recorded by a given simulation engine (SLiM or msprime) in the simulated tree-sequence object.

  • Encode complex models of population movements on a landscape (see a brief example of such a model here, and a more extended explanation in this tutorial).

  • Simulate these dynamic spatial demographic models using SLiM's continuous-space simulation capabilities directly in R (again, no SLiM programming required). The results of such simulations are saved as tree sequences and can be processed and analysed using standard R geospatial data analysis libraries. This is because slendr performs the conversion of tree sequence tables to the appropriate spatial R data type automatically.

  • Specify within-population individual dispersal dynamics from the R interface by leveraging SLiM's individual interaction parameters implemented in the SLiM back-end script.

Utilizing the flexibility of R with its wealth of libraries for statistics, geospatial analysis and graphics, and combining it with the power of population genetic simulation frameworks SLiM and msprime, the slendr R package makes it possible to write entire workflows without the need to leave the R environment.
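The non-spatial part of the workflow described above (define a model, schedule sampling, simulate a tree sequence, compute a statistic) can be sketched in a few lines. This is a minimal illustration rather than an excerpt from the slendr documentation: the population names `ANC` and `DER`, all times, sizes, rates and the random seed are made-up values, and the code assumes that slendr and its Python environment are already installed.

```r
library(slendr)
init_env()  # activate the Python environment slendr uses for msprime and tskit

# Hypothetical two-population model; times are in years before present
anc <- population("ANC", time = 50000, N = 1000)
der <- population("DER", parent = anc, time = 20000, N = 500)

model <- compile_model(populations = list(anc, der), generation_time = 25)

# Record 10 individuals from each population at the present (time 0)
samples <- schedule_sampling(model, times = 0, list(anc, 10), list(der, 10))

# Simulate a 1 Mb tree sequence with msprime, then overlay neutral mutations
ts <- msprime(model, sequence_length = 1e6, recombination_rate = 1e-8,
              samples = samples, random_seed = 42)
ts <- ts_mutate(ts, mutation_rate = 1e-8, random_seed = 42)

# Nucleotide diversity per population, computed through tskit
sampled <- ts_samples(ts)
div <- ts_diversity(ts, sample_sets = split(sampled$name, sampled$pop))
div
```

Swapping `msprime()` for `slim()` should run the same compiled model through the SLiM back end, provided SLiM is installed.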


Testing the R package in an online RStudio session

You can open an RStudio session and test examples from the vignettes directly in your web browser by clicking this button (no installation is needed!):

Binder

In case the RStudio instance appears to be starting very slowly, please be patient (Binder is a freely available service with limited computational resources provided by the community). If Binder crashes, try reloading the web page, which will restart the cloud session.

Once you get a browser-based RStudio session, you can navigate to the vignettes/ directory and test the examples on your own!


Installation

slendr is now available on CRAN, which means that you can install it simply by entering install.packages("slendr") into your R console.

If you would like to test the latest features of the software (perhaps because you need some bug fixes), you can install it with devtools::install_github("bodkan/slendr") (note that this requires the R package devtools).
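The two installation routes can be wrapped in a small helper for use in setup scripts. Note that `install_slendr()` is a hypothetical convenience function written for this example and is not part of slendr; it only calls the two commands mentioned above and installs devtools first if it is missing.

```r
# Hypothetical helper (not part of slendr): install the CRAN release by default,
# or the development version from GitHub when dev = TRUE
install_slendr <- function(dev = FALSE) {
  if (dev) {
    # devtools is required for installing from GitHub
    if (!requireNamespace("devtools", quietly = TRUE)) {
      install.packages("devtools")
    }
    devtools::install_github("bodkan/slendr")
  } else {
    install.packages("slendr")
  }
}
```

Calling `install_slendr()` installs the CRAN release, while `install_slendr(dev = TRUE)` installs the development version.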


Traditional, non-spatial example

Although the primary motivation for developing slendr has been to provide an easy interface for encoding geographically-explicit population genetic models, it has turned out to be an amazing tool for programming traditional Wright-Fisher population genetic models as well. For instance, here's a very quick demonstration of how little R code is needed to generate 100 Mb of sequence from a simple model of Neanderthal and Denisovan introgression. If you want to read more about this aspect of slendr, please take a look at this and this vignette.

library(slendr)
init_env()

anc_all <- population("ancestor_all", time = 700e3, N = 10000, remove = 640e3)
afr <- population("AFR", parent = anc_all, time = 650e3, N = 10000)
anc_arch <- population("ancestor_archaics", parent = anc_all, time = 650e3, N = 10000, remove = 390e3)
nea <- population("NEA", parent = anc_arch, time = 400e3, N = 2000, remove = 30e3)
den <- population("DEN", parent = anc_arch, time = 400e3, N = 2000, remove = 30e3)
nonafr <- population("nonAFR", parent = afr, time = 100e3, N = 3000, remove = 39e3)
eur <- population("EUR", parent = nonafr, time = 45e3, N = 5000)
pap <- population("PAP", parent = nonafr, time = 45e3, N = 5000)

gf <- list(
  gene_flow(from = nea, to = nonafr, rate = 0.02, start = 55000, end = 50000),
  gene_flow(from = den, to = pap, rate = 0.05, start = 35000, end = 30000)
)
#> Warning: The argument `rate` is about to be deprecated because of its confusing
#> naming and behavior. If you want to specify the rate of migration per
#> unit of time, please use the new argument `migration_rate`. If you want
#> to specify the total amount of ancestry which the `to` population should
#> received from the `from` population, use the new argument `proportion`
#> (this corresponds to the original interpretation of the deprecated `rate`
#> argument, and a simple replacement of `rate` with `proportion` will thus
#> retain the original meaning of your code all).
#> Warning: The argument `r
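
The deprecation warning above states that replacing `rate` with `proportion` keeps the original meaning of the code. A sketch of the same two gene-flow events written that way follows; it assumes the installed slendr version is one that emits this warning and therefore accepts the `proportion` argument. The populations are copied from the example above so that the snippet runs on its own.

```r
library(slendr)

# Populations copied from the example above (only those the gene-flow events need)
anc_all <- population("ancestor_all", time = 700e3, N = 10000, remove = 640e3)
afr <- population("AFR", parent = anc_all, time = 650e3, N = 10000)
anc_arch <- population("ancestor_archaics", parent = anc_all, time = 650e3, N = 10000, remove = 390e3)
nea <- population("NEA", parent = anc_arch, time = 400e3, N = 2000, remove = 30e3)
den <- population("DEN", parent = anc_arch, time = 400e3, N = 2000, remove = 30e3)
nonafr <- population("nonAFR", parent = afr, time = 100e3, N = 3000, remove = 39e3)
pap <- population("PAP", parent = nonafr, time = 45e3, N = 5000)

# Same events as above, with the deprecated `rate` replaced by `proportion`
# (assumption: the installed slendr version supports `proportion`)
gf <- list(
  gene_flow(from = nea, to = nonafr, proportion = 0.02, start = 55000, end = 50000),
  gene_flow(from = den, to = pap, proportion = 0.05, start = 35000, end = 30000)
)
```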
