STITCH - Sequencing To Imputation Through Constructing Haplotypes

Current Version: 1.8.5 Release date: Jan 19, 2026

Changes in latest version

add make_genetic_map_file_from_vcf function
a few bug fixes

For details of past changes please see CHANGELOG.

STITCH is an R and C++ program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.

For the old website, please see https://www.well.ox.ac.uk/~rwdavies/stitch.html

Installation
1. github
2. conda
Quick start run
Interactive start
Options and help
Benchmarks
Examples
License
Citation
Testing
Bug reports
Output format
What method to run
Notes on the relationship between run time, RAM and performance
How to choose K and nGen
About plotting
About reference panels

Installation <a name="paragraph-installation"></a>

STITCH is available to download either through this github repository, or through conda.

## install.packages("pak")
pak::pkg_install("rwdavies/STITCH/STITCH")

github <a name="paragraph-installation-github"></a>

A simple way to install a release of STITCH, and to ensure its dependencies are installed, is as follows. First, install R. Then, do the following:

version=1.8.1
wget -O STITCH.zip https://github.com/rwdavies/STITCH/archive/refs/tags/${version}.zip ## or curl
unzip STITCH.zip && mv STITCH-${version} STITCH
cd STITCH && ./scripts/install-dependencies.sh
make install

You can confirm the installation worked using the quick start run below.

To install the latest development from Github, do the following:

git clone --recursive https://github.com/rwdavies/STITCH.git
cd STITCH && ./scripts/install-dependencies.sh
./scripts/build-and-install.R

Note that STITCH as run in the original paper used version 3 of R. However, STITCH should work fine with either version 3 or version 4 of R. If you have dependency problems, you can either post an issue on github, or try the conda installation below.

conda <a name="paragraph-installation-conda"></a>

STITCH (as r-stitch) can be installed using conda. Full tutorials can be found elsewhere, but briefly, something like this should work:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda install r-stitch -c defaults -c bioconda -c conda-forge
source activate
R -e 'library("STITCH")'

Note that currently the command-line script STITCH.R is not included with the bioconda installation, so from the command line, you can either run something like R -e 'library("STITCH"); STITCH(chr="chr19", bamlist="bamlist.txt", posfile="pos.txt", genfile="gen.txt", outputdir="./", K=4, nGen=100, nCores=1)', or clone the repo to get STITCH.R.

You can confirm the installation worked using the quick start run below.

Quick start run <a name="paragraph-quickstartrun"></a>

A quick test on real data can be performed using

# test on CFW mouse data
wget https://www.well.ox.ac.uk/~rwdavies/ancillary/STITCH_example_2016_05_10.tgz
# or curl -O https://www.well.ox.ac.uk/~rwdavies/ancillary/STITCH_example_2016_05_10.tgz
tar -xzvf STITCH_example_2016_05_10.tgz
./STITCH.R --chr=chr19 --bamlist=bamlist.txt --posfile=pos.txt --genfile=gen.txt --outputdir=./ --K=4 --nGen=100 --nCores=1
# if this works the file stitch.chr19.vcf.gz will be created
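
As a rough way to confirm the quick start run produced usable output, the sketch below checks that the VCF exists and counts its non-header lines. The function name check_stitch_vcf is hypothetical and not part of STITCH; it relies only on bgzipped files being readable with gzip.

```shell
# Hypothetical helper (not part of STITCH): report whether an output VCF
# exists and how many variant records it contains.
check_stitch_vcf() {
    vcf="$1"
    if [ ! -s "$vcf" ]; then
        echo "missing or empty: $vcf"
        return 1
    fi
    # bgzip output is gzip-compatible, so gzip -cd can decompress it.
    # grep -vc counts the lines that do not start with '#', i.e. the records.
    n=$(gzip -cd "$vcf" | grep -vc '^#' || true)
    echo "$vcf: $n variant records"
}
```

After the quick start run, something like check_stitch_vcf stitch.chr19.vcf.gz should report a non-zero record count.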

Interactive start <a name="paragraph-interactive-start"></a>

It is recommended that you follow the instructions above, specifically ./scripts/install-dependencies.sh, to install the dependencies, but if you run into problems, or want to install in a more manual fashion, the steps below should work:

Install R if not already installed.
Install R dependencies parallel, Rcpp and RcppArmadillo from CRAN (using the "install.packages" option within R)
Install bgzip and make it available on your PATH. This can be done using a system installation, or by doing a local installation and either modifying the PATH variable using code like export PATH=/path/to/dir-with-bgzip-binary/:$PATH, or, through R, doing something like Sys.setenv( PATH = paste0("/path/to/dir-with-bgzip-binary/:", Sys.getenv("PATH"))). You'll know bgzip is available if you run something like system("which bgzip") in R and get the path to bgzip.
Install STITCH. First, download the latest STITCH tar.gz from the releases folder above (or, preferably, from the releases section of the github page). Second, install by opening R and using install.packages, giving install.packages the path to the downloaded STITCH tar.gz. This should install SeqLib automatically as well.
Download example dataset STITCH_example_2016_05_10.tgz.
Run STITCH. Open R, change your working directory using setwd() to the directory where the example tar.gz was unzipped, and then run STITCH(tempdir = tempdir(), chr = "chr19", bamlist = "bamlist.txt", posfile = "pos.txt", genfile = "gen.txt", outputdir = paste0(getwd(), "/"), K = 4, nGen = 100, nCores = 1). Once complete, a VCF should appear in the current working directory named stitch.chr19.vcf.gz
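
Before working through the manual steps above, it can help to check from the shell which prerequisites are already on your PATH. The following is a minimal sketch; the helper name missing_tools is hypothetical, and the tool list (R and bgzip) is taken from the steps above.

```shell
# Hypothetical helper (not part of STITCH): print the commands from the
# argument list that cannot be found on PATH, separated by spaces.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}

not_found=$(missing_tools R bgzip)
if [ -n "$not_found" ]; then
    echo "Not found on PATH: $not_found" >&2
fi
```

If nothing is printed, both R and bgzip were found.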

Options and help <a name="paragraph-optionsandhelp"></a>

For a full list of options, query ?STITCH in R, or run STITCH.R --help from the command line.

For a brief writeup of commonly used variables, see Options.md. To pass vectors using the command line, do something like STITCH.R --refillIterations='c(3,40)' or STITCH.R --reference_populations='c("CEU","GBR")'.
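
The single quotes in the examples above matter: without them the shell would try to interpret the parentheses and would strip the double quotes. If the values live in shell variables, one option is to build the R vector literal first. This is only a sketch; r_char_vector is a hypothetical helper, not something STITCH provides.

```shell
# Hypothetical helper (not part of STITCH): turn shell words into an R
# character vector literal, e.g. CEU GBR -> c("CEU","GBR").
r_char_vector() {
    out=""
    for item in "$@"; do
        # Add a comma only when something has already been collected.
        out="$out${out:+,}\"$item\""
    done
    printf 'c(%s)\n' "$out"
}

r_char_vector CEU GBR
```

The result can then be passed inside double quotes, for example --reference_populations="$(r_char_vector CEU GBR)".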

For help about errors, see the bug reports section.

Benchmarks <a name="paragraph-benchmarks"></a>

One can see some speed benchmarks in benchmarks/summarize_benchmarking.md

Examples <a name="paragraph-examples"></a>

In the examples directory, there is a script which contains examples using real mouse and human data. One can either run this interactively in R, or run all examples using ./examples/example.R.

License <a name="paragraph-license"></a>

STITCH and the code in this repo are available under a GPL3 license. For more information please see the LICENSE.

Citation <a name="paragraph-citation"></a>

Davies, R. W., Flint, J., Myers, S., Mott, R. Rapid genotype imputation from sequence without reference panels. Nat. Genet. 48, 965-969 (2016)

Testing <a name="paragraph-testing"></a>

Tests in STITCH are split into unit tests and acceptance tests, run using ./scripts/test-unit.sh and ./scripts/test-acceptance.sh respectively. To run all tests, use ./scripts/all-tests.sh, which also builds and installs a release version of STITCH. To make compilation go faster, do something like export MAKE="make -j 8".
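
Rather than hard-coding 8 jobs, the job count can be derived from the machine. The sketch below assumes getconf _NPROCESSORS_ONLN is available (it is on most Linux and macOS systems) and falls back to 2 otherwise; the variable name n_jobs is arbitrary.

```shell
# Ask the system how many CPUs are online; fall back to 2 if that fails.
n_jobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)

# Guard against empty or non-numeric answers.
case "$n_jobs" in
    ''|*[!0-9]*) n_jobs=2 ;;
esac

export MAKE="make -j $n_jobs"
echo "$MAKE"
```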

Bug reports <a name="paragraph-bugreports"></a>

The best way to get help is either to submit a bug report on GitHub or to consult the forum and mailing list:

https://groups.google.com/forum/#!forum/stitch-imputation

For more detailed questions or other concerns please contact Robert Davies robertwilliamdavies@gmail.com

Output format <a name="paragraph-output-format"></a>

STITCH can write output as either bgzipped VCF or bgen; see the output_format variable.

What method to run <a name="paragraph-what-method"></a>

STITCH can run using one of three "methods" reflecting different underlying statistical and biological models: "diploid", which is the best general method and has the best statistical properties, but has run time proportional to the square of K and so may be slow for large, diverse populations; "pseudoHaploid", which uses statistical approximations that make it less accurate than the diploid method but has run time proportional to K, and so may be suitable for large, diverse populations; and "diploid-inbred", which assumes all samples are completely inbred and as such uses an underlying haplotype based imputation model with run time proportional to K. Note that each of these assumes subjects are diploid, and as such, all methods output diploid genotypes and probabilities.
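
To get a feel for what these scaling statements mean in practice, the sketch below prints relative costs in arbitrary units for a few values of K. It is illustrative only: relative_cost is a hypothetical helper, it ignores the constant factors that differ between methods, and it says nothing about accuracy.

```shell
# Illustrative only (not part of STITCH): relative cost in arbitrary units,
# using the scaling described above: diploid grows with K squared, while
# pseudoHaploid and diploid-inbred grow with K.
relative_cost() {
    method="$1"
    K="$2"
    case "$method" in
        diploid) echo $((K * K)) ;;
        pseudoHaploid|diploid-inbred) echo "$K" ;;
        *) echo "unknown method: $method" >&2; return 1 ;;
    esac
}

for K in 4 10 40; do
    echo "K=$K diploid=$(relative_cost diploid "$K") pseudoHaploid=$(relative_cost pseudoHaploid "$K")"
done
```

Going from K=4 to K=40 multiplies the diploid figure by 100 but the pseudoHaploid figure by only 10, which is why the linear-time methods may be preferable for large, diverse populations.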

Notes on the relationship between run time, RAM and performance <a name="paragraph-time-ram-memory"></a>

STITCH can be run on hundreds of thousands of samples, SNPs, or both. Default parameters are set to give good performance for situations somewhere in the middle. Depending on your application, you may want to tweak default parameters to change how STITCH is run and the relationship between run time, RAM and performance. Here is a brief summary of relevant parameters. See the section below for a note about K.

outputSNPBlockSize: STITCH writes out results approximately this many SNPs at a time. Setting this to a larger value will speed u
