STITCH
STITCH - Sequencing To Imputation Through Constructing Haplotypes
Current Version: 1.8.5 Release date: Jan 19, 2026
<!-- badges: start --> <!-- badges: end -->
Changes in latest version
- add make_genetic_map_file_from_vcf function
- a few bug fixes
For details of past changes please see CHANGELOG.
STITCH is an R and C++ program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format.
For the old website, please see https://www.well.ox.ac.uk/~rwdavies/stitch.html
Table of contents
- Installation
- Quick start run
- Interactive start
- Options and help
- Benchmarks
- Examples
- License
- Citation
- Testing
- Bug reports
- Output format
- What method to run
- Notes on the relationship between run time, RAM and performance
- How to choose K and nGen
- About plotting
- About reference panels
Installation <a name="paragraph-installation"></a>
STITCH is available to download either through this github repository, or through conda.
## install.packages("pak")
pak::pkg_install("rwdavies/STITCH/STITCH")
github <a name="paragraph-installation-github"></a>
A simple way to ensure dependencies are installed, and to install a release of STITCH is as follows. First, install R. Then, do the following
version=1.8.1
wget -O STITCH.zip https://github.com/rwdavies/STITCH/archive/refs/tags/${version}.zip ## or curl
unzip STITCH.zip && mv STITCH-${version} STITCH
cd STITCH && ./scripts/install-dependencies.sh
make install
You can confirm the installation worked using the quick start run below.
To install the latest development from Github, do the following:
git clone --recursive https://github.com/rwdavies/STITCH.git
cd STITCH && ./scripts/install-dependencies.sh
./scripts/build-and-install.R
Note that STITCH as run in the original paper used version 3 of R. However, STITCH should work fine with either version 3 or version 4 of R. If you have dependency problems, you can either post an issue on github, or try the conda installation below.
conda <a name="paragraph-installation-conda"></a>
STITCH (as r-stitch) can be installed using conda. Full tutorials can be found elsewhere, but briefly, something like this should work
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda install r-stitch -c defaults -c bioconda -c conda-forge
source activate
R -e 'library("STITCH")'
Note that currently the command line script STITCH.R is not included with the bioconda installation, so from the command line, you can either run something like R -e 'library("STITCH"); STITCH(chr="chr19", bamlist="bamlist.txt", posfile="pos.txt", genfile="gen.txt", outputdir="./", K=4, nGen=100, nCores=1)', or clone the repo to get STITCH.R.
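As a minimal sketch of the first option, the R expression can be assembled in the shell and handed to R -e. The helper name stitch_r_expr is hypothetical and not part of STITCH; the file names and the fixed nCores=1 simply mirror the example above and are assumptions about your working directory.

```shell
# Hypothetical helper (not part of STITCH): print the R expression shown
# above, with chr, K and nGen filled in from the three arguments.
# The file names and nCores=1 are copied from the example, not required values.
stitch_r_expr() {
    printf 'library("STITCH"); STITCH(chr="%s", bamlist="bamlist.txt", posfile="pos.txt", genfile="gen.txt", outputdir="./", K=%s, nGen=%s, nCores=1)\n' "$1" "$2" "$3"
}
```

It could then be run as R -e "$(stitch_r_expr chr19 4 100)".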
You can confirm the installation worked using the quick start run below.
Quick start run <a name="paragraph-quickstartrun"></a>
A quick test on real data can be performed using
# test on CFW mouse data
wget https://www.well.ox.ac.uk/~rwdavies/ancillary/STITCH_example_2016_05_10.tgz
# or curl -O https://www.well.ox.ac.uk/~rwdavies/ancillary/STITCH_example_2016_05_10.tgz
tar -xzvf STITCH_example_2016_05_10.tgz
./STITCH.R --chr=chr19 --bamlist=bamlist.txt --posfile=pos.txt --genfile=gen.txt --outputdir=./ --K=4 --nGen=100 --nCores=1
# if this works the file stitch.chr19.vcf.gz will be created
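One way to check the result from the shell is sketched below. The helper count_vcf_records is a hypothetical name, not something shipped with STITCH; it only assumes the output is a gzip-compatible VCF whose header lines start with '#'.

```shell
# Hypothetical helper (not part of STITCH): count variant records in a
# bgzipped or gzipped VCF by skipping header lines, which start with '#'.
count_vcf_records() {
    if [ ! -s "$1" ]; then
        echo "missing or empty file: $1" >&2
        return 1
    fi
    # gzip can decompress bgzip output, since bgzip writes valid gzip blocks
    gzip -cd "$1" | awk '!/^#/ { n++ } END { print n + 0 }'
}
```

After the quick start run, count_vcf_records stitch.chr19.vcf.gz should print a positive number if the VCF was written.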
Interactive start <a name="paragraph-interactive-start"></a>
It is recommended you follow the instructions above, specifically ./scripts/install-dependencies.sh to install the dependencies, but if you run into problems, or want to install in a more manual fashion, the below should work
- Install R if not already installed.
- Install R dependencies parallel, Rcpp and RcppArmadillo from CRAN (using the "install.packages" option within R)
- Install bgzip and make it available to your PATH. This can be done using a system installation, or doing a local installation and either modifying the PATH variable using code like export PATH=/path/to/dir-with-bgzip-binary/:$PATH, or through R, doing something like Sys.setenv(PATH = paste0("/path/to/dir-with-bgzip-binary/:", Sys.getenv("PATH"))). You'll know bgzip is available if you run something like system("which bgzip") in R and get the path to bgzip.
- Install STITCH. First, download the latest STITCH tar.gz from the releases folder above (or more ideally the releases section of the github page). Second, install by opening R and using install.packages, giving install.packages the path to the downloaded STITCH tar.gz. This should install SeqLib automatically as well.
- Download example dataset STITCH_example_2016_05_10.tgz.
- Run STITCH. Open R, change your working directory using setwd() to the directory where the example tar.gz was unzipped, and then run
STITCH(tempdir = tempdir(), chr = "chr19", bamlist = "bamlist.txt", posfile = "pos.txt", genfile = "gen.txt", outputdir = paste0(getwd(), "/"), K = 4, nGen = 100, nCores = 1)
Once complete, a VCF named stitch.chr19.vcf.gz should appear in the current working directory.
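The PATH step in the list above can also be checked from the shell before starting R. This is a small sketch; require_tool is a hypothetical helper name, not part of STITCH.

```shell
# Hypothetical helper: report whether a tool is reachable on PATH.
# Prints its location on success, returns non-zero if it is not found.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found $1 at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}
```

For example, require_tool bgzip should print a path once bgzip is installed and PATH is set as described.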
Options and help <a name="paragraph-optionsandhelp"></a>
For a full list of options, in R, query ?STITCH, or from the command line, STITCH --help.
For a brief writeup of commonly used variables, see Options.md. To pass vectors using the command line, do something like STITCH.R --refillIterations='c(3,40)' or STITCH.R --reference_populations='c("CEU","GBR")'.
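Getting the quoting right for character vectors can be fiddly, so here is a minimal sketch that builds the c("...","...") string from plain shell arguments. The function name r_char_vector is hypothetical, and it assumes the items contain no double quotes or backslashes.

```shell
# Hypothetical helper (not part of STITCH): turn shell arguments into an
# R character vector literal, e.g. CEU GBR -> c("CEU","GBR").
# Assumes items contain no double quotes or backslashes (no escaping is done).
r_char_vector() {
    local out="" item
    for item in "$@"; do
        if [ -n "$out" ]; then
            out="$out,"
        fi
        out="$out\"$item\""
    done
    printf 'c(%s)\n' "$out"
}
```

It could be used as STITCH.R --reference_populations="$(r_char_vector CEU GBR)", which passes the same value as the hand-written example above.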
For help about errors, see the bug reports section.
Benchmarks <a name="paragraph-benchmarks"></a>
One can see some speed benchmarks in benchmarks/summarize_benchmarking.md
Examples <a name="paragraph-examples"></a>
In the examples directory, there is a script which contains examples using real mouse and human data. One can either run this interactively in R, or run all examples using ./examples/example.R.
License <a name="paragraph-license"></a>
STITCH and the code in this repo are available under a GPL3 license. For more information please see the LICENSE.
Citation <a name="paragraph-citation"></a>
Davies, R. W., Flint, J., Myers, S. & Mott, R. Rapid genotype imputation from sequence without reference panels. Nat. Genet. 48, 965-969 (2016)
Testing <a name="paragraph-testing"></a>
Tests in STITCH are split into unit and acceptance tests, run using ./scripts/test-unit.sh and ./scripts/test-acceptance.sh respectively. To run all tests use ./scripts/all-tests.sh, which also builds and installs a release version of STITCH. To make compilation go faster do something like export MAKE="make -j 8".
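Rather than hard-coding 8 jobs, the job count can be derived from the machine. The sketch below uses a hypothetical helper, make_parallel_cmd, and assumes getconf _NPROCESSORS_ONLN is available (it is on common Linux and macOS systems); it falls back to a single job otherwise.

```shell
# Hypothetical helper: build a "make -j N" string from the number of online
# CPUs, falling back to N=1 if the count cannot be determined.
make_parallel_cmd() {
    local n
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric output: use one job
    esac
    echo "make -j $n"
}
```

This could be used as export MAKE="$(make_parallel_cmd)" before running ./scripts/all-tests.sh.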
Bug reports <a name="paragraph-bugreports"></a>
The best way to get help is to either submit a bug report on GitHub or to consult the forum and mailing list
https://groups.google.com/forum/#!forum/stitch-imputation
For more detailed questions or other concerns please contact Robert Davies robertwilliamdavies@gmail.com
Output format <a name="paragraph-output-format"></a>
STITCH supports writing output as either bgzipped VCF or bgen; see the output_format variable.
What method to run <a name="paragraph-what-method"></a>
STITCH can run using one of three "methods" reflecting different underlying statistical and biological models:
- "diploid": the best general method, with the best statistical properties, but run time is proportional to the square of K, so it may be slow for large, diverse populations.
- "pseudoHaploid": uses statistical approximations that make it less accurate than the diploid method, but run time is proportional to K, so it may be suitable for large, diverse populations.
- "diploid-inbred": assumes all samples are completely inbred and as such uses an underlying haplotype based imputation model with run time proportional to K.
Note that each of these assumes subjects are diploid, and as such, all methods output diploid genotypes and probabilities.
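To make the scaling concrete, the sketch below prints a relative cost in arbitrary units for a given method and K. It encodes only the proportionality stated above (K squared for diploid, K for the other two) and ignores constant factors, so it says nothing about actual run times; relative_cost is a hypothetical helper, not part of STITCH.

```shell
# Hypothetical helper: relative run time in arbitrary units, using only the
# scaling stated above. Constant factors are ignored, so values are only
# comparable within one method as K changes.
relative_cost() {
    case "$1" in
        diploid) echo $(( $2 * $2 )) ;;                 # proportional to K^2
        pseudoHaploid|diploid-inbred) echo "$2" ;;      # proportional to K
        *) echo "unknown method: $1" >&2; return 1 ;;
    esac
}
```

For instance, doubling K from 8 to 16 takes relative_cost diploid from 64 to 256, a fourfold increase, while relative_cost pseudoHaploid only doubles from 8 to 16.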
Notes on the relationship between run time, RAM and performance <a name="paragraph-time-ram-memory"></a>
STITCH can be run on hundreds of thousands of samples, SNPs, or both. Default parameters are set to give good performance for situations somewhere in the middle. Depending on your application, you may want to tweak default parameters to change how STITCH is run and the relationship between run time, RAM and performance. Here is a brief summary of relevant parameters. See section below for note about K.
- outputSNPBlockSize: STITCH writes out results approximately this many SNPs at a time. Setting this to a larger value will speed u