STITCH - Sequencing To Imputation Through Constructing Haplotypes


Current Version: 1.8.5
Release date: Jan 19, 2026

Changes in latest version

  • add make_genetic_map_file_from_vcf function
  • a few bug fixes

For details of past changes please see CHANGELOG.

STITCH is an R and C++ program for reference-panel-free, read-aware, low-coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, together with a list of positions to genotype, and outputs imputed genotypes in VCF format.

For the old website, please see https://www.well.ox.ac.uk/~rwdavies/stitch.html

Table of contents

  1. Installation
    1. github
    2. conda
  2. Quick start run
  3. Interactive start
  4. Options and help
  5. Benchmarks
  6. Examples
  7. License
  8. Citation
  9. Testing
  10. Bug reports
  11. Output format
  12. What method to run
  13. Notes on the relationship between run time, RAM and performance
  14. How to choose K and nGen
  15. About plotting
  16. About reference panels

Installation <a name="paragraph-installation"></a>

STITCH is available to download either through this github repository, or through conda.

## install.packages("pak")
pak::pkg_install("rwdavies/STITCH/STITCH")

github <a name="paragraph-installation-github"></a>

A simple way to ensure dependencies are installed and to install a release of STITCH is as follows. First, install R. Then do the following:

version=1.8.1
wget -O STITCH.zip https://github.com/rwdavies/STITCH/archive/refs/tags/${version}.zip ## or curl
unzip STITCH.zip && mv STITCH-${version} STITCH
cd STITCH && ./scripts/install-dependencies.sh
make install

You can confirm the installation worked using the quick start run below.

To install the latest development from Github, do the following:

git clone --recursive https://github.com/rwdavies/STITCH.git
cd STITCH && ./scripts/install-dependencies.sh
./scripts/build-and-install.R

Note that STITCH as run in the original paper used version 3 of R. However, STITCH should work fine with either version 3 or version 4 of R. If you have dependency problems, you can either post an issue on GitHub or try the conda installation below.

conda <a name="paragraph-installation-conda"></a>

STITCH (as r-stitch) can be installed using conda. Full tutorials can be found elsewhere, but briefly, something like this should work:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda install r-stitch -c defaults -c bioconda -c conda-forge
source activate
R -e 'library("STITCH")'

Note that the command-line script STITCH.R is currently not included with the bioconda installation, so from the command line you can either run something like R -e 'library("STITCH"); STITCH(chr="chr19", bamlist="bamlist.txt", posfile="pos.txt", genfile="gen.txt", outputdir="./", K=4, nGen=100, nCores=1)', or clone the repo to get STITCH.R.
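As a rough sketch of the first option, a small shell helper can assemble the R expression so that the chromosome, K and nGen are not hard-coded. The function name stitch_r_expr is hypothetical and not part of STITCH, the file names are those of the quick start example, and nothing is run here; the expression is only printed.

```shell
# Hypothetical helper (not part of STITCH): print the R expression for a run.
# File names follow the quick start example; adjust them for real data.
stitch_r_expr() {
    chr="$1"; K="$2"; nGen="$3"
    printf 'library("STITCH"); STITCH(chr="%s", bamlist="bamlist.txt", posfile="pos.txt", genfile="gen.txt", outputdir="./", K=%s, nGen=%s, nCores=1)\n' \
        "$chr" "$K" "$nGen"
}

# Show the expression that would be passed to R.
stitch_r_expr chr19 4 100
```

To execute it rather than print it, pass the same string to R, for example R -e "$(stitch_r_expr chr19 4 100)".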

You can confirm the installation worked using the quick start run below.

Quick start run <a name="paragraph-quickstartrun"></a>

A quick test on real data can be performed using

# test on CFW mouse data
wget https://www.well.ox.ac.uk/~rwdavies/ancillary/STITCH_example_2016_05_10.tgz
# or curl -O https://www.well.ox.ac.uk/~rwdavies/ancillary/STITCH_example_2016_05_10.tgz
tar -xzvf STITCH_example_2016_05_10.tgz
./STITCH.R --chr=chr19 --bamlist=bamlist.txt --posfile=pos.txt --genfile=gen.txt --outputdir=./ --K=4 --nGen=100 --nCores=1
# if this works the file stitch.chr19.vcf.gz will be created
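Before a longer run on your own data, it can help to confirm that every BAM listed in the bamlist actually exists. The following is a minimal sketch that assumes the bamlist contains one BAM path per line; check_bamlist is a hypothetical helper, not something shipped with STITCH.

```shell
# Hypothetical pre-flight check: assumes one BAM path per line in the bamlist.
# Returns 0 if every listed file exists, 1 otherwise.
check_bamlist() {
    bamlist="$1"
    missing=0
    while IFS= read -r bam || [ -n "$bam" ]; do
        [ -z "$bam" ] && continue              # skip blank lines
        if [ ! -f "$bam" ]; then
            echo "missing BAM: $bam" >&2
            missing=$((missing + 1))
        fi
    done < "$bamlist"
    return "$(( missing > 0 ))"
}

# Example use (commented out so the sketch has no side effects):
# check_bamlist bamlist.txt && ./STITCH.R --chr=chr19 --bamlist=bamlist.txt ...
```

Paths are resolved relative to the directory the check is run from, so run it from the same directory you will run STITCH in.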

Interactive start <a name="paragraph-interactive-start"></a>

It is recommended you follow the instructions above, specifically ./scripts/install-dependencies.sh to install the dependencies, but if you run into problems, or want to install in a more manual fashion, the below should work

  1. Install R if not already installed.
  2. Install the R dependencies parallel, Rcpp and RcppArmadillo from CRAN (using the install.packages function within R).
  3. Install bgzip and make it available on your PATH. This can be done using a system installation, or by doing a local installation and either modifying the PATH variable using code like export PATH=/path/to/dir-with-bgzip-binary/:$PATH, or through R, doing something like Sys.setenv( PATH = paste0("/path/to/dir-with-bgzip-binary/:", Sys.getenv("PATH"))). You'll know bgzip is available if you run something like system("which bgzip") in R and get the path to bgzip.
  4. Install STITCH. First, download the latest STITCH tar.gz from the releases folder above (or more ideally the releases section of the github page). Second, install by opening R and using install.packages, giving install.packages the path to the downloaded STITCH tar.gz. This should install SeqLib automatically as well.
  5. Download example dataset STITCH_example_2016_05_10.tgz.
  6. Run STITCH. Open R, change your working directory using setwd() to the directory where the example tar.gz was unzipped, and then run STITCH(tempdir = tempdir(), chr = "chr19", bamlist = "bamlist.txt", posfile = "pos.txt", genfile = "gen.txt", outputdir = paste0(getwd(), "/"), K = 4, nGen = 100, nCores = 1). Once complete, a VCF should appear in the current working directory named stitch.chr19.vcf.gz
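The PATH adjustment in step 3 can be sketched from the shell as follows. The helper name ensure_bgzip is hypothetical, and the directory is the same placeholder used above; the function prepends the directory only when bgzip is not already found, then prints the path that will be used.

```shell
# Hypothetical helper: make bgzip reachable, prepending a directory to PATH
# only if bgzip is not already available. Prints the resolved path on success.
ensure_bgzip() {
    bgzip_dir="$1"
    if ! command -v bgzip >/dev/null 2>&1; then
        PATH="$bgzip_dir:$PATH"
        export PATH
    fi
    command -v bgzip    # non-zero exit status if bgzip is still not found
}

# Example use (placeholder directory, as in step 3):
# ensure_bgzip /path/to/dir-with-bgzip-binary/ || echo "bgzip not found" >&2
```

Call the function directly rather than inside $(...), since a PATH change made in a subshell does not persist.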

Options and help <a name="paragraph-optionsandhelp"></a>

For a full list of options, query ?STITCH in R, or run STITCH.R --help from the command line.

For a brief writeup of commonly used variables, see Options.md. To pass vectors using the command line, do something like STITCH.R --refillIterations='c(3,40)' or STITCH.R --reference_populations='c("CEU","GBR")'.
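The single quotes in those examples matter because parentheses and double quotes are special to the shell. When the values come from shell variables, one option is to build the R vector literal with a small helper; r_num_vector and r_chr_vector below are hypothetical names used only for illustration.

```shell
# Hypothetical helpers: turn shell arguments into an R vector literal.

# Numeric vector, e.g. "3 40" becomes c(3,40)
r_num_vector() {
    out=""
    for x in "$@"; do
        out="${out:+$out,}$x"
    done
    printf 'c(%s)\n' "$out"
}

# Character vector, e.g. "CEU GBR" becomes c("CEU","GBR")
r_chr_vector() {
    out=""
    for x in "$@"; do
        out="${out:+$out,}\"$x\""
    done
    printf 'c(%s)\n' "$out"
}

# Print the two literals used in the examples above.
r_num_vector 3 40
r_chr_vector CEU GBR
```

The result can then be passed inside double quotes, for example STITCH.R --refillIterations="$(r_num_vector 3 40)".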

For help about errors, see the bug reports section.

Benchmarks <a name="paragraph-benchmarks"></a>

One can see some speed benchmarks in benchmarks/summarize_benchmarking.md

Examples <a name="paragraph-examples"></a>

In the examples directory, there is a script which contains examples using real mouse and human data. One can either run this interactively in R, or run all examples using ./examples/example.R.

License <a name="paragraph-license"></a>

STITCH and the code in this repo are available under a GPL3 license. For more information, please see the LICENSE.

Citation <a name="paragraph-citation"></a>

Davies, R. W., Flint, J., Myers, S. & Mott, R. Rapid genotype imputation from sequence without reference panels. Nat. Genet. 48, 965-969 (2016).

Testing <a name="paragraph-testing"></a>

Tests in STITCH are split into unit and acceptance tests, run using ./scripts/test-unit.sh and ./scripts/test-acceptance.sh respectively. To run all tests, use ./scripts/all-tests.sh, which also builds and installs a release version of STITCH. To make compilation go faster, do something like export MAKE="make -j 8".
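Rather than hard-coding 8, the job count can be taken from the machine. This is a small sketch that assumes getconf is available and falls back to a single job otherwise; the variable name njobs is arbitrary.

```shell
# Use the number of online CPUs for parallel compilation, falling back to 1.
njobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
export MAKE="make -j ${njobs}"
echo "$MAKE"
```

Run this in the same shell session before calling the test scripts so that the exported MAKE is picked up.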

Bug reports <a name="paragraph-bugreports"></a>

The best way to get help is either to submit a bug report on GitHub or to consult the forum and mailing list:

https://groups.google.com/forum/#!forum/stitch-imputation

For more detailed questions or other concerns, please contact Robert Davies (robertwilliamdavies@gmail.com).

Output format <a name="paragraph-output-format"></a>

STITCH can write output as either bgzipped VCF or bgen; see the output_format variable.

What method to run <a name="paragraph-what-method"></a>

STITCH can run using one of three "methods" reflecting different underlying statistical and biological models:

  • "diploid": the best general method, with the best statistical properties, but its run time is proportional to the square of K, so it may be slow for large, diverse populations.
  • "pseudoHaploid": uses statistical approximations that make it less accurate than the diploid method, but its run time is proportional to K, so it may be suitable for large, diverse populations.
  • "diploid-inbred": assumes all samples are completely inbred, and as such uses an underlying haplotype-based imputation model with run time proportional to K.

Note that each of these assumes subjects are diploid, and as such, all methods output diploid genotypes and probabilities.
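To make the scaling concrete, the sketch below prints the relative cost implied by the stated proportionality (K squared for diploid, K for the other two methods). The numbers are unitless and ignore constant factors, so they only compare methods at the same K; relative_cost is a hypothetical helper, not a STITCH function.

```shell
# Hypothetical helper: unitless relative cost implied by the stated scaling.
# Constant factors are ignored, so only the growth with K is meaningful.
relative_cost() {
    method="$1"; K="$2"
    case "$method" in
        diploid)                      echo $(( K * K )) ;;
        pseudoHaploid|diploid-inbred) echo "$K" ;;
        *) echo "unknown method: $method" >&2; return 1 ;;
    esac
}

for K in 4 10 40; do
    printf 'K=%s diploid=%s pseudoHaploid=%s\n' \
        "$K" "$(relative_cost diploid "$K")" "$(relative_cost pseudoHaploid "$K")"
done
```

Going from K=4 to K=40 multiplies the diploid term by 100 but the pseudoHaploid term by only 10, which is why the linear methods may be preferable for large, diverse populations.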

Notes on the relationship between run time, RAM and performance <a name="paragraph-time-ram-memory"></a>

STITCH can be run on hundreds of thousands of samples, SNPs, or both. Default parameters are set to give good performance for situations somewhere in the middle. Depending on your application, you may want to tweak default parameters to change how STITCH is run and the relationship between run time, RAM and performance. Here is a brief summary of relevant parameters. See the section below for a note about K.

  • outputSNPBlockSize: STITCH writes out results approximately this many SNPs at a time. Setting this to a larger value will speed u
