InDelible: Genomic Structural Variant Caller by Adaptive Training

Table of Contents

  1. Development Team
  2. About InDelible
  3. Installation
  4. Usage
  5. Output

Development Team

Alejandro Sifrim (Creator, Developer)

Eugene Gardner (Developer)

Diana Rajan (Experimental validation)

Elena Prigmore (Experimental validation)

Sarah Lindsay (Experimental validation)

Matthew Hurles (Group Leader)

We are affiliated with the Wellcome Sanger Institute, Cambridge, United Kingdom.

About InDelible

Version 1.1.3

Abstract

Structural Variations (SVs) are genetic differences greater than 50 bp in size and have been identified as causative of diseases such as rare developmental disorders (DD). Patients presenting with DD are typically referred for chromosomal microarray (CMA) to identify large copy-number variants, or for single-gene or panel tests based on presenting symptoms. Increasingly, patients for whom a diagnosis is not forthcoming are additionally referred for whole exome sequencing (WES), which is used to identify single nucleotide variants or small insertions/deletions (InDels). This leaves a class of intermediate-size deletions that are technically difficult to identify; hence, patients with SVs undetectable by conventional CMA or WES analysis often remain undiagnosed. To this end, we have developed a novel SV discovery approach, ‘InDelible’, and applied it to 13,438 probands with severe DD recruited as part of the Deciphering Developmental Disorders (DDD) study. InDelible queries WES data to identify split read clusters within a gene set of interest and performs variant quality-control utilizing an active learning methodology. Using InDelible we were able to find 59 previously undetected variants among DDD probands, of which 89.8% (53) were in genes previously associated with DD, had phenotypes which putatively match the conditions of the patient in which they were found, and were thus reported to the referring clinician. InDelible was particularly effective at ascertaining variants between 20 and 500 bp in size, and increased the total number of causal variants identified by DDD in this size range by 46.4% (n = 26 variants). Of particular interest were seven confirmed de novo SVs in the gene MECP2; these variants represent 35.0% of all de novo PTVs in MECP2 among DDD patients and represent an enrichment of large, causal variants compared to other DD-associated genes. InDelible provides a rapid framework for the discovery of likely pathogenic SVs and has the potential to improve the diagnostic yield of WES.

What is InDelible for?

InDelible was originally designed to ascertain, from Whole Exome Sequencing (WES) data, large InDels (>20 bp) and Structural Variants that have proven refractory to other ascertainment approaches. To reduce the search space, InDelible also makes use of a "target gene list" containing genes that the end user is interested in.

Potential Limitations of InDelible

InDelible is likely to be adaptable to a wide range of sequencing technologies, genetic architectures, and diseases or conditions. However, we have not specifically tested InDelible with:

  1. Whole Genome Sequencing (WGS) Data

    • While InDelible should, in theory, be able to identify variants from WGS, the number of reads that InDelible has to process would likely result in significantly increased run-times.
  2. Other Diseases

    • InDelible was designed as part of the Deciphering Developmental Disorders (DDD) study and, as such, was targeted to genes known to contribute to dominant developmental disorders. We have provided for the possibility that end-users may want to identify variants within other gene sets, but have not specifically performed variant discovery among a patient cohort to test this functionality.
  3. Other Genetic Architectures

    • While causal de novo variants play an important role in the genetic architecture of severe DD, recessive causes of DD are likewise a major contributor. While we have focused our primary analysis on de novo variation to try to maximise our discovery potential, we do not preclude the possibility that InDelible could also be used to identify homozygous or compound heterozygous variants which could be plausibly linked to a patient's symptoms.
  4. Higher Allele Frequency Variants

    • Likewise, since InDelible was developed as part of DDD, where the largest contributor to patient symptoms is high-penetrance de novo variation, we focused our filtering to such variants. However, InDelible does report all high-confidence variants identified for each patient and could, in theory, be used to identify population level variation.

How to Cite InDelible

Peer-Reviewed Open Access Manuscript

Eugene J. Gardner, Alejandro Sifrim, Sarah J. Lindsay, Elena Prigmore, Diana Rajan, Petr Danecek, Giuseppe Gallone, Ruth Y. Eberhardt, Hilary C. Martin, Caroline F. Wright, David R. FitzPatrick, Helen V. Firth, Matthew E. Hurles. Detecting cryptic clinically relevant structural variation in exome-sequencing data increases diagnostic yield for developmental disorders. The American Journal of Human Genetics (2021).

https://www.cell.com/ajhg/fulltext/S0002-9297(21)00346-3

Preprint

Eugene J. Gardner, Alejandro Sifrim, Sarah J. Lindsay, Elena Prigmore, Diana Rajan, Petr Danecek, Giuseppe Gallone, Ruth Y. Eberhardt, Hilary C. Martin, Caroline F. Wright, David R. FitzPatrick, Helen V. Firth, Matthew E. Hurles. Detecting cryptic clinically-relevant structural variation in exome sequencing data increases diagnostic yield for developmental disorders. medRxiv (2021).

https://www.medrxiv.org/content/10.1101/2020.10.02.20194241v2

For code used to generate figures/text for these manuscripts, please see the following repository:

https://github.com/HurlesGroupSanger/indelible_paper

Installation

Required Software Dependencies

InDelible is written for Python2.7.* or Python3.7.*

InDelible requires the following software to be installed and in $PATH:

  • bedtools
    • Note: If using CRAM formatted files with InDelible, bedtools v2.28 or later is required.
  • tabix
  • bgzip
  • bwa
  • blast
    • The Python package Biopython requires a local install of blast in $PATH in order to function. This needs to be installed prior to installing InDelible.
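Before installing, it can help to confirm that the tools listed above are reachable. The following is a minimal sketch, not part of InDelible itself. It assumes Python 3 (`shutil.which` is not available in Python 2.7), that `blastn` is a reasonable stand-in for the BLAST installation, and that `bedtools --version` prints text of the form `bedtools v2.30.0`. The function names are hypothetical.

```python
import re
import shutil

# Executable names are assumptions; "blastn" stands in for the BLAST suite.
REQUIRED_TOOLS = ["bedtools", "tabix", "bgzip", "bwa", "blastn"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def parse_bedtools_version(version_output):
    """Extract (major, minor) from text such as 'bedtools v2.30.0'."""
    match = re.search(r"v(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_output)
    return int(match.group(1)), int(match.group(2))


def supports_cram(version_output, minimum=(2, 28)):
    """True if the reported bedtools version meets the CRAM requirement."""
    return parse_bedtools_version(version_output) >= minimum
```

Calling `missing_tools()` returns an empty list when everything is on `$PATH`. The text passed to `supports_cram` would typically be the captured output of `bedtools --version`.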

Installing InDelible on a Local Machine

To install InDelible:

Note: We recommend cloning the latest version of our GitHub repo unless you need to reproduce your analysis with an older version. This should allow us to readily fix issues that you may experience with the software.

  1. Clone the git repo:
git clone https://github.com/eugenegardner/indelible.git
cd indelible/
  2. Create a virtual environment and activate it:
  • for Python2:
virtualenv venv/
source venv/bin/activate
  • for Python3:
python3 -m venv venv/
source venv/bin/activate
  3. Install cython, numpy, and other required packages:
pip install "cython==0.29.13"
pip install "numpy==1.17.2"
pip install -r requirements.txt

Note: If you get error(s) about pysam not being able to load specific libraries (like openssl, libbz2, etc.) that is a pysam problem related to htslib. Please see the pysam website to get help.

Note: We have tested this installation protocol, but other versions of pip may try to install packages out of order. If you get errors pertaining to dependencies stored within requirements.txt, you may need to install them one at a time in the order listed in requirements.txt. In particular, pandas may not properly install unless numpy is installed first, as we have done above.
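If pip does install packages out of order, the one-at-a-time approach described above can be scripted. This is a rough sketch rather than a supported installer. It assumes requirements.txt contains only plain specifiers (no `-r` or `-e` option lines), and the helper names `read_requirements` and `install_in_order` are hypothetical.

```python
import subprocess
import sys


def read_requirements(lines):
    """Return requirement specifiers in file order, skipping blanks and comments."""
    requirements = []
    for line in lines:
        spec = line.strip()
        if not spec or spec.startswith("#"):
            continue
        # Drop a trailing inline comment such as "numpy==1.17.2  # note".
        spec = spec.split(" #", 1)[0].strip()
        requirements.append(spec)
    return requirements


def install_in_order(requirements, run=subprocess.check_call):
    """Install each requirement with its own pip call, stopping at the first failure."""
    for spec in requirements:
        run([sys.executable, "-m", "pip", "install", spec])
```

With the virtual environment active, `install_in_order(read_requirements(open("requirements.txt")))` would install each line in turn. `subprocess.check_call` raises `CalledProcessError` on the first package that fails, which makes the problem dependency easy to spot.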

InDelible was tested with the following package versions:

  • cython v0.29.13
  • numpy v1.17.2
  • pandas v0.25.1
  • pyfaidx v0.5.5.2
  • pysam v0.18.0
  • scipy v1.3.1
  • scikit-learn v0.21.3
    • Note: The random forest model provided in data.zip will likely only work with v0.21.3 of scikit-learn (see this link for an explanation why). Thus, if using a different version of scikit-learn, it is necessary to re-train the model with the provided test set included with this repository (data/observation_data.DDD.17IX2019.txt). Please see documentation below on how to train the random forest used by InDelible.
  • PyYAML v5.4
  • Biopython v1.74
  • intervaltree v3.0.2
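To see whether an environment matches the versions listed above, and in particular whether the bundled random forest model is likely to need re-training, a comparison along these lines may be useful. It is a sketch under stated assumptions: the dictionary keys are assumed to match the PyPI distribution names, `importlib.metadata` requires Python 3.8 or later (the `importlib_metadata` backport is assumed on Python 3.7), and all function names are hypothetical.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport, assumed installed on 3.7

# Versions copied from the list above; keys are assumed PyPI distribution names.
TESTED_VERSIONS = {
    "cython": "0.29.13",
    "numpy": "1.17.2",
    "pandas": "0.25.1",
    "pyfaidx": "0.5.5.2",
    "pysam": "0.18.0",
    "scipy": "1.3.1",
    "scikit-learn": "0.21.3",
    "PyYAML": "5.4",
    "biopython": "1.74",
    "intervaltree": "3.0.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(installed, tested=TESTED_VERSIONS):
    """Map package name -> (installed, tested) wherever the two differ."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in tested.items()
        if installed.get(name) != wanted
    }


def needs_retraining(installed):
    """True if scikit-learn is missing or differs from the tested version."""
    return installed.get("scikit-learn") != TESTED_VERSIONS["scikit-learn"]
```

Building the input with `{name: installed_version(name) for name in TESTED_VERSIONS}` and passing it to `version_mismatches` lists every package that differs from the tested set. A mismatch is not necessarily an error; only the scikit-learn version is described above as affecting the provided model.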
