Aldy
Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes
.. raw:: html

   <h1 align="center">
   <img src="https://user-images.githubusercontent.com/10132487/100571499-1ee1fd00-3288-11eb-9760-75c4b0b98d2a.png" alt="Aldy" width=100px/>
   </h1>
   <p align="center">
   <a href="https://badge.fury.io/py/aldy"><img src="https://badge.fury.io/py/aldy.svg" alt="Version"/></a>
   <img src="https://github.com/0xTCG/aldy/workflows/aldy-test/badge.svg" alt="CI Status"/>
   <a href="https://aldy.readthedocs.io/en/latest/?badge=latest"><img src="https://readthedocs.org/projects/aldy/badge/?version=latest" alt="ReadTheDocs"/></a>
   <a href="https://codecov.io/github/0xTCG/aldy"><img src="https://codecov.io/github/0xTCG/aldy/coverage.svg?branch=master" alt="Code Coverage"/></a>
   <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Black"/></a>
   <br/>
   <a href="https://www.nature.com/articles/s41467-018-03273-1"><img src="https://img.shields.io/badge/Published%20in-Nature%20Communications-red.svg" alt="Published in Nature Communications" /></a>
   <a href="https://genome.cshlp.org/content/33/1/61.full"><img src="https://img.shields.io/badge/Published%20in-Genome%20Research-purple.svg" alt="Published in Genome Research" /></a>
   <br/>
   <b><i>A quick and nifty tool for genotyping and phasing popular pharmacogenes.</i></b>
   </p>

Aldy 4 calls genotypes of many highly polymorphic pharmacogenes and reports them in a phased star-allele nomenclature. It can also call copy number of a given pharmacogene and genotype each copy present in the sample, something that standard genotype callers like GATK cannot do.
Algorithm details
TL;DR: Aldy 4 uses star-allele databases to guide the process of detecting the most likely genotype.
The optimization is done in three stages via integer linear programming.
See `Gene Support`_ for more details about the supported pharmacogene databases.
More details, together with the API documentation, are available
at `Read the Docs <https://aldy.readthedocs.io/en/latest/>`_.
Experimental data is available `here <paper>`_.
If you are using Aldy, please cite our papers in
`Nature Communications <https://www.nature.com/articles/s41467-018-03273-1>`_
and `Genome Research <https://genome.cshlp.org/content/33/1/61.full>`_.
⚠️ Warning
Please read this carefully if you are using Aldy in a clinical or commercial environment.
Aldy is a computational tool whose purpose is to aid the genotype detection process. It can be of tremendous help in that process. However, it is not perfect, and it can easily make a wrong call if the data is noisy, ambiguous or if the target sample contains a previously unknown allele.
☣️🚨 Do not use the raw output of Aldy (or any other computational tool for that matter) to diagnose a disease or prescribe a drug! You are responsible for inspecting and validating the results (ideally in a wet lab) before doing anything that can have major consequences. 🚨☣️
We really mean it.
Finally, note that the allele databases are still a work in progress and that we still do not know the downstream impact of the vast majority of genotypes.
Installation
Aldy is written in Python and requires Python 3.7+ to run. It is intended to be run on POSIX-based systems (so far, only Linux and macOS have been tested).
The easiest way to install Aldy is to use pip::

    pip install aldy

Append ``--user`` to the previous command to install Aldy locally
if you cannot write to the system-wide Python directory.
Prerequisite: ILP solver
Aldy requires a mixed integer solver to run.
The following solvers are currently supported:
- `CBC / Google OR-Tools <https://developers.google.com/optimization/>`_:
  a free, open-source MIP solver that is shipped by default with Google's OR-Tools.
  ``pip`` installs it by default when installing Aldy.

  If you have trouble installing `ortools` on a Nix-based Linux distro, try this::

      pip install --platform=manylinux1_x86_64 --only-binary=:all: --target ~/.local/lib/python3.8/site-packages ortools

- `Gurobi <http://www.gurobi.com>`_:
  a commercial solver which is free for academic purposes.
  Most thoroughly tested solver: if you encounter any issues with CBC, try Gurobi.
  After installing it, don't forget to install ``gurobipy`` package by going to Gurobi's installation directory
  (e.g., ``/opt/gurobi/linux64`` on Linux or ``/Library/gurobi751/mac64/`` on macOS) and typing::

      python3 setup.py install
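Before running the test suite, it can help to confirm that the interpreter and at least one solver module are visible. The following is a minimal sketch, not part of Aldy: it assumes the solvers are importable as ``ortools`` and ``gurobipy`` (the package names mentioned above), and the labels ``cbc`` and ``gurobi`` are only illustrative.

```python
import importlib.util
import sys

# Illustrative labels mapped to the Python module names mentioned above.
SOLVER_MODULES = {"cbc": "ortools", "gurobi": "gurobipy"}


def python_is_supported(version=sys.version_info):
    """Return True if the interpreter meets the Python 3.7+ requirement."""
    return tuple(version[:2]) >= (3, 7)


def available_solvers(find_spec=importlib.util.find_spec):
    """Return the labels of solvers whose Python module can be located."""
    return [
        label
        for label, module in SOLVER_MODULES.items()
        if find_spec(module) is not None
    ]


if __name__ == "__main__":
    print("Python 3.7+:", python_is_supported())
    print("Solver modules found:", available_solvers() or "none")
```

This only checks that the modules can be found; it does not verify a Gurobi license or that the solver actually runs, which is what ``aldy test`` is for.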
Sanity check
After installing Aldy and a compatible ILP solver, please make sure to test the installation by issuing the following command (this should take a few minutes)::

    aldy test

In case everything is set up properly, you should see something like this::

    🐿 Aldy v4.0 (Python 3.7.5 on macOS 12.4)
    (c) 2016-2022 Aldy Authors. All rights reserved.
    Free for non-commercial/academic use only.
    ================================ test session starts ================================
    platform darwin -- Python 3.7.5, pytest-5.3.1, py-1.8.0, pluggy-0.13.1
    rootdir: aldy, inifile: setup.cfg
    plugins: anyio-3.6.1, xdist-1.31.0, cov-2.10.1, forked-1.1.3
    collected 76 items
    aldy/tests/test_cn_real.py ........ [ 10%]
    aldy/tests/test_cn_synthetic.py ..... [ 17%]
    aldy/tests/test_diplotype_real.py .... [ 22%]
    aldy/tests/test_diplotype_synthetic.py ...... [ 30%]
    aldy/tests/test_full.py ........... [ 44%]
    aldy/tests/test_gene.py ....... [ 53%]
    aldy/tests/test_major_real.py ........... [ 68%]
    aldy/tests/test_major_synthetic.py ....... [ 77%]
    aldy/tests/test_minor_real.py ....... [ 86%]
    aldy/tests/test_minor_synthetic.py ...... [ 94%]
    aldy/tests/test_query.py .... [100%]
    =========================== 76 passed in 131.10s (0:02:11) ==========================

Running
Aldy needs a SAM, BAM, CRAM or VCF file for genotyping. We will be using BAM as an example.
.. attention:: It is assumed that reads are mapped to hg19 (GRCh37) or hg38 (GRCh38). Other reference genomes are not yet supported.
An index is needed for BAM files. Get one by running::

    samtools index file.bam

Aldy is invoked as::

    aldy genotype -p [profile] -g [gene] file.bam
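When several genes need to be genotyped from the same BAM, the invocation above can be scripted. The sketch below only builds the command lines and checks for an index next to the BAM; the helper names are hypothetical, and the gene list passed in is up to the caller.

```python
from pathlib import Path


def has_index(bam):
    """True if a BAM index sits next to the file (file.bam.bai or file.bai)."""
    bam = Path(bam)
    return Path(str(bam) + ".bai").exists() or bam.with_suffix(".bai").exists()


def genotype_commands(bam, genes, profile):
    """Build one `aldy genotype` argument list per gene, as shown above."""
    return [
        ["aldy", "genotype", "-p", profile, "-g", gene, str(bam)]
        for gene in genes
    ]


if __name__ == "__main__":
    bam = "file.bam"
    if not has_index(bam):
        print("No index found; run: samtools index", bam)
    for cmd in genotype_commands(bam, ["cyp2d6"], "pgx2"):
        print(" ".join(cmd))
```

Each returned list can be passed to ``subprocess.run(cmd, check=True)`` once the printed commands look right.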
Sequencing profile selection
The [profile] argument refers to the sequencing profile.
The following profiles are available:
- ``illumina`` or ``wgs`` for the Illumina WGS or exome (WXS) data (or any uniform-coverage technology).

  .. attention::
     It is highly recommended to use samples with at least 40x coverage. Anything below 20x might result in noisy copy number calls and missed variants.

- ``pgx1`` for the PGRNseq v.1 capture protocol data
- ``pgx2`` for the PGRNseq v.2 capture protocol data
- ``pgx3`` for the PGRNseq v.3 capture protocol data
- ``10x`` for 10X Genomics data

  .. attention::
     For the best results on the 10X Genomics datasets, use the
     `EMA aligner <https://github.com/arshajii/ema/>`_, especially if doing CYP2D6 analysis. Aldy will also use the EMA read cloud information for improved variant phasing.

- ``exome``, ``wxs``, ``wes`` for the whole-exome sequencing data

  .. attention::
     ⚠️ Be warned: whole-exome data is incomplete by definition, and Aldy will not be able to call major star-alleles defined by their intronic or upstream variants. Aldy also assumes that there are only two (2) gene copies if the
     ``wxs`` profile is used, as it cannot call copy number changes or fusions from exome data.

- ``pacbio-hifi-targeted``, ``pacbio-hifi-targeted-twist`` for PacBio HiFi target capture data

  .. attention::
     The provided PacBio capture profiles are custom and are not standard. Please make sure to generate a custom profile if using different PacBio HiFi capture protocols.
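The coverage guidance given for the ``illumina``/``wgs`` profile (at least 40x recommended, below 20x risky) can be checked before genotyping. The sketch below is not part of Aldy: it assumes per-position depths in the three-column text format printed by ``samtools depth`` (chromosome, position, depth), and the wording of the returned notes is illustrative.

```python
def mean_depth(depth_lines):
    """Average the third column (depth) over all well-formed lines."""
    total = 0
    count = 0
    for line in depth_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        total += int(fields[2])
        count += 1
    return total / count if count else 0.0


def coverage_note(mean):
    """Map a mean depth onto the 40x / 20x guidance quoted above."""
    if mean >= 40:
        return "meets the recommended 40x"
    if mean >= 20:
        return "below the recommended 40x"
    return "below 20x: copy number calls may be noisy and variants missed"


if __name__ == "__main__":
    import sys

    # Example: samtools depth -a -r <region> file.bam | python this_script.py
    print(coverage_note(mean_depth(sys.stdin)))
```

Note that ``samtools depth`` skips zero-coverage positions unless ``-a`` is given, so the mean is inflated without that flag.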
If you are using a different technology (e.g., some home-brewed capture kit), you can proceed provided that the following requirements are met:
- all samples have a similar coverage distribution (i.e., two sequenced samples with the same copy number configuration must have similar coverage profiles; please consult us if you are not sure about this)
- your panel includes a copy-number neutral region (currently, Aldy uses CYP2D8 as a copy-number neutral region, but it can be overridden).
Having said that, you can use a sample BAM that is known to have two copies of the genes you wish to genotype (without any fusions or copy number alterations) as a profile as follows::

    aldy genotype -p profile-sample.bam -g [gene] file.bam -n [cn-neutral-region]

Alternatively, you can generate a profile for your panel/technology by running::

    # Get the profile
    aldy profile profile-sample.bam > my-cool-tech.profile
    # Run Aldy
    aldy genotype -p my-cool-tech.profile -g [gene] file.bam
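The two-step profile workflow above can also be driven from Python. This is a rough sketch under the assumption that ``aldy`` is on the ``PATH``; the function names and the injectable ``run`` parameter (handy for dry runs) are hypothetical and not part of Aldy's API.

```python
import subprocess


def build_profile(sample_bam, profile_path, run=subprocess.run):
    """Equivalent of `aldy profile sample.bam > profile_path`."""
    with open(profile_path, "w") as out:
        run(["aldy", "profile", sample_bam], stdout=out, check=True)
    return profile_path


def genotype_with_profile(bam, gene, profile_path, run=subprocess.run):
    """Equivalent of `aldy genotype -p profile_path -g gene bam`."""
    cmd = ["aldy", "genotype", "-p", profile_path, "-g", gene, bam]
    run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    profile = build_profile("profile-sample.bam", "my-cool-tech.profile")
    genotype_with_profile("file.bam", "cyp2d6", profile)
```

Because ``check=True`` is passed, a failing ``aldy`` call raises ``subprocess.CalledProcessError`` instead of silently leaving an empty profile behind.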
Note: if you are using long-read captures such as PacBio or Nanopore, make sure to add the following lines to the corresponding profile file::

    options:
      sam_long_reads: true

Alternatively, you can pass this flag directly to Aldy as ``--param sam_long_reads=true``.
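Adding those lines by hand is easy to forget when profiles are regenerated. The helper below is a hypothetical sketch that edits the profile as plain text; it assumes the profile is a YAML-style file in which ``options:`` appears at most once as a top-level key, which may not hold for every profile.

```python
def enable_long_reads(profile_text):
    """Return profile text with `sam_long_reads: true` under `options:`."""
    lines = profile_text.splitlines()
    if any(line.strip().startswith("sam_long_reads:") for line in lines):
        return profile_text  # already present; leave the file untouched
    for i, line in enumerate(lines):
        if line.rstrip() == "options:":
            # Reuse the existing top-level options block.
            lines.insert(i + 1, "  sam_long_reads: true")
            break
    else:
        # No options block yet: append one at the end.
        lines += ["options:", "  sam_long_reads: true"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    path = "my-cool-tech.profile"
    with open(path) as handle:
        updated = enable_long_reads(handle.read())
    with open(path, "w") as handle:
        handle.write(updated)
```

If in doubt about the profile layout, passing ``--param sam_long_reads=true`` on the command line avoids touching the file at all.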
Output
By default, Aldy will generate ``file-[gene].aldy``
(the default location can be changed via the ``-o`` parameter).
Aldy also supports VCF file output: to enable it, just append ``.vcf`` to the output file name.
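As a small illustration of the output options, the sketch below builds a command that sets ``-o`` and, when VCF output is wanted, makes sure the file name ends in ``.vcf``. The helper names are hypothetical; only the ``-o`` parameter and the ``.vcf`` suffix rule come from the text above.

```python
def output_args(path, vcf=False):
    """Return the `-o` arguments, adding a .vcf suffix when VCF is requested."""
    if vcf and not path.endswith(".vcf"):
        path += ".vcf"
    return ["-o", path]


def genotype_command(bam, gene, profile, output=None, vcf=False):
    """Build an `aldy genotype` argument list with an optional output file."""
    cmd = ["aldy", "genotype", "-p", profile, "-g", gene]
    if output is not None:
        cmd += output_args(output, vcf)
    return cmd + [bam]


if __name__ == "__main__":
    print(" ".join(genotype_command("file.bam", "cyp2d6", "pgx2", "calls", vcf=True)))
```

Leaving ``output`` unset keeps Aldy's default output location.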
The summary of the calls is shown at the end of the output::

    $ aldy -p pgx2 -g cyp2d6 NA19788.bam
