Aldy

Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes

.. raw:: html

   <h1 align="center"> <img src="https://user-images.githubusercontent.com/10132487/100571499-1ee1fd00-3288-11eb-9760-75c4b0b98d2a.png" alt="Aldy" width=100px/> </h1> <p align="center"> <a href="https://badge.fury.io/py/aldy"><img src="https://badge.fury.io/py/aldy.svg" alt="Version"/></a> <img src="https://github.com/0xTCG/aldy/workflows/aldy-test/badge.svg" alt="CI Status"/> <a href="https://aldy.readthedocs.io/en/latest/?badge=latest"><img src="https://readthedocs.org/projects/aldy/badge/?version=latest" alt="ReadTheDocs"/></a> <a href="https://codecov.io/github/0xTCG/aldy"><img src="https://codecov.io/github/0xTCG/aldy/coverage.svg?branch=master" alt="Code Coverage"/></a> <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Black"/></a> <br/> <a href="https://www.nature.com/articles/s41467-018-03273-1"><img src="https://img.shields.io/badge/Published%20in-Nature%20Communications-red.svg" alt="Published in Nature Communications" /></a> <a href="https://genome.cshlp.org/content/33/1/61.full"><img src="https://img.shields.io/badge/Published%20in-Genome%20Research-purple.svg" alt="Published in Genome Research" /></a> <br/> <b><i>A quick and nifty tool for genotyping and phasing popular pharmacogenes.</i></b> </p>

Aldy 4 calls genotypes of many highly polymorphic pharmacogenes and reports them in a phased star-allele nomenclature. It can also call copy number of a given pharmacogene and genotype each copy present in the sample—something that standard genotype callers like GATK cannot do.

Algorithm details

TL;DR: Aldy 4 uses star-allele databases to guide the process of detecting the most likely genotype. The optimization is done in three stages via integer linear programming. See `Gene Support`_ for more details about the supported pharmacogene databases.

More details, together with the API documentation, are available at `Read the Docs <https://aldy.readthedocs.io/en/latest/>`_.

Experimental data is available `here <paper>`_.

If you are using Aldy, please cite our papers in `Nature Communications <https://www.nature.com/articles/s41467-018-03273-1>`_ and `Genome Research <https://genome.cshlp.org/content/33/1/61.full>`_.

⚠️ Warning

Please read this carefully if you are using Aldy in a clinical or commercial environment.

Aldy is a computational tool whose purpose is to aid the genotype detection process. It can be of tremendous help in that process. However, it is not perfect, and it can easily make a wrong call if the data is noisy or ambiguous, or if the target sample contains a previously unknown allele.

☣️🚨 Do not use the raw output of Aldy (or any other computational tool for that matter) to diagnose a disease or prescribe a drug! You are responsible for inspecting and validating the results (ideally in a wet lab) before doing something that can have major consequences. 🚨☣️

We really mean it.

Finally, note that the allele databases are still a work in progress and that we still do not know the downstream impact of the vast majority of genotypes.

Installation

Aldy is written in Python and requires Python 3.7+ to run. It is intended to be run on POSIX-based systems (so far, only Linux and macOS have been tested).

The easiest way to install Aldy is to use pip::

    pip install aldy

Append --user to the previous command to install Aldy locally if you cannot write to the system-wide Python directory.

Prerequisite: ILP solver

Aldy requires a mixed integer solver to run.

The following solvers are currently supported:

  • `CBC / Google OR-Tools <https://developers.google.com/optimization/>`_: a free, open-source MIP solver that is shipped by default with Google's OR-Tools. pip installs it by default when installing Aldy.

     If you have trouble installing `ortools` on a Nix-based Linux distro, try this::
    
         pip install --platform=manylinux1_x86_64 --only-binary=:all: --target ~/.local/lib/python3.8/site-packages ortools
    
  • `Gurobi <http://www.gurobi.com>`_: a commercial solver which is free for academic purposes. It is the most thoroughly tested solver: if you encounter any issues with CBC, try Gurobi. After installing it, don't forget to install the gurobipy package by going to Gurobi's installation directory (e.g., /opt/gurobi/linux64 on Linux or /Library/gurobi751/mac64/ on macOS) and typing::

    python3 setup.py install
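
Before running Aldy, it can help to confirm that at least one solver is importable from the Python environment you installed it into. The following is a minimal sketch, not part of Aldy itself: the helper name ``available_solvers`` is hypothetical, and it assumes the solvers are importable as ``ortools`` and ``gurobipy``.

```python
import importlib.util

# Assumption: OR-Tools (which ships CBC) is importable as "ortools" and
# Gurobi's Python bindings as "gurobipy". The labels are for display only.
SOLVER_MODULES = {"cbc": "ortools", "gurobi": "gurobipy"}


def available_solvers(modules=SOLVER_MODULES):
    """Return the labels of all solvers whose Python module can be located."""
    found = []
    for label, module in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is not None:
            found.append(label)
    return found


if __name__ == "__main__":
    solvers = available_solvers()
    print("Importable solvers:", ", ".join(solvers) or "none")
```

This only checks that the module can be found; it does not verify that a Gurobi license is valid or that the solver actually runs.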
    

Sanity check

After installing Aldy and a compatible ILP solver, please make sure to test the installation by issuing the following command (this should take a few minutes)::

    aldy test

In case everything is set up properly, you should see something like this::

    🐿  Aldy v4.0 (Python 3.7.5 on macOS 12.4)
        (c) 2016-2022 Aldy Authors. All rights reserved.
        Free for non-commercial/academic use only.
    ================================ test session starts ================================
    platform darwin -- Python 3.7.5, pytest-5.3.1, py-1.8.0, pluggy-0.13.1
    rootdir: aldy, inifile: setup.cfg
    plugins: anyio-3.6.1, xdist-1.31.0, cov-2.10.1, forked-1.1.3
    collected 76 items
    aldy/tests/test_cn_real.py ........                                            [ 10%]
    aldy/tests/test_cn_synthetic.py .....                                          [ 17%]
    aldy/tests/test_diplotype_real.py ....                                         [ 22%]
    aldy/tests/test_diplotype_synthetic.py ......                                  [ 30%]
    aldy/tests/test_full.py ...........                                            [ 44%]
    aldy/tests/test_gene.py .......                                                [ 53%]
    aldy/tests/test_major_real.py ...........                                      [ 68%]
    aldy/tests/test_major_synthetic.py .......                                     [ 77%]
    aldy/tests/test_minor_real.py .......                                          [ 86%]
    aldy/tests/test_minor_synthetic.py ......                                      [ 94%]
    aldy/tests/test_query.py ....                                                  [100%]
    =========================== 76 passed in 131.10s (0:02:11) ==========================

Running

Aldy needs a SAM, BAM, CRAM or VCF file for genotyping. We will be using BAM as an example.

.. attention:: It is assumed that reads are mapped to hg19 (GRCh37) or hg38 (GRCh38). Other reference genomes are not yet supported.

An index is needed for BAM files. Get one by running::

    samtools index file.bam

Aldy is invoked as::

    aldy genotype -p [profile] -g [gene] file.bam
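
If you drive Aldy from a script, the invocation above can be assembled programmatically. The sketch below is illustrative only: ``has_bam_index`` and ``genotype_command`` are hypothetical helpers, and the index check assumes the usual ``file.bam.bai`` or ``file.bai`` naming (other index formats are not checked).

```python
from pathlib import Path


def has_bam_index(bam_path):
    """Check for an index next to the BAM (assumed names: x.bam.bai or x.bai)."""
    bam = Path(bam_path)
    candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
    return any(candidate.exists() for candidate in candidates)


def genotype_command(bam_path, gene, profile):
    """Build the argument list for `aldy genotype -p PROFILE -g GENE FILE`."""
    return ["aldy", "genotype", "-p", profile, "-g", gene, str(bam_path)]
```

The returned list can be passed to ``subprocess.run(..., check=True)``, for example ``subprocess.run(genotype_command("file.bam", "cyp2d6", "illumina"), check=True)``, after confirming that ``has_bam_index("file.bam")`` is true.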

Sequencing profile selection

The [profile] argument refers to the sequencing profile. The following profiles are available:

  • illumina or wgs for the Illumina WGS or exome (WXS) data (or any uniform-coverage technology).

    .. attention::

       It is highly recommended to use samples with at least 40x coverage. Anything below 20x might result in noisy copy number calls and missed variants.

  • pgx1 for the PGRNseq v.1 capture protocol data

  • pgx2 for the PGRNseq v.2 capture protocol data

  • pgx3 for the PGRNseq v.3 capture protocol data

  • 10x for 10X Genomics data

    .. attention::

       For the best results on the 10X Genomics datasets, use the `EMA aligner <https://github.com/arshajii/ema/>`_, especially if doing CYP2D6 analysis. Aldy will also use the EMA read cloud information for improved variant phasing.

  • exome, wxs, wes for the whole-exome sequencing data

    .. attention::

       ⚠️ Be warned: whole-exome data is incomplete by definition, and Aldy will not be able to call major star-alleles defined by their intronic or upstream variants. Aldy also assumes that there are only two (2) gene copies if the wxs profile is used, as it cannot call copy number changes or fusions from exome data.

  • pacbio-hifi-targeted, pacbio-hifi-targeted-twist for PacBio HiFi target capture data

    .. attention::

       The provided PacBio capture profiles are custom and are not standard. Please make sure to generate a custom profile if you use a different PacBio HiFi capture protocol.
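
The profile names and the coverage guidance above can be turned into a small pre-flight check. This is a rough sketch rather than anything Aldy provides: the function names and the returned labels are hypothetical, and only the 40x and 20x thresholds come from the text above.

```python
# Built-in profile names as listed above. Anything else is assumed to be
# a path to a custom profile file or to a profile BAM.
BUILTIN_PROFILES = {
    "illumina", "wgs", "pgx1", "pgx2", "pgx3", "10x",
    "exome", "wxs", "wes",
    "pacbio-hifi-targeted", "pacbio-hifi-targeted-twist",
}


def is_builtin_profile(name):
    """Return True if `name` is one of the profile names listed above."""
    return name in BUILTIN_PROFILES


def coverage_note(mean_coverage):
    """Classify mean coverage using the 40x and 20x guidance.

    The labels are illustrative and are not emitted by Aldy.
    """
    if mean_coverage >= 40:
        return "ok"
    if mean_coverage >= 20:
        return "below recommended"
    return "low"
```

How the mean coverage is computed is left open here; any depth summary of the target region would do for a rough check.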

If you are using a different technology (e.g., some home-brewed capture kit), you can proceed provided that the following requirements are met:

  • all samples have a similar coverage distribution (i.e., two sequenced samples with the same copy number configuration must have similar coverage profiles; please consult us if you are not sure about this)
  • your panel includes a copy-number neutral region (currently, Aldy uses CYP2D8 as a copy-number neutral region, but it can be overridden).

Having said that, you can use a sample BAM that is known to have two copies of the genes you wish to genotype (without any fusions or copy number alterations) as a profile as follows::

    aldy genotype -p profile-sample.bam -g [gene] file.bam -n [cn-neutral-region]

Alternatively, you can generate a profile for your panel/technology by running::

    # Get the profile
    aldy profile profile-sample.bam > my-cool-tech.profile
    # Run Aldy
    aldy genotype -p my-cool-tech.profile -g [gene] file.bam
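
The same profile-generation step can be scripted without a shell redirect. The sketch below is one possible way to do it; ``profile_command`` and ``write_profile`` are hypothetical helper names, and the only Aldy behaviour assumed is the one shown above, namely that ``aldy profile`` writes the profile to standard output.

```python
import subprocess


def profile_command(sample_bam):
    """Build the argument list for `aldy profile SAMPLE.bam`."""
    return ["aldy", "profile", str(sample_bam)]


def write_profile(sample_bam, profile_path):
    """Run `aldy profile` and save its standard output to `profile_path`."""
    with open(profile_path, "w") as out:
        # check=True raises CalledProcessError if Aldy exits with an error.
        subprocess.run(profile_command(sample_bam), stdout=out, check=True)
    return profile_path
```

Note that if the command fails, an empty or partial file may be left at ``profile_path``, so it is worth deleting it before retrying.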

Note: if you are using long-read captures such as PacBio or Nanopore, make sure to add the following lines to the corresponding profile file::

    options:
      sam_long_reads: true

Alternatively, you can pass this flag directly to Aldy as --param sam_long_reads=true.
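
When Aldy is launched from a script, the command-line form of this option can be appended to an existing argument list. This is a minimal sketch with a hypothetical helper name; it assumes the option may follow the other arguments.

```python
def with_long_reads(command):
    """Return a copy of an Aldy argument list with the long-read flag added."""
    return list(command) + ["--param", "sam_long_reads=true"]
```

For example, ``with_long_reads(["aldy", "genotype", "-p", "my-cool-tech.profile", "-g", "cyp2d6", "file.bam"])`` returns the same list followed by ``--param`` and ``sam_long_reads=true``, leaving the original list unchanged.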

Output

By default, Aldy will generate file-[gene].aldy (the default location can be changed via the -o parameter). Aldy also supports VCF output: to enable it, just append .vcf to the output file name. The summary of the calls is shown at the end of the output::

    $ aldy genotype -p pgx2 -g cyp2d6 NA19788.bam