Pypgx
A Python package for pharmacogenomics (PGx) research
.. This file was automatically generated by docs/create.py.
README
.. image:: https://badge.fury.io/py/pypgx.svg
    :target: https://badge.fury.io/py/pypgx

.. image:: https://readthedocs.org/projects/pypgx/badge/?version=latest
    :target: https://pypgx.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: https://anaconda.org/bioconda/pypgx/badges/version.svg
    :target: https://anaconda.org/bioconda/pypgx

.. image:: https://anaconda.org/bioconda/pypgx/badges/license.svg
    :target: https://github.com/sbslee/pypgx/blob/master/LICENSE

.. image:: https://anaconda.org/bioconda/pypgx/badges/downloads.svg
    :target: https://anaconda.org/bioconda/pypgx/files
Introduction
The main purpose of the PyPGx package is to provide a unified platform for pharmacogenomics (PGx) research. PyPGx is and always will be completely free and open source.
The package is written in Python and supports both a command line interface
(CLI) and an application programming interface (API), whose documentation is
available at Read the Docs <https://pypgx.readthedocs.io/en/latest/>_.
Quick links:
- README <https://pypgx.readthedocs.io/en/latest/readme.html>__
- Genes <https://pypgx.readthedocs.io/en/latest/genes.html>__
- Glossary <https://pypgx.readthedocs.io/en/latest/glossary.html>__
- Tutorials <https://pypgx.readthedocs.io/en/latest/tutorials.html>__
- CLI <https://pypgx.readthedocs.io/en/latest/cli.html>__
- API <https://pypgx.readthedocs.io/en/latest/api.html>__
- SDK <https://pypgx.readthedocs.io/en/latest/sdk.html>__
- FAQ <https://pypgx.readthedocs.io/en/latest/faq.html>__
- Changelog <https://pypgx.readthedocs.io/en/latest/changelog.html>__
PyPGx can predict PGx genotypes (e.g. *4/*5) and phenotypes (e.g.
Poor Metabolizer) using various genomic data, including data from
next-generation sequencing (NGS), single nucleotide polymorphism (SNP) array,
and long-read sequencing. Importantly, for NGS data the package can detect
structural variation (SV) <https://pypgx.readthedocs.io/en/latest/glossary.html#structural-variation-sv>__ using a machine learning-based
approach. Finally, note that PyPGx is compatible with both of the Genome
Reference Consortium Human (GRCh) builds, GRCh37 (hg19) and GRCh38 (hg38).
There are currently 88 pharmacogenes in PyPGx:
.. list-table::

   * - ABCB1
     - ABCG2
     - ACYP2
     - ADRA2A
     - ADRB2
   * - ANKK1
     - APOE
     - ATM
     - BCHE
     - BDNF
   * - CACNA1S
     - CFTR
     - COMT
     - CYP1A1
     - CYP1A2
   * - CYP1B1
     - CYP2A6/CYP2A7
     - CYP2A13
     - CYP2B6/CYP2B7
     - CYP2C8
   * - CYP2C9
     - CYP2C19
     - CYP2D6/CYP2D7
     - CYP2E1
     - CYP2F1
   * - CYP2J2
     - CYP2R1
     - CYP2S1
     - CYP2W1
     - CYP3A4
   * - CYP3A5
     - CYP3A7
     - CYP3A43
     - CYP4A11
     - CYP4A22
   * - CYP4B1
     - CYP4F2
     - CYP17A1
     - CYP19A1
     - CYP26A1
   * - DBH
     - DPYD
     - DRD2
     - F2
     - F5
   * - G6PD
     - GRIK1
     - GRIK4
     - GRIN2B
     - GSTM1
   * - GSTP1
     - GSTT1
     - HTR1A
     - HTR2A
     - IFNL3
   * - IFNL3
     - ITGB3
     - ITPA
     - MT-RNR1
     - MTHFR
   * - NAT1
     - NAT2
     - NUDT15
     - OPRK1
     - OPRM1
   * - POR
     - PTGIS
     - RARG
     - RYR1
     - SLC6A4
   * - SLC15A2
     - SLC22A2
     - SLC28A3
     - SLC47A2
     - SLCO1B1
   * - SLCO1B3
     - SLCO2B1
     - SULT1A1
     - TBXAS1
     - TPMT
   * - UGT1A1
     - UGT1A4
     - UGT1A6
     - UGT2B7
     - UGT2B15
   * - UGT2B17
     - VKORC1
     - XPC
     -
     -
Your contributions (e.g. feature ideas, pull requests) are most welcome.
| Author: Seung-been "Steven" Lee
| Email: sbstevenlee@gmail.com
| License: MIT License
Citation
If you use PyPGx in a published analysis, please report the program version and cite the following article:
- Lee et al., 2022. ClinPharmSeq: A targeted sequencing panel for clinical pharmacogenetics implementation <https://doi.org/10.1371/journal.pone.0272129>__. PLOS ONE.
In this article, PyPGx was used to call star alleles for genomic DNA
reference materials from the Centers for Disease Control and Prevention–based
Genetic Testing Reference Materials Coordination Program (GeT-RM) <https://pypgx.readthedocs.io/en/latest/glossary.html#genetic-testing-reference-materials-coordination-program-get-rm>__, where it
showed almost 100% concordance with genotype results from previous studies.
The development of PyPGx was heavily inspired by Stargazer <https://stargazer.gs.washington.edu/stargazerweb/>__, another star-allele calling
tool developed by Steven during his PhD program at the University of
Washington. Therefore, please also cite the following articles:
- Lee et al., 2019. Calling star alleles with Stargazer in 28 pharmacogenes with whole genome sequences <https://doi.org/10.1002/cpt.1552>__. Clinical Pharmacology & Therapeutics.
- Lee et al., 2018. Stargazer: a software tool for calling star alleles from next-generation sequencing data using CYP2D6 as a model <https://doi.org/10.1038/s41436-018-0054-0>__. Genetics in Medicine.
Below is an incomplete list of publications which have used PyPGx:
- Wroblewski et al., 2022. Pharmacogenetic variation in Neanderthals and Denisovans and implications for human health and response to medications <https://doi.org/10.1101/2021.11.27.470071>__. bioRxiv.
- Botton et al., 2020. Phased Haplotype Resolution of the SLC6A4 Promoter Using Long-Read Single Molecule Real-Time (SMRT) Sequencing <https://doi.org/10.3390/genes11111333>__. Genes.
Support PyPGx
If you find my work useful, please consider becoming a sponsor <https://github.com/sponsors/sbslee>__.
Installation
The following packages are required to run PyPGx:
.. list-table::
   :header-rows: 1

   * - Package
     - Anaconda
     - PyPI
   * - fuc
     - ✅
     - ✅
   * - scikit-learn
     - ✅
     - ✅
   * - openjdk
     - ✅
     - ❌
There are various ways you can install PyPGx. The recommended way is via
conda (Anaconda <https://www.anaconda.com/>__):
.. code-block:: text

    $ conda install -c bioconda pypgx
This command will automatically download and install all the dependencies as well.
Alternatively, you can use pip (PyPI <https://pypi.org/>__) to install
PyPGx and all of its dependencies except openjdk (i.e. Java JDK must be
installed separately):
.. code-block:: text

    $ pip install pypgx
Finally, you can clone the GitHub repository and then install PyPGx locally:
.. code-block:: text

    $ git clone https://github.com/sbslee/pypgx
    $ cd pypgx
    $ pip install .
The nice thing about this approach is that you will have access to
development versions that are not available in Anaconda or PyPI. For example,
you can access a development branch with the git checkout command. When
you do this, please make sure your environment already has all the
dependencies installed.
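Before installing from a local clone, it can help to confirm that the dependencies listed above are visible in the active environment. The following is a minimal sketch, not part of PyPGx: the helper name ``check_pypgx_environment`` is hypothetical, the import name ``fuc`` is assumed to match the package name, and the presence of a ``java`` executable on ``PATH`` is used as a rough proxy for openjdk.

```python
import importlib.util
import shutil


def check_pypgx_environment():
    """Report which PyPGx dependencies are visible in this environment."""
    # Map each package name to the module name used for importing it.
    # 'scikit-learn' is imported as 'sklearn'; 'fuc' is assumed to be
    # imported under its own name.
    modules = {"fuc": "fuc", "scikit-learn": "sklearn"}
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }
    # openjdk is not a Python package, so look for a 'java' executable
    # on PATH instead (an approximation, not a version check).
    status["openjdk"] = shutil.which("java") is not None
    return status


for name, found in check_pypgx_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Any entry reported as ``MISSING`` would need to be installed with conda or pip before running ``pip install .``.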
.. note::
    Beagle <https://faculty.washington.edu/browning/beagle/beagle.html>__
    is one of the default software tools used by PyPGx for haplotype phasing
    of SNVs and indels. The program is freely available and published under the
    GNU General Public License <https://faculty.washington.edu/browning/beagle/gpl_license>__. Users do not need to download Beagle separately
    because a copy of the software (beagle.22Jul22.46e.jar) is already
    included in PyPGx.
.. warning:: You're not done yet! Keep scrolling down to obtain the resource bundle for PyPGx, which is essential for running the package.
Resource bundle
Starting with version 0.12.0, the reference haplotype panel files and
structural variant classifier files have been moved from PyPGx to the
pypgx-bundle repository <https://github.com/sbslee/pypgx-bundle>__
(only those files were moved; other files such as allele-table.csv and
variant-table.csv remain in PyPGx). Therefore, you must clone the
pypgx-bundle branch that matches your PyPGx version into your home
directory so that PyPGx can find the moved files (i.e. replace
x.x.x with the version number of PyPGx you're using, such as 0.18.0):
.. code-block:: text

    $ cd ~
    $ git clone --branch x.x.x --depth 1 https://github.com/sbslee/pypgx-bundle
This is undoubtedly annoying, but absolutely necessary for portability reasons: PyPGx has been growing rapidly in file size due to the increasing number of supported genes and their variation complexity, to the point where it exceeded the upload size limit for PyPI (100 MB). After removal of those files, the size of PyPGx was reduced from >100 MB to <1 MB.
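Because the bundle branch has to match the installed PyPGx version, the clone command can be derived from the version string rather than typed by hand. The following is a minimal sketch, not part of PyPGx: ``bundle_clone_command`` is a hypothetical helper, and it assumes the installed distribution is named ``pypgx`` and that bundle branch names equal the PyPGx version number, as in the command above.

```python
from importlib import metadata

BUNDLE_URL = "https://github.com/sbslee/pypgx-bundle"


def bundle_clone_command(version=None):
    """Build the 'git clone' command for the matching pypgx-bundle branch."""
    if version is None:
        # Look up the version of the installed 'pypgx' distribution.
        # Raises importlib.metadata.PackageNotFoundError if PyPGx is
        # not installed in the current environment.
        version = metadata.version("pypgx")
    return ["git", "clone", "--branch", version, "--depth", "1", BUNDLE_URL]


# Example with an explicit version, so no PyPGx installation is needed.
print(" ".join(bundle_clone_command("0.18.0")))
```

Calling ``bundle_clone_command()`` with no argument uses the installed version, which is also the version to report when citing PyPGx.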
Starting with version 0.22.0, you can now specify a custom location for the
pypgx-bundle directory instead of using the home directory. This can be
achieved by setting the bundle location using the PYPGX_BUNDLE environment
variable:
.. code-block:: text

    $ export PYPGX_BUNDLE=/path/to/pypgx-bundle
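The lookup order described above (the ``PYPGX_BUNDLE`` environment variable first, then the home directory) can be sketched in a few lines of Python. This illustrates the documented behavior and is not PyPGx's own implementation; ``resolve_bundle_dir`` is a hypothetical name, and the default directory name ``pypgx-bundle`` is assumed from the clone command.

```python
import os
from pathlib import Path


def resolve_bundle_dir(environ=None):
    """Return the directory where pypgx-bundle is expected to live."""
    # Allow a mapping to be passed in so the lookup is easy to test;
    # fall back to the real process environment otherwise.
    environ = os.environ if environ is None else environ
    custom = environ.get("PYPGX_BUNDLE")
    if custom:
        # A custom location takes priority over the home directory.
        return Path(custom).expanduser()
    # Default: the repository cloned into the user's home directory.
    return Path.home() / "pypgx-bundle"


print(resolve_bundle_dir({"PYPGX_BUNDLE": "/path/to/pypgx-bundle"}))
print(resolve_bundle_dir({}))
```

A check such as ``resolve_bundle_dir().is_dir()`` is a quick way to confirm that the bundle was cloned where it is expected.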
Structural variation detection
Many pharmacogenes are known to have structural variation (SV) <https://pypgx.readthedocs.io/en/latest/glossary.html#structural-variation-sv>__ such as gene deletions, duplications, and hybrids. You can visit the
Genes <https://pypgx.readthedocs.io/en/latest/genes.html>__ page to see the
list of genes with SV.
Some of the SV events can be quite challenging to detect accurately with NGS
data due to misalignment of sequence reads caused by sequence homology with
other gene family members (e.g. CYP2D6 and CYP2D7). PyPGx attempts to address
this issue by training a support vector machine (SVM) <https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html>-based multiclass
classifier using the one-vs-rest strategy <https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html> for each
gene for each GRCh build. Each classifier is tra
