
Pypgx

A Python package for pharmacogenomics (PGx) research


.. This file was automatically generated by docs/create.py.

README


.. image:: https://badge.fury.io/py/pypgx.svg
    :target: https://badge.fury.io/py/pypgx

.. image:: https://readthedocs.org/projects/pypgx/badge/?version=latest
    :target: https://pypgx.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: https://anaconda.org/bioconda/pypgx/badges/version.svg
    :target: https://anaconda.org/bioconda/pypgx

.. image:: https://anaconda.org/bioconda/pypgx/badges/license.svg
    :target: https://github.com/sbslee/pypgx/blob/master/LICENSE

.. image:: https://anaconda.org/bioconda/pypgx/badges/downloads.svg
    :target: https://anaconda.org/bioconda/pypgx/files

Introduction

The main purpose of the PyPGx package is to provide a unified platform for pharmacogenomics (PGx) research. PyPGx is and always will be completely free and open source.

The package is written in Python and supports both a command line interface (CLI) and an application programming interface (API), both of which are documented at Read the Docs <https://pypgx.readthedocs.io/en/latest/>_.

Quick links:

  • README <https://pypgx.readthedocs.io/en/latest/readme.html>__
  • Genes <https://pypgx.readthedocs.io/en/latest/genes.html>__
  • Glossary <https://pypgx.readthedocs.io/en/latest/glossary.html>__
  • Tutorials <https://pypgx.readthedocs.io/en/latest/tutorials.html>__
  • CLI <https://pypgx.readthedocs.io/en/latest/cli.html>__
  • API <https://pypgx.readthedocs.io/en/latest/api.html>__
  • SDK <https://pypgx.readthedocs.io/en/latest/sdk.html>__
  • FAQ <https://pypgx.readthedocs.io/en/latest/faq.html>__
  • Changelog <https://pypgx.readthedocs.io/en/latest/changelog.html>__

PyPGx can predict PGx genotypes (e.g. *4/*5) and phenotypes (e.g. Poor Metabolizer) using various genomic data, including data from next-generation sequencing (NGS), single nucleotide polymorphism (SNP) array, and long-read sequencing. Importantly, for NGS data the package can detect structural variation (SV) <https://pypgx.readthedocs.io/en/latest/glossary.html#structural-variation-sv>__ using a machine learning-based approach. Finally, note that PyPGx is compatible with both of the Genome Reference Consortium Human (GRCh) builds, GRCh37 (hg19) and GRCh38 (hg38).

There are currently 88 pharmacogenes in PyPGx:

.. list-table::

    • ABCB1
    • ABCG2
    • ACYP2
    • ADRA2A
    • ADRB2
    • ANKK1
    • APOE
    • ATM
    • BCHE
    • BDNF
    • CACNA1S
    • CFTR
    • COMT
    • CYP1A1
    • CYP1A2
    • CYP1B1
    • CYP2A6/CYP2A7
    • CYP2A13
    • CYP2B6/CYP2B7
    • CYP2C8
    • CYP2C9
    • CYP2C19
    • CYP2D6/CYP2D7
    • CYP2E1
    • CYP2F1
    • CYP2J2
    • CYP2R1
    • CYP2S1
    • CYP2W1
    • CYP3A4
    • CYP3A5
    • CYP3A7
    • CYP3A43
    • CYP4A11
    • CYP4A22
    • CYP4B1
    • CYP4F2
    • CYP17A1
    • CYP19A1
    • CYP26A1
    • DBH
    • DPYD
    • DRD2
    • F2
    • F5
    • G6PD
    • GRIK1
    • GRIK4
    • GRIN2B
    • GSTM1
    • GSTP1
    • GSTT1
    • HTR1A
    • HTR2A
    • IFNL3
    • IFNL3
    • ITGB3
    • ITPA
    • MT-RNR1
    • MTHFR
    • NAT1
    • NAT2
    • NUDT15
    • OPRK1
    • OPRM1
    • POR
    • PTGIS
    • RARG
    • RYR1
    • SLC6A4
    • SLC15A2
    • SLC22A2
    • SLC28A3
    • SLC47A2
    • SLCO1B1
    • SLCO1B3
    • SLCO2B1
    • SULT1A1
    • TBXAS1
    • TPMT
    • UGT1A1
    • UGT1A4
    • UGT1A6
    • UGT2B7
    • UGT2B15
    • UGT2B17
    • VKORC1
    • XPC

Your contributions (e.g. feature ideas, pull requests) are most welcome.

| Author: Seung-been "Steven" Lee
| Email: sbstevenlee@gmail.com
| License: MIT License

Citation

If you use PyPGx in a published analysis, please report the program version and cite the following article:

  • Lee et al., 2022. ClinPharmSeq: A targeted sequencing panel for clinical pharmacogenetics implementation <https://doi.org/10.1371/journal.pone.0272129>__. PLOS ONE.

In this article, PyPGx was used to call star alleles for genomic DNA reference materials from the Centers for Disease Control and Prevention–based Genetic Testing Reference Materials Coordination Program (GeT-RM) <https://pypgx.readthedocs.io/en/latest/glossary.html#genetic-testing-reference-materials-coordination-program-get-rm>__, where it showed almost 100% concordance with genotype results from previous works.

The development of PyPGx was heavily inspired by Stargazer <https://stargazer.gs.washington.edu/stargazerweb/>__, another star-allele calling tool, which Steven developed during his PhD program at the University of Washington. Therefore, please also cite the following articles:

  • Lee et al., 2019. Calling star alleles with Stargazer in 28 pharmacogenes with whole genome sequences <https://doi.org/10.1002/cpt.1552>__. Clinical Pharmacology & Therapeutics.
  • Lee et al., 2018. Stargazer: a software tool for calling star alleles from next-generation sequencing data using CYP2D6 as a model <https://doi.org/10.1038/s41436-018-0054-0>__. Genetics in Medicine.

Below is an incomplete list of publications which have used PyPGx:

  • Wroblewski et al., 2022. Pharmacogenetic variation in Neanderthals and Denisovans and implications for human health and response to medications <https://doi.org/10.1101/2021.11.27.470071>__. bioRxiv.
  • Botton et al., 2020. Phased Haplotype Resolution of the SLC6A4 Promoter Using Long-Read Single Molecule Real-Time (SMRT) Sequencing <https://doi.org/10.3390/genes11111333>__. Genes.

Support PyPGx

If you find my work useful, please consider becoming a sponsor <https://github.com/sponsors/sbslee>__.

Installation

The following packages are required to run PyPGx:

.. list-table::
   :header-rows: 1

   * - Package
     - Anaconda
     - PyPI
   * - fuc
     - Yes
     - Yes
   * - scikit-learn
     - Yes
     - Yes
   * - openjdk
     - Yes
     - No

There are various ways you can install PyPGx. The recommended way is via conda (Anaconda <https://www.anaconda.com/>__):

.. code-block:: text

    $ conda install -c bioconda pypgx

The command above will also download and install all the dependencies automatically. Alternatively, you can use pip (PyPI <https://pypi.org/>__) to install PyPGx and all of its dependencies except openjdk (i.e. a Java JDK must be installed separately):

.. code-block:: text

    $ pip install pypgx
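
Because the pip route leaves Java for you to install, it can be worth confirming that a ``java`` executable is on your ``PATH`` before running PyPGx. Below is a minimal sketch using only the Python standard library. The function name ``java_available`` is made up for this example and is not part of PyPGx:

```python
import shutil


def java_available() -> bool:
    """Return True if a `java` executable can be found on the PATH."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    if java_available():
        print("Java found on PATH.")
    else:
        print("Java not found; install a JDK (e.g. openjdk) before running PyPGx.")
```

This only checks that an executable named ``java`` exists; it does not verify the Java version.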

Finally, you can clone the GitHub repository and then install PyPGx locally:

.. code-block:: text

    $ git clone https://github.com/sbslee/pypgx
    $ cd pypgx
    $ pip install .

The nice thing about this approach is that you will have access to development versions that are not available in Anaconda or PyPI. For example, you can access a development branch with the git checkout command. When you do this, please make sure your environment already has all the dependencies installed.

.. note:: Beagle <https://faculty.washington.edu/browning/beagle/beagle.html>__ is one of the default software tools used by PyPGx for haplotype phasing of SNVs and indels. The program is freely available and published under the GNU General Public License <https://faculty.washington.edu/browning/beagle/gpl_license>__. Users do not need to download Beagle separately because a copy of the software (beagle.22Jul22.46e.jar) is already included in PyPGx.

.. warning:: You're not done yet! Keep scrolling down to obtain the resource bundle for PyPGx, which is essential for running the package.

Resource bundle

Starting with version 0.12.0, the reference haplotype panel files and structural variant classifier files in PyPGx have been moved to the pypgx-bundle repository <https://github.com/sbslee/pypgx-bundle>__ (only those files were moved; other files such as allele-table.csv and variant-table.csv remain in PyPGx). Therefore, users must clone the branch of the pypgx-bundle repository that matches their PyPGx version into their home directory so that PyPGx can find the moved files (i.e. replace x.x.x with the version number of PyPGx you're using, such as 0.18.0):

.. code-block:: text

    $ cd ~
    $ git clone --branch x.x.x --depth 1 https://github.com/sbslee/pypgx-bundle
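
If you prefer to script this step, the same clone command can be assembled from a version string. The sketch below is only an illustration: the helper names ``bundle_clone_command`` and ``clone_bundle`` are hypothetical and not part of PyPGx, and the strict x.x.x check is an assumption about the version format.

```python
import re
import subprocess
from pathlib import Path

BUNDLE_URL = "https://github.com/sbslee/pypgx-bundle"


def bundle_clone_command(version: str, parent: Path) -> list:
    """Build the `git clone` command for the bundle branch matching `version`."""
    # Assumption: versions look like 0.18.0 (three dot-separated integers).
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected a version like 0.18.0, got {version!r}")
    return [
        "git", "clone", "--branch", version, "--depth", "1",
        BUNDLE_URL, str(parent / "pypgx-bundle"),
    ]


def clone_bundle(version: str, parent: Path = Path.home()) -> None:
    """Run the clone command; requires git and network access."""
    subprocess.run(bundle_clone_command(version, parent), check=True)
```

For example, ``clone_bundle("0.18.0")`` would clone the 0.18.0 branch into the home directory. If PyPGx is already installed, ``importlib.metadata.version("pypgx")`` from the standard library can supply the version string.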

This is undoubtedly annoying, but absolutely necessary for portability reasons: PyPGx has been growing rapidly in file size due to the increasing number of supported genes and the complexity of their variation, to the point where it exceeded the upload size limit for PyPI (100 MB). After removal of those files, the size of PyPGx dropped from >100 MB to <1 MB.

Starting with version 0.22.0, you can specify a custom location for the pypgx-bundle directory instead of using the home directory. To do so, set the PYPGX_BUNDLE environment variable to the bundle location:

.. code-block:: text

    $ export PYPGX_BUNDLE=/path/to/pypgx-bundle
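
The lookup order described above (the PYPGX_BUNDLE variable first, then the home directory) can be sketched in a few lines of Python. This mirrors the documented behavior only and is not PyPGx's internal code; the function name ``resolve_bundle_dir`` is hypothetical.

```python
import os
from pathlib import Path


def resolve_bundle_dir(env=None) -> Path:
    """Return PYPGX_BUNDLE if it is set, otherwise ~/pypgx-bundle."""
    # `env` defaults to the real environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    custom = env.get("PYPGX_BUNDLE")
    return Path(custom) if custom else Path.home() / "pypgx-bundle"


if __name__ == "__main__":
    bundle = resolve_bundle_dir()
    status = "exists" if bundle.is_dir() else "is missing"
    print(f"Bundle directory {bundle} {status}.")
```

Running the script is a quick way to check which directory would be used under your current environment and whether it is actually present.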

Structural variation detection

Many pharmacogenes are known to have structural variation (SV) <https://pypgx.readthedocs.io/en/latest/glossary.html#structural-variation-sv>__ such as gene deletions, duplications, and hybrids. You can visit the Genes <https://pypgx.readthedocs.io/en/latest/genes.html>__ page to see the list of genes with SV.

Some of the SV events can be quite challenging to detect accurately with NGS data due to misalignment of sequence reads caused by sequence homology with other gene family members (e.g. CYP2D6 and CYP2D7). PyPGx attempts to address this issue by training a support vector machine (SVM) <https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html>__-based multiclass classifier using the one-vs-rest strategy <https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html>__ for each gene for each GRCh build. Each classifier is tra
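
As a rough illustration of the one-vs-rest SVM idea mentioned above, the sketch below trains scikit-learn's ``OneVsRestClassifier`` around an ``SVC`` on synthetic data. Everything about the data is an assumption made for this example: the four-value "copy number profiles", the class names, and the noise level are invented and do not reflect PyPGx's actual features, training data, or hyperparameters.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Hypothetical idealized profiles: one copy-number value per genomic window.
CENTERS = {
    "normal": [2, 2, 2, 2],
    "deletion": [1, 1, 1, 1],
    "duplication": [3, 3, 3, 3],
    "hybrid": [2, 2, 1, 1],
}

rng = np.random.default_rng(0)
X, y = [], []
for label, center in CENTERS.items():
    # 20 noisy synthetic samples around each idealized profile.
    samples = np.asarray(center, dtype=float) + rng.normal(0, 0.05, size=(20, 4))
    X.extend(samples)
    y.extend([label] * 20)

# One binary SVM (default RBF kernel) is fitted per class; the class whose
# classifier gives the highest decision value wins.
clf = OneVsRestClassifier(SVC())
clf.fit(np.array(X), y)

predictions = clf.predict(np.array(list(CENTERS.values()), dtype=float))
print(dict(zip(CENTERS, predictions)))
```

The default RBF kernel is used here because, in this toy setup, the "normal" profile sits between the deletion and duplication profiles, so a single linear boundary could not separate it from the rest.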
