PanTax
PanTax: Strain-level metagenomic profiling using pangenome graphs
Read more about PanTax here:
Strain-level metagenomic profiling using pangenome graphs with PanTax
For a collection of more profiling tools (species and/or strain level), please refer to: metagenome-profling-tools
Table of Contents
- Overview
- Solvers to Know Before Installation
- Installation
- Gurobi license
- Genome preprocessing
- Running
- Options
- PanTax output
- Examples
- Possible issues during installation
- Change
- TODO
- Citation
- Patent
[!IMPORTANT] PanTax v2.1.0 is released.
In this release, the entire codebase has been rewritten in a single language, Rust, replacing the previous mixture of Shell, Python, and Rust. Several new features have also been introduced. For details, see the release changelog.
Overview
PanTax is a pangenome graph-based taxonomic profiling tool designed for accurate strain-level classification of metagenomic sequencing data. Unlike traditional methods that rely on multiple linear reference genomes, PanTax leverages pangenome graphs to better represent genetic variation and relationships across related genomes. It supports both short and long reads, works across single or multiple species, and delivers superior precision or recall at the strain level. PanTax provides a robust, scalable solution to taxonomic classification for strain resolution, overcoming key limitations of existing tools.
Solvers to Know Before Installation
Before installation, please note that the Path Abundance Optimization (PAO) module in PanTax depends on an ILP solver.
- Gurobi is the recommended solver and generally provides the best performance.
  Gurobi is a commercial solver, and its free (Community) edition is not suitable for solving large-scale optimization problems. Furthermore, deploying and using Gurobi on HPC and large-scale server clusters may introduce additional complexity. If you need to use Gurobi, please refer to Gurobi license to obtain a license. Academic users may apply for a free academic license from Gurobi.
- CPLEX is also a recommended solver.
  CPLEX is also a commercial solver, and its free (Community) edition is not suitable for solving large-scale optimization problems. We therefore recommend that researchers use the academic edition, which removes these limitations. Installing CPLEX is relatively complex: users need to register on the IBM website, download the installer locally, and complete the installation manually.
- We also support several open-source ILP solvers, including:
  - HiGHS
  - CBC
  - GLPK
In our tests, these solvers produce solutions that are comparable in quality to Gurobi, but they are significantly slower.
Installation
PanTax is now distributed as multiple executables based on different solvers:
pantax (gurobi, highs, cbc, glpk) — default
pantax-gb (gurobi)
pantax-cp (cplex)
pantax-free (highs, cbc, glpk)
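The list above can be turned into a small lookup when scripting around PanTax. The following is a minimal sketch, not part of PanTax itself: the function name `pantax_binary_for_solver` is hypothetical, and it assumes you want the most specific build for a given solver (the default `pantax` executable also covers gurobi, highs, cbc, and glpk).

```shell
# Hypothetical helper (not shipped with PanTax): map a solver name to the
# solver-specific executable listed above.
pantax_binary_for_solver() {
  case "$1" in
    gurobi)         echo "pantax-gb" ;;
    cplex)          echo "pantax-cp" ;;
    highs|cbc|glpk) echo "pantax-free" ;;
    *)              echo "unknown solver: $1" >&2; return 1 ;;
  esac
}
```

For example, `"$(pantax_binary_for_solver cplex)" -h` would invoke `pantax-cp -h`, provided that executable is on your PATH.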
- From bioconda
conda install -c bioconda -c conda-forge pantax
conda install -c gurobi gurobi=11
## Run pantax.
pantax -h
- From source
git clone https://github.com/LuoGroup2023/PanTax.git -b main
conda create -n pantax
conda activate pantax
conda install -c bioconda -c conda-forge -c gurobi -c defaults \
python=3.10 \
r-base=4.2 \
pggb=0.6.0 \
vg=1.59 \
graphaligner \
sylph \
fastani \
pandas \
numpy \
tqdm \
networkx \
pyarrow \
gurobi=11 \
clang \
rust=1.82 \
hdf5=1.10.5 \
glpk \
coin-or-cbc \
htslib
cd PanTax
# default
bash install.sh
# cplex
# for example: bash install.sh cplex /home/work/wenhai/tools/cplex/CPLEX_Studio1210/cplex
bash install.sh cplex /path/to/cplex
# Run pantax
cd ../scripts
./pantax -h
If you run into problems setting up the environment, you can also build it with conda env create -f pantax.yaml -y.
- From docker
cd docker
docker build -t pantax:v1 .
# 1. run directly in your path with data
docker run -v $(dirname $PWD):/mnt -w /mnt/$(basename $PWD) pantax:v1 pantax -h
# 2. start an interactive docker container session and run in your path with data
docker run -it --rm -v $(dirname $PWD):/mnt -w /mnt/$(basename $PWD) -v /var/run/docker.sock:/var/run/docker.sock pantax:v1 /bin/bash
conda activate pantax
pantax -h
Gurobi license
Please follow the steps below to install Gurobi and obtain a license.
- Get Gurobi and the license
Gurobi is a commercial ILP solver with two licensing options: (1) a single-host license where the license is tied to a single computer and (2) a network license for use in a compute cluster (using a license server in the cluster). Both options are freely and easily available for users in academia. Download Gurobi for your specific platform. Note that the grb we use relies on Gurobi version 11.
To obtain your free academic license for Gurobi, please refer to the following resources:
- For an Academic Named-User License, visit: https://www.gurobi.com/features/academic-named-user-license/
- For an Academic WLS (Web License Service) License, visit: https://www.gurobi.com/features/academic-wls-license/
- Alternatively, you can explore the available options and choose the license that best suits your needs at: https://www.gurobi.com/academia/academic-program-and-licenses/
Here is an example of how to download Gurobi (no login required):
wget https://packages.gurobi.com/11.0/gurobi11.0.3_linux64.tar.gz
- Set environment variable
export GUROBI_HOME="/path/to/gurobi1103/linux64"
export PATH="${PATH}:${GUROBI_HOME}/bin"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${GUROBI_HOME}/lib"
export GRB_LICENSE_FILE=/path/to/gurobi.lic
The most important step is to set the environment variable GRB_LICENSE_FILE.
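Before launching a long run, it can help to confirm that the license variable is actually set and points to a readable file. The function below is a rough sketch under that assumption; `check_gurobi_env` is a hypothetical name and not part of PanTax or Gurobi, and it does not verify that the license itself is valid.

```shell
# Hypothetical pre-flight check: verifies only that GRB_LICENSE_FILE is set
# and that the file it names is readable. It does not validate the license.
check_gurobi_env() {
  if [ -z "${GRB_LICENSE_FILE:-}" ]; then
    echo "GRB_LICENSE_FILE is not set" >&2
    return 1
  fi
  if [ ! -r "$GRB_LICENSE_FILE" ]; then
    echo "license file not readable: $GRB_LICENSE_FILE" >&2
    return 2
  fi
  echo "license file found: $GRB_LICENSE_FILE"
}
```

A typical use would be `check_gurobi_env && pantax -h`, so that a missing license is reported before any work starts.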
Genome preprocessing
We recommend first removing plasmids and redundant genomes with the --remove, --compute, and --cluster options. This produces a final file containing the genome information in /path/to/database.
If all genomes are in the NCBI RefSeq database, you only need to use the -r option to specify the directory containing these genomes.
/path/to/PanTax/scripts/pantax-rg -r ref --remove --cluster
Otherwise, you need to provide a file containing information about the custom genomes.
/path/to/PanTax/scripts/pantax-rg -c genomes_info.txt --remove --cluster
The genomes_info.txt file lists the reference genomes in FASTA format that constitute PanTax's original database, along with NCBI's taxonomic information. Each input line should contain at least 5 tab-delimited fields; from left to right, they are genome ID, strain taxonomic ID, species taxonomic ID, organism name, and the absolute path to the genome.
Here is an example format of genomes_info.txt file:
genome_ID	strain_taxid	species_taxid	organism_name	id
GCF_000218545.1_ASM21854v1	593907	11	Cellulomonas gilvus ATCC 13127	/path/to/GCF_000218545.1_ASM21854v1_genomic.fna
GCF_025402875.1_ASM2540287v1	24.1	24	Shewanella putrefaciens	/path/to/GCF_025402875.1_ASM2540287v1_genomic.fna
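Because organism names contain spaces, a file that uses spaces instead of tabs is easy to produce by accident. The sketch below checks the field count described above with awk. The function name `validate_genomes_info` is hypothetical, and it assumes the first row is a header, as in the example; it does not check that the taxonomic IDs or paths are valid.

```shell
# Hypothetical format check for genomes_info.txt (not part of PanTax).
# Reports every data row with fewer than 5 tab-delimited fields and
# returns a non-zero status if any such row is found.
validate_genomes_info() {
  awk -F'\t' '
    BEGIN   { bad = 0 }
    NR == 1 { next }   # assumption: the first row is a header, as in the example
    NF < 5  {
      printf "line %d: expected at least 5 tab-delimited fields, found %d\n", NR, NF
      bad = 1
    }
    END     { exit bad }
  ' "$1"
}
```

Running `validate_genomes_info genomes_info.txt` before building the database gives an early warning about malformed rows.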
Running
See test/pantax.sh for more commands.
- Create database only
pantax -f $genome_info --create
You'll need to run /path/to/PanTax/scripts/pantax -f $genome_info --create. This will generate reference_pangenome.gfa and other files in your database directory.
Due to the large size of the reference pangenome we used for testing, we provide the genomes_info.txt used here. You need to download these genomes from NCBI RefSeq and update the actual paths in genomes_info.txt. Please note that NCBI RefSeq periodically updates their database, so we cannot guarantee that all the listed genomes will be available. Building the reference pangenome takes approximately one week with this genomes_info.txt.
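Since building the database can take a long time, a script may want to skip the build when it has already completed. The sketch below tests only for a non-empty reference_pangenome.gfa in the database directory, which is the one output file named above. The function name `database_ready` is hypothetical, and the presence of this file alone does not guarantee that the other database files are complete.

```shell
# Hypothetical check (not part of PanTax): succeeds if the database
# directory given as $1 contains a non-empty reference_pangenome.gfa.
database_ready() {
  [ -s "$1/reference_pangenome.gfa" ]
}
```

It could be used as `database_ready "$db" || pantax -f "$genome_info" --create`, assuming `$db` is the directory where the database is written.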
- Query with specified database
- species level and strain level
# long read
pantax -f $genome_info -l -r $fq -db $db --species --strain
# short read (paired-end)
pantax -f $genome_info -s -p -r $fq -db $db --species --strain
- species level and then strain level
# long read
# species level
pantax -f $genome_info -l -r $fq -db $db --species -n
# s
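The query commands above differ between long and short reads only in the read-type flags. As a small sketch, a wrapper can select those flags from a single argument. The function name `pantax_read_flags` is hypothetical, and the flag sets (`-l` for long reads, `-s -p` for paired-end short reads) are copied from the example commands rather than from the full option reference.

```shell
# Hypothetical helper (not part of PanTax): print the read-type flags used
# in the example commands for long or paired-end short reads.
pantax_read_flags() {
  case "$1" in
    long)  echo "-l" ;;
    short) echo "-s -p" ;;
    *)     echo "unknown read type: $1" >&2; return 1 ;;
  esac
}

# Example usage (left as a comment so nothing runs when this file is sourced):
# pantax -f "$genome_info" $(pantax_read_flags long) -r "$fq" -db "$db" --species --strain
```

The command substitution is intentionally unquoted in the usage line so that `-s -p` is split into two separate arguments.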
