Jaeger
Jaeger is a quick and precise tool for detecting phages in sequence assemblies.
.
,'/ \`.
|\/___\/|
\'\ /`/ ██╗ █████╗ ███████╗ ██████╗ ███████╗██████╗
`.\ /,' ██║██╔══██╗██╔════╝██╔════╝ ██╔════╝██╔══██╗
| ██║███████║█████╗ ██║ ███╗█████╗ ██████╔╝
| ██ ██║██╔══██║██╔══╝ ██║ ██║██╔══╝ ██╔══██╗
|=| ╚█████╔╝██║ ██║███████╗╚██████╔╝███████╗██║ ██║
/\ ,|=|. /\ ╚════╝ ╚═╝ ╚═╝╚══════╝ ╚═════╝ ╚══════╝╚═╝ ╚═╝
,'`. \/ |=| \/ ,'`.
,' `.|\ `-' /|,' `.
,' .-._ \ `---' / _,-. `.
,' `-`-._,-'-' `.
'
Jaeger : an accurate and fast deep-learning tool to detect bacteriophage sequences
Jaeger is a tool that utilizes homology-free machine learning to identify phage genome sequences that are hidden within metagenomes. It is capable of detecting both phages and prophages within metagenomic assemblies.
Citing Jaeger
If you use Jaeger in your work, please consider citing its preprint:
- Jaeger: an accurate and fast deep-learning tool to detect bacteriophage sequences
Yasas Wijesekara, Ling-Yi Wu, Rick Beeloo, Piotr Rozwalak, Ernestina Hauptfeld, Swapnil P. Doijad, Bas E. Dutilh, Lars Kaderali
bioRxiv, 2024.09.24.612722
To cite the code itself:
Installing Jaeger
option 1 : bioconda
The performance of the Jaeger workflow can be significantly increased by utilizing GPUs. To enable GPU support, the CUDA Toolkit and cuDNN library must be accessible to conda.
# create conda environment and install jaeger
mamba create -n jaeger -c bioconda jaeger-bio==1.2
# activate environment
conda activate jaeger
Test the installation with test data
jaeger test
option 2 : Installing from PyPI (recommended)
# create a conda environment and activate
mamba create -n jaeger -c nvidia -c conda-forge cuda-nvcc "python>=3.11,<=3.12" pip
conda activate jaeger
# OR create a virtual environment using venv
python3 -m venv jaeger
source jaeger/bin/activate
# to install jaeger with GPU support
pip install jaeger-bio[gpu]
# to install without GPU support
pip install jaeger-bio[cpu]
# to install on a Mac(arm)
pip install jaeger-bio[darwin-arm]
# test the installation
jaeger test
option 3 : Installing the dev version
# create a conda environment and activate
mamba create -n jaeger -c nvidia -c conda-forge cuda-nvcc "python>=3.11,<3.12" pip
conda activate jaeger
# OR create a virtual environment using venv
python3 -m venv jaeger
source jaeger/bin/activate
# install jaeger
# to install with GPU support
pip install --no-cache-dir "jaeger-bio[gpu] @ git+https://github.com/MGXlab/Jaeger@dev"
# to install without GPU support
pip3 install --root-user-action=ignore --no-cache-dir "jaeger-bio[cpu] @ git+https://github.com/MGXlab/Jaeger@dev"
# to install on a Mac(arm)
pip3 install --root-user-action=ignore --no-cache-dir "jaeger-bio[darwin-arm] @ git+https://github.com/MGXlab/Jaeger@dev"
# test the installation
jaeger test
option 4 : Apptainer (singularity)
If you're using Apptainer on a cluster, it's recommended to build the container on your local machine and then transfer it to the cluster.
# get the container def
wget -O jaeger_singularity.def https://raw.githubusercontent.com/Yasas1994/Jaeger/dev/singularity/jaeger_singularity.def
# get the configuration file
wget -O config.json https://raw.githubusercontent.com/Yasas1994/Jaeger/dev/src/jaeger/data/config.json
# to build the container
apptainer build jaeger.sif jaeger_singularity.def
# test container
apptainer run --nv jaeger.sif jaeger --help
# test the installation
apptainer run --nv jaeger.sif jaeger test
# list jaeger models available for download
apptainer run --nv jaeger.sif jaeger download --list
# download jaeger models
apptainer run --nv jaeger.sif jaeger download --model jaeger_57341_1.5M_fragment --path /path/to/save/model --config /path/to/config.json
# run jaeger
apptainer run --nv jaeger.sif jaeger predict --model jaeger_57341_1.5M_fragment --config /path/to/config.json -i /path/to/input.fasta -o /path/to/save/results
Downloading models
Starting from version 1.2.0, the new models must be downloaded separately after installing Jaeger. For backward compatibility, Jaeger still includes the old model by default.
Use the --list flag to print out all models available for download
jaeger download --list
Then, to download a model and add it to the model path, run
jaeger download --path /path/to/store/models --model jaeger_38341_1.4M
If you decide to change the model path later, or if you have a directory with newly trained/tuned models, register the path
jaeger register-models --path /new/model/path
Running Jaeger
CPU/GPU mode
Once the environment is properly set up, using Jaeger is straightforward. The program can accept both compressed and uncompressed .fasta files containing the contigs as input. It will output a table containing the predictions and various statistics calculated during runtime.
jaeger predict -i input_file.fasta -o output_dir --batch 128
To run Jaeger with Apptainer (Singularity)
apptainer run --nv jaeger.sif jaeger predict -i input_file.fasta -o output_dir --batch 128
Selecting the batch parameter
You can control the number of parallel computations with the --batch parameter. By default it is set to 96. If you run into out-of-memory (OOM) errors, set the --batch option to a lower value. For example, 96 is good enough for a graphics card with 4 GB of memory.
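One way to apply this advice without hand-tuning is to retry with a smaller batch size whenever a run fails. The sketch below is not part of Jaeger: run_with_batch_fallback is a hypothetical helper, and it assumes that an OOM failure makes the command exit with a non-zero status, which you should confirm for your setup. It only appends the --batch option shown above to whatever command it is given.

```shell
# Hypothetical helper (not part of Jaeger): run a command with --batch N,
# halving N after every failed attempt.
# Assumption: an out-of-memory failure produces a non-zero exit status.
# Usage: run_with_batch_fallback START_BATCH COMMAND [ARGS...]
run_with_batch_fallback() {
    batch=$1
    shift
    while [ "$batch" -ge 1 ]; do
        # Append the current batch size to the command and try it.
        if "$@" --batch "$batch"; then
            echo "succeeded with --batch $batch" >&2
            return 0
        fi
        # Integer division: 96 -> 48 -> 24 -> ... -> 1 -> 0 (loop ends).
        batch=$((batch / 2))
    done
    echo "all batch sizes failed" >&2
    return 1
}
```

For example, `run_with_batch_fallback 96 jaeger predict -i input_file.fasta -o output_dir` would try 96, then 48, then 24, and so on. Any non-zero exit triggers a retry, not only OOM errors, so check the error message before relying on the smaller value.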
What is in the output?
All predictions are summarized in a table located at output_dir/<input_file>_default.jaeger.tsv
┌───────────────────────────────────┬────────┬────────────┬─────────┬───┬─────────────┬────────────────┬──────────────────┬───────────────┐
│ contig_id ┆ length ┆ prediction ┆ entropy ┆ … ┆ Archaea_var ┆ window_summary ┆ terminal_repeats ┆ repeat_length │
╞═══════════════════════════════════╪════════╪════════════╪═════════╪═══╪═════════════╪════════════════╪══════════════════╪═══════════════╡
│ NODE_1109_length_9622_cov_23.163… ┆ 9622 ┆ Phage ┆ 0.43 ┆ … ┆ 0.143 ┆ 1V1n2V ┆ null ┆ null │
│ NODE_1181_length_9275_cov_26.864… ┆ 9275 ┆ Phage ┆ 0.327 ┆ … ┆ 0.504 ┆ 4V ┆ null ┆ null │
│ NODE_123_length_36569_cov_24.228… ┆ 36569 ┆ Phage ┆ 0.503 ┆ … ┆ 1.554 ┆ 9V1n7V ┆ null ┆ null │
│ NODE_149_length_32942_cov_23.754… ┆ 32942 ┆ Phage ┆ 0.458 ┆ … ┆ 3.229 ┆ 3V1n1n11V ┆ null ┆ null │
│ NODE_231_length_24276_cov_21.832… ┆ 24276 ┆ Phage ┆ 0.502 ┆ … ┆ 1.467 ┆ 1V1n3V1n5V ┆ null ┆ null │
└───────────────────────────────────┴────────┴────────────┴─────────┴───┴─────────────┴────────────────┴──────────────────┴───────────────┘
Each row of the table describes one contig of the metagenomic assembly. The columns give the contig's ID and length; the number of windows identified as prokaryotic, viral, eukaryotic, and archaeal; the prediction for the contig (Phage or Non-phage); the contig's score for each category (bacterial, viral, eukaryotic, and archaeal); and a summary of the windows. Use the prediction column to identify potential phage sequences, the score columns to judge the confidence of each prediction, and the window_summary column to see how many windows contributed to the final prediction.
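As a minimal sketch of using the prediction column, the function below lists the IDs of contigs labelled Phage. It is an illustration and not a Jaeger command: list_phage_contigs is a hypothetical name, and it assumes the results file is tab-separated with a header row containing the contig_id and prediction columns, as in the example table above. Columns are looked up by name, so their position in the file does not matter.

```shell
# Hypothetical helper (not part of Jaeger): print the contig_id of every
# row whose prediction column equals "Phage".
# Assumption: tab-separated input with a header row naming the columns.
# Usage: list_phage_contigs RESULTS.tsv
list_phage_contigs() {
    awk -F '\t' '
        NR == 1 {
            # Map each header name to its column index.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("contig_id" in col) || !("prediction" in col)) {
                print "missing contig_id or prediction column" > "/dev/stderr"
                exit 1
            }
            next
        }
        $(col["prediction"]) == "Phage" { print $(col["contig_id"]) }
    ' "$1"
}
```

Called as `list_phage_contigs output_dir/<input_file>_default.jaeger.tsv > phage_ids.txt`, this produces a list of IDs that can be passed to other tools to extract the matching sequences from the input FASTA.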
Options
jaeger run --help
## Jaeger 1.1.30 (yet AnothEr phaGe idEntifier) Deep-learning based bacteriophage discovery
https://github.com/Yasas1994/Jaeger.git
usage: jaeger run -i INPUT -o OUTPUT
options:
-h, --help show this help message and exit
-i INPUT, --input INPUT
path to input file
-o OUTPUT, --output OUTPUT
path to output directory
--fsize [FSIZE] leng
