Metakraken
A snakemake pipeline to process metagenomics samples through kraken + bracken to generate taxonomic profiles
MetaKraken
Snakemake pipeline for profiling composition of microbial communities from metagenomic shotgun sequencing data using Kraken2 and Bracken.
Overview
Input:
Quality-processed, paired-end fastq files from shotgun metagenome sequencing.
Output:
- Table of microbial species and their relative abundance for each sample, output/merged_abundance_table.txt
- Heatmap of abundance results, output/abundance_heatmap_species.png
- Profile plot PDFs generated per sample or per user-defined metadata variable
Pipeline summary
<img src="utils/rulegraph.png" width="450">
Steps
- Kraken2 is used to generate profiles of microbial clades and their abundances.
- Bracken is used to re-estimate reads assigned to species.
- kreport2mpa is used to generate profiles in mpa format, which is similar to the format used by MetaPhlAn2.
- Normalize the mpa files to generate percentages for each clade.
- Merge all sample profiles into one table.
- Generate a heatmap with the N most abundant species.
- Generate barplots with taxonomic information.
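The normalization, merge, and top-N steps above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the in-memory dictionaries, and the choice to normalize within each taxonomic rank (so that all phyla sum to 100%, all species sum to 100%, and so on) are assumptions.

```python
from collections import defaultdict


def rank_of(clade):
    """Return the rank prefix (e.g. 's' for species) of an mpa-style clade name."""
    # mpa-style names look like 'd__Bacteria|p__Firmicutes'; the prefix of
    # the last segment (before '__') identifies the rank of the clade.
    return clade.split("|")[-1].split("__")[0]


def normalize_profile(counts):
    """Convert read counts to percentages within each rank (assumed scheme)."""
    totals = defaultdict(float)
    for clade, n in counts.items():
        totals[rank_of(clade)] += n
    return {
        clade: 100.0 * n / totals[rank_of(clade)] if totals[rank_of(clade)] else 0.0
        for clade, n in counts.items()
    }


def merge_profiles(profiles):
    """Merge per-sample profiles into one table; absent clades become 0.0."""
    clades = sorted({c for p in profiles.values() for c in p})
    samples = sorted(profiles)
    return {c: {s: profiles[s].get(c, 0.0) for s in samples} for c in clades}


def top_n(table, n, rank="s"):
    """Return the n clades of one rank with the highest mean abundance."""
    rows = {c: v for c, v in table.items() if rank_of(c) == rank}
    return sorted(
        rows, key=lambda c: sum(rows[c].values()) / len(rows[c]), reverse=True
    )[:n]
```

With per-sample count dictionaries keyed by clade name, `merge_profiles({"s1": normalize_profile(counts1), "s2": normalize_profile(counts2)})` yields a clade-by-sample table, and `top_n(table, 10)` picks the species that a heatmap would show.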
Installation
To use this pipeline, navigate to your project directory and clone this repository into that directory using the following command:
git clone https://github.com/SycuroLab/metakraken.git metakraken
Note: you need to have conda and snakemake installed in order to run this. To install conda, see the instructions here.
To install snakemake using conda, run the following line:
conda install -c bioconda -c conda-forge snakemake
See the snakemake installation webpage for further details.
Config file
All the parameters required to run this pipeline are specified in a config file, written in yaml. An example file, config.yaml, is provided; review it and modify it with your custom parameters.
Important parameters
General
- list_files: a file containing the sample names
- path: path to trimmed fastq files
- for: Suffix for forward reads
- rev: suffix for reverse reads
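As an illustration of how these four parameters might fit together, the sketch below reads sample names from a file and builds the expected paths to the paired read files. The naming scheme (path + sample name + suffix), the one-name-per-line file layout, and the function names are assumptions, not taken from the pipeline's Snakefile; check the Snakefile and config.yaml for the actual convention.

```python
import os


def read_sample_names(list_file):
    """Read one sample name per line, skipping blank lines (assumed layout)."""
    with open(list_file) as handle:
        return [line.strip() for line in handle if line.strip()]


def fastq_pair(sample, path, fwd_suffix, rev_suffix):
    """Build (forward, reverse) fastq paths for one sample (assumed scheme)."""
    return (
        os.path.join(path, sample + fwd_suffix),
        os.path.join(path, sample + rev_suffix),
    )
```

For example, with hypothetical suffixes `_R1.fastq.gz` and `_R2.fastq.gz`, `fastq_pair("sampleA", "data/trimmed", "_R1.fastq.gz", "_R2.fastq.gz")` returns the two file paths that would need to exist for `sampleA`.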
Kraken + Bracken related
- kraken_db: location of the Kraken database to use
- level: taxonomic level for Bracken (default 'S'; options: 'D', 'P', 'C', 'O', 'F', 'G', 'S')
- redreadlen: read length required for Bracken
- threshold: minimum number of reads required for a classification at the specified rank
Barplots related (used by plot_profile.R)
- variableX: X-axis variable to make barplots at different taxa level
- variableFacet: Facet variable to make barplots at different taxa level
- topN: Top N most abundant families/genera/species to be plotted by the R script
- metadata: comma-delimited (csv) metadata file whose sample names match those used in the list_files file
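A quick sanity check of these parameters before launching a run can catch typos early. The sketch below is hypothetical and not part of the pipeline: it takes the config as a plain Python dictionary (for example, as loaded from config.yaml by a YAML parser), and it assumes that the general and Kraken/Bracken keys listed above are all required and that redreadlen and threshold are non-negative integers.

```python
# Keys taken from the parameter lists above; treating them all as
# required is an assumption made for this sketch.
REQUIRED_KEYS = [
    "list_files", "path", "for", "rev",
    "kraken_db", "level", "redreadlen", "threshold",
]
BRACKEN_LEVELS = {"D", "P", "C", "O", "F", "G", "S"}


def check_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing parameter: {key}" for key in REQUIRED_KEYS if key not in config
    ]
    level = config.get("level", "S")  # 'S' is the documented default
    if level not in BRACKEN_LEVELS:
        problems.append(
            f"level must be one of {sorted(BRACKEN_LEVELS)}, got {level!r}"
        )
    for key in ("redreadlen", "threshold"):
        value = config.get(key)
        if value is not None and (not isinstance(value, int) or value < 0):
            problems.append(f"{key} should be a non-negative integer, got {value!r}")
    return problems
```

Calling `check_config(config)` and printing any returned messages before running `snakemake -np` gives a fast first pass; it does not verify that the files and database paths actually exist.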
Running Instructions
Test the pipeline by running snakemake -np. This command prints out the commands to be run without actually running them.
To run the pipeline on the Synergy compute cluster, enter the following command from the project directory:
snakemake --cluster-config cluster.json --cluster 'bsub -n {cluster.n} -R {cluster.resources} -W {cluster.walllim} -We {cluster.time} -M {cluster.maxmem} -oo {cluster.output} -e {cluster.error}' --jobs 500 --use-conda
The above command submits jobs to Synergy, one for each sample and step of the pipeline. Note: the file cluster.json contains the parameters for the LSF job submission system that Synergy uses. In most cases, this file should not be modified.
Results and log files
Snakemake will create a directory for the results of the pipeline (default: output) as well as a directory called logs for all the log files.
Additional information
Information about Kraken2 can be found at https://ccb.jhu.edu/software/kraken2/index.shtml?t=manual
Information about Bracken can be found at https://ccb.jhu.edu/software/bracken/
How to build Databases:
Kraken:
At the moment, the standard Kraken2 database has been downloaded and prepared for use with this pipeline. It is saved as kraken2db_2020_02_26 in /gpfs/snyder_work/shared/lsycuro_labshare/dbs/kraken2db_2020_02_26
The command used to build it is:
kraken2-build --standard --threads THREADS --db DBNAME
You can also build custom databases. Please read the kraken2 manual for more information.
Bracken:
Bracken files are built from the Kraken2 database. Building them requires the k-mer length (default = 35) and the read length of the paired-end reads. You might have to re-run bracken-build if Bracken files have not already been built for your read length:
bracken-build -d KRAKEN_DB -t THREADS -k KMER_LEN -l READ_LEN
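If several read lengths are in use, the command above has to be run once per read length. The sketch below only composes the argument lists using the flags shown in the command above; the function name and the example read lengths are hypothetical, and nothing is executed.

```python
def bracken_build_commands(kraken_db, read_lengths, kmer_len=35, threads=1):
    """Return one bracken-build argument list per distinct read length."""
    return [
        [
            "bracken-build",
            "-d", kraken_db,
            "-t", str(threads),
            "-k", str(kmer_len),
            "-l", str(read_len),
        ]
        # Duplicate read lengths are dropped so each build runs only once.
        for read_len in sorted(set(read_lengths))
    ]
```

For example, `bracken_build_commands("DBNAME", [150, 100, 150], threads=8)` yields two commands, one for read length 100 and one for 150. Each list could then be passed to `subprocess.run(cmd, check=True)` on a machine where bracken-build is installed.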
Note: Kraken2 currently works with the nullarbor conda file but not with the kraken2 conda file.
