SeqSero2
Salmonella serotype prediction from genome sequencing data.
Online version: http://www.denglab.info/SeqSero2
Introduction
SeqSero2 is a pipeline for Salmonella serotype prediction from raw sequencing reads or genome assemblies.
Dependencies
SeqSero2 has three workflows:
(A) Allele micro-assembly (default). This workflow takes raw reads as input and performs targeted assembly of serotype determinant alleles. Assembled alleles are used to predict serotype and flag potential inter-serotype contamination in sequencing data (i.e., presence of reads from multiple serotypes due to, for example, cross or carryover contamination during sequencing).
Allele micro-assembly workflow depends on:
- Python 3;
- Biopython 1.73;
(B) Raw reads k-mer. This workflow takes raw reads as input and performs rapid serotype prediction based on unique k-mers of serotype determinants.
Raw reads k-mer workflow (originally SeqSeroK) depends on:
- Python 3;
- SRA Toolkit (optional, only used to fastq-dump SRA files);
(C) Genome assembly k-mer. This workflow takes genome assemblies as input; the rest of the workflow largely overlaps with the raw reads k-mer workflow.
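To make the idea behind the k-mer workflows more concrete, here is a toy sketch of matching reads against k-mers that are unique to each serotype determinant. It is only an illustration of the general technique, not SeqSero2's implementation: the sequences, the names allele_A and allele_B, the helper functions, and the choice of k=5 are all made up, and reverse-complement matching is left out for brevity.

```python
# Toy illustration only: the sequences and labels are invented, and this is
# not SeqSero2's actual algorithm or k-mer database.

def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(determinants, k):
    """For each determinant, keep only k-mers found in no other determinant."""
    all_sets = {name: kmers(seq, k) for name, seq in determinants.items()}
    unique = {}
    for name, own in all_sets.items():
        others = set().union(*(s for n, s in all_sets.items() if n != name))
        unique[name] = own - others
    return unique


def score_reads(reads, unique, k):
    """Fraction of each determinant's unique k-mers observed in the reads."""
    seen = set()
    for read in reads:
        seen |= kmers(read, k)
    return {
        name: (len(ks & seen) / len(ks) if ks else 0.0)
        for name, ks in unique.items()
    }


# Hypothetical determinant sequences and reads.
determinants = {"allele_A": "ACGTACGTTGCA", "allele_B": "ACGTACGAAGCT"}
reads = ["GTACGTTGC", "TTGCA"]

unique = unique_kmers(determinants, k=5)
scores = score_reads(reads, unique, k=5)
print(scores)
```

The determinant whose unique k-mers are best covered by the reads would be the candidate call in this toy setting; a real workflow needs a curated determinant database and handling of both strands.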
Installation
Conda
To install the latest SeqSero2 Conda package (recommended):
conda install -c bioconda seqsero2=1.3.1
Git
To install the SeqSero2 git repository locally:
git clone https://github.com/denglab/SeqSero2.git
cd SeqSero2
python3 -m pip install --user .
Other options
Third-party SeqSero2 installations (may not be the latest version of SeqSero2):
https://github.com/B-UMMI/docker-images/tree/master/seqsero2
https://github.com/denglab/SeqSero2/issues/13
Executing the code
Make sure the SeqSero2 executables and those of its dependencies are on your PATH (e.g. by exporting them in ~/.bashrc). Then run SeqSero2_package.py to see detailed usage instructions.
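A quick way to confirm the PATH setup from Python is sketched below. The helper missing_executables is hypothetical and not part of SeqSero2; it only uses the standard library to look up names on PATH. The '--check' flag listed under Usage is SeqSero2's own dependency check.

```python
import shutil


def missing_executables(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# SeqSero2_package.py is the entry point named above; extend the list with
# whichever dependency executables your chosen workflow needs.
missing = missing_executables(["SeqSero2_package.py"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed executables were found on PATH.")
```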
Usage: SeqSero2_package.py
-m <string> (which workflow to apply, 'a'(raw reads allele micro-assembly), 'k'(raw reads and genome assembly k-mer), default=a)
-t <string> (input data type, '1' for interleaved paired-end reads, '2' for separated paired-end reads, '3' for single reads, '4' for genome assembly, '5' for nanopore reads (fasta/fastq))
-i <file> (/path/to/input/file)
-p <int> (number of threads for allele mode, if p >4, only 4 threads will be used for assembly since the amount of extracted reads is small, default=1)
-b <string> (algorithms for bwa mapping for allele mode; 'mem' for mem, 'sam' for samse/sampe; default=mem; optional; for now we only optimized for default "mem" mode)
-d <string> (output directory name, if not set, the output directory would be 'SeqSero_result_'+time stamp+one random number)
-c <flag> (if '-c' was flagged, SeqSero2 will only output serotype prediction without the directory containing log files)
-n <string> (optional, to specify a sample name in the report output)
-s <flag> (if '-s' was flagged, SeqSero2 will not output header in SeqSero_result.tsv)
--check <flag> (use '--check' flag to check the required dependencies)
-v, --version (show program's version number and exit)
Examples
Allele mode:
# Allele workflow ("-m a", default), for separated paired-end raw reads ("-t 2"), request 10 threads ("-p 10"; assembly uses at most 4)
SeqSero2_package.py -p 10 -t 2 -i R1.fastq.gz R2.fastq.gz
K-mer mode:
# Raw reads k-mer ("-m k"), for separated paired-end raw reads ("-t 2")
SeqSero2_package.py -m k -t 2 -i R1.fastq.gz R2.fastq.gz
# Genome assembly k-mer ("-t 4", genome assemblies only predicted by the k-mer workflow, "-m k")
SeqSero2_package.py -m k -t 4 -i assembly.fasta
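When running many samples, the command lines above can be assembled programmatically. The following is a minimal sketch under stated assumptions: build_command is a hypothetical helper, not part of SeqSero2, and it only uses the flags documented in the usage text. The rule that data type '2' takes exactly two input files is inferred from the examples.

```python
def build_command(inputs, data_type, mode="a", threads=1, out_dir=None, sample=None):
    """Build a SeqSero2_package.py argument list from the documented options."""
    if mode not in ("a", "k"):
        raise ValueError("mode must be 'a' or 'k'")
    if data_type not in ("1", "2", "3", "4", "5"):
        raise ValueError("data_type must be one of '1' to '5'")
    if data_type == "4" and mode != "k":
        # Genome assemblies are only predicted by the k-mer workflow.
        raise ValueError("genome assemblies ('-t 4') require mode 'k'")
    if not inputs:
        raise ValueError("at least one input file is required")
    if data_type == "2" and len(inputs) != 2:
        # Assumption taken from the examples: separated pairs are two files.
        raise ValueError("data type '2' expects two input files")

    cmd = ["SeqSero2_package.py", "-m", mode, "-t", data_type]
    if mode == "a":
        # "-p" is documented as the thread count for allele mode.
        cmd += ["-p", str(threads)]
    if out_dir:
        cmd += ["-d", out_dir]
    if sample:
        cmd += ["-n", sample]
    cmd += ["-i", *inputs]
    return cmd


cmd = build_command(["R1.fastq.gz", "R2.fastq.gz"], data_type="2", threads=10)
print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once SeqSero2 is installed; building a list rather than a shell string avoids quoting problems with unusual file names.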
Output
Upon executing the command, an output directory is created. Unless "-d" is set, it is named 'SeqSero_result_' followed by a time stamp and a random number. Your result is stored in 'SeqSero_result.txt' in that directory. When using "-m a" (allele mode), the assembled alleles can also be found there.
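For downstream scripting, the tab-separated report mentioned under the '-s' option (SeqSero_result.tsv) is easier to parse than the text report. The sketch below is hedged: read_results and latest_result_dir are hypothetical helpers, and it assumes the .tsv file sits in the output directory and still has its header row (i.e. '-s' was not used). No column names are assumed; they are taken from whatever header the file contains.

```python
import csv
from pathlib import Path


def read_results(tsv_path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(tsv_path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def latest_result_dir(parent="."):
    """Return the most recently modified 'SeqSero_result_*' directory, or None."""
    dirs = [p for p in Path(parent).glob("SeqSero_result_*") if p.is_dir()]
    return max(dirs, key=lambda p: p.stat().st_mtime, default=None)


result_dir = latest_result_dir()
if result_dir is not None and (result_dir / "SeqSero_result.tsv").exists():
    for row in read_results(result_dir / "SeqSero_result.tsv"):
        # Each row maps header names to the values for one sample.
        print(row)
```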
Citation
Zhang S, Den-Bakker HC, Li S, Dinsmore BA, Lane C, Lauer AC, Fields PI, Deng X. SeqSero2: rapid and improved Salmonella serotype determination using whole genome sequencing data. Appl Environ Microbiology. 2019 Sep; 85(23):e01746-19. PMID: 31540993
Zhang S, Yin Y, Jones MB, Zhang Z, Deatherage Kaiser BL, Dinsmore BA, Fitzgerald C, Fields PI, Deng X. Salmonella serotype determination utilizing high-throughput genome sequencing data. J Clin Microbiol. 2015 May;53(5):1685-92. PMID: 25762776
