SsuisChara
An integrated analysis pipeline for Streptococcus suis
SsuisChara is a tool for the integrated characterization of Streptococcus suis from assembled genome sequences, covering:
- Species verification
- Serotype prediction
- MLST sequence type
- Virulence associated factors (vafs) screen
- Human infection potential
- Antimicrobial resistance determinants
External Dependencies
BLAST+
Prodigal
fastANI (only required for the 'ani' species verification method)
Usage
Put the Python script and the database folder into the folder that contains your sequence files.
SsuisChara [-i] [-o] [-t] [-s] [--min_gene_cov] [--min_gene_id] [--vf_screen_mode] [--vf_location_details] [--heatmap] [-v]
Input and Output:
-i, --input Input FASTA file
-o, --output Output file
Parameters:
-s, --species Species identification method: "16s" or "ani"; "ani" is much more accurate [default: 16s]
-t, --threads Threads to use for BLAST searches
--min_gene_cov Minimum percentage coverage to consider a single gene complete. [default: 80.0%]
--min_gene_id Minimum percentage identity to consider a single gene complete. [default: 70.0%]
--vf_screen_mode Virulence factor screen mode: "concise" or "full" [default: concise]
--vf_location_details Whether to output a table with the location of every screened virulence factor in the input sequence: "n" or "y" [default: n]
--heatmap Generate a heatmap file to visualize the prevalence of VFs
-v, --version Show version number and exit
Quick usage
python SsuisChara.py -i *.fasta
Species verification
In our latest version, we provide two methods for species verification, allowing users to choose the one that best suits their needs: the '16s' method, based on 16S rRNA sequence alignment, and the 'ani' method, based on average nucleotide identity (ANI).
In 2024, a new subspecies, Streptococcus suis subsp. hashimotonensis, was reported, complicating species identification. Isolates of this subspecies share over 95% ANI with other Streptococcus suis strains, so performing ANI with only Streptococcus suis isolates as references is insufficient. We therefore collected the type strains of all species within the genus Streptococcus as ANI references, while keeping the 16S alignment method as the default.
This approach allows users to choose based on their goals. For example, if a user wants to perform serotype or virulence factor screening on isolates with a clean background, the faster 16S-based method is sufficient. However, if precise species verification of an input genome is required, the ANI-based method can provide a more accurate identification, indicating the exact species within the genus Streptococcus, rather than returning a generic 'NA' as with the old 16S method.
Note that the ANI method takes approximately twice as long as the 16S-based method. We use a 95% ANI threshold and a 97% identity threshold for 16S rRNA alignment. If you choose the 'ani' method, make sure fastANI is installed.
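The 'ani' decision rule described above can be sketched as: take the best-scoring reference from fastANI's tab-separated output (query, reference, ANI, mapped fragments, total fragments) and accept it only at or above 95% ANI, otherwise report 'NA'. This is a minimal sketch, not SsuisChara's code; the function name `call_species_from_fastani` and the convention that each reference file is named after its species (e.g. `Streptococcus_suis.fna`) are hypothetical.

```python
# Minimal sketch: species call from fastANI tab-separated output.
# Each fastANI output line: query, reference, ANI, mapped_fragments, total_fragments
# ASSUMPTION (hypothetical): reference files are named after their species,
# e.g. "refs/Streptococcus_suis.fna"; SsuisChara's real naming may differ.
from pathlib import Path

ANI_THRESHOLD = 95.0  # species threshold stated in this README

def call_species_from_fastani(lines, threshold=ANI_THRESHOLD):
    """Return (species, best_ani); species is "NA" if no hit reaches threshold."""
    best_ref, best_ani = None, 0.0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ref, ani = fields[1], float(fields[2])
        if ani > best_ani:
            best_ref, best_ani = ref, ani
    if best_ref is None or best_ani < threshold:
        return "NA", best_ani
    # Hypothetical convention: file stem with underscores -> species name
    return Path(best_ref).stem.replace("_", " "), best_ani
```

For example, a hit of 98.7% against `refs/Streptococcus_suis.fna` yields `("Streptococcus suis", 98.7)`. The 16S method would follow the same best-hit-above-threshold pattern, using alignment identity and the 97% threshold instead.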
Serotype prediction
Serotyping was initially based on serological tests. Subsequently, the serotype-determining loci of many bacteria were identified in their genomes, and many molecular serotyping methods, such as multiplex PCR, were developed based on differences in these loci. The number of sequenced bacterial genomes in public databases is now increasing rapidly, which allows a full-locus alignment approach to determine serotypes in silico with high throughput and precision.
For Streptococcus suis, the serotype-determining locus is the capsular polysaccharide (cps) locus. In this analysis, the cps loci of 29 classic Streptococcus suis serotypes were collected and used as references to find the best match in the input genome and determine the serotype. Although many novel capsular polysaccharide loci (NCLs) have been found, the importance and prevalence of these serotypes are limited, so they are not yet included; we may add them in the future.
To avoid incorrect predictions, the coverage and identity of the predicted serotype are reported. If these values fall below the threshold (preliminarily set to 95%), a "?" is appended to the predicted serotype in the output.
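A minimal sketch of best-match selection with the "?" flag, assuming the cps alignments have already been summarized as serotype -> (coverage %, identity %). Ranking by coverage × identity and flagging when either metric is below 95% are illustrative choices, not necessarily SsuisChara's exact rule; `predict_serotype` is a hypothetical name.

```python
# Minimal sketch: pick the best-matching cps reference and flag weak calls.
# ASSUMPTIONS: hits are precomputed as serotype -> (coverage %, identity %);
# ranking by coverage * identity is illustrative only.
SEROTYPE_THRESHOLD = 95.0  # preliminary threshold stated in this README

def predict_serotype(hits, threshold=SEROTYPE_THRESHOLD):
    """Return the best serotype, with "?" appended if below threshold."""
    if not hits:
        return "NA"
    best = max(hits, key=lambda s: hits[s][0] * hits[s][1])
    coverage, identity = hits[best]
    # Flag the call if the best match is below the threshold on either metric
    if coverage < threshold or identity < threshold:
        return best + "?"
    return best
```

With hits `{"2": (99.5, 99.1), "14": (60.0, 90.0)}` this returns `"2"`, while a sole hit of `{"7": (92.0, 98.0)}` returns `"7?"`.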
MLST sequence type
The MLST database was obtained from PubMLST (last updated: 2026-04-01).
Seven housekeeping genes of Streptococcus suis (aroA, cpn60, dpr, gki, mutS, recA, and thrA) are screened in the input genome, and their allele numbers (exact or closest match) are returned and used to determine the sequence type (ST). If any allele is not an exact match, a "?" is appended to the predicted ST in the output.
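The ST lookup can be sketched as a dictionary keyed by the seven-allele profile, with a "?" appended when any allele is only the closest match. The input shape (gene -> (allele number, exact match flag)) and the `assign_st` name are assumptions for illustration; the profile table in the usage below is hypothetical, not real PubMLST data.

```python
# Minimal sketch: assign an ST from seven allele calls.
# ASSUMPTION: calls are gene -> (allele_number, is_exact_match);
# profiles map an allele tuple (in LOCI order) to an ST number.
LOCI = ("aroA", "cpn60", "dpr", "gki", "mutS", "recA", "thrA")

def assign_st(calls, profiles):
    """Return the ST as a string, "NA" if the profile is unknown."""
    key = tuple(calls[gene][0] for gene in LOCI)
    st = profiles.get(key)
    if st is None:
        return "NA"  # allele combination not in the profile table
    exact = all(calls[gene][1] for gene in LOCI)
    # Append "?" if any allele is only the closest, not an exact, match
    return str(st) if exact else f"{st}?"
```

For instance, with a hypothetical table `{(1, 1, 1, 1, 1, 1, 1): 9001}`, all-exact calls of allele 1 give `"9001"`, and marking one allele inexact gives `"9001?"`.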
Virulence associated factors (vafs) screen
A total of 107 vafs of S. suis were collected from published papers and compiled into a database to screen for their presence or absence in the input genome. Of these, 56 vafs are distributed in the accessory genome and 51 in the core genome of S. suis. Two screen modes are provided: "concise" (the default) screens only the 56 accessory-genome vafs, while "full" screens all vafs.
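The presence/absence call can be sketched by combining the two screen modes with the default --min_gene_cov (80%) and --min_gene_id (70%) thresholds. This assumes best hits per vaf are already computed as name -> (coverage %, identity %) and that each vaf is tagged "accessory" or "core"; `screen_vafs` and the vaf names in the usage are placeholders.

```python
# Minimal sketch: presence/absence calls for the selected screen mode.
# ASSUMPTIONS: hits are precomputed as vaf -> (coverage %, identity %);
# genome_class maps each vaf to "accessory" or "core".

def screen_vafs(hits, genome_class, mode="concise", min_cov=80.0, min_id=70.0):
    """Return {vaf: present?} for the vafs covered by the screen mode."""
    if mode not in ("concise", "full"):
        raise ValueError("mode must be 'concise' or 'full'")
    # "concise" keeps accessory-genome vafs only; "full" keeps all of them
    selected = [v for v, cls in genome_class.items()
                if mode == "full" or cls == "accessory"]
    result = {}
    for vaf in selected:
        cov, ident = hits.get(vaf, (0.0, 0.0))  # no hit -> absent
        result[vaf] = cov >= min_cov and ident >= min_id
    return result
```

A gene hit at 85% coverage but 65% identity is called absent under the defaults, since both thresholds must be met.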
Setting --vf_location_details to "y" generates a table containing the location of the matched genes of known VFs in the input sequence. The --heatmap option generates a clustered heatmap based on the presence and absence of vafs to visualize vaf prevalence.
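The data behind a clustered presence/absence heatmap can be sketched as a binary isolate × vaf matrix plus pairwise distances between isolates. Jaccard distance is used here only because it is a common choice for binary profiles; the metric and clustering method SsuisChara actually uses are not stated here, and all function names are hypothetical.

```python
# Minimal sketch: binary presence matrix and pairwise Jaccard distances.
# ASSUMPTION: Jaccard distance is illustrative; SsuisChara's metric may differ.
from itertools import combinations

def presence_matrix(results, vafs):
    """results: {isolate: set of present vafs} -> {isolate: [0/1 per vaf]}."""
    return {iso: [1 if v in present else 0 for v in vafs]
            for iso, present in results.items()}

def jaccard_distance(a, b):
    """1 - |shared presences| / |presences in either|; 0.0 if both are empty."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either

def pairwise_distances(matrix):
    """Distance for every unordered pair of isolates, keyed by sorted name pair."""
    return {(i, j): jaccard_distance(matrix[i], matrix[j])
            for i, j in combinations(sorted(matrix), 2)}
```

Isolates sharing one of three observed vafs are at distance 2/3; identical profiles are at 0.0. Distances like these could then be passed to a hierarchical clustering routine (for example SciPy's `linkage`, which accepts a condensed distance matrix) to order the heatmap rows.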
Human infection potential
Several vafs have been found to be associated with human S. suis infection in previous studies. Their prevalence among human S. suis isolates is used as the weight of each vaf, and the presence of these vafs is used to predict the human infection potential of the isolate from which the input genome sequence was obtained.
The sum of the weights is called the "zoonotic score". human_infection_potential is "high" if zoonotic_score >= 70.0, "medium" if 30.0 <= zoonotic_score < 70.0, and "low" if zoonotic_score < 30.0.
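The scoring rule above can be sketched as a weighted sum over the vafs present in a genome, followed by the 70.0 and 30.0 cut-offs. This assumes the weights are already scaled so the summed score is on the same scale as those thresholds; `human_infection_potential` and the weights in the usage are hypothetical, not SsuisChara's database.

```python
# Minimal sketch: zoonotic score and human infection potential.
# ASSUMPTION: weights (vaf -> weight) are already scaled so that their sum
# is comparable to the 30.0 / 70.0 thresholds stated in this README.

def human_infection_potential(present_vafs, weights):
    """Return (zoonotic_score, "high" | "medium" | "low")."""
    score = sum(w for vaf, w in weights.items() if vaf in present_vafs)
    if score >= 70.0:
        level = "high"
    elif score >= 30.0:
        level = "medium"
    else:
        level = "low"
    return score, level
```

With hypothetical weights `{"vaf_A": 40.0, "vaf_B": 35.0}`, a genome carrying both vafs scores 75.0 ("high"), and one carrying only vaf_A scores 40.0 ("medium").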
Antimicrobial resistance determinants
Aminoglycosides, macrolides, and tetracyclines are the major classes of antimicrobial drugs to which S. suis shows resistance. We screen the known antimicrobial resistance genes (AMRGs) conferring resistance to these drugs to determine the antimicrobial resistance level of the isolate from which the input genome sequence was obtained. AMRG_level is the number of these three drug classes covered by the AMRGs detected in the input genome. The detected AMRGs are also written to the output file.
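AMRG_level can be sketched as the number of distinct drug classes among the detected genes. The gene-to-class table below is illustrative only (erm(B), tet(O), tet(M), aph(3')-IIIa, and ant(6)-Ia are well-known resistance genes of these classes), and `amrg_level` is a hypothetical name; SsuisChara's actual AMRG database may differ.

```python
# Minimal sketch: AMRG_level = number of the three drug classes covered.
# ASSUMPTION: this gene -> class table is illustrative, not SsuisChara's database.
DRUG_CLASSES = ("aminoglycoside", "macrolide", "tetracycline")

GENE_CLASS = {
    "aph(3')-IIIa": "aminoglycoside",
    "ant(6)-Ia": "aminoglycoside",
    "erm(B)": "macrolide",
    "tet(O)": "tetracycline",
    "tet(M)": "tetracycline",
}

def amrg_level(detected_genes, gene_class=GENE_CLASS):
    """Count distinct drug classes covered; unknown genes are ignored."""
    covered = {gene_class[g] for g in detected_genes if g in gene_class}
    return len(covered & set(DRUG_CLASSES))
```

Detecting erm(B), tet(O), and tet(M) gives AMRG_level 2: two tetracycline genes still count as one class.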
Output
A simplified result is printed in the terminal: input file + species + serotype + ST
A detailed table is generated in the working folder

A clustered heatmap is also generated (with --heatmap)
