SsuisChara

An integrated analysis pipeline for Streptococcus suis

SsuisChara is a tool for integrated analysis of the characteristics of Streptococcus suis from assembled genome sequences. It covers:

  • Species verification
  • Serotype prediction
  • MLST sequence type
  • Virulence associated factors (vafs) screen
  • Human infection potential
  • Antimicrobial resistance determinants

External Dependencies

BLAST+

prodigal

Usage

Put the Python script and the database folder into the folder that contains your sequence files.

SsuisChara [-i] [-o] [-t] [-s] [--min_gene_cov] [--min_gene_id] [--vf_screen_mode] [--vf_location_details] [--heatmap] [-v]
Input and Output:
  -i, --input             Input FASTA file
  -o, --output            Output file
Parameters:
  -s, --species           Method used for species identification, "16s" or "ani"; "ani" is much more accurate [default: 16s]
  -t, --threads           Threads to use for BLAST searches
  --min_gene_cov          Minimum percentage coverage to consider a single gene complete. [default: 80.0%]
  --min_gene_id           Minimum percentage identity to consider a single gene complete. [default: 70.0%]
  --vf_screen_mode        Virulence factor screen mode, "concise" or "full" [default: concise]
  --vf_location_details   Output a table containing the location of every screened virulence factor in the input sequence, "n" or "y" [default: n]
  --heatmap               Generate a heatmap file to visualize the prevalence of VFs
  -v, --version           Show version number and exit

Quick usage

python SsuisChara.py -i *.fasta 

Species verification

In our latest version, we provide two methods for species verification, allowing users to choose the one that best suits their needs: the '16s' method, based on 16S rRNA sequence alignment, and the 'ani' method, based on average nucleotide identity (ANI).

In 2024, a new subspecies of Streptococcus suis, Streptococcus suis subsp. hashimotonensis, was reported, complicating species identification. Isolates of this subspecies share over 95% ANI with other Streptococcus suis strains, which means that performing ANI using only Streptococcus suis isolates as references is insufficient. We therefore collected all type strains within the genus Streptococcus for ANI analysis, while keeping the 16S alignment method as the default.

This approach allows users to choose based on their goals. For example, if a user wants to perform serotype or virulence factor screening on isolates with a clean background, the faster 16S-based method is sufficient. However, if precise species verification of an input genome is required, the ANI-based method can provide a more accurate identification, indicating the exact species within the genus Streptococcus, rather than returning a generic 'NA' as with the old 16S method.

Note that the ANI method takes approximately twice as long as the 16S-based method. We use a 95% threshold for ANI and a 97% threshold for 16S rRNA alignment. If you choose the 'ani' method, make sure fastANI is installed.
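The two thresholds above can be sketched as a small decision function. This is only an illustration, not the code used by SsuisChara: the function name, its arguments, and the use of ">=" at the threshold are assumptions.

```python
# Thresholds taken from the text above (percent).
ANI_THRESHOLD = 95.0
RRNA_16S_THRESHOLD = 97.0


def verify_species(method, best_hit_species, identity):
    """Return the species of the best hit, or "NA" if it is below threshold.

    method: "16s" or "ani" (the two values accepted by --species).
    best_hit_species: species name of the best reference hit (hypothetical input).
    identity: percent identity (16S) or percent ANI of that hit.
    """
    thresholds = {"16s": RRNA_16S_THRESHOLD, "ani": ANI_THRESHOLD}
    if method not in thresholds:
        raise ValueError('method must be "16s" or "ani"')
    # Assumption: a value equal to the threshold counts as a match.
    if identity >= thresholds[method]:
        return best_hit_species
    return "NA"
```

For example, verify_species("ani", "Streptococcus suis", 96.2) returns the species name, while a 16S hit at 96.0% identity returns "NA".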

Serotype prediction

Serotyping was initially based on serological tests. Later, the serotype-determining loci of many bacteria were identified in their genomes, and many molecular serotyping methods, such as multiplex PCR, were developed based on differences in these loci. The number of sequenced bacterial genomes in public databases is now increasing rapidly, which allows a full-locus alignment method to determine serotypes in silico precisely and at high throughput.

For Streptococcus suis, the serotype-determining locus is the capsular polysaccharide (cps) locus. In this analysis, the cps loci of the 29 classic Streptococcus suis serotypes were collected and used as references; the best match in the input genome determines the serotype. Although many novel capsular polysaccharide loci (NCL) have been found, their importance and prevalence are limited, so they are not included yet. We may add them in the future.

To avoid incorrect predictions, the coverage and identity of the predicted serotype are reported. If they are lower than the threshold (preliminarily set to 95%), a "?" is appended to the predicted serotype in the output.
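The "?" rule can be illustrated with a minimal sketch. The function name and arguments are hypothetical, and it assumes that the flag is added when either coverage or identity falls below the threshold, which the text does not state explicitly.

```python
def format_serotype(serotype, coverage, identity, threshold=95.0):
    """Append "?" to a serotype call whose best cps match is below threshold.

    coverage and identity are percentages for the best-matching cps locus.
    Assumption: either value below the threshold marks the call as uncertain.
    """
    uncertain = coverage < threshold or identity < threshold
    return f"{serotype}?" if uncertain else str(serotype)
```

With this sketch, format_serotype("2", 99.8, 99.5) gives "2", and format_serotype("2", 91.0, 99.5) gives "2?".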

MLST sequence type

The MLST database was obtained from PubMLST (last updated: 2026-04-01).

Seven housekeeping genes of Streptococcus suis (aroA, cpn60, dpr, gki, mutS, recA, and thrA) are screened in the input genome. The tool returns their allele numbers, or the closest allele numbers, and then determines the Sequence Type (ST). If not every allele is an exact match, a "?" is appended to the predicted ST in the output.
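A minimal sketch of the ST lookup described above. The data structures, the function name, and the "NA" value for an unknown profile are assumptions for illustration, and the profile table in the usage note holds toy values, not PubMLST data.

```python
# The seven MLST genes named in the text, in a fixed order.
MLST_GENES = ("aroA", "cpn60", "dpr", "gki", "mutS", "recA", "thrA")


def call_sequence_type(alleles, exact, profiles):
    """Look up the ST for an allele profile and flag inexact matches.

    alleles:  dict gene -> allele number (exact or closest match).
    exact:    dict gene -> True if the allele matched exactly.
    profiles: dict mapping a 7-tuple of allele numbers to an ST (toy table here).
    """
    profile = tuple(alleles[gene] for gene in MLST_GENES)
    st = profiles.get(profile)
    if st is None:
        return "NA"  # Assumption: unknown profiles are reported as "NA".
    if not all(exact[gene] for gene in MLST_GENES):
        return f"{st}?"  # At least one allele is only a closest match.
    return str(st)
```

Given a toy table such as {(1, 1, 1, 1, 1, 1, 1): 1}, a profile of all-1 alleles returns "1" when every allele is an exact match and "1?" when any allele is only the closest match.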

Virulence associated factors (vafs) screen

A total of 107 vafs of S. suis were collected from published papers and built into a database to screen for their presence or absence in the input genome. Of these, 56 vafs are in the accessory genome of S. suis and 51 are in the core genome. Two screen modes are provided, "concise" and "full". The default "concise" mode screens only the 56 vafs in the accessory genome; the "full" mode screens all vafs.
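The two screen modes can be sketched as a filter over the database followed by a presence call. This is an illustration only: the function name and data structures are hypothetical, and applying the --min_gene_cov and --min_gene_id defaults (80.0 and 70.0) to the vaf screen is an assumption.

```python
def screen_vafs(hits, database, mode="concise", min_cov=80.0, min_id=70.0):
    """Return a dict gene -> True/False (present/absent) for the screened vafs.

    hits:     dict gene -> (coverage, identity) of the best match, in percent.
    database: dict gene -> "accessory" or "core" (genome compartment of the vaf).
    mode:     "concise" screens accessory-genome vafs only; "full" screens all.
    """
    if mode not in ("concise", "full"):
        raise ValueError('mode must be "concise" or "full"')
    result = {}
    for gene, compartment in database.items():
        if mode == "concise" and compartment != "accessory":
            continue  # Core-genome vafs are skipped in concise mode.
        coverage, identity = hits.get(gene, (0.0, 0.0))
        # Assumption: a gene is present if both thresholds are met.
        result[gene] = coverage >= min_cov and identity >= min_id
    return result
```

With a toy database of one accessory gene and one core gene, the "concise" mode reports only the accessory gene, while the "full" mode reports both.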

Adding the --vf_location_details parameter generates a table that contains the locations of the matched genes of known VFs in the input sequence. The --heatmap option generates a clustered heatmap, based on the presence and absence of vafs, to visualize their prevalence.

Human infection potential

Several vafs were found to be associated with human S. suis infection in a previous study. Their prevalence ratios in human S. suis isolates are used as the weight of each vaf, and the presence of these vafs in the input genomic sequence is used to predict the human infection potential of its source isolate.

The sum of the weights is named the "zoonotic score". human_infection_potential is "high" if zoonotic_score >= 70.0, "medium" if 30.0 <= zoonotic_score < 70.0, and "low" if zoonotic_score < 30.0.
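The scoring rule above can be written as a short sketch. The cut-offs come from the text; the function names and the weight table passed in are hypothetical, since the actual vafs and weights are not listed here.

```python
def zoonotic_score(present_vafs, weights):
    """Sum the weights of the human-infection-associated vafs that are present.

    present_vafs: iterable of vaf names detected in the genome.
    weights:      dict vaf -> weight (hypothetical values in the usage note).
    """
    return sum(weights[vaf] for vaf in set(present_vafs) if vaf in weights)


def human_infection_potential(score):
    """Map a zoonotic score to "high", "medium" or "low" using the cut-offs above."""
    if score >= 70.0:
        return "high"
    if score >= 30.0:
        return "medium"
    return "low"
```

For example, with made-up weights {"vaf_a": 40.0, "vaf_b": 35.0, "vaf_c": 10.0}, an isolate carrying vaf_a and vaf_b scores 75.0 and is classed as "high".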

Antimicrobial resistance determinants

Aminoglycosides, macrolides, and tetracyclines are the major classes of antimicrobial drugs to which S. suis is resistant. Known AMRGs conferring resistance to these drugs are screened to determine the antimicrobial resistance level of the source isolate of each input genomic sequence. AMRG_level is the number of these 3 drug classes covered by the AMRGs found in the input genome. The detected AMRGs are written to the output file.
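AMRG_level as defined above is a count of distinct drug classes, which can be sketched as follows. The function name and the gene-to-class mapping are illustrative assumptions; the genes in the usage note are well-known examples and are not necessarily the entries in the SsuisChara database.

```python
# The three drug classes named in the text.
AMR_CLASSES = {"aminoglycoside", "macrolide", "tetracycline"}


def amrg_level(detected_genes, gene_to_class):
    """Count how many of the three drug classes are covered by detected AMRGs.

    detected_genes: iterable of AMRG names found in the input genome.
    gene_to_class:  dict gene -> drug class (illustrative mapping).
    """
    covered = {gene_to_class[g] for g in detected_genes if g in gene_to_class}
    # Several genes of the same class still count once.
    return len(covered & AMR_CLASSES)
```

With a mapping such as {"tet(O)": "tetracycline", "tet(M)": "tetracycline", "erm(B)": "macrolide"}, detecting tet(O), tet(M) and erm(B) gives a level of 2, because the two tet genes cover the same class.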

Output

A simplified result is printed to the terminal: input file + species + serotype + ST

Screenshot 2025-02-27 151443

A detailed table is generated in the working folder

output

A clustered heatmap is also generated

heatmap
