Severus

A tool for somatic structural variant calling using long reads

<p> <img src="docs/severus_logo_right.png" alt="Severus logo" align="left" style="width:100px;"/> Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT). It is designed for matched tumor/normal analysis, supports multiple tumor samples, and produces accurate and complete somatic and germline calls. Severus takes advantage of long-read phasing and uses the breakpoint graph framework to model complex chromosomal rearrangements.

Refer to our paper for further details and citation.

For haplotype-specific copy number analysis, see Wakhan.

</p> <br/>

Installation

The easiest way to install is through conda:

conda create -n severus_env severus
conda activate severus_env
severus --help

Alternatively, you can clone the repository and run Severus without installing it, but you will still need to install the dependencies via conda:

git clone https://github.com/KolmogorovLab/Severus
cd Severus
conda env create --name severus_env --file environment.yml
conda activate severus_env
./severus.py

Quick Usage

Single sample SV calling (Tumor-only)

severus --target-bam phased_tumor.bam --out-dir severus_out -t 16 --phasing-vcf phased.vcf \
    --vntr-bed ./vntrs/human_GRCh38_no_alt_analysis_set.trf.bed --PON ./pon/PoN_1000G_hg38.tsv.gz

We currently provide two PoN files, one for GRCh38 and one for chm13, both generated using 1000G data. If you use either of them, please cite this paper. To generate your own PoN file, see below.

Single sample somatic SV calling (Tumor/Normal pair)

severus --target-bam phased_tumor.bam --control-bam phased_normal.bam --out-dir severus_out \
    -t 16 --phasing-vcf phased.vcf --vntr-bed ./vntrs/human_GRCh38_no_alt_analysis_set.trf.bed

Multi-sample somatic SV calling

severus --target-bam phased_tumor1.bam phased_tumor2.bam --control-bam phased_normal.bam \
    --out-dir severus_out -t 16 --phasing-vcf phased.vcf \
    --vntr-bed ./vntrs/human_GRCh38_no_alt_analysis_set.trf.bed

Haplotagged (phased) alignment input is highly recommended but not required. See below for detailed instructions on how to prepare haplotagged alignments. If you use a haplotagged BAM, provide the matching phased VCF file with the --phasing-vcf option.

The --vntr-bed argument is optional but highly recommended. VNTR annotations for common references are available in the vntrs folder. See below for how to generate annotations for a custom reference.

Without --control-bam, only germline variants will be called.

After the run, VCF files with somatic calls and with all calls (germline plus somatic) are available in the output folder, along with complex SV clusters and additional information about SVs. See [below](#breakpoint-graphs) for the complex SV cluster outputs.
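The three invocations above differ only in how many target BAMs are given and whether a control BAM or a PoN file is supplied. As a minimal sketch, the command line could be assembled from Python as follows. `build_severus_cmd` is a hypothetical helper, not part of Severus, and it uses only the flags shown in this README.

```python
from typing import List, Optional, Sequence


def build_severus_cmd(
    target_bams: Sequence[str],
    out_dir: str,
    control_bam: Optional[str] = None,
    phasing_vcf: Optional[str] = None,
    vntr_bed: Optional[str] = None,
    pon: Optional[str] = None,
    threads: int = 16,
) -> List[str]:
    """Assemble a Severus argument list (hypothetical helper, not part of Severus)."""
    if not target_bams:
        raise ValueError("at least one target BAM is required")
    cmd = ["severus", "--target-bam", *target_bams,
           "--out-dir", out_dir, "-t", str(threads)]
    # Optional inputs are appended only when provided.
    if control_bam:
        cmd += ["--control-bam", control_bam]
    if phasing_vcf:
        cmd += ["--phasing-vcf", phasing_vcf]
    if vntr_bed:
        cmd += ["--vntr-bed", vntr_bed]
    if pon:
        cmd += ["--PON", pon]
    return cmd


# Multi-sample tumor/normal run, mirroring the example above.
cmd = build_severus_cmd(
    ["phased_tumor1.bam", "phased_tumor2.bam"],
    "severus_out",
    control_bam="phased_normal.bam",
    phasing_vcf="phased.vcf",
    vntr_bed="./vntrs/human_GRCh38_no_alt_analysis_set.trf.bed",
)
print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the run; building a list rather than a shell string avoids quoting problems with unusual file names.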

Inputs and Parameters

Required

--target-bam    path to one or multiple target bam files (e.g. tumor, must be indexed) 
--out-dir       path to output directory
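Because the target BAM files must be indexed, it can help to check for an index before starting a long run. The sketch below looks for the index file names that `samtools index` conventionally produces next to a BAM; `has_bam_index` is a hypothetical helper, and which of these index locations Severus itself accepts is an assumption.

```python
from pathlib import Path


def has_bam_index(bam_path: str) -> bool:
    """Return True if a .bai or .csi index file sits next to the BAM (assumed naming)."""
    bam = Path(bam_path)
    candidates = [
        Path(str(bam) + ".bai"),  # sample.bam.bai
        bam.with_suffix(".bai"),  # sample.bai
        Path(str(bam) + ".csi"),  # sample.bam.csi
    ]
    return any(p.is_file() for p in candidates)


# File name taken from the usage examples; the result depends on your working directory.
print(has_bam_index("phased_tumor.bam"))
```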

Highly recommended

--control-bam     path to the control bam file (e.g. normal, must be indexed)
--vntr-bed        path to bed file for tandem repeat regions (must be ordered)
--phasing-vcf     path to vcf file used for phasing (if using haplotype specific SV calling)
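The VNTR BED file must be ordered. A minimal sketch of such a check follows, assuming "ordered" means that records are grouped by chromosome and sorted by start position within each chromosome; that interpretation, and the helper name `bed_is_sorted`, are assumptions rather than something this README specifies.

```python
from typing import Iterable


def bed_is_sorted(lines: Iterable[str]) -> bool:
    """Check that BED records are grouped by chromosome with non-decreasing starts."""
    seen = set()
    prev_chrom, prev_start = None, -1
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start = fields[0], int(fields[1])
        if chrom != prev_chrom:
            if chrom in seen:
                return False  # the same chromosome appears in two separate blocks
            seen.add(chrom)
            prev_chrom, prev_start = chrom, -1
        if start < prev_start:
            return False  # start positions go backwards within a chromosome
        prev_start = start
    return True


# Toy records for illustration only.
print(bed_is_sorted(["chr1\t100\t200\n", "chr1\t300\t400\n", "chr2\t50\t80\n"]))
```

To check a real file, pass an open file handle, for example `bed_is_sorted(open(path))`.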

For Tumor-only runs

--PON             path to the panel of normal file (e.g. ./pon/PoN_1000G_hg38.tsv.gz)

Optional parameters

--threads               number of threads [8]
--min-support           minimum number of reads supporting a breakpoint [3]
--vaf-thr               variant allele frequency threshold for SVs
--TIN-ratio             tumor in normal ratio [0.01]
--min-mapq              minimum mapping quality for aligned segment [10]
--max-genomic-len       maximum length of genomic segment to form connected components [2Mb]
--min-sv-size           minimum SV size to be reported [50]
--min-reference-flank   minimum distance between a breakpoint and reference ends [10000]
--write-alignments      write read alignments to file
--bp-cluster-size       maximum distance in bp cluster [50]
--write-collapsed-dup   outputs a bed file with identified collapsed duplication regions
--no-ins-seq            do not output insertion sequences to the vcf file
--resolve-overlaps      resolve overlaps between split alignments.
--between-junction-ins  report unmapped insertions around breakpoints
--single-bp             add hanging breakpoints
--output-read-ids       output read IDs for supporting reads
--use-supplementary-tag use the HP tag in supplementary alignments; required if HiPhase or LongPhase was used for haplotagging
--low-quality           use stricter settings if one of the samples has lower quality
--use_germline_genotype use genotyping for diploid samples

Benchmarking Severus and other SV callers

For benchmarking details and complete results, please see our paper.

Germline benchmarking results using HG002

First, we verified the performance of Severus on a germline SV benchmark. We compared Severus, Sniffles2, and cuteSV using the HG002 GIAB SV benchmark set. The comparison was performed using Minda. Benchmarking was done against the GRCh38 reference for consistency with the benchmarks below (confident regions were lifted over). Severus and Sniffles2 performed similarly (F1 scores of 0.953 and 0.952), with cuteSV slightly behind (0.936).

|SV Caller | TP | FN | FP | Precision | Recall | F1 score |
|----------|------|-----|-----|-----------|--------|----------|
|Severus|9282|364|556|0.943|0.962|0.953|
|sniffles2|9324|322|622|0.937|0.967|0.952|
|cuteSV|9291|355|915|0.910|0.963|0.936|
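The precision, recall, and F1 columns follow from the TP, FN, and FP counts by the standard definitions. A small sketch that reproduces the Severus row; the function name `sv_metrics` is illustrative and not part of Severus or Minda.

```python
def sv_metrics(tp: int, fn: int, fp: int):
    """Return (precision, recall, F1), each rounded to three decimals."""
    precision = tp / (tp + fp)  # fraction of reported calls that are true
    recall = tp / (tp + fn)     # fraction of benchmark SVs that were found
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 3), round(recall, 3), round(f1, 3)


# Counts from the Severus row of the table above.
print(sv_metrics(9282, 364, 556))  # -> (0.943, 0.962, 0.953)
```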

Somatic benchmarking results COLO829s

We compared Severus with the existing somatic SV callers nanomonsv, SAVANA, and Sniffles2 (in mosaic mode) on COLO829 cell line data, using the multi-platform Valle-Inclan et al. truth set.
Somatic SVs were compared using Minda with a 0.1 VAF threshold. Among the long-read callers, Severus had the highest recall and F1 score on both the HiFi and ONT datasets, while nanomonsv had the highest precision on both.

|Technology|Caller|TP|FN|FP|Precision|Recall|F1|
|----------|------|--|--|--|--------|-------|--|
|PacBio|Severus|59|9|23|0.720|0.868|0.787|
|PacBio|SAVANA|54|14|80|0.403|0.794|0.535|
|PacBio|nanomonsv|51|17|15|0.773|0.750|0.761|
|PacBio|Sniffles2|36|32|203|0.151|0.529|0.235|
|ONT_R10|Severus|59|9|16|0.787|0.868|0.825|
|ONT_R10|SAVANA|57|11|26|0.687|0.838|0.755|
|ONT_R10|nanomonsv|50|18|11|0.820|0.735|0.775|
|ONT_R10|Sniffles2|37|31|369|0.091|0.544|0.156|
|Illumina|SvABA|48|20|10|0.828|0.706|0.762|
|Illumina|GRIPSS*|55|13|0|1.000|0.809|0.894|
|Illumina|manta|46|22|7|0.868|0.676|0.760|

Somatic Benchmarking results: Tumor/Normal Cell line pairs

We compared the performance of the somatic SV callers using 5 tumor/normal cell line pairs. Since no ground truth SV calls are available, we created an ensemble set of SVs supported by 2+ technologies and 4+ callers (out of 11) for each dataset. This assumes that singleton calls are false positives and that calls supported by multiple tools are more reliable. Severus consistently had the highest recall and precision against the ensemble SV sets.
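The ensemble rule can be sketched as a simple filter. This is an illustration only: it assumes SV calls have already been matched across callers (the hard part, which the sketch does not attempt), that callers are counted by distinct name, and the `support` data below is invented.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: SV id -> set of (technology, caller) pairs that reported it.
Support = Dict[str, Set[Tuple[str, str]]]


def ensemble_set(support: Support, min_tech: int = 2, min_callers: int = 4) -> List[str]:
    """Keep SVs seen by at least min_tech technologies and min_callers distinct callers."""
    kept = []
    for sv_id, pairs in support.items():
        techs = {tech for tech, _ in pairs}
        callers = {caller for _, caller in pairs}
        if len(techs) >= min_tech and len(callers) >= min_callers:
            kept.append(sv_id)
    return sorted(kept)


# Invented example data, not taken from the benchmark.
support = {
    "sv1": {("PacBio", "Severus"), ("PacBio", "SAVANA"),
            ("ONT", "nanomonsv"), ("Illumina", "manta")},
    "sv2": {("PacBio", "Severus")},  # singleton call
    "sv3": {("PacBio", "Severus"), ("PacBio", "SAVANA"),
            ("PacBio", "nanomonsv"), ("PacBio", "Sniffles2")},  # one technology only
}
print(ensemble_set(support))  # -> ['sv1']
```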

<p align="center"> <img src="docs/bench.png" alt="benchmarking" style="width:90%"/> </p>

Output Files

VCF file

For each target sample, Severus outputs a VCF file with somatic SV calls. If the input alignment is haplotagged, haplotype will be reported as HP in INFO. In addition, Severus outputs a set of all SVs (somatic + germline) for each input sample. VCF contains additional information about SVs, such as the clustering of complex variants. Please see the detailed description below.
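To read the haplotype from a record, split the INFO column (the eighth VCF column) into key/value pairs. The sketch below uses plain string handling and an invented record; apart from HP, the field values are placeholders, and the exact format of the HP value is not specified in this section.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO string into a dict; flags without a value map to True."""
    if info == ".":
        return {}
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


# Invented record for illustration; not real Severus output.
line = "chr1\t1000\tsv1\tN\t<DEL>\t60\tPASS\tSVTYPE=DEL;SVLEN=-500;HP=1"
info = parse_info(line.split("\t")[7])
print(info.get("HP"))  # -> 1
```

For real files, a VCF library such as pysam or cyvcf2 is usually a better choice than manual parsing.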

html plots

Severus outputs a breakpoint graph as an interactive Plotly graph that describes the derived structure of tumor haplotypes. See the detailed description below.

breakpoint_double.csv

Detailed information about the detected breakpoints for all samples in text format, intended for advanced users.

breakpoint_clusters_list.tsv

A summary file for the complex SV clusters.

breakpoint_clusters.tsv

Detailed information about the junctions involved in complex SVs.

Overview of the Severus algorithm

<p align="center"> <img src="docs/g1.png" alt="Severus workflow" style="width:90%"/> </p>

Somatic SVs in cancer are typically more complex compared to germline SVs. For example, breakage-fusion-bridge (BFB) amplifications are characterized by multiple foldback in
