cuteSV

Notice

The force calling module has been disabled in cuteSV. Please install cuteFC to perform SV force calling/regenotyping.

A new wiki page on diploid-assembly-based SV detection with cuteSV is now available. For more details, please see here.

Getting Started

                                               __________    __       __
                                              |   ____   |  |  |     |  |
                          _                   |  |    |__|  |  |     |  |
 _______    _     _   ___| |___     ______    |  |          |  |     |  |
|  ___  |  | |   | | |___   ___|   / ____ \   |  |_______   |  |     |  |
| |   |_|  | |   | |     | |      / /____\ \  |_______   |  |  |     |  |
| |        | |   | |     | |      | _______|   __     |  |  \  \     /  /
| |    _   | |   | |     | |  _   | |     _   |  |    |  |   \  \   /  /
| |___| |  | |___| |     | |_| |  \ \____/ |  |  |____|  |    \  \_/  /
|_______|  |_______|     |_____|   \______/   |__________|     \_____/

Installation

$ pip install cuteSV
or
$ conda install -c bioconda cutesv
or 
$ git clone https://github.com/tjiangHIT/cuteSV.git && cd cuteSV/ && python setup.py install

Introduction

Long-read sequencing enables the comprehensive discovery of structural variations (SVs). However, achieving high sensitivity and high performance at the same time remains non-trivial because of the complex SV signatures found in noisy long reads. We therefore propose cuteSV, a sensitive, fast and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and applies a clustering-and-refinement method to these signatures to achieve sensitive SV detection. Benchmarks on real Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) datasets demonstrate that cuteSV achieves better yields and scalability than state-of-the-art tools.

The benchmark results of cuteSV on the HG002 human sample are below:

We used Truvari to calculate recall, precision, and F-measure. An example of the detailed SV benchmarking procedure is available here.
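For reference, the F-measure reported by Truvari is the harmonic mean of precision and recall, F = 2PR / (P + R). The bash sketch below recomputes it with awk and shows, as comments, one plausible Truvari run. The file names are placeholders, and the flags (`-b`, `-c`, `-f`, `--includebed`, `-o`) assume a recent Truvari `bench` subcommand, so please check `truvari bench --help` for your installed version.

```bash
#!/usr/bin/env bash
# F-measure is the harmonic mean of precision (P) and recall (R):
#   F = 2 * P * R / (P + R)
# Prints the value with four decimals; prints 0.0000 when P + R is 0.
f_measure() {
    awk -v p="$1" -v r="$2" 'BEGIN {
        if (p + r == 0) { printf "%.4f\n", 0; exit }
        printf "%.4f\n", 2 * p * r / (p + r)
    }'
}

# Hypothetical benchmarking run (not executed here; names are placeholders).
# Truvari expects bgzip-compressed, tabix-indexed VCFs:
#   bgzip -c cuteSV_HG002.vcf > cuteSV_HG002.vcf.gz
#   tabix -p vcf cuteSV_HG002.vcf.gz
#   truvari bench -b truth_HG002.vcf.gz -c cuteSV_HG002.vcf.gz \
#       -f reference.fa --includebed truth_HG002.bed -o truvari_out

f_measure 0.95 0.90   # prints 0.9243
```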

Dependencies

1. python3
2. scipy
3. pysam
4. Biopython
5. cigar
6. numpy
7. pyvcf3
8. scikit-learn
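To check that the Python dependencies above are importable before running cuteSV, a small bash helper like the following may be handy. It is a sketch that assumes the usual import names: `Bio` for Biopython, `vcf` for pyvcf3 and `sklearn` for scikit-learn.

```bash
#!/usr/bin/env bash
# Try to import each Python module given as an argument with python3.
# Reports every missing module on stderr; returns 1 if any is missing.
# Assumed import names: Bio (Biopython), vcf (pyvcf3), sklearn (scikit-learn).
check_modules() {
    local missing=0 mod
    for mod in "$@"; do
        if ! python3 -c "import ${mod}" 2>/dev/null; then
            echo "missing: ${mod}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

if check_modules scipy pysam Bio cigar numpy vcf sklearn; then
    echo "all cuteSV dependencies found"
else
    echo "some dependencies are missing; see the list above" >&2
fi
```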

Usage

cuteSV <sorted.bam> <reference.fa> <output.vcf> <work_dir>
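A minimal wrapper sketch around this command is shown below. The function name `run_cutesv` and the file names are hypothetical; creating the work directory beforehand is only a precaution, and indexing the BAM with `samtools index` is assumed to be needed for region-based access. The options in the example call are taken from the parameter table below.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: checks the inputs, creates the work directory,
# then runs cuteSV with the four positional arguments plus any options.
run_cutesv() {
    if [ "$#" -lt 4 ]; then
        echo "usage: run_cutesv <sorted.bam> <reference.fa> <output.vcf> <work_dir> [options]" >&2
        return 2
    fi
    local bam="$1" ref="$2" vcf="$3" work_dir="$4" f
    shift 4
    # Fail early if the BAM or the reference is missing or empty.
    for f in "${bam}" "${ref}"; do
        if [ ! -s "${f}" ]; then
            echo "input not found or empty: ${f}" >&2
            return 1
        fi
    done
    mkdir -p "${work_dir}"
    cuteSV "${bam}" "${ref}" "${vcf}" "${work_dir}" "$@"
}

# Example call with placeholder file names:
# samtools index HG002.sorted.bam   # index the sorted BAM if needed
# run_cutesv HG002.sorted.bam reference.fa HG002.cuteSV.vcf ./work \
#     --threads 16 --sample HG002 --genotype
```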

Suggestions

> For PacBio CLR data:
	--max_cluster_bias_INS		100
	--diff_ratio_merging_INS	0.3
	--max_cluster_bias_DEL	200
	--diff_ratio_merging_DEL	0.5

> For PacBio CCS (HiFi) data:
	--max_cluster_bias_INS		1000
	--diff_ratio_merging_INS	0.9
	--max_cluster_bias_DEL	1000
	--diff_ratio_merging_DEL	0.5

> For ONT data:
	--max_cluster_bias_INS		100
	--diff_ratio_merging_INS	0.3
	--max_cluster_bias_DEL	100
	--diff_ratio_merging_DEL	0.3
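The suggested settings can be kept in one place with a small helper. In this sketch the function name `cutesv_preset` and the platform keywords (`clr`, `ccs`/`hifi`, `ont`) are hypothetical labels; the option values are copied from the suggestions above.

```bash
#!/usr/bin/env bash
# Print the suggested clustering/merging options for a sequencing platform.
# Values mirror the PacBio CLR, PacBio CCS (HiFi) and ONT suggestions.
cutesv_preset() {
    case "$1" in
        clr)
            echo "--max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 200 --diff_ratio_merging_DEL 0.5" ;;
        ccs|hifi)
            echo "--max_cluster_bias_INS 1000 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 1000 --diff_ratio_merging_DEL 0.5" ;;
        ont)
            echo "--max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 100 --diff_ratio_merging_DEL 0.3" ;;
        *)
            echo "unknown platform: $1 (expected clr, ccs/hifi or ont)" >&2
            return 1 ;;
    esac
}

# Usage (placeholder file names); split the preset into an array of options:
# read -r -a opts <<< "$(cutesv_preset ont)"
# cuteSV sample.sorted.bam reference.fa sample.vcf ./work "${opts[@]}"
```

Keeping the presets as plain strings makes them easy to print and compare; splitting them into an array only at call time avoids quoting surprises.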

| Parameter | Description | Default |
| :------------ |:---------------|-------------:|
|--threads|Number of threads to use.|16|
|--batches|Length of the genome segmentation interval processed in each batch (bp).|10,000,000|
|--sample|Sample name/ID.|NULL|
|--retain_work_dir|Retain the temporary folder and files.|False|
|--write_old_sigs|Output temporary signature (sig) files.|False|
|--report_readid|Report the IDs of the reads supporting each SV.|False|
|--max_split_parts|Maximum number of split segments a read may be aligned to before it is ignored. All split segments are considered when set to -1 (recommended for assembly-based alignments).|7|
|--min_mapq|Minimum mapping quality of an alignment to be taken into account.|20|
|--min_read_len|Ignore reads that only report alignments no longer than this length (bp).|500|
|--merge_del_threshold|Maximum distance between deletion signals to be merged.|0|
|--merge_ins_threshold|Maximum distance between insertion signals to be merged.|100|
|--min_support|Minimum number of supporting reads for an SV to be reported.|10|
|--min_size|Minimum length of an SV to be reported.|30|
|--max_size|Maximum size of an SV to be reported. Full-length SVs are reported when set to -1.|100000|
|--genotype|Generate genotypes.|False|
|--gt_round|Maximum number of iterations of alignment searching when genotyping.|500|
|--read_range|Interval range for counting the read distribution.|1000|
|--max_cluster_bias_INS|Maximum distance to cluster reads together for insertions.|100|
|--diff_ratio_merging_INS|Do not merge breakpoints whose base-pair identity exceeds this ratio (insertions).|0.3|
|--max_cluster_bias_DEL|Maximum distance to cluster reads together for deletions.|200|
|--diff_ratio_merging_DEL|Do not merge breakpoints whose base-pair identity exceeds this ratio (deletions).|0.5|
|--max_cluster_bias_INV|Maximum distance to cluster reads together for inversions.|500|
|--max_cluster_bias_DUP|Maximum distance to cluster reads together for duplications.|500|
|--max_cluster_bias_TRA|Maximum distance to cluster reads together for translocations.|50|
|--diff_ratio_filtering_TRA|Filter out breakpoints whose base-pair identity is below this ratio (translocations).|0.6|
|--remain_reads_ratio|Ratio of reads retained in a cluster to generate the breakpoint. Set it lower to get more precise breakpoints when the alignment data are of high quality, but values above 0.5 are recommended.|1|
|-include_bed|Optional BED file; SVs are detected only in the regions it lists.|NULL|
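The `-include_bed` option restricts detection to the regions of a BED file. As a sketch under assumptions (restricting calls to autosomes is only an illustration, and contigs are assumed to be named `1`..`22` or `chr1`..`chr22`), the hypothetical function below derives such a BED file from a FASTA index (`.fai`) produced by `samtools faidx`. BED intervals are 0-based and half-open, so each whole contig becomes `[0, length)`.

```bash
#!/usr/bin/env bash
# Build a BED file covering whole autosomes from a FASTA index (.fai).
# .fai columns: name, length, offset, line bases, line width.
# Each matching contig is written as: name <TAB> 0 <TAB> length
fai_to_autosome_bed() {
    awk -v OFS='\t' '$1 ~ /^(chr)?([1-9]|1[0-9]|2[0-2])$/ { print $1, 0, $2 }' "$1"
}

# Example (placeholder file names):
# samtools faidx reference.fa
# fai_to_autosome_bed reference.fa.fai > autosomes.bed
# cuteSV sample.sorted.bam reference.fa sample.vcf ./work -include_bed autosomes.bed
```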

Datasets generated by cuteSV

We provide the SV callsets of the HG002 human sample produced by cuteSV from three long-read sequencing platforms (PacBio CLR, PacBio CCS, and ONT PromethION).

You can download them at:

Please cite the cuteSV manuscript if you use these callsets.

Changelog

cuteSV (v2.1.3)
1. fix RNAMES field in INFO to remove "NULL"
2. fix the alternative allele in translocations
3. optimize the output logs

cuteSV (v2.1.2)
1. disable the force calling function, please install cuteFC (https://github.com/Meltpinkg/cuteFC) to achieve SV force calling
2. add reference path for processing .cram files
3. add parameter for excluding output sequences
4. fix bugs in resolving insertion sequences

cuteSV (v2.1.1)
1. fix bugs in resolving reference genomes
2. modify several dependencies and remove some useless dependencies
3. update several evaluation scripts

cuteSV (v2.1.0)
1. Speed up both SV discovery and force calling.
2. Upgrade the force calling module.
3. Modify the temporary files. Signature (sigs) files are only generated with the "write_old_sigs" parameter.
4. Update several rules in signature extraction.

cuteSV (v2.0.3):
1. Fix the error of missing min_size parameter.
2. Fix the missing signatures in duplication clustering.

cuteSV (v2.0.2):
1. Fix several errors in signature extraction.
2. Filter low quality reads in the statistics of reference reads.
3. Modify the rule of merging signatures on the same read.
4. Modify the cluster rule of insertions and deletions in force calling.

cuteSV (v2.0.1):
1.
