VASE

This is a program for Variant Annotation, Segregation and Exclusion for family or cohort based rare-disease sequencing studies.

INTRODUCTION

VASE can be used to filter VCF files based on allele frequency data, functional consequences from VEP, presence/absence of variants in cases vs controls and inheritance patterns within families. It is designed primarily for use in rare disease cohort or familial studies.

In order to make the most of the functions VASE provides, you will require a multi-sample, VEP annotated VCF. In order to confidently identify variants segregating within families consistent with dominant/recessive/de novo inheritance patterns, your VCF should have been made by calling all of your samples simultaneously (e.g. using the GATK joint-calling workflow).
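Before running VASE, it can help to confirm that a VCF meets these requirements. The following is a minimal sketch, not part of VASE, that inspects VCF header lines using only the Python standard library. It assumes the VEP annotations are stored in the INFO field named CSQ (VEP's default; your file may use a different name). The helper names summarise_vcf_header and summarise_vcf_file are hypothetical.

```python
import gzip


def summarise_vcf_header(lines):
    """Return (n_samples, has_csq) from an iterable of VCF text lines.

    Assumption: VEP annotations live in the INFO field 'CSQ' (VEP's default).
    """
    n_samples = 0
    has_csq = False
    for line in lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            has_csq = True
        elif line.startswith("#CHROM"):
            # Columns after the 9 fixed fields (CHROM..FORMAT) are sample IDs.
            n_samples = max(0, len(line.rstrip("\n").split("\t")) - 9)
            break  # the #CHROM line ends the header
        elif not line.startswith("#"):
            break  # reached variant records without a #CHROM line
    return n_samples, has_csq


def summarise_vcf_file(path):
    """Open a plain or gzip/bgzip-compressed VCF and summarise its header."""
    opener = gzip.open if path.endswith((".gz", ".bgz")) else open
    with opener(path, "rt") as handle:
        return summarise_vcf_header(handle)


# Small in-memory example: a CSQ header line and three sample columns.
example_header = [
    "##fileformat=VCFv4.2\n",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="VEP consequences">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tmum\tdad\tchild\n",
]
print(summarise_vcf_header(example_header))  # prints (3, True)
```

A result with fewer than two samples, or with has_csq set to False, suggests the file may not support the segregation or consequence filters described above. The check says nothing about whether the samples were jointly called.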

Detailed instructions and examples to follow in the VASE wiki.

INSTALLATION

VASE requires Python 3. It has been tested with Python 3.5 and 3.6. The modules 'pysam' and 'natsort' from PyPI are required and should be installed for you if you follow the instructions below. You may also wish to install biopython, which is required if you want to write missing CADD/spliceAI scores to bgzipped output.
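To see which of these modules are already available in your environment, a short check such as the following may be useful. This is an illustrative sketch rather than anything shipped with VASE. The helper name missing_modules is hypothetical, and note that biopython is imported under the name 'Bio'.

```python
import importlib.util
import sys


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names as used in Python code; biopython is imported as 'Bio'.
REQUIRED = ["pysam", "natsort"]
OPTIONAL = ["Bio"]

print("Python version: %d.%d" % sys.version_info[:2])
print("Missing required modules:", missing_modules(REQUIRED))
print("Missing optional modules:", missing_modules(OPTIONAL))
```

An empty list means every module in that group was found. The check only looks for the modules and does not import them or verify their versions.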

The simplest way to install the vase script to $HOME/.local/bin (or, on macOS, possibly /Users/$USER/Library/Python/3.*/bin/) is to use pip:

pip3 install git+https://github.com/david-a-parry/vase.git --user
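If the vase command is not found after a --user install, the script directory may not be on your PATH. The sketch below is an assumption-laden helper, not part of VASE or pip, that reports where a POSIX or macOS Python places user-installed scripts and whether that directory is on PATH. The function names are hypothetical, and Windows layouts are not handled.

```python
import os
import site


def user_script_dir():
    """Best-guess directory where 'pip install --user' places scripts.

    Assumption: a POSIX-style layout, i.e. '<user base>/bin'. Windows uses a
    different layout, which this sketch does not handle.
    """
    return os.path.join(site.getuserbase(), "bin")


def is_on_path(directory, path_value=None):
    """Return True if directory is listed in the given PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(directory)
    return any(os.path.normpath(entry) == target
               for entry in path_value.split(os.pathsep) if entry)


script_dir = user_script_dir()
status = "on PATH" if is_on_path(script_dir) else "not on PATH"
print("%s (%s)" % (script_dir, status))
```

If the directory is reported as not on PATH, adding it to PATH in your shell profile should make the vase command available.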

To install with the extra modules required for bgzip output and vase_reporter functionality (recommended) use the following:

pip3 install git+https://github.com/david-a-parry/vase.git#egg=vase[BGZIP,REPORTER,MYGENEINFO] --user

To install system-wide, remove the --user flag and ensure you have root privileges (e.g. using sudo).

Alternatively, you may first clone this repository:

git clone https://github.com/david-a-parry/vase.git

You can also use the 'Clone or download' button above. From the newly created vase directory, you may install either by running the setup.py script as follows:

python3 setup.py install --user

or by using pip, if installed:

#without extras
pip3 install . --user

#with extras (recommended)
pip3 install .[BGZIP,REPORTER,MYGENEINFO] --user

If you have root privileges, you can install system-wide as follows:

sudo python3 setup.py install

or:

sudo pip3 install .

USAGE/OPTIONS

usage: vase -i VCF [-o OUTPUT] [-r REPORT_PREFIX]
            [-burden_counts BURDEN_COUNTS] [-gnomad_burden] [-v QUAL]
            [-p | --keep_filters KEEP_FILTERS [KEEP_FILTERS ...]]
            [--exclude_filters EXCLUDE_FILTERS [EXCLUDE_FILTERS ...]]
            [-t TYPE [TYPE ...]] [-max_alts MAX_ALT_ALLELES]
            [--filter_asterisk_only_calls] [-af AF] [-min_af MIN_AF]
            [-filtering_an FILTERING_AN] [-min_an MIN_AN] [-ac AC]
            [-min_ac MIN_AC] [--info_filters INFO_FILTERS [INFO_FILTERS ...]]
            [-c [CSQ [CSQ ...]]] [--impact IMPACT [IMPACT ...]] [--canonical]
            [--flagged_features] [--biotypes BIOTYPE [BIOTYPE ...]]
            [--feature_blacklist FEATURE_BLACKLIST] [--loftee]
            [-m MISSENSE_FILTERS [MISSENSE_FILTERS ...]]
            [--filter_unpredicted] [--keep_if_any_damaging]
            [--splice_filters SPLICE_FILTERS [SPLICE_FILTERS ...]]
            [--splice_filter_unpredicted] [--splice_keep_if_any_damaging]
            [--retain_labels Label=Value [Label=Value ...]] [--no_vep_freq]
            [--vep_af VEP_AF [VEP_AF ...]] [--pathogenic] [--no_conflicted]
            [--g2p G2P] [--check_g2p_consequence] [--check_g2p_inheritance]
            [--region REGION [REGION ...] | --bed BED | --gene_bed BED]
            [--stream] [--exclude_regions] [--cadd_files FILE [FILE ...]]
            [-cadd_dir DIR] [--missing_cadd_scores FILE] [--cadd_phred FLOAT]
            [--cadd_raw FLOAT] [-d VCF [VCF ...]] [-g VCF [VCF ...]]
            [--gnomad_pops POP [POP ...]]
            [--vcf_filter VCF,ID[,INFO_FIELD ...] [VCF,ID[,INFO_FIELD ...]
            ...]] [--dng_vcf DNG_VCF [DNG_VCF ...]] [-f FREQ]
            [--min_freq MIN_FREQ]
            [--max_gnomad_homozygotes MAX_GNOMAD_HOMOZYGOTES] [-b dbSNP_build]
            [--max_build dbSNP_build] [--filter_known] [--filter_novel]
            [--clinvar_path] [-ignore_existing]
            [--splice_ai_vcfs VCF [VCF ...]] [--splice_ai_min_delta DELTA]
            [--splice_ai_max_delta DELTA] [--missing_splice_ai_scores FILE]
            [--cases SAMPLE_ID [SAMPLE_ID ...]]
            [--controls SAMPLE_ID [SAMPLE_ID ...]] [-ped PED] [-gq GQ]
            [-dp DP] [-max_dp MAX_DP] [-het_ab AB] [-hom_ab AB]
            [-con_gq CONTROL_GQ] [-con_dp CONTROL_DP]
            [-con_max_dp CONTROL_MAX_DP] [-con_het_ab AB] [-con_hom_ab AB]
            [-con_ref_ab AB] [-sv_gq SV_GQ] [-sv_dp SV_DP]
            [-sv_max_dp SV_MAX_DP] [-sv_het_ab AB] [-sv_hom_ab AB]
            [-sv_con_gq SV_CONTROL_GQ] [-sv_con_dp SV_CONTROL_DP]
            [-sv_con_max_dp SV_CONTROL_MAX_DP] [-sv_con_het_ab AB]
            [-sv_con_hom_ab AB] [-sv_con_ref_ab AB]
            [--duphold_del_dhffc DHFFC] [--duphold_dup_dhbfc DHBFC]
            [--control_duphold_del_dhffc DHFFC]
            [--control_duphold_dup_dhbfc DHBFC] [--n_cases N_CASES]
            [--n_controls N_CONTROLS] [--confirm_control_gts] [--biallelic]
            [--de_novo] [--dominant] [--min_families MIN_FAMILIES]
            [--singleton_recessive SAMPLE_ID [SAMPLE_ID ...]]
            [--singleton_dominant SAMPLE_ID [SAMPLE_ID ...]]
            [--seg_controls SAMPLE_ID [SAMPLE_ID ...]] [--strict_recessive]
            [--prog_interval N] [--log_progress] [--no_progress] [--quiet]
            [--debug] [--no_warnings] [--silent] [-h]

Variant annotation, segregation and exclusion.

Required Arguments:
  -i VCF, --input VCF   Input VCF filename
                        

Output Arguments:
  -o OUTPUT, --output OUTPUT
                        Filename for VCF output. If this ends in .gz or
                        .bgz the output will be BGZIP compressed.
                        Default = STDOUT
                        
  -r REPORT_PREFIX, --report_prefix REPORT_PREFIX
                        DEPRECATED - use the 'vase_reporter' program
                        provided alongside vase instead.
                        
                        Prefix for segregation summary report output
                        files. If either --biallelic, --de_novo or
                        --dominant options are in effect this option will
                        write summaries for segregating variants to files
                        with the respective suffixes of
                        '_recessive.report.tsv', '_de_novo.report.tsv' and
                        '_dominant.report.tsv'.
                        
  -burden_counts BURDEN_COUNTS, --burden_counts BURDEN_COUNTS
                        File for outputting 'burden counts' per
                        transcript. If specified, the number of alleles
                        passing specified filters will be counted for
                        each transcript identified. Requires your VCF
                        input to be annotated with Ensembl's VEP. Note,
                        that if --cases or --controls are specified when
                        using this argument, variants will not be filtered
                        on presence in cases/controls; instead counts will
                        be written for cases and controls to this file.
                        
  -gnomad_burden, --gnomad_burden
                        If using --burden_counts, use this flag to
                        indicate that the input is from gnomAD and should
                        be parsed per population.
                        

Annotation File Arguments:
  --cadd_files FILE [FILE ...], -cadd_files FILE [FILE ...]
                        One or more tabix indexed CADD annotation files
                        (such as those found at
                        http://cadd.gs.washington.edu/download). Variants
                        in your input that match any scored variant in
                        these files will have the CADD RawScore and PHRED
                        values added to the INFO field, one per ALT
                        allele. Alleles/variants can be filtered on these
                        scores using the --cadd_phred or --cadd_raw
                        options.
                        
  -cadd_dir DIR, --cadd_directory DIR
                        Directory containing one or more tabix indexed
                        CADD annotation files to be used as above. Only
                        files with '.gz' or '.bgz' extensions will be
                        included.
                        
  --missing_cadd_scores FILE
                        Filename to output variants that are not found
                        in CADD annotation files. Output will be gzip
                        compressed and in a format suitable for uploading
                        to https://cadd.gs.washington.edu/score for
                        scoring (or for scoring locally).
                        
  --cadd_phred FLOAT, -cadd_phred FLOAT
                        CADD PHRED score cutoff. Variants with a C
