Deprecated. See the new version: https://github.com/statgen/bravo_api

BRAVO

BRowse All Variants Online

Installation

  1. System Setup
  2. Configuration
  3. Launch the Application
  4. Data Preparation
    1. Prepare VCF
    2. Prepare percentiles
    3. Prepare coverage
    4. Prepare CRAM
  5. Load Data
  6. Data Backup and Restore

System Setup

BRAVO is packaged using Docker, which is the recommended way to deploy the application. System dependencies can be installed using the setup.sh script, which installs and configures the required applications on your machine.

The script will also create the default directory structure on the local host at /data.

|-- data
|   |-- cache
|   |   |-- igv_cache
|   |-- coverage
|   |-- cram
|   |-- genomes
|   |-- import_vcf
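
A minimal sketch of the setup on a fresh host, assuming the repository is statgen/bravo and that setup.sh sits in its root (adjust paths to your checkout):

    git clone https://github.com/statgen/bravo.git && cd bravo
    sudo ./setup.sh   # installs the system dependencies and creates the /data layout shown above
    ls -R /data       # verify the default directory structure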

Configuration

Application Settings

The BRAVO configuration file for project-dependent settings can be found at config/default.py. You will need to edit some of the settings to match your environment.

Apache Config

It is recommended that you run your BRAVO application behind a reverse proxy such as Apache. An example Apache configuration can be found in apache-example.conf. In order to use Apache as a reverse proxy you will need to enable the mod_proxy and mod_proxy_http modules using the command sudo a2enmod proxy proxy_http.

You will also need to change the BRAVO configuration file setting PROXY from False to True and reload your application as described in the Launch the Application section.
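
As a sketch, the full sequence on a Debian/Ubuntu host might look like the following (the sites-available path and the sed edit of config/default.py are assumptions; adapt them to your setup):

    sudo a2enmod proxy proxy_http
    sudo cp apache-example.conf /etc/apache2/sites-available/bravo.conf
    sudo a2ensite bravo.conf && sudo systemctl reload apache2
    sed -i 's/^PROXY = .*/PROXY = True/' config/default.py
    docker-compose down && docker-compose up --build -d   # reload BRAVO with PROXY enabled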

Access Control

Authentication

BRAVO supports user authentication using Google's OAuth 2.0 protocol, which is optional and is disabled by default. This section describes how to enable it.

First, make sure that your BRAVO instance is served over HTTPS.

Second, you need to set up OAuth with Google. Go to the Google API Console and create a project. Your project will get a Client ID and a Client secret. In the "Authorized redirect URIs" list, add your OAuth callback URL, which should look like https://[bravo base URL]/callback/google (e.g. https://mybravo.myinstitution.org/callback/google).

Attention! Don't expose your Client ID and Client secret to anyone, and make sure you are using HTTPS for your callback URL.

Third, follow these steps to enable authentication in BRAVO (a combined sketch follows the list):

  1. Set the GOOGLE_AUTH variable in the BRAVO configuration file to True.
  2. Assign the GOOGLE_LOGIN_CLIENT_ID variable in the BRAVO configuration file to your Client ID from Google.
  3. Assign the GOOGLE_LOGIN_CLIENT_SECRET variable in the BRAVO configuration file to your Client secret from Google.
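
A combined sketch of these three steps, assuming the variables are plain top-level assignments in config/default.py (the client ID and secret below are placeholders):

    sed -i "s/^GOOGLE_AUTH = .*/GOOGLE_AUTH = True/" config/default.py
    sed -i "s/^GOOGLE_LOGIN_CLIENT_ID = .*/GOOGLE_LOGIN_CLIENT_ID = 'xxxxxxxx.apps.googleusercontent.com'/" config/default.py
    sed -i "s/^GOOGLE_LOGIN_CLIENT_SECRET = .*/GOOGLE_LOGIN_CLIENT_SECRET = 'your-client-secret'/" config/default.py
    docker-compose down && docker-compose up --build -d   # reload the application with the new settings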

Email Whitelist

BRAVO allows whitelisting specific users based on their email addresses. To enable whitelisting, follow these steps:

  1. Set up user authentication as described in the Authentication section.

  2. Set the EMAIL_WHITELIST variable in the BRAVO configuration file to True.

  3. Import a list of emails from a text file (one email per line) into the Mongo database:

    ./manage.py whitelist -w emails.txt
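
    For example, with a hypothetical two-user whitelist:

    printf 'alice@myinstitution.org\nbob@myinstitution.org\n' > emails.txt
    ./manage.py whitelist -w emails.txt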
    

Terms of Use

If your BRAVO users must agree to any terms/conditions before browsing your data, you need to enable the Terms of Use page as follows (a short sketch follows the list):

  1. Set up user authentication as described in the Authentication section.
  2. Set the TERMS variable in the BRAVO configuration file to True.
  3. Write your terms/conditions to the templates/terms.html file.
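
A short sketch of steps 2 and 3, assuming TERMS is a plain assignment in config/default.py:

    sed -i 's/^TERMS = .*/TERMS = True/' config/default.py
    $EDITOR templates/terms.html   # write your terms/conditions here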

Google Analytics

This step is optional. Go to Google Analytics and obtain your own UA-xxxxxx-xx tracking ID, then put it in default.py. Or just leave the default UA-01234567-89, and you won't receive any of the tracking data.

Launch the Application

In order to start the service, run docker-compose up -d from the project's home directory.

In order to stop the application, run docker-compose down from the project's home directory.

If you have made changes to configuration files or code and need to reload them, run docker-compose down && docker-compose up --build -d to stop and rebuild the containers with your updates.
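
To check that the containers came up cleanly, the standard docker-compose commands can be used:

    docker-compose ps        # list the BRAVO containers and their state
    docker-compose logs -f   # follow the application logs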

Data Preparation

In the data/ directory you will find tools/scripts to prepare your data for importing into the Mongo database and using in the BRAVO browser.

Prepare VCF

We recommend processing each chromosome separately in parallel. You can further parallelize the process by specifying chromosomal regions in steps (2) and (3). A worked example combining steps (2) through (4) is sketched after the list.

  1. Compile data preparation tools:

    cd data/DataPrep/
    cget install .
    

    After successful compilation, the executables will be installed in data/DataPrep/cget/bin.

  2. Prepare VCF with the INFO fields NS, AN, AC, AF, Hom, Het, DP, AVGDP, AVGDP_R, AVGGQ, AVGGQ_R, DP_HIST, DP_HIST_R, GQ_HIST, and GQ_HIST_R:

    ./cget/bin/ComputeAlleleCountsAndHistograms -i [input bcf/vcf] -s [samples file] -r [CHR:START-END] -o [output.vcf.gz]
    

    Input BCF/VCF must have the DP and GQ FORMAT fields. The input BCF/VCF can be read from local storage or from Google bucket storage. The input samples file (one sample ID per line) and the chromosomal region CHR:START-END are optional.

  3. Run Variant Effect Predictor (VEP) on the VCF created in step (2):

    ./vep -i [input vcf.gz] --plugin LoF[,options]  --assembly [GRCh37/GRCh38] --cache --offline --vcf --sift b --polyphen b --ccds --uniprot --hgvs --symbol --numbers --domains --regulatory --canonical --protein --biotype --af --af_1kg --pubmed --shift_hgvs 0 --allele_number --format vcf --force --buffer_size 100000 --compress_output gzip --no_stats -o [output vcf.gz]
    

    Specify LoF plugin configuration options as you need.

  4. (Optional) Obtain CADD scores from https://cadd.gs.washington.edu and annotate the VCF from step (3):

    python add_cadd_scores.py -i [input vcf.gz] -c [cadd_file1.tsv.gz] [cadd_file2.tsv.gz] ...  -o [output vcf.gz]
    

    CADD score files must be accompanied by the corresponding index files. If multiple CADD score files are specified, then the maximal CADD score across all files will be used.
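
Putting steps (2) through (4) together for one chromosome (all file names below are placeholders):

    ./cget/bin/ComputeAlleleCountsAndHistograms -i study.chr22.bcf -s samples.txt -o chr22.counts.vcf.gz
    ./vep -i chr22.counts.vcf.gz --plugin LoF --assembly GRCh38 [remaining flags exactly as in step (3)] -o chr22.vep.vcf.gz
    python add_cadd_scores.py -i chr22.vep.vcf.gz -c cadd_scores.tsv.gz -o chr22.annotated.vcf.gz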

<!-- 5. Now you are ready to import VCF's from step (4) into Mongo database. Index all input VCF files with `tabix` and run the following command: ``` python manage.py variants -t [threads] -v [input chr1 vcf.gz] [input chr2 vcf.gz] ... ``` -->

Prepare percentiles

Percentiles must be computed separately for each INFO field.

  1. For each VCF INFO field (AVGDP, BQZ, CYZ, DP, FIBC_I, FIBC_P, HWE_SLP_I, HWE_SLP_P, IOR, NM0, NM1, NMZ, QUAL, STZ, SVM, ABE, ABZ) run the following command (a loop covering all fields is sketched after this list):

    ./cget/bin/ComputePercentiles -i [input vcf.gz] -m [INFO field] -t [threads] -f [min MAF] -F [max MAF] -a [allele count] -p [number of percentiles] -d [description] -o [prefix for output files]
    

    Examples:

    ./cget/bin/ComputePercentiles -i /mymachine/myhome/mydata/chr*.mystudy.vcf.gz -t 10 -p 10 -o QUAL
    ./cget/bin/ComputePercentiles -i /mymachine/myhome/mydata/chr*.mystudy.vcf.gz -m ABE -t 10 -p 10 -d "Expected allele Balance towards Reference Allele on Heterozygous Sites" -o ABE
    
  2. For each INFO field X in step (1), you will have two files X.all_percentiles.json.gz and X.variant_percentile.vcf.gz. The first is a compressed text file with INFO field description and percentiles in JSON format. The second is a compressed VCF file with X_PCTL INFO field which stores the corresponding percentile for every variant.

  3. Index X.variant_percentile.vcf.gz using tabix and annotate your VCF files from the previous step:

    find . -maxdepth 1 -name "*.variant_percentile.vcf.gz" -exec tabix {} \;
    python add_percentiles.py -i [input vcf.gz] -p QUAL.variant_percentile.vcf.gz ABE.variant_percentile.vcf.gz ... -o [output vcf.gz]
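
Since the same command is run once per field, the per-field invocations from step (1) can be scripted; a sketch reusing the thread and percentile counts from the examples above (the file glob is a placeholder):

    ./cget/bin/ComputePercentiles -i chr*.mystudy.vcf.gz -t 10 -p 10 -o QUAL   # QUAL is run without -m, as in the first example
    for metric in AVGDP BQZ CYZ DP FIBC_I FIBC_P HWE_SLP_I HWE_SLP_P IOR NM0 NM1 NMZ STZ SVM ABE ABZ; do
        ./cget/bin/ComputePercentiles -i chr*.mystudy.vcf.gz -m "$metric" -t 10 -p 10 -o "$metric"
    done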
    
<!-- 3. Import `ALL.all_percentiles.gz` from step (2) into Mongo database: ``` python manage.py metrics -m ALL.all_percentiles.gz ``` 4. Update Mongo database with variant percentiles from `*.variant_percentiles.gz` files: ``` [will be added soon] ``` -->

Prepare coverage

To prepare coverage data for each base-pair position, you can use all of your BAM/CRAM files or only a random subset of them (e.g. 1,000) if you need to reduce computation time.

  1. For each chromosome and for each BAM/CRAM file extract depth per base-pair:

    samtools view -q 20 -F 0x0704 -uh [CRAM/BAM file] [chromosome] | samtools calmd -uAEr - [reference FASTA] | bam clipOverlap --in -.ubam --out -.ubam | samtools mpileup -f [reference FASTA] -Q 20 -t DP - | cut -f1-4 | bgzip > [chromosome].[sample].depth.gz
    

    In this step we use clipOverlap from BamUtil. A loop that applies this command to every CRAM file for a single chromosome is sketched after this list.

  2. For each chromosome, create tabix index files for [chromosome].[sample].depth.gz, e.g.:

    for f in 10.*.depth.gz; do tabix $f; done
    
  3. For each chromosome, aggregate base-pair coverage across the output files [chromosome].[sample].depth.gz from step (1):

    python base_coverage/create_coverage.py -i [files list] aggregate -c [chromosome] -s [start bp] -e [end bp]
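
A sketch that applies step (1) to every CRAM file in the current directory for chromosome 10 (the reference path and sample naming are placeholders):

    REF=/data/genomes/reference.fa
    for cram in *.cram; do
        sample=$(basename "$cram" .cram)
        samtools view -q 20 -F 0x0704 -uh "$cram" 10 \
          | samtools calmd -uAEr - "$REF" \
          | bam clipOverlap --in -.ubam --out -.ubam \
          | samtools mpileup -f "$REF" -Q 20 -t DP - \
          | cut -f1-4 | bgzip > "10.${sample}.depth.gz"
    done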
    
