Bravo
Deprecated. See new version: https://github.com/statgen/bravo_api - BRowse All Variants Online
Install / Use
/learn @statgen/BravoREADME
BRAVO
BRowse All Variants Online
Installation
- System Setup
- Configuration
- Launch the Application
- Data Preparation
- Load Data
- Data Backup and Restore
System Setup
BRAVO is packaged using Docker. This is the recommended way by which to deploy the application.
System dependencies can be installed using the setup.sh script. The script installs and configures the following applications on your machine:
- Docker CE
- Docker Compose
- Google Cloud SDK(Used for application data backups.)
The script will also create the default directory structure on the local host at /data.
|-- data
| |-- cache
| | |-- igv_cache
| |-- coverage
| |-- cram
| |-- genomes
| |-- import_vcf
Configuration
Application Settings
The BRAVO configuration file for project dependent settings can be found at config/default.py.
You will need to edit some of the settings to match your environment.
Apache Config
It is recommended that you run your BRAVO application behind a reverse proxy such as Apache. An example Apache configuration can be found in apache-example.conf.
In order to use Apache as a reverse proxy you will need to enable the mod_proxy and mod_proxy_http modules using the command sudo a2enmod proxy proxy_http.
You will also need to change the BRAVO configuration file setting PROXY from False to True and reload your application as described in the Launch the Application section.
Access Control
Authentication
BRAVO supports user authentication using Google's OAuth 2.0 protocol, which is optional and is disabled by default. This section describes how to enable it.
First, make sure that your BRAVO instance is served using HTTPS protocol.
Second, you need to set up a OAuth with Google. Go here and create a project. Your project will get a Client ID and a Client secret. In the list "Authorized redirect URIs" add your OAuth callback URL, which should look like https://[bravo base URL]/callback/google (e.g. https://mybravo.myinstitution.org/callback/google).
Attention! Don't expose to anyone your Client ID and Client secret, and make sure you are using HTTPS for your callback URL.
Third, follow these steps to enable authentication in BRAVO:
- Set the
GOOGLE_AUTHvariable in the BRAVO configuration file toTrue. - Assign the
GOOGLE_LOGIN_CLIENT_IDvariable in the BRAVO configuration file to yourClient IDfrom Google. - Assign the
GOOGLE_LOGIN_CLIENT_SECRETvariable in the BRAVO configuration file to yourClient secretfrom Google.
Email Whitelist
BRAVO allows whitelist specific users based on their email address. To enable whitelisting, follow these steps:
-
Set up user authentication as described in Authentication section.
-
Set the
EMAIL_WHITELISTvariable in BRAVO configuration file toTrue. -
Import list of emails from a text file (one email per line) to the Mongo database:
./manage.py whitelist -w emails.txt
Terms of Use
If your BRAVO users must to agree to any terms/conditions before browsing your data, you need to enable Terms of Use page as follows:
- Set up user authentication as described in Authentication section.
- Set the
TERMSvariable in BRAVO configuration file toTrue. - Write your terms/conditions to the
templates/terms.htmlfile.
Google Analytics
This step is optional. Go here and do whatever you have to to get your own UA-xxxxxx-xx tracking id. Put that in default.py. Or just leave the default UA-01234567-89, and you won't receive any of the tracking data.
Launch the Application
In order to start the service, run docker-compose up -d from the projects home directory.
In order to stop the application run docker-compose down from the projects home directory.
If you are making changed to configuration files or code and need to reload the changes, run docker-compose down && docker-compose up --build -d in order to stop rebuild the containers with your updates.
Data Preparation
In the data/ directory you will find tools/scripts to prepare your data for importing into Mongo database and using in BRAVO browser.
Prepare VCF
We recommend to process each chromosome separately in parallel. You can further parallelize the process by specifying chromosomal regions in steps (2) and (3).
-
Compile data preparation tools:
cd data/DataPrep/ cget install .After successful compilation, the executables will be installed in
data/DataPrep/cget/bin. -
Preprare VCF with the INFO fields NS, AN, AC, AF, Hom, Het, DP, AVGDP, AVGDP_R, AVGGQ, AVGGQ_R, DP_HIST, DP_HIST_R, GQ_HIST, and GQ_HIST_R:
./cget/bin/ComputeAlleleCountsAndHistograms -i [input bcf/vcf] -s [samples file] -r [CHR:START-END] -o [output.vcf.gz]Input BCF/VCF must have DP and GQ FORMAT fields. Input BCF/VCF can be accessed both from local and from Google bucket storages. The input samples file (one sample ID per line) and chromosomal region CHR:START-END are optional.
-
Run Variant Effect Predictor (VEP) on the VCF created in step (2):
./vep -i [input vcf.gz] --plugin LoF[,options] --assembly [GRCh37/GRCh38] --cache --offline --vcf --sift b --polyphen b --ccds --uniprot --hgvs --symbol --numbers --domains --regulatory --canonical --protein --biotype --af --af_1kg --pubmed --shift_hgvs 0 --allele_number --format vcf --force --buffer_size 100000 --compress_output gzip --no_stats -o [output vcf.gz]Specify LoF plugin configuration options as you need.
-
(Optional) Obtain CADD scores from https://cadd.gs.washington.edu and annotate VCF from step (3):
python add_cadd_scores.py -i [input vcf.gz] -c [cadd_file1.tsv.gz] [cadd_file2.tsv.gz] ... -o [output vcf.gz]CADD score files must by accompanied by the corresponding index files. If multiple CADD score files are specified, then the maximal CADD score across all files will be used.
Prepare percentiles
Percentiles must be computed separately for each INFO field.
-
For each VCF INFO field (AVGDP, BQZ, CYZ, DP, FIBC_I, FIBC_P, HWE_SLP_I, HWE_SLP_P, IOR, NM0, NM1, NMZ, QUAL, STZ, SVM, ABE, ABZ) run:
./cget/bin/ComputePercentiles -i [input vcf.gz] -m [INFO field] -t [threads] -f [min MAF] -F [max MAF] -a [allele count] -p [number of perceniles] -d [description] -o [prefix for output files]Examples:
./cget/bin/ComputePercentiles -i /mymachine/myhome/mydata/chr*.mystudy.vcf.gz -t 10 -p 10 -o QUAL ./cget/bin/ComputePercentiles -i /mymachine/myhome/mydata/chr*.mystudy.vcf.gz -m ABE -t 10 -p 10 -d "Expected allele Balance towards Reference Allele on Heterozygous Sites" -o ABE -
For each INFO field
Xin step (1), you will have two filesX.all_percentiles.json.gzandX.variant_percentile.vcf.gz. The first is a compressed text file with INFO field description and percentiles in JSON format. The second is a compressed VCF file withX_PCTLINFO field which stores the corresponding percentile for every variant. -
Index
X.variant_percentile.vcf.gzusingtabixand annotate your VCF files from previous step:find . -maxdepth 1 -name "*.variant_percentile..gz" -exec tabix {} \; python add_percentiles.py -i [input vcf.gz] - p QUAL.variant_percentile.vcf.gz ABE.variant_percentile.vcf.gz ... -o [output vcf.gz]
Prepare coverage
To prepare a coverage data for each base-pair position, you can use all your BAM/CRAM files or only a random subset of them (e.g. 1,000) if you need to reduce computational time.
-
For each chromosome and for each BAM/CRAM file extract depth per base-pair:
samtools view -q 20 -F 0x0704 -uh [CRAM/BAM file] [chromosome] | samtools calmd -uAEr - [reference FASTA] | bam clipOverlap --in -.ubam --out -.ubam | samtools mpileup -f [reference FASTA] -Q 20 -t DP - | cut -f1-4 | bgzip > [chromosome].[sample].depth.gzIn this step we use clipOverlap from BamUtil.
-
For each chromosome, create tabix index files for
[chromosome].[sample].depth.gze.g.:for f in 10.*.depth.gz; do tabix $f; done -
For each chromosome, aggregate base-pair coverage acrosss output files
[chromosome].[sample].depth.gzfrom step (1):python base_coverage/create_coverage.py -i [files list] aggregate -c [chromosome] -s [start bp] -e [end bp]
Related Skills
node-connect
352.2kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
111.1kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
352.2kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
352.2kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
