SkillAgentSearch skills...

Score

Tools to work with GWAS-VCF summary statistics files

Install / Use

/learn @freeseek/Score
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

score

A set of tools to work with summary statistics files following the GWAS-VCF specification. We encourage users to adopt the GWAS-VCF specification rather than the GWAS-SSF specification promoted by the GWAS catalog as the latter is affected by issues and furthermore we believe that many common uses are better addressed by using the more general VCF specification. If you are planning to publish your summary statistics, we encourage you to submit them as GWAS-VCF files or as both GWAS-VCF and as GWAS-SSF files. The latter can be generated from the former with the following command

(echo -e "chromosome\tbase_pair_location\teffect_allele\tother_allele\tbeta\tstandard_error\teffect_allele_frequency\tp_value";
bcftools query -s SM -f "%CHROM\t%POS\t%ALT\t%REF[\t%ES\t%SE\t%AF\t%LP]\n" gwas-vcf.vcf | \
  sed 's/^chr//;s/^X/23/;s/^Y/24/;s/^MT/25/;s/^M/25/;s/\t\./\tNA/g' | awk -F"\t" -v OFS="\t" '{$8=10^(-$8); print}') > gwas-ssf.tsv

If you use BCFtools/liftover in your publication, please cite the following paper from 2024

Genovese G., McCarroll S. et al. BCFtools/liftover: an accurate and comprehensive tool to convert genetic variants across genome assemblies. Bioinformatics 40, Issue 2 (2024). [PMID: 38261650] [DOI: 10.1093/bioinformatics/btae038]

If you use BCFtools/blup or BCFtools/pgs in your publication, please cite the following paper from 2023

Nowbandegani P., O’Connor L.J. et al. (2023) Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies. Nat Genet, 55, 1494–1502. [PMID: 37640881] [DOI: 10.1038/s41588-023-01487-8]

If you use BCFtools/metal in your publication, please cite the following paper from 2010

Willer, C. J., Li, Y. & Abecasis, G. R. (2010) METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191. [PMID: 20616382] [DOI: 10.1093/bioinformatics/btq340]

If you use any of the other tools in your publication, please cite this website. For any feedback or questions, contact the author

Examples of how to convert existing summary statistics to polygenic score loadings using BCFtools/munge, BCFtools/liftover, and BCFtools/pgs can be found here

<!--ts--> <!--te-->

Usage

Polygenic score tool:

Usage: bcftools +score [options] <in.vcf.gz> [<score1.gwas.vcf.gz> <score2.gwas.vcf.gz> ...]
Plugin options:
       --use <tag>               FORMAT tag to use to compute allele dosages: GP, AP, HDS, DS, GT, AS
       --summaries <dir|file>    summary statistics files from directory or list from file
       --q-score-thr LIST        comma separated list of p-value thresholds
       --counts                  include SNP counts in the output table
   -o, --output <file.tsv>       write output to a file [standard output]
       --sample-header           output header for sample ID column [SAMPLE]
   -e, --exclude <expr>          exclude sites for which the expression is true
   -f, --apply-filters <list>    require at least one of the listed FILTER strings (e.g. "PASS,.")
   -i, --include <expr>          select sites for which the expression is true
   -r, --regions <region>        restrict to comma-separated list of regions
   -R, --regions-file <file>     restrict to regions listed in a file
       --regions-overlap 0|1|2   Include if POS in the region (0), record overlaps (1), variant overlaps (2) [1]
   -t, --targets [^]<region>     restrict to comma-separated list of regions. Exclude regions with "^" prefix
   -T, --targets-file [^]<file>  restrict to regions listed in a file. Exclude regions with "^" prefix
       --targets-overlap 0|1|2   Include if POS in the region (0), record overlaps (1), variant overlaps (2) [0]
   -s, --samples [^]<list>       comma separated list of samples to include (or exclude with "^" prefix)
   -S, --samples-file [^]<file>  file of samples to include (or exclude with "^" prefix)
       --force-samples           only warn about unknown subset samples

TSV Summary Statistics Options:
   -c, --columns <preset>        column headers from preset (PLINK/PLINK2/REGENIE/SAIGE/BOLT/METAL/PGS/SSF)
   -C, --columns-file <file>     column headers from tab-delimited file
       --use-variant-id          use variant_id to match variants rather than chromosome and base_pair_location

Examples:
   bcftools +score --use DS -o scores.tsv input.bcf -c PLINK score.assoc
   bcftools +score --use DS -o scores.tsv input.bcf -C colheaders.tsv PGC3_SCZ_wave3_public.clumped.v2.tsv.gz
   bcftools +score --use GT -o scores.tsv --q-score-thr 1e-8,1e-7,1e-6,1e-5,1e-4,0.001,0.01,0.05 input.bcf -c GWAS-SSF PGS000001.txt.gz
   bcftools +score --use DS -o scores.tsv -i 'INFO>0.8 && AF>0.01 && AF<0.99' input.bcf -c GWAS-SSF PGS000001.txt.gz PGS000002.txt.gz

Munge summary statistics tool:

Usage: bcftools +munge [options] <score.gwas.ssf.tsv>
Plugin options:
   -c, --columns <preset>          column headers from preset (PLINK/PLINK2/REGENIE/SAIGE/BOLT/METAL/PGS/SSF)
   -C, --columns-file <file>       column headers from tab-delimited file
   -f, --fasta-ref <file>          reference sequence in fasta format
       --fai <file>                reference sequence .fai index
       --set-cache-size <int>      select fasta cache size in bytes
       --iffy-tag <string>         FILTER annotation tag to record whether reference allele could not be determined [IFFY]
       --mismatch-tag <string>     FILTER annotation tag to record whether reference does not match any allele [REF_MISMATCH]
   -s, --sample-name <string>      sample name for the phenotype [SAMPLE]
       --ns <float>                number of samples
       --nc <float>                number of cases
       --ne <float>                effective sample size
       --no-version                do not append version and command line to the header
   -o, --output <file>             write output to a file [no output]
   -O, --output-type u|b|v|z[0-9]  u/b: un/compressed BCF, v/z: un/compressed VCF, 0-9: compression level [v]
       --threads <int>             use multithreading with INT worker threads [0]
   -W, --write-index[=FMT]         Automatically index the output files [off]

Examples:
      bcftools +munge -c PLINK -f human_g1k_v37.fasta -Ob -o score.bcf score.assoc
      bcftools +munge -C colheaders.tsv -f human_g1k_v37.fasta -s SCZ_2022 -Ob -o PGC3_SCZ.bcf PGC3_SCZ.tsv.gz

Liftover VCFs tool:

Usage: bcftools +liftover [General Options] -- [Plugin Options]
Options:
   run "bcftools plugin" for a list of common options

Plugin options:
   -s, --src-fasta-ref <file>      source reference sequence in fasta format
   -f, --fasta-ref <file>          destination reference sequence in fasta format
       --set-cache-size <int>      select fasta cache size in bytes
   -c, --chain <file>              UCSC liftOver chain file
       --max-snp-gap <int>         maximum distance to merge contiguous blocks separated by same distance [1]
       --max-indel-inc <int>       maximum distance used to increase the size an indel during liftover [250]
       --lift-mt                   force liftover of MT/chrMT [automatically determined from contig lengths]
       --print-blocks <file>       output contiguous blocks used for the liftOver
       --no-left-align             do not attempt to left align indels after liftover
       --reject <file>             output variants that cannot be lifted over
   -O, --reject-type u|b|v|z[0-9]  u/b: un/compressed BCF, v/z: un/compressed VCF, 0-9: compression level [v]
       --write-src                 write the source contig/position/alleles for lifted variants
       --write-fail                write whether the 5' and 3' anchors have failed to lift
       --write-nw                  write the Needleman-Wunsch alignments when required
       --write-reject              write the reason variants cannot be lifted over

Options for how to update INFO/FORMAT records:
       --fix-tags                  fix Number type for INFO/AC, INFO/AF, FORMAT/GP, and FORMAT/DS tags
       --lift-end                  lift the position of the INFO/END tag instead of recomputing it
        --flip-tag <string>         INFO annotation flag to record whether alleles are flipped [FLIP]
       --swap-tag <string>         INFO annotation to record when alleles are swapped [SWAP]
       --drop-tags <list>          tags to drop when alleles are swapped [.]
       --ac-tags <list>            AC-like tags (must be Number=A,Type=Integer/Float) [INFO/AC,FMT/AC]
       --af-tags <list>            AF-like tags (must be Number=A,Type=Float) [INFO/AF,FMT/AF,FMT/AP1,FMT/AP2]
       --ds-tags <list>            DS-like tags (must be Number=A,Type=Float) [FMT/DS]
       --gt-tags <list>            tags with integers like FORMAT/GT (must be Type=Integer) [INFO/ALLELE_A,INFO/ALLELE_B]
View on GitHub
GitHub Stars131
CategoryDevelopment
Updated3d ago
Forks9

Languages

C

Security Score

95/100

Audited on Mar 27, 2026

No findings