Gwasvcf
Reading, querying and writing GWAS summary data in VCF format
Complete GWAS summary datasets are now abundant. A large repository of curated, harmonised and QC'd datasets is available in the IEU GWAS database. They can be queried directly via the API, through the ieugwasr R package, or through the ieugwaspy Python package. However, for faster querying in an HPC environment, it is advantageous to access the data directly rather than through cloud systems.
We developed a format for storing and harmonising GWAS summary data, known as GWAS VCF format, which can be created using gwas2vcf. All the data in the IEU GWAS database is available for download in this format. This R package provides fast and convenient functions for querying and creating GWAS summary data in GWAS VCF format (v1.0). See also pygwasvcf, a Python3 parser for querying GWAS VCF files.
This package includes:
- a wrapper around the Bioconductor VariantAnnotation package, providing functions tailored to GWAS VCF files for reading, querying, creating and writing them
- LD-related functions that use a reference panel to extract proxies, create LD matrices and perform LD clumping
- functions for harmonising a dataset against the reference genome and creating GWAS VCF files.
See also the gwasglue R package for methods to connect the VCF data to Mendelian randomization, colocalisation, fine mapping and other analyses.
Installation
You can install a binary version from our MRC IEU r-universe with:
install.packages('gwasvcf', repos = c('https://mrcieu.r-universe.dev', 'https://cloud.r-project.org'))
or install from the GitHub repo:
remotes::install_github("mrcieu/gwasvcf")
Usage
See vignettes here: https://mrcieu.github.io/gwasvcf/.
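As a rough illustration of the reading and querying functions described above, the following sketch loads the example file bundled with the package and runs two queries. It assumes that gwasvcf and VariantAnnotation are installed and that the example file is found via `system.file()` under `extdata`; the region and rsIDs are illustrative values only, and the vignettes remain the authoritative reference for function arguments.

```r
# Sketch only: assumes gwasvcf and VariantAnnotation are installed and that
# the bundled example file is available under the package's extdata folder.
library(gwasvcf)
library(VariantAnnotation)

# Locate the example GWAS VCF shipped with the package
vcffile <- system.file("extdata", "data.vcf.gz", package = "gwasvcf")

# Read the whole file into a VCF object
vcf <- readVcf(vcffile)

# Query by genomic region ("chr:start-end"); this region is illustrative
region_hits <- query_gwas(vcffile, chrompos = "1:1097291-1099437")

# Query by rsID; these IDs are illustrative
rsid_hits <- query_gwas(vcffile, rsid = c("rs3128126", "rs3121561", "rs3813193"))

# Convert a query result to a tibble for downstream analysis
region_tbl <- vcf_to_tibble(region_hits)
```

Querying the file path rather than an object already in memory avoids loading the full dataset, which matters more for complete GWAS files than for this small example.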
Citation
If you use GWAS VCF files, please cite the studies whose data you use and the following paper:
The variant call format provides efficient and robust storage of GWAS summary statistics. Matthew Lyon, Shea J Andrews, Ben Elsworth, Tom R Gaunt, Gibran Hemani, Edoardo Marcora. bioRxiv 2020.05.29.115824; doi: https://doi.org/10.1101/2020.05.29.115824
Reference datasets
Example GWAS VCF (GIANT 2010 BMI):
- http://fileserve.mrcieu.ac.uk/vcf/IEU-a-2.vcf.gz
- http://fileserve.mrcieu.ac.uk/vcf/IEU-a-2.vcf.gz.tbi
1000 Genomes reference panels for LD, one per superpopulation (used by default in OpenGWAS):
RSID index for faster querying:
1000 Genomes annotations in VCF format, harmonised against the human genome reference:
- http://fileserve.mrcieu.ac.uk/vcf/1kg_v3_nomult.vcf.gz
- http://fileserve.mrcieu.ac.uk/vcf/1kg_v3_nomult.vcf.gz.tbi
Notes
Example data
data.vcf.gz and data.vcf.gz.tbi contain the first few rows of the Speliotes 2010 BMI GWAS.
The eur.bed/bim/fam files cover the same range as data.vcf.gz and are taken from http://fileserve.mrcieu.ac.uk/ld/data_maf0.01_rs_ref.tgz
