
MAGE

Analysis of gene expression and splicing diversity in a subset of samples from the 1000 Genomes Project, including eQTL and sQTL discovery and annotation.


<picture> <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/mccoy-lab/MAGE/main/images/MAGE_logo.large_no_bg_white_letters_w_outline.png"> <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/mccoy-lab/MAGE/main/images/MAGE_logo.large_no_bg_black_letters_w_outline.png"> <img alt="MAGE logo" src="https://raw.githubusercontent.com/mccoy-lab/MAGE/main/images/MAGE_logo.large_white_bg_black_letters.png"> </picture>

MAGE: Multi-ancestry Analysis of Gene Expression

⚠️ 2026-03-27 IMPORTANT NOTICE - SAMPLE SWAP ⚠️

A sample swap has been detected:

  • Library SRR19762530 is labeled as being derived from HG00237; it is actually derived from NA11919.
    • This library was NOT used for downstream QTL mapping. The HG00237 library used for QTL mapping was SRR19762247.
  • Library SRR19762653 is labeled as being derived from NA11919; it is actually derived from HG00237.
    • This library WAS used for downstream QTL mapping.

UPDATED STATUS 2026-04-03

  • Two new runs have been added to BioProject PRJNA851328 to correct this sample swap:
    • SRR37907959 derives from HG00237
    • SRR37907958 derives from NA11919
  • These runs replace the incorrectly labeled SRR19762530 and SRR19762653 runs.
    • Note: SRR19762530 and SRR19762653 were relabeled on SRA before being replaced, so SRR19762530 now appears on SRA as being derived from NA11919.
  • The Zenodo repository has NOT been updated yet. Please continue to use these data with caution.
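
If you already hold a local run-to-sample table built from the original labels, the corrections listed in this notice can be applied along the lines of the sketch below. The accessions and sample IDs come from this notice. The dictionary layout and the `correct_sample_labels` helper are hypothetical and do not reflect the format of the MAGE metadata files.

```python
# Sample of origin for each affected run, as stated in the notice above.
# The run -> sample dictionary layout is an assumption for illustration only.
KNOWN_ORIGIN = {
    "SRR19762530": "NA11919",  # originally labeled HG00237
    "SRR19762653": "HG00237",  # originally labeled NA11919
    "SRR37907959": "HG00237",  # replacement run added 2026-04-03
    "SRR37907958": "NA11919",  # replacement run added 2026-04-03
}


def correct_sample_labels(run_to_sample):
    """Return a copy of run_to_sample with the affected runs relabeled.

    Runs not listed in KNOWN_ORIGIN are left unchanged.
    """
    corrected = dict(run_to_sample)
    for run, sample in KNOWN_ORIGIN.items():
        if run in corrected:
            corrected[run] = sample
    return corrected


# Example: a table built from the original (swapped) labels.
original = {
    "SRR19762530": "HG00237",
    "SRR19762653": "NA11919",
    "SRR19762247": "HG00237",
}
fixed = correct_sample_labels(original)
```

Applying the helper twice gives the same result, so it is safe to run on a table that was downloaded after the runs were relabeled on SRA.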

Thank you Dr. Steven M. Heaton for making us aware of this issue.



MAGE comprises RNA-seq data from lymphoblastoid cell lines derived from 731 individuals from the 1000 Genomes Project (1KGP), representing 26 globally-distributed populations across five continental groups. These data offer a large, geographically diverse, open access resource to facilitate studies of the distribution, genetic underpinnings, and evolution of variation in human transcriptomes and include data from several ancestry groups that were poorly represented in previous studies.

Data Access

Raw reads

Newly generated RNA sequencing data for the 731 individuals (779 total libraries) is available on the Sequence Read Archive (Accession: PRJNA851328).
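
As one possible way to locate these records programmatically, the sketch below builds a query URL for the NCBI E-utilities `esearch` endpoint against the `sra` database. It is not part of the MAGE code. The `sra_search_url` helper is hypothetical, and using the bare accession as the search term is an assumption. The endpoint is expected to return internal record IDs rather than run accessions, so further lookups would be needed to get run-level metadata.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def sra_search_url(accession, retmax=1000):
    """Build an esearch URL that queries the SRA database for an accession.

    retmax=1000 is chosen here only because it exceeds the 779 libraries
    reported for MAGE; adjust as needed.
    """
    params = {
        "db": "sra",
        "term": accession,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Only the URL is constructed here; no network request is made.
url = sra_search_url("PRJNA851328")
```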

Processed data

Processed gene expression matrices and QTL mapping results (as well as a host of other downstream data) are currently available on Zenodo (MAGEv1.0 Zenodo link) as well as Dropbox (MAGEv1.0 Dropbox link).

Briefly, this repo contains the following data:

  1. Sample metadata and sequencing metrics
  2. Gene expression and splicing matrices used for e/sQTL mapping and analyses of global trends of expression/splicing diversity
  3. cis-e/sQTL mapping results, including aFC estimates for cis-eQTLs
  4. Functional annotations of cis-e/sQTLs
  5. Results of colocalization analysis between MAGE e/sQTLs and complex trait GWAS from the PAGE study
  6. Results of analyses of global trends of expression/splicing diversity
  7. Jointly-generated top genotype PCs for samples in MAGE and other resources with paired WGS/RNA-seq data (Geuvadis, GTEx, AFGR)

READMEs are provided for all data in the repo.

If you are having trouble accessing these data, please feel free to contact us to explore other options (e.g., Globus).

Variant calls

The high-coverage variant calls used for QTL mapping were previously generated by the New York Genome Center (NYGC) and are available through the 1KGP FTP site.

Code

Code used for data processing and downstream analyses is made available in the analysis_pipeline/ directory, along with READMEs describing how each script is run.

Code used to produce major figures/panels in the manuscript is made available in the figure_generation/ directory.

The MAGE manuscript

For more information about the MAGE resource as well as analyses performed using this resource, please see our paper:

Sources of gene expression variation in a globally diverse human cohort<br> Dylan J. Taylor, Surya B. Chhetri, Michael G. Tassia, Arjun Biddanda, Stephanie M. Yan, Genevieve L. Wojcik, Alexis Battle, Rajiv C. McCoy

Citing MAGE

If you use MAGE data in your own work, please cite the paper linked above.
