Methrix
An R :package: for fast and flexible DNA methylation analysis
Install / Use
/learn @CompEpigen/MethrixREADME
Fast and efficient summarization of generic bedGraph files from Bisufite sequencing
<!-- badges: start --> <!-- badges: end -->Introduction
Bedgraph files generated by BS pipelines often come in various flavors. Critical downstream step requires aggregation of these files into methylation/coverage matrices. This step of data aggregation is done by Methrix, including many other useful downstream functions.
Package documentation
-
For a short and quick documentation, see the Bioconductor vignette
-
A exemplary complete data analysis with steps from reading in to annotation and differential methylation calling can be find in our WGBS best practices workflow.
Citation
Mayakonda A, Schönung M, Hey J, Batra RN, Feuerstein-Akgoz C, Köhler K, Lipka DB, Sotillo R, Plass C, Lutsik P, Toth R. Methrix: an R/bioconductor package for systematic aggregation and analysis of bisulfite sequencing data. Bioinformatics. 2020 Dec 21:btaa1048. doi: 10.1093/bioinformatics/btaa1048. Epub ahead of print. PMID: 33346800.
Installation
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
#Installing stable version from BioConductor
BiocManager::install("methrix")
#Installing developmental version from GitHub
BiocManager::install("CompEpigen/methrix")
Ideally one should use the newest versions of R and BioC versions. In case of the older versions (for e.g, R < 4.0), installing from BioConductor might lead to installing an older version of the package. In that case installing from GitHub might be easier since it is more merciful with regards to versions.
Features
- Faster summarization of generic bedGraph files with
data.tableback-end - Fills missing CpGs from reference genome
- Vectorized code (faster, memory expensive) and non-vectorized code (slower, minimal memory)
- Built upon
SummarizedExperimentwith custom methods for CpG extraction, sub-setting, and filtering - Easy conversion to bsseq object for downstream analysis
- Extensive one click interactive html report generation. See here for an example
- Supports serialized arrays with
HDF5ArrayandsaveHDF5SummarizedExperiment
Updates:
see here
Quick usage:
Usage is simple and involves generating a methrix object using read_bedgraphs() command which can then be passed to all downstream analyses.
<p align="center"> <img src="https://github.com/CompEpigen/methrix/blob/master/vignettes/overview.png"> </p>Below is the two step procedure to import WGBS bedGraph files.
Step-1: Extract all CpG loci from the reference genome
> hg19_cpgs = methrix::extract_CPGs(ref_genome = "BSgenome.Hsapiens.UCSC.hg19")
-Extracting CpGs
-Done. Extracted 29,891,155 CpGs from 298 contigs.
There were 50 or more warnings (use warnings() to see the first 50)
Step-2: Read in bedgraphs and generate a methrix object
The example data of the methrix package is used.
#Example bedgraph files
> bdg_files = list.files(path = system.file('extdata', package = 'methrix'), pattern = "*bedGraph\\.gz$", full.names = TRUE)
> meth = methrix::read_bedgraphs(files = bdg_files, ref_cpgs = hg19_cpgs, chr_idx = 1, start_idx = 2, M_idx = 3, U_idx = 4,
stranded = TRUE, collapse_strands = TRUE)
----------------------------
-Preset: Custom
--Missing beta and coverage info. Estimating them from M and U values
-CpGs raw: 29,891,155 (total reference CpGs)
-CpGs retained: 28,217,448(reference CpGs from contigs of interest)
-CpGs stranded: 56,434,896(reference CpGs from both strands)
----------------------------
-Processing: C1.bedGraph.gz
--CpGs missing: 56,434,219 (from known reference CpGs)
-Processing: C2.bedGraph.gz
--CpGs missing: 56,434,207 (from known reference CpGs)
-Processing: N1.bedGraph.gz
--CpGs missing: 56,434,194 (from known reference CpGs)
-Processing: N2.bedGraph.gz
--CpGs missing: 56,434,195 (from known reference CpGs)
-Finished in: 00:02:00 elapsed (00:02:23 cpu)
> meth
An object of class methrix
n_CpGs: 28,217,448
n_samples: 4
is_h5: FALSE
Reference: hg19
Methrix operations
What can be done on methrix object? Following are the key functions
#reading and writing:
read_bedgraphs() #Reads in bedgraph files into methrix
write_bedgraphs() #Writes bedGraphs from methrix object
write_bigwigs() #Writes bigWigs from methrix object
#operations
order_by_sd() #Orders methrix object by SD
region_filter() #Filters matrices by region
mask_methrix() #Masks lowly covered CpGs
coverage_filter() #Filters methrix object based on coverage
subset_methrix() #Subsets methrix object based on given conditions.
remove_uncovered() #Removes loci that are uncovered across all samples
remove_snps() #Removes loci overlapping with possible SNPs
#Visualization and QC
methrix_report() #Creates a detailed interative html summary report from methrix object
methrix_pca() #Principal Component Analysis
plot_pca() #Plots the result of PCA
plot_coverage() #Plots coverage statistics
plot_density() #Plots the density distribution of the beta values
plot_violin() #Plots the distribution of the beta values on a violin plot
plot_stats() #Plot descriptive statistics
get_stats() #Estimate descriptive statistics of the object
#Other
methrix2bsseq() #Convert methrix to bsseq object
Related Skills
node-connect
343.1kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
90.0kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
343.1kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
343.1kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
