Megadepth
R interface to Megadepth: BigWig and BAM related utilities
megadepth
The goal of megadepth is to provide an R interface to the command line
tool Megadepth for BigWig and BAM related utilities created by Christopher
Wilks. This R package enables fast processing of BigWig files in
downstream packages such as dasper and recount3. The Megadepth software
also provides utilities for processing BAM files and extracting coverage
information from them.
The following figure illustrates how fast megadepth is compared to other
tools for processing local and remote BigWig files.
<a href="https://github.com/LieberInstitute/megadepth/tree/devel/analysis"><img src="https://raw.githubusercontent.com/LieberInstitute/megadepth/devel/analysis/md_rt_pybw_runtime.png" width="800px" ></a>
Throughout the documentation we use a capital M to refer to the
software by Christopher Wilks and a lower case m to refer to this
R/Bioconductor package.
Installation instructions
Get the latest stable R release from
CRAN. Then install megadepth from
Bioconductor using the following code:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("megadepth")
Install the development version from GitHub with:
BiocManager::install("LieberInstitute/megadepth")
Example
In the following example, we install Megadepth with
install_megadepth(), which downloads a pre-compiled binary for your OS
(Linux, Windows or macOS). We then use it with an example BigWig file to
compute the coverage at a set of regions.
## Load the R package
library("megadepth")
## Install Megadepth's pre-compiled binary on your system
install_megadepth()
#> It seems megadepth has been installed. Use force = TRUE to reinstall or upgrade.
## Next, we locate the example BigWig and annotation files
example_bw <- system.file("tests", "test.bam.all.bw",
package = "megadepth", mustWork = TRUE
)
annotation_file <- system.file("tests", "testbw2.bed",
package = "megadepth", mustWork = TRUE
)
## We can then use megadepth to compute the coverage
bw_cov <- get_coverage(example_bw, op = "mean", annotation = annotation_file)
bw_cov
#> GRanges object with 4 ranges and 1 metadata column:
#> seqnames ranges strand | score
#> <Rle> <IRanges> <Rle> | <numeric>
#> [1] chr10 0-10 * | 0.00
#> [2] chr10 8756697-8756762 * | 15.85
#> [3] chr10 4359156-4359188 * | 3.00
#> [4] GL000219.1 168500-168620 * | 1.26
#> -------
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths
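The op argument selects which statistic is computed over each annotation interval (sum, mean, min or max, matching the --op option in Megadepth's help output). The base R sketch below is only a conceptual illustration of those four summaries, not how Megadepth computes them. summarize_interval() and the coverage values are hypothetical, and the sketch assumes the mean is taken over every base in the interval, including bases with zero coverage.

```r
## Conceptual sketch only, NOT Megadepth's implementation.
## `per_base` is a hypothetical vector with one coverage value per base
## of a single annotation interval (0 for bases without coverage).
summarize_interval <- function(per_base, op = c("sum", "mean", "min", "max")) {
    op <- match.arg(op)
    switch(op,
        sum = sum(per_base),
        ## Assumption: the mean is over all bases, zeros included
        mean = mean(per_base),
        min = min(per_base),
        max = max(per_base)
    )
}

## Made-up per-base coverage for a 6 bp interval
per_base <- c(0, 2, 4, 4, 3, 0)

## Compute all four summaries, named by statistic
sapply(c("sum", "mean", "min", "max"), summarize_interval,
    per_base = per_base
)
```

For these made-up values the sum is 13, the mean is 13 / 6, the minimum is 0 and the maximum is 4.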
Full set of utilities
Megadepth is very powerful and can do much more than compute coverage
over a set of regions. The R/Bioconductor package provides two functions
for interfacing with Megadepth: megadepth_cmd() and megadepth_shell().
To use megadepth_cmd(), you need to know the exact command syntax you
want to run and format it accordingly. If you are more comfortable with
R functions, megadepth_shell() uses cmdfun to power an R-like interface
and captures the standard output stream into R.
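As a rough illustration of the difference between the two interfaces, the hypothetical helper args_to_flags() below turns R-style named arguments into the kind of flag vector you would otherwise type for megadepth_cmd(). This is NOT cmdfun's code or megadepth_shell()'s actual behavior; in particular, mapping underscores in R argument names to dashes in flag names is an assumption of this sketch.

```r
## Hypothetical sketch, NOT cmdfun's implementation: convert named R
## arguments into Megadepth-style command-line flags.
args_to_flags <- function(...) {
    args <- list(...)
    flags <- character(0)
    for (name in names(args)) {
        value <- args[[name]]
        ## Assumption: underscores in R names become dashes in flag names
        flag <- paste0("--", gsub("_", "-", name, fixed = TRUE))
        if (isTRUE(value)) {
            ## Logical TRUE: emit the bare flag
            flags <- c(flags, flag)
        } else if (!isFALSE(value)) {
            ## Any other value: emit the flag followed by its value
            flags <- c(flags, flag, as.character(value))
        }
        ## Logical FALSE: the flag is omitted
    }
    flags
}

args_to_flags(help = TRUE)
args_to_flags(annotation = "testbw2.bed", op = "mean", no_annotation_stdout = TRUE)
```

In this sketch, help = TRUE becomes "--help", the same flag passed directly to megadepth_cmd() in the example that follows.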
To make it easier to use, megadepth also includes functions that reduce
the number of arguments, read in the output files, and convert them
into R/Bioconductor-friendly objects, such as get_coverage()
illustrated above.
We hope that you’ll find megadepth and Megadepth useful for your work.
If you are interested in how fast megadepth is, see the speed analysis
comparing it against other tools. Note that the size of the files used
and the number of genomic regions queried will affect the speed
comparisons.
## R-like interface
## that captures the standard output into R
head(megadepth_shell(help = TRUE))
#> [1] "megadepth 1.2.0" ""
#> [3] "BAM and BigWig utility." ""
#> [5] "Usage:" " megadepth <bam|bw|-> [options]"
## Command-like interface
megadepth_cmd("--help")
#> megadepth 1.2.0
#>
#> BAM and BigWig utility.
#>
#> Usage:
#> megadepth <bam|bw|-> [options]
#>
#> Options:
#> -h --help Show this screen.
#> --version Show version.
#> --threads # of threads to do: BAM decompression OR compute sums over multiple BigWigs in parallel
#> if the 2nd is intended then a TXT file listing the paths to the BigWigs to process in parallel
#> should be passed in as the main input file instead of a single BigWig file (EXPERIMENTAL).
#> --prefix String to use to prefix all output files.
#> --no-auc-stdout Force all AUC(s) to be written to <prefix>.auc.tsv rather than STDOUT
#> --no-annotation-stdout Force summarized annotation regions to be written to <prefix>.annotation.tsv rather than STDOUT
#> --no-coverage-stdout Force covered regions to be written to <prefix>.coverage.tsv rather than STDOUT
#> --keep-order Output annotation coverage in the order chromosomes appear in the BAM/BigWig file
#> The default is to output annotation coverage in the order chromosomes appear in the annotation BED file.
#> This is only applicable if --annotation is used for either BAM or BigWig input.
#>
#> BigWig Input:
#> Extract regions and their counts from a BigWig outputting BED format if a BigWig file is detected as input (exclusive of the other BAM modes):
#> Extracts all reads from the passed in BigWig and output as BED format.
#> This will also report the AUC over the annotated regions to STDOUT.
#> If only the name of the BigWig file is passed in with no other args, it will *only* report total AUC to STDOUT.
#> --annotation <bed> Only output the regions in this BED applying the argument to --op to them.
#> --op <sum[default], mean, min, max> Statistic to run on the intervals provided by --annotation
#> --sums-only Discard coordinates from output of summarized regions
#> --distance (2200[default]) Number of base pairs between end of last annotation and start of new to consider in the same BigWig query window (a form of binning) for performance. This determines the number of times the BigWig index is queried.
#> --unsorted (off[default]) There's a performance improvement *if* BED file passed to --annotation is 1) sorted by sort -k1,1 -k2,2n (default is to assume sorted and check for unsorted positions, if unsorted positions are found, will fall back to slower version)
#> --bwbuffer <1GB[default]> Size of buffer for reading BigWig files, critical to use a large value (~1GB) for remote BigWigs.
#> Default setting should be fine for most uses, but raise if very slow on a remote BigWig.
#>
#>
#> BAM Input:
#> Extract basic junction information from the BAM, including co-occurrence
#> If only the name of the BAM file is passed in with no other args, it will *only*
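The --distance option in the help output describes a form of binning: annotation intervals that lie close together are fetched with a single BigWig index query. The base R sketch below illustrates our reading of that description, together with the sort order that --unsorted refers to (sort -k1,1 -k2,2n). It is not Megadepth's C++ implementation. group_query_windows() and the example intervals are hypothetical, and the sketch assumes the gap is measured from the end of the previous interval to the start of the next one on the same chromosome.

```r
## Hypothetical sketch, NOT Megadepth's implementation: assign sorted
## BED intervals to BigWig query windows, following the --distance text.
group_query_windows <- function(bed, distance = 2200) {
    ## Sort like `sort -k1,1 -k2,2n`: chromosome (C-locale order via
    ## radix), then numeric start position
    bed <- bed[order(bed$chrom, bed$start, method = "radix"), ]
    window <- integer(nrow(bed))
    current <- 0L
    for (i in seq_len(nrow(bed))) {
        ## Stay in the current window only if on the same chromosome and
        ## the gap from the previous interval's end is <= distance
        same_window <- i > 1 &&
            bed$chrom[i] == bed$chrom[i - 1] &&
            bed$start[i] - bed$end[i - 1] <= distance
        if (!same_window) current <- current + 1L
        window[i] <- current
    }
    bed$window <- window
    bed
}

## Made-up BED-like intervals, deliberately unsorted
bed <- data.frame(
    chrom = c("chr10", "chr10", "chr10", "chr1"),
    start = c(100, 1000, 9000, 50),
    end = c(200, 1100, 9100, 80),
    stringsAsFactors = FALSE
)
group_query_windows(bed)
```

Here the two nearby chr10 intervals share a window, so this sketch would issue three index queries instead of four.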
