Mumerge
muMerge tool for combining replicate bed-regions
muMerge
muMerge is a tool for combining overlapping bed regions from multiple bed files.
Installation
To install muMerge, clone the GitHub repository:
$ git clone https://github.com/Dowell-Lab/mumerge.git
Requirements
muMerge is written in Python 3 and requires the following software and modules:
- python v3
- numpy
- matplotlib
- bedtools
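Before running muMerge, it can help to confirm these requirements are available. The check below is a hypothetical helper, not part of the muMerge repository: it only tests that the numpy and matplotlib modules can be found and that a bedtools executable is on your PATH. It does not check versions.

```python
import importlib.util
import shutil
import sys


def check_requirements() -> dict:
    """Report which of muMerge's listed requirements are available.

    Hypothetical helper (not part of muMerge). Versions are not checked.
    """
    return {
        # muMerge is written for Python 3
        "python3": sys.version_info.major == 3,
        # find_spec returns None when a module cannot be imported
        "numpy": importlib.util.find_spec("numpy") is not None,
        "matplotlib": importlib.util.find_spec("matplotlib") is not None,
        # bedtools is a command-line program, so look for it on PATH
        "bedtools": shutil.which("bedtools") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```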
Usage
Help command
For general usage, use the help command:
$ python mumerge.py -h
This prints the usage summary and a description of each option:
usage: mumerge.py [-h] [-H] [-i INPUT] [-o OUTPUT] [-w WIDTH] [-m MERGED] [-r] [-v]

Merges region calls (mu) generated by Tfit, or other peak calling functions across multiple samples and replicates.

optional arguments:
  -h, --help            show this help message and exit
  -H, --HELP            Verbose help info about the input format.
  -i INPUT, --input INPUT
                        Input file (full path) containing bedfiles, sample ID's and replicate grouping names (tab delimited). Each sample on separate line. First line header, equal to '#file<TAB>sampid<TAB>group', required. 'file' must be full path. 'sampid' can be any string. 'group' can be string or integer. See '-H' help flag for more information.
  -o OUTPUT, --output OUTPUT
                        Output file basename (full path, sans extension). WARNING: will overwrite any existing file)
  -w WIDTH, --width WIDTH
                        The ratio of a the sigma for the corresponding probabilty distribution to the bed region (half-width) --- sigma:half-bed (default: 1???). The choice for this parameter will depend on the data type as well as how bed regions were inferred from the expression data.
  -m MERGED, --merged MERGED
                        Sorted bedfile (full path) containing the regions over which to combine the sample bedfiles. If not specified, mumerge will generate one directly from the sample bedfiles.
  -r, --remove_singletons
                        Remove calls not present in more than 1 sample
  -v, --verbose         Verbose printing during processing.
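The -w option sets the sigma of each region's probability distribution relative to the region's half-width. As an illustration only, the sketch below assumes each bed region is treated as a normal distribution centered on its midpoint; this model and the helper names region_to_gaussian and gaussian_pdf are assumptions for the sketch, not muMerge's own code.

```python
import math


def region_to_gaussian(start: int, end: int, width_ratio: float):
    """Return (mu, sigma) for a BED region [start, end).

    Hypothetical helper: assumes mu is the region midpoint and
    sigma = width_ratio * half-width (the sigma:half-bed ratio of -w).
    """
    if end <= start:
        raise ValueError("BED region must have end > start")
    half_width = (end - start) / 2
    mu = start + half_width
    sigma = width_ratio * half_width
    return mu, sigma


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal probability density at position x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# A 200 bp region with a ratio of 1: sigma equals the 100 bp half-width
mu, sigma = region_to_gaussian(1000, 1200, width_ratio=1.0)
print(mu, sigma)  # 1100.0 100.0
```

Under this assumed model, a larger ratio spreads each region's probability more widely around its midpoint, which is one reason the appropriate value depends on how the regions were inferred.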
Input files
The <INPUT> file is a tab-delimited text file that lists, for each sample, the path to its BED file, a sample name, and the condition/replicate group it belongs to. The first line must be the header '#file<TAB>sampid<TAB>group', and each BED path must be a full path. In the example below, there are 4 samples in two treatment groups.
#file sampid group
/path/to/sample1.bed sample1 control
/path/to/sample2.bed sample2 control
/path/to/sample3.bed sample3 treatment
/path/to/sample4.bed sample4 treatment
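For projects with many samples, the input file can be generated rather than typed by hand. This is a minimal sketch with a hypothetical helper, format_sample_sheet; it only reproduces the layout described above (header line, tab separators, full BED paths) and is not part of muMerge.

```python
import os


def format_sample_sheet(samples):
    """Build the text of a muMerge input file.

    samples: iterable of (bed_path, sampid, group) tuples.
    Hypothetical helper: reproduces the documented layout only.
    """
    lines = ["#file\tsampid\tgroup"]  # required header line
    for bed_path, sampid, group in samples:
        # 'file' must be a full path, so convert relative paths
        fields = [os.path.abspath(bed_path), str(sampid), str(group)]
        if any("\t" in field for field in fields):
            raise ValueError(f"fields must not contain tabs: {fields!r}")
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    samples = [
        ("/path/to/sample1.bed", "sample1", "control"),
        ("/path/to/sample2.bed", "sample2", "control"),
        ("/path/to/sample3.bed", "sample3", "treatment"),
        ("/path/to/sample4.bed", "sample4", "treatment"),
    ]
    with open("sample_information.txt", "w") as fh:
        fh.write(format_sample_sheet(samples))
```

The resulting sample_information.txt can then be passed to mumerge.py with -i.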
Example run command
$ python mumerge.py -i sample_information.txt -o output_path/project_id
Output files
muMerge writes the merged regions in BED format to project_id_MUMERGE.bed. It also writes a log file (project_id.log) summarizing the run, along with intermediate files (project_id_MISCALLS.bed, project_id_BEDTOOLS_MERGE.bed).
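The intermediate project_id_BEDTOOLS_MERGE.bed file takes its name from bedtools merge, which collapses overlapping intervals into single regions. For readers unfamiliar with that step, the sketch below is a pure-Python stand-in (not muMerge's code and not bedtools itself) that merges overlapping or book-ended regions across samples, as bedtools merge does with its default settings, and counts how many distinct samples contribute to each merged region. The function name merge_regions and its dict-of-tuples input are illustrative assumptions.

```python
from collections import defaultdict


def merge_regions(samples):
    """Merge overlapping or book-ended regions across samples.

    samples: dict mapping sample ID -> list of (chrom, start, end) tuples
             (0-based, half-open coordinates, as in BED).
    Returns a sorted list of (chrom, start, end, n_samples).
    """
    by_chrom = defaultdict(list)
    for sampid, regions in samples.items():
        for chrom, start, end in regions:
            by_chrom[chrom].append((start, end, sampid))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])  # sort by start, then end
        cur_start, cur_end, first_id = intervals[0]
        support = {first_id}
        for start, end, sampid in intervals[1:]:
            if start <= cur_end:  # overlapping or book-ended: extend
                cur_end = max(cur_end, end)
                support.add(sampid)
            else:  # gap: close the current region and start a new one
                merged.append((chrom, cur_start, cur_end, len(support)))
                cur_start, cur_end, support = start, end, {sampid}
        merged.append((chrom, cur_start, cur_end, len(support)))
    return merged


samples = {
    "sample1": [("chr1", 100, 200), ("chr1", 500, 600)],
    "sample2": [("chr1", 150, 250)],
    "sample3": [("chr2", 10, 50)],
}
print(merge_regions(samples))
# [('chr1', 100, 250, 2), ('chr1', 500, 600, 1), ('chr2', 10, 50, 1)]
```

Keeping only rows whose count is greater than 1 gives regions supported by more than one sample, which is similar in spirit to the -r option, although muMerge's own filtering may differ.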
Run time
The overall run time depends on the number of input BED files and regions being merged. In a test case merging 8 samples (~30,000 regions) with 6 condition groups, muMerge took about 12 minutes on a MacBook Pro (Intel Core i9, 2.3 GHz) running macOS 10.14.6 with the following software versions:
- python version 3.6.3
- numpy version 1.19.1
- matplotlib version 3.2.2
- bedtools version 2.30.0
Citation
Please cite the following article if you use muMerge:
@article{rubin2021transcription,
title={Transcription factor enrichment analysis (TFEA) quantifies the activity of multiple transcription factors from a single experiment},
author={Rubin, Jonathan D and Stanley, Jacob T and Sigauke, Rutendo F and Levandowski, Cecilia B and Maas, Zachary L and Westfall, Jessica and Taatjes, Dylan J and Dowell, Robin D},
journal={Communications biology},
volume={4},
number={1},
pages={1--15},
year={2021},
publisher={Nature Publishing Group}
}
