METEORE
Automatic DNA methylation detection from nanopore tools and their consensus model

METEORE: MEthylation deTEction with nanopORE sequencing :stars:
About METEORE
METEORE provides snakemake pipelines for various tools to detect DNA methylation from Nanopore sequencing reads. Additionally, it provides new predictive models (random forest and multiple linear regression) that combine the outputs from the tools to produce a consensus prediction with higher accuracy than the individual tools.
NEW UPDATES (Mar-2021)
METEORE can now produce two per-site result files in an augmented BED format for each tool except for DeepMod (which will be updated very soon). The first output file contains the following fields:
- Reference chromosome
- Start position in chromosome
- End position in chromosome
- Read coverage
- Methylation (i.e. methylation frequency)
- Strandedness
In the second output file, we combine the methylation predictions from both strands on CpG sites by averaging the methylation frequencies and adding up the coverage. This output file contains the following fields:
- Reference chromosome
- Start position in chromosome
- End position in chromosome
- Read coverage
- Methylation (i.e. methylation frequency)
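The strand-merging rule above can be sketched in a few lines of Python. This is an illustrative sketch rather than METEORE's own code: the function name `combine_strands` and the tuple layout are hypothetical. It assumes that the minus-strand record of a CpG starts one base after its plus-strand partner (the pattern seen in the Nanopolish example output later in this README) and that the merged record keeps the plus-strand coordinates.

```python
from collections import defaultdict


def combine_strands(rows):
    """Merge per-strand CpG records into one record per CpG site.

    rows: iterable of (chrom, start, end, coverage, methylation, strand).
    Assumption: a minus-strand record lies one base downstream of its
    plus-strand partner, so it is shifted back by one to share a key.
    """
    grouped = defaultdict(list)
    for chrom, start, end, coverage, methylation, strand in rows:
        if strand == "-":
            start, end = start - 1, end - 1
        grouped[(chrom, start, end)].append((coverage, methylation))

    merged = []
    for (chrom, start, end), calls in sorted(grouped.items()):
        total_coverage = sum(cov for cov, _ in calls)  # add up the coverage
        mean_methylation = sum(freq for _, freq in calls) / len(calls)  # average the frequencies
        merged.append((chrom, start, end, total_coverage, mean_methylation))
    return merged


# Rows taken from the Nanopolish example output shown later in this README.
rows = [
    ("NC_000913.3", 3503839, 3503840, 7, 1.0, "+"),
    ("NC_000913.3", 3503840, 3503841, 7, 1.0, "-"),
    ("NC_000913.3", 3503849, 3503850, 7, 1.0, "+"),
    ("NC_000913.3", 3503850, 3503851, 7, 1.0, "-"),
]
for record in combine_strands(rows):
    print(*record, sep="\t")
```

For these four rows the sketch yields two merged sites, each with coverage 14 and methylation 1.0. A site seen on only one strand passes through with its own coverage and frequency.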
Table of Contents
- Pipeline
- Installation
- Tutorial on an example dataset
- Combined model (random forest) usage
- Combined model (multiple linear regression) usage
Pipeline
Fig 1. Pipeline for CpG methylation detection from nanopore sequencing data. All tools take the input fast5 files, detect modified bases (5-methylcytosine at CG dinucleotides in this case) in reads and predict per-site methylation frequency at genome level.
Installation
We recommend installing the software dependencies via Conda on Linux. See the Miniconda installation instructions for Linux.
Make sure you install the Miniconda Python3 distribution.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Accept the license terms during installation.
For performance and compatibility reasons, install Mamba via Conda and use it to install Snakemake for each pipeline later. See the Snakemake documentation for more details.
conda install -c conda-forge mamba
Once you have installed Conda and Mamba, you can download the Snakemake pipelines and the example datasets.
git clone https://github.com/comprna/METEORE.git
cd METEORE/
Tutorial on an example dataset
We provide an example dataset, data/example, along with a reference genome, data/ecoli_k12_mg1655.fasta, so that you can try the pipelines. The example contains 50 single-read fast5 files from the positive control dataset for E. coli generated by Simpson et al. (2017).
Run the pipelines with your own data:
- You can run the pipeline with your own dataset by replacing the example folder in the data directory with your folder containing the fast5 files. You will use the fast5 folder name to specify your target output file in the Snakemake pipeline. Simply replace example in the output file with your fast5 folder name in the command line below.
- You should place the reference genome file in .fasta format in a folder named data, and re-define the reference genome file within the Snakefile (Nanopolish, Deepsignal1, Tombo, Guppy) by replacing ecoli_k12_mg1655.fasta with your specified reference genome.
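To make the renaming rule concrete, the following Python sketch derives the Snakemake target from a fast5 folder name. It uses the Nanopolish target pattern from the Nanopolish section below. The helper names `nanopolish_target` and `snakemake_command` and the folder `data/my_run` are hypothetical, and the sketch only assembles the command line without running anything.

```python
from pathlib import PurePosixPath


def nanopolish_target(fast5_dir):
    """Return the Nanopolish per-site target for a fast5 folder under data/."""
    sample = PurePosixPath(fast5_dir).name  # "data/example" -> "example"
    return f"nanopolish_results/{sample}_nanopolish-freq-perCG.tsv"


def snakemake_command(fast5_dir, snakefile="Nanopolish"):
    """Assemble the command as a list of arguments (nothing is executed)."""
    return ["snakemake", "-s", snakefile, nanopolish_target(fast5_dir), "--cores", "all"]


# The bundled example dataset.
print(" ".join(snakemake_command("data/example")))
# A hypothetical folder holding your own fast5 files.
print(" ".join(snakemake_command("data/my_run")))
```

Only the folder name changes between the two commands, which is why the fast5 folder name doubles as the sample name in every output file.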
Nanopolish snakemake pipeline
Create and activate the Conda environment
To install packages for Nanopolish pipeline, run one of the following:
- Installing packages via Mamba
# Create an environment with Snakemake installed
mamba create -c conda-forge -c bioconda -n meteore_nanopolish_env snakemake
# Activate
conda activate meteore_nanopolish_env
# Install all required conda packages with mamba
mamba install -c bioconda nanopolish minimap2 samtools r-data.table r-dplyr r-plyr
- Installing packages using the .yml file
mamba env create -f nanopolish.yml
conda activate meteore_nanopolish_env
Run the snakemake
Before executing the workflow below, make sure you have the basecalled fastq file in the METEORE directory. Nanopolish needs to link the read ids from the fastq file with their signal-level data in the fast5 files. An example fastq file example.fastq is provided.
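A quick pre-flight check can catch a missing input before Snakemake starts. The Python sketch below is not part of METEORE. The function name `missing_inputs` is hypothetical, and it assumes the fastq file is named after the fast5 folder (as with example.fastq and data/example) and that the reference genome lives in the data folder.

```python
from pathlib import Path


def missing_inputs(root=".", sample="example", reference="ecoli_k12_mg1655.fasta"):
    """List the expected Nanopolish inputs that are absent under root.

    Assumed layout (taken from the example dataset):
      <root>/<sample>.fastq     basecalled reads
      <root>/data/<sample>/     folder of fast5 files
      <root>/data/<reference>   reference genome in .fasta format
    """
    root = Path(root)
    expected = [
        root / f"{sample}.fastq",
        root / "data" / sample,
        root / "data" / reference,
    ]
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing inputs:", ", ".join(missing))
    else:
        print("All Nanopolish inputs found.")
```

Run it from the METEORE directory. For your own data, pass your fast5 folder name as `sample` and your reference file name as `reference`.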
A Snakefile named Nanopolish contains all rules for the Snakemake workflow. Run Snakemake to create the output files:
snakemake -s Nanopolish nanopolish_results/example_nanopolish-freq-perCG.tsv --cores all
This will produce four index files (example.fastq.index, example.fastq.index.fai, example.fastq.index.gzi and example.fastq.index.readdb) and the nanopolish_results output directory, which contains all output files:
- example_nanopolish-log.tsv is the raw output after running nanopolish call-methylation.
- example_nanopolish-log-perCG.tsv contains per-read per-site data, in which each CpG group containing multiple nearby sites is split up into its constituent CpG sites.
Chr Pos Strand Log.like.ratio Read_ID
NC_000913.3 3499494 + -0.62 094dfe6b-23ed-4195-8876-805a399fade5
NC_000913.3 3499526 + -0.33 094dfe6b-23ed-4195-8876-805a399fade5
NC_000913.3 3499546 + -0.12 094dfe6b-23ed-4195-8876-805a399fade5
NC_000913.3 3499563 + 8.26 094dfe6b-23ed-4195-8876-805a399fade5
- example_nanopolish-freq-perCG.tsv stores the final per-site data in an augmented BED format where the columns represent:
  - Reference chromosome
  - Start position in chromosome
  - End position in chromosome
  - Read coverage
  - Methylation (i.e. methylation frequency)
  - Strandedness
Chr Pos_start Pos_end Coverage Methylation Strand
NC_000913.3 3503839 3503840 7 1 +
NC_000913.3 3503840 3503841 7 1 -
NC_000913.3 3503849 3503850 7 1 +
NC_000913.3 3503850 3503851 7 1 -
- example_nanopolish-freq-perCG-combStrand.tsv also stores the final per-site data in the same augmented BED format, but the methylation calls from both strands are merged into a single strand by averaging the methylation frequencies and adding up the coverage for a CpG site. Each column represents:
  - Reference chromosome
  - Start position in chromosome
  - End position in chromosome
  - Read coverage
  - Methylation (i.e. methylation frequency)
Chr Pos_start Pos_end Coverage
