CHEUI

Concurrent identification of m6A and m5C modifications in individual molecules from nanopore sequencing

CHEUI: Methylation (CH<sub>3</sub>) Estimation Using Ionic current <img src="https://github.com/comprna/CHEUI/blob/master/misc/CHEUI_logo.png" width="280" height="250">

About CHEUI

CHEUI (Methylation (CH<sub>3</sub>) Estimation Using Ionic current) is an RNA modification detection tool for Oxford Nanopore direct RNA sequencing data. It detects m6A and m5C in individual reads at single-nucleotide resolution from any sample (e.g. a single condition), and it can detect differential m6A or m5C between any two conditions. CHEUI uses a two-stage deep learning method to detect m6A and m5C transcriptome-wide at single-read and single-site resolution in any sequence context (i.e. without any sequence constraints).

CHEUI is open source and freely available under an Academic Public License (see copy of the license in this repository).


Dependencies

python=3.7
numpy==1.19.2
pandas==1.3.4
tensorflow-gpu==2.4.1
keras-preprocessing==1.1.2

Outline of CHEUI-solo and CHEUI-diff

<img src="https://github.com/comprna/CHEUI/blob/master/misc/pipeline_CHEUI-solo+diff_github.png" width="900" height="500">

Preprocessing data before running CHEUI:

Complete the following steps before running CHEUI:

  1. Raw signal data (fast5) should be basecalled using Guppy 4.0.11 or later (https://community.nanoporetech.com/downloads/guppy/) with the basecaller model template_rna_r9.4.1_70bps*.
  2. Basecalled sequences (fastq) should be aligned to a reference transcriptome using minimap2, and only primary, positive-strand alignments should be kept, e.g.
minimap2 -ax map-ont -k14 <transcriptome fasta> <read fastq> | samtools view -F 2324 -b | samtools sort > <sorted-bam-file>
samtools index <sorted-bam-file>
  3. Signal data should be resquiggled to aligned sequences using Nanopolish (https://nanopolish.readthedocs.io/en/latest/), ensuring that events are rescaled, e.g.
nanopolish index -s <sequencing_summary.txt> -d <fast5_folder> <read fastq>

nanopolish eventalign -t 48 \
--reads <read fastq> \
--bam <sorted-bam-file> \
--genome <transcriptome fasta> \
--scale-events --signal-index  --samples --print-read-names > nanopolish_out.txt
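
The `-F 2324` filter in step 2 is a bitmask of SAM flags to exclude. The short sketch below unpacks it using the flag bits defined in the SAM specification; the helper names `decode_flag_mask` and `is_kept` are hypothetical and only illustrate why this value keeps primary, positive-strand alignments.

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag_mask(mask):
    """List the names of the flag bits that are set in `mask`."""
    return [name for bit, name in SAM_FLAGS.items() if mask & bit]


def is_kept(flag, exclude=2324):
    """Mimic `samtools view -F <exclude>`: keep a record only if none of
    the excluded bits are set in its FLAG field."""
    return (flag & exclude) == 0


if __name__ == "__main__":
    # 2324 = 4 + 16 + 256 + 2048
    for name in decode_flag_mask(2324):
        print("excluded:", name)
```

A forward-strand primary alignment has FLAG 0 and passes the filter, whereas a reverse-strand alignment (FLAG 16) does not.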

Install CHEUI

Installation can be performed manually or using Conda (recommended).

Manual installation:

git clone https://github.com/comprna/CHEUI.git
cd CHEUI/test

Conda installation with manual CUDA installation (recommended):

conda create --name cheui python=3.7 tensorflow-gpu=2.4.1 pandas=1.3.4 -y && conda activate cheui
git clone https://github.com/comprna/CHEUI.git
cd CHEUI/test

Conda installation with integrated CUDA installation (not recommended):

conda create --name cheui python=3.7 tensorflow-gpu=2.4.1 pandas=1.3.4 conda-forge::cudatoolkit-dev -y && conda activate cheui
git clone https://github.com/comprna/CHEUI.git
cd CHEUI/test

IMPORTANT


Please follow the instructions below carefully.

  1. Note that m6A and m5C detection require different preprocessing scripts for the Nanopolish output: CHEUI_preprocess_m6A.py for m6A and CHEUI_preprocess_m5C.py for m5C.

  2. CHEUI model 1 (read-level predictions) and model 2 (site-level predictions) use different trained models for m6A and m5C, which must be specified using the --DL_model flag:

     For m6A: 
     ```../CHEUI_trained_models/CHEUI_m6A_model1.h5``` and ```../CHEUI_trained_models/CHEUI_m6A_model2.h5``` 
     For m5C: 
     ```../CHEUI_trained_models/CHEUI_m5C_model1.h5``` and ```../CHEUI_trained_models/CHEUI_m5C_model2.h5```
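
Because the modification type and the model stage both determine which file to pass to `--DL_model`, a small lookup can help avoid mixing them up in wrapper scripts. The sketch below is only a convenience illustration: the function name `dl_model_path` is hypothetical, and the paths are the relative ones listed above, so they assume the working directory is `CHEUI/test`.

```python
# Relative model paths as listed above (valid when run from CHEUI/test).
MODEL_DIR = "../CHEUI_trained_models"
MODIFICATIONS = ("m6A", "m5C")
STAGES = (1, 2)  # 1 = read-level model, 2 = site-level model


def dl_model_path(modification, stage):
    """Return the value to pass to --DL_model for a modification and stage."""
    if modification not in MODIFICATIONS:
        raise ValueError(f"unknown modification: {modification!r}")
    if stage not in STAGES:
        raise ValueError(f"unknown model stage: {stage!r}")
    return f"{MODEL_DIR}/CHEUI_{modification}_model{stage}.h5"


if __name__ == "__main__":
    for mod in MODIFICATIONS:
        for stage in STAGES:
            print(mod, stage, dl_model_path(mod, stage))
```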
    

Detect m6A and m5C modifications in one condition



CHEUI preprocessing step


This script takes the Nanopolish output and creates a file containing the signals corresponding to 9-mers centered on adenosines (A), together with their IDs.
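
To illustrate what a 9-mer centered on A looks like, the sketch below enumerates such windows in a plain nucleotide string. It only demonstrates the windowing idea and is not CHEUI's implementation, which operates on Nanopolish signal events; the function name `centered_kmers` is hypothetical.

```python
def centered_kmers(seq, center="A", k=9):
    """Return (position, k-mer) pairs for every k-mer whose middle base
    equals `center`. Positions are 0-based indices of the middle base.
    Bases too close to either end to have a full window are skipped."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that the k-mer has a middle base")
    half = k // 2
    return [
        (i, seq[i - half:i + half + 1])
        for i in range(half, len(seq) - half)
        if seq[i] == center
    ]


if __name__ == "__main__":
    for pos, kmer in centered_kmers("GGGGAAGGGGG"):
        print(pos, kmer)
```

Passing `center="C"` gives the analogous windows for the m5C case.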

../scripts/CHEUI_preprocess_m6A.py --help

required arguments:
  -i, --input_nanopolish  Nanopolish output file. Nanopolish should be run with the following flags:
                          nanopolish eventalign --reads <in.fasta> --bam
                          <in.bam> --genome <genome.fa> --print-read-names
                          --scale-events --samples > <out.txt>
  -m, --kmer_model        file containing the expected signal k-mer means
                          (available at CHEUI/kmer_models/model_kmer.csv)
  -o, --out_dir           output directory

optional arguments:
  -h, --help              show this help message and exit
  -v, --version           show program's version number and exit
  -s <str>, --suffix_name <str>
                          name to use for output files
  -n CPU, --cpu CPU       Number of CPUs (threads) to use

Example command of the preprocessing step for m6A:

python3 ../scripts/CHEUI_preprocess_m6A.py -i nanopolish_output_test.txt -m ../kmer_models/model_kmer.csv -o out_A_signals+IDs.p -n 15

The processing of the Nanopolish output for m5C is very similar:

../scripts/CHEUI_preprocess_m5C.py --help

required arguments:
  -i, --input_nanopolish  Nanopolish output file. Nanopolish should be run with the following flags:
                          nanopolish eventalign --reads <in.fasta> --bam
                          <in.bam> --genome <genome.fa> --print-read-names
                          --scale-events --samples > <out.txt>
  -m, --kmer_model        file containing the expected signal k-mer means
                          (available at CHEUI/kmer_models/model_kmer.csv)
  -o, --out_dir           output directory

optional arguments:
  -h, --help              show this help message and exit
  -v, --version           show program's version number and exit
  -s <str>, --suffix_name <str>
                          name to use for output files
  -n CPU, --cpu CPU       Number of cores to use

Example command of the preprocessing step for m5C:

python3 ../scripts/CHEUI_preprocess_m5C.py -i nanopolish_output_test.txt -m ../kmer_models/model_kmer.csv -o out_C_signals+IDs.p -n 15
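
Since the two example commands differ only in the script and the output name, a pipeline script could build them from one place. The sketch below assembles the argument list using only the flags shown in the help text above (`-i`, `-m`, `-o`, `-n`); the function name `preprocess_command` is hypothetical, and the default paths assume the working directory is `CHEUI/test`.

```python
import shlex

# Preprocessing scripts named in this README, relative to CHEUI/test.
PREPROCESS_SCRIPTS = {
    "m6A": "../scripts/CHEUI_preprocess_m6A.py",
    "m5C": "../scripts/CHEUI_preprocess_m5C.py",
}


def preprocess_command(modification, nanopolish_file, out_dir, cpus=1,
                       kmer_model="../kmer_models/model_kmer.csv"):
    """Build the argument list for the Python preprocessing step."""
    if modification not in PREPROCESS_SCRIPTS:
        raise ValueError(f"unknown modification: {modification!r}")
    return [
        "python3", PREPROCESS_SCRIPTS[modification],
        "-i", nanopolish_file,
        "-m", kmer_model,
        "-o", out_dir,
        "-n", str(cpus),
    ]


if __name__ == "__main__":
    cmd = preprocess_command("m5C", "nanopolish_output_test.txt",
                             "out_C_signals+IDs.p", cpus=15)
    # Print a shell-safe version; subprocess.run(cmd, check=True) would run it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```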

CHEUI preprocessing step -- C++ version


The C++ version is a faster way to run the CHEUI preprocessing step: it is 2-10x faster than the Python version.

Installation

cd ../scripts/preprocessing_CPP/
./build.sh

Parameters of the program

$ ./CHEUI -h
required arguments:
  -i, --input-nanopolish  Nanopolish output file. Nanopolish should be run with the following flags:
                          nanopolish eventalign --reads <in.fasta> --bam
                          <in.bam> --genome <genome.fa> --print-read-names
                          --scale-events --samples > <out.txt>
  -m, --kmer-model        file containing the expected signal k-mer means
                          (available at CHEUI/kmer_models/model_kmer.csv)
  -o, --out-dir           output directory
  --m6A/--m5C             preprocessing type

optional arguments:
  -h, --help              show this help message and exit
  -s <str>, --suffix_name <str>
                          name to use for output files
  -n CPU, --cpu CPU       Number of cores to use
  -t, --temp-dir          temp file directory (default: out dir)

Example command of the preprocessing step for m6A:

./CHEUI -i ../../test/nanopolish_output_test.txt -o ../../test/out_A_signals+IDs.p/ -m ../../kmer_models/model_kmer.csv -n 16 --m6A

Example command of the preprocessing step for m5C:

./CHEUI -i ../../test/nanopolish_output_test.txt -o ../../test/out_C_signals+IDs.p/ -m ../../kmer_models/model_kmer.csv -n 16 --m5C

For a large Nanopolish file, we recommend splitting it into smaller files, running the preprocessing step on each one, and then combining the outputs with the following command:

python3 ../scripts/combine_binary_file.py -i [output binary folder] -o [combined output file name]
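
CHEUI's repository is not shown here to include a splitting tool, so the following is a minimal sketch of one way to split the Nanopolish file. It rests on several assumptions: the file is tab-separated with a single header line, the read name is in the fourth column (as in eventalign output produced with `--print-read-names`), all rows of a read are contiguous, and each chunk should repeat the header. The function name `split_eventalign` and the chunk file names are hypothetical.

```python
import os


def split_eventalign(path, out_dir, max_lines=1000000, read_col=3):
    """Split a tab-separated eventalign file into chunks of about
    `max_lines` data rows, cutting only where the read name changes so
    that the rows of one read never span two chunks.
    Returns the list of chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    n_lines = 0
    last_read = None
    with open(path) as fh:
        header = fh.readline()  # assumed: exactly one header line
        for line in fh:
            read = line.rstrip("\n").split("\t")[read_col]
            # Start a new chunk at the first row, or once the current chunk
            # is full and the next row belongs to a different read.
            if out is None or (n_lines >= max_lines and read != last_read):
                if out is not None:
                    out.close()
                chunk_path = os.path.join(
                    out_dir, "chunk_{:04d}.txt".format(len(chunk_paths)))
                out = open(chunk_path, "w")
                out.write(header)  # assumed: each chunk needs the header
                chunk_paths.append(chunk_path)
                n_lines = 0
            out.write(line)
            n_lines += 1
            last_read = read
    if out is not None:
        out.close()
    return chunk_paths
```

Each chunk can then be preprocessed separately, and the resulting outputs combined with `combine_binary_file.py` as shown above. Cutting only at read boundaries means chunks may slightly exceed `max_lines`.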