CHEUI

Concurrent identification of m6A and m5C modifications in individual molecules from nanopore sequencing

CHEUI: Methylation (CH₃) Estimation Using Ionic current

About CHEUI

CHEUI (Methylation (CH₃) Estimation Using Ionic current) is an RNA modification detection tool for Oxford Nanopore direct RNA sequencing data. CHEUI can be used to detect m6A and m5C in individual reads at single-nucleotide resolution from any sample (e.g. a single condition), or to detect differential m6A or m5C between any two conditions. CHEUI uses a two-stage deep learning method to detect m6A and m5C transcriptome-wide at single-read and single-site resolution in any sequence context (i.e. without any sequence constraints).

CHEUI is open source and freely available under an Academic Public License (see copy of the license in this repository).

Dependencies
Outline of CHEUI-solo and CHEUI-diff
Preprocessing data before running CHEUI
Install CHEUI
IMPORTANT
Detect m6A and m5C modifications in one condition
Identify differential RNA modifications between two conditions

Dependencies

python=3.7
numpy==1.19.2
pandas==1.3.4
tensorflow-gpu==2.4.1
keras-preprocessing==1.1.2
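
To check an existing environment against the versions pinned above, a small Python sketch such as the following could be used. It is not part of CHEUI: `PINNED`, `installed_version` and `version_mismatches` are hypothetical names, and the check assumes the packages are visible through Python's package metadata (on Python 3.7 this needs the `importlib_metadata` backport). Depending on how TensorFlow was installed, its distribution may be reported as `tensorflow` rather than `tensorflow-gpu`, so a mismatch on that entry is worth checking by hand.

```python
import sys

# Versions pinned in the CHEUI dependency list.
PINNED = {
    "numpy": "1.19.2",
    "pandas": "1.3.4",
    "tensorflow-gpu": "2.4.1",
    "keras-preprocessing": "1.1.2",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:  # Python 3.7 needs: pip install importlib_metadata
        from importlib_metadata import PackageNotFoundError, version
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (pinned, installed)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 7):
        print("Python 3.7 expected, found {}.{}".format(*sys.version_info[:2]))
    for name, (wanted, found) in version_mismatches(PINNED).items():
        print("{}: expected {}, found {}".format(name, wanted, found))
```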

Outline of CHEUI-solo and CHEUI-diff

Preprocessing data before running CHEUI

Before running CHEUI:

Raw signal data (fast5) should be basecalled using Guppy 4.0.11 or later (https://community.nanoporetech.com/downloads/guppy/) with the template_rna_r9.4.1_70bps* basecalling model.
Basecalled sequences (fastq) should be aligned to a reference transcriptome using minimap2, keeping only primary, positive-strand alignments, e.g.:

minimap2 -ax map-ont -k14 <transcriptome fasta> <read fastq> | samtools view -F 2324 -b | samtools sort > <sorted-bam-file>
samtools index <sorted-bam-file>
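
The `-F 2324` option tells samtools to drop every record whose FLAG has any of the bits 4 (unmapped), 16 (reverse strand), 256 (secondary) or 2048 (supplementary) set, which is what leaves only primary, positive-strand alignments. The short Python sketch below, which is not part of CHEUI, decodes the mask and reproduces the `-F` test; the helper names and the informal bit labels are illustrative.

```python
# Bits of the SAM FLAG field (SAM format specification), with informal labels.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag_mask(mask):
    """Return the labels of the SAM flag bits set in `mask`, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAG_BITS.items()) if mask & bit]


def passes_exclude_filter(flag, exclude_mask=2324):
    """Mimic `samtools view -F exclude_mask`: keep a record only if none of the masked bits are set."""
    return (flag & exclude_mask) == 0


print(decode_flag_mask(2324))
# ['unmapped', 'reverse strand', 'secondary', 'supplementary']
```

A primary forward-strand alignment (FLAG 0) passes, while FLAG 16, 256, 2048 or 4 is filtered out.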

Signal data should then be resquiggled to the aligned sequences using nanopolish eventalign (https://nanopolish.readthedocs.io/en/latest/), making sure that events are rescaled (--scale-events), e.g.:

nanopolish index -s <sequencing_summary.txt> -d <fast5_folder> <read fastq>

nanopolish eventalign -t 48 \
--reads <read fastq> \
--bam <sorted-bam-file> \
--genome <transcriptome fasta> \
--scale-events --signal-index --samples --print-read-names > nanopolish_out.txt
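
The eventalign output written to nanopolish_out.txt is a tab-separated table with one row per event. The Python sketch below, which is not part of CHEUI, shows one way to collect the raw current samples per read and reference position. It assumes a header row containing `read_name`, `contig`, `position` and `samples` columns (as produced with --print-read-names and --samples), with the samples of each event stored as comma-separated values; `samples_per_position` is a hypothetical helper, and it keeps everything in memory, so it is only suitable for small files.

```python
import csv
from collections import defaultdict


def samples_per_position(handle):
    """Group raw current samples by (read_name, contig, position).

    Assumes tab-separated nanopolish eventalign output with a header row that
    includes `read_name` (--print-read-names) and `samples` (--samples) columns.
    """
    signals = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["read_name"], row["contig"], int(row["position"]))
        # Several events can map to the same position; their samples are
        # concatenated in file order.
        signals[key].extend(float(x) for x in row["samples"].split(","))
    return dict(signals)
```

For example, `samples_per_position(open("nanopolish_out.txt"))` returns a dictionary keyed by (read, transcript, position).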

Install CHEUI

Installation can be performed manually or using Conda (recommended).

Manual installation (the dependencies listed above must be installed separately):

git clone https://github.com/comprna/CHEUI.git
cd CHEUI/test

Conda installation with manual CUDA installation (recommended):

conda create --name cheui python=3.7 tensorflow-gpu=2.4.1 pandas=1.3.4 -y && conda activate cheui
git clone https://github.com/comprna/CHEUI.git
cd CHEUI/test

Conda installation with integrated CUDA installation (not recommended):

conda create --name cheui python=3.7 tensorflow-gpu=2.4.1 pandas=1.3.4 conda-forge::cudatoolkit-dev -y && conda activate cheui
git clone https://github.com/comprna/CHEUI.git
cd CHEUI/test

IMPORTANT

Please follow the instructions below carefully.

Note that m6A and m5C detection require different preprocessing scripts for the nanopolish output: CHEUI_preprocess_m6A.py for m6A and CHEUI_preprocess_m5C.py for m5C.

CHEUI model 1 (read-level predictions) and model 2 (site-level predictions) use different trained models for m6A and m5C, which must be specified with the --DL_model flag:

For m6A: `../CHEUI_trained_models/CHEUI_m6A_model1.h5` and `../CHEUI_trained_models/CHEUI_m6A_model2.h5`
For m5C: `../CHEUI_trained_models/CHEUI_m5C_model1.h5` and `../CHEUI_trained_models/CHEUI_m5C_model2.h5`
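
Because mixing the m6A and m5C models is an easy mistake, one option is to keep the paths in a single lookup table. The Python sketch below is a hypothetical helper, not part of CHEUI; it only encodes the four model paths listed above and returns the one to pass to --DL_model for a given modification and stage.

```python
# Hypothetical lookup table keeping each modification's two CHEUI models together,
# so read-level (model 1) and site-level (model 2) runs are never mixed between m6A and m5C.
MODELS_DIR = "../CHEUI_trained_models"

CHEUI_MODELS = {
    mod: {
        "model1": "{}/CHEUI_{}_model1.h5".format(MODELS_DIR, mod),  # read level
        "model2": "{}/CHEUI_{}_model2.h5".format(MODELS_DIR, mod),  # site level
    }
    for mod in ("m6A", "m5C")
}


def dl_model_for(modification, stage):
    """Return the --DL_model path for a modification ('m6A' or 'm5C') and stage (1 or 2)."""
    if modification not in CHEUI_MODELS:
        raise ValueError("unsupported modification: {!r}".format(modification))
    if stage not in (1, 2):
        raise ValueError("stage must be 1 (read level) or 2 (site level)")
    return CHEUI_MODELS[modification]["model{}".format(stage)]
```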

Detect m6A and m5C modifications in one condition

CHEUI preprocessing step

This script takes the nanopolish output and creates a file containing the signals corresponding to 9-mers centered on A (adenosine) positions, together with their IDs.
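
As an illustration of which positions this refers to, the Python sketch below lists the 9-mers centered on each A of a reference sequence. It is not CHEUI code: CHEUI works on the nanopolish signals rather than on sequence strings, `centered_kmers` is a hypothetical helper, and skipping bases that lack 4 nt of context on either side is an assumption of this sketch. Setting `base="C"` gives the analogous windows for the m5C case.

```python
def centered_kmers(seq, base="A", k=9):
    """Yield (0-based position, k-mer) for every `base` with a full k-mer around it.

    Bases closer than k // 2 to either end of `seq` are skipped (an assumption
    of this sketch, not documented CHEUI behaviour).
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that the k-mer has a central base")
    half = k // 2
    seq = seq.upper()
    for i in range(half, len(seq) - half):
        if seq[i] == base:
            yield i, seq[i - half:i + half + 1]


print(list(centered_kmers("GGGGACTGGGG")))
# [(4, 'GGGGACTGG')]
```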

../scripts/CHEUI_preprocess_m6A.py --help

required arguments:
  -i, --input_nanopolish  Nanopolish output file. Nanopolish should be run with the following flags:
                          nanopolish eventalign --reads <in.fasta>--bam
                          <in.bam> --genome <genome.fa> --print-read-names--
                          scale-events --samples > <out.txt>
  -m, --kmer_model        file containing the expected signal k-mer means
                          (available at CHEUI/kmer_models/model_kmer.csv)
  -o, --out_dir           output directory

optional arguments:
  -h, --help              show this help message and exit
  -v, --version           show program's version number and exit
  -s <str>, --suffix_name <str>
                          name to use for output files
  -n CPU, --cpu CPU       Number of CPUs (threads) to use

Example command of the preprocessing step for m6A:

python3 ../scripts/CHEUI_preprocess_m6A.py -i nanopolish_output_test.txt -m ../kmer_models/model_kmer.csv -o out_A_signals+IDs.p -n 15

The processing of the Nanopolish output for m5C is very similar:

../scripts/CHEUI_preprocess_m5C.py --help

required arguments:
  -i, --input_nanopolish  Nanopolish output file. Nanopolish should be run with the following flags:
                          nanopolish eventalign --reads <in.fasta>--bam
                          <in.bam> --genome <genome.fa> --print-read-names--
                          scale-events --samples > <out.txt>
  -m, --kmer_model        file containing the expected signal k-mer means
                          (available at CHEUI/kmer_models/model_kmer.csv)
  -o, --out_dir           output directory

optional arguments:
  -h, --help              show this help message and exit
  -v, --version           show program's version number and exit
  -s <str>, --suffix_name <str>
                          name to use for output files
  -n CPU, --cpu CPU       Number of cores to use

Example command of the preprocessing step for m5C:

python3 ../scripts/CHEUI_preprocess_m5C.py -i nanopolish_output_test.txt -m ../kmer_models/model_kmer.csv -o out_C_signals+IDs.p -n 15
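
When preprocessing many samples, it can help to build these command lines programmatically so that the script always matches the modification. The Python sketch below is a hypothetical wrapper, not part of CHEUI; it only reproduces the flags used in the example commands (-i, -m, -o, -n) and assumes it is run from the CHEUI/test directory.

```python
# Hypothetical wrapper: maps each modification to its preprocessing script.
PREPROCESS_SCRIPTS = {
    "m6A": "../scripts/CHEUI_preprocess_m6A.py",
    "m5C": "../scripts/CHEUI_preprocess_m5C.py",
}


def preprocess_command(modification, nanopolish_file, out_dir,
                       kmer_model="../kmer_models/model_kmer.csv", cpus=15):
    """Return the argument list for the Python preprocessing script of `modification`."""
    if modification not in PREPROCESS_SCRIPTS:
        raise ValueError("unsupported modification: {!r}".format(modification))
    return [
        "python3", PREPROCESS_SCRIPTS[modification],
        "-i", nanopolish_file,
        "-m", kmer_model,
        "-o", out_dir,
        "-n", str(cpus),
    ]
```

For instance, `subprocess.run(preprocess_command("m6A", "nanopolish_output_test.txt", "out_A_signals+IDs.p"), check=True)` would run the m6A example command and raise an error if the script fails.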

CHEUI preprocessing step -- C++ version

The C++ version is a faster implementation of the CHEUI preprocessing step: it runs 2-10 times faster than the Python version.

Installation

cd ../scripts/preprocessing_CPP/
./build.sh

Parameters of the program

$ ./CHEUI -h
required arguments:
  -i, --input-nanopolish  Nanopolish output file. Nanopolish should be run with the following flags:
                          nanopolish eventalign --reads <in.fasta>--bam
                          <in.bam> --genome <genome.fa> --print-read-names--
                          scale-events --samples > <out.txt>
  -m, --kmer-model        file containing the expected signal k-mer means
                          (available at CHEUI/kmer_models/model_kmer.csv)
  -o, --out-dir           output directory
  --m6A/--m5C             preprocessing type

optional arguments:
  -h, --help              show this help message and exit
  -s <str>, --suffix_name <str>
                          name to use for output files
  -n CPU, --cpu CPU       Number of cores to use
  -t, --temp-dir          temp file directory (default: out dir)

Example command of the preprocessing step for m6A:

./CHEUI -i ../../test/nanopolish_output_test.txt -o ../../test/out_A_signals+IDs.p/ -m ../../kmer_models/model_kmer.csv -n 16 --m6A

Example command of the preprocessing step for m5C:

./CHEUI -i ../../test/nanopolish_output_test.txt -o ../../test/out_C_signals+IDs.p/ -m ../../kmer_models/model_kmer.csv -n 16 --m5C

For large nanopolish files, we recommend splitting the file into smaller files, running the preprocessing step on each of them, and then combining the outputs with the following command:

python3 ../scripts/combine_binary_file.py -i [output binary folder] -o [combined output file name]
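
The splitting step itself is not scripted here. A minimal Python sketch is shown below; it is not part of CHEUI, and `split_eventalign` and the chunk_NNNN.txt naming are hypothetical. It assumes that all rows of a read are contiguous in the nanopolish output and that the file has a `read_name` column (from --print-read-names), and it only starts a new chunk at a read boundary so that no read is divided between two chunks. It also copies the header row into every chunk, on the assumption that the preprocessing expects one.

```python
import os


def split_eventalign(path, out_dir, rows_per_chunk=1_000_000):
    """Split a nanopolish eventalign file into chunks without splitting any read.

    Returns the list of chunk paths. A new chunk is started only once the
    current one holds at least `rows_per_chunk` rows AND the read changes.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    try:
        with open(path) as handle:
            header = handle.readline()
            read_col = header.rstrip("\n").split("\t").index("read_name")
            rows_in_chunk = 0
            previous_read = None
            for line in handle:
                read_name = line.rstrip("\n").split("\t", read_col + 1)[read_col]
                if out is None or (rows_in_chunk >= rows_per_chunk and read_name != previous_read):
                    if out is not None:
                        out.close()
                    chunk_path = os.path.join(out_dir, "chunk_{:04d}.txt".format(len(chunk_paths)))
                    out = open(chunk_path, "w")
                    out.write(header)  # every chunk keeps the header row
                    chunk_paths.append(chunk_path)
                    rows_in_chunk = 0
                out.write(line)
                rows_in_chunk += 1
                previous_read = read_name
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Each chunk can then be preprocessed separately, and the outputs combined with combine_binary_file.py.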