NanoDoc
RNA modification detection using Nanopore raw reads with Deep One Class classification
Introduction
This software requires raw reads from both in vitro unmodified RNA and native RNA.
A GPU environment is strongly recommended for running this software.
You can find more information about nanoDoc in this preprint:
nanoDoc: RNA modification detection using Nanopore raw reads with Deep One-Class Classification
https://www.biorxiv.org/content/10.1101/2020.09.13.295089v1
Dependency
Python
Python (>= 3.6) with the tensorflow package; tested with tensorflow 2.3, cuda/10.1, cudnn/7.6
Install
$ git clone https://github.com/uedaLabR/nanoDoc.git
$ cd nanoDoc/src
$ python3 -m venv venv3
$ source venv3/bin/activate
(venv3) $ pip install --upgrade pip
(venv3) $ pip install -r requirements.txt
(venv3) $ cd ..
(venv3) $ mkdir weight5mer
(venv3) $ mv ./weight5mer_1/* ./weight5mer
(venv3) $ mv ./weight5mer_2/* ./weight5mer
(venv3) $ rm -r ./weight5mer_*
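Since a GPU environment is strongly recommended, it can help to confirm that the freshly installed TensorFlow actually sees a GPU before starting a long run. The following is a minimal sketch, not part of nanoDoc; the function name count_visible_gpus is hypothetical, and it relies only on tf.config.list_physical_devices, which is available in tensorflow 2.3.

```python
def count_visible_gpus():
    """Return the number of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        # Imported lazily so the check fails gracefully without TensorFlow.
        import tensorflow as tf
    except ImportError:
        return None
    return len(tf.config.list_physical_devices("GPU"))


if __name__ == "__main__":
    n = count_visible_gpus()
    if n is None:
        print("TensorFlow is not installed in this environment")
    elif n == 0:
        print("TensorFlow is installed but sees no GPU; it will fall back to the CPU")
    else:
        print(f"TensorFlow sees {n} GPU(s)")
```

If this reports zero GPUs on a machine that has one, the CUDA and cuDNN versions are the usual place to look first.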
Preparation
Basecalling and signal alignment are required before running this program.
The Tombo (https://github.com/nanoporetech/tombo) resquiggle command is used for preprocessing.
Commands
formatFile: create uniformly binned parquet files from Tombo-resquiggled single fast5 files.
python ./nanoDoc.py formatFile -i fast5dir -o outputdir -r fast_genome_reference -t thread
fast5dir - directory containing fast5 files (files are searched for recursively under this directory)
outputdir - output directory
fast_genome_reference - reference fasta file
thread - number of threads (default 4). This process is slow; a larger number of threads (e.g. 10) is recommended if resources allow.
e.g.
python ./nanoDoc.py formatfile -i /mydir/testIVT/singleFast5 -o /mydir/testIVTout -r /reference/NC000913.fa -t 10
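Because formatFile searches the input directory recursively and is slow, a quick pre-flight count of the input files can catch a wrong path early. This is a minimal sketch that is not part of nanoDoc: the helper name count_fast5_files is hypothetical, and it assumes the input files carry the .fast5 extension.

```python
from pathlib import Path


def count_fast5_files(fast5dir):
    """Count *.fast5 files under fast5dir, descending into subdirectories."""
    root = Path(fast5dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{fast5dir} is not a directory")
    # rglob walks the whole tree; the .fast5 extension is an assumption.
    return sum(1 for p in root.rglob("*.fast5") if p.is_file())


if __name__ == "__main__":
    import sys

    print(count_fast5_files(sys.argv[1]))
```

A count of zero usually means the path points at the wrong level of the directory tree.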
analysis: detect modification sites, given IVT and native raw reads
python ./nanoDoc.py analysis -w 5-mer weight -p parameter_file -rraw dir_to_IVT_raw_parquet_file -traw dir_to_Native_raw_parquet_file -o result_output -chrom chromosome(default first chromosome in the reference) -s start -e end -strand strand(default "+")
5-mer weight - please download the prebuilt weights for each 5-mer
parameter_file - please download the parameter file for nanoDoc
dir_to_IVT_raw_parquet_file - directory containing the reference in vitro parquet files, created by the formatFile command
dir_to_Native_raw_parquet_file - directory containing the target native parquet files, created by the formatFile command
result_output - output file in text format
chromosome - chromosome name, which must match the reference file
start - start position (default 0)
end - end position (default end of the reference)
strand - strand, "+" or "-" (default "+")
e.g.
python ./nanoDoc.py analysis -w /weight5mer/ -p /param20.txt -r /reference/NC000913.fa -rraw /equalbinnedpq/ecrRnaIvt -traw /equalbinnedpq/ecrRnaNative -o /result/result.txt -s 4035631 -e 4037072
Please download the precalculated weights from the repository directories weight5mer_1 and weight5mer_2, and merge their contents into /weight5mer.
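When the analysis command is launched from a script, for example over several regions, it can be convenient to assemble the argument list in one place and validate it first. The sketch below is an illustration only: build_analysis_argv is a hypothetical helper, it uses just the flags that appear in the usage line and example above, and it leaves an option out whenever the documented default should apply.

```python
import subprocess


def build_analysis_argv(weight_dir, param_file, reference, ivt_dir, native_dir,
                        output, start=None, end=None, chrom=None, strand="+"):
    """Assemble the argument list for `nanoDoc.py analysis` (flags as in the README)."""
    if strand not in ("+", "-"):
        raise ValueError('strand must be "+" or "-"')
    if start is not None and end is not None and start >= end:
        raise ValueError("start must be smaller than end")
    argv = ["python", "./nanoDoc.py", "analysis",
            "-w", weight_dir, "-p", param_file, "-r", reference,
            "-rraw", ivt_dir, "-traw", native_dir, "-o", output]
    if chrom is not None:
        argv += ["-chrom", chrom]
    if start is not None:
        argv += ["-s", str(start)]
    if end is not None:
        argv += ["-e", str(end)]
    if strand != "+":  # "+" is the documented default, so it is left implicit
        argv += ["-strand", strand]
    return argv


if __name__ == "__main__":
    argv = build_analysis_argv(
        "/weight5mer/", "/param20.txt", "/reference/NC000913.fa",
        "/equalbinnedpq/ecrRnaIvt", "/equalbinnedpq/ecrRnaNative",
        "/result/result.txt", start=4035631, end=4037072)
    subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.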
result_output format:
The output file consists of the following 12 columns:
pos - genomic position
5mer - 5mer composition at the site
depth_tgt - target reads depth
depth_ref - reference reads depth
med_current - median current value (target)
mad_current - median absolute deviation (MAD) of current value (target)
med_currentR - median current value (reference)
mad_currentR - median absolute deviation (MAD) of current value (reference)
current_ratio - target to reference ratio of median current value
scoreSide1 - reference to target accumulation score
scoreSide2 - target to reference accumulation score
scoreTotal - total score
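The column list above can be turned into a small reader for the result file. The sketch below rests on several assumptions that are not stated in this README: that fields are whitespace-separated, that the file has no header line, that the columns appear in the order listed, and that a larger scoreTotal marks a stronger candidate site. All function names are hypothetical. It also includes median_and_mad to show how a median and a median absolute deviation (MAD), as reported in the med_current and mad_current columns, are computed in general.

```python
from statistics import median

# Column order as listed in this README (assumed to match the file).
COLUMNS = ["pos", "5mer", "depth_tgt", "depth_ref", "med_current", "mad_current",
           "med_currentR", "mad_currentR", "current_ratio",
           "scoreSide1", "scoreSide2", "scoreTotal"]


def median_and_mad(values):
    """Return (median, median absolute deviation) of a non-empty sequence."""
    med = median(values)
    return med, median(abs(v - med) for v in values)


def parse_result_line(line):
    """Parse one whitespace-separated result line into a dict (assumed layout)."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    for name in COLUMNS:
        if name not in ("pos", "5mer"):
            row[name] = float(row[name])
    row["pos"] = int(float(row["pos"]))
    return row


def top_sites(lines, n=10):
    """Return the n rows with the largest scoreTotal, skipping blank lines."""
    rows = [parse_result_line(line) for line in lines if line.strip()]
    return sorted(rows, key=lambda r: r["scoreTotal"], reverse=True)[:n]


if __name__ == "__main__":
    with open("/result/result.txt") as handle:
        for row in top_sites(handle, n=10):
            print(row["pos"], row["5mer"], row["scoreTotal"])
```

If the real file uses a different delimiter or starts with a header line, parse_result_line raises a ValueError rather than silently misreading columns.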
Commands for preparation (calculating weights)
make 5mer parquet for training
python ./nanoDocPrep.py make5mer -r /path/to/reference.fa -rraw /path/to/ivt/parquetfile/testIVTout -ssize 12000 -o /path/to/out/fivemerparquet
train the first CNN
python ./nanoDocPrep.py traincnn -in /path/to/out/fivemerparquet -o /path/to/out/weight
DOC training
python ./nanoDocPrep.py traindoc -in /path/to/out/fivemerparquet -wight /path/to/out/weight/xxx.hdf -o /path/to/out/fivemer/weight
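The three preparation commands depend on each other's outputs, so they have to run in this order. A small driver can make the order explicit and stop at the first failure. This is a minimal sketch under stated assumptions: prep_steps and run_steps are hypothetical helpers, the flags are copied verbatim from the commands above, and the name of the weight file written by traincnn (shown as xxx.hdf above) must be supplied by the caller.

```python
import shlex
import subprocess


def prep_steps(reference, ivt_parquet_dir, fivemer_dir, cnn_weight_dir,
               cnn_weight_file, doc_weight_dir, ssize=12000):
    """Return the three nanoDocPrep.py steps as (label, argv) pairs, in order."""
    prep = ["python", "./nanoDocPrep.py"]
    return [
        ("make5mer", prep + ["make5mer", "-r", reference, "-rraw", ivt_parquet_dir,
                             "-ssize", str(ssize), "-o", fivemer_dir]),
        ("traincnn", prep + ["traincnn", "-in", fivemer_dir, "-o", cnn_weight_dir]),
        ("traindoc", prep + ["traindoc", "-in", fivemer_dir,
                             "-wight", cnn_weight_file, "-o", doc_weight_dir]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step and, unless dry_run, execute it; stop at the first failure."""
    done = []
    for label, argv in steps:
        print(f"[{label}] " + " ".join(shlex.quote(a) for a in argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
        done.append(label)
    return done


if __name__ == "__main__":
    steps = prep_steps("/path/to/reference.fa",
                       "/path/to/ivt/parquetfile/testIVTout",
                       "/path/to/out/fivemerparquet",
                       "/path/to/out/weight",
                       "/path/to/out/weight/xxx.hdf",
                       "/path/to/out/fivemer/weight")
    run_steps(steps, dry_run=True)  # set dry_run=False to actually run
```

The dry run prints the exact commands first, which makes it easy to check the paths before committing to a long training job.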
