Ccsmeth
Detecting DNA methylation from PacBio CCS reads
Installation
ccsmeth is built on Python3 and PyTorch.
- Prerequisites:
  - Python3.* (version>=3.8)
  - pbccs (version>=6.3.0)
  - pbmm2 (version>=1.9.0) or minimap2 (version>=2.22-r1101)
  - samtools (version>=1.12)
  - CUDA Toolkit (version>=10.2, for GPU only)
- Dependencies:
  - numpy
  - statsmodels
  - scikit-learn
  - PyTorch (version >=1.2.0, <=2.1.0)
  - tqdm
  - pysam
  - pybedtools
  - pytabix
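Before installing, it can help to confirm that the prerequisites listed above are reachable. The following is a minimal sketch using only the Python standard library; it is not part of ccsmeth. The helper names (`parse_version`, `meets_minimum`, `missing_tools`) are hypothetical, and the executable name `ccs` for pbccs is an assumption about how the package exposes its binary.

```python
import re
import shutil
import sys

# Minimum Python version copied from the prerequisites list.
MIN_PYTHON = (3, 8)
# Assumed executable names: pbccs is assumed to install a binary called "ccs".
REQUIRED_TOOLS = ["ccs", "samtools"]
# Either aligner is sufficient according to the prerequisites list.
ALIGNERS = ["pbmm2", "minimap2"]


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Compare versions numerically, so that '1.12' counts as newer than '1.9'."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))

    def pad(version):
        # Pad with zeros so that '6.3' and '6.3.0' compare as equal.
        return version + (0,) * (width - len(version))

    return pad(a) >= pad(b)


def missing_tools(which=shutil.which):
    """List prerequisite executables that cannot be found on PATH."""
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if all(which(aligner) is None for aligner in ALIGNERS):
        missing.append("pbmm2 or minimap2")
    return missing


if __name__ == "__main__":
    print("Python OK:", sys.version_info[:2] >= MIN_PYTHON)
    print("Missing tools:", missing_tools() or "none")
```

Versions are compared as integer tuples rather than strings because a plain string comparison would rank `1.9` above `1.12`.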
System Requirements
ccsmeth requires only a standard computer with enough RAM to support the in-memory operations. Using a GPU can accelerate methylation calling.
Recommended Hardware: 128 GB RAM, 40 CPU processors, 4 TB disk storage, >=8 GB GPU
Recommended OS: Linux (Ubuntu 16.04, CentOS 7, etc.)
Option 1. One-step installation
Install ccsmeth, its dependencies, and other required packages in one step using conda and environment.yml:
# download ccsmeth
git clone https://github.com/PengNi/ccsmeth.git
# install tools in environment.yml
conda env create --name ccsmethenv -f /path/to/ccsmeth/environment.yml
# then activate the environment to use ccsmeth
conda activate ccsmethenv
Option 2. Step-by-step installation
(1) install ccsmeth
It is highly recommended to install ccsmeth in a virtual environment.
conda create -n ccsmethenv python=3.8
# activate
conda activate ccsmethenv
# deactivate this environment when finished (not before installing ccsmeth)
conda deactivate
# install ccsmeth after activating ccsmethenv
# install ccsmeth from github (latest version)
git clone https://github.com/PengNi/ccsmeth.git
cd ccsmeth
python setup.py install
# OR, install ccsmeth using pip
pip install ccsmeth
# OR, install ccsmeth using conda
conda install ccsmeth -c bioconda
(2) install necessary packages
Install the necessary packages (bedtools, pbccs, pbmm2 or minimap2, and samtools) in the same environment. Installing them with Bioconda is recommended:
conda install bedtools -c bioconda # required by pybedtools->ccsmeth:call_mods
conda install pbccs pbmm2 samtools -c bioconda
To run ccsmeth on a GPU machine, also install the CUDA build of PyTorch and cudatoolkit (>=10.2). If an incompatible PyTorch build is already installed, uninstall it first.
conda install pytorch==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
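After installing PyTorch, a quick check can confirm that the installed version falls inside the range given in the dependency list (>=1.2.0, <=2.1.0) and whether CUDA is visible. This is a minimal sketch, not a ccsmeth utility: `release_tuple`, `torch_supported`, and `torch_report` are hypothetical helper names, and the report simply states that PyTorch is absent if it cannot be found.

```python
import importlib.util
import re

# Version bounds copied from the dependency list (PyTorch >=1.2.0, <=2.1.0).
TORCH_MIN = (1, 2, 0)
TORCH_MAX = (2, 1, 0)


def release_tuple(version):
    """Turn a version such as '2.1.0+cu118' into (2, 1, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component (e.g. '1.13') is treated as 0.
    return tuple(int(group or 0) for group in match.groups())


def torch_supported(version):
    """Return True if `version` lies within the documented PyTorch range."""
    return TORCH_MIN <= release_tuple(version) <= TORCH_MAX


def torch_report():
    """Describe the PyTorch installation in the current environment."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch  # imported lazily so the sketch also runs without PyTorch

    status = "supported" if torch_supported(torch.__version__) else "outside the supported range"
    device = "CUDA available" if torch.cuda.is_available() else "CPU only"
    return f"PyTorch {torch.__version__}: {status}, {device}"


if __name__ == "__main__":
    print(torch_report())
```

If the report says "CPU only" on a GPU machine, the likely cause is a CPU-only PyTorch build, which is the case the uninstall advice above addresses.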
Trained models
See the models directory of the repository:
For the ccsmeth call_mods module:
- model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v3.ckpt: model of ccsmeth call_mods module for 5mCpG detection, trained using NA12898 pcr/MSssI and HG002 native (BS-seq as standard) PacBio Sequel II (kit 2.0) CCS reads. (for version >=0.5.0)
- model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v2.ckpt: model of ccsmeth call_mods module for 5mCpG detection, trained using NA12898 pcr/MSssI and HG002 native (BS-seq as standard) PacBio Sequel II (kit 2.0) CCS reads. (for version <=0.4.1)
For the aggregate mode of ccsmeth call_freqb module:
- model_ccsmeth_5mCpG_aggregate_attbigru_b11.v2p.ckpt: model of aggregate mode of ccsmeth call_freqb module for 5mCpG detection, trained using HG002 native (BS-seq as standard) PacBio Sequel II (kit 2.0) CCS reads.
Demo data
See the demo directory for sample data to try:
- hg002.chr20_demo.hifi.bam: HG002 demo hifi reads which are aligned to human genome chr20:10000000-10100000.
- chr20_demo.fa: reference sequence of human chr20:10000000-10100000.
- hg002_bsseq_chr20_demo.bed: HG002 BS-seq results of region chr20:10000000-10100000.
Quick start
Use denovo mode (first call_mods, then align_hifi):
# 1. call hifi reads with kinetics if needed
# pbccs must be on $PATH or installed in the active environment
ccsmeth call_hifi --subreads /path/to/subreads.bam \
--threads 10 \
--output /path/to/output.hifi.bam
# 2. call modifications
# output: [--output].modbam.bam
CUDA_VISIBLE_DEVICES=0 ccsmeth call_mods \
--input /path/to/output.hifi.bam \
--model_file /path/to/ccsmeth/models/model_call_mods.ckpt \
--output /path/to/output.hifi.call_mods \
--threads 10 --threads_call 2 --model_type attbigru2s \
--mode denovo
# 3. align hifi reads
# pbmm2 must be on $PATH or installed in the active environment
ccsmeth align_hifi \
--hifireads /path/to/output.hifi.call_mods.modbam.bam \
--ref /path/to/genome.fa \
--output /path/to/output.hifi.call_mods.modbam.pbmm2.bam \
--threads 10
# 4. call modification frequency
# outputs: [--output].[--call_mode].all.bed
# if the input bam file contains haplotags,
# there will be [--output].[--call_mode].[hp1/hp2].bed in outputs.
# use '--call_mode count' (default):
ccsmeth call_freqb \
--input_bam /path/to/output.hifi.call_mods.modbam.pbmm2.bam \
--ref /path/to/genome.fa \
--output /path/to/output.hifi.call_mods.modbam.pbmm2.freq \
--threads 10 --sort --bed
# OR, use '--call_mode aggregate':
# NOTE: usually more accurate than 'count' mode
ccsmeth call_freqb \
--input_bam /path/to/output.hifi.call_mods.modbam.pbmm2.bam \
--ref /path/to/genome.fa \
--output /path/to/output.hifi.call_mods.modbam.pbmm2.freq \
--threads 10 --sort --bed \
--call_mode aggregate \
--aggre_model /path/to/ccsmeth/models/model_aggregate.ckpt
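The comments in the commands above describe how output file names are derived from the `--output` prefix. As a minimal sketch of that naming pattern only (the functions `call_mods_output` and `call_freqb_outputs` are hypothetical helpers, not part of ccsmeth, and the pattern is taken from those comments rather than from the ccsmeth source):

```python
def call_mods_output(output_prefix):
    """File written by call_mods: [--output].modbam.bam"""
    return f"{output_prefix}.modbam.bam"


def call_freqb_outputs(output_prefix, call_mode="count", haplotagged=False):
    """Files written by call_freqb: [--output].[--call_mode].all.bed,
    plus hp1/hp2 files when the input BAM contains haplotags."""
    if call_mode not in ("count", "aggregate"):
        raise ValueError(f"unknown call_mode: {call_mode!r}")
    groups = ["all"]
    if haplotagged:
        groups += ["hp1", "hp2"]
    return [f"{output_prefix}.{call_mode}.{group}.bed" for group in groups]


if __name__ == "__main__":
    print(call_mods_output("/path/to/output.hifi.call_mods"))
    for path in call_freqb_outputs(
        "/path/to/output.hifi.call_mods.modbam.pbmm2.freq",
        call_mode="aggregate",
        haplotagged=True,
    ):
        print(path)
```

This explains why step 3 takes `/path/to/output.hifi.call_mods.modbam.bam` as input even though step 2 was given `/path/to/output.hifi.call_mods` as `--output`.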
OR, use align mode (first align_hifi, then call_mods):
# 1. call hifi reads with kinetics if needed
# pbccs must be on $PATH or installed in the active environment
ccsmeth call_hifi --subreads /path/to/subreads.bam \
--threads 10 \
--output /path/to/output.hifi.bam
# 2. align hifi reads
# pbmm2 must be on $PATH or installed in the active environment
ccsmeth align_hifi \
--hifireads /path/to/output.hifi.bam \
--ref /path/to/genome.fa \
--output /path/to/output.hifi.pbmm2.bam \
--threads 10
# 3. call modifications
# output: [--output].modbam.bam
CUDA_VISIBLE_DEVICES=0 ccsmeth call_mods \
--input /path/to/output.hifi.pbmm2.bam \
--ref /path/to/genome.fa \
--model_file /path/to/ccsmeth/models/model_call_mods.ckpt \
--output /path/to/output.hifi.pbmm2.call_mods \
--threads 10 --threads_call 2 --model_type attbigru2s \
--mode align
# 4. call modification frequency
# outputs: [--output].[--call_mode].all.bed
# if the input bam file contains haplotags,
# there will be [--output].[--call_mode].[hp1/hp2].bed in outputs.
# use '--call_mode count':
ccsmeth call_freqb \
--input_bam /path/to/output.hifi.pbmm2.call_mods.modbam.bam \
--ref /path/to/genome.fa \
--output /path/to/output.hifi.pbmm2.call_mods.modbam.freq \
--threads 10 --sort --bed
# OR, use '--call_mode aggregate':
# NOTE: usually more accurate than 'count' mode
ccsmeth call_freqb \
--input_bam /path/to/output.hifi.pbmm2.call_mods.modbam.bam \
--ref /path/to/genome.fa \
--output /path/to/output.hifi.pbmm2.call_mods.modbam.freq \
--threads 10 --sort --bed \
--call_mode aggregate \
--aggre_model /path/to/ccsmeth/models/model_aggregate.ckpt
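The two workflows above differ only in the order of `call_mods` and `align_hifi` and in whether `call_mods` receives `--ref`. The sketch below makes that difference explicit by building the argument lists for steps 2 to 4 without running anything. It uses only the flags shown in the commands above; the function `quick_start_commands` and its file-naming scheme are illustrative assumptions that mirror the example paths, not a ccsmeth API.

```python
import shlex


def quick_start_commands(mode, hifi_bam, ref, model_file, threads=10):
    """Return steps 2-4 of the quick start as argv lists for the given mode."""
    if mode not in ("denovo", "align"):
        raise ValueError(f"mode must be 'denovo' or 'align', got {mode!r}")
    stem = hifi_bam[: -len(".bam")] if hifi_bam.endswith(".bam") else hifi_bam
    n = str(threads)

    def call_mods(input_bam, prefix):
        cmd = ["ccsmeth", "call_mods", "--input", input_bam,
               "--model_file", model_file, "--output", prefix,
               "--threads", n, "--threads_call", "2",
               "--model_type", "attbigru2s", "--mode", mode]
        if mode == "align":
            # Only the align-mode example passes a reference to call_mods.
            cmd += ["--ref", ref]
        return cmd

    def align_hifi(input_bam, output_bam):
        return ["ccsmeth", "align_hifi", "--hifireads", input_bam,
                "--ref", ref, "--output", output_bam, "--threads", n]

    if mode == "denovo":
        # call_mods first, then align the resulting modbam file.
        prefix = f"{stem}.call_mods"
        final_bam = f"{prefix}.modbam.pbmm2.bam"
        steps = [call_mods(hifi_bam, prefix),
                 align_hifi(f"{prefix}.modbam.bam", final_bam)]
    else:
        # align first, then call modifications on the aligned reads.
        aligned = f"{stem}.pbmm2.bam"
        prefix = f"{stem}.pbmm2.call_mods"
        final_bam = f"{prefix}.modbam.bam"
        steps = [align_hifi(hifi_bam, aligned), call_mods(aligned, prefix)]

    freq_prefix = final_bam[: -len(".bam")] + ".freq"
    steps.append(["ccsmeth", "call_freqb", "--input_bam", final_bam,
                  "--ref", ref, "--output", freq_prefix,
                  "--threads", n, "--sort", "--bed"])
    return steps


if __name__ == "__main__":
    for argv in quick_start_commands(
        "align", "/path/to/output.hifi.bam", "/path/to/genome.fa",
        "/path/to/ccsmeth/models/model_call_mods.ckpt",
    ):
        print(shlex.join(argv))
```

Returning argument lists instead of shell strings keeps the sketch side-effect free; each list could be passed to `subprocess.run` once the printed commands have been reviewed.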
Usage
Run any ccsmeth subcommand with --help/-h to see its options.
[the cmds need to be updated]
1. call hifi reads
ccsmeth call_hifi -h
usage: ccsmeth call_hifi [-h] --subreads SUBREADS [--output OUTPUT]
[--path_to_ccs PATH_TO_CCS] [--threads THREADS]
[--min-passes MIN_PASSES] [--by-strand] [--hd-finder]
[--log-level LOG_LEVEL]
[--path_to_samtools PATH_TO_SAMTOOLS]
call hifi reads with kinetics from subreads.bam using CCS, save in bam/sam
format. cmd: ccsmeth call_hifi -i input.subreads.bam
optional arguments:
-h, --help show this help message and exit
--path_to_samtools PATH_TO_SAMTOOLS
full path to the executable binary samtools file. If
not specified, it is assumed that samtools is in the
PATH.
INPUT:
--subreads SUBREADS, -i SUBREADS
path to subreads.bam file as input
OUTPUT:
--output OUTPUT, -o O
