Methylpy
WGBS/NOMe-seq Data Processing & Differential Methylation Analysis
Welcome to the home page of methylpy, a python-based analysis pipeline for
- (single-cell) (whole-genome) bisulfite sequencing data
- (single-cell) NOMe-seq data
- differential methylation analysis
methylpy is available at github and PyPI.
Note
- Version 1.3 has major changes to the options related to mapping. A new aligner, minimap2, is supported starting
in this version. To accommodate this new feature, the --bowtie2 option is replaced with --aligner, which specifies
the aligner to use. The parameters of the build-reference function are modified as well.
- methylpy only considers cytosines that are in uppercase in the genome fasta file (i.e. not masked)
- methylpy was initiated by and built on the work of Matthew D. Schultz
- a beta version of the tutorial has been released!
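The uppercase-only rule in the note above can be illustrated with a short Python sketch. The function unmasked_cytosines is hypothetical and not part of methylpy; it assumes that a reverse-strand cytosine appears as an uppercase G on the forward sequence and that masked bases are written in lowercase, as in typical soft-masked genome fasta files.

```python
def unmasked_cytosines(seq):
    """Hypothetical helper: list (0-based position, strand) of unmasked cytosines.

    An uppercase 'C' is a cytosine on the forward strand; an uppercase 'G'
    is a cytosine on the reverse strand. Lowercase (soft-masked) bases are
    skipped, mirroring the rule that only uppercase cytosines are considered.
    """
    sites = []
    for i, base in enumerate(seq):
        if base == "C":
            sites.append((i, "+"))
        elif base == "G":
            sites.append((i, "-"))
    return sites


if __name__ == "__main__":
    # The lowercase g and c are ignored.
    print(unmasked_cytosines("ACgtCGcG"))
```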
What can methylpy do?
Processing bisulfite sequencing data and NOMe-seq data
- fast and flexible pipeline for both single-end and paired-end data
- all the way from raw reads (fastq) to methylation state and/or open chromatin readouts
- also supports getting readouts from alignments (BAM files)
- includes options for read trimming, quality filtering and PCR duplicate removal
- accepts compressed input and generates compressed output
- supports post-bisulfite adaptor tagging (PBAT) data
Calling differentially methylated regions (DMRs)
- DMR calling at single cytosine level
- supports comparison across 2 or more samples/groups
- conservative and accurate
- useful feature for dealing with low-coverage data by combining data of adjacent cytosines
What you want to do
- Use methylpy without installation
- Install methylpy
- Test methylpy
- Process data
- Call DMRs
- Additional functions for data processing
- Cite methylpy
Run methylpy -h to get a list of functions.
Use methylpy without installation
Methylpy can be used within a docker container with all dependencies resolved. The docker image for methylpy
can be built from the Dockerfile in the methylpy/ directory using the commands below. The image takes about 3 GB of disk space.
git clone https://github.com/yupenghe/methylpy.git
cd methylpy/
docker build -t methylpy:latest ./
Then, you can start a docker container by running
docker run -it methylpy:latest
methylpy can be run with full functionality within the container. You can mount your working directory
to the container by adding the -v option to the docker command and store methylpy output there.
docker run -it -v /YOUR/WORKING/PATH/:/output methylpy:latest
See here for details.
Install methylpy
Step 1 - Download methylpy
The easiest way to install methylpy is through PyPI by running pip install methylpy. The
command pip install --upgrade methylpy updates methylpy to the latest version.
Methylpy can also be installed through anaconda or miniconda (https://docs.conda.io/en/latest/miniconda.html).
conda create --name methylpy_env
conda activate methylpy_env
conda install -y -c bioconda -c conda-forge methylpy
Alternatively, methylpy can be installed through github: enter the directory where you would like to install methylpy and run
git clone https://github.com/yupenghe/methylpy.git
cd methylpy/
python setup.py install
If you would like to install methylpy in a path of your choice, run
python setup.py install --prefix=/USER/PATH/
Then run methylpy; if no error appears, the setup is likely successful.
See Test methylpy for a more rigorous test.
Last, processing large datasets requires plenty of free disk space for temporary files,
and the default directory for temporary files is usually too small.
You may want to set the TMPDIR environment variable to the (absolute) path of a directory
on a hard drive with sufficient space (e.g. /YOUR/TMP/DIR/). This can be done by adding the
line below to your ~/.bashrc file and then running source ~/.bashrc:
export TMPDIR=/YOUR/TMP/DIR/
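As a quick sanity check after setting TMPDIR, the Python sketch below reports which temporary directory Python's tempfile module resolves and how much free space it has. It assumes methylpy and the tools it calls place temporary files in that same directory, which depends on each tool; the function name check_tmp_space and the 100 GB default threshold are illustrative placeholders, not methylpy settings.

```python
import os
import shutil
import tempfile


def check_tmp_space(min_free_gb=100.0):
    """Hypothetical helper: (temp dir, free GB, free GB >= min_free_gb).

    tempfile.gettempdir() uses TMPDIR when it points to a usable directory.
    The 100 GB default is an arbitrary placeholder, not a methylpy requirement.
    """
    tmp_dir = tempfile.gettempdir()
    free_gb = shutil.disk_usage(tmp_dir).free / 1024**3
    return tmp_dir, free_gb, free_gb >= min_free_gb


if __name__ == "__main__":
    tmp_dir, free_gb, ok = check_tmp_space()
    print(f"TMPDIR={os.environ.get('TMPDIR', '(unset)')}")
    print(f"temporary files go to {tmp_dir}: {free_gb:.1f} GB free, sufficient={ok}")
```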
Step 2 - Install dependencies
python is required for running methylpy. Both python2 (>=2.7.9) and python3 (>=3.6.2) will work. methylpy also depends on two python modules, numpy and scipy. The easiest way to get these dependencies is to install anaconda.
In addition, some features of methylpy depend on several publicly available tools (not all of them are required if you only use a subset of methylpy functions).
- cutadapt (>=1.9) for raw read trimming
- bowtie and/or bowtie2 for alignment
- samtools (>=1.3) for alignment result manipulation. Samtools can also be installed using conda:
conda install -c bioconda samtools
- Picard (>=2.10.8) for PCR duplicate removal
- java for running Picard (its path needs to be included in the PATH environment variable)
- wigToBigWig for converting methylpy output to bigwig format
Lastly, if the paths to cutadapt, bowtie/bowtie2, samtools and wigToBigWig are included in the PATH variable,
methylpy can run these tools directly. Otherwise, the paths have to be passed to methylpy as arguments.
The path to Picard must be passed to methylpy as a parameter to run PCR duplicate removal.
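The Python sketch below checks which of the external tools listed above are reachable through PATH, using shutil.which from the standard library. The executable names (for example wigToBigWig and java) are assumptions about how the binaries are named on your system, and find_tools is a hypothetical helper; Picard is left out because its location is passed to methylpy explicitly.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
TOOLS = ("cutadapt", "bowtie", "bowtie2", "samtools", "wigToBigWig", "java")


def find_tools(names=TOOLS):
    """Hypothetical helper: map each tool name to its absolute path on PATH, or None."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        status = path if path else "NOT on PATH (pass its location to methylpy)"
        print(f"{name:12s} {status}")
```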
Optional step - Compile rms.cpp
DMR finding requires an executable methylpy/methylpy/run_rms_tests.out, which was compiled from the
C++ code methylpy/methylpy/rms.cpp. In most cases, the precompiled file can be used directly. To
test this, simply execute methylpy/methylpy/run_rms_tests.out. If the help page shows, recompiling
is not required. If an error turns up, the executable needs to be regenerated by compiling rms.cpp, and
this step requires GSL to be installed correctly. On most Linux operating
systems, the commands below will do the job
cd methylpy/methylpy/
g++ -O3 -o run_rms_tests.out rms.cpp -lgsl -lgslcblas
On Ubuntu (>=16.04), please try the commands below first.
cd methylpy/methylpy/
g++ -o run_rms_tests.out rms.cpp `gsl-config --cflags --libs`
Lastly, the compiled file run_rms_tests.out needs to be copied to the
directory where methylpy is installed. You can get the directory by running
the commands below in a python console (run python to open one):
import methylpy
print(methylpy.__file__[:methylpy.__file__.rfind("/")]+"/")
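The lookup and the copy can also be combined in a small Python helper. This is a sketch with hypothetical function names (methylpy_package_dir, install_rms_binary); it uses os.path.dirname instead of slicing at the last "/", and shutil.copy2 so the executable permission bits travel with the file.

```python
import os
import shutil


def methylpy_package_dir():
    """Hypothetical helper: directory of the installed methylpy package.

    Same result as the methylpy.__file__ slicing above, but independent of
    the path separator. methylpy is imported lazily so this module can be
    loaded even where methylpy is not installed.
    """
    import methylpy
    return os.path.dirname(os.path.abspath(methylpy.__file__))


def install_rms_binary(binary_path, package_dir):
    """Hypothetical helper: copy a compiled run_rms_tests.out into package_dir.

    shutil.copy2 also copies permission bits, so the executable bit survives.
    Returns the destination path.
    """
    dest = os.path.join(package_dir, os.path.basename(binary_path))
    shutil.copy2(binary_path, dest)
    return dest
```

For example, from the directory containing the cloned repository: install_rms_binary("methylpy/methylpy/run_rms_tests.out", methylpy_package_dir()). Writing into the package directory may require the same permissions that were used to install methylpy.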
Test methylpy
To test whether methylpy and the dependencies are installed and set up correctly, run
wget http://neomorph.salk.edu/yupeng/share/methylpy_test.tar.gz
tar -xf methylpy_test.tar.gz
cd methylpy_test/
python run_test.py
The test should take around 3 minutes, and progress will be printed on screen. After the test starts,
two files, test_output_msg.txt and test_error_msg.txt, will be generated. The former
contains more details about each test and the latter stores error messages (if any) as well as additional
information.
If a test fails, please check test_error_msg.txt for the error message. If you decide to submit an issue
regarding test failure on the methylpy github page, please include the error message from this file.
Process data
Please see the tutorial for more details.
Step 1 - Build converted genome reference
Build the bowtie/bowtie2 index for the converted genome. Run methylpy build-reference -h
to get more information. An example of building the mm10 mouse reference index:
methylpy build-reference \
--input-files mm10_bt2/mm10.fa \
--output-prefix mm10_bt2/mm10 \
--aligner bowtie2
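A "converted genome" is the in-silico bisulfite conversion commonly used by bisulfite aligners: one copy of the genome has every C replaced by T, and the other has every G replaced by A (the forward-sequence view of C to T on the reverse strand). The Python sketch below shows these transformations on a plain string. It is an assumption that they match what build-reference writes into the mm10_f and mm10_r indexes used later, and all function names are hypothetical.

```python
# Complement table used by the reverse-complement helper.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def convert_forward(seq):
    """Hypothetical: C->T on the forward strand (lowercase bases converted too)."""
    return seq.replace("C", "T").replace("c", "t")


def convert_reverse(seq):
    """Hypothetical: G->A on the forward sequence, i.e. C->T on the reverse strand."""
    return seq.replace("G", "A").replace("g", "a")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    ref = "ACGTTGCA"
    print(convert_forward(ref), convert_reverse(ref))
```

Converting G to A and then reverse-complementing gives the same string as reverse-complementing first and then converting C to T, which is why a G-to-A copy stands in for the converted reverse strand.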
Step 2 - Process bisulfite sequencing and NOMe-seq data
The single-end-pipeline function processes single-end data. Run
methylpy single-end-pipeline -h to get help information. The code below
is an example of using methylpy to process single-end bisulfite sequencing
data. For processing NOMe-seq data, please use num_upstr_bases=1 to include
one base upstream of the cytosine as part of the cytosine sequence context, which can be
used to tease out GC sites.
methylpy single-end-pipeline \
--read-files raw/mESC_R1.fastq.gz \
--sample mESC \
--forward-ref mm10_bt2/mm10_f \
--reverse-ref mm10_bt2/mm10_r \
--ref-fasta mm10_bt2/mm10.fa \
--num-procs 8 \
--remove-clonal True \
--path-to-picard="picard/"
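To illustrate why one upstream base helps with NOMe-seq, the Python sketch below classifies a cytosine by its context string. It assumes that with num_upstr_bases=1 the context is written as the upstream base, then the cytosine, then the downstream bases (for example GCGA); classify_nome_context is a hypothetical helper, and the GCH/HCG/GCG labels follow common NOMe-seq usage rather than any methylpy output.

```python
def classify_nome_context(context):
    """Hypothetical helper: classify a context of the form <upstream>C<downstream...>.

    GCH: GpC site, commonly read out as chromatin accessibility.
    HCG: CpG site, commonly read out as endogenous methylation.
    GCG: both at once, ambiguous and commonly excluded.
    HCH: neither.
    """
    context = context.upper()
    if len(context) < 3 or context[1] != "C":
        raise ValueError(f"expected <upstream>C<downstream...>, got {context!r}")
    gpc = context[0] == "G"  # G immediately upstream of the cytosine
    cpg = context[2] == "G"  # G immediately downstream of the cytosine
    if gpc and cpg:
        return "GCG"
    if gpc:
        return "GCH"
    if cpg:
        return "HCG"
    return "HCH"


if __name__ == "__main__":
    for ctx in ("GCAT", "ACGT", "GCGA", "TCCA"):
        print(ctx, classify_nome_context(ctx))
```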
An example command for processing paired-end data.
Run methylpy paired-end-pipeline -h to get more information.
methylpy paired-end-pipeline \
--read1-files raw/mESC_R1.fastq.gz \
--read2-files raw/mESC_R2.fastq.gz \
--sample mESC \
--forward-ref mm10_bt2/mm10_f \
--reverse-ref mm10_bt2/mm10_r \
--ref-fasta mm10_bt2/mm10.fa \
--num-procs 8 \
--remove-clonal True \
--path-to-picard="picard/"
If you would like methylpy to perform a binomial test to tease out sites that show
methylation above the noise level (which is mainly due to sodium bisulfite non-conversion),
please check the options --binom-test and --unmethylated-control.
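The idea behind such a test can be sketched in Python: if a fraction p of unmethylated cytosines still reads as C because of non-conversion, then for a site covered by cov reads of which mc are methylated, the p-value is P(X >= mc) with X ~ Binomial(cov, p). This is a generic one-sided test with hypothetical names; it assumes p has been estimated separately (for instance from an unmethylated control) and does not reproduce methylpy's own implementation or any multiple-testing correction. math.comb requires Python 3.8 or later.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly over the upper tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def methylated_above_noise(mc, cov, non_conversion_rate, alpha=0.01):
    """Hypothetical helper: (p-value, True if mc of cov exceeds non-conversion noise)."""
    p_value = binom_sf(mc, cov, non_conversion_rate)
    return p_value, p_value < alpha


if __name__ == "__main__":
    # 5 methylated reads out of 10 with an assumed 0.5% non-conversion rate.
    print(methylated_above_noise(5, 10, 0.005))
```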
Output format
Output files are (compressed) tab-separated text files in allc format. "allc" stands
for all cytosine (C). Each row in an allc file corresponds to one cytosine in the genome.
An allc file contains 7 mandatory columns and no header. Two additional columns may be added
with the --add-snp-info option when using single-end-pipeline, `paired-en
