SpikeFlow
Pipeline to analyse ChIP-Rx data, i.e. ChIP-seq with reference exogenous genome spike-in normalization
If you use this workflow in a paper, don't forget to give credit to the authors. See the citation section.
About
SpikeFlow is a Snakemake-based workflow designed to analyze ChIP-seq data with spike-in normalization (i.e. ChIP-Rx). Spike-in controls are used to provide a reference for normalizing sample-to-sample variation. These controls are typically DNA from a different species added to each ChIP and input sample in consistent amounts. This workflow facilitates accurate and reproducible chromatin immunoprecipitation studies by integrating state-of-the-art computational methodologies.
Key Features:
- Diverse Normalization Techniques: the workflow implements five normalization methods to accommodate varying experimental designs and enhance data comparability. These are: RPM, RRPM, Rx-Input, Downsampling, and Median.
- Quality Control: spike-in quality control, to ensure a proper comparison between different experimental conditions.
- Peak Calling: the workflow incorporates three algorithms for peak identification, crucial for delineating protein-DNA interaction sites. The user can choose the type of peak to be called: narrow (macs2), broad (epic2), or very-broad (edd). Moreover, the pipeline allows calling spike-in normalised peaks (narrow and broad modes only).
- BigWig Generation for Visualization: normalised BigWig files are generated for genome-wide visualization, compatible with standard genome browsers, thus facilitating detailed chromatin feature analyses.
- Differential Peak Analysis: if needed, the user can enable spike-in normalized differential peak analysis. The pipeline generates PCA and volcano plots to facilitate the interpretation of the results.
- Scalability: leveraging Snakemake, the workflow ensures an automated, error-minimized pipeline adaptable to both small and large-scale genomic datasets.
Currently supported genomes:
- Endogenous: mm9, mm10, hg19, hg38
- Exogenous (spike-in): dm3, dm6, mm10, mm9, hg19, hg38, hs1
<a name="install"></a>
Installation
Step 1 - Install a Conda-based Python3 distribution
If you do not already have Conda installed on your machine/server, install a Conda-based Python3 distribution. We recommend Mambaforge, which includes Mamba, a fast and robust replacement for the Conda package manager. Mamba is preferred over the default Conda solver due to its speed and reliability.
⚠️ NOTE: Conda (or Mamba) is needed to run SpikeFlow.
Step 2 - Install Snakemake
To run this pipeline, you'll need to install Snakemake.
If you already have it installed in a conda environment, please check with the command snakemake --version and ensure a version >= 7.17.0.
Otherwise, please follow the instructions below.
Once you have conda installed, you can create a new environment and install Snakemake with:
conda create -c bioconda -c conda-forge -n snakemake snakemake
For mamba, use the following code:
mamba create -c conda-forge -c bioconda -n snakemake snakemake
Once the environment is created, activate it with:
conda activate snakemake
or
mamba activate snakemake
For further information, please check the Snakemake documentation on installation.
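The version requirement above can be checked from the shell. Here, `version_ge` is a hypothetical helper (not part of SpikeFlow) that compares version strings with `sort -V`:

```shell
# version_ge A B: succeeds when version A >= version B (relies on GNU sort -V)
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Compare the installed Snakemake against the 7.17.0 minimum; the fallback
# to "0" keeps the check working even when snakemake is not on the PATH yet.
current="$(snakemake --version 2>/dev/null || echo 0)"
if version_ge "$current" "7.17.0"; then
    echo "Snakemake $current is recent enough"
else
    echo "Snakemake >= 7.17.0 required (found: $current)"
fi
```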
Step 3 - Install Singularity (recommended)
For a fast workflow installation, it is recommended to use Singularity (tested with version 3.9.5). This removes the need for Conda to build the required environments, as they are already present in the container, which is pulled from Docker Hub when the workflow is run with the --software-deployment-method conda apptainer flag.
To install Singularity, check its website.
Step 4 - Download SpikeFlow
To obtain SpikeFlow, you have two options:
- Download the source code as a zip file from the latest release. For example:

  wget https://github.com/DavideBrex/SpikeFlow/archive/refs/tags/v1.2.0.zip

  will download a zip file. Unzip it and move to the SpikeFlow-1.2.0 folder.
- Clone the repository on your local machine. See here the instructions.
Step 5 - Test the workflow
Once you have obtained the latest version of SpikeFlow, the config.yaml and samples_sheet.csv files are already set up to run an installation test.
You can open them to have an idea about their structure.
All the files needed for the test are in the .test folder (on Ubuntu, press Ctrl+H to show hidden files and folders).
To test whether SpikeFlow is working properly, jump directly to the Run the workflow section of the documentation.
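As a quick smoke test before a full run, Snakemake's dry-run mode prints the planned jobs without executing anything. The flags below are a sketch, assuming you run it from the SpikeFlow folder with the snakemake environment active; the `command -v` guard makes the snippet safe to paste even before that:

```shell
# Dry run (-n): list the jobs SpikeFlow would execute; -p also prints
# the underlying shell commands. Nothing is run or written to disk.
if command -v snakemake >/dev/null 2>&1; then
    snakemake -n -p --cores 4 --software-deployment-method conda || true
    msg="dry run attempted"
else
    msg="snakemake not on PATH; activate the snakemake environment first"
fi
echo "$msg"
```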
The usage of this workflow is also described in the Snakemake Workflow Catalog.
<a name="config"></a>
Configuration
1. Sample Sheet Input Requirements
Before executing the pipeline, you need to prepare a sample sheet containing detailed information about the samples to analyze. You can find an example of this file under config/samples_sheet.csv.
The required format is a comma-separated values (CSV) file, consisting of eight columns and including a header row.
For each sample (row), you need to specify:
| Column Name       | Description                                                                                              |
|-------------------|----------------------------------------------------------------------------------------------------------|
| sample            | Unique sample name                                                                                       |
| replicate         | Integer indicating the replicate number (if there are no replicates, simply put 1)                       |
| antibody          | Antibody used for the experiment (leave empty for Input samples)                                         |
| control           | Unique sample name of the control (it must also be specified in the sample column, in another row)       |
| control_replicate | Integer indicating the replicate number of the control sample (if there are no replicates, simply put 1) |
| peak_type         | One of: narrow, broad, very-broad. It indicates the type of peak calling to perform                      |
| fastq_1           | Path to the fastq file of the sample (if paired-end, this is the forward mate, i.e. R1)                  |
| fastq_2           | ONLY for paired-end, otherwise leave empty. Path to the fastq file of the reverse mate (i.e. R2)         |
For Input samples, leave all columns empty except sample, replicate, and the fastq path(s).
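Before launching the pipeline, a quick structural check of the sample sheet can catch malformed rows early. `check_sheet` below is a hypothetical awk-based helper (not part of SpikeFlow) that verifies the eight-column layout and the allowed peak_type values:

```shell
# check_sheet FILE: verify each data row of a SpikeFlow sample sheet has
# 8 comma-separated fields and a peak_type of narrow, broad, very-broad,
# or empty (Input samples). Exits non-zero and reports offending lines.
check_sheet() {
    awk -F',' 'NR > 1 {
        if (NF != 8) { printf "line %d: expected 8 fields, got %d\n", NR, NF; bad = 1 }
        if ($6 != "" && $6 != "narrow" && $6 != "broad" && $6 != "very-broad") {
            printf "line %d: invalid peak_type \"%s\"\n", NR, $6; bad = 1
        }
    } END { exit bad }' "$1"
}

# Example: check_sheet config/samples_sheet.csv && echo "sample sheet looks OK"
```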
Example 1 (single end)
| sample            | replicate | antibody | control         | control_replicate | peak_type | fastq_1                               | fastq_2 |
|-------------------|-----------|----------|-----------------|-------------------|-----------|---------------------------------------|---------|
| H3K4me3_untreated | 1         | H3K4me3  | Input_untreated | 1                 | narrow    | fastq/H3K4me3_untreated-1_L1.fastq.gz |         |
| H3K4me3_untreated | 1         | H3K4me3  | Input_untreated | 1                 | narrow    | fastq/H3K4me3_untreated-1_L2.fastq.gz |         |
| Input_untreated   | 1         |          |                 |                   |           | fastq/Input-untreated-1_fastq.gz      |         |
⚠️ NOTE: If your sample has multiple lanes, you can simply add a new row with the same values in all the columns except for fastq_1 (and fastq_2 if paired-end). In the table above, H3K4me3_untreated has two lanes.
Example 2 (paired end)
| sample            | replicate | antibody | control         | control_replicate | peak_type  | fastq_1                               | fastq_2                               |
|-------------------|-----------|----------|-----------------|-------------------|------------|---------------------------------------|---------------------------------------|
| H3K9me2_untreated | 1         | H3K9me2  | Input_untreated | 1                 | very-broad | fastq/H3K9me2_untreated-1_R1.fastq.gz | fastq/H3K9me2_untreated-1_R2.fastq.gz |
| H3K9me2_untreated | 2         | H3K9me2  | Input_untreated | 1                 | very-broad | fastq/H3K9me2_untreated-2_R1.fastq.gz | fastq/H3K9me2_untreated-2_R2.fastq.gz |
| H3K9me2_EGF       | 1         | H3K9me2  | Input_EGF       | 1                 | very-broad | fastq/H3K9me2_EGF-1_R1.fastq.gz       | fastq/H3K9me2_EGF-1_R2.fastq.gz       |
| H3K9me2_EGF       | 2         | H3K9me2  | Input_EGF       | 1                 | very-broad | fastq/H3K9me2_EGF-2_R1.fastq.gz       | fastq/H3K9me2_EGF-2_R2.fastq.gz       |
| Input_untreated   | 1         |          |                 |                   |            | fastq/Input-untreated-1_R1.fastq.gz   | fastq/Input-untreated-1_R2.fastq.gz   |
| Input_EGF         | 1         |          |                 |                   |            |                                       |                                       |
