
SpikeFlow

Pipeline to analyse ChIP-Rx data, i.e. ChIP-seq with reference exogenous genome spike-in normalization


<img align="left" width="55%" src="LogoSpikeFlow.png"> <br clear="left"/>

A Snakemake pipeline for the analysis of ChIP-Rx data


If you use this workflow in a paper, please give credit to the authors: see the citation section.

About

SpikeFlow is a Snakemake-based workflow designed to analyze ChIP-seq data with spike-in normalization (i.e. ChIP-Rx). Spike-in controls are used to provide a reference for normalizing sample-to-sample variation. These controls are typically DNA from a different species added to each ChIP and input sample in consistent amounts. This workflow facilitates accurate and reproducible chromatin immunoprecipitation studies by integrating state-of-the-art computational methodologies.

Key Features:

  • Diverse Normalization Techniques: This workflow implements five normalization methods, to accommodate varying experimental designs and enhance data comparability. These are: RPM, RRPM, Rx-Input, Downsampling, and Median.

  • Quality Control: Spike-in quality control ensures a proper comparison between different experimental conditions.

  • Peak Calling: The workflow incorporates three algorithms for peak identification, crucial for delineating protein-DNA interaction sites. The user can choose the type of peak to be called: narrow (macs2), broad (epic2), or very-broad (edd). Moreover, the pipeline can call spike-in normalised peaks (narrow and broad modes only).

  • BigWig Generation for Visualization: Normalised BigWig files are generated for genome-wide visualization, compatible with standard genomic browsers, thus facilitating detailed chromatin feature analyses.

  • Differential Peak Analysis: If needed, the user can enable spike-in normalized differential peak analysis. The pipeline will generate PCA and Volcano plots to facilitate the interpretation of the results.

  • Scalability: Leveraging Snakemake, the workflow ensures an automated, error-minimized pipeline adaptable to both small and large-scale genomic datasets.
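As a rough illustration of the spike-in idea behind two of the normalization methods listed above: SpikeFlow's exact formulas live in the workflow code, but a common formulation (following the original ChIP-Rx literature) is that RPM scales by reads mapped to the endogenous genome, while RRPM (reference-adjusted RPM) scales by reads mapped to the exogenous spike-in genome. The sketch below uses hypothetical read counts and is not SpikeFlow's implementation:

```python
# Illustrative sketch of spike-in scaling (not SpikeFlow's exact code).
# RPM scales by reads mapped to the endogenous genome; RRPM scales by
# reads mapped to the exogenous spike-in genome, so signal becomes
# comparable across samples with different IP efficiency, provided the
# same amount of spike-in chromatin was added to each sample.

def rpm_factor(endogenous_reads: int) -> float:
    """Reads-per-million factor based on the endogenous genome."""
    return 1e6 / endogenous_reads

def rrpm_factor(exogenous_reads: int) -> float:
    """Reference-adjusted RPM factor based on the spike-in genome."""
    return 1e6 / exogenous_reads

# Two samples with identical spike-in amounts: the one recovering fewer
# exogenous reads is scaled up more, correcting for IP efficiency.
print(rrpm_factor(2_000_000))  # 0.5
print(rrpm_factor(500_000))    # 2.0
```

The key design point is that the scaling factor depends only on the spike-in reads, so genuine global changes in the endogenous signal are preserved rather than normalized away.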

Currently supported genomes:

  • Endogenous: mm9, mm10, hg19, hg38
  • Exogenous (spike-in): dm3, dm6, mm10, mm9, hg19, hg38, hs1

Table of Contents

  1. Installation
  2. Configuration
  3. Run the workflow
  4. Output files
  5. Troubleshooting
  6. Citation

<a name="install"></a>

Installation

Step 1 - Install a Conda-based Python3 distribution

If you do not already have Conda installed on your machine/server, install a Conda-based Python3 distribution. We recommend Mambaforge, which includes Mamba, a fast and robust replacement for the Conda package manager. Mamba is preferred over the default Conda solver due to its speed and reliability.

⚠️ NOTE: Conda (or Mamba) is needed to run SpikeFlow.

Step 2 - Install Snakemake

To run this pipeline, you'll need to install Snakemake.

If you already have it installed in a conda environment, check the version with snakemake --version and ensure it is >= 7.17.0. Otherwise, follow the instructions below.

Once you have conda installed, you can create a new environment and install Snakemake with:

conda create -c bioconda -c conda-forge -n snakemake snakemake

For mamba, use the following code:

mamba create -c conda-forge -c bioconda -n snakemake snakemake

Once the environment is created, activate it with:

conda activate snakemake

or

mamba activate snakemake

For further information please check the Snakemake documentation on how to install.

Step 3 - Install Singularity (recommended)

For a fast workflow installation, it is recommended to use Singularity (compatible with version 3.9.5). This bypasses the need for Conda to set up the required environments, as these are already present in the container, which is pulled from Docker Hub when the workflow is run with the --software-deployment-method conda apptainer flag.

To install Singularity, check its website.

Step 4 - Download SpikeFlow

To obtain SpikeFlow, you have two options:

  • Download the source code as a zip file from the latest version. For example, wget https://github.com/DavideBrex/SpikeFlow/archive/refs/tags/v1.2.0.zip will download a zip file. Unzip it and move into the SpikeFlow-1.2.0 folder.

  • Clone the repository to your local machine. See the instructions here.

Step 5 - Test the workflow

Once you have obtained the latest version of SpikeFlow, the config.yaml and samples_sheet.csv files are already set up to run an installation test. You can open them to get an idea of their structure. All the files needed for the test are in the .test folder (on Ubuntu, press Ctrl+H to show hidden files and folders).

To test whether SpikeFlow is working properly, jump directly to the Run the workflow section of the documentation.

The usage of this workflow is also described in the Snakemake Workflow Catalog.

<a name="config"></a>

Configuration

1. Sample Sheet Input Requirements

Before executing the pipeline, you need to prepare a sample sheet containing detailed information about the samples to analyze. You can find an example of this file under config/samples_sheet.csv. The required format is a comma-separated values (CSV) file, consisting of eight columns and including a header row. For each sample (row), you need to specify:

| Column Name | Description |
|-------------------|------------------------------------------------------------------------------------------------------------|
| sample | Unique sample name |
| replicate | Integer indicating the number of the replicate (if no replicate, simply add 1) |
| antibody | Antibody used for the experiment (leave empty for input samples) |
| control | Unique sample name of the control (it has to be specified also in the sample column, but in another row) |
| control_replicate | Integer indicating the number of the replicate for the control sample (if no replicate, simply add 1) |
| peak_type | Can only be equal to: narrow, broad, very-broad. It indicates the type of peak calling to perform |
| fastq_1 | Path to the fastq file of the sample (if paired-end, here goes the forward mate, i.e. R1) |
| fastq_2 | ONLY for paired-end, otherwise leave empty. Path to the fastq file of the reverse mate (i.e. R2) |

For input samples, leave all columns empty except sample, replicate, and the fastq path(s).

Example 1 (single end)

|sample |replicate|antibody|control |control_replicate|peak_type|fastq_1 |fastq_2|
|------------------|---------|--------|---------------|-----------------|---------|--------------------------------------|-------|
|H3K4me3_untreated |1 |H3K4me3 |Input_untreated|1 |narrow |fastq/H3K4me3_untreated-1_L1.fastq.gz | |
|H3K4me3_untreated |1 |H3K4me3 |Input_untreated|1 |narrow |fastq/H3K4me3_untreated-1_L2.fastq.gz | |
|Input_untreated |1 | | | | |fastq/Input-untreated-1_fastq.gz | |

⚠️ NOTE: If your sample has multiple lanes, simply add a new row with the same values in all columns except fastq_1 (and fastq_2 if paired-end). In the table above, H3K4me3_untreated has two lanes.
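The note above can be sketched in a few lines (illustrative Python, not SpikeFlow's internal code): rows sharing the same sample and replicate are treated as lanes of one library, whose fastq files a pipeline would merge before alignment. The rows mirror Example 1:

```python
# Illustrative sketch: rows with the same (sample, replicate) pair are
# lanes of the same library; their fastq paths are collected together.
from collections import defaultdict

rows = [
    {"sample": "H3K4me3_untreated", "replicate": "1",
     "fastq_1": "fastq/H3K4me3_untreated-1_L1.fastq.gz"},
    {"sample": "H3K4me3_untreated", "replicate": "1",
     "fastq_1": "fastq/H3K4me3_untreated-1_L2.fastq.gz"},
    {"sample": "Input_untreated", "replicate": "1",
     "fastq_1": "fastq/Input-untreated-1_fastq.gz"},
]

lanes = defaultdict(list)
for r in rows:
    lanes[(r["sample"], r["replicate"])].append(r["fastq_1"])

# H3K4me3_untreated replicate 1 has two lane-level files to merge.
print(len(lanes[("H3K4me3_untreated", "1")]))  # 2
```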

Example 2 (paired end)

|sample |replicate|antibody|control |control_replicate|peak_type |fastq_1 |fastq_2 |
|------------------|---------|--------|---------------|-----------------|----------|--------------------------------------|--------------------------------------|
|H3K9me2_untreated |1 |H3K9me2 |Input_untreated|1 |very-broad|fastq/H3K9me2_untreated-1_R1.fastq.gz |fastq/H3K9me2_untreated-1_R2.fastq.gz |
|H3K9me2_untreated |2 |H3K9me2 |Input_untreated|1 |very-broad|fastq/H3K9me2_untreated-2_R1.fastq.gz |fastq/H3K9me2_untreated-2_R2.fastq.gz |
|H3K9me2_EGF |1 |H3K9me2 |Input_EGF |1 |very-broad|fastq/H3K9me2_EGF-1_R1.fastq.gz |fastq/H3K9me2_EGF-1_R2.fastq.gz |
|H3K9me2_EGF |2 |H3K9me2 |Input_EGF |1 |very-broad|fastq/H3K9me2_EGF-2_R1.fastq.gz |fastq/H3K9me2_EGF-2_R2.fastq.gz |
|Input_untreated |1 | | | | |fastq/Input-untreated-1_R1.fastq.gz |fastq/Input-untreated-1_R2.fastq.gz |
|Input_EGF |1 | | | | | | |
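The sample-sheet rules described above (eight named columns, allowed peak_type values, controls that must exist as samples) can be checked with a short script. This validator is hypothetical, not part of SpikeFlow; the column names and allowed values come from the README, while the embedded example sheet uses made-up fastq paths:

```python
# Hypothetical sample-sheet validator for the format described above.
# Column names and peak_type values follow the README; the checking
# logic itself is illustrative, not SpikeFlow's own validation.
import csv
import io

EXPECTED = ["sample", "replicate", "antibody", "control",
            "control_replicate", "peak_type", "fastq_1", "fastq_2"]
PEAK_TYPES = {"narrow", "broad", "very-broad", ""}  # "" for input rows

def validate_sheet(text: str) -> list[str]:
    """Return a list of error messages; empty means the sheet is valid."""
    errors = []
    rows = list(csv.DictReader(io.StringIO(text)))
    samples = {r["sample"] for r in rows}
    for i, r in enumerate(rows, start=2):  # line 1 is the header
        if list(r) != EXPECTED:
            errors.append(f"line {i}: unexpected columns")
            break
        if r["peak_type"] not in PEAK_TYPES:
            errors.append(f"line {i}: bad peak_type {r['peak_type']!r}")
        if r["control"] and r["control"] not in samples:
            errors.append(f"line {i}: control {r['control']!r} has no row")
    return errors

sheet = """sample,replicate,antibody,control,control_replicate,peak_type,fastq_1,fastq_2
H3K4me3_untreated,1,H3K4me3,Input_untreated,1,narrow,fastq/a_L1.fastq.gz,
Input_untreated,1,,,,,fastq/input.fastq.gz,
"""
print(validate_sheet(sheet))  # []
```

Running such a check before launching the workflow catches misspelled peak types or controls that were never declared as samples, which would otherwise surface only as mid-run errors.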
