CRISPResso
Software pipeline for the analysis of CRISPR-Cas9 genome editing outcomes from sequencing data
THIS IS AN OLD VERSION OF CRISPRESSO AND IT IS NOW DEPRECATED
PLEASE USE CRISPRESSO2
https://github.com/pinellolab/crispresso2
.. image:: https://github.com/lucapinello/CRISPResso/blob/master/CRISPResso.png?raw=true
CRISPResso is a software pipeline for the analysis of targeted CRISPR-Cas9 sequencing data. This algorithm allows for the quantification of both non-homologous end joining (NHEJ) and homology-directed repair (HDR) occurrences.
CRISPResso automates the following steps, summarized in the figure below:
- filters low quality reads,
- trims adapters,
- aligns the reads to a reference amplicon,
- quantifies the proportion of HDR and NHEJ outcomes,
- quantifies frameshift/in-frame mutations (if applicable) and identifies affected splice sites,
- produces a graphical report to visualize and quantify the distribution and position of indels.
.. image:: https://github.com/lucapinello/CRISPResso/blob/master/CRISPResso_pipeline.png?raw=true
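The frameshift/in-frame step listed above comes down to whether the net indel length inside a coding sequence is a multiple of three. The following is a minimal sketch of that idea in Python, assuming a read that has already been globally aligned to the reference with ``-`` marking gaps. The function names are illustrative and are not part of CRISPResso:

```python
def net_indel_length(aligned_ref, aligned_read):
    """Return inserted bases minus deleted bases for one pairwise alignment.

    Both strings must have equal length and use '-' for gaps:
    a gap in the reference marks an insertion in the read,
    a gap in the read marks a deletion.
    """
    if len(aligned_ref) != len(aligned_read):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_ref, aligned_read))
    insertions = sum(1 for ref, read in pairs if ref == "-" and read != "-")
    deletions = sum(1 for ref, read in pairs if read == "-" and ref != "-")
    return insertions - deletions


def classify_indel(aligned_ref, aligned_read):
    """Classify an aligned read as 'no indel', 'in-frame' or 'frameshift'."""
    if "-" not in aligned_ref and "-" not in aligned_read:
        # Reads with substitutions only are not distinguished in this sketch.
        return "no indel"
    net = net_indel_length(aligned_ref, aligned_read)
    # A net length that is a multiple of 3 preserves the reading frame.
    return "in-frame" if net % 3 == 0 else "frameshift"
```

This sketch ignores where the indel falls relative to the coding sequence, so it only applies to reads whose indels lie inside an exon.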
The CRISPResso suite accommodates single or pooled amplicon deep sequencing and WGS datasets, and allows the direct comparison of individual experiments. Four additional utilities are provided:
- CRISPRessoPooled: a tool for the analysis of pooled amplicon experiments
- CRISPRessoWGS: a tool for the analysis of WGS data or prealigned reads in .bam format
- CRISPRessoCompare: a tool for the comparison of two CRISPResso analyses, useful for example to compare treated and untreated samples or to compare different experimental conditions
- CRISPRessoPooledCompare: a tool to compare experiments involving several regions analyzed by either CRISPRessoPooled or CRISPRessoWGS
TRY IT ONLINE!
If you don't like command line tools you can also use CRISPResso online here: http://crispresso.rocks
Installation and Requirements
To install the command line version of CRISPResso, some dependencies must be installed before running the setup:
- Python 2.7 Anaconda: http://continuum.io/downloads
- Java: http://java.com/download
- C compiler / make. For Mac with OSX 10.7 or greater, open the Terminal app and run the command 'make', which will trigger the installation of the OSX developer tools. Windows systems are not officially supported, although CRISPResso may work with Cygwin (https://www.cygwin.com/).
After checking that the required software is installed you can install CRISPResso from the official Python repository following these steps:
- Open a terminal window
- Type the command:
.. code:: bash
pip install CRISPResso --no-use-wheel --verbose
- Close the terminal window
Alternatively, if you want to install the package without pip:
- Download the setup file: https://github.com/lucapinello/CRISPResso/archive/master.zip and decompress it
- Open a terminal window and go to the folder where you have decompressed the zip file
- Type the command: python setup.py install
- Close the terminal window and open a new one (this is important in order to set up the PATH variable correctly on your system).
The setup will try to install the following software for you:
- Trimmomatic (tested with v0.33): http://www.usadellab.org/cms/?page=trimmomatic
- FLASH (tested with v1.2.11): http://ccb.jhu.edu/software/FLASH/
- Needle from the EMBOSS suite (tested with 6.6.0): ftp://emboss.open-bio.org/pub/EMBOSS/
If the setup fails on your machine, you have to install them manually and put these utilities/binary files in your path!
To check that the installation worked, open a terminal window and execute CRISPResso --help. You should see the help page.
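If the help page does not appear, it can help to see which executables are missing from the PATH. The sketch below is one way to check, not something CRISPResso ships. The executable names ``java``, ``needle`` and ``flash`` are assumptions about how the dependencies are installed, and ``shutil.which`` requires Python 3.3 or later, so run it with an interpreter other than the Python 2.7 used by CRISPResso:

```python
import shutil


def missing_tools(tool_names):
    """Return the names in tool_names that cannot be found on the PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them to match your installation.
    expected = ["CRISPResso", "java", "needle", "flash"]
    missing = missing_tools(expected)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All expected executables were found on PATH.")
```

Trimmomatic is distributed as a Java archive rather than a standalone executable, so this check does not cover it.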
The setup will automatically create a folder called CRISPResso_dependencies in your home folder. If this folder is deleted, CRISPResso will not work! If you want to put the folder in a different location, you need to set the environment variable CRISPRESSO_DEPENDENCIES_FOLDER. For example, to put the folder in /home/lpinello/other_stuff, you can write in the terminal BEFORE the installation:
.. code:: bash
export CRISPRESSO_DEPENDENCIES_FOLDER=/home/lpinello/other_stuff
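To confirm which location is in effect before installing, the lookup can be mirrored in a few lines of Python. This is a sketch under two assumptions: that the value of the variable is used as the folder path itself, and that the default is CRISPResso_dependencies inside the home folder. The function name is hypothetical and the actual setup script may resolve the path differently:

```python
import os


def dependencies_folder(environ=None):
    """Return the folder expected to hold the CRISPResso dependencies."""
    environ = os.environ if environ is None else environ
    custom = environ.get("CRISPRESSO_DEPENDENCIES_FOLDER")
    if custom:
        # Assumption: the variable holds the full path of the folder.
        return custom
    # Assumed default: a folder inside the user's home directory.
    return os.path.join(os.path.expanduser("~"), "CRISPResso_dependencies")


if __name__ == "__main__":
    print(dependencies_folder())
```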
Docker Image
If you like Docker, we provide a Docker image ready to use, so no installation is required!
https://hub.docker.com/r/lucapinello/crispresso/
To use the image first install Docker: https://docs.docker.com/engine/installation/
Then type the command:
docker pull lucapinello/crispresso
See an example of how to run CRISPResso from a Docker image in the section TESTING CRISPResso below.
OUTPUT
The output of CRISPResso consists of a set of informative graphs that allow for the quantification and visualization of the position and type of outcomes within an amplicon sequence. An example is shown below:
.. image:: https://github.com/lucapinello/CRISPResso/blob/master/CRISPResso_output.png?raw=true
Usage
CRISPResso requires two inputs: (1) paired-end reads (two files) or single-end reads (single file) in .fastq format (fastq.gz files are also accepted) from a deep sequencing experiment and (2) a reference amplicon sequence to assess and quantify the efficiency of the targeted mutagenesis. The amplicon sequence expected after HDR can be provided as an optional input to assess HDR frequency. One or more sgRNA sequences (without PAM sequences) can be provided to compare the predicted cleavage position/s to the position of the observed mutations. Coding sequence/s may be provided to quantify frameshift and potential splice site mutations.
The reads are first filtered based on the quality score (phred33) in order to remove potentially false positive indels. The filtering based on the phred33 quality score can be modulated by adjusting the optional parameters (see additional notes below). The adapters are trimmed from the reads using Trimmomatic, and the sequences are then merged with FLASH (if using paired-end data). The remaining reads are then aligned with needle from the EMBOSS suite, an optimal global sequence aligner based on the Needleman-Wunsch algorithm that can easily account for gaps. Finally, after analyzing the aligned reads, a set of informative graphs is generated, allowing for the quantification and visualization of the position and type of outcomes within the amplicon sequence.
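To make the quality filtering step concrete: in phred33 encoding, each character of a FASTQ quality string stores a score as its ASCII code minus 33. The sketch below decodes a quality string and applies two thresholds, one on the read average and one on the lowest single base. The function names and threshold parameters are hypothetical and only illustrate the idea; they are not CRISPResso's own implementation or option names:

```python
def phred33_scores(quality_string):
    """Decode a phred33 quality string into a list of integer scores."""
    return [ord(char) - 33 for char in quality_string]


def passes_quality(quality_string, min_average=0.0, min_single_base=0):
    """Return True if a read meets both quality thresholds.

    min_average: lowest acceptable mean score over the whole read.
    min_single_base: lowest acceptable score for any single base.
    """
    scores = phred33_scores(quality_string)
    if not scores:
        return False  # an empty read carries no usable information
    average = sum(scores) / len(scores)
    return average >= min_average and min(scores) >= min_single_base
```

Stricter thresholds discard more reads but reduce the chance that sequencing errors are counted as editing events.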
NHEJ events:
The required inputs are:
- Two files for paired-end reads or a single file for single-end reads in fastq format (fastq.gz files are also accepted). The reads are assumed to be already trimmed for adapters. If reads are not trimmed, please use the --trim_sequences option, and also the --trimmomatic_options_string option if you are using an adapter other than Nextera.
- The reference amplicon sequence must also be provided.
Example:
.. code:: bash
CRISPResso -r1 reads1.fastq.gz -r2 reads2.fastq.gz -a AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT
HDR events:
The required inputs are:
- Two files for paired-end reads or a single file for single-end reads in fastq format (fastq.gz files are also accepted). The reads are assumed to be already trimmed for adapters.
- The reference amplicon sequence.
- The expected amplicon sequence after HDR must also be provided.
Example:
.. code:: bash
CRISPResso -r1 reads1.fastq.gz -r2 reads2.fastq.gz -a GCTTACACTTGCTTCTGACACAACTGTGTTCACGAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGAGGAGAAGAATGCCGTCACCACCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGGTTACAAGA -e GCTTACACTTGCTTCTGACACAACTGTGTTCACGAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGTGGAAAAAAACGCCGTCACGACGTTATGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGGTTACAAGA
IMPORTANT: You must input the entire reference amplicon sequence ('Expected HDR Amplicon sequence' is the reference for the sequenced amplicon, not simply the donor sequence). If only the donor sequence is provided, an error will result.
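A quick sanity check before running the analysis can catch a donor-only sequence passed by mistake. The heuristic below is an assumption made for illustration and is not a check that CRISPResso performs: it only tests whether the expected HDR amplicon begins and ends with the same bases as the reference amplicon, as the two sequences in the example above do. The function name and the flank length are hypothetical:

```python
def shares_flanks(reference, expected_hdr, flank=20):
    """Heuristic: True if both sequences share their first and last bases.

    A full expected HDR amplicon normally keeps the reference flanks,
    whereas a bare donor sequence usually does not.
    """
    reference = reference.upper()
    expected_hdr = expected_hdr.upper()
    if len(reference) < flank or len(expected_hdr) < flank:
        return False
    same_start = reference[:flank] == expected_hdr[:flank]
    same_end = reference[-flank:] == expected_hdr[-flank:]
    return same_start and same_end
```

The check will give a false alarm if the intended edit lies within the first or last ``flank`` bases of the amplicon, so treat a negative result as a prompt to look again rather than as an error.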
Understanding the parameters of CRISPResso
Required parameters
To run CRISPResso, only 2 parameters are required for single-end reads, or 3 for paired-end reads:
-r1 or --fastq_r1: This parameter allows for the specification of the first fastq file.
-r2 or --fastq_r2: This parameter allows for the specification of the second fastq file for paired-end reads.
-a or --amplicon_seq: This parameter allows the user to enter the amplicon sequence used for the experiment.
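When many samples are processed from a script, the required parameters can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of CRISPResso. It only builds the argument list using the ``-r1``, ``-r2`` and ``-a`` flags described above:

```python
def build_crispresso_command(amplicon_seq, fastq_r1, fastq_r2=None):
    """Build the CRISPResso argument list from the required parameters.

    fastq_r2 is only needed for paired-end reads.
    """
    if not amplicon_seq:
        raise ValueError("an amplicon sequence is required")
    if not fastq_r1:
        raise ValueError("at least one fastq file is required")
    command = ["CRISPResso", "-r1", fastq_r1]
    if fastq_r2 is not None:
        command += ["-r2", fastq_r2]
    command += ["-a", amplicon_seq]
    return command
```

The resulting list can be passed to ``subprocess.check_call``, which avoids shell quoting problems with long sequences.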
Optional parameters
In addition to the required parameters explained in the previous section, several optional parameters can be adjusted to tweak your analysis, and to ensure CRISPResso analyzes your data in the best possible way.
-g or --guide_seq: This parameter allows for the specification of the sgRNA sequence. If more than one sequence is included, separate them with commas. If the guide RNA sequence is entered, then the position of the guide RNA and the cleavage site will be indicated on the output analysis plots. Note that the sgRNA needs to be input as the guide RNA sequence (usually 20 nt) immediately 5' of the PAM sequence (usually NGG for SpCas9). If the PAM is found on the opposite strand with respect to the Amplicon Sequence, ensure the sgRNA sequence is also found on the opposite strand. The CRISPResso convention is to depict the expected cleavage position using the value of the parameter cleavage_offset nt 3' from the end of the guide. In addition, the use of alternate nucleases to SpCas9 is supported. For example, if using the Cpf1 system, enter the sequence (usually 20 nt) immediately 3' of the PAM sequence and explicitly set the cleavage_offset parameter to 1, since the default setting of -3 is suitable only for SpCas9. (default: None)
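The relationship between the guide, its strand and cleavage_offset can be sketched as follows. This is an illustration under stated assumptions, not CRISPResso's own code. The function names are hypothetical, and each cut is reported as a boundary, that is, the number of amplicon bases to the left of the cut, which may differ by one from the position CRISPResso draws on its plots:

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def _find_all(text, pattern):
    """Yield every start index of pattern in text, overlaps included."""
    start = text.find(pattern)
    while start != -1:
        yield start
        start = text.find(pattern, start + 1)


def predicted_cut_sites(amplicon, guide, cleavage_offset=-3):
    """Return sorted cut boundaries for a guide on either strand.

    The offset is counted from the 3' end of the guide, so a negative
    value places the cut inside the guide.
    """
    amplicon = amplicon.upper()
    guide = guide.upper()
    sites = []
    # Guide on the same strand as the amplicon: its 3' end is on the right.
    for start in _find_all(amplicon, guide):
        sites.append(start + len(guide) + cleavage_offset)
    # Guide on the opposite strand: its 3' end maps to the left edge of
    # the match, so the offset is applied in the other direction.
    guide_rc = reverse_complement(guide)
    if guide_rc != guide:
        for start in _find_all(amplicon, guide_rc):
            sites.append(start - cleavage_offset)
    return sorted(sites)
```

With the default offset of -3, a 20 nt guide on the forward strand yields a cut with 17 guide bases to its left and 3 to its right.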
-e or
