CASCABEL

Automated pipeline for amplicon sequence analysis

Cascabel

Cascabel is a pipeline designed to run amplicon sequence analysis across single or multiple read libraries. It produces output files that let the user explore the data in a simple and meaningful way and that facilitate downstream analysis.

CASCABEL was designed for short-read, high-throughput sequence data. It covers quality control on the fastq files, assembly of paired-end reads into fragments (it can also handle single-end data), splitting the libraries into samples (optional), OTU picking and taxonomy assignment. Among other output files, it returns an OTU table.

Our pipeline is implemented with Snakemake as the workflow management engine and allows customizing the analyses by offering several choices for most of the steps. The pipeline can make use of multiple computing nodes and scales from personal computers to computing servers. The analyses and results are fully reproducible and documented in an HTML report and an optional PDF report.

Current version: 7.0.0

Installation

The easiest and recommended way to install Cascabel is via Conda. The fastest way to obtain Conda is to install Miniconda, a mini version of Anaconda that includes only conda and its dependencies.

Miniconda

To install conda or miniconda, please see the Conda installation tutorial (recommended) or, if you are working with a Linux OS, you can try the following:

Download the installer:

<pre><code class="text"> wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh </code></pre>

Execute the installation script and follow the instructions.

<pre><code class="text"> bash Miniconda3-latest-Linux-x86_64.sh </code></pre>

Download CASCABEL

Once you have conda installed, you are ready to clone or download the project.

You can clone the project:

<pre><code class="text"> git clone https://github.com/AlejandroAb/CASCABEL.git </code></pre>

Or download it from this repository:

<pre><code class="text"> wget https://github.com/AlejandroAb/CASCABEL/archive/master.zip </code></pre>

After downloading or cloning the repository, cd to the "CASCABEL" directory and run the following command there to create CASCABEL's environment:

<pre><code class="text"> conda env create --name cascabel --file environment.yaml </code></pre>

Activate environment

Now you can activate your new environment.

<pre><code class="text"> conda activate cascabel </code></pre>

After activating the environment, if you have the environment variable PERL5LIB preconfigured, it is possible that you will need to change it. To avoid any issues, configure the PERL5LIB path as follows:

<pre><code class="text"> export PERL5LIB=/path/to/conda/.conda/envs/cascabel/perl5 </code></pre>

Just make sure to replace /path/to/conda/ with the correct path on your system.

To identify your path, you can use the following command:

<pre><code class="text"> which snakemake </code></pre>
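The two steps above can be combined. The following is a minimal sketch, not part of CASCABEL itself: the helper name `env_prefix_from_bin` is hypothetical, and the sketch assumes that the `perl5` directory sits directly under the environment prefix, as in the example path above.

```shell
# Hypothetical helper: strip "/bin/<executable>" from a path to get the
# environment prefix, e.g.
#   /home/user/.conda/envs/cascabel/bin/snakemake -> /home/user/.conda/envs/cascabel
env_prefix_from_bin() {
    dirname "$(dirname "$1")"
}

# Only export PERL5LIB when snakemake is on the PATH, i.e. when the
# cascabel environment is active.
if snakemake_path="$(command -v snakemake)"; then
    # Assumption: perl5 lives directly under the environment prefix.
    export PERL5LIB="$(env_prefix_from_bin "$snakemake_path")/perl5"
    echo "PERL5LIB=$PERL5LIB"
fi
```

If the printed directory does not exist on your system, set the path by hand as described above.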

Dada2

Some issues have been reported when installing dada2 within conda. If you experience them, you need to perform one more step to install dada2.

Enter the R shell (just type R) and execute the following command:

<pre><code class="text"> BiocManager::install("dada2", version = "3.10") </code></pre>

*Please note that BiocManager should already be installed, so you only need to execute the previous command. You can also find more information in [dada2's installation guide](https://benjjneb.github.io/dada2/dada-installation.html).

Getting started

Required input files:

  • Forward raw reads (fastq or fastq.gz)
  • Reverse raw reads (fastq or fastq.gz) (only for paired-end layout)
  • File with barcode information (only for demultiplexing: format)
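Before launching a run it can help to confirm that the read files exist and carry an expected extension. The function below is a hedged sketch and not part of CASCABEL: `check_reads` is a hypothetical name, and accepting `.fq` alongside `.fastq` is an assumption based on the example paths in the configuration file.

```shell
# Hypothetical pre-flight check: succeed only if every given file has a
# fastq-like extension and exists with non-zero size.
check_reads() {
    status=0
    for f in "$@"; do
        case "$f" in
            *.fastq|*.fq|*.fastq.gz|*.fq.gz) ;;
            *)
                echo "unexpected extension: $f" >&2
                status=1
                continue
                ;;
        esac
        # -s is true only for files that exist and are not empty
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

A call such as `check_reads /full/path/to/forward.reads.fq /full/path/to/reverse.reads.fq` returns a non-zero status and names the offending files on standard error when something is wrong.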

Main expected output files for downstream analysis

  • Demultiplexed and trimmed reads
  • OTU or ASV table
  • Representative sequences fasta file
  • OTU taxonomy assignment
  • Taxonomy summary
  • Representative sequence alignment
  • Phylogenetic tree
  • CASCABEL Report

Run Cascabel

All the parameters and the behavior of the workflow are specified through the configuration file, so the easiest way to get the pipeline running is to fill in some required parameters in that file.

#------------------------------------------------------------------------------#
#                             Project Name                                     #
#------------------------------------------------------------------------------#
# The name of the project for which the pipeline will be executed. This should #
# be the same name used as the first parameter on init_sample.sh script (if    #
# used for multiple libraries)                                                 #
#------------------------------------------------------------------------------#
PROJECT: "My_CASCABEL_Project"

#------------------------------------------------------------------------------#
#                            LIBRARIES/SAMPLES                                 #
#------------------------------------------------------------------------------#
# SAMPLES/LIBRARIES you want to include in the analysis.                       #
# Use the same library names as with the init_sample.sh script.                #
# Include each library name surrounded by quotes, and comma separated.         #
# i.e LIBRARY:  ["LIB_1","LIB_2",..."LIB_N"]                                   #
# LIBRARY_LAYOUT: Configuration of the library; all the libraries/samples      #
#                 must have the same configuration; use:                       #
#                 "PE" for paired-end reads [Default].                         #
#                 "SE" for single-end reads.                                   #
#------------------------------------------------------------------------------#
LIBRARY: ["EXP1"]
LIBRARY_LAYOUT: "PE"

#------------------------------------------------------------------------------#
#                             INPUT FILES                                      #
#------------------------------------------------------------------------------#
# To run Cascabel for multiple libraries you can provide an input file, tab    #
# separated with the following columns:                                        #
# - Library: Name of the library (this has to match with the values entered    #
#            in the LIBRARY variable described above).                         #
# - Forward reads: Full path to the forward reads.                             #
# - Reverse reads: Full path to the reverse reads (only for paired-end).       #
# - metadata:      Full path to the file with the information for              #
#                  demultiplexing the samples (only if needed).                #
# The full path of this file should be supplied in the input_files variable,   #
# otherwise, you have to enter the FULL PATH for both: the raw reads and the   #
# metadata file (barcode mapping file). The metadata file is only needed if    #
# you want to perform demultiplexing.                                          #
# If you want to avoid the creation of this file a third solution is available #
# using the script init_sample.sh. More info at the project Wiki:              #
# https://github.com/AlejandroAb/CASCABEL/wiki#21-input-files                  #
#                                                                              #
#-----------------------------       PARAMS       -----------------------------#
#                                                                              #
# - fw_reads:  Full path to the raw reads in forward direction (R1)            #
# - rv_reads:  Full path to the raw reads in reverse direction (R2)            #
# - metadata:  Full path to the metadata file with barcodes for each sample    #
#              to perform library demultiplexing                               #
# - input_files: Full path to a file with the information for the library(s)   #
#                                                                              #
# ** Please supply only one of the following:                                  #
#     - fw_reads, rv_reads and metadata                                        #
#     - input_files                                                            #
#     - or use init_sample.sh script directly                                  #
#------------------------------------------------------------------------------#
fw_reads: "/full/path/to/forward.reads.fq"
rv_reads: "/full/path/to/reverse.reads.fq"
metadata: "/full/path/to/metadata.barcodes.txt"
#or
input_files: "/full/path/to/input_reference.txt"

#------------------------------------------------------------------------------#
#  ASV_WF:             Binned qualities and Big data workflow                  #
#------------------------------------------------------------------------------#
# For fastq files with binned qualities (e.g. NovaSeq and NextSeq) the error   #
# learning process within dada2 can be affected, and some data scientists      #
# suggest that enforcing monotonicity could be beneficial for the analysis.    #
# In this section, you can modify key parameters to enforce monotonicity and   #
# also go through a big data workflow when the number of reads may exceed the  #
# physical memory limit.
# More on binned qualities: https://www.illumina.com/content/d
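
The tab-separated file referenced by input_files, described in the INPUT FILES comments above, could be generated along these lines. This is a sketch under stated assumptions: the column order follows the order of the comment (library, forward reads, reverse reads, metadata), no header line is written, and the library names and paths are placeholders. The project Wiki remains the authority on the exact format.

```shell
# Hypothetical helper: emit one tab-separated line per library with the
# columns library, forward reads, reverse reads, metadata.
write_input_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Placeholder file name; its full path would go into input_files.
input_file="input_reference.txt"
: > "$input_file"   # start from an empty file

# Library names must match the entries of the LIBRARY variable.
write_input_line "LIB_1" "/full/path/to/LIB_1_R1.fq" "/full/path/to/LIB_1_R2.fq" \
    "/full/path/to/LIB_1.barcodes.txt" >> "$input_file"
write_input_line "LIB_2" "/full/path/to/LIB_2_R1.fq" "/full/path/to/LIB_2_R2.fq" \
    "/full/path/to/LIB_2.barcodes.txt" >> "$input_file"
```

With such a file in place, only input_files would be set in the configuration, and fw_reads, rv_reads and metadata would be left out, in line with the "supply only one of the following" note.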
