CASCABEL
Automated pipeline for amplicon sequence analysis
Cascabel
Cascabel is a pipeline designed to run amplicon sequence analysis across single or multiple read libraries. It produces output files that let the user explore the data in a simple and meaningful way and that facilitate downstream analysis.
CASCABEL was designed for short read high-throughput sequence data. It covers quality control on the fastq files, assembling paired-end reads to fragments (it can also handle single end data), splitting the libraries into samples (optional), OTU picking and taxonomy assignment. Besides other output files, it will return an OTU table.
Our pipeline is implemented with Snakemake as workflow management engine and allows customizing the analyses by offering several choices for most of the steps. The pipeline can make use of multiple computing nodes and scales from personal computers to computing servers. The analyses and results are fully reproducible and documented in an html and optional pdf report.
Current version: 7.0.0
Installation
The easiest and recommended way to install Cascabel is via Conda. The fastest way to obtain Conda is to install Miniconda, a minimal version of Anaconda that includes only conda and its dependencies.
Miniconda
In order to install conda or miniconda please see the following tutorial (recommended) or, if you are working with a Linux OS, you can try the following:
Download the installer:
<pre><code class="text"> wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh </code></pre>
Execute the installation script and follow the instructions.
<pre><code class="text"> bash Miniconda3-latest-Linux-x86_64.sh </code></pre>
Download CASCABEL
Once you have conda installed, you are ready to clone or download the project.
You can clone the project:
<pre><code class="text"> git clone https://github.com/AlejandroAb/CASCABEL.git </code></pre>
Or download it from this repository:
<pre><code class="text"> wget https://github.com/AlejandroAb/CASCABEL/archive/master.zip </code></pre>
After downloading or cloning the repository, change to the "CASCABEL" directory and run the following command there to create CASCABEL's environment:
<pre><code class="text"> conda env create --name cascabel --file environment.yaml </code></pre>
Activate environment
Now you can activate your new environment.
<pre><code class="text"> conda activate cascabel </code></pre>
After activating the environment, if the environment variable PERL5LIB is already configured, you may need to change it. To avoid any issues, set the PERL5LIB path as follows:
<pre><code class="text"> export PERL5LIB=/path/to/conda/.conda/envs/cascabel/perl5 </code></pre>
Just make sure to replace /path/to/conda/ with the correct path on your system.
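While the cascabel environment is active, conda exposes its location in the CONDA_PREFIX variable, so the path does not have to be typed by hand. The sketch below assumes, as in the export line above, that the Perl modules sit in a perl5 directory directly under the environment prefix; cascabel_perl5lib is a hypothetical helper name, not part of CASCABEL.

```shell
# Print the PERL5LIB value for a conda environment prefix.
# Assumption: Perl modules live in "<environment prefix>/perl5".
cascabel_perl5lib() {
    # Use the first argument, or fall back to the active environment.
    env_prefix="${1:-${CONDA_PREFIX:-}}"
    if [ -z "$env_prefix" ]; then
        echo "No conda environment is active and no prefix was given" >&2
        return 1
    fi
    # Strip one trailing slash so the result has no doubled separator.
    printf '%s/perl5\n' "${env_prefix%/}"
}

# Only export when an environment is actually active.
if [ -n "${CONDA_PREFIX:-}" ]; then
    PERL5LIB="$(cascabel_perl5lib)"
    export PERL5LIB
fi
```

Running echo "$PERL5LIB" afterwards shows the value that was set; if it does not point inside the cascabel environment, fall back to setting the path manually.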
To identify your path, you can use the following command:
<pre><code class="text"> which snakemake </code></pre>
Dada2
Some issues have been reported when installing dada2 within conda. If you run into them, one more step is needed to install dada2.
Start an R shell (just type R) and execute the following command:
<pre><code class="text"> BiocManager::install("dada2", version = "3.10") </code></pre>*Please note that BiocManager should already be installed, so you only need to execute the previous command. You can also find more information in [dada2's installation guide](https://benjjneb.github.io/dada2/dada-installation.html).
Getting started
Required input files:
- Forward raw reads (fastq or fastq.gz)
- Reverse raw reads (fastq or fastq.gz) (only for paired-end layout)
- File with barcode information (only for demultiplexing: format)
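Before launching the pipeline, it can help to confirm that the raw read files are readable and, for the paired-end layout, that the forward and reverse files hold the same number of reads. The following is a minimal sketch, not part of CASCABEL: count_fastq_reads and check_read_pairs are hypothetical helper names, and the count assumes that every fastq record spans exactly four lines.

```shell
# Count the records in a fastq or fastq.gz file.
# Assumes every record spans exactly four lines (no wrapped sequences).
count_fastq_reads() {
    # With -c -d -f, GNU gzip copies input that is not compressed through unchanged,
    # so the same call handles both .fastq and .fastq.gz files.
    gzip -cdf -- "$1" | awk 'END { print int(NR / 4) }'
}

# Succeed only if the forward and reverse files hold the same number of reads.
check_read_pairs() {
    fw_count="$(count_fastq_reads "$1")"
    rv_count="$(count_fastq_reads "$2")"
    [ "$fw_count" -eq "$rv_count" ]
}

# Example call (paths are placeholders):
#   check_read_pairs /full/path/to/forward.reads.fq /full/path/to/reverse.reads.fq \
#       || echo "Forward and reverse read counts differ" >&2
```

A mismatch usually points to a truncated download or to files from different runs, which is cheaper to catch here than halfway through the workflow.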
Main expected output files for downstream analysis
- Demultiplexed and trimmed reads
- OTU or ASV table
- Representative sequences fasta file
- OTU taxonomy assignment
- Taxonomy summary
- Representative sequence alignment
- Phylogenetic tree
- CASCABEL Report
Run Cascabel
All the parameters and the behavior of the workflow are specified through the configuration file, so the easiest way to get the pipeline running is to fill in the required parameters in that file.
#------------------------------------------------------------------------------#
# Project Name #
#------------------------------------------------------------------------------#
# The name of the project for which the pipeline will be executed. This should #
# be the same name used as the first parameter on init_sample.sh script (if #
# used for multiple libraries). #
#------------------------------------------------------------------------------#
PROJECT: "My_CASCABEL_Project"
#------------------------------------------------------------------------------#
# LIBRARIES/SAMPLES #
#------------------------------------------------------------------------------#
# SAMPLES/LIBRARIES you want to include in the analysis. #
# Use the same library names as with the init_sample.sh script. #
# Include each library name surrounded by quotes, and comma separated. #
# i.e LIBRARY: ["LIB_1","LIB_2",..."LIB_N"] #
# LIBRARY_LAYOUT: Configuration of the library; all the libraries/samples #
# must have the same configuration; use: #
# "PE" for paired-end reads [Default]. #
# "SE" for single-end reads. #
#------------------------------------------------------------------------------#
LIBRARY: ["EXP1"]
LIBRARY_LAYOUT: "PE"
#------------------------------------------------------------------------------#
# INPUT FILES #
#------------------------------------------------------------------------------#
# To run Cascabel for multiple libraries you can provide an input file, tab #
# separated with the following columns: #
# - Library: Name of the library (this has to match the values entered #
# in the LIBRARY variable described above). #
# - Forward reads: Full path to the forward reads. #
# - Reverse reads: Full path to the reverse reads (only for paired-end). #
# - metadata: Full path to the file with the information for #
# demultiplexing the samples (only if needed). #
# The full path of this file should be supplied in the input_files variable, #
# otherwise, you have to enter the FULL PATH for both: the raw reads and the #
# metadata file (barcode mapping file). The metadata file is only needed if #
# you want to perform demultiplexing. #
# If you want to avoid the creation of this file a third solution is available #
# using the script init_sample.sh. More info at the project Wiki: #
# https://github.com/AlejandroAb/CASCABEL/wiki#21-input-files #
# #
#----------------------------- PARAMS -----------------------------#
# #
# - fw_reads: Full path to the raw reads in forward direction (R1) #
# - rv_reads: Full path to the raw reads in reverse direction (R2) #
# - metadata: Full path to the metadata file with barcodes for each sample #
# to perform library demultiplexing #
# - input_files: Full path to a file with the information for the library(s) #
# #
# ** Please supply only one of the following: #
# - fw_reads, rv_reads and metadata #
# - input_files #
# - or use init_sample.sh script directly #
#------------------------------------------------------------------------------#
fw_reads: "/full/path/to/forward.reads.fq"
rv_reads: "/full/path/to/reverse.reads.fq"
metadata: "/full/path/to/metadata.barcodes.txt"
#or
input_files: "/full/path/to/input_reference.txt"
#------------------------------------------------------------------------------#
# ASV_WF: Binned qualities and Big data workflow #
#------------------------------------------------------------------------------#
# For fastq files with binned qualities (e.g. NovaSeq and NextSeq) the error #
# learning process within dada2 can be affected, and some data scientists #
# suggest that enforcing monotonicity could be beneficial for the analysis. #
# In this section, you can modify key parameters to enforce monotonicity and #
# also go through a big data workflow when the number of reads may exceed the #
# physical memory limit.
# More on binned qualities: https://www.illumina.com/content/d
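
The tab-separated file expected by the input_files parameter (see the INPUT FILES comments in the configuration above) could be written as sketched below. The library names, read paths, and the output file name are placeholders. The configuration comments do not say whether a header line is allowed, so this sketch assumes none and writes only data rows.

```shell
# Hypothetical output file; pass its full path as input_files in the configuration.
input_table="input_reference.txt"

# One row per library, columns separated by tabs:
# library name, forward reads, reverse reads, metadata (barcode mapping file).
{
    printf '%s\t%s\t%s\t%s\n' "LIB_1" \
        "/full/path/to/LIB_1_R1.fq" "/full/path/to/LIB_1_R2.fq" "/full/path/to/LIB_1.barcodes.txt"
    printf '%s\t%s\t%s\t%s\n' "LIB_2" \
        "/full/path/to/LIB_2_R1.fq" "/full/path/to/LIB_2_R2.fq" "/full/path/to/LIB_2.barcodes.txt"
} > "$input_table"

# Check that every row has exactly four tab-separated fields.
awk -F '\t' 'NF != 4 { bad = 1 } END { exit bad }' "$input_table"
```

The names in the first column have to match the entries of the LIBRARY list, for example LIBRARY: ["LIB_1","LIB_2"]. For single-end data or runs without demultiplexing, the reverse reads and metadata columns are described as optional; how to leave them out is not specified here, so check the project Wiki before relying on a particular layout.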
