Falcon
A tool to infer metagenomic sample composition
1. Installation
1.1 Automatic installation with Conda
conda install -c cobilab falcon --yes
1.2 Manual installation
git clone https://github.com/cobilab/falcon.git
cd falcon/src/
cmake .
make
cp FALCON ../../
cp FALCON-filter ../../
cp FALCON-filter-visual ../../
cp FALCON-inter ../../
cp FALCON-inter-visual ../../
cd ../../
CMake is required for the manual installation.
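After the manual installation, a quick check can confirm that all five binaries ended up where the `cp` commands put them and are executable. This is a minimal bash sketch. The function name `check_falcon_binaries` is hypothetical and is not part of FALCON.

```shell
#!/usr/bin/env bash
# Sketch: report any FALCON binary that is missing or not executable.
# check_falcon_binaries is a hypothetical helper; the directory defaults to ".".
check_falcon_binaries() {
  local dir=${1:-.}
  local missing=0
  local b
  for b in FALCON FALCON-filter FALCON-filter-visual FALCON-inter FALCON-inter-visual; do
    if [ ! -x "$dir/$b" ]; then
      echo "missing: $b"
      missing=1
    fi
  done
  # Non-zero status if at least one binary was not found.
  return "$missing"
}
```

For example, running `check_falcon_binaries .` right after `cd ../../` prints each missing binary and returns a non-zero status. If nothing is printed, all five were found.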
2. Demo
Search for the top 15 most similar viruses in the sample reads provided in the test folder:
cd test
gunzip reads.fq.gz
gunzip VDB.fa.gz
./FALCON -v -F -t 15 -l 47 -x top.txt reads.fq VDB.fa
This identifies Zaire ebolavirus in the sample reads and writes the similarity ranking to top.txt, as shown in the following image:
<p align="center"><img src="imgs/top.png" alt="Top" width="604" border="0" /></p>
3. Building a reference database
3.1 Build the latest NCBI viral database
An example of building a reference database from NCBI:
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/assembly_summary.txt
awk -F '\t' '{if($12=="Complete Genome") print $20}' assembly_summary.txt > ASCG.txt
mkdir -p GB_DB_VIRAL
mkdir -p GB_DB_VIRAL_CDS
mkdir -p GB_DB_VIRAL_RNA
cat ASCG.txt | xargs -I{} -n1 -P8 wget -P GB_DB_VIRAL {}/*_genomic.fna.gz
mv GB_DB_VIRAL/*_cds_from_genomic.fna.gz GB_DB_VIRAL_CDS/
mv GB_DB_VIRAL/*_rna_from_genomic.fna.gz GB_DB_VIRAL_RNA/
zcat GB_DB_VIRAL/*.fna.gz > VDB.fa
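wget expands wildcards such as `*_genomic.fna.gz` only for FTP URLs, so the download step above may match nothing if the ftp_path column (column 20, the one printed by the awk command) holds https:// links. The minimal bash sketch below builds exact file URLs instead. It assumes NCBI's naming convention that each assembly directory contains a file named `<directory name>_genomic.fna.gz`; the function name `complete_genome_urls` and the file `URLS.txt` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print exact *_genomic.fna.gz URLs for "Complete Genome" assemblies.
# Assumes column 12 = assembly level and column 20 = ftp_path (as in the awk command above)
# and the convention <ftp_path>/<last path component>_genomic.fna.gz.
complete_genome_urls() {
  awk -F '\t' '$12 == "Complete Genome" && $20 != "na" {
    n = split($20, parts, "/")   # last path component is the file prefix
    print $20 "/" parts[n] "_genomic.fna.gz"
  }' "$1"
}

# Hypothetical usage:
#   complete_genome_urls assembly_summary.txt > URLS.txt
#   xargs -n1 -P8 wget -q -P GB_DB_VIRAL < URLS.txt
#   zcat GB_DB_VIRAL/*_genomic.fna.gz > VDB.fa
```

Because only the exact genomic files are requested, the CDS and RNA files are never downloaded, and the two `mv` steps become unnecessary.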
To build reference databases for other domains/kingdoms (bacteria, fungi, protozoa, plants, etc.), use:
https://raw.githubusercontent.com/cobilab/gto/master/scripts/gto_build_dbs.sh
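Before passing VDB.fa to FALCON, a quick count of its sequences and bases can catch an empty or truncated database. This is a minimal awk-based sketch. The function name `fasta_stats` is hypothetical, and it assumes header lines start with ">".

```shell
#!/usr/bin/env bash
# Sketch: summarize a (multi-)FASTA file as "<records> sequences, <bases> bases".
# Headers start with ">"; spaces, tabs, and carriage returns in sequence lines are not counted.
fasta_stats() {
  awk '/^>/ { n++; next }
       { gsub(/[ \t\r]/, ""); bases += length($0) }
       END { printf "%d sequences, %.0f bases\n", n, bases }' "$1"
}
```

For example, `fasta_stats VDB.fa` prints a single summary line. A result of zero sequences usually means the download or the `zcat` step failed.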
3.2 Download an existing database
<p align="justify"> A prebuilt reference viral database is available <a href="http://sweet.ua.pt/pratas/datasets/VDB.fa.gz">here</a>. To use it, uncompress it (for example, with gunzip VDB.fa.gz) and pass it to FALCON along with the FASTQ reads. </p>
4. Usage
The FALCON package includes the following binaries:
- <b>FALCON</b>: metagenomic composition analysis;
- <b>FALCON-filter</b>: filtering of local interactions (localization);
- <b>FALCON-filter-visual</b>: visualization of global and local similarities;
- <b>FALCON-inter</b>: inter-similarity between database genomes;
- <b>FALCON-inter-visual</b>: visualization of inter-similarities.
4.1 Metagenomic composition analysis
To see the available options of FALCON, type:
./FALCON
or
./FALCON -h
These will print the following options:
Non-mandatory arguments:
-h give this help,
-F force mode (overwrites top file),
-V display version number,
-v verbose mode (more information),
-Z database local similarity,
-s show compression levels,
-l <level> compression level [1;47],
-p <sample> subsampling (default: 1),
-t <top> top of similarity (default: 20),
-n <nThreads> number of threads (default: 2),
-x <FILE> similarity top filename,
-y <FILE> local similarities filename,
Mandatory arguments:
[FILE1]:[FILE2]:... metagenomic filename (FASTQ),
Use ":" for splitting files.
[FILE] database filename (Multi-FASTA).
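The metagenomic argument accepts several FASTQ files separated by ":". When the file list comes from a glob or a variable, a small bash helper can build that string. The helper `join_by_colon` is hypothetical (FALCON only expects the ":"-separated string), and it assumes no path itself contains a ":".

```shell
#!/usr/bin/env bash
# Sketch: join FASTQ paths with ":" to form FALCON's metagenomic-file argument.
join_by_colon() {
  local IFS=':'          # "$*" joins arguments with the first character of IFS
  printf '%s\n' "$*"
}

# Hypothetical usage with two read files:
#   ./FALCON -v -F -t 15 -l 47 -x top.txt "$(join_by_colon reads_1.fq reads_2.fq)" VDB.fa
```

Quoting the command substitution keeps the joined list as a single argument, which is what FALCON expects in the first mandatory position.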
4.2 Local detection
For detection and visualization of local interactions, the FALCON package provides <b>FALCON-filter</b> and <b>FALCON-filter-visual</b>.
4.2.1 Filtering
To see the available options of FALCON-filter, type:
./FALCON-filter
or
./FALCON-filter -h
These will print the following options:
Non-mandatory arguments:
-h give this help,
-F force mode (overwrites top file),
-V display version number,
-v verbose mode (more information),
-s <size> filter window size,
-w <type> filter window type,
-x <sampling> filter window sampling,
-sl <lower> similarity lower bound,
-su <upper> similarity upper bound,
-dl <lower> size lower bound,
-du <upper> size upper bound,
-t <threshold> threshold [0;2.0],
-o <FILE> output filename,
Mandatory arguments:
[FILE] profile filename (from FALCON).
4.2.2 Visualization
To see the available options of FALCON-filter-visual, type:
./FALCON-filter-visual
or
./FALCON-filter-visual -h
These will print the following options:
Non-mandatory arguments:
-h give this help,
-F force mode (overwrites top file),
-V display version number,
-v verbose mode (more information),
-w <width> square width (for each value),
-s <ispace> square inter-space (between each value),
-i <indexs> color index start,
-r <indexr> color index rotations,
-u <hue> color hue,
-sl <lower> similarity lower bound,
-su <upper> similarity upper bound,
-dl <lower> size lower bound,
-du <upper> size upper bound,
-bg show only the best of group,
-g <color> color gamma,
-e <size> enlarge painted regions,
-ss do NOT show global scale,
-sn do NOT show names,
-o <FILE> output image filename,
Mandatory arguments:
[FILE] profile filename (from FALCON-filter).
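Per the help listings, the three tools form a chain: FALCON writes local similarity profiles (`-Z` with `-y`), FALCON-filter reads that profile, and FALCON-filter-visual reads the output of FALCON-filter. The bash sketch below chains the three steps using only flags shown above. The intermediate file names (`local.com`, `positions.pos`), the `.svg` output extension, and the threshold `0.5` are assumptions to adjust. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: FALCON -> FALCON-filter -> FALCON-filter-visual.
# File names and the 0.5 threshold are assumptions, not documented defaults.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # dry run: show the command only
  else
    "$@"
  fi
}

local_detection() {
  local reads=$1 db=$2 image=${3:-draw.svg}
  # 1) Local similarity profiles (-Z), written to local.com (-y).
  run ./FALCON -v -F -Z -l 47 -y local.com "$reads" "$db" || return
  # 2) Filter the profiles; -t takes a threshold in [0;2.0].
  run ./FALCON-filter -v -F -t 0.5 -o positions.pos local.com || return
  # 3) Render the filtered regions to an image file.
  run ./FALCON-filter-visual -v -F -o "$image" positions.pos
}
```

For example, `DRY_RUN=1 local_detection reads.fq VDB.fa` previews the three commands, and dropping `DRY_RUN=1` runs them in order, stopping at the first failure.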
4.3 Database inter-similarity
4.3.1 Mapping inter-similarity
To see the available options of FALCON-inter, type:
./FALCON-inter
or
./FALCON-inter -h
These will print the following options:
Non-mandatory arguments:
-h give this help,

