Circompara2

Improved bioinformatic pipeline to identify and quantify circRNA expression from RNA-seq data by combining multiple circRNA detection methods

Title: CirComPara2
Subtitle: CircRNA detection from RNA-seq data using multiple methods
Project: CirComPara2
Author: Enrico Gaffo
Affiliation: Compgen - University of Padova
Web: http://compgen.bio.unipd.it
Date: January 20, 2021
output:
  html_document:
    toc: yes
    number_sections: no

CirComPara2

CirComPara2 is a computational pipeline to detect, quantify, and correlate expression of linear and circular RNAs from RNA-seq data that combines multiple circRNA-detection methods.

<!--TODO: more exhaustive description -->

Quick install

Run the following commands to download and install, locally on your system, the scripts and tools required to run circompara2. If the installation fails, try manually installing each of the software packages listed below.

Required software before installation

You'll need some libraries and software installed on your system before starting the circompara2 installation. On a fresh Ubuntu 20.04 (Focal) system, install the following packages by running:

sudo apt install git python2.7 wget unzip pkg-config default-jre r-base-core libcurl4-openssl-dev libxml2-dev libssl-dev curl pigz python-is-python2 python-dev-is-python2

Virtual environment

Because not all the software integrated in circompara2 runs on Python 3, circompara2 still uses Python 2.7. If your system default is Python 3, consider installing and running circompara2 in a virtual environment, such as one created with virtualenv:

virtualenv -p /usr/bin/python2.7 p2.7venv
## activate the virtual environment
source p2.7venv/bin/activate

Now you can proceed with the installation (or launch circompara2 if you have already installed it).
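
One way to confirm that the activated environment really provides Python 2.7 is a small check such as the sketch below. The helper name `is_python27` is made up for this illustration and is not part of circompara2; the code runs under both Python 2 and Python 3.

```python
import sys


def is_python27(version_info=sys.version_info):
    """Return True when the (major, minor) version is exactly 2.7."""
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    # Report the interpreter that is currently active in this shell.
    status = "OK" if is_python27() else "not Python 2.7"
    print("Python %d.%d: %s" % (sys.version_info[0], sys.version_info[1], status))
```

Running the script with the `python` command inside the activated environment shows which interpreter that command resolves to.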

Installation commands

Download and extract the latest release of CirComPara2, or clone the Git repository, then enter the circompara2 directory and run the automatic installer script:

git clone http://github.com/egaffo/circompara2
cd circompara2
./src/utils/bash/install_circompara
## make a link to the circompara2 main script into the main directory
ln -s src/utils/bash/circompara circompara2

Test your installation

cd test_circompara/analysis
../../circompara2

If you plan to use single-end reads, test with:

cd test_circompara/analysis_se
../../circompara2

Add circompara2 to your environment

Once the installation is complete, you can update your PATH environment variable so that you do not have to type the whole path to the circompara2 executable each time. From the terminal, type the following command (replace the /path/to/circompara2/install/dir string with circompara2's actual path):

export PATH=/path/to/circompara2/install/dir:$PATH

Another way is to link circompara2's main script in your local bin directory:

cd /home/user/bin
ln -s /path/to/circompara2/install/dir/circompara2
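
To check that either approach worked, you can look up the executable on the search path. The following is a minimal sketch, assuming a POSIX-like system; `find_on_path` is a hypothetical helper written for this example, not something shipped with circompara2.

```python
import os


def find_on_path(program, path=None):
    """Return the first executable file named `program` found in `path`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        # isfile() follows symbolic links, so a linked script is found as well.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("circompara2")
    print(location if location else "circompara2 is not on PATH")
```

If the script prints a path, typing `circompara2` in the same shell should start the pipeline from any directory.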

Alternative installation: the circompara2 Docker image

A Docker image of CirComPara2 is available from DockerHub in case you are struggling with the installation. The Docker image spares you the installation burden. Just pull the image:

docker pull egaffo/circompara2:v0.1.2.1

How to use

Set up your analysis project

This section shows how to set up your project directory and run the analysis. To run an analysis, you usually need to specify your data (the sequenced reads in FASTQ format) and a reference genome in FASTA format.

Compose META file

You have to specify read files and sample names in a metadata table file. The file is a comma-separated text file with the following header line:

file,sample

Then, each row corresponds to a read file. If you have paired-end sequenced samples, write one line per file with the same sample name.

An example of the metadata table:

| file                   | sample |
|------------------------|--------|
| /path/to/reads_S1_1.fq | S1     |
| /path/to/reads_S1_2.fq | S1     |
| /path/to/reads_S2_1.fq | S2     |
| /path/to/reads_S2_2.fq | S2     |

and the corresponding metadata file content:

file,sample
/path/to/reads_S1_1.fq,S1
/path/to/reads_S1_2.fq,S1
/path/to/reads_S2_1.fq,S2
/path/to/reads_S2_2.fq,S2
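
For projects with many samples, the metadata file can be generated rather than typed by hand. The sketch below uses Python's standard csv module and is written for Python 3, so run it outside the Python 2.7 environment; the `write_meta` helper and the placeholder paths are illustrative assumptions.

```python
import csv
import io


def write_meta(rows, handle):
    """Write (file, sample) pairs as a comma-separated table with a header line."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["file", "sample"])
    for read_file, sample in rows:
        writer.writerow([read_file, sample])


# Two paired-end samples: one row per read file, same sample name for both mates.
rows = [
    ("/path/to/reads_S1_1.fq", "S1"),
    ("/path/to/reads_S1_2.fq", "S1"),
    ("/path/to/reads_S2_1.fq", "S2"),
    ("/path/to/reads_S2_2.fq", "S2"),
]

buffer = io.StringIO()
write_meta(rows, buffer)
print(buffer.getvalue())
```

To write the file to disk instead of printing it, pass a handle opened with `open("meta.csv", "w", newline="")` in place of the in-memory buffer.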

In the meta file you can also specify the adapter sequences used to preprocess the reads: just add an adapter column with the adapter file.

| file                   | sample | adapter             |
|------------------------|--------|---------------------|
| /path/to/reads_S1_1.fq | S1     | /path/to/adapter.fa |
| /path/to/reads_S1_2.fq | S1     | /path/to/adapter.fa |
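
Before launching a long analysis it can help to check the metadata file. The following sketch (Python 3) reads the table, verifies that the two required columns are present, and groups read files by sample, treating the adapter column as optional. The function name and the returned structure are choices made for this example and do not reflect how circompara2 parses the file internally.

```python
import csv
import io
from collections import OrderedDict

REQUIRED_COLUMNS = ("file", "sample")


def group_reads_by_sample(handle):
    """Map each sample name to its read files and (optional) adapter file."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError("missing column(s): " + ", ".join(missing))
    samples = OrderedDict()
    for row in reader:
        entry = samples.setdefault(row["sample"], {"files": [], "adapter": None})
        entry["files"].append(row["file"])
        # The adapter column may be absent or empty.
        if row.get("adapter"):
            entry["adapter"] = row["adapter"]
    return samples


meta_text = (
    "file,sample,adapter\n"
    "/path/to/reads_S1_1.fq,S1,/path/to/adapter.fa\n"
    "/path/to/reads_S1_2.fq,S1,/path/to/adapter.fa\n"
)
samples = group_reads_by_sample(io.StringIO(meta_text))
for name, entry in samples.items():
    print(name, len(entry["files"]), entry["adapter"])
```

A sample with one file corresponds to single-end reads and a sample with two files to paired-end reads, so the printed counts give a quick sanity check of the table.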

Specify the reference genome file

The reference genome is a required parameter. You can either pass it on the command line:

./circompara2 "GENOME_FASTA='/home/user/genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa'"

or set the GENOME_FASTA parameter in the vars.py file, e.g.:

GENOME_FASTA = '/home/user/genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa'
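
When circompara2 is launched from a script, the command-line form shown above can be assembled programmatically. This is only a sketch: `build_command` is a hypothetical helper, and it simply reproduces the NAME='value' token that the shell passes to circompara2 when the whole assignment is wrapped in double quotes.

```python
def build_command(executable, params):
    """Return an argument list: the executable plus one NAME='value' token per parameter."""
    command = [executable]
    for name, value in params:
        # Keep the inner single quotes, as in the shell example.
        command.append("%s='%s'" % (name, value))
    return command


command = build_command(
    "./circompara2",
    [("GENOME_FASTA", "/home/user/genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa")],
)
print(" ".join(command))
```

The resulting list is meant to be handed to `subprocess.call(command)` without a shell, which avoids a second layer of quoting; whether the pipeline treats it exactly like the shell invocation is an assumption of this sketch.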

Specify options in vars.py

Although parameters can be set from the command line (surrounded by quotes), you can also set them in the vars.py file, which must be placed in the directory from which circompara2 is called.
The Parameters section below gives the full list.
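
As an illustration, a small vars.py might look like the sketch below. The parameter names come from the Parameters list; the annotation path, the CPU count, and the choice of methods are placeholders, and writing every value as a string is an assumption of this example.

```python
# vars.py: place this file in the directory from which circompara2 is called.
# All paths are placeholders; values are written as strings (an assumption).
META = 'meta.csv'
GENOME_FASTA = '/home/user/genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa'
ANNOTATION = '/path/to/annotation.gtf'
CPUS = '4'
CIRCRNA_METHODS = 'ciri,find_circ,dcc'
```

Any parameter left out of the file keeps the default value reported in the Parameters list.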

Parameters

META: The metadata table file where you specify the project samples, etc.
    default: meta.csv

ANNOTATION: Gene annotation file (like Ensembl GTF/GFF)
    default: 

GENOME_FASTA: The FASTA file with the reference genome
    default: 

CIRCRNA_METHODS: Comma separated list of circRNA detection methods to use. Repeated values will be collapsed into unique values. Currently supported: ciri, dcc, circrna_finder, find_circ, circexplorer2_star, circexplorer2_bwa, circexplorer2_tophat, circexplorer2_segemehl, testrealign (a.k.a. Segemehl). Set an empty string to use all methods available (including deprecated methods). 
    default: ciri,find_circ,circexplorer2_star,circexplorer2_bwa,circexplorer2_segemehl,circexplorer2_tophat,dcc

CPUS: Set number of CPUs
    default: 1

GENEPRED: The genome annotation in GenePred format
    default: 

GENOME_INDEX: The index of the reference genome for HISAT2
    default: 

SEGEMEHL_INDEX: The .idx index for segemehl
    default: 

BWA_INDEX: The index of the reference genome for BWA
    default: 

BOWTIE2_INDEX: The index of the reference genome for BOWTIE2
    default: 

STAR_INDEX: The path of the directory containing the STAR genome index
    default: 

BOWTIE_INDEX: The index of the reference genome for BOWTIE when using CIRCexplorer2_tophat
    default: 

HISAT2_EXTRA_PARAMS: Extra parameters to add to the HISAT2 aligner fixed parameters '--dta --dta-cufflinks --rg-id <SAMPLE> --no-discordant --no-mixed --no-overlap'. For instance, '--rna-strandness FR' if stranded reads are used.
    default: --seed 123

BWA_PARAMS: Extra parameters for BWA
    default: -T 19

SEGEMEHL_PARAMS: SEGEMEHL extra parameters
    default: -D 0

TOPHAT_PARAMS: Extra parameters to pass to TopHat
    default: 

STAR_PARAMS: Extra parameters to pass to STAR
    default: --runRNGseed 123 --outSJfilterOverhangMin 15 15 15 15 --alignSJoverhangMin 15 --alignSJDBoverhangMin 15 --seedSearchStartLmax 30 --outFilterScoreMin 1 --outFilterMatchNmin 1 --outFilterMismatchNmax 2 --chimSegmentMin 15 --chimScoreMin 15 --chimScoreSeparation 10 --chimJunctionOverhangMin 15

BOWTIE2_PARAMS: Extra parameters to pass to Bowtie2 in addition to -p $CPUS --reorder --score-min=C,-15,0 -q
    default: --seed 123

STRINGTIE_PARAMS: StringTie extra parameters. For instance, '--rf' assumes a stranded fr-firststrand library, to be used if dUTP stranded libraries were sequenced
    default:  

CIRI_EXTRA_PARAMS: CIRI additional parameters
    default: 

DCC_EXTRA_PARAMS: DCC additional parameters
    default: -fg -M -F -Nr 1 1 -N

CE2_PARAMS: Parameters to pass to CIRCexplorer2 annotate
    default:

TESTREALIGN_PARAMS: Segemehl/testrealign filtering parameters. -q indicates the minimum median quality of backsplice ends (like the Haarz parameter)
    default: -q median_1

FINDCIRC_EXTRA_PARAMS: Parameters for find_circ.py. Additional parameters: --best-qual INT filters find_circ results, keeping those with best_qual_left and best_qual_right fields >= INT (default: INT = 40); --filter-tags TAG filters the lines of the find_circ.py output (sites.bed). Repeat it if multiple consecutive filter tags have to be applied.
    default: --best-qual 40 --filter-tags UNAMBIGUOUS_BP --filter-tags ANCHOR_UNIQUE

CFINDER_EXTRA_PARAMS: Parameters for CircRNA_finder 
    default:

PREPROCESSOR: The read preprocessing tool to use. Currently, only "trimmomatic" is supported. Leave empty for no read preprocessing.
    default: 

PREPROCESSOR_PARAMS: Read preprocessor extra parameters. For instance, with Trimmomatic, an empty string defaults to MAXINFO:40:0.5 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:30 MINLEN:50 AVGQUAL:30
    default: 

LINEAR_EXPRESSION_METHODS: The method to be used for the linear expression estimates/transcriptome reconstruction. To run more than one method, use a comma-separated list. However, only the first method in the list will be used in downstream processing. Currently supported methods: stringtie,cufflinks,htseq.
    default: stringtie  

TOGGLE_TRANSCRIPTOME_RECONSTRUCTION: Set True to enable transcriptome reconstruction. By default, only genes and transcripts from the given annotation GTF file are quantified
    default: False

READSTAT_METHODS: Comma separated list of methods to use for read statistics. Currently supported: fastqc
    default: fastqc
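
The CIRCRNA_METHODS description above says that repeated values are collapsed and that an empty string selects every available method. The sketch below shows one way such a value could be interpreted; it is not the pipeline's own code. Preserving the input order, stripping whitespace, and rejecting unknown names are assumptions of this example, while the method names are copied from the parameter description.

```python
SUPPORTED_METHODS = [
    "ciri", "dcc", "circrna_finder", "find_circ",
    "circexplorer2_star", "circexplorer2_bwa",
    "circexplorer2_tophat", "circexplorer2_segemehl", "testrealign",
]


def normalize_methods(value):
    """Turn a comma-separated method string into a list without duplicates."""
    if not value.strip():
        # An empty string selects every method.
        return list(SUPPORTED_METHODS)
    selected = []
    for name in value.split(","):
        name = name.strip()
        if not name:
            continue
        if name not in SUPPORTED_METHODS:
            raise ValueError("unsupported circRNA method: %s" % name)
        if name not in selected:
            selected.append(name)
    return selected


print(normalize_methods("ciri,dcc,ciri"))
```

A check like this can catch a misspelled method name before a long run is started.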