CirComPara

:microscope: A multi-method comparative bioinformatics pipeline to detect and study circRNAs from RNA-seq data


Title: CirComPara
Subtitle: a multi-method comparative bioinformatics pipeline to detect and study circRNAs from RNA-seq data
Project: CirComPara
Author: Enrico Gaffo
Affiliation: Compgen - University of Padova
Web: http://compgen.bio.unipd.it
Date: December 21, 2016
output:
  html_document:
    keep_md: no
    number_sections: no
    toc: no

CirComPara

CirComPara is a computational pipeline to detect, quantify, and correlate expression of linear and circular RNAs from RNA-seq data.

<!--TODO: more exhaustive description -->

Quick install

Run the following commands to download and install, locally on your system, the scripts and tools required to run CirComPara. If something goes wrong during installation, try installing the software manually as described below.

Download and extract [the latest release of CirComPara][circompara_pack_link], or clone the Git repository, then enter the CirComPara directory and run the automatic installer script:

git clone http://github.com/egaffo/CirComPara
cd CirComPara
./install_circompara

Test your installation

NB: in the sed string, replace the /full/circompara/dir/path path with your installation directory

cd test_circompara/analysis
../../circompara

If you plan to use single-end reads, test with:

cd test_circompara/analysis_se
../../circompara

If you receive error messages, follow the instructions in the Installation troubleshooting section.

Add CirComPara to your environment

Once the installation is complete, you can update your PATH environment variable so that you do not have to type the whole path to the CirComPara executable each time. From the terminal, type the following command (replace the /path/to/circompara/install/dir string with CirComPara's actual path):

export PATH=/path/to/circompara/install/dir:$PATH

Another way is to link CirComPara's main script in your local bin directory:

cd /home/user/bin
ln -s /path/to/circompara/install/dir/circompara

CirComPara Docker image

A Docker image of CirComPara is available from DockerHub.

To pull the image:

docker pull egaffo/circompara-docker

You'll find the instructions on how to use the docker image at https://hub.docker.com/r/egaffo/circompara-docker.

How to use

Set your analysis project

This section shows how to set up your project directory and run the analysis. To run an analysis, you usually need to specify your data (the sequenced reads in FASTQ format) and a reference genome in FASTA format.

Compose META file

You have to specify the read files, sample names, and each sample's experimental condition in a metadata table file. The file is comma-separated text with the following header:

file,sample,condition

Then, each row corresponds to a read file. If you have paired-end sequenced samples write one line per file with the same sample name and condition.

An example of the metadata table:

file|sample|condition
----|------|---------
/home/user/reads_S1_1.fq|S1|WT
/home/user/reads_S1_2.fq|S1|WT
/home/user/reads_S2_1.fq|S2|MU
/home/user/reads_S2_2.fq|S2|MU

and metadata file content:

file,sample,condition
/home/user/reads_S1_1.fq,S1,WT
/home/user/reads_S1_2.fq,S1,WT
/home/user/reads_S2_1.fq,S2,MU
/home/user/reads_S2_2.fq,S2,MU
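The layout described above can be checked before launching a run. The following is a minimal sketch, not part of CirComPara: the `read_meta` helper and its checks (required columns present, one condition per sample, no file listed twice) are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ["file", "sample", "condition"]


def read_meta(handle):
    """Group read files by sample and check the metadata table layout."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    samples = defaultdict(lambda: {"condition": None, "files": []})
    for row in reader:
        entry = samples[row["sample"]]
        # A sample is expected to carry the same condition on every line.
        if entry["condition"] not in (None, row["condition"]):
            raise ValueError(f"sample {row['sample']} has more than one condition")
        # Paired-end mates should be two distinct files.
        if row["file"] in entry["files"]:
            raise ValueError(f"file listed twice: {row['file']}")
        entry["condition"] = row["condition"]
        entry["files"].append(row["file"])
    return dict(samples)


example = """file,sample,condition
/home/user/reads_S1_1.fq,S1,WT
/home/user/reads_S1_2.fq,S1,WT
/home/user/reads_S2_1.fq,S2,MU
/home/user/reads_S2_2.fq,S2,MU
"""

meta = read_meta(io.StringIO(example))
for sample, info in meta.items():
    print(sample, info["condition"], len(info["files"]))
```

To check a real table, pass an open file instead of the in-memory example, e.g. `read_meta(open("meta.csv", newline=""))`.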

In the meta file you can also specify adapter sequences for read preprocessing: just add an adapter column giving the adapter file.

file|sample|condition|adapter
----|------|---------|-------
/home/user/reads_S1_1.fq|S1|WT|/home/user/circompara/adapter.fa
/home/user/reads_S1_2.fq|S1|WT|/home/user/circompara/adapter.fa

Specify the reference genome file

A required parameter is the reference genome. You can either pass the reference genome from the command line

./circompara "GENOME_FASTA='/home/user/genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa'"

or by setting the GENOME_FASTA parameter in the vars.py file; e.g.:

GENOME_FASTA = '/home/user/genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa'
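When CirComPara is launched from a script rather than typed by hand, the quoted KEY='value' form shown in the command-line example can be assembled programmatically. This is only a sketch: `circompara_args` is a hypothetical helper, and it simply mirrors the argument shape used above without knowing how the pipeline parses it.

```python
def circompara_args(params):
    """Render parameters as KEY='value' tokens, one per command-line argument."""
    args = []
    for key, value in params.items():
        text = str(value)
        # The inner single quotes delimit the value, so a quote inside it
        # cannot be represented in this simple form.
        if "'" in text:
            raise ValueError(f"single quote not supported in value for {key}")
        args.append(f"{key}='{text}'")
    return args


params = {
    "GENOME_FASTA": "/home/user/genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa",
}
command = ["./circompara"] + circompara_args(params)

for token in command:
    print(token)

# To actually launch the run (assumes ./circompara exists in the current directory):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the list to `subprocess.run` avoids a second layer of shell quoting, because each token reaches the program exactly as built.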

Specify options in vars.py

Although parameters can be set from the command line (surrounded by quotes), you can also set them in the vars.py file, which must be placed in the directory from which CirComPara is called.
The full list of parameters follows:

META: The metadata table file where you specify the project samples, etc.
    default: meta.csv

ANNOTATION: Gene annotation file (like Ensembl GTF/GFF)
    default: 

GENOME_FASTA: The FASTA file with the reference genome
    default: 

CIRCRNA_METHODS: Comma separated list of circRNA detection methods to use. Repeated values will be collapsed into unique values. Currently supported: ciri, find_circ, circexplorer2_star, circexplorer2_bwa, circexplorer2_tophat, circexplorer2_segemehl, testrealign (unfiltered segemehl; use of circexplorer2_segemehl is recommended for a better filtering of segemehl predictions). Set an empty string to use all methods available (including deprecated methods). 
    default: ciri,find_circ,circexplorer2_star,circexplorer2_bwa,circexplorer2_segemehl

CPUS: Set number of CPUs
    default: 4

GENEPRED: The genome annotation in GenePred format
    default: 

GENOME_INDEX: The index of the reference genome for HISAT2
    default: 

SEGEMEHL_INDEX: The .idx index for segemehl
    default: 

BWA_INDEX: The index of the reference genome for BWA
    default: 

BOWTIE2_INDEX: The index of the reference genome for BOWTIE2
    default: 

STAR_INDEX: The path of the directory containing the STAR genome index
    default: 

BOWTIE_INDEX: The index of the reference genome for BOWTIE when using CIRCexplorer2_tophat
    default: 

HISAT2_EXTRA_PARAMS: Extra parameters to add to the HISAT2 aligner fixed parameters '--dta --dta-cufflinks --rg-id <SAMPLE> --no-discordant --no-mixed --no-overlap'. For instance, '--rna-strandness FR' if stranded reads are used.
    default: 

BWA_PARAMS: Extra parameters for BWA
    default: 

SEGEMEHL_PARAMS: SEGEMEHL extra parameters
    default: 

TOPHAT_PARAMS: Extra parameters to pass to TopHat
    default: 

STAR_PARAMS: Extra parameters to pass to STAR
    default: 

CUFFLINKS_PARAMS: Cufflinks extra parameters. E.g. '--library-type fr-firststrand' if a dUTP stranded library was used for sequencing
    default: 

CUFFQUANT_EXTRA_PARAMS: Cuffquant parameter options to specify. E.g. --frag-bias-correct $GENOME_FASTA  --multi-read-correct --max-bundle-frags 9999999
    default: 

CUFFDIFF_EXTRA_PARAMS: Cuffdiff parameter options to specify. E.g. --frag-bias-correct $GENOME_FASTA  --multi-read-correct
    default: 

CUFFNORM_EXTRA_PARAMS: Extra parameters to use if using Cuffnorm
    default: --output-format cuffdiff  

STRINGTIE_PARAMS: StringTie extra parameters. E.g. '--rf' assumes a stranded fr-firststrand library, to be used if a dUTP stranded library was sequenced
    default:  

CIRI_EXTRA_PARAMS: CIRI additional parameters
    default: 

PREPROCESSOR: The preprocessing method
    default: trimmomatic

PREPROCESSOR_PARAMS: Read preprocessor extra parameters. E.g. with Trimmomatic, an empty string defaults to MAXINFO:40:0.5 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:30 MINLEN:50 AVGQUAL:30
    default: 

LINEAR_EXPRESSION_METHODS: The method to be used for the linear expression estimates/transcriptome reconstruction. To run more methods use a comma separated list. However, only the first method in the list will be used in downstream processing. Currently supported methods: stringtie,cufflinks,htseq.  
    default: stringtie  

TOGGLE_TRANSCRIPTOME_RECONSTRUCTION: Set True to enable transcriptome reconstruction. Default only quantifies genes and transcripts from the given annotation GTF file
    default: False

DIFF_EXP: Set the method to use for, and thereby enable, differential expression computation for linear genes/transcripts. Currently supported methods: cufflinks, ballgown, DESeq2. Only available if more than one sample and more than one condition are given. N.B.: differential expression tests for circRNAs are not yet implemented
    default: 

READSTAT_METHODS: Comma separated list of methods to use for read statistics. Currently supported: fastqc,fastx
    default: fastqc

MIN_METHODS: Minimum number of methods that must detect the same circRNA for it to be considered reliable. If this value exceeds the number of methods specified, it will be set to the number of methods.
    default: 2

MIN_READS: Number of reads to consider a circRNA as expressed
    default: 2

BYPASS_LINEAR: Skip analysis of linear transcripts. This will also skip the analysis of linear-to-circular expression correlation
    default: False

CIRC_PE_MAPPING: By default, linearly unmapped reads are collapsed into single-end reads to search for circRNA backsplices. Set this option to "True" to force circRNA method aligners to maintain paired-end read alignment
    default: False
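The interplay between CIRCRNA_METHODS and MIN_METHODS described in the list above can be pictured with a small sketch. It is one reading of those two descriptions, not CirComPara's actual code: the function names and the circRNA identifiers are hypothetical, and the empty-string case of CIRCRNA_METHODS (meaning "all methods") is not handled.

```python
def effective_settings(circrna_methods, min_methods):
    """Collapse repeated method names and cap MIN_METHODS at the method count."""
    methods = []
    for name in circrna_methods.split(","):
        name = name.strip()
        if name and name not in methods:
            methods.append(name)
    # MIN_METHODS cannot exceed the number of distinct methods requested.
    return methods, min(min_methods, len(methods))


def reliable_circrnas(detections, min_methods):
    """Keep circRNAs reported by at least min_methods distinct methods."""
    return sorted(
        circ for circ, found_by in detections.items()
        if len(set(found_by)) >= min_methods
    )


methods, min_methods = effective_settings("ciri,find_circ,ciri", 5)

# Hypothetical detections: circRNA identifier -> methods that reported it.
detections = {
    "chr1:100-200": ["ciri", "find_circ"],
    "chr2:300-400": ["ciri"],
}

print(methods, min_methods)
print(reliable_circrnas(detections, min_methods))
```

Here the repeated `ciri` entry is collapsed, so a requested MIN_METHODS of 5 is lowered to the two methods actually in use.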

Run the analysis

To start the analysis, call the circompara script from within the analysis directory. If you use a vars.py option file, it must be in the analysis directory.

cd /home/user/circrna_analysis
/home/user/circompara/circompara
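For reference, a vars.py placed in the analysis directory might look like the sketch below. The parameter names and default values come from the list above; the genome path is the example path used earlier. Writing every value as a quoted string mirrors the GENOME_FASTA example and is an assumption, since this README does not state whether numeric or boolean literals are also accepted.

```python
# vars.py: hypothetical option file for /home/user/circrna_analysis
# All values are written as strings (an assumption, see the note above).

# Metadata table with the file,sample,condition header
META = 'meta.csv'

# Reference genome in FASTA format (required)
GENOME_FASTA = '/home/user/genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa'

# circRNA detection methods; this is the documented default list
CIRCRNA_METHODS = 'ciri,find_circ,circexplorer2_star,circexplorer2_bwa,circexplorer2_segemehl'

# Resources and filtering thresholds, set to their documented defaults
CPUS = '4'
MIN_METHODS = '2'
MIN_READS = '2'

# Keep the linear transcript analysis enabled
BYPASS_LINEAR = 'False'
```

Parameters left out of the file keep their defaults, so in practice only the values that differ from the defaults need to be listed.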

Additional options from the SCons engine:

  • Basic execution: run the analysis as a linear pipeline, i.e. no parallel task execution, and stop on errors
/path/to/circompara/dir/circompara
  • Show parameters: to show the parameters set before actually running the analysis, use -h:
/path/to/circompara/dir/circompara -h
  • Dryrun: to see which commands will be executed without actually executing them, use the -n option. NB: many commands will be listed, so you should redirect the output to a file or pipe it to a pager like less
/path/to/circompara/dir/circompara -n | less -SR
  • Multitasks: the -j option specifies how m
