SpliceLauncher

RNAseq pipeline for alternative splicing junctions


SpliceLauncher is a pipeline for studying alternative splicing. It works in three steps:

  • Get a read count matrix from FASTQ files using a dedicated RNAseq pipeline (step A in the diagram below).
  • Generate the data files used in the following steps (step B in the diagram below).
  • Run SpliceLauncher on the read count matrix (step C and onward in the diagram below).

[Diagram: SpliceLauncher workflow]


Repository contents<a id="1"></a>

  • dataTest: example of input files
  • scripts: complementary scripts to run SpliceLauncher

Prerequisites to install SpliceLauncher<a id="2"></a>

The SpliceLauncher pipeline requires the following tools and R libraries:

  • STAR (v2.7 or later)
  • samtools (v1.3 or later)
  • BEDtools (v2.17 or later)
  • R with WriteXLS and Cairo packages
  • Perl
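Before installing anything, it can help to see which of these tools are already available. The following is a minimal sketch, not part of SpliceLauncher: the helper names `version_ge` and `check_tools` are hypothetical, the executable names are assumed to be `STAR`, `samtools`, `bedtools`, `Rscript` and `perl`, and the version comparison relies on GNU `sort -V`.

```shell
#!/usr/bin/env bash
# Sketch: report which prerequisite executables are visible on PATH.
# Helper names are illustrative and are not provided by SpliceLauncher.

# version_ge A B: succeeds when version A is at least version B (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tools NAME...: print one tab-separated status line per executable name.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\tfound\t%s\n' "$tool" "$(command -v "$tool")"
        else
            printf '%s\tmissing\n' "$tool"
        fi
    done
}

check_tools STAR samtools bedtools Rscript perl

# Example comparison against a minimum from the list above.
if version_ge "2.25.0" "2.17"; then
    echo "BEDtools 2.25.0 satisfies the 2.17 minimum"
fi
```

The version string of each tool still has to be read from the tool itself (for example from its `--version` output) before it is passed to `version_ge`.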

STAR <a id="3"></a>

The following instructions are from the STAR manual.

Get the g++ compiler for Linux:

sudo apt-get update
sudo apt-get install g++
sudo apt-get install make

Download the latest release and uncompress it

# Get latest STAR source
version="2.7.0c"
wget https://github.com/alexdobin/STAR/archive/${version}.tar.gz
tar -xzf ${version}.tar.gz
cd STAR-${version}

# Alternatively, get STAR source using git
git clone https://github.com/alexdobin/STAR.git

Compile under Linux

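# Compile (path for the git clone; after the tarball download you are already in STAR-${version}, so use `cd source` instead)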
cd STAR/source
make STAR

Samtools <a id="4"></a>

Download the samtools package at: https://github.com/samtools/samtools/releases/latest

Configure and install samtools on Linux:

cd samtools-1.x
./configure --prefix=/where/to/install
make
make install

For more information, please see the samtools manual

BEDtools <a id="5"></a>

Install BEDtools on Linux:

wget https://github.com/arq5x/bedtools2/releases/download/v2.25.0/bedtools-2.25.0.tar.gz
tar -zxvf bedtools-2.25.0.tar.gz
cd bedtools2
make

For more information, please see the BEDtools tutorial

Install R libraries <a id="6"></a>

The WriteXLS library saves the results in xlsx format; if you do not want to install it, use the --txtOut option. The Cairo library prints the results in PDF format; if you do not want to install it, do not add the --Graphics option. Open the R console:

install.packages("WriteXLS")
install.packages("Cairo")

Installing SpliceLauncher <a id="7"></a>

Download the latest version of the SpliceLauncher source using git:

git clone https://github.com/LBGC-CFB/SpliceLauncher
cd ./SpliceLauncher

Singularity image <a id="8"></a>

SpliceLauncher and all its dependencies are also integrated in a Singularity image. As the config file inside the Singularity image is not writable, you need to use a local version of the config file:

  1. To build it:
sudo singularity build /path/to/SpliceLauncher.simg /path/to/splicelauncher.recipe
  2. To use it:
sudo singularity run /path/to/SpliceLauncher.simg --config /path/to/my_config.cfg --help

Download the reference files <a id="9"></a>

The reference files are the genome (Fasta) and the corresponding annotation file (GFF3):

  1. Reference genome in fasta format
  2. The annotation file in GFF v3 format

Steps:

  1. Download the FASTA genome, either from the RefSeq FTP server or from Gencode.

For example, human hg19 genome file from RefSeq:

# the URL depends on your choice of genome assembly
wget https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_genomic.fna.gz
gunzip ./GRCh37_latest_genomic.fna.gz
  2. Download the GFF annotation file, either from the RefSeq FTP server or from Gencode.

For example, human hg19 annotation file from RefSeq:

wget https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_genomic.gff.gz
gunzip ./GRCh37_latest_genomic.gff.gz
head ./GRCh37_latest_genomic.gff
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build GRCh37.p13
#!genome-build-accession NCBI_Assembly:GCF_000001405.25
#!annotation-date
#!annotation-source
##sequence-region NC_000001.10 1 249250621
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606
NC_000001.10	RefSeq	region	1	249250621	.	+	.	ID=id0;Dbxref=taxon:9606;Name=1;chromosome=1;gbkey=Src;genome=chromosome;mol_type=genomic DNA
NC_000001.10	BestRefSeq	gene	11874	14409	.	+	.	ID=gene0;Dbxref=GeneID:100287102,HGNC:HGNC:37102;Name=DDX11L1;description=DEAD/H-box helicase 11 like 1;gbkey=Gene;gene=DDX11L1;gene_biotype=misc_RNA;pseudo=true
NC_000001.10	BestRefSeq	transcript	11874	14409	.	+	.	ID=rna0;Parent=gene0;Dbxref=GeneID:100287102,Genbank:NR_046018.2,HGNC:HGNC:37102;Name=NR_046018.2;gbkey=misc_RNA;gene=DDX11L1;product=DEAD/H-box helicase 11 like 1;transcript_id=NR_046018.2
NC_000001.10	BestRefSeq	exon	11874	12227	.	+	.	ID=id1;Parent=rna0;Dbxref=GeneID:100287102,Genbank:NR_046018.2,HGNC:HGNC:37102;gbkey=misc_RNA;gene=DDX11L1;product=DEAD/H-box helicase 11 like 1;transcript_id=NR_046018.2
  3. [Optional] Convert the RefSeq contig names to UCSC chromosome names. You will need to download the assembly report. An example of this report is provided in the dataTest folder, but it is truncated, so do not use it for your own genome. For example, the assembly report for the RefSeq GRCh37 assembly can be downloaded from https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_assembly_report.txt Use the --assembly_report option of INSTALL mode to launch the conversion. This step usually takes one hour.

  4. [Optional] To reduce the memory needed, you can also restrict the analysis to the primary assembly, without unplaced contigs (the commands below use seqtk):

grep ">" GRCh37_latest_genomic.fna | grep -v "unplaced genomic contig"| grep -v "unlocalized genomic contig" | grep -v "genomic patch"| grep -v "alternate locus" | sed 's/^>//' > chr_names
seqtk subseq GRCh37_latest_genomic.fna chr_names > GRCh37_latest_genomic.sub.fna
cut -f 1 -d ' ' chr_names > chr_names_id
head -n 9 GRCh37_latest_genomic.gff > GRCh37_latest_genomic.sub.gff
grep -f chr_names_id GRCh37_latest_genomic.gff >> GRCh37_latest_genomic.sub.gff
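For the optional RefSeq-to-UCSC renaming step described above, it can be useful to look at the mapping the assembly report contains before launching the conversion. The sketch below is not how SpliceLauncher implements `--assembly_report`. It assumes the standard ten-column, tab-separated NCBI assembly report layout, with the RefSeq accession in column 7 and the UCSC-style name in column 10, and `refseq_to_ucsc_map` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: derive a RefSeq-accession -> UCSC-name table from an NCBI assembly report.
# Assumes column 7 = RefSeq-Accn and column 10 = UCSC-style-name.

# refseq_to_ucsc_map [FILE...]: reads the report (or stdin) and prints
# "RefSeq accession<TAB>UCSC name" for rows where both values are defined.
refseq_to_ucsc_map() {
    awk -F '\t' '
        { sub(/\r$/, "") }   # tolerate Windows line endings
        /^#/ { next }        # skip comment and header lines
        NF >= 10 && $7 != "na" && $10 != "na" { print $7 "\t" $10 }
    ' "$@"
}

# Usage (file name taken from the download URL above):
# refseq_to_ucsc_map GRCh37_latest_assembly_report.txt > refseq_to_ucsc.tsv
```

Rows without a UCSC-style name (marked `na`) are dropped here, so any sequence missing from the resulting table would keep its RefSeq name.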

Configure SpliceLauncher with INSTALL mode <a id="10"></a>

SpliceLauncher comes with a ready-to-use config.cfg file. It contains the paths of the software and files used by SpliceLauncher. The INSTALL mode of SpliceLauncher updates this config.cfg file. If you define the path to the GFF (v3) file and the path to the FASTA genome, the INSTALL mode will extract all necessary information from this GFF and index the STAR genome. This information is stored in a BED file that contains the exon coordinates, an sjdb file that contains the intron coordinates, and a text file that contains the details of the transcript structures. You need to define where these files will be saved with the -O, --output argument.

Use INSTALL mode of SpliceLauncher:

cd /path/to/SpliceLauncher/
mkdir ./refSpliceLauncher # Here this folder will contain the reference files used by SpliceLauncher
bash ./SpliceLauncher.sh --runMode INSTALL \
    -O ./refSpliceLauncher \
    --STAR /path/to/STAR \
    --samtools /path/to/samtools \
    --bedtools /path/to/bedtools \
    --gff /path/to/gff \
    --threads <number of threads> \
    --fasta /path/to/fasta
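Because INSTALL mode expects an uncompressed GFF v3 file and a FASTA genome, a quick check of both files before launching can save a failed run. This is a minimal sketch with hypothetical helper names (`is_gff3`, `is_fasta`, `check_install_inputs`). It only inspects the first line of each file, relying on the `##gff-version 3` header shown earlier and on the `>` that starts a FASTA record.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the reference files before launching INSTALL mode.
# Helper names are illustrative and are not provided by SpliceLauncher.

# is_gff3 FILE: succeeds when the first line declares GFF version 3.
is_gff3() {
    [ -r "$1" ] || return 1
    case "$(head -n 1 "$1" | tr -d '\r')" in
        "##gff-version 3"*) return 0 ;;
        *) return 1 ;;
    esac
}

# is_fasta FILE: succeeds when the first character of the file is '>'.
is_fasta() {
    [ -r "$1" ] || return 1
    [ "$(head -n 1 "$1" | cut -c 1)" = ">" ]
}

# check_install_inputs GFF FASTA: report problems on stderr, non-zero if any.
check_install_inputs() {
    status=0
    if ! is_gff3 "$1"; then
        echo "not a readable GFF v3 file: $1" >&2
        status=1
    fi
    if ! is_fasta "$2"; then
        echo "not a readable FASTA file: $2" >&2
        status=1
    fi
    return "$status"
}

# Usage (paths are placeholders):
# check_install_inputs /path/to/gff /path/to/fasta
```

A file that is still gzip-compressed fails both checks, which is a reminder to run `gunzip` first.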

Running the SpliceLauncher tests<a id="11"></a>

The example files are provided in dataTest. The example data are single-end RNAseq reads (1x75 bp) on BRCA1 and BRCA2 transcripts:

cd /path/to/SpliceLauncher
# All options after -O are optional
bash ./SpliceLauncher.sh --runMode Align,Count,SpliceLauncher -F ./dataTest/fastq/ -O ./testSpliceLauncher/ \
    -t <number of threads> \
    -m <allowed memory in bits> \
    --Graphics \
    --tmpDir /path/to/tmpDir # path to save tmp files during alignment

Output directory tree <a id="12"></a>

SpliceLauncher/outdir
├── Bam
│   ├── {sample}.Aligned.sortedByCoord.out.bam
│   ├── {sample}.Aligned.sortedByCoord.out.bam.csi
│   ├── {sample}.Aligned.sortedByCoord.out_juncs.bed
│   ├── {sample}.SJ.out.tab
│   └── ...
├── getClosestExons
│   ├── {sample}.Aligned.sortedByCoord.out.count
│   └── ...
├── {run name}_results
│   ├── {run name}_figures_output
│   │   ├── {run name}_{sample}.pdf
│   │   └── ...
│   ├── {run name}_outputSpliceLauncher.xlsx
│   └── {run name}.bed
├── {run name}.txt
└── {run name}_report_{run date}.txt

The results of SpliceLauncher analysis are in {run name}_results.
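After a run, the directory tree above can be checked with a short script. The sketch below counts the main output types using the file name patterns from the tree. `count_files` and `summarize_outdir` are hypothetical helpers, and the `.xlsx` pattern assumes the run did not use `--txtOut`.

```shell
#!/usr/bin/env bash
# Sketch: summarize the files produced in a SpliceLauncher output directory.
# Helper names are illustrative and are not provided by SpliceLauncher.

# count_files DIR PATTERN: number of regular files under DIR whose name matches PATTERN.
count_files() {
    if [ -d "$1" ]; then
        find "$1" -type f -name "$2" | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# summarize_outdir DIR: print tab-separated counts for the main output types.
summarize_outdir() {
    printf 'bam\t%s\n' "$(count_files "$1/Bam" '*.Aligned.sortedByCoord.out.bam')"
    printf 'count\t%s\n' "$(count_files "$1/getClosestExons" '*.count')"
    printf 'xlsx\t%s\n' "$(count_files "$1" '*_outputSpliceLauncher.xlsx')"
}

# Usage, with the output directory of the test run:
# summarize_outdir ./testSpliceLauncher
```

A count of zero for `bam` or `count` points to a problem in the Align or Count step rather than in the final SpliceLauncher step.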

The final results are in the file {run name}_outputSpliceLauncher.xlsx. The layout of this file is:

| Column names | Example | Description | |------------:
