SpliceLauncher
RNAseq pipeline for alternative splicing junctions
SpliceLauncher is a pipeline for studying alternative splicing. It works in three steps:
- Get a read count matrix from fastq files with a dedicated RNAseq pipeline (step A in the diagram below).
- Generate the data files used by the following steps (step B in the diagram below).
- Run SpliceLauncher from a read count matrix (step C onward in the diagram below).

Table of contents
- Repository contents
- Prerequisites to install SpliceLauncher
- Installing SpliceLauncher
- Running the SpliceLauncher tests
- SpliceLauncher options
- Authors
- License
Repository contents<a id="1"></a>
- dataTest: example of input files
- scripts: complementary scripts to run SpliceLauncher
Prerequisites to install SpliceLauncher<a id="2"></a>
SpliceLauncher requires the following tools and R libraries:
- STAR (v2.7 or later)
- samtools (v1.3 or later)
- BEDtools (v2.17 or later)
- R with WriteXLS and Cairo packages
- Perl
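Before installing anything, it can help to see which of these tools are already on the `PATH`. The sketch below is one way to do that. The function name `report_missing_tools` is hypothetical, and the executable names (`STAR`, `samtools`, `bedtools`, `Rscript`, `perl`) are assumed to be the usual command names for the tools listed above. It only checks presence, not the minimum versions.

```shell
#!/bin/sh
# Print the name of every tool that is NOT found on the PATH.
# report_missing_tools is a hypothetical helper, not part of SpliceLauncher.
report_missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumed executable names for the prerequisites listed above.
missing=$(report_missing_tools STAR samtools bedtools Rscript perl)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All prerequisite tools were found."
fi
```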
STAR <a id="3"></a>
The following instructions are taken from the STAR manual.
Get the g++ compiler for Linux:
sudo apt-get update
sudo apt-get install g++
sudo apt-get install make
Download the latest release and uncompress it
# Get latest STAR source
version="2.7.0c"
wget https://github.com/alexdobin/STAR/archive/${version}.tar.gz
tar -xzf ${version}.tar.gz
cd STAR-${version}
# Alternatively, get STAR source using git
git clone https://github.com/alexdobin/STAR.git
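Compile under Linux
# Compile (from the git clone; if you used the tarball, you are already in STAR-${version}, so run "cd source" instead)
cd STAR/source
make STAR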
Samtools <a id="4"></a>
Download the samtools package at: https://github.com/samtools/samtools/releases/latest
Configure samtools for linux:
cd samtools-1.x
./configure --prefix=/where/to/install
make
make install
For more information, please see the samtools manual
BEDtools <a id="5"></a>
Installation of BEDtools for linux:
wget https://github.com/arq5x/bedtools2/releases/download/v2.25.0/bedtools-2.25.0.tar.gz
tar -zxvf bedtools-2.25.0.tar.gz
cd bedtools2
make
For more information, please see the BEDtools tutorial
Install R libraries <a id="6"></a>
The WriteXLS library saves results in xlsx format; if you do not want to install it, use the --txtOut option. The Cairo library prints results in pdf format; if you do not want to install it, do not add the --Graphics option. To install both, open the R console:
install.packages("WriteXLS")
install.packages("Cairo")
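The choice between `--txtOut` and `--Graphics` described above can be derived from what is installed. The following is a minimal sketch, assuming `Rscript` is the R command-line front end on your system; `has_r_package` is a hypothetical helper, not part of SpliceLauncher.

```shell
#!/bin/sh
# has_r_package is a hypothetical helper: it succeeds only if Rscript is
# available and the named R package can be loaded.
has_r_package() {
    command -v Rscript >/dev/null 2>&1 || return 1
    Rscript -e "quit(status = if (requireNamespace('$1', quietly = TRUE)) 0 else 1)" >/dev/null 2>&1
}

extra_args=""
# Without WriteXLS, fall back to text output.
has_r_package WriteXLS || extra_args="$extra_args --txtOut"
# Only request graphics when Cairo is installed.
if has_r_package Cairo; then
    extra_args="$extra_args --Graphics"
fi

echo "Suggested extra SpliceLauncher options:$extra_args"
```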
Installing SpliceLauncher <a id="7"></a>
Download the latest SpliceLauncher source using git:
git clone https://github.com/LBGC-CFB/SpliceLauncher
cd ./SpliceLauncher
Singularity image <a id="8"></a>
SpliceLauncher and all its dependencies are also available as a Singularity image. Because the config file inside the image is not writable, you need to pass a local copy of the config file with --config:
- To build it:
sudo singularity build /path/to/SpliceLauncher.simg /path/to/splicelauncher.recipe
- To use it:
sudo singularity run /path/to/SpliceLauncher.simg --config /path/to/my_config.cfg --help
Download the reference files <a id="9"></a>
Two reference files are needed:
- The reference genome in FASTA format
- The corresponding annotation file in GFF v3 format
Steps:
For example, the human hg19 genome file from RefSeq:
# The FTP URL depends on the genome assembly you choose
wget https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_genomic.fna.gz
gunzip ./GRCh37_latest_genomic.fna.gz
For example, the human hg19 annotation file from RefSeq:
wget https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_genomic.gff.gz
gunzip ./GRCh37_latest_genomic.gff.gz
head ./GRCh37_latest_genomic.gff
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build GRCh37.p13
#!genome-build-accession NCBI_Assembly:GCF_000001405.25
#!annotation-date
#!annotation-source
##sequence-region NC_000001.10 1 249250621
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606
NC_000001.10 RefSeq region 1 249250621 . + . ID=id0;Dbxref=taxon:9606;Name=1;chromosome=1;gbkey=Src;genome=chromosome;mol_type=genomic DNA
NC_000001.10 BestRefSeq gene 11874 14409 . + . ID=gene0;Dbxref=GeneID:100287102,HGNC:HGNC:37102;Name=DDX11L1;description=DEAD/H-box helicase 11 like 1;gbkey=Gene;gene=DDX11L1;gene_biotype=misc_RNA;pseudo=true
NC_000001.10 BestRefSeq transcript 11874 14409 . + . ID=rna0;Parent=gene0;Dbxref=GeneID:100287102,Genbank:NR_046018.2,HGNC:HGNC:37102;Name=NR_046018.2;gbkey=misc_RNA;gene=DDX11L1;product=DEAD/H-box helicase 11 like 1;transcript_id=NR_046018.2
NC_000001.10 BestRefSeq exon 11874 12227 . + . ID=id1;Parent=rna0;Dbxref=GeneID:100287102,Genbank:NR_046018.2,HGNC:HGNC:37102;gbkey=misc_RNA;gene=DDX11L1;product=DEAD/H-box helicase 11 like 1;transcript_id=NR_046018.2
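As a quick sanity check of the downloaded annotation, you can count how many features of each type (gene, transcript, exon, ...) it contains. The sketch below assumes a tab-separated GFF3 file with the feature type in column 3; `count_gff_features` is a hypothetical helper and the toy file is for illustration only.

```shell
#!/bin/sh
# count_gff_features is a hypothetical helper: it tallies column 3
# (feature type) over all non-comment lines of a GFF3 file.
count_gff_features() {
    awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Toy two-feature annotation, for illustration only.
toy_gff=$(mktemp)
printf '##gff-version 3\n' > "$toy_gff"
printf 'NC_000001.10\tBestRefSeq\tgene\t11874\t14409\t.\t+\t.\tID=gene0\n' >> "$toy_gff"
printf 'NC_000001.10\tBestRefSeq\texon\t11874\t12227\t.\t+\t.\tID=id1;Parent=rna0\n' >> "$toy_gff"

count_gff_features "$toy_gff"
```

On the real file, the call would be `count_gff_features ./GRCh37_latest_genomic.gff`.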
- [Optional] To convert RefSeq contig names to UCSC chromosome names, you need to download the assembly report. An example report is provided in the dataTest folder, but it is truncated, so do not use it for your own genome. For example, the RefSeq assembly report for GRCh37 can be downloaded from https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_assembly_report.txt Use the --assembly_report option of the INSTALL mode to launch the conversion. This step usually takes about one hour.
- [Optional] To reduce memory requirements, you can also restrict the analysis to the primary assembly, excluding unplaced contigs, unlocalized contigs, genomic patches and alternate loci (this requires seqtk):
grep ">" GRCh37_latest_genomic.fna | grep -v "unplaced genomic contig"| grep -v "unlocalized genomic contig" | grep -v "genomic patch"| grep -v "alternate locus" | sed 's/^>//' > chr_names
seqtk subseq GRCh37_latest_genomic.fna chr_names > GRCh37_latest_genomic.sub.fna
cut -f 1 -d ' ' chr_names > chr_names_id
head -n 9 GRCh37_latest_genomic.gff > GRCh37_latest_genomic.sub.gff
grep -f chr_names_id GRCh37_latest_genomic.gff >> GRCh37_latest_genomic.sub.gff
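Note that `grep -f` treats each ID as a regular expression that may match anywhere on a line, so a feature on another contig that merely mentions a kept ID in its attributes would also be retained. A stricter variant, sketched here on toy data, keeps only features whose first column is exactly one of the wanted IDs. The helper name `filter_gff_by_seqid` and the toy file names are hypothetical; header lines are skipped and would still need to be copied separately, as done with `head` above.

```shell
#!/bin/sh
# filter_gff_by_seqid is a hypothetical helper.
# $1: file with one sequence ID per line, $2: GFF3 file.
filter_gff_by_seqid() {
    awk -F '\t' '
        NR == FNR { keep[$1] = 1; next }   # first file: remember wanted IDs
        /^#/      { next }                 # skip header and comment lines
        ($1 in keep)                       # print features on wanted sequences
    ' "$1" "$2"
}

# Toy inputs, for illustration only.
workdir=$(mktemp -d)
printf 'NC_000001.10\n' > "$workdir/chr_names_id"
printf '##gff-version 3\n' > "$workdir/toy.gff"
printf 'NC_000001.10\tRefSeq\tgene\t11874\t14409\t.\t+\t.\tID=gene0\n' >> "$workdir/toy.gff"
printf 'NW_003315905.1\tRefSeq\tgene\t100\t200\t.\t+\t.\tID=gene1;Note=near NC_000001.10\n' >> "$workdir/toy.gff"

filter_gff_by_seqid "$workdir/chr_names_id" "$workdir/toy.gff" > "$workdir/toy.sub.gff"
cat "$workdir/toy.sub.gff"
```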
Configure SpliceLauncher with INSTALL mode <a id="10"></a>
SpliceLauncher comes with a ready-to-use config.cfg file, which contains the paths to the software and files used by SpliceLauncher. The INSTALL mode updates this file. If you provide the path to the GFF (v3) file and the path to the FASTA genome, the INSTALL mode extracts all necessary information from the GFF and indexes the genome for STAR. This information is stored in a BED file containing the exon coordinates, an sjdb file containing the intron coordinates, and a text file describing the transcript structures. Use the -O, --output argument to define where these files are saved.
Use INSTALL mode of SpliceLauncher:
cd /path/to/SpliceLauncher/
mkdir ./refSpliceLauncher # Here this folder will contain the reference files used by SpliceLauncher
bash ./SpliceLauncher.sh --runMode INSTALL \
-O ./refSpliceLauncher \
--STAR /path/to/STAR \
--samtools /path/to/samtools \
--bedtools /path/to/bedtools \
--gff /path/to/gff \
--threads <number of threads> \
--fasta /path/to/fasta
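Because the INSTALL mode can run for a long time, a quick check of the two reference files beforehand may save a failed run. This is a minimal sketch, assuming the GFF starts with the `##gff-version 3` pragma and the FASTA starts with a `>` header line; `check_reference_files` is a hypothetical helper, not part of SpliceLauncher.

```shell
#!/bin/sh
# check_reference_files is a hypothetical helper, not part of SpliceLauncher.
# $1: GFF3 annotation, $2: FASTA genome. Prints problems; returns 1 if any.
check_reference_files() {
    rc=0
    if [ ! -s "$1" ]; then
        echo "GFF file is missing or empty: $1"; rc=1
    elif ! head -n 1 "$1" | grep -q '^##gff-version 3'; then
        echo "GFF file does not start with '##gff-version 3': $1"; rc=1
    fi
    if [ ! -s "$2" ]; then
        echo "FASTA file is missing or empty: $2"; rc=1
    elif ! head -n 1 "$2" | grep -q '^>'; then
        echo "FASTA file does not start with a '>' header: $2"; rc=1
    fi
    return $rc
}

# Toy files, for illustration only.
tmp=$(mktemp -d)
printf '##gff-version 3\n' > "$tmp/ok.gff"
printf '>chr_toy\nACGT\n' > "$tmp/ok.fna"
check_reference_files "$tmp/ok.gff" "$tmp/ok.fna" && echo "Reference files look usable."
```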
Running the SpliceLauncher tests<a id="11"></a>
Example files are provided in dataTest. The example data are single-end RNAseq reads (1x75 bp) covering the BRCA1 and BRCA2 transcripts:
cd /path/to/SpliceLauncher
bash ./SpliceLauncher.sh --runMode Align,Count,SpliceLauncher -F ./dataTest/fastq/ -O ./testSpliceLauncher/
# Optional arguments that can be appended to the command above:
# -t <number of threads>
# -m <allowed memory in bits>
# --Graphics
# --tmpDir /path/to/tmpDir (path to save temporary files during alignment)
Output directory tree <a id="12"></a>
SpliceLauncher/outdir
├── Bam
│   ├── {sample}.Aligned.sortedByCoord.out.bam
│   ├── {sample}.Aligned.sortedByCoord.out.bam.csi
│   ├── {sample}.Aligned.sortedByCoord.out_juncs.bed
│   ├── {sample}.SJ.out.tab
│   └── ...
├── getClosestExons
│   ├── {sample}.Aligned.sortedByCoord.out.count
│   └── ...
├── {run name}_results
│   ├── {run name}_figures_output
│   │   ├── {run name}_{sample}.pdf
│   │   └── ...
│   ├── {run name}_outputSpliceLauncher.xlsx
│   └── {run name}.bed
├── {run name}.txt
└── {run name}_report_{run date}.txt
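After a run, the fixed-name subdirectories from the tree above can be checked quickly. The sketch below only looks for `Bam` and `getClosestExons`, since the other entries depend on the run name; `list_missing_outputs` is a hypothetical helper and the toy directory is for illustration only.

```shell
#!/bin/sh
# list_missing_outputs is a hypothetical helper: given an output directory,
# it prints each expected fixed-name subdirectory that is absent.
list_missing_outputs() {
    for sub in Bam getClosestExons; do
        [ -d "$1/$sub" ] || echo "$sub"
    done
}

# Toy output directory containing only Bam, for illustration.
outdir=$(mktemp -d)
mkdir "$outdir/Bam"
list_missing_outputs "$outdir"
```

For the test run described earlier, the call would be `list_missing_outputs ./testSpliceLauncher`.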
The results of SpliceLauncher analysis are in {run name}_results.
The final results are displayed in the file {run name}_outputSpliceLauncher.xlsx. The scheme of this file is:
| Column names | Example | Description | |------------:
