MntJULiP

MntJULiP is a program for comprehensively and accurately quantifying splicing differences at intron level from collections of RNA-seq data. MntJULiP detects both differences in intron abundance levels, or differential splicing abundance (DSA), and differences in intron splicing ratios relative to the local gene output, or differential splicing ratio (DSR). MntJULiP uses a Bayesian mixture model that allows comparison of multiple conditions, and can model the effects of covariates to enable analyses of large and complex human data from disease cohorts and population studies.

Described in:

For the original version of MntJULiP, please refer to the original branch.

Copyright (C) 2019-2025, and GNU GPL v3.0, by †Wui Wang Lui, †Guangyu Yang, Liliana Florea († equal contributors)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

<a name="table-of-contents"></a> Table of contents

<a name="what-is-mntjulip"></a> What is MntJULiP?

MntJULiP is a high-performance Python package for comprehensive and accurate quantification of splicing differences from RNA-seq data. It uses Bayesian mixture models to detect changes in splicing ratios (differential splicing ratio, DSR) and in absolute splicing levels (differential splicing abundance, DSA). Its statistical underpinnings include a Dirichlet multinomial mixture model, to test for differences in the splicing ratio, and a zero-inflated negative binomial mixture model, to test for differential splicing abundance. MntJULiP works at the level of introns, or splice junctions; it is therefore assembly-free and can be used with or without a reference annotation. MntJULiP can perform multi-way comparisons, which may be desirable for complex and time-series experiments. Additionally, it can model confounders such as age, sex, BMI and others, and remove their biases from the data to allow for accurate comparisons. MntJULiP is fully scalable, and can work with data sets ranging from a few to hundreds or thousands of RNA-seq samples.
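To make the DSR/DSA distinction concrete, here is a minimal sketch with made-up read counts. It is only an illustration of the two quantities, not MntJULiP's Bayesian model; the function `splicing_ratios`, the intron names, and the counts are all hypothetical.

```python
def splicing_ratios(counts):
    """Return each intron's share of the total reads in its group."""
    total = sum(counts.values())
    if total == 0:
        return {intron: 0.0 for intron in counts}
    return {intron: n / total for intron, n in counts.items()}

# Hypothetical read counts for two introns in one group, per condition.
control = {"intron_a": 80, "intron_b": 20}
case = {"intron_a": 160, "intron_b": 40}

# Abundance doubles in 'case' (a DSA-like change) ...
print(control, case)
# ... but each intron's share of the group is unchanged (no DSR-like change).
print(splicing_ratios(control))
print(splicing_ratios(case))
```

In this toy case both introns double in abundance while their ratios stay the same, which is the kind of change a ratio-only test would miss and an abundance test would report.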

MntJULiP was designed to be compatible with alignments generated by STAR or Tophat2, but can work with output from other aligners.

Features

  • A novel assembly-free model for detecting differential intron abundance and differential intron splicing ratio across samples;
  • Allows multi-way comparisons;
  • Incorporates the treatment of covariates within the models;
  • Can be used with or without a reference annotation;
  • Multi-threaded and highly scalable, can process hundreds of samples in hours.

<a name="installation"></a> Installation

MntJULiP is written in Python. You can install the latest version from our GitHub repository. To download the code, clone the repository:

git clone https://github.com/splicebox/MntJULiP.git

System requirements

  • Linux
  • Python 3.7 or later
  • gcc 6.4.0 or later

Prerequisites:

MntJULiP has the following dependencies:

  • PyStan, a package for statistical modeling and high-performance statistical computation.
  • NumPy, a fundamental package for scientific computing with Python.
  • SciPy, a Python-based package for mathematics, science, and engineering.  
  • Statsmodels, a Python module for the estimation of different statistical models, conducting statistical tests and data exploration.  
  • Pandas, a fast, powerful, flexible and easy to use open source data analysis and manipulation tool.
  • Dask, a Python package that provides advanced parallelism for analytics.
  • scikit-learn, a Python package of simple and efficient tools for predictive data analysis.

The required packages may be installed using conda:

cd MntJULiP
conda env create -f mntjulip_cov.yml
conda activate mj-cov
#  run setup.py to install MntJULiP and all the required packages
python3 setup.py install  
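
After installation, a quick check that the dependencies can be imported may save a failed run. The sketch below is not part of MntJULiP, and the import names are assumptions: for example, PyStan is imported as `pystan` in version 2 but as `stan` in version 3, and scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed above; adjust to your environment.
REQUIRED = ["pystan", "numpy", "scipy", "statsmodels", "pandas", "dask", "sklearn"]

def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if find_spec(name) is None]

missing = missing_modules(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```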

<a name="usage"></a> Usage

Usage: python run.py [options] [--bam-list bam_file_list.txt | --splice-list splice_file_list.txt]

required arguments:
  --bam-list BAM_LIST   a text file that contains the list of the BAM file
                        paths and sample conditions
  OR
  --splice-list SPLICE_LIST
                        a text file that contains the list of the SPLICE file
                        paths and sample conditions
optional arguments:
  --anno-file ANNO_FILE
                        annotation file in GTF format
  --out-dir OUT_DIR     output folder to store the results and temporary files (default: ./out)
  --num-threads NUM_THREADS
                        number of CPU cores used to run the program (default: 4)
  --raw-counts-only     output sample-level raw values only (default: both raw and estimated values)
  -v, --version         show program's version number and exit
  -h, --help            show this help message and exit

Test run MntJULiP with test data:

cd test_data
python ../run.py --splice-list splice.4F1M.cov 

Here is an example of how to run MntJULiP with a set of alignment files and the GENCODE annotation:

ANNO_FILE="gencode.v22.annotation.gtf"
BAM_LIST="bam_file_list.txt"
python3 run.py --bam-list ${BAM_LIST} \
               --anno-file ${ANNO_FILE} \
               --num-threads 8

The 'bam_file_list.txt' is a .txt file with columns separated by tabs ('\t'); covariate columns are optional. Here is an example:

sample	condition	covariate1	covariate2
sample1.bam	control	Male	18
sample2.bam	case	Female	49
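
A list file in this layout can be generated with Python's standard `csv` module. This is a minimal sketch that reproduces the example rows above; the helper name `write_sample_list` is hypothetical, and the header line is included only because the example shows one.

```python
import csv
import io

# Rows mirror the example above: file path, condition, then optional covariates.
rows = [
    ("sample", "condition", "covariate1", "covariate2"),
    ("sample1.bam", "control", "Male", "18"),
    ("sample2.bam", "case", "Female", "49"),
]

def write_sample_list(rows, handle):
    """Write rows as tab-separated lines to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

# Written to an in-memory buffer here; pass an open file to save to disk.
buffer = io.StringIO()
write_sample_list(rows, buffer)
print(buffer.getvalue(), end="")
```

To write the file itself, open it with `open("bam_file_list.txt", "w", newline="")` and pass the handle to `write_sample_list`.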

Note that in the current version, MntJULiP automatically determines a reference condition, by sorting the conditions and choosing the lexicographically smallest one.
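
The rule can be sketched in one line of Python. This illustrates the stated behavior and is not MntJULiP's own code; it assumes plain string comparison, as Python's `min` performs.

```python
# Condition labels as they would appear in the second column of the list file.
conditions = ["control", "case", "case", "control"]

# The lexicographically smallest label becomes the reference.
reference = min(set(conditions))
print(reference)  # prints "case"
```

With the labels used in the examples here, 'case' sorts before 'control', so 'case' would be picked as the reference. If a specific reference is wanted, one option is to name the conditions so that the intended reference sorts first.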

Extracting the splice junctions and their read counts needed for quantification from the .bam files is the first and most time consuming step. It may be beneficial to avoid recalculating the values on subsequent runs of the data or subgroups. Also, in some cases the .bam files may not be available, or the splice junctions can be obtained from other sources. For efficient processing and to accommodate such cases, MntJULiP can work directly with the .splice files, instead of the .bam files, as input.

Here is an example of how to run MntJULiP with the GENCODE annotation and the splice file list:

SPLICE_LIST="splice_file_list.txt"
ANNO_FILE="gencode.v22.annotation.gtf.gz"
python run.py --splice-list ${SPLICE_LIST} \
               --anno-file ${ANNO_FILE} \
               --num-threads 8 

The 'splice_file_list.txt' is a .txt file with columns separated by tabs ('\t'); covariate columns are optional. Here is an example:

sample	condition	covariate1	covariate2
sample1.splice.gz	control	Male	18
sample2.splice.gz	case	Female	49

The .splice file for a given sample is a space-separated file with at least 5 columns, "chrom start end count strand", and no header line:

chr1 1311924 1312018 100 -

A splice file may have additional columns; for instance, those generated by the junc tool included in this package will distinguish the numbers of uniquely and multimapped supporting reads:

chr1 1311924 1312018 100 - 67 33 ...
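
For downstream scripting, a .splice line can be parsed by reading the first five fields and ignoring any extras. This is a minimal sketch based only on the format described above; `Junction` and `parse_splice_line` are hypothetical names, and integer coordinates and counts are assumed from the examples.

```python
from typing import NamedTuple

class Junction(NamedTuple):
    chrom: str
    start: int
    end: int
    count: int
    strand: str

def parse_splice_line(line):
    """Parse the five mandatory columns; additional columns are ignored."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, start, end, count, strand = fields[:5]
    return Junction(chrom, int(start), int(end), int(count), strand)

print(parse_splice_line("chr1 1311924 1312018 100 -"))
print(parse_splice_line("chr1 1311924 1312018 100 - 67 33"))
```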

<a name="inputoutput"></a> Input/Output

Input

The main input of MntJULiP is a list of BAM files containing RNA-seq read alignments. The BAM files can be generated by STAR, with or without the '--outSAMstrandField intronMotif' option.

STAR --genomeDir ${STARIDX_DIR} \
     --outSAMstrandField intronMotif \
     --readFilesIn ${DATA_DIR}/${name}_1.fastq ${DATA_DIR}/${name}_2.fastq \
     --outSAMtype BAM SortedByCoordinate \
     --outFileNamePrefix ${WORK_DIR}/${name}/

Alternatively, MntJULiP can take as input .splice files generated by an external program, such as the junc tool included in this package, or other versions from our PsiCLASS or CLASS2 packages.

junc my_alignment_file.bam -a > my_alignment_file.splice

Output

Output generated by MntJULiP includes five types of files:

  • diff_spliced_introns.txt and diff_spliced_groups.txt: contain the results of the differential intron splicing ratio (DSR) analysis;
  • diff_introns.txt: contains the results of the differential intron abundance (DSA) analysis;
  • intron_data.txt and group_data.txt: contains information about the introns (splice junctions), including genomic coordinates, raw and estimated read counts, average abundance levels, etc.; and respectively gro