IsoRefiner

IsoRefiner is a refinement tool to identify exon-intron structures of transcript (RNA) isoforms using long reads. It employs multiple transcript-identification tools, filters erroneous structures, merges results from the tools, and constructs the final dataset including novel transcript structures. Its inputs are long reads and reference data (genome and annotation), and it outputs a refined dataset (GTF file). We tested IsoRefiner using Oxford Nanopore cDNA reads, although it can potentially accept other types of reads such as PacBio. We have submitted a paper describing the IsoRefiner algorithm, and it is under review.

Publication

Tanaka Y., Sunamura N., Kajitani R., Ikeguchi M., and Kunimoto R. Long-read RNA sequencing unveils a novel cryptic exon in MNAT1 along with its full-length transcript structure in TDP-43 proteinopathy. Communications Biology 8, 1056 (2025). https://doi.org/10.1038/s42003-025-08463-4

Installation

We tested IsoRefiner on Linux x86_64 environments. After installation, the isorefiner command is available.

Bioconda

conda install -y -c conda-forge -c bioconda isorefiner

We used Miniconda. Resolving dependencies may take a long time; using mamba instead of conda can shorten it.
Alternatively, you can install IsoRefiner and create a dedicated environment in one step:

conda create -y -c conda-forge -c bioconda -n isorefiner_env python=3.12.8 isorefiner
conda activate isorefiner_env

Pinning python=3.12.8 reduces the time needed to resolve dependencies.
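As a convenience, the environment-creation command above could be wrapped so that mamba is used when it is on PATH and conda otherwise. This is a minimal sketch, not part of IsoRefiner; the function names and the DRY_RUN switch are hypothetical, and by default the command is only printed.

```bash
#!/usr/bin/env bash
# Prefer mamba (faster dependency solving) when installed, otherwise conda.
pick_solver() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    else
        echo conda
    fi
}

# Build the README's create command; print it unless DRY_RUN=0 (hypothetical switch).
create_isorefiner_env() {
    local solver
    solver=$(pick_solver)
    local cmd=("$solver" create -y -c conda-forge -c bioconda -n isorefiner_env python=3.12.8 isorefiner)
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "${cmd[@]}"
    else
        echo "${cmd[*]}"
    fi
}

create_isorefiner_env
```

To actually create the environment, run `DRY_RUN=0 create_isorefiner_env`, then `conda activate isorefiner_env` as shown above.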

Docker

docker pull rkajitani/isorefiner

Start a container interactively:

docker run -it -v $(pwd):/work -w /work rkajitani/isorefiner /bin/bash

Or, run isorefiner as a command:

docker run -v $(pwd):/work -w /work --rm rkajitani/isorefiner isorefiner ...

The directory binding (-v $(pwd):/work -w /work) can be changed as needed.
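For repeated use, the docker run command could be wrapped in a shell function. The sketch below is hypothetical (the function name, the ISOREFINER_HOST_DIR override, and the DRY_RUN switch are not part of IsoRefiner); it binds a host directory to /work exactly as in the commands above and prints the command unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Run isorefiner inside the rkajitani/isorefiner image with the host
# directory (default: current directory) bound to /work.
isorefiner_docker() {
    local host_dir="${ISOREFINER_HOST_DIR:-$(pwd)}"   # hypothetical override
    local cmd=(docker run --rm -v "${host_dir}:/work" -w /work rkajitani/isorefiner isorefiner "$@")
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "${cmd[@]}"
    else
        echo "${cmd[*]}"
    fi
}

# Example: the quick-start workflow, executed through Docker.
isorefiner_docker trans_struct_wf -r reads.fastq -g genome.fasta -a ref_annot.gtf -t 32
```

Because only the bound directory is visible inside the container, input files should be given as paths relative to it (or as /work/...), not as absolute host paths.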

Dependency

The required tools are listed in the conda environment YAML file, and all of the required bioinformatics tools can be installed from the Bioconda channel.
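Before a long run, it can help to confirm that the tools are on PATH. This is a minimal sketch; the executable names passed below (for example porechop_abi) are assumptions, and the authoritative list is the conda YAML file.

```bash
#!/usr/bin/env bash
# Print each given command that is not found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names; adjust to match the conda YAML file.
missing=$(missing_tools isorefiner minimap2 samtools porechop_abi)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
fi
```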

Test

cd test/isorefiner
bash cmd.sh

The test directory contains a dataset and a script. If the test succeeds, isorefiner_refined.gtf is produced. The test dataset was generated with the simulator SQANTI-SIM.
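Beyond checking that isorefiner_refined.gtf exists, a quick structural check can catch truncated or malformed output. The sketch below is a hypothetical helper, not part of the test script; it only verifies that the file is non-empty and that every non-comment, non-blank line has the 9 tab-separated GTF columns.

```bash
#!/usr/bin/env bash
# Minimal GTF sanity check: file exists, is non-empty, and every
# non-comment, non-blank line has exactly 9 tab-separated columns.
check_gtf() {
    local gtf="$1" bad
    if [ ! -s "$gtf" ]; then
        echo "missing or empty: $gtf" >&2
        return 1
    fi
    # Report the first offending line number, if any.
    bad=$(awk -F'\t' 'NF > 0 && !/^#/ && NF != 9 { print NR; exit }' "$gtf")
    if [ -n "$bad" ]; then
        echo "line $bad does not have 9 columns: $gtf" >&2
        return 1
    fi
    return 0
}

# Example, assuming the test writes its output to the current directory:
# cd test/isorefiner && bash cmd.sh && check_gtf isorefiner_refined.gtf && echo OK
```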

Quick start

isorefiner trans_struct_wf -r reads.fastq -g genome.fasta -a ref_annot.gtf -t 32

This command runs the transcript-structure refinement workflow, which calls the IsoRefiner subcommands internally. reads.fastq is the input long-read file (FASTQ or FASTA, gzip allowed). Multiple files can be specified as a space-delimited string (e.g., "reads_1.fastq reads_2.fastq"). genome.fasta and ref_annot.gtf are the reference genome and annotation, respectively. -t 32 sets the number of threads to 32. The final result is isorefiner_refined.gtf.
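When there are many read files, a bash array keeps file names intact and passes each one as a separate argument, matching the `-r [READS ...]` usage line. This sketch assumes a hypothetical reads/ directory of .fastq.gz files and only prints the assembled command.

```bash
#!/usr/bin/env bash
# Assemble the quick-start command for any number of read files.
build_wf_cmd() {
    local threads="$1"; shift
    local cmd=(isorefiner trans_struct_wf -r "$@" -g genome.fasta -a ref_annot.gtf -t "$threads")
    printf '%s\n' "${cmd[*]}"
}

# Collect reads from a hypothetical reads/ directory.
reads=()
for f in reads/*.fastq.gz; do
    [ -e "$f" ] && reads+=("$f")   # skip the literal pattern when nothing matches
done

if [ "${#reads[@]}" -gt 0 ]; then
    build_wf_cmd 32 "${reads[@]}"
fi
```

To execute instead of print, run `"${cmd[@]}"` inside the function rather than printing it; using the array form avoids word-splitting problems with unusual file names.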

Workflow command usage

The command below runs an end-to-end workflow that calls the subcommands internally. Although detailed parameters for the internal steps cannot be specified, it is a convenient way to run the whole pipeline without writing a complex shell script. Intermediate files are written to the directory isorefiner_{command}_work (default) or to the directory given by -d, and a log file named log.txt is created in the same directory.
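To follow progress, the log can be located from the subcommand name. The helper below is hypothetical; its naming rule is inferred from the defaults listed in this README (for example, trim uses isorefiner_trim_work and run_bambu uses isorefiner_bambu_work, so a leading "run_" is dropped), and an explicit -d value takes precedence.

```bash
#!/usr/bin/env bash
# Default work directory for an isorefiner subcommand (inferred rule),
# or the custom directory if one was passed with -d.
work_dir_for() {
    local cmd="$1" custom="${2:-}"
    if [ -n "$custom" ]; then
        echo "$custom"
    else
        echo "isorefiner_${cmd#run_}_work"
    fi
}

# Example: follow the workflow log while it runs.
# tail -f "$(work_dir_for trans_struct_wf)/log.txt"
```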

trans_struct_wf

Workflow of transcript-structure refinement.

isorefiner trans_struct_wf [-h] -r [READS ...] -g GENOME -a REF_GTF [-o OUT_GTF] [-d WORK_DIR] [-t THREADS]

options:
  -h, --help            show this help message and exit
  -r [READS ...], --reads [READS ...]
                        Reads (FASTQ or FASTA, gzip allowed, mandatory) (default: None)
  -g GENOME, --genome GENOME
                        Reference genome (FASTA, mandatory) (default: None)
  -a REF_GTF, --ref_gtf REF_GTF
                        Reference genome annotation (GTF, mandatory) (default: None)
  -o OUT_GTF, --out_gtf OUT_GTF
                        Final output file name (GTF) (default: isorefiner_refined.gtf)
  -d WORK_DIR, --work_dir WORK_DIR
                        Working directory containing intermediate and log files (default: isorefiner_trans_struct_wf_work)
  -t THREADS, --threads THREADS
                        Number of threads (default: 1)

output: isorefiner_refined.gtf (-o argument)
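A quick summary of the final GTF can be obtained with awk. This sketch makes no assumption about which feature types IsoRefiner writes; it counts whatever appears in column 3 and counts distinct transcript_id values from the attribute column.

```bash
#!/usr/bin/env bash
# Count features by type (GTF column 3), sorted by type name.
count_features() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Count distinct transcript_id values in the attribute column (column 9).
count_transcripts() {
    awk -F'\t' '!/^#/ && NF >= 9 {
        if (match($9, /transcript_id "[^"]*"/)) ids[substr($9, RSTART, RLENGTH)] = 1
    } END { c = 0; for (k in ids) c++; print c }' "$1"
}

# Example:
# count_features isorefiner_refined.gtf
# count_transcripts isorefiner_refined.gtf
```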

Command usage for each step

Each command below corresponds to a specific step of the workflow. To set detailed parameters, run these commands directly with the relevant options. An example step-by-step procedure is provided in step_by_step.sh. Intermediate files are written to the directory isorefiner_{command}_work (default) or to the directory given by -d, and a log file named log.txt is created in the same directory.
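The first steps can be chained using the default output names documented below. This is a hypothetical sketch, not the contents of step_by_step.sh: it assumes a single .fastq input (so trim writes isorefiner_trimmed.fastq and map writes isorefiner_mapped.bam) and only prints the commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Print a command, or execute it when DRY_RUN=0 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# trim -> map -> two mapping-based tools, using documented default outputs.
step_by_step() {
    local reads="$1" genome="$2" annot="$3" threads="${4:-8}"
    run isorefiner trim -r "$reads" -t "$threads"
    # Assumes a single .fastq input, so the trimmed file keeps the .fastq extension.
    run isorefiner map -r isorefiner_trimmed.fastq -g "$genome" -t "$threads"
    run isorefiner run_bambu -b isorefiner_mapped.bam -g "$genome" -a "$annot" -t "$threads"
    run isorefiner run_espresso -b isorefiner_mapped.bam -g "$genome" -a "$annot" -t "$threads"
}

step_by_step reads.fastq genome.fasta ref_annot.gtf 32
```

Later steps (run_isoquant and the filtering and merging that produce isorefiner_refined.gtf) are omitted because their command lines are not documented above.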

trim

Trim Nanopore reads using Porechop_ABI.

isorefiner trim [-h] -r [READS ...] [-o OUT_PREFIX] [-d WORK_DIR] [-t THREADS] [-p TOOL_OPTION]

options:
  -h, --help            show this help message and exit
  -r [READS ...], --reads [READS ...]
                        Reads (FASTQ or FASTA, gzip allowed, mandatory) (default: None)
  -o OUT_PREFIX, --out_prefix OUT_PREFIX
                        Prefix of final output files (extensions are those of input files) (default: isorefiner_trimmed)
  -d WORK_DIR, --work_dir WORK_DIR
                        Working directory containing intermediate and log files (default: isorefiner_trim_work)
  -t THREADS, --threads THREADS
                        Number of threads (default: 1)
  -p TOOL_OPTION, --tool_option TOOL_OPTION
                        Option for Porechop_ABI (quoted string) (default: )

output: isorefiner_trimmed.fastq ({-o argument}.fastq)
  With multiple input files: isorefiner_trimmed_1.fastq, isorefiner_trimmed_2.fastq, ...
  File extensions are inherited from the input files.
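When trimmed files are fed into the next step, their names can be predicted from the rules above. This helper is hypothetical, and how IsoRefiner defines the inherited extension is an assumption here: the last extension is kept, plus .gz for gzipped files (reads.fastq.gz keeps .fastq.gz).

```bash
#!/usr/bin/env bash
# Predict trim output names: one input -> PREFIX.EXT,
# several inputs -> PREFIX_1.EXT, PREFIX_2.EXT, ...
# Assumption: EXT is the last extension, plus ".gz" for gzipped inputs.
trimmed_names() {
    local prefix="$1"; shift
    local n=$# i=1 f base ext
    for f in "$@"; do
        base=$(basename "$f")
        case "$base" in
            *.gz) ext="${base%.gz}"; ext="${ext##*.}.gz" ;;
            *)    ext="${base##*.}" ;;
        esac
        if [ "$n" -eq 1 ]; then
            echo "${prefix}.${ext}"
        else
            echo "${prefix}_${i}.${ext}"
        fi
        i=$((i + 1))
    done
}
```

For example, `mapfile -t trimmed < <(trimmed_names isorefiner_trimmed "${reads[@]}")` collects the names so they can be passed on as `isorefiner map -r "${trimmed[@]}" ...`.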

map

Map reads to the reference genome using Minimap2, and sort BAM files.

isorefiner map [-h] -r [READS ...] -g GENOME [-o OUT_PREFIX] [-d WORK_DIR] [-t THREADS] [-m MM2_OPTION] [-s SORT_OPTION]

options:
  -h, --help            show this help message and exit
  -r [READS ...], --reads [READS ...]
                        Reads (FASTQ or FASTA, gzip allowed, mandatory) (default: None)
  -g GENOME, --genome GENOME
                        Reference genome (FASTA, mandatory) (default: None)
  -o OUT_PREFIX, --out_prefix OUT_PREFIX
                        Prefix of output BAM files (default: isorefiner_mapped)
  -d WORK_DIR, --work_dir WORK_DIR
                        Working directory containing intermediate and log files (default: isorefiner_map_work)
  -t THREADS, --threads THREADS
                        Number of threads (default: 1)
  -m MM2_OPTION, --mm2_option MM2_OPTION
                        Option for minimap2 (quoted string) (default: -x splice -ub -k14 --secondary=no)
  -s SORT_OPTION, --sort_option SORT_OPTION
                        Option for samtools sort (quoted string) (default: -m 2G)

output: isorefiner_mapped.bam ({-o argument}.bam)
  With multiple input files: isorefiner_mapped_1.bam, isorefiner_mapped_2.bam, ...
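Tool options are passed as a single quoted string. Since the help text lists the minimap2 options as the argument's default, supplying -m presumably replaces them, so this sketch repeats the defaults before appending an extra option. The minimap2 -G (maximum intron length) value and the samtools sort memory setting are illustrative choices, not recommendations.

```bash
#!/usr/bin/env bash
# Keep the documented defaults and append one extra minimap2 option.
MM2_DEFAULT="-x splice -ub -k14 --secondary=no"
mm2_opts="$MM2_DEFAULT -G 500k"   # -G: max intron length (illustrative value)

cmd=(isorefiner map -r reads.fastq -g genome.fasta -t 16 -m "$mm2_opts" -s "-m 4G")

# Show one argument per line: each option string stays a single argument.
printf '[%s]\n' "${cmd[@]}"
```

Quoting matters here: without the double quotes around "$mm2_opts", the shell would split the string and isorefiner would receive the minimap2 options as separate arguments.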

run_bambu

Run Bambu (read mapping-based tool).

isorefiner run_bambu [-h] -b [BAM ...] -g GENOME -a REF_GTF [-o OUT_GTF] [-d WORK_DIR] [-t THREADS]

options:
  -h, --help            show this help message and exit
  -b [BAM ...], --bam [BAM ...]
                        Mapped reads files (BAM, mandatory) (default: None)
  -g GENOME, --genome GENOME
                        Reference genome (FASTA, mandatory) (default: None)
  -a REF_GTF, --ref_gtf REF_GTF
                        Reference genome annotation (GTF, mandatory) (default: None)
  -o OUT_GTF, --out_gtf OUT_GTF
                        Final output file name (GTF) (default: isorefiner_bambu.gtf)
  -d WORK_DIR, --work_dir WORK_DIR
                        Working directory containing intermediate and log files (default: isorefiner_bambu_work)
  -t THREADS, --threads THREADS
                        Number of threads (default: 1)

output: isorefiner_bambu.gtf (-o argument)

run_espresso

Run ESPRESSO (read mapping-based tool).

isorefiner run_espresso [-h] -b [BAM ...] -g GENOME -a REF_GTF [-o OUT_GTF] [-d WORK_DIR] [-t THREADS] [-s TOOL_S_OPTION] [-c TOOL_C_OPTION] [-q TOOL_Q_OPTION]

options:
  -h, --help            show this help message and exit
  -b [BAM ...], --bam [BAM ...]
                        Mapped reads files (BAM, mandatory) (default: None)
  -g GENOME, --genome GENOME
                        Reference genome (FASTA, mandatory) (default: None)
  -a REF_GTF, --ref_gtf REF_GTF
                        Reference genome annotation (GTF, mandatory) (default: None)
  -o OUT_GTF, --out_gtf OUT_GTF
                        Final output file name (GTF) (default: isorefiner_espresso.gtf)
  -d WORK_DIR, --work_dir WORK_DIR
                        Working directory containing intermediate and log files (default: isorefiner_espresso_work)
  -t THREADS, --threads THREADS
                        Number of threads (default: 1)
  -s TOOL_S_OPTION, --tool_s_option TOOL_S_OPTION
                        Option for ESPRESSO_S.pl (quoted string) (default: )
  -c TOOL_C_OPTION, --tool_c_option TOOL_C_OPTION
                        Option for ESPRESSO_C.pl (quoted string) (default: )
  -q TOOL_Q_OPTION, --tool_q_option TOOL_Q_OPTION
                        Option for ESPRESSO_Q.pl (quoted string) (default: )

output: isorefiner_espresso.gtf (-o argument)
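Bambu and ESPRESSO take the same -b/-g/-a arguments and use different default work directories (isorefiner_bambu_work and isorefiner_espresso_work), so a sketch like the one below could run them concurrently on the same BAM files. The even thread split is an arbitrary choice, and the `run` helper and DRY_RUN switch are hypothetical; commands are only printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Print a command, or execute it when DRY_RUN=0 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# Run Bambu and ESPRESSO in the background on the same BAMs, each with half
# of the thread budget, then wait for both and report any failure.
run_mapping_tools() {
    local genome="$1" annot="$2" threads="$3"; shift 3
    local half=$(( threads / 2 > 0 ? threads / 2 : 1 ))
    local pids=() tool pid status=0
    for tool in bambu espresso; do
        run isorefiner "run_${tool}" -b "$@" -g "$genome" -a "$annot" -t "$half" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1
    done
    return "$status"
}

run_mapping_tools genome.fasta ref_annot.gtf 16 isorefiner_mapped.bam
```

Running both tools at once also doubles peak memory demand, so sequential runs may be safer on smaller machines.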

run_isoquant

Run IsoQuant (read mapping-based tool).
