IsoRefiner
Refinement tool for transcript isoform identification using long reads.
IsoRefiner is a refinement tool to identify exon-intron structures of transcript (RNA) isoforms using long reads. It employs multiple transcript-identification tools, filters erroneous structures, merges results from the tools, and constructs the final dataset including novel transcript structures. Its inputs are long reads and reference data (genome and annotation), and it outputs a refined dataset (GTF file). We tested IsoRefiner using Oxford Nanopore cDNA reads, although it can potentially accept other types of reads such as PacBio. We have submitted a paper describing the IsoRefiner algorithm, and it is under review.
Publication
Tanaka Y., Sunamura N., Kajitani R., Ikeguchi M., and Kunimoto R. Long-read RNA sequencing unveils a novel cryptic exon in MNAT1 along with its full-length transcript structure in TDP-43 proteinopathy. Communications Biology 8, 1056 (2025). https://doi.org/10.1038/s42003-025-08463-4
Installation
We tested IsoRefiner on Linux x86_64 environments. After installation, you can run the isorefiner command.
Bioconda
conda install -y -c conda-forge -c bioconda isorefiner
The command above assumes Miniconda. Resolving dependencies can take a long time; using mamba instead of conda may shorten it.
Alternatively, you can install it and create a virtual environment simultaneously:
conda create -y -c conda-forge -c bioconda -n isorefiner_env python=3.12.8 isorefiner
conda activate isorefiner_env
python=3.12.8 is pinned to shorten dependency resolution.
Docker
docker pull rkajitani/isorefiner
Start a container interactively:
docker run -it -v $(pwd):/work -w /work rkajitani/isorefiner /bin/bash
Or, run isorefiner as a command:
docker run -v $(pwd):/work -w /work --rm rkajitani/isorefiner isorefiner ...
The directory binding (-v $(pwd):/work -w /work) can be changed as needed.
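As an illustration of changing the binding, the sketch below assembles the docker command for an arbitrary host directory. The helper name docker_isorefiner_cmd and the example path /data/project are hypothetical; the image name and the -v/-w/--rm flags are taken from the commands above. It only prints the command, so it can be inspected before being run.

```shell
# Hypothetical helper (not part of IsoRefiner): print the docker command that
# runs isorefiner with host directory $1 mounted at /work.
# All remaining arguments are passed through to isorefiner.
docker_isorefiner_cmd() {
    host_dir=$1
    shift
    echo "docker run -v ${host_dir}:/work -w /work --rm rkajitani/isorefiner isorefiner $*"
}

# Example: mount /data/project (an assumed path) instead of the current directory.
docker_isorefiner_cmd /data/project trans_struct_wf -r reads.fastq -g genome.fasta -a ref_annot.gtf -t 8
```

Input paths given to isorefiner are then resolved inside the container, relative to /work.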
Dependency
Required tools are listed in the YAML file for conda. All of the required bioinformatics tools can be installed through the Bioconda channel.
Test
cd test/isorefiner
bash cmd.sh
A test dataset and script are in the test directory. If the test succeeds, isorefiner_refined.gtf is written. The test dataset was generated with the simulator SQANTI-SIM.
Quick start
isorefiner trans_struct_wf -r reads.fastq -g genome.fasta -a ref_annot.gtf -t 32
The command above runs the workflow that refines transcript structures; the subcommands are used internally. reads.fastq is the input long-read file (FASTQ or FASTA, gzip allowed). Multiple files can be specified as a space-delimited string (e.g., "reads_1.fastq reads_2.fastq"). genome.fasta and ref_annot.gtf are the reference genome and annotation, respectively. This command uses 32 threads. The final result is isorefiner_refined.gtf.
Workflow command usage
The command below runs an end-to-end workflow, which uses the subcommands internally. Detailed parameters for the internal steps cannot be specified, but the workflow is convenient because it runs without a complex shell script. Intermediate files are written to a directory named isorefiner_{command}_work by default, or to the directory given with -d, and a log file named log.txt is created in the same directory.
trans_struct_wf
Workflow of transcript-structure refinement.
isorefiner trans_struct_wf [-h] -r [READS ...] -g GENOME -a REF_GTF [-o OUT_GTF] [-d WORK_DIR] [-t THREADS]
options:
-h, --help show this help message and exit
-r [READS ...], --reads [READS ...]
Reads (FASTQ or FASTA, gzip allowed, mandatory) (default: None)
-g GENOME, --genome GENOME
Reference genome (FASTA, mandatory) (default: None)
-a REF_GTF, --ref_gtf REF_GTF
Reference genome annotation (GTF, mandatory) (default: None)
-o OUT_GTF, --out_gtf OUT_GTF
Final output file name (GTF) (default: isorefiner_refined.gtf)
-d WORK_DIR, --work_dir WORK_DIR
Working directory containing intermediate and log files (default: isorefiner_trans_struct_wf_work)
-t THREADS, --threads THREADS
Number of threads (default: 1)
output: isorefiner_refined.gtf (-o argument)
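When several samples are processed, the -o and -d options keep their outputs and working directories apart. The sketch below prints one workflow command per sample. The helper name wf_cmd, the sample names, and the {sample}.fastq naming scheme are assumptions made for illustration; only the flags documented above are used.

```shell
# Hypothetical helper (not part of IsoRefiner): print the workflow command
# for one sample, with a per-sample output GTF and working directory.
wf_cmd() {
    sample=$1
    threads=$2
    echo "isorefiner trans_struct_wf -r ${sample}.fastq -g genome.fasta -a ref_annot.gtf -o ${sample}_refined.gtf -d ${sample}_work -t ${threads}"
}

# sampleA and sampleB are placeholder names.
for s in sampleA sampleB; do
    wf_cmd "$s" 8
    # The log file is created inside the working directory.
    echo "# log for $s: ${s}_work/log.txt"
done
```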
Command usage for each step
Each command below corresponds to a specific step of the workflow. To specify detailed parameters, run these commands directly with their options. An example step-by-step procedure is written in step_by_step.sh. Intermediate files are written to a directory named isorefiner_{command}_work by default, or to the directory given with -d, and a log file named log.txt is created in the same directory.
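As a rough sketch of how the documented steps chain together, the script below prints the trim, map, run_bambu, and run_espresso commands in order, relying on the default output names listed in each section. It is not the repository's step_by_step.sh. The run wrapper, the DRY_RUN variable, and the step_by_step function are hypothetical names, and the later steps of the workflow are not covered here.

```shell
set -eu

# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
# Set DRY_RUN=0 to execute the commands, which requires isorefiner to be installed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumes a single input file with a .fastq extension, so that trim writes
# isorefiner_trimmed.fastq and map writes isorefiner_mapped.bam (the defaults).
step_by_step() {
    reads=$1
    genome=$2
    annot=$3
    threads=$4
    run isorefiner trim -r "$reads" -t "$threads"
    run isorefiner map -r isorefiner_trimmed.fastq -g "$genome" -t "$threads"
    run isorefiner run_bambu -b isorefiner_mapped.bam -g "$genome" -a "$annot" -t "$threads"
    run isorefiner run_espresso -b isorefiner_mapped.bam -g "$genome" -a "$annot" -t "$threads"
}

step_by_step reads.fastq genome.fasta ref_annot.gtf 8
```

Printing the commands first makes it easy to check file names and options before committing to a long run.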
trim
Trim nanopore reads using Porechop_ABI.
isorefiner trim [-h] -r [READS ...] [-o OUT_PREFIX] [-d WORK_DIR] [-t THREADS] [-p TOOL_OPTION]
options:
-h, --help show this help message and exit
-r [READS ...], --reads [READS ...]
Reads (FASTQ or FASTA, gzip allowed, mandatory) (default: None)
-o OUT_PREFIX, --out_prefix OUT_PREFIX
Prefix of final output files (extensions are those of input files) (default: isorefiner_trimmed)
-d WORK_DIR, --work_dir WORK_DIR
Working directory containing intermediate and log files (default: isorefiner_trim_work)
-t THREADS, --threads THREADS
Number of threads (default: 1)
-p TOOL_OPTION, --tool_option TOOL_OPTION
Option for Porechop_ABI (quoted string) (default: )
output: isorefiner_trimmed.fastq ({-o argument}.fastq)
With multiple input files: isorefiner_trimmed_1.fastq, isorefiner_trimmed_2.fastq, ...
File extensions are inherited from the input files.
map
Map reads to the reference genome using Minimap2, and sort BAM files.
isorefiner map [-h] -r [READS ...] -g GENOME [-o OUT_PREFIX] [-d WORK_DIR] [-t THREADS] [-m MM2_OPTION] [-s SORT_OPTION]
options:
-h, --help show this help message and exit
-r [READS ...], --reads [READS ...]
Reads (FASTQ or FASTA, gzip allowed, mandatory) (default: None)
-g GENOME, --genome GENOME
Reference genome (FASTA, mandatory) (default: None)
-o OUT_PREFIX, --out_prefix OUT_PREFIX
Prefix of output BAM files (default: isorefiner_mapped)
-d WORK_DIR, --work_dir WORK_DIR
Working directory containing intermediate and log files (default: isorefiner_map_work)
-t THREADS, --threads THREADS
Number of threads (default: 1)
-m MM2_OPTION, --mm2_option MM2_OPTION
Option for minimap2 (quoted string) (default: -x splice -ub -k14 --secondary=no)
-s SORT_OPTION, --sort_option SORT_OPTION
Option for samtools sort (quoted string) (default: -m 2G)
output: isorefiner_mapped.bam ({-o argument}.bam)
With multiple input files: isorefiner_mapped_1.bam, isorefiner_mapped_2.bam, ...
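The output naming used by trim and map can be summarized with a small helper, which may be useful when scripting the next step. The function expected_outputs is hypothetical and not part of IsoRefiner. It assumes that numbering starts at 1, as in the examples above, and that the numbers follow the order of the input files, which the text does not state.

```shell
# Hypothetical helper (not part of IsoRefiner): list the expected output names
# for a given prefix, extension, and number of input files.
expected_outputs() {
    prefix=$1
    ext=$2
    n=$3
    # A single input yields {prefix}.{ext} without a numeric suffix.
    if [ "$n" -eq 1 ]; then
        echo "${prefix}.${ext}"
        return
    fi
    # Multiple inputs yield {prefix}_1.{ext}, {prefix}_2.{ext}, and so on.
    i=1
    while [ "$i" -le "$n" ]; do
        echo "${prefix}_${i}.${ext}"
        i=$((i + 1))
    done
}

expected_outputs isorefiner_mapped bam 1     # single input
expected_outputs isorefiner_trimmed fastq 2  # two inputs
```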
run_bambu
Run Bambu (read mapping-based tool).
isorefiner run_bambu [-h] -b [BAM ...] -g GENOME -a REF_GTF [-o OUT_GTF] [-d WORK_DIR] [-t THREADS]
options:
-h, --help show this help message and exit
-b [BAM ...], --bam [BAM ...]
Mapped reads files (BAM, mandatory) (default: None)
-g GENOME, --genome GENOME
Reference genome (FASTA, mandatory) (default: None)
-a REF_GTF, --ref_gtf REF_GTF
Reference genome annotation (GTF, mandatory) (default: None)
-o OUT_GTF, --out_gtf OUT_GTF
Final output file name (GTF) (default: isorefiner_bambu.gtf)
-d WORK_DIR, --work_dir WORK_DIR
Working directory containing intermediate and log files (default: isorefiner_bambu_work)
-t THREADS, --threads THREADS
Number of threads (default: 1)
output: isorefiner_bambu.gtf (-o argument)
run_espresso
Run ESPRESSO (read mapping-based tool).
isorefiner run_espresso [-h] -b [BAM ...] -g GENOME -a REF_GTF [-o OUT_GTF] [-d WORK_DIR] [-t THREADS] [-s TOOL_S_OPTION] [-c TOOL_C_OPTION] [-q TOOL_Q_OPTION]
options:
-h, --help show this help message and exit
-b [BAM ...], --bam [BAM ...]
Mapped reads files (BAM, mandatory) (default: None)
-g GENOME, --genome GENOME
Reference genome (FASTA, mandatory) (default: None)
-a REF_GTF, --ref_gtf REF_GTF
Reference genome annotation (GTF, mandatory) (default: None)
-o OUT_GTF, --out_gtf OUT_GTF
Final output file name (GTF) (default: isorefiner_espresso.gtf)
-d WORK_DIR, --work_dir WORK_DIR
Working directory containing intermediate and log files (default: isorefiner_espresso_work)
-t THREADS, --threads THREADS
Number of threads (default: 1)
-s TOOL_S_OPTION, --tool_s_option TOOL_S_OPTION
Option for ESPRESSO_S.pl (quoted string) (default: )
-c TOOL_C_OPTION, --tool_c_option TOOL_C_OPTION
Option for ESPRESSO_C.pl (quoted string) (default: )
-q TOOL_Q_OPTION, --tool_q_option TOOL_Q_OPTION
Option for ESPRESSO_Q.pl (quoted string) (default: )
output: isorefiner_espresso.gtf (-o argument)
run_isoquant
Run IsoQuant (read mapping-based tool).
i
