Immunum Logo

Immunum is a high-performance antibody and TCR sequence numbering tool for Rust, Python, Polars and JS/TS.

Try it in your browser: interactive demo.

Overview

immunum is a library for numbering antibody and T-cell receptor (TCR) variable domain sequences. It uses Needleman-Wunsch semi-global alignment against position-specific scoring matrices built from consensus sequences, with BLOSUM62-based substitution scores.

Available as:

Rust crate — core library and CLI
Python package — with a Polars plugin for vectorized batch processing
npm package — for Node.js and browsers

Supported chains

| Antibody | TCR | | ------------ | ----------- | | IGH (heavy) | TRA (alpha) | | IGK (kappa) | TRB (beta) | | IGL (lambda) | TRD (delta) | | | TRG (gamma) |

Chain codes: H (IGH), K (IGK), L (IGL), A (TRA), B (TRB), D (TRD), G (TRG).

Chain type is automatically detected by aligning against all loaded chains and selecting the best match.

Numbering schemes

IMGT — all 7 chain types
Kabat — antibody chains (IGH, IGK, IGL)

Python
JavaScript / npm
- Installation
- Usage
Rust
- Installation
- Usage
CLI
- Options
- Input
- Output
- Examples
Development
Project structure

Python

Installation

pip install immunum

Numbering

from immunum import Annotator

annotator = Annotator(chains=["H", "K", "L"], scheme="imgt")

sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"

result = annotator.number(sequence)
print(result.chain)       # H
print(result.confidence)  # 0.78
print(result.numbering)   # {"1": "Q", "2": "V", "3": "Q", ...}

Segmentation

segment splits the sequence into FR/CDR regions:

from immunum import Annotator

annotator = Annotator(chains=["H", "K", "L"], scheme="imgt")

sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"

result = annotator.segment(sequence)
assert result.fr1 == 'QVQLVQSGAEVKRPGSSVTVSCKAS'
assert result.cdr1 == 'GGSFSTYA'
assert result.fr2 == 'LSWVRQAPGRGLEWMGG'
assert result.cdr2 == 'VIPLLTIT'
assert result.fr3 == 'NYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYC'
assert result.cdr3 == 'AREGTTGKPIGAFAH'
assert result.fr4 == 'WGQGTLVTVSS'

Polars plugin

For batch processing, immunum.polars registers elementwise Polars expressions:

import polars as pl
import immunum.polars as imp

df = pl.DataFrame({"sequence": [
    "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS",
    "DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSRSGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIK",
]})

# Add a struct column with chain, scheme, confidence, numbering
result = df.with_columns(
    imp.number(pl.col("sequence"), chains=["H", "K", "L"], scheme="imgt").alias("numbered")
)

# Add a struct column with FR/CDR segments
result = df.with_columns(
    imp.segment(pl.col("sequence"), chains=["H", "K", "L"], scheme="imgt").alias("segmented")
)

The number expression returns a struct with fields chain, scheme, confidence, and numbering (a struct of position→residue). The segment expression returns a struct with fields fr1, cdr1, fr2, cdr2, fr3, cdr3, fr4, prefix, postfix.

JavaScript / npm

Installation

npm install immunum

Usage

const { Annotator } = require("immunum");

const annotator = new Annotator(["H", "K", "L"], "imgt");

const sequence =
  "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS";

const result = annotator.number(sequence);
console.log(result.chain);      // "H"
console.log(result.confidence); // 0.97
console.log(result.numbering);  // { "1": "Q", "2": "V", ... }

const segments = annotator.segment(sequence);
console.log(segments.cdr3); // "AREGTTGKPIGAFAH"

annotator.free(); // or use `using annotator = new Annotator(...)` with explicit resource management

Rust

Installation

Add to Cargo.toml:

[dependencies]
immunum = "0.9"

Usage

use immunum::{Annotator, Chain, Scheme};

let annotator = Annotator::new(
    &[Chain::IGH, Chain::IGK, Chain::IGL],
    Scheme::IMGT,
    None, // uses default min_confidence of 0.5
).unwrap();

let sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS";

let result = annotator.number(sequence).unwrap();
println!("Chain: {}", result.chain);        // IGH
println!("Confidence: {:.2}", result.confidence);
for (aa, pos) in sequence.chars().zip(result.positions.iter()) {
    println!("{} -> {}", aa, pos);
}

let segments = annotator.segment(sequence).unwrap();
println!("CDR3: {}", segments.cdr3);

CLI

immunum number [OPTIONS] [INPUT] [OUTPUT]

Options

| Flag | Description | Default | | -------------- | ---------------------------------------------------------------------------------------------------------------------------------- | ------- | | -s, --scheme | Numbering scheme: imgt (i), kabat (k) | imgt | | -c, --chain | Chain filter: h,k,l,a,b,g,d or groups: ig, tcr, all. Accepts any form (h, heavy, igh), case-insensitive. | ig | | -f, --format | Output format: tsv, json, jsonl | tsv |

Input

Accepts a raw sequence, a FASTA file, or stdin (auto-detected):

immunum number EVQLVESGGGLVKPGGSLKLSCAASGFTFSSYAMS
immunum number sequences.fasta
cat sequences.fasta | immunum number
immunum number - < sequences.fasta

Output

Writes to stdout by default, or to a file if a second positional argument is given:

immunum number sequences.fasta results.tsv
immunum number -f json sequences.fasta results.json

Examples

# Kabat scheme, JSON output
immunum number -s kabat -f json EVQLVESGGGLVKPGGSLKLSCAASGFTFSSYAMS

# All chains (antibody + TCR), JSONL output
immunum number -c all -f jsonl sequences.fasta

# TCR sequences only, save to file
immunum number -c tcr tcr_sequences.fasta output.tsv

# Extract sequences from a TSV column and pipe in (see fixtures/ig.tsv)
tail -n +2 fixtures/ig.tsv | cut -f2 | immunum number
awk -F'\t' 'NR==1{for(i=1;i<=NF;i++) if($i=="sequence") c=i} NR>1{print $c}' fixtures/ig.tsv | immunum number

# Filter TSV output to CDR3 positions (111-128 in IMGT)
immunum number sequences.fasta | awk -F'\t' '$4 >= 111 && $4 <= 128'

# Filter to heavy chain results only
immunum number -c all sequences.fasta | awk -F'\t' 'NR==1 || $2=="H"'

# Extract CDR3 sequences with jq
immunum number -f json sequences.fasta | jq '[.[] | {id: .sequence_id, numbering}]'

Development

To orchestrate a project between cargo and python, we use task. You can install it with:

uv tool install go-task-bin

And then run task or task --list-all to get the full list of available tasks.

By default, dev profile will be used in all but benchmark-* tasks, but you can change it via providing PROFILE=release to your task.

Also, by default, task caches results, but you can ignore it by running task my-task -f.

Building local environment

# build a dev environment
task build-local

# build a dev environment with --release flag
task build-local PROFILE=release

Testing

task test-rust    # test only rust code
task test-python  # test only python code
task test         # test all code

Linting

task format  # formats python and rust code
task lint    # runs linting for python and rust

Benchmarking

There are multiple benchmarks in the repository. For full list, see task | grep benchmark:

$ task | grep benchmark
* benchmark-accuracy:           Accuracy benchmark across all fixtures (1k sequences, 7 rounds each)
* benchmark-cli:                Benchmark correctness of the CLI tool
* benchmark-comparison:         Speed + correctness benchmark: immunum vs antpack vs anarci (1k IGH sequences)
* benchmark-scaling:            Scaling benchmark: sizes 100..10M (10x steps), 1 round, H/imgt. Pass CLI_ARGS to filter tools, e.g. -- --tools immunum
* benchmark-speed:              Speed benchmark across dataset sizes (100 to 1M sequences, 7 rounds, H/imgt)
* benchmark-speed-polars:       Speed benchmark for immunum polars across

Immunum

Install / Use

README

Overview

Supported chains

Numbering schemes

Table of Contents

Python

Installation

Numbering

Segmentation

Polars plugin

JavaScript / npm

Installation

Usage

Rust

Installation

Usage

CLI

Options

Input

Output

Examples

Development

Building local environment

Testing

Linting

Benchmarking