Immunum
A high-performance antibody and TCR sequence numbering tool for Rust, Python, Polars and JS/TS.
Try it in your browser: interactive demo.
Overview
immunum is a library for numbering antibody and T-cell receptor (TCR) variable domain sequences. It uses Needleman-Wunsch semi-global alignment against position-specific scoring matrices built from consensus sequences, with BLOSUM62-based substitution scores.
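To make the alignment idea concrete, here is a minimal sketch of semi-global (free end gap) Needleman-Wunsch scoring against a position-specific profile. It is not immunum's implementation. The function names, the toy match/mismatch scores (used in place of BLOSUM62) and the linear gap penalty are all assumptions chosen for illustration.

```python
from typing import Dict, List

# One dict per profile position, mapping residue -> score (a toy PSSM).
Profile = List[Dict[str, int]]


def profile_from_consensus(consensus: str, match: int = 5) -> Profile:
    """Build a toy profile that rewards only the consensus residue at each position."""
    return [{residue: match} for residue in consensus]


def semi_global_score(query: str, profile: Profile, mismatch: int = -2, gap: int = -4) -> int:
    """Best alignment score with unpenalized leading and trailing gaps."""
    n, m = len(query), len(profile)
    # dp[i][j] = best score aligning query[:i] with profile[:j].
    # Row 0 and column 0 stay at zero, so leading gaps are free.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = profile[j - 1].get(query[i - 1], mismatch)
            dp[i][j] = max(
                dp[i - 1][j - 1] + substitution,  # align residue to profile position
                dp[i - 1][j] + gap,               # query residue not in profile (insertion)
                dp[i][j - 1] + gap,               # profile position skipped (deletion)
            )
    # Trailing gaps are free: take the best cell in the last row or last column.
    return max(max(dp[n]), max(row[m] for row in dp))


profile = profile_from_consensus("CAR")
print(semi_global_score("XXCARYY", profile))  # flanking residues cost nothing
print(semi_global_score("CR", profile))       # one internal gap is penalized
```

A real profile would carry a full substitution score for every residue at every position, plus position-dependent gap penalties.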
Available as:
- Rust crate — core library and CLI
- Python package — with a Polars plugin for vectorized batch processing
- npm package — for Node.js and browsers
Supported chains
| Antibody     | TCR         |
| ------------ | ----------- |
| IGH (heavy)  | TRA (alpha) |
| IGK (kappa)  | TRB (beta)  |
| IGL (lambda) | TRD (delta) |
|              | TRG (gamma) |
Chain codes: H (IGH), K (IGK), L (IGL), A (TRA), B (TRB), D (TRD), G (TRG).
Chain type is automatically detected by aligning against all loaded chains and selecting the best match.
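The best-match selection can be pictured as "score against every loaded chain, keep the highest". The sketch below illustrates that selection step only. `detect_chain` and `motif_scorer` are hypothetical helpers, and the 3-mer overlap score is a toy stand-in for the real alignment score. The two motifs are the FR4 ends of the example sequences in this README.

```python
from typing import Callable, Dict, Tuple


def motif_scorer(motif: str) -> Callable[[str], float]:
    """Toy scorer: fraction of the motif's 3-mers that occur in the sequence."""
    kmers = {motif[i:i + 3] for i in range(len(motif) - 2)}
    return lambda sequence: sum(kmer in sequence for kmer in kmers) / len(kmers)


def detect_chain(sequence: str, scorers: Dict[str, Callable[[str], float]]) -> Tuple[str, float]:
    """Score the sequence against every loaded chain and return the best (chain, score)."""
    if not scorers:
        raise ValueError("at least one chain must be loaded")
    scores = {chain: score(sequence) for chain, score in scorers.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


scorers = {
    "H": motif_scorer("WGQGTLVTVSS"),  # heavy-chain FR4 from the example above
    "K": motif_scorer("FGQGTKVEIK"),   # kappa FR4 from the Polars example
}
print(detect_chain("AFAHWGQGTLVTVSS", scorers))
print(detect_chain("PPTFGQGTKVEIK", scorers))
```

Because every loaded chain is scored, restricting `chains` to the types you expect reduces both the work done and the chance of a spurious match.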
Numbering schemes
- IMGT — all 7 chain types
- Kabat — antibody chains (IGH, IGK, IGL)
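Since Kabat covers only the antibody chains, it can be useful to check a chain/scheme combination before annotating. The helper below is a hypothetical pre-flight check built from the two lists above. It is not part of the immunum API, and how the library itself reports unsupported combinations is not shown here.

```python
from typing import Iterable, List

# Chain codes supported by each scheme, taken from the lists above.
SCHEME_CHAINS = {
    "imgt": {"H", "K", "L", "A", "B", "D", "G"},
    "kabat": {"H", "K", "L"},
}


def unsupported_chains(scheme: str, chains: Iterable[str]) -> List[str]:
    """Return the requested chain codes that the scheme does not cover."""
    supported = SCHEME_CHAINS[scheme.lower()]
    return sorted(set(chains) - supported)


print(unsupported_chains("kabat", ["H", "A", "B"]))  # TCR chains are not covered by Kabat
print(unsupported_chains("imgt", ["H", "A", "B"]))   # IMGT covers all seven
```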
Table of Contents
Python
Installation
pip install immunum
Numbering
from immunum import Annotator
annotator = Annotator(chains=["H", "K", "L"], scheme="imgt")
sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"
result = annotator.number(sequence)
print(result.chain) # H
print(result.confidence) # 0.97
print(result.numbering) # {"1": "Q", "2": "V", "3": "Q", ...}
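The numbering mapping can be post-processed with plain Python. The sketch below pulls the residues of one IMGT region out of such a mapping, using the standard IMGT region boundaries. It assumes the mapping iterates in sequence order and that insertion positions are written as a number followed by a letter (for example "111A"). Neither detail is confirmed above, and the sample mapping is made up.

```python
import re
from typing import Dict

# Standard IMGT region boundaries (inclusive position numbers).
IMGT_REGIONS = {
    "fr1": (1, 26), "cdr1": (27, 38), "fr2": (39, 55), "cdr2": (56, 65),
    "fr3": (66, 104), "cdr3": (105, 117), "fr4": (118, 128),
}


def region_residues(numbering: Dict[str, str], region: str) -> str:
    """Concatenate residues whose IMGT position number falls inside the region."""
    start, end = IMGT_REGIONS[region]
    residues = []
    for position, residue in numbering.items():
        # Strip any insertion letter: "111A" -> 111.
        number = int(re.match(r"\d+", position).group())
        if start <= number <= end:
            residues.append(residue)
    return "".join(residues)


# Hypothetical excerpt of a numbering mapping around CDR3.
example = {"104": "C", "105": "A", "106": "R", "111": "G", "111A": "T",
           "112A": "K", "112": "P", "117": "H", "118": "W"}
print(region_residues(example, "cdr3"))
```

The helper deliberately keeps the mapping's own order instead of sorting, because IMGT orders insertions at position 112 in reverse (112B, 112A, 112).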
Segmentation
segment splits the sequence into FR/CDR regions:
from immunum import Annotator
annotator = Annotator(chains=["H", "K", "L"], scheme="imgt")
sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"
result = annotator.segment(sequence)
assert result.fr1 == 'QVQLVQSGAEVKRPGSSVTVSCKAS'
assert result.cdr1 == 'GGSFSTYA'
assert result.fr2 == 'LSWVRQAPGRGLEWMGG'
assert result.cdr2 == 'VIPLLTIT'
assert result.fr3 == 'NYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYC'
assert result.cdr3 == 'AREGTTGKPIGAFAH'
assert result.fr4 == 'WGQGTLVTVSS'
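A quick sanity check on a segmentation is that the regions, joined in order, give back the input sequence. The sketch below does this with a plain dict holding the segment values asserted above, so it runs without immunum installed. `join_segments` and `cdr_lengths` are illustrative helpers. Treating `prefix` and `postfix` as the unannotated flanks is an assumption.

```python
from typing import Dict

SEGMENT_ORDER = ("prefix", "fr1", "cdr1", "fr2", "cdr2", "fr3", "cdr3", "fr4", "postfix")


def join_segments(segments: Dict[str, str]) -> str:
    """Concatenate segments in N- to C-terminal order, skipping missing ones."""
    return "".join(segments.get(name) or "" for name in SEGMENT_ORDER)


def cdr_lengths(segments: Dict[str, str]) -> Dict[str, int]:
    """Length of each CDR, a common summary statistic for repertoires."""
    return {name: len(segments[name]) for name in ("cdr1", "cdr2", "cdr3")}


segments = {
    "fr1": "QVQLVQSGAEVKRPGSSVTVSCKAS",
    "cdr1": "GGSFSTYA",
    "fr2": "LSWVRQAPGRGLEWMGG",
    "cdr2": "VIPLLTIT",
    "fr3": "NYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYC",
    "cdr3": "AREGTTGKPIGAFAH",
    "fr4": "WGQGTLVTVSS",
}
sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"

assert join_segments(segments) == sequence
print(cdr_lengths(segments))
```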
Polars plugin
For batch processing, immunum.polars registers elementwise Polars expressions:
import polars as pl
import immunum.polars as imp
df = pl.DataFrame({"sequence": [
    "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS",
    "DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSRSGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIK",
]})
# Add a struct column with chain, scheme, confidence, numbering
result = df.with_columns(
    imp.number(pl.col("sequence"), chains=["H", "K", "L"], scheme="imgt").alias("numbered")
)
# Add a struct column with FR/CDR segments
result = df.with_columns(
    imp.segment(pl.col("sequence"), chains=["H", "K", "L"], scheme="imgt").alias("segmented")
)
The number expression returns a struct with fields chain, scheme, confidence, and numbering (a struct of position→residue). The segment expression returns a struct with fields fr1, cdr1, fr2, cdr2, fr3, cdr3, fr4, prefix, postfix.
JavaScript / npm
Installation
npm install immunum
Usage
const { Annotator } = require("immunum");
const annotator = new Annotator(["H", "K", "L"], "imgt");
const sequence =
"QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS";
const result = annotator.number(sequence);
console.log(result.chain); // "H"
console.log(result.confidence); // 0.97
console.log(result.numbering); // { "1": "Q", "2": "V", ... }
const segments = annotator.segment(sequence);
console.log(segments.cdr3); // "AREGTTGKPIGAFAH"
annotator.free(); // or use `using annotator = new Annotator(...)` with explicit resource management
Rust
Installation
Add to Cargo.toml:
[dependencies]
immunum = "0.9"
Usage
use immunum::{Annotator, Chain, Scheme};
let annotator = Annotator::new(
    &[Chain::IGH, Chain::IGK, Chain::IGL],
    Scheme::IMGT,
    None, // uses default min_confidence of 0.5
).unwrap();
let sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS";
let result = annotator.number(sequence).unwrap();
println!("Chain: {}", result.chain); // IGH
println!("Confidence: {:.2}", result.confidence);
for (aa, pos) in sequence.chars().zip(result.positions.iter()) {
    println!("{} -> {}", aa, pos);
}
let segments = annotator.segment(sequence).unwrap();
println!("CDR3: {}", segments.cdr3);
CLI
immunum number [OPTIONS] [INPUT] [OUTPUT]
Options
| Flag | Description | Default |
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------- | ------- |
| -s, --scheme | Numbering scheme: imgt (i), kabat (k) | imgt |
| -c, --chain | Chain filter: h,k,l,a,b,g,d or groups: ig, tcr, all. Accepts any form (h, heavy, igh), case-insensitive. | ig |
| -f, --format | Output format: tsv, json, jsonl | tsv |
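The chain filter accepts several spellings per chain plus group names. The sketch below shows one way such a value could be normalized to chain codes. It is not the CLI's actual parser. Only the `h`, `heavy`, `igh` spellings are documented above; the aliases for the other chains are inferred from the supported chains table, and mixing groups with single chains in one value is an assumption.

```python
from typing import List

# Aliases per chain code. Only the heavy-chain spellings are documented;
# the rest follow the same pattern by assumption.
CHAIN_ALIASES = {
    "H": ("h", "heavy", "igh"),
    "K": ("k", "kappa", "igk"),
    "L": ("l", "lambda", "igl"),
    "A": ("a", "alpha", "tra"),
    "B": ("b", "beta", "trb"),
    "D": ("d", "delta", "trd"),
    "G": ("g", "gamma", "trg"),
}
GROUPS = {
    "ig": ["H", "K", "L"],
    "tcr": ["A", "B", "D", "G"],
    "all": ["H", "K", "L", "A", "B", "D", "G"],
}
ALIAS_TO_CODE = {alias: code for code, aliases in CHAIN_ALIASES.items() for alias in aliases}


def parse_chain_filter(value: str) -> List[str]:
    """Expand a comma-separated, case-insensitive chain filter into chain codes."""
    chains: List[str] = []
    for token in value.split(","):
        token = token.strip().lower()
        if token in GROUPS:
            expanded = GROUPS[token]
        elif token in ALIAS_TO_CODE:
            expanded = [ALIAS_TO_CODE[token]]
        else:
            raise ValueError(f"unknown chain: {token!r}")
        for code in expanded:
            if code not in chains:  # keep first occurrence, drop duplicates
                chains.append(code)
    return chains


print(parse_chain_filter("Heavy,igk"))
print(parse_chain_filter("tcr"))
```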
Input
Accepts a raw sequence, a FASTA file, or stdin (auto-detected):
immunum number EVQLVESGGGLVKPGGSLKLSCAASGFTFSSYAMS
immunum number sequences.fasta
cat sequences.fasta | immunum number
immunum number - < sequences.fasta
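If your sequences live in memory rather than in a file, you can write them out as FASTA first. The helper below is a minimal sketch with hypothetical names (`to_fasta`, the record identifiers). It emits one sequence per line by default. Wrapped records are standard FASTA, but whether the CLI accepts them has not been checked here.

```python
from typing import Iterable, Optional, Tuple


def to_fasta(records: Iterable[Tuple[str, str]], width: Optional[int] = None) -> str:
    """Format (name, sequence) pairs as FASTA text, optionally wrapping lines."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        if width:
            # Wrap the sequence into chunks of at most `width` residues.
            lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
        else:
            lines.append(sequence)
    return "".join(line + "\n" for line in lines)


fasta_text = to_fasta([
    ("seq1", "EVQLVESGGGLVKPGGSLKLSCAASGFTFSSYAMS"),
    ("seq2", "DIQMTQSPSSLSASVGDRVTITC"),
])
print(fasta_text, end="")
```

Writing `fasta_text` to `sequences.fasta`, or piping it to `immunum number` on stdin, then matches the invocations shown above.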
Output
Writes to stdout by default, or to a file if a second positional argument is given:
immunum number sequences.fasta results.tsv
immunum number -f json sequences.fasta results.json
Examples
# Kabat scheme, JSON output
immunum number -s kabat -f json EVQLVESGGGLVKPGGSLKLSCAASGFTFSSYAMS
# All chains (antibody + TCR), JSONL output
immunum number -c all -f jsonl sequences.fasta
# TCR sequences only, save to file
immunum number -c tcr tcr_sequences.fasta output.tsv
# Extract sequences from a TSV column and pipe in (see fixtures/ig.tsv)
tail -n +2 fixtures/ig.tsv | cut -f2 | immunum number
awk -F'\t' 'NR==1{for(i=1;i<=NF;i++) if($i=="sequence") c=i} NR>1{print $c}' fixtures/ig.tsv | immunum number
# Filter TSV output to CDR3 positions (105-117 in IMGT)
immunum number sequences.fasta | awk -F'\t' '$4 >= 105 && $4 <= 117'
# Filter to heavy chain results only
immunum number -c all sequences.fasta | awk -F'\t' 'NR==1 || $2=="H"'
# Extract sequence IDs and numbering with jq
immunum number -f json sequences.fasta | jq '[.[] | {id: .sequence_id, numbering}]'
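The header-based `awk` extraction from `fixtures/ig.tsv` shown above can also be done in Python with the standard `csv` module, which is easier to extend once more columns are needed. This is a sketch: the sample TSV text and its `id` column are made up, and only the `sequence` column name comes from the example above.

```python
import csv
import io
from typing import List


def column_values(tsv_text: str, column: str = "sequence") -> List[str]:
    """Return the values of one named column from tab-separated text with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in header")
    return [row[column] for row in reader]


# Hypothetical two-row TSV; real files would be read with open(path).read().
sample = "id\tsequence\ns1\tEVQLVESGGGLVKPGG\ns2\tDIQMTQSPSSLSASVG\n"
for sequence in column_values(sample):
    print(sequence)
```

Printing one sequence per line produces the same stream the `awk` one-liner feeds to `immunum number`.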
Development
The project spans Cargo and Python, so we use task to orchestrate builds, tests and benchmarks.
You can install it with:
uv tool install go-task-bin
Then run task or task --list-all to see the full list of available tasks.
By default, every task except the benchmark-* tasks uses the dev profile. To change it, pass PROFILE=release to the task.
task also caches results by default. To force a rerun, use task my-task -f.
Building local environment
# build a dev environment
task build-local
# build a dev environment with --release flag
task build-local PROFILE=release
Testing
task test-rust # test only rust code
task test-python # test only python code
task test # test all code
Linting
task format # formats python and rust code
task lint # runs linting for python and rust
Benchmarking
The repository contains several benchmarks. For the full list, run task | grep benchmark:
$ task | grep benchmark
* benchmark-accuracy: Accuracy benchmark across all fixtures (1k sequences, 7 rounds each)
* benchmark-cli: Benchmark correctness of the CLI tool
* benchmark-comparison: Speed + correctness benchmark: immunum vs antpack vs anarci (1k IGH sequences)
* benchmark-scaling: Scaling benchmark: sizes 100..10M (10x steps), 1 round, H/imgt. Pass CLI_ARGS to filter tools, e.g. -- --tools immunum
* benchmark-speed: Speed benchmark across dataset sizes (100 to 1M sequences, 7 rounds, H/imgt)
* benchmark-speed-polars: Speed benchmark for immunum polars across
