UniVI
UniVI is a scalable multi-modal VAE toolkit for aligning heterogeneous single-cell datasets into a shared latent space, supporting unimodal, dual-modal, and tri-modal (and beyond) integration. It can additionally be used for cross-modal imputation, generation of biologically relevant synthetic samples, data denoising, and structured evaluation.
<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/Ashford-A/UniVI/v0.4.7/assets/figures/univi_overview_dark.png">
  <img src="https://raw.githubusercontent.com/Ashford-A/UniVI/v0.4.7/assets/figures/univi_overview_light.png" alt="UniVI overview and evaluation roadmap" width="100%">
</picture>

UniVI is a multi-modal variational autoencoder (VAE) toolkit for aligning and integrating single-cell modalities such as RNA, ADT (CITE-seq), ATAC, and coverage-aware / proportion-like assays (e.g., single-cell methylome features).
Common use cases:
- Joint embedding of paired multimodal data (CITE-seq, Multiome, TEA-seq)
- Bridge mapping / projection of unimodal cohorts into a paired latent
- Cross-modal imputation (RNA→ADT, ATAC→RNA, RNA→methylome, …)
- Denoising / reconstruction with likelihood-aware decoders
- Generating biologically plausible synthetic samples (exploiting the generative nature of VAEs)
- Evaluation (FOSCTTM, Recall@k, mixing/entropy, label transfer, clustering, basic MoE gating diagnostics)
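UniVI ships its own evaluation utilities; for intuition, FOSCTTM (Fraction Of Samples Closer Than the True Match) can be sketched in a few lines of NumPy. The `foscttm` helper below is illustrative only and is not UniVI's API:

```python
import numpy as np

def foscttm(z1, z2):
    """FOSCTTM: for each cell, the fraction of other cells embedded closer
    than its true cross-modal match. 0.0 = perfect pairing, ~0.5 = random."""
    # (n, n) squared Euclidean distances between the two embeddings
    d = ((z1[:, None, :] - z2[None, :, :]) ** 2).sum(axis=-1)
    true_match = np.diag(d)
    frac_1to2 = (d < true_match[:, None]).mean(axis=1)  # z1[i] vs. all z2[j]
    frac_2to1 = (d < true_match[None, :]).mean(axis=0)  # z2[j] vs. all z1[i]
    return float((frac_1to2.mean() + frac_2to1.mean()) / 2)
```

On identical embeddings `foscttm(z, z)` is exactly 0.0, while two unrelated random embeddings land near 0.5.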
Advanced/experimental use cases (all optional; the model can be run entirely without them):
- Supervised heads (either a decoder classification head or a whole categorical encoder/decoder model VAE, treated as a modality)
- Expanded MoE gating diagnostics (setting a simple gating network during training)
- Transformer encoders (experimental, added for exploratory analysis)
- Fused transformer latent space (even more experimental, added for exploratory analysis/future model expansion)
Preprint
If you use UniVI in your work, please cite:
Ashford AJ, Enright T, Somers J, Nikolova O, Demir E.
Unifying multimodal single-cell data with a mixture-of-experts β-variational autoencoder framework.
bioRxiv (2025; updated 2026). doi: 10.1101/2025.02.28.640429
@article{Ashford2025UniVI,
  title   = {Unifying multimodal single-cell data with a mixture-of-experts β-variational autoencoder framework},
  author  = {Ashford, A. J. and Enright, T. and Somers, J. and Nikolova, O. and Demir, E.},
  journal = {bioRxiv},
  date    = {2025},
  doi     = {10.1101/2025.02.28.640429},
  url     = {https://www.biorxiv.org/content/10.1101/2025.02.28.640429},
  note    = {Preprint (updated 2026)}
}
Installation
PyPI
pip install univi
UniVI requires PyTorch. If `import torch` fails, install PyTorch for your platform/CUDA from PyTorch's official install instructions.
Conda / mamba
conda install -c conda-forge univi
# or
mamba install -c conda-forge univi
Development install (from source)
git clone https://github.com/Ashford-A/UniVI.git
cd UniVI
conda env create -f envs/univi_env.yml
conda activate univi_env
pip install -e .
Data expectations
UniVI expects per-modality AnnData objects.
- Each modality is an `AnnData`
- For paired settings, modalities share the same cells (`obs_names`, same order)
- Raw counts often live in `.layers["counts"]`
- Model inputs typically live in `.X` (or `.obsm["X_*"]` for ATAC LSI)
- Model input is a dictionary of these `AnnData` objects, with the dictionary key specifying the modality (e.g. `rna`, `adt`, `atac`). These keys are used later by evaluation functions (cross-reconstruction etc.)
Recommended convention:
- `.layers["counts"]` = raw counts / raw signal
- `.X` / `.obsm["X_*"]` = model input space (log1p RNA, CLR ADT, LSI ATAC, methyl fractions, etc.)
- `.layers["denoised_*"]` / `.layers["imputed_*"]` = UniVI outputs
Quickstart (Python / Jupyter)
Minimal "notebook path": load paired AnnData → preprocess → train → encode/evaluate → plot.
The sections below walk through a complete CITE-seq (RNA + ADT) example. All patterns generalize to Multiome (RNA + ATAC), TEA-seq (RNA + ADT + ATAC), and any other paired combination supported by UniVI.
0) Imports
import numpy as np
import scanpy as sc
import torch
from torch.utils.data import DataLoader, Subset
from univi import UniVIMultiModalVAE, ModalityConfig, UniVIConfig, TrainingConfig
from univi.data import MultiModalDataset, align_paired_obs_names, collate_multimodal_xy_recon
from univi.trainer import UniVITrainer
`collate_multimodal_xy_recon` is the required collate function for `DataLoader` when using `MultiModalDataset`. It correctly handles the `(x, recon_targets)` batch format expected by the trainer, including coverage-aware modalities such as beta-binomial methylome. Always pass it as `collate_fn=collate_multimodal_xy_recon` when constructing your loaders.
1) Load paired AnnData
For CITE-seq data:
rna = sc.read_h5ad("path/to/rna_citeseq.h5ad")
adt = sc.read_h5ad("path/to/adt_citeseq.h5ad")
For Multiome (RNA + ATAC):
rna = sc.read_h5ad("path/to/rna_multiome.h5ad")
atac = sc.read_h5ad("path/to/atac_multiome.h5ad")
For tri-modal TEA-seq / DOGMA-seq / ASAP-seq:
rna = sc.read_h5ad("path/to/rna.h5ad")
adt = sc.read_h5ad("path/to/adt.h5ad")
atac = sc.read_h5ad("path/to/atac.h5ad")
2) Preprocess each modality
After preprocessing, set `.X` to the model input space and keep raw counts in `.layers["counts"]`. Match the `likelihood` in `ModalityConfig` to your `.X` space (see the likelihood guidance table in step 4).
RNA — log-normalize, select HVGs, scale:
rna.layers["counts"] = rna.X.copy()
rna.var["mt"] = rna.var_names.str.upper().str.startswith("MT-")
sc.pp.calculate_qc_metrics(rna, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True)
sc.pp.normalize_total(rna, target_sum=1e4)
sc.pp.log1p(rna)
rna.raw = rna # snapshot log-space for plotting/DE
sc.pp.highly_variable_genes(rna, flavor="seurat_v3", n_top_genes=2000, layer="counts", subset=True)  # seurat_v3 expects raw counts
sc.pp.scale(rna, max_value=10)
ADT — CLR per cell, scale per protein:
adt.layers["counts"] = adt.X.copy()
def clr_per_cell(X):
    # centered log-ratio across proteins within each cell
    X = X.toarray() if hasattr(X, "toarray") else np.asarray(X)
    logX = np.log1p(X)
    return logX - logX.mean(axis=1, keepdims=True)
adt.X = clr_per_cell(adt.layers["counts"])
sc.pp.scale(adt, zero_center=True, max_value=10)
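A quick property check of the CLR transform above: after centering, each cell's transformed values have zero mean in log space. The helper is repeated here so the snippet runs on its own:

```python
import numpy as np

def clr_per_cell(X):
    # centered log-ratio across proteins within each cell
    X = X.toarray() if hasattr(X, "toarray") else np.asarray(X)
    logX = np.log1p(X)
    return logX - logX.mean(axis=1, keepdims=True)

counts = np.random.default_rng(0).poisson(5.0, size=(4, 10))
Z = clr_per_cell(counts)
assert np.allclose(Z.mean(axis=1), 0.0)  # every cell is centered in log space
```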
ATAC — TF-IDF → LSI, drop first component:
atac.layers["counts"] = atac.X.copy()
def tfidf(X):
    X = X.tocsr() if hasattr(X, "tocsr") else X
    cell_sum = np.asarray(X.sum(axis=1)).ravel()
    cell_sum[cell_sum == 0] = 1.0  # avoid division by zero for empty cells
    tf = X.multiply(1.0 / cell_sum[:, None])
    df = np.asarray((X > 0).sum(axis=0)).ravel()
    idf = np.log1p(X.shape[0] / (1.0 + df))
    return tf.multiply(idf)
X_tfidf = tfidf(atac.layers["counts"])
from sklearn.decomposition import TruncatedSVD
svd = TruncatedSVD(n_components=101, random_state=0)
X_lsi = svd.fit_transform(X_tfidf)
atac.obsm["X_lsi"] = X_lsi[:, 1:] # drop first component (depth correlated)
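With 101 SVD components and the first one dropped, the final LSI space has 100 dimensions. The pipeline can be smoke-tested end to end on synthetic data; the random peak matrix below is purely illustrative, and the `tfidf` helper is repeated so the snippet is self-contained:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

def tfidf(X):
    X = X.tocsr() if hasattr(X, "tocsr") else X
    cell_sum = np.asarray(X.sum(axis=1)).ravel()
    cell_sum[cell_sum == 0] = 1.0
    tf = X.multiply(1.0 / cell_sum[:, None])
    df = np.asarray((X > 0).sum(axis=0)).ravel()
    idf = np.log1p(X.shape[0] / (1.0 + df))
    return tf.multiply(idf)

# synthetic binary "peak" matrix: 200 cells x 500 peaks
rng = np.random.default_rng(0)
peaks = sp.csr_matrix((rng.random((200, 500)) < 0.05).astype(np.float64))

X_lsi = TruncatedSVD(n_components=11, random_state=0).fit_transform(tfidf(peaks))
X_lsi = X_lsi[:, 1:]  # drop the depth-correlated first component
assert X_lsi.shape == (200, 10)
```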
Post-preprocessing: assemble adata_dict
# Sanity check (CITE-seq)
assert rna.n_obs == adt.n_obs and np.all(rna.obs_names == adt.obs_names)
# CITE-seq
adata_dict = {"rna": rna, "adt": adt}
# Multiome
# adata_dict = {"rna": rna, "atac": atac}
# Tri-modal
# adata_dict = {"rna": rna, "adt": adt, "atac": atac}
# Unimodal VAE
# adata_dict = {"rna": rna}
align_paired_obs_names(adata_dict) # ensures matching obs_names and order
Avoiding data leakage: if you want to run UniVI inductively, apply feature selection, scaling, and any learned transforms (e.g., PCA/LSI) on the training set only, then apply the training-set-derived parameters to validation and test sets.
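A minimal sketch of that fit-on-train / apply-everywhere pattern, using scikit-learn's `StandardScaler` as a stand-in for any learned transform (scaling, PCA/LSI, HVG selection):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, size=(1000, 20))
train_idx, test_idx = np.arange(800), np.arange(800, 1000)

scaler = StandardScaler().fit(X[train_idx])   # statistics from the training set only
X_train = scaler.transform(X[train_idx])
X_test = scaler.transform(X[test_idx])        # reuse train statistics; no leakage
assert np.allclose(X_train.mean(axis=0), 0.0, atol=1e-9)
```

The test set is deliberately not re-centered on its own statistics; it is transformed with the parameters learned on the training set.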
3) Dataset + DataLoaders
Device detection (CUDA → MPS → XPU → CPU):
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available()
    else "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available()
    else "cpu"
)
Build dataset:
dataset = MultiModalDataset(
    adata_dict=adata_dict,
    device=None,  # dataset yields CPU tensors; model handles GPU transfer
    X_key_by_mod={
        "rna" : "X",            # uses rna.X
        "adt" : "X",            # uses adt.X
        # "atac": "obsm:X_lsi", # uses atac.obsm["X_lsi"]
    },
)
Train / val / test split (80 / 10 / 10):
n = rna.n_obs
idx = np.arange(n)
rng = np.random.default_rng(0)
rng.shuffle(idx)
n_train = int(0.8 * n)
n_val = int(0.1 * n)
train_idx = idx[:n_train]
val_idx = idx[n_train : n_train + n_val]
test_idx = idx[n_train + n_val :]
# Save split indices for reproducibility
np.savez("splits_seed0.npz", train_idx=train_idx, val_idx=val_idx, test_idx=test_idx)
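The saved `.npz` makes the split reproducible across sessions; restoring it and checking that the three index sets are disjoint and cover every cell looks like this (self-contained sketch that recreates the split first):

```python
import numpy as np

# recreate and save a split as above (self-contained for illustration)
n = 1000
idx = np.arange(n)
np.random.default_rng(0).shuffle(idx)
train_idx, val_idx, test_idx = idx[:800], idx[800:900], idx[900:]
np.savez("splits_seed0.npz", train_idx=train_idx, val_idx=val_idx, test_idx=test_idx)

# later (e.g., in a fresh session): restore the exact same split
splits = np.load("splits_seed0.npz")
restored = np.concatenate([splits[k] for k in ("train_idx", "val_idx", "test_idx")])
assert np.array_equal(np.sort(restored), np.arange(n))  # disjoint and complete
```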
Construct loaders (always pass collate_fn=collate_multimodal_xy_recon):
train_loader = DataLoader(
    Subset(dataset, train_idx),
    batch_size=256,
    shuffle=True,
    num_workers=0,
    collate_fn=collate_multimodal_xy_recon,
)
val_loader = DataLoader(
    Subset(dataset, val_idx),
    batch_size=256,
    shuffle=False,
    num_workers=0,
    collate_fn=collate_multimodal_xy_recon,
)
test_loader = DataLoader(
    Subset(dataset, test_idx),
    batch_size=256,
    shuffle=False,
    num_workers=0,
    collate_fn=collate_multimodal_xy_recon,
)