OpusTomo
Structural heterogeneity analysis for cryo-ET subtomograms
OPUS-ET <div id="opustomo">
This repository contains the implementation of OPUS-Electron Tomography (OPUS-ET), developed by Zhenwei (Benedict, 本笃) Luo in the group of Prof. Jianpeng Ma at Fudan University.
OPUS-ET is designed to work seamlessly with the WARP/M pipeline (through a modified WARP that can export subtomograms without CTF correction and write the CTF parameters as CSV, https://github.com/alncat/warp/tree/alncat) to facilitate high-resolution cryo-electron tomography (cryo-ET) structure determination and to reveal in situ macromolecular dynamics. OPUS-ET supports several parallel strategies for efficient training. Note that OPUS-ET is still undergoing code improvements; you can keep up to date with git pull.
▶ Tutorials: see the OPUS-ET wiki: https://github.com/alncat/opusTomo/wiki
▶ Preprint: https://www.biorxiv.org/content/10.1101/2025.11.21.688990v1
Overview
Structural heterogeneity is a central challenge in cryo-ET and arises at multiple stages of the processing pipeline:
- Subtomogram picking:
After tomogram reconstruction, you need to pick subtomograms corresponding to the macromolecule of interest. Because cells contain a large variety of different molecular species, template matching and subtomogram picking are heavily affected by structural and compositional heterogeneity.
- Downstream analysis:
Even after obtaining a relatively “pure” set of subtomograms for a target complex, structural heterogeneity persists. Macromolecules in the cellular environment are dynamic and can adopt multiple conformations and compositions. The vitrified samples therefore preserve a rich ensemble of states rather than a single static structure.
OPUS-ET is designed to tackle this structural heterogeneity end-to-end, from the earliest picking stages through to detailed conformational landscape analysis.
Key Features
- Filtering template matching results
OPUS-ET can filter template matching outputs (PyTOM) to obtain highly homogeneous subtomogram sets that are suitable for sub-nanometer resolution reconstruction.
- Multi-scale heterogeneity analysis
OPUS-ET can be applied to STA results (e.g., from M) to disentangle compositional heterogeneity, reconstructing different subcomplexes or binding states, and to reconstruct continuous conformational dynamics, providing a low-dimensional representation of in situ flexibility.
In short, OPUS-ET aims to reconstruct both compositional and conformational changes across the entire cryo-ET processing pipeline, from raw picking to high-resolution structural and dynamical analysis.
Example dynamics resolved by OPUS-ET are shown below. The counter-rotation between the F0 and F1 subcomplexes of ATP synthase when switching primary states (3C->1A):
https://github.com/user-attachments/assets/0a743ce3-e09c-4610-a9b4-c355154f856f
For the S. pombe 80S ribosome, part of the translocation dynamics is resolved by traversing PC3, which shows the translocation of A/T- and P-site tRNAs to A/P- and P/E-site tRNAs. An excellent reference for the translation elongation cycle is Ranjan's work, https://www.embopress.org/doi/full/10.15252/embj.2020106449 .
https://github.com/user-attachments/assets/19db80d0-7de6-4ae6-9549-ead16f61915a
For the Cm-treated M. pneumoniae 70S ribosome, part of the translation elongation cycle is resolved by traversing PC9, which shows the translocation of A/T-site tRNAs to the A site and the exit of the E-site tRNA. The model was trained on M-refined subtomograms.
https://github.com/user-attachments/assets/9d5a52f7-95dd-4c30-b5bd-938dbfaa5d69
Another important function of OPUS-ET is clustering template matching results to achieve high-resolution reconstruction, which is detailed in the wiki (https://github.com/alncat/opusTomo/wiki).
<img width="859" height="664" alt="image" src="https://github.com/user-attachments/assets/9f49fb93-e19e-4162-9af4-a198e1e7c9a7" />
The architecture of OPUS-ET is shown below:
<img width="838" height="334" alt="image" src="https://github.com/user-attachments/assets/97a52494-18fc-4e12-9314-9fd23a9bd6e8" />
The architecture of the encoder is (Encoder class in cryodrgn/models.py):
<img width="2056" height="314" alt="image" src="https://github.com/user-attachments/assets/641c9cb1-4c8e-4eb2-a2e2-a2b9cb3e4875" />
The architecture of the composition decoder is (ConvTemplate class in cryodrgn/models.py; in this version, the default size of the output volume is 160^3, which can be set via --templateres):
The architecture of the conformation decoder is:
<img width="457" height="144" alt="image" src="https://github.com/user-attachments/assets/a34ab59d-fb4e-47c6-9aa7-bd0138661bdb" />
OPUS-ET directly takes 3D subtomograms as input. Its performance is robust across subtomograms obtained from a wide range of particle localization methods, including neural network–based approaches such as DeePiCt (https://github.com/ZauggGroup/DeePiCt) and more classical, template-based methods such as PyTom (https://github.com/SBC-Utrecht/PyTom/, which provides GPU-accelerated template matching with a user-friendly GUI).
For structural heterogeneity analysis, OPUS-ET incorporates simple yet powerful statistical tools—principally PCA and k-means clustering. In practice, these methods provide rich insights into both conformational and compositional variability within macromolecular assemblies. In particular, applying PCA to the learned latent space enables OPUS-ET to decompose structural variations in cryo-ET datasets into interpretable modes of motion. This greatly facilitates downstream biological interpretation. Conceptually, this latent-space PCA is analogous to normal mode analysis (NMA) in structural biology, which characterizes the intrinsic dynamical modes of macromolecules.
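The latent-space PCA and k-means analysis described above can be illustrated with a minimal NumPy sketch. It is not OPUS-ET's own analysis code: the latent matrix is synthetic, and the function names (`pca`, `traverse_pc`, `kmeans`) are hypothetical names chosen for this example.

```python
import numpy as np


def pca(latents, n_components):
    """Return (mean, principal axes, projected coordinates) via SVD."""
    mean = latents.mean(axis=0)
    centered = latents - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return mean, components, centered @ components.T


def traverse_pc(mean, components, coords, pc_index, n_points=10):
    """Latent codes evenly spaced along one PC, all other PCs held at the mean."""
    lo, hi = np.percentile(coords[:, pc_index], [5, 95])
    steps = np.linspace(lo, hi, n_points)
    return mean + steps[:, None] * components[pc_index]


def nearest_center(latents, centers):
    """Index of the closest center for every latent code."""
    dists = np.linalg.norm(latents[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(latents, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = latents[rng.choice(len(latents), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_center(latents, centers)
        # Keep the old center if a cluster happens to become empty.
        updated = np.array([
            latents[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers, nearest_center(latents, centers)


# Synthetic stand-in for a learned latent matrix (n_particles x latent_dim).
rng = np.random.default_rng(0)
latents = rng.normal(size=(500, 8)) * np.array([5, 3, 1, 1, 1, 1, 1, 1])

mean, components, coords = pca(latents, n_components=3)
path = traverse_pc(mean, components, coords, pc_index=0, n_points=10)
centers, labels = kmeans(latents, k=4)
print(path.shape, centers.shape, np.bincount(labels, minlength=4))
```

In an actual analysis, latent codes such as those in `path` or `centers` would presumably be passed to the trained decoder to render volumes; that step is outside the scope of this sketch.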
C. reinhardtii ATP synthase <a name="atp"></a>
The C. reinhardtii dataset is publicly available at EMPIAR-11830. OPUS-ET resolved the rotary substates of ATP synthase in situ.
<img width="792" height="711" alt="image" src="https://github.com/user-attachments/assets/9f5b9e9d-269e-441b-84b9-2c02a76e9e87" />
S. pombe 80S Ribosome <a name="80s"></a>
The S. pombe dataset is publicly available at EMPIAR-10988 (https://www.ebi.ac.uk/empiar/EMPIAR-10988/). On this dataset, OPUS-ET shows superior structural disentanglement ability, capturing continuous structural changes in the PCs of the composition latent space and characterizing functionally important subpopulations. The results are deposited at https://zenodo.org/records/12631920.
<img width="860" height="662" alt="image" src="https://github.com/user-attachments/assets/a13ee6d1-f614-43fb-96b6-6d53f0612e6d" />
S. pombe FAS <a name="fas"></a>
OPUS-ET can even reconstruct a higher-resolution structure of FAS in EMPIAR-10988 by clustering 221 particles out of 4800 noisy subtomograms picked by template matching! The template matching and subtomogram averaging results are in the folder https://drive.google.com/drive/folders/1OijHVrCu3M-OgqvNu_YZ4jW8OWn6OwaV?usp=drive_link; fasp_expanded.star stores the subtomogram averaging results after D3 symmetry expansion.
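For readers unfamiliar with symmetry expansion, the sketch below shows in general terms what D3 expansion does to a set of particle orientations: each orientation is composed with the six rotations of the D3 point group, so every particle contributes six entries. This is an illustration only, not the code that produced fasp_expanded.star. The axis choice (3-fold along z, 2-fold along x) and the multiplication order are assumptions, and the function names are hypothetical.

```python
import numpy as np


def rot_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def d3_operators():
    """Six rotations of D3: 3-fold about z, 2-fold about x (assumed axes)."""
    flip_x = np.diag([1.0, -1.0, -1.0])  # 180-degree rotation about x
    c3 = [rot_z(angle) for angle in (0.0, 120.0, 240.0)]
    return c3 + [r @ flip_x for r in c3]


def expand(orientations):
    """Compose every orientation with every symmetry operator (assumed order)."""
    ops = d3_operators()
    return np.array([r @ s for r in orientations for s in ops])


particles = np.array([np.eye(3), rot_z(37.0)])  # two toy orientations
expanded = expand(particles)
print(expanded.shape)  # (12, 3, 3): 2 particles x 6 operators
```

After expansion, each asymmetric unit can be treated as its own particle in downstream analysis, which is why an expanded star file has more rows than the original particle set.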
We can also reconstruct the dynamics for FAS, using only 221 particles!
https://github.com/user-attachments/assets/806c518c-427d-41c9-905d-17b18fba8922
Getting Started <a name="setup"></a>
To run OPUS-ET after cloning the repository, you need an environment with PyTorch installed and a machine with GPUs. The recommended hardware configuration is a machine with 4 V100 GPUs. You can create the conda environment for OPUS-ET from the environment file in the source folder by executing
conda env create --name opuset -f environment.yml
This will create an environment with CUDA 11.3 and PyTorch 1.11.0. Other environment files are also provided. OPUS-ET has several training scripts implementing different parallelisms:
- dsd train_tomo implements legacy data parallel training.
- cryodrgn.commands.train_tomo_dist implements distributed data parallel training using torch.distributed.
- dsd train_tomo_hvd implements distributed data parallel training using Horovod, which must be installed following the tutorial https://github.com/alncat/opusTomo/wiki/horovod-installation.
After the environment is successfully created, you can activate it and install OPUS-ET within the environment.
conda activate opuset
You can then install OPUS-ET by changing to the directory of the cloned repository and executing
pip install -e .
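Once installation finishes, a quick sanity check of the environment can save time before launching training. The snippet below is a minimal sketch and not part of OPUS-ET; `check_environment` is a hypothetical helper that only reports whether PyTorch can be imported and how many CUDA GPUs it sees.

```python
def check_environment():
    """Report whether PyTorch is importable and how many CUDA GPUs it sees."""
    report = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_available": False,
        "gpu_count": 0,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is missing; return the defaults so the caller can react.
        return report
    report["torch_installed"] = True
    report["torch_version"] = str(torch.__version__)
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

If `cuda_available` is reported as False inside the activated conda environment, the installed PyTorch build and the local CUDA driver are worth checking first.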
OPUS-ET can be kept up to date by
git pull
A quick tutorial on a small dataset can be found at https://github.com/alncat/opusTomo/wiki/Convert-M's-2D-particle-image-series-to-subtomograms-for-training-OPUS%E2%80%90ET.
Usage Example:
Overall, the training commands in OPUS-ET are invoked by calling
dsd commandx ...
while the commands for result analysis can be accessed by calling
dsdsh commandx ...
More information about each argument of the command can be displayed using
dsd commandx -h
or
dsdsh commandx -h
Data Preparation for OPUS-ET Using dsdsh prepare:
The dsdsh prepare command handles data preparation. Under the hood, dsdsh prepare points to prepare.sh inside analysis_scripts. If the RELION star file version is below 3.0, data preparation can be done with
dsdsh prepare /work/consensus_data.star 236 2.1
$1
