
:snowflake::dragon: cryoDRGN: Deep Reconstructing Generative Networks for cryo-EM and cryo-ET heterogeneous reconstruction

CryoDRGN is a neural network-based algorithm for heterogeneous cryo-EM reconstruction. In particular, the method models a continuous distribution over 3D structures using a neural network representation of the volume.
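The idea of a neural volume representation can be sketched as a coordinate network: 3D coordinates are featurized and fed, together with a latent conformation vector, through an MLP that returns a density value per coordinate. A minimal numpy sketch of the concept only, not cryoDRGN's architecture; the layer sizes, encoding, and latent dimension here are illustrative:

```python
import numpy as np

def positional_encoding(coords, n_freqs=4):
    """Map 3D coordinates to sinusoidal features, as in coordinate networks."""
    freqs = 2.0 ** np.arange(n_freqs)             # geometric frequency ladder
    angles = coords[..., None] * freqs            # (..., 3, n_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*coords.shape[:-1], -1)  # (..., 3 * 2 * n_freqs)

# A toy "volume network": encoded coordinate (+ latent z) -> density value.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(24 + 8, 64)); b1 = np.zeros(64)
W2 = rng.normal(size=(64, 1));      b2 = np.zeros(1)

def volume(coords, z):
    x = np.concatenate([positional_encoding(coords),
                        np.broadcast_to(z, (*coords.shape[:-1], z.shape[-1]))],
                       axis=-1)
    h = np.maximum(x @ W1 + b1, 0.0)              # ReLU hidden layer
    return (h @ W2 + b2)[..., 0]                  # scalar density per coordinate

grid = np.stack(np.meshgrid(*[np.linspace(-1, 1, 8)] * 3, indexing="ij"), axis=-1)
z = rng.normal(size=8)                            # latent conformation vector
density = volume(grid, z)
print(density.shape)                              # (8, 8, 8)
```

Varying z while keeping the network weights fixed is what lets a single model describe a continuum of conformations.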

Documentation

The latest documentation for cryoDRGN is available in our user guide, including an overview and walkthrough of cryoDRGN installation, training and analysis. A brief quick start is provided below.

For any feedback, questions, or bugs, please file a Github issue or start a Github discussion.

Updates in Version 4.2.x

  • [NEW] cryoDRGN-AI ab initio reconstruction method integrated into cryoDRGN as cryodrgn abinit
    • former ab initio reconstruction methods are deprecated and renamed to cryodrgn abinit_het_old and cryodrgn abinit_homo_old
    • cryodrgn analyze, landscape, etc. now support cryoDRGN-AI models as well as the previous cryoDRGN models
  • more memory-efficient ab initio reconstruction
  • support for Python 3.13 and PyTorch 2.9; PyTorch <2.0 is no longer supported

A full list of cryoDRGN version updates can be found at our release notes.

Installation

cryodrgn may be installed via pip, and we recommend installing cryodrgn in a clean conda environment. Our package is compatible with Python versions 3.10 through 3.13; we recommend using the latest available Python version:

# Create and activate conda environment
(base) $ conda create --name cryodrgn python=3.13
(base) $ conda activate cryodrgn

# install cryodrgn
(cryodrgn) $ pip install cryodrgn

You can alternatively install a newer, less stable development version of cryodrgn using our beta release channel:

(cryodrgn) $ pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ cryodrgn --pre

More installation instructions are found in the documentation.

Quickstart: heterogeneous reconstruction with consensus poses

1. Preprocess image stack

First resize your particle images using the cryodrgn downsample command:

<details><summary><code>$ cryodrgn downsample -h</code></summary>
usage: cryodrgn downsample [-h] -D D -o MRCS [--is-vol] [--chunk CHUNK]
                           [--datadir DATADIR]
                           mrcs

Downsample an image stack or volume by clipping fourier frequencies

positional arguments:
  mrcs               Input images or volume (.mrc, .mrcs, .star, .cs, or .txt)

optional arguments:
  -h, --help         show this help message and exit
  -D D               New box size in pixels, must be even
  -o MRCS            Output image stack (.mrcs) or volume (.mrc)
  --is-vol           Flag if input .mrc is a volume
  --chunk CHUNK      Chunksize (in # of images) to split particle stack when
                     saving
  --relion31         Flag for relion3.1 star format
  --datadir DATADIR  Optionally provide path to input .mrcs if loading from a
                     .star or .cs file
  --max-threads MAX_THREADS
                     Maximum number of CPU cores for parallelization (default: 16)
  --ind PKL          Filter image stack by these indices
</details>

We recommend first downsampling images to 128x128 since larger images can take much longer to train:

$ cryodrgn downsample [input particle stack] -D 128 -o particles.128.mrcs

The maximum recommended image size is D=256, so downsample your images to D=256 if they are larger than 256x256:

$ cryodrgn downsample [input particle stack] -D 256 -o particles.256.mrcs

The input file format can be a single .mrcs file, a .txt file containing paths to multiple .mrcs files, a RELION .star file, or a cryoSPARC .cs file. For the latter two options, if the relative paths to the .mrcs are broken, the argument --datadir can be used to supply the path to where the .mrcs files are located.

If there are memory issues with downsampling large particle stacks, add the --chunk 10000 argument to save images as separate .mrcs files of 10k images.
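As the help text above notes, downsampling works by clipping Fourier frequencies: the image is transformed, the high-frequency corners are cropped away, and the result is transformed back at the smaller box size. A minimal numpy illustration of that operation on a single square image, not cryoDRGN's implementation; the scaling convention here is chosen to keep the mean intensity comparable:

```python
import numpy as np

def fourier_crop(img, new_D):
    """Downsample a square image by cropping its centered Fourier transform."""
    D = img.shape[0]
    ft = np.fft.fftshift(np.fft.fft2(img))        # put DC at the center
    start = (D - new_D) // 2
    cropped = ft[start:start + new_D, start:start + new_D]
    out = np.fft.ifft2(np.fft.ifftshift(cropped)).real
    return out * (new_D / D) ** 2                 # preserve mean intensity

img = np.random.default_rng(1).normal(size=(256, 256))
small = fourier_crop(img, 128)
print(small.shape)                                # (128, 128)
```

Because only high frequencies are discarded, the pixel size in Angstroms grows by the ratio D/new_D while the low-resolution content is untouched.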

2. Parse image poses from a consensus homogeneous reconstruction

CryoDRGN expects image poses to be stored in a binary pickle format (.pkl). Use the parse_pose_star or parse_pose_csparc command to extract the poses from a .star file or a .cs file, respectively.

Example usage to parse image poses from a RELION 3.1 starfile:

$ cryodrgn parse_pose_star particles.star -o pose.pkl

Example usage to parse image poses from a cryoSPARC homogeneous refinement particles.cs file:

$ cryodrgn parse_pose_csparc cryosparc_P27_J3_005_particles.cs -o pose.pkl -D 300

Note: The -D argument should be the box size of the consensus refinement (and not the downsampled images from step 1) so that the units for translation shifts are parsed correctly.
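To sanity-check a parsed pose.pkl, you can load it with Python's pickle module. The layout sketched below, a (rotations, translations) tuple of N x 3 x 3 rotation matrices and N x 2 in-plane shifts, is an assumption based on the description above; substitute open("pose.pkl", "rb") for the in-memory buffer to inspect a real file:

```python
import io
import pickle
import numpy as np

# Build a dummy pose file in memory with the assumed layout, then read it back.
N = 100
rots = np.tile(np.eye(3), (N, 1, 1))     # N x 3 x 3 rotation matrices
trans = np.zeros((N, 2))                 # N x 2 in-plane shifts
buf = io.BytesIO()
pickle.dump((rots, trans), buf)

buf.seek(0)                              # for a real file: open("pose.pkl", "rb")
rots, trans = pickle.load(buf)
print(rots.shape, trans.shape)           # (100, 3, 3) (100, 2)
```

Checking that the number of poses matches the number of particle images in your stack catches most bookkeeping mistakes before training.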

3. Parse CTF parameters from a .star/.cs file

CryoDRGN expects CTF parameters to be stored in a binary pickle format (.pkl). Use the parse_ctf_star or parse_ctf_csparc command to extract the relevant CTF parameters from a .star file or a .cs file, respectively.

Example usage for a .star file:

$ cryodrgn parse_ctf_star particles.star -o ctf.pkl

If the box size and Angstrom/pixel values are not included in the .star file under fields _rlnImageSize and _rlnImagePixelSize respectively, the -D and --Apix arguments to parse_ctf_star should be used instead to provide the original parameters of the input file (before any downsampling):

$ cryodrgn parse_ctf_star particles.star -D 300 --Apix 1.03 -o ctf.pkl

Example usage for a .cs file:

$ cryodrgn parse_ctf_csparc cryosparc_P27_J3_005_particles.cs -o ctf.pkl
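The parsed parameters (defocus, voltage, spherical aberration, amplitude contrast) determine the contrast transfer function applied to each image. The sketch below is a standalone illustration of the standard 1D CTF formula for intuition only; the function names, arguments, and defaults are illustrative, not cryoDRGN's API:

```python
import numpy as np

def electron_wavelength(kv):
    """Relativistic electron wavelength in Angstrom for a voltage in kV."""
    v = kv * 1e3
    return 12.2639 / np.sqrt(v + 0.97845e-6 * v**2)

def ctf_1d(s, defocus_A, kv=300.0, cs_mm=2.7, w=0.1):
    """Standard 1D CTF over spatial frequencies s (1/Angstrom).

    w is the amplitude contrast ratio; defocus is positive for underfocus.
    """
    lam = electron_wavelength(kv)
    cs = cs_mm * 1e7                               # mm -> Angstrom
    gamma = np.pi * lam * defocus_A * s**2 - 0.5 * np.pi * cs * lam**3 * s**4
    return -np.sqrt(1 - w**2) * np.sin(gamma) - w * np.cos(gamma)

s = np.linspace(0, 0.25, 512)                      # out to 4 Angstrom resolution
ctf = ctf_1d(s, defocus_A=15000)                   # 1.5 um underfocus
print(ctf.shape)                                   # (512,)
```

Plotting this curve for a few of your parsed defocus values is a quick way to confirm the parameters are in sensible units.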

4. (Optional) Test pose/CTF parameters parsing

Next, test that pose and CTF parameters were parsed correctly using the voxel-based backprojection script. The goal is to quickly verify that there are no major problems with the extracted values and that the output structure resembles the structure from the consensus reconstruction before training.

Example usage:

$ cryodrgn backproject_voxel projections.128.mrcs \
        --poses pose.pkl \
        --ctf ctf.pkl \
        -o backproject.128 \
        --first 10000

The output structure backproject.128/backproject.mrc will not be identical to the consensus reconstruction because we only used the first 10k particle images for quicker results. If the structure is too noisy to interpret, use more images with --first 25000, or use the entire particle stack by leaving off the --first flag.

Note: If the volume does not resemble your structure, you may need to use the flag --uninvert-data. This flips the data sign (e.g. light-on-dark or dark-on-light), which may be needed depending on the convention used in upstream processing tools.
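Conceptually, voxel backprojection rests on the Fourier-slice theorem: each image's 2D Fourier transform is a central slice of the 3D Fourier volume, oriented by that image's pose. A toy numpy sketch with nearest-neighbor insertion and no CTF correction; cryoDRGN's backproject_voxel is considerably more careful than this:

```python
import numpy as np

def backproject(images, rotations, D):
    """Toy voxel backprojection: insert each image's Fourier slice into a 3D grid."""
    vol = np.zeros((D, D, D), dtype=complex)
    counts = np.zeros((D, D, D))
    # Coordinates of the central (z = 0) slice, in grid steps about the origin
    x = np.arange(D) - D // 2
    xx, yy = np.meshgrid(x, x, indexing="ij")
    plane = np.stack([xx, yy, np.zeros_like(xx)], axis=-1).reshape(-1, 3)
    for img, R in zip(images, rotations):
        ft = np.fft.fftshift(np.fft.fft2(img)).ravel()
        coords = np.rint(plane @ R.T).astype(int) + D // 2   # rotate the slice
        ok = np.all((coords >= 0) & (coords < D), axis=1)
        i, j, k = coords[ok].T
        np.add.at(vol, (i, j, k), ft[ok])                    # accumulate slices
        np.add.at(counts, (i, j, k), 1)
    vol[counts > 0] /= counts[counts > 0]                    # average overlaps
    return np.fft.ifftn(np.fft.ifftshift(vol)).real

D = 16
imgs = np.random.default_rng(2).normal(size=(4, D, D))
rots = [np.eye(3)] * 4                   # identity poses, for the sketch only
recon = backproject(imgs, rots, D)
print(recon.shape)                       # (16, 16, 16)
```

This is why bad poses or a wrong sign convention show up immediately as a garbled backprojection: every slice lands in the wrong place in Fourier space.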

5. Running cryoDRGN heterogeneous reconstruction

When the input images (.mrcs), poses (.pkl), and CTF parameters (.pkl) have been prepared, a cryoDRGN model can be trained with the following command:

<details><summary><code>$ cryodrgn train_vae -h</code></summary>
usage: cryodrgn train_vae [-h] -o OUTDIR --zdim ZDIM --poses POSES [--ctf pkl]
                          [--load WEIGHTS.PKL] [--checkpoint CHECKPOINT]
                          [--log-interval LOG_INTERVAL] [-v] [--seed SEED]
                          [--ind PKL] [--uninvert-data] [--no-window]
                          [--window-r WINDOW_R] [--datadir DATADIR] [--lazy]
                          [--max-threads MAX_THREADS]
                          [--tilt TILT] [--tilt-deg TILT_DEG] [-n NUM_EPOCHS]
                          [-b BATCH_SIZE] [--wd WD] [--lr LR] [--beta BETA]
                          [--beta-control BETA_CONTROL] [--norm NORM NORM]
                          [--no-amp] [--multigpu] [--do-pose-sgd]
                          [--pretrain PRETRAIN] [--emb-type {s2s2,quat}]
                          [--pose-lr POSE_LR] [--enc-layers QLAYERS]
                          [--enc-dim QDIM]
                          [--encode-mode {conv,resid,mlp,tilt}]
                          [--enc-mask ENC_MASK] [--use-real]
                          [--dec-layers PLAYERS] [--dec-dim PDIM]
                          [--pe-type {geom_ft,geom_full,geom_lowf,geom_nohighf,linear_lowf,gaussian,none}]
                          [--feat-sigma FEAT_SIGMA] [--pe-dim PE_DIM]
                          [--domain {hartley,fourier}]
                          [--activation {relu,leaky_relu}]
                          particles

Train a VAE for heterogeneous reconstruction with known pose

positional arguments:
  particles             Input particles (.mrcs, .star, .cs, or .txt)

optional arguments:
  -h, --help            show this help message and exit
  -o OUTDIR, --outdir OUTDIR
                        Output directory to save model
  --zdim ZDIM           Dimension of latent variable
</details>

Example usage, training an 8-dimensional latent variable model for 25 epochs (flags as documented in the help text above):

$ cryodrgn train_vae particles.128.mrcs --poses pose.pkl --ctf ctf.pkl --zdim 8 -n 25 -o 00_vae128
