# DISCO

Code for the DISCO model: *General Multimodal Protein Design Enables DNA-Encoding of Chemistry*
DISCO (DIffusion for Sequence-structure CO-design) is a multimodal generative model that simultaneously co-designs protein sequences and 3D structures, conditioned on and co-folded with arbitrary biomolecules — including small-molecule ligands, DNA, and RNA. Unlike sequential pipelines that first generate a backbone and then apply inverse folding, DISCO generates both modalities jointly, enabling sequence-based objectives to inform structure generation and vice versa.
DISCO achieves state-of-the-art in silico performance in generating binders for diverse biomolecular targets with fine-grained property control, performing best on 178/179 evaluated ligands, as well as DNA and RNA. Applied to new-to-nature catalysis, DISCO was conditioned solely on reactive intermediates — without pre-specifying catalytic residues or relying on template scaffolds — to design diverse heme enzymes with novel active-site geometries. These enzymes catalyze new-to-nature carbene-transfer reactions, including alkene cyclopropanation, spirocyclopropanation, B–H and C(sp³)–H insertions, with top activities exceeding those of engineered enzymes. Random mutagenesis of a selected design further yielded a fourfold activity gain, indicating that the designed enzymes are evolvable.
<p align="center"> <img src="assets/conditional_design_results.png" width="95%" alt="DISCO vs baselines on conditional protein design" /> </p>

## Quick Start
- Install — see Installation below.
- Set up prerequisites — (optionally) configure CUTLASS (see below).
- Run:
```bash
python runner/inference.py \
    experiment=designable \
    input_json_path=input_jsons/unconditional_config.json \
    seeds=\[0,1,2,3,4\]
```
If a run is interrupted, simply rerun the same command — DISCO automatically skips samples that have already been generated.
Note: The first time you run inference, it may take some time before inference steps begin, as the pairformer kernels are compiled just in time.
## 📦 Installation

DISCO uses `uv` for dependency management (you may need to install `uv` first). To install:
> **AMD GPUs:** DeepSpeed does not support AMD GPUs. If you are using an AMD GPU, remove the `deepspeed` dependency from `pyproject.toml` before running `uv sync`, and run with `use_deepspeed_evo_attention=false`.

```bash
uv sync
```
By default, `uv sync` installs PyTorch with its default backend. If you need a specific CUDA or CPU backend, uninstall torch and reinstall with the desired index URL. For example, for CUDA 12.4:
```bash
uv pip uninstall torch
uv pip install torch --torch-backend=cu124
```
To activate the environment, run from the top level of the repository:

```bash
source .venv/bin/activate
```
## 🔧 Prerequisites

### CUTLASS (optional)
By default, DISCO uses DeepSpeed4Science EvoformerAttention for memory-efficient attention, which significantly reduces GPU memory usage and enables inference on longer sequences. This requires NVIDIA CUTLASS to be available on disk and a GPU with Ampere or newer architecture (e.g. A100, L40S, H100, H200, B100, B200).
To set it up, clone the CUTLASS repository and set the `CUTLASS_PATH` environment variable:
```bash
git clone https://github.com/NVIDIA/cutlass.git /path/to/cutlass
export CUTLASS_PATH=/path/to/cutlass
```
You can add CUTLASS_PATH to your shell profile so it persists across sessions. The attention kernels will be compiled the first time they are invoked.
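One way to persist it is to append the export to your shell profile; a minimal sketch, where the CUTLASS path and the profile file (e.g. `~/.zshrc` instead of `~/.bashrc`) are placeholders for your own setup:

```bash
# Append the export to the shell profile so CUTLASS_PATH survives new sessions.
# Both the path and the profile file below are placeholders -- adjust as needed.
PROFILE="$HOME/.bashrc"
echo 'export CUTLASS_PATH=/path/to/cutlass' >> "$PROFILE"
tail -n 1 "$PROFILE"   # confirm the line was added
```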
If you prefer to skip the CUTLASS installation, disable DeepSpeed attention on the command line:
```bash
python runner/inference.py use_deepspeed_evo_attention=false ...
```
This falls back to a naive attention implementation that materializes the full attention matrix and uses substantially more GPU memory.
## 🚀 Running Inference
Inference is run through the Hydra-based runner:
```bash
python runner/inference.py \
    experiment=designable \
    input_json_path=input_jsons/your_config.json \
    seeds=\[$(seq -s "," 0 4)\]
```
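The `seq` substitution is just shorthand for writing the seed list out explicitly:

```bash
# seq -s "," generates a comma-separated range, so the substitution above
# is equivalent to passing seeds=[0,1,2,3,4] by hand.
seq -s "," 0 4   # prints 0,1,2,3,4
```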
### Key command-line options
| Option | Description |
|--------|-------------|
| `experiment=` | Experiment preset (`designable` or `diverse`). See Experiment presets below. |
| `input_json_path=` | Path to the input JSON file describing what to generate. |
| `seeds=` | List of random seeds, e.g. `[0,1,2]`. Each seed produces one sample per job in the input JSON, so the total number of generated samples equals `len(seeds) * len(jobs)`. |
| `num_inference_seeds=` | Alternative to `seeds=`: generates seeds `[0, 1, ..., N-1]`. For example, `num_inference_seeds=100` produces 100 samples per job. |
| `effort=` | Compute preset: `fast` (default) or `max`. We recommend `effort=fast` only for unconditional generation; for conditional generation (e.g. ligand- or DNA/RNA-conditioned), use `effort=max`. See Trading off quality for speed. |
| `dump_dir=` | Output directory for generated structures. Defaults to `./output`. |
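As a quick sanity check on output counts, the total number of samples is the number of seeds times the number of jobs; for example (the numbers below are hypothetical):

```bash
# Total samples = number of seeds x number of jobs in the input JSON.
# Both values here are hypothetical, for illustration only.
SEEDS="0,1,2"                                      # as passed via seeds=[0,1,2]
NUM_JOBS=4                                         # jobs defined in the input JSON
NUM_SEEDS=$(echo "$SEEDS" | tr ',' '\n' | wc -l)   # count comma-separated entries
echo $(( NUM_SEEDS * NUM_JOBS ))                   # 12 samples in total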
### Experiment presets
DISCO ships with two experiment presets that control the trade-off between designability and diversity:
- `designable` — Uses entropy-adaptive temperature scaling and noisy guidance over both sequence and structure. This steers the model toward samples that are more likely to refold correctly under an external structure predictor, at the cost of reduced structural variety.
- `diverse` — Disables noisy guidance and entropy-adaptive temperature. The model samples more freely from its learned distribution, producing greater structural variety at the cost of lower average designability.
Which preset to use depends on the task — see Reproducing Paper Experiments for guidance on which preset was used in each experiment.
> **Tip: cheaper designable runs.** The `designable` preset uses noisy guidance, which increases the effective batch size of each forward pass and slows down inference. You can disable it while keeping the rest of the `designable` settings by adding `sample_diffusion.noisy_guidance.enabled=false` on the command line. This gives slightly lower designability scores but reduces compute costs, which can be useful for rapid prototyping or large-scale screening runs.
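For instance (a sketch; the input JSON path is a placeholder for your own config):

```bash
# Designable preset with noisy guidance disabled -- cheaper, slightly less designable
python runner/inference.py \
    experiment=designable \
    sample_diffusion.noisy_guidance.enabled=false \
    input_json_path=input_jsons/your_config.json \
    seeds=\[0,1,2\]
```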
### Trading off quality for speed
DISCO provides two effort presets that control the number of recycling cycles and diffusion steps:
| Preset | Diffusion steps | Recycling cycles | Description |
|--------|:-:|:-:|-------------|
| `effort=fast` | 100 | 2 | ~4x faster inference with only ~10% lower co-designability. Good for prototyping and large screening runs. This is the default. |
| `effort=max` | 200 | 4 | Full quality used in the paper. |
> ⚠️ **Important:** We recommend `effort=fast` only for unconditional generation. For conditional generation (e.g. ligand- or DNA/RNA-conditioned), use `effort=max` for best results.
```bash
# Fast (default) — good for prototyping
python runner/inference.py \
    experiment=designable \
    input_json_path=input_jsons/your_config.json \
    seeds=\[0,1,2,3,4\]

# Max quality — reproducing paper results
python runner/inference.py \
    experiment=designable \
    effort=max \
    input_json_path=input_jsons/your_config.json \
    seeds=\[0,1,2,3,4\]
```
You can also override the individual parameters directly with `model.N_cycle=` and `sample_diffusion.N_step=`.
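For example, a hypothetical setting between the two presets might look like this (the cycle and step counts below are illustrative, not values used in the paper):

```bash
# Custom effort between the fast and max presets (illustrative values)
python runner/inference.py \
    experiment=designable \
    model.N_cycle=3 \
    sample_diffusion.N_step=150 \
    input_json_path=input_jsons/your_config.json \
    seeds=\[0,1,2\]
```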
The figure above shows the trade-off between co-designability, structural diversity, and compute, measured using the designable preset with noisy guidance disabled. Beyond 2 cycles and 100 steps, returns diminish quickly.
> **Note:** When benchmarking against DISCO, use `effort=max` to reproduce the full-quality results reported in the paper.
### Output directory
Generated structures are saved under `dump_dir` with the following layout:

```text
dump_dir/
  pdbs/
    <name>_sample_<seed>.pdb
    <name>_sample_<seed>_ligands.txt   # only if ligands are present
  sequences/
    <name>_sample_<seed>.txt
  ERR/
    <name>.txt                         # only for failed samples
```
Here `<name>` is the job name from the input JSON (e.g. `length_200_heme_b`) and `<seed>` is the random seed used for that sample.
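This layout makes it easy to check progress from the shell; a minimal sketch, assuming the default `./output` directory (the `mkdir -p` only guards against subdirectories that do not exist yet):

```bash
# Count completed and failed samples under the default output layout.
DUMP_DIR=./output
mkdir -p "$DUMP_DIR/pdbs" "$DUMP_DIR/ERR"          # no-op if the run already created them
n_ok=$(find "$DUMP_DIR/pdbs" -name '*_sample_*.pdb' | wc -l)
n_err=$(find "$DUMP_DIR/ERR" -name '*.txt' | wc -l)
echo "completed: $n_ok, failed: $n_err"
```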
You can override `dump_dir` on the command line:

```bash
python runner/inference.py dump_dir=/my/output/dir ...
```
By default it resolves to ./output relative to the working directory.
## 🔬 Reproducing Paper Experiments
The sections below walk through each class of experiment from the paper. We provide all input JSON files needed to reproduce the reported results. To make comparisons to DISCO easier, the raw generated samples and results for all in silico experiments are available on Hugging Face.
### 🧬 Unconditional protein generation
In the unconditional setting, DISCO receives no conditioning target and generates both a protein sequence and a 3D structure from scratch. We evaluate at
