ADFLIP: All-atom inverse protein folding through discrete flow matching (ICML 2025)

Description
Implementation of the paper "All-atom inverse protein folding through discrete flow matching" (see the Citation section below for the paper link).
Environment Setup
```bash
conda create -n ADFLIP python=3.10 pip -y
conda activate ADFLIP
pip install -r requirements.txt
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
pip install hydra-core
conda install -c conda-forge -c bioconda mmseqs2
```
Training
To train ADFLIP from scratch:
```bash
conda activate ADFLIP
export PYTHONPATH=$PWD:$PYTHONPATH
python3 trainer.py --config_path config/train_v1.yaml
```
Training configuration (hyperparameters, data paths, wandb logging, etc.) can be modified in `config/train_v1.yaml`.
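For orientation, a Hydra-style training config of this kind typically groups hyperparameters, data paths, and logging options. The keys below are illustrative assumptions only, not the actual schema of `config/train_v1.yaml`:

```yaml
# Illustrative sketch -- key names and values are assumptions,
# not the actual contents of config/train_v1.yaml.
data:
  train_path: /path/to/train_data   # hypothetical data paths
  val_path: /path/to/val_data
model:
  hidden_dim: 256                   # hypothetical hyperparameters
  num_layers: 6
training:
  lr: 1.0e-4
  batch_size: 32
  max_epochs: 100
wandb:
  project: adflip                   # wandb logging, as mentioned above
  enabled: true
```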
Usage
There are two main ways to sample sequences from a given input file:
1. Fixed-step sampling using a constant time step (`dt`):

   ```python
   # Fixed-step sampling
   samples, logits = flow_model.sample(input_file, dt=0.2)
   ```

2. Adaptive-step sampling based on model uncertainty (runs up to `num_step` steps, stops early when confidence > `threshold`):

   ```python
   # Adaptive sampling
   samples, logits = flow_model.adaptive_sample(input_file, num_step=8, threshold=0.9)
   ```
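The adaptive scheme can be illustrated with a generic early-stopping loop. This is a hedged sketch, not ADFLIP's actual implementation: the model interface (a callable returning per-position logits) and the confidence rule (mean top-1 softmax probability) are assumptions made for illustration.

```python
import numpy as np

def adaptive_sample(model, x, num_step=8, threshold=0.9):
    """Illustrative adaptive-step loop: take up to `num_step` update
    steps, but stop early once the model's mean per-position confidence
    (top-1 softmax probability) exceeds `threshold`."""
    logits = None
    for step in range(num_step):
        logits = model(x)                        # (L, num_classes) logits
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)    # softmax over classes
        x = probs.argmax(-1)                     # commit to most likely tokens
        confidence = probs.max(-1).mean()        # mean top-1 probability
        if confidence > threshold:
            break                                # confident enough: stop early
    return x, logits
```

With `dt=0.2`, fixed-step sampling always takes 5 steps; the adaptive variant spends extra steps only on inputs the model is unsure about.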
The entire workflow for using ADFLIP can be found in the example file: it loads a checkpoint, processes a PDB file, runs sampling, and computes recovery rates.
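Sequence recovery, the metric the example computes, is the fraction of designed positions that match the native residue. A minimal sketch (the function name is mine, not from the codebase):

```python
def recovery_rate(native_seq: str, designed_seq: str) -> float:
    """Fraction of positions where the designed sequence matches the native one.

    Sequences are compared position-by-position, so they must be aligned
    and of equal length.
    """
    if len(native_seq) != len(designed_seq):
        raise ValueError("sequences must be the same length")
    matches = sum(a == b for a, b in zip(native_seq, designed_seq))
    return matches / len(native_seq)
```

For example, `recovery_rate("ACDEF", "ACDEY")` returns 0.8, since 4 of 5 residues match.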
Comments
- Our discrete flow matching codebase builds on Discrete Flow Models. Thanks to the authors for open-sourcing it!
Citation
If you find our code and datasets useful, please cite:
```bibtex
@inproceedings{
  yi2025allatom,
  title={All-atom inverse protein folding through discrete flow matching},
  author={Kai Yi and Kiarash Jamali and Sjors HW Scheres},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=8tQdwSCJmA}
}
```
