Carbonara

C++ package that provides tools for correcting structural predictions of proteins (e.g. from X-ray crystallography or AlphaFold) using small-angle X-ray scattering (SAXS) in solution

Carbonara bridges the gap between crystal-like and solution-state conformations by efficiently refining protein structures using experimental SAXS (Small Angle X-ray Scattering) data. Starting from AI-predicted models or crystallographic structures, Carbonara rapidly explores conformational space to identify physiologically relevant solution-state conformations. The method can incorporate additional experimental constraints such as disulfide bonds, NMR distance measurements, or FRET data to further guide the refinement process.

<p align="center"> <img src="figures/method_overview_arrows.png" alt="Method Overview" width="600"/> </p>

Schematic representation of the Carbonara refinement pipeline. The workflow proceeds from an initial structure (a) with identification of flexible regions (b), conformational sampling guided by SAXS data constraints (c), model selection based on optimal fit (d), and finally all-atom reconstruction (e) for downstream applications.

Building with CMake

To build the project using CMake, follow these steps:

  1. Open a terminal and check that CMake is installed on your system (version 3.10 or higher is recommended):
       cmake --version
  2. Navigate to the carbonara root directory:
       cd path/to/carbonara
  3. Inside the carbonara directory, create a build directory and navigate into it:
       mkdir build
       cd build
  4. Generate the build files:
       cmake ..
  5. Build the project:
       make

Reproducing structures refined in the paper

To reproduce the refinement of the two structures presented in the paper, first make sure you are in /path/to/carbonara, then run the following:

human SMARCAL1

sh RunMe_humanSMARCAL1.sh

ChiLob7/4 IgG2

sh RunMe_C239S.sh

Using Carbonara for new structures

To refine protein structure predictions with your own SAXS data, you'll need:

  1. A PDB starting model (AlphaFold or crystal structure recommended)
  2. SAXS experimental data with q in Å⁻¹ and three columns: q, I, and I error
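
Before running the setup, it can help to check that the SAXS file has the expected three-column layout. The following is a minimal sketch and not part of Carbonara: `load_saxs` is a hypothetical helper, it assumes a whitespace-delimited text file with `#` comment lines, and it reuses the default `--min_q` and `--max_q` values (0.01 and 0.2) as the trimming range.

```python
import io
import numpy as np

def load_saxs(source, min_q=0.01, max_q=0.2):
    """Load a three-column SAXS profile (q, I, I error) and trim it to [min_q, max_q].

    Hypothetical helper for illustration only. Assumes whitespace-delimited
    text with '#' comment lines and q given in inverse angstroms.
    """
    data = np.loadtxt(source, comments="#", ndmin=2)
    if data.shape[1] != 3:
        raise ValueError(f"expected 3 columns (q, I, I error), found {data.shape[1]}")
    q, intensity, error = data.T
    if np.any(np.diff(q) <= 0):
        raise ValueError("q values must be strictly increasing")
    if np.any(error <= 0):
        raise ValueError("I error values must be positive")
    # Keep only the points inside the requested q window.
    keep = (q >= min_q) & (q <= max_q)
    return q[keep], intensity[keep], error[keep]

# Small in-memory example standing in for a real data file.
example = io.StringIO(
    "# q I err\n"
    "0.005 100.0 1.0\n"
    "0.05 80.0 0.9\n"
    "0.15 20.0 0.5\n"
    "0.25 5.0 0.4\n"
)
q, intensity, error = load_saxs(example)
print(len(q), "points kept between q =", q.min(), "and", q.max())
```

A file path can be passed in place of the `StringIO` object. If no points survive the trimming, the data may be in nm⁻¹ rather than Å⁻¹.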

Setting up the Python environment

# Create a new conda environment
conda create -n carbonara_py python=3.10
conda activate carbonara_py

# Install required packages
pip install pandas 
pip install numpy 
pip install cython 
pip install tqdm 
pip install mdtraj 
pip install biobox
pip install plotly

Setting up the RunMe for a monomer:

python setup_carbonara.py --pdb path/to/pdb --saxs path/to/saxs --name ProteinName 

Or, if you trust all the default settings, the following will run the fitting script automatically:

run_carbonara_oneshot.py --pdb path/to/pdb --saxs path/to/saxs --name ProteinName 

Setting up the RunMe (or running the oneshot script) for a multimer, allowing rotations:

python setup_carbonara.py --pdb path/to/pdb --saxs path/to/saxs --name ProteinName --rotation

run_carbonara_oneshot.py --pdb path/to/pdb --saxs path/to/saxs --name ProteinName --rotation

If you have a PAE file (a .json or .npy file) and want to use its uncertainties to specify the flexibility, use the commands below. The --rotation flag is optional and can be omitted for a monomer.

python setup_carbonara.py -p path/to/pdb -s path/to/saxs -f path/to/pae --name ProteinName --alphaFoldFlex --rotation

run_carbonara_oneshot.py -p path/to/pdb -s path/to/saxs -f path/to/pae --name ProteinName --alphaFoldFlex --rotation
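
To inspect a PAE file before passing it to the setup script, something like the following can be used. This is a sketch under stated assumptions, not Carbonara's own code: `load_pae` and `residues_above_threshold` are hypothetical helpers, the JSON key names (`predicted_aligned_error` for AlphaFold DB downloads, `pae` for ColabFold output) are assumed, and the mean-PAE criterion is only a stand-in for however Carbonara actually applies its threshold of 16.

```python
import json
from pathlib import Path
import numpy as np

def load_pae(path):
    """Return the PAE matrix stored in a .npy or .json file as a square float array."""
    path = Path(path)
    if path.suffix == ".npy":
        pae = np.load(path)
    elif path.suffix == ".json":
        with open(path) as handle:
            payload = json.load(handle)
        # Assumption: AlphaFold DB wraps the record in a one-element list.
        if isinstance(payload, list):
            payload = payload[0]
        # Assumed key names; adjust if your file uses a different one.
        for key in ("predicted_aligned_error", "pae"):
            if key in payload:
                pae = payload[key]
                break
        else:
            raise KeyError("no recognised PAE key in JSON file")
    else:
        raise ValueError("PAE file should be .json or .npy")
    pae = np.asarray(pae, dtype=float)
    if pae.ndim != 2 or pae.shape[0] != pae.shape[1]:
        raise ValueError(f"PAE matrix must be square, got shape {pae.shape}")
    return pae

def residues_above_threshold(pae, threshold=16.0):
    """Illustrative criterion only: residues whose mean symmetrised PAE exceeds the threshold."""
    symmetric = 0.5 * (pae + pae.T)
    return np.flatnonzero(symmetric.mean(axis=1) > threshold)

# In-memory demo: residue 3 is poorly placed relative to the other three.
demo = np.full((4, 4), 2.0)
demo[3, :] = 25.0
demo[:, 3] = 25.0
np.fill_diagonal(demo, 0.0)
print(residues_above_threshold(demo))
```

Raising the threshold in this toy criterion flags fewer residues, so treat it purely as a way to look at the data rather than a prediction of which linkers Carbonara will vary.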

If you expect the molecule to occupy multiple states in solution, or suspect significant variation in Rg, you can run mixture refinements, e.g.:

python setup_carbonara.py -p path/to/pdb -s path/to/saxs -f path/to/pae --name ProteinName --alphaFoldFlex --rotation --mixture_n 2 --max_mixture_combos 10

run_carbonara_oneshot.py -p path/to/pdb -s path/to/saxs -f path/to/pae --name ProteinName --alphaFoldFlex --rotation --mixture_n 2 --max_mixture_combos 10
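
As a rough picture of what a mixture refinement compares against the data, the scattering of a mixture of states is the weighted sum of the per-state profiles, with weights that sum to 1. The sketch below illustrates that idea only. `weight_grid` and `mixture_intensity` are hypothetical names, and the regular 0.1 grid is an assumption suggested by the {0,1}, {0.1,0.9}, {0.2,0.8} example in the option description; how Carbonara itself enumerates and scores combinations is not described here.

```python
from itertools import product
import numpy as np

def weight_grid(n_states, step=0.1):
    """Enumerate weight vectors on a regular grid that sum to 1 (assumed scheme)."""
    levels = round(1.0 / step)
    combos = []
    # Choose the first n_states - 1 weights freely; the last one takes the remainder.
    for counts in product(range(levels + 1), repeat=n_states - 1):
        remainder = levels - sum(counts)
        if remainder >= 0:
            combos.append(tuple(c / levels for c in counts) + (remainder / levels,))
    return combos

def mixture_intensity(weights, profiles):
    """Weighted sum of per-state profiles: I_mix(q) = sum_k w_k * I_k(q)."""
    weights = np.asarray(weights, dtype=float)
    profiles = np.asarray(profiles, dtype=float)  # shape (n_states, n_q)
    if weights.shape[0] != profiles.shape[0]:
        raise ValueError("need one weight per profile")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return weights @ profiles

# Two made-up profiles on a two-point q grid, mixed 50:50.
profiles = [[2.0, 4.0], [4.0, 8.0]]
print(weight_grid(2)[:3])
print(mixture_intensity([0.5, 0.5], profiles))
```

The number of grid points grows quickly with the number of states, which is one reason to cap the number of combinations tried.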

# Optional flags for customising refinement
--fit_n_times INT     Number of times to run the fit, i.e. the batch size of the proposed seeding (default: 20)
--min_q FLOAT         Minimum q-value (default: 0.01)
--max_q FLOAT         Maximum q-value (default: 0.2). NOTE: you cannot go higher than 0.2.
--max_fit_steps INT   Maximum number of fitting steps (default: 10000). 10000 steps might take of the order of a day, 1000 a few hours.
--pairedQ             Use paired predictions
--rotation            Apply affine rotations
--alphaFoldFlex       Use a PAE prediction to specify the flexibility of the molecule
--pae_flex_threshold  Alter the PAE flexibility threshold above which linkers are considered open for variation (default: 16); increase to be more permissive
--mixture_n           Number of structures to consider in a single refinement (default: 1). If you are unsure but suspect variation, 2 or 3 will find significant structural variability.
--max_mixture_combos  Number of mixture combinations to try, e.g. for 2 structures {0,1}, {0.1,0.9}, {0.2,0.8}, etc. (default: 30; we recommend 10 for 2 structures, 15 for 3, etc.). Only meaningful if mixture_n > 1.
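
If you script many setups, a small wrapper can assemble the command and catch out-of-range values early. This is a sketch, not part of Carbonara: `build_setup_command` is a hypothetical helper that only uses flags listed above, and it assumes `setup_carbonara.py` is in the current directory. The 0.2 check mirrors the note on `--max_q`.

```python
import shlex

def build_setup_command(pdb, saxs, name, *, pae=None, rotation=False,
                        min_q=0.01, max_q=0.2, fit_n_times=20,
                        max_fit_steps=10000, mixture_n=1,
                        max_mixture_combos=None):
    """Assemble a setup_carbonara.py invocation as an argument list (hypothetical helper)."""
    if max_q > 0.2:
        raise ValueError("max_q cannot be higher than 0.2")
    if not 0 <= min_q < max_q:
        raise ValueError("min_q must be non-negative and smaller than max_q")
    cmd = [
        "python", "setup_carbonara.py",
        "--pdb", str(pdb), "--saxs", str(saxs), "--name", name,
        "--min_q", str(min_q), "--max_q", str(max_q),
        "--fit_n_times", str(fit_n_times),
        "--max_fit_steps", str(max_fit_steps),
    ]
    if pae is not None:
        # A PAE file is only used together with --alphaFoldFlex.
        cmd += ["-f", str(pae), "--alphaFoldFlex"]
    if rotation:
        cmd.append("--rotation")
    if mixture_n > 1:
        cmd += ["--mixture_n", str(mixture_n)]
        if max_mixture_combos is not None:
            cmd += ["--max_mixture_combos", str(max_mixture_combos)]
    return cmd

cmd = build_setup_command("path/to/pdb", "path/to/saxs", "ProteinName",
                          rotation=True, max_fit_steps=1000,
                          mixture_n=2, max_mixture_combos=10)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the carbonara root directory.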

Then (if not using the oneshot command) run:


sh RunMe_*ProteinName*.sh

Colab implementation to facilitate specialised setup

Carbonara’s key strength is its flexibility: users can specify as little or as much of the structure to vary, enforce rigid-body motions of subdomains, and apply a wide range of distance constraints. We strongly recommend tailoring the fitting and constraint parameters to reflect prior structural knowledge, as each system is unique. While the “out-of-the-box” one-shot workflow can yield informative results, careful refinement of these parameters can substantially improve both the quality of the fit and the physical realism of the resulting models.

To help you make these decisions, we provide a Colab implementation of the setup, which features graphical interactivity and a guided walkthrough. The following are basic versions for both monomer and multimer cases.

⚠️ These notebooks are shared in view-only mode.
To use them, click “Copy to Drive” at the top of the Colab page.
This will create your own editable copy in your Google Drive.
You can then run the code directly in Colab, or download the fitting folders and scripts if you prefer to work locally.

Citation

If you use Carbonara in your research, please cite our preprint!

@article{carbonara2025,
  title={Carbonara: A Rapid Method for SAXS-Based Refinement of Protein Structures},
  author={McKeown, J. and Bale, A. and Brown, C. and Fisher, H. and Rambo, R. and Essex, J. and Degiacomi, M. and Prior, C.},
  journal={ResearchSquare},
  year={2025},
  doi={10.21203/rs.3.rs-6447099/v1},
  url={https://doi.org/10.21203/rs.3.rs-6447099/v1}
}

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.