Carbonara
C++ package that provides tools for correcting structural predictions of proteins (e.g. from X-ray crystallography or AlphaFold) using small-angle X-ray scattering (SAXS) in solution.
Carbonara bridges the gap between crystal-like and solution-state conformations by efficiently refining protein structures using experimental SAXS (Small Angle X-ray Scattering) data. Starting from AI-predicted models or crystallographic structures, Carbonara rapidly explores conformational space to identify physiologically relevant solution-state conformations. The method can incorporate additional experimental constraints such as disulfide bonds, NMR distance measurements, or FRET data to further guide the refinement process.
<p align="center"> <img src="figures/method_overview_arrows.png" alt="Method Overview" width="600"/> </p>

Schematic representation of the Carbonara refinement pipeline. The workflow proceeds from an initial structure (a) with identification of flexible regions (b), conformational sampling guided by SAXS data constraints (c), model selection based on optimal fit (d), and finally all-atom reconstruction (e) for downstream applications.
Building with CMake
To build the project using CMake, follow these steps:
- Open a terminal and check that CMake is installed on your system (version 3.10 or higher is recommended):
cmake -version
- Navigate to the carbonara root directory:
cd path/to/carbonara
- Inside the carbonara directory, create a build directory and navigate into it:
mkdir build
cd build
- Generate the build files:
cmake ..
- Build the project:
make
Reproducing structures refined in the paper
To reproduce the refinement of the two structures presented in the paper, first ensure you are located in /path/to/carbonara then run the following:
human SMARCAL1
sh RunMe_humanSMARCAL1.sh
ChiLob7/4 IgG2
sh RunMe_C239S.sh
Using Carbonara for new structures
To refine protein structure predictions with your own SAXS data, you'll need:
- A PDB starting model (AlphaFold or crystal structure recommended)
- SAXS experimental data with three columns: q (in Å⁻¹), I, and the error on I
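Before setting up a run, it can be worth checking that a SAXS file matches this three-column layout. The following is a minimal sketch, not part of Carbonara: the helper names `load_saxs` and `looks_like_inverse_nm` are hypothetical, the file is assumed to be whitespace-separated, and the unit check is only a rough heuristic.

```python
def load_saxs(path):
    """Read a three-column SAXS file into lists q, I, I_err.

    Hypothetical helper, not part of Carbonara. Assumes whitespace-separated
    numeric columns; lines that do not parse as three numbers (headers,
    comments, blank lines) are skipped.
    """
    q, intensity, error = [], [], []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 3:
                continue
            try:
                row = [float(x) for x in fields]
            except ValueError:
                continue  # non-numeric line such as a header
            q.append(row[0])
            intensity.append(row[1])
            error.append(row[2])
    if not q:
        raise ValueError("no three-column numeric rows found")
    if any(b <= a for a, b in zip(q, q[1:])):
        raise ValueError("q values must be strictly increasing")
    if any(e <= 0 for e in error):
        raise ValueError("I error values must be positive")
    return q, intensity, error


def looks_like_inverse_nm(q):
    """Heuristic only: a maximum q above 1 suggests nm^-1 rather than A^-1."""
    return max(q) > 1.0
```

If `looks_like_inverse_nm` returns `True`, the q column probably needs dividing by 10 to convert from nm⁻¹ to Å⁻¹ before use.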
Setting up the Python environment
# Create a new conda environment
conda create -n carbonara_py python=3.10
conda activate carbonara_py
# Install required packages
pip install pandas
pip install numpy
pip install cython
pip install tqdm
pip install mdtraj
pip install biobox
pip install plotly
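To confirm that the environment is complete before running the setup scripts, a short check along these lines may help. This is a sketch rather than a script shipped with Carbonara; `missing_packages` is a hypothetical helper, and the import names (note `Cython` with a capital C) are assumed to match the packages installed above.

```python
import importlib.util

# Import names assumed to correspond to the pip packages listed above.
REQUIRED = ["pandas", "numpy", "Cython", "tqdm", "mdtraj", "biobox", "plotly"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_packages()` inside the activated `carbonara_py` environment should return an empty list when everything is installed.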
Setting up the RunMe for a monomer:
python setup_carbonara.py --pdb path/to/pdb --saxs path/to/saxs --name ProteinName
Or, if you trust all the default settings, the following will run the fitting script automatically:
python run_carbonara_oneshot.py --pdb path/to/pdb --saxs path/to/saxs --name ProteinName
Setting up the RunMe (or running the oneshot script) for a multimer, allowing rotations:
python setup_carbonara.py --pdb path/to/pdb --saxs path/to/saxs --name ProteinName --rotation
python run_carbonara_oneshot.py --pdb path/to/pdb --saxs path/to/saxs --name ProteinName --rotation
If you have a PAE file (.json or .npy) and want to use its uncertainties to specify the flexibility, pass it with -f and add --alphaFoldFlex. The --rotation flag remains optional and can be omitted for a monomer:
python setup_carbonara.py -p path/to/pdb -s path/to/saxs -f path/to/pae --name ProteinName --alphaFoldFlex --rotation
python run_carbonara_oneshot.py -p path/to/pdb -s path/to/saxs -f path/to/pae --name ProteinName --alphaFoldFlex --rotation
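To get a feel for what a PAE file contains before passing it to the setup script, it can be inspected directly. The sketch below is not Carbonara's own reader: `load_pae` and `fraction_above` are hypothetical helpers, and the JSON layout (a matrix stored under `predicted_aligned_error`, possibly wrapped in a one-element list, or under `pae`) is an assumption based on common AlphaFold-style outputs.

```python
import json


def load_pae(path):
    """Load a PAE matrix as a list of rows from a .json or .npy file.

    Hypothetical helper, not Carbonara's own reader. The JSON keys tried
    here are assumptions about the file layout.
    """
    if str(path).endswith(".npy"):
        import numpy as np  # only needed for .npy input
        return np.load(path).tolist()
    with open(path) as handle:
        payload = json.load(handle)
    if isinstance(payload, list):
        payload = payload[0]
    for key in ("predicted_aligned_error", "pae"):
        if key in payload:
            return payload[key]
    raise KeyError("no PAE matrix found under the assumed keys")


def fraction_above(pae, threshold=16.0):
    """Fraction of off-diagonal residue pairs whose PAE exceeds `threshold`."""
    n = len(pae)
    values = [pae[i][j] for i in range(n) for j in range(n) if i != j]
    if not values:
        return 0.0
    return sum(v > threshold for v in values) / len(values)
```

Comparing `fraction_above(pae, 16.0)` with the value at other thresholds gives a rough sense of how much of the matrix a given cut-off separates; how Carbonara itself turns the PAE into flexible regions is not shown here.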
If you expect the molecule to occupy multiple states in solution, or suspect significant variation in Rg, you can run mixture refinements, e.g.:
python setup_carbonara.py -p path/to/pdb -s path/to/saxs -f path/to/pae --name ProteinName --alphaFoldFlex --rotation --mixture_n 2 --max_mixture_combos 10
python run_carbonara_oneshot.py -p path/to/pdb -s path/to/saxs -f path/to/pae --name ProteinName --alphaFoldFlex --rotation --mixture_n 2 --max_mixture_combos 10
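A mixture refinement weights several structures so that the weights sum to 1, for example {0.1, 0.9} for two states. The sketch below enumerates such weight combinations on a regular grid to show how quickly their number grows with the number of states. It is an illustration only: `weight_grid` is a hypothetical helper, the grid spacing of 0.1 is an assumption, and Carbonara's own choice of combinations may differ.

```python
def weight_grid(n_states, steps=10):
    """Yield weight tuples of length n_states on a 1/steps grid, summing to 1."""

    def compositions(total, parts):
        # All ways to split the integer `total` into `parts` non-negative integers.
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(total - first, parts - 1):
                yield (first,) + rest

    for combo in compositions(steps, n_states):
        yield tuple(k / steps for k in combo)
```

With a spacing of 0.1 there are 11 combinations for two states and 66 for three, so a cap on the number of combinations tried becomes more important as `--mixture_n` increases.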
# Optional flags for customising refinement
--fit_n_times INT Number of times to run the fit (default: 20), i.e. the batch size of the proposed seeding
--min_q FLOAT Minimum q-value (default: 0.01)
--max_q FLOAT Maximum q-value (default: 0.2). Note: values above 0.2 are not supported.
--max_fit_steps INT Maximum number of fitting steps (default: 10000). 10000 steps may take of order a day; 1000 steps take a few hours.
--pairedQ Use paired predictions
--rotation Apply affine rotations
--alphaFoldFlex Use a PAE prediction to specify the flexibility of the molecule
--pae_flex_threshold Alter the PAE flexibility threshold above which linkers are considered open for variation (default: 16); increase to be more permissive.
--mixture_n Number of structures to consider in a single refinement (default: 1). If you are unsure but suspect variation, 2 or 3 will find significant structural variability.
--max_mixture_combos Number of mixture combinations to try, e.g. for 2 structures: {0,1}, {0.1,0.9}, {0.2,0.8}, etc. (default: 30; recommended: 10 for 2 structures, 15 for 3, etc.). Only meaningful if mixture_n > 1.
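The `--min_q` and `--max_q` flags restrict the fit to a window of the scattering curve. To preview how many data points fall inside a given window, the selection can be sketched as below. `q_window` is a hypothetical helper, not Carbonara code; inclusive bounds are an assumption, and q is presumed to be in Å⁻¹.

```python
def q_window(q, intensity, error, min_q=0.01, max_q=0.2):
    """Keep only points with min_q <= q <= max_q (defaults mirror the flags)."""
    if max_q > 0.2:
        raise ValueError("max_q cannot exceed 0.2")
    kept = [(a, b, c) for a, b, c in zip(q, intensity, error) if min_q <= a <= max_q]
    if not kept:
        return [], [], []
    q_k, i_k, e_k = (list(col) for col in zip(*kept))
    return q_k, i_k, e_k
```

A window that leaves only a handful of points is a sign that the chosen `--min_q` or `--max_q` does not suit the data.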
Then (if not using the oneshot command) run:
sh RunMe_*ProteinName*.sh
Colab implementation to facilitate specialised setup
Carbonara’s key strength is its flexibility: users can specify as little or as much of the structure to vary, enforce rigid-body motions of subdomains, and apply a wide range of distance constraints. We strongly recommend tailoring the fitting and constraint parameters to reflect prior structural knowledge, as each system is unique. While the “out-of-the-box” one-shot workflow can yield informative results, careful refinement of these parameters can substantially improve both the quality of the fit and the physical realism of the resulting models.
To aid the user in making these decisions, we provide a Colab implementation of the setup, which features graphical interactivity and a guided walkthrough. The following are basic versions for both monomer and multimer cases.
⚠️ These notebooks are shared in view-only mode.
To use them, click “Copy to Drive” at the top of the Colab page.
This will create your own editable copy in your Google Drive.
You can then run the code directly in Colab, or download the fitting folders and scripts if you prefer to work locally.
Citation
If you use Carbonara in your research, please cite our preprint!
@article{carbonara2025,
title={Carbonara: A Rapid Method for SAXS-Based Refinement of Protein Structures},
author={McKeown, J. and Bale, A. and Brown, C. and Fisher, H. and Rambo, R. and Essex, J. and Degiacomi, M. and Prior, C.},
journal={ResearchSquare},
year={2025},
doi={10.21203/rs.3.rs-6447099/v1},
url={https://doi.org/10.21203/rs.3.rs-6447099/v1}
}
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

