========
autoprot
========

.. image:: https://img.shields.io/badge/License-GPLv3-blue.svg
   :target: https://www.gnu.org/licenses/gpl-3.0
   :alt: GNU General Public License 3.0

.. image:: https://img.shields.io/badge/operating%20system-Windows-orange
   :target: https://www.microsoft.com/en-us/windows
   :alt: Windows

.. image:: https://img.shields.io/github/last-commit/biosustain/autoprot
   :target: https://github.com/biosustain/autoprot
   :alt: Last commit

|
The autoprot pipeline allows for absolute quantification of proteins from raw mass spectrometry (MS) files in an automated manner.
The pipeline covers data analysis from both DIA and DDA methods, where a fully open-source option is available for DIA methods.
Raw data from labelled, label-free and standard-free approaches can be analysed with the pipeline.
The normalisation of peptide intensities into protein intensities is performed with seven different algorithms to identify the optimal algorithm for the current experiment.
The incorporated algorithms are Top3 (`Silva et al., 2006 <https://www.sciencedirect.com/science/article/pii/S1535947620315127>`_),
Top all (`Silva et al., 2006 <https://www.sciencedirect.com/science/article/pii/S1535947620315127>`_),
iBAQ (`Schwanhausser et al., 2011 <https://www.nature.com/articles/nature10098>`_),
APEX (`Lu et al., 2007 <https://www.nature.com/articles/nbt1270>`_),
NSAF (`Zybailov et al., 2006 <https://pubs.acs.org/doi/full/10.1021/pr060161n>`_),
LFAQ (`Chang et al., 2019 <https://pubs.acs.org/doi/full/10.1021/acs.analchem.8b03267>`_),
and xTop (`Mori et al., 2021 <https://www.embopress.org/doi/full/10.15252/msb.20209536>`_).
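
To illustrate what such an algorithm does, the sketch below implements the commonly cited idea behind Top3: the protein intensity is taken as the mean of its three most intense peptides. It is a minimal illustration, not the code used by autoprot. The function name ``top_n_intensity`` and the input layout (pairs of protein identifier and peptide intensity) are assumptions made for this example, and proteins with fewer than three peptides are simply averaged over the peptides available.

```python
from collections import defaultdict
from statistics import mean


def top_n_intensity(peptides, n=3):
    """Average the n most intense peptides of each protein.

    `peptides` is an iterable of (protein_id, peptide_intensity) pairs.
    This input layout is hypothetical and chosen for illustration only.
    """
    by_protein = defaultdict(list)
    for protein_id, intensity in peptides:
        by_protein[protein_id].append(intensity)

    result = {}
    for protein_id, intensities in by_protein.items():
        # Keep the n largest intensities (fewer if the protein has fewer peptides).
        top = sorted(intensities, reverse=True)[:n]
        result[protein_id] = mean(top)
    return result


if __name__ == "__main__":
    example = [("P1", 10.0), ("P1", 30.0), ("P1", 20.0), ("P1", 5.0), ("P2", 8.0)]
    print(top_n_intensity(example))
```

The other algorithms differ in which peptides they use and how they normalise, which is why the pipeline runs all seven and compares them.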
Install
-------

The required files can be downloaded from this GitHub repository with the following command:
::

    git clone git@github.com:biosustain/autoprot.git

Due to the many available options, the autoprot pipeline depends on several software tools and packages.
A list of all dependencies and their tested versions is provided below.
The ``autoprot.ps1`` script and multiple other scripts or executables have to be added to the PATH variable for the autoprot pipeline to work properly.
While file paths can be added to the PATH variable through the `command line <https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_environment_variables?view=powershell-7.2>`_,
on Windows one can also add to the PATH variable through the `graphical user interface <https://docs.oracle.com/en/database/oracle/machine-learning/oml4r/1.5.1/oread/creating-and-modifying-environment-variables-on-windows.html#GUID-DD6F9982-60D5-48F6-8270-A27EC53807D0>`_ (GUI).
To test whether the autoprot pipeline is set up properly, run it on the files in ``Examples\Input`` together with the `raw MS files <https://www.ebi.ac.uk/pride/archive/projects/PXD043377>`_ of the standard-free DIA analysis. Place the 9 ``.raw`` files in the ``Examples\Input`` folder first, then start a test run with the following command:

::

    autoprot.ps1 -osDIA -mode "directDIA" -approach "free" -InputDir "$PSScriptRoot\..\Examples\Input" -ExpName "test_run" -fasta "$PSScriptRoot\..\Examples\Input\URF_UP000000625_E_coli.fasta" -totalProt "$PSScriptRoot\..\Examples\Input\CPD_example.csv"

The output files can be verified against the files in ``Examples\Output``.

Dependencies
^^^^^^^^^^^^

===================  ======================  ============
Name                 Version                 Source
===================  ======================  ============
PowerShell 7         7.2.4                   https://learn.microsoft.com/en-us/powershell/scripting/install/installing-powershell-on-windows?view=powershell-7.2#installing-the-msi-package (Windows operating system has PowerShell 5.1 as default, however PowerShell 7.2 (or higher) is required alongside the default, so that additional functions can be accessed. The whole pipeline runs on 7.2 or up.)
Python               3.8.8 (or higher)       https://www.anaconda.com/ (Including numpy==1.20.1, pandas==1.2.4, statsmodels==0.12.2, matplotlib==3.3.4, Biopython==1.78. Add location of python.exe to PATH variable.)
Spectronaut          17 (17.3.230224.55965)  https://biognosys.com/software/spectronaut/ (Commercially available. Add location of spectronaut.exe to PATH variable.)
DIA-NN               1.8                     https://github.com/vdemichev/DiaNN (Open source.)
Proteome Discoverer  2.4.1.15                https://www.thermofisher.com/dk/en/home/industrial/mass-spectrometry/liquid-chromatography-mass-spectrometry-lc-ms/lc-ms-software/multi-omics-data-analysis/proteome-discoverer-software.html (Commercially available. Not actually part of the pipeline, since no command line tool is available.)
R (Rscript)          4.1.1 (or higher)       https://cran.r-project.org/bin/windows/base/ (Add location of rscript.exe to PATH variable.)
aLFQ                 1.3.5                   https://github.com/aLFQ/aLFQ (Add package to R.)
xTop                 1.2                     https://gitlab.com/mm87/xtop (Add location of xTop_pipeline.py to PATH variable.)
LFAQ                 1.0.0                   https://github.com/LFAQ/LFAQ (Add location of LFAQ executables to PATH variable.)
===================  ======================  ============

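
Since several of the dependencies above must be reachable through the PATH variable, a quick check before the first run can save time. The following Python sketch looks for a file name in each PATH directory. It is not part of autoprot: the helper names ``find_on_path`` and ``report_missing`` are hypothetical, and the default list of file names is an assumption that should be adapted to the options actually used (for example, ``spectronaut.exe`` is only relevant when Spectronaut is used).

```python
import os


def find_on_path(filename, path=None):
    """Return the first directory on `path` that contains `filename`, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


# Assumed selection of file names taken from the dependency list; adjust as needed.
REQUIRED_FILES = ["autoprot.ps1", "python.exe", "rscript.exe", "xTop_pipeline.py"]


def report_missing(filenames=REQUIRED_FILES, path=None):
    """Return the file names that were not found in any PATH directory."""
    return [name for name in filenames if find_on_path(name, path) is None]


if __name__ == "__main__":
    missing = report_missing()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed files were found on PATH.")
```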
Usage
-----

The autoprot.ps1 script can be executed in PowerShell 7 (when added to the PATH variable) as follows:
::

    autoprot.ps1 [args]

To access the autoprot help from the command line in PowerShell 7:
::

    Get-Help autoprot.ps1 -Full

When the ``autoprot.ps1`` script is located on a drive with restricted access, e.g. a network drive, and cannot be executed, the following command bypasses the execution policy for the current PowerShell session so that the script can run:

::

    Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process

The available arguments are:

- ``-osDIA`` [flag]: enables the open-source option for DIA analysis, which uses DIA-NN instead of Spectronaut.
- ``-mode`` [string] (mandatory): specify the acquisition mode as "DDA", "DIA" or "directDIA".
- ``-approach`` [string] (mandatory): specify the quantification approach as "label", "unlabel" or "free".
- ``-InputDir`` [directory] (mandatory): specify the input directory containing all input files with raw MS spectra. The output directory will be located in the input directory after the run.
- ``-ExpName`` [string] (mandatory): specify the name of the experiment.
- ``-fasta`` [file] (mandatory): specify the FASTA file with the proteome sequences.
- ``-totalProt`` [file] (mandatory): specify the file with the cellular protein density values for each sample.
- ``-DDAresultsFile`` [file] (mandatory for "DDA" mode): specify the file with the Proteome Discoverer Peptide Groups results.
- ``-SpecLib`` [file] (mandatory for "DIA" mode): specify the file with the spectral library for the "DIA" mode.
- ``-BGSfasta`` [file] (mandatory for "directDIA" mode with Spectronaut): specify the FASTA file in .BGSfasta format, which is required for the "directDIA" mode using Spectronaut (commercial).
- ``-ISconc`` [file] (mandatory for "label" and "unlabel" approaches): specify the file with the absolute concentrations of each standard peptide ("label" approach) or protein ("unlabel" approach).

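
Which arguments are mandatory depends on the chosen mode and approach. The Python sketch below encodes the rules from the argument descriptions as a small lookup, which may help when preparing a run. It is an illustration only and does not reproduce the checks inside ``autoprot.ps1``; the function name ``missing_arguments`` and its signature are hypothetical.

```python
VALID_MODES = {"DDA", "DIA", "directDIA"}
VALID_APPROACHES = {"label", "unlabel", "free"}


def missing_arguments(mode, approach, provided, os_dia=False):
    """Return the mandatory argument names that are absent from `provided`.

    `provided` is a collection of argument names without the leading dash,
    e.g. {"InputDir", "ExpName"}. `os_dia` mirrors the -osDIA flag.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if approach not in VALID_APPROACHES:
        raise ValueError(f"unknown approach: {approach!r}")

    # Always mandatory according to the argument list.
    required = {"InputDir", "ExpName", "fasta", "totalProt"}
    if mode == "DDA":
        required.add("DDAresultsFile")
    if mode == "DIA":
        required.add("SpecLib")
    if mode == "directDIA" and not os_dia:
        # Only needed when Spectronaut (not DIA-NN) performs the directDIA analysis.
        required.add("BGSfasta")
    if approach in ("label", "unlabel"):
        required.add("ISconc")
    return sorted(required - set(provided))


if __name__ == "__main__":
    common = {"InputDir", "ExpName", "fasta", "totalProt"}
    print(missing_arguments("directDIA", "free", common, os_dia=True))
    print(missing_arguments("DIA", "label", common))
```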
Specific input data
-------------------

Ensure that the headers in the FASTA file with the proteome sequences follow the official UniProt header format. An example FASTA file can be found in ``Examples\Input\URF_UP000000625_E_coli.fasta``.
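
As a rough pre-check, the Python sketch below flags header lines that do not start with the UniProt pattern ``>db|accession|entry_name description``, where ``db`` is ``sp`` or ``tr``. The regular expression is a simplified assumption about that format and does not validate the ``OS=``, ``OX=``, ``GN=``, ``PE=`` and ``SV=`` fields. The function name ``invalid_headers`` is hypothetical, and the header in the usage example is illustrative.

```python
import re

# Simplified pattern: database (sp/tr), accession, entry name, then a description.
UNIPROT_HEADER = re.compile(r"^>(sp|tr)\|([A-Z0-9]+(?:-\d+)?)\|(\S+)\s+(.+)$")


def invalid_headers(lines):
    """Return (line_number, header) pairs for headers not matching the pattern."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip()  # drop trailing newline and carriage return
        if line.startswith(">") and not UNIPROT_HEADER.match(line):
            bad.append((number, line))
    return bad


if __name__ == "__main__":
    example = [
        ">sp|P0A7G6|RECA_ECOLI Protein RecA OS=Escherichia coli (strain K12) OX=83333 GN=recA PE=1 SV=2",
        "MAIDENKQKA",
        ">my_protein custom header",
        "MKT",
    ]
    for number, header in invalid_headers(example):
        print(f"line {number}: {header}")
```

To check a real file, pass an open file handle, for example ``invalid_headers(open(path))``.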
All workflows in DIA and directDIA mode can be initialised from .RAW files (Thermo Fisher Scientific instrument specific - please open an issue if another type is required in combination with Spectronaut)
using either `Spectronaut <https://biognosys.com/software/spectronaut/>`_ (commercial; Biognosys AG, Schlieren, Switzerland)
or `DIA-NN <https://github.com/vdemichev/DiaNN>`_ (open source; `Demichev et al., 2019 <https://www.nature.com/articles/s41592-019-0638-x>`_).
Any workflow in DDA mode can be initialised from the PeptideGroups.csv output file of `Proteome Discoverer <https://www.thermofisher.com/dk/en/home/industrial/mass-spectrometry/liquid-chromatography-mass-spectrometry-lc-ms/lc-ms-software/multi-omics-data-analysis/proteome-discoverer-software.html>`_ (Thermo Fisher Scientific, Waltham, MA, USA).

To obtain the PeptideGroups.csv file from `Proteome Discoverer <https://www.thermofisher.com/dk/en/home/industrial/mass-spectrometry/liquid-chromatography-mass-spectrometry-lc-ms/lc-ms-software/multi-omics-data-analysis/proteome-discoverer-software.html>`_ results:

1. Open the .PDRESULTS file of the study in Proteome Discoverer.
2. Click on "File" -> "Export" -> "To Microsoft Excel", select "Peptide Groups" from the drop-down menu for level 1 and click on "Export".
3. Open the resulting file in Microsoft Excel and save it as a .CSV file with the name PeptideGroups.

For a workflow in directDIA mode using `Spectronaut <https://biognosys.com/software/spectronaut/>`_ (commercial; Biognosys AG, Schlieren, Switzerland),
a BGSfasta version of the FASTA file is required. This BGSfasta version can be obtained by loading the FASTA file with the proteome sequences into Spectronaut
as a protein database. The BGSfasta version of the FASTA file should then be located in the folder ``$HOME\Databases\Spectronaut\``.
The autoprot pipeline has two custom input files which are described below.
Cellular protein density ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The table with cellular protein density for each sample should have the following headers: Sample [string] with the name of each sample which should be the same as the names of the .RAW files and
CPD [float] with the c
