========
autoprot
========

.. image:: https://img.shields.io/badge/License-GPLv3-blue.svg
   :target: https://www.gnu.org/licenses/gpl-3.0
   :alt: GNU General Public License 3.0

.. image:: https://img.shields.io/badge/operating%20system-Windows-orange
   :target: https://www.microsoft.com/en-us/windows
   :alt: Windows

.. image:: https://img.shields.io/github/last-commit/biosustain/autoprot
   :target: https://github.com/biosustain/autoprot
   :alt: Last commit

|

The autoprot pipeline allows for absolute quantification of proteins from raw mass spectrometry (MS) files in an automated manner. The pipeline covers data analysis from both DIA and DDA methods, with a fully open-source option available for DIA methods. Raw data from labelled, label-free and standard-free approaches can be analysed with the pipeline. Peptide intensities are normalised into protein intensities with seven different algorithms, so that the optimal algorithm for the current experiment can be identified. The incorporated algorithms are Top3 (`Silva et al., 2006 <https://www.sciencedirect.com/science/article/pii/S1535947620315127>`_), Top all (`Silva et al., 2006 <https://www.sciencedirect.com/science/article/pii/S1535947620315127>`_), iBAQ (`Schwanhausser et al., 2011 <https://www.nature.com/articles/nature10098>`_), APEX (`Lu et al., 2007 <https://www.nature.com/articles/nbt1270>`_), NSAF (`Zybailov et al., 2006 <https://pubs.acs.org/doi/full/10.1021/pr060161n>`_), LFAQ (`Chang et al., 2019 <https://pubs.acs.org/doi/full/10.1021/acs.analchem.8b03267>`_), and xTop (`Mori et al., 2021 <https://www.embopress.org/doi/full/10.15252/msb.20209536>`_).

Install
-------

The required files can be downloaded from this GitHub repository with the following command:

::

    git clone git@github.com:biosustain/autoprot.git

Because of the many available options, the autoprot pipeline depends on several software tools and packages. All dependencies and their tested versions are listed below. The ``autoprot.ps1`` script and several other scripts and executables have to be added to the PATH variable for the autoprot pipeline to work properly. File paths can be added to the PATH variable `through the command line <https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_environment_variables?view=powershell-7.2>`_; on Windows, this can also be done `through the graphical user interface <https://docs.oracle.com/en/database/oracle/machine-learning/oml4r/1.5.1/oread/creating-and-modifying-environment-variables-on-windows.html#GUID-DD6F9982-60D5-48F6-8270-A27EC53807D0>`_ (GUI).

To test whether the autoprot pipeline is set up properly, the files in ``Examples\Input`` can be used in combination with the `raw MS files <https://www.ebi.ac.uk/pride/archive/projects/PXD043377>`_ of the standard-free DIA analysis (place the 9 .raw files in the ``Examples\Input`` folder first) for a test run with the following command:

::

    autoprot.ps1 -osDIA -mode "directDIA" -approach "free" -InputDir "$PSScriptRoot\..\Examples\Input" -ExpName "test_run" -fasta "$PSScriptRoot\..\Examples\Input\URF_UP000000625_E_coli.fasta" -totalProt "$PSScriptRoot\..\Examples\Input\CPD_example.csv"

The output files can be verified against the files in ``Examples\Output``.
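The verification step can be partly automated. The following is a minimal sketch in Python (already a dependency of the pipeline), not part of autoprot itself. It compares same-named files in two folders byte by byte; the function name ``compare_outputs`` and both folder arguments are assumptions made for illustration, since the name of the output folder created by the run is not stated here.

.. code-block:: python

    import filecmp
    from pathlib import Path


    def compare_outputs(produced_dir, reference_dir):
        """Compare same-named files in two folders byte by byte.

        Returns a dict with three sorted lists of file names:
        'identical', 'different' and 'missing' (not readable in produced_dir).
        """
        reference = Path(reference_dir)
        names = sorted(p.name for p in reference.iterdir() if p.is_file())
        # shallow=False forces a content comparison instead of a stat() comparison
        match, mismatch, errors = filecmp.cmpfiles(
            str(produced_dir), str(reference), names, shallow=False
        )
        return {
            "identical": sorted(match),
            "different": sorted(mismatch),
            "missing": sorted(errors),
        }

A byte-level comparison is strict: differences in line endings or number formatting are reported as ``different`` even when the values agree, so a reported mismatch is a prompt to inspect the file rather than proof of a faulty installation.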

Dependencies
^^^^^^^^^^^^

=================== ====================== ============
Name                Version                Source
=================== ====================== ============
PowerShell 7        7.2.4                  https://learn.microsoft.com/en-us/powershell/scripting/install/installing-powershell-on-windows?view=powershell-7.2#installing-the-msi-package (Windows operating system has PowerShell 5.1 as default, however PowerShell 7.2 (or higher) is required alongside the default, so that additional functions can be accessed. The whole pipeline runs on 7.2 or up.)
Python              3.8.8 (or higher)      https://www.anaconda.com/ (Including numpy==1.20.1, pandas==1.2.4, statsmodels==0.12.2, matplotlib==3.3.4, Biopython==1.78. Add location of python.exe to PATH variable.)
Spectronaut         17 (17.3.230224.55965) https://biognosys.com/software/spectronaut/ (Commercially available. Add location of spectronaut.exe to PATH variable.)
DIA-NN              1.8                    https://github.com/vdemichev/DiaNN (Open source.)
Proteome Discoverer 2.4.1.15               https://www.thermofisher.com/dk/en/home/industrial/mass-spectrometry/liquid-chromatography-mass-spectrometry-lc-ms/lc-ms-software/multi-omics-data-analysis/proteome-discoverer-software.html (Commercially available. Not actually part of the pipeline, since no command line tool is available.)
R (Rscript)         4.1.1 (or higher)      https://cran.r-project.org/bin/windows/base/ (Add location of rscript.exe to PATH variable.)
aLFQ                1.3.5                  https://github.com/aLFQ/aLFQ (Add package to R.)
xTop                1.2                    https://gitlab.com/mm87/xtop (Add location of xTop_pipeline.py to PATH variable.)
LFAQ                1.0.0                  https://github.com/LFAQ/LFAQ (Add location of LFAQ executables to PATH variable.)
=================== ====================== ============
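Whether the PATH entries are in place can be checked before the first run. The sketch below is an illustration in Python, not part of autoprot: the helper ``find_on_path`` is a hypothetical name, and the file names it looks for are the ones mentioned in the dependency notes above. The LFAQ executables are not named there, so they are left out.

.. code-block:: python

    import os


    def find_on_path(file_names, path=None):
        """Map each file name to the first PATH folder containing it, or None."""
        if path is None:
            path = os.environ.get("PATH", "")
        folders = [folder for folder in path.split(os.pathsep) if folder]
        found = {}
        for name in file_names:
            found[name] = next(
                (f for f in folders if os.path.isfile(os.path.join(f, name))),
                None,
            )
        return found


    if __name__ == "__main__":
        # File names as written in the dependency notes (assumed to be exact)
        required = [
            "autoprot.ps1",
            "python.exe",
            "spectronaut.exe",
            "rscript.exe",
            "xTop_pipeline.py",
        ]
        for name, folder in find_on_path(required).items():
            print(f"{name}: {folder or 'NOT FOUND on PATH'}")

A missing ``spectronaut.exe`` is only a problem when Spectronaut is actually used; runs with the ``-osDIA`` flag rely on DIA-NN instead.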

Usage
-----

The autoprot.ps1 script can be executed in PowerShell 7 (when added to the PATH variable) as follows:

::

    autoprot.ps1 [args]

To access the autoprot help from the command line in PowerShell 7:

::

    Get-Help autoprot.ps1 -Full

When the ``autoprot.ps1`` script is located on a drive with restricted access, e.g. a network drive, and cannot be executed, the following command allows it to run by bypassing the execution policy for the current PowerShell session only:

::

    Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process

The available arguments are:

``-osDIA`` [flag]
    Enables the open-source option for DIA analysis, which uses DIA-NN instead of Spectronaut.

``-mode`` [string] (mandatory)
    Specify the acquisition mode as "DDA", "DIA" or "directDIA".

``-approach`` [string] (mandatory)
    Specify the quantification approach as "label", "unlabel" or "free".

``-InputDir`` [directory] (mandatory)
    Specify the input directory containing all input files with raw MS spectra. The output directory will be located in the input directory after the run.

``-ExpName`` [string] (mandatory)
    Specify the name of the experiment.

``-fasta`` [file] (mandatory)
    Specify the FASTA file with the proteome sequences.

``-totalProt`` [file] (mandatory)
    Specify the file with the cellular protein density values for each sample.

``-DDAresultsFile`` [file] (mandatory for "DDA" mode)
    Specify the file with the Proteome Discoverer Peptide Groups results.

``-SpecLib`` [file] (mandatory for "DIA" mode)
    Specify the file with the spectral library.

``-BGSfasta`` [file] (mandatory for "directDIA" mode with Spectronaut)
    Specify the FASTA file in .BGSfasta format, which is required when Spectronaut (commercial) is used.

``-ISconc`` [file] (mandatory for "label" and "unlabel" approaches)
    Specify the file with the absolute concentrations of each standard peptide ("label" approach) or protein ("unlabel" approach).
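The conditional requirements in the argument list can be summarised as a small decision rule. The Python sketch below only restates the rules listed above and is not taken from ``autoprot.ps1``; the function name ``missing_arguments`` and its parameters are hypothetical.

.. code-block:: python

    def missing_arguments(mode, approach, os_dia=False, provided=()):
        """Return the mandatory arguments (without leading dash) not in `provided`."""
        if mode not in ("DDA", "DIA", "directDIA"):
            raise ValueError(f"unknown mode: {mode!r}")
        if approach not in ("label", "unlabel", "free"):
            raise ValueError(f"unknown approach: {approach!r}")

        # Always mandatory
        required = ["InputDir", "ExpName", "fasta", "totalProt"]
        if mode == "DDA":
            required.append("DDAresultsFile")
        elif mode == "DIA":
            required.append("SpecLib")
        elif not os_dia:
            # directDIA with Spectronaut (no -osDIA flag) needs the .BGSfasta file
            required.append("BGSfasta")
        if approach in ("label", "unlabel"):
            required.append("ISconc")
        return [name for name in required if name not in provided]


    base = ["InputDir", "ExpName", "fasta", "totalProt"]
    print(missing_arguments("directDIA", "free", os_dia=True, provided=base))  # []
    print(missing_arguments("DIA", "label", provided=base))  # ['SpecLib', 'ISconc']

The first call corresponds to the test run from the Install section, which needs no further arguments; the second shows a spectral-library run with labelled standards that still lacks two files.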

Specific input data
-------------------

Ensure that the headers of the FASTA file with the proteome sequences follow the official UniProt format. An example FASTA file can be found in ``Examples\Input\URF_UP000000625_E_coli.fasta``.
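A quick pre-check of the headers can catch formatting problems before a run. The sketch below assumes the common UniProt layout ``>db|accession|entry_name description``, with ``db`` being ``sp`` or ``tr``. Which header fields autoprot actually parses is not stated here, so the pattern is an approximation, and ``nonconforming_headers`` is a hypothetical helper name.

.. code-block:: python

    import re

    # Approximate UniProt header: >sp|ACCESSION|ENTRY_NAME description ...
    UNIPROT_HEADER = re.compile(r"^>(sp|tr)\|[A-Z0-9]+(-\d+)?\|\w+ \S")


    def nonconforming_headers(fasta_lines):
        """Return the header lines that do not match the UniProt-style pattern."""
        return [
            line.rstrip("\n")
            for line in fasta_lines
            if line.startswith(">") and not UNIPROT_HEADER.match(line)
        ]

Passing an open file handle works, because the function only iterates over lines, e.g. ``nonconforming_headers(open("proteome.fasta"))`` with a placeholder file name. A made-up header such as ``>sp|P12345|EXMP_ECOLI Example protein OS=Escherichia coli OX=562 PE=1 SV=1`` passes the check, whereas ``>P12345 Example protein`` is reported.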

All workflows in DIA and directDIA mode can be initialised from .RAW files (Thermo Fisher Scientific instrument specific; please open an issue if another type is required in combination with Spectronaut) using either `Spectronaut <https://biognosys.com/software/spectronaut/>`_ (commercial; Biognosys AG, Schlieren, Switzerland) or `DIA-NN <https://github.com/vdemichev/DiaNN>`_ (open source; `Demichev et al., 2019 <https://www.nature.com/articles/s41592-019-0638-x>`_).

Any workflow in DDA mode can be initialised from the ``PeptideGroups.csv`` output file of `Proteome Discoverer <https://www.thermofisher.com/dk/en/home/industrial/mass-spectrometry/liquid-chromatography-mass-spectrometry-lc-ms/lc-ms-software/multi-omics-data-analysis/proteome-discoverer-software.html>`_ (Thermo Fisher Scientific, Waltham, MA, USA). To obtain the ``PeptideGroups.csv`` file from Proteome Discoverer results:

#. Open the .PDRESULTS file of the study in Proteome Discoverer.
#. Click on "File" -> "Export" -> "To Microsoft Excel".
#. Select "Peptide Groups" from the drop-down menu for level 1 and click on "Export".
#. Open the resulting file in Microsoft Excel and save it as a .CSV file with the name PeptideGroups.

For a workflow in directDIA mode using `Spectronaut <https://biognosys.com/software/spectronaut/>`_ (commercial; Biognosys AG, Schlieren, Switzerland), a BGSfasta version of the FASTA file is required. This version can be obtained by loading the FASTA file with the proteome sequences into Spectronaut as a protein database. The BGSfasta version of the FASTA file should then be in the folder ``$HOME\Databases\Spectronaut\``.
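To find the generated file for the ``-BGSfasta`` argument, the database folder can be listed. This is a minimal Python sketch, assuming that Python's ``Path.home()`` points to the same folder as PowerShell's ``$HOME`` (usually the case on Windows); ``find_bgsfasta`` is a hypothetical helper name.

.. code-block:: python

    from pathlib import Path


    def find_bgsfasta(database_dir=None):
        """List the .BGSfasta files in the Spectronaut database folder."""
        if database_dir is None:
            # Mirrors $HOME\Databases\Spectronaut\ from the text above
            database_dir = Path.home() / "Databases" / "Spectronaut"
        folder = Path(database_dir)
        if not folder.is_dir():
            return []
        # Compare the extension case-insensitively (.BGSfasta, .bgsfasta, ...)
        return sorted(
            p for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() == ".bgsfasta"
        )

An empty list means that either the folder does not exist or the FASTA file has not yet been loaded into Spectronaut as a protein database.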

The autoprot pipeline has two custom input files which are described below.

Cellular protein density ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The table with cellular protein density for each sample should have the following headers: Sample [string] with the name of each sample which should be the same as the names of the .RAW files and CPD [float] with the c
