premessa
premessa is an R package for pre-processing of flow and mass cytometry data, that includes panel editing/renaming for FCS files, bead-based normalization and debarcoding.
Copyright 2016. Parker Institute for Cancer Immunotherapy
---> Make sure to have a backup copy of your data before you use the software! <---
New in version 0.3.0:
- Added UI for file concatenation under the normalizer GUI
- Much faster debarcoding. Note that for the purpose of debarcoder plotting, data will now be downsampled to 100000 events. This means that absolute cell numbers in the plots will not reflect the absolute cell numbers in the final data (but the ratios and trends will be correct). The final debarcoded data will always include all events
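The effect of downsampling on the plots can be illustrated with a small base R sketch. This is not premessa's code: the event table, the barcode labels and the plotting sample size of 1000 are made up for illustration (premessa itself uses 100000 events).

```r
set.seed(42)

# Hypothetical data: 20000 events assigned to three barcode populations
barcodes <- c("BC1", "BC2", "BC3")
events <- data.frame(
  barcode = sample(barcodes, 20000, replace = TRUE, prob = c(0.5, 0.3, 0.2))
)

# Downsample for plotting only; the full table is left untouched
n.plot <- 1000
plot.idx <- sample(nrow(events), min(n.plot, nrow(events)))
plot.events <- events[plot.idx, , drop = FALSE]

# Fraction of events in each barcode population
proportions <- function(x) sapply(barcodes, function(b) mean(x == b))
full.prop <- proportions(events$barcode)
plot.prop <- proportions(plot.events$barcode)

# Absolute counts shrink, but the proportions stay close to the full data
print(round(rbind(full = full.prop, downsampled = plot.prop), 3))
```

The counts in `plot.events` are about 20 times smaller than in `events`, while the relative sizes of the populations are approximately preserved.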
Installation
Install required R packages
You need to install the devtools package, available from CRAN, and the flowCore package from Bioconductor. The rest of the dependencies for premessa will be automatically installed
devtools
Open an R session, type the following command and select a CRAN mirror when prompted.
install.packages("devtools")
flowCore
Open an R session and type the following commands
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("flowCore")
Install premessa
Start an R session and type the following commands
library(devtools)
install_github("ParkerICI/premessa")
This will install the premessa R package together with all the required dependencies.
Note: the latest version of devtools seems to occasionally have problems installing dependencies on Windows. If the installation of premessa fails because of a missing package, please install the offending packages manually, using the R install.packages function.
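If several packages are reported as missing, a small helper can find out which ones still need to be installed. This is a minimal sketch using base R only; the helper name `missing_packages` and the example package list are hypothetical and should be replaced with the packages named in your error message.

```r
# Return the subset of `pkgs` that is not currently installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical example list; replace it with the packages named in the
# installation error
wanted <- c("stats", "utils")
to.install <- missing_packages(wanted)

# Only call install.packages if something is actually missing
if (length(to.install) > 0) {
  install.packages(to.install)
}
```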
Usage
The software allows you to perform four operations:
- Panel editing and renaming
- FCS file concatenation
- Bead-based normalization
- Debarcoding
For each operation there is a separate GUI, and an associated set of R functions that can be called without using the GUI, if desired.
Panel editing and renaming
premessa includes a component for editing and renaming the panel of a set of FCS files. This is useful when you need to harmonize panels across a number of files, so that they can be prepared for downstream analysis (most analysis tools expect files that are part of the same analysis to have identical panels).
A few warnings. premessa is opinionated in the way it handles the information in the FCS panels, and is specifically tailored to handle the most common use case. The FCS file specification is problematic in many ways, and most instrument and analysis software packages do not use or interpret the information correctly anyway. There are three ways to refer to a channel in an FCS file: by number (i.e. the order in which the channels appear in the file), by name (e.g. Dy161Di) and by description (e.g. CD3). premessa uses the name as the unique channel identifier for two reasons:
- it is guaranteed to be unique in a valid FCS file
- it minimizes the risk of confusion when matching channels between multiple FCS files, as it corresponds to the intuitive notion of matching channels based on their identity instead of their ordering.
The consequence of this choice is that the ordering of the channels is not preserved during processing. Also, at present premessa only preserves the name and description parameter keywords (i.e. $PnN and $PnS). All other parameter keywords (e.g. $PnG, $PnL, $PnO etc.) are discarded. Most of these keywords are used incorrectly anyway, but please feel free to open an issue if this is impacting your workflow.
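The difference between matching channels by position and by name can be seen in a small base R sketch. The two panels below are hypothetical, and plain named vectors stand in for the $PnN and $PnS keywords; this is an illustration of the idea, not premessa's internal code.

```r
# Hypothetical panels: names are channel names ($PnN), values are
# descriptions ($PnS). The two files store the channels in a different order.
panel.A <- c(Dy161Di = "CD3", Er166Di = "CD4", Yb176Di = "CD8")
panel.B <- c(Yb176Di = "CD8", Dy161Di = "CD3", Er166Di = "CD4")

# Matching by position pairs up unrelated channels
by.position <- data.frame(A = unname(panel.A), B = unname(panel.B))

# Matching by name lines the channels up regardless of their order
channels <- names(panel.A)
by.name <- data.frame(A = unname(panel.A[channels]),
                      B = unname(panel.B[channels]),
                      row.names = channels)

print(by.position)
print(by.name)
```

In `by.name` every row holds the same description in both files, whereas `by.position` pairs CD3 with CD8 simply because the files list their channels differently.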
Starting the GUI and selecting the working directory
You can start the panel editor GUI by typing the following commands in your R session
library(premessa)
paneleditor_GUI()
This will open a new web browser window, which is used for displaying the GUI. Upon starting, a file selection window will also appear from your R session. You should use this window to navigate to the directory containing the data you want to analyze, and select any file in that directory. The directory itself will then become the working directory for the software.
To stop the software simply hit the "ESC" key in your R session.
Usage
Once you have selected the working directory, the software will extract the panel information from all the FCS files contained in the directory. This information is then displayed in a table, where each row corresponds to a different parameter name ($PnN keyword), indicated by the row names (leftmost column), and each column corresponds to a different file, indicated in the column header. Each cell represents the description string ($PnS keyword) of a specific parameter in a given file. If a parameter is missing from a file, the word absent is displayed in the corresponding cell, which will be colored orange (note that this means that absent cannot be a valid parameter name)
Whatever is written in the table when the Process files button is pressed represents what the parameters will be renamed to. In other words, the table initially shows the current state of the files, and you have to edit the individual cells as necessary to reflect the desired final state of the files. You can use the same shortcuts you use in Excel to facilitate the editing process (e.g. shift-click to select multiple rows or columns, ctrl-C and ctrl-V for copy and paste respectively, etc.). However, be careful: pressing ctrl-Z (the conventional undo shortcut) will undo all your changes.
The table begins with three special columns:
- Remove: if the box is checked the corresponding parameter is removed from all the files, and the row is grayed out
- Parameter: this column represents the parameter name ($PnN keyword). Initially it will be identical to the first column of row names, but this column is editable. You can edit this column if you want to change the parameter names in the output files.
- Most common: this column shows the most common description value for that parameter across all the files under analysis (i.e. the most common string across the row). Cells whose value differs from the value indicated in this column are displayed with a light pink background.
The table columns are sorted by the number of problematic cells they contain, i.e. by the number of pink and orange cells in the column. The first three columns of the table are fixed and always visible when you scroll the table horizontally. Please note that the browser included with the current version of RStudio seems to have a problem where the column headers do not scroll correctly. If that is the case, open the application in a regular web browser by clicking on the "open browser" button in the top right corner of the RStudio browser.
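The table logic described above (absent cells, the Most common column, highlighted cells and column sorting) can be sketched in base R. The panels below are hypothetical, and two details are assumptions made for this sketch rather than documented behavior: absent cells are ignored when computing the most common value, and ties in the column sort keep their original order.

```r
# Hypothetical panels keyed by channel name ($PnN); values are descriptions ($PnS)
panels <- list(
  A.fcs = c(Dy161Di = "CD3", Er166Di = "CD4", Yb176Di = "CD8"),
  B.fcs = c(Dy161Di = "CD3", Er166Di = "cd4"),
  C.fcs = c(Dy161Di = "CD3", Er166Di = "CD4", Yb176Di = "CD8")
)

all.names <- sort(unique(unlist(lapply(panels, names))))

# One row per parameter name, one column per file; missing parameters are "absent"
tab <- sapply(panels, function(p) {
  desc <- p[all.names]
  desc[is.na(desc)] <- "absent"
  unname(desc)
})
rownames(tab) <- all.names

# Most common description per row (assumption: "absent" cells are ignored)
most.common <- apply(tab, 1, function(row) {
  counts <- table(row[row != "absent"])
  names(counts)[which.max(counts)]
})

# Cells that differ from the most common value (pink in the GUI)
differs <- tab != most.common & tab != "absent"

# Sort the file columns by their number of problematic (pink or orange) cells
problem.count <- colSums(differs | tab == "absent")
sorted.tab <- tab[, order(-problem.count), drop = FALSE]

print(cbind(most.common, sorted.tab))
```

Here B.fcs ends up in the first file column, because it contains both a mismatching description (cd4) and an absent parameter.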
Two controls are located at the top of the table
- Output folder name: a text box where you can input the name of the output folder. If this folder does not exist, it will be created as a sub-folder of the current working directory.
- Process files: this button will start file processing. A file will be created in the output folder, with the same name as the original input file. If a file of the same name exists already, it will be overwritten. No change is made to the original files.
FCS file concatenation
The premessa package contains a simple function for concatenating multiple FCS files together. This is useful when the acquisition of a single sample has been split across multiple files. The function is called concatenate_fcs_files and its documentation can be accessed directly from R. Note that this function assumes that the files are all identical panel-wise, i.e. they have the same parameter names and descriptions. Please use the panel editor before concatenation if that is not the case.
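The requirement that all files share the same panel can be illustrated with a base R sketch. It does not call concatenate_fcs_files: plain matrices stand in for the event data of two FCS files, the helper name `concatenate_events` is hypothetical, and only parameter names (not descriptions) are checked.

```r
# Hypothetical stand-ins for the event matrices of two FCS files
# (rows are events, columns are parameters named by $PnN)
part1 <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("Dy161Di", "Er166Di")))
part2 <- matrix(7:10, ncol = 2, dimnames = list(NULL, c("Dy161Di", "Er166Di")))

# Concatenate event matrices, refusing to proceed if the panels differ
concatenate_events <- function(...) {
  parts <- list(...)
  ref <- colnames(parts[[1]])
  same.panel <- vapply(parts, function(m) identical(colnames(m), ref), logical(1))
  if (!all(same.panel)) {
    stop("Panels differ between files; harmonize them before concatenating")
  }
  do.call(rbind, parts)
}

combined <- concatenate_events(part1, part2)
print(dim(combined))
```

Stopping on a panel mismatch, rather than silently reordering columns, mirrors the advice to run the panel editor first.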
Bead-based normalization
The idea behind the method is described in this publication. This software represents an R re-implementation of the original normalization software developed for Matlab.
Sample data for testing is available here (only download the FCS in the top level directory, not the contents of the beads and normed sub-folders).
Inputs and outputs
The normalization workflow involves the following steps:
- Beads identification through gating
- Data normalization
- Beads removal (optional)
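To give an intuition for the normalization step, here is a deliberately simplified base R sketch. It is not the published algorithm nor premessa's implementation: it assumes the beads have already been identified, uses made-up intensities for a single channel, and applies one scaling factor per file.

```r
set.seed(1)

# Hypothetical bead intensities in one channel for two files; file B was
# acquired with roughly 20% lower sensitivity
beads <- list(A = rnorm(500, mean = 1000, sd = 50),
              B = rnorm(500, mean = 800, sd = 40))

# Scale each file so that its bead median matches a common baseline
bead.medians <- vapply(beads, median, numeric(1))
baseline <- mean(bead.medians)
scale.factors <- baseline / bead.medians

# In a real workflow the same factor would be applied to all events of the
# file, not only to the beads
normalized <- Map(function(x, f) x * f, beads, scale.factors)
normalized.medians <- vapply(normalized, median, numeric(1))

print(rbind(before = bead.medians, after = normalized.medians))
```

After scaling, the bead medians of the two files coincide, which is the property that makes the samples comparable.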
Assuming the working directory is called working_directory and contains two FCS files called A.fcs and B.fcs, at the end of the workflow the following directory structure and output files will be generated
working_directory
|--- A.fcs
|--- B.fcs
|--- normed
     |--- A_normalized.fcs
     |--- B_normalized.fcs
     |--- beads_before_and_after.pdf
     |--- beads_gates.json
     |--- beads_vs_time
          |--- A.pdf
          |--- B.pdf
     |--- beads_removed
          |--- A_normalized_beadsremoved.fcs
          |--- B_normalized_beadsremoved.fcs
          |--- removed_events
               |--- A_normalized_removedEvents.fcs
               |--- B_normalized_removedEvents.fcs
     |--- beads
          |--- A_beads.fcs
          |--- B_beads.fcs
- A_normalized.fcs: contains the normalized data, with an added parameter called beadDist representing the square root of the Mahalanobis distance of each event from the centroid of the beads population
- beads_before_and_after.pdf: a plot of the median intensities of the beads channels before and after normalization. This plot contains a single median value per sample. Therefore it will not be informative if you are normalizing a single sample
- beads_gates.json: this file contains the gates that were used to iden
