premessa

premessa is an R package for pre-processing flow and mass cytometry data. It includes panel editing/renaming for FCS files, bead-based normalization, and debarcoding.

---> Make sure to have a backup copy of your data before you use the software! <---

New in version 0.3.0:

Added UI for file concatenation under the normalizer GUI.
Much faster debarcoding. Note that for debarcoder plotting, data is now downsampled to 100000 events, so absolute cell numbers in the plots will not match the absolute cell numbers in the final data (the ratios and trends will still be correct). The final debarcoded data always includes all events.
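To see why downsampling for plotting preserves ratios but not absolute counts, here is a minimal base R sketch. The barcode labels, population sizes and variable names are made up for illustration; this is not the code premessa uses internally.

```r
set.seed(42)

# Hypothetical data: 500000 events assigned to three made-up barcode populations
labels <- sample(c("BC1", "BC2", "BC3"), size = 500000, replace = TRUE,
                 prob = c(0.5, 0.3, 0.2))

# Downsample to at most 100000 events, mirroring the plotting limit
max_events <- 100000
keep <- sample(length(labels), size = min(max_events, length(labels)))
plot_labels <- labels[keep]

# Absolute counts shrink, but the proportions stay close to the full data
full_prop <- c(prop.table(table(labels)))
plot_prop <- c(prop.table(table(plot_labels)))
print(round(rbind(full = full_prop, downsampled = plot_prop), 3))
```

The counts in the downsampled set are roughly one fifth of the originals, while the per-population fractions stay close to those of the full data.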

Installation

Install required R packages

You need to install the devtools package, available from CRAN, and the flowCore package from Bioconductor. The rest of the dependencies for premessa will be installed automatically.

devtools

Open an R session, type the following command and select a CRAN mirror when prompted.

install.packages("devtools")

flowCore

Open an R session and type the following commands

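# biocLite() is no longer supported by current Bioconductor releases; use BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("flowCore")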

Install premessa

Start an R session and type the following commands

library(devtools)
install_github("ParkerICI/premessa")

This will install the premessa R package together with all the required dependencies.

Note: the latest version of devtools occasionally has problems installing dependencies on Windows. If the installation of premessa fails due to a missing package, please install the missing packages manually, using the R install.packages function.
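If you run into this, a small helper like the one below can tell you which packages still need installing. This is only a sketch: find_missing_packages is a hypothetical helper (not part of premessa or devtools), and the package names in the example are placeholders for whatever the error message reports.

```r
# Return the packages from `pkgs` that cannot be loaded from the current library
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Placeholder list: replace with the package names reported in the error message
reported <- c("stats", "utils")
missing_pkgs <- find_missing_packages(reported)

# Install only what is actually missing
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

After the missing packages are installed, run install_github("ParkerICI/premessa") again.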

Usage

The software allows you to perform four operations:

Panel editing and renaming
FCS file concatenation
Bead-based normalization
De-barcoding

For each operation there is a separate GUI, and an associated set of R functions that can be called without using the GUI, if desired.

Panel editing and renaming

premessa includes a component for editing and renaming the panel of a set of FCS files. This is useful when you need to harmonize panels across a number of files, so that they can be prepared for downstream analysis (most analysis tools expect files that are part of the same analysis to have identical panels).

A few warnings. premessa is opinionated in the way it handles the information in the FCS panels, and is specifically tailored to the most common use case. The FCS file specification is problematic in many ways, and most instrument and analysis software packages do not use or interpret the information correctly anyway. There are three ways to refer to a channel in an FCS file: by number (i.e. the order in which the channels appear in the file), by name (e.g. Dy161Di), and by description (e.g. CD3). premessa uses the name as the unique channel identifier for two reasons:

it is guaranteed to be unique in a valid FCS file
it minimizes the risk of confusion when matching channels between multiple FCS files, as it corresponds to the intuitive notion of matching channels based on their identity instead of their ordering.

The consequence of this choice is that the ordering of the channels is not preserved during processing. Also, at present premessa only preserves the name and description parameter keywords (i.e. $PnN and $PnS). All other parameter keywords (e.g. $PnG, $PnL, $PnO) are discarded. Most of these keywords are used incorrectly anyway, but please feel free to open an issue if this is impacting your workflow.
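The difference between matching channels by position and by name can be illustrated with a short base R sketch. The two panels below are hypothetical (only Dy161Di and CD3 come from the text above), and this is not premessa's internal code.

```r
# Two hypothetical panels: parameter name ($PnN) -> description ($PnS).
# The same channels appear in a different order in each file.
panel_a <- c(Dy161Di = "CD3", Er167Di = "CD4", Yb176Di = "CD8")
panel_b <- c(Yb176Di = "CD8", Dy161Di = "CD3", Er167Di = "CD4")

# Matching by position pairs up unrelated channels
by_position <- names(panel_a) == names(panel_b)

# Matching by name finds the right channel regardless of its position
idx <- match(names(panel_a), names(panel_b))
by_name <- panel_a == panel_b[idx]

print(by_position)
print(unname(by_name))
```

Positional matching reports no agreement between the two files, while name-based matching shows that the panels are in fact identical.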

Starting the GUI and selecting the working directory

You can start the panel editor GUI by typing the following commands in your R session

library(premessa)
paneleditor_GUI()

This will open a new web browser window, which is used for displaying the GUI. Upon starting, a file selection window will also appear from your R session. You should use this window to navigate to the directory containing the data you want to analyze, and select any file in that directory. The directory itself will then become the working directory for the software.

To stop the software simply hit the "ESC" key in your R session.

Usage

Once you have selected the working directory, the software will extract the panel information from all the FCS files contained in the directory. This information is then displayed in a table, where each row corresponds to a different parameter name ($PnN keyword), indicated by the row names (leftmost column), and each column corresponds to a different file, indicated in the column header. Each cell represents the description string ($PnS keyword) of a specific parameter in a given file. If a parameter is missing from a file, the word absent is displayed in the corresponding cell, which will be colored orange (note that this means that absent cannot be a valid parameter name)
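The layout of this table can be sketched in base R as follows. The panels are hypothetical, and the code only mimics the structure described above (one row per parameter name, one column per file, absent for missing parameters); it does not read FCS files.

```r
# Hypothetical panels for two files: parameter name ($PnN) -> description ($PnS)
panels <- list(
  "A.fcs" = c(Dy161Di = "CD3", Er167Di = "CD4"),
  "B.fcs" = c(Dy161Di = "CD3_v2", Yb176Di = "CD8")
)

# One row per parameter name found in any file, one column per file
all_names <- sort(unique(unlist(lapply(panels, names))))
panel_table <- sapply(panels, function(p) {
  descr <- unname(p[all_names])      # NA where the parameter is missing
  descr[is.na(descr)] <- "absent"
  descr
})
rownames(panel_table) <- all_names
print(panel_table)
```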

Whatever is written in the table when the Process files button is pressed represents what the parameters will be renamed to. In other words, the table initially shows the current state of the files, and you have to edit the individual cells as necessary to reflect the desired final state of the files. You can use the same shortcuts you use in Excel to facilitate the editing process (e.g. shift-click to select multiple rows or columns, ctrl-C and ctrl-V for copy and paste respectively, etc.). However, be careful: pressing ctrl-Z (the conventional undo shortcut) will undo all your changes.

The table begins with three special columns:

Remove: if the box is checked the corresponding parameter is removed from all the files, and the row is grayed out
Parameter: this column represents the parameter name ($PnN keyword). Initially it will be identical to the first column of row names, but this column is editable. You can edit this column if you want to change the parameter names in the output files.
Most common: this column indicates what is the most common description value for that parameter, across all the files under analysis (i.e. the most common string across the row). Cells whose value differs from the value indicated in this column are displayed with a light pink background.
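As a rough illustration of the Most common column, the sketch below computes the most frequent description in each row and flags the cells that differ from it. The table contents are made up, and how premessa itself breaks ties or treats absent cells is not covered here.

```r
# Hypothetical description table: rows are parameters, columns are files
descr <- rbind(
  Dy161Di = c("CD3", "CD3", "cd3"),
  Er167Di = c("CD4", "CD4", "CD4")
)
colnames(descr) <- c("A.fcs", "B.fcs", "C.fcs")

# Most common description in each row
most_common <- apply(descr, 1, function(row) names(which.max(table(row))))

# TRUE where a cell differs from the most common value of its row
# (the comparison recycles most_common down the rows of each column)
differs <- descr != most_common
print(most_common)
print(differs)
```

In this example only the cd3 entry for C.fcs is flagged, which corresponds to a cell that would be shown with a pink background.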

The table columns are sorted by the number of problematic cells, i.e. by the number of pink and orange cells in the column. The first three columns of the table are fixed and always visible when you scroll the table horizontally. Please note that the browser included with the current version of RStudio seems to have a problem where the column headers do not scroll correctly. If that is the case, open the application in a regular web browser by clicking on the "open browser" button in the top right corner of the RStudio browser.
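The column ordering can be sketched in a few lines of base R. The flags below are hypothetical, and putting the files with the most problems first is an assumption made for this example; the sort direction is not stated above.

```r
# Hypothetical flags: TRUE marks a pink or orange cell
# (rows are parameters, columns are files)
problem <- cbind(
  "A.fcs" = c(FALSE, FALSE, FALSE),
  "B.fcs" = c(TRUE,  TRUE,  FALSE),
  "C.fcs" = c(TRUE,  FALSE, FALSE)
)

# Count problematic cells per file and order the files by that count
# (descending order is an assumption for this sketch)
n_problems <- colSums(problem)
column_order <- names(sort(n_problems, decreasing = TRUE))
print(column_order)
```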

Two controls are located at the top of the table

Output folder name: a text box where you can input the name of the output folder. If this folder does not exist, it will be created as a sub-folder of the current working directory.
Process files: this button will start file processing. A file will be created in the output folder, with the same name as the original input file. If a file of the same name exists already, it will be overwritten. No change is made to the original files.
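The folder and file naming rules above can be sketched in base R. The folder name renamed and the temporary working directory are placeholders chosen for this example, and the code only builds paths; it does not process FCS files.

```r
# Placeholder working directory containing one (empty) input file
working_dir <- file.path(tempdir(), "working_directory")
dir.create(working_dir, showWarnings = FALSE)
input_file <- file.path(working_dir, "A.fcs")
file.create(input_file)

# The output folder is a sub-folder of the working directory,
# created only if it does not exist yet
output_dir <- file.path(working_dir, "renamed")
if (!dir.exists(output_dir)) {
  dir.create(output_dir)
}

# The output keeps the input file name; an existing file at this path
# would be overwritten, while the input file is left untouched
output_file <- file.path(output_dir, basename(input_file))
print(output_file)
```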

FCS file concatenation

The premessa package contains a simple function for concatenating multiple FCS files together. This is useful in case the acquisition of a single sample has been split across multiple files. The function is called concatenate_fcs_files and its documentation can be accessed directly from R. Note that this function assumes that the files are all identical panel-wise, i.e. they have the same parameter names and descriptions. Please use the panel editor before concatenation if that is not the case.
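The requirement for identical panels can be illustrated with a base R sketch. The panels and event matrices below are hypothetical, and the code is not the implementation of concatenate_fcs_files (see ?concatenate_fcs_files for its actual arguments); it only shows the kind of check that makes stacking events safe.

```r
# Hypothetical panels: parameter name ($PnN) -> description ($PnS)
panels <- list(
  "A_1.fcs" = c(Dy161Di = "CD3", Er167Di = "CD4"),
  "A_2.fcs" = c(Dy161Di = "CD3", Er167Di = "CD4")
)

# TRUE only if every file has the same names and descriptions as the first
# (identical() is also sensitive to ordering in this simplified check)
same_panel <- all(vapply(panels, function(p) identical(p, panels[[1]]), logical(1)))

# Hypothetical event matrices, one per file, with matching columns
events <- list(
  matrix(1:4,  ncol = 2, dimnames = list(NULL, names(panels[[1]]))),
  matrix(5:10, ncol = 2, dimnames = list(NULL, names(panels[[1]])))
)

# With identical panels, concatenation amounts to stacking the events
if (same_panel) {
  combined <- do.call(rbind, events)
  print(dim(combined))
}
```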

Bead-based normalization

The idea behind the method is described in this publication. This software represents an R re-implementation of the original normalization software developed for Matlab.

Sample data for testing is available here (only download the FCS in the top level directory, not the contents of the beads and normed sub-folders).

Inputs and outputs

The normalization workflow involves the following steps:

Beads identification through gating
Data normalization
Beads removal (optional)

Assuming the working directory is called working_directory and contains two FCS files called A.fcs and B.fcs, at the end of the workflow the following directory structure and output files will be generated

working_directory
|--- A.fcs
|--- B.fcs
|--- normed
     |--- A_normalized.fcs
     |--- B_normalized.fcs
     |--- beads_before_and_after.pdf
     |--- beads_gates.json
     |--- beads_vs_time
          |--- A.pdf
          |--- B.pdf
     |--- beads_removed
          |--- A_normalized_beadsremoved.fcs
          |--- B_normalized_beadsremoved.fcs
          |--- removed_events
               |--- A_normalized_removedEvents.fcs
               |--- B_normalized_removedEvents.fcs
     |--- beads
          |--- A_beads.fcs
          |--- B_beads.fcs
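If you want to locate these outputs from a script, the naming scheme shown above can be written as a small helper. normalizer_outputs is a hypothetical function defined here for illustration, not part of premessa; it only builds the expected paths from the directory layout above.

```r
# Expected per-file output paths, following the directory layout shown above
normalizer_outputs <- function(working_dir, fcs_name) {
  base <- tools::file_path_sans_ext(fcs_name)
  normed <- file.path(working_dir, "normed")
  c(
    normalized     = file.path(normed, paste0(base, "_normalized.fcs")),
    beads_vs_time  = file.path(normed, "beads_vs_time", paste0(base, ".pdf")),
    beads_removed  = file.path(normed, "beads_removed",
                               paste0(base, "_normalized_beadsremoved.fcs")),
    removed_events = file.path(normed, "beads_removed", "removed_events",
                               paste0(base, "_normalized_removedEvents.fcs")),
    beads          = file.path(normed, "beads", paste0(base, "_beads.fcs"))
  )
}

print(normalizer_outputs("working_directory", "A.fcs"))
```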

A_normalized.fcs: contains the normalized data, with an added parameter called beadDist representing the square root of the Mahalanobis distance of each event from the centroid of the beads population
beads_before_and_after.pdf: a plot of the median intensities of the beads channels before and after normalization. This plot contains a single median value per sample. Therefore it will not be informative if you are normalizing a single sample
beads_gates.json: this file contains the gates that were used to iden
