SampleSheet.py

Create an Illumina® Sample Sheet, the comma-separated text document required by Illumina® sequencing systems to specify (1) sequencing parameters and (2) sample-barcode relationships. Customize [Header], [Reads], and [Data] sections; in particular, draw on up to 192 8-bp barcodes (96x i7 & 96x i5 indices) to specify up to 9,216 sample-barcode relationships for multiplexed amplicon sequencing.

Table of contents

Background
Features
Requirements
Synopsis
System setup
- Option 1: Virtual machine
- Option 2: Direct install
Code launch notes
- Launching .py program
  - Command line .py
  - Command line .py in virtual environment
- Launching .ipynb program
  - Jupyter Notebook .ipynb
  - Jupyter Notebook .ipynb in virtual environment
Operation notes
Input notes
Output notes
Visual summary of key script operations
Status
Contact

Background

<img src="SampleSheet_img/SampleSheet_thumbnail.png" align="left" width="600"> Sequencing by synthesis (SBS) collects millions to billions of DNA sequence reads *en masse*. DNA templates from tens to thousands of independent sample sources can be barcoded, pooled, and sequenced on a common flow cell. Unique indices (barcode sequences) allow pooled reads to be assigned to cognate sample sources (demultiplexed).

This script automates creation of an Illumina® Sample Sheet, the comma-separated text document required by Illumina® sequencing systems to specify (1) sequencing parameters and (2) sample-barcode relationships. With this script, a Sample Sheet with up to 9,216 sample-barcode relationships can be generated automatically in under 1 second. The user enters a single simplified list of up to 96 sample prefixes, each assigned to an i7 index range and a unique i5 index. Each sample prefix is then expanded into up to 96 individual samples, suffixed by the well ID (A01-H12) of a 96-well plate.
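The prefix-to-sample expansion described above can be sketched as follows. This is a minimal illustration, not SampleSheet.py's own code: the helper names `well_ids` and `expand_prefix`, the `-` separator between prefix and well ID, and the row-major well order (A01, A02, ..., H12) are all assumptions.

```python
from string import ascii_uppercase
from typing import List


def well_ids(rows: int = 8, cols: int = 12) -> List[str]:
    """Return well IDs of a plate in row-major order: A01, A02, ..., H12 (order assumed)."""
    return [f"{ascii_uppercase[r]}{c:02d}" for r in range(rows) for c in range(1, cols + 1)]


def expand_prefix(prefix: str, n_wells: int = 96, sep: str = "-") -> List[str]:
    """Expand one sample prefix into up to 96 sample names suffixed by well ID.

    Hypothetical helper: the separator and the well order are illustrative assumptions.
    """
    if not 1 <= n_wells <= 96:
        raise ValueError("n_wells must be between 1 and 96")
    return [f"{prefix}{sep}{well}" for well in well_ids()[:n_wells]]


samples = expand_prefix("plate01")
print(len(samples), samples[0], samples[-1])  # 96 plate01-A01 plate01-H12
```

Passing a smaller `n_wells` (for example, `expand_prefix("plate02", 24)`) covers partially filled plates.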

Features

Automates data entry into 3 Illumina® Sample Sheet sections, based on command-line input provided by a user:
- [Header] (InvestigatorName, ProjectName)
- [Reads] (# of reads)
- [Data] (Sample ID, i7 index, i5 index)
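For orientation, the three sections listed above can be sketched as a minimal comma-separated file. This is not the exact output of SampleSheet.py: the key names (`Investigator Name`, `Project Name`), the `[Data]` columns (`Sample_ID`, `index`, `index2`), the convention of one `[Reads]` line per read giving its cycle count, and the barcode sequences (placeholders, not taken from the primer files) are assumptions for illustration. Real Illumina® sample sheets usually carry additional fields.

```python
import csv
import io
from typing import Sequence, Tuple


def write_sample_sheet(
    handle,
    investigator: str,
    project: str,
    read_lengths: Sequence[int],
    data_rows: Sequence[Tuple[str, str, str]],
) -> None:
    """Write a minimal [Header]/[Reads]/[Data] sample sheet (hypothetical sketch)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["[Header]"])
    writer.writerow(["Investigator Name", investigator])
    writer.writerow(["Project Name", project])
    writer.writerow([])  # blank line between sections
    writer.writerow(["[Reads]"])
    for length in read_lengths:  # one line per read, giving its cycle count (assumed convention)
        writer.writerow([length])
    writer.writerow([])
    writer.writerow(["[Data]"])
    writer.writerow(["Sample_ID", "index", "index2"])  # index = i7, index2 = i5
    for row in data_rows:
        writer.writerow(row)


buf = io.StringIO()
write_sample_sheet(
    buf,
    investigator="A. Researcher",
    project="Amplicon_Run_01",
    read_lengths=[151, 151],
    data_rows=[("plate01-A01", "ACGTACGT", "TTGGCCAA")],  # placeholder barcodes
)
print(buf.getvalue())
```

To write to disk instead, open the file with `open(path, "w", newline="")`, as the `csv` module documentation recommends.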

Requirements

Python 3.7 or higher (installation instructions below)
PrettyTable Python library, suggested for the command-line script (installation instructions below)
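Both requirements can be checked from within Python itself. The sketch below is an illustration, not part of SampleSheet.py; the function name `check_requirements` is hypothetical. It compares the running interpreter against 3.7 and uses `importlib.util.find_spec` to see whether `prettytable` is importable without actually importing it.

```python
import importlib.util
import sys


def check_requirements(min_version=(3, 7)) -> dict:
    """Report whether this interpreter meets min_version and whether PrettyTable is importable."""
    return {
        "python_ok": sys.version_info[:2] >= min_version,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # find_spec returns None when a top-level module cannot be found
        "prettytable_installed": importlib.util.find_spec("prettytable") is not None,
    }


if __name__ == "__main__":
    for key, value in check_requirements().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter you intend to use for SampleSheet.py, since each Python installation has its own set of libraries.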

Synopsis

This script returns a Sample Sheet file compatible with Illumina® sequencing platforms.

Users are asked for the path to an output directory in which a Sample Sheet will be created, along with user-specific variables for Sample Sheet [Header], [Reads], and [Data] sections.

For [Data] relationships between sample names and i7+i5 indices, SampleSheet.py draws upon a set of 192 custom primers with unique 8-bp barcodes compatible with Illumina® sequencing platforms; these indices allow up to 9,216 samples to be arrayed in 96-well (or 384-well) format with unique barcodes for pooled sequencing (see 'Input notes' for details).

Note on index usage: In this script, each i7 index identifies an individual well within a 96-well plate format (each well is uniquely barcoded by a single i7 index), whereas a single i5 index defines all wells of a specific plate (up to 96 wells in a single plate are barcoded by a common i5 index). Primer sequences (and indices used by SampleSheet.py) can be found in associated files, i7_barcode_primers.xls and i5_barcode_primers.xls.
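The well/plate index scheme in the note above can be sketched as follows: each well maps to one of 96 i7 index numbers and each plate to one of 96 i5 index numbers, so 96 × 96 = 9,216 distinct (i7, i5) pairs. The index numbering (1-96), the row-major well order, and the helpers `i7_number` and `index_pair` are assumptions for illustration. The actual barcode sequences live in i7_barcode_primers.xls and i5_barcode_primers.xls, which this sketch does not read.

```python
from itertools import product
from string import ascii_uppercase
from typing import Tuple

ROWS = ascii_uppercase[:8]  # plate rows A-H
COLS = range(1, 13)         # plate columns 1-12


def i7_number(well: str) -> int:
    """Map a well ID such as 'C05' to an i7 index number 1-96 (row-major order assumed)."""
    row = ROWS.index(well[0].upper())  # raises ValueError for rows outside A-H
    col = int(well[1:])
    if not 1 <= col <= 12:
        raise ValueError(f"invalid well ID: {well}")
    return row * 12 + col


def index_pair(plate: int, well: str) -> Tuple[int, int]:
    """Return (i7 number, i5 number): i7 encodes the well, i5 encodes the plate (1-96)."""
    if not 1 <= plate <= 96:
        raise ValueError("plate must be between 1 and 96")
    return i7_number(well), plate


wells = [f"{r}{c:02d}" for r in ROWS for c in COLS]
pairs = {index_pair(p, w) for p, w in product(range(1, 97), wells)}
print(len(pairs))  # 9216
```

Because no two (plate, well) combinations share the same (i7, i5) pair, every sample in a full 96-plate pool remains uniquely demultiplexable.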

For further usage details, please refer to the following manuscript:

Ehmsen, Knuesel, Martinez, Asahina, Aridomi, Yamamoto (2021)

Please cite usage as:

SampleSheet.py
Ehmsen, Knuesel, Martinez, Asahina, Aridomi, Yamamoto (2021)

System setup

Virtual machine

Alleles_and_altered_motifs.ova

The programs are available for use either individually or packaged into a virtual machine which can be run on Mac, Linux, or Windows operating systems. The "Alleles_and_altered_motifs" virtual machine comes pre-installed with BLAST, MEME, the full hg38 genome BLAST database, test datasets, and all the external dependencies needed to run SampleSheet, CollatedMotifs, and Genotypes. Windows users are encouraged to use the virtual machine to run CollatedMotifs because the MEME suite software upon which CollatedMotifs relies is not natively supported on Windows OS.

Detailed instructions for virtual machine download and setup are available from Zenodo (DOI 10.5281/zenodo.3406861): <a href="https://doi.org/10.5281/zenodo.3406861">Download Alleles_and_altered_motifs virtual machine</a>
Note: Running the virtual machine requires virtualization software, such as Oracle VM VirtualBox, available for download at <a href="https://www.virtualbox.org/">Download VirtualBox software</a>

Linux and Mac users can also follow the steps below to install SampleSheet, Genotypes, and CollatedMotifs. If you are running Windows, you can follow the steps below to install SampleSheet and Genotypes (without CollatedMotifs).

Direct install

2.1. Python 3 setup

First confirm that Python 3 (required) and Jupyter Notebook (optional) are available on your system, or download and install them by following the steps below.

Mac and Linux OS generally come with Python pre-installed, but Windows OS does not. Check your system for Python version 3.7 or higher by following the guidelines below:

Open a console in Terminal (Mac/Linux OS) or PowerShell (Windows OS) to access the command-line interface.
Check which Python version your OS uses by default by issuing the following command (here, $ refers to your command-line prompt and is not a character to be typed):

$ python --version
- If the output reads Python 3.7.3 or any version >=3.7, you are good to go and can proceed to Jupyter Notebook (optional).
- If the output reads Python 2.7.10 or anything below Python 3, this signifies that a Python version <3 is the default version, and you will need to check whether a Python version >=3.7 is available on your system.
 - To check whether a Python version >=3.7 is available on your system, issue the following command:
 
 $ python3 --version
 - If the output finds a Python version >=3.7 (such as Python 3.7.3), you are good to go and can proceed to Jupyter Notebook (optional).
 - If no Python version >=3.7 is found, use one of the following two options to download and install Python version >=3.7 on your computer:

Python 3 (required)

Option 1) Install Python 3 prior to Jupyter Notebook. This option is recommended for most users.

Go to the following website to download and install Python: https://www.python.org/downloads/
- Select "Download the latest version for X", and then follow installation guidelines and prompts when you double-click the downloaded package to complete installation.
- Once you have downloaded and installed a Python 3 version >=3.7, double-check in your command-line that Python 3 can be found on your system by issuing the following command:
  
  $ python3 --version
- The output should signify the Python version you just installed. Proceed to Jupyter Notebook (optional).
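If you prefer to script the checks above, the sketch below runs `python3 --version` the same way as the command line and also reports whether a `jupyter` executable is on your PATH. It is an illustration only; the function name `python3_version` is hypothetical, and on Windows the Python executable may be named `python` rather than `python3`.

```python
import shutil
import subprocess
from typing import Optional


def python3_version(executable: str = "python3") -> Optional[str]:
    """Return the output of '<executable> --version', or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    # Python 3 prints its version to stdout; Python 2 printed it to stderr.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    print("python3:", python3_version() or "not found on PATH")
    print("jupyter:", shutil.which("jupyter") or "not found on PATH (optional)")
```

The `capture_output` and `text` arguments to `subprocess.run` were added in Python 3.7, so this sketch itself needs Python 3.7 or higher to run.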

Anaconda (Optional: Python 3 with Jupyter Notebook in one)

Option 2) Install Python 3 and Jupyter Notebook (together as part of Anaconda package)

Note: this method has only been tested with SampleSheet.py and Genotypes.py on Windows, and may not work on all Mac or Linux systems when Python virtual environments (virtualenv) are used to run CollatedMotifs.py.
Anaconda (with Jupyter Notebook) download and installation: https://jupyter.readthedocs.io/en/latest/install/notebook-classic.html
- Download Anaconda with Python 3, and then follow installation guidelines and prompts when you double-click the downloaded package to complete installation.
