ScreenPro2
Flexible analysis of high-content CRISPR screening
Introduction
TL;DR
ScreenPro2 enables flexible analysis of high-content CRISPR screening datasets. It can process data from diverse CRISPR screen platforms and is designed to be modular, so it can be extended easily to custom CRISPR screen platforms or to other commonly used platforms beyond those currently implemented.
<details> <summary>Background</summary> <br>
The functional genomics field is evolving rapidly, and many more CRISPR screen platforms are now being developed. A standardized workflow for analyzing data from these screens is therefore important. ScreenPro2 is provided to enable researchers to easily process and analyze data from CRISPR screens. Currently, you need a basic background in programming (especially Python) to use ScreenPro2.
ScreenPro2 is conceptually similar to the ScreenProcessing pipeline, but it is designed to be more modular, flexible, and extensible. Common CRISPR screen methods that we have implemented here are illustrated in a recent review paper:
Fig. 1: Common types of CRISPR screening modalities indicating advances in CRISPR methods.
<img width="1000" alt="image" src="https://github.com/GilbertLabUCSF/ScreenPro2/assets/53412130/a39400ad-b24f-4859-b6e7-b4d5f269119c"></details>
Installation
ScreenPro2 is available on PyPI and can be installed with pip:
pip install ScreenPro2
To get the latest development version, install from GitHub:
pip install git+https://github.com/ArcInstitute/ScreenPro2.git
Usage
Command Line Interface (CLI)
ScreenPro2 has a built-in command line interface (CLI). You can access the CLI by running the following command in your terminal:
screenpro --help
Python Package Usage
You can also use ScreenPro2 as a Python package by importing it in your code:
import screenpro as scp
Analysis Workflow
Data analysis for CRISPR screens with NGS readouts can be broken down into three main steps:
Step 1: FASTQ processing
The first step in analyzing CRISPR screens with deep sequencing readouts is to process the FASTQ files and generate counts for each guide RNA element in the library. ScreenPro2 has built-in functionality to process FASTQ files and generate counts for different types of CRISPR screen platforms (see Supported CRISPR Screen Platforms).
<details> <summary>Command Line Interface (CLI)</summary> <br> ScreenPro2 has a built-in command line interface (CLI) to process FASTQ files and generate counts.
screenpro guidecounter --help
A draft command to process FASTQ files and generate counts for a CRISPRa/i-single-sgRNA-screens dataset:
# -s takes a comma-separated list of sample ids, i.e. `<sample_id>.fastq.gz` for single sgRNA screens
screenpro guidecounter \
  --cas-type dCas9 \
  --single-guide-design \
  -l <path-to-CRISPR-library-table> \
  -p <path-to-fastq-directory> \
  -s <sample-id-1>,<sample2-id> \
  -o <output-directory> \
  --write-count-matrix
A draft command to process FASTQ files and generate counts for a CRISPRa/i-dual-sgRNA-screens dataset:
# -s takes a comma-separated list of sample ids, i.e. `<sample_id>_R[1,2].fastq.gz` for dual sgRNA screens
screenpro guidecounter \
  --cas-type dCas9 \
  --dual-guide-design \
  -l <path-to-CRISPR-library-table> \
  -p <path-to-fastq-directory> \
  -s <sample-id-1>,<sample2-id> \
  -o <output-directory> \
  --write-count-matrix
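When the same command has to be launched for many sample sets, it can help to assemble the argument list in Python. The following is a minimal sketch that only uses the flags shown in the commands above; the helper name `build_guidecounter_command` and the example paths are hypothetical and not part of ScreenPro2.

```python
import shlex


def build_guidecounter_command(library_table, fastq_dir, sample_ids, output_dir,
                               dual_guide=False, cas_type="dCas9"):
    """Assemble the `screenpro guidecounter` argument list (hypothetical helper).

    Only flags that appear in the draft commands above are used.
    """
    if not sample_ids:
        raise ValueError("at least one sample id is required")
    design_flag = "--dual-guide-design" if dual_guide else "--single-guide-design"
    return [
        "screenpro", "guidecounter",
        "--cas-type", cas_type,
        design_flag,
        "-l", str(library_table),
        "-p", str(fastq_dir),
        "-s", ",".join(sample_ids),  # the CLI expects a comma-separated list
        "-o", str(output_dir),
        "--write-count-matrix",
    ]


# Placeholder paths and sample ids, for illustration only
cmd = build_guidecounter_command("library.tsv", "fastq", ["sampleA", "sampleB"], "counts")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems when paths contain spaces.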
</details> <details> <summary>Python Package Usage</summary> <br>
In addition to the CLI, ScreenPro2 has a built-in method to process FASTQ files and generate counts in Python.
This method is implemented in the ngs module and relevant submodules.
A minor novelty here enables processing of single, dual, or multiple sgRNA
CRISPR screens. This approach can also retain recombination events, which can
occur in dual or higher-order sgRNA CRISPR screens.
Currently, the GuideCounter class from the ngs module can process FASTQ files and generate counts for standard
CRISPR screens with single or dual guide design.
Here is draft code to process FASTQ files and generate counts for an experiment with CRISPRa/i-single-sgRNA-screens:
# Initialize the GuideCounter object
counter = scp.GuideCounter(cas_type = 'dCas9', library_type = 'single_guide_design')
# Load the reference library
counter.load_library("<path-to-CRISPR-library-table>", sep = '\t', verbose = True, index_col=None)
# Define the samples
samples = []
## `samples` is a list of sample ids in the experiment.
## Each sample id should match the sample name in the FASTQ files, i.e. <sample_id>.fastq.gz
# Process the FASTQ files and generate counts
counter.get_counts_matrix(
    fastq_dir = '<path-to-fastq-directory>',
    samples = samples,
    verbose = True
)
Here is draft code to process FASTQ files and generate counts for an experiment with CRISPRa/i-dual-sgRNA-screens:
# Initialize the GuideCounter object
counter = scp.GuideCounter(cas_type = 'dCas9', library_type = 'dual_guide_design')
# Load the reference library
counter.load_library("<path-to-CRISPR-library-table>", sep = '\t', verbose = True, index_col=None)
# Define the samples
samples = []
## `samples` is a list of sample ids in the experiment.
## Each sample id should match the sample name in the FASTQ files, i.e. <sample_id>_R[1,2].fastq.gz
# Process the FASTQ files and generate counts
counter.get_counts_matrix(
    fastq_dir = '<path-to-fastq-directory>',
    samples = samples,
    verbose = True
)
After this, the counts matrix is available as .counts_mat on the GuideCounter object.
To proceed, you need to create an AnnData object from the counts matrix and metadata. You can use the following code to create it:
adata = counter.build_counts_anndata()
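The `samples` list in the examples above is left empty. One way to fill it is to derive the sample ids from the file names in the FASTQ directory, following the naming conventions described in the comments (`<sample_id>.fastq.gz` for single guide designs, `<sample_id>_R[1,2].fastq.gz` for dual guide designs). This is a minimal sketch using only the standard library; `find_sample_ids` is a hypothetical helper, not a ScreenPro2 function.

```python
import re
from pathlib import Path

# File name conventions taken from the comments in the examples above
SINGLE_PATTERN = re.compile(r"^(?P<sample>.+)\.fastq\.gz$")
DUAL_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def find_sample_ids(fastq_dir, dual_guide=False):
    """Return sorted sample ids found in `fastq_dir` (hypothetical helper).

    Single guide design: every `<sample_id>.fastq.gz` file yields one id.
    Dual guide design: an id is returned only if both `_R1` and `_R2` files exist.
    """
    names = [path.name for path in Path(fastq_dir).iterdir() if path.is_file()]
    if not dual_guide:
        matches = (SINGLE_PATTERN.match(name) for name in names)
        return sorted(m.group("sample") for m in matches if m)

    reads_by_sample = {}
    for name in names:
        m = DUAL_PATTERN.match(name)
        if m:
            reads_by_sample.setdefault(m.group("sample"), set()).add(m.group("read"))
    return sorted(s for s, reads in reads_by_sample.items() if reads == {"1", "2"})


# Usage (replace the placeholder with a real directory):
# samples = find_sample_ids('<path-to-fastq-directory>', dual_guide=True)
```

Requiring both reads in dual mode means an incomplete pair is skipped silently; raising an error instead may be preferable if missing files should stop the analysis.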
</details> <br>
Step 2: Phenotype calculation
Once you have the counts, you can use the ScreenPro2 phenoscore and phenostats modules to calculate phenotype scores and statistics between screen arms.
First, load your data into an AnnData object (see anndata for more information).
The AnnData object must have the following contents:
- adata.X – counts matrix (samples x targets) where each value represents the sequencing count from NGS data.
- adata.obs – a pandas dataframe of samples metadata including "condition" and "replicate" columns.
  - "condition": the condition for each sample in the experiment.
  - "replicate": the replicate number for each sample in the experiment.
- adata.var – a pandas dataframe of targets in sgRNA library including "target" and "targetType" columns.
  - "target": the target for each entry in reference sgRNA library. For single sgRNA libraries, this column can be used to store gene names. For dual or multiple targeting sgRNA libraries, this column can be used to store gene pairs or any other relevant information about the target.
  - "targetType": the type of target for each entry in reference sgRNA library. Note that this column is used to distinguish between different types of sgRNAs in the library and negative control sgRNAs can be defined as "targetType" == "negative_control". This is important for the phenotype calculation step.
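Before building the AnnData object, it can be useful to check that the three tables agree with the layout listed above. The sketch below is a hypothetical helper (not part of ScreenPro2) that works on plain Python values, so it can be fed from pandas objects or anything else. The "gene" value used for `targetType` in the example is an assumed placeholder; only "negative_control" is defined by the text above.

```python
REQUIRED_OBS_COLUMNS = ("condition", "replicate")
REQUIRED_VAR_COLUMNS = ("target", "targetType")


def check_screen_layout(counts_shape, obs_columns, n_samples, var_columns, target_types):
    """Return a list of layout problems; an empty list means none were found.

    counts_shape  -- (rows, columns) of the counts matrix (samples x targets)
    obs_columns   -- column names of the samples metadata table
    n_samples     -- number of rows in the samples metadata table
    var_columns   -- column names of the targets table
    target_types  -- the "targetType" value of every target, in order
    """
    problems = []
    expected_shape = (n_samples, len(target_types))
    if tuple(counts_shape) != expected_shape:
        problems.append(f"counts shape {tuple(counts_shape)} != {expected_shape}")
    for name in REQUIRED_OBS_COLUMNS:
        if name not in set(obs_columns):
            problems.append(f"obs is missing the '{name}' column")
    for name in REQUIRED_VAR_COLUMNS:
        if name not in set(var_columns):
            problems.append(f"var is missing the '{name}' column")
    if "negative_control" not in set(target_types):
        problems.append("no target has targetType == 'negative_control'")
    return problems


# Toy example: 4 samples x 3 targets ("gene" is an assumed placeholder value)
problems = check_screen_layout(
    counts_shape=(4, 3),
    obs_columns=["condition", "replicate"],
    n_samples=4,
    var_columns=["target", "targetType"],
    target_types=["gene", "gene", "negative_control"],
)
print(problems)
```

With pandas dataframes, the arguments could be taken from `counts_df.shape`, `meta_df.columns`, `len(meta_df)`, `target_df.columns`, and `list(target_df["targetType"])`.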
ScreenPro2 provides built-in classes for different types of CRISPR screen assays. Currently, the PooledScreens class
can be used to process data from pooled CRISPR screens. To create a PooledScreens object from an AnnData object,
you can use the following example code:
import pandas as pd
import anndata as ad
from screenpro.assays import PooledScreens
adata = ad.AnnData(
    X = counts_df, # pandas dataframe of counts (samples x targets)
    obs = meta_df, # pandas dataframe of samples metadata including "condition" and "replicate" columns
    var = target_df # pandas dataframe of targets metadata including "target" and "targetType" columns
)
screen = PooledScreens(adata)
<img width="600" alt="image" src="https://github.com/ArcInstitute/ScreenPro2/assets/53412130/bb38d119-8f24-44fa-98ab-7ef4457ef8d2">
</details> <details> <summary>Run workflows</summary>
