<h1 align="center" style="display: block; font-size: 2.5em; font-weight: bold; margin-block-start: 1em; margin-block-end: 1em;"> <div align="right"> <a href="https://www.niddk.nih.gov/research-funding/at-niddk/labs-branches/laboratory-cell-molecular-biology/rna-biology-section"> <img src="https://upload.wikimedia.org/wikipedia/commons/a/a8/National_Institute_of_Diabetes_and_Digestive_and_Kidney_Diseases_%28NIDDK%29_Logo.png" alt="Visit our Lab Page" height="60"> </a> </div>

<br /><br /><strong>PICB - <u>pi</u>RNA <u>C</u>luster <u>B</u>uilder</strong>

</h1>

Introduction

PICB (piRNA Cluster Builder) is a flexible toolkit for assembling, prioritizing, and characterizing piRNA clusters.

Motivation
Getting started
How to run PICB
- Data preparation
- Running PICB
Parameter adjustments
Output
Other PICB functions
Let's give it a try - An Example
Authors, Citation and Acknowledgments

Motivation

piRNAs (PIWI-interacting RNAs) and their PIWI protein partners play a key role in fertility and maintaining genome integrity by restricting mobile genetic elements (transposons) in germ cells. piRNAs originate from genomic regions which are called piRNA clusters.

PICB identifies genomic regions with a high density of piRNAs. This construction of piRNA clusters is performed through stepwise integration of unique and multimapping piRNAs.

Figure 1: PICB considers unique mapping piRNAs (NH=1), primary alignments of multimapping piRNAs (NH>1), and all possible alignments stepwise to build seeds, cores, and clusters. Find additional information in our <a href="https://www.sciencedirect.com/science/article/pii/S2211124724011288" target="_blank"> recent publication</a>.

Only very limited programming knowledge is needed to run PICB. Check out our step-by-step instructions and our demonstration below.

Please visit our <a href="https://www.sciencedirect.com/science/article/pii/S2211124724011288" target="_blank">publication</a> for full context.

Getting started

PICB runs in R versions <span>≥</span> 4.2. to 4.4.

It is possible to run PICB in RStudio, in an R script on your local machine or with High Performance Computing (HPC) resources or in Jupyter Notebook (using R). Keep in mind that as for any handling of large-scale sequencing data you need to have sufficient memory on your device (or cluster) allocated.

<b>1. Load dependencies in R environment</b>

You will need to install and load the following required R packages:

install.packages(c("data.table", "seqinr", "openxlsx", "dplyr"))
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("IRanges", "GenomicRanges", "GenomicAlignments", "Rsamtools", "Biostrings", "GenomeInfoDb", "BSgenome", "rtracklayer"))

💡 In case you have not worked with GRanges yet, we recommend reading the following <a href=https://bioconductor.org/packages/release/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesIntroduction.html target="_blank">GRanges Introduction</a>.

<b>2. Install PICB</b>

PICB is available to install from any of the following sources. If you choose to install from GitHub, make sure to have either devtools or remotes installed.

| Where | Source | Command | |-------------|----------|-----------------------------------------------------------------------------| | R | GitHub | From GitHub repository: devtools::install_github("HaaseLab/PICB") or remotes::install_github("HaaseLab/PICB") | | Web Browser | GitHub | <a href="https://github.com/HaaseLab/PICB/archive/refs/heads/main.zip">Download GitHub repository here.</a> Now unzip the file and run install.packages("Downloads/PICB-main", repos=NULL, type="source") in R. | | Terminal | GitHub | From GitHub repository: Clone Source Code: git clone https://github.com/HaaseLab/PICB.git  |

Now load PICB in your R environment:

library('PICB')

From now on it gets even easier.

How to run PICB

There are just two required inputs: the BAM File and the Reference Genome.

Data preparation

Checklist for having the right BAM File

[ ] NH and NM tags are included
[ ] Indexed (.bai file required)

PICB stops when these requirements are not met and provides a descriptive error message.

Four options for providing the Reference Genome

Your genome is part of the <a href= https://kasperdanielhansen.github.io/genbioconductor/html/BSgenome.html target="_blank" rel="noopener noreferrer">BSgenome</a> package

# Example for Drosophila melanogaster, replace with your own genome - previous installation of BSgenome required
library(BSgenome.Dmelanogaster.UCSC.dm6)
myGenome <- "BSgenome.Dmelanogaster.UCSC.dm6"

Chromosome names and lengths according to the BAM file</label>

myGenome <- GenomeInfoDb::Seqinfo(
    seqnames = c("chr2L", "chr2R", "chr3L", "chr3R", "chr4", "chrX", "chrY"), 
    seqlengths = c(23513712, 25286936, 28110227, 32079331, 1348131, 23542271, 3667352))

Using a Seqinfo object

Or use an existing Seqinfo object:

myGenome <- GenomeInfoDb::Seqinfo(genome = "dm6")

Fasta with the assembled genome sequence.

myGenome <- PICBloadfasta(FASTA.NAME="/path/to/your/genome.fa")

Running PICB

Load your mapped piRNAs with PICBload

myAlignments <- PICBload(
    BAMFILE = bam_file,
    REFERENCE.GENOME = myGenome
)

Build piRNA clusters with PICBbuild

myClusters <- PICBbuild(
    IN.ALIGNMENTS = myAlignments,
    REFERENCE.GENOME = myGenome
)$clusters

💡 Both PICBload and PICBbuild allow wide-ranging adjustments: Read the below section Parameter adjustments to adapt to sparse reference genomes and specific limitations of the data set.

💡 As described here, PICBbuild integrates unique mapping (seeds), primary multimapping (cores) and secondary alignments stepwise. You can access the outputs of the previous steps by using $seeds or $cores instead of $clusters.

Parameter adjustments

PICB allows wide-ranging parameter adjustments to adapt to e.g. sparse reference genomes and specific limitations of the data set. Tables of adjustments for both functions are shown below.

<details close><summary title="Click to show/hide details">Click to show / hide: Parameters for <code>PICBload</code></summary><br/>

Required parameters:

BAMFILE
REFERENCE.GENOME

<br>

Adjustable parameters:

| Parameter Name | Possible Values | Default Value | Explanation | |----------------|-----------------|---------------|-------------| | VERBOSE | TRUE, FALSE | TRUE | Allows disabling progress messages while running PICBload. | | IS.SECONDARY.ALIGNMENT | TRUE, FALSE, NA | NA (all alignments) | Determines which alignment types (primary multimappers and secondary multimappers) will be loaded. | | SEQ.LEVELS.STYLE | character | "UCSC" | Allows changing the style of chromosome names in the output. Supported styles are listed in GenomeInfoDb::genomeStyles(). NA allows keeping all chromosomes from REFERENCE.GENOME and does not filter by standard linear chromosome names. Make sure that chromosome names must match between bam file and provided reference genome. | | STANDARD.CONTIGS.ONLY | TRUE, FALSE | TRUE | Determines whether alignments from non-standard contigs are used. | | FILTER.BY.FLAG | TRUE, FALSE | TRUE | Allows only those alignments with flag values present in the vector of allowed flags SELECT.FLAG. Default values of SELECT.FLAG are 0, 16, 272 and 256 (primary and secondary alignments on plus and minus strands). If FALSE, includes all flags. | | USE.SIZE.FILTER | TRUE, FALSE | TRUE | Allows filtering of alignments based on size. Default value is 18-50 nt. | | TAGS | vector | c("NH","NM") | Indicates list of tags to be extracted from given bam file. The “NH” tag is required to deduce if the alignment is unique mapping or multimapping. The “NM” is required to identify mismatches if required in further analysis. | | GET.ORIGINAL.SEQUENCE | TRUE, FALSE | FALSE | Allows extraction of original read sequence from the bam file. | | SIMPLE.CIGAR | TRUE, FALSE | TRUE | Allows filtering out spliced alignments. | | PERFECT.MATCH.ONLY | TRUE, FALSE | FALSE | Allows filtering out alignments with mismatches. | | WHAT | vector | c(“flag”) | Allows importing flags. Corresponds to “what” from ScanBamParam-class [Morgan M, Pagès H, Obenchain V, Hayden N (2023). Rsamtools: Binary alignment (BA

PICB

Install / Use

README