Pooled Sequencing Analyses for Sex Signal (PSASS)

Overview

PSASS (Pooled Sequencing Analysis for Sex Signal) is a tool for comparing pooled sequencing datasets from two groups (usually two sexes). Results from PSASS can be visualized with the sgtr R package. PSASS is also integrated in a Snakemake workflow that performs all required steps, starting from a genome and read files. PSASS was developed as part of a project by the LPGP lab at INRA, Rennes, France.

Citing PSASS

There is currently no paper officially describing PSASS. To cite PSASS, use the DOI provided by Zenodo.

Installation

Install with conda

PSASS is available in Bioconda. To install psass with conda, run the following command:

conda install -c bioconda psass

Install from source

PSASS parses alignment files with the htslib library, which requires autotools to build and depends on zlib, libbz2, and liblzma to read CRAM files. All of these are installed by default on most Linux distributions. Compilation was tested with gcc >= 5.3.0.

To build psass, follow these instructions:

# Clone the repository
git clone https://github.com/RomainFeron/PSASS.git
# Navigate to the PSASS directory
cd PSASS
# Build PSASS
make

Usage

In summary, psass takes as input one read alignment file for each pool and computes the following metrics:

Between-pools Fst in a sliding window
Number of SNPs specific to each pool, defined as SNPs heterozygous in one pool and homozygous in the other, in a sliding window
Absolute and relative depth for each pool in a sliding window
(optional) The position of all bases with high between-pools Fst
(optional) The position of all SNPs specific to each pool
(optional) Number of SNPs specific to each pool, and absolute and relative depth in each pool for all genes in a provided GFF file, for both coding and noncoding parts of the genes

Currently, psass implements three commands:

pileup : generate a nucleotides depth file from two alignment files
analyze : compute pool comparison metrics from a nucleotides depth file
convert : convert output from samtools mpileup to a nucleotides depth file (deprecated, use pileup instead)

Quickstart

In this example, we will compare a pool of female individuals and a pool of male individuals with psass. We assume the following input data:

genome.fa: the assembly to which reads from each pool were aligned
females.cram: alignment file for the reads from the female pool (sorted by genomic coordinates)
males.cram: alignment file for the reads from the male pool (sorted by genomic coordinates)

Generate a pileup file with `psass pileup`:

psass pileup --reference genome.fa --output-file pileup.tsv females.cram males.cram

This command generates the file pileup.tsv which contains the nucleotide composition of each base in genome.fa for each pool.

Compute metrics with `psass analyze`:

psass analyze --window-size 10000 --output-resolution 1000 --snp-file psass_snps.tsv pileup.tsv psass_window.tsv

This command generates two output files:

psass_window.tsv which contains between-pool FST, pool-specific SNPs, and depth for each pool in a sliding window of 10,000 bp, output every 1,000 bp.
psass_snps.tsv which contains the position of each pool-specific SNP.
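The two quickstart steps can also be chained from a script. The sketch below is a minimal Python wrapper, not part of PSASS: the helper names `build_commands` and `run_quickstart` are hypothetical, and the only psass options used are the ones shown in the commands above. It assumes psass is installed and available on the PATH.

```python
import shlex
import subprocess


def build_commands(reference, female_aln, male_aln,
                   pileup_path="pileup.tsv",
                   window_path="psass_window.tsv",
                   snp_path="psass_snps.tsv",
                   window_size=10000,
                   output_resolution=1000):
    """Return the two quickstart command lines as argument lists."""
    pileup_cmd = [
        "psass", "pileup",
        "--reference", reference,
        "--output-file", pileup_path,
        female_aln, male_aln,
    ]
    analyze_cmd = [
        "psass", "analyze",
        "--window-size", str(window_size),
        "--output-resolution", str(output_resolution),
        "--snp-file", snp_path,
        pileup_path, window_path,
    ]
    return [pileup_cmd, analyze_cmd]


def run_quickstart(reference, female_aln, male_aln, **options):
    """Run pileup, then analyze; stop at the first command that fails."""
    for cmd in build_commands(reference, female_aln, male_aln, **options):
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Calling `run_quickstart("genome.fa", "females.cram", "males.cram")` would run the same two commands as above, in order.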

pileup

The pileup command generates a file with the nucleotide composition of every genomic position for any number of alignment files. Alignment files can be provided in CRAM or BAM format and must be sorted by coordinate. The output is a wig-like file: for each contig in the alignment files, a header line gives the contig name and length, followed by one line per position in the contig giving the nucleotide composition in each alignment file.

Usage: psass pileup [OPTIONS] ALIGNMENT_FILES...

Arguments:

Argument               | Type       | Description                                                      | Default |
-----------------------|------------|------------------------------------------------------------------|---------|
ALIGNMENT_FILES        | file       | One alignment file for each pool, in either CRAM or BAM format   |         |
--reference, -r        | file       | Reference file in fasta format, required with CRAM input files   |         |
--output-file, -o      | string     | Write output to this file instead of stdout                      |         |
--min-map-quality, -q  | string     | Minimum mapping quality to include a read in pileup              | 0       |
--help                 |            | Display help message                                             |         |

pileup generates a file with the following format:

#Files   <first_input_file>   <second_input_file>   # Comment line
region=<contig name>   len=<contig length>          # Header line for the first contig, encoding the contig name and its length
nA,nT,nC,nG,nN,nO   nA,nT,nC,nG,nN,nO               # Count for each type of nucleotide (comma-separated) in each pool (tab-separated) for pos 0
nA,nT,nC,nG,nN,nO   nA,nT,nC,nG,nN,nO               # Count for each type of nucleotide (comma-separated) in each pool (tab-separated) for pos 1
...
region=<contig name>   len=<contig length>          # Header line for the second contig, encoding the contig name and its length
nA,nT,nC,nG,nN,nO   nA,nT,nC,nG,nN,nO               # Count for each type of nucleotide (comma-separated) in each pool (tab-separated) for pos 0
nA,nT,nC,nG,nN,nO   nA,nT,nC,nG,nN,nO               # Count for each type of nucleotide (comma-separated) in each pool (tab-separated) for pos 1
...
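To make the format concrete, here is a minimal Python sketch of a reader for this file. It is an illustration rather than code from PSASS: the function name `parse_pileup` is hypothetical, and it assumes that the fields of the `region=`/`len=` header line and the per-pool columns are tab-separated, that positions restart at 0 for each contig, and that counts follow the A, T, C, G, N, O order shown above.

```python
import io

# Column order taken from the format description above
NUCLEOTIDES = ("A", "T", "C", "G", "N", "O")


def parse_pileup(handle):
    """Yield (contig, position, pools) for each position line.

    `pools` is a list with one dict per pool, mapping nucleotide to count.
    """
    contig = None
    position = 0
    for raw in handle:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#Files' comment line
        if line.startswith("region="):
            # Contig header, e.g. 'region=Contig1<TAB>len=6000'
            fields = dict(field.split("=", 1) for field in line.split("\t"))
            contig = fields["region"]
            position = 0  # positions restart at 0 for each contig
            continue
        if contig is None:
            raise ValueError("position line found before any contig header")
        pools = [
            dict(zip(NUCLEOTIDES, (int(n) for n in pool.split(","))))
            for pool in line.split("\t")
        ]
        yield contig, position, pools
        position += 1


# Tiny made-up input with two pools and one contig of two positions
example = (
    "#Files\tfemales.cram\tmales.cram\n"
    "region=Contig1\tlen=2\n"
    "12,0,0,0,0,0\t6,0,0,7,0,0\n"
    "0,0,15,0,0,0\t0,0,14,0,0,0\n"
)
for contig, position, pools in parse_pileup(io.StringIO(example)):
    print(contig, position, pools[0], pools[1])
```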

analyze

The analyze command computes between-pool FST, pool-specific SNPs, and depth in a sliding window from a nucleotide composition file generated with psass pileup from two alignment files.

Usage: psass analyze [OPTIONS] INPUT_FILE OUTPUT_FILE

Arguments:

Argument                 | Type       | Description                                                             | Default   |
-------------------------|------------|-------------------------------------------------------------------------|-----------|
INPUT_FILE               | file       | Path to a nucleotides depth file generated by psass pileup or convert   |           |
OUTPUT_FILE              | string     | Path to an output file for sliding window metrics                       |           |
--pool1, -p              | string     | Name of the first pool                                                  | females   |
--pool2, -q              | string     | Name of the second pool                                                 | males     |
--snp-file, -s           | string     | Output pool-specific SNPs to this file                                  |           |
--fst-file, -f           | string     | Output high FST positions to this file                                  |           |
--genes-file, -g         | string     | Output gene metrics to this file (requires a GFF file)                  |           |
--gff-file, -G           | string     | Path to a GFF file for gene-specific output                             |           |
--popoolation            |            | If set, assumes the input file was generated with popoolation2          |           |
--min-depth, -d          | int        | Minimum depth to include a site in the analyses                         | 10        |
--window-size, -w        | int        | Size of the sliding window (in bp)                                      | 100000    |
--output-resolution, -r  | int        | Output resolution for sliding window metrics (in bp)                    | 10000     |
--freq-het, -e           | float      | Allele frequency to consider a SNP heterozygous in a pool               | 0.5       |
--range-het, -u          | float      | Range of allele frequency to consider a SNP heterozygous in a pool      | 0.15      |
--freq-hom, -o           | float      | Allele frequency to consider a SNP homozygous in a pool                 | 1         |
--range-hom, -v          | float      | Range of allele frequency to consider a SNP homozygous in a pool        | 0.05      |
--min-fst, -t            | float      | Minimum FST to output a site in the FST positions file                  | 0.1       |
--group-snps             |            | If set, group consecutive snps to count them as a single polymorphism   |           |
--help                   |            | Print this help message and exit                                        |           |
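The frequency options are easier to read with an example. The Python sketch below shows one plausible reading of how --min-depth, --freq-het, --range-het, --freq-hom, and --range-hom could combine to call a pool-specific SNP (heterozygous in one pool, homozygous in the other). This is an assumption made for illustration: the exact rule implemented by psass is not described here, and the function name `is_pool_specific` is hypothetical.

```python
def is_pool_specific(counts_het, counts_hom, min_depth=10,
                     freq_het=0.5, range_het=0.15,
                     freq_hom=1.0, range_hom=0.05):
    """Return True if some allele looks heterozygous in the first pool
    and homozygous in the second pool.

    Assumed rule (not taken from the psass source): an allele is
    heterozygous when its frequency lies within freq_het +/- range_het,
    and homozygous when it lies within freq_hom +/- range_hom.
    """
    depth_het = sum(counts_het.values())
    depth_hom = sum(counts_hom.values())
    # Sites below the minimum depth in either pool are ignored
    if depth_het < min_depth or depth_hom < min_depth:
        return False
    for allele, count in counts_het.items():
        freq_in_het_pool = count / depth_het
        freq_in_hom_pool = counts_hom.get(allele, 0) / depth_hom
        if (abs(freq_in_het_pool - freq_het) <= range_het
                and abs(freq_in_hom_pool - freq_hom) <= range_hom):
            return True
    return False


# Made-up counts: males carry A and G in equal proportion, females only A
males = {"A": 10, "T": 0, "C": 0, "G": 10}
females = {"A": 20, "T": 0, "C": 0, "G": 0}
print(is_pool_specific(males, females))  # site tested as male-specific
print(is_pool_specific(females, males))  # site tested as female-specific
```

Under this reading, widening --range-het accepts more unbalanced allele frequencies as heterozygous, and widening --range-hom tolerates more minor-allele reads in the homozygous pool.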

Output files

Sliding window output file

A tabulated file with contig, position on the contig, contig length, number of pool-specific SNPs, between-pool FST in the window, absolute depth, and relative depth for each pool in a sliding window of size given by --window-size. One line is output every N bp, with N given by --output-resolution.

Contig   Position  Length  Snps_<pool1>  Snps_<pool2>       Fst  Abs_depth_<pool1>  Abs_depth_<pool2>  Rel_depth_<pool1>  Rel_depth_<pool2>
Contig1         0    6000             4             5    0.0000                166                174               0.73               0.74
Contig1     10000
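As an example of working with this file, the Python sketch below keeps only the windows whose FST reaches a chosen threshold. It is a hypothetical helper, not part of PSASS. It assumes the file is tab-separated with the header shown above, and that the pools use the default names females and males. The input in the example is a trimmed, made-up fragment for illustration only.

```python
import csv
import io


def high_fst_windows(handle, min_fst, pool1="females", pool2="males"):
    """Yield one dict per window whose Fst is at least `min_fst`."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        fst = float(row["Fst"])
        if fst >= min_fst:
            yield {
                "contig": row["Contig"],
                "position": int(row["Position"]),
                "fst": fst,
                # Pool-specific SNP counts for pool1 and pool2
                "snps": (int(row[f"Snps_{pool1}"]), int(row[f"Snps_{pool2}"])),
            }


# Made-up fragment keeping only the columns used above
example = (
    "Contig\tPosition\tLength\tSnps_females\tSnps_males\tFst\n"
    "Contig1\t0\t6000\t4\t5\t0.0000\n"
    "Contig1\t1000\t6000\t0\t12\t0.2500\n"
)
for window in high_fst_windows(io.StringIO(example), min_fst=0.2):
    print(window)
```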
