SkillAgentSearch skills...

Peakzilla

Peakzilla is a self-learning algorithm to identify transcription factor binding sites from ChIP-seq data. I would be very happy if you try it and provide me with feedback, so I can improve peakzilla and make you a happier user. Please feel free to send an e-mail to: steinmann.jonas@gmail.com

Install / Use

/learn @steinmann/Peakzilla
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

PEAKZILLA

Peakzilla identifies sites of enrichment and transcription factor binding sites from transcription factor ChIP-seq and ChIP-exo experiments at hight accuracy and resolution. It is designed to perform equally well for data from any species. All necessary parameters are estimated from the data. Peakzilla is suitable for both single and paired end data from any sequencing platform.

Note that peakzilla is not suited for the identification of broad regions of enrichment (e.g. ChIP-seq for histone marks), we recommand using MACS instead: Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol (2008) 9(9):R137

USAGE

Options: -h, --help show this help message and exit -m N_MODEL_PEAKS, --model_peaks=N_MODEL_PEAKS number of most highly enriched regions used to estimate peak size: default = 200 -c ENRICHMENT_CUTOFF, --enrichment_cutoff=ENRICHMENT_CUTOFF minimum cutoff for fold enrichment: default = 2 -s SCORE_CUTOFF, --score_cutoff=SCORE_CUTOFF minimum cutoff for peak score: default = 1 -e, --gaussian use empirical model estimate instead of gaussian -p, --bedpe input is paired end and in BEDPE format -l LOG, --log=LOG directory/filename to store log file to: default = log.txt -n, --negative write negative peaks to negative_peaks.tsv

DEPENDENCIES

  • Runs on OSX and Linux based operating systems
  • Requires Python 2.7
  • For 2x better perormance use PyPy instead of CPython

INPUT FORMAT

Peakzilla accepts BED formated alignments as input.

For converstion to BED format and working with BED files and alignments in general I highly reccommend:

  • bowtie (http://bowtie-bio.sourceforge.net/)
  • SAMtools (http://samtools.sourceforge.net/)
  • bedtools (http://code.google.com/p/bedtools/)

WORKFLOW EXAMPLE

use bowtie to map uniquely mappable reads to the genome

bowtie -p4 -m1 --sam genome_index input.fastq input.sam bowtie -p4 -m1 --sam genome_index chip.fastq chip.sam

convert to BAM format

samtools view -bS input.sam > input.bam samtools view -bS chip.sam > chip.bam

convert to BED format

bamToBed -i input.bam > input.bed bamToBed -i chip.bam > chip.bed

run peakzilla

python peakzilla.py chip.bed input.bed > chip_peaks.tsv

Comparison of 2 datasets

Determine significant peaks with a score threshold of 10

python peakzilla.py -s 10 chip1.bed input1.bed > chip1_s10_peaks.tsv

Determine enriched regions with a score threshold of 2

python peakzilla.py -s 2 chip2.bed input2.bed > chip2_s2_peaks.tsv

Overlap significant peaks from chip1 with enriched regions from chip2

intersectBed -a chip1_s10_peaks.tsv -b chip2_s2_peaks.tsv > intersect_peaks.tsv

For example datasets as well as an example of a computational pipeline for the comparative analysis of ChIP-seq datasets, please refer to our publication: Bardet AF et al. A computational pipeline for comparative ChIP-seq analyses. Nature Protocols (2011) 7(1):45-61 (http://www.starklab.org/data/bardet_natprotoc_2011/)

OPTIONS

One of peakzilla's design goals is to learn all the necessary information from the data. The usage of the options should therefore not be required.

OUTPUT FORMAT

  • Results are printed as a table of tab delimited values to stdout
  • Logs are appended to logs.txt in the current directory or a custom directory/filename specified by the -l option
  • Enriched regions in the control sample are written to negative_peaks.tsv or a custom directory/filename specified by the -n option
  • Columns represent Chromosome / Start / End / Name / Summit / Score / ChIP / Control / FoldEnrichment / DistributionScore / FDR (%)

Related Skills

View on GitHub
GitHub Stars21
CategoryEducation
Updated5mo ago
Forks5

Languages

Python

Security Score

87/100

Audited on Oct 31, 2025

No findings