Peakzilla

Peakzilla is a self-learning algorithm to identify transcription factor binding sites from ChIP-seq data. I would be very happy if you try it and provide me with feedback, so I can improve peakzilla and make you a happier user. Please feel free to send an e-mail to: steinmann.jonas@gmail.com

Generate Convert Improve

Install / Use

/learn @steinmann/Peakzilla

About this skill

Quality Score

0/100

README

PEAKZILLA

Peakzilla identifies sites of enrichment and transcription factor binding sites from transcription factor ChIP-seq and ChIP-exo experiments at hight accuracy and resolution. It is designed to perform equally well for data from any species. All necessary parameters are estimated from the data. Peakzilla is suitable for both single and paired end data from any sequencing platform.

Note that peakzilla is not suited for the identification of broad regions of enrichment (e.g. ChIP-seq for histone marks), we recommand using MACS instead: Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol (2008) 9(9):R137

USAGE

Options: -h, --help show this help message and exit -m N_MODEL_PEAKS, --model_peaks=N_MODEL_PEAKS number of most highly enriched regions used to estimate peak size: default = 200 -c ENRICHMENT_CUTOFF, --enrichment_cutoff=ENRICHMENT_CUTOFF minimum cutoff for fold enrichment: default = 2 -s SCORE_CUTOFF, --score_cutoff=SCORE_CUTOFF minimum cutoff for peak score: default = 1 -e, --gaussian use empirical model estimate instead of gaussian -p, --bedpe input is paired end and in BEDPE format -l LOG, --log=LOG directory/filename to store log file to: default = log.txt -n, --negative write negative peaks to negative_peaks.tsv

DEPENDENCIES

Runs on OSX and Linux based operating systems
Requires Python 2.7
For 2x better perormance use PyPy instead of CPython

INPUT FORMAT

Peakzilla accepts BED formated alignments as input.

For converstion to BED format and working with BED files and alignments in general I highly reccommend:

bowtie (http://bowtie-bio.sourceforge.net/)
SAMtools (http://samtools.sourceforge.net/)
bedtools (http://code.google.com/p/bedtools/)

WORKFLOW EXAMPLE

use bowtie to map uniquely mappable reads to the genome

bowtie -p4 -m1 --sam genome_index input.fastq input.sam bowtie -p4 -m1 --sam genome_index chip.fastq chip.sam

convert to BAM format

samtools view -bS input.sam > input.bam samtools view -bS chip.sam > chip.bam

convert to BED format

bamToBed -i input.bam > input.bed bamToBed -i chip.bam > chip.bed

run peakzilla

python peakzilla.py chip.bed input.bed > chip_peaks.tsv

Comparison of 2 datasets

Determine significant peaks with a score threshold of 10

python peakzilla.py -s 10 chip1.bed input1.bed > chip1_s10_peaks.tsv

Determine enriched regions with a score threshold of 2

python peakzilla.py -s 2 chip2.bed input2.bed > chip2_s2_peaks.tsv

Overlap significant peaks from chip1 with enriched regions from chip2

intersectBed -a chip1_s10_peaks.tsv -b chip2_s2_peaks.tsv > intersect_peaks.tsv

For example datasets as well as an example of a computational pipeline for the comparative analysis of ChIP-seq datasets, please refer to our publication: Bardet AF et al. A computational pipeline for comparative ChIP-seq analyses. Nature Protocols (2011) 7(1):45-61 (http://www.starklab.org/data/bardet_natprotoc_2011/)

OPTIONS

One of peakzilla's design goals is to learn all the necessary information from the data. The usage of the options should therefore not be required.

OUTPUT FORMAT

Results are printed as a table of tab delimited values to stdout
Logs are appended to logs.txt in the current directory or a custom directory/filename specified by the -l option
Enriched regions in the control sample are written to negative_peaks.tsv or a custom directory/filename specified by the -n option
Columns represent Chromosome / Start / End / Name / Summit / Score / ChIP / Control / FoldEnrichment / DistributionScore / FDR (%)

Related Skills

YC-Killer

2.7k

A library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free, open-source alternatives to overvalued Y Combinator startups. If you are excited about democratizing AI access & AI agents, please star ⭐️ this repository and use the link in the readme to join our open source AI research team.

best-practices-researcher

The most comprehensive Claude Code skills registry | Web Search: https://skills-registry-web.vercel.app

groundhog

400

Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).

last30days-skill

19.5k

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary