Peakzilla
Peakzilla is a self-learning algorithm to identify transcription factor binding sites from ChIP-seq data. I would be very happy if you try it and provide me with feedback, so I can improve peakzilla and make you a happier user. Please feel free to send an e-mail to: steinmann.jonas@gmail.com
PEAKZILLA
Peakzilla identifies sites of enrichment and transcription factor binding sites from transcription factor ChIP-seq and ChIP-exo experiments at high accuracy and resolution. It is designed to perform equally well for data from any species. All necessary parameters are estimated from the data. Peakzilla is suitable for both single and paired end data from any sequencing platform.
Note that peakzilla is not suited for the identification of broad regions of enrichment (e.g. ChIP-seq for histone marks). For that purpose we recommend using MACS instead: Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol (2008) 9(9):R137
USAGE
Options:
  -h, --help  show this help message and exit
  -m N_MODEL_PEAKS, --model_peaks=N_MODEL_PEAKS  number of most highly enriched regions used to estimate peak size: default = 200
  -c ENRICHMENT_CUTOFF, --enrichment_cutoff=ENRICHMENT_CUTOFF  minimum cutoff for fold enrichment: default = 2
  -s SCORE_CUTOFF, --score_cutoff=SCORE_CUTOFF  minimum cutoff for peak score: default = 1
  -e, --gaussian  use empirical model estimate instead of gaussian
  -p, --bedpe  input is paired end and in BEDPE format
  -l LOG, --log=LOG  directory/filename to store log file to: default = log.txt
  -n, --negative  write negative peaks to negative_peaks.tsv
DEPENDENCIES
- Runs on OSX and Linux-based operating systems
- Requires Python 2.7
- For 2x better performance, use PyPy instead of CPython
INPUT FORMAT
Peakzilla accepts BED formatted alignments as input.
For conversion to BED format, and for working with BED files and alignments in general, I highly recommend:
- bowtie (http://bowtie-bio.sourceforge.net/)
- SAMtools (http://samtools.sourceforge.net/)
- bedtools (http://code.google.com/p/bedtools/)
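To make the input format concrete, here is a minimal sketch of reading one line of a six-column BED file such as the ones bamToBed writes (chromosome, start, end, name, score, strand). The names `BedRecord` and `parse_bed_line` are hypothetical helpers for illustration and are not part of peakzilla; how peakzilla itself parses its input is not shown here.

```python
from collections import namedtuple

# Hypothetical record type for illustration; not part of peakzilla.
BedRecord = namedtuple(
    "BedRecord", ["chrom", "start", "end", "name", "score", "strand"]
)


def parse_bed_line(line):
    """Parse one BED line; return None for blank, comment, track or browser lines."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    if line.split(None, 1)[0] in ("track", "browser"):
        return None
    fields = line.split("\t")
    if len(fields) < 3:
        raise ValueError("BED line needs at least 3 columns: %r" % line)
    chrom = fields[0]
    start = int(fields[1])  # BED start is 0-based and inclusive
    end = int(fields[2])    # BED end is exclusive
    # Optional columns default to "." when the file has fewer than 6 columns.
    name = fields[3] if len(fields) > 3 else "."
    score = fields[4] if len(fields) > 4 else "."
    strand = fields[5] if len(fields) > 5 else "."
    return BedRecord(chrom, start, end, name, score, strand)
```

Because BED intervals are half-open, the length of an aligned read is simply `end - start`.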
WORKFLOW EXAMPLE
Use bowtie to map uniquely mappable reads to the genome
bowtie -p4 -m1 --sam genome_index input.fastq input.sam
bowtie -p4 -m1 --sam genome_index chip.fastq chip.sam
Convert to BAM format
samtools view -bS input.sam > input.bam
samtools view -bS chip.sam > chip.bam
Convert to BED format
bamToBed -i input.bam > input.bed
bamToBed -i chip.bam > chip.bed
Run peakzilla
python peakzilla.py chip.bed input.bed > chip_peaks.tsv
Comparison of 2 datasets
Determine significant peaks with a score threshold of 10
python peakzilla.py -s 10 chip1.bed input1.bed > chip1_s10_peaks.tsv
Determine enriched regions with a score threshold of 2
python peakzilla.py -s 2 chip2.bed input2.bed > chip2_s2_peaks.tsv
Overlap significant peaks from chip1 with enriched regions from chip2
intersectBed -a chip1_s10_peaks.tsv -b chip2_s2_peaks.tsv > intersect_peaks.tsv
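The overlap step can be pictured with a small sketch. This is a naive illustration of interval intersection on half-open (chromosome, start, end) tuples, not the bedtools implementation; the function name `intersect_intervals` is made up for this example. Like intersectBed with its default settings, it reports the overlapping portion of each interval in the first set.

```python
def intersect_intervals(a_intervals, b_intervals):
    """Return the overlapping portion of every A interval with every B interval.

    Intervals are (chrom, start, end) tuples with half-open coordinates,
    so intervals that merely touch (end == start) do not overlap.
    """
    # Group B intervals by chromosome so A is only compared within a chromosome.
    b_by_chrom = {}
    for chrom, start, end in b_intervals:
        b_by_chrom.setdefault(chrom, []).append((start, end))

    overlaps = []
    for chrom, a_start, a_end in a_intervals:
        for b_start, b_end in b_by_chrom.get(chrom, []):
            start = max(a_start, b_start)
            end = min(a_end, b_end)
            if start < end:  # non-empty intersection
                overlaps.append((chrom, start, end))
    return overlaps
```

The nested loop is quadratic per chromosome, which is fine for illustration; for real peak lists, intersectBed is the better tool.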
For example datasets as well as an example of a computational pipeline for the comparative analysis of ChIP-seq datasets, please refer to our publication: Bardet AF et al. A computational pipeline for comparative ChIP-seq analyses. Nature Protocols (2011) 7(1):45-61 (http://www.starklab.org/data/bardet_natprotoc_2011/)
OPTIONS
One of peakzilla's design goals is to learn all the necessary information from the data. Using the options should therefore not normally be required.
OUTPUT FORMAT
- Results are printed to stdout as a table of tab-delimited values
- Logs are appended to log.txt in the current directory or a custom directory/filename specified by the -l option
- Enriched regions in the control sample are written to negative_peaks.tsv when the -n option is given
- Columns represent Chromosome / Start / End / Name / Summit / Score / ChIP / Control / FoldEnrichment / DistributionScore / FDR (%)
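As a rough sketch of how the result table might be consumed downstream, the snippet below turns one output line into a dictionary keyed by the columns listed above and filters peaks by score. Several things here are assumptions rather than documented behaviour: the key names in `COLUMNS` are invented for this example, lines starting with "#" are treated as header or comment lines, Start / End / Summit are assumed to be integers, and the remaining numeric columns are read as floats.

```python
# Hypothetical key names, one per output column in the order listed above.
COLUMNS = [
    "chromosome", "start", "end", "name", "summit", "score",
    "chip", "control", "fold_enrichment", "distribution_score", "fdr",
]
INT_COLUMNS = ("start", "end", "summit")  # assumed to be integer coordinates
FLOAT_COLUMNS = (
    "score", "chip", "control", "fold_enrichment", "distribution_score", "fdr",
)


def parse_peak_line(line):
    """Parse one tab-delimited peak line; return None for blank or '#' lines."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(
            "expected %d columns, got %d" % (len(COLUMNS), len(fields))
        )
    peak = dict(zip(COLUMNS, fields))
    for key in INT_COLUMNS:
        peak[key] = int(peak[key])
    for key in FLOAT_COLUMNS:
        peak[key] = float(peak[key])
    return peak


def filter_by_score(lines, min_score):
    """Return parsed peaks whose score is at least min_score."""
    peaks = (parse_peak_line(line) for line in lines)
    return [p for p in peaks if p is not None and p["score"] >= min_score]
```

For example, `filter_by_score(open("chip_peaks.tsv"), 10)` would keep the same peaks as rerunning peakzilla with `-s 10`, provided the file was produced with a lower score cutoff.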