Pipette
variant discovery and annotation using GATK and Ensembl
h1. Pipette: A framework for Bioinformatic pipelines
Pipette is a framework for quickly building pipelines that process input in multiple steps. It is especially suited to bioinformatics applications.
Pipette also ships with a collection of pipelines built on the framework. Current Pipette pipelines:
- variant: a variant calling and annotation pipeline.
h3. How it works
Pipette pipelines are made of steps. Each step is independent of the others and declares two attributes:
- input: The input parameters the step needs to run.
- output: The parameters the step will generate after running.
Pipette chains these steps together, passing in input parameters and collecting output parameters automatically.
To create a new pipeline in Pipette, subclass the Pipeline class.
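The chaining described above can be illustrated with a small, self-contained sketch. The @Step@ and @Pipeline@ classes, the @step@ class method, and the two toy steps are hypothetical names written for this example; they are not taken from Pipette's source, and the real API may differ.

```ruby
# Hypothetical sketch: Step, Pipeline, and the step names below are
# illustrative and are not taken from Pipette's source.
class Step
  attr_reader :name, :inputs, :outputs

  def initialize(name, inputs:, outputs:, &action)
    @name = name
    @inputs = inputs
    @outputs = outputs
    @action = action
  end

  # Run the step with only the parameters it declared as input and
  # return only the parameters it declared as output.
  def run(params)
    missing = inputs - params.keys
    raise ArgumentError, "#{name}: missing input #{missing.inspect}" unless missing.empty?

    result = @action.call(params.slice(*inputs))
    result.slice(*outputs)
  end
end

class Pipeline
  # Subclasses list their steps by calling `step` in the class body.
  def self.steps
    @steps ||= []
  end

  def self.step(name, inputs:, outputs:, &action)
    steps << Step.new(name, inputs: inputs, outputs: outputs, &action)
  end

  # Chain the steps: each step's outputs are merged into the shared
  # parameter set so later steps can consume them.
  def run(params)
    self.class.steps.reduce(params) do |current, step|
      current.merge(step.run(current))
    end
  end
end

class ToyVariantPipeline < Pipeline
  step(:call, inputs: [:bam], outputs: [:vcf]) do |p|
    { vcf: p[:bam].sub(/\.bam\z/, ".vcf") }
  end

  step(:annotate, inputs: [:vcf, :genome], outputs: [:annotated_vcf]) do |p|
    { annotated_vcf: p[:vcf].sub(/\.vcf\z/, ".#{p[:genome]}.vcf") }
  end
end

result = ToyVariantPipeline.new.run(bam: "aligned.fly.bam", genome: "dm5.34")
puts result[:annotated_vcf]
```

In this sketch a step never sees parameters it did not declare, which is one way to keep steps independent of each other while still letting the pipeline pass results along.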
h3. Tools
Pipette also includes a number of tools that perform the bulk of the processing for each step. These tools are simple wrappers around existing bioinformatics programs, plus some custom processing code.
h3. Requirements
Pipette is written in Ruby. Each pipeline also expects certain external tools to be installed; the tools required by each pipeline are listed below.
h1. Pipette Pipelines:
h2. Variant calling and annotation pipeline
Pipeline to automate finding variants (SNPs/Indels) in an aligned BAM file.
h3. Prerequisites
Currently, this pipeline does not perform the initial alignment step. This pipeline starts with an indexed BAM file and a reference genome.
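Since the pipeline assumes these files already exist, it can help to check for them before starting a run. The helper below is a hypothetical sketch and is not part of Pipette. It assumes the BAM index sits next to the BAM file under one of the two common names (@file.bam.bai@ or @file.bai@).

```ruby
# Hypothetical helper, not part of Pipette: report which prerequisite
# files are missing before starting a run.
def missing_prerequisites(bam, reference)
  # Expand "~" so paths like those in a config file resolve correctly.
  bam = File.expand_path(bam)
  reference = File.expand_path(reference)

  problems = []
  problems << "BAM file not found: #{bam}" unless File.file?(bam)

  # Assumption: the index is named <file>.bam.bai or <file>.bai.
  index_candidates = [bam + ".bai", bam.sub(/\.bam\z/, ".bai")]
  unless index_candidates.any? { |path| File.file?(path) }
    problems << "BAM index not found (tried #{index_candidates.join(', ')})"
  end

  problems << "Reference FASTA not found: #{reference}" unless File.file?(reference)
  problems
end
```

The helper returns an empty array when everything is in place, so a caller could print the returned messages and stop early if the array is not empty.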
h3. Requirements
This pipeline wraps calls to several external tools that perform the variant calling and annotation.
h3. Applications currently expected by the variant pipeline (vp):
- "GATK":http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit - performs most of the SNP / Indel calling steps using the GATK's "Unified Genotyper":http://www.broadinstitute.org/gsa/wiki/index.php/Unified_genotyper
- "samtools":http://samtools.sourceforge.net/ - for indexing output from GATK
- "snpEff":http://snpeff.sourceforge.net/ - for annotation
h3. Configuration Options
<pre>
./variant_pipeline.rb -h
Usage: variant_pipeline [options]
    -i, --input BAM_FILE             REQUIRED - Input BAM file to call SNPs on
    -r, --reference FA_FILE          REQUIRED - Reference Fasta file for genome
    -o, --output PREFIX              Output prefix to use for generated files
    -j, --cores NUM                  Specify number of cores to run GATK on. Default: 4
    -c, --recalibrate COVARIATE_FILE If provided, recalibration will occur using input covariate file. Default: recalibration not performed
    -a, --annotate GENOME            Annotate the SNPs and Indels using Ensembl based on input GENOME. Example Genome: FruitFly
        --gatk JAR_FILE              Specify GATK installation
        --snpeff JAR_FILE            Specify snppEff Jar location
        --snpeff_config CONFIG_FILE  Specify snppEff config file location
        --samtools BIN_PATH          Specify location of samtools
    -q, --quiet                      Turn off some output
    -s realign,recalibrate,call,filter,annotate,
        --steps                      Specify only which steps of the pipeline should be executed
    -y, --yaml YAML_FILE             Yaml configuration file that can be used to load options. Command line options will trump yaml options
    -h, --help                       Displays help screen, then exits
</pre>

h3. Run Example
The easiest way to run the variant pipeline is to store most of the required configuration in a YAML config file and then run with the -y flag.
h3. sample_config.yml
<pre>
reference: "~/genomes/Drosophila_melanogaster.BDGP5.4.54.dna.fasta"
input: "./aligned.fly.bam"
output: aligned.fly
annotate: dm5.34
gatk: "~/tools/GATK/GenomeAnalysisTK.jar"
snpeff: "~/tools/snpEff/snpEff.jar"
snpeff_config: "~/tools/snpEff/snpEff.config"
</pre>

Then run variant_pipeline with the -y flag:
<pre>
./variant_pipeline/pipette.rb variant -y ./sample_config.yml
</pre>
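
The help text notes that command line options trump YAML options. One way such precedence could be implemented is sketched below. @merge_options@ is a hypothetical helper written for this example, not Pipette's own code, and the string keys are an assumption about how options might be stored.

```ruby
require "yaml"

# Hypothetical helper, not Pipette's own code: combine options so that
# values given on the command line override values from the YAML file.
def merge_options(yaml_text, cli_options)
  from_yaml = YAML.safe_load(yaml_text) || {}
  # Drop unset command-line values so they do not erase YAML settings.
  from_cli = cli_options.reject { |_key, value| value.nil? }
  from_yaml.merge(from_cli)
end

yaml_text = <<~YAML
  input: "./aligned.fly.bam"
  output: aligned.fly
  annotate: dm5.34
YAML

# "output" is overridden on the command line; "cores" was not given.
options = merge_options(yaml_text, "output" => "rerun", "cores" => nil)
puts options
```

Here the YAML file supplies the defaults and only the options actually passed on the command line replace them, which matches the precedence the help text describes.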
