Pipette

variant discovery and annotation using GATK and Ensembl

h1. Pipette: A framework for Bioinformatic pipelines

Pipette is a framework to quickly and easily build a pipeline for multi-step processing of input. It is especially suited for Bioinformatic applications.

Pipette also ships with a collection of pipelines built on the framework. Current pipelines:

  • variant: a variant calling and annotation pipeline.

h3. How it works

Pipette pipelines are made of steps, and each step is independent of the others. Each step declares two attributes:

  • input: The input parameters the step needs to run.
  • output: The parameters the step will generate after running.

Pipette chains these steps together, passing in input parameters and collecting output parameters automatically.

To create a new pipeline in Pipette, subclass the Pipeline class.
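The chaining idea described above can be sketched in a few lines of Ruby. This is a minimal illustration only: the Step and Pipeline classes, the step class method, and ToyVariantPipeline are hypothetical names invented for this example and are not Pipette's actual API. Each step declares the parameters it needs and the parameters it produces, and the pipeline merges each step's outputs into a shared parameter hash.

```ruby
# Hypothetical sketch: class and method names are illustrative, not Pipette's API.
class Step
  attr_reader :name, :inputs, :outputs

  def initialize(name, inputs:, outputs:, &action)
    @name = name
    @inputs = inputs
    @outputs = outputs
    @action = action
  end

  # Run the step with only the parameters it declared as input,
  # and return only the parameters it declared as output.
  def run(params)
    missing = inputs.reject { |key| params.key?(key) }
    raise ArgumentError, "#{name} is missing #{missing.join(', ')}" unless missing.empty?

    result = @action.call(params.slice(*inputs))
    result.slice(*outputs)
  end
end

class Pipeline
  # Steps are stored per subclass, in declaration order.
  def self.steps
    @steps ||= []
  end

  def self.step(name, inputs:, outputs:, &action)
    steps << Step.new(name, inputs: inputs, outputs: outputs, &action)
  end

  # Run every step in order, merging each step's outputs into the shared parameters.
  def run(params)
    self.class.steps.reduce(params) do |current, step|
      current.merge(step.run(current))
    end
  end
end

# A toy pipeline that only derives file names; no external tools are called.
class ToyVariantPipeline < Pipeline
  step(:call, inputs: [:bam, :reference], outputs: [:vcf]) do |p|
    { vcf: p[:bam].sub(/\.bam\z/, ".vcf") }
  end

  step(:annotate, inputs: [:vcf], outputs: [:annotated_vcf]) do |p|
    { annotated_vcf: p[:vcf].sub(/\.vcf\z/, ".annotated.vcf") }
  end
end

result = ToyVariantPipeline.new.run(bam: "sample.bam", reference: "genome.fa")
puts result[:annotated_vcf]
```

Because each step only sees its declared inputs, a step can be reordered or reused in another pipeline as long as some earlier step (or the initial parameters) supplies what it needs.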

h3. Tools

Pipette also includes a number of tools that perform the bulk of the processing for each step. These tools are simple wrappers around existing bioinformatics programs, plus some custom processing code.
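As a rough idea of what a thin wrapper around an external program can look like, here is a minimal sketch. The ExternalTool class and its methods are assumptions made for this example, not the wrappers Pipette ships. It builds an argument list and hands it to Ruby's system call.

```ruby
require "shellwords"

# Hypothetical wrapper: the class name and interface are assumptions for illustration.
class ExternalTool
  def initialize(executable)
    @executable = executable
  end

  # Build the argument vector without running anything.
  def command(*args)
    [@executable, *args.map(&:to_s)]
  end

  # Run the tool. Passing the arguments as separate strings to system
  # means they are not interpreted by a shell.
  def run(*args)
    cmd = command(*args)
    puts "Running: #{Shellwords.join(cmd)}"
    system(*cmd) or raise "#{@executable} failed: #{Shellwords.join(cmd)}"
  end
end

samtools = ExternalTool.new("samtools")
index_cmd = samtools.command("index", "aligned.fly.bam")
puts Shellwords.join(index_cmd)
```

Keeping command construction separate from execution makes it possible to log or inspect the exact command before any external program is started.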

h3. Requirements

Pipette is written in Ruby. Each pipeline also expects certain external tools to be installed; the tools required by each pipeline are listed below.

h1. Pipette Pipelines:

h2. Variant calling and annotation pipeline

This pipeline automates finding variants (SNPs/Indels) in an aligned BAM file.

h3. Prerequisites

Currently, this pipeline does not perform the initial alignment step. This pipeline starts with an indexed BAM file and a reference genome.
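A quick pre-flight check for these prerequisites might look like the sketch below. The helper names find_bam_index and check_inputs! are hypothetical and not part of Pipette, and the file paths in the usage lines are placeholders. The check assumes the BAM index sits next to the BAM file as either NAME.bam.bai or NAME.bai, the two common naming conventions.

```ruby
# Hypothetical helpers, not part of Pipette.

# Return the path of the BAM index if one exists next to the BAM file, else nil.
def find_bam_index(bam_path)
  candidates = ["#{bam_path}.bai", bam_path.sub(/\.bam\z/, ".bai")]
  candidates.find { |path| File.exist?(path) }
end

# Raise a descriptive error if any prerequisite file is missing.
def check_inputs!(bam_path, reference_path)
  raise ArgumentError, "BAM file not found: #{bam_path}" unless File.exist?(bam_path)
  raise ArgumentError, "no index (.bai) found for #{bam_path}" unless find_bam_index(bam_path)
  raise ArgumentError, "reference not found: #{reference_path}" unless File.exist?(reference_path)

  true
end

begin
  check_inputs!("./aligned.fly.bam", "./genome.fasta")
  puts "Inputs look ready"
rescue ArgumentError => e
  puts "Cannot start: #{e.message}"
end
```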

h3. Requirements

This pipeline wraps calls to several external tools that perform the variant calling and annotation.

h3. Current Applications Expected by the variant pipeline:

  • "GATK":http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit - performs most of the SNP / Indel calling steps using the GATK's "Unified Genotyper":http://www.broadinstitute.org/gsa/wiki/index.php/Unified_genotyper
  • "samtools":http://samtools.sourceforge.net/ - for indexing output from GATK
  • "snpEff":http://snpeff.sourceforge.net/ - for annotation

h3. Configuration Options

<pre>
./variant_pipeline.rb -h
Usage: variant_pipeline [options]
    -i, --input BAM_FILE             REQUIRED - Input BAM file to call SNPs on
    -r, --reference FA_FILE          REQUIRED - Reference Fasta file for genome
    -o, --output PREFIX              Output prefix to use for generated files
    -j, --cores NUM                  Specify number of cores to run GATK on. Default: 4
    -c, --recalibrate COVARIATE_FILE If provided, recalibration will occur using input covariate file. Default: recalibration not performed
    -a, --annotate GENOME            Annotate the SNPs and Indels using Ensembl based on input GENOME. Example Genome: FruitFly
        --gatk JAR_FILE              Specify GATK installation
        --snpeff JAR_FILE            Specify snppEff Jar location
        --snpeff_config CONFIG_FILE  Specify snppEff config file location
        --samtools BIN_PATH          Specify location of samtools
    -q, --quiet                      Turn off some output
    -s realign,recalibrate,call,filter,annotate,
        --steps                      Specify only which steps of the pipeline should be executed
    -y, --yaml YAML_FILE             Yaml configuration file that can be used to load options. Command line options will trump yaml options
    -h, --help                       Displays help screen, then exits
</pre>
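The help text says that command line options trump YAML options. One way to get that precedence is to merge three hashes in order: defaults, then YAML, then command line. The sketch below is not the pipeline's actual option-handling code. The function names parse_cli and resolve_options are hypothetical, only a handful of the flags above are reproduced, and the YAML is passed in as text so the example runs without any files.

```ruby
require "optparse"
require "yaml"

# Hypothetical sketch of the precedence rule; not the pipeline's actual option code.
DEFAULTS = { "cores" => 4 }.freeze

# Parse a subset of the documented flags into a hash keyed by option name.
def parse_cli(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on("-i", "--input BAM_FILE") { |v| options["input"] = v }
    opts.on("-r", "--reference FA_FILE") { |v| options["reference"] = v }
    opts.on("-o", "--output PREFIX") { |v| options["output"] = v }
    opts.on("-j", "--cores NUM", Integer) { |v| options["cores"] = v }
  end.parse(argv)
  options
end

# Later merges win: defaults < YAML < command line.
def resolve_options(argv, yaml_text = nil)
  from_yaml = yaml_text ? (YAML.safe_load(yaml_text) || {}) : {}
  DEFAULTS.merge(from_yaml).merge(parse_cli(argv))
end

yaml_text = <<~YAML
  input: "./aligned.fly.bam"
  output: aligned.fly
  cores: 2
YAML

options = resolve_options(["-j", "8", "-o", "override"], yaml_text)
p options
```

In this run the input path comes from the YAML text, while cores and output take the command line values because the command line hash is merged last.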

h3. Run Example

The easiest way to run the variant pipeline is to store most of the required settings in a YAML config file and then pass that file with the -y flag.

h3. sample_config.yml

<pre>
reference: "~/genomes/Drosophila_melanogaster.BDGP5.4.54.dna.fasta"
input: "./aligned.fly.bam"
output: aligned.fly
annotate: dm5.34
gatk: "~/tools/GATK/GenomeAnalysisTK.jar"
snpeff: "~/tools/snpEff/snpEff.jar"
snpeff_config: "~/tools/snpEff/snpEff.config"
</pre>

Then simply run variant_pipeline with the -y flag:

<pre> ./variant_pipeline/pipette.rb variant -y ./sample_config.yml </pre>
