Bashbone

A bash and biobash library for workflow and pipeline design within but not restricted to the scope of Next Generation Sequencing (NGS) data analyses.

For developers - bash library

  • Get a full bash error stack trace in interactive shells or within scripts
  • Write command line code in your favorite programming language via Here-documents for later orchestrated execution
  • Add object-oriented programming (OOP) like syntactic sugar to bash variables and arrays to avoid complex parameter expansions, variable expansions and brace expansions
  • Execute commands in parallel on your machine or submit them as jobs to a workflow manager like sun grid engine (SGE) and log stdout, stderr and exit codes per job
  • Benchmark runtime and memory usage
  • Infer number of parallel instances according to targeted memory consumption or targeted threads per instance
  • Log execution of bash functions at different verbosity levels
  • Extend the library by custom bash functions which will inherit
    • Stack trace
    • Termination of all function related (sub-)processes, including asynchronous background jobs upon error/exit or when reaching prompt-command (interactive shell)
    • Removal of temporary files created via mktemp and execution of custom cleanup commands upon error/exit or when reaching prompt-command (interactive shell)
  • Profit from helper functions that implement
    • Multi-threaded joining of multiple files
    • Multi-threaded sorting
    • Multi-threaded de-compression
    • Multi-threaded compression plus indexing for random access by byte offset or line number without noticeable overhead
    • Multi-threaded application of commands to a compressed and indexed or flat file on a per-line or per-chunk basis
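
The stack trace mentioned in the first bullet can be approximated in plain bash. The following is a minimal sketch, not bashbone's implementation: it relies only on an ERR trap and the bash built-in arrays FUNCNAME, BASH_SOURCE and BASH_LINENO, and the names print_trace, inner and outer are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative only: print_trace, inner and outer are made-up names,
# not bashbone functions.

print_trace(){
  local code=$1 i
  echo "error: exit code $code"
  # FUNCNAME[0] is print_trace itself, so start at the failing function (index 1).
  # BASH_LINENO[i-1] is the line inside FUNCNAME[i] that was being executed.
  for (( i=1; i<${#FUNCNAME[@]}; i++ )); do
    echo "  at ${FUNCNAME[i]} (${BASH_SOURCE[i]}:${BASH_LINENO[i-1]})"
  done
}

inner(){ false; }   # the failing command
outer(){ inner; }

# Run the demo inside a command substitution so that the trap and the
# exit stay confined to a subshell and the calling shell keeps running.
trace=$(
  set -E                               # let functions inherit the ERR trap
  trap 'print_trace $?; exit 1' ERR
  outer
)
echo "$trace"
```

Running the script lists the failing function first, followed by its callers. The file names and line numbers shown depend on how the script is invoked.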

For users - biobash library

  • Get a full bash error stack trace in interactive shells or within scripts
  • Easily design multi-threaded pipelines to perform NGS related tasks
  • Use many best-practice parameterized and heavily run-time tweaked software wrappers
  • Most software related parameters will be inferred directly from input data, so that all functions require just a minimal set of input arguments
  • Benefit from a non-root stand-alone installer without need for any prerequisites
  • Get genomes and annotations from Ensembl, variants from the GATK resource bundle and raw sequencing data from the NCBI Sequence Read Archive (SRA)

Covered tasks

  • For paired-end and single-end derived raw sequencing or prior mapped read data
    • RNA-Seq protocols (RNA, RIP, m6A, ..)
    • DNA-Seq protocols (WGS, ChIP, ChIP-exo, ATAC, CAGE, Quant, Cut&Tag, ..)
    • Bisulfite converted DNA-Seq protocols (WGBS, RRBS)
  • Data quality analysis and preprocessing
    • adapter and poly-mono/di-nucleotide clipping
    • quality trimming
    • error correction
    • artificial rRNA depletion
  • Read alignment and post-processing
    • knapsack problem based slicing of alignment files for parallel task execution
    • sorting, filtering
    • UMI based de-duplication or removal of optical and PCR duplicates
    • generation of pools and pseudo-replicates
    • read group modification, split N-cigar reads, left-alignment and base quality score recalibration
  • Gene fusion detection
  • Methyl-C calling and prediction of differentially methylated regions
  • Expression analysis
    • Read quantification (also from quasi-mappings), TPM and Z-score normalization and heatmap plotting
    • Inference of strand specific library preparation methods
    • Inference of differential expression as well as clusters of co-expression
    • Detection of differential splice junctions and differential exon usage
    • Gene ontology (GO) gene set enrichment and over-representation analysis plus semantic similarity based clustering and visualizations
  • Implementation of ENCODE v3 best-practice ChIP-Seq peak calling
    • Peak calling from RIP-Seq, MeRIP-Seq, m6A-Seq and other related IP-Seq data
    • Inference of effective genome sizes
  • Variant detection from DNA or RNA sequencing experiments
    • Integration of multiple solutions for germline and somatic calling
    • VCF normalization
    • Tree reconstruction from homozygous sites
  • ssGSEA and survival analysis from TCGA cancer expression data
  • Genome and SRA data retrieval
    • Genome to transcriptome conversion
    • Data visualization via IGV batch processing

License

The whole project is licensed under the GPL v3 (see LICENSE file for details), except for the third-party tools set up during installation. Please refer to their corresponding licenses.

Copyleft (C) 2020, Konstantin Riege

Download

This downloads a copy that includes the latest developments

git clone --recursive https://github.com/Hoffmann-Lab/bashbone

To check out the latest release (irregularly compiled) do

cd bashbone
# --abbrev=0 prints only the nearest tag name, even if HEAD is ahead of that tag
git checkout $(git describe --tags --abbrev=0)

Bash library usage (without full installation)

Do's and don'ts

When used in a script, bashbone is meant to be sourced at the very top to handle positional arguments and to re-execute (-r true) the script under its own process group ID in order to take care of proper termination (-a "$@"). It enables error stack tracing and sub-process handling globally by setting traps for EXIT, ERR, RETURN and INT, so do not override them. In case your script intends to spawn daemons, use setsid or disable bashbone first.

#!/usr/bin/env bash
source <path/to/bashbone>/activate.sh -r true -a "$@"
# do stuff
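# now spawn daemons
# option 1: detach the daemon into its own session via setsid
setsid daemon1 &
# option 2: disable bashbone first, then spawn the daemon
bashbone -x
daemon2 &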
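
The reason for not overriding these traps is that bash keeps only one handler per signal: setting a new one silently discards the previous one. A minimal plain-bash sketch of that behavior follows; the echoed messages are placeholders, not bashbone's actual handlers.

```shell
#!/usr/bin/env bash
# Plain bash, no bashbone involved. The messages stand in for whatever
# handlers a library may have installed.

# A second trap on the same signal replaces the first instead of adding to it.
# The traps are set inside a command substitution, so they fire when that
# subshell exits and their output is captured.
replaced=$(
  trap 'echo "library cleanup"' EXIT
  trap 'echo "my cleanup"' EXIT
)

echo "handlers that ran: $replaced"
```

Only the second handler runs, so any cleanup registered by the first one would be lost.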

Please note that error tracing in bash is circumvented by || and && constructs. Therefore, avoid them in any context of function calls.

#!/usr/bin/env bash
source <path/to/bashbone>/activate.sh -r true -a "$@"

function myfun(){
  cat file_not_found
  echo "error ignored. starting time consuming calculation now."
}
# DON'T !
myfun || echo "failed with code $?"
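
This effect can be reproduced without bashbone. The sketch below uses bash's errexit option (set -e) as a stand-in, since the bash manual states that the ERR trap obeys the same conditions as errexit. The function myfun and the messages are illustrative only.

```shell
#!/usr/bin/env bash
# Plain bash, no bashbone involved; names and messages are made up.

myfun(){
  false                                 # fails with exit code 1
  echo "reached line after failure"
}

# Called on its own with errexit enabled: the failing command aborts the subshell.
standalone=$(set -e; myfun)
standalone_code=$?

# Called on the left-hand side of ||: bash ignores errexit for the whole
# function body, so execution continues past the failing command.
guarded=$(set -e; myfun || echo "fallback")
guarded_code=$?

printf 'standalone: output="%s" code=%s\n' "$standalone" "$standalone_code"
printf 'guarded:    output="%s" code=%s\n' "$guarded" "$guarded_code"
```

In the guarded call the function's last command succeeds, so the function returns 0 and the fallback after || never runs either. The failure goes unnoticed altogether.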

Quick start

To get all third-party tools set up and subsequently all biobash bashbone functions to work properly, see also

For a lite installation that gets you the minimum required tools (GNU parallel, gztool, mdless) in order to make use of developer functions, execute

scripts/setup.sh -i lite -d <path/to/installation>

To see the usage, do

scripts/setup.sh -h

DESCRIPTION
Bashbone setup routine

SYNOPSIS
setup.sh -i [all|upgrade] -d [path]

OPTIONS
-i | --install [lite|all|upgrade] : install into given directory
-g | --use-config                 : use supplied yaml files and URLs instead of cutting edge tools
-d | --directory [path]           : installation path
-t | --threads [value]            : threads - predicted default: 32
-l | --log [path]                 : log file - default: [-d]/install.log
-v | --verbose                    : enable verbose mode
-h | --help                       : prints this message

DEVELOPER OPTIONS
-s | --source [path,..]           : source file(s) to overload compile::[lite|all|upgrade|<tool>] functions
-i | --install [<tool>,..]        : install into given directory

Now load the bashbone library in an interactive terminal session. Note that none of your environment settings will be modified.

source <path/of/installation>/latest/bashbone/activate.sh

To see the activate script usage, do

source <path/of/installation>/latest/bashbone/activate.sh -h

This is bashbone activation script.

To see lists of available options and functions, source me and execute bashbone -h

Usage:
-h              | this help
-l <legacymode> | true/false let commander inerts line breaks, thus crafts one-liners from makecmd here-documents
                  default: false
-i <path>       | to installation root <path>/latest
                  default: infe
