MrBiomics

MrBiomics: Composable modules and recipes automate bioinformatics for multi-omics analyses

🚀🧬 MrBiomics: Composable <ins>m</ins>odules and <ins>r</ins>ecipes automate <ins>bi</ins>oinformatics for multi-<ins>omics</ins> analyses

"For many outcomes, roughly 80% of consequences come from 20% of causes (the "vital few")." - The Pareto Principle by Vilfredo Pareto

Get 80% of all standard (biomedical) data science analyses done semi-automatically with 20% of the effort by leveraging Snakemake's module functionality to use and combine pre-existing workflows into arbitrarily complex analyses.

[!IMPORTANT]
If you use MrBiomics, please don't forget to give credit to the authors by citing this original repository and the respective Modules and Recipes.

⚡ Quickstart: 5 Commands to Your First Results!

Stop wrestling with complicated setups and start discovering new biology! Get MrBiomics up and running from scratch (assuming conda is installed) with exactly 5 commands:

conda create -y -n snakemake -c conda-forge -c bioconda snakemake=8.25.3
git clone https://github.com/epigen/MrBiomics.git
cd MrBiomics
conda activate snakemake
snakemake --software-deployment-method conda --cores 1

🎉 Boom! You're analyzing data! This simple snippet creates your Snakemake environment, fetches MrBiomics, and executes the built-in quickstart analysis.

[!TIP]

Grab a coffee! It takes around 10 minutes to run the first time as it automatically installs all the required software for you.

Curious about the magic that just happened? Dive deeper and explore MrBiomics' capabilities starting with our Quickstart!
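
Because the quickstart assumes conda is already installed, a small pre-flight check can catch a missing tool before the roughly 10-minute first run. The following is a minimal sketch and not part of MrBiomics; `check_tools` is a hypothetical helper name chosen for illustration. The usage comment adds Snakemake's `--dry-run` flag, which lists the planned jobs without executing them.

```shell
#!/usr/bin/env bash
# Illustrative pre-flight helper (hypothetical; not shipped with MrBiomics).

# Print "missing: <tool>" for every named tool that is not on PATH.
# Returns 0 if all tools were found, 1 otherwise.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from inside the cloned MrBiomics directory with the
# snakemake environment activated:
#   check_tools conda git snakemake && \
#       snakemake --software-deployment-method conda --cores 1 --dry-run
```

If the dry run lists the expected jobs, dropping `--dry-run` gives the final quickstart command shown above.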

⏳ TL;DR - More Time for Science!

"Programming is about trying to make the future less painful. It’s about making things easier for our teammates." from The Pragmatic Programmer by Andy Hunt & Dave Thomas

  • Why: Time is the most precious resource. By taking care of efficiency (i.e., maximum output with limited resources) scientists can re-distribute their time to focus on effectiveness (i.e., the biggest impact possible).
  • How: Functional Knowledge Management. Leverage the latest developments in workflow management to (re-)use and combine independent computational modules into arbitrarily complex analyses to benefit from modern innovation methods (e.g., fast prototyping, design thinking, and agile concepts).
  • What: Independent and single-purpose computational Modules, implemented as Snakemake workflows, encode standard approaches that are used to scale, automate, and parallelize analyses. Recipes combine modules into end-to-end best practice workflows, thereby accelerating analyses to the point of the unknown. Snakemake's module functionality enables Projects to combine modules, recipes and custom code into arbitrarily complex multi-omics analyses at scale.

[Figure: Illustration of MrBiomics Modules, Recipes and Projects]

[Figure: Illustration of MrBiomics Modules, Recipes and Projects applied to a case study on human hematopoiesis]

[!NOTE]
Altogether, this enables complex, portable, transparent, reproducible, and documented analyses of multi-omics data at scale.

🧠 Functional Knowledge Management

"The best documentation is automation." - Wise Person on the Internet

Functional Knowledge Management (FKM) is our knowledge-management approach in which validated best practices are captured as executable software functions, modules, or recipes.

  • Each artefact simultaneously documents the know-how and performs the task, creating a living, testable, and composable code collection that closes the gap between theory and practice.
  • Rigorous modularity and version control keep every function self-contained, tested, and repository-tracked, enabling safe reuse and continuous evolution.
  • As a compounding asset base, new functions can be built on earlier ones, steadily expanding an ever-richer compendium of trusted solutions.

🧩 Modules

"Is it functional, multifunctional, durable, well-fitted, simple, easy to maintain, and thoroughly tested? Does it provide added value, and doesn't cause unnecessary harm? Can it be simpler? Is it an innovation?" - Patagonia Design Principles

Modules are Snakemake workflows, consisting of Rules for multi-step analyses, that are independent, single-purpose, and sufficiently abstracted to be compatible with most up- and downstream analyses. A module can be general-purpose (e.g., Unsupervised Analysis) or modality-specific (e.g., ATAC-seq processing). Currently, the following eleven modules are available, roughly ordered by their applicability from general to specific:

| Module | Type (Data Modality) | DOI | Version | Stars |
| :---: | :---: | :---: | :---: | :---: |
| Unsupervised Analysis | General Purpose<br>(tabular data) | DOI | GitHub Release | <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/epigen/unsupervised_analysis?style=plastic"> |
| Fetch NGS Data and Metadata using iSeq | Bioinformatics<br>(NGS data) | DOI | GitHub Release | <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/epigen/fetch_ngs?style=plastic"> |
| Split, Filter, Normalize and Integrate Sequencing Data | Bioinformatics<br>(NGS counts) | DOI | GitHub Release | <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/epigen/spilterlize_integrate?style=plastic"> |
| Differential Analysis with limma | Bioinformatics<br>(NGS data) | DOI | GitHub Release | <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/epigen/dea_limma?style=plastic"> |
| Enrichment Analysis | Bioinformatics<br>(genes/genomic regions) | DOI | GitHub Release | <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/epigen/enrichment_analysis?style=plastic"> |
| Genome Track Visualization | Bioinformatics<br>(aligned BAM files) | DOI | GitHub Release | <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/epigen/genome_tracks?style=plastic"> |
| ATAC-seq Processing, Quantification & Annotation | Bioinformatics<br>(ATAC-seq) | DOI | GitHub Release | <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/epigen/atacseq_pipeline?style=plastic"> |
| RNA-seq Processing, Quantification & Annotation | Bioinformatics<br>(RNA-seq) | DOI | GitHub Release | <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/epigen/rnaseq_pipeline?style=plastic"> |
| scRNA-seq Processing using Seurat | Bioinformatics<br>(scRNA-seq) | DOI | GitHub Release | <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/epigen/scrnaseq_processing_seurat?style=plastic"> |
| Differential Analysis using Seurat | Bioinformatics<br>(scRNA-seq) | DOI | GitHub Release | <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/epigen/dea_seurat?style=plastic"> |
| Perturbation Analysis using Mixscape from Seurat | Bioinformatics<br>(scCRISPR-seq) | DOI | GitHub Release | <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/epigen/mixscape_seurat?style=plastic"> |

[!NOTE]
⭐️ Star and share modules you find valuable 📤 — help others discover them, and guide our future work!

[!TIP]
For detailed instructions on the installation, configuration, and execution of modules, check out the wiki. Generic instructions are also shown in each module's Snakemake workflow catalog entry.

📋 Projects using multiple Modules

_“Absorb what is useful. Discard what is no
