Arboreto

A scalable Python-based framework for gene regulatory network inference using tree-based ensemble regressors.

.. image:: img/arboreto.png
    :alt: arboreto
    :scale: 100%
    :align: left

.. image:: https://travis-ci.com/aertslab/arboreto.svg?branch=master
    :alt: Build Status
    :target: https://travis-ci.com/aertslab/arboreto

.. image:: https://readthedocs.org/projects/arboreto/badge/?version=latest
    :alt: Documentation Status
    :target: http://arboreto.readthedocs.io/en/latest/?badge=latest

.. image:: https://anaconda.org/bioconda/arboreto/badges/version.svg
    :alt: Bioconda package
    :target: https://anaconda.org/bioconda/arboreto

.. image:: https://img.shields.io/pypi/v/arboreto
    :alt: PyPI package
    :target: https://pypi.org/project/arboreto/

.. epigraph::

    *The most satisfactory definition of man from the scientific point of view is probably Man the Tool-maker.*

.. _arboreto: https://arboreto.readthedocs.io
.. _`arboreto documentation`: https://arboreto.readthedocs.io
.. _notebooks: https://github.com/tmoerman/arboreto/tree/master/notebooks
.. _issue: https://github.com/tmoerman/arboreto/issues/new

.. _dask: https://dask.pydata.org/en/latest/
.. _`dask distributed`: https://distributed.readthedocs.io/en/latest/

.. _GENIE3: http://www.montefiore.ulg.ac.be/~huynh-thu/GENIE3.html
.. _`Random Forest`: https://en.wikipedia.org/wiki/Random_forest
.. _ExtraTrees: https://en.wikipedia.org/wiki/Random_forest#ExtraTrees
.. _`Stochastic Gradient Boosting Machine`: https://en.wikipedia.org/wiki/Gradient_boosting#Stochastic_gradient_boosting
.. _early-stopping: https://en.wikipedia.org/wiki/Early_stopping

Inferring a gene regulatory network (GRN) from gene expression data is a computationally expensive task, exacerbated by increasing data sizes due to advances in high-throughput gene profiling technology.

The arboreto_ software library addresses this issue by providing a computational strategy that allows executing the class of GRN inference algorithms exemplified by GENIE3_ [1] on hardware ranging from a single computer to a multi-node compute cluster. This class of GRN inference algorithms is defined by a series of steps, one for each target gene in the dataset, where the most important candidates from a set of regulators are determined from a regression model to predict a target gene's expression profile.
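The per-target structure described above can be sketched in a few lines. This is a minimal illustration only, not arboreto's implementation: to stay runnable with the standard library alone, it swaps the tree-based regressor for a plain absolute Pearson correlation as the importance score. The function names (``abs_pearson``, ``infer_links``) and the toy data are hypothetical.

```python
import math
import random


def abs_pearson(xs, ys):
    """Absolute Pearson correlation: a simple stand-in for a tree-based importance score."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / math.sqrt(var_x * var_y)


def infer_links(expression, regulators, score=abs_pearson):
    """Run one independent step per target gene, scoring every candidate regulator."""
    links = []
    for target, target_profile in expression.items():
        for tf in regulators:
            if tf == target:
                continue  # a gene is not a candidate regulator of itself
            links.append((tf, target, score(expression[tf], target_profile)))
    # Rank all regulator -> target links by descending importance.
    return sorted(links, key=lambda link: link[2], reverse=True)


# Made-up toy data: gene_c closely follows tf_a, while tf_b is unrelated noise.
rng = random.Random(0)
tf_a = [rng.gauss(0, 1) for _ in range(50)]
tf_b = [rng.gauss(0, 1) for _ in range(50)]
gene_c = [2.0 * a + rng.gauss(0, 0.1) for a in tf_a]
expression = {"tf_a": tf_a, "tf_b": tf_b, "gene_c": gene_c}

links = infer_links(expression, regulators=["tf_a", "tf_b"])
for tf, target, importance in links:
    print(f"{tf} -> {target}: {importance:.3f}")
```

Because each iteration of the outer loop touches only one target gene, the iterations do not depend on one another, which is what makes this class of algorithms straightforward to parallelize.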

Members of the above class of GRN inference algorithms are attractive from a computational point of view because they are parallelizable by nature. In arboreto, we specify the parallelizable computation as a dask_ graph [2], a data structure that represents the task schedule of a computation. A dask scheduler assigns the tasks in a dask graph to the available computational resources. Arboreto uses the `dask distributed`_ scheduler to spread out the computational tasks over multiple processes running on one or multiple machines.
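To make the idea of a task graph concrete, the sketch below assumes the plain-dict form described in the dask documentation, where each key maps either to a literal value or to a task tuple whose first element is a callable. The ``evaluate`` function is a hypothetical, single-threaded stand-in for a scheduler, and ``summarise`` is only a placeholder for the per-target regression step. Neither is part of dask or arboreto.

```python
def evaluate(graph, key):
    """Tiny single-threaded stand-in for a scheduler: resolve `key` recursively."""
    task = graph[key]
    if isinstance(task, tuple) and callable(task[0]):
        func, *args = task
        # Arguments that name other keys are dependencies and are computed first.
        resolved = [
            evaluate(graph, arg) if isinstance(arg, str) and arg in graph else arg
            for arg in args
        ]
        return func(*resolved)
    return task  # a literal value


def summarise(expression, target):
    """Placeholder for the per-target regression step (hypothetical)."""
    return (target, sum(expression[target]))


def collect(*results):
    """Combine the per-target results into one structure."""
    return dict(results)


# One task per target gene; the tasks share an input but not each other's output.
graph = {
    "expression": {"g1": [1, 2, 3], "g2": [4, 5, 6]},
    "step-g1": (summarise, "expression", "g1"),
    "step-g2": (summarise, "expression", "g2"),
    "network": (collect, "step-g1", "step-g2"),
}

result = evaluate(graph, "network")
print(result)  # {'g1': 6, 'g2': 15}
```

The two ``step-*`` tasks have no edge between them, so a real scheduler is free to run them on different processes or machines before the final ``network`` task gathers the results.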

Arboreto currently supports two GRN inference algorithms:

1. GRNBoost2: a novel and fast GRN inference algorithm using `Stochastic Gradient Boosting Machine`_ (SGBM) [3] regression with early-stopping_ regularization.
2. GENIE3: the classic GRN inference algorithm using `Random Forest`_ (RF) or ExtraTrees_ (ET) regression.
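Early-stopping regularization can be illustrated with a generic windowed criterion: keep adding boosting rounds until the average improvement over the most recent rounds turns negative. The sketch below is one common way to express that idea and is not necessarily the exact rule arboreto applies. The function name ``rounds_until_stop``, the default ``window_length``, and the improvement values are all assumptions made for illustration.

```python
def rounds_until_stop(improvements, window_length=25):
    """Return how many boosting rounds run before the windowed criterion halts training.

    `improvements` holds one value per round, for example the change in
    held-out loss contributed by that round (positive means the model improved).
    """
    for current_round in range(len(improvements)):
        if current_round >= window_length - 1:
            start = current_round - window_length + 1
            window = improvements[start: current_round + 1]
            # Stop once the recent rounds, on average, no longer help.
            if sum(window) / window_length < 0:
                return current_round + 1
    return len(improvements)  # the criterion never fired


# Made-up improvement curve: gains shrink steadily and eventually turn negative.
improvements = [10 - i for i in range(40)]
stopped_at = rounds_until_stop(improvements, window_length=5)
print(f"stopped after {stopped_at} of {len(improvements)} rounds")
```

Averaging over a window rather than reacting to a single bad round makes the criterion tolerant of noise in the per-round improvements, at the cost of running a few extra rounds.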

Get Started

Arboreto was conceived with the working bioinformatician or data scientist in mind. We provide extensive documentation and examples to help you get up to speed with the library.

* Read the `arboreto documentation`_.
* Browse example notebooks_.
* Report an issue_.

License

BSD 3-Clause License

pySCENIC

.. _pySCENIC: https://github.com/aertslab/pySCENIC
.. _SCENIC: https://aertslab.org/#scenic

Arboreto is a component of pySCENIC_: a lightning-fast Python implementation of the SCENIC_ pipeline [5] (Single-Cell rEgulatory Network Inference and Clustering), which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.

References

1. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P (2010) Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. PLoS ONE
2. Rocklin, M. (2015). Dask: parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference (pp. 130-136).
3. Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378.
4. Marbach, D., Costello, J. C., Kuffner, R., Vega, N. M., Prill, R. J., Camacho, D. M., ... & Dream5 Consortium. (2012). Wisdom of crowds for robust gene network inference. Nature methods, 9(8), 796-804.
5. Aibar S, Bravo Gonzalez-Blas C, Moerman T, Wouters J, Huynh-Thu VA, Imrichova H, Kalender Atak Z, Hulselmans G, Dewaele M, Rambow F, Geurts P, Aerts J, Marine C, van den Oord J, Aerts S. SCENIC: Single-cell regulatory network inference and clustering. Nature Methods 14, 1083–1086 (2017). doi: 10.1038/nmeth.4463
