DFNET
Network-guided greedy decision forest for feature subset selection
Paper
https://www.nature.com/articles/s41598-022-21417-8
Installation
The DFNET R package can be installed using devtools:
install.packages("devtools")
devtools::install_github("pievos101/DFNET")
Usage
See our examples using synthetic data sets or real-world cancer data.
Generally speaking, DFNET follows a four-step process:
- Preparing the input data (graph and features).
- Training the forest.
- Finding useful decision trees.
- Using these trees for evaluation.
Preparing input data
DFNET expects an igraph::igraph and a 2D or 3D feature array, as well as a
target vector with the same number of rows as the array.
The vertex names of the graph should be the same as the column names of the array.
When in doubt, use launder or related functions to prepare the input data.
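As a minimal sketch of inputs in the shape described above, the following builds a small undirected graph, a 2D feature matrix, and a binary target. The gene names, the random values, and the variable names are made up for illustration, and only igraph functions are used, not DFNET itself.

```r
library(igraph)

set.seed(42)

# Hypothetical gene names; they serve as both vertex names and column names.
genes <- c("G1", "G2", "G3", "G4")

# Undirected toy network given as an edge list.
edges <- data.frame(
  from = c("G1", "G2", "G3"),
  to   = c("G2", "G3", "G4")
)
graph <- graph_from_data_frame(edges, directed = FALSE,
                               vertices = data.frame(name = genes))

# 2D feature array: one row per sample, one column per vertex.
n_samples <- 10
features <- matrix(rnorm(n_samples * length(genes)),
                   nrow = n_samples,
                   dimnames = list(NULL, genes))

# Binary target with one entry per row of the feature array.
target <- rep(c(0, 1), length.out = n_samples)

# Sanity checks for the two requirements stated above.
stopifnot(setequal(V(graph)$name, colnames(features)))
stopifnot(length(target) == nrow(features))
```

If either check fails on real data, the vertex names and column names need to be aligned before training.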
Training the forest
Once you have your graph and features, you can train your forest like so (the first argument, which would hold a pre-trained forest, is left empty):
forest <- train(,
graph, features, target,
...
)
If you have a pre-trained forest, you can use that for training as well:
forest <- train(forest,
graph, features, target,
...
)
Finding useful trees
Since DFNET performs greedy optimization, the last generation of trees
is the best according to the provided test metric. DFNET provides overrides for
the standard R methods head and tail, which return generations of trees.
# get the selected modules
last_gen <- tail(forest, 1)
tree_imp <- attr(last_gen, "last.performance")
Note that performance metrics for earlier generations are not kept. Several importance scores can be derived from these metrics.
e_imp <- edge_importance(graph, last_gen$trees, tree_imp)
f_imp <- feature_importance(last_gen, features)
m_imp <- module_importance(
graph,
last_gen$modules,
e_imp,
tree_imp
)
The module importance is particularly useful for feature selection, as it combines the importance of edges within a module with the overall accuracy of the decision tree. You can use it to order decision trees or simply extract the best one.
best <- which.max(as.numeric(m_imp[, "total"]))
best.tree <- last_gen$trees[[best]]
by_importance <- order(as.numeric(m_imp[, "total"]), decreasing = TRUE)
last_gen$trees[by_importance]
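The ranking step can be tried without a trained forest. The sketch below uses a made-up importance matrix with a "total" column. Storing it as a character matrix is an assumption, suggested only by the as.numeric call above, and the example shows why converting to numeric before ordering matters in that case.

```r
# Hypothetical module importance table, stored as character for illustration.
m_imp <- cbind(
  module = c("m1", "m2", "m3"),
  total  = c("0.9", "1e-04", "0.25")
)

total <- as.numeric(m_imp[, "total"])

# Index of the most important module.
best <- which.max(total)

# Modules ordered from most to least important.
by_importance <- order(total, decreasing = TRUE)

# Ordering the raw strings compares them as text, so "1e-04" is ranked first.
by_text <- order(m_imp[, "total"], decreasing = TRUE)

print(m_imp[by_importance, "module"])
```

The same indices can then be used to subset a list of trees, as in last_gen$trees[by_importance].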
Using these trees for evaluation
DFNET provides an override for the predict method that functions much like ranger's.
# Predict using the best DT
pred_best = predict(best.tree, test_data)$predictions
# predict using all detected modules
pred_all = predict(last_gen, test_data)$predictions
You can use ModelMetrics to evaluate the accuracy, precision, recall, or other performance metrics.
# ModelMetrics::auc takes the true labels first, then the predictions
ModelMetrics::auc(test_target, pred_best)
ModelMetrics::auc(test_target, pred_all)
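As a rough illustration of such metrics, the sketch below computes accuracy, precision, and recall in base R. The two label vectors are invented for the example and stand in for test_target and a vector of predicted classes, with 1 taken to be the positive class.

```r
# Hypothetical true labels and predicted classes (1 = positive class).
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0)

tp <- sum(predicted == 1 & actual == 1)  # true positives
fp <- sum(predicted == 1 & actual == 0)  # false positives
fn <- sum(predicted == 0 & actual == 1)  # false negatives
tn <- sum(predicted == 0 & actual == 0)  # true negatives

accuracy  <- (tp + tn) / length(actual)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

print(c(accuracy = accuracy, precision = precision, recall = recall))
```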
Now, let's check the performance of that module on the independent test data set. We compare the results with the performance of all trees selected.
# Prepare test data
colnames(mRNA_test) = paste(colnames(mRNA_test),"$","mRNA", sep="")
colnames(Methy_test) = paste(colnames(Methy_test),"$","Methy", sep="")
DATA_test = as.data.frame(cbind(mRNA_test, Methy_test))
# Predict using the best DT
pred_best = predict(best.tree, DATA_test)$predictions
# predict using all detected modules
pred_all = predict(last_gen, DATA_test)$predictions
pred_best
pred_all
# Check the performance of the predictions
# ModelMetrics::auc takes the true labels first, then the predictions
ModelMetrics::auc(target[test_ids], pred_best)
ModelMetrics::auc(target[test_ids], pred_all)
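The column renaming in the test data preparation tags every feature with its data type before the two matrices are joined, which keeps columns for the same gene apart. A small sketch with invented matrices (the gene names and values are placeholders) shows the effect:

```r
# Hypothetical test matrices: 3 samples, the same 2 genes in both data types.
mRNA_test  <- matrix(1:6,  nrow = 3, dimnames = list(NULL, c("TP53", "BRCA1")))
Methy_test <- matrix(7:12, nrow = 3, dimnames = list(NULL, c("TP53", "BRCA1")))

# Append "$<type>" so that each column name is unique after merging.
colnames(mRNA_test)  <- paste0(colnames(mRNA_test),  "$", "mRNA")
colnames(Methy_test) <- paste0(colnames(Methy_test), "$", "Methy")

# Join the two data types side by side; rows are the samples.
DATA_test <- as.data.frame(cbind(mRNA_test, Methy_test))

print(colnames(DATA_test))
```

The test data must carry the same column names as the features used for training, so the same renaming would be applied to both.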
Finally, we provide an extension to compute tree-based SHAP values via treeshap.
forest_unified = dfnet.unify(last_gen$trees, test_data)
forest_shap = treeshap::treeshap(forest_unified, test_data)
BibTeX Citation
@article{pfeifer2022multi,
title={Multi-omics disease module detection with an explainable Greedy Decision Forest},
author={Pfeifer, Bastian and Baniecki, Hubert and Saranti, Anna and Biecek, Przemyslaw and Holzinger, Andreas},
journal={Scientific Reports},
volume={12},
number={1},
pages={1--15},
year={2022},
publisher={Nature Publishing Group}
}