StackNet
StackNet is a computational, scalable and analytical Meta modelling framework
This repository contains StackNet Meta modelling methodology (and software) which is part of my work as a PhD Student in the computer science department at UCL. My PhD was sponsored by dunnhumby.
StackNet is empowered by H2O's algorithms
(NEW) There is a Python implementation of StackNet
StackNet and other topics can now be discussed on Facebook too:
Contents
- What is StackNet
- How does it work
- The Modes
- Some Notes about StackNet
- Algorithms contained
- Algorithm's Tuning parameters
- Run StackNet
- Installations
- Command Line Parameters
- Data Format
- Commandline Train Statement
- Commandline predict Statement
- Examples
- Run StackNet from within Java code
- Potential Next Steps
- Reference
- News
- Special Thanks

What is StackNet
StackNet is a computational, scalable and analytical framework, implemented in Java, that resembles a feedforward neural network and uses Wolpert's stacked generalization [1] in multiple levels to improve accuracy in machine learning problems. In contrast to feedforward neural networks, the network is not trained through back propagation; instead it is built iteratively one layer at a time (using stacked generalization), and each layer uses the final target as its target.
The Software is made available under the MIT licence.
[1] Wolpert, D. H. (1992). Stacked generalization. Neural networks, 5(2), 241-259.
How does it work
Given some input data, a neural network normally applies a perceptron along with a transformation function like relu, sigmoid, tanh or others.
The StackNet model assumes that this function can take the form of any supervised machine learning algorithm.
Logically, the outputs of each neuron can be fed into the next layers.
The algorithms can be classifiers, regressors, or any estimator that produces an output.
For classification problems, to create an output prediction score for any number of unique categories of the response variable, all selected algorithms in the last layer need to have output dimensionality equal to the number of those unique classes. When there are multiple such classifiers, the result is the scaled average of all their output predictions and can be written as:
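The averaging step above can be sketched as follows. This is an illustrative example, not StackNet's internal API: each last-layer classifier emits one score per class, and the final prediction is the average across classifiers, rescaled so the scores sum to one.

```java
// Sketch: combining the output scores of several last-layer classifiers.
public class OutputAveraging {

    // classifierOutputs[i] holds classifier i's score for each class.
    public static double[] averageScores(double[][] classifierOutputs) {
        int numClasses = classifierOutputs[0].length;
        double[] avg = new double[numClasses];
        for (double[] scores : classifierOutputs) {
            for (int c = 0; c < numClasses; c++) {
                avg[c] += scores[c];
            }
        }
        double sum = 0.0;
        for (int c = 0; c < numClasses; c++) {
            avg[c] /= classifierOutputs.length; // plain average
            sum += avg[c];
        }
        for (int c = 0; c < numClasses; c++) {
            avg[c] /= sum;                      // rescale to sum to 1
        }
        return avg;
    }

    public static void main(String[] args) {
        double[][] outputs = {
            {0.7, 0.2, 0.1},   // classifier 1, three classes
            {0.5, 0.3, 0.2}    // classifier 2, same three classes
        };
        double[] finalScores = averageScores(outputs);
        System.out.printf("%.2f %.2f %.2f%n",
                finalScores[0], finalScores[1], finalScores[2]); // 0.60 0.25 0.15
    }
}
```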
The Modes
The stacking element of the StackNet model can be run in two different modes.
Normal stacking mode
The first mode (i.e. the default) is the one already mentioned: each layer uses the predictions (or output scores) of the directly previous layer, similar to a typical feedforward neural network, or equivalently:
Restacking mode
The second mode (also called restacking) assumes that each layer uses not only the previous layer's neuron activations but those of all previous layers (including the input layer). Therefore the previous formula can be re-written as:
The intuition behind this mode is that the higher-level algorithm has already extracted information from the input data, but rescanning the input space may yield new information not obvious from the first passes. This is also driven by the forward training methodology discussed below, which assumes that convergence needs to happen within one model iteration.
The modes may also be viewed below:
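The difference between the two modes comes down to how each layer's input features are assembled. The sketch below is illustrative (the names are not StackNet's API): `activations` holds the output of every layer computed so far, with index 0 being the raw input data.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of how the two stacking modes build the feature set for the next layer.
public class StackingModes {

    // Normal stacking: the next layer sees only the previous layer's outputs.
    static double[] normalInput(List<double[]> activations) {
        return activations.get(activations.size() - 1);
    }

    // Restacking: the next layer sees all previous layers, raw input included.
    static double[] restackedInput(List<double[]> activations) {
        List<Double> merged = new ArrayList<>();
        for (double[] layer : activations) {
            for (double v : layer) merged.add(v);
        }
        double[] out = new double[merged.size()];
        for (int i = 0; i < out.length; i++) out[i] = merged.get(i);
        return out;
    }

    public static void main(String[] args) {
        List<double[]> acts = new ArrayList<>();
        acts.add(new double[]{1.0, 2.0, 3.0}); // raw input features
        acts.add(new double[]{0.8, 0.2});      // layer-1 output scores
        System.out.println(normalInput(acts).length);    // 2 features
        System.out.println(restackedInput(acts).length); // 5 features
    }
}
```

Restacking therefore grows the feature space layer by layer, which is what lets later models rescan the original input.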

K-fold Training
Typical neural networks are most commonly trained with a form of backpropagation; stacked generalization, however, requires a forward training methodology that splits the data into two parts, one used for training and the other for predictions. This split is necessary to avoid overfitting.
However, splitting the data into just two parts would mean that in each new layer the second part needs to be further dichotomized, increasing the bias as each algorithm would have to be trained and validated on increasingly fewer data. To overcome this drawback, the algorithm uses k-fold cross-validation (where k is a hyperparameter) so that all the original training data is scored in k different batches, thereby outputting n training predictions, where n is the number of samples in the training data. Therefore the training process consists of two parts:
1. Split the data k times and run k models to output predictions for each k-th part, then bring the k parts back together in the original order so that the output predictions can be used in later stages of the model.
2. Rerun the algorithm on the whole training data, to be used later for scoring the external test data. There is no reason to limit the model's ability to learn by using less than 100% of the training data, since the output scoring is already unbiased (each sample is always scored as part of a holdout set).
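The two parts above can be sketched with a deliberately trivial "predict the training mean" model so the example stays self-contained; a real StackNet layer would fit any supervised algorithm in its place. Fold assignment by `i % k` is also just an illustrative choice.

```java
import java.util.Arrays;

// Sketch of the k-fold (out-of-fold) training scheme.
public class KFoldOof {

    // Part (1): each sample is scored by a model trained on the other folds,
    // so every training row receives an unbiased (holdout) prediction.
    static double[] outOfFoldPredictions(double[] y, int k) {
        int n = y.length;
        double[] oof = new double[n];
        for (int fold = 0; fold < k; fold++) {
            double sum = 0.0;
            int count = 0;
            for (int i = 0; i < n; i++) {
                if (i % k != fold) { sum += y[i]; count++; } // "train" part
            }
            double mean = sum / count;                        // fitted "model"
            for (int i = 0; i < n; i++) {
                if (i % k == fold) oof[i] = mean;             // score the holdout
            }
        }
        return oof; // predictions already back in the original row order
    }

    // Part (2): refit on 100% of the data; this model scores external test data.
    static double fullModel(double[] y) {
        double sum = 0.0;
        for (double v : y) sum += v;
        return sum / y.length;
    }

    public static void main(String[] args) {
        double[] y = {1, 2, 3, 4, 5, 6};
        System.out.println(Arrays.toString(outOfFoldPredictions(y, 3)));
        // [4.0, 3.5, 3.0, 4.0, 3.5, 3.0]
        System.out.println(fullModel(y)); // 3.5
    }
}
```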
The K-fold train/predict process is illustrated below:

It should be noted that (1) is only applied during training, to create unbiased predictions for the next layer's models to fit on. During scoring time (after model training is complete) only (2) is in effect.
All models must be run sequentially by layer, but the order of the models within a layer does not matter; in other words, all models of layer one need to be trained before proceeding to layer two, but the models within a layer can be run asynchronously and in parallel to save time. The k-fold scheme may also be viewed as a form of regularization: a smaller number of folds (but higher than 1) ensures that the validation data is big enough to demonstrate how well a single model generalizes, while a higher k means the models come closer to being trained on 100% of the training data and may extract more information. The best value can be found through cross-validation. Another possible implementation would be to save all k models and use the average of their predictions to score the unobserved test data, but then no model is ever trained on 100% of the training data, which may be suboptimal.
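The scheduling rule above (sequential layers, parallel models within a layer) can be sketched with a thread pool. `trainModel` is a stand-in for fitting any single algorithm; the model names are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: layers run one after another, models inside a layer run in parallel.
public class LayerScheduler {

    static String trainModel(String name) {
        return name + ":trained"; // stand-in for fitting one algorithm
    }

    static List<String> trainNetwork(List<List<String>> layers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> results = new ArrayList<>();
        try {
            for (List<String> layer : layers) {              // layers: sequential
                List<Future<String>> futures = new ArrayList<>();
                for (String model : layer) {                 // models: parallel
                    futures.add(pool.submit(() -> trainModel(model)));
                }
                for (Future<String> f : futures) {
                    results.add(f.get()); // barrier: wait for the whole layer
                }
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> layers = List.of(
                List.of("xgboost", "logistic"),  // layer 1
                List.of("ridge"));               // layer 2
        System.out.println(trainNetwork(layers));
        // [xgboost:trained, logistic:trained, ridge:trained]
    }
}
```

Collecting the futures in submission order keeps the output deterministic even though training runs concurrently.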
Some Notes about StackNet
StackNet is (commonly) better than the best single model it contains in its first layer; however, its ability to perform well still relies on a mix of strong and diverse single models in order to get the best out of this Meta modelling methodology.
StackNet (the methodology, not the software) was also used to win the Truly Native data modelling competition hosted by the popular data science platform Kaggle in 2015.
StackNet is also explained in simple terms on Kaggle's blog.
Network's example:

StackNet currently ships with a handful of classifiers and regressors. The implementations are based on the original papers and software; however, most include some personal tweaks.
Algorithms contained
Native
- AdaboostForestRegressor
- AdaboostRandomForestClassifier
- DecisionTreeClassifier
- DecisionTreeRegressor
- GradientBoostingForestClassifier
- GradientBoostingForestRegressor
- RandomForestClassifier
- RandomForestRegressor
- Vanilla2hnnregressor
- Vanilla2hnnclassifier
- Softmaxnnclassifier
- Multinnregressor
- NaiveBayesClassifier
- LSVR
- LSVC
- LogisticRegression
- LinearRegression
- LibFmRegressor
- LibFmClassifier
Native - Not fully developed
- knnClassifier
- knnRegressor
- KernelmodelClassifier
- KernelmodelRegressor
Wrappers
- XgboostRegressor
- XgboostClassifier
- LightgbmRegressor
- LightgbmClassifier
- FRGFRegressor
- FRGFClassifier
- [OriginalLibFMClassifier](/parameters/PARAMETERS.MD#originallibfmclassifier)