
StackNet

StackNet is a computational, scalable and analytical Meta modelling framework


This repository contains the StackNet Meta modelling methodology (and software), which is part of my work as a PhD student in the computer science department at UCL. My PhD was sponsored by dunnhumby.

StackNet is empowered by H2O's algorithms.

(NEW) There is a Python implementation of StackNet

StackNet and other topics can now be discussed on Facebook too.


What is StackNet

StackNet is a computational, scalable and analytical framework, implemented in Java, that resembles a feedforward neural network and uses Wolpert's stacked generalization [1] in multiple levels to improve accuracy in machine learning problems. In contrast to feedforward neural networks, the network is not trained through backpropagation. Instead it is built iteratively, one layer at a time (using stacked generalization), and each layer uses the final target as its target.

The software is made available under the MIT licence.

[1] Wolpert, D. H. (1992). Stacked generalization. Neural networks, 5(2), 241-259.

How does it work

Given some input data, a neural network normally applies a perceptron along with a transformation function like relu, sigmoid, tanh or others.

The StackNet model assumes that this function can take the form of any supervised machine learning algorithm.

Logically, the outputs of each neuron can be fed into the next layers.

The algorithms can be classifiers, regressors or any estimator that produces an output.
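The contrast between a conventional neuron and a StackNet neuron can be sketched in standard notation. The symbols below ($g$ for the activation, $w_j$ and $b$ for weights and bias, $s$ for a fitted supervised learner) are assumptions chosen for illustration and are not taken from the original README.

```latex
% Conventional neuron: affine map followed by an activation g (relu, sigmoid, tanh, ...)
f(x) = g\left( \sum_{j=1}^{p} w_j x_j + b \right)

% StackNet neuron: the whole function is replaced by any supervised learner s
% that has been fitted against the target y
f(x) = s(x)
```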

For classification problems, to create an output prediction score for any number of unique categories of the response variable, all selected algorithms in the last layer need to have an output dimensionality equal to the number of those unique classes. Where there are several such classifiers, the result is the scaled average of all their output predictions.
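A minimal sketch of such a scaled average, in plain Python. It assumes that "scaled" means the averaged class scores are rescaled to sum to 1; the function name `scaled_average` is hypothetical and not part of the StackNet software.

```python
def scaled_average(predictions):
    """Average per-class scores from several models and rescale them to sum to 1.

    predictions: list of per-model score lists, one score per class.
    Assumption: "scaled" is read here as normalising the averaged scores.
    """
    if not predictions:
        raise ValueError("need the predictions of at least one model")
    n_classes = len(predictions[0])
    if any(len(p) != n_classes for p in predictions):
        raise ValueError("every model must output one score per class")
    # Plain mean over models, class by class.
    mean = [sum(p[c] for p in predictions) / len(predictions)
            for c in range(n_classes)]
    total = sum(mean)
    if total <= 0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / n_classes] * n_classes
    # Rescale so the averaged scores form a distribution over the classes.
    return [m / total for m in mean]
```

For two classifiers that output `[0.2, 0.8]` and `[0.4, 0.6]`, the result is approximately `[0.3, 0.7]`.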

The Modes

The stacking element of the StackNet model can be run in two different modes.

Normal stacking mode

The first mode (i.e. the default) is the one already mentioned. It assumes that each layer uses the predictions (or output scores) of the directly preceding layer, similar to a typical feedforward neural network.
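In symbols, using notation that is assumed here rather than taken from the original README: let $s_{n,j}$ be the $j$-th model of layer $n$, $m_n$ the number of models in that layer, and $f_n(x)$ the vector of the layer's outputs for input $x$. Normal stacking could then be written as:

```latex
f_1(x) = \bigl( s_{1,1}(x), \dots, s_{1,m_1}(x) \bigr), \qquad
f_n(x) = \bigl( s_{n,1}(f_{n-1}(x)), \dots, s_{n,m_n}(f_{n-1}(x)) \bigr), \quad n > 1
```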

Restacking mode

The second mode (also called restacking) assumes that each layer uses the activations of the previous layer's neurons as well as those of all earlier layers (including the input layer).

The intuition behind this mode is that the higher-level algorithm has already extracted information from the input data, but rescanning the input space may yield new information that was not obvious from the first passes. This is also driven by the forward training methodology discussed below, and assumes that convergence needs to happen within one model iteration.
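The difference between the two modes comes down to which columns a layer receives as input. The sketch below builds that input for a single sample. The function name `layer_input` and the column ordering (raw features first, then layers from oldest to newest) are assumptions made for illustration.

```python
def layer_input(raw_features, layer_outputs, restacking=False):
    """Build the input row that the next layer receives for one sample.

    raw_features: list of the sample's original input features.
    layer_outputs: list of per-layer output lists, oldest layer first.
    """
    if not layer_outputs:
        # The first layer always sees the raw input.
        return list(raw_features)
    if not restacking:
        # Normal stacking: only the directly preceding layer's outputs.
        return list(layer_outputs[-1])
    # Restacking: the raw input plus the outputs of every earlier layer.
    row = list(raw_features)
    for outputs in layer_outputs:
        row.extend(outputs)
    return row
```

With raw features `[1.0, 2.0]` and two earlier layers that produced `[0.3, 0.7]` and `[0.9]`, normal stacking passes on `[0.9]`, while restacking passes on `[1.0, 2.0, 0.3, 0.7, 0.9]`.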

[Figure: the normal stacking and restacking modes]

K-fold Training

Typical neural networks are most commonly trained with a form of backpropagation. Stacked generalization, however, requires a forward training methodology that splits the data into two parts, one used for training and the other for predictions. This split is necessary to avoid overfitting.

However, splitting the data into just two parts would mean that in each new layer the second part needs to be split further, increasing the bias, as each algorithm would have to be trained and validated on increasingly less data. To overcome this drawback, the algorithm uses k-fold cross-validation (where k is a hyperparameter) so that all the original training data is scored in k different batches, thereby outputting training predictions of length n, where n is the number of samples in the training data. The training process therefore consists of two parts:

  1. Split the data into k parts, train k models (each on the other k-1 parts) and output predictions for the part each model did not see. Then bring the k parts back together in the original order so that the output predictions can be used in later stages of the model.

  2. Rerun the algorithm on the whole training data to be used later on for scoring the external test data. There is no reason to limit the ability of the model to learn using 100% of the training data since the output scoring is already unbiased (given that it is always scored as a holdout set).
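The two steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the StackNet implementation: `MeanRegressor` is a toy stand-in for a real learner, the folds are contiguous and unshuffled, and the names `kfold_indices` and `fit_with_oof` are hypothetical.

```python
class MeanRegressor:
    """Toy stand-in for a real learner: always predicts the training-target mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def kfold_indices(n, k):
    """Yield (train_idx, holdout_idx) for k contiguous folds over range(n)."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    bounds = [i * n // k for i in range(k + 1)]
    for i in range(k):
        holdout = list(range(bounds[i], bounds[i + 1]))
        train = list(range(0, bounds[i])) + list(range(bounds[i + 1], n))
        yield train, holdout


def fit_with_oof(make_model, X, y, k):
    """Return (out-of-fold predictions in original row order, model fit on all data)."""
    oof = [None] * len(X)
    for train, holdout in kfold_indices(len(X), k):
        # Step 1: fit on k-1 folds and predict the fold that was held out.
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in holdout])
        for i, p in zip(holdout, preds):
            oof[i] = p  # write back at the row's original position
    # Step 2: refit on 100% of the training data for scoring new data later.
    full_model = make_model().fit(X, y)
    return oof, full_model
```

The out-of-fold list has one prediction per training row and becomes a feature column for the next layer, while the returned full-data model is kept for scoring external test data.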

[Figure: the K-fold train/predict process]

It should be noted that (1) is only applied during training, to create unbiased predictions for the second layer's models to fit on. At scoring time (after model training is complete) only (2) is in effect.
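Scoring new data therefore only needs the full-data models from step (2), applied layer by layer. The sketch below assumes normal stacking mode and a hypothetical model interface with a `predict(rows)` method that returns one prediction per row. The name `predict_stack` is illustrative only.

```python
def predict_stack(fitted_layers, rows):
    """Score new rows using only the full-data models of each layer.

    fitted_layers: list of layers, each a list of already fitted models.
    rows: list of feature lists, one per sample.
    """
    current = rows
    for models in fitted_layers:
        # Each model returns one prediction per row.
        columns = [m.predict(current) for m in models]
        # Transpose so that each row holds the outputs of all models in the layer.
        current = [list(r) for r in zip(*columns)]
    return current
```

No k-fold splitting appears here, because the out-of-fold predictions are only needed to build the training inputs of later layers.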

All models must be run sequentially based on the layers, but the order of the models within a layer does not matter. In other words, all models of layer one need to be trained before proceeding to layer two, but all models within a layer can be run asynchronously and in parallel to save time.

The k-fold may also be viewed as a form of regularization, where a smaller number of folds (but higher than 1) ensures that the validation data is big enough to demonstrate how well a single model could generalize. On the other hand, a higher k means that the models come closer to running with 100% of the training data and may yield more unexplained information. The best values could be found through cross-validation.

Another possible implementation would be to save all k models and use the average of their predictions to score the unobserved test data, but then none of the models is ever trained with 100% of the training data, which may be suboptimal.
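The scheduling rule described above (layers in sequence, models within a layer in parallel) can be sketched with Python's standard `concurrent.futures` module. The function `train_layers` and the `fit_one` callback are hypothetical. `fit_one(spec, X, y)` is assumed to fit one model and return its per-row (ideally out-of-fold) predictions, and normal stacking mode is assumed.

```python
from concurrent.futures import ThreadPoolExecutor


def train_layers(layers, X, y, fit_one):
    """Train layers in order; fit the models inside each layer concurrently.

    layers: list of layers, each a list of model specifications.
    fit_one: callable (spec, X, y) -> list with one prediction per row of X.
    """
    current = X
    for specs in layers:
        # Models in the same layer are independent, so they can run in parallel.
        with ThreadPoolExecutor() as pool:
            columns = list(pool.map(lambda s: fit_one(s, current, y), specs))
        # The next layer starts only after every model in this layer has finished.
        current = [list(row) for row in zip(*columns)]
    return current
```

Threads are used here only to keep the sketch short. For CPU-bound learners in pure Python, a process pool would usually be the more suitable choice.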

Some Notes about StackNet

StackNet is (commonly) better than the best single model it contains in its first layer. However, its ability to perform well still relies on a mix of strong and diverse single models in order to get the best out of this Meta modelling methodology.

StackNet (the methodology, not the software) was also used to win the Truly Native data modelling competition hosted by the popular data science platform Kaggle in 2015.

StackNet is also explained in simple terms in Kaggle's blog.

[Figure: an example StackNet network]

StackNet is made available now with a handful of classifiers and regressors. The implementations are based on the original papers and software. However, most have some personal tweaks in them.

Algorithms contained

Native

Native - Not fully developed

  • knnClassifier
  • knnRegressor
  • KernelmodelClassifier
  • KernelmodelRegressor

Wrappers
