HEROS
The Heuristic Evolutionary Rule Optimization System (HEROS) is a supervised rule-based machine learning algorithm designed to agnostically model diverse 'structured' data problems and yield compact, human-interpretable solutions. This implementation is scikit-learn compatible.

Table of contents:
- Introduction
- Installation
- Input Data
- Using HEROS
- Hyperparameters
- Algorithm History
- Citing HEROS
- Further Documentation
- License
- Contact
- Acknowledgements
<a id="item-one"></a>
Introduction
HEROS (Heuristic Evolutionary Rule Optimization System) is an evolutionary rule-based machine learning (ERBML) algorithm framework for supervised learning. It is designed to agnostically model simple/complex and/or clean/noisy problems (without hyperparameter optimization) and yield maximally human-interpretable models. HEROS adopts a two-phase approach separating rule optimization and rule-set (i.e. model) optimization, each with its own multi-objective Pareto-front-based optimization. Rules are optimized to maximize rule accuracy and instance coverage using a Pareto-inspired rule fitness function. In contrast, models are optimized to maximize balanced accuracy and minimize rule-set size using an NSGA-II-inspired evolutionary algorithm. This package is scikit-learn compatible. A simple visual summary of HEROS rule-based modeling is given below:

To date, HEROS functionality has been validated on binary classification problems, and it has also passed bug checks on multiclass outcomes and on data with a mix of categorical and quantitative features. This project is under active development, with a number of improvements/expansions planned or in progress. For example, we will be expanding HEROS to support regression and survival outcomes in future releases.
A schematic detailing how the HEROS algorithm works is given below:

<a id="item-two"></a>
Installation
HEROS can be installed with pip or by cloning this repository.
Pip
HEROS can most easily be installed using the following pip command:
pip install skheros
In order to run the HEROS_Demo_Notebook, download it and make sure to set the following notebook parameter to False in order to import HEROS from the above pip installation.
load_from_cloned_repo = False
Clone Repository
To install/run HEROS from this cloned repository, run the following commands from the desired folder:
git clone --single-branch https://github.com/UrbsLab/heros
cd heros
pip install -r requirements.txt
<a id="item-three"></a>
Input Data
HEROS's fit() method takes 'X', an array-like {n_samples, n_features} object of training instances, as well as 'y', an array-like {n_samples} object of training labels, like other standard scikit-learn classification algorithms.
Specifying Feature Types (Categorical vs. Quantitative)
The fit() method can (and should) also be passed 'cat_feat_indexes', an array-like object of the indexes of features in 'X' that are to be treated as categorical variables (all other features are treated as quantitative by default).
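As a sketch of one way to build 'cat_feat_indexes' automatically from a pandas dataframe's dtypes (the toy dataframe and column names here are hypothetical; only the 'cat_feat_indexes' fit() argument comes from the documentation above):

```python
import pandas as pd

# Hypothetical toy frame: 'color' is categorical, 'height' is quantitative.
df = pd.DataFrame({'color': ['red', 'blue', 'red'],
                   'height': [1.7, 1.6, 1.8],
                   'Class': [0, 1, 0]})
X_df = df.drop('Class', axis=1)

# Treat any non-numeric column as categorical; collect its position in 'X'.
cat_feat_indexes = [i for i, col in enumerate(X_df.columns)
                    if not pd.api.types.is_numeric_dtype(X_df[col])]
print(cat_feat_indexes)  # [0]
```

Note that numerically encoded categorical features (e.g. 0/1/2 genotypes) will not be caught by a dtype check like this and must be listed explicitly.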
Instance Identifiers
The fit() method can optionally be passed 'row_id', an array-like {n_samples} object of instance labels used to link internal feature tracking scores to specific training instances.
Loading Expert Knowledge Scores for (Phase I) Rule-Covering (i.e. Initialization)
The fit() method can optionally be passed 'ek', an array-like {n_features} object of feature weights that probabilistically influence rule covering (i.e. rule-initialization), such that features with higher weights are more likely to be 'specified' within initialized rules.
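As a minimal sketch of preparing the optional 'ek' weights (the raw scores below are hypothetical placeholders, e.g. from some external feature-ranking method; only the 'ek' fit() argument comes from the documentation above):

```python
import numpy as np

# Hypothetical external feature-importance scores, one per feature in 'X'.
raw_scores = np.array([0.8, 0.1, 0.05, 0.05])

# HEROS uses the weights probabilistically, so any non-negative scores work;
# scaling them to sum to 1 simply makes their relative influence explicit.
ek = raw_scores / raw_scores.sum()
# heros.fit(X, y, cat_feat_indexes=cat_feat_indexes, ek=ek)
```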
Loading a Previously Trained (Phase I) Rule-Population
Lastly, the fit() method can optionally be passed 'pop_df', a dataframe object, including a previously trained HEROS-formatted rule population. This allows users to reboot progress from a previous run, or to manually add their own custom rules to the initial population.
<a id="item-four"></a>
Using HEROS
HEROS can be used similarly to other scikit-learn supervised machine learning algorithms, with some added functionality to enhance run options and post-analyses (e.g. visualization generation).
Demonstration Notebook
A Jupyter Notebook has been included to demonstrate how HEROS (and its functions) can be applied to train, evaluate, and apply models with a wide variety of saved outputs, visualizations, and model prediction explanations. We strongly recommend exploring this demonstration notebook to get familiar with HEROS and its capabilities.
This notebook is currently set up to run by cloning this repository and running the included notebook.
Basic Run Command Walk-Through
As a simple example of HEROS data preparation and training:
# Data Preparation
import pandas as pd

train_data = pd.read_csv('evaluation/datasets/partitioned/gametes/A_uni_4add_CV_Train_1.txt', sep="\t")
outcome_label = 'Class'
X = train_data.drop(outcome_label, axis=1)
cat_feat_indexes = list(range(X.shape[1])) # all features are categorical
X = X.values
y = train_data[outcome_label].values
# HEROS Initialization and Training
from skheros.heros import HEROS # import from pip installation
heros = HEROS(iterations=10000, pop_size=500, nu=1, model_iterations=100, model_pop_size=100)
heros = heros.fit(X, y, cat_feat_indexes=cat_feat_indexes)
Once trained, HEROS can be applied to make predictions on testing data. Users have the option to choose the model to use; either (1) the top Phase II model from the model-pareto front (selected based on maximizing testing performance, maximizing instance coverage, and, if possible, minimizing rule-count) - RECOMMENDED, (2) the default top Phase II model (automatically selected based on training performance), or (3) the entire Phase I rule population.
Below is an example of the first option (RECOMMENDED):
# Data Preparation
test_data = pd.read_csv('evaluation/datasets/partitioned/gametes/A_uni_4add_CV_Test_1.txt', sep="\t")
X_test = test_data.drop(outcome_label, axis=1)
X_test = X_test.values
y_test = test_data[outcome_label].values
# HEROS Prediction (Model selection via Phase II Model Testing Evaluation) and Performance Report
from sklearn.metrics import classification_report

best_model_index = heros.auto_select_top_model(X_test, y_test)
predictions = heros.predict(X_test, target_model=best_model_index)
print(classification_report(y_test, predictions, digits=8))
To get predictions with the second option, after preparing the data we would run the following:
# HEROS Prediction (Model selection via Phase II Default Model Selection) and Performance Report
predictions = heros.predict(X_test)
print(classification_report(y_test, predictions, digits=8))
To get predictions with the third option, after preparing the data we would run the following:
# HEROS Prediction (Whole Phase I Rule Population Applied as Model) and Performance Report
predictions = heros.predict(X_test, whole_rule_pop=True)
print(classification_report(y_test, predictions, digits=8))
Alternatively, HEROS can return prediction probabilities using the following:
predictions = heros.predict_proba(X_test, target_model=best_model_index)
HEROS can also return whether each instance is covered (i.e. at least one rule matches it in the given 'model') using the following:
predictions = heros.predict_covered(X_test, target_model=best_model_index)
Lastly, HEROS can give direct explanations of individual model predictions using the following:
testing_instance = X_test[0] # Testing instance index 0 arbitrarily chosen here
heros.predict_explanation(testing_instance, feature_names, target_model=best_model_index)
The parameter, feature_names, is the ordered list of original feature names from the training dataset.
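The toy dataframe below is hypothetical, but it sketches the key point that feature_names must preserve the column order used to build 'X' (i.e. the training dataframe's columns after dropping the outcome):

```python
import pandas as pd

# Hypothetical training frame mirroring the walk-through above.
train_data = pd.DataFrame({'A_0': [0, 1], 'A_1': [1, 0], 'Class': [0, 1]})
outcome_label = 'Class'

# Keep the same column order that produced X = train_data.drop(...).values.
feature_names = list(train_data.drop(outcome_label, axis=1).columns)
print(feature_names)  # ['A_0', 'A_1']
```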
Below is a simple example prediction explanation for a HEROS model trained on the 6-bit multiplexer problem:
PREDICTION REPORT ------------------------------------------------------------------
Outcome Prediction: 0
Model Prediction Probabilities: {0: 1.0, 1: 0.0}
Instance Covered by Model: Yes
Number of Matching Rules: 1
PREDICTION EXPLANATION -------------------------------------------------------------
Supporting Rules: --------------------
6 rule copies assert that IF: (A_0 = 0) AND (A_1 = 0) AND (R_0 = 0) THEN: predict outcome '0' with 100.0% confidence based on 68 matching training instances (15.11% of training instances)
Contradictory Rules: -----------------
No contradictory rules matched.
In the case that multiple rules match an instance, they will all be displayed in a similar human-readable format.
<a id="item-five"></a>
Hyperparameters
Key Hyperparameters
While HEROS has a number of available hyperparameters, only a few are expected to have a significant impact on algorithm performance (see the first table below). In general, setting iterations and pop_size to larger integers is expected to improve training performance but will require longer Phase I run times; the same is true for model_iterations and model_pop_size with respect to Phase II. The nu parameter should always be set to 1 unless the user is confident that they are modeling a problem that can achieve 100% testing accuracy (i.e. a problem with no signal noise).
| Hyperparameter | Description | Type/Options | Default Value |
| -------------- | ----------- | ------------ | ------------- |
| iterations | Number of (rule population) learning iterations (Phase I) | int | 100000 |
| pop_size | Maximum 'micro' rule-population size (Phase I) | int | 1000 |
| *mo
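The nu guidance above can be sketched as two hypothetical presets (the hyperparameter names come from the table; the clean-problem nu value is illustrative, not an author recommendation):

```python
# Default nu=1 is the safe choice for noisy, real-world data.
noisy_problem = dict(iterations=100000, pop_size=1000, nu=1)

# A higher nu (value here is an illustrative assumption) pressures rules toward
# 100% accuracy, appropriate only for clean problems with no signal noise.
clean_problem = dict(iterations=100000, pop_size=1000, nu=10)

# heros = HEROS(**noisy_problem)
```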
