Mango: A parallel hyperparameter tuning library
Mango is a Python library for finding optimal hyperparameters for machine learning classifiers. Mango enables parallel optimization over complex search spaces of continuous, discrete, and categorical values.
Check out the quick 12-second demo of Mango approximating a complex SVM decision boundary.
Mango has the following salient features:
- Easily define complex search spaces compatible with scikit-learn.
- A novel state-of-the-art gradient-free optimizer for continuous/discrete/categorical values.
- Modular design to schedule the objective function on local, cluster, or cloud infrastructure.
- Failure detection in the application layer for scalability on commodity hardware.
- New features are added continuously, driven by testing and usage in production settings.
Index
- Installation
- Getting started
- Hyperparameter tuning example
- Search space definitions
- Scheduler
- Optional configurations
- Additional features
- CASH feature
- Platform-aware neural architecture search
- Mango introduction slides & Mango production usage slides.
- Core Mango research papers to cite and novel applications built over Mango
<a name="setup"></a>
1. Installation
Using pip:
pip install arm-mango
From source:
$ git clone https://github.com/ARM-software/mango.git
$ cd mango
$ pip3 install .
<!--
- Mango requires scikit-learn and is developed for Python 3. Some other packages required to optimize XGBoost classifiers and fbprophet are also installed.
!-->
<a name="getting-started"></a>
2. Getting Started
Mango is straightforward to use. The following example minimizes a quadratic function whose input is an integer between -10 and 10 (inclusive).
from mango import scheduler, Tuner
# Search space: integers from -10 to 10 (the stop value 11 is excluded)
param_space = dict(x=range(-10, 11))
# Quadratic objective function
@scheduler.serial
def objective(x):
    return x * x
# Initialize and run the tuner
tuner = Tuner(param_space, objective)
results = tuner.minimize()
print(f'Optimal value of parameters: {results["best_params"]} and objective: {results["best_objective"]}')
# => Optimal value of parameters: {'x': 0} and objective: 0
<a name="knnexample"></a>
3. Hyperparameter Tuning Example
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from mango import Tuner, scheduler
# search space for KNN classifier's hyperparameters
# n_neighbors can vary between 1 and 49, with different choices of algorithm
param_space = dict(n_neighbors=range(1, 50),
                   algorithm=['auto', 'ball_tree', 'kd_tree', 'brute'])
@scheduler.serial
def objective(**params):
    X, y = datasets.load_breast_cancer(return_X_y=True)
    clf = KNeighborsClassifier(**params)
    # mean cross-validated accuracy for this hyperparameter combination
    score = cross_val_score(clf, X, y, scoring='accuracy').mean()
    return score
tuner = Tuner(param_space, objective)
results = tuner.maximize()
print('best parameters:', results['best_params'])
print('best accuracy:', results['best_objective'])
# => best parameters: {'algorithm': 'ball_tree', 'n_neighbors': 11}
# => best accuracy: 0.9332401800962584
Note that the best parameters may differ between runs, but the accuracy should be about 0.93. More examples (Facebook's Prophet, XGBoost, SVM) are available in the examples directory.
<a name="DomainSpace"></a>
4. Search Space
The search space defines the range and distribution of input parameters to the objective function. Mango's search space is compatible with scikit-learn's parameter space definitions used in RandomizedSearchCV or GridSearchCV. The search space is defined as a dictionary whose keys are the parameter names (strings) and whose values are lists of discrete choices, ranges of integers, or distributions.
Note: Mango does not scale or normalize the search space parameters by default. Users should use their judgement to decide whether the input space needs to be normalized.
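Suppose a parameter's natural range differs greatly in scale from the others. One option is to search over the unit interval and rescale inside the objective. Below is a minimal stdlib sketch under that assumption. `to_unit`, `from_unit`, the `temperature` parameter and its bounds are all hypothetical and not part of Mango. With Mango, the corresponding search space could be written as dict(u=uniform(0, 1)) using scipy.stats.uniform.

```python
# Hypothetical helpers: map between a physical interval [lo, hi] and [0, 1].
def to_unit(value, lo, hi):
    """Scale a value from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


def from_unit(u, lo, hi):
    """Map a value in [0, 1] back to [lo, hi]."""
    return lo + u * (hi - lo)


# Hypothetical 'temperature' parameter whose physical range is [250.0, 400.0];
# the optimizer only ever sees the normalized value u in [0, 1].
TEMP_LO, TEMP_HI = 250.0, 400.0


def objective_in_unit_space(u):
    temperature = from_unit(u, TEMP_LO, TEMP_HI)
    # Stand-in objective: squared distance from a made-up target of 300.0.
    return (temperature - 300.0) ** 2
```

Keeping the rescaling inside the objective leaves the search space itself uniform. This is one way to apply the judgement the note above asks for.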
Examples of some common search spaces:
Integer
The following space defines x as an integer parameter with values in range(-10, 11), i.e. from -10 to 10 (11 is not included):
param_space = dict(x=range(-10, 11)) #=> -10, -9, ..., 10
# you can use steps for sparse ranges
param_space = dict(x=range(0, 101, 10)) #=> 0, 10, 20, ..., 100
Integers are uniformly sampled from the given range and are assumed to be ordered and treated as continuous variables.
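To illustrate treating an ordered integer range as a continuous variable, here is a small stdlib sketch. A continuous proposal is snapped back to the nearest element of the range. The rounding-and-clamping rule and the helper names `sample_int` and `snap_to_range` are our own assumptions, not Mango's internal implementation.

```python
import random


def sample_int(r, rng=random):
    """Uniformly sample one element of a range object."""
    return rng.choice(r)


def snap_to_range(value, r):
    """Map a continuous proposal onto the nearest element of range r.

    The range is treated as an ordered, evenly spaced grid
    r[0], r[0] + r.step, ..., r[-1].
    """
    index = round((value - r.start) / r.step)
    index = min(max(index, 0), len(r) - 1)  # clamp to the valid indices
    return r[index]
```

For example, with range(0, 101, 10) a proposal of 23.7 maps to 20 and 26.0 maps to 30. Values outside the range are clamped to its first or last element.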
Categorical
Discrete categories can be defined as lists. For example:
# string
param_space = dict(color=['red', 'blue', 'green'])
# float
param_space = dict(v=[0.2, 0.1, 0.3])
# mixed
param_space = dict(max_features=['auto', 0.2, 0.3])
Lists are uniformly sampled and are assumed to be unordered. They are one-hot encoded internally.
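One-hot encoding gives each choice its own coordinate, so no artificial ordering is imposed on unordered categories. The sketch below shows the idea with the standard library. Decoding by taking the largest entry is an assumption for illustration. `one_hot` and `decode` are hypothetical helpers, not Mango functions.

```python
def one_hot(choice, choices):
    """Encode `choice` as a one-hot vector over the list `choices`."""
    position = choices.index(choice)  # raises ValueError if choice is absent
    return [1 if i == position else 0 for i in range(len(choices))]


def decode(vector, choices):
    """Map a (possibly fractional) vector back to the choice with the largest entry."""
    best = max(range(len(choices)), key=lambda i: vector[i])
    return choices[best]


colors = ['red', 'blue', 'green']
encoded = one_hot('blue', colors)
```

Mixed lists such as ['auto', 0.2, 0.3] work the same way, because each entry is identified only by its position.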
Distributions
All distributions supported by scipy.stats, including multivariate ones, are supported.
In general, a distribution must provide an rvs method for sampling.
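Since only an rvs method is required, a custom distribution can in principle be any object that provides one. The hypothetical `TriangularDist` below wraps Python's `random.triangular`. Its `size` and `random_state` keywords loosely mirror scipy.stats. However, we have not verified which arguments Mango passes to rvs, so treat the signature as an assumption. The `momentum` parameter is also illustrative.

```python
import random


class TriangularDist:
    """Hypothetical distribution object exposing a scipy-style rvs method."""

    def __init__(self, low, high, mode):
        self.low = low
        self.high = high
        self.mode = mode

    def rvs(self, size=None, random_state=None):
        # Assumed conventions: accept a random.Random instance, an int seed, or None.
        if isinstance(random_state, random.Random):
            rng = random_state
        elif random_state is None or isinstance(random_state, int):
            rng = random.Random(random_state)
        else:
            # Unknown generator type (e.g. a NumPy RandomState): seeding is
            # ignored and a fresh, unseeded RNG is used instead.
            rng = random.Random()
        if size is None:
            return rng.triangular(self.low, self.high, self.mode)
        return [rng.triangular(self.low, self.high, self.mode) for _ in range(size)]


# Placed in a search space like any other distribution (hypothetical parameter):
param_space = dict(momentum=TriangularDist(0.5, 0.99, 0.9))
```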
Uniform distribution
Using uniform(loc, scale) one obtains the uniform distribution on [loc, loc + scale].
from scipy.stats import uniform
# uniformly distributed between -1 and 1
param_space = dict(a=uniform(-1, 2))
Log uniform distribution
We have added a loguniform distribution by extending the scipy.stats.distributions constructs.
Using loguniform(loc, scale), one obtains the log-uniform distribution on [10^loc, 10^(loc + scale)].
from mango.domain.distribution import loguniform
# log uniformly distributed between 10^-3 and 10^-1
param_space = dict(learning_rate=loguniform(-3, 2))
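The definition above means that the exponent, not the value itself, is uniformly distributed. A minimal stdlib sketch of that sampling rule follows. `sample_loguniform` is a hypothetical helper that illustrates the idea. It is not how mango.domain.distribution.loguniform is implemented internally.

```python
import random


def sample_loguniform(loc, scale, rng=random):
    """Draw one value as 10**U, where U is uniform on [loc, loc + scale].

    The result lies in [10**loc, 10**(loc + scale)], matching the
    description of loguniform(loc, scale) above.
    """
    exponent = rng.uniform(loc, loc + scale)
    return 10 ** exponent


rng = random.Random(42)
samples = [sample_loguniform(-3, 2, rng) for _ in range(10000)]
```

With loguniform(-3, 2), roughly half of the samples fall in [0.001, 0.01) and half in [0.01, 0.1]. Each decade is equally likely, which suits parameters such as learning rates that span several orders of magnitude.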
Hyperparameter search space examples
Example hyperparameter search space for a Random Forest classifier:
param_space = dict(
    max_features=['sqrt', 'log2', .1, .3, .5, .7, .9],
    n_estimators=range(10, 1000, 50),  # 10 to 960 in steps of 50
    bootstrap=[True, False],
    max_depth=range(1, 20),
    min_samples_leaf=range(1, 10)
)
Example search space for an XGBoost classifier:
from scipy.stats import uniform
from mango.domain.distribution import loguniform
param_space = {
    'n_estimators': range(10, 2001, 100),  # 10 to 1910 in steps of 100
    'max_depth': range(1, 15),  # 1 to 14
    'reg_alpha': loguniform(-3, 6),  # 10^-3 to 10^3
    'booster': ['gbtree', 'gblinear'],
    'colsample_bylevel': uniform(0.05, 0.95),  # 0.05 to 1.0
    'colsample_bytree': uniform(0.05, 0.95),  # 0.05 to 1.0
    'learning_rate': loguniform(-3, 3),  # 0.001 to 1
    'reg_lambda': loguniform(-3, 6),  # 10^-3 to 10^3
    'min_child_weight': loguniform(0, 2),  # 1 to 100
    'subsample': uniform(0.1, 0.89)  # 0.1 to 0.99
}
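The endpoint comments in these examples can be checked with plain arithmetic. A range excludes its stop value and only yields start + k * step, so range(10, 2001, 100) ends at 1910. The helpers below are hypothetical and only restate the definitions of range, uniform(loc, scale) and loguniform(loc, scale) given earlier in this section.

```python
def range_bounds(r):
    """First and last values actually produced by a range object."""
    return r[0], r[-1]


def uniform_bounds(loc, scale):
    """Support of uniform(loc, scale): [loc, loc + scale]."""
    return loc, loc + scale


def loguniform_bounds(loc, scale):
    """Support of loguniform(loc, scale): [10**loc, 10**(loc + scale)]."""
    return 10 ** loc, 10 ** (loc + scale)


n_estimators_bounds = range_bounds(range(10, 2001, 100))
reg_alpha_bounds = loguniform_bounds(-3, 6)
```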
Example search space for an SVM:
from scipy.stats import uniform
from mango.domain.distribution import loguniform
param_space = {
    'kernel': ['rbf', 'sigmoid'],
    'gamma': uniform(0.1, 4),  # 0.1 to 4.1
    'C': loguniform(-7, 8)  # 10^-7 to 10
}
<a name="scheduler"></a>
5. Scheduler
Mango is designed to take advantage of distributed computing. The objective function can be scheduled to
run locally or on a cluster with parallel evaluations, using any distributed computing framework
(such as Celery or Kubernetes). The scheduler module comes with some predefined
schedulers.
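The example below sketches what plugging in another execution backend could look like. It evaluates a batch of parameter dicts concurrently, using the standard library's ThreadPoolExecutor as a stand-in for a framework such as Celery or Kubernetes. It assumes that the tuner hands over a list of parameter dicts and expects a list of results in the same order. `evaluate_point` and `evaluate_batch` are hypothetical and not part of Mango's scheduler module.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_point(x):
    """Single-point objective (the quadratic from Getting Started)."""
    return x * x


def evaluate_batch(params_batch, max_workers=4):
    """Evaluate a list of parameter dicts concurrently.

    ThreadPoolExecutor stands in for a real cluster framework here;
    Executor.map returns results in the same order as params_batch.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda params: evaluate_point(**params), params_batch))


batch_results = evaluate_batch([{'x': -3}, {'x': 2}, {'x': 0}])
```

Returning results in input order keeps each result paired with the parameters that produced it, whatever backend runs the work.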
Serial scheduler
The serial scheduler runs locally, evaluating the objective function one point at a time:
from mango import scheduler
@scheduler.serial
def objective(x):
    return x * x
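Conceptually, a serial scheduler turns a single-point objective into a function over a batch of parameter dicts. The decorator below is a minimal sketch under that assumption. `serial_sketch` is a hypothetical name, and this is not Mango's source code.

```python
import functools


def serial_sketch(func):
    """Hypothetical stand-in for a serial scheduler decorator."""

    @functools.wraps(func)
    def run_batch(params_batch):
        # Evaluate each parameter dict one at a time, in order.
        return [func(**params) for params in params_batch]

    return run_batch


@serial_sketch
def objective(x):
    return x * x
```

After decoration, objective([{'x': 3}, {'x': -4}]) evaluates both points sequentially and returns their results as a list.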
Parallel scheduler
The parallel scheduler runs locally and uses joblib to evaluate the objective function in parallel:
from mango import scheduler
@scheduler.parallel(n_jobs=2)
def

