# HKube
🐟 High Performance Computing over Kubernetes - Core Repo 🎣
HKube is a cloud-native, open-source framework for running distributed pipelines of algorithms on Kubernetes.
HKube optimally utilizes a pipeline's resources, based on user priorities and heuristics.
## Features <!-- omit in toc -->
- **Distributed pipeline of algorithms**
  - Receives a DAG as input and automatically parallelizes your algorithms over the cluster.
  - Manages the complications of distributed processing, keeping your code simple (even single-threaded).
- **Language agnostic** - as a container-based framework, HKube is designed to let you write algorithms in any language.
- **Batch algorithms** - run many instances of the same algorithm as a batch in order to accelerate running time.
- **Optimized hardware utilization**
  - Containers are automatically placed based on their resource requirements and other constraints, without sacrificing availability.
  - Mixes critical and best-effort workloads in order to drive up utilization and save resources.
  - Efficient execution and clustering via heuristics that combine pipeline and algorithm metrics with user requirements.
- **Build API** - just upload your code; you don't have to worry about building containers or integrating them with the HKube API.
- **Cluster debugging**
  - Debug part of a pipeline based on previous results.
  - Debug a single algorithm in your IDE while the rest of the algorithms run in the cluster.
- **Jupyter integration** - scale your Jupyter tasks with HKube.
## User Guide <!-- omit in toc -->
<!-- TOC -->
- [Installation](#installation)
- [APIs](#apis)
- [API Usage Example](#api-usage-example)
## Installation

### Dependencies

HKube runs on top of Kubernetes, so in order to run HKube we first have to install its prerequisites:
- Kubernetes - install Kubernetes, Minikube, or microk8s.
- Helm - the HKube installation uses Helm; follow its installation guide.
### Helm
- Add the HKube Helm repository:

  ```console
  helm repo add hkube http://hkube.io/helm/
  ```

- Configure a Docker registry for builds: create a `values.yaml` file for custom Helm values:

  ```yaml
  build_secret:
    # pull secret is only needed if Docker Hub is not accessible
    pull:
      registry: ''
      namespace: ''
      username: ''
      password: ''
    # enter your Docker Hub / other registry credentials
    push:
      registry: '' # can be left empty for Docker Hub
      namespace: '' # registry namespace - usually your username
      username: ''
      password: ''
  ```

- Install the HKube chart:

  ```console
  helm install hkube/hkube -f ./values.yaml --name my-release
  ```

  This command installs HKube in a minimal configuration for development. For production, see production-deployment.
## APIs
There are three ways to communicate with HKube: Dashboard, REST API and CLI.
### UI Dashboard

The Dashboard is a web-based HKube user interface that supports every functionality HKube has to offer.

### REST API

HKube exposes its functionality through a REST API.

- API Spec
- Swagger-UI - available at `{yourDomain}/hkube/api-server/swagger-ui`
### CLI

`hkubectl` is the HKube command-line tool.

```console
hkubectl [type] [command] [name]

# More information
hkubectl --help
```

Download the latest `hkubectl` version:
```console
curl -Lo hkubectl https://github.com/kube-HPC/hkubectl/releases/latest/download/hkubectl-linux \
&& chmod +x hkubectl \
&& sudo mv hkubectl /usr/local/bin/
```

For Mac, replace with `hkubectl-macos`.
For Windows, download `hkubectl-win.exe`.
Configure `hkubectl` to work with your running Kubernetes cluster:

```console
hkubectl config set endpoint ${KUBERNETES-MASTER-IP}
hkubectl config set rejectUnauthorized false
```

Make sure `kubectl` is configured to your cluster.

HKube requires certain pods to run with privileged security permissions; consult your Kubernetes installation to see how this is done.
## API Usage Example

### The Problem

We want to solve the following problem for a given input and desired output:

- Input: two numbers N, k.
- Desired output: a number M such that:

<div style="text-align:center"><img src="https://latex.codecogs.com/svg.latex?M&space;=&space;\sum_{i=1}^N&space;k\cdot&space;i" title="M = \sum_{i=1}^N k\cdot i" /></div>

For example, N=5, k=2 will result in:

<div style="text-align:center"><img src="https://latex.codecogs.com/svg.latex?2\cdot1+2\cdot&space;2&space;+&space;2\cdot&space;3&space;+&space;2\cdot&space;4&space;+&space;2\cdot&space;5&space;=&space;2&space;+&space;4&space;+6+8+10&space;=&space;30&space;=&space;M" title="2\cdot1+2\cdot 2 + 2\cdot 3 + 2\cdot 4 + 2\cdot 5 = 2 + 4 +6+8+10 = 30 = M" /></div>
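As a quick sanity check of the arithmetic, the sum can be computed directly in plain Python (independent of HKube):

```python
def weighted_sum(n: int, k: int) -> int:
    # M = k*1 + k*2 + ... + k*N
    return sum(k * i for i in range(1, n + 1))

print(weighted_sum(5, 2))  # -> 30
```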
### Solution

We will solve the problem by running a distributed pipeline of three algorithms: Range, Multiply, and Reduce.

#### Range Algorithm

Creates an array of length N.

```console
N = 5
5 -> [1,2,3,4,5]
```

#### Multiply Algorithm

Multiplies each element received from the Range Algorithm by k.

```console
k = 2
[1,2,3,4,5] * (2) -> [2,4,6,8,10]
```

#### Reduce Algorithm

Waits until all instances of the Multiply Algorithm finish, then sums the received data.

```console
[2,4,6,8,10] -> 30
```
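The three steps above can be sketched as plain Python functions. This is only an illustration of the dataflow; a real HKube algorithm wraps such logic in a worker entry point and receives its input from the pipeline, and the function names here are hypothetical:

```python
def range_alg(n):
    # Range: 5 -> [1, 2, 3, 4, 5]
    return list(range(1, n + 1))

def multiply_alg(item, k):
    # Multiply: in HKube this runs as a batch,
    # one instance per element of Range's output
    return item * k

def reduce_alg(items):
    # Reduce: waits for all Multiply results, then sums them
    return sum(items)

batch = [multiply_alg(i, 2) for i in range_alg(5)]
print(reduce_alg(batch))  # -> 30
```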
### Building a Pipeline

We will implement the algorithms in various languages and use HKube to construct a pipeline from them.

#### Pipeline Descriptor

The pipeline descriptor is a JSON object that describes the pipeline and defines the links between the nodes via the dependencies between them.
```json
{
  "name": "numbers",
  "nodes": [
    {
      "nodeName": "Range",
      "algorithmName": "range",
      "input": ["@flowInput.data"]
    },
    {
      "nodeName": "Multiply",
      "algorithmName": "multiply",
      "input": ["#@Range", "@flowInput.mul"]
    },
    {
      "nodeName": "Reduce",
      "algorithmName": "reduce",
      "input": ["@Multiply"]
    }
  ],
  "flowInput": {
    "data": 5,
    "mul": 2
  }
}
```
Note the `flowInput`: `data` = N = 5, `mul` = k = 2.
#### Node Dependencies

HKube supports special signs in node inputs for defining the pipeline execution flow. In our case we used:

- `@` - references an input parameter for the algorithm.
- `#` - executes nodes in parallel and reduces the results into a single node.
- `#@` - by combining `#` and `@` we create batch processing over a node's results.

#### JSON Breakdown

We created a pipeline named `numbers`:

```json
"name": "numbers"
```
The pipeline is defined by three nodes:

```json
"nodes": [
  {
    "nodeName": "Range",
    "algorithmName": "range",
    "input": ["@flowInput.data"]
  },
  {
    "nodeName": "Multiply",
    "algorithmName": "multiply",
    "input": ["#@Range", "@flowInput.mul"]
  },
  {
    "nodeName": "Reduce",
    "algorithmName": "reduce",
    "input": ["@Multiply"]
  }
]
```
In HKube, the linkage between nodes is done by defining the algorithm inputs: Multiply will run after the Range algorithm because of the input dependency between them.
Keep in mind that HKube transports results between nodes automatically. To do this, HKube currently supports two types of transportation layers: object storage and file system.
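Once stored, a pipeline like `numbers` can be triggered over the REST API. Below is a minimal sketch using only the Python standard library; the base URL and the `/exec/stored` route are assumptions based on a typical deployment, so check the Swagger-UI of your cluster for the exact paths:

```python
import json
import urllib.request

# Assumption: default ingress path of the api-server; adjust for your cluster.
API_BASE = "http://{yourDomain}/hkube/api-server/api/v1"

def build_exec_body(name: str, flow_input: dict) -> bytes:
    # Request body for running a stored pipeline with a given flowInput.
    return json.dumps({"name": name, "flowInput": flow_input}).encode()

def run_stored_pipeline(base: str, name: str, flow_input: dict) -> dict:
    req = urllib.request.Request(
        base + "/exec/stored",
        data=build_exec_body(name, flow_input),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response typically identifies the execution (e.g. a jobId).
        return json.load(resp)

# Example (requires a running cluster):
# run_stored_pipeline(API_BASE, "numbers", {"data": 5, "mul": 2})
```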

