Woken: Workflow for Analytics
An orchestration platform for Docker containers running data mining algorithms.
This project exposes a web interface for running data mining algorithms on demand. The algorithms are packaged as Docker containers and can be implemented with any tool or language (R, Python, Java and more are supported).
It relies on a runtime environment containing Mesos and Chronos to control and execute the Docker containers over a cluster.
Usage
docker run --rm --env [list of environment variables] --link woken hbpmip/woken:3.0.2
where the environment variables are:
- CLUSTER_IP: Name of this server advertised in the Akka cluster
- CLUSTER_PORT: Port of this server advertised in the Akka cluster
- CLUSTER_NAME: Name of the Woken cluster. Defaults to 'woken'.
- WOKEN_PORT_8088_TCP_ADDR: Address of the Woken master server.
- WOKEN_PORT_8088_TCP_PORT: Port of the Woken master server. Defaults to 8088.
- DOCKER_BRIDGE_NETWORK: Name of the Docker bridge network. Defaults to 'bridge'.
- NETWORK_INTERFACE: IP address on which to listen for incoming HTTP connections. Defaults to '0.0.0.0'.
- WEB_SERVICES_PORT: Port of the HTTP server in the Docker container. Defaults to 8087.
- WEB_SERVICES_SECURE: If yes, HTTPS with a custom certificate is used. Defaults to no.
- WEB_SERVICES_USER: User name for the web services protected with HTTP basic authentication. Defaults to 'admin'.
- WEB_SERVICES_PASSWORD: Password for the web services protected with HTTP basic authentication.
- LOG_LEVEL: Level of the logs written to standard output. Defaults to WARNING.
- LOG_CONFIG: on/off. Logs the configuration on start. Defaults to off.
- VALIDATION_MIN_SERVERS: Minimum number of servers with the 'validation' functionality in the cluster. Defaults to 0.
- SCORING_MIN_SERVERS: Minimum number of servers with the 'scoring' functionality in the cluster. Defaults to 0.
- KAMON_ENABLED: Enables monitoring with Kamon. Defaults to no.
- ZIPKIN_ENABLED: Enables reporting traces to Zipkin. Defaults to no. Requires Kamon enabled.
- ZIPKIN_IP: IP address of the Zipkin server. Requires Kamon and Zipkin enabled.
- ZIPKIN_PORT: Port of the Zipkin server. Requires Kamon and Zipkin enabled.
- PROMETHEUS_ENABLED: Enables reporting metrics to Prometheus. Defaults to no. Requires Kamon enabled.
- PROMETHEUS_IP: IP address of the Prometheus server. Requires Kamon and Prometheus enabled.
- PROMETHEUS_PORT: Port of the Prometheus server. Requires Kamon and Prometheus enabled.
- SIGAR_SYSTEM_METRICS: Enables collection of system metrics using the Sigar native library. Defaults to no. Requires Kamon enabled.
- JVM_SYSTEM_METRICS: Enables collection of JVM metrics using JMX. Defaults to no. Requires Kamon enabled.
- MINING_LIMIT: Maximum number of concurrent mining operations. Defaults to 100.
- EXPERIMENT_LIMIT: Maximum number of concurrent experiments. Defaults to 100.
- RELEASE_STAGE: Release stage used when reporting errors to Bugsnag. Values are dev, staging, production.
- DATA_CENTER_LOCATION: Location of the datacenter, used when reporting errors to Bugsnag.
- CONTAINER_ORCHESTRATION: Container orchestration system used to execute the Docker containers. Values are mesos, docker-compose, kubernetes.
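As a rough illustration of how the `[list of environment variables]` placeholder in the usage line might be filled in, the Python sketch below assembles the `docker run` argument list from a dictionary. The helper name `build_docker_run` and all example values (IP address, port, password) are hypothetical; only the image tag, `--rm`, `--env` and `--link woken` come from the usage line above.

```python
import shlex


def build_docker_run(env, image="hbpmip/woken:3.0.2"):
    """Return a `docker run` argument list with one --env flag per variable.

    `build_docker_run` is a hypothetical helper, not part of Woken.
    """
    args = ["docker", "run", "--rm"]
    # Sort the names so the generated command is stable between runs.
    for name, value in sorted(env.items()):
        args += ["--env", f"{name}={value}"]
    args += ["--link", "woken", image]
    return args


# Illustrative values only; adapt them to your own cluster.
example_env = {
    "CLUSTER_IP": "192.168.0.10",
    "CLUSTER_PORT": 8197,
    "LOG_LEVEL": "INFO",
    "WEB_SERVICES_PASSWORD": "change me",
}

command = build_docker_run(example_env)
# shlex.join quotes values containing spaces so the line can be pasted in a shell.
print(shlex.join(command))
```

Variables that are left out of the dictionary are simply not passed, so the defaults listed above apply.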
Getting started
Follow these steps to get started:
- Git-clone this repository.
git clone https://github.com/LREN-CHUV/woken.git
- Change directory into your clone:
cd woken
- Build the application
You need the following software installed:
- Docker 18.09 or better with docker-compose
./build.sh
- Run the application
You need the following software installed to execute some tests:
cd tests
./run.sh
tests/run.sh uses docker-compose to start a full environment with Mesos, Zookeeper and Chronos, all of which are required for the proper execution of Woken.
- Create a DNS alias in /etc/hosts
127.0.0.1 localhost frontend
- Browse to http://frontend:8087 or run one of the query* scripts located in the 'tests' folder.
Available Docker containers
The Docker containers that can be executed on this platform require a few specific features.
TODO: define those features - parameters passed as environment variables, in and out directories, entrypoint with a 'compute command', ...
The algorithm-repository project contains the Docker images that can be used with Woken.
Available commands
Mining query
Performs a data mining task.
Path: /mining/job Verb: POST
Takes a JSON document in the body and returns a JSON document.
The JSON input should be of the form:
{
"user": {"code": "user1"},
"variables": [{"code": "var1"}],
"covariables": [{"code": "var2"},{"code": "var3"}],
"grouping": [{"code": "var4"}],
"filters": [],
"algorithm": "",
"datasets": [{"code": "dataset1"},{"code": "dataset2"}]
}
where:
- variables is the list of variables
- covariables is the list of covariables
- grouping is the list of variables to group together
- filters is the list of filters. The format comes from jQuery QueryBuilder filters, for example:
{"condition":"AND","rules":[{"id":"FULLNAME","field":"FULLNAME","type":"string","input":"text","operator":"equal","value":"Isaac Fulmer"}],"valid":true}
- datasets is an optional list of datasets. It can be used in distributed mode to select the nodes to query, and in all cases it adds a filter rule of the form:
{"condition":"OR","rules":[{"field":"dataset","operator":"equals","value":"dataset1"},{"field":"dataset","operator":"equals","value":"dataset2"}]}
- algorithm is the algorithm to use.
Currently, the following algorithms are supported:
- data: returns the raw data matching the query
- linearRegression: performs a linear regression
- summaryStatistics: computes summary statistics that can be used to draw box plots.
- knn
- naiveBayes
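The query format described above can be sketched in Python as follows. This is a minimal example under several assumptions: the helper names (`mining_query`, `dataset_filter`, `post_mining_job`) are hypothetical, the base URL `http://frontend:8087` and the 'admin' user are taken from the defaults mentioned in this README, and the algorithm is passed as a plain string to mirror the template above (the running service may expect a richer value).

```python
import base64
import json
import urllib.request


def dataset_filter(datasets):
    """Build the OR rule that a `datasets` list is described as adding."""
    return {
        "condition": "OR",
        "rules": [
            {"field": "dataset", "operator": "equals", "value": code}
            for code in datasets
        ],
    }


def mining_query(user, variables, covariables, grouping, algorithm, datasets=()):
    """Assemble a query body with the fields shown in the JSON template."""
    def codes(names):
        return [{"code": name} for name in names]

    return {
        "user": {"code": user},
        "variables": codes(variables),
        "covariables": codes(covariables),
        "grouping": codes(grouping),
        "filters": [],
        "algorithm": algorithm,
        "datasets": codes(datasets),
    }


def post_mining_job(query, base_url="http://frontend:8087", user="admin", password=""):
    """POST the query to /mining/job using HTTP basic authentication.

    The base URL and credentials are assumptions; adjust them to your setup.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        base_url + "/mining/job",
        data=json.dumps(query).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


query = mining_query(
    "user1", ["var1"], ["var2", "var3"], ["var4"],
    "linearRegression", ["dataset1", "dataset2"],
)
print(json.dumps(query, indent=2))
```

Calling `post_mining_job(query, password=...)` would send the request to a running instance; it is left out of the example so that the snippet runs without a server.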
Experiment query
Performs an experiment comprised of several data mining tasks and an optional cross-validation step used to compute the fitness of each algorithm and select the best result.
TODO: document API
Release
You need the following software installed:
Execute the following commands to distribute Woken as a Docker container:
./publish.sh
Installation
For production, Woken requires Mesos and Chronos. To install them, you can use one of the following:
- mip-microservices-infrastructure, a collection of Ansible scripts deploying a full Mesos stack on Ubuntu servers.
- mantl.io, a microservice infrastructure by Cisco, based on Mesos.
- Mesosphere DCOS: DC/OS (the datacenter operating system) is an open-source, distributed operating system based on the Apache Mesos distributed systems kernel.
What's in a name?
Woken:
- the Woken river in China - we were looking for rivers in China
- past participle of 'wake' - it launches Docker containers and computations
- workflow - the previous name, not too different
Acknowledgements
Funding
This work has been funded by the European Union Seventh Framework Program (FP7/2007-2013) under grant agreement no. 604102 (HBP).
This work is part of SP8 of the Human Brain Project (SGA1).
Sponsors
Thanks to <img src="docs/bugsnag_logo_navy.png" height="16" alt="Bugsnag"></img> for their generous support: they offered us a Standard plan that lets us inspect and report errors in our software efficiently.
Tools
We use the following tools for development:
- IntelliJ IDEA
- <img src="docs/bugsnag_logo_navy.png" height="16" alt=Bugsnag></img> to report errors in real time to our development team
- CircleCI for continuous integration
