Styx
"The path to execution", Styx is a service that schedules batch data processing jobs in Docker containers on Kubernetes.
This repo was archived!
We decided to discontinue the Styx oss repo.
A batch job scheduler for Kubernetes
Description
Styx is a service that is used to trigger periodic invocations of Docker containers. The information needed to schedule such invocations is read from a set of files on disk or from an external service. Styx takes responsibility for triggering, and possibly re-triggering, invocations until a successful exit status has been emitted or some other limit has been reached. Styx is built using the [Apollo] framework and uses [Kubernetes] for container orchestration.
Styx can optionally provide dynamic arguments to container executions that indicate which time
period a particular invocation belongs to. For example, an hourly job for the first hour of
2016-01-01 might have the dynamic argument 2016-01-01T00 appended to the container invocation.
The envisioned main use case for Styx is to execute data processing jobs: possibly long-running processes that transform data periodically. Its initial use case is to run workflows of jobs orchestrated using [Luigi], but it has no intrinsic ties to Luigi; Styx can just as well execute a container with some simple bash scripts.
Styx was built to function smoothly on Google Cloud Platform, and thus makes use of Google products such as Google Cloud Datastore, Google Cloud Bigtable and Google Container Engine. However, the integrations with these products are all done through clear interfaces, and other backends can easily be added.
Key concepts
The key concept that Styx concerns itself with is Workflows. A Workflow is either enabled or disabled and has a Schedule. A Schedule specifies how often a Workflow should be triggered, which Docker image to run and which arguments to pass to it on each execution. Each time a Workflow is triggered, a Workflow Instance is created. The Workflow Instance is tracked as 'active' until at least one execution of the Docker image returns with a 0 exit code. Styx keeps track of Workflow Instance executions and provides information about them via the API.
Development status
Styx is actively being developed and deployed internally at Spotify, where it is used to run more than 10,000 production workflows. Because of how we build and integrate infrastructure components at Spotify, this repository does not contain a GUI at the time of writing, although we do have one internally. The goal is to break out more of these components into open source projects that complement each other.
More docs
- [Styx design]
- [External services]
- API Specification - HTML version
Usage
Setup
A fully functional service can be found in styx-standalone-service. This packaging contains both the API and Scheduler services in one artifact. Here is how to build and run it.
The following configuration keys in styx-standalone.conf have to be specified for the service to work:
# Google Container Engine (GKE) cluster
styx.gke.default.project-id = ""
styx.gke.default.cluster-zone = ""
styx.gke.default.cluster-id = ""
styx.gke.default.namespace = ""
# Google Cloud Bigtable instance
styx.bigtable.project-id = ""
styx.bigtable.instance-id = ""
# Google Cloud Datastore config
styx.datastore.project-id = ""
styx.datastore.namespace = ""
Build the project:
> mvn package
Run the service:
> java -jar styx-standalone-service/target/styx-standalone-service.jar
Workflow configuration
Refer to API Specification for how to deploy a workflow.
id: my-workflow
docker_image: my-workflow:0.1
docker_args: ['./run.sh', '{}']
schedule: hourly
offset: PT1H
service_account: my-service-account@my-project.iam.gserviceaccount.com
running_timeout: PT2H
retry_condition: "(#tries < 2 && #triggerType == 'backfill') || (#triggerType != 'backfill')"
id [string]
A unique identifier for the workflow (lower-case-hyphenated). This identifier is used to refer to the workflow through the API.
docker_image [string]
The Docker image that should be executed.
docker_args [string]
The list of arguments passed to the Docker container.
This list should only contain strings. Any occurrences of the {} placeholder argument will be
replaced with the current partition date or datehour. Note that it must be quoted in the yaml file
in order not to be interpreted as an object.
Example arguments for the supported schedule values:
- hourly - 2016-04-01T14, 2016-04-01T15, ... (UTC hours)
- daily - 2016-04-01, 2016-04-02, ...
- weekly - 2016-04-04, 2016-04-11, ... (Mondays)
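As a sketch of how the placeholder substitution described above behaves (the helper functions are illustrative, not part of Styx):

```python
from datetime import datetime

def partition_argument(schedule: str, instant: datetime) -> str:
    """Format the {} placeholder for the given schedule (illustrative sketch)."""
    if schedule == "hourly":
        return instant.strftime("%Y-%m-%dT%H")   # e.g. 2016-04-01T14
    if schedule in ("daily", "weekly"):
        return instant.strftime("%Y-%m-%d")      # e.g. 2016-04-01
    raise ValueError(f"unknown schedule: {schedule}")

def substitute(docker_args: list[str], schedule: str, instant: datetime) -> list[str]:
    """Replace every {} argument with the partition date or datehour."""
    arg = partition_argument(schedule, instant)
    return [arg if a == "{}" else a for a in docker_args]

# ['./run.sh', '{}'] becomes ['./run.sh', '2016-04-01T14'] for an hourly run
print(substitute(["./run.sh", "{}"], "hourly", datetime(2016, 4, 1, 14)))
```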
schedule [string]
How often the workflow should be triggered and what the {} placeholder will be replaced with in
docker_args.
Supports [cron] syntax, along with a set of human readable aliases:
@hourly, hourly = 0 * * * *
@daily, daily = 0 0 * * *
@weekly, weekly = 0 0 * * MON
@monthly, monthly = 0 0 1 * *
@yearly, yearly = 0 0 1 1 *
@annually, annually = 0 0 1 1 *
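For example, a workflow that should trigger at minute 30 of every hour could use a raw cron expression instead of an alias (a hypothetical configuration fragment):

```yaml
id: my-workflow
docker_image: my-workflow:0.1
docker_args: ['./run.sh', '{}']
# raw cron syntax: trigger at minute 30 of every hour
schedule: '30 * * * *'
```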
offset [string]
An [ISO 8601 Duration] specification for offsetting the cron schedule.
This is useful when setting up a schedule that needs to be offset in time relative to the
schedule timestamps. For instance, an hourly schedule that needs to process a bucket of data
for each hour cannot run until the end of that hour. We can then use an offset
value of PT1H. The injected placeholder then reflects a logical schedule time
(00, 01, 02, ...) one hour earlier than the actual run time (01, 02, 03, ...). This is especially
useful for irregular schedules.
In fact, the need for a "last hour" parameter in jobs is so common that the default offset
for all the well-known (aliased) schedules is +1 period. E.g. for an @hourly
schedule the default offset is PT1H, and for a @daily schedule it is P1D.
Example: a job needs to run daily at 2 AM but the partition argument needs to be midnight:
schedule: '@daily'
offset: P1DT2H
At 2017-06-30T02 the execution for 2017-06-29 will be triggered.
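The arithmetic in this example can be checked with a short sketch, using a Python timedelta to stand in for the ISO 8601 duration P1DT2H:

```python
from datetime import datetime, timedelta

# P1DT2H expressed as a timedelta (illustrative; Styx parses ISO 8601 durations)
offset = timedelta(days=1, hours=2)

partition = datetime(2017, 6, 29)   # logical schedule time, injected as {}
run_time = partition + offset       # actual trigger time

print(run_time.isoformat())  # 2017-06-30T02:00:00
```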
service_account [email address]
The [Service Account] email address belonging to a project in [Google Cloud Platform].
If the workflow intends to use keys of a [Service Account],
Styx will create both JSON and p12 keys for the specified service_account, rotate the keys on a daily basis,
and garbage collect unused keys older than 48 hours.
Styx stores the created keys in [Kubernetes Secrets] and mounts them under /etc/styx-wf-sa-keys/
in the container.
Styx injects an environment variable named GOOGLE_APPLICATION_CREDENTIALS into the container, pointing
to the JSON key file.
In order for Styx to be able to create and delete keys for the service_account of a workflow,
the [Service Account] that Styx itself runs as must be granted the Service Account Key Admin
role on the service_account of the workflow.
If authorization is enabled for the service, the service_account will be used to authorize
deployments and actions (create/modify/delete the workflow, trigger a new instance, retry/halt an
existing instance and create a backfill) on the workflow. To authorize an account, grant it the
[configured role] for the [Service Account] of the workflow.
For information on how to grant an account a role in a [Service Account], follow this guide: [Granting Roles to Service Accounts].
env [dictionary]
Custom environment variables to be injected into the running container.
running_timeout [string]
An [ISO 8601 Duration] specification for timing out container execution. If running_timeout is not set, it defaults to styx.stale-state-ttls.running in the Styx conf file, which in turn defaults to styx.stale-state-ttls.default.
The upper boundary of running_timeout is configurable through styx.max-running-timeout; if that is not set, the boundary likewise falls back to styx.stale-state-ttls.running and then to styx.stale-state-ttls.default.
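The fallback chain described above can be sketched as follows (an illustrative function, not Styx code; timeouts are given in minutes for brevity):

```python
def effective_running_timeout(workflow_timeout, running_ttl, default_ttl, max_timeout=None):
    """Resolve the running timeout, mirroring the fallback chain described above.

    workflow_timeout: the workflow's running_timeout, or None if unset
    running_ttl:      styx.stale-state-ttls.running, or None if unset
    default_ttl:      styx.stale-state-ttls.default
    max_timeout:      styx.max-running-timeout, or None if unset
    """
    # the upper boundary falls back to the running/default TTL chain
    cap = max_timeout if max_timeout is not None else (
        running_ttl if running_ttl is not None else default_ttl)
    # the workflow value falls back to the same chain, then is capped
    timeout = workflow_timeout if workflow_timeout is not None else (
        running_ttl if running_ttl is not None else default_ttl)
    return min(timeout, cap)

# a 2h workflow timeout capped by a 90-minute max-running-timeout
print(effective_running_timeout(120, 60, 30, max_timeout=90))  # 90
```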
retry_condition [string]
A SpEL boolean expression.
If the expression evaluates to false, Styx will stop retrying and halt the workflow instance immediately. This
configuration has no impact on the maximum number of tries; it can only be used to halt a workflow instance
earlier.
The following variables will be injected by Styx so that they can be used in the expression:
- #exitCode: the exit code from the last execution
- #tries: total number of tries; this equals 1 the first time a workflow instance is executed and 2 when the first retry is issued
- #consecutiveFailures: total number of consecutive failures that are not missing-dependency failures
- #triggerType: natural, backfill or ad-hoc
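SpEL is evaluated on the JVM, but the example expression from the configuration above can be mimicked in Python to see when retries stop (illustrative only):

```python
def should_retry(tries: int, trigger_type: str) -> bool:
    """Python analogue of:
    (#tries < 2 && #triggerType == 'backfill') || (#triggerType != 'backfill')
    """
    return (tries < 2 and trigger_type == "backfill") or trigger_type != "backfill"

# backfill instances get at most one retry; other trigger types keep retrying
print(should_retry(1, "backfill"))  # True
print(should_retry(2, "backfill"))  # False
print(should_retry(5, "natural"))   # True
```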
Triggering and executions
Each time a Workflow Schedule is triggered, Styx will treat that trigger as a first class entity. Each Trigger will have at least one Execution, which can potentially take a long time. If another Trigger happens during this time, both triggers will be active, each with one running container. Because Styx treats each Trigger individually, it can ensure that each one of them completes successfully.