Stolos
A Directed Acyclic Graph task dependency scheduler designed to simplify complex distributed pipelines
Summary:
Stolos is a task dependency scheduler that helps build distributed pipelines. It shares similarities with Chronos, Luigi, and Azkaban, yet remains fundamentally different from all three.
The goals of Stolos are the following:
- Manage the order of execution of interdependent applications, where each application may run many times with different input parameters.
- Provide an elegant way to define and reason about job dependencies.
- Be fault tolerant and scalable.
- Keep applications completely decoupled from Stolos, so that they know nothing about it.

How does Stolos work?
Stolos consists of three primary components:
- a Queue (stores job state)
- a Configuration (defines task dependencies; this is a JSON file by default)
- the Runner (i.e., runs code via a bash command or a plugin)
Stolos manages a queuing system to decide, at any given point in time, if the current application has any jobs, if the current job is runnable, whether to queue child or parent jobs, or whether to requeue the current job if it failed.
Stolos "wraps" jobs. This is an important concept for three reasons. First, before the job starts and after the job finishes, Stolos updates the job's state in the queuing system. Second, rather than running a job directly (i.e., from the command line), you run Stolos from the command line, where it checks the application's queue and runs a queued job. If no job is queued, Stolos will wait for one or exit. Third, Stolos must run once for every job in the queue. Stolos is like a queue consumer, and an external process must maintain a healthy number of queue consumers. This can be done with crontab (meh), Relay.Mesos, or any auto scaler program.
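The wrapping behavior described above can be sketched roughly as follows. This is a minimal illustration, not Stolos's actual code: the in-memory `queue`, `state`, and `children` objects and the `consume_one` function are hypothetical stand-ins for the queue backend, the job state, and the Configuration.

```python
from collections import deque

# Hypothetical in-memory stand-ins for the queue backend, the job state,
# and the dependency configuration (none of these names come from Stolos).
queue = deque(["App_A"])
state = {}
children = {"App_A": ["App_B"], "App_B": []}


def consume_one(run):
    """Run at most one queued job, updating its state before and after."""
    if not queue:
        return None  # nothing queued: a real consumer would wait or exit
    job = queue.popleft()
    state[job] = "running"  # state update before the job starts
    try:
        run(job)  # the application itself knows nothing about this wrapper
    except Exception:
        state[job] = "failed"  # retry policy is left out of this sketch
        return job
    state[job] = "completed"  # state update after the job finishes
    queue.extend(children[job])  # queue the child jobs
    return job
```

Each call to `consume_one` handles exactly one job, which mirrors the point that an external process has to keep starting enough consumers to drain the queue.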
Stolos lets its users define deterministic dependency relationships between applications. The documentation explains this in detail. In the future, we may let users define non-deterministic dependency relationships, but we don't see the benefits yet.
Applications are completely decoupled from Stolos. This means applications can run independently of Stolos and can also integrate directly with it without any changes to the application's code. Stolos identifies an application via a Configuration, defined in the documentation.
Unlike many other dependency schedulers, Stolos is decentralized. There is no central server that "runs" things. Decentralization here means a few things. First, Stolos does not care where or how jobs run. Second, it doesn't care about which queuing or configuration backends are used, provided that Stolos is able to communicate with these backends. Third, and perhaps most importantly, the mission-critical questions about Consistency vs Availability vs Partition Tolerance (as defined by the CAP theorem) are delegated to the queue backend (and to some extent, the configuration backend).
Stolos, in summary:
- manages job state, queues future work, and starts your applications.
- language agnostic (but written in Python).
- "at least once" semantics (a guarantee that a job will successfully complete or fail after n retries)
- designed for apps of various sizes: from large Hadoop jobs to jobs that take a second to complete
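As a rough illustration of the "at least once" guarantee, the sketch below retries a job until it succeeds or the retry budget runs out. The function name `run_at_least_once` and the `max_retries` parameter are assumptions made for this example; Stolos's own retry mechanism works through its queue backend and may differ in detail.

```python
def run_at_least_once(job, max_retries=3):
    """Call job() until it succeeds or max_retries retries are used up.

    Returns a (status, attempts) tuple, where status is "completed" or
    "failed". The first attempt does not count as a retry.
    """
    attempts = 0
    while attempts <= max_retries:
        attempts += 1
        try:
            job()
            return "completed", attempts
        except Exception:
            continue  # a real scheduler would requeue the job here
    return "failed", attempts
```

Because a job may run more than once before it completes, applications scheduled this way generally need to tolerate repeated execution.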
What this project is not:
- not aware of machines, nodes, network topologies and infrastructure
- does not (and should not) auto-scale workers
- not (necessarily) meant for "real-time" computation
- not a grid scheduler (i.e., it does not solve a bin packing problem)
- not a crontab. (in certain cases this is not entirely true)
- not meant to manage long-running services or servers (unless order in which they start is important)
Similar tools out there:
Requirements:
- A Queue backend (Redis or ZooKeeper)
- A Configuration backend (JSON file, Redis, ...)
- Some Python libraries (Kazoo, Networkx, Argparse, ...)
Optional requirements:
- Apache Spark (for Spark plugin)
- GraphViz (for visualizing the dependency graph)
Background: Inspiration
The inspiration for this project comes from the notion that the way we manage dependencies in our system defines how we characterize the work that exists in our system.
This project arose from the needs of Sailthru's Data Science team to manage execution of pipeline applications. The team has a complex data pipeline (build models, algorithms and applications that support many clients), and this leads to a wide variety of work we have to perform within our system. Some work is very specific. For instance, we need to train the same predictive model once per (client, date). Other tasks might be more complex: a cross-client analysis across various groups of clients and ranges of dates. In either case, we cannot return results without having previously identified data sets we need, transformed them, created some extra features on the data, and built the model or analysis.
Since we have hundreds or thousands of instances of any particular application, we cannot afford to manually verify that work gets completed. Therefore, we need a system to manage execution of applications.
Concept: Application Dependencies as a Directed Graph
We can model dependencies between applications as a directed graph, where nodes are apps and edges are dependency requirements. The following section explains how Stolos uses a directed graph to define application dependencies.
We start with an assumption that our applications depend on each other:
    Scenario 1:        Scenario 2:
       App_A              App_A
         |                /   \
         v               v     v
       App_B          App_B   App_C
                        |       |
                        |     App_D
                        |       |
                        v       v
                          App_E
In Scenario 1, App_B cannot run until App_A completes. In
Scenario 2, App_B and App_C cannot run until App_A completes,
but App_B and App_C can run in any order. Also, App_D requires
App_C to complete, but doesn't care if App_B has run yet.
App_E requires App_D and App_B to have completed.
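The rules for Scenario 2 can be checked with a small sketch. It stores the graph as a plain dictionary and is only an illustration; the `parents` mapping and the `runnable` function are hypothetical names, not part of Stolos.

```python
# Scenario 2 as a mapping from each app to the apps it depends on.
parents = {
    "App_A": [],
    "App_B": ["App_A"],
    "App_C": ["App_A"],
    "App_D": ["App_C"],
    "App_E": ["App_B", "App_D"],
}


def runnable(completed):
    """Return the apps that have not run yet and whose parents have all completed."""
    return sorted(
        app
        for app, deps in parents.items()
        if app not in completed and all(dep in completed for dep in deps)
    )
```

For example, `runnable({"App_A", "App_C"})` returns both App_B and App_D, since App_D does not wait on App_B, while App_E stays blocked until both App_B and App_D are in the completed set.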
By design, we also support the scenario where one application expands into multiple subtasks, or jobs. The reason for this is that if we run a hundred or a thousand variations of one app, the results of each job (i.e., subtask) may bubble down through the dependency graph independently of other jobs.
There are several ways jobs may depend on other jobs, and this system captures all deterministic dependency relationships (as far as we can tell).
Imagine the scenario where App_A --> App_B
    Scenario 1:
       App_A
         |
         v
       App_B
Let's say App_A becomes multiple jobs, or subtasks, App_Ai. And App_B
also becomes multiple jobs, App_Bi. Scenario 1 may transform
into one of the following:
Scenario 1, Situation I
    becomes      App_A1     App_A2     App_A3     App_An
    ------->       |          |          |          |
                   +----------+----------+----------+
                   |          |          |          |
                   v          v          v          v
                 App_B1     App_B2     App_B3     App_Bn
Scenario 1, Situation II
                 App_A1     App_A2     App_A3     App_An
    or becomes     |          |          |          |
    ------->       |          |          |          |
                   v          v          v          v
                 App_B1     App_B2     App_B3     App_Bn
In Situation I, each job, App_Bi, depends on completion of all of App_A's
jobs before it can run. For instance, App_B1 cannot run until all App_A
jobs (1 to n) have completed. From Stolos's point of view, this is no
different from the simple case where App_A(1 to n) --> App_Bi. In this
case, we create a dependency graph for each App_Bi. See below:
Scenario 1, Situation I (view 2)
    becomes      App_A1     App_A2     App_A3     App_An
    ------->       |          |          |          |
                   +----------+----------+----------+
                                   |
                                   v
                                 App_Bi
In Situation II, each job, App_Bi, depends only on completion of
its related job in App_A, or App_Ai. For instance,
App_B1 depends on completion of App_A1, but it doesn't have any
dependency on App_A2's completion. In this case, we create n
dependency graphs, as shown in Scenario 1, Situation II.
As we have just seen, dependencies can be modeled as directed acyclic
multi-graphs. (Acyclic means no cycles, i.e., no loops. By multi-graph
we mean a collection of many separate graphs.) Situation II is the
default in Stolos (App_Bi depends only on App_Ai).
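The difference between the two situations can be made concrete with a short sketch that expands App_A and App_B into n jobs each. The function `job_dependencies` and its arguments are hypothetical and exist only to illustrate the two mappings; they do not reflect how Stolos is configured.

```python
def job_dependencies(n, situation):
    """Map each App_B job to the App_A jobs it must wait for.

    situation "I":  every App_Bi depends on all of App_A1..App_An.
    situation "II": App_Bi depends only on App_Ai.
    """
    a_jobs = [f"App_A{i}" for i in range(1, n + 1)]
    deps = {}
    for i in range(1, n + 1):
        if situation == "I":
            deps[f"App_B{i}"] = list(a_jobs)
        elif situation == "II":
            deps[f"App_B{i}"] = [f"App_A{i}"]
        else:
            raise ValueError(f"unknown situation: {situation!r}")
    return deps
```

With n = 3, Situation I makes App_B1 wait for App_A1, App_A2, and App_A3, while Situation II makes it wait for App_A1 alone, which yields three independent two-node graphs.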
Concept: Job IDs
For details on how to use and configure job_ids, see the section
Job ID Configuration. This section
explains what job_ids are.
Stolos recognizes apps (ie App_A or App_B) and jobs (App_A1,
App_A2, ...). An application, or app, represents a group of jobs. A
job_id identifies jobs, and it is made up of "identifiers" that we mash
together via a job_id template. A `
