Tork
Tork is a lightweight, distributed workflow engine that runs tasks as simple scripts within Docker containers.
Tork is a highly-scalable, general-purpose workflow engine. It lets you define jobs consisting of multiple tasks, each running inside its own container. You can run Tork on a single machine (standalone mode) or set it up in a distributed environment with multiple workers.
Features

- REST API – Submit jobs, query status, cancel/restart
- Horizontally scalable – Add workers to handle more tasks
- Task isolation – Tasks run in containers for isolation, idempotency, and resource limits
- Automatic recovery – Tasks are recovered if a worker crashes
- Stand-alone and distributed – Run all-in-one or distributed with Coordinator + Workers
- Retry failed tasks – Configurable retry with backoff
- Middleware – HTTP, Job, Task, Node middleware for auth, logging, metrics
- No single point of failure – Stateless, leaderless coordinators
- Task timeout – Timeout per task
- Full-text search – Search jobs via the API
- Runtime agnostic – Docker, Podman, Shell
- Webhooks – Notify on job/task state changes
- Pre/Post tasks – Pre/Post tasks for setup/teardown
- Expression language – Expressions for conditionals and dynamic values
- Conditional tasks – Run tasks based on `if` conditions
- Parallel tasks – Run groups of tasks concurrently
- Each task – Loop over a list, running a task per item
- Subjob task – Launch a nested job from a parent job
- Task priority – Set task priority (0–9)
- Secrets – Secrets with auto-redaction
- Scheduled jobs – Scheduled jobs with cron
- Web UI – Tork Web for viewing and submitting jobs
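Several of the composition features above appear directly in job definitions as special task fields. As a quick taste, a sketch combining a conditional task and a parallel task (field names such as `parallel.tasks` are assumptions; see the sections below for verified examples):

```yaml
name: composition example
inputs:
  env: production
tasks:
  # conditional: runs only when the expression evaluates to true
  - name: notify
    if: "{{ inputs.env == 'production' }}"
    image: alpine:latest
    run: echo "deploying to production"
  # parallel: child tasks run concurrently
  - name: fan out
    parallel:
      tasks:
        - image: alpine:latest
          run: echo task A
        - image: alpine:latest
          run: echo task B
```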
Quick Start
Requirements
- A running Docker daemon (the default task runtime)
Set up PostgreSQL
Start a PostgreSQL container:
Note: For production, consider a managed PostgreSQL service for better reliability and maintenance.
docker run -d \
--name tork-postgres \
-p 5432:5432 \
-e POSTGRES_PASSWORD=tork \
-e POSTGRES_USER=tork \
-e PGDATA=/var/lib/postgresql/data/pgdata \
-e POSTGRES_DB=tork postgres:15.3
Run the migration to create the database schema:
TORK_DATASTORE_TYPE=postgres ./tork migration
Hello World
Start Tork in standalone mode:
./tork run standalone
Create hello.yaml:
# hello.yaml
---
name: hello job
tasks:
  - name: say hello
    image: ubuntu:mantic
    run: |
      echo -n hello world
  - name: say goodbye
    image: alpine:latest
    run: |
      echo -n bye world
Submit the job:
JOB_ID=$(curl -s -X POST --data-binary @hello.yaml \
-H "Content-type: text/yaml" http://localhost:8000/jobs | jq -r .id)
Check status:
curl -s http://localhost:8000/jobs/$JOB_ID
{
"id": "ed0dba93d262492b8cf26e6c1c4f1c98",
"state": "COMPLETED",
...
}
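Fields of the response can be pulled out with `jq`, just as the submit step captured `.id`. For example, checking the state directly (using the sample response above, reduced to the fields shown):

```shell
# Sample response from the status endpoint, reduced to the fields shown above.
RESPONSE='{"id":"ed0dba93d262492b8cf26e6c1c4f1c98","state":"COMPLETED"}'

# Pull out just the job state with jq (already used above to capture the id).
STATE=$(echo "$RESPONSE" | jq -r .state)
echo "$STATE"
```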
Running in distributed mode
In distributed mode, the Coordinator schedules work and Workers execute tasks. A message broker (e.g. RabbitMQ) moves tasks between them.
Start RabbitMQ:
docker run \
-d -p 5672:5672 -p 15672:15672 \
--name=tork-rabbitmq \
rabbitmq:3-management
Note: For production, consider a dedicated RabbitMQ service.
Run the coordinator:
TORK_DATASTORE_TYPE=postgres TORK_BROKER_TYPE=rabbitmq ./tork run coordinator
Run one or more workers:
TORK_BROKER_TYPE=rabbitmq ./tork run worker
Submit the same job as before; the coordinator and workers will process it.
Adding external storage
Tasks are ephemeral; container filesystems are lost when a task ends. To share data between tasks, use an external store (e.g. MinIO/S3).
Start MinIO:
docker run --name=tork-minio \
-d -p 9000:9000 -p 9001:9001 \
-e MINIO_ROOT_USER=minioadmin \
-e MINIO_ROOT_PASSWORD=minioadmin \
minio/minio server /data \
--console-address ":9001"
Example job with two tasks (write to MinIO, then read back):
name: stateful example
inputs:
  minio_endpoint: http://host.docker.internal:9000
secrets:
  minio_user: minioadmin
  minio_password: minioadmin
tasks:
  - name: write data to object store
    image: amazon/aws-cli:latest
    env:
      AWS_ACCESS_KEY_ID: "{{ secrets.minio_user }}"
      AWS_SECRET_ACCESS_KEY: "{{ secrets.minio_password }}"
      AWS_ENDPOINT_URL: "{{ inputs.minio_endpoint }}"
      AWS_DEFAULT_REGION: us-east-1
    run: |
      echo "Hello from Tork!" > /tmp/data.txt
      aws s3 mb s3://mybucket
      aws s3 cp /tmp/data.txt s3://mybucket/data.txt
  - name: read data from object store
    image: amazon/aws-cli:latest
    env:
      AWS_ACCESS_KEY_ID: "{{ secrets.minio_user }}"
      AWS_SECRET_ACCESS_KEY: "{{ secrets.minio_password }}"
      AWS_ENDPOINT_URL: "{{ inputs.minio_endpoint }}"
      AWS_DEFAULT_REGION: us-east-1
    run: |
      aws s3 cp s3://mybucket/data.txt /tmp/retrieved.txt
      echo "Contents of retrieved file:"
      cat /tmp/retrieved.txt
Installation
Download the Tork binary for your system from the releases page.
Create a directory and unpack:
mkdir ~/tork
cd ~/tork
tar xzvf ~/Downloads/tork_0.1.66_darwin_arm64.tgz
./tork
You should see the Tork banner and help. On macOS you may need to allow the binary in Security & Privacy settings.
PostgreSQL and migration
See Quick Start – Set up PostgreSQL and run:
TORK_DATASTORE_TYPE=postgres ./tork migration
Standalone mode
./tork run standalone
Distributed mode
Configure the broker (e.g. in config.toml):
# config.toml
[broker]
type = "rabbitmq"
[broker.rabbitmq]
url = "amqp://guest:guest@localhost:5672/"
Start RabbitMQ, then:
./tork run coordinator
./tork run worker
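A worker on a separate machine only needs to reach the broker. Its config.toml can reuse the broker section above with the coordinator machine's hostname substituted (`broker-host` here is a placeholder):

```toml
# config.toml (on the worker machine)
[broker]
type = "rabbitmq"

[broker.rabbitmq]
url = "amqp://guest:guest@broker-host:5672/"
```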
Queues
Tasks go to the default queue unless overridden. Workers subscribe to queues; you can run multiple consumers per queue:
# config.toml
[worker.queues]
default = 5
video = 2
[broker]
type = "rabbitmq"
Route a task to a specific queue:
name: transcode a video
queue: video
image: jrottenberg/ffmpeg:3.4-alpine
run: |
  ffmpeg -i https://example.com/some/video.mov output.mp4
Architecture
A workflow is a job: a series of tasks (steps) run in order. Jobs are usually defined in YAML:
---
name: hello job
tasks:
  - name: say hello
    image: ubuntu:mantic
    run: echo -n hello world
  - name: say goodbye
    image: ubuntu:mantic
    run: echo -n bye world
Components:
- Coordinator – Tracks jobs, dispatches work to workers, handles retries and failures. Stateless and leaderless; does not run tasks.
- Worker – Runs tasks via a runtime (usually Docker).
- Broker – Routes messages between Coordinator and Workers.
- Datastore – Persists job and task state.
- Runtime – Execution environment for tasks (Docker, Podman, Shell).
Jobs
A job is a list of tasks executed in order.
Simple example
name: hello job
tasks:
  - name: say hello
    var: task1
    image: ubuntu:mantic
    run: |
      echo -n hello world > $TORK_OUTPUT
  - name: say goodbye
    image: ubuntu:mantic
    run: |
      echo -n bye world
Submit:
curl -s -X POST --data-binary @job.yaml \
-H "Content-type: text/yaml" \
http://localhost:8000/jobs
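The `var: task1` declaration above captures whatever the task writes to `$TORK_OUTPUT`. A later task in the same job can then reference that value through the expression language (a sketch assuming a `{{ tasks.<var> }}` lookup syntax):

```yaml
name: hello job
tasks:
  - name: say hello
    var: task1
    image: ubuntu:mantic
    run: |
      echo -n hello world > $TORK_OUTPUT
  - name: echo the captured output
    image: ubuntu:mantic
    env:
      GREETING: "{{ tasks.task1 }}"
    run: |
      echo "first task said: $GREETING"
```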
Inputs
name: mov to mp4
inputs:
  source: https://example.com/path/to/video.mov
tasks:
  - name: convert the video to mp4
    image: jrottenberg/ffmpeg:3.4-alpine
    env:
      SOURCE_URL: '{{ inputs.source }}'
    run: |
      ffmpeg -i $SOURCE_URL /tmp/output.mp4
Secrets
Use the secrets block for sensitive values (redacted in API responses):
name: my job
secrets:
  api_key: 1111-1111-1111-1111
tasks:
  - name: my task
    image: alpine:latest
    run: curl -X POST -H "API_KEY: $API_KEY" http://example.com
    env:
      API_KEY: '{{secrets.api_key}}'
Defaults
Set defaults for all tasks:
name: my job
defaults:
  retry:
    limit: 2
  limits:
    cpus: 1
    memory: 500m
  timeout: 10m
  queue: highcpu
  priority: 3
tasks:
  - name: my task
    image: alpine:latest
    run: echo hello world
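A setting specified directly on a task would reasonably take precedence over the job-level default (an assumption based on the name "defaults"; verify against the Tork documentation). A sketch:

```yaml
name: my job
defaults:
  timeout: 10m
tasks:
  - name: quick task            # inherits the 10m default timeout
    image: alpine:latest
    run: echo hello world
  - name: slow task
    image: alpine:latest
    timeout: 1h                 # assumed to override the default for this task
    run: sleep 30
```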
Auto Delete
name: my job
autoDelete:
  after: 6h
tasks:
  - name: my task
    image: alpine:latest
    run: echo hello world
Webhooks
name: my job
webhooks:
  - url: http://example.com/my/webhook
    event: job.StateChange # or task.StateChange
    headers:
      my-header: somevalue
    if: "{{ job.State == 'COMPLETED' }}"
tasks:
  - name: my task
    image: alpine:latest
    run: echo hello world
Permissions
