Flow
Exploration of a data-flow programming paradigm
Welcome!
Welcome to flow, a project for defining, compiling and running parallel,
dataflow programs like the one below (a visual representation, generated by the compiler from the flow definition and rendered with graphviz, of a
flow program that generates a sequence of fibonacci numbers).
If you are a programmer, your intuition will probably tell you a lot already about how flow works
without any explanation.
This flow program generates a fibonacci series on standard output.
It is one of the examples (fibonacci) in the flowr crate that is part of the flow project, and it was the first thing I got working (much to my own delight!).
The two inputs to add (i1 and i2) are initialized "once" (at startup) with 0 and 1.
The output (sum) is then fed back to input i2 and the value presented at input i2 previously is fed back to
input i1.
The output (sum) is also sent to the default (unnamed) input of the stdout function which prints
the value to standard output.
The program runs until integer overflow causes add to produce no output, at which point there is nothing left to run and it stops.
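The wiring described above can be mimicked in a few lines of plain Rust. This is only a sketch and does not use flow's APIs. The function name fibonacci_flow is hypothetical, and the u64 value type is an assumption. Integer overflow is modelled with checked_add: when it returns None, no output is produced and the loop stops.

```rust
/// Simulates the fibonacci flow's wiring in ordinary Rust (not flow's API).
/// Returns every value that `add` would send to `stdout`.
fn fibonacci_flow() -> Vec<u64> {
    // "once" initializers: i1 starts at 0, i2 starts at 1
    let (mut i1, mut i2): (u64, u64) = (0, 1);
    let mut printed = Vec::new();

    // `add` runs while it can produce a sum; on overflow checked_add
    // returns None, no output is produced and the loop ends
    while let Some(sum) = i1.checked_add(i2) {
        printed.push(sum); // sum -> default input of stdout
        i1 = i2;           // value previously at i2 -> i1
        i2 = sum;          // sum -> i2
    }
    printed
}

fn main() {
    for n in fibonacci_flow() {
        println!("{n}");
    }
}
```

The first values printed are 1, 2, 3, 5, 8. The last is the largest fibonacci number that fits in a u64.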
Here you can see the fibonacci flow being executed by the flowrgui graphical flow runner:

What is a dataflow program?
A data flow program consists of a graph of processes (hierarchical in this case, as a process within it can be another graph of processes, and so on down) that act on data that flow between them on defined connections.
- It is declarative: it defines which processes are used and how they are connected
- Processes are small, single-purpose and "pure". They get a series of inputs, execute an algorithm (probably written in some procedural language) and produce an output.
- The application used to run a flow (a "flow runner") provides ways for it to interact with the execution environment via "impure" functions, for things like Stdio, File System, etc.
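The structure described above (a hierarchy of processes, connections between them, and impure functions supplied by the runner) can be sketched as a recursive data type. These types are hypothetical and are not flow's real definitions. The "process/port" route strings are also an illustrative assumption.

```rust
/// Hypothetical types (not flow's real definitions) for a hierarchical
/// dataflow graph: a process is either a function or a nested flow.
#[allow(dead_code)] // names and routes are kept for illustration only
enum Process {
    /// A pure function, or an "impure" one provided by the flow runner
    Function { name: String, impure: bool },
    /// A nested graph of processes and the connections between them
    Flow { name: String, processes: Vec<Process>, connections: Vec<Connection> },
}

/// A connection between two "process/port" routes.
#[allow(dead_code)]
struct Connection {
    from: String,
    to: String,
}

impl Process {
    /// Counts (all functions, impure functions) at any depth of nesting.
    fn count_functions(&self) -> (usize, usize) {
        match self {
            Process::Function { impure, .. } => (1, usize::from(*impure)),
            Process::Flow { processes, .. } => processes
                .iter()
                .map(Process::count_functions)
                .fold((0, 0), |(t, i), (pt, pi)| (t + pt, i + pi)),
        }
    }
}

fn connect(from: &str, to: &str) -> Connection {
    Connection { from: from.to_string(), to: to.to_string() }
}

/// The fibonacci example expressed with these hypothetical types.
fn fibonacci() -> Process {
    Process::Flow {
        name: "fibonacci".to_string(),
        processes: vec![
            Process::Function { name: "add".to_string(), impure: false },
            Process::Function { name: "stdout".to_string(), impure: true },
        ],
        connections: vec![
            connect("add/sum", "add/i2"),
            connect("add/i2", "add/i1"),
            connect("add/sum", "stdout"),
        ],
    }
}

fn main() {
    let (total, impure) = fibonacci().count_functions();
    println!("{total} functions, {impure} impure");
}
```

Because a Flow can contain other Flows, the count recurses through every level of the hierarchy.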
What characteristics do they have?
Why is writing a dataflow program something interesting to explore in the first place?
Well, data flow programs define the program in terms of the processing steps that need to be done on data and the dependencies between the data, making them inherently parallelizable and distributable (and in my mind, kind of the minimal essence or expression of the algorithm).
Processes only run on data when it is available, making them "event driven" (where the "event" is the availability of data... or alternatively, the data expresses an event that needs processing done on it and some output created). They focus less on the procedural steps that need to be done and their control flow, and more on the required transformations of the data and how data flows through the program.
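The "event driven" behaviour can be illustrated with a small sketch. It assumes that each function input is a FIFO queue and that a function becomes runnable only when every input holds a value. This is an illustration of the idea, not flowrlib's implementation, and the Function type is hypothetical.

```rust
use std::collections::VecDeque;

/// Hypothetical sketch (not flowrlib's implementation): a function with one
/// FIFO queue per input, which only runs once every input has a value.
struct Function {
    inputs: Vec<VecDeque<i64>>,
    implementation: fn(&[i64]) -> i64,
}

impl Function {
    fn new(input_count: usize, implementation: fn(&[i64]) -> i64) -> Self {
        Function { inputs: vec![VecDeque::new(); input_count], implementation }
    }

    /// Data arriving on an input is queued until the function can use it.
    fn send(&mut self, input: usize, value: i64) {
        self.inputs[input].push_back(value);
    }

    /// The "event" that triggers execution: data available on all inputs.
    fn is_ready(&self) -> bool {
        self.inputs.iter().all(|queue| !queue.is_empty())
    }

    /// Runs the function if it is ready, consuming one value per input.
    fn try_run(&mut self) -> Option<i64> {
        if !self.is_ready() {
            return None;
        }
        // Unwrap is safe: is_ready() confirmed every queue is non-empty
        let args: Vec<i64> = self.inputs.iter_mut().map(|q| q.pop_front().unwrap()).collect();
        Some((self.implementation)(&args))
    }
}

fn add(args: &[i64]) -> i64 {
    args[0] + args[1]
}

fn main() {
    let mut add_fn = Function::new(2, add);
    add_fn.send(0, 2);
    assert_eq!(add_fn.try_run(), None); // i2 has no data yet, so nothing runs
    add_fn.send(1, 3);
    assert_eq!(add_fn.try_run(), Some(5));
    assert_eq!(add_fn.try_run(), None); // both inputs were consumed
}
```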
What does the flow project include?
Currently, flows are defined declaratively in a text file (toml, json or yaml are supported) that is then compiled to a flow manifest, which is executed.
The flow project includes:
- Compiler: a library and a binary (flowclib and flowc) for compiling flows
- Runner: a library (flowrlib) and two binaries for running flows:
  - flowrcli - default command line runner and debugger to use from a terminal
  - flowrgui - a GUI application for running and debugging flows
- Job executor: the flowrex binary, which can be discovered (on the same machine or local network) by a runner and used to execute jobs, distributing execution in a basic fashion
- Standard Library: flowstdlib, a library of pre-defined flows and functions that can be re-used in flows
- Examples: a set of example flows to illustrate flow programming (more to come!)
- Here is a screenshot of the mandlebrot example rendering a monochrome Mandelbrot set using flowrgui

- The book covers defining flows, the runtime semantics, command line tool options and how to use them, the flowstdlib library functions and flows, flowrcli/flowrgui's context functions and more. It is published online here.
What made me want to do it?
You can read more about what made me want to do this project, based on ideas gathered on and off over a few decades (combined with looking for a "real" project to use to learn Rust!), in the book's Inspirations for flow section. The core reason is: I wanted to know if I could make it work, based only on rough ideas and intuition I had in my head, having stopped being a Software Engineer many years ago (I had no real formal knowledge in this area and had not read books or papers on it; that came later, after I did it).
I implemented the runtime "semantics" as I needed them while implementing the examples. It's been a journey of discovery: of writing something like this (for me), of learning Rust in the process and of learning how such a programming paradigm could work. I learned it could work, but that it requires a change in how you think about programming (with procedural programming so ingrained in us). Sometimes I struggled to think about relatively simple algorithms in a completely new way. This reminded me of getting stuck trying to write a loop in Prolog at University. If you're trying to write a loop... "you're thinking about it wrong".
Installing
You can install many of the crates from crates.io, but due to unresolved issues in packaging
non-source files, a complete, working installation cannot yet be achieved using cargo install.
The workaround in the meantime is to clone the repo and build everything from source (see below).
Building flow
For more details on how to build flow locally and contribute to it, please see
building flow
Install the dependencies with make config, then run make, which builds everything and installs the flowc,
flowr and flowrex binaries.
NOTE: Building flowstdlib for the first time will take a long time, as it compiles many Rust functions to
WebAssembly.
Running your first 'flow'
With flowc and flowr installed, you can run the 'fibonacci' example flow using:
cargo run --example fibonacci
You should see a series of fibonacci numbers output to the terminal.
The first flow section of the book walks you through it.
Tech decisions
Job/Work Distribution - with Threads
Flow was started before async landed in Rust, so it uses a manually managed thread pool for executing "jobs" (functions with their set of inputs). Rewriting it in async Rust would make sense in some areas but would be a substantial chunk of disruptive work, so I haven't done it yet.
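A manually managed thread pool of this kind can be sketched with the standard library alone. Here it uses std::sync::mpsc channels and a shared receiver. This is not flowrlib's implementation. The Job and JobResult types, and run_jobs, are hypothetical; each job is a function paired with its set of inputs.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// A hypothetical "job": a function plus the set of inputs it runs on.
struct Job {
    id: usize,
    implementation: fn(&[i64]) -> i64,
    inputs: Vec<i64>,
}

/// The result of a job, sent back to the coordinating thread.
struct JobResult {
    id: usize,
    output: i64,
}

fn add(inputs: &[i64]) -> i64 {
    inputs.iter().sum()
}

/// Runs jobs on a fixed pool of worker threads and returns the results
/// sorted by job id (workers may finish in any order).
fn run_jobs(jobs: Vec<Job>, workers: usize) -> Vec<JobResult> {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (result_tx, result_rx) = mpsc::channel::<JobResult>();
    // All workers share one receiver, so each job is taken by exactly one thread
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so it is held only while taking the next job
                let job = match job_rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // channel closed and empty: no more jobs
                };
                let output = (job.implementation)(&job.inputs);
                result_tx.send(JobResult { id: job.id, output }).unwrap();
            })
        })
        .collect();

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // closing the job channel lets idle workers exit
    drop(result_tx); // now only the workers hold result senders

    // Ends once every worker has exited and dropped its sender
    let mut results: Vec<JobResult> = result_rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    results.sort_by_key(|r| r.id);
    results
}

fn main() {
    let jobs = (0..5)
        .map(|id| Job { id, implementation: add, inputs: vec![id as i64, 1] })
        .collect();
    for result in run_jobs(jobs, 3) {
        println!("job {} -> {}", result.id, result.output);
    }
}
```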
Message Passing - with ZeroMQ
I started with channels for distributing Jobs and results between threads. I wanted to enable distributing work across the (local, for now) network, so I moved to ZeroMQ message queues and message passing. This is used for both inter-thread and inter-process message passing. The ZeroMQ Rust bindings don't support all socket types (at the time of writing), so I had to use the REQ/REP pattern, which restricts the protocol, including which end writes first, something I also had to work around. For a while I kept
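To illustrate the REQ/REP restriction, here is a small state machine. It models the strict send-then-receive alternation that a ZeroMQ REQ socket enforces. It is a hypothetical model, not the zmq crate's API. The "ready for work" opening request is one common workaround for "which end writes first", and it is an assumption here, not necessarily what flow does. The sketch also assumes the job executor holds the REQ end.

```rust
/// Hypothetical model (not the zmq crate's API) of the lock-step that a
/// ZeroMQ REQ socket enforces: send, receive, send, receive...
#[derive(Debug, PartialEq)]
enum ReqState {
    ReadyToSend,
    AwaitingReply,
}

#[derive(Debug, PartialEq)]
enum ProtocolError {
    SendWhileAwaitingReply,
    ReceiveBeforeSend,
}

struct ReqSocketModel {
    state: ReqState,
}

impl ReqSocketModel {
    fn new() -> Self {
        ReqSocketModel { state: ReqState::ReadyToSend }
    }

    /// A REQ socket may only send when it is not waiting for a reply.
    fn send(&mut self) -> Result<(), ProtocolError> {
        match self.state {
            ReqState::ReadyToSend => {
                self.state = ReqState::AwaitingReply;
                Ok(())
            }
            ReqState::AwaitingReply => Err(ProtocolError::SendWhileAwaitingReply),
        }
    }

    /// A REQ socket may only receive after it has sent a request.
    fn recv(&mut self) -> Result<(), ProtocolError> {
        match self.state {
            ReqState::AwaitingReply => {
                self.state = ReqState::ReadyToSend;
                Ok(())
            }
            ReqState::ReadyToSend => Err(ProtocolError::ReceiveBeforeSend),
        }
    }
}

fn main() {
    // Assumed: the executor holds the REQ end, so it must write first
    let mut executor = ReqSocketModel::new();
    executor.send().unwrap(); // "ready for work"
    executor.recv().unwrap(); // reply carries a job
    executor.send().unwrap(); // next request carries that job's result
    // Sending again before the reply arrives breaks the protocol
    assert_eq!(executor.send(), Err(ProtocolError::SendWhileAwaitingReply));
}
```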
Discovery - with mDNS and beacons
To discover "executors" (processes with threads able to execute flow jobs) on the network, I wrote my own small discovery crate (as I couldn't get libp2p mDNS or other mDNS crates to work). I'm not very happy with it, as it frequently ties up ports and has other issues I have had to work around.
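The general idea of beacon-based discovery can be sketched with std::net::UdpSocket. A service periodically announces its name and address, and a runner listens for the announcements it cares about. This is not the author's discovery crate. The "service@address" message format and the 192.168.1.20:5555 address are made-up illustrations. The demo also uses loopback unicast, whereas a real beacon would typically be broadcast on the local network.

```rust
use std::net::UdpSocket;
use std::time::Duration;

/// Hypothetical beacon format (not the author's crate): "<service>@<address>".
fn beacon_message(service: &str, address: &str) -> String {
    format!("{service}@{address}")
}

/// Returns the advertised address if the beacon is for `service`.
fn parse_beacon<'a>(message: &'a str, service: &str) -> Option<&'a str> {
    let (name, address) = message.split_once('@')?;
    (name == service).then_some(address)
}

fn main() -> std::io::Result<()> {
    // Both sockets bind to OS-chosen ports on loopback for this demo
    let listener = UdpSocket::bind("127.0.0.1:0")?;
    listener.set_read_timeout(Some(Duration::from_secs(2)))?;
    let executor = UdpSocket::bind("127.0.0.1:0")?;

    // The executor announces the service it offers and where to reach it
    let msg = beacon_message("flowrex", "192.168.1.20:5555");
    executor.send_to(msg.as_bytes(), listener.local_addr()?)?;

    // The runner listens and keeps only the beacons it is looking for
    let mut buf = [0u8; 256];
    let (len, _from) = listener.recv_from(&mut buf)?;
    let received = String::from_utf8_lossy(&buf[..len]);
    match parse_beacon(&received, "flowrex") {
        Some(address) => println!("found executor at {address}"),
        None => println!("ignored beacon: {received}"),
    }
    Ok(())
}
```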
Portability - with WebAssembly (WASM)
Library functions are compiled both to native code (which can optionally be linked statically into a flow runner, controlled by a feature) AND to WASM (with their size optimized to around 110KB), and are described in a library manifest. Libraries are referenced from a flow's compiled manifest: if a library is statically linked, its native implementations can be used, or its supplied WASM files can be used instead, under control of an option. I have used this to run a flow program on my Mac, with the flowrex job executor running on a connected Raspberry Pi, executing jobs either natively or as WASM.
When a user writes a new flow that includes a "provided implementation" (a custom function used in the flow), they write it in Rust; it is then compiled to WASM and loaded at run time.
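The choice between a statically linked native implementation and a WASM file can be pictured as a small resolution step. Everything here is hypothetical: the Implementation enum, resolve, the prefer_native flag standing in for the runner's option, and the "math/add.wasm" path. Actually loading and running WASM is out of scope for this sketch.

```rust
use std::collections::HashMap;

/// Where a library function's implementation comes from (hypothetical).
enum Implementation {
    /// Native code statically linked into the runner
    Native(fn(&[i64]) -> i64),
    /// Path to a WASM file listed in the library manifest
    Wasm(String),
}

fn add(inputs: &[i64]) -> i64 {
    inputs.iter().sum()
}

/// Prefers the statically linked native version when `prefer_native` is set
/// and one exists; otherwise falls back to the WASM file in the manifest.
fn resolve(
    name: &str,
    native: &HashMap<&str, fn(&[i64]) -> i64>,
    wasm_manifest: &HashMap<&str, &str>,
    prefer_native: bool,
) -> Option<Implementation> {
    if prefer_native {
        if let Some(f) = native.get(name) {
            return Some(Implementation::Native(*f));
        }
    }
    wasm_manifest.get(name).map(|path| Implementation::Wasm(path.to_string()))
}

fn main() {
    let mut native: HashMap<&str, fn(&[i64]) -> i64> = HashMap::new();
    native.insert("math/add", add);
    let wasm_manifest = HashMap::from([("math/add", "math/add.wasm")]);

    match resolve("math/add", &native, &wasm_manifest, true) {
        Some(Implementation::Native(f)) => println!("native result: {}", f(&[2, 3])),
        Some(Implementation::Wasm(path)) => println!("would load {path}"),
        None => println!("no implementation found"),
    }
}
```

A function that is not statically linked, such as a user's provided implementation, would always resolve to its WASM file under this scheme.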
Client - Server
I knew I wanted to be able to distribute flow execution between processes, to have a background process coordinate execution and execute jobs, to support different UIs (CLI, GUI), and to be able to use standard input/output from the CLI. So the "context functions" (impure functions that interact with the environment where a flow runs) are implemented in the "runner", and can have CLI or GUI implementations. That led to some messy client/server message passing, which is now pretty stable and works for both CLI and GUI with the same backend (in flowrlib) coordinating a flow and executing jobs, though with some complexity.
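The idea of one backend driving interchangeable CLI and GUI context functions can be sketched with a trait. This sketch leaves out the client/server message passing entirely and calls the implementation directly. The Stdout trait, CliStdout, GuiStdout and run_flow are hypothetical names, not flowrlib's API.

```rust
/// Hypothetical trait for an impure "context function" (not flowrlib's API).
trait Stdout {
    fn print_line(&mut self, line: &str);
}

/// CLI runner: writes straight to the terminal's standard output.
struct CliStdout;

impl Stdout for CliStdout {
    fn print_line(&mut self, line: &str) {
        println!("{line}");
    }
}

/// GUI runner: collects lines so a window can display them.
#[derive(Default)]
struct GuiStdout {
    lines: Vec<String>,
}

impl Stdout for GuiStdout {
    fn print_line(&mut self, line: &str) {
        self.lines.push(line.to_string());
    }
}

/// Stand-in for the shared backend: it sends a flow's outputs to whichever
/// context implementation the client provided.
fn run_flow(outputs: &[u64], stdout: &mut dyn Stdout) {
    for value in outputs {
        stdout.print_line(&value.to_string());
    }
}

fn main() {
    run_flow(&[1, 2, 3], &mut CliStdout);
    let mut gui = GuiStdout::default();
    run_flow(&[5, 8], &mut gui);
    println!("GUI captured {:?}", gui.lines);
}
```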
Testing
Testing coverage is about 85%-90% and I try to keep it high. There are simple unit tests for functions, a lot of tests around the compiler semantics, integration tests of flow compile/run errors, and of flows that compile & run correctly, and the examples all have suppl
