Welcome!

Welcome to flow, a project for defining, compiling and running parallel dataflow programs like the one below. The image is a visual representation, generated by the compiler from the flow definition and rendered with graphviz, of a flow program that generates a sequence of Fibonacci numbers.

If you are a programmer, your intuition will probably tell you a lot already about how flow works without any explanation.

First flow

This flow program generates a Fibonacci series on standard output. It is one of the examples (fibonacci) in the flowr crate that is part of the flow project, and the first thing I got working (much to my own delight!).

The two inputs to add (i1 and i2) are initialized "once" (at startup) with 0 and 1. The output (sum) is then fed back to input i2 and the value presented at input i2 previously is fed back to input i1. The output (sum) is also sent to the default (unnamed) input of the stdout function which prints the value to standard output. The program runs until integer overflow causes no output to be produced and it stops.
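The feedback loop described above can be imitated in plain Rust to make the data movement concrete. This is only a sketch of the behaviour, not how flow defines or runs the program: the function name fibonacci_feedback, the limit parameter and the u64 integer width are all assumptions made for illustration.

```rust
/// Imitates the add/feedback loop in plain Rust (not the flow runtime).
/// `limit` is a hypothetical cap so the demo does not have to run to overflow.
fn fibonacci_feedback(limit: usize) -> Vec<u64> {
    // The two inputs of `add`, initialized once with 0 and 1.
    let (mut i1, mut i2): (u64, u64) = (0, 1);
    let mut outputs = Vec::new();

    while outputs.len() < limit {
        match i1.checked_add(i2) {
            Some(sum) => {
                outputs.push(sum); // the value that would be sent to stdout
                i1 = i2; // the previous i2 is fed back to i1
                i2 = sum; // the sum is fed back to i2
            }
            // Overflow: no output is produced, so nothing more can run.
            None => break,
        }
    }
    outputs
}

fn main() {
    for value in fibonacci_feedback(10) {
        println!("{value}");
    }
}
```

With a large enough limit the loop ends by itself when the addition overflows, which mirrors the way the flow stops once no new output is produced.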

Here you can see it being executed by the flowrgui graphical flow runner:

Fibonacci Series

What is a `dataflow program`?

A data flow program consists of a graph of processes (hierarchical in this case, as a process within it can be another graph of processes, and so on down) that act on data that flow between them on defined connections.

- It is declarative and defines what processes are used, and how they are connected.
- Processes are small, single purpose and "pure". They get a series of inputs, execute an algorithm (probably written in some procedural language) and produce an output.
- The application used to run a flow (a "flow runner") provides ways for it to interact with the execution environment via "impure" functions, for things like Stdio, File System, etc.

What characteristics do they have?

Why is writing a dataflow program something interesting to explore in the first place?

Well, data flow programs define the program in terms of the processing steps that need to be done on data and the dependencies between the data, making them inherently parallelizable and distributable (and in my mind, kind of the minimal essence or expression of the algorithm).

Processes only run on data when it is available, making them "event driven" (where the "event" is the availability of data; alternatively, the data expresses an event that needs processing done on it and some output created). They are not focussed so much on the procedural steps that need to be done and their control flow, but on the required transformations to the data and on data flow through the program.
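The idea that a process runs only when its data is available can be sketched in a few lines of Rust. This is not flow's implementation: the Process struct, its send, ready and run methods, and the use of i64 values are hypothetical names and types chosen for illustration.

```rust
use std::collections::VecDeque;

/// A hypothetical process: one queue of pending values per input.
struct Process {
    inputs: Vec<VecDeque<i64>>,
}

impl Process {
    fn new(input_count: usize) -> Self {
        Process { inputs: vec![VecDeque::new(); input_count] }
    }

    /// Data arriving on an input is the "event".
    fn send(&mut self, input: usize, value: i64) {
        self.inputs[input].push_back(value);
    }

    /// Ready only when every input holds at least one value.
    fn ready(&self) -> bool {
        self.inputs.iter().all(|queue| !queue.is_empty())
    }

    /// Takes one value from each input and applies a pure function to them.
    /// Returns None, leaving the inputs untouched, if the process is not ready.
    fn run(&mut self, function: fn(&[i64]) -> i64) -> Option<i64> {
        if !self.ready() {
            return None;
        }
        let args: Vec<i64> = self
            .inputs
            .iter_mut()
            .filter_map(|queue| queue.pop_front())
            .collect();
        Some(function(&args))
    }
}

/// A pure function: its output depends only on its inputs.
fn add(args: &[i64]) -> i64 {
    args.iter().sum()
}

fn main() {
    let mut process = Process::new(2);
    process.send(0, 2);
    println!("ready after one input: {}", process.ready());
    process.send(1, 3);
    println!("ready after both inputs: {}", process.ready());
    println!("output: {:?}", process.run(add));
}
```

Nothing in the sketch says when add should run; it becomes runnable purely because both of its inputs have received data.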

What does the `flow` project include?

Currently, flows are defined declaratively in a text file (toml, json or yaml are supported) that is then compiled to a flow manifest, which is executed.

The flow project includes:

- Compiler: a library and a binary (flowclib and flowc) for compiling flows
- Runner: a library (flowrlib) and two binaries for running flows:
  - flowrcli - default command line runner and debugger to use from a terminal
  - flowrgui - a GUI application for running and debugging flows
- Job executor: the flowrex binary can be discovered (on the same machine or local network) by a runner and used to execute jobs, distributing execution in a basic fashion
- Standard Library: the flowstdlib library of pre-defined flows and functions that can be re-used in flows
- Examples: a set of example flows to illustrate flow programming (more to come!)
  - Here is a screenshot of the mandlebrot example rendering a monochrome Mandelbrot set using flowrgui

Mandelbrot set

The book covers defining flows, the runtime semantics, command line tool options and how to use them, the flowstdlib library functions and flows, flowrcli/flowrgui's context functions and more. It is published online here.

What made me want to do it?

You can read more about what made me want to do this project, based on ideas gathered over a few decades on and off (combined with looking for a "real" project to use to learn Rust!) in the book's Inspirations for flow section. The core reason is that I wanted to know if I could do it and make it work, based only on rough ideas and intuition I had in my head, having stopped being a Software Engineer many years ago. I had no real formal knowledge in this area and had not read the books and papers; that came later, after I did it.

I implemented the runtime "semantics" as needed as I implemented the examples. It's been a journey of discovery: of writing something like this (for me), learning Rust in the process and learning how such a programming paradigm could work. I learned it could work, but it requires a change in how you think about programming (with procedural programming so ingrained in us). Sometimes I struggled to think about relatively simple algorithms in a completely new way. This reminded me of when I got stuck trying to write a loop in Prolog, at university. If you're trying to write a loop... "you're thinking about it wrong".

Installing

You can install many of the crates from crates.io, but due to unresolved issues in packaging non-source files, a complete working installation cannot yet be achieved using cargo install.

The workaround in the meantime is to clone the repo and build all from source (see below).

Building `flow`

For more details on how to build flow locally and contribute to it, please see building flow. Install the dependencies with make config, then run make, which builds everything and installs the flowc, flowr and flowrex binaries.

NOTE: Building flowstdlib the first time will take a long time, as it compiles many Rust functions to WebAssembly.

Running your first 'flow'

With flowc and flowr installed, you can run the 'fibonacci' example flow using:

cargo run --example fibonacci

You should get a fibonacci series of numbers output to the terminal.

The first flow section of the book walks you through it.

Tech decisions

Job/Work Distribution - with Threads

Flow was started before async landed in rust, and so it uses a manually managed thread pool for executing "jobs" (functions with their set of inputs). Rewriting in async rust would make sense in some areas but be quite a chunk of disruptive work, so I haven't done it yet.
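As an illustration of what a manually managed thread pool executing jobs can look like, here is a minimal sketch using only the Rust standard library. It is not flow's actual code: the Job struct, the run_jobs function and the use of std::sync::mpsc channels are assumptions made for the example.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// A hypothetical job: a function together with its set of inputs.
struct Job {
    id: usize,
    function: fn(&[i64]) -> i64,
    inputs: Vec<i64>,
}

fn add(inputs: &[i64]) -> i64 {
    inputs.iter().sum()
}

/// Runs all jobs on `workers` threads and returns (job id, output) pairs sorted by id.
fn run_jobs(jobs: Vec<Job>, workers: usize) -> Vec<(usize, i64)> {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (result_tx, result_rx) = mpsc::channel::<(usize, i64)>();
    // The receiver is shared so that any idle worker can take the next job.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while waiting for a job, not while running it.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok(job) => {
                        let output = (job.function)(&job.inputs);
                        result_tx.send((job.id, output)).unwrap();
                    }
                    Err(_) => break, // job channel closed: no more work
                }
            })
        })
        .collect();
    drop(result_tx); // only the workers hold result senders now

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // closing the channel tells the workers to stop

    // Collecting ends once every worker has exited and dropped its sender.
    let mut results: Vec<(usize, i64)> = result_rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    results.sort();
    results
}

fn main() {
    let jobs = vec![
        Job { id: 0, function: add, inputs: vec![0, 1] },
        Job { id: 1, function: add, inputs: vec![1, 1] },
        Job { id: 2, function: add, inputs: vec![1, 2] },
    ];
    for (id, output) in run_jobs(jobs, 2) {
        println!("job {id} -> {output}");
    }
}
```

Results can arrive in any order because the jobs run in parallel, which is why the sketch tags each result with its job id and sorts before returning.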

Message Passing - with ZeroMQ

I started with channels for distributing jobs and results between threads. I wanted to enable distributing work across the network (local only, for now), so I moved to ZeroMQ message queues and message passing. The same mechanism is used for both inter-thread and inter-process message passing. The ZeroMQ Rust bindings don't support all socket types (at the time of writing), so I had to use the REQ/REP pattern, which places some restrictions on the protocol and on which end writes first, which I also had to work around. For a while I kept

Discovery - with mDNS and beacons

To discover "executors" (processes with threads, able to execute flow jobs) on the network I wrote my own small discovery crate (as I couldn't get libp2p mDNS or other mDNS crates to work). I am not very happy with it, as it frequently ties up ports and has other issues I have had to work around.

Portability - with WebAssembly (WASM)

Library functions are compiled to native code, which can optionally be linked statically into a flow runner using a feature, and are also compiled to WASM (with their size optimized to around 110KB) and described in a library manifest. Libraries are referenced from a flow's compiled manifest. If the library is already statically linked, then either the native implementations or the supplied WASM files can be used, under control of an option. I have used this to have a flow program running on my Mac, with the flowrex job executor running on a connected Raspberry Pi, using either native or WASM implementations.
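The choice between native and WASM implementations described above can be pictured as a small decision function. This is a sketch only: the Implementation enum, the select_implementation function and the prefer_native flag are hypothetical names, and the real runner's option and logic may differ.

```rust
/// Which form of a library function a runner would execute (hypothetical names).
#[derive(Debug, PartialEq)]
enum Implementation {
    Native,
    Wasm,
}

/// `statically_linked`: the library was linked into this runner at build time.
/// `prefer_native`: stands in for the runner option mentioned above.
fn select_implementation(statically_linked: bool, prefer_native: bool) -> Implementation {
    if statically_linked && prefer_native {
        Implementation::Native
    } else {
        // Either the option asked for WASM, or no native version is linked.
        Implementation::Wasm
    }
}

fn main() {
    println!("{:?}", select_implementation(true, true));
    println!("{:?}", select_implementation(true, false));
    println!("{:?}", select_implementation(false, true));
}
```

The point of the sketch is that WASM is always available as a portable fallback, while native execution depends on how the runner was built.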

When a user writes a new flow and includes a "provided implementation" (a custom function used in the flow), they write it in rust and it is compiled to WASM and loaded at run time.

Client - Server

I knew I wanted to be able to distribute flow execution between processes, and I knew I wanted a background process to coordinate execution and execute jobs, with different UIs (CLI, GUI) and the ability to use standard input/output from the CLI. So, the "context functions" (impure functions that interact with the environment where a flow runs) are implemented in the "runner" and can be CLI or GUI implementations. That led to some messy client/server message passing, which is now pretty stable and works on both CLI and GUI with the same backend (in flowrlib) coordinating a flow and executing jobs, but with some complexity.

Testing

Testing coverage is about 85%-90% and I try to keep it high. There are simple unit tests for functions, a lot of tests around the compiler semantics, integration tests of flow compile/run errors, and of flows that compile & run correctly, and the examples all have suppl
