SkillAgentSearch skills...

Faust

Python Stream Processing

Install / Use

/learn @robinhood/Faust
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

.. XXX Need to change this image to readthedocs before release

.. image:: https://raw.githubusercontent.com/robinhood/faust/8ee5e209322d9edf5bdb79b992ef986be2de4bb4/artwork/banner-alt1.png

=========================== Deprecation Notice

This library has been deprecated and no longer managed or supported. The current active community project can be found at https://github.com/faust-streaming/faust

=========================== Python Stream Processing

|build-status| |coverage| |license| |wheel| |pyversion| |pyimp|

:Version: 1.10.4 :Web: http://faust.readthedocs.io/ :Download: http://pypi.org/project/faust :Source: http://github.com/robinhood/faust :Keywords: distributed, stream, async, processing, data, queue, state management

.. sourcecode:: python

# Python Streams
# Forever scalable event processing & in-memory durable K/V store;
# as a library w/ asyncio & static typing.
import faust

Faust is a stream processing library, porting the ideas from Kafka Streams_ to Python.

It is used at Robinhood_ to build high performance distributed systems and real-time data pipelines that process billions of events every day.

Faust provides both stream processing and event processing, sharing similarity with tools such as Kafka Streams, Apache Spark/Storm/Samza/Flink_,

It does not use a DSL, it's just Python! This means you can use all your favorite Python libraries when stream processing: NumPy, PyTorch, Pandas, NLTK, Django, Flask, SQLAlchemy, ++

Faust requires Python 3.6 or later for the new async/await_ syntax, and variable type annotations.

Here's an example processing a stream of incoming orders:

.. sourcecode:: python

app = faust.App('myapp', broker='kafka://localhost')

# Models describe how messages are serialized:
# {"account_id": "3fae-...", amount": 3}
class Order(faust.Record):
    account_id: str
    amount: int

@app.agent(value_type=Order)
async def order(orders):
    async for order in orders:
        # process infinite stream of orders.
        print(f'Order for {order.account_id}: {order.amount}')

The Agent decorator defines a "stream processor" that essentially consumes from a Kafka topic and does something for every event it receives.

The agent is an async def function, so can also perform other operations asynchronously, such as web requests.

This system can persist state, acting like a database. Tables are named distributed key/value stores you can use as regular Python dictionaries.

Tables are stored locally on each machine using a super fast embedded database written in C++, called RocksDB_.

Tables can also store aggregate counts that are optionally "windowed" so you can keep track of "number of clicks from the last day," or "number of clicks in the last hour." for example. Like Kafka Streams_, we support tumbling, hopping and sliding windows of time, and old windows can be expired to stop data from filling up.

For reliability we use a Kafka topic as "write-ahead-log". Whenever a key is changed we publish to the changelog. Standby nodes consume from this changelog to keep an exact replica of the data and enables instant recovery should any of the nodes fail.

To the user a table is just a dictionary, but data is persisted between restarts and replicated across nodes so on failover other nodes can take over automatically.

You can count page views by URL:

.. sourcecode:: python

# data sent to 'clicks' topic sharded by URL key.
# e.g. key="http://example.com" value="1"
click_topic = app.topic('clicks', key_type=str, value_type=int)

# default value for missing URL will be 0 with `default=int`
counts = app.Table('click_counts', default=int)

@app.agent(click_topic)
async def count_click(clicks):
    async for url, count in clicks.items():
        counts[url] += count

The data sent to the Kafka topic is partitioned, which means the clicks will be sharded by URL in such a way that every count for the same URL will be delivered to the same Faust worker instance.

Faust supports any type of stream data: bytes, Unicode and serialized structures, but also comes with "Models" that use modern Python syntax to describe how keys and values in streams are serialized:

.. sourcecode:: python

# Order is a json serialized dictionary,
# having these fields:

class Order(faust.Record):
    account_id: str
    product_id: str
    price: float
    quantity: float = 1.0

orders_topic = app.topic('orders', key_type=str, value_type=Order)

@app.agent(orders_topic)
async def process_order(orders):
    async for order in orders:
        # process each order using regular Python
        total_price = order.price * order.quantity
        await send_order_received_email(order.account_id, order)

Faust is statically typed, using the mypy type checker, so you can take advantage of static types when writing applications.

The Faust source code is small, well organized, and serves as a good resource for learning the implementation of Kafka Streams_.

Learn more about Faust in the introduction_ introduction page to read more about Faust, system requirements, installation instructions, community resources, and more.

or go directly to the quickstart_ tutorial to see Faust in action by programming a streaming application.

then explore the User Guide_ for in-depth information organized by topic.

.. _Robinhood: http://robinhood.com .. _async/await: https://medium.freecodecamp.org/a-guide-to-asynchronous-programming-in-python-with-asyncio-232e2afa44f6 .. _Celery: http://celeryproject.org .. _Kafka Streams: https://kafka.apache.org/documentation/streams .. _Apache Spark: http://spark.apache.org .. _Storm: http://storm.apache.org .. _Samza: http://samza.apache.org .. _Flink: http://flink.apache.org .. _RocksDB: http://rocksdb.org .. _Apache Kafka: https://kafka.apache.org

.. _introduction: http://faust.readthedocs.io/en/latest/introduction.html

.. _quickstart: http://faust.readthedocs.io/en/latest/playbooks/quickstart.html

.. _User Guide: http://faust.readthedocs.io/en/latest/userguide/index.html

Faust is...

Simple Faust is extremely easy to use. To get started using other stream processing solutions you have complicated hello-world projects, and infrastructure requirements. Faust only requires Kafka, the rest is just Python, so If you know Python you can already use Faust to do stream processing, and it can integrate with just about anything.

Here's one of the easier applications you can make::

    import faust

    class Greeting(faust.Record):
        from_name: str
        to_name: str

    app = faust.App('hello-app', broker='kafka://localhost')
    topic = app.topic('hello-topic', value_type=Greeting)

    @app.agent(topic)
    async def hello(greetings):
        async for greeting in greetings:
            print(f'Hello from {greeting.from_name} to {greeting.to_name}')

    @app.timer(interval=1.0)
    async def example_sender(app):
        await hello.send(
            value=Greeting(from_name='Faust', to_name='you'),
        )

    if __name__ == '__main__':
        app.main()

You're probably a bit intimidated by the `async` and `await` keywords,
but you don't have to know how ``asyncio`` works to use
Faust: just mimic the examples, and you'll be fine.

The example application starts two tasks: one is processing a stream,
the other is a background thread sending events to that stream.
In a real-life application, your system will publish
events to Kafka topics that your processors can consume from,
and the background thread is only needed to feed data into our
example.

Highly Available Faust is highly available and can survive network problems and server crashes. In the case of node failure, it can automatically recover, and tables have standby nodes that will take over.

Distributed Start more instances of your application as needed.

Fast A single-core Faust worker instance can already process tens of thousands of events every second, and we are reasonably confident that throughput will increase once we can support a more optimized Kafka client.

Flexible Faust is just Python, and a stream is an infinite asynchronous iterator. If you know how to use Python, you already know how to use Faust, and it works with your favorite Python libraries like Django, Flask, SQLAlchemy, NTLK, NumPy, SciPy, TensorFlow, etc.

.. _introduction: http://faust.readthedocs.io/en/latest/introduction.html

.. _quickstart: http://faust.readthedocs.io/en/latest/playbooks/quickstart.html

.. _User Guide: http://faust.readthedocs.io/en/latest/userguide/index.html

Installation

You can install Faust either via the Python Package Index (PyPI) or from source.

To install using pip:

.. sourcecode:: console

$ pip install -U faust

.. _bundles:

Bundles

Faust also defines a group of setuptools extensions that can be used to install Faust and the dependencies for a given feature.

You can specify these in your requirements or on the pip command-line by using brackets. Separate multiple bundles using the comma:

.. sourcecode:: console

$ pip install "faust[rocksdb]"

$ pip install "faust[rocksdb,uvloop,fast,redis]"

The following bundles are available:

Stores


:``faust[rocksdb]``:
    for using `RocksDB`_ for storing Faust table state.

    **Recommended in production.**


.. _`RocksDB`: http://rocksdb.org

Caching

:faust[redis]: for using Redis_ as a simple caching backend (Memcached-style).

Codecs


:``faust[yaml]``:
    f

Related Skills

View on GitHub
GitHub Stars6.8k
CategoryDevelopment
Updated1d ago
Forks535

Languages

Python

Security Score

85/100

Audited on Mar 25, 2026

No findings