Reverb
Reverb is an efficient and easy-to-use data storage and transport system designed for machine learning research. Reverb is primarily used as an experience replay system for distributed reinforcement learning algorithms but the system also supports multiple data structure representations such as FIFO, LIFO, and priority queues.
Installation
Please keep in mind that Reverb is not hardened for production use, and while we do our best to keep things in working order, things may break or segfault.
:warning: Reverb currently only supports Linux based OSes.
The recommended way to install Reverb is with pip. We also provide instructions
to build from source using the same docker images we use for releases.
TensorFlow can be installed separately or as part of the pip install.
Installing TensorFlow as part of the install ensures compatibility.
$ pip install dm-reverb[tensorflow]
# Without TensorFlow install and version dependency check.
$ pip install dm-reverb
Nightly builds
$ pip install dm-reverb-nightly[tensorflow]
# Without TensorFlow install and version dependency check.
$ pip install dm-reverb-nightly
Build from source
This guide details how to build Reverb from source.
Reverb Releases
Because of underlying libraries such as protoc and absl, each Reverb release
has to be paired with a specific version of TensorFlow. Installing Reverb with
pip install dm-reverb[tensorflow] installs the correct version of TensorFlow.
Some versions of interest are listed first, followed by a table of the
TensorFlow version paired with each Reverb release:
- 0.13.0 dropped Python 3.8 support.
- 0.11.0 first version to support Python 3.11.
- 0.10.0 last version to support Python 3.7.

Release | Branch / Tag | TensorFlow Version
------- | ------------ | ------------------
Nightly | master       | tf-nightly
0.14.0  | v0.14.0      | 2.14.0
0.13.0  | v0.13.0      | 2.14.0
0.12.0  | v0.12.0      | 2.13.0
0.11.0  | v0.11.0      | 2.12.0
0.10.0  | v0.10.0      | 2.11.0
0.9.0   | v0.9.0       | 2.10.0
0.8.0   | v0.8.0       | 2.9.0
0.7.x   | v0.7.0       | 2.8.0

Quick Start
Starting a Reverb server is as simple as:
import reverb

server = reverb.Server(tables=[
    reverb.Table(
        name='my_table',
        sampler=reverb.selectors.Uniform(),
        remover=reverb.selectors.Fifo(),
        max_size=100,
        rate_limiter=reverb.rate_limiters.MinSize(1)),
    ],
)
Create a client to communicate with the server:
client = reverb.Client(f'localhost:{server.port}')
print(client.server_info())
Write some data to the table:
# Creates a single item and data element [0, 1].
client.insert([0, 1], priorities={'my_table': 1.0})
An item can also reference multiple data elements:
# Appends three data elements and inserts a single item which references all
# of them as {'a': [2, 3, 4], 'b': [12, 13, 14]}.
with client.trajectory_writer(num_keep_alive_refs=3) as writer:
    writer.append({'a': 2, 'b': 12})
    writer.append({'a': 3, 'b': 13})
    writer.append({'a': 4, 'b': 14})

    # Create an item referencing all the data.
    writer.create_item(
        table='my_table',
        priority=1.0,
        trajectory={
            'a': writer.history['a'][:],
            'b': writer.history['b'][:],
        })

    # Block until the item has been inserted and confirmed by the server.
    writer.flush()
The items we have added to Reverb can be read by sampling them:
# client.sample() returns a generator.
print(list(client.sample('my_table', num_samples=2)))
Continue with the Reverb Tutorial for an interactive walkthrough.
Detailed overview
Experience replay has become an important tool for training off-policy
reinforcement learning policies. It is used by algorithms such as
[Deep Q-Networks (DQN)][DQN], [Soft Actor-Critic (SAC)][SAC],
[Deep Deterministic Policy Gradients (DDPG)][DDPG], and
[Hindsight Experience Replay][HER], among others. However, building an
efficient, easy-to-use, and scalable replay system can be challenging. For good
performance, Reverb is implemented in C++, and to enable distributed usage it
provides a gRPC service for adding, sampling, and updating the contents of the
tables. Python clients expose the full functionality of the service in an
easy-to-use fashion. Furthermore, native TensorFlow ops are available for
performant integration with TensorFlow and tf.data.
Although originally designed for off-policy reinforcement learning, Reverb's flexibility makes it just as useful for on-policy reinforcement learning, or even (un)supervised learning. Creative users have even used Reverb to store and distribute frequently updated data (such as model weights), using it as a lightweight in-memory alternative to a distributed file system where each table represents a file.
Tables
A Reverb Server consists of one or more tables. A table holds items, and each
item references one or more data elements. Tables also define sample and
removal selection strategies, a maximum item
capacity, and a rate limiter.
Multiple items can reference the same data element, even if these items exist in different tables. This is because items only contain references to data elements (as opposed to a copy of the data itself). This also means that a data element is only removed when there exists no item that contains a reference to it.
For example, it is possible to set up one Table as a Prioritized Experience Replay (PER) for transitions (sequences of length 2), and another Table as a (FIFO) queue of sequences of length 3. In this case the PER data could be used to train DQN, and the FIFO data to train a transition model for the environment.
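
The reference semantics described above can be illustrated with a toy model in plain Python. This is only a sketch and not how Reverb is implemented: `ToyStore`, its methods, and the table and item names are hypothetical, and Reverb itself manages data elements inside the server.

```python
from collections import Counter


class ToyStore:
    """Toy model (hypothetical): items hold references to shared data elements."""

    def __init__(self):
        self.elements = {}         # element key -> payload
        self.refcount = Counter()  # element key -> number of referencing items
        self.items = {}            # (table, item key) -> tuple of element keys

    def add_element(self, key, payload):
        self.elements[key] = payload

    def create_item(self, table, item_key, element_keys):
        self.items[(table, item_key)] = tuple(element_keys)
        for k in element_keys:
            self.refcount[k] += 1

    def delete_item(self, table, item_key):
        for k in self.items.pop((table, item_key)):
            self.refcount[k] -= 1
            if self.refcount[k] == 0:
                # No item references this element any more, so free it.
                del self.refcount[k]
                del self.elements[k]


store = ToyStore()
for step in range(3):
    store.add_element(step, {"obs": step})

# A length-2 transition in one table and a length-3 sequence in another
# share steps 0 and 1 without copying them.
store.create_item("per_table", "transition_0", [0, 1])
store.create_item("fifo_table", "sequence_0", [0, 1, 2])

# Deleting the sequence frees only step 2; steps 0 and 1 are still referenced.
store.delete_item("fifo_table", "sequence_0")
remaining = sorted(store.elements)
```

In this model an element disappears only when the last item referencing it is deleted, which mirrors the rule stated above.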

Items are automatically removed from the Table when one of two conditions is met:
- Inserting a new item would cause the number of items in the Table to exceed its maximum capacity. The Table's removal strategy determines which item to remove.
- An item has been sampled the maximum number of times permitted by the Table's max_times_sampled setting. Such an item is deleted.

Data elements that are no longer referenced by any item are also deleted.
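
The two removal conditions can be sketched with a small pure-Python model. Everything here is an assumption made for illustration: `ToyTable` is hypothetical, it hard-codes a FIFO remover and a uniform sampler, it treats `max_times_sampled=0` as "no limit", and it raises on an empty table where a real server would apply its rate limiter.

```python
import random
from collections import OrderedDict


class ToyTable:
    """Toy model of the two removal rules; not Reverb's implementation."""

    def __init__(self, max_size, max_times_sampled=0, seed=0):
        self.max_size = max_size
        self.max_times_sampled = max_times_sampled  # 0 means "no limit" here
        self.items = OrderedDict()                  # key -> payload, oldest first
        self.times_sampled = {}                     # key -> sample count
        self.rng = random.Random(seed)

    def insert(self, key, payload):
        # Rule 1: make room with the remover (FIFO here) so max_size is never exceeded.
        while len(self.items) >= self.max_size:
            oldest, _ = self.items.popitem(last=False)
            del self.times_sampled[oldest]
        self.items[key] = payload
        self.times_sampled[key] = 0

    def sample(self):
        # Uniform sampler over the items currently in the table.
        key = self.rng.choice(list(self.items))
        payload = self.items[key]
        self.times_sampled[key] += 1
        # Rule 2: drop the item once it has used up its sample budget.
        if 0 < self.max_times_sampled <= self.times_sampled[key]:
            del self.items[key]
            del self.times_sampled[key]
        return key, payload


table = ToyTable(max_size=2, max_times_sampled=1)
for i in range(3):
    table.insert(i, f"item-{i}")
keys_after_inserts = list(table.items)  # item 0 was evicted by the FIFO remover

sampled_key, _ = table.sample()
keys_after_sample = list(table.items)   # the sampled item has been removed too
```

Swapping the FIFO eviction for another policy, or the uniform choice for a weighted one, changes the table's behavior without touching the two rules themselves.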
Users have full control over how data is sampled and removed from Reverb
tables. The behavior is primarily controlled by the
item selection strategies provided to the Table
as the sampler and remover. In combination with the
rate_limiter and max_times_sampled, a wide range of
behaviors can be achieved. Some commonly used configurations include:
Uniform Experience Replay
The N=1000 most recently inserted items are maintained. Setting
sampler=reverb.selectors.Uniform() makes every item equally likely to be
selected. Because of reverb.rate_limiters.MinSize(100), sampling requests
block until 100 items have been inserted. Setting
remover=reverb.selectors.Fifo() means that when an item needs to be removed,
the oldest item is removed first.
reverb.Table(
    name='my_uniform_experience_replay_buffer',
    sampler=reverb.selectors.Uniform(),
    remover=reverb.selectors.Fifo(),
    max_size=1000,
    rate_limiter=reverb.rate_limiters.MinSize(100),
)
Examples of algorithms that make use of uniform experience replay include [SAC] and [DDPG].
Prioritized Experience Replay
The N=1000 most recently inserted items are maintained. Setting
sampler=reverb.selectors.Prioritized(priority_exponent=0.8) makes the
probability of selecting an item proportional to the item's priority raised to
the power 0.8.
Note: See [Schaul, Tom, et al.][PER] for the algorithm used in this implementation of Prioritized Experience Replay.
reverb.Table(
    name='my_prioritized_experience_replay_buffer',
    sampler=reverb.selectors.Prioritized(0.8),
    remover=reverb.selectors.Fifo(),
    max_size=1000,
    rate_limiter=reverb.rate_limiters.MinSize(100),
)
Examples of algorithms that make use of Prioritized Experience Replay are DQN (and its variants), and [Distributed Distributional Deterministic Policy Gradients][D4PG].
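
To make the effect of the priority exponent concrete, the sketch below computes selection probabilities in plain Python. It assumes the proportional prioritization rule from the PER paper, P(i) = p_i^C / sum_k p_k^C, where C is the priority exponent. The helper `sampling_probabilities` is hypothetical and not part of Reverb's API.

```python
def sampling_probabilities(priorities, priority_exponent=0.8):
    """Return p_i**C / sum_k p_k**C for each priority (assumed PER rule)."""
    weights = [p ** priority_exponent for p in priorities]
    total = sum(weights)
    return [w / total for w in weights]


priorities = [1.0, 2.0, 4.0]

# With an exponent of 0.8, higher-priority items are favored, but less
# strongly than their raw priorities would suggest.
probs = sampling_probabilities(priorities, priority_exponent=0.8)

# With an exponent of 0, every weight is 1 and sampling becomes uniform.
uniform = sampling_probabilities(priorities, priority_exponent=0.0)
```

Under this rule an exponent of 1 samples in direct proportion to priority, and values between 0 and 1 interpolate between uniform and fully prioritized sampling.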
Queue
A collection of up to N=1000 items where the oldest item is selected and
removed in the same operation. If the collection contains 1000 items, insert
calls block until it is no longer full. If the collection is empty, sample
calls block until there is at least one item.
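reverb.Table(
    name='my_queue',
    sampler=reverb.selectors.Fifo(),
    remover=reverb.selectors.Fifo(),
    max_size=1000,
    max_times_sampled=1,
    # Blocks inserts while the queue is full and samples while it is empty.
    rate_limiter=reverb.rate_limiters.Queue(size=1000),
)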
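
The queue behavior described in this section closely resembles a bounded FIFO queue. As a rough analogy only, the sketch below uses Python's standard `queue.Queue` with a capacity of 3 instead of 1000. It uses the non-blocking `put_nowait` and `get_nowait` calls so that the points where a blocking call would wait show up as exceptions.

```python
import queue

# Bounded FIFO queue: a stand-in for a table with max_size=3 whose items
# can each be read exactly once.
q = queue.Queue(maxsize=3)
for i in range(3):
    q.put_nowait(i)

try:
    q.put_nowait(3)  # a blocking put() would wait here until there is room
    insert_blocked = False
except queue.Full:
    insert_blocked = True

# Items come out oldest first, and reading an item also removes it.
drained = [q.get_nowait() for _ in range(3)]

try:
    q.get_nowait()  # a blocking get() would wait here until an item arrives
    sample_blocked = False
except queue.Empty:
    sample_blocked = True
```

The analogy covers ordering and blocking only; unlike a local queue, a Reverb table is served over gRPC and can be shared by many writers and readers.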