Tensorforce
Tensorforce: a TensorFlow library for applied reinforcement learning
This project is no longer maintained!
Introduction
Tensorforce is an open-source deep reinforcement learning framework, with an emphasis on modularized flexible library design and straightforward usability for applications in research and practice. Tensorforce is built on top of Google's TensorFlow framework and requires Python 3.
Tensorforce follows a set of high-level design choices which differentiate it from other similar libraries:
- Modular component-based design: Feature implementations, above all, strive to be as generally applicable and configurable as possible, potentially at the cost of faithfully resembling the details of the introducing paper.
- Separation of RL algorithm and application: Algorithms are agnostic to the type and structure of inputs (states/observations) and outputs (actions/decisions), as well as the interaction with the application environment.
- Full-on TensorFlow models: The entire reinforcement learning logic, including control flow, is implemented in TensorFlow, to enable portable computation graphs independent of application programming language, and to facilitate the deployment of models.
Quicklinks
- Documentation and update notes
- Contact and Gitter channel
- Benchmarks and projects using Tensorforce
- Roadmap and contribution guidelines
- GitHub Sponsors and Liberapay
Table of contents
- Installation
- Quickstart example code
- Command line usage
- Features
- Environment adapters
- Support, feedback and donating
- Core team and contributors
- Cite Tensorforce
Installation
A stable version of Tensorforce is periodically published on PyPI and can be installed as follows:

```bash
pip3 install tensorforce
```
To always use the latest version of Tensorforce, install the GitHub version instead:
```bash
git clone https://github.com/tensorforce/tensorforce.git
pip3 install -e tensorforce
```
Note on installation on M1 Macs: At the moment TensorFlow, which is a core dependency of Tensorforce, cannot be installed on M1 Macs directly. Follow the "M1 Macs" section in the documentation for a workaround.
Environments require additional packages, for which setup options are available (`ale`, `gym`, `retro`, `vizdoom`, `carla`; or `envs` for all environments); however, some environments also require additional tools to be installed separately (see the environments documentation). Other setup options include `tfa` for TensorFlow Addons and `tune` for HpBandSter, which is required for the `tune.py` script.
Note on GPU usage: Unlike (un)supervised deep learning, RL does not always benefit from running on a GPU, depending on environment and agent configuration. In particular, for environments with low-dimensional state spaces (i.e., no images), it is worth trying to run on CPU only.
Quickstart example code
```python
from tensorforce import Agent, Environment

# Pre-defined or custom environment
environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)

# Instantiate a Tensorforce agent
agent = Agent.create(
    agent='tensorforce',
    environment=environment,  # alternatively: states, actions, (max_episode_timesteps)
    memory=10000,
    update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(type='adam', learning_rate=3e-4),
    policy=dict(network='auto'),
    objective='policy_gradient',
    reward_estimation=dict(horizon=20)
)

# Train for 300 episodes
for _ in range(300):

    # Initialize episode
    states = environment.reset()
    terminal = False

    while not terminal:
        # Episode timestep
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)

agent.close()
environment.close()
```
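Instead of writing the act-observe loop by hand, the interaction can also be delegated to Tensorforce's `Runner` execution utility. The snippet below is a sketch along the lines of the Tensorforce documentation; the agent and environment specifications (here a PPO agent with an assumed `batch_size`, and the Gym CartPole environment) should be adjusted to your setup:

```python
from tensorforce.execution import Runner

# Runner manages the agent-environment interaction loop
runner = Runner(
    agent=dict(agent='ppo', batch_size=10),  # alternatively a config file or Agent instance
    environment=dict(environment='gym', level='CartPole'),
    max_episode_timesteps=500
)

runner.run(num_episodes=200)                    # training
runner.run(num_episodes=100, evaluation=True)   # evaluation
runner.close()
```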
Command line usage
Tensorforce comes with a range of example configurations for different popular reinforcement learning environments. For instance, to run Tensorforce's implementation of the popular Proximal Policy Optimization (PPO) algorithm on the OpenAI Gym CartPole environment, execute the following line:
```bash
python3 run.py --agent benchmarks/configs/ppo.json --environment gym \
    --level CartPole-v1 --episodes 100
```
For more information check out the documentation.
Features
- Network layers: Fully-connected, 1- and 2-dimensional convolutions, embeddings, pooling, RNNs, dropout, normalization, and more; plus support of Keras layers.
- Network architecture: Support for multi-state inputs and layer (block) reuse, simple definition of directed acyclic graph structures via register/retrieve layer, plus support for arbitrary architectures.
- Memory types: Simple batch buffer memory, random replay memory.
- Policy distributions: Bernoulli distribution for boolean actions, categorical distribution for (finite) integer actions, Gaussian distribution for continuous actions, Beta distribution for range-constrained continuous actions, multi-action support.
- Reward estimation: Configuration options for estimation horizon, future reward discount, state/state-action/advantage estimation, and for whether to consider terminal and horizon states.
- Training objectives: (Deterministic) policy gradient, state-(action-)value approximation.
- Optimization algorithms: Various gradient-based optimizers provided by TensorFlow like Adam/AdaDelta/RMSProp/etc, evolutionary optimizer, natural-gradient-based optimizer, plus a range of meta-optimizers.
- Exploration: Randomized actions, sampling temperature, variable noise.
- Preprocessing: Clipping, deltafier, sequence, image processing.
- Regularization: L2 and entropy regularization.
- Execution modes: Parallelized execution of multiple environments based on Python's `multiprocessing` and `socket`.
- Optimized act-only SavedModel extraction.
- TensorBoard support.
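To make the reward-estimation options above concrete, the n-step return with an estimation horizon (as configured via `reward_estimation=dict(horizon=...)` in the quickstart) can be sketched in plain Python. This is a conceptual illustration of the idea, not Tensorforce's internal implementation; the function name and `horizon_estimate` parameter are made up for the example:

```python
def horizon_return(rewards, horizon, discount, horizon_estimate=0.0):
    """n-step return: discounted sum of the next `horizon` rewards, plus a
    discounted value estimate for the state reached at the horizon (if any)."""
    ret = 0.0
    for t, reward in enumerate(rewards[:horizon]):
        ret += (discount ** t) * reward
    if len(rewards) > horizon:
        # A horizon state exists, so bootstrap with its (state-)value estimate
        ret += (discount ** horizon) * horizon_estimate
    return ret

# Three rewards of 1.0 within the horizon, then a bootstrapped estimate of 5.0
print(horizon_return([1.0, 1.0, 1.0, 1.0], horizon=3, discount=0.9, horizon_estimate=5.0))
```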
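Likewise, the exploration options (randomized actions, sampling temperature) come down to how actions are drawn from the policy distribution. The following plain-Python sketch of temperature-scaled categorical sampling is illustrative only and does not reproduce Tensorforce's implementation:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an action index from softmax(logits / temperature).
    Temperature near 0 approaches greedy argmax; large values approach uniform."""
    scaled = [logit / temperature for logit in logits]
    maximum = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - maximum) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Inverse-CDF sampling over the categorical distribution
    r = rng.random()
    cumulative = 0.0
    for action, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return action
    return len(probs) - 1
```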
By combining these modular components in different ways, a variety of popular deep reinforcement learning models/features can be replicated:
- Q-learning: Deep Q-learning, Double-DQN, Dueling DQN, n-step DQN, Normalised Advantage Function (NAF)
- Policy gradient: vanilla policy-gradient / REINFORCE, Actor-critic and A3C, Proximal Policy Optimization, Trust Region Policy Optimization, Deterministic Policy Gradient
Note that, in general, the replication is not 100% faithful, since the models as described in the corresponding papers often involve additional minor tweaks and modifications which are hard to support with a modular design (and whose importance is arguably questionable anyway). On the upside, these models are just a few examples of the multitude of module combinations supported by Tensorforce.
Environment adapters
- Arcade Learning Environment, a simple object-oriented framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games.
- CARLA, an open-source simulator for autonomous driving research.
- OpenAI Gym, a toolkit for developing and comparing reinforcement learning algorithms which supports teaching agents everything from walking to playing games like Pong or Pinball.
- OpenAI Retro, which lets you turn classic video games into Gym environments for reinforcement learning.
