...Minimizing the mean square error on future experience.  - Richard S. Sutton

<a name="title"></a>BTGym

Scalable event-driven RL-friendly backtesting library. Built on top of Backtrader with an OpenAI Gym environment API.

Backtrader is an open-source algorithmic trading library:
GitHub: http://github.com/mementum/backtrader
Documentation and community:
http://www.backtrader.com/

OpenAI Gym is..., well, everyone knows Gym:
GitHub: http://github.com/openai/gym
Documentation and community:
https://gym.openai.com/

<a name="outline"></a>Outline

The general purpose of this project is to provide a Gym-integrated framework for running reinforcement learning experiments in [close to] real-world algorithmic trading environments.

DISCLAIMER:
Code presented here is research/development grade.
It can be unstable, buggy or poorly performing, and is subject to change.

Note that this package is neither an out-of-the-box moneymaker nor a source of ready-to-converge RL solutions.
Think of it as a framework for setting up experiments with complex, non-stationary, stochastic environments.

As a research project in its current stage, BTGym can hardly deliver an easy end-user experience, in the sense that
setting up meaningful experiments requires some practical programming experience as well as general knowledge
of reinforcement learning theory.

News and update notes

<a name="contents"></a>Contents

Installation
Quickstart
Description
- Problem setting
- Data sampling approaches
Documentation and community
Known bugs and limitations
Roadmap
Update news

<a name="install"></a>Installation

It is highly recommended to run BTGym in a designated virtual environment.

Clone or copy the btgym repository to a local disk, cd into it and run pip install -e . to install the package and all dependencies:

git clone https://github.com/Kismuz/btgym.git

cd btgym

pip install -e .

To update to the latest version:

cd btgym

git pull

pip install --upgrade -e .

Notes:

BTGym requires Matplotlib version 2.0.2; downgrade your installation if you have version 2.1:

pip install matplotlib==2.0.2
The lsof utility should be installed on your OS, which may not be the default for some Linux distributions, see: https://en.wikipedia.org/wiki/Lsof
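A minimal pre-flight check for the two notes above might look like the sketch below. It uses only the standard library (Python 3.8+ for importlib.metadata); the function name check_prerequisites is hypothetical and is not part of BTGym.

```python
import shutil
from importlib import metadata


def check_prerequisites(required_mpl="2.0.2"):
    """Hypothetical helper (not a BTGym function): report whether lsof is on
    PATH and which Matplotlib version is installed (None if not installed)."""
    try:
        mpl_version = metadata.version("matplotlib")
    except metadata.PackageNotFoundError:
        mpl_version = None
    return {
        "lsof_found": shutil.which("lsof") is not None,
        "matplotlib_version": mpl_version,
        "matplotlib_ok": mpl_version == required_mpl,
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

If matplotlib_ok is False, the pip command above pins the expected version; if lsof_found is False, install lsof with your distribution's package manager.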

<a name="start"></a>Quickstart

Making a Gym environment with all parameters set to defaults is as simple as:

from btgym import BTgymEnv

MyEnvironment = BTgymEnv(filename='../examples/data/DAT_ASCII_EURUSD_M1_2016.csv',)

Adding more controls may look like:

from gym import spaces
from btgym import BTgymEnv

MyEnvironment = BTgymEnv(filename='../examples/data/DAT_ASCII_EURUSD_M1_2016.csv',
                         episode_duration={'days': 2, 'hours': 23, 'minutes': 55},
                         drawdown_call=50,
                         state_shape=dict(raw=spaces.Box(low=0,high=1,shape=(30,4))),
                         port=5555,
                         verbose=1,
                         )
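The episode_duration and drawdown_call arguments above control when an episode ends. The plain-Python sketch below illustrates that termination logic under two assumptions: that episode_duration is interpreted as datetime.timedelta keyword arguments, and that drawdown_call is a drawdown threshold in percent of the running peak portfolio value. Both helper names are hypothetical, not BTGym functions.

```python
from datetime import timedelta


def episode_length_in_bars(episode_duration, bar=timedelta(minutes=1)):
    """Number of bars of size `bar` that fit into the episode duration."""
    # Assumption: keys such as 'days', 'hours', 'minutes' map onto timedelta kwargs.
    return timedelta(**episode_duration) // bar


def drawdown_exceeded(values, drawdown_call):
    """True once portfolio value falls more than `drawdown_call` percent below its running peak."""
    # Assumption: drawdown is measured relative to the highest value seen so far.
    peak = values[0]
    for v in values:
        peak = max(peak, v)
        if (peak - v) / peak * 100 > drawdown_call:
            return True
    return False


print(episode_length_in_bars({'days': 2, 'hours': 23, 'minutes': 55}))  # 4315
print(drawdown_exceeded([100, 120, 70, 90], drawdown_call=50))  # False: worst drop is about 41.7%
print(drawdown_exceeded([100, 120, 55], drawdown_call=50))  # True: drop of about 54.2%
```

With 1-minute data, the duration used in the example above therefore corresponds to at most 4315 bars per episode, and an episode may end earlier if the drawdown threshold is hit.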

See more options in Documentation: Quickstart, and how-tos in the Examples directory.

<a name="description"></a> General description

<a name="problem"></a> Problem setting

Discrete actions setup: consider a setup with one riskless asset acting as broker account cash and K (by default, one) risky assets. For every risky asset there exists a track of historic price records, referred to as a data-line. Apart from the asset data-lines, there [optionally] exists a number of exogenous data-lines holding some information and statistics, e.g. economic indexes, encoded news, macroeconomic indicators, weather forecasts etc., which are considered relevant to decision-making. It is assumed for this setup that:
1. there are no interest rates for any asset;
2. broker actions are fixed-size market orders (buy, sell, close); short selling is permitted;
3. transaction costs are modelled via broker commission;
4. 'market liquidity' and 'capital impact' assumptions are met;
5. time indexes match for all data-lines provided.
The problem is modelled as a discrete-time, finite-horizon, partially observable Markov decision process for equity/currency trading:
- for every asset traded, the agent's action space is discrete (0: hold [do nothing], 1: buy, 2: sell, 3: close [position]);
- the environment is episodic: a maximum episode duration and episode termination conditions are set;
- at every timestep of the episode the agent is given an environment state observation as a tensor of the last m time-embedded preprocessed values for every data-line included, and emits actions according to some stochastic policy;
- the agent's goal is to maximize expected cumulative capital by learning an optimal policy.
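To make the discrete setup concrete, the sketch below applies the four actions to a single risky asset with fixed-size market orders, short selling and a commission proportional to traded value. It is a toy model for illustration only: the class name ToyBroker, the order size and the commission scheme are assumptions, and the real environment delegates order execution to the Backtrader broker.

```python
HOLD, BUY, SELL, CLOSE = 0, 1, 2, 3


class ToyBroker:
    """Illustrative single-asset broker (hypothetical, not part of BTGym):
    fixed-size market orders, short selling allowed, no interest rates."""

    def __init__(self, cash=100.0, order_size=1.0, commission=0.001):
        self.cash = cash
        self.position = 0.0           # positive = long, negative = short
        self.order_size = order_size
        self.commission = commission  # assumption: fraction of traded value

    def _trade(self, qty, price):
        traded_value = qty * price
        # Pay (or receive) the traded value and always pay the commission.
        self.cash -= traded_value + abs(traded_value) * self.commission
        self.position += qty

    def step(self, action, price):
        if action == BUY:
            self._trade(self.order_size, price)
        elif action == SELL:
            self._trade(-self.order_size, price)   # may open a short position
        elif action == CLOSE and self.position != 0:
            self._trade(-self.position, price)     # flatten the position
        # HOLD: do nothing
        return self.value(price)

    def value(self, price):
        """Broker account value: cash plus marked-to-market position."""
        return self.cash + self.position * price


broker = ToyBroker(cash=100.0, commission=0.0)
print(broker.step(SELL, 10.0))  # 100.0: short one unit at 10
print(broker.step(CLOSE, 8.0))  # 102.0: short closed at 8 for a profit of 2
```

Note that the agent's reward signal would be derived from the change in this account value, which is why the setup only needs the price data-lines plus the broker state.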
Continuous actions setup [BETA]: this setup is closely related to the continuous portfolio optimisation problem definition; it differs from the setup above in that:
1. base broker actions are real numbers: a[i] in [0,1], 0<=i<=K, SUM{a[i]} = 1 for K risky assets added; each action is a market target order that adjusts the portfolio so that the i-th asset makes up a[i]*100% of it (a[0] being the cash share);
2. the entire single-step broker action is a dictionary of the form: {cash_name: a[0], asset_name_1: a[1], ..., asset_name_K: a[K]};
3. short selling is not permitted.
For RL, this implies a continuous action space given as a (K+1)-dimensional vector.
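A continuous action must lie on the probability simplex: non-negative shares that sum to 1. One common way to get such a vector from unconstrained policy outputs is a softmax; the sketch below uses it (an assumption, not necessarily what BTGym does) and then builds the action dictionary described above. The names 'cash', 'asset_1', ... are placeholders, and target_values is a hypothetical helper showing what a target order aims for.

```python
import math


def to_portfolio_action(logits, cash_name, asset_names):
    """Map K+1 unconstrained scores to a dict of portfolio shares summing to 1."""
    names = [cash_name] + list(asset_names)
    if len(logits) != len(names):
        raise ValueError("expected K+1 scores: one for cash plus one per asset")
    # Assumption: softmax as the squashing function; subtract max for stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


def target_values(action, portfolio_value):
    """Value each asset should hold after the target orders: a[i] * portfolio value."""
    return {name: share * portfolio_value for name, share in action.items()}


action = to_portfolio_action([0.0, 0.0, 0.0, 0.0], 'cash', ['asset_1', 'asset_2', 'asset_3'])
print(action)                         # every share equals 0.25
print(target_values(action, 1000.0))  # 250.0 for every entry
```

Because every share is non-negative, this parameterisation also respects the no-short-selling constraint by construction.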

<a name="data"></a> Data selection options for backtest agent training:

Notice: the data shaping approach is under development; expect some changes. [7.01.18]

random sampling: the historic price change dataset is divided into training, cross-validation and testing subsets. Since agent actions do not influence the market, it is possible to randomly sample a continuous subset of training data for every episode. [Seems to be] the most data-efficient method. Cross-validation and testing are performed later, as usual, on the most "recent" data;
sequential sampling: the full dataset is fed sequentially, episode by episode, as if the agent were performing real-time trading. Most reality-like, least data-efficient, a natural remedy for non-stationarity;
sliding time-window sampling: a mixture of the above; each episode is sampled randomly from a comparatively short time period that slides from the furthest to the most recent training data. Should be less prone to overfitting than random sampling.
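The three schemes differ mainly in how the start of each episode is chosen. The sketch below expresses them as start indices over n_train training bars with episodes of episode_len bars; the function names and the window_len, step and per_window parameters are illustrative assumptions, not BTGym's API.

```python
import random


def random_starts(n_train, episode_len, n_episodes, rng=random):
    """Random sampling: any contiguous slice of the training set may be drawn."""
    return [rng.randint(0, n_train - episode_len) for _ in range(n_episodes)]


def sequential_starts(n_train, episode_len):
    """Sequential sampling: consecutive, non-overlapping episodes, oldest first."""
    return list(range(0, n_train - episode_len + 1, episode_len))


def sliding_window_starts(n_train, episode_len, window_len, step, per_window, rng=random):
    """Sliding-window sampling: draw `per_window` random episodes from a short
    window, then move the window `step` bars towards the most recent data."""
    # Assumption: window_len >= episode_len, so every episode fits in its window.
    starts = []
    for w0 in range(0, n_train - window_len + 1, step):
        for _ in range(per_window):
            starts.append(rng.randint(w0, w0 + window_len - episode_len))
    return starts


print(sequential_starts(100, 25))  # [0, 25, 50, 75]
print(random_starts(1000, 100, 3, rng=random.Random(0)))
```

Each episode then covers bars [start, start + episode_len), which always stays inside the training subset (and, for the sliding variant, inside the current window).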

<a name="reference"></a>Documentation and Community

Read Docs and API Reference.
Browse Development Wiki.
Review open and closed Issues.
Go to BTGym Slack channel. If you are new - use this invite link to join.

<a name="issues"></a> Known bugs and limitations:

requires Matplotlib version 2.0.2;
matplotlib backend warning: appears when importing pyplot and using %matplotlib inline magic before the btgym import. It's recommended to import backtrader and btgym first to ensure proper backend choice;
not tested with Python < 3.5;
doesn't seem to work correctly under Windows; partially done
by default, it is configured to accept Forex 1-min. data from www.HistData.com;
~~only random data sampling is implemented;~~
~~no built-in dataset splitting to training/cv/testing subsets;~~ done
~~only one equity/currency pair can be traded~~ done
~~no 'skip-frames' implementation within environment;~~ done
~~no plotting features, except if using pycharm integration observer.~~ ~~Not sure if it is suited for intraday strategies.~~ [partially] done
~~making new environment kills all processes using specified network port. Watch out your jupyter kernels.~~ fixed
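Since the default configuration expects HistData.com Forex 1-minute files, here is a sketch of reading one with the standard library. It assumes the header-less, semicolon-separated ASCII M1 layout YYYYMMDD HHMMSS;open;high;low;close;volume; check your own files against this before relying on it. read_histdata_m1 is a hypothetical helper, not BTGym's own loader.

```python
import csv
import io
from datetime import datetime


def read_histdata_m1(lines):
    """Parse M1 rows into (timestamp, open, high, low, close, volume) tuples.

    Assumption: rows look like '20160103 170000;1.1;1.2;1.0;1.15;0'.
    """
    bars = []
    for row in csv.reader(lines, delimiter=';'):
        if not row:
            continue  # skip blank lines
        stamp = datetime.strptime(row[0], '%Y%m%d %H%M%S')
        o, hi, lo, c = (float(x) for x in row[1:5])
        bars.append((stamp, o, hi, lo, c, int(row[5])))
    return bars


# Synthetic rows illustrating the layout, not real quotes.
sample = io.StringIO(
    "20160103 170000;1.10000;1.10020;1.09990;1.10010;0\n"
    "20160103 170100;1.10010;1.10030;1.10000;1.10020;0\n"
)
for bar in read_histdata_m1(sample):
    print(bar)
```

For a real file, pass an open handle instead: with open(path, newline='') as f: bars = read_histdata_m1(f).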

<a name="roadmap"></a> TODO's and Road Map:

[x] refine logic for parameters applying priority (engine vs strategy vs kwargs vs defaults);
[x] API reference;
[x] examples;
[x] frame-skipping feature;
[x] dataset tr/cv/t approach;
[x] state rendering;
[x] proper rendering for entire episode;
[x] tensorboard integration;
[x] multiple agents asynchronous operation feature (e.g. for A3C):
[x] dedicated data server;
[x] multi-modal observation space shape;
[x] A3C implementation for BTgym;
[x] UNREAL implementation for BTgym;
[x] PPO implementation for BTgym;
[ ] RL^2 / MAML / DARLA adaptations - IN PROGRESS;
[x] learning from demonstrations; - partially done
[ ] risk-sensitive agents implementation;
[x] sequential and sliding time-window sampling;
[x] multiple instruments trading;
[x] docker image; - CPU version, Signalprime contribution;
[ ] TF serving model serialisation functionality;

<a name="news"></a>News and updates:

10.01.2019:
- docker CPU version is now available, contributed by Signalprime (https://github.com/signalprime); see btgym/docker/README.md for details;
9.02.2019:
- Introduction to analytic data model notebook added to model_based_stat_arb examples folder.
25.01.2019: updates:
- lstm_policy class now requires both internal and external observation sub-spaces to be present and allows both to be one-level nested sub-spaces themselves (previously this was only true for external); all declared sub-spaces are encoded by separate convolutional encoders;
- policy deterministic action option is implemented for discrete action spaces and can be utilised by syncro_runner; by default it is enabled for test episodes;
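The deterministic action option amounts to choosing the most probable action instead of sampling from the policy distribution. A plain-Python sketch of both modes for a discrete action space follows; it is illustrative only, does not reproduce the lstm_policy or syncro_runner code, and select_action is a hypothetical name.

```python
import math
import random


def select_action(logits, deterministic=False, rng=random):
    """Pick a discrete action index from policy logits.

    deterministic=True  -> argmax of the logits (e.g. for test episodes)
    deterministic=False -> sample from the softmax distribution over the logits
    """
    if deterministic:
        return max(range(len(logits)), key=lambda i: logits[i])
    m = max(logits)  # subtract max for numerical stability
    probs = [math.exp(x - m) for x in logits]
    total = sum(probs)
    return rng.choices(range(len(logits)), weights=[p / total for p in probs])[0]


logits = [0.1, 2.0, -1.0, 0.5]  # scores for hold, buy, sell, close
print(select_action(logits, deterministic=True))  # 1 (buy)
print(select_action(logits, rng=random.Random(0)))
```

Greedy selection removes sampling noise from evaluation runs, while stochastic sampling keeps exploration during training.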
