==============================================
Turnstile Distributed Rate-Limiting Middleware
==============================================

Turnstile is a piece of WSGI middleware that performs true distributed rate-limiting. System administrators can run an API on multiple nodes, then place this middleware in the pipeline prior to the application. Turnstile uses a Redis database to track the rate at which users are hitting the API, and can then apply configured rate limits, even if each request was made against a different API node.

Installing Turnstile
====================

Turnstile can be installed like many Python packages, using PIP_::

    pip install turnstile

To install the dependencies required by Turnstile, run the following
command from within your Turnstile source directory::

    pip install -r .requires

If you would like to run the tests, you can install the additional
test dependencies in the same way::

    pip install -r .test-requires

Then, to run the test suite, use::

    nosetests -v

Alternatively, the full test suite can be run in a virtual environment
with the tox tool; this is the recommended way for developers to run
the test suite. Four environments are defined: "py26" and "py27" run
the tests under Python 2.6 and Python 2.7, respectively; "pep8" runs
the pep8 style compliance tool (which should only be done by
developers); and "cover" runs the test suite under the default Python
installation, but with coverage enabled. The coverage report
generated by the "cover" environment is summarized in the HTML files
present in the "cov_html" subdirectory. An example tox invocation::

    tox -e py27,pep8


Adding and Configuring Turnstile
================================

Turnstile is intended for use with PasteDeploy-style configuration
files. It is a filter, and should be placed in an appropriate place
in the WSGI pipeline such that the limit classes used with Turnstile
can access the information necessary to make rate-limiting decisions.
(With the turnstile.limits:Limit class provided by Turnstile, no
additional information is required, as that class does not
differentiate between users of your application.)
The filter section of the PasteDeploy configuration file will also
need to contain enough information to allow Turnstile to access the
Redis database. Other options may be configured from here as well,
such as the enable configuration variable. The simplest example
of a Turnstile configuration would be::

    [filter:turnstile]
    use = egg:turnstile#turnstile
    redis.host = <your Redis database host name or IP>

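For context, a filter section like the one above is normally referenced
from a pipeline section in the same PasteDeploy file. The following is
only a sketch: the application name "myapp", its entry point
"egg:myapp#main", and the Redis address are placeholders chosen for
illustration, not names defined by Turnstile.

```ini
# Hypothetical pipeline: requests pass through Turnstile before
# reaching the application. "myapp" is a placeholder name.
[pipeline:main]
pipeline = turnstile myapp

[filter:turnstile]
use = egg:turnstile#turnstile
redis.host = 127.0.0.1

# Placeholder application section; substitute your own WSGI app.
[app:myapp]
use = egg:myapp#main
```

Filters listed in the pipeline are applied in order, so any middleware
that supplies information the limit classes need should appear before
"turnstile".
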
The following are the recognized configuration options:

compactor.compactor_key
  Specifies the sorted set that the compactor daemon uses for
  communication of buckets that need to be compacted. (See below for
  more information about the purpose of the compactor daemon.) This
  option defaults to "compactor".

compactor.compactor_lock
  When multiple compactor daemons are being run, it is necessary to
  serialize their access to the sorted set specified by
  compactor.compactor_key. This option specifies a Redis key
  containing the lock, and it defaults to "compactor_lock".

compactor.compactor_timeout
  If a compactor daemon (or its host) crashes while holding the lock,
  the lock will eventually time out, to allow other compactor daemons
  to run. This option specifies the timeout in seconds, and defaults
  to 30.

compactor.max_age
  The bucket processing logic adds special "summarize" records to the
  bucket representation, to signal to other Turnstile instances that
  a request to summarize the bucket has been submitted. These
  records must age for a minimum amount of time, to ensure that all
  Turnstile instances have seen them, before the compactor daemon can
  run on the bucket. However, if the summarize request to the
  compactor daemon is lost, there must be a timeout, to ensure that a
  new request to summarize a given bucket may be submitted. This
  option specifies a maximum age for a "summarize" record, in
  seconds, and defaults to 600.

compactor.max_updates
  The bucket processing logic adds special "summarize" records to the
  bucket representation, to signal to other Turnstile instances that
  a request to summarize the bucket has been submitted. These
  requests are generated when the number of update records in the
  bucket representation exceeds the value specified by this
  configuration option. This option must be specified to enable the
  compaction logic; a good value would be 30.

compactor.min_age
  The bucket processing logic adds special "summarize" records to the
  bucket representation, to signal to other Turnstile instances that
  a request to summarize the bucket has been submitted. These
  records must age for a minimum amount of time, to ensure that all
  Turnstile instances have seen them, before the compactor daemon can
  run on the bucket. This option specifies the minimum age for a
  "summarize" record, in seconds, and defaults to 30.

compactor.sleep
  The compactor daemon reads bucket keys from a sorted set in the
  Redis database. If no keys are present, it will read from the
  sorted set again, in a loop. To ensure that the compactor daemon
  does not consume too much CPU time, after each read that returns no
  bucket to compact, it will sleep for the number of seconds defined
  by this option. The default is 5.

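As an illustration of how the compactor options fit together, here is a
sketch of a filter section that enables compaction. Only
compactor.max_updates has to be set; the other values simply restate
the defaults documented above and could be omitted.

```ini
[filter:turnstile]
use = egg:turnstile#turnstile
redis.host = <your Redis database host name or IP>

# Required to enable compaction; 30 is the value suggested above.
compactor.max_updates = 30

# The remaining lines restate the documented defaults.
compactor.compactor_key = compactor
compactor.compactor_lock = compactor_lock
compactor.compactor_timeout = 30
compactor.min_age = 30
compactor.max_age = 600
compactor.sleep = 5
```
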
config
  Allows specification of an alternate configuration file. This can
  be used to generate a single file which can be shared by WSGI
  servers using the Turnstile middleware and the various provided
  tools. This can also allow for separation of code-related options,
  such as the enable option, from pure configuration, such as the
  redis.host option. The configuration file is an INI-formatted
  file, with section names corresponding to the first segment of the
  configuration option name. That is, the redis.host option would
  be set as follows::

    [redis]
    host = <your Redis database host name or IP>

  Configuration options which have no prefix are grouped under the
  [turnstile] section of the file, as follows::

    [turnstile]
    status = 404 Not Found

  Note that specifying the config option in the [turnstile]
  section will have no effect; it is not possible to cause another
  configuration file to be included in this way.

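The naming rule described for the alternate configuration file can be
sketched in Python. This is an illustration of the mapping only, not
Turnstile's own loader: the helper name flatten_config and the sample
values are hypothetical, and the sketch uses the Python 3 standard
library configparser module.

```python
import configparser


def flatten_config(text):
    """Map INI sections to dotted option names (hypothetical helper)."""
    # Interpolation is disabled so literal "%" characters are kept.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    options = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            # Options in [turnstile] carry no prefix; all other
            # sections contribute "<section>." as the prefix.
            if section == "turnstile":
                name = key
            else:
                name = "%s.%s" % (section, key)
            options[name] = value
    return options


# Sample file contents; the host address is a placeholder.
SAMPLE = """
[redis]
host = 10.0.0.5

[turnstile]
status = 404 Not Found
"""

options = flatten_config(SAMPLE)
print(options)
```

Under this rule an option such as control.remote.host would appear as
the key "remote.host" inside a [control] section, since only the first
segment of the name becomes the section.
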
control.channel
  Specifies the channel that the control daemon listens on. (See
  below for more information about the purpose of the control
  daemon.) This option defaults to "control".

control.errors_channel
  Specifies the channel that the control daemon (see below) reports
  errors to. This option defaults to "errors".

control.errors_key
  Specifies the key of a set in the Redis database in which errors
  will be stored. This option defaults to "errors".

control.limits_key
  The key under which the limits are stored in the database. See the
  section on tools for more information on how to load and dump the
  limits stored in the Redis database. This option defaults to
  "limits".

control.node_name
  The name of the node. If provided, this option allows the
  specification of a recognizable name for the node. Currently, this
  node name is only reported when issuing a "ping" command to the
  control daemon (see below), and may be used to verify that all
  hosts responded to the ping.

control.reload_spread
  When limits are changed in the database, a command is sent to the
  control daemon (see below) to cause the limits to be reloaded. As
  having all nodes hit the Redis database simultaneously may overload
  the database, this option, if set, allows the reload to be spread
  out randomly within a configured interval. This option should be
  set to the size of the desired interval, in seconds. If not set,
  limits will be reloaded immediately by all nodes.

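The idea behind control.reload_spread can be sketched as follows. This
is an assumption about the general technique (a uniformly random delay
within the interval), not a copy of Turnstile's code; the function name
reload_delay and its parameters are hypothetical.

```python
import random


def reload_delay(spread=None, rng=random):
    """Return the number of seconds a node waits before reloading.

    Hypothetical helper: with no spread configured the reload is
    immediate; otherwise each node picks its own random delay so that
    the nodes do not all query Redis at the same moment.
    """
    if not spread:
        return 0.0
    return rng.uniform(0, float(spread))


# Example: ten nodes sharing a 20-second spread interval.
delays = [reload_delay(20) for _ in range(10)]
print(sorted(round(d, 1) for d in delays))
```

A larger interval lowers the peak load on the Redis database at the
cost of nodes applying the new limits at slightly different times.
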
control.remote
  If set to "on", "yes", "true", or "1", Turnstile will connect to a
  remote control daemon (see the remote_daemon tool described
  below). This enables Turnstile to be compatible with WSGI servers
  which use multiple worker processes. Note that the configuration
  values control.remote.authkey, control.remote.host, and
  control.remote.port are required.

control.remote.authkey
  Set to an authentication key, for use when control.remote is
  enabled. Must be the value used by the invocation of
  remote_daemon.

control.remote.host
  Set to a host name or IP address, for use when control.remote is
  enabled. Must be the value used by the invocation of
  remote_daemon.

control.remote.port
  Set to a port number, for use when control.remote is enabled.
  Must be the value used by the invocation of remote_daemon.

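A filter section using the remote control daemon might look like the
following sketch. The authentication key, host, and port shown are
placeholders and must match whatever values were given to the
remote_daemon tool.

```ini
[filter:turnstile]
use = egg:turnstile#turnstile
redis.host = <your Redis database host name or IP>

# Connect to a remote control daemon instead of running one in-process.
control.remote = on

# Placeholder values; all three are required when control.remote is on.
control.remote.authkey = <your authentication key>
control.remote.host = 127.0.0.1
control.remote.port = <your remote_daemon port>
```
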
control.shard_hint
  Can be used to set a sharding hint which will be provided to the
  listening thread of the control daemon (see below). This hint is
  not used by the default Redis Connection class.

enable
  Contains a list of turnstile.preprocessor and
  turnstile.postprocessor entrypoint names. Each name is resolved
  into a preprocessor and postprocessor function (missing entrypoints
  are ignored) and installed, as with the preprocess and
  postprocess configuration options. Note that the postprocessors
  will be in the reverse ordering of the list contained in this
  option. See the section on entrypoints for more information.
  Note that, if enable is used, preprocess and postprocess
  will be ignored.

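The ordering rule for enable can be sketched as below. Plain
dictionaries stand in for entrypoint lookup, and the function name
build_chains and the names "auth", "quota", and "audit" are
hypothetical; the sketch only illustrates that missing entries are
skipped and that postprocessors run in reverse order.

```python
def build_chains(enabled, preprocessors, postprocessors):
    """Resolve enabled names into ordered processor lists.

    Hypothetical helper: the two dictionaries stand in for the
    turnstile.preprocessor and turnstile.postprocessor entrypoint
    groups. Names missing from a group are silently skipped.
    """
    pre = [preprocessors[name] for name in enabled
           if name in preprocessors]
    # Postprocessors are installed in the reverse of the listed order.
    post = [postprocessors[name] for name in reversed(enabled)
            if name in postprocessors]
    return pre, post


# Strings stand in for the resolved functions in this example.
pre, post = build_chains(
    ["auth", "quota", "audit"],
    preprocessors={"auth": "pre_auth", "quota": "pre_quota"},
    postprocessors={"auth": "post_auth", "audit": "post_audit"},
)
print(pre)
print(post)
```

Reversing the postprocessors means the processor listed first sees the
request first and the response last, which is the usual nesting for
layered middleware.
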
formatter
  In previous versions of Turnstile, the only way to change the way
  the delay response was generated was to subclass
  turnstile.middleware.TurnstileMiddleware and override the
  format_delay() method; this subclass could then be used by
  specifying it as the value of the turnstile option. This
  version now allows the formatter to be explicitly speci
