Gubernator
High Performance Rate Limiting MicroService and Library
DEVELOPMENT ON GUBERNATOR HAS MOVED TO A NEW HOME AT gubernator-io/gubernator
v2.4.0 is the final version available from this repo; all new features and bug fixes will occur under the new repo.
Gubernator
Gubernator is a distributed, high performance, cloud native and stateless rate-limiting service.
Features
- Gubernator evenly distributes rate limit requests across the entire cluster, which means you can scale the system by simply adding more nodes.
- Gubernator doesn’t rely on external caches like memcached or redis, so there is no deployment synchronization with a dependent service. This makes dynamically growing or shrinking the cluster in an orchestration system like kubernetes or nomad trivial.
- Gubernator holds no state on disk. Its configuration is passed to it by the client on a per-request basis.
- Gubernator provides both GRPC and HTTP access to the API.
- It can be run as a sidecar to services that need rate limiting or as a separate service.
- It can be used as a library to implement a domain-specific rate limiting service.
- Supports optional eventually consistent rate limit distribution for extremely high throughput environments. (See GLOBAL behavior in architecture.md.)
- Gubernator is the English pronunciation of governor in Russian; also, it sounds cool.
Stateless configuration
Gubernator is stateless in that it doesn’t require disk space to operate. No configuration or cache data is ever synced to disk. This is because every request to Gubernator includes the config for the rate limit. At first you might think this is unnecessary overhead on each request. In reality, however, a rate limit config is made up of only four 64-bit integers.
Quick Start
# Download the docker-compose file
$ curl -O https://raw.githubusercontent.com/mailgun/gubernator/master/docker-compose.yaml
# Run the docker container
$ docker-compose up -d
Now you can make rate limit requests via curl:
# Hit the HTTP API at localhost:9080 (GRPC is at 9081)
$ curl http://localhost:9080/v1/HealthCheck
# Make a rate limit request
$ curl http://localhost:9080/v1/GetRateLimits \
    --header 'Content-Type: application/json' \
    --data '{
      "requests": [
        {
          "name": "requests_per_sec",
          "uniqueKey": "account:12345",
          "hits": "1",
          "limit": "10",
          "duration": "1000"
        }
      ]
    }'
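For Go clients, the same HTTP request can be sketched with the standard library. This is a minimal illustration, not part of Gubernator: the type and function names (rateLimitReq, getRateLimitsBody, buildBody) are hypothetical, and the field values are sent as strings only because that is how the curl example above sends them. It assumes the docker-compose setup from Quick Start is listening on localhost:9080.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// rateLimitReq mirrors the JSON fields used in the curl example.
// (Hypothetical type for this sketch; not taken from the Gubernator library.)
type rateLimitReq struct {
	Name      string `json:"name"`
	UniqueKey string `json:"uniqueKey"`
	Hits      string `json:"hits"`
	Limit     string `json:"limit"`
	Duration  string `json:"duration"`
}

// getRateLimitsBody is the top-level payload: a list of rate limit requests.
type getRateLimitsBody struct {
	Requests []rateLimitReq `json:"requests"`
}

// buildBody encodes a single rate limit request as the JSON payload.
func buildBody(r rateLimitReq) ([]byte, error) {
	return json.Marshal(getRateLimitsBody{Requests: []rateLimitReq{r}})
}

func main() {
	body, err := buildBody(rateLimitReq{
		Name:      "requests_per_sec",
		UniqueKey: "account:12345",
		Hits:      "1",
		Limit:     "10",
		Duration:  "1000",
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}

	// A short timeout keeps the example from hanging if nothing is listening.
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Post("http://localhost:9080/v1/GetRateLimits",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed (is the container running?):", err)
		return
	}
	defer resp.Body.Close()

	out, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(resp.Status, string(out))
}
```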
ProtoBuf Structure
An example rate limit request sent via GRPC might look like the following:
rate_limits:
    # Scopes the request to a specific rate limit
  - name: requests_per_sec
    # A unique_key that identifies this instance of a rate limit request
    unique_key: account_id=123|source_ip=172.0.0.1
    # The number of hits we are requesting
    hits: 1
    # The total number of requests allowed for this rate limit
    limit: 100
    # The duration of the rate limit in milliseconds
    duration: 1000
    # The algorithm used to calculate the rate limit
    # 0 = Token Bucket
    # 1 = Leaky Bucket
    algorithm: 0
    # The behavior of the rate limit in gubernator.
    # 0 = BATCHING (Enables batching of requests to peers)
    # 1 = NO_BATCHING (Disables batching)
    # 2 = GLOBAL (Enable global caching for this rate limit)
    behavior: 0
An example response would be:
rate_limits:
    # The status of the rate limit. OK = 0, OVER_LIMIT = 1
  - status: 0
    # The current configured limit
    limit: 10
    # The number of requests remaining
    remaining: 7
    # A unix timestamp in milliseconds of when the bucket will reset, or if
    # OVER_LIMIT is set it is the time at which the rate limit will no
    # longer return OVER_LIMIT.
    reset_time: 1551309219226
    # Additional metadata about the request the client might find useful
    metadata:
      # This is the name of the coordinator that rate limited this request
      owner: "api-n03.staging.us-east-1.mailgun.org:9041"
Rate limit Algorithm
Gubernator currently supports 2 rate limit algorithms.
- Token Bucket: the implementation starts with an empty bucket, then each Hit adds a token to the bucket until the bucket is full. Once the bucket is full, requests will return OVER_LIMIT until the reset_time is reached, at which point the bucket is emptied and requests will return UNDER_LIMIT. This algorithm is useful for enforcing very bursty limits (for example, applications where a single request can add more than 1 hit to the bucket, or non network based queuing systems). The downside to this implementation is that once you have hit the limit, no more requests are allowed until the configured rate limit duration resets the bucket to zero.
- Leaky Bucket: implemented similarly to Token Bucket, where OVER_LIMIT is returned when the bucket is full. However, tokens leak from the bucket at a consistent rate, which is calculated as duration / limit. This algorithm is useful for metering, as the bucket leaks, allowing traffic to continue without the need to wait for the configured rate limit duration to reset the bucket to zero.
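The two descriptions above can be sketched in a few lines of Go. This is a simplified single-process illustration written from the prose, not Gubernator's actual implementation: the type and method names are hypothetical, time is passed in explicitly as unix milliseconds to keep the behavior deterministic, and the leaky bucket uses integer division for duration / limit, so it assumes limit is positive and no larger than duration.

```go
package main

import "fmt"

type status int

const (
	underLimit status = iota
	overLimit
)

// tokenBucket starts empty, is filled by hits, and is emptied at resetTime.
type tokenBucket struct {
	limit, duration int64 // capacity and window length (ms)
	used            int64 // tokens currently in the bucket
	resetTime       int64 // unix ms at which the bucket is emptied
}

func (b *tokenBucket) hit(now, hits int64) status {
	if now >= b.resetTime { // first use or window expired: empty the bucket
		b.used = 0
		b.resetTime = now + b.duration
	}
	if b.used+hits > b.limit {
		return overLimit
	}
	b.used += hits
	return underLimit
}

// leakyBucket drains one token every duration/limit milliseconds.
type leakyBucket struct {
	limit, duration int64
	used            int64 // tokens currently in the bucket
	lastLeak        int64 // unix ms up to which leaking has been applied
}

func (b *leakyBucket) hit(now, hits int64) status {
	if b.limit <= 0 {
		return overLimit // nothing is allowed without a positive limit
	}
	if msPerToken := b.duration / b.limit; msPerToken > 0 {
		leaked := (now - b.lastLeak) / msPerToken
		b.used -= leaked
		b.lastLeak += leaked * msPerToken
	}
	if b.used <= 0 { // bucket fully drained: restart the leak clock
		b.used = 0
		b.lastLeak = now
	}
	if b.used+hits > b.limit {
		return overLimit
	}
	b.used += hits
	return underLimit
}

func main() {
	tb := &tokenBucket{limit: 2, duration: 1000}
	lb := &leakyBucket{limit: 2, duration: 1000}
	for _, now := range []int64{0, 10, 20, 510, 1000} {
		fmt.Printf("t=%dms token over=%v leaky over=%v\n",
			now, tb.hit(now, 1) == overLimit, lb.hit(now, 1) == overLimit)
	}
}
```

With a limit of 2 per second, the token bucket in this sketch rejects everything after the second hit until the full second has elapsed, while the leaky bucket admits another hit as soon as one token has leaked (500 ms here).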
Performance
In our production environment, for every request to our API we send 2 rate limit requests to Gubernator for evaluation: one to rate the HTTP request and the other to rate the number of recipients a user can send an email to within the specific duration. Under this setup a single Gubernator node fields over 2,000 requests a second, with most batched responses returned in under 1 millisecond.

Peer requests forwarded to owning nodes typically respond in under 30 microseconds.

NOTE The above graphs only report the slowest request within the 1 second sample time. So you are seeing the slowest requests that gubernator fields to clients.
Gubernator allows users to choose non-batching behavior, which would further reduce latency for client rate limit requests. However, because of throughput requirements, our production environment uses Behavior=BATCHING with the default 500 microsecond window. In production we have observed batch sizes of 1,000 during peak API usage. Other users who don’t have the same high traffic demands could disable batching and would see lower latencies, but at the cost of throughput.
Gregorian Behavior
Users may choose a behavior called DURATION_IS_GREGORIAN which changes the
meaning of the Duration field. When Behavior is set to DURATION_IS_GREGORIAN
the rate limit is reset whenever the end of the selected Gregorian
calendar interval is reached.
This is useful when you want to impose daily or monthly limits on a resource. With this behavior, the limit on the resource is reset when the end of the day or month is reached, regardless of when the first rate limit request was received by Gubernator.
Given the following Duration values
- 0 = Minutes
- 1 = Hours
- 2 = Days
- 3 = Weeks
- 4 = Months
- 5 = Years
Examples when using Behavior = DURATION_IS_GREGORIAN:
- If Duration = 2 (Days), then the rate limit will reset to Current = 0 at the end of the day in which the rate limit was created.
- If Duration = 0 (Minutes), then the rate limit will reset to Current = 0 at the end of the minute in which the rate limit was created.
- If Duration = 4 (Months), then the rate limit will reset to Current = 0 at the end of the month in which the rate limit was created.
Reset Remaining Behavior
Users may add behavior Behavior_RESET_REMAINING to the rate check request.
This will reset the rate limit as if created new on first use.
When using Reset Remaining, the Hits field should be 0.
Drain Over Limit Behavior
Users may add behavior Behavior_DRAIN_OVER_LIMIT to the rate check request.
A GetRateLimits call drains the Remaining counter on the first over limit event.
Successive GetRateLimits calls then return a Remaining counter of zero rather
than any residual value. This behavior works best with the token bucket algorithm
because the Remaining counter will stay zero after an over limit until reset
time, whereas the leaky bucket algorithm will immediately update Remaining to a
non-zero value.
This facilitates scenarios that require an over limit event to stay over limit
until the rate limit resets. This approach is necessary if a process must make
two rate checks, before and after a process, and the Hit amount is not known
until after the process.
- Before process: Call GetRateLimits with Hits=0 to check the value of the Remaining counter. If Remaining is zero, it's known that the rate limit is depleted and the process can be aborted.
- After process: Call GetRateLimits with a user specified Hits value. If the call returns over limit, the process cannot be aborted because it had already completed. Using DRAIN_OVER_LIMIT behavior, the Remaining count will be drained to zero.
Once an over limit occurs in the "After" step, successive processes will detect the over limit state in the "Before" step.
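The before/after pattern can be simulated with a toy in-memory counter. The Go sketch below is only an illustration of the flow described above: bucket, check, and runProcess are hypothetical names, the counter never resets, and no Gubernator API is called. The drain flag stands in for the DRAIN_OVER_LIMIT behavior.

```go
package main

import "fmt"

// bucket is a toy stand-in for a token bucket's Remaining counter.
type bucket struct {
	remaining int64
}

// check applies hits to the counter. When the request would exceed what is
// left, it reports over limit; with drain=true it also zeroes the counter.
func (b *bucket) check(hits int64, drain bool) (over bool, remaining int64) {
	if hits > b.remaining {
		if drain {
			b.remaining = 0
		}
		return true, b.remaining
	}
	b.remaining -= hits
	return false, b.remaining
}

// runProcess performs the two rate checks around a unit of work whose cost
// is only known after the work has finished.
func runProcess(b *bucket, cost int64, drain bool) string {
	// Before: Hits=0 only inspects Remaining.
	if _, rem := b.check(0, drain); rem == 0 {
		return "aborted"
	}
	// ... the actual work would happen here ...
	// After: record the real cost; the work cannot be undone at this point.
	if over, _ := b.check(cost, drain); over {
		return "completed, over limit"
	}
	return "completed"
}

func main() {
	for _, drain := range []bool{true, false} {
		b := &bucket{remaining: 10}
		fmt.Println("drain:", drain)
		for i := 0; i < 3; i++ {
			fmt.Println(" ", runProcess(b, 7, drain))
		}
	}
}
```

With drain enabled, the third process in this sketch is stopped in the "Before" step. Without it, the residual count of 3 lets the third process start even though the limit was already exceeded.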
Gubernator as a library
If you are using Go, you can use Gubernator as a library. This is useful if
you wish to implement a rate limit service with your own company specific model
on top. We do this internally here at Mailgun with a service we creatively
called ratelimits which keeps track of the limits imposed on a per account
basis. In this way you can utilize the power and speed of Gubernator but still
layer business logic and integrate domain specific problems into your rate
limiting service.
When you use the library, your service becomes a full member of the cluster partic
