Semian
:monkey: Resiliency toolkit for Ruby for failing fast

Semian is a library for controlling access to slow or unresponsive external services to avoid cascading failures.
When services are down they typically fail fast with errors like ECONNREFUSED
and ECONNRESET which can be rescued in code. However, slow resources fail
slowly. The thread serving the request blocks until it hits the timeout for the
slow resource. During that time, the thread is doing nothing useful and thus the
slow resource has caused a cascading failure by occupying workers and therefore
losing capacity. Semian is a library for failing fast in these situations,
allowing you to handle errors gracefully. Semian does this by intercepting
resource access through heuristic patterns inspired by [Hystrix][hystrix] and
[Release It][release-it]:
- Circuit breaker. A pattern for limiting the number of requests to a dependency that is having issues.
- Bulkheading. Controlling concurrent access to a single resource; access is coordinated server-wide with [SysV semaphores][sysv].
Resource drivers are monkey-patched to be aware of Semian; these patches are called Semian Adapters. Thus, every time resource access is requested, Semian is first queried for the status of the resource. If Semian, through the patterns above, deems the resource to be unavailable, it raises an exception. The ultimate outcome of Semian is always an exception that can then be rescued for a graceful fallback. Instead of waiting for the timeout, Semian raises straight away.
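To make the circuit breaker pattern concrete, here is a minimal sketch in plain Ruby. It is not Semian's implementation: ToyCircuitBreaker and its clock argument are hypothetical, and only the option names (error_threshold, error_timeout, success_threshold) are borrowed from Semian's configuration. As a simplifying assumption, the sketch counts consecutive errors and is not thread-safe.

```ruby
# Toy circuit breaker for illustration only; NOT Semian's implementation.
class ToyCircuitBreaker
  class OpenError < StandardError; end

  attr_reader :state

  def initialize(error_threshold:, error_timeout:, success_threshold:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @error_threshold = error_threshold     # consecutive errors before opening
    @error_timeout = error_timeout         # seconds to stay open
    @success_threshold = success_threshold # successes needed to close again
    @clock = clock
    @state = :closed
    @errors = 0
    @successes = 0
    @opened_at = nil
  end

  # Runs the block unless the circuit is open, in which case it raises
  # immediately without touching the resource.
  def acquire
    half_open_if_timeout_elapsed
    raise OpenError, "circuit is open" if @state == :open

    begin
      result = yield
    rescue StandardError
      record_error
      raise
    end
    record_success
    result
  end

  private

  def half_open_if_timeout_elapsed
    return unless @state == :open

    @state = :half_open if @clock.call - @opened_at >= @error_timeout
  end

  def record_error
    @errors += 1
    @successes = 0
    return unless @state == :half_open || @errors >= @error_threshold

    @state = :open
    @opened_at = @clock.call
  end

  def record_success
    if @state == :half_open
      @successes += 1
      return if @successes < @success_threshold

      @state = :closed
      @successes = 0
    end
    @errors = 0
  end
end

breaker = ToyCircuitBreaker.new(error_threshold: 2, error_timeout: 10, success_threshold: 1)
2.times do
  begin
    breaker.acquire { raise IOError, "simulated timeout" }
  rescue IOError
    # The first failures still reach the resource and fail slowly.
  end
end

begin
  breaker.acquire { :never_runs }
rescue ToyCircuitBreaker::OpenError => e
  puts "failing fast: #{e.message}"
end
```

Once the threshold is reached, the third call never runs its block; the caller gets an exception right away and can fall back.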
If you are already rescuing exceptions for failing resources and timeouts, Semian is mostly a drop-in library with a little configuration that will make your code more resilient to slow resource access. But, do you even need Semian?
For an overview of building resilient Ruby applications, start by reading [the Shopify blog post on Toxiproxy and Semian][resiliency-blog-post]. For more in-depth information on Semian, see Understanding Semian. Semian is an extraction from [Shopify][shopify] where it's been running successfully in production since October 2014.
The other component to your Ruby resiliency kit is [Toxiproxy][toxiproxy] to write automated resiliency tests.
Usage
Install by adding the gem to your Gemfile and require the adapters you need:
gem 'semian', require: %w(semian semian/mysql2 semian/redis)
We recommend this pattern of requiring adapters directly from the Gemfile.
This ensures Semian adapters are loaded as early as possible and also
protects your application during boot. Please see the adapter configuration
section on how to configure adapters.
Adapters
Semian works by intercepting resource access. Every time access is requested, Semian is queried, and it will raise an exception if the resource is unavailable according to the circuit breaker or bulkheads. This is done by monkey-patching the resource driver. The exception raised by the driver always inherits from the base exception class of the driver, meaning you can always simply rescue the base class and catch both Semian and driver errors in the same rescue for fallbacks.
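The single-rescue idea can be sketched with a stand-in error hierarchy. ToyDriver and fetch_inventory are hypothetical names used only for illustration; with a real adapter the base class would be the driver's own, such as Mysql2::Error.

```ruby
# Stand-in driver error hierarchy (hypothetical, for illustration only).
module ToyDriver
  class Error < StandardError; end        # the driver's base exception class
  class ConnectionRefused < Error; end    # an ordinary driver failure
  class CircuitOpenError < Error; end     # raised early by the adapter
  class ResourceBusyError < Error; end    # raised early by the adapter
end

# Hypothetical application code: one rescue of the base class covers
# driver errors and the early rejections alike.
def fetch_inventory(fallback: 0)
  yield
rescue ToyDriver::Error
  fallback
end

fetch_inventory { 42 }                                                 # => 42
fetch_inventory { raise ToyDriver::ConnectionRefused, "ECONNREFUSED" } # => 0
fetch_inventory { raise ToyDriver::CircuitOpenError, "circuit open" }  # => 0
```

Errors outside the driver's hierarchy are not swallowed, so unrelated bugs still surface.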
The following adapters are included in Semian and heavily tested in production. Where a version is listed, it is the version of the public gem with the same name:
- [semian/mysql2][mysql-semian-adapter] (~> 0.3.16)
- [semian/redis][redis-semian-adapter] (~> 3.2.1)
- [semian/net_http][nethttp-semian-adapter]
- [semian/activerecord_trilogy_adapter][activerecord-trilogy-semian-adapter]
- [semian/activerecord_postgresql_adapter][activerecord-postgresql-semian-adapter]
- [semian-postgres][postgres-semian-adapter]
Creating Adapters
To create a Semian adapter you must implement the following methods:
- [include Semian::Adapter][semian-adapter]. Use the helpers to wrap the resource. This takes care of situations such as monitoring, nested resources, unsupported platforms, creating the Semian resource if it doesn't already exist and so on.
- #semian_identifier. This is responsible for returning a symbol that represents every unique resource, for example redis_master or mysql_shard_1. This is usually assembled from a name attribute on the Semian configuration hash, but could also be <host>:<port>.
- connect. The name of this method varies. You must override the driver's connect method with one that wraps the connect call with Semian::Resource#acquire. You should do this at the lowest possible level.
- query. Same as connect but for queries on the resource.
- Define exceptions ResourceBusyError and CircuitOpenError. These are raised when the request was rejected early because the resource is out of tickets or because the circuit breaker is open (see Understanding Semian). They should inherit from the base exception class from the raw driver, for example Mysql2::Error or Redis::BaseConnectionError for the MySQL and Redis drivers. This makes it easy to rescue and handle them gracefully in application code, by rescuing the base class.
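The overall shape of an adapter can be sketched without the real library. Everything below is a stand-in: FakeDb is a made-up driver, ToyResource is a simplified ticket counter playing the role of Semian::Resource (it rejects immediately and is not thread-safe), and ToySemianAdapter is hypothetical. A real adapter would include Semian::Adapter and use its helpers instead.

```ruby
# Made-up raw driver; only its shape matters.
module FakeDb
  class Error < StandardError; end

  class Client
    attr_reader :host, :port

    def initialize(host:, port:, semian: {})
      @host = host
      @port = port
      @semian_options = semian
    end

    def connect
      :connected
    end

    def query(sql)
      "rows for #{sql}"
    end
  end
end

# Simplified stand-in for a Semian resource: a fixed number of tickets.
class ToyResource
  class NoTickets < StandardError; end

  def initialize(tickets:)
    @tickets = tickets
    @in_use = 0
  end

  def acquire
    raise NoTickets if @in_use >= @tickets

    @in_use += 1
    begin
      yield
    ensure
      @in_use -= 1
    end
  end
end

module ToySemianAdapter
  # Both inherit from the driver's base class so callers can rescue FakeDb::Error.
  ResourceBusyError = Class.new(FakeDb::Error)
  CircuitOpenError = Class.new(FakeDb::Error) # unused here: the toy has no circuit breaker

  RESOURCES = {}

  # One symbol per unique resource: the configured name, else <host>:<port>.
  def semian_identifier
    name = @semian_options[:name]
    (name ? "fakedb_#{name}" : "#{host}:#{port}").to_sym
  end

  def connect
    guard { super }
  end

  def query(sql)
    guard { super }
  end

  private

  # Wraps the driver call in the resource and translates early rejections
  # into an exception from the driver's own hierarchy.
  def guard(&block)
    tickets = @semian_options.fetch(:tickets, 1)
    resource = RESOURCES[semian_identifier] ||= ToyResource.new(tickets: tickets)
    resource.acquire(&block)
  rescue ToyResource::NoTickets
    raise ResourceBusyError, "#{semian_identifier} is out of tickets"
  end
end

FakeDb::Client.prepend(ToySemianAdapter)

client = FakeDb::Client.new(host: "localhost", port: 3306, semian: { name: "shard_1", tickets: 1 })
client.semian_identifier # => :fakedb_shard_1
client.query("SELECT 1") # => "rows for SELECT 1"
```

Module#prepend keeps the driver's original methods reachable through super, which is one way to wrap at the lowest level without copying driver code.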
The best resource is looking at the already implemented adapters.
Configuration
There are some global configuration options that can be set for Semian:
# Maximum size of the LRU cache (default: 500)
# Note: Setting this to 0 enables aggressive garbage collection.
Semian.maximum_lru_size = 0
# Minimum time in seconds a resource should be resident in the LRU cache (default: 300s)
Semian.minimum_lru_time = 60
# If true, raise exceptions in case of a validation / constraint failure
# Otherwise, log in output
Semian.default_force_config_validation = false
Note: minimum_lru_time is a stronger guarantee than maximum_lru_size. That
is, if a resource has been updated more recently than minimum_lru_time it
will not be garbage collected, even if it would cause the LRU cache to grow
larger than maximum_lru_size.
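The interplay of the two settings can be illustrated with a toy cache. This is an assumption-laden sketch, not Semian's LRU: ToyLRU, its max_size and min_time arguments, and the injected clock are all hypothetical. It only shows the rule that an entry used within the minimum time survives even when the cache is over its maximum size.

```ruby
# Toy LRU cache for illustration only; NOT Semian's implementation.
class ToyLRU
  def initialize(max_size:, min_time:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_size = max_size
    @min_time = min_time
    @clock = clock
    @entries = {} # key => [value, last_used_at]; least recently used first
  end

  def set(key, value)
    @entries.delete(key)
    @entries[key] = [value, @clock.call]
    collect_garbage
    value
  end

  def get(key)
    entry = @entries.delete(key)
    return nil unless entry

    @entries[key] = [entry[0], @clock.call] # move to the most recent position
    entry[0]
  end

  def keys
    @entries.keys
  end

  private

  # Evicts from the least recently used end while the cache is over max_size,
  # but stops at the first entry that is still younger than min_time.
  def collect_garbage
    now = @clock.call
    while @entries.size > @max_size
      key, (_value, last_used_at) = @entries.first
      break if now - last_used_at < @min_time

      @entries.delete(key)
    end
  end
end

now = 0.0
cache = ToyLRU.new(max_size: 1, min_time: 60, clock: -> { now })
cache.set(:mysql_shard_1, "resource A")
cache.set(:redis_master, "resource B")
cache.keys # => [:mysql_shard_1, :redis_master] (over max_size, but both too fresh)

now = 61.0
cache.set(:inventory, "resource C")
cache.keys # => [:inventory]
```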
Note: default_force_config_validation set to true is a
potentially breaking change. Misconfigured Semians will raise errors, so
make sure that this is what you want. See more in Configuration Validation.
When instantiating a resource, it needs to be configured for Semian. This is
done by passing semian as an argument when initializing the client. Examples
for the built-in adapters:
# MySQL2 client
# In Rails this means having a Semian key in database.yml for each db.
client = Mysql2::Client.new(host: "localhost", username: "root", semian: {
  name: "master",
  tickets: 8, # See the Understanding Semian section on picking these values
  success_threshold: 2,
  error_threshold: 3,
  error_timeout: 10,
  force_config_validation: false
})
# Redis client
client = Redis.new(semian: {
  name: "inventory",
  tickets: 4,
  success_threshold: 2,
  error_threshold: 4,
  error_timeout: 20
})
Redis Out-of-Memory Errors
By default, Redis Out-of-Memory (OOM) errors will open the circuit breaker. This can be
problematic because it prevents read operations and commands that could free up memory
(like DEL, LPOP, etc.) from executing, hindering Redis recovery.
To allow OOM errors to fail fast without opening the circuit, set open_circuit_on_oom: false:
client = Redis.new(semian: {
  name: "inventory",
  open_circuit_on_oom: false # OOM errors won't open the circuit
})
This also works with RedisClient:
client = RedisClient.config(
  host: "localhost",
  semian: {
    name: "inventory",
    open_circuit_on_oom: false
  }
).new_client
Configuration Validation
Semian provides a flag to choose between log-based and exception-based configuration validation. To
explicitly force Semian to validate its configuration, pass force_config_validation: true
into your resource. This will raise an error in the case of a misconfigured or illegal Semian. Otherwise,
if it is set to false, it will log misconfigured parameters verbosely in output.
If not specified, it will use Semian.default_force_config_validation as
the flag.
Migration Strategy for Force Config Validation
When migrating to use force_config_validation: true, follow these steps:
- Deploy with it turned off: Start with force_config_validation: false in your configuration
- Look for logs with prefix: Monitor your application logs for entries with the [SEMIAN_CONFIG_WARNING]: prefix. These logs will indicate misconfigured Semian resources
- Iterate to fix: Address each configuration issue identified in the logs by updating your Semian configurations
- Enable: Once all configuration issues are resolved, set force_config_validation: true to enable strict validation
Example log entries to look for:
[SEMIAN_CONFIG_WARNING]: Missing required arguments for Semian: [:success_threshold, :error_threshold, :error_timeout]
[SEMIAN_CONFIG_WARNING]: Both bulkhead and circuitbreaker cannot be disabled.
[SEMIAN_CONFIG_WARNING]: Bulkhead configuration require either the :tickets or :quota parameter, you provided neither
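The log-or-raise behaviour and the fallback to the global default can be sketched as follows. This is a hypothetical illustration, not Semian's validation code: config_problems, validate_config, ToyConfigError and the default_force argument are made-up names, the checks are limited to the three warnings shown above, and the assumption that both patterns are enabled unless switched off is the sketch's own.

```ruby
require "logger"

# Hypothetical error class for the sketch.
class ToyConfigError < ArgumentError; end

# Returns the problems found in a semian options hash, using the wording of
# the warnings above.
def config_problems(options)
  circuit_breaker = options.fetch(:circuit_breaker, true)
  bulkhead = options.fetch(:bulkhead, true)
  problems = []

  unless circuit_breaker || bulkhead
    problems << "Both bulkhead and circuitbreaker cannot be disabled."
  end

  if circuit_breaker
    missing = %i[success_threshold error_threshold error_timeout].reject { |key| options.key?(key) }
    problems << "Missing required arguments for Semian: #{missing}" unless missing.empty?
  end

  if bulkhead && !options.key?(:tickets) && !options.key?(:quota)
    problems << "Bulkhead configuration require either the :tickets or :quota parameter, you provided neither"
  end

  problems
end

# default_force stands in for Semian.default_force_config_validation.
def validate_config(options, default_force: false, logger: Logger.new($stderr))
  force = options.fetch(:force_config_validation, default_force)

  config_problems(options).each do |problem|
    raise ToyConfigError, problem if force

    logger.warn("[SEMIAN_CONFIG_WARNING]: #{problem}")
  end
  options
end

# Logs one warning about the missing thresholds, then returns the options.
validate_config({ name: "inventory", tickets: 4 })
```

Passing force_config_validation: true in the same hash would turn that warning into a ToyConfigError, which is why the migration steps start with the flag off.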
Thread Safety
Semian's circuit breaker implementation is thread-safe by default as of
v0.7.0. If you'd like to disable it for performance reasons, pass
thread_safety_disabled: true to the resource options.
Bulkheads should be disabled (pass bulkhead: false) in a threaded environment
(e.g. Puma or Sidekiq), but can safely be enabled in non-threaded environments
(e.g. Resque and Unicorn). As described in this document, circuit breakers alone
should be adequate in most environments with reasonably low timeouts.
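As a small illustration of this guidance, the options hash can be built differently per environment. The helper below is hypothetical (semian_options is not part of Semian), and the threshold values are placeholders copied from the examples earlier in this document, not recommendations.

```ruby
# Hypothetical helper that builds the semian: options for a client.
def semian_options(name:, threaded:, tickets: 4)
  options = {
    name: name,
    success_threshold: 2,
    error_threshold: 3,
    error_timeout: 10
  }

  if threaded
    options[:bulkhead] = false  # e.g. Puma or Sidekiq: circuit breaker only
  else
    options[:tickets] = tickets # e.g. Resque or Unicorn: bulkhead enabled
  end

  options
end

semian_options(name: "inventory", threaded: true)
semian_options(name: "inventory", threaded: false, tickets: 8)
```

The resulting hash would be passed as the semian argument when initializing the client.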
Internally, Semian uses SEM_UNDO for several SysV semaphore operations:
- Acquire
- Worker registration
- Semaphore metadata state lock
The intention behind
