Semian

:monkey: Resiliency toolkit for Ruby for failing fast

Semian is a library for controlling access to slow or unresponsive external services to avoid cascading failures.

When services are down, they typically fail fast with errors like ECONNREFUSED and ECONNRESET, which can be rescued in code. Slow resources, however, fail slowly. The thread serving the request blocks until it hits the timeout for the slow resource. During that time the thread does nothing useful, so the slow resource causes a cascading failure by occupying workers and reducing capacity. Semian is a library for failing fast in these situations, allowing you to handle errors gracefully. Semian does this by intercepting resource access through heuristic patterns inspired by [Hystrix][hystrix] and [Release It][release-it]:

  • Circuit breaker. A pattern for limiting the number of requests to a dependency that is having issues.
  • Bulkheading. Controlling concurrent access to a single resource. Access is coordinated server-wide with [SysV semaphores][sysv].

Resource drivers are monkey-patched to be aware of Semian; these patches are called Semian Adapters. Every time resource access is requested, Semian is first queried for the status of the resource. If Semian, through the patterns above, deems the resource unavailable, it raises an exception. The ultimate outcome of Semian is always an exception that can then be rescued for a graceful fallback. Instead of waiting for the timeout, Semian raises straight away.
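To make the fail-fast behavior concrete, here is a minimal, single-threaded circuit breaker sketch in plain Ruby. It is an illustration only, not Semian's implementation: the ToyCircuitBreaker class, its OpenError, and the injectable clock are hypothetical names. The three parameters borrow the option names used in the configuration examples later in this README, and the sketch assumes they mean: errors needed to open the circuit, seconds before a trial request is allowed, and trial successes needed to close it again.

```ruby
# Toy circuit breaker: a simplified illustration, not Semian's implementation.
class ToyCircuitBreaker
  class OpenError < StandardError; end

  MONOTONIC_CLOCK = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  attr_reader :state

  # error_threshold:   errors within error_timeout seconds that open the circuit
  # error_timeout:     seconds to wait before letting a trial request through
  # success_threshold: trial successes needed to close the circuit again
  # clock:             injectable time source (hypothetical, eases testing)
  def initialize(error_threshold:, error_timeout:, success_threshold:, clock: MONOTONIC_CLOCK)
    @error_threshold = error_threshold
    @error_timeout = error_timeout
    @success_threshold = success_threshold
    @clock = clock
    @state = :closed
    @errors = [] # timestamps of recent errors
    @successes = 0
    @opened_at = nil
  end

  # Runs the block unless the circuit is open, in which case it raises at once.
  def acquire
    half_open_if_timeout_elapsed
    raise OpenError, "circuit open, failing fast" if @state == :open

    begin
      result = yield
    rescue StandardError
      mark_failed
      raise
    end
    mark_success
    result
  end

  private

  def half_open_if_timeout_elapsed
    return unless @state == :open && @clock.call - @opened_at >= @error_timeout

    @state = :half_open
    @successes = 0
  end

  def mark_failed
    now = @clock.call
    @errors << now
    @errors.reject! { |t| now - t > @error_timeout }
    return unless @state == :half_open || @errors.size >= @error_threshold

    @state = :open
    @opened_at = now
  end

  def mark_success
    return unless @state == :half_open

    @successes += 1
    return if @successes < @success_threshold

    @state = :closed
    @errors.clear
  end
end

now = 0.0
breaker = ToyCircuitBreaker.new(error_threshold: 3, error_timeout: 10, success_threshold: 2, clock: -> { now })
3.times do
  begin
    breaker.acquire { raise IOError, "timed out" }
  rescue IOError
    # While the circuit is closed, the underlying error is re-raised.
  end
end
breaker.state # => :open
```

Once the circuit is open, further calls to acquire raise OpenError immediately without running the block, which is the behavior an application rescues to serve a fallback.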

If you are already rescuing exceptions for failing resources and timeouts, Semian is mostly a drop-in library with a little configuration that will make your code more resilient to slow resource access. But, do you even need Semian?

For an overview of building resilient Ruby applications, start by reading [the Shopify blog post on Toxiproxy and Semian][resiliency-blog-post]. For more in-depth information on Semian, see Understanding Semian. Semian is an extraction from [Shopify][shopify], where it has been running successfully in production since October 2014.

The other component to your Ruby resiliency kit is [Toxiproxy][toxiproxy] to write automated resiliency tests.

Usage

Install by adding the gem to your Gemfile and requiring the adapters you need:

gem 'semian', require: %w(semian semian/mysql2 semian/redis)

We recommend this pattern of requiring adapters directly from the Gemfile. This ensures Semian adapters are loaded as early as possible and also protects your application during boot. Please see the adapter configuration section on how to configure adapters.

Adapters

Semian works by intercepting resource access. Every time access is requested, Semian is queried, and it will raise an exception if the resource is unavailable according to the circuit breaker or bulkheads. This is done by monkey-patching the resource driver. The exception raised by the driver always inherits from the Base exception class of the driver, meaning you can always simply rescue the base class and catch both Semian and driver errors in the same rescue for fallbacks.
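The benefit of this inheritance can be sketched without any gems. The FakeDriver namespace and the with_fallback helper below are hypothetical stand-ins; with a real adapter the base class would be the driver's own, such as Mysql2::Error.

```ruby
# Hypothetical driver namespace; real adapters attach to e.g. Mysql2::Error.
module FakeDriver
  class Error < StandardError; end      # the driver's base exception
  class ConnectionError < Error; end    # an ordinary driver failure
  class CircuitOpenError < Error; end   # raised early: circuit breaker is open
  class ResourceBusyError < Error; end  # raised early: out of tickets
end

# Runs the block and returns the fallback on any driver or Semian failure.
def with_fallback(fallback)
  yield
rescue FakeDriver::Error
  fallback
end

with_fallback([]) { raise FakeDriver::ConnectionError, "connection refused" } # => []
with_fallback([]) { raise FakeDriver::CircuitOpenError, "circuit open" }      # => []
with_fallback([]) { [:row] }                                                  # => [:row]
```

A single rescue of the base class covers both cases, so existing fallback code does not need to know whether the failure came from the driver or from an early rejection.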

The following adapters ship with Semian and are tested heavily in production. The version shown is the version of the public gem with the same name:

  • [semian/mysql2][mysql-semian-adapter] (~> 0.3.16)
  • [semian/redis][redis-semian-adapter] (~> 3.2.1)
  • [semian/net_http][nethttp-semian-adapter]
  • [semian/activerecord_trilogy_adapter][activerecord-trilogy-semian-adapter]
  • [semian/activerecord_postgresql_adapter][activerecord-postgresql-semian-adapter]
  • [semian-postgres][postgres-semian-adapter]

Creating Adapters

To create a Semian adapter you must complete the following steps:

  1. [include Semian::Adapter][semian-adapter]. Use the helpers to wrap the resource. This takes care of situations such as monitoring, nested resources, unsupported platforms, creating the Semian resource if it doesn't already exist and so on.
  2. #semian_identifier. This is responsible for returning a symbol that represents every unique resource, for example redis_master or mysql_shard_1. This is usually assembled from a name attribute on the Semian configuration hash, but could also be <host>:<port>.
  3. connect. The name of this method varies. You must override the driver's connect method with one that wraps the connect call with Semian::Resource#acquire. You should do this at the lowest possible level.
  4. query. Same as connect but for queries on the resource.
  5. Define exceptions ResourceBusyError and CircuitOpenError. These are raised when the request was rejected early because the resource is out of tickets or because the circuit breaker is open (see Understanding Semian). They should inherit from the base exception class of the raw driver, for example Mysql2::Error or Redis::BaseConnectionError for the MySQL and Redis drivers. This makes it easy to rescue and handle them gracefully in application code by rescuing the base class.
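The shape of these steps can be sketched with stand-ins so that it runs without Semian installed. Everything named Toy* below is hypothetical: ToyDriver plays the raw driver, ToyResource plays the Semian resource (only its acquire method is modeled), and ToyAdapter does by hand what a real adapter gets from including Semian::Adapter. Translating the resource's error into a driver error is a choice made for this sketch.

```ruby
# Hypothetical raw driver that knows nothing about Semian.
module ToyDriver
  class Error < StandardError; end

  class Client
    attr_reader :host, :port, :options

    def initialize(host:, port:, semian: {})
      @host = host
      @port = port
      @options = semian
    end

    def connect
      :connected
    end

    def query(sql)
      "result of #{sql}"
    end
  end
end

# Stand-in for a Semian resource: rejects every call while `open` is true.
class ToyResource
  class OpenError < StandardError; end

  attr_accessor :open

  def acquire
    raise OpenError, "circuit open" if open

    yield
  end
end

# Step 5: early-rejection errors inherit from the driver's base exception.
module ToyDriver
  class CircuitOpenError < Error; end
  class ResourceBusyError < Error; end # unused here; the toy has no bulkhead
end

module ToyAdapter
  RESOURCES = {} # one shared resource per identifier

  # Step 2: a symbol that names each unique resource.
  def semian_identifier
    name = options[:name] || "#{host}:#{port}"
    :"toy_#{name}"
  end

  # Steps 3 and 4: wrap the driver's lowest-level calls.
  def connect
    guarded { super }
  end

  def query(*args)
    guarded { super }
  end

  private

  def guarded(&block)
    resource = RESOURCES[semian_identifier] ||= ToyResource.new
    resource.acquire(&block)
  rescue ToyResource::OpenError => e
    raise ToyDriver::CircuitOpenError, e.message
  end
end

# Monkey-patch the driver so every connect and query goes through the adapter.
ToyDriver::Client.prepend(ToyAdapter)

client = ToyDriver::Client.new(host: "localhost", port: 3306, semian: { name: "shard_1" })
client.semian_identifier # => :toy_shard_1
client.query("SELECT 1") # => "result of SELECT 1"
```

Using Module#prepend keeps the driver's original methods reachable through super, which is one way to wrap calls at the lowest level without copying driver code.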

The best reference is the code of the already implemented adapters.

Configuration

There are some global configuration options that can be set for Semian:

# Maximum size of the LRU cache (default: 500)
# Note: Setting this to 0 enables aggressive garbage collection.
Semian.maximum_lru_size = 0

# Minimum time in seconds a resource should be resident in the LRU cache (default: 300s)
Semian.minimum_lru_time = 60

# If true, raise exceptions in case of a validation / constraint failure
# Otherwise, log in output
Semian.default_force_config_validation = false

Note: minimum_lru_time is a stronger guarantee than maximum_lru_size. That is, if a resource has been updated more recently than minimum_lru_time it will not be garbage collected, even if it would cause the LRU cache to grow larger than maximum_lru_size.
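The interaction between the two settings can be illustrated with a toy eviction rule. This is a sketch of the guarantee described in the note, not Semian's LRU code; the evictable helper and its arguments are hypothetical, and timestamps are plain numbers of seconds.

```ruby
# Toy eviction rule: a simplified illustration, not Semian's LRU code.
# entries maps a resource name to its last update time (in seconds),
# ordered from least to most recently used.
def evictable(entries, now:, maximum_lru_size:, minimum_lru_time:)
  excess = entries.size - maximum_lru_size
  return [] if excess <= 0

  entries
    .select { |_name, updated_at| now - updated_at >= minimum_lru_time }
    .keys
    .first(excess)
end

entries = { redis_a: 0, mysql_b: 250, redis_c: 290 }

# Two entries exceed the size limit, but only redis_a is old enough to evict,
# so the cache stays above maximum_lru_size.
evictable(entries, now: 300, maximum_lru_size: 1, minimum_lru_time: 60) # => [:redis_a]
```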

Note: default_force_config_validation set to true is a potentially breaking change. Misconfigured Semians will raise errors, so make sure that this is what you want. See more in Configuration Validation.

When instantiating a resource, you need to configure it for Semian. This is done by passing semian as an argument when initializing the client. Examples using the built-in adapters:

# MySQL2 client
# In Rails this means having a Semian key in database.yml for each db.
client = Mysql2::Client.new(host: "localhost", username: "root", semian: {
  name: "master",
  tickets: 8, # See the Understanding Semian section on picking these values
  success_threshold: 2,
  error_threshold: 3,
  error_timeout: 10,
  force_config_validation: false
})

# Redis client
client = Redis.new(semian: {
  name: "inventory",
  tickets: 4,
  success_threshold: 2,
  error_threshold: 4,
  error_timeout: 20
})

Redis Out-of-Memory Errors

By default, Redis Out-of-Memory (OOM) errors will open the circuit breaker. This can be problematic because it prevents read operations and commands that could free up memory (like DEL, LPOP, etc.) from executing, hindering Redis recovery.

To allow OOM errors to fail fast without opening the circuit, set open_circuit_on_oom: false:

client = Redis.new(semian: {
  name: "inventory",
  open_circuit_on_oom: false  # OOM errors won't open the circuit
})

This also works with RedisClient:

client = RedisClient.config(
  host: "localhost",
  semian: {
    name: "inventory",
    open_circuit_on_oom: false
  }
).new_client
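One way to picture the effect of this option is as a filter on which errors count toward opening the circuit. The helper below is a hypothetical sketch, not Semian's detection logic; it assumes OOM errors can be recognized by the prefix of the error message that Redis returns when maxmemory is exceeded.

```ruby
# Hypothetical classifier; Semian's own detection logic may differ.
OOM_PREFIX = "OOM command not allowed"

# Returns true when the error should be counted by the circuit breaker.
# The error is raised to the caller either way.
def counts_toward_circuit?(error, open_circuit_on_oom: true)
  return true if open_circuit_on_oom

  !error.message.start_with?(OOM_PREFIX)
end

oom = RuntimeError.new("OOM command not allowed when used memory > 'maxmemory'.")
counts_toward_circuit?(oom)                             # => true
counts_toward_circuit?(oom, open_circuit_on_oom: false) # => false
```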

Configuration Validation

Semian provides a flag to choose between log-based and exception-based configuration validation. To explicitly force Semian to validate its configuration, pass force_config_validation: true into your resource. This raises an error for a misconfigured or illegal Semian. If the flag is set to false, misconfigured parameters are instead logged verbosely in the output.

If not specified, it will use Semian.default_force_config_validation as the flag.
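The fallback between the per-resource flag and the global default can be sketched as follows. The helper, the error class, and the constant are hypothetical stand-ins (the constant plays the role of Semian.default_force_config_validation); only the log prefix is taken from this README.

```ruby
require "logger"

# Hypothetical error class for this sketch.
class ToyConfigError < ArgumentError; end

# Plays the role of Semian.default_force_config_validation.
TOY_DEFAULT_FORCE_CONFIG_VALIDATION = false

# Raises when validation is forced, otherwise logs a warning and returns false.
def report_misconfiguration(message, options, logger: Logger.new($stdout))
  force = options.fetch(:force_config_validation, TOY_DEFAULT_FORCE_CONFIG_VALIDATION)
  raise ToyConfigError, message if force

  logger.warn("[SEMIAN_CONFIG_WARNING]: #{message}")
  false
end

# No per-resource flag, so the global default (false) applies and a warning is logged.
report_misconfiguration("Both bulkhead and circuitbreaker cannot be disabled.", { name: "inventory" })
```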

Migration Strategy for Force Config Validation

When migrating to use force_config_validation: true, follow these steps:

  1. Deploy with it turned off: Start with force_config_validation: false in your configuration
  2. Look for logs with prefix: Monitor your application logs for entries with the [SEMIAN_CONFIG_WARNING]: prefix. These logs will indicate misconfigured Semian resources
  3. Iterate to fix: Address each configuration issue identified in the logs by updating your Semian configurations
  4. Enable: Once all configuration issues are resolved, set force_config_validation: true to enable strict validation

Example log entries to look for:

[SEMIAN_CONFIG_WARNING]: Missing required arguments for Semian: [:success_threshold, :error_threshold, :error_timeout]
[SEMIAN_CONFIG_WARNING]: Both bulkhead and circuitbreaker cannot be disabled.
[SEMIAN_CONFIG_WARNING]: Bulkhead configuration require either the :tickets or :quota parameter, you provided neither

Thread Safety

Semian's circuit breaker implementation is thread-safe by default as of v0.7.0. If you'd like to disable it for performance reasons, pass thread_safety_disabled: true to the resource options.

Bulkheads should be disabled (pass bulkhead: false) in a threaded environment (e.g. Puma or Sidekiq), but can safely be enabled in non-threaded environments (e.g. Resque and Unicorn). As described in this document, circuit breakers alone should be adequate in most environments with reasonably low timeouts.
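As a sketch of how this guidance might be applied, the hypothetical helper below builds a semian options hash that disables bulkheads for threaded servers and sets tickets otherwise. The helper name and its threaded argument are assumptions; the option keys and threshold values are the ones shown in the configuration examples above.

```ruby
# Hypothetical helper: builds a `semian:` options hash for a client.
def semian_options(name:, threaded:, tickets: 4)
  options = { name: name, success_threshold: 2, error_threshold: 3, error_timeout: 10 }
  if threaded
    options[:bulkhead] = false # circuit breaker only, e.g. Puma or Sidekiq
  else
    options[:tickets] = tickets # bulkhead enabled, e.g. Resque or Unicorn
  end
  options
end

semian_options(name: "inventory", threaded: true)
# => { name: "inventory", success_threshold: 2, error_threshold: 3, error_timeout: 10, bulkhead: false }
```

The resulting hash would be passed as the semian argument when initializing a client, as in the Redis and MySQL2 examples above.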

Internally, Semian uses SEM_UNDO for several SysV semaphore operations:

  • Acquire
  • Worker registration
  • Semaphore metadata state lock

The intention behind
