sturdyc: a caching library for building sturdy systems
sturdyc eliminates cache stampedes and can minimize data source load in
high-throughput systems through features such as request coalescing and
asynchronous refreshes. It combines the speed of in-memory caching with
granular control over data freshness. At its core, sturdyc provides
non-blocking reads and sharded writes for minimal lock contention. The
xxhash algorithm is used for efficient key
distribution.
It has all the functionality you would expect from a caching library, but what sets it apart are the flexible configurations that have been designed to make I/O heavy applications both robust and highly performant.
We have been using this package in production to enhance both the performance and reliability of our services that retrieve data from distributed caches, databases, and external APIs. While the API surface of sturdyc is tiny, it offers extensive configuration options. I encourage you to read through this README and experiment with the examples in order to understand its full capabilities.
This screenshot shows the P95 latency improvements we observed after adding this package in front of a distributed key-value store:
<img width="1554" alt="Screenshot 2024-05-10 at 10 15 18" src="https://github.com/viccon/sturdyc/assets/12787673/adad1d4c-e966-4db1-969a-eda4fd75653a">
And through a combination of inflight-tracking, asynchronous refreshes, and refresh coalescing, we reduced load on underlying data sources by more than 90%. This reduction in outgoing requests has enabled us to operate with fewer containers and significantly cheaper database clusters.
Table of contents
Below is the table of contents for this README. If this is your first time using this package, I encourage you to read the examples in the order they appear; most of them build on each other, and many share configurations.
- installing
- creating a cache client
- evictions
- get or fetch
- stampede protection
- early refreshes
- deletions
- caching non-existent records
- caching batch endpoints per record
- cache key permutations
- refresh coalescing
- request passthrough
- distributed storage
- custom metrics
- generics
Installing
```sh
go get github.com/viccon/sturdyc
```
Creating a cache client
The first thing you will have to do is to create a cache client to hold your configuration:
```go
// Maximum number of entries in the cache. Exceeding this number will trigger
// an eviction (as long as the "evictionPercentage" is greater than 0).
capacity := 10000
// Number of shards to use. Increasing this number will reduce write lock collisions.
numShards := 10
// Time-to-live for cache entries.
ttl := 2 * time.Hour
// Percentage of entries to evict when the cache reaches its capacity. Setting this
// to 0 will make writes a no-op until an item has either expired or been deleted.
evictionPercentage := 10

// Create a cache client with the specified configuration.
cacheClient := sturdyc.New[int](capacity, numShards, ttl, evictionPercentage)

cacheClient.Set("key1", 99)
log.Println(cacheClient.Size())
log.Println(cacheClient.Get("key1"))

cacheClient.Delete("key1")
log.Println(cacheClient.Size())
log.Println(cacheClient.Get("key1"))
```
The New function is variadic, and as the final argument we're also able to
provide a wide range of configuration options, which we will explore in detail
in the sections to follow.
Evictions
The cache has two eviction strategies. The first is a background job that continuously evicts expired records from each shard. There are options both to tweak the interval at which this job runs:
```go
cacheClient := sturdyc.New[int](capacity, numShards, ttl, evictionPercentage,
	sturdyc.WithEvictionInterval(time.Second),
)
```
as well as disabling the functionality altogether:
```go
cacheClient := sturdyc.New[int](capacity, numShards, ttl, evictionPercentage,
	sturdyc.WithNoContinuousEvictions(),
)
```
The latter can give you a slight performance boost in situations where you're unlikely to ever exceed the capacity you've assigned to your cache.
However, if the capacity is reached, the second eviction strategy is triggered. This process performs evictions on a per-shard basis, selecting records for removal based on recency. The eviction algorithm uses quickselect, which has an O(N) time complexity without the overhead of requiring write locks on reads to update a recency list, as many LRU caches do.
Next, we'll start to look at some of the more advanced features.
Get or fetch
I have tried to design the API in a way that should make the process of
integrating sturdyc with any data source as straightforward as possible.
While it provides the basic get/set methods you would expect from a cache, the
advanced functionality is accessed through just two core functions:
GetOrFetch and GetOrFetchBatch
As an example, let's say that we had the following code for fetching orders from an API:
```go
func (c *Client) Order(ctx context.Context, id string) (Order, error) {
	timeoutCtx, cancel := context.WithTimeout(ctx, c.timeout)
	defer cancel()

	var response Order
	err := requests.URL(c.orderURL).
		Pathf("/order/%s", id).
		ToJSON(&response).
		Fetch(timeoutCtx)
	return response, err
}
```
All we would have to do is wrap the lines of code that retrieve the data in a function, and then hand that function over to our cache client:
```go
func (c *Client) Order(ctx context.Context, id string) (Order, error) {
	fetchFunc := func(ctx context.Context) (Order, error) {
		timeoutCtx, cancel := context.WithTimeout(ctx, c.timeout)
		defer cancel()

		var response Order
		err := requests.URL(c.orderURL).
			Pathf("/order/%s", id).
			ToJSON(&response).
			Fetch(timeoutCtx)
		return response, err
	}
	return c.cache.GetOrFetch(ctx, id, fetchFunc)
}
```
The cache is going to return the value from memory if it's available, and
otherwise will call the fetchFn to retrieve the data from the underlying data
source.
Most of our examples are going to be retrieving data from HTTP APIs, but it's just as easy to wrap a database query, a remote procedure call, a disk read, or any other I/O operation.
The fetchFn that we pass to GetOrFetch has the following function
signature:
```go
type FetchFn[T any] func(ctx context.Context) (T, error)
```
For data sources capable of handling requests for multiple records at once,
we'll use GetOrFetchBatch:
```go
type KeyFn func(id string) string

type BatchFetchFn[T any] func(ctx context.Context, ids []string) (map[string]T, error)

func (c *Client[T]) GetOrFetchBatch(ctx context.Context, ids []string, keyFn KeyFn, fetchFn BatchFetchFn[T]) (map[string]T, error) {
	// ...
}
```
There are a few things to unpack here, so let's start with the KeyFn. When
adding an in-memory cache to an API client capable of calling multiple
endpoints, it's highly unlikely that an ID alone is going to be enough to
uniquely identify a record.
To illustrate, let's say that we're building a Github client and want to use
this package to get around their rate limit. The username itself wouldn't make
for a good cache key because we could use it to fetch gists, commits,
repositories, etc. Therefore, GetOrFetchBatch takes a KeyFn that prefixes
each ID with something to identify the data source so that we don't end up with
cache key collisions:
```go
gistPrefixFn := cacheClient.BatchKeyFn("gists")
commitPrefixFn := cacheClient.BatchKeyFn("commits")

gists, err := cacheClient.GetOrFetchBatch(ctx, userIDs, gistPrefixFn, fetchGists)
commits, err := cacheClient.GetOrFetchBatch(ctx, userIDs, commitPrefixFn, fetchCommits)
```
We're now able to use the same cache for multiple data sources, and internally we'd get cache keys of this format:
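While the exact key format produced by BatchKeyFn is internal to sturdyc, conceptually a KeyFn just namespaces each ID with its data source. A hand-rolled equivalent (the "gists-"/"commits-" prefixes below are illustrative, not sturdyc's internal format) could look like:

```go
package main

import "fmt"

// A KeyFn maps an ID to a cache key. Namespacing the ID with the data
// source keeps "user 102's gists" and "user 102's commits" from colliding
// in the same cache.
func gistKeyFn(id string) string {
	return "gists-" + id
}

func commitKeyFn(id string) string {
	return "commits-" + id
}

func main() {
	fmt.Println(gistKeyFn("102"))   // prints: gists-102
	fmt.Println(commitKeyFn("102")) // prints: commits-102
}
```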
