sturdyc: a caching library for building sturdy systems
sturdyc eliminates cache stampedes and can minimize data source load in
high-throughput systems through features such as request coalescing and
asynchronous refreshes. It combines the speed of in-memory caching with
granular control over data freshness. At its core, sturdyc provides
non-blocking reads and sharded writes for minimal lock contention. The
xxhash algorithm is used for efficient key
distribution.
It has all the functionality you would expect from a caching library, but what sets it apart are the flexible configurations that have been designed to make I/O heavy applications both robust and highly performant.
We have been using this package in production to enhance both the performance and reliability of our services that retrieve data from distributed caches, databases, and external APIs. While the API surface of sturdyc is tiny, it offers extensive configuration options. I encourage you to read through this README and experiment with the examples in order to understand its full capabilities.
This screenshot shows the P95 latency improvements we observed after adding this package in front of a distributed key-value store:
<img width="1554" alt="Screenshot 2024-05-10 at 10 15 18" src="https://github.com/viccon/sturdyc/assets/12787673/adad1d4c-e966-4db1-969a-eda4fd75653a">
And through a combination of inflight-tracking, asynchronous refreshes, and refresh coalescing, we reduced load on underlying data sources by more than 90%. This reduction in outgoing requests has enabled us to operate with fewer containers and significantly cheaper database clusters.
Table of contents
Below is the table of contents for this README. If this is your first time using this package, I encourage you to read the examples in the order they appear; most of them build on each other, and many share configurations.
- installing
- creating a cache client
- evictions
- get or fetch
- stampede protection
- early refreshes
- deletions
- caching non-existent records
- caching batch endpoints per record
- cache key permutations
- refresh coalescing
- request passthrough
- distributed storage
- custom metrics
- generics
Installing
```sh
go get github.com/viccon/sturdyc
```
Creating a cache client
The first thing you will have to do is to create a cache client to hold your configuration:
```go
// Maximum number of entries in the cache. Exceeding this number will trigger
// an eviction (as long as the "evictionPercentage" is greater than 0).
capacity := 10000
// Number of shards to use. Increasing this number will reduce write lock collisions.
numShards := 10
// Time-to-live for cache entries.
ttl := 2 * time.Hour
// Percentage of entries to evict when the cache reaches its capacity. Setting this
// to 0 will make writes a no-op until an item has either expired or been deleted.
evictionPercentage := 10

// Create a cache client with the specified configuration.
cacheClient := sturdyc.New[int](capacity, numShards, ttl, evictionPercentage)

cacheClient.Set("key1", 99)
log.Println(cacheClient.Size())
log.Println(cacheClient.Get("key1"))

cacheClient.Delete("key1")
log.Println(cacheClient.Size())
log.Println(cacheClient.Get("key1"))
```
The New function is variadic, and as the final argument we're also able to
provide a wide range of configuration options, which we will explore in detail
in the sections to follow.
Evictions
The cache has two eviction strategies. The first is a background job that continuously evicts expired records from each shard. There are options both to tweak the interval at which this job runs:
```go
cacheClient := sturdyc.New[int](capacity, numShards, ttl, evictionPercentage,
	sturdyc.WithEvictionInterval(time.Second),
)
```
as well as disabling the functionality altogether:
```go
cacheClient := sturdyc.New[int](capacity, numShards, ttl, evictionPercentage,
	sturdyc.WithNoContinuousEvictions(),
)
```
The latter can give you a slight performance boost in situations where you're unlikely to ever exceed the capacity you've assigned to your cache.
However, if the capacity is reached, the second eviction strategy is triggered. This process performs evictions on a per-shard basis, selecting records for removal based on recency. The eviction algorithm uses quickselect, which has an O(N) time complexity without the overhead of requiring write locks on reads to update a recency list, as many LRU caches do.
Next, we'll start to look at some of the more advanced features.
Get or fetch
I have tried to design the API in a way that should make the process of
integrating sturdyc with any data source as straightforward as possible.
While it provides the basic get/set methods you would expect from a cache, the
advanced functionality is accessed through just two core functions:
GetOrFetch and GetOrFetchBatch
As an example, let's say that we had the following code for fetching orders from an API:
```go
func (c *Client) Order(ctx context.Context, id string) (Order, error) {
	timeoutCtx, cancel := context.WithTimeout(ctx, c.timeout)
	defer cancel()

	var response Order
	err := requests.URL(c.orderURL).
		Pathf("/order/%s", id).
		ToJSON(&response).
		Fetch(timeoutCtx)
	return response, err
}
```
All we would have to do is wrap the lines of code that retrieve the data in a function, and then hand that function over to our cache client:
```go
func (c *Client) Order(ctx context.Context, id string) (Order, error) {
	fetchFunc := func(ctx context.Context) (Order, error) {
		timeoutCtx, cancel := context.WithTimeout(ctx, c.timeout)
		defer cancel()

		var response Order
		err := requests.URL(c.orderURL).
			Pathf("/order/%s", id).
			ToJSON(&response).
			Fetch(timeoutCtx)
		return response, err
	}
	return c.cache.GetOrFetch(ctx, id, fetchFunc)
}
```
The cache is going to return the value from memory if it's available, and
otherwise will call the fetchFn to retrieve the data from the underlying data
source.
Most of our examples are going to be retrieving data from HTTP APIs, but it's just as easy to wrap a database query, a remote procedure call, a disk read, or any other I/O operation.
The fetchFn that we pass to GetOrFetch has the following function
signature:
```go
type FetchFn[T any] func(ctx context.Context) (T, error)
```
For data sources capable of handling requests for multiple records at once,
we'll use GetOrFetchBatch:
```go
type KeyFn func(id string) string

type BatchFetchFn[T any] func(ctx context.Context, ids []string) (map[string]T, error)

func (c *Client[T]) GetOrFetchBatch(ctx context.Context, ids []string, keyFn KeyFn, fetchFn BatchFetchFn[T]) (map[string]T, error) {
	// ...
}
```
There are a few things to unpack here, so let's start with the KeyFn. When
adding an in-memory cache to an API client capable of calling multiple
endpoints, it's highly unlikely that an ID alone is going to be enough to
uniquely identify a record.
To illustrate, let's say that we're building a Github client and want to use
this package to get around their rate limit. The username itself wouldn't make
for a good cache key because we could use it to fetch gists, commits,
repositories, etc. Therefore, GetOrFetchBatch takes a KeyFn that prefixes
each ID with something to identify the data source so that we don't end up with
cache key collisions:
```go
gistPrefixFn := cacheClient.BatchKeyFn("gists")
commitPrefixFn := cacheClient.BatchKeyFn("commits")

gists, err := cacheClient.GetOrFetchBatch(ctx, userIDs, gistPrefixFn, fetchGists)
commits, err := cacheClient.GetOrFetchBatch(ctx, userIDs, commitPrefixFn, fetchCommits)
```
We're now able to use the same cache for multiple data sources, and internally we'd get cache keys of this format:
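While the exact key format produced by BatchKeyFn is internal to sturdyc, conceptually a KeyFn just namespaces each ID with its data source. A hand-rolled equivalent (the "gists-"/"commits-" prefixes below are illustrative, not sturdyc's internal format) could look like:

```go
package main

import "fmt"

// A KeyFn maps an ID to a cache key. Namespacing the ID with the data
// source keeps "user 102's gists" and "user 102's commits" from colliding
// in the same cache.
func gistKeyFn(id string) string {
	return "gists-" + id
}

func commitKeyFn(id string) string {
	return "commits-" + id
}

func main() {
	fmt.Println(gistKeyFn("102"))   // prints: gists-102
	fmt.Println(commitKeyFn("102")) // prints: commits-102
}
```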
