Lasso
A fast, concurrent string interner
A multi-threaded and single-threaded string interner that allows strings to be cached with a minimal memory footprint,
associating them with a unique [key] that can be used to retrieve them at any time. A [Rodeo] allows O(1)
interning and resolution and can be turned into a [RodeoReader] to allow for contention-free resolutions
with both key to str and str to key operations. It can also be turned into a [RodeoResolver] with only
key to str operations for the lowest possible memory usage.
Which interner do I use?
For single-threaded workloads [Rodeo] is encouraged, while multi-threaded applications should use [ThreadedRodeo].
These two are the only interners that can intern strings, but most applications eventually reach a stage where they are
done interning, and that is where the choice between [RodeoReader] and [RodeoResolver] comes in. If the user still needs
to get keys for strings, they must use the [RodeoReader] (which can itself be converted into a [RodeoResolver] later).
For users who only need key to string resolution, the [RodeoResolver] gives contention-free access at the
minimum possible memory usage. Note that the multi-threaded feature is required to gain access to [ThreadedRodeo].
| Interner | Thread-safe | Intern String | str to key | key to str | Contention Free | Memory Usage |
| ----------------- | :---------: | :-----------: | :--------: | :--------: | :-------------: | :----------: |
| [Rodeo] | ❌ | ✅ | ✅ | ✅ | N/A | Medium |
| [ThreadedRodeo] | ✅ | ✅ | ✅ | ✅ | ❌ | Most |
| [RodeoReader] | ✅ | ❌ | ✅ | ✅ | ✅ | Medium |
| [RodeoResolver] | ✅ | ❌ | ❌ | ✅ | ✅ | Least |
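To make the trade-offs in the table more concrete, here is a minimal std-only sketch of the interning idea. It is not lasso's implementation: `ToyInterner`, `ToyResolver`, and the plain `u32` key are hypothetical names chosen for illustration. The interning stage keeps a map for str to key lookups plus a vector for key to str lookups. Freezing it into a resolver drops the map, which is roughly why a resolver-style structure can get by with the least memory.

```rust
use std::collections::HashMap;

/// Hypothetical toy interner for illustration only (not lasso's implementation).
struct ToyInterner {
    map: HashMap<String, u32>, // str -> key
    strings: Vec<String>,      // key -> str
}

impl ToyInterner {
    fn new() -> Self {
        Self { map: HashMap::new(), strings: Vec::new() }
    }

    /// Returns the existing key for `s`, or stores `s` and hands out a new key.
    fn get_or_intern(&mut self, s: &str) -> u32 {
        if let Some(&key) = self.map.get(s) {
            return key;
        }
        let key = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.map.insert(s.to_owned(), key);
        key
    }

    /// str -> key lookup without interning anything new.
    fn get(&self, s: &str) -> Option<u32> {
        self.map.get(s).copied()
    }

    /// key -> str lookup.
    fn resolve(&self, key: u32) -> &str {
        &self.strings[key as usize]
    }

    /// Drops the str -> key map and keeps only what key -> str needs.
    fn into_resolver(self) -> ToyResolver {
        ToyResolver { strings: self.strings }
    }
}

/// Hypothetical read-only stage: key -> str only.
struct ToyResolver {
    strings: Vec<String>,
}

impl ToyResolver {
    fn resolve(&self, key: u32) -> &str {
        &self.strings[key as usize]
    }
}
```

Unlike this toy, which stores every string twice, the real interners are designed to avoid that kind of duplication, so treat the sketch as a model of the API shape rather than of the memory layout.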
Cargo Features
By default lasso has one dependency, hashbrown, and only [Rodeo] is exposed. Hashbrown is used since the [raw_entry api] is currently unstable in the standard library's hashmap.
The raw hashmap API is used for custom hashing within the hashmaps, which works to dramatically reduce memory usage.
To make use of [ThreadedRodeo], you must enable the multi-threaded feature.
- `multi-threaded` - Enables [ThreadedRodeo], the interner for multi-threaded tasks
- `ahasher` - Use [ahash]'s `RandomState` as the default hasher
- `no-std` - Enables `no_std` + `alloc` support for [Rodeo] and [ThreadedRodeo]
  - Automatically enables the following required features:
    - `ahasher` - `no_std` hashing function
- `serialize` - Implements `Serialize` and `Deserialize` for all `Spur` types and all interners
- `inline-more` - Annotate external apis with `#[inline]`
Example: Using Rodeo
use lasso::Rodeo;
let mut rodeo = Rodeo::default();
let key = rodeo.get_or_intern("Hello, world!");
// Easily retrieve the value of a key and find the key for values
assert_eq!("Hello, world!", rodeo.resolve(&key));
assert_eq!(Some(key), rodeo.get("Hello, world!"));
// Interning the same string again will yield the same key
let key2 = rodeo.get_or_intern("Hello, world!");
assert_eq!(key, key2);
Example: Using ThreadedRodeo
use lasso::ThreadedRodeo;
use std::{thread, sync::Arc};
let rodeo = Arc::new(ThreadedRodeo::default());
let key = rodeo.get_or_intern("Hello, world!");
// Easily retrieve the value of a key and find the key for values
assert_eq!("Hello, world!", rodeo.resolve(&key));
assert_eq!(Some(key), rodeo.get("Hello, world!"));
// Interning the same string again will yield the same key
let key2 = rodeo.get_or_intern("Hello, world!");
assert_eq!(key, key2);
// ThreadedRodeo can be shared across threads
let moved = Arc::clone(&rodeo);
let hello = thread::spawn(move || {
    assert_eq!("Hello, world!", moved.resolve(&key));
    moved.get_or_intern("Hello from the thread!")
})
.join()
.unwrap();
assert_eq!("Hello, world!", rodeo.resolve(&key));
assert_eq!("Hello from the thread!", rodeo.resolve(&hello));
Example: Creating a RodeoReader
use lasso::Rodeo;
// Rodeo and ThreadedRodeo are interchangeable here
let mut rodeo = Rodeo::default();
let key = rodeo.get_or_intern("Hello, world!");
assert_eq!("Hello, world!", rodeo.resolve(&key));
let reader = rodeo.into_reader();
// Reader keeps all the strings from the parent
assert_eq!("Hello, world!", reader.resolve(&key));
assert_eq!(Some(key), reader.get("Hello, world!"));
// The Reader can now be shared across threads, no matter what kind of Rodeo created it
Example: Creating a RodeoResolver
use lasso::Rodeo;
// Rodeo and ThreadedRodeo are interchangeable here
let mut rodeo = Rodeo::default();
let key = rodeo.get_or_intern("Hello, world!");
assert_eq!("Hello, world!", rodeo.resolve(&key));
let resolver = rodeo.into_resolver();
// Resolver keeps all the strings from the parent
assert_eq!("Hello, world!", resolver.resolve(&key));
// The Resolver can now be shared across threads, no matter what kind of Rodeo created it
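The cross-thread sharing mentioned in the last comment can be sketched without lasso. The block below is a std-only stand-in, assuming a hypothetical `resolve_in_threads` helper and a plain `Vec<String>` in place of a real resolver. Once the table is immutable, an `Arc` lets every thread read it without taking a lock, which is the property the "Contention Free" column describes.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: resolves each key on its own thread from a shared,
/// immutable table. A plain `Vec<String>` stands in for a resolver here.
fn resolve_in_threads(table: Arc<Vec<String>>, keys: Vec<usize>) -> Vec<String> {
    let handles: Vec<_> = keys
        .into_iter()
        .map(|key| {
            let table = Arc::clone(&table);
            // Read-only access needs no lock, so threads never wait on each other
            thread::spawn(move || table[key].clone())
        })
        .collect();

    // Join in spawn order so the results line up with the input keys
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

An interner that still accepts new strings cannot be shared this way without some form of synchronization, which is why the read-only stages are the contention-free ones.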
Example: Making a custom-ranged key
Sometimes you want your keys to only inhabit (or not inhabit) a certain range of values so that you can have custom [niches]. This allows you to pack more data into what would otherwise be unused space, which can be critical for memory-sensitive applications.
use lasso::{Key, Rodeo};
// First make our key type, this will be what we use as handles into our interner
#[derive(Copy, Clone, PartialEq, Eq)]
struct NicheKey(u32);
// This will reserve the upper 2^24 values (0xFF000000 and above) for us to use as niches
const NICHE: usize = 0xFF000000;
// Implementing `Key` is unsafe and requires that anything given to `try_from_usize` must produce the
// same `usize` when `into_usize` is later called
unsafe impl Key for NicheKey {
    fn into_usize(self) -> usize {
        self.0 as usize
    }

    fn try_from_usize(int: usize) -> Option<Self> {
        if int < NICHE {
            // The value isn't in our niche range, so we're good to go
            Some(Self(int as u32))
        } else {
            // The value interferes with our niche, so we return `None`
            None
        }
    }
}

// To make sure we're upholding `Key`'s safety contract, let's make two small tests
#[test]
fn value_in_range() {
    let key = NicheKey::try_from_usize(0).unwrap();
    assert_eq!(key.into_usize(), 0);

    let key = NicheKey::try_from_usize(NICHE - 1).unwrap();
    assert_eq!(key.into_usize(), NICHE - 1);
}

#[test]
fn value_out_of_range() {
    let key = NicheKey::try_from_usize(NICHE);
    assert!(key.is_none());

    let key = NicheKey::try_from_usize(u32::max_value() as usize);
    assert!(key.is_none());
}
// And now we're done and can make `Rodeo`s or `ThreadedRodeo`s that use our custom key!
let mut rodeo: Rodeo<NicheKey> = Rodeo::new();
let key = rodeo.get_or_intern("It works!");
assert_eq!(rodeo.resolve(&key), "It works!");
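One way to use a reserved range like this is to pack a second kind of value into the same 4 bytes. The following is a std-only sketch under stated assumptions: `Slot`, `Unpacked`, `from_key`, `from_tag`, and `unpack` are hypothetical names, not part of lasso, and the boundary is simply copied from the `NICHE` constant above. Values below the boundary are treated as key indices, and values at or above it carry a small inline tag.

```rust
// Same boundary as the example above: values at or above this are reserved
const NICHE: u32 = 0xFF00_0000;

/// Hypothetical 4-byte slot holding either a key index or an inline tag.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct Slot(u32);

/// What a `Slot` decodes to.
#[derive(Debug, PartialEq, Eq)]
enum Unpacked {
    Key(u32),
    Tag(u32),
}

impl Slot {
    /// Accepts only indices below the reserved range.
    fn from_key(index: u32) -> Option<Self> {
        if index < NICHE { Some(Slot(index)) } else { None }
    }

    /// Stores `tag` inside the reserved range, so it can never collide with a key.
    /// Returns `None` when the tag does not fit in the reserved 2^24 values.
    fn from_tag(tag: u32) -> Option<Self> {
        NICHE.checked_add(tag).map(Slot)
    }

    /// Decodes the slot by checking which side of the boundary it falls on.
    fn unpack(self) -> Unpacked {
        if self.0 < NICHE {
            Unpacked::Key(self.0)
        } else {
            Unpacked::Tag(self.0 - NICHE)
        }
    }
}
```

A two-variant enum wrapping a `u32` would typically need 8 bytes, so the manual encoding halves the footprint at the cost of a smaller key space.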
Example: Creation using FromIterator
use lasso::Rodeo;
use core::iter::FromIterator;
// Works for both `Rodeo` and `ThreadedRodeo`
let rodeo = Rodeo::from_iter(vec![
    "one string",
    "two string",
    "red string",
    "blue string",
]);
assert!(rodeo.contains("one string"));
assert!(rodeo.contains("two string"));
assert!(rodeo.contains("red string"));
assert!(rodeo.contains("blue string"));
// Collecting an iterator also works, for both `Rodeo` and `ThreadedRodeo`
let rodeo: Rodeo = vec!["one string", "two string", "red string", "blue string"]
    .into_iter()
    .collect();
assert!(rodeo.contains("one string"));
assert!(rodeo.contains("two string"));
assert!(rodeo.contains("red string"));
assert!(rodeo.contains("blue string"));
Benchmarks
Benchmarks were gathered with Criterion.rs
OS: Windows 10
CPU: Ryzen 9 3900X at 3800 MHz
RAM: 3200 MHz
Rustc: Stable 1.44.1
Rodeo
STD RandomState
| Method | Time | Throughput |
| :--------------------------- | :-------: | :----------: |
| resolve | 1.9251 μs | 13.285 GiB/s |
| try_resolve | 1.9214 μs | 13.311 GiB/s |
| resolve_unchecked | 1.4356 μs | 17.816 GiB/s |
| get_or_intern (empty) | 60.350 μs | 433.96 MiB/s |
| get_or_intern (filled) | 57.415 μs | 456.15 MiB/s |
| try_get_or_intern (empty) | 58.978 μs | 444.06 MiB/s |
| try_get_or_intern (filled) | 57.421 μs | 456.10 MiB/s |
| get (empty) | 37.288 μs | 702.37 MiB/s |
| get (filled) | 55.095 μs | 475.36 MiB/s |
AHash
| Method | Time | Throughput |
| :--------------------------- | :-------: | :----------: |
| try_resolve | 1.9282 μs | 13.264 GiB/s |
| resolve | 1.9404 μs | 13.181 GiB/s |
| resolve_unchecked | 1.4328 μs | 17.851 GiB/s |
| get_or_intern (empty) | 38.029 μs | 688.68 MiB/s |
| get_or_intern (filled) | 33.650 μs | 778.30 MiB/s |
| try_get_or_intern (empty) | 39.392 μs | 664.84 MiB/s |
| try_get_or_intern (filled) | 33.435 μs | 783.31 MiB/s |
| get (empty) | 12.565 μs | 2.0356 GiB/s |
| get (filled) | 26.545 μs | 986.61 MiB/s |
FXHash
| Method | Time | Throughput |
| :--------------------------- | :-------: | :----------: |
| resolve | 1.9014 μs | 13.451 GiB/s |
| try_resolve | 1.9278 μs | 13.267 GiB/s |
| resolve_unchecked | 1.4449 μs | 17.701 GiB/s |
| get_or_intern (empty) | 32.523 μs | 805.27 MiB/s |
| get_or_intern (filled) | 30.281 μs | 864.88 MiB/s |
| try_get_or_intern (empty) | 31.630
