Sycamore
A fast, purely functional data structure library in Common Lisp.
Install / Use
/learn @ndantam/SycamoreREADME
SYCAMORE
Fast, purely functional data structures in Common Lisp.
API Documentation: http://ndantam.github.io/sycamore
Comprehensive Set Benchmarks: https://ndantam.github.io/sycamore/bench.html
Features
- Fast, purely functional Hash Array Mapped Tries (HAMT) [1,2].
- https://en.wikipedia.org/wiki/Hash_array_mapped_trie
- One simple-vector per layer (bitmap+children) reduces indirection.
- SBCL VOPs (when available) enable single machine instructions for the POPCNT and CTZ operations commonly performed by HAMTs.
- Generally higher-performance than WB-trees.
- Requires hashing and test functions (defaults are CL standard SXHASH and EQL), but does not need the total ordering function required by trees.
- Fast, purely functional weight-balanced binary trees (WB-trees)
[3,4,5].
- http://en.wikipedia.org/wiki/Weight-balanced_tree
- Leaf nodes are simple-vectors, greatly reducing tree height.
- Interfaces for tree Sets and Maps (dictionaries).
- Ropes [6]
- http://en.wikipedia.org/wiki/Rope_(data_structure)
- Purely functional pairing heaps
- http://en.wikipedia.org/wiki/Pairing_heap
- Purely functional amortized queue
Installation
- Sycamore uses ASDF (https://common-lisp.net/project/asdf/)
- See
INSTALLfile for details
Examples
See also ./example.lisp
Hash Sets
Create a set with default EQL test:
CL-USER> (sycamore:make-hash-set)
#<SYCAMORE:HASH-SET {}>
Create a set with default EQL test and initial elements from a list:
CL-USER> (sycamore:list-hash-set '(1 2 -10 40))
#<SYCAMORE:HASH-SET {40 1 -10 2}>
Create a set with EQUALP test:
CL-USER> (sycamore:list-hash-set '(1 1.0 2.0 2) :test #'equalp)
#<SYCAMORE:HASH-SET {1 2.0}>
Insertion:
CL-USER> (sycamore:hash-set-insert (sycamore:list-hash-set '(1 2))
0)
#<SYCAMORE:HASH-SET {1 2 0}>
Lookup:
CL-USER> (sycamore:hash-set-find (sycamore:list-hash-set '(0 1 2))
0
T
Removal:
CL-USER> (sycamore:hash-set-remove (sycamore:list-hash-set '(1 2 0))
0)
#<SYCAMORE:HASH-SET {1 2}>
Union:
CL-USER> (sycamore:hash-set-union (sycamore:list-hash-set '(1 2 0))
(sycamore:list-hash-set '(1 0 3)))
#<SYCAMORE:HASH-SET {3 1 2 0}>
Intersection:
CL-USER> (sycamore:hash-set-intersection (sycamore:list-hash-set '(1 2 0))
(sycamore:list-hash-set '(1 0 3)))
#<SYCAMORE:HASH-SET {1 0}>
Difference:
CL-USER> (sycamore:hash-set-difference (sycamore:list-hash-set '(1 2 0))
(sycamore:list-hash-set '(1 0 3)))
#<SYCAMORE:HASH-SET {2}>
Map set:
CL-USER> (sycamore:map-hash-set 'list #'1+
(sycamore:list-hash-set '(1 2 0)))
(2 3 1)
Fold set:
CL-USER> (sycamore:fold-hash-set (lambda (list item) (cons item list))
nil
(sycamore:list-hash-set '(1 2 0)))
(1 2 0)
Tree Sets
Define an ordering function:
CL-USER> (defun compare (a b)
(cond ((< a b) -1)
((> a b) 1)
(t 0)))
COMPARE
Create a set for integers:
CL-USER> (sycamore:tree-set #'compare 1 2 -10 40)
#<TREE-SET {-10 1 2 40}>
Insertion:
CL-USER> (sycamore:tree-set-insert (sycamore:tree-set #'compare 1 2)
0)
#<TREE-SET {0 1 2}>
Lookup:
CL-USER> (sycamore:tree-set-find (sycamore:tree-set #'compare 1 2 0)
0)
0
T
Removal:
CL-USER> (sycamore:tree-set-remove (sycamore:tree-set #'compare 1 2 0)
0)
#<TREE-SET {1 2}>
Union operation:
CL-USER> (sycamore:tree-set-union (sycamore:tree-set #'compare 1 2)
(sycamore:tree-set #'compare 1 0 3))
#<TREE-SET {0 1 2 3}>
Intersection operation:
CL-USER> (sycamore:tree-set-intersection (sycamore:tree-set #'compare 1 2)
(sycamore:tree-set #'compare 1 0 3))
#<TREE-SET {1}>
Difference operation:
CL-USER> (sycamore:tree-set-difference (sycamore:tree-set #'compare 1 2)
(sycamore:tree-set #'compare 1 0 3))
#<TREE-SET {2}>
Map set:
CL-USER> (sycamore:map-tree-set 'list #'1+
(sycamore:tree-set #'compare 1 0 10 2))
(1 2 3 11)
Fold set:
CL-USER> (sycamore:fold-tree-set (lambda (list item) (cons item list))
nil
(sycamore:tree-set #'compare 1 0 10 2))
(10 2 1 0)
Ropes
Create a Rope:
CL-USER> (sycamore:rope "Hello" #\Space 'World!)
#<ROPE "Hello WORLD!">
Also works on lists:
CL-USER> (sycamore:rope (list "Hello" #\Space 'World!))
#<ROPE "Hello WORLD!">
And arrays:
CL-USER> (sycamore:rope (vector "Hello" #\Space 'World!))
#<ROPE "Hello WORLD!">
Rope to string:
CL-USER> (sycamore:rope-string (sycamore:rope "Hello" #\Space 'World!))
"Hello WORLD!"
Print a rope:
CL-USER> (sycamore:rope-write (sycamore:rope "Hello" #\Space 'World!)
:escape nil :stream *standard-output*)
Hello WORLD!
What collection data structure should I use?
Concept of Operations
- Two classes of set and map collections: hashes and trees.
- Hashes require a hash function and an equality function. ANSI CL provides SXHASH and EQ/EQL/EQUAL/EQUALP/=.
- Trees require a total order (for any two elements A and B: does A come before B, does B come before A, or are A and B equal?)
- Hashes offer better asymptotic (big-O) and constant factor running times (O(1)) than trees (O(log(n))).
- In some less-typical cases (e.g., adversarial key selection, very large collections on 32 bit machines), hash collisions could reduce hash performance, while balanced trees guarantee consistent performance in such cases.
Q/A
-
Q: Is your collection tiny or used outside a performance-critical path?
-
Yes: Use a list. Or whatever you care to. For collections that are small and/or not performance-critical, it probably won't much matter.
-
No: Let's discuss!
-
-
Q: Are you very certain you will only ever have one version of your structure that you need? (no sharing between modules, no backtracking, maybe used only within a single, small function)?
-
Yes: A mutable structure such as ANSI CL hash-tables may offer the best performance.
-
No: Let's discuss!
-
-
Q: Are your elements/keys hashable and equality-comparable (lisp primitives and lists are hashable/equality-comparable), and are keys not adversarially selected?
-
Yes: Hash Array Mapped Tries (HAMTs) could offer the best performance.
-
No, I either don't have a hashing function or my keys could be adversarially selected.
-
-
Q: Are your keys adversarially selected but able to be cryptographically hashed?
- Yes: In principle, HAMTs using a cryptographic hashing function would be robust to adversarially selected keys. However, strong cryptographic hashes require >64 bits, while the current HAMT implementation in Sycamore assumes that hash-codes are FIXNUMs (aligning with ANSI CL SXHASH). This limitation may be addressed in the future by adding support for larger (e.g., bit-vector) hash-codes, but for now WB-trees may be best for adversarial cases.
-
Q: Are your keys not hashable but totally ordered (for any two elements A and B: you can decide if A and B are equal or if A comes before B, or vice versa)?
- Yes: Weight-balanced trees could offer good performance. They may be slighter slower than HAMTs, but require comparison rather than hashing functions.
Alternatives
There are many other Common Lisp data structure libraries. Here are a few alternatives and their trade-offs relative to Sycamore.
ANSI CL Hash Tables
Mutable hash tables are very fast. If you don't need persistence or set operations (union, intersection, difference, subset), then a mutable hash table could be a good choice. Included benchmarks show that SBCL's hash tables are around twice as fast as Sycamore HAMTs for construction and lookup.
FSet
https://gitlab.common-lisp.net/fset/fset
FSet implements finite sets using (1) weight-balanced binary trees similar to (and which in part inspired) Sycamore's WB-tree implementation and (2) a HAMT variation based on Steindorfer's Compressed Hash-Array Mapped Prefix-tree (CHAMP) [5]. FSet trees support partial orders, whereas Sycamore trees assume a total order. HAMTs don't require any order but do require a hashing function. FSet trees uses a single, global ordering for elements via a generic function, while Sycamore orders elements with a per-collection COMPARE parameter. FSet HAMTs do not memoize hash codes, while Sycamore HAMTs do.
Benchmarks indicate that Sycamore and FSet HAMTs are similar in performance for insert, find, and remove operations, while Sycamore HAMTs are 1.3-4.5x faster for new HAMT construction, union, intersection, and difference on the tested sets. Sycamore WB-trees are 1.2-3.8x faster than FSet, depending on the operation and size, for the tested sets.
CL-HAMT
https://github.com/danshapero/cl-hamt
CL-HAMT is an implementation of hash array mapped tries. Included benchmarks show that Sycamore HAMTs are 10x faster than CL-HAMT.
CL-Containers
https://github.com/gwkkwg/cl-containers
CL-Containers is stateful/mutable/imperative, while Sycamore is purely-functional/persistent.
Lisp Interfa
