BTree
Fast sorted collections for Swift using in-memory B-trees
Install / Use
/learn @attaswift/BTreeREADME
Fast Sorted Collections for Swift<br>Using In-Memory B-Trees
- Overview
- Reference Documentation
- Optimizing Collections: The Book
- What Are B-Trees?
- Why In-Memory B-Trees?
- Laundry List of Issues with Standard Collection Types
- B-Trees to the Rescue!
- Implementation Notes
- Remark on Performance of Imported Generics
<a name="overview">Overview</a>
This project provides an efficient in-memory B-tree implementation in pure Swift, and several useful sorted collection types that use B-trees for their underlying storage.
-
Map<Key, Value>implements a sorted mapping from unique comparable keys to arbitrary values. It is likeDictionaryin the standard library, but it does not require keys to be hashable, it has strong guarantees on worst-case performance, and it maintains its elements in a well-defined order. -
List<Element>implements a random-access collection of arbitrary elements. It is likeArrayin the standard library, but lookup, insertion and removal of elements at any index have logarithmic complexity. (Arrayhas O(1) lookup, but insertion and removal at an arbitrary index costs O(n).) Concatenation of two lists of any size, inserting a list into another list at any position, removal of any subrange of elements, or extraction of an arbitrary sub-list are also operations with O(log(n)) complexity. -
SortedSet<Element>implements a sorted collection of unique comparable elements. It is likeSetin the standard library, but lookup, insertion and removal of any element has logarithmic complexity. Elements in anSortedSetare kept sorted in ascending order. Operations working on full sets (such as taking the union, intersection or difference) can take as little as O(log(n)) time if the elements in the source sets aren't interleaved. -
SortedBag<Element>implements a sorted multiset with comparable elements. This is a generalization of a set that allows multiple instances of the same value. (The standard library does not include such a collection, although you can use a dictionary to emulate one by storing the multiplicities of the keys as values.) The implementation provided in this package stores each duplicate element separately, which may come useful if your elements are reference types with identities or you have some other means to distinguish between equal elements.SortedBagoperations have the same time complexities as the equivalent operations inSortedSet. -
[
BTree<Key, Value>][BTree] is the underlying primitive collection that serves as base storage for all of the above collections. It is a general sorted key-value store with full support for elements with duplicate keys; it provides a sum of all operations individually provided by the higher-level abstractions above (and more!).The
BTreetype is public; you may want to use it if you need a collection flavor that isn't provided by default (such as a multimap) or if you need to use an operation that isn't exposed by the wrappers.
All of these collections are structs and they implement the same copy-on-write value semantics as
standard Swift collection types like Array and Dictionary. (In fact, copy-on-write works even
better with these than standard collections; continue reading to find out why!)
The latest version of BTree requires Swift 4. (The last release supporting Swift 3 was 4.0.2.)
<a name="api">Reference Documentation</a>
The project includes a nicely formatted reference document generated from the documentation comments embedded in its source code.
<a name="book">Optimizing Collections: The Book</a>
If you want to learn more about how this package works, the book Optimizing Collections includes detailed explanations of many of the algorithms and optimization tricks implemented by this package – and so, so much more. It is written by the same author, and published by the fine folks at objc.io. Buying a copy of the book is not only a nice way to support this project, it also gets you something quite interesting to read. Win-win!
<a name="what">What Are B-Trees?</a>
B-trees are search trees that provide a sorted key-value store with excellent performance characteristics. In essence, each node maintains a sorted array of its own elements, and another array for its children. The tree is kept balanced by three constraints:
- Only the root node is allowed to be less than half full.
- No node may be larger than the maximum size.
- The leaf nodes are all at the same level.
Compared to other popular search trees such as red-black trees or AVL trees, B-trees have huge nodes: nodes often contain hundreds (or even thousands) of key-value pairs and children.
This module implements a "vanilla" B-tree where every node contains full key-value pairs. (The other popular type is the B+-tree where only leaf nodes contain values; internal nodes contain only copies of keys. This often makes more sense on an external storage device with a fixed block size, but it seems less useful for an in-memory implementation.)
Each node in the tree also maintains the count of all elements under it. This makes the tree an order statistic tree, where efficient positional lookup is possible.
<a name="why">Why In-Memory B-Trees?</a>
The Swift standard library offers heavily optimized arrays and hash tables, but omits linked lists and tree-based data structures. This is a result of the Swift engineering team spending resources (effort, code size) on the abstractions that provide the biggest bang for the buck.
Indeed, the library lacks even a basic double-ended queue construct -- although Cocoa's
Foundationframework does include one inNSArray.
However, some problems call for a wider variety of data structures.
In the past, linked lists and low-order search trees such as red-black trees were frequently employed; however, the performance of these constructs on modern hardware is greatly limited by their heavy use of pointers.
B-trees were originally invented in the 1970s as a data structure for slow external storage devices. As such, they are strongly optimized for locality of reference: they prefer to keep data in long contiguous buffers and they keep pointer derefencing to a minimum. (Dereferencing a pointer in a B-tree usually meant reading another block of data from the spinning hard drive, which is a glacially slow device compared to the main memory.)
Today's computers have multi-tiered memory architectures; they rely on caching to keep the system performant. This means that locality of reference has become a hugely important property for in-memory data structures, too.
Arrays are the epitome of reference locality, so the Swift stdlib's heavy emphasis on Array as the
universal collection type is well justified.
For example, using a single array to hold a sorted list of items has quite horrible (quadratic) asymptotic complexity when there are many elements. However, up to a certain maximum size, a simple array is in fact the most efficient way to represent a sorted list.

The benchmark above demonstrates this really well: insertion of n elements into a sorted array costs O(n^2) when there are many items, but for many reasonably sized data sets, it is still much faster than creating a red-black tree with its fancypants O(n * log(n)) complexity.
Near the beginning of the curve, up to about eighteen thousand items, a sorted array implementation imported from an external module is very consistently about 6-7 times faster than a red-black tree, with a slope that is indistinguishable from O(n * log(n)).
Even after it catches up to quadratic complexity, in this particular benchmark, it takes about a hundred thousand items for the sorted

