SkillAgentSearch skills...

BTree

Fast sorted collections for Swift using in-memory B-trees

Install / Use

/learn @attaswift/BTree
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Fast Sorted Collections for Swift<br>Using In-Memory B-Trees

Swift 4.0 License Platform

Build Status Code Coverage

Carthage compatible CocoaPod Version

<a name="overview">Overview</a>

This project provides an efficient in-memory B-tree implementation in pure Swift, and several useful sorted collection types that use B-trees for their underlying storage.

  • Map<Key, Value> implements a sorted mapping from unique comparable keys to arbitrary values. It is like Dictionary in the standard library, but it does not require keys to be hashable, it has strong guarantees on worst-case performance, and it maintains its elements in a well-defined order.

  • List<Element> implements a random-access collection of arbitrary elements. It is like Array in the standard library, but lookup, insertion and removal of elements at any index have logarithmic complexity. (Array has O(1) lookup, but insertion and removal at an arbitrary index costs O(n).) Concatenation of two lists of any size, inserting a list into another list at any position, removal of any subrange of elements, or extraction of an arbitrary sub-list are also operations with O(log(n)) complexity.

  • SortedSet<Element> implements a sorted collection of unique comparable elements. It is like Set in the standard library, but lookup, insertion and removal of any element has logarithmic complexity. Elements in an SortedSet are kept sorted in ascending order. Operations working on full sets (such as taking the union, intersection or difference) can take as little as O(log(n)) time if the elements in the source sets aren't interleaved.

  • SortedBag<Element> implements a sorted multiset with comparable elements. This is a generalization of a set that allows multiple instances of the same value. (The standard library does not include such a collection, although you can use a dictionary to emulate one by storing the multiplicities of the keys as values.) The implementation provided in this package stores each duplicate element separately, which may come useful if your elements are reference types with identities or you have some other means to distinguish between equal elements. SortedBag operations have the same time complexities as the equivalent operations in SortedSet.

  • [BTree<Key, Value>][BTree] is the underlying primitive collection that serves as base storage for all of the above collections. It is a general sorted key-value store with full support for elements with duplicate keys; it provides a sum of all operations individually provided by the higher-level abstractions above (and more!).

    The BTree type is public; you may want to use it if you need a collection flavor that isn't provided by default (such as a multimap) or if you need to use an operation that isn't exposed by the wrappers.

All of these collections are structs and they implement the same copy-on-write value semantics as standard Swift collection types like Array and Dictionary. (In fact, copy-on-write works even better with these than standard collections; continue reading to find out why!)

The latest version of BTree requires Swift 4. (The last release supporting Swift 3 was 4.0.2.)

<a name="api">Reference Documentation</a>

The project includes a nicely formatted reference document generated from the documentation comments embedded in its source code.

<a name="book">Optimizing Collections: The Book</a>

If you want to learn more about how this package works, the book Optimizing Collections includes detailed explanations of many of the algorithms and optimization tricks implemented by this package – and so, so much more. It is written by the same author, and published by the fine folks at objc.io. Buying a copy of the book is not only a nice way to support this project, it also gets you something quite interesting to read. Win-win!

Optimizing Collections (eBook)

<a name="what">What Are B-Trees?</a>

B-trees are search trees that provide a sorted key-value store with excellent performance characteristics. In essence, each node maintains a sorted array of its own elements, and another array for its children. The tree is kept balanced by three constraints:

  1. Only the root node is allowed to be less than half full.
  2. No node may be larger than the maximum size.
  3. The leaf nodes are all at the same level.

Compared to other popular search trees such as red-black trees or AVL trees, B-trees have huge nodes: nodes often contain hundreds (or even thousands) of key-value pairs and children.

This module implements a "vanilla" B-tree where every node contains full key-value pairs. (The other popular type is the B+-tree where only leaf nodes contain values; internal nodes contain only copies of keys. This often makes more sense on an external storage device with a fixed block size, but it seems less useful for an in-memory implementation.)

Each node in the tree also maintains the count of all elements under it. This makes the tree an order statistic tree, where efficient positional lookup is possible.

<a name="why">Why In-Memory B-Trees?</a>

The Swift standard library offers heavily optimized arrays and hash tables, but omits linked lists and tree-based data structures. This is a result of the Swift engineering team spending resources (effort, code size) on the abstractions that provide the biggest bang for the buck.

Indeed, the library lacks even a basic double-ended queue construct -- although Cocoa's Foundation framework does include one in NSArray.

However, some problems call for a wider variety of data structures.

In the past, linked lists and low-order search trees such as red-black trees were frequently employed; however, the performance of these constructs on modern hardware is greatly limited by their heavy use of pointers.

B-trees were originally invented in the 1970s as a data structure for slow external storage devices. As such, they are strongly optimized for locality of reference: they prefer to keep data in long contiguous buffers and they keep pointer derefencing to a minimum. (Dereferencing a pointer in a B-tree usually meant reading another block of data from the spinning hard drive, which is a glacially slow device compared to the main memory.)

Today's computers have multi-tiered memory architectures; they rely on caching to keep the system performant. This means that locality of reference has become a hugely important property for in-memory data structures, too.

Arrays are the epitome of reference locality, so the Swift stdlib's heavy emphasis on Array as the universal collection type is well justified.

For example, using a single array to hold a sorted list of items has quite horrible (quadratic) asymptotic complexity when there are many elements. However, up to a certain maximum size, a simple array is in fact the most efficient way to represent a sorted list.

Typical benchmark results for sorted collections

The benchmark above demonstrates this really well: insertion of n elements into a sorted array costs O(n^2) when there are many items, but for many reasonably sized data sets, it is still much faster than creating a red-black tree with its fancypants O(n * log(n)) complexity.

Near the beginning of the curve, up to about eighteen thousand items, a sorted array implementation imported from an external module is very consistently about 6-7 times faster than a red-black tree, with a slope that is indistinguishable from O(n * log(n)).

Even after it catches up to quadratic complexity, in this particular benchmark, it takes about a hundred thousand items for the sorted

View on GitHub
GitHub Stars1.3k
CategoryDevelopment
Updated1mo ago
Forks73

Languages

Swift

Security Score

100/100

Audited on Feb 19, 2026

No findings