Generalized order-statistic tree implementation

An order statistic tree is a binary search tree where the rank of any element can be calculated efficiently (in O(log n) time). This is achieved by taking a balanced binary tree (a red-black tree in this case) and augmenting each node with additional information about the size of its subtree.

A generalized order statistic tree allows the efficient calculation of a partial sum of an arbitrary function, f(k,v) in general depending on the keys and values stored in the tree: $\sum_{k < K} f(k,v)$

The special case $f(k,v) \equiv 1$ gives back a regular order-statistic tree. In the following, I refer to this function as the weight function and the value it returns as the weight of an element. Efficient calculation of the sum of weights can be achieved by storing partial sums in the nodes of a binary tree. In this case, each node stores the partial sum for its subtree.

This repository contains a C++ template implementation of a generalized order statistic tree, implementing functionality similar to sets, multisets, maps and multimaps from the standard library. Doxygen documentation is available at https://dkondor.github.io/orbtree/.

Example code using this library is available at https://github.com/dkondor/patest_new, implementing testing for preferential attachment in growing networks (according to our paper Do the Rich Get Richer? An Empirical Analysis of the Bitcoin Transaction Network).

Usage

This library requires a C++11 compiler (it was tested on g++).

To use this library, put the main header files (orbtree.h, orbtree_base.h, orbtree_node.h, vector_realloc.h, vector_stacked.h) in your source tree and include orbtree.h in your project. Each container type has four implementations, according to the following two choices:

Whether the weight function returns only one value (simple), or returns a vector of values (vector). The latter case can be especially useful if the goal is to evaluate the same function with multiple parameter values.
Whether the implementation allocated individual nodes separately and uses pointers internally, or stores nodes in vector-like containers and uses indexes internally to refer to nodes. The latter choice provides a more compact tree with less memeory allocation, and for trees with less than 2^31-1 elements, it allows the use of 32-bit indexes instead of 64-bit pointers -- this can in turn decrease memory use as well.

These implementations are provided by the following classes (all part of the orbtree namespace):

| | Pointer-based storage | Compact storage | | ----------------------------------- | --------------------- | --------------- | | Weight function return scalar value | simple_set, simple_multiset, simple_map, simple_multimap | simple_setC, simple_multisetC, simple_mapC, simple_multimapC | | Weight function returns vector | orbset, orbmultiset, orbmap, orbmultimap | orbsetC, orbmultisetC, orbmapC, orbmultimapC |

The main interface for the above is actually provided by the orbtree::orbtree and orbtree::orbtreemap classes, but it is recommended to not instantiate these directly, but use the wrappers (which are simple using template definitions with the appropriate template parameter values).

Note: a main difference from STL maps is that values cannot be directly set by the reference returned by map::at() or operator [] or by dereferencing iterators . The reason for this is that setting a value can require recomputing the internal function values that depend on it. Use one of the map::set_value() or map::update_value() or iterator::set_value() functions instead.

What is needed for all cases is to provide a function object that can compute the values associated with the elements stored in the container (whose sum is calculated). For more generality, this function can either return a scalar (to be used be the simple_ variants) or be vector-valued, e.g. to compute results with different parameter values using the same tree.

The expected interface is documented as orbtree::NVFunc_Adapter_Simple ; also, if you would like to use a function that returns a single value, you can use the wrapper orbtree::NVFunc_Adapter_Simple .

A simple example is the following for a function that adds and doubles the key and value (which are both unsigned integers):

// define function to use
struct double_add {
    typedef std::pair<unsigned int, unsigned int> argument_type;
    typedef unsigned int result_type;
    unsigned int operator()(const std::pair<unsigned int, unsigned int>& p) const {
        return 2*(p.first + p.second);
    }
};


// map with the above function
orbtree::simple_map<unsigned int, unsigned int, double_add > map1;

// insert elements
map1.insert(std::make_pair(1U,2U));
map1.insert(std::make_pair(1000U,1234U));
// ...
// calculate sum of function for element with key == 1000
unsigned int res = map1.get_sum(1000U);

A more detailed example with a vector function that multiplies the key with the given parameter and a map that actually stores a "histogram", thus weights need to be multiplied by the value of nodes:

// function to use
struct mult_hist {
    typedef std::pair<unsigned int, unsigned int> argument_type;
    typedef double ParType;
    typedef double result_type;
    double operator()(const std::pair<unsigned int, unsigned int>& p, double a) const {
        return a*p.first*p.second;
    }
};

When declaring the tree, it is possible to use the orbtree::NVFunc_Adapter_Vec adapter and also to give a vector of parameter values to the tree constructor. When calculating weights, the weight function will be called with each parameter value.

std::vector<double> parameters{1.0,2.5,5.555555};
orbtree::orbmapC<unsigned int, unsigned int, orbtree::NVFunc_Adapter_Vec<mult_hist> > map2(parameters);

Insert elements as normal:

map2.insert(std::make_pair(1U,3U);
map2.insert(std::make_pair(10U,1U);
map2.insert(std::make_pair(5U,2U);

Calculation of weights now requires an appropriately sized array:

std::vector<double> sum1(parameters.size());
std::vector<double> norm(parameters.size());
map2.get_sum(10,tmp.data());
map2.get_norm(norm.data());

Note: it is not possible to modify mapped values directly or with iterators, e.g. the following will not compile:

map2[1] = 5;
map2[10]++;
auto it = map2.find(5);
it->second = 1;

It is possible to use map::set_value() or map::update_value() to modify the values, e.g.:

map2.update_value(1,5);
map2.set_value(11,1);

Note that the diffence between the two is that update_value() requires the key to be already present, i.e. it will throw an exception if called with a key not present in the map. On the other hand, set_value() will insert a new element if the key is not present (if the key is present, it will update the value -- of course, you can use insert() to only insert a new value if the key is not yet present in a m

Orbtree

Install / Use

README

Generalized order-statistic tree implementation

Usage