Teseo
A C++ library for the analysis of structural dynamic graphs
Install / Use
/learn @cwida/TeseoREADME
Teseo
This is the source code of Teseo, the system described in the paper D. De Leo, P. Boncz, Teseo and the Analysis of Structural Dynamic Graphs, VLDB 2021 (+ Errata). To repeat the experiments in the paper, refer to the GFE driver. This page describes how to build and use the library, a documentation promised to the final reviewers of the paper :-)
For any question, you can reach me at github[at]whatsthecraic[dot]net .
Build
Prerequisites
- Only Linux is supported.
- Autotools, Autoconf 2.69+
- A C++17 compiler. We relied on both Clang 10 and GCC 10 for the experiments in the paper.
- libnuma 2.0 +
- libevent 2.1.1+
Build instructions
Only the first time, generate the configure script from the root directory with ./autoreconf -iv, this is where Autotools is required. To build the library, create a build directory and, from there, run the configure script spawned by Autotools. There are mainly three kinds of builds:
./configure(without parameters): DEBUG build with debug symbols, assertions and other additional debug code enabled. Suitable for development../configure --enable-test, similar to a DEBUG build, but with the length of the data structures significantly smaller. Suitable for unit testing (see below)../configure --enable-optimize --disable-debug: RELEASE build, it passes to the compiler the flags-O3 -march=native -mtune=native -fno-stack-protector. It does not contain debug symbol, assertions and the extra debug code. This is the configuration used for the experiments of the paper.
Type ./configure --help for the whole set of options.
Finally, to build the library, run make. The following is the complete snippet to create a RELEASE build:
./autoreconf -iv
mkdir build && cd build
../configure --enable-optimize --disable-debug
make -j
Note that there is not a make install target, in contrast to what commonly found in other configure/make scripts.
Linking
For the compiler, include (option -I) both the directories <root_dir>/include AND <build_dir>/include. For the linker, include the static library <build_dir>/libteseo.a and its dependencies libevent and libnuma. For instance, a sample Makefile for a custom hello_world.cpp project with GCC could look like this:
all: hello_world
# Executable
hello_world: hello_world.o
g++ -std=c++17 $^ -L/path/to/teseo/build -lteseo -lnuma -levent_pthreads -levent_core -lpthread -o $@
# Compiled source
hello_world.o: hello_world.cpp
g++ -std=c++17 -c $^ -I/path/to/teseo/include -I/path/to/teseo/build/include -o $@
Assuming that the library was built in the directory /path/to/teseo/build.
Tests
To be able to run the unit tests, the configure script must be first invoked with the option --enable-test.
Afterward, the test-suite can be created with make testsuite. The complete snippet of commands is:
# starting from the root directory of this repository
./autoreconf -iv # if this has not been issued before
mkdir build_unit_testing && cd build_unit_testing
../configure --enable-test
./make testsuite -j
./testsuite
The whole test suite should complete in ~30 minutes, using as reference an AMD 3950X with 16 cores.
To selectively execute only a specific test set, use ./testsuite <testname>. The list of available tests can be retrieved with ./testsuite -l. The library used for unit testing is Catch, refer to its documentation for the other command line options.
Usage
In your application, include the header teseo.hpp, which acts as interface to the library and all exposed methods, part of the API, are documented.
The library only supports undirected weighted graphs. Vertices are identified by 64-bit integers in [0, 2^64 -2]. Edges must be simple, without self-loops. There cannot be multiple edges between the same vertices. Hyperedges , edges connecting more than two vertices, are, as you have already likely imagined, not supported. Weights must be 64-bit floating point numbers, that is doubles. These settings are sufficient to implement the kernels of the LDBC Graphalytics benchmark.
Basic interface
The first step is to create an in-memory database. This is achieved by teseo::Teseo database, there is only one empty constructor. Databases cannot be serialized or saved to disk, when the application is restarted you need to initialise a new database.
Modifying the content of the database can only be achieved through a teseo::Transaction instance. There are two types of transactions: read-write (default) and read-only. The difference is that with a read-only transaction, while the underlying graph cannot be modified, it is generally faster to examine its content. The major penalty of read-write transactions is with scans involving logical vertex identifiers (see below). A transaction instance is created through the method Teseo::start_transaction(bool read_only = false).
To change the content of the graph, the instance methods of a Transaction object are insert_vertex(uint64_t vertex_id), remove_vertex(uint64_t vertex_id), insert_edge(uint64_t source, uint64_t destination, double weight) and remove_edge(uint64_t source, uint64_t destination). These methods return nothing, but remove_vertex, which implicitly removes all edges attached to the vertex being deleted and the method returns
the number of removed edges. As graphs are undirected, it is equivalent to refer to an edge as a -> b and b -> a, that is, insert_edge(a, b, w) and insert_edge(b, a, w) have the same outcome.
Changes performed by a read-write transaction t are not saved in the database and will not be visible by later transactions until t.commit() is invoked. Pending changes can also be discarded by invoking t.roll_back(). Once one of these two methods is called, the transaction instance is consumed and cannot be further
utilised to modify or inspect the content of the graph. A read-only transaction can be terminated by both commit() and roll_back(), the final effect is the same, but it is still important to terminate it to release the acquired resources. When a transaction object goes out of scope, if it has not been explicitly terminated, the system automatically issues a roll_back().
The following snippet illustrates the methods mentioned to create a graph, insert and remove a few vertices and edges:
#include "teseo.hpp"
int main(int argc, const char* argv[]){
teseo::Teseo database;
auto tx1 = database.start_transaction();
// Create the vertices 10, 20, 30, 40
tx1.insert_vertex(10);
tx1.insert_vertex(20);
tx1.insert_vertex(30);
tx1.insert_vertex(40);
// Create the edges 10 - 20, 10 - 30 and 10 - 40
tx1.insert_edge(10, 20, /* weight */ 1.0);
tx1.insert_edge(10, 30, /* weight */ 1.5);
tx1.insert_edge(10, 40, /* weight */ 2.5);
// Remove an edge 10 - 40
// As the graph is undirected, an edge can be equivalently addressed as 10 - 40 and 40 - 10.
tx1.remove_edge(40, 10);
// Remove a vertex
tx1.remove_vertex(20);
// Save the changes
tx1.commit();
return 0;
}
Exploring the graph
Differently from the other systems compared in the paper, Teseo stores the vertices internally using the identifiers set by the user, these are called user identifiers or real identifiers . The alternative approach, used by the other systems and named logical identifiers , is to preemptively translate the vertices into a logical dense domain, in [0, N), where N are the total number of vertices. In Teseo, vertices can be retrieved using both the user identifiers and the logical identifiers .
The following snippet shows how to find all vertices in the graph:
auto tx = database.start_transaction(/* read only ? */ true);
cout << "Number of vertices: " << tx.num_vertices() << "\n";
for(uint64_t i = 0; i < tx.num_vertices(); i++){
uint64_t vertex_id = tx.vertex_id(i);
cout << "[" << i << "] vertex: " << vertex_id << "\n";
}
tx.commit();
The method uint64_t vertex_id(uint64_t logical_id) translates a logical identifier into its associated user identifier. The opposite translation is given by the method uint64_t logical_id(uint64_t vertex_id), which translates a user identifier into its logical association.
The method uint64_t degree(uint64_t id, bool logical = false) returns the number of edges attached to a given vertex. The flag logical determines whether the given vertex identifier is logical (logical = true) or real (logical = false).
The edges attached to a vertex can be accessed through an iterator, created with the method Transaction::iterator(). Rather than continuously invoking a function such as next() or *iterator, in Teseo, an iterator takes as input a callback or a lambda function, which is invoked every time an edge is fetched from the storage. The callback must have one of the following signatures:
bool (*) (uint64_t destination, double weight): receive both the destination and the weight of an edge, use the return value to stop the iterator;bool (*) (uint64_t destination): receive only the destination of an edge, use the return value to stop the iterator;void (*) (uint64_t destination, double weight): receive both the destination and the weight of an edge, the iterator will fetch all edges associated to the vertex;void (*) (uint64_t destination): receive only the destination of an edge, the iterator will fetch all edges associated to the vertex.
The return value of the callback can be used to stop the iterator (`return fal
