Ligra: A Lightweight Graph Processing Framework for Shared Memory

Organization

The code for Ligra, Ligra+, and Hygra is located in the ligra/ directory. The code for the applications is in the apps/ directory, which is where compilation should be performed. Example inputs are provided in the inputs/ directory. Graph and hypergraph utilities are provided in the utils/ directory.

Compilation

Compilation is done from within the apps/ directory. The compiled code will work on both uncompressed and compressed graphs and hypergraphs.

Compilers

g++ >= 5.3.0 with support for Cilk Plus
g++ >= 5.3.0 with OpenMP
Intel icpc compiler

To compile with g++ using Cilk Plus, define the environment variable CILK. To compile with icpc, define the environment variable MKLROOT and make sure CILK is not defined. To compile with OpenMP, define the environment variable OPENMP and make sure CILK and MKLROOT are not defined. Using Cilk Plus seems to give the best parallel performance in our experience. To compile with g++ with no parallel support, make sure CILK, MKLROOT and OPENMP are not defined.

Note: OpenMP support in Ligra has not been thoroughly tested. If you experience any errors, please send an email to Julian Shun.

For processing compressed graph and hypergraph files, there are three compression schemes currently implemented that can be used---byte codes, byte codes with run-length encoding and nibble codes. By default, the code is compiled for byte codes with run-length encoding. To use byte codes instead, define the environment variable BYTE, and to use nibble codes instead, define the environment variable NIBBLE. Parallel decoding within a vertex can be enabled by defining the environment variable PD (by default, a vertex's edge list is decoded sequentially).

After the appropriate environment variables are set, to compile, simply run

$ make -j  #compiles with all threads

The following commands cleans the directory:

$ make clean #removes all executables
$ make cleansrc #removes all executables and linked files from the ligra/ directory

Running code in Ligra

The applications take the input graph as input as well as an optional flag "-s" to indicate a symmetric graph. Symmetric graphs should be called with the "-s" flag for better performance. For example:

$ ./BFS -s ../inputs/rMatGraph_J_5_100
$ ./BellmanFord -s ../inputs/rMatGraph_WJ_5_100

For BFS, BC and BellmanFord, one can also pass the "-r" flag followed by an integer to indicate the source vertex. rMat graphs along with other graphs can be generated with the graph generators in the utils/ directory. By default, the applications are run four times, with times reported for the last three runs. This can be changed by passing the flag "-rounds" followed by an integer indicating the number of timed runs.

On NUMA machines, adding the command "numactl -i all " when running the program may improve performance for large graphs. For example:

$ numactl -i all ./BFS -s <input file>

Running code in Hygra

The hypergraph applications are located in the apps/hyper/ directory. The applications take the input hypergraph as input as well as an optional flag "-s" to indicate a symmetric hypergraph. Symmetric hypergraphs should be called with the "-s" flag for better performance. For example:

$ ./HyperBFS -s ../inputs/test
$ ./HyperSSSP -s ../inputs/test-wgh

For traversal algorithms, one can also pass the "-r" flag followed by an integer to indicate the source vertex. Random hypergraphs can be generated with the hypergraph generator in the utils/ directory.

Running code on compressed graphs and hypergraphs (Ligra+)

When using Ligra+, graphs and hypergraphs must first be compressed using the encoder program provided (encoder and hypergraphEncoder). The encoder program takes as input a file in the format described in the next section, as well as an output file name. For symmetric graphs and hypergraphs, the flag "-s" should be passed before the filenames, and for weighted graphs and hypergraphs, the flag "-w" should be passed before the filenames. For example:

$ ./encoder -s ../inputs/rMatGraph_J_5_100 ../inputs/rMatGraph_J_5_100.compressed
$ ./encoder -s -w ../inputs/rMatGraph_WJ_5_100 ../inputs/rMatGraph_WJ_5_100.compressed
$ ./hypergraphEncoder -s ../inputs/test ../inputs/test.compressed
$ ./hypergraphEncoder -s -w ../inputs/test-wgh ../inputs/test-wgh.compressed

After compressing the inputs, the applications can be run in the same manner as on uncompressed inputs, but with an additional "-c" flag. For example:

$ ./BFS -s -c ../inputs/rMatGraph_J_5_100.compressed
$ ./BellmanFord -s -c ../inputs/rMatGraph_WJ_5_100.compressed
$ ./HyperBFS -s -c ../inputs/test.compressed
$ ./HyperSSSP -s -c ../inputs/test-wgh.compressed

Make sure that the compression method used for compilation of the applications is consistent with the method used to compress the input with the encoder program.

Input Format for Ligra applications

The input format of unweighted graphs should be in one of two formats (the Ligra+ encoder currently only supports the first format).

The adjacency graph format from the Problem Based Benchmark Suite (http://www.cs.cmu.edu/~pbbs/benchmarks/graphIO.html). The adjacency graph format starts with a sequence of offsets one for each vertex, followed by a sequence of directed edges ordered by their source vertex. The offset for a vertex i refers to the location of the start of a contiguous block of out edges for vertex i in the sequence of edges. The block continues until the offset of the next vertex, or the end if i is the last vertex. All vertices and offsets are 0 based and represented in decimal. The specific format is as follows:

AdjacencyGraph
<n>
<m>
<o0>
<o1>
...
<o(n-1)>
<e0>
<e1>
...
<e(m-1)>

This file is represented as plain text.

In binary format. This requires three files NAME.config, NAME.adj, and NAME.idx, where NAME is chosen by the user. The .config file stores the number of vertices in the graph in text format. The .idx file stores in binary the offsets for the vertices in the CSR format (the <o>'s above). The .adj file stores in binary the edge targets in the CSR format (the <e>'s above).

Weighted graphs: For format (1), the weights are listed at the end of the file (after <e(m-1)>), and the first line of the file should store the string "WeightedAdjacencyGraph". For format (2), the weights are stored after all of the edge targets in the .adj file.

By default, format (1) is used. To run an input with format (2), pass the "-b" flag as a command line argument.

By default the offsets are stored as 32-bit integers, and to represent them as 64-bit integers, compile with the variable LONG defined. By default the vertex IDs (edge values) are stored as 32-bit integers, and to represent them as 64-bit integers, compile with the variable EDGELONG defined.

Input Format for Hygra applications

The input can be in either adjacency hypergraph format or binary format, similar to graphs.

The adjacency hypergraph format starts with a sequence of offsets one for each vertex, followed by a sequence of incident hyperedges (the vertex is an incoming member of the hyperedge) ordered by vertex, followed by a sequence of offsets one for each hyperedge, and finally a sequence of incident vertices (the vertex is an outgoing member of the hyperedge) ordered by hyperedge. All vertices, hyperedges, and offsets are 0 based and represented in decimal. For a graph with nv vertices, mv incident hyperedges for the vertices, nh hyperedges, and mh incident vertices for the hyperedges, the specific format is as follows:

AdjacencyHypergraph
<nv> <mv> <nh> <mh> <ov0>
<ov1>
...
<ov(nv-1)>
<ev0>
<ev1>
...
<ev(mv-1)>
<oh0>
<oh1>
...
<oh(nh-1)>
<eh0>
<eh1>
...
<eh(mh-1)>

This file is represented as plain text.

In binary format. This requires five files NAME.config, NAME.vadj, NAME.vidx, NAME.hadj, NAME.hidx, where NAME is chosen by the user. The .config file stores nv, mv, nh, and mh in text format. The .vidx and .hidx files store in binary the offsets for the vertices and hyperedges (the <ov>'s and <oh>'s above). The .vadj and .hadj files stores in binary the neighbors (the <ev>'s and <eh>'s above).

Weighted hypergraphs: For format (1), the weights are listed as another sequence following the sequence of neighbors for vertices or hyperedges file (i.e., after <ev(mv-1)> and <eh(mh-1)>), and the first line of the file should store the string "WeightedAdjacencyHypergraph". For format (2), the weights are stored after all of the edge targets in the .vadj and .hadj files.

Utilities

Several utilities are provided in the utils/ directory and can be compiled using "make".

Graph Generators

Three graph generators from the PBBS project are provided. rMatGraph generators an rMat graph (described by Chakrabarti, Zhan and Faloutsos in SDM '04). The required parameters are the number of vertices and the output file. By default the number of directed edges is set to 10 times the number of vertices, and can be changed by specifying the "-m" flag followed by the number of edges. The default parameters are (a=.5, b=.1, c=.1 and d=.3), and can be changed by specifying the "-a", "-b", and "-c" flags each followed by the desired probability (d=1-a-b-c). The "-r" flag followed by an integer specifies the random seed (default value of 1). randLocalGraph generators a random graph, and the required parameters are the number of vertices (10 times the number of edges are generated by default, and can be chang

Ligra

Install / Use

README