Networkanalysis
Java package that provides data structures and algorithms for network analysis.
Introduction
This Java package provides algorithms and data structures for network analysis. Currently, the package focuses on clustering (or community detection) and layout (or mapping) of networks. In particular, the package contains an implementation of the Leiden algorithm and the Louvain algorithm for network clustering and the VOS technique for network layout. Only undirected networks are supported.
The networkanalysis package was developed by Nees Jan van Eck, Vincent Traag, and Ludo Waltman at the Centre for Science and Technology Studies (CWTS) at Leiden University.
Documentation
Documentation is provided in the source code in Javadoc format.
The documentation is also available in compiled form.
Installation
Maven
<dependency>
<groupId>nl.cwts</groupId>
<artifactId>networkanalysis</artifactId>
<version>1.3.0</version>
</dependency>
Gradle
implementation group: 'nl.cwts', name: 'networkanalysis', version: '1.3.0'
Usage
The networkanalysis package requires Java 8 or higher.
The latest version of the package is available as a pre-compiled jar on Maven Central and GitHub Packages.
Instructions for compiling the source code of the package are provided below.
To run the clustering algorithms, the command-line tool RunNetworkClustering is provided.
The tool can be run as follows:
java -cp networkanalysis-1.3.0.jar nl.cwts.networkanalysis.run.RunNetworkClustering
If no further arguments are provided, the following usage notice will be displayed:
RunNetworkClustering version 1.3.0
By Vincent Traag, Ludo Waltman, and Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
Usage: RunNetworkClustering [options] <filename>
Identify clusters (also known as communities) in a network using either the
Leiden or the Louvain algorithm.
The file in <filename> is expected to contain a tab-separated edge list
(without a header line). Nodes are represented by zero-index integer numbers.
Only undirected networks are supported. Each edge should be included only once
in the file.
Options:
-q --quality-function {CPM|Modularity} (default: CPM)
Quality function to be optimized. Either the CPM (constant Potts model) or
the modularity quality function can be used.
-n --normalization {none|AssociationStrength|Fractionalization} (Default: none)
Method for normalizing edge weights in the CPM quality function.
-r --resolution <resolution> (default: 1.0)
Resolution parameter of the quality function.
-m --min-cluster-size <min. cluster size> (default: 1)
Minimum number of nodes per cluster.
-a --algorithm {Leiden|Louvain} (default: Leiden)
Algorithm for optimizing the quality function. Either the Leiden or the
Louvain algorithm can be used.
-s --random-starts <random starts> (default: 1)
Number of random starts of the algorithm.
-i --iterations <iterations> (default: 10)
Number of iterations of the algorithm.
--randomness <randomness> (default: 0.01)
Randomness parameter of the Leiden algorithm.
--seed <seed> (default: random)
Seed of the random number generator.
-w --weighted-edges
Indicates that the edge list file has a third column containing edge
weights.
--sorted-edge-list
Indicates that the edge list file is sorted. The file should be sorted based
on the nodes in the first column, followed by the nodes in the second
column. Each edge should be included in both directions in the file.
--input-clustering <filename> (default: singleton clustering)
Read the initial clustering from the specified file. The file is expected to
contain two tab-separated columns (without a header line), first a column of
nodes and then a column of clusters. Nodes and clusters are both represented
by zero-index integer numbers. If no file is specified, a singleton
clustering (in which each node has its own cluster) is used as the initial
clustering.
-o --output-clustering <filename> (default: standard output)
Write the final clustering to the specified file. If no file is specified,
the standard output is used.
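The input format described in the usage notice (tab-separated columns, no header line, nodes as integers counted from zero, each undirected edge listed once) can be checked before the tool is run. The following is a minimal sketch in plain Java. The names `EdgeListCheck` and `findProblems` are hypothetical and are not part of the networkanalysis package, and the checks only cover the rules quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the networkanalysis package. */
final class EdgeListCheck {

    /** Returns a description of each problem found; an empty list means no problem was detected. */
    static List<String> findProblems(List<String> lines, boolean weighted) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        int expectedColumns = weighted ? 3 : 2;

        for (int i = 0; i < lines.size(); i++) {
            String where = "line " + (i + 1) + ": ";
            String[] cols = lines.get(i).split("\t", -1);
            if (cols.length != expectedColumns) {
                problems.add(where + "expected " + expectedColumns + " tab-separated columns");
                continue;
            }
            int a;
            int b;
            try {
                a = Integer.parseInt(cols[0]);
                b = Integer.parseInt(cols[1]);
                if (weighted) {
                    Double.parseDouble(cols[2]); // third column holds the edge weight
                }
            } catch (NumberFormatException e) {
                problems.add(where + "non-numeric value (possibly a header line)");
                continue;
            }
            if (a < 0 || b < 0) {
                problems.add(where + "node numbers must not be negative");
                continue;
            }
            // The network is undirected, so (a, b) and (b, a) are the same edge.
            String key = Math.min(a, b) + "-" + Math.max(a, b);
            if (!seen.add(key)) {
                problems.add(where + "edge is already listed");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // The last line repeats the edge between nodes 2 and 0 in the other direction.
        List<String> lines = Arrays.asList("0\t1", "1\t2", "2\t0", "0\t2");
        System.out.println(findProblems(lines, false));
    }
}
```

Passing `true` for `weighted` corresponds to the -w option, where a third column with edge weights is expected.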
To run the layout algorithm, the command-line tool RunNetworkLayout is provided.
The tool can be run as follows:
java -cp networkanalysis-1.3.0.jar nl.cwts.networkanalysis.run.RunNetworkLayout
If no further arguments are provided, the following usage notice will be displayed:
RunNetworkLayout version 1.3.0
By Nees Jan van Eck and Ludo Waltman
Centre for Science and Technology Studies (CWTS), Leiden University
Usage: RunNetworkLayout [options] <filename>
Determine a layout for a network using the gradient descent VOS layout
algorithm.
The file in <filename> is expected to contain a tab-separated edge list
(without a header line). Nodes are represented by zero-index integer numbers.
Only undirected networks are supported. Each edge should be included only once
in the file.
Options:
-q --quality-function {VOS|LinLog} (default: VOS)
Quality function to be optimized. Either the VOS (visualization of
similarities) or the LinLog quality function can be used.
-n --normalization {none|AssociationStrength|Fractionalization} (Default: none)
Method for normalizing edge weights in the VOS quality function.
-a --attraction <attraction> (Default: 2)
Attraction parameter of the VOS quality function.
-r --repulsion <repulsion> (Default: 1)
Repulsion parameter of the VOS quality function.
-s --random-starts <random starts> (default: 1)
Number of random starts of the gradient descent algorithm.
-i --max-iterations <max. iterations> (default: 1000)
Maximum number of iterations of the gradient descent algorithm.
--initial-step-size <initial step size> (default: 1.0)
Initial step size of the gradient descent algorithm.
--min-step-size <min. step size> (default: 0.001)
Minimum step size of the gradient descent algorithm.
--step-size-reduction <step size reduction> (default: 0.75)
Step size reduction of the gradient descent algorithm.
--required-quality-value-improvements <required quality value improvements>
(default: 5)
Required number of quality value improvements of the gradient descent
algorithm.
--seed <seed> (default: random)
Seed of the random number generator.
-w --weighted-edges
Indicates that the edge list file has a third column containing edge
weights.
--sorted-edge-list
Indicates that the edge list file is sorted. The file should be sorted based
on the nodes in the first column, followed by the nodes in the second
column. Each edge should be included in both directions in the file.
--input-layout <filename> (default: random layout)
Read the initial layout from the specified file. The file is expected to
contain three tab-separated columns (without a header line), first a column
of nodes, then a column of x coordinates, and finally a column of
y coordinates. Nodes are represented by zero-index integer numbers. If no
file is specified, a random layout (in which each node is positioned at
random coordinates) is used as the initial layout.
-o --output-layout <filename> (default: standard output)
Write the final layout to the specified file. If no file is specified,
the standard output is used.
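For the --sorted-edge-list option, the usage notice asks for a file in which each edge appears in both directions, sorted by the first column and then by the second. The sketch below shows one way to derive that ordering from a list that contains each edge once. It assumes unweighted edges without self-loops, and `SortedEdgeList` is a hypothetical name that is not part of the package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper for illustration; not part of the networkanalysis package. */
final class SortedEdgeList {

    /** Adds the reverse of every edge, then sorts by first node and second node. */
    static List<int[]> toSortedBothDirections(List<int[]> edges) {
        List<int[]> result = new ArrayList<>();
        for (int[] e : edges) {
            result.add(new int[] {e[0], e[1]});
            result.add(new int[] {e[1], e[0]}); // same edge in the other direction
        }
        result.sort(Comparator.<int[]>comparingInt(e -> e[0]).thenComparingInt(e -> e[1]));
        return result;
    }

    public static void main(String[] args) {
        List<int[]> edges = Arrays.asList(
                new int[] {0, 1}, new int[] {1, 2}, new int[] {2, 0});
        for (int[] e : toSortedBothDirections(edges)) {
            System.out.println(e[0] + "\t" + e[1]); // tab-separated, as the tools expect
        }
    }
}
```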
Example
The following example illustrates the use of the RunNetworkClustering and RunNetworkLayout tools.
Consider this network:
0-----1
 \   /
  \ /
   2
   |
   3
  / \
 /   \
4-----5
The network is encoded as an edge list that is saved in a text file containing two tab-separated columns:
0 1
1 2
2 0
2 3
3 5
5 4
4 3
Nodes must be represented by integer numbers starting from 0.
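Because the columns must be separated by tab characters, it can be safer to generate the file than to type it by hand. The following sketch writes the example edge list to network.txt with plain Java. `WriteExampleNetwork` is a hypothetical class name and is not part of the package.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the networkanalysis package. */
final class WriteExampleNetwork {

    /** The seven edges of the example network, each listed once. */
    static final int[][] EDGES = {
        {0, 1}, {1, 2}, {2, 0}, {2, 3}, {3, 5}, {5, 4}, {4, 3}
    };

    /** Formats each edge as one line with the two node numbers separated by a tab. */
    static List<String> toLines(int[][] edges) {
        List<String> lines = new ArrayList<>();
        for (int[] e : edges) {
            lines.add(e[0] + "\t" + e[1]);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // No header line is written, matching the format in the usage notice.
        Files.write(Paths.get("network.txt"), toLines(EDGES), StandardCharsets.UTF_8);
    }
}
```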
Assuming that the edge list has been saved in the file network.txt, the RunNetworkClustering tool can be run as follows:
java -cp networkanalysis-1.3.0.jar nl.cwts.networkanalysis.run.RunNetworkClustering -r 0.2 -o clusters.txt network.txt
In this case, clusters are identified using the Leiden algorithm.
The CPM (constant Potts model) quality function is used without normalizing edge weights.
A value of 0.2 is used for the resolution parameter.
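To give a feel for what the resolution parameter does, the sketch below evaluates three candidate clusterings of the example network. It uses the CPM formulation from the Leiden algorithm paper, in which each cluster contributes its number of internal edges minus the resolution times the number of node pairs in the cluster. This is an assumption made for illustration: the package may scale or normalize the quality value differently, and `CpmQuality` is a hypothetical class that is not part of the package.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the networkanalysis package. */
final class CpmQuality {

    /**
     * Unweighted CPM value: sum over clusters of
     * (internal edges) - resolution * (number of node pairs in the cluster).
     * cluster[i] is the cluster number of node i.
     */
    static double quality(int[][] edges, int[] cluster, double resolution) {
        int nClusters = 0;
        for (int c : cluster) {
            nClusters = Math.max(nClusters, c + 1);
        }
        long[] size = new long[nClusters];
        long[] internalEdges = new long[nClusters];
        for (int c : cluster) {
            size[c]++;
        }
        for (int[] e : edges) {
            if (cluster[e[0]] == cluster[e[1]]) {
                internalEdges[cluster[e[0]]]++;
            }
        }
        double q = 0.0;
        for (int c = 0; c < nClusters; c++) {
            q += internalEdges[c] - resolution * size[c] * (size[c] - 1) / 2.0;
        }
        return q;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}, {2, 3}, {3, 5}, {5, 4}, {4, 3}};
        int[] twoTriangles = {0, 0, 0, 1, 1, 1};
        int[] oneCluster = {0, 0, 0, 0, 0, 0};
        int[] singletons = {0, 1, 2, 3, 4, 5};
        double resolution = 0.2;
        System.out.printf(Locale.ROOT, "two triangles: %.1f%n", quality(edges, twoTriangles, resolution));
        System.out.printf(Locale.ROOT, "one cluster:   %.1f%n", quality(edges, oneCluster, resolution));
        System.out.printf(Locale.ROOT, "singletons:    %.1f%n", quality(edges, singletons, resolution));
    }
}
```

Under this formulation and a resolution of 0.2, splitting the network into its two triangles scores about 4.8, a single cluster of all six nodes about 4.0, and the singleton clustering 0.0.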
The resulting clustering is saved in the text file clusters.txt that contains two tab-separated columns:
0 0
1 0
2 0
3 1
4 1
5 1
The file clusters.txt shows that two clusters have been identified.
The first column in the file represents a node, and the second column represents the cluster to which the node belongs.
Cluster 0 includes nodes 0, 1, and 2.
Cluster 1 includes nodes 3, 4, and 5.
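The two-column output can be turned back into a list of nodes per cluster with a few lines of Java. The sketch below assumes the tab-separated format described above. `ClusteringReader` is a hypothetical class name and is not part of the package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of the networkanalysis package. */
final class ClusteringReader {

    /** Maps each cluster number to the nodes assigned to it, in file order. */
    static SortedMap<Integer, List<Integer>> groupByCluster(List<String> lines) {
        SortedMap<Integer, List<Integer>> clusters = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // tolerate a trailing blank line
            }
            String[] cols = line.split("\t");
            int node = Integer.parseInt(cols[0].trim());
            int cluster = Integer.parseInt(cols[1].trim());
            clusters.computeIfAbsent(cluster, k -> new ArrayList<>()).add(node);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Contents of clusters.txt from the example, one string per line.
        List<String> lines = Arrays.asList("0\t0", "1\t0", "2\t0", "3\t1", "4\t1", "5\t1");
        System.out.println(groupByCluster(lines)); // prints {0=[0, 1, 2], 1=[3, 4, 5]}
    }
}
```

To read an actual file, the lines can be obtained with `Files.readAllLines(Paths.get("clusters.txt"))` from `java.nio.file`.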
The RunNetworkLayout tool can be run as follows:
