Brandenburgische Technische Universität Cottbus
Institut für Informatik
Lehrstuhl Software-Systemtechnik
Diploma Thesis
A Multi-Level Algorithm for
Modularity Graph Clustering
Randolf Rotta
June 30, 2008
1st Reviewer: Prof. Claus Lewerentz
2nd Reviewer: Prof. Klaus Meer
Advisor: Dr. Andreas Noack
Declaration of Authorship
I hereby declare that I have written this diploma thesis independently and without unauthorized assistance. All aids and sources used are listed completely in the bibliography, and all passages taken literally or in content from the sources used are marked as such.
Cottbus, June 2008
Acknowledgments
I would like to express very special thanks to my advisor Dr. Andreas Noack. Only his valuable, focused comments made this work possible. Further thanks go to the chairs of Software Systems Engineering and Theoretical Computer Science for providing the computing resources for the evaluation of the algorithms. I also owe great thanks to my parents and to Dr. Romain Gengler for their patience and motivating support. Furthermore, I thank my many fellow students for numerous stimulating and insightful discussions.
Last but not least, the many authors of helpful tools and programs deserve thanks. These include the GNU Autotools, the GNU Compiler Collection, and the C++ Boost libraries, in particular the Spirit parser and scanner framework for file processing, the IOStreams package for transparent compression, and the Boost Graph Library for practical algorithms and many ideas. The graphs for the evaluation come from collections by Mark Newman, Alex Arenas, Uri Alon, and the Pajek Project around Vladimir Batagelj. The analysis of the experimental data was made possible by the GNU R Project for Statistical Computing, the iGraph library, and the GGobi Data Visualization System.
Contents

List of Figures
List of Tables
1 Introduction
  1.1 An Introductory Example
  1.2 Objectives and Outline
2 Graph Clustering
  2.1 Basic Definitions
    2.1.1 Graphs
    2.1.2 Attributed Graphs
    2.1.3 Graph Clusterings and Quality Measures
  2.2 The Modularity Measure of Newman
  2.3 Density-Based Clustering Quality Measures
    2.3.1 Volume and Density
    2.3.2 Bias and the Null Model
    2.3.3 Derived Quality Measures
    2.3.4 Volume Models
  2.4 Fundamental Clustering Strategies
    2.4.1 Directing Components
    2.4.2 Constructing Components
    2.4.3 Meta-Strategies
3 The Multi-Level Refinement Algorithm
  3.1 The Multi-Level Scheme
  3.2 Graph Coarsening
    3.2.1 Greedy Grouping
    3.2.2 Greedy Matching
  3.3 Merge Selectors
    3.3.1 Local Merge Selectors
    3.3.2 Random Walks
    3.3.3 Spectral Methods
  3.4 Cluster Refinement
    3.4.1 Design Space of Local Refinement Algorithms
    3.4.2 Greedy Refinement
    3.4.3 Kernighan-Lin Refinement
  3.5 Further Implementation Notes
    3.5.1 Index Spaces and Graphs
    3.5.2 Data Management
4 Evaluation
  4.1 Methods and Data
    4.1.1 Configuration Space
    4.1.2 Effectiveness
    4.1.3 Efficiency and Scalability
  4.2 Effectiveness of the Graph Coarsening
    4.2.1 Match Fraction
    4.2.2 Coarsening Methods
    4.2.3 Reduction Factor
  4.3 Effectiveness of the Merge Selectors
    4.3.1 Random Walk Distance
    4.3.2 Random Walk Reachability
    4.3.3 Comparison of Merge Selectors
  4.4 Effectiveness of the Cluster Refinement
    4.4.1 Greedy Refinement
    4.4.2 Kernighan-Lin Refinement
  4.5 Scalability
  4.6 Comparison to Reference Algorithms
  4.7 Comparison to Published Results
    4.7.1 The Graphs and Clusterings
    4.7.2 Summary
5 Results and Future Work
  5.1 Results of this Work
  5.2 By-Products
  5.3 Directions for Future Work
    5.3.1 Pre-Coarsening
    5.3.2 Study of Merge Selectors
    5.3.3 Linear Programming
    5.3.4 Multi-Pass Clustering and Randomization
    5.3.5 High-Level Refinement Search
Bibliography
A The Benchmark Graph Collection
B Clustering Results
List of Figures

1.1 Graph of the Mexican political elite
2.1 Example Volume Models
2.2 Graph of US Airports
2.3 Recursive Subdivision and Hierarchical Clusterings
3.1 Operations for local cluster modification
3.2 The Multi-Level Scheme
3.3 The Multi-Level Algorithm
3.4 Coarsening Method: Greedy Grouping
3.5 Merging two Vertices and their Edges
3.6 Coarsening Method: Greedy Matching
3.7 Contribution of Neighbor Vertices to the Visit Probability
3.8 Spectral vertex vectors and two cluster vectors
3.9 Dependencies when Moving a Vertex
3.10 Refinement Method: Complete Greedy
3.11 Refinement Method: Sorted Greedy
3.12 Refinement Method: basic Kernighan-Lin
3.13 Kernighan-Lin Refinement Creating Clusters on Graph Epa main
3.14 Effective Search Depth of Kernighan-Lin Refinement
3.15 Index Spaces: Class and Concept Diagram
3.16 Index Maps: Class and Concept Diagram
3.17 C++ Example: Calculation of Vertex Degrees
4.1 Mean Modularity by Match Fraction (reduced set)
4.2 Modularity and Runtime by Reduction Factor
4.3 The Random Walk Distance
4.4 The Random Walk Reachability
4.5 Mean Modularity of the Merge Selectors (large set)
4.6 Runtime of the Merge Selectors
4.7 Mean Modularity by Refinement Method (reduced set)
4.8 Mean Modularity by Refinement Method (large set)
4.9 Runtime by Graph Size
4.10 Clustering Results and Runtime of the Reference Algorithms
4.11 Reference Graphs (karate and dolphins)
4.12 Reference Graphs (polBooks and afootball)
4.13 Reference Graphs (polBooks and celegans metabolic)
4.14 Reference Graphs (circuit s838 and email)
List of Tables

2.1 Overview of Modularity Graph Clustering Algorithms
2.2 Overview of Clustering and Partitioning Algorithms
3.1 Proposed Merge Selection Qualities
3.2 Classification of Refinement Algorithms
3.3 Hierarchical Naming Convention for File Names
4.1 The Configuration Space
4.2 Mean Modularity by Match Fraction
4.3 Mean Modularity by Reduction Factor
4.4 Runtime by Reduction Factor
4.5 Mean Modularity with Random Walk Distance
4.6 Mean Modularity with Random Walk Reachability
4.7 Mean Modularity of Merge Selectors
4.8 Mean Modularity by Refinement Method (reduced set)
4.9 Mean Modularity by Refinement Method (large set)
4.10 Clustering Results of Reference Algorithms
4.11 Comparison to Published Results
A.1 The Benchmark Graph Collection
A.2 References to the Graph Sources
B.1 Random Walk Distance by Graph
B.2 Random Walk Reachability, Length 2
B.3 Random Walk Reachability, Length 3
B.4 Clustering Results from the Refinement Phase
B.5 Runtime Measurements
B.6 Runtime of the Reference Algorithms
1 Introduction
Since the rise of computers and related information technologies, it has become easy to collect vast amounts of data. Combined with technological advances in linking persons, discussions, and documents, a wide range of relational information is readily available for analysis. These networks can provide insight into the functioning of whole societies. Thus network analysis is nowadays an important tool also outside scientific communities, for example in market analysis and politics.
The exploration and utilization of collected data requires elaborate analysis strategies supported by computational methods. A common abstraction for relational networks are graphs. In these, entities like persons are represented by vertices. Graphs describe the observed relationships between entities by edges connecting related vertices. In addition, the edges may be attributed with weights to describe the strength of connections.
Graph clustering is the problem and process of grouping vertices into disjoint sets called clusters. Discovering such clusters makes it possible to analyze the underlying shared properties within and differences between groups. Certainly many different clusterings are possible. In order to find practically useful ones, a quality measure is necessary. This measure leads optimization algorithms to interesting clusterings. A few years ago Mark Newman introduced a specific quality measure called modularity [60]. It was developed for the analysis of social networks but also proved to be useful in other scientific areas.
1.1 An Introductory Example
An example drawing of such a network is given in Figure 1.1. It is based on a
study of Jorge Gil-Mendieta and Samuel Schmidt [30] about the network of influential Mexican politicians between 1910 and 1994. In the graph each politician is
represented by a vertex and edges connect pairs of politicians based on friendships,
business partnerships, marriages, and similar. In the figure vertices are displayed
as boxes and circles with a size proportional to their number of connections. The
placement of the vertices was computed with the LinLog energy model [62] and
edges are drawn as straight lines connecting related vertices.
In the figure the vertices are colored according to a clustering of the politicians. The shown clustering was constructed automatically by the algorithm developed in this work by optimizing the modularity measure. Three major clusters are visible. These roughly correspond to three generational changes and to differences between military and civil backgrounds. Similar groups were also discovered by Gil-Mendieta and Schmidt, who manually derived groups by studying the importance and influence of individual persons on the network using various special-purpose centrality indices.
Figure 1.1: Graph of the Mexican political elite [30] from 1910 to 1994. Each box
represents a Mexican politician. The family name and the main year of political
activity are printed for each. The names of actual presidents are printed in red.
The lines connect related politicians based on friendships, business partnerships,
marriages, and similar. Finally the color of the boxes shows the best clustering
computed with the algorithms presented in this work.
1.2 Objectives and Outline
The main objective of this work is the implementation and evaluation of effective and efficient clustering algorithms for the modularity quality measure of Newman [60]. Optimizing the modularity is an NP-complete problem [13]. Therefore it is unlikely that a fast, polynomial-time algorithm finds a clustering of optimal modularity in all graphs. Instead, relatively good clusterings have to suffice.
The major design goal for the clustering algorithm is effectiveness. The developed algorithm shall be able to find clusterings of very good modularity compared to alternative methods. Secondly, efficiency is a concern. Complicated and expensive algorithms should be replaced by simpler alternatives in case they are not able to produce significantly better clusterings.
This work focuses on multi-level refinement heuristics. In the past these were successfully applied to similar problems such as minimum cut partitioning [41, 43, 78]. However, modularity clustering differs significantly from these problems. For example, the optimization aim is more complex because the number of clusters is not known in advance but has to be discovered by the algorithm. To date no detailed study exists on the adaptation of multi-level refinement to modularity clustering. In this context a comparison of the effectiveness and efficiency to other clustering algorithms is also necessary.
Chapter 2 introduces the mathematical formulation of the modularity clustering aim and discusses related concepts. This analysis provides the basis for the later development of strategies specific to the modularity. The chapter concludes in Section 2.4 with a study of existing clustering algorithms. From these, fundamental strategies and components are derived. The developed multi-level algorithm will combine several particularly useful components to benefit from their advantages.
Based on the insights into the mathematical structure and the fundamental clustering strategies, Chapter 3 presents the developed family of algorithms. The algorithms are
based on multi-level refinement methods adapted to the special needs of modularity
clustering. The algorithm operates in two phases. First a hierarchy of coarsened
graphs is produced. Then this hierarchy is used in the refinement of an initial
clustering.
The coarsening phase involves the selection of pairs of clusters to be merged. A sub-goal of this work is the exploration of alternative selection criteria in order to improve the effectiveness and the overall clustering results. This includes criteria derived from random walks and spectral methods as presented in Section 3.3. In addition, the implementation should incorporate appropriate data structures and algorithms to efficiently support these merge selectors.
The second major component of the algorithm is the cluster refinement heuristic. These heuristics try to improve an initial clustering by exploring similar clusterings and selecting one of them. Here such neighbor clusterings are generated by moving single vertices. The objective of Section 3.4 is the exploration of heuristics for the search for local optima, i.e. clusterings without any neighbor clustering of better modularity. Here efficiency is an important concern because many traditional heuristics, like Kernighan-Lin refinement [45, 24], cannot be efficiently implemented for the modularity measure.
In order to assess the practical usefulness of the developed algorithm an empirical
evaluation is carried out in Chapter 4. The experimental measurements are based on
a set of benchmark graphs collected from a wide range of application domains. First
the different configurations of the algorithm are compared regarding their efficiency
and effectiveness in finding good clusterings. Based on this evaluation an efficient
and effective default configuration and a few effective variations are chosen. Then
these are compared against a range of reference algorithms and clustering results
found in the literature.
2 Graph Clustering
This chapter introduces graph clustering with a focus on Newman's modularity as the optimization aim. It should provide the necessary foundations to develop and compare clustering algorithms. For this purpose a mathematical analysis of the modularity and related models is as necessary as the study of existing algorithms.
This chapter is organized as follows. First, graphs and clusterings are formally defined together with helpful mathematical notations. The second section introduces the modularity measure in the form used throughout this work. In order to substantiate its relationship to other quality measures, the third section presents an axiomatic derivation from the density concept. The chapter concludes with a review and summary of fundamental clustering strategies.
2.1 Basic Definitions
This section formally introduces graphs, weights and clusterings. In the first subsection different types of graphs are discussed. Undirected graphs will be used in the
derivation of quality measures while symmetric, directed graphs are used as internal
representation in the implementation. The second subsection on attributed graphs
defines vertex and edge properties and basic computations on edge weights. The
final subsection formally introduces clusterings and quality measures.
2.1.1 Graphs
An undirected graph G = (V, E) is a set of vertices V combined with a set of edges
E. Two vertices u, v ∈ V are said to be adjacent, if they are connected by an edge
e ∈ E. In that case, u and v are called end-vertices of the edge e and the edge is
incident to both vertices. Each edge connects at most two different vertices and is
called self-edge if it connects a single vertex just to itself.
Let E(X, Y ) ⊆ E be the subset of edges connecting vertices in X ⊆ V with
vertices in Y ⊆ V in an undirected graph. The edges incident to vertex v are E(v) :=
E({v}, V ) and two vertices u, v are adjacent if E(u, v) is non-empty. Conversely let
V (e) ⊆ V be the set of end-vertices incident to edge e ∈ E. For example the
neighborhood of a vertex is the set of all adjacent vertices including itself and is
expressed by N (v) := V (E(v)) ∪ {v}. The subgraph induced by vertices X ⊆ V is
G[X] := (X, E(X, X)). A graph is bipartite if the vertices can be dissected into two
disjoint sets A, B and all edges lie between both sets, i.e. E(A, B) = E.
In directed graphs all edges are oriented and connect a start-vertex with an endvertex. A directed graph is symmetric if for each edge also an edge in the opposing
direction exists. In that case the graph topology is equal to an undirected graph. In
symmetric, directed graphs for each vertex an enumeration of its incident edges is
stored and these edges point to the other end vertex. This form is used as internal
representation for undirected graphs because it is easier to traverse.
An undirected graph is called simple if each pair of vertices is connected by at
most one edge. Otherwise several parallel edges may exist connecting the same end-vertices. Such graphs are called multigraphs. Similarly a directed graph is simple if
it has at most one edge per direction. Throughout this work only simple graphs are
used.
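To make this internal representation concrete, the following C++ sketch stores an undirected graph as a symmetric, directed adjacency structure: every undirected edge is inserted as two opposing half-edges, each pointing to the other end-vertex. The type names and the self-edge convention are illustrative assumptions, not the data structures actually used in this work.

#include <cstddef>
#include <vector>

// Sketch of the symmetric, directed representation of an undirected graph:
// every undirected edge {u,v} becomes two half-edges (u->v and v->u), so the
// incident edges of a vertex can be traversed from its own list alone.
struct HalfEdge {
    std::size_t target;   // the other end-vertex
    double      weight;   // edge weight f(e)
};

struct Graph {
    std::vector<std::vector<HalfEdge>> adjacency;   // one half-edge list per vertex

    explicit Graph(std::size_t vertexCount) : adjacency(vertexCount) {}

    void addUndirectedEdge(std::size_t u, std::size_t v, double weight) {
        adjacency[u].push_back({v, weight});
        if (u != v)                        // a self-edge is stored only once here
            adjacency[v].push_back({u, weight});
    }
};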
2.1.2 Attributed Graphs
Commonly vertices and edges are attributed with additional properties. A vertex
property is a function mapping each vertex to its property value. Edge properties are defined analogously.
The simplest properties are textual labels like VertexName : V → Σ∗ . They can
be used in visual graph representations to identify and describe vertices and edges in
more detail. Another frequently used type are numerical properties like size, volume,
weight, color, or the spatial position of vertices and the distance between adjacent
vertices. Throughout this work mainly the abstract vertex weight w : V → Q
and the edge weight f : E → Q are used. When not otherwise noted these weights are
non-negative.
To simplify calculations the edge weight between a pair of vertices is denoted by f(u, v) := Σ_{e∈E(u,v)} f(e), and between vertex sets X, Y ⊆ V by f(X, Y) := Σ_{e∈E(X,Y)} f(e). Therefore the edge weight between non-adjacent vertices (no connecting edge) is zero. In undirected graphs all edge weights are symmetric with f(u, v) = f(v, u).
In case edge weights ω(u, v) between arbitrary vertex pairs are required, the weight-induced graph (V, ω) with E(u, v) ⇔ ω(u, v) ≠ 0 can be used. However, note that while non-zero edge weights indicate connected vertices, the opposite is not necessarily true: the edges of a graph define neighborhood relationships, and some of these edges may nevertheless have a zero weight. In cases where the edge weights of the graph are more important than the single edges the notation (V, f) will be used.
The total edge weight f (X) of a set of vertices X ⊆ V is the weight of all
edges having both end-vertices in X. Some care is necessary not to count edges
twice in undirected graphs. For disjoint subsets X ∩ Y = ∅ the equation f (X, Y ) =
f (X ∪ Y ) − f (X) − f (Y ) holds. Therefore total edge weight in X is:
f(X) = (1/2) Σ_{u≠v∈X} f(u, v) + Σ_{v∈X} f(v, v)    (2.1)
The degree of a vertex is the number of incident edges |E({v}, V)|. The weighted degree deg_f(v) is the sum over the weights f(e) of incident edges. Self-edges are counted twice in order to obtain Σ_v deg_f(v) = 2f(V). From here on only the weighted degree is used and hence the subscript is dropped. In summary the degree is calculated by:

deg(v) = Σ_{x∈V} f(v, x) + f(v, v)    (2.2)
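As a small illustration of equation (2.2), the sketch below computes weighted degrees from an edge list; the Edge record is a hypothetical layout chosen only for this example (the thesis itself later shows a different, Boost-based computation in Figure 3.17).

#include <cstddef>
#include <vector>

// Hypothetical edge record; u == v denotes a self-edge.
struct Edge { std::size_t u, v; double weight; };

// Weighted degree deg(v) = sum_{x in V} f(v,x) + f(v,v): every edge adds its
// weight to both end-vertices, so a self-edge is counted twice and the degrees
// sum up to 2 f(V).
std::vector<double> weightedDegrees(std::size_t vertexCount,
                                    const std::vector<Edge>& edges) {
    std::vector<double> degree(vertexCount, 0.0);
    for (const Edge& e : edges) {
        degree[e.u] += e.weight;
        degree[e.v] += e.weight;   // doubles the weight for self-edges
    }
    return degree;
}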
2.1.3 Graph Clusterings and Quality Measures
A clustering C partitions the vertices of a graph into non-empty, disjoint subsets
covering the whole set of vertices. The clustering consists of the clusters i ∈ C with
their vertices Ci ⊆ V . Hence the number of clusters is |C|. The cluster containing
vertex v is denoted by C(v) ∈ C and thus C(v) = i ⇔ v ∈ Ci . To improve readability
C[v] := C_{C(v)} directly refers to the vertices of the cluster containing vertex v.
The term clustering is mostly used in the context of (social) network analysis
because there the aim is to group vertices together. The load balancing community
mostly uses the term partitioning to emphasize the search for balanced dissections.
A clustering quality measure Q(C) maps clusterings of a graph to rational numbers Q called clustering quality. This way the quality measure defines a semi-order
on the set of clusterings with clustering C being at least as good as clustering D
if Q(C) ≥ Q(D). This enables the easy comparison of clusterings and the search
for high quality clusterings although several different clusterings may be of same
quality.
Two quality measures Q1 , Q2 are said to be ranking equivalent when they order
all clusterings equally. This is the case when for all pairs of clusterings C, D holds
that Q1 (C) ≤ Q1 (D) ⇔ Q2 (C) ≤ Q2 (D). Ranking equivalence is invariant under
addition of constants α ∈ Q and multiplication with positive constants 0 < β ∈ Q:
Q(C) ≤ Q(D) ⇐⇒ α + β Q(C) ≤ α + β Q(D). This allows the normalization of
quality measures against other graph properties which is necessary before comparing
qualities between different graphs. Often the maximum quality over all clusterings
or upper limits are used for normalization.
2.2 The Modularity Measure of Newman
This section briefly introduces the clustering quality measure called modularity, which is used throughout this work. The modularity was developed by Newman for
community detection in social networks [58, 59]. Here the modularity definition
from [57] is used and it is connected to a volume and density model. The section
concludes with basic properties of this quality measure. An axiomatic derivation of
the modularity is presented in the next section. Proofs for the NP-completeness of
modularity optimization and other properties can be found in [13]. The modularity
has a resolution limit. On growing graphs small clusters become invisible [25, 5].
Looking at the edges covered by a cluster Newman observes the necessity to
make a ”judgment about when a particular density of edges is significant enough
to define a community” [57]. Hence more edges should be in clusters than expected
from a random redistribution of all edges. This is achieved by maximizing Σ_{C(u)=C(v)} ( f(u, v) − P(u, v) ), where f(u, v) is the actual edge weight and P(u, v) the expected weight, called the null model. The model chosen by Newman has the following properties: the weights are symmetric with P(u, v) = P(v, u), the total weight equals the total weight of the original graph with P(V) = f(V), and the vertex degree is reproduced for all vertices by deg_P(v) = deg_f(v). Putting everything together yields P(u, v) := deg(u) deg(v)/(2f(V)). As a last step Newman normalized the modularity against the total edge weight f(V) and obtained:
Q(C) := (1/f(V)) Σ_{C(u)=C(v)} ( f(u, v) − deg(u) deg(v)/(2f(V)) )    (2.3)
      = Σ_{C(u)=C(v)} ( f(u, v)/f(V) − deg(u) deg(v)/(2f(V)²) )    (2.4)
More insight into this measure can be gained by using the volume Vol(u, v) with Vol(u, v) = deg(u) deg(v) and Vol(v, v) = deg(v)²/2. Then the null model equals the volume scaled to the total edge weight: P(u, v) = f(V) Vol(u, v)/Vol(V). Here Vol(V) is the total volume and equals (2f(V))²/2 because Σ_v deg(v) = 2f(V). As the above scaling factor looks similar to a density (mass per volume) it is called connection density and denoted by ρ(V) = f(V)/Vol(V). In this context the null model is the local volume multiplied by the global density and the modularity is rewritten as:
Q(C) = (1/f(V)) Σ_{C(u)=C(v)} ( f(u, v) − ρ(V) Vol(u, v) )    (2.5)
     = Σ_{C(u)=C(v)} ( f(u, v)/f(V) − Vol(u, v)/Vol(V) )    (2.6)
Now it is visible that the modularity is optimized by maximizing the fraction of
edges lying in clusters while minimizing the fraction of expected edges. As a consequence, the modularity of a clustering with just one cluster is zero. Moreover in a
graph where the actual edge weights are randomly distributed like the null model all
clusterings have zero modularity. A further useful property of the modularity is the
additivity as it can be written as sum of edge weights or the contribution of vertices
or clusters:
Q(C) = Σ_{C(u)=C(v)} ( f(u, v)/f(V) − Vol(u, v)/Vol(V) )    (2.7)
     = Σ_{v∈V} Σ_{u∈C[v]} ( f(u, v)/f(V) − Vol(u, v)/Vol(V) )    (2.8)
     = Σ_{i∈C} ( f(Ci)/f(V) − Vol(Ci)/Vol(V) )    (2.9)
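To make the cluster-wise form (2.9) concrete, the following sketch evaluates the modularity of a given clustering using the degree-based null model P(u, v) = deg(u) deg(v)/(2f(V)) from above; the edge-list layout and the per-vertex cluster index are assumptions made only for this example.

#include <cstddef>
#include <vector>

struct Edge { std::size_t u, v; double weight; };   // u == v denotes a self-edge

// Modularity in the cluster-wise form (2.9):
//   Q(C) = sum_i ( f(C_i)/f(V) - Vol(C_i)/Vol(V) )
// where, for the degree-based null model, Vol(C_i)/Vol(V) = (deg(C_i)/(2 f(V)))^2.
double modularity(std::size_t clusterCount,
                  const std::vector<Edge>& edges,
                  const std::vector<std::size_t>& clusterOf) {   // cluster index per vertex
    std::vector<double> intraWeight(clusterCount, 0.0);    // f(C_i)
    std::vector<double> clusterDegree(clusterCount, 0.0);  // deg(C_i)
    double totalWeight = 0.0;                              // f(V)

    for (const Edge& e : edges) {
        totalWeight += e.weight;
        clusterDegree[clusterOf[e.u]] += e.weight;
        clusterDegree[clusterOf[e.v]] += e.weight;         // self-edges count twice
        if (clusterOf[e.u] == clusterOf[e.v])
            intraWeight[clusterOf[e.u]] += e.weight;
    }

    double q = 0.0;
    for (std::size_t i = 0; i < clusterCount; ++i) {
        const double volumeFraction = clusterDegree[i] / (2.0 * totalWeight);
        q += intraWeight[i] / totalWeight - volumeFraction * volumeFraction;
    }
    return q;
}

For a clustering with a single cluster this yields 1 − 1 = 0, matching the observation above that such clusterings have zero modularity.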
2.3 Density-Based Clustering Quality Measures
The popular modularity measure of Newman was introduced in the previous section.
However often algorithms proposed in the literature optimize different but related
quality measures. In order to gain some insight into these an axiomatic derivation
is presented in this section. Further connections, differences and possible interpretations are outlined. The whole discussion underlines why the modularity is used
and where its properties come from.
The main purpose of the clustering quality measure is to direct the search for good
vertex clusters. The measures of interest here are based on the following properties:
• The optimization should lead to clusters with strong internal connectivity and
sparse connections between clusters. This is the intuition behind ”a group of
people” and ”a subsystem”.
• Secondly the strength of connectivity should be relative to an expected, average
connectivity. This allows adapting the search to application-specific questions
by providing a reference graph of expected edge weights.
• In structureless graphs similar to the expected average all clusterings should
have only low quality. Otherwise the measure is biased towards discovering
structure where none is expected.
• Finally the number of clusters is not known in advance. It is the task of the
quality measure to determine this number. Thus the measure has to avoid
trivial solutions. For example measures always leading to bisections are of no
use when more than two clusters are possible.
The axiomatic derivation of quality measures is a popular approach. Here, as in most cases, the desired properties (axioms) are more or less designed to fit the target quality measures. In contrast, sometimes axioms are sought that directly lead to impossible quality measures. For example, in [46] three properties for similarity
clustering based on vertex distances are defined: Scale-invariance (scaling all distances does not change the cluster structure), richness (for any possible clustering
also a matching distance-function exists), and consistency (increasing inter-cluster
distances or decreasing intra-cluster distances does not change the clustering). No
quality measure can fulfill all three properties at the same time. Fortunately the
properties chosen above for graph clustering are not that restrictive.
The next subsection introduces the connection density and the basic volume model. Then the relation between biased measures and the density is analyzed. The third subsection explores density-based clustering qualities by starting from the inter-cluster edge weight. This discussion also shows how the widely accepted modularity
embeds into this framework. All quality measures will be independent of concrete
volume models. Some specific volume models are discussed in the last subsection.
2.3.1 Volume and Density
How to measure the strength of connectivity in and between clusters? One possibility arises by transferring the density from physics to graph clustering. Then
the connection density is the proportion of actual edges (mass) to ”possible” edges
(volume). The actual edges are measured by the sum of their weights f (u, v) and the
”possible” edge weight is given by the reference graph (V, Vol(u, v)) called volume
model from hereon. Hence the connection density is defined as:
ρ(A, B) := f(A, B) / Vol(A, B)    (2.10)
ρ(A) := f(A) / Vol(A)    (2.11)
As only undirected graphs are considered, it is reasonable to require the volume
to be symmetric Vol(u, v) = Vol(v, u) and positive Vol(u, v) ≥ 0. Because edges
can also be expected where no real edges are, the volume weights are defined on all
vertex pairs. The volume has to be non-zero where the edge weight is above zero.
2.3.2 Bias and the Null Model
Given a graph (V, f), a uniform density graph (V, f0) is derived by evenly distributing
the global density over all vertex pairs using ρ(V ) = f0 (u, v)/ Vol(u, v). The uniform
density weights f0 (u, v) = ρ(V ) Vol(u, v) are called null model. The scaling by ρ(V )
transforms the volume into average edge weights.
A quality measure is called biased when it does not assign the same quality
to all clusterings of the graph (V, f0 ). This bias is unfavorable because it signals
structure where no structure is expected. In reverse a quality measure is unbiased
if applying it to the null model yields a constant quality.
The bias can be removed from a quality measure by either dividing by its null
model quality (scaling) or subtracting the null model quality (shifting) [28, 62]. Let
Qf0 (C) be the quality of clustering C in the null model. Then unbiased measures are
derived like in the equations below. In uniform density graphs the scaled measure
has the same quality 1.0 for all clusterings and shifting yields constant zero quality.
Q^scaled(C) := Q(C) / Q_{f0}(C)    (2.12)
Q^shifted(C) := Q(C) − Q_{f0}(C)    (2.13)
2.3.3 Derived Quality Measures
Reformulating the first clustering aim in terms of density, vertex partitions with
low connection density between clusters and high inner density are searched. This
aim may be approached directly by maximizing the difference or the ratio between
intra-cluster density and inter-cluster density:
Q^diff(C) := ( Σ_{i∈C} f(Ci) ) / ( Σ_{i∈C} Vol(Ci) ) − ( Σ_{i≠j∈C} f(Ci, Cj) ) / ( Σ_{i≠j∈C} Vol(Ci, Cj) )    (2.14)

Q^ratio(C) := ( Σ_{i∈C} f(Ci) / Σ_{i∈C} Vol(Ci) ) · ( Σ_{i≠j∈C} f(Ci, Cj) / Σ_{i≠j∈C} Vol(Ci, Cj) )^{-1}    (2.15)
Both quality measures are not well studied and probably rarely used. In practice it proved sufficient to concentrate on either the inter- or intra-cluster edges. To improve readability let Cut_ω(C) be the total inter-cluster weight ω(u, v) of clustering C and Int_ω(C) the total intra-cluster weight (the interior):
Cut_ω(C) := Σ_{i≠j∈C} ω(Ci, Cj) / 2    (2.16)
Int_ω(C) := Σ_{i∈C} ω(Ci)    (2.17)
Interior and cut are related by Cut_ω(C) = ω(V) − Int_ω(C), where ω(V) is the total weight. The edge cut Cut_f refers to the edge weights f whereas the cut volume is Cut_Vol. Using this short notation, for example, both measures from above can be written as:
Q^diff(C) := Int_f(C)/Int_Vol(C) − Cut_f(C)/Cut_Vol(C)    (2.18)
Q^ratio(C) := ( Int_f(C)/Int_Vol(C) ) · ( Cut_f(C)/Cut_Vol(C) )^{-1}    (2.19)
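The following minimal sketch (same assumed data layout as before: an edge list plus a cluster index per vertex) computes the edge cut Cut_f(C) and the interior Int_f(C), and illustrates the relation Cut_f(C) = f(V) − Int_f(C) stated above.

#include <cstddef>
#include <vector>

struct Edge { std::size_t u, v; double weight; };

// Total weight of inter-cluster edges (the cut) and of intra-cluster edges
// (the interior); each edge is counted exactly once.
struct CutAndInterior { double cut; double interior; };

CutAndInterior edgeCutAndInterior(const std::vector<Edge>& edges,
                                  const std::vector<std::size_t>& clusterOf) {
    CutAndInterior result{0.0, 0.0};
    for (const Edge& e : edges) {
        if (clusterOf[e.u] == clusterOf[e.v])
            result.interior += e.weight;
        else
            result.cut += e.weight;
    }
    // By construction cut + interior equals the total edge weight f(V),
    // i.e. Cut_f(C) = f(V) - Int_f(C).
    return result;
}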
Traditionally the edge cut is used in load balancing problems to minimize the communication overhead represented by the edge weights. But without any further constraints, like a predefined number of clusters or balance requirements, edge cut minimization has trivial solutions. For example, placing all vertices in the same partition has zero cut. Additionally the edge cut is biased with Cut_{f0}(C) = ρ(V) Cut_Vol(C) and the interior with Int_{f0}(C) = ρ(V) Int_Vol(C). Removing this bias yields the scaled cut and scaled interior discussed next, and the shifted cut and shifted interior discussed thereafter.
Scaled Cut and Interior Removing the bias from cut and interior by scaling yields the scaled cut and scaled interior:

Q^scaled cut(C) := (1/ρ(V)) · Cut_f(C)/Cut_Vol(C) = ( Cut_f(C)/f(V) ) · ( Cut_Vol(C)/Vol(V) )^{-1}    (2.20)

Q^scaled int(C) := (1/ρ(V)) · Int_f(C)/Int_Vol(C) = ( Int_f(C)/f(V) ) · ( Int_Vol(C)/Vol(V) )^{-1}    (2.21)
Remembering the definition of density it is visible that the scaled cut minimizes the
inter-cluster density. The complete measure is the inter-cluster density normalized
against the global density. It can also be interpreted as the ratio between cut edges in
input graph and null model. The same holds for the scaled interior which maximizes
the intra-cluster density.
In order to optimize the separation between clusters the scaled cut is minimized. However this prefers coarse clusterings and bisections. Merging two clusters i ≠ j ∈ C does not increase the inter-cluster density when the following condition holds:
Cut_f(C) / Cut_Vol(C) ≥ ( Cut_f(C) − f(Ci, Cj) ) / ( Cut_Vol(C) − Vol(Ci, Cj) )    (2.22)
This is the case for ρ(Ci , Cj ) ≥ Cutf (C)/ CutVol (C), i.e. when Ci , Cj have a higher
inter-cluster density than the overall inter-cluster density. Intuitively there should
always be at least one pair with inter-density higher than the current ”average” inter-density, and therefore at least one nontrivial clustering of minimal inter-density exists
having only two clusters. Another good reason to distrust the scaled cut is that the
lowest inter-cluster density 0.0 can easily be obtained by putting all vertices into the
same cluster.
Also emphasizing the inner cohesion in clusters by maximizing the scaled interior tends to trivial solutions: Splitting a cluster Ci into Ci,1 , Ci,2 does not decrease
the overall intra-density in case ρ(Ci,1 , Ci,2 ) ≤ Intf (C)/ IntVol (C). Finding a cluster which can be cut with lower inter-density than the current overall intra-cluster
density is very likely. Optimal quality thus leads to very fine-grained clusterings.
Placing each vertex into a singleton cluster except the vertex pair of highest intra-density yields an optimal clustering. Adding other clusters can only lower the overall
intra-density.
Although the scaled cut and interior are not biased they have trivial solutions.
Hence both are not well suited for graph clustering. The scaled cut is related to
random walks and is known as conductance in that context [49].
Shifted Cut and Interior Shifting the edge cut and interior by their null model
bias produces the shifted cut and shifted interior:
Q^shifted cut(C) := Cut_f(C) − ρ(V) Cut_Vol(C)    (2.23)
Q^shifted int(C) := Int_f(C) − ρ(V) Int_Vol(C)    (2.24)
Minimizing the shifted cut searches clusters where the weight of cut edges is lower
than the expected weight. Similarly shifted interior maximization prefers clusters
with more internal edge weights than expected. Because of Intf (C) = f (V ) −
Cutf (C) maximizing the shifted interior is ranking equivalent to minimizing the
shifted cut. Thus both produce identical clusterings and share the same properties.
Applying the analysis method of the previous section on the shifted cut yields the
condition ρ(Ci , Cj ) ≥ ρ(V ) for successfully merging two clusters and ρ(Ci,1 , Ci,2 ) ≤
ρ(V ) for successfully splitting cluster Ci into Ci,1 ∪ Ci,2 = Ci . Hence the shifted
cut tends towards cuts with inter-cluster density smaller than the global ”average”
density. The parameter ρ(V ) was introduced through the null model and is independent of the current clustering. It is possible to replace it by αρ(V ) in order to
control the granularity of the clusterings. But with α going to zero for example the
shifted cut would develop towards the cut which ignores the volume.
A further question is how the quality measure handles clusters composed of disconnected components. Let C′ be the components of a cluster not connected by any edge. By definition then the cut edge weight is zero: Cut_f(C′) = 0. Hence
splitting the components into separate clusters does not change the cut and interior edge weight. But the interior volume decreases and the cut volume increases.
Therefore, in line with intuition, the quality measured with shifted interior and
cut improves through this separation.
As already mentioned the shifted cut minimizes the cut of actual edges while maximizing the cut of expected edges. Normalizing against the total edge weight f (V )
allows another interpretation related to the scaled cut: The normalized shifted cut
Qn-shifted cut (C) minimizes the fraction of inter-cluster edges while maximizing the
fraction in the reference graph. Similarly the normalized shifted interior maximizes
the fraction of edges in clusters while minimizing the fraction of reference edges. This is
consistent with the desired property to find groups with stronger connections than
expected.
Q^n-shifted cut(C) := Cut_f(C)/f(V) − Cut_Vol(C)/Vol(V)    (2.25)
Q^n-shifted int(C) := Int_f(C)/f(V) − Int_Vol(C)/Vol(V) = −Q^n-shifted cut(C)    (2.26)
Here it is visible that the normalized version of the shifted interior looks very
similar to the modularity measure introduced in Section 2.2. In fact just the
matching volume-model needs to be inserted. The condition for successful merges
ρ(Ci , Cj ) ≥ ρ(V ) also helps to explain the observed resolution limit of the modularity [25]: Increasing the size of a graph by inserting vertices and edges most likely
lowers the global density. Optimizing the modularity or shifted interior will make it
necessary to place several previous clusters into one as long as their inter-density is
still above the new global density. Recently Arenas et al. proposed a parametric correction for this problem by adding weighted self-edges [5]. Increasing the self-edge
weights mostly increases the vertex degree without changing the graph topology.
This results in additional terms in the volume and thus indirectly lowers the global
density ρ(V ).
Altogether the preceding analysis shows that shifted cut and interior meet the
desired properties for clustering qualities: Stronger than expected internal connectivity is preferred by both measures, they are unbiased on uniform density graphs,
and do not tend to trivial solutions.
2.3.4 Volume Models
Community structure results from the uneven distribution of edges in the input
graph. The purpose of the volume model is to describe an even distribution as
reference. The simplest method is to assign each possible edge unit volume by
Vol(u, v) := 1. But then vertices with a higher degree dominate the found cluster
structure as they have many more edges than the volume model expects. However, for example, it is quite common that important vertices have a higher number of incident edges than less important vertices. Thus for many applications it is also necessary to reproduce chosen structural properties.

Figure 2.1: Example Volume Models. Fig. (a) shows two vertices containing three and two atomic vertices, respectively. In (b) the vertices contain three and two atomic sources, respectively, but just one atomic target.
In this section the basic volume model used in this work, called degree multiplicity, is derived from simple principles. The derivation is based on atomic vertices
and coarsening invariance. Both are introduced first and then the degree model
is presented as a special case. Finally the discussion of related models highlights
important aspects for flexible implementations.
A volume model for the graph (V, f ) is a function Vol : V × V → Q. As the graph
is undirected the volume should be symmetric Vol(u, v) = Vol(v, u). No negative
amounts of vertices can exist, thus the volume is positive Vol(u, v) ≥ 0 for all vertex
pairs. A zero volume would produce infinite density. Therefore the volume has to be
nonzero Vol(u, v) > 0 where the edge weight f (u, v) is nonzero. With the normalized
quality measures of above the volume is scale invariant because constant positive
factors are canceled out by the normalization and the global density term.
Given vertex weights w(v) ∈ N, each vertex can be interpreted as a collection of w(v) atomic vertices. This is like vertices representing whole subsystems as shown in Figure 2.1a. Effectively a vertex with weight n is accounted with n times more edges than it would have with unit weight. Thus the vertex weight describes the importance of the vertex. Counting all possible unit edges between these atomic vertices leads to the family of multiplicity volume models with Vol(u, v) := w(u)w(v) for all vertex pairs u ≠ v. Thanks to the scale invariance the relaxation to positive rational ”multiplicities” w(v) ∈ Q+ is possible. The internal volume of a vertex is Vol(v, v). Several reasonable alternatives exist. For example, in case no self-edges are expected anywhere it can be zero. But this work also handles self-edges and Vol(v, v) := w(v)²/2 is used. This also greatly simplifies the computations as will be visible below.
Total volumes in the multiplicity model can be computed efficiently because
the sum over weight products can be replaced by a product of two sums. The
inter-cluster volume is computed summing the vertex weights separately and just
multiplying these two sums. A similar approach is used for the intra-cluster volume. Here each vertex pair is counted twice and all vertices get a fixed self-edge volume which needs to be compensated. However inserting the self-edge volume Vol(v, v) := w(v)²/2 removes this compensation and yields:
Vol(A, B) = Σ_{u∈A, v∈B} w(u) w(v) = ( Σ_{u∈A} w(u) ) ( Σ_{v∈B} w(v) ) = w(A) w(B)    (2.27)

Vol(A) = (1/2) ( Σ_{v∈A} w(v) )² − (1/2) Σ_{v∈A} w(v)² + Σ_{v∈A} Vol(v, v) = w(A)²/2    (2.28)
This will prove to be an important property for the optimization algorithms of this work, which are based on graph coarsening. A volume model is coarsening invariant if vertices can be merged without changing the volume: Let v′ be the vertex produced by merging v1 and v2, then Vol(v′, x) = Vol(v1, x) + Vol(v2, x) shall hold for all other vertices x and Vol(v′, v′) = Vol(v1, v1) + Vol(v1, v2) + Vol(v2, v2). For example, the multiplicity model is coarsening invariant by using w(v′) = w(v1) + w(v2) for merging the vertex weights.
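As an illustration of equations (2.27) and (2.28) and of the coarsening invariance, the sketch below derives cluster volumes in the multiplicity model from summed vertex weights alone, in time linear in the number of vertices; the names and data layout are again assumptions made for this example.

#include <cstddef>
#include <vector>

// Multiplicity volume model with self-edge volume Vol(v,v) = w(v)^2 / 2:
// all volumes follow from the summed vertex weights w(C_i), cf. (2.27)/(2.28).
std::vector<double> clusterWeights(std::size_t clusterCount,
                                   const std::vector<double>& vertexWeight,
                                   const std::vector<std::size_t>& clusterOf) {
    std::vector<double> w(clusterCount, 0.0);
    for (std::size_t v = 0; v < vertexWeight.size(); ++v)
        w[clusterOf[v]] += vertexWeight[v];    // w(C_i) = sum of member weights
    return w;
}

// Vol(C_i, C_j) = w(C_i) * w(C_j) and Vol(C_i) = w(C_i)^2 / 2; merging two
// vertices or clusters simply adds their weights, which is exactly the
// coarsening invariance described above.
double interClusterVolume(double wi, double wj) { return wi * wj; }
double intraClusterVolume(double wi)            { return wi * wi / 2.0; }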
The degree multiplicity uses the above model to reproduce the degree centrality by using the weighted vertex degree deg(v) as vertex weight. In this special case the global volume is Vol(V) = w(V)²/2 = 2f(V)² and the global density is ρ(V) = 1/(2f(V)). Therefore the null model derived from this volume is f0(u, v) = deg(u) deg(v)/(2f(V)). This is exactly the same null model as used for the modularity, derived from expected values of random graphs with probabilities proportional to the vertex degree [57].
Other volume models based on the multiplicity are possible. In this work just the
degree multiplicity is used simply because it allows the comparison with other algorithms and implementations. Nevertheless the study of alternative models highlights
demands on the implementation. The following list points out some possibilities:
• Other centrality indices (importance measures) can be used as vertex multiplicity. For example, in the analysis of software systems often the size of the
represented components in number of source lines (SLOC) is preferred as it
produces subjectively better results.1
• When using another self-edge volume it is necessary to store it for each vertex
in order to correctly implement the coarsening invariance. Then the vertex
weights and the self-edge volume are merged together with the vertices.
• In some graphs vertex classes exist with no edges between vertices of the same
class. The volume between vertices of the same class should be zero because no
edges can be expected there. This can be achieved by storing separate vertex
weights w1 , . . . , wn for each class and replacing the product w(u)w(v) by a
sum of products for each pair of classes with expected edges. When merging
vertices simply their weight vectors are summed. For example bipartite graphs
have two classes and w(u)w(v) = w1 (u)w2 (v) + w2 (u)w1 (v).
• For directed graphs the atomic vertex model could be extended by splitting the atomic vertices into two classes as shown in Fig. 2.1b: atomic sources and targets for directed edges, with multiplicities based on the original in- and out-degrees. This leads to volume models similar to the one proposed for bipartite graphs. This way the graph and volume used for the graph clustering can still be undirected and symmetric, and existing algorithms can be reused. For example, for software systems the source multiplicity of a component could be the number of contained function calls and the target multiplicity the number of provided functions.

¹ From private communication with Ralf Kopsch at the Software Systems Engineering Research Group in Cottbus.

Figure 2.2: Graph of US Airports connected by direct flights. Airports lying far away were removed from the plot. The graph is called USAir97 in the graph collection.
Reviewing the above possibilities indicates that the multiplicity model has a mathematical structure similar to inner product spaces. Let K be the set of possible vertex
weights w(v) ∈ K. Then calculating the volume w(u) · w(v) ∈ Q is an operation
similar to the inner product. The symmetry of the volume equals the commutativity
of the implemented operator. Merging vertices uses the addition w(u) + w(v) ∈ K.
The coarsening invariance is satisfied by the distributivity of addition and multiplication. However compared to standard vector spaces the inner products used for
volume models have much more complex semantics.
A completely different class of volume models is based on geographic information
about the vertices. In such cases vertices lying far away are expected to have weaker
connectivity. For example the number of direct flights between airports is mainly inversely proportional to their distance. Now consider a graph of airports connected by edges when they have direct flight connections, as shown in Figure 2.2. The edges are weighted by the frequency of direct flights. Graph clustering methods would then indirectly just discover the distance as the building component of clusters. However some airports might have exceptionally strong connections because they, for example, have economic ties or belong to the same country. In order to study such structures it would be necessary to remove the influence of the geographical distance. This might be achieved by integrating this distance into the volume model.
More abstractly, the volume is derived from a suitably chosen vertex (dis)similarity measure. Then weak connections are expected between dissimilar, high-distance vertices. In reverse, similar, low-distance vertices are expected to be strongly connected. In this context the density-based graph clustering can be interpreted as a search for vertex groups that are actually stronger connected than their similarity suggests.
2.4 Fundamental Clustering Strategies
This section gives an overview of clustering methods. Many methods combine several basic approaches. These components provide useful ideas for the design of new algorithms. From a short review of the literature, common, often used components were identified. They can be broadly classified into constructing
components, directing components, and meta-strategies. Constructing components
generate a clustering. They contain for example dissection, subdivision, agglomeration, and refinement algorithms. Directing components control how the clustering
is generated. This includes selection criteria and randomization methods. Finally
meta-strategies combine constructing components. Examples are recursive subdivision, multi-level refinement, basin hopping and genetic algorithms. In the following
text the literature is briefly summarized. The next three sections describe basic
directing and constructing components, and meta-strategies.
Table 2.1 summarizes the graph clustering algorithms for the modularity measure.
Most information was derived from reviews of Danon et al. [18] and Newman [56].
The entries are classified by their basic strategy, and a short description summarizes
some details. A few publications appear several times as they incorporate different
strategies.
Other algorithms exist which were not designed to optimize the modularity measure. These include algorithms for the minimum cut partitioning problem, which is an important topic in load balancing of, for example, computational fluid dynamics simulations. In this context much research has already been done on graph clustering algorithms. Therefore this literature might contain useful ideas. A historic review
of graph partitioning methods can be found in [14]. Table 2.2 summarizes some
related methods.
Article / Basic Strategy

Agglomeration strategies:
[15, 60] greedy cluster merging by maximal modularity increase
[17] greedy merging by a modified selector accounting for the cluster size
[76] greedy merging by modularity increase and cluster size (consolidation ratio)
[65] greedy merging by modularity increase; pre-coarsening of the graph using random walks
[4] pre-coarsening by directly merging single-edge vertices with their neighbor
[57] vector clustering using vertex vectors derived from eigenvectors of the modularity matrix
[80] k-means clustering or recursive bisection of the random walk matrix
[64] greedy merging by similarity of visit probability distributions from short random walks
[21] greedy merging by distance of vertex vectors from the Laplace matrix
[20] improved by using a modified matrix similar to the random walk matrix

Subdivision strategies:
[58] recursive bisection using the first eigenvector of the modularity matrix
[80] recursive bisection based on a few eigenvectors of the random walk matrix
[81] bisection at the voltage gap, similar to random walk betweenness; replaces matrix inversion by a truncated power series
[69] recursive spectral k-way subdivision with the normalized Laplacian matrix; mixed with refinement using vertex moves and cluster merges

Dissection strategies:
[59] removes edges by highest random walk betweenness
[26] removes edges by highest information centrality
[66] removes edges by lowest edge-clustering coefficient

Refinement methods:
[57] Kernighan-Lin refinement of clusterings from spectral methods; recommends greedy refinement
[23] extremal optimization: moves vertices with bad contribution to modularity; combined with recursive bisection
[67, 68] simulated annealing with fuzzy clustering (co-appearance in the same cluster), null model derived from a q-Potts spin model
[39, 38] simulated annealing, spin glass system
[52] simulated annealing combined with greedy refinement (quenches); basin hopping (move several vertices, apply greedy refinement, and then accept the new clustering like with simulated annealing)
[19] reduction to minimum cut partitioning with negative edge weights; adapted multi-level refinement from the METIS library
[5] tabu search: move vertices and prohibit reversing any of the last few moves

Linear programming:
[13] modularity optimization formulated in integer linear programming
[2] formulated as a fractional linear program; produces vertex distances in polynomial time, then rounding the distances to {0, 1}

Table 2.1: Overview of Modularity Graph Clustering Algorithms
Article / Basic Strategy

[7] growing a cluster as a shortest-path neighborhood around a start vertex until only a few edges go outside this cluster
[83] greedy merging of clusters using the mean first-passage time of random walks; the transition probability is biased by the number of common neighbors
[40] iteratively modifies the edge weights based on short random walks; the weight of ”inter-cluster” edges declines while edge weights in clusters rise; developed for clustering spatial data sets
[32] removes edges by highest shortest-path betweenness
[75] clusters from flows in the graph; a generalized form of random walks and other Markov processes
[6] approximation algorithm for a quality measure similar to inter-cluster density; applies spectral methods and linear programming
[45, 24] moves vertices to neighbor clusters with escaping from local optima
[43, 1] greedy merging by inter-cluster edge weight; already minimizes the edge cut during graph coarsening
[43, 1, 78, 41] multi-level refinement based on Kernighan-Lin
[40] uses snapshots of the agglomeration, called multi-scale; applies greedy smoothing as refinement
[74, 71, 73, 72] evolutionary (genetic) search combined with multi-level min-cut partitioning

Table 2.2: Overview of Clustering and Partitioning Algorithms
2.4.1 Directing Components
Modularity clustering is a computationally difficult problem, and polynomial-time algorithms are unlikely to find the optimum clustering in all cases. Actually
most algorithms are heuristics which try to construct a relatively good clustering
from many small decisions. Unbiased heuristics use completely random decisions.
In contrast biased heuristics use a selection criterion to direct the algorithm. In
each step this selector chooses one of several available alternatives using information
inferred from the graph.
The next paragraph presents commonly used deterministic selection criteria. The second paragraph shows how linear programming can be used to construct selectors.
And the last paragraph discusses randomized selectors.
Selection Criteria Selection criteria can be derived from the global optimization
aim. For clustering by modularity simple selectors directly follow from the mathematical formulation. A more global view might be attained using spectral methods [3, 21, 6, 80, 57]. Other heuristics use centrality or betweenness indices
to select single edges or vertices. These are for example derived from shortest
paths [32], the information centrality [26], clustering coefficients [66], and random
walks [40, 59, 64, 81, 83].
Unfortunately sometimes just the selector alone is defined as the optimization aim while the global reference measure is ignored. This leaves unclear what the final clustering really describes and makes it difficult to compare different selectors for a task. Another important question is which global aim the selector really leads to. It is possible that the local decisions optimize another property than desired. Here a small analogy to statistical mechanics is visible. For example in Lattice-Boltzmann fluid simulations statistical methods are necessary to prove that the simple discrete model with microscopic (local) collision rules results in the desired macroscopic (global) flows.
Spectral methods replace the graph by a matrix and clusters by vectors. Then
the clustering quality measure is reformulated with matrix-vector products. This
makes it possible to apply matrix diagonalization and map the problem into the eigenvector
space. Similar to principal component analysis a few strongest eigenvalues and their
eigenvectors are used to derive vertex vectors. Then selection criteria are computed
from their length, distance, or direction.
The vertex vectors actually position the vertices in a vector space. Other methods
to compute such positions exist. For example, in graph drawing spring embedder
algorithms minimizing an energy model are used. With an appropriate model vertex
distances are related to the cluster structure. This transforms graph clustering
into a geometric clustering problem. For the modularity measure such layouts are
computable from the LinLog energy model [62]. In energy minima of this model the
distance between two clusters is roughly inversely proportional to their inter-cluster
density. However despite the provided global information this method is practically
unusable as computing layouts is very expensive.
Linear Programming Linear programming methods reformulate the clustering quality measure as a linear target function. As input to this function a vector of pairwise
vertex distances is given. Vertices of the same cluster have for example zero distance
and vertices in different clusters have distance one. The transitivity, reflexivity, and
symmetry of clusterings are enforced through linear constraints similar to triangle
inequalities.
Now additional constraints limit the vertex distances to the range from zero to
one. This is known as fractional linear programming. Using interior-point methods
exact fractional solutions can be found in polynomial time. Agarwal and Kempe [2]
proposed an agglomeration algorithm to round the distances to zero and one. Thus
basically the distances are used as merge selector. As a side product the fractional
solution provides an upper bound for the highest possible modularity of a graph.
Because the set of possible clusterings is a subset of all allowed distance vectors, no
clustering can have a better modularity than the fractional solution.
Brandes et al. [13] used a similar method to derive a binary linear program. There
the vertex distances are restricted to zero and one. Finding an optimum solution
still is NP-complete and requires exponential time. But quite good algorithms exist implementing this search given the target function and constraints. This allowed finding the global optimum clusterings of graphs with up to 100 vertices.
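As an illustration only (not taken from [13] or [2]), the distance-based formulation sketched above can be written as the following program; the abbreviation M_{uv} for the modularity contribution of a vertex pair is introduced here purely for readability:

\begin{align*}
  \text{maximize}\quad   & \textstyle\sum_{u,v} M_{uv}\,(1 - d_{uv})
      && \text{with } M_{uv} = f(u,v)/f(V) - \mathrm{Vol}(u,v)/\mathrm{Vol}(V) \\
  \text{subject to}\quad & d_{uw} \le d_{uv} + d_{vw}
      && \text{(triangle inequalities enforce transitivity)} \\
                         & d_{uv} = d_{vu},\; d_{uu} = 0
      && \text{(symmetry and reflexivity)} \\
                         & 0 \le d_{uv} \le 1
      && \text{(fractional relaxation; } d_{uv} \in \{0,1\} \text{ yields the integer program)}
\end{align*}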
Randomization All optimization heuristics may become trapped in suboptimal
clusterings. Randomizing the decisions a little bit is one method to avoid or escape
such situations. For example this can be achieved by using random selection criteria
or adding a little bit of randomness to the selector.
One often used method is simulated annealing [67, 68, 39, 38]. Typically a clustering similar to the current clustering is constructed, e.g. by randomly moving a vertex to another cluster. The main idea is to direct the search by not accepting each generated clustering. Better clusterings are always accepted, but worse clusterings are only accepted with a probability that depends on the modularity decrease ∆Q. Often the Metropolis criterion exp(∆Q/T) is applied. The parameter T controls the temperature of the algorithm. In the beginning the temperature is high and many modularity-decreasing moves are accepted. This enables a widespread search. Later the temperature is lowered, increasing the portion of modularity-increasing
moves. With zero temperature only better clusterings are accepted and thus the
algorithm finishes in a local optimum.
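A minimal Python sketch of the Metropolis acceptance rule may clarify this; the function name and parameters are illustrative only and do not refer to any of the cited implementations:

import math
import random

def accept_move(delta_q, temperature):
    """Metropolis criterion for simulated annealing on modularity.
    delta_q is the modularity change of a proposed vertex move."""
    if delta_q >= 0:
        return True                      # improvements are always accepted
    # worse clusterings are accepted with probability exp(delta_q / T)
    return random.random() < math.exp(delta_q / temperature)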
Duch and Arenas [23] proposed a recursive bisection algorithm. The two clusters
are computed using extremal optimization applied to an initial random bisection.
Extremal optimization can be interpreted as a biased simulated annealing. The
moved vertices are not randomly chosen but selected based on their current contribution to the modularity. This selection criterion is called vertex fitness. The idea
is that moving low-fitness vertices also improves the clustering quality.
2.4.2 Constructing Components
This subsection presents various heuristics to construct clusterings. They use the
selection criteria presented above to direct the construction. Dissection is one of
the oldest strategies often used in the analysis of social networks. It observes how
graphs and clusters fall apart. More details are presented in the next paragraph. The
second paragraph describes agglomeration methods. They observe how large clusters
can be put together from smaller clusters. The last paragraph presents refinement
methods that try to improve given clusterings by moving vertices between clusters.
Dissection Dissection algorithms repeatedly remove vertices or edges from the
graph and observe the developing connectivity components. Because clusterings of
the vertex set are searched it is more common to remove edges. Removing an edge
can split a cluster into at most two parts. Thus the produced hierarchical clustering
is a binary tree.
Girvan [32] proposed a dissection algorithm that removes edges lying between clusters and least central to any cluster. These are identified by counting how many shortest paths from arbitrary vertex pairs pass through each edge. Fewer edges are expected between clusters and thus more paths should pass through them. After each
removal all betweenness values are recalculated. Later Girvan and Newman [59]
proposed the random walk betweenness as improved selector. In the same paper the
modularity was introduced as measure for the optimal number of clusters.
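The following Python sketch illustrates this dissection loop using the networkx package; the stopping criterion (a target number of components) is a simplification chosen here for illustration:

import networkx as nx

def dissect(graph, target_clusters):
    """Girvan-style dissection: repeatedly remove the edge with the highest
    shortest-path betweenness and observe the connectivity components."""
    work = graph.copy()
    while (work.number_of_edges() > 0
           and nx.number_connected_components(work) < target_clusters):
        betweenness = nx.edge_betweenness_centrality(work)  # recomputed after each removal
        u, v = max(betweenness, key=betweenness.get)
        work.remove_edge(u, v)
    return list(nx.connected_components(work))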
Agglomeration Agglomeration methods grow clusters by merging smaller clusters.
The process begins with each vertex placed in a separate cluster. In each step a pair
of clusters is selected and merged. Greedy methods only merge pairs which increase
the global quality until no such pair is left. The merges are directed by a selector
that tries to predict the optimal cluster structure. As the method alone is unable
to correct wrong merges later, good predictors are necessary. These should lead to
a faithful cluster structure with a high quality. As a side product the merge history
provides a hierarchical clustering.
One of the most used agglomeration methods is due to Ward [79]. He developed
a hierarchical clustering method to assign US Air Force personnel to tasks while
optimizing multiple objectives. By design agglomeration is very similar to the graph
coarsening used in multi-level graph partitioning algorithms [43, 78, 41].
Mark Newman was the first to use agglomeration to directly optimize the modularity measure [60]. He called it greedy joining and selected clusters by the maximal
increase of modularity. Later the algorithm was accelerated by using sorted trees
for faster lookups [15]. Recently Wakita and Tsurumi [76] proposed a different implementation. Like in [17] they observed an unbalanced growth behavior with the
original selector. They introduced various consolidation ratios as relative size of
cluster pairs. Then as selection criterion the quotient of modularity increase and
consolidation ratio was applied.
Cluster Refinement Refinement methods improve an initial clustering by applying
local modifications. They are an ideal way to correct small errors in the clustering.
Generally refinement methods construct several similar clusterings from the current.
The most common type of such neighborhood is formed by moving single vertices into
other clusters. As the next clustering the best from this neighborhood is selected.
Greedy refinement accepts only new clusterings with a better quality than the
current. Thus the refinement ends in a local optimum where all clusterings of the
neighborhood have worse quality. Kernighan and Lin [45] proposed a very popular
method that tries to escape such local optima by accepting small sequences of slightly
worse clusterings. Fiduccia and Mattheyses [24] simplified this strategy by just
considering single vertex moves instead of constructing and moving whole groups.
Most of these methods were developed for minimum cut graph partitioning. In
the context of modularity clustering recently simple refinement methods based on
greedy and Kernighan-Lin refinement were proposed [57, 5, 69]. Unfortunately no detailed study exists on how these refinement methods perform in modularity clustering although the objective is quite different from minimum cut partitioning.
2.4.3 Meta-Strategies
Finally meta-strategies combine construction methods from the previous subsection.
The idea is to exploit advantages of single components while compensating disadvantages. The strategies discussed in the next paragraphs are recursive subdivisions,
multi-level approaches, basin hopping, and genetic algorithms.
Recursive Subdivision and Hierarchical Clusterings Some algorithms can only find
a fixed number of clusters, for example [80, 81, 58]. In order to find more clusters
and better clusterings the algorithms are applied recursively on each previously
computed cluster. Subdivisions into exactly two groups are called bisection and
Figure 2.3: Recursive Subdivision and Hierarchical Clusterings. (a) Recursive Bisection (first and second bisection); (b) Hierarchical Clustering and Dendrogram. The red dashed line marks the cut through the dendrogram. The corresponding clustering is drawn by red circles.
an example recursive bisection is shown in Figure 2.3a. The recursive application
results in a hierarchical clustering.
Like agglomeration methods recursive subdivision is unable to correct errors.
Moreover the subdivision methods work with a fixed number of clusters. This imposes a structure on the clusterings which may hide the correct structure [57]. For
example a bisection of a graph with three clusters might falsely split one of the
clusters into two parts. Further bisections will not correct this.
Also other algorithms producing hierarchical clusterings exist. For example the
merge steps in agglomeration methods and the connectivity components of dissection
methods provide a cluster hierarchy. Such hierarchies are often drawn as a tree called
dendrogram. The root is a cluster containing all vertices and each leaf contains just
a single vertex.
To derive a normal, flat clustering a cut through this tree is searched as shown in
Figure 2.3b. The simplest variants select the level with best modularity [59]. More
elaborate implementations search for better clusterings by shifting the level up and
down in subtrees [63]. Of course this search is inherently limited by the underlying
tree.
Multi-Level Approaches Multi-level methods were developed to improve the search
range and speed of partition refinement methods by moving small vertex groups at
once. These groups and the connections between the groups are represented as a
coarse graph. In this graph it is sufficient for the refinement to move just single
coarse-vertices. Applying this idea recursively leads to multi-level refinement methods like [41, 78]. Karypis and Kumar [43] were the first to propose using agglomeration methods to already optimize the clustering quality during the construction of coarse graphs. However, up to now no multi-level refinement algorithm exists that directly optimizes the modularity.
Basin Hopping The basin hopping of Massen and Doye [52] is a high-level search
algorithm. The algorithm proceeds by jumping between local optima. Like in simulated annealing a neighbor local optimum is accepted as new clustering based on
the Metropolis criterion. These neighbor clusterings are found by first performing
some random vertex moves to leave the current local optimum. Then greedy refinement, called quenching, is applied to move into a (hopefully new) local optimum.
In summary this strategy combines unbiased search methods with strongly directed
local optimization.
Genetic Algorithms Another form of high-level randomization is given by genetic algorithms. These store a pool of candidate clusterings. New clusterings are created by
combining several old clusterings (crossing) or modifying a clustering (mutation).
The clusterings are rated according to a quality measure, which is called fitness in
this context. Just the best, fittest clusterings survive while old ones are removed.
For minimum cut load balancing Soper et al. [74, 71, 73, 72] developed a particularly good genetic algorithm called evolutionary search. In order to push the
pool to high-quality clusterings they improve the genetically generated clusterings
using greedy multi-level refinement. Crossing and mutation are implemented by temporarily modifying the edge weights of the graph based on the input clusterings.
Afterwards the quality of the refined clustering is evaluated on the original graph
and used as fitness of the clustering.
3 The Multi-Level Refinement Algorithm
This chapter presents the design of a family of algorithms for finding solutions to
the modularity clustering problem. The input is an undirected graph with weighted
edges and the algorithms search a clustering (partitioning) of the vertex set with
high modularity. Finding clusterings with maximum modularity is known to be NPcomplete [13]. Therefore the presented algorithms are heuristics with polynomial
bounded runtime and are unlikely to find global optima.
Nowadays quite often agglomeration heuristics are employed for modularity clustering. These operate by successively merging adjacent clusters until the clustering
quality cannot be improved further. Most of them share one strong disadvantage:
The merge decisions are based on insufficient local information but wrong decisions
are not corrected later.
On the other hand refinement heuristics have the potential to correct these merge
errors. In general refinement heuristics construct a neighborhood of problem solutions similar to the current solution and select the one with the best quality out of
these. A situation in which all neighbor solutions have worse quality is called local
optimum. In the context of graph clustering the problem solutions are clusterings of
the graph and local cluster refinement searches for clusterings with higher modularity by moving single vertices between clusters. Unfortunately all refinement methods
require an initial clustering and it is difficult to find good clusterings to start with.
Furthermore refinement is not effective in huge graphs where each vertex has only
a tiny effect on the modularity. Many very small movements would be necessary
to walk in and out of local optima. This makes it very unlikely to find other local
optima by walking through a series of low-quality clusterings.
Both refinement problems are tackled by the multi-level approach presented in the
next sections. It is based on the min-cut partitioning methods of Karypis [1] and
basically widens and accelerates the refinement search by moving whole groups of
vertices at once. These groups are constructed using agglomeration methods which
are presented in the section about graph coarsening. They are similar to Newman’s
greedy joining [60, 15] but use other merge selection criteria. The fourth section explores the available refinement methods. Those are mostly derived from Kernighan,
Lin, Fiduccia, and Mattheyses [45, 24] but are adapted to the dynamic number of
clusters present in modularity optimization. The employed local operations are summarized in Figure 3.1. Merging clusters is exclusively used during graph coarsening.
And during refinement vertex movements are applied. The last section completes
the chapter with additional insight into key concepts of the implementation.
Figure 3.1: Operations for local cluster modification (move to other cluster, move to new cluster, merge clusters). The operations move the cluster boundaries, displayed as dashed lines. This changes which edges are in the edge cut. These are highlighted by red lines.
Figure 3.2: The Multi-Level Scheme (coarsening of the input graph, recursion or initial clustering, and refinement).
3.1 The Multi-Level Scheme
As already mentioned the refinement heuristic requires a good initial clustering.
Unfortunately also the number of clusters is not known in advance. Assigning all
vertices to a big cluster cannot be good because the number of clusters is much
too small. Similarly assigning each vertex to a separate cluster starts with too
many and there is no good reason why a random clustering should be better. On
the other hand agglomerative clustering methods perform relatively well at finding
good clusterings.
To accelerate and widen the refinement the algorithm initially trades search range against accuracy by moving whole groups of vertices at once (cf. Figure 3.3). Afterwards smaller groups or single vertices are moved for local improvements. The underlying data structures are unified by contracting these vertex groups
into a coarse graph. This approach is applied recursively to produce a hierarchy of
bigger groups and coarser graphs. Refinement is applied at each of these coarsening
levels beginning on the coarsest. The question of good vertex groups is coupled to
the initial clustering: The coarsest groups already are a clustering and can be used
as starting point. Inversely the coarsening can be interpreted as an agglomerative
method which regularly stores snapshots as coarsening levels.
Data: original graph,coarsener,refiner
Result: clustering
create level[1] from original graph;
for l from 1 to max levels do
// coarsening phase
clustering ← coarsener(level[l]);
create level[l + 1] from clustering;
reduce edge and vertex weights level[l] → level[l + 1];
lmax ← l;
if not successful then break;
clustering ← vertices of level[lmax ] ;
// initial clustering phase
for l from lmax to 2 do
// refinement phase
project clustering onto level[l − 1];
clustering ←refiner(clustering);
project clustering onto original graph;
Figure 3.3: The Multi-Level Algorithm
This approach is also shown in pseudo-code in Figure 3.3. It is known as multi-level partitioning and is a special form of the Algebraic Multi-Grid method (AMG).
To simplify the implementation the method is reformulated in three phases. This
makes it easier to centrally select the implementations used for the coarsener and
refiner. In the coarsening phase vertex groups are created by merging clusters beginning with singleton clusters until the number of groups is satisfyingly reduced.
From these groups the coarse graph is built. This is repeated until the graph cannot
be coarsened further. The initial clustering phase simply uses the vertices of the
coarsest graph as initial clustering. Finally during the refinement phase the clustering is successively projected onto the next finer graph and refined with vertex
movements until the original graph is reached.
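A compact Python sketch of the three phases is given below; the coarsener, refiner, and contract callables as well as the .vertices attribute of the graph objects are assumptions made for this illustration and do not describe the actual implementation:

def multi_level(graph, coarsener, refiner, contract, max_levels=50):
    levels, maps = [graph], []
    # coarsening phase: repeat until the graph cannot be coarsened further
    for _ in range(max_levels):
        clustering = coarsener(levels[-1])                 # vertex -> cluster id
        if len(set(clustering.values())) == len(levels[-1].vertices):
            break                                          # no merge happened
        coarse, vertex_map = contract(levels[-1], clustering)
        levels.append(coarse)
        maps.append(vertex_map)                            # fine vertex -> coarse vertex
    # initial clustering phase: each coarsest vertex forms its own cluster
    clustering = {v: v for v in levels[-1].vertices}
    # refinement phase: project onto the next finer graph and refine there
    for level, vertex_map in zip(reversed(levels[:-1]), reversed(maps)):
        clustering = {v: clustering[vertex_map[v]] for v in level.vertices}
        clustering = refiner(level, clustering)
    return clustering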
Implementation Notes The vertex groups built during the coarsening phase are
clusterings. They are stored as a mapping of vertices to their current cluster. To
efficiently update this mapping on cluster merges the disjoint-set data structure with
path compression is used like in many union-find algorithms. Thus for all practical
graph sizes updates and lookups take constant time on average.
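A minimal sketch of such a disjoint-set structure in Python (with path compression and union by size) could look as follows; it only illustrates the idea and is not the data structure of the implementation:

class DisjointSet:
    def __init__(self, vertices):
        self.parent = {v: v for v in vertices}
        self.size = {v: 1 for v in vertices}

    def find(self, v):
        root = v
        while self.parent[root] != root:        # walk up to the representative
            root = self.parent[root]
        while self.parent[v] != root:           # compress the path
            self.parent[v], v = root, self.parent[v]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:       # attach the smaller tree to the larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra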
After each coarsening pass the contracted graph is generated by registering a
coarse vertex for each cluster. For each edge in the fine graph an edge between
the coarse vertices of its end-vertices is added if not already existing. With the necessary edge lookups this costs O(|V| + |E|²), or O(|V| + |E| log |E|) using a temporary search tree. Because in practice this proved to be a significant difference the
implementation uses a tree for each vertex to quickly search in its incident edges.
Simultaneously a mapping of fine-graph vertices to their coarse vertex and fine
edges to their coarse edge is built. Structurally this is a graph homomorphism
respecting vertex adjacency. The homomorphism is stored together with the coarse
graph in level[l+1] and allows transferring vertex and edge properties between both graphs. Transferring weights from the fine to the coarse graph requires adding up the weights and is therefore called reduction or restriction. Some care is necessary to transfer the weight of edges reduced to self-edges correctly because two directed edges (one in each direction) are mapped onto one self-edge. Later clusterings are transferred back from the coarse to the fine graph by assigning all fine vertices to the same cluster as their coarse vertex. This is called projection or prolongation.
Thanks to the stored homomorphism these transfers cost linear time in the size of
the finer graph, i.e. O(|V | + |E|).
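The reduction step can be sketched in Python with plain dictionaries; the chosen representation (each undirected edge stored once as a sorted vertex pair) is an assumption for this illustration, not the representation used in the implementation:

def contract(edge_weights, vertex_weights, cluster_of):
    """Build the coarse graph for a given clustering.
    edge_weights: {(u, v): w} with each undirected edge stored once,
    vertex_weights: {v: w}, cluster_of: {v: cluster id}."""
    coarse_vertices = {}
    for v, w in vertex_weights.items():                    # reduce vertex weights
        c = cluster_of[v]
        coarse_vertices[c] = coarse_vertices.get(c, 0.0) + w
    coarse_edges = {}
    for (u, v), w in edge_weights.items():                 # reduce edge weights
        cu, cv = cluster_of[u], cluster_of[v]
        key = (cu, cv) if cu <= cv else (cv, cu)           # (c, c) becomes a self-edge
        coarse_edges[key] = coarse_edges.get(key, 0.0) + w
    return coarse_vertices, coarse_edges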
3.2 Graph Coarsening
Graph coarsening is the first phase of multi-level refinement methods. Its task is
to group vertices together and build condensed, smaller graphs from these groups.
The overall aim is to find a good initial clustering, which is indirectly given through
the coarsest graph, and to build vertex groups that assist the refinement heuristics
later. This section presents the implemented coarsening algorithms.
The coarsening is accomplished in two steps: First an agglomeration heuristic
produces a clustering of the vertex set. Later with this the next coarsening level
is built by contracting all vertices of each cluster into a coarse-graph vertex. The
agglomeration heuristic successively merges pairs of clusters starting from one-vertex
clusters. Each merge decreases the number of future coarse-graph vertices by one.
The maximal number of merges is defined by the reduction factor multiplied with
the number of initial vertices. Thus ideally each coarsening pass reduces the number
of vertices by a constant factor leading to a logarithmic number of levels.
To find good coarse graphs the process is controlled by a selection strategy: Only
pairs of adjacent clusters increasing the modularity are considered for merges and
these pairs are further ranked by a selection quality. Various selection qualities are
discussed in the subsequent section about merge selectors.
The next subsections discuss the two implemented agglomeration methods called
greedy grouping and greedy matching. The first is based on Newman’s greedy joining [60]. Using matchings is a more traditional approach from multi-level graph
partitioning [43]. The following list summarizes all parameters controlling the coarsening. Their use is explained together with both algorithms in the next two sections.
coarsening method The agglomeration method: either "grouping" or "matching".
reduction factor Multiplied with the number of vertices. Gives the number of merge
steps to perform on each coarsening level. Indirectly defines the number of
coarsening levels.
match fraction Multiplied with the number of good vertex pairs. Excludes the worst ranked pairs from the matching.
selector The merge selection quality to use.
Data: graph,selector,reduction factor
Result: clustering
merge count ← reduction factor ∗ num vertices(graph ) + 1;
while merge count > 0 do
// (a, b) ← selector best edge() in two steps:
a ← selector best vertex();
b ← selector best neighbor(a);
if found (a, b) then
clustering.merge(a,b), update graph and selector;
merge count ← merge count − 1;
else
break;
Figure 3.4: Coarsening Method: Greedy Grouping
3.2.1 Greedy Grouping
In each step greedy grouping searches the globally best ranked pair of clusters and
merges both without compromises. Figure 3.4 summarizes the algorithm in pseudocode. To quickly find the best ranked pair the clustering is also represented by a
temporary coarse graph. There each cluster is a vertex and the best neighbor of each
vertex is stored by the selector. Thus just the vertex with the best ranked neighbor
is searched and then its internally stored best neighbor is retrieved. The coarsening
ends if either no good pair is found or enough clusters were merged.
During each merge the best-neighbor information has to be updated. This requires visiting the neighbors of the two merged clusters to update the selection
qualities between them. Thus the temporary coarse graph is used to cache merged
vertex and edge weights and the merged adjacency lists. Incrementally updating
this graph leads to complex data structures and interactions but still is much faster
than collecting the information in the original fine graph each time.
Implementation Notes Computing selection quality and updating the modularity
requires access to edge and vertex weights. With each merge these weights have to
be updated. This is achieved by informing the container data structures during the
merge operation about elementary operations like merging and moving edges. Additionally the merge selector is informed in order to update the best merge neighbor
as described in [76]: For finding the best merge neighbor of a vertex the selection
quality to all adjacent vertices has to be evaluated. But a new best neighbor has to
be searched only when the selection quality to the old best neighbor decreases. In
all other cases it is sufficient to compare the changed selection quality of a vertex
pair to the current best neighbor.
With greedy grouping the next best merge pair has to be found repeatedly. This
is efficiently done with a binary heap stored in a supplementary array. The heap is
updated when the best merge neighbor of a vertex changes. In this case the merge
Figure 3.5: Merging two Vertices and their Edges. (Edges to a common neighbor are merged, the edge between a and b becomes a new self-edge, and the remaining edges of b are moved to a.)
selector informs the heap about the new neighbor and selection quality and the
vertex is moved up or down through the heap into its correct position. Empirical
tests suggested that the runtime is already acceptable without the heap. Thus it is
disabled by default to reduce possible sources of error and regression.
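The described combination of a best-neighbor cache and a heap can be sketched with Python's heapq using lazy deletion; the class and method names are made up for this illustration:

import heapq

class BestPairQueue:
    def __init__(self):
        self.heap = []        # entries: (-selection quality, vertex, best neighbor)
        self.current = {}     # vertex -> (selection quality, best neighbor)

    def update(self, vertex, neighbor, quality):
        self.current[vertex] = (quality, neighbor)
        heapq.heappush(self.heap, (-quality, vertex, neighbor))

    def pop_best(self):
        while self.heap:
            neg_q, vertex, neighbor = heapq.heappop(self.heap)
            if self.current.get(vertex) == (-neg_q, neighbor):
                del self.current[vertex]
                return vertex, neighbor, -neg_q
            # otherwise the entry is stale: the best neighbor changed meanwhile
        return None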
A temporary coarse graph is maintained to store edge weights and adjacency
information for the merge selector. In this graph each cluster is represented by a
vertex. Merging two clusters equals merging their vertices and incident edges. This
is known as edge contraction and is the most expensive operation of each merge step
because the incident edges of both vertices have to be processed. Let the two involved
vertices be called the removed vertex and the target vertex. For example in Figure 3.5 the
vertex b is to be removed and vertex a is the target. Edges to a common neighbor
are merged by simply dropping the superfluous edge of the removed vertex. Edges
to a vertex not adjacent to the target vertex have to be moved to it.
The merge data structures proposed by Wakita and Tsurumi [76] were implemented. The outgoing edges of each vertex are stored in a double-linked list sorted
by the end-vertex index. Now both lists can be merged in one pass without searching the matching out-edges of the target vertex. Some special cases arise however:
Updating self-edges is difficult because their position is not predictable. Thus they
are always stored as the first out-edge. Unfortunately the original paper leaves open
how to correct the position of moved edges in the list of their end-vertices. The
lists should be sorted by the end-vertex index but moving an edge changes the index
from the removed to the target vertex. The simplest fix is to reinsert such edges into
the list, which costs linear time in the size of the list. Therefore the worst-case runtime O(|V|²) of this coarsening method is reached in case all |V| edges of the removed
vertex have to be moved to the target vertex without merging edges.
The overall worst case time complexity for a complete merge step is O(|V|²): Selecting the best pair costs O(|V|) with a linear search or O(log |V|) using the heap. The two clusters are merged in constant time using union-find. Combining the edge lists requires quadratic time O(|V|²) in the worst case. Updating the best-neighbor information also requires a linear search at each adjacent vertex in the worst case. This adds quadratic time O(|V|²). In addition updating the best pair heap may cost up to O(|V| log |V|). However in sparse graphs the number of incident
edges often is much smaller than the total number of vertices. And in dense graphs
nearly no edges require movement because they point to a common neighbor. Hence
the average merge time should be nearly linear instead of quadratic.
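The one-pass merge of two sorted adjacency lists can be sketched as follows in Python; the list-of-tuples representation is a simplification of the double-linked lists described above:

def contract_edge(list_a, list_b, a, b):
    """Merge the sorted adjacency lists of vertices a and b (b is absorbed into a).
    Each list holds (neighbor, weight) pairs sorted by neighbor index.
    Returns the merged list and the weight collected on the new self-edge."""
    merged, self_weight = [], 0.0
    i = j = 0
    while i < len(list_a) or j < len(list_b):
        if j == len(list_b) or (i < len(list_a) and list_a[i][0] <= list_b[j][0]):
            v, w = list_a[i]; i += 1
        else:
            v, w = list_b[j]; j += 1
        if v in (a, b):                      # edges inside the merged pair feed the self-edge
            self_weight += w
        elif merged and merged[-1][0] == v:  # common neighbor: sum the parallel edges
            merged[-1] = (v, merged[-1][1] + w)
        else:
            merged.append((v, w))
    return merged, self_weight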
Related Work Clauset et al. use search trees to store the incident edges [15]. Each
edge of the removed vertex is removed from the tree of its end-vertex. Its entry
is searched in the list of the target vertex. In case target and end-vertex are not
already adjacent the edge is inserted into both trees. Each vertex can have at
most |V | adjacent vertices, thus merging two vertices with this method would cost
O(|V | log |V |) time in the worst-case.
Greedy grouping is structurally very similar to the minimum spanning tree problem
with the selection quality as edge weights. But most selection qualities change their
values dynamically depending on merge steps. Still some alternative techniques
without directly contracting edges but deferring some operations to later steps might
be applicable, cf. [27].
3.2.2 Greedy Matching
Data: graph,selector,match fraction,reduction factor
Result: clustering
foreach pair of adjacent vertices (a,b) do
// collect possible merges
if merging a, b would improve quality then
sq ← selector quality(a,b);
add (a, b, sq) to pair list;
num pairs ← num pairs + 1;
sort pair list by decreasing sq;
good = match fraction ∗ num pairs + 1;
merge count = reduction factor ∗ num vertices(graph ) + 1;
foreach pair (a, b) in pair list do
if a and b not marked then
mark a and b;
merge clusters of a and b;
merge count ← merge count − 1;
good ← good − 1;
if merge count = 0 ∨ good = 0 then break;
Figure 3.6: Coarsening Method: Greedy Matching
The greedy matching method operates by first constructing a matching, i.e. a
list of independent vertex pairs, and the next coarsening level could be directly
generated from these pairs. The pseudo-code is shown in Figure 3.6. In the first
phase all pairs of adjacent vertices are collected and sorted by their selection quality.
Pairs not improving modularity are filtered out in advance. This list is processed
beginning with the best ranked pairs. Merged vertices are marked and pairs with
at least one already marked vertex are ignored. When a pair of unmatched vertices
is found they are merged.
Unfortunately the matching also selects many badly ranked pairs even when their vertices have much better but already matched neighbors. Especially at the end of a long coarsening pass bad pairs would be grouped together just because all good pairs are already used up. The purpose of the "good" counter is to avoid this by considering only a fraction of the best ranked pairs.
The matching method also has a second downside: it is possible to miss the reduction aim when most vertices have only a few possible merge partners which are already matched. For example, if a vertex has many neighbor vertices with no further neighbors besides this central vertex, many non-independent merges would be necessary. In that case many coarsening levels will be produced. This situation also occurs in practice, for example with graphs that feature a power-law vertex degree distribution [1].
Constructing the matching adds some complexity to the algorithm compared to
the simple looking greedy grouping method. But thanks to the independence of the
merged pairs no updated selection qualities are required and the dynamic coarse graph can be omitted in favor of static data structures. However for simplicity this
implementation reuses most data structures from the grouping method.
Producing all coarsening levels has time complexity O(log|V| · |E| log|E|) on well-structured graphs: there are around log|V| coarsening levels and in each level the edges have to be sorted by the selection quality, which costs O(|E| log|E|).
actual merge operations are done in constant time because no complicated updates
are necessary.
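A Python sketch of this matching pass is given below; the pair list and the improves predicate are assumed inputs, mirroring Figure 3.6 rather than the actual implementation:

def greedy_matching(pairs, improves, num_vertices, match_fraction=0.5, reduction_factor=0.5):
    """pairs: iterable of (a, b, selection quality); improves(a, b) -> bool."""
    candidates = sorted((p for p in pairs if improves(p[0], p[1])),
                        key=lambda p: p[2], reverse=True)
    good = int(match_fraction * len(candidates)) + 1       # skip the worst ranked pairs
    merge_count = int(reduction_factor * num_vertices) + 1
    matched, matching = set(), []
    for a, b, _ in candidates:
        if a in matched or b in matched:                   # keep the merged pairs independent
            continue
        matched.update((a, b))
        matching.append((a, b))
        merge_count -= 1
        good -= 1
        if merge_count == 0 or good == 0:
            break
    return matching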
Related Work For a theoretical analysis of approximation guarantees other matchings than the implemented greedy strategy might be more useful. For example
Monien et al. [54] developed more elaborate greedy matching strategies for min-cut
partitioning and were able to derive approximation factors. Also optimal matchings are computable in polynomial time using Edmonds’ algorithm [55].
3.3 Merge Selectors
The merge selector is used by coarsening algorithms to select cluster pairs for merging. The clusters and their relation are encoded in a coarsened graph in which each
cluster is represented by a vertex. Each edge in this graph (except self-edges) together with its two end-vertices embodies such a candidate merge pair. The merge
selector ranks these pairs by computing a selection quality and pairs with high selection quality are selected first.
The selector’s task is to lead to a good hierarchy of coarse graphs supporting the
subsequent cluster refinement. In a good hierarchy the coarsest graph provides an
initial clustering of high modularity. Secondly all coarsening levels shall provide
evenly grown vertex groups with relatively similar influence on the modularity of
the clustering. Otherwise the refinement will have only move options changing the
modularity either too much or too little, which would obstruct the search.
Extent         Name                              Description
local          modularity increase               highest increase of modularity
local          weight density                    highest density between both clusters
semi-global    random walk distance (t)          best neighborhood similarity with t random walk steps
semi-global    random walk reachability (t,i)    highest weight density with reachability of neighbor vertices as weight; with t steps and i iterations
global         spectral length                   approx. best modularity contribution in spectral decomposition
global         spectral length difference        approx. highest modularity increase in spectral decomposition
global         spectral angle                    most similar direction of the spectral vertex vectors

Table 3.1: Proposed Merge Selection Qualities
Both aims are pursued by trying to predict which vertex pairs lie in the same
cluster of the unknown optimum clustering. In this setting the selection
qualities are the predictors. Their prediction quality (success probability) can be
roughly measured by comparing the initial ranking of pairs against the best known
clustering. The second aim may be fulfilled by proposing pairs well spread over
the graph and not concentrated in a single region. But other factors may be more
important, like for example the question when merge errors (merging vertices of
different optimum clusters) occur during the coarsening. Ultimately the only reliable
measure of success are the modularity clustering qualities reached after coarsening
and after refinement.
The presented selection qualities are classifiable by the extent of incorporated
information into local, semi-global, and global measures. Local measures use only
information about the two adjacent vertices and their connecting edge. Semi-global
measures try to incorporate more information from the neighborhood of both vertices
and global measures consider the topology of the whole graph at once.
Non-local selectors can differ in how they are updated during merge steps. Dynamically updated selectors recompute the selection qualities after each merge, whereas
static selectors just use simple approximations on the intermediate coarse graphs.
All presented selection qualities are static. The key properties are computed once at
the beginning of each coarsening pass. During single merge steps values are merged
and updated just locally.
Table 3.1 summarizes the proposed selection qualities. They are discussed in more
detail in the following sections beginning with local selectors followed by random
walks and spectral methods. Each of these sections starts with a short introduction of the key ideas and a presentation of the theoretical background, and concludes with a discussion of the derived selection qualities.
3.3.1 Local Merge Selectors
Studying the change of modularity during merges in the context of greedy agglomeration leads to a constraint depending on local properties. Two selection qualities modularity increase and weight density are derived from this constraint. The
following sections present the theoretical background and discuss both selection
qualities.
Theoretical Background As mentioned earlier, clusters are represented by temporary coarse vertices. As only adjacent clusters are considered for merging, both
vertices of each merge pair are connected by an edge. Therefore the available local
information about each pair is the edge weight f (a, b), the volume Vol(a, b) and the
expected edge weight ρ(V ) Vol(a, b). In addition properties of both clusters like the
interior edge weight f (a, a) and the interior volume Vol(a, a) are locally available.
Merging two clusters a, b changes the modularity Q(C) to Q(C′), and the greedy agglomeration requires an increase in quality, Q(C′) > Q(C). Inserting the modularity definition (unnormalized: simpler to read and ranking-equivalent) Q(C) = Int_f(C) − ρ(V) Int_Vol(C) gives the constraint f(a, b) − ρ(V) Vol(a, b) > 0 because the merge adds f(a, b) to the interior edge weights. As visible, this constraint depends just on local properties and may be used to derive selection qualities.
Modularity Increase In order to reach a good final modularity it sounds reasonable to merge the pair of highest quality increase Q(C′) − Q(C) in each step. The calculation is simplified by removing the clusters where nothing changes. This leaves the contribution of both clusters before and after the merge. These differ just by the removed edge weight and volume between both. Thus the modularity increase is:

Q(C′) − Q(C) = f(a, b) − ρ(V) Vol(a, b)    (3.1)
This selection quality is also a form of the greedy constraint from above and
was already proposed by Newman [60] for his agglomerative clustering algorithm.
Afterwards several authors (including himself) observed problems with the growth
behavior triggered by this selector and proposed various fixes [17, 76].
The dynamic behavior can be described as follows: The cluster of the largest
contribution to modularity is merged with its best neighbor. This further improves
the strength of this cluster and the improvements possible by merges with small
neighbors. The process goes on until all good neighbors of the optimum clustering
are used up but then continues with bad neighbors until the influence of this huge
cluster is low enough again to leave other merge pairs a chance. The whole cycle
repeats several times producing clusters with an exponential distribution of the
cluster size.
The proposed corrections involve scaling the selection quality by the inverse cluster
sizes to prefer merging either small clusters or pairs of similar size. But the real
problem is the merge errors introduced in the second phase of shrinking influence, and in many situations vertices and clusters belong together in spite of their very different size. Therefore these modifications were not implemented in favor of the
selection quality presented in the next section.
Weight Density The greedy constraint for merge pairs was f (a, b)−ρ(V ) Vol(a, b) >
0. Selectors are derived from such inequalities by using the left side as selection
quality and the right side as filter condition. As the greedy filtering is already done
for all selectors it can be ignored here. Other selection qualities are obtained by
transforming the inequality.
Adding the expected edge weight on both sides yields f (a, b) > ρ(V ) Vol(a, b).
This resembles heavy edge matching [43] and ignores the volume model completely.
Hence it most probably is not suitable for modularity clustering. But now dividing
the constraint by the volume Vol(a, b) results in:

f(a, b) / Vol(a, b) > ρ(V)    (3.2)

This suggests selecting pairs by highest inter-density and is implemented as the weight density selector. A nice advantage of this selector is its consistency with the overall density-based approach.
In each merge step these local selection qualities are recalculated from the merged edge and vertex weights. The weights of merged edges are summed and the coarse-vertex weight is the sum over both vertices. From the definition it follows that only
the values to adjacent vertices can change.
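Both local selection qualities reduce to a few arithmetic operations on the locally available weights, as the following Python sketch shows; f_ab, vol_ab, and rho are assumed to be supplied by the surrounding data structures:

def modularity_increase(f_ab, vol_ab, rho):
    # equation (3.1): modularity gain of merging the adjacent clusters a and b
    return f_ab - rho * vol_ab

def weight_density(f_ab, vol_ab):
    # equation (3.2): inter-cluster density used for ranking
    return f_ab / vol_ab

def is_merge_candidate(f_ab, vol_ab, rho):
    # greedy constraint: only modularity-increasing pairs are considered at all
    return f_ab - rho * vol_ab > 0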
3.3.2 Random Walks
In addition to using direct information from the vertices, their neighborhoods can also be compared. Ideally the neighborhoods of vertices in the same optimum cluster are more similar than those of vertices in different clusters because more edges are in
clusters than between clusters. But the unweighted, graph-theoretic neighborhood
ignores edge weights. In extreme cases the neighborhood is the whole graph and
thus provides no information.
Note that already the assumption of more edges lying in optimum clusters than
between them does not always hold because with density-based clustering only the difference to the expected edge weight matters. A cluster border could be along a cut where the
absolute number of edges is higher than everywhere else but still much lower than
expected.
A better alternative is to compute weighted neighborhoods using random walks.
Two random walks are started at both vertices and aborted after a small number of
steps. The probabilities to visit another vertex are computed deterministically and
result in two probability distributions. Basically the selection quality is then defined
as Euclidean distance of both distributions. The random walk distance selector is
based on this idea refined by ideas from Latapy and Pons [64] as described further
below.
On the other hand the random walk reachability selector based on [40] does not
directly compare the neighborhoods. It uses random walks to measure how well a
vertex is reachable from another one walking over short paths. This is driven by the
Figure 3.7: Contribution of Neighbor Vertices to the Visit Probability
assumption that vertices of the same optimum cluster are connected over several
paths while only few paths connect vertices of different clusters.
Theoretical Background This paragraph introduces a formal definition for random
walks and discusses basic properties. The most important one is the connection
between the connection density from Section 2.3.3 and the convergence of random
walks: Random walks converge to a stationary distribution of visit probabilities.
But the so-called mixing rate of this convergence is coupled to the minimum scaled
cut. This is the lowest inter-cluster density over all bisections of the graph and a
low density relative to the global density leads to slow convergence. Thus it can
be deduced that random walks pass local barriers of lower density less often than
high-density areas.
Unfortunately this connection is far from ideal. Firstly the results apply just to
the degree multiplicity volume model. There is no apparent way to adapt random
walks to other volume models. And finally the minimum inter-density (scaled cut)
is not the modularity quality measure (shifted cut). The optimum clusterings of
both may differ significantly.
A random walk starts at a vertex and walks from vertex to vertex by randomly
choosing a neighbor. The transition probability for moving from vertex u to v is proportional to the edge weight connecting both vertices, i.e. p(u, v) = f (u, v)/ deg(u).
Instead of actually performing walks just the probability to visit certain vertices
is calculated. Let P_u^t(v) be the probability of visiting vertex v after t steps starting from u. It is calculated incrementally from the initial distribution P_u^0(v) = δ_{u,v} with steps P_u^t(v) = Σ_{w∈V} P_u^{t−1}(w) p(w, v).
For example Figure 3.7 depicts the contributions of neighbor vertices to the visit
probability of the central vertex v. Edges leaving the vertices in other directions are
not shown.
The visit probabilities are not symmetric, i.e. P_{v_0}^t(v_t) ≠ P_{v_t}^t(v_0). But both are related by the start and end vertex degrees with deg(v_0) P_{v_0}^t(v_t) = deg(v_t) P_{v_t}^t(v_0). To show this relation the visit probability is expanded into the single steps of a walk v_0, v_1, …, v_t:

P_{v_0}^t(v_t) = p(v_0, v_1) ⋯ p(v_{t−1}, v_t)    (3.3)

The single transition probabilities contain edge weights, which are symmetric, and factors 1/deg(v_i), which can be shifted to the previous transition. The ratio deg(v_t)/deg(v_0) accounts for the inserted and the removed factor at both ends.
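A Python sketch of this incremental computation, assuming the graph is given as an adjacency dictionary {vertex: {neighbor: weight}} and degree holds the weighted degrees:

def visit_probabilities(adjacency, degree, source, steps):
    """Return P_source^{<=steps}: the probability to visit each vertex
    within at most `steps` random walk steps."""
    current = {source: 1.0}                       # P^0 = delta distribution
    total = {}
    for _ in range(steps):
        following = {}
        for w, p_w in current.items():            # one step: transfer probability over edges
            for v, weight in adjacency[w].items():
                following[v] = following.get(v, 0.0) + p_w * weight / degree[w]
        current = following
        for v, p in current.items():              # accumulate the sums P^{<=t}
            total[v] = total.get(v, 0.0) + p
    return total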
The probability distribution P^t is also a vector over all vertices. In this context a random walk step is the matrix-vector product P^t = M P^{t−1} with the transition matrix M_{v,w} = p(w, v). The distribution P^*(v) = deg(v)/(2 f(V)) is the unique stationary distribution with P^* = M P^*. The proof uses (M P^*)(v) = Σ_w P^*(w) p(w, v). Inserting the definitions, the vertex degrees deg(w) cancel out. This leaves Σ_w f(w, v)/(2 f(V)), which equals P^*(v) thanks to the symmetry of the edge weights.
In non-bipartite, connected graphs long random walks converge to the stationary distribution regardless of the starting point. Thus the visit probability of a
vertex becomes proportional to its degree. The mixing rate measures the speed of
convergence to this limit.
The evolution of the probability distribution can be expressed in terms of the symmetric matrix N = D^{−1/2} M D^{1/2} using eigenvalue decomposition. The eigenvalues are 1 = λ_1 > λ_2 ≥ … ≥ λ_n > −1 with eigenvectors v_1, …, v_n. Then the visit probability in spectral form is:

P_i^t(j) = P^*(j) + √(deg(j)/deg(i)) Σ_{k=2}^{n} λ_k^t v_{k,i} v_{k,j}    (3.4)

If all eigenvalues are near zero, i.e. max(|λ_2|, |λ_n|) is small, the second term quickly disappears. In that case the distribution is converging fast and the mixing rate is high.
The next step uses a known relation between the eigenvalue gap 1 − λ_2 and the graph conductance. The conductance of a bisection S, V−S of a graph G = (V, E) is defined as:

Φ(S) := f(S, V−S) / (2 f(V) P^*(S) P^*(V−S))    (3.5)
The numerator is the probability of switching between both sets using a random walk started on the stationary distribution. And the denominator is the same
probability in a sequence of independent randomly chosen vertices. Inserting the
definition of P ∗ and using the clustering C = S, V − S this formula equals:
Φ(S) = Cut_f(C) / (ρ(V) Cut_Vol(C))    (3.6)
The conductance of a graph is defined as the minimum over all possible bisections.
This however is nothing else than the minimum normalized inter-density, which is
also called scaled cut:
Φ = min_C ρ(Cut(C)) / ρ(V)    (3.7)
Now the discrete version of Cheeger’s inequality from differential geometry states
that Φ²/8 ≤ 1 − λ_2 ≤ Φ. For a proof see [49]. Therefore low inter-density leads to
a small eigenvalue gap and thus to slow mixing.
In bipartite graphs the vertices are divided into two groups and all edges run between both groups. Thus with each step the visit probability also swaps completely between both. This is also visible in the spectral form where the most negative eigenvalue λ_n is −1. Taking two adjacent vertices, their visit probability distributions will not overlap for a fixed number of steps. Hence to obtain comparable distributions, the probability to visit a vertex in at most t steps, P_u^{≤t}(v) = Σ_{i=1}^{t} P_u^i(v), is used like in [40].
Random Walk Distance Latapy and Pons [64] propose short random walks to define a measure of vertex distance. They postulate that if two vertices lie in the same community (optimum cluster) the probabilities of reaching each other, P_u^t(v) and P_v^t(u), are high and both vertices have a similar neighborhood with P_u^t(w) ≈ P_v^t(w). Thus a low distance is expected between both distributions. They observe from the stationary distribution that high-degree vertices are preferred by random walkers. To remove this influence they propose the following Euclidean distance weighted by deg(w):

d(u, v) = √( Σ_w (P_u^t(w) − P_v^t(w))² / deg(w) )    (3.8)

To account for bipartite situations the implemented random walk distance selector uses d^{≤3}(u, v), the distance of the sums over walks of length at most 3.
During merge operations the distance of merged edges is updated by taking the
maximum. This is also known as complete-linkage in the literature. It corresponds to
using the largest distance between all vertices of both vertex groups and is consistent
with the aim to avoid merge errors. The extreme opposite for example is using the
shortest distance (single-linkage). Here it is possible to merge very distant vertices
by chaining a sequence of intermediate vertices of much lower distance.
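Given two such probability dictionaries, equation (3.8) becomes a short computation; the sketch below is again illustrative only:

import math

def random_walk_distance(p_u, p_v, degree):
    # d(u, v) = sqrt( sum_w (P_u(w) - P_v(w))^2 / deg(w) )
    vertices = set(p_u) | set(p_v)
    return math.sqrt(sum((p_u.get(w, 0.0) - p_v.get(w, 0.0)) ** 2 / degree[w]
                         for w in vertices))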
Random Walk Reachability The random walk reachability selection quality is similar to the weight density f(a, b)/Vol(a, b) but uses modified edge weights. Random walks are used to compute weights which become weaker for edges crossing low-density cuts and stronger for edges in high-density groups. It is assumed that vertices in the same optimum cluster are reachable over many paths. Thus as a starting point the probability to visit one end-vertex starting from the other end-vertex without returning back to the start vertex is used. This is very similar to the escape probability of Harel and Koren [40].
To obtain actual edge weights these probabilities are scaled with the start vertex degree. Applying the weighted symmetry of inverse paths yields:

r^t(u, v) = ( deg(u) P_u^t(v) + deg(v) P_v^t(u) ) / 2    (3.9)

Thus after the first step the weight r^1(u, v) = deg(u) f(u, v)/deg(u) equals the original edge weight. Additional random walk steps will add weight to the edge proportional to the strength of alternate paths to this neighbor. The paths returning back to the start vertex are suppressed by setting P_u^{≥1}(u) = 0 in each step. To account for local bipartite situations again the probabilities are summed over a small number of steps with r^{≤t}(u, v). Because this reachability is so similar to normal weights the whole calculation can be applied to the result again, leading to a reinforcing feedback [40].
As selection quality the density r^{t,i}(u, v)/Vol(u, v) is used. Here r^{t,i} is the weight from i iterative applications of the reachability computation with random walks of at most t steps. During vertex merges the edge weights r^{t,i} of merged edges are summed and the densities of all incident vertex pairs are recomputed.
Implementation Notes The visit probabilities P_u^{≤t}(v) are computed separately for each source vertex u using two vectors storing P_u^t(v) and the new P_u^{t+1}(v) of each target vertex v. A random walk step requires iterating over all edges and transferring
a portion of the visit probability from the edge’s start-vertex in the old vector to
the end-vertex in the new vector. The selection qualities are computed from the
intermediate vectors between the steps. In order to avoid expensive memory management the two vectors are reused by swapping source and target vector between
each step.
The worst case time complexity for the reachability selector is O(|E| · |V| · i · t) as for
each vertex and each step all edges have to be visited. The implementation tries
to improve the speed by only processing edges when their start-vertex has a nonzero visit probability. Thus the computation time depends on the typical size of
neighborhoods. Still on complete graphs the worst case is reached after one step.
The complexity of the random walk distance selector is even worse. Here the worst case is O(|E|² · |V| · t) as for each pair of adjacent vertices two probability vectors have to be computed. On large graphs it is impossible to store and reuse these vectors for all vertices. Hence the implementation tries to save time by reusing the vector
of one vertex and processing all neighbor vertices in a row.
3.3.3 Spectral Methods
Random Walks as described above collect semi-global information about the densitystructure of the graph. But the relation to modularity clustering is very indirect and
holds just for one specific volume model. Some applications may require other volume models and random walks do not work well on some kinds of graphs. Thus the
search for non-local selection qualities which are fully compatible to the modularity
continues.
The approach of this section is based on the spectral methods described by Newman [57]. The modularity computation is rewritten in a matrix-vector form and
the matrix is replaced by its decomposition into eigenvalues and eigenvectors. The
eigenvectors of the strongest positive eigenvalues are then used to define vertex vectors. These describe the contribution of each vertex in a higher dimensional space
where the modularity is improved by maximizing the length of the vector sum in
each cluster.
The eigenvalues and eigenvectors are calculated at the beginning of each coarsening level. Given the vectors of two adjacent vertices the spectral length and spectral
length difference selectors analyze the length of vertex vectors. On the other hand the
spectral angle selector uses the directions of these vectors. The following subsections
present the mathematical derivation of the vertex vectors and conclude with a more
detailed description of the selection qualities and implementation notes.
Figure 3.8: Spectral vertex vectors x̃_u, x̃_v and two cluster vectors X̃_1 and X̃_2 = x̃_u + x̃_v
Theoretical Background The modularity can also be described as sum over all
vertex pairs using the Kronecker delta δ(C(u), C(v)) as filter with δ(C(u), C(v)) = 1
only for pairs of the same cluster and zero elsewhere. This representation of the
modularity is:
Q(C) = Σ_{u,v} ( f(u, v)/f(V) − Vol(u, v)/Vol(V) ) δ(C(u), C(v))    (3.10)
modularity
matrix
The inner part of the sum is replaced by the modularity matrix M_{u,v} = f(u,v)/f(V) − Vol(u,v)/Vol(V). For each cluster i ∈ C the delta is separately replaced by a binary vector s_i over all vertices with [s_i]_v = δ(C(v), i). The contribution of cluster i ∈ C thus is s_i^T M s_i. The spectral decomposition of the modularity matrix is M = U D U^T with the matrix of eigenvectors U = (u_1 | ... | u_n) and the diagonal matrix of eigenvalues D = diag(λ_1, ..., λ_n). This is inserted into the modularity formula and the calculation is separated for each eigenvalue:
Q = \sum_{i \in C} s_i^T M s_i = \sum_{i \in C} (U^T s_i)^T D \, (U^T s_i) = \sum_{i \in C} \sum_{j=1}^{n} \lambda_j (u_j^T s_i)^2    (3.11)
The previous step exploits that u_j^T s_i = \sum_{v \in C_i} U_{v,j} is a scalar. As a next step the eigenvalues shall be moved into the sums contained in the squares by using the square-roots \sqrt{\lambda_j}. Unfortunately this is difficult with negative eigenvalues. Thus the eigenvalues and the inner sum are split up into positive and negative eigenvalues. Without loss of generality λ_1 ≥ ... ≥ λ_p > 0 ≥ λ_{p+1} ≥ ... ≥ λ_n, and thus:
Q = \sum_{i \in C} \left[ \sum_{j=1}^{p} \left( \sqrt{\lambda_j}\, u_j^T s_i \right)^2 - \sum_{j=p+1}^{n} \left( \sqrt{-\lambda_j}\, u_j^T s_i \right)^2 \right] = \sum_{i \in C} \left( \|X_i\|_2^2 - \|Y_i\|_2^2 \right)    (3.12)
The last step replaced the sums by squared Euclidean norms of the cluster vectors X_i and Y_i. This is achieved by moving the eigenvalues further down into the eigenvectors, forming vertex vectors x_v = (..., \sqrt{\lambda_j} U_{v,j}, ...)^T with j = 1, ..., p. The vertex vectors y_v are analogously constructed from all negative eigenvalues and their eigenvectors. The former scalar product with s_i is transformed into X_i = \sum_{v \in C_i} x_v and Y_i = \sum_{v \in C_i} y_v respectively.
Therefore the cluster vectors Xi are formed by adding the vectors xv of all vertices
contained in the cluster. It becomes visible that the modularity is maximized by
grouping vertices together with maximal Xi and minimal Yi lengths. In the vertex
vectors the eigenvectors are weighted with the square-root of their eigenvalues. Thus
only eigenvectors with a large eigenvalue have a significant influence. Moreover it
is assumed that just using the positive eigenvalues suffices for the maximization.
Thus all negative and weak positive eigenvalues are dropped from now on. This
is controlled by the two parameters in the list below. The resulting approximate
vertex vectors are denoted by x̃v . An example of the relation between vertex and
cluster vectors is shown in Figure 3.8.
spectral ratio Multiplied by the largest eigenvalue; defines the lower limit for accepted eigenvalues.
spectral max ev The maximal number of eigenvalues and eigenvectors to compute.
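As a small illustration of this construction, the following sketch builds the approximate vertex vectors x̃_v from the strongest positive eigenpairs, applying exactly the two cut-offs named above; the eigenpair layout and all names are assumptions for this example, not the ARPACK-based implementation.

#include <cmath>
#include <cstddef>
#include <vector>

// Build x~_v[j] = sqrt(lambda_j) * U_{v,j} for the accepted eigenvalues.
// 'eigenvalues' is assumed to be sorted in decreasing order and
// 'eigenvectors[j][v]' to hold the entry U_{v,j}.
std::vector<std::vector<double>> buildVertexVectors(
    const std::vector<double>& eigenvalues,
    const std::vector<std::vector<double>>& eigenvectors,
    std::size_t numVertices, double spectralRatio, std::size_t spectralMaxEv)
{
    // accept only eigenvalues above spectralRatio * lambda_1, at most spectralMaxEv many
    std::size_t p = 0;
    while (p < eigenvalues.size() && p < spectralMaxEv &&
           eigenvalues[p] > spectralRatio * eigenvalues[0])
        ++p;

    std::vector<std::vector<double>> x(numVertices, std::vector<double>(p, 0.0));
    for (std::size_t j = 0; j < p; ++j) {
        double scale = std::sqrt(eigenvalues[j]);
        for (std::size_t v = 0; v < numVertices; ++v)
            x[v][j] = scale * eigenvectors[j][v];
    }
    return x;
}

A cluster vector X̃_i is then simply the component-wise sum of the x̃_v of all vertices in cluster i.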
Another important property besides the length of the vertex and cluster vectors is their direction, because in the optimum clustering the cluster vectors have to be orthogonal. This follows from the increase in vector length (and modularity) when merging two adjacent clusters whose vectors are separated by an angle below 90 degrees. Thus in m dimensions at most m + 1 clusters are representable. Hence in practice it is necessary to compute as many eigenvectors as expected clusters.
Spectral Length and Length Difference From the spectral analysis of the previous section it follows that the modularity is improved by maximizing the length of each cluster vector X̃_i = Σ_{v∈C_i} x̃_v. The first idea was to select cluster pairs a, b with the longest vector sum ||X̃_a + X̃_b||_2; hence this selection quality is called spectral length.
On closer inspection this selection quality is related to the modularity increase selector, as was already pointed out by Newman in [57]. The modularity increase in spectral decomposition is the contribution of the new cluster ||X_a + X_b||_2^2 − ||Y_a + Y_b||_2^2 minus the previous contribution of both clusters ||X_a||_2^2 + ||X_b||_2^2 − ||Y_a||_2^2 − ||Y_b||_2^2. Using just the positive eigenvalues results in the spectral length difference selector:
2X̃aT X̃b = ||X̃a + X̃b ||22 − (||X̃a ||22 + ||X̃b ||22 )
(3.13)
The relation of both selection qualities to the modularity increase selector also inherits its disadvantages in the growth behavior. Seen from a long vector, an adjacent long vector is ranked higher than all short vectors even when it points in a nearly orthogonal direction. Since the vector length roughly correlates with the vertex degree, high-degree vertices will be grouped together early, leading to merge errors. Measuring the change in vector length instead reduces this domination, but is nevertheless similar to the modularity increase selector. It is not directly obvious why dropping negative and weak eigenvalues should provide better insight into the selection problem.
Spectral Angle Instead of the misleading vector length, the direction of the vectors can be used. The spectral angle selector prefers pairs of adjacent clusters whose cluster vectors point in the same direction, i.e. have small angles. The angle is computed using the cosine:
cos(X̃_a, X̃_b) = X̃_a^T X̃_b / (||X̃_a||_2 ||X̃_b||_2)    (3.14)
A conversion into angles is not necessary because orthogonal vectors have zero
cosine and small angles are near the value 1, which suffices for the ranking.
Implementation Notes For the computation of the eigenvalues and eigenvectors the C++ interface of ARPACK [48] is used. Since the modularity matrix is symmetric and only the largest eigenvalues are sought, it suffices to provide an implementation of the matrix-vector product Mx.
The modularity matrix M itself is dense and would require O(|V|^2) multiplications per evaluation. Fortunately the product can be computed much faster by exploiting the inner structure of the matrix and the multiplicity volume model: M = f(V)^{-1} A − Vol(V)^{-1} ww^T with the adjacency matrix A_{u,v} = f(u,v) and the vector of vertex weights w. For example, in the degree multiplicity model the weights are w(v) = deg(v) and [ww^T]_{u,v} = deg(u) deg(v) = Vol(u,v). The required matrix-vector product thus becomes Mx = f(V)^{-1} Ax − Vol(V)^{-1} w(w^T x). The adjacency matrix is sparse when the graph is sparse, and the volume term reduces to a scaled vector because w^T x is a scalar. Therefore just O(|E| + 2|V|) multiplications are required.
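A minimal sketch of this product is shown below, assuming an edge list that stores each undirected edge in both directions; the names are placeholders and not the actual ARPACK callback used in the implementation.

#include <cstddef>
#include <vector>

// Fast evaluation of M x = f(V)^{-1} A x - Vol(V)^{-1} w (w^T x)
// using the sparse adjacency A and the vertex weight vector w.
struct Edge { int source; int target; double weight; };

std::vector<double> modularityMatrixProduct(
    const std::vector<Edge>& edges,     // sparse adjacency, both directions stored
    const std::vector<double>& w,       // vertex weights, e.g. degrees
    double totalEdgeWeight,             // f(V)
    double totalVolume,                 // Vol(V)
    const std::vector<double>& x)
{
    std::vector<double> result(x.size(), 0.0);

    // sparse part f(V)^{-1} A x: O(|E|) multiplications
    for (const Edge& e : edges)
        result[e.source] += e.weight * x[e.target] / totalEdgeWeight;

    // rank-one part Vol(V)^{-1} w (w^T x): O(2|V|) multiplications
    double wTx = 0.0;
    for (std::size_t v = 0; v < x.size(); ++v)
        wTx += w[v] * x[v];
    for (std::size_t v = 0; v < x.size(); ++v)
        result[v] -= w[v] * wTx / totalVolume;

    return result;
}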
All vertex vectors have the same size and thus are stored in a simple memory
pool without any overheads. On the outside a specialized mapping from vertices to
the position of their vector is used. It transparently wraps the vectors into more
convenient temporary objects on access.
On merges the vectors of both coarse vertices are simply added. Then the selection
qualities of all incident edges are recalculated.
3.4 Cluster Refinement
Cluster refinement is the key ingredient of the third multi-level phase. Its task is to improve the initial clustering by searching for better clusterings using sequences of local modifications. This section explores the design space of refinement methods and discusses the implemented algorithms.
Two fundamental types of refinement can be distinguished: greedy refinement only allows operations that improve the quality of the clustering, whereas search refinement crosses phases of low-quality clusterings in the hope of finding better clusterings later. A clustering in which a chosen greedy heuristic cannot improve the quality further is called a local optimum. A local optimum with the best quality among all local optima is called a global optimum. More than one global optimum may exist.
Searching for global optima commonly faces some problems. Again the NP-completeness of modularity clustering hides in these problems and ensures that not all of them are solvable. The following observations were identified from the literature [22, 77, 52]:
foothill problem The search gets stuck in a low-quality local optimum, as is always the case with pure greedy heuristics. Some mechanism is necessary to leave the local optimum, for example sequences of quality-decreasing moves or crossing different locally optimal clusterings.
basin problem Many low-quality local optima lie in a larger valley with better optima far away. In order to reach them, several low-quality local optima have to be passed. This is also known as funnels in the context of simulated annealing and similar methods.
plateau problem The search space has nearly the same quality everywhere except
for some very local peaks. Thus there is no local information about the right
direction of search and most heuristics would be as good as random search.
Monte-Carlo methods or a mathematical derivation of search directions might
help here.
ridge problem The direction of motion in the search space differs from the true direction of quality improvements. This might happen, for example, with heuristic selection criteria used as a trade-off against computational effort. Heuristics for leaving local optima can also exhibit this property. It may be detected by cross-checking with other selection criteria.
multi-level reduction problem The globally optimal clustering of the original graph is most likely not a local optimum in coarsened graphs because it is not representable when vertices of different clusters are merged together. Conversely, a global optimum of a coarse graph does not necessarily lead to the real global optimum of the original graph after projection and refinement. Improvements might be possible by considering multiple candidate clusterings or applying several multi-level passes with different coarsening hierarchies.
This leads to apparently conflicting objectives. As there is no reason to accept obviously improvable clusterings, it is necessary to fully exploit local optima without stopping halfway. On the other hand, exploring a wider range of local optima requires leaving local optima and even avoiding already explored optima. Therefore the first important step towards reliable refinement algorithms is the design of greedy refinement and search algorithms. Later, search and greedy refinement may be combined to effectively explore a wide search space without missing local optima.
The next section explores the design space of refinement methods based on local operations like moving single vertices. After classifying concrete algorithms, the second section discusses the implemented greedy algorithms. The third section presents the adaptation of the well-known Kernighan-Lin refinement to modularity clustering.
3.4.1 Design Space of Local Refinement Algorithms
This section explores components of simple refinement strategies with just two basic operations: moving a single vertex to another cluster, and moving a vertex into a new cluster. The analysis is based on commonly used implementations like greedy refinement [44], Kernighan-Lin refinement [45, 24, 58], simulated annealing [39, 38, 68] and extremal optimization [23, 11, 10], looking for common components and differences.
Substantial differences to the classic k-way partition refinement (k equal-sized clusters, minimum edge cut) exist: the number of clusters is not predetermined and no balance constraints on their size are given. Instead, determining the right number of clusters has to be achieved by the algorithm; clusters may be created and emptied as necessary. The modularity clustering quality is more complex than the minimum edge cut because the volume introduces global dependencies. Due to the multi-level approach it suffices to move single vertices instead of constructing vertex groups like in the original Kernighan-Lin algorithm.
The algorithms are dissected into the components listed below. The next paragraphs discuss these components in more detail and comment on their dependencies. The section concludes with an overview of how basic refinement algorithms fit into this design space.
Vertex Ranking Which vertices should be selected for movement?
Target Ranking To where should selected vertices be moved?
Search Mode Which vertices and targets are evaluated before selecting one of them?
Ranking Updates Is the vertex ranking updated after each move or fixed?
On Modularity Decrease What to do with selected moves not improving modularity?
Vertex and Target Ranking The decision which vertices are moved to which cluster is split into two aspects: the vertex ranking directs the selection of vertices and the target ranking selects the cluster. The vertex ranking is used as a predictor similar to the merge selector in the coarsening phase and should direct the refinement into good, modularity-increasing directions.
Given a vertex v, its current cluster C(v), and a target cluster j ∈ C ∪ {∅}, the change of modularity ∆_{v,j} Q(C) = Q(C′) − Q(C) is calculated as shown in the equations below. The cluster ∅ is used for new, empty clusters; by definition, edge weights and volumes involving this cluster are zero. The maxmod vertex ranking sorts the vertices by their best increase of modularity maxmod(v) = max_j ∆_{v,j} Q(C). The single implemented target ranking is the maxmod ranking, which selects the neighbor cluster with the maximal increase of modularity, arg max_j ∆_{v,j} Q(C).
\Delta_{v,j} Q(C) = \frac{f(v, C_j - v) - f(v, C[v] - v)}{f(V)} - \frac{\mathrm{Vol}(v, C_j - v) - \mathrm{Vol}(v, C[v] - v)}{\mathrm{Vol}(V)}    (3.15)

\Delta_{v,\emptyset} Q(C) = -\frac{f(v, C[v] - v)}{f(V)} + \frac{\mathrm{Vol}(v, C[v] - v)}{\mathrm{Vol}(V)}    (3.16)
The first term depends only on the adjacent vertices, as shown in Figure 3.9. To compute the edge weights f(v, C_j − v), all incident edges of v are visited and their weights are summed in an array grouped by the cluster of the end-vertex. Computing the maxmod ranking thus costs linear time in the number of incident edges and the number of clusters, on average O(|E|/|V| + |C|).

[Figure: moving v from C(v) to j affects only f(v, C[v] − v) in the current cluster and f(v, C_j − v) in the target cluster; all other clusters are unchanged.]
Figure 3.9: Dependencies when Moving a Vertex
The change of modularity also depends, through the volume term, on the global clustering structure and changes with nearly every vertex move: the volume is computed according to the volume model with Vol(v, C_j − v) = (w(v) − 1)w(C_j), where the latter value is the sum of all vertex weights in the cluster. To evaluate the volume in constant time, this value is stored per cluster and iteratively updated each time a vertex moves out of or into the cluster. This global dependency is the key point that makes modularity refinement more expensive than standard min-cut refinement.
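The following sketch illustrates how the best target cluster for a single vertex can be found under these formulas; it assumes the degree multiplicity form Vol(v, C_j − v) ≈ w(v) · w(C_j \ {v}) and simplified container names, and it omits the option of moving into a new, empty cluster (Equation 3.16).

#include <unordered_map>
#include <vector>

// Evaluate the modularity change for moving vertex v into each adjacent
// cluster and return the best target. All names are illustrative.
struct Edge { int target; double weight; };

int bestTargetCluster(int v, const std::vector<Edge>& incident,
                      const std::vector<int>& cluster,           // C[.]
                      const std::vector<double>& clusterWeight,  // w(C_j)
                      const std::vector<double>& vertexWeight,   // w(.)
                      double totalEdgeWeight,                    // f(V)
                      double totalVolume,                        // Vol(V)
                      double& bestDelta)
{
    // group incident edge weights by the cluster of the end-vertex
    std::unordered_map<int, double> toCluster;
    for (const Edge& e : incident)
        if (e.target != v)
            toCluster[cluster[e.target]] += e.weight;

    int own = cluster[v];
    double fOwn   = toCluster[own];                                        // f(v, C[v]-v)
    double volOwn = vertexWeight[v] * (clusterWeight[own] - vertexWeight[v]);

    bestDelta = 0.0;
    int best = own;
    for (const auto& entry : toCluster) {
        int j = entry.first;
        if (j == own) continue;
        double volJ  = vertexWeight[v] * clusterWeight[j];                 // Vol(v, C_j-v)
        double delta = (entry.second - fOwn) / totalEdgeWeight
                     - (volJ - volOwn) / totalVolume;
        if (delta > bestDelta) { bestDelta = delta; best = j; }
    }
    return best;                                                           // own cluster if no improving move exists
}

The per-cluster weights w(C_j) are the iteratively maintained values mentioned above, so each call costs only the incident edges plus the visited clusters.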
Instead of ranking vertices by the expensive-to-compute modularity increase, good candidate vertices may also be identified by their current contribution to the modularity f(v, C[v]) − ρ(V) Vol(v, C[v]). Vertices with a low or negative contribution have a good chance of not belonging to their current cluster. Moving a vertex does not modify the contribution of its self-edge; therefore, omitting self-edges leaves mod(v) = f(v, C[v] − v) − ρ(V) Vol(v, C[v] − v). This computation can be done in constant time by storing f(v, C[v] − v) for each vertex and iteratively updating it when adjacent vertices are moved.
Based on the contribution to modularity several vertex rankings were derived. It can be assumed that the placement of high-degree vertices has a higher influence on the modularity. To suppress this effect, the modularity contribution can be divided by the vertex degree, or, to stay consistent with the density model, it can be scaled by the inter-volume between the vertex and its cluster. Since these vertex rankings are structurally and semantically similar to the vertex fitness used in extremal optimization [23], they are called fitness measures. In contrast to other vertex rankings, here lower values are preferred. In summary these are:
mod-fitness The modularity contribution mod(v) = f(v, C[v] − v) − ρ(V) Vol(v, C[v] − v).
eo-fitness The vertex fitness used in extremal optimization, eof(v) = mod(v)/deg(v).
density-fitness The density of connections to the current cluster, ρ(v, C[v] − v) = f(v, C[v] − v)/Vol(v, C[v] − v). This is ranking-equivalent to mod(v)/[deg(v) deg(C[v] − v)] in the degree multiplicity volume model.
Search Mode The search mode controls how vertices are selected based on the vertex ranking. It plays a similar role to the grouping and matching algorithms in graph coarsening. The most accurate search method is to always consider all possible operations: in each step all possible moves of all vertices are evaluated and the globally best modularity increase is selected. This is a costly operation because evaluating all moves requires visiting all edges in each move step. For k vertex moves the k|V| maxmod evaluations lead to O(k|E| + k|V| max |C|) average run-time, where max |C| is the highest number of clusters that occurred during the moves.
Fortunately this method can be divided into two stages by first selecting the best vertex according to a vertex ranking and then evaluating further operations only for that vertex (mode: best vertex). In combination with the maxmod vertex ranking this is equal to the previous method. Instead, the vertex ranking should be replaced by a constant-time predictor. Then performing k vertex moves requires at most k|V| evaluations of the vertex ranking and one maxmod target ranking per moved vertex. Thus the average run-time will be in O(k|V| + k|E|/|V| + k max |C|).
Most predictors are likely to also select vertices which cannot be moved with a modularity increase. In greedy refinement these vertices must be skipped. In order not to select these vertices again, all processed vertices are marked and ignored later (mode: best unmoved). This search mode thus selects one of the remaining vertices until no vertex is left. The whole process is restarted in case some early selected vertices became movable after later moves.
Ranking Updates An algorithm could simply iterate over a fixed ordering of the
vertices or update the ranking after each move. The updated ranking is implemented
by simply visiting all vertices in each move step. Alternatively a heap can be used to
efficiently retrieve the best ranked vertex. However empirical comparisons suggested
that all variants are similarly fast. Therefore visiting all vertices is the preferred
method.
On Modularity Decrease In case the selected operation does not improve the modularity, the following three choices are possible:
skip Mark the vertex as moved but leave it in the current cluster.
abort Abort the refinement.
accept Perform the operation anyway.
Greedy heuristics are characterized by not accepting non-improving operations.
Whether to directly abort the refinement depends on the search method. In case
other vertices still allow modularity increasing moves the currently selected vertex
should be skipped. Otherwise the refinement can be aborted.
Classification of Basic Refinement Algorithms Table 3.2 provides an overview and classification of refinement methods, building on the components described in the previous subsections. Specific observations about greedy and Kernighan-Lin refinement are discussed in the next section.
Algorithm               Search Mode    Vertex Ranking   Mod. Decrease                            Impl.
greedy refinement
  complete greedy       best           maxmod           abort                                    X
  sorted-maxmod         best unmoved   maxmod           abort
  sorted-mod            best unmoved   mod-fitness      skip                                     X
  sorted-eo             best unmoved   eo-fitness       skip                                     X
  sorted-dens           best unmoved   dens-fitness     skip                                     X
deterministic search
  KL-maxmod             best unmoved   maxmod           accept                                   X
  KL-mod                best unmoved   mod-fitness      accept
  KL-eo                 best unmoved   eo-fitness       accept
  KL-dens               best unmoved   dens-fitness     accept
randomized search
  simulated annealing   all            random           random target ranking, accept with
                                                        probability depending on modularity
                                                        increase
  spinglass system      all            random           random target ranking with an energy
                                                        model, accepts like simulated annealing
  extremal optimization best+random    eo-fitness       accept

Table 3.2: Classification of Refinement Algorithms
The implemented algorithms are marked in the table. Other algorithms are included to show how they align within this design space. Sorted greedy refinement with maxmod vertex ranking (sorted-maxmod) is not implemented because it is as slow as complete greedy while being more restricted in its search, so it has no advantages over the other greedy algorithms. The three fitness-based Kernighan-Lin algorithms KL-mod, KL-eo, and KL-dens are not considered further because they do not reliably find local optima: their vertex ranking combined with accepting quality-decreasing moves prevents this. Finally, randomized algorithms are generally excluded for the same reason. They are interesting only in combination with greedy refinement, but then too many combinations exist to be discussed in the scope of this work.
3.4.2 Greedy Refinement
Data: graph, clustering, selector
Result: clustering
repeat
    v ← selector: find best maxmod vertex;
    j ← selector: find best target cluster for v;
    if move v → C_j is improving modularity then
        move v to cluster j and update selector;
until move was not improving;

Figure 3.10: Refinement Method: Complete Greedy
Greedy refinement algorithms are characterized by accepting only vertex moves that increase modularity. The complete greedy algorithm, as displayed in Figure 3.10, uses the maximal modularity increase maxmod as vertex selector. This enforces the globally best move in each iteration. Selecting and moving vertices is repeated until the modularity cannot be improved further. In each iteration |V| vertex rankings have to be evaluated, each requiring O(|E|/|V| + max |C|) time on average. Selecting the best target cluster and updating the cluster weights w(C(v)), w(C_j) costs linear time in the number of incident edges. Moving the vertex is done in constant time by updating the mapping C(v), which is stored in an array. Thus k moves require O(k|E| + k|V| max |C|) time.
To improve the search speed, the sorted greedy algorithm splits vertex and move selection into separate steps. Figure 3.11 displays the algorithm in pseudo-code. The inner loop selects all vertices once, sorted by their vertex ranking. Since the ranking changes with each move, all vertices are re-visited in each inner iteration. With constant-time rankings like mod-fitness this costs O(|V|^2) time for the complete inner loop. Again the most expensive part is selecting the target cluster for the current vertex; the inner loop does this exactly once for each vertex, which costs O(|E| + |V| max |C|). Moving a vertex and updating cluster weights and vertex fitness costs O(|E|/|V|) on average. Therefore the worst-case time for all iterations of the inner loop is in O(|V|^2 + |E| + |V| max |C|).
Data: graph, clustering, selector
Result: clustering
repeat                                                   // outer loop
    mark all vertices as unmoved;
    while unmoved vertices exist do                      // inner loop
        v ← selector: find best ranked, unmoved vertex;
        j ← selector: find best target cluster for v;
        mark v as moved;
        if move v → C_j is improving modularity then
            move v to cluster j and update selector;
until no improving move found;

Figure 3.11: Refinement Method: Sorted Greedy
In case vertices were moved, the outer loop restarts the refinement. This processes vertices which were visited early but became movable with a modularity increase only after later moves of other vertices. In practice only a small number of outer iterations is necessary. These restarts ensure that sorted greedy always finds a local optimum: if at least one improving move exists, its vertex will be visited and moved even if it is not the best ranked vertex; higher ranked vertices are simply skipped. When improvements were found, the refinement is restarted until not a single improving move exists. Nevertheless the found optimum depends on the vertex ranking, and the variants may end up in different nearby local optima.
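For illustration, a compact sketch of the sorted greedy loop is given below; the ranking and move helpers are assumed placeholders (for example the density fitness and the target selection from Equation 3.15), and for brevity the ranking is evaluated once per outer pass instead of after every single move.

#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Sorted greedy refinement (cf. Figure 3.11): visit the vertices ordered by
// their fitness and perform only modularity-increasing moves; restart the
// pass as long as at least one improving move was found.
void sortedGreedyRefine(int numVertices,
                        const std::function<double(int)>& fitness,     // lower is better
                        const std::function<bool(int)>& tryBestMove)   // moves v if it improves Q
{
    bool improved = true;
    while (improved) {                                   // outer loop
        improved = false;
        std::vector<int> order(numVertices);
        std::iota(order.begin(), order.end(), 0);
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return fitness(a) < fitness(b); });
        for (int v : order)                              // inner loop
            if (tryBestMove(v))                          // non-improving vertices are skipped
                improved = true;
    }
}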
3.4.3 Kernighan-Lin Refinement
The central idea of Kernighan-Lin refinement is to escape local optima by moving
vertices with the least modularity decrease in case no improvements are possible.
Selecting the least modularity decreasing moves is like a careful depth-first walk into
the surrounding clustering space along a ridge while avoiding clusterings of very low
modularity.
The basic algorithm is presented in the following subsection. Unfortunately it is
not very effective considering its long run-time. To improve this the next subsections
analyze two aspects of the dynamic behavior: The creation of clusters and the
effective search depth. Based on the results the algorithm is improved by restricting
the search depth.
Basic Algorithm The basic algorithm is shown in Figure 3.12. The best clustering found during the refinement is called the peak clustering and is stored separately. A new peak clustering would have to be stored whenever the clustering after a modularity-increasing move is better than the last peak. To save some work, just the last clustering in a series of modularity-increasing moves is stored; this situation is checked at modularity-decreasing moves in the inner loop.
Data: graph, clustering, selector
Result: clustering
current ← clustering;
repeat                                                   // outer loop
    start ← current;
    peak ← current;
    mark all vertices as unmoved;
    while unmoved vertices exist do                      // inner loop
        v ← selector: find best ranked, unmoved vertex;
        j ← selector: find best target cluster for v;
        mark v as moved;
        if modularity is decreasing ∧ current better than peak then
            peak ← current;
        move v to cluster j and update selector;
    if peak better than current then
        current ← peak;
until current not better than start;
clustering ← current;

Figure 3.12: Refinement Method: basic Kernighan-Lin
In order not to directly revert modularity-decreasing moves, processed vertices are marked and ignored in later inner iterations. In case the inner loop found an improved clustering, it is contained either in the current or in the peak clustering. This is checked after the inner loop and the better one is returned to the outer loop. Altogether, the outer loop restarts the refinement on the best intermediate clustering until no further improvements are found.
Only the Kernighan-Lin algorithm with the maxmod vertex ranking fully moves into local optima: its inner loop starts like greedy refinement and performs modularity-increasing moves until a local optimum is reached, and suppressed moves of marked vertices are performed by the second outer iteration. Unfortunately this is not shared by the other vertex rankings. When vertices with no modularity-increasing moves are proposed before a local optimum is reached, they are moved anyway; thus it becomes unlikely to reach local optima.
Similar to greedy refinement with the maxmod vertex ranking, the time required for a complete run of the inner loop is O(|V|(|E| + |V| max |C|)). But in contrast to complete greedy refinement, this algorithm does not abort in local optima. This makes the basic algorithm very expensive, which is also confirmed by empirical experience.
Creation of Clusters The creation of clusters is an aspect specific to modularity clustering because the optimal number of clusters is not known. Initial observations with the basic algorithm revealed an excessive creation of new clusters in the inner loop. For example, the scatter plot in Figure 3.13a shows the relation between modularity and the number of clusters at the end of the inner loops; the positions of peak clusterings are marked by blue circles.
[Figure, two panels: (a) Modularity and Cluster Count — scatter plot of modularity versus the number of clusters at the ends of the inner loops, with peak clusterings marked; (b) Dynamics of Kernighan-Lin Refinement — modularity (black line) and cluster count (red line) over the moves, with peak clusterings marked by blue circles.]
Figure 3.13: Kernighan-Lin Refinement Creating Clusters on Graph Epa main
Moreover, Figure 3.13b shows the beginning of multi-level Kernighan-Lin refinement on the same graph. The black line plots the modularity and the red line the number of clusters; again blue circles mark the cluster counts at peak clusterings. Both graphics underline that all peak clusterings are found with a small number of clusters, while the algorithm tends to create a huge number of clusters.
This behavior can be explained as follows: After all vertices fitting well into another cluster are moved the algorithm continues with vertices very well connected
to their current cluster. Often moving them into a new cluster decreases the modularity less than moving them into other existing clusters. Thus a cluster is created
for that vertex.
The excessive creation of clusters brings two big problems. Firstly, the search for better clusterings is quickly directed to low-modularity clusterings and explores only a few with cluster counts near the optimum; this wastes time in uninteresting parts of the clustering space. More importantly, the time complexity of the whole algorithm is coupled to the number of clusters, so high cluster counts significantly impair the performance.
Effective Search Depth The moves performed by the algorithm are interpretable
as a depth-first search into the space of possible clusterings with the vertex and
target ranking determining the direction. The complete execution of the inner loop
is very time consuming. However already after a small number of moves the search
is far away from optimal clusterings and cannot return back because of the marked
vertices.
In order to safely abort the inner loop earlier, the number of moves between peak clusterings is analyzed. For this purpose, let the effective search depth be the maximal number of moves from a peak clustering to a clustering with equal or better modularity that may occur in real-world graphs. Of course this depth depends on the number of vertices.
[Figure, two panels: (a) Detail of the Dynamics on graph Epa main — modularity and cluster count over the moves, with peak clusterings marked by vertical lines; (b) Observed Effective Search Depth — observed search depth versus vertex count on a logarithmic scale, with the 20 log10 |V| bound shown as a dashed line.]
Figure 3.14: Effective Search Depth of Kernighan-Lin Refinement
A typical situation with about 1500 vertices is displayed in Figure 3.14a. The modularity of the clusterings is shown by the black line, and blue vertical lines mark the moves where peak clusterings were found. At the beginning of the inner loop all 12 directly modularity-increasing moves were executed. From this first peak clustering, a series of 62 modularity-decreasing and -increasing moves was necessary to find better clusterings and the second peak. This example thus represents an effective search depth of around 60 moves.
In order to find a simple lower bound for the effective search depth, the basic algorithm was applied with multi-level refinement to a selection of 17 real-world graphs. At each peak clustering the observed search depth was recorded together with the number of vertices at the current coarsening level. The scatter plot in Figure 3.14b visualizes the dependency between vertex count and search depth; a logarithmic scale is used for the vertex count. It is visible that at extreme values the depth grows nearly linearly with the logarithm of the vertex count. A lower bound is now obtained by taking the maximal quotient, which yielded a factor of about 20. Thus it is moderately safe to abort the inner loop after around 20 log10 |V| moves with a modularity below the last peak clustering. This bound is also shown by the dashed line.
Based on these observations the basic algorithm is improved by restricting the
search depth. The parameter search depth controls the accepted search depth. The
inner loop is terminated after log10 |V | times search depth moves with a modularity
lower than at the last peak clustering or beginning of the inner loop. Factors around
20 should be used based on the measurements. Due to this early termination it is
unnecessary to also restrict the creation of clusters.
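For illustration, plugging the roughly 1500 vertices of the example in Figure 3.14a into this rule with the factor 20 gives about 20 · log10(1500) ≈ 20 · 3.18 ≈ 64 tolerated non-improving moves, which comfortably covers the observed gap of about 60 moves between the peak clusterings.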
[Class and concept diagram of the index spaces: the concepts IndexSpace, GrowableSpace and DynamicSpace with operations such as contains_key(), begin(), end(), size(), get_key(), get_index(), new_key(), add_key() and remove_key(); the root-space classes RangeSpace and RecycleSpace (which reuses removed keys later); and the subspace classes BitSubspace and FilterSubspace.]
Figure 3.15: Index Spaces: Class and Concept Diagram
3.5 Further Implementation Notes
This section provides additional hints on the implementation. These details are not
directly important for the functioning of the clustering algorithm but describe crucial
components enabling and driving its implementation. Class diagrams and sample
code give a small insight into its structure and appearance. The next subsection
introduces the concept of index spaces and shows how graphs and data are managed
in the program. The second subsection describes how the data is externally stored.
3.5.1 Index Spaces and Graphs
Graph algorithms commonly deal with two different aspects: the adjacency structure of the graph, and properties like vertex and edge weights. The structure is used for navigation through the graph, while calculations typically involve weights retrieved from the current vertex or edge. For example, the implementation of the volume model stores vertex weights, and at the same time the merge selector stores the selection quality of each pair of adjacent vertices. These properties represent different aspects of the algorithm and should be managed separately. Often it is necessary to construct temporary properties in sub-algorithms that are not visible to the outside. In addition, a noteworthy observation is that graph algorithms handle huge amounts of 100,000 and more small objects for vertices, edges and temporary properties. All of these have no own life-cycle and do not need any garbage collection, as their existence is coupled to the graph and the algorithm phases.
[Class and concept diagram of the index maps: the concepts IndexMapReadable (get(), at(), key_space()) and IndexMap (at(), put()), tied to a KeySpace/IndexSpace, and the implementations ChunkIndexMap and ConstantDummyMap.]
Figure 3.16: Index Maps: Class and Concept Diagram
As a solution, the implementation of the presented algorithms uses the concept of index spaces and index maps; for similar approaches see [29, 9]. Index spaces carry the structure without considering additional properties. This is similar to a collection of array indices: each index abstractly represents an object. Specific properties of these objects are managed by index maps, which allow reading and writing the data stored for an object using its index as access key. This approach is basically an external variant of the decorator pattern; it is external because it does not wrap the single objects when extending them.
Figure 3.15 contains a class diagram of the implemented index spaces and their interfaces. Basic spaces allow iterating over the indices using the methods begin and end, like in the C++ Standard Template Library (STL). The size in number of entries can be queried using size. Grow-able and dynamic spaces allow creating and removing indices. Subspaces mask parts of an index space; this can be used, for example, to blend out removed entries or to iterate over all entries with a common property. For this purpose all index spaces hold a reference, called super space, to their superset. The super space of a root space points to itself.
The class RangeSpace is the simplest and most compact implementation. It describes a continuous range of indices from a lower to an upper bound and is used when this growing behavior suffices. In case it is necessary to remove indices from the space, the class RecycleSpace is used; it recycles removed indices later when new ones are created. Internally a BitSubspace is used, which marks removed indices in a bitset. The class FilterSubspace allows operating easily with subsets defined by a filter function. The filter is provided at instantiation as a function object.
Index maps, as shown in Figure 3.16, provide access to specific data stored for the objects referred to by indices. Several maps can be created on the same index space and independently destroyed when they are no longer in use. The implementations use C++ templates to define the type of data to store. The index maps supply automatic memory management and are resized when the underlying index space grows.
BOOST_FOREACH(Vertex v, vertices(g)) {
    put(deg, v, 0.0);
    BOOST_FOREACH(Edge e, out_edges(v,g)) {
        if (v == target(e,g))            // self-edge
            at(deg, v) += 2 * get(ewgt, e);
        else
            at(deg, v) += get(ewgt, e);
    }
}

Figure 3.17: C++ Example: Calculation of Vertex Degrees. Input are the graph g and the edge weights ewgt. The index map deg will contain the computed degrees.
To implement this, each map holds a reference to its index space, and an observer is registered at the space to inform the map when new storage has to be allocated. In order to implement data access efficiently, only continuous ranges of integer indices are used in this implementation. Index maps provide storage space for indices from a lower to an upper bound; the lower bound is always zero and the upper bound grows with the creation of new entries.
The only full implementation is ChunkIndexMap. It stores the data not in a single huge array but in a sequence of fixed-size arrays called chunks. Access to entries is performed in two steps by splitting the numeric index into a chunk number and an inner offset. In practice this costs little but makes memory allocation much easier: when the index space grows, no resizing of arrays and no moving of data is necessary; just a new chunk is allocated when needed. In addition it is easier to find memory space for fixed-size chunks than for huge continuous arrays. The other implementation, ConstantDummyMap, provides read-only access to a fixed value. It is used in some specializations of algorithms to represent unit edge weights.
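The two-step access can be illustrated by the following self-contained sketch of the chunked storage idea; it is a simplified stand-in, not the actual ChunkIndexMap with its index-space observer.

#include <cstddef>
#include <vector>

// Chunked storage: split the numeric index into a chunk number and an inner
// offset, so growing only appends a new fixed-size chunk and never moves data.
template <typename T, std::size_t ChunkSize = 4096>
class ChunkedMap {
public:
    T& at(std::size_t index) {
        std::size_t chunk  = index / ChunkSize;     // which chunk
        std::size_t offset = index % ChunkSize;     // position inside the chunk
        while (chunk >= chunks_.size())             // allocate missing chunks on demand
            chunks_.emplace_back(ChunkSize);
        return chunks_[chunk][offset];
    }
private:
    std::vector<std::vector<T>> chunks_;
};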
Graphs are implemented using a combination of index spaces and maps. They contain a separate space for the vertices and one for the edges. A map over the edges is used to store the start- and end-vertex indices. Separate implementations for directed and undirected (symmetric) graphs exist. Both allow iteration over the outgoing edges of a vertex as well as insertion and removal of edges. For this purpose two additional index maps provide access to sibling edges like in a doubly-linked list and a reference from each vertex to the start of its edge list. The undirected variant is used during graph coarsening and enforces for each edge the existence of its inverse edge. This enables the implementation of a vertex merge operation similar to edge contraction.
Access to the graphs is mainly provided in the style of the Boost Graph Library (BGL) [70]. Besides basic graph algorithms it provides abstract concepts for accessing and manipulating graphs. The key point is the use of external adapters: instead of defining a single interface to graphs, the concepts define separate, external functions. With this strategy the functionality can be extended without the necessity to extend existing interfaces and replace proxy classes. New functions are simply added which receive the graph object as an argument; based on the argument types the compiler then uses function overloading to select the correct implementation.
Filename                         Description
gname.meta.gz                    meta-information about the graph gname (source, content, type, ...)
gname.vertices.gz                the vertex data (vertex name)
gname.edges.gz                   the edge data (start-vertex, end-vertex, weight)
gname.cluster.algname.gz         the clustering computed with algorithm algname
gname.cluster.algname.log        free-form log file of the algorithm
gname.cluster.algname.meta.gz    configuration of the algorithm (parameters, cluster count, modularity, ...)
gname.cluster_best.gz            copy of the best found clustering
gname.cluster_info.gz            table comparing the clusterings (algorithm, modularity, runtime, ...)
gname.vertex-degree.gz           weighted and unweighted vertex degrees

Table 3.3: Hierarchical Naming Convention for File Names
Another advantage of this strategy is better readability. Many small details can be
hidden behind the adapter functions. Therefore such adapters are provided for the
own graph implementations and index maps. The C++ code in Figure 3.17 shows
a small example to calculate the weighted degree of all vertices. Using a for-each
construct the outer loop iterates over all vertices of the graph. At each vertex the
out-going edges are visited by the inner loop and their weights summed up.
In summary, the index spaces and maps with their abstract access by indices are conceptually very similar to graphs and property maps in Boost. Here the concept is extended by the explicit integration of index spaces; without them it is not possible to directly describe the dependencies between property maps and the structures they attribute. The Boost Graph Library already supplies the user with basic implementations for various graph types. However, it uses the simple collections (vectors, maps) of the Standard Template Library, which do not scale well to big graphs. Here the chunk-based implementation of index maps provides a scalable alternative.
3.5.2 Data Management
The management of graphs, clusterings and other data faces some special requirements in this project. Most importantly, the evaluation handles many graphs and produces huge amounts of clusterings. Thus each clustering should not store duplicate information like the graph structure and vertex names. Besides the actual graph data, quite some structured meta data emerges, including information about the used algorithms, parameters, source references and comments. This data should be easily accessible to computers and humans. Finally, the data should be easy to import into other software like GNU R for post-processing.
In consequence simple text files similar to the CSV (comma separated values)
format are used. Each file stores exactly one table. The columns are separated by tabs (\t) and rows by a single newline (\n) character. Strings may be delimited by quotation marks. The value NA may be used for missing values. The first row contains the table header with column titles, and the first column is always used as index. To save storage space all files are transparently compressed using the gzip format.
Separate files are used for different data. For example graphs are stored in three
files containing vertex data, edge data, and meta data. For each vertex its name or
original identifier is stored. The edge table contains the pairs of end-vertex indices
and the edge weights. Additional files are used to store clusterings and similar.
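As an illustration of this layout, a decompressed edge file such as gname.edges.gz might look like the following excerpt; the column titles and values are made up for this example, and the columns are separated by tab characters.

index   source  target  weight
0       0       1       1.0
1       0       2       2.5
2       1       2       1.0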
In order to retrieve data easily, a hierarchical naming convention is employed. The dot is universally used as separator in filenames: the first component names the graph and the following components differentiate the data sets. The naming scheme is best explained by the examples in Table 3.3. Meta data is organized in a similar fashion; here character strings are used as row indices and the dot is used as hierarchical separator.
4 Evaluation
This chapter evaluates the family of algorithms developed in the previous chapter.
The algorithms are composed of different components like coarsening method, merge
selector and so on. Specific algorithms are configured using a set of parameters.
The key aspects of the evaluation are effectiveness, reliability, simplicity, and
efficiency. An algorithm is effective when it finds better clusterings in terms of the
modularity measure compared to the alternatives. This is strongly related to the
reliability, i.e. finding these clusterings systematically and not just sometimes by
random chance. Simplicity and efficiency are concerned with the question whether
complicated or expensive components of the algorithm significantly improve the
results. This includes the study of the scalability to check how the runtime of
components develops with the graph size. Expensive components with just small or
no improvements could be omitted to simplify future implementations. At the same
time it might be possible to gain similar improvements with less effort using other
heuristics.
Not all of these aspects can be treated in full depth in the scope of this work.
Instead the evaluation mostly concentrates on the effectiveness compared to other
configurations and reference algorithms.
The chapter is organized as follows. The next section summarizes the configuration space and introduces the main evaluation methods used in this work. The following three sections study the effectiveness of the graph coarsening, the merge selectors and the refinement; efficiency aspects are also discussed where appropriate. The fourth section discusses experimental results on the scalability of the algorithms. The last two sections compare the presented multi-level refinement method against other clustering algorithms and against results collected from the literature.
4.1 Methods and Data
This section centrally discusses various aspects of the employed evaluation methods.
The first subsection summarizes the algorithm components and describes the configuration space spanned by the parameters. The second subsection introduces the
mean modularity method used to study the effectiveness of clustering algorithms.
In this context also the collection of benchmark graphs is presented. Finally the
last subsection contains some notes on the evaluation of computation times and
scalability.
4.1.1 Configuration Space
The multi-level algorithm is configured by the parameters listed in Table 4.1. This includes, for example, the coarsening and refinement methods and the configuration of the merge selectors.
parameter           component    description and values                                        default
coarsening method   coarsening   method to merge cluster pairs:                                 greedy grouping
                                 greedy grouping, greedy matching
reduction factor    coarsening   number of clusters to merge in each coarsening level;          10%
                                 5%–50%
match fraction      coarsening   number of best ranked pairs to consider in matching;           50%
                                 50%–100%
merge selector      coarsening   ranking of cluster pairs: modularity increase, weight          weight density
                                 density, RW distance, RW reachability, spectral length,
                                 spectral length difference, spectral angle
RW steps            selector     length of random walks                                         2
RW iterations       selector     iterative applications of reachability                         3
spectral ratio      selector     cut-off value for λj/λ1, determining the number of             20%
                                 eigenvectors to use
spectral max ev     selector     maximal number of eigenvectors to compute                      30
refinement method   refinement   method to move vertices: complete greedy, sorted greedy,       sorted greedy
                                 Kernighan-Lin
vertex selector     refinement   ranking of vertices in sorted greedy refinement:               density fitness
                                 mod-fitness, eo-fitness, density-fitness
search depth        refinement   when to abort the inner-loop search in Kernighan-Lin,          25
                                 multiplied by log10 |V|

Table 4.1: The Configuration Space
The parameters are grouped into a coarsening and a refinement phase depending on what they directly influence. Some merge selectors are further configurable; their parameters are placed in the component called selector. Note that not all parameters are meaningful in all combinations, for example the match fraction is used only by greedy matching. The last column contains the default values used throughout the evaluation where nothing different is stated.
The parameters controlling the spectral merge selectors and the search depth are fixed, and their influence will not be evaluated further, for the following reasons. The vertex vectors used in the spectral merge selectors are always approximations. Using more eigenvectors should improve the accuracy, but requires more memory and time to compute them; thus the maximal number of eigenvectors is limited to 30. In graphs with fewer than 30 clusters, fewer eigenvectors contain meaningful information anyway. By assuming that the positive eigenvalues quickly decay to zero, only the few eigenvectors with eigenvalues larger than 20% of the largest eigenvalue are used. The search depth is only used by the Kernighan-Lin refinement. The experiments in Section 3.4.3 on refinement algorithms showed that it can be restricted to 20 log10 |V| moves without missing better clusterings; for safety the factor 25 will be used.
4.1.2 Effectiveness
The main tool used in this evaluation is the comparison of mean modularities. For
this purpose clusterings produced by specific algorithm configurations on a set of
graphs are collected. The mean modularity of a configuration over the graphs is
calculated using the arithmetic mean.
Of course, graphs with high modularity influence the absolute value of the mean modularity the most. But the modularity measure is normalized to the range from zero to one, so all modularity values will be in the same range regardless of differing graph properties. In addition, the absolute values are unimportant here as algorithms and configurations are compared. The differences between mean modularities are most influenced by graphs with high modularity variations between the algorithms.
In many places the results gained with and without refinement will also be compared. Without refinement it becomes visible how much the parameters influence the raw graph coarsening. Results with refinement, on the other hand, indicate how well coarsening with the observed parameters supports the refinement algorithms later. Kernighan-Lin refinement is expected to be more resistant against difficulties as it can escape from local optima. Hence only sorted greedy refinement with the density-fitness vertex selector is used for these comparisons. This reference configuration will be abbreviated by sgrd in most places.
As a useful side effect, the modularity improvements of refinement over no refinement provide a significance scale. Fluctuations of the modularity much smaller than this scale are probably not of much interest. Similarly, improvements of different coarsening configurations which are easily compensated by greedy refinement are not important either.
For many questions of the evaluation tables containing mean modularities will
be produced. However it is difficult to compare these raw values. Therefore where
possible the mean modularity results will be plotted and graphically visualized. The
significance scale also becomes visible in such figures by including results gained
with and without refinement.
It can be expected that not all graphs are equally well suited for the study and
comparison of clustering algorithms. Obviously some may not contain any meaningful cluster structure. Structural properties like the number of clusters, inhomogeneous cluster size, and the degree of separation will influence the algorithms. Most
importantly approximation algorithms and heuristics directly or implicitly exploit
structural properties of the graphs to make their local decisions.
Therefore it is advisable to not blindly use a few graphs for evaluation but also
study their structural properties. Unfortunately still only little is known about the
influence of various structural properties on the modularity clustering problem. In
order to avoid pitfalls here the effectiveness of clustering methods is studied using a
large collection of real-world graphs from different application domains.
The next paragraph briefly comments on why no randomly generated graphs were used in this study. The second paragraph then introduces the benchmark graphs.
On Random Graphs Using graph generators it is very easy to acquire huge amounts of benchmark graphs with roughly known structural properties. In the past quite a few studies on clustering algorithms, for example [18], used simple generated graphs. Still, random graphs should be used with caution for the following reasons:
• Depending on the generator the expected and the actual number of edges in
the generated graph do not match. In that case density and optimal quality
vary even with identical parameters.
• The random graph model determines the vertex degree distribution and many
other structural properties. For example placing edges with equal probability
between vertices (Bernoulli random graph) produces a binomial distribution.
On the other hand certainly only few graphs in real-world applications will
have such a structure. At the same time in some application domains it is still
difficult to generate graphs with properties close to the real structures.
• Some volume models depend on the vertex degree which again depends on
where the edges are placed. This interdependency makes it very difficult to
produce random graphs with specific intra- and inter-cluster densities.
• The intended clustering of a generated graph appears to be known, but another clustering might well have a better quality for the chosen quality measure. On smaller random graphs the fluctuations may dominate the clusterings.
The Real-World Benchmark Graphs Many researchers and institutions have published typical graphs of their research areas on the Internet. In the context of modularity clustering, for example, Mark Newman, Alex Arenas, and Andreas Noack collected and published some interesting graphs. Besides these small, individual collections, the Pajek project [8] brought together quite many graphs from different disciplines.
These collections were used to compile a set of benchmark graphs. The complete
set is listed in Table A.1 in the appendix. For each graph the number of vertices
and edges, the weighted mean vertex degree (mean wdeg), and the global connection density in the degree volume model (wgt. density) are listed. In addition the source of each graph is indicated in the last column, and Table A.2 provides web addresses for this data.
For various reasons subsets of the collection will also be used. For example, some evaluation steps compare a large number of configurations; to keep computation times feasible a reduced graph set is employed there. The second column of the table marks the graphs of the reduced set with R. In addition, the graphs used in the scalability analysis are marked with S and those used in the comparison to reference algorithms with C.
Many graphs of the collection are directed and contain parallel edges. But the
modularity measure is defined for simple, undirected graphs. Therefore the graphs
were pre-processed with the following aim: Between each adjacent pair of vertices
lies exactly one edge in each direction and their weight equals the sum of the original
edge weights between both vertices. This pre-processing is allowed as the focus of
this work lies on the evaluation of algorithms and not in the interpretation of single
clustering results. However other publications may have used different normalization
strategies.
The pre-processing is accomplished in four steps. First, parallel edges are removed and their weight is added to the first edge. Then missing inverse edges are added with zero weight; self-edges are used as their own inverse edge. In the third pass the edge weights are made symmetric by taking the sum of each edge and its inverse edge, ignoring self-edges. Finally, in disconnected graphs the largest connected component is chosen. These graphs are labeled with the suffix main.
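A compact sketch of the first three pre-processing steps is given below; extracting the largest connected component is omitted, and the edge representation is a simplified assumption.

#include <map>
#include <utility>
#include <vector>

// Merge parallel edges, add missing inverse edges, and symmetrize weights.
struct Edge { int source; int target; double weight; };

std::vector<Edge> symmetrize(const std::vector<Edge>& input)
{
    // step 1: merge parallel edges by summing their weights
    std::map<std::pair<int, int>, double> merged;
    for (const Edge& e : input)
        merged[std::make_pair(e.source, e.target)] += e.weight;

    // steps 2 and 3: ensure every inverse edge exists and give both directions
    // the summed weight; self-edges keep their merged weight
    std::map<std::pair<int, int>, double> sym;
    for (const auto& entry : merged) {
        int u = entry.first.first, v = entry.first.second;
        if (u == v) { sym[entry.first] = entry.second; continue; }
        auto inverse = merged.find(std::make_pair(v, u));
        double total = entry.second + (inverse != merged.end() ? inverse->second : 0.0);
        sym[std::make_pair(u, v)] = total;
        sym[std::make_pair(v, u)] = total;
    }

    std::vector<Edge> result;
    for (const auto& entry : sym)
        result.push_back({entry.first.first, entry.first.second, entry.second});
    return result;
}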
4.1.3 Efficiency and Scalability
In order to compare the efficiency of different configurations information about the
computational expenses are necessary. Then trade offs between quality and runtime
are identified in combination with measured differences in the mean modularity.
Unfortunately it is not advisable to compare average computation times. There
is no shared time scale between the graphs. The average values would be strongly
influenced by the few largest graphs. Therefore just the measured times from a single
graph are compared. Throughout the evaluation the graph DIC28 main is used. For
some aspects also the graph Lederberg main is considered. All timings were measured
on a 3.00GHz Intel(R) Pentium(R) 4 CPU with 1GB main memory. Similar to the
mean modularity comparing the runtime with and without refinement provides a
significance scale.
Computation times measured on different graphs can be used to study the scalability of the algorithms. By nature the runtime depends on the number of vertices
and edges. Plotting the measured times against the number of vertices will visualize
these dependencies. In practice the absolute times are uninteresting as they also depend on startup time, implementation style, and the runtime environment. Instead
the progression of runtime curves of different algorithms is compared.
[Plot: mean modularity vs. match fraction [%] for M-sgrd and M-none at reduction factors 10% and 30%]
Figure 4.1: Mean Modularity by Match Fraction (reduced set)
4.2 Effectiveness of the Graph Coarsening
This section has three aims. First the best value for the parameter match fraction
used by the matching algorithm is determined. Then greedy grouping is compared
against the best greedy matching variant. And finally the influence of the reduction
factor on coarsening and refinement is studied.
4.2.1 Match Fraction
The match fraction controls how many of the best ranked cluster pairs are considered
when constructing the matchings. All other pairs are simply ignored in order to avoid
selecting pairs ranked very low just because all better pairs are blocked by previous
matches.
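As an illustration, the following sketch restricts a greedy matching to the best-ranked fraction of candidate pairs. The ranking is assumed to come from a merge selector; all names are illustrative and do not reproduce the actual implementation.

    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <tuple>
    #include <utility>
    #include <vector>

    using RankedPair = std::tuple<double, std::size_t, std::size_t>;  // (quality, cluster a, cluster b)

    std::vector<std::pair<std::size_t, std::size_t>>
    greedyMatching(std::vector<RankedPair> pairs, std::size_t numClusters, double matchFraction) {
        // sort by selection quality and consider only the best-ranked fraction of all pairs
        std::sort(pairs.begin(), pairs.end(), std::greater<RankedPair>());
        std::size_t limit = static_cast<std::size_t>(matchFraction * pairs.size());
        std::vector<bool> matched(numClusters, false);
        std::vector<std::pair<std::size_t, std::size_t>> matching;
        for (std::size_t i = 0; i < limit; ++i) {
            std::size_t a = std::get<1>(pairs[i]);
            std::size_t b = std::get<2>(pairs[i]);
            if (matched[a] || matched[b]) continue;   // both clusters must still be unmatched
            matched[a] = matched[b] = true;
            matching.push_back({a, b});
        }
        return matching;
    }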
The greedy matching is evaluated in the default configuration. Two different
reduction factors 10% and 30% and match fraction values 10%, 25%, 50%, 75% and
100% are used. With 100% no pairs are ignored. The parameters are compared with
and without refinement. In summary the two basic algorithms are greedy matching
with refinement (M-sgrd ) and without (M-none).
Table 4.2 compares the mean modularities gained with the 20 configurations. As
benchmark graphs the reduced set is used. The rows contain the two algorithms
combined with both reduction factors. The columns list the match fractions. The
maximum value of each row is marked in order to highlight the best match fraction.
In addition Figure 4.1 contains a plot of the four rows.
As visible in the table and the figure, on average the best modularities are gained
with the 50% match fraction. Thus this value is used as the default in the following
evaluations. Altogether the influence of this parameter seems to be negligible.
                 10%        25%        50%        75%        100%
M-none 10%       0.56318    0.56436    0.56595    0.56590    0.56572
M-sgrd 10%       0.59582    0.59573    0.59592    0.59566    0.59567
M-none 30%       0.56415    0.56336    0.56431    0.56441    0.56252
M-sgrd 30%       0.59516    0.59513    0.59453    0.59432    0.59432
Table 4.2: Mean Modularity by Match Fraction. Columns contain the match fraction
and rows contain the algorithm and reduction factor.
[Plots: (a) mean modularity by reduction factor (reduced set); (b) runtime [s] and mean modularity by reduction factor [%] on DIC28 main; series G-none, G-sgrd, M-none, M-sgrd]
Figure 4.2: Modularity and Runtime by Reduction Factor
4.2.2 Coarsening Methods
Next greedy grouping and greedy matching are compared. To stay independent of
specific reduction factors the values 5%, 10%, 30%, 50%, and 100% are used. The
factor controls the number of coarsening levels and thus indirectly influences the refinement later. The value 100% produces just one coarsening level. However factors
above 50% cannot be used with greedy matching. Again the default parameters
are used. In summary the four algorithms are greedy grouping with refinement
(G-sgrd ), without refinement (G-none), greedy matching with refinement (M-sgrd ),
and without (M-none).
Table 4.3 compares the mean modularities gained with the 18 configurations on
the reduced graph set. The rows contain the four algorithms and the columns the
reduction factors. The maximum value of each column is highlighted to better see
the best algorithm. Additionally Figure 4.2a visualizes these values.
The second Table 4.4a compares the runtime of the configurations on the largest
graph of the set (DIC28 main). Furthermore the number of actually produced
coarsening levels is printed in parentheses.
            5%         10%        30%        50%        100%
G-none      0.56849    0.56849    0.56849    0.56846    0.56846
G-sgrd      0.59689    0.59677    0.59617    0.59557    0.58852
M-none      0.56616    0.56595    0.56431    0.55364    -
M-sgrd      0.59613    0.59592    0.59453    0.59154    -
Table 4.3: Mean modularity by Reduction Factor. The columns are the reduction
factor and rows list the algorithms. Empty cells are impossible combinations.
            5%              10%            30%            50%            100%
G-none      50.084s (123)   26.579s (60)   17.285s (19)   16.785s (10)   17.152s (2)
G-sgrd      136.239s (123)  80.033s (60)   40.683s (19)   33.463s (10)   30.903s (2)
M-none      45.77s (123)    22.2s (60)     11.614s (20)   11.235s (17)   -
M-sgrd      139.139s (123)  72.939s (60)   36.83s (20)    32.077s (17)   -
(a) Runtime and Coarsening Levels on DIC28 main
            5%              10%            30%            50%            100%
G-none      30.289s (129)   13.518s (61)   6.85s (18)     5.604s (10)    4.558s (2)
G-sgrd      40.514s (129)   19.513s (61)   9.851s (18)    8.261s (10)    7.656s (2)
M-none      56.541s (200)   47.313s (200)  41.586s (200)  39.505s (200)  -
M-sgrd      62.994s (200)   49.97s (200)   43.598s (200)  43.89s (200)   -
(b) Runtime and Coarsening Levels on Lederberg main
Table 4.4: Runtime by Reduction Factor. The columns are the reduction factor and
rows list the coarsening method. The cells contain the runtime in seconds and the
number of coarsening levels. Empty cells are impossible combinations.
As another extreme the times measured on the graph Lederberg main are listed
in Table 4.4b. While the mean degree of this graph is around 10 it contains some
high-degree vertices with an extreme vertex having 1103 incident edges. Here the
matching algorithm reached the maximal number of coarsening levels (200) regardless of the reduction factor.
Concerning grouping vs. matching, the table and plot of mean modularities show
that greedy grouping always performed slightly better than matching. Moreover the
behavior of the matching algorithm on the graph Lederberg main confirmed that it
is unreliable. Hence greedy grouping should be used as coarsening method.
4.2.3 Reduction Factor
In order to find the best reduction factor the mean modularity Table 4.3 of the
previous subsection is reused. In addition Figure 4.2b shows a plot of the mean
modularities of G-sgrd and the runtime on the graph DIC28 main against the reduction factors.
Without refinement the reduction factor has no real influence on greedy grouping
(G-none). Just a small influence is visible with greedy matching (M-none), maybe
because with more coarsening levels it selects fewer bad cluster pairs. Combined with
greedy refinement smaller reduction factors improve the clusterings. However halving the reduction factor from 10% to 5% doubles the runtime and the number of
coarsening levels while the difference in mean modularity is very small. Therefore, as
a trade-off between quality and runtime, the reduction factor 10% is a good choice.
4.3 Effectiveness of the Merge Selectors
The merge selector is used by the coarsening methods to rank cluster pairs in the
selection process. This section experimentally compares the developed merge selectors to find the best. First the best parameters for the random walk distance and
reachability selectors are determined. Then these are compared with the other merge
selectors.
4.3.1 Random Walk Distance
The random walk distance selector has one parameter RW length controlling the
length of random walks. Here the lengths 1–5 are studied. The clusterings are
compared using greedy grouping with refinement, named RWdist-sgrd, and without,
named RWdist-none. All other parameters are set to their default values.
Table 4.5 summarizes the mean modularities gained with the 10 configurations on
the reduced graph set. The columns contain the length of random walks and the rows
contain the two basic algorithm combinations. The best mean modularity of each
row is highlighted to make good lengths visible. The same values are also plotted in
Figure 4.3a. As reference the dashed lines mark the mean modularity produced by
the standard weight density selector with refinement (WD-sgrd) and without (WD-none). Appendix B.1 contains a detailed table of the clusterings produced on each
graph.
[Plots: (a) mean modularity with the random walk distance vs. length of random walks, with WD-sgrd and WD-none as reference lines; (b) runtime [s] with the random walk distance on DIC28 main]
Figure 4.3: The Random Walk Distance
              1          2          3          4          5
RWdist-none   0.48510    0.54798    0.54424    0.55213    0.55495
RWdist-sgrd   0.53202    0.57045    0.56610    0.57051    0.57226
Table 4.5: Mean modularity with Random Walk Distance. Columns contain the
length of random walks and rows contain the algorithm.
[Plots: (a) mean modularity with random walk reachability vs. length of random walks for 1, 3, and 5 iterations, with WD-sgrd and WD-none as reference lines; (b) runtime [s] with random walk reachability on DIC28 main]
Figure 4.4: The Random Walk Reachability
Finally Figure 4.3b compares the runtime on the graph DIC28 main. The values
with and without refinement are plotted to provide a time scale relative to the
refinement costs.
The first table and figure show that the random walk distance selector becomes
slightly better with longer random walks. However the mean modularities are very
low compared to the weight density selector. With refinement RWdist is merely about as
good as the weight density selector is already without any refinement. In fact RWdist found
better clusterings only on the two smallest graphs (SouthernWomen and dolphins).
At the same time computing this selector is extremely expensive compared to the
refinement. Therefore it is removed from the further evaluation.
4.3.2 Random Walk Reachability
The random walk reachability modifies the edge weights using short random walks.
This modification operator can be applied on its own results and the RW iterations
parameter controls this number of applications. Here the values 1, 3, and 5 are
considered. Again the parameter RW length defines the length of the random walks
and the values 1–4 are used. The combination with length one and just one iteration
is equal to the original weight density selector. Because iterated application does
not change the weights the other iteration counts were excluded in this combination.
The remaining setup is identical to the previous evaluations and the two algorithms
are named RWreach-sgrd and RWreach-none.
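For illustration only, the sketch below re-weights edges with short random walks in the spirit of this operator. The exact reachability definition is the one given in the algorithm chapter; here it is merely assumed, for the sake of the example, that the new weight of an existing edge is the symmetrized probability that a walk of the given length starting at one endpoint ends at the other, and the operator is applied repeatedly to its own output.

    #include <cstddef>
    #include <vector>

    using Matrix = std::vector<std::vector<double>>;

    // Assumed reading, not the thesis definition: iterate (a) row-normalizing the
    // weights into transition probabilities, (b) taking walks of fixed length, and
    // (c) replacing each existing edge weight by the symmetrized walk probability.
    Matrix reweightByRandomWalks(Matrix weights, int walkLength, int iterations) {
        const std::size_t n = weights.size();
        for (int it = 0; it < iterations; ++it) {
            Matrix p(n, std::vector<double>(n, 0.0));
            for (std::size_t u = 0; u < n; ++u) {
                double s = 0.0;
                for (double w : weights[u]) s += w;
                if (s > 0.0)
                    for (std::size_t v = 0; v < n; ++v) p[u][v] = weights[u][v] / s;
            }
            Matrix walk = p;                                   // probabilities after one step
            for (int step = 1; step < walkLength; ++step) {    // extend to 'walkLength' steps
                Matrix next(n, std::vector<double>(n, 0.0));
                for (std::size_t u = 0; u < n; ++u)
                    for (std::size_t k = 0; k < n; ++k)
                        if (walk[u][k] > 0.0)
                            for (std::size_t v = 0; v < n; ++v)
                                next[u][v] += walk[u][k] * p[k][v];
                walk = next;
            }
            Matrix result(n, std::vector<double>(n, 0.0));     // keep the original sparsity pattern
            for (std::size_t u = 0; u < n; ++u)
                for (std::size_t v = 0; v < n; ++v)
                    if (weights[u][v] > 0.0)
                        result[u][v] = 0.5 * (walk[u][v] + walk[v][u]);
            weights = result;
        }
        return weights;
    }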
Table 4.6 summarizes the mean modularities produced by the 20 configurations
on the reduced set of graphs. The columns contain the length of the walks and
the rows the two algorithms RWreach-sgrd and RWreach-none combined with the
                 1          2          3          4
RWreach-none 1   0.56852    0.57150    0.57218    0.57380
RWreach-sgrd 1   0.59678    0.59616    0.59663    0.59676
RWreach-none 3   -          0.57530    0.57672    0.57468
RWreach-sgrd 3   -          0.59658    0.59609    0.59760
RWreach-none 5   -          0.57720    0.57102    0.55942
RWreach-sgrd 5   -          0.59732    0.59464    0.59016
Table 4.6: Mean modularity with Random Walk Reachability. Columns contain the
length of random walks and rows contain the algorithm combined with the number
of iterations. Empty cells are impossible combinations.
number of iterations. Figure 4.4a shows a plot of the mean modularities vs. random
walk length. Again the dashed lines mark the mean modularity produced by the
weight density selector. Appendices B.2 and B.3 contain two detailed tables of the
modularities gained on each graph with walks of length 2 and 3.
In addition Figure 4.4b shows the runtime measured on the graph DIC28 main.
Again the runtime difference between the sorted greedy refinement and no refinement
provides a visual relation to other components of the algorithm.
In the mean modularity plot some small improvements are visible without refinement. However with refinement these completely disappear. Three iterations seem
to be slightly better than just one iteration. Using five iterations has a negative
impact when combined with longer walks.
In summary no clear tendency or improvements were measurable. Maybe the used
benchmark graphs are structurally unsuited for these merge selectors. In the further
evaluation, walks of length two with three iterations (RW reachability (2,3)) are
simply chosen as the default. This combination can still be computed quickly and does not
really degrade the mean modularity.
4.3.3 Comparison of Merge Selectors
To find the best merge selector in terms of effectiveness and efficiency this section
compares mean modularities produced with each merge selector and their runtime.
The larger graph collection is used to gain more reliable mean values. As a side effect
this allows verifying whether the results from the reduced graph set are representative. For the same reason greedy matching is considered again, to check whether it
works better with merge selectors other than the weight density.
The studied configurations are composed of three components: merge selector,
coarsening method, and refinement method, using the naming scheme Selector Coarsening-Refinement. Cluster pairs are selected either by the modularity increase
as used in the greedy joining algorithm of Newman (MI), or by the weight density
(WD), random walk reachability (RWR), spectral length (SPL), spectral length difference (SPLD), or the spectral angle (SPA). As coarsening method greedy grouping
(G) and greedy matching (M ) are used. The third component is the refinement algorithm. Like before no refinement (none) and sorted greedy refinement (sgrd ) is
used.
[Plot: mean modularity by merge selector (SPL, SPLD, SPA, MI, WD, RWR) for G-none, M-none, G-sgrd, M-sgrd on the large set]
Figure 4.5: Mean Modularity of the Merge Selectors (large set)
[Plots: (a) runtime [s] by merge selector on DIC28 main; (b) runtime [s] by merge selector on Lederberg main; both for G-none, M-none, G-sgrd, M-sgrd on a logarithmic scale]
Figure 4.6: Runtime of the Merge Selectors
        G-none     M-none     G-sgrd     M-sgrd
SPL     0.43653    0.44014    0.48521    0.51962
SPLD    0.45479    0.48348    0.49926    0.53231
SPA     0.52405    0.52252    0.54041    0.54341
MI      0.52963    0.51985    0.53968    0.54369
WD      0.52305    0.51866    0.54608    0.54529
RWR     0.52971    0.52768    0.54718    0.54744
Table 4.7: Mean Modularity of Merge Selectors. Columns contain the algorithms
and rows the merge selectors.
Table 4.7 displays the mean modularities of all 24 configurations on the large graph
set. The rows contain the six merge selectors and the columns the four algorithm
variants. The rows are sorted by their overall mean modularity. Figure 4.5 shows a
plot of the same data.
In order to compare the runtime of the configurations the times measured on the
graphs DIC28 main and Lederberg main are shown in Figures 4.6a and 4.6b. A
logarithmic scale is used to differentiate the runtimes more easily.
The first two spectral merge selectors SPL and SPLD did not perform well in
terms of runtime and clustering results. This could also be expected from their
mathematical derivation as an approximation of the modularity increase. However
the modularity values on these two show that the matching (M ) method is less
sensitive to bad selectors than grouping (G). All other merge selectors SPA,
WD, MI, and RWR yielded very similar clustering results when compared to the
modularity improvement possible by refinement. This is an interesting observation
because the modularity increase selector (MI ) was expected to produce worse results
than the weight density (WD). Very small modularity improvements over the other
selectors were gained by the random walk reachability (RWR). On the better merge
selectors greedy matching (M ) turned out to be as good as greedy grouping (G)
when refinement was used. Without refinement grouping produced slightly better
results. On DIC28 main matching with refinement was slightly faster than grouping.
But on the graph Lederberg main with its extremely skewed distribution of vertex
degrees greedy grouping was always considerably faster.
Altogether the weight density selector (WD) often had the lowest runtime while
still producing good mean modularities. Therefore it will be used as default merge selector. On the large graph set the grouping and matching methods were comparably
good in terms of mean modularity. This result differs slightly from the measurements
from the reduced graph set.
4.4 Effectiveness of the Cluster Refinement
This section compares the multi-level refinement algorithms against raw graph coarsening without refinement. The first subsection concentrates on the greedy refinement
variants to select the best in terms of modularity improvement and search speed.
And finally the second subsection compares the best greedy refinement against the
presented Kernighan-Lin method.
[Plots: (a) mean modularity by refinement method; (b) runtime [s] by refinement method on DIC28 main; methods none, SGR-mod, SGR-eo, CGR, SGR-density, KL]
Figure 4.7: Mean Modularity by Refinement Method (reduced set)
4.4.1 Greedy Refinement
The complete greedy refinement method moves the vertex with the best modularity
increase in each move step until no further improvement is possible. However it
is computationally expensive to find the best vertex because it requires evaluating
all possible moves of all vertices in each step. For this reason the sorted greedy
refinement was introduced. It saves search time by trying to move the vertices in
a specific order according to a vertex selector. In the following evaluation the best
of the available vertex selectors is selected. And finally it is studied whether the
complete greedy refinement can be replaced by the sorted variant without degrading
the results too much.
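The following sketch shows one pass of such a sorted greedy refinement. It is a rough illustration only: the vertex order comes from a precomputed fitness array whose definition is not reproduced here, the modularity gain of a move is written out for the degree volume model, and all names and data structures are invented for this example rather than taken from the actual implementation.

    #include <algorithm>
    #include <cstddef>
    #include <map>
    #include <numeric>
    #include <utility>
    #include <vector>

    struct Graph {
        std::size_t n;                                                 // number of vertices
        std::vector<std::vector<std::pair<std::size_t, double>>> adj;  // (neighbor, weight), both directions stored
    };

    // Try to move each vertex, in the order given by 'fitness' (best first), to the
    // neighboring cluster with the highest positive modularity increase.
    bool sortedGreedyPass(const Graph& g, std::vector<std::size_t>& cluster,
                          const std::vector<double>& fitness) {
        std::vector<double> strength(g.n, 0.0);
        double totalWeight = 0.0;                                      // W, each edge counted once
        for (std::size_t v = 0; v < g.n; ++v)
            for (const auto& edge : g.adj[v]) { strength[v] += edge.second; totalWeight += edge.second; }
        totalWeight /= 2.0;
        std::size_t numClusters = *std::max_element(cluster.begin(), cluster.end()) + 1;
        std::vector<double> clusterStrength(numClusters, 0.0);
        for (std::size_t v = 0; v < g.n; ++v) clusterStrength[cluster[v]] += strength[v];

        std::vector<std::size_t> order(g.n);                           // move order from the vertex selector
        std::iota(order.begin(), order.end(), 0);
        std::sort(order.begin(), order.end(),
                  [&](std::size_t a, std::size_t b) { return fitness[a] > fitness[b]; });

        bool improved = false;
        for (std::size_t v : order) {
            std::size_t from = cluster[v];
            std::map<std::size_t, double> toWeight;                    // edge weight from v into adjacent clusters
            for (const auto& edge : g.adj[v])
                if (edge.first != v) toWeight[cluster[edge.first]] += edge.second;
            double wFrom = toWeight.count(from) ? toWeight[from] : 0.0;
            double bestGain = 0.0;
            std::size_t bestCluster = from;
            for (const auto& candidate : toWeight) {
                if (candidate.first == from) continue;
                // modularity increase of moving v, assuming the degree volume model
                double gain = (candidate.second - wFrom) / totalWeight
                            - strength[v] * (clusterStrength[candidate.first]
                                             - clusterStrength[from] + strength[v])
                              / (2.0 * totalWeight * totalWeight);
                if (gain > bestGain) { bestGain = gain; bestCluster = candidate.first; }
            }
            if (bestCluster != from) {                                 // apply the best improving move
                clusterStrength[from] -= strength[v];
                clusterStrength[bestCluster] += strength[v];
                cluster[v] = bestCluster;
                improved = true;
            }
        }
        return improved;                                               // caller repeats passes until false
    }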
The three sorted greedy refinement methods are SGR-density using the density-fitness vertex selector, SGR-mod using the mod-fitness, and SGR-eo using the eo-fitness. The complete greedy refinement is named CGR. To provide a significance
scale, Kernighan-Lin refinement (KL) and no refinement (none) are included. All other
parameters are set to their default values, i.e. greedy grouping by weight density is
used for the coarsening.
Table 4.8 summarizes the mean modularity produced by each refinement algorithm
on the reduced graph set. The second column displays the runtime of the algorithms
on the graph DIC28 main. The rows are sorted by their mean modularity. In
addition to the table, the bar plot in Figure 4.7a visually compares the mean modularities.
The runtime is also plotted in Figure 4.7b.
              mean modularity   time DIC28 main [s]   mod. DIC28 main
none          0.56849           24.11900              0.80154
SGR-mod       0.59661           75.93500              0.84754
SGR-eo        0.59664           74.89400              0.84727
CGR           0.59672           798.17800             0.84758
SGR-density   0.59677           76.71600              0.84747
KL            0.59792           4672.12100            0.84781
Table 4.8: Mean Modularity by Refinement Method (reduced set). The first column
contains the mean modularities, the second column the runtime in seconds on the graph
DIC28 main, and the third column the modularity reached on DIC28 main.
              mean modularity
none          0.52305
SGR-density   0.54608
CGR           0.54610
KL            0.54810
Table 4.9: Mean Modularity by Refinement Method (large set)
Between the three sorted greedy variants (SGR-density, SGR-mod, and SGR-eo) no significant differences in modularity are visible. Their clustering results are
also comparable to complete greedy refinement (CGR). On the graph DIC28 main
the complete greedy refinement was about 10 times slower than any sorted greedy
refinement. This is in agreement with the higher worst case complexity of the
complete greedy refinement. Therefore sorted greedy refinement with the density-fitness vertex selector (SGR-density) is chosen as the default greedy refinement method.
4.4.2 Kernighan-Lin Refinement
This subsection analyzes how much the refinement variants improve the clustering
results compared to no refinement. This includes the question whether Kernighan-Lin refinement performs significantly better than greedy refinement. The algorithms
are configured like in the previous subsection. Considered are the variants none,
SGR-density, CGR, and KL. The evaluation is applied to the large graph set to gain
more reliable mean modularity values. Table 4.9 lists the produced mean modularity
values and Figure 4.8 provides a bar plot of the same values. Appendix B.4 contains
a table of the single modularity values gained by the algorithms on each graph.
On the large graph set the mean modularity was improved with sorted greedy refinement (SGR-density) by 4.4% compared to no refinement1 (e.g. 0.54608/0.52305 ≈ 1.044). This range provides
the significance scale already used in the previous evaluations. In comparison
Kernighan-Lin refinement (KL) improved the results by 4.79%. The mean improvement with Kernighan-Lin refinement over sorted greedy refinement was 0.37%. The
runtime and modularity values of the graph DIC28 main from the previous subsection show that the Kernighan-Lin refinement was about 60 times slower than sorted
greedy refinement. At the same time there it improved the modularity by just
0.04%. Like on the reduced set, also on the large graph set sorted greedy refinement
(SGR-density) was equally good as the complete greedy refinement (CGR).
1 max / min · 100%
[Plot: mean modularity by refinement method (none, SGR-density, CGR, KL) on the large set]
Figure 4.8: Mean Modularity by Refinement Method (large set)
Altogether Kernighan-Lin refinement reliably improves the clusterings by a small
amount but requires considerably more runtime. Therefore it should be used in
place of sorted greedy refinement only when the best possible clusterings are sought. Similar
or better modularity improvements might be easily achievable by other refinement
methods in less time.
4.5 Scalability
The purpose of this section is to experimentally study how well the runtime of the
algorithms scales with the graph size. The considered configurations are the multi-level Kernighan-Lin refinement (KL), sorted greedy refinement by density-fitness
(SGR-density), complete greedy refinement (CGR) and the raw graph coarsening
without refinement (none). For all other parameters the default values are used.
The runtime of all graphs and algorithms has to be measured on the same computer.
Thus only a subset of 24 graphs from the large graph collection is used. The graphs
are also marked in the graph table in the appendix.
Figure 4.9a shows the total runtime of the algorithms versus the number of vertices. In addition Figure 4.9b shows the computation time of the coarsening phase,
which equals the configuration none. The time spent on sorted greedy refinement
is included. In order to show the influence of the edges, the names of three larger
graphs and their edge counts are inserted into the figure. The complete runtime
measurements of each graph can be found in the appendix in Table B.5.
Of course the runtime not only depends on the number of vertices but also on
the edges. On the other hand the number of edges mostly scales with the vertices
because nearly all graphs have a similar mean vertex degree. The graphs eatRS
and hep-th-new main are the two largest graphs of the collection. Both have more
than three times more edges than other graphs of similar vertex count (for example DIC28 main).
[Plots: (a) total runtime [s] vs. number of vertices for none, SGR-density, CGR, KL; (b) coarsening and sorted greedy refinement time vs. number of vertices; annotated graphs: eatRS (305501 edges), hep-th-new main (352059 edges), DIC28 main (71014 edges)]
Figure 4.9: Runtime by Graph Size
In the second figure it is visible that coarsening and refinement
also take around three times longer on these graphs. Kernighan-Lin and complete
greedy refinement depend even more strongly on the number of edges. Both coarsening
and sorted greedy refinement were equally fast on most graphs and scale much better
with the graph size than complete greedy and Kernighan-Lin refinement.
4.6 Comparison to Reference Algorithms
This section compares the developed algorithm to other publicly available algorithms. Many of these are implemented as proof of concept and only work on
unweighted graphs without self-edges. Therefore the evaluation is based on the 19
graphs of the collection usable with all implementations. Information about the
graphs can be found in Appendix A.1. First the reference algorithms are introduced
briefly. Then the modularity of the clusterings produced by the algorithms and
their runtime are compared.
The following algorithms are compared here. For each a short name is given to
be used in tables and summaries. One of the most popular algorithms is the fast
greedy joining (fgj ) of Clauset, Newman, and Moore [15].2 It is an agglomeration
method selecting cluster pairs by highest modularity increase. Recently also the
modified greedy joining of Wakita and Tsurumi [76] became available.3 It selects
cluster pairs by highest modularity increase multiplied with a consolidation ratio
min(|Ci|/|Cj|, |Cj|/|Ci|). The wakita HN variant uses the number of vertices as
cluster size |Ci| and the wakita HE variant the number of edges leaving the cluster.
2 available at http://cs.unm.edu/~aaron/research/fastmodularity.htm
3 available at http://www.is.titech.ac.jp/~wakita/en/
graph            walktrap   leadingev  wakita HE  wakita HN  fgj        ML-none    spinglass  ML-sgrd    ML-KL
Chain8           0.31633    0.31633    0.37755    0.31633    0.35714    0.35714    0.37755    0.35714    0.37755
Star9            -0.21875   0.00000    -0.078125  -0.078125  -0.00781   -0.00781   -0.00781   0.00000    0.00000
K55              0.00000    0.30000    0.30000    0.30000    0.00000    0.30000    0.30000    0.30000    0.30000
Tree15           0.47449    0.50510    0.51276    0.50510    0.50510    0.50510    0.50510    0.50510    0.51276
ModMath main     0.34641    0.42287    0.31712    0.35488    0.41051    0.42018    0.44880    0.42905    0.44880
SouthernWomen    0.00000    0.21487    0.22775    0.24157    0.31467    0.31530    0.33121    0.31972    0.33601
karate           0.39908    0.37763    0.35947    0.41880    0.38067    0.40869    0.41979    0.41979    0.41979
mexican power    0.22869    0.29984    0.33772    0.31949    0.33089    0.32811    0.35119    0.34776    0.35952
Grid66           0.47833    0.55000    0.42222    0.49236    0.49597    0.51319    0.53250    0.55000    0.54125
Sawmill          0.40153    0.44745    0.49102    0.48660    0.55008    0.55008    0.53863    0.55008    0.55008
dolphins         0.44043    0.48940    0.48687    0.44838    0.49549    0.51786    0.52852    0.52587    0.52587
polBooks         0.49927    0.39922    0.46585    0.49138    0.50197    0.50362    0.52640    0.52724    0.52561
adjnoun          0.17640    0.22153    0.24365    0.25766    0.29349    0.28428    0.31336    0.31078    0.31078
sandi main       0.78157    0.78667    0.80995    0.82054    0.82729    0.82083    0.81619    0.82773    0.82773
USAir97          0.28539    0.26937    0.34439    0.31526    0.32039    0.34353    0.36597    0.36824    0.36824
circuit s838     0.70256    0.70215    0.76499    0.79488    0.80472    0.79041    0.81481    0.81551    0.81551
CSphd main       0.88971    0.74219    0.91382    0.92001    0.92470    0.92484    0.90584    0.92558    0.92558
Erdos02          0.62534    0.59221    0.66991    0.67756    0.67027    0.68484    0.70985    0.71592    0.71611
DIC28 main       0.70285    0.70923    0.73570    0.74179    0.78874    0.80154    0.81747    0.84747    0.84781
average          0.39630    0.43927    0.45803    0.46445    0.47180    0.49272    0.50502    0.50753    0.51100
Table 4.10: Clustering Results of Reference Algorithms
A few other algorithms are available through the igraph library of Csárdi
and Nepusz [16].4 The following were used. The spinglass algorithm of Reichardt
and Bornholdt [67] optimizes modularity by simulated annealing on a physical energy
model. The implementation requires an upper bound on the number of clusters. In
this evaluation the upper bound 120 was used for all graphs. A spectral method is
Newman’s [58] recursive bisection based on the first eigenvector of the modularity
matrix (leadingev ). Finally an agglomeration method based on random walks is the
walktrap algorithm of Pons and Latapy [64].
For comparison the standard configuration with three refinement variants was
chosen. All use greedy grouping by weight density with 10% reduction factor. The
variant without any refinement (ML-none) is similar to other pure agglomeration
methods, namely fgj, wakita-HN, and wakita-HE. The two other refinement variants are sorted greedy refinement by density-fitness (ML-sgrd ) and the improved
Kernighan-Lin refinement (ML-KL).
Table 4.10 lists the modularity of the clusterings found by the implementations.
The graphs are sorted by number of vertices and the algorithms by mean modularity. The last row shows the arithmetic mean modularity for each algorithm. For
each graph the algorithms with the best result are marked with a bold font. A similar table comparing the runtime of the algorithms is included in Appendix B.6.
Figure 4.10 summarizes the mean modularity gained by each algorithm and the
measured runtime.
It is visible that on average greedy grouping by density (ML-none) is better than all
other agglomeration methods (fgj, wakita HE, wakita HN) even without refinement.
Yet on a few graphs other agglomeration methods perform slightly better. Looking at
the multi-level refinement methods (ML-sgrd, ML-KL), only the spinglass algorithm
of Reichardt and Bornholdt produces comparably good clusterings. However it is
much slower. On nearly all graphs it is outperformed by the multi-level Kernighan-Lin refinement (ML-KL) in terms of modularity and runtime. The three slowest
implementations were spinglass, ML-KL, and leadingev. All other algorithms had a
relatively constant, low runtime.
4 available at http://igraph.sourceforge.net/
[Plots: (a) average modularity by algorithm; (b) runtime [s] of the algorithms vs. vertex count]
Figure 4.10: Clustering Results and Runtime of the Reference Algorithms
4.7 Comparison to Published Results
This section compares the presented multi-level refinement method against other
clustering algorithms. For many of these algorithms clustering results are published
in the papers presenting the algorithm. Because of their size clusterings are directly
printed only for very small graphs. Commonly just the modularity value of the
clusterings is published.
The following discussion is based on the algorithms listed in Section 2.4 about
fundamental clustering methods. First some general problems of this evaluation
method are discussed. Then for each graph the modularity values found in articles
are presented. The section concludes with a small summary table comparing the
best values found in the literature to the results of this work.
Only modularity values published together with the original algorithms were considered and for each value the source article is cited. The modularity values are
compared against two multi-level Kernighan-Lin refinement algorithms. Both use
greedy grouping with 10% reduction factor and Kernighan-Lin refinement. As merge
selector the weight density is employed by the ML-KL-density variant and the random walk reachability (2,3) by the ML-KL-rw variant. The first variant is the default
configuration identified by the previous evaluation. In addition the other variant was
chosen because for degree volume model used here it is in many cases able to find
better clusterings.
The comparison of printed modularity values embodies some general problems.
Often only a few digits are printed to save space. With just three digits it is difficult
to check whether the same or just similar clusterings were found. In this regard
also printing the number of clusters would be helpful. The calculation of modularity
may be done slightly differently. Several variants of the modularity measure exist and
small variations in the handling of self-edges are possible. In addition unweighted
edges might have been used instead of the available weighted version. Finally small
differences can arise in preprocessing and graph conversion by different strategies to
obtain undirected, symmetric graphs. Here just the last problem is addressed by
excluding graphs when their published number of vertices and edges differs from the
own version. This applies to most graphs in [64, 25, 69].
4.7.1 The Graphs and Clusterings
In the following paragraphs each graph is briefly presented. Thereafter the clustering results are discussed. For smaller graphs also example pictures are printed.
The layout was computed with the LinLog energy model [62]5. The best clustering
computed by any configuration of the own multi-level algorithm is shown as the vertex colors.
5 available at http://www.informatik.tu-cottbus.de/~an/GD/
(a) karate
(b) dolphins
Figure 4.11: Reference Graphs (karate and dolphins)
karate This is Zachary’s karate club study [82], maybe the most studied social
network. A picture of the graph colored with the optimum clustering is shown in
Figure 4.11a. During the 2 years long study, 34 members of a karate club were
observed and the number of contacts outside of the club were counted. Like in
most other studies of this network the simple unweighted version of the network is
used here. The instructor of the club (vertex #1) resigned because of conflicts with
the club administrator (vertex #34) about lesson fees. His supporters followed him
forming a new organization with the members 1 to 9, 11 to 14, 17, 18, 20, 22. Person
9 joined this faction although he was a supporter of the club president, because he
was only three weeks away from the test for black belt when the split occurred, and
had to follow the instructor to retain this rank.
The greedy joining algorithm of Newman found 2 clusters with modularity 0.381 [60].
By modifying the merge selector Danon found a clustering with modularity 0.4087 [17].
The spectral clustering algorithms of Donetti et al. yielded modularity 0.412 [21]
and 0.419 [20] in the improved version. Also Newman’s recursive spectral bisection
found modularity 0.419 [58]. Surprisingly the algorithms based on random walks
just found clusterings with 0.3937 [65] and 0.38 [64]. Extremal Optimization with
4 clusters of modularity 0.4188 [23] fell short of the optimum. As verified by [2]
the global optimum clustering has 4 clusters with modularity 0.4197. The same
clustering was found by both multi-level algorithms.
(a) polBooks
(b) afootball
Figure 4.12: Reference Graphs (polBooks and afootball)
dolphins The dolphin social network is an undirected graph of frequent associations
between 62 dolphins in a community residing in Doubtful Sound in New Zealand [51,
50]. For a picture see Figure 4.11b.
This graph just recently appeared in the context of modularity clustering. Thus
only very few published results exist. With Fractional Linear Programming a clustering of modularity 0.529 and the upper bound 0.531 was found [2]. Also the exact
Integer Linear Programming gave an optimum clustering with modularity 0.529 [13].
Using multi-level refinement 5 clusters with modularity 0.52586 (ML-KL-density)
and 0.52772 (ML-KL-rw ) were found. The best clustering found with experimental
configurations had modularity 0.52851 and the same clustering was found by the
spinglass algorithm in the previous section.
polBooks A network of books about recent US politics sold by the online bookseller
Amazon.com. Edges between books represent frequent co-purchasing of books by
the same buyers (Fig. 4.12a). The network was compiled by Valdis Krebs [47].
With Fractional Linear Programming a clustering of modularity 0.5272 and an
upper bound 0.528 was found [2]. This result was confirmed by the exact Integer
Linear Programming method with modularity 0.527 [13]. The multi-level algorithm
ML-KL-rw and very many other configurations found 5 clusters with modularity
0.52723. Surprisingly the ML-KL-density variant just found 0.52560 with 6 clusters
although greedy refinement in the same configuration found the better clustering.
afootball A real-world graph with known group structure is the games schedule of
Division I of the United States college football for the 2000 season [31]. The 115
teams are grouped into conferences of 8-12 teams and some independent teams. On
average seven intra- and four inter-conference games are played, and inter-conference
games between geographically close teams are more likely.
(a) jazz
(b) celegans metabolic
Figure 4.13: Reference Graphs (jazz and celegans metabolic)
A picture of the graph is
visible in Figure 4.12b.
Newman’s greedy joining algorithm found 6 clusters with modularity 0.546 [60]
whereas the spectral methods of White et al. found 11 clusters with 0.602 [80].
Similarly the random walk based agglomeration of Pons found modularity 0.60 [64].
With Fractional Linear Programming and rounding a clustering of 0.6046 and the
upper bound 0.606 was found [2]. The random walk based multi-level clustering
ML-KL-rw comes close to this bound with 10 clusters and modularity 0.60582. The
simpler variant ML-KL-density just got modularity 0.59419 with 7 clusters.
jazz A network of 198 Jazz musicians compiled by Gleiser and Danon [33]. Such
graphs are interesting for the social sciences as Jazz musicians very often work
together in many different small groups (Fig. 4.13a). This graph also became very
popular for the comparison of clustering methods.
The best clustering was reported for Extremal Optimization with 5 clusters of
modularity 0.4452 [23]. Fractional Linear programming with rounding gave modularity 0.445 [2] and an upper bound of 0.446. Spectral clustering methods yielded
modularity 0.437 [21], 0.444 [20], and 0.442 [58]. The worst modularity 0.4409 [17]
was produced by Danon's greedy joining with modified merge selector. Both multi-level algorithms found 4 clusters with modularity 0.44514 (ML-KL-rw) and 0.44487
(ML-KL-density).
(a) circuit s838
(b) email
Figure 4.14: Reference Graphs (circuit s838 and email)
celegans metabolic A metabolic network for the nematode Caenorhabditis elegans [42, 23]. The graph describes chemical reactions as well as the regulatory
interactions that guide these reactions (Fig. 4.13b).
The first published clustering result of this graph was found by Extremal Optimization with 12 clusters of modularity 0.4342 [23]. Greedy joining with pre-coarsening by short random walks found 7 clusters with modularity 0.4164 [65].
Newman's recursive spectral bisection yielded 0.435 [58]. The recursive bisection algorithm with Quadratic Programming of Agarwal and Kempe found the best clustering
published to date with modularity 0.450. The two multi-level refinement algorithms gave similar results with 0.44992 (ML-KL-rw) and 0.45090 (ML-KL-density).
The best clustering with experimental configurations had modularity 0.45136.
circuit s838 The electric circuit s838 st.txt of Uri Alon’s collection of complex
networks [53]. This graph is an undirected version with 512 vertices connected by
819 edges (Fig. 4.14a).
The only published clustering has modularity 0.815 with 13 clusters [69]. Both
multi-level configurations found clusterings of similar modularity. The random walk
based greedy grouping ML-KL-rw, with 0.815507, is slightly worse than ML-KL-density with the weight density selector, which reached 0.815513. However better clusterings exist. For example with experimental configurations a clustering with 12
clusters and modularity 0.81719 was found.
email The network of e-mail interchanges between members of the University
Rovira i Virgili in Tarragona, Spain [36, 37]. The graph contains 1133 users of
the university e-mail system, including faculty, researchers, technicians, managers,
administrators, and graduate students. Two users are connected by an edge if both
exchanged e-mails with each other (Fig. 4.14b). Thus the graph is undirected and
unweighted.
In recent years this graph also became popular for the evaluation of graph clustering methods. The first published result is for Extremal Optimization. It found
15 clusters and modularity 0.5738 [23]. Danon’s greedy joining with modified merge
selector just yielded modularity 0.5569 [17]. The recursive spectral bisection of Newman found a clustering with modularity 0.572 [58]. The best published clustering
with modularity 0.579 was found by recursive bisection with Quadratic Programming [2]. The multi-level algorithm ML-KL-density produced 11 clusters with modularity 0.57767. A better clustering with 9 clusters and modularity 0.58137 was
obtained from the random walk variant ML-KL-rw.
hep-th main A collaboration network of scientists in high-energy physics. The graph
contains the collaborations of scientists posting preprints on the high-energy theory
archive at http://www.arxiv.org between 1995 and 1999. The 5835 vertices represent authors and edges connect co-authors. The edges are weighted as described
in [61].
The only published clustering is from a recursive spectral bisection method. It
produced 114 clusters with modularity 0.707 [21]. The multi-level algorithm ML-KL-density found 57 clusters with modularity 0.85523. A better clustering was found with the
random-walk variant ML-KL-rw. It contains 59 clusters and has modularity 0.85629.
Erdos02 This is the year 2002 version of the collaboration network around Erdös [34].
The vertices represent authors and the unweighted edges connect co-authors. Only
authors with a co-author distance from Erdös up to 2 are included.
The only published clustering was found with greedy joining combined with random walk based pre-coarsening. It contains 20 clusters and has modularity 0.6817 [65].
The multi-level algorithm ML-KL-density produced 39 clusters with modularity
0.71611. A slightly better clustering with the same number of clusters and modularity 0.716396 was found by the random-walk variant ML-KL-rw.
PGPgiantcompo A giant component of the network of users of the Pretty-Good-Privacy algorithm for secure information interchange compiled by Arenas et al. [12,
35]. The component contains 10680 vertices. Edges connect users trusting each
other.
The modified greedy joining of Danon found the worst clustering with modularity
0.7462 [17]. Extremal Optimization found 365 clusters with modularity 0.8459 [23].
The best published modularity, 0.855, was obtained by Newman's spectral bisection [58].
The best of both multi-level algorithms was ML-KL-rw. It found 96 clusters with
modularity 0.88462. The greedy grouping by density ML-KL-density yielded 102
clusters with 0.88392. Finally the best clustering found in own experiments so far
has 94 clusters and modularity 0.885450.
cond-mat-2003 main A collaboration network of scientists posting preprints on
the condensed matter archive at http://www.arxiv.org. This version contains data
from 1995 to 2003. Vertices represent authors and edges connect co-authors. The
edges are weighted as described in [61]. Here only the largest component containing
27519 vertices is used.
Extremal Optimization discovered 647 clusters with modularity 0.6790 [23]. Later
much better clusterings were found. Newman's recursive spectral bisection yielded
0.723 [58] and greedy joining combined with pre-coarsening found 44 clusters with
modularity 0.7251 [65]. Of the own algorithms the multi-level refinement ML-KL-density produced 77 clusters with 0.81377. A better clustering with 78 clusters
and modularity 0.81586 was found by ML-KL-rw. However even better clusterings
were discovered in experiments, for example one with 82 clusters and modularity
0.81718.
graph                 |V|     |E|      article  modularity     ML-KL-density    ML-KL-rw
karate                34      78       [2]      0.4197 (4)     0.41978 (4)      0.41978 (4)
dolphins              62      159      [2]      0.529          0.52586 (5)      0.52772 (5)
polBooks              105     441      [2]      0.5272         0.52560 (6)      0.52723 (5)
afootball             115     613      [2]      0.6046         0.59419 (7)      0.60582 (10)
jazz                  198     2742     [23]     0.4452 (5)     0.44487 (4)      0.44514 (4)
celegans metabolic    453     2040     [2]      0.450          0.45090 (9)      0.44992 (8)
circuit s838          512     819      [69]     0.815 (13)     0.81551 (15)     0.81550 (15)
email                 1133    5451     [2]      0.579          0.57767 (11)     0.58137 (9)
hep-th main           5835    13815    [21]     0.707 (114)    0.85523 (57)     0.85629 (59)
Erdos02               6927    11850    [65]     0.6817 (20)    0.71611 (39)     0.71639 (39)
PGPgiantcompo         10680   24316    [58]     0.855          0.88392 (102)    0.88462 (96)
cond-mat-2003 main    27519   116181   [65]     0.7251 (44)    0.81377 (77)     0.81586 (78)
Table 4.11: Comparison to Published Results
4.7.2 Summary
Table 4.11 summarizes the clustering results. For each graph the best modularity found in the literature and the modularity computed with the two multi-level
algorithms ML-KL-density and ML-KL-rw is given. The graphs are sorted by the
number of vertices and the best modularity on each graph is marked in bold.
On smaller graphs optimal or at least very good clusterings were obtained with
the Linear Programming methods from [2] and [13]. In these cases the multi-level
algorithms found comparably good clusterings. Only few published results for big
graphs exist. There the multi-level algorithm ML-KL-rw performed much better
than other methods. Still the clusterings found by experimental configurations show
that on some graphs better results are possible. But these were found more by
fortunate circumstances than by a reliable, systematic search.
5 Results and Future Work
The objective of this work was developing and evaluating a multi-level refinement
algorithm for the modularity graph clustering problem. In this context the main
focus was on the derivation and evaluation of better merge selection criteria for the
graph coarsening, implementing more efficient data structures for coarsening and
refinement, and researching and evaluating applicable refinement algorithms.
The following sections discuss the results of this work. First immediate results,
like the default configuration derived from the evaluation, are presented. The second
section shortly presents by-products developed during the course of this work and
the last section comments on possible directions for future work.
5.1 Results of this Work
Merge selection criteria direct the selection of cluster pairs in the coarsening phase
of the multi-level algorithm. The selected clusters are merged until the clustering
is not further improved. This merge process produces the initial clustering and the
vertex groups for the later refinement. Therefore merge selectors play a crucial role
in the overall algorithm.
In this work a range of semi-global and global merge selectors was presented along
with simple local selectors. The first group of selectors uses the visit probabilities
of short random walks and the second group applies spectral methods to analyze
the global structure of the graph. The implemented variants were experimentally
compared using mean modularities over a large collection of graphs. Of the seven
developed selectors four performed nearly equally well in terms of mean modularity.
These are, in order of best to worst results, the random walk reachability, the
weight density, the modularity increase and the spectral angle. Comparing runtime
and simplicity the weight density proved to produce good results in short time while
being very simple to calculate.
Two coarsening algorithms were implemented. The greedy grouping method works
like the greedy joining of Newman [60] by directly merging the selected pair of clusters in each step. Efficient implementations of this method require a dynamic graph
data structure to quickly retrieve and update selection qualities. The implemented
structure is similar to the algorithm of Wakita [76]. The second method is greedy
matching and resembles the graph coarsening methods traditionally used with multilevel graph algorithms. It constructs a matching of some vertices using only the best
ranked pairs. The experimental comparison showed that greedy matching is less
sensitive to bad merge selectors when combined with refinement than greedy
grouping. However the grouping method proved to be more reliable and produced
on average slightly better results.
For developing effective and efficient refinement algorithms the design space of
simple methods was analyzed. The main focus was on algorithms moving single
vertices between clusters. In the domain of greedy refinement methods this analysis
allowed the development of a more efficient greedy algorithm. It is named sorted greedy
refinement and selects vertices to move using a simple move selector instead of the
expensive-to-calculate modularity increase. Three move selectors based on the vertex
fitness concept were proposed. In addition a hill-climbing refinement algorithm in
the style of Kernighan and Lin [45] was implemented.
All refinement algorithms were able to improve the clustering results considerably
compared to the raw coarsening phase. The sorted greedy refinement produced
equally good results compared to the traditional greedy refinement. At the same
time it was much faster and the runtime scaled better with the graph size. The
move selectors had an insignificant influence on the results. The Kernighan-Lin
refinement was improved in its runtime and it also produced slightly better results
than greedy refinement. However its bad scalability still makes it unusable for large
graphs. For example on graphs with about 25,000 vertices it was around 60 times
slower than the sorted greedy refinement.1
The presented algorithm reflects the state of the art of multi-level refinement
methods. In comparison to other clustering algorithms the multi-level refinement
was competitive in both clustering quality and runtime. However studying clusterings produced with other configurations shows that improvements in reliability are still possible, as sometimes small variations of the parameters lead to better
clusterings.
Summarizing the results, the following default configuration was identified. For the
graph coarsening phase greedy grouping by the weight density merge selector should
be used with reduction factor 10%. For the refinement phase the sorted greedy
refinement with density-fitness should be used. When the computation time is not
a concern, Kernighan-Lin refinement could alternatively be used to find slightly
better clusterings.
5.2 By-Products
As a by-product, the study of the mathematical properties of the modularity measure
presented in Chapter 2 shed light on the relationship between modularity and other
clustering quality measures. In this context the concept of connection density relative to a volume model was introduced and a mathematical connection between
density and modularity was established. The volume model describes expected edge
weights derived from a vertex similarity measure. This enables the quick and easy
adaption of the modularity measure together with the clustering algorithms to specific application domains.
The implementation of the presented clustering algorithm is relatively flexible
and configurable. The algorithm’s components like merge selectors and refinement
algorithms can be replaced without much effort. At the same time the use of index
spaces and maps supports the generic implementation of graph algorithms. This
strategy turned out to be usable and it is possible to transfer the key concepts into
other programming languages.
1 Measured on the graph DIC28 main, cf. Table B.5
Over the course of this project a small website was built up, collecting information about clustering algorithms, benchmark graphs and related topics.2 The site
contains a rich set of data about the graph collection, including various eigenvalue
spectra and other graph statistics. Clustering results are summarized for each graph
and clustering methods can be compared globally. Such a website could be used in
future as a reference for the evaluation of algorithms by providing a common collection of graphs and a tool chain to compare clusterings to older results. For example
in and around the load-balancing community the collection of Walshaw et al. [74]
containing graphs and partitioning results helped to track the current state of the art
and provided inspiration for new combined optimization strategies.
5.3 Directions for Future Work
In this section a few ideas for potential improvements of the clustering algorithm
are proposed. The first three subsections are mostly related to the graph coarsening
phase and the last two discuss high-level meta-strategies.
5.3.1 Pre-Coarsening
Performance and quality might be improved by applying a pre-coarsening to the
initial graph: The vertex degrees of many real-world networks follow a power-law
distribution. In that case most vertices have only a single edge attaching them
to a more central vertex. These vertices are called hairs and certainly will be
grouped into the same cluster as their single neighbor. Particularly in unweighted
graphs maximum modularity clusterings cannot contain clusters of a single one-edge
vertex [13]. Thus a special first coarsening pass should merge them in advance. This
will improve the information available to the merge selector. Additionally the greedy
matching may become more effective and reliable because it is no longer obstructed
by single-edge vertices. Similar preprocessing methods were already proposed by
Arenas et al. [4] for size reduction of graphs.
An implementation would collect the unweighted vertex degrees first. All hair
vertices have unit degree and thus are easy to find. They are merged with their single
neighbor vertex if this increases the modularity. In addition some two-edge vertices
might be reduced: These bridge vertices lie between two neighbors and could be
merged with the better of both neighbors.
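A possible shape of such a pre-coarsening pass is sketched below. Whether merging a hair into its neighbor actually increases the modularity depends on the volume model, so that test is left to a caller-supplied predicate; all names here are illustrative rather than part of an existing implementation.

    #include <cstddef>
    #include <functional>
    #include <vector>

    // For each vertex the resulting group representative (initially the vertex itself).
    std::vector<std::size_t> mergeHairs(
        const std::vector<std::vector<std::size_t>>& adjacency,   // unweighted adjacency lists
        const std::function<bool(std::size_t hair, std::size_t neighbor)>& improvesModularity) {
        const std::size_t n = adjacency.size();
        std::vector<std::size_t> group(n);
        for (std::size_t v = 0; v < n; ++v) group[v] = v;
        for (std::size_t v = 0; v < n; ++v) {
            if (adjacency[v].size() != 1) continue;                // hair vertices have unit degree
            std::size_t neighbor = adjacency[v][0];
            if (neighbor == v) continue;                            // ignore pure self-loops
            if (improvesModularity(v, neighbor))
                group[v] = group[neighbor];                         // merge the hair into its neighbor
        }
        return group;
    }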
Furthermore this pre-coarsening is a strong counter-example to merge selectors
based on vertex-size or consolidation ratios. These try to prefer vertex pairs of
similar size in order to grow clusters more evenly. But merging hairs with their
much bigger neighbor is an expected and necessary behavior in spite of the huge size
difference.
2 presently at http://goya.informatik.tu-cottbus.de/~clustering/
5.3.2 Study of Merge Selectors
During the course of this work a measure for the prediction quality of merge selectors
was developed. It counts the percentage of vertex pairs with a high selection
quality that actually connect two vertices of the same final cluster. For
this purpose a reference clustering is necessary.
However it turned out that this prediction quality alone does not say much about the
general performance of a merge selector. In practice it is also important to study
when and where coarsening errors, i.e. putting together vertices of different clusters,
occur. For example it might be that the spectral angle merge selector performs well
on big, fine-grained graphs but just fails on the coarser coarsening levels. On the fine
levels all information is non-locally stretched while the coarsening joins information
about the structure and may produce good local information later on. In that case
the weight density selector should be used on the coarser levels instead.
5.3.3 Linear Programming
It is possible to formulate the clustering problem as a linear or quadratic program [13, 2]. Instead of classic rounding techniques, the computed distances could be used as merge selection qualities. This would enable multi-level refinement on the rounding results.
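As a rough illustration of the shape of such programs (a hedged sketch, not the exact formulation of [13, 2]): introduce pairwise distance variables $d_{uv}$ that are 0 for vertices in the same cluster and 1 otherwise. Up to constant terms, maximizing modularity then corresponds to

\[
\min \sum_{\{u,v\}} \Big( w_{uv} - \frac{\deg(u)\,\deg(v)}{2m} \Big)\, d_{uv}
\qquad \text{s.t.} \quad d_{uw} \le d_{uv} + d_{vw} \;\; \forall\, u,v,w, \qquad d_{uv} \in \{0,1\},
\]

where $m$ denotes the total edge weight. Relaxing $d_{uv} \in [0,1]$ yields the fractional linear program discussed below, and the resulting fractional distances are exactly the values that could serve as merge selection qualities.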
The fractional linear program is solvable in polynomial time, but it requires |V|² space for the distance matrix and cubic space for the transitivity constraints. Thus it becomes impractical already for medium-sized graphs unless the constraints are replaced by a more compact implicit representation. However, for the presented multi-level refinement, approximate distances between adjacent vertices would suffice. Perhaps such approximations could be computed faster using other representations of the optimization objective. For example, in [6] an embedding into higher-dimensional unit spheres under the squared Euclidean norm was used for similar quality measures.
5.3.4 Multi-Pass Clustering and Randomization
A meta-strategy similar to evolutionary search [72] is multi-pass clustering. Because the refinement corrects coarsening errors, the computed clustering contains valuable information. In a second pass this information can be fed back into the coarsening by ignoring all vertex pairs that cross previous cluster boundaries. This effectively produces a corrected coarsening hierarchy and allows further improvements by the refinement heuristics. Coarsening and refinement are repeated until the clustering no longer improves. The multi-pass search may be widened by applying some kind of randomization during the coarsening.
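A sketch of this multi-pass loop is given below. The two callbacks stand for the existing coarsening-plus-refinement pass (restricted to the previous cluster boundaries when one is supplied) and for the modularity evaluation; both are placeholders rather than the actual implementation.

#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Clustering = std::vector<std::size_t>;  // cluster id per vertex

// Multi-pass clustering: each pass feeds the previous clustering back into the
// coarsening (which then ignores vertex pairs crossing its cluster boundaries)
// and stops once the quality no longer improves.  Both callbacks are
// placeholders for the existing coarsening/refinement and quality evaluation.
Clustering multiPass(
    const std::function<Clustering(const Clustering*)>& coarsenAndRefine,
    const std::function<double(const Clustering&)>& quality) {
    Clustering best = coarsenAndRefine(nullptr);   // first pass: unrestricted
    double bestQ = quality(best);
    for (;;) {
        Clustering next = coarsenAndRefine(&best); // respect previous boundaries
        const double q = quality(next);
        if (q <= bestQ) break;                     // converged: no further improvement
        best = std::move(next);
        bestQ = q;
    }
    return best;
}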
5.3.5 High-Level Refinement Search
In the domain of cluster refinement several improvements might be possible. For example, restarting the Kernighan-Lin search on intermediate clusterings makes it possible to increase the search depth. In this context a slight randomization would help to break ties and to avoid cycling repeatedly through the same movement sequence. But too strong a randomization again prevents the detection of local optima.
Still, the Kernighan-Lin approach is very slow because in modularity clustering the quality improvements of vertex moves are difficult to compute. On the other hand, a very fast greedy refinement method was developed in this work. Similarly, a fast, randomized method to leave local optima could be developed. Combining both would make it possible to walk between local optima as in the basin hopping method [52]. However, some open questions remain as to how this can be effectively combined with the multi-level strategy. For example, currently a lot of information about the graph is lost between coarsening levels because only the last best clustering is projected to the finer level.
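The following sketch outlines how such a combination might look, in the spirit of basin hopping: a hypothetical perturbation operator and the fast greedy refinement are alternated, and a hop is kept only if it improves the quality. A Monte Carlo acceptance criterion as in [52] could replace the strict improvement test; all callbacks are illustrative assumptions.

#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Clustering = std::vector<std::size_t>;  // cluster id per vertex

// Basin-hopping-style search: repeatedly perturb the current local optimum,
// re-optimize with the fast greedy refinement, and keep the result only if it
// improves the quality.  maxFails bounds the number of unsuccessful hops.
// perturb, greedyRefine and quality are hypothetical placeholders.
Clustering basinHopping(Clustering start,
                        const std::function<Clustering(const Clustering&)>& perturb,
                        const std::function<Clustering(const Clustering&)>& greedyRefine,
                        const std::function<double(const Clustering&)>& quality,
                        std::size_t maxFails) {
    Clustering best = greedyRefine(start);
    double bestQ = quality(best);
    std::size_t fails = 0;
    while (fails < maxFails) {
        Clustering candidate = greedyRefine(perturb(best));  // hop to a nearby optimum
        const double q = quality(candidate);
        if (q > bestQ) {                                     // accept only improvements
            best = std::move(candidate);
            bestQ = q;
            fails = 0;
        } else {
            ++fails;
        }
    }
    return best;
}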
Bibliography
[1] A. Abou-Rjeili and G. Karypis. Multilevel algorithms for partitioning power-law graphs. 20th International Parallel and Distributed Processing Symposium,
page 10, 2006.
[2] Gaurav Agarwal and David Kempe. Modularity-maximizing network communities via mathematical programming. arXiv:0710.2533, October 2007.
[3] Charles J. Alpert, Andrew B. Kahng, and So-Zen Yao. Spectral partitioning
with multiple eigenvectors. Discrete Applied Mathematics, 90:3–26, 1999.
[4] A. Arenas, J. Duch, A. Fernandez, and S. Gomez. Size reduction of complex
networks preserving modularity. New Journal of Physics, 9:176, June 2007.
[5] A. Arenas, A. Fernández, and S. Gómez. Analysis of the structure of complex
networks at different resolution levels. New Journal of Physics, 10:053039, 2008.
[6] Sanjeev Arora, Satish Rao, and Umesh Vazirani. Expander flows, geometric
embeddings and graph partitioning. Proceedings of the thirty-sixth annual ACM
symposium on Theory of computing, pages 222–231, 2004.
[7] James P. Bagrow and Erik M. Bollt. Local method for detecting communities.
Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 72:046108–
10, October 2005.
[8] Vladimir Batagelj and Andrej Mrvar. Pajek datasets, 2006.
[9] M. Besch and H.W. Pohl. Dependence-Free Clustering of Shift-Invariant Data
Structures. Proceedings of the Third International Euro-Par Conference on
Parallel Processing, pages 338–341, 1997.
[10] Stefan Boettcher and Allon G. Percus. Extremal optimization for graph partitioning. Physical Review E, 64:026114, July 2001.
[11] Stefan Boettcher and Allon G. Percus. Optimization with extremal dynamics.
Physical Review Letters, 86:5211, June 2001.
[12] M. Boguñá, R. Pastor-Satorras, A. Dı́az-Guilera, and A. Arenas. Models
of social networks based on social distance attachment. Physical Review E,
70(5):56122, 2004.
[13] Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Gorke, Martin Hoefer,
Zoran Nikoloski, and Dorothea Wagner. On modularity clustering. IEEE Trans.
on Knowl. and Data Eng., 20:172–188, 2008.
[14] B.L. Chamberlain. Graph partitioning algorithms for distributing workloads of
parallel computations. University of Washington Technical Report UW-CSE-98-10, 3, 1998.
[15] Aaron Clauset, M. E. J. Newman, and Cristopher Moore. Finding community
structure in very large networks. Physical Review E, 70:066111, December 2004.
[16] Gábor Csárdi and Tamás Nepusz. The igraph software package for complex
network research. InterJournal Complex Systems, 1695, 2006.
[17] L. Danon, A. Dı́az-Guilera, and A. Arenas. Effect of size heterogeneity on
community identification in complex networks. Arxiv preprint physics/0601144,
2006.
[18] Leon Danon, Albert Dı́az-Guilera, Jordi Duch, and Alex Arenas. Comparing
community structure identification. Journal of Statistical Mechanics: Theory
and Experiment, 2005:P09008, 2005.
[19] Hristo Djidjev. A scalable multilevel algorithm for graph clustering and community structure detection. Workshop on Algorithms and Models for the Web
Graph, Lecture Notes in Computer Science, 2006. Report LA-UR-06-6261.
[20] L. Donetti and M. A Munoz. Improved spectral algorithm for the detection of
network communities. physics/0504059, April 2005.
[21] Luca Donetti and Miguel A. Muñoz. Detecting network communities: a new
systematic and efficient algorithm. Journal of Statistical Mechanics: Theory
and Experiment, 2004:P10012, 2004.
[22] Jonathan Doye and David Wales. On the thermodynamics of global optimization. cond-mat/9709019, September 1997. Phys. Rev. Lett. 80, 1357 (1998).
[23] Jordi Duch and Alex Arenas. Community detection in complex networks using extremal optimization. Physical Review E (Statistical, Nonlinear, and Soft
Matter Physics), 72:027104–4, 2005.
[24] C.M. Fiduccia and R.M. Mattheyses. A linear-time heuristic for improving
network partitions. In Design Automation, 1982. 19th Conference on, pages
175–181, 1982.
[25] Santo Fortunato and Marc Barthelemy. Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104(1):36, 2007.
[26] Santo Fortunato, Vito Latora, and Massimo Marchiori. Method to find community structures based on information centrality. Physical Review E (Statistical,
Nonlinear, and Soft Matter Physics), 70:056104–13, November 2004.
[27] H. N. Gabow, Z. Galil, and T.H. Spencer. Efficient Implementation of Graph
Algorithms Using Contraction. Journal of the Association for Computing Machinery, 36(3):540–572, 1989.
[28] M. Gaertler, R. Görke, and D. Wagner. Significance-driven graph clustering.
Proceedings of the 3rd International Conference on Algorithmic Aspects in Information and Management (AAIM’07). Lecture Notes in Computer Science,
June 2007.
[29] J. Gerlach and P. Gottschling. A generic C++ framework for parallel mesh-based scientific applications. Proceedings of the 6th International Workshop on
High-Level Parallel Programming Models and Supportive Environments, pages
45–54, 2001.
[30] Jorge Gil-Mendieta and Samuel Schmidt. The political network in Mexico.
Social Networks, 18:355–381, October 1996.
[31] M. Girvan and J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821, 2002.
[32] M. Girvan and M. E. J. Newman. Community structure in social and biological
networks. Proceedings of the National Academy of Sciences, 99:7821–7826, June
2002.
[33] P. Gleiser and L. Danon. Community structure in jazz. Advances in Complex
Systems, 6(4):565–573, 2003.
[34] Jerry Grossman. The Erdős number project, 2002.
[35] X. Guardiola, R. Guimera, A. Arenas, A. Diaz-Guilera, D. Streib, and L. A. N. Amaral. Macro- and micro-structure of trust networks. Arxiv preprint cond-mat/0206240, 2002.
[36] R. Guimera, L. Danon, A. Diaz-Guilera, F. Giralt, and A. Arenas. Self-similar
community structure in a network of human interactions. Physical Review E,
68:065103, December 2003.
[37] R. Guimera, L. Danon, A. Diaz-Guilera, F. Giralt, and A. Arenas. The real
communication network behind the formal chart: Community structure in organizations. Journal of Economic Behavior & Organization, 61:653–667, December 2006.
[38] Roger Guimera and Luis A. Nunes Amaral. Functional cartography of complex
metabolic networks. q-bio/0502035, February 2005. Nature 433, 895-900 (2005).
[39] Roger Guimerà, Marta Sales-Pardo, and Luı́s A. Nunes Amaral. Modularity
from fluctuations in random graphs and complex networks. Physical Review E,
70:025101, 2004.
[40] David Harel and Yehuda Koren. On clustering using random walks. Lecture
Notes in Computer Science, 2245:18–41, 2001.
[41] Bruce Hendrickson and Robert W. Leland. A multilevel algorithm for partitioning graphs. Proc. Supercomputing, 95:285, 1995.
[42] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. L. Barabasi. The large-scale organization of metabolic networks. cond-mat/0010278, October 2000.
Nature, v407 651-654 (2000).
[43] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359,
1998.
[44] George Karypis and Vipin Kumar. Parallel multilevel k-way partitioning scheme
for irregular graphs. SIAM Review, 41(2):278–300, 1999.
[45] B.W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning
graphs. Bell System Technical Journal, 49(2):291–307, 1970.
[46] J. Kleinberg. An impossibility theorem for clustering. Advances in Neural
Information Processing Systems 15: Proceedings of the 2002 Conference, 2003.
[47] Valdis Krebs. A network of books about recent US politics sold by the online bookseller Amazon.com. http://www.orgnet.com/.
[48] R. Lehoucq, K. Maschhoff, D. Sorenson, and C. Yang. ARPACK: An efficient portable large scale eigenvalue package. http://www.caam.rice.edu/software/arpack/, 1996.
[49] László Lovász. Random walks on graphs: A survey. Combinatorics, Paul Erdos
is Eighty, 2:1–46, 1993.
[50] David Lusseau. Evidence for social role in a dolphin social network. q-bio/0607048, July 2006.
[51] David Lusseau, Karsten Schneider, Oliver J. Boisseau, Patti Haase, Elisabeth
Slooten, and Steve M. Dawson. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral
Ecology and Sociobiology, 54:396–405, 2003.
[52] Claire P. Massen and Jonathan P. K. Doye. Identifying communities within
energy landscapes. Physical Review E (Statistical, Nonlinear, and Soft Matter
Physics), 71:046101–12, April 2005.
[53] Ron Milo, Shalev Itzkovitz, Nadav Kashtan, Reuven Levitt, Shai Shen-Orr,
Inbal Ayzenshtat, Michal Sheffer, and Uri Alon. Superfamilies of evolved and
designed networks. Science, 303:1538–1542, March 2004.
[54] Burkhard Monien, Robert Preis, and Ralf Diekmann. Quality matching
and local improvement for multilevel graph-partitioning. Parallel Computing,
26:1609–1634, November 2000.
[55] M. Müller-Hannemann and A. Schwartz. Implementing weighted b-matching
algorithms: towards a flexible software design. J. Exp. Algorithmics, 4:7, 1999.
[56] Newman. Detecting community structure in networks. The European Physical
Journal B - Condensed Matter and Complex Systems, 38:321–330, March 2004.
[57] J. Newman. Finding community structure in networks using the eigenvectors
of matrices. Physical Review E, 74(3):036104, 2006.
[58] J. Newman. Modularity and community structure in networks. Proceedings of
the National Academy of Sciences, 103(23):8577, 2006.
[59] J. Newman and M. Girvan. Finding and evaluating community structure in
networks. Physical Review E, 69(2):026113, 2004.
[60] M. E. J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 69:066133, June 2004.
[61] M. E. J. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404, 2001.
[62] Andreas Noack. Unified quality measures for clusterings, layouts, and orderings of graphs, and their application as software design criteria. PhD thesis,
Brandenburgische Technische Universität Cottbus, 2007.
[63] P. Pons. Post-processing hierarchical community structures: Quality improvements and multi-scale view. ArXiv Computer Science e-prints, cs.DS/0608050,
2006.
[64] Pascal Pons and Matthieu Latapy. Computing communities in large networks
using random walks (long version). physics/0512106, December 2005.
[65] Josep M. Pujol, Javier Bejar, and Jordi Delgado. Clustering algorithm for determining community structure in large networks. Physical Review E (Statistical,
Nonlinear, and Soft Matter Physics), 74:016107–9, July 2006.
[66] Filippo Radicchi, Claudio Castellano, Federico Cecconi, Vittorio Loreto, and
Domenico Parisi. Defining and identifying communities in networks. Proceedings
of the National Academy of Sciences, 101:2658–2663, March 2004.
[67] Jörg Reichardt and Stefan Bornholdt. Statistical mechanics of community detection. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics),
74(1):016110–14, 2006.
[68] Jörg Reichardt and Stefan Bornholdt. Detecting fuzzy community structures in complex networks with a Potts model. Physical Review Letters, 93:218701,
November 2004.
[69] Jianhua Ruan and Weixiong Zhang. Identifying network communities with a
high resolution. Physical Review E (Statistical, Nonlinear, and Soft Matter
Physics), 77:016104–12, 2008.
[70] J. Siek, L.Q. Lee, and A. Lumsdaine. The Boost Graph Library: User Guide
and Reference Manual. Addison-Wesley, 2002.
[71] A. J. Soper and C. Walshaw. A generational scheme for partitioning graphs.
In L. Spector et al., editor, Proc. Genetic & Evolutionary Comput. Conf.
(GECCO-2001), pages 607–614. Morgan Kaufmann, San Francisco, 2001.
[72] A. J. Soper, C. Walshaw, and M. Cross. A combined evolutionary search and
multilevel approach to graph partitioning. In D. Whitley et al., editor, Proc.
Genetic & Evolutionary Comput. Conf. (GECCO-2000), pages 674–681. Morgan Kaufmann, San Francisco, 2000.
[73] A. J. Soper, C. Walshaw, and M. Cross. A combined evolutionary search and
multilevel optimisation approach to graph partitioning. Tech. Rep. 00/IM/58,
Comp. Math. Sci., Univ. Greenwich, London SE10 9LS, UK, April 2000.
[74] A. J. Soper, C. Walshaw, and M. Cross. A combined evolutionary search and
multilevel optimisation approach to graph partitioning. J. Global Optimization,
29(2):225–241, 2004.
[75] Stijn van Dongen. Graph Clustering by Flow Simulation. PhD thesis, University
of Utrecht, May 2000.
[76] Ken Wakita and Toshiyuki Tsurumi. Finding community structure in mega-scale social networks. cs/0702048, February 2007.
[77] David Wales and Jonathan Doye. Global optimization by basin-hopping and the
lowest energy structures of lennard-jones clusters containing up to 110 atoms.
cond-mat/9803344, March 1998. J. Phys. Chem. A 101, 5111-5116 (1997).
[78] C. Walshaw and M. Cross. Mesh partitioning: a multilevel balancing and
refinement algorithm. SIAM J. Sci. Comput., 22(1):63–80, 2000.
[79] J.H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal
of the American Statistical Association, 58(301):236–244, 1963.
[80] Scott White and Padhraic Smyth. A spectral clustering approach to finding communities in graphs. SIAM Data Mining Conference, 2005.
[81] Wu and Huberman. Finding communities in linear time: a physics approach.
The European Physical Journal B - Condensed Matter and Complex Systems,
38:331–338, March 2004.
[82] W. W. Zachary. An information flow model for conflict and fission in small
groups. Journal of Anthropological Research, 33:452–473, 1977.
[83] Zhou and Lipowsky. Network brownian motion: A new method to measure vertex-vertex proximity and to identify communities and subcommunities.
Computational Science - ICCS 2004, 2004.
A The Benchmark Graph Collection
Table A.1 lists the benchmark graphs used. The graphs are sorted by their number of vertices. For each graph the number of vertices and edges, the weighted mean vertex degree (mean wdeg), and the global connection density in the degree volume model (wgt. density) are given. Graphs with the suffix main are just the largest connected component of the original graph. The second column marks the graphs used in the reduced set (R), in the scalability analysis (S), and in the comparison to reference algorithms (C). The last column gives a short source reference. Links to the corresponding websites can be found in Table A.2.
Table A.1: The Benchmark Graph Collection
Chain8
Star9
K55
Student government
Tree15
ModMath main
SouthernWomen
Hi-tech main
karate
football
mexican power
Grid66
morse
Sawmill
Food
dolphins
WorldImport1999
prison alon
lesmis
world trade
polBooks
adjnoun
afootball
baywet
jazz
A99m main
SmallW main
A01 main
sandi main
celegansneural
USAir97
8Clusters
WorldCities main
celegans metabolic
circuit s838
Roget main
CSphd main
email
polBlogs main
NDyeast main
Yeast main
subset
SC
C
SC
C
C
RC
S
C
C
C
C
RC
R
S
RC
RC
R
S
S
S
RC
RS
RC
R
RC
S
SC
R
S
R
vertices
8
9
10
11
15
30
32
33
34
35
35
36
36
36
45
62
66
67
77
80
105
112
115
128
198
233
233
249
253
297
332
400
413
453
512
994
1025
1133
1222
1458
2224
edges
7
8
25
32
14
61
89
91
78
118
117
60
666
62
990
159
2145
142
254
875
441
425
613
2075
2742
325
994
635
307
2148
2126
24063
7518
2040
819
3641
1043
5451
16717
1993
7049
mean wdeg
1.75000
1.77778
5.00000
7.45455
1.86667
4.06667
5.56250
8.90909
4.58824
16.85714
6.68571
3.33333
1324.38889
3.44444
507.82222
5.12903
132361.51897
5.43284
21.29870
1644039.85000
8.40000
7.58929
10.71304
54.05277
55.39394
2.79828
17.06438
5.15663
2.42688
59.37374
12.80723
238.63000
81.80145
20.24283
3.19922
10.17807
2.03512
19.24448
31.23977
2.70302
6.14119
wgt. density
0.071429
0.06250
0.02000
0.012195
0.035714
0.0081967
0.005618
0.0034014
0.0064103
0.0016949
0.0042735
0.0083333
2.2390e-05
0.0080645
4.376e-05
0.0031447
1.1447e-07
0.0027473
0.00060976
7.6032e-09
0.0011338
0.0011765
0.00081169
0.00014453
9.1174e-05
0.0015337
0.00025151
0.00077882
0.0016287
5.6709e-05
0.00023518
1.0476e-05
2.96e-05
0.00010931
0.0006105
9.8853e-05
0.00047939
4.5863e-05
2.6197e-05
0.00025664
7.5576e-05
source
ANoack-generated
ANoack-generated
RRotta
pajek-social
ANoack-generated
pajek-social
pajek-social
pajek-social
Newman
pajek
pajek
ANoack-generated
anoack
pajek-social
ANoack
pajek-social
ANoack
UriAlon
Newman
pajek-econ
pajek-mixed
Newman-ling
Newman
pajek-bio
Arenas
pajek-social
pajek-citation
pajek-citation
pajek-citation
Newman-bio
pajek
ANoack-generated
pajek-mixed
Arenas-bio
UriAlon
pajek-ling
pajek-social
Arenas
pajek-mixed
NotreDame-bio
pajek-bio
SciMet main
ODLIS main
DutchElite main
geom main
Epa main
eva main
USpowerGrid
hep-th main
Erdos02
Lederberg main
PairsP
PGPgiantcompo
astro-ph main
eatRS
DIC28 main
hep-th-new main
cond-mat-2003 main
subset
S
S
RS
R
S
SC
RS
RS
S
RS
S
RSC
S
S
vertices
2678
2898
3621
3621
4253
4475
4941
5835
6927
8212
10617
10680
14845
23219
24831
27400
27519
edges
10369
16381
4310
9461
8897
4654
6594
13815
11850
41436
63786
24316
119652
305501
71014
352059
116181
mean wdeg
7.75541
12.70842
2.38111
10.91964
4.21020
2.08402
5.33819
4.68711
3.42139
10.10801
115.39098
4.55805
4.49610
67.90641
5.71979
25.73161
4.41826
wgt. density
4.8151e-05
2.7156e-05
0.00011598
2.5291e-05
5.5847e-05
0.00010725
3.7913e-05
3.6564e-05
4.2194e-05
1.2048e-05
8.1627e-07
2.0542e-05
1.4982e-05
6.3464e-07
7.0409e-06
1.4184e-06
8.2246e-06
source
pajek-citation
pajek-ling
pajek-social
pajek-citation
pajek-web
pajek-econ
pajek-mixed
Newman-coauth
pajek-social
pajek-citation
pajek-ling
Arenas-social
Newman-coauth
pajek-ling
pajek-ling
pajek-citation
Newman-coauth
Table A.2: References to the Graph Sources

ANoack-generated    http://www-sst.informatik.tu-cottbus.de/~an/GD/
pajek-citation      http://vlado.fmf.uni-lj.si/pub/networks/data/
pajek-social        http://vlado.fmf.uni-lj.si/pub/networks/data/
Newman-ling         http://www-personal.umich.edu/~mejn/netdata/
Newman              http://www-personal.umich.edu/~mejn/netdata/
Newman-coauth       http://www-personal.umich.edu/~mejn/netdata/
pajek-bio           http://vlado.fmf.uni-lj.si/pub/networks/data/bio/foodweb/foodweb.htm
Arenas-bio          http://deim.urv.cat/~aarenas/data/welcome.htm
Newman-bio          http://www-personal.umich.edu/~mejn/netdata/
UriAlon             http://www.weizmann.ac.il/mcb/UriAlon/
pajek-ling          http://vlado.fmf.uni-lj.si/pub/networks/data/
Arenas              http://deim.urv.cat/~aarenas/data/welcome.htm
pajek-web           http://vlado.fmf.uni-lj.si/pub/networks/data/mix/mixed.htm
pajek-econ          http://vlado.fmf.uni-lj.si/pub/networks/data/econ/Eva/Eva.htm
ANoack              http://www-sst.informatik.tu-cottbus.de/~an/GD/
pajek               http://vlado.fmf.uni-lj.si/pub/networks/data/sport/football.htm
RRotta              http://goya.informatik.tu-cottbus.de/~clustering/
anoack              http://www.informatik.tu-cottbus.de/~an/GD/
NotreDame-bio       http://vlado.fmf.uni-lj.si/pub/networks/data/ND/NDnets.htm
Arenas-social       http://deim.urv.cat/~aarenas/data/welcome.htm
pajek-mixed         http://vlado.fmf.uni-lj.si/pub/networks/data/mix/mixed.htm
B Clustering Results
Table B.1: Random Walk Distance by Graph

                     1        2        3        4        5        WD-sgrd
SouthernWomen        0.33449  0.31530  0.31530  0.33184  0.33449  0.31972
dolphins             0.46220  0.52680  0.46642  0.46642  0.46642  0.52587
prison alon          0.56098  0.61317  0.61739  0.61667  0.60873  0.62077
polBooks             0.52694  0.52694  0.52694  0.52694  0.52694  0.52724
adjnoun              0.24411  0.27143  0.30637  0.28173  0.26923  0.31078
jazz                 0.44403  0.44403  0.44424  0.44403  0.44447  0.44468
sandi main           0.70997  0.80704  0.80215  0.81677  0.80646  0.82773
celegansneural       0.48148  0.47932  0.47936  0.48499  0.50224  0.50295
USAir97              0.35050  0.35093  0.35246  0.35804  0.36082  0.36824
celegans metabolic   0.42627  0.43558  0.42427  0.42970  0.43405  0.45070
circuit s838         0.67174  0.81058  0.76911  0.77549  0.80461  0.81551
polBlogs main        0.43162  0.43219  0.43219  0.43219  0.43219  0.43237
Yeast main           0.54016  0.57924  0.58575  0.59310  0.58996  0.62179
Epa main             0.58271  0.63287  0.63498  0.63672  0.64941  0.66843
USpowerGrid          0.84679  0.90077  0.86988  0.90234  0.87756  0.93795
Lederberg main       0.61540  0.64291  0.66511  0.66392  0.68129  0.70339
PairsP               0.50587  0.58476  0.58308  0.58951  0.58975  0.65067
astro-ph main        0.65154  0.71070  0.70995  0.71056  0.71071  0.76261
DIC28 main           0.72153  0.77391  0.77093  0.77864  0.78366  0.84747
Table B.2: Random Walk Reachability, Length 2

                     RWreach-sgrd 1   RWreach-sgrd 3   RWreach-sgrd 5   WD-sgrd
SouthernWomen        0.31972          0.31972          0.33449          0.31972
dolphins             0.52852          0.52852          0.52852          0.52587
prison alon          0.62077          0.61464          0.61288          0.62077
polBooks             0.52724          0.52724          0.52724          0.52724
adjnoun              0.30194          0.30359          0.30750          0.31078
jazz                 0.44487          0.44452          0.44487          0.44468
sandi main           0.82760          0.82779          0.82779          0.82773
celegansneural       0.50295          0.50382          0.50353          0.50295
USAir97              0.36824          0.36603          0.36824          0.36824
celegans metabolic   0.44966          0.44989          0.44182          0.45070
circuit s838         0.80524          0.81521          0.81551          0.81551
polBlogs main        0.43237          0.43228          0.43228          0.43237
Yeast main           0.62207          0.62328          0.62593          0.62179
Epa main             0.66783          0.66795          0.66568          0.66843
USpowerGrid          0.93717          0.93882          0.93919          0.93795
Lederberg main       0.70267          0.70323          0.70402          0.70339
PairsP               0.65476          0.65361          0.65386          0.65067
astro-ph main        0.76248          0.76329          0.76409          0.76261
DIC28 main           0.85097          0.85162          0.85158          0.84747
Table B.3: Random Walk Reachability, Length 3

                     RWreach-sgrd 1   RWreach-sgrd 3   RWreach-sgrd 5   WD-sgrd
SouthernWomen        0.31972          0.32635          0.33184          0.31972
dolphins             0.52852          0.52680          0.52680          0.52587
prison alon          0.61288          0.61464          0.61898          0.62077
polBooks             0.52724          0.52724          0.52694          0.52724
adjnoun              0.30122          0.29690          0.27580          0.31078
jazz                 0.44468          0.44487          0.44424          0.44468
sandi main           0.82779          0.82779          0.82779          0.82773
celegansneural       0.50295          0.49997          0.49783          0.50295
USAir97              0.36824          0.36603          0.36094          0.36824
celegans metabolic   0.45105          0.44075          0.43418          0.45070
circuit s838         0.81377          0.81520          0.81447          0.81551
polBlogs main        0.43242          0.43233          0.43225          0.43237
Yeast main           0.62524          0.62475          0.62710          0.62179
Epa main             0.67258          0.66818          0.66667          0.66843
USpowerGrid          0.93768          0.93921          0.93937          0.93795
Lederberg main       0.70331          0.70382          0.70362          0.70339
PairsP               0.65233          0.65434          0.65346          0.65067
astro-ph main        0.76328          0.76365          0.76440          0.76261
DIC28 main           0.85106          0.85285          0.85152          0.84747
Table B.4: Clustering Results from the Refinement Phase
Chain8
Star9
K55
Student government
Tree15
ModMath main
SouthernWomen
Hi-tech main
karate
football
mexican power
Grid66
morse
Sawmill
Food
dolphins
WorldImport1999
prison alon
lesmis
world trade
polBooks
adjnoun
afootball
baywet
jazz
A99m main
SmallW main
A01 main
sandi main
celegansneural
USAir97
8Clusters
WorldCities main
celegans metabolic
circuit s838
Roget main
CSphd main
email
polBlogs main
NDyeast main
Yeast main
SciMet main
none
0.3571429 (2)
-0.0078125 (2)
0.3000000 (2)
0.1299822 (3)
0.5051020 (5)
0.4201827 (4)
0.3153011 (2)
0.2932112 (4)
0.4086949 (4)
0.3560299 (4)
0.3281102 (4)
0.5131944 (4)
0.2238226 (5)
0.5500780 (4)
0.3989169 (4)
0.5178593 (4)
0.2746734 (3)
0.6158375 (9)
0.5662983 (6)
0.3416073 (3)
0.5036225 (4)
0.2842824 (7)
0.5768034 (7)
0.3616026 (4)
0.3915476 (4)
0.6994806 (12)
0.4444636 (3)
0.5760183 (10)
0.8208257 (15)
0.4740477 (5)
0.3435344 (6)
0.2844054 (8)
0.1162580 (5)
0.4006559 (9)
0.7904052 (15)
0.5371181 (14)
0.9248402 (33)
0.5293695 (11)
0.4144859 (3)
0.8135195 (34)
0.5718771 (23)
0.5485112 (14)
SGR-density
0.3571429 (2)
0.0000000 (1)
0.3000000 (2)
0.1817371 (2)
0.5051020 (5)
0.4290513 (4)
0.3197197 (3)
0.3164654 (3)
0.4197896 (4)
0.3602930 (5)
0.3477610 (5)
0.5500000 (4)
0.2260019 (5)
0.5500780 (4)
0.4022119 (4)
0.5258692 (5)
0.2750381 (3)
0.6207735 (9)
0.5666880 (6)
0.3458804 (3)
0.5272366 (5)
0.3107820 (6)
0.5878718 (7)
0.3630110 (4)
0.4446760 (4)
0.7093229 (12)
0.4489310 (3)
0.6163954 (11)
0.8277276 (15)
0.5029479 (5)
0.3682440 (6)
0.2940509 (8)
0.1772768 (4)
0.4507027 (9)
0.8155133 (15)
0.5800064 (14)
0.9255843 (33)
0.5771250 (12)
0.4323714 (13)
0.8209006 (32)
0.6217924 (23)
0.6206436 (18)
CGR
0.3571429 (2)
0.0000000 (1)
0.3000000 (2)
0.1817371 (2)
0.5051020 (5)
0.4290513 (4)
0.3197197 (3)
0.3164422 (4)
0.4197896 (4)
0.3602930 (5)
0.3477610 (5)
0.5500000 (4)
0.2260019 (5)
0.5500780 (4)
0.4022119 (4)
0.5258692 (5)
0.2750381 (3)
0.6207735 (9)
0.5666880 (6)
0.3458804 (3)
0.5272366 (5)
0.3107820 (6)
0.5878718 (7)
0.3630111 (4)
0.4446760 (4)
0.7093229 (12)
0.4489310 (3)
0.6195992 (11)
0.8277276 (15)
0.5029479 (5)
0.3682440 (6)
0.2940509 (8)
0.1773516 (4)
0.4507027 (9)
0.8155118 (15)
0.5801273 (14)
0.9255843 (33)
0.5771250 (12)
0.4323714 (13)
0.8211389 (33)
0.6202976 (22)
0.6210574 (18)
KL
0.3775510 (3)
0.0000000 (1)
0.3000000 (2)
0.1817371 (2)
0.5127551 (4)
0.4488041 (4)
0.3360056 (3)
0.3167893 (3)
0.4197896 (4)
0.3603160 (5)
0.3595222 (4)
0.5412500 (5)
0.2276623 (4)
0.5500780 (4)
0.4022119 (4)
0.5258692 (5)
0.2750381 (3)
0.6207735 (9)
0.5666880 (6)
0.3458804 (3)
0.5256066 (6)
0.3107820 (6)
0.5941993 (7)
0.3630111 (4)
0.4448713 (4)
0.7139006 (11)
0.4494608 (3)
0.6321622 (12)
0.8277276 (15)
0.5029479 (5)
0.3682440 (6)
0.2940509 (8)
0.1789302 (4)
0.4509054 (9)
0.8155133 (15)
0.5868690 (16)
0.9255843 (33)
0.5776780 (11)
0.4324173 (13)
0.8225031 (31)
0.6232274 (22)
0.6224638 (16)
ODLIS main
DutchElite main
geom main
Epa main
eva main
USpowerGrid
hep-th main
Erdos02
Lederberg main
PairsP
PGPgiantcompo
astro-ph main
eatRS
DIC28 main
hep-th-new main
cond-mat-2003 main
average
none
0.4607628 (12)
0.8389162 (49)
0.7250140 (36)
0.6270148 (28)
0.9333784 (53)
0.9328997 (41)
0.8425086 (58)
0.6848450 (31)
0.6508633 (20)
0.6145969 (35)
0.8631499 (66)
0.7300281 (49)
0.4466266 (26)
0.8015399 (58)
0.5798309 (18)
0.7891245 (70)
0.52305183903
SGR-density
0.5134514 (12)
0.8493005 (48)
0.7470261 (43)
0.6684264 (27)
0.9357163 (52)
0.9379491 (39)
0.8551498 (59)
0.7159228 (38)
0.7033853 (22)
0.6506679 (35)
0.8832731 (94)
0.7623166 (58)
0.4999899 (25)
0.8474654 (105)
0.6663347 (31)
0.8135988 (76)
0.54608087341
CGR
0.5128609 (12)
0.8482894 (48)
0.7470261 (43)
0.6685486 (26)
0.9357164 (52)
0.9382380 (39)
0.8548411 (57)
0.7159229 (38)
0.7033853 (22)
0.6510081 (33)
0.8832732 (94)
0.7621238 (58)
0.4998523 (26)
0.8475762 (102)
0.6663266 (31)
0.8134295 (75)
0.54609771377
KL
0.5137963 (12)
0.8492394 (47)
0.7472464 (42)
0.6709372 (27)
0.9360090 (50)
0.9381809 (39)
0.8552368 (57)
0.7161150 (39)
0.7033671 (22)
0.6520954 (31)
0.8839220 (102)
0.7631902 (58)
0.5005763 (22)
0.8478136 (104)
0.6667373 (30)
0.8137741 (77)
0.54810369879
Table B.5: Runtime Measurements. All times are in seconds.
Chain8
K55
Hi-tech main
lesmis
A99m main
SmallW main
A01 main
celegansneural
Roget main
CSphd main
NDyeast main
SciMet main
ODLIS main
Epa main
hep-th main
Erdos02
Lederberg main
PairsP
PGPgiantcompo
astro-ph main
eatRS
DIC28 main
hep-th-new main
cond-mat-2003 main
vertices
8
10
33
77
233
233
249
297
994
1025
1458
2678
2898
4253
5835
6927
8212
10617
10680
14845
23219
24831
27400
27519
edges
7
25
91
254
325
994
635
2148
3641
1043
1993
10369
16381
8897
13815
11850
41436
63786
24316
119652
305501
71014
352059
116181
none
0.28800
0.42300
0.90900
1.16100
1.39100
1.83700
1.55100
2.02100
2.75000
1.66000
2.03800
4.75600
6.35300
4.54300
5.27600
6.04900
14.61700
22.13300
9.00400
29.20600
192.21500
24.79100
143.37100
33.39300
SGR-density
0.32700
0.40100
0.93300
1.20900
1.52100
1.96800
1.76400
2.21100
3.20600
2.04300
2.80200
6.44700
8.02400
7.70500
11.26100
11.31000
20.20700
33.34600
22.69000
52.03700
236.66900
76.99900
186.87100
76.34600
CGR
0.29000
0.37800
0.87800
1.15600
1.43500
1.89600
1.71000
2.18600
3.85200
1.96100
3.32300
14.37800
13.80900
30.41500
27.64100
34.10100
83.13300
127.07900
55.56700
365.39200
2115.29200
816.51200
1608.20900
1019.13700
KL
0.35100
0.43800
1.12900
1.28100
2.49300
3.08400
3.35600
5.16600
24.25000
22.72700
37.48900
84.12400
79.36200
201.82000
423.34600
359.59600
392.03800
769.91300
1255.66100
1917.59600
4003.69600
4755.91700
4553.72100
4158.93400
Chain8
Star9
K55
Tree15
ModMath main
SouthernWomen
karate
mexican power
Grid66
Sawmill
dolphins
polBooks
adjnoun
sandi main
USAir97
circuit s838
CSphd main
Erdos02
DIC28 main
average
walktrap
1e-05
0.00100
1e-05
1e-05
0.00100
0.00100
0.00100
0.00200
0.00100
0.00100
0.00200
0.00300
0.00500
0.00500
0.03400
0.01500
0.04300
4.66500
28.73900
0.0029568
leadingev
0.00100
0.05700
0.00100
0.00700
0.02200
0.02900
0.03300
0.03000
0.04400
0.03500
0.06000
0.13100
0.22600
1.13800
1.10400
1.84800
2.51300
54.29700
810.58300
0.14823
wakita HE
0.12300
0.12400
0.17300
0.17400
0.17300
0.17300
0.17400
0.17300
0.17400
0.17400
0.17400
0.17500
0.17400
0.22300
0.22300
0.22300
0.22400
2.22800
2.56000
0.23281
wakita HN
0.12400
0.12300
0.12300
0.12300
0.17400
0.17300
0.17300
0.17300
0.17300
0.17300
0.17300
0.17400
0.17500
0.17400
0.22500
0.22400
0.22400
2.18700
1.54400
0.21555
fgj
0.00600
0.00500
0.00500
0.00600
0.00700
0.00700
0.00700
0.00700
0.00700
0.00700
0.00800
0.01300
0.01700
0.01900
0.04200
0.05100
0.08900
3.52000
27.60200
0.02285
ML-none
0.25300
0.29000
0.32800
0.40500
0.78900
0.90400
0.82900
0.88300
0.86500
0.86600
1.11300
1.34800
1.25500
1.22000
2.04600
1.61200
1.59400
5.71400
24.19300
1.10230
spinglass
1.85300
3.02000
1.26400
5.02400
6.92700
7.43200
7.73500
6.25000
5.37300
6.41200
11.57400
18.45100
30.18600
48.13700
116.79000
137.68800
219.04600
1959.89200
7213.12900
22.44932
ML-sgrd
0.27100
0.30900
0.34900
0.43400
0.84800
0.97100
0.89600
0.94300
0.93300
0.93100
1.19900
1.46500
1.36300
1.34900
2.23900
1.84300
1.99000
10.89800
75.86800
1.30958
ML-KL
0.31200
0.36000
0.38000
0.50100
1.00800
1.16200
1.06900
1.12900
1.10300
1.09600
0.93000
1.39700
1.40800
2.37400
5.03300
5.88600
22.37300
351.05700
4581.66900
2.71063
Table B.6: Runtime of the Reference Algorithms. All time values are printed in
seconds.