Springer Series in the Data Sciences
Cohesive Subgraph Computation over Large Sparse Graphs
Algorithms, Data Structures, and Programming Techniques
Springer Series in the Data Sciences
Series Editors:
David Banks, Duke University, Durham
Jianqing Fan, Princeton University, Princeton
Michael Jordan, University of California, Berkeley
Ravi Kannan, Microsoft Research Labs, Bangalore
Yurii Nesterov, Universite Catholique de Louvain, Louvain-la-Neuve
Christopher Ré, Stanford University, Stanford
Ryan Tibshirani, Carnegie Mellon University, Pittsburgh
Larry Wasserman, Carnegie Mellon University, Pittsburgh
Springer Series in the Data Sciences focuses primarily on monographs and graduate
level textbooks. The target audience includes students and researchers working in
and across the fields of mathematics, theoretical computer science, and statistics.
Data Analysis and Interpretation is a broad field encompassing some of the
fastest-growing subjects in interdisciplinary statistics, mathematics and computer
science. It encompasses a process of inspecting, cleaning, transforming, and mod-
eling data with the goal of discovering useful information, suggesting conclusions,
and supporting decision making. Data analysis has multiple facets and approaches,
including diverse techniques under a variety of names, in different business, science,
and social science domains. Springer Series in the Data Sciences addresses the needs
of a broad spectrum of scientists and students who are utilizing quantitative methods
in their daily research.
The series is broad but structured, including topics within all core areas of the
data sciences. The breadth of the series reflects the variation of scholarly projects
currently underway in the field of machine learning.
Cohesive Subgraph Computation over Large Sparse Graphs
Algorithms, Data Structures, and Programming Techniques
Springer
Lijun Chang
School of Computer Science
The University of Sydney
Sydney, NSW, Australia

Lu Qin
Centre for Artificial Intelligence
University of Technology Sydney
Sydney, NSW, Australia
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my wife, Xi
my parents, Qiyuan and Yumei
Lijun Chang
To my wife, Michelle
my parents, Hanmin and Yaping
Lu Qin
Preface
The graph model has been widely used to represent the relationships among entities in
a wide spectrum of applications such as social networks, communication networks,
collaboration networks, information networks, and biological networks. As a result,
we are nowadays facing a tremendous number of large real-world graphs. For example, SNAP [49] and Network Repository [71] are two representative graph repositories hosting thousands of real graphs. The availability of rich graph data not only brings great opportunities for realizing the big value of data in key applications but also brings great challenges in computation.
The main purpose of this book is to survey the recent technical developments
on efficiently processing large sparse graphs, in view of the fact that real graphs
are usually sparse graphs. Algorithms designed for large sparse graphs should be
analyzed with respect to the number of edges in a graph, and ideally should run in time linear or near-linear in the number of edges. In this book, we illustrate
the general techniques and principles, toward efficiently processing large sparse
graphs with millions of vertices and billions of edges, through the problems of co-
hesive subgraph computation. Although real graphs are sparsely connected from a
global point of view, they usually contain subgraphs that are locally densely con-
nected [11]. Computing cohesive/dense subgraphs can either be the main goal of a
graph analysis task or act as a preprocessing step aiming to reduce/trim the graph
by removing sparse/unimportant parts such that more complex and time-consuming
analysis can be conducted. In the literature, the cohesiveness of a subgraph is usu-
ally measured by the minimum degree, the average degree, or their higher-order
variants, or edge connectivity. Cohesive subgraph computation based on different
cohesiveness measures extracts cohesive subgraphs with different properties and
also requires different levels of computational efforts.
The book can be used as an extended survey for people who are interested in cohesive subgraph computation, as a reference book for a postgraduate course on related topics, or as a guideline for writing effective C/C++
programs to efficiently process real graphs with billions of edges. In this book, we
will introduce algorithms, in the form of pseudocode, analyze their time and space
complexities, and also discuss their implementations. C/C++ codes for all the data
structures and some of the presented algorithms are available at the author’s GitHub
website.
where α (G) is the arboricity of a graph G and satisfies α (G) ≤ m [20]. Then, we
discuss how to extend the algorithms presented in Chapters 3 and 4 for higher-order
core decomposition (specifically, truss decomposition and nucleus decomposition)
and higher-order densest subgraph computation (specifically, k-clique densest sub-
graph computation), respectively.
In Chapter 6, we discuss edge connectivity-based graph decomposition. Firstly,
given an integer k, we study the problem of computing all maximal k-edge con-
nected subgraphs in a given input graph. We present a graph partition-based approach that runs in O(h × l × m) time, where h and l are usually bounded
by small constants for real-world graphs. Then, we present a divide-and-conquer
approach, which invokes the graph partition-based approach as a building block,
for computing the maximal k-edge connected subgraphs for all different k values in
O((log α (G)) × h × l × m) time.
1 Introduction
   1.1 Background
      1.1.1 Graph Terminologies
      1.1.2 Real Graph Datasets
      1.1.3 Representation of Large Sparse Graphs
      1.1.4 Complexity Analysis
   1.2 Cohesive Subgraphs
      1.2.1 Cohesive Subgraph Computation
      1.2.2 Applications
References
Index
Chapter 1
Introduction
With the rapid development of information technology such as social media, on-
line communities, and mobile communications, huge volumes of digital data are
accumulated with data entities involving complex relationships. These data are usually modeled as graphs, in view of the simple yet strong expressive power of the graph model; that is, entities are represented by vertices and relationships are represented
by edges. Managing and extracting knowledge and insights from large graphs are
highly demanded by many key applications [93], including public health, science,
engineering, business, environment, and more. An availability of rich graph data not
only brings great opportunities for realizing big values of data to serve key appli-
cations but also brings great challenges in computation. This book surveys recent
technical developments on efficiently processing large sparse graphs, where real
graphs are usually sparse graphs.
In this chapter, we first present in Section 1.1 the background information, including graph terminologies, some example real graphs that serve the purpose of
illustrating properties of real graphs as well as the purpose of empirically evaluating
algorithms, and space-effective representation of large sparse graphs in main mem-
ory. Then, in Section 1.2 we briefly introduce the problem of cohesive subgraph
computation and also discuss its applications.
1.1 Background
1.1.1 Graph Terminologies

In this book, we focus on unweighted and undirected graphs and consider only
the interconnection structure (i.e., edges) among vertices of a graph, while ignoring
possible attributes of vertices and edges. That is, we consider the simplest form of a
graph that consists of a set of vertices and a set of edges.
We denote a graph by g or G. For a graph g, we let V (g) and E(g) denote the
set of vertices and the set of edges of g, respectively, and we also represent g by
(V (g), E(g)). We denote the edge between u and v by (u, v), the set of neighbors of
a vertex u in g by:
Ng (u) = {v ∈ V (g) | (u, v) ∈ E(g)},
and the degree of u in g by:
dg (u) = |Ng (u)|.
We denote the minimum vertex degree, the average vertex degree, and the maximum
vertex degree of g by dmin (g), davg (g), and dmax (g), respectively. Given a subset Vs
of vertices of g (i.e., Vs ⊆ V (g)), we use g[Vs ] to denote the subgraph of g induced
by Vs ; that is:
g[Vs ] = (Vs , {(u, v) ∈ E(g) | u ∈ Vs , v ∈ Vs }).
Given a subset of edges of g, Es ⊆ E(g), we use g[Es ] to denote the subgraph of g
induced by Es ; that is:
g[Es] = ( ⋃_{(u,v)∈Es} {u, v}, Es ).
Fig. 1.1: An example unweighted undirected graph
Example 1.1. Figure 1.1 shows an example graph G consisting of 11 vertices and
13 undirected edges; that is, n = 11 and m = 13. The set of neighbors of v1 is
N(v1 ) = {v2 , v3 , v4 , v5 }, and the degree of v1 is d(v1 ) = |N(v1 )| = 4. The vertex-
induced subgraph G[{v1 , v2 , v3 , v4 }] is a clique consisting of 4 vertices and 6 undi-
rected edges.
1.1.2 Real Graph Datasets

In this book, we focus on techniques for efficiently processing real graphs that
are obtained from real-life applications. In the following, we introduce several real
graph data repositories as well as present some example real graphs that serve the
purpose of illustrating properties of real graphs and the purpose of empirically eval-
uating algorithms in the remainder of the book.
Real Graph Data Repositories. Several real graph data repositories have been ac-
tively maintained by different research groups, which in total cover thousands of
real graphs. A few example repositories are as follows:
• Stanford Network Analysis Project (SNAP) [49] maintains a collection of more
than 50 large network datasets from tens of thousands of vertices and edges to
tens of millions of vertices and billions of edges. It includes social networks,
web graphs, road networks, internet networks, citation networks, collaboration
networks, and communication networks.
• Laboratory for Web Algorithmics (LAW) [12] hosts a set of large networks with up to 1 billion vertices and tens of billions of edges. The networks of LAW are mainly web graphs and social networks.
• Network Repository [71] is a large network repository archiving thousands of graphs with up to billions of vertices and tens of billions of edges.
• The Koblenz Network Collection (KONECT)1 contains several hundred network datasets with up to tens of millions of vertices and billions of edges. The net-
works of KONECT cover many diverse areas such as social networks, hyper-
link networks, authorship networks, physical networks, interaction networks, and
communication networks.
Five Real-World Graphs. We choose five real-world graphs from different domains to show the characteristics of real-world graphs; these graphs will also be used
to demonstrate the performance of algorithms and data structures in the remain-
der of the book. The graphs are as-Skitter, soc-LiveJournal1, twitter-2010,
uk-2005, and it-2004; the first two are downloaded from SNAP, while the remain-
ing three are downloaded from LAW. as-Skitter is an internet topology graph.
soc-LiveJournal1 is an online community, where members maintain journals and
make friends. twitter-2010 is a social network. uk-2005 and it-2004 are two
web graphs crawled within the .uk and .it domains, respectively.
For each graph, we make its edges undirected, remove duplicate edges, and then
choose the largest connected component (i.e., giant component) as the correspond-
ing graph. Statistics of the five graphs are given in Table 1.1, where the last column
shows the degeneracy (see Chapter 3) of G. We can see that davg(G) ≪ n holds for all these graphs; that is, real-world graphs are usually sparse graphs.
1 http://konect.uni-koblenz.de/.
[Fig. 1.2: degree distributions of four of the graphs; x-axis: Degree, y-axis: #Vertices, both in log scale]
Figure 1.2 shows the degree distribution for four of the graphs. Note that both the x-axis and the y-axis are in log scale, and the distributions roughly follow straight lines; thus, the degree distributions follow power-law distributions. This demonstrates that real-world graphs are usually power-law graphs.
1.1.3 Representation of Large Sparse Graphs

There are two standard ways to represent a graph in main memory [24]: the adjacency matrix and the adjacency list. For a graph with n vertices and m edges, the adjacency matrix representation consumes Θ(n²) space, while the adjacency list representation consumes Θ(n + m) space. As we are dealing with large sparse graphs containing tens of millions (or even hundreds of millions) of vertices in this book (see Table 1.1), the adjacency matrix representation is not feasible for such a large number of vertices. On the other hand, an adjacency list consumes more space than an
array due to explicitly storing pointers in the adjacency list, and moreover, access-
ing linked lists has the pointer-chasing issue that usually results in random memory
access. Thus, we use a variant of the adjacency list representation, called adjacency
array representation, which is also known as the Compressed Sparse Row (CSR)
representation in the literature [73]. Note that, as we focus on static graphs in this
book, the adjacency array representation is sufficient; however, if the input graph
dynamically grows (i.e., new edges are continuously added), then the adjacency list
representation or other representations may be required.
In the adjacency array representation, an unweighted graph is represented by two
arrays, denoted pstart and edges. It assumes that each of the n vertices of G takes
a distinct id from {0, . . . , n − 1}; note that if this assumption does not hold, then
a mapping from V to {0, . . . , n − 1} can be explicitly constructed. Thus, in the re-
mainder of the book, we also occasionally use i (0 ≤ i ≤ n − 1) to denote a vertex.
The adjacency array representation is to store the set of neighbors of each vertex
consecutively by an array (rather than a linked list as done in the adjacency list rep-
resentation) and then concatenate all such arrays into a single large array edges,
by putting the neighbors of vertex 0 first, followed by the neighbors of vertex 1, and so forth. The start position (i.e., index) of the set of neighbors of vertex i
in the array edges is stored in pstart[i], while pstart[n] stores the length of the
array edges. In this way, the degree of vertex i can be obtained in constant time
as pstart[i + 1] − pstart[i], and the set of neighbors of vertex i is stored con-
secutively in the subarray edges[pstart[i], . . . , pstart[i + 1] − 1]. As a result, the
neighbors of each vertex occupy consecutive space in the main memory, which can
improve the cache hit-rate. Note that this representation also supports the removal
of edges from the graph. That is, we move all the remaining neighbors of vertex i
to be consecutive in edges starting at position pstart[i], and we use another array
pend to explicitly store in pend[i] the last position of the neighbors of i.
[Fig. 1.3: adjacency array representation of the graph in Figure 1.1: pstart = (0, 4, 8, 12, 16, 19, 20, 21, 23, 24, 25, 26); edges stores the neighbors of v1, . . . , v11 consecutively]
Figure 1.3 demonstrates the adjacency array representation for the graph in Fig-
ure 1.1. It is easy to see that, by the adjacency array representation, an unweighted
undirected graph with n vertices and m undirected edges can be stored in main
memory by n + 2m + 1 machine words; note that here each undirected edge is stored
twice in the graph representation, once for each direction.
An example C++ code for allocating memory to store the graph G is shown in
Listing 1.1. Here, pstart and edges are two arrays of the data type unsigned int.
For presentation simplicity, we define unsigned int as uint in Listing 1.1, which
will also be used in the data structures presented in Chapter 2. Note that, as the range of unsigned int on a typical machine nowadays is from 0 to 4,294,967,295, the example C++ code in Listing 1.1 can be used to store a graph containing up to 2 billion undirected edges; for storing larger graphs, the data type of pstart, or even edges, needs to be changed to long.
1.1.4 Complexity Analysis

In this book, we will provide time complexity analysis for all the presented algorithms by using the big-O notation O(·). Specifically, for two given functions f(n) and g(n), f(n) ∈ O(g(n)) if there exist positive constants c and n₀ such that f(n) ≤ c × g(n) for all n ≥ n₀; note that O(·) denotes a set of functions. Occasionally, we will also use the Θ-notation. Specifically, for two given functions f(n) and g(n), f(n) ∈ Θ(g(n)) if there exist positive constants c₁, c₂, and n₀ such that c₁ × g(n) ≤ f(n) ≤ c₂ × g(n) for all n ≥ n₀.
As we aim to process large graphs with billions of edges in main memory, it is
also important to keep the memory consumption of an algorithm small such that
larger graphs can be processed with the available main memory. Thus, we also an-
alyze the space complexities of algorithms in this book. As the number m of edges
usually is much larger than the number n of vertices for large real-world graphs
(see Table 1.1), we analyze the space complexity in the form of c × m + O(n) by
explicitly specifying the constant c, since c × m usually is the dominating factor.
Recall that the adjacency array representation of a graph in Section 1.1.3 con-
sumes 2m + O(n) memory space. Thus, if an algorithm takes only O(n) extra mem-
ory space besides the graph representation, then a graph with 1 billion undirected
edges may be able to be processed in a machine with 16GB main memory which is
common for nowadays’ commodity machines. Note that a graph with 1 billion undi-
rected edges takes slightly more than 8GB main memory to store by the adjacency
array representation.
1.2 Cohesive Subgraphs

In this book, we illustrate the general techniques and principles toward efficiently processing large real-world graphs with millions of vertices and billions of edges, through the problems of cohesive subgraph computation.
1.2.1 Cohesive Subgraph Computation

A common phenomenon of real-world graphs is that they are usually globally sparse but locally dense [11]. That is, the entire graph is sparse in terms of having a small average degree (e.g., on the order of tens), but it contains subgraphs that are cohesive/dense (e.g., containing a large clique of up to thousands of vertices). Thus, it is of great importance to extract cohesive subgraphs from large sparse graphs.
Given an input graph G, cohesive subgraph computation is either to find all max-
imal subgraphs of G whose cohesiveness values are at least k for all possible k
values, or to find the subgraph of G with the largest cohesiveness value. Here, the
cohesiveness value of a subgraph g is solely determined by the structure of g while being independent of other parts of G that are not in g, and is sometimes also referred to as the density of the subgraph; thus, a cohesive subgraph is sometimes also called a dense subgraph. In this book, we focus on cohesive subgraph computation
based on the following commonly used cohesiveness measures:
1. Minimum degree (aka, k-core, see Chapter 3); that is, the maximal subgraph
whose minimum degree is at least k, which is called k-core. The problem is either
to compute the k-core for a user-given k or to compute k-cores for all possible k
values [59, 76, 81].
2. Average degree (aka, dense subgraph, see Chapter 4); that is, a subgraph with average degree at least k. The problem studied is usually to compute the subgraph with the largest average degree (i.e., the densest subgraph) [18, 35].
3. Higher-order variants of k-core and densest subgraph (see Chapter 5); for example, the maximal subgraph in which each edge participates in at least k triangles within the subgraph (i.e., k-truss) [22, 92], or the subgraph in which the average number of triangles each vertex participates in is the largest (i.e., triangle-densest subgraph) [89].
4. Edge connectivity (aka, k-edge connected components, see Chapter 6); that is,
the maximal subgraphs each of which is k-edge connected. The problem studied
is either to compute the k-edge connected components for a user-given k [3, 17,
102] or to compute k-edge connected components for all possible k values [16,
99].
Besides the above commonly used ones, other cohesiveness measures have also been defined in the literature [48]. For example, a graph g is a clique if each vertex is connected to all other vertices [80] (i.e., |E(g)| = |V(g)|(|V(g)|−1)/2); a graph g is a γ-quasi-clique if at least a γ portion of its vertex pairs are connected by edges (i.e., |E(g)| ≥ γ × |V(g)|(|V(g)|−1)/2) [1]; a graph g is a k-plex if every vertex of g is connected to all but no more than (k − 1) other vertices (i.e., dg(u) ≥ |V(g)| − k for each u ∈ V(g)) [82]. Nevertheless, cohesive subgraph computation based on these definitions usually leads to NP-hard problems, and thus is generally too computationally expensive to be applied to large graphs [23]. Consequently, we do not consider these alternative cohesiveness measures in this book.
1.2.2 Applications
Cohesive subgraph computation can either be the main goal of a graph analysis
task or act as a preprocessing step aiming to reduce/trim the graph by removing sparse/unimportant parts such that more complex and time-consuming analysis can be conducted. Some of the applications of cohesive subgraph computation are illustrated as follows.
Locating Influential Nodes. Cohesive subgraph detection has been used to identify
the entities in a network that act as influential spreaders for propagating information
to a large portion of the network [46, 55]. Analysis on real networks shows that
vertices belonging to the maximal k-truss subgraph for the largest k show good
spreading behavior [55], leading to fast and wide epidemic spreading. Moreover,
vertices belonging to such dense subgraphs dominate the small set of vertices that
achieve the optimal spreading in the network [55].
Link Spam Detection. Dense subgraph detection is a useful primitive for spam
detection [33]. A study in [33] shows that many of the dense bipartite subgraphs in
a web graph are link spam, i.e., websites that attempt to manipulate search engine
rankings through aggressive interlinking to simulate popular content.
Real-Time Story Identification. A graph can be used to represent entities and their relationships that are mentioned in the texts of online social media such as Twitter, where edge weights correspond to the pairwise association strengths of entities.
It is shown in [4] that given such a graph, a cohesive group of strongly associated
entity pairs usually indicates an important story.
Chapter 2
Linear Heap Data Structures
In this chapter, we present linear heap data structures that will be useful in the re-
mainder of the book for designing algorithms to efficiently process large sparse
graphs. In essence, the linear heap data structures can be used to replace Fibonacci
Heap and Binary Heap [24] to achieve better time complexity as well as practical
performance, when some assumptions are satisfied.
In general, the linear heap data structures store (element, key) pairs, and have
two assumptions regarding element and key, respectively. Firstly, the total number
of distinct elements is denoted by n, and the elements are 0, 1, . . . , n − 1; note that, in
this book, each element usually corresponds to a vertex of G. Secondly, let key_cap be the upper bound of the values of key; then the possible values of key are integers in the range [0, key_cap]; note that, in this book, key_cap is usually bounded by n.
By utilizing the above two assumptions, the linear heap data structures can support
updating the key of an element in constant time and also support retrieving/remov-
ing the element with the minimum (or maximum) key in amortized constant time.
Recall that the famous Fibonacci Heap is constant-time updatable but has a loga-
rithmic popping cost [24].
The linked list-based linear heap organizes all elements with the same key value by
a doubly linked list [17]; that is, there is one doubly linked list for each distinct key
value. Moreover, the heads (i.e., the first elements) of all such doubly linked lists
are stored in an array heads, such that the doubly linked list for a specific key value
key can be retrieved in constant time from heads[key]. For memory efficiency, the information of the doubly linked lists is stored in two arrays, pres and nexts, such that the elements that precede and follow element i in the doubly linked list containing i are pres[i] and nexts[i], respectively. In addition, there is an array keys for storing
the key values of elements; that is, the key value of element i is keys[i].
[Fig. 2.1: Linked list-based linear heap for the vertices in Figure 1.1: (a) conceptual view, with one doubly linked list per key value; (b) actual storage, with keys = (4, 4, 4, 4, 3, 1, 1, 2, 1, 1, 1) for v1, . . . , v11 and heads, pres, nexts stored as arrays]
For example, Figure 2.1 illustrates the linked list-based linear heap constructed for the vertices in Figure 1.1, where the key value of a vertex is its degree. Figure 2.1(a) shows the conceptual view in the form of doubly linked lists, while Figure 2.1(b) shows the actual data stored for the data structure; here, n = 11 and key_cap = 10. Note that the dashed arrows in Figure 2.1 are just for illustration purposes and are not actually stored. Each index in the array heads corresponds to a key value, while each index in the arrays pres, nexts, and keys corresponds to an element; note that the same index in the arrays pres, nexts, and keys corresponds to the same element.
class ListLinearHeap {
public:
  ListLinearHeap(uint _n, uint _key_cap);
  ~ListLinearHeap();
  void init(uint _n, uint _key_cap, uint *_elems, uint *_keys);
  void insert(uint element, uint key);
  uint remove(uint element);
  uint get_n() { return n; }
  uint get_key_cap() { return key_cap; }
  uint get_key(uint element) { return keys[element]; }
  uint increment(uint element, uint inc);
  uint decrement(uint element, uint dec);
  bool get_max(uint &element, uint &key);
  bool pop_max(uint &element, uint &key);
  bool get_min(uint &element, uint &key);
  bool pop_min(uint &element, uint &key);
};
Initialize a Linked List-Based Linear Heap (init). The main task of init is to: (1) allocate proper memory space for the data structure, (2) assign proper initial values for max_key, min_key, and heads, and (3) insert the (element, key) pairs, supplied in the input parameters of init, into the data structure. The pseudocode of init is shown in Algorithm 1, where memory allocation is omitted. Specifically, max_key is initialized as 0, min_key is initialized as key_cap, and heads[key] is initialized as non-existent (denoted by null) for each distinct key value. Each (element, key) pair is inserted into the data structure by invoking the member function insert.
Algorithm 3: remove(element)
1 if pres[element] = null then
     /* element is at the beginning of a doubly linked list */
2    heads[keys[element]] ← nexts[element];
3    if nexts[element] ≠ null then pres[nexts[element]] ← null;
4 else
5    nexts[pres[element]] ← nexts[element];
6    if nexts[element] ≠ null then pres[nexts[element]] ← pres[element];
7 return keys[element];
To remove an element from the data structure, the doubly linked list contain-
ing element is updated by adding a direct link between the immediate preceding
element pres[element] and the immediate succeeding element nexts[element] of
element. The pseudocode is given in Algorithm 3, which returns the key value of
the removed element. Note that if element is at the beginning of a doubly linked list,
then heads[keys[element]] also needs to be updated.
Update the Key Value of an Element. To update the key value of an element,
the element is firstly removed from the doubly linked list corresponding to the key
value keys[element] and is then inserted into the doubly linked list corresponding
to the updated key value. As a result, the key value of element in the data structure is
updated; moreover, min_key and max_key may also be updated by the new key
value of element. The pseudocode of decrement is shown in Algorithm 4, which
returns the updated key value of element. The pseudocode of increment is similar
and is omitted.
Pop/Get Min/Max from a Linked List-Based Linear Heap. To pop the element with the minimum key value from the data structure, the value of min_key is first updated to be the smallest value such that heads[min_key] ≠ null. If such a min_key exists, then all elements in the doubly linked list pointed to by heads[min_key] have the same minimum key value, and the first element is removed from the data structure and returned. Otherwise, min_key is updated to be larger than max_key, which means that the data structure currently contains no element. The pseudocode is shown in Algorithm 5. The pseudocodes of get_min, pop_max, and get_max are similar and are omitted. Note that, during the execution of the data structure, the value of min_key is guaranteed to be a lower bound of the key values of all elements in the data structure. Thus, Algorithm 5 correctly obtains the element with the minimum key value.
in pres[i] and nexts[i], explicitly. Alternatively, all elements in the same doubly
linked list can be stored consecutively in an array, in the same way as the adjacency
array graph representation discussed in Section 1.1.3. This strategy is used in the
array-based linear heap in [7]. Specifically, all elements with the same key value
are stored consecutively in an array ids, such that elements with key value 0 are
put at the beginning of ids and are followed by elements with key value 1 and so
forth. The start position of the elements for each distinct key value is stored in an
array heads. Thus, heads and ids resemble pstart and edges of the adjacency
array graph representation, respectively. In addition, the key values of elements are
stored in an array keys in the same way as the linked list-based linear heap, and the
positions of elements in ids are stored in an array rids (i.e., rids[ids[i]] = i).
index    0   1   2   3   4   5   6   7   8   9  10
heads    0   0   5   6   7  11  11  11  11  11  11
ids     v6  v7  v9 v10 v11  v8  v5  v1  v2  v3  v4

element v1  v2  v3  v4  v5  v6  v7  v8  v9 v10 v11
rids     7   8   9  10   6   0   1   5   2   3   4
keys     4   4   4   4   3   1   1   2   1   1   1
For example, Figure 2.2 demonstrates the array-based linear heap constructed for
vertices in Figure 1.1 where the key value of a vertex is its degree. That is, Figure 2.2
shows the actual data of the adjacency array-based representation for the doubly
linked lists in Figure 2.1. Note that the dashed arrows in Figure 2.2 are for
illustration purposes and are not actually stored. Each index in the array heads
corresponds to a key value, and each index in the arrays rids and keys corresponds
to an element, while the indexes of the array ids correspond to neither.
public:
ArrayLinearHeap(uint _n, uint _key_cap) ;
˜ArrayLinearHeap() ;
void init(uint _n, uint _key_cap, uint *_ids, uint *_keys) ;
uint get_n() { return n; }
uint get_key_cap() { return key_cap; }
uint get_key(uint element) { return keys[element]; }
void increment(uint element) ;
void decrement(uint element) ;
bool get_max(uint &element, uint &key) ;
bool pop_max(uint &element, uint &key) ;
bool get_min(uint &element, uint &key) ;
bool pop_min(uint &element, uint &key) ;
};
In ArrayLinearHeap, the member functions insert and remove are disabled, and
increment and decrement are only allowed to update the key value of an element
by 1.
The space complexity of ArrayLinearHeap is 3n + key cap + O(1), the same
as that of ListLinearHeap. Also, to use ArrayLinearHeap after constructing it,
the member function init must be invoked to properly initialize the member
variables before any other member function is called.
Initialize an Array-Based Linear Heap (init). The pseudocode of init is shown
in Algorithm 6, where memory allocation is omitted. It sorts all elements in ids
in non-decreasing order of key value, which is done by counting sort [24]. After
initialization by init, the set of elements with key value key is stored consecutively
in the subarray ids[heads[key], . . . , heads[key + 1] − 1].
Update the Key Value of an Element. In ArrayLinearHeap, the key value of an
element is only allowed to be updated by 1. The general idea of decrement is as fol-
lows. Let key be the key value of element before updating. Then, the goal is to move
element from the subarray ids[heads[key], . . . , heads[key + 1] − 1] to the subarray
ids[heads[key − 1], . . . , heads[key] − 1]. To do so, element is first moved to
position heads[key] (by swapping with the element there), where the original
position of element in ids is located by rids[element]. Then, the start position
of the elements in ids with key value key (i.e., heads[key]) is increased by one;
consequently, element is now at the end of the subarray ids[heads[key − 1], . . . ,
heads[key] − 1]. Note that min key may also need to be updated in decrement.
The pseudocode of decrement is shown in Algorithm 7, which returns the updated
key value of element. The pseudocode of increment is similar and is omitted.
Algorithm 7: decrement(element)
1 key ← keys[element];
2 if heads[key] ≠ rids[element] then
3 Swap the content of ids for positions heads[key] and rids[element];
4 rids[ids[rids[element]]] ← rids[element];
5 rids[ids[heads[key]]] ← heads[key];
6 if min key = key then
7 min key ← min key − 1;
8 heads[min key] ← heads[min key + 1];
9 heads[key] ← heads[key] + 1; keys[element] ← keys[element] − 1;
10 return keys[element];
For example, to decrement the key value of v4 by 1 for the data structure in
Figure 2.2, firstly, v4 is swapped with v1 in the array ids, and then heads[4] is
increased by 1 to become 8; note that rids and keys are updated accordingly.
public:
LazyLinearHeap(uint _n, uint _key_cap) ;
˜LazyLinearHeap() ;
void init(uint _n, uint _key_cap, uint *_elems, uint *_keys) ;
uint get_n() { return n; }
uint get_key_cap() { return key_cap; }
uint get_key(uint element) { return keys[element]; }
uint decrement(uint element, uint dec) {
return keys[element] -= dec;
}
bool get_max(uint &element, uint &key) ;
bool pop_max(uint &element, uint &key) ;
};