Intelligent Systems Reference Library, Volume 148
Complex Networks in Software, Knowledge, and Social Systems
Series editors
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: kacprzyk@ibspan.waw.pl
Lakhmi C. Jain
Miloš Savić
Faculty of Sciences, Department of Mathematics and Informatics
University of Novi Sad
Novi Sad, Serbia

Lakhmi C. Jain
Centre for Artificial Intelligence, Faculty of Engineering and Information Technology
University of Technology Sydney
Sydney, NSW, Australia

Mirjana Ivanović
Faculty of Sciences, Department of Mathematics and Informatics
University of Novi Sad
Novi Sad, Serbia
This Springer imprint is published by the registered company Springer International Publishing AG
part of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword
We live in the information age, surrounded by diverse types of complex
networks. The study of complex networks has attracted significant research interest
in recent years, mostly because of their ubiquitous presence in nature and society,
leading to an interdisciplinary research field involving researchers from all major
scientific disciplines.
This monograph deals with three types of complex networks describing the
structure of software systems, semantic web ontologies, and the self-organized
social structure of research collaboration. In Chaps. 1 and 2, the authors give an
overview of fundamental concepts, metrics, methods, and models important in
studying real-world complex networks. As the main research contribution of the
monograph, they propose and empirically validate several novel methods to analyze
complex networks in which nodes are enriched with the domain-independent
structural metrics and the metrics from a particular domain (i.e., software metrics,
ontology metrics, and metrics of research performance, respectively).
Software networks are directed graphs that represent the dependencies among
software entities present in a complex software system. One software system can be
represented by several software networks reflecting its structure at different
Part I Introduction
1 Introduction to Complex Networks
1.1 Complex Networks
1.2 Software Networks
1.3 Ontology Networks
1.4 Co-authorship Networks
1.5 Research Contributions of the Monograph
References
2 Fundamentals of Complex Network Analysis
2.1 Basic Concepts
2.2 Complex Network Measures and Methods
2.2.1 Connectivity of Nodes
2.2.2 Distance Metrics
2.2.3 Centrality Metrics and Algorithms
2.2.4 Node Similarity Metrics
2.2.5 Link Reciprocity
2.2.6 Clustering, Cohesive Groups and Community Detection Algorithms
2.3 Basic Complex Network Models
References
Abstract Complex networks are graphs describing complex natural, conceptual and
engineered systems. In this chapter we present an introduction to complex networks
by giving several examples of technological, social, information and biological
networks. Then, we discuss the complex networks that are the focus of this monograph
(software, ontology and co-authorship networks). Finally, we briefly outline the main
research contributions presented in the monograph.
Generally speaking, complex systems are systems that are difficult to understand,
model and predict. The term system usually denotes a set of interrelated and inter-
dependent components (real or abstract) which together form an integrated whole.
The main characteristic of complex systems is that their behavior cannot be inferred
from the behavior of individual components. Complex systems are characterized by
a large number of constituent components that may interact in many different ways.
A variety of complex systems can be described by networks that show relations,
dependencies and interactions among their constituent components. Networks are
composed of nodes (also called vertices, sites or actors) and links (also called
edges, bonds or ties) connecting nodes. From a mathematical and computer science
point of view, complex networks are graph data structures describing large and
complex real-world systems. Examples of complex networks are numerous and can be
found in almost any scientific discipline. Mark Newman classified complex networks
into four broad categories: technological networks, social networks, information
networks and biological networks [42, 46]. However, this classification should not be
taken too rigidly, since many networks may belong to more than one of Newman’s
categories.
Technological networks represent engineered, man-made complex systems. Well-known
examples of technological networks forming the backbone of technological
societies are networks of computers and related devices (e.g. the Internet), power
grids, telephone networks (networks of landlines and wireless links that transmit
phone calls) and transportation networks such as airport, railway and road networks.
Other examples include networks representing software systems such as call graphs [51]
and class collaboration networks [52], networks depicting the structure of oil
refineries [4] and networks of electronic circuits [15].
© Springer International Publishing AG, part of Springer Nature 2019
M. Savić et al., Complex Networks in Software, Knowledge,
and Social Systems, Intelligent Systems Reference Library 148,
https://doi.org/10.1007/978-3-319-91196-0_1
Social networks describe interactions and relations between social entities. Social
entities can be individuals, social groups, institutions or even whole nations. There
are many types of social links: links determined by individual opinions on other
individuals, links describing transfers of material resources, links denoting collabo-
ration, cooperation and coalition, links resulting from behavioral interactions, links
imposed by formal relations within formally organized social groups, and so on. The
study of social networks dates back to the 1930s. The first social networks documented
in the literature are sociograms made by the psychotherapist Jacob L. Moreno and
published in his book “Who Shall Survive?” (1934). Moreno’s first sociograms,
visualizing seating preferences within a group of schoolmates, clearly show the
presence of gender homophily and community structures in small social groups.
Another well-known example of an early documented social network constructed
by direct observation of social interactions is Zachary’s karate club network [66].
This social network encompasses 34 members of a university karate club. Wayne
Zachary (an anthropologist from Temple University, Philadelphia, USA) connected
two club members by a weighted link if they frequently interacted outside
regular club activities. Link weights in the network denote the strength of social
interactions. This network is particularly interesting because the club split into two
groups due to a conflict between two of its members (the chief administrator and the
instructor). Zachary applied the Ford–Fulkerson algorithm to the network before the
split and showed that the algorithm gives accurate assignments of the members into
the groups formed after the split (only one assignment was incorrect). Nowadays,
Zachary’s karate club network is one of the benchmark networks used to test the
ability of community detection algorithms to infer ground truth communities.
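To make the idea of splitting a weighted social network with a maximum-flow computation more concrete, here is a minimal sketch. It runs the breadth-first (Edmonds-Karp) variant of Ford–Fulkerson on a small made-up friendship network and reads the two groups off the resulting minimum cut. The edge list, the node names and the helper `min_cut_groups` are assumptions for illustration only; this is not Zachary's data or his exact procedure.

```python
from collections import defaultdict, deque


def min_cut_groups(edges, source, sink):
    """Split an undirected weighted graph into two groups via a minimum s-t cut.

    edges: iterable of (u, v, weight) triples.
    Returns (max_flow_value, source_side, sink_side).
    """
    # Residual capacities; an undirected link has capacity in both directions.
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, w in edges:
        residual[u][v] += w
        residual[v][u] += w

    def reachable_with_parents():
        # Breadth-first search that follows only links with spare capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt, cap in residual[node].items():
                if cap > 0 and nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return parents

    flow = 0
    while True:
        parents = reachable_with_parents()
        if sink not in parents:
            break  # no augmenting path left, so the flow is maximal
        # Walk back from the sink to collect the augmenting path.
        path, node = [], sink
        while parents[node] is not None:
            path.append((parents[node], node))
            node = parents[node]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

    # Nodes still reachable from the source form its side of the minimum cut.
    source_side = set(parents)
    sink_side = set(residual) - source_side
    return flow, source_side, sink_side


# Hypothetical toy network: two tight groups joined by two weak ties.
edges = [
    ("instructor", "a", 3), ("instructor", "b", 4), ("a", "b", 2),
    ("administrator", "c", 4), ("administrator", "d", 3), ("c", "d", 2),
    ("b", "c", 1), ("a", "d", 1),
]
flow, instructor_side, administrator_side = min_cut_groups(
    edges, "instructor", "administrator"
)
```

In this toy network the cheapest way to separate the two leaders is to cut the two weak ties, so each member ends up on the side of the leader they interact with most strongly.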
Social network analysis has a very long tradition within the social sciences [12]. The
topics studied by social scientists within the framework of social networks are quite
extensive including occupational mobility, political and economic systems, decision
making within groups, coalition formation, diffusion of innovations, sociology of sci-
ence and corporate interlocking [62]. However, the main characteristic of traditional
social network analysis studies is that they are based on small-scale social networks
assembled through interviews, questionnaires or direct observations of social ties.
With the rise of information technologies, the Internet and massive amounts of data
about human activities, it has become possible to construct and analyze large-scale
social networks. Examples include the network of collaboration among movie actors
extracted from the IMDb database, networks of research collaboration extracted from
massive bibliographic databases, the world trade network extracted from the
Comtrade database and networks of personal interactions extracted from e-mail server
logs, phone call logs or social networking sites such as Facebook and Twitter.
Information networks show connections between data items. The best known
example of an information network is the World Wide Web (WWW) network. The
nodes in the WWW network represent web pages. Two nodes A and B are connected
by a directed link A → B if the web page represented by A contains at least
one hyperlink to the web page represented by B.
or activate a protein complex. Gene regulatory networks describe how cells control
the expression of genes. On the other hand, two genes are connected in a gene co-
expression network if their expression profiles are highly correlated. The nodes in
a metabolic network represent metabolites. Two metabolites are connected by an
undirected link if they participate in the same metabolic reaction. Neural networks
contain neurons and synapses and attempt to mimic biological problem-solving
mechanisms [29]. They are typically designed to learn from experience through a
dense interconnection of neurons via synapses. A number of neural networks
have been proposed, differing in their connection schemes (e.g. feedforward),
learning schemes and so on. Biological networks also include networks describing
interactions between species, such as food webs.
In recent years, researchers have empirically analyzed a large number of large-scale
complex networks from different domains [19]. The burst started in 1998 with Watts
and Strogatz [63], who discovered that three large and sparse real-world networks
(the collaboration graph of film actors, the power grid of the Western United
States and the neural network of the worm C. elegans) exhibit the small-world
property (a small distance between two randomly chosen nodes) and a high level of
local clustering (highly dense ego-networks, i.e. highly dense sub-graphs induced
by a randomly chosen node and its adjacent nodes). This discovery was significant
because the classic theory of random graphs, used until then to model complex
networked structures, cannot explain the presence of these two qualities together in
one large and sparse graph.
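The two quantities behind the small-world observation can be sketched in a few lines. The example below computes the average shortest-path distance and the average local clustering coefficient for a toy undirected graph. It assumes the common definition of local clustering as the density of links among a node's neighbours; the graph and the function names are made up for illustration.

```python
from collections import deque
from itertools import combinations


def average_path_length(adj):
    """Mean shortest-path distance over all connected ordered node pairs."""
    total, pairs = 0, 0
    for start in adj:
        # Breadth-first search gives distances in an unweighted graph.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
path_length = average_path_length(adj)
clustering = average_clustering(adj)
```

A small-world network in the sense of Watts and Strogatz combines a low value of the first quantity with a high value of the second, compared with a random graph of the same size and density.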
Analysis of the statistical properties of networks that represent large portions of the
World Wide Web [3, 33] and the Internet at the physical level [22] led to the discovery
that their degree distributions (the probability P(k) that a randomly chosen node has
exactly k incident links) follow power laws of the form $P(k) \sim k^{-\gamma}$, a
property that the Erdős–Rényi model of random graphs [10, 20, 21] does not predict.
Networks obeying this connectivity pattern are also known as scale-free
networks [6]. The vast majority of nodes in a scale-free network are loosely coupled, but
scale-free networks also contain a small fraction of nodes (called hubs or preferential
nodes) whose degree is unexpectedly high and tends to increase
as networks evolve [6]. Two important consequences of the scale-free
organization are (1) the “robust, yet fragile” property [2, 11] (scale-free networks
are robust to random failures, but they disintegrate severely when
failures occur in hubs) and (2) the absence of a propagation threshold in spreading
processes [49].
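As a small illustration of the degree distribution P(k) discussed above, the following sketch computes the empirical P(k) from an undirected edge list. The edge list is a made-up toy example with one hub, and `degree_distribution` is a hypothetical helper name. A real analysis would fit the tail of P(k) on a much larger network.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)}, the fraction of nodes that have exactly k incident links.

    Only nodes that appear in at least one edge are counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {k: counts[k] / n for k in sorted(counts)}


# Hypothetical toy network: one hub linked to five nodes, plus one extra link.
edges = [("hub", i) for i in range(1, 6)] + [(1, 2)]
p = degree_distribution(edges)
```

Even in this tiny example most nodes have a low degree while a single hub has a much higher one, which is the qualitative pattern that a heavy-tailed degree distribution describes.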
Newman’s studies of complex networks from different domains revealed two further
important characteristics of real-world networks: assortative mixing patterns
(a tendency of hubs either to establish or to avoid connections among themselves) [44,
45] and community organization (the presence of highly dense sub-graphs in a sparse
graph) [23, 47]. Leskovec et al. [35] investigated the evolution of large-scale
networks from different domains and observed two evolutionary phenomena: densification
power laws (the number of links grows super-linearly as a power of the number of
nodes during evolution, which means that the average node degree increases over time)
and shrinking diameters (the diameter of a network decreases as it grows).
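A densification power law says that the number of links grows as a power of the number of nodes, so the exponent can be estimated as the slope of a straight line in log-log space. The sketch below does this with an ordinary least-squares fit. The snapshots are synthetic numbers generated from an assumed exponent of 1.5, not measurements from any real network, and `densification_exponent` is a hypothetical helper name.

```python
import math


def densification_exponent(snapshots):
    """Least-squares slope of log(number of links) against log(number of nodes).

    snapshots: list of (num_nodes, num_links) pairs observed at successive times.
    """
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic snapshots that follow links = nodes ** 1.5 (made-up data).
snapshots = [(n, n ** 1.5) for n in (100, 400, 1600, 6400)]
exponent = densification_exponent(snapshots)

# Average degree 2 * links / nodes for each snapshot.
average_degrees = [2 * e / n for n, e in snapshots]
```

An exponent above 1 means that the average degree grows as the network grows, which is what the list `average_degrees` shows for these synthetic snapshots.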
Following these foundational works in the field of complex network analysis,
researchers have analyzed a variety of complex biological, social, technological
and conceptual systems represented as networks (good overviews can be found in [1,
9, 19, 46]). These studies initiated a new theory of complex networks (also known
as network science) whose focus is on the analysis techniques and mathematical
models which can reveal, reproduce and explain frequently observed structural and
evolutionary characteristics of real-world networks.
The focus of this book is on three types of complex networks:
1. software networks – technological networks extracted from the source code of
computer programs that represent the design of software systems.
2. ontology networks – information-technological networks extracted from the
source code of semantic web ontologies that describe the structure of shared
and reusable knowledge.
3. co-authorship networks – social networks extracted from bibliographic databases
that show collaboration among scientists.
code is the most popular, valuable and trusted source of information for fact extrac-
tion because other artifacts (e.g. documentation, release notes, information collected
from version management or bug tracking systems) may be missing, outdated, or
unsynchronized with the actual implementation. Fact extraction is an automatic pro-
cess during which the source code is analyzed to identify software entities and their
mutual relationships. Therefore, software networks can be viewed as fact bases used
in reverse engineering, architecture recovery and software comprehension activities.
Architecture recovery techniques usually perform software network partitioning [17,
38] or cluster software entities according to feature vectors that can be constructed
from software networks [5, 39]. Graphical representations of software entities and
dependencies between them have long been accepted as comprehension aids to sup-
port reverse engineering processes [34]. Moreover, nodes in software networks can
be enriched with software metrics information in order to provide visual, polymet-
ric views of analyzed software systems [34]. Additionally, software networks can be
exploited to identify and remove “bad smells” from source code [48], to support static
concept location in source code [55], program comprehension during incremental
change [13] and software component retrieval [28], to identify design patterns in
source code [37], and to predict defects in software systems [8, 59, 69].
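The fact extraction step described above can be illustrated with a minimal sketch that builds a tiny call graph from Python source code using the standard `ast` module. This is only one possible illustration under simplifying assumptions: it resolves direct calls by simple name to functions defined in the same source text, and the sample program and the helper `extract_call_graph` are made up. Real extractors handle many more entity and dependency types.

```python
import ast


def extract_call_graph(source):
    """Return a set of (caller, callee) links between functions defined in `source`."""
    tree = ast.parse(source)
    # Software entities: all functions defined in the analyzed source.
    defined = {
        node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
    }
    links = set()
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            # Only direct calls by simple name are resolved, e.g. helper(x).
            # Method calls such as text.split() and built-ins are ignored.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    links.add((func.name, node.func.id))
    return links


# Hypothetical program to analyze.
SAMPLE = '''
def parse(text):
    return text.split()

def count(text):
    return len(parse(text))

def report(text):
    print(count(text), parse(text))
'''

call_graph = extract_call_graph(SAMPLE)
```

Each pair in `call_graph` is a directed link from a calling function to a called function, so the result can be used directly as the edge list of a small software network.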
The term ontology has a very broad meaning. In information and computer sciences,
an ontology is a specification of a conceptualization [26]. An ontology formally
describes concepts and relations present in a domain of discourse, and as such mod-
els a certain part of reality. In the context of the Semantic Web vision [7, 56],
ontologies are formal specifications of shared and reusable knowledge that support
automated data-driven reasoning, data integration and interoperability of computer
programs processing web accessible resources. The traditional World Wide Web can
be viewed as a collection of inter-linked documents created by humans for humans.
The Semantic Web is an extension of the World Wide Web based on the concept of
structured inter-linked data that can be “consumed” by both humans and autonomous
software agents [7].
Information always involves multiple inter-related entities in some context. There-
fore, if we want to build computer programs able to perform tasks involving publicly
available data then we have to specify the structure of data and their surrounding
context. The sentence “John Doe is the doctoral adviser of Richard Roe” is an exam-
ple of information that involves two entities in the context of university education.
If a person does not have basic knowledge about the structure of academic
organizations, then he or she is unable to understand the sentence and possibly act
upon it. The situation is even worse for computer programs, which do not have any
inherent cognitive capabilities. However, if we formally specify the structure of
academic organizations
by stating relevant concepts (such as Professor and PhDStudent), relations (such as
AdvisorOf), and their mutual associations in the form of “subject-predicate-object”
Miiru's Eedla was still washing clothes on the riverbank. In the meantime she had
been home and put on her Sunday skirt, the hem of which had nevertheless been
tucked up around her waist as a precaution.
Red-cheeked and brisk, she bustled about without the slightest idea that she was
drawing the attention of the elderly bachelors passing by, and setting sober men
to thinking about marriage when other, more important matters were at hand.
"Who?"
"Do you hear…!"
"Well, what?"
Rietula steeled himself.
"Be a lay judge now, or whatever you like, and give him a dressing-down."
"Well, I am only asking why you come here to the meeting to talk nonsense and
meddle in people's affairs and undertakings," Rietula began. "And I ask further,
who do you think you are, that you must always berate us for discord and
spite? You are not even a lay judge, and yet you take the floor from him
and carry yourself in these public places as commandingly as a
rooster among the hens. But from now on we will no longer listen to
your idle ramblings. You will be thrown out unless you keep your mouth firmly
shut. Before long you will be driven out of the whole village and the parish of
Kuivala, back to Häme, where you came from to stir up
confusion here."
It seemed to have turned into a downright disgrace. What had made the man go so
soft? Surely not that Eedla…? Yes, it must have been her doing
that the man lost his backbone. What a slouch…! No wonder
Solomon said not to look at a woman. And no wonder
Samson lost his strength when a woman cut his hair. Oh, that wretch of an Eedla,
what she had done!
"Do you hear!"
"Hear what?"
"Stop your prattle!"
"Just listen to him!"
"Out, Kyrmyniska!"
"… vää!"
"I won't!"
When Iisakki had flung all the shoe lasts out of the cottage, he took the
broken pot and the frying pan and sent them flying onto the shore rocks with a
crash. They were followed by the table and the rickety bed, which came apart
completely at the joints when it was kicked onto the stony shore.
"Grab hold!"
And now the cottage in its turn tumbled into the river, where the horrified shoemaker
was still dredging for his lasts.
"Iisakki will have no fun in court, that I guarantee. It might be otherwise,
but just you go and roll another man's share into the river. That disgrace
will be paid for yet."
"I had to leave these parts. Quarrel and strife only grow worse in this
wretched village, and my old ears can no longer bear to
listen. Away to more peaceful lands, where people's
dwellings are not rolled into the river and gossip does not poison people's lives
like sharp serpents' tongues. I am no longer fit for quarrelling…"
"And I won't start slandering anyone either," said the shoemaker's
wife, and set off, her husband following her.