UNIVERSITY OF LJUBLJANA
INSTITUTE OF MATHEMATICS, PHYSICS AND MECHANICS
DEPARTMENT OF THEORETICAL COMPUTER SCIENCE
JADRANSKA 19, 1000 LJUBLJANA, SLOVENIA
Preprint series, Vol. 41 (2003), 871
PAJEK
ANALYSIS AND VISUALIZATION
OF LARGE NETWORKS
Vladimir Batagelj, Andrej Mrvar
ISSN 1318-4865
Version: March 4, 2003
Math.Subj.Class.(2000): 05C90, 68R10, 76M27, 68U05,
05C50, 05C85, 90C27, 92H30, 92G30, 93A15.
Supported by the Ministry of Education, Science and Sport of Slovenia,
Projects J18532 and Z53350.
To be published in the Graph Drawing Software book, edited by M. Jünger
and P. Mutzel, in the Springer series Mathematics and Visualization.
Address: Vladimir Batagelj, University of Ljubljana, FMF, Department
of Mathematics, and IMFM Ljubljana, Department of TCS, Jadranska
ulica 19, 1000 Ljubljana, Slovenia
email: vladimir.batagelj@uni-lj.si
Ljubljana, March 14, 2003
Pajek
Analysis and Visualization of Large Networks
Vladimir Batagelj (1) and Andrej Mrvar (2)
(1) Department of Mathematics, Faculty of Mathematics and Physics, University of Ljubljana, Slovenia
(2) Faculty of Social Sciences, University of Ljubljana, Slovenia
1 Introduction
Pajek is a program, for Windows, for analysis and visualization
of large networks having some tens or hundreds
of thousands of vertices. In Slovenian, pajek
means spider.
The design of Pajek is based on experience gained in the development of
the graph data structure and algorithm libraries Graph [2] and Xgraph [15],
the collection of network analysis and visualization programs STRAN, RelCalc,
Draw, Energ [9], and the SGML-based graph description markup language NetML
[8]. We started the development of Pajek in November 1996.
The main goals in the design of Pajek are [10,13]:
• to support abstraction by (recursive) decomposition of a large network
into several smaller networks that can be treated further using more
sophisticated methods;
• to provide the user with some powerful visualization tools;
• to implement a selection of efficient (subquadratic) algorithms for analysis
of large networks.
With Pajek we can (see Figure 1): find clusters (components, neighbourhoods
of ‘important’ vertices, cores, etc.) in a network; extract vertices that
belong to the same clusters and show them separately, possibly with parts
of the context (detailed local view); and shrink vertices in clusters and show
relations among clusters (global view).
Besides ordinary (directed, undirected, mixed) networks Pajek also
supports:
• 2-mode networks, bipartite (valued) graphs – networks between two disjoint
sets of vertices. Examples of such networks are: (authors, papers,
cites the paper), (authors, papers, is the (co)author of the paper), (people,
events, was present at), (people, institutions, is member of), (articles,
shopping lists, is on the list).
Fig. 1. Approaches to deal with large networks
• temporal networks, dynamic graphs – networks changing over time.
In this chapter we present the main characteristics of Pajek. Since large
networks cannot be visualized in detail in a single view, we first have to identify
interesting substructures in such a network and then visualize them as separate
views. The central, algorithmic section of this chapter deals mainly with
different efficient approaches to this problem.
2 Applications
There exist several sources of large networks that are already in machine-readable
form. Pajek provides tools for analysis and visualization of such
networks and is applied by researchers in different areas: social network analysis
[11], chemistry (organic molecules), biomedical/genomics research (protein-receptor
interaction networks) [59], genealogies [57,28], Internet networks [22],
citation networks [42], diffusion networks (AIDS, news), analysis of texts [17],
data mining (2-mode networks) [14], etc. Although it was developed primarily
for the analysis of large networks, it is also often used for small networks,
especially for their visualization.
In recent months (end of 2002) we had over 500 downloads of Pajek per
month.
Pajek is also used at several universities: Ljubljana, Rotterdam, Stanford,
Irvine, The Ohio State University, Penn State, Wisconsin/Madison, Vienna,
Freiburg, Madrid, and some others as a support in courses on network analysis.
Together with Wouter de Nooy from the University of Rotterdam we wrote
a course book Exploratory Social Network Analysis With Pajek [25].
3 Algorithms
To support the design goals we implemented several algorithms known from
the literature (see section 4.2), but for some tasks new, efficient algorithms,
suitable for dealing with large networks, had to be developed. They mainly provide
different ways to identify interesting substructures in a given network.
3.1 Citation weights
In a given set of units/vertices U (articles, books, works, etc.) we introduce
a citing relation/set of arcs R ⊆ U × U
uRv ≡ v cites u
which determines a citation network N = (U, R).
Citation network analysis started in 1964 with the paper of Garfield et
al. [29]. In 1989 Hummon and Doreian [36] proposed three indices, weights
of arcs, that provide us with an automatic way to identify the (most) important
part of the citation network. For two of these indices we developed algorithms
to compute them efficiently [4].
A citing relation is usually irreflexive (no loops) and (almost) acyclic.
In the following we shall assume that it has these two properties. Since in
real-life citation networks the strong components are small (usually 2 or 3
vertices) we can transform such a network into an acyclic network by shrinking
strong components and deleting loops. For other approaches see [4]. It is also
useful to transform a citation network to its standardized form by adding a
common source vertex s ∉ U and a common sink vertex t ∉ U. The source
s is linked by an arc to all minimal elements of R; and all maximal elements
of R are linked to the sink t. Thus we get an st-digraph [TF 2.2]. Finally, to
make the theory smoother, we also add the ‘feedback’ arc (t, s).
The search path count (SPC) method is based on counters n(u, v) that
count the number of different paths from s to t through the arc (u, v). To
compute n(u, v) we introduce two auxiliary quantities: $n^-(v)$ counts the number
of different paths from s to v, and $n^+(v)$ counts the number of different
paths from v to t.
It follows by basic principles of combinatorics that
\[ n(u, v) = n^-(u) \cdot n^+(v), \qquad (u, v) \in R \]
where
\[ n^-(u) = \begin{cases} 1 & u = s \\ \sum_{v : vRu} n^-(v) & \text{otherwise} \end{cases} \]
and
\[ n^+(u) = \begin{cases} 1 & u = t \\ \sum_{v : uRv} n^+(v) & \text{otherwise} \end{cases} \]
Fig. 2. Part of SOM main subnetwork at level 0.001
This is the basis of an efficient algorithm for computing n(u, v): after the
topological sort [TF 2.2] of the st-digraph we can compute, using the above
relations in topological order, the weights in time of order O(m), m = |R|.
The topological order ensures that all the quantities on the right-hand sides of the
above equalities are already computed when needed.
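The SPC computation described above can be sketched in Python as follows. This is only an illustrative sketch, not Pajek's implementation: the function name `spc_weights`, the arc-list input format and the use of Kahn's algorithm for the topological sort are assumptions made here. The network is assumed to be already standardized (single source s, single sink t) and the feedback arc (t, s) is left out so that the input stays acyclic.

```python
from collections import defaultdict, deque

def spc_weights(arcs, s, t):
    """Return {(u, v): n(u, v)} for an acyclic st-digraph.

    arcs: iterable of (u, v) pairs, without the feedback arc (t, s).
    s, t: the common source and sink of the standardized network.
    """
    arcs = list(arcs)
    succ, pred = defaultdict(list), defaultdict(list)
    indeg = defaultdict(int)
    vertices = set()
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
        indeg[v] += 1
        vertices.update((u, v))

    # Topological sort (Kahn's algorithm), O(n + m).
    order = []
    queue = deque(v for v in vertices if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(vertices):
        raise ValueError("network is not acyclic")

    # n_minus[u]: number of paths from s to u (forward pass).
    n_minus = {}
    for u in order:
        n_minus[u] = 1 if u == s else sum(n_minus[v] for v in pred[u])
    # n_plus[u]: number of paths from u to t (backward pass).
    n_plus = {}
    for u in reversed(order):
        n_plus[u] = 1 if u == t else sum(n_plus[v] for v in succ[u])

    return {(u, v): n_minus[u] * n_plus[v] for u, v in arcs}
```

Python integers have arbitrary precision, so the path counts in this sketch cannot overflow; an implementation with fixed-width integers would have to handle that separately.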
The Hummon and Doreian indices are defined as follows:
• search path link count (SPLC) method: $w_l(u, v)$ equals the number of
“all possible search paths through the network emanating from an origin
node” through the arc (u, v) ∈ R.
• search path node pair (SPNP) method: $w_p(u, v)$ “accounts for all connected
vertex pairs along the paths through the arc (u, v) ∈ R”.
We get the SPLC weights by applying the SPC method on the network
obtained from a given standardized network by linking the source s by an arc
to each non-minimal vertex from U; and the SPNP weights by applying the
SPC method on the network obtained from the SPLC network by additionally
linking by an arc each non-maximal vertex from U to the sink t.
Fig. 3. 0, 1, 2 and 3 core
The values of counters n(u, v) form a flow in the citation network:
Kirchhoff's vertex law holds. For every vertex u in a standardized citation
network incoming flow = outgoing flow:
\[ \sum_{v : vRu} n(v, u) = \sum_{v : uRv} n(u, v) = n^-(u) \cdot n^+(u) \]
The weight n(t, s) equals the total flow through the network and provides a
natural normalization of weights
\[ w(u, v) = \frac{n(u, v)}{n(t, s)} \quad \Rightarrow \quad 0 \le w(u, v) \le 1 \]
and if C is a minimal arc-cutset
\[ \sum_{(u,v) \in C} w(u, v) = 1 \]
In large networks the values of weights can grow very large. This should
be considered in the implementation of the algorithms.
In Figure 2 the main subnetwork obtained as an edge-cut at level 0.001
of the citation network (n = 4470, m = 12731) on SOM (self-organizing
maps) literature is presented. The picture is exported in SVG with additional
JavaScript support that provides the user with options to inspect the
subnetwork at different predetermined levels.
3.2 Cores and generalized cores
The notion of core was introduced by Seidman in 1983 [51]. Let G = (V, E)
be a graph. A subgraph H = (W, E|W) induced by the set W is a k-core or a
core of order k iff $\forall v \in W : \deg_H(v) \ge k$, and H is a maximal subgraph with
this property. The core of maximum order is also called the main core. The
core number of vertex v is the highest order of a core that contains this vertex.
The degree deg(v) can be: in-degree, out-degree, in-degree + out-degree, etc.,
determining different types of cores.
Fig. 4. $p_S$-core at level 46 of Geomlib network
In Figure 3 an example of the cores decomposition of a given graph is presented.
From this figure we can see the following properties of cores:
• The cores are nested: $i < j \Rightarrow H_j \subseteq H_i$
• Cores are not necessarily connected subgraphs.
Our algorithm for determining the cores hierarchy is based on the following
property [16]:
If from a given graph G = (V, E) we recursively delete all vertices
of degree less than k, and the edges incident with them, the remaining
graph is the k-core.
Its outline is given in Algorithm 1. In the refinements of the algorithm we
have to provide efficient implementations of sorting the degrees and their
reordering. Since the values of degrees are in the range 0..n − 1 we can order
them in O(n) using a variant of bin sort; and the update of the ordering can
be done in constant time. For details see [18].
The cores, because they can be determined very efficiently, are one of the
few concepts that provide us with meaningful decompositions of large networks.
We expect that different approaches to the analysis of large networks
can be built on this basis.
Algorithm 1: Core Numbers Algorithm
  Input: Graph G = (V, E) represented by lists of neighbors
  Output: Table core[V] with core number for each vertex
  Compute the degrees of vertices
  Order the set of vertices V in increasing order of their degrees
  for each v ∈ V in the order do
    Set core[v] = degree[v]
    for each u ∈ adj(v) do
      if degree[u] > degree[v] then
        Set degree[u] = degree[u] − 1
        Reorder V accordingly
      end
    end
  end
For example, we get the following bound on the
chromatic number of a given graph G
\[ \chi(G) \le 1 + \mathrm{core}(G) \]
Cores can also be used to localize the search for interesting subnetworks in
large networks since: if it exists, a k-component is contained in a k-core; and
a k-clique is contained in a k-core.
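Algorithm 1 can be sketched in Python as shown below. The sketch is an assumption-laden illustration rather than the authors' code: instead of the bin sort with constant-time reordering referred to in the text, it uses a binary heap with lazy deletion, which is simpler to write but costs O(m log n) rather than O(m). The function name `core_numbers` and the dictionary input format are hypothetical, and vertex labels are assumed to be mutually comparable (for example all integers) so that heap entries can be ordered.

```python
import heapq

def core_numbers(adj):
    """Return {v: core number} for a simple undirected graph.

    adj: dict mapping each vertex to an iterable of its neighbours
         (every edge must be listed in both directions).
    """
    degree = {v: len(set(adj[v])) for v in adj}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    core = {}
    while heap:
        d, v = heapq.heappop(heap)
        if v in core or d != degree[v]:
            continue  # stale heap entry: v was re-queued with a lower degree
        core[v] = d
        for u in set(adj[v]):
            # Same test as in Algorithm 1: only neighbours whose current
            # degree exceeds that of v lose one unit of degree.
            if u not in core and degree[u] > d:
                degree[u] -= 1
                heapq.heappush(heap, (degree[u], u))
    return core
```

For a triangle with one pendant vertex attached, the pendant vertex gets core number 1 and the three triangle vertices get core number 2.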
The notion of core can be generalized to networks. Let N = (V, E, w) be a
network, where G = (V, E) is a graph and $w : E \to \mathbb{R}$ is a function assigning
values to edges. A vertex property function on N, or a p-function for short, is
a function p(v, U), v ∈ V, U ⊆ V with real values. Let $\mathrm{adj}_U(v) = \mathrm{adj}(v) \cap U$.
Besides degrees, here are some examples of p-functions:
\[ p_S(v, U) = \sum_{u \in \mathrm{adj}_U(v)} w(v, u), \quad \text{where } w : E \to \mathbb{R}_0^+ \]
\[ p_M(v, U) = \max_{u \in \mathrm{adj}_U(v)} w(v, u), \quad \text{where } w : E \to \mathbb{R} \]
\[ p_k(v, U) = \text{number of cycles of length } k \text{ through vertex } v \text{ in } (U, E|U) \]
The subgraph H = (C, E|C) induced by the set C ⊆ V is a p-core at level
$t \in \mathbb{R}$ iff ∀v ∈ C : t ≤ p(v, C) and C is a maximal such set.
The function p is monotone iff it has the property
\[ C_1 \subset C_2 \Rightarrow \forall v \in V : (p(v, C_1) \le p(v, C_2)) \]
The degrees and the functions $p_S$, $p_M$ and $p_k$ are monotone. For a monotone
function the p-core at level t can be determined, as in the ordinary case, by
successively deleting vertices with value of p lower than t; and the cores on
different levels are nested
\[ t_1 < t_2 \Rightarrow H_{t_2} \subseteq H_{t_1} \]
Fig. 5. Marriages among relatives in Ragusa
The p-function is local iff
\[ p(v, U) = p(v, \mathrm{adj}_U(v)) \]
The degrees, $p_S$ and $p_M$ are local; but $p_k$ is not local for k ≥ 4. For a local
p-function an $O(m \max(\Delta, \log n))$ algorithm for determining the p-core levels
exists, assuming that $p(v, \mathrm{adj}_C(v))$ can be computed in $O(\deg_C(v))$ [19].
In Figure 4 a $p_S$-core at level 46 of the collaboration network in the field
of computational geometry [37] is presented.
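The successive-deletion procedure for a monotone p-function can be illustrated for the weight-sum function p_S. The following Python sketch makes several assumptions that are not in the text: the function name `ps_core`, the nested-dictionary representation of the weighted graph, and the work-stack bookkeeping. It computes the p_S-core for one given level t only, and is not the algorithm of [19] for determining all core levels.

```python
def ps_core(adj_w, t):
    """Return the vertex set C of the p_S-core at level t.

    adj_w: dict v -> dict u -> w(v, u), symmetric, with non-negative weights.
    t:     the threshold level.
    """
    core = set(adj_w)
    # value[v] = p_S(v, core): sum of weights to neighbours still in the core.
    value = {v: sum(adj_w[v].values()) for v in adj_w}
    stack = [v for v in core if value[v] < t]
    while stack:
        v = stack.pop()
        if v not in core:
            continue  # already deleted through an earlier stack entry
        core.remove(v)
        for u, w in adj_w[v].items():
            if u in core:
                value[u] -= w  # p_S is local: only neighbours are affected
                if value[u] < t:
                    stack.append(u)
    return core
```

Because p_S is monotone, the order in which vertices below the threshold are deleted does not change the resulting set.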
3.3 Pattern searching
If a selected pattern determined by a given graph does not occur frequently
in a sparse network, the straightforward backtracking algorithm applied for
pattern searching finds all appearances of the pattern very fast even in the
case of very large networks.
To speed up the search or to consider some additional properties of the
pattern, a user can set some additional options:
• vertices in the network should match vertices in the pattern in some nominal,
ordinal or numerical property (for example, type of atom in a
molecule);
• values of edges must match (for example, edges representing male/female
links in the case of p-graphs [57]);
• the first vertex in the pattern can be selected only from a given subset of
vertices in the network.
Pattern searching was successfully applied to searching for patterns of atoms
in molecules (carbon rings) and searching for relinking marriages in genealogies.
Figure 5 presents three connected relinking marriages which are non-blood
marriages found in the genealogy of Ragusan noble families [28]. The
genealogy is represented as a p-graph. A solid arc indicates the is a son of
relation, and a dotted arc indicates the is a daughter of relation. In all
three patterns a brother and a sister from one family found their partners in
the same other family.
Fig. 6. Triads (types: 1 – 003, 2 – 012, 3 – 102, 4 – 021D, 5 – 021U, 6 – 021C, 7 – 111D, 8 – 111U, 9 – 030T, 10 – 030C, 11 – 201, 12 – 120D, 13 – 120U, 14 – 120C, 15 – 210, 16 – 300)
3.4 Triads
Let G = (V, R) be a simple directed graph without loops. A triad is a subgraph
induced by a given set of three vertices. There are 16 non-isomorphic
(types of) triads [55, page 244]. They can be partitioned into three basic
types (see Figure 6):
• the null triad 003;
• dyadic triads 012 and 102; and
• connected triads: 111D, 201, 210, 300, 021D, 111U, 120D, 021U, 030T,
120U, 021C, 030C and 120C.
Several properties of a graph can be expressed in terms of its triadic spectrum,
the distribution of all its triads. It also provides ingredients for p* network
models [56]. A direct approach to determine the triadic spectrum is of order
$O(n^3)$; but in most large graphs it can be determined much faster [12]. The
algorithm is based on the following observation: in a large and sparse graph
most triads are null triads. Let $T_1$, $T_2$, $T_3$ be the number of null, dyadic and
connected triads. Since the total number of triads is $T = \binom{n}{3}$ and the above
types partition the set of all triads, the idea of the algorithm is as follows:
• count all dyadic $T_2$ and all connected $T_3$ triads with their subtypes;
• compute the number of null triads $T_1 = T - T_2 - T_3$.
In the algorithm we have to assure that every non-null triad is counted exactly
once while scanning the set of arcs. A set of three vertices {v, u, w}
can in general be selected in 6 different ways (v, u, w), (v, w, u), (u, v, w),
(u, w, v), (w, v, u), (w, u, v). We solve the isomorphism problem by introducing
the canonical selection that contributes to the triadic count; the other,
non-canonical selections need not be considered in the counting process.
Every connected dyad forms a dyadic triad with every vertex to which neither
member of the dyad is adjacent. Let $\hat{R} = R \cup R^{-1}$. Each pair of vertices
(v, u), v < u connected by an arc contributes
\[ n - |\hat{R}(u) \cup \hat{R}(v) \setminus \{u, v\}| - 2 \]
triads of type 3 – 102, if u and v are connected in both directions; and
of type 2 – 012 otherwise. The condition v < u determines the canonical
selection for dyadic triads. A selection (v, u, w) of a connected triad is canonical
iff v < u < w.
The triads isomorphism problem can be efficiently solved by assigning to
each triad a code, an integer between 0 and 63 obtained by treating
the off-diagonal entries of the triad adjacency matrix as a binary number. Each
triad code corresponds to a unique triad type that can be determined from
a precomputed table.
For a connected triad we can always assume that v is the smallest of its
vertices. So we have to determine the canonical selection from the remaining
two selections (v, u, w) and (v, w, u). If v < w < u and $v \hat{R} w$ then the selection
(v, w, u) was already counted before. Therefore we have to consider it as
canonical only if $v \hat{R} w$ does not hold.
In an implementation of the algorithm we must also take care of
range overflow in the case of $T$ and $T_1$.
The total complexity of the algorithm is $O(\hat{\Delta} m)$ and thus, for graphs with
small maximum degree $\hat{\Delta} \ll n$, since $2m \le n\hat{\Delta}$, of order $O(n)$.
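The triad code and the precomputed type table mentioned above can be illustrated with a short Python sketch. The bit order of the six off-diagonal entries and the choice of the smallest code among all relabellings as the representative of a type are assumptions made for this illustration; the text does not fix either. The names `triad_code`, `canonical_code` and `TYPE_TABLE` are hypothetical.

```python
from itertools import permutations

# Assumed bit order: (v,u), (v,w), (u,v), (u,w), (w,v), (w,u), most significant first.
def triad_code(arcs, v, u, w):
    """Return the 6-bit code (0..63) of the triad induced by v, u, w.

    arcs: set of ordered pairs (a, b), one per arc of the graph.
    """
    pairs = [(v, u), (v, w), (u, v), (u, w), (w, v), (w, u)]
    code = 0
    for a, b in pairs:
        code = 2 * code + ((a, b) in arcs)
    return code

def canonical_code(code):
    """Return the smallest code among the 6 relabellings of a triad."""
    pairs = [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
    # Decode the code back into arcs on the vertices 0, 1, 2.
    arcs = {p for i, p in enumerate(pairs) if (code >> (5 - i)) & 1}
    return min(triad_code(arcs, *perm) for perm in permutations(range(3)))

# Precomputed table: triad code -> representative of its isomorphism type.
TYPE_TABLE = {code: canonical_code(code) for code in range(64)}
```

The 64 codes fall into 16 classes, matching the 16 non-isomorphic triad types; mapping each representative to a name such as 003 or 120D would be an additional lookup that this sketch leaves out.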
3.5 Triangular connectivities
In this subsection we present an extension of the notion of connectivity to
connectivity by chains of triangles.
Fig. 7. Edge-cut at level 16 of triangular network of Erdős collaboration graph
Undirected graphs
We call a triangle a subgraph isomorphic to $K_3$. A subgraph $H = (V', E')$
of G = (V, E) is triangular if each of its vertices and each of its edges belongs to at
least one triangle in H.
A sequence $(T_1, T_2, \ldots, T_s)$ of triangles of G (vertex) triangularly connects
vertices u, v ∈ V iff $u \in T_1$ and $v \in T_s$ or $u \in T_s$ and $v \in T_1$ and
$V(T_{i-1}) \cap V(T_i) \ne \emptyset$, i = 2, ..., s. Such a sequence is called a triangular chain.
It edge triangularly connects vertices u, v ∈ V iff a stronger version of the second
condition holds: $E(T_{i-1}) \cap E(T_i) \ne \emptyset$, i = 2, ..., s.
A pair of vertices u, v ∈ V is (vertex) triangularly connected iff u =
v, or there exists a chain that triangularly connects u and v. Triangular
connectivity is an equivalence relation on the set of vertices V; and nontrivial
triangular connectivity components are exactly maximal connected triangular
subgraphs.
A pair of vertices u, v ∈ V is edge triangularly connected iff u = v, or
there exists a chain that edge triangularly connects u and v. Edge triangular
connectivity components determine an equivalence relation on the set of edges
E. Each non-triangular edge is in its own component.
Let G be a simple undirected graph. A triangular network $N_T(G) = (V, E_T, w)$
determined by G is a subgraph $G_T = (V, E_T)$ of G whose set of
edges $E_T$ consists of all triangular edges of E(G). For $e \in E_T$ the weight w(e)
equals the number of different triangles in G to which e belongs.
A procedure for determining $E_T$ and w(e), $e \in E_T$ simply collects all edges
with $w(e) = |\mathrm{adj}(u) \cap \mathrm{adj}(v)| > 0$, e = {u, v} ∈ E. If the sets of neighbors
adj(v) are ordered we can use merging to compute w(e) faster. Nontrivial
triangular connectivity components are exactly the components of $G_T$.
Triangular networks can be used to efficiently identify dense clique-like
parts of a graph. If an edge e belongs to a k-clique in G then w(e) ≥ k − 2.
In Figure 7 the edge-cut at level 16 of the triangular network of the Erdős
collaboration graph [34,11] (without Erdős, n = 6926, m = 11343) is presented.
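The procedure for determining the triangular network of an undirected graph is short enough to sketch directly. In the Python sketch below the function name `triangular_network` and the set-based adjacency representation are assumptions; it uses plain set intersection rather than merging of ordered neighbour lists, and it assumes vertex labels can be compared so that each edge is visited once.

```python
def triangular_network(adj):
    """Return {(u, v): w(e)} for all triangular edges, with u < v.

    adj: dict mapping each vertex to the set of its neighbours
         (simple undirected graph, both directions listed).
    """
    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge exactly once
                w = len(adj[u] & adj[v])  # number of triangles containing {u, v}
                if w > 0:
                    weights[(u, v)] = w
    return weights
```

For the complete graph on four vertices every edge gets weight 2, in line with the bound w(e) ≥ k − 2 for edges of a k-clique.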
Directed graphs
If the graph G is mixed we replace edges with pairs of opposite arcs. In the
following let G = (V, A) be a simple directed graph without loops. For a
selected arc (u, v) ∈ A there are four different types of directed triangles:
cyclic (cyc), transitive (tra), input (in) and output (out).
For each type we get the corresponding triangular network $N_{cyc}$, $N_{tra}$,
$N_{in}$ and $N_{out}$. The procedures for determining the networks are also similar to
the undirected case. For example, for the cyclic network $N_{cyc} = (V, A_{cyc}, w_{cyc})$
we have for $(u, v) \in A_{cyc}$
\[ w_{cyc}(u, v) = |\mathrm{outadj}(v) \cap \mathrm{inadj}(u)| \]
In directed graphs we distinguish weak and strong connectivity. The weak
connectivity can be reduced to the undirected concepts in the skeleton
$S = (V, E_S)$ of the given graph G
\[ E_S = \{\{u, v\} : u \ne v \wedge (u, v) \in A\} \]
A subgraph $H = (V', A')$ of G is cyclic triangular if each of its vertices and
each of its arcs belongs to at least one cyclic triangle in H. A connected cyclic
triangular subgraph is also strongly connected.
A sequence $(T_1, T_2, \ldots, T_s)$ of cyclic triangles of G (vertex) cyclic triangularly
connects vertex u ∈ V to vertex v ∈ V iff $u \in T_1$ and $v \in T_s$ or $u \in T_s$
and $v \in T_1$ and $V(T_{i-1}) \cap V(T_i) \ne \emptyset$, i = 2, ..., s; such a sequence is called a
cyclic triangular chain. It arc cyclic triangularly connects vertex u to vertex
v iff $A(T_{i-1}) \cap A(T_i) \ne \emptyset$, i = 2, ..., s holds; such a sequence is called an arc
cyclic triangular chain.
Fig. 8. Edge-cut at level 11 of transitive network of ODLIS dictionary graph
Again, we can introduce two types of cyclic triangular connectivity:
A pair of vertices u, v ∈ V is (vertex) cyclic triangularly connected iff
u = v, or there exists a cyclic triangular chain that connects u to v.
A pair of vertices u, v ∈ V is arc cyclic triangularly connected iff u = v,
or there exists an arc cyclic triangular chain that connects u to v.
Cyclic triangular connectivity is an equivalence relation on the set of
vertices V; and the arc cyclic triangular connectivity components determine
an equivalence relation on the set of arcs A.
There also exists a parallel to unilateral connectivity. The vertex v ∈ V
is transitively triangularly reachable from the vertex u ∈ V iff u = v, or there
exists a walk from u to v in which each arc is transitive, that is, a base of some
transitive triangle.
Transitive arcs are essentially reinforced arcs. If we remove a transitive arc
from a graph G = (V, A), the reachability relation in V does not change.
In Figure 8 the edge-cut at level 11 of the transitive network of the ODLIS
dictionary graph [45] is presented.
These notions can be generalized to short cycle connectivity [20].
3.6 Generating large random networks
Let p ∈ [0, 1] be a given probability. An Erdős–Rényi random graph
$G \in \mathcal{G}(n, p)$ is obtained by selecting every edge {u, v} with a probability p:
Pr({u, v} ∈ G) = p
It is easy to write a program to do this:
  E = ∅;
  for u = 1 to n − 1 do for v = u + 1 to n do
    if random < p then E = E ∪ {{u, v}};
But for large and very sparse networks this is too slow. A faster procedure
can be built on the following idea: move by random steps over the
$M = \binom{n}{2}$ cells and mark the touched cells.
How to select the length of the random step? For our Bernoulli model, with q = 1 − p,
we have $\Pr(\mathrm{step} = s) = q^{s-1} p$, s = 1, 2, 3, ... and
\[ F(s) = \Pr(\mathrm{step} < s) = \sum_{t=1}^{s-1} q^{t-1} p = 1 - q^{s-1} \]
Therefore we get the random step s from the equation F(s) = random
\[ s = F^{-1}(\mathrm{random}) = 1 + \left\lfloor \frac{\log(1 - \mathrm{random})}{\log q} \right\rfloor \]
This is the basis of the fast random graph generation procedure presented in
Algorithm 2. The expected number of steps of this procedure is Mp.
Algorithm 2: Sparse Erdős–Rényi random graph generator
  Input: Probability p, Number of vertices n
  Output: Random graph G = (1..n, E)
  Set q = 1 − p; f = 1; u = 2; k = 0; E = ∅; M = n(n − 1)/2; again = true
  while again do
    Set k = k + 1 + ⌊ln(1 − random) / ln q⌋
    if k > M then Set again = false else
      while f < k do Set f = f + u; u = u + 1
      Set v = k + u − f − 1; E = E ∪ {{u, v}}
    end
  end
The same approach is easy to adapt to generate different types of random
graphs: undirected, directed, acyclic, undirected bipartite, directed bipartite,
acyclic bipartite, 2-mode, and others [5].
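Algorithm 2 translates almost line by line into Python. The sketch below is one possible rendering under assumptions that are not in the text: the function name `sparse_gnp`, the optional `rng` parameter, the explicit handling of the boundary cases p = 0 and p = 1 (where the logarithm of q is undefined or the step is always 1), and the use of `math.log1p` to keep log q accurate for very small p.

```python
import math
import random

def sparse_gnp(n, p, rng=None):
    """Return a list of edges (u, v), 1 <= v < u <= n, of a G(n, p) graph."""
    rng = rng or random.Random()
    m_cells = n * (n - 1) // 2          # M: number of cells (vertex pairs)
    if n < 2 or p <= 0:
        return []
    if p >= 1:
        return [(u, v) for u in range(2, n + 1) for v in range(1, u)]
    log_q = math.log1p(-p)              # ln q with q = 1 - p, negative
    edges = []
    k, f, u = 0, 1, 2                   # k: current cell; f: last cell of row u
    while True:
        r = rng.random()                # r in [0, 1), so 1 - r in (0, 1]
        # Geometric step: 1 + floor(ln(1 - r) / ln q); the ratio is >= 0.
        k += 1 + int(math.log(1.0 - r) / log_q)
        if k > m_cells:
            break
        while f < k:                    # advance to the row containing cell k
            f += u
            u += 1
        edges.append((u, k + u - f - 1))
    return edges
```

Since the cell index k only increases, no edge can be generated twice, and the loop runs once per generated edge plus one final step.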
Pajek also contains a refinement of the model for generating scale-free
networks, proposed in [47]. At each step of the growth a new vertex and k
edges are added to the network N. The endpoints of the edges are randomly
selected among all vertices according to the probability
\[ \Pr(v) = \alpha \frac{\mathrm{indeg}(v)}{|E|} + \beta \frac{\mathrm{outdeg}(v)}{|E|} + \gamma \frac{1}{|V|} \]
where α + β + γ = 1. It is easy to check that $\sum_{v \in V} \Pr(v) = 1$. The time
complexity of this procedure is O(m).
3.7 2-mode networks
A 2-mode network is a structure N = (U, V, A, w), where U and V are disjoint
sets of vertices, A is the set of arcs with the initial vertex in the set U and
the terminal vertex in the set V, and $w : A \to \mathbb{R}$ is a weight. If no weight is
defined we can assume a constant weight w(u, v) = 1 for all arcs (u, v) ∈ A.
The set A can also be viewed as a relation A ⊆ U × V. A 2-mode network
can be formally represented by a rectangular matrix $\mathbf{A} = [a_{uv}]_{U \times V}$:
\[ a_{uv} = \begin{cases} w(u, v) & (u, v) \in A \\ 0 & \text{otherwise} \end{cases} \]
For direct analysis of 2-mode networks we can use the eigenvector approach,
clustering and blockmodeling. But most often we transform a 2-mode network
into an ordinary (1-mode) network $N_1 = (U, E_1, w_1)$ and/or
$N_2 = (V, E_2, w_2)$, where $E_1$ and $w_1$ are determined by the matrix
$\mathbf{A}^{(1)} = \mathbf{A}\mathbf{A}^T$,
\[ a^{(1)}_{uv} = \sum_{z \in V} a_{uz} \, a^T_{zv} \]
Evidently $a^{(1)}_{uv} = a^{(1)}_{vu}$. There is an edge $\{u, v\} \in E_1$ in $N_1$ iff
$\mathrm{adj}(u) \cap \mathrm{adj}(v) \ne \emptyset$. Its weight is $w_1(u, v) = a^{(1)}_{uv}$. The network $N_2$ is
determined in a similar way by the matrix $\mathbf{A}^{(2)} = \mathbf{A}^T\mathbf{A}$. The networks $N_1$
and $N_2$ are analyzed using standard methods.
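The transformation of a 2-mode network into the 1-mode network N_1 can be sketched without forming the full matrix product. The Python sketch below is only an illustration: the function name `project_two_mode` and the dictionary representation of the arcs are assumptions, and the result also contains the diagonal entries of AA^T, which some analyses may want to drop.

```python
from collections import defaultdict

def project_two_mode(arcs):
    """Return {(u, x): a1_ux} with a1 = A A^T, for u, x in U.

    arcs: dict mapping (u, z) to the weight w(u, z), u in U, z in V.
    Only non-zero entries are returned; diagonal entries (u, u) are included.
    """
    # Group the arcs by their terminal vertex z in V.
    by_column = defaultdict(list)
    for (u, z), w in arcs.items():
        by_column[z].append((u, w))
    # Every pair of vertices of U sharing a neighbour z contributes
    # a_uz * a_xz to the entry (u, x) of A A^T.
    a1 = defaultdict(int)
    for members in by_column.values():
        for u, wu in members:
            for x, wx in members:
                a1[(u, x)] += wu * wx
    return dict(a1)
```

With unit weights on an (authors, papers) network, the entry for two different authors counts their joint papers and the diagonal entry counts the papers of one author.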
3.8 Normalizations
The normalization approach was developed for quick inspection of (1-mode)
networks obtained from 2-mode networks [14,60], a kind of network-based
data mining. In networks obtained from large 2-mode networks there are
often huge differences in weights. Therefore it is not possible to compare the
vertices according to the raw data. First we have to normalize the network to
make the weights comparable. There exist several ways to do this. Some
of them are presented in Table 1. They can also be used on other networks.
In the case of networks without loops we define the diagonal weights
for undirected networks as the sum of off-diagonal elements in the row (or
column)
\[ w_{vv} = \sum_{u} w_{vu} \]
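Fig. 9. GeoDeg normalization of Reuters terror news network
Table 1. Weight normalizations
\[
\begin{array}{ll}
\mathrm{Geo}_{uv} = \dfrac{w_{uv}}{\sqrt{w_{uu} w_{vv}}} &
\mathrm{GeoDeg}_{uv} = \dfrac{w_{uv}}{\sqrt{\deg_u \deg_v}} \\[2ex]
\mathrm{Input}_{uv} = \dfrac{w_{uv}}{w_{vv}} &
\mathrm{Output}_{uv} = \dfrac{w_{uv}}{w_{uu}} \\[2ex]
\mathrm{Min}_{uv} = \dfrac{w_{uv}}{\min(w_{uu}, w_{vv})} &
\mathrm{Max}_{uv} = \dfrac{w_{uv}}{\max(w_{uu}, w_{vv})} \\[2ex]
\mathrm{MinDir}_{uv} = \begin{cases} \dfrac{w_{uv}}{w_{uu}} & w_{uu} \le w_{vv} \\ 0 & \text{otherwise} \end{cases} &
\mathrm{MaxDir}_{uv} = \begin{cases} \dfrac{w_{uv}}{w_{vv}} & w_{uu} \le w_{vv} \\ 0 & \text{otherwise} \end{cases}
\end{array}
\]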
and for directed networks as some mean value of the row and column sum,
for example
\[ w_{vv} = \frac{1}{2} \Big( \sum_{u} w_{vu} + \sum_{u} w_{uv} \Big) \]
Usually we assume that the network does not contain any isolated vertex.
After a selected normalization the important parts of the network are obtained
by edge-cutting the normalized network at a selected level t and preserving
components with at least k vertices.
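As an illustration of one normalization followed by an edge-cut, the Python sketch below applies the Geo normalization of Table 1 and then keeps the edges at or above a level t. The function names `geo_normalize` and `edge_cut` and the dictionary representation are assumptions; the sketch expects positive diagonal weights to be present already and leaves out the final step of keeping only components with at least k vertices.

```python
import math

def geo_normalize(w):
    """Return the Geo-normalized off-diagonal weights.

    w: dict mapping (u, v) to w_uv; the diagonal entries (u, u) must be
       present and positive for every vertex that occurs in an edge.
    """
    return {
        (u, v): x / math.sqrt(w[(u, u)] * w[(v, v)])
        for (u, v), x in w.items()
        if u != v
    }

def edge_cut(w, t):
    """Keep only the edges whose weight is at least the level t."""
    return {e: x for e, x in w.items() if x >= t}
```

The other normalizations of Table 1 differ only in the denominator, so they could be obtained by replacing the expression under the division.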
In Figure 9 a part of ‘themes’ from Reuters terror news network [14]
determined by a cut of its GeoDeg normalization is presented.
3.9 Blockmodeling
Fig. 10. Orderings (left panel: World trade, alphabetic order; right panel: World Trade (Snyder and Kick, 1979), cores)
In Figure 10 Snyder and Kick’s world trade network is presented by
its matrix: on the left side the units (states) are ordered in the alphabetic
order of their names; on the right side they are ordered on the basis of clustering
results. It is evident that a ‘proper’ ordering can reveal a structure in
the network. Such orderings can be produced in different ways [44]. On
networks of moderate size (up to some hundreds of units) we can also use the
blockmodeling methods.
The goal of blockmodeling is to reduce a large, potentially incoherent
network to a smaller comprehensible structure that can be interpreted more
readily [6,3,7]. One of the main procedural goals of blockmodeling is to
identify, in a given network $N = (U, R)$, $R \subseteq U \times U$, clusters
(classes) of units/vertices that share structural characteristics defined in
terms of $R$. The units within a cluster have the same or similar connection
patterns to other units. They form a clustering $C = \{C_1, C_2, \ldots, C_k\}$
which is a partition of the set $U$. Each partition determines an equivalence
relation (and vice versa).
A clustering $C$ also partitions the relation $R$ into blocks
$$R(C_i, C_j) = R \cap (C_i \times C_j)$$
Each such block consists of units belonging to clusters $C_i$ and $C_j$ and
all arcs leading from cluster $C_i$ to cluster $C_j$. If $i = j$, the block
$R(C_i, C_i)$ is called a diagonal block.
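The block decomposition can be illustrated with a short Python sketch. It assumes that the relation R is given as a set of (u, v) pairs and the clustering as a list of disjoint sets; the function name `blocks` and this representation are illustrative assumptions, not part of Pajek.

```python
from itertools import product


def blocks(arcs, clustering):
    """Split the arc set R into blocks R(C_i, C_j) = R ∩ (C_i × C_j).

    Assumed representation (not Pajek's): `arcs` is a set of (u, v) pairs,
    `clustering` is a list of disjoint sets that together cover all units.
    """
    # Map every unit to the index of the cluster that contains it.
    cluster_of = {u: i for i, cluster in enumerate(clustering) for u in cluster}
    k = len(clustering)
    # One (possibly empty) block for every ordered pair of clusters.
    result = {(i, j): set() for i, j in product(range(k), repeat=2)}
    for u, v in arcs:
        result[cluster_of[u], cluster_of[v]].add((u, v))
    return result


R = {("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")}
C = [{"a", "b"}, {"c", "d"}]
B = blocks(R, C)
print(B[0, 0])  # diagonal block of the first cluster
print(B[0, 1])  # arcs leading from the first cluster to the second
```

Every arc lands in exactly one block, so the blocks form a partition of R.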
Fig. 11. Blockmodeling
A blockmodel consists of structures obtained by identifying all units from
the same cluster of the clustering C. For an exact definition of a blockmodel
we also have to be precise about which blocks produce an arc in the reduced
graph and which do not, and of what type. Some types of connections are
presented in Figure 12. The reduced graph can be represented by a relational
matrix, also called the image matrix.
Also, by reordering the network matrix so that the units from each cluster of
the optimal clustering are located together, we obtain a matrix representation
of the network with visible structure.
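The reordering step can be sketched in a few lines of Python. The sketch assumes a dense adjacency matrix stored as a list of rows and a clustering given as a list of lists of unit labels; `reorder_by_clustering` is a hypothetical helper name, not a Pajek command.

```python
def reorder_by_clustering(matrix, labels, clustering):
    """Permute rows and columns so that units of each cluster are adjacent."""
    position = {label: i for i, label in enumerate(labels)}
    # Concatenate the clusters to obtain the new order of the units.
    order = [position[u] for cluster in clustering for u in cluster]
    new_labels = [labels[i] for i in order]
    # Apply the same permutation to rows and to columns.
    new_matrix = [[matrix[i][j] for j in order] for i in order]
    return new_labels, new_matrix


labels = ["a", "b", "c", "d"]
matrix = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
clustering = [["a", "c"], ["b", "d"]]
new_labels, new_matrix = reorder_by_clustering(matrix, labels, clustering)
for label, row in zip(new_labels, new_matrix):
    print(label, row)
```

In this toy example the reordered matrix is block diagonal: all nonzero entries fall inside the two diagonal blocks.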
How to determine an appropriate blockmodel? Blockmodeling can be
formulated as a clustering problem $(\Phi, P)$ as follows:
Determine the clustering $C^* \in \Phi$ for which
$$P(C^*) = \min_{C \in \Phi} P(C)$$
Since the set of units $U$ is finite, the set of feasible clusterings $\Phi$
is also finite. Therefore the set $\mathrm{Min}(\Phi, P)$ of all solutions of
the problem (optimal clusterings) is not empty. In theory, the set
$\mathrm{Min}(\Phi, P)$ can be determined by complete search, but it turns out
that most cases of the clustering problem are NP-hard. Blockmodeling problems
are usually solved using local optimization methods based on moving a unit
from one cluster to another or interchanging two units between two clusters.
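A minimal sketch of such a local optimization, written in Python, is given below. It is not Pajek's implementation: the function name, the random initial clustering, and the rule of accepting the first improving transformation are all assumptions of the sketch, and the criterion function P is passed in as a parameter. The procedure stops in a local minimum, which need not be a global one.

```python
import random


def local_optimization(units, k, criterion, seed=0):
    """Relocation heuristic: accept any move or interchange that lowers P.

    `criterion` maps a clustering (a list of k sets) to a number.
    All names are illustrative; requires len(units) >= k.
    """
    rng = random.Random(seed)
    units = list(units)
    shuffled = units[:]
    rng.shuffle(shuffled)
    # Random initial clustering with k non-empty clusters.
    assign = {u: i % k for i, u in enumerate(shuffled)}

    def clusters(a):
        return [{u for u in units if a[u] == c} for c in range(k)]

    best = criterion(clusters(assign))
    improved = True
    while improved:
        improved = False
        # Transformation 1: move a unit from one cluster to another.
        for u in units:
            for c in range(k):
                size = sum(1 for w in units if assign[w] == assign[u])
                if c == assign[u] or size == 1:  # keep clusters non-empty
                    continue
                trial = dict(assign)
                trial[u] = c
                value = criterion(clusters(trial))
                if value < best:
                    assign, best, improved = trial, value, True
        # Transformation 2: interchange two units from different clusters.
        for i, u in enumerate(units):
            for v in units[i + 1:]:
                if assign[u] == assign[v]:
                    continue
                trial = dict(assign)
                trial[u], trial[v] = assign[v], assign[u]
                value = criterion(clusters(trial))
                if value < best:
                    assign, best, improved = trial, value, True
    return clusters(assign), best
```

Because every accepted transformation strictly lowers the criterion and the set of clusterings is finite, the loop terminates. In practice the procedure is usually repeated from several initial clusterings (different `seed` values here) and the best result is kept.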
One of the possible ways of constructing a criterion function that directly
reflects the considered equivalence is to measure the fit of a clustering to
an ideal one with perfect relations within each cluster and between clusters
according to the considered equivalence.
Fig. 12. Block Types
Given a clustering $C = \{C_1, C_2, \ldots, C_k\}$, let $\mathcal{B}(C_u, C_v)$
denote the set of all ideal blocks corresponding to block $R(C_u, C_v)$. Then
the global error of clustering $C$ can be expressed as
$$P(C) = \sum_{C_u, C_v \in C} \; \min_{B \in \mathcal{B}(C_u, C_v)} d(R(C_u, C_v), B)$$
where the term $d(R(C_u, C_v), B)$ measures the difference (error) between the
block $R(C_u, C_v)$ and the ideal block $B$. The function $d$ is constructed on
the basis of characterizations of types of blocks, and it has to be compatible
with the selected type of equivalence. In determining the block error, we also
determine the type of the best fitting ideal block (the types are ordered).
The criterion function $P(C)$ is sensitive iff $P(C) = 0 \Leftrightarrow C$
determines an exact blockmodeling. For all presented block types sensitive
criterion functions can be constructed. Once a clustering $C$ and types of
blocks are determined, we can also compute the values of connections by using
averaging rules.
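As an illustration, the following Python sketch evaluates P(C) for the simplest case in which the only ideal block types are the null block and the complete block (roughly the structural equivalence setting). Several choices are assumptions of the sketch rather than statements from the text: d counts the cells that deviate from the ideal block, loops are ignored, and ties between the two types are resolved in favour of the null block.

```python
def block_error(arcs, rows, cols):
    """Return (type, error) of the best fitting ideal block for R(rows, cols)."""
    # Loops are ignored; this is a simplifying assumption of the sketch.
    cells = [(u, v) for u in rows for v in cols if u != v]
    ones = sum((u, v) in arcs for u, v in cells)
    null_error = ones                   # each arc deviates from a null block
    complete_error = len(cells) - ones  # each missing arc deviates from a complete block
    # Block types are ordered: on ties the null type is preferred (assumption).
    if null_error <= complete_error:
        return "null", null_error
    return "complete", complete_error


def criterion(arcs, clustering):
    """Global error P(C): sum of minimal block errors over all cluster pairs."""
    return sum(block_error(arcs, cu, cv)[1]
               for cu in clustering for cv in clustering)


R = {("a", "b"), ("b", "a"), ("a", "c"), ("b", "c")}
print(criterion(R, [{"a", "b"}, {"c"}]))  # a and b have identical patterns
print(criterion(R, [{"a", "c"}, {"b"}]))  # a and c do not
```

With this d the criterion is zero exactly when every block is either empty or complete (off the diagonal), which matches the notion of a sensitive criterion function described above.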
In Figure 13 a symmetric-acyclic (edge connected inside clusters, acyclic
reduced graph) blockmodel [27] of Student Government at the University of
Ljubljana [35] is presented. The obtained clustering into 4 clusters is almost
exact. The only error is produced by the arc (a3, m5).
Fig. 13. A Symmetric Acyclic Blockmodel of Student Government
4 Implementation
4.1 Data structures
In Pajek analysis and visualization are performed using 6 data types:
• network (graph),
• partition (nominal or ordinal properties of vertices),
• vector (numerical properties of vertices),
• cluster (subset of vertices),
• permutation (reordering of vertices, ordinal properties), and
• hierarchy (general tree structure on vertices).
In the near future we intend to extend this list with support for multiple
networks and partitions of edges.
The power of Pajek is based on several transformations that support
different transitions among these data structures. The menu structure of
Pajek’s main window (see Figure 14) is also based on them. The main
window uses a ‘calculator’ paradigm with a list-accumulator for each data type.
Operations are performed on the currently active (selected) data, and
their results are also returned through the accumulators.
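To make the idea of transitions among data types concrete, here is a small Python sketch. It assumes that vertices are numbered 1..n and that a partition and a vector are stored as plain lists indexed by vertex; the three function names are hypothetical and do not correspond to Pajek menu commands.

```python
# Vertices are numbered 1..n (an assumption of this sketch).
partition = [1, 1, 2, 2, 3]          # class of each vertex (nominal property)
vector = [0.5, 2.0, 1.5, 0.1, 0.9]   # numerical property of each vertex


def partition_to_cluster(partition, cls):
    """Cluster: the subset of vertices that belong to class `cls`."""
    return {v for v, c in enumerate(partition, start=1) if c == cls}


def vector_to_permutation(vector):
    """Permutation: vertices reordered by increasing vector value."""
    return sorted(range(1, len(vector) + 1), key=lambda v: vector[v - 1])


def vector_to_partition(vector, thresholds):
    """Partition: class = number of thresholds that the value reaches."""
    return [sum(x >= t for t in thresholds) for x in vector]


print(partition_to_cluster(partition, 2))
print(vector_to_permutation(vector))
print(vector_to_partition(vector, [1.0]))
```

Each function takes an object of one type and returns an object of another, so results can be chained, in the spirit of the calculator paradigm described above.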
The values of vectors can be used to determine several elements of network
display such as: X, Y, Z coordinates and the size of the vertex shape. The
partition can be graphically represented by the color and shape of vertices.
Also the values of edges can be represented by the thickness and/or color.
Fig. 14. Pajek’s Main Window
4.2 Implemented algorithms
In Pajek, besides the algorithms described in section 3, several known efficient
algorithms are implemented, such as:
• simplifications and transformations: deleting loops, multiple edges,
transforming arcs to edges, etc.;
• components: strong, weak, biconnected, symmetric;
• decompositions: symmetric-acyclic, hierarchical clustering;
• paths: shortest path(s), all paths between two vertices;
• flows: maximum flow between two vertices;
• neighborhood: k-neighbours;
• CPM – critical paths;
• social network algorithms: centrality measures, hubs and authorities,
measures of prestige, brokerage roles, structural holes, diffusion
partitions;
• measures of dependencies among partitions / vectors: Cramer’s V,
Spearman rank correlation coefficient, Pearson correlation coefficient,
Rajski coefficient;
• extracting subnetworks;
• shrinking clusters in a network (generalized blockmodeling);
• reordering: topological ordering, Richards’s numbering, Murtagh’s
seriation and clumping algorithms, depth/breadth first search.
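As an example of one operation from this list, shrinking clusters can be sketched in Python as follows. The sketch assumes that the network is a list of arcs and that the partition is a dictionary from vertices to cluster names; dropping the arcs inside a cluster (which would become loops) and using arc counts as values are choices made here, not a description of Pajek's options.

```python
from collections import Counter


def shrink(arcs, cluster_of):
    """Shrink every cluster to a single vertex and count arcs between clusters."""
    reduced = Counter()
    for u, v in arcs:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:  # arcs inside a cluster would become loops; dropped here
            reduced[cu, cv] += 1
    return reduced


arcs = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 1)]
cluster_of = {1: "A", 2: "A", 3: "B", 4: "B"}
reduced = shrink(arcs, cluster_of)
for (cu, cv), count in sorted(reduced.items()):
    print(cu, "->", cv, count)
```

The result is a small valued network on the clusters, which is the global view of the original network.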
Pajek also contains some data analysis procedures which have higher order
time complexities and can therefore be used only on smaller networks, or on
selected parts of large networks: hierarchical clustering, generalized
blockmodeling, partitioning signed graphs [26], TSP (Traveling Salesman
Problem), computing geodesics matrices, etc.
The procedures are available through the main window menus. Frequently
used sequences of operations can be defined as macros. This also allows
Pajek to be adapted to groups of users from different areas (social networks,
chemistry, genealogy, computer science, mathematics, ...) for specific tasks.
4.3 Layout Algorithms and Layout Features
Special emphasis is given in Pajek to automatic generation of network
layouts. Several standard algorithms for automatic graph drawing are
implemented: spring embedders (Kamada-Kawai and Fruchterman-Reingold),
layouts determined by eigenvectors (Lanczos algorithm), drawing in layers
(genealogies and other acyclic structures), fisheye views and block (matrix)
representation.
These algorithms were modified and extended to enable additional
options: drawing with constraints (optimization of the selected part of the
network, fixing some vertices to predefined positions, using values of edges as
similarities or dissimilarities), drawing in 3D space. Pajek also provides tools
for manual editing of graph layout.
Properties of vertices/edges (given as data or computed) can be
represented using colors, sizes and/or shapes of vertices/edges.
Pajek also supports drawing sequences of networks in its Draw window,
and exports sequences of networks in suitable formats that can be examined
with special 2D or 3D viewers (e.g., SVG and Mage). Pictures in SVG can
be further controlled using support written in JavaScript.
4.4 Interfaces
Pajek also supports some non-native input formats: UCINET DL files [53];
Vega graph files [54]; chemical MDL MOL [41] and BS; and genealogical
GEDCOM [30].
The layouts can be exported in the following output graphic formats that
can be examined by special 2D and 3D viewers: Encapsulated PostScript
(EPS) [31], Scalable Vector Graphics (SVG) [1], VRML [24], MDL MOL/chime
[41], and Kinemages (Mage) [49].
The main window menu Tools provides export of Pajek’s data to the
statistical program R [48,21]. In the Tools menu, the user can prepare calls to
her/his favorite viewers and other tools. It is also possible to run Pajek
(+macros) from other programs (R, Ucinet, and others).
5 Examples
Several examples of applications of Pajek were already presented as
illustrations while describing selected algorithms.
In Figure 15 a 3D layout of a graph obtained using eigenvectors is
presented.
In Figure 16 a snapshot, displayed in a VRML viewer, of the 3D layout of our
drawing of graph A from the Graph Drawing Contest 1997 is presented [33].
Fig. 15. 3D layout obtained using eigenvectors
6 Software
6.1 Architecture
Pajek is implemented in Delphi and runs on Windows operating systems.
On our to-do list we have: support for the GraphML format, implementing
Pajek on Unix, and replacing macros by a JavaScript(?) based network
scripting language.
6.2 Availability
Pajek is still under development. The latest version is freely available, for
non-commercial use, at its home page:
http://vlado.fmf.uni-lj.si/pub/networks/pajek/
Fig. 16. GD’97 contest graph A in VRML
References
1. Adobe SVG Viewer (2002) http://www.adobe.com/svg/viewer/install/
2. Batagelj V. (1986) Graph – data structure and algorithms in Pascal. Research
report.
3. Batagelj V. (1997) Notes on blockmodeling. Social Networks, 19, 143–155.
4. Batagelj V. (2002) Efficient Algorithms for Citation Network Analysis.
5. Batagelj V., Brandes U. (2002) Fast generation of large sparse random graphs.
In preparation.
6. Batagelj V., Doreian P., Ferligoj A. (1992) An Optimizational Approach
to Regular Equivalence. Social Networks, 14, 121–135.
7. Batagelj V., Ferligoj A. (2000) Clustering relational data. Data Analysis (eds.:
W. Gaul, O. Opitz, M. Schader), Springer, Berlin, 3–15.
8. Batagelj V., Mrvar A. (1995) Towards NetML Networks Markup Language.
Presented at International Social Network Conference, London, July 6–10, 1995.
http://www.ijp.si/ftp/pub/preprints/ps/95/trp9515.ps
9. Batagelj V., Mrvar A. (1991–94) Programs for Network Analysis.
http://vlado.fmf.uni-lj.si/pub/networks/
10. Batagelj V., Mrvar A. (1998) Pajek – A Program for Large Network Analysis.
Connections, 21 (2), 47–57.
11. Batagelj V., Mrvar A. (2000) Some Analyses of Erdős Collaboration Graph.
Social Networks, 22, 173–186.
12. Batagelj V., Mrvar A. (2001) A Subquadratic Triad Census Algorithm for Large
Sparse Networks with Small Maximum Degree. Social Networks, 23, 237–243.
13. Batagelj V., Mrvar A. (2002) Pajek – Analysis and Visualization of Large
Networks. In: Mutzel P., Jünger M., Leipert S. (Eds.) GD’01, Vienna, Austria,
September 23–26, 2001. LNCS 2265. Springer-Verlag, 477–478.
14. Batagelj V., Mrvar A. (2002) Density based approaches to Reuters terror news
network analysis. Submitted.
15. Batagelj V., Pisanski T. (1989) Xgraph project documentation.
16. Batagelj V., Mrvar A., Zaveršnik M. (1999) Partitioning Approach to
Visualization of Large Graphs. In: Kratochvil J. (Ed.) GD’99, Štiřín Castle,
Czech Republic. LNCS 1731. Springer-Verlag, 90–97.
17. Batagelj V., Mrvar A., Zaveršnik M. (2002) Network analysis of texts. Language
Technologies, Ljubljana, p. 143–148.
18. Batagelj V., Zaveršnik M. (2001) An O(m) Algorithm for Cores Decomposition
of Networks. Submitted.
19. Batagelj V., Zaveršnik M. (2002) Generalized Cores. Submitted.
http://arxiv.org/abs/cs.DS/0202039
20. Batagelj V., Zaveršnik M. (2002) Triangular connectivity and its
generalizations. In preparation.
21. Butts C.T. (2002) sna: Tools for Social Network Analysis.
http://cran.at.r-project.org/src/contrib/PACKAGES.html#sna
22. Caida: Internet Visualization Tool Taxonomy.
http://www.caida.org/tools/taxonomy/visualization/
23. Cormen T.H., Leiserson C.E., Rivest R.L., Stein C. (2001) Introduction to
Algorithms, Second Edition. MIT Press.
24. Cosmo Player (2002) http://ca.com/cosmo/
25. de Nooy W., Mrvar A., Batagelj V. (2002) Exploratory Social Network Analysis
With Pajek. To be published by Cambridge University Press.
26. Doreian P., Mrvar A. (1996) A Partitioning Approach to Structural Balance.
Social Networks, 18, 149–168.
27. Doreian P., Batagelj V., Ferligoj A. (2000) Symmetric-acyclic decompositions
of networks. J. classif., 17(1), 3–28.
28. Dremelj P., Mrvar A., Batagelj V. (2002) Analiza rodoslova dubrovačkog
vlasteoskog kruga pomoću programa Pajek. Anali Dubrovnik XL, HAZU, Zagreb,
Dubrovnik, 105–126 (in Croatian).
29. Garfield E., Sher I.H., Torpie R.J. (1964) The Use of Citation Data in Writing
the History of Science. Philadelphia: The Institute for Scientific Information,
December 1964.
http://www.garfield.library.upenn.edu/papers/useofcitdatawritinghistofsci.pdf
30. GEDCOM 5.5.
http://homepages.rootsweb.com/~pmcbride/gedcom/55gctoc.htm
31. Ghostscript, Ghostview and GSview. http://www.cs.wisc.edu/~ghost/
32. Gibbons A. (1985) Algorithmic Graph Theory. Cambridge University Press.
33. Graph Drawing Contest 1997. http://vlado.fmf.uni-lj.si/pub/gd/gd97.htm
34. Grossman J. (2002) The Erdős Number Project.
http://www.oakland.edu/~grossman/erdoshp.html
35. Hlebec V. (1993) Recall versus recognition: Comparison of two alternative
procedures for collecting social network data. Metodološki zvezki 9, Ljubljana:
FDV, 121–128.
36. Hummon N.P., Doreian P. (1989) Connectivity in a citation network: The
development of DNA theory. Social Networks, 11, 39–63.
37. Jones B. (2002) Computational geometry database.
http://compgeom.cs.uiuc.edu/~jeffe/compgeom/biblios.html
38. Kleinberg J. (1998) Authoritative sources in a hyperlinked environment. In
Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, p. 668–677.
http://www.cs.cornell.edu/home/kleinber/auth.ps
http://citeseer.nj.nec.com/kleinberg97authoritative.html
39. Knuth D.E. (1993) The Stanford GraphBase. Stanford University, ACM Press,
New York. ftp://labrea.stanford.edu/pub/sgb/
40. Mahnken I. (1960) Dubrovački patricijat u XIV veku. Beograd, Naučno delo.
41. MDL Information Systems, Inc. (2002) http://www.mdli.com/
42. James Moody home page (2002) http://www.soc.sbs.ohio-state.edu/jwm/
43. Mrvar A., Batagelj V. (2000) Relational Calculator – a tool for analyzing social
networks. Metodološki zvezki 16, FDV, Ljubljana, 63–76.
44. Murtagh F. (1985) Multidimensional Clustering Algorithms. Compstat
lectures, 4, Vienna: Physica-Verlag.
45. ODLIS (2002) Online dictionary of library and information science.
http://vax.wcsu.edu/library/odlis.html
46. Pajek’s datasets. http://vlado.fmf.uni-lj.si/pub/networks/data/
47. Pennock D.M. et al. (2002) Winners don’t take all. PNAS, 99/8, 5207–5211.
48. The R Project for Statistical Computing. http://www.r-project.org/
49. Richardson D.C., Richardson J.S. (2002) The Mage Page.
http://kinemage.biochem.duke.edu/index.html
50. Scott J. (2000) Social Network Analysis: A Handbook, 2nd edition. London:
Sage Publications.
51. Seidman S.B. (1983) Network structure and minimum degree. Social Networks,
5, 269–287.
52. Tarjan R.E. (1983) Data Structures and Network Algorithms. Society for
Industrial and Applied Mathematics, Philadelphia, Pennsylvania.
53. UCINET (2002) http://www.analytictech.com/
54. Project Vega (2002) http://vega.ijp.si/
55. Wasserman S., Faust K. (1994) Social Network Analysis: Methods and
Applications. Cambridge University Press, Cambridge.
56. Wasserman S., Pattison P. (1996) Logit models and logistic regressions for
social networks: I. An introduction to Markov graphs and p*. Psychometrika,
60, 401–426. http://kentucky.psych.uiuc.edu/pstar/index.html
57. White D.R., Batagelj V., Mrvar A. (1999) Analyzing Large Kinship and
Marriage Networks with Pgraph and Pajek. Social Science Computer Review, 17
(3), 245–274.
58. Wilson R.J., Watkins J.J. (1990) Graphs: An Introductory Approach. New
York: John Wiley and Sons.
59. Ho Y. et al. (2002) Systematic identification of protein complexes in
Saccharomyces cerevisiae by mass spectrometry. Nature, vol. 415, 180–183.
http://www.mshri.on.ca/tyers/pdfs/proteome.pdf
60. Zaveršnik M., Batagelj V., Mrvar A. (2002) Analysis and visualization of
2-mode networks. Proceedings of Sixth Austrian, Hungarian, Italian and
Slovenian Meeting of Young Statisticians, October 5–7, 2001, Ossiach, Austria.
University of Klagenfurt, p. 113–123.
0
mixed) networks Pajek supports also: • 2mode networks. University of Ljubljana. We started the development of Pajek in November 1996. In Slovenian language pajek means spider.
This work was partially supported by the Ministry of Education. (people. Energ [9]. (people.Pajek Analysis and Visualization of Large Networks
Vladimir Batagelj1 and Andrej Mrvar2
1
2
Department of Mathematics. is member of). Besides ordinary (directed. for Windows. Slovenia
1
Introduction
Pajek is a program. institutions. shrink vertices in clusters and show relations among clusters (global view). extract vertices that belong to the same clusters and show them separately. possibly with the parts of the context (detailed local view). bipartite (valued) graphs – networks between two disjoint sets of vertices. The main goals in the design of Pajek are [10. Examples of such networks are: (authors. Slovenia Faculty of Social Sciences. Faculty of Mathematics and Physics. cores. collection of network analysis and visualization programs STRAN. etc. and SGMLbased graph description markup language NetML [8]. With Pajek we can (see Figure 1): ﬁnd clusters (components. • to implement a selection of eﬃcient (subquadratic) algorithms for analysis of large networks.
The design of Pajek is based on experiences gained in development of graph data structure and algorithms libraries Graph [2] and Xgraph [15]. • to provide the user with some powerful visualization tools. for analysis and visualization of large networks having some ten or houndred of thousands of vertices. Projects J18532 and Z53350. papers. neighbourhoods of ‘important’ vertices.) in a network. is the (co)author of the paper). University of Ljubljana.13]: • to support abstraction by (recursive) decomposition of a large network into several smaller networks that can be treated further using more sophisticated methods. (authors. Science and Sport of Slovenia. undirected. papers. was present at). events. cites the paper). is on the list). shoping lists. RelCalc. (articles. Draw.
.
The central. In last months (end of 2002) we had over 500 downloads of Pajek per month. datamining (2mode networks) [14]. Penn State.2
Vladimir Batagelj and Andrej Mrvar
Fig. Pajek provides tools for analysis and visualization of such networks and is applied by researchers in diﬀerent areas: social network analysis [11]. Stanford. citation networks [42]. genealogies [57. small networks. etc.
2
Applications
There exist several sources of large networks that are already in machinereadable form.28]. especially visualization of. The Ohio State University. Rotterdam. Approaches to deal with large networks
• temporal networks. 1. biomedical/genomics research (proteinreceptor interaction networks) [59]. algorithmic section of this chapter deals mainly with diﬀerent eﬃcient approaches to this problem. Internet networks [22]. Irvine. dynamic graphs – networks changing over time. Vienna. In this chapter we present the main characteristics of Pajek. Pajek is also used at several universities: Ljubljana. Since large networks can’t be visualized in details in a single view we have ﬁrst to identify interesting substructures in such network and then visualize them as separate views. analysis of texts [17]. diﬀusion networks (AIDS.
. Although it was developed primarily for analysis of large networks it is often used also for. news). Wisconsin/Madison. chemistry (organic molecule).
1 Citation weights
In a given set of units/vertices U (articles. eﬃcient algorithms. had to be developed. we add also the ‘feedback’ arc (t. Together with Wouter de Nooy from University of Rotterdam we wrote a course book Exploratory Social Network Analysis With Pajek[25]. etc. 3. For two of these indices we developed algorithms to eﬃciently compute them [4]. to make the theory smoother. A citing relation is usually irreﬂexive (no loops) and (almost) acyclic. where n− (u) = 1
v:vRu
(u. For other approaches see [4]. Thus we get a stdigraph [TF 2.2]. books. and n+ (v) counts the number of diﬀerent paths from v to t. It follows by basic principles of combinatorics that n(u. v). s).Pajek. In the following we shall assume that it has these two properties. and all maximal elements of R are linked to the sink t. works. They mainly provide diﬀerent ways to identify interesting substructures in a given network. Finally. [29]. The search path count (SPC) method is based on counters n(u.
3
Algorithms
To support the design goals we implemented several algorithms known from the literature (see section 4. Madrid. To compute n(u.) we introduce a citing relation/set of arcs R ⊆ U × U uRv ≡ v cites u which determines a citation network N = (U. v) that count the number of diﬀerent paths from s to t through the arc (u. v) = n− (u) · n+ (v). In 1989 Hummon and Doreian [36] proposed three indices – weights of arcs that provide us with automatic way to identify the (most) important part of the citation network. but for some tasks new. It is also useful to transform a citation network to its standardized form by adding a common source vertex s ∈ U and a common sink vertex t ∈ U . The source / / s is linked by an arc to all minimal elements of R.2). v) ∈ R u=s otherwise
n− (v)
. suitable to deal with large networks. The citation network analysis started in 1964 with the paper of Garﬁeld et al. v) we introduce two auxiliary quantities: n− (v) counts the number of diﬀerent paths from s to v. R). Analysis and Visualization of Large Networks
3
Freiburg. Since in reallife citation networks the strong components are small (usually 2 or 3 vertices) we can transform such network into an acyclic network by shrinking strong components and deleting loops. and some others as a support in courses on network analysis.
v) ∈ R. • search path node pair (SPNP) method: wp (u. Part of SOM main subnetwork at level 0. v) equals the number of “all possible search paths through the network emanating from an origin node” through the arc (u.4
Vladimir Batagelj and Andrej Mrvar
Fig. 2. We get the SPLC weights by applying the SPC method on the network obtained from a given standardized network by linking the source s by an arc
. v) ∈ R”. v) – after the topological sort [TF 2.001
and n+ (u) =
1
+ v:uRv n (v)
u=t otherwise
This is the basis of an eﬃcient algorithm for computing n(u. v) “accounts for all connected vertex pairs along the paths through the arc (u. using the above relations in topological order. The topological order ensures that all the quantities in the right sides of the above equalities are already computed when needed.2] of the stdigraph we can compute. The Hummon and Doreian indices are deﬁned as follows: • search path link count (SPLC) method: wl (u. the weights in time of order O(m). m = R.
The picture is exported in SVG with additional Javascript support that provides the user with options to inspect the subnetwork at diﬀerent predetermined levels. u) =
v:vRu v:uRv
n(u. Let G = (V. 1.001 of the citation network (n = 4470. s) ⇒ 0 ≤ w(u. 2 and 3 core
to each nonminimal vertex from U . v) ≤ 1
and if C is a minimal arccutset w(u. v) = n− (u) · n+ (u)
The weight n(t. v) n(t. In Figure 2 the main subnetwork obtained as an edgecut at level 0. 3. A subgraph H = (W. and H is a maximal subgraph with this property.v)∈C
In large networks the values of weights can grow very large. and the SPNP weights by applying the SPC method on the network obtained from the SPLC network by additionally linking by an arc each nonmaximal vertex from U to the sink t.2 Cores and generalized cores
The notion of core was introduced by Seidman in 1983 [51]. The
. 0. v) = n(u. v) = 1
(u. EW ) induced by the set W is a kcore or a core of order k iﬀ ∀v ∈ W : degH (v) ≥ k. This should be considered in the implementation of the algorithms.Pajek. Analysis and Visualization of Large Networks
5
Fig. The values of counters n(u. The core of maximum order is also called the main core. s) equals to the total ﬂow through network and provides a natural normalization of weights w(u. E) be a graph. v) form a ﬂow in the citation network – the Kirchoﬀ ’s vertex law holds: For every vertex u in a standardized citation network incoming ﬂow = outgoing ﬂow : n(v. m = 12731) on SOM (selforganizing maps) literature is presented. 3.
In Figure 3 an example of cores decomposition of a given graph is presented.Welzl J. and the update of the ordering can be done in a constant time..Liotta
D.Matousek C. pS core at level 46 of Geomlib network
core number of vertex v is the highest order of a core that contains this vertex.diBattista R.Janardan J.Yvinec P.Overmars M.Mitchell
I.Boissonnat O.Bern D. E) we recursively delete all vertices. From this ﬁgure we can see the following properties of cores: • The cores are nested: i < j =⇒ Hj ⊆ Hi • Cores are not necessarily connected subgraphs.Teillaud P.Halperin M.deBerg O.Arkin
M. The degree deg(v) can be: indegree.Majhi J.vanKreveld M.Guibas H.Schwarzkopf G.Gupta R.Vismara
M. 4.Urrutia
J.Eppstein
J.Preparata J.Edelsbrunner M. We expect that diﬀerent approaches to the analysis of large networks
.Pollack J.Bose M. the remaining graph is the kcore.Dobkin
S. Its outline is given in Algorithm 1.Klein
Fig.Schwerdt
J.Devillers M.Yap
F.Vitter
J.Garg
L.Seidel B.Aronov L.Tollis
A. Our algorithm for determining the cores hierarchy is based on the following property [16]: If from a given graph G = (V. Since the values of degrees are in the range 0.Czyzowicz
C. determining diﬀerent types of cores.Icking R. In the reﬁnements of the algorithm we have to provide eﬃcient implementations of sorting the degrees and their reordering.Tamassia G. are one among few concepts that provide us with meaningful decompositions of large networks.Sharir R.Snoeyink P.Chazelle R.n − 1 we can order them in O(n) using a variant of bin sort. indegree + outdegree.6
Vladimir Batagelj and Andrej Mrvar
E. etc.Smid J. outdegree.Pach E.Goodrich
G. and edges incident with them. For details see [18].Hershberger
B.O’Rourke J. because they can be determined very eﬃciently..Agarwal D. of degree less than k.Toussaint M.Suri
J. The cores.
E. u). pM and pk are monotone. For example: we get the following bound on the chromatic number of a given graph G χ(G) ≤ 1 + core(G) Cores can also be used to localize the search for interesting subnetworks in large networks since: if it exists. E) represented by lists of neighbors Output : Table core[V ] with core number for each vertex Compute the degrees of vertices Order the set of vertices V in increasing order of their degrees for each v ∈ V in the order do Set core[v] = degree[v] for each u ∈ adj(v) do if degree[u] > degree[v] then Set degree[u] = degree[u] − 1 Reorder V accordingly end end end
can be built on this basis. For a monotone function the pcore at level t can be determined. U ). C) and C is a maximal such set. and the cores on diﬀerent levels are nested t 1 < t 2 ⇒ H t2 ⊆ H t1
. where G = (V.Pajek. EC) induced by the set C ⊆ V is a pcore at level t ∈ IR iﬀ ∀v ∈ C : t ≤ p(v. where w : E → IR
pk (v. and a kclique is contained in a kcore. E) is a graph and w : E → IR is a function assigning values to edges. v ∈ V . here are some examples of pfunctions: pS (v. Analysis and Visualization of Large Networks
7
Algorithm 1: Core Numbers Algorithm
Input : Graph G = (V. or a pfunction for short. U ) =
u∈adjU (v) + w(v. The notion of core can be generalized to networks. Let adjU (v) = adj(v) ∩ U . Let N = (V. is a function p(v. as in the ordinary case. where w : E → IR0
pM (v. U ) =
u∈adjU (v)
max
w(v. A vertex property function on N. by successively deleting vertices with value of p lower than t. C2 )) The degrees and the functions pS . u). U ⊆ V with real values. a kcomponent is contained in a kcore. U ) = number of cycles of length k through vertex v in (U. Besides degrees. EU ) The subgraph H = (C. w) be a network. The function p is monotone iﬀ it has the property C1 ⊂ C2 ⇒ ∀v ∈ V : (p(v. C1 ) ≤ p(v.
a user can set some additional options: • vertices in network should match with vertices in pattern in some nominal. For a local pfunction an O(m max(∆. • values of edges must match (for example.3 Pattern searching
If a selected pattern determined by a given graph does not occur frequently in a sparse network the straightforward backtracking algorithm applied for pattern searching ﬁnds all appearences of the pattern very fast even in the case of very large networks. Figure 5 presents three connected relinking marriages which are nonblood marriages found in the genealogy of ragusan noble families [28]. adjU (v)) The degrees. In Figure 4 a pS core at level 46 of the collaboration network in the ﬁeld of computational geometry [37] is presented. pS and pM are local. The
. adjC (v)) can be computed in O(degC (v)) [19]. ordinal or numerical property (for example. To speed up the search or to consider some additional properties of the pattern. Pattern searching was successfully applied to searching for patterns of atoms in molecula (carbon rings) and searching for relinking marriages in genealogies. 3. assuming that p(v. but pk is not local for k ≥ 4. 5. edges representing male/female links in the case of pgraphs [57]). • the ﬁrst vertex in the pattern can be selected only from a given subset of vertices in the network. type of atom in molecula). Marriages among relatives in Ragusa
The pfunction is local iﬀ p(v. U ) = p(v.8
Vladimir Batagelj and Andrej Mrvar
Michael/Zrieva/ Francischa/Georgio/ Junius/Georgio/ Anucla/Zrieva/ Nicola/Ragnina/ Nicoleta/Zrieva/ Marinus/Zrieva/ Maria/Ragnina/
Damianus/Georgio/ Legnussa/Babalio/ Nicolinus/Gondola/ Franussa/Bona/
Junius/Zrieva/ Margarita/Bona/
Lorenzo/Ragnina/ Slavussa/Mence/
Sarachin/Bona/ Nicoletta/Gondola/
Marin/Gondola/ Magdalena/Grede/
Marinus/Bona/ Phylippa/Mence/
Fig. log n)) algorithm for determining the pcore levels exists.
012
3 .
.030C
11 . Analysis and Visualization of Large Networks
9
1 . Triads
genealogy is represented as a pgraph.111U
9 .120D
13 . and • connected triads: 111D. 021C. 300. A triad is a subgraph induced by a given set of three vertices. 111U.021C
7 .030T
10 . A solid arc indicates the is a son of relation. • dyadic triads 012 and 102. 021U.Pajek. 201. 3. and a dotted arc indicates the is a daughter of relation.210
16 .120C
15 .300
Fig. 021D. In all three patterns a brother and a sister from one family found their partners in the same other family.003
2 . 030T. R) be a simple directed graph without loops. page 244]. 120D. There are 16 nonisomorphic (types of) triads [55.4 Triads
Let G = (V.021D
5 . They can be partitioned into three basic types (see Figure 6): • the null triad 003.021U
6 .111D
8 . 030C and 120C.201
12 .120U
14 . 210. 6. 120U.102
4 .
Several properties of a graph can be expressed in terms of its triadic spectrum, the distribution of all its triads. It also provides ingredients for p* network models [56]. A direct approach to determine the triadic spectrum is of order $O(n^3)$, but in most large graphs it can be determined much faster [12].

The algorithm is based on the following observation: in a large and sparse graph most triads are null triads. Let $T_1$, $T_2$, $T_3$ be the number of null, dyadic and connected triads. Since the total number of triads is $T = \binom{n}{3}$ and the above 3 types partition the set of all triads, the idea of the algorithm is as follows:
• count all dyadic $T_2$ and all connected $T_3$ triads with their subtypes;
• compute the number of null triads $T_1 = T - T_2 - T_3$.

In the algorithm we have to assure that every non-null triad is counted exactly once while scanning the set of arcs. A set of three vertices {v, u, w} can be in general selected in 6 different ways: (v, u, w), (v, w, u), (u, v, w), (u, w, v), (w, v, u), (w, u, v). We solve the isomorphism problem by introducing the canonical selection that contributes to the triadic count; the other, noncanonical selections need not be considered in the counting process.

Every connected dyad forms a dyadic triad with every vertex both members of the dyad are not adjacent to. Let $\hat{R} = R \cup R^{-1}$. Each pair of vertices (v, u), v < u, connected by an arc contributes
$n - |\hat{R}(u) \cup \hat{R}(v) \setminus \{u, v\}| - 2$
triads of type 3 (102), if u and v are connected in both directions, and of type 2 (012) otherwise. The condition v < u determines the canonical selection for dyadic triads.

A selection (v, u, w) of a connected triad is canonical iff v < u < w. The triads isomorphism problem can be efficiently solved by assigning to each triad a code, an integer number between 0 and 63 obtained by treating the out-diagonal entries of the triad adjacency matrix as a binary number. Each triad code corresponds to a unique triad type that can be determined from a precomputed table.

For a connected triad we can always assume that v is the smallest of its vertices. So we have to determine the canonical selection from the remaining two selections (v, u, w) and (v, w, u). If v < w < u and $v \hat{R} w$ then the selection (v, w, u) was already counted before. Therefore we have to consider it as canonical only if it is not $v \hat{R} w$.

The total complexity of the algorithm is $O(\Delta m)$ and thus, for graphs with small maximum degree $\Delta \ll n$, since $2m \le n\Delta$, of order $O(n)$. In an implementation of the algorithm we must also take care about the range overflow in the case of $T$ and $T_1$.

3.5 Triangular connectivities

In this subsection we present an extension of the notion of connectivity to connectivity by chains of triangles.
Undirected graphs

We call a triangle a subgraph isomorphic to $K_3$. A subgraph H = (V', E') of G = (V, E) is triangular if each its vertex and each its edge belongs to at least one triangle in H.

A sequence $(T_1, T_2, \ldots, T_s)$ of triangles of G (vertex) triangularly connects vertices u, v ∈ V iff $u \in T_1$ and $v \in T_s$ or $u \in T_s$ and $v \in T_1$ and $V(T_{i-1}) \cap V(T_i) \ne \emptyset$, i = 2, ..., s. Such sequence is called a triangular chain. It edge triangularly connects vertices u, v ∈ V iff a stronger version of the second condition holds: $E(T_{i-1}) \cap E(T_i) \ne \emptyset$, i = 2, ..., s.

A pair of vertices u, v ∈ V is (vertex) triangularly connected iff u = v, or there exists a chain that triangularly connects u and v. Triangular connectivity is an equivalence relation on the set of vertices V, and nontrivial triangular connectivity components are exactly maximal connected triangular subgraphs.

A pair of vertices u, v ∈ V is edge triangularly connected iff u = v, or there exists a chain that edge triangularly connects u and v. Edge triangular connectivity components determine an equivalence relation on the set of edges E. Each non-triangular edge is in its own component.

Fig. 7. Edge-cut at level 16 of triangular network of Erdős collaboration graph
Let G be a simple undirected graph. A triangular network $N_T(G) = (V, E_T, w)$ determined by G is a subgraph $G_T = (V, E_T)$ of G which set of edges $E_T$ consists of all triangular edges of E(G). For $e \in E_T$ the weight w(e) equals to the number of different triangles in G to which e belongs.

Nontrivial triangular connectivity components are exactly the components of $G_T$.

Triangular networks can be used to efficiently identify dense clique-like parts of a graph. If an edge e belongs to a k-clique in G then $w(e) \ge k - 2$.

A procedure for determining $E_T$ and w(e), $e \in E_T$, simply collects all edges with $w(e) = |adj(u) \cap adj(v)| > 0$, e = {u, v} ∈ E. If the sets of neighbors adj(v) are ordered we can use merging to compute w(e) faster.

In Figure 7 the edge-cut at level 16 of triangular network of Erdős collaboration graph [34,11] (without Erdős, n = 6926, m = 11343) is presented.

Directed graphs

If the graph G is mixed we replace edges with pairs of opposite arcs. In the following let G = (V, A) be a simple directed graph without loops. For a selected arc (u, v) ∈ A there are four different types of directed triangles: cyclic, transitive, input and output.

[Figure: the four types of directed triangles: cyc, tra, in, out]

For each type we get the corresponding triangular network $N_{cyc}$, $N_{tra}$, $N_{in}$ and $N_{out}$. For example, for the cyclic network $N_{cyc} = (V, A_{cyc}, w_{cyc})$ we have for $(u, v) \in A_{cyc}$
$w_{cyc}(u, v) = |outadj(v) \cap inadj(u)|$
Also procedures for determining the networks are similar to the undirected case.

In directed graphs we distinguish weak and strong connectivity. The weak connectivity can be reduced to the undirected concepts in the skeleton $S = (V, E_S)$ of the given graph G:
$E_S = \{\{u, v\} : u \ne v \wedge (u, v) \in A\}$

A subgraph H = (V', A') of G is cyclic triangular if each its vertex and each its arc belongs to at least one cyclic triangle in H. A connected cyclic triangular subgraph is also strongly connected.

A sequence $(T_1, T_2, \ldots, T_s)$ of cyclic triangles of G (vertex) cyclic triangularly connects vertex u ∈ V to vertex v ∈ V iff $u \in T_1$ and $v \in T_s$ or $u \in T_s$ and $v \in T_1$ and $V(T_{i-1}) \cap V(T_i) \ne \emptyset$, i = 2, ..., s; such sequence is called a cyclic triangular chain. It arc cyclic triangularly connects vertex u to vertex v iff $A(T_{i-1}) \cap A(T_i) \ne \emptyset$, i = 2, ..., s holds; such sequence is called an arc cyclic triangular chain.

Again, we can introduce two types of cyclic triangular connectivity:

A pair of vertices u, v ∈ V is (vertex) cyclic triangularly connected iff u = v, or there exists a cyclic triangular chain that connects u to v. Cyclic triangular connectivity is an equivalence relation on the set of vertices V.

A pair of vertices u, v ∈ V is arc cyclic triangularly connected iff u = v, or there exists an arc cyclic triangular chain that connects u to v; and the arc cyclic triangular connectivity components determine an equivalence relation on the set of arcs A.

Transitive arcs are essentially reinforced arcs. If we remove from a graph G = (V, A) a transitive arc the reachability relation in V does not change.

There exists also a parallel to unilateral connectivity. The vertex v ∈ V is transitively triangularly reachable from the vertex u ∈ V iff u = v, or there exists a walk from u to v in which each arc is transitive, that is, a base of some transitive triangle.

In Figure 8 the edge-cut at level 11 of transitive network of ODLIS dictionary graph [45] is presented.

These notions can be generalized to short cycle connectivity [20].

Fig. 8. Edge-cut at level 11 of transitive network of ODLIS dictionary graph
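The procedure for the undirected triangular network described in this subsection, collecting all edges with w(e) = |adj(u) ∩ adj(v)| > 0, can be sketched as follows. This is a minimal illustration under stated assumptions, not Pajek's implementation: the function name is hypothetical, vertices are assumed to be comparable labels such as integers, and Python sets stand in for the ordered neighbor lists and merging mentioned in the text.

```python
def triangular_network(edges):
    """Return {(u, v): w} for all triangular edges, with u < v.

    w is the number of triangles containing the edge {u, v},
    computed as the number of common neighbors of u and v.
    """
    adj = {}
    for u, v in edges:
        if u != v:  # ignore loops, the graph is assumed simple
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                w = len(adj[u] & adj[v])
                if w > 0:  # keep triangular edges only
                    weights[(u, v)] = w
    return weights

# A 4-clique on vertices 1..4 with a pendant edge {4, 5}.
k4_plus_tail = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
weights = triangular_network(k4_plus_tail)
print(weights)
```

In the example every edge of the 4-clique gets weight 2, matching the bound w(e) ≥ k − 2, while the pendant edge {4, 5} belongs to no triangle and is left out of the triangular network.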
3.6 Generating large random networks

Let p ∈ [0, 1] be a given probability. An Erdős-Rényi random graph G ∈ G(n, p) is obtained by selecting every edge {u, v} with a probability p:
$\Pr(\{u, v\} \in G) = p$

It is easy to write a program to do this:

    E = ∅;
    for u = 1 to n − 1 do
      for v = u + 1 to n do
        if random < p then E = E ∪ {{u, v}};

But, for large and very sparse networks this is too slow. A faster procedure can be built on the following idea: move by random steps over the $M = \binom{n}{2}$ cells and mark the touched cells. The expected number of steps of this procedure is Mp.

How to select the length of the random step? For our Bernoulli model we have
$\Pr(\text{step} = s) = q^{s-1} p$, s = 1, 2, 3, ...
and
$F(s) = \Pr(\text{step} < s) = \sum_{t=1}^{s-1} q^{t-1} p = 1 - q^{s-1}$.
Therefore we get the random step s from the equation F(s) = random:
$s = F^{-1}(\text{random}) = 1 + \left\lfloor \frac{\log(1 - \text{random})}{\log q} \right\rfloor$

This is the basis of the fast random graph generation procedure presented in Algorithm 2.

Algorithm 2: Sparse Erdős-Rényi random graph generator

    Input : Probability p, Number of vertices n
    Output : Random graph G = (1..n, E)
    Set q = 1 − p, M = n(n − 1)/2, k = 0, f = 1, u = 2, E = ∅, again = true
    while again do
      Set k = k + 1 + ⌊ln(1 − random) / ln q⌋
      if k > M then Set again = false
      else
        while f < k do Set f = f + u, u = u + 1
        Set v = k + u − f − 1, E = E ∪ {{u, v}}
      end
    od

The same approach is easy to adapt to generate different types of random graphs: undirected, directed, acyclic, undirected bipartite, directed bipartite, acyclic bipartite, 2-mode, and others [5].

Pajek contains also a refinement of the model for generating scale free networks, proposed in [47]. At each step of the growth a new vertex and k
edges are added to the network N. The endpoints of the edges are randomly selected among all vertices according to the probability
$\Pr(v) = \alpha \frac{indeg(v)}{|E|} + \beta \frac{outdeg(v)}{|E|} + \gamma \frac{1}{|V|}$
where α + β + γ = 1. It is easy to check that
$\sum_{v \in V} \Pr(v) = 1$.
The time complexity of this procedure is O(m).

3.7 2-mode networks

A 2-mode network is a structure N = (U, V, A, w), where U and V are disjoint sets of vertices, A is the set of arcs with the initial vertex in the set U and the terminal vertex in the set V, and $w : A \to \mathbb{R}$ is a weight. If no weight is defined we can assume a constant weight w(u, v) = 1 for all arcs (u, v) ∈ A. The set A can be viewed also as a relation A ⊆ U × V.

A 2-mode network can be formally represented by a rectangular matrix $\mathbf{A} = [a_{uv}]_{U \times V}$:
$a_{uv} = w(u, v)$ if (u, v) ∈ A, and $a_{uv} = 0$ otherwise.

For direct analysis of 2-mode networks we can use eigenvector approach, clustering and blockmodeling. But most often we transform a 2-mode network into an ordinary (1-mode) network $N_1 = (U, E_1, w_1)$ or/and $N_2 = (V, E_2, w_2)$, where $E_1$ and $w_1$ are determined by the matrix $\mathbf{A}^{(1)} = \mathbf{A}\mathbf{A}^T$,
$a^{(1)}_{uv} = \sum_{z \in V} a_{uz} \cdot a^T_{zv}$.
Evidently $a^{(1)}_{uv} = a^{(1)}_{vu}$. There is an edge {u, v} ∈ $E_1$ in $N_1$ iff $adj(u) \cap adj(v) \ne \emptyset$. Its weight is $w_1(u, v) = a^{(1)}_{uv}$. The network $N_2$ is determined in a similar way by the matrix $\mathbf{A}^{(2)} = \mathbf{A}^T\mathbf{A}$. The networks $N_1$ and $N_2$ are analyzed using standard methods.

3.8 Normalizations

The normalization approach was developed for quick inspection of (1-mode) networks obtained from 2-mode networks [14,60], a kind of network based data-mining.

In networks obtained from large 2-mode networks there are often huge differences in weights. Therefore it is not possible to compare the vertices according to the raw data. First we have to normalize the network to make the weights comparable. There exist several ways how to do this. Some of them are presented in Table 1. They can be used also on other networks.

In the case of networks without loops we define the diagonal weights for undirected networks as the sum of out-diagonal elements in the row (or column)
$w_{vv} = \sum_u w_{vu}$
and for directed networks as some mean value of the row and column sum, for example
$w_{vv} = \frac{1}{2}\left(\sum_u w_{vu} + \sum_u w_{uv}\right)$.
Usually we assume that the network does not contain any isolated vertex.

Table 1. Weight normalizations

    $Geo_{uv} = \frac{w_{uv}}{\sqrt{w_{uu} w_{vv}}}$          $GeoDeg_{uv} = \frac{w_{uv}}{\sqrt{\deg_u \deg_v}}$
    $Input_{uv} = \frac{w_{uv}}{w_{vv}}$                      $Output_{uv} = \frac{w_{uv}}{w_{uu}}$
    $Min_{uv} = \frac{w_{uv}}{\min(w_{uu}, w_{vv})}$          $Max_{uv} = \frac{w_{uv}}{\max(w_{uu}, w_{vv})}$
    $MinDir_{uv} = \frac{w_{uv}}{w_{uu}}$ if $w_{uu} \le w_{vv}$, 0 otherwise
    $MaxDir_{uv} = \frac{w_{uv}}{w_{vv}}$ if $w_{uu} \le w_{vv}$, 0 otherwise

Fig. 9. GeoDeg normalization of Reuters terror news network

After a selected normalization the important parts of network are obtained by edge-cutting the normalized network at selected level t and preserving components with at least k vertices.
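To make the passage from a 2-mode network to a normalized 1-mode network concrete, the following sketch computes A(1) = A·Aᵀ for a small matrix, applies the Geo normalization from Table 1 and performs an edge-cut at level t. It is an illustration under assumptions: the function names are hypothetical, the diagonal of A·Aᵀ is used as w_uu, and the final step of preserving only components with at least k vertices is omitted.

```python
from math import sqrt

def project_rows(a):
    """1-mode network on the row set U: the matrix A * A^T."""
    n = len(a)
    return [[sum(x * y for x, y in zip(a[u], a[v])) for v in range(n)]
            for u in range(n)]

def geo_normalize(w):
    """Geo normalization: w_uv / sqrt(w_uu * w_vv), 0 if a diagonal is 0."""
    n = len(w)
    return [[w[u][v] / sqrt(w[u][u] * w[v][v]) if w[u][u] and w[v][v] else 0.0
             for v in range(n)]
            for u in range(n)]

def edge_cut(w, t):
    """Edges {u, v}, u < v, whose normalized weight is at least t."""
    n = len(w)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if w[u][v] >= t]

# Rows are the vertices of U, columns the vertices of V (hypothetical data).
a = [[1, 1, 0],
     [1, 1, 1],
     [0, 0, 1]]
w1 = project_rows(a)       # [[2, 2, 0], [2, 3, 1], [0, 1, 1]]
geo = geo_normalize(w1)
strong_edges = edge_cut(geo, 0.7)
print(w1, strong_edges)
```

In this example rows 0 and 1 share two columns and get the normalized weight 2/√6 ≈ 0.82, so only the edge {0, 1} survives the cut at t = 0.7.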
In Figure 9 a part of 'themes' from Reuters terror news network [14] determined by a cut of its GeoDeg normalization is presented.

3.9 Blockmodeling

Fig. 10. Orderings [Figure: two matrix displays of the world trade network; left panel: World Trade (Snyder and Kick, 1979), alphabetic order; right panel: World trade, cores]

In Figure 10 the Snyder and Kick's world trade network is presented by its matrix: on the left side the units (states) are ordered in the alphabetic order of their names; on the right side they are ordered on the basis of clustering results. It is evident that a 'proper' ordering can reveal a structure in the network. Such orderings can be produced in different ways [44]. On the networks of moderate size (up to some hundreds of units) we can use also the blockmodeling methods.

The goal of blockmodeling is to reduce a large, potentially incoherent network to a smaller comprehensible structure that can be interpreted more readily [6,7].

One of the main procedural goals of blockmodeling is to identify, in a given network N = (U, R), R ⊆ U × U, clusters (classes) of units/vertices that share structural characteristics defined in terms of R. The units within a cluster have the same or similar connection patterns to other units. They form a clustering $C = \{C_1, C_2, \ldots, C_k\}$ which is a partition of the set U. Each partition determines an equivalence relation (and vice versa).

A clustering C partitions also the relation R into blocks
$R(C_i, C_j) = R \cap C_i \times C_j$
Each such block consists of units belonging to clusters $C_i$ and $C_j$ and all arcs leading from cluster $C_i$ to cluster $C_j$. If i = j, a block $R(C_i, C_i)$ is called a diagonal block.
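The way a clustering partitions the relation R into blocks R(C_i, C_j) = R ∩ C_i × C_j can be illustrated by a short sketch. The function name and the toy data are hypothetical and serve only to show the definition at work.

```python
def blocks(arcs, clustering):
    """Split a relation (set of arcs) into blocks indexed by cluster pairs.

    clustering is a list of disjoint sets that together cover all units.
    Returns {(i, j): set of arcs leading from cluster i to cluster j}.
    """
    cluster_of = {unit: i for i, cluster in enumerate(clustering)
                  for unit in cluster}
    k = len(clustering)
    result = {(i, j): set() for i in range(k) for j in range(k)}
    for u, v in arcs:
        result[(cluster_of[u], cluster_of[v])].add((u, v))
    return result

# Toy relation on five units and a clustering into two clusters.
relation = {("a", "b"), ("b", "a"), ("a", "c"), ("b", "d"), ("e", "c")}
clustering = [{"a", "b"}, {"c", "d", "e"}]
partitioned = blocks(relation, clustering)
print(partitioned[(0, 0)], partitioned[(0, 1)])
```

Here the diagonal block R(C_1, C_1) holds the two arcs between a and b, and every arc of R lands in exactly one block.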
Fig. 11. Blockmodeling

A blockmodel consists of structures obtained by identifying all units from the same cluster of the clustering C. For an exact definition of a blockmodel we have to be precise also about which blocks produce an arc in the reduced graph and which do not, and of what type. Some types of connections are presented in Figure 12. The reduced graph can be represented by relational matrix, called also image matrix.

How to determine an appropriate blockmodel? The blockmodeling can be formulated as a clustering problem (Φ, P) as follows:

Determine the clustering $C^* \in \Phi$ for which
$P(C^*) = \min_{C \in \Phi} P(C)$

Since the set of units U is finite, the set of feasible clusterings Φ is also finite. Therefore the set Min(Φ, P) of all solutions of the problem (optimal clusterings) is not empty. In theory, the set Min(Φ, P) can be determined by the complete search, but it turns out that most cases of the clustering problem are NP-hard. The blockmodeling problems are usually solved using local optimization methods based on moving a unit from one cluster to another or interchanging two units between two clusters.

Also, by reordering of network matrix so that the units from each cluster of the optimal clustering are located together we obtain a matrix representation of the network with visible structure.

One of the possible ways of constructing a criterion function that directly reflects the considered equivalence is to measure the fit of a clustering to
an ideal one with perfect relations within each cluster and between clusters according to the considered equivalence.

Fig. 12. Block Types

Given a clustering $C = \{C_1, C_2, \ldots, C_k\}$, let $B(C_u, C_v)$ denote the set of all ideal blocks corresponding to block $R(C_u, C_v)$. Then the global error of clustering C can be expressed as
$P(C) = \sum_{C_u, C_v \in C} \; \min_{B \in B(C_u, C_v)} d(R(C_u, C_v), B)$
where the term $d(R(C_u, C_v), B)$ measures the difference (error) between the block $R(C_u, C_v)$ and the ideal block B. d is constructed on the basis of characterizations of types of blocks. The function d has to be compatible with the selected type of equivalence.

Determining the block error, we also determine the type of the best fitting ideal block (the types are ordered).

The criterion function P(C) is sensitive iff P(C) = 0 ⇔ C determines an exact blockmodeling. For all presented block types sensitive criterion functions can be constructed.

Once a clustering C and types of blocks are determined, we can also compute the values of connections by using averaging rules.

In Figure 13 a symmetric acyclic (edge connected inside clusters, acyclic reduced graph) blockmodel [27] of Student Government at the University of Ljubljana [35] is presented. The obtained clustering in 4 clusters is almost exact. The only error is produced by the arc (a3, m5).
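As an illustration of the global error P(C), the sketch below evaluates it for the simplest choice of ideal blocks, null and complete. These assumptions are made here and are not taken from the text: only these two block types are allowed, d counts the cells in which a block differs from the ideal block, and loop cells of diagonal blocks are ignored. The function name is hypothetical.

```python
def criterion(arcs, clustering):
    """Global error P(C) with null and complete ideal blocks.

    Returns (error, types) where types[(i, j)] names the best fitting
    ideal block; ties are resolved in favor of the null block.
    """
    cluster_of = {unit: i for i, cluster in enumerate(clustering)
                  for unit in cluster}
    k = len(clustering)
    present = {(i, j): 0 for i in range(k) for j in range(k)}
    for u, v in arcs:
        if u != v:  # loops are ignored in this sketch
            present[(cluster_of[u], cluster_of[v])] += 1

    error, types = 0, {}
    for (i, j), count in present.items():
        cells = len(clustering[i]) * len(clustering[j])
        if i == j:
            cells -= len(clustering[i])  # drop the diagonal cells
        d_null = count                   # arcs that should not be there
        d_complete = cells - count       # arcs that are missing
        types[(i, j)] = "null" if d_null <= d_complete else "complete"
        error += min(d_null, d_complete)
    return error, types

clustering = [{"a", "b"}, {"c", "d"}]
exact = {("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")}
error_exact, types_exact = criterion(exact, clustering)
error_noisy, _ = criterion(exact | {("c", "a")}, clustering)
print(error_exact, types_exact[(0, 1)], error_noisy)
```

For the exact relation the error is 0 and the block from the first to the second cluster is recognized as complete; adding the single arc (c, a) raises the error to 1, in the same way as one deviating arc does in Figure 13.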
Fig. 13. A Symmetric Acyclic Blockmodel of Student Government

4 Implementation

4.1 Data structures

In Pajek analysis and visualization are performed using 6 data types:
• network (graph),
• partition (nominal or ordinal properties of vertices),
• vector (numerical properties of vertices),
• cluster (subset of vertices),
• permutation (reordering of vertices, ordinal properties), and
• hierarchy (general tree structure on vertices).

In the near future we intend to extend this list with a support of multiple networks and partitions of edges.

The power of Pajek is based on several transformations that support different transitions among these data structures. Also the menu structure (see Figure 14) of the main Pajek's window is based on them. Pajek's main window uses a 'calculator' paradigm with list-accumulator for each data type. The operations are performed on the currently active (selected) data and are also returning the results through accumulators.

The values of vectors can be used to determine several elements of network display such as: X, Y, Z coordinates and the size of the vertex shape. The partition can be graphically represented by the color and shape of vertices. Also the values of edges can be represented by the thickness and/or color.
Fig. 14. Pajek's Main Window

The procedures are available through the main window menus. Frequently used sequences of operations can be defined as macros. This allows also the adaptations of Pajek to groups of users from different areas (social networks, chemistry, genealogy, computer science, mathematics, ...) for specific tasks.

4.2 Implemented algorithms

In Pajek, besides the algorithms described in section 3, several known efficient algorithms are implemented, like:
• simplifications and transformations: deleting loops, multiple edges, transforming arcs to edges etc.;
• components: strong, weak, biconnected, symmetric;
• decompositions: symmetric-acyclic, hierarchical clustering;
• paths: shortest path(s), all paths between two vertices;
• flows: maximum flow between two vertices;
• neighborhood: k-neighbours;
• CPM: critical paths;
• social networks algorithms: centrality measures, hubs and authorities, measures of prestige, brokerage roles, structural holes, diffusion partitions;
• measures of dependencies among partitions / vectors: Cramer's V, Spearman rank correlation coefficient, Pearson correlation coefficient, Rajski coefficient;
• extracting subnetwork;
• shrinking clusters in network (generalized blockmodeling);
• reordering: topological ordering, Richards's numbering, Murtagh's seriation and clumping algorithms, depth/breadth first search.

Pajek contains also some data analysis procedures which have higher order time complexities and can be therefore used only on smaller networks, or selected parts of large networks: hierarchical clustering, generalized blockmodeling, partitioning signed graphs [26], TSP (Traveling Salesman Problem), computing geodesics matrices, etc.
4.3 Layout Algorithms and Layout Features

Special emphasis is given in Pajek to automatic generation of network layouts. Several standard algorithms for automatic graph drawing are implemented: spring embedders (Kamada-Kawai and Fruchterman-Reingold), layouts determined by eigenvectors (Lanczos algorithm), drawing in layers (genealogies and other acyclic structures), fish-eye views and block (matrix) representation.

These algorithms were modified and extended to enable additional options: drawing with constraints (optimization of the selected part of the network, fixing some vertices to predefined positions, using values of edges as similarities or dissimilarities), drawing in 3D space. Pajek also provides tools for manual editing of graph layout.

Properties of vertices/edges (given as data or computed) can be represented using colors, sizes and/or shapes of vertices/edges.

Pajek supports also drawing sequences of networks in its Draw window, and exports sequences of networks in suitable formats that can be examined with special 2D or 3D viewers (e.g., SVG and Mage).

In Figure 15 a 3D layout of a graph obtained using eigenvectors is presented.

4.4 Interfaces

Pajek supports also some non-native input formats: UCINET DL files [53], Vega graph files [54], chemical MDLMOL [41] and BS, and genealogical GEDCOM [30].

The layouts can be exported in the following output graphic formats that can be examined by special 2D and 3D viewers: Encapsulated PostScript (EPS) [31], Scalable Vector Graphics (SVG) [1], VRML [24], MDLMOL/chime [41], and Kinemages (Mage) [49]. Pictures in SVG can be further controlled using support written in Javascript. In Figure 16 a snapshot of 3D layout displayed in a VRML viewer of our drawing of graph A from the Graph drawing contest 1997 is presented [33].

The main window menu Tools provides export of Pajek's data to statistical program R [48,21]. In the Tools menu, the user can prepare calls to her/his favorite viewers and other tools. It is also possible to run Pajek (+macros) from other programs (R, Ucinet, and others).

5 Examples

Several examples of applications of Pajek were already presented as illustrations while describing selected algorithms.
Fig. 15. 3D layout obtained using eigenvectors

6 Software

6.1 Architecture

Pajek is implemented in Delphi and runs on Windows operating systems.

6.2 Availability

Pajek is still under development. The latest version is freely available, for noncommercial use, at its home page:
http://vlado.fmf.uni-lj.si/pub/networks/pajek/

On the things to do list we have: support for GraphML format, implementing Pajek on Unix, and replacing macros by a Javascript(?) based network scripting language.
Connections. Batagelj V. O. (1992) An Optimizational Approach to Regular Equivalence. 16. Adobe SVG Viewer (2002) http://www. Batagelj V. (2000) Clustering relational data. 7. Schader). 4757 11.. Gaul. Mrvar A.. 237243
. P.adobe.. Doreian. A. and Ferligoj. Batagelj. 143155. Social Networks. 315. (1998) Pajek – A Program for Large Network Analysis. Research report. in preparation. (199194) Programs for Network Analysis. Batagelj V. Presented at International Social Network Conference. (2000) Some Analyses of Erd˝s Collaboration Graph. 4.ijp. Springer.com/svg/viewer/install/ 2.: W. Opitz. Batagelj V.. Berlin. London. (1995) Towards NetML Networks Markup Language.si/pub/networks/ 10. GD’97 contest graph A in VRML
References
1.ps 9. Batagelj. Ferligoj A. July 610. Mrvar A. http://www..fmf. (2001) A Subquadratic Triad Census Algorithm for Large Sparse Networks with Small Maximum Degree. 6. (1997) Notes on blockmodeling. V. Batagelj V.. M.unilj. Batagelj V. Batagelj V. Batagelj V. Brandes U. Mrvar A.24
Fig. Mrvar A. 22. 21 (2). 3. 1995. (2002) Fast generation of large sparse random graphs.. 121135. Social Networks 14. 23. Mrvar A.. 8.si/ftp/pub/preprints/ps/95/trp9515. Data Analysis (ed. 173186 12. V. (2002) Eﬃcient Algorithms for Citation Network Analysis 5. Batagelj V.. (1986) Graph – data structure and algorithms in pascal. Social Networks 19. http://vlado. o Social Networks.
11. C. Butts. Mrvar A. Batagelj V. in preparation.5. (2000) Symmetricacyclic decompositions of networks. 121128.. Submitted.. o http://www. J¨nger M. 17. submitted.. (1985) Algorithmic Graph Theory.at.com/~pmcbride/gedcom/55gctoc. Batagelj.cs. s http://arxiv. (1993) Recall versus recognition: Comparison of two alternative procedures for collecting social network data.html#sna 22. 17(1). Doreian P. Submitted.upenn.htm 34.htm 31. Graph Drawing Contest 1997.. (2002) The Erd˝s Number Project.library. A... MIT Press. 29. classif. 36. Zaverˇnik M. & Doreian.edu/~ghost/ 32. HAZU. V.si/pub/gd/gd97. c Dubrovnik. Czech Republic. Second Edition. Batagelj V..rootsweb. SpringerVerlag. 9097. Cosmo Player (2002) http://ca.unilj. Batagelj. Hlebec.P.H. M. Ghostview and GSview http://www.) GD’99.html 35.. 2001 LNCS 2265.
. 24. (2002) Exploratory Social Network Analysis With Pajek. Leiserson C. Zaverˇnik M. Grossman J. Zagreb.. 18. http://homepages. Mrvar A.) GD’01. 15. Mrvar A. 105126 (in Croat).fmf. Sher IH. (1989) Xgraph project documentation. (2002) Network analysis of texts. Batagelj V. V. (1996) A Partitioning Approach to Structural Balance. Batagelj V. Ghostscript. (2002) Generalized Cores. (2002) Pajek . http://www. (2002) Density based approaches to Reuters terror news network analysis. Anali Dubrovnik XL.edu/~grossman/erdoshp.. de Nooy W. Zaverˇnik M.org/abs/cs. 33. Metodoloˇki zvezki 9. Zaverˇnik M.. Batagelj V..wisc. 39–63. (2001) An O(m) Algorithm for Cores Decomposition s of Networks. Pisanski T. Ljubljana. Stiˇin Castle. Doreian. Gibbons A. http://cran.: The Use of Citation Data in Writing the History of Science. (1989) Connectivity in a citation network: The development of DNA theory. GEDCOM 5. 149168 27. Language s Technologies. (2002) sna: Tools for Social Network Analysis. to be published by the Cambridge University Press. Garﬁeld E. Social Networks. 143148. 19. Cambridge University Press. 18. SpringerVerlag.pdf 30.edu/papers/ useofcitdatawritinghistofsci. Social Networks. http://www. LNCS 1731. Stein C.org/src/contrib/PACKAGES.. Leipert S. J.L.Analysis and Visualization of Large Networks. Philadelphia: The Institute for Scientiﬁc Information. N. 26. Batagelj V.caida..DS/0202039 20. P. 16. (Eds.garfield. December 1964.Pajek.. (2002) Analiza rodoslova dubrovaˇkog vlasc teoskog kruga pomo´u programa Pajek. (2002) Triangular connectivity and its generals izations.rproject. Mrvar A. (2001) Introduction to Algorithms. and Zaverˇnik. http://vlado. 14.. Hummon. V.. Caida: Internet Visualization Tool Taxonomy. (1999) Partitioning Approach to Visuals ˇ r ization of Large Graphs. Ljubljana: s FDV. and Torpie RJ.com/cosmo/ 25. Dremelj P..org/tools/taxonomy/visualization/ 23.oakland.. Batagelj V. Ferligoj. In: Mutzel P. Mrvar A. Analysis and Visualization of Large Networks
13. 328. Vienna.T. In: Kratochvil J. 21.E. (Ed. 477478. Mrvar A. P.. Mrvar A. Cormen T.. Austria. 28. p. Rivest R. u September 2326. Batagelj V.. Batagelj V.
vol 415. http://www. University of Klagenfurt.duke. Proceedings of Sixth Austrian. J. http://kinemage. (1960) Dubrovaˇki patricijat u XIV veku. Batagelj V. Zaverˇnik M. In Proc 9th ACMSIAM Symposium on Discrete Algorithms. (1994) Social Network Analysis: Methods and Applications. 180183.uiuc.biochem. Cambridge University Press. October 57. (2002) http://www.mshri. (2000) Social Network Analysis: A Handbook. (1985) Multidimensional Clustering Algorithms. http://vlado.
.html 39. http://compgeom. s 44. Mrvar A. (2002) The Mage Page.. p. Stanford University. UCINET (2002) http://www.stanford. (2002) Winners dont’t take all. 56. Society for Industrial and Applied Mathematics Philadelphia. New York.J. Wasserman. 51. 48. FDV.si/ 55. PNAS. Beograd.soc. Richardson D. White D.edu/jwm/ 43. (1983) Data Structures and Network Algorithms. (1993) The Stanford GraphBase. New York: John Wiley and Sons.cornell. ftp://labrea. R. P. (2002) Analysis and visualization of 2s mode networks. 401426. c c 41. Project Vega (2002) http://vega.ijp.sbs. Richardson J. (1990) Graphs: An Introductory Approach. p. 45.. James Moody home page (2002) http://www. R. 668677.cs. Hungarian.mdli. D. 99/8.html 46.nj. I.ohiostate.rproject.html 57.edu/index.al. (2000) Relational Calculator .J. Tarjan..cs.com/kleinberg97authoritative. http://www. 59.nec. Wasserman S.com/ 54..psych. Vienna: PhysicaVerlag. Social Networks. E. and Pattison. Watkins.. Computational geometry database. Scott.C. http://vax. The R Project for Statistical Computing. Kleinberg J. D.S.26
37.. Italian and Slovenian Meeting of Young Statisticians. Psychometrika. S. 17 (3). Murtagh. Batagelj V.fmf.on. Cambridge.edu/home/kleinber/auth. (1999) Analyzing Large Kinship and Marriage Networks with Pgraph and Pajek. Mrvar A. Metodoloˇki zvezki 16. Yuen Ho.edu/library/odlis. 245274 58. Social Science Computer Review. 2nd edition.wcsu. ACM Press. Seidman S.com/ 42. B. Ossiach. et. (1998) Authoritative sources in a hyperlinked environment. 52. Mrvar A.edu/pstar/index. 5.si/pub/networks/data/ 47. 53. 52075211.unilj. F.org/ 49. J.edu/pub/sgb/ 40.R.ps http://citeseer. Pajek’s datasets. http://kentucky. Nature. Austria. Knuth..uiuc. Wilson.a tool for analyzing social networks.analytictech.. Mahnken. Compstat lectures. 4. Pennock etal.html 38. (2002) Systematic identiﬁcation of protein complexes in Saccharomyces cerevisiae by mass spectrometry. MDL Information Systems. Batagelj V. 113123.pdf 60. 6376. (1983) Network structure and minimum degree. E. http://www.html 50. Faust K. (2002). Pennsylvania.edu/~jeffe/compgeom/biblios. Inc. 2001. 269–287. London: Sage Publications. ODLIS (2002) Online dictionary of library and information science. Jones B. Nauˇno delo.M. (1996) Logit models and logistic regressions for social networks: I.ca/tyers/pdfs/proteome. Ljubljana.. 60. An introduction to Markov graphs and p∗ .