CS109/Stat121/AC209/E109
Data Science
Network Models
Hanspeter Pfister & Joe Blitzstein
pfister@seas.harvard.edu / blitzstein@stat.harvard.edu
This Week
• HW4 due tonight at 11:59 pm
• Friday lab 10-11:30 am in MD G115
Examples from Newman (2003)
FIG. 2 Three examples of the kinds of networks that are the topic of this review. (a) A food web of predator-prey interactions
between species in a freshwater lake [272]. Picture courtesy of Neo Martinez and Richard Williams. (b) The network of
collaborations between scientists at a private research institution [171]. (c) A network of sexual contacts between individuals
in the study by Potterat et al. [342].
A. Types of networks

A set of vertices joined by edges is only the simplest type of network; there are many ways in which networks may be more complex than this (Fig. 3). For instance, there may be more than one different type of vertex in a network, or more than one different type of edge. And vertices or edges may have a variety of properties, numerical or otherwise, associated with them. Taking the example of a social network of people, the vertices may represent men or women, people of different nationalities, locations, ages, incomes, or many other things. Edges may represent friendship, but they could also represent animosity, or professional acquaintance, or geographical proximity. They can carry weights, representing, say, how well two people know each other. They can also be directed, pointing in only one direction. Graphs composed of directed edges are themselves called directed graphs or sometimes digraphs, for short. A graph representing telephone calls or email messages between individuals would be directed, since each message goes in only one direction. Directed graphs can be either cyclic, meaning they contain closed loops of edges, or acyclic, meaning they do not. Some networks, such as food webs, are approximately but not perfectly acyclic.

One can also have hyperedges, i.e. edges that join more than two vertices together. Graphs containing such edges are called hypergraphs. Hyperedges could be used to indicate family ties in a social network, for example: n individuals connected to each other by virtue of belonging to the same immediate family could be represented by an n-edge joining them. Graphs may also be naturally partitioned in various ways. We will see a number of examples in this review of bipartite graphs: graphs that contain vertices of two distinct types, with edges running only between unlike types. So-called affiliation networks…
Graphs
A graph G=(V,E) consists of a vertex set V and an
edge set E containing unordered pairs {i,j} of vertices.
[Figures: an example simple graph on 16 labeled vertices (left); an example multigraph on 4 vertices (right)]
The degree of vertex v is the number of
edges attached to it.
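The degree definition above can be computed directly from an edge list. A minimal Python sketch (the function name `degrees` and the example graph are illustrative, not from the slides):

```python
from collections import Counter

def degrees(n, edges):
    """Degree of each vertex 1..n, given edges as unordered pairs."""
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return [deg[v] for v in range(1, n + 1)]

# Small example: a triangle 1-2-3 plus a pendant vertex 4 attached to 3.
print(degrees(4, [(1, 2), (2, 3), (1, 3), (3, 4)]))  # -> [2, 2, 3, 1]
```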
A Plea for Clarity: What is a Network?
• graph vs. multigraph (are loops, multiple edges ok? What is a "simple" graph?)
• directed vs. undirected
• weighted vs. unweighted
• dynamics of vs. dynamics on
• labeled vs. unlabeled
• network as quantity of interest vs. quantities of interest on networks
Why model networks?
• Hard to interpret "hairballs".
• We can define some interesting features (statistics) of a network, such as measures of clustering, and compare the observed values against those of a model.
• Warning: much of the network literature carelessly ignores the way in which the network data were gathered (sampling) and whether there are missing/unknown nodes or edges!
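The second bullet can be made concrete with one such statistic: the global clustering coefficient (a standard quantity, though this particular sketch and its function name are illustrative, not from the slides). For a G(n, p) baseline, the expected clustering is about p, so an observed value far above p suggests structure the model doesn't capture.

```python
def clustering(adj):
    """Global clustering coefficient: fraction of connected triples
    (paths a-v-b) that are closed into triangles."""
    closed = total = 0
    for v in adj:
        nbrs = list(adj[v])
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                total += 1
                if nbrs[b] in adj[nbrs[a]]:
                    closed += 1
    return closed / total if total else 0.0

# Triangle 1-2-3 plus a pendant edge 3-4: 3 of 5 triples are closed.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(clustering(adj))  # -> 0.6
```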
Erdős-Rényi Random Graph Model
• Independently flip coins with prob. p of heads
• Let n get large and p get small, with the average degree c = (n−1)p held constant.
• What happens for c < 1?
• What happens for c > 1?
• What happens for c = 1?
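The c < 1 vs. c > 1 question can be explored by simulation. A minimal sketch (function names are illustrative): generate G(n, p) with c = (n−1)p fixed and measure the largest connected component.

```python
import random

def er_graph(n, p, seed=0):
    """Erdős-Rényi G(n, p): include each of the n(n-1)/2 possible edges
    independently with probability p."""
    rng = random.Random(seed)
    adj = {v: [] for v in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def largest_component(adj):
    """Size of the largest connected component (iterative DFS)."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        stack, size = [s], 0
        seen.add(s)
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

n = 1000
for c in (0.5, 2.0):
    frac = largest_component(er_graph(n, c / (n - 1))) / n
    print(c, round(frac, 3))  # c < 1: only tiny components; c > 1: a giant component
```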
Degree Sequences

[Figure: an example graph on vertices 1, . . . , 9]

Take V = {1, . . . , n} and let d_i be the degree of vertex i.
The degree sequence of G is d = (d_1, . . . , d_n).

n = 9, d = (3, 4, 3, 3, 4, 3, 3, 3, 2)
A sequence d is graphical if there is a graph G with degree sequence d.
G is a realization of d.
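Whether a sequence is graphical can be checked with the Erdős–Gallai criterion (a standard result, not stated on the slide; the function name is illustrative):

```python
def is_graphical(d):
    """Erdős–Gallai test: a sequence of nonnegative integers is graphical
    iff its sum is even and, for every k (with d sorted nonincreasing),
    sum of the k largest <= k(k-1) + sum over the rest of min(d_i, k)."""
    d = sorted(d, reverse=True)
    if d and d[-1] < 0:
        return False
    if sum(d) % 2 != 0:
        return False
    for k in range(1, len(d) + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True

print(is_graphical([3, 4, 3, 3, 4, 3, 3, 3, 2]))  # the slide's example -> True
print(is_graphical([3, 3, 1]))                    # odd degree sum -> False
```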
MCMC on Networks
mixing times, burn-in, bottlenecks, autocorrelation, ...

Switchings Chain
[Figure: a double edge swap on four vertices, rewiring two edges while preserving every degree]
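One step of the switchings chain can be sketched as follows. This is a simplified illustration (names are mine): a careful implementation would also randomize which of the two possible rewirings is proposed, to get the intended stationary distribution over realizations.

```python
import random

def edge_swap_step(edges, rng):
    """One step of a switchings (double edge swap) chain: pick two edges
    {a,b} and {c,d}, propose rewiring them to {a,c} and {b,d}, and accept
    only if the result is still a simple graph. Degrees are preserved."""
    (a, b), (c, d) = rng.sample(list(edges), 2)
    if a == c or b == d:
        return edges  # would create a self-loop
    new1, new2 = frozenset((a, c)), frozenset((b, d))
    if new1 in edges or new2 in edges:
        return edges  # would create a multiple edge
    return (edges - {frozenset((a, b)), frozenset((c, d))}) | {new1, new2}

rng = random.Random(1)
G = {frozenset(e) for e in [(1, 2), (3, 4), (1, 3), (2, 4)]}  # a 4-cycle
for _ in range(100):
    G = edge_swap_step(G, rng)
print(G)  # still 4 edges, every vertex still has degree 2
```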
Power Laws
• Power-law (a.k.a. scale-free) networks: the number of vertices of degree k is proportional to k^(−γ)
• Stumpf et al (2005): Subnets of scale-free networks are not scale-free, especially for large γ
• Their subnets are i.i.d. node-based.
• What about features other than degree distributions?
p1 Model (Holland-Leinhardt 1981)

3.4 The p1 Model for Social Networks

A conceptually separate thread of research developed in parallel in the statistics and social sciences literature, starting with the introduction of the p1 model. Consider a directed graph on the set of n nodes. Holland and Leinhardt's p1 model focuses on dyadic pairings and keeps track of whether node i links to j, j to i, neither, or both. It contains the following parameters:
• θ: a base rate for edge propagation,
• α_i (expansiveness): the effect of an outgoing edge from i,
• β_j (popularity): the effect of an incoming edge into j,
• ρ_ij (reciprocation/mutuality): the added effect of reciprocated edges.
Let P_ij(0, 0) be the probability for the absence of an edge between i and j, P_ij(1, 0) the probability of i linking to j ("1" indicates the outgoing node of the edge), P_ij(1, 1) the probability of i linking to j and j linking to i. The p1 model posits the following probabilities (see [149]):

log P_ij(0, 0) = λ_ij, (3.1)
log P_ij(1, 0) = λ_ij + α_i + β_j + θ, (3.2)
log P_ij(0, 1) = λ_ij + α_j + β_i + θ, (3.3)
log P_ij(1, 1) = λ_ij + α_i + β_j + α_j + β_i + 2θ + ρ_ij. (3.4)

In this representation of p1, λ_ij is a normalizing constant to ensure that the probabilities for each dyad (i, j) add to 1. For our present purposes, assume that the dyad is in one and only one of the four possible states. The reciprocation effect, ρ_ij, implies that the odds of observing a mutual dyad, with an edge from node i to node j and one from j to i, are enhanced by a factor of exp(ρ_ij) over and above what we would expect if the edges occurred independently of one another.
The problem with this general p1 representation is that there is a lack of identification of the reciprocation parameters. The following special cases of p1 are identifiable and of special interest:
1. α_i = 0, β_j = 0, and ρ_ij = 0. This is basically an Erdős-Rényi-Gilbert model for directed graphs: each directed edge has the same probability of appearance.
2. ρ_ij = 0, no reciprocal effect. This model effectively focuses solely on the degree distributions into and out of nodes.
3. ρ_ij = ρ, constant reciprocation. This was the version of p1 studied in depth by Holland and Leinhardt using maximum likelihood estimation.
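Equations (3.1)-(3.4) determine each dyad's four state probabilities once λ_ij is chosen to normalize them. A minimal sketch (function name and parameter values are illustrative, not from the text):

```python
import math, random

def dyad_probs(alpha_i, beta_j, alpha_j, beta_i, theta, rho):
    """Probabilities of the four dyad states (0,0), (1,0), (0,1), (1,1)
    under the p1 model; lambda_ij is implicit in the normalization."""
    logits = [
        0.0,                                                  # (0,0)
        alpha_i + beta_j + theta,                             # (1,0)
        alpha_j + beta_i + theta,                             # (0,1)
        alpha_i + beta_j + alpha_j + beta_i + 2 * theta + rho # (1,1)
    ]
    m = max(logits)                       # subtract max for stability
    w = [math.exp(x - m) for x in logits]
    s = sum(w)
    return [x / s for x in w]

p = dyad_probs(alpha_i=0.5, beta_j=-0.2, alpha_j=0.0, beta_i=0.1,
               theta=-1.0, rho=1.5)
print([round(x, 3) for x in p])  # the four probabilities sum to 1
# Sampling one dyad state:
state = random.Random(0).choices([(0, 0), (1, 0), (0, 1), (1, 1)], weights=p)[0]
```

A quick sanity check on the reciprocation effect: P(1,1)·P(0,0) / (P(1,0)·P(0,1)) = exp(ρ_ij), matching the odds interpretation above.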
ERGMs (Exponential Random Graph Models)

(from Blitzstein and Diaconis)

More formally, define a probability measure P_β on the space of all graphs on n vertices by

P_β(G) = Z^(−1) exp( −∑_{i=1}^{n} β_i d_i(G) ),

where Z is a normalizing constant. The real parameters β_1, . . . , β_n are chosen to achieve given expected degrees. This model appears explicitly in Park and Newman [59], using the tools and language of statistical mechanics.

Holland and Leinhardt [35] give iterative algorithms for the maximum likelihood estimators of the parameters, and Snijders [65] considers MCMC methods. Techniques of Haberman [31] can be used to prove that the maximum likelihood estimates of the β_i are consistent and asymptotically normal as n → ∞, provided that there is a constant B such that |β_i| ≤ B for all i.

Such exponential models are standard fare in statistics, statistical mechanics, and social networking (where they are called p* models). They are used for directed graphs in Holland and Leinhardt [35] and for graphs in Frank and Strauss [27, 70] and Snijders [65, 66], with a variety of sufficient statistics (see the surveys in [3], [56], and [66]). One standard motivation for using the probability measure P_β when the degree sequence is the main feature of interest is that this model gives the maximum entropy distribution on graphs with a given expected degree sequence (see Lauritzen [43] for further discussion of this). Unlike most other exponential models on graphs, the normalizing constant Z is available in closed form. Furthermore, there is an easy method of sampling exactly from P_β, as shown by the following. The same formulas are given in [59], but for completeness we provide a brief proof.

Lemma 1. Fix real parameters β_1, . . . , β_n. Let Y_ij be independent binary random variables for 1 ≤ i < j ≤ n, with

P(Y_ij = 1) = e^(−(β_i + β_j)) / (1 + e^(−(β_i + β_j))) = 1 − P(Y_ij = 0).

Form a random graph G by creating an edge between i and j if and only if Y_ij = 1. Then G is distributed according to P_β, with

Z = ∏_{1 ≤ i < j ≤ n} (1 + e^(−(β_i + β_j))).

Proof. Let G be a graph and y_ij = 1 if {i, j} is an edge of G, y_ij = 0 otherwise. Then the probability of G under the above procedure is

P(Y_ij = y_ij for all i, j) = ∏_{i<j} e^(−y_ij(β_i + β_j)) / (1 + e^(−(β_i + β_j))).
How can we test and ﬁt this model?
How can we use this model?
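Lemma 1 translates directly into an exact sampler for this degree-based ERGM: include each edge independently with the stated probability. A minimal sketch (function name is illustrative):

```python
import math, random

def sample_beta_model(beta, seed=0):
    """Exact sample from P_beta (Lemma 1): include edge {i,j} independently
    with probability e^(-(beta_i+beta_j)) / (1 + e^(-(beta_i+beta_j))),
    which equals 1 / (1 + e^(beta_i+beta_j))."""
    rng = random.Random(seed)
    n = len(beta)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = 1.0 / (1.0 + math.exp(beta[i] + beta[j]))
            if rng.random() < p:
                edges.add((i, j))
    return edges

# Very negative beta_i makes every edge probability close to 1:
print(len(sample_beta_model([-10] * 6)))  # -> 15, essentially the complete graph on 6 vertices
```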
Pseudolikelihood (Strauss-Ikeda '80)
Fix a pair of nodes {i,j}, and consider the indicator r.v. of whether an edge {i,j} is present in G.
Conditioning on the rest of G yields great simplification:

P(edge {i, j} | rest) / P(no edge {i, j} | rest) = e^(β′(x(G+) − x(G−)))

where G+ is G with edge {i,j} added and G− is G with it removed.
So use logistic regression? Be careful of variance estimates!
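The quantity x(G+) − x(G−) is the vector of change statistics for toggling edge {i,j}. A minimal sketch for an ERGM with edge-count and triangle-count statistics (this choice of statistics, and the function names, are illustrative, not from the slide): the edge count changes by 1, and the triangle count changes by the number of common neighbors of i and j.

```python
def change_stats(adj, i, j):
    """Change statistics x(G+) - x(G-) for toggling edge {i,j}:
    [change in edge count, change in triangle count]."""
    common = len(set(adj[i]) & set(adj[j]))
    return [1, common]

def cond_log_odds(beta, adj, i, j):
    """log P(edge {i,j} | rest) / P(no edge {i,j} | rest)
    = beta' (x(G+) - x(G-))."""
    return sum(b * s for b, s in zip(beta, change_stats(adj, i, j)))

# Triangle 1-2-3 plus an isolated vertex 4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: set()}
print(cond_log_odds([-1.0, 0.5], adj, 1, 4))  # no common neighbors -> -1.0
print(cond_log_odds([-1.0, 0.5], adj, 1, 2))  # one common neighbor (3) -> -0.5
```

Maximizing the product of these conditional probabilities over all pairs is exactly a logistic regression with the change statistics as covariates, which is why the naive variance estimates from that regression are not trustworthy: the "observations" are not independent.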
MCMC-MLE (Geyer-Thompson '92)

But why ∝ and not =? Don't know the normalizing constant!
From now on we write

P_β(G) = exp(β′ x(G)) / c(β)

Fix some baseline β_0 and estimate the log-likelihood ratio:

l(β) − l(β_0) = (β − β_0)′ x(G) − log( c(β) / c(β_0) )

Writing q_β(G) = exp(β′ x(G)), so that P_β(G) = q_β(G) / c(β), the ratio of normalizing constants is:

c(β) / c(β_0) = E_{β_0}[ q_β(G) / q_{β_0}(G) ]

So can approximate the MLE via MCMC. What about the choice of β_0 though?
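The identity c(β)/c(β_0) = E_{β_0}[q_β(G)/q_{β_0}(G)] can be checked numerically in the degree-based model above, where c(β) is available in closed form and we can sample from P_{β_0} exactly (so plain Monte Carlo stands in for MCMC here). All names below are illustrative:

```python
import math, random

def q(beta, edges):
    """Unnormalized weight q_beta(G) = exp(-sum_i beta_i d_i(G))."""
    deg = [0] * len(beta)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return math.exp(-sum(b * d for b, d in zip(beta, deg)))

def c_exact(beta):
    """Closed-form normalizing constant for the degree-based model."""
    n, out = len(beta), 1.0
    for i in range(n):
        for j in range(i + 1, n):
            out *= 1.0 + math.exp(-(beta[i] + beta[j]))
    return out

def ratio_estimate(beta, beta0, num_samples=20000, seed=0):
    """Monte Carlo estimate of c(beta)/c(beta0) as
    E_{beta0}[ q_beta(G)/q_beta0(G) ], using exact samples from P_beta0."""
    rng = random.Random(seed)
    n = len(beta0)
    total = 0.0
    for _ in range(num_samples):
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if rng.random() < 1.0 / (1.0 + math.exp(beta0[i] + beta0[j]))]
        total += q(beta, edges) / q(beta0, edges)
    return total / num_samples

beta, beta0 = [0.3, -0.2, 0.1, 0.4], [0.0, 0.0, 0.0, 0.0]
print(ratio_estimate(beta, beta0), c_exact(beta) / c_exact(beta0))  # should agree closely
```

In a genuine ERGM c(β) has no closed form, which is exactly why the slide's MCMC route is needed; the variance of this estimator also degrades quickly when β is far from β_0, motivating the closing question about the choice of β_0.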
[Table: network sampling schemes (i.i.d. node, i.i.d. edge, snowball, RDS, short paths) crossed with models (Erdős-Rényi, dyad independent, ERGM, fixed degree, geometric)]
Latent Space Models

Hoff et al (2002) model: each node i has a latent position z_i; edges are conditionally independent given the positions, with the log-odds of an edge between i and j decreasing in the distance between z_i and z_j.
Degrees

[Figure: the example network with each node labeled by its degree]

Normalization?
Closeness
uses the reciprocal of the average shortest distance to other nodes

[Figure: the example network with each node labeled by its closeness score, ranging from 0.39 to 0.56]
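The closeness definition above can be computed with one breadth-first search per node. A minimal sketch, assuming a connected graph (function name is illustrative):

```python
from collections import deque

def closeness(adj, v):
    """Reciprocal of the average shortest-path distance from v to all
    other nodes; assumes the graph is connected."""
    dist = {v: 0}
    queue = deque([v])
    while queue:                      # breadth-first search from v
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != v]
    return len(others) / sum(others)

# Path graph 1-2-3: the middle node is closest to the others.
adj = {1: [2], 2: [1, 3], 3: [2]}
print(closeness(adj, 2))  # -> 1.0 (average distance 1)
print(closeness(adj, 1))  # -> 0.666... (average distance 1.5)
```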
Betweenness
many variations: shortest paths vs. flow maximization vs. all paths vs. random paths
Eigenvector Centrality
use eigenvector of A corresponding to the largest
eigenvalue (Bonacich); more generally, “power centrality”
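The leading eigenvector of A can be sketched via power iteration. One implementation choice here that goes beyond the slide: iterating with A + I instead of A (same leading eigenvector) to avoid oscillation on bipartite graphs.

```python
def eigenvector_centrality(adj, iters=200):
    """Power iteration: repeatedly set x <- (A + I) x and renormalize by
    the largest entry; for a connected graph this converges to the
    eigenvector of A with the largest eigenvalue."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        y = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = max(abs(val) for val in y.values())
        x = {v: val / norm for v, val in y.items()}
    return x

# Star graph: the hub should get the largest centrality.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(eigenvector_centrality(adj))  # hub ~ 1.0, each leaf ~ 0.577 (= 1/sqrt(3))
```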