CS109/Stat121/AC209/E-109

Data Science
Network Models
Hanspeter Pfister & Joe Blitzstein
pfister@seas.harvard.edu / blitzstein@stat.harvard.edu
This Week

HW4 due tonight at 11:59 pm

Friday lab 10-11:30 am in MD G115
Examples from Newman (2003)
FIG. 2 Three examples of the kinds of networks that are the topic of this review. (a) A food web of predator-prey interactions
between species in a freshwater lake [272]. Picture courtesy of Neo Martinez and Richard Williams. (b) The network of
collaborations between scientists at a private research institution [171]. (c) A network of sexual contacts between individuals
in the study by Potterat et al. [342].
A. Types of networks
A set of vertices joined by edges is only the simplest type of network; there are many ways in which networks may be more complex than this (Fig. 3). For instance, there may be more than one different type of vertex in a network, or more than one different type of edge. And vertices or edges may have a variety of properties, numerical or otherwise, associated with them. Taking the example of a social network of people, the vertices may represent men or women, people of different nationalities, locations, ages, incomes, or many other things. Edges may represent friendship, but they could also represent animosity, or professional acquaintance, or geographical proximity. They can carry weights, representing, say, how well two people know each other. They can also be directed, pointing in only one direction. Graphs composed of directed edges are themselves called directed graphs or sometimes digraphs, for short. A graph representing telephone calls or email messages between individuals would be directed, since each message goes in only one direction. Directed graphs can be either cyclic, meaning they contain closed loops of edges, or acyclic, meaning they do not. Some networks, such as food webs, are approximately but not perfectly acyclic.
One can also have hyperedges: edges that join more than two vertices together. Graphs containing such edges are called hypergraphs. Hyperedges could be used to indicate family ties in a social network, for example: n individuals connected to each other by virtue of belonging to the same immediate family could be represented by an n-edge joining them. Graphs may also be naturally partitioned in various ways. We will see a number of examples in this review of bipartite graphs: graphs that contain vertices of two distinct types, with edges running only between unlike types. So-called affiliation networks
Graphs
A graph G=(V,E) consists of a vertex set V and an
edge set E containing unordered pairs {i,j} of vertices.
[Figure: two example drawings, labeled "graph" and "multigraph"]
The degree of vertex v is the number of
edges attached to it.
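A minimal sketch of the definition above, assuming a simple graph stored as a vertex set plus a set of unordered pairs. The function name `degrees` and the small example graph are hypothetical; this is not the graph drawn on the slide.

```python
from collections import Counter

def degrees(vertices, edges):
    """Return {vertex: degree} for a simple graph given as unordered pairs {i, j}."""
    deg = Counter({v: 0 for v in vertices})   # isolated vertices keep degree 0
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return dict(deg)

# Hypothetical example: V = {1, 2, 3, 4}, E = {{1,2}, {1,3}, {2,3}, {3,4}}
V = {1, 2, 3, 4}
E = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (3, 4)]}
print(degrees(V, E))
```

Each edge contributes to two degrees, so the degrees always sum to twice the number of edges.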
A Plea for Clarity: What is a Network?

graph vs. multigraph (are loops, multiple edges ok?
What is a “simple” graph?)

directed vs. undirected

weighted vs. unweighted

dynamics of vs. dynamics on

labeled vs. unlabeled

network as quantity of interest vs. quantities of
interest on networks
Why model networks?

Hard to interpret “hairballs”.

We can define some interesting features
(statistics) of a network, such as measures of
clustering, and compare the observed values
against those of a model

Warning: much of the network literature
carelessly ignores the way in which the network
data were gathered (sampling) and whether
there are missing/unknown nodes or edges!
Erdős-Rényi Random Graph Model

For each pair of vertices, independently flip a coin with prob. p of heads; include the edge if heads.

Let n get large and p get small, with the average
degree c = (n-1)p held constant.

What happens for c < 1?

What happens for c > 1?

What happens for c = 1?
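One way to explore the three questions is by simulation. The classical result is that for c < 1 all components are small (order log n), for c > 1 a giant component containing a positive fraction of the vertices appears, and at c = 1 the largest component has order n^(2/3). The sketch below, with the hypothetical helper `largest_component_fraction`, samples one graph with p = c/(n-1) and tracks components with union-find.

```python
import random

def largest_component_fraction(n, c, rng):
    """Sample G(n, p) with p = c/(n-1); return the largest component's share of vertices."""
    p = c / (n - 1)
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:            # coin flip for edge {i, j}
                parent[find(i)] = find(j)   # merge the two components
    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

rng = random.Random(0)
for c in (0.5, 1.0, 2.0):
    print(c, round(largest_component_fraction(500, c, rng), 3))
```

A single draw at moderate n is noisy near c = 1, so repeating over several seeds and several n gives a clearer picture.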
Degree Sequences
[Figure: example graph on vertices 1 to 9]
Take $V = \{1, \dots, n\}$ and let $d_i$ be the degree of vertex $i$.
The degree sequence of $G$ is $d = (d_1, \dots, d_n)$.
n = 9, d = (3, 4, 3, 3, 4, 3, 3, 3, 2)
A sequence d is graphical if there is a graph G with degree sequence d.
G is a realization of d.
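Whether a sequence is graphical can be checked mechanically. The sketch below uses the Havel-Hakimi procedure, which is one standard test and not necessarily the one intended in the lecture; the function name `is_graphical` is an assumption.

```python
def is_graphical(d):
    """Havel-Hakimi test: can d be the degree sequence of a simple graph?"""
    seq = sorted(d, reverse=True)
    if any(x < 0 for x in seq) or sum(seq) % 2 == 1:
        return False
    while seq and seq[0] > 0:
        k = seq.pop(0)                 # connect the largest-degree vertex...
        if k > len(seq):
            return False               # ...but there are not enough other vertices
        for idx in range(k):           # ...to the k next-largest vertices
            seq[idx] -= 1
            if seq[idx] < 0:
                return False
        seq.sort(reverse=True)
    return True

print(is_graphical((3, 4, 3, 3, 4, 3, 3, 3, 2)))  # the sequence with n = 9 above
print(is_graphical((3, 3, 1, 1)))                 # even sum, yet not graphical
```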
MCMC on Networks
mixing times, burn-in, bottlenecks, autocorrelation,...
Switchings Chain
[Figure: a switching on vertices 1, 2, 3, 4, shown before and after]
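A minimal sketch of one step of a switchings chain, assuming the common variant that picks two edges uniformly at random, proposes swapping their endpoints, and stays put if the swap would create a loop or a multiple edge. The names `switching_step` and `degree_sequence` and the starting graph are hypothetical.

```python
import random

def switching_step(edges, rng):
    """One switching on a simple graph stored as a set of frozenset edges."""
    (a, b), (c, d) = (tuple(e) for e in rng.sample(sorted(edges, key=sorted), 2))
    if rng.random() < 0.5:             # choose one of the two possible rewirings
        c, d = d, c
    new1, new2 = frozenset((a, d)), frozenset((c, b))
    if len(new1) < 2 or len(new2) < 2 or new1 in edges or new2 in edges:
        return edges                   # rejected (loop or multiple edge): no change
    return (edges - {frozenset((a, b)), frozenset((c, d))}) | {new1, new2}

def degree_sequence(edges, n):
    deg = [0] * (n + 1)
    for e in edges:
        for v in e:
            deg[v] += 1
    return deg[1:]

rng = random.Random(0)
G = {frozenset(p) for p in [(1, 2), (3, 4), (1, 3), (2, 4), (4, 5)]}
start = degree_sequence(G, 5)
for _ in range(1000):
    G = switching_step(G, rng)
print(start == degree_sequence(G, 5))  # every step preserves all degrees
```

Each vertex loses one edge and gains one edge in an accepted switching, which is why the chain moves only among realizations of the same degree sequence.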
Power Laws

Power-law (a.k.a. scale-free) networks: the number of
vertices of degree k is proportional to $k^{-\gamma}$

Stumpf et al (2005): Subnets of scale-free networks are
not scale-free, especially for large $\gamma$


Their subnets are i.i.d. node-based.

What about features other than degree distributions?
p1 Model (Holland-Leinhardt 1981)
3.4 The $p_1$ Model for Social Networks
A conceptually separate thread of research developed in parallel in the statistics and social sciences literature, starting with the introduction of the $p_1$ model. Consider a directed graph on the set of n nodes. Holland and Leinhardt's $p_1$ model focuses on dyadic pairings and keeps track of whether node i links to j, j to i, neither, or both. It contains the following parameters:
• $\theta$: a base rate for edge propagation,
• $\alpha_i$ (expansiveness): the effect of an outgoing edge from i,
• $\beta_j$ (popularity): the effect of an incoming edge into j,
• $\rho_{ij}$ (reciprocation/mutuality): the added effect of reciprocated edges.
Let $P_{ij}(0, 0)$ be the probability for the absence of an edge between i and j, $P_{ij}(1, 0)$ the probability of i linking to j ("1" indicates the outgoing node of the edge), $P_{ij}(1, 1)$ the probability of i linking to j and j linking to i. The $p_1$ model posits the following probabilities (see [149]):
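$$\log P_{ij}(0, 0) = \lambda_{ij}, \tag{3.1}$$
$$\log P_{ij}(1, 0) = \lambda_{ij} + \alpha_i + \beta_j + \theta, \tag{3.2}$$
$$\log P_{ij}(0, 1) = \lambda_{ij} + \alpha_j + \beta_i + \theta, \tag{3.3}$$
$$\log P_{ij}(1, 1) = \lambda_{ij} + \alpha_i + \beta_j + \alpha_j + \beta_i + 2\theta + \rho_{ij}. \tag{3.4}$$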
In this representation of $p_1$, $\lambda_{ij}$ is a normalizing constant to ensure that the probabilities for each dyad (i, j) add to 1. For our present purposes, assume that the dyad is in one and only one of the four possible states. The reciprocation effect, $\rho_{ij}$, implies that the odds of observing a mutual dyad, with an edge from node i to node j and one from j to i, are enhanced by a factor of $\exp(\rho_{ij})$ over and above what we would expect if the edges occurred independently of one another.
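To make the role of the normalizing constant and the reciprocation effect concrete, here is a small sketch that evaluates (3.1)-(3.4) for a single dyad. The function name `p1_dyad_probs` and the parameter values are hypothetical.

```python
import math

def p1_dyad_probs(theta, alpha_i, beta_i, alpha_j, beta_j, rho_ij):
    """Probabilities of the four dyad states (0,0), (1,0), (0,1), (1,1) under p1."""
    log_w = {                           # right-hand sides of (3.1)-(3.4) without lambda_ij
        (0, 0): 0.0,
        (1, 0): alpha_i + beta_j + theta,
        (0, 1): alpha_j + beta_i + theta,
        (1, 1): alpha_i + beta_j + alpha_j + beta_i + 2 * theta + rho_ij,
    }
    log_norm = math.log(sum(math.exp(v) for v in log_w.values()))  # equals -lambda_ij
    return {state: math.exp(v - log_norm) for state, v in log_w.items()}

# Hypothetical parameter values, for illustration only.
P = p1_dyad_probs(theta=-1.0, alpha_i=0.3, beta_i=-0.2, alpha_j=0.1, beta_j=0.4, rho_ij=0.8)
odds_ratio = P[(1, 1)] * P[(0, 0)] / (P[(1, 0)] * P[(0, 1)])
print(sum(P.values()), odds_ratio, math.exp(0.8))
```

The cross-product ratio P(1,1)P(0,0) / (P(1,0)P(0,1)) equals exp(ρ_ij), and it is 1 when ρ_ij = 0, which is the independent-edges case.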
The problem with this general $p_1$ representation is that there is a lack of identification of the reciprocation parameters. The following special cases of $p_1$ are identifiable and of special interest:
1. $\alpha_i = 0$, $\beta_j = 0$, and $\rho_{ij} = 0$. This is basically an Erdős-Rényi-Gilbert model for directed graphs: each directed edge has the same probability of appearance.
2. $\rho_{ij} = 0$, no reciprocal effect. This model effectively focuses solely on the degree distributions into and out of nodes.
3. $\rho_{ij} = \rho$, constant reciprocation. This was the version of $p_1$ studied in depth by Holland and Leinhardt using maximum likelihood estimation.
ERGMs (Exponential Random Graph Models)
Excerpt from J. Blitzstein and P. Diaconis:
More formally, define a probability measure $P_\beta$ on the space of all graphs on n vertices by
$$P_\beta(G) = Z^{-1} \exp\left(-\sum_{i=1}^{n} \beta_i d_i(G)\right),$$
where Z is a normalizing constant. The real parameters $\beta_1, \dots, \beta_n$ are chosen to achieve given expected degrees. This model appears explicitly in Park and Newman [59], using the tools and language of statistical mechanics.
Holland and Leinhardt [35] give iterative algorithms for the maximum likelihood estimators of the parameters, and Snijders [65] considers MCMC methods. Techniques of Haberman [31] can be used to prove that the maximum likelihood estimates of the $\beta_i$ are consistent and asymptotically normal as $n \to \infty$, provided that there is a constant B such that $|\beta_i| \le B$ for all i.
Such exponential models are standard fare in statistics, statistical mechanics, and social networking (where they are called $p^*$ models). They are used for directed graphs in Holland and Leinhardt [35] and for graphs in Frank and Strauss [27, 70] and Snijders [65, 66], with a variety of sufficient statistics (see the surveys in [3], [56], and [66]). One standard motivation for using the probability measure $P_\beta$ when the degree sequence is the main feature of interest is that this model gives the maximum entropy distribution on graphs with a given expected degree sequence (see Lauritzen [43] for further discussion of this). Unlike most other exponential models on graphs, the normalizing constant Z is available in closed form. Furthermore, there is an easy method of sampling exactly from $P_\beta$, as shown by the following. The same formulas are given in [59], but for completeness we provide a brief proof.
Lemma 1. Fix real parameters $\beta_1, \dots, \beta_n$. Let $Y_{ij}$ be independent binary random variables for $1 \le i < j \le n$, with
$$P(Y_{ij} = 1) = \frac{e^{-(\beta_i + \beta_j)}}{1 + e^{-(\beta_i + \beta_j)}} = 1 - P(Y_{ij} = 0).$$
Form a random graph G by creating an edge between i and j if and only if $Y_{ij} = 1$. Then G is distributed according to $P_\beta$, with
$$Z = \prod_{1 \le i < j \le n} \left(1 + e^{-(\beta_i + \beta_j)}\right).$$
Proof. Let G be a graph and $y_{ij} = 1$ if {i, j} is an edge of G, $y_{ij} = 0$ otherwise. Then the probability of G under the above procedure is
$$P(Y_{ij} = y_{ij} \text{ for all } i, j) = \prod_{i<j} \frac{e^{-y_{ij}(\beta_i + \beta_j)}}{1 + e^{-(\beta_i + \beta_j)}}.$$
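Lemma 1 translates directly into a sampler, and the closed form for Z can be checked by brute force on a tiny vertex set. The sketch below uses 0-indexed vertices; the function names and the values of β are hypothetical.

```python
import itertools
import math
import random

def edge_prob(beta, i, j):
    """P(Y_ij = 1) from Lemma 1."""
    w = math.exp(-(beta[i] + beta[j]))
    return w / (1 + w)

def sample_graph(beta, rng):
    """Exact sample from P_beta: one independent coin flip per pair i < j."""
    n = len(beta)
    return {(i, j) for i, j in itertools.combinations(range(n), 2)
            if rng.random() < edge_prob(beta, i, j)}

def closed_form_Z(beta):
    return math.prod(1 + math.exp(-(beta[i] + beta[j]))
                     for i, j in itertools.combinations(range(len(beta)), 2))

def brute_force_Z(beta):
    """Sum exp(-sum_i beta_i d_i(G)) over all graphs; feasible only for tiny n."""
    pairs = list(itertools.combinations(range(len(beta)), 2))
    total = 0.0
    for present in itertools.product((0, 1), repeat=len(pairs)):
        deg = [0] * len(beta)
        for (i, j), y in zip(pairs, present):
            deg[i] += y
            deg[j] += y
        total += math.exp(-sum(b * d for b, d in zip(beta, deg)))
    return total

beta = [0.3, -0.5, 1.0, 0.2]            # hypothetical parameters, n = 4
print(closed_form_Z(beta), brute_force_Z(beta))
print(sample_graph(beta, random.Random(0)))
```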
How can we test and fit this model?
How can we use this model?
Pseudolikelihood (Strauss-Ikeda ’90)
Fix a pair of nodes {i,j}, and consider the indicator r.v. of
whether an edge {i,j} is present in G.
Conditioning on the rest of G yields great simplification:
$$\frac{P(\text{edge } \{i, j\} \mid \text{rest})}{P(\text{no edge } \{i, j\} \mid \text{rest})} = e^{\beta^\top (x(G^+) - x(G^-))}$$
So use logistic regression? Be careful of variance estimates!
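The conditional odds depend on G only through the change statistic x(G+) - x(G-), where G+ and G- are G with the edge {i,j} present and absent. A minimal sketch follows, assuming (purely for illustration) that x(G) counts edges and triangles; the function names are hypothetical. These change statistics are the covariates a logistic regression would use.

```python
import itertools
import math

def stats(edges, n):
    """x(G) = (number of edges, number of triangles) for vertices 0..n-1."""
    tri = sum(1 for a, b, c in itertools.combinations(range(n), 3)
              if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edges)
    return (len(edges), tri)

def conditional_edge_prob(edges, n, i, j, beta):
    """P(edge {i,j} present | rest of G), computed from x(G+) - x(G-)."""
    e = frozenset((i, j))
    x_plus, x_minus = stats(edges | {e}, n), stats(edges - {e}, n)
    log_odds = sum(b * (p - m) for b, p, m in zip(beta, x_plus, x_minus))
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical example: adding {1,2} to the path 1-0-2 closes one triangle.
G = {frozenset((0, 1)), frozenset((0, 2))}
print(conditional_edge_prob(G, 3, 1, 2, beta=(-1.0, 0.5)))
```

Because the dyads are treated as if independent, standard errors reported by an off-the-shelf logistic regression fit are not trustworthy here.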
MCMCMLE (Geyer-Thompson ’92)
But why ∝ and not =?
Don’t know the normalizing constant!
From now on we write
$$P_\beta(G) = \frac{\exp(\beta^\top x(G))}{c(\beta)} = \frac{q_\beta(G)}{c(\beta)}$$
Fix some baseline $\beta_0$ and estimate log-likelihood ratio.
$$l(\beta) - l(\beta_0) = (\beta - \beta_0)^\top x(G) - \log \frac{c(\beta)}{c(\beta_0)}$$
Ratio of normalizing constants is:
$$\frac{c(\beta)}{c(\beta_0)} = E_{\beta_0}\left[\frac{q_\beta(G)}{q_{\beta_0}(G)}\right]$$
So can approximate the MLE via MCMC.
What about the choice of $\beta_0$ though?
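The identity for the ratio of normalizing constants can be checked numerically in a case where everything is known. The sketch below assumes the simplest ERGM, with x(G) equal to the number of edges, so that c(β) = (1 + e^β)^N for N vertex pairs and draws from the baseline can be made exactly instead of by MCMC. The function names are hypothetical.

```python
import math
import random

def estimate_c_ratio(beta, beta0, n_pairs, n_samples, rng):
    """Monte Carlo estimate of c(beta)/c(beta0) = E_{beta0}[q_beta(G) / q_beta0(G)]."""
    p0 = math.exp(beta0) / (1 + math.exp(beta0))    # edge probability under beta0
    total = 0.0
    for _ in range(n_samples):
        x = sum(rng.random() < p0 for _ in range(n_pairs))  # x(G) = edge count
        total += math.exp((beta - beta0) * x)                # q_beta / q_beta0
    return total / n_samples

def exact_c_ratio(beta, beta0, n_pairs):
    """Closed form available only because the edges are independent in this toy model."""
    return ((1 + math.exp(beta)) / (1 + math.exp(beta0))) ** n_pairs

rng = random.Random(0)
print(estimate_c_ratio(0.3, 0.0, 10, 20000, rng), exact_c_ratio(0.3, 0.0, 10))
```

Moving β further from β_0 makes the importance weights more variable, so the estimate degrades; this is one reason the choice of baseline matters.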
[Table: column labels i.i.d. node, i.i.d. edge, snowball, RDS; row labels short paths, Erdos, Dyad Indep., ERGM, Fixed degree, Geom]
Latent Space Models
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Hoff et al (2002) model:
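A minimal sketch, assuming the distance form of the latent space model without covariates: each node i has a latent position z_i, and the log-odds of an edge between i and j is α minus the Euclidean distance between z_i and z_j. The function name `latent_edge_prob` and the positions are hypothetical.

```python
import math

def latent_edge_prob(z_i, z_j, alpha):
    """P(edge between i and j) with logit = alpha - Euclidean distance(z_i, z_j)."""
    eta = alpha - math.dist(z_i, z_j)
    return 1 / (1 + math.exp(-eta))

# Hypothetical latent positions in the plane: nearby nodes are more likely to link.
print(latent_edge_prob((0.0, 0.0), (0.5, 0.0), alpha=1.0))
print(latent_edge_prob((0.0, 0.0), (3.0, 4.0), alpha=1.0))
```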
Degrees
[Figure: network with each node labeled by its degree]
Normalization?
Closeness
uses the reciprocal of the average
shortest distance to other nodes
[Figure: network with each node labeled by its closeness]
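A minimal sketch of closeness as defined above, assuming a connected, unweighted, undirected graph with at least two nodes, stored as a hypothetical adjacency dict `adj`. Shortest distances come from breadth-first search.

```python
from collections import deque

def closeness(adj, source):
    """Reciprocal of the average shortest-path distance from source to the other nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:                        # breadth-first search from source
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(dist) - 1) / sum(dist.values())

# Hypothetical example: the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print({v: closeness(path, v) for v in path})
```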
Betweenness
many variations: shortest paths vs. flow maximization vs. all paths vs. random paths
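A sketch of the shortest-path variation only, using Brandes' algorithm on an unweighted, undirected graph. The score of a node sums, over pairs of other nodes, the fraction of shortest paths between them that pass through it. The adjacency dicts are hypothetical examples.

```python
from collections import deque

def betweenness(adj):
    """Shortest-path betweenness (Brandes' algorithm), each unordered pair counted once."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                                  # nodes in order of distance from s
        preds = {v: [] for v in adj}                # predecessors on shortest paths
        sigma = {v: 0 for v in adj}                 # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):                   # accumulate dependencies
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}     # undirected: pairs were counted twice

path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
print(betweenness(path), betweenness(cycle))
```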
Eigenvector Centrality
use eigenvector of A corresponding to the largest
eigenvalue (Bonacich); more generally, “power centrality”
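A minimal sketch of eigenvector centrality by power iteration, assuming a connected undirected graph given as a hypothetical adjacency dict. It iterates with A + I, which has the same eigenvectors as A, so that bipartite graphs also converge; this shift is an implementation choice, not part of the definition.

```python
def eigenvector_centrality(adj, iters=500):
    """Leading eigenvector of the adjacency matrix A, scaled to sum to 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}   # (A + I) x
        total = sum(nxt.values())
        x = {v: val / total for v, val in nxt.items()}
    return x

# Hypothetical example: a star with center "c" and three leaves.
star = {"c": {1, 2, 3}, 1: {"c"}, 2: {"c"}, 3: {"c"}}
cent = eigenvector_centrality(star)
print(cent["c"] / cent[1])   # the center-to-leaf ratio approaches sqrt(3)
```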
