SoICT-Eng - ProbComp - Lec 9 - Random Network Models

Models & Algorithms
for Internet
Computing
Lecture 9: RANDOM NETWORK MODELS
A nice book to discover the field
• Markov chain and basic concepts

• Page Rank Model and Search Engine
• Stationary Distribution
Probability for Computing 3

What is a network (graph) model?
Network models
A brief cover on:
I. Erdos–Renyi random graphs
II. Generalized random graphs
with the same degree distribution as the data networks
III. Small-world networks
IV. Scale-free networks
V. Hierarchical model
VI. Geometric random graphs
The E-R model
Erdos–Renyi random graphs (ER)
• Model a data network G(V,E) with |V|=n and |E|=m
• An ER graph that models G is constructed as follows:
• It has n nodes
• Edges are added between pairs of nodes uniformly at
random with the same probability p
• Two (equivalent) methods for constructing ER graphs:
• Gn,p: pick p so that the resulting model network has m
edges
• Gn,m: pick randomly m pairs of nodes and add edges
between them with probability 1
• Expected number of edges, |E|=m, in Gn,p is: $m = p \, n(n-1)/2$
• Average degree is: $\langle k \rangle = 2m/n = p(n-1) \approx pn$
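The two constructions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the function names `gnp` and `gnm` and the sizes used are assumptions made for the example.

```python
import itertools
import random

def gnp(n, p, rng):
    """G(n,p): keep each of the n(n-1)/2 possible edges independently with probability p."""
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]

def gnm(n, m, rng):
    """G(n,m): choose exactly m distinct node pairs uniformly at random."""
    all_pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(all_pairs, m)

rng = random.Random(0)
n, m = 200, 600                      # hypothetical size of the data network
p = 2 * m / (n * (n - 1))            # pick p so that the expected edge count is m
edges_p = gnp(n, p, rng)
edges_m = gnm(n, m, rng)
print(len(edges_p), len(edges_m))    # the first fluctuates around m, the second is exactly m
print(2 * len(edges_m) / n)          # average degree 2m/n
```

In G(n,p) the edge count is itself random, whereas G(n,m) fixes it exactly; for large n the two behave similarly when p is chosen as above.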

• Many properties of ER can be proven theoretically
(See: Bollobas, "Random Graphs," 2002)
• Example:
• When m = n/2 (average degree 1), the giant component suddenly emerges, i.e.:
• One connected component of the network has O(n) nodes
• The next largest connected component has O(log(n)) nodes
DEGREE DISTRIBUTION OF A RANDOM GRAPH
$P(k) = \binom{N-1}{k} \, p^{k} \, (1-p)^{N-1-k}$
• $\binom{N-1}{k}$: number of ways to select k nodes from N-1
• $p^{k}$: probability of having k edges
• $(1-p)^{N-1-k}$: probability of missing N-1-k edges
As the network size increases, the distribution becomes increasingly narrow: we are increasingly confident that the degree of a node is in the vicinity of <k>.
Network Science: Random Graphs
• The degree distribution is binomial: $P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}$
• For large n, this can be approximated with a Poisson distribution: $P(k) = \frac{z^{k} e^{-z}}{k!}$,
where z is the average degree
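The quality of the Poisson approximation is easy to check numerically. The sketch below compares the binomial degree distribution with its Poisson limit; the function names and the values n = 10,000 and z = 4 are assumptions chosen for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Exact ER degree distribution: k links out of the n-1 possible neighbours."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

def poisson_pmf(k, z):
    """Large-n limit with average degree z."""
    return math.exp(-z) * z**k / math.factorial(k)

n, z = 10_000, 4.0          # hypothetical network size and average degree
p = z / (n - 1)
for k in range(0, 11, 2):
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, z))
```

For a sparse graph of this size the two columns agree to several decimal places.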

• However, many real world networks have
power-law degree distribution
• Clustering coefficient, C, of ER is low (for low p)
• C=p, since the probability p of connecting any two nodes in an ER graph is the same, regardless of whether the nodes share a neighbor
• However, many real world networks have high clustering coefficients
• Average diameter of ER graphs is small
• It is approximately $\ln n / \ln z$, where z is the average degree
• Real networks also have small average diameters
• Summary
Generalized random graphs (ER-DD)
• Preserve the degree distribution of data
(“ER-DD”)
• Constructed as follows:
• An ER-DD network has n nodes
(so does the data)
• Edges are added between pairs of nodes using
the “stubs method” [configuration model
discussed earlier]
• The “stubs method” for constructing ER-DD
graphs:
• The number of “stubs” (to be filled by edges) is
assigned to each node in the model network
according to the degree distribution of the real
network to be modeled
• Edges are created between pairs of nodes with
“available” stubs picked at random
• After an edge is created, the number of stubs left
available at the corresponding “end nodes” of the
edges is decreased by one
• Multiple edges between the same pair of nodes are
not allowed
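The stubs method described above can be sketched as follows. This is a minimal, unoptimized illustration: the restart-on-dead-end strategy, the exclusion of self-loops, and the name `stubs_method` are assumptions of this sketch, not details given in the slides.

```python
import random

def stubs_method(degrees, rng, max_restarts=1000):
    """Connect random pairs of nodes that still have free stubs, without multiple edges."""
    for _ in range(max_restarts):
        stubs = list(degrees)          # stubs still available at each node
        edges = set()
        while any(stubs):
            free = [v for v, s in enumerate(stubs) if s > 0]
            # candidate pairs: two distinct nodes with free stubs, not yet connected
            candidates = [(u, v) for i, u in enumerate(free)
                          for v in free[i + 1:] if (u, v) not in edges]
            if not candidates:
                break                  # dead end: start over (assumption of this sketch)
            u, v = rng.choice(candidates)
            edges.add((u, v))
            stubs[u] -= 1              # one stub used at each end node
            stubs[v] -= 1
        else:
            return edges               # all stubs were filled
    raise ValueError("no simple graph found for this degree sequence")

degrees = [3, 2, 2, 2, 1]              # hypothetical degree sequence taken from the data
print(sorted(stubs_method(degrees, random.Random(0))))
```

Enumerating all candidate pairs at each step is quadratic in the number of nodes, which is acceptable for a small example but not for large networks.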
• Summary
• 2 global network properties are matched by ER-DD

Small-world networks (SW)
• Watts and Strogatz, 1998
• Created from regular ring lattices by random rewiring of a small percentage of their edges
Small-world networks (SW)
• SW networks have:
• High clustering coefficients – introduced by “ring
regularity”
• Large average diameters of regular lattices – fixed
by randomly re-wiring a small percentage of edges
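A minimal sketch of the rewiring construction, assuming a ring lattice in which each node is linked to its k nearest neighbours and each lattice edge is rewired with probability `beta` (the parameter names and the rule of keeping one endpoint fixed are assumptions of this example):

```python
import random

def small_world(n, k, beta, rng):
    """Ring lattice (k even, n > k) with a fraction of edges rewired at random."""
    lattice = [(i, (i + j) % n) for i in range(n) for j in range(1, k // 2 + 1)]
    adj = {i: set() for i in range(n)}
    for u, v in lattice:
        adj[u].add(v)
        adj[v].add(u)
    for u, v in lattice:
        if rng.random() < beta:
            # move the far end of the edge to a node that u is not yet linked to
            choices = [w for w in range(n) if w != u and w not in adj[u]]
            if choices:
                w = rng.choice(choices)
                adj[u].remove(v)
                adj[v].remove(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj

adj = small_world(n=20, k=4, beta=0.1, rng=random.Random(0))
print(sum(len(nb) for nb in adj.values()) // 2)   # number of edges is preserved: n*k/2
```

With `beta = 0` the result is the regular lattice; a small positive `beta` adds the few long-range links that shorten distances.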
• Summary
Scale-free networks (SF)
• Power-law degree distributions: $P(k) \sim k^{-\gamma}$
• γ > 0; typically 2 < γ < 3
• Different models exist, e.g.:
• A popular one is:

• Preferential Attachment Model (SF-BA)
(Barabasi-Albert, 1999)
• Preferential Attachment Model (SF-BA)
• “Growth” model: nodes are added to an existing
network
• New nodes preferentially attach to existing nodes with
probability proportional to the degrees of the existing
nodes; e.g.:
• This is repeated until the size of SF network matches
the size of the data
• “Rich getting richer”
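The growth process above can be sketched as follows. The seed network (a small complete graph), the number `m` of links per new node, and the rejection of repeated targets are assumptions of this illustration; the slides do not fix these details.

```python
import random

def preferential_attachment(n, m, rng):
    """Grow a network to n nodes; each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    # seed network (assumption): complete graph on m + 1 nodes
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # each node appears in `targets` once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw
    targets = [x for e in edges for x in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:          # redraw on repeats to avoid multiple edges
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets.extend((new, t))
    return edges

edges = preferential_attachment(n=500, m=2, rng=random.Random(0))
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(max(degree.values()), sum(degree.values()) / len(degree))   # hub degree vs. average degree
```

The oldest nodes accumulate far more links than the average node, which is the "rich getting richer" effect.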
• Summary
Hierarchical model
• Preserves network “modularity” via a fractal-
like generation of the network
Hierarchical model
• These graphs do not match any biological data
and are highly unlikely to be found in data sets
Geometric random graphs
• “Uniform” geometric random graphs (GEO)
• Take any metric space and, using a uniform random
distribution, place nodes within the space
• If any two nodes are within distance r of each other (calculated via any chosen distance norm for the space), they are connected
• Choose r so that the size of the GEO network matches
that of the data
• There are many possible metric spaces (e.g., Euclidean
space)
• There are many possible distance norms
(e.g. the Euclidean distance, the Chessboard distance,
and the Manhattan/Taxi Driver distance)
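A minimal sketch of a uniform geometric random graph on the unit square, using the three distance norms named above. The function names, the choice of the unit square, and the values n = 200 and r = 0.1 are assumptions made for the example.

```python
import itertools
import math
import random

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def chessboard(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def geometric_graph(points, r, dist):
    """Connect every pair of points whose distance under `dist` is at most r."""
    return [(i, j) for i, j in itertools.combinations(range(len(points)), 2)
            if dist(points[i], points[j]) <= r]

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]   # uniform placement
for name, dist in [("chessboard", chessboard), ("euclidean", euclidean), ("manhattan", manhattan)]:
    print(name, len(geometric_graph(points, 0.1, dist)))
```

For the same points and radius the chessboard norm yields the most edges and the Manhattan norm the fewest, so r has to be tuned separately for each norm to match the size of the data.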
Geometric random graphs
• “Uniform” geometric random graphs (GEO)
• Summary
Stochastic Block Models
• M matrix (constant p) → Erdos Renyi model
• M matrix (not constant) → Erdos Renyi within each community; random bipartite across communities.
• Vertices within a community are considered exchangeable (i.e. probabilistically equivalent
with respect to their interactions with other vertices)
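Sampling from a stochastic block model can be sketched as below: the probability of an edge depends only on the communities of its two endpoints, through the matrix M. The names `sample_sbm` and `block_of`, and the numbers in M, are assumptions made for the example.

```python
import random

def sample_sbm(block_of, M, rng):
    """Undirected SBM: nodes u and v are linked with probability M[block_of[u]][block_of[v]]."""
    n = len(block_of)
    return [(u, v) for u in range(n) for v in range(u + 1, n)
            if rng.random() < M[block_of[u]][block_of[v]]]

block_of = [0] * 50 + [1] * 50              # two communities of 50 nodes each
M = [[0.30, 0.02],
     [0.02, 0.30]]                          # dense within communities, sparse across
edges = sample_sbm(block_of, M, random.Random(0))
within = sum(block_of[u] == block_of[v] for u, v in edges)
print(within, len(edges) - within)          # edges inside communities vs. across them
```

Setting all entries of M to the same value p recovers the Erdos Renyi model; making the off-diagonal entries larger than the diagonal ones gives the opposite (disassortative) pattern.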
SBM: Representation and Instantiation
SBM: ASSORTATIVE NETWORKS
SBM: Disassortative Networks
SBM contd.
• A number of other composable models can be viewed as
Stochastic Block Models
• One may compose/create multiple generative models in this
fashion
• Can handle directed networks (M is not symmetric in this case)

• Can potentially model all three properties of real networks one
is often interested in!
Random network
models
MORE ON ERDOS-RENYI RANDOM GRAPH
RANDOM NETWORK MODEL
Pál Erdős (1913-1996), Alfréd Rényi (1921-1970)
Erdös-Rényi model (1960)
Connect each pair of nodes with probability p
Example: p = 1/6, N = 10, <k> ≈ 1.5
Network Science: Random Graphs 2012

Definition:
A random graph is a graph of N labeled nodes where each pair of nodes is connected with a preset probability p.
We will call it G(N, p).

Examples: p = 1/6, N = 12; p = 0.03, N = 100
Note: No node has a very high degree. Rather, it is very unlikely for one
node to have a very high degree. Why? (HW question)

N and p do not uniquely define the network: we can have many different realizations of it. How many?
Example: N = 10, p = 1/6
The probability to form a particular graph G(N,p) with L links:
$P(G(N,p)) = p^{L} (1-p)^{\frac{N(N-1)}{2} - L}$
That is, each graph G(N,p) appears with probability P(G(N,p)).

P(L): the probability to have exactly L links in a network of N nodes and probability p:
$P(L) = \binom{N(N-1)/2}{L} \, p^{L} (1-p)^{\frac{N(N-1)}{2} - L}$
• $N(N-1)/2$: the maximum number of links in a network of N nodes.
• $\binom{N(N-1)/2}{L}$: number of different ways we can choose L links among all potential links.
Binomial distribution...

MATH TUTORIAL the mean of a binomial distribution
There is a faster way using generating functions, see:
http://planetmath.org/encyclopedia/BernoulliDistribution2.html
MATH TUTORIAL the variance of a binomial distribution
$\sigma^{2}(X) = E(X^{2}) - [E(X)]^{2}$
http://keral2008.blogspot.com/2008/10/derivation-of-mean-and-variance-of.html
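For reference, one standard route to the mean and variance is sketched below. It is not necessarily the derivation shown on the original slides; the symbol n denotes the number of independent trials (for the link count of a random graph, n = N(N-1)/2).

```latex
\begin{align*}
E[X] &= \sum_{k=0}^{n} k \binom{n}{k} p^{k} (1-p)^{n-k}
      = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} = np, \\
E[X(X-1)] &= n(n-1)p^{2} \sum_{k=2}^{n} \binom{n-2}{k-2} p^{k-2} (1-p)^{n-k} = n(n-1)p^{2}, \\
\sigma^{2}(X) &= E[X^{2}] - (E[X])^{2} = n(n-1)p^{2} + np - n^{2}p^{2} = np(1-p).
\end{align*}
```

Both sums equal 1 because they are the total probability of a binomial distribution with n-1 and n-2 trials, respectively.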
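MATH TUTORIAL Binomial Distribution: The bottom line
• P(L): the probability to have a network of exactly L links: $P(L) = \binom{N(N-1)/2}{L} \, p^{L} (1-p)^{\frac{N(N-1)}{2} - L}$
• The average number of links <L> in a random graph: $\langle L \rangle = p \, \frac{N(N-1)}{2}$
• The standard deviation: $\sigma_L = \sqrt{p(1-p) \, \frac{N(N-1)}{2}}$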

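$P(k) = \binom{N-1}{k} \, p^{k} \, (1-p)^{N-1-k}$
• $\binom{N-1}{k}$: number of ways to select k nodes from N-1
• $p^{k}$: probability of having k edges
• $(1-p)^{N-1-k}$: probability of missing N-1-k edges
With $\langle k \rangle = p(N-1)$ and $\sigma_k^{2} = p(1-p)(N-1)$, the relative width is $\sigma_k / \langle k \rangle = \sqrt{\frac{1-p}{p(N-1)}}$, which for fixed p shrinks as $1/\sqrt{N-1}$.
As the network size increases, the distribution becomes increasingly narrow: we are increasingly confident that the degree of a node is in the vicinity of <k>.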
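For large N and small k, we can use the following approximations:
$\binom{N-1}{k} = \frac{(N-1)(N-2)\cdots(N-k)}{k!} \approx \frac{(N-1)^{k}}{k!}$
$\ln[(1-p)^{(N-1)-k}] = (N-1-k) \ln\left(1 - \frac{\langle k \rangle}{N-1}\right) \approx -(N-1-k) \frac{\langle k \rangle}{N-1} = -\langle k \rangle \left(1 - \frac{k}{N-1}\right) \approx -\langle k \rangle$
using $p = \langle k \rangle/(N-1)$ and $\ln(1+x) \approx x$ for $|x| \ll 1$. Hence
$P(k) \approx \frac{(N-1)^{k}}{k!} \left(\frac{\langle k \rangle}{N-1}\right)^{k} e^{-\langle k \rangle} = e^{-\langle k \rangle} \frac{\langle k \rangle^{k}}{k!}$
[Figure: P(k) versus k]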

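DEGREE DISTRIBUTION OF A RANDOM NETWORK
Exact result (binomial distribution): $P(k) = \binom{N-1}{k} p^{k} (1-p)^{N-1-k}$
Large N limit (Poisson distribution): $P(k) = e^{-\langle k \rangle} \frac{\langle k \rangle^{k}}{k!}$
Probability Distribution Function (PDF)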

NODES HAVE COMPARABLE DEGREES IN RANDOM NETWORKS
What does it mean? Continuum formalism:
If we consider a network with average degree <k> then the probability to have a node whose degree exceeds a degree k0 is:
$P(k > k_0) = \sum_{k = k_0 + 1}^{\infty} e^{-\langle k \rangle} \frac{\langle k \rangle^{k}}{k!}$
For example, with <k>=10,
• the probability to find a node whose degree is at least twice the average degree is 0.00158826.
• the probability to find a node whose degree is at least ten times the average degree is $1.79967152 \times 10^{-13}$.
• the probability to find a node whose degree is less than a tenth of the average degree is 0.00049.
What does it mean? Discrete formalism:
• The probability of seeing a node with very high or very low degree is exponentially small.
• Most nodes have comparable degrees.
• The larger the size of a random network, the more similar are the node degrees.
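Tail probabilities of this kind can be evaluated numerically. The sketch below assumes Poisson-distributed degrees and sums the tail directly in log space to avoid overflow; the function names and the truncation after 500 terms are assumptions of this example.

```python
import math

def poisson_pmf(k, mean_k):
    """Poisson probability of degree k, computed in log space for numerical safety."""
    return math.exp(-mean_k + k * math.log(mean_k) - math.lgamma(k + 1))

def prob_degree_exceeds(k0, mean_k, terms=500):
    """P(k > k0): sum of the Poisson tail, truncated after `terms` terms."""
    return sum(poisson_pmf(k, mean_k) for k in range(k0 + 1, k0 + 1 + terms))

# <k> = 10: probability that a node has more than twice the average degree
print(prob_degree_exceeds(20, 10))   # roughly 0.0016, in line with the first value quoted above
```

Summing the tail directly, rather than computing one minus the cumulative sum, keeps the result meaningful even when the probability is far below floating-point precision.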
NO OUTLIERS IN A RANDOM SOCIETY
According to sociological research, for a typical individual k ~ 1,000.
The probability to find an individual with degree k > 2,000 is $10^{-27}$.
Given that N ~ $10^{9}$, the chance of finding an individual with 2,000 acquaintances is so tiny that such nodes are virtually non-existent in a random society.
A random society would consist mainly of average individuals, with everyone having roughly the same number of friends.
It would lack outliers, individuals that are either highly popular or reclusive.

SIX DEGREES small worlds
[Figure: chain of acquaintances: Sarah, Jane, Ralph, Peter]
Frigyes Karinthy, 1929
Stanley Milgram, 1967
SIX DEGREES 1967: Stanley Milgram
HOW TO TAKE PART IN THIS STUDY
1. ADD YOUR NAME TO THE ROSTER AT THE BOTTOM OF THIS SHEET, so that the next
person who receives this letter will know who it came from.
2. DETACH ONE POSTCARD. FILL IT AND RETURN IT TO HARVARD UNIVERSITY. No stamp is needed. The postcard is very important. It allows us to keep track of the progress of the folder as it moves toward the target person.
3. IF YOU KNOW THE TARGET PERSON ON A PERSONAL BASIS, MAIL THIS FOLDER
DIRECTLY TO HIM (HER). Do this only if you have previously met the target person and know each
other on a first name basis.
4. IF YOU DO NOT KNOW THE TARGET PERSON ON A PERSONAL BASIS, DO NOT TRY TO
CONTACT HIM DIRECTLY. INSTEAD, MAIL THIS FOLDER (POST CARDS AND ALL) TO A PERSONAL
ACQUAINTANCE WHO IS MORE LIKELY THAN YOU TO KNOW THE TARGET PERSON. You may send
the folder to a friend, relative or acquaintance, but it must be someone you know on a first name
basis.

DISTANCES IN RANDOM GRAPHS
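Random graphs tend to have a tree-like topology (???) with almost constant node degrees.
• nr. of first neighbors: $N_1 \approx \langle k \rangle$
• nr. of second neighbors: $N_2 \approx \langle k \rangle^{2}$
• nr. of neighbours at distance d: $N_d \approx \langle k \rangle^{d}$
• estimate maximum distance: $1 + \sum_{l=1}^{l_{max}} \langle k \rangle^{l} = N \Rightarrow l_{max} \approx \frac{\log N}{\log \langle k \rangle}$
d for the world = log (7 billion) / log (1000) = 3.28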
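The estimate for the world can be reproduced in two ways: with the logarithmic formula, and by counting how many tree-like neighbour shells are needed to reach N nodes. The function names are assumptions of this sketch.

```python
import math

def max_distance_estimate(n_nodes, mean_k):
    """Logarithmic estimate: log N / log <k>."""
    return math.log(n_nodes) / math.log(mean_k)

def shells_needed(n_nodes, mean_k):
    """Smallest d such that 1 + <k> + <k>^2 + ... + <k>^d reaches n_nodes."""
    reached, d = 1, 0
    while reached < n_nodes:
        d += 1
        reached += mean_k ** d
    return d

print(round(max_distance_estimate(7e9, 1000), 2))   # 3.28
print(shells_needed(7e9, 1000))                     # 4
```

The shell count is simply the logarithmic estimate rounded up to a whole number of steps.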
DISTANCES IN RANDOM GRAPHS compare with real data
$l_{max} = \frac{\log N}{\log \langle k \rangle}$
Given the huge differences in scope, size, and average degree, the agreement is excellent.
EVOLUTION OF A RANDOM NETWORK
Until now we focused on the static properties of a random graph with a fixed p value.
What happens when we vary the parameter p?
GOTO http://cs.gmu.edu/~astavrou/random.html
Choose Nodes=100.
Note that p goes up in increments of 0.001, which, for N=100, gives L = pN(N-1)/2 = p × 4,950, i.e. each increment adds about 5 new links.

EVOLUTION OF A RANDOM NETWORK
[Figure: as <k> increases, disconnected nodes → NETWORK.]
How does this transition happen?
THE PHASE TRANSITION TAKES PLACE AT <k>=1
Let us denote with u = 1 - N_G/N the fraction of nodes that are NOT part of the giant component (GC), whose size is N_G.
For a node i to be part of the GC, it needs to connect to it via another node j.
If i is NOT part of the GC, that could happen for two reasons:
Case A: node i does not connect to node j. Probability: 1-p
Case B: node i connects to j, but j is not connected to the GC. Probability: pu
Total probability that i is not part of the GC via node j is: 1-p+pu
The probability that i is not linked to the GC via any other node is $(1-p+pu)^{N-1}$
The probability that i is linked to the GC is $1-(1-p+pu)^{N-1}$
Hence: $u = (1-p+pu)^{N-1}$
For any p and N this equation provides the size of the giant component as $N_G = N(1-u)$

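EVOLUTION OF A RANDOM GRAPH
Starting from $u = (1-p+pu)^{N-1}$, using p=<k>/(N-1), taking the log of both sides and using <k> << N we obtain:
$\ln u = (N-1) \ln\left[1 - \frac{\langle k \rangle}{N-1}(1-u)\right] \approx -\langle k \rangle (1-u)$
Taking an exponential of both sides we obtain $u = e^{-\langle k \rangle (1-u)}$, or,
if we denote with S the fraction of nodes in the giant component, S = N_G/N, i.e. S = 1-u:
$S = 1 - e^{-\langle k \rangle S}$
Erdos and Renyi, 1959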

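THE PHASE TRANSITION IN A RN TAKES PLACE AT <k>=1
S: the fraction of nodes in the giant component, S = N_G/N, satisfies $S = 1 - e^{-\langle k \rangle S}$
Phase transition point: a nonzero solution appears when the slope of the right-hand side at S = 0 reaches 1:
$\frac{d}{dS}\left(1 - e^{-\langle k \rangle S}\right) = \langle k \rangle e^{-\langle k \rangle S} = 1$
Setting S=0, we obtain a phase transition at <k>=1
[Figure panels (a), (b), after Newman, 2010]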
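The self-consistent equation for the giant component, S = 1 - exp(-<k> S), has no closed-form solution but is easy to solve numerically. The sketch below uses plain fixed-point iteration started from S = 1; the function name and the iteration count are assumptions of this example.

```python
import math

def giant_fraction(mean_k, iterations=10_000):
    """Solve S = 1 - exp(-<k> S) by fixed-point iteration, starting from S = 1."""
    s = 1.0
    for _ in range(iterations):
        s = 1.0 - math.exp(-mean_k * s)
    return s

for mean_k in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(mean_k, giant_fraction(mean_k))
```

Below <k> = 1 the iteration collapses to S = 0, above it the iteration settles on a positive S, and exactly at <k> = 1 it approaches zero only slowly, which reflects the critical point.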

EVOLUTION OF A RANDOM GRAPH
Analytical result Numerical result

CLUSTER SIZE DISTRIBUTION
Probability that a randomly selected node belongs to a cluster of size s:
$p(s) = \frac{e^{-\langle k \rangle s} (\langle k \rangle s)^{s-1}}{s!}$
At the critical point <k>=1: $p(s) \sim s^{-3/2}$
[Figure: the distribution of cluster sizes at the critical point, displayed in a log-log plot. The data represent an average over 1000 systems; the dashed line is a power-law guide.]
Derivation in Newman, 2010
Regime:           I: Subcritical | II: Critical | III: Supercritical | IV: Connected
Condition:        <k> < 1        | <k> = 1      | <k> > 1            | <k> > ln N
Example (N=100):  <k>=0.5        | <k>=1        | <k>=3              | <k>=5

I: Subcritical
<k> < 1, p < pc = 1/N
No giant component.
N-L isolated clusters, cluster size distribution is exponential
The largest cluster is a tree, its size ~ ln N

II: Critical
<k> = 1, p = pc = 1/N
Unique giant component: $N_G \sim N^{2/3}$
It contains a vanishing fraction of all nodes, $N_G/N \sim N^{-1/3}$
Small components are trees, GC has loops.
Cluster size distribution: $p(s) \sim s^{-3/2}$
A jump in the cluster size:
N = 1,000 → ln N ≈ 6.9; $N^{2/3}$ = 100
N = 7 × 10^9 → ln N ≈ 22.7; $N^{2/3}$ ≈ 3,659,250
III: Supercritical
<k> > 1, p > pc = 1/N (example: <k>=3)
Unique giant component: $N_G \sim (p - p_c) N$
GC has loops.
Cluster size distribution: exponential

IV: Connected
<k> > ln N, p > (ln N)/N (example: <k>=5)
Only one cluster: N_G = N
GC is dense.
Cluster size distribution: None

IV: Connected
<k> > ln N, p > (ln N)/N
The probability that a node does not connect to the giant component is $(1-p)^{N_G} \approx (1-p)^{N}$
The expected number of such nodes is: $N(1-p)^{N} \approx N e^{-Np}$
For a sufficiently large p we are left with only one disconnected node, i.e. $N e^{-Np} = 1$, which gives $p = (\ln N)/N$.
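The regimes can be observed in a small simulation. The sketch below samples G(N,p) with p = <k>/(N-1) and measures the fraction of nodes in the largest cluster using a union-find structure; the function name and the choice N = 1,000 are assumptions of this example.

```python
import math
import random

def largest_component_fraction(n, mean_k, rng):
    """Sample G(n, p) with p = <k>/(n-1) and return (size of largest cluster) / n."""
    p = mean_k / (n - 1)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                parent[find(u)] = find(v)   # merge the two clusters

    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

rng = random.Random(0)
n = 1000
for mean_k in (0.5, 1.0, 3.0, 2 * math.log(n)):
    print(round(mean_k, 2), largest_component_fraction(n, mean_k, rng))
```

The largest cluster holds a tiny fraction of the nodes in the subcritical regime, most of them in the supercritical regime, and essentially all of them once <k> is well above ln N.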


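CLUSTERING COEFFICIENT
$C_i = \frac{2 n_i}{k_i (k_i - 1)}$, where n_i is the no. of connections among the k_i neighbors of node i
Since edges are independent and have the same probability p, $n_i \approx p \, \frac{k_i (k_i - 1)}{2}$, so $C = p = \frac{\langle k \rangle}{N-1} \approx \frac{\langle k \rangle}{N}$
The clustering coefficient of random graphs is small.
For fixed degree C decreases with the system size N.
Eq. 13.47 from Newman 2010 (valid for random networks only, with arbitrary degree distribution)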
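The result that the clustering coefficient of a random graph is close to p can be checked by direct measurement. The sketch below averages the local clustering coefficient over all nodes with at least two neighbours; skipping lower-degree nodes, the function name, and the values N = 300 and p = 0.1 are assumptions of this example.

```python
import itertools
import random

def average_clustering(n, p, rng):
    """Sample G(n,p) and return the mean local clustering coefficient."""
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    total, counted = 0.0, 0
    for i in range(n):
        k_i = len(adj[i])
        if k_i < 2:
            continue                       # local clustering undefined for degree < 2
        # n_i: number of links among the neighbours of node i
        n_i = sum(1 for a, b in itertools.combinations(adj[i], 2) if b in adj[a])
        total += 2 * n_i / (k_i * (k_i - 1))
        counted += 1
    return total / counted

print(average_clustering(n=300, p=0.1, rng=random.Random(0)))   # close to p
```

Keeping <k> fixed while increasing N means lowering p, so the measured clustering shrinks with system size.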
Erdös-Rényi MODEL (1960)
•Degree distribution
Binomial, Poisson (exponential tails)
•Clustering coefficient
Vanishing for large network sizes
•Average distance among nodes

Logarithmically small

Thank you for your attention!
