City Population:
– A few metropolitan areas are densely populated
– Most cities have an average population size
Social Media:
– We observe the same phenomenon regularly when measuring popularity or
interestingness for entities.
• The Pareto principle (80–20 rule): 80% of the effects come from 20% of the causes
Degree Distribution
Site Popularity:
– Many sites are visited fewer than 1,000 times a month
– A few are visited more than a million times daily
User Activity:
– Social media users are often active on a few sites
– Some individuals are active on hundreds of sites
Product Price:
– There are exponentially more modestly priced products for sale compared to
expensive ones.
Friendships:
– Many individuals have a few friends and a handful of users have thousands of
friends
In all the provided observations, the distribution of values
follows a power-law distribution
Power-Law Degree Distribution
• When the frequency of an event changes as a power of an
attribute
– the frequency follows a power-law
• Let k denote the degree of a node. Let $p_k$ denote the fraction of
individuals with degree k (i.e., the number of nodes observed with degree k, divided by |V|).
Then, according to the power-law distribution, we have
$p_k = a k^{-b}$
– where b is the power-law exponent
– a is the power-law intercept
• In a power-law distribution:
– Small occurrences: common
– Large instances: extremely rare
Power-Law Distribution: Examples
• Call networks:
– The fraction of telephone numbers that receive k calls per day is roughly proportional to $1/k^2$
• Book Purchasing:
– The fraction of books that are bought by k people is roughly proportional to $1/k^3$
• Scientific Papers:
– The fraction of scientific papers that receive k citations in total is roughly proportional to $1/k^3$
• Social Networks:
– The fraction of users that have in-degrees of k is roughly proportional to $1/k^2$
Power-law Distribution: An Elementary Test
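• To test whether a network exhibits a power-law distribution:
1. Pick a popularity measure and compute it for the whole network
– Example: number of friends for all nodes
2. Compute $p_k$, the fraction of nodes having popularity k
3. Plot a log-log graph, where the x-axis represents $\ln k$ and the y-axis represents $\ln p_k$
4. If a power-law distribution exists, the plot is approximately a straight line with slope $-b$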
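The elementary test can be sketched in a few lines of Python. Taking logarithms of $p_k = a k^{-b}$ gives $\ln p_k = \ln a - b \ln k$, so a least-squares line through the points $(\ln k, \ln p_k)$ estimates $-b$ as its slope and $\ln a$ as its intercept. The function names `degree_fractions` and `fit_log_log` and the synthetic data are illustrative assumptions, not part of the slides, and a straight-line fit is only a first check rather than a rigorous statistical test.

```python
import math
from collections import Counter


def degree_fractions(degrees):
    """Return {k: p_k}, the fraction of nodes whose popularity equals k (k > 0)."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: c / total for k, c in counts.items() if k > 0}


def fit_log_log(pk):
    """Fit a least-squares line through the points (ln k, ln p_k).

    Returns (a, b) such that p_k is approximately a * k**(-b).
    """
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Hypothetical fractions that follow p_k = 0.5 * k^-3 exactly.
synthetic_pk = {k: 0.5 * k ** -3 for k in range(1, 11)}
a_hat, b_hat = fit_log_log(synthetic_pk)
print(f"estimated intercept a = {a_hat:.3f}, exponent b = {b_hat:.3f}")
```

For a real network, `degree_fractions` would be applied to the list of node degrees first, and the result passed to `fit_log_log`.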
• Average number of intermediate people is 5.2
Erdős Number
• Erdős number: the number of links required to connect a scholar to Erdős via co-authored papers
Similarities:
– In the limit (when n is large), both the $G(n, p)$ and $G(n, m)$ models act similarly
• The expected number of edges in $G(n, p)$ is $\binom{n}{2} p$
• We can set $\binom{n}{2} p = m$
– Both models then act the same because they contain the same number of edges in expectation
Differences:
– The $G(n, m)$ model contains a fixed number of edges
– The $G(n, p)$ model can contain any number of edges, from none to all possible edges
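G(n, p) – a few Mathematical properties
• Proposition: The expected degree of a node in $G(n, p)$ is $c = (n-1)p$
Proof:
– A node can be connected to at most $n-1$ nodes (via $n-1$ edges)
– All edges are selected independently with probability p
– Therefore, on average, $(n-1)p$ edges are selected
• $c = (n-1)p$, or equivalently, $p = \frac{c}{n-1}$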
G(n, p) - Expected Number of Edges
• Proposition: The expected number of edges in $G(n, p)$ is $\binom{n}{2} p$
Proof:
– Since edges are selected independently, and we have a maximum of $\binom{n}{2}$ edges, the expected number of edges is $\binom{n}{2} p$
G(n, p) - Probability of observing m edges
• Proposition: Given the $G(n, p)$ model, the probability of observing m edges is
$P(|E| = m) = \binom{\binom{n}{2}}{m} p^m (1-p)^{\binom{n}{2} - m}$
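The two propositions on edges can be checked numerically with a minimal sketch. It assumes the standard definition of G(n, p), in which each of the $\binom{n}{2}$ possible edges is included independently with probability p, so the edge count is binomial with mean $\binom{n}{2} p$. The function names below are illustrative assumptions.

```python
import math
import random


def generate_gnp(n, p, seed=None):
    """Return the edge list of one G(n, p) sample on nodes 0..n-1."""
    rng = random.Random(seed)
    # Each unordered pair (u, v) becomes an edge independently with probability p.
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def expected_edges(n, p):
    """Expected number of edges: C(n, 2) * p."""
    return math.comb(n, 2) * p


def prob_m_edges(n, p, m):
    """Binomial probability of observing exactly m edges in G(n, p)."""
    pairs = math.comb(n, 2)
    return math.comb(pairs, m) * p ** m * (1 - p) ** (pairs - m)


sample = generate_gnp(30, 0.2, seed=7)
print(len(sample), "edges sampled; expected", expected_edges(30, 0.2))
```

Averaging `len(generate_gnp(n, p))` over many seeds should approach `expected_edges(n, p)`.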
At $c = 1$:
– a giant component appears
– diameter peaks
– path lengths are long
For $c > 1$:
– almost all nodes are connected
– diameter shrinks
– path lengths shorten
Why $c = 1$?
• It is proven that in random graphs the phase transition occurs when c = 1; that is, p = 1 / (n - 1)
Proposition : In random graphs, phase transition happens at c = 1.
Proof: Consider a random graph with expected node degree c.
• In this graph,
– Consider any connected set of nodes S;
– Let $\bar{S} = V - S$ denote the complement set; and
– Assume $|S| \ll |\bar{S}|$
• For any node v in S
– If we move one hop away from v, we visit approximately c nodes.
• If we move one hop away from nodes in S,
– we visit approximately $|S| c$ nodes.
• If S is small, the nodes in S only visit nodes in $\bar{S}$, and when moving one hop away from S, the set of nodes guaranteed to be connected gets larger by a factor c.
• The connected set of visited nodes gets $c^2$ times larger when moving two hops, and so on.
• In the limit, if we want this connected component to become the largest component, then after traveling n hops, its size must grow and we must have $c^n \ge 1$, or equivalently, $c \ge 1$.
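The phase transition at c = 1 can be observed empirically. The sketch below, under the assumption that p is set to c / (n - 1), samples a random graph and measures the fraction of nodes in its largest connected component using breadth-first search. The function name and the chosen values of n and c are illustrative assumptions.

```python
import random
from collections import deque


def largest_component_fraction(n, c, seed=None):
    """Sample G(n, p) with p = c / (n - 1); return |largest component| / n."""
    rng = random.Random(seed)
    p = c / (n - 1)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)

    seen = [False] * n
    best = 0
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search over the component containing `start`.
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in adj[node]:
                if not seen[nxt]:
                    seen[nxt] = True
                    queue.append(nxt)
        best = max(best, size)
    return best / n


for c in (0.3, 1.0, 3.0):
    frac = largest_component_fraction(400, c, seed=1)
    print(f"c = {c}: largest component holds {frac:.2%} of the nodes")
```

Below c = 1 the largest component should cover only a small fraction of the nodes; well above c = 1 it should cover most of them.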
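Properties of Random Graphs - Degree Distribution
• When computing the degree distribution, we estimate the probability of observing $P(d_v = d)$ for node v
• Proposition: For a graph generated by $G(n, p)$, node v has degree d, $d \le n-1$, with probability
$P(d_v = d) = \binom{n-1}{d} p^d (1-p)^{n-1-d}$
• This is a binomial distribution; in the limit of large n with c fixed, it approaches a Poisson distribution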
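The degree distribution of G(n, p) is binomial, $P(d_v = d) = \binom{n-1}{d} p^d (1-p)^{n-1-d}$, and for large n with fixed expected degree $c = (n-1)p$ it is well approximated by the Poisson form $e^{-c} c^d / d!$. The following sketch compares the two; the function names and the values n = 1000, c = 4 are illustrative assumptions.

```python
import math


def binomial_degree_prob(n, p, d):
    """P(d_v = d) in G(n, p): choose d of the n - 1 possible neighbors."""
    return math.comb(n - 1, d) * p ** d * (1 - p) ** (n - 1 - d)


def poisson_degree_prob(c, d):
    """Poisson approximation with mean c = (n - 1) * p."""
    return math.exp(-c) * c ** d / math.factorial(d)


n, c = 1000, 4.0
p = c / (n - 1)
for d in range(9):
    exact = binomial_degree_prob(n, p, d)
    approx = poisson_degree_prob(c, d)
    print(f"d = {d}: binomial {exact:.5f}, Poisson {approx:.5f}")
```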
2nd Phase Transition (Connectivity)
• Proposition: The global clustering coefficient of a random graph generated by $G(n, p)$ is p.
Proof.
– The global clustering coefficient defines the probability of two neighbors of the same node being connected.
– In a random graph, for any two nodes, this probability is the same:
• It equals the generation probability p, which determines the probability of two nodes getting connected
• In random graphs, the expected local clustering coefficient is equivalent to the global clustering coefficient.
• By selecting a large p, we can generate networks with a high clustering coefficient.
• However, selecting a large p is undesirable because doing so generates a very dense graph, which is unrealistic: real-world networks are often sparse.
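The claim that the clustering coefficient of a random graph is close to the generation probability p can be checked on a sample. This sketch measures the global clustering coefficient as the fraction of connected triples (two edges sharing a center node) whose end nodes are also connected. The function name and the values n = 200, p = 0.1 are illustrative assumptions.

```python
import random
from itertools import combinations


def global_clustering_gnp(n, p, seed=None):
    """Sample G(n, p) and return closed triples / all connected triples."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)

    closed = 0
    triples = 0
    for center in range(n):
        # Every pair of neighbors of `center` forms one connected triple.
        for a, b in combinations(sorted(adj[center]), 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


cc = global_clustering_gnp(200, 0.1, seed=3)
print(f"measured clustering coefficient: {cc:.3f} (generation probability 0.1)")
```

A sparse setting such as p = 0.01 yields a correspondingly low clustering coefficient, which is the mismatch with real-world networks noted above.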
Average Path Length
• Proposition: The average path length l in a random graph is
$l \approx \frac{\ln |V|}{\ln c}$
Proof:
• Let D denote the expected diameter size of the graph
• Starting with any node and the expected degree c,
– one can visit approximately c nodes by traveling one edge
– $c^2$ nodes by traveling two edges, and
– $c^D$ nodes by traveling diameter number of edges
• After this step, almost all nodes should be visited. In this case, we have $c^D \approx |V|$
• In random graphs, the expected diameter size tends to the average path length l in the limit. Using this fact, we have $c^D \approx c^l \approx |V|$, and therefore $l \approx \frac{\ln |V|}{\ln c}$
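As a quick numerical illustration of the estimate $l \approx \ln |V| / \ln c$, the sketch below evaluates it for a few hypothetical network sizes and average degrees. The function name and the example values are assumptions chosen only for illustration.

```python
import math


def estimated_avg_path_length(num_nodes, avg_degree):
    """Random-graph estimate l = ln|V| / ln c (requires c > 1)."""
    if avg_degree <= 1:
        raise ValueError("the estimate assumes an average degree c > 1")
    return math.log(num_nodes) / math.log(avg_degree)


# Hypothetical (|V|, c) pairs; not measurements of any real network.
for num_nodes, avg_degree in [(1_000, 10), (1_000_000, 10), (1_000_000, 100)]:
    l = estimated_avg_path_length(num_nodes, avg_degree)
    print(f"|V| = {num_nodes}, c = {avg_degree}: l is about {l:.2f}")
```

The estimate grows only logarithmically in the number of nodes, which is why random graphs reproduce the short path lengths seen in real networks.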
Modeling with Random Graphs
• Given a real-world network, we can simulate it using a random graph model.
• Compute the average degree c in the real-world graph
• Compute p using $\frac{c}{n-1} = p$
• Generate the random graph G(n,p) using p and the number of nodes n in the given network.
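The modeling steps above can be sketched as follows, assuming the real-world network is available as an undirected edge list over nodes 0..n-1. The function names and the toy edge list are illustrative assumptions.

```python
import random


def fit_gnp_parameters(n, edges):
    """Return (c, p): the average degree and the matching G(n, p) probability."""
    c = 2 * len(edges) / n  # each undirected edge contributes to two degrees
    p = c / (n - 1)
    return c, p


def simulate_like(n, edges, seed=None):
    """Generate one G(n, p) sample whose expected degree matches the input."""
    _, p = fit_gnp_parameters(n, edges)
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


# Toy "real-world" network: a path on four nodes (hypothetical data).
toy_edges = [(0, 1), (1, 2), (2, 3)]
c, p = fit_gnp_parameters(4, toy_edges)
print(f"average degree c = {c}, generation probability p = {p}")
print("simulated edges:", simulate_like(4, toy_edges, seed=0))
```

The simulated graph matches the input only in its number of nodes and expected average degree; other properties, such as clustering, are not preserved.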
• Random Graph:
• Clustering Coefficient (low): $p$
• Average Path Length (ok!): $\frac{\ln |V|}{\ln c}$
Clustering Coefficient for Small-world model
• The Clustering Coefficient (CC) for a small-world network is a value between the CC of a Regular Lattice and the CC of a Random Graph, depending on $\beta$
• Commonly, the clustering coefficient for a regular lattice is represented using C(0), and the clustering coefficient for a small-world model with $\beta = p$ is represented as C(p).
• The relation between the two values can be computed analytically; it has been proven that
$C(p) \approx (1-p)^3 C(0)$
• The intuition behind this relation is that because the clustering coefficient enumerates the number of closed triads in a graph,
– we are interested in triads that are still left connected after the rewiring process.
• For a triad to stay connected, none of its three edges may be rewired; each edge is left in place with probability (1 - p).
• Since the process is performed independently for each edge, the probability of observing triads is $(1-p)^3$ times the probability of observing them in a regular lattice.
• We also need to take into account new triads that are formed by the rewiring process; however, that probability is small and hence negligible.
Clustering Coefficient for Small-world model
• The probability that a connected triple stays connected after rewiring consists of
1. The probability that none of the 3 edges were rewired is $(1-p)^3$
2. The probability that other edges were rewired back to form a connected triple
• Very small and can be ignored
• Clustering coefficient: $C(p) \approx (1-p)^3 C(0)$
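To see how quickly the clustering coefficient decays with the rewiring probability, the ratio $C(p)/C(0) \approx (1-p)^3$ can be tabulated directly. This is a minimal sketch of the approximation only (it ignores triples re-formed by rewiring, as the argument above does); the function name is an illustrative assumption.

```python
def clustering_ratio(p):
    """Approximate C(p) / C(0): all three edges of a triad survive rewiring."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability in [0, 1]")
    return (1 - p) ** 3


for p in (0.0, 0.01, 0.1, 0.5, 1.0):
    print(f"p = {p}: C(p)/C(0) is about {clustering_ratio(p):.3f}")
```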
Regular Lattice vs. Random Graph, What happens in Between?
• Regular Lattice:
• Clustering Coefficient (high): $\frac{3(c-2)}{4(c-1)} \approx \frac{3}{4}$
• Average Path Length (high): $\frac{n}{2c}$
• Random Graph:
• Clustering Coefficient (low): $p$
• Average Path Length (ok!): $\frac{\ln |V|}{\ln c}$
• Does smaller average path length mean smaller clustering coefficient?
• Does larger average path length mean larger clustering coefficient?
• Numerical simulation:
• We increase p (i.e., $\beta$) from 0 to 1
• Assume
• $L(0)$ is the average path length of the regular lattice
• $C(0)$ is the clustering coefficient of the regular lattice
• For any p, $L(p)$ denotes the average path length of the small-world graph and $C(p)$ denotes its clustering coefficient
• Observations:
• Fast decrease of average distance
• Slow decrease in clustering coefficient
Change in Clustering Coefficient / Avg. Path Length
• The graph depicts the value of C(p)/C(0) for different values of p.
• As shown in the figure, the value of C(p) stays high until p reaches 0.1 (10% rewired) and then decreases rapidly to a value around zero.
• Since models with a high clustering coefficient and small average path length are desired, values in the range $0.01 \le \beta = p \le 0.1$ are preferred.
Modeling with the Small-World Model
• Given a real-world network in which the average degree c and the clustering coefficient C are given,
• we set $C(p) = C$ and determine $\beta$ (= p) using the equation $C(p) \approx (1-p)^3 C(0)$
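Solving $C(p) \approx (1-p)^3 C(0)$ for the rewiring probability gives $\beta = p = 1 - (C / C(0))^{1/3}$. The sketch below applies this, under the assumption that C(0) takes the standard ring-lattice value $3(c-2) / (4(c-1))$ for average degree c. The function names and the example values are illustrative assumptions.

```python
def lattice_clustering(c):
    """Assumed ring-lattice clustering coefficient C(0) = 3(c - 2) / (4(c - 1))."""
    if c <= 2:
        raise ValueError("the lattice formula assumes an average degree c > 2")
    return 3 * (c - 2) / (4 * (c - 1))


def estimate_beta(c, target_clustering):
    """Solve C = (1 - p)^3 * C(0) for p, given average degree c and observed C."""
    c0 = lattice_clustering(c)
    if not 0 < target_clustering <= c0:
        raise ValueError("observed clustering must lie in (0, C(0)]")
    return 1 - (target_clustering / c0) ** (1 / 3)


# Hypothetical network: average degree 4, observed clustering coefficient 0.3645.
beta = estimate_beta(4, 0.3645)
print(f"estimated rewiring probability beta = {beta:.3f}")
```

An observed clustering coefficient above C(0) cannot be matched by this model, which is why the sketch rejects it.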
• The small-world model is still incapable of generating a realistic degree distribution in the
simulated graph.
• To generate scale-free networks (i.e., with a power-law degree distribution), the preferential
attachment model is introduced.
Small-World Model
• SW networks have:
– High clustering coefficients, introduced by “ring regularity”
– Short average path lengths: the large diameters of regular lattices are fixed by randomly rewiring a small percentage of edges
Preferential Attachment Model
• Main assumption:
– When a new user joins the network, the probability of connecting to existing nodes is proportional to existing nodes’ degrees
– For the new node v
• Connect to a random node $v_i$ with probability $P(v_i) = \frac{d_i}{\sum_j d_j}$
• Distribution of wealth in the society:
– The rich get richer
• The higher the node’s degree, the higher the probability of new nodes getting connected to it.
• Barabási, Albert-László, and Réka Albert. "Emergence of scaling in random networks." Science 286.5439
[Figure: example graph with nodes 1–5, labeled with attachment probabilities P(2) = 1/7, P(3) = 2/7, P(4) = 0]
Preferential Attachment - Algorithm
• The algorithm starts with a graph containing a small set of $m_0$ nodes and then adds new nodes one at a time.
• Each new node gets to connect to $m \le m_0$ other nodes, and each connection to an existing node $v_i$ depends on the degree of $v_i$
• The model incorporates two ingredients to achieve a scale-free network
(1) The growth element and
(2) The preferential attachment element
• The growth is realized by adding nodes as time goes by.
• The preferential attachment is realized by connecting to node $v_i$ based on its degree probability, $P(v_i) = \frac{d_i}{\sum_j d_j}$
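The algorithm can be sketched in Python as below. Several details are assumptions not fixed by the slides: the initial $m_0$ nodes are fully connected so that every starting node has nonzero degree, each new node picks m distinct targets, and a duplicate draw is simply redrawn (so only the first pick is exactly proportional to degree). The function name and parameters are illustrative.

```python
import random
from collections import Counter


def preferential_attachment(m0, m, total_nodes, seed=None):
    """Grow a graph to `total_nodes` nodes; return its undirected edge list."""
    if m0 < 2 or not 1 <= m <= m0:
        raise ValueError("need m0 >= 2 and 1 <= m <= m0")
    rng = random.Random(seed)

    # Assumption: start from a complete graph on the m0 initial nodes.
    edges = [(u, v) for u in range(m0) for v in range(u + 1, m0)]

    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # selects node v_i with probability d_i / sum_j d_j.
    stubs = [node for edge in edges for node in edge]

    for new_node in range(m0, total_nodes):  # growth: one node per step
        targets = set()
        while len(targets) < m:  # preferential attachment, duplicates redrawn
            targets.add(rng.choice(stubs))
        for target in sorted(targets):
            edges.append((new_node, target))
            stubs.extend((new_node, target))
    return edges


graph = preferential_attachment(m0=3, m=2, total_nodes=200, seed=42)
degrees = Counter(node for edge in graph for node in edge)
print("highest degrees:", sorted(degrees.values(), reverse=True)[:5])
```

Keeping a list with one entry per edge endpoint avoids recomputing the degree sum at every step; older nodes accumulate entries and are drawn more often, which is the rich-get-richer effect described above.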
Constructing Scale-free Networks
Properties of the Preferential Attachment Model
• Degree Distribution:
• Clustering Coefficient: