
05&6RandomGraphs & Homophily

The document covers fundamental concepts in social network analysis, including nodes, edges, and the strength of weak ties, emphasizing how weak ties can facilitate information diffusion. It introduces small-world networks, characterized by high clustering and short path lengths, and contrasts random and regular graphs. Additionally, it discusses clustering coefficients and graph diameter, providing examples and formulas relevant to these concepts.

Uploaded by

Husein Yusuf

Random Graphs

Social Network Analysis, Lecture 5
Instructor: Dr. VITTAPU
Some concepts
• Before we discuss “the strength of weak ties” and “small worlds”, let’s just go over some
basic concepts.

• A node or vertex is an individual unit in the graph or system. (If it is a network of legislators, then each node represents a legislator).

• A graph or system or network is a set of units that may be (but are not necessarily)
connected to each other.
• An “edge” is a connection or tie between two nodes.

• A neighborhood N for a vertex or node is the set of its immediately connected nodes.

• Degree: The degree ki of a vertex or node is the number of other nodes in its
neighborhood.
• In an undirected graph or network, the edges are reciprocal—so if A is connected to B,
B is by definition connected to A.

• In a directed graph or network, the edges are not necessarily reciprocal—A may be
connected to B, but B may not be connected to A (think of a graph with arrows
indicating direction of the edges.)

• Okay, now let’s discuss the meaning of the “strength of weak ties”....
The Strength of Weak Ties
• Granovetter’s “The Strength of Weak Ties” argued that weak ties allowed an individual to reach a higher number of other individuals.
• Granovetter observed that the presence of
weak ties often reduced path lengths
(distance) between any two individuals—
which led to quicker diffusion of
information.
Small Worlds---Intro
• Next, let’s consider the related concept of “small worlds”, another
concept that has emerged in network analysis.

• But for some background, let’s discuss some different possible types of
graphs, plus the concepts of “clustering” and “diameter”.

• Two possible graphs (almost at opposite ends of a spectrum) are “random graphs” and “regular graphs”.
• A “small world” can be thought of as lying in between a random and a regular graph.
BackgroundRandom Graphs
• In a random graph, each pair of
vertices i, j has a connecting edge
with an independent probability of p

• This graph has 16 nodes, 120


possible connections, and 19 actual
connections—about a 1/7
probability than any two nodes will
be connected to each other.

• In a random graph, the presence of a


connection between A and B as well
as a connection between B and C will
not influence the probability of a
connection between A and C.
BackgroundRegular Graphs
• A regular graph is a network
where each node has the
same number (k) of
neighbors (that is, each node
or vertex has degree k).

• A k-degree graph is seen at


the left. k = 3 (each node is
connected to three other
nodes—that is, there are
three nodes in each node’s
neighborhood.)
Clustering Coefficients
• Clustering Coefficients were introduced by Watts & Strogatz in 1998, as a way to measure how close a node (or vertex) and its neighbors are to being a clique, or a complete graph within a larger graph or network.
• The clustering coefficient of a node is the number of actual connections
across the neighbors of a particular node, as a percentage of possible
connections. The clustering coefficient for the entire system is the
average of the clustering coefficient for each node.
• The formula for the total number of possible connections in an undirected matrix with k nodes is ½ * k * (k-1). (Think in terms of a matrix: the total number of possible connections is half of the total # of cells, after subtracting the diagonal.)
A Very Simple Example
• Four legislators: whether they serve on at least one committee together.
• This is an undirected matrix: if legislator A serves with legislator B on a committee, then legislator B serves with legislator A on a committee.

      A  B  C  D
  A   -  1  0  1
  B   1  -  1  0
  C   0  1  -  0
  D   1  0  0  -
A Very Simple Example

• The possible number of connections in this matrix is 6.
• k = 4 legislators.
• ½ * k * (k-1) = ½ * 4 * 3 = 6
A Very Simple Example
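• Legislator A is “connected to” two out of a possible 3 other legislators, a fraction of 2/3. The same is true of legislator B.
• Legislators C and D each have a fraction of 1/3.
• Strictly, these fractions measure each legislator's own ties (degree divided by 3), not ties among his or her neighbors. By the neighbor-based definition given earlier, no two neighbors of any legislator are tied to each other here, so that coefficient would be 0 for every node.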
A Very Simple Example

• The average of those four values is .5.
• And note that across the entire network, .5 (3 of 6) of all possible connections are actually made.
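The four-legislator example can be checked with a minimal Python sketch, assuming the ties A-B, A-D and B-C read off the matrix. It computes the per-legislator fractions quoted on the slides (own ties divided by the 3 other legislators) and, separately, the neighbor-based clustering coefficient (ties among a node's neighbors divided by possible ties among them). The function names and the second toy graph `hub` are illustrative, not part of the lecture.

```python
from itertools import combinations

# Undirected ties from the committee matrix: A-B, A-D, B-C.
legislators = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": {"A"},
}

def degree_fraction(graph, node):
    """Share of the other nodes that `node` is tied to (2/3 for A on the slides)."""
    return len(graph[node]) / (len(graph) - 1)

def local_clustering(graph, node):
    """Share of pairs of `node`'s neighbors that are tied to each other."""
    pairs = list(combinations(sorted(graph[node]), 2))
    if not pairs:  # fewer than two neighbors: treated as 0 in this sketch
        return 0.0
    linked = sum(1 for u, v in pairs if v in graph[u])
    return linked / len(pairs)

n = len(legislators)
possible = n * (n - 1) // 2                              # 1/2 * k * (k-1)
actual = sum(len(nbrs) for nbrs in legislators.values()) // 2
avg_degree_fraction = sum(degree_fraction(legislators, v) for v in legislators) / n
avg_clustering = sum(local_clustering(legislators, v) for v in legislators) / n

# Hypothetical second graph: a hub with four neighbors, only n1 and n3 tied.
hub = {
    "h": {"n1", "n2", "n3", "n4"},
    "n1": {"h", "n3"},
    "n2": {"h"},
    "n3": {"h", "n1"},
    "n4": {"h"},
}

print(possible, actual, avg_degree_fraction, avg_clustering)
print(local_clustering(hub, "h"))
```

The two measures differ on this network: the degree fractions average to one half, while the neighbor-based coefficient is zero for every legislator because the network contains no triangle.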
Clustering Coefficients
• This is the formula for the clustering coefficient for the system: C = (1/N) * (C1 + C2 + ... + CN). N = number of nodes. Ci = clustering coefficient for each node i.
Clustering Coefficient
• Note that the clustering coefficient for undirected graphs is a bit different than the clustering coefficient for directed graphs. In a directed graph there are twice as many possible ties, a non-reciprocated edge counts for one tie, and a reciprocated edge counts for two ties.
Clustering Coefficient
• So, in an undirected graph, if a node is connected to four other nodes, and among those four only the first and the third are connected, the clustering coefficient is 1/6. (1 actual connection out of 6 possible connections.)

• Clustering refers to how connected your neighbors are to each other (relative to how connected they could be).

• Now let’s talk about network diameter.


Graph Diameter

• The graph diameter is the “longest shortest path” between any two
vertices or nodes.
• The graphs above have diameters of 3, 4, 5, and 7, respectively.
• The graph on the right has a relatively large diameter, because it takes (at most) 7 edges to travel from one node to another. (The two nodes at the very bottom of the network are not very closely connected.)
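The “longest shortest path” definition can be sketched with breadth-first search. This is a minimal illustration that assumes a connected, undirected graph stored as an adjacency dictionary; the two toy graphs (`path` and `ring`) are made up and are not the graphs pictured on the slide.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest path length (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(graph):
    """Longest shortest path over all pairs; assumes the graph is connected."""
    return max(max(bfs_distances(graph, s).values()) for s in graph)

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}               # chain 0-1-2-3
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}    # 6-node cycle

print(diameter(path), diameter(ring))
```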
It’s a Small World, After All
• This is essentially the “six degrees of separation” idea—that the
number of “steps” or “links” needed to connect any one arbitrarily
chosen individual to any other is low (that is, networks have lower
diameters than one would expect.)

• In Milgram’s 1967 “small world experiment”, individuals were asked to reach a particular target individual by passing a message along a chain of acquaintances. For successful chains, the average # of intermediaries needed was 5 (that is, 6 steps), although note that most chains were not completed.
Small Worlds
• Brian Uzzi has focused on the importance of “small worlds”– networks
that are both highly locally clustered and have short path lengths.
• A graph is small-world if its average clustering coefficient is
significantly higher than a random graph constructed on the same
vertex set (with the same number of edges), and if the graph has a
short mean-shortest path length.

• These two characteristics are often mutually exclusive in random graphs, but do describe a wide variety of real-life situations.
Small Worlds
• The graph on the left is an example of a small-world graph.

• Note that it is highly clustered: a higher proportion (than one would expect randomly) of each node’s neighbors are actually connected to each other.

• It also has a small diameter, relative to the number of nodes.
Random graphs!
• Background: Probability Theory
• Static random models
– Poisson random networks
– “Small World” model
– The configuration model
• Calculating metrics
Background: Random Variables
• RVs: Represent probabilistic events
– E.g., coin flip X
  X: {H, T} → {p, (1-p)}  (heads w/ prob p, tails w/ prob (1-p))
– Probabilities of all options should add to 1
• Instances of coin flips = samples
• How flips play out = distribution
Background: Prob. Mass Function
• Coin toss: Uniform
• Coin toss: Weighted
• Single die: Uniform
• Single die: Weighted

How to know if it’s fair? Sample many times!


Background: Binomial Distr.
• Out of n coin flips, how many H?
[Figure: histogram of the number of heads out of 10 flips, sampled 100 times]
• Prob. of H heads and T tails in n = H + T flips:
  $P(H \text{ heads}) = \binom{n}{H} p^{H} (1-p)^{T}$
• Written: $X \sim B(n, p)$, i.e., X is distr. according to a binomial distr.
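The sampling experiment described above (10 flips, repeated 100 times) can be sketched as follows. This is a minimal illustration, assuming a fair coin (p = 0.5) and a fixed random seed chosen only for reproducibility; the function names are made up.

```python
import random
from collections import Counter
from math import comb

def binomial_pmf(h, n, p):
    """Probability of exactly h heads in n independent flips."""
    return comb(n, h) * p**h * (1 - p) ** (n - h)

def sample_heads(n, p, rng):
    """Flip a coin n times and count the heads."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(0)          # seed is an arbitrary choice
n, p, trials = 10, 0.5, 100
counts = Counter(sample_heads(n, p, rng) for _ in range(trials))

for h in range(n + 1):
    expected = trials * binomial_pmf(h, n, p)
    print(f"{h:2d} heads: observed {counts[h]:3d}, expected {expected:5.1f}")
```

With only 100 samples the observed counts scatter around the expected values; increasing `trials` brings the histogram closer to the binomial shape.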
Background: Binomial Coeff.
• How many ways are there to get k heads out of 3 flips?
• 8 possible results: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
– 3 Heads: $\binom{3}{3}$ = 1 way
– 2 Heads: $\binom{3}{2}$ = 3 ways
– 1 Head: $\binom{3}{1}$ = 3 ways
– 0 Heads: $\binom{3}{0}$ = 1 way
• Calculate: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
• Pascal’s triangle
  1
  1 1
  1 2 1
  1 3 3 1
  1 4 6 4 1
  1 5 10 10 5 1
  1 6 15 20 15 6 1
  …
GROUP DISCUSSION:

Y ~ Mult?
A roll of a 6-sided die follows a multinomial distribution.
How will this differ from binomial, X ~ B(n, p)?
Erdös and Rényi, 1959-1961; MOJ 1.2.3, 4.1.1

POISSON RANDOM NETWORKS


Binomial Link Formation
(Erdös and Rényi, 1959-1961)

• Nodes N = {1,…, n}
• i, j links form with probability p
  No link: (1-p)
• Independent probabilities
[Figure: six nodes labeled 1 to 6 arranged in a circle]
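Binomial link formation can be sketched in a few lines: visit every pair of nodes once and add the link with probability p, independently of all other pairs. The example values (16 nodes, p = 19/120) echo the random graph described earlier in the lecture; the function name and the seed are assumptions made for illustration.

```python
import random
from itertools import combinations

def gnp_random_graph(n, p, rng):
    """Nodes 1..n; each of the n(n-1)/2 pairs is linked independently with prob. p."""
    nodes = range(1, n + 1)
    adj = {i: set() for i in nodes}
    for i, j in combinations(nodes, 2):
        if rng.random() < p:       # link forms; otherwise no link, prob. (1-p)
            adj[i].add(j)
            adj[j].add(i)
    return adj

rng = random.Random(42)            # arbitrary seed for reproducibility
g = gnp_random_graph(16, 19 / 120, rng)
edge_count = sum(len(nbrs) for nbrs in g.values()) // 2
print(edge_count, "links out of", 16 * 15 // 2, "possible")
```

Each run with a different seed gives a different graph; only the expected number of links (p times the number of pairs) is fixed.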
Probability of Node degree d

• For 1 node i (out of n nodes):
• Links form independently

  $P(\deg(i) = d) = \binom{n-1}{d} p^{d} (1-p)^{n-1-d}$   → binomial!

Note: Not exactly independent, since 1 link affects 2 nodes. But as n increases, this …
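Poisson Approximation (see MOJ 1.2.3)

• Usually: large n, small p

  $P(\deg(i) = d) \approx \frac{e^{-(n-1)p} \, ((n-1)p)^{d}}{d!}$

• Poisson: normally, # events in fixed interval, with events independent

  $P(k \text{ events}) = \frac{e^{-\lambda} \lambda^{k}}{k!}$

  (the degree formula is the Poisson form with $\lambda = (n-1)p$)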
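How close the Poisson form is to the exact binomial degree distribution can be checked numerically. The sketch below is a minimal comparison under assumed values (n = 1001, p = 0.005, so an expected degree of 5); these numbers are illustrative and not taken from the lecture.

```python
from math import comb, exp, factorial

def degree_pmf_binomial(d, n, p):
    """Exact: d of the other n-1 nodes are linked, each with probability p."""
    return comb(n - 1, d) * p**d * (1 - p) ** (n - 1 - d)

def degree_pmf_poisson(d, n, p):
    """Approximation with rate lambda = (n-1) p."""
    lam = (n - 1) * p
    return exp(-lam) * lam**d / factorial(d)

n, p = 1001, 0.005   # large n, small p
max_gap = max(
    abs(degree_pmf_binomial(d, n, p) - degree_pmf_poisson(d, n, p))
    for d in range(21)
)
for d in range(0, 11, 2):
    print(d, degree_pmf_binomial(d, n, p), degree_pmf_poisson(d, n, p))
print("largest gap for d = 0..20:", max_gap)
```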
Examples: Poisson model

Images MOJ 1.2


Watts & Strogatz, 1998ff; MOJ 4.1.2, E&K 20.1-20.2

SMALL WORLD MODEL


Why an alternate model?
• Real life: high clustering, low avg.
path length/diameter
– E.g., me: CC=0.556, APL=2.93, d=10
• Poisson: lacks high clustering
– Any link formation prob: p
– Triadic closure? Etc?
• Low avg. path length/diameter
Watts-Strogatz: Starting out
• Ring of nodes, each connected to 2 neighbors each side
• High clustering, high diameter/APL

Watts-Strogatz: Rewiring
• Before rewiring: high clustering, high diameter/APL
• After rewiring some links: high clustering, lower diameter/APL
Watts-Strogatz: Summary
• Model “small world phenomenon”
– high clustering
– low distance
• Start w/ highly clustered network
• Rewire some links

Explore in Assignment 3!
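The two steps in the summary (start with a highly clustered ring, then rewire some links) can be sketched as below. This is a simplified variant written for illustration, not necessarily the exact procedure of the original paper: each lattice edge is, with probability `beta`, detached at one end and reattached to a random node. The sizes, `beta` and the seed are assumptions.

```python
import random
from collections import deque
from itertools import combinations

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rewire(adj, beta, rng):
    """With probability beta, replace edge (u, v) by (u, w) for a random new w."""
    new = {i: set(nbrs) for i, nbrs in adj.items()}
    nodes = sorted(new)
    edges = [(u, v) for u in nodes for v in sorted(adj[u]) if u < v]
    for u, v in edges:
        if rng.random() < beta:
            candidates = [w for w in nodes if w != u and w not in new[u]]
            if candidates:
                w = rng.choice(candidates)
                new[u].discard(v)
                new[v].discard(u)
                new[u].add(w)
                new[w].add(u)
    return new

def avg_clustering(adj):
    """Mean over nodes of (links among neighbors) / (possible links among them)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

def avg_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(7)                      # arbitrary seed
lattice = ring_lattice(100, 2)
small_world = rewire(lattice, 0.1, rng)
for name, g in (("lattice", lattice), ("rewired", small_world)):
    print(name, round(avg_clustering(g), 3), round(avg_path_length(g), 2))
```

Sweeping `beta` from 0 toward 1 shows the trade-off: a few rewired links shorten paths sharply while clustering falls off more slowly.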
Bender & Canfield 1978; MOJ 4.1.4

THE CONFIGURATION MODEL


Degr. Distributions → Sequences

• Node i with degree di, list di times
– Ex: 1112233445566677788999
[Figure: nodes labeled 1 to 10]
Add links
• Randomly select pairs; link & remove
  1112233445566677788999
  Links: 3,7 6,6 8,9 3,7 1,4 1,5 2,4 6,8 5,7 1,9 2,9
– 6,6 is a self-link
– 3,7 appears twice: duplicate link, so the result is a multi-graph!
• Despite self-links and duplicate links, new graph with similar degree distribution!
[Figure: nodes labeled 1 to 10 with the links above drawn in]
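The stub-matching procedure can be sketched as follows, using the degree sequence from the example (node 10 has degree 0 and so never appears in the list). Shuffling the stub list and pairing consecutive entries is one simple way to “randomly select pairs”; the function name and seed are assumptions.

```python
import random
from collections import Counter

def configuration_model(degrees, rng):
    """List node i deg(i) times, shuffle, and pair up consecutive stubs."""
    stubs = [node for node, d in degrees.items() for _ in range(d)]
    if len(stubs) % 2:
        raise ValueError("sum of degrees must be even")
    rng.shuffle(stubs)
    return [tuple(sorted(pair)) for pair in zip(stubs[::2], stubs[1::2])]

# Degrees read off the sequence 1112233445566677788999.
degrees = {1: 3, 2: 2, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 2, 9: 3, 10: 0}
rng = random.Random(1)                      # arbitrary seed
links = configuration_model(degrees, rng)

realized = Counter()
for u, v in links:
    realized[u] += 1
    realized[v] += 1                        # a self-link adds 2 to one node

self_links = [link for link in links if link[0] == link[1]]
duplicates = [link for link, c in Counter(links).items() if c > 1]
print(links)
print("self-links:", self_links, "duplicate links:", duplicates)
```

Counting a self-link twice, the realized degrees match the target sequence exactly; dropping self-links and duplicates afterwards would change them slightly, which is why the result only has a similar degree distribution once simplified.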
Modularity, revisited
• Stopping criterion for partitioning

  $Q = \frac{1}{2m} \sum_{v,u} \left[ \mathbf{1}_{v,u} - \frac{\deg(v)\,\deg(u)}{2m} \right] \delta(\mathrm{comm}(v), \mathrm{comm}(u))$

– $\mathbf{1}_{v,u}$ = 1 if there is an edge v, u (count edges within a community)
– $\deg(v)\deg(u)/(2m)$: expected random edges within a community (Configuration model)
– comm(v), comm(u): the communities that v and u are assigned to
– $\delta$ = 1 if equal
– $1/(2m)$: divide over all edges m
Modularity interpretation
• Values: [-0.5, 1.0]
- (neg): communities cut across links
more than random
0: communities capture nothing
+(pos): communities have more links
than you’d expect at random
• Criterion: stop partitioning when
modularity drops below threshold
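The modularity formula can be evaluated directly from an adjacency dictionary. The sketch below sums over all ordered pairs (v, u) in the same community, as the formula does; the toy graph (two triangles joined by one edge) and the two partitions are made up for illustration.

```python
def modularity(adj, community):
    """Q = 1/(2m) * sum over same-community pairs of [A_vu - deg(v)deg(u)/(2m)]."""
    two_m = sum(len(nbrs) for nbrs in adj.values())     # each edge counted twice
    q = 0.0
    for v in adj:
        for u in adj:
            if community[v] != community[u]:            # delta term is 0
                continue
            has_edge = 1.0 if u in adj[v] else 0.0
            expected = len(adj[v]) * len(adj[u]) / two_m
            q += has_edge - expected
    return q / two_m

# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
lumped = {v: "all" for v in adj}

print(modularity(adj, split), modularity(adj, lumped))
```

Splitting along the bridge gives a positive value (5/14), while putting every node in one community gives 0, matching the interpretation that such a partition captures nothing.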
CALCULATING METRICS ON RANDOM NETWORKS

Metrics will vary!


Homophily
• Homophily: the principle that we connect to others who are like ourselves. Weak ties: the links to acquaintances that connect us to parts of the NW that would otherwise be far away.
• Homophily creates many triangles, while the weak ties still produce the kind of widely branching structure that reaches many nodes in a few steps.
• The Watts-Strogatz model creates a NW by giving each node two kinds of links:
1. Those explainable purely by homophily
2. Those that constitute weak ties.
Homophily!
• Before: Networks as graphs, Graph
theory, Random graphs
• Today: Homophily
– Mechanisms
– Affiliation
(b) A network built from local structure and random edges

o a highly clustered network (such as the grid), with a small number of random links added in

o homophily (the principle that we connect to others who are like ourselves) and
weak ties (the links to acquaintances that connect us to parts of the network
that would otherwise be far away).
o Homophily creates many triangles, while the weak ties still produce the kind of
widely branching structure that reaches many nodes in a few steps.
Review: Community structure
• Dfn: A community structure, ∏, defined on nodes/vertices V, is a collection of disjoint subsets of V (i.e., a partition) whose union is V.
Grouping by similarity…
• Homophily: we tend to be similar
to our friends
– Plato: “similarity begets friendship”
– Aristotle: people “love those who are
like themselves”
• Intrinsic causes, e.g., triadic closure
• Extrinsic causes, e.g., affiliation
Figure 20.2(b) gives a schematic picture of the
resulting network — a hybrid structure
consisting of a small amount of randomness
(the weak ties) sprinkled onto an underlying
structured pattern (the homophilous links).
Watts and Strogatz observe first that the
network has many triangles: any two
neighbouring nodes (or nearby nodes) will
have many common friends, where their
neighbourhoods of radius r overlap, and this
produces many triangles.
But they also find that there are — with high
probability — very short paths connecting
every pair of nodes in the network. Roughly,
the argument is as follows.
The general conclusions of the Watts-Strogatz model still follow even if only a small fraction of the nodes on the grid each have a single random link.
Two nodes are one grid step apart if they are directly adjacent to each other in either the horizontal or vertical direction.
Measuring homophily
• Pick a characteristic, e.g., ethnicity
• Does that kind of similarity matter?
– … links more likely than random?
• Possible results:
– Not significant
– Homophily (signif. more likely links)
– Inverse homophily (signif. less likely)
Example: Childhood friendships
• Nodes: male, proportion p; female, proportion q
• Edges:
– P(M:M edge) = p^2
– P(F:F edge) = q^2
– P(M:F edge) = 2pq
• Test: if heterogeneous (M:F) edges are significantly less than 2pq, we have homophily!
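The comparison behind this test can be sketched in Python: compute the observed share of cross-gender edges and set it against the 2pq baseline. The small friendship graph below is hypothetical, and the sketch only compares the two numbers; deciding whether the gap is significant would need a proper statistical test, which is not shown.

```python
def cross_edge_fraction(edges, gender):
    """Observed share of edges that join a male node and a female node."""
    cross = sum(1 for u, v in edges if gender[u] != gender[v])
    return cross / len(edges)

def expected_cross_fraction(gender):
    """Baseline 2pq if links ignored gender (p = share male, q = share female)."""
    p = sum(1 for g in gender.values() if g == "M") / len(gender)
    q = 1 - p
    return 2 * p * q

# Hypothetical data: three boys, three girls, six friendships.
gender = {"a": "M", "b": "M", "c": "M", "d": "F", "e": "F", "f": "F"}
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("c", "d")]

observed = cross_edge_fraction(edges, gender)
baseline = expected_cross_fraction(gender)
print(f"observed {observed:.3f} vs. 2pq = {baseline:.3f}")
```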
INDIVIDUAL EXERCISE:

Does this graph show homophily by gender?
[Figure: friendship graph; legend: male, proportion p; female, proportion q]
Schelling Model

MODEL OF SPATIAL SEGREGATION


Schelling Model (1972, 1978)

– Simpl. 1: people = agents in 2 groups
– Simpl. 2: space = grid (8 neighbors)
– Simpl. 3: time in rounds
[Figure: a grid cell surrounded by its 8 neighbors, numbered 1 to 8]

• Homophily constraint
– Agents need threshold in-group neighbors
– Else “discontent”: will move in next round
* “models are always wrong, but sometimes they can be useful”
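The three simplifications and the homophily constraint translate into a short simulation. The sketch below is one possible reading, with assumptions labeled: discontent agents move to a uniformly random empty cell (the slides do not say where they move), cells on the border simply have fewer than 8 neighbors, and the grid size, group sizes, threshold and seed are illustrative.

```python
import random

def make_grid(size, n_per_group, rng):
    """Scatter two groups ("B", "R") and empty cells (None) at random."""
    cells = ["B"] * n_per_group + ["R"] * n_per_group
    cells += [None] * (size * size - len(cells))
    rng.shuffle(cells)
    return [cells[r * size:(r + 1) * size] for r in range(size)]

def like_neighbors(grid, r, c):
    """Number of the (up to 8) surrounding cells holding the same group."""
    size = len(grid)
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < size and 0 <= cc < size:
                count += grid[rr][cc] == grid[r][c]
    return count

def run_round(grid, threshold, rng):
    """Move each discontent agent to a random empty cell; return their number."""
    size = len(grid)
    cells = [(r, c) for r in range(size) for c in range(size)]
    unhappy = [(r, c) for r, c in cells
               if grid[r][c] is not None
               and like_neighbors(grid, r, c) < threshold]
    empties = [(r, c) for r, c in cells if grid[r][c] is None]
    rng.shuffle(unhappy)
    for r, c in unhappy:
        if not empties:
            break
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], None   # agent moves, old cell empties
        empties[i] = (r, c)
    return len(unhappy)

rng = random.Random(0)                                # arbitrary seed
grid = make_grid(size=20, n_per_group=150, rng=rng)
history = [run_round(grid, threshold=3, rng=rng) for _ in range(30)]
print("discontent agents per round:", history)
for row in grid:
    print("".join(cell or "." for cell in row))
```

Printing the grid before and after the rounds gives a rough text version of the segregation pictures; raising `threshold` makes agents harder to satisfy.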
Local homophily → Global effect
• 150x150 grid, 10k blue, 10k red
• Threshold = 3
Local homophily → Global effect
• 150x150 grid, 10k blue, 10k red
• Threshold = 4

Individual desires
for homophily have
global effect!
MECHANISMS OF HOMOPHILY
GROUP DISCUSSION:

Is there homophily because
(a) people form friendships with those like them? → selection
(b) people change & become like those around them? → socialization/social influence
Selection v. Social Influence
• Selecting similar friends
– Characteristics shape network
– Intentional: e.g., at a social gathering
– Systemic: e.g., physical neighborhood
• Socialization/social influence
– Network shapes characteristics
– Nodes are mutable (changeable)
What’s at play in Homophily?
• Nuanced… unfolding over time,
which characteristic, etc.
– How to study this?
• Longitudinal studies! How did
behavior change…
– before/after joining a “group”
– as neighbors’ behavior changed
Example: Homophily & Health?
• Obesity study (Christakis & Fowler 2007)
– 12,000 people, 32 years
– Obese vs. non-Obese clusters (homophily!)
• Hypotheses for cause
– Selection: friendships form between similar
obesity status
– Socialization: changes in obesity status
influence friends’ obesity status
– Confounding: Homophily explained by other
characteristics
Example: Wikipedia editors
• Similarity: overlap in
editing topics (y-axis)
• Time 0: editors message
each other
GROUP DISCUSSION:
Situation: You’re a public health officer who notes that the use of
illegal drugs has increased among some clusters of people (that
exhibit homophily) in your area.
So you design a health intervention to stop the use of illegal drugs.
Because of your limited resources, you’re directly interacting with
relatively few people, then hoping for those people to change their
behavior and then be “change” agents.
Question: Is the health intervention more likely to be successful if the
homophily of drug users is based on
(a) Selection
(b) Socialization
Why?
Affiliation
Extrinsic node properties – Foci
• Intrinsic (to graph) only – incomplete
picture!
• Foci (focal points of activity) are
“social, psychological, legal, or physical
entit[ies] around which joint activities
are organized”
– Not on typical Social Network graph
– Subsets of V
– Many/overlapping possible!
Figure 20.11: When nodes belong to multiple foci, we can define the social distance between two nodes to be the size of the smallest focus that contains both of them. In the figure, the foci are represented by ovals; the node labeled v belongs to five foci of sizes 2, 3, 5, 7, and 9 (with the largest focus containing all the nodes shown).

For example, two people may both work for the same thousand-person company
and live in the same million-person city, but it is the fact that they both belong to
the same twenty-person literacy tutoring organization that makes it most
probable they know each other.
Thus, a natural way to define the social distance between two people is to
declare it to be the size of the smallest focus that includes both of them.
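This definition is easy to state in code: among all foci that contain both people, take the smallest and report its size. The sketch below uses hypothetical, scaled-down foci (a 1000-person city, a 100-person company, a 20-person tutoring group) with people represented by integers; all names are illustrative.

```python
def social_distance(foci, a, b):
    """Size of the smallest focus containing both a and b (None if no shared focus)."""
    shared_sizes = [len(members) for members in foci.values()
                    if a in members and b in members]
    return min(shared_sizes) if shared_sizes else None

# Hypothetical nested foci; person ids are integers.
foci = {
    "city": set(range(1000)),
    "company": set(range(100)),
    "tutoring": set(range(20)),
}

print(social_distance(foci, 0, 1))      # both in the tutoring group
print(social_distance(foci, 0, 50))     # share only company and city
print(social_distance(foci, 0, 500))    # share only the city
```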
Affiliation networks G=(V1, V2, E), where each edge in E is a pair (v1, v2) with v1 in V1 and v2 in V2

• Bipartite graphs: only connections across 2 types of nodes
• V1: nodes (here: people)
• V2: foci (here: company board of directors)
• Edges: member-of
• What about node-node links?
Social-Affiliation Networks
• Affiliation networks: had 2 types of nodes
• Social-Affiliation networks: also 2 types of edges
3 types of Closure
• Triadic closure:
person => person:person
• Focal closure:
focus => person:person
• Membership closure:
person => focus:person
Figure: Simulation of decentralized search in the grid-based model with
clustering exponent q. Each point is the average of 1000 runs on (a slight
variant of) a grid with 400 million nodes. The delivery time is best in the
vicinity of exponent q = 2, as expected; but even with this number of nodes, the
delivery time is comparable over the range between 1.5 and 2 [248].
