MS4252
Social Network Analysis
1
Chapter 1
INTRODUCTION
2
Social Network
n Social Media
Ø Infiltrated and changed the way millions of people interact and
communicate
n Social Networking
Ø Over one billion users on Facebook alone and billions more
accounts across thousands of social networking sites online
3
Understanding Social Network
n Understanding social networks
Ø Both those explicitly formed on social networking websites and
those implicitly formed in many other types of social media
Ø Has taken on new importance in light of this
astounding popularity
https://www.pnas.org/doi/10.1073/pnas.1801429116
Social Network Analysis
n Researchers at Twitter studied the flow of
information out of Japan after the 2011 earthquake.
6
Social Network Analysis
n Websites can support participation and contributions
from many types of users.
8
Social Media for Public Sector
n Public sector users may be individuals, such as elected
officials; local organizations, such as schools or libraries,
or government agencies.
n Major Questions:
Ø How are public-sector users taking advantage of social
media?
9
Analyzing Individual Users
n Three types of Social Media Use
Ø Broadcast / Sending Information
Ø Conversation Interaction
10
Analyzing Individual Users
n How does an individual organization use social media?
11
Case Study: Solve an attempted child abduction
12
Outreach
n Philadelphia Police posted video and images of the
suspect on YouTube, Twitter, and Facebook.
13
Strategies from the Philadelphia
Police
n Use social media mostly for broadcast
Ø They do not tweet too often (fewer than 10 times a day) so that
they do not overwhelm people with information.
14
Case Study: Predicting Elections and
Astroturfing
n Analyzing behavior from a large and diverse set of social
media users to understand public opinions about public
issues.
15
Predicting Elections
n Results for predicting elections are mixed
n Astroturfing
Ø Creating fake grassroots campaigns; setting up fake accounts to
simulate grassroots support.
17
Structure and Astroturfing detection
n A tool called ‘Truthy’ to detect astroturfing content on
Twitter (Ratkiewicz et al., 2011)
Ø Dark edges indicate retweets and light edges indicate mentions.
Ø Graphs (a) and (b) are astroturfing accounts, while graphs (c)
and (d) are real accounts.
18
Chapter 15
19
Measuring Success
n The return on investment (ROI) is computed as
(Income-cost)/Cost
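As a quick arithmetic check of the ROI formula, here is a minimal Python sketch. The function name `roi` and the income and cost figures are hypothetical, chosen only for illustration.

```python
def roi(income: float, cost: float) -> float:
    """Return on investment: (income - cost) / cost."""
    if cost == 0:
        raise ValueError("cost must be non-zero")
    return (income - cost) / cost


# Hypothetical campaign: 1,500 in income on 1,000 of spending.
print(roi(1500.0, 1000.0))  # 0.5, i.e. a 50% return
```

A negative result means the campaign cost more than it brought in.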
20
Measuring Success
n Other ways to measure success in social media
1. Counts
2. Social Sharing
3. Engagement Rate
4. Interaction
5. Referral Rates
6. Importance and Influence of Users.
21
Measuring Success: Example
n A restaurant in Arlington, Virginia used the Twitter network
of @frontpageva, with about 1,400 followers, to share
specials and events and to interact with followers.
22
Measuring Success: Example
n Which of their users are most influential?
n How many clusters are there?
23
Measuring Success: Example
n Top group: Twitter accounts that share information
about Washington, D.C. and Arlington, Virginia events,
parties, and locations.
n The lower cluster: Washington Capitals fans and players
because the restaurant is across the street from the
Capitals’ practice facility.
24
Success Stories
n Will it blend? Video Campaign
n https://www.youtube.com/Blendtec
Ø Strange things are blended to show the power of the blender
Ø After beginning campaign, Blendtec saw a 700% increase in
sales
Ø Viral sharing made people more aware of their products
25
Success Stories
n Zappos Customer Service @zappos_service
26
Chapter 2
27
2.1 Basic Network Concept
Network’s Structure
n A person is considered a Node or Vertex;
n A relationship is an Edge between nodes;
Ø Edges describe something about the relationship between the people
(e.g., sister, mother, cousin)
n Graph or Network is a set of Nodes and Edges.
[Figure: example network in which Alice and Chuck are nodes (vertices)]
29
Undirected or Directed Edges
[Figure: Alice, Bob, and Chuck joined by undirected edges (mutual relationship) and by directed edges (reciprocal relationship)]
30
Edges Weights
n Weighted or Valued edges
n Numerical information about a relationship
n The strength of a relationship
Ø But it can come from a variety of sources and indicate many
things, e.g. the number of times, how many connections?
[Figure: weighted network of Alice, Bob, and Chuck]
31
Example – Email
n Build a network of email communication (a directed
network)
Ø One-way
u Alice may send an email to Bob without receiving a reply.
[Figure: directed email network of Alice, Bob, and Chuck]
32
Directed Network
n The number of possible edges is double that of an
undirected graph
n In an undirected graph,
Ø there can be only one edge between any two nodes
n In a directed one,
Ø there are two possible edges, one in each direction
Example – Apollo 13 Movie
n Main Actors in Apollo
13 movie:
Ø Actors are nodes
34
Example- Network
n A new network that connects two actors if they appeared
together in another movie (other than Apollo 13)
35
Example – Network
n There are labels on the edges that say something about
the relationship between the people.
36
Directed or Undirected?
n Can it be a directed network?
37
Weighted Edges
n Weight indicates how many movies the actors have
been in together.
38
Adjacency Lists
n An adjacency list, also called an edge list,
Ø is one of the most basic and frequently used representations of
a network.
Ø Each edge in the network is indicated by listing the pair of
nodes that are connected.
n The adjacency list for the Apollo 13 network:
Actor          Actor
Tom Hanks      Bill Paxton
Tom Hanks      Gary Sinise
Tom Hanks      Kevin Bacon
Bill Paxton    Gary Sinise
Gary Sinise    Kevin Bacon
Gary Sinise    Ed Harris
Adjacency List
n In an undirected network, the order of the node names
in each pair is irrelevant; the order can be reversed.
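To make the "order is irrelevant" point concrete, here is a minimal Python sketch that turns the Apollo 13 edge list into a neighbor lookup. The helper name `build_neighbors` is an assumption for illustration; the actor pairs are the ones listed on the adjacency list slide.

```python
from collections import defaultdict

# Edge list copied from the Apollo 13 slide: one pair of actors per edge.
EDGES = [
    ("Tom Hanks", "Bill Paxton"),
    ("Tom Hanks", "Gary Sinise"),
    ("Tom Hanks", "Kevin Bacon"),
    ("Bill Paxton", "Gary Sinise"),
    ("Gary Sinise", "Kevin Bacon"),
    ("Gary Sinise", "Ed Harris"),
]


def build_neighbors(edges):
    """Map each node to the set of nodes it shares an edge with."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)  # undirected: record the edge in both directions
        neighbors[b].add(a)
    return dict(neighbors)


neighbors = build_neighbors(EDGES)
swapped = build_neighbors([(b, a) for a, b in EDGES])
print(neighbors == swapped)  # True: reversing each pair changes nothing
print(sorted(neighbors["Gary Sinise"]))
```

In a directed network the second `add` would be dropped, and reversing a pair would then describe a different edge.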
40
Adjacency list with edge weight
n An edge weight is a common value to see included in an
adjacency list.
n It is included on the same line as the two node
names, and usually follows them.
u The adjacency list for the Apollo 13 network:
42
Adjacency matrix
n First, the diagonal is all zeros because there are no
edges between a node and itself.
Ø Some networks do allow for self-loops.
Ø E.g., in an email network, if a person emails himself, there could
be a link from one node to itself, and thus there could be a 1 on
the diagonal.
n In an undirected network, the matrix is symmetric.
43
Adjacency matrix
n In a directed network, the matrix will not necessarily be
symmetric.
Ø There are edges from A to C, and C to A, and from A to B, but
the reciprocal edge from B to A is absent.
Ø Thus, we only record a 1 for the A-B edge, and record a 0 for
the B-A edge.
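The directed example above (edges from A to C, C to A, and A to B) can be written out as a matrix. This is a minimal sketch; the function names `adjacency_matrix` and `is_symmetric` are assumptions for illustration, and rows are taken to be sources and columns targets.

```python
NODES = ["A", "B", "C"]
DIRECTED_EDGES = [("A", "C"), ("C", "A"), ("A", "B")]


def adjacency_matrix(nodes, edges, directed=True):
    """Row = source node, column = target node, 1 = edge present."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
        if not directed:
            # An undirected edge is recorded in both cells.
            matrix[index[target]][index[source]] = 1
    return matrix


def is_symmetric(matrix):
    """True when cell (i, j) always equals cell (j, i)."""
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(size) for j in range(size))


directed = adjacency_matrix(NODES, DIRECTED_EDGES)
for name, row in zip(NODES, directed):
    print(name, row)
print(is_symmetric(directed))  # False: A-B is 1 but B-A is 0
```

Treating the same edges as undirected (`directed=False`) fills both cells for every edge, which is why the undirected matrix is always symmetric.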
44
Adjacency matrix with weight
n This scheme can be altered to show the weight of an
edge as well.
45
2.2 Basic Network Structure and
Properties
n Subnetwork
46
Subnetwork - Singleton
n Some of the simplest subnetworks are
singletons.
Ø While these nodes are not very “social,” they are still
part of a social network.
Ø These represent people who signed up for an account to
access some part of the site other than the social
networking features, or people who signed up but never
actively participated.
n Node A is a singleton.
47
Subnetwork – dyad and triad
n Two nodes and their relationship are called a
dyad
Ø Nodes A and B form an unconnected dyad.
Ø Even though A and B are not connected, the pair of
nodes can still be called a dyad.
Ø B and C form a connected dyad.
n A group of three nodes is called a triad.
Ø D, E, and F form a fully connected triad.
48
Cliques
n Of particular interest is whether or not all nodes in
a group are connected to one another. Such a group is called a
clique.
Ø e.g. a group of people who are all strongly connected and tend
to talk mostly to one another
Ø (e.g., “Alice is part of a clique at school”).
Ø Cliques in the example:
u A, C, D
u F, H, B, G
Ø All nodes must be connected to all other nodes in the clique
Ø Tightly connected group
[Figure: example network with nodes A through H]
Cliques
n Find the cliques
50
Clusters
n A cluster is a group of nodes that are tightly connected
Ø ‘tightly’ means they are more tightly connected than the
network as a whole
51
Egocentric Networks
n This is a network we pull out by selecting a node and all
of its connections.
52
Egocentric Network – 1.5 degree
n A 1.5-degree egocentric
network of D
n A 1.5-degree egocentric
network of D with D
excluded
53
Egocentric Networks – 2 degree
n A 2-degree egocentric network of D
54
Path and Connectedness
The connections between nodes and measures of their
closeness are important characteristics.
n Paths:
Ø A Path is a series of steps from one node to another node in a
network
u A path is not an edge.
u It is a series of edges through nodes
55
Paths
n A path connecting node
M to node C by the
following steps:
Ø M-P-F-O-C
Ø M-L-K-J-P-F-Q-D-C
56
Shortest Path: Length
n The length of a path = the number of edges
in it.
n Shortest paths will be an important measure
Ø Sometimes called geodesic distances.
Ø They show how closely connected the nodes are
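Shortest path length in an unweighted network can be found with a breadth-first search. The sketch below is one common way to do it, not necessarily the method used in the course; it reuses the Apollo 13 edge list from the adjacency list slide, and the function names are assumptions for illustration.

```python
from collections import deque

EDGES = [
    ("Tom Hanks", "Bill Paxton"), ("Tom Hanks", "Gary Sinise"),
    ("Tom Hanks", "Kevin Bacon"), ("Bill Paxton", "Gary Sinise"),
    ("Gary Sinise", "Kevin Bacon"), ("Gary Sinise", "Ed Harris"),
]


def build_neighbors(edges):
    """Undirected neighbor lookup built from an edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def shortest_path_length(neighbors, start, goal):
    """Number of edges on a shortest path, or None if no path exists."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return distance[node]
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:  # first visit is along a shortest path
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return None


network = build_neighbors(EDGES)
print(shortest_path_length(network, "Ed Harris", "Bill Paxton"))  # 2
```

Ed Harris reaches Bill Paxton in two steps because both are connected to Gary Sinise.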
57
Connectedness
n Connectedness
Ø Paths are used to determine a graph property
n Two nodes in a graph are called connected if there is a
path between them in the network.
n An entire graph/network is connected if there is a path
between every pair of nodes.
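A minimal sketch of the connectedness test: start at any node, collect everything reachable from it, and check whether that covers the whole network. The five-node graph and the function names here are hypothetical, made up only to show a disconnected case.

```python
def reachable(neighbors, start):
    """All nodes that can be reached from start by following edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_connected(nodes, edges):
    """True if there is a path between every pair of nodes (undirected)."""
    if not nodes:
        return True
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return reachable(neighbors, nodes[0]) == set(nodes)


# Hypothetical network: A-B-C form one group, D-E another.
NODES = ["A", "B", "C", "D", "E"]
EDGES = [("A", "B"), ("B", "C"), ("D", "E")]
print(is_connected(NODES, EDGES))                 # False
print(is_connected(NODES, EDGES + [("C", "D")]))  # True
```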
58
Connectedness – strongly connected
n If a graph is not connected, it may have subgraphs that
are connected.
60
Weakly connected
n If a path cannot be found between all pairs of nodes
using the direction of the edges,
Ø but paths can be found if the directed edges are treated as
undirected,
Ø the graph is called weakly connected.
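The difference between following edge directions and ignoring them can be sketched as below. The edges are the A, B, C example from the directed adjacency matrix slide; the function names are assumptions for illustration. Note that `is_weakly_connected` here only checks the "treated as undirected" half of the definition, so it also returns True for a strongly connected graph.

```python
def reachable(successors, start):
    """Nodes reachable from start when edges are followed forward only."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strongly_connected(nodes, edges):
    """True if every node can reach every other node along edge directions."""
    successors = {n: set() for n in nodes}
    for source, target in edges:
        successors[source].add(target)
    return all(reachable(successors, n) == set(nodes) for n in nodes)


def is_weakly_connected(nodes, edges):
    """True if the graph is connected once edge directions are ignored."""
    both_ways = list(edges) + [(t, s) for s, t in edges]
    return is_strongly_connected(nodes, both_ways)


NODES = ["A", "B", "C"]
EDGES = [("A", "C"), ("C", "A"), ("A", "B")]  # no edge leaves B
print(is_strongly_connected(NODES, EDGES))  # False: B cannot reach A or C
print(is_weakly_connected(NODES, EDGES))    # True
```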
61
Hubs and Bridges
n A bridge is an edge that connects otherwise separate
groups of nodes in the network,
Ø If removed, it will increase the number of connected
components in a graph.
Ø E.g., an edge between nodes P and F is a bridge.
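The "remove it and count components" test for a bridge can be sketched directly. Since the P and F example depends on a figure, this sketch uses the Apollo 13 edge list from the adjacency list slide instead; the function names are assumptions for illustration.

```python
EDGES = [
    ("Tom Hanks", "Bill Paxton"), ("Tom Hanks", "Gary Sinise"),
    ("Tom Hanks", "Kevin Bacon"), ("Bill Paxton", "Gary Sinise"),
    ("Gary Sinise", "Kevin Bacon"), ("Gary Sinise", "Ed Harris"),
]
NODES = sorted({name for edge in EDGES for name in edge})


def count_components(nodes, edges):
    """Number of connected components in an undirected graph."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1  # found a node no earlier component reached
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return components


def is_bridge(nodes, edges, edge):
    """True if removing the edge increases the number of components."""
    remaining = [e for e in edges if set(e) != set(edge)]
    return count_components(nodes, remaining) > count_components(nodes, edges)


bridges = [e for e in EDGES if is_bridge(NODES, EDGES, e)]
print(bridges)  # [('Gary Sinise', 'Ed Harris')]
```

Every other edge in this network sits on a triangle, so removing it leaves an alternative route.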
62
Hubs and Bridges
n Hubs are important nodes rather than edges.
n Hubs are used to refer to the most connected nodes in
the network.
Ø In the example below, node P would be a hub because it has many
connections to other nodes.
63
Chapter 3
64
3.1 Describing nodes and edges
n Degree
Ø How nodes are connected to one another and to the network as
a whole
Ø The degree of a node is the number of edges connected to that
node.
n In an undirected network
Ø The degree of a node = the total number of edges connected to
it
n In a directed network, there are two measures of degree: in-degree
and out-degree.
Ø In-degree: the # of edges coming into the node
Ø Out-degree: the # of edges originating from the node and going
outward to other nodes
Ø The degree of a node = in-degree + out-degree
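In-degree and out-degree can be tallied in one pass over a directed edge list. This is a minimal sketch using the A, B, C edges from the directed adjacency matrix slide; the function name `degrees` and the dictionary keys are assumptions for illustration.

```python
from collections import Counter

# Directed edges as (source, target) pairs.
EDGES = [("A", "C"), ("C", "A"), ("A", "B")]


def degrees(edges):
    """In-degree, out-degree, and total degree for every node."""
    out_degree = Counter()
    in_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1  # edge leaves the source
        in_degree[target] += 1   # edge arrives at the target
    nodes = sorted(set(out_degree) | set(in_degree))
    return {
        n: {"in": in_degree[n], "out": out_degree[n],
            "total": in_degree[n] + out_degree[n]}
        for n in nodes
    }


table = degrees(EDGES)
for node, row in table.items():
    print(node, row)
```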
Degree: example
66
Centrality
n Centrality
Ø One of the core principles of network analysis
Ø Measures how `central’ a node is in the network
Ø Used as an estimate of its importance in the network
68
(1) Degree centrality
69
An extreme counter-example
n The red node has high degree centrality, but it is far
from the dense center.
70
(2) Closeness Centrality
n Measures how close a node is to all other nodes in the
network
n Calculated as the reciprocal of the average of the
shortest path length from the node to every other node
in the network.
\[ \text{Closeness Centrality} = \frac{\#\text{ of nodes} - 1}{\text{total shortest path length}} \]
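The closeness formula can be applied with a breadth-first search from the node of interest. This sketch assumes a connected, undirected network and reuses the Apollo 13 edge list from the adjacency list slide; the function names are assumptions for illustration.

```python
from collections import deque

EDGES = [
    ("Tom Hanks", "Bill Paxton"), ("Tom Hanks", "Gary Sinise"),
    ("Tom Hanks", "Kevin Bacon"), ("Bill Paxton", "Gary Sinise"),
    ("Gary Sinise", "Kevin Bacon"), ("Gary Sinise", "Ed Harris"),
]


def build_neighbors(edges):
    """Undirected neighbor lookup built from an edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def distances_from(neighbors, start):
    """Shortest path length from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


def closeness(neighbors, node):
    """(# of nodes - 1) / total shortest path length; assumes connectivity."""
    total = sum(distances_from(neighbors, node).values())
    return (len(neighbors) - 1) / total


network = build_neighbors(EDGES)
print(closeness(network, "Gary Sinise"))  # 1.0: one step from everyone
print(closeness(network, "Tom Hanks"))    # 0.8: total path length is 5
```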
71
(2) Closeness Centrality
Node Path Length
P 1
O 1
Q 2
C 2
D 3
R 1
…
73
(3) Betweenness Centrality
n Most widely used when analyzing social networks
Ø Measure of a node’s influence
Ø Captures how important a node is in the flow of information
from one part of the network to another
n Percentage of shortest paths that include a given node
1. Select a pair of nodes and find all the shortest paths between those
nodes.
2. Compute the fraction of those shortest paths that include node N
3. Repeat step 1 and 2 for every pair of nodes
4. Add up the fractions to obtain the betweenness centrality for Node N
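The four steps above can be sketched as follows for an undirected, unweighted network. The sketch counts shortest paths with a breadth-first search and reports the raw sum of fractions; many tools also normalize the result, which is not done here. It reuses the Apollo 13 edge list from the adjacency list slide, and the function names are assumptions for illustration.

```python
from collections import deque
from itertools import combinations

EDGES = [
    ("Tom Hanks", "Bill Paxton"), ("Tom Hanks", "Gary Sinise"),
    ("Tom Hanks", "Kevin Bacon"), ("Bill Paxton", "Gary Sinise"),
    ("Gary Sinise", "Kevin Bacon"), ("Gary Sinise", "Ed Harris"),
]


def build_neighbors(edges):
    """Undirected neighbor lookup built from an edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def bfs_counts(neighbors, source):
    """Distance from source and number of shortest paths to each node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                paths[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                paths[nxt] += paths[node]  # extend every shortest path to node
    return dist, paths


def betweenness(neighbors, node):
    """Sum over node pairs of the fraction of shortest paths through node."""
    info = {n: bfs_counts(neighbors, n) for n in neighbors}
    dist_n, paths_n = info[node]
    total = 0.0
    for s, t in combinations(neighbors, 2):  # steps 1 and 3: every pair
        if node in (s, t):
            continue
        dist_s, paths_s = info[s]
        if t not in dist_s or node not in dist_s or t not in dist_n:
            continue  # no path between the pair, or none via node
        if dist_s[node] + dist_n[t] == dist_s[t]:
            # step 2: paths through node / all shortest paths for the pair
            total += paths_s[node] * paths_n[t] / paths_s[t]
    return total  # step 4: the accumulated sum


network = build_neighbors(EDGES)
print(betweenness(network, "Gary Sinise"))  # 3.5
print(betweenness(network, "Tom Hanks"))    # 0.5
```

Gary Sinise lies on every shortest path to Ed Harris (three pairs) and on one of the two shortest paths between Bill Paxton and Kevin Bacon, giving 3 + 0.5.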
74
(3) Betweenness Centrality
n Betweenness Centrality for B
75
Betweenness Centrality – directed
networks
n In directed networks, betweenness can have several
meanings.
u Alternatively, the user may have fewer followers, but connect them
to many accounts that are otherwise distant.
76
(4) Eigenvector Centrality
n Measures a node’s importance while giving consideration
to the importance of its neighbors.
Ø Sometimes used to measure a node’s influence in the network
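One common way to approximate eigenvector centrality is power iteration: every node repeatedly takes the sum of its neighbors' scores, and the scores are rescaled each round. This is a sketch of that idea, not necessarily the computation used in the course; scaling so the top score is 1 is an arbitrary choice, and the function names are assumptions for illustration. It reuses the Apollo 13 edge list from the adjacency list slide.

```python
EDGES = [
    ("Tom Hanks", "Bill Paxton"), ("Tom Hanks", "Gary Sinise"),
    ("Tom Hanks", "Kevin Bacon"), ("Bill Paxton", "Gary Sinise"),
    ("Gary Sinise", "Kevin Bacon"), ("Gary Sinise", "Ed Harris"),
]


def build_neighbors(edges):
    """Undirected neighbor lookup built from an edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def eigenvector_centrality(neighbors, iterations=100):
    """Approximate eigenvector centrality by power iteration."""
    score = {n: 1.0 for n in neighbors}
    for _ in range(iterations):
        # A node's new score is the sum of its neighbors' current scores.
        updated = {n: sum(score[m] for m in neighbors[n]) for n in neighbors}
        largest = max(updated.values())
        if largest == 0:
            return score  # no edges at all: nothing to propagate
        score = {n: value / largest for n, value in updated.items()}
    return score


scores = eigenvector_centrality(build_neighbors(EDGES))
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

Ed Harris scores lowest here: he has a single connection, even though that connection is to the highest-scoring node.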
77
(4) Eigenvector Centrality
n A variant of eigenvector centrality is at the core of
Google’s PageRank algorithm, which they use to rank
web pages.
78
(4) Eigenvector Centrality
n Eigenvector centrality:
79
Consideration/Comparison
n A node may appear highly central with one
measure but have low centrality with another.
Ø Does not mean one measure is incorrect;
Ø The interpretation of the centrality measures is left to a
human analyst.
n Centrality measures may be difficult to compare
across networks
u A very important node in a small network may have
centrality measures that would seem unimportant in a
larger network
n The measures are calculated for undirected,
unweighted networks.
u When working with directed or weighted networks, these
measures require modification.
80
Describing network
n A number of measures can be used to describe the
structure of a network as a whole.
n We introduce four:
Ø Degree distribution
Ø Density
Ø Connectivity
Ø Centralization
81
Degree distribution
n Degree is used to describe individual nodes; the degree
distribution describes them collectively.
Ø Gives an idea of the degree of all the nodes in the network;
Ø Shows how many nodes have each possible degree.
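A degree distribution is just a tally of how many nodes have each degree. Here is a minimal sketch using the Apollo 13 edge list from the adjacency list slide; the function name `degree_distribution` is an assumption for illustration.

```python
from collections import Counter

EDGES = [
    ("Tom Hanks", "Bill Paxton"), ("Tom Hanks", "Gary Sinise"),
    ("Tom Hanks", "Kevin Bacon"), ("Bill Paxton", "Gary Sinise"),
    ("Gary Sinise", "Kevin Bacon"), ("Gary Sinise", "Ed Harris"),
]


def degree_distribution(edges):
    """Map each degree value to the number of nodes that have it."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1  # each undirected edge adds one to both endpoints
        degree[b] += 1
    return dict(sorted(Counter(degree.values()).items()))


distribution = degree_distribution(EDGES)
for value, count in distribution.items():
    print(f"degree {value}: {count} node(s)")
```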
82
Degree distribution
83
Power Law Distribution
n Many people have a relatively low degree.
84
Density
n Density describes how connected a network is.
n To understand both individual nodes and the network as
a whole
n Measures the ratio of actual edges to possible edges in a graph.
85
Density
n Formula
\[ \text{density} = \frac{\#\text{ of edges}}{\#\text{ of possible edges}} \]
n Edges: 6
n Total possible edges: 10
n Density: 6/10 = 0.6
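The density arithmetic can be checked with a short sketch. A network with 5 nodes and 6 edges, such as the Apollo 13 edge list, gives the same 6/10 figure. The function names are assumptions for illustration, and the directed case uses the "double the undirected count" rule from the Directed Network slide.

```python
def possible_edges(num_nodes, directed=False):
    """n(n - 1) ordered pairs; halved when direction does not matter."""
    pairs = num_nodes * (num_nodes - 1)
    return pairs if directed else pairs // 2


def density(num_nodes, num_edges, directed=False):
    """Ratio of actual edges to possible edges."""
    maximum = possible_edges(num_nodes, directed)
    if maximum == 0:
        raise ValueError("density needs at least two nodes")
    return num_edges / maximum


print(possible_edges(5))               # 10
print(density(5, 6))                   # 0.6
print(density(5, 6, directed=True))    # 0.3: twice as many possible edges
```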
86
Number of possible edges
n In directed networks,
\[ \#\text{ of possible edges} = n(n-1) \]
n In undirected networks,
\[ \#\text{ of possible edges} = \frac{n(n-1)}{2} \]
n Handshake problem?
87
Exercise
n What is the density?
88
Density - Properties
n When the network is large, density may be low.
Ø Normally, the density may be under 10%
Ø Density in Facebook might be 0.0001 (billions of people in total)
89
Density in egocentric network
n Local clustering coefficient:
Ø Is the density of a node’s 1.5 degree egocentric network (with
the node itself excluded)
Ø E.g., a 1.5-degree egocentric network of D with D excluded.
Ø We are interested only in D’s friends.
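The local clustering coefficient can be sketched as the density among a node's neighbors. This reuses the Apollo 13 edge list from the adjacency list slide; the function names are assumptions for illustration, and returning 0.0 for a node with fewer than two neighbors is a convention chosen here, since no edges are possible in that case.

```python
from itertools import combinations

EDGES = [
    ("Tom Hanks", "Bill Paxton"), ("Tom Hanks", "Gary Sinise"),
    ("Tom Hanks", "Kevin Bacon"), ("Bill Paxton", "Gary Sinise"),
    ("Gary Sinise", "Kevin Bacon"), ("Gary Sinise", "Ed Harris"),
]


def build_neighbors(edges):
    """Undirected neighbor lookup built from an edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def local_clustering(neighbors, node):
    """Density of the node's 1.5-degree network with the node excluded."""
    friends = neighbors[node]
    k = len(friends)
    if k < 2:
        return 0.0  # convention: no pairs of friends to connect
    links = sum(1 for a, b in combinations(friends, 2) if b in neighbors[a])
    return links / (k * (k - 1) / 2)


network = build_neighbors(EDGES)
print(local_clustering(network, "Tom Hanks"))    # 2 of 3 possible edges
print(local_clustering(network, "Bill Paxton"))  # 1.0: both friends linked
```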
90
Connectivity and Cohesion
n Connectivity, also known as cohesion
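Ø Measures how these edges are distributed.
Ø Counts the minimum number of nodes that would have to be removed
before the graph becomes disconnected,
Ø that is, there is no longer a path from each node to every other
node.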
91
Connectivity and Cohesion
n The connectivity (Cohesion) is 1, because removing
node B, C, or D would disconnect the graph.
92
Connectivity and Cohesion
n The connectivity is 2? Why?
93
Centralization
n Use the distribution of a centrality measure to
understand the network as a whole.
n Centralization of power is an often-used concept and
phrase, which relates very closely to centralization in a
network.
n E.g., betweenness centrality can represent the control
one node has over the ability of others to communicate.
Ø If many messages must pass through a particular node, that
node has the power to stop or pass on information.
Ø If a few nodes have very high betweenness, we say that the
power is centralized in those nodes.
94
Centralization
n Different centrality measures can be substituted.
Ø Centralization: take the sum of the differences in the centrality
between the most central node and every other node in the
network, and divide this by the maximum possible difference in
centrality.
n Between 0 and 1:
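\[ \text{Centralization} = \frac{\sum_{i=1}^{N} \{ C(n^{*}) - C(n_i) \}}{\max \sum_{i=1}^{N} \{ C(n^{*}) - C(n_i) \}} \]
where $C(n_i)$ is the centrality of node $n_i$, $n^{*}$ is the most
central node, and $N$ is the number of nodes.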
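As a worked instance of the centralization ratio, the sketch below uses degree as the centrality measure. The denominator (n - 1)(n - 2) is an assumption not stated on the slide: it is the sum of differences in a star network, which is the most centralized arrangement for degree. The function names are also assumptions, and the edge list is the Apollo 13 one from the adjacency list slide.

```python
EDGES = [
    ("Tom Hanks", "Bill Paxton"), ("Tom Hanks", "Gary Sinise"),
    ("Tom Hanks", "Kevin Bacon"), ("Bill Paxton", "Gary Sinise"),
    ("Gary Sinise", "Kevin Bacon"), ("Gary Sinise", "Ed Harris"),
]


def build_neighbors(edges):
    """Undirected neighbor lookup built from an edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def degree_centralization(neighbors):
    """Sum of (top degree - each degree), divided by the star-network maximum."""
    n = len(neighbors)
    if n < 3:
        raise ValueError("needs at least three nodes")
    degree = {node: len(adjacent) for node, adjacent in neighbors.items()}
    top = max(degree.values())
    observed = sum(top - d for d in degree.values())
    maximum = (n - 1) * (n - 2)  # assumption: star network is the extreme case
    return observed / maximum


apollo = build_neighbors(EDGES)
star = build_neighbors([("hub", leaf) for leaf in "abcd"])  # hypothetical star
print(degree_centralization(apollo))  # 8 / 12
print(degree_centralization(star))    # 1.0
```

Other centrality measures can be substituted, but each needs its own maximum in the denominator.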
95
Centralization-Example
n Consider three types of centrality.
96
It is a small world
n A social psychologist – Small World Experiment
97
It is a small world: six degrees of
separation
n Six degrees of separation
Ø Any two people in the world are separated by short paths, on
average about six steps.
Ø From this came the term ‘small worlds’, which indicates that people who
may be far apart physically and socially are still connected by
relatively short paths.
98
Six degrees of separation?
n As social networks grow, is the number of
‘degrees’ decreasing?
99
Small Worlds
n Properties:
Ø High average clustering coefficient
Ø Short average shortest path length
100
Random Graphs and Small Worlds
101
See You!!!
102