
Big Data Analytics

MS4252
Social Network Analysis

Source: Jennifer Golbeck (2013). Analyzing the Social Web. Elsevier (Chapters 1, 2, 3, 14, and 15).

1
Chapter 1

INTRODUCTION

2
Social Network
n Social Media
Ø Has infiltrated and changed the way millions of people interact and
communicate

n Social Networking
Ø Over one billion users on Facebook alone and billions more
accounts across thousands of social networking sites online

3
Understanding Social Network
n Understanding social networks
Ø Both those explicitly formed on social networking websites and
those implicitly formed in many other types of social media
Ø Has taken on new importance in light of this astounding
popularity

n Analyzing these social connections and interactions can help us
understand
Ø Who the important people are in a network
Ø What roles a person plays
Ø Which subgroups of users are highly interconnected
Ø How things like diseases or rumors will spread through a
network
Ø How users participate

4
Social Network Analysis
n Organizations can prevent or control the spread of
disease outbreaks.

n An article from PNAS

https://www.pnas.org/doi/10.1073/pnas.1801429116

5
Social Network Analysis
n Researchers at Twitter studied the flow of information out of
Japan after the 2011 earthquake.

6
Social Network Analysis
n Websites can support participation and contributions
from many types of users.

n Businesses can provide immediate assistance to customers who have
problems or complaints.

n Users can band together to better understand their communities and
government or take collective action.

n Content providers online can filter and sort information to show
users the most relevant, interesting, and trusted content.
7
Chapter 14

SOCIAL MEDIA FOR PUBLIC SECTOR

8
Social Media for Public Sector
n Public sector users may be individuals, such as elected
officials; local organizations, such as schools or libraries,
or government agencies.

n Major Questions:
Ø How are public-sector users taking advantage of the social
media?

Ø How are people talking about public-sector-related topics
(trend analysis)?

9
Analyzing Individual Users
n Three types of Social Media Use
Ø Broadcast / Sending Information

Ø Request for Feedback / Input

Ø Conversation Interaction

10
Analyzing Individual Users
n How does an individual organization use social media?

Ø Who is doing the post?


Ø Who are the target audience members?
Ø Why is the audience engaged in social media with the
organization? What type of content or interaction is the
audience interested in?
Ø What are the goals of the user? Which of the three interaction
methods above are they using?
Ø How is the user using social media?
Ø Do the user’s actions support the goals?

11
Case Study: Solve an attempted child abduction

n In summer 2012, a 10-year-old girl and her 2-year-old brother were
walking home
Ø A man attempted to abduct the girl, but she fought him off and
ran.
Ø The children did not know the attacker, but he was caught on
several surveillance videos.

12
Outreach
n Philadelphia Police posted video and images of the suspect on
YouTube, Twitter, and Facebook.

n They have caught 87 suspects through social media usage.

n The man turned himself in within hours of the postings,
claiming that he could not walk, talk, or breathe out!

13
Strategies from the Philadelphia
Police
n Use social media mostly for broadcast
Ø They do not tweet too often (fewer than 10 times a day) so that
they do not overwhelm people with information.

n Reach out on multiple channels with clear requests to receive
feedback from audiences.

14
Case Study: Predicting Elections and
Astroturfing
n Analyzing behavior from a large and diverse set of social media
users to understand public opinion about public issues.

n Can analyzing social media activity indicate who might win an
election?

n Can politicians take advantage of this to make it look like they
have more support than they do?

15
Predicting Elections
n Results for predicting elections are mixed

Betting market predictions. (www.electionbettingodds.com)


16
Astroturfing
n 'Grassroots' efforts are those initiated and led by
members of the general public.

n Astroturfing
Ø Creating fake grassroots campaigns; setting up fake accounts to
simulate grassroots support.

Ø These accounts can be detected using structural social network
analysis, using retweet and follow behaviour.

17
Structure and Astroturfing detection
n A tool called 'Truthy' detects astroturfing content on
Twitter (Ratkiewicz et al., 2011)
Ø Dark edges indicate retweets and light edges indicate mentions.
Ø Graphs (a) and (b) are astroturfing accounts, while graphs (c)
and (d) are real accounts.

18
Chapter 15

BUSINESS USE OF SOCIAL MEDIA

19
Measuring Success
n The return on investment (ROI) is computed as
(Income - Cost) / Cost

n This statistic is not easy to compute with social media.

Ø Advertising campaigns that combine traditional media (e.g.,
print ads) with social media may lead to increased sales and
show a lot of engagement online, but it is difficult to measure how
many of the sales come from social media activities.

20
Measuring Success
n Other ways to measure success in social media

1. Counts
2. Social Sharing
3. Engagement Rate
4. Interaction
5. Referral Rates
6. Importance and Influence of Users.

21
Measuring Success: Example
n A restaurant in Arlington, Virginia used Twitter network
of @frontpageva with about 1,400 followers to share
specials and events and to interact with followers.

n Over a summer month when hockey is not in season, the followers
average between 350 and 400 mentions, an engagement rate of about 30%

n The social connection network is visualized on the next page.

22
Measuring Success: Example
n Which of their users are most influential?
n How many clusters are there?

23
Measuring Success: Example
n Top group: Twitter accounts that share information
about Washington, D.C. and Arlington, Virginia events,
parties, and locations.
n The lower cluster: Washington Capitals fans and players,
because the restaurant is across the street from the
Capitals’ practice facility.

n Large white node: John Carlson, a defenseman for the Capitals,
a professional hockey player (with tens of thousands of followers).
n What do you suggest to increase sales?

24
Success Stories
n Will it blend? Video Campaign
n https://www.youtube.com/Blendtec
Ø Strange things are blended to show the power of the blender
Ø After beginning campaign, Blendtec saw a 700% increase in
sales
Ø Viral sharing made people more aware of their products

25
Success Stories
n Zappos Customer Service @zappos_service

Ø A study over a 45-day period showed that they responded to every
customer service request within 24 hours.
Ø Customers can see the back-and-forth conversations.
Ø Zappos Customer Service also monitors any posts about Zappos
and sends messages to customers with concerns (even if they did
not contact customer support).
Ø Surprisingly, only a small fraction of the customer service team
is needed to manage these requests.

26
Chapter 2

NODES, EDGES, AND NETWORK MEASURES

27
2.1 Basic Network Concept
Network’s Structure
n A person is considered as a Node or Vertex;

n Links, Edges, or Ties represent a relationship between nodes;
Ø They describe something about the relationship between the people
(e.g., sister, mother, cousin)
n A Graph or Network is a set of Nodes and Edges.

[Figure: three nodes (Alice, Bob, Chuck) joined by links, with one node labeled "Node/Vertex" and one link labeled "Link or Edge"]


28
Edges/Links/Ties
n Edges can be either directed or undirected.

Ø An undirected edge indicates a mutual relationship (e.g., friends
on Facebook)
Ø A directed edge indicates a relationship that one node has with
the other that is not necessarily reciprocated (e.g., follows on
Instagram).

n The type of edge used defines the network as either a directed
network or an undirected network.

29
Undirected or Directed Edges
[Figure: two networks over the nodes Alice, Bob, and Chuck. Panel labels: "Mutual Relationship" (undirected) and "Reciprocal Relationship" (directed)]

30
Edge Weights
n Weighted or valued edges
n Carry numerical information about a relationship
n Often represent the strength of a relationship
Ø But the weight can come from a variety of sources and indicate many
things, e.g., the number of times two people have interacted or how
many connections they share

[Figure: network of Alice, Bob, and Chuck with weighted edges]
31
Example – Email
n Build a network of email communication (a directed
network)
Ø One-way
u Alice may send an email to Bob without receiving a reply.

n Edges can be reciprocated.
Ø Alice may email Chuck, and Chuck may reply.

[Figure: directed email network of Alice, Bob, and Chuck]
32
Directed Network
n The number of possible edges is double that of an undirected graph
n In an undirected graph,
Ø there can be only one edge between any two nodes
n In a directed graph,
Ø there are two possible edges between any two nodes, one in each
direction

n An undirected edge is never used in a directed network.

33
Example – Apollo 13 Movie
n Main Actors in Apollo
13 movie:
Ø Actors are nodes

Ø Edges connect actors


who were in the same
movie together.

34
Example- Network
n A new network connects the actors if they were in another movie
together (other than Apollo 13)

35
Example – Network
n There are labels on the edges that say something about
the relationship between the people.

36
Directed or Undirected?
n Can it be directed network?

37
Weighted Edges
n Weight indicates how many movies the actors have
been in together.

38
Adjacency Lists
n An adjacency list, also called an edge list,
Ø is one of the most basic and frequently used representations of
a network.
Ø Each edge in the network is indicated by listing the pair of
nodes that are connected.
n The adjacency list for the Apollo 13 network:

Actor          Actor
Tom Hanks      Bill Paxton
Tom Hanks      Gary Sinise
Tom Hanks      Kevin Bacon
Bill Paxton    Gary Sinise
Gary Sinise    Kevin Bacon
Gary Sinise    Ed Harris

39
Adjacency List
n In an undirected network, the order of the node names
in each pair is irrelevant; the order can be reversed.

n In a directed network, the order of the node names is important.
Ø If a pair is listed as 'Node A, Node B', it means there is a
relationship from Node A to Node B.
Ø The reverse relationship is not implied,
Ø but it can be indicated by including another line listing 'Node B,
Node A'.

40
Adjacency list with edge weight
n An edge weight is a common value to see included in an
adjacency list.
n It is included on the same line as the two node
names, and usually follows them.
u The weighted adjacency list for the Apollo 13 network:

Actor          Actor          Weight
Tom Hanks      Bill Paxton    1
Tom Hanks      Gary Sinise    4
Tom Hanks      Kevin Bacon    1
Bill Paxton    Gary Sinise    1
Gary Sinise    Kevin Bacon    1
Gary Sinise    Ed Harris      1
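
A minimal Python sketch of the weighted adjacency list above, assuming each row is stored as a (node, node, weight) tuple. The names `edges` and `weight_of` are illustrative and do not come from the source.

```python
# Weighted adjacency (edge) list for the Apollo 13 network, one tuple per row.
edges = [
    ("Tom Hanks", "Bill Paxton", 1),
    ("Tom Hanks", "Gary Sinise", 4),
    ("Tom Hanks", "Kevin Bacon", 1),
    ("Bill Paxton", "Gary Sinise", 1),
    ("Gary Sinise", "Kevin Bacon", 1),
    ("Gary Sinise", "Ed Harris", 1),
]


def weight_of(a, b):
    """Return the weight of the edge between a and b, or 0 if there is none.

    The network is undirected, so the order of the two names is ignored.
    """
    for u, v, w in edges:
        if {u, v} == {a, b}:
            return w
    return 0


print(weight_of("Gary Sinise", "Tom Hanks"))  # 4
```

Comparing the pair as a set is what makes the lookup order-independent; for a directed network the comparison would instead be `(u, v) == (a, b)`.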
41
Adjacency Matrix
n An alternative to the adjacency list is an adjacency matrix.
Ø An adjacency matrix indicates whether or not there is an edge
between every pair of nodes.
Ø Typically, a 0 indicates no edge and a 1 indicates an edge.

42
Adjacency matrix
n First, the diagonal is all zeros because there are no
edges between a node and itself.
Ø Some networks do allow for self-loops.
Ø E.g., in an email network, if a person emails himself, there could
be a link from one node to itself, and thus there could be a 1 on
the diagonal.
n In an undirected network, the matrix is symmetric.
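
As a rough sketch, the Apollo 13 adjacency list can be turned into an adjacency matrix as follows. The node ordering and the variable names (`nodes`, `matrix`, `is_symmetric`, `zero_diagonal`) are assumptions chosen for illustration.

```python
nodes = ["Tom Hanks", "Bill Paxton", "Gary Sinise", "Kevin Bacon", "Ed Harris"]
edges = [
    ("Tom Hanks", "Bill Paxton"),
    ("Tom Hanks", "Gary Sinise"),
    ("Tom Hanks", "Kevin Bacon"),
    ("Bill Paxton", "Gary Sinise"),
    ("Gary Sinise", "Kevin Bacon"),
    ("Gary Sinise", "Ed Harris"),
]

index = {name: i for i, name in enumerate(nodes)}
n = len(nodes)
matrix = [[0] * n for _ in range(n)]

for a, b in edges:
    i, j = index[a], index[b]
    matrix[i][j] = 1
    matrix[j][i] = 1  # undirected network: mirror every entry

for row in matrix:
    print(row)

# No self-loops, so the diagonal is all zeros; undirected, so symmetric.
zero_diagonal = all(matrix[i][i] == 0 for i in range(n))
is_symmetric = all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))
```

For a directed network, dropping the mirrored assignment gives a matrix that is not necessarily symmetric.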

43
Adjacency matrix
n In a directed network, the matrix will not necessarily be
symmetric.
Ø In the example, there are edges from A to C, from C to A, and from
A to B, but the reciprocal edge from B to A is absent.
Ø Thus, we record a 1 for the A-B edge and a 0 for
the B-A edge.

44
Adjacency matrix with weight
n This scheme can be altered to show the weight of an
edge as well.

45
2.2 Basic Network Structure and
Properties
n Subnetwork

n There are parts of the network that are interesting as well.

n When we are considering a subset of the nodes and edges in a
graph, it is called a subnetwork.

46
Subnetwork - Singleton
n Some of the simplest subnetworks are singletons: nodes with no
edges connecting them to any other node.
Ø While these nodes are not very “social,” they are still
part of a social network.
Ø These often represent people who signed up for an account to
access some part of the site other than the social
networking features, or people who signed up but never
actively participated.

n Node A is a singleton.

47
Subnetwork – dyad and triad
n Two nodes and the relationship between them are called a dyad
Ø Nodes A and B form an unconnected dyad.
Ø Even though A and B are not connected, the pair of nodes is
still called a dyad.
Ø Nodes B and C form a connected dyad.
n A group of three nodes is called a triad.
Ø Nodes D, E, and F form a fully connected triad.

48
Cliques
n Of particular interest is whether or not all nodes in a group are
connected to one another. Such a group is called a clique.
Ø E.g., a group of people who are all strongly connected and tend
to talk mostly to one another
Ø (e.g., “Alice is part of a clique at school”).

Ø Cliques in the figure:
Ø A, C, D
Ø F, H, B, G
Ø All nodes must be connected to all other nodes in the clique
Ø Tightly connected group

[Figure: example network with nodes A through H]

49
Cliques
n Find the cliques
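
One way to check a candidate clique is to test every pair in the group, as in the sketch below. The edge set is hypothetical: it contains the edges implied by the two cliques named on the previous slide (A, C, D and F, H, B, G) plus two made-up connecting edges (D-E and E-G), since the original figure is not available.

```python
from itertools import combinations

# Hypothetical undirected edges; D-E and E-G are assumptions for illustration.
edges = {
    frozenset(pair)
    for pair in [
        ("A", "C"), ("A", "D"), ("C", "D"),
        ("F", "H"), ("F", "B"), ("F", "G"),
        ("H", "B"), ("H", "G"), ("B", "G"),
        ("D", "E"), ("E", "G"),
    ]
}


def is_clique(group):
    """True if every pair of nodes in the group is directly connected."""
    return all(frozenset(pair) in edges for pair in combinations(group, 2))


print(is_clique(["A", "C", "D"]))       # True
print(is_clique(["F", "H", "B", "G"]))  # True
print(is_clique(["D", "E", "G"]))       # False: D and G are not connected
```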

50
Clusters
n A cluster is a group of nodes that are tightly connected
Ø 'Tightly' means they are more tightly connected than the
network as a whole

n A cluster does not need to be a clique

n The group in the lower right of the graph is a cluster

51
Egocentric Networks
n This is a network we pull out by selecting a node and all
of its connections.

n A degree-1 egocentric network of D
Ø Node D and its edges to its neighbors
Ø One step away from D in the network

52
Egocentric Network – 1.5 degree
n A 1.5-degree egocentric
network of D

n A 1.5-degree egocentric
network of D with D
excluded

53
Egocentric Networks – 2 degree
n A 2-degree egocentric network of D

54
Path and Connectedness
The connections between nodes and measures of their
closeness are important characteristics.

n Paths:
Ø A Path is a series of steps from one node to another node in a
network
u A path is not a single edge;
u it is a series of edges through nodes.

55
Paths
n A path connecting node
M to node C by the
following steps:

Ø M-P-F-O-C
Ø M-L-K-J-P-F-Q-D-C

56
Shortest Path: Length
n The length of a path = the number of edges in it.
n Shortest paths will be an important measure
Ø Shortest path lengths are sometimes called geodesic distances.
Ø They show how closely connected the nodes are

n Give me some statistics?
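
A breadth-first search finds a shortest path in an unweighted network. The sketch below builds a graph from only the edges named in the two example paths from M to C (M-P-F-O-C and M-L-K-J-P-F-Q-D-C); the full figure may contain more edges, so this is an assumption, and `shortest_path` is an illustrative helper name.

```python
from collections import deque

# Build an undirected graph from the two example paths only (an assumption).
paths = ["M-P-F-O-C", "M-L-K-J-P-F-Q-D-C"]
adj = {}
for path in paths:
    steps = path.split("-")
    for a, b in zip(steps, steps[1:]):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)


def shortest_path(adj, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adj[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # goal is not reachable from start


route = shortest_path(adj, "M", "C")
length = len(route) - 1  # path length = number of edges
print(route, length)  # ['M', 'P', 'F', 'O', 'C'] 4
```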

57
Connectedness
n Connectedness
Ø Paths are used to determine a graph property
n Two nodes in a graph are called connected if there is a
path between them in the network.
n An entire graph/network is connected if there is a path
between every pair of nodes.

58
Connectedness – connected components
n If a graph is not connected, it may have subgraphs that
are connected.

Ø The subgraphs are called connected components.

[Figure: a disconnected graph containing]
• A three-node connected component
• A two-node connected component
• A singleton
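
Connected components can be found by repeatedly exploring outward from an unvisited node. In this sketch the grouping of nodes is hypothetical (A-B-C as the three-node component, D-E as the two-node component, F as the singleton), because the figure's exact edges are not recoverable.

```python
def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = {node: set() for node in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical example: which nodes belong to which component is an assumption.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]
components = connected_components(nodes, edges)
print(components)
```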
59
Strongly connected
n In a directed network, edges may only go in one direction.

n If there are edges that can be followed in the correct direction to
find a path between every pair of nodes, the directed graph is called
strongly connected.

60
Weakly connected
n If a path cannot be found between all pairs of nodes
when following the direction of the edges,
Ø but paths can be found if the directed edges are treated as
undirected,
Ø the graph is called weakly connected.
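
The two definitions can be checked with a simple reachability test, sketched below on two hypothetical three-node graphs. The function names are illustrative. Note that `is_weakly_connected` only tests connectivity with directions ignored, so it also returns True for a strongly connected graph.

```python
def reachable(adj, start):
    """Set of nodes that can be reached from start by following edge directions."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strongly_connected(nodes, edges):
    adj = {node: set() for node in nodes}
    for a, b in edges:
        adj[a].add(b)  # directed edge a -> b
    return all(reachable(adj, node) == set(nodes) for node in nodes)


def is_weakly_connected(nodes, edges):
    # Treat every directed edge as undirected by adding its reverse.
    both_ways = list(edges) + [(b, a) for a, b in edges]
    return is_strongly_connected(nodes, both_ways)


nodes = ["A", "B", "C"]
cycle = [("A", "B"), ("B", "C"), ("C", "A")]  # hypothetical example
chain = [("A", "B"), ("B", "C")]              # hypothetical example

print(is_strongly_connected(nodes, cycle))  # True
print(is_strongly_connected(nodes, chain))  # False
print(is_weakly_connected(nodes, chain))    # True
```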

61
Hubs and Bridges
n A bridge is an edge that connects otherwise separate
groups of nodes in the network,
Ø If removed, will increase the number of the connected
components in a graph.
Ø E.g., an edge between nodes P and F is a bridge.
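
The bridge definition can be tested directly: remove the edge and see whether the number of connected components goes up. This sketch assumes a graph containing only the edges named in the earlier example paths (M-P-F-O-C and M-L-K-J-P-F-Q-D-C), which may be a subset of the real figure; under that assumption the P-F edge is indeed a bridge.

```python
def count_components(nodes, edges):
    """Number of connected components in an undirected graph."""
    adj = {node: set() for node in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = set()
    count = 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node] - seen)
    return count


def is_bridge(edges, edge):
    """True if removing the edge increases the number of connected components."""
    nodes = {node for e in edges for node in e}
    remaining = [e for e in edges if set(e) != set(edge)]
    return count_components(nodes, remaining) > count_components(nodes, edges)


# Edges taken from the two example paths only (an assumption).
edges = []
for path in ["M-P-F-O-C", "M-L-K-J-P-F-Q-D-C"]:
    steps = path.split("-")
    edges.extend(zip(steps, steps[1:]))

print(is_bridge(edges, ("P", "F")))  # True
print(is_bridge(edges, ("F", "O")))  # False: F-Q-D-C-O still links F and O
```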

62
Hubs and Bridges
n Hubs are important nodes rather than edges.
n Hubs are used to refer to the most connected nodes in
the network.
Ø In example below, node P would be a hub because it has many
connections to other nodes.

63
Chapter 3

NETWORK STRUCTURE AND MEASURES

64
3.1 Describing nodes and edges
n Degree
Ø Describes how nodes are connected to one another and to the
network as a whole
Ø The degree of a node is the number of edges connected to that
node.
n In an undirected network
Ø The degree of a node = the total number of edges connected to
it
n In a directed network, there are two measures of degree: in-degree
and out-degree.
Ø In-degree: the # of edges coming into the node
Ø Out-degree: the # of edges originating from the node and going
outward to other nodes
Ø The degree of a node = in-degree + out-degree

65
Degree: example
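
A small sketch of in-degree, out-degree, and total degree in a directed network. The four edges are hypothetical and are not taken from the slide's figure.

```python
from collections import Counter

# Hypothetical directed edges (source, target).
edges = [("A", "B"), ("A", "C"), ("C", "A"), ("D", "A")]

out_degree = Counter(source for source, _ in edges)
in_degree = Counter(target for _, target in edges)


def degree(node):
    """Total degree of a node in a directed network: in-degree + out-degree."""
    return in_degree[node] + out_degree[node]


for node in ["A", "B", "C", "D"]:
    print(node, in_degree[node], out_degree[node], degree(node))
```

`Counter` returns 0 for nodes it has never seen, so a node with no incoming (or no outgoing) edges needs no special handling.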

66
Centrality
n Centrality
Ø One of the core principles of network analysis
Ø Measures how `central’ a node is in the network
Ø Used as an estimate of its importance in the network

n However, what counts as 'central' may vary depending on the
application, point of view, and context.
n Four types of centrality:
Ø Degree centrality
Ø Closeness centrality
Ø Betweenness centrality
Ø Eigenvector centrality
67
(1) Degree Centrality
n Degree centrality
Ø Measure of the total connections a node has
Ø Degree centrality of a node = its degree = the number of edges
it has

Ø The higher the degree, the more central the node?

Ø Many nodes with high degree are also highly central by other measures
Ø But degree does not necessarily indicate the importance of a node
in connecting others or how central it is to the main group

68
(1) Degree centrality

Node Degree Centrality


P
F
D
B

69
An extreme counter-example
n The red node has high degree centrality, but it is far
from the dense center.

Periphery of the network

70
(2) Closeness Centrality
n Measures how close a node is to all other nodes in the
network
n Calculated as the reciprocal of the average shortest path length
from the node to every other node in the network.

\[
\text{Closeness Centrality} = \frac{\#\text{ of nodes} - 1}{\text{total shortest path length}}
\]

n A higher value of the statistic indicates that the node is closer to
other nodes.

71
(2) Closeness Centrality
Node Path Length
P 1
O 1
Q 2
C 2
D 3
R 1

Closeness Centrality for F = 10/18 = 0.556
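
The calculation can be sketched with a breadth-first search that collects shortest path lengths and then applies (# of nodes - 1) / (total shortest path length). The four-node path graph A-B-C-D is a hypothetical example, not the network from the slide, and the code assumes a connected, undirected, unweighted graph.

```python
from collections import deque


def distances_from(adj, source):
    """Shortest path length from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(adj, node):
    """(# of nodes - 1) / total shortest path length; assumes a connected graph."""
    total = sum(distances_from(adj, node).values())  # the node itself adds 0
    return (len(adj) - 1) / total


# Hypothetical path graph A - B - C - D.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

print(closeness(adj, "B"))  # 0.75  (3 other nodes, distances 1 + 1 + 2)
print(closeness(adj, "A"))  # 0.5   (distances 1 + 2 + 3)
print(round(10 / 18, 3))    # 0.556, the value reported for node F
```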


72
Tracking Flu Transmission
n Source: http://humanvirosphere.blogspot.com/2011/02/social-
networks-and-patterns-of-disease.html
n How does disease spread?

73
(3) Betweenness Centrality
n Most widely used when analyzing social networks
Ø Measure of a node’s influence
Ø Captures how important a node is in the flow of information
from one part of the network to another
n Based on the fraction of shortest paths that include a given node
1. Select a pair of nodes and find all the shortest paths between those
nodes.
2. Compute the fraction of those shortest paths that include node N.
3. Repeat steps 1 and 2 for every pair of nodes.
4. Add up the fractions to obtain the betweenness centrality for node N.

n A higher betweenness centrality indicates that the node is more
important

74
(3) Betweenness Centrality
n Betweenness Centrality for B

Betweenness Centrality for B = 6 × 1 (A to each of the other nodes)
+ 0.5 (the pair C-D) + 14 × 0 (all remaining pairs) = 6.5
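
The four-step procedure can be sketched in Python as below. The five-node graph is hypothetical (it is not the figure from the slide) but follows the same pattern: A reaches every other node only through B, and the pair C-D has two shortest paths, one of which passes through B. The code assumes an undirected, unweighted graph, and the helper names are illustrative.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Shortest path lengths and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}  # sigma[v] = number of shortest paths source -> v
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj, n):
    """Sum, over all pairs (s, t) not containing n, of the fraction of
    shortest s-t paths that pass through n."""
    info = {node: bfs_counts(adj, node) for node in adj}
    dist_n, sigma_n = info[n]
    total = 0.0
    for s, t in combinations([x for x in adj if x != n], 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s or n not in dist_s or t not in dist_n:
            continue  # no path between s and t, or n unreachable
        if dist_s[n] + dist_n[t] == dist_s[t]:
            total += sigma_s[n] * sigma_n[t] / sigma_s[t]
    return total


# Hypothetical graph: A-B, B-C, B-D, C-E, D-E.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "E"},
    "D": {"B", "E"},
    "E": {"C", "D"},
}

print(betweenness(adj, "B"))  # 3.5 = 3 * 1 (A to C, D, E) + 0.5 (C-D)
```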

75
Betweenness Centrality – directed
networks
n In directed networks, betweenness can have several
meanings.

Ø For example, consider a Twitter account
u A user with high betweenness may be followed by many others
who don’t follow the same people as the user.
l This indicates that the user is well-followed.

u Alternatively, the user may have fewer followers, but connect them
to many accounts that are otherwise distant.

n Understanding the direction of the edges for a node is important
for understanding the meaning of centrality.

76
(4) Eigenvector Centrality
n Measures a node’s importance while giving consideration
to the importance of its neighbors.
Ø Sometimes used to measure a node’s influence in the network

n Gives more weight to nodes if they are connected to influential
nodes.

Ø For example, a node with 300 relatively unpopular friends on
Facebook would have lower eigenvector centrality than
someone with 300 very popular friends (like Barack Obama).

77
(4) Eigenvector Centrality
n A variant of eigenvector centrality is at the core of
Google’s PageRank algorithm, which Google uses to rank
web pages.

Ø Every node is a web page
Ø Every edge is a link
Ø If a page is linked from CNN, that link is worth more than
a link from an unknown page.

n A page linked from prominent or important nodes gets more
weight.

78
(4) Eigenvector Centrality
n Eigenvector centrality:

Ø Determined by performing a matrix calculation on the adjacency
matrix to find what is called the principal eigenvector.

n The main principle: links from important nodes (as measured by
degree centrality) are worth more than links from unimportant nodes.
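
One common way to approximate the principal eigenvector is power iteration: repeatedly replace each node's score with the sum of its neighbors' scores and rescale. The sketch below swaps in this technique as an illustration; the source does not say which method is used. It assumes an undirected, connected, non-bipartite graph with at least one edge, and the graph itself (a triangle A-B-C with D attached to C) is hypothetical.

```python
def eigenvector_centrality(adj, iterations=100):
    """Approximate eigenvector centrality by power iteration.

    Scores are rescaled so the largest is 1.0. Assumes an undirected,
    connected, non-bipartite graph with at least one edge.
    """
    scores = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # Each node's new score is the sum of its neighbors' current scores.
        new_scores = {node: sum(scores[nb] for nb in adj[node]) for node in adj}
        largest = max(new_scores.values())
        scores = {node: value / largest for node, value in new_scores.items()}
    return scores


# Hypothetical graph: triangle A-B-C plus a pendant node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

scores = eigenvector_centrality(adj)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

Node D has only one link, but that link is to the best-connected node C, so D still receives a non-trivial score; this is the "links from important nodes are worth more" principle in action.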

79
Consideration/Comparison
n A node may appear highly central with one
measure but have low centrality with another.
Ø Does not mean one measure is incorrect;
Ø The interpretation of the centrality measures is left to a
human analyst.
n Centrality measures may be difficult to compare
across networks
u A very important node in a small network may have
centrality measures that would seem unimportant in a
larger network
n The measures as described are calculated for undirected,
unweighted networks.
u When working with directed or weighted networks, these
measures require modification.
80
Describing network
n A number of measures can be used to describe the
structure of a network as a whole.

n We introduce four:

Ø Degree distribution
Ø Density
Ø Connectivity
Ø Centralization

81
Degree distribution
n Degree is used to describe individual nodes.

n Degree distribution is used to

Ø get an idea of the degree for all the nodes in the network;
Ø show how many nodes have each possible degree.
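
As an illustration, the degree distribution of the Apollo 13 network from Chapter 2 can be computed by counting degrees and then counting how often each degree occurs. The variable names are illustrative.

```python
from collections import Counter

edges = [
    ("Tom Hanks", "Bill Paxton"),
    ("Tom Hanks", "Gary Sinise"),
    ("Tom Hanks", "Kevin Bacon"),
    ("Bill Paxton", "Gary Sinise"),
    ("Gary Sinise", "Kevin Bacon"),
    ("Gary Sinise", "Ed Harris"),
]

# Degree of each node: every undirected edge adds 1 to both endpoints.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Degree distribution: how many nodes have each degree.
distribution = Counter(degree.values())
for d in sorted(distribution):
    print(d, distribution[d])
```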

82
Degree distribution

83
Power Law Distribution
n Many nodes have a relatively low degree, while a few have a very
high degree.

84
Density
n Density describes how connected a network is.
n It helps us understand both individual nodes and the network as
a whole.
n It measures the ratio of existing edges to possible edges in a graph.

85
Density
n Formula

\[
\text{density} = \frac{\#\text{ of edges}}{\#\text{ of possible edges}}
\]

n Edges: 6
n Total possible edges: 10
n Density: 6/10 = 0.6

86
Number of possible edges
n In directed networks,

\[
\#\text{ of possible edges} = n \times (n - 1)
\]

n In undirected networks,

\[
\#\text{ of possible edges} = \frac{n(n - 1)}{2}
\]

n Handshake problem?
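
The two formulas and the density ratio can be sketched as follows. The function names are illustrative, and the code assumes a network with at least two nodes. The worked example uses the numbers from the density slide: 6 edges among 5 nodes in an undirected network.

```python
def possible_edges(n, directed=False):
    """Number of possible edges among n nodes (no self-loops)."""
    if directed:
        return n * (n - 1)
    return n * (n - 1) // 2


def density(num_edges, n, directed=False):
    """Ratio of existing edges to possible edges; assumes n >= 2."""
    return num_edges / possible_edges(n, directed)


print(possible_edges(5))                 # 10
print(possible_edges(5, directed=True))  # 20
print(density(6, 5))                     # 0.6
```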

87
Exercise
n What is the density?

88
Density - Properties
n When the network is large, the density may be low.
Ø Normally, the density may be under 10%
Ø Density in Facebook might be 0.0001 (over a billion people in total)

n A network with no edges would have a density of 0.

n On the other hand, the densest possible network would be a network
where all possible edges exist, which is a clique.

n Density is always between 0 and 1.

89
Density in egocentric network
n Local clustering coefficient:
Ø Is the density of a node’s 1.5-degree egocentric network (with
the node itself excluded)
Ø E.g., a 1.5-degree egocentric network of D with D excluded.
Ø We are interested in D’s friends only.

n What is the local clustering coefficient of D?
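
A minimal sketch of the calculation: take the node's neighbors, count the edges among them, and divide by the number of possible edges among them. The graph is hypothetical (D has neighbors A, B, and C, and only A and B know each other); it is not the network from the slide.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Density of the node's neighbors' network, with the node itself excluded."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no possible edges among them
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


# Hypothetical undirected graph.
adj = {
    "A": {"D", "B"},
    "B": {"D", "A"},
    "C": {"D"},
    "D": {"A", "B", "C"},
}

print(local_clustering(adj, "D"))  # 1 edge (A-B) out of 3 possible
print(local_clustering(adj, "A"))  # 1.0: A's neighbors D and B are connected
```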

90
Connectivity and Cohesion
n Connectivity, also known as cohesion
Ø Measures how the edges are distributed.

Ø Counts the minimum number of nodes that must be removed before the
network becomes disconnected;

Ø that is, until there is no longer a path from each node to every other
node.

91
Connectivity and Cohesion
n The connectivity (Cohesion) is 1, because removing
node B, C, or D would disconnect the graph.
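
For small networks, connectivity can be found by brute force: try removing every set of k nodes, for increasing k, until the rest of the graph falls apart. The sketch below is only practical for a handful of nodes. The two example graphs are hypothetical: a five-node path A-B-C-D-E (where removing B, C, or D disconnects the graph) and a four-node cycle.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """True if the undirected graph on the given nodes is connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {node: set() for node in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes


def connectivity(nodes, edges):
    """Minimum number of nodes whose removal disconnects the graph (brute force)."""
    nodes = list(nodes)
    for k in range(len(nodes) - 1):
        for removed in combinations(nodes, k):
            rest = set(nodes) - set(removed)
            kept = [(a, b) for a, b in edges if a in rest and b in rest]
            if not is_connected(rest, kept):
                return k
    return len(nodes) - 1  # complete graph: it can never be disconnected


path_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
cycle_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

print(connectivity("ABCDE", path_edges))  # 1
print(connectivity("ABCD", cycle_edges))  # 2
```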

92
Connectivity and Cohesion
n The connectivity is 2? Why?

93
Centralization
n Use the distribution of a centrality measure to
understand the network as a whole.
n Centralization of power is an often-used concept and
phrase, which relates very closely to centralization in a
network.
n E.g., betweenness centrality can represent the control
one node has in the ability of others to communicate.
Ø If many messages must pass through a particular node, that
node has the power to stop or pass on information.
Ø If a few nodes have very high betweenness, we say that the
power is centralized in those nodes.

94
Centralization
n Different centrality measures can be substituted.
Ø Centralization: take the sum of the differences in centrality
between the most central node and every other node in the
network, and divide this by the maximum possible sum of differences
in centrality.

n Between 0 and 1:

\[
\text{Centralization} = \frac{\sum_{i=1}^{N} \left[ C(n^{*}) - C(n_i) \right]}{\max \sum_{i=1}^{N} \left[ C(n^{*}) - C(n_i) \right]}
\]

where C(n_i) is the centrality of node n_i, n* is the most
central node, and N is the number of nodes.

95
Centralization-Example
n Consider three types of centrality.

n What is the theoretical maximum centrality? (a star network)

n What is centralization (degree)?
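
For degree centrality, the theoretical maximum sum of differences occurs in a star network, where one node has degree n - 1 and every other node has degree 1, giving (n - 1)(n - 2). A minimal sketch under that assumption, using raw (unnormalized) degrees on an undirected network with at least three nodes; the two example graphs are hypothetical.

```python
def degree_centralization(adj):
    """Degree centralization of an undirected network with at least 3 nodes."""
    n = len(adj)
    degrees = [len(neighbors) for neighbors in adj.values()]
    top = max(degrees)
    observed = sum(top - d for d in degrees)
    maximum = (n - 1) * (n - 2)  # sum of differences in a star network
    return observed / maximum


# Hypothetical examples: a 5-node star and a 5-node ring.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}
ring = {
    "a": {"b", "e"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d", "a"},
}

print(degree_centralization(star))  # 1.0: all the "power" sits in one node
print(degree_centralization(ring))  # 0.0: every node is equally central
```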

96
It is a small world
n A social psychologist – Small World Experiment

97
It is a small world: six degrees of
separation
n Six degrees of separation
Ø Any two people in the world are separated by short paths, on
average about six steps.
Ø From this came the term 'small worlds', which indicates that people
who may be far apart physically and socially are still connected by
relatively short paths.

98
Six degrees of separation?
n As social networks grow, is the number of 'degrees' decreasing?

99
Small Worlds
n Properties:
Ø High average clustering coefficient
Ø Short average shortest path length

n This structural attribute is common among many naturally occurring
networks
Ø Most social networks, neural networks

100
Random Graphs and Small Worlds

Regular Network Small Worlds Random Graph

101
See You!!!

102
