
SOCIAL NETWORK ANALYSIS OF FACEBOOK PAGES

Team Members:

15BCE0077 - AKSHAYA M
15BCE0221 - SWATHI T
15BCE0278 - PESALA VENKATA BHAGYA NIKHILA
15BCE0529 - KEVIN PAUL
15BCE0708 - TOSHNIWAL MOHIT
15BCE0802 - SARANYA V S

Report submitted for the Final Project Review of

Course Code: CSE3021 - Social and Information Networks

Slot: A2 + TA2

Professor: Dr. Vasantha W B

TABLE OF CONTENTS

INTRODUCTION

LITERATURE REVIEW

OBJECTIVE OF THE PROJECT

INNOVATION COMPONENT

WORK DONE AND IMPLEMENTATION

RESULTS AND DISCUSSIONS

REFERENCES

1. INTRODUCTION

Facebook Pages are a key part of social media marketing for businesses all around the world.
In recent times, social networking sites have provided a medium through which people can
interact and share knowledge with one another.

On a public Facebook page, users can comment on, like or share posts. These actions are the
interaction between the users and the company or individual that owns the page.

A user can also like and follow the page itself, which means that whenever a new post comes
up the user is notified about it at the next login.

We will visualize the network as a bimodal graph. The network meets the criterion for a
bimodal graph because every edge connects actors from two distinct sets: users and posts.

The graph is weighted and directed. An edge indicates that a user has liked or commented on
a post. If user Y comments n times on post X, there is a directed link from Y to X of
weight n.
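
The weighting rule above can be illustrated with a minimal Python sketch. The interaction records and the function name are hypothetical, and the sketch assumes that every like or comment adds 1 to the edge weight; the report itself only states the rule for comments.

```python
from collections import Counter

# Hypothetical interaction log: (user, post, action) records.
interactions = [
    ("user_Y", "post_X", "comment"),
    ("user_Y", "post_X", "comment"),
    ("user_Y", "post_X", "comment"),
    ("user_Z", "post_X", "like"),
    ("user_Z", "post_W", "comment"),
]


def build_weighted_edges(records):
    """Return {(user, post): weight} with one directed edge per user-post pair.

    Assumption: each recorded like or comment contributes 1 to the weight.
    """
    return Counter((user, post) for user, post, _action in records)


edges = build_weighted_edges(interactions)
for (user, post), weight in sorted(edges.items()):
    print(f"{user} -> {post} (weight {weight})")
```

Here user_Y commented three times on post_X, so the edge from user_Y to post_X carries weight 3.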

Social Network Analysis attributes such as centrality measure how important a node is to a
network. The three forms of centrality measured in this study are degree centrality, closeness
centrality and betweenness centrality. We will also study the density of the above network.
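
As an illustration of the simplest of these measures, the following Python sketch computes normalized degree centrality on a small user-post network. The edge data is made up, and the normalization (degree divided by the number of other nodes) is one common convention, assumed here rather than taken from the report.

```python
from collections import defaultdict

# Hypothetical weighted edges: (user, post) -> weight.
edges = {("u1", "p1"): 2, ("u2", "p1"): 1, ("u3", "p1"): 1, ("u1", "p2"): 1}


def degree_centrality(edge_weights):
    """Degree of each node divided by (number of nodes - 1).

    Edge direction and weight are ignored: each edge counts once for both
    of its endpoints.
    """
    nodes = {node for edge in edge_weights for node in edge}
    if len(nodes) < 2:
        return {node: 0.0 for node in nodes}
    degree = defaultdict(int)
    for source, target in edge_weights:
        degree[source] += 1
        degree[target] += 1
    scale = len(nodes) - 1
    return {node: degree[node] / scale for node in nodes}


centrality = degree_centrality(edges)
for node in sorted(centrality):
    print(node, round(centrality[node], 2))
```

In this toy network post p1 is the most central node because three of the four other nodes interact with it.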

2. LITERATURE REVIEW

Summary Table:

Authors and Year (Reference): Kevin Lewis, Jason Kaufman, Marco Gonzalez, Andreas Wimmer,
Nicholas Christakis, 2008
Title (Study): Tastes, ties, and time: A new social network dataset using Facebook.com
Concept / Theoretical model / Framework: The framework used here is to apply network
analysis on a given dataset.
Methodology used / Implementation: Obtain permission to get data from Facebook, then use
the profile data of all users to perform a comprehensive analysis of the network.
Dataset details / Analysis: Facebook.com public data from students at Harvard University.
Relevant Finding: Computerized data collection “requires fewer research resources than do
personal interviews or mailed questionnaires,” making replications and meta-evaluations
much easier.
Limitations / Future Research / Gaps identified: Students differ tremendously in the extent
to which they “act out their social lives” on Facebook.

Authors and Year (Reference): Bongwon Suh, Lichan Hong, Peter Pirolli, Ed H. Chi, 2010
Title (Study): Large Scale Analytics on Factors Impacting Retweet in Twitter Network
Concept / Theoretical model / Framework: To identify the important factors that cause
people to retweet tweets.
Methodology used / Implementation: Extract data from Twitter, then perform a reduction
technique in which correlated features are reduced to a smaller number of principal
components that account for the variance of the individual components. This is followed by
selecting the right number of factors and interpreting them.
Dataset details / Analysis: 10k tweets from a downloaded dataset, 74M from the Twitter API.
Relevant Finding: URLs and hashtags have strong relationships with retweetability. Amongst
contextual features, the number of followers and followees as well as the age of the
account seem to affect retweetability.
Limitations / Future Research / Gaps identified: Future research includes generating a
predictive model which can predict retweetability based on past retweets.

Authors and Year (Reference): David Ediger, Karl Jiang, Jason Riedy, David A. Bader, 2010
Title (Study): Massive Social Network Analysis: Mining Twitter for Social Good
Concept / Theoretical model / Framework: The concept used here is parallel processing to do
large scale mining and social network analysis.
Methodology used / Implementation: The methodology used here is to set up a supercomputer
architecture utilizing the Cray XMT and then utilize GraphCT to visualize graphs and
perform analysis.
Dataset details / Analysis: Entire Twitter feed of Sept 09.
Relevant Finding: Social network analysis of big data is done.
Limitations / Future Research / Gaps identified: Future research includes utilizing the
full capability of the supercomputer architecture.

Authors and Year (Reference): Lydia Manikonda, Yuheng Hu, Subbarao Kambhampati, 2014
Title (Study): Analyzing User Activities, Demographics, Social Network Structure and
User-Generated Content on Instagram
Methodology used / Implementation: Obtain the user IDs of the main feed and then crawl
through their users to obtain a large number of users and avoid sampling bias. Then perform
network analysis.
Dataset details / Analysis: Unique profiles; live profiles of certain celebrities and their
followers.
Relevant Finding: Social network analysis, without geo-location analysis, is done. We find
that the reciprocity is not as high as in Flickr and the clustering coefficient is higher
than in Twitter.
Limitations / Future Research / Gaps identified: The dataset taken is just a small portion
of the total users of Instagram.

Authors and Year (Reference): Lars Backstrom, Jon Kleinberg, 2014
Title (Study): Romantic partnerships and the dispersion of social ties: a network analysis
of relationship status on Facebook
Concept / Theoretical model / Framework: The concept used here is dispersion, which looks
not only at the number of mutual friends but also at the network structure among mutual
friends.
Methodology used / Implementation: Randomly sampled Facebook data is scraped where
partner/spouse information is listed, then theoretical dispersion is computed; by using
machine learning to study the social structure of the Facebook friends, the partner can be
identified.
Dataset details / Analysis: Randomly sampled Facebook dataset where users have declared a
relationship.
Relevant Finding: Dispersion is a structural means of capturing the notion that a friend
spans many contexts in one's social life.
Limitations / Future Research / Gaps identified: Does not test on people who haven't
declared a relationship, which means that we can't test what the dispersion method will do
where no relationship is present.

3. OBJECTIVE
Facebook Pages play a key role in a company's success, so it is important to assess how well
a company's page performs. We perform social network analysis on the Facebook Pages of two
competitors and compare them to identify the connectivity, centrality and other metrics of
each network.

4. INNOVATION COMPONENT
In most experimental approaches to social network analysis, the study is based on social
interactions between the users of the social network. The engagement between posts and
users on social media is rarely studied. In our work, we show the interactions between posts
and users rather than the interactions among users, and we model them as a bimodal network.

5. WORK DONE AND IMPLEMENTATION

a) METHODOLOGY ADOPTED:
We start by creating a Facebook developer account and creating an app within it to obtain an
authentication ID and password. We then install the required packages in R to link the app
to the language. The required data is extracted from the live Facebook page, and social
network analysis is performed on it. The data to be visualized is then extracted and viewed
in R using the plot functionality.

The process of authentication, data collection and network creation can be expressed with
three verb functions: Authenticate(), Collect() and Create(). This simplified workflow
exploits the pipe interface of the magrittr package and provides better handling of API
authentication between R sessions. We “pipe” the data forward using the %>% operator, in a
functional programming style, so the different elements of the workflow can be chained
together quickly and easily. The workflow can also save and load authentication tokens, so
we do not have to keep authenticating with the APIs between sessions. This opens up
possibilities for automation and data mining projects.
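
The piping idea can be sketched outside R as well. The Python example below is only an analogy: `pipe`, `authenticate`, `collect` and `create` are hypothetical stand-ins written for this illustration, they return canned data, and they make no call to any real API or R package.

```python
from collections import Counter
from functools import reduce


def pipe(value, *steps):
    """Pass value through each step in turn, similar in spirit to R's %>%."""
    return reduce(lambda result, step: step(result), steps, value)


def authenticate(credentials):
    # Hypothetical stand-in: a real version would request an API token here.
    return {"app_id": credentials["app_id"], "authenticated": True}


def collect(session):
    # Hypothetical stand-in: returns canned (user, post) records, no API call.
    if not session["authenticated"]:
        raise PermissionError("authenticate first")
    return [("u1", "p1"), ("u1", "p1"), ("u2", "p1")]


def create(records):
    # Build the weighted edge list: one entry per (user, post) pair.
    return dict(Counter(records))


network = pipe({"app_id": "demo-app"}, authenticate, collect, create)
print(network)
```

Each step receives the output of the previous one, which is what makes the three-verb workflow read as a single chain.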

We will use this data to create a bimodal network. The graph object is bimodal because
edges represent relationships between nodes of two different types. In our bimodal Facebook
network, nodes represent Facebook users or Facebook posts, and edges represent whether a
user has commented on or ‘liked’ a post. Edges are directed and weighted (e.g. if user i has
commented n times on post j, then the weight of this directed edge equals n).

Next, we carry out some further descriptive analysis.

These networks are bipartite because nodes of the same type cannot share an edge: a user can
only like or comment on a post, not on another user, and posts cannot perform directed
actions on users or on other posts. What we can do is induce two subgraphs from each
network. More specifically, we can induce two actor networks, one for the users and one for
the posts.

We then look at the induced graph for the “users”. The induced “users” actor network
consists only of nodes that are of type “user”. An edge exists between user i and user j if
they both liked or commented on the same post (i.e. they share an interaction with some
post k).
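
The construction of the induced “users” network can be sketched in a few lines of Python. The edge list and function name are hypothetical, and edge weights are ignored for simplicity.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bimodal edges: (user, post).
edges = [("u1", "p1"), ("u2", "p1"), ("u3", "p1"), ("u3", "p2"), ("u4", "p2")]


def project_onto_users(user_post_edges):
    """Return the set of undirected user-user edges.

    Two users are linked when they interacted with at least one common post.
    """
    users_by_post = defaultdict(set)
    for user, post in user_post_edges:
        users_by_post[post].add(user)

    user_edges = set()
    for users in users_by_post.values():
        # Every pair of users that share this post gets an edge.
        for first, second in combinations(sorted(users), 2):
            user_edges.add((first, second))
    return user_edges


user_network = project_onto_users(edges)
print(sorted(user_network))
```

All users who interacted with the same post end up pairwise connected, so each post contributes a fully connected group of users to the induced network.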

b) DATASET USED:
The dataset used is the live Facebook data of Adidas and Nike. The dataset covers the year
2017 from January to September. The data was obtained through Facebook's developer API and
extracted using R libraries.

c) TOOLS USED:
Software requirements:

R language - to perform network analysis

Google Chrome - to access Facebook data

R libraries

Windows 10

Hardware requirements:

Laptop with at least 2 GB RAM and an i3 processor

300 MB of storage

d) SCREENSHOTS AND DEMO:

ADIDAS

[Screenshots: one set per month, January to September]

NIKE

[Screenshots: one set per month, January to September]

6. RESULTS AND DISCUSSIONS
Data was extracted from the Facebook pages of both Nike and Adidas and analysed with a
month-on-month comparison of multiple social network attributes. The graph given below is a
month-on-month line graph comparing the vertex count of the Nike and Adidas networks. From
the line graph it is clear that Adidas has a significantly greater number of users who
interact with the page.

[Figure: line graph of vertex count by month (January to September); series: vertex count nike, vertex count adidas; y-axis range 0 to 14000]

The graph given below is a month-on-month line graph of the total edge count for both Nike
and Adidas. This figure indicates the quantity of interactions between the users and the
posts in each network, and makes the volume of those interactions on each page clear.

[Figure: line graph of edge count by month; series: edge count nike, ecount; y-axis range 0 to 20000]

The line chart given below shows density, which normalizes for the sheer number of users
present on a Facebook page. Density gives us an idea of the quality of user engagement with
the page: the higher the density, the more engaged the users are with the Facebook page.
The chart shows that for the first five months of the year the density of the Adidas page
is significantly higher than that of the Nike page. The gap gradually evens out over the
remaining months.

[Figure: line graph of density by month (January to September); series: density nike, density adidas]

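
The report does not state which density formula was used, so the following Python sketch shows two common normalizations as an assumption: the general directed-graph density, and a bipartite variant that only counts user-to-post pairs as possible edges. The function names and the example numbers are illustrative.

```python
def directed_density(num_nodes, num_edges):
    """Edges present divided by all possible directed edges, n * (n - 1)."""
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))


def bipartite_density(num_users, num_posts, num_edges):
    """Edges present divided by all possible user-to-post edges."""
    if num_users == 0 or num_posts == 0:
        return 0.0
    return num_edges / (num_users * num_posts)


# Hypothetical month: 4 users, 2 posts, 5 distinct user-post interactions.
print(directed_density(6, 5))      # counts every ordered pair of nodes
print(bipartite_density(4, 2, 5))  # counts only user-to-post pairs
```

Either way, density divides the number of observed edges by the number that could exist, which is why it allows pages with very different audience sizes to be compared.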
The following centrality measures give us an idea of how well a post is connected to all the
users present on the page. The first line graph shows the month-on-month variation of the
closeness centrality. There is little variation between the two networks, except that the
Nike value spikes for one month between April and May and again from July to September.
[Figure: line graph of closeness centrality by month; series: closeness nike, closeness adidas; y-axis range 0.00E+00 to 8.00E-07]
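
For readers unfamiliar with closeness centrality, here is a minimal Python sketch using breadth-first search. It assumes edge direction and weight are ignored and scores each node as the number of reachable nodes divided by the sum of shortest-path distances to them; the absolute values in the chart depend on the normalization used by the R library, which the report does not state.

```python
from collections import deque


def closeness_centrality(edges):
    """Closeness of each node, ignoring edge direction and edge weight."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    result = {}
    for start in neighbours:
        # Breadth-first search gives shortest path lengths from start.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in distance:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
        total = sum(distance.values())
        reachable = len(distance) - 1
        result[start] = reachable / total if total else 0.0
    return result


# Hypothetical network: three users interacting with one post.
star = [("u1", "p1"), ("u2", "p1"), ("u3", "p1")]
scores = closeness_centrality(star)
print(scores)
```

The post sits one step from every user and gets the highest score, while each user is two steps from the other users and scores lower.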

The graph given below plots eigenvector centrality against months. It gives a clear idea of
another kind of centrality. On this measure, Adidas is more central than Nike.

[Figure: line graph of eigenvector centrality by month (January to September); series: eigenvectorcentrality adidas, eigenvectorcentrality nike; y-axis range 0.00E+00 to 1.20E-01]

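
Eigenvector centrality can be approximated by power iteration. The Python sketch below is an illustration under stated assumptions: edges are treated as undirected and unweighted, scores are scaled so the largest equals 1, and each step adds the node's own score to the sum over its neighbours, a shift that keeps the iteration from oscillating on bipartite graphs. The function name and sample data are hypothetical.

```python
def eigenvector_centrality(edges, iterations=200, tolerance=1e-12):
    """Approximate eigenvector centrality by shifted power iteration."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return {}

    score = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        # New score = own score + sum of neighbour scores (A + I shift).
        updated = {
            node: score[node] + sum(score[other] for other in neighbours[node])
            for node in neighbours
        }
        largest = max(updated.values())
        updated = {node: value / largest for node, value in updated.items()}
        change = max(abs(updated[node] - score[node]) for node in neighbours)
        score = updated
        if change < tolerance:
            break
    return score


# Hypothetical network: three users interacting with one post.
star = [("u1", "p1"), ("u2", "p1"), ("u3", "p1")]
centrality = eigenvector_centrality(star)
print(centrality)
```

A node scores highly when it is connected to other high-scoring nodes, so a post that attracts many active users ranks above the individual users around it.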
The betweenness centrality of this graph is 0 because it is a directed bipartite graph:
every edge runs from a user to a post, so no node lies in the middle of a directed path. The
reciprocity is also zero, as a post cannot interact back with a user. The network formed by
connecting users who liked the same post is a complete graph, which suggests that the page
has loyal followers. In conclusion, social network analysis of the Facebook Pages of Nike
and Adidas was carried out successfully, and selected measures of the two networks were
compared and analysed.
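
The two zero values reported above follow directly from the direction of the edges, as the short Python sketch below illustrates on made-up data. The function names are hypothetical. It relies on the fact that a node can only have non-zero betweenness if it has both an incoming and an outgoing edge.

```python
def reciprocity(edges):
    """Fraction of directed edges whose reverse edge is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)


def possible_intermediaries(edges):
    """Nodes with both an incoming and an outgoing edge.

    Only such nodes can sit in the middle of a directed path, so every
    other node has betweenness 0.
    """
    sources = {a for a, _ in edges}
    targets = {b for _, b in edges}
    return sources & targets


# Hypothetical user -> post edges.
edges = [("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p2")]
print(reciprocity(edges))              # no post points back at a user
print(possible_intermediaries(edges))  # empty: no node can be "between"
```

Because users only send edges and posts only receive them, both checks come out empty for any network built this way.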

7. REFERENCES
Ediger, David, et al. "Massive social network analysis: Mining Twitter for social
good." Parallel Processing (ICPP), 2010 39th International Conference on. IEEE, 2010.

Manikonda, Lydia, Yuheng Hu, and Subbarao Kambhampati. "Analyzing user activities,
demographics, social network structure and user-generated content on Instagram." arXiv
preprint arXiv:1410.8099 (2014).

Backstrom, Lars, and Jon Kleinberg. "Romantic partnerships and the dispersion of social ties:
a network analysis of relationship status on Facebook." Proceedings of the 17th ACM
Conference on Computer Supported Cooperative Work & Social Computing. ACM, 2014.

Suh, Bongwon, et al. "Want to be retweeted? Large scale analytics on factors impacting
retweet in Twitter network." Social Computing (SocialCom), 2010 IEEE Second International
Conference on. IEEE, 2010.

Lewis, Kevin, et al. "Tastes, ties, and time: A new social network dataset using
Facebook.com." Social Networks 30.4 (2008): 330-342.
