Facebook Network Centrality Analysis

SNA Report1 file:///C:/Users/xvenel/Downloads/SNA%20Report1(1).
html
WEEK 1
INTRODUCTION:
In the first week, we were supposed to choose a network to work on and to implement it properly in python,
computing some basic but essential features of it in order to have a starting overview of the network. Having
drawn a subgraph, and computed the number of nodes/edges, the average degree and the density will make
our work suitable also in the next weeks' tasks to compute more complex formulas.
• TASK A: We decided to pick the Facebook ego network, which is an undirected and unweighted network
where links represent connections between users.
• TASK B: We implemented the graph in python by iterating over the lines of the dataset Facebook-ego.txt
and creating a list of tuples for each couple of values:
In [ ]: import random # random generator

import networkx as nx # network
import matplotlib.pyplot as plt # drawing
import numpy as np # Matrices
In [ ]: dataset = open('Facebook-ego/facebook_edges.txt','r')
graph = {}
list_edges = []
for line in dataset:
n1,n2 = line.split(" ")
list_edges.append((int(n1),int(n2)))
Then we have created the graph by using the following NetworkX function, which creates a dictionary whose
keys are the nodes and the values are the nodes which are connected to them:
In [ ]: graph=nx.from_edgelist(list_edges)
• TASK C: We chose to start drawing a subgraph starting by one of the last nodes: 4023. Observing the
raw data we have noticed that those nodes were the ones with less edges. This allows us to have a
better visualization of the subgraph when we use the apposite command nx.draw(g) and get better
interpretations of our results. To do it we have used some NetowrkX functions. The subgraph starts from
the node n ( =4023th node) and takes in consideration all the nodes connected to it up to the 2 level of
connection.
1 of 9 10/27/2022, 4:32 PM
SNA Report1 file:///C:/Users/xvenel/Downloads/SNA%20Report1(1).html
In [ ]: n = 4023
i = 1
s_edges = []
level_i = [(n,k) for k in list(graph[n])]
nodes_1 = [i[1] for i in level_i]
level_ii = [(n,k) for n in nodes_1 for k in list(graph[n])]
s_edges.extend(level_i)
s_edges.extend(level_ii)
s_graph=nx.from_edgelist(s_edges)
nx.draw(s_graph)
• TASK D: For computing the number of nodes, the smartest way is to calculate the length of the graph,
while for the number of edges we have calculated the length of the list of tuples that we have created in
TASK B.
In [ ]: n_nodes = len(graph)
n_edges = len(list_edges)
n_neighborhoods= n_edges*2
For what concerns average degree and density it was sufficient to apply their formulas:
2 of 9 10/27/2022, 4:32 PM
In [5]: average_degree = n_neighborhoods/n_nodes

density = average_degree / (n_nodes-1)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 average_degree = n_neighborhoods/n_nodes
2 density = average_degree / (n_nodes-1)
NameError: name 'n_neighborhoods' is not defined
CONCLUSION:
In conclusion of this first week, we can state that the graph is a Sparse graph. The density value, which is
quite low (0.01081…), tells us that the nodes do not tend to be very connected to each other, in relation to the
number of nodes.
WEEK 2
For this week, our task must be applied to the largest component of our graph, so the largest not-extendable
subgraph. Since our graph is undirected, the largest component is the graph itself. The aim of the following
tasks is to check how many nodes of the graph are friends and how is their relations. By computing the
average clustering we can understand the mean ratio between the friends’ friendships of a node n and the
number of its neighbours. In this way, we analyse how much are interconnected the nodes’ friends are. The
transitivity T of a graph is based on the relative number of triangles in the graph, compared to the total
number of connected wedges of nodes. The transitivity of a graph is closely related to the clustering
coefficient of a graph, as both measure the relative frequency of triangles.
• Task 2.a: We compute the average clustering using a special function of networkx. For each node, it finds
the ratios between the number of triangles where the node n is present and the binomial coefficient of Kn.
After that it sum all of them and multiply it by 1/N.
In [ ]: av_clustering = nx.average_clustering(s_graph)
Using the command nx.triangles(s_graph) we obtain a dictionary of how many triangles is inside each node.
Doing sum(list(nx.triangles(s_graph).values())) we get the sum of the total number of triangles in the graph.
In [ ]: n_triangles = sum(list(nx.triangles(s_graph).values()))
3 of 9 10/27/2022, 4:32 PM
This will be essential to creating an own transivity’s function.
• TASK 2.b: How we did that? For each node of the graph, we checked its degree ( len(list(graph[i]))) and
we computed the binomial coefficient of Kn, summing that for all the previous ones (series_ki).
In [6]: def transivity_of(graph):

series_ki = 0
for i in graph:
degree_ki = len(list(graph[i]))
series_ki = series_ki + (degree_ki*(degree_ki-1))/2
return (3*n_triangles)/series_ki
transitivity = transivity_of(s_graph)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [6], in <cell line: 7>()
5 series_ki = series_ki + (degree_ki*(degree_ki-1))/2
6 return (3*n_triangles)/series_ki
----> 7 transitivity = transivity_of(s_graph)
NameError: name 's_graph' is not defined
Our function will return (3*n_triangles)/series_ki .
We can observe that: the av.clustering of the chosen subgraph is ≈0.46 the transitivity of the chosen subgraph
is ≈0.73
WEEK 3
INTRODUCTION:
For this week the main object was the centralities, whose computation is very important if we want to
understand the importance of nodes in a graph. Since there are many kinds of centralities, we were supposed
to choose the most relevant centrality for our network and we decided to pick the Betweenness Centrality.
Looking at the plotted subgraph we have noticed that, especially in the middle, there are a lot of nodes
connected to each other. The betweenness centrality, in fact, calculating the shortest path between each pair
of nodes in a graph where a node “i” lies, is the best method for understanding the influence that node i has
over the connection of its neighbours.
The last step was plotting the cumulative distribution for this function, which tells us, on a scale from 1 to 0
(always decreasing), the cumulative probability for a random X value.
• COMPUTATIONS: Knowing the formula for the Normalized Betweenness Centrality, we first recalled the
nodes by using a NetworkX function and then computed the number by using the length:
4 of 9 10/27/2022, 4:32 PM
In [ ]: #def normalized_betweness_of(g):
nodes = nx.nodes(g)
n_nodes = len(nodes)
visited = []
s_paths = {}
Then we have created a nested loop and, if the nodes i,j has respected some fundamental conditions, they
will be appended in the list of the visited nodes. This is essential in order to don’t have a redundant output:
In [ ]: for i in nodes:
for j in nodes:
if i != j and ((i,j) not in visited or (j,i) not in visited):
visited.append((i,j))
After that we have iterated over all the shortest paths between i and j and append each path in “s_paths” (if
the length of the path is greather than 2, otherwise it would be not respected the conditions of the series
inside the betweenness formula):
In [ ]: #for p in nx.all_shortest_paths(g, source=i, target=j):

if len(p)>2:
if (i,j) not in s_paths:
s_paths[(i,j)] = [p]
else:
s_paths[(i,j)].extend([p])
Then, we computed the normalized betweenness in order to see in a scale 0 to 1 (percentage), the influence
that a node has over the flow of information of a graph and then plot it onto a cartesian plane. To do it we
have calculated how many times a node appears (saved them on number_of) and then divided by the total
number of units.
5 of 9 10/27/2022, 4:32 PM
In [ ]: normalized_betweness = {}
for n in nodes:
n_inside_sp = 0
for values in s_paths:
values = s_paths[values]
for k in values:
k = k[1:-1]
if n in k:
n_inside_sp = n_inside_sp + 1
betweness_of_n = 0.5*(n_inside_sp / len(values))
normalized_betweness[n]=(n_nodes-1)*(n_nodes-2))*(betweness_of_n)
return normalized_betweness
n_betweeness = normalized_betweness_of(s_graph)
centralities = list(n_betweeness.values())
WEEK 5
These week we must resolve tasks related to PageRank: a really useful algorithm to analyze the importance
of the different ‘pages’. We can simplify the PageRank algorithm to describe it as a way for the importance of
a webpage analyzing the quantity and quality of the links that point to it in the different iterations. Since all
links in the web are directed (and not undirected like our graph), in our function we use the command below to
convert our undirected graph into a directed one.
In [ ]: def PageRank(g,t,alpha):
g = nx.DiGraph(g)
Doing a PageRank’s algorithm means to compute the ratio between the pagerank of the previous iteration
and the out-degree of the nodes which link our node i.
6 of 9 10/27/2022, 4:32 PM
How can we implement this formula on Python? We create a function which takes as inputs:
1. The graph g
2. The threshold t
3. The parameter alpha
For first, we create a list of all nodes: this is essential in order to consider the different pair of shortest paths.
We have created two different dictionaries related to the type of links in the graph for each node:
• Outlinks -> for each node we write the list of the links which the node i does
• Inlinks -> for each node we write the list of the links which arrive to the node i
In [ ]: outlinks={}
inlinks = {}
listof_edges = nx.edges(g)
for i in listof_edges:
if i[0] not in outlinks:
outlinks[i[0]] = [i[1]]
else:
outlinks[i[0]].append(i[1])
if i[1] not in inlinks:
inlinks[i[1]] = [i[0]]
else:
inlinks[i[1]].append(i[0])
7 of 9 10/27/2022, 4:32 PM
The first rt (R(t=0)) will have as values 1/N for each node, and then the formula for each node, for each
iteration, that is the formula that we saw in class for pagerank. The first rt (R(t=0)) is created outside the while
loop
rt = {n:(1/(len(nx.nodes(g)))) for n in g} times= 0 prev_rt= rt.copy()
Where len(nx.nodes(g))) is the number of nodes. While our counter variable times is not equal to the
thereshold t, then we do the following steps: For every node in prev_rt, we do the calculation of their R_t
based on the previous formula saving it in rt. To do this we need to use 2 for loops:
1. The aim of the first for loop inside while is to compute the formula of the pa
gerank. We start with result = 0, adding the new values every call of the loop.
For every node n we enter in the list of their in-edges (inlinks[i]). The numerat
or is the value of i in the previous iteration, whereas the numerator will be th
e out-degree of i, appointed as len(outlinks[i]). The result will be the ratio be
tween the variables numerator and denominator.
We assign this value to rt[n]
2. We do a for loop in the new rt, now for each key’s value we compute the rest o
f the pagerank’s formula.
For the next iteration, rt becomes the prev_rt (previous R_t )
When the while loop ends, it will return rt. As we can see in #2 task, our function’s output coincides with the
networkx’s output.
In [ ]: while times!=t:
for n in prev_rt:
result = 0
for i in inlinks[n]:
numerator = prev_rt[i]
denominator = len(outlinks[i])
if denominator == 0:
result = result + 0
else:
result = result + ( numerator / denominator )
rt[n]=result
for n in rt:
rt[n]= alpha/len(g) + (1-alpha)*rt[n]
times = times + 1
prev_rt= rt.copy()
return rt
In [ ]: print (PageRank(s_graph,100, alpha=0.15))

print ('---')
print(nx.pagerank(s_graph,alpha=0.85,max_iter=100))
8 of 9 10/27/2022, 4:32 PM
We still need to find a way to compute properly the pagerank with a non-given t as input. So we need to check
whenever the prev_rt is equal to rt even with all the computations of the iteration. In this case, we say that the
pageranks converges. So we need to change the while’s condition, and it could not be prev_rt!=rt since it will
get issues with the command prev_rt = rt.copy() After this, we can do the rest of the tasks.
9 of 9 10/27/2022, 4:32 PM

Facebook Network Centrality Analysis

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Facebook Network Centrality Analysis

Uploaded by

Copyright:

Available Formats

SNA Report1 file:///C:/Users/xvenel/Downloads/SNA%20Report1(1).

In [ ]: import random # random generator

In [5]: average_degree = n_neighborhoods/n_nodes

NameError: name 'n_neighborhoods' is not defined

This will be essential to creating an own transivity’s function.

In [6]: def transivity_of(graph):

NameError: name 's_graph' is not defined

Our function will return (3*n_triangles)/series_ki .

In [ ]: #for p in nx.all_shortest_paths(g, source=i, target=j):

rt = {n:(1/(len(nx.nodes(g)))) for n in g} times= 0 prev_rt= rt.copy()

In [ ]: print (PageRank(s_graph,100, alpha=0.15))

You might also like