Research Paper
Social media networks share large amounts of information among their users and their friends, who together form a large community network. Social networks also play a crucial role in connecting different types of users with their friends and families, developing relationships among them, and ensuring that people stay connected to each other. Recommendation systems therefore play a key role on social networks: they connect the right users to each other and help more and more people form connections. Present-day social media networks suggest friends based on the respective user's network, but this may not be ideal, because a suggestion drawn only from the user's existing network does not reflect the user's real-life circle.
In this research, we propose a Facebook Friend Recommendation System built on the XGBoost algorithm. The system recommends friends to users based on the social graph of the user's network, considering different types of connection parameters and similarity measures. We compute similarity measures such as the Jaccard distance and the Otsuka-Ochiai coefficient, which filter information from the graph and are used to calculate a rating for each user. The extracted features used in building the model are the Adamic-Adar index, HITS score, Katz centrality, and PageRank. The similarity measure compares the rating of each user with the ratings of other users; the recommendation system then suggests a list of friends with similar ratings. We have implemented this model on ordinary computers and laptops with the same dataset, and it gives a very high accuracy result (100%) using the XGBoost algorithm, which is best known for its speed and performance.
1 Introduction
Social networks are an important source of abundant information for a large community of people, building communities where people can share their activities, interests, thoughts, and knowledge. Social media sites are used by millions of users and have become essential for sharing content all around the world. These sites enjoy enormous success thanks to their huge numbers of users and the volume of content shared, which makes their communities grow worldwide. To keep users connected, it is important for social media sites to build a recommendation system accurate and powerful enough that every user can be connected to the right people. Social media sites use recommendation systems to recommend friends or likely acquaintances to their users.
Earlier social media recommendation systems used machine learning methods such as natural language processing and data mining.
These systems surfaced information the user would enjoy through information-filtering techniques such as content filtering.
Over the last five years, however, personalization-based recommendation systems have come into the spotlight, and social media sites now use personalized recommendation systems, which give recommendations more precisely and accurately than the earlier computational methods.
This friend recommendation system has been tested with two algorithms, Random Forest and XGBoost; we prefer XGBoost due to its faster speed and higher accuracy.
In this research, we build an XGBoost-based friend recommendation system that recommends friends based on a person's followers and followees. We use the Facebook Recruiting dataset of 9.43M edges, each consisting of a source node and a destination node; 80% of the dataset is used for training the model and the remaining 20% for testing it. We compute several similarity measures, such as the Jaccard distance, and extract features for each user, ranking users with PageRank. The classification algorithm used in this model is XGBoost, a gradient-boosted decision tree algorithm well known for its speed and performance. A confusion matrix is generated to describe the performance of the classification algorithm in the model.
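As an illustration of the neighborhood-overlap measures mentioned above, the Jaccard and Otsuka-Ochiai similarities can be computed directly from two users' follower sets; the follower sets below are a made-up toy example, not drawn from the dataset:

```python
import math

def jaccard(a, b):
    """Jaccard similarity of two neighbor sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def otsuka_ochiai(a, b):
    """Otsuka-Ochiai coefficient: |A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Toy follower sets for two users (hypothetical data).
followers_u = {1, 2, 3, 4}
followers_v = {3, 4, 5}

print(jaccard(followers_u, followers_v))        # 2 / 5 = 0.4
print(otsuka_ochiai(followers_u, followers_v))  # 2 / sqrt(12) ≈ 0.577
```

The Jaccard distance used as a feature is simply one minus this Jaccard similarity.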
Model quality is evaluated using the accuracy, precision, recall, and F1 scores.
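As a small illustration with made-up confusion-matrix counts (not the paper's results), precision, recall, F1, and accuracy follow directly from the four cell counts:

```python
# Hypothetical confusion-matrix counts for a link-prediction classifier.
tp, fp, fn, tn = 90, 10, 5, 95

precision = tp / (tp + fp)                          # fraction of predicted links that are real
recall = tp / (tp + fn)                             # fraction of real links that were predicted
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / (tp + fp + fn + tn)          # fraction of all predictions that are correct

print(round(precision, 3), round(recall, 3), round(f1, 3), round(accuracy, 3))
# 0.9 0.947 0.923 0.925
```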
2 Previous Research
3 Methodology
The dataset for this recommendation system is taken from Facebook's Recruiting
challenge on the Kaggle platform. The dataset contains two columns: the source
node and the destination node.
The complete data is not taken into consideration; the model is built on a
sample of 100,000 rows.
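A minimal sketch of how such an edge list might be sampled to 100,000 rows and split 80/20, using only the standard library; the file path and header names are assumptions about the Kaggle CSV, not specified in the text:

```python
import csv
import random

def load_edges(path, n_rows=100_000, seed=42):
    """Read up to n_rows (source, destination) pairs from an edge-list CSV,
    shuffle them, and return an 80/20 train/test split."""
    edges = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for i, (src, dst) in enumerate(reader):
            if i >= n_rows:
                break
            edges.append((int(src), int(dst)))
    random.Random(seed).shuffle(edges)
    split = int(0.8 * len(edges))  # 80% train / 20% test, as in the paper
    return edges[:split], edges[split:]
```

The returned edge pairs would then be fed into the feature-extraction stage described below.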
The classical recommendation approach used by social media giants such as
Facebook, Twitter, and Instagram is the FoF (Friends of Friends) algorithm.
This algorithm recommends friends to a user by considering the user's
individual social network: if A is a friend of B, it recommends friends from
A's social network to B, on the assumption that B may come to know them in the
near future. This approach may not be accurate and can point in the wrong
direction, because it does not take real-life situations into account; the
recommended friend may not actually be known to the user.
To address this problem, we add an Influence algorithm on top of the FoF
algorithm, which makes the recommendation system more accurate than FoF alone.
The algorithm is built from methods such as influence_map, which takes the
social graph and a user as parameters and returns a map of users who have at
least one common friend with the input user but are not already friends of the
input user. It also contains influence_graph, which sorts the keys of that map
in descending order of their values. The Influence algorithm takes into
account how much a user's social network influences the person, for example
through their interests and the content they follow. This allows the Influence
algorithm to make more accurate recommendations than the FoF algorithm, which
our results also confirm.
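A minimal sketch of the two helpers described above, assuming the social graph is an adjacency map from each user to the set of their friends (the function names follow the text; the graph data is hypothetical):

```python
def influence_map(graph, user):
    """Map each non-friend candidate to the number of friends shared with `user`."""
    friends = graph[user]
    counts = {}
    for friend in friends:
        for candidate in graph[friend]:
            if candidate != user and candidate not in friends:
                counts[candidate] = counts.get(candidate, 0) + 1
    return counts

def influence_graph(graph, user):
    """Rank candidates by descending number of common friends."""
    counts = influence_map(graph, user)
    return sorted(counts, key=counts.get, reverse=True)

# Toy undirected friendship graph (hypothetical).
g = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D", "E"},
    "D": {"B", "C"},
    "E": {"C"},
}
print(influence_graph(g, "A"))  # "D" shares two friends with A, "E" shares one
```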
The problem is that the Influence-algorithm-based recommendation system works
accurately only for smaller datasets, up to about 100,000 (1 lakh) people.
Such a system may not work for heavy social media giants like Facebook, which
hold billions of records. To overcome this problem, we have developed an
XGBoost-based Friend Recommendation System using graph mining, which can make
accurate decisions at scale. The resulting system, powered by XGBoost, which
is widely known for its speed and performance, achieved a training score of
100%, making the system accurate even with billions of records.
The XGBoost algorithm is among the most popular machine learning algorithms,
whether for regression or classification tasks. XGBoost stands for Extreme
Gradient Boosting, and it introduces regularization parameters to reduce
overfitting. Gradient-boosted trees use regression trees as weak learners in a
sequential learning process; these regression trees are similar to the
decision trees used for making decisions. The XGBoost algorithm assigns a
continuous score to every leaf once the last node in a tree has finished
growing; the leaf scores across the gradient trees are summed to produce the
final prediction once the ensemble is fully grown. At each iteration i, a tree
t is grown, leaf scores w are calculated, and a prediction y is produced. The
goal of the learning process is to minimize the overall objective, which
combines the loss of the model up to iteration i-1 with the new structure of
tree t; each new tree is chosen to reduce this overall score. Thus, XGBoost
grows regression trees sequentially, learning from previous iterations by
minimizing the overall score.
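The sequential residual-fitting idea described above can be sketched in a few lines. This is a bare-bones illustration of gradient boosting with squared-error loss on toy one-dimensional data, without XGBoost's regularization or second-order terms:

```python
def fit_stump(x, y):
    """Fit a depth-1 regression tree (one split) minimizing squared error."""
    best = None
    for threshold in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - (lm if xi <= threshold else rm)) ** 2
                  for xi, yi in zip(x, y))
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, n_trees=10, lr=0.5):
    """Sequentially fit stumps to residuals; leaf scores are summed, as in the text."""
    base = sum(y) / len(y)
    trees = []
    for _ in range(n_trees):
        pred = [base + lr * sum(t(xi) for t in trees) for xi in x]
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        trees.append(fit_stump(x, residuals))
    return lambda xi: base + lr * sum(t(xi) for t in trees)

x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 1, 5, 5, 5]
model = boost(x, y)
print([round(model(xi), 2) for xi in x])  # approaches [1, 1, 1, 5, 5, 5]
```

Each stump corrects the residual error left by the trees before it, which is exactly the "learning from previous iterations" behavior described above.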
XGBoost also applies gradient descent to the gradient trees, calculating the
optimal value for each leaf node as well as the optimal overall score of tree
t that minimizes the objective. This score is also called the impurity of the
tree's predictions. The loss function used in the algorithm contains a
regularization (penalty) term whose purpose is to reduce the complexity of the
regression tree functions. This parameter can be tuned to any value greater
than or equal to zero, with larger values imposing a stronger penalty on model
complexity. If the regularization term is set to zero, there is no difference
between the predictions of plain gradient-boosted trees and XGBoost, which
defeats the purpose of using XGBoost.
The XGBoost algorithm also accepts parameters such as the learning rate, which
is the shrinkage applied at every boosting step. Another parameter accepted by
XGBoost is column subsampling, which selects a random subset of features for
each gradient-boosted tree and further reduces overfitting. To handle
overfitting, XGBoost thus adds several methods beyond the regularization found
in ordinary gradient-descent models.
XGBoost has been widely adopted in recent years because of features such as
sequential tree growing, loss minimization through gradient descent, parallel
processing for speed, and parameters such as the regularization term and
learning rate. XGBoost is faster than other tree-ensembling algorithms, and
its regularization parameter reduces variance while the loss function is
minimized through gradient descent. Additionally, XGBoost exposes many
hyperparameters: subsample, which bootstraps the training data by taking only
a fraction of it into consideration per tree; maximum tree depth, the depth up
to which the nodes of a tree may be grown; minimum child weight, the weight
required in a child node for splitting; and the number of estimators, i.e. the
number of trees to use.
The parameters of the model after applying the XGBoost algorithm, with its
hyperparameters, regularization parameters, and the other specified
parameters, are as follows: