Published by: Damitha Premadasa on Jan 12, 2012
Copyright:Attribution Non-commercial







Link Prediction in Social Networks by Social Distance

H.D.D.D. Premadasa
Department of Computer Science and Engineering, Faculty of Engineering, University of Moratuwa, Sri Lanka damitha.premadasa@uom.lk

In the last decade, social networks have been studied extensively in the context of analyzing relationships and interactions between people and determining the interesting structural properties of the network. These studies focus mainly on online social network applications such as Facebook, Twitter, LinkedIn and MySpace. The social network data available on the internet, together with Application Programming Interfaces (APIs), enables research on social networks. Identifying people who have a higher probability of being connected and introducing them to each other makes the network more tightly connected and expands it further. This research focuses on identifying the social distance and communities of members in a social network by taking structural variables and social variables into account.

In general, a social network is defined as a network of interactions or relationships, where the nodes consist of actors and the edges consist of the relationships or interactions between these actors. Social network relationships are not limited to online social networks such as Facebook and LinkedIn; interactions may take any conventional or non-conventional form, such as face-to-face, telecommunication, email or postal mail interactions. In research and analysis, however, online social networks are used to model social networks because of the high volume of social data they make available. The associations are usually driven by mutual interests and attributes. Understanding the dynamics that drive the evolution of a social network is a complex problem because of the large number of variable parameters. A comparatively easier problem is to understand the association between two specific nodes. Several variations of this problem make interesting research topics: what are the factors that drive the associations, and how is the association between two nodes affected by other nodes and attributes? The specific problem instance addressed in this research is to predict the likelihood of a future association between two nodes, knowing that there is no association between the nodes in the current state of the graph. This problem is commonly known as the link prediction problem, which is the focus of this research.

Although early research on social networks is well established, computer scientists have recently made numerous efforts in the area. Most of the work has concentrated on analyzing existing social networks and characterizing them. The most similar research I found is "Predicting Tie Strength With Social Media", which derives a seven-dimension tie strength model for existing relationships. Few efforts have been made to solve the link prediction problem, especially in the social network domain. The closest matches to this work are the work by D. Liben-Nowell et al. [1], where the authors extracted several features from the network topology of a co-authorship network, and "Link Prediction using Supervised Learning" by M. Al Hasan et al. [2]. Their experiments evaluated the effectiveness of these features for the link prediction problem. This project intends to build a link prediction model by analyzing the structural and social variables of the nodes in a social network.
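To make the problem statement concrete, the following is a minimal Python sketch, not the method used in this research. It assumes the network is held as a dictionary of friend sets, lists every pair of nodes that has no association in the current graph, and scores each pair by its number of mutual friends, a simple topological feature. The function name `common_neighbor_scores` and the toy network are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(friends):
    """Score every currently unlinked pair by its number of mutual friends.

    `friends` maps each user id to the set of user ids it is linked to
    (an undirected network, so links appear in both directions).
    """
    scores = {}
    for u, v in combinations(sorted(friends), 2):
        if v in friends[u]:
            continue  # already linked, so not a link prediction candidate
        scores[(u, v)] = len(friends[u] & friends[v])
    return scores


# Hypothetical toy network used only for illustration.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
    "e": set(),
}

scores = common_neighbor_scores(friends)
# Rank candidate pairs: most mutual friends first, ties broken by pair name.
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
for pair, score in ranked:
    print(pair, score)
```

In this toy network the pair ("a", "d") shares two friends and is ranked first, while the isolated node "e" shares none with anyone. A supervised approach replaces this single score with a feature vector and a trained classifier.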


Multi-Attribute Relationship Networks

A multi-dimensional network has multiple types of interactions among its nodes. Each dimension of the network represents one type of connectivity between users. For instance, on Facebook a user can connect with others directly or indirectly through friend relationships, through common structural attributes such as a shared workplace or place of education, through common groups, and so on. The main attribute types of these networks can be identified as follows [3]:

1. Predictive Intensity Attributes
2. Duration Attributes
3. Reciprocal Services Attributes
4. Structural Attributes
5. Emotional Support Attributes
6. Social Distance Attributes

Tie Strength Model

The relationships between users in social networks have been studied and the Tie Strength Model [3] has been derived. The predictive model maps social media data to tie strength. The model builds on a dataset of Facebook ties and relationships and performs well, distinguishing between strong and weak ties with over 85% accuracy.

Figure 1

Methodology

In this section, the approach, technologies and techniques are discussed. The methodology can be divided into three main parts:

1. Data Extraction and Preprocessing
2. Build Link Prediction Model
3. Test Link Prediction Model

Figure 2

Facebook Data Extracting API

The new Graph API [4] of Facebook presents a simple, consistent view of the Facebook social graph, uniformly representing objects in the graph (e.g. people) and the connections between them (e.g. friend relationships). The old API holds functions that are needed for this research analysis. RestFB [5] is an open source, simple and flexible Facebook Graph API and Old REST API client implemented in Java. Hence RestFB is used to extract information from Facebook.

Data Extraction and Preprocessing

This research focuses on selected attributes of each member's profile. The Basic Information, Friends and Family, and Education and Work fields were chosen to represent the social behavior and structural behavior of a member. Mutual relationships are also considered in building the link prediction model. The attributes that were extracted are as follows.

Basic Information: User Id, Birth Year, Current City, Gender
Friends and Family: Mutual Friends
Education and Work: Employer, College / University

To extract quality data, a Java client program which uses RestFB was developed and used. A large amount of incomplete data was observed when the social data was extracted. For example, some profiles exist only with the user id while the other attributes are missing. Incomplete data was filtered out using the same program. The output of the program was two .csv files, which are the input to the WEKA [6] data mining software.

Build and Evaluate Link Prediction Model

A classification model for the link prediction problem needs to predict a link by successfully distinguishing the positive class from the rest of the dataset. Thus, the link prediction problem can be posed as a binary classification problem that can be solved by employing effective features in a supervised learning framework. To build the link prediction model, 500 users were selected at random. The model produced by each classification methodology is saved so that it can be re-evaluated with the test dataset.
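The preprocessing and dataset construction steps described above can be illustrated with a short sketch. The extraction program in this research was written in Java with RestFB; the Python below is only an illustration under stated assumptions. The dictionary keys, the helper names (`is_complete`, `pair_features`, `build_rows`, `to_csv`) and the particular aggregation choices (equality flags and a birth year gap) are hypothetical; only the attribute names and the mutual friends count come from the text.

```python
import csv
import io
from itertools import combinations

# Profile fields named in the text; the key names themselves are hypothetical.
REQUIRED = ("user_id", "birth_year", "current_city", "gender", "employer", "college")


def is_complete(profile):
    """True when every required attribute is present and non-empty."""
    return all(profile.get(field) not in (None, "") for field in REQUIRED)


def pair_features(p, q, friends):
    """Aggregate two single-user profiles into one row for a user pair."""
    p_friends = friends.get(p["user_id"], set())
    q_friends = friends.get(q["user_id"], set())
    return {
        "mutual_friends": len(p_friends & q_friends),
        "same_city": int(p["current_city"] == q["current_city"]),
        "same_employer": int(p["employer"] == q["employer"]),
        "same_college": int(p["college"] == q["college"]),
        "birth_year_gap": abs(p["birth_year"] - q["birth_year"]),
        "linked": int(q["user_id"] in p_friends),  # class label: 1 = link exists
    }


def build_rows(profiles, friends):
    """Drop incomplete profiles, then emit one labeled row per user pair."""
    kept = [p for p in profiles if is_complete(p)]
    return [pair_features(p, q, friends) for p, q in combinations(kept, 2)]


def to_csv(rows):
    """Serialize rows to CSV text, a format a data mining tool can load."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical toy data; user 4 has only an id and is filtered out.
profiles = [
    {"user_id": 1, "birth_year": 1985, "current_city": "Colombo",
     "gender": "m", "employer": "X", "college": "Moratuwa"},
    {"user_id": 2, "birth_year": 1987, "current_city": "Colombo",
     "gender": "f", "employer": "Y", "college": "Moratuwa"},
    {"user_id": 3, "birth_year": 1985, "current_city": "Kandy",
     "gender": "f", "employer": "X", "college": "Peradeniya"},
    {"user_id": 4},
]
friends = {1: {2, 5}, 2: {1, 5}, 3: {5}}

rows = build_rows(profiles, friends)
csv_text = to_csv(rows)
print(csv_text)
```

Each row describes a pair of users rather than a single user, which is what turns link prediction into a binary classification task: the `linked` column is the class and the remaining columns are the features.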

A separate test dataset is used to evaluate the classification models. During the testing phase, accuracy, precision-recall, F-value and squared-error metrics are considered.

Feature Set

Choosing an appropriate feature set is the most critical part of any machine learning algorithm. For link prediction, we should choose features that represent some form of proximity between the pair of vertices that represent a data point. In this research, even though the discussion is focused on the feature set for structural and social attribute analysis, the above concept provides a clear direction for choosing conceptually identical features in other network domains. One favorable property of these features is that they are very cheap to compute.

There also exist individual attributes that can provide helpful clues for link prediction. Since these attributes pertain to only one node in the social network, some aggregation functions need to be used to combine the attribute values of the corresponding nodes in a node pair. These are called aggregated features. For example, the total number of mutual friends between two members is an aggregated attribute, and it is an interesting feature in the process of link prediction.

Further, to understand the inconsistency in feature values, the distributions of positively and negatively labeled samples were analyzed for the selected features in the dataset. For most of the features, the distributions of the positive and negative classes differ significantly, thus helping the classification algorithm to pick up patterns in the feature values and classify the samples correctly.

Classification Algorithms

There are a large number of classification algorithms for supervised learning. Although their performances are comparable, some usually work better than others for a specific dataset or domain. In this research, the experiment was done with six different classification algorithms. For some of these, we tried more than one variation and reported the result that showed the best performance. The algorithms that we used are the C4.5 decision tree, Naive Bayes, Decision Table, Multilayer Perceptron, RBF Network and Bagging. For all the algorithms, WEKA [6], a well-known machine learning library, was used. We then compared the performance of the above classifiers using different performance metrics such as accuracy, precision-recall, F-value and squared error. 5-fold cross validation was used for the results reported.

Results and Discussions

The results obtained from this experiment are given below. Primarily, the result set is a performance analysis which contains accuracy, precision, F-measure and squared error.

[Table 1. Accuracy, Precision and F-measure Results. Columns: Classifier, Accuracy (%), Precision, F-Measure. Rows: C4.5 (J48), Naive Bayes, Decision Table, Multilayer Perceptron, RBF Network, Bagging.]

In the context of the Table 1 attributes, the C4.5 algorithm and Naïve Bayes dominate. The accuracy levels of Multilayer Perceptron and RBF Network lie within a narrow band of roughly 73% to 74%. Such a small difference is not statistically significant, so no conclusion can be drawn from the accuracy metric about the most suited algorithm for link prediction. All the classifiers other than Bagging have more than 73% accuracy, which indicates that popular classification algorithms yield quality results in the social network domain as well. This indicates that the features we selected have good discriminating ability.

F-value is the harmonic mean of precision and recall, and it is sometimes considered a better performance measure for a classification model than accuracy, especially if the populations of the classes are biased in the training dataset.
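The metrics named above can be written out in a few lines. The sketch below is a plain Python illustration, not WEKA's implementation: the function names and the toy labels are hypothetical, and the root mean squared error is computed here over the predicted probability of the positive class, which is a simplifying assumption about how the error is measured.

```python
import math


def classification_metrics(actual, predicted):
    """Accuracy, precision, recall and F-measure for binary labels (1 = link)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


def root_mean_squared_error(actual, probabilities):
    """RMSE between 0/1 labels and predicted positive-class probabilities."""
    squared = [(p - a) ** 2 for a, p in zip(actual, probabilities)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical labels: three true links among eight candidate pairs.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 1, 0, 0, 0]

metrics = classification_metrics(actual, predicted)
print(metrics)
print(root_mean_squared_error([1, 0], [0.5, 0.5]))
```

With these toy labels the classifier is right on five of eight pairs, yet only half of its predicted links are real. This is the situation in which the F-measure says more than accuracy, because the negative class dominates the sample.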

[Table 2. Error results. Columns: Classifier, Root mean squared error, Relative absolute error. Rows: C4.5 (J48), Naive Bayes, Decision Table, Multilayer Perceptron, RBF Network, Bagging.]

To compare the errors of the different algorithms, root mean squared error and relative absolute error measures were used. Recent research [7] shows that the squared-error metric is remarkably robust and has the highest average correlation to the other metrics, hence it is an excellent metric for comparing the performance of different classifiers. The C4.5 decision tree algorithm has the lowest relative absolute error of the algorithms used, whereas Naïve Bayes has the lowest root mean squared error. The values are not significantly different across the models for this dataset, but C4.5 has proven to be the most suitable algorithm for this dataset.

When the tree structure is analyzed, the following were discovered. The attribute with the highest information gain is the Mutual Friends count. The higher the number of mutual friends, the greater the possibility of being connected in the network. The other attributes, such as College, Work Place and Birth Year, are covered by the Mutual Friends attribute.

Conclusion

Link prediction in a social network is an important problem as it is very helpful in analyzing the possible growth of inter-relationships in social groups. Such understanding can lead to efficient implementation of tools to identify hidden characteristics, find missing members of groups and find missing relationships. This research shows that link prediction in a real-life social network can be solved with high accuracy by considering only a few features. It has been shown that most of the popular classification models can solve the problem with acceptable accuracy. In the results, C4.5 [9], an algorithm used to generate a decision tree, yields the best accuracy, precision, F-measure and error results. However, a root mean squared error of 0.3088 and a relative absolute error of 0.3899 imply that the model has a considerable amount of error even though it has the best error results among the algorithms.

This research also provides a comparison of the features, ranked according to their prediction ability using different feature analysis algorithms. We believe that these ranks are meaningful and can help other researchers choose attributes for the link prediction problem in a similar domain. The most significant feature is "Mutual Friendships", which is a topological feature of the social network. Moreover, the methodology, key feature extraction and approach can serve as guidance for solving social network problems in future.

Future Works

Link prediction was done using a comparatively small localized dataset, but a large localized dataset will yield a more accurate prediction model. The behavior of the link prediction model varies with the locality of the user. Therefore a localized dataset will yield a more accurate model for link prediction than a global link prediction model.

The feature selection for link prediction can be extended, and more attributes with higher information gain can be added, for example membership attributes in the Friends and Family section and favorite sports in the Sports section. Adding more topological features such as edge-disjoint k shortest distances makes the model stronger: the shorter the distance, the better the chance that the two nodes will connect. There are other similar measures such as Jaccard's coefficient [8].

Building concept hierarchies will help to produce more fine-grained results. For example, a user location contains country, current city and hometown, so a hierarchy can be defined in the context of the structural attributes of the user.

The data preprocessing step of this research focused only on incomplete data. Social networks contain a large amount of forged data. Identifying forged data patterns in social networks is also a challenging task which helps to extract high-quality data.

Acknowledgements

I wish to acknowledge the active support and advice given by Dr. Shehan Perera. His constant guidance and frequent reviews enabled the completion of this research project.

References

[1] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks", presented at JASIST, 2007, pp. 1019-1031.
[2] Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, Mohammed Zaki, "Link Prediction using Supervised Learning", Citeseer, New York 2006. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.1225
[3] Eric Gilbert and Karrie Karahalios, "Predicting Tie Strength With Social Media", CHI '09 Proceedings of the 27th international conference on Human factors in computing systems, 2009, pp. 211-220.
[4] Facebook, "Facebook Graph API", July 2011. https://developers.facebook.com/docs/reference/api/
[5] Mark Allen, "restfb", July 2011. http://restfb.com/
[6] I. Witten and E. Frank, "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, San Francisco, 2005.
[7] R. Caruana and A. Niculescu-Mizil, "Data Mining in Metric Space: An Empirical Analysis of Supervised Learning Performance Criteria", KDD 2004, 2004.
[8] D. Liben-Nowell and J. Kleinberg, "The Link Prediction Problem for Social Networks", LinkKDD, 2004.
[9] J. R. Quinlan, "C4.5: Programs for Machine Learning", Morgan Kaufmann, 1993.
