
2013 IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing

Friendship Prediction Based on the Fusion of Topology and Geographical Features in LBSN

Hui Luo, Bin Guo, Zhiwen Yu, Zhu Wang, Yun Feng


School of Computer Science, Northwestern Polytechnical University, P. R. China
Xi’an, China
brouselh@gmail.com

978-0-7695-5088-6/13 $31.00 $26.00 © 2013 IEEE
DOI 10.1109/HPCC.and.EUC.2013.319

Abstract: Friendship prediction in social networks is useful for various applications, such as friend/place recommendation and privacy management. In this paper, we propose a friendship prediction approach that fuses the topology and geographical features in location based social networks (LBSNs). We investigate the features of users' relationships both online and offline and quantify the contributions of the selected features through the information gain metric. Three key features are selected, namely user social topology, location category, and check-in location. Friendship is predicted based on the fusion of the selected online/offline features. Three inference models are used to infer friendship: Random Forests, Support Vector Machine (SVM), and Naive Bayes. The proposed approach is validated by intensive empirical evaluations using the collected Foursquare and Jiepang datasets.

Keywords: LBSNs; Friendship Prediction; Social Topology; Geographical Features; Online/Offline Interaction

I. INTRODUCTION

Location based Social Networks (LBSNs) allow users to find where their friends are, to search location-tagged content within their social graph, and to meet people nearby. They help users establish extensive and strong contact with the outside world. The recent surge of LBSNs makes it possible to collect large-scale data on users' updates, stirring interest in LBSN studies, which include social topology analysis [1,2], community mining [3-5], social privacy protection [6], friendship prediction [7-11], etc.

For friendship prediction, intuitively, the more similar two users are, the more likely they will be friends in the future. We argue that the formation of online friendship may reflect, to some degree, its real-world counterpart. In other words, people who have mutual interests, are geographically close, or belong to common social circles (friends of friends) are more likely to become friends [8]. The challenge is to decide which metrics can characterize these correlations. A viable way is to select features to quantify them. As a result, the key to friendship prediction is how to select appropriate features. Currently, the mainstream method is to quantify the similarity between users by introducing different user similarity measures, and user pairs with high similarity are recognized as friends. However, similarity between two users does not necessarily mean that they are friends. Moreover, users usually have many features or attributes, so a high similarity on the selected features may not indicate users' overall similarity. The problem with such an approach is that, in order to improve the accuracy of prediction, researchers have to build dedicated models for each selected feature set. Instead of measuring user similarity, in this paper we propose a classification-based approach that leverages a fusion of online/offline features of users for friendship prediction.

Recently, large-scale data from LBSNs has become available, containing information about the updates of users as well as users' friends. While Facebook and Foursquare are popular LBSN services worldwide, Jiepang (http://jiepang.com) and Dianping (http://www.dianping.com) are two representative LBSN services in China. We collect data from both Foursquare and Jiepang to predict friends of users in both datasets. It is believed that users' behavior could be linked to the behavior of their neighbors in the social topology network [8]. In this paper, we conduct an in-depth analysis of various characteristics of LBSNs, and select features from both the social topology and the check-in behavior of users. For check-in behavior, we analyze both the locations where users check in and the categories of these locations. Then, we propose a friendship prediction approach that fuses the topology and geographical features of LBSNs.

We measure the contribution of different features based on their information gain, and select three key features, namely user social topology, location category, and check-in location. We adopt a Bayes model to verify the performance of the selected features by reproducing the social topology network. Then, we propose a friendship prediction method based on the fusion of the selected online and offline features. Three classification models, Random Forests, Support Vector Machine (SVM), and Naive Bayes, are adopted to predict users' social ties. In summary, the main contributions of this paper are as follows:
• We investigate the valuable features of users' relationships from both the social topology network and geography.
• We measure the contribution of different features to users' friendship by using the information gain metric, and use a Bayes model to reproduce the social topology network to verify that the selected features characterize users' relationships well.
• We propose a friendship prediction method based on the fusion of the selected features. The experimental results on two real-world datasets show the effectiveness of the friendship prediction method. For Foursquare, the precision of the three classification models reaches up to 64.31% and the
recall reaches up to 66.68%. For the Jiepang dataset, the precision reaches up to 98.95% and the recall up to 72.73%.

The rest of the paper is organized as follows. In Section II, we discuss the related work. Section III presents the method of feature selection. Section IV describes the approach for friendship prediction, followed by the evaluation results in Section V. Finally, we conclude our work in Section VI.

II. RELATED WORK

Friendship prediction in social networks has attracted much research recently, and several approaches have been proposed. Schwartz and Wood proposed an "interest distance" metric to measure the similarity of two users' friend circles, and clustered the users according to similarity [9]. Gregory performed hierarchical clustering to discover overlapping communities in networks [12], and Grob et al. used this algorithm to make iterative friend recommendations from the same community [10]. These methods use only a single feature, such as social network structure or user interest.

Some researchers use multiple features in friendship prediction. Özseyhan et al. [13] used association rule mining to suggest potential friends of users based on the features of age, location, and income. Li and Chen [14] studied how users share their location in the real world and built a multi-layered friendship model by quantifying the correlation between users' friendship and their mobility characteristics, social graph properties, and user profiles [8]. Cranshaw et al. [15] proposed a model with a set of location-based features for friendship prediction by analyzing users' location trails. Sadilek et al. [11] used a regression decision tree to unify the text similarity coefficient and co-location to improve the performance of friendship prediction. All of these methods use a variety of features to measure the similarity of users rather than the users' relationship. They all assume that two users are friends if they are similar in terms of the selected features.

There are also recent studies that calculate user intimacy through offline interactions. For instance, Guo et al. [16] proposed a method to predict user friendship based on inter-user meeting frequency and duration in the physical world. Methods of link prediction in complex networks [17] have also been adopted to mine potential friends of users. Liben-Nowell et al. [18] analyzed the link prediction problem for social networks. Zhou et al. [19] used 10 types of similarity indexes based on local node information to make link predictions in six real networks. Leskovec et al. [20] studied online social networks in which relationships can be either positive (indicating relations such as friendship) or negative (indicating relations such as opposition or antagonism). These methods select features from the topology network to predict potential friends of users, while our work combines both social topology and geography features.

III. FEATURE SELECTION

As noted above, the key to friendship prediction is how to select appropriate features. We assume the social network of an LBSN is a complete graph, where the nodes denote users and the edges represent user relationships. Specifically, a solid line means the linked users are friends, while a dotted line means they are not. Let G_s(U_s, E_s) be an unweighted, undirected graph representing the social topology network, where each node u ∈ U_s denotes a user and each edge e_{u1,u2} ∈ E_s denotes a relation between a pair of users (u1, u2). Information gain is usually a good measure for deciding the relevance of an attribute [21], so we use it to select useful features, as it can evaluate the contribution of the selected features to users' relationship. For a target feature X and a selected feature Y, the information gain IG(X, Y) equals the information entropy of X minus the conditional entropy [22] obtained by learning the state of the feature Y. It is defined as follows:

\[ IG(X, Y) = H(X) - H(X \mid Y) \qquad (1) \]

Assume X has n outcomes {x_1, ..., x_n} and p(x_i) is the probability mass function of outcome x_i. The information entropy H(X) is defined as Eq. (2).

\[ H(X) = \sum_{i=1}^{n} -p(x_i) \log p(x_i) \qquad (2) \]

Assume p(y_j) is the probability mass function of outcome y_j and p(x_i, y_j) is the probability that X = x_i and Y = y_j. The conditional entropy H(X|Y) is defined as Eq. (3).

\[ H(X \mid Y) = \sum_{i,j} p(x_i, y_j) \log \frac{p(y_j)}{p(x_i, y_j)} \qquad (3) \]

The Foursquare data collection started on October 24th, 2011 and lasted for 8 weeks. It contains more than 12 million check-ins performed by 720,000 anonymous users over 3 million venues. We resort to the Twitter streaming API [23] to get the publicly shared check-ins in this work because the Foursquare API [24] provides limited authorized access for retrieving check-in information. We also crawled metadata related to users and venues, including every user's Twitter profile and every venue's Foursquare profile. The Jiepang data was collected over two months, from April 2012 to June 2012. We used the open API [25] provided by Jiepang and collected more than 2 million check-ins performed by 89,936 anonymous users, drawn from those who made at least one check-in on Jiepang between their registration time and our collection time. Both datasets include the user ID and his/her friends list, check-in time, geographic coordinates (latitude/longitude), and the category of the location where people check in.

TABLE I. THE INFORMATION GAIN OF EACH FEATURE

Features                            Information gain
user social topology                0.0055
location category                   0.0021
check-in location                   0.0012
pairs of location people check in   0.0004
number of check in                  0.0002

We selected 2,731 users who have updates in Foursquare to calculate the information gain of the selected features. The number of friend pairs is 5,590 and the number of non-friend pairs is 3,722,225. So the information entropy of users' relationship
is 0.0162. In Table I, we calculate the information gain of five features from both social topology and geography. We finally select the top three features for friendship prediction, i.e., user social topology, location category, and check-in location.

A. User social topology

Whether two people are friends has a strong relationship with their social topology network [26]. In the social topology network, users who currently have common neighbors are more likely to be friends in the future than those who have none. If users are friends now but have no or few common friends, their relationship can be very weak. In order to measure this situation we introduce a concept called social distance a_s(i, j), which represents the shortest distance between u_i and u_j in the topology, as formulated in Eq. (4). If user i and user j are friends, then the distance between i and j is 1. Otherwise, the distance is infinite. We remove the friendship between user i and user j (if they are friends) before calculating their distance, and then calculate their shortest distance in the user social topology.

\[ a_s(i, j) = \operatorname{shortest\_dis}(u_i, u_j) \ \text{in} \ G_s'(U_s, E_s - e_{ij}) \qquad (4) \]

B. Location category

The categories of the locations where people check in reflect their behaviors or preferences to some extent. Encounters at different locations have different meanings. Generally, people who meet frequently in private locations are more likely to become friends than those who often meet in public places. In order to quantify the relationship between user preference and the semantics of check-in locations, we define the categories of locations where u_i checks in as (t_{i1}, t_{i2}, ..., t_{iT}), the number of check-ins of each location category as (c_{i1}, c_{i2}, ..., c_{iT}), and the total number of check-ins of u_i as C_i. Assuming that the total number of users is L, p(k) denotes the probability that user k checks in at location t_{ik}. We introduce the concept of location information entropy, which is defined as

\[ E(t_i) = \sum_{k=1}^{L} -p_i(k) \log p_i(k) \qquad (5) \]

As we can observe from Eq. (5), if many users check in at location l_1, and each user checks in at l_1 many times, l_1 may be a public place and the value of its location information entropy is larger. Users who have common check-in locations are more likely to become friends in the future. Take the home of user A for example: A has the largest number of check-ins at A's home, so if user B sometimes checks in at this place, B is more likely to be A's friend. If both A and B have check-ins in public places, these are more likely to happen by chance. We ignored those locations whose location information entropy is greater than 5. For each pair of users i and j, we use Eq. (6) to quantify the location category property a_t.

\[ a_t(i, j) = \sum_{m=1}^{M} \sum_{n=1}^{N} \frac{c_{im} + c_{jn}}{C_i + C_j}, \quad t_{im} = t_{jn} \ \text{and} \ E(t_{im}) < 5 \qquad (6) \]

C. Check-in location

Users who have more common check-ins are more likely to become friends than those who have none, especially when the number of common check-ins takes up a large proportion of the users' total check-ins. Assume the check-in locations of user u_i are (l_{i1}, l_{i2}, ..., l_{iH}), and Dist(l_{im}, l_{jn}) denotes the distance between the check-in location l_{im} of user i and the check-in location l_{jn} of user j. We assume that two check-ins are at the same place if the distance between them is less than 0.3 km, so we ignore distances greater than 0.3 km. For each pair of user i and user j, we use Eq. (7) to quantify this property.

\[ a_l(i, j) = \sum_{m=1}^{M} \sum_{n=1}^{N} \frac{c_{im} + c_{jn}}{C_i + C_j}, \quad Dist(l_{im}, l_{jn}) < 0.3 \qquad (7) \]

IV. FRIENDSHIP PREDICTION

As shown in Fig. 1, the selected users form a social topology network. Each user has some check-ins and each check-in has its category. We quantify the relationship of users from the three different layers and obtain three features for each pair of users. Of the three selected features, user social topology is an online feature while check-in location and location category are offline features. Then, we build a classification model to fuse the selected features. Finally, we use the model for friendship inference and divide the edges in the social topology network into friendship edges and non-friendship edges.

Figure 1. The approach for friendship prediction

In order to show that the usefulness of the selected features does not depend on a particular classification algorithm, we adopt three classification models for friendship inference, namely Random Forests [27], SVM [28], and Naive Bayes [29]. We use Weka [30] to implement Random Forests and Naive Bayes, and use LibSVM [31] for the SVM implementation.

V. EXPERIMENTAL RESULTS

As we believe that the model of classification is effective only when the selected features characterize users'
relationship well, we use a Bayes model to reproduce the social topology network to verify the performance of the selected features and use the ROC curve to evaluate the ranking performance. We also compare the modeling performance with the Multi-Layered Friendship Model [8] and analyze the contribution of the online and offline features in our model.

A. Feature Exploration

We assume |E_s| denotes the total number of friend pairs, |AE_s| denotes the total number of user pairs, a_v denotes a value of a feature, P_f(a_v) denotes the probability of value a_v for friend pairs, and P_nf(a_v) denotes the probability of value a_v for non-friend pairs. Friend pairs are very sparse in the user social topology network, that is, |E_s| << |AE_s|, so we have P(a_v) ≈ P_nf(a_v). By Bayes' theorem, the probability that a pair of users are friends is calculated as in Eq. (8).

\[ P(u_1 u_2 \mid a_v) \approx \frac{P(a_v \mid u_1 u_2) \times P(u_1 u_2)}{P_{nf}(a_v)} = \frac{P_f(a_v)}{P_{nf}(a_v)} \times \frac{|E_s|}{|AE_s|} \qquad (8) \]

Let F(a_v) = P_f(a_v) / P_nf(a_v). We calculate the probability for the three selected features, assuming the selected features are independent of each other and ignoring the terms O(P^2) and O(P^3). Then we have

\[ P(u_1 u_2 \mid a_s a_t a_l) = P_1 + P_2 + P_3 = (F_1 + F_2 + F_3) \times \frac{|E_s|}{|AE_s|} \qquad (9) \]

[Figure 2: ROC curves (x-axis: false positives; y-axis: true positives) for user social topology, location category, check-in location, and three feature fusion. AUC values printed in the plot: 0.9401, 0.9291, 0.7051, 0.6654.]
Figure 2. The ROC curves of Multi-Layered Friendship Model ranking

We selected 2,731 users who have updates in Foursquare to build a Bayes model according to Eq. (9). We rank users' friendship by comparing the sum of the F values of the three features. As shown in Fig. 2, the area under the ROC curve is 0.9401, which means the model reproduces the user social topology network very well. This analysis indicates that the three selected features characterize the relationship of users in the social topology network effectively.

B. Friendship Prediction

Having shown that the selected features characterize users' relationship well, we aim to make friendship predictions with higher precision and recall.

We use check-in data from two real LBSNs, namely Foursquare and Jiepang, to validate the accuracy of mining potential friendship. Friendship in the user social topology network is very sparse. Take the Paris check-in dataset for example: the number of friend pairs is 5,590 and the number of non-friend pairs is 3,722,225. The degree of imbalance of the data [32] is larger than 1:700. Weiss and Provost [33] studied the distribution of the training dataset and the performance of classification, and showed that a relatively balanced class distribution of the training dataset gives better performance. Thus we choose the ratio of friendship to non-friendship in the training data as 1:3. We select users by city. Taking Beijing for example, for the Beijing dataset we select those users who have at least one check-in in Beijing and take all check-in data of the selected users.

1) Mining Potential Friendship

Mining potential friendship in the social topology network is an important measure of friendship prediction. We randomly delete some friendships in the social topology and build a model for friendship prediction with a classification algorithm. We use the model to divide the edges of the social topology network into friendship edges and non-friendship edges.

We select the Foursquare check-in data in Paris and the Jiepang data in Shanghai for this experiment. In each dataset, 5% and 10% of the friendship edges were deleted randomly. We use the three classification models to learn and infer friendships. Using the re-sampling method [32], we randomly selected training data to build the model and made friendship predictions fifty times.

The results of mining potential friendship are shown in Fig. 3. All the inference models perform well even with 10% of friendships removed from the social topology network. Compared with the Jiepang data, the model built on the Foursquare data has lower precision and higher recall, which means that it mines more potential friendships but with lower credibility. Also, there is not much difference between deleting 5% and 10% of friendships. This means that the geographical features make up for the lack of topology features to some extent. Generally, the classification model built with the selected features is effective in friendship prediction on both datasets.

2) Two-fold Cross-Validation

We used two-fold cross-validation to verify the reliability of the friendship prediction model. In the Foursquare dataset, we select 2,731 users who have updates in Paris and collect all of their updates as D1, and select 5,665 users who have updates in London and collect their updates as D2. We first train on D1 and test on D2, and then train on D2 and test on D1. In the Jiepang dataset, we select 3,656 users who have updates in Beijing and collect all of their updates as D3, and select 5,275 users who have updates in Shanghai and collect their updates as D4. Similarly, we train on D3 and test on D4, and vice versa.

As friendship in the user social topology network is relatively sparse, the number of non-friend pairs selected was three times the number of friend pairs, to reduce the effect of imbalanced data on the classification result [33].
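The 1:3 sampling and the precision, recall, and F-measure reported in this section can be illustrated with a short sketch. This is not the authors' Weka/LibSVM pipeline: the function names `undersample` and `precision_recall_f1`, the fixed random seed, and the toy pairs are assumptions introduced only for illustration.

```python
import random

def undersample(pairs, labels, ratio=3, seed=0):
    """Keep all friend pairs (label 1) and draw at most `ratio` times as many
    non-friend pairs (label 0) uniformly at random, without replacement."""
    rng = random.Random(seed)
    pos = [p for p, y in zip(pairs, labels) if y == 1]
    neg = [p for p, y in zip(pairs, labels) if y == 0]
    k = min(len(neg), ratio * len(pos))
    data = [(p, 1) for p in pos] + [(p, 0) for p in rng.sample(neg, k)]
    rng.shuffle(data)
    return data

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F-measure for the friend class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy data (made-up numbers): 4 friend pairs among 100 candidate pairs.
pairs = [(i, i + 1) for i in range(100)]
labels = [1 if i < 4 else 0 for i in range(100)]
train = undersample(pairs, labels, ratio=3)
```

In a cross-city run, a training set sampled this way from one city (for example D1) would be fed to the classifier, and the metrics would be computed on the predictions for the other city (D2).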

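The pair counts quoted for the Paris dataset can be checked with a few lines of arithmetic, and the same counts give the prior |E_s| / |AE_s| used in Eq. (9). The sketch below assumes base-2 logarithms (the paper does not state the base); the helper names `binary_entropy` and `fused_score` are hypothetical.

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of a binary variable with P(friend) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def fused_score(f_values, n_friend_pairs, n_all_pairs):
    """Eq. (9): (F1 + F2 + F3) * |E_s| / |AE_s| for one user pair."""
    return sum(f_values) * n_friend_pairs / n_all_pairs

n_users = 2731
n_friend = 5590
n_nonfriend = 3722225
n_all = n_friend + n_nonfriend        # all unordered pairs of the 2,731 users

prior = n_friend / n_all              # |E_s| / |AE_s|
entropy = binary_entropy(prior)       # about 0.0162 bits
imbalance = n_nonfriend / n_friend    # about 666 non-friend pairs per friend pair
```

With base-2 logarithms the entropy agrees with the reported value of 0.0162, and the two pair counts add up to exactly the number of unordered pairs among 2,731 users.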
[Figure 3: bar charts of precision, recall, and F-measure for RandomForest, SVM, and NaiveBayes. Panels: (a) Paris delete 5% friendship; (b) Paris delete 10% friendship; (c) Shanghai delete 5% friendship; (d) Shanghai delete 10% friendship.]
Figure 3. Mining potential friendship

[Figure 4: bar charts of precision, recall, and F-measure for Random Forest, SVM, and NaiveBayes. Panels: (a) Paris train/London test; (b) London train/Paris test; (c) Beijing train/Shanghai test; (d) Shanghai train/Beijing test.]
Figure 4. Cross-validation of check-in data

The results of two-fold cross-validation are shown in Fig. 4. For the Foursquare dataset, SVM performs the best. The result of Naive Bayes is also acceptable. For the Jiepang dataset, the experimental results show that all three algorithms have high precision, recall, and accuracy of friendship prediction. As the two sets of cross-validation results show, although there is some difference between the three classification algorithms, we have achieved the anticipated goal. The results indicate that our model can be trained using the data of one city and used to test other cities.

C. Comparison with Other Model

We compare the performance of our approach with the Multi-Layered Friendship Model [8]. We use the same features to build the Multi-Layered Friendship Model [8] to make friendship predictions on the Foursquare dataset. As we can see from Fig. 5, although the model achieves good accuracy, the precision is less than 0.1 when the value of F is low; when F is large, the recall is close to zero and the precision is not greater than 0.4. This means that the model is prone to predict that users are not friends. Identifying someone who will become a friend of a user is clearly more interesting than finding someone who will not. We conclude that our method performs better than the Multi-Layered Friendship Model [8] on friendship prediction.

D. Effect of Different Feature Combinations

We also analyze the contribution of the online and offline features by using different combinations of features to make friendship predictions with our method. As we can see from
Fig. 6, our model performs well when using the online feature (social topology), which means that online features make a greater contribution than offline features in our method. This also shows that the online behaviors of users reflect their preferences better. This phenomenon could explain why the model performs better on Jiepang. In China's traditional culture, a friend of a friend is a friend, while in western countries this is not necessarily the case.

[Figure 5: line plot over F value (x-axis, 0 to 200) of the recall, precision, and F-measure of the Multi-model and of our model.]
Figure 5. The performance of Multi-Layered Friendship Model

[Figure 6: bar chart of precision, recall, and F-measure for the feature combinations a, b, c, a+b, a+c, b+c, and a+b+c.]
Figure 6. Performance of different combinations of features. a denotes social topology, b denotes location category, and c denotes check-in location; "a+b" means the combination of a and b, and "a+b+c" means the combination of a, b and c.

VI. CONCLUSION

In this paper, we propose an approach for friendship prediction based on the fusion of network topology and geographical features. We select useful features through the information gain metric. We build a classification model to fuse the online and offline features, and experimental results on two real LBSN datasets show the effectiveness of the proposed approach. For future work, we will focus on building a friendship recommendation model with incremental updates to achieve scalability. We will also investigate the usage of our friend prediction algorithm for social activity organization [34] and opportunistic social networking [35]. The former can suggest potential participants to be invited by activity initiators, whereas the latter can detect nearby like-minded people and cue potential interactions among them.

ACKNOWLEDGEMENTS

This work was partially supported by the National Basic Research Program of China (No. 2012CB316400), the National Natural Science Foundation of China (No. 61222209, 61103063, and 61373119), the Program for New Century Excellent Talents in University (No. NCET-12-0466), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20126102110043), the Natural Science Basic Research Plan in Shanxi Province of China (No. 2012JQ8028) and the Basic Research Foundation of NPU (No. JC20110267).

REFERENCES

[1] J. Leskovec, K. J. Lang, A. Dasgupta and M. W. Mahoney, "Statistical properties of community structure in large social and information networks," In Proceedings of the 17th international conference on World Wide Web, 2008, pp. 695-704.
[2] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel and B. Bhattacharjee, "Measurement and analysis of online social networks," In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, 2007, pp. 29-42.
[3] K. Wakita and T. Tsurumi, "Finding community structure in Mega-scale social networks," In Proceedings of the 16th international conference on World Wide Web, 2007, pp. 1275-1276.
[4] H. Kwak, Y. Choi, Y. H. Eom, H. Jeong and M. Sue, "Mining communities in networks: A solution for consistency and its evaluation," In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, 2009, pp. 301-314.
[5] B. Dou, S. Li and S. Zhang, "Social Network Analysis Based on Structure," Chinese Journal of Computers, pp. 741-753, April 2012.
[6] R. Tan, J. Gu, J. Yang, X. Lin, P. Chen and Z. Qiao, "Designs of privacy protection in location-aware mobile social networking applications," Journal of Software, 2010, pp. 298-309.
[7] E. Cho, S. A. Myers and J. Leskovec, "Friendship and mobility: User movement in location-based social networks," In KDD 2011, pp. 1082-1090.
[8] N. Li and G. Chen, "Multi-Layered Friendship Modeling for Location-Based Mobile Social Networks," Int'l. Conf. Mobile and Ubiquitous Systems: Computing, Networking and Services, Toronto, Canada, pp. 1-10, July 2009.
[9] M. F. Schwartz and D. M. Wood, "Discovering shared interests using graph analysis," Communications of the ACM, pp. 78-89, August 1993.
[10] R. Grob, M. Kuhn, R. Wattenhofer and M. Wirz, "Cluestr: Mobile social networking for enhanced group communication," In Proceedings of the ACM 2009 International Conference on Supporting Group Work, pp. 81-90, May 2009.
[11] A. Sadilek, H. Kautz and J. P. Bigham, "Finding Your Friends and Following Them to Where You Are," In Fifth ACM International Conference on Web Search and Data Mining, 2012, pp. 723-732.
[12] S. Gregory, "An algorithm to find overlapping community structure in networks," In PKDD 2007, pp. 91-102, September 2007.
[13] C. Ozseyhan, B. Badur and O. N. Darcan, "An Association Rule-Based Recommendation Engine for an Online Dating Site," In Communications of the IBIMA, 2012.
[14] N. Li and G. Chen, "Analysis of a Location-Based Social Network," In Proceedings of the 2009 International Conference on Computational Science and Engineering, 2009, pp. 263-270.
[15] J. Cranshaw, E. Toch, J. Hong, A. Kittur and N. Sadeh, "Bridging the gap between physical location and online social networks," In Ubicomp, 2010, pp. 119-128.
[16] B. Guo, D. Zhang, Z. Yu and X. Zhou, "Hybrid SN: Interlinking Opportunistic and Online Communities to Augment Information Dissemination," The 9th IEEE International Conference on Ubiquitous Intelligence and Computing (UIC'12), Fukuoka, Japan, 2012.
[17] L. Lv, "Link Prediction on Complex Networks," Journal of University of Electronic Science and Technology of China, 2010, pp. 651-661.
[18] D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks," Journal of the American Society for Information Science and Technology, 2007, pp. 1019-1031.
[19] T. Zhou, L. Lv, Y. Zhang, "Predicting missing links via local information," In the European Physical Journal B, October 2009, pp. 623-630.
[20] J. Leskovec, D. Huttenlocher and J. Kleinberg, "Predicting positive and negative links in online social networks," In Proceedings of the 19th international conference on World wide web, 2010, pp. 641-650.
[21] T. M. Mitchell, "Machine Learning," The Mc-Graw-Hill Companies, Inc., 1997.
[22] C. Shannon, "A Mathematical Theory of Communication," ACM SIGMOBILE Mobile Computing and Communications Review, pp. 3-55, January 2001.
[23] [Online]. Available: https://dev.twitter.com/docs.
[24] [Online]. Available: https://developer.foursquare.com/docs.
[25] [Online]. Available: http://dev.jiepang.com.
[26] J. Leskovec, L. Backstrom, R. Kumar and A. Tomkins, "Microscopic evolution of social networks," In Proceeding of the ACM KDD, pp. 462-470, August 2008.
[27] L. Breiman, "Random forests," Machine Learning, pp. 5-32, October 2001.
[28] C. Cortes and V. Vapnik, "Support-vector network," Machine Learning, 1995, pp. 273-297.
[29] H. Zhang, "The Optimality of Naive Bayes," In: Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference. AAAI Press, 2004.
[30] G. Holmes, A. Donkin and I. H. Witten, "Weka: A machine learning workbench," Intelligent Information Systems, 1994, pp. 357-361.
[31] C. Chang and C. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, April 2011.
[32] Z. Yun, B. Yang and W. Qu, "Survey of Mining Imbalanced Datasets," Computer Science, 2010, pp. 27-31.
[33] G. Weiss and F. Provost, "Learning when training data are costly: The effect of class distribution on tree induction," Journal of Artificial Intelligence Research 19, 2003, pp. 315-354.
[34] B. Guo, H. He, Z. Yu, D. Zhang and X. Zhou, "GroupMe: Supporting Group Formation with Mobile Sensing and Social Graph Mining," The 9th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous'12), Beijing, China, 2012.
[35] B. Guo, D. Zhang, Z. Wang, Z. Yu and X. Zhou, "Opportunistic IoT: Exploring the Harmonious Interaction between Human and the Internet of Things," Journal of Network and Computer Applications (JNCA), Elsevier, 2013.
