ABSTRACT Complex networks have become high-dimensional, sparse, and redundant due to the rapid
expansion of the Internet. Effective link prediction techniques are needed to obtain the most relevant and
important information for Internet users. A new link prediction algorithm based on the area under the receiver
operating characteristic curve (AUC) as the evaluation metric is proposed in this paper. In the proposed
method, the AUC is treated as the objective function and the link prediction problem is transformed into an
optimization problem. A group of topological features is defined for each ordered pair of nodes. By using
those features as the attributes of the node pairs, link prediction can be treated as a binary classification, where
the class label of each node pair is determined by the existence of a directed link between the node pair. Then,
the binary classification problem can be solved by the AUC optimization. According to the empirical results,
high-quality predictions can be achieved by our algorithm.
INDEX TERMS Link prediction, AUC metric, complex networks, target function.
2169-3536
2018 IEEE. Translations and content mining are permitted for academic research only.
28122 Personal use is also permitted, but republication/redistribution requires IEEE permission. VOLUME 6, 2018
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
B. Chen et al.: Link Prediction on Directed Networks Based on AUC Optimization
tion in the genetic algorithm. To address the issue of link prediction, Zhao et al. [27] proposed a neighborhood-based non-negative matrix factorization model. The model combines the characteristic factors of the topological structure with the local neighborhood structure of the network. Pujari and Kanawati [28] proposed a rank aggregation approach to predict links in complex networks. Ermis and Cemgil [29] performed full Bayesian inference through variational Bayesian (VB) methods on single-tensor and coupled tensor factorization models; the approach scales to large models. Dai et al. [30] presented a fast method for predicting the links connected to a specific node, using a random walk to build paths to the links of that node. Lv et al. [31] presented a common structural similarity index, based on perturbation of the adjacency matrix, that is independent of prior knowledge of the network. Li et al. [32] took into account the contributions of both the paths and the end nodes and proposed a new similarity index called Scop.

Recently, many scholars found that a modular network structure is capable of disclosing future links. Some scholars have used local partitioning of a network to predict future links. For instance, Zhang et al. [33] found that information provided online is both misleading and redundant. Due to the "less can be more" characteristic, some algorithms can be designed to improve prediction performance by eliminating links from the original networks. Clemente and Grassi [34] focused on the problem of evaluating the local clustering in complex networks. This measure can be defined in various ways for networks with weighted edges.

However, less attention has been given to directed and weighted networks. The authors provided a new local clustering coefficient for these types of networks, beginning with those existing in previous studies for the undirected and weighted case. Moreover, four specific components are extracted to separately consider different types of triangle patterns.

Moreover, an empirical application on several real networks was presented, and the performance of coefficients from previous studies was compared with the performance of the proposed coefficients. Agreste et al. [35] assessed community detection algorithms in terms of scalability and accuracy and identified the Infomap algorithm as a promising tool for web data analysis, illustrating the trade-off between computational performance and accuracy. Santos et al. [36] proposed a consensus graph clustering algorithm for directed networks that achieves superior performance with small-sized clusters and a high mixture index.

Nevertheless, many similarity-based algorithms consider only the contributions of the paths between two nodes and ignore the effects of the two nodes themselves when assessing their similarity. Hence, their outcomes can be inaccurate and cannot be fully trusted.

AUC is the most common measurement to evaluate the quality of prediction results. As a crucial measure of performance, AUC has been applied to various tasks, including class-imbalance learning, cost-sensitive learning, instance ranking, and information retrieval. For link prediction in networks, regardless of the method we use, our final goal is to optimize the AUC. Since AUC measures the quality of the link prediction result, we can optimize it directly instead of calculating alternative similarity indexes or optimizing other parameters, as is required in other models describing the topological features of the network. Inspired by this fact, we present a link prediction method that directly optimizes the AUC. In the method, AUC is treated as the objective function, and link prediction is transformed into an optimization problem. A group of topological features is defined for each ordered pair of nodes. By using those features as the attributes of the node pairs, link prediction can be treated as a binary classification problem in which the class label of each node pair is determined by whether a directed link exists between the node pair. Thus, the link prediction problem can be reduced to determining which edges belong to the positive class, where the edges exist, and which belong to the negative class, where the edges are absent. Then, the binary classification problem can be solved by AUC optimization. Experimental results on real networks show that the algorithm achieves accurate results at high speed compared with other methods.

III. LINK PREDICTION AND AUC SCORE
This paper considers a network represented by a directed graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of links. $G$ does not allow self-connections or multiple links. Let $N = |V|$ be the number of nodes, and let $U$ denote the set of all $N(N - 1)$ possible directed links. Let $A = [a_{ij}]$ be the adjacency matrix of $G$: $a_{ij} = 1$ if there is a link from node $i$ to node $j$, and $a_{ij} = 0$ otherwise. The same formulation can be applied to weighted links, in which case the entries of $A$ refer to constant or ordinal values. The link prediction task is to identify, within the set of nonexistent links $U - E$, the missing links or the links that will show up in the future.

Our approach is to give a score, $S(x, y)$, to every ordered pair of nodes $(x, y) \in U$. The score indicates the current possibility of a direct link between the nodes: for a pair of nodes $(x, y)$ in $U \setminus E$, the larger $S(x, y)$ is, the higher the probability that there is a link from node $x$ to node $y$.

To evaluate the link prediction results, the links in the network are divided into two parts: the training set $E^T$ represents the known information, and the probe set $E^P$ represents the unknown information used for testing, with $E^T \cup E^P = E$ and $E^T \cap E^P = \emptyset$. The link prediction algorithm provides an ordered list of all non-observed links (i.e., $U - E^T$) or, equivalently, a score for every non-observed link. Therefore, for $(x, y) \in U - E^T$, the score $S_{xy}$ quantifies the probability that the link exists.

AUC is the most widely adopted evaluation metric for quantifying the accuracy of prediction approaches.
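The evaluation setup of Section III (the universal set U, the split of E into a training set and a probe set, and the candidate set of non-observed links) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the toy edge list, the 10% probe ratio, the fixed seed, and the helper name `split_links` are all hypothetical.

```python
import itertools
import random

def split_links(edges, probe_ratio=0.1, seed=0):
    """Randomly split the directed link set E into (training set, probe set)."""
    edges = sorted(set(edges))
    rng = random.Random(seed)
    rng.shuffle(edges)
    n_probe = max(1, round(probe_ratio * len(edges)))
    return set(edges[n_probe:]), set(edges[:n_probe])

# Hypothetical toy directed network on 5 nodes (no self-loops, no multi-links).
nodes = range(5)
E = {(0, 1), (1, 2), (2, 0), (2, 3), (3, 4),
     (4, 2), (0, 3), (1, 4), (3, 1), (4, 0)}

# U: all N(N - 1) ordered node pairs.
U = set(itertools.permutations(nodes, 2))

E_T, E_P = split_links(E, probe_ratio=0.1)

# The predictor must score every non-observed link, i.e. every pair in U - E^T.
candidates = U - E_T
```

The probe links are hidden inside `candidates` together with the truly nonexistent links; a good predictor ranks the probe links above the rest.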
Here, $N_{OI}$, $N_{OO}$, $N_{IO}$ and $N_{II}$ are, respectively, the numbers of triangles involving node $v_i$ of type OI, OO, IO and II, corresponding to Figure 2(a), (b), (c) and (d).

7. The clustering coefficient of ordered node pair $(v_i, v_j)$.
The ratio between the two nodes and the number of triangles can be used to define the clustering coefficient between an ordered node pair $(v_i, v_j)$. Since the edges of the triangles have directions, 4 types of triangles, OI, OO, IO and II, are considered, as shown in Figure 3.

to show a mixture of disassortative or assortative pairs. Given the two types of degree for each node, we define 4 types of degrees of correlation for a directed network, $r_{ij}^{OI}$, $r_{ij}^{OO}$, $r_{ij}^{II}$ and $r_{ij}^{IO}$, corresponding to different combinations of the types of degrees of the two nodes. For instance, degree correlation $r^{OI}$ is defined as:

Equation (12) yields a coefficient that is the mean of all the links. All the calculations of $d_i^{out} + d_j^{in}$ can be dropped to approximate the local assortativity between the node pair $(v_i, v_j)$ for $M$. $M$ refers to the number of links in the network. The degree of correlation $r_{ij}^{OI}$ of node pair $(v_i, v_j)$ is defined as follows:

$$ r_{ij}^{OI} = \frac{4 d_i^{out} d_j^{in} - d_i^{out} - d_j^{in}}{2 (d_i^{out})^2 + 2 (d_j^{in})^2 - d_i^{out} - d_j^{in}}. \tag{13} $$

From (13), we can see that $r_{ij}^{OI}$ ranges in $0 < r_{ij}^{OI} < 1$. Similarly, we can define the degrees of correlation $r_{ij}^{OO}$, $r_{ij}^{II}$ and $r_{ij}^{IO}$ of node pair $(v_i, v_j)$ as follows:

in $\Gamma_{out}^{i-1}(v)$ that do not belong to any of the previous neighbor- ... neighbors of node $u$.

We define $CN^2(v, u) = \Gamma_{out}(v) \cap \Gamma_{in}(u) = \Gamma_{out}^{1}(v) \cap \Gamma_{in}^{1}(u)$ as the 2nd-order common neighbors of ordered node pair $(v, u)$, and the $i$-th-order common neighbors as

$$ CN^{i}(v, u) = \bigcup_{j+k=i} \left[ \Gamma_{out}^{j}(v) \cap \Gamma_{in}^{k}(u) \right]. $$

For instance, the 3rd-order common neighbors of ordered node pair $(v, u)$, $CN^3(v, u)$, is the combination of the intersection of $\Gamma_{out}^{1}(v)$ and $\Gamma_{in}^{2}(u)$ and that of $\Gamma_{out}^{2}(v)$ and $\Gamma_{in}^{1}(u)$. We use $CN^{i}(v, u)$ $(i = 1, 2, \ldots, n)$ as the features of ordered node pair $(v_i, v_j)$. $CN^{i}(v, u)$ can be calculated via the adjacency matrix $A$. For instance, the 3rd-order common neighbors of ordered node pair $(v, u)$, $CN^3(v, u)$, can be calculated as follows:

$$ CN^{3}(v, u) = \sum_{h, k, w \neq v, u,\; a_{vh} = a_{ku} = 1} a_{hw}\, a_{wk}. \tag{18} $$

10. Jaccard coefficient.
The Jaccard coefficient is used to compare the similarity and diversity of the neighboring sets. For an ordered node pair $(v_i, v_j)$, the Jaccard coefficient is the size of the intersection divided by the size of the union of the neighbor sets $\Gamma_{out}(v_i)$ and $\Gamma_{in}(v_j)$, as shown in the formula below, and it is used to evaluate the similarity of the neighboring sets of the two nodes.

$$ s_{ij}^{Jaccard} = \frac{|\Gamma_{out}(v_i) \cap \Gamma_{in}(v_j)|}{|\Gamma_{out}(v_i) \cup \Gamma_{in}(v_j)|} \tag{19} $$

11. Preferential attachment index (PA).

$$ s_{ij}^{PA} = d_i^{out} \times d_j^{in}. \tag{20} $$

This formula can be used to quantify the importance of the links subject to different dynamics based on networks, for instance, transportation, synchronization, and percolation. This feature does not require information about the neighborhood of every node; therefore, the computation is simple.

12. Resource allocation index (RA).
The dynamics of resource allocation on complex networks motivate this index. For an ordered node pair $(v_i, v_j)$ that is not directly connected, node $v_i$ can send resources to $v_j$ with its neighbors as transmitters. The simple case assumes that each transmitter has a resource unit and will distribute it equally among all its neighbors. The RA index between $v_i$ and $v_j$ can be defined as the quantity of resources sent to $v_j$ from $v_i$, which is:

$$ s_{ij}^{RA} = \sum_{z \in \Gamma(v_i) \cap \Gamma(v_j)} \frac{1}{d_z^{in} \cdot d_z^{out}} \tag{21} $$

The RA index assigns greater weight to less-connected neighbors to refine the counting of common neighbors.

Of the above features, 1 to 8 are based on node properties, and the rest are based on ordered-node-pair properties. Except for the Leader-Follower ranking, every feature has a similar version in undirected networks. Therefore, our method can be applied to both directed and undirected networks.

V. THE LINK PREDICTION ALGORITHM
The main idea of our algorithm is that we use the AUC metric as the target function of optimization to obtain high-quality link prediction results. AUC is a good measure of performance. Several algorithms have been devoted to the optimization of AUC, mainly by reducing a surrogate convex loss on a set of training data.

$$ h(x_{ij}) = s_{ij} = w_i x_{ij}^{T}, \tag{22} $$

where $w_i = (w_{i1}, w_{i2}, \ldots, w_{im})$ is the weight vector, and $w_{ij}$ reflects the importance of the $j$-th feature on the prediction results. Our goal is to find the optimal weight vector $w_i^{*}$ such that $s_{ij}$ and $y_{ij}$ are as close as possible. Let $W = [w_{ij}]$ be an $n \times n$ weight matrix, and let $S = [s_{ij}]$ be the similarity matrix. The goal is then to identify a function $h$ that maps $A$ to the target $S$ such that $S = h(X) = W X^{T}$, namely, formulas (23) and (24).

Here, we use $P = N(N - 1)/2$ to denote the number of links in the universal set containing all possible links. The number of links in the network is $M$. $I(b)$ is an indicator function that returns 1 if its argument $b$ is true and 0 otherwise, and the set of neighbors of node $v_k$ can be denoted as $\Gamma(v_k)$, with
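$$ \mathrm{AUC} = \frac{1}{M \cdot (P - M)} \sum_{k=1}^{n} \sum_{v_i \in \Gamma(v_k)} \sum_{v_j \notin \Gamma(v_k)} \left[ I(s_{ki} > s_{kj}) + \frac{1}{2} I(s_{ki} = s_{kj}) \right] \tag{23} $$

$$ \mathrm{AUC} = \frac{1}{M \cdot (P - M)} \sum_{k=1}^{n} \sum_{v_i \in \Gamma(v_k)} \sum_{v_j \notin \Gamma(v_k)} \left[ I(h(x_{ki}) > h(x_{kj})) + \frac{1}{2} I(h(x_{ki}) = h(x_{kj})) \right]. \tag{24} $$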
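The summation structure of (23) can be made concrete with a short NumPy sketch: for every source node, each existing link is compared with each nonexistent link, a correct ranking counts 1 and a tie counts 1/2. This is an illustration rather than the authors' implementation. The function name `pairwise_auc` is hypothetical, and, as an assumption, the sketch normalizes by the number of pairs actually compared instead of the constant M · (P − M), so that the result always lies in [0, 1].

```python
import numpy as np

def pairwise_auc(S, A):
    """AUC over (neighbor, non-neighbor) pairs of each source node k.

    S: (n, n) score matrix, S[k, i] is the score of the ordered pair (v_k, v_i).
    A: (n, n) 0/1 adjacency matrix, A[k, i] = 1 if there is a link from v_k to v_i.
    """
    n = A.shape[0]
    hits, total = 0.0, 0
    for k in range(n):
        others = np.arange(n) != k            # exclude the self-pair (k, k)
        pos = S[k, (A[k] == 1) & others]      # scores of existing links k -> i
        neg = S[k, (A[k] == 0) & others]      # scores of nonexistent links k -> j
        if pos.size == 0 or neg.size == 0:
            continue
        diff = pos[:, None] - neg[None, :]    # all pairwise score differences
        hits += np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
        total += diff.size
    return hits / total

# Toy directed 3-cycle: 0 -> 1 -> 2 -> 0.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
perfect = pairwise_auc(A.astype(float), A)         # existing links score highest
uninformative = pairwise_auc(np.zeros((3, 3)), A)  # every comparison is a tie
```

With scores equal to the adjacency matrix every comparison is ranked correctly, while constant scores produce only ties, which is the chance level of one half.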
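$$ L(w) = \frac{\lambda}{2} \|w\|_F^2 + \frac{1}{M \cdot (P - M)} \sum_{k=1}^{n} \sum_{v_i \in \Gamma(v_k)} \sum_{v_j \notin \Gamma(v_k)} \iota\left[ h(x_{ki}) - h(x_{kj}) \right]. \tag{26} $$

$$ L(w) = \frac{\lambda}{2} \|w\|_F^2 + \frac{1}{M \cdot (P - M)} \sum_{k=1}^{n} \sum_{v_i \in \Gamma(v_k)} \sum_{v_j \notin \Gamma(v_k)} \left[ 1 - w_k (x_{ki} - x_{kj})^{T} \right]^2. \tag{27} $$

$$ L_k(w_k) = \frac{\lambda}{2} \|w_k\|_F^2 + \frac{1}{M \cdot (P - M)} \sum_{v_i \in \Gamma(v_k)} \sum_{v_j \notin \Gamma(v_k)} \left[ 1 - w_k (x_{ki} - x_{kj})^{T} \right]^2. \tag{28} $$

$$ \frac{\partial L_k}{\partial w_k} = \lambda w_k - \frac{1}{M \cdot (P - M)} \sum_{v_i \in \Gamma(v_k)} \sum_{v_j \notin \Gamma(v_k)} \left\{ 2 (x_{ki} - x_{kj}) - 2 w_k (x_{ki} - x_{kj})^{T} (x_{ki} - x_{kj}) \right\}. \tag{30} $$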
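To illustrate how the per-node loss (28) and its gradient (30) fit together, the following NumPy sketch evaluates both and runs a plain gradient descent. It is a minimal sketch under stated assumptions: the function names, the toy feature differences, and the default normalizing constant are hypothetical, and a fixed step size is used in place of a line search.

```python
import numpy as np

def loss_and_grad(w, D, lam, norm):
    """Squared surrogate loss L_k(w_k) and its gradient for one node k.

    w:    (m,) weight vector w_k.
    D:    (p, m) matrix whose rows are the differences x_ki - x_kj
          (v_i a neighbor of v_k, v_j a non-neighbor).
    lam:  regularization weight lambda.
    norm: normalizing constant, M * (P - M) in the text.
    """
    r = 1.0 - D @ w                               # residuals 1 - w_k (x_ki - x_kj)^T
    loss = 0.5 * lam * (w @ w) + np.sum(r ** 2) / norm
    # Summing 2 d - 2 (w d^T) d over all rows d of D equals 2 D^T r.
    grad = lam * w - (2.0 / norm) * (D.T @ r)
    return loss, grad

def gradient_descent(D, lam=0.1, norm=None, step=0.1, iters=2000):
    """Fixed-step gradient descent (an assumption made for this sketch)."""
    norm = D.shape[0] if norm is None else norm
    w = np.zeros(D.shape[1])
    for _ in range(iters):
        _, g = loss_and_grad(w, D, lam, norm)
        w -= step * g
    return w

# Hypothetical feature differences for 4 (neighbor, non-neighbor) pairs, m = 2 features.
D = np.array([[1.0, 0.2], [0.8, -0.1], [0.5, 0.4], [1.2, 0.0]])
w_star = gradient_descent(D)
final_loss, final_grad = loss_and_grad(w_star, D, lam=0.1, norm=D.shape[0])
```

Because the loss is a regularized quadratic, it is strongly convex, so the gradient at `w_star` is numerically zero once the iteration has converged.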
TABLE 2. Comparison of the accuracy of algorithms quantified by AUC.
TABLE 3. Comparison of the accuracy of algorithms in terms of precision.
In Figure 4, we find that a maximum AUC still exists when λ and η are not simultaneously small, and the optimal AUC under all possible values of λ and η occurs in the region where they are greater than 0.3. Specifically, the optimal parameters are λ = 0.6 and η = 0.3 in USAir, λ = 0.5 and η = 0.4 in PB, λ = 0.35 and η = 0.75 in PPI, and λ = 0.9 and η = 0.4 in INT.

In addition, the highest accuracy is achieved by the algorithm when using AUC as the metric under the optimal parameters λ and η. The mean AUC values of the various algorithms in the 10-fold cross-validation (CV) are shown in Table 2. The highest AUC score of the 11 algorithms for each dataset is highlighted in bold.

As shown in Table 2, the AUC_LP index has the best overall performance among the local indices. For example, compared with those of CN, the average AUC is improved by 1.12% for USAir, 4.64% for PB, 7.48% for PPI, and 51.54% for INT. When compared against Jaccard, the average AUC is improved by 5.28% for USAir, 12.09% for PB, 7.01% for PPI, and 49.92% for INT. When the comparison is against PA, the improvement is even greater: 30.35% for USAir, 28.60% for PB, 8.13% for PPI, and 51.54% for INT. Therefore, AUC_LP achieves the best accuracy, as measured by AUC.

Besides AUC, precision is another important metric. The mean precision of the algorithms in the 10-fold CV is shown in Table 3. The highest precision achieved by the algorithms in each dataset is shown in bold.

The results from the 11 methods show that the AUC_LP algorithm outperforms the other ten algorithms in terms of precision. Specifically, compared with those of CN, the precision is improved by 13.27% for USAir, 18.19% for PB, 164.04% for PPI and 1104.80% for INT. When compared against Salton, Jaccard and LHN_I, the improvement is even greater. AUC_LP's performance for dissimilar networks is especially strong: it is either perfect or almost perfect. By contrast, no other algorithms are successful.

In addition, we also compare our method with some representative methods (mainly based on community detection, clustering coefficient, and time window) to demonstrate the performance of our method. Figure 5 presents the prediction results on four networks obtained with seven research approaches (S-Rank, CMA-ES, VB, Node-LP, WT, ConClus-LP, and CD-LP). The study is performed with 90% of the data collected from the networks used as the training set and the rest used as the test set.

In addition, the size of the training dataset was increased from 50% to 90%. The results of the different methods with different training dataset sizes are shown in Figure 6 and Figure 7.

According to the test results, the efficiency of the approaches can be improved by selecting appropriate network characteristics. Moreover, the prediction accuracy is unrelated to the size of the network. Specifically, the prediction
FIGURE 5. The comparison of the accuracy of algorithms according to AUC and Precision in different
datasets.
FIGURE 6. Comparison of the average AUC of different algorithms with different training dataset sizes.
accuracy can be improved if the study targets a set of networks.

Extensive experiments have been performed to observe networks and to verify the validity of the AUC_LP algorithm compared with other link prediction algorithms. The results of this study can be used to assess other network-prediction algorithms. Moreover, this study confirms previous research results and simultaneously provides a reference for the study of information screening technology and for relevant professionals.

A. ROBUSTNESS ANALYSIS OF THE ALGORITHM
Robustness refers to the capacity to endure perturbations and failures and is a key attribute of many complex mechanisms, including complex networks.

Studies of the robustness of complex mechanisms are of great importance in many fields [37]. In ecology, robustness is an important attribute of ecosystems and can provide insight into reactions to disturbances, such as the extinction of a species [38]. Network robustness helps biologists to explore mutations and diseases and also the recovery from mutations [39]. The principles of network robustness help to promote the understanding of the risks and stability of banking systems [40]. Additionally, in engineering, network robustness helps to assess the resilience of infrastructure mechanisms, for example, power grids or the Internet [41].

Most networks are not random networks. Although a small number of nodes can have a massive number of connections, the majority of nodes have quite small degrees, in general, consistent with the law promoted by Zipf [42]. A complex network whose degree distribution follows a power law is called a scale-free network.

Scale-free networks are seldom impacted by random failures. Usually, a randomly selected node has a small degree. Even
FIGURE 7. Comparison of the average precision of different algorithms with different training dataset sizes.
FIGURE 8. The relationship between the degrees of nodes and the number of nodes.
when these nodes are randomly removed, the basic connectivity is maintained. Compared with random networks, however, a scale-free network is highly vulnerable to deliberate attacks because its non-uniformity undermines fault tolerance and survivability. The most central nodes with the maximum connectivity have a strong influence on the connectivity of the whole network, so removing only these nodes can push the network past the critical point. Afterward, the network is separated into isolated islands that are incapable of contacting each other.

Thus, many studies have proposed robustness evaluation indexes and designs to withstand intentional attacks. For instance, De Meo et al. [43] found that graph robustness can be predicted rapidly via the Randic index. Alenazi and Sterbenz [44] claimed that, among the robustness metrics studied, adding links to maintain the balance of the link-betweenness would lead to the best network resilience against the aforementioned attacks. Liu et al. [45] created a new memetic algorithm, MA-CR, which uses a two-level learning strategy to promote network community robustness against two-level targeted attacks while maintaining the community structure and degree distribution. Experiments on real-world networks and scale-free networks assessed the stability and effectiveness of the proposed algorithm relative to other state-of-the-art algorithms. Baig and Akoglu [46] conducted the first large-scale empirical correlation analysis of attack strategies, i.e., of the node importance metrics they adopt, for graph robustness.

This task is approached in three ways and illustrates that some computationally complex strategies can be closely predicted by simpler strategies. Moreover, a few strategies can be adopted to serve as a close proxy for consensus.

The original dataset is analyzed first during the robustness analysis. Additionally, the relationship between the degrees of the nodes and the number of nodes in the original network must be analyzed. Figure 8 presents the experimental results.
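The relationship between node degree and the number of nodes having that degree can be tabulated directly from an edge list. The sketch below is only an illustration: the toy edge list and the helper name `degree_histogram` are hypothetical, and total degree (in plus out) is an assumption, since the text does not state which degree is plotted.

```python
from collections import Counter

def degree_histogram(edges):
    """Map each total degree (in + out) to the number of nodes with that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1   # out-link of u
        degree[v] += 1   # in-link of v
    return dict(sorted(Counter(degree.values()).items()))

# Hypothetical toy network: node 0 is a hub, the remaining nodes have few links.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (5, 0)]
hist = degree_histogram(edges)
```

In a scale-free network the counts fall off roughly as a power law of the degree, which appears as an approximately straight line when the histogram is plotted on log-log axes.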
FIGURE 9. Analysis of the nodes’ degree in the training set and probe set.
Figure 8 shows an inverse proportional relationship between the node degree and the number of nodes. Therefore, in these networks, a small node degree occurs more often, whereas a large node degree occurs less often.

The degree distributions of the four networks generally follow the Zipf law. Therefore, the four networks mentioned above can be regarded as scale-free networks.

Malliaros et al. [37] considered scale-free networks under deliberate and random attacks. Furthermore, the robustness of the network connectivity under the two classes of node removal strategies was compared. The results showed that scale-free networks are robust to random node failures, so the robustness of these four networks can be analyzed preliminarily.

Moreover, because connections between these objects may exist in both the test set and the training set, we need to consider the number of node links in the training set, which serves as the abscissa, and the number of node links in the test set, which serves as the ordinate. Thus, the partitioning of the dataset can be seen as a network attack. The ratio of the training set is set to 50%, and the correlation of the node degree is obtained randomly. Figure 9 (a-d) presents the results for these four datasets. The scatter plots in panels (a) to (d) show the node degree in NP, that is, the training set, and NT, that is, the probe set, when the dataset is randomly divided. Panels (e) to (h) present the relationship between the changes in the ratio of the training set and the nodes' degrees in the probe set and training set. The red line corresponding with the sets represents random classification.

Figure 9 shows that NP and NT have an almost linear relationship because the links are selected for the training set and the probe set with equal probability. Additionally, C–Node is approximately 1 in the randomly divided datasets as the training set ratio changes.

By comparing the network topology properties of the original dataset and the training set, we can conclude that the
FIGURE 11. The change in the AUC of the AUC_LP algorithm with NC .
scheme used to divide datasets in a deliberate attack does not affect the original network degree distribution. The basic characteristics of the original network can be maintained while achieving high robustness.

B. CONVERGENCE ANALYSIS OF THE ALGORITHM
We also study the convergence conditions of the algorithm. Formula (28) is convex because it is a quadratic loss function. Therefore, the gradient descent strategy (31) can be used to find the extreme point of Formula (28).

The rate of convergence depends on the step size $\mu_t$. If the step size is too large, convergence to the optimal solution may be difficult, but a small step size may lead to slow convergence. To achieve a better convergence rate, we use a variable step size determined by the Armijo rule [47].

During the iteration of the weight matrix $W$, we calculate the error of $W$ between adjacent time periods. The error formula is $Error(t) = \| W^{(t)} - W^{(t-1)} \|_F^2$. Then, we plot the relation between the error and the iteration number of the algorithm for different datasets. The results are shown in Figure 10.

Figure 10 shows that during the iteration of the weight matrix $W$, the error decreases sharply and then rapidly reaches the state of convergence. Therefore, the iteration time complexity of this step is O(NC), which is equivalent to O(1).

Finally, we study how the AUC of the AUC_LP algorithm under the optimal parameters λ and η changes with the number of iterations NC for different datasets. The results are shown in Figure 11.

From Figure 11, we can see that the AUC of the AUC_LP algorithm shows an increasing trend and always maintains a high value as NC increases in the four datasets. For example, the AUC value remains constant at approximately 0.935 when the number of iterations is greater than 80 for the USAir dataset. For the PB dataset, the AUC is maintained at approximately 0.940 when the number of iterations is greater than 75. Therefore, the AUC_LP algorithm not only achieves higher AUC values on real networks than those of other algorithms but is also robust.

VII. CONCLUSION
Various areas, including anthropology, sociology, computer science, and information science, have recently paid great attention to the issue of link prediction due to the current large number of network statistics in electronic form. A novel framework is proposed to assess the performance of a link prediction algorithm in this article. In the algorithm, the score matrix is effectively decomposed with the weight matrix and the adjacency matrix. We use the AUC metric as the target function of optimization and obtain the optimal weight matrix via the gradient descent algorithm. According to the empirical findings, the AUC_LP algorithm yields high-quality forecasting results compared with those of other algorithms. Thus, the findings in this article are beneficial because they can help Internet retailers to better utilize the current forecasting approaches.

REFERENCES
[1] B. Barzel and A.-L. Barabási, "Network link prediction by global silencing of indirect correlations," Nature Biotechnol., vol. 31, no. 8, pp. 720–725, 2013.
[2] L. Zhu, D. Guo, J. Yin, G. Ver Steeg, and A. Galstyan, "Scalable temporal latent space inference for link prediction in dynamic social networks," IEEE Trans. Knowl. Data Eng., vol. 28, no. 10, pp. 2765–2777, Oct. 2016.
[3] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, "Fast and accurate link prediction in social networking systems," J. Syst. Softw., vol. 95, no. 9, pp. 2119–2132, 2012.
[4] J. Tang, S. Chang, C. Aggarwal, and H. Liu, "Negative link prediction in social media," in Proc. 8th ACM Int. Conf. Web Search Data Mining, 2015, pp. 87–96.
[5] K. Jahanbakhsh, V. King, and G. C. Shoja, "Predicting missing contacts in mobile social networks," Pervasive Mobile Comput., vol. 8, no. 5, pp. 698–716, 2012.
[6] W. Liang, X. He, D. Tang, and X. Zhang, "S-Rank: A supervised ranking framework for relationship prediction in heterogeneous information networks," in Proc. Int. Conf. Ind., Eng. Other Appl. Appl. Intell. Syst., Springer, 2016, pp. 305–319.
[7] H. Hwang, D. Petrey, and B. Honig, "A hybrid method for protein–protein interface prediction," Protein Sci., vol. 25, no. 1, pp. 159–165, 2016.
[8] P. Wang, B. Xu, Y. Wu, and X. Zhou, "Link prediction in social networks: The state-of-the-art," Sci. China Inf. Sci., vol. 58, no. 1, pp. 1–38, 2015.
[9] M. E. J. Newman, "Clustering and preferential attachment in growing networks," Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 64, no. 2, p. 025102, 2001.
[10] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: Simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824–827, 2002.
[11] S. A. Golder and S. Yardi, "Structural predictors of tie formation in twitter: Transitivity and mutuality," in Proc. IEEE 2nd Int. Conf. Social Comput. (SocialCom), Aug. 2010, pp. 88–95.
[12] D. Yin, L. Hong, X. Xiong, and B. D. Davison, "Link formation analysis in microblogs," in Proc. 34th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., 2011, pp. 1235–1236.
[13] D. Garlaschelli and M. I. Loffredo, "Patterns of link reciprocity in directed networks," Phys. Rev. Lett., vol. 93, no. 26, p. 268701, 2004.
[14] Y.-X. Zhu, X.-G. Zhang, G.-Q. Sun, M. Tang, T. Zhou, and Z.-K. Zhang, "Influence of reciprocal links in social networks," PLoS ONE, vol. 9, no. 7, p. e103007, 2014.
[15] D. M. Romero and J. M. Kleinberg, "The directed closure process in hybrid social-information networks, with an analysis of link formation on twitter," in Proc. ICWSM, 2010, pp. 138–145.
[16] D. Schall, "Link prediction in directed social networks," Social Netw. Anal. Mining, vol. 94, no. 1, p. 157, 2014.
[17] Q.-M. Zhang, L. Lü, W.-Q. Wang, Y. Xiao, and T. Zhou, "Potential theory for directed networks," PLoS ONE, vol. 8, no. 2, p. e55437, 2013.
[18] R. N. Lichtenwalter and N. V. Chawla, "Vertex collocation profiles: Subgraph counting for link analysis and prediction," in Proc. 21st Int. Conf. World Wide Web, 2012, pp. 1019–1028.
[19] X. Li and H. Chen, "Recommendation as link prediction in bipartite graphs: A graph kernel-based machine learning approach," Decision Support Syst., vol. 54, no. 2, pp. 880–890, 2013.
[20] S. Scellato, A. Noulas, and C. Mascolo, "Exploiting place features in link prediction on location-based social networks," in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2011, pp. 1046–1054.
[21] J. Scripps, P.-N. Tan, F. Chen, and A.-H. Esfahanian, "A matrix alignment approach for link prediction," in Proc. 19th Int. Conf. Pattern Recognit. (ICPR), Dec. 2008, pp. 1–4.
[22] M. Pujari and R. Kanawati, "Link prediction in complex networks by supervised rank aggregation," in Proc. IEEE 24th Int. Conf. Tools Artif. Intell. (ICTAI), Nov. 2012, pp. 782–789.
[23] K.-Y. Chiang, N. Natarajan, A. Tewari, and I. S. Dhillon, "Exploiting longer cycles for link prediction in signed networks," in Proc. 20th ACM Int. Conf. Inf. Knowl. Manage., 2011, pp. 1157–1162.
[24] C. A. Bliss, M. R. Frank, C. M. Danforth, and P. S. Dodds, "An evolutionary algorithm approach to link prediction in dynamic social networks," J. Comput. Sci., vol. 5, no. 5, pp. 750–764, 2014.
[25] R. R. Sarukkai, "Link prediction and path analysis using Markov chains," Comput. Netw., vol. 33, nos. 1–6, pp. 377–386, 2000.
[26] Y. Bhawsar, G. S. Thakur, and R. S. Thakur, "Model for link prediction in social network by genetic algorithm approach," Data Mining Knowl. Eng., vol. 7, no. 5, pp. 191–196, 2015.
[27] Y. Zhao, S. Li, C. Zhao, and W. Jiang, "Link prediction via a neighborhood-based nonnegative matrix factorization model," in Proc. 3rd Int. Conf. Commun., Signal Process., Syst., 2015, pp. 603–611.
[28] M. Pujari and R. Kanawati, "Supervised rank aggregation approach for link prediction in complex networks," in Proc. 21st Int. Conf. World Wide Web, 2012, pp. 1189–1196.
[29] B. Ermis and A. T. Cemgil. (2014). "A Bayesian tensor factorization model via variational inference for link prediction." [Online]. Available: https://arxiv.org/abs/1409.8276
[30] C. Dai, L. Chen, and B. Li, "Link prediction based on sampling in complex networks," Appl. Intell., vol. 47, no. 1, pp. 1–12, 2017.
[31] L. Linyuan, P. Liming, Z. Tao, Z. Yi-Cheng, and H. E. Stanley, "Toward link predictability of complex networks," Proc. Nat. Acad. Sci. USA, vol. 112, no. 8, pp. 2325–2330, 2014.
[32] L. Li, L. Qian, J. Cheng, M. Ma, and X. Chen, "Accurate similarity index based on the contributions of paths and end nodes for link prediction," J. Inf. Sci., vol. 41, no. 2, pp. 167–177, 2015.
[33] Q.-M. Zhang, A. Zeng, and M.-S. Shang, "Extracting the information backbone in online system," PLoS ONE, vol. 8, no. 5, p. e62624, 2013.
[34] G. P. Clemente and R. Grassi, "Directed clustering in weighted networks: A new perspective," Chaos, Solitons Fractals, vol. 107, pp. 26–38, Feb. 2018.
[35] S. Agreste et al., "An empirical comparison of algorithms to find communities in directed graphs and their application in Web data analytics," IEEE Trans. Big Data, vol. 3, no. 3, pp. 289–306, Sep. 2017.
[36] C. P. Santos, D. M. Carvalho, and M. C. V. Nascimento, "A consensus graph clustering algorithm for directed networks," Expert Syst. Appl., vol. 54, pp. 121–135, Jul. 2016.
[37] F. D. Malliaros, V. Megalooikonomou, and C. Faloutsos, "Estimating robustness in large social graphs," Knowl. Inf. Syst., vol. 45, no. 2, pp. 645–678, 2015.
[38] R. V. Solé and M. Montoya, "Complexity and fragility in ecological networks," Proc. Roy. Soc. London B, Biol. Sci., vol. 268, no. 1480, pp. 2039–2045, 2001.
[39] A. E. Motter, N. Gulbahce, E. Almaas, and A.-L. Barabási, "Predicting synthetic rescues in metabolic networks," Mol. Syst. Biol., vol. 4, no. 1, p. 168, 2008.
[40] A. G. Haldane and R. M. May, "Systemic risk in banking ecosystems," Nature, vol. 469, no. 7330, pp. 351–355, 2011.
[41] R. Albert, I. Albert, and G. L. Nakarado, "Structural vulnerability of the North American power grid," Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 69, no. 2, p. 025103, 2004.
[42] M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law," Contemp. Phys., vol. 46, no. 5, pp. 323–351, 2005.
[43] P. De Meo et al., "Estimating graph robustness through the Randic index," IEEE Trans. Cybern., vol. 10, pp. 1–14, 2017.
[44] M. J. F. Alenazi and J. P. G. Sterbenz, "Evaluation and comparison of several graph robustness metrics to improve network resilience," in Proc. 7th Int. Workshop IEEE Reliable Netw. Des. Modeling (RNDM), Oct. 2015, pp. 7–13.
[45] W. Liu, M. Gong, S. Wang, and L. Ma, "A two-level learning strategy based memetic algorithm for enhancing community robustness of networks," Inf. Sci., vol. 422, pp. 290–304, Jan. 2018.
[46] M. B. Baig and L. Akoglu, "Correlation of node importance measures: An empirical study through graph robustness," in Proc. 24th Int. Conf. World Wide Web, 2015, pp. 275–281.
[47] L. Armijo, "Minimization of functions having Lipschitz continuous first partial derivatives," Pacific J. Math., vol. 16, pp. 1–3, Nov. 1966.

BOLUN CHEN received the Ph.D. degree from the Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2016. He was a Visiting Scholar with the University of Fribourg, Fribourg, Switzerland. He is a Professor with the Huaiyin Institute of Technology, Lianyungang, China. His research interests include link prediction, recommender systems, and data mining. He is currently a reviewer of the Journal of Information Science, World Wide Web Journal, and Reliability Engineering & System Safety.

YONG HUA received the M.S. degree from the Huaiyin Institute of Technology, Lianyungang, China. His research interests include maximization of influence of complex network and recommender systems.

YAN YUAN received the M.S. degree from the Huaiyin Institute of Technology, Lianyungang, China. Her research interests include complex network, data analysis, and machine learning.

YING JIN is currently a Professor with the Huaiyin Institute of Technology, Lianyungang, China. Her research interests include deep learning, recommender systems, and data analysis.