
Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 113 (2017) 478–483
www.elsevier.com/locate/procedia

The 2nd International Workshop on Data Mining in IoT Systems (DaMIS 2017)
An Effective Simulated Annealing for Influence Maximization Problem of Online Social Networks

Shi-Jui Liu^a, Chi-Yuan Chen^b, Chun-Wei Tsai^a,∗

^a Department of Computer Science and Engineering, National Chung Hsing University, Taichung, Taiwan, R.O.C.
^b Department of Computer Science and Information Engineering, National Ilan University, Yilan, Taiwan, R.O.C.

Abstract

The influence maximization problem (IMP) is one of the most well-known problems in the research domain of online social
networks (OSN) and has attracted the attention of many researchers from different disciplines in recent years. One of the
reasons is that the speed of information propagation in the OSN can be increased if we can find the users that have maximum
influence on other users. The traditional rule-based and heuristic algorithms may not be able to find useful information in
these data because the data are generally large and complex. Although metaheuristic algorithms can be used to solve the IMP,
there is still plenty of room for improvement. That is why an effective and efficient algorithm is presented in this paper.
The proposed algorithm, called simulated annealing with search partition (SASP), is based on a search space partitioning
mechanism to enhance the search performance of simulated annealing for the IMP. The experimental results show that the
proposed algorithm outperforms the other state-of-the-art influence maximization algorithms compared in this paper in terms
of the quality of the end result and the number of objective function evaluations.

© 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the Conference Program Chairs.
Keywords: Metaheuristic algorithm, simulated annealing, influence maximization problem, and online social networks.

∗ Corresponding author.
E-mail address: cwtsai@nchu.edu.tw
1877-0509 © 2017 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2017.08.306

1. Introduction

Online social networks (OSNs) [1] provide another way to connect people via modern computer and internet technologies,
such as wireless sensor networks, cloud computing, and the internet of things [2,3,4]. Different from traditional chat
software, recent OSNs are capable of providing many fancy services to their users. OSNs also give rise to an interesting
optimization problem, called the influence maximization problem (IMP). An effective and efficient method is needed for
solving this problem because we can find out the relationship between users, the behavior of users, and the impact of users
from the data of OSN [1,5]. Because of privacy issues, we cannot access the raw data of some users to analyze their
behavior, but we can still understand the interests of some anonymous users from the de-identified user behavior data. The
friendship or antagonism relations can, therefore, be easily extracted [6]. Understanding the user interests is just a
simple example to illustrate the possibilities of these useful data. In fact, several studies have attempted for many years
to develop a more powerful mechanism or tool to understand the user behavior. In [7,8], Benevenuto et al. developed a
transition probability model to understand the next step of users from their clickstream data.
As an optimization problem for understanding the user behavior on OSN, one well-known IMP was introduced by
Kempe and his colleagues [9,10]. The problem assumes that a graph $G(V, E, W)$ stands for the OSN and that its parameters
$V$, $E$, and $W$ are defined as follows:

• $V = \{V_1, V_2, \ldots, V_\xi\}$ is a set of nodes, where $V_i$ can be regarded as a user,
• $E = \{E_{1,2}, E_{1,3}, \ldots, E_{1,\xi}, E_{2,1}, E_{2,3}, \ldots, E_{\xi,\xi-1}\}$ is a set of edges, where $E_{i,j}$ can be regarded as the
relationship between two users, $i \neq j$, and
• $W = \{W_{1,2}, W_{1,3}, \ldots, W_{1,\xi}, W_{2,1}, W_{2,3}, \ldots, W_{\xi,\xi-1}\}$ is a set of weights, where each $W_{i,j}$ is associated with an
edge $E_{i,j} \in E$.

In brief, the goal of this problem is to find a set of nodes $A$ from $G$ that maximizes the influence $\sigma(A)$, where $A$
contains $\kappa < \xi$ nodes, each of which represents a user. Because the problem is NP-hard [9,10], how to come up with
a high performance method that finds the nodes maximizing the influence in a reasonable time is a critical research
issue, especially when the number of nodes of the OSN is large.
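The graph definition above can be made concrete with a small data structure. The following is only a minimal sketch, not the authors' implementation (their programs are written in C++): the dict-of-dicts layout and the helper names `build_graph` and `is_valid_seed_set` are assumptions introduced here for illustration.

```python
def build_graph(weighted_edges):
    """Build G(V, E, W) as a dict of dicts: graph[i][j] = W_ij for each edge E_ij.

    The dict-of-dicts layout is an assumption of this sketch.
    """
    graph = {}
    for i, j, weight in weighted_edges:
        if i == j:
            raise ValueError("self-loops are excluded (i != j)")
        graph.setdefault(i, {})[j] = weight
        graph.setdefault(j, {})  # make sure every endpoint is a node in V
    return graph


def is_valid_seed_set(graph, seeds, kappa):
    """Check that A holds exactly kappa distinct nodes of V and that kappa < xi."""
    seed_set = set(seeds)
    return (
        len(seed_set) == kappa
        and len(seed_set) == len(seeds)
        and kappa < len(graph)
        and seed_set.issubset(graph)
    )
```

With this layout, `len(graph)` plays the role of $\xi$, and a candidate solution of the IMP is any seed list for which `is_valid_seed_set` returns `True`.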

Algorithm 1: Propagation Model

1 A = Initialization(G)
2 While the termination criterion is not met
3     A′ = Influence(A, G)
4     A = A + A′
5 End
6 Output A

As shown in line 1 of Algorithm 1, the propagation models described in [9,10] to measure the influence of $A$
will first assume that a set of nodes from $G$ is activated while the others remain inactive. Then, the nodes
(i.e., users) in $A$ will try to activate (i.e., influence) their inactive neighbor nodes based on the given probabilistic
rule, as shown in line 3 of Algorithm 1. The weights $W$ between these nodes can typically be regarded as a propagation
probability $P$ of nodes in $G$. This probability decides whether an active node will successfully influence its inactive
friend nodes to make them active or not. As shown in line 4 of Algorithm 1, the “new” active nodes $A'$ will be added to
the set of active nodes $A$.
This way of measuring the maximum influence does not use all the users (nodes) of the OSN; rather, each test
uses only a portion of the nodes to measure the maximum influence of $A$. Also, many rounds need to be performed, and
the average over these rounds will be used as a measure of the results, which is defined by

$$I(S) = \frac{1}{R} \sum_{i=1}^{R} \sigma_i(S), \tag{1}$$

where $S$ is the seed set $A$; $I(S)$ the influence of the seed set $S$; $R$ the number of samples; and $\sigma_i(S)$ the influence of
seed set $S$ at the $i$-th round.
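Algorithm 1 and Eq. (1) can be sketched together as a Monte Carlo estimate. The code below is an illustration under stated assumptions rather than the exact model of [9,10]: it assumes an independent-cascade style rule in which each newly activated node gets one chance to activate each inactive neighbor with probability equal to the edge weight, it stops when no new node is activated, and the function names are hypothetical.

```python
import random


def simulate_cascade(graph, seeds, rng):
    """One round of Algorithm 1; returns sigma_i(S), the number of active nodes.

    graph[u][v] is the propagation probability of edge (u, v); this
    independent-cascade rule is an assumption of the sketch.
    """
    active = set(seeds)          # A = Initialization(G)
    frontier = list(active)
    while frontier:              # termination: no newly activated nodes
        newly_active = []        # A' = Influence(A, G)
        for u in frontier:
            for v, prob in graph.get(u, {}).items():
                if v not in active and rng.random() < prob:
                    active.add(v)            # A = A + A'
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def estimate_influence(graph, seeds, rounds, seed=0):
    """Eq. (1): I(S) = (1/R) * sum of sigma_i(S) over R sampled rounds."""
    rng = random.Random(seed)
    total = sum(simulate_cascade(graph, seeds, rng) for _ in range(rounds))
    return total / rounds
```

A larger `rounds` value reduces the variance of the estimate at the cost of more cascade simulations per objective function evaluation.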
As we mentioned previously, developing a high performance method for finding a “good” set of nodes $A$ in
solving the influence maximization problem is a critical research issue. This is why Chen et al. [11] presented a greedy
algorithm for the IMP, called NewGreedy. Because the basic idea of NewGreedy is not to check all the inactive
nodes every round, it is capable of finding a set of nodes $A$ very quickly. Also because the IMP is NP-hard, several
recent studies have attempted to employ metaheuristic algorithms to solve this problem. One of the well-known
metaheuristic algorithms, the genetic algorithm (GA), was used in several studies [12,13]. Guo et al. [12] presented an elite
strategy to enhance the performance of the evolution process of GA, by keeping only the top 50% of chromosomes
(i.e., the selection strategy is based on the fitness values of these chromosomes) to create another 50% of chromosomes
in the population. In other words, it will remove 50% of the chromosomes every generation because their fitness values
are worse than those of the other 50% of the chromosomes. Our observation shows that this strategy will quickly find a
“good” solution, but it will easily fall into a local optimum in early generations when the problem is large and complex.
To enhance the performance of the greedy algorithm and GA for solving this problem, Tsai et al. [13] developed a hybrid
metaheuristic algorithm, called GNA, by integrating the GA with the NewGreedy algorithm. The simulation results show
that GNA is able to find a better result than the traditional methods for solving the IMP, such as NewGreedy and
GA.

2. The Proposed Algorithm

2.1. Notation

To simplify the discussion that follows, the following notation is used throughout the rest of the paper.

$s$                 a solution for the IMP; i.e., $s = \{s_1, s_2, \ldots, s_n\}$, where $s_i$ is the $i$-th subsolution of the solution and
                    $n$ the number of subsolutions.
$s^{nn}$            the number of neighbor (candidate) solutions created by the proposed algorithm for solution $s$.
$\Psi$              the current temperature of the proposed algorithm.
$\Psi_I, \Psi_{min}$   the initial and minimum temperature of the proposed algorithm.
$r$                 the entire search space, i.e., $r = \{r_1, r_2, \ldots, r_h\}$, where $h$ is the number of regions.
$r_j$               a set of nodes representing a region in $r$, i.e.,
                    $r_j = \{V_{(\xi/h)\times(j-1)+1}, V_{(\xi/h)\times(j-1)+2}, \ldots, V_{(\xi/h)\times j}\}$.
$s^m$               stop criterion of the proposed algorithm for the entire search process, e.g., the number of evaluations or
                    the number of iterations.
$s^m_j$             stop criterion of the proposed algorithm for the $j$-th region, which is set to $s^m/(2h)$ in this paper.

2.2. The Simulated Annealing with Search Partition

The proposed algorithm is inspired by search economics [13], which first divides the search space into a set of regions.
The main characteristic of the proposed algorithm is that it not only divides the search space into different regions and
uses simulated annealing (SA) to search each of these regions, but it also summarizes the searched results from the
different regions to further improve the searched solution. That is the main difference between the proposed algorithm
and other divide-and-conquer search algorithms.

Algorithm 2: SASP

1 s = Initialization()
2 r = SearchSpaceDivision(R, h)
3
4 /∗ Local Search ∗/
5 For j = 1 to h
6     s = SimulatedAnnealingFunction(s, r_j, s^m_j)
7 End
8
9 /∗ Global Search ∗/
10 s = SimulatedAnnealingFunction(s, r, s^m/2)
11
12 Output s

As shown in line 1 of Algorithm 2, the proposed algorithm will first set the parameters and input the data using
the Initialization() operator. Then, as shown in line 2, it will divide the entire search space into $h$ regions using the
SearchSpaceDivision() operator. Each region $r_j$ is associated with a set of nodes from $V$ of the online social network. For
instance, suppose there are eight nodes in $V$ (i.e., $V = \{V_1, V_2, V_3, V_4, V_5, V_6, V_7, V_8\}$) that are divided into two
regions; the nodes in the first region will then be $\{V_1, V_2, V_3, V_4\}$. In general, a region can be calculated as
$r_j = \{V_{(\xi/h)\times(j-1)+1}, V_{(\xi/h)\times(j-1)+2}, \ldots, V_{(\xi/h)\times j}\}$.
If $\xi = 8$, $h = 2$, and $j = 1$, then $r_1 = \{V_{4\times(1-1)+1}, V_{4\times(1-1)+2}, \ldots, V_{4\times 1}\} = \{V_1, V_2, V_3, V_4\}$. For $j = 2$,
$r_2 = \{V_5, V_6, V_7, V_8\}$.
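The eight-node example can be reproduced with a short sketch of the SearchSpaceDivision() operator. This is an illustration only: the function name, the use of contiguous blocks of size $\xi/h$ (rounded down), and the choice to put any leftover nodes into the last region when $h$ does not divide $\xi$ evenly are assumptions made here, since the paper does not spell out that case.

```python
def search_space_division(nodes, h):
    """Split the node list into h contiguous regions of (about) len(nodes)/h nodes.

    Leftover nodes, if any, go to the last region (an assumption of this sketch).
    """
    size = len(nodes) // h
    if size == 0:
        raise ValueError("need at least one node per region")
    # Region j (1-based) starts at index size*(j-1) and ends before size*j.
    regions = [nodes[size * (j - 1): size * j] for j in range(1, h)]
    regions.append(nodes[size * (h - 1):])
    return regions


# The example from the text: eight nodes divided into two regions.
regions = search_space_division(list(range(1, 9)), 2)
```

Here `regions[0]` corresponds to $r_1$ and `regions[1]` to $r_2$.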
Lines 5–7 of Algorithm 2 show that the proposed algorithm will then apply the SimulatedAnnealingFunction()
operator (a modified simulated annealing method) to search each region in turn, where the arguments $s$, $r_j$, and $s^m_j$ are,
respectively, the initial solution, the region, and the stop criterion of this operator.
After that, line 10 shows that the proposed algorithm will apply the SimulatedAnnealingFunction() operator to the
entire search space. The search strategy at lines 5 to 7 of Algorithm 2 can be regarded as the local search while that at
line 10 of Algorithm 2 can be regarded as the global search. If the number of evaluations is used as the stop criterion
$s^m$, then the number of evaluations for the local search (i.e., $s^m/(2h) \times h = s^m/2$) and for the global search (i.e., $s^m/2$)
will be the same; that is, fifty percent for the local search and fifty percent for the global search.
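The control flow of Algorithm 2 and its budget split can be sketched as follows. This is a hedged outline, not the authors' code: `sasp` and its parameter names are hypothetical, the SA operator is passed in as a callable `sa_function(solution, region, budget)`, and integer division is assumed for the per-region budget.

```python
def sasp(initial_solution, regions, total_budget, sa_function):
    """Outline of Algorithm 2: per-region (local) search, then a global search.

    sa_function(solution, region, budget) stands in for the
    SimulatedAnnealingFunction() operator and must return a solution.
    """
    h = len(regions)
    per_region_budget = total_budget // (2 * h)   # s^m_j = s^m / (2h)
    solution = initial_solution

    # Local search: visit each region in turn, carrying the solution along.
    for region in regions:
        solution = sa_function(solution, region, per_region_budget)

    # Global search: the remaining half of the budget on the whole space.
    all_nodes = [node for region in regions for node in region]
    solution = sa_function(solution, all_nodes, total_budget // 2)
    return solution
```

For instance, with a budget of 600,000 and four regions, each region would receive 75,000 units and the global search 300,000, so that the local and global phases each consume half of the total.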

Algorithm 3: SimulatedAnnealingFunction(s, R, s^m)

1 s′ = CreateNeighbor(s, s^nn)
2 s = Determination(s, s′)
3 Ψ = DecreaseTemperature(Ψ, Ψ_I, Ψ_min)
4
5 Output s

As shown in Algorithm 3, the SimulatedAnnealingFunction() operator is similar to simple simulated annealing in that
it will create a set of neighbors. However, the search will be restricted by the parameters $R$ and $s^m$; that is, the
SimulatedAnnealingFunction() operator will search not the entire search space, but the restricted search space. For
example, suppose the nodes {1, 2, 3, 4} are in the first region, and the nodes {5, 6, 7, 8} are in the second region. The
exchange nodes of $s$ will be restricted to nodes {1, 2, 3, 4} if $R = \{1, 2, 3, 4\}$. The Determination() operator is similar
to that of simple SA, which decides whether to accept a non-improving solution or not. The probability of accepting
a non-improving solution is defined as

$$P_a = \exp\left(-\frac{f(s') - f(s)}{\Psi}\right), \tag{2}$$

where $f(\cdot)$ denotes the evaluation function, $s$ the current solution, $s'$ the new candidate solution, and $\Psi$ the temperature.
That is, if $s'$ satisfies the probabilistic acceptance criterion of Eq. (2) (i.e., $r < P_a$, where $r$ here is a random number
in the range [0, 1)), the current solution $s$ and its objective value $f(s)$ will be replaced by the new candidate solution
$s'$ and its objective value $f(s')$. The temperature decrease schedule of the DecreaseTemperature() operator is set to
$\Psi = 0.999\Psi$ in this paper.
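The three operators of Algorithm 3 can be sketched in a few lines. Several points are assumptions of this sketch and are not stated in the paper: a neighbor is created by swapping one randomly chosen seed for a node of the restricted region, improving candidates are always accepted, the exponent uses the size of the deterioration $f(s) - f(s')$ (the usual Metropolis convention for a maximization problem), and the temperature is clamped at $\Psi_{min}$. All function names are hypothetical.

```python
import math
import random


def create_neighbor(solution, region, rng):
    """Swap one seed of the solution for a node drawn from the restricted region."""
    candidates = [node for node in region if node not in solution]
    if not candidates:
        return list(solution)
    neighbor = list(solution)
    neighbor[rng.randrange(len(neighbor))] = rng.choice(candidates)
    return neighbor


def accept(f_current, f_candidate, temperature, rng):
    """Metropolis-style acceptance for maximization (sign convention assumed)."""
    if f_candidate >= f_current:
        return True
    p_accept = math.exp(-(f_current - f_candidate) / temperature)
    return rng.random() < p_accept


def decrease_temperature(temperature, t_min=0.0001, rate=0.999):
    """Geometric cooling, Psi = 0.999 * Psi, clamped at the minimum temperature."""
    return max(t_min, rate * temperature)
```

With this schedule the temperature shrinks by 0.1% per step, so a high initial temperature makes early deteriorations likely to be accepted while late ones are almost always rejected.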

3. Simulation Results

Table 1. Datasets.

Dataset  # of Nodes  # of Edges  # of Triangles  # of Seeds  PP    Reference
DS1      75,879      508,837     1,624,481       200         0.05  https://snap.stanford.edu/data/soc-Epinions1.html
DS2      82,168      948,464     602,592         300         0.04  https://snap.stanford.edu/data/soc-Slashdot0902.html
DS3      262,111     1,234,877   717,719         1,000       0.10  https://snap.stanford.edu/data/amazon0302.html
DS4      403,394     3,387,388   3,986,507       1,000       0.04  https://snap.stanford.edu/data/amazon0601.html

In this section, several datasets of influence maximization problems are used to evaluate the performance of the
proposed algorithm. The empirical analysis is conducted on a PC with an Intel i5-4590 CPU at 3.30 GHz and 4 GB of
memory running Windows 7, and the programs are written in C++. As shown in Table 1, four well-known datasets, which
contain from 75,879 up to 403,394 nodes, are used to evaluate the performance of the proposed algorithm. The
column labeled ‘# of Seeds’ represents the number of nodes we want to find from the testing dataset that have the
maximum influence on the other nodes in the same network. The column labeled ‘PP’ represents the propagation
probability of the nodes while the column labeled ‘Reference’ represents the source of these datasets.

Table 2. Parameter settings.

Algorithm  Parameter                                     Value
GA         Population size (i.e., number of solutions)   8
           Crossover rate                                0.1
           Mutation rate                                 1.0
SA         Solution                                      1
           Number of neighbors                           100
           Maximum temperature                           10
           Minimum temperature                           0.0001
           Cooling schedule                              T = 0.999T
SASP       Solution                                      1
           Number of neighbors                           100
           Maximum temperature                           10
           Minimum temperature                           0.0001
           Cooling schedule                              T = 0.999T
           Number of regions                             4

Table 3. Simulation results.

Dataset  Sample  SA          GA          SASP
DS1      S1      5,433.33    5,362.27    5,433.83
         S2      5,403.47    5,339.50    5,401.27
         S3      5,473.17    5,408.87    5,471.00
         Avg     5,436.66    5,370.21    5,435.38
DS2      S1      9,807.90    9,661.10    9,795.70
         S2      9,952.50    9,748.00    9,910.30
         S3      9,914.83    9,797.07    9,952.70
         Avg     9,891.74    9,735.39    9,886.23
DS3      S1      7,397.53    7,193.33    10,302.57
         S2      7,301.40    7,104.53    10,171.87
         S3      7,481.30    7,285.40    10,365.47
         Avg     7,393.41    7,194.42    10,279.97
DS4      S1      18,065.63   17,648.97   24,716.50
         S2      18,402.30   17,979.43   25,375.47
         S3      18,521.37   18,106.10   25,383.07
         Avg     18,329.77   17,911.50   25,158.34

Each experiment is carried out for 30 runs, and the number of iterations of each run is set equal to 600,000; all
the experimental results shown are the average of the 30 runs. Table 2 gives the parameter settings of SASP, genetic
algorithm (GA), and simulated annealing (SA). According to our observation, the general settings of GA (i.e., with
the crossover rate set equal to 0.6 and the mutation rate set equal to 0.01) will give worse results for solving these four
influence maximization problems. That is why the crossover and mutation rates of GA are set equal to 0.1 and 1.0,
respectively, in this study, which means that only 10% of the chromosomes will be selected for crossover while all the
chromosomes will be mutated.
In Table 3, S1, S2, and S3 denote samples randomly selected from the datasets. The results show that SA provides
a better result than GA and SASP on average for the small datasets, i.e., DS1 and DS2, although the differences between
these results are not significant. For example, the difference between SASP and SA is about 0.056% [−0.00056 =
(9,886.23 − 9,891.74)/9,891.74] for DS2. However, for the large-scale datasets, the proposed algorithm provides a
better result than the other two metaheuristic algorithms, and the difference between them is significant. For example,
for DS4, the difference between SASP and SA is about 37.25% [0.3725 ≈ (25,158.34 − 18,329.77)/18,329.77] in
terms of the average results. The results further illustrate that the proposed algorithm is useful for solving large-scale
datasets of IMP.
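The relative differences quoted above follow directly from the average rows of Table 3. The short check below simply redoes that arithmetic; the helper name `relative_difference` is introduced here for illustration.

```python
def relative_difference(sasp_avg, sa_avg):
    """Relative difference of SASP over SA: (SASP - SA) / SA."""
    return (sasp_avg - sa_avg) / sa_avg


# Average results taken from Table 3.
ds2_diff = relative_difference(9886.23, 9891.74)    # small dataset DS2
ds4_diff = relative_difference(25158.34, 18329.77)  # large dataset DS4

print(f"DS2: {ds2_diff:+.5f}  DS4: {ds4_diff:+.5f}")
```

A negative value means SA is slightly ahead of SASP, as on DS2, while a positive value means SASP is ahead, as on DS4.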

4. Conclusion

In this paper, we present an efficient SA-based algorithm for the influence maximization problem that restricts
the search space of each search during the convergence process. By using this strategy, SASP is able to keep
the search diversity from narrowing down to a particular direction during the convergence process. The design of
the proposed algorithm also takes the balance of diversification and intensification of the search into account; that is,
fifty percent for the local search and fifty percent for the global search. The simulation results show that the proposed
algorithm can find a better result than the other metaheuristic algorithms evaluated in this paper, especially for
large-scale datasets. These results further show that the scalability of SASP is better than that of the others because it
can continuously improve the searched results before it finds the optimum solution. Since this study shows
that search space division may be useful to enhance the performance of other metaheuristic algorithms,
we will try to find other ways to divide the search space to develop a more effective
and efficient SA-based algorithm.

Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions on the paper.
This work was supported in part by the Ministry of Science and Technology of Taiwan, R.O.C., under Contracts MOST
106-3114-E-005-001, MOST 105-2218-E-001-001, MOST 105-2221-E-005-091, and MOST 105-2221-E-197-017.

References

1. J. Heidemann, M. Klier, F. Probst, Online social networks: A survey of a global phenomenon, Computer Networks 56 (18) (2012) 3866–3878.
2. T. Qiu, A. Zhao, F. Xia, W. Si, D. O. Wu, Rose: Robustness strategy for scale-free wireless sensor networks, IEEE Transactions on Networking
(2017) 1–16, doi:10.1109/TNET.2017.2713530.
3. S. Cuomo, P. D. Michele, A. Galletti, F. Piccialli, A cultural heritage case study of visitor experiences shared on a social network, in:
Proceedings of the International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 2015, pp. 539–544.
4. F. Piccialli, J. E. Jung, Understanding customer experience diffusion on social networking services by big data analytics, Mobile Networks
and Applications, doi:10.1007/s11036-016-0803-8.
5. L. Jin, Y. Chen, T. Wang, P. Hui, A. V. Vasilakos, Understanding user behavior in online social networks: a survey, IEEE Communications
Magazine 51 (9) (2013) 144–150.
6. J. Leskovec, D. Huttenlocher, J. Kleinberg, Predicting positive and negative links in online social networks, in: Proceedings of the 19th
International Conference on World Wide Web, 2010, pp. 641–650.
7. F. Benevenuto, T. Rodrigues, M. Cha, V. Almeida, Characterizing user behavior in online social networks, in: Proceedings of the ACM
SIGCOMM Conference on Internet Measurement, 2009, pp. 49–62.
8. F. Benevenuto, T. Rodrigues, M. Cha, V. Almeida, Characterizing user navigation and interactions in online social networks, Information
Sciences 195 (2012) 1–24.
9. D. Kempe, J. Kleinberg, E. Tardos, Maximizing the spread of influence through a social network, in: Proceedings of the ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, 2003, pp. 137–146.
10. D. Kempe, J. Kleinberg, E. Tardos, Maximizing the spread of influence through a social network, Theory of Computing 11 (4) (2015)
105–147.
11. W. Chen, Y. Wang, S. Yang, Efficient influence maximization in social networks, in: Proceedings of the ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 2009, pp. 199–208.
12. J. Guo, Y. Liu, J. Shen, Z. Wei, J. Lv, Influence maximization algorithm based on genetic algorithm, Computational Information System
10 (21) (2014) 9255–9262.
13. C. W. Tsai, Y. C. Yang, M. C. Chiang, A genetic newgreedy algorithm for influence maximization in social network, in: Proceedings of the
IEEE International Conference on Systems, Man, and Cybernetics, 2015, pp. 2549–2554.
