Properties of the Most Influential Social Sensors

Balázs Kósa∗, Balázs Pinczel†, Gábor Rácz‡ and Attila Kiss§
Eötvös Loránd University, Faculty of Informatics, Budapest, Hungary
Emails: {∗balhal, †vic, ‡gabee33, §kiss}@inf.elte.hu

Abstract: In the influence maximization problem one is to find a subset of vertices with the highest influence among the node sets of the same cardinality, where a model representing the spread of influence is also given. In [6] two of the most commonly used models were introduced and it was shown that in both cases the problem is NP-hard. On the other hand, it was also proven that the greedy algorithm always guarantees a solution whose influence is at least 1 − 1/e times that of the optimal solution. Focusing on the Independent Cascade Model we enhance the greedy algorithm so that it remembers not only the locally best solution but b other sets as well, whose influence was the second, third etc. best in the previous step. Surprisingly, despite the extended search space this method performs indistinguishably from the optimized greedy algorithm of [1] even for relatively large values of b. This shows that there are several different node sets whose influence is practically the same as that of the node set returned by the greedy algorithm. Inspired by this result we characterize the most influential sets from two different perspectives. Firstly, we try to determine the distribution of three different centrality measures on the members. It turns out that for the eigenvector centrality [12] this distribution can be closely approximated by the normal distribution. Secondly, we examine how the age of a node, i.e., the time passed since it became a member of the network, correlates with its chance of being chosen by the greedy algorithm. Surprisingly, we found that for graphs with 100,000 nodes generated by the forest fire model [8], [7], most of the time even the "youngest" node belongs to the first 50,000 users who have joined the network. It may be even more striking that the fourth youngest elements are on average among the first 10 percent of these nodes.

I. INTRODUCTION

Over the last few years online social networks (OSNs) have become staggeringly popular, and the number of users of such sites is still growing dynamically. Browsing the quarterly Facebook reports¹, for example, it turns out that from 1.11 billion monthly active users in the first quarter the site grew to 1.15 billion in the second quarter of this year. This relatively new phenomenon has posed serious challenges to several disciplines including informatics and computer science. Since the structure of the deduced OSN graphs obeys the rules of small worlds [10], their exploration belongs to a wider research area concentrating on networks in general.

¹ http://investor.fb.com/releasedetail.cfm?ReleaseID=780093

One of the many interesting questions of this field is a special version of the set cover problem [2]. Namely, one is to find a fixed number of vertices, the so-called social sensors, through which the largest possible part of the whole system can be reached, i.e. either information can be collected or influence can be spread. For instance, in [9] one of the challenges was to find the best sensor placements for a real water distribution network. In the same paper a similar question was asked for the hyperlinked world of the blogosphere: which blogs are to be read in order to catch the largest amount of information flowing around the system? The third example is taken from the world of marketing: who should be targeted, and conceivably paid a reasonable amount of money, in a marketing campaign so as to influence as many participants of the network as possible? In this paper we concentrate on this latter problem, which is mainly referred to as influence maximization.

In [9] it was shown how all these issues can be formulated in a general framework; nevertheless, in our case the particular details of the model of the diffusion are also of importance. This model was introduced in [6], where the whole question was considered as a discrete optimization problem. Two basic representations were introduced, the Independent Cascade Model and the Linear Threshold Model, whose possible generalizations were also discussed. It was proven that the problem is NP-hard in both cases; nevertheless, based on the submodularity of the scoring functions it was also shown that the simple greedy algorithm is guaranteed to approximate the optimal solution within a factor of 1 − 1/e.

Since then much of the effort has been concentrated on optimizing the greedy algorithm. In particular, a serious drawback of this approach is that the influence of the candidate sets has to be evaluated in each turn, which, owing to the non-deterministic nature of the process, is accomplished by using Monte Carlo simulations. Since altogether the number of these simulations is remarkably high, for large graphs the whole evaluation can be very time consuming. In [1] an improvement of the original method was proposed in which at the beginning of an iteration in the Monte Carlo simulation each edge of the original graph is deleted with a certain probability and then the influence of a node set S is measured as the number of nodes reachable from S.
This idea has already been used in [6], when the submodularity of the scoring functions was proven. In [9] an orthogonal approach of optimization is presented which, using the aforementioned submodularity property, in each step of the greedy algorithm reduces the number of nodes that are considered as candidates for extending the already selected node set. In [5] this idea is further refined. In [14] the network is clustered and the influence of a node set extended with a candidate is only measured on the cluster to which the candidate in question belongs. Very recently, the influence maximization problem was examined for dynamic networks as well in [4].

In our paper we focus on the Independent Cascade Model. Relating to the results of [6] and [1], as a first step we try to improve the efficiency of the greedy algorithm by remembering not only the locally best solution but b other sets as well, whose influence was the second, third etc. best in the previous step. These are also extended with the individual vertices and at the end of the turn the b best of the new sets are kept for the next iteration. The idea is taken from the field of relational databases, where it was employed to find the best order of subsequent join operations [3]. Surprisingly, this method performs indistinguishably from the optimized greedy algorithm of [1] even for relatively large values of b. What is more, the difference between the goodness of the first and bth sets is also negligible in most of the steps. This suggests that to a certain extent the greedy algorithm is insensitive to which vertex is chosen to be added to the already selected set from the candidates with a relatively similar influence. On the other hand, it seems that in many cases the greedy algorithm approximates the optimal solution closely, probably better than the aforementioned 1 − 1/e factor. Thus, in the following experiments we restrict ourselves to the results returned by the optimized greedy algorithm of [1].

As a next question, using the forest fire model to generate random graphs we try to determine the distribution of three different centrality measures on the members of the winner sets. These measures are the degree, eigenvector centrality and betweenness [12]. In [6] it was reported that when the elements were chosen according to the decreasing order of their degree, then the influence of the resulting node set was considerably lower than that of the set given by the greedy algorithm. The same phenomenon was experienced for the distance centrality measure. Thus, the question of the ratio and distribution of lower centrality values has an importance of its own. Moreover, if the distribution of a centrality measure could be assessed from a smaller sample, then instead of running the expensive greedy algorithm this information could be utilized to generate candidate sets, since according to our previous observation, when trying to find the best solution there is a larger set of nodes to choose from without significantly changing the influence of the result. The results of our initial experiments are promising. The distribution of eigenvector centralities fits a normal distribution nicely. This correspondence is weaker for the degree distribution, but the hypothesis is still acceptable. However, the distribution of the betweenness values should be approximated with a different distribution, which we have not found yet.

Secondly, we examine how the age of a node, i.e., the time passed since it became a member of the network, correlates with its chance of being chosen by the greedy algorithm. Since our previous experiment concerning the distributions showed that the greedy algorithm has a tendency to select from the nodes with lower centrality measures as well, one would suspect that much younger vertices also have a considerable chance to become members of the most influential node sets. Surprisingly, we found that for graphs with 100,000 nodes, again generated by the forest fire model, most of the time even the "youngest" node out of 30 belongs to the first 50,000 users who have joined the network. It may be even more striking that the fourth youngest elements are on average among the first 10 percent of these nodes. Thus, at first sight surprisingly, one has to be quite an early bird to be able to catch the worm. Of course, this observation can also be built into a probabilistic method aimed to perform with a similar efficiency as the greedy algorithm.

The paper is organized as follows. In Section II, the model of diffusion is presented and the influence maximization problem is formalized. Then the enhanced greedy algorithm is introduced and its performance is compared with the performance of three other algorithms. Afterwards, in Section III, the distribution of the aforementioned three centrality measures on the members of the most influential node sets is examined. In Section IV, the relationship between the age of a node and its chance of being selected is discussed. Finally, in Section V, we conclude.

II. SELINGER OPTIMIZATION OF THE GREEDY ALGORITHM

A network is modeled as an undirected graph G = (V, E). For a social network the nodes of its graph representation stand for individuals, while the edges usually model some sort of relationship between the individuals. In the influence maximization problem an influence function σ : 2^{V.G} → R+ ∪ {0} is defined on the subsets S of the nodes of G. The aim is to find the subset S of V.G with cardinality k for which σ(S) is maximal among the influences of the node subsets with cardinality k, where k is a fixed constant belonging to the input of the problem.

In [6] two basic diffusion models were introduced by means of which the influence function can be calculated. In our paper we concentrate on the Independent Cascade Model. Here, an arbitrary set of nodes S is chosen first from which the influence will start to spread. The elements of this set are considered to be active. In the (i + 1)th round each node that has become active in the ith round may cause its non-active neighbours to become active too. More precisely, for such a node its edges are taken one after the other and with a fixed probability p the other endpoint is activated, where p is part of the input of the model. The process stops when either all nodes have become active, or no new node has been activated in a round. The influence of S is defined to be the number of active nodes after termination.

In [6] it was proven that the influence maximization problem is NP-hard in this case. On the other hand, it was found that the influence function is monotone and submodular under this setting. More precisely, firstly, for each S ⊆ V.G and node v,

σ(S ∪ {v}) ≥ σ(S).

Secondly, for each S, H ⊆ V.G with S ⊆ H and each node v,

σ(S ∪ {v}) − σ(S) ≥ σ(H ∪ {v}) − σ(H).

In other words, the marginal gain of adding the same node to a growing set decreases as the set becomes larger. Using the result of [11] it can be guaranteed that the greedy algorithm never performs worse than the optimal solution by more than a constant factor, i.e.,

σ(S∗) ≥ (1 − 1/e) σ(Sopt).
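The Independent Cascade process and the Monte Carlo approximation of σ(S) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation used in the experiments: the graph is assumed to be given as an adjacency dictionary, and the names simulate_cascade and estimate_influence are hypothetical.

```python
import random


def simulate_cascade(adj, seeds, p, rng):
    """Run one Independent Cascade simulation; return the set of active nodes.

    adj   -- dict mapping each node to an iterable of its neighbours (assumed format)
    seeds -- initially active nodes (the set S)
    p     -- activation probability of the model
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:                      # stops when a round activates nobody
        newly_active = []
        for u in frontier:
            for w in adj[u]:
                # u gets a single chance to activate each inactive neighbour
                if w not in active and rng.random() < p:
                    active.add(w)
                    newly_active.append(w)
        frontier = newly_active
    return active


def estimate_influence(adj, seeds, p, rounds, seed=0):
    """Approximate sigma(S) as the mean number of active nodes over `rounds` runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(rounds):
        total += len(simulate_cascade(adj, seeds, p, rng))
    return total / rounds
```

For example, estimate_influence(adj, [0], 0.05, 10000) would approximate the influence of the single seed node 0 with the number of rounds used in the experiments of this paper.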

In the guarantee σ(S∗) ≥ (1 − 1/e) σ(Sopt), S∗ denotes the result of the greedy algorithm, while Sopt denotes the optimal solution. Owing to the non-deterministic nature of the diffusion model, in practice the values of σ are approximated by means of Monte Carlo simulations. Since the number of iterations used in experiments ranges between 10,000 and 20,000, this can seriously slow down the algorithm. In [1] an improvement of the original method was proposed in which at the beginning of an iteration in the Monte Carlo simulation each edge of the original graph is deleted with probability 1 − p. Then the influence of a node set S is measured as the number of nodes reachable from S. Note that in this case the calculation of the marginal gain also reduces to the problem of reachability, i.e., within one such iteration, for node v and S ⊆ V.G,

σ(S ∪ {v}) − σ(S) = 0 if v is in R(S), and |R({v})| otherwise,

where for a node set H, R(H), or RG(H) if the underlying graph G is not clear from the context, denotes the set of nodes reachable from H, while |H| is the size of H. Reachability is considered to be a reflexive relation, i.e. each node is reachable from itself.

Based upon an idea emerging in the field of relational databases, we extend the previous algorithm, coined NewGreedy in [1], by not only keeping track of the locally best solution in terms of influence but also remembering the second, third etc. best candidate node sets. We also try to extend these sets with new nodes and at the end of an iteration again the best solutions are kept. This idea is named after Selinger in database theory [3], [13], where it is used when the order of subsequent join operations is to be optimized; thus we call our new algorithm SelingerGreedy. The details are to be found in Fig. 1. Here, S1, ..., Sb denote the first b node sets with the highest influence and R gives the number of the Monte Carlo simulations. The marginal gain of node v for Sj is approximated by s^j_v / R (1 ≤ j ≤ b). Note that in the algorithm only the values s^j_v are calculated, since for choosing the node with the best marginal gain it is not necessary to divide by R. After the calculation of these variables the auxiliary function BestCandidates is called, which, taking the nodes of G, the sets S1, ..., Sb with cardinality i and the aforementioned s^j_v values, returns b node sets with cardinality i + 1 that are constructed from S1, ..., Sb by adding a node from V.G, have the highest influence among the node sets which can be constructed in this way, and are mutually different from each other (1 ≤ i ≤ k). Furthermore, auxS is an auxiliary array of node sets with b elements.

SelingerGreedy(G, k, b, R)
  S1 := ∅, ..., Sb := ∅
  for i = 1 to k do
    s^1_v := 0, ..., s^b_v := 0 for all v ∈ V.G
    for j = 1 to R do
      construct G' by deleting each edge of G with probability 1 − p
      compute RG'(S1), ..., RG'(Sb)
      compute |RG'(v)| for each node v in V.G
      for all v ∈ V.G do
        for q = 1 to b do
          if v ∉ RG'(Sq) then
            s^q_v := s^q_v + |RG'(v)|
          end if
        end for
      end for
    end for
    auxS := BestCandidates(V.G, S1, ..., Sb, {s^j_v | v ∈ V.G, 1 ≤ j ≤ b})
    S1 := auxS[1], ..., Sb := auxS[b]
  end for
  return S1, ..., Sb

Figure 1. The SelingerGreedy algorithm.

In the comparison of the effectiveness of our algorithm with that of others two data sets were used. The first, which is called NetPHY, is extracted from the arXiv² academic collaboration network by Wei Chen et al. [1]. It is constructed by using the full paper list of the Physics section from 1991 to 2003.³ Each node represents an author and an edge is added between two authors whenever they jointly wrote a paper. The numbers of nodes and edges are 37,154 and 231,584, respectively. The second dataset, which is referred to as EmailEnr, is derived from the Enron email network, which consists of around half a million emails. Nodes represent email addresses here and if an address i sent at least one email to address j, then an undirected edge between i and j is contained in the graph. It consists of 36,692 nodes and 183,831 edges. The 90-percentile effective diameter is 4.8, while the average clustering coefficient is 0.4970.⁴

We compared SelingerGreedy with the original NewGreedy algorithm as well as with the methods MixedGreedy and DegreeDiscount. The latter two algorithms were also developed in [1]. In MixedGreedy the CELF optimization technique [9] is built into the NewGreedy algorithm.

Figure 2. Influence spread of different algorithms: (a) NetPHY, p = 0.02; (b) EmailEnr, p = 0.05. Each panel plots the influence against the size of the seed set (10 to 50) for NewGreedyIC, SelingerGreedyIC, MixedGreedyIC and DegreeDiscount.

² http://arXiv.org
³ The dataset is available for download at http://research.microsoft.com/en-us/people/weic/graphdata.zip.
⁴ The figures are taken from this site: http://snap.stanford.edu/data/email-Enron.html. The dataset can also be downloaded from here.
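The SelingerGreedy procedure of Fig. 1 can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the original text: the graph is given as a node list and an undirected edge list, the components of each sampled graph are found with a union-find structure, and, so that candidates built from different sets can be compared, the sketch also accumulates the influence of each kept set (the variable base). The function names sample_components and selinger_greedy are hypothetical, and the ranking step only approximates the role of BestCandidates.

```python
import random


def sample_components(nodes, edges, p, rng):
    """Keep each edge with probability p (i.e. delete it with probability 1 - p)
    and return (root, size): component label of each node and component sizes."""
    parent = {v: v for v in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, w in edges:
        if rng.random() < p:
            parent[find(u)] = find(w)
    root = {v: find(v) for v in nodes}
    size = {}
    for r in root.values():
        size[r] = size.get(r, 0) + 1
    return root, size


def selinger_greedy(nodes, edges, k, b, rounds, p, seed=0):
    """Return up to b distinct seed sets of size k with their estimated influence."""
    rng = random.Random(seed)
    beam = [frozenset()]                    # the b empty start sets collapse into one
    scores = {frozenset(): 0}
    for _ in range(k):
        base = [0] * len(beam)              # accumulated |R(S_q)| (assumption, see lead-in)
        gain = [dict.fromkeys(nodes, 0) for _ in beam]   # the s^q_v counters
        for _ in range(rounds):
            root, size = sample_components(nodes, edges, p, rng)
            for q, s_q in enumerate(beam):
                reached = {root[u] for u in s_q}
                base[q] += sum(size[r] for r in reached)
                for v in nodes:
                    if root[v] not in reached:          # v is not in R(S_q)
                        gain[q][v] += size[root[v]]     # add |R({v})|
        # Stand-in for BestCandidates: rank all one-node extensions, keep b distinct sets.
        scores = {}
        for q, s_q in enumerate(beam):
            for v in nodes:
                if v not in s_q:
                    scores[s_q | {v}] = base[q] + gain[q][v]
        beam = sorted(scores, key=scores.get, reverse=True)[:b]
    return [(set(s), scores[s] / rounds) for s in beam]
```

With b = 1 the sketch reduces to a NewGreedy-style greedy search; larger values of b keep the second, third etc. best candidate sets as described above. Because all sets are evaluated on the same sampled graphs, a candidate set reached from two different parents receives the same score, so duplicates can simply be merged.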

The DegreeDiscount algorithm uses a different approach. Basically, in each step it chooses the node with the highest discounted degree value, i.e. the degree of a node is decreased after each iteration by a value corresponding to the number of its active neighbours.

In the first case the NetPHY dataset was used and the probability of activating a neighbour p was set to 0.02. In the second case the algorithms were evaluated over the EmailEnr dataset and p was 0.05. In both cases the number of rounds (R) was taken to be 10,000. The results can be found in Fig. 2. As was expected, NewGreedy and SelingerGreedy outperformed the rest of the methods. In the first case the number of nodes influenced by the initial seed given by the DegreeDiscount algorithm is around 83 percent of the number of nodes activated by the seed of NewGreedy, whereas in the second case this ratio is around 98 percent. On the other hand, the difference between the performances of NewGreedy and SelingerGreedy is negligible. In Fig. 3, (a) the influence of the seeds given by NewGreedy for cardinalities 10, 20, 30, 40 and 50 can be seen, while in (b) the same values are enumerated for SelingerGreedy for the first, second, fifth, tenth and fifteenth best candidate sets. Here, R = 10,000, b = 15, k = 50 and p = 0.05. From the figures it is not only evident that the best node set returned by SelingerGreedy is as influential as the result of NewGreedy, but it also turns out that the first and fifteenth best candidates of SelingerGreedy perform almost indistinguishably. What is more, we obtained very similar results for other values of p ranging from 0.01 to 0.1.

Figure 3. Influence spread of different algorithms (tables of influence values for seed set sizes k = 10, 20, 30, 40 and 50): (a) NewGreedy, EmailEnr, p = 0.05; (b) SelingerGreedy, EmailEnr, b = 15, p = 0.05, with one row for each of the first, second, fifth, tenth and fifteenth best candidate sets. In both tables the influence rises from roughly 9,440 to 9,455 at k = 10 to roughly 9,595 to 9,605 at k = 50.

Claim 1: It seems that in most of the steps of the greedy algorithm there is a larger number of candidates with very similar advantageous properties, and the method is quite insensitive to which of these candidates is chosen eventually.

From this observation one might suspect that at least in a considerable number of cases the greedy algorithm approaches the optimal solution more closely than a 1 − 1/e factor.

III. DISTRIBUTION OF CENTRALITY MEASURES

The preceding statement suggests that if one were to find a node set of cardinality k with high influence and knew the distribution of a centrality measure among the nodes of such sets, then after calculating the centrality measure in question for all nodes, he or she could pick nodes based on the aforementioned distribution, and after a few attempts the desired influential seed would be found with high probability. Thus, inspired by this idea we tried to approximate the distribution of three different centrality measures, namely degree centrality, eigenvector centrality and betweenness. However, as has been mentioned in the introduction, these distributions have an importance of their own.

For the experiments the forest fire model [8], [7] was used, which is reported to mimic well several important static and dynamic properties of real world networks including the degree distribution, the distribution of the sizes of weakly and strongly connected components, the phenomenon of shrinking diameters etc. Note that since undirected graphs are to be generated, it is not necessary to differentiate between the forward and backward burning probabilities of the forest fire model. For this single burning probability we decided to use 0.37, since it is in the intersection of the parameter intervals found to perform the best overall in [8] and [7]. 100 graphs were generated with 100,000 nodes for the degree and eigenvector centralities, while, since the calculation is more time consuming for betweenness values, the graphs consist of only 10,000 nodes in this case. The probability parameter of the Independent Cascade Model was set to 0.01 and 0.1.

Figure 4. Distributions of various centrality measures. Panels (a) to (f) show the relative frequency of the degree, eigenvector and betweenness centrality values for all nodes and for the nodes in the seed set, for p = 0.01 and p = 0.1.
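The generation of undirected forest fire graphs with a single burning probability can be sketched as follows. This is a simplified illustration, not the generator used for the experiments: it assumes the common formulation in which a new node chooses a uniformly random ambassador and then recursively "burns" a geometrically distributed number of neighbours (with mean burn_prob / (1 − burn_prob)), linking to every burned node. The function name forest_fire_graph is hypothetical.

```python
import random
from collections import deque


def forest_fire_graph(n, burn_prob, seed=0):
    """Grow an undirected graph with n nodes; node ids equal the arrival order,
    so a smaller id means an older node. Returns a dict node -> set of neighbours."""
    if not 0 <= burn_prob < 1:
        raise ValueError("burn_prob must lie in [0, 1)")
    rng = random.Random(seed)
    adj = {0: set()}
    for v in range(1, n):
        ambassador = rng.randrange(v)        # uniformly chosen existing node
        burned = {ambassador}
        queue = deque([ambassador])
        while queue:
            w = queue.popleft()
            # number of neighbours to burn from w: geometric, P(x = j) = q^j (1 - q)
            x = 0
            while rng.random() < burn_prob:
                x += 1
            fresh = sorted(u for u in adj[w] if u not in burned)
            rng.shuffle(fresh)
            for u in fresh[:x]:
                burned.add(u)
                queue.append(u)
        adj[v] = set()
        for w in burned:                     # the new node links to every burned node
            adj[v].add(w)
            adj[w].add(v)
    return adj
```

A call such as forest_fire_graph(100000, 0.37) would correspond to the graph size and burning probability mentioned above; since node ids follow the arrival order, the same structure also records the age of each node.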

The cardinality of the node sets to be found was chosen to be 30. For selecting the node set with the highest influence the NewGreedy algorithm was used. The results can be seen in Fig. 4. The minimum and maximum values, the means and the standard deviations for all nodes and for the influential sets are enumerated in the tables of Fig. 5. As expected, the values of the different centrality measures are significantly higher for the elements of the influential node sets returned by the greedy algorithm.

Figure 5. Tables of the minimum, maximum, mean and standard deviation of the degree, eigenvector and betweenness centrality values: (a) all nodes; (b) influential sets, p = 0.01; (c) influential sets, p = 0.1.

Considering the shape of the diagrams of the eigenvector and degree centrality distributions of the most influential node sets, we tried to fit a normal distribution with the appropriate expected value and standard deviation. To test our hypothesis the Kolmogorov-Smirnov test was applied. The D- and significance values can be found in Fig. 6. As the figures show, for eigenvector centrality the null hypothesis is accepted at all significance levels, while for degree centrality it is rejected at significance level 0.1 and above. Unfortunately, we have not found an appropriate distribution to fit to the experimental distribution of the betweenness values. To sum up, the eigenvector centrality is the most promising candidate on which the development of a probabilistic method for finding the most influential node set can be based.

Figure 6. The D- and significance values of the Kolmogorov-Smirnov test, with critical values D(0.1), D(0.05) and D(0.01): (a) eigenvector centrality; (b) degree centrality.

IV. THE EARLY BIRD CATCHES THE WORM

Finally, we examine how the age of a node, i.e. the time passed since it became a member of the network, influences its chance of being chosen by the greedy algorithm. Intuitively, the older the node the higher the probability of being selected; however, since, as it has turned out in the previous section, nodes with smaller centrality values can also become members of a node set returned by the greedy algorithm, one would expect that the correlation is positive, but not so strong. In other words, the importance of the age factor would be relatively weak. Surprisingly, the results of our experiments contradict this conjecture and suggest that the correlation is stronger than expected.

For the simulation of the evolution of the social graphs again the forest fire model was used, which was found to be the best solution for this purpose in [7]. The burning probability was again chosen to be 0.37 for similar reasons as in the preceding section. 1000 graphs were generated with 100,000 nodes. As previously, the cardinality of the node set to be found was set to 30, while the number of iterations in the Monte Carlo simulations R was 10,000. Here, the experiments were carried out for two different values of the probability parameter p of the Independent Cascade Model, namely 0.01 and 0.05. The results can be seen in Fig. 7. The first position on the abscissa represents the oldest nodes, the second position the second oldest etc. The ordinate enumerates the steps, in each of which a new node joins the network. The diagram shows that in both cases on average even the youngest elements were among the first 50,000 nodes that joined the graph. It may be even more striking that the fourth youngest elements are on average among the first 10 percent of these nodes. Of course, these experiments should be repeated on real world networks; nevertheless one may still draw the conclusion that the age of a node probably strongly predicts whether it has any chance of being selected by the greedy algorithm. Needless to say, when a probabilistic algorithm for finding the most influential node set is to be developed, this observation should also be utilized.

Figure 7. Mean and standard deviation of the age of the members of the node sets returned by the greedy algorithm: (a) p = 0.01; (b) p = 0.05. Each panel plots the generation time (0 to 80,000) against the elements of the seed set (1 to 30).

V. CONCLUSION AND FUTURE WORK

In our paper we examined some properties of the most influential social sensors selected by the greedy algorithm. First, we recognized that in most of the steps there is a larger set of candidates with similar advantageous characteristics and the algorithm is rather insensitive to which of these candidates is selected. Next, the distributions of three centrality measures were assessed and we found that the eigenvector centrality behaves in the most predictable way in this respect, namely its experimental distribution closely fits the normal distribution.

Lastly, the relation between the age of a node and its chance of being chosen by the greedy algorithm was examined. Surprisingly, it turned out that with high probability all members of a selected set of cardinality 25 belong to the first 10 percent of the nodes that first joined the network. In the last two cases graphs generated by the forest fire model were used. In the near future we plan to repeat these experiments on real world networks too.

Relating to the first case, a method should be developed by means of which the distribution of the appropriate centrality measure can be obtained effectively on a smaller sample, and then this distribution can be used to select node sets whose expected influence is high. The age of a node can also be built into the probabilistic algorithm. If the resulting method proves to be successful, then the same framework can be utilized for other diffusion models as well, including the Linear Threshold Model or models where a node may leave the network.

ACKNOWLEDGEMENT

This work was partially supported by the European Union and the European Social Fund through project FuturICT.hu (grant no.: TAMOP-4.2.2.C-11/1/KONV-2012-0013).

REFERENCES

[1] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '09, pages 199–208. ACM, 2009.
[2] Toshihiro Fujito. On approximation of the submodular set cover problem. Oper. Res. Lett., 25(4):169–174, 1999.
[3] Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. Database Systems: The Complete Book. Prentice Hall Press, 2 edition, 2008.
[4] Manuel Gomez-Rodriguez and Bernhard Schölkopf. Influence maximization in continuous time diffusion networks. CoRR, abs/1205.1682, 2012.
[5] Amit Goyal, Wei Lu, and Laks V.S. Lakshmanan. Celf++: optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the 20th international conference companion on World wide web, WWW '11, pages 47–48. ACM, 2011.
[6] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '03, pages 137–146. ACM, 2003.
[7] Jure Leskovec and Christos Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '06, pages 631–636. ACM, 2006.
[8] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data, 1(1), 2007.
[9] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '07, pages 420–429. ACM, 2007.
[10] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, IMC '07, pages 29–42. ACM, 2007.
[11] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265–294, 1978.
[12] Mark Newman. Networks: An Introduction. Oxford University Press, Inc., 2010.
[13] P. Griffiths Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD international conference on Management of data, SIGMOD '79, pages 23–34. ACM, 1979.
[14] Yu Wang, Gao Cong, Guojie Song, and Kunqing Xie. Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '10, pages 1039–1048. ACM, 2010.