Rumor Restriction in Online Social Networks
Songsong Li, Yuqing Zhu, Deying Li∗¶, Donghyun Kim, and Hejiao Huang§
School of Information, Renmin University of China, Beijing 100872, China
Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75080, USA
Department of Mathematics and Physics, North Carolina Central University, Durham, NC 27707, USA
§ Department of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
Corresponding Author: Deying Li. Email: deyingli@ruc.edu.cn
Abstract: Online Social Networks (OSNs) have recently emerged as an effective medium for information sharing. Unfortunately, it has been frequently observed that malicious rumors spreading over an OSN are hard to control, which is undesirable. This paper proposes a new problem, namely the $\gamma$-$k$ rumor restriction problem, whose goal is, given a social network, to find a set $S \subseteq V$ of $k$ protectors ($\gamma k$ protectors from the contaminated set and $(1-\gamma)k$ protectors from the decontaminated set) to protect the network such that the number of decontaminated nodes is maximized. We show that the objective function of the $\gamma$-$k$ rumor restriction problem is submodular, and use this result to design greedy approximation algorithms with a performance ratio of $1 - 1/e$ for the problem under the linear threshold model and the independent cascade model, respectively. To verify our algorithms, we conduct experiments on real-world social networks including NetHEPT, WikiVote, and Slashdot0811. The results show that our algorithms work efficiently and effectively.

Keywords: Rumor containment, Real-world social networks, IC model, LT model.
I. INTRODUCTION

With the rapid development of online social networks, many people have integrated popular online social sites such as Facebook, Twitter, and LinkedIn into their everyday lives. As a result, along with traditional information media such as newspapers, social networks have become a major source of news.

Unfortunately, not all the information in social networks is correct and beneficial. Frequently, an online social network serves as a platform to spread malicious rumors, which are unverified, forged, and/or intentionally or unintentionally altered. Some individuals may spread a rumor for the sake of personal interests. Even correct information can be transformed into a canard as it propagates across multiple people. In 2009, for instance, misinformation about a swine flu outbreak propagated on Twitter and caused widespread panic. In the same year, when a mass shooting happened at Fort Hood, Texas, a soldier inside the base sent out messages via Twitter as the incident unfolded. Though his information was incorrect, his reports of multiple shooters and shooting locations quickly spread through social networks and even to mass media such as television broadcasts. In July 2011, the Twitter account of Fox News was hacked, and the account repeatedly announced that the president of the United States had been shot dead. The rogue posts were rapidly shared around the internet.

Currently, most social networks have barely any protective mechanisms against intentional or unintentional malicious rumors. On the other hand, such rumors tend to exaggerate facts, attract more attention, and thus propagate much faster than the truth. As a result, it is necessary to have a safeguard to contain malicious rumors in online social networks. To make social networks a reliable and sound platform for disseminating critical information, a tool is needed to detect such malicious rumors and to limit their propagation if necessary.
Recently, many researchers have investigated the problem of containing malicious rumors in social networks [4], [5], [6]. Unlike the traditional influence maximization problem, the goal of the rumor containment problem is to limit the rumors' influence.

This paper aims to find a set of users to serve as a seed set for containing a rumor. The seed set helps us broadcast good information that competes with bad information (rumors), so that the users of the network are protected by the good information. Since rumors spread very fast inside a social network and can adversely affect the general public, we should announce the correct information, i.e., an authorized announcement, to eliminate the rumors. To restrict the spread of rumors, Nguyen et al. [6] studied the $\beta$ node protectors problem, which seeks the smallest set of highly influential nodes whose decontamination with good information helps to contain the viral spread of misinformation. Based on the $\beta$ node protectors problem, we propose the $\gamma$-$k$ rumor restriction problem. Nguyen et al. [6] selected the seed set from decontaminated nodes. In our $\gamma$-$k$ rumor restriction problem, by contrast, we choose the seed set from both the contaminated nodes and the decontaminated nodes. The different methods of choosing the seed set are illustrated in Fig. 1. In general, at the beginning of a rumor's spread, most of the contaminated nodes are not the actual rumor sources. Since there is no correct information to compete with the misinformation, the contaminated nodes easily trust the rumor and spread it. However, if an authorized announcement appears, those contaminated nodes may change their minds. In the example plotted in Fig. 2, a student, Alice, posts the rumor message "The final exam is canceled." on Twitter. Her classmates trust it and rebroadcast it on their own Twitter accounts. Those classmates can be regarded as contaminated nodes. However, after the TA posts the authorized announcement "The final exam is next Thursday.", Alice's classmates will change their minds.
 
Therefore, we propose the $\gamma$-$k$ rumor restriction problem, which is described in detail in Section III.

Fig. 1: Different methods for containment of rumor. (Panel label: choose seed set in decontaminated set. Legend: rumors, seed nodes (protectors), decontaminated nodes.)
Our Contribution: We propose the $\gamma$-$k$ rumor restriction problem, which is to find a set $S \subseteq V$ of $k$ protectors, of which $\gamma k$ protectors are from the contaminated set and $(1-\gamma)k$ protectors are from the decontaminated set. The protectors are provided with good information so that the expected number of decontaminated nodes in the whole network is maximized. (It is easy to see that if we choose all the contaminated nodes as seeds, the rumor will be blocked.) We introduce the ratio $\gamma$ because it takes more effort to make a contaminated node, which has already been influenced by bad information, accept the good information. This ratio is the fraction of seeds chosen from the contaminated node set. We study our problem under both the LT and IC models, and give one algorithm for each model. We also prove that the objective function is monotone and submodular under both the LT and IC models, which implies that the greedy algorithm has a performance ratio of $1 - 1/e$. We conduct extensive experiments on large-scale real-world social network datasets to study the performance of our algorithms.
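The selection rule described in this paragraph can be illustrated with a small greedy sketch in Python. It is not the paper's algorithm: the objective here is a toy coverage function standing in for the expected number of decontaminated nodes, the names `greedy_protectors`, `reach`, and `coverage` are hypothetical, and rounding $\gamma k$ with `round` is an assumption, since the text does not say how fractional budgets are handled. No approximation guarantee is claimed for this sketch.

```python
from typing import Callable, Hashable, Iterable, Set

Node = Hashable


def greedy_protectors(
    contaminated: Iterable[Node],
    decontaminated: Iterable[Node],
    k: int,
    gamma: float,
    objective: Callable[[Set[Node]], float],
) -> Set[Node]:
    """Greedily pick about gamma*k seeds from `contaminated`, the rest from `decontaminated`."""
    # Assumption: the contaminated budget is gamma*k rounded to an integer.
    budgets = {"bad": round(gamma * k)}
    budgets["good"] = k - budgets["bad"]
    # Lists (deduplicated, order preserved) keep tie-breaking deterministic.
    pools = {
        "bad": list(dict.fromkeys(contaminated)),
        "good": list(dict.fromkeys(decontaminated)),
    }
    seeds: Set[Node] = set()
    while any(budgets[name] > 0 and pools[name] for name in pools):
        base = objective(seeds)
        best = None  # (marginal gain, pool name, node)
        for name, pool in pools.items():
            if budgets[name] == 0:
                continue  # this pool's quota is already used up
            for v in pool:
                gain = objective(seeds | {v}) - base
                if best is None or gain > best[0]:
                    best = (gain, name, v)
        _, name, v = best
        seeds.add(v)
        pools[name].remove(v)
        budgets[name] -= 1
    return seeds


# Toy stand-in objective: number of distinct users reached by the chosen seeds.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "x": {1, 2}, "y": {6, 7, 8, 9}}


def coverage(seeds: Set[Node]) -> int:
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)


picked = greedy_protectors(["a", "b", "c"], ["x", "y"], k=2, gamma=0.5, objective=coverage)
print(sorted(picked))
```

With $k = 2$ and $\gamma = 0.5$, the sketch takes one seed from each pool: `y` first (largest marginal gain overall), then `a` (largest gain among the contaminated candidates).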
Roadmap: The remainder of this paper is organized as follows. We survey related work in Section II. In Section III we define our problem, the $\gamma$-$k$ rumor restriction problem, for both the LT and IC models. The monotonicity and submodularity of our proposed objective function are proved in Section IV. In Section V, we present the experimental results on real datasets and analyze them. Finally, we conclude the paper in Section VI.

II. RELATED WORKS

As a result of the popularity of social networks, a great many researchers have conducted extensive studies on various problems in social networks. One of the fundamental problems is the influence maximization problem: select a seed set of $k$ nodes in a social network so that the total influence is maximized. The influence maximization problem was first studied by Domingos and Richardson [1], [2]. In [3], Kempe et al. proposed the linear threshold (LT) model and the independent cascade (IC) model for the influence maximization problem. The core of both models is that the social network is represented as a graph and, beginning with an initial set of vertices $S$, information propagates from the initial set to its neighbors and then on to their neighbors. In the LT model, each node $v$ has a threshold $\theta_v$, which is usually drawn uniformly at random from the interval $[0,1]$. The influence on $v$ from $u$, one of $v$'s neighbors, is denoted by a weight $w(u,v)$, with $\sum_u w_{uv} \le 1$. Node $v$ becomes active only if $\sum_u w_{uv} \ge \theta_v$, where the sum runs over the active neighbors $u$ of $v$. The IC model diffuses differently. In the first step, each node in $S$ has a single chance to activate each currently inactive neighbor. In step $t$, when node $v$ becomes active, $v$ is given a single chance to activate each currently inactive neighbor $u$, with a given probability. Whether or not the attempt succeeds, node $v$ will not attempt to activate $u$ again in subsequent steps. The process ends when no more activations are possible.

Ma et al. discussed the diffusion of negative opinions in [7], but did not explain where the negative opinions come from. In [4], Chen et al. focused on maximizing the expected number of positive nodes in the network after the cascade, and proposed the IC-N model, which extends the independent cascade model and explicitly incorporates the emergence and propagation of negative opinions. Budak et al. [8] studied the problem of identifying a subset of individuals that need to be convinced to adopt the competing (or "good") campaign so as to minimize the number of people who adopt the "bad" campaign at the end of both propagations. In [9], He et al. studied competitive influence propagation in social networks under the competitive linear threshold (CLT) model, an extension of the classic linear threshold model. Under the CLT model, the authors defined the influence blocking maximization (IBM) problem, which focuses on blocking the influence propagation of a competing entity as much as possible by selecting a number of seed nodes that have high influence. In the CLT model, each edge has two weights, a positive weight and a negative weight, which propagate the two opposing influences. The authors aimed to minimize the expected number of negatively activated nodes. In [30], Dinh et al. investigated the cost-effective massive viral marketing problem, which takes limited influence propagation into consideration. To minimize the seeding cost, the authors provided a mathematical programming formulation to find optimal seedings for medium-size networks, and proposed VirAds, an efficient algorithm, to tackle the problem on large-scale networks. In [31], Wang et al. investigated the influence quantification problem and proposed a pairwise factor graph (PFG) model for influence in social networks.
An efficient algorithm was designed to learn the model and make inferences. The authors further proposed a dynamic factor graph (DFG) model to incorporate time information.

In [32], Qazvinian et al. discussed the problem of rumor detection in microblogs and explored the effectiveness of three categories of features (content-based, network-based, and microblog-specific) for correctly identifying rumors. Moreover, the authors showed how these features are also effective in identifying disinformers, the users who endorse a rumor and help it spread further. Nguyen et al. [33] discussed the $k$-Suspector problem, which aims to identify the top $k$ most suspected sources of misinformation. The authors proposed two effective approaches, a ranking-based algorithm and an optimization-based algorithm, and further extended the solutions to cope with incomplete collected data as well as multiple attacks. The work in [34] discussed rumor centrality and showed that the node with the maximal rumor centrality is the maximum likelihood estimator for regular trees. It also provided a simple linear-time message-passing algorithm for evaluating rumor centrality, allowing fast estimation of the rumor source in large networks. In [35], Shah et al. presented an analysis of user conversations in online social media and their evolution over time, and proposed a dynamic model that predicts the growth dynamics and structural properties of conversation threads. They showed that online conversations on different social media websites share common underlying rules. In [36], Wang et al. studied the Maximum Circle of Trust (MCT) problem, which seeks to share information with the maximum expected number of the poster's friends while suppressing the spread of that information to unwanted targets. The authors proposed FPTAS and PTAS algorithms for simple scenarios of this problem. For the general case, the authors showed #P-hardness and proposed an effective Iterative Circle of Trust Detection (ICTD) algorithm based on a novel greedy function.

In [6], Nguyen et al. studied the $\beta$ Node Protectors problem, which aims to find the smallest set of highly influential nodes whose decontamination with good information helps to contain the viral spread of misinformation. The misinformation initiates from a set $I$, and its spread is to be contained to a desired ratio $1 - \beta$ of all nodes within $T$ time steps. The authors proposed a Greedy Viral Stopper (GVS) algorithm that provides better lower bounds on the number of selected nodes, and a community-based heuristic method for the Node Protectors problem. However, this algorithm requires a uniform protection fraction for each community.

III. PROBLEM FORMULATION AND DIFFUSION MODEL

In this section we formulate our $\gamma$-$k$ rumor restriction model. Initially, there is a contaminated set $I$. We aim to find a node set $S$ of $k$ protector nodes, of which $\gamma k$ nodes are from the contaminated set $I$ and $(1-\gamma)k$ nodes are from the decontaminated set, to protect the whole network so that the number of decontaminated nodes is maximized. The symbols are summarized in Table I.
A. Problem Statement

Our social network is modeled as a directed graph $G = (V, E, w)$, where $V$ is a set of $n$ nodes and $E \subseteq V \times V$ is a set of $m$ directed edges. A node $v \in V$ represents an individual user in the social network, and an edge $(v,u) \in E$ means that node $v$ has some relationship with node $u$. The weight $w(u,v)$ reflects the influence from node $v$ to node $u$.

TABLE I: Symbol descriptions

Notation            Description
$I$                 Contaminated set
$w(u,v)$            Influence weight from node $v$ to node $u$
$\eta_v$            Threshold for contaminated nodes to trust good information
$q$                 Truth factor in IC model
$SW(v,t)$           $v$'s status weight at time $t$
$PIN(u,t)$          The protect influence of node $u$ at a given step $t$
$\sigma_G(S,t)$     The total influence of seed set $S$
$AP(v,\,)$          Probability of node $v$ being protected with good information
$P(\rho,\,)$        The protected influence of a path from $S$ to $v$
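The weighted directed graph of the problem statement maps naturally onto a dictionary of edge weights, and the classic LT and IC activation rules recalled in Section II can be simulated on it in a few lines. The following is a minimal sketch under stated assumptions, not the paper's modified diffusion process: `weights[(u, v)]` is taken to be the influence of `u` on `v` (the Section II convention), the same numbers are reused as activation probabilities for IC, and the function names `lt_cascade` and `ic_cascade` are hypothetical.

```python
import random
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable
# Assumption: key (u, v) holds the influence of u on v.
Weights = Dict[Tuple[Node, Node], float]


def lt_cascade(weights: Weights, seeds: Iterable[Node], thresholds: Dict[Node, float]) -> Set[Node]:
    """Classic LT: v activates once the weights of its active in-neighbors reach thresholds[v]."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        incoming: Dict[Node, float] = {}
        # Sum the influence each inactive node receives from currently active nodes.
        for (u, v), w in weights.items():
            if u in active and v not in active:
                incoming[v] = incoming.get(v, 0.0) + w
        for v, total in incoming.items():
            if total >= thresholds[v]:
                active.add(v)
                changed = True
    return active


def ic_cascade(weights: Weights, seeds: Iterable[Node], rng: random.Random) -> Set[Node]:
    """Classic IC: each newly active u gets one chance per inactive out-neighbor v."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for (a, v), p in weights.items():
                # Assumption: the edge weight doubles as the activation probability.
                if a == u and v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        # Only nodes activated in this step may try to activate others in the next one.
        frontier = next_frontier
    return active


edges = {("a", "b"): 0.6, ("a", "c"): 0.3, ("b", "c"): 0.3}
print(sorted(lt_cascade(edges, {"a"}, {"b": 0.5, "c": 0.55})))
```

In the usage line, `b` is activated by `a` alone, after which the combined weight from `a` and `b` pushes `c` over its threshold.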
Definition 1. ($\gamma$-$k$ rumor restriction problem): Given a social network represented by a directed graph $G = (V, E, w)$, an underlying diffusion model (either the LT or the IC model), a time duration $T$, and a contaminated set $I$ that holds the rumor information, find a set $S \subseteq V$ with $|S| = k$, drawn from both the decontaminated nodes and the contaminated nodes, such that providing the good information to this set maximizes the expected number of decontaminated nodes in the whole network when the time duration ends.
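One way to write Definition 1 as an optimization problem is sketched below. The notation $D_T(S)$, for the random set of decontaminated nodes at the end of the time duration when $S$ is seeded with good information, is a hypothetical shorthand introduced here and may differ from the paper's own symbols. The split of the budget follows the description at the start of this section and assumes that the decontaminated set is $V \setminus I$ and that $\gamma k$ is an integer.

```latex
\max_{S \subseteq V} \; \mathbb{E}\bigl[\, |D_T(S)| \,\bigr]
\quad \text{subject to} \quad
|S| = k, \qquad |S \cap I| = \gamma k, \qquad |S \setminus I| = (1-\gamma)\, k .
```

The third constraint follows from the first two and is listed only to mirror the wording of the problem.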
We choose the protectors from both the contaminated node set $I$ and the decontaminated node set. We limit the seeds chosen from the contaminated set $I$ to the ratio $\gamma$ in order to save the cost of converting contaminated nodes. Initially, there is a contaminated set $I$ with bad information. Within the time duration $T$, we aim to find a node set $S$ of $k$ protector nodes that help us broadcast the good information. We assume that once a decontaminated node receives the good information, the bad information can no longer influence it. We also assume that it takes no cost to influence a decontaminated node. In the Linear Threshold (LT) model, we randomly choose a good information threshold $\eta_v$ for each contaminated node $v$. $\eta_v$ represents the threshold for the contaminated node to trust good information from a decontaminated node $u$ ($u \in S$). When the weight $w(u,v)$ is larger than the good information threshold $\eta_v$, the good information replaces the bad information on node $v$. In the Independent Cascade (IC) model, we define a truth factor $q$, which is the probability that a contaminated node becomes decontaminated after it is activated by a decontaminated neighbor $u$ ($u \in S$). Fig. 3 and Fig. 4 describe the process. Fig. 3 plots the initial situation, where a contaminated set $I$ broadcasts misinformation. All decontaminated nodes outside the set $I$ are at risk of being contaminated. In our $\gamma$-$k$ rumor restriction problem, we find $k$ protector nodes from both the contaminated set and the decontaminated set so that, in the end, the number of decontaminated nodes is maximized. Fig. 4 shows protectors chosen from both the contaminated set and the decontaminated set. With the help of the protectors, the decontaminated nodes are protected with good information.

Based on the definition above, we can now formally formulate the $\gamma$-$k$ rumor restriction problem in the LT model and the IC model.
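The two conversion rules described above (the threshold $\eta_v$ under LT and the truth factor $q$ under IC) can be sketched as single-step checks. This is an illustrative reading of the prose, not the paper's formal model: it covers one step only, compares a single edge weight against $\eta_v$ exactly as the text states, takes `weights[(u, v)]` as the influence of protector `u` on `v`, and reuses that weight as the IC activation probability. The function names `lt_converted` and `ic_converted` are hypothetical.

```python
import random
from typing import Dict, Hashable, Set, Tuple

Node = Hashable
# Assumption: key (u, v) holds the influence of u on v.
Weights = Dict[Tuple[Node, Node], float]


def lt_converted(
    weights: Weights,
    protectors: Set[Node],
    contaminated: Set[Node],
    eta: Dict[Node, float],
) -> Set[Node]:
    """Contaminated nodes v that have a protector u with w(u, v) > eta[v]."""
    return {
        v
        for (u, v), w in weights.items()
        if u in protectors and v in contaminated and w > eta[v]
    }


def ic_converted(
    weights: Weights,
    protectors: Set[Node],
    contaminated: Set[Node],
    q: float,
    rng: random.Random,
) -> Set[Node]:
    """Contaminated nodes activated by a protector that then accept the good information."""
    converted: Set[Node] = set()
    for (u, v), p in weights.items():
        if u in protectors and v in contaminated and v not in converted:
            # Activation succeeds with probability p (assumed edge probability);
            # an activated contaminated node switches with probability q (truth factor).
            if rng.random() < p and rng.random() < q:
                converted.add(v)
    return converted


w = {("s", "a"): 0.7, ("s", "b"): 0.2, ("x", "b"): 0.9}
print(sorted(lt_converted(w, {"s"}, {"a", "b"}, {"a": 0.5, "b": 0.5})))
```

In the usage line only `a` switches: the weight from protector `s` to `b` is below `b`'s threshold, and `x` is not a protector.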
