Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Download
Standard view
Full view
of .
Save to My Library
Look up keyword
Like this
865Activity
0 of .
Results for:
No results containing your search query
P. 1
Influence and Passivity in Social Media - HP Labs Research

Influence and Passivity in Social Media - HP Labs Research

Ratings:

4.4

(5)
|Views: 33,110 |Likes:
Published by Hewlett-Packard
(Source: Hewlett Packard) "A large study of information propa- gation within Twitter reveals that the majority of users act as passive information consumers and do not forward the content to the network. Therefore, in order for individuals to become influential they must not only obtain attention and thus be popular, but also overcome user passivity. We propose an algorithm that determines the influence and pas- sivity of users based on their information forwarding activity."
(Source: Hewlett Packard) "A large study of information propa- gation within Twitter reveals that the majority of users act as passive information consumers and do not forward the content to the network. Therefore, in order for individuals to become influential they must not only obtain attention and thus be popular, but also overcome user passivity. We propose an algorithm that determines the influence and pas- sivity of users based on their information forwarding activity."

More info:

Published by: Hewlett-Packard on Aug 05, 2010
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less

03/13/2014

pdf

text

original

 
Influence and Passivity in Social Media
Daniel M. Romero
Cornell UniversityCenter for AppliedMathematicsIthaca, NY, USA
dmr239@cornell.eduWojciech Galuba
EPFLDistributed InformationSystems LabLausanne, Switzerland
wojciech.galuba@epfl.chSitaram Asur
Social Computing LabHP LabsPalo Alto, California, USA
sitaram.asur@hp.comBernardo A. Huberman
Social Computing LabHP LabsPalo Alto, California, USA
bernardo.huberman@hp.com
ABSTRACT
The ever-increasing amount of information flowing throughSocial Media forces the members of these networks to com-pete for attention and influence by relying on other peopleto spread their message. A large study of information propa-gation within Twitter reveals that the majority of users actas passive information consumers and do not forward thecontent to the network. Therefore, in order for individualsto become influential they must not only obtain attentionand thus be popular, but also overcome user passivity. Wepropose an algorithm that determines the influence and pas-sivity of users based on their information forwarding activ-ity. An evaluation performed with a 2.5 million user datasetshows that our influence measure is a good predictor of URLclicks, outperforming several other measures that do not ex-plicitly take user passivity into account. We also explicitlydemonstrate that high popularity does not necessarily implyhigh influence and vice-versa.
1. INTRODUCTION
The explosive growth of Social Media has provided mil-lions of people the opportunity to create and share contenton a scale barely imaginable a few years ago. Massive partic-ipation in these social networks is reflected in the countlessnumber of opinions, news and product reviews that are con-stantly posted and discussed in social sites such as Facebook,Digg and Twitter, to name a few. Given this widespreadgeneration and consumption of content, it is natural to tar-get one’s messages to highly connected people who will prop-agate them further in the social network. This is particularlythe case in Twitter, which is one of the fastest growing so-cial networks on the Internet, and thus the focus of advertis-
Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$10.00.
ing companies and celebrities eager to exploit this vast newmedium. As a result, ideas, opinions, and products competewith all other content for the scarce attention of the usercommunity. In spite of the seemingly chaotic fashion withwhich all these interactions take place, certain topics man-age to get an inordinate amount of attention, thus bubblingto the top in terms of popularity and contributing to newtrends and to the public agenda of the community. Howthis happens in a world where crowdsourcing dominates isstill an unresolved problem, but there is considerable consen-sus on the fact that two aspects of information transmissionseem to be important in determining which content receivesinordinate amounts of attention.One is the popularity and status of given members of thesesocial networks, which is measured by the level of atten-tion they receive in the form of followers who create linksto their accounts to automatically receive the content theygenerate. The other is the influence that these individualswield, which is determined by the actual propagation of theircontent through the network. This influence is determinedby many factors, such as the novelty and resonance of theirmessages with those of their followers and the quality andfrequency of the content they generate. Equally importantis the passivity of members of the network which provides abarrier to propagation that is often hard to overcome. Thusgaining knowledge of the identity of influential and least pas-sive people in a network can be extremely useful from theperspectives of viral marketing, propagating one’s point of view, as well as setting which topics dominate the publicagenda.In this paper, we analyze the propagation of web links ontwitter over time to understand how attention to given usersand their influence is determined. We devise a general modelfor influence using the concept of passivity in a social net-work and develop an efficient algorithm similar to the HITSalgorithm [14] to quantify the influence of all the users in thenetwork. Our influence measure utilizes both the structuralproperties of the network as well as the diffusion behavioramong users. The influence of a user thus depends on notonly the size of the influenced audience, but also on theirpassivity. This differentiates it from earlier measures of in-fluence which were primarily based on individual statisticalproperties such as the number of followers or retweets [7].
 
We have shown through our extensive evaluation thatthis influence model outperforms other measures of influ-ence such as PageRank, H-index, the number of followersand the number of retweets. In addition it has good predic-tive properties in that it can forecast in advance the upperbound on the number of clicks a URL can get. We havealso presented case studies showing the top influential usersuncovered by our algorithm. An important conclusion fromthe results is that the correlation between popularity andinfluence is quite weak, with the most influential users notnecessarily the ones with the highest popularity. Addition-ally, when we considered nodes with high passivity, we founda majority of them to be spammers and robot users. Thisdemonstrates an application of our algorithm in automaticuser categorization and filtering of online content .
2. RELATED WORK
The study of information and influence propagation insocial networks has been particularly active for a numberof years in fields as disparate as sociology, communication,marketing, political science and physics. Earlier work fo-cused on the effects that scale-free networks and the affinityof their members for certain topics had on the propagationof information [6]. Others discussed the presence of keyinfluentials [12, 11, 8, 5, 10] in a social network, defined asthose who are responsible for the overall information dis-semination in the network. This research highlighted thevalue of highly connected individuals as key elements in thepropagation of information through the network.Huberman et al. [2] studied the social interactions onTwitter to reveal that the driving process for usage is asparse hidden network underlying the friends and follow-ers, while most of the links represent meaningless interac-tions. Jansen et al. [3] have examined twitter as a mecha-nism for word-of-mouth advertising. They considered par-ticular brands and products and examined the structure of the postings and the change in sentiments. Galuba et al. [4]propose a propagation model that predicts, which users willtweet about which URL based on the history of past useractivity.There have also been earlier studies focused on social in-fluence and propagation. Agarwal et al. [8] have examinedthe problem of identifying influential bloggers in the blogo-sphere. They discovered that the most influential bloggerswere not necessarily the most active. Aral et al [9] havedistinguished the effects of homophily from influence as mo-tivators for propagation. As to the study of influence withintwitter, Cha et al. [7] have performed a comparison of threedifferent measures of influence - indegree, retweets and usermentions. They discovered that while retweets and men-tions correlated well with each other, the indegree of usersdid not correlate well with the other two measures. Basedon this, they hypothesized that the number of followers maynot a good measure of influence. On the other hand, Wenget al [5] have proposed a topic-sensitive PageRank measurefor influence in Twitter. Their measure is based on the factthat they observed high reciprocity among follower relation-ships in their dataset, which they attributed to homophily.However, other work [7] has shown that the reciprocity islow overall in Twitter and contradicted the assumptions of this work.
3. TWITTER3.1 Background on Twitter
Twitter is an extremely popular online microblogging ser-vice, that has gained a very large user base, consisting of more than 105 million users (as of April 2010). The Twittergraph is a directed social network, where each user choosesto follow certain other users. Each user submits periodicstatus updates, known as
tweets
, that consist of short mes-sages limited in size to 140 characters. These updates typi-cally consist of personal information about the users, newsor links to content such as images, video and articles. Theposts made by a user are automatically displayed on theuser’s profile page, as well as shown to his followers.A
retweet 
is a post originally made by one user that is for-warded by another user. Retweets are useful for propagatinginteresting posts and links through the Twitter community.Twitter has attracted lots of attention from corporationsfor the immense potential it provides for viral marketing.Due to its huge reach, Twitter is increasingly used by newsorganizations to disseminate news updates, which are thenfiltered and commented on by the Twitter community. Anumber of businesses and organizations are using Twitteror similar micro-blogging services to advertise products anddisseminate information to stockholders.
3.2 Dataset
Twitter provides a Search API for extracting tweets con-taining particular keywords. To obtain the dataset for thisstudy, we continuously queried the Twitter Search API fora period of 300 hours starting on 10 Sep 2009 for all tweetscontaining the string
http
. This allowed us to acquire acontinuous stream of 22 million tweets with URLs, whichwe estimated to be 1/15th of the entire Twitter activity atthat time. From each of the accumulated tweets, we ex-tracted the URL mentions. Each of the unique 15 millionURLs in the data set was then checked for valid formattingand the URLs shortened via the services such as
bit.ly
or
tinyurl.com
were expanded into their original form by fol-lowing the HTTP redirects. For each encountered uniqueuser ID, we queried the Twitter API for metadata aboutthat user and in particular the user’s followers and followees.The end result was a dataset of timestamped URL mentionstogether with the complete social graph for the users con-cerned.
User graph.
The user graph contains those users whosetweets appeared in the stream, i.e., users that during the300 hour observation period posted at least one public tweetcontaining a URL. The graph does not contain any users whodo not mention any URLs in their tweets or users that havechosen to make their Twitter stream private.For each newly encountered user ID, the list of followedusers was only fetched once. Our data set does not capturethe changes occurring in the user graph over the observationperiod.
4. THE IP ALGORITHM
Evidence for passivity.
The users that receive informa-tion from other users may never see it or choose to ignoreit. We have quantified the degree to which this occurs onTwitter (Fig. 4). An average Twitter user retweets only onein 318 URLs, which is a relatively low value. The retweetingrates vary widely across the users and the small number of 
 
10   -8   10   -7   10   -6   10   -5   10   -4   10   -3   10   -2   10   -1   10   0   retweetingrate   10   0   10   1   10   2   10   3   10   4   10   5   10   6   
      #    u    s    e   r    s
audienceretweetingrate   userretweetingrate   
Figure 1: Evidence for the Twitter user passivity.We measure passivity by two metrics: 1. the userretweeting rate and 2. the audience retweeting rate.The
user retweeting rate
is the ratio between thenumber of URLs that user
i
decides to retweet tothe total number of URLs user
i
received from thefollowed users. The
audience retweeting rate
is theratio between the number of user
i
’s URLs that wereretweeted by
i
’s followers to the number of times afollower of 
i
received a URL from
i
.
the most active users play an important role in spreading theinformation in Twitter. This suggests that the level of userpassivity should be taken into account for the informationspread models to be accurate.
Assumptions.
Twitter is used by many people as a toolfor spreading their ideas, knowledge, or opinions to others.An interesting and important question is whether it is pos-sible to identify those users who are very good at spreadingtheir content, not only to those who choose to follow them,but to a larger part of the network. It is often fairly easyto obtain information about the pairwise influence relation-ships between users. In Twitter, for example, one can mea-sure how much influence user A has on user B by countingthe number of times B retweeted A. However, it is not veryclear how to use the pairwise influence information to accu-rately obtain information about the relative influence eachuser has on the whole network. To answer this question wedesign an algorithm (IP) that assigns a relative
influence
score and a
passivity 
score to every user. The passivity of a user is a measure of how difficult it is for other users toinfluence him. We assume that the influence of a user de-pends on both the quantity and the quality of the audienceshe influences. In general, our model makes the followingassumptions:1. A user’s influence score depends on the number of peo-ple she influences as well as their passivity.2. A user’s influence score depends on how dedicated thepeople she influences are. Dedication is measured bythe amount of attention a user pays to a given one ascompared to everyone else.3. A user’s passivity score depends on the influence of those who she’s exposed to but not influenced by.4. A user’s passivity score depends on how much she re- jects other user’s influence compared to everyone else.
Algorithm 1
: The Influence-Passivity (IP) algorithm
0
(1
,
1
,...,
1)
R
|
|
;
0
(1
,
1
,...,
1)
R
|
|
;
for
i
= 1
to
m
do
Update
R
i
using operation (2) and the values
i
1
;Update
i
using operation (1) and the values
R
i
;
for
j
= 1
to
|
|
do
j
=
j
k
k
;
j
=
R
j
k
R
k
;
endend
Return (
m
,
m
);
Operation.
The algorithm iteratively computes both thepassivity and influence scores simultaneously in a similarmanner as in the HITS algorithm for finding authoritativeweb pages and hubs that link to them [14].Given a weighted directed graph
G
= (
N,E,W 
) withnodes
, arcs
, and arc weights
, where the weights
w
ij
on arc
e
= (
i,j
) represent the ratio of influence that
i
exerts on
j
to the total influence that
i
attempted to exerton
j
, the IP algorithm outputs a function
:
[0
,
1],which represents the nodes’ relative influence on the net-work, and a function
:
[0
,
1] which represents thenodes’ relative passivity of the network.For every arc
e
= (
i,j
)
, we define the
acceptance rate
by
u
ij
=
w
i,j
k
:(
k,j
)
E
w
kj
. This value represents the amountof influence that user
j
accepted from user
i
normalizedby the total influence accepted by
j
from all users on thenetwork. The acceptance rate can be viewed as the dedi-cation or loyalty user
j
has to user
i
. On the other hand,for every
e
= (
 j,i
)
we define the
rejection rate
by
v
ji
=1
w
ji
k
:(
j,k
)
E
(1
w
jk
). Since the value 1
w
ji
is amountof influence that user
i
rejected from
j
, then the value
v
ji
represents the influence that user
i
rejected from user
j
nor-malized by the total influence rejected from
j
by all users inthe network.The algorithm is based on the following operations:
i
j
:(
i,j
)
E
u
ij
j
(1)
i
j
:(
j,i
)
E
v
ji
j
(2)Each term on the right hand side of the above operationscorresponds to one of the listed assumptions. In operation1 the term
j
corresponds to assumption 1 and the term
u
ij
corresponds to assumption 2. In operation 2 the term
j
corresponds to assumption 3 and the term
v
ji
correspondsto assumption 4. The
Influence-Passivity algorithm 
(Algo-rithm 1) takes the graph
G
as the input and computes theinfluence and passivity for each node in
m
iterations.

You're Reading a Free Preview

Download
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->