Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Download
Standard view
Full view
of .
Look up keyword
Like this
136Activity
0 of .
Results for:
No results containing your search query
P. 1
Predicting the Future With Social Media

Predicting the Future With Social Media

Ratings:

4.88

(8)
|Views: 3,691 |Likes:
Published by Hewlett-Packard
@HPLabs research, April 2010
@HPLabs research, April 2010

More info:

Published by: Hewlett-Packard on Apr 02, 2010
Copyright:Traditional Copyright: All rights reserved

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF or read online from Scribd
See more
See less

02/19/2013

pdf

 
Predicting the Future With Social Media
Sitaram Asur
Social Computing LabHP LabsPalo Alto, CaliforniaEmail: sitaram.asur@hp.com
Bernardo A. Huberman
Social Computing LabHP LabsPalo Alto, CaliforniaEmail: bernardo.huberman@hp.com
 Abstract
—In recent years, social media has become ubiquitousand important for social networking and content sharing. Andyet, the content that is generated from these websites remainslargely untapped. In this paper, we demonstrate how social mediacontent can be used to predict real-world outcomes. In particular,we use the chatter from Twitter.com to forecast box-officerevenues for movies. We show that a simple model built fromthe rate at which tweets are created about particular topics canoutperform market-based predictors. We further demonstratehow sentiments extracted from Twitter can be further utilized toimprove the forecasting power of social media.
I. I
NTRODUCTION
Social media has exploded as a category of online discoursewhere people create content, share it, bookmark it and network at a prodigious rate. Examples include Facebook, MySpace,Digg, Twitter and JISC listservs on the academic side. Becauseof its ease of use, speed and reach, social media is fastchanging the public discourse in society and setting trendsand agendas in topics that range from the environment andpolitics to technology and the entertainment industry.Since social media can also be construed as a form of collective wisdom, we decided to investigate its power atpredicting real-world outcomes. Surprisingly, we discoveredthat the chatter of a community can indeed be used to makequantitative predictions that outperform those of artificialmarkets. These information markets generally involve thetrading of state-contingent securities, and if large enough andproperly designed, they are usually more accurate than othertechniques for extracting diffuse information, such as surveysand opinions polls. Specifically, the prices in these marketshave been shown to have strong correlations with observedoutcome frequencies, and thus are good indicators of futureoutcomes [4], [5].In the case of social media, the enormity and high vari-ance of the information that propagates through large usercommunities presents an interesting opportunity for harnessingthat data into a form that allows for specific predictionsabout particular outcomes, without having to institute marketmechanisms. One can also build models to aggregate theopinions of the collective population and gain useful insightsinto their behavior, while predicting future trends. Moreover,gathering information on how people converse regarding par-ticular products can be helpful when designing marketing andadvertising campaigns [1], [3].This paper reports on such a study. Specifically we considerthe task of predicting box-office revenues for movies usingthe chatter from Twitter, one of the fastest growing socialnetworks in the Internet. Twitter
1
, a micro-blogging network,has experienced a burst of popularity in recent months leadingto a huge user-base, consisting of several tens of millions of users who actively participate in the creation and propagationof content.We have focused on movies in this study for two mainreasons.
The topic of movies is of considerable interest amongthe social media user community, characterized both bylarge number of users discussing movies, as well as asubstantial variance in their opinions.
The real-world outcomes can be easily observed frombox-office revenue for movies.Our goals in this paper are as follows. First, we assess howbuzz and attention is created for different movies and how thatchanges over time. Movie producers spend a lot of effort andmoney in publicizing their movies, and have also embracedthe Twitter medium for this purpose. We then focus on themechanism of viral marketing and pre-release hype on Twitter,and the role that attention plays in forecasting real-world box-office performance. Our hypothesis is that movies that are welltalked about will be well-watched.Next, we study how sentiments are created, how positive andnegative opinions propagate and how they influence people.For a bad movie, the initial reviews might be enough todiscourage others from watching it, while on the other hand, itis possible for interest to be generated by positive reviews andopinions over time. For this purpose, we perform sentimentanalysis on the data, using text classifiers to distinguishpositively oriented tweets from negative.Our chief conclusions are as follows:
We show that social media feeds can be effective indica-tors of real-world performance.
We discovered that the rate at which movie tweetsare generated can be used to build a powerful modelfor predicting movie box-office revenue. Moreover ourpredictions are consistently better than those producedby an information market such as the Hollywood Stock Exchange, the gold standard in the industry [4].
1
http://www.twitter.com
 
Our analysis of the sentiment content in the tweets showsthat they can improve box-office revenue predictionsbased on tweet rates only after the movies are released.This paper is organized as follows. Next, we survey recentrelated work. We then provide a short introduction to Twitterand the dataset that we collected. In Section 5, we study howattention and popularity are created and how they evolve.We then discuss our study on using tweets from Twitterfor predicting movie performance. In Section 6, we presentour analysis on sentiments and their effects. We concludein Section 7. We describe our prediction model in a generalcontext in the Appendix.II. R
ELATED
W
ORK
Although Twitter has been very popular as a web service,there has not been considerable published research on it.Huberman and others [2] studied the social interactions onTwitter to reveal that the driving process for usage is a sparsehidden network underlying the friends and followers, whilemost of the links represent meaningless interactions. Java etal [7] investigated community structure and isolated differenttypes of user intentions on Twitter. Jansen and others [3]have examined Twitter as a mechanism for word-of-mouthadvertising, and considered particular brands and productswhile examining the structure of the postings and the change insentiments. However the authors do not perform any analysison the predictive aspect of Twitter.There has been some prior work on analyzing the correlationbetween blog and review mentions and performance. Gruhland others [9] showed how to generate automated queriesfor mining blogs in order to predict spikes in book sales.And while there has been research on predicting movie sales,almost all of them have used meta-data information on themovies themselves to perform the forecasting, such as themovies genre, MPAA rating, running time, release date, thenumber of screens on which the movie debuted, and thepresence of particular actors or actresses in the cast. Joshiand others [10] use linear regression from text and metadatafeatures to predict earnings for movies. Sharda and Delen [8]have treated the prediction problem as a classification problemand used neural networks to classify movies into categoriesranging from ’flop’ to ’blockbuster’. Apart from the factthat they are predicting ranges over actual numbers, the bestaccuracy that their model can achieve is fairly low. Zhangand Skiena [6] have used a news aggregation model alongwith IMDB data to predict movie box-office numbers. Wehave shown how our model can generate better results whencompared to their method.III. T
WITTER
Launched on July 13, 2006, Twitter
2
is an extremelypopular online microblogging service. It has a very large userbase, consisting of several millions of users (23M unique users
2
http://www.twitter.com
in Jan
3
). It can be considered a directed social network, whereeach user has a set of subscribers known as followers. Eachuser submits periodic status updates, known as
tweets
, thatconsist of short messages of maximum size 140 characters.These updates typically consist of personal information aboutthe users, news or links to content such as images, videoand articles. The posts made by a user are displayed on theuser’s profile page, as well as shown to his/her followers. It isalso possible to send a direct message to another user. Suchmessages are preceded by
@
user
id
indicating the intendeddestination.A
retweet
is a post originally made by one user that isforwarded by another user. These retweets are a popular meansof propagating interesting posts and links through the Twittercommunity.Twitter has attracted lots of attention from corporationsfor the immense potential it provides for viral marketing.Due to its huge reach, Twitter is increasingly used by newsorganizations to filter news updates through the community.A number of businesses and organizations are using Twitteror similar micro-blogging services to advertise products anddisseminate information to stakeholders.IV. D
ATASET
C
HARACTERISTICS
The dataset that we used was obtained by crawling hourlyfeed data from Twitter.com. To ensure that we obtained alltweets referring to a movie, we used keywords present in themovie title as search arguments. We extracted tweets overfrequent intervals using the Twitter Search Api
4
, therebyensuring we had the timestamp, author and tweet text forour analysis. We extracted 2.89 million tweets referring to 24different movies released over a period of three months.Movies are typically released on Fridays, with the exceptionof a few which are released on Wednesday. Since an average of 2 new movies are released each week, we collected data overa time period of 3 months from November to February to havesufficient data to measure predictive behavior. For consistency,we only considered the movies released on a Friday and onlythose in wide release. For movies that were initially in limitedrelease, we began collecting data from the time it becamewide. For each movie, we define the
critical period 
as thetime from the week before it is released, when the promotionalcampaigns are in full swing, to two weeks after release, whenits initial popularity fades and opinions from people have beendisseminated.Some details on the movies chosen and their release datesare provided in Table 1. Note that, some movies that werereleased during the period considered were not used in thisstudy, simply because it was difficult to correctly identifytweets that were relevant to those movies. For instance,for the movie
2012
, it was impractical to segregate tweetstalking about the movie, from those referring to the year. Wehave taken care to ensure that the data we have used was
3
http://blog.compete.com/2010/02/24/compete-ranks-top-sites-for-january-2010/ 
4
http://search.twitter.com/api/ 
 
Movie Release DateArmored 2009-12-04Avatar 2009-12-18The Blind Side 2009-11-20The Book of Eli 2010-01-15Daybreakers 2010-01-08Dear John 2010-02-05Did You Hear About The Morgans 2009-12-18Edge Of Darkness 2010-01-29Extraordinary Measures 2010-01-22From Paris With Love 2010-02-05The Imaginarium of Dr Parnassus 2010-01-08Invictus 2009-12-11Leap Year 2010-01-08Legion 2010-01-22Twilight : New Moon 2009-11-20Pirate Radio 2009-11-13Princess And The Frog 2009-12-11Sherlock Holmes 2009-12-25Spy Next Door 2010-01-15The Crazies 2010-02-26Tooth Fairy 2010-01-22Transylmania 2009-12-04When In Rome 2010-01-29Youth In Revolt 2010-01-08TABLE IN
AMES AND RELEASE DATES FOR THE MOVIES WE CONSIDERED IN OURANALYSIS
.
disambiguated and clean by choosing appropriate keywordsand performing sanity checks.
2 4 6 8 10 12 14 16 18 2050010001500200025003000350040004500release weekendweekend 2
Fig. 1. Time-series of tweets over the critical period for different movies.
The total data over the critical period for the 24 movieswe considered includes 2.89 million tweets from 1.2 millionusers.Fig 1 shows the timeseries trend in the number of tweetsfor movies over the critical period. We can observe that thebusiest time for a movie is around the time it is released,following which the chatter invariably fades. The box-officerevenue follows a similar trend with the opening weekendgenerally providing the most revenue for a movie.Fig 2 shows how the number of tweets per unique authorchanges over time. We find that this ratio remains fairlyconsistent with a value between 1 and 1.5 across the criticalperiod. Fig 3 displays the distribution of tweets by different
2 4 6 8 10 12 14 16 18 2011.11.21.31.41.51.61.71.81.92
Days
   T  w  e  e   t  s  p  e  r  a  u   t   h  o  r  s
Release weekend
Fig. 2. Number of tweets per unique authors for different movies
01234567802468101214
log(tweets)
       l     o     g       (       f     r     e     q     u     e     n     c     y       )
 
Fig. 3. Log distribution of authors and tweets.
authors over the critical period. The X-axis shows the numberof tweets in the log scale, while the Y-axis represents thecorresponding frequency of authors in the log scale. We canobserve that it is close to a Zipfian distribution, with a fewauthors generating a large number of tweets. This is consistentwith observed behavior from other networks [12]. Next, weexamine the distribution of authors over different movies. Fig 4shows the distribution of authors and the number of moviesthey comment on. Once again we find a power-law curve, witha majority of the authors talking about only a few movies.V. A
TTENTION AND
P
OPULARITY
We are interested in studying how attention and popularityare generated for movies on Twitter, and the effects of thisattention on the real-world performance of the movies consid-ered.
 A. Pre-release Attention:
Prior to the release of a movie, media companies and andproducers generate promotional information in the form of trailer videos, news, blogs and photos. We expect the tweetsfor movies before the time of their release to consist primarilyof such promotional campaigns, geared to promote word-of-mouth cascades. On Twitter, this can be characterized bytweets referring to particular urls (photos, trailers and other

Activity (136)

You've already reviewed this. Edit your review.
1 hundred reads
1 thousand reads
abangirfan liked this
Daniel Sokol added this note
Awesome, i've been wanting to see work like this for a while
timothyas liked this
Cristina Floroiu liked this
Cristina Floroiu liked this
Jo Barnes liked this
torchlightses liked this

You're Reading a Free Preview

Download
scribd
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->