
2016 IEEE International Conference on Big Data (Big Data)

Nowcasting with Social Media Data

David L. Kimmey and Jin S. Yoo


Department of Computer Science
Indiana University-Purdue University Fort Wayne
Indiana, USA
Email: kimmd01@ipfw.edu, yooj@ipfw.edu

Extended Abstract: Social media is a source of real-time data that individuals create and voluntarily share on major social media platforms such as Facebook, Twitter, Google, Yahoo, and Instagram. Nowcasting uses real-time data and is defined as the prediction of the present, the very near future, and the very recent past. Nowcasting with social media assists businesses and government agencies in understanding public opinion and trends, and in creating timely forecasts of economic indicators. Examples include geo-economic events (how the public's sentiment affects the stock market [1]), political alignment prediction (management of political strategy [4]), and social questions (popular events associated with increased public sentiment [8]). Nowcasting and social media are also being used together to discover unusual social events such as demonstrations and spontaneous festivals, and natural disasters such as earthquakes and storms [6], and to detect disease activity early, thereby allowing rapid disease response, which reduces the public impact of disease [2].

Research has shown correlations of 76.7% to 90% between social media data and government reports [3]. There are advantages to using nowcasting to predict report values. For example, it is not trivial to forecast turning points in time series data; however, a forecast using nowcasting has turning points with higher correlation to the actual data than a traditional time series model [2]. Many government and business report values do not use real-time data (i.e., the reported value describes past results from the previous week, month, or quarter). To make timely decisions, however, governments and businesses need to forecast, in real time, trends and events which may affect their operations. Nowcasting provides a real-time data prediction solution for real-time decisions.

The research question is: how do users obtain clean, relevant, and timely social media data to complete the nowcasting process for their specific reporting needs? One possible approach is to query real-time data from social media and to nowcast a trend or event; however, defining and detecting events is a non-trivial task and has long been a research topic.

Furthermore, nowcasting with social media poses challenges due to the characteristics of big data, such as heterogeneous, autonomous, complex, and evolving associations. For example, Twitter's tweet content is usually overwhelmed with "babble" (i.e., about 40% of tweets queried or data mined will not include the queried keyword) [3]. Twitter's broadcast tweet locations are autonomous and have evolving associations that change and grow dynamically in real time.

Popular methods to find real-time social media events are the frequency burst method [9], the deviation (anomaly) detection method [5], and the probabilistic soft logic method [7]. However, their algorithms are not scalable to handle big social media data.

This work proposes a scalable parallel and distributed algorithm for social media event detection, which finds a user-specified event within social media utilizing the probabilistic soft logic method. The proposed method identifies the relationships between the geospatial and causal model variables within the data. Generated probabilistic soft logic rules are then aggregated by a given rule weight threshold to identify the interesting events that users specified. Our method also provides three advantages over query approaches: the corpus of the user-specified event has less noise; the approximate current location of the user-specified event is recognized more clearly; and popular keyword themes of the user-specified event within social media are identified. For the evaluation of the proposed approach, we used Twitter data to nowcast a corporate or government agency report, and compared and contrasted the nowcast results with a traditional time series model. The experiment of predicting the CDC's influenza-like illness report shows that nowcasting with social media outperforms the traditional time series model by 16% to 20% in terms of statistical error. Furthermore, the parallel event detection algorithm with MapReduce improved the algorithm's running time by 65%.

REFERENCES

[1] J. Bollen, H. Mao, and X. Zeng. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1–8, 2011.
[2] H. Choi and H. Varian. Predicting the present with Google Trends. Economic Record, 88:2–9, 2012.
[3] V. R. Kamala and L. MayGladence. An optimal approach for social data analysis in Big Data. In Proc. of International Conference on Computation of Power, Energy Information and Communication, pages 192–199, 2011.
[4] S. Kumar, F. Morstatter, and H. Liu. Twitter Data Analytics. Springer, 2013.
[5] R. Lee, S. Wakamiya, and K. Sumiya. Discovery of unusual regional social activities using geo-tagged microblogs. World Wide Web, 14(4):321–349, July 2011.
[6] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: Real-time event detection by social sensors. In Proceedings of the International Conference on World Wide Web, pages 851–860, 2010.
[7] R. D. Santos, S. Shah, F. Chen, A. Boedihardjo, C. Lu, and N. Ramakrishnan. Forecasting location-based events with spatio-temporal storytelling. In Proceedings of the ACM SIGSPATIAL International Workshop on Location-Based Social Networks, pages 13–22, 2014.
[8] M. Thelwall, K. Buckley, and G. Paltoglou. Sentiment in Twitter events. Journal of the American Society for Information Science and Technology, 62(2):406–418, 2011.
[9] Y. Yang, T. Pierce, and J. Carbonell. A study of retrospective and on-line event detection. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 28–36, 1998.

978-1-4673-9005-7/16/$31.00 ©2016 IEEE 4004
