Abstract—Social media services have become increasingly popular, and their penetration is worldwide. Micro-blogging services such as Twitter allow users to express themselves, share their emotions, and discuss their daily affairs in real time, covering a variety of points of view and opinions, including political and event-related topics such as immigration, economic issues, tax policy, or election campaigns. Traditional methods for tracking public opinion, on the other hand, still rely heavily on opinion polls, which are usually limited to small sample sizes and can incur significant costs in time and money. In this paper, we leverage state-of-the-art sentiment analysis techniques for real-time political emotion tracking. In particular, we analyze mentions of the personal names of 18 presidents in Latin America and measure each political figure's effect on the emotions reflected on the social web.

Keywords- Computational Social Science; Sentiment Analysis; Social Media Analytics; Twitter.

I. INTRODUCTION

The Social Web is well established and growing. Real-time microblogging services such as Twitter (twitter.com) have experienced an explosion in global user adoption over the past years. It is estimated that Twitter has surpassed 300 million users, who generate more than 200 million 140-character Twitter messages (tweets) every day [1]. Latin America is no exception: Brazil, Mexico, Venezuela, Colombia, Argentina, and Chile figure among the top-20 countries in terms of Twitter accounts, as reported by a recent study by Semiocast, a provider of consumer insight and brand management solutions [2].

The high rate at which users share their opinions on blogs, forums, and social networking sites such as Facebook or Twitter makes this kind of media even more attractive for measuring specific sentiments towards current affairs. Real-time access to the large amount of user-generated content available can give social researchers, and citizens in general, the tools to monitor the pulse of society on specific topics of interest, a task traditionally accomplished only through opinion polls, which are costly and time-consuming to conduct and therefore frequently limited to small sample sizes.

Real-time analysis of social media streams allows for the discovery of latent patterns in public opinion, which can be exploited to improve decision-making processes. For example, automatically detecting emotions such as joy, sadness, fear, anger, and surprise in the social web has several practical applications, for instance, tracking the popularity of political figures or the public response to newly released products. This is the field of sentiment analysis, which involves determining the opinions and private states (beliefs, feelings, and speculations) of the speaker towards a target entity [3].

Our goal in this work is to explore the sentiments and emotions towards political figures in Latin America. To this end, we analyze mentions on Twitter and blogs of eighteen Latin American presidents between October 1, 2011 and April 1, 2012. The names of the presidents and their respective countries are listed in Table I. Using an emotion lexicon, we study the emotions evoked by each president. While this approach is standard in many applications (e.g., [4], [5], [6]), we felt that a study on political emotion detection via the social web, covering Latin America in particular, was necessary.

In summary, the contributions of this paper are:
• We present an extensive sentiment analysis of the political landscape of Latin America. To the best of our knowledge, this is the first study of this nature covering eighteen countries in the region.
• Many sentiment analysis studies address polarity and emotion detection. This paper not only presents the extracted emotions and polarity but also goes a step further and quantifies which combination of emotions better explains public opinion.
• The paper provides evidence of the potential uses of Twitter in emerging regions. It gives an example of combining sentiment analysis in Spanish with the real-time nature of Twitter, using Latin America as a test bed.

The rest of the paper is organized as follows. In Section II, we discuss our methodology and instruments: we describe the preprocessing and analysis of the documents collected for each president and present our sentiment analysis framework and model. Section III discusses the results of our study. We present related work in Section IV. We conclude the paper in Section V.
ID    Country              President
P1    Argentina            Cristina Fernández
P2    Bolivia              Evo Morales
P3    Brazil               Dilma Rousseff
P4    Chile                Sebastián Piñera
P5    Colombia             Juan Manuel Santos
P6    Costa Rica           Laura Chinchilla
P7    Dominican Republic   Leonel Fernández
P8    Ecuador              Rafael Correa
P9    El Salvador          Mauricio Funes
P10   Guatemala            Otto Pérez Molina
P11   Honduras             Porfirio Lobo
P12   Mexico               Felipe Calderón
P13   Nicaragua            Daniel Ortega
P14   Panama               Ricardo Martinelli
P15   Paraguay             Fernando Lugo
P16   Peru                 Ollanta Humala
P17   Uruguay              José Mujica
P18   Venezuela            Hugo Chávez
Table I: Presidents of Latin America considered in the analysis (listed alphabetically by country name).
II. TAKING THE PULSE OF POLITICAL EMOTIONS

Our approach consists of the following steps:
1) Data collection process
2) Emotion and Polarity Analysis
3) Pattern recognition from the sentiment analysis

In this section, we first present how we collected the set of documents used in the study, along with the dataset statistics. Second, we explain the preprocessing techniques applied to the dataset. Finally, we explain the approach to emotion analysis employed in this work. In Section III, we present the results and discuss the patterns discovered from the sentiment analysis that can help explain the popularity observed in opinion polls.

A. Data Collection Process

We perform our study on a collection of 165,484 documents, of which 155,280 are 140-character Twitter messages (tweets) and 10,204 are snippets of weblog posts. The

• Blog posts were fetched using Google News RSS Feeds. As with tweets, we used the name of the president as the query term and forced an exact match. We restricted the sources to blogs in the Spanish language. Again, the time range was restricted to the period under analysis. In this case, we consider as a document the post's title together with the short snippet of text (∼300 characters) contained in the description tag of the item in the RSS result returned by Google.

[Figure: diagram labels: Social Web; Data Collector; Stream of Tweets and Blog snippets; Social Analytics.]
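The way a blog document is assembled (post title plus the description snippet of each RSS item, kept only when the president's name matches exactly) could be sketched as follows. This is a minimal illustration and not the collector used in the study: the feed content in `SAMPLE_RSS`, the function name `documents_for`, and the explicit 300-character cut-off are all assumptions made for the example, and the sketch filters locally, whereas the text says the exact match was forced in the query itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, invented for illustration only.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Rafael Correa anuncia nuevas medidas</title>
<description>El presidente presenta su plan ante la asamblea.</description></item>
<item><title>Noticias de economia</title>
<description>Los mercados cerraron al alza.</description></item>
</channel></rss>"""


def documents_for(rss_xml, president, max_snippet=300):
    """Build one document per RSS item (title + snippet) that mentions
    the president's full name exactly (case-sensitive substring match)."""
    docs = []
    for item in ET.fromstring(rss_xml).iter("item"):
        title = (item.findtext("title") or "").strip()
        # Assumption: cap the snippet at roughly 300 characters.
        snippet = (item.findtext("description") or "").strip()[:max_snippet]
        doc = f"{title} {snippet}".strip()
        if president in doc:
            docs.append(doc)
    return docs


docs = documents_for(SAMPLE_RSS, "Rafael Correa")
print(docs)
```

An exact substring match is the simplest reading of "forced an exact match"; it misses nicknames and partial names, which is the limitation the paper itself raises when it suggests more sophisticated named entity recognition.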
Table II: Contrast of opinion poll results, polarity (positive–negative), and the opposing pairs of Plutchik's eight basic emotions: joy–sadness, anger–fear, trust–disgust, and anticipation–surprise (presidents are listed alphabetically by country name).
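Each opposing pair contrasted in Table II can be reduced to a single signed feature by comparing its two scores: +1 when the first exceeds the second, 0 when they are equal, and −1 otherwise. The sketch below illustrates that encoding. The score values in `example_scores` are hypothetical (Table II's numbers are not reproduced here), and the pair order in `PAIRS` simply follows the caption, which is an assumption about the feature order.

```python
# Pair order follows the Table II caption (an assumption about feature order).
PAIRS = [
    ("positive", "negative"),
    ("joy", "sadness"),
    ("anger", "fear"),
    ("trust", "disgust"),
    ("anticipation", "surprise"),
]


def pair_feature(a, b):
    """Return +1 if a > b, 0 if a == b, and -1 if a < b."""
    return (a > b) - (a < b)


def encode(scores):
    """Map a dict of per-emotion scores to the five signed pair features."""
    return [pair_feature(scores[a], scores[b]) for a, b in PAIRS]


# Hypothetical scores for one president, for illustration only.
example_scores = {
    "positive": 0.60, "negative": 0.40,
    "joy": 0.10, "sadness": 0.18,
    "anger": 0.14, "fear": 0.14,
    "trust": 0.15, "disgust": 0.16,
    "anticipation": 0.10, "surprise": 0.03,
}

print(encode(example_scores))
```

Reducing each pair to its sign discards magnitude, so two presidents with very different score gaps can share the same feature vector.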
As training vectors, we use each president's emotional vector, built from the opposing emotion pairs plus the polarity dimension. That is, the training instance $X_p$ for president $p$ is defined as follows:

$$X_p = [F_{\text{positive–negative}},\ F_{\text{joy–sadness}},\ F_{\text{anger–fear}},\ F_{\text{trust–disgust}},\ F_{\text{anticipation–surprise}}]$$

where $F_{A–B}$ is the signed feature for the polarity or emotion pair A–B. $F_{A–B}$ is equal to +1 if A > B, equal to 0 if A = B, and equal to −1 if A < B.

For example, the training instance for the Argentinian president is $X_{P1} = [1, 1, −1, 1, 1]$, with label +1, and the training instance for the president of Mexico is $X_{P12} = [−1, −1, −1, −1, 0]$, with label −1 (see Table II).

The predicted value of the poll is given by:

$$\mathrm{poll}(p) = \vec{w} \cdot \vec{X}_p$$

where $\vec{w}$ is the weight vector to be learned. Our objective is to fit a linear model, e.g., using a Support Vector Machine (SVM), that is able to predict the binary value of the opinion poll. Note that, given the small amount of training data, we do not expect this model to generalize well to unseen data; our objective is to discover which emotional features are most influential. To this end, we analyze the learned weights $\vec{w}$.

As a metric, we use the Area Under the ROC Curve (AUC) [14] to measure how well the linear model fits the data. The AUC value is always between 0.0 and 1.0, with better classifiers having higher AUC values. A random guess has an AUC equal to 0.5.

We train the SVM model using Stochastic Gradient Descent [15], with a learning rate of 0.001 and 5 epochs of 10000 iterations each. The regularization parameter is set to 0.0 since, as mentioned before, we are interested in fitting the model to the observed data. The averaged model weights obtained after repeating the procedure 100 times are as follows:

$$\vec{w} = [0.249,\ 1.748,\ 0.265,\ −1.192,\ 1.694].$$

This particular model achieves an AUC of 0.81. If we use only polarity scores as predictors, the AUC drops significantly to 0.61. This indicates that polarity analysis is limited for popularity prediction, and that a combination of emotions is a better approach for short-term popularity forecasts.

Roughly speaking, a high positive (resp. negative) weight indicates that presidents with these emotional features should be approved (resp. disapproved) by the people. The features corresponding to the emotion pairs joy–sadness and anticipation–surprise are the dominant terms of the expression, with weights 1.748 and 1.694, respectively. The emotion pair anger–fear has a weight (0.265) that is still above that of the polarity pair (0.249). Note that the negative weight for the pair trust–disgust suggests a negative correlation with the opinion poll results.

Social media provides a rich and diverse source of users' opinions that has shown great potential for political analysis. For example, Demartini et al. [6] apply sentiment and time series analysis techniques to blog data to estimate the temporal development of opinions for two candidates, Obama and McCain, during the 2008 US presidential elections. In contrast, our work applies sentiment analysis to tweets and short blog post snippets referring to Latin American presidents to estimate the effect that each of them has on the emotions reflected in user-generated content. Another contrasting aspect is that we do not limit the analysis to candidates within the same country; instead, we study the presidents of eighteen Latin American countries.

Tumasjan et al. [11] examine whether Twitter is a vehicle for online political deliberation by looking at how people use microblogging to exchange information about political issues. The authors evaluate whether Twitter messages reflect the current offline political sentiment in a meaningful way and analyze whether the activity on Twitter can be used to predict the popularity of parties or coalitions in the real world. Similarly, we analyze tweets and blog snippets as a source of political sentiment. Our aim, however, is to show how these data make it possible to build a profile of political figures, in particular current Latin American presidents. Furthermore, our analysis is performed outside election periods, which gives us interesting insights into how users feel about sitting presidents on a daily basis. The geographical area of our study and the language of the analyzed tweets and snippets are contrasting aspects as well, since we consider Latin America and analyze content written in Spanish.

Bollen et al. [16] explore how public mood patterns relate to fluctuations in macroscopic social and economic indicators in the same time period. The authors perform a sentiment analysis of Twitter data using an extended version of a psychometric instrument, the Profile of Mood States (POMS). Our work, in contrast, focuses on how to exploit social media content (tweets and snippets) to build a profile of each of the Latin American presidents based on what people write about them and the sentiment expressed in the vocabulary used. Yet another contrast is that we do not limit our analysis to emotion or polarity extraction, but also exploit Plutchik's four opposing emotion pairs to train a linear model using an SVM that provides insights into which combination of emotions explains the outcome of people's opinion.
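The linear scoring, fitting, and evaluation used in the pattern-recognition step can be sketched as follows. The sketch makes several assumptions: the feature order (polarity, joy–sadness, anger–fear, trust–disgust, anticipation–surprise) is inferred from the weights discussed in the text; only the two training instances spelled out in the text (P1 and P12) are used, so the learned weights will not match the reported ones; and `train_sgd` is a plain unregularized hinge-loss SGD written for illustration, not the authors' implementation.

```python
import random

# Reported averaged weights; feature order is inferred from the text (assumption):
# [polarity, joy-sadness, anger-fear, trust-disgust, anticipation-surprise]
REPORTED_W = [0.249, 1.748, 0.265, -1.192, 1.694]

# The only two training instances given explicitly in the text.
X = {"P1": [1, 1, -1, 1, 1], "P12": [-1, -1, -1, -1, 0]}
Y = {"P1": +1, "P12": -1}


def poll(w, x):
    """Linear score poll(p) = w . X_p."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_sgd(instances, labels, lr=0.001, epochs=5, iters=10000, seed=0):
    """Unregularized hinge-loss SGD (regularization parameter 0.0)."""
    rng = random.Random(seed)
    keys = sorted(instances)
    w = [0.0] * len(instances[keys[0]])
    for _ in range(epochs * iters):
        k = rng.choice(keys)
        x, y = instances[k], labels[k]
        if y * poll(w, x) < 1.0:  # margin violated: step along y * x
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y > 0]
    neg = [s for s, y in zip(scores, labels) if y < 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


ids = sorted(X)
gold = [Y[i] for i in ids]
reported_scores = [poll(REPORTED_W, X[i]) for i in ids]
learned_w = train_sgd(X, Y)
learned_scores = [poll(learned_w, X[i]) for i in ids]

print(reported_scores, auc(reported_scores, gold))
```

With the reported weights, the score for P1 is 0.249 + 1.748 − 0.265 − 1.192 + 1.694 = 2.234 and the score for P12 is −0.249 − 1.748 − 0.265 + 1.192 + 0 = −1.070, so both signs agree with the labels given in the text. Repeating the fit over many random seeds and averaging the weights would mirror the 100-run averaging described in the text.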
Figure 5: Emotions Detected for each President.
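Per-president emotion shares such as those in Figure 5 can be produced by a term-based lookup against an emotion lexicon, the kind of method the paper says it applied. The sketch below is only an illustration under stated assumptions: the entries in `LEXICON_ES` are invented for the example and are not taken from EmoLex or its Spanish translation, tokenization is a simple regular expression, and no preprocessing such as lemmatization is performed.

```python
import re
from collections import Counter

# Plutchik's eight basic emotions.
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Hypothetical Spanish lexicon entries, invented for illustration only.
LEXICON_ES = {
    "crisis": {"fear", "sadness"},
    "fraude": {"anger", "disgust"},
    "esperanza": {"anticipation", "joy", "trust"},
    "triunfo": {"joy"},
}


def emotion_shares(documents, lexicon):
    """Count lexicon hits per emotion over all documents and
    normalize the counts to fractions that sum to 1 (or all zeros)."""
    counts = Counter()
    for doc in documents:
        for token in re.findall(r"\w+", doc.lower()):
            counts.update(lexicon.get(token, ()))
    total = sum(counts.values())
    return {e: (counts[e] / total if total else 0.0) for e in EMOTIONS}


shares = emotion_shares(
    ["La crisis y el fraude", "Esperanza tras el triunfo"], LEXICON_ES
)
print(shares)
```

A plain dictionary lookup keeps per-document cost proportional to the number of tokens, which suits stream processing, but it cannot account for negation, irony, or word sense.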
V. CONCLUSION AND FUTURE WORK

We show in this work how the real-time nature of social media streams, in particular Twitter, can be leveraged to take the pulse of political emotions in emerging regions of the world, namely Latin America. We performed a sentiment analysis of tweets and brief blog posts over a period of six months. We applied a term-based method to detect the polarity and emotions associated with eighteen Latin American presidents.

We conclude that the extracted polarity and isolated emotions alone are not good predictors of the opinion captured in an independent opinion poll. However, the linear combination of the opposing pairs of basic emotions and polarity achieved a prediction performance of 81% in terms of AUC, whereas using polarity alone the AUC dropped to 61%. We also noticed that the emotion pair joy–sadness dominated the model.

Future work includes an online evaluation of our approach, where the global patterns discovered in the stream are immediately fed back to the social web. We are interested in measuring how this information and awareness can affect individual and collective behavior.

All 165,484 documents analyzed were in Spanish, but the emotion lexicon was in English. Previous approaches dealing with multilingualism in sentiment analysis use machine-assisted translation to translate all documents to the target language, usually English. However, our aim is to process the social media stream in real time, so translating documents that arrive at high speed becomes problematic. Instead, we propose to translate the emotion lexicon itself into the original language of the documents, from English to Spanish. The translation of the lexicon is done offline and only once. This methodology allowed a more flexible architecture and the possibility of processing the documents more efficiently, in real time and in their original language.

Our study has limitations that have to be considered for future extensions. For example, the data collection process can be refined to use more sophisticated named entity recognition rather than just the exact match of the names. Moreover, the term-based sentiment analysis techniques that we applied are fast, but they do not consider complex context or senses, e.g., irony and sarcasm, which would be interesting to explore. Additionally, a lexicon specialized in political sentiment, in the target language studied, could improve the precision of the extracted sentiment. Such a lexicon could be built using crowdsourcing techniques, as was done for EmoLex, the lexicon we used in this study.

Finally, we hope that this paper provides some insight into the future of short-term forecasting of political figures' popularity, both during electoral campaigns and during presidential mandates.

ACKNOWLEDGMENT

This work is partially supported by EU FP7 Project CUBRIK (contract no. 287704).

REFERENCES

[1] @twittereng, "200 million tweets per day," Twitter Blog, http://goo.gl/eybp0, June 2011.
[2] Semiocast, "Countries on Twitter," http://goo.gl/RfxZw, 2012.
[3] J. M. Wiebe, "Tracking point of view in narrative," Comput. Linguist., vol. 20, no. 2, pp. 233–287, Jun. 1994.
[4] P. S. Dodds, K. D. Harris, I. M. Kloumann, C. A. Bliss, and C. M. Danforth, "Temporal patterns of happiness and information in a global social network: Hedonometrics and twitter," PLoS ONE, vol. 6, no. 12, p. e26752, Dec. 2011.
[5] S. Mohammad, "From once upon a time to happily ever after: Tracking emotions in novels and fairy tales," in Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities. Portland, OR, USA: Association for Computational Linguistics, June 2011, pp. 105–114.
[6] G. Demartini, S. Siersdorfer, S. Chelaru, and W. Nejdl, "Analyzing political trends in the blogosphere," in International AAAI Conference on Weblogs and Social Media, 2011.
[7] Consulta Mitofsky – www.consulta.mx, "Aprobación de Mandatarios América y El Mundo," http://goo.gl/8fFNU, April 2012.
[8] H. Schmid, "Probabilistic part-of-speech tagging using decision trees," in Proceedings of the International Conference on New Methods in Language Processing, Manchester, UK, 1994.
[9] S. M. Mohammad and P. D. Turney, "Crowdsourcing a word-emotion association lexicon," Computational Intelligence, 2011.
[10] R. Plutchik, A General Psychoevolutionary Theory of Emotion. New York: Academic Press, 1980, pp. 3–33.
[11] A. Tumasjan, T. Sprenger, P. Sandner, and I. Welpe, "Predicting elections with twitter: What 140 characters reveal about political sentiment," in International AAAI Conference on Weblogs and Social Media, 2010.
[12] CID-Gallup – www.cidgallup.com, "Encuesta de Opinión Pública Centro América y República Dominicana," http://goo.gl/lc85v, December 2011.
[13] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 1st ed. Springer, 2007.
[14] T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.
[15] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT'2010), Y. Lechevallier and G. Saporta, Eds. Paris, France: Springer, August 2010, pp. 177–187.
[16] J. Bollen, H. Mao, and A. Pepe, "Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena," in International AAAI Conference on Weblogs and Social Media, 2011.