Sentiment Analysis for Government:
An Optimized Approach

Angelo Corallo1, Laura Fortunato1, Marco Matera1, Marco Alessi2,
Alessio Camillò3, Valentina Chetta3, Enza Giangreco3, and Davide Storelli3

1 Dipartimento di Ingegneria Dell'Innovazione, University of Salento, Lecce, Italy
  {angelo.corallo,laura.fortunato}@unisalento.it, marco.matera@studenti.unisalento.it
2 R&D Department, Engineering Ingegneria Informatica SPA, Palermo, Italy
  marco.alessi@eng.it
3 R&D Department, Engineering Ingegneria Informatica SPA, Lecce, Italy
  {alessio.camillo,valentina.chetta,enza.giangreco,davide.storelli}@eng.it

Abstract. This paper describes a Sentiment Analysis (SA) method to analyze the polarity of tweets and to enable government to quantitatively describe the opinions of active social network users on topics of interest to the Public Administration.
We propose an optimized approach that employs a document-level and a dataset-level supervised machine learning classifier to provide accurate results in both individual and aggregated sentiment classification.
The aim of this work is also to identify the types of features that yield the most accurate sentiment classification for a dataset of Italian tweets in the context of a Public Administration event, also taking into account the size of the training set. This work uses a dataset of 1,700 Italian tweets relating to the public event of "Lecce 2019 – European Capital of Culture".

Keywords: Sentiment analysis · Machine learning · Public administration

© Springer International Publishing Switzerland 2015. P. Perner (Ed.): MLDM 2015, LNAI 9166, pp. 98–112, 2015. DOI: 10.1007/978-3-319-21024-7_7

1 Introduction

Recently, Twitter, one of the most popular micro-blogging tools, has gained significant popularity among social network services. Micro-blogging is an innovative form of communication in which users express, in short posts, their feelings or opinions about a variety of subjects, or describe their current status.
Many studies are aimed at SA of Twitter messages (tweets). The brevity of these texts (tweets cannot be longer than 140 characters) and the informal nature of social media encourage the use of slang, abbreviations, new words, URLs, etc. These factors, together with frequent misspellings and improper punctuation, make it more complex to extract the opinions and sentiments of the people.


Despite this, numerous recent studies have focused on natural language processing of Twitter messages [1–4], yielding useful information in various fields, such as brand evaluation [1], public health [5], natural disaster management [6], social behaviors [7], the movie market [8], and the political sphere [9].
This work falls within the context of public administration, with the aim of providing reliable estimates and analyses of what citizens think about institutions, the efficiency of services and infrastructures, and the degree of satisfaction with special events. The paper proposes an optimized sentiment classification approach as a political and social decision support tool for governments.
The dataset used in the experiments contains Italian tweets relating to the public event "Lecce 2019 – European Capital of Culture", which were collected using the Twitter search API between 2 September 2014 and 17 November 2014.
We offer an optimized approach employing a document-level and a dataset-level supervised machine learning classifier to provide accurate results in both individual and aggregated classification. In addition, we identify the particular kinds of features that yield the most accurate sentiment classification for a dataset of Italian tweets in the context of a Public Administration event, considering also the size of the training set and the way it affects results.
The paper is organized as follows: the research background is described in Sect. 2,
followed by a discussion of the public event “Lecce 2019: European Capital of Cul-
ture” in Sect. 3, a description of the dataset in Sect. 4 and a description of the machine
learning algorithms and optimization employed in Sect. 5. In Sect. 6, the results are
presented. Section 7 concludes and provides indications towards future work.

2 Research Background

The related work can be divided into two groups, general SA research and research
which is devoted specifically to the government domain.

2.1 Sentiment Analysis Approach


SA and Opinion Mining study, analyze, and classify documents with respect to the opinions expressed by people about a product, a service, an event, an organization, or a person. The objective of this area is the development of linguistic analysis methods that allow identifying the polarity of opinions.
In the last decade, SA has developed strongly, thanks to the growing number of user-generated documents on the World Wide Web and to the diffusion of social networks.
Every day, millions of users share opinions about their lives, providing an inexhaustible source of data to which opinion analysis techniques can be applied.
In 2001, the papers of Das [10] and Tong [11] began to use the term "sentiment" in reference to the automatic analysis of evaluative text and the tracking of predictive judgments. Since then, awareness of the research problems and opportunities related to SA and Opinion Mining has been growing. The growing interest in SA and Opinion Mining is partly due to the variety of application areas: in the commercial field, the analysis of product reviews [12]; in the political field, the identification of the electorate's mood and therefore of voting (or abstention) trends [13]; etc. In social environments, SA can be used as a survey tool to understand existing points of view: for example, to understand the opinion that people have about a subject, to predict the impact of a future event, or to analyze the influence of a past occurrence [14]. Big data technologies, observation methods, and the analysis of behavior on the web make SA an important decision-making tool for the analysis of social networks, able to inform relations, culture, and sociological debate.

2.2 Sentiment Analysis for Government


In public administrations, SA is a technique capable of facilitating the creation of a relationship between public bodies and citizens. SA can discover the critical points of this relationship, helping to focus on taking the right actions.
SA carried out on social networks allows public administrations to identify and meet users' needs, and it enables citizens to affect service delivery, to participate in the creation of new services, or even to identify innovative uses of existing services [15].
The basic principle of this work is Open Services Innovation [16], where service innovation is generated by the contribution of citizens' judgments and opinions. The center of this system moves from the services to the citizens, with their emotions and feelings.
Capturing citizens' opinions through social media can be a less expensive and more reliable approach than surveys, where there is a risk of false statements of opinion. Moreover, social networks reveal a more extensive and widespread involvement of users and citizens, and they allow the automatic interception of topics or key events. Analyses of the sentiment of citizens' opinions are crucial for examining issues relating to the services provided by public administration and for developing new processes. The objective is to support public decision makers in the decision process, in order to facilitate growth and innovation aimed at improving the daily life of the community. In this way, the public institution has the opportunity to acquire, identify, organize, and analyze the "noise" about a public sector, and to highlight the quantitative and qualitative aspects that determine a positive or negative sentiment for the qualification or requalification of its activities.
The information flowing within social networks assists public administrations in better understanding the current situation and in making predictions about many social phenomena.

3 Lecce 2019: European Capital of Culture

With about 90,000 citizens, Lecce is a mid-sized city and the main city of Salento, located in the "heel" of the Italian "boot". Even though Lecce is known for its enormous cultural, artistic, and naturalistic heritage, it can also be considered a typical example of a southern Italian city from a socio-economic point of view: poor in infrastructure, with high and increasing unemployment rates. However, despite this disadvantageous context, remarkable investments in research, university education, and the tourism sector have been taking place during the last few years, making Lecce an area of attraction on the international scale. Indeed, the only possibility of improvement for the territory is to bet on radical change.
This change aims at a deep innovation of Lecce and Salento, starting from a concrete enhancement of their resources, in which citizens are extremely important.
Three opportunities can be highlighted for reconsidering the territory in a more innovative way that is respectful of the citizenship: the participation in the National Smart City Observatory, where Lecce is one of the pilot cities together with Benevento, Pordenone, and Trento; an urban planning process co-created with the citizens; and the candidacy as European Capital of Culture 2019.
The European Capital of Culture is a city designated by the European Union for a period of one calendar year, during which it organizes a series of cultural events with a strong European dimension [17].
As stated previously, the public administration of Lecce is trying to change its approach, creating a shared path towards a social model that includes the direct participation and collaboration of citizens. In the guidelines for the candidacy as European Capital of Culture, one of the main criteria of the bid book evaluation is "the city and citizens" [18], referring to concrete initiatives that must be launched to attract the interest and participation of local, neighboring, and foreign citizens. Moreover, these initiatives should be long-term and should be integrated into cultural and social innovation strategies.
The challenge of the candidacy is making Lecce a Smart City, which means moving towards an innovative ecosystem that improves citizens' quality of life through an efficient use of resources and technologies. A fundamental aspect is citizens' participation aimed at collecting their needs as beneficiaries and main characters in the open innovation process.
The urgent need to express a strong break with the past is well summarized in the slogan for Lecce as European Capital of Culture: "Reinventing Eutopia", which means reinterpreting the European dream from a political, social, cultural, and economic perspective. That concept is composed of eight utopias for change, the main one of which, through 2019, is DEMOCRAtopia. As described in the bid book (Lecce 2019, 2013), DEMOCRAtopia refers to the creation of a climate of trust, awareness, collaboration, responsibility, and ownership, with a special emphasis on collective knowledge and on a development perspective oriented to citizens' dreams and needs [19].
The title of European Capital of Culture may provide the government and the communities with the opportunity to thrive together in order to achieve a medium- or long-term goal. This requires a constant dialogue, which is an essential component in activating this process. The candidacy is the beginning of a journey, a playground for the future, a dream, a laboratory of ongoing and future experiments, and an opportunity to become what we are, if we really want to [18].
The title of European Capital of Culture is part of an Open Service Innovation process, because its objective is the creation of new processes, services, and events in the city of Lecce.
The Open Service Innovation process was realized by allowing citizens to contribute to the creation and design of new services, infrastructure, and events. The realization of an Idea Management System and of promotional events allowed the involvement of citizens.
This paper provides an instrument for analyzing an Open Service Innovation process, allowing the sentiment of citizens about the Lecce 2019 event to be observed and evaluated. The opinion classification can be used to understand the strengths and weaknesses of the process.

4 Dataset

We collected a corpus of tweets using the Twitter search API between 2 September 2014 and 17 November 2014, the period in which Twitter activity about the event was highest. We extracted tweets using a query-based search on "#Lecce2019" and "#noisiamolecce2019", the hashtags most used for this topic. The resulting dataset contains 5,000 tweets. Duplicates and retweets were automatically removed, leaving a set of 1,700 tweets with the class distribution shown in Table 1.

Table 1. Class distribution

  Sentiment   Tweets   Share
  Positive       391    23 %
  Neutral      1,241    73 %
  Negative        68     4 %
  Total        1,700   100 %
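As an illustration, the duplicate and retweet removal step mentioned above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' actual code: `raw_tweets` is a hypothetical list of tweet texts, and retweets are detected by the conventional "RT @" prefix.

```python
def remove_duplicates_and_retweets(raw_tweets):
    """Drop retweets (conventional 'RT @' prefix) and exact duplicates,
    keeping the first occurrence of each distinct text."""
    seen = set()
    kept = []
    for text in raw_tweets:
        normalized = text.strip()
        if normalized.startswith("RT @"):  # skip retweets
            continue
        if normalized in seen:             # skip exact duplicates
            continue
        seen.add(normalized)
        kept.append(normalized)
    return kept
```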

In order to obtain a training set for creating a language model useful for sentiment classification, a manual annotation step was performed. This process involved three annotators and a supervisor. The annotators were asked to identify the sentiment associated with the topic of each tweet; the supervisor developed a classification scheme and created a handbook to train the annotators on how to classify text documents.
The manual coding was performed using the following 3 labels:
• Positive: tweets that carry positive sentiment towards the topic Lecce 2019;
• Negative: tweets that carry negative sentiment towards the topic Lecce 2019;
• Neutral: tweets that do not carry any sentiment towards the topic Lecce 2019, or tweets that have no mention of or relation to the topic.
Each annotator evaluated all 1,700 tweets. For the construction and the size of the training set, see the next section.
Analysis of the inter-coder reliability metrics demonstrates that annotators agreed on more than 80 % of the documents (average agreement of 0.82), with good inter-coder reliability coefficients: multi-pi (Siegel and Castellan), multi-kappa (Davies and Fleiss), and alpha (Krippendorff) [20].
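These coefficients can be computed, for example, with NLTK's agreement module; the sketch below is illustrative, with hypothetical annotator and tweet identifiers rather than the paper's actual annotation data.

```python
from nltk.metrics.agreement import AnnotationTask

# Each record is (coder, item, label); the triples below are illustrative.
data = [
    ("ann1", "tweet_001", "positive"),
    ("ann2", "tweet_001", "positive"),
    ("ann3", "tweet_001", "neutral"),
    ("ann1", "tweet_002", "neutral"),
    ("ann2", "tweet_002", "neutral"),
    ("ann3", "tweet_002", "neutral"),
]

task = AnnotationTask(data)
print("average observed agreement:", task.avg_Ao())
print("multi-pi (Siegel and Castellan):", task.pi())
print("multi-kappa (Davies and Fleiss):", task.multi_kappa())
print("alpha (Krippendorff):", task.alpha())
```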
These measures are very important: low inter-coder reliability means generating results that cannot be considered "realistic" for sentiment analysis, where "realistic" means conceptually shared with common thought. However, these metrics do not affect the accuracy values of the classification algorithms. It can therefore happen that, even if the accuracy of the classification algorithm is 100 %, the analysis results are not conceptually consistent with reality.

5 Method

5.1 Features Optimization


Before performing the sentiment classification, the text was processed by a preprocessing component, using the following features (a sketch of the resulting pipeline follows the list):
• Identification of expressions with more than one word (n-grams). Opinions are often conveyed by combinations of two or more consecutive words, which makes an analysis performed only on single-word tokens incomplete. Generally, single terms (unigrams) are considered as features, but several studies, albeit with contrasting results, have also considered n-grams of words. Some experimental results suggest that features formed by the most frequent n-grams in the document collection, added to those consisting of unigrams, increase classifier performance, because they grasp, as a whole, parts of expressions that cannot be separated. However, sequences of length n > 3 are not helpful and can reduce performance. For this reason, the choice fell on the use of unigrams, bigrams, and their combination.
• Conversion of all letters to lowercase. Preserving the case of all words can uselessly fragment the feature space, increasing processing time and reducing system performance. One of the most successful approaches in text preprocessing for SA is to convert all letters to lowercase; even if a term in all caps may communicate an intensification of an opinion, this feature degrades performance and does not justify its use.
• Hashtag inclusion with removal of the "#" character. Hashtags are often used to characterize the sentiment polarity of an expressed opinion, as well as to direct the text towards a specific subject; consider, for example, the hashtags "#happy" and "#bad", which communicate a direct sentiment. An idea may therefore be to include the text of the hashtag among the elements that characterize the language model.
• Twitter user name removal.
• Emoticon removal, or emoticon replacement with the relevant category. Emoticons are extremely common elements in social media and, in some cases, are carriers of reliable sentiment. Replacing emoticons with a specific tag according to their sentiment polarity (positive or negative) can be a winning approach in some contexts. However, we must be careful: these representations primarily express emotions, not sentiment intended as being for or against a particular topic. For example, being disappointed by a particular event, yet still favorable to it, creates a contrast between an emoticon seen as negative and the favorable sentiment for that topic.
• URL replacement with the string "URL".
• Removal of words that do not begin with a letter of the alphabet.
• Removal of numbers and punctuation. Punctuation, in particular exclamation points, question marks, and ellipses, is another element that can characterize text polarity. Including these elements in the classification process can lead to a more accurate definition of sentiment, also taking into account repetitions that intensify the expressed opinion. However, the inclusion of punctuation slows down the classifier.
• Stopword removal. Some word categories are very frequent in texts and are generally not significant for sentiment analysis. This set of textual elements includes articles, conjunctions, and prepositions. Common practice in sentiment analysis is to remove these entities from the text before the analysis.
• Removal of tokens with fewer than 2 characters.
• Shortening of repeated characters. Sometimes words are lengthened by repeating characters. This feature can be a reliable indicator of intensified emotion. In the Italian language, there are no sequences of three or more identical characters in a word, so we can assume that such occurrences are extensions of the base word. Since the number of repeated characters is not predictable, and because the probability that small differences are significant is low, sequences of three or more repeated characters can be mapped to sequences of only two characters.
• Stemming. Stemming is the process of reducing words to their stem. This can help to reduce the vocabulary size and thus to increase classifier performance, especially for small datasets. However, stemming can be a double-edged sword: reducing all forms of a word can eliminate the sentiment nuances that, in some cases, make the difference, or it can unify words that have opposite polarities. The benefits of applying a stemmer seem more evident when training documents are few, although the differences are generally imperceptible.
• Part-of-Speech tagging. There are many cases of interpretation conflicts among words with the same representation but different roles in the sentence. This suggests that it may be useful to run a PoS tagger on the data and to use the word-tag pair as a feature. In the literature, a PoS tagger often yields a slight increase in accuracy at the expense of processing speed, which slows down the preprocessing phase.
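The following is a minimal Python sketch of a preprocessing pipeline combining the steps above, corresponding roughly to feature set 7 defined below. The regular expressions, the NLTK Italian stopword list, the Snowball Italian stemmer, and the emoticon lists are assumptions standing in for whatever the authors' component actually used.

```python
import re
from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.stem.snowball import SnowballStemmer

ITALIAN_STOPWORDS = set(stopwords.words("italian"))
STEMMER = SnowballStemmer("italian")
# Hypothetical emoticon lists; a real system would use a fuller inventory.
POSITIVE_EMOTICONS = {":)", ":-)", "=)", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}

def preprocess(tweet):
    tokens = []
    for token in tweet.lower().split():
        if token.startswith("@"):                     # remove user names
            continue
        if token.startswith(("http://", "https://", "www.")):
            tokens.append("URL")                      # replace URLs
            continue
        if token in POSITIVE_EMOTICONS:               # replace emoticons
            tokens.append("EMO_POS")                  # with a polarity tag
            continue
        if token in NEGATIVE_EMOTICONS:
            tokens.append("EMO_NEG")
            continue
        token = token.lstrip("#")                     # keep hashtag text, drop '#'
        token = re.sub(r"[^\w]", "", token)           # strip punctuation/symbols
        token = re.sub(r"(.)\1{2,}", r"\1\1", token)  # shorten repeated characters
        if len(token) < 2 or not token[0].isalpha():  # drop short/non-word tokens
            continue
        if token in ITALIAN_STOPWORDS:                # stopword removal
            continue
        tokens.append(STEMMER.stem(token))            # stemming
    return tokens

print(preprocess("Bellissimaaaa #Lecce2019!!! :) http://t.co/xyz @utente"))
```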
As specified in the following paragraph, 8 classification stages were performed, each with a different approach to text preprocessing and feature selection, using training sets of different sizes. For each cycle, a 10-fold cross-validation was performed, dividing the manually annotated dataset into a training set and a fixed test set of 700 tweets. Every 10-fold validation was repeated 10 times. This was necessary to obtain accuracy results that are reliable and affected by a minimal amount of error [29].
All stages contain the following preprocessing steps: all letters are converted to lowercase; user names, URLs, numbers, punctuation, hashtags, words that do not begin with a letter of the Latin alphabet, and words composed of fewer than two characters are removed. In addition to these steps, the sets of features that characterize the 8 classification stages are the following (a sketch of their construction follows the list):
• set 1: uni-grams;
• set 2: bi-grams;
• set 3: uni-grams + bi-grams;
• set 4: set 1 + stopwords removal + repeated letters shortening;
• set 5: set 4 + stemming;
• set 6: set 5 + emoticon inclusion with replacement;
• set 7: set 6 + hashtags inclusion with character “#” removal;
• set 8: set 7 + PoS tag.
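One plausible realization of these feature sets with scikit-learn, assuming the `preprocess` function sketched in Sect. 5.1: sets 1–3 differ only in the n-gram range, while sets 4–8 vary the preprocessing steps applied inside `preprocess` itself.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Sets 1-3 differ in the n-gram range; sets 4-8 reuse unigrams (set 1)
# and change the preprocessing applied inside `preprocess`.
NGRAM_RANGES = {1: (1, 1), 2: (2, 2), 3: (1, 2)}

def build_vectorizer(feature_set):
    return CountVectorizer(
        tokenizer=preprocess,                          # pipeline from Sect. 5.1
        ngram_range=NGRAM_RANGES.get(feature_set, (1, 1)),
        lowercase=False,                               # already done in preprocess
        binary=True,                                   # binary counts, for binary NB
    )

# Usage: X = build_vectorizer(3).fit_transform(tweets)
```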

5.2 Training Set Dimension


To classify the documents in the test set, the supervised machine learning methods described below use relationships between features in the training set. The training set is therefore a random sample of documents, representative of the corpus to be analyzed [21].
In order to achieve acceptable results with supervised machine learning approaches, the training set must contain a sufficient number of documents. Hopkins and King argue that five hundred documents are a general threshold for a satisfactory classification [21]. Generally, this amount is effective in most cases, but the specific application of interest must always be considered. For example, as the number of categories in a classification scheme increases, the amount of information needed to learn the relationships between the words and each category necessarily increases and, consequently, so does the required number of documents in the training set.
The creation of the training set is the most costly phase in terms of human labor. In order to identify the minimum number of documents to be annotated manually while maintaining good accuracy, an analysis was performed that compares different evaluation metrics of the various classifiers for each identified feature set, varying the size of the training set each time.
10 classification runs were performed, increasing the training set from 100 to 1,000 documents in steps of 100. For each cycle, a 10-fold cross-validation, repeated 10 times, was performed to minimize the error in the accuracy estimate.
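A hedged sketch of this protocol with scikit-learn, reusing `build_vectorizer` from above: `texts` and `labels` are the annotated tweets, and plain repeated 10-fold splitting is used as an approximation of the paper's exact scheme with its fixed 700-tweet test set.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def accuracy_by_training_size(texts, labels, sizes=range(100, 1100, 100)):
    """For each training-set size, run a 10-fold cross-validation repeated
    10 times and record the mean accuracy."""
    rng = np.random.RandomState(0)
    cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)
    results = {}
    for size in sizes:
        idx = rng.choice(len(texts), size, replace=False)  # random subsample
        model = make_pipeline(build_vectorizer(1), MultinomialNB())
        scores = cross_val_score(model,
                                 [texts[i] for i in idx],
                                 [labels[i] for i in idx],
                                 cv=cv)
        results[size] = scores.mean()
    return results
```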

5.3 Sentiment Classification


Today there are many SA algorithms, and this abundance of solutions shows how hard it is to determine the right solution for every context. An algorithm that provides 100 % efficiency does not exist; each approach tries to improve a particular aspect of the analysis, in a specific context and for a particular type of data to process [22].
The main classes of SA algorithms, lexicon-based and machine learning-based, have different fields of application and are based on different requirements [22].
Building a dictionary of previously classified terms yields results of low accuracy, considering the continuous changes in language and in the meaning of words across contexts. Moreover, it is arduous to find, in this type of dictionary, the words used in the slang of social networks, and so it is very difficult to identify the polarity of the words. Certainly, the strength of this type of approach is the speed with which results can be obtained, enabling real-time analysis. Moreover, the absence of a manually annotated training set makes these methods more attractive.
The supervised machine learning approach achieves improved accuracy but requires a training phase.
Using a specific training set, these methods can be adapted to the context of interest. If a supervised machine learning algorithm is trained on a language model that is specific to a given context and to a particular type of document, it will provide less accurate results when the analysis is moved to another context of interest [22].
A tool for SA in the context of public administration must be able to provide reliable estimates and analyses of everything citizens think about the institutions, the efficiency of services and infrastructure, and the level of satisfaction with events. We believe, therefore, that the most suitable approach to sentiment analysis in this context involves the use of supervised machine learning algorithms.
This work implements the classification of sentiment as follows:
(1) At the single-document level, using the Naive Bayes Multinomial (NB) algorithm in its binary version [23] and the Support Vector Machine (SVM) in the "one-vs-rest" variant [24].
(2) Globally, at the entire-dataset level, using ReadMe [25].

Document-level Classification. Generally, sentiment classification is performed on each distinct document. This is very useful when one wants to characterize the evolution of sentiment over time, or to identify the motivations that drive people to publish their opinions. The most accurate supervised algorithms are Naive Bayes (NB) in the binary multinomial version and the Support Vector Machine (SVM) of the "one-vs-rest" type. This choice derives from several analyses in the literature showing the duality of these algorithms: Naive Bayes is characterized by rapid classification and by the need for a smaller training set than SVM, which, however, yields more accurate results when there are many training documents [26].
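A minimal sketch of the two document-level classifiers under the assumptions above: scikit-learn's `MultinomialNB` over the binary features produced by `build_vectorizer` stands in for the binary multinomial NB [23], and `LinearSVC` wrapped in `OneVsRestClassifier` stands in for the one-vs-rest SVM [24].

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def make_document_classifiers():
    # Binary multinomial NB: build_vectorizer already sets binary=True.
    nb = make_pipeline(build_vectorizer(1), MultinomialNB())
    # One-vs-rest linear SVM over the same feature space.
    svm = make_pipeline(build_vectorizer(1), OneVsRestClassifier(LinearSVC()))
    return nb, svm

# Usage: nb, svm = make_document_classifiers()
#        nb.fit(train_texts, train_labels)
#        predictions = nb.predict(test_texts)
```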
Dataset-level Classification. It is also helpful to have accurate estimates of the global sentiment of an entire dataset of documents. While state-of-the-art supervised machine learning algorithms achieve high accuracy in the classification of individual documents, they tend to provide results affected by greater error when individual results are aggregated into an overall global measure. This requirement led to the selection, as an additional classification algorithm, of the approach developed by Hopkins and King, ReadMe; this technique provides an estimate of the global sentiment proportions of a dataset with a low margin of error, exceeding the predictive performance of the other algorithms [25].
To evaluate the algorithms that classify individual text documents, the following performance metrics were measured: Accuracy (A), Precision (P), Recall (R), and F1-measure [27].
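For reference, the standard per-class definitions of these metrics in terms of true/false positives and negatives (TP, FP, TN, FN) are:

```latex
A = \frac{TP + TN}{TP + TN + FP + FN}, \quad
P = \frac{TP}{TP + FP}, \quad
R = \frac{TP}{TP + FN}, \quad
F_1 = \frac{2PR}{P + R}
```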
In order to compare and evaluate the classification algorithms that predict the overall proportions of the sentiment categories, the following statistical metrics are used: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) [25].
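In one standard formulation, with $p_c$ the true proportion of category $c$, $\hat{p}_c$ its estimate, and $C$ the number of categories (here three), these are:

```latex
\mathrm{MAE} = \frac{1}{C}\sum_{c=1}^{C}\left|\hat{p}_c - p_c\right|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{C}\sum_{c=1}^{C}\left(\hat{p}_c - p_c\right)^2}
```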

6 Results

The aim of this analysis is to identify which combination of features and training set size produces an optimal sentiment classification of short text messages related to the event "Lecce 2019".
The Root Mean Square Error was calculated for the three algorithms (SVM, NB, and ReadMe) and for the first three sets of features (1, 2, 3), varying the size of the training set from 100 to 1,000 in steps of 100.

Table 2. Root Mean Square Error for three different sets of features, varying the size of the training set

  Training set   Set 1 (uni-grams)        Set 2 (bi-grams)         Set 3 (uni+bi-grams)
  dimension      SVM    NB     ReadMe     SVM    NB     ReadMe     SVM    NB     ReadMe
  100            0.115  0.140  0.115      0.151  0.153  0.150      0.152  0.147  0.116
  200            0.136  0.142  0.094      0.144  0.145  0.112      0.142  0.142  0.098
  300            0.132  0.139  0.089      0.137  0.139  0.093      0.134  0.136  0.086
  400            0.127  0.134  0.081      0.130  0.134  0.083      0.128  0.132  0.079
  500            0.123  0.130  0.075      0.125  0.129  0.077      0.123  0.128  0.073
  600            0.119  0.125  0.071      0.121  0.125  0.073      0.119  0.123  0.071
  700            0.115  0.121  0.069      0.116  0.120  0.070      0.114  0.118  0.069
  800            0.111  0.116  0.067      0.112  0.116  0.068      0.110  0.114  0.067
  900            0.108  0.113  0.066      0.108  0.112  0.067      0.107  0.111  0.066
  1000           0.105  0.110  0.065      0.105  0.109  0.065      0.104  0.108  0.065

As seen in Table 2, the Root Mean Square Error of the three classification algorithms shows that, as the size of the training set varies, the first three feature sets perform almost the same way. The use of bi-grams alone (set 2) generates slightly worse results than the other two feature sets, while the joint use of uni-grams and bi-grams (set 3) produces a greater number of features, which slightly slows down the classification step. For these reasons, the subsequent sets were constructed starting from set 1 and adding other methods of text preprocessing and feature selection.
In all cases, ReadMe appears to be the best algorithm for the aggregated (dataset-level) sentiment classification, even with a small training set. This result validates our choice.
Fig. 1. Root mean square error values for the ReadMe algorithm using different sets of features and varying the size of the training set

Figure 1 shows how the Root Mean Square Error of the ReadMe classification varies with the feature set.
The Root Mean Square Error decreases as further feature selection and extraction methods are added to the unigrams, that is, as the number of words taken into account decreases or as they are characterized in different ways. This reduction is particularly evident from set 4 onwards, where stopword removal and repeated-character shortening yield a Root Mean Square Error of 4.2 %. However, the application of further preprocessing methods (sets 5, 6, 7) is effective only with a small training set. It can also be noted that, for sets 4, 5, 6, and 7, increasing the number of training tweets beyond 600 has no effect on the trend of the RMSE.
The reduction of the Root Mean Square Error with set 8 is rather remarkable. For this set, we reach the lowest error (2.8 %) with a training set of 700 documents. However, a training set larger than 400 tweets is not very useful in terms of error reduction, since the further RMSE reduction is about 0.1 %.
There is not much difference in computational complexity among feature sets 4, 5, 6, and 7. All these sets behave quite similarly, but set 7 gives the best results. The use of PoS tagging (set 8), instead, introduces a slowdown in the preprocessing stage, but reaches the best results among all feature sets.
The Accuracy measurements (Fig. 2), carried out for the SVM and NB algorithms using set 7, show almost the same trend for the two algorithms. This result, which sees the weaker NB close to the best state-of-the-art classification algorithm, SVM, is probably due to the kind of documents analyzed, namely short text messages (tweets), and agrees with the literature, which also points out that SVM performs better with longer texts [28].
Fig. 2. Accuracy of SVM and NB using feature set 7, varying the training set size

Fig. 3. Accuracy values of NB with different sets of features, varying the training set size

As shown in Fig. 3, and as already pointed out previously, the analysis of the accuracy for the other sets of features does not lead to significant increases. For all sets, the accuracy increases with the size of the training set, up to a value of 78 % for the NB algorithm with the use of PoS tagging (set 8).
The same trends are obtained for the Precision (P), Recall (R), and F1-measure.
In summary, for the dataset-level sentiment analysis of the tweets, the choice of unigram features with stopword removal, repeated-character shortening, stemming, emoticon replacement, hashtag inclusion with "#" removal, and PoS tagging (set 8) proved to be the most successful. We believe that the best number of tweets to include in the training set, as a compromise between error and human labor, is 300 when using feature set 8, or 500 when using feature sets 4, 5, 6, or 7. However, even with 200–300 training tweets, good results can be achieved.
For document-level classification, an accuracy of 78 % is achieved with the NB algorithm using the 1,000-tweet training set and feature set 8; however, for document-level classification it is sufficient to use set 6 or 7, since these generate almost the same accuracy values as set 8 while eliminating the slowness of the PoS tagging step. Here, unlike the dataset-level classification, there seems to be no flattening of the accuracy increase as the training set grows. So, up to a certain limit, the more training tweets are available, the more accurate the sentiment classification. However, we believe that a training set of about 300–400 tweets can generate acceptable results.
In both types of classification, the use of PoS tagging must be carefully weighed against the type of application: if the goal is real-time sentiment analysis, this preprocessing step should be avoided; otherwise it can be used.

7 Conclusion

Following the state-of-the-art experience on algorithms for sentiment classification, this paper proposes an optimized approach for the analysis of tweets related to a public administration event. The possibility of extracting opinions from social networks and classifying their sentiment using different machine learning algorithms makes this a valuable decision support tool for government.
To meet this need, this paper proposes an approach that considers document-level and dataset-level sentiment classification algorithms to maximize the accuracy of the results in both single-document and aggregated sentiment classification. The work also points out which feature sets produce better results, relative to the size of the training set and to the level of classification.
We have introduced a new dataset of 1,700 tweets relating to the public event of "Lecce 2019: European Capital of Culture". Each tweet in this set has been manually annotated as carrying positive, negative, or neutral sentiment.
An accuracy of 78 % is achieved using the NB document-level sentiment classification algorithm with unigram features plus stopword removal, repeated-character shortening, stemming, emoticon replacement, hashtag inclusion with "#" removal, and PoS tagging, with a training set of 1,000 tweets. A training set of 300–400 tweets can be a reasonable lower limit for achieving acceptable results.
Our best overall result for dataset-level classification is obtained with the ReadMe approach, using a feature set that also includes PoS tagging and a training set of 700 tweets. With this optimal feature set, the dataset-level sentiment classification reports a low Root Mean Square Error of 2.8 %. However, almost the same results can be obtained with a training set of 400 tweets.
In a context such as public administration, the emotional aspect of opinions can be crucial. Future work involves developing algorithms that extract and detect the types of emotions or moods of citizens, in order to support decision making in the public administration.
References
1. Jansen, B., Zhang, M., Sobel, K., Chowdury, A.: Twitter power: tweets as electronic word of
mouth. J. Am. Soc. Inf. Sci. Technol. 60(11), 2169–2188 (2009)
2. O’Connor, B., Balasubramanyan, R., Routledge, B., Smith, N.: From tweets to polls: linking
text sentiment to public opinion time series. In: Proceedings of the Fourth International
Conference on Weblogs and Social Media, ICWSM 2010, Washington, DC, USA (2010)
3. Tumasjan, A., Sprenger, T., Sandner, P., Welpe, I.: Predicting elections with Twitter: what
140 characters reveal about political sentiment. In: Proceedings of the Fourth International
Conference on Weblogs and Social Media, ICWSM 2010, Washington, DC, USA (2010)
4. Kouloumpis, E., Wilson, T., Moore, J.: Twitter sentiment analysis: the good the bad and the
OMG! In: Proceedings of the Fifth International Conference on Weblogs and Social Media,
ICWSM 2011, Barcelona, Catalonia, Spain (2011)
5. Salathe, M., Khandelwal, S.: Assessing vaccination sentiments with online social media: implications for infectious disease dynamics and control. PLoS Comput. Biol. 7(10), e1002199 (2011)
6. Mandel, B., Culotta, A., Boulahanis, J., Stark, D., Lewis, B., Rodrigue, J.: A demographic analysis of online sentiment during Hurricane Irene. In: Proceedings of the Second Workshop on Language in Social Media, LSM 2012, Stroudsburg (2012)
7. Xu, J.-M., Jun, K.-S., Zhu, X., Bellmore, A.: Learning from bullying traces in social media.
In: HLT-NAACL, pp. 656–666 (2012)
8. Asur, S., Huberman, B.A.: Predicting the future with social media. In: Proceedings of the 2010 International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2010, vol. 01, pp. 492–499. IEEE Computer Society, Washington, D.C., USA (2010)
9. Bakliwal, A., Foster, J., van der Puil, J., O’Brien, R., Tounsi, L., Hughes, M.: Sentiment
analysis of political tweets: towards an accurate classifier. In: Proceedings of the Workshop
on Language in Social Media (LASM 2013), pp. 49–58. Atlanta, Georgia (2013)
10. Das, S.R., Chen, M.Y.: Yahoo! for Amazon: extracting market sentiment from stock message boards. In: Proceedings of the Asia Pacific Finance Association Annual Conference (APFA) (2001)
11. Tong, R.M.: An operational system for detecting and tracking opinions in on-line discussion.
In: Proceedings of the SIGIR Workshop on Operational Text Classification (OTC) (2001)
12. Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: opinion extraction and
semantic classification of product reviews. In: Proceedings of WWW, pp. 519–528 (2003)
13. Neri, F., Aliprandi, C., Camillo, F.: Mining the web to monitor the political consensus. In:
Wiil, U.K. (ed.) Counterterrorism and Open Source Intelligence. LNSN, pp. 391–412.
Springer, Vienna (2011)
14. Kale, A., Karandikar, A., Kolari, P., Java, A., Finin, T., Joshi, A.: Modeling trust and
influence in the blogosphere using link polarity. In: Proceedings of the International
Conference on Weblogs and Social Media (ICWSM) (2007)
15. Dolicanin, C., Kajan, E., Randjelovic, D.: Handbook of Research on Democratic Strategies
and Citizen-Centered E-Government Services, pp. 231–249. IGI Global, Hersey (2014)
16. Chesbrough, H.: Open Services Innovation. Wiley, New York (2011)
17. http://ec.europa.eu/programmes/creative-europe/actions/capitals-culture_en.htm
18. http://www.capitalicultura.beniculturali.it/index.php?it/108/suggerimenti-per-redigere-una-proposta-progettuale-di-successo
19. http://www.lecce2019.it/2019/utopie.php
20. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977)
21. Hopkins, D., King, G.: Extracting systematic social science meaning from text. Unpublished
manuscript, Harvard University (2007). http://gking.harvard.edu/files/abs/words-abs.shtml
22. Liu, B.: Sentiment analysis and opinion mining. Synthesis Lectures on Human Language
Technologies. Morgan & Claypool Publishers (2012)
23. Narayanan, V., Arora, I., Bhatia, A.: Fast and accurate sentiment classification using an
enhanced naive Bayes model. In: Yin, H., Tang, K., Gao, Y., Klawonn, F., Lee, M., Weise,
T., Li, B., Yao, X. (eds.) IDEAL 2013. LNCS, vol. 8206, pp. 194–201. Springer, Heidelberg
(2013)
24. Yang, Y., Xu, C., Ren, G.: Sentiment Analysis of Text Using SVM. In: Wang, X., Wang, F.,
Zhong, S. (eds.) EIEM 2011. LNEE, vol. 138, pp. 1133–1139. Springer, London (2011)
25. King, G., Hopkins, D.: A method of automated nonparametric content analysis for social science. Am. J. Polit. Sci. 54(1), 229–247 (2010)
26. Medhat, W., Hassan, A., Korashy, H.: Sentiment analysis algorithms and applications: a survey. Ain Shams Eng. J. 5(4), 1093–1113 (2014)
27. Huang, J.: Performance Measures of Machine Learning. University of Western Ontario,
Ontario (2006)
28. Wang, S., Manning, C.D.: Baselines and bigrams: simple, good sentiment and topic
classification. In: Proceedings of the 50th Annual Meeting of the Association for
Computational Linguistics: Short Papers, vol. 2. Association for Computational
Linguistics (2012)
29. Refaeilzadeh, P., Tang, L., Liu, H.: Cross-validation. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, pp. 532–538. Springer, New York (2009)
