Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 117 (2017) 38–45

www.elsevier.com/locate/procedia

3rd International Conference on Arabic Computational Linguistics, ACLing 2017, 5–6 November 2017, Dubai, United Arab Emirates
Abstract

Sentiment analysis can help analyse trending topics, such as political crises, and predict them before they occur. Yet sentiment analysis of Arabic texts has not been explored much in the extant literature. In this paper, we present a new tool that applies sentiment analysis to Arabic tweets using a combination of parameters: (1) the time of the tweets, (2) preprocessing methods such as stemming and retweet removal, (3) n-gram features, (4) lexicon-based methods, and (5) machine-learning methods. Users can select a topic and set their desired parameters. The model detects the polarity (negative, positive, both, or neutral) of the topic from recent related tweets and displays the results. The tool is trained with 8000 randomly selected and evenly labelled Arabic tweets. Our experiments show that the Naive Bayes machine-learning approach is the most accurate in predicting topic polarity. The tool is useful for intermediate and expert users and can help guide them in choosing the best combinations of parameters for sentiment analysis.
1. Introduction
The social media platform Twitter contains rich and important information. It is often used by its users as an outlet to express sentiment. Indeed, Jansen et al. [19] identified Twitter as a form of online word-of-mouth branding due to the vast amount of opinions it carries. Twitter is multi-domain and covers a broad set of topics, including politics, education, and products. One way to analyse the large number of opinions on Twitter is to apply sentiment analysis. Sentiment analysis is an application of natural language processing, computational linguistics, and text analytics that classifies text by polarity (i.e., positive, negative, neutral) and emotion (e.g., angry, sad, happy).
Sentiment analysis is dependent on the language of the text, as sentiment analysis models are trained on text from the same language. Moreover, some words can imply a certain polarity in one culture and a different one in another. For example, the colour white represents death in Eastern Asian cultures, which is associated with a negative polarity, whereas in Western cultures it usually represents peace and weddings, which is a positive polarity [9].

∗ Mazen El-Masri
E-mail address: mazen.elmasri@qu.edu.qa
Research in sentiment analysis for the Arabic language is limited, as most previous research has focused on English. Arabic has linguistic features that differ markedly from English: the two languages differ in both structure and grammar. Arabic is a complex language and includes several dialects, including Egyptian, Moroccan, Levantine, Iraqi, Gulf, and Yemeni [12].
In this paper, we present a tool for Arabic sentiment analysis. The tool allows users to apply sentiment analysis to a given topic and to select parameters including the time of the tweets, preprocessing methods, features, and machine-learning techniques. It can also educate users in choosing the best combinations of sentiment analysis parameters for a given topic.
Related research about Arabic sentiment analysis is presented in section 2. The dataset used to train the sentiment
analysis models is outlined in section 3. The Arabic sentiment analysis tool is presented in section 4, followed by the
experiments and results in section 5. Lastly, we discuss our findings in section 6 and future work in section 7.
2. Related Research
Web-based tools in sentiment analysis can serve many purposes. One study by Al-Subaihin et al. [5] created a web-based tool that allows users to determine the polarity of Arabic words as a game, which is useful for researchers creating a lexicon. Existing research exploring Arabic sentiment analysis is limited to a few studies, specifically when applied to Twitter. Related research includes Shoukry and Rafea [26], Duwairi [14], Al-Ayyoub et al. [3], and Al-Kabi et al. [4]; these are discussed in detail in the following paragraphs.
Arabic sentiment analysis has been explored on a sentence-level basis [26]. Two human annotators labelled tweets by polarity, resulting in 500 positive and 500 negative tweets. Several preprocessing methods were used, including removing user names, pictures, hashtags, URLs, and non-Arabic words. They experimented with two features (unigrams and bigrams) and two classifiers (SVM and Naive Bayes). Two experiments were implemented: one with stop words removed and one without. Removing stop words led to only a slight improvement in performance, suggesting that some stop words are valuable to the sentiment or that other stop words needed to be removed. The results show that SVM outperformed NB, giving a 4–6% increase in accuracy. The best model was SVM with unigrams, which achieved an accuracy of 72%.
Different Arabic dialects were explored recently in sentiment analysis research [14]. The dataset consisted of 22,550 tweets: 8,529 positive, 7,021 negative, and 7,000 neutral. The Twitter API was used to collect the data and a crowdsourcing tool was used to label it. The preprocessing methods included tokenization, removing stop words (except negation words), and converting emoticons to their corresponding words. They ran two experiments using two datasets: one consisting of the tweets without removal of dialectal words, and another in which dialectal words were replaced with their Modern Standard Arabic equivalents. The classifiers were Naive Bayes (NB) and SVM. Their results show that NB was the best classifier for their application, achieving an F-score of 88% with the dialect lexicon and 84% without it. SVM was second best, achieving an F-score of 87% with the dialect lexicon and 84% without it.
An example of a study that applied lexicon-based sentiment analysis to tweets is Al-Ayyoub et al. [3]. They created a lexicon of 120 thousand Arabic words, which incorporated a previous lexicon created by Abuaiadh [1]. They analysed Arabic news opinions from various websites, including Twitter. Distinct Arabic stems were extracted, translated, and then searched for in the lexicon. They used 300 tweets for each of the positive, negative, and neutral classes as training data. The preprocessing methods included removing repeated vowels, fixing spelling mistakes, and fixing mistakes caused by sound similarities. Two experiments were implemented: (1) lexicon-based sentiment analysis using the lexicon they created; and (2) a keyword-based approach selecting the most frequent words in the tweet. Their results revealed that the lexicon-based approach led to the highest performance, with an accuracy of 87%.
An Arabic sentiment analysis tool was developed by Al-Kabi et al. [4]. For training data, they used 1,080 Arabic reviews from social media and news sites. Their preprocessing included removing transliterated Arabic words such as momtaz (i.e., meaning 'excellent' in English) and Arabizi (the Arabic chat alphabet). Punctuation and non-alphabet characters were removed and some Arabic letters were normalised. They used KNN as a classifier and achieved an accuracy of around 90% with K=1.
Our analysis of the extant literature shows that machine-learning, lexicon-based, and combined methods can all lead to good sentiment analysis performance. Therefore, it is worthwhile to include all three methods in a sentiment analysis system. The combined method may perform better with dialectal data, as in Duwairi [14]. Also, all of the studies above used preprocessing, which may have contributed to their performance levels. It is difficult to determine which features lead to the best performance, and this may vary from one domain to another. Hence, when creating such a system, all features should be available to users.
3. Data Collection
As sentiment analysis depends on labelled training data, we collected data from Twitter. Originally, we collected 152,477 tweets, of which only 64,342 were unique. The tweets were collected using trending keywords from different dialects, mostly Egyptian; these keywords were automatically refreshed every hour. We labelled the tweets using previously created lexicons (see Table 1). To combine these lexicons and remove duplicates, we used an algorithm that gives priority to positive and negative polarities over neutral [7]. The tweets were labelled as positive, negative, neutral, or both (i.e., a label given when a tweet contains both positive and negative sentiment equally). We did not use any method to test the accuracy of the labels output by the lexicons, as the focus of this paper is to present the web-based tool. Due to the unbalanced class distribution, we randomly selected 2,000 tweets per class; consequently, our dataset consisted of 8,000 tweets.
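The priority rule used to combine the lexicon labels can be sketched as follows. This is an illustrative reading of the algorithm in [7]; the function name and label strings are our own, and the handling of the "both" class is simplified:

```python
def combine_labels(labels):
    """Combine per-lexicon polarity labels for one tweet into a single label,
    giving positive/negative priority over neutral (illustrative sketch)."""
    has_pos = "positive" in labels
    has_neg = "negative" in labels
    if has_pos and has_neg:
        return "both"      # positive and negative evidence together
    if has_pos:
        return "positive"  # positive outranks neutral
    if has_neg:
        return "negative"  # negative outranks neutral
    return "neutral"       # only neutral labels remain

# Example: a neutral vote does not override a positive one.
print(combine_labels(["neutral", "positive"]))
```

A tweet marked neutral by one lexicon and positive by another is thus kept as positive, which matches the stated priority of polar labels over neutral.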
4. The Arabic Sentiment Analysis Tool

Our tool is illustrated in Figure 1. It was created using the R language with several packages, including RWeka, shiny, and twitteR. RWeka, created by Hornik et al. [17], is an R interface to Weka, a collection of machine-learning algorithms for data mining tasks written in Java. Shiny, developed by Chang et al. [11], is a web application framework for R that allows users to deploy and share their applications online. The twitteR package, created by Gentry [15], provides an interface to the Twitter API. We supply users with different choices, discussed below, which depend on their level of experience in sentiment analysis.
The first part of the tool, as illustrated in Figure 1, asks users to input the topic. We set an example with the word “Qatar”. In our results, we can see that 44% of the tweets collected for the word “Qatar” are negative, which could be due to the current political situation. We do not set the domain of the tweets in this system, meaning that the tweets could come from multiple domains.
The second part of the system is setting the time input for the tweet collection. The time input consists of dates and defaults to a week before the current date. This can be useful for trending topics, to collect only recent tweets about a certain topic.
The third part of the system is the sentiment analysis itself. Sentiment analysis can be performed using a lexicon-based method or a machine-learning one. The machine-learning method consists of four main steps: collecting the data, preprocessing it, selecting the features, and applying the machine-learning techniques. The lexicon-based method determines the polarity from lexicons.
4.1. Preprocessing
Preprocessing the data can improve sentiment analysis performance by reducing errors in the data. It is a way to clean the data of unwanted elements. Without preprocessing, sentiment analysis models can ignore important words, which negatively impacts the accuracy of the results. On the other hand, preprocessing too extensively may cause loss of important data. One example of erroneous over-preprocessing is the removal of punctuation that could be valuable to the analysis. There are many general preprocessing techniques, of which the most common are removing stop words, punctuation, repeated letters, and numbers, and converting text to lower or upper case.

Users can select whether or not to preprocess the data. Some data is best left unprocessed, as it may hold valuable information; for example, in the education domain, question marks can indicate confusion [6]. In the application, we include several preprocessing techniques, such as removing numbers, punctuation, URLs, hashtags, usernames, retweets, and extra spaces. We also include a stemming option.
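As a rough sketch, the cleaning steps above could be implemented with regular expressions. The patterns below are illustrative, not the exact ones used by the tool; note that in Python 3, `\w` is Unicode-aware and so matches Arabic letters:

```python
import re

def preprocess(tweet):
    """Apply the cleaning steps listed above (illustrative patterns)."""
    tweet = re.sub(r"http\S+", " ", tweet)     # remove URLs
    tweet = re.sub(r"RT\s+", " ", tweet)       # remove retweet markers
    tweet = re.sub(r"@\w+", " ", tweet)        # remove usernames
    tweet = re.sub(r"#", " ", tweet)           # drop hashtag symbol, keep the word
    tweet = re.sub(r"\d+", " ", tweet)         # remove numbers
    tweet = re.sub(r"[^\w\s]", " ", tweet)     # remove punctuation
    return re.sub(r"\s+", " ", tweet).strip()  # collapse extra spaces

print(preprocess("RT @user: Check http://t.co/x #news 2017!!"))  # → "Check news"
```

Keeping the word behind a hashtag (rather than deleting the whole token) is one possible design choice, since the tag itself often carries sentiment.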
4.2. Features

Features are the inputs of the classifiers. They allow a more detailed analysis of the raw data. N-grams are among the most commonly used features [16, 2, 28]. An n-gram is a sequence of n items (i.e., letters, syllables, or words) from a text; they are most commonly based on words. The unigram (one word) is the most common n-gram, followed by the bigram (two words) and the trigram (three words).
Previous research has found unigrams to perform best [25, 22]. Shoukry and Rafea [26] found that bigrams did not lead to any improvement, possibly due to their sparsity and the fact that the bigrams found were not relevant. Mountassir et al. [21] and Rushdi-Saleh et al. [24] found that trigrams led to the best performance. The best type of n-gram also depends on the domain, as in some domains phrases are more popular for expressing sentiment. For example, one study using the Egyptian dialect found many phrases, particularly in the political domain, that represent sentiment [18].

Results from past research thus show that the type of n-gram that best improves accuracy depends on a number of factors related to the data. Accordingly, our tool gives users the power to choose one of three features: unigrams, bigrams, and trigrams.
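Extracting word n-grams takes only a few lines of code. The generic sketch below is not tied to the tool's RWeka implementation; it just illustrates how the three feature types relate:

```python
def ngrams(text, n):
    """Return the word n-grams of a text as a list of tuples."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "the food was not good"
print(ngrams(sentence, 1))  # unigrams: one tuple per word
print(ngrams(sentence, 2))  # bigrams: pairs such as ('not', 'good')
```

The bigram ('not', 'good') illustrates why larger n-grams can matter: a unigram model sees "good" in isolation and misses the negation.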
4.3. Sentiment Analysis Methods

There are three different sentiment analysis methods: lexicon-based, machine-learning, and a combined approach. Our tool includes the lexicon-based approach and machine-learning techniques; we intend to add a combined-approach option in the near future. In the subsections below we describe the approaches. In the future, we will also experiment with different combinations to find the best models for many domains.
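A minimal sketch of the lexicon-based approach with the four classes used by the tool is given below. The example lexicon entries are invented for illustration, and real lexicons would also weight terms rather than simply count them:

```python
# Toy lexicons; a real lexicon would contain thousands of Arabic terms.
POSITIVE = {"excellent", "good", "happy"}
NEGATIVE = {"bad", "sad", "crisis"}

def lexicon_polarity(tokens):
    """Assign one of the four polarity classes by counting lexicon hits."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos and pos == neg:
        return "both"      # equal positive and negative evidence
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"       # no lexicon hits at all
```

For instance, a tweet containing one positive and one negative term falls into the "both" class, matching the labelling scheme described in Section 3.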
5. Experiments and Results

In this paper, we also compare the performance of the lexicon-based and machine-learning methods. This is a preliminary study to evaluate the performance of lexicon-based and machine-learning-based sentiment models. For the lexicon-based method, we experiment with the lexicon created by the human annotators from tweets. As aforementioned, we only used part of the lexicon, due to the large running time that the full lexicon would require. For the machine-learning method, we experiment with both Naive Bayes and SVM. In both experiments, we preprocessed the data and used unigrams as the feature; these were chosen due to their common usage in previous research.
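To illustrate the machine-learning method, the sketch below implements a minimal multinomial Naive Bayes over unigram counts with Laplace smoothing. The tool itself uses the Weka implementations via RWeka, so this is a conceptual illustration only:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns priors, per-class counts, vocabulary."""
    priors, counts, vocab = Counter(), defaultdict(Counter), set()
    for tokens, label in docs:
        priors[label] += 1
        counts[label].update(tokens)
        vocab.update(tokens)
    return priors, counts, vocab

def classify_nb(tokens, priors, counts, vocab):
    """Return the most probable label under the multinomial model."""
    total = sum(priors.values())
    best, best_lp = None, float("-inf")
    for label in priors:
        lp = math.log(priors[label] / total)                # class prior
        denom = sum(counts[label].values()) + len(vocab)    # smoothed denominator
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

With unigram features, each tweet is reduced to its bag of words before training, which is exactly the feature setting used in the experiments here.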
The tool was run three times for each of the lexicon and the classifiers. Twenty-five tweets were extracted and then labelled manually to compute the accuracy, precision, and recall of these sentiment analysis models. Table 2 presents the results for the lexicon. As the results show, the average was good for the neutral and both classes and lower for the positive and negative ones. This was surprising, considering that the majority of the data was in the positive and negative classes. Recall was high and precision was low across all classes, suggesting that the lexicon returned most of the relevant results. The lexicon could be improved in the future by adding more terms.
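Per-class precision and recall are computed in the usual one-vs-rest fashion. The sketch below shows the standard formulas (not the tool's own code):

```python
def per_class_metrics(gold, pred, cls):
    """One-vs-rest precision and recall for a single class."""
    tp = sum(g == cls and p == cls for g, p in zip(gold, pred))  # true positives
    fp = sum(g != cls and p == cls for g, p in zip(gold, pred))  # false positives
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def accuracy(gold, pred):
    """Fraction of predictions matching the manual labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

High recall with low precision, as observed for the lexicon, means few false negatives (fn) but many false positives (fp): the lexicon over-assigns each polar class.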
As for the machine-learning results, shown in Table 3, the accuracy of Naive Bayes was 70% while that of SVM was only 34%. Our SVM results contradict other research in Arabic sentiment analysis. The results could be low because there were four classes instead of three or two (i.e., positive and negative). Also, the training data was labelled using previous lexicons and was not checked and filtered by human annotation, which could introduce errors.
6. Discussion
Web-based sentiment analysis tools can help users detect the polarity of a topic or sentence. Most web-based tools are commercial, and the mechanism by which the polarity is derived is unknown. To the best of our knowledge, no Arabic web-based tool exists for this purpose. Arabic sentiment analysis is important, as there are driving changes in the Middle Eastern region, such as heavy political movements (e.g., the Arab Spring). These movements affect the entire world due to the number of refugees and unstable country statuses. This tool can be useful for addressing problems in the healthcare and business sectors: governments and businesses could tap into this knowledge to improve their performance and the satisfaction of their people or customers.
Examples of web-based tools in research include [10, 8]. Chamlertwat et al. [10] created a tool that collected tweets, filtered them for opinionated posts, detected their polarity, categorised them into product features, and then visualised the results. This study was done on the English language and focused on the product domain. In contrast, our tool is generic, can be used over multiple domains, and is focused on Arabic text.
iFeel is another tool, which gave users access to seven existing sentiment analysis methods: SentiWordNet, Emoticons, PANAS-t, SASA, Happiness Index, SenticNet, and SentiStrength, and allowed users to combine these methods to achieve higher performance [8]. The tool allowed users to upload text, apply sentiment analysis, and visualise the results. Our tool is similar but also lets users adapt the sentiment analysis models, including preprocessing the text and selecting the features.
There are also many commercial websites that provide sentiment analysis online⁴,⁵. Most of these websites do not show how they classify the sentiment, and their accuracy and performance are unknown. One website allows users to input text in three languages: English, Dutch, and French². Another website provides users with sentiment scores³. None of these websites analysed Arabic text.
There are several limitations to our research. One limitation is that only 25 tweets were used to compare the results of the lexicon-based and machine-learning methods; in the future we will experiment with a larger number of tweets. Another limitation is that some of the tool's functions are not operable at the moment, for example the Maximum Entropy classifier and the combined approach. Additionally, we did not experiment with all the sentiment analysis components to test which model is best. In the future, we will evaluate the tool in more depth.
7. Conclusion and Future Work

In this research, we created a web-based sentiment analysis tool to analyse Arabic sentiment. While many commercial websites exist for English sentiment analysis, this is, to our knowledge, the first such tool for the Arabic language. The tool allows users to select from different parameters (i.e., time of the tweets, preprocessing, features, and machine-learning techniques) and then visualises the results. It also provides users with the accuracy of the models. The tool can be used to analyse any topic given by the user, as it was trained on Twitter data spanning a wide range of domains. The tool is useful for intermediate and experienced users. It will be more beneficial for experienced users, as they can determine the best combinations of parameters for a certain type of topic. Intermediate users can also benefit from it and learn about the outputs produced when selecting different parameters.
The tool was created using the R language and various R packages. We used 8,000 randomly selected and evenly labelled Arabic tweets to train the models in the tool. The tool allows users to choose any topic and is not restricted to a single domain, unlike previous studies. We will make the tool publicly available in the near future.
We compared a lexicon-based method, using a lexicon we created from tweets, with machine-learning methods using Naive Bayes and SVM. We found that the NB results were good, whereas the SVM results were extremely low. We also found that the lexicon led to considerably good results given its small size.
In the future, we will evaluate our tool with users to assess its usefulness and effectiveness in providing sentiment analysis. The tool can also be improved by adding more parameters, such as domain, and more features and classifiers.
Acknowledgements
This publication was made possible by the NPRP award [NPRP 7-1334-6-039 PR3] from the Qatar National
Research Fund (a member of The Qatar Foundation). The statements made herein are solely the responsibility of the
author[s].
References
[1] Abuaiadh, D., 2011. Dataset for arabic document classification. URL: http://diab.edublogs.org/dataset-for-arabic-document-classification.
2 http://text-processing.com/demo/sentiment/
3 http://www.danielsoper.com/sentimentanalysis/
4 http://sentiment.vivekn.com/
5 http://textanalysisonline.com/textblob-sentiment-analysis
[2] Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R., 2011. Sentiment analysis of twitter data, in: Proceedings of the Workshop on
Languages in Social Media, Association for Computational Linguistics, Stroudsburg, PA, USA. pp. 30–38.
[3] Al-Ayyoub, M., Essa, S.B., Alsmadi, I., 2015. Lexicon-based sentiment analysis of arabic tweets. International Journal of Social Network
Mining 2, 101–114.
[4] Al-Kabi, M., Gigieh, A., Alsmadi, I., Wahsheh, H., Haidar, M., 2013. An opinion analysis tool for colloquial and standard arabic, in: The
Fourth International Conference on Information and Communication Systems (ICICS 2013), pp. 23–25.
[5] Al-Subaihin, A.A., Al-Khalifa, H.S., Al-Salman, A.S., 2011. A proposed sentiment analysis tool for modern arabic using human-based
computing, in: Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services, ACM.
pp. 543–546.
[6] Altrabsheh, N., Cocea, M., Fallahkhair, S., 2015. Predicting students emotions using machine learning techniques, in: The 17th International
Conference on Artificial Intelligence in Education.
[7] Altrabsheh, N., El-Masri, M., Mansour, H., 2017. Combining sentiment lexicons of arabic terms, in: 23rd Americas Conference on Information
Systems.
[8] Araújo, M., Gonçalves, P., Cha, M., Benevenuto, F., 2014. ifeel: a system that compares and combines sentiment analysis methods, in:
Proceedings of the 23rd International Conference on World Wide Web, ACM. pp. 75–78.
[9] Aslam, M.M., 2006. Are you selling the right colour? a cross-cultural review of colour as a marketing cue. Journal of Marketing Communications 12, 15–30.
[10] Chamlertwat, W., Bhattarakosol, P., Rungkasiri, T., Haruechaiyasak, C., 2012. Discovering consumer insight from twitter via sentiment
analysis. J. UCS 18, 973–992.
[11] Chang, W., Cheng, J., Allaire, J., Xie, Y., McPherson, J., 2015. shiny: Web application framework for R. URL: http://CRAN.R-project.org/package=shiny. R package version 0.11.
[12] Darwish, K., Magdy, W., Mourad, A., 2012. Language processing for arabic microblog retrieval, in: Proceedings of the 21st ACM international
conference on Information and knowledge management, ACM. pp. 2427–2430.
[13] Duwairi, R., El-Orfali, M., 2014. A study of the effects of preprocessing strategies on sentiment analysis for arabic text. Journal of Information
Science 40, 501–513.
[14] Duwairi, R.M., 2015. Sentiment analysis for dialectical arabic, in: Information and Communication Systems (ICICS), 2015 6th International
Conference on, IEEE. pp. 166–170.
[15] Gentry, J., 2012. twitteR: R based Twitter client. R package version 0.99.19.
[16] Go, A., Bhayani, R., Huang, L., 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford. URL: http://s3.eddieoz.com/docs/sentiment_analysis/Twitter_Sentiment_Classification_using_Distant_Supervision.pdf.
[17] Hornik, K., Karatzoglou, D.M., Zeileis, A., Hornik, M.K., 2007. The RWeka package.
[18] Ibrahim, H.S., Abdou, S.M., Gheith, M., 2015. Sentiment analysis for modern standard arabic and colloquial. arXiv preprint arXiv:1505.03105.
[19] Jansen, B.J., Zhang, M., Sobel, K., Chowdury, A., 2009. Twitter power: Tweets as electronic word of mouth. Journal of the American society
for information science and technology 60, 2169–2188.
[20] Lewandowsky, S., Spence, I., 1989. The perception of statistical graphs. Sociological Methods & Research 18, 200–242.
[21] Mountassir, A., Benbrahim, H., Berrada, I., 2012. Sentiment classification on arabic corpora: preliminary results of a cross-study. 3e Séminaire de Veille Stratégique, Scientifique et Technologique (VSST'12).
[22] Oraby, S., El-Sonbaty, Y., El-Nasr, M.A., 2013. Exploring the effects of word roots for arabic sentiment analysis., in: IJCNLP, pp. 471–479.
[23] Pang, B., Lee, L., 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. Annual
Meeting on Association for Computational Linguistics 42, 271–278.
[24] Rushdi-Saleh, M., Martín-Valdivia, M.T., Ureña-López, L.A., Perea-Ortega, J.M., 2011. Oca: Opinion corpus for arabic. Journal of the American Society for Information Science and Technology 62, 2045–2054.
[25] Saif, H., He, Y., Alani, H., 2012. Semantic sentiment analysis of twitter, in: International Semantic Web Conference, Springer. pp. 508–524.
[26] Shoukry, A., Rafea, A., 2012. Sentence-level arabic sentiment analysis, in: Collaboration Technologies and Systems (CTS), 2012 International
Conference on, IEEE. pp. 546–550.
[27] Toth, T., 2006. Graphing data for decision making. Technical Reports. URL: http://udspace.udel.edu/handle/19716/2666.
[28] Wang, W., Wu, J., 2011. Emotion recognition based on cso&svm in e-learning. International Conference on Natural Computation (ICNC) 7,
566–570.