
A Machine Learning Approach for Sentiment Analysis in the Standard or Dialectal Arabic Facebook Comments
Abdeljalil Elouardighi* 1,2, Mohcine Maghfour 1, Hafdalla Hammia 1, Fatima-zahra Aazi 3
1 LM2CE laboratory, FSJES, Hassan 1st University, Settat, Morocco
2 LRIT Laboratory, FSR, Mohammed V University, Rabat, Morocco
3 ESCA Ecole de Management, Casablanca, Morocco
abdeljalil.elouardighi@uhp.ac.ma, m.maghfour@uhp.ac.ma, hhammia@gmail.com, faazi@esca.ma

Abstract—Social networks like Facebook contain an enormous amount of data, called Big Data. Extracting valuable information and trends from these data allows a better understanding and decision-making. In general, there are two categories of approaches to address this problem: Machine Learning approaches and lexicon-based approaches. This work deals with sentiment analysis for Facebook comments written and shared in the Arabic language (Modern Standard or Dialectal) from a Machine Learning perspective. The process starts by collecting and preparing the Arabic Facebook comments. Then, several combinations of extraction schemes (n-grams) and weighting schemes (TF / TF-IDF) for features construction are conducted to ensure the highest performance of the developed classification models. In addition, to reduce the dimensionality and improve the classification performance, a features selection method is applied. Three supervised classification algorithms have been used: Naive Bayes, Random Forests and Support Vector Machines, using the R software. Our Machine Learning approach for sentiment analysis was implemented with the purpose of analyzing the Facebook comments, written in Modern Standard Arabic or in Moroccan Dialectal Arabic, on Morocco's Legislative Elections of 2016. The results obtained are promising and encourage us to continue working on this subject.

Index Terms—Natural Language Processing; Sentiment Analysis; Machine learning approach; Modern Standard Arabic; Moroccan Dialectal; Feature construction; Feature selection.

I. INTRODUCTION

The transition into Internet communication in discussion forums, blogs and social networks like Facebook and Twitter provides new opportunities to improve the exploration of information via Sentiment Analysis (SA). In social media, people share their experiences and opinions or just talk about all that concerns them online. The increasing expansion of the contents and services of social media provides an enormous collection of textual resources and presents an excellent opportunity to understand the public's sentiment by analysing its data.

SA, or Opinion Mining, has become a large domain of study. It tries to analyze the opinions, sentiments, attitudes and emotions of people on different subjects such as products, services, organizations, etc. Indeed, the messages and comments shared in social networks on a subject, an event or a phenomenon are among the most important information sources, which can be extracted and exploited for SA [1].

However, these social network data are unstructured, informal and rapidly evolving. Their volume, variety and velocity are big challenges for analysis methods based on traditional techniques, and collecting and processing these raw data to extract useful information is itself a challenge.

The SA of Arabic texts published in social networks like Facebook presents the same difficulties and challenges, and even more, given the specificities of the Arabic language. The shared comments are most of the time unstructured texts full of irregularities, which requires more cleaning and preprocessing work.

In general, the Arabic language used on Facebook takes the form of Modern Standard Arabic (MSA) or Dialectal Arabic (DA) [2]. And, unlike other languages, like English, the Arabic language has more complex particularities, from its richer vocabulary to its more compact aspects. For example, a single Arabic word can consist of a whole sentence condensed into one entity: "and we gave it to you to drink". Diacritics (vowels) are also one of the important characteristics of the Arabic language, since they can radically change the meaning of words: the same written word with different diacritics can mean "to ride", "to form", "is formed", "level" or "knees". In this work, we did not handle this situation because, in all the comments we collected, diacritics were not used.

Thus, the morphological complexity of the Arabic language and the dialectal variety require advanced pre-processing, especially given the lack of published works and specific tools for pre-processing Arabic texts [3].

Indeed, several studies have been carried out for SA using messages shared in social networks and written in English, French, etc. However, few studies have been conducted for SA based on the Arabic language.

This paper is concerned with studying SA on Facebook comments written in MSA or in MDA (Moroccan Dialectal Arabic). The goal is to highlight and overcome the main
challenges facing Arabic SA of social media, and then to examine different classification models.

Our main contributions in this work can be summarized as follows:

• Describing the properties of MSA and MDA and their challenges for SA;
• Presenting a set of pre-processing techniques for Facebook comments written in MSA or in MDA for SA;
• Constructing and selecting features (words or groups of words) from Facebook comments written in MSA or MDA, which allows us to obtain the best sentiment classification model.

The rest of this paper is organized as follows. In Section 2, we introduce some related works and briefly present their proposed contributions. In Section 3, we describe in detail our machine learning process and its implementation for SA of Facebook comments written in MSA or in MDA; we also describe the methods used for feature extraction and selection (words or groups of words). The results of the conducted experiments are given in Section 4. A conclusion and some perspectives of this work are discussed in Section 5.

II. RELATED WORK

The studies using SA on social networks may be grouped into two categories. The first one contains the methods based on lexicons of words. They consist in using a predefined collection (lexicon) of words, each one annotated with a sentiment. The second category consists of machine learning based approaches. Below, we briefly describe some related works.

Within the first category, Abdul-Mageed and Diab presented a manually annotated corpus developed from Modern Standard Arabic with a new polarity lexicon [4]. They proposed two steps for classification: the first one was to construct a binary classifier to sort subjective cases; the second one was to apply binary classification to distinguish the positive cases from the negative ones. Nabil et al. [5] introduced four datasets in their work to build a multi-domain Arabic resource (sentiment lexicon). They proposed a semi-supervised method for building a sentiment lexicon that can be used efficiently in SA. Furthermore, Abdul-Mageed et al. [6] proposed SANA, a large-scale, multi-domain, multi-genre Arabic sentiment lexicon. The lexicon automatically extends two manually collected lexicons: HUDA (4905 entries) and SIFFAT (3355 entries).

In the framework of machine learning based approaches, Pang and Lee [7] used machine-learning techniques for sentiment classification of movie reviews using three classifiers: Naive Bayes (NB), Maximum Entropy classification and Support Vector Machines (SVM). In their work, Lan et al. [8] conducted experiments to compare various term weighting schemes with SVM on two widely used benchmark datasets. They also presented a new term weighting scheme, tf.rf, for text categorization. Moreover, Murthy et al. [9] investigated different term weighting methods for a Telugu corpus in combination with NB, SVM and K-Nearest Neighbors (K-NN) classifiers. Refaee and Rieser [10] presented a manually annotated Arabic social corpus of 8868 tweets and discussed the method of collecting and annotating the corpus. In [11], the authors pointed out that it is the text representation schemes that dominate the performance of text categorization rather than the classifier functions; that is, choosing an appropriate term weighting scheme is more important than choosing and tuning the classifier functions for text categorization.

In addition, some studies were based on both approaches, as in [12], where Shoukry proposed a hybrid approach combining a machine learning method using SVM and the semantic orientation approach. The goal was to improve the performance of sentence-level sentiment analysis on the Egyptian dialect. The corpus used contains more than 20000 Egyptian dialect tweets, from which 4800 manually annotated tweets were used (1600 positive, 1600 negative and 1600 neutral).

Studying the various propositions, we note that SA in social networks has mostly focused on tweets (from Twitter). The possibility of classifying sentiments from Facebook comments, taking into account the particularity of this type of text, is still relevant and very promising given the very few studies dealing with this subject. In this work, we propose a novel approach of SA on Facebook comments written in MSA or MDA, based on machine learning techniques.

III. MACHINE LEARNING PROCESS OF FACEBOOK COMMENTS

We present in this section a Machine Learning (ML) process (Fig. 1) for SA conducted on Facebook comments written in MSA or MDA. This process starts by getting and preparing comments from Facebook. Then each comment is labelled as positive or negative. Afterwards, features are extracted and pre-processed from each comment. To reduce the dimensionality and improve the quality of our supervised classification models, a features selection is performed before classification; finally, an evaluation step allows measuring the performance of our machine learning process.

[Fig. 1. The Machine Learning process of Arabic Facebook comments for sentiment analysis: Getting Data from Facebook → Text Preprocessing → Feature Extraction → Feature Selection → Classification → Evaluation.]
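Purely as an illustration of the process in Fig. 1, the sketch below chains the same stages in a single Python/scikit-learn pipeline. The paper itself uses NLTK, Matlab and R for the individual stages, so the classes, parameters and the chi-squared filter standing in for the BSS/WSS score of Section III-D are assumptions, not the authors' implementation; the detailed stages are discussed in the following subsections.

```python
# Illustrative end-to-end sketch of the process in Fig. 1 (not the authors' code).
# Assumptions: comments are already collected and labelled; chi2 stands in for
# the BSS/WSS filter of Section III-D; scikit-learn replaces the NLTK/Matlab/R tools.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

def build_pipeline(n_features: int = 200) -> Pipeline:
    """Feature extraction -> feature selection -> classification."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # uni-grams + bi-grams, TF-IDF weights
        ("select", SelectKBest(chi2, k=n_features)),     # keep the k most discriminant features
        ("clf", LinearSVC()),                            # SVM classifier
    ])

# Hypothetical usage:
#   pipe = build_pipeline().fit(train_comments, train_labels)
#   print(pipe.score(test_comments, test_labels))   # evaluation step
```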
A. Data collection and preparation

Data collection from Facebook was carried out via the Application Programming Interface "Facebook Graph API",
which allows collecting comments shared publicly by Facebook users. In this regard, a task of targeting the data sources had to be conducted beforehand, according to the analysis objectives.

The collected comments can be irrelevant to the studied phenomenon; thus, in order to extract the relevant comments, an interrogation task was performed on the collected data set, based on keywords corresponding to the studied topic.

We have targeted Moroccan online newspapers publishing news articles in the Arabic language. Two major criteria have been considered: firstly, the number of visits to the online website of the newspaper according to the Alexa websites ranking [13]; secondly, the number of subscribers of its Facebook page. Therefore, only newspapers with a Facebook page that exceeds one million subscribers were retained. As on Twitter, Facebook comments may be copied and republished (100% identical). Furthermore, the comments do not contain only sentences or words, but also URLs (http://www...), hashtags, signs (# $ % =), punctuation, etc. This requires, at this stage, a filtering and cleaning step.

We aim to apply our ML process to Facebook comments written in MSA or in MDA about the Moroccan legislative elections, which took place on October 7, 2016. Our main objective is not to analyze these elections in order to draw conclusions, but mainly to test the performance of our ML process for SA using Facebook comments, published in MSA or in MDA, on a specific topic or a given phenomenon. The data collection and preparation step allowed us to select 10 254 comments.
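As an illustration of this collection step, the following Python sketch pages through the comments of a public post via the Graph API and keeps only the comments matching topic keywords. The post ID, access token, API version, field names and keyword list are placeholders, not the values used by the authors.

```python
# Hedged sketch of collecting public Facebook comments through the Graph API.
# The post ID, token, API version and keywords below are hypothetical placeholders.
import requests

GRAPH_URL = "https://graph.facebook.com/v2.8/{post_id}/comments"
KEYWORDS = ["elections", "2016"]  # topic keywords used to keep only relevant comments

def collect_comments(post_id: str, token: str) -> list:
    """Page through the comments of one post and keep the topic-relevant ones."""
    url = GRAPH_URL.format(post_id=post_id)
    params = {"access_token": token, "limit": 100}
    relevant = []
    while url:
        payload = requests.get(url, params=params, timeout=30).json()
        for comment in payload.get("data", []):
            text = comment.get("message", "")
            if any(kw in text for kw in KEYWORDS):       # keyword-based interrogation
                relevant.append(text)
        url = payload.get("paging", {}).get("next")      # follow pagination, if any
        params = {}                                      # the "next" URL already carries them
    return relevant
```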
B. Comments processing

Text pre-processing is a very important step toward features construction. This stage has a major impact on the classification models' performance: it determines the words or groups of words to be integrated as features or to be removed from the data set. In this step we have used the Python Natural Language Toolkit (NLTK).

1) Cleaning and normalizing the text: Facebook comments published in Arabic, like those in other languages, often contain several irregularities and anomalies. These can be voluntary, such as the repetition of certain letters in some words, as in "not reasonable hahahahahaha", or involuntary, such as spelling mistakes or the incorrect use of one letter in place of another. For illustration, confusing two similar letters can turn the sentence "Citizens are under the care of the elected" into the misspelled "Citizens are under the misguidance of the elected". So we were brought to normalize the text to supply a unified shape of each letter in a word; for example, the different written variants of the word meaning "the last" are unified into a single form. Moreover, we have removed the letters that are repeated several times, taking into consideration the special status of some letters, which are kept twice if they were duplicated: here "not reasonable hahahahahaha" becomes "not reasonable haha".

2) Tokenization: The extraction of words requires a prior step through which a text is divided into tokens. In other languages like English or French, a token is, in most cases, composed of a single word. However, the tokenization of an Arabic text results in several cases in more complex tokens: a single Arabic token can be equivalent to an English sentence such as "we wrote it". This is the result of the compact morphology of the Arabic language. We present in the next paragraph the stemming technique that simplifies this complexity.

3) Stopwords removal: Among the obtained tokens, some words are not significant, irrelevant or do not bring information [14]. For this reason, we have developed several lists of stopwords that we have eliminated from the formed corpus. We distinguish logical prepositions and connectors from MSA and those from MDA, stopwords referring to places such as names of cities and countries, stopwords referring to names of organizations and people, etc. We note that we have preserved certain prepositions, such as those expressing negation.

4) Stemming: Extracting words from the corpus involves preprocessing at the level of the tokens to unify the varieties of a word. Information retrieval (IR) distinguishes two important types of stemmers for the Arabic language. The first one is the root stemmer; it is an aggressive stemmer that reduces the word to its basic root [15]. This type of stemmer is more efficient in reducing dimensions for classifying text, but it leads to the unification of words that are completely different. The second one is the light stemmer, which eliminates only the most common prefixes and suffixes of a token [16]. It reduces the dimension of the features less, but it preserves the meaning of the words better [17].

In this work we have applied a light stemmer to the Arabic comments collected from Facebook [18]. We were inspired by the work of Larkey et al. on their light10 stemmer [19] to implement a stemmer that treats both MSA and MDA. For example, the different written varieties of the word "policy" are unified by our stemmer into a single stem. We present in the following table (Table I) an example of executing the pre-processing tasks on a comment. The preprocessing step allowed us to extract, from the 10 254 comments, 1 526 words.

[TABLE I. Example of executing the processing tasks on a comment (original comment, in English translation: "The statement that this politician says is unreasonable! hahahahaha #moroccan politics"); the successive tasks are Cleaning, Normalizing, Tokenization, Stop words removal and Stemming.]
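For illustration, the sketch below compresses the cleaning, normalization, tokenization, stopword removal and light stemming steps into a few Python/NLTK functions. The normalization rules, the stopword list and the prefix/suffix lists are reduced examples and only approximate the light10-inspired stemmer described above; they are not the authors' implementation.

```python
# Simplified pre-processing sketch with NLTK; the rules below are illustrative
# approximations of the steps of Section III-B, not the authors' implementation.
import re
from nltk.tokenize import wordpunct_tokenize

STOPWORDS = {"في", "على", "من", "و"}            # tiny example list (the paper uses several lists)
PREFIXES = ("ال", "وال", "بال", "لل")            # a few common prefixes (light-stemming style)
SUFFIXES = ("ها", "ات", "ون", "ين", "ية", "ه")    # a few common suffixes

def normalize(text):
    text = re.sub(r"http\S+|#\S+|[^\w\s]", " ", text)  # drop URLs, hashtags and punctuation
    text = re.sub(r"[أإآ]", "ا", text)                  # unify alef variants
    text = re.sub(r"ى", "ي", text).replace("ة", "ه")    # unify yaa / taa marbuta variants
    return re.sub(r"(.)\1{2,}", r"\1\1", text)          # keep a repeated letter at most twice

def light_stem(token):
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 3:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 3:
            token = token[:-len(s)]
            break
    return token

def preprocess(comment):
    tokens = wordpunct_tokenize(normalize(comment))
    return [light_stem(t) for t in tokens if t not in STOPWORDS and not t.isdigit()]
```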
C. Comments Annotation and features construction

1) Comments annotation: SA methods using supervised learning imply prior determination of the polarity of the opinions expressed in the text. The annotation task allows labelling the dissimilar polarity between the text and its entities [17]. Labelling the sentiments embedded in Facebook comments is a difficult task because, on the one hand, these publications do not generally carry indicators of the polarity of opinions, as in the case of movie or product reviews that have an evaluation system allowing the polarity of the sentiment to be deduced (stars or a score). On the other hand, the opinions expressed in these comments concern not only the topic of interest but also other entities related to this topic [20].

The complexity of sentiment annotation becomes more accentuated with the analysis of an Arabic text because of the lack of a sentiment lexicon for the Moroccan dialectal language, which has imposed on us, for the moment, the use of human annotation. Labelling the sentiments of our data set was achieved through crowdsourcing [2]: this task was assigned to a group of judges who defined the polarity of the comments, positive or negative. In the end, 6581 comments were labeled as negative and 3673 were considered as positive. Table II gives examples of annotated comments.

TABLE II
EXAMPLE OF ANNOTATED COMMENTS

Comment (language)   English translation                                                      Sentiment
MSA & MDA            The parliamentarians' luxury is free and the education?... the people=... Negative
                     I am ashamed to say it even if the parliamentarians are not ashamed
                     when they said that
MSA                  The state's money is illegitimate for the poor people and legitimate     Negative
                     for the rich ones
MDA                  He's the best president of the government of Morocco since independence  Positive

2) Features construction and features vector: Features construction is an important step in SA because it establishes the transition from unstructured data (text) towards structured data (Features × Observations).

• n-grams extraction scheme: N-grams are all the combinations of adjacent words or letters of length n that can be found in the source text, although in the literature the term can also include any co-occurring set of elements in a string (e.g., an n-gram made up of the first and third words of a sentence) [21]. N-grams of texts are extensively used in text mining and natural language processing tasks. They are basically a set of co-occurring words within a given window, and when computing the n-grams one typically moves one word forward. Table III shows an example of a preprocessed comment with uni-gram and bi-gram sequences, and a code sketch of this construction is given after this list.

[TABLE III. Example of uni-gram and bi-gram sequences built from a preprocessed comment (in English translation: "He's the best president of the government of Morocco since independence"): the uni-grams are the single stems of the comment and the bi-grams are the pairs of adjacent stems.]

• Features weighting schemes: For this step, we considered the term weighting approaches that prove to be prominent for our work. They are defined as follows:

  - Term Frequency (TF): Using this method [22] [23], each term t is assumed to have a value proportional to the number of times it occurs in a document d, as follows:

      w(d, t) = TF(d, t)    (1)

  - Term Frequency-Inverse Document Frequency (TF-IDF): This approach follows Salton's definition [24] [25], which combines TF and IDF to weight the terms. The author showed that this approach gives better performance compared to the case where TF or IDF are used separately. The combined result is given by:

      w(d, t) = TF(d, t) × IDF(t)    (2)

    And, for a collection of N documents of which n documents contain the term t, IDF is given as follows:

      IDF(t) = log(N / n)    (3)
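As a minimal sketch of the two schemes above, assuming scikit-learn is an acceptable stand-in for the tooling actually used in the paper, the following code builds uni-gram and bi-gram features and weights them with TF or TF-IDF (note that scikit-learn's IDF differs slightly from Eq. (3) by an additive constant):

```python
# Sketch of n-gram extraction with TF and TF-IDF weighting (Eqs. 1-3),
# using scikit-learn purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["best president government morocco independence",   # placeholder pre-processed comments
        "statement politician unreasonable politics"]

# TF weighting: raw counts of uni-grams and bi-grams (Eq. 1).
tf = CountVectorizer(ngram_range=(1, 2))
X_tf = tf.fit_transform(docs)

# TF-IDF weighting (Eqs. 2-3); scikit-learn adds a smoothing constant to the IDF.
tfidf = TfidfVectorizer(ngram_range=(1, 2), smooth_idf=False)
X_tfidf = tfidf.fit_transform(docs)

print(tf.get_feature_names_out()[:5])   # first extracted uni-/bi-grams
print(X_tf.shape, X_tfidf.shape)        # (documents x features) matrices
```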
D. Feature selection

In machine learning, and particularly in high dimensions, the selection of a reduced subset of the studied variables is an indispensable task. It essentially allows improving the performance of the prediction models, identifying the features that characterize the data, and reducing the complexity and the cost of computation [26].
Several methods have been proposed for variable selection in the context of discrimination. The objective is to address the problems of collinearity, redundancy and the high possibility of the existence of noise variables that generally produce significant error rates.

In this work we use the "Between-Group to Within-Group Sum of Squares" (BSS/WSS) score used in [27] [28] to select the discriminant words. This score belongs to the category of filter approaches for variable selection, whose principle is to evaluate the discriminant power of each variable independently of the others, a priori, before the model estimation step. Indeed, as we test several classification methods, we decided to use a filter approach, which is independent of the learning method used. On the other hand, the choice of the BSS/WSS score is due to its reduced computational time and its good performance in high dimensions.

As its name indicates, the BSS/WSS score identifies as the most discriminating variables those that guarantee a large between-class variation and a small within-class variation. The score of each variable j is calculated as follows:

    BSS/WSS(j) = [ Σ_{i=1}^{n_tr} Σ_{q=1}^{Q} I(y_i = q) (x̄_{qj} − x̄_{.j})² ] / [ Σ_{i=1}^{n_tr} Σ_{q=1}^{Q} I(y_i = q) (x_{ij} − x̄_{qj})² ]    (4)

with n_tr the size of the training set, Q the number of categories (classes), I a Boolean function taking the value 1 if the condition is verified and 0 otherwise, x̄_{qj} the average value of variable j for the data belonging to class q, and x̄_{.j} the average value of variable j over the whole dataset. The variables associated with the highest scores are the most discriminating ones.

We rank the variables in order of relevance on the training set, using Matlab with the source code implemented by Ooi et al. [29]. Once the order is obtained, we construct a sequence of subsets using the forward method [30]: the first subset contains the most relevant variable; the second one contains the first two variables in decreasing order of relevance, and so on. For each subset, we calculate the recognition rate on the validation set. The subset giving the maximum recognition rate is chosen as the best one, containing the optimal number of variables.
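A compact NumPy rendering of Eq. (4) and of the forward subset construction is sketched below. It assumes a dense document-term matrix, and the fit_predict callback is a hypothetical helper supplied by the caller; this is an illustration of the described filter, not the Matlab code of [29].

```python
# Sketch of the BSS/WSS filter (Eq. 4) and of the forward subset search.
# Assumes dense document-term matrices; fit_predict is a hypothetical caller-supplied
# helper that trains a classifier and returns predictions for the validation rows.
import numpy as np

def bss_wss_scores(X, y):
    """Between-class over within-class sum of squares, per feature (Eq. 4)."""
    overall_mean = X.mean(axis=0)
    bss = np.zeros(X.shape[1])
    wss = np.zeros(X.shape[1])
    for q in np.unique(y):
        Xq = X[y == q]
        class_mean = Xq.mean(axis=0)
        bss += len(Xq) * (class_mean - overall_mean) ** 2
        wss += ((Xq - class_mean) ** 2).sum(axis=0)
    return bss / (wss + 1e-12)                    # small constant avoids division by zero

def forward_selection(X_tr, y_tr, X_val, y_val, fit_predict, max_features=200):
    """Insert features one by one in decreasing score order; keep the best subset."""
    order = np.argsort(bss_wss_scores(X_tr, y_tr))[::-1]
    best_k, best_acc = 1, 0.0
    for k in range(1, max_features + 1):
        cols = order[:k]
        acc = np.mean(fit_predict(X_tr[:, cols], y_tr, X_val[:, cols]) == y_val)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return order[:best_k], best_acc
```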
E. Supervised classification

To classify the Facebook comments, we have applied three supervised classification algorithms, using the R software: Naive Bayes (NB), Random Forests (RF) and Support Vector Machines (SVM).

1) Naive Bayes: It is a kind of classifier that relies on Bayes' rule, written in the following formula:

    P(c|d) = P(c) × P(d|c) / P(d)    (5)

The main idea of the Naive Bayes classifier is to suppose that the predictor variables are independent, which substantially reduces the computation of the probabilities. This classifier gives good results and has been used in many studies, such as those reported in [31] and [32].

2) Support Vector Machines: It is an effective traditional text categorization method. The main idea of SVM is to find the hyperplane, represented by a vector, that separates the document vectors of one class from the document vectors of the other classes [33]. SVM shows very good performance and high accuracy in many studies dealing with sentiment analysis in different languages. The work reported in [34] shows that SVM led to high classification performance when applied to the English language, compared to other classifiers.

3) Random Forests: Formally, a random forest is a predictor consisting of a collection of randomized base regression trees. These random trees are combined to form the aggregated regression estimate [35]. The random forest classifier consists of a combination of tree classifiers where each classifier is generated using a random vector sampled independently from the input vector, and each tree casts a unit vote for the most popular class to classify an input vector [36].
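The experiments train these three classifiers in R; purely for illustration, and to keep a single language across the sketches in this article, the snippet below fits equivalent models with scikit-learn. The class choices and default hyperparameters are assumptions, not the settings used in the reported experiments.

```python
# Illustrative Python counterparts of the three classifiers trained in R in the paper.
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

MODELS = {
    "NB": MultinomialNB(),                          # Naive Bayes (Eq. 5, independence assumption)
    "SVM": LinearSVC(),                             # linear support vector machine
    "RF": RandomForestClassifier(n_estimators=100), # ensemble of randomized trees
}

def evaluate(X_train, y_train, X_test, y_test):
    """Fit each model and report its accuracy on the held-out comments."""
    scores = {}
    for name, model in MODELS.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores
```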
IV. RESULT AND DISCUSSION

We devote this section to the implementation of our machine learning approach on Facebook comments, written in MSA or in MDA, about the Moroccan legislative elections.

Pang and Lee [37] have emphasized that the performance of classification models is influenced by the specificities of the studied dataset, such as language, topic, text length, etc. For these reasons, we have tested several configurations, combining the weighting of the features (TF and TF-IDF) with the n-grams extraction, in order to find the best classification models using the three algorithms. In the end, six experiments were performed (Table IV).

TABLE IV
TESTED CONFIGURATIONS

Test     Configuration               Number of features
Test 1   Unigrams / TF               1526
Test 2   Unigrams / TF-IDF           1526
Test 3   Bigrams / TF                1016
Test 4   Bigrams / TF-IDF            1016
Test 5   Unigrams+Bigrams / TF       1746
Test 6   Unigrams+Bigrams / TF-IDF   1746

As we conduct supervised classification using feature selection, we aim to develop, in every test of Table IV, the most accurate classification model implying the minimum of constructed features; thereby we split our labelled data set into three subsets: 50% for training, 25% for validation and 25% for testing.
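One possible way to obtain this 50/25/25 partition, shown only as a sketch since the paper does not detail the splitting procedure beyond the proportions, is two successive stratified splits:

```python
# Illustrative 50% / 25% / 25% split of the labelled comments.
from sklearn.model_selection import train_test_split

def split_50_25_25(comments, labels, seed=0):
    X_train, X_rest, y_train, y_rest = train_test_split(
        comments, labels, test_size=0.5, stratify=labels, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```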
The training subset is used in the feature selection step in order to identify the features that contribute the most in determining sentiment polarity.
[Fig. 2. Obtained accuracy for the tested configurations with feature selection: validation accuracy as a function of the number of selected features (up to 200), for SVM, Naive Bayes and Random Forest, in each of the six tested configurations of Table IV.]
Once the order of relevance of the features is obtained, we insert them one by one in decreasing order of relevance (up to 200 features) and we calculate the accuracy of each constructed subset on the validation set. The subset giving the maximal accuracy is chosen as the optimal subset of variables. Figure 2 gives, for the six tested configurations, the calculated validation accuracy rates according to the number of variables inserted in decreasing order of relevance.

The last step is to calculate the testing error rates of the chosen models, which measure their 'real' performance. The training sample at this stage is composed of the data used previously for training and validation. Table V summarizes the results of the conducted experiments.

TABLE V
RESULTS OF THE TESTED CONFIGURATIONS

Test                          Classifier   Number of selected features   Accuracy with all features   Accuracy with selected features   Accuracy with test set
Unigrams / TF                 SVM          186                           0.76                         0.74                              0.75
                              NB           56                            0.39                         0.71                              0.73
                              RF           149                           0.74                         0.75                              0.75
Unigrams / TF-IDF             SVM          198                           0.78                         0.77                              0.78
                              NB           55                            0.42                         0.72                              0.72
                              RF           56                            0.75                         0.76                              0.76
Bigrams / TF                  SVM          195                           0.72                         0.72                              0.73
                              NB           20                            0.35                         0.69                              0.68
                              RF           175                           0.73                         0.73                              0.72
Bigrams / TF-IDF              SVM          199                           0.72                         0.72                              0.72
                              NB           20                            0.36                         0.67                              0.67
                              RF           198                           0.72                         0.72                              0.73
(Unigrams+Bigrams) / TF       SVM          200                           0.76                         0.76                              0.77
                              NB           100                           0.39                         0.74                              0.74
                              RF           89                            0.71                         0.76                              0.76
(Unigrams+Bigrams) / TF-IDF   SVM          199                           0.78                         0.77                              0.77
                              NB           50                            0.56                         0.72                              0.71
                              RF           148                           0.73                         0.76                              0.75

Before interpreting the impact of feature selection on our supervised classification, we discuss the results of the classification models regardless of this feature selection aspect. In general, for the different tested configurations, the SVM accuracy surpasses that of the random forest and, to a higher degree, that of the Naive Bayes. The latter gives the lowest accuracy for most configurations (less than 50%).

On the other hand, considering the feature selection results, we note a significant improvement in terms of accuracy using the Naive Bayes classifier. For example, in test 2 with the configuration Unigrams/TF-IDF, the obtained accuracy with all features (1526) was equal to 42%; however, using feature selection, a maximum accuracy of 72% is reached with only 55 features (the optimal feature set).

In the case of classification using random forests, feature selection gives a performance similar to that obtained using all features. For instance, in test 2, the all-features accuracy (1526 features) is equal to 75%; with an optimal selected subset (56 features), the registered accuracy is about 76%.

Furthermore, applying feature selection with SVM classification does not seem to improve the classifier's performance; nevertheless, a similar accuracy rate is obtained with fewer input features. For example, in test 2, the overall accuracy (78% with 1526 features) is equal to that obtained using a reduced selected subset (198 features). Otherwise, if we take into consideration the impact of feature selection with the adopted
configurations for feature construction, we deduce that the second configuration, with unigram extraction and TF-IDF weighting, is the most efficient in terms of the accuracy/(number of features) ratio.

V. CONCLUSION AND PERSPECTIVES

This work dealt with SA for Facebook comments written and shared in MSA or in MDA. A machine learning based process was proposed and applied to the Moroccan legislative elections.

Several combinations of extraction (n-grams) and weighting schemes (TF / TF-IDF) for features construction were tested to guarantee the highest performance of the developed classification models. The results of the experiments showed that the quality of the developed classification model depends on the feature set built with the combination of extraction and weighting schemes that we experimented with. For example, the best performance was obtained with the combination (Uni-gram + Bi-gram / TF-IDF) with the classifiers SVM and NB. The application of a feature selection method allowed us to reduce the dimensionality while maintaining the performance, and even improving it in some cases.

As future work, we expect to continue developing and testing other extraction and construction methods, taking into account the specificities of the dialectal Arabic language. Some improvements are still needed. Firstly, the size of the dataset is still small; solid conclusions definitely require bigger datasets [38] [39]. Secondly, we aim to give more importance to sentiment annotation by developing methods of automatic annotation based on a lexicon.

REFERENCES

[1] N. A. Abdulla, N. A. Ahmed, M. A. Shehab, and M. Al-Ayyoub, "Arabic sentiment analysis: Lexicon-based and corpus-based," in Applied Electrical Engineering and Computing Technologies (AEECT), 2013 IEEE Jordan Conference on. IEEE, 2013, pp. 1-6.
[2] R. Duwairi, R. Marji, N. Sha'ban, and S. Rushaidat, "Sentiment analysis in arabic tweets," in Information and Communication Systems (ICICS), 2014 5th International Conference on. IEEE, 2014, pp. 1-6.
[3] A. Assiri, A. Emam, and H. Aldossari, "Arabic sentiment analysis: A survey," vol. 1, no. 6, 2015, pp. 75-85.
[4] M. Abdul-Mageed, M. T. Diab, and M. Korayem, "Subjectivity and sentiment analysis of modern standard arabic," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2. Association for Computational Linguistics, 2011, pp. 587-591.
[5] M. Nabil, M. A. Aly, and A. F. Atiya, "ASTD: Arabic sentiment tweets dataset," in EMNLP, 2015, pp. 2515-2519.
[6] M. Abdul-Mageed and M. T. Diab, "SANA: A large scale multi-genre, multi-dialect lexicon for arabic subjectivity and sentiment analysis," in LREC, 2014, pp. 1162-1169.
[7] B. Pang and L. Lee, "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts," in Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2004, p. 271.
[8] M. Lan, S.-Y. Sung, H.-B. Low, and C.-L. Tan, "A comparative study on term weighting schemes for text categorization," in Neural Networks, 2005. IJCNN'05. Proceedings. 2005 IEEE International Joint Conference on, vol. 1. IEEE, 2005, pp. 546-551.
[9] V. G. Murthy, B. V. Vardhan, K. Sarangam, and P. V. P. Reddy, "A comparative study on term weighting methods for automated telugu text categorization with effective classifiers," International Journal of Data Mining & Knowledge Management Process, vol. 3, no. 6, p. 95, 2013.
[10] E. Refaee and V. Rieser, "An arabic twitter corpus for subjectivity and sentiment analysis," in LREC, 2014, pp. 2268-2273.
[11] E. Leopold and J. Kindermann, "Text categorization with support vector machines. How to represent texts in input space?" Machine Learning, vol. 46, no. 1-3, pp. 423-444, 2002.
[12] A. M. Shoukry, "Arabic sentence level sentiment analysis," Ph.D. dissertation, The American University in Cairo, 2013.
[13] (2016) Alexa website. [Online]. Available: www.alexa.com/topsites/countries/MA
[14] R. M. Duwairi and I. Qarqaz, "Arabic sentiment analysis using supervised classification," in Future Internet of Things and Cloud (FiCloud), 2014 International Conference on. IEEE, 2014, pp. 579-583.
[15] R. Duwairi, M. Al-Refai, and N. Khasawneh, "Stemming versus light stemming as feature selection techniques for arabic text categorization," in Innovations in Information Technology, 2007. IIT'07. 4th International Conference on. IEEE, 2007, pp. 446-450.
[16] F. S. Al-Anzi and D. AbuZeina, "Stemming impact on arabic text categorization performance: A survey," in Information & Communication Technology and Accessibility (ICTA), 2015 5th International Conference on. IEEE, 2015, pp. 1-7.
[17] H. Saif, M. Fern, Y. He, and H. Alani, "Evaluation datasets for twitter sentiment analysis: a survey and a new dataset, the STS-Gold," in Proceedings, 1st ESSEM Workshop. Citeseer, 2013.
[18] A. Elouardighi, M. Maghfour, and H. Hammia, "Collecting and processing arabic facebook comments for sentiment analysis," in International Conference on Model and Data Engineering. Springer, 2017, pp. 262-274.
[19] L. S. Larkey, L. Ballesteros, and M. E. Connell, "Light stemming for arabic information retrieval," in Arabic Computational Morphology. Springer, 2007, pp. 221-243.
[20] B. Liu, "Sentiment analysis and opinion mining," Synthesis Lectures on Human Language Technologies, vol. 5, no. 1, pp. 1-167, 2012.
[21] W. B. Cavnar, J. M. Trenkle et al., "N-gram-based text categorization," Ann Arbor MI, vol. 48113, no. 2, pp. 161-175, 1994.
[22] L. B. Doyle, Information Retrieval and Processing. Melville Pub. Co., 1975.
[23] H. P. Luhn, "A statistical approach to mechanized encoding and searching of literary information," IBM Journal of Research and Development, vol. 1, no. 4, pp. 309-317, 1957.
[24] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Commun. ACM, vol. 18, no. 11, pp. 613-620, Nov. 1975.
[25] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. New York, NY, USA: McGraw-Hill, Inc., 1986.
[26] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining. Springer Science & Business Media, 2012, vol. 454.
[27] S. Dudoit, J. Fridlyand, and T. P. Speed, "Comparison of discrimination methods for the classification of tumors using gene expression data," Journal of the American Statistical Association, vol. 97, no. 457, pp. 77-87, 2002.
[28] M. S. B. Sehgal, I. Gondal, and L. Dooley, "Missing value imputation framework for microarray significant gene selection and class prediction," in International Workshop on Data Mining for Biomedical Applications. Springer, 2006, pp. 131-142.
[29] C. H. Ooi, M. Chetty, and S. W. Teng, "Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data," BMC Bioinformatics, vol. 7, no. 1, p. 320, 2006.
[30] T. Marill and D. Green, "On the effectiveness of receptors in recognition systems," IEEE Transactions on Information Theory, vol. 9, no. 1, pp. 11-17, 1963.
[31] I. Rish, "An empirical study of the naive bayes classifier," in IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, vol. 3, no. 22. IBM New York, 2001, pp. 41-46.
[32] A. McCallum, K. Nigam et al., "A comparison of event models for naive bayes text classification," in AAAI-98 Workshop on Learning for Text Categorization, vol. 752. Madison, WI, 1998, pp. 41-48.
[33] Q. Ye, Z. Zhang, and R. Law, "Sentiment classification of online reviews to travel destinations by supervised machine learning approaches," Expert Systems with Applications, vol. 36, no. 3, pp. 6527-6535, 2009.
[34] G. Fung and O. L. Mangasarian, "Incremental support vector machine classification," in Proceedings of the 2002 SIAM International Conference on Data Mining. SIAM, 2002, pp. 247-260.
[35] G. Biau, "Analysis of a random forests model," Journal of Machine Learning Research, vol. 13, no. Apr, pp. 1063-1095, 2012.
[36] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[37] B. Pang, L. Lee et al., "Opinion mining and sentiment analysis," Foundations and Trends in Information Retrieval, vol. 2, no. 1-2, pp. 1-135, 2008.
[38] A. Kanavos, N. Nodarakis, S. Sioutas, A. Tsakalidis, D. Tsolis, and G. Tzimas, "Large scale implementations for twitter sentiment classification," Algorithms, vol. 10, no. 1, 2017.
[39] N. Nodarakis, S. Sioutas, A. K. Tsakalidis, and G. Tzimas, "Large scale sentiment analysis on twitter with spark," in EDBT/ICDT Workshops, 2016, pp. 1-8.
