Text preprocessing is the most important stage in sentiment analysis,
because messages on social networks are characterized by colloquial
expressions, abbreviations, emoticons, word lengthening and irregular
capitalization, and they generally do not conform to canonical grammatical rules.
In this document, we review the preprocessing steps applied to Arabic
dialect in each referenced paper.
Adding tags: emoticon symbols and punctuation marks are replaced with
their corresponding meaningful word tags that represent their sentiment; for
example, ‘:)’ is replaced by ‘happy’.
Data cleaning: removing items that do not carry any sentiment, such as
URLs, usernames, numbers, etc.
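The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code of the reviewed paper: the regular expressions, the function name preprocess and every entry of the emoticon map except ‘:)’ are assumptions made for the example.

```python
import re

# Hypothetical emoticon-to-tag map; only ':)' -> 'happy' comes from the text.
EMOTICON_TAGS = {":)": "happy", ":(": "sad"}

# Assumed patterns for items that carry no sentiment.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
NUMBER_RE = re.compile(r"\d+")

def preprocess(text):
    """Remove URLs, usernames and numbers, then replace emoticons by word tags."""
    # Clean first, so that the ':' inside a URL is never mistaken for an emoticon.
    for pattern in (URL_RE, USER_RE, NUMBER_RE):
        text = pattern.sub(" ", text)
    for emoticon, tag in EMOTICON_TAGS.items():
        text = text.replace(emoticon, f" {tag} ")
    # Collapse the whitespace left behind by the substitutions.
    return " ".join(text.split())

print(preprocess("@sara great match :) http://t.co/x1 2024"))
```

Removing numbers before tagging would damage emoticons that contain digits; none are in the assumed map, but a larger map would need the two steps reordered.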
Emoticon symbols were also added to the bag of words as extra features.
The final dataset consisted of about 826 tweets covering different domains
and was manually annotated. The experiments were conducted using 10-fold
cross-validation in order to ensure reliable results.
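As a reminder of what 10-fold cross-validation does, the sketch below splits a dataset of 826 items into ten folds, each used once for testing while the other nine are used for training. The function name k_fold_indices is hypothetical, and shuffling or stratification, which are usually applied in practice, are left out; the paper does not say how its folds were built.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(826, k=10))
print([len(test) for _, test in folds])
```

The reported score is then the average of the scores obtained on the ten test folds.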
The results showed that using concept features outperforms the baseline
BoW model. As for the classifier, SVM performed better than NB.
The highest F-measure reached 95% when using the AddC concept
representation with the SVM classifier.
2) ‘Sentiment Analysis of Moroccan Tweets using Naive Bayes Algorithm’:
This paper aims to classify Moroccan tweets by means of NB, to use topic
modeling (LDA) to discover the topics of the classified tweets, and finally to
locate these tweets on a map of Morocco according to their categories, using
the tool ‘Folium’.
Preprocessing:
The emotion symbols in a tweet are assumed to represent the overall
sentiment of that tweet, so the tweets are filtered using a list of positive
and negative emotion symbols.
Storing the words written in the Arabic or French alphabet in each slave
node of the cluster.
Normalizing whitespace.
Creating a function to detect the language in which the tweet text is written.
Removing tokens whose part of speech is not important to the analysis, using
part-of-speech tagging software.
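Two of the steps above, whitespace normalization and detection of the writing language, can be illustrated as follows. This is a rough sketch under a strong assumption: it only detects the script (Arabic or Latin alphabet) from Unicode character names, which is not the same as detecting the language, since dialect written in Latin letters would be reported as "latin". The function names are hypothetical and the paper's own detection function is not described.

```python
import re
import unicodedata

def normalize_whitespace(text):
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def detect_script(text):
    """Return 'arabic', 'latin' or 'unknown' from the majority of letters."""
    counts = {"arabic": 0, "latin": 0}
    for ch in text:
        if not ch.isalpha():
            continue
        # Unicode names start with the script, e.g. 'ARABIC LETTER MEEM'.
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            counts["arabic"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
    if counts["arabic"] == 0 and counts["latin"] == 0:
        return "unknown"
    # Ties are resolved in favor of Arabic (an arbitrary choice).
    return "arabic" if counts["arabic"] >= counts["latin"] else "latin"

print(detect_script(normalize_whitespace("  très   bien \n")))
```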
During the experiment, the author collected a sample of 700 tweets for training
and 300 tweets for testing. NB was then applied as the classifier, after which
LDA was applied to the classified tweets of each category.
To test the reliability of this model, 1100 tweets were collected from three
organizations in Jordan and manually annotated.
Comparison of the manual and automatic classification results:
4) ‘Negation handling in sentiment analysis at sentence level’: This paper
addresses the problem of negation, which affects the polarities of other words.
When a negation appears in a sentence, it is important to determine the
sequence of words affected by the negation term. The main issue in handling
negation is its scope, which may be limited to the next word or may extend to
further words following the negation.
Negation may appear in two forms, i.e. morphological and syntactic
negation.
Scope of syntactic negations:
Diminisher negations differ from syntactic negations because they usually
reduce the polarities of other words instead of completely inverting them.
In order to determine the scope of diminishers, a list of diminishers
(hardly, less, little, etc.) is used to indicate the presence of this
type of negation in a sentence. The affected adjective or verb is determined
using the following two heuristics. 1) An adverb is usually used immediately
before or after the adjective or verb that it modifies. 2) If neither an
adjective nor a verb is immediately before or after the diminisher, then it is
likely to affect the nearby adjective or verb within the same clause. After
determining the affected adjective/verb, the polarity of this word is diminished
by using a reducing factor of 0.2.
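The two heuristics can be sketched as below. Several points are assumptions made for the illustration and are not stated in the text: the input is one clause already tagged with simple part-of-speech labels ("ADJ", "VERB", ...), the word after the diminisher is checked before the word preceding it, and "a reducing factor of 0.2" is read as multiplying the polarity by 0.2. The diminisher list contains only the three words quoted above.

```python
DIMINISHERS = {"hardly", "less", "little"}  # the full list is not given in the text
REDUCING_FACTOR = 0.2                       # assumed to be a multiplier
AFFECTED_POS = {"ADJ", "VERB"}              # assumed tag names

def find_affected(tagged, i):
    """Index of the adjective/verb affected by the diminisher at position i."""
    # Heuristic 1: an adjective or verb immediately after or before it.
    for j in (i + 1, i - 1):
        if 0 <= j < len(tagged) and tagged[j][1] in AFFECTED_POS:
            return j
    # Heuristic 2: otherwise, the nearest adjective or verb in the clause.
    candidates = [j for j, (_, pos) in enumerate(tagged)
                  if pos in AFFECTED_POS and j != i]
    return min(candidates, key=lambda j: abs(j - i)) if candidates else None

def apply_diminishers(tagged, polarity):
    """Return one polarity score per token of a single tagged clause."""
    scores = [polarity.get(word.lower(), 0.0) for word, _ in tagged]
    for i, (word, _) in enumerate(tagged):
        if word.lower() in DIMINISHERS:
            j = find_affected(tagged, i)
            if j is not None:
                scores[j] *= REDUCING_FACTOR
    return scores

clause = [("the", "DET"), ("movie", "NOUN"), ("is", "VERB"),
          ("hardly", "ADV"), ("interesting", "ADJ")]
print(apply_diminishers(clause, {"interesting": 0.5}))
```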
In some cases the negation term and the negated opinionated word are
combined in a single word, e.g. in words such as end-less, impolite,
dishonest, non-cooperative, etc. This type of negation is called morphological
negation, and it is formed by attaching one of the nine prefixes (i.e. de-,
dis-, il-, im-, in-, ir-, mis-, non-, un-) or the suffix -less to a root word.
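A simple way to detect such words is to strip a candidate prefix or suffix and check whether the remainder is a known opinion word. The sketch below follows that idea; the lookup in a polarity lexicon and the function name negated_root are assumptions for the example, and the text does not say how the paper validates the root.

```python
PREFIXES = ("de", "dis", "il", "im", "in", "ir", "mis", "non", "un")
SUFFIX = "less"

def negated_root(word, lexicon):
    """Return the root of a morphologically negated word, or None."""
    w = word.lower().replace("-", "")
    # Try longer prefixes first so that 'dis' is tested before 'de'.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        root = w[len(prefix):]
        if w.startswith(prefix) and root in lexicon:
            return root
    if w.endswith(SUFFIX) and w[:-len(SUFFIX)] in lexicon:
        return w[:-len(SUFFIX)]
    return None

# Hypothetical lexicon of known opinion words.
lexicon = {"polite", "honest", "end", "cooperative"}
for word in ("impolite", "dishonest", "end-less", "non-cooperative", "interest"):
    print(word, "->", negated_root(word, lexicon))
```

Checking the root against a lexicon avoids treating words such as "interest" as negations merely because they begin with "in".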
The improvement in performance of the proposed method has three main
causes. First, the proposed method determines the scope of the different types
of negation more effectively. Second, an appropriate word sense
disambiguation method is adopted. Finally, all opinionated POS (i.e. adjective,
verb, adverb and noun) are considered when determining the polarity.
According to the experiments, sentiment analysis performs better when using
the light stemmer, emoticon analysis, negation words and intensifiers, with
accuracy improvements in the range of 9 to 25.
Some references:
In this work, the author used a dataset of 1742 comments (340 in MSA and 1402
colloquial) and ran several experiments with the preprocessing methods
below to evaluate their performance, either individually or
combined. The performance of the sentiment analysis model improved
when these methods were adopted. The best performance of the sentiment
analysis model was achieved when using negations and intensifiers (98.2% for
the predicted positive and 93.2% for the predicted negative). Finally, the best
performance was obtained for comments written in MSA and the worst for
those written in the informal Arabic style.
Preprocessing
Negation handling: if a word is found in the negation list, the polarity of the
neighboring opinion word is computed by multiplying the score of the opinion
word by -1. On the other hand, another method has proven to be effective in
[12, 8]. In this work, only syntactic negations are handled, by using the part
of speech and the dependency tree in the polarity calculation.
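The first rule (multiplying the score of the neighboring opinion word by -1) can be sketched as follows. The negation list, the toy lexicon and the window of two tokens used to define "neighboring" are assumptions chosen for the example; English words are used only for readability.

```python
NEGATIONS = {"not", "no", "never"}  # hypothetical negation list

def score_tokens(tokens, lexicon, negations=NEGATIONS, window=2):
    """Return one score per token, inverting the opinion word after a negation."""
    scores = [lexicon.get(tok, 0.0) for tok in tokens]
    for i, tok in enumerate(tokens):
        if tok in negations:
            # Look for the first opinion word within `window` tokens.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                if tokens[j] in lexicon:
                    scores[j] *= -1
                    break
    return scores

lexicon = {"good": 1.0, "bad": -1.0}  # hypothetical polarity lexicon
tokens = "the food is not very good".split()
print(sum(score_tokens(tokens, lexicon)))
```

A dependency-tree approach, as mentioned above, would replace the fixed window by the words that actually depend on the negation term.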
Preprocessing:
-Stemming
Feature extraction:
-N-gram
-Part of speech
After applying two models with different combinations of feature extractors and
using 10-fold cross-validation for evaluation:
As for the model, it turns out that SVM outperformed NB.
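The N-gram feature extraction listed above can be illustrated with a short sketch that counts word unigrams and bigrams in a tokenized text. The function names and the choice of word-level (rather than character-level) n-grams up to length 2 are assumptions; the text does not specify them.

```python
from collections import Counter

def word_ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(tokens, max_n=2):
    """Count all n-grams of length 1..max_n; the counts serve as features."""
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(word_ngrams(tokens, n))
    return features

print(ngram_features(["very", "good", "service"]))
```

The resulting counts would then be turned into vectors and passed to the SVM or NB classifier.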
Some references:
‘Developing resources for sentiment analysis of informal Arabic text in social media’
This article addresses sentiment analysis of tweets written in MSA and in
Jordanian dialectal Arabic; its main contribution is a lexicon built from the
tweets by translating all the dialectal words into MSA using a
crowdsourcing tool.
The dataset was also annotated with three labels: positive,
negative and neutral.
Tokenizing
Khoja stemmer
Determining the weight of every token using the binary model, where a token is
given a weight equal to 1 if it is present in the tweet under consideration and
a weight equal to 0 if it is absent from the tweet.
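The binary weighting model can be sketched as follows, assuming the tweets have already been tokenized and stemmed (the Khoja stemmer itself is not reproduced here). The function names are hypothetical.

```python
def build_vocabulary(tokenized_tweets):
    """Sorted list of all distinct tokens in the corpus."""
    return sorted({tok for tweet in tokenized_tweets for tok in tweet})

def binary_vector(tweet_tokens, vocabulary):
    """Weight 1 if the vocabulary token occurs in the tweet, else 0."""
    present = set(tweet_tokens)
    return [1 if term in present else 0 for term in vocabulary]

corpus = [["a", "b"], ["b", "c"]]
vocabulary = build_vocabulary(corpus)
print(binary_vector(["b", "b", "c"], vocabulary))
```

Repeated tokens still receive a weight of 1, which is what distinguishes this model from term-frequency weighting.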
Some references:
////
https://www.ijaiem.org/Volume2Issue5/IJAIEM-2013-05-26-063.pdf
Social Networks’ Text Mining for Sentiment Classification: The case of Facebook’
statuses updates in the “Arabic Spring” Era
This paper mainly addresses the use of an emoticon lexicon, an interjection
lexicon and part-of-speech tagging as features in the preprocessing step.
/////
https://www.researchgate.net/publication/233859560_Preprocessing_Egyptian_Dialect_Tweets_for_Sentiment_Mining
////