Professional Documents
Culture Documents
Reviews Classification
Using SentiWordNet Lexicon
Alaa Hamouda
Mohamed Rohaim
Department of Systems and Computers Engineering
Al_Azhar University
Abstract-Opinion lexicons are resources that associate
sentiment polarity for words. SentiWordNet is one of
these lexicons that assigns to each synset of WordNet
three sentiment numerical scores, positivity, negativity
and objectivity. Using opinion lexicons in opinion mining
research stems from the hypothesis that individual words
can be considered as a unit of opinion information, and
therefore may provide clues to review sentiment. This
paper presents the results of applying the SentiWordNet
lexical resource using various techniques to the problem
of automatic sentiment classification of reviews. Our
approach yields an improvement over the previously used
approaches for using SentiWordNet in sentiment
classification of reviews. The results indicate
SentiWordNet could be used as an important resource for
sentiment classification tasks.
I. INTRODUCTION
Opinion mining concerns with the reviews an author
expresses. With the rapid growth of available subjective text
on the internet in the form of product reviews, blog posts and
comments in discussion forums, business analysts are turning
their eyes on the internet in order to obtain factual as well as
more subtle and subjective information (opinions) on
companies and products. Opinion mining can assist in a
number of potential applications in areas such as search
engines, recommender systems and market research.
There are several approaches for detecting sentiment in text
present in literature. One of the most important is to use the
lexical resources such as a dictionary of opinionated terms.
SentiWordNet[1] is one such resource, containing opinion
information on terms extracted from the WordNet database by using a semi supervised learning method - and made
publicly available for research purposes. SentiWordNet
provides a readily available database of term sentiment
information for the English language, and could be used as a
replacement to the process of manually deriving opinion
lexicons.
In this paper we study various methods to classify product
review as positive or negative using SetiWordNet lexicon.
This study explores that SentiWordNet can be used in
effective way to classify product reviews. The remainder of
the paper is organized as follows: Section 2 presents the
previous work of classifying products reviews. Section 3
describes the SentiWordNet lexicon and how it is used to
make polarity classification, while section 4 presents our
proposed technique to improve the performance of
classification. The implementation and results are given in
120
the terms contained in the synset are. Each of the three scores
ranges from 0.0 to 1.0, and their sum is 1.0 for each synset.
This means that a synset may have nonzero scores for all the
three categories, which would indicate that the corresponding
terms have, in the sense indicated by the synset, each of the
three opinion-related properties only to a certain degree.
SentiWordNet word values have been semi-automatically
computed based on the use the semi-supervised method
described in [11]. Examples of sntiment scores associated to
SentiWordNet entries are shown in Figure 1; the entries
contain the parts of speech category of the displayed entry, its
positivity, its negativity, and the list of synonyms. We show
various synsets related to the words good and bad. There
are 4 senses of the noun good, 21 senses of the adjective
good, and 2 senses of the adverb good in WordNet. There
is one sense of the noun bad, 14 senses of the adjective
bad, and 2 senses of the adverb bad inWordNet.
Category
A
A
A
A
N
N
WNT
Number
01123148
00106020
01125429
01510444
03076708
05144079
pos
neg
Synonyms
0.875
0
0
0.25
0
0
0
0
0.625
0.25
0
0.875
good#1
good#2 full#6
bad#1
big#3 bad#2
trade_good#1 good#4 commodity#1
badness#1 bad#1
121
Negative Reviews
Precision
%
Recall
%
Precision
%
Recall
%
0.0
62.88
83.00
75.00
51.00
67.00
0.1
63.04
81.95
74.21
51.95
66.95
0.2
63.61
77.45
71.18
55.70
66.58
0.3
62.63
71.80
66.96
57.15
64.48
Threshold
Value
Over All
Accuracy
122
Positive Reviews
Negative Reviews
REFERENCES
[1]
Over All
Accuracy
Precision Recall
%
%
Precision
%
Recall %
0.0
64.58
82.50
75.78
54.75
68.63
0.1
64.16
82.35
75.37
54.00
68.18
0.2
64.37
78.15
72.20
56.75
67.45
0.3
63.02
72.35
67.55
57.55
64.95
5.4 Comparison
The table below illustrates how the proposed methods to
use SentiWordNet overcome the Term Counting method in
[8], where the same data set (Amazon reviews) were used in
all experiments.
Table (4): Comparing Results
Method
Term Counting [8] applied on Amazon Corpus
Sum on Review (this research)
Average on Sentence and Average on Review
(this research)
Accuracy
56.77 %
67.00 %
68.63 %
123