
International Conference on Advances in Human Machine Interaction (HMI - 2016),

March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India

Improved Feature Extraction and Classification - Sentiment Analysis

M. Trupthi 1, Suresh Pabboju 2, G. Narasimha 3

1 Research Scholar, Computer Science Department, JNTUH, Hyderabad, Telangana State, India
2 Information Technology Department, CBIT, Hyderabad, Telangana State, India
3 Computer Science Department, JNTUH, Jagital, Telangana State, India

Abstract—Sentiment analysis is becoming a popular area of research in social media analysis, especially around user reviews and tweets. It is a special case of text mining generally focused on identifying opinion polarity. People are usually interested in the positive and negative opinions, containing likes and dislikes, shared by users; reviews and product features therefore play a significant role in sentiment analysis. In addition to the considerable work being performed in text analytics, feature extraction in sentiment analysis is now becoming an active area of research. Feature based sentiment analysis includes feature extraction, sentiment classification and finally sentiment evaluation. In this paper, we explore machine learning classification approaches with different feature selection schemes to obtain a sentiment analysis model for a movie review dataset, and the results obtained are compared to identify the best possible approach.

Keywords—Feature Extraction, Bag of Words, Classification, Bigram Collocation, Information Features, Evaluation

I. INTRODUCTION
Text mining is the process of discovering useful and interesting knowledge from unstructured text. In order to discover knowledge from unstructured text data, the first step is to convert the text into a manageable representation. A common practice is to model the text of a document as a set of word features, i.e., a "bag of words" (BOW). Often, feature selection techniques such as stop-word removal or stemming are applied to keep only meaningful features and to improve the accuracy of the supervised classification algorithm [1].

Sentiment Analysis, or Opinion Mining, is a challenging task in Text Mining and Natural Language Processing, involving feature extraction, classification and summarization of sentiments. Sentiment Analysis also assists individuals and organizations who are interested in knowing what other people think about a particular product, service, topic, issue or event, in order to find the optimal choice they are looking for. Sentiment Analysis in the context of Government Intelligence aims at extracting public views on government policies and decisions, to infer the possible public reaction to the implementation of certain policies [2]. For sentiments with respect to movie reviews, the reviews are categorized into pos and neg categories. In this proposed work a simple Naïve Bayes classifier using Boolean word feature extraction is taken as the baseline.

Sentiment analysis can be performed at three different levels: document, sentence and aspect level. Document level sentiment analysis aims at classifying the entire document as positive or negative (Pang et al. [3]; Turney [4]). Sentence level sentiment analysis is closely related to subjectivity analysis: each sentence is analyzed and its opinion is determined as positive, negative or neutral (Riloff et al. [4]; Terveen et al. [5]). Aspect level sentiment analysis aims at identifying the target of the opinion; the basis of this approach is that every opinion has a target, and an opinion without a target is of limited use (Hu and Liu [6]). The simplest strategy uses a bag-of-words model, which creates lists of 'positive' and 'negative' words and judges a document by whether it has a preponderance of positive or negative words. Creating such lists is not easy, as some of the words are likely to be quite different for different kinds of topics. Initially, one takes a collection of documents on some topic, rates them by hand as positive or negative, and then uses this collection to train a classifier model.

Limitations: the bag-of-words model does quite well in assessing short reviews which are clearly addressed to a single object, but it has several limitations:
• some important words are ambiguous in their opinion: "low" is positive in "low price" but negative in "low quality";
• the assessment does not reveal which aspects of the product led to the positive or negative sentiment, although this may be more crucial to the decision maker than the overall assessment;
• if the review compares the item to other items, the bag-of-words approach is unable to distinguish references to the different items.

Overcoming all three limitations requires a richer model which can capture some of the structure of the language.
N-grams: unigrams form the simplest model of the n-gram approach, consisting of all the individual words present in the text. The bigram model uses pairs of adjacent words; each pair of words forms a single bigram. Higher order grams can be formed in a similar way by taking n adjacent words together. Higher order n-grams are more effective in capturing context, as they provide a better understanding of word position.

An n-gram is a subsequence of n items from a given sequence, used in various fields of natural language processing and genetic sequence analysis. An n-gram model defines a method for finding the set of n-gram words in a given document. The commonly used models are unigrams (n=1), bigrams (n=2) and trigrams (n=3); however, the value of n can be extended to higher order grams. The n-gram model can be explained with the following example:

Text: "Honesty is the best policy."
Unigrams: "honesty", "is", "the", "best", "policy".
Bigrams: "honesty is", "is the", "the best", "best policy".
Trigrams: "honesty is the", "is the best", "the best policy".

NLP and machine learning approaches were used for this process. Multiple experiments were carried out using different feature sets and parameters to obtain maximum accuracy.
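The example above can be reproduced in a few lines of Python. The following is a minimal illustrative sketch (ours, not the implementation used in the experiments); whitespace tokenization and lowercasing are simplifying assumptions:

def ngrams(tokens, n):
    # slide a window of width n across the token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "honesty is the best policy".split()
print(ngrams(tokens, 1))  # unigrams: ('honesty',), ('is',), ...
print(ngrams(tokens, 2))  # bigrams: ('honesty', 'is'), ('is', 'the'), ...
print(ngrams(tokens, 3))  # trigrams: ('honesty', 'is', 'the'), ...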
II. LITERATURE SURVEY

Turney (2002) presented a simple unsupervised learning algorithm to classify online reviews as recommended (thumbs up) or not recommended (thumbs down). The sentiment classification of a review is predicted by the average semantic orientation (SO) of the adjective and adverb phrases in the review, since opinions are usually expressed by adjectives and adverbs. Turney used Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words or phrases: the semantic orientation of a word or phrase is calculated by subtracting the mutual information between the word or phrase and the reference word "poor" from the mutual information between the word or phrase and the reference word "excellent". The mutual information is estimated from the co-occurrence of the two words or phrases among millions of online documents. Using 410 reviews in 4 domain areas, this method obtained around 84% accuracy for the bank and automobile datasets, but only 66% accuracy for the movie review dataset. Turney argued that movie reviews are difficult to classify, since they often contain descriptive phrases such as "bad scene" or "good scene" whose words are not reliable indicators of the overall sentiment. Although the results were decent, the way the semantic orientation of phrases was calculated was not efficient, as it involved retrieving millions of online documents to get the co-occurrence counts of two words.
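To make Turney's scoring concrete, the following sketch (ours) computes the semantic orientation of a phrase from co-occurrence hit counts; the hit-count arguments are hypothetical numbers standing in for search engine results, not a real API:

import math

def semantic_orientation(hits_phrase_near_excellent, hits_phrase_near_poor,
                         hits_excellent, hits_poor):
    # SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor"),
    # which reduces to a log ratio of co-occurrence counts (Turney, 2002)
    return math.log2((hits_phrase_near_excellent * hits_poor) /
                     (hits_phrase_near_poor * hits_excellent))

# hypothetical counts: the phrase co-occurs far more often with "excellent"
print(semantic_orientation(1500, 100, 2000000, 1500000))  # > 0, so positive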
A. Finding appropriate features

Sentiment analysis is a kind of text classification task. In supervised learning, a number of machine learning algorithms can be used to classify text. When using machine learning models, supervised learning has two major aspects: constructing appropriate feature spaces and choosing appropriate classification algorithms. In Natural Language Processing (NLP) tasks, features are also called terms or tokens. It is important to find the right features when using machine learning models for text mining. In sentiment analysis, many efforts have focused on finding the right features to improve classification performance. If a particular feature tends to be highly consistent in the texts of a certain class (positive or negative), then the algorithm will generalize that this feature is a good indicator of that class (Brooke, 2009). For example, "beautiful" may be a good indicator for generalizing a text as positive. To date, many features have been applied in sentiment analysis, such as unigrams, bigrams, trigrams, and even higher order n-grams [6].

B. N-grams

An n-gram model is a type of probabilistic language model for predicting the next word conditioned on a sequence of previous words using Markov models. The probabilistic expression is P(xi | xi-(n-1), ..., xi-1). An n-gram of size 1 is referred to as a unigram, of size 2 as a bigram, and of size 3 as a trigram. Since n-grams capture dependencies between single words that occur sequentially in a text, the combination of words does not necessarily have syntactic or semantic relations. Unigrams performed much better than bigrams when used as features in Pang et al. (2002), while bigrams and trigrams contributed higher performance than unigrams in (Dave et al., 2003; Ng et al., 2006). In Pang et al. (2002), unigrams also outperformed adjectives when treated as features.

In the feature spaces of machine learning models for text classification, the document or sentence vectors composed of bag-of-words features are usually large, because their size depends on the size of the vocabulary of the whole corpus (dataset). For example, if a corpus contains 5000 sentences and each sentence contributes on average 5 new vocabulary words, then the size of the document or sentence vectors will be 25000 when unigrams are treated as features. Large vectors can slow the system down and are inefficient. A common way to get rid of less effective features is to apply feature selection methods. One way is to use stop words. Stop words are usually domain specific, so it is important to find a domain dependent stop word list.

[Fig. 2.1 Model for classification: movie review data is preprocessed, features are extracted, a classifier assigns the positive or negative class, and the result is evaluated for knowledge discovery.]

C. Machine learning classification method

Text classification using machine learning methods usually focuses on finding the right features, appropriate feature weighting values, feature selection methods, and the right machine learning algorithms. Supervised machine learning algorithms are usually corpus-based classification methods, which find co-occurrence patterns (e.g., frequencies) of words in the corpus to determine the sentiments of words or phrases. Bayes' theorem is the basis of many classification algorithms in text classification: it provides a way to calculate the probability of a hypothesis based on its prior probability (Mitchell 2003, p. 156).
Bayes' theorem is expressed as

P(h|D) = P(D|h) P(h) / P(D)

and is the cornerstone of Bayesian learning methods, because it provides a way to calculate the posterior probability P(h|D) from the prior probability P(h) together with P(D) and P(D|h), where h is a hypothesis. More intuitively, h is the target classification in the hypothesis space H, and D is the training dataset. We are often interested in determining the best hypothesis h from the space H, i.e., the maximum a posteriori hypothesis hMAP = argmax over h in H of P(D|h) P(h). In our problem, the target classification (the most probable hypothesis) is positive or negative, so Bayes' theorem provides a direct method for calculating such probabilities.
D. Classification performance evaluation

Evaluation metrics provide insight into the performance characteristics of a binary classifier. Accuracy (Ai) is commonly used as a measure for categorization techniques. Accuracy values, however, are much less sensitive to variations in the number of correct decisions for a given class than precision and recall:

Ai = (TPi + TNi) / (TPi + TNi + FPi + FNi)

Precision (πi) is defined as the conditional probability that, if a random document d is classified under ci, this decision is correct. It represents the classifier's ability to place a document under the correct category, as opposed to all documents placed in that category, both correctly and incorrectly:

πi = TPi / (TPi + FPi)

Precision measures the exactness of a classifier. Higher precision means fewer false positives, while lower precision means more false positives. This is often at odds with recall, as an easy way to improve precision is to decrease recall.

Recall (ρi) is defined as the probability that, if a random document dx should be classified under category ci, this decision is taken:

ρi = TPi / (TPi + FNi)

Recall measures the completeness, or sensitivity, of a classifier. Higher recall means fewer false negatives, while lower recall means more false negatives. Improving recall can often decrease precision, because it gets increasingly harder to be precise as the sample space increases.

F-measure metric: precision and recall can be combined to produce a single metric known as the F-measure, the weighted harmonic mean of precision and recall:

Fi = 2 πi ρi / (πi + ρi)
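These four metrics can be computed directly from the confusion counts. As a minimal illustrative sketch (ours), with hypothetical counts chosen to be consistent with the baseline pos-class results reported in Section IV:

def evaluate(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-measure: harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# hypothetical pos-class confusion counts on a 500-document test set
print(evaluate(tp=245, fp=131, tn=119, fn=5))
# -> (0.728, 0.6515..., 0.98, 0.7827...)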
In the final phase of this work the results are evaluated to find issues, improvements and ways to extend the work. A summary of the obtained results and the future scope is also discussed. The results obtained are compared to previous works to give a comparative summary of the existing work and the proposed work.

III. PROPOSED METHODOLOGY

This section presents the proposed technique for improving the feature extraction and evaluation required for sentiment analysis. The proposed approach uses a combination of NLP techniques and supervised learning [7]. In the first stage a pre-processing model is proposed to optimize the dataset. In the second stage experiments are performed using NLP and machine learning methods to measure the performance of various feature selection schemes. The process was carried out in the following steps: 1) identify the corpus from which the sentiment features are to be extracted; 2) extract bag-of-words (BoW) features; 3) classification: train a binary classifier on the training set and apply it to the test set; 4) better feature extraction: stop word filtering and bigram collocations; 5) eliminate low information features / select high information features; 6) compare the evaluations.

A. Classification

Consider a movie_reviews.csv file in which each row contains the text of one review; based on the tone of the review we need to classify it as positive (1) or negative (-1). An algorithm is trained using the reviews and classifications in train.csv, and then makes predictions on the reviews in test.csv. The error can then be calculated using the actual classifications in test.csv, to see how good the predictions were [8].

Naive Bayes classification: a naive Bayes classifier works by figuring out the probability of different attributes of the data being associated with a certain class. It is based on Bayes' theorem,

P(A|B) = P(B|A) P(A) / P(B)

which basically states that "the probability of A given that B is true equals the probability of B given that A is true, times the probability of A being true, divided by the probability of B being true."

B. Bigram Collocation

Finding collocations requires first calculating the frequencies of words and of their appearance in the context of other words. The collection of words then often requires filtering to retain only useful content terms. Each n-gram of words may then be scored according to an association measure, in order to determine the relative likelihood of each n-gram being a collocation. The BigramCollocationFinder class provides these functionalities, given a function which scores an n-gram from the appropriate frequency counts. A number of standard association measures are provided in bigram_measures and trigram_measures.
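A minimal usage sketch of the NLTK classes described above (ours; it assumes NLTK and its movie_reviews corpus are installed):

from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
from nltk.corpus import movie_reviews

words = movie_reviews.words()               # flat stream of corpus tokens
finder = BigramCollocationFinder.from_words(words)
finder.apply_freq_filter(3)                 # drop very rare pairs
# score every bigram with chi-square and keep the 200 best
best_bigrams = finder.nbest(BigramAssocMeasures.chi_sq, 200)
print(best_bigrams[:5])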

C. High Information Feature Selection

To find the highest information features, we need to calculate the information gain of each word. Information gain for classification is a measure of how common a feature is in a particular class compared to how common it is in all other classes. A word that occurs primarily in positive movie reviews and rarely in negative reviews is high information. For example, the presence of the word "magnificent" in a movie review is a strong indicator that the review is positive; that makes "magnificent" a high information word. The point is to use only the most informative features and ignore the rest.

One of the best metrics for information gain is chi-square. NLTK includes this in the BigramAssocMeasures class in the metrics package. To use it, we first need to calculate a few frequencies for each word: its overall frequency and its frequency within each class. This is done with a FreqDist for the overall frequency of words, and a ConditionalFreqDist whose conditions are the class labels. Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, sort the words by score, and take the top 10000. We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set. Each file is then classified based on the presence of these high information words.
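The counting and scoring steps described above can be sketched as follows (ours, using the NLTK classes named in the text; the 10000 cutoff matches the one used in Section IV):

from nltk.corpus import movie_reviews
from nltk.metrics import BigramAssocMeasures
from nltk.probability import FreqDist, ConditionalFreqDist

word_fd = FreqDist()                    # overall word frequencies
label_word_fd = ConditionalFreqDist()   # word frequencies per class label
for label in ('pos', 'neg'):
    for word in movie_reviews.words(categories=[label]):
        word_fd[word.lower()] += 1
        label_word_fd[label][word.lower()] += 1

total_count = word_fd.N()
word_scores = {}
for word, freq in word_fd.items():
    # chi-square of the word's per-class frequency against its overall frequency
    pos_score = BigramAssocMeasures.chi_sq(
        label_word_fd['pos'][word], (freq, label_word_fd['pos'].N()), total_count)
    neg_score = BigramAssocMeasures.chi_sq(
        label_word_fd['neg'][word], (freq, label_word_fd['neg'].N()), total_count)
    word_scores[word] = pos_score + neg_score

best_words = set(sorted(word_scores, key=word_scores.get, reverse=True)[:10000])

def best_word_feats(words):
    # set membership test keeps only the high information words
    return {word: True for word in words if word in best_words}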
structure. From the corpus for every word its feature name is
assigned a value True.
Example:
def word_feats(words):
return dict.fromkeys(words.split(), True)
print(word_feats("I love this sandwich."))
{'I': True, 'this': True, 'love': True, 'sandwich.': True}

IV. IMPLEMENTATION

A. Experimental Setup

The corpus [9] has 1000 positive files and 1000 negative files. In this paper, 3/4 of the files are used as the training set, and the rest as the test set, which gives 1500 training instances and 500 test instances. The classifier training method is provided with a list of tokens in the form (feats, label), where feats is a feature dictionary and label is the classification label. In our case, feats will be of the form {word: True} and label will be one of 'pos' or 'neg'.
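A sketch of this setup (ours), assuming the corpus is the NLTK movie_reviews polarity corpus, which matches the 1000 positive / 1000 negative description:

from nltk.corpus import movie_reviews

def word_feats(words):
    # corpus tokens are already split, so no tokenization is needed here
    return dict.fromkeys(words, True)

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')
negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

negcutoff = len(negfeats) * 3 // 4      # 750 of 1000 files per class
poscutoff = len(posfeats) * 3 // 4
trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]   # 1500 instances
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]    # 500 instances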
B. Bag of Words Feature Extraction

In the experiment, the bag of words model was used to extract simple words as features, in the form of a simple dictionary mapping a feature name to a feature value. A feature structure is a mapping from feature identifiers to feature values, where each feature value is either a basic value (such as a string or an integer) or a nested feature structure. For every word from the corpus, its feature name is assigned the value True.

Example:

def word_feats(words):
    return dict.fromkeys(words.split(), True)

print(word_feats("I love this sandwich."))
# {'I': True, 'love': True, 'this': True, 'sandwich.': True}

C. Classification

Classification using machine learning proceeds in two steps: 1. learning the model using the training dataset; 2. applying the trained model to the test dataset. Sentiment analysis is a text classification problem, and thus any existing supervised classification method can be applied. The Naïve Bayes classifier is a simple probabilistic classifier based on Bayes' theorem. This classification technique assumes that the presence or absence of any feature in the document is independent of the presence or absence of any other feature. The Naïve Bayes classifier considers a document as a bag of words, and assumes that the probability of a word in the document is independent of its position in the document and of the presence of other words. The experiment was performed using the Naive Bayes classifier to classify the movie reviews, and the results below were obtained.
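Training and evaluating the classifier might look like the following sketch (ours; trainfeats and testfeats as built in the Experimental Setup sketch above):

import nltk.classify.util
from nltk.classify import NaiveBayesClassifier

classifier = NaiveBayesClassifier.train(trainfeats)
print('accuracy:', nltk.classify.util.accuracy(classifier, testfeats))
classifier.show_most_informative_features(10)   # the list shown in Fig. 4.1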

[Fig. 4.1 Training set vs test set and accuracy using the Naive Bayes classifier]

Fig. 4.1 shows the 10 most informative features, which are highly descriptive adjectives. The accuracy is found to be 73%; for comparison, people apparently agree on sentiment only around 80% of the time.
D. Measuring Precision and Recall for Positive and Negative Reviews

Here it is necessary to build two sets for each classification label: a reference set of correct values, and a test set of observed values. A Naive Bayes classifier is then trained, and the reference and observed values are collected for each label (pos or neg); those sets are used to calculate the precision, recall, and F-measure of the naive Bayes classifier.
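The reference/observed bookkeeping described above can be sketched with NLTK's metrics module (ours; classifier and testfeats as in the previous sketches):

import collections
from nltk.metrics.scores import precision, recall, f_measure

refsets = collections.defaultdict(set)
testsets = collections.defaultdict(set)
for i, (feats, label) in enumerate(testfeats):
    refsets[label].add(i)                         # correct label for instance i
    testsets[classifier.classify(feats)].add(i)   # observed label for instance i

for label in ('pos', 'neg'):
    print(label, 'precision:', precision(refsets[label], testsets[label]))
    print(label, 'recall:', recall(refsets[label], testsets[label]))
    print(label, 'F-measure:', f_measure(refsets[label], testsets[label]))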
pos precision: 0.651595744680851
pos recall: 0.98
pos F-measure: 0.7827476038338657
neg precision: 0.9596774193548387
neg recall: 0.476
neg F-measure: 0.6363636363636364

Fig. 4.2 Results: precision, recall, and F-measure using the naive Bayes classifier
From the above results:
1. Nearly every pos file is correctly identified, with 98% recall. This means very few false negatives in the pos class.
2. But a file given a pos classification is only 65% likely to be correct. This mediocre precision leads to 35% false positives for the pos label.
3. Any file that is identified as neg is 96% likely to be correct (high precision). This means very few false positives for the neg class.
4. But many files that are neg are incorrectly classified. The low recall causes 52% false negatives for the neg label.
The above results are not satisfactory, because people often use positive words in negative reviews: for example, the positive word "great" may be preceded by "not", as in "not great", which is a negative expression.
E. Better Feature Selection

The classifier uses the bag of words model, which considers every word as independent, so it cannot learn that "not great" is negative; in such cases it is better to train the classifier on multiple words. Another possibility is an abundance of neutral words, which degrade the sentiment, because the classifier treats all words the same and assigns each word to either pos or neg. It is therefore better to remove neutral or meaningless words from the feature sets and classify only sentiment-rich words, improving the metrics. To improve feature extraction, two modifications should be considered in word_feats: 1) filtering out stopwords, and 2) including bigram collocations, which significantly improves accuracy. The classifier is then run on the new feature dictionaries, using the new features for training (see the sketch below).
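The two modified feature extractors can be sketched as follows (ours, using the NLTK classes named in the text):

import itertools
from nltk.corpus import stopwords
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

stopset = set(stopwords.words('english'))

def stopword_filtered_word_feats(words):
    # modification 1: drop common stopwords
    return {word: True for word in words if word not in stopset}

def bigram_word_feats(words, score_fn=BigramAssocMeasures.chi_sq, n=200):
    # modification 2: add the n best scoring bigrams as extra features
    bigram_finder = BigramCollocationFinder.from_words(words)
    bigrams = bigram_finder.nbest(score_fn, n)
    return {ngram: True for ngram in itertools.chain(words, bigrams)}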
1) Stopword Filtering: Stopwords are words that are generally considered useless. Mostly these words are ignored because they are so common that including them would greatly increase the size of the index without improving precision or recall. NLTK comes with a stopwords corpus that includes a list of 128 English stopwords. The results after using the stopword filtered bag of words:

Accuracy: 0.726
pos precision: 0.649867374005
pos recall: 0.98
neg precision: 0.959349593496
neg recall: 0.472

Fig. 4.3 Stopword filtered bag of words

The results show that accuracy decreased by 0.2%, and pos precision and neg recall dropped as well.

2) Bigram Collocations: Including bigrams should improve accuracy. The hypothesis is that people say things like "not great", a negative expression that the bag of words model could interpret as positive, since it sees "great" as a separate word. The BigramCollocationFinder maintains two internal FreqDists, one for individual word frequencies and another for bigram frequencies. Once it has these frequency distributions, it can score individual bigrams using a scoring function such as chi-square. These scoring functions measure the collocation correlation of two words, essentially whether the bigram occurs about as frequently as each individual word.

Most Informative Features
magnificent = True  pos : neg = 15.0 : 1.0
outstanding = True  pos : neg = 13.6 : 1.0
insulting = True  neg : pos = 13.0 : 1.0
vulnerable = True  pos : neg = 12.3 : 1.0
('matt', 'damon') = True  pos : neg = 12.3 : 1.0
('give', 'us') = True  neg : pos = 12.3 : 1.0
ludicrous = True  neg : pos = 11.8 : 1.0
uninvolving = True  neg : pos = 11.7 : 1.0
avoids = True  pos : neg = 11.7 : 1.0
('absolutely', 'no') = True  neg : pos = 10.6 : 1.0

Accuracy: 0.816
pos precision: 0.753205128205
pos recall: 0.94
neg precision: 0.920212765957
neg recall: 0.692

Fig. 4.4 Evaluation after including bigram collocations

The above results show that bigram collocations have improved the accuracy by almost 9%: pos precision has increased by over 10% with only a 4% drop in recall, and neg recall has increased by over 21% with just under a 4% drop in precision. So it appears that the bigram hypothesis is correct, and including significant bigrams can enhance classifier effectiveness.
F. Eliminate Low Information Features / High Information Feature Selection

In the previous classification, the experiment was carried out considering the 200 best bigrams, which showed a large improvement in accuracy and pos precision. When a classification model has hundreds or thousands of features, many (if not most) of them are low information. Individually they are harmless, but in aggregate, low information features can decrease performance. For example, the presence of the word "magnificent" in a movie review is a strong indicator that the review is positive; that makes "magnificent" a high information word. Eliminating low information features therefore gives a better model by removing noisy data; it can protect against overfitting as well as the curse of dimensionality. When only the higher information features are considered, performance increases while the size of the model decreases, which results in less memory usage along with faster training and classification. Using the same classifier evaluation method as for classification with bigrams, but with only the best word features, the following results are obtained:
Most Informative Features
magnificent = True  pos : neg = 15.0 : 1.0
outstanding = True  pos : neg = 13.6 : 1.0
insulting = True  neg : pos = 13.0 : 1.0
vulnerable = True  pos : neg = 12.3 : 1.0
ludicrous = True  neg : pos = 11.8 : 1.0
avoids = True  pos : neg = 11.7 : 1.0
uninvolving = True  neg : pos = 11.7 : 1.0
astounding = True  pos : neg = 10.3 : 1.0
fascination = True  pos : neg = 10.3 : 1.0
idiotic = True  neg : pos = 9.8 : 1.0

Accuracy: 0.93
pos precision: 0.890909090909
pos recall: 0.98
neg precision: 0.977777777778
neg recall: 0.88

Fig. 4.5 High information feature selection
G. Comparison on Improving Feature Extraction

The accuracy is over 20% higher when using only the best 10000 words, pos precision has increased by almost 24%, and neg recall has improved by over 40%. These are huge increases with no reduction in pos recall, and even a slight increase in neg precision.

V. CONCLUSION AND FUTURE SCOPE

The proposed work presents an approach to sentiment analysis that compares different combinations of feature selection schemes. We successfully analyzed the different schemes for feature selection and their effect on sentiment analysis. Classification using high information features results in higher accuracy than bigram collocation alone. The model proposed in this paper is just an initial step towards improving the techniques for sentiment analysis. It is worth exploring the capabilities of the model on dynamic data and extending the research using hybrid techniques. There is considerable scope for improvement in corpus creation, and the work can also be extended to improve the results using various classification algorithms and dimensionality reduction.

REFERENCES

[1] R. Patel and G. Sharma, "A survey on text mining techniques," International Journal of Engineering and Computer Science, vol. 3, no. 5, pp. 5621-5625, May 2014.
[2] G. Stylios et al., "Public opinion mining for governmental decisions," Electronic Journal of e-Government, vol. 8, no. 2, pp. 203-214, 2010.
[3] B. Pang et al., "Thumbs up? Sentiment classification using machine learning techniques," in Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10, pp. 79-86, 2002.
[4] E. Riloff and J. Wiebe, "Learning extraction patterns for subjective expressions," in EMNLP '03, 2003.
[5] L. Terveen et al., "PHOAKS: A system for sharing recommendations," Communications of the ACM, vol. 40, no. 3, pp. 59-62, 1997.
[6] H. Zhang, Z. Yu, M. Xu and Y. Shi, "Feature-level sentiment analysis for Chinese product reviews," IEEE, 2011.
[7] N. Indurkhya and F. J. Damerau (Eds.), Handbook of Natural Language Processing, 2nd ed., Chapman & Hall/CRC, Boca Raton, 2010.
[8] H. M. Wallach, "Topic modeling: beyond bag-of-words," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 977-984, Pittsburgh, PA, USA, June 2006.
[9] L. Zhuang, F. Jing and X. Y. Zhu, "Movie review mining and summarization," in Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM '06), ACM, New York, NY, USA, 2006.
[10] A. McCallum and K. Nigam, "A comparison of event models for naïve Bayes text classification," Journal of Machine Learning Research, vol. 3, pp. 1265-1287, 2003.
