Product Review Analysis Using Machine Learning

Abstract- Sentiment analysis is the process of analysing customer reviews of a product using natural language processing. A product cannot always be judged solely on the basis of its numeric rating; we also need to analyse the review text to capture the full meaning of a review. The main aim of the paper is to apply various machine learning algorithms, such as random forest, SVM (Support Vector Machine), and logistic regression, to the datasets and thus identify whether a review is positive or not.

Index Terms- product review, classification, dataset, random forest, SVM, logistic regression, K-NN.

I. INTRODUCTION

The Internet is full of data, and people give their opinions about everything and anything. It is extremely difficult to make a correct choice when there is a large amount of data and no good method for making a decision. We need a method to extract the sentiment from the data and use it to make a sensible choice; to solve this problem we use sentiment analysis. Sentiment analysis is a kind of text classification based on the sentimental orientation (SO) of the opinions a text contains. Sentiment analysis of product reviews has recently become very popular in text mining and computational linguistics research.

Types of Sentiment Analysis

There are many types and flavors of sentiment analysis, and SA tools range from systems that focus on polarity (positive, negative, neutral) to systems that detect feelings and emotions (angry, happy, sad, etc.) or identify intentions (e.g. interested vs. not interested). In the following section, we'll cover the most important ones.

Fine-grained Sentiment Analysis

Sometimes you may also be interested in being more precise about the level of polarity of the opinion, so instead of just talking about positive, neutral, or negative opinions you could consider the following categories:

● Very positive
● Positive
● Neutral
● Negative
● Very negative

This is usually referred to as fine-grained sentiment analysis. This could be, for example, mapped onto a 5-star rating in a review, e.g.: Very Positive = 5 stars and Very Negative = 1 star.

Some systems also provide different flavors of polarity by identifying whether the positive or negative sentiment is associated with a particular feeling, such as anger, sadness, or worry (i.e. negative feelings) or happiness, love, or enthusiasm (i.e. positive feelings).

Emotion detection

Emotion detection aims at detecting emotions such as happiness, frustration, anger, and sadness. Many emotion detection systems resort to lexicons (i.e. lists of words and the emotions they convey) or complex machine learning algorithms. One of the downsides of resorting to lexicons is that the way people express their emotions varies a lot, and so do the lexical items they use. Some words that would typically express anger, like shit or kill (e.g. in "your product is a piece of shit" or "your customer support is killing me"), might also express happiness (e.g. in texts like "This is the shit" or "You are killing it").

Aspect-based Sentiment Analysis

Usually, when analyzing the sentiment in subjects, for example products, you might be interested in not only whether people are talking with a positive, neutral, or negative polarity about the product, but also which particular aspects or features of the product people talk about. That’s what aspect-based sentiment analysis is about. For example:

"The battery life of this camera is too short."

The sentence is expressing a negative opinion about the camera, but more precisely, about the battery life, which is a particular feature of the camera.

Intent analysis

Intent analysis basically detects what people want to do with a text rather than what people say with that text. Look at the following examples:

"Your customer support is a disaster. I've been on hold for 20 minutes."
"I would like to know how to replace the cartridge."
"Can you help me fill out this form?"

A human being has no problem detecting the complaint in the first text, the question in the second text, and the request in the third text. However, machines can have trouble identifying those. Sometimes, the intended action can be inferred from
the text, but sometimes inferring it requires some contextual knowledge.

Multilingual sentiment analysis

Multilingual sentiment analysis can be a difficult task. Usually, a lot of preprocessing is needed, and that preprocessing makes use of a number of resources. Most of these resources are available online (e.g. sentiment lexicons), but many others have to be created (e.g. translated corpora or noise detection algorithms). Using the available resources requires a lot of coding experience and can take a long time to implement.

II. LITERATURE SURVEY

This part consists of the papers that were surveyed:

[1] Anjali Ganesh Jivani, A Comparative Study of Stemming Algorithms, International Journal of Computer Technology and Applications, Vol 2 (6), 1930-38, ISSN: 2229-6093.
[2] Ahmad Kamal, Subjectivity Classification using Machine Learning Techniques for Mining Feature-Opinion Pairs from Web Opinion Sources, International Journal of Computer Science Issues (IJCSI), Volume 10, Issue 5, 2013, pp. 191-200.
[3] J. Bollen, H. Mao, and X. Zeng, Twitter mood predicts the stock market, J. Computer Science, vol. 2, no. 1, pp. 1-8, Mar. 2011.
[4] B. O'Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith, From tweets to polls: Linking text sentiment to public opinion time series, in Proc. 4th Int. AAAI Conf. Weblogs Social Media, Washington, DC, USA, 2010.
[5] G. Mishne and N. Glance, Predicting movie sales from blogger sentiment, in Proc. AAAI-CAAW, Stanford, CA, USA, 2006.
[6] J. McAuley and J. Leskovec, From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews, WWW, 2013.
[7] Jeffrey Breen, twitter-sentiment-analysis-tutorial-201107, https://github.com/jeffreybreen/twitter-sentiment-analysis-tutorial-201107/blob/master/data/opinion-lexicon-English.
[8] Pablo Gamallo, Marcos Garcia, Citius: A Naive Bayes Strategy for Sentiment Analysis on English Tweets, Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 171-175, Dublin, Ireland, August 23-24, 2014.
[9] Ricardo Baeza-Yates, Berthier Ribeiro-Neto, Modern Information Retrieval, ISBN-13: 978-0201398298, 2013.
[10] Fan Sun, Ammar Belatreche, Sonya Coleman, T. M. McGinnity, Yuhua Li, Pre-processing Online Financial Text for Sentiment Classification: A Natural Language Processing Approach.
[11] D V Nagarjuna Devi, Chinta Kishore Kumar, Siriki Prasad, A Feature Based Approach for Sentiment Analysis by Using Support Vector Machine, 2016 IEEE 6th International Conference on Advanced Computing.
[12] Shoiab Ahmed, Ajit Danti, A Novel Approach for Sentiment Analysis and Opinion Mining based on SentiWordNet using Web Data, 2015 International Conference on Trends in Automation, Communications and Computing Technology (I-TACT-15).

III. IMPLEMENTATION

i) Dataset

The dataset of product reviews comprises 2 attributes and 1000 tuples. The data in the dataset were provided by either online platforms or local shops.
The dataset comprises the following attributes:
⮚ Review
⮚ Liked

ii) Data-preprocessing

Data-preprocessing means transforming the data before using it. It is done to convert the raw data into clean data. Data-preprocessing makes the data easier to process and thus gives better results. An example of data-preprocessing is filling the null values in the dataset.
The three techniques used in data-preprocessing are:
● Rescale data
● Binarize the data
● Standardize the data

iii) Machine learning algorithms used:

A. SVM (Support Vector Machine)

This algorithm aims to find the hyperplane in N-dimensional space that distinctly classifies the data points. Hyperplanes are the decision boundaries that separate the data points into classes. Many candidate hyperplanes are possible, and the algorithm chooses the one with the largest margin. The larger the margin, the lower the error.
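The largest-margin idea behind SVM can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this paper: the toy points, the labels, and the three candidate hyperplanes are invented for illustration, and the helper names `geometric_margin` and `best_hyperplane` are hypothetical. A real SVM finds the maximum-margin hyperplane by solving an optimisation problem rather than by enumerating a handful of candidates.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    A positive value means every point lies on the correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

def best_hyperplane(candidates, points, labels):
    """Return the candidate (w, b) whose margin is largest."""
    return max(
        candidates,
        key=lambda wb: geometric_margin(wb[0], wb[1], points, labels),
    )

# Toy 2-D data (assumed): label -1 and label +1 form two separable groups.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Three hand-picked hyperplanes (assumed) that all separate the two groups.
candidates = [
    ((1.0, 0.0), -3.0),   # vertical line x1 = 3
    ((1.0, 1.0), -5.5),   # diagonal line x1 + x2 = 5.5
    ((0.0, 1.0), -1.5),   # horizontal line x2 = 1.5
]

for w, b in candidates:
    print(w, b, round(geometric_margin(w, b, points, labels), 3))

print("chosen:", best_hyperplane(candidates, points, labels))
```

All three candidates classify the toy points correctly, but they leave different amounts of room between the boundary and the nearest point; the sketch keeps the one that leaves the most.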

B. Random Forest algorithm

It is a flexible, easy-to-use algorithm in the field of machine learning. It is a supervised algorithm and can be used for both classification and regression. A random forest reduces the variance that disturbs the results of individual decision trees: the outputs of the individual trees are combined through majority voting, which smooths out the variance and increases the accuracy of the results.

C. Logistic Regression

It is another technique that machine learning borrows from the field of statistics. It is used for binary classification. To squash the output value between 0 and 1, it makes use of the logistic function, or sigmoid function, which has an S-shaped graph.

D. Naïve Bayes algorithm

Although this algorithm is simple, it is a very powerful algorithm for predictive modelling in classification. This classifier belongs to the family of probabilistic classifiers that use Bayes' theorem with a strong independence assumption between features. It considers each feature to contribute independently to the probability, regardless of any possible correlation between the attributes.

E. K-NN (K Nearest Neighbour)

It is a supervised algorithm which is used for classification of data. K is the number of neighbours, which we need to define before fitting the model to the data.

IV. RESULT AND ANALYSIS

In this section, the different algorithms are compared with each other on the basis of the following factor:

i) Accuracy

It is the ratio of the number of correct predictions to the total number of predictions.
Formula of accuracy is:-

Accuracy = (number of correct predictions) / (total number of predictions)

Each algorithm gives its own result. The result of each algorithm is shown in the table below:

Name of the classification algorithm    Accuracy
1. SVM                                  70.85%
2. K-NN                                 75.42%
3. Logistic Regression                  95.71%
4. Random forest                        90.71%
5. Naïve Bayes                          58.85%
Table: Result of algorithms

V. CONCLUSION

In this paper, various algorithms, such as random forest, SVM, and logistic regression, are applied to the dataset. These algorithms gave different accuracies on the dataset.
On analyzing the results, it is seen that logistic regression turns out to be the best algorithm, with an accuracy of 95.71%.
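The building blocks named in the algorithm descriptions and in the accuracy definition of this paper can each be sketched in a few lines of plain Python: the logistic (sigmoid) function, majority voting as used by a random forest, the K-NN rule, and the accuracy ratio. This is a minimal illustration under stated assumptions, not the code that produced the reported results; the function names are hypothetical, and the two-dimensional toy data (1 = liked, 0 = not liked) is invented for the example.

```python
import math
from collections import Counter

def sigmoid(z):
    """Logistic function: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def majority_vote(votes):
    """Return the label that occurs most often, e.g. among the trees of a forest."""
    return Counter(votes).most_common(1)[0][0]

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    return majority_vote([label for _, label in by_distance[:k]])

def accuracy(predicted, actual):
    """Ratio of correct predictions to the total number of predictions."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Toy training data (assumed): two well-separated groups of points.
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_labels = [0, 0, 0, 1, 1, 1]

# Toy test data (assumed); the last label disagrees with its nearest group
# on purpose, so that the accuracy is below 100%.
test_points = [(0.5, 0.5), (5.5, 5.5), (1.0, 1.0), (4.0, 4.0)]
test_labels = [0, 1, 0, 0]

predictions = [knn_predict(train_points, train_labels, q, k=3) for q in test_points]
print("predictions:", predictions)
print("accuracy:", accuracy(predictions, test_labels))
print("sigmoid(0):", sigmoid(0.0))
```

K is passed explicitly because, as noted in the K-NN description, it has to be fixed before the model is used; an odd K avoids ties when there are only two classes.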