K. J. Somaiya Institute of Engineering & Information Technology, University of Mumbai, Maharashtra, India
COMM-AN
Opinion Mining of Customer Feedback
Abstract: Product reviews have now moved to online platforms, and the sheer number of reviews has skyrocketed over the past few years. It is impossible to read them all and get an overall view of what customers feel about a product. The idea of this project is to take over this time-consuming job of examining the fetched data and unearthing its underlying meaning. This will be done by first gathering the data, cleaning/pre-processing it, clustering it, and finally representing the output in a diagrammatic/graphical format. We aim to design this project for creators, developers, and producers, not for the customers browsing through the site(s).

Keywords: Natural Language Processing (NLP), Corpus, Opinion Mining, Sentiment Analysis, Topic Modelling, Document-Term Matrix.

or even a group of people to do [1]. This menial work can instead be left to a machine/system that first identifies fake reviews and comments, then filters them out, and lastly sorts the reviews and comments according to what the user has entered.
A general rule that you can apply to help you make sense of customer feedback is to group it by:
1. Type (classifying into categories)
2. Theme (mentioned/asked/complained about)
3. Code (purpose of comments, in a concise manner)
After classifying all of the reviews and comments as mentioned above, we can simply see which category, theme, or feature is the most popular. This information, when forwarded to the concerned organization, is very useful and will be used to decide any future actions [2].
http://ssrn.com/link/2019-ICAST.html
Electronic copy available at: https://ssrn.com/abstract=3368898
2nd International Conference on Advances in Science & Technology (ICAST-2019)
◆ Data designing: Defining the data scope and visualizing it along with extra insights.
◆ Domain: Defining the data according to the expertise domain of the data.
NLTK is a toolkit consisting of a range of packages; it is used to make computers understand human language with the intent of generating a response similar to that of a human.

After building the text corpus, we use a document-term matrix (DTM). It is a matrix with documents designated by rows and words by columns, whose elements are counts or weights (usually TF-IDF). Subsequent analysis is usually based on the DTM. We perform Exploratory Data Analysis (EDA) on the DTM. EDA refers to the process of performing initial investigations on data so as to discover patterns and anomalies with the help of summary statistics and graphical representations; it is all about making sense of the data in hand before using it for the required purpose. Following these steps, NLP techniques are applied as mentioned above.
Following is the algorithm for implementation:

Data Pre-processing (Cleaning and Spell Checking) -
This is the major task in the text mining process, where we filter many unwanted terms out of the bag of words (the fetched data). For further analysis such as pattern finding, it is preferable to treat each document as a bag of words, that is, the set of all its words together with the frequency with which each word occurs in that document; the collection of such documents is known as a corpus.
Here we describe some cleaning methods for pre-processing our corpus. Some documents contain implicit structural elements such as titles, sections and paragraphs. The pre-processing task proceeds step by step as follows:
Step 1: Convert all upper-case letters into lower-case letters. Example: “GAME” is converted into “game”.
Step 2: Remove the punctuation marks (e.g. , . ? etc.) and replace each with a single space character, so as not to unintentionally merge words.
Step 3: Remove stop words such as “the”, “a”, “of”, etc.; such words carry no value for text analysis. We can also remove some topic-based common keywords from the documents.
Step 4: The spell checking process is then used to find words that do not exist in the dictionary and to find, for each, a suitable replacement that is close enough to the word not found. Words that have no match are left as is.
Step 5: Stemming, one of the most important tasks in pre-processing, since it transforms words to their roots. Example: removal of prefixes or suffixes of words, such as -ING, -S, -TION, -ALY, -LY, -IOUS, -IOUSLY, -ED, -EDLY, etc.
Step 6: After completion of the above 5 steps, some excess white space is generated; this white space is removed from the documents.

Data Representation (Document-Term Matrix) -
A document-term matrix (DTM), or term-document matrix if transposed, is a matrix of the frequency of words or terms present in a collection of text. Here, the words/terms correspond to the columns and the documents to the rows, and each entry holds the frequency (number of occurrences) of a term in a document. To determine the value of each entry in the matrix, various schemes are used. One such scheme is TF-IDF.
For instance, if one has the following two (short) documents:
D1 = "I like databases"
D2 = "I hate databases"
Then the document-term matrix would be:
Table 1: Example Document-Term Matrix
        I    like    hate    databases
D1      1    1       0       1
D2      1    0       1       1
Table 1 shows which documents contain which terms and how frequently they appear.
Using this DTM we can gain insight into the most frequently occurring terms in the documents/collection of documents. If the data cleaning step (step A) is not done well, the DTM will contain useless or erroneous data. For example, stop words like ‘the’ or ‘a’, which are used quite frequently in regular sentences, will be included among the most frequent words, giving us no useful insight into the data. Another example of erroneous data disrupting the useful insight gained from a DTM is the inclusion of misspelled words. If these words are not corrected or removed, they will be counted as a separate column instead of being merged into the column they belong to. For example, if the word ‘good’ occurs 5 times and the misspelling ‘goood’ occurs once, the latter will form its own column instead of being included in the column of ‘good’.

Gaining and Displaying Insights -
Now that we have made the DTM for the documents, we need to process it to gain some useful insights from it. This can be achieved by implementing Word Cloud and Sentiment Analysis, among other techniques.
❏ Word Cloud - is a diagrammatic representation of the frequency/importance of a word or term in a document. In this step we simply apply the wordcloud package in Python, using the DTM to provide the frequency of each term. The higher the frequency count of a word, the larger the word appears in the word cloud. The output of this is shown in Fig. 2.
❏ Sentiment Analysis - is used to determine the polarity and subjectivity of a document or, in this case, a set of comments. Polarity is the value assigned to each term that places it on a scale from negative to positive within the range -1 to 1. Subjectivity is the value within the range 0 to 1 showing the degree to which the term expresses a fact (0) or an opinion (1). The overall polarity and subjectivity values are the mean of the polarity and subjectivity of each term in the document. E.g. the word ‘good’ is set a polarity value of 0.7 and subjectivity value of 0.6,
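
The pre-processing steps, the document-term matrix and the mean-polarity scoring described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: STOP_WORDS, SUFFIXES and LEXICON are tiny hypothetical stand-ins for real resources, the suffix stripper is a crude substitute for a proper stemmer, spell checking (Step 4) is omitted, and terms missing from the lexicon are simply ignored when averaging.

```python
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny resources (assumptions for illustration only).
STOP_WORDS = {"the", "a", "an", "of", "is", "and", "this", "it"}
SUFFIXES = ("ing", "ed", "ly", "s")  # crude stand-in for a real stemmer
LEXICON = {"good": 0.7, "like": 0.5, "hate": -0.8}  # assumed polarities in [-1, 1]


def stem(token):
    """Step 5 (simplified): strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Steps 1-3, 5 and 6: lower-case, strip punctuation, drop stop words, stem."""
    text = text.lower()                                             # Step 1
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)  # Step 2
    # split() also discards the excess white space mentioned in Step 6
    tokens = [t for t in text.split() if t not in STOP_WORDS]       # Step 3
    return [stem(t) for t in tokens]                                # Step 5


def document_term_matrix(docs):
    """Return (vocabulary, matrix) with one row per document and one column per term."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def mean_polarity(tokens):
    """Mean polarity of the tokens found in LEXICON; 0.0 if none are found."""
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


docs = ["I like databases", "I hate databases"]
vocab, matrix = document_term_matrix(docs)
print(vocab)   # ['database', 'hate', 'i', 'like']
print(matrix)  # [[1, 0, 1, 1], [1, 1, 1, 0]]
print([mean_polarity(preprocess(d)) for d in docs])  # [0.5, -0.8]
```

Because the terms are lower-cased and stemmed before counting, the columns are the normalized forms of the words (for example "database" rather than "databases"), and the columns appear in alphabetical order. Replacing the raw counts with TF-IDF weights would change only the values stored in the matrix, not its shape.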
graphical format (pie chart). It shows the number of spam-like comments out of the total number of comments. This is determined by applying a weighted dictionary of spam words, each with a weight reflecting how often that word is used in spam.

V. CONCLUSION AND FUTURE SCOPE
Reviewing product comments and feedback will help the users (producers, creators, and developers) to better understand how their product audience feels, on average, about their product(s). Using NLP we can read through very large numbers of comments posted by customers and, by cleaning, filtering and processing them, obtain an output in a concise and understandable format. In the future, we could add more complex and accurate topic modeling algorithms, along with real-time analysis of the comments submitted by users.

REFERENCES
[1] S. Manke, N. Shivale, "A Review on: Opinion Mining and Sentiment Analysis based on Natural Language Processing", International Journal of Computer Applications, 2015.
[2] M. Pfaff, H. Krcmar, "Natural Language Processing Techniques for Document Classification in IT Benchmarking", Conference Paper, 2015.
[3] D. Bhattacharyya, S. Biswas, T. Kim, "A Review on Natural Language Processing in Opinion Mining", International Journal of Smart Home, 2010.
[4] V. C. Cheng, C. H. C. Leung, J. Liu, A. Milani, "Probabilistic Aspect based mining model for drug reviews", IEEE Transactions on Knowledge and Data Engineering, 2014.
[5] J. Han, M. Kamber, J. Pei, "Data Mining: Concepts and Techniques", Second Edition (The Morgan Kaufmann Series in Data Management Systems), 2006.
[6] B. Liu, "Sentiment analysis and subjectivity", in Handbook of Natural Language Processing, Second Edition, Taylor and Francis Group, Boca Raton, 2010.
[7] R. Sharma, S. Nigam, and R. Jain, "Opinion Mining of Movie Reviews at Document Level", International Journal on Information Theory (IJIT), July 2014.
[8] N. Mishra, C. K. Jha, "Classification of Opinion Mining Techniques", International Journal of Computer Applications, October 2012.
[9] P. Singh, M. Husain, "Methodological Study of Opinion Mining and Sentiment Analysis Techniques", International Journal on Soft Computing (IJSC), February 2014.
[10] T. Ahmad, M. Doja, "Ranking System for Opinion Mining of Features from Review Documents", IJCSI International Journal of Computer Science Issues, July 2012.
[11] G. Krishna, S. Kavitha, S. Yamini, A. Rekha, "Analysis of Various Opinion Mining Algorithms", International Journal of Computer Trends and Technology (IJCTT), April 2015.
[12] P. Nie, J. Li, S. Khurshid, R. Mooney, and M. Gligoric, "Natural Language Processing and Program Analysis for Supporting Todo Comments as Software Evolves", The University of Texas at Austin, 2014.
[13] Q. Su, X. Xu, H. Guo, Z. Guo, X. Wu, X. Zhang, B. Swen, and Z. Su, "Hidden Sentiment Association in Chinese Web Opinion Mining", Proc. 17th Int’l Conf. World Wide Web, 2008.
[14] I. Smeureanu, C. Bucur, "Applying Supervised Opinion Mining Techniques on Online User Reviews", Informatica Economică, 2012.
[15] G. Qiu, C. Wang, J. Bu, K. Liu, and C. Chen, "Incorporate the Syntactic Knowledge in Opinion Mining in User-Generated Content", Proc. WWW 2008 Workshop NLP Challenges in the Information Explosion Era, 2008.
[16] A. Jebaseeli, E. Kirubakaran, "M-Learning Sentiment Analysis with Data Mining Techniques", International Journal of Computer Science and Telecommunications, August 2012.
[17] A. Berger, S. Della Pietra, and V. J. Della Pietra, "A maximum entropy approach to natural language processing", Computational Linguistics, 1996.
[18] G. Angulakshmi, R. Manickachezian, "An Analysis on Opinion Mining: Techniques and Tools", International Journal of Advanced Research in Computer and Communication Engineering, July 2014.
[19] Q. Mei, X. Ling, M. Wondra, H. Su, and C. Zhai, "Topic sentiment mixture: Modeling facets and opinions in weblogs", Proc. 16th Int. Conf. WWW, USA, 2007.
[20] Z. Zhang, Q. Ye, Z. Zhang, Y. Li, "Sentiment Classification of Internet Restaurant Reviews written in Cantonese", Expert Systems with Applications, 2011.