
International Journal of Wisdom Based Computing, Vol. 1 (2), August 2011


Automated Intelligence Product Review Analysis

Anala Pandit#1, Pranav Anil Bhole*2

Research Scholar and Ph.D. student at CECS Department, University of Louisville,


Assistant Research Scientist at Embedded Design centre College of Engineering Pune.


Abstract: Effective use of business intelligence is what sets a company apart from its rivals. A company that runs its business on a more effective strategy can dominate its market. Before launching a promising product, a company typically releases a beta version to collect feedback from users, then applies quality-improving patches so that the final product is enhanced, user friendly, closer to what users want, and eventually successful. Designing a product the way users want it therefore depends on an effective system for collecting feedback from its users. This paper describes a new approach to the automated analysis of product feedback and reviews gathered from social sites. The output of the analysis is presented as tables and charts that categorise the reviews by sentiment and flag whether each review is a suggestion, a complaint, a recommendation, or praise.

Keywords: Web crawler, lenient keyword matching algorithm, corpus, WordNet.

I. INTRODUCTION
The traditional, tedious approach of reading reviews one by one and manually analysing their sentiment is no longer feasible because of the sheer number of reviews. When reviews are used to amend the design of a product based on customers' experience, business intelligence comes into the picture. Business intelligence here means the concise information derived from the data and from the sentiments extracted from the reviews, so that a manager can easily take decisions about the marketing and productivity of the product. The automated analysis has two parts: first, the extraction of reviews from the Web or from legacy software; second, sentiment analysis with report generation. Our sentiment analysis module summarises thousands of reviews as a tree and as graphs, categorised as suggestion, complaint, or recommendation. The module also handles nested sentiments; for example, a suggestion and a complaint can appear consecutively. The sentiment analysis module performs lexical analysis based on keyword spotting [9] and a lenient percentage keyword matching algorithm. An adaptable mapping from each keyword to its numerical valency [1] and its connotation (negative or positive) is stored in the 'Corpus', which is based on the most frequently used words from WordNet [5]. The sentiment analysis module opens the extracted text files in multi-threaded mode and generates a report based on several types of keyword comparison. The paper describes in detail the method of classification based on these comparisons: numerical valency [1], lenient percentage matching, keyword spotting [9], lexical affinity [7], statistical methods [4], pre-processed models, a dictionary of affective concepts and lexicon [8], a common-sense knowledge base [8][9], fuzzy logic [5], a knowledge-base approach [5], machine learning [6], and domain-specific classification [10], combined with the adaptable Corpus to arrive at the categorisation.

II. EXTRACTION OF REVIEWS
Social sites on the web are the major source of reviews about particular products. For example, one such site has a huge database of reviews categorised by company. Thousands of product reviews are uploaded by various vendors and customers, so manually copying and pasting the web data is not a feasible solution. A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, worms, and Web spiders. The Web crawler permits automated extraction of reviews from a web page by detecting the review text through its tag ID and extracting it. The crawler builds a tree from the root link of the social site, traverses the tree using the breadth-first search algorithm, and opens each URL as a stream of characters. Following the same flow, the crawler opens thousands of web pages concurrently using multi-threading and converts them into text files.

III. SENTIMENT ANALYSIS MODULE
The sentiment analysis module analyses, line by line, every plain text file produced by the web crawler. The module starts its work by performing the general text cleaning processes.
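The general text cleaning performed by the sentiment analysis module (splitting text into words, discarding stop words and non-alphabetic tokens, stemming, and counting keyword frequency) can be sketched in Java, the language the project was written in. This is only a minimal sketch under stated assumptions: the class `ReviewCleaner`, its method names, the stop-word list, and the suffix-stripping rules are all hypothetical and are not taken from the paper, which does not publish its implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; class and method names are not from the paper.
class ReviewCleaner {
    // Illustrative stop-word list (an assumption; the paper does not give one).
    static final Set<String> STOP_WORDS = Set.of(
            "a", "an", "and", "the", "that", "this", "is", "it",
            "of", "in", "on", "to", "for", "with");

    // Word processing: lower-case the text and split on every run of
    // non-letters, which also discards digits and punctuation.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stemming: naive suffix stripping (an assumption, not a full stemmer).
    // At least three letters are kept so that short words are left alone.
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "e", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Frequency: count each stem once stop words have been removed.
    static Map<String, Integer> keywordFrequencies(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokenize(text)) {
            if (!STOP_WORDS.contains(token)) {
                counts.merge(stem(token), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(keywordFrequencies("Parse the file, then keep parsing it."));
    }
}
```

With these illustrative rules, both Parse and Parsing reduce to the stem pars, which is the behaviour the paper's stemming example calls for. A production system would more likely use an established stemmer and a curated stop-word list.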



A. Word Processing
In English, a sentence consists of words separated by blanks, so word processing, i.e. separating the words of a sentence from each other, is little trouble.

B. Stop Words
Not all words have the same prior chance of being a keyword in a document. Some tokens must be eliminated before the extraction process, based on the following criteria. All tokens in the document should be identified using delimiters such as spaces, tabs, new lines, and dots. Even though prepositions may appear many times in a text, they should not be labelled as keywords. Non-alphanumeric characters and numbers should be eliminated.

C. Frequency
Frequency is the number of times a keyword appears in the text. More important phrases tend to be used more often in a text. However, using frequency alone as a measure causes problems: function words such as the, that, and this usually appear far more often than any other word even though they have no value as keywords. Such words should be eliminated.

D. Stemming
Stemming is very useful for dealing with linguistic variation. Several keywords may originate from the same word. For example, consider the words Parse and Parsing: the stem Pars can represent both, which is very useful in further processing, so Pars is used as the keyword to be matched. After this basic plain text analysis is complete, the filtered text files are passed to the next process, which evaluates the numerical valency of each keyword in the text files. The temporary mapping of keyword to numerical valency is stored in a cache.

IV. CORPUS
Sentiment analysis uses the corpus as a knowledge base; it stores the numerical valency [1] of every keyword together with its connotation. WordNet [5] is used to fill the database initially. The corpus is adaptable: it holds a huge number of keyword entries, which can be amended whenever new keywords are encountered.

A. Sentiment analysis using Percentage matching
The corpus supports lenient matching of keywords. A threshold can be set for a match to qualify. For example, 'Administrator' matches 'mini' with almost 70% leniency.

B. Addition of Jargon to the corpus
The adaptable corpus can store the jargon of a particular dialect or profession.
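The paper does not define how the match percentage is computed, so the following Java sketch rests on an assumption: the match is taken as the length of the longest common substring divided by the length of the longer word. That reading happens to agree with the example above, since 'mini' is a 4-letter substring of the 13-letter 'Administrator', giving roughly 31% match, or roughly 69% leniency. The class `LenientCorpus`, its methods, and the valency numbers are hypothetical illustrations, not the authors' implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical adaptable corpus: keyword -> numerical valency.
class LenientCorpus {
    private final Map<String, Double> valencyByKeyword = new HashMap<>();

    // New keywords, including jargon, can be added at any time.
    void add(String keyword, double valency) {
        valencyByKeyword.put(keyword.toLowerCase(Locale.ROOT), valency);
    }

    // Length of the longest run of characters shared by both strings
    // (classic dynamic programming over two rows).
    static int longestCommonSubstring(String a, String b) {
        int best = 0;
        int[] previous = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            int[] current = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    current[j] = previous[j - 1] + 1;
                    best = Math.max(best, current[j]);
                }
            }
            previous = current;
        }
        return best;
    }

    // Assumed definition: shared run length as a percentage of the longer word.
    static double matchPercent(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int longer = Math.max(x.length(), y.length());
        return longer == 0 ? 0.0 : 100.0 * longestCommonSubstring(x, y) / longer;
    }

    // Valency of the best-matching keyword whose match percentage reaches
    // the threshold, or empty when no keyword qualifies.
    OptionalDouble lookup(String word, double minMatchPercent) {
        double bestMatch = -1.0;
        OptionalDouble result = OptionalDouble.empty();
        for (Map.Entry<String, Double> entry : valencyByKeyword.entrySet()) {
            double match = matchPercent(word, entry.getKey());
            if (match >= minMatchPercent && match > bestMatch) {
                bestMatch = match;
                result = OptionalDouble.of(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LenientCorpus corpus = new LenientCorpus();
        corpus.add("excellent", 8.0);  // made-up valency values
        corpus.add("terrible", -7.5);
        corpus.add("laggy", -3.0);     // jargon entry added later
        System.out.println(matchPercent("Administrator", "mini"));
        System.out.println(corpus.lookup("excelent", 50.0));
    }
}
```

Under this assumed measure, a lower threshold tolerates misspellings such as "excelent" but also admits weak matches, so the threshold would need tuning against real reviews.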

C. Product categorisation
Well-developed categorisation algorithms already exist; all they need is a well-trained corpus. The corpus contains entries for categories such as Suggestion, Complaint, and Recommendation, each filled with numerical valency [1].

V. OUTPUT
Tabular and graphical output is essential for easy analysis at the managerial level. The sentiment analysis module produces a categorisation of reviews that can be displayed as charts and bars. Suggestions merit careful reading when the overall numerical valency [1] of the review is high.

VI. CONCLUSION
Automated extraction and analysis of reviews lets higher-level management take quick decisions about products and their marketing; this is where business intelligence drives the business towards success.

ACKNOWLEDGMENT
The automated intelligence product review analysis is one of the projects I have written in Java. IEEE Xplore helped me a lot with the core logic of the sentiment analysis module. Special thanks to the Lucene project for the logic of extraction.

REFERENCES
[1] Mostafa Al Masum Shaikh, Helmut Prendinger, and Mitsuru Ishizuka, An Analytical Approach to Assess Sentiment of Text, IEEE paper.
English Vocabulary.
B. Eriksson, Sentiment classification of movie reviews using linguistic parsing. Accessed Oct. 2006. Available online.
C. Fellbaum (ed.), WordNet: An Electronic Lexical Database.
S. Fitrianie and L. J. M. Rothkrantz, Constructing Knowledge for Automated Text-Based Emotion Expressions, In Proceedings of CompSysTech, (June 15-16, Tarnovo, Bulgaria), 2006.
J. Kamps and M. Marx, Words with Attitude, In Proceedings of The First International WordNet Conference, (Jan. 21-25, Mysore, India), 2002.
R. E. Sorace, V. S. Reinhardt, and S. A. Vaughn, High-speed digital-to-RF converter, U.S. Patent 5 668 842, Sept. 16, 1997.
S. Knobloch, Affective News: Effects of Discourse Structure in Narratives on Suspense, Curiosity, and Enjoyment While Reading News and Novels, Communication Research 31, 3, pp. 259-287.
H. Liu, H. Lieberman, and T. Selker, A Model of Textual Affect Sensing using Real-World Knowledge, In Proceedings of the Seventh International Conference on Intelligent User Interfaces, (January 12-15, Miami, FL), ACM, 2003, pp. 125-132.
H. Liu and P. Singh, ConceptNet: A Practical Commonsense Reasoning Toolkit, BT Technology Journal 22, 4, pp. 211-226, Oct. 2004, Kluwer Academic Publishers.
R. Mihalcea and H. Liu, A corpus-based approach to finding happiness, Computational approaches for analysis of weblogs -happiness.pdf.
T. Nasukawa and J. Yi, Sentiment Analysis: Capturing Favorability Using Natural Language Processing, In Proceedings of the 2nd International Conference on Knowledge Capture, (Oct. 2-5, Sanibel Island, FL), 2003, ACM Press, pp. 70-77.
A. Ortony, G. L. Clore, and A. Collins, The Cognitive Structure