
MOVIE REVIEW AGGREGATION SYSTEM

A PROJECT REPORT

Submitted by

ADITYA [Reg No: RA1511008010509]


ABHINAV GUPTA [Reg No: RA1511008010264]
SHUBHAM KUMAR [Reg No: RA1511008010351]
RAVI PRAKASH UPADHYAY [Reg No: RA1511008010385]

Under the Guidance of

Mr. J.PRABAKARAN
(Assistant Professor, Department of Information Technology)

In partial fulfillment of the Requirements for the


Degree of

BACHELOR OF TECHNOLOGY IN
INFORMATION TECHNOLOGY

DEPARTMENT OF INFORMATION TECHNOLOGY


FACULTY OF ENGINEERING AND TECHNOLOGY
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR – 603203
MAY 2019
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR-603203

BONAFIDE CERTIFICATE

Certified that this project report titled “MOVIE REVIEW AGGREGATION


SYSTEM” is the bonafide work of “ABHINAV GUPTA [Reg No: RA1511008010264], ADITYA [Reg No: RA1511008010509], SHUBHAM KUMAR [Reg No: RA1511008010351], RAVI PRAKASH UPADHYAY [Reg No: RA1511008010385]”, who carried out the project work under my supervision. Certified further, that to the best of my knowledge, the work reported herein does not form part of any other thesis or dissertation on the basis of which a degree or award was conferred on an earlier occasion for this or any other candidate.

Signature of the Supervisor Signature

Mr. J.PRABAKARAN Dr. G. VADIVU


GUIDE HEAD OF THE DEPARTMENT
Assistant Professor Dept. of Information Technology
Dept. of Information Technology

Signature of Internal Examiner Signature of External Examiner

ABSTRACT

People's opinions have become one of the most important resources for various services on ever-growing social networks. In particular, online reviews and feedback have turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities, and manage their reputations. In this context, recommender systems are described as supporting tools that help users find information, items, or services (for instance, books, movies, music, digital products, websites, and TV programs) by aggregating and filtering suggestions from other users, which means combining reviews from various sources with user characteristics. After reading such reviews, users make their decisions; such reviews must therefore be accurate and relevant.

India produces the largest number of movies in the world, around 1,500 to 2,000 every year in more than 20 languages, according to a recent report by Deloitte. This is far above the 700 or so films made in the US and Canada every year, yet when it comes to deciding which movie to watch, the Indian audience finds itself in a gap. There is an abundance of movie critics and news outlets that highlight films based on local listings, but in an age of paid reviews and biased incentives, there is no reliable way to discover the genuine opinion of the public. We would like to take this up as a challenge and furnish our audience with a meaningful tool to improve their movie-watching experience. Our system is a movie review system that provides the sentiment associated with released movies. In contrast to other systems, we produce ratings by analyzing only the comments of the general public (no explicit rating input).

ACKNOWLEDGEMENT

The completion of any interdisciplinary project depends upon the cooperation, coordination and combined efforts of several sources of knowledge. We are grateful to Mr. J. Prabakaran for his willingness to give us valuable advice and direction whenever we approached him with a problem. We are grateful to him for giving the right direction to this undertaking and providing immense guidance for this project.

We are also thankful to Mrs. K. Nimala and Mr. M. Anand for their immense guidance in the sentiment analysis part of our project.

We are also thankful to Dr. G. Vadivu, Professor & Head, Department of Information Technology, and all the staff members for their immense cooperation and motivation in completing the project.

ADITYA [Reg No: RA1511008010509]


ABHINAV GUPTA [Reg No: RA1511008010264]
SHUBHAM KUMAR [Reg No: RA1511008010351]
RAVI PRAKASH UPADHYAY [Reg No: RA1511008010385]

TABLE OF CONTENTS

CHAPTER NO. TITLE PAGE NO.


ABSTRACT iii
ACKNOWLEDGEMENT iv
LIST OF TABLES vii
LIST OF FIGURES viii
LIST OF ABBREVIATIONS ix
LIST OF SYMBOLS x
1. INTRODUCTION 1
1.1 MULTIPLE-STRATEGY SENTIMENT ANALYSIS OF NB 1
1.2 PROBABILISTIC MODEL OF NB 2
1.3 PROBLEM STATEMENT 4
2 LITERATURE REVIEW 5
3 PROPOSED METHODOLOGY 15
3.1 MOVIE REVIEWS 16
3.2 PRE-PROCESSOR 16
3.3 NAIVE BAYES CLASSIFIER 17
3.4 TRAINING DATASET 18
3.5 SENTIMENT PROFILE GENERATION 19
3.6 DYNAMICALLY CHANGE OF RATINGS 19
3.7 COLLABORATIVE DIAGRAM 20
3.8 USE-CASE DIAGRAM 20
4 PROCEDURES 21
4.1 COLLECTION OF USER REVIEWS 21
4.2 PRE-PROCESSING 21
4.3 FEATURE SELECTION 24
4.4 SENTIMENT WORD IDENTIFICATION 24
4.5 SENTIMENT POLARITY IDENTIFICATION 25
4.6 SENTIMENT CLASSIFICATION 26
4.7 ANALYSIS OF REVIEWS 26
5 REQUIREMENTS 27
5.1 FUNCTIONAL REQUIREMENTS 27
5.2 NON-FUNCTIONAL REQUIREMENTS 27
5.3 HARDWARE REQUIREMENTS 28
5.4 SOFTWARE REQUIREMENTS 28

6 SYSTEM TESTING 29
6.1 TYPES OF TESTING 29
6.2 TESTING OBJECTIVES 30
6.3 TESTING OUTCOMES 30
7 RESULTS 32
8 CONCLUSION 35
9 FUTURE ENHANCEMENTS 36
REFERENCES 37
APPENDIX 38
PAPER PUBLICATION STATUS 45
PLAGIARISM REPORT 46

LIST OF TABLES

Table 5.1 Hardware Requirements............................................................................28


Table 5.2 Software Requirements.............................................................................28
Table 6.1 Polarity Index............................................................................................30
Table 6.2 Reviews Testing........................................................................................31

LIST OF FIGURES

Figure 3.1 Architecture diagram...................................................................................15


Figure 3.2 Collaborative diagram.................................................................................20
Figure 3.3 Use-Case diagram.......................................................................................20
Figure 7.1 Dashboard...................................................................................................32
Figure 7.2 Description page.........................................................................................33
Figure 7.3 Comment box.............................................................................................34

ABBREVIATIONS

AI Artificial Intelligence

NB Naive Bayes

GUI Graphical User Interface

SA Sentiment Analysis

SVM Support Vector Machine

NLU Natural Language Understanding

NLP Natural Language Processing

LIST OF SYMBOLS

| Conditional Probability
+ Addition

CHAPTER 1

INTRODUCTION

The past decade or so has seen an enormous growth in the volume of data and information flowing through social networks. These contemporary networks contain a very large number of nodes and billions of edges, with terabytes of data being passed around on a minute-to-minute basis. Analyzing this data to observe patterns is an extremely important tool in data processing, since these networks reflect the characteristics of their users. Much of the time, these properties are derived from basic statistical analysis of the data we observe. Sentiment analysis is a natural language processing task that uses computational techniques to identify subjective content and label it as positive or negative. Unstructured textual information on the Web frequently carries expressions of users' opinions, and sentiment analysis attempts to recognize the expressions of opinion and the mood of writers. A typical sentiment analysis algorithm classifies a document as 'positive' or 'negative', based on the opinion expressed in it. The document-level sentiment analysis problem is essentially as follows: given a set of documents D, a sentiment analysis algorithm classifies each document d belonging to D into one of two classes, positive and negative. A positive label indicates that document d expresses a positive opinion, and a negative label means that d conveys a negative opinion of the user. More sophisticated algorithms attempt to detect sentiment at the sentence level, or at the aspect or entity level.
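The document-level classification just described can be sketched with a minimal lexicon-based classifier; the tiny word lists below are illustrative assumptions, not a lexicon used in this project:

```python
# Minimal sketch of document-level polarity classification using a
# hand-made lexicon (illustrative only).
POSITIVE = {"good", "great", "excellent", "amazing", "enjoyable"}
NEGATIVE = {"bad", "poor", "terrible", "boring", "awful"}

def classify_document(text: str) -> str:
    """Label a document 'positive' or 'negative' by counting lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"

print(classify_document("a great and enjoyable film"))       # positive
print(classify_document("boring plot and terrible acting"))  # negative
```

Real systems replace the raw word counts with learned probabilities, as the Naive Bayes approach in the next section does.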

1.1 Multiple-Strategy sentiment analysis of NB

The Naive Bayes (NB) algorithm is widely used in document classification. Given a set of features, it computes the posterior probability of a document belonging to each class and then assigns the document to the highest-probability class. For sentiment analysis, the Naive Bayes algorithm first works with a labeled training corpus in which the sentiment polarity of each document is known. It tokenizes every article in the training corpus and extracts sentiment words. It then computes the posterior probability of each sentiment word and records it in a probability table.
Naive Bayes is a simple technique for constructing classifiers: models that assign class
labels to problem instances, represented as vectors of feature values, where the class
labels are drawn from some finite set. There is not a single algorithm for training such
classifiers, but a family of algorithms based on a common principle: all naive Bayes
classifiers assume that the value of a particular feature is independent of the value of any
other feature, given the class variable. For example, a fruit may be considered to be an
apple if it is red, round, and about 10 cm in diameter. A naive Bayes classifier considers
each of these features to contribute independently to the probability that this fruit is an
apple, regardless of any possible correlations between the color, roundness, and diameter
features.

For some types of probability models, naive Bayes classifiers can be trained very
efficiently in a supervised learning setting. In many practical applications, parameter
estimation for naive Bayes models uses the method of maximum likelihood; in other
words, one can work with the naive Bayes model without accepting Bayesian probability
or using any Bayesian methods.

Despite their naive design and apparently oversimplified assumptions, naive Bayes
classifiers have worked quite well in many complex real-world situations. In 2004, an
analysis of the Bayesian classification problem showed that there are sound theoretical
reasons for the apparently implausible efficacy of naive Bayes classifiers. Still, a comprehensive comparison with other classification algorithms in 2006 showed that naive Bayes classification is outperformed by other approaches, such as boosted trees or random forests. An advantage of naive Bayes is that it requires only a small amount of training data to estimate the parameters necessary for classification.
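Such maximum-likelihood training can be sketched as follows, with Laplace smoothing for unseen words; the toy reviews are invented examples, not the project's dataset:

```python
import math
from collections import Counter, defaultdict

# Sketch of training a naive Bayes text classifier from labeled documents:
# parameters are maximum-likelihood word counts with add-one smoothing.
def train(docs):
    """docs: list of (text, label) pairs -> (class counts, word counts, vocab)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log posterior (log prior + log likelihoods)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for c, n_c in class_counts.items():
        lp = math.log(n_c / total)                      # log prior
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[c][w] + 1) / denom)  # smoothed likelihood
        if lp > best_lp:
            best, best_lp = c, lp
    return best

model = train([("great movie", "pos"), ("awful movie", "neg"),
               ("great acting", "pos"), ("boring awful plot", "neg")])
print(predict(model, "great plot"))  # pos
```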

1.2 Probabilistic model of NB

Conceptually, naive Bayes is a conditional probability model: given a problem instance to be classified, represented by a vector $x = (x_1, \ldots, x_n)$ of $n$ features (independent variables), it assigns to this instance probabilities

$p(C_k \mid x_1, \ldots, x_n)$

for each of $K$ possible outcomes or classes $C_k$.

The problem with the above formulation is that if the number of features $n$ is large, or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable. Using Bayes' theorem, the conditional probability can be decomposed as

$p(C_k \mid x) = \dfrac{p(C_k)\, p(x \mid C_k)}{p(x)}.$

In plain English, using Bayesian terminology, the equation above can be written as

$\text{posterior} = \dfrac{\text{prior} \times \text{likelihood}}{\text{evidence}}.$

In practice, there is interest only in the numerator of that fraction, because the denominator does not depend on $C$ and the values of the features $x_i$ are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model $p(C_k, x_1, \ldots, x_n)$, which can be rewritten as follows, using the chain rule for repeated applications of the definition of conditional probability:

$p(C_k, x_1, \ldots, x_n) = p(x_1 \mid x_2, \ldots, x_n, C_k)\, p(x_2 \mid x_3, \ldots, x_n, C_k) \cdots p(x_{n-1} \mid x_n, C_k)\, p(x_n \mid C_k)\, p(C_k).$
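The resulting decision rule (pick the class maximizing the prior times the product of per-feature likelihoods, since the evidence p(x) is constant) can be sketched numerically; the probability values below are invented purely for illustration:

```python
# Numeric sketch of the naive Bayes decision rule: the evidence term is
# dropped, so the scores below are unnormalized posteriors.
priors = {"positive": 0.5, "negative": 0.5}
likelihoods = {                      # assumed p(word | class) values
    "positive": {"great": 0.30, "plot": 0.10},
    "negative": {"great": 0.05, "plot": 0.15},
}

def posterior_scores(features):
    scores = {}
    for c, prior in priors.items():
        score = prior
        for f in features:
            score *= likelihoods[c].get(f, 1e-6)  # floor for unseen features
        scores[c] = score            # unnormalized: p(x) is constant across classes
    return scores

scores = posterior_scores(["great", "plot"])
print(max(scores, key=scores.get))  # positive
```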
1.3 Problem Statement

In this project we attempt to predict the "true" ratings for movies. Our assumption is that "true" ratings are sufficiently approximated by the Metacritic scores of movies. The aim of this project is to use properties of the graph generated by Amazon users and movie ratings, and to estimate these "true" ratings, using data from Metacritic for training and testing.

A number of websites allow Internet users to submit movie reviews and aggregate them
into an average. Community-driven review sites have allowed the common moviegoer to
express their opinion on films. Many of these sites allow users to rate films on a 0 to 10
scale, while some rely on a star rating system of 1–5, 0–5 or 0–4 stars. The votes are then combined into an overall rating and ranking for any particular film. Some of these
community driven review sites include Reviewer, Movie Attractions, Flixter, FilmCrave,
Flickchart and Everyone's a Critic. Rotten Tomatoes and Metacritic aggregate both scores
from accredited critics and those submitted by users.

On these online review sites, users generally only have to register with the site in order to
submit reviews. This means that they are a form of open-access poll, with the same advantages and disadvantages; notably, there is no guarantee that the reviewers will be a representative sample of the film's audience. In some cases, online review sites have produced results that differ wildly from scientific polling of audiences.

Some websites specialize in narrow aspects of film reviewing. For instance, there are
sites that focus on specific content advisories for parents to judge a film's suitability for
children. Others focus on a religious perspective (e.g. CAP Alert). Still others highlight
more esoteric subjects such as the depiction of science in fiction films. One such example
is Insultingly Stupid Movie Physics by Intuitor. Some online niche websites provide
comprehensive coverage of the independent sector; usually adopting a style closer to print
journalism. They tend to prohibit adverts and offer uncompromising opinions free of any
commercial interest. Their film critics normally have an academic film background.

CHAPTER 2

LITERATURE REVIEW

Title 1: Sentiment Analysis of Movie Reviews

Our preliminary work makes two essential contributions. First, it investigates the use of 'Adverb+Verb' combinations together with 'Adverb+Adjective' combinations for document-level sentiment classification of a review. Second, the aspect-level sentiment classification produces an accurate and straightforward sentiment profile of a movie along different aspects of interest. Notably, the aspect-level sentiment profile complements the document-level sentiment classification of a film's reviews.

The aspect-level sentiment analysis algorithm designed by us is a novel and effective procedure for obtaining an overall opinion profile of a film from multiple reviews along multiple aspects of evaluation. The resulting sentiment profile is useful, clear, and practical for users.

Sentiment analysis is a well-known task in the realm of natural language processing. Given a set of texts, the objective is to determine the polarity of each text; the literature provides comprehensive surveys of the various methods, benchmarks, and resources of sentiment analysis and opinion mining. The sentiments can consist of different classes. In this study, we consider two cases:

1) A movie review is positive (+) or negative (-). This is similar to prior work in which the authors also employ a novel similarity measure, or perform sentiment analysis after summarizing the text.

2) A movie review is very negative (- -), somewhat negative (-), neutral (o), somewhat positive (+), or very positive (+ +).

For the first case, we picked a Kaggle competition called “Bag of Words Meets Bags of Popcorn”.

The challenge consists of two main parts. In the first part, we try a variety of basic sentiment analysis techniques. This provides a reasonable baseline against which to assess more complex methods. In the second part, we try different variants of the basic models. The objective of this part is to train a binary classifier for movie reviews (i.e., the output classes are positive/negative). As in many natural language tasks, the first task here is to clean up and convert the input texts (movie reviews) into numbers. This can be done using a variety of methods such as bag of words, word vectors, etc. Afterwards, we train the classifier.
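The bag-of-words conversion mentioned above can be sketched as follows; the toy cleaned reviews and the vocabulary built from them are illustrative assumptions:

```python
from collections import Counter

# Sketch of the bag-of-words step: convert cleaned review text into
# fixed-length count vectors over a shared vocabulary.
reviews = ["a great great film", "a boring film"]   # toy cleaned reviews

vocab = sorted({w for r in reviews for w in r.split()})

def vectorize(text):
    """Map a text to a vector of word counts, one position per vocab word."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

print(vocab)                   # ['a', 'boring', 'film', 'great']
print(vectorize(reviews[0]))   # [1, 0, 1, 2]
```

These count vectors are what the classifier is then trained on; word-vector methods replace the counts with dense learned embeddings.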

Title 2: A Google wave-based fuzzy recommender system to disseminate


information in University Digital Libraries 2.0

Today, Digital Libraries 2.0 are chiefly built on collaboration between users through shared applications, for instance wikis, blogs, etc., or new paradigms similar to the waves proposed by Google. This new concept, the wave, represents a common space where resources and users can interact.

The problem arises when the number of resources and users is at its peak; then tools for helping users with their information needs and requirements become essential.

For this situation, a fuzzy linguistic recommender system based on the Google Wave capabilities is proposed as a tool for connecting researchers interested in common research lines. The system enables the creation of a shared space, by means of a wave, as a way of collaborating and exchanging ideas between several researchers interested in the same topic.

Likewise, the system recommends, in an automatic way, several researchers and useful resources for each wave. These recommendations are computed from previously defined preferences and characteristics by means of fuzzy linguistic labels. In this way the system facilitates possible collaborations between multidisciplinary researchers and suggests complementary resources relevant to the collaboration.

Digital information allows the storage, access and transmission of millions of resources in an easy way, but at the same time this creates problems for finding suitable information. This problem is present in digital libraries. Digital libraries are an extension of classic libraries, where information about different topics can be found easily; all available information is accessible through the Web. The appearance of digital libraries has changed the perception of traditional libraries. Digital libraries can be focused on different contexts. In our case, we are especially interested in University Digital Libraries (UDL). These kinds of libraries store information about books, electronic papers, electronic journals or official dailies, and user profiles. The advent of University Digital Libraries meant a change in the life of researchers: the amount of information available grew enormously, and the time needed to access that information was considerably reduced.

The first person who used the term Web 2.0 was Dale Dougherty of the company O’Reilly Media in 2004, and from that moment Tim O’Reilly started to use the term in his conferences to refer to the new developments the Web was undergoing. The precise definition of Web 2.0 is not clear; many definitions can be found, and researchers are still debating a definitive one. It is not clear whether Web 2.0 is a new paradigm or simply a natural evolution of the current Web. Web 2.0 is based on the user as the main figure, capable of creating, modifying and publishing the content of Web pages in collaboration with other users. The user is able to interact in a simplified way with the applications because they are very lightweight, and it is not necessary to be an expert in computer science to write your own content in applications such as blogs, wikis, social networks, etc. Many new 2.0 services are appearing every day; Facebook, Flickr, Wikipedia and Blogspot are some clear examples of this fact.

The continued development of new and innovative applications involves the appearance of new paradigms, as in the case of Google Wave, a new tool capable of encapsulating typical functions from other Web applications such as RSS, blogs, chats, wikis, social networks, etc. The application of the capabilities of this new technology to UDLs is one of the objectives of this work, in order to extend the concept of Library 2.0. The first person who used the term Library 2.0 was Casey, and since that moment many related works have emerged. Xu depicted a model (see Fig. 1) of Library 2.0 based on three components:

(i) the information,

(ii) the users and

(iii) the librarians.

He summarizes several applications based on Web 2.0 tools (blogs, RSS, tagging, wikis, social networks, and podcasts) applied to Academic Libraries, and this is the objective of this work as well: the application of Google Wave technology to develop a recommender system that suggests users and digital resources for collaborative purposes between the users of a University Digital Library, especially researchers. This system reduces the time needed to find collaborators and information about digital resources, depending on the user's needs. An example application of the system would be when several research groups want to apply for a European project. These research groups have decided to collaborate for research purposes and they request a common environment (a wave) from the university staff. They are the first members of the wave, and, for example, the official announcement and other related documents are the first resources of the wave; but it is necessary to find new partners and old documents about announcements from past years, etc. That is the moment at which the recommender system suggests new participants and relevant resources from the library to achieve the collaborative objectives of the wave.

Title 3: Multi-Strategy Sentiment Analysis of Consumer Reviews Based on


Semantic Fuzziness

A new procedure for the computation of the polarities and strengths of Chinese evaluation phrases is proposed in this study, which can also be used to analyze the semantic fuzziness of the Chinese language. It uses a probability value, instead of a fixed value, for the polarity strengths of sentiment phrases, in contrast to conventional approaches. Based on the polarities and strengths of those phrases, the paper proposes two multi-strategy sentiment analysis techniques, based respectively on SVM and NB; in particular, the NB-based method takes adversative conjunctions into account.

The two techniques can be used for the sentiment analysis of documents, and their feasibility and effectiveness are demonstrated. For future work, the authors plan to research the quantitative analysis of adverbs, and to analyze compound evaluation expressions of linguistic structure to find better strategies for sentiment analysis. In addition, they will investigate fuzzy evaluation and interpretation of articles, given the fuzzy semantics of Chinese.

The Internet is currently not only an important source of information, but also a platform for expressing views and sharing experiences. On this network, we can easily collect reviews about products or services. Sentiment analysis is useful in commercial intelligence applications and recommender systems because it is a very convenient channel for the two ends of the supply chain to communicate. In sentiment analysis, many strategies and techniques have been used, such as machine learning, polarity lexicons, natural language processing, and psychometric scales, which differ in the assumptions made, the methods used, and the validation datasets. At present, sentiment analysis is performed at three levels: word, sentence, and document, of which the sentence and the document levels are usually used in most current studies.

The word level, the most fundamental and consequently the more significant and more challenging level, however, is seldom studied. For Chinese as a language, short sentiment phrases of one or two Chinese characters are actually the most fuzzy in meaning. Traditional machine learning techniques cannot represent this characteristic. So a new hybrid sentiment analysis is proposed in this study, which comprehensively uses Zadeh's fuzzy set theory, machine learning theory, and a method based on polarity lexicons. It considers adversative conjunctions, such as 'but', 'while', 'however', etc.; reflecting the characteristics of the Chinese language, the weight of sentences which contain such conjunctions is increased. Furthermore, it also considers opinion operators, e.g., 'say', 'present', 'suggest', etc.; if a sentence contains such phrases, it is regarded as a neutral opinion. The three standard machine learning algorithms for sentiment analysis are NB (Naive Bayes), ME (MaxEnt, or Maximum Entropy), and SVMs (Support Vector Machines). For simplicity of the experiment, we only choose NB and SVMs.
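The sentence-weighting heuristic described above can be roughly sketched as follows; the English word lists and weight values are illustrative assumptions, not the paper's actual parameters:

```python
# Sketch of the adversative-conjunction heuristic: sentences containing
# contrastive markers get extra weight, while sentences led by opinion
# operators (reported speech) are treated as neutral.
ADVERSATIVE = {"but", "however", "while"}
OPINION_OPS = {"say", "said", "suggest", "present"}

def sentence_weight(sentence: str) -> float:
    """Return the weight a sentence contributes to the document polarity."""
    words = set(sentence.lower().split())
    if words & OPINION_OPS:
        return 0.0   # neutral: the opinion is attributed to someone else
    if words & ADVERSATIVE:
        return 2.0   # the contrastive clause often carries the real polarity
    return 1.0

print(sentence_weight("the acting was fine but the plot dragged"))  # 2.0
print(sentence_weight("critics say it is great"))                   # 0.0
```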

Previous work showed that traditional sentiment analysis approaches can be quite effective. To automate the analysis of sentiment materials, different approaches have been used for predicting the sentiments of words, expressions and documents, including Natural Language Processing (NLP) and pattern-based machine learning algorithms, for example NB, ME, SVM, and unsupervised learning. Kim and Hovy first produced a synonym set of candidate words with unknown emotions. Govindarajan proposed a method of sentiment analysis on restaurant reviews using hybrid classification technology.

While most researchers focus on machine learning-based sentiment analysis, others focus on polarity lexicon-based methods. Kamps et al. determined word sentiment orientation by calculating the semantic distance to benchmark words in the WordNet synonym structure graph. Wang et al. first studied the characters of the sentiment phrases in the NTUSD polarity word bank to obtain their polarities and strengths based on their characters.

Cambria adopted human-computer interaction, information retrieval and multi-modal signal processing technologies to extract people's sentiments from ever-growing online social data. Since each of the above studies had limited coverage and shortcomings in prediction, semantic fuzziness must be considered when building a sentiment lexicon. This paper proposes a new approach, i.e. multi-strategy sentiment analysis based on semantic fuzziness, which is a mix of machine learning and sentiment lexicon-based approaches.

Title 4: A hybrid fuzzy-based personalized recommender system for telecom


products services

The Internet creates striking opportunities for organizations to provide personalized online services to their customers. Recommender systems are designed to automatically generate personalized recommendations of products/services to customers. Since many uncertainties exist within both product and customer information, it is a challenge to achieve high recommendation accuracy.

This study develops a hybrid recommendation approach which combines user-based and item-based collaborative filtering techniques with fuzzy set methods, and applies it to mobile product and service recommendation. It has been implemented in an intelligent recommender system called the Fuzzy-based Telecom Product Recommender System (FTCP-RS). Experimental results demonstrate the effectiveness of the proposed approach, and the initial application shows that the FTCP-RS can effectively help customers select the most

Title 5: A Personalized Recommender System Based on a Hybrid Model

Recommender systems are a means of web personalization, tailoring the browsing experience to users' particular needs. There are two classes of recommender systems: memory-based and model-based systems.

In this paper, the author proposes a personalized recommender system for next-page prediction that is based on a hybrid model drawn from the two classes. The generalized patterns produced by a model-based method are personalized to specific users by incorporating user profiles generated from the conventional memory-based system's user-item matrix.

The proposed system offered a significant improvement in prediction speed over conventional model-based usage-mining systems, while also offering a modest improvement in system accuracy and system precision, by 0.27% and 2.35%, respectively.

Web personalization can be defined as the process of tailoring a web site to the needs and preferences of specific users. Given the huge amount of information available on the World Wide Web, it has become very important to interact with the user, understand his behavior and be one step ahead of him. Next-page prediction techniques make use of the information stored in Web server logs to build a model of users' behavior, and these models are used to anticipate the user's next page based on his profile.

Next-page prediction improves the friendliness of a web site. It also reduces network latency by pre-fetching required pages. Such prediction techniques are also essential for applications like our movie review aggregation system, to recommend suitable content and offer personalized advertisements. Recommender systems take advantage of the preferences of a group of users to make individual recommendations. They help users locate interesting objects among a huge set of available objects.

Web-based recommender systems are important tools for locating information and for websites to recommend to their users products or services that meet their preferences. There are two main approaches to recommender systems: memory-based (also known as nearest-neighbor) methods and model-based methods. Memory-based recommender systems store all ratings or opinions of all users and generalize from them at the time of making recommendations. The techniques used by memory-based recommender systems allow for recommendations that are tailored to the needs of each individual user; however, the size of the data that needs to be stored affects their scalability.

Title 6: Recommender systems based on social networks

The standard recommender systems, especially collaborative filtering recommender systems, have been studied by various researchers in the past decade. However, they disregard the social relationships among users. Indeed, these relationships can improve the accuracy of recommendation. Recently, the study of social-based recommender systems has become an active research topic.

In this paper, the author proposes a social regularization approach that incorporates social network information to benefit recommender systems. Both users' friendship relations and rating records (labels) are used to predict the missing values (labels) in the user-item matrix. In particular, a bi-clustering algorithm is used to identify the most suitable group of friends for generating diverse final recommendations. Careful experiments on real datasets demonstrate that the proposed approach achieves better performance than existing approaches.

Title 7: A Hybrid Trust-Based Recommender System for Online


Communities of Practice

The need for lifelong learning and the rapid development of information technologies
promote the growth of many online Communities of Practice (CoPs). In online CoPs,
limited cognition and metacognition are two critical issues, especially when learners face
information overload and there is no knowledge expert within the learning environment.
This study proposes a hybrid, trust-based recommender system to alleviate the above
learning issues and problems in online CoPs. A case study was conducted using the
Stack Overflow data to test the recommender system. Key findings include:

(1) Compared with other social network platforms, learners in online CoPs have
stronger social relations and tend to interact with only a smaller group of people.

(2) The hybrid algorithm can give more accurate recommendations than popularity-based
and content-based algorithms.

(3) The proposed recommender framework can support the formation of personalized
learning networks.

Title 8: A peer-to-peer recommender system for self-emerging user communities based on gossip overlays

Gossip-based peer-to-peer protocols have proved effective for supporting dynamic and
complex information exchange among distributed peers. They are useful for building and
maintaining the network topology itself, as well as for supporting a pervasive diffusion of
the information injected into the network. This is valuable in a world where there is a
growing need to access and be aware of many kinds of distributed resources such as
Web pages, shared documents, online items, news and data. Finding scalable, adaptive
and efficient mechanisms addressing this point is a crucial issue, with significant social
and economic aspects.

In this paper, the authors propose the general architecture of a system whose aim is to
exploit the collaborative exchange of information between peers in order to build a
system able to group similar users and spread useful recommendations among them.

Title 9: Social and Content Hybrid Image Recommender System for Mobile
Social Networks

One of the advantages of social networks is the possibility to socialize and personalize
the content created or shared by users. In mobile social networks, where the devices have
limited capabilities in terms of screen size and computing power, multimedia
recommender systems make it possible to show the most relevant content to users,
depending on their preferences, relationships and profile. Previous recommender
systems are not able to cope with the uncertainty of automated tagging and are domain
dependent. In addition, a recommender in this domain must cope with problems arising
from the inherent nature of collaborative filtering (cold start, the banana problem, a
large number of users to serve, and so on).

The solution presented in this paper addresses these issues by proposing a hybrid image
recommender system, which combines collaborative filtering (social techniques) with
content-based techniques, leaving the user the freedom to assign a personal weight to
each. It considers the aesthetics and formal characteristics of the images to overcome the
problems of current techniques, improving the performance of existing systems to create
a mobile social network recommender with a high degree of adaptation to any kind of
user.

Title 10: A novel hybrid approach improving effectiveness of recommender systems

Recommender systems support users by generating potentially interesting suggestions
about relevant items and information. The growing attention towards such tools is
witnessed both by the large number of powerful and sophisticated recommender
algorithms developed in recent years and by their adoption in many popular Web
platforms.
However, the performance of recommender systems can be affected by several critical
issues concerning, for instance, over-specialization, attribute selection and scalability.
To mitigate some of these negative effects, a hybrid recommender system, called
Relevance-Based Recommender, is proposed in this paper. It exploits individual
measures of perceived relevance computed by each user for each item of interest and, to
obtain better accuracy, also considers the corresponding measures computed by other
users for the same items. Several experiments demonstrate the advantages offered by this
recommender in producing potentially attractive suggestions.

CHAPTER 3

PROPOSED METHODOLOGY

The proposed methodology of this project covers the various concepts we utilize in order
to implement it. Our project, the Movie Review Aggregation System, uses Naive Bayes
to perform sentiment analysis on user-provided reviews/comments to generate dynamic
ratings. The various concepts are illustrated graphically below:

Figure 3.1: Architecture Diagram

As shown in the figure, the movie review/comment is analysed and used to generate a
sentiment profile using the Naive Bayes classifier. The steps in the process are explained
below:

3.1 Movie Reviews

These are the user-provided comments and feedback on a particular movie, which need
to be analyzed for sentiment generation and dynamic changes in movie ratings. A movie
review is an evaluation of a film by an individual, expressing an opinion on the picture.
The peculiarity of a movie review is that it does not simply rate the movie but gives
specific opinions, which are the foundation of film reviewing. A movie review is a work
of film criticism addressing the merits of one or more films. Generally, the term "movie
review" implies a work of journalistic film criticism rather than academic criticism. Such
reviews have appeared in newspapers and printed periodicals since the beginning of the
film industry, and are now published on general-interest websites as well as specialized
film review sites. Television programs and other videos are now commonly reviewed in
similar venues and by similar methods.

3.2 Pre-Processor

This step is used to remove all the unnecessary words in the given raw data, such as
URLs, stop words, and so on. This step includes tokenization, the process of splitting the
text into individual units (tokens) such as words, and stemming, which in linguistic
morphology and information retrieval is the process of reducing inflected words to their
word stem, base or root form.

The amount and kind of processing done depends on the nature of the preprocessor;
some preprocessors are only capable of performing relatively simple textual substitutions
and macro expansions, while others have the power of full programming languages.
Preprocessing can also include macro processing, file inclusion and language extensions.
Preprocessors typically perform macro substitution, textual inclusion of other files, and
conditional inclusion or compilation.

Since such a preprocessor knows nothing about the underlying language, its use has been
criticized, and many of its features have been built directly into other languages. For
example, macros are replaced with aggressive inlining and templates, file inclusion with
compile-time imports (this requires the preservation of type information in the object
code, making this feature hard to retrofit into a language), and conditional compilation is
effectively accomplished with if-else and dead-code elimination in some languages.
However, a key point to remember is that all preprocessor directives should start on a
new line.

Syntactic preprocessors were introduced with the Lisp family of languages. Their job is
to transform syntax trees according to user-defined rules. This is the case with Lisp and
OCaml. Some other languages rely on a fully external language to define the
transformations, for example the XSLT preprocessor for XML, or its statically typed
counterpart CDuce.

Syntactic preprocessors are typically used to modify the syntax of a language, extend a
language by adding new primitives, or embed a domain-specific programming language
(DSL) inside a general-purpose language.

3.3 Naive Bayes Classifier

The Naive Bayes classifier is a supervised machine learning technique; it consists of a
set of algorithms based on Bayes' theorem of probability, used to classify the given
dataset into various classes.

Naive Bayes has been studied extensively since the 1960s. It was introduced (though not
under that name) into the text retrieval community in the early 1960s, and remains a
popular (baseline) method for text categorization, the problem of judging documents as
belonging to one category or another (for example spam or legitimate, sports or politics,
etc.). It also finds application in automatic medical diagnosis.

Maximum-likelihood training can be done by evaluating a closed-form expression, which
takes linear time, rather than by the expensive iterative approximation used for many
other types of classifiers. In the statistics and computer science literature, naive Bayes
models are known under a variety of names, including simple Bayes and independence
Bayes. All these names reference the use of Bayes' theorem in the classifier's decision
rule, although naive Bayes is not (necessarily) a Bayesian method. There is not a single
algorithm for training such classifiers, but a family of algorithms based on a common
principle: all naive Bayes classifiers assume that the value of a particular feature is
independent of the value of any other feature, given the class variable. For example, a
fruit may be considered an apple if it is red, round, and about 10 cm in diameter.

For certain types of probability models, naive Bayes classifiers can be trained very
efficiently in a supervised learning setting. In many practical applications, parameter
estimation for naive Bayes models uses the method of maximum likelihood; in other
words, one can work with the naive Bayes model without accepting Bayesian probability
or using any Bayesian methods. Despite their naive design and apparently oversimplified
assumptions, naive Bayes classifiers have worked quite well in many complex real-world
situations. However, a comprehensive comparison with other classification algorithms in
2006 showed that Bayes classification is outperformed by other approaches, such as
boosted trees or random forests.
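The classifier described above can be sketched as a multinomial naive Bayes with Laplace smoothing. This is a minimal illustration with an invented class name and toy training data, not the project's actual implementation:

```java
import java.util.*;

public class NaiveBayesSentiment {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> classDocs = new HashMap<>();  // docs per class
    private final Map<String, Integer> classWords = new HashMap<>(); // words per class
    private final Set<String> vocab = new HashSet<>();
    private int totalDocs = 0;

    // Count word occurrences per class from a labelled training document.
    public void train(String label, String text) {
        totalDocs++;
        classDocs.merge(label, 1, Integer::sum);
        for (String w : text.toLowerCase().split("\\s+")) {
            vocab.add(w);
            wordCounts.computeIfAbsent(label, k -> new HashMap<>()).merge(w, 1, Integer::sum);
            classWords.merge(label, 1, Integer::sum);
        }
    }

    // Pick the class with the highest log-posterior under the independence assumption.
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : classDocs.keySet()) {
            double score = Math.log(classDocs.get(label) / (double) totalDocs); // prior
            for (String w : text.toLowerCase().split("\\s+")) {
                int c = wordCounts.get(label).getOrDefault(w, 0);
                // Laplace (add-one) smoothing avoids zero probabilities for unseen words.
                score += Math.log((c + 1.0) / (classWords.get(label) + vocab.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSentiment nb = new NaiveBayesSentiment();
        nb.train("positive", "great movie must watch");
        nb.train("positive", "good movie overall good");
        nb.train("negative", "bad movie not good");
        nb.train("negative", "worst movie should not watch");
        System.out.println(nb.classify("great movie")); // prints "positive"
    }
}
```

Training only counts words, so it runs in linear time, matching the closed-form training property noted above.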

3.4 Training Dataset

As Naive Bayes is a supervised learning technique, we need a training dataset to train the
classifier to produce the various sentiment profiles for a given context. For this purpose,
we utilize the NLP-Stanford dataset to train our classifier. A training dataset is a dataset
of examples used for learning, that is, to fit the parameters (e.g., weights) of, for example,
a classifier.

Most approaches that search through training data for empirical relationships tend to
overfit the data, meaning that they can identify and exploit apparent relationships in the
training data that do not hold in general.

The model is initially fit on a training dataset, that is, a set of examples used to fit the
parameters (for example, the weights of connections between neurons in artificial neural
networks) of the model. The model (for example, a neural net or a naive Bayes
classifier) is trained on the training dataset using a supervised learning method (for
example, gradient descent or stochastic gradient descent). In practice, the training dataset
often consists of pairs of an input vector (or scalar) and the corresponding output vector
(or scalar), which is commonly denoted as the target (or label). The current model is run
with the training dataset and produces a result, which is then compared with the target
for each input vector in the training dataset. Based on the result of the comparison and
the specific learning algorithm being used, the parameters of the model are adjusted. The
model fitting can include both variable selection and parameter estimation.

3.5 Sentiment Profile Generation

After applying the Naive Bayes classifier, the given movie review/comment is classified
into one of the following sentiments, which is then used to alter the dynamic rating of
that particular movie. The sentiments generated are:
● Very Positive

● Positive

● Neutral

● Negative

● Very Negative

3.6 Dynamic Ratings Changes based on the generated Sentiment Profile

Once the sentiment profile of a review/comment is generated for a particular movie, the
rating of that movie is updated according to the calculation for that particular sentiment.
Based on the sentiment calculation, the rating of the movie changes dynamically.

The average rating over all the reviews is taken into account when calculating the final
rating of a particular movie. If the reviews are consistently negative, this impacts the
rating and the star rating decreases; if the reviews turn out to have positive polarity, the
star rating of that particular movie gradually increases.
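This rating update can be sketched as follows, assuming (as an illustration, not taken from the report) that each sentiment class maps to a fixed star value and the displayed rating is the running average of all reviews:

```java
import java.util.*;

public class DynamicRating {
    // Assumed mapping from sentiment class to a star value; the exact values
    // are illustrative, not the project's calibrated ones.
    private static final Map<String, Double> STARS = Map.of(
            "Very Positive", 5.0, "Positive", 4.0, "Neutral", 3.0,
            "Negative", 2.0, "Very Negative", 1.0);

    private double total = 0;
    private int count = 0;

    // Record one classified review and return the updated average star rating.
    public double addReview(String sentiment) {
        total += STARS.get(sentiment);
        count++;
        return currentRating();
    }

    public double currentRating() {
        return count == 0 ? 0 : total / count;
    }

    public static void main(String[] args) {
        DynamicRating movie = new DynamicRating();
        movie.addReview("Very Positive");
        movie.addReview("Positive");
        double r = movie.addReview("Negative");
        System.out.printf("Rating after 3 reviews: %.2f%n", r); // (5+4+2)/3 = 3.67
    }
}
```

A stream of negative reviews pulls the average down, and positive ones pull it up, which is exactly the behaviour described above.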
The proposed methodology can be explained further by the collaborative diagram and
use-case diagram of the project. These are:

3.7 Collaborative Diagram

Figure 3.2: Collaborative Diagram

3.8 Use-Case Diagram

Figure 3.3: Use-Case Diagram

CHAPTER 4

PROCEDURES

Following is the set of procedures one must follow to implement the movie review
aggregation system:

4.1 Collection of User Reviews

Reviews are essential for performing the sentiment analysis task. Different techniques
are used in this study for the collection of reviews. The reviews can be of structured,
semi-structured or unstructured type. For sentiment analysis research, there are
open-source platforms where researchers can obtain their data for analysis purposes. R is
a programming language and a well-suited environment for statistical computing and
graphics, supported by the R Foundation for Statistical Computing. By installing the
required packages and completing the authentication procedure of a social site, crawling
the reviews from that site is a simple task. Once we have our text data, we can use it for
the pre-processing step.

4.2 Pre-Processing

Data pre-processing is done to remove incomplete, noisy and inconsistent data. Data
must be pre-processed before being used in the feature selection task. Pre-processing
includes the following tasks:

4.2.1 Removing URLs, Special characters, Numbers, Punctuations etc.


Before we start using the review text, we need to clean it. We remove the mentions, as
we want to generalize:
● Remove all the special characters.
● Remove the hashtag sign (#) but not the actual tag, as this may contain information.
● Set all words to lowercase.
● Remove all punctuation, including question and exclamation marks.
● Remove the URLs as they do not contain useful information. We did not see a
difference in the number of URLs used between the sentiment classes.
● Make sure to convert the emoticons into a single word.
● Remove digits.

One side effect of text cleaning is that some lines have no words left in their text.
However, for the Word2Vec algorithm this causes an error. There are different strategies
to deal with these missing values. Some are:
● Remove the complete line, though in a production environment this is not desirable.
● Impute the missing value with some placeholder text like *[no_text]*.
● When applying Word2Vec: use the average of all word vectors.
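The cleaning steps above can be sketched with regular expressions. The class name, the exact rule ordering and the *[no_text]* placeholder handling are illustrative:

```java
public class ReviewCleaner {
    // Applies the cleaning steps listed above in sequence.
    public static String clean(String raw) {
        String s = raw.toLowerCase();
        s = s.replaceAll("https?://\\S+", " ");  // remove URLs
        s = s.replaceAll("#", " ");              // drop the '#' sign but keep the tag text
        s = s.replaceAll("\\d+", " ");           // remove digits
        s = s.replaceAll("[^a-z\\s]", " ");      // remove punctuation/special characters
        s = s.trim().replaceAll("\\s+", " ");    // collapse repeated whitespace
        // Placeholder for lines left empty by cleaning (the Word2Vec issue noted above).
        return s.isEmpty() ? "[no_text]" : s;
    }

    public static void main(String[] args) {
        System.out.println(clean("Great #movie!! 10/10 http://example.com"));
        // prints "great movie"
    }
}
```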

4.2.2 Removing Stop words


Although "stop words" usually refers to the most common words in a language, there is
no single universal list of stop words used by all natural language processing tools, and
indeed not all tools even use such a list. Some tools specifically avoid removing these
stop words to support phrase search.

Any group of words can be chosen as the stop words for a given purpose. The phrase
"stop word", which is not in Luhn's 1959 presentation, and the related terms "stop list"
and "stoplist" appear in the literature shortly thereafter.

A predecessor concept was used in creating some concordances. In SEO terminology,
stop words are the most common words that most search engines avoid, saving time and
space when processing large amounts of data during crawling or indexing. This helps
search engines save space in their databases.
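Stop-word removal can be sketched as a simple set lookup; the stop list here is a tiny illustrative sample, not a complete one:

```java
import java.util.*;

public class StopWordFilter {
    // A tiny illustrative stop list; real systems use a larger, task-specific one,
    // and (as noted above) there is no single universal list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "is", "to", "of", "and", "this"));

    public static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens)
            if (!STOP_WORDS.contains(t)) kept.add(t);
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("this", "is", "a", "great", "movie");
        System.out.println(removeStopWords(tokens)); // prints [great, movie]
    }
}
```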

4.2.3 Stemming

Stemming means reducing a word on the basis of suffixes and prefixes, and is used in
NLU and NLP. Stemming is a part of linguistic studies in morphology and of
information retrieval and extraction in artificial intelligence (AI). Stemming is also a part
of queries and Internet search engines. Recognizing, searching and retrieving more forms
of words returns more results. That extra information retrieved is why stemming is
essential to search queries and information retrieval.
When a new word is found, it can present new research opportunities. Often, the best
results can be attained by using the basic morphological form of the word: the lemma.
Stemming uses a number of approaches to reduce a word to its base from whatever
inflected form is encountered.
It can be easy to develop a stemming algorithm. Some basic algorithms simply strip
recognized prefixes and suffixes. However, these basic algorithms are prone to error. For
example, an error can reduce words like laziness to lazi rather than lazy. Examples of
stemming algorithms include:
● Lookup in tables of inflected forms of words. This approach requires every inflected
form to be listed.
● Suffix stripping. Algorithms recognize known suffixes on inflected words and
remove them.
● Lemmatization. This algorithm collects all inflected forms of a word in order to
reduce them to their root dictionary form, or lemma. Words are grouped into parts of
speech (the categories of word types) by means of the rules of grammar.
● Stochastic models. This algorithm learns from tables of inflected forms of words.
By understanding suffixes, and the rules by which they are applied, the algorithm
can stem new words.
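A naive suffix-stripping stemmer is sketched below, to illustrate both the approach and the error-proneness noted above (it reduces "laziness" to "lazi"); a production system would use a Porter- or Snowball-style stemmer instead:

```java
public class SimpleStemmer {
    // Naive suffix stripping: try longer suffixes first, and keep at least
    // three characters of stem. Illustrative only.
    public static String stem(String word) {
        String w = word.toLowerCase();
        String[] suffixes = {"ingly", "edly", "ing", "ness", "ed", "ly", "es", "s"};
        for (String suf : suffixes) {
            if (w.endsWith(suf) && w.length() - suf.length() >= 3)
                return w.substring(0, w.length() - suf.length());
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(stem("watching")); // prints "watch"
        System.out.println(stem("laziness")); // prints "lazi" (over-stripping, as noted above)
    }
}
```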

4.2.4 Tokenization

Tokenization is the act of breaking up a sequence of strings into pieces such as words,
keywords, phrases, symbols and other elements called tokens. Tokens can be individual
words, phrases or even whole sentences. During the process of tokenization, some
characters like punctuation marks are discarded. The tokens become the input for
another process like parsing and text mining.

Tokenization is used in computer science, where it plays a large part in the process of
lexical analysis.

Tokenization relies mostly on simple heuristics in order to separate tokens, following a
few steps:

● Tokens or words are separated by whitespace, punctuation marks or line breaks

● Whitespace or punctuation marks may or may not be included depending on the need

All characters within contiguous strings are part of the token. Tokens can be made up of
all alphabetic characters, alphanumeric characters, or numeric characters only.

Tokens themselves can also be separators. For example, in most programming languages,
identifiers can be placed together with arithmetic operators without white space. Though
it might seem this would appear as a single word or token, the grammar of the language
actually considers the mathematical operator (a token) a separator, so even when
multiple tokens are bunched up together, they can still be separated via the mathematical
operator.
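The heuristics above can be sketched as a whitespace-and-punctuation tokenizer (illustrative; this variant discards punctuation rather than emitting it as separate tokens):

```java
import java.util.*;

public class Tokenizer {
    // Split on runs of whitespace and punctuation, dropping empty pieces.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[\\s\\p{Punct}]+"))
            if (!t.isEmpty()) tokens.add(t);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Great movie, must-watch!"));
        // prints [Great, movie, must, watch]
    }
}
```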

4.3 Feature Selection

Feature selection from pre-processed text is a difficult task in sentiment analysis. The
main objective of feature selection is to reduce the dimensionality of the feature space
and thereby the computational cost. Feature selection also reduces overfitting of the
learning scheme to the training data. Various machine learning algorithms have been
examined on a news review dataset with different feature selection techniques; the
features are typically unigrams, bigrams and n-grams. POS tagging is also used in
feature selection techniques.
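Unigram and bigram feature extraction can be sketched as follows (illustrative names; a real pipeline would additionally apply frequency cut-offs or POS filters):

```java
import java.util.*;

public class NgramFeatures {
    // Build a feature list of unigrams followed by bigrams joined with '_'.
    public static List<String> features(List<String> tokens) {
        List<String> feats = new ArrayList<>(tokens);       // unigrams
        for (int i = 0; i + 1 < tokens.size(); i++)         // bigrams
            feats.add(tokens.get(i) + "_" + tokens.get(i + 1));
        return feats;
    }

    public static void main(String[] args) {
        System.out.println(features(Arrays.asList("not", "good", "movie")));
        // prints [not, good, movie, not_good, good_movie]
    }
}
```

Bigrams like `not_good` let the classifier pick up negation that unigrams alone would miss.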

4.4 Sentiment Word Identification

Sentiment word identification is a crucial task in various applications of sentiment
analysis and opinion mining, such as review mining, opinion holder finding, and review
classification. Sentiment words can be classified into positive, negative and neutral
words.
Traditional sentiment analysis often uses a sentiment dictionary to extract sentiment
information from text and classify documents. However, emerging informal words and
phrases in user-generated content call for analysis that is aware of the context. Usually,
they have special meanings in a particular context. Because of its strong performance in
representing inter-word relations, we use sentiment word vectors to identify these special
words. Based on the distributed language model word2vec, in this paper we present a
novel method for the sentiment representation of a word in a particular context; to be
precise, we identify words with abnormal sentiment polarity in long answers. Results
show the improved model performs better at representing words with special meanings,
while continuing to do well at representing special idiomatic patterns. Finally, we discuss
the meaning of vector representations in the field of sentiment, which may differ from
general object-based settings.
Unlike traditional news corpora, user-generated content is linguistically informal. While
users constantly create new words and phrases, some informal words express additional
meaning. This challenges traditional sentiment analysis methods while providing a more
vivid corpus for exploring human sentiment expression. Exploring this kind of new word
requires making deep use of context, especially semantic meaning. Using distributed
language methods in a particular context provides insight into latent language meaning,
and shows superiority in context awareness and analogy. Sometimes, in a special
language context, some words have a special meaning different from the normal
environment. As social media develops, more online communities show the cluster
effect. For example, "refugee" is a positive word in discussions of human rights unions
but negative among real estate holders. In this paper, we use a model based on word2vec
to find the special words in a particular context. Unlike research on short information
flows like Twitter, we use long articles, answers to questions posted on social media, as
our corpus. Using our model, we provide a method for tapping into the latent sentiment
tendency in long social articles. After training vectors using word2vec, we change the
vectors of words with known sentiment polarity and train them again, controlling the
number of iterations. The special words in a particular context are detected by the model,
and a better vector representation of them is produced.

4.5 Sentiment Polarity Identification

The fundamental task in SA is classifying the polarity of a given text at the document,
sentence, or feature level. The polarity falls into three classes, i.e. positive, negative and
neutral. Polarity identification is done using different lexicons, which help to calculate
sentiment score, sentiment strength and so on. Polarity in sentiment analysis refers to
identifying sentiment orientation (positive, neutral, and negative) in written or spoken
language. Other kinds of sentiment analysis include fine-grained sentiment analysis,
which provides more precision in the level of polarity (for example very positive,
positive, neutral, negative, and very negative), and emotion analysis, which aims to
detect emotions in expressions (for example joy, sadness, frustration, surprise, and so
on).
Language can contain expressions that are objective or subjective. Objective expressions
are facts. Subjective expressions are opinions that describe people's feelings towards a
specific subject or topic.
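A lexicon-based polarity scorer can be sketched as follows; the tiny lexicon and its weights are invented for illustration (real systems use resources such as SentiWordNet):

```java
import java.util.*;

public class PolarityScorer {
    // Illustrative sentiment lexicon: word -> signed strength.
    private static final Map<String, Integer> LEXICON = Map.of(
            "good", 1, "great", 2, "love", 2,
            "bad", -1, "worst", -2, "hate", -2);

    // Sum lexicon scores of the tokens and map the sign to a polarity class.
    public static String polarity(List<String> tokens) {
        int score = 0;
        for (String t : tokens) score += LEXICON.getOrDefault(t, 0);
        if (score > 0) return "Positive";
        if (score < 0) return "Negative";
        return "Neutral";
    }

    public static void main(String[] args) {
        System.out.println(polarity(Arrays.asList("great", "movie"))); // prints "Positive"
        System.out.println(polarity(Arrays.asList("worst", "movie"))); // prints "Negative"
    }
}
```

Signed word strengths are what make the finer grades (very positive vs. positive) possible: thresholds on the total score, rather than just its sign, would yield the five-class scheme above.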

4.6 Sentiment Classification

Sentiment classification of news review datasets and product review datasets is done
using supervised machine learning approaches like Naive Bayes, SVM, Maximum
Entropy and so on. Accuracy depends on which dataset is used with which classification
method. In the case of supervised machine learning approaches, the training dataset is
used to train the classification model, which then helps to classify the test data.

4.7 Analysis of Reviews

Finally, analysis of the result is important for individuals and industry to make decisions.
In the case of news reviews, if most results are positive, then the user can decide to
follow that news event.
Such analysis is used in business intelligence.

CHAPTER 5

REQUIREMENTS

5.1 Functional Requirements

Functional requirements are the capacities or features that must be incorporated into any
system to satisfy the business needs and be acceptable to the users. In view of this, the
functional requirements that the system must fulfil are as follows:
● System should be able to process new reviews and comments and store them in
database after retrieval.
● System should be able to analyze the data and classify the polarity of each review
and comment.

5.2 Non-Functional Requirements

Non-functional requirements are a description of the features, qualities and attributes of
the system, as well as any constraints that may limit the boundaries of the proposed
system.
The non-functional requirements are basically based on performance, information,
economy, control and security efficiency, and services. In view of these, the
non-functional requirements are as follows:
● User friendly
● System should give better accuracy
● Perform with efficient throughput and response time

5.3 Hardware Requirements

System i3 Processor

Hard Disk 500 GB

Monitor 15'' LED

Input Devices Keyboard, Mouse

RAM 4 GB

Table 5.1: Hardware Requirements

5.4 Software Requirements

Operating system Windows

Coding Language Java

IDE Netbeans

Database SQLyog Enterprise

Table 5.2: Software Requirements

CHAPTER 6

SYSTEM TESTING
6.1 Types of Testing

UNIT TESTING is a level of software testing where individual units/components of a
system are tested. The purpose is to validate that each unit of the software performs
as designed. A unit is the smallest testable part of any software. It usually has one or
a few inputs and usually a single output. In procedural programming, a unit may be
an individual program, function, procedure, etc. In object-oriented programming, the
smallest unit is a method, which may belong to a base/super class, abstract class or
derived/child class. (Some treat a module of an application as a unit. This is to be
discouraged as there will probably be many individual units within that module.)
Unit testing frameworks, drivers, stubs, and mock/fake objects are used to assist in
unit testing.

INTEGRATION TESTING is a level of software testing where individual units are
combined and tested as a group. The purpose of this level of testing is to expose
faults in the interaction between integrated units. Test drivers and test stubs are used
to assist in integration testing.

SYSTEM TESTING is a level of software testing where a complete and integrated
software system is tested. The purpose of this test is to evaluate the system's
compliance with the specified requirements.

ACCEPTANCE TESTING is a level of software testing where a system is tested for
acceptability. The purpose of this test is to evaluate the system's compliance with the
business requirements and assess whether it is acceptable for delivery.

6.2 Testing Objectives
● Web page should be rendered perfectly.
● Details of all the movies should be displayed.
● A comment box should be present for every movie title.
● All text fields should work perfectly.
● Users must be allowed to enter their mail ids.
● The comment box should allow users to write a comment.
● The description should appear on the left side of every movie.
● The movie's poster should appear right up front.
● The overall rating of the movie should be displayed at the top.
● On the dashboard, every movie should be displayed along with its short bio.
● On clicking a movie title from the dashboard, its full description should open.
● The system should verify the correctness of the data entered in all the fields.
● The web page must not be delayed on loading.
● Every link should be responsive and each review must be processed.
● Every link must open its respective page and its content.
● The star rating must change dynamically.
● Sentiment analysis of all the reviews entered should be done.
● Based on the sentiment polarity, the outcome should be displayed.
● The outcomes of the sentiment polarity must be recorded in the database.
● Every review entered in the system must be recorded and should be retrieved
successfully on the movie page, after the previous review/comment.

6.3 Testing Outcomes


Testing of the reviews has been done based on the polarity of the reviews.
Based on the outcomes, the following could be observed:

Very Positive
Positive
Neutral
Negative
Very Negative

Table 6.1: Polarity Index

30
For each polarity, a fixed star rating is defined; stars are awarded to user-entered
reviews on that basis, and the movie's overall average star rating is updated
accordingly.
Testing of user reviews has been done manually, and the outcomes have been
reported.
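The report does not state the exact star value assigned to each polarity. A plausible sketch, assuming the five Stanford CoreNLP sentiment classes (0 = Very Negative through 4 = Very Positive) map linearly onto a 1-5 star scale, with `RatingMapper` as a hypothetical name:

```java
// Hypothetical mapping from a CoreNLP sentiment class (0-4) to a
// 1-5 star rating; the exact values the system uses are assumed here.
public class RatingMapper {
    public static int starsFor(int sentimentClass) {
        if (sentimentClass < 0 || sentimentClass > 4) {
            throw new IllegalArgumentException("sentiment class must be 0-4");
        }
        return sentimentClass + 1; // 0 -> 1 star, ..., 4 -> 5 stars
    }

    public static void main(String[] args) {
        System.out.println("Very Positive -> " + starsFor(4) + " stars"); // 5 stars
        System.out.println("Very Negative -> " + starsFor(0) + " star");  // 1 star
    }
}
```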

TEST ID | SENTIMENT     | TEST INPUT        | EXPECTED OUTCOME | ACTUAL OUTCOME | RESULT (PASS/FAIL)
1       | VERY POSITIVE | Very good movie   | Very Positive    | Very Positive  | PASS
2       | VERY POSITIVE | Best movie        | Very Positive    | Very Positive  | PASS
3       | VERY POSITIVE | Great movie       | Very Positive    | Very Positive  | PASS
4       | POSITIVE      | Good movie        | Positive         | Positive       | PASS
5       | POSITIVE      | Overall good      | Positive         | Positive       | PASS
6       | POSITIVE      | Must watch        | Positive         | Positive       | PASS
7       | POSITIVE      | I love this movie | Positive         | Positive       | PASS
8       | NEUTRAL       | Average movie     | Neutral          | Neutral        | PASS
9       | NEUTRAL       | Ok Ok             | Neutral          | Neutral        | PASS
10      | NEUTRAL       | New concept       | Neutral          | Neutral        | PASS
11      | NEGATIVE      | Bad movie         | Negative         | Negative       | PASS
12      | NEGATIVE      | Not good          | Negative         | Negative       | PASS
13      | NEGATIVE      | Should not watch  | Negative         | Negative       | PASS
14      | NEGATIVE      | I hate this movie | Negative         | Negative       | PASS
15      | VERY NEGATIVE | Worst movie       | Very Negative    | Very Negative  | PASS
16      | VERY NEGATIVE | Very bad movie    | Very Negative    | Very Negative  | PASS

Table 6.2: Reviews Testing

CHAPTER 7

RESULT

Figure 7.1: Dashboard

The above figure shows the home page, or dashboard, of the system. It lists the various
movie titles available on the website. Each title is shown with its album art and a short
description. Users can find a fuller description of a favorite movie title by clicking on
its thumbnail. A "More" button can also be seen.

Figure 7.2: Description Page

The description page can be seen in Figure 7.2. It shows a larger album art for the
movie, along with a proper description. A star rating, which holds the average rating
of all the movie's reviews, can be found next to the description. Below that is the
actual runtime in minutes, followed by the release date and category. The names of the
directors and star cast are also included on this page, so the user can find the full
description of the movie.

Figure 7.3: Comment box

Figure 7.3 shows the comment box. It allows a user to enter their mail ID and drop a
comment reviewing their favorite movie title. After clicking Submit, the review is
verified; post verification, the user's name and review are shown publicly on this page
along with the system-generated sentiment type.

CHAPTER 8

CONCLUSION

The project’s experimental work makes the following important contributions. First, it
proposes a dynamic rating system that takes reviews/comments as input, performs
sentiment analysis on them, and dynamically changes movie ratings based on the
results. Second, the rating system provides movie ratings that support better
decision-making by the audience, and those ratings are unbiased because
reviews/comments are taken and processed directly from users. Dynamic rating
generation from the analysis of comments/feedback has many beneficial future
applications. From this project, we found that every review can be mapped to a
sentiment in a particular context; understanding and processing this data into rating
figures can be a useful step towards the automation of such systems and towards
improving the usability and scope of web applications.
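The dynamic re-rating described above can be sketched as an incremental average: each new review's star value updates the movie's overall rating without re-reading all earlier reviews. The `MovieRating` class and method names are illustrative, not taken from the project code:

```java
// Illustrative sketch of dynamic rating updates via a running mean;
// class and method names are assumptions, not the project's own API.
public class MovieRating {
    private double average = 0.0;
    private long count = 0;

    // Incorporate one new star rating (1-5) into the running average.
    public void addRating(int stars) {
        count++;
        average += (stars - average) / count; // incremental mean update
    }

    public double getAverage() {
        return average;
    }

    public static void main(String[] args) {
        MovieRating rating = new MovieRating();
        rating.addRating(5);
        rating.addRating(3);
        System.out.println("Average: " + rating.getAverage()); // prints 4.0
    }
}
```

The incremental form avoids storing every past rating just to recompute the mean, which matters once a movie has accumulated many reviews.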

CHAPTER 9

FUTURE ENHANCEMENT

This dynamic rating generation system, which builds ratings from user
comments/feedback, lends itself to many future enhancements and can be applied in
fields such as pharmacy, e-commerce, and the sales industry.
Some of the areas where this movie review aggregation system can be used are
highlighted below:
1. It can be extended to generate reviews for products in online shops and
e-commerce websites.
2. It can be used to generate reviews for online videos, advertisements, etc.
3. It can be used to generate reviews for colleges during the admission process.
4. It could be enhanced to generate reviews of candidates in elections.

The sentiment extracted from the data can be processed further and put to additional
use. A recommendation system could be integrated into this project as an
enhancement, and users could receive notifications based on its recommendations.

Further libraries could be added to provide graphical plots of the generated sentiment,
giving a better representation of the processed data.
Additional functionality could also be given to the admin user of the project, such as
disabling comments for a particular movie, adding or removing movies, changing
movie details, and blocking a user account that posts inappropriate reviews.
A search functionality could also be implemented, making it easier to search for
movies in the database.

REFERENCES

[1] Lin, Z., "An empirical investigation of user and system recommendations in
e-commerce," Decision Support Systems, 68, pp. 111-124, 2014.
[2] Lu, J., Wu, D., Mao, M., Wang, W., and Zhang, G., "Recommender system
application developments: a survey," Decision Support Systems, 74, pp. 12-32,
2015.
[3] Edmunds, A., and Morris, A., "The problem of information overload in business
organisations: a review of the literature," International Journal of Information
Management, 20(1), pp. 17-28, 2000.
[4] Berghel, H., "The Future of Digital Money Laundering," Computer, 47(8), pp.
70-75, 2014.
[5] Pajala, T., Korhonen, P., Malo, P., Sinha, A., Wallenius, J., and Dehnokhalaji,
A., "Accounting for political opinions, power, and influence: A Voting Advice
Application," European Journal of Operational Research, 266(2), pp. 702-715,
2018.
[6] Terveen, L., Hill, W., Amento, B., et al., "PHOAKS: a system for sharing
recommendations," Communications of the ACM, 40(3), pp. 59-62, 1997.
[7] Tatemura, J., "Virtual reviewers for collaborative exploration of movie reviews,"
Proceedings of the 5th International Conference on Intelligent User Interfaces,
ACM, pp. 272-275, 2000.
[8] Hu, X., Tang, L., Tang, J., et al., "Exploiting social relations for sentiment
analysis in microblogging," Proceedings of the 6th ACM International
Conference on Web Search and Data Mining, ACM, pp. 537-546, 2013.
[9] Ku, L.-W., Liang, Y.-T., and Chen, H.-H., "Opinion Extraction, Summarization
and Tracking in News and Blog Corpora," AAAI Spring Symposium:
Computational Approaches to Analyzing Weblogs, pp. 100-107, 2006.
[10] Zadeh, L. A., "Fuzzy sets," Information and Control, 8(3), pp. 338-353, 1965.
[11] Turney, P. D., and Littman, M. L., "Unsupervised learning of semantic
orientation from a hundred-billion-word corpus," 2002.
[12] Prabowo, R., and Thelwall, M., "Sentiment analysis: a combined approach,"
Journal of Informetrics, 3(2), pp. 143-157, 2009.

APPENDIX

CODE
MainApp.java
import java.io.IOException;

public class MainApp
{
    public static void main(String[] args) throws IOException
    {
        String text = "Those who find ugly meanings in beautiful things are corrupt without being charming.";
        SentimentAnalyzer sentimentAnalyzer = new SentimentAnalyzer();
        sentimentAnalyzer.initialize();
        SentimentResult sentimentResult = sentimentAnalyzer.getSentimentResult(text);
        System.out.println("Sentiment Score: " + sentimentResult.getSentimentScore());
        System.out.println("Sentiment Type: " + sentimentResult.getSentimentType());
        System.out.println("Very positive: " + sentimentResult.getSentimentClass().getVeryPositive() + "%");
        System.out.println("Positive: " + sentimentResult.getSentimentClass().getPositive() + "%");
        System.out.println("Neutral: " + sentimentResult.getSentimentClass().getNeutral() + "%");
        System.out.println("Negative: " + sentimentResult.getSentimentClass().getNegative() + "%");
        System.out.println("Very negative: " + sentimentResult.getSentimentClass().getVeryNegative() + "%");
    }
}

SentimentAnalyzer.java
import java.util.Properties;

import org.ejml.simple.SimpleMatrix;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

public class SentimentAnalyzer
{
    /*
     * "Very negative" = 0, "Negative" = 1, "Neutral" = 2,
     * "Positive" = 3, "Very positive" = 4
     */
    static Properties props;
    static StanfordCoreNLP pipeline;

    public void initialize()
    {
        // creates a StanfordCoreNLP object with tokenization, sentence
        // splitting, parsing, and sentiment annotators
        props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    public SentimentResult getSentimentResult(String text)
    {
        SentimentResult sentimentResult = new SentimentResult();
        SentimentClassification sentimentClass = new SentimentClassification();

        if (text != null && text.length() > 0)
        {
            // run all annotators on the text
            Annotation annotation = pipeline.process(text);
            for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class))
            {
                // the sentiment-annotated parse tree of the current sentence
                Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
                SimpleMatrix sm = RNNCoreAnnotations.getPredictions(tree);
                String sentimentType = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
                sentimentClass.setVeryPositive((double) Math.round(sm.get(4) * 100d));
                sentimentClass.setPositive((double) Math.round(sm.get(3) * 100d));
                sentimentClass.setNeutral((double) Math.round(sm.get(2) * 100d));
                sentimentClass.setNegative((double) Math.round(sm.get(1) * 100d));
                sentimentClass.setVeryNegative((double) Math.round(sm.get(0) * 100d));
                sentimentResult.setSentimentScore(RNNCoreAnnotations.getPredictedClass(tree));
                sentimentResult.setSentimentType(sentimentType);
                sentimentResult.setSentimentClass(sentimentClass);
            }
        }
        return sentimentResult;
    }
}

SentimentClassification.java
public class SentimentClassification
{
    double veryPositive;
    double positive;
    double neutral;
    double negative;
    double veryNegative;

    public double getVeryPositive() { return veryPositive; }
    public void setVeryPositive(double veryPositive) { this.veryPositive = veryPositive; }

    public double getPositive() { return positive; }
    public void setPositive(double positive) { this.positive = positive; }

    public double getNeutral() { return neutral; }
    public void setNeutral(double neutral) { this.neutral = neutral; }

    public double getNegative() { return negative; }
    public void setNegative(double negative) { this.negative = negative; }

    public double getVeryNegative() { return veryNegative; }
    public void setVeryNegative(double veryNegative) { this.veryNegative = veryNegative; }
}
SentimentResult.java
public class SentimentResult
{
    double sentimentScore;
    String sentimentType;
    SentimentClassification sentimentClass;

    public double getSentiment() { return sentimentScore; }

    public double getSentimentScore() { return sentimentScore; }
    public void setSentimentScore(double sentimentScore) { this.sentimentScore = sentimentScore; }

    public String getSentimentType() { return sentimentType; }
    public void setSentimentType(String sentimentType) { this.sentimentType = sentimentType; }

    public SentimentClassification getSentimentClass() { return sentimentClass; }
    public void setSentimentClass(SentimentClassification sentimentClass) { this.sentimentClass = sentimentClass; }
}

PAPER PUBLICATION STATUS

Publication process not yet started.

PLAGIARISM REPORT

