
Cinematics Recommendation System

Name of Guide: Prof. Dhanraj Jadhav

Sr. No | Name | Roll No.
1 | Aditya Utpat | 42132
2 | Kunal Nilakhe | 42133
3 | Sushant Dhore | 42134
4 | Shubham Sonawane | 42135
ABSTRACT:
Over the past years, the internet has made it far easier for many domains to interact and share
meaningful information. This expansion has its drawbacks as well: it brings information overload and makes
it difficult to extract relevant data. Recommendation systems play a vital role in overcoming this problem.
They enhance the user experience by giving fast and coherent suggestions. A recommendation engine filters
the data using different algorithms and recommends the most relevant items to users. It first captures the
past behaviour of a user and, based on that, recommends movies which the user is likely to watch. If a
completely new user visits the system, the system will not have any past history of that user. In such a
scenario, one possible solution is to recommend the most watched movies. Three filtering approaches are
commonly used. The first is demographic filtering, which offers generalized recommendations to every user
based on movie popularity and/or genre. The system recommends the same movies to users with similar
demographic features. The basic idea behind this approach is that movies that are more popular and
critically acclaimed will have a higher probability of being liked by the average audience. Since each user
is different, this approach is considered to be too simple. The second is content-based filtering, where we
try to profile the user’s interests using the information collected and recommend items based on that
profile. The third is collaborative filtering, where we try to group similar users together and use
information about the group to make recommendations to the user.
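The fallback for a completely new user described in the abstract, recommending the most watched movies, can be sketched as follows. This is only a minimal illustration: the function name `most_watched` and the sample viewing log are hypothetical and are not taken from the project's code or dataset.

```python
from collections import Counter


def most_watched(watch_log, top_n=3):
    """Return the top_n titles that appear most often in the viewing log."""
    counts = Counter(title for _user, title in watch_log)
    return [title for title, _count in counts.most_common(top_n)]


# Hypothetical (user, title) viewing events, invented for illustration only.
watch_log = [
    ("u1", "Inception"), ("u2", "Inception"), ("u3", "Inception"),
    ("u1", "Avatar"), ("u2", "Avatar"),
    ("u3", "Titanic"),
]

# A new user has no history, so everyone gets the same popularity-based list.
print(most_watched(watch_log, top_n=2))
```

Because the list ignores the individual user, it only serves as a starting point until enough history exists for the other filtering approaches.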
Problem Statement:
To effectively recommend movies to users by performing sentiment analysis on the established database
and recommending similar movies.

Introduction:
A recommender system is an algorithm whose aim is to provide the most relevant information to a user
by discovering patterns in a dataset. The algorithm predicts ratings for the items and shows the user the
items that they would rate highly. An example of recommendation in action is when you visit Amazon and
notice that some items are being recommended to you, or when Netflix recommends certain movies to you.
Recommender systems are also used by music streaming applications such as Spotify and Deezer to recommend
music that you might like.

Below is a very simple illustration of how recommender systems work in the context of an e-commerce site.

Suppose two users buy the same items A and B from an e-commerce store. When this happens, the similarity
index of these two users is computed. If one of them also buys item C, then, depending on the score, the
system can recommend item C to the other user because it detects that the two users are similar in terms of
the items they purchase. This model uses cosine similarity to calculate similarity scores between movies.
Movies whose score is close to 1, the maximum value of the cosine, are treated as similar and are
recommended. The datasets contain Hollywood movies released from 1950 to 2020. Whenever a user searches for
a movie, the system displays the movie’s details, such as cast, crew and other trivia, together with similar
movies that the model recommends based on the similarity scores.
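The idea of recommending the movies whose cosine score is closest to 1 can be sketched as below. This is a minimal illustration under stated assumptions: the movie titles, the four binary genre features and the function names are hypothetical, and the project itself may build its feature vectors differently.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_u * norm_v)


def similar_movies(title, features, top_n=2):
    """Rank every other movie by its cosine similarity to the given title."""
    query = features[title]
    scored = [(other, cosine(query, vec))
              for other, vec in features.items() if other != title]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical feature vectors: [action, thriller, romance, fantasy]
features = {
    "Movie A": [1, 1, 0, 0],
    "Movie B": [1, 1, 0, 1],
    "Movie C": [0, 0, 1, 0],
    "Movie D": [1, 0, 0, 1],
}

for other, score in similar_movies("Movie A", features):
    print(other, round(score, 3))
```

With these made-up vectors, the movie that shares both genres with "Movie A" ranks first and the romance-only movie is never returned in the top two.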
Proposed Model:

First, information such as cast, crew, directors and genre is collected from the dataset. The genre data
is stored in a continuous form, i.e. the individual genres are not separated. For ease of reading and
processing it is split, for example into Action | Fiction | Thriller | Fantasy. Second, the release date,
overview and all other available information on the cast are separated out. The release date is converted
into a date-time value. List-valued fields in the dataset are stored as strings, so they need to be
converted into lists before they can be used in the model. This is done in Python with literal_eval(x). The
extracted data is then cleaned using various rules, such as omitting insignificant words like “and”, “the”
and “a”. Next, for the movies released after the year 2018, the data is extracted from Wikipedia. The data
available on Wikipedia is not in the desired format, so it is extracted in the form of tables using
“pd.read_html(link, header=0)”. The extracted tables are then appended to obtain the desired data.
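The cleaning steps above can be sketched with the standard library alone. This is a hedged illustration, not the project's actual preprocessing code: the helper names, the sample strings, the stop-word set and the date format `%Y-%m-%d` are all assumptions made for the example.

```python
import ast
from datetime import datetime

# Assumed stop-word set, limited to the words named in the text.
STOP_WORDS = {"and", "the", "a"}


def split_genres(raw):
    """Split a '|'-separated genre string into a list of genre names."""
    return [genre.strip() for genre in raw.split("|") if genre.strip()]


def parse_list(cell):
    """Convert a stringified Python list into a real list."""
    return ast.literal_eval(cell)


def drop_stop_words(text):
    """Lower-case the text and remove the insignificant words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


genres = split_genres("Action | Fiction | Thriller | Fantasy")
cast = parse_list("['Actor One', 'Actor Two']")          # hypothetical cell value
words = drop_stop_words("The hero and a villain")        # hypothetical overview text
released = datetime.strptime("2010-07-16", "%Y-%m-%d")   # assumed date format

print(genres, cast, words, released.year)
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for cells read from a file.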

The sentiments are analysed with the help of the “TF-IDF Vectorizer”. TF-IDF stands for term
frequency-inverse document frequency. The reviews.txt file from the dataset contains all the reviews given
by users. These reviews are converted into vectors using the TF-IDF Vectorizer. These vector scores are
needed to calculate the cosine similarity scores. The model then passes these vectors to the Multinomial
Naïve Bayes classifier, which classifies each item into one of the classes assigned to it. For example,
movies under the categories Action or Thriller get classified into those categories.
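To make the conversion of reviews into TF-IDF vectors concrete, here is a from-scratch sketch of the textbook formula tf × ln(N / df). It is an assumption-laden illustration: the sample reviews are invented, the function name is hypothetical, and a library vectorizer such as scikit-learn's TfidfVectorizer applies additional smoothing and normalization by default, so its numbers would differ.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using tf * ln(N / df) for each term."""
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append([
            (counts[word] / total) * math.log(n_docs / doc_freq[word])
            if total else 0.0
            for word in vocab
        ])
    return vocab, vectors


# Hypothetical reviews, invented for illustration only.
reviews = ["great movie", "boring movie", "great acting in this movie"]
vocab, vectors = tfidf_vectors(reviews)

for review, vector in zip(reviews, vectors):
    print(review, [round(x, 3) for x in vector])
```

A word that occurs in every review, such as "movie" here, gets a weight of zero, while rarer words such as "boring" carry more weight.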
Flowchart:
Cosine Similarity:

Cosine similarity is a method to measure the similarity between two non-zero vectors of an inner product
space. See the example below to understand.

Suppose I want to check if Bernard and Clarissa have similar movie preferences, and I only have two movie
reviews. The reviews are scores from 1 to 5, where 5 is the best score and 1 the worst, and 0 means that a
person has not watched the movie.

I can represent each person’s reviews in a separate vector. Vector b represents Bernard and vector c
represents Clarissa.

The cosine similarity measures the similarity between these two vectors, which indicates how similar the
preferences of these two people are.

In the image below, each vector represents a person’s preferences and the two vectors have an angle θ
between them. Similar vectors will have a smaller angle θ, and dissimilar vectors (different film
preferences) will have a larger θ.

In the example above, the similarity of 0.989 is close to the maximum value of 1. This means that, given
only two movie reviews, the two users have similar preferences.
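The calculation behind such a score is the dot product of the two vectors divided by the product of their lengths. The sketch below shows it for two people with two movie scores each. The scores b = [4, 3] and c = [5, 5] are illustrative assumptions, because the figure with the actual values is not reproduced in this text; with these values the result is about 0.99.

```python
import math


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|) for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    length_u = math.sqrt(sum(a * a for a in u))
    length_v = math.sqrt(sum(b * b for b in v))
    return dot / (length_u * length_v)


# Assumed review scores (1 to 5) for two movies; not taken from the source.
b = [4, 3]  # Bernard
c = [5, 5]  # Clarissa

similarity = cosine_similarity(b, c)
theta = math.degrees(math.acos(similarity))

print(f"similarity = {similarity:.4f}, angle = {theta:.1f} degrees")
```

A small angle between the vectors corresponds to a similarity near 1, which matches the interpretation given in the text.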
Multinomial Naïve Bayes:
With an ever-growing amount of textual information stored in electronic form, such as legal documents,
policies and company strategies, automatic text classification is becoming increasingly important. This
requires a supervised learning technique that classifies every new document by assigning one or more class
labels from a fixed, predefined set of classes. Multinomial Naïve Bayes uses the bag-of-words approach,
where the individual words in the document constitute its features and the order of the words is ignored.
Multinomial NB is used in this model as it provides high computational speed and accuracy.
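The bag-of-words classification described above can be illustrated with a small from-scratch Multinomial Naïve Bayes classifier using add-one (Laplace) smoothing. This is a sketch under stated assumptions, not the project's model: the class name `TinyMultinomialNB`, the training reviews and the labels are hypothetical, and a real system would more likely rely on a library implementation such as scikit-learn's MultinomialNB.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            # Bag of words: only word counts matter, not word order.
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, document):
        # Words never seen in training are ignored.
        words = [w for w in document.lower().split() if w in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n_label / n_docs)  # log prior
            for w in words:
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled reviews, invented for illustration only.
train_docs = [
    "great acting and a great story",
    "loved the story",
    "boring plot and weak acting",
    "weak and boring",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict("great story"))
print(model.predict("boring weak plot"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.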
Literature Survey:
Name | Description | Remark
Movie recommender based on plot summary using TF-IDF vectorization | To convert data into vectors for cosine similarity. | Scope for using wider datasets and better accuracy in conversion.
Movie recommender using Cosine Similarity | This paper calculates the vectors using cosine similarity. Step-by-step guide in using dot product. | Can analyse results in a broader spectrum using more movie scores. Uses only dot product; no conversion to vector shown.
Bayesian Multinomial Naïve Bayes classifier to text classification | Detailed knowledge of fully Bayesian and Bayesian Multinomial Naïve Bayes systems. | Simpler examples with real-time implementations needed.
Filters used in recommender systems | Different demographics used in recommender systems shown. | Fewer types of demographics shown; more options can be introduced.
Building a movie recommender using Python | Preparation of a basic recommendation system. | Works only with movies available on Netflix.
Paper on understanding Cosine similarity | Thorough explanation of what cosine similarity is. | Easier-to-understand examples can be given to help improve understanding.
Movie recommendation system using K-Nearest Neighbour | Uses K-Nearest Neighbour algorithm to identify similar movies. | Scope for better accuracy.
Recommender system using K-Means Clustering | K-means clustering used and similar movies are clustered. | Creating cluster qualities and quantities needs more time.
Recommender using collaborative filtering | Use of collaborative filtering shown. | Only collaborative filtering used; more algorithms can be used to improve optimization.
Movie recommendation system using cuckoo search | Uses cuckoo search, which is a meta-heuristic algorithm. | Only optimizes the dataset; a better recommendation algorithm is needed.
System Architecture:
Software Requirements:
Jupyter Notebook
Libraries such as pandas, scikit-learn (sklearn) and matplotlib
Python 3, JavaScript, HTML, CSS

Hardware Requirements:
System type: 64-bit or 32-bit
Intel Core i3/i5/i7 processor

Output:
• Movies are recommended to users based on their taste in movies and the genres they prefer.
• Saves the time required to search for movies of one’s liking.
• Estimates the sentiments of the user about the movie using its sentiment analysis model.
• Information about the movie, its cast and crew with images, genre and much more is available.
• Live implementation of the model is available on Heroku.

Conclusion/Results:
Cosine similarity is an accurate, optimized and efficient measure for finding similar movies. The result is
precise because each similarity score is a decimal value. The model not only recommends similar movies but
also analyses sentiments using the Multinomial Naïve Bayes algorithm, which classifies data into different
classes and selects single keywords based on the input given to it. It is predominantly used in textual data
analysis as it is highly effective and accurate.

References:
1. Rishabh Ahuja, Arun Solanki, Anand Nayyar, Jan 2019. Movie recommendation system using K-means
clustering and K-nearest neighbour.
2. Ashraf M. Kibriya, Eibe Frank, Bernhard Pfahringer, Geoffrey Holmes, Aug 2016. Multinomial
Naïve Bayes for text categorization.
3. https://github.com/Pulkit1080/Movie-Recommendation-System
4. Bernard Kurka, May 2019. Basic movie recommender using Python.
5. https://github.com/kishan0725/AJAX-Movie-Recommendation-System-with-Sentiment-Analysis
6. Tanisha Tripathi, Tushar Narula. Movie recommendation using cosine similarity and KNN
algorithm.
