Mini Project 3rd Year Report Group no-M22IT309
ON
Submitted by-
Department of Information Technology
Galgotias College of Engineering and Technology
Greater Noida
INDEX
Chapter No. Title
Abstract
1. Introduction
1.1 Introduction
2. Literature Review
3. Methodology
4. Concept/Theory
5. Details of Implementation/Construction
6. Application/Uses
7. Conclusion
8. References
ACKNOWLEDGEMENT
I want to give special thanks to our Mini Project coordinator, Dr. Javed Miya, for the timely
advice and valuable guidance during the design and implementation of this project work.
I also want to express my sincere thanks and gratitude to Dr. Sanjeev Kumar Singh, Head
of Department (HOD), Information Technology Department, for providing me with the
facilities and for all the encouragement and support.
Finally, I express my sincere thanks to all staff members of the Department of Information
Technology for their support and cooperation.
Aatika (2000970130001)
ABSTRACT
In this hustling world, entertainment is a necessity for each one of us to refresh our mood and
energy. Entertainment restores our confidence for work, so that we can work more
enthusiastically. To revitalize ourselves, we can listen to our preferred music or watch
movies of our choice. To find movies we like online, we can use movie recommendation
systems. They save the time that searching manually for preferred movies would otherwise
take, and that is time one cannot afford to waste.
In this paper, we present a hybrid approach to improve the quality of a movie recommendation
system. It combines content-based filtering and collaborative filtering, using a Support Vector
Machine as a classifier together with a genetic algorithm. Comparative results on three
different datasets show that the proposed approach improves the accuracy, quality and
scalability of the movie recommendation system over the pure approaches. The hybrid approach
gains the advantages of both approaches and also tries to eliminate the drawbacks of both
methods.
1.1 INTRODUCTION
Social media is now an inseparable part of our lives. Its constant growth has
produced superfluous, heterogeneous data that can be overwhelming because of
its volume and velocity. This limits the availability of relevant information
when a particular query has to be served. Hence, there is a growing need for a
personalized, fine-grained, user-preference-oriented framework that resolves
this problem and enhances the user experience.
Collaboration has been a big buzzword for the past several years, as
organizations realize that effective team collaboration is key to innovation.
New project management methods have emerged to extend the meaning of
team collaboration from the simple act of working together to a more complex
function of inter-relating diverse project teams to achieve new ideas, innovative
practices and to yield superior results. These methods include practices and
project collaboration tools that promote communication, idea sharing and
transparency for local and remote teams.
Our paper analyzes the use of machine learning algorithms in recommender systems
and identifies new research opportunities. The goals of this study are to:
Recommender systems encompass a class of techniques and algorithms that can suggest
“relevant” items to users. They predict future behavior based on past data through a
multitude of techniques including matrix factorization.
We now live in what some call the “era of abundance”. For any given product, there are
sometimes thousands of options to choose from. Think of streaming videos, social networking
and online shopping, and the list goes on. Recommender systems help to personalize a platform
and help the user find something they like.
The simplest way to do this is to recommend the most popular items. However, to really
enhance the user experience through personalized recommendations, we need dedicated
recommender systems.
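The popularity baseline mentioned above can be sketched in a few lines. This is a minimal illustration under an assumed input format: interactions are (user, item) pairs, and the movie titles are hypothetical. Every user receives the same list, and that is exactly why a dedicated, personalized recommender is needed.

```python
from collections import Counter

def most_popular(interactions, n=3):
    """Return the n items that occur most often in (user, item) interaction pairs."""
    counts = Counter(item for _user, item in interactions)
    # most_common sorts by count; equal counts keep first-seen order
    return [item for item, _count in counts.most_common(n)]

# hypothetical interaction log
interactions = [
    ("u1", "Inception"), ("u2", "Inception"), ("u3", "Inception"),
    ("u1", "Up"), ("u2", "Up"),
    ("u3", "Heat"),
]
print(most_popular(interactions, n=2))  # ['Inception', 'Up']
```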
1.3 OUTLINE OF PROJECT
Movies are recommended based on the content of the movie that the user enters or selects.
The main parameters considered for the recommendations are the genre, the director and the
top three cast members. Movie details such as title, genre, runtime, rating, poster and cast
are fetched from TMDB. User reviews of each movie are web-scraped from the IMDB website with
the help of beautifulsoup4. The reviews are then passed through sentiment analysis, in which
the model predicts whether each review is positive or negative.
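The create_similarity() function in the implementation chapter reads a "comb" column from main_data.csv. That column appears to hold the content features described here. The exact format of "comb" is not shown in this report, so the helper below is a hypothetical sketch. It joins the genres, the director and the top three cast members into one lowercase string. Multi-word names are collapsed into single tokens so that, for example, "Christopher Nolan" does not partly match every other "Christopher".

```python
def build_comb(genres, director, cast, top_k=3):
    """Hypothetical helper: join genres, director and top-k cast into one feature string."""
    def token(name):
        # "Christopher Nolan" -> "christophernolan", so one person becomes one word
        return name.replace(" ", "").lower()

    parts = [token(g) for g in genres]
    parts.append(token(director))
    parts.extend(token(actor) for actor in cast[:top_k])  # only the top-k billed actors
    return " ".join(parts)

print(build_comb(["Action", "Drama"], "Christopher Nolan",
                 ["Christian Bale", "Tom Hardy", "Anne Hathaway", "Gary Oldman"]))
# action drama christophernolan christianbale tomhardy annehathaway
```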
Some serious drawbacks that limit the effectiveness of recommender systems (RSs) are
discussed below:
Over-specialization
The accuracy of the suggestions made by a content-based recommender system (CBRS)
depends on the amount of user input. If the RS does not have enough knowledge about
the user, its recommendations will be poor. Likewise, no CBRS can produce relevant
recommendations if the content being analyzed does not contain enough information to
distinguish between what the user likes and what the user dislikes. This situation is
known as the restricted content analysis problem.
Cold start
When a new item or user is added to an RS, the system has no previous records
(ratings, preferences, search history, etc.) on which to base a suggestion. This is
referred to as the "cold start" issue, also known as the "new user" or "new item"
problem. Falling back on demographic information alone is insufficient and can be
inaccurate, since consumers with similar demographic characteristics may have
different levels of interest in a single item.
Sparsity
In practice, RSs deal with very large datasets. As a result, the user-item matrix
used for collaborative filtering (CF) is extremely sparse, and this degrades the
quality of the CF system's predictions or suggestions. Sparsity also arises when a
person uses a product but does not bother to rate it afterward. In addition, users
often do not rate items that they are unfamiliar with.
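Sparsity is simply the fraction of empty cells in the user-item matrix. In the small sketch below, the numbers are hypothetical and chosen only for the arithmetic. With 1,000 users and 500 movies the matrix has 500,000 cells, so even 5,000 ratings fill only 1% of it.

```python
def sparsity(n_ratings, n_users, n_items):
    """Fraction of the n_users x n_items user-item matrix that has no rating."""
    return 1.0 - n_ratings / (n_users * n_items)

# hypothetical sizes: 5,000 ratings spread over 1,000 users x 500 movies
print(sparsity(5000, 1000, 500))  # about 0.99, i.e. 99% of the cells are empty
```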
2. LITERATURE REVIEW
Luis M. de Campos et al. analyzed two traditional recommender systems, i.e. content-based filtering
and collaborative filtering. As both have their own drawbacks, they proposed a new system that
combines a Bayesian network with collaborative filtering. A hybrid system has been presented by
Harpreet Kaur et al. This system uses a mix of content-based and collaborative filtering
algorithms, and it also considers the context of the movies when recommending.
Urszula Kużelewska et al. proposed clustering as a way to build recommender systems. Two
methods of computing cluster representatives were presented and evaluated. A centroid-based
solution and memory-based collaborative filtering methods served as the baselines for comparing
the effectiveness of the two proposed methods. The result was a significant increase in the
accuracy of the generated recommendations compared with the centroid-based method alone.
Costin-Gabriel Chiru et al. proposed Movie Recommender, a system that uses the information
known about the user to provide movie recommendations. This system attempts to solve the
problem of one-size-fits-all recommendations, which results from ignoring user-specific data.
It collects the psychological profile of the user, their watching history and movie scores from
other websites, and bases its recommendations on an aggregate similarity calculation. The
system is a hybrid model that uses both content-based filtering and collaborative filtering.
3. METHODOLOGY
The hybrid approach proposes an integrative method that merges fuzzy k-means clustering
with a genetic-algorithm-based weighted similarity measure to construct a movie
recommendation system. This system gives finer similarity metrics and better quality than
the existing movie recommendation system, but it takes more computation time than the
existing system. This problem can be fixed by using the clustered data points as the
input dataset.
The proposed approach aims to improve the scalability and quality of the movie
recommendation system. We use a hybrid approach that unifies content-based filtering and
collaborative filtering, so that each approach can benefit from the other. To compute the
similarity between the movies in the dataset efficiently, and to reduce the computation
time of the movie recommender engine, we use the cosine similarity measure.
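Cosine similarity scores two movies by the cosine of the angle between their word-count vectors. The value is 1 for identical proportions and 0 when the movies share no words. The project computes this with scikit-learn's CountVectorizer and cosine_similarity. The standard-library sketch below shows the same idea for a single pair of feature strings, although its whitespace tokenization is simpler than CountVectorizer's default. The name cosine_sim is ours and was chosen to avoid clashing with scikit-learn's function.

```python
import math
from collections import Counter

def cosine_sim(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two strings."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())  # only shared words contribute
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)

# two shared words out of three each: 2 / (sqrt(3) * sqrt(3)) = 2/3
print(round(cosine_sim("action drama nolan", "action thriller nolan"), 3))  # 0.667
```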
Agile Methodology:
Data Analysis:
Make sure that the collected datasets are correct by analysing the data in the CSV
files, i.e. checking whether all the column fields are present in the datasets.
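This check can be automated with Python's csv module. The sketch below assumes that the required columns are movie_title and comb, because those are the columns the project's app code reads from main_data.csv. It reports missing header columns as well as blank cells in those columns.

```python
import csv
import io

# Assumption: the columns read by the project's app code.
REQUIRED_COLUMNS = {"movie_title", "comb"}

def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    header = csv.DictReader(csv_file).fieldnames or []
    return set(required) - set(header)

def blank_cells(csv_file, columns):
    """Return (row_number, column) for every empty cell in the given columns."""
    reader = csv.DictReader(csv_file)
    problems = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for col in columns:
            if not (row.get(col) or "").strip():
                problems.append((row_number, col))
    return problems

sample = "movie_title,comb\navatar,action sciencefiction jamescameron\nup,\n"
print(missing_columns(io.StringIO(sample)))                       # set()
print(blank_cells(io.StringIO(sample), ["movie_title", "comb"]))  # [(3, 'comb')]
```

On the real file, open it with open("main_data.csv", newline="") and pass the file object to either function.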
Algorithms:
Our project uses only two algorithms to build the machine learning recommendation model:
cosine similarity and singular value decomposition (SVD).
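In recommender systems, "SVD" usually refers to a matrix factorization that learns a latent-factor vector for each user and each movie, so that their dot product approximates the rating. The report does not show which variant was used. The sketch below therefore assumes the common stochastic-gradient-descent formulation with a global mean and L2 regularization. This is not an exact linear-algebra SVD. The ratings and hyperparameters are hypothetical.

```python
import random

def factorize(ratings, n_factors=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Learn latent factors so that mu + dot(P[u], Q[i]) approximates each observed rating."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    users = sorted({u for u, _, _ in ratings})  # sorted so the seed gives reproducible results
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + sum(p * q for p, q in zip(P[u], Q[i])))
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                # one SGD step on squared error with L2 regularization
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return P, Q, mu

def predict(P, Q, mu, user, item):
    """Predicted rating; falls back to the global mean for unseen users or items."""
    if user not in P or item not in Q:
        return mu
    return mu + sum(p * q for p, q in zip(P[user], Q[item]))

ratings = [  # hypothetical (user, movie, rating) triples on a 1-5 scale
    ("alice", "Inception", 5), ("alice", "Interstellar", 5), ("alice", "Frozen", 1),
    ("bob", "Inception", 4), ("bob", "Frozen", 2), ("bob", "Moana", 1),
    ("carol", "Frozen", 5), ("carol", "Moana", 5), ("carol", "Interstellar", 1),
    ("dave", "Moana", 4), ("dave", "Inception", 2),
]
P, Q, mu = factorize(ratings)
print(round(predict(P, Q, mu, "alice", "Moana"), 2))  # a pair alice never rated
```

The fallback to the global mean in predict() is one simple way to handle the cold-start case described earlier.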
DEMOGRAPHIC FILTERING-
● Sort the scores and recommend the best rated movie to the users.
We could use the average rating of each movie as the score, but this would not be fair,
since a movie with an 8.9 average rating from only 3 votes cannot be considered better
than a movie with a 7.8 average rating from 40 votes. So, we'll use IMDB's weighted
rating (WR), which is given as:
Fig 3.1 Demographic Filtering
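IMDB's weighted rating is commonly written as WR = (v / (v + m)) · R + (m / (v + m)) · C. Here R is the movie's average rating, v is its number of votes, m is the minimum number of votes required to be listed, and C is the mean rating across all movies. The formula pulls movies with few votes toward C. The sketch below applies it to the two movies from the example. The report does not state which values of C and m the project used, so C = 6.0 and m = 10 are hypothetical. In practice, m is often set to a high percentile of the vote counts.

```python
def weighted_rating(R, v, m, C):
    """IMDB-style weighted rating: shrink a movie's average R toward the global mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

C, m = 6.0, 10  # hypothetical: mean vote across all movies, minimum votes required

# (average rating, number of votes) for the two movies in the example
movies = {"Movie A": (8.9, 3), "Movie B": (7.8, 40)}
scores = {title: weighted_rating(R, v, m, C) for title, (R, v) in movies.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # sort the scores, best first

print({t: round(s, 2) for t, s in scores.items()})  # {'Movie A': 6.67, 'Movie B': 7.44}
print(ranking)  # ['Movie B', 'Movie A']
```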
Content-based filtering recommends items by comparing item profiles with user
profiles. A user profile is content considered relevant to the user, represented in
the form of keywords (or features). It can be thought of as a collection of assigned
keywords (terms, attributes) gathered by an algorithm from objects the user finds
relevant (or interesting).
The item profile is a list of keywords (or features) for an item. Consider the
following scenario: a person goes to a patisserie to purchase his favorite cake, 'X'.
Unfortunately, cake 'X' has been sold out, so the merchant suggests that the person
buy cake 'Y', which is made of ingredients comparable to those of 'X'. This is an
instance of content-based filtering.
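The cake scenario maps directly onto user and item profiles. In the sketch below, each cake's item profile is a set of ingredient keywords, and these keywords are hypothetical. The user profile counts the keywords of the cakes the user liked. Each candidate is then scored by the total profile weight of its keywords. Scoring by keyword overlap is one simple choice, and weighted schemes such as TF-IDF are common alternatives.

```python
from collections import Counter

def build_user_profile(liked_items, item_profiles):
    """Sum the keywords of every item the user liked into one weighted profile."""
    profile = Counter()
    for item in liked_items:
        profile.update(item_profiles[item])  # each keyword of the item adds 1
    return profile

def score(profile, keywords):
    """Score an item by the total profile weight of its keywords."""
    return sum(profile[k] for k in keywords)

item_profiles = {  # hypothetical item profiles (ingredients as keywords)
    "X": {"chocolate", "cream", "hazelnut"},
    "Y": {"chocolate", "cream", "almond"},
    "Z": {"lemon", "meringue"},
}
profile = build_user_profile(["X"], item_profiles)  # the customer's favorite cake
candidates = [i for i in item_profiles if i != "X"]  # 'X' is sold out
best = max(candidates, key=lambda i: score(profile, item_profiles[i]))
print(best)  # Y
```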
We are now in a good position to define our recommendation function. These are
the following steps we'll follow:-
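A minimal sketch of such a recommendation function, assuming a precomputed similarity matrix whose row i holds the similarities of movie i to every movie. It follows the ranking performed by rcmd() in the implementation chapter. The function finds the movie's index, sorts all movies by similarity, and returns the top k titles without the movie itself. The titles and matrix values are hypothetical. One design difference: instead of dropping the first sorted entry, this sketch drops the movie's own index, which still works when another movie ties with it at a similarity of 1.0.

```python
def recommend(title, titles, similarity, k=10):
    """Return the k titles most similar to `title`, excluding the title itself."""
    if title not in titles:
        return []
    i = titles.index(title)
    # sort (index, score) pairs by score, highest first (sorted() is stable on ties)
    scores = sorted(enumerate(similarity[i]), key=lambda pair: pair[1], reverse=True)
    return [titles[j] for j, _ in scores if j != i][:k]

titles = ["avatar", "the dark knight", "the dark knight rises", "up"]
similarity = [  # hypothetical symmetric similarity matrix
    [1.0, 0.2, 0.3, 0.1],
    [0.2, 1.0, 0.9, 0.0],
    [0.3, 0.9, 1.0, 0.0],
    [0.1, 0.0, 0.0, 1.0],
]
print(recommend("the dark knight rises", titles, similarity, k=2))
# ['the dark knight', 'avatar']
```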
While our system has done a decent job of finding movies with similar plot
descriptions, the quality of recommendations is not that great. "The Dark Knight
Rises" returns all Batman movies, whereas people who liked that movie are more
likely to enjoy other Christopher Nolan movies. The present system cannot capture
this.
5. DETAILS OF IMPLEMENTATION/CONSTRUCTION
Technologies Used
Python
HTML
JavaScript
CSS
Below is the implementation of the different web pages that make up our web app.
CODE:
import numpy as np
import pandas as pd
from flask import Flask, render_template, request
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import json
import bs4 as bs
import urllib.request
import pickle
import requests

# load the nlp model and tfidf vectorizer from disk
filename = 'nlp_model.pkl'
clf = pickle.load(open(filename, 'rb'))
vectorizer = pickle.load(open('tranform.pkl', 'rb'))

def create_similarity():
    data = pd.read_csv('main_data.csv')
    # creating a count matrix
    cv = CountVectorizer()
    count_matrix = cv.fit_transform(data['comb'])
    # creating a similarity score matrix
    similarity = cosine_similarity(count_matrix)
    return data, similarity

# the dataset and similarity matrix are built on the first request and then reused
data, similarity = None, None

def rcmd(m):
    global data, similarity
    m = m.lower()
    if data is None or similarity is None:
        data, similarity = create_similarity()
    if m not in data['movie_title'].unique():
        return('Sorry! The movie you requested is not in our database. Please check the spelling or try with some other movies')
    else:
        i = data.loc[data['movie_title'] == m].index[0]
        lst = list(enumerate(similarity[i]))
        lst = sorted(lst, key=lambda x: x[1], reverse=True)
        lst = lst[1:11]  # excluding first item since it is the requested movie itself
        l = []
        for j in range(len(lst)):
            a = lst[j][0]
            l.append(data['movie_title'][a])
        return l

def get_suggestions():
    data = pd.read_csv('main_data.csv')
    return list(data['movie_title'].str.capitalize())

app = Flask(__name__)

@app.route("/")
@app.route("/home")
def home():
    suggestions = get_suggestions()
    return render_template('home.html', suggestions=suggestions)
    # call the convert_to_list function for every string that needs to be converted to list
    rec_movies = convert_to_list(rec_movies)
    rec_posters = convert_to_list(rec_posters)
    cast_names = convert_to_list(cast_names)
    cast_chars = convert_to_list(cast_chars)
    cast_profiles = convert_to_list(cast_profiles)
    cast_bdays = convert_to_list(cast_bdays)
    cast_bios = convert_to_list(cast_bios)
    cast_places = convert_to_list(cast_places)

    # combining multiple lists as a dictionary which can be passed to the html file so that it can be processed
    # easily and the order of information will be preserved
    movie_cards = {rec_posters[i]: rec_movies[i] for i in range(len(rec_posters))}

        vote_count=vote_count, release_date=release_date, runtime=runtime, status=status, genres=genres,
        movie_cards=movie_cards, reviews=movie_reviews, casts=casts, cast_details=cast_details)
Recommend.js
$(function() {
  // Button will be disabled until we type anything inside the input field
  const source = document.getElementById('autoComplete');
  const inputHandler = function(e) {
    if(e.target.value==""){
      $('.movie-button').attr('disabled', true);
    }
    else{
      $('.movie-button').attr('disabled', false);
    }
  }
  source.addEventListener('input', inputHandler);

  $('.movie-button').on('click',function(){
    var my_api_key = '41e6207bf2b198de81c99d4c73d9063b';
    var title = $('.movie').val();
    if (title=="") {
      $('.results').css('display','none');
      $('.fail').css('display','block');
    }
    else{
      load_details(my_api_key,title);
    }
  });
});
// get the basic details of the movie from the API (based on the name of the movie)
function load_details(my_api_key,title){
  $.ajax({
    type: 'GET',
    url:'https://api.themoviedb.org/3/search/movie?api_key='+my_api_key+'&query='+title,
    success: function(movie){
      if(movie.results.length<1){
        $('.fail').css('display','block');
        $('.results').css('display','none');
        $("#loader").delay(500).fadeOut();
      }
      else{
        $("#loader").fadeIn();
        $('.fail').css('display','none');
        $('.results').delay(1000).css('display','block');
        var movie_id = movie.results[0].id;
        var movie_title = movie.results[0].original_title;
        movie_recs(movie_title,movie_id,my_api_key);
      }
    },
    error: function(){
      alert('Invalid Request');
      $("#loader").delay(500).fadeOut();
    },
  });
}
// passing the movie name to get the similar movies from python's flask function
function movie_recs(movie_title,movie_id,my_api_key){
  $.ajax({
    type:'POST',
    url:"/similarity",
    data:{'name':movie_title},
    success: function(recs){
      if(recs=="Sorry! The movie you requested is not in our database. Please check the spelling or try with some other movies"){
        $('.fail').css('display','block');
        $('.results').css('display','none');
        $("#loader").delay(500).fadeOut();
      }
      else {
        $('.fail').css('display','none');
        $('.results').css('display','block');
        var movie_arr = recs.split('---');
        var arr = [];
        for(const movie in movie_arr){
          arr.push(movie_arr[movie]);
        }
        get_movie_details(movie_id,my_api_key,arr,movie_title);
      }
    },
    error: function(){
      alert("error recs");
      $("#loader").delay(500).fadeOut();
    },
  });
}
// get all the details of the movie using the movie id.
function get_movie_details(movie_id,my_api_key,arr,movie_title) {
  $.ajax({
    type:'GET',
    url:'https://api.themoviedb.org/3/movie/'+movie_id+'?api_key='+my_api_key,
    success: function(movie_details){
      show_details(movie_details,arr,movie_title,my_api_key,movie_id);
    },
    error: function(){
      alert("API Error!");
      $("#loader").delay(500).fadeOut();
    },
  });
}
// passing all the details to python's flask for displaying and scraping the movie reviews using imdb id
function show_details(movie_details,arr,movie_title,my_api_key,movie_id){
  var imdb_id = movie_details.imdb_id;
  var poster = 'https://image.tmdb.org/t/p/original'+movie_details.poster_path;
  var overview = movie_details.overview;
  var genres = movie_details.genres;
  var rating = movie_details.vote_average;
  var vote_count = movie_details.vote_count;
  var release_date = new Date(movie_details.release_date);
  var runtime = parseInt(movie_details.runtime);
  var status = movie_details.status;
  var genre_list = [];
  for (var genre in genres){
    genre_list.push(genres[genre].name);
  }
  var my_genre = genre_list.join(", ");
  if(runtime%60==0){
    runtime = Math.floor(runtime/60)+" hour(s)";
  }
  else {
    runtime = Math.floor(runtime/60)+" hour(s) "+(runtime%60)+" min(s)";
  }
  arr_poster = get_movie_posters(arr,my_api_key);
  movie_cast = get_movie_cast(movie_id,my_api_key);
  ind_cast = get_individual_cast(movie_cast,my_api_key);
  details = {
    'title':movie_title,
    'cast_ids':JSON.stringify(movie_cast.cast_ids),
    'cast_names':JSON.stringify(movie_cast.cast_names),
    'cast_chars':JSON.stringify(movie_cast.cast_chars),
    'cast_profiles':JSON.stringify(movie_cast.cast_profiles),
    'cast_bdays':JSON.stringify(ind_cast.cast_bdays),
    'cast_bios':JSON.stringify(ind_cast.cast_bios),
    'cast_places':JSON.stringify(ind_cast.cast_places),
    'imdb_id':imdb_id,
    'poster':poster,
    'genres':my_genre,
    'overview':overview,
    'rating':rating,
    'vote_count':vote_count.toLocaleString(),
    'release_date':release_date.toDateString().split(' ').slice(1).join(' '),
  };
  $.ajax({
    type:'POST',
    data:details,
    url:"/recommend",
    dataType: 'html',
    complete: function(){
      $("#loader").delay(500).fadeOut();
    },
    success: function(response) {
      $('.results').html(response);
      $('#autoComplete').val('');
      $(window).scrollTop(0);
    }
  });
}
      url:'https://api.themoviedb.org/3/person/'+movie_cast.cast_ids[cast_id]+'?api_key='+my_api_key,
      async:false,
      success: function(cast_details){
        cast_bdays.push((new Date(cast_details.birthday)).toDateString().split(' ').slice(1).join(' '));
        cast_bios.push(cast_details.biography);
        cast_places.push(cast_details.place_of_birth);
      }
    });
  }
  return {cast_bdays:cast_bdays,cast_bios:cast_bios,cast_places:cast_places};
}
// getting the details of the cast for the requested movie
function get_movie_cast(movie_id,my_api_key){
  cast_ids= []; cast_names = []; cast_chars = []; cast_profiles = [];
  top_10 = [0,1,2,3,4,5,6,7,8,9];
  $.ajax({
    type:'GET',
    url:"https://api.themoviedb.org/3/movie/"+movie_id+"/credits?api_key="+my_api_key,
    async:false,
    success: function(my_movie){
      if(my_movie.cast.length>=10){
        top_cast = [0,1,2,3,4,5,6,7,8,9];
      }
      else {
        top_cast = [0,1,2,3,4];
      }
      for(var my_cast in top_cast){
        cast_ids.push(my_movie.cast[my_cast].id);
        cast_names.push(my_movie.cast[my_cast].name);
        cast_chars.push(my_movie.cast[my_cast].character);
        cast_profiles.push("https://image.tmdb.org/t/p/original"+my_movie.cast[my_cast].profile_path);
      }
    },
    error: function(){
      alert("Invalid Request!");
      $("#loader").delay(500).fadeOut();
    }
  });
  return {cast_ids:cast_ids,cast_names:cast_names,cast_chars:cast_chars,cast_profiles:cast_profiles};
}
      url:'https://api.themoviedb.org/3/search/movie?api_key='+my_api_key+'&query='+arr[m],
      async: false,
      success: function(m_data){
        arr_poster_list.push('https://image.tmdb.org/t/p/original'+m_data.results[0].poster_path);
      },
      error: function(){
        alert("Invalid Request!");
        $("#loader").delay(500).fadeOut();
      },
    })
  }
  return arr_poster_list;
}
Autocomplete.js
new autoComplete({
  data: {                               // Data src [Array, Function, Async] | (REQUIRED)
    src: films,
  },
  selector: "#autoComplete",            // Input field selector | (Optional)
  threshold: 2,                         // Min. Chars length to start Engine | (Optional)
  debounce: 100,                        // Post duration for engine to start | (Optional)
  searchEngine: "strict",               // Search Engine type/mode | (Optional)
  resultsList: {                        // Rendered results list object | (Optional)
    render: true,
    container: source => {
      source.setAttribute("id", "food_list");
    },
    destination: document.querySelector("#autoComplete"),
    position: "afterend",
    element: "ul"
  },
  maxResults: 5,                        // Max. number of rendered results | (Optional)
[Figure: screenshot of the web app showing recommended movie posters, including Cinderella, Snow White and The Little Mermaid]
6. APPLICATION/USES
Almost any business can benefit from a recommendation system. There are two important
aspects that determine how much a business benefits from a recommendation system:
With this framework, we can identify industries that stand to gain from recommendation
systems:
E-Commerce
E-commerce is the industry where recommendation systems were first widely used. With millions
of customers and data on their online behavior, e-commerce companies are best suited to
generate accurate recommendations.
Retail
Target unsettled shoppers back in the 2000s when its systems were able to predict pregnancies
from shopping data, in some cases before the customers' own families knew. Shopping data is the
most valuable data, as it is the most direct signal of a customer’s intent. Retailers with troves
of shopping data are at the forefront of companies making accurate recommendations.
Media
Like e-commerce companies, media businesses were among the first to adopt recommendations.
It is difficult to find a news site without a recommendation system.
Banking
Banking is a mass-market product that is consumed digitally by millions. Banking for the masses
and for SMEs is ripe for recommendations. Knowing a customer’s detailed financial situation and
past preferences, coupled with data on thousands of similar users, is quite powerful.
Telecom
Telecom shares similar dynamics with banking. Telcos have access to millions of customers whose
every interaction is recorded. Their product range is also rather limited compared with other
industries, which makes recommendations in telecom an easier problem.
Utilities
Utilities share similar dynamics with telecom, but they have an even narrower range of products,
which makes recommendations rather simple.
Increased sales/conversion
There are very few ways to increase sales without increased marketing effort. Once you set up
an automated recommendation system, you get recurring additional sales with little extra
effort.
The shortest path to a sale is great since it reduces the effort for both you and your customer.
Recommendation systems allow you to reduce your customers’ path to a sale by
recommending them an appropriate option sometimes even before they search for it.
By getting customers to spend more on your website, you can increase their familiarity with
your brand and user interface, which increases the likelihood that they will make future
purchases from you.
Reduced churn
Recommendation system-powered emails are one of the best ways to re-engage customers.
Discounts or coupons are other effective yet costly ways of re-engaging customers and they
can be coupled with recommendations to increase customers’ probability of conversion.
7. CONCLUSION