
Recommendation System
How computers know what we really want

Think about
• LinkedIn
• YouTube
• Amazon
• Netflix
• Twitter: user connections
Why should we use a recommendation system?
• Two-thirds of movies watched by Netflix customers are recommended movies.
• 38% of click-throughs on Google News come from recommended links.
• 35% of sales at Amazon arise from recommended products.
Steve Jobs: “A lot of times, people don’t know what they want until you show it to them.”
Benefits of RS
• Drive traffic: a recommendation engine can bring traffic to your site.
• Provide relevant material.
• Engage customers.
• Transform shoppers into clients.
• Increase average order value.
• Boost the number of items per order.
What is a Recommendation System?
• The purpose of a recommender system is to suggest relevant items to users.
• To achieve this task, there are three major categories of methods: collaborative filtering, content-based filtering, and hybrid filtering.
Technical interpretation of RS
• In technical terms, the recommendation engine problem is to develop a mathematical model or objective function that can predict how much a user will like an item.
• If U = {users} and I = {items}, then F is the objective function that measures the usefulness of item i to user u, given by:
F: U × I → R
where R = {recommended items}.
• For each user u, we want to choose the item i that maximizes the objective function:
i*(u) = argmax_{i ∈ I} F(u, i)
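Choosing the maximizing item can be sketched in a few lines of Python. This is only an illustration, not a real recommender: the dictionary `F` and the helper `best_item` are hypothetical names, and the scores are invented values standing in for a learned objective function.

```python
# Hypothetical usefulness scores F(u, i); the numbers are invented for illustration.
F = {
    ("alice", "movie_a"): 3.5,
    ("alice", "movie_b"): 4.8,
    ("alice", "movie_c"): 2.1,
    ("bob", "movie_a"): 4.0,
    ("bob", "movie_b"): 1.5,
}


def best_item(user, scores):
    """Return the item i that maximizes scores[(user, i)] for the given user."""
    candidates = {item: s for (u, item), s in scores.items() if u == user}
    if not candidates:
        raise KeyError(f"no scores available for user {user!r}")
    return max(candidates, key=candidates.get)


print(best_item("alice", F))
print(best_item("bob", F))
```

In practice F is not a lookup table but a model that predicts scores for items the user has not seen yet.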
Classification of Recommendation System
Recommendation System
• Hybrid Filtering Technique
• Collaborative Filtering Technique
  • Memory-based filtering
    • Item-based algorithm
    • User-based algorithm
  • Model-based filtering
    • Clustering-based algorithm: KNN
    • Matrix factorization-based algorithm: Singular Value Decomposition (SVD), Probabilistic Matrix Factorization, Non-negative Matrix Factorization
    • Deep learning-based algorithm: multi-layered neural networks
• Content-Based Technique
Recommendation System
• Hybrid Filtering Technique
  • Weighted
  • Mixed
  • Switching
  • Feature combination
  • Feature augmentation
  • Cascade
  • Meta-level
• Collaborative Filtering Technique
  • Memory-based filtering
    • Item-based algorithm
    • User-based algorithm
  • Model-based filtering
    • Clustering-based algorithm: KNN
    • Matrix factorization-based algorithm: Singular Value Decomposition (SVD), Probabilistic Matrix Factorization, Non-negative Matrix Factorization
    • Deep learning: Restricted Boltzmann Machine, Deep Neural Network (DNN), Recurrent Neural Network (RNN), Collaborative Filtering Neural Network (CFN), Neural Autoregressive Distribution Estimation (NADE), Deep Convolutional NN (DCNN), Generative Adversarial Network (GAN), Graph Neural Network (GNN), MLP, Auto Encoder (AE)
• Content-Based Technique
• Context based
• Utility based
• Demographic
• Knowledge based
  • Constraint based
  • Case based
• Trust Aware
• Modern
  • Context-aware filtering
  • Semantic based
  • Cross domain
  • Peer-to-peer (PP)
  • Cross lingual

AI based Recommendation System

Recommendation System
• Deep Learning
• Transfer Learning
• Active Learning
• Reinforcement Learning
• Evolutionary Algorithms (EAs)
• Fuzzy Techniques
• Natural Language Processing (NLP)
• Computer Vision
Collaborative Filtering Technique
Content Based filtering
• It is based on a description of the item and a profile of the user’s preferences.
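One common way to match an item description against a user profile is cosine similarity between feature vectors. The sketch below assumes that representation; the feature layout (comedy, action, romance), the item names, and the profile values are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical item descriptions as [comedy, action, romance] weights.
items = {
    "movie_a": [1.0, 0.0, 0.0],
    "movie_b": [0.0, 1.0, 0.0],
    "movie_c": [0.5, 0.5, 0.0],
}

# Hypothetical user profile in the same feature space (mostly likes comedy).
user_profile = [0.9, 0.1, 0.0]


def recommend(profile, catalog, k=2):
    """Return the k items whose descriptions are most similar to the profile."""
    ranked = sorted(catalog, key=lambda name: cosine(profile, catalog[name]), reverse=True)
    return ranked[:k]


print(recommend(user_profile, items))
```

Note that no other users' ratings are involved here, which is the main difference from collaborative filtering.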
Hybrid Recommendation System
Netflix Challenge
1. Problem Statement
Netflix provided a large set of anonymous rating data and set a prediction accuracy bar 10% better than what Cinematch achieves on the same training data set.
(Accuracy is a measurement of how closely predicted ratings of movies match subsequent actual ratings.)

• Objectives:
• Predict the rating that a user would give to a movie that they have not yet rated.
• Minimize the difference between predicted and actual ratings (RMSE and MAPE).
2. Format in the file

The first line of each file contains the movie ID followed by a colon. Each subsequent line in the file corresponds to a rating from a customer and its date, in the following format:

Customer ID, Rating, Date

Movie IDs range from 1 to 17770 sequentially.
Customer IDs range from 1 to 2649429, with gaps.
There are 480189 users.
Ratings are on a five-star (integral) scale from 1 to 5.
Dates have the format YYYY-MM-DD.
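A file in this format could be parsed roughly as follows. The sketch assumes that each movie's block starts with a line holding the movie ID followed by a colon, and that the rating lines are comma-separated. The function name `parse_ratings` and the sample rows are made up for illustration.

```python
import io


def parse_ratings(lines):
    """Yield (movie_id, customer_id, rating, date) tuples from rating-file lines."""
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Assumed header line such as "1:" that announces the movie ID.
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating line found before any movie ID line")
        customer_id, rating, date = line.split(",")
        yield movie_id, int(customer_id), int(rating), date


# Invented sample in the assumed layout (not real Netflix data).
sample = io.StringIO(
    "1:\n"
    "101,3,2005-09-06\n"
    "205,5,2005-05-13\n"
    "2:\n"
    "101,4,2005-10-01\n"
)
rows = list(parse_ratings(sample))
for row in rows:
    print(row)
```

Each tuple is already in the (movie, user, rating, date) shape needed for one merged ratings file.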
Mapping a real-world problem to a machine learning problem
Type of Machine Learning Problem

For a given movie and user, we need to predict the rating the user would give to the movie.
• The given problem is a recommendation problem.
• It can also be seen as a regression problem.

Performance metric
• Mean Absolute Percentage Error:
https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
• Root Mean Square Error:
https://en.wikipedia.org/wiki/Root-mean-square_deviation
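Both metrics follow directly from their standard definitions. Here is a minimal sketch in plain Python; the function names and the example ratings are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Actual values must be non-zero, which holds for ratings on a 1 to 5 scale.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Invented example: one of three predictions is off by one star.
actual = [4, 2, 5]
predicted = [3, 2, 5]
print(rmse(actual, predicted))
print(mape(actual, predicted))
```

RMSE penalizes large errors more heavily, while MAPE expresses the error relative to the actual rating.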
Machine Learning Objective and Constraints
• Minimize RMSE.
• Try to provide some interpretability.

Step 1
Load all Python libraries required to perform the task.
3. Exploratory Data Analysis
A) Pre-processing
a) Converting / merging the whole data set to the required format: u_i, m_j, r_ij
# Create a file 'data.csv' before reading it
# Read all the files in the Netflix data set and store them in one big file ('data.csv')
# Read from each of the four files and append each rating to the global file 'data.csv'
# Create a data frame from the data.csv file
# Arrange the ratings according to time
b) Checking for NaN values
# Just to make sure that all rows containing NaN are deleted
c) Removing duplicates
# By considering all columns (including timestamp)
d) Basic statistics (#Ratings, #Users, and #Movies)
# Print them as
Total no. of ratings : 0
Total no. of users : 0
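The pre-processing steps can be sketched with pandas. This assumes the merged file has the columns movie, user, rating, date. The in-memory sample below is invented and stands in for the real 'data.csv'.

```python
import io

import pandas as pd

# Tiny made-up stand-in for 'data.csv' (columns: movie, user, rating, date).
raw = io.StringIO(
    "1,101,3,2005-09-06\n"
    "1,205,5,2005-05-13\n"
    "2,101,4,2005-10-01\n"
    "2,101,4,2005-10-01\n"  # exact duplicate row
    "2,307,,2005-11-20\n"   # missing rating
)
df = pd.read_csv(raw, names=["movie", "user", "rating", "date"], parse_dates=["date"])

# b) Drop rows that contain NaN values.
df = df.dropna()
# c) Remove duplicates, considering all columns (including the date).
df = df.drop_duplicates()
# Arrange the ratings according to time.
df = df.sort_values("date").reset_index(drop=True)
df["rating"] = df["rating"].astype(int)

# d) Basic statistics.
print("Total no. of ratings :", len(df))
print("Total no. of users   :", df["user"].nunique())
print("Total no. of movies  :", df["movie"].nunique())
```

On the full data set the same calls apply, although reading the merged file may need chunking depending on available memory.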
B) Splitting data into Train and Test (80:20)
a) Make data frames for both train and test data and store them
# Create the data frames and store them on disk for offline use, as train.csv and test.csv
b) Basic statistics in Train data (#Ratings, #Users, and #Movies)
# movies = train_df.movie.value_counts()
# users = train_df.user.value_counts()
c) Basic statistics in Test data (#Ratings, #Users, and #Movies)
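One way to do the 80:20 split is chronologically: the earliest 80% of ratings become the training set. That choice is an assumption here, suggested by the ratings being sorted by time. The data frame below is invented, and the file writes are left commented out so the sketch has no side effects.

```python
import pandas as pd

# Invented ratings, one per day, standing in for the merged data set.
df = pd.DataFrame({
    "movie": [1, 1, 2, 2, 3, 3, 1, 2, 3, 1],
    "user": [101, 205, 101, 307, 205, 307, 307, 205, 101, 409],
    "rating": [3, 5, 4, 2, 4, 1, 5, 3, 2, 4],
    "date": pd.date_range("2005-01-01", periods=10, freq="D"),
})
df = df.sort_values("date").reset_index(drop=True)

# First 80% of the time-ordered rows for training, the rest for testing.
cut = int(len(df) * 0.80)
train_df = df.iloc[:cut]
test_df = df.iloc[cut:]

# Store for offline use (commented out in this sketch).
# train_df.to_csv("train.csv", index=False)
# test_df.to_csv("test.csv", index=False)

# Basic statistics on the training data.
movies = train_df.movie.value_counts()
users = train_df.user.value_counts()
print("Train ratings:", len(train_df), "users:", len(users), "movies:", len(movies))
print("Test ratings :", len(test_df))
```

A time-based split means some users or movies may appear only in the test set, as user 409 does in this toy example.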
C) Exploratory Data Analysis on Train data
a) Plotting the distribution of ratings
# Method to make the y-axis more readable
# Add a new column (weekday) to the data set for analysis
b) Number of ratings per month
c) Analysis of the ratings given by users
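The quantities behind these plots can be computed with pandas before any plotting is done. The sketch below skips the plotting itself. The helper `human`, the column name `day_of_week`, and the sample rows are hypothetical.

```python
import pandas as pd


def human(num):
    """Format a count for axis labels, shortening thousands, millions, billions."""
    for limit, unit in ((10**9, "B"), (10**6, "M"), (10**3, "K")):
        if abs(num) >= limit:
            return f"{num / limit:g} {unit}"
    return f"{num:g}"


# Invented training rows.
train_df = pd.DataFrame({
    "user": [101, 205, 101, 307, 205],
    "rating": [3, 5, 4, 2, 4],
    "date": pd.to_datetime(
        ["2005-01-03", "2005-01-04", "2005-01-15", "2005-02-01", "2005-02-02"]
    ),
})

# a) Distribution of ratings, plus a weekday column for later analysis.
rating_counts = train_df["rating"].value_counts().sort_index()
train_df["day_of_week"] = train_df["date"].dt.day_name()

# b) Number of ratings per month.
per_month = train_df.groupby(train_df["date"].dt.strftime("%Y-%m")).size()

# c) Number of ratings given by each user.
per_user = train_df.groupby("user")["rating"].count().sort_values(ascending=False)

print(rating_counts)
print(per_month)
print(per_user)
```

Each resulting series can be passed to a plotting call, with `human` used to format the tick labels.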
4. Machine learning model
• Sampling data
• Finding the global average of all movie ratings, the average rating per user, and the average rating per movie (from the sampled train data)
• Featurizing data
• Applying machine learning models
• Regression model
• Surprise model
• Matrix factorization techniques
• SVD matrix factorization of user-movie interactions
• SVD + Restricted Boltzmann Machine (a linear combination of these reduced RMSE to 0.88)
• Interleaving to improve personalisation and context awareness
• Predictions
• Optimization
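The three averages listed above can serve as simple features. Here is a minimal pandas sketch; the data frame `sample_train` and its values are invented.

```python
import pandas as pd

# Invented sampled training data.
sample_train = pd.DataFrame({
    "user": [101, 101, 205, 205, 307],
    "movie": [1, 2, 1, 3, 2],
    "rating": [4, 2, 5, 3, 1],
})

# Global average of all ratings.
global_avg = sample_train["rating"].mean()
# Average rating given by each user.
user_avg = sample_train.groupby("user")["rating"].mean()
# Average rating received by each movie.
movie_avg = sample_train.groupby("movie")["rating"].mean()

print("Global average:", global_avg)
print(user_avg)
print(movie_avg)
```

These averages also give a fallback prediction for users or movies with very few ratings.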
Algorithms

First Model
1) Korbell: Matrix Factorization ~0.8914 RMSE
Restricted Boltzmann Machine ~0.8990 RMSE (slightly worse)
A linear blend of both reduced the error to 0.88
Limitations
They were built to handle 100 million ratings, whereas there were more than 5 billion ratings.
2) Simon Funk introduced an incremental, iterative, and approximate SVD using gradient descent. This provided a practical way to scale MF.
• Koren et al. enhanced MF with SVD++ (this asymmetric variation enables adding both implicit and explicit feedback and removes the need to parameterize the users).
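The gradient-descent idea can be illustrated with a small NumPy sketch. It is written in the spirit of the incremental approach described above and is not Funk's original code. It omits bias terms, and the function names, hyperparameters, and ratings are all assumptions chosen for this toy example.

```python
import numpy as np


def funk_sgd(ratings, n_users, n_items, n_factors=2, lr=0.02, reg=0.02,
             n_epochs=500, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 0.1, size=(n_users, n_factors))
    Q = rng.normal(0.0, 0.1, size=(n_items, n_factors))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]   # prediction error on one observed rating
            p_old = P[u].copy()     # keep the old user factors for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q


def rmse(ratings, P, Q):
    """RMSE of the factor model on the given (user, item, rating) triples."""
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings])))


# Invented observed ratings as (user index, item index, rating).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]

P0, Q0 = funk_sgd(ratings, n_users=3, n_items=3, n_epochs=0)  # random start only
P, Q = funk_sgd(ratings, n_users=3, n_items=3)
print("RMSE before training:", rmse(ratings, P0, Q0))
print("RMSE after training :", rmse(ratings, P, Q))
```

Only observed ratings are visited, so the cost per epoch grows with the number of ratings and not with the size of the full user-movie matrix.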

Second Model
• Salakhutdinov et al. proposed an RBM structure with binary hidden units and softmax visible units with 5 biases, only for the movies the user rated.
• Other methods
- MF combined with traditional neighbourhood approaches.
Netflix Personalisation Approach
• Optimization of accuracy and diversity (personal or household)
• Awareness
• A different way of promoting trust in the personalization component is to provide explanations [18] as to why we decide to recommend a given movie or show.
• We are not recommending it because it suits our business needs, but because it matches the information we have from you: your explicit taste preferences and ratings, your viewing history, or even your friends' recommendations.
• Combining social (Facebook) data with collaborative filtering data is also a worthy area of research.
• Similarity is also an important source of personalization (it can be between movies or users, and can be in multiple dimensions such as metadata, ratings, or viewing data).
• The objective of a recommender is to present a number of attractive items for a person to choose from. This can be accomplished by selecting some items and sorting them in order of expected enjoyment (utility).
• We need an appropriate ranking model.
• We need to optimize ranking algorithms to give the highest scores to titles that a member is most likely to play and enjoy.
• Popularity is the opposite of personalization.
• The goal becomes to find a personalized ranking function that is better than item popularity.
• Approach 1: use the member’s predicted rating of each item as an adjunct to item popularity.
• To build a ranking prediction model, use a very simple scoring function: a linear combination of popularity and predicted rating.
• score(u, v) = w1 p(v) + w2 r(u, v) + b, where u = user, v = video item, p = popularity, and r = predicted rating.
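The linear scoring function can be sketched directly. The weights, popularity values, and predicted ratings below are invented. In a real system w1, w2, and b would be learned from data, not set by hand.

```python
def score(popularity, predicted_rating, w1, w2, b=0.0):
    """Linear blend: score(u, v) = w1 * p(v) + w2 * r(u, v) + b."""
    return w1 * popularity + w2 * predicted_rating + b


def rank(videos, w1, w2, b=0.0):
    """Sort video names by descending score for one user.

    `videos` maps a video name to (popularity, predicted rating for this user).
    """
    return sorted(
        videos,
        key=lambda name: score(videos[name][0], videos[name][1], w1, w2, b),
        reverse=True,
    )


# Invented values: popularity in [0, 1], predicted rating on the 1 to 5 scale.
videos = {
    "video_a": (0.9, 2.0),
    "video_b": (0.4, 4.5),
    "video_c": (0.6, 3.0),
}

print(rank(videos, w1=1.0, w2=0.0))  # popularity only
print(rank(videos, w1=1.0, w2=1.0))  # popularity blended with predicted rating
```

With w2 = 0 the ranking reduces to the popularity baseline. A non-zero w2 lets the member's predicted ratings reorder the list.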
• This family of machine learning problems is known as "Learning to Rank" and is central to application scenarios such as search engines or ad targeting.
• Many traditional supervised classification methods, such as logistic regression or tree ensembles, can be used for ranking with this pointwise approach (more sophisticated pairwise or even listwise approaches are also possible).
• Models
• Many different models were used. Methods you should probably know about if you are working in machine learning for personalization:
• Linear regression, Logistic regression, Elastic nets, Singular Value Decomposition, Restricted Boltzmann Machines, Markov Chains, Latent Dirichlet Allocation, Association Rules, Matrix factorization, Gradient Boosted Decision Trees, Random Forests, and Clustering techniques from the simple k-means to graphical approaches such as Affinity Propagation.
• There are proven solutions to scale offline computations using, for example, Hadoop.
• Online and nearline

• Reference: Xavier Amatriain (Netflix, xavier@netflix.com), "Big & Personal: data and models behind Netflix recommendations"
Matrix Factorization
• Benefit: reduced storage
• Take comedy and action as features F1 and F2 (there can be, say, 100 different features) and assign them random values.
• The Netflix matrix is sparse.
• White boxes are the predicted values.
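The storage benefit can be checked with simple arithmetic. The user and movie counts come from the data set description, while the choice of 100 features is an assumption for illustration.

```python
n_users = 480189   # number of users in the data set
n_movies = 17770   # number of movies in the data set
n_features = 100   # assumed number of latent features

# Cells in the full (mostly empty) user-by-movie rating matrix.
dense_cells = n_users * n_movies
# Cells in the two factor matrices: users x features and movies x features.
factor_cells = (n_users + n_movies) * n_features

ratio = dense_cells / factor_cells
print("Full matrix cells  :", dense_cells)
print("Factor matrix cells:", factor_cells)
print("Reduction factor   :", round(ratio, 1))
```

The two factor matrices hold far fewer values than the full matrix, and their product still yields a predicted value for every empty cell.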
