
Recommendation Engine I

DS 862: Machine Learning for Business Analysts

Rex Cheung

San Francisco State University

Apr 21, 2021

Rex Cheung (SFSU, DS 862) 4/21/2021 1 / 26


Outline

Introduction to Recommendation Engine


Content Based Filtering
How to Start from Scratch



Recommendation Engine

A model (or class of models).

The goal is to recommend items to users by predicting their missing
ratings or preferences.
Also referred to as a recommendation system.



Recommendation Engine

Used by pretty much all large (and small) companies:


Amazon: products
Spotify: songs
Netflix: shows and movies
Facebook: friends, content
Etc.



Some Motivation: The Netflix Million Dollar Problem

The Netflix Prize: launched in October 2006.

Goal: improve on Netflix's then-current algorithm by 10%, measured by RMSE.
Training data: 100,480,507 ratings from 480,189 users on 17,770 movies.
Task: predict the 2,817,131 missing ratings.
The competition concluded on Sept 21, 2009, with a winning solution
that improved on the existing score by 10.06%.
The winning team was awarded $1 million.



Types of Models

Broadly divided into 3 categories:


Content-based filtering: recommendations based on item and/or user
attributes
Collaborative filtering: recommendations based on interactions
(ratings, preferences, etc.)
Hybrid: combines both, much like an ensemble






A Very Basic Example

Movie ratings:
Iron Man Beautiful Mind Harry Potter Toy Story 4 Annabelle The Greatest Showman
User A 4 5 1
User B 5 5 5
User C 1 1 5 3
User D 3 2 2 1

What movies should we recommend to user B?



Content Based Filtering

Looks at content / attributes


Uses item features to recommend other items similar to what the user
likes, based on their previous actions
Focuses on attributes of items (or users) rather than on ratings
Several approaches:
1. Using only item attributes
2. Using both user profile and item attributes






Content-Based Filtering: Item Attributes Only

General pseudocode:
1. For each item, obtain its attributes.
2. Calculate the similarity between items.
3. Recommend the items most similar to the user’s previously viewed item.
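
The three steps above can be sketched in Python. This is only a minimal illustration: the `recommend_similar` helper, the toy `catalog`, and the use of a simple attribute-overlap count as the similarity measure are all assumptions made for this example, not part of the lecture.

```python
def overlap(a, b):
    """Toy similarity (an assumption): number of attributes two items share."""
    return len(a & b)


def recommend_similar(catalog, last_viewed, similarity, k=2):
    """Return up to k items ranked by similarity to the last viewed item."""
    target = catalog[last_viewed]              # step 1: obtain the attributes
    scores = {
        name: similarity(target, attrs)        # step 2: similarity between items
        for name, attrs in catalog.items()
        if name != last_viewed
    }
    ranked = sorted(scores, key=scores.get, reverse=True)  # step 3: most similar first
    # Drop items that share nothing with the last viewed item.
    return [name for name in ranked if scores[name] > 0][:k]


# Hypothetical catalog: each item maps to its set of attributes.
catalog = {
    "item_a": {"action", "sci-fi"},
    "item_b": {"action", "sci-fi", "adventure"},
    "item_c": {"horror"},
    "item_d": {"adventure", "action"},
}

print(recommend_similar(catalog, "item_a", overlap))  # ['item_b', 'item_d']
```

Passing the similarity function as an argument keeps the ranking logic unchanged when a different metric is swapped in.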



Item Attribute Example

Movie Attributes:
Movie                 Action  Biography  Drama  Horror  Sci-Fi  Animation  Adventure
Iron Man              1       0          0      0       1       0          1
Beautiful Mind        0       1          1      0       0       0          0
Harry Potter          0       0          0      0       0       0          1
Toy Story 4           0       0          0      0       0       1          1
Annabelle             0       0          0      1       0       0          0
The Greatest Showman  0       1          1      0       0       0          0



Item Attribute Example

We need a metric to measure how similar two items are:

Similarity-based metrics: the larger the value, the more similar.
Examples: Pearson’s correlation, Spearman’s correlation, cosine
similarity, Jaccard similarity.
Distance-based metrics: the smaller the value, the more similar.
Examples: Euclidean distance, Manhattan distance.
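
As a rough illustration of the distance-based metrics, the sketch below computes Euclidean and Manhattan distances in plain Python. The vectors are binary genre flags in the order Action, Biography, Drama, Horror, Sci-Fi, Animation, Adventure; the function and variable names are assumptions for this example.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Genre flags: Action, Biography, Drama, Horror, Sci-Fi, Animation, Adventure
iron_man = [1, 0, 0, 0, 1, 0, 1]
harry_potter = [0, 0, 0, 0, 0, 0, 1]
toy_story_4 = [0, 0, 0, 0, 0, 1, 1]

# Smaller distance means more similar.
print(manhattan(harry_potter, toy_story_4))  # 1
print(manhattan(harry_potter, iron_man))     # 2
print(euclidean(harry_potter, toy_story_4))  # 1.0
```

Under both distances Harry Potter is closer to Toy Story 4 than to Iron Man, because it differs from Toy Story 4 in one genre flag and from Iron Man in two.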



Cosine Similarity

$$
s(x, y) = \cos(\theta) = \frac{x \cdot y}{\|x\| \, \|y\|}
        = \frac{\sum_{i=1}^{p} x_i y_i}{\sqrt{\sum_{i=1}^{p} x_i^2} \, \sqrt{\sum_{i=1}^{p} y_i^2}}
$$


Useful for text documents and for binary observations, though it can also
be used when vectors are real-valued.
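
A minimal plain-Python sketch of the cosine similarity formula follows. The function name and the convention of returning 0 for an all-zero vector are assumptions for this example; the two vectors are the genre flags of Iron Man and Harry Potter in the order Action, Biography, Drama, Horror, Sci-Fi, Animation, Adventure.

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention chosen here: an all-zero vector matches nothing
    return dot / (norm_x * norm_y)


iron_man = [1, 0, 0, 0, 1, 0, 1]
harry_potter = [0, 0, 0, 0, 0, 0, 1]

# One shared genre: 1 / (sqrt(3) * sqrt(1))
print(round(cosine_similarity(iron_man, harry_potter), 4))  # 0.5774
```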
Jaccard Similarity

$$
s(x, y) = \frac{|X \cap Y|}{|X \cup Y|} = \frac{|X \cap Y|}{|X| + |Y| - |X \cap Y|}
$$


Useful when observations are binary.
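
For binary attributes, X and Y can be read as the sets of attributes that each item has. The sketch below is one way to compute the Jaccard similarity under that reading; the function name and the choice to return 0 for two empty sets are assumptions for this example.

```python
def jaccard_similarity(x, y):
    """Size of the intersection divided by the size of the union."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(x & y) / len(union)


iron_man = {"Action", "Sci-Fi", "Adventure"}
toy_story_4 = {"Animation", "Adventure"}

# |X ∩ Y| = 1 and |X ∪ Y| = 3 + 2 - 1 = 4
print(jaccard_similarity(iron_man, toy_story_4))  # 0.25
```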



Item Attribute Example (Continued)

Cosine similarity matrix (TGS = The Greatest Showman):
                Iron Man  Beautiful Mind  Harry Potter  Toy Story 4  Annabelle  TGS
Iron Man        1         0               0.5774        0.4082       0          0
Beautiful Mind  0         1               0             0            0          1
Harry Potter    0.5774    0               1             0.7071       0          0
Toy Story 4     0.4082    0               0.7071        1            0          0
Annabelle       0         0               0             0            1          0
TGS             0         1               0             0            0          1

So, if a user last watched Toy Story 4, the content-based recommendation
system will recommend Harry Potter (0.7071) and then Iron Man (0.4082) as
the next movies to watch.
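
The Toy Story 4 recommendation can be reproduced with a short script. This is a sketch under stated assumptions: the genre flags are ordered Action, Biography, Drama, Horror, Sci-Fi, Animation, Adventure, and the `most_similar` helper is a hypothetical name introduced for this example.

```python
import math

# Genre flags: Action, Biography, Drama, Horror, Sci-Fi, Animation, Adventure
movies = {
    "Iron Man":             [1, 0, 0, 0, 1, 0, 1],
    "Beautiful Mind":       [0, 1, 1, 0, 0, 0, 0],
    "Harry Potter":         [0, 0, 0, 0, 0, 0, 1],
    "Toy Story 4":          [0, 0, 0, 0, 0, 1, 1],
    "Annabelle":            [0, 0, 0, 1, 0, 0, 0],
    "The Greatest Showman": [0, 1, 1, 0, 0, 0, 0],
}


def cosine(x, y):
    """Cosine similarity; every vector above has at least one nonzero entry."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def most_similar(title, k=2):
    """Rank the other movies by cosine similarity and keep the top k nonzero scores."""
    scores = [
        (other, round(cosine(movies[title], vec), 4))
        for other, vec in movies.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scores if pair[1] > 0][:k]


print(most_similar("Toy Story 4"))
# [('Harry Potter', 0.7071), ('Iron Man', 0.4082)]
```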



User Profile and Item Attributes

The above example uses only item attributes.


That works well when we have no user profile, e.g. for a new user.
If we do have a user profile, we can leverage that information as well.
This method is based on both the user’s taste and the item content.



User Profile and Item Attributes Example

Let’s use User A as an example.

User profile (ratings) and item attributes
(IM = Iron Man, TS = Toy Story 4, Anna = Annabelle):

      Rating  Act  Bi  Dr  H  SF  Ani  Ad
IM    4       1    0   0   0  1   0    1
TS    5       0    0   0   0  0   1    1
Anna  1       0    0   0   1  0   0    0



User Profile and Item Attributes Example

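Create a weighted attribute matrix: multiply each item’s attributes by the
user’s rating, sum each column, and divide by the grand total (23).

              Act   Bi  Dr  H     SF    Ani   Ad
IM            4     0   0   0     4     0     4
TS            0     0   0   0     0     5     5
Anna          0     0   0   1     0     0     0

Total         4     0   0   1     4     5     9
Scaled Total  0.17  0   0   0.04  0.17  0.22  0.39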
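
The weighting step can be sketched as follows. The `user_profile` function name is an assumption for this example; the ratings and attribute flags are User A's from the tables. Each attribute flag is multiplied by the user's rating, the columns are summed, and the sums are divided by the grand total (4 + 1 + 4 + 5 + 9 = 23). Note that the Horror weight is 1/23 ≈ 0.043.

```python
GENRES = ["Act", "Bi", "Dr", "H", "SF", "Ani", "Ad"]

# User A's ratings and the attribute flags of the rated movies.
ratings = {"IM": 4, "TS": 5, "Anna": 1}
attributes = {
    "IM":   [1, 0, 0, 0, 1, 0, 1],
    "TS":   [0, 0, 0, 0, 0, 1, 1],
    "Anna": [0, 0, 0, 1, 0, 0, 0],
}


def user_profile(ratings, attributes):
    """Return (column totals, totals scaled to sum to 1) of rating-weighted attributes."""
    totals = [0] * len(GENRES)
    for title, rating in ratings.items():
        for j, flag in enumerate(attributes[title]):
            totals[j] += rating * flag
    grand_total = sum(totals)
    if grand_total == 0:
        return totals, [0.0] * len(totals)  # no rated attributes to weight
    return totals, [t / grand_total for t in totals]


totals, scaled = user_profile(ratings, attributes)
print(totals)                         # [4, 0, 0, 1, 4, 5, 9]
print([round(s, 2) for s in scaled])  # [0.17, 0.0, 0.0, 0.04, 0.17, 0.22, 0.39]
```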



User Profile and Item Attributes Example

For the new movies, compute a weighted score:

Table: Each cell shows the original attribute value, with the weighted
attribute value in parentheses.

     Act  Bi     Dr     H  SF  Ani  Ad        Total
BM   0    1 (0)  1 (0)  0  0   0    0         0
HP   0    0      0      0  0   0    1 (0.39)  0.39
TGS  0    1 (0)  1 (0)  0  0   0    0         0

The Total column contains the recommendation score. Of the 3 movies, we
would recommend Harry Potter to user A first, followed by Beautiful Mind
and The Greatest Showman, which tie with a score of 0.
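
A minimal sketch of the scoring step: each unseen movie's score is the dot product of its attribute flags with the user's scaled attribute weights. The variable names are assumptions for this example, and the weights are recomputed from User A's column totals (4, 0, 0, 1, 4, 5, 9) so that the script runs on its own.

```python
# User A's rating-weighted column totals: Act, Bi, Dr, H, SF, Ani, Ad
totals = [4, 0, 0, 1, 4, 5, 9]
weights = [t / sum(totals) for t in totals]

# Attribute flags of the movies User A has not rated yet.
candidates = {
    "BM":  [0, 1, 1, 0, 0, 0, 0],
    "HP":  [0, 0, 0, 0, 0, 0, 1],
    "TGS": [0, 1, 1, 0, 0, 0, 0],
}

# Recommendation score = sum of the weights of the attributes the movie has.
scores = {
    title: round(sum(w * flag for w, flag in zip(weights, flags)), 2)
    for title, flags in candidates.items()
}
ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('HP', 0.39), ('BM', 0.0), ('TGS', 0.0)]
```

Beautiful Mind and The Greatest Showman score 0 because User A has not rated any Biography or Drama title, which shows how this approach stays close to what the user has already seen.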



Example

Based on the above approach, what movie(s) would you recommend to
user C?



Advantages and Disadvantages
Approach 1 (item attributes only):
  Pros:
    Only needs item descriptions, so it can avoid the ’cold start’ problem.
    Easy to explain.
  Cons:
    Tends to be over-specialized: it only recommends items that are ’too
    similar’.
Approach 2 (user profile and item attributes):
  Pros:
    Independent of other users. As we will see later, collaborative
    filtering depends on other users as well.
    Also gives weights to attributes.
    Again, easy to explain.
  Cons:
    A new user may not have enough profile information.
    Again, over-specialized.
Some Common Problems in Recommendation Systems

Cold start: for a new user or item, there isn’t enough data to make
accurate recommendations.
Scalability: computing recommendations often requires a large amount of
computing power, especially with many users / items.
Sparsity: users may have rated only a small portion of the items.
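
To make the sparsity point concrete, the Netflix Prize training set quoted earlier (100,480,507 ratings from 480,189 users on 17,770 movies) can be used for a quick back-of-the-envelope calculation. The variable names below are assumptions for this example.

```python
n_ratings = 100_480_507
n_users = 480_189
n_movies = 17_770

# Fraction of the user-by-movie rating matrix that is actually filled in.
density = n_ratings / (n_users * n_movies)
sparsity = 1 - density

print(f"density:  {density:.2%}")   # density:  1.18%
print(f"sparsity: {sparsity:.2%}")  # sparsity: 98.82%
```

Roughly 99% of the possible user-movie ratings are missing, even in a dataset with over 100 million ratings.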



Starting From Scratch

To start building a model from scratch, we can use content-based
modeling.
In most cases, you won’t have user profile information at first.
But you will have item descriptions.
You can switch to other methods later.



Summary

Discussed the content-based approach to building a recommendation
system.
Simple to start with, but has various limitations.
Results depend on the similarity / distance metric used (though not
heavily).
Can solve the ’cold start’ problem.



