
Wake Forest University | School of Business

Marketing Analytics
BAN 6065
Professor Jia Li
Spring 2021

Today’s Agenda

• Product Analytics
– Recommendation System

– Synchronous Lecture: Market Basket Analysis


– Asynchronous Lecture: Collaborative Filtering

Netflix
• Online streaming video service
– The company started as an online DVD rental
service
– Started streaming video in 2007
– Today, more than 158 million subscribers in 190
countries
• The best-performing stock of the past 10 years
– 10-year cumulative return: 3,522%

Netflix’s Recommendation System


Netflix says 80 percent of watched content is based on algorithmic recommendations.

Other Examples
• Retail: Amazon

• Friends: Facebook

• Professional connection: LinkedIn

• Music: Spotify

• Websites: Reddit

Question Time

• Question 1

Accompanying Questions:
https://wakeforest.qualtrics.com/jfe/form/SV_397533jR7DWEmZE
(Link available on Canvas, right below the link to this video)

Collaborative Filtering

Key Ideas
• Intuition: Low-tech way to get recommendations: ask your friends!
– Some of your friends have better “taste” than others (like-minded)

• Problem: Not scalable


– As more and more options become available, it becomes less practical to
decide what you want by asking a small group of people
– They may not be aware of all the options

• Solution: Collaborative Filtering


– Search a large group of people and find a smaller set with tastes similar to
yours
– Look at the other things they like and combine them to create a ranked list of
suggestions
– First used by David Goldberg (Xerox PARC, 1992): “Using collaborative filtering
to weave an information tapestry.”

Input Data
• Explicit (Questioning)
– Explicit ratings (1-5 numerical ratings)
– Favorites (likes): 1 (liked), 0 (no vote), -1 (disliked)

• Implicit (Behavioral)
– Purchase: 1 (bought), 0 (didn’t buy)
– Clicks: 1 (clicked), 0 (didn’t click)
– Reads: 1 (read), 0 (didn’t read)
– Watching a Video: 1 (watched), 0 (didn’t watch)
– Hybrid: 2 (bought), 1 (browsed), 0 (didn’t buy)

Preference Data: Structure


• Rows: Customers/Users
• Columns: Items
Items (columns): Lady in the Water, Snakes on a Plane, Just My Luck, Superman Returns, You, Me, and Dupree, The Night Listener (a blank cell means the customer has not rated that movie)
– Mike: 2.5, 3.5, 3.0, 3.5, 2.5, 3.0 (rated all six, in the order above)
– Jay: 3.5, 3.5, 3.0
– July: 3.5, 3.0, 4.0, 2.5, 4.5
– Peter: 2.5, 4.0, 2.0, 3.0
– Stephen: 3.0, 4.0, 5.0, 3.0
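In code, this kind of matrix is usually stored sparsely. A minimal Python sketch (the dictionary layout is illustrative, not prescribed by the course; Toby's three ratings are taken from a later slide):

```python
# Sparse user-item ratings: rows = customers, columns = items.
# A movie the customer has not rated is simply absent from their dictionary.
prefs = {
    "Mike": {"Lady in the Water": 2.5, "Snakes on a Plane": 3.5, "Just My Luck": 3.0,
             "Superman Returns": 3.5, "You, Me, and Dupree": 2.5, "The Night Listener": 3.0},
    "Toby": {"Snakes on a Plane": 4.5, "Superman Returns": 4.0, "You, Me, and Dupree": 1.0},
}
```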

Collaborative Filtering Tasks
1. Finding Similar Users: Calculating Similarities

2. Ranking the Users

3. Recommending Items based on weighted preference data

Finding Similar Users


• Calculate pair-wise similarities
– Euclidean Distance: Simple, but subject to rating inflation

– Cosine similarity: better with binary/fractional data

– Pearson correlation: continuous variables (e.g. numerical ratings)

– Others: Jaccard coefficient, Manhattan distance
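Written out (standard definitions; the Euclidean version is shown converted to a 0-1 similarity score, one common convention), for two users x and y compared over the items both have rated:

$$\mathrm{sim}_{\text{Euclid}}(x,y)=\frac{1}{1+\sqrt{\sum_i (x_i-y_i)^2}},\qquad
\cos(x,y)=\frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}},\qquad
r(x,y)=\frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2}\,\sqrt{\sum_i (y_i-\bar{y})^2}}$$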

Cosine similarity

Example: a binary user-item matrix (rows = users, columns = items; 1 = positive signal, 0 = none)

1 0 1 1
0 1 0 1
1 0 0 1
0 1 1 1
0 1 0 1
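For example, taking the first two rows above as two users' item vectors, (1, 0, 1, 1) and (0, 1, 0, 1):

$$\cos = \frac{1\cdot 0 + 0\cdot 1 + 1\cdot 0 + 1\cdot 1}{\sqrt{3}\,\sqrt{2}} = \frac{1}{\sqrt{6}} \approx 0.41,$$

while the two users with identical rows (0 1 0 1) have cosine similarity of exactly 1.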

Ranking the Users


• Focal customer
– Toby: preference vector (“Snakes on a Plane”: 4.5, “Superman Returns”: 4.0, “You, Me, and Dupree”: 1.0)

Pearson Correlation Similarity with Toby

– Mike: 0.99
– Jay: 0.38
– July: 0.89
– Peter: 0.92
– Stephen: 0.66

– Top 3 matches: Mike, Peter, July -> like-minded!
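A minimal Python sketch of this step (the function name and the sparse-dictionary layout are illustrative, not the course's code), computing Pearson similarity over the movies two customers have both rated:

```python
from math import sqrt

def pearson_sim(prefs, u, v):
    """Pearson correlation between users u and v over the items both have rated."""
    shared = [item for item in prefs[u] if item in prefs[v]]
    n = len(shared)
    if n < 2:
        return 0.0  # not enough common items for a correlation
    x = [prefs[u][i] for i in shared]
    y = [prefs[v][i] for i in shared]
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

prefs = {
    "Toby": {"Snakes on a Plane": 4.5, "Superman Returns": 4.0, "You, Me, and Dupree": 1.0},
    "Mike": {"Lady in the Water": 2.5, "Snakes on a Plane": 3.5, "Just My Luck": 3.0,
             "Superman Returns": 3.5, "You, Me, and Dupree": 2.5, "The Night Listener": 3.0},
}
print(round(pearson_sim(prefs, "Toby", "Mike"), 2))  # 0.99, matching the table above
```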

Recommend movies to Toby based on
the user ranking

Items (columns): Lady in the Water, Snakes on a Plane, Just My Luck, Superman Returns, You, Me, and Dupree, The Night Listener (a blank cell means the customer has not rated that movie)
– Mike: 2.5, 3.5, 3.0, 3.5, 2.5, 3.0
– Jay: 3.5, 3.5, 3.0
– July: 3.5, 3.0, 4.0, 2.5, 4.5
– Peter: 3.0, 4.0, 2.0, 3.0
– Stephen: 3.0, 4.0, 5.0, 3.0
– Toby: “Snakes on a Plane” 4.5, “Superman Returns” 4.0, “You, Me, and Dupree” 1.0

Recommending Items – 1/2

• Problems: if we only use the top 1 like-minded customer
– We may accidentally turn up customers who haven't reviewed some of the movies I might like
– We could return a customer who strangely liked a movie that got bad reviews from all other customers

• Solution: score each item with a weighted average of the other customers' ratings, using each customer's similarity as the weight
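Formally (the standard user-based weighted-average formula; the deck's own implementation may differ in details), the predicted rating for Toby on a movie i uses only the customers who actually rated i:

$$\hat{r}_{\text{Toby},i} = \frac{\sum_{u\ \text{rated}\ i} \mathrm{sim}(\text{Toby},u)\; r_{u,i}}{\sum_{u\ \text{rated}\ i} \mathrm{sim}(\text{Toby},u)}$$

The “Normalized Rating” row on the next slide is exactly this quotient.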

Recommending Items – 2/2
Items (columns): Lady in the Water, Snakes on a Plane, Just My Luck, Superman Returns, You, Me, and Dupree, The Night Listener (a blank cell means the customer has not rated that movie)
– Mike (similarity 0.99): 2.5, 3.5, 3.0, 3.5, 2.5, 3.0
– Jay (similarity 0.38): 3.5, 3.5, 3.0
– July (similarity 0.89): 3.5, 3.0, 4.0, 2.5, 4.5
– Peter (similarity 0.92): 3.0, 4.0, 2.0, 3.0
– Stephen (similarity 0.66): 3.0, 4.0, 5.0, 3.0
For each movie Toby has not rated:
– Weighted rating sum (similarity × rating, summed over the customers who rated it): e.g., 5.24 and 10.26
– Sum of weights (similarities of those customers): e.g., 1.91 and 2.80
– Normalized rating (weighted sum ÷ sum of weights): 5.24 / 1.91 ≈ 2.74 and 10.26 / 2.80 ≈ 3.66
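A minimal Python sketch of the scoring step above (function name and the two neighbours below are hypothetical, purely to show the mechanics; Toby's own ratings are from the slide). On the full slide data the same quotient reproduces the Normalized Rating row, e.g. 5.24 / 1.91 ≈ 2.74 and 10.26 / 2.80 ≈ 3.66.

```python
def recommend(prefs, sims, focal):
    """Rank items the focal customer has not rated by similarity-weighted average rating."""
    totals, weights = {}, {}
    for other, sim in sims.items():
        if other == focal or sim <= 0:
            continue
        for item, rating in prefs[other].items():
            if item in prefs[focal]:
                continue  # skip movies the focal customer has already rated
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    return sorted(((i, totals[i] / weights[i]) for i in totals),
                  key=lambda kv: kv[1], reverse=True)

# Toby's ratings are from the slide; the neighbours' ratings and similarities are hypothetical.
prefs = {
    "Toby":      {"Snakes on a Plane": 4.5, "Superman Returns": 4.0, "You, Me, and Dupree": 1.0},
    "NeighborA": {"Snakes on a Plane": 4.0, "Just My Luck": 2.0, "The Night Listener": 3.5},
    "NeighborB": {"Superman Returns": 5.0, "The Night Listener": 3.0},
}
sims = {"NeighborA": 0.9, "NeighborB": 0.4}  # similarities with Toby
print(recommend(prefs, sims, "Toby"))        # ranked list of unseen movies with predicted ratings
```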

Item-based Collaborative Filtering


• The idea: you are likely to like a product that is similar to products you have already bought.

Item-based Collaborative Filtering
Example
Products (columns): Product 1, Product 2, Product 3 (a blank cell means the customer has not rated that product)
– Customer 1: 2, 3
– Customer 2: 5, 2
– Customer 3: 3, 3, 1 (rated all three, in the order above)
– Customer 4: 2, 2
Item-item similarity is computed using only the customers who rated both items (“co-rated” cases), as sketched below.
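A minimal sketch of that idea (illustrative function name; cosine is used here, though other measures work the same way): for each pair of items, only the customers who rated both contribute.

```python
from math import sqrt

def item_similarity(prefs, item_a, item_b):
    """Cosine similarity between two items, using only customers who rated both."""
    co_raters = [u for u in prefs if item_a in prefs[u] and item_b in prefs[u]]
    if not co_raters:
        return 0.0
    a = [prefs[u][item_a] for u in co_raters]
    b = [prefs[u][item_b] for u in co_raters]
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```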

Question Time

• Question 2
• Question 3
• Question 4

Accompanying Questions:
https://wakeforest.qualtrics.com/jfe/form/SV_397533jR7DWEmZE
(Link available on Canvas, right below the link to this video)

Problems with Collaborative Filtering
• When data are sparse, correlations (weights) are based on very few common items -> unreliable

• It cannot handle new items (the cold-start problem)

• It does not incorporate attribute information

• Alternative way: content-based recommendations


– Let’s use attribute information!

Content-based Recommendations
1. Define features and feature values

2. Describe each item as a vector of features

3. Develop a user profile: the types of items this user likes


– A weighted vector of item attributes
– Weights denote the importance of each attribute to the user

4. Recommend items that are similar to those that a user liked in the
past

• Note 1: Similar to information retrieval (text mining)


• Note 2: Pre-computation is possible; more scalable -> used by Amazon (see the sketch below)
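A minimal Python sketch of steps 1-4 above (the genre features, ratings, and movie names are hypothetical, chosen only to show the mechanics): items are described by feature vectors, the user profile is a rating-weighted average of the features of items the user rated, and unseen items are scored by cosine similarity to that profile.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Steps 1-2: features and item vectors (hypothetical genres: action, sci-fi, comedy, drama)
items = {
    "Movie A": [1, 1, 0, 0],
    "Movie B": [1, 0, 0, 1],
    "Movie C": [0, 0, 1, 1],
}

# Step 3: user profile = rating-weighted average of the vectors of movies the user rated
user_ratings = {"Movie A": 5.0, "Movie C": 2.0}   # hypothetical ratings
profile = [0.0, 0.0, 0.0, 0.0]
for movie, rating in user_ratings.items():
    for j, feat in enumerate(items[movie]):
        profile[j] += rating * feat
profile = [p / sum(user_ratings.values()) for p in profile]

# Step 4: recommend the unseen items most similar to the profile
candidates = {m: cosine(profile, vec) for m, vec in items.items() if m not in user_ratings}
print(sorted(candidates.items(), key=lambda kv: kv[1], reverse=True))
```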

Example of Content-based approach
• Movie content
– Genre, actors, director, movie summary, …

Netflix Recommendation Algorithm: A Hybrid Type
• Netflix uses a hybrid recommendation algorithm that
combines collaborative filtering and content-based
recommendations (Matrix Factorization)
• For example, consider a collaborative filtering approach
where we determine that Amy and Carl have similar
preferences.
• We could then do content mining, where we would find that
“Terminator”, which both Amy and Carl liked, is classified in
almost the same set of genres as “Starship Troopers.”
• Recommend “Starship Troopers” to both Amy and Carl, even
though neither of them has seen it before.

Question Time

• Question 5

Accompanying Questions:
https://wakeforest.qualtrics.com/jfe/form/SV_397533jR7DWEmZE
(Link available on Canvas, right below the link to this video)

The Netflix Prize


• From 2006 to 2009, Netflix ran a contest asking the
public to submit algorithms to predict user ratings for
movies.

• Training data set of ~100,000,000 ratings and test data set of ~3,000,000 ratings were provided.

• Offered a grand prize of USD 1,000,000 to the team that could beat Netflix's own algorithm, Cinematch, by more than 10% (measured in RMSE).
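For reference, RMSE (root mean squared error) over n predicted ratings versus the actual ratings is the standard

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{r}_i-r_i)^2},$$

so beating Cinematch by 10% meant producing an RMSE at least 10% lower than Cinematch's on the held-out test ratings.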

Contest Rules
• If the grand prize was not yet reached, progress prizes of USD 50,000 per year would be awarded for the best result so far, as long as it had >1% improvement over the previous year.

• Teams must submit code and a description of the algorithm to be awarded any prizes.

• If any team met the 10% improvement goal, last call would be issued and 30 days would remain for all teams to submit their best algorithm.

Initial Results
• The contest went live on October 2, 2006.

• By October 8, a team submitted an algorithm that beat Cinematch.

• By October 15, there were three teams with algorithms beating Cinematch.

• One of these solutions beat Cinematch by >1%, qualifying for a progress prize.

Progress During the Contest
• By June 2007, over 20,000 teams had registered from
over 150 countries.

• The 2007 progress prize went to Team BellKor, with an 8.43% improvement on Cinematch.

• In the following year, several teams from across the world joined forces.

Competition intensified …
• The 2008 progress prize went to Team BellKor, which contained researchers from the original BellKor team as well as Team BigChaos.

• This was the last progress prize because another 1% improvement would reach the grand prize goal of 10%.

Last Call announced …
• On June 26, 2009, Team BellKor’s Pragmatic Chaos submitted a 10.05% improvement over Cinematch.

The Final 30 Days


• 29 days after last call was announced, on July 25, 2009, Team The Ensemble submitted a 10.09% improvement.

• When Netflix stopped accepting submissions the next day, BellKor’s Pragmatic Chaos had submitted a 10.09% improvement solution and The Ensemble had submitted a 10.10% improvement solution.

• Netflix would now test the algorithms on a private test set and announce the winner.

Winner is declared!
• On September 18, 2009, a winning team was
announced

• BellKor’s Pragmatic
Chaos won the
competition and the
USD1,000,000 grand
prize.

More ideas for improvement


• Ensemble methods (combining algorithms)
– Most advanced/commercial algorithms combine kNN, matrix factorization (handling large/sparse matrices), and other classifiers

• Marginal propensity to buy with/without recommendation (instead of probability of buying)
– Anand Bodapati (JMR 2008): a “customers who buy this product buy these other products” kind of recommendation system frequently recommends what customers would have bought anyway, so it often creates only purchase acceleration rather than expanding sales

• Incorporate text reviews: text review data can be used as a basis to calculate similarities (i.e., text mining), as sketched below
– Basic methods rely only on numerical ratings/purchase data
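One common way to do this (a sketch using scikit-learn; the review snippets are hypothetical) is to represent each customer's concatenated review text as a TF-IDF vector and compute cosine similarities between customers:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical review text, one string per customer
reviews = [
    "great acting, loved the suspense and the plot twists",
    "too slow, weak plot, but the acting was decent",
    "hilarious comedy, light and fun for the whole family",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(reviews)      # customers x terms matrix
sim = cosine_similarity(X)            # customer-by-customer similarity matrix
print(sim.round(2))
```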

