
Wake Forest University | School of Business

Marketing Analytics
BAN 6065
Professor Jia Li
Spring 2021

Today’s Agenda

• Product Analytics
– Recommendation System

– Synchronous Lecture: Market Basket Analysis


– Asynchronous Lecture: Collaborative Filtering

Netflix
• Online streaming video service
– The company started as an online DVD rental
service
– Started streaming video in 2007
– Today, more than 158 million subscribers in 190
countries
• The best-performing stock of the past 10 years
– 10-year cumulative return: 3,522%

Netflix’s Recommendation System


Netflix says 80 percent of watched content is based on algorithmic recommendations.

Other Examples
• Retail: Amazon

• Friends: Facebook

• Professional connection: LinkedIn

• Music: Spotify

• Websites: Reddit

Question Time

• Question 1

Accompanying Questions:
https://wakeforest.qualtrics.com/jfe/form/SV_397533jR7DWEmZE
(Link available on Canvas, right below the link to this video)

Collaborative Filtering

Key Ideas
• Intuition: Low-tech way to get recommendations: ask your friends!
– Some of your friends have better “taste” than others (like-minded)

• Problem: Not scalable


– As more and more options become available, it becomes less practical to
decide what you want by asking a small group of people
– They may not be aware of all the options

• Solution: Collaborative Filtering


– Search a large group of people and find a smaller set with tastes similar to
yours
– Look at the other things they like and combine them to create a ranked list of
suggestions
– First used by David Goldberg (Xerox PARC, 1992): “Using collaborative filtering
to weave an information tapestry.”

Input Data
• Explicit (Questioning)
– Explicit ratings (1-5 numerical ratings)
– Favorites (likes): 1 (liked), 0 (no vote), -1 (disliked)

• Implicit (Behavioral)
– Purchase: 1 (bought), 0 (didn’t buy)
– Clicks: 1 (clicked), 0 (didn’t click)
– Reads: 1 (read), 0 (didn’t read)
– Watching a Video: 1 (watched), 0 (didn’t watch)
– Hybrid: 2 (bought), 1 (browsed), 0 (didn’t buy)

Preference Data: Structure


• Rows: Customers/Users
• Columns: Items
Items (columns): Lady in the Water, Snakes on a Plane, Just My Luck, Superman Returns, You, Me, and Dupree, The Night Listener (a blank cell means the customer has not rated that movie)
– Mike: 2.5, 3.5, 3.0, 3.5, 2.5, 3.0 (rated all six, in the order above)
– Jay: 3.5, 3.5, 3.0
– July: 3.5, 3.0, 4.0, 2.5, 4.5
– Peter: 2.5, 4.0, 2.0, 3.0
– Stephen: 3.0, 4.0, 5.0, 3.0
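In code, this kind of matrix is usually stored sparsely. A minimal Python sketch (the dictionary layout is illustrative, not prescribed by the course; Toby's three ratings are taken from a later slide):

```python
# Sparse user-item ratings: rows = customers, columns = items.
# A movie the customer has not rated is simply absent from their dictionary.
prefs = {
    "Mike": {"Lady in the Water": 2.5, "Snakes on a Plane": 3.5, "Just My Luck": 3.0,
             "Superman Returns": 3.5, "You, Me, and Dupree": 2.5, "The Night Listener": 3.0},
    "Toby": {"Snakes on a Plane": 4.5, "Superman Returns": 4.0, "You, Me, and Dupree": 1.0},
}
```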

Collaborative Filtering Tasks
1. Finding Similar Users: Calculating Similarities

2. Ranking the Users

3. Recommending Items based on weighted preference data

Finding Similar Users


• Calculate pair-wise similarities
– Euclidean Distance: Simple, but subject to rating inflation

– Cosine similarity: better with binary/fractional data

– Pearson correlation: continuous variables (e.g. numerical ratings)

– Others: Jaccard coefficient, Manhattan distance
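Written out (standard definitions; the Euclidean version is shown converted to a 0-1 similarity score, one common convention), for two users x and y compared over the items both have rated:

$$\mathrm{sim}_{\text{Euclid}}(x,y)=\frac{1}{1+\sqrt{\sum_i (x_i-y_i)^2}},\qquad
\cos(x,y)=\frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}},\qquad
r(x,y)=\frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2}\,\sqrt{\sum_i (y_i-\bar{y})^2}}$$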

Cosine similarity

Example: a binary user-item matrix (rows = users, columns = items; 1 = positive signal, 0 = none)

1 0 1 1
0 1 0 1
1 0 0 1
0 1 1 1
0 1 0 1
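For example, taking the first two rows above as two users' item vectors, (1, 0, 1, 1) and (0, 1, 0, 1):

$$\cos = \frac{1\cdot 0 + 0\cdot 1 + 1\cdot 0 + 1\cdot 1}{\sqrt{3}\,\sqrt{2}} = \frac{1}{\sqrt{6}} \approx 0.41,$$

while the two users with identical rows (0 1 0 1) have cosine similarity of exactly 1.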

Ranking the Users


• Focal customer
– Toby: preference vector (“Snakes on a Plane”: 4.5, “Superman Returns”: 4.0, “You, Me, and Dupree”: 1.0)

Pearson Correlation Similarity with Toby

– Mike: 0.99
– Jay: 0.38
– July: 0.89
– Peter: 0.92
– Stephen: 0.66

– Top 3 matches: Mike, Peter, July -> like-minded!
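A minimal Python sketch of this step (the function name and the sparse-dictionary layout are illustrative, not the course's code), computing Pearson similarity over the movies two customers have both rated:

```python
from math import sqrt

def pearson_sim(prefs, u, v):
    """Pearson correlation between users u and v over the items both have rated."""
    shared = [item for item in prefs[u] if item in prefs[v]]
    n = len(shared)
    if n < 2:
        return 0.0  # not enough common items for a correlation
    x = [prefs[u][i] for i in shared]
    y = [prefs[v][i] for i in shared]
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

prefs = {
    "Toby": {"Snakes on a Plane": 4.5, "Superman Returns": 4.0, "You, Me, and Dupree": 1.0},
    "Mike": {"Lady in the Water": 2.5, "Snakes on a Plane": 3.5, "Just My Luck": 3.0,
             "Superman Returns": 3.5, "You, Me, and Dupree": 2.5, "The Night Listener": 3.0},
}
print(round(pearson_sim(prefs, "Toby", "Mike"), 2))  # 0.99, matching the table above
```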

Recommend movies to Toby based on
the user ranking

Items (columns): Lady in the Water, Snakes on a Plane, Just My Luck, Superman Returns, You, Me, and Dupree, The Night Listener (a blank cell means the customer has not rated that movie)
– Mike: 2.5, 3.5, 3.0, 3.5, 2.5, 3.0
– Jay: 3.5, 3.5, 3.0
– July: 3.5, 3.0, 4.0, 2.5, 4.5
– Peter: 3.0, 4.0, 2.0, 3.0
– Stephen: 3.0, 4.0, 5.0, 3.0
– Toby: “Snakes on a Plane” 4.5, “Superman Returns” 4.0, “You, Me, and Dupree” 1.0

Recommending Items – 1/2

• Problems: if we only use the top 1 like-minded customer
– We may accidentally turn up customers who haven't reviewed some of the movies I might like
– We could return a customer who strangely liked a movie that got bad reviews from all other customers

• Solution: score each item with a weighted average of the other customers' ratings, using each customer's similarity as the weight
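Formally (the standard user-based weighted-average formula; the deck's own implementation may differ in details), the predicted rating for Toby on a movie i uses only the customers who actually rated i:

$$\hat{r}_{\text{Toby},i} = \frac{\sum_{u\ \text{rated}\ i} \mathrm{sim}(\text{Toby},u)\; r_{u,i}}{\sum_{u\ \text{rated}\ i} \mathrm{sim}(\text{Toby},u)}$$

The “Normalized Rating” row on the next slide is exactly this quotient.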

Recommending Items – 2/2
Items (columns): Lady in the Water, Snakes on a Plane, Just My Luck, Superman Returns, You, Me, and Dupree, The Night Listener (a blank cell means the customer has not rated that movie)
– Mike (similarity 0.99): 2.5, 3.5, 3.0, 3.5, 2.5, 3.0
– Jay (similarity 0.38): 3.5, 3.5, 3.0
– July (similarity 0.89): 3.5, 3.0, 4.0, 2.5, 4.5
– Peter (similarity 0.92): 3.0, 4.0, 2.0, 3.0
– Stephen (similarity 0.66): 3.0, 4.0, 5.0, 3.0
For each movie Toby has not rated:
– Weighted rating sum (similarity × rating, summed over the customers who rated it): e.g., 5.24 and 10.26
– Sum of weights (similarities of those customers): e.g., 1.91 and 2.80
– Normalized rating (weighted sum ÷ sum of weights): 5.24 / 1.91 ≈ 2.74 and 10.26 / 2.80 ≈ 3.66
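A minimal Python sketch of the scoring step above (function name and the two neighbours below are hypothetical, purely to show the mechanics; Toby's own ratings are from the slide). On the full slide data the same quotient reproduces the Normalized Rating row, e.g. 5.24 / 1.91 ≈ 2.74 and 10.26 / 2.80 ≈ 3.66.

```python
def recommend(prefs, sims, focal):
    """Rank items the focal customer has not rated by similarity-weighted average rating."""
    totals, weights = {}, {}
    for other, sim in sims.items():
        if other == focal or sim <= 0:
            continue
        for item, rating in prefs[other].items():
            if item in prefs[focal]:
                continue  # skip movies the focal customer has already rated
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    return sorted(((i, totals[i] / weights[i]) for i in totals),
                  key=lambda kv: kv[1], reverse=True)

# Toby's ratings are from the slide; the neighbours' ratings and similarities are hypothetical.
prefs = {
    "Toby":      {"Snakes on a Plane": 4.5, "Superman Returns": 4.0, "You, Me, and Dupree": 1.0},
    "NeighborA": {"Snakes on a Plane": 4.0, "Just My Luck": 2.0, "The Night Listener": 3.5},
    "NeighborB": {"Superman Returns": 5.0, "The Night Listener": 3.0},
}
sims = {"NeighborA": 0.9, "NeighborB": 0.4}  # similarities with Toby
print(recommend(prefs, sims, "Toby"))        # ranked list of unseen movies with predicted ratings
```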

Item-based Collaborative Filtering


• The idea: you are likely to like a product that is similar to products you have already bought.

Item-based Collaborative Filtering
Example
Products (columns): Product 1, Product 2, Product 3 (a blank cell means the customer has not rated that product)
– Customer 1: 2, 3
– Customer 2: 5, 2
– Customer 3: 3, 3, 1 (rated all three, in the order above)
– Customer 4: 2, 2
Item-item similarity is computed using only the customers who rated both items (“co-rated” cases), as sketched below.
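A minimal sketch of that idea (illustrative function name; cosine is used here, though other measures work the same way): for each pair of items, only the customers who rated both contribute.

```python
from math import sqrt

def item_similarity(prefs, item_a, item_b):
    """Cosine similarity between two items, using only customers who rated both."""
    co_raters = [u for u in prefs if item_a in prefs[u] and item_b in prefs[u]]
    if not co_raters:
        return 0.0
    a = [prefs[u][item_a] for u in co_raters]
    b = [prefs[u][item_b] for u in co_raters]
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```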

Question Time

• Question 2
• Question 3
• Question 4

Accompanying Questions:
https://wakeforest.qualtrics.com/jfe/form/SV_397533jR7DWEmZE
(Link available on Canvas, right below the link to this video)

Problems with Collaborative Filtering
• When data are sparse, correlations (weights) are based on very few common items -> unreliable

• It cannot handle new items (the cold-start problem)

• It does not incorporate attribute information

• Alternative way: content-based recommendations


– Let’s use attribute information!

Content-based Recommendations
1. Define features and feature values

2. Describe each item as a vector of features

3. Develop a user profile: the types of items this user likes


– A weighted vector of item attributes
– Weights denote the importance of each attribute to the user

4. Recommend items that are similar to those that a user liked in the
past

• Note 1: Similar to information retrieval (text mining)


• Note 2: Pre-computation is possible; more scalable -> used by Amazon (see the sketch below)
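A minimal Python sketch of steps 1-4 above (the genre features, ratings, and movie names are hypothetical, chosen only to show the mechanics): items are described by feature vectors, the user profile is a rating-weighted average of the features of items the user rated, and unseen items are scored by cosine similarity to that profile.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Steps 1-2: features and item vectors (hypothetical genres: action, sci-fi, comedy, drama)
items = {
    "Movie A": [1, 1, 0, 0],
    "Movie B": [1, 0, 0, 1],
    "Movie C": [0, 0, 1, 1],
}

# Step 3: user profile = rating-weighted average of the vectors of movies the user rated
user_ratings = {"Movie A": 5.0, "Movie C": 2.0}   # hypothetical ratings
profile = [0.0, 0.0, 0.0, 0.0]
for movie, rating in user_ratings.items():
    for j, feat in enumerate(items[movie]):
        profile[j] += rating * feat
profile = [p / sum(user_ratings.values()) for p in profile]

# Step 4: recommend the unseen items most similar to the profile
candidates = {m: cosine(profile, vec) for m, vec in items.items() if m not in user_ratings}
print(sorted(candidates.items(), key=lambda kv: kv[1], reverse=True))
```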

Example of Content-based approach
• Movie content
– Genre, actors, director, movie summary, …

Netflix Recommendation Algorithm: A Hybrid Type
• Netflix uses a hybrid recommendation algorithm that
combines collaborative filtering and content-based
recommendations (Matrix Factorization)
• For example, consider a collaborative filtering approach
where we determine that Amy and Carl have similar
preferences.
• We could then do content mining, where we would find that
“Terminator”, which both Amy and Carl liked, is classified in
almost the same set of genres as “Starship Troopers.”
• Recommend “Starship Troopers” to both Amy and Carl, even
though neither of them has seen it before.

Question Time

• Question 5

Accompanying Questions:
https://wakeforest.qualtrics.com/jfe/form/SV_397533jR7DWEmZE
(Link available on Canvas, right below the link to this video)

The Netflix Prize


• From 2006 to 2009, Netflix ran a contest asking the
public to submit algorithms to predict user ratings for
movies.

• Training data set of ~100,000,000 ratings and test data set of ~3,000,000 ratings were provided.

• Offered a grand prize of USD 1,000,000 to the team that could beat Netflix's own algorithm, Cinematch, by more than 10% (measured in RMSE).
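For reference, RMSE (root mean squared error) over n predicted ratings versus the actual ratings is the standard

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{r}_i-r_i)^2},$$

so beating Cinematch by 10% meant producing an RMSE at least 10% lower than Cinematch's on the held-out test ratings.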

Contest Rules
• If the grand prize was not yet reached, progress prizes of USD 50,000 per year would be awarded for the best result so far, as long as it had >1% improvement over the previous year.

• Teams must submit code and a description of the algorithm to be awarded any prizes.

• If any team met the 10% improvement goal, last call would be issued and 30 days would remain for all teams to submit their best algorithm.

Initial Results
• The contest went live on October 2, 2006.

• By October 8, a team submitted an algorithm that beat Cinematch.

• By October 15, there were three teams with algorithms beating Cinematch.

• One of these solutions beat Cinematch by >1%, qualifying for a progress prize.

Progress During the Contest
• By June 2007, over 20,000 teams had registered from
over 150 countries.

• The 2007 progress prize went to Team BellKor, with an 8.43% improvement on Cinematch.

• In the following year, several teams from across the world joined forces.

Competition intensified …
• The 2008 progress prize went to Team BellKor, which contained researchers from the original BellKor team as well as Team BigChaos.

• This was the last progress prize because another 1% improvement would reach the grand prize goal of 10%.

Last Call announced …
• On June 26, 2009, Team BellKor’s Pragmatic Chaos submitted a 10.05% improvement over Cinematch.

The Final 30 Days


• 29 days after last call was announced, on July 25, 2009, Team The Ensemble submitted a 10.09% improvement.

• When Netflix stopped accepting submissions the next day, BellKor’s Pragmatic Chaos had submitted a 10.09% improvement solution and The Ensemble had submitted a 10.10% improvement solution.

• Netflix would now test the algorithms on a private test set and announce the winner.

Winner is declared!
• On September 18, 2009, a winning team was
announced

• BellKor’s Pragmatic
Chaos won the
competition and the
USD1,000,000 grand
prize.

More ideas for improvement


• Ensemble methods (combining algorithms)
– Most advanced/commercial algorithms combine kNN, matrix factorization (handling large/sparse matrices), and other classifiers

• Marginal propensity to buy with/without recommendation (instead of probability of buying)
– Anand Bodapati (JMR 2008): a “customers who buy this product buy these other products” kind of recommendation system frequently recommends what customers would have bought anyway, so it often creates only purchase acceleration rather than expanding sales

• Incorporate text reviews: text review data can be used as a basis to calculate similarities (i.e., text mining), as sketched below
– Basic methods rely only on numerical ratings/purchase data
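One common way to do this (a sketch using scikit-learn; the review snippets are hypothetical) is to represent each customer's concatenated review text as a TF-IDF vector and compute cosine similarities between customers:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical review text, one string per customer
reviews = [
    "great acting, loved the suspense and the plot twists",
    "too slow, weak plot, but the acting was decent",
    "hilarious comedy, light and fun for the whole family",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(reviews)      # customers x terms matrix
sim = cosine_similarity(X)            # customer-by-customer similarity matrix
print(sim.round(2))
```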

