
Assignment 1: Artwork Personalisation at Netflix

Course: Big Data Analytics


Instructor: Professor Srikumar Krishnamoorthy
Academic Associate: Ms. Simran Ketha
Submitted by: Dhavala V S Aditya (19422)

Introduction

Netflix is an American on-demand OTT content streaming platform. It was founded by Reed Hastings and Marc Randolph as a video rental company, but over the years it has developed a reputation as a force to reckon with in the technology world (Netflix, 2020). Today, Netflix is present worldwide and consumes 15% of the world's internet bandwidth (Morris, 2018), which is an indicator of the sheer wealth of data that Netflix collects. The company is particularly well known for its recommender systems.

During its initial years, Netflix used the "Cinematch" recommendation algorithm to recommend DVDs to customers (Xavier Amatriain, Netflix Recommendations: Beyond the 5 stars (Part 1), 2012). In 2006, the company introduced the "Netflix Prize", a $1 million award for any team that could reduce the RMSE of its rating predictor by 10%. The Grand Prize was eventually won by a team that clocked thousands of hours of work and blended hundreds of predictive algorithms.

In 2007, Netflix launched its online streaming service. By then, Netflix had realized the value that improved recommendations and personalization provided, and the company renewed its focus on them. Streaming removed the constraint of DVD utilization and changed the type and amount of data that was now available.

The contest not only gave Netflix a vastly improved algorithm but also created an attractive brand for Netflix among computer scientists, which helped the company attract star talent (Xavier Amatriain, Netflix Recommendations: Beyond the 5 stars (Part 1), 2012). Since then, Netflix has not looked back and has embraced personalization to the most fundamental extent possible.

Everything that Netflix shows is some form of personalization, ranging from the titles themselves to the number of rows and columns on the homepage. Netflix even predicts which artwork a customer would prefer for the recommended titles. This report focuses on how Netflix personalizes that artwork.

Netflix title recommender algorithms

Evolution of Netflix title recommender algorithms

Netflix's recommender algorithm for titles has evolved swiftly since 2006. Chronicling this history tells us a lot about the progress Netflix has made in its algorithms and sets the context for the artwork recommender.

The first recommender algorithm that Netflix used was Cinematch. It was much simpler than the algorithms used today and was based on collaborative filtering with nearest-neighbour selection. (Jebara, 2018)

The primary approach that emerged from the 2006 Netflix Prize was the linear factorization model. In linear factorization, the user-movie matrix is equated to the product of two skinny matrices. If the values in the two skinny matrices can be calculated approximately, their product can be used to fill in the missing entries. (Jebara, 2018)

R = UM

Figure 1: Representation of Linear Matrix Factorisation (Jebara, 2018)
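
To make the idea concrete, below is a minimal NumPy sketch of such a factorization fitted by gradient descent; the function name, hyperparameters and masking scheme are illustrative assumptions, not Netflix's implementation.

```python
import numpy as np

def factorize(R, observed, k=20, steps=500, lr=0.01, reg=0.1, seed=0):
    """Approximate the user-movie matrix R as U @ M (illustrative sketch).

    `observed` is a 0/1 mask of known ratings; the learned product U @ M
    can then be used to fill in the missing entries of R.
    """
    rng = np.random.default_rng(seed)
    n_users, n_movies = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))    # skinny user-factor matrix
    M = 0.1 * rng.standard_normal((k, n_movies))   # skinny movie-factor matrix
    for _ in range(steps):
        err = observed * (R - U @ M)               # error on observed cells only
        U += lr * (err @ M.T - reg * U)            # gradient step for U
        M += lr * (U.T @ err - reg * M)            # gradient step for M
    return U, M

# Missing entries are then predicted as (U @ M)[user, movie].
```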

Subsequently, non-linear factorization models replaced the linear factorization model. In this case, the view history was used as an input to neural networks. The neural networks reduced or "encoded" the dimensionality of the original view history into a smaller "code", analogous to the skinny matrix in the linear case. This "code" could then be grown back into a predicted view history using a "decoder". The error between the reconstructed view history and the actual view history could be used to train the neural network. (Jebara, 2018)

Figure 2: Representation of non-linear factorization (Jebara, 2018)
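
A minimal autoencoder of this kind could look as follows. This is an illustrative PyTorch sketch; the layer sizes, class name and code dimension are assumptions rather than the production architecture.

```python
import torch
from torch import nn

class ViewHistoryAutoencoder(nn.Module):
    """Encode a user's (binary) view-history vector into a small code and
    decode it back into a predicted view history."""

    def __init__(self, n_titles, code_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_titles, 256), nn.ReLU(),
            nn.Linear(256, code_dim))            # the compact "code"
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, n_titles))            # grown back to full size

    def forward(self, history):
        return self.decoder(self.encoder(history))

# Training minimizes the reconstruction error between the decoder output and
# the actual view history, e.g. with nn.MSELoss() and an Adam optimizer.
```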

The next step was to make the reconstructed view history probabilistic by introducing a Gaussian likelihood. This improved the model because it captured the uncertainty in the predicted view history, which makes sense because there is genuine uncertainty in a user's viewing pattern. The Gaussian was later replaced by a Multinomial because the Gaussian gave negative preferences for some movies, which is impossible. (Jebara, 2018)

We see that over the years, Netflix has evolved from simple collaborative filtering models to the complex neural networks that power most Netflix recommendations today. These models are continually improved by testing new hypotheses.
Ranking Methodology

Netflix makes use of a two-tiered ranking system. Ranking happens within each row, with the best recommendations appearing towards the left, and across rows, with the best rows appearing towards the top. (Xavier Amatriain, 2012)

Figure 3: Ranking Scheme (Xavier Amatriain, 2012)

Each row has a particular theme attached to it and is ranked differently. Some of the standard in-row ranking algorithms (Xavier Amatriain, 2012) are:

i. Personalized Video Ranking (PVR): It filters down to movies and shows that fulfil particular criteria (e.g. US TV Shows), ranked using user features and popularity. Very diverse rows can come from this ranker; e.g. 1980s Time Travel Movies is also a plausible row candidate.
ii. Top-N Video Ranker: Instead of looking at particular criteria, it takes the entire catalogue into account.
iii. Trending Now: It captures temporal trends and presents them to the user. These temporal trends generally result in seasonal predictions, e.g. romantic movies during Valentine's season.
iv. Video-Video Similarity: Resembling an item-item collaborative filtering mechanism, it ranks movies based on their similarity to a particular movie or show the user watched recently.
v. Continue Watching: This row presents titles that the user left unfinished. Based on how much the user watched and the context, the algorithm ranks the titles the user stopped watching by the probability that the user would continue watching them.

After the algorithm generates candidate rows, the rows compete for space on the Netflix home page based on the device the customer uses. Both stability and diversity are taken into consideration when the rows are ranked. For example, if a user is used to picking movies from the "Continue Watching" row, it should appear at that position frequently. However, the user may not care about stability for rows like "US TV Shows", which opens up scope for injecting diversity.
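
The sketch below illustrates this two-tier idea: rank titles inside each row, then let rows compete for page positions with a stability bonus and a simple one-row-per-theme diversity rule. The scoring functions and constants are invented for illustration and are not Netflix's actual logic.

```python
STABILITY_BONUS = 0.2   # assumed constant, purely illustrative

def build_homepage(candidate_rows, habitual_rows, n_rows=10):
    """Two-tier ranking sketch: in-row ranking, then row-level competition."""
    # Tier 1: rank titles within each row (best towards the left).
    for row in candidate_rows:
        row["titles"].sort(key=lambda t: t["score"], reverse=True)

    # Tier 2: rank rows (best towards the top), rewarding stability for
    # rows the user habitually picks from.
    def row_score(row):
        top = row["titles"][:5]
        base = sum(t["score"] for t in top) / max(len(top), 1)
        return base + (STABILITY_BONUS if row["name"] in habitual_rows else 0.0)

    page, seen_themes = [], set()
    for row in sorted(candidate_rows, key=row_score, reverse=True):
        if row["theme"] in seen_themes:      # crude diversity: one row per theme
            continue
        seen_themes.add(row["theme"])
        page.append(row)
        if len(page) == n_rows:
            break
    return page
```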

Artwork recommendations

The Problem

Consumer research studies conducted in 2014 revealed that artwork was the most significant influence on a customer's decision to engage with a title and accounted for more than 80% of their focus while browsing Netflix. Since the consumer spends less than 90 seconds on the homepage, it was imperative to get the artwork right for them to engage with the titles. (Nelson, 2016)

Solution evolution

Initially, the company took a non-personalized approach (Krishnan, 2016). It tested the take rates of various artworks for a title using A/B tests and chose the artwork with the best take rate to be shown to all customers. However, every A/B test has a "regret" associated with it, which is essentially the worse experience given to a fraction of the users for some time.

Figure 4: Regret in A/B Testing (Krishnan, 2016)

Netflix eventually decided that it should personalize the artwork experience for each user. However, personalization could not work with the traditional A/B test approach, because it would take several sessions to conduct enough A/B tests across multiple pieces of artwork to find the most acceptable one, i.e. personalization drastically increased the regret (Krishnan, 2016).

Key challenges with personalization

Firstly, while a recommender system can show multiple options to the user at the same time, only a single piece of artwork can be shown for a title. This made it difficult to tell whether a user engaged with a title regardless of the artwork or was influenced by it. It also meant that artwork personalization needed to work in conjunction with existing recommender systems, and it is the same reason for the increased regret (Ashok Chandrashekar, 2017).

Secondly, artwork personalization only works when there is a sufficiently large pool of engaging and diverse artwork to choose from, one that satisfies the palates of a varied audience. Currently, Netflix has a creatives team to create this diverse set of images. (Ashok Chandrashekar, 2017)

Finally, the company faced engineering challenges: additional impression data needed to be logged at a peak requirement of 20 million requests per second, and the system had to work fast enough to deliver the right artwork whenever a session began. Since the Netflix interface is highly visual, any time an artwork is not shown due to high latency, the user experience degrades drastically. (Ashok Chandrashekar, 2017)

Solution Elements

What are contextual bandits?

Contextual bandits are a class of online reinforcement learning algorithms. The machine learning is not done as a batch but is interleaved with the data collection process. Contextual bandits are designed to minimize regret. (Surmenok, 2017)

As an illustrative analogue, imagine that a man is in a casino with several slot machines that have different probabilities of reward, and he wants to choose the best one. The man would need to trade off exploration (trying different slot machines) against exploitation (repeatedly pulling the best lever). This is the multi-armed bandit problem. If we also receive additional information, or "context", from the environment about each slot machine (e.g. one slot machine is big, lights are blinking on another), the problem changes to a contextual bandit problem (Surmenok, 2017).

This is different from supervised learning algorithms, where the feedback received is the correct label. In the case of contextual bandits, the algorithm only gets a reward based on whether its action was good or not; it does not receive feedback about what the correct answer was.

Supervised Learning            Contextual Bandit
Input: Features                Input: Context
Output: Predicted Label        Output: Action
Feedback: Actual Label         Feedback: Reward

Figure 5: Supervised Learning vs Contextual Bandits (Jebara, 2018)

In the case of Netflix, the slot machine is analogous to the artwork; the reward is analogous to user engagement with the title; and the additional information about each user (view history, location, title information) is analogous to the context.

Training the Model

Training the model consists of five steps.

Figure 6: Model Training Workflow (Jebara, 2018)

The first step is sampling, i.e. randomization. The training data for the model is obtained by administering controlled randomization over the learned model's predictions. Different randomization schemes can be used in conjunction with contextual bandits, e.g. UCB, epsilon-greedy and Thompson sampling. Netflix uses the Thompson sampling scheme. (Jebara, 2018)
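
For concreteness, below is a minimal linear Thompson sampling sketch in which each candidate image keeps a Bayesian linear model of engagement given a context vector. The class, prior and noise level are assumptions for illustration, not Netflix's implementation.

```python
import numpy as np

class LinearThompsonBandit:
    """Per-image Thompson sampling with a Bayesian linear reward model."""

    def __init__(self, n_images, dim, noise=0.25, prior_precision=1.0):
        self.A = [prior_precision * np.eye(dim) for _ in range(n_images)]  # precision matrices
        self.b = [np.zeros(dim) for _ in range(n_images)]                  # reward-weighted contexts
        self.noise = noise
        self.rng = np.random.default_rng(0)

    def choose(self, context):
        """Sample a weight vector per image from its posterior and show the best."""
        scores = []
        for A, b in zip(self.A, self.b):
            cov = self.noise ** 2 * np.linalg.inv(A)
            mean = np.linalg.solve(A, b)
            w = self.rng.multivariate_normal(mean, cov)   # posterior draw
            scores.append(float(context @ w))
        return int(np.argmax(scores))

    def update(self, image, context, reward):
        """Fold the observed engagement (0/1) back into that image's posterior."""
        self.A[image] += np.outer(context, context)
        self.b[image] += reward * context
```

Because each choice is a draw from the posterior rather than the argmax of the posterior mean, images the model is still uncertain about keep receiving some exposure, which is exactly the controlled randomization described above.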

The second step is observation. The dataset contains data in the form of [user, title, image] tuples, which carry information such as the user's history, location and the title metadata (Ashok Chandrashekar, 2017). Each tuple also stores the user engagement label. Each title typically has more than a dozen candidate images. The context is observed, and user image preferences are learned across titles, because for every image candidate there are some people who engaged with it and some who did not. After observing the context, these preferences are modelled to predict the probability of engagement for each [user, title, image] tuple. For context, the algorithm works in conjunction with the title recommender and uses the user's viewing history, the specific title, its metadata, the genres the user has been playing, the country and the language.

The third step is action. The highest-ranking image candidate, based on the calculated probabilities, is shown to the user.

The fourth step is the reward. The good outcome, for which the algorithm is rewarded, is when the user watches and enjoys the content. The bad outcome, for which no reward is given, is when the user either does not click on the title, or clicks but does not enjoy the content. The second condition exists to avoid training the algorithm to promote clickbait. The weighted reward that is given is the overall take rate across all the images.

The fifth step is update. Based on the user engagement, the model updates its estimate for the [user, title, image] tuple. The sampling changes depending on the reward, and the model eventually learns how to maximize the reward. (Ashok Chandrashekar, 2017)

Since the reward is the overall take rate (Jebara, 2018), the model effectively personalizes towards the most impactful images for each user-title combination.
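
Wired together, the five steps form a simple loop. The toy simulation below uses a non-contextual epsilon-greedy stand-in purely to show the shape of that loop; the take probabilities and constants are made up, and the production system uses contextual Thompson sampling as described above.

```python
import random

def run_training_loop(n_images=5, n_rounds=10_000, epsilon=0.1, seed=7):
    """Toy loop mirroring the five steps: sample, observe, act, reward, update."""
    rng = random.Random(seed)
    true_take_prob = [rng.uniform(0.05, 0.30) for _ in range(n_images)]  # hidden ground truth
    shows = [0] * n_images
    takes = [0] * n_images

    for _ in range(n_rounds):
        # Step 1 - sampling: controlled randomization between exploring and exploiting.
        if rng.random() < epsilon:
            image = rng.randrange(n_images)
        else:
            image = max(range(n_images),
                        key=lambda i: takes[i] / shows[i] if shows[i] else 0.0)
        # Steps 2 and 3 - observation and action: the real system observes the
        # user/title context here; this toy version simply displays the chosen image.
        shows[image] += 1
        # Step 4 - reward: 1 only if the (simulated) user watches and enjoys the title.
        reward = 1 if rng.random() < true_take_prob[image] else 0
        # Step 5 - update: fold the reward back into the per-image take-rate estimate.
        takes[image] += reward

    return [takes[i] / shows[i] if shows[i] else 0.0 for i in range(n_images)]
```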

Performance Evaluation & Outcome

Netflix developed several contextual bandit algorithms with varying parameters. Before deployment, they had to test offline whether the new algorithms improved the take rates (Ashok Chandrashekar, 2017).

For this, they used a technique called "replay", which allowed them to answer counterfactual questions on logged exploration data. They calculated the take rates only for those cases where the artwork the user actually saw was the one the current algorithm would have chosen.

Figure 7: Replay calculation (Jebara, 2018)

In the above example, the previously logged actions shown to the user are compared with the model's assignments. The take fraction is calculated only over the cases where the logged action and the model assignment match. In this particular example, the match occurred in only three cases, and in one of those the user did not engage. This gives a take rate of 2/3 (Jebara, 2018).
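
The replay estimate described above can be sketched as follows; the record format and the tiny log used in the example are assumptions that merely reproduce the 2/3 arithmetic.

```python
def replay_take_rate(logged_records, policy):
    """Offline 'replay' estimate of the take rate.

    `logged_records` are (context, shown_image, engaged) tuples collected under
    the randomized logging policy; `policy(context)` returns the image the new
    model would have shown. Only matching records count towards the estimate.
    """
    matches, takes = 0, 0
    for context, shown_image, engaged in logged_records:
        if policy(context) == shown_image:   # model assignment matches the logged action
            matches += 1
            takes += int(engaged)
    return takes / matches if matches else float("nan")

# Made-up log: five impressions, three of which match the model's assignment,
# and two of those three led to engagement -> take rate 2/3.
log = [("u1", 0, True), ("u2", 1, False), ("u3", 0, True),
       ("u4", 2, False), ("u5", 1, False)]
model_choice = {"u1": 0, "u2": 2, "u3": 0, "u4": 0, "u5": 1}
print(replay_take_rate(log, policy=lambda ctx: model_choice[ctx]))  # 0.666...
```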

The take rates predicted using replay increased significantly. After shortlisting a few contextual bandit algorithms, Netflix ran online A/B tests (Steve Urban, 2016) with the new contextual bandit algorithm, and sure enough, there was a significant increase in the take rate, as shown below.

Figure 8: Result (Ashok Chandrashekar, 2017)

Solving the Engineering problem

As discussed previously, the image lookup for the UI needs to be swift, or it will result in a poor user experience. The company could use one of two strategies for this: live compute or online pre-compute (Jebara, 2018).

Live compute is synchronous computation in response to a user request. Online pre-compute means that the computation and storage of the image assignment take place in a cache before a request even arrives from the user.

Live compute has access to the freshest data about the customer and knows the entire context. It also does not compute results for every user beforehand, which means it computes only what is necessary. However, live compute needs high availability and must respond to user requests quickly, which limits its ability to run complex algorithms (Jebara, 2018).

Online pre-compute can handle large amounts of data and more complex algorithms. It can also average out the computational cost across users. However, it has a lag period, does not have the entire context of the user, and does not provide the freshest recommendations (Jebara, 2018).

For the artwork recommender system, Netflix uses online pre-compute, because live compute imposes high service-level requirements, and failing to meet them would degrade the user experience.

Netflix has also built in redundancy with what it calls graceful degradation. This means that if the contextual bandit fails due to unavailability, the system falls back to a non-personalized image. If that also fails, the default artwork is picked up and shown to the user (Jebara, 2018).
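
The fallback chain can be sketched as below; the cache and dictionary names are illustrative placeholders, not Netflix's actual components.

```python
def pick_artwork(user_id, title_id, personalized_cache, popular_image, default_image):
    """Graceful degradation: personalized -> non-personalized -> default artwork."""
    image = personalized_cache.get((user_id, title_id))   # online pre-computed assignment
    if image is None:
        image = popular_image.get(title_id)               # non-personalized fallback
    return image if image is not None else default_image  # last-resort default artwork
```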

Business Outcome

Even though Netflix is a subscription service, one of its early insights was that customers want to get their money's worth out of Netflix. Hence, it is reasonably well established that Netflix watch time is highly correlated with repeat subscriptions and revenue.

Does increasing the take rate lead to higher watch time? Is it not possible that the user was just replacing one title with another? An early study, conducted when Netflix was still using non-contextual bandits, did conclude that increasing the take rate led to an increase in watch time. Hence, Netflix achieved a significant business outcome.

The benefit to the Customer

One of the most crucial features of a recommender system is explainability. Just putting a recommendation in front of a customer is not enough; one must be able to answer "Why would the customer like it?" For simpler algorithms, the answer to this question was easy. For a user-based collaborative filtering algorithm, the answer was "Because other users similar to you have liked it." However, as Netflix's recommender systems became more complex, the algorithms became more of a black box, and their explainability reduced. Today, even Netflix employees cannot explain why the system gave a specific recommendation (Nelson, 2016).

Netflix realized that the best way to "explain" would be through artwork. Imagery is a powerful thing and a way to intuitively communicate to the customer that this would be something they like.

"A picture is worth a thousand words."

In the case of Netflix, a good picture is the difference between a customer engaging with a recommended title and the customer dropping off.

Customers who love watching John Travolta movies would look at a Pulp Fiction image featuring him and figure out that the reason they would want to watch Pulp Fiction is that it has John Travolta starring in it, while for Uma Thurman fans the image conveys that Uma Thurman is starring in it (Ashok Chandrashekar, 2017).

Figure 9: Example of Outcome (Ashok Chandrashekar, 2017)

A similar comparison can be made genre-wise. People who watch romantic movies and people who watch comedy movies get different images for the same title, according to their preferences (Ashok Chandrashekar, 2017).

Figure 10: Example of Outcome (Ashok Chandrashekar, 2017)

Suggested Improvements

Automated artwork creation

Diversity in the artwork is essential for personalization. If only two or three pieces of artwork are available for a title, the value added by the contextual bandit algorithm is minimal.

However, the artwork is currently created manually by the creatives team at Netflix. Generating new images this way is expensive and time-consuming. Considering Netflix has 10,000+ titles, this is not scalable as Netflix keeps increasing its catalogue.

Instead, Netflix could create a model to scan the footage of its videos and find good artwork. This could be trained using the artwork data they currently have: identifying features that contribute to the success of the artwork, finding scenes where such features are expressed, and editing them to create the required images. The methodology for identifying features is described well in
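
As a rough, hypothetical illustration of the scanning step, the sketch below samples frames from a title's footage with OpenCV and keeps the highest-scoring ones; score_frame stands in for a model trained on the existing artwork data, and every name and parameter here is an assumption.

```python
import cv2  # one possible way to read video frames

def candidate_artwork(video_path, score_frame, every_n_seconds=10, top_k=12):
    """Scan footage and keep the highest-scoring frames as artwork candidates."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 24          # fall back if FPS is unavailable
    step = int(fps * every_n_seconds)
    scored, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            scored.append((score_frame(frame), index, frame))  # score a sampled frame
        index += 1
    cap.release()
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]                          # best frames become candidates
```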

Metadata & Preview personalization

In addition to artwork, the personal experience of the author suggests that another critical decision criterion is the video preview that plays when a title is hovered over. A similar approach could be taken for preview personalization and for the other information presented to the customer; e.g. the synopsis presented to the user could be chosen from a set of pre-written summaries, and the preview clip could be selected by scanning the video footage.

Avoiding clickbait
Currently, Netflix does not punish clickbait, i.e. when a customer clicks on a title because of the artwork but does not like the content. It treats this as equivalent to the user not clicking on the title at all. While this already mitigates clickbait to an extent, an improvement could be to award negative rewards for clickbait. This would discourage clickbait images faster and, in addition, could be used to identify which artwork is highly likely to be clickbait.
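
A sketch of this reward shaping is given below; the threshold and penalty values are arbitrary illustrations of the suggestion, not an existing Netflix scheme.

```python
def artwork_reward(clicked, watch_fraction, enjoy_threshold=0.7, clickbait_penalty=-1.0):
    """Reward shaping that penalizes clickbait instead of treating it as a non-click."""
    if not clicked:
        return 0.0                   # no engagement: neutral, as in the current scheme
    if watch_fraction >= enjoy_threshold:
        return 1.0                   # clicked and watched enough: positive reward
    return clickbait_penalty         # clicked but abandoned: likely clickbait, punished
```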

References
1. Ashok Chandrashekar, F. A. (2017, December 7). Artwork Personalization
at Netflix. Retrieved from Medium: The Netflix Technology Blog:
https://netflixtechblog.com/artwork-personalization-c589f074ad76
2. Jebara, T. (2018, November 7). Artwork Personalization at Netflix. New
York, USA.

3. Krishnan, G. (2016, May 3). Selecting the best artwork for videos through A/B
testing. Retrieved from Medium: The Netflix Technology Blog:
https://netflixtechblog.com/selecting-the-best-artwork-for-videos-
through-a-b-testing-f6155c4595f6

4. Morris, C. (2018, October 2). Netflix Consumes 15% of the World’s Internet Bandwidth. Retrieved from Fortune: https://fortune.com/2018/10/02/netflix-consumes-15-percent-of-global-internet-bandwidth/

5. Nelson, N. (2016, May 3). The Power Of A Picture. Retrieved from Netflix
Media Centre: https://media.netflix.com/en/company-blog/the-
power-of-a-picture

6. Netflix. (2020). Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Netflix

7. Steve Urban, R. S. (2016, April 29). It’s All A/Bout Testing: The Netflix
Experimentation Platform. Retrieved from Medium: The Netflix
Technology Blog: https://netflixtechblog.com/its-all-a-bout-testing-
the-netflix-experimentation-platform-4e1ca458c15

8. Surmenok, P. (2017, August 27). Contextual Bandits and Reinforcement Learning. Retrieved from Medium: Towards Data Science: https://towardsdatascience.com/contextual-bandits-and-reinforcement-learning-6bdfeaece72a

9. Xavier Amatriain, J. B. (2012, April 6). Netflix Recommendations: Beyond the 5 stars (Part 1). Retrieved from Medium: The Netflix Technology Blog: https://netflixtechblog.com/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429

10. Xavier Amatriain, J. B. (2012, Jun 20). Netflix Recommendations: Beyond the
5 stars (Part 2). Retrieved from Medium: The Netflix Technology Blog:
https://netflixtechblog.com/netflix-recommendations-beyond-the-5-
stars-part-2-d9b96aa399f5
