Personalisation at Netflix
Solution Elements
What are contextual bandits?
Contextual bandits are a class of online reinforcement learning algorithms: the machine learning is not done as a batch but is interleaved with the data-collection process. Contextual bandits are designed to minimize regret (Surmenok, 2017).

As an illustrative analogue, imagine a man in a casino with several slot machines that have different probabilities of reward, who wants to choose the best one. He would need to trade off exploration (trying different slot machines) against exploitation (repeatedly pulling the best lever). This is the multi-armed bandit problem. If we also receive additional information from the environment, or "context", about each slot machine (e.g. one slot machine is big, lights are blinking on another), the problem changes to a contextual bandit problem (Surmenok, 2017).

Figure 6: Model Training Workflow (Jebara, 2018)

The model training workflow (Figure 6) involves a sequence of steps. The first step is sampling, i.e. randomization. The training data for the model is obtained by administering controlled randomization in the learned model's predictions. Different randomization schemes can be used in conjunction with contextual bandits, e.g. UCB, epsilon-greedy and Thompson sampling; Netflix uses the Thompson sampling scheme (Jebara, 2018).

The second step is observation. The dataset consists of [user, title, image] tuples, which carry information about the user's history, location, title metadata, etc. (Ashok Chandrashekar, 2017). Each tuple also stores the user engagement label, and each title typically has more than a dozen candidate images. The context is observed, and user image preferences are learned across titles because, for each image candidate, some people engaged with it and some did not. After observing the context, these preferences are modelled to predict the probability of engagement for each [user, title, image] tuple. For context, the algorithm works in conjunction with the title recommender and uses the user's viewing history, the specific title, its metadata, the genres the user has been playing, the country and the language.

The third step is action: the highest-ranking image candidate, based on the calculated probabilities, is shown to the user.

The fourth step is the reward. Since the reward is the overall take rate (Jebara, 2018), the system effectively personalizes the most impactful images for each user-title combination.

The fifth step is update. Based on the user's engagement, the model updates the [user, title, image] tuple. The sampling will change depending upon the reward, and the model eventually learns how to maximize it (Ashok Chandrashekar, 2017).

Performance Evaluation & Outcome

Netflix developed several contextual bandit algorithms with varying parameters. Before deployment, the company had to test offline whether their performance improved the take rates (Ashok Chandrashekar, 2017).

For this, they used a technique called "replay", which allowed them to answer counterfactual questions on logged exploration data. They calculated take rates only for the cases in which the user was actually shown the artwork that the current algorithm would have chosen.

Figure 7: Replay calculation (Jebara, 2018)

In the example above, the previously logged actions of the user are compared with the model's assignments, and the take fraction is calculated only over the cases where the logged action and the model assignment match. Here the match occurred in only three cases, and in one of those the user did not engage, which gives a take rate of 2/3 (Jebara, 2018).

The take rates predicted using replay increased significantly. After shortlisting a few contextual bandit algorithms, Netflix ran online A/B tests (Steve Urban, 2016) with the new contextual bandit algorithm, and sure enough, there was a significant increase in the take rate.

Solving the Engineering problem

As discussed previously, the image lookup for the UI needs to be swift, or it will result in a poor user experience. The company could use one of two strategies for this: live compute or online pre-compute (Jebara, 2018). Live compute is synchronous computation of a response to a user request. Online pre-compute means that the image assignment is computed and stored in a cache before a request even arrives from a user.

Live compute has access to the freshest data about the customer and knows the entire context. It also avoids computing data for every user beforehand, so it computes only what is necessary. However, live compute needs high availability and must respond to user requests quickly, which limits its ability to run complex algorithms (Jebara, 2018).

Online pre-compute can handle large amounts of data and more complex algorithms, and it can average out the computational cost across users. However, it has a lag period, does not have the entire context of the user, and does not provide the freshest recommendations (Jebara, 2018).

For its artwork recommender system, Netflix uses online pre-compute, because live compute has high service-level requirements, in the absence of which UX degradation will happen.
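The replay evaluation described in the Performance Evaluation section can be sketched as follows. The logged records and field names here are hypothetical, but the rule is the one from the text: the take rate is computed only over impressions where the logged image matches the one the new model would assign, and the toy log reproduces the three-match, two-engagement example.

```python
# Replay evaluation on logged exploration data (illustrative sketch).
# Records and names are hypothetical; the matching rule follows the text:
# only impressions where the logged image equals the model's choice count.

def replay_take_rate(logged, model_assignment):
    """logged: list of (user, title, shown_image, engaged) records.
    model_assignment: dict mapping (user, title) -> image the new model picks."""
    matches = [r for r in logged if model_assignment.get((r[0], r[1])) == r[2]]
    if not matches:
        return None  # no overlap: replay cannot estimate a take rate
    return sum(1 for r in matches if r[3]) / len(matches)

# Toy log reproducing the example: the model's choice matches the logged
# image in three cases, and the user engaged in two of those three.
logged = [
    ("u1", "titleA", "img2", True),
    ("u2", "titleA", "img1", True),   # no match: model would pick img3
    ("u3", "titleA", "img1", True),
    ("u4", "titleA", "img1", False),
    ("u5", "titleA", "img3", False),  # no match: model would pick img1
]
model_assignment = {
    ("u1", "titleA"): "img2",
    ("u2", "titleA"): "img3",
    ("u3", "titleA"): "img1",
    ("u4", "titleA"): "img1",
    ("u5", "titleA"): "img1",
}

print(replay_take_rate(logged, model_assignment))  # 2 takes / 3 matches ≈ 0.667
```

Because replay discards unmatched impressions, the estimate gets noisier the more the new model disagrees with the logged policy, which is one reason the offline result is confirmed with online A/B tests.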
Netflix has also built in redundancy through what it calls graceful degradation: if the contextual bandit fails due to unavailability, the system falls back to a non-personalized image, and if that also fails, the default artwork is picked and shown to the user (Jebara, 2018).

Business Outcome

Even though Netflix is a subscription service, one of its early insights was that customers wanted to get their money's worth out of Netflix. Hence, it is reasonably well established that Netflix watch time is highly correlated with repeat subscription and revenues.

Does increasing the take rate lead to higher watch time? Is it not possible that the user was just replacing one title with another? An early study, from when Netflix was still using non-contextual bandits, did conclude that increasing the take rate led to an increase in watch time. Hence, Netflix achieved a significant business outcome.

For example, for Uma Thurman fans the chosen image will convey that Uma Thurman is starring in the title (Ashok Chandrashekar, 2017).

Figure 9: Example of Outcome (Ashok Chandrashekar, 2017)

A similar comparison can be made genre-wise: people who watch romantic movies and people who watch comedy movies get different images for the same title, according to their preferences (Ashok Chandrashekar, 2017).
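The graceful-degradation fallback described earlier can be sketched as a chain of lookups. The cache layout and names below are hypothetical illustrations of the pattern, not Netflix's implementation: try the pre-computed personalized assignment first, then a non-personalized image, then the default artwork.

```python
# Graceful-degradation lookup sketch (hypothetical names, not Netflix's API).
# Order follows the text: personalized pre-computed image ->
# non-personalized image -> default artwork.

DEFAULT_ARTWORK = "default.jpg"

# Online pre-compute: assignments are computed ahead of time and cached.
personalized_cache = {("u1", "titleA"): "u1_titleA_personalized.jpg"}
# Non-personalized fallback, e.g. the single best image per title overall.
unpersonalized_images = {"titleA": "titleA_best_overall.jpg"}

def artwork_for(user, title):
    # 1) Try the pre-computed personalized assignment.
    image = personalized_cache.get((user, title))
    if image is not None:
        return image
    # 2) Fall back to the non-personalized image for the title.
    image = unpersonalized_images.get(title)
    if image is not None:
        return image
    # 3) Last resort: the default artwork.
    return DEFAULT_ARTWORK

print(artwork_for("u1", "titleA"))  # personalized cache hit
print(artwork_for("u2", "titleA"))  # non-personalized fallback
print(artwork_for("u2", "titleB"))  # default artwork
```

Each fallback level is strictly cheaper and more available than the one before it, which is what keeps the image lookup fast even when the bandit service is down.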
Avoiding clickbait

Currently, Netflix does not punish clickbait, i.e. the case where a customer clicks on a title because of the artwork but does not like the title. Such a click is treated the same as when a user does not click on the title at all. While this already mitigates clickbait to an extent, an improvement could be to assign a negative reward to clickbait clicks. This would discourage clickbait images faster and could, in addition, be used to identify which artwork is highly likely to be clickbait.
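The suggested improvement can be sketched as a reward function. The numeric values below are arbitrary illustrations; the text only proposes that a click without real engagement should score worse than a non-click, rather than the same.

```python
# Reward assignment sketch for the proposed clickbait penalty.
# The numeric values are arbitrary; the proposal is only that a click
# without engagement scores below a plain non-click.

def reward(clicked, engaged):
    """clicked: the user clicked the title because of the artwork.
    engaged: the user then actually watched it (a "take")."""
    if clicked and engaged:
        return 1.0    # genuine take: positive reward
    if clicked and not engaged:
        return -0.5   # clickbait: penalized below a non-click
    return 0.0        # no click: neutral, as in the current scheme

print(reward(True, True), reward(True, False), reward(False, False))
```

Averaging this reward per image would also surface the proposed diagnostic: artwork whose mean reward is strongly negative is likely clickbait.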
References

1. Ashok Chandrashekar, F. A. (2017, December 7). Artwork Personalization at Netflix. Retrieved from Medium: The Netflix Technology Blog: https://netflixtechblog.com/artwork-personalization-c589f074ad76

2. Jebara, T. (2018, November 7). Artwork Personalization at Netflix. New York, USA.

3. Krishnan, G. (2016, May 3). Selecting the best artwork for videos through A/B testing. Retrieved from Medium: The Netflix Technology Blog: https://netflixtechblog.com/selecting-the-best-artwork-for-videos-through-a-b-testing-f6155c4595f6

4. Morris, C. (2018, October 2). Netflix Consumes 15% of the World's Internet Bandwidth. Retrieved from Fortune: https://fortune.com/2018/10/02/netflix-consumes-15-percent-of-global-internet-bandwidth/#:~:text=The%20streaming%20video%20giant%20consumes,to%2019.1%25%20of%20total%20traffic.

5. Nelson, N. (2016, May 3). The Power Of A Picture. Retrieved from Netflix Media Centre: https://media.netflix.com/en/company-blog/the-power-of-a-picture

7. Steve Urban, R. S. (2016, April 29). It's All A/Bout Testing: The Netflix Experimentation Platform. Retrieved from Medium: The Netflix Technology Blog: https://netflixtechblog.com/its-all-a-bout-testing-the-netflix-experimentation-platform-4e1ca458c15

10. Xavier Amatriain, J. B. (2012, June 20). Netflix Recommendations: Beyond the 5 stars (Part 2). Retrieved from Medium: The Netflix Technology Blog: https://netflixtechblog.com/netflix-recommendations-beyond-the-5-stars-part-2-d9b96aa399f5