Trends in Personalized Video Recommendations

Amey Dharwadker
Machine Learning Technical Lead, Facebook
/in/ameydharwadker
Acknowledgements
● Presenting work done by many folks across the industry and academia.
● Special thanks to the Video Recommendations Core Ranking team and Facebook App leadership.
This Talk
● Introduction to Video Recommendations
● Video Recommendations Trends and Challenges
● Challenges with Multi-task Learning
● Challenges with Biased Training Data
● Challenges with Learning Better Feature Interactions
● Quick Summary
Online Video Statistics
92.7% of internet users worldwide watch digital videos each week (2022).
Video Recommendations Products Examples
Facebook Watch, Instagram Reels, YouTube, Netflix, TikTok
Video Recommendations Vision
Help users find, enjoy and create interesting video content to strengthen their relationships, create communities and realize economic opportunities.
Consumers & Creators Relevance
Jobs to be Done
Typical Industrial Video Recommender System
Multi-stage ranking operating under computational resource constraints.
● Candidate Generation (Billions → Thousands)
○ Find somewhat related videos for the user; scalable, inexpensive models/rules
● Early Stage Ranking (Thousands → Hundreds)
○ ML models with medium complexity; very fast, cheap inference cost
● Full Ranking
○ High complexity ML models to optimize what’s left and send to Predictions Aggregation stage
● Predictions Aggregation
○ Aggregate predictions optimizing for business logic, eg: combine with integrity rules
[Funnel diagram; axis labels: High, Low]
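The four stages can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: the function names, the scoring rules, the field names (`topic`, `popularity`, `quality`, `violates_integrity`) and the weights are all hypothetical stand-ins, not the models or rules used in any production system.

```python
# Toy multi-stage funnel. Every rule and field name here is a hypothetical stand-in.

def candidate_generation(user, inventory, limit):
    """Cheap rule: keep videos whose topic the user has shown interest in."""
    return [v for v in inventory if v["topic"] in user["topics"]][:limit]

def early_stage_ranking(candidates, limit):
    """Fast, low-cost score: here just a popularity prior."""
    return sorted(candidates, key=lambda v: v["popularity"], reverse=True)[:limit]

def full_ranking(user, candidates):
    """Stand-in for an expensive model that emits several predictions per video."""
    predictions = []
    for video in candidates:
        affinity = user["topics"][video["topic"]]
        predictions.append({
            "video": video,
            "p_watch": affinity * video["popularity"],
            "p_like": affinity * video["quality"],
        })
    return predictions

def predictions_aggregation(predictions, limit, w_watch=1.0, w_like=2.0):
    """Combine predictions with business logic: integrity filter, then weighted sum."""
    allowed = [p for p in predictions if not p["video"]["violates_integrity"]]
    allowed.sort(key=lambda p: w_watch * p["p_watch"] + w_like * p["p_like"],
                 reverse=True)
    return [p["video"]["id"] for p in allowed[:limit]]

def recommend(user, inventory):
    candidates = candidate_generation(user, inventory, limit=4)
    shortlist = early_stage_ranking(candidates, limit=3)
    predictions = full_ranking(user, shortlist)
    return predictions_aggregation(predictions, limit=2)

user = {"topics": {"cooking": 0.9, "music": 0.5}}
inventory = [
    {"id": 1, "topic": "cooking", "popularity": 0.8, "quality": 0.6, "violates_integrity": False},
    {"id": 2, "topic": "sports", "popularity": 0.9, "quality": 0.9, "violates_integrity": False},
    {"id": 3, "topic": "music", "popularity": 0.7, "quality": 0.9, "violates_integrity": False},
    {"id": 4, "topic": "cooking", "popularity": 0.9, "quality": 0.2, "violates_integrity": True},
    {"id": 5, "topic": "music", "popularity": 0.2, "quality": 0.4, "violates_integrity": False},
    {"id": 6, "topic": "cooking", "popularity": 0.5, "quality": 0.9, "violates_integrity": False},
]
recommended_ids = recommend(user, inventory)
```

Each stage sees fewer videos than the one before it, which is what lets the later stages afford more expensive scoring.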
Video Recommendations Trends
● Multi-stakeholder Marketplace
○ Multiple stakeholders with diverse goals, eg: users, creators, platform
● Mixed-Length Video Formats
○ Need to cater to both short, snackable videos and long-form videos
● Large Inventory to Rank (UGC)
○ User connections trusted more than media organizations / brands
○ Quick, casual video content preferred over polished production
Video Recommendations Personalization Challenges
● Optimizing for Multiple Objectives and Stakeholders
○ Tradeoffs often difficult to reconcile
● Mitigating Bias in Training Data
○ Exposure bias, Position bias, etc.
● Mitigating Bias in Prediction Events Design
○ Different events favor videos of different duration
● Better Personalization
○ Need to capture higher-order interactions
Multi-task Learning (MTL)
Goal: Combine knowledge from several tasks so that they help each other
● Predicting reviews and sentiment together performs better than predicting each separately
● Predicting user engagement and satisfaction together performs better than single-task learning (STL) predicting each separately
Information Transfer in MTL
How to improve information transfer between tasks in multi-task learning (MTL)?
Negative Transfer in MTL
● Negative transfer is prevalent when training over heterogeneous tasks (Wu et al., ’20)
● Shared MTL module capacity:
○ Too large → No transfer
○ Too small → Negative transfer
● Geometry between the tasks matters
○ Measured by covariance matrices
○ Misaligned data → negative/suboptimal transfer
○ Positive transfer in target task when source has same covariance.
○ Negative to positive transfer when source has different covariance.
● Covariance alignment modules between shared module and task inputs (jointly trained)
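To make "geometry measured by covariance matrices" concrete, the sketch below compares the input covariance of a target task with that of two source tasks. It is a simple diagnostic under assumed toy data, not the jointly trained alignment module of Wu et al.; the helper names and the choice of Frobenius distance are assumptions made for illustration.

```python
import math

def covariance_matrix(rows):
    """Population covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
         for j in range(d)]
        for i in range(d)
    ]

def frobenius_distance(a, b):
    """Frobenius norm of the difference between two same-shaped matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

# Hypothetical task inputs: the two features move together in the target task.
target_inputs = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]
# Same geometry as the target (only the sample order differs).
aligned_source = [[2.0, 2.0], [-2.0, -2.0], [1.0, 1.0], [-1.0, -1.0]]
# Features move in opposite directions: different covariance.
misaligned_source = [[1.0, -1.0], [-1.0, 1.0], [2.0, -2.0], [-2.0, 2.0]]

target_cov = covariance_matrix(target_inputs)
aligned_gap = frobenius_distance(target_cov, covariance_matrix(aligned_source))
misaligned_gap = frobenius_distance(target_cov, covariance_matrix(misaligned_source))
```

A gap near zero corresponds to the "same covariance" case above; a large gap flags a source task whose inputs would need alignment before sharing a module.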
Task Optimization Imbalance in MTL
How to improve target task performance via auxiliary tasks without the optimization imbalance problem?
● Shared parameters of an MTL model are updated based on the sum of gradients of each task's loss, so a task with large gradients can dominate the update.
● MetaBalance (He et al., ’22) balances auxiliary losses by manipulating their gradients to match the gradient magnitude of the target loss.
○ Flexible gradient adaptation framework
○ Strengthens dominance of target task
○ Enhances knowledge transfer from weak auxiliary task
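The gradient-magnitude idea can be sketched as follows. This is a simplified illustration loosely in the spirit of MetaBalance, not the published algorithm: it rescales every auxiliary gradient on the shared parameters to the norm of the target gradient, and the `relax` blending factor, the function names and the example vectors are assumptions made here.

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))

def balance_auxiliary_gradient(target_grad, aux_grad, relax=1.0, eps=1e-12):
    """Rescale aux_grad toward the L2 norm of target_grad.

    relax=1.0 matches the target magnitude fully; relax=0.0 leaves the
    auxiliary gradient unchanged (hypothetical blending knob).
    """
    scale = l2_norm(target_grad) / (l2_norm(aux_grad) + eps)
    return [relax * scale * g + (1.0 - relax) * g for g in aux_grad]

def combined_update(target_grad, aux_grads, relax=1.0):
    """Sum the target gradient with the balanced auxiliary gradients."""
    total = list(target_grad)
    for aux_grad in aux_grads:
        balanced = balance_auxiliary_gradient(target_grad, aux_grad, relax)
        total = [t + b for t, b in zip(total, balanced)]
    return total

target_grad = [0.3, -0.4]      # norm 0.5
auxiliary_grad = [30.0, 40.0]  # norm 50: would dominate a plain sum
balanced_grad = balance_auxiliary_gradient(target_grad, auxiliary_grad)
update = combined_update(target_grad, [auxiliary_grad])
```

After balancing, the auxiliary gradient keeps its direction but no longer outweighs the target gradient by two orders of magnitude.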
Training Data Conformity Bias
How to disentangle user interactions influenced by conformity bias, which leads to inconsistency with true interests?
● Learn separate conformity and interest embeddings from training data (Zheng et al., ’21).
● Supervision on embedding distribution enforces disentanglement (eg: max(dist(E(con), E(int))))
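A minimal sketch of the two ideas above, assuming each user and item carries one interest embedding and one conformity embedding. Summing the two dot products into a single click score, and using squared Euclidean distance as `dist`, are assumptions made for this example; the embedding values are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def click_score(user, item):
    """Interest match plus conformity match (assumed additive combination)."""
    return dot(user["int"], item["int"]) + dot(user["con"], item["con"])

def discrepancy_loss(entities):
    """Negative mean squared distance between E(int) and E(con).

    Minimizing this loss maximizes dist(E(con), E(int)), pushing the two
    embeddings of each user or item apart.
    """
    total = sum(squared_distance(e["int"], e["con"]) for e in entities)
    return -total / len(entities)

# Hypothetical 2-dimensional embeddings.
user = {"int": [1.0, 0.0], "con": [0.0, 1.0]}
item = {"int": [0.5, 0.5], "con": [0.2, 0.8]}

score = click_score(user, item)
loss = discrepancy_loss([user, item])
```

In training, a term like `discrepancy_loss` would be added to the main click loss so that the two embeddings cannot collapse into the same representation.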
Training Data Position Bias
How to disentangle user interactions influenced by position bias, which leads to inconsistency with true interests?
● Prediction factorized into relevance component from main tower + bias component from shallow tower.
● Shallow tower trained with features contributing to position bias with dropout.
○ Eg: Position, Interface, etc.
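The factorization can be sketched as adding a bias logit to the relevance logit. The sketch assumes the main tower's output is already available as `main_logit`, reduces the shallow tower to a per-position lookup table, and treats the position feature as missing at serving time; the names, the dropout rate and the bias values are all hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def shallow_tower_logit(position, position_bias, training, drop_rate=0.1, rng=random):
    """Bias logit from the position feature; 0.0 when it is missing or dropped."""
    if position is None:
        return 0.0
    if training and rng.random() < drop_rate:
        # Feature dropout keeps the model from over-relying on position.
        return 0.0
    return position_bias[position]

def predict_engagement(main_logit, position, position_bias, training,
                       drop_rate=0.1, rng=random):
    """Relevance logit from the main tower plus bias logit from the shallow tower."""
    bias_logit = shallow_tower_logit(position, position_bias, training, drop_rate, rng)
    return sigmoid(main_logit + bias_logit)

position_bias = [1.2, 0.6, 0.1]  # hypothetical learned bias per display slot

# Training: the logged position explains part of the observed engagement.
train_p = predict_engagement(0.4, 0, position_bias, training=True, drop_rate=0.0)
# Serving: position is unknown, so only the relevance component remains.
serve_p = predict_engagement(0.4, None, position_bias, training=False)
```

Because the bias component absorbs the effect of the display slot during training, the main tower's logit is free to reflect relevance alone.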
Biased Prediction Events
How to effectively rank both short and long videos jointly in a recommender system?
● Unbiased metric Watch Time Gain (WTG) → User’s watch time (WT) relative to average WT of all users on videos with similar duration (Zheng et al., ’22)
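One way to compute such a duration-normalized label is sketched below. The fixed-width duration buckets, the division by the bucket's standard deviation and the log schema are assumptions for illustration; the slide only specifies "relative to average WT on videos with similar duration".

```python
from collections import defaultdict
from statistics import mean, pstdev

def duration_bucket(duration_s, width_s=10):
    """Group videos of similar duration (assumed fixed-width buckets)."""
    return int(duration_s // width_s)

def watch_time_gain(logs, width_s=10):
    """Watch time standardized within each duration bucket."""
    by_bucket = defaultdict(list)
    for row in logs:
        by_bucket[duration_bucket(row["duration_s"], width_s)].append(row["watch_s"])
    stats = {b: (mean(w), pstdev(w)) for b, w in by_bucket.items()}

    gains = []
    for row in logs:
        mu, sigma = stats[duration_bucket(row["duration_s"], width_s)]
        gains.append((row["watch_s"] - mu) / sigma if sigma > 0 else 0.0)
    return gains

# Hypothetical watch logs: two short videos and two long videos.
logs = [
    {"user": "a", "duration_s": 12, "watch_s": 4.0},
    {"user": "b", "duration_s": 15, "watch_s": 8.0},
    {"user": "a", "duration_s": 300, "watch_s": 40.0},
    {"user": "b", "duration_s": 305, "watch_s": 80.0},
]
gains = watch_time_gain(logs)
```

Raw watch time would rank the 40-second view of a long video above the 8-second view of a short one; after normalization the short-video view scores higher because it beats its duration group's average.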
Learn Better Feature Interactions
How to design model architectures that effectively capture feature interactions while maintaining consistent performance across datasets?
Different modules capturing the same degree of interaction have different strengths and capture non-overlapping information.
Deep & Hierarchical Ensemble Network (DHEN) (Zhang et al., ’22)
● Recursively stack interaction and ensemble layers to learn interactions of different orders.
● Example Interaction Modules: Self-attention, DCN, Convolution, etc.
● Model Quality: Each module in a layer is implicitly connected to modules in other layers.
● Generalizability: Flexible module combinations, pluggable modular design.
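The stacking pattern can be shown with a toy example. This is a structural sketch only, not the DHEN architecture itself: the two interaction modules are simplified stand-ins, the ensemble is a plain sum with a residual connection, and the normalization and learned projections a real model would use are omitted. All names and weights are hypothetical.

```python
from functools import partial

def linear_module(x, weight):
    """Stand-in interaction module: elementwise scaling of the input."""
    return [w * v for w, v in zip(weight, x)]

def cross_module(x, weight):
    """Simplified DCN-style cross term: x * (w . x) + x raises the interaction order."""
    s = sum(w * v for w, v in zip(weight, x))
    return [v * s + v for v in x]

def ensemble_layer(x, modules):
    """Run every module on the same input, sum the outputs, add a residual."""
    outputs = [module(x) for module in modules]
    summed = [sum(values) for values in zip(*outputs)]
    return [s + v for s, v in zip(summed, x)]

def stacked_network(x, layers):
    """Each layer consumes the previous layer's ensemble output."""
    for modules in layers:
        x = ensemble_layer(x, modules)
    return x

# Pluggable module combination: any callables with the same input/output size work.
modules = [
    partial(linear_module, weight=[0.5, 0.5]),
    partial(cross_module, weight=[1.0, 0.0]),
]
features = [1.0, 2.0]
one_layer = stacked_network(features, [modules])
two_layers = stacked_network(features, [modules, modules])
```

Because the second layer's modules operate on the first layer's ensemble output, each stacked layer mixes interactions of a higher order than the one before it.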
Quick Summary
● Covered broad video recommendation trends across the industry.
● Optimizing the system for multiple stakeholders and balancing conflicting objectives is an active area of research.
● There is a lot of emphasis on debiasing the training data and prediction events used to train and serve recommendation models.
● Deep learning model architectures are still evolving to generate better personalized recommendations from very large video inventories.
● Many more insights and results in each of the cited papers!

Thank you for your attention!