
Trends in Personalized Video Recommendations

Amey Dharwadker
Machine Learning Technical Lead, Facebook

/in/ameydharwadker
Acknowledgements

● Presenting work done by many folks across the industry and academia.

● Special thanks to the Video Recommendations Core Ranking team and Facebook App leadership.
This Talk

● Introduction to Video Recommendations


● Video Recommendations Trends and Challenges
● Challenges with Multi-task Learning
● Challenges with Biased Training Data
● Challenges with Learning Better Feature Interactions
● Quick Summary
Online Video Statistics

92.7% of internet users worldwide watch digital videos each week (2022).
Video Recommendations Products Examples

Facebook Watch, Instagram Reels, YouTube, Netflix, TikTok


Video Recommendations Vision

Help users find, enjoy and create interesting video content to strengthen their relationships, create communities and realize economic opportunities.

(Slide callouts on this statement: Consumers & Creators, Relevance, Jobs to be Done.)
Typical Industrial Video Recommender System

Multi-stage ranking operating under computational resource constraints.

Billions → Candidate Generation → High Thousands → Early Stage Ranking → Low Thousands → Full Ranking → Hundreds → Predictions Aggregation

● Candidate Generation: Find somewhat related videos for the user. Scalable, inexpensive models/rules.
● Early Stage Ranking: ML models with medium complexity. Very fast, cheap inference cost.
● Full Ranking: High complexity ML models to optimize what’s left and send to the Predictions Aggregation stage.
● Predictions Aggregation: Aggregate predictions optimizing for business logic, e.g., combine with integrity rules.
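The four-stage funnel can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the stage functions, the toy features (`topic`, `popularity`, `quality`, `flagged`), the cut-off sizes and the aggregation weights are hypothetical and far smaller than a production system.

```python
import random

def candidate_generation(user, inventory, k):
    """Cheap retrieval rule: keep videos whose topic the user follows."""
    related = [v for v in inventory if v["topic"] in user["topics"]]
    return related[:k]

def early_stage_ranking(user, candidates, k):
    """Lightweight scorer: a single cheap feature (global popularity)."""
    return sorted(candidates, key=lambda v: v["popularity"], reverse=True)[:k]

def full_ranking(user, candidates, k):
    """Stand-in for a heavy model that predicts several events per video."""
    scored = []
    for v in candidates:
        affinity = user["topics"].get(v["topic"], 0.0)
        preds = {"p_watch": affinity * v["popularity"],
                 "p_like": affinity * v["quality"]}
        scored.append((v, preds))
    scored.sort(key=lambda item: item[1]["p_watch"], reverse=True)
    return scored[:k]

def predictions_aggregation(scored, weights, k):
    """Combine predictions into one value and apply an integrity rule."""
    allowed = [(v, p) for v, p in scored if not v["flagged"]]
    allowed.sort(key=lambda item: sum(weights[n] * item[1][n] for n in weights),
                 reverse=True)
    return [v["id"] for v, _ in allowed[:k]]

def recommend(user, inventory):
    candidates = candidate_generation(user, inventory, k=1000)
    shortlist = early_stage_ranking(user, candidates, k=200)
    scored = full_ranking(user, shortlist, k=50)
    return predictions_aggregation(scored, {"p_watch": 0.7, "p_like": 0.3}, k=10)

# Toy inventory with hypothetical features.
random.seed(0)
TOPICS = ["sports", "music", "cooking", "travel"]
inventory = [{"id": i,
              "topic": random.choice(TOPICS),
              "popularity": random.random(),
              "quality": random.random(),
              "flagged": random.random() < 0.05}
             for i in range(5000)]
user = {"topics": {"music": 0.9, "cooking": 0.4}}
feed = recommend(user, inventory)
```

Each stage sees fewer items than the previous one, which is what allows the later stages to spend more computation per item.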
Video Recommendations Trends

● Multi-stakeholder Marketplace
○ Multiple stakeholders with diverse goals, e.g., users, creators, platform

● Mixed-Length Video Formats
○ Need to cater to both short, snackable videos and long-form videos

● Large Inventory to Rank (UGC)
○ User connections trusted more than media organizations / brands
○ Quick, casual video content preferred over polished production
Video Recommendations Personalization Challenges

● Optimizing for Multiple Objectives and Stakeholders
○ Tradeoffs often difficult to reconcile

● Mitigating Bias in Training Data
○ Exposure bias, Position bias, etc.

● Mitigating Bias in Prediction Events Design
○ Different events favor videos of different durations

● Better Personalization
○ Need to capture higher-order interactions
Multi-task Learning (MTL)

Goal: Combine knowledge from several tasks so that they help each other.

● Predicting reviews and sentiment together performs better than predicting each separately.
● Predicting user engagement and satisfaction together performs better than single-task learning (STL), which predicts each separately.
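One common way to share knowledge between tasks is a shared-bottom network: a shared layer feeds one small head per task. The sketch below is only a forward pass with made-up weights; the layer sizes, weight values and task names are illustrative assumptions, not a model described in the talk.

```python
import math

def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(p, y):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Hypothetical parameters: one shared layer, one output head per task.
SHARED_W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
SHARED_B = [0.0, 0.1]
HEADS = {
    "engagement": ([[1.0, -1.0]], [0.0]),
    "satisfaction": ([[0.5, 0.5]], [-0.2]),
}

def predict(features):
    """Both tasks read the same shared representation."""
    hidden = relu(dense(features, SHARED_W, SHARED_B))
    return {task: sigmoid(dense(hidden, w, b)[0])
            for task, (w, b) in HEADS.items()}

def multitask_loss(features, labels):
    """Sum of per-task losses; its gradient updates the shared layer."""
    preds = predict(features)
    return sum(log_loss(preds[task], labels[task]) for task in labels)

preds = predict([1.0, 0.5, 2.0])
loss = multitask_loss([1.0, 0.5, 2.0], {"engagement": 1, "satisfaction": 0})
```

Because the shared layer receives gradient from both losses, what one task learns can help or hurt the other.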
Information Transfer in MTL

How to improve information transfer between tasks in multi-task learning (MTL)?
Negative Transfer in MTL
● Negative transfer is prevalent when training over heterogeneous tasks (Wu et al., ’20)

● Shared MTL module capacity:
○ Too large → No transfer
○ Too small → Negative transfer
Negative Transfer in MTL
● Geometry between the tasks matters
○ Measured by covariance matrices
○ Misaligned data → negative/suboptimal transfer

Positive transfer in the target task when the source has the same covariance.

Negative to positive transfer when the source has a different covariance.

● Covariance alignment modules between shared module and task inputs (jointly trained)
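To make the idea of covariance misalignment concrete, the sketch below measures the gap between two tasks' input covariance matrices and shrinks it with a simple alignment step. Note the substitution: the slide describes alignment modules trained jointly with the model, whereas this sketch uses a fixed per-feature rescaling computed from data statistics. The function names and the synthetic data are assumptions.

```python
import random

def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
             for b in range(d)]
            for a in range(d)]

def frobenius_gap(c1, c2):
    """Frobenius norm of the difference between two covariance matrices."""
    return sum((x - y) ** 2
               for r1, r2 in zip(c1, c2)
               for x, y in zip(r1, r2)) ** 0.5

def diagonal_alignment(source_cov, target_cov):
    """Per-feature scale that makes source variances match the target's."""
    return [(target_cov[j][j] / source_cov[j][j]) ** 0.5
            for j in range(len(source_cov))]

def apply_alignment(rows, scales):
    return [[x * s for x, s in zip(r, scales)] for r in rows]

# Synthetic task inputs: same features, different scales per task.
random.seed(0)
target = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(2000)]
source = [[random.gauss(0, 3), random.gauss(0, 0.5)] for _ in range(2000)]

gap_before = frobenius_gap(covariance(source), covariance(target))
scales = diagonal_alignment(covariance(source), covariance(target))
aligned = apply_alignment(source, scales)
gap_after = frobenius_gap(covariance(aligned), covariance(target))
```

A learned alignment could also rotate the inputs, which a diagonal rescaling cannot do; the sketch only shows that the gap is measurable and reducible.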
Task Optimization Imbalance in MTL

How to improve target task performance via auxiliary tasks without the optimization imbalance problem?

Shared parameters of an MTL model are updated based on the sum of the gradients of each task's loss.

MetaBalance (He et al., ’22) balances auxiliary losses by manipulating their gradients to match the magnitude of the target loss gradient.

● Flexible gradient adaptation framework
● Strengthens dominance of target task
● Enhances knowledge transfer from weak auxiliary task
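The gradient-magnitude idea can be sketched as follows. This is a simplified illustration, not the MetaBalance algorithm as published: it rescales every auxiliary gradient unconditionally, and the `relax` blending parameter and function names are assumptions of this sketch.

```python
import math

def norm(grad):
    return math.sqrt(sum(x * x for x in grad))

def balance_auxiliary(target_grad, aux_grad, relax=0.7):
    """Pull the auxiliary gradient's magnitude toward the target's.

    relax=1.0 matches the target magnitude exactly;
    relax=0.0 leaves the auxiliary gradient unchanged.
    The direction of the auxiliary gradient is preserved.
    """
    aux_norm = norm(aux_grad)
    if aux_norm == 0.0:
        return list(aux_grad)
    ratio = norm(target_grad) / aux_norm
    return [relax * ratio * x + (1 - relax) * x for x in aux_grad]

def shared_update(target_grad, aux_grads, relax=0.7):
    """Sum of the target gradient and the balanced auxiliary gradients."""
    total = list(target_grad)
    for grad in aux_grads:
        balanced = balance_auxiliary(target_grad, grad, relax)
        total = [t + b for t, b in zip(total, balanced)]
    return total

# One dominant and one weak auxiliary task (made-up gradients).
target_grad = [0.3, -0.4]
aux_grads = [[30.0, 40.0], [0.003, 0.004]]
balanced = [balance_auxiliary(target_grad, g, relax=1.0) for g in aux_grads]
update = shared_update(target_grad, aux_grads, relax=1.0)
```

Without balancing, the first auxiliary gradient would dominate the summed update and the second would be negligible; after balancing, both contribute at the scale of the target gradient.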
Training Data Conformity Bias

How to disentangle user interactions influenced by conformity bias, which leads to inconsistency with true interests?

● Learn separate conformity and interest embeddings from training data (Zheng et al., ’21).

● Supervision on the embedding distribution enforces disentanglement (e.g., max(dist(E(con), E(int)))).
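A minimal sketch of the two-embedding idea, under stated assumptions: each user and item carries an `interest` and a `conformity` vector, an interaction score is the sum of the two matches, and a discrepancy term rewards keeping the two embeddings far apart. The embedding values, the Euclidean distance and the function names are illustrative choices, not details taken from the cited paper.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def click_score(user, item):
    """Observed interactions are explained by interest plus conformity."""
    return (dot(user["interest"], item["interest"])
            + dot(user["conformity"], item["conformity"]))

def interest_score(user, item):
    """Score that ignores the conformity part."""
    return dot(user["interest"], item["interest"])

def discrepancy_loss(entities):
    """Negative mean distance between each entity's two embeddings.

    Minimizing this loss maximizes dist(E(con), E(int)).
    """
    distances = [euclidean(e["interest"], e["conformity"]) for e in entities]
    return -sum(distances) / len(distances)

# Made-up embeddings: a niche video the user likes vs. a popular one.
user = {"interest": [1.0, 0.0], "conformity": [0.0, 1.0]}
niche = {"interest": [0.9, 0.1], "conformity": [0.0, 0.1]}
popular = {"interest": [0.1, 0.2], "conformity": [0.0, 2.0]}

clicks = {"niche": click_score(user, niche),
          "popular": click_score(user, popular)}
interests = {"niche": interest_score(user, niche),
             "popular": interest_score(user, popular)}
loss = discrepancy_loss([user, niche, popular])
```

In this toy example the popular video wins on the full click score but loses once the conformity part is set aside, which is the inconsistency the slide refers to.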
Training Data Position Bias

How to disentangle user interactions influenced by position bias, which leads to inconsistency with true interests?

● Prediction factorized into a relevance component from the main tower + a bias component from the shallow tower.

● Shallow tower trained with features contributing to position bias, with dropout.
○ E.g., position, interface, etc.
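The factorization can be sketched as two logits that are added before the sigmoid. The tower definitions, the `POSITION_BIAS` table and the `drop_rate` default below are hypothetical stand-ins for learned components; the sketch assumes the position feature is treated as missing at serving time.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def main_tower(features):
    """Relevance logit from user/video features (stand-in for a deep model)."""
    return 2.0 * features["topic_affinity"] - 0.5

# Hypothetical learned bias per feed position.
POSITION_BIAS = {1: 1.0, 2: 0.4, 3: 0.0}

def shallow_tower(position, training, drop_rate=0.1, rng=random):
    """Bias logit from position-related features.

    During training the feature is dropped at random so the main tower
    cannot lean on it. At serving time the caller passes position=None.
    """
    if position is None:
        return 0.0
    if training and rng.random() < drop_rate:
        return 0.0
    return POSITION_BIAS.get(position, 0.0)

def predict(features, position=None, training=False, drop_rate=0.1, rng=random):
    logit = main_tower(features) + shallow_tower(position, training, drop_rate, rng)
    return sigmoid(logit)

video = {"topic_affinity": 0.5}
p_logged_at_top = predict(video, position=1, training=True, drop_rate=0.0)
p_serving = predict(video)  # position missing: relevance component only
```

The gap between `p_logged_at_top` and `p_serving` is the share of the logged engagement that the sketch attributes to position rather than relevance.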
Biased Prediction Events
How to effectively rank both short and long videos jointly in a recommender system?

● Unbiased metric Watch Time Gain (WTG) → User’s watch time (WT) relative to the average WT of all users on videos with similar duration (Zheng et al., ’22)
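A small sketch of a duration-normalized watch-time label. The slide only states that WTG compares a user's watch time with the average for videos of similar duration; grouping durations into fixed-width buckets and dividing by the bucket's standard deviation are assumptions of this sketch, as are the function names and the toy log.

```python
from collections import defaultdict
from statistics import mean, pstdev

def duration_bucket(duration_s, width_s=30):
    """Group videos of similar duration (fixed-width buckets, an assumption)."""
    return int(duration_s // width_s)

def watch_time_gain(logs, width_s=30):
    """logs: list of (user, video_duration_s, watch_time_s) tuples.

    Returns one value per log row: watch time minus the bucket average,
    divided by the bucket standard deviation (0.0 if the bucket has no spread).
    """
    buckets = defaultdict(list)
    for _, duration, watch_time in logs:
        buckets[duration_bucket(duration, width_s)].append(watch_time)
    stats = {b: (mean(w), pstdev(w)) for b, w in buckets.items()}

    gains = []
    for _, duration, watch_time in logs:
        mu, sigma = stats[duration_bucket(duration, width_s)]
        gains.append((watch_time - mu) / sigma if sigma > 0 else 0.0)
    return gains

logs = [
    ("u1", 20, 5), ("u2", 25, 10), ("u3", 28, 15),       # short videos
    ("u4", 600, 60), ("u5", 610, 120), ("u6", 620, 180),  # long videos
]
gains = watch_time_gain(logs)
```

Here `u4` watched for 60 seconds and `u3` for only 15, yet `u3` receives the higher gain because 15 seconds is above average for a short video while 60 seconds is below average for a long one.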
Learn Better Feature Interactions

How to design model architectures that effectively capture feature interactions while maintaining consistent performance across datasets?

Different modules capturing the same degree of interaction have different strengths and capture non-overlapping information.

Deep & Hierarchical Ensemble Network (DHEN) (Zhang et al., ’22)

● Recursively stack interaction and ensemble layers to learn interactions of different orders.
● Example Interaction Modules: Self-attention, DCN, Convolution, etc.
● Model Quality: Each module in a layer implicitly connected to modules in other layers.
● Generalizability: Flexible module combinations, pluggable modular design.
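The stacking pattern can be sketched in a few lines. This is a toy under stated assumptions: the two interaction modules are simple stand-ins (not real self-attention, DCN or convolution), the ensemble is an elementwise sum with a residual shortcut, and normalization is omitted.

```python
def cross_module(x, x0):
    """DCN-flavoured stand-in: multiplying by the original input
    raises the interaction order by one."""
    return [a * b for a, b in zip(x, x0)]

def neighbour_module(x, x0):
    """Toy second module: averages each feature with its neighbour.
    It ignores x0 and only keeps the same signature as other modules."""
    n = len(x)
    return [0.5 * (x[i] + x[(i + 1) % n]) for i in range(n)]

def ensemble(outputs, shortcut):
    """Sum the module outputs elementwise and add a residual shortcut."""
    return [sum(values) + s for values, s in zip(zip(*outputs), shortcut)]

def dhen_layer(x, x0, modules):
    """One layer: run every interaction module, then ensemble the results."""
    return ensemble([module(x, x0) for module in modules], shortcut=x)

def dhen_stack(x0, layers):
    """Stack layers; each layer may use a different set of modules."""
    x = list(x0)
    for modules in layers:
        x = dhen_layer(x, x0, modules)
    return x

x0 = [1.0, 2.0, 3.0]
layers = [[cross_module, neighbour_module], [cross_module]]
output = dhen_stack(x0, layers)
```

Because every module shares one signature, modules can be swapped or combined per layer without changing the stacking code, which is the pluggable aspect noted on the slide.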
Quick Summary

● Covered broad trends in video recommendations across the industry.

● Optimizing the system for multiple stakeholders and balancing conflicting objectives is an active area of research.

● There is a lot of emphasis on debiasing the training data and prediction events used to train and serve recommendation models.

● Deep learning model architectures are still evolving to generate better personalized recommendations from a very large video inventory.

● Many more insights and results in each of the cited papers!


Thank you for your attention!
