Hanna Hauptmann
22-09-2021
General Introduction
Why is information filtering needed?
• Information overload
• Too many movies, books, webpages, songs, plumbers, etc.
• Searching is difficult
Recommender Systems
          Item 1  Item 2  Item 3  Item 4  Item 5
User 1      8       1       ?       2       7
User 2      2       ?       5       7       5
User 3      5       4       7       4       7
User 4      7       1       7       3       8
User 5      1       7       4       6       5
• Common situation
• Lots of users and items but only a few ratings
• Sparseness of user-item matrix
• In addition, new items are continuously added
• Users should also rate these items
• Number of ratings has to keep up with new users and items
• Possible solutions include the automatic generation of ratings and
implicit user profiling, e.g. a click on a video constitutes a
(positive) rating
Diversity of Recommendations
• Item set I
• Users u, v with u[i] denoting the rating of item i by user u
• u⃗ denotes the rating vector of user u, ‖u⃗‖ denotes the vector norm
• Mean squared difference: sim(u⃗, v⃗) = Σᵢ (u[i] − v[i])² / |I|
• Cosine similarity: sim(u⃗, v⃗) = (u⃗ ∗ v⃗) / (‖u⃗‖ ∗ ‖v⃗‖)
• Pearson/Spearman Correlation
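The two measures above can be sketched in Python. The dict-based sparse rating representation and the function names are illustrative assumptions, not from the slides; note that the mean squared difference is a distance, so smaller values mean more similar users:

```python
import math

def msd_similarity(u, v):
    """Mean squared difference over co-rated items (a distance: lower = more similar)."""
    common = [i for i in u if i in v]
    if not common:
        return None  # no co-rated items, similarity undefined
    return sum((u[i] - v[i]) ** 2 for i in common) / len(common)

def cosine_similarity(u, v):
    """Cosine of the angle between the two rating vectors, restricted to co-rated items."""
    common = [i for i in u if i in v]
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0
```

With ratings stored as `{"item": rating}` dicts, identical users give an MSD of 0 and a cosine similarity of 1.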
Neighborhood of Similar Users I
• Aggregate neighborhood
• Follow similarity threshold method first
• If S is too small (fewer than k users), determine the
"centroid" of S and add the users most similar to that
centroid (fewer outliers than with the center-based
method)
• How many neighbors?
• Only consider positively correlated neighbors
• Can be optimized based on data set
• Often, between 50 and 200
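The aggregate-neighborhood idea described above can be sketched as follows; the data layout (ratings as nested dicts, a precomputed similarity dict) and all names are assumptions made for illustration:

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse rating vectors (dicts item -> rating).
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def aggregate_neighborhood(active, ratings, sims, k, threshold):
    """Similarity-threshold method first; if S has fewer than k users,
    grow S with the users most similar to the centroid of S."""
    S = [v for v, s in sims.items() if s >= threshold]
    if len(S) < k and S:
        # Centroid = mean rating per item over the current neighborhood S.
        items = {i for v in S for i in ratings[v]}
        centroid = {
            i: sum(ratings[v][i] for v in S if i in ratings[v]) /
               sum(1 for v in S if i in ratings[v])
            for i in items
        }
        rest = [v for v in sims if v not in S and v != active]
        rest.sort(key=lambda v: cosine(ratings[v], centroid), reverse=True)
        S += rest[: k - len(S)]
    return S
```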
What is Clustering?
• Partitioning/flat algorithms
• Usually start with random (partial) partitioning
• Refine it iteratively
• E.g. K-Means clustering
• Hierarchical algorithms
• Bottom-up, agglomerative
• Example: Build tree-based hierarchical taxonomy
(dendrogram) from set of documents
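The flat K-Means variant named above (random initial partitioning, then iterative refinement) can be sketched without any libraries; the point representation and function name are assumptions for illustration:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Flat/partitioning clustering: random initial centroids, refined iteratively."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial (partial) partitioning
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Update step: recompute each centroid as the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return clusters, centroids
```

On two well-separated groups of points, the refinement converges to one cluster per group regardless of the random start.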
Aside: Demographic Recommenders
• Given
• Set S of the users most similar to u
• s[i]: rating of a user s (from S) for an item i
• Goal: Predict the rating of u for i
• Easiest option: arithmetic mean
• Problem: Similarity of u with members of S is not taken
into account → Solution: weighting based on similarity
• Problem: Different users utilize the rating scale differently
→ Solution: consider the deviation from the user's average
rating
• Many variations of algorithms in research literature
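Combining both fixes above (similarity weighting plus deviation from each user's average) yields the classic weighted-deviation prediction; the neighbor representation below is an assumption made for illustration:

```python
def predict(u_mean, neighbors, item):
    """Predict u's rating for `item` as u's mean rating plus the
    similarity-weighted deviation of each neighbor from their own mean.
    `neighbors` is a list of (similarity, ratings_dict, neighbor_mean)."""
    num = den = 0.0
    for sim, ratings, v_mean in neighbors:
        if item in ratings:
            num += sim * (ratings[item] - v_mean)  # deviation, weighted by similarity
            den += abs(sim)
    return u_mean + num / den if den else u_mean  # fall back to u's mean
```

A neighbor who rated the item one point above their own average pulls the prediction one point above u's average (at weight 1).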
Advantages Collaborative Filtering
• Discussion so far:
• "Standard" CF operates on the user-item matrix, i.e.
the raw rating data (memory-based)
• User-user similarity
• Other option: item-item similarity
• Model-based CF approaches
• Use user-item matrix to generate a model
• Most common option: Model-based item-item
approach
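In the item-item variant mentioned above, a precomputed item-item similarity table plays the role of the model; a minimal prediction sketch, with all names and the similarity-table layout assumed for illustration:

```python
def item_item_predict(user_ratings, item_sims, item, k=20):
    """Predict the active user's rating for `item` from the k most similar
    items the user has already rated. `item_sims` maps (item, item) pairs
    to precomputed similarities (the learned model)."""
    scored = [(item_sims.get((item, j), 0.0), r) for j, r in user_ratings.items()]
    scored = sorted([t for t in scored if t[0] > 0], reverse=True)[:k]
    if not scored:
        return None  # no similar rated items available
    # Similarity-weighted average of the user's own ratings.
    return sum(s * r for s, r in scored) / sum(s for s, _ in scored)
```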
Model-based Collaborative Filtering
• Knowledge base
• Connects user preferences and item features
• Variables: user model features (requirements), item
features (catalogue)
• Set of constraints
• Logical implications
• IF user requires A THEN proposed item should
possess feature B
• Hard and soft/weighted constraints
• Solution preferences
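The IF-THEN constraints above can be checked mechanically; the representation (constraints as requirement/feature pairs, items as feature sets) is a hypothetical encoding chosen for the sketch:

```python
def satisfies(requirements, item, constraints):
    """Check hard constraints of the form:
    IF user requires A THEN proposed item should possess feature B."""
    return all(feature in item["features"]
               for req, feature in constraints
               if req in requirements)  # only constraints whose premise holds
```

An item passes when every triggered constraint's consequent feature is present; constraints whose premise the user did not state are vacuously satisfied.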
Finding a Set of Suitable Items I
• Advantages
• Works well in practice
• No metadata about items needed
• Cross-domain recommendations, high diversity
• Disadvantages
• Cold start: new-user and new-item problems
• Affects all methods based on "learning"
• Effort required for users to give ratings
• But implicit feedback can compensate
• Good in "taste"-related domains
Content-based Recommender Systems
• Advantages
• No (or less pronounced) "new item" problem
• Good scalability
• Often no explicit user profile or ratings required
• Disadvantages
• Item model limited to explicitly analyzed features
• Effort required to build item model
• Overfitting, portfolio effect
• Poor diversity
• Good if structured item description is available
Hybrid Recommender Systems
• Selects a recommender based on situation and/or user
profile
• Different profile → different technique
• Components have different performance for some types
of users
• But only one technique is ever used at a time
• Requires a criterion for the switching decision
• Example:
  • Use CF if the active user has made at least n ratings
  • Use a content-based filter otherwise
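The switching example above is a one-line decision; the function signature and the way recommenders are passed in are assumptions for this sketch:

```python
def switching_recommend(user, num_ratings, cf_rec, cb_rec, n_min=5):
    """Switching hybrid: use CF if the active user has at least n_min
    ratings, otherwise fall back to the content-based recommender."""
    if num_ratings(user) >= n_min:
        return cf_rec(user)
    return cb_rec(user)
```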
Different Hybridization Designs (2)
• Secondary technique refines the candidates from the
primary technique (hierarchical)
• Result space of candidate items from the 1st
recommender is the input for the 2nd
• Secondary technique used as "tie breaker" only
• Example:
  1. Determine documents from the collection that fit
  the user query (content-based approach)
  2. Rank the results by CF or another method
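The two-step example above reduces to filter-then-rerank; the callables passed in are placeholders for the content-based and CF components:

```python
def cascade_recommend(query, primary, secondary_score):
    """Hierarchical (cascade) hybrid: the primary technique narrows the
    candidate set, the secondary technique only re-ranks those candidates."""
    candidates = primary(query)           # e.g. content-based retrieval
    return sorted(candidates, key=secondary_score, reverse=True)  # e.g. CF score
```

The secondary recommender can never introduce items the primary one rejected, which is exactly the "tie breaker" role.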
Meta-Level
• A model learned by the contributing recommender serves
as input for the actual recommender
• The contributing recommender completely replaces the
original knowledge source with a learned model
• Problems:
  • Not always straightforward (or necessarily feasible)
  to derive a meta-level hybrid from any given pair of
  recommenders
  • The contributing recommender has to produce some
  kind of model that can be used as input by the actual
  recommender
Possible Combinations