
Recommender Systems Part 1

Hanna Hauptmann
22-09-2021
General Introduction
Why is information
filtering needed?

• Information overload
• Too many movies,
books, webpages, songs,
plumbers, etc.
• Searching is difficult
Recommender Systems

• Systems that help find


the good stuff
• Systems that make
personalized
recommendations of
goods, services, and
people (Kautz)
How to recommend?

• User preferences for items are learned


• The recommender system suggests other items
• that are similar
Or
• that similar users liked => user-based collaborative
filtering
What do we need?

How do we know the user’s opinion?


• Explicit: users rate items
• Implicit: looking at clicks etc.

What does ‘similar’ mean for items?


• Similar in content => content-based filtering
• Similar in ‘appreciation’ by other users => item-based
collaborative filtering
Example: MovieLens

• User rates movies


• The system suggests
‘best bets’
• Users keep rating
movies while checking
best bets
• To test out a movie recommender go to
http://movielens.umn.edu/
Terms and Definitions
Recommender Data Model

• Set U={u1, ..., un} of users


• Set I={i1, ..., im} of items (e.g. products)
• Elements from U and I can be described by a vector
• u=(a1, ..., as) attributes of user profile (preferences,
ratings,...)
• i=(b1, ..., bt) description of items (metadata, features,
price,...)
• Goal of recommendation process: recommend new
items for an active user u
Ratings

• ø := no rating for item


• Unary: Item is marked (or not)
• E.g. „product was purchased“
• Binary: „good/bad“, „+/-“ etc.
• Scalar: Numerical rating (e.g. 1-5) etc.
• Other representations possible, e.g. {*, **, ***,
****, *****}
• Distinguish between “no rating” and “neutral” rating
• Range [-1,1]: 0 is neutral, not no rating
User-Item Matrix

         Item 1  Item 2  Item 3  Item 4  Item 5
User 1      8      1       ?       2       7
User 2      2      ?       5       7       5
User 3      5      4       7       4       7
User 4      7      1       7       3       8
User 5      1      7       4       6       5
User 6      8      3       8       3       7

• Store ratings for each user and item
• What about items that have not been rated yet?
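A minimal sketch of one way to store such a matrix, using NumPy with NaN for the missing ratings; the values mirror the table above and the variable names are illustrative only:

```python
import numpy as np

# Rows = users, columns = items; np.nan marks "no rating yet".
R = np.array([
    [8, 1, np.nan, 2, 7],   # User 1
    [2, np.nan, 5, 7, 5],   # User 2
    [5, 4, 7, 4, 7],        # User 3
    [7, 1, 7, 3, 8],        # User 4
    [1, 7, 4, 6, 5],        # User 5
    [8, 3, 8, 3, 7],        # User 6
], dtype=float)

# Items a given user has not rated yet are exactly the NaN entries.
unrated_by_user1 = np.where(np.isnan(R[0]))[0]
print(unrated_by_user1)  # -> [2], i.e. Item 3
```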
Recommender Systems Properties I

• Individual vs. collaborative


• Recommend item(s) for active user
• Collaborative: Take also information about other
users into account (often their ratings for items)
• Individual: Based on user model for active user only
• Reactive/pull vs. push/proactive
• Reactive/pull: explicit interaction of user with system
• System generates recommendations even without
explicit user interaction
Recommender Systems Properties II

• User- vs. item-based (collaborative filtering)


• „Standard“ CF calculates user-user similarity
• Other option: item-item similarity
• Still collaborative, not individual/content-based!
• Memory- vs. model-based
• Interpretation of „raw data“, e.g. user-item matrix
• Generation of a dedicated model, e.g. item-item
matrix
Types of Recommender Systems

• Collaborative filtering (CF)


• Content-based filtering (CB)
• Also utility- or knowledge-based approaches
• Constraint-based recommendation
• Case-based recommendation
• Hybrid recommender systems
• Combination of several other recommenders
Problems in Real Life and Solutions
Recommender Systems Issues

• Cold start and latency problems


• Sparseness of user-item matrix
• Diversity of recommendations
• Scalability
• Privacy and trust
• Changing user interests (dynamics)
Cold Start Problems

• „New user“ and „new item“ problem


• Systems cannot recommend items to new users with
no profile or history
• Same for new items
• Also „latency problem“: items need some time until
they can be recommended
• Chicken-and-egg problem
• Users will not use system without good
recommendations
• No incentive to rate items etc.
• System cannot generate good recommendations
Data Sparseness

• Common situation
• Lots of users and items but only few ratings
• Sparseness of user-item matrix
• In addition, new items are continuously added
• Users should also rate these items
• Number of ratings has to keep up with new users
and items
• Possible solutions include the automatic generation of
ratings and implicit user profiling, e.g. a click on a video
constitutes a (positive) rating
Diversity of Recommendations

• What about new items?


• “Serendipity” (unexpected, surprising items)
• Do not recommend items that are already known
• Do not recommend items that are too similar to
already known items
• Possible solutions
• Use content-based approaches to integrate new
items into the recommendation process more easily
• Use collaborative filtering to allow „cross-domain“
recommendations
Scalability

• The more items and users, the higher the


computational effort to analyze the data
• Storage/memory and runtime complexity
• Scalability is an issue in practice
• Problem in particular with memory-based approaches
• Possible solutions include
• Use model-based approach
• Limit the number of items and/or users, e.g. only
consider items that received at least k ratings
• Pre-compute recommendations for users
Privacy and Trust

• Collecting and interpreting personal data, e.g. ratings


• Control for users?
• Bought product may have been gift for other person –
Privacy problem!
• The more information the system is able to collect, the
higher the recommendation quality in general
• How can user trust a recommended item?
• Possible solutions include
• Consider social relationships („Web of Trust“)
• Let user control their profile information
• Explanations of recommendations
Changing User Interests (Dynamics)

• User model is often relatively static


• But user interests evolve dynamically
• Changes over time, older ratings may not be valid
• Also called “interest/concept/profile drift”
• Also the context of recommendations
• Example: Mobile restaurant guide
• Solutions in research literature include
• Distinction between short- and long-term interests
• Context-aware recommender systems
Collaborative Filtering Recommendations
Collaborative Filtering (CF)

• Basic idea: System recommends items which were


preferred by similar users in the past
• Based on ratings
• Ratings express preferences of the active user and also
of other users (collaborative approach)
• Works on user-item matrix
• Memory- or model-based
• No item meta data etc.
• Assumption: Similar taste in the past implies similar
taste in future
• CF is formalization of „word of mouth“ among friends
General Process

• Users rate items


• Find set S of users which have rated similar to the
active user u in the past (neighborhood)
• Generate candidate items for recommendation, rated
in neighborhood of u, but not rated by u yet
• Predict rating of u for candidate items
• Select and display n best items
Required Algorithms

• Metric for user-user similarity


• Mean-squared difference
• Cosine
• Pearson/Spearman correlation
• Select set S of most similar users (to active user u)
• Center-based
• Similarity threshold
• Aggregate neighborhood
• Metric to predict the rating of u for an item i
User-User Similarity

• Item set I
• Users u, v with u[i] denoting the rating of item i by user u
• 𝑢⃗ denotes the rating vector of user u, ‖𝑢⃗‖ denotes the
vector norm
• Mean squared difference: sim(𝑢⃗, 𝑣⃗) = ‖𝑢⃗ − 𝑣⃗‖² / |I|
• Cosine similarity: sim(𝑢⃗, 𝑣⃗) = (𝑢⃗ ⋅ 𝑣⃗) / (‖𝑢⃗‖ ‖𝑣⃗‖)
• Pearson/Spearman correlation
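A small sketch of the three similarity measures applied to two rating vectors. It assumes dense vectors over co-rated items only, and turns the mean squared difference into a similarity by negation, which is just one possible convention:

```python
import numpy as np

def msd_similarity(u, v):
    """Mean squared difference; smaller distance = more similar,
    so negate it to obtain a similarity (one common convention)."""
    return -np.mean((u - v) ** 2)

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def pearson_correlation(u, v):
    u_c, v_c = u - u.mean(), v - v.mean()  # center each vector first
    return np.dot(u_c, v_c) / (np.linalg.norm(u_c) * np.linalg.norm(v_c))

# Two users' ratings on the same four co-rated items (toy data).
u = np.array([8.0, 1.0, 2.0, 7.0])
v = np.array([7.0, 1.0, 3.0, 8.0])
print(msd_similarity(u, v), cosine_similarity(u, v), pearson_correlation(u, v))
```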
Neighborhood of Similar Users I

Goal: Determine set of users which are most similar to the


active user u
• Center-based
• S contains k most similar users
→ Problem: some of the users may not really be
similar if k is chosen too large; deviators possible
• Similarity threshold
• S contains all users with a similarity bigger than a
threshold t
→ Problem: maybe too few users in S
Neighborhood of Similar Users II

• Aggregate neighborhood
• Follow similarity threshold method first
• If S is too small (fewer than k users), determine the
„centroid“ of S and add the users most similar to this
centroid (fewer deviators than with the center-based
method)
• How many neighbors?
• Only consider positively correlated neighbors
• Can be optimized based on data set
• Often, between 50 and 200
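A sketch of neighborhood selection that keeps only positively correlated users above a threshold and then takes at most the k most similar ones, combining the threshold and top-k (center-based) ideas from the slides; the data structure and parameter values are illustrative assumptions:

```python
def select_neighbors(similarities, k=50, threshold=0.0):
    """similarities: dict user_id -> similarity with the active user.
    Keep only positively correlated users above the threshold,
    then take at most the k most similar ones."""
    candidates = [(uid, s) for uid, s in similarities.items() if s > threshold]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]

sims = {"v1": 0.9, "v2": 0.1, "v3": -0.4, "v4": 0.7}
print(select_neighbors(sims, k=2))   # -> [('v1', 0.9), ('v4', 0.7)]
```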
What is Clustering?

• Clustering: Process of grouping set of objects into


classes of similar objects
• Objects within a cluster should be similar
• Objects from different clusters should be dissimilar
• Commonest form of unsupervised learning
• Important task that finds many applications in IR and
other places
• Recommender scenario: Find similar items to items
which user has liked in the past or similar users
Clustering Algorithms

• Partitioning/flat algorithms
• Usually start with random (partial) partitioning
• Refine it iteratively
• E.g. K-Means clustering
• Hierarchical algorithms
• Bottom-up, agglomerative
• Example: Build tree-based hierarchical taxonomy
(dendrogram) from set of documents
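A minimal flat-clustering sketch in the spirit of K-Means (random start, iterative refinement); it is a toy illustration, not a production implementation:

```python
import numpy as np

def kmeans(X, k=2, iters=20, seed=0):
    """Very small flat-clustering sketch: start from random centroids
    and iteratively reassign points / recompute centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign every object to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        # Recompute each centroid as the mean of its cluster.
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids

X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [4.8, 5.0]])
print(kmeans(X, k=2)[0])   # two clearly separated clusters
```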
Aside: Demographic Recommenders

• To predict a user’s opinion for an item, use the opinion


of similar users [as in user-based Collaborative
Filtering]
• Similarity between users is decided by looking at
demographics (stereotypes), e.g.
• User modeling with stereotypes
• Clustering methods can be applied to group users
• Groups can come from marketing research
• Expert estimations or learned from historic data
Prediction CF Recommender

• Given
• Set S with most similar users to u
• s[i]: rating of a user s (from S) for an item i
• Goal: Predict the rating of u for i
• Easiest option: arithmetic mean
• Problem: Similarity of u with members of S is not taken
into account → Solution: weighting based on similarity
• Problem: Different users utilize the rating scale differently
→ Solution: consider the deviation from the average rating
(per user)
• Many variations of algorithms in research literature
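A sketch of such a prediction with similarity weighting and deviation from the per-user average; the input format (similarity, neighbor rating, neighbor mean) is an assumption made for illustration, and this is only one of the many variants mentioned above:

```python
def predict_rating(u_mean, neighbors):
    """neighbors: list of (similarity, neighbor_rating_for_i, neighbor_mean_rating).
    Weighted deviation-from-mean prediction (one common CF formula)."""
    num = sum(sim * (r - v_mean) for sim, r, v_mean in neighbors)
    den = sum(abs(sim) for sim, _, _ in neighbors)
    return u_mean if den == 0 else u_mean + num / den

# Active user rates 3.5 on average; two neighbors have rated item i.
print(predict_rating(3.5, [(0.9, 5.0, 4.0), (0.4, 2.0, 3.0)]))  # ≈ 3.88
```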
Advantages Collaborative Filtering

• Works well in practice


• Quality improves with more and more ratings
• Only ratings as input data required, no other data
• In particular, no information about items needed
• CF is able to generate cross-domain („cross genre“)
recommendations
• High diversity, because item categories etc. are not
considered
• Implicit user feedback often adequate, e.g. unary
ratings such as „click on product Web page“
Disadvantages Collaborative Filtering

• New user and new item problem (cold start)


• Often sparseness in user-item matrix
• „Grey sheep“: users with „extraordinary“ taste
• „Black sheep“: users who intentionally give incorrect
ratings
• Trust and robustness are issues
• CF is prone to manipulation/attacks
• Not applicable in every domain, e.g. when specific,
short-term user preferences have to be respected
Memory- vs. Model-based CF

• Discussion so far:
• „Standard“ CF, operates on user-item-matrix, thus
the raw data of ratings (memory-based)
• User-user similarity
• Other option: item-item similarity
• Model-based CF approaches
• Use user-item matrix to generate a model
• Most common option: Model-based item-item
approach
Model-based Collaborative Filtering

• In general: find patterns based on training data


• Probabilistic methods (Bayes theorem), clustering
models, latent semantic models such as singular
value decomposition (SVD), association rule mining, ...
• Basic idea: use the similarity between items (and not
users) to make predictions
• Simple model-based item-item approach in principle
• Calculate similarity of items
• Result is an item-item matrix
• Generate recommendation based on ratings of
active user and item-item matrix
Item-Item based CF

• Look for items that are


similar to Item5
• Use Alice's ratings for
these items to predict
the rating for Item5
Slope One

• Main idea: Use average difference (similarity) of ratings


for two items
• Predicts rating on one item x based on ratings of other
items (item-based CF)
• Use pairs of items
• Using linear regression: f(x) = ax + b
• Simpler form (a=1): f(x) = x + b
• b is average difference between two items' ratings
• Store predicted ratings for users in database
• Allows (non-personalized) recommendations
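A minimal weighted Slope One sketch following the simpler f(x) = x + b form, where b is the average rating difference between two items; the function names and toy data are illustrative assumptions:

```python
from collections import defaultdict

def slope_one_deviations(ratings):
    """ratings: dict user -> {item: rating}. Returns dev[(j, i)] = average
    of (r_j - r_i) over users who rated both items, plus the pair counts."""
    diff_sum, count = defaultdict(float), defaultdict(int)
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diff_sum[(j, i)] += r_j - r_i
                    count[(j, i)] += 1
    return {pair: diff_sum[pair] / count[pair] for pair in diff_sum}, count

def slope_one_predict(ratings, user, target):
    dev, count = slope_one_deviations(ratings)
    num = den = 0.0
    for i, r_i in ratings[user].items():
        if (target, i) in dev:                 # f(x) = x + b, weighted by support
            num += (r_i + dev[(target, i)]) * count[(target, i)]
            den += count[(target, i)]
    return num / den if den else None

data = {"alice": {"A": 5, "B": 3}, "bob": {"A": 4, "B": 2, "C": 4}}
print(slope_one_predict(data, "alice", "C"))   # -> 5.0
```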
Matrix Factorisation

• Represent users and items in a lower-dimensional
latent space
• Learn matrices for users and items which, when
multiplied, fit the known ratings as well as possible
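A small sketch of learning such latent user and item matrices with plain stochastic gradient descent; the hyperparameters and toy ratings are illustrative assumptions, not a tuned implementation:

```python
import numpy as np

def factorize(ratings, n_factors=2, lr=0.01, reg=0.05, epochs=200, seed=0):
    """ratings: list of (user_idx, item_idx, rating). Learns matrices P and Q
    so that P[u] @ Q[i] approximates the known ratings (plain SGD)."""
    rng = np.random.default_rng(seed)
    n_users = max(u for u, _, _ in ratings) + 1
    n_items = max(i for _, i, _ in ratings) + 1
    P = rng.normal(0, 0.1, (n_users, n_factors))   # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

data = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 5)]
P, Q = factorize(data)
print(round(P[0] @ Q[2], 2))   # predicted rating of user 0 for item 2
```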
Advantages Model-based CF

• Model and also predictions can be pre-computed


• Reduces runtime complexity of recommendation
generation significantly
• Scalability important in practice
• Especially with regard to number of users
• Memory-based CF needs a higher number of ratings to
calculate user-user similarity with accuracy
Disadvantages Model-based CF

• Storage/memory requirements, item-item matrix grows


very big with many items
• User-item matrix is usually pretty sparse → efficient
storage possible
• Result also often similar to items which the user
already knows
• Because item similarity is basis for recommendation
• Memory-based user-user approach often results in
better recommendations with higher diversity
• Still collaborative, not content-based approach
Content-based Recommendations
Recommender Data Model Revisited

• Set U={u1, ..., un} of users


• Set I={i1, ..., im} of items (e.g. products)
• Elements from U and I can be described by a vector
• u=(a1, ..., as) attributes of user profile (preferences,
ratings,...)
• i=(b1, ..., bt) description of items (metadata, features,
price,...)
• Goal of recommendation process: recommend new
items for an active user u
Collaborative vs. Individual R.S.

• Collaborative: Take also information about other users


into account (usually ratings), works on user-item
matrix
• Individual: Based on the user model for the active user only,
i.e. the user-item matrix is reduced to matching the user
vector with the item set
• Often used synonymously with „content-based“
filtering
• Exploits information about items, e.g. explicit meta
data such as price
Content-based Recommendation

• Match user profile with itemset


• Recommend item similar to those the active user has
liked in the past
• Representation of items (item model)
• Often as a vector of features (e.g. price, megapixel of
camera, keywords,…)
• Representation of users (user model)
• E.g. rating vector, learned user preferences,
transaction history
• Methods and metrics to match items and users
Methods and Variants

• User customization: User explicitly specifies interesting


categories, for example document modeling (e.g. Web
search engine)
• Data mining/machine learning methods
• Classification, clustering, decision trees, Bayesian
modeling, ...
• Rule-based systems
• Modeling expert knowledge or learning rules from
user behavior
• Constraint-based recommenders
• Case-based recommenders
Content Representation and Item Similarities

• Represent items and users in the same way
• Compute similarity of an unseen item j with the user
profile i based on the keyword overlap, e.g. using the
Dice coefficient
• keywords(bᵢ) describes book bᵢ with a set of keywords
• Item similarity:
sim(bᵢ, bⱼ) = 2 × |keywords(bᵢ) ∩ keywords(bⱼ)| / (|keywords(bᵢ)| + |keywords(bⱼ)|)
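A minimal sketch of the Dice coefficient on keyword sets; the example keywords are made up:

```python
def dice_similarity(keywords_i, keywords_j):
    """Dice coefficient between two keyword sets: 2|A ∩ B| / (|A| + |B|)."""
    if not keywords_i and not keywords_j:
        return 0.0
    overlap = len(keywords_i & keywords_j)
    return 2 * overlap / (len(keywords_i) + len(keywords_j))

book_i = {"recommender", "filtering", "python"}
book_j = {"recommender", "systems", "python", "evaluation"}
print(dice_similarity(book_i, book_j))   # 2*2 / (3+4) ≈ 0.57
```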
Term-Frequency - Inverse Document Frequency

• Simple keyword representation has its problems:


• longer documents have a higher chance to overlap
• Standard measure: TF-IDF
• Encodes text documents in multi-dimensional
Euclidian space
• Weighted term vector TF: measures how often a term
appears (density in a document), assuming that
important terms appear more often
• Normalization has to be done in order to take
document length into account → IDF: aims to reduce
the weight of terms that appear in all documents
TF-IDF

• Given a keyword i and a document j
• TF(i, j): term frequency of keyword i in document j
• IDF(i): inverse document frequency, calculated as
IDF(i) = log(N / n(i))
• N: number of all recommendable documents
• n(i): number of documents from N in which keyword i
appears
• TF-IDF is calculated as: TF-IDF(i, j) = TF(i, j) · IDF(i)
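A minimal sketch of these formulas, with TF taken as term density in the document (one common variant); the toy corpus is illustrative:

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one dict per document that maps
    each term to TF(i,j) * IDF(i), with IDF(i) = log(N / n(i)) as on the slide."""
    N = len(documents)
    doc_freq = Counter()                      # n(i): documents containing term i
    for tokens in documents:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in documents:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (tf[t] / total) * math.log(N / doc_freq[t]) for t in tf})
    return vectors

docs = [["cheap", "camera", "camera"], ["camera", "lens"], ["cheap", "flight"]]
print(tf_idf(docs)[0])   # "camera" gets a higher weight than "cheap" in doc 0
```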
Improving the vector space model

• Remove stop words


• Use stemming
• Size cut-offs
• Use lexical knowledge, use more elaborate methods
for feature selection
• Remove words that are not relevant in the domain
• Detection of phrases as terms
• Limitations: semantic meaning remains unknown
Linear Classifiers

• Most learning methods aim to find the coefficients of a
linear model
• A simplified classifier with only two dimensions can be
represented by a line
• x1 and x2 correspond to the keyword vector
representation of the documents
• w1, w2 and b are the parameters to be learned
• A document is classified by checking w1·x1 + w2·x2 > b
• Can be generalized to n-dimensional space
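A tiny sketch of the two-dimensional decision rule; the weights are hypothetical placeholders standing in for learned parameters:

```python
def classify(x1, x2, w1, w2, b):
    """Two-dimensional linear classifier from the slide:
    label the document 'interesting' iff w1*x1 + w2*x2 > b."""
    return "interesting" if w1 * x1 + w2 * x2 > b else "not interesting"

# Hypothetical weights, e.g. learned from labelled documents.
print(classify(x1=0.8, x2=0.1, w1=1.0, w2=2.0, b=0.5))
```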
Decision Trees

• Mapping from observations


about an item to conclusions
about its target value
• Partitioning dataset into trees
• In basic setting two classes
appear at leaf nodes, e.g.
interesting or not interesting (or
„play“ / „don‘t play“ etc)
• Goal: Learn decision tree based
on user preferences or past user
behavior
• Ideal for structured, small data
Advantages Content-based Filtering

• No (or less pronounced) „new item“ problem


• Usually good scalability
• Because most approaches are model-based
• Often no explicit profile acquisition needed
• ratings not needed, transaction history sufficient
• Often no domain knowledge needed
• Item description sufficient
• Often quality of recommendation improves over time
• Better user model
Disadvantages Content-based Filtering

• Item model limited to analyzed features


• E.g. keywords in document or points-of-interests
relevant for mobile applications
• Features have to be available explicitly
• Difficult for non-textual items
• Overfitting, overspecialization, portfolio effect
• Recommendation based on similarity only
• No real new, unexpected items (diversity/serendipity
often poor)
• Cold start → ramp-up phase required
• Often still „new user“ problem
Knowledge-based Recommendations
Knowledge-Based Recommendation

• Different views on “knowledge”


• Similarity functions to determine matching degree
between query and item
• Case-based recommender system
• Utility-based RS, e.g. Multi-Attribute Utility Theory
(MAUT)
• Logic-based knowledge descriptions from domain
expert, e.g. Hard and soft constraints
Typical Approaches

• Constraint-based recommender systems


• Based on explicitly defined set of recommendation
rules (constraints)
• Retrieve items that fulfil recommendation rules and
user requirements
• Case-based recommender systems
• Based on different types of similarity measures
• Retrieve items that are similar to user requirements
• Both approaches are iterative, i.e. users can change
their requirements
Constraint-based Recommendation

• Knowledge base
• Connects user preferences and item features
• Variables: user model features (requirements), item
features (catalogue)
• Set of constraints
• Logical implications
• IF user requires A THEN proposed item should
possess feature B
• Hard and soft/weighted constraints
• Solution preferences
Finding a Set of Suitable Items I

• Rule-based filtering with conjunctive queries


• If the user chooses “low” price, recommend cameras with
price < 300
• If the user chooses “nature photography”, recommend
cameras with more than 10 megapixels
• Conjunctive queries
• Create a conjunctive query (“and” expression) from
the right hand side of the matching rules
• Possible compromises for the user can be efficiently
calculated in memory
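A sketch of rule-based filtering with a conjunctive query over a hypothetical camera catalogue; the attribute names and thresholds simply follow the example rules above:

```python
# Hypothetical camera catalogue; attribute names are illustrative only.
catalogue = [
    {"id": "c1", "price": 250, "megapixels": 12},
    {"id": "c2", "price": 450, "megapixels": 24},
    {"id": "c3", "price": 280, "megapixels": 8},
]

def recommend(requirements):
    """Build a conjunctive ("and") query from the right-hand sides of the rules."""
    conditions = []
    if requirements.get("price") == "low":
        conditions.append(lambda item: item["price"] < 300)
    if requirements.get("purpose") == "nature photography":
        conditions.append(lambda item: item["megapixels"] > 10)
    return [item for item in catalogue if all(cond(item) for cond in conditions)]

print(recommend({"price": "low", "purpose": "nature photography"}))  # -> [c1]
```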
Finding a Set of Suitable Items II

• Encode the problem as a Constraint Satisfaction Problem
(CSP): basically, a very simple model consisting of
variables, each with a defined and typically finite domain
• Constraints that describe allowed value assignments to
the variables
• The problem: Find an assignment of values to all
variables, such that no constraint is violated
• Solution search
• Problem is NP complete in general
• Many practically relevant problems
• Efficient solver implementations exist
Additional Reasoning

• Explicit nature of the problem encoding allows various


types of reasoning
• What if the user's requirements cannot be fulfilled?
What if the user requirements are inconsistent?
• Find a “relaxation” or “compromise”
• Show “no solutions found” message and ask user
• What if the knowledge base is inconsistent?
• Find a “diagnosis”
• Why was a certain item (not) recommended?
• Compute logical explanations
Ranking the Items

• A CSP/conjunctive query encoding does not entail a


ranking of the solutions
• Possible approaches in case of unsatisfiable
requirements
• Rank those items highly that fulfil most constraints
• If there are many solutions
• Use a distance function to determine the “closest”
solution
• Use a utility-model to rank the items, e.g. based on
Multi-Attribute Utility Theory (MAUT)
Interacting with Constraint-based Recommender

• User specifies his or her initial preferences


• All at once
• Incrementally in a wizard-style
• Interactive dialog
• User is presented with a set of matching items
• With explanation as to why a certain item was
recommended
• User might revise his or her requirements
• See alternative solutions
• Narrow down the number of matching items
Introduction Case-based Recommender

• A form of individual recommendation


• Structured information with a well defined set of
features and feature values
• Travel information represented by its price, duration,
accommodation, location, mode of transport, etc.
• Job information represented by the type of job, salary,
business category of each company, educational
level, experience, location, etc.
• Information is represented as cases and system
recommends cases that are most similar to the user’s
preferences
Case-Based Reasoning

• Case-based recommendation originates in Case-Based
Reasoning (CBR)
• CBR solves new problems by reusing the solutions
to problems that have been previously solved and
stored as cases in a case base
• Each case consists of a specification part, which
describes the problem and a solution part
• Solutions to similar prior problems are useful
starting point for new problem solving
• “Users will like items similar to the ones they liked
before.”
Hybrid Approaches
Collaborative Filtering

• Advantages
• Works well in practice
• No meta data about items needed
• Cross-domain recommendations, high diversity
• Disadvantages
• Cold start, new user and new item problems
• For all methods based on „learning“
• Effort required for users to give ratings
• But implicit feedback can be used instead
• Good in „taste“ related domains
Content-based Recommender Systems

• Advantages
• No (or less pronounced) „new item“ problem
• Good scalability
• Often no explicit user profile or ratings required
• Disadvantages
• Item model limited to explicitly analyzed features
• Effort required to build item model
• Overfitting, portfolio effect
• Poor diversity
• Good if structured item description is available
Hybrid Recommender Systems

• All recommenders have distinct pros and cons


• Idea: Combine various methods to avoid disadvantages
of single techniques
• Example: Combine CF with content-based approach
to avoid new user/item problems
• Combine different types
• It is also possible to combine different algorithms of
the same type, e.g. content-based
• Hybrid recommender: Combination of at least two
other recommender algorithms
• Different alternatives for the combination
Different Hybridization Designs (1)

• Parallel use of several systems


• Weighted: Score of different recommendation
components are combined (numerically)
• Switching: System chooses among recommendation
components and applies the selected one
• Mixed: Recommendations from different
recommenders are presented together
Weighted

• Determine a predicted score for each item
• Metric to express how much a user likes the item
• Determine candidate items to recommend
• Scoring: predict a score for the items
• Process
1. Apply the different techniques in parallel
2. Calculate a combination of the scores
• Simplest case: simple average or linear combination
• Or different weights for the different techniques
• Based on training experience
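A minimal sketch of a weighted hybrid as a linear combination of scores from two components; the scores and weights are made-up examples:

```python
def weighted_hybrid(score_lists, weights):
    """score_lists: one dict {item: score} per recommender.
    Combines them as a linear combination with the given weights."""
    combined = {}
    for scores, w in zip(score_lists, weights):
        for item, s in scores.items():
            combined[item] = combined.get(item, 0.0) + w * s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

cf_scores = {"i1": 0.9, "i2": 0.4, "i3": 0.7}   # e.g. from collaborative filtering
cb_scores = {"i1": 0.2, "i2": 0.8, "i4": 0.6}   # e.g. from content-based filtering
print(weighted_hybrid([cf_scores, cb_scores], weights=[0.6, 0.4]))
```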
Mixed

• Combine lists at the user interface level, merging based
on the ranking of items in each single technique
• Presentation of results from different techniques
side-by-side in a combined list
• Advantages
• Easy to combine various, very different methods
• Can also be done if scores from different recommenders
cannot be compared
• Disadvantages
• Best (overall) items may not be found
• More difficult to evaluate
Switching

• Selects a recommender based on the situation and/or
the user profile
• Different profile → different technique
• Components with different performance for some types
of users
• But always only one technique is used
• Requires a criterion for the switching decision
• Example
• Use CF if the active user made at least n ratings
• Use a CB filter otherwise
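A sketch of the switching criterion from the example, with the two components passed in as stand-in functions; the threshold n, names, and toy data are illustrative assumptions:

```python
def switching_recommend(user_id, ratings, cf_recommender, cb_recommender, n=20):
    """Use CF if the active user has rated at least n items, otherwise fall back
    to the content-based recommender (the example criterion from the slide)."""
    if len(ratings.get(user_id, {})) >= n:
        return cf_recommender(user_id)
    return cb_recommender(user_id)

# Hypothetical stand-ins for the two recommendation components:
cf = lambda uid: ["item_from_cf"]
cb = lambda uid: ["item_from_cb"]
ratings = {"alice": {f"i{k}": 5 for k in range(25)}, "bob": {"i1": 4}}
print(switching_recommend("alice", ratings, cf, cb))  # CF branch
print(switching_recommend("bob", ratings, cf, cb))    # CB fallback
```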
Different Hybridization Designs (2)

• Monolithic exploiting different features


• Feature Combination (FC): Features derived from
different knowledge sources are combined together
and given to a single recommendation algorithm
• Feature Augmentation (FA): One recommendation
technique is used to compute a feature or set of
features, which is then part of the input to the next
technique
Feature Combination (FC)

• Combine single features of the recommenders
• Do not combine the recommenders or their candidate lists
• Inject features of one source into a different source for
processing different data
• Features of the “contributing recommender” are used as
part of the “actual recommender”
• Example
• A CB filter utilizes ratings of other users in addition to
item metadata
Feature Augmentation (FA)

• Generates new features for each item
• Augmentation/combination is done offline
• Not raw features (as in FC), but the result of a computation
by the contributing recommender (FA)
• Example
• Apply a CB method to generate a score for an item
• Use the score as a rating in CF
Different Hybridization Designs (3)

• Pipelined invocation of different systems


• Cascade: Recommenders are given strict priority,
with the lower priority ones breaking ties in the
scoring of the higher ones
• Meta-level: One recommendation technique is
applied and produces some sort of model, which is
then the input used by the next technique
Cascade

• Secondary technique refines candidates from the primary
technique (hierarchical)
• Result space of candidate items from the 1st recommender
is the input for the 2nd
• Secondary technique used as „tie breaker“ only
• Example
1. Determine documents from the collection that fit the
user query (content-based approach)
2. Rank the results by CF or another method
Meta-Level

• A model learned by the contributing recommender serves
as input for the actual recommender
• The contributing recommender completely replaces the
original knowledge source with a learned model
• Problems
• Not always straightforward (or necessarily feasible) to
derive a meta-level hybrid from any given pair of
recommenders
• The contributing recommender has to produce some kind
of model that can be used as input by the actual
recommender
Possible Combinations

• Not all hybrid combinations of recommenders are


reasonable or possible
• Order is important, for example, a content-
based/collaborative feature augmentation hybrid is
different from one that applies the collaborative part
first and uses its features in a content-based
recommender
• Existing implementations focus on combining content-
based filter with collaborative filtering to reduce cold
start problems of CF
Readings

• Ricci, Francesco, Lior Rokach, and Bracha Shapira.
"Introduction to recommender systems handbook."
Recommender Systems Handbook. Springer, Boston, MA,
2011. 1-35.
• Konstan, Joseph A., and John Riedl. "Recommender
systems: from algorithms to user experience." User
Modeling and User-Adapted Interaction 22.1 (2012):
101-123.
