Unit III - Collaborative Filtering
A systematic approach, Nearest-neighbor collaborative filtering (CF), user-based and item-based CF,
components of neighborhood methods (rating normalization, similarity weight computation, and neighborhood
selection)
Suggested Activities:
• Practical learning – Implement collaborative filtering concepts
• Assignment on the security aspects of recommender systems
Suggested Evaluation Methods:
• Quiz on collaborative filtering
• Seminar on security measures of recommender systems
• Collaborative filtering filters information by using the interactions and data collected by the system from
other users. It’s based on the idea that people who agreed in their evaluation of certain items are likely
to agree again in the future.
• The concept is simple: when we want to find a new movie to watch we’ll often ask our friends for
recommendations. Naturally, we have greater trust in the recommendations from friends who share tastes
similar to our own.
• Most collaborative filtering systems apply the so-called similarity index-based technique. In the
neighborhood-based approach, a number of users are selected based on their similarity to the
active user. Inference for the active user is made by calculating a weighted average of the ratings
of the selected users.
• Collaborative-filtering systems focus on the relationship between users and items. The similarity of items
is determined by the similarity of the ratings of those items by the users who have rated both items.
• Collaborative filtering recommender systems have played a significant role in the rise of web services
and content platforms like Amazon, Netflix, YouTube, etc. in recent years. In this age of information,
knowing what the customer wants before they even know it themselves is nothing short of a superpower.
As the name suggests, recommender system algorithms are used to offer relevant content or products to
consumers based on their tastes or previous choices.
• User-based, which measures the similarity between target users and other users.
• Item-based, which measures the similarity between the items that target users rate or interact with and
other items.
Why do we need recommender systems?
• Back in 2006, Netflix offered a prize to solve a simple problem that had been around for
years. It was to find the best collaborative algorithm to predict user ratings for films
that they haven't watched yet, based on previous ratings of other movies.
• Today, e-commerce giants continue to try to solve this problem in a better way by
observing users’ past behavior to predict what other things the same user will like.
• Recommendations also help customers discover new products and offers that they’re
not explicitly looking for, thus speeding up the search process. This allows companies to
send out personalized newsletters via email that offer new TV shows, movies, products,
and services that are better suited for them.
• One of the most significant advantages of modern recommendation algorithms is their
ability to take implicit feedback and suggest new content/products, thus staying up-to-
date with customers’ preferences. This enables businesses to continue catering to
customers even if their tastes change over time.
• Similar users are divided into small clusters and are recommended new items according to the
preferences of that cluster. Let’s understand this with an easy movie recommendation example:
• Users 1 and 2 liked Movie 1. Since User 1 liked movies 2 and 4 a lot, there’s a high
chance of User 2 enjoying the same.
• Users 1 and 3 have opposite tastes.
• Users 3 and 4 both disliked Movie 2, so there’s a high chance User 4 will also dislike
Movie 4.
• User 3 might dislike Movie 1.
A systematic approach to collaborative filtering involves the following steps:
1. Data Collection: Gather user-item interaction data, such as ratings, reviews, purchases, or clicks.
2. Data Preprocessing: Clean and prepare the data for analysis, including handling missing values, outliers, and
data normalization.
3. User or Item Representation: Encode user preferences or item features into a suitable representation, such as
user-item matrices or item-attribute vectors.
4. Similarity Calculation: Compute similarity scores between users or items based on their respective
representations.
5. Nearest Neighbor Identification: Identify the nearest neighbors for each user or item based on the calculated
similarity scores.
6. Prediction Generation: Predict the rating or preference of a user for an item based on the ratings or preferences
of their nearest neighbors.
7. Evaluation and Optimization: Evaluate the performance of the CF algorithm using appropriate metrics and
refine the model parameters to improve accuracy.
8. Deployment and Maintenance: Integrate the CF algorithm into the recommender system and monitor its
performance over time, making adjustments as needed.
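As a minimal illustration of steps 1-3 (data collection, preprocessing, and representation), the sketch below builds a user-item matrix from (user, item, rating) triples and mean-centers it with NumPy. All data here is invented for illustration.

```python
import numpy as np

# Hypothetical interaction data: (user_id, item_id, rating) triples.
interactions = [
    (0, 0, 5.0), (0, 2, 3.0),
    (1, 0, 4.0), (1, 1, 2.0),
    (2, 1, 4.0), (2, 2, 1.0),
]
n_users, n_items = 3, 3

# Steps 1 and 3: gather interactions and encode them as a user-item
# matrix R, using NaN to mark missing (unobserved) ratings.
R = np.full((n_users, n_items), np.nan)
for u, i, r in interactions:
    R[u, i] = r

# Step 2 (normalization): mean-center each user's ratings so personal
# rating scales are comparable before similarity computation.
user_means = np.nanmean(R, axis=1, keepdims=True)
R_centered = R - user_means

print(R_centered)
```

The NaN entries carry through untouched, which keeps "not rated" distinct from a rating of zero in all later similarity computations.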
Effective collaborative filtering relies on the quality and quantity of user-item interaction data. Additionally, the
choice of similarity measures, nearest neighbor identification techniques, and prediction algorithms can
significantly impact the performance of the CF system.
• Unlike content-based approaches, which use the content of items previously rated by a user u, collaborative
(or social) filtering approaches rely on the ratings of u as well as those of other users in the system.
• The key idea is that the rating of u for a new item i is likely to be similar to that of another user v, if u
and v have rated other items in a similar way. Likewise, u is likely to rate two items i and j in a
similar fashion, if other users have given similar ratings to these two items.
• Collaborative filtering methods can be grouped in the two general classes of neighborhood and model-
based methods.
• In neighborhood-based (memory-based or heuristic-based) collaborative filtering, the user-item ratings
stored in the system are directly used to predict ratings for new items.
• This can be done in two ways, known as user-based or item-based recommendation.
o User-based systems, such as GroupLens (Social Computing Research at the University of
Minnesota), Bellcore Video (a library toolkit for constructing and browsing
libraries of digital video), and Ringo (Social Information Filtering for Music Recommendation),
evaluate the interest of a user u for an item i using the ratings for this item by other users, called
neighbors, that have similar rating patterns. The neighbors of user u are typically the users v
whose ratings on the items rated by both u and v, i.e. I_uv, are most correlated to those of u.
o Item-based approaches, on the other hand, predict the rating of a user u for an item i based on
the ratings of u for items similar to i. In such approaches, two items are similar if several users
of the system have rated these items in a similar fashion.
• Model-based approaches, in contrast, use these ratings to learn a predictive model. The general idea is to
model the user-item interactions with factors representing latent characteristics of the users and items in
the system, like the preference class of users and the category class of items. This model is then trained
using the available data, and later used to predict ratings of users for new items. Model-based
approaches for the task of recommending items are numerous and include Bayesian Clustering, Latent
Semantic Analysis, Latent Dirichlet Allocation, Maximum Entropy, Boltzmann Machines, Support
Vector Machines, and Singular Value Decomposition.
• Item-based CF algorithms recommend items to a user based on the similarity of items to items that
the user has interacted with in the past. The algorithm first identifies a set of similar items based
on their attributes or features. The similarity between items is typically measured using distance
metrics or similarity measures such as Jaccard similarity or cosine similarity. Once the similar
items are identified, the algorithm recommends to the active user items that are similar to items
that the user has liked in the past.
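A minimal sketch of this item-based approach, using cosine similarity over co-rating users on a made-up ratings matrix (this is an illustrative implementation, not any particular library's API):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two item columns, restricted to the
    users who rated both items."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    x, y = a[mask], b[mask]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0

# Rows = users, columns = items; NaN = not rated (toy data).
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, 4.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
])

n_items = R.shape[1]
# Pairwise item-item similarity matrix.
S = np.array([[cosine_sim(R[:, i], R[:, j]) for j in range(n_items)]
              for i in range(n_items)])

def predict(u, i, R, S):
    """Similarity-weighted average of user u's ratings over the items
    similar to item i."""
    rated = ~np.isnan(R[u])
    w = S[i, rated]
    if np.abs(w).sum() == 0:
        return np.nan
    return float(w @ R[u, rated] / np.abs(w).sum())

print(predict(0, 2, R, S))  # user 0's predicted rating for item 2
```

Items with no co-raters simply get similarity 0 and drop out of the weighted average.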
Advantages of Memory-Based Collaborative Filtering:
• Simplicity: Memory-based approaches are intuitive and simple to implement, making them a
viable option for solving problems with moderately big datasets in a short amount of time.
• Transparency: Memory-Based systems’ suggestions are easy to understand since they are
grounded in the user’s and the item’s direct interactions.
• Serendipity: Memory-based filtering has the potential to provide serendipitous
recommendations, in which users stumble onto previously unknown but potentially
fascinating content through shared relationships with other users
Drawbacks of Memory-Based Collaborative Filtering:
• Sparsity and Scalability: Since the frequency of user-item interactions tends to decrease as the dataset
expands, it becomes more difficult to discover trustworthy neighbours and might cause scaling
problems.
• Cold Start: Memory-Based systems struggle when there are too few interactions with new users or items
to make reliable suggestions.
• Limited Representation: Memory-based approaches may produce subpar results because they fail to
fully capture complex patterns in the data.
Model-based collaborative approach
• Instead of using a predetermined set of rules, model-based filters use a statistical or
machine learning model to identify and exploit hidden links and patterns in the data. These models
are then used to estimate users’ preferences for unseen items, based on training data of past
interactions between users and items.
• In the model-based approach, machine learning models are used to predict and rank interactions
between users and the items they haven’t interacted with yet. These models are trained using the
interaction information already available from the interaction matrix by deploying different
algorithms like matrix factorization, deep learning, clustering, etc.
Matrix factorization
Matrix factorization is used to generate latent features by decomposing the sparse user-item interaction matrix
into two smaller and dense matrices of user and item entities.
Matrix factorization is a popular technique used in Collaborative Filtering (CF) for recommendation systems.
CF is a method to predict a user's interests by collecting preferences or behavior information from many
users. Matrix factorization is particularly effective in collaborative filtering because it can handle the sparsity
of user-item interaction data.
Here's how matrix factorization works in the context of collaborative filtering:
1. Understanding the Data Matrix:
• Assume you have a matrix R representing user-item interactions. Rows correspond to users,
columns correspond to items, and the entries Rui represent user u's interaction (like rating,
purchase, or view) with item i. However, most entries are unknown (missing) because not all
users interact with all items.
2. Objective of Matrix Factorization:
• The goal of matrix factorization in CF is to decompose this sparse matrix R into the product
of two lower-dimensional matrices U and I
R ≈ U × Iᵀ
• Here, U (an m × k matrix) represents user embeddings, where each row u (out of m rows)
corresponds to a user's latent factors in a k-dimensional space.
• I (an 𝑛 × 𝑘 matrix) represents item embeddings, where each row i (out of n rows) corresponds
to an item's latent factors in the same k-dimensional space.
3. Matrix Factorization Process:
• Matrix factorization aims to learn the matrices U and I by minimizing the reconstruction error
between R and U × Iᵀ. This is typically achieved through optimization techniques like
gradient descent, alternating least squares, or stochastic gradient descent.
• The objective function can be formulated as:
minimize Σ_{(u,i)∈observed} (R_ui − (U × Iᵀ)_ui)² + λ(‖U‖² + ‖I‖²)
where λ is a regularization parameter to prevent overfitting.
4. Prediction and Recommendations:
Once the matrices U and I are learned, the missing entries in R can be estimated as U × Iᵀ.
Recommendations for a user u can be made by suggesting the items with the highest
predicted scores (entries in U × Iᵀ) that the user has not yet interacted with.
5. Key Advantages:
• Matrix factorization is effective in handling sparsity because it leverages latent factors to
capture user and item interactions.
• It can provide personalized recommendations even for users with very few interactions.
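The stochastic-gradient-descent variant of this optimization can be sketched as follows; the toy ratings, dimensions, and hyperparameters below are illustrative choices, not prescribed values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse ratings: (user, item, rating), with 4 users and 5 items.
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 3, 1),
            (2, 1, 1), (2, 4, 5), (3, 3, 4), (3, 4, 4)]
m, n, k = 4, 5, 2      # users, items, latent dimensions
lam, lr = 0.02, 0.01   # regularization strength, learning rate

U = rng.normal(scale=0.1, size=(m, k))  # user embeddings
I = rng.normal(scale=0.1, size=(n, k))  # item embeddings

# SGD on the regularized squared reconstruction error: for each observed
# entry, nudge both embeddings along the negative gradient.
for epoch in range(2000):
    for u, i, r in observed:
        err = r - U[u] @ I[i]
        U[u] += lr * (err * I[i] - lam * U[u])
        I[i] += lr * (err * U[u] - lam * I[i])

R_hat = U @ I.T  # dense matrix of predicted ratings
print(R_hat[0, 0])  # typically close to the observed rating 5
```

The unobserved entries of `R_hat` are the model's predictions and can be ranked per user to produce recommendations.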
Advantages of Model-Based Collaborative Filtering:
• Scalability: Model-Based approaches outperform Memory-Based ones in dealing with big and
sparse datasets because they learn underlying patterns without making direct comparisons of users
or items.
• Cold Start Mitigation: By using supplementary data or a hybrid method, model-based filtering may
help with the cold start issue.
• Flexibility: Model-based methods may use a wide variety of data and attributes, allowing for the
incorporation of context to enhance suggestions.
• Neighborhood-based recommender systems fall under the collaborative filtering umbrella and focus
on using behavioral patterns, such as movies that users have watched in the past, to identify similar
users (i.e., users who demonstrate similar preferences), or similar items (i.e., items that receive similar
interest from the same users).
• Nearest Neighbors Collaborative Filtering (NNCF) is a technique used in recommendation systems
to predict user preferences based on the similarity between users or items. It falls under the umbrella
of Collaborative Filtering (CF), which utilizes the collective wisdom of users to make
recommendations.
• User-based Collaborative Filtering (UBCF):
o Predict a user's preference for an item by finding similar users based on their historical
ratings.
• Item-based Collaborative Filtering (IBCF):
o Predict a user's preference for an item by finding similar items based on how users have rated
them.
Steps Involved in Nearest Neighbors Collaborative Filtering
Step-1: Data Representation: Represent user-item interactions as a matrix R, where rows correspond to users
and columns correspond to items. Each entry Rui represents a user u's rating (or interaction) with item i.
Step-2: Similarity Calculation: Compute similarity between users (for UBCF) or items (for IBCF) based on
their rating patterns. Common similarity metrics include cosine similarity, Pearson correlation, or Jaccard
similarity.
Step-3: Nearest Neighbors Selection: For a given user u (or item i), identify the k most similar users (or
items) based on the computed similarity scores.
Nearest Neighbors are typically selected based on the highest similarity scores.
Step-4: Prediction:
• UBCF Prediction: Predict user u's rating for item i by averaging the ratings of the k nearest
Neighbors who have rated item i, weighted by their similarity to user u.
• IBCF Prediction: Predict user u's rating for item i by combining ratings of items similar to item i,
weighted by the similarity between items.
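Steps 1-4 can be sketched end-to-end as follows, using Pearson correlation and a mean-centered weighted average on a toy matrix (all values invented for illustration):

```python
import numpy as np

# Step 1: user-item rating matrix (NaN = unrated), toy data.
R = np.array([
    [4.0, np.nan, 3.0, 5.0],
    [4.0, 2.0, np.nan, 5.0],
    [2.0, 5.0, 4.0, np.nan],
])

def pearson(u, v):
    """Step 2: Pearson correlation over the items co-rated by u and v."""
    mask = ~np.isnan(R[u]) & ~np.isnan(R[v])
    if mask.sum() < 2:
        return 0.0
    x, y = R[u, mask], R[v, mask]
    x, y = x - x.mean(), y - y.mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0

def predict(u, i, k=2):
    """Steps 3-4: keep the k most similar users who rated item i, then
    take a similarity-weighted average of their mean-centered ratings."""
    raters = [v for v in range(R.shape[0]) if v != u and not np.isnan(R[v, i])]
    sims = sorted(((pearson(u, v), v) for v in raters), reverse=True)[:k]
    num = sum(w * (R[v, i] - np.nanmean(R[v])) for w, v in sims)
    den = sum(abs(w) for w, _ in sims)
    return np.nanmean(R[u]) if den == 0 else np.nanmean(R[u]) + num / den

print(predict(0, 1))  # user 0's predicted rating for item 1
```

Mean-centering the neighbors' ratings before averaging is what makes the prediction robust to each neighbor's personal rating scale, as discussed in the normalization section later.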
• We refer to the technique that computes similar users as user-based and to the technique that focuses
on computing similar items as item-based.
• An example of the item-based technique is Netflix’s “Because you watched…” feature, which
recommends movies or shows based on examples that users previously showed interest in.
• An example of a user-based recommender system is booking.com, which recommends destinations
based on the historical behavior of other users with similar travel history.
Pipeline Overview
The image below summarizes the pipeline for our implementation of item-based and user-based recommender
systems in our declarative language, Rel. Without loss of generality, we focus on a movie recommendation
use case, where we are given interactions between users and movies.
Step 1: We convert user-item interactions to a bipartite graph.
The first step is to convert the input interactions data to a bipartite graph that contains two types of nodes:
Users and Movies, as shown in the image below.
The two node types are connected by an edge that we call watched. In Rel, Users and Movies are represented
by entity types, and their attributes, such as id and name, are represented by value types.
Step 4: Scoring
• Using the similarities calculated in the previous step, we compute the (user, movie) scores for all
pairs. We predict that a user will watch movies that are similar to the movies they have watched in the
past (item-based approach).
• The score for a pair (user, movie) indicates how likely it is for a user to watch the movie, and is calculated
as a similarity-weighted aggregate over the movies the user has already watched.
• We then sort the scores for every user in order to generate top-k recommendations.
Step 5: We evaluate performance using evaluation metrics that are widely used for recommender systems
User-Based Collaborative Filtering is a technique used to predict the items that a user might like on the basis
of ratings given to those items by other users who have similar tastes to those of the target user. Many websites
use collaborative filtering for building their recommendation systems.
Step 1: Finding the similarity of users to the target user U. Similarity for any two users ‘a’ and ‘b’ can be
calculated from the given formula,
Step 2: Prediction of the missing rating of an item. The target user might be very similar to some users and
not very similar to others. Hence, the ratings given to a particular item by more similar users should be given
more weight than those given by less similar users. This problem can be solved by using a weighted average
approach. In this approach, the rating of each user is multiplied by a similarity factor calculated using the
above-mentioned formula. The missing rating can then be calculated as
User    I1   I2   I3   I4   I5
Alice   5    4    1    4    ?
U1      3    1    2    3    3
U2      4    3    4    3    5
U3      3    3    1    5    4
Step 1: Calculating the similarity between Alice and all the other users. First we calculate the
average rating of each user over the items I1-I4, excluding I5 since it is not rated by Alice.
Therefore, we have

r̄_Alice = (5 + 4 + 1 + 4) / 4 = 14/4 = 3.5
r̄_U1 = (3 + 1 + 2 + 3) / 4 = 9/4 = 2.25
r̄_U2 = (4 + 3 + 4 + 3) / 4 = 14/4 = 3.5
r̄_U3 = (3 + 3 + 1 + 5) / 4 = 12/4 = 3
• Hence, we get the following mean-centered matrix (r_ui − r̄_u, over I1-I4):

User    I1     I2     I3     I4
Alice   1.5    0.5   -2.5    0.5
U1      0.75  -1.25  -0.25   0.75
U2      0.5   -0.5    0.5   -0.5
U3      0      0     -2      2
Now, we calculate the similarity between Alice and all the other users
Step 2: Predicting the rating of the app not rated by Alice. Now, we predict Alice's rating for the BBC News
App (I5), using the similarities sim(Alice, U1) = 0.301, sim(Alice, U2) = −0.33, and sim(Alice, U3) = 0.707:

r̂ = 3.5 + {0.301 × (3 − 2.25) + (−0.33) × (5 − 3.5) + 0.707 × (4 − 3)} / {|0.301| + |−0.33| + |0.707|}
  = 3.5 + {0.226 − 0.495 + 0.707} / 1.338
  ≈ 3.83
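This weighted-average arithmetic can be checked with a short script; the similarity weights, neighbor ratings, and means are copied from the worked example above:

```python
# Weighted-average prediction for Alice's missing rating of item I5.
alice_mean = 3.5
sims = [0.301, -0.33, 0.707]       # sim(Alice, U1), sim(Alice, U2), sim(Alice, U3)
neighbor_ratings = [3, 5, 4]       # U1, U2, U3 ratings of I5
neighbor_means = [2.25, 3.5, 3.0]  # mean ratings of U1, U2, U3

# Mean-centered, similarity-weighted average added back to Alice's mean.
num = sum(w * (r - m) for w, r, m in zip(sims, neighbor_ratings, neighbor_means))
den = sum(abs(w) for w in sims)
prediction = alice_mean + num / den
print(round(prediction, 2))  # 3.83
```

Note that the denominator sums absolute similarities, so a negatively correlated neighbor still contributes weight but pulls the estimate in the opposite direction.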
• Collaborative Filtering is a technique or a method to predict a user’s taste and find the items that a
user might prefer on the basis of information collected from various other users having similar tastes
or preferences.
• It takes into consideration the basic fact that if person X and person Y react similarly to certain
items, then they are likely to have similar opinions about other items too.
• The two most popular forms of collaborative filtering are:
• User Based: Here, we look for the users who have rated various items in the same way and then find
the rating of the missing item with the help of these users.
• Item Based: Here, we explore the relationship between the pair of items (the user who bought Y, also
bought Z). We find the missing rating with the help of the ratings given to the other items by the user.
• Item to Item Similarity: The similarity between item pairs can be found in different ways. One of the
most common methods is to use cosine similarity
• Prediction Computation: The second stage involves executing the recommendation system. It uses the
items (already rated by the user) that are most similar to the missing item to generate the rating.
• We hence try to generate predictions based on the ratings of similar products. We compute this using
a formula that estimates the rating for a particular item as a weighted sum of the user's ratings of the
other similar products.
User     Item_1   Item_2   Item_3
User_1   2        –        3
User_2   5        2        –
User_3   3        3        1
User_4   –        2        2
Example:
Step 1: Finding similarities of all the item pairs.
Form the item pairs. In this example the item pairs are (Item_1, Item_2), (Item_1, Item_3),
and (Item_2, Item_3). For each pair, find all the users who have rated both items, form a vector for each
item over those users, and calculate the similarity between the two items using the cosine formula stated
above.
Sim(Item1, Item2)
In the table, we can see only User_2 and User_3 have rated both items 1 and 2.
Thus, let I1 be the vector for Item_1 and I2 the vector for Item_2. Then,
I1 = 5U2 + 3U3 and,
I2 = 2U2 + 3U3
Sim(Item2, Item3)
In the table we can see only User_3 and User_4 have rated both items 2 and 3.
Thus, let I2 be the vector for Item_2 and I3 the vector for Item_3. Then,
I2 = 3U3 + 2U4 and,
I3 = 1U3 + 2U4
Sim(Item1, Item3)
In the table we can see only User_1 and User_3 have rated both items 1 and 3.
Thus, let I1 be the vector for Item_1 and I3 the vector for Item_3. Then,
I1 = 2U1 + 3U3 and,
I3 = 3U1 + 1U3
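Assuming the standard cosine formula is applied to these co-rating vectors, the three pair similarities can be computed as follows:

```python
import math

# Co-rated vectors from the table above (only users who rated both items).
pairs = {
    ("Item_1", "Item_2"): ([5, 3], [2, 3]),  # User_2, User_3
    ("Item_2", "Item_3"): ([3, 2], [1, 2]),  # User_3, User_4
    ("Item_1", "Item_3"): ([2, 3], [3, 1]),  # User_1, User_3
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

for (i, j), (a, b) in pairs.items():
    print(i, j, round(cosine(a, b), 3))
```

All three similarities come out positive because the co-rating vectors point in broadly the same direction; with mean-centering (adjusted cosine) some of them would turn negative.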
Advantages:
• Simple and intuitive approach to collaborative filtering.
• Effective in scenarios where users/items have sparse interactions.
• Can capture complex user-item relationships based on similarity metrics.
Challenges and Considerations:
• Data Sparsity: Nearest Neighbors CF may struggle with sparse datasets, where not all users
have rated many items.
• Scalability: Computing pairwise similarities can be computationally expensive for large
datasets.
• Cold Start Problem: Nearest Neighbors CF may face challenges when dealing with new users or
items with few ratings.
• Neighborhood methods, a class of collaborative filtering algorithms, rely on the concept of finding
a "neighborhood" of users or items similar to a target user or item. These methods are based on
the idea that users who have similar preferences tend to like similar items, and vice versa. The key
components of neighborhood methods include:
• Similarity Measure: Neighborhood methods use a similarity measure to quantify the similarity between
users or items. Common similarity measures include cosine similarity, Pearson correlation coefficient,
and Jaccard similarity. The choice of similarity measure can significantly affect the performance of the
algorithm.
• Neighborhood Selection: Once the similarity between users or items is computed, the next step is to
select a subset of neighbors that are most similar to the target user or item. This subset is known as the
neighborhood. The size of the neighborhood, i.e., the number of nearest neighbors to consider, can be
fixed or adaptive.
• Rating Prediction: After selecting the neighborhood, the algorithm predicts the rating of a target user
for an item by aggregating the ratings of its neighbors for that item. This can be done using various
aggregation functions such as weighted average, weighted sum, or regression-based methods.
• Item or User-Based Approach: Neighborhood methods can be either item-based or user-based. In item-
based approaches, similarities between items are computed based on the ratings given by users, and
recommendations are made by finding items similar to those the user has liked. In user-based approaches,
similarities between users are calculated based on their rating patterns, and recommendations are made
by identifying users similar to the target user and recommending items they have liked.
• Rating Normalization: To improve the accuracy of predictions, rating normalization techniques may be
applied. These techniques adjust the ratings to account for user or item biases, such as users who tend to
rate items more positively or items that are consistently rated higher or lower than others.
• Sparse Data Handling: Neighborhood methods often face the challenge of dealing with sparse data,
where many user-item pairs have no ratings. Various strategies such as neighborhood expansion,
imputation, or incorporating auxiliary information may be employed to handle sparse data and improve
recommendation quality.
• When it comes to assigning a rating to an item, each user has their own personal scale. Even if an
explicit definition of each of the possible ratings is supplied (e.g., 1=“strongly disagree”,
2=“disagree”, 3=“neutral”, etc.), some users might be reluctant to give high/low scores to items they
liked/disliked.
• Two of the most popular rating normalization schemes that have been proposed to convert
individual ratings to a more universal scale are mean-centering and Z-score
I. Mean-centering
Example: As shown in the figure, although Diane gave an average rating of 3 to the movies “Titanic” and
“Forrest Gump”, the user-mean-centered ratings show that her appreciation of these movies is in fact
negative. This is because her ratings are high on average, and so an average rating corresponds to a low
degree of appreciation. Differences are also visible when comparing the two types of mean-centering. For
instance, the item-mean-centered rating of the movie “Titanic” is neutral, instead of negative, due to the
fact that much lower ratings were given to that movie.
Therefore, we have

r̄_John = (5 + 1 + 2 + 2) / 4 = 10/4 = 2.5
r̄_Lucy = (1 + 5 + 2 + 5 + 5) / 5 = 18/5 = 3.6
r̄_Eric = (2 + 3 + 5 + 4) / 4 = 14/4 = 3.5
r̄_Diane = (4 + 3 + 5 + 3) / 4 = 15/4 = 3.75
Likewise, Diane’s appreciation for “The Matrix” and John’s distaste for “Forrest Gump” are more
pronounced in the item-mean-centered ratings.
Therefore, we have

r̄_Matrix = (5 + 1 + 2 + 4) / 4 = 12/4 = 3
r̄_Titanic = (1 + 5 + 3) / 3 = 9/3 = 3
r̄_Wall-E = (2 + 5 + 4) / 3 = 11/3 ≈ 3.67
r̄_Die Hard = (2 + 3 + 5) / 3 = 10/3 ≈ 3.33
r̄_Forrest Gump = (2 + 5 + 5 + 3) / 4 = 15/4 = 3.75
Criteria to be considered
When choosing between the implementation of a user-based and an item-based neighborhood
recommender system, five criteria should be considered:
• Accuracy: The accuracy of neighborhood recommendation methods depends mostly on the ratio between
the number of users and items in the system. The similarity between two users in user-based methods,
which determines the neighbors of a user, is normally obtained by comparing the ratings made by these
users on the same items. On the other hand, an item-based method usually computes the similarity between
two items by comparing ratings made by the same user on these items.
• Efficiency: The memory and computational efficiency of recommender systems also depends on the ratio
between the number of users and items. Thus, when the number of users exceeds the number of items, as is
most often the case, item-based recommendation approaches require much less memory and time to
compute the similarity weights (training phase) than user-based ones, making them more scalable. However,
the time complexity of the online recommendation phase, which depends only on the number of available
items and the maximum number of neighbors, is the same for user-based and item-based methods.
• Stability: The choice between a user-based and an item-based approach also depends on the frequency
and amount of change in the users and items of the system. If the list of available items is fairly static in
comparison to the users of the system, an item-based method may be preferable. On the contrary, in
applications where the list of available items is constantly changing, e.g., an online article recommender,
user-based methods could prove to be more stable.
• Justifiability: An advantage of item-based methods is that they can easily be used to justify a
recommendation. User-based methods are less amenable to this because the active user does not know
the other users serving as neighbors in the recommendation.
• Serendipity: In item-based methods, the rating predicted for an item is based on the ratings given to similar
items. Consequently, recommender systems using this approach will tend to recommend to a user items that
are related to those usually appreciated by this user.
Z-score normalization, also known as standard score normalization, is a statistical technique used to
rescale a distribution of values to have a mean of zero and a standard deviation of one. This
normalization technique is often applied to features or variables in data preprocessing to ensure that
they are on a comparable scale, which can be beneficial for certain machine learning algorithms.
The formula for calculating the Z-score of a data point x is:
z = (x − μ) / σ
Where:
• z is the Z-score.
• x is the original value.
• μ is the mean of the distribution.
• σ is the standard deviation of the distribution.
Here's how Z-score normalization works:
1. Calculate Mean and Standard Deviation: Compute the mean (μ) and standard deviation (σ) of the
data distribution.
2. Normalize Data: For each data point, subtract the mean (μ) and then divide by the standard deviation
(σ). This centers the data distribution around zero and scales it to have a standard deviation of one.
Z-score normalization is particularly useful in situations where the data distribution may have outliers
or exhibit skewness. By rescaling the data to have a mean of zero and a standard deviation of one,
Z-score normalization helps to mitigate the impact of outliers and ensures that all features contribute
equally to the analysis.
It's important to note that Z-score normalization assumes that the data distribution is approximately
Gaussian (normal). If the distribution is significantly non-normal, other normalization techniques may
be more appropriate. Additionally, Z-score normalization is sensitive to outliers, so preprocessing
steps such as outlier removal or transformation may be necessary before normalization.
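The two steps above can be sketched on one user's made-up rating vector:

```python
import numpy as np

ratings = np.array([4.0, 2.0, 5.0, 3.0, 1.0])  # one user's ratings (toy data)

# Step 1: compute mean and (population) standard deviation.
mu = ratings.mean()
sigma = ratings.std()

# Step 2: normalize each rating: z = (x − μ) / σ.
z = (ratings - mu) / sigma

print(z.round(2))
# After normalization the mean is 0 and the standard deviation is 1:
print(round(z.mean(), 10), round(z.std(), 10))
```

In a recommender, this is applied per user (or per item), with μ and σ computed only over that user's observed ratings.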
Consider two users A and B that both have an average rating of 3. Moreover, suppose that the
ratings of A alternate between 1 and 5, while those of B are always 3. A rating of 5 given to an
item by B is more exceptional than the same rating given by A, and, thus, reflects a greater
appreciation for this item.
While mean-centering removes the offsets caused by the different perceptions of an average rating,
Z-score normalization also considers the spread in the individual rating scales.
Once again, this is usually done differently in user-based than in item-based recommendation. In user-
based methods, the normalization of a rating r_ui divides the user-mean-centered rating by the standard
deviation σ_u of the ratings given by user u:

h(r_ui) = (r_ui − r̄_u) / σ_u
A user-based prediction of rating r_ui using this normalization approach would therefore be obtained
as

r̂_ui = r̄_u + σ_u · [ Σ_{v∈N_i(u)} w_uv (r_vi − r̄_v) / σ_v ] / [ Σ_{v∈N_i(u)} |w_uv| ]

where N_i(u) denotes the neighbors of u that have rated item i, and w_uv their similarity weights.
Likewise, the z-score normalization of r_ui in item-based methods divides the item-mean-centered
rating by the standard deviation of ratings given to item i:

h(r_ui) = (r_ui − r̄_i) / σ_i
• Comparing mean-centering with Z-score, the latter has the additional benefit of
considering the variance in the ratings of individual users or items.
rating scale has a wide range of discrete values or if it is continuous. On the other hand, because the
ratings are divided and multiplied by possibly very different standard deviation values, Z-score can be
more sensitive than mean-centering and, more often, predict ratings that are outside the rating scale.
• Finally, if rating normalization is not possible or does not improve the results, another possible
approach to remove the problems caused by the individual rating scale is preference-based filtering.
The particularity of this approach is that it focuses on predicting the relative preferences of users instead
of absolute rating values. Since the rating scale does not change the preference order for items,
predicting relative preferences removes the need to normalize the ratings.
where I_uv once more denotes the items rated by both u and v. A problem with
this measure is that it does not consider the differences in the mean and variance of the
ratings made by users u and v.
A popular measure that compares ratings where the effects of mean and variance
have been removed is the Pearson Correlation (PC) similarity. The Pearson correlation
coefficient between two users u and v is calculated as:

PC(u, v) = Σ_{i∈I_uv} (r_ui − r̄_u)(r_vi − r̄_v) / √( Σ_{i∈I_uv} (r_ui − r̄_u)² · Σ_{i∈I_uv} (r_vi − r̄_v)² )
25
To calculate the Pearson correlation similarity between User 1 and User 2 based on the provided ratings
for movies, we'll follow the steps outlined earlier:
So, the Pearson correlation similarity between User 1 and User 2 is approximately 0.546. This indicates a
moderate positive correlation between their ratings on the shared movies.
So, the Pearson correlation similarity between User 1 and User 3 is approximately -0.56. This negative
correlation suggests some dissimilarity between their ratings on the shared movie.
So, the computed value of -1.45 for User 1 and User 4 lies outside the valid Pearson range of [-1, 1], which signals an arithmetic error in the calculation; the negative sign nevertheless suggests dissimilarity between their ratings on the shared movies.
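A Pearson similarity computation can be sketched as follows (the two rating vectors are hypothetical, standing in for the co-rated movies of two users; a correct implementation always returns a value in [-1, 1]):

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length rating lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    num = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mean_u) ** 2 for a in u)) * \
          math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return num / den if den else 0.0

# Hypothetical co-rated movies of two users
user1 = [4, 5, 1]
user2 = [5, 4, 2]
print(pearson(user1, user2))  # a positive correlation, within [-1, 1]
```

Centering each vector on its own mean is exactly what removes the per-user rating-scale effects discussed above.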
III. Mean Squared Difference (MSD)
The Mean Squared Difference (MSD) is a statistical measure used to quantify the average squared
difference between two sets of values. It is commonly employed in various fields, including statistics,
machine learning, and signal processing, to assess the similarity or dissimilarity between datasets.
Example:
Suppose we have three users (User X, User Y, and User Z) and their ratings for four movies (Movie
1, Movie 2, Movie 3, and Movie 4). Here are the ratings:
User X: [4, 3, 5, 2]
User Y: [3, 2, 4, 3]
User Z: [5, 4, 3, 2]
To calculate the Mean Squared Difference (MSD) between User X and User Y for these movies, we
follow these steps:
• Compute the squared difference between corresponding ratings of User X and User Y for each movie.
• Calculate the mean of these squared differences.
Let's proceed with the calculations:
Squared differences:
Movie 1: (4 - 3)^2 = 1
Movie 2: (3 - 2)^2 = 1
Movie 3: (5 - 4)^2 = 1
Movie 4: (2 - 3)^2 = 1
Mean Squared Difference (MSD):
MSD(X, Y) = (1 + 1 + 1 + 1) / 4 = 4 / 4 = 1
So, the Mean Squared Difference between User X and User Y for these movies is 1.
Similarly, you can calculate the MSD between other pairs of users or for different sets of movies. MSD is
a simple metric that gives you an idea of how similar or dissimilar the ratings of two users are. A lower
MSD indicates greater similarity in ratings.
To calculate the Mean Squared Difference (MSD) between User X and User Z, we'll follow the same
steps:
Here are the ratings:
User X: [4, 3, 5, 2]
User Z: [5, 4, 3, 2]
Squared differences:
Movie 1: (4 - 5)^2 = 1
Movie 2: (3 - 4)^2 = 1
Movie 3: (5 - 3)^2 = 4
Movie 4: (2 - 2)^2 = 0
Mean Squared Difference (MSD):
MSD(X, Z) = (1 + 1 + 4 + 0) / 4 = 6 / 4 = 1.5
So, the Mean Squared Difference between User X and User Z for these movies is 1.5.
A lower MSD indicates greater similarity in ratings. In this case, the MSD between User X and User
Z is higher than the MSD between User X and User Y (which was 1), suggesting that User X's ratings
are more similar to User Y's ratings than to User Z's ratings.
To calculate the Mean Squared Difference (MSD) between User Y and User Z for the given movies, we'll
follow the same steps as before:
Compute the squared difference between corresponding ratings of User Y and User Z for each movie.
Calculate the mean of these squared differences.
Here are the ratings:
User Y: [3, 2, 4, 3]
User Z: [5, 4, 3, 2]
Squared differences:
Movie 1: (3 - 5)^2 = 4
Movie 2: (2 - 4)^2 = 4
Movie 3: (4 - 3)^2 = 1
Movie 4: (3 - 2)^2 = 1
Mean Squared Difference (MSD):
MSD(Y, Z) = (4 + 4 + 1 + 1) / 4 = 10 / 4 = 2.5
So, the Mean Squared Difference between User Y and User Z for these movies is 2.5.
This indicates that User Y and User Z have somewhat differing preferences across these movies, as
reflected by their ratings.
So, the Mean Squared Differences between the users are:
MSD(X, Y) = 1
MSD(X, Z) = 1.5
MSD(Y, Z) = 2.5
These values indicate the level of similarity between the ratings of each pair of users. Lower MSD values
indicate greater similarity.
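The MSD computations above can be reproduced in a few lines (the rating vectors are the ones from the example):

```python
import numpy as np

def msd(a, b):
    """Mean squared difference between two rating vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.mean((a - b) ** 2)

user_x = [4, 3, 5, 2]
user_y = [3, 2, 4, 3]
user_z = [5, 4, 3, 2]

print(msd(user_x, user_y))  # 1.0
print(msd(user_x, user_z))  # 1.5
print(msd(user_y, user_z))  # 2.5
```

MSD is a dissimilarity measure, so to use it as a similarity weight it is typically inverted, e.g. 1 / (MSD + 1).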
NEIGHBORHOOD SELECTION
The selection of the neighbors used in the recommendation of items is normally done in two
steps:
1) a global filtering step in which only the most likely candidates are kept;
2) a per-prediction step which chooses the best candidates for the prediction at hand.
PRE – FILTERING OF NEIGHBORS
The pre-filtering of neighbors is an essential step that makes neighborhood-based approaches
practicable by reducing the amount of similarity weights to store, and limiting the number of
candidate neighbors to consider in the predictions. There are several ways in which this can be
accomplished:
• Top-N filtering: For each user or item, only a list of the N nearest neighbors and their respective similarity weights is kept.
N should be chosen carefully to avoid problems with efficiency or accuracy: if N is too large, an excessive amount
of memory is required to store the neighborhood lists and predicting ratings is slow. On the other hand, too small a
value of N may reduce the coverage of the recommendation method, causing some items never to be
recommended.
• Threshold filtering: Instead of keeping a fixed number of nearest neighbors, this approach keeps all the neighbors whose
similarity weight has a magnitude greater than a given threshold w_min. While this is more flexible than the previous filtering
technique, as only the most significant neighbors are kept, the right value of w_min may be difficult to determine.
• Negative filtering: In general, negative rating correlations are less reliable than positive ones. Intuitively, this is because a
strong positive correlation between two users is a good indicator of their belonging to a common group (e.g., teenagers,
science-fiction fans, etc.). However, although a negative correlation may indicate membership in different groups, it does not
tell how different these groups are, or whether these groups are compatible for other categories of items. While experimental
investigations have found negative correlations to provide no significant improvement in prediction accuracy, whether
such correlations can be discarded depends on the data.
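The three pre-filtering strategies can be sketched over a hypothetical map of similarity weights (the neighbor names and weight values below are illustrative only):

```python
def top_n_filter(weights, n):
    """Keep only the N neighbors with the largest similarity magnitude."""
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:n])

def threshold_filter(weights, w_min):
    """Keep neighbors whose similarity magnitude exceeds w_min."""
    return {u: w for u, w in weights.items() if abs(w) > w_min}

def negative_filter(weights):
    """Discard neighbors with negative correlation."""
    return {u: w for u, w in weights.items() if w > 0}

# Hypothetical similarity weights of candidate neighbors
weights = {"u1": 0.9, "u2": -0.8, "u3": 0.4, "u4": 0.1}
print(top_n_filter(weights, 2))        # keeps u1 and u2
print(threshold_filter(weights, 0.3))  # keeps u1, u2 and u3
print(negative_filter(weights))        # keeps u1, u3 and u4
```

Note that top-N and threshold filtering rank by magnitude (so strong negative correlations survive), whereas negative filtering drops them regardless of strength; in practice the strategies are often combined.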
NEIGHBORS IN THE PREDICTIONS
Calculate Similarity: Use a similarity metric (such as Pearson correlation, cosine similarity, or Jaccard
similarity) to measure the similarity between users or items based on their ratings or features.
Identify Neighbors: Select the top-k most similar users or items as neighbors. The value of k can be
predefined or determined dynamically.
Make Predictions: Use the ratings of the neighbors to predict ratings for the target user or item. This can be
done by taking a weighted average of the ratings given by neighbors, where the weights are the similarities
between the neighbors and the target user (or item).
Recommendation: Once predictions are made, recommend items with the highest predicted ratings to the
target user.
Suppose we have three users (User A, User B, and User C) and their ratings for movies (Movie 1, Movie 2,
and Movie 3). We want to predict the rating of Movie 3 for User A.
• Identify Neighbors: Let's say we choose User B and User C as neighbors based on their high
similarity scores.
• Make Predictions: We can predict the rating of Movie 3 for User A by taking a weighted average of
the ratings given by User B and User C for Movie 3, where the weights are their similarities with
User A.
• Recommendation: Recommend Movie 3 to User A if the predicted rating is above a certain threshold.
• In practice, recommendation systems use more sophisticated algorithms and techniques, but the basic
idea remains the same: identify similar users or items as neighbors and use their preferences to make
predictions or recommendations.
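The prediction step described above can be sketched as a similarity-weighted average (the neighbor similarities and ratings below are hypothetical, standing in for Users B and C as neighbors of User A for Movie 3):

```python
def predict_rating(neighbor_sims, neighbor_ratings):
    """Weighted average of neighbors' ratings, weighted by similarity."""
    num = sum(neighbor_sims[u] * neighbor_ratings[u] for u in neighbor_sims)
    den = sum(abs(s) for s in neighbor_sims.values())
    return num / den if den else 0.0

# Hypothetical: similarities of Users B and C with User A,
# and their ratings of the target movie
sims = {"B": 0.9, "C": 0.7}
ratings = {"B": 4, "C": 5}
print(predict_rating(sims, ratings))  # ≈ 4.44, closer to B's rating
```

Dividing by the sum of absolute weights keeps the prediction on the original rating scale, and the more similar neighbor (B) pulls the prediction toward its own rating.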
SECURITY ASPECTS OF RECOMMENDER SYSTEMS
Collaborative filtering is a popular technique used in recommender systems to predict the preferences
of a user by leveraging the preferences of other similar users. There are two main types of collaborative
filtering: user-based collaborative filtering and item-based collaborative filtering.
import numpy as np

class CollaborativeFiltering:
    def __init__(self, ratings):
        self.ratings = ratings
        self.similarity_matrix = self.calculate_similarity_matrix()

    def calculate_similarity(self, u, v):
        # Cosine similarity between two rating vectors
        norm = np.linalg.norm(u) * np.linalg.norm(v)
        return np.dot(u, v) / norm if norm else 0.0

    def calculate_similarity_matrix(self):
        n = len(self.ratings)
        similarity_matrix = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if i == j:
                    similarity_matrix[i][j] = 1
                else:
                    similarity_matrix[i][j] = self.calculate_similarity(
                        self.ratings[i], self.ratings[j])
        return similarity_matrix

    def predict_ratings(self, user_id):
        # Predict ratings for user_id as a weighted average of the other
        # users' ratings, weighted by their similarity to this user
        sims = self.similarity_matrix[user_id].copy()
        sims[user_id] = 0.0  # exclude the target user themselves
        sim_sum = np.sum(np.abs(sims))
        if sim_sum == 0:
            return np.zeros(self.ratings.shape[1])
        return sims @ self.ratings / sim_sum

# Example usage
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
    [0, 1, 5, 0],
])
cf = CollaborativeFiltering(ratings)
user_id = 0
predicted_ratings = cf.predict_ratings(user_id)
print("Predicted ratings for user", user_id, ":", predicted_ratings)
This implementation computes the similarity between users based on their rating vectors using the
cosine similarity metric. Then, it predicts the ratings for a given user by considering the ratings of
similar users weighted by their similarity scores. Finally, it prints the predicted ratings for a specified
user.
Now, we can predict the ratings for User 5 by taking a weighted average of the ratings of User 4 for
the movies:
Predicted rating for Movie 1 = (Similarity(User 5, User 4) × Rating(User 4, Movie 1)) / Similarity sum
= (0.8944 × 0) / 1.4488 ≈ 0
Predicted rating for Movie 2 = (0.8944 × 1) / 1.4488 ≈ 0.6173
Predicted rating for Movie 3 = (0.8944 × 5) / 1.4488 ≈ 3.0867
Predicted rating for Movie 4 = (0.8944 × 4) / 1.4488 ≈ 2.4693
So, the predicted ratings for User 5 would be approximately 0 for Movie 1, 0.6173 for Movie 2, 3.0867
for Movie 3, and 2.4693 for Movie 4. These ratings are based on the ratings of User 4, who is the most
similar to User 5.