Collaborative Filtering
Contents
• Recommender systems are software tools and techniques that provide suggestions, such as useful
recommendations on Amazon, news recommendations on online news websites, and so on.
• The main goal of recommender systems is to provide suggestions to online users so that they can make better
decisions. To do so, a system takes into consideration the available digital footprint of the user and
information about a product, such as specifications, feedback from users, and comparisons with other
products, before making recommendations.
Recommender System – Landscape – Types
Recommender System Methods
• Personalized
  • Content-Based
  • Collaborative Filtering Based (CF)
    • Memory-based: User-based CF, Item-based CF
    • Model-based: Matrix Factorization using SVD, Deep Learning
  • Hybrid
• Non-Personalized
  • Popularity Based
  • Recency Based
  • Sorting/Filtering based on Most-valued
  • Genre/Category/Topic Based
Recommender System – Rating Data
Rating Data of Social Media and Web Platforms
• Rating data is widely used across web and social media platforms:
• Video streaming (e.g., Netflix, YouTube, Prime Video)
• Hotels & Hospitality (e.g., Airbnb, Booking.com, Makemytrip.com)
• Taxi/Cab (Ola, Uber), Books (Google Books, Goodreads.com)
• Vendors (B2B platforms like IndiaMart or Alibaba)
• General consumer goods (Amazon.com, Flipkart)
• Doctors & Hospitals (Yelp.com), Movies (IMDb, Rotten Tomatoes)
• Social media posts & content (Twitter, Facebook)
• Digital news websites and articles (Times of India, Wall Street Journal), among others.
Rating Data
• Ratings can range from 1 to 5 or 1 to 7, or take forms such as Like, Yay-Nay-Love, Like/Dislike,
Thumbs-up/Thumbs-down, Yes/No, or Okay/Not-Okay.
• Rating scales can also change for business reasons:
https://www.cnet.com/tech/services-and-software/netflix-adds-two-thumbs-up-rating-for-content-you-absolutely-love/
• What do ratings represent?
• Feelings of liking or disliking an activity/product/service/experience, measured in degrees of
rating.
• They reflect human behavior and become a snapshot of past likes and dislikes.
Past behavior can be used to predict future behavior.
• This is the foundation for using ratings as data for analytics. The most common
application is to generate recommendations based on past behavior that may be
useful and apt for the user's current behavior.
• A system that incorporates various ways of generating recommendations is called a
recommender system. Many social media and web platforms use one to offer users more
personalization options and to boost user activity on their platforms.
Rating Data Uses
Personalized Recommendations: Rating data is often leveraged to
provide personalized recommendations to users. By analyzing the
preferences and patterns in users' rating behaviors, platforms can
offer tailored suggestions and content based on individual tastes and
preferences.
• Similarity measurements
  • Euclidean distance
  • Cosine Similarity
  • Pearson Correlation coefficient
  • Jaccard Similarity
Mathematically
For two n-dimensional rating vectors x and y, the Euclidean distance can be computed from dot products (matrix
multiplication) using the formula
• dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y))
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html#sklearn.metrics.pairwise.euclidean_distances
• Then compute Euclidean Similarity = 1/(1 + Euclidean Distance). Values lie between 0 and 1: exactly 1 for
identical vectors, and closer to zero the more dissimilar the vectors are.
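The formula above can be sketched in plain Python as follows. The helper names and the two rating vectors are illustrative assumptions, not part of the original slides; the scikit-learn function linked above computes the same distance for whole matrices.

```python
import math

def dot(a, b):
    """Dot product of two equal-length rating vectors."""
    return sum(p * q for p, q in zip(a, b))

def euclidean_distance(x, y):
    """Euclidean distance via the dot-product expansion quoted in the text."""
    squared = dot(x, x) - 2 * dot(x, y) + dot(y, y)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))

def euclidean_similarity(x, y):
    """Map a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(x, y))

# Illustrative ratings of two users on two items (assumed values).
user_a = [2, 3]
user_b = [1, 2]
dist_ab = euclidean_distance(user_a, user_b)
sim_ab = euclidean_similarity(user_a, user_b)
print(dist_ab, sim_ab)
```

Identical vectors give a distance of 0 and therefore a similarity of exactly 1.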
Similarity Measures - Cosine Similarity
Cosine similarity ranges from 0 to 1 when all ratings are non-negative. If negative ratings are allowed in the
user-item matrix, it ranges from -1 to 1, with 1 indicating the most similar items.
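A minimal sketch of cosine similarity in plain Python, to make the two ranges concrete. The function name, the zero-vector convention, and the sample vectors are assumptions made for illustration.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two rating vectors."""
    dot_xy = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    if norm_x == 0 or norm_y == 0:
        # Assumed convention: a vector with no ratings is similar to nothing.
        return 0.0
    return dot_xy / (norm_x * norm_y)

# Non-negative ratings stay within [0, 1].
same_direction = cosine_similarity([1, 2], [2, 4])
orthogonal = cosine_similarity([1, 0], [0, 1])
# Allowing negative ratings extends the range down to -1.
opposite = cosine_similarity([1, -1], [-1, 1])
print(same_direction, orthogonal, opposite)
```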
• The Jaccard similarity measures the similarity between two sets of data
to see which members are shared and distinct. The Jaccard similarity is
calculated by dividing the number of observations in both sets by the
number of observations in either set.
• In other words, the Jaccard similarity can be computed as the size of the
intersection divided by the size of the union of two sets.
• Jaccard measures work on binary rating data. Let us recompute the user-item rating matrix with the
following assumptions:
• Any rating greater than or equal to 2 indicates that the user likes the item, so it is treated as
one.
• Any rating less than 2 indicates that the user does not like the item, so it is treated as zero.
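The binarization and the intersection-over-union ratio can be sketched as below. The function names and the two five-item rating rows are illustrative assumptions; the threshold of 2 follows the rule stated above.

```python
def liked_items(ratings, threshold=2):
    """Indices of items whose rating counts as 'liked' (rating >= threshold)."""
    return {i for i, r in enumerate(ratings) if r >= threshold}

def jaccard_similarity(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    if not union:
        # Assumed convention: two users with no liked items have similarity 0.
        return 0.0
    return len(set_a & set_b) / len(union)

# Illustrative ratings on five items; 0 stands for 'not rated'.
user_1 = [5, 4, 2, 0, 3]
user_2 = [2, 0, 3, 4, 5]
liked_1 = liked_items(user_1)   # items 0, 1, 2, 4
liked_2 = liked_items(user_2)   # items 0, 2, 3, 4
jaccard_12 = jaccard_similarity(liked_1, liked_2)
print(jaccard_12)
```

Here three items are liked by both users and five items are liked by at least one of them, so the ratio is 3/5.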
• The Pearson correlation coefficient measures the linear relationship between two datasets.
Strictly speaking, the significance test (p-value) for Pearson's correlation assumes that each dataset is
normally distributed.
Similarity Measures – Pearson Correlation - Example
• Let the user-item rating matrix be
User\Item Item1 Item2
A 2 3
B 1 2
• Pearson correlation similarity = 1 for the above user-item rating matrix, because B's ratings are exactly
A's ratings minus 1.
• Pearson correlation similarity ranges from -1 to 1, where -1 means a perfect negative linear relationship
(opposite tastes) and 1 means complete similarity between the users.
In Python, the Pearson correlation is available through scipy.stats:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html
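As a cross-check of the example, here is a plain-Python sketch of the coefficient for users A and B from the matrix above. The function name is an assumption; `scipy.stats.pearsonr`, linked above, should report the same coefficient for these inputs.

```python
import math

def pearson_similarity(x, y):
    """Pearson correlation coefficient of two equal-length rating vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    var_x = sum((p - mean_x) ** 2 for p in x)
    var_y = sum((q - mean_y) ** 2 for q in y)
    if var_x == 0 or var_y == 0:
        # The coefficient is undefined when a user gives every item the same rating.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

user_a = [2, 3]   # ratings of user A for Item1, Item2
user_b = [1, 2]   # ratings of user B for Item1, Item2
r_ab = pearson_similarity(user_a, user_b)
r_opposite = pearson_similarity([1, 2, 3], [3, 2, 1])
print(r_ab, r_opposite)
```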
Collaborative Filtering
Collaborative filtering recommender systems
• Collaborative filtering is a branch of recommendation that takes into account the information about different
users. The word "collaborative" refers to the fact that users collaborate with each other to recommend
items. In fact, the algorithms take into account user purchases and preferences in the form of ratings.
• The task performed here is filtering items from a large set of alternatives collaboratively, using users'
preferences expressed as user ratings for the items.
• Underlying assumption: if two users shared the same interests in the past (i.e., they liked the same items
such as books, posts, movies, music, etc.), they will also have similar tastes in the future.
• The collaborative filtering approach considers only user preferences (rating data) and does not take into
account the features or contents of the items being recommended. This approach requires a large set of
user preferences for more accurate results.
• Item-based collaborative filtering: This recommends to a user the items that are most similar to the
user's purchases.
• User-based collaborative filtering: This recommends to a user the items that are the most preferred by
similar users.
There are missing values shown by ‘-’:
User\Item  Baahubali1  Baahubali2  KGF1  KGF2  Kantara
Calvin     5           4           2     -     3
Hobbes     -           3           4     5     -
Peanut     2           -           3     4     5
Bhim       4           2           -     1     -
New_User   1           -           2     -     -
‘-’ are replaced by zero values for calculations:
User\Item  Baahubali1  Baahubali2  KGF1  KGF2  Kantara
Calvin     5           4           2     0     3
Hobbes     0           3           4     5     0
Peanut     2           0           3     4     5
Bhim       4           2           0     1     0
New_User   1           0           2     0     0
Let the user-item rating matrix be given for four existing users and a "New_User" of the web app, with five items
(movies). Some ratings are not available because, in a real-world scenario, not every user rates every movie and
not every movie is rated by all users.
Recommendation task: Use Collaborative Filtering to generate Top-N recommendations for the items not rated by the
New_User
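One possible item-based CF (IBCF) sketch for this task is shown below. It assumes cosine similarity between the zero-filled item columns and scores each unrated movie as the similarity-weighted average of the ratings New_User has already given. The function names and these scoring choices are assumptions for illustration, not the method prescribed by the slides.

```python
import math

movies = ["Baahubali1", "Baahubali2", "KGF1", "KGF2", "Kantara"]
# Zero-filled user-item matrix from the text (0 means 'not rated').
ratings = {
    "Calvin":   [5, 4, 2, 0, 3],
    "Hobbes":   [0, 3, 4, 5, 0],
    "Peanut":   [2, 0, 3, 4, 5],
    "Bhim":     [4, 2, 0, 1, 0],
    "New_User": [1, 0, 2, 0, 0],
}

def cosine(x, y):
    dot_xy = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot_xy / norm if norm else 0.0

def item_vector(j):
    """Column j of the matrix: every user's rating for movie j."""
    return [row[j] for row in ratings.values()]

def ibcf_scores(user):
    user_row = ratings[user]
    rated = [j for j, r in enumerate(user_row) if r > 0]
    scores = {}
    for j, r in enumerate(user_row):
        if r > 0:
            continue  # only score movies the user has not rated
        pairs = [(cosine(item_vector(j), item_vector(i)), user_row[i]) for i in rated]
        weight_sum = sum(sim for sim, _ in pairs)
        if weight_sum > 0:
            scores[movies[j]] = sum(sim * r_i for sim, r_i in pairs) / weight_sum
    return scores

ibcf = ibcf_scores("New_User")
top_n_ibcf = sorted(ibcf, key=ibcf.get, reverse=True)
print(top_n_ibcf)
```

Because New_User has rated only two movies with low scores, all predicted ratings are low; the ranking, not the absolute value, drives the Top-N list.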
User-based collaborative filtering (UBCF)
• Measure how similar each user is to the new one. As in item-based CF (IBCF), popular similarity measures are
correlation and cosine.
• Rate the items purchased by the most similar users. The rating is the average rating among similar
users:
• Take the weighted average rating, using the similarities as weights.
• Optionally, use relative differences (for example, ratings measured relative to each user's average) to
remove the bias of harsh raters and generous raters.
• If a new item hasn't been purchased by anyone, it will never be recommended. IBCF matches items
that have been purchased by the same users, so it won't match the new item with any of the others.
UBCF recommends to each user items purchased by similar users, and no one has purchased the new
item. So, neither algorithm will recommend it to anyone.
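The UBCF steps can be sketched on the movie matrix as follows. This is a minimal sketch under stated assumptions: cosine similarity on the zero-filled rows, all four existing users treated as neighbours, and a zero counted as "not rated" when averaging. The function names are hypothetical, and the optional bias correction is left out.

```python
import math

movies = ["Baahubali1", "Baahubali2", "KGF1", "KGF2", "Kantara"]
# Zero-filled ratings of the existing users (0 means 'not rated').
ratings = {
    "Calvin": [5, 4, 2, 0, 3],
    "Hobbes": [0, 3, 4, 5, 0],
    "Peanut": [2, 0, 3, 4, 5],
    "Bhim":   [4, 2, 0, 1, 0],
}
new_user = [1, 0, 2, 0, 0]

def cosine(x, y):
    dot_xy = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot_xy / norm if norm else 0.0

def ubcf_scores(target, others):
    # Step 1: similarity of every existing user to the target user.
    sims = {name: cosine(target, row) for name, row in others.items()}
    scores = {}
    for j, movie in enumerate(movies):
        if target[j] > 0:
            continue  # only score movies the target has not rated
        # Step 2: similarity-weighted average over users who rated movie j.
        raters = [(sims[name], row[j]) for name, row in others.items() if row[j] > 0]
        weight_sum = sum(sim for sim, _ in raters)
        if weight_sum > 0:
            scores[movie] = sum(sim * r for sim, r in raters) / weight_sum
        # A movie nobody has rated gets no score, so it is never recommended.
    return scores

ubcf = ubcf_scores(new_user, ratings)
top_n_ubcf = sorted(ubcf, key=ubcf.get, reverse=True)
print(top_n_ubcf)
```

With these assumptions the ranking differs from what an item-based approach may produce on the same data, which is one reason the choice of CF variant is usually validated empirically.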
Hybrid recommender systems
• Hybrid systems combine various recommender systems so that the disadvantages of one system are
offset by the advantages of another, yielding a more robust system.
• For example, collaborative filtering fails when new items don't have ratings, while content-based
systems can use feature information about the items. Combining the two allows new items to be
recommended more accurately and efficiently.
• Considerations?
• What techniques should be combined to achieve the business solution?
• How should we combine various techniques and their results for better predictions?
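One simple answer to the second question is a weighted blend, sketched below. The function name, the weight of 0.7, the item names, and the scores are all hypothetical values chosen for illustration; other combination strategies (switching, cascading, stacking) are equally possible.

```python
def hybrid_score(cf_score, content_score, cf_weight=0.7):
    """Blend a collaborative-filtering score with a content-based score.

    cf_score may be None for a new item that has no ratings yet; in that
    case the content-based score is used on its own.
    """
    if cf_score is None:
        return content_score
    return cf_weight * cf_score + (1 - cf_weight) * content_score

# Hypothetical (cf_score, content_score) pairs on a 1-5 scale.
candidates = {
    "established_movie": (4.0, 3.0),
    "brand_new_movie": (None, 3.5),   # no ratings yet, so CF has no score
}
hybrid = {name: hybrid_score(cf, content) for name, (cf, content) in candidates.items()}
print(hybrid)
```

The new item still receives a score from its content features, which is the cold-start benefit described in the bullets above.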
Evaluation
Evaluation techniques
• Is the system efficient or accurate? On what basis do we state that the system is good?
• Is the model overfitting or underfitting? How well does the model fit future data or test
data?
• You can use cross-validation and build a confusion matrix, or use RMSE values.
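RMSE on held-out ratings can be sketched as below. The function name and the three actual/predicted rating pairs are assumed values for illustration, not results from any model in this deck.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between held-out ratings and predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical held-out ratings and the model's predictions for them.
actual_ratings = [4, 3, 5]
predicted_ratings = [3, 3, 4]
test_rmse = rmse(actual_ratings, predicted_ratings)
print(test_rmse)
```

A training RMSE far below the test RMSE suggests overfitting; both being high suggests underfitting.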
• Multi-Modal Recommendations:
• Cross-Platform Consistency: If recommendations are provided across multiple platforms (website, app,
smart TV), ensure a consistent experience and recommendations.
Top-N recommendations – Considerations – Data Privacy & Data Management