You are on page 1of 9

Encyclopedia of Machine Learning Chapter No: 00338 Page Proof Page 1 22-4-2010 #1

R
decision making. With burgeoning consumerism buo-
Recommender Systems
yed by the emergence of the web, buyers are being pre-
sented with an increasing range of choices while sellers
Prem Melville, Vikas Sindhwani
are being faced with the challenge of personalizing their
Machine Learning,
advertising efforts. In parallel, it has become common
IBM T. J. Watson Research Center,
for enterprises to collect large volumes of transactional
Route /P.O. Box ,
data that allows for deeper analysis of how a customer
 Kitchawan Rd,
base interacts with the space of product offerings. Rec-
Yorktown Heights,
ommender systems have evolved to fulfill the natural
NY , USA
dual need of buyers and sellers by automating the gen-
eration of recommendations based on data analysis.
Definition The term “collaborative filtering” was introduced in
The goal of a recommender system is to generate mean- the context of the first commercial recommender sys-
ingful recommendations to a collection of users for tem, called Tapestry (Goldberg, Nichols, Oki, & Terry,
items or products that might interest them. Sugges- ), which was designed to recommend documents
tions for books on Amazon, or movies on Netflix, drawn from newsgroups to a collection of users. The
are real-world examples of the operation of industry- motivation was to leverage social collaboration in order
strength recommender systems. The design of such to prevent users from getting inundated by a large vol-
recommendation engines depends on the domain and ume of streaming documents. Collaborative filtering,
the particular characteristics of the data available. For which analyzes usage data across users to find well-
example, movie watchers on Netflix frequently provide matched user-item pairs, has since been juxtaposed
ratings on a scale of  (disliked) to  (liked). Such a data against the older methodology of content filtering,
source records the quality of interactions between users which had its original roots in information retrieval.
and items. Additionally, the system may have access to In content filtering, recommendations are not “collab-
user-specific and item-specific profile attributes such as orative” in the sense that suggestions made to a user
demographics and product descriptions, respectively. do not explicitly utilize information across the entire
Recommender systems differ in the way they ana- user-base. Some early successes of collaborative filter-
lyze these data sources to develop notions of affinity ing on related domains included the GroupLens sys-
between users and items, which can be used to identify tem (Resnick, Neophytos, Bergstrom, Mitesh, & Riedl,
well-matched pairs. ⊲Collaborative Filtering systems b).
analyze historical interactions alone, while ⊲Content- As noted in Billsus and Pazzani (), initial
based Filtering systems are based on profile attributes; formulations for recommender systems were based
and hybrid techniques attempt to combine both of these on straightforward correlation statistics and predictive
designs. The architecture of recommender systems and modeling, not engaging the wider range of practices
their evaluation on real-world problems is an active area in statistics and machine learning literature. The col-
of research. laborative filtering problem was mapped to classifi-
cation, which allowed dimensionality reduction tech-
Motivation and Background niques to be brought into play to improve the quality
Obtaining recommendations from trusted sources is of the solutions. Concurrently, several efforts attempted
a critical component of the natural process of human

C. Sammut, G. Webb (eds.), Encyclopedia of Machine Learning, DOI ./----, © Springer-Verlag Berlin Heidelberg, 
Encyclopedia of Machine Learning Chapter No: 00338 Page Proof Page 2 22-4-2010 #2

 Encyclopedia of Machine Learning

to combine content-based methods with collaborative Items


1 2 ... i ... m
filtering, and to incorporate additional domain knowl- 1 5 3 1 2
edge in the architecture of recommender systems. 2 2 4
Further research was spurred by the public avail- Users : 5
u 3 4 2 1
ability of datasets on the web, and the interest gener- : 4
ated due to direct relevance to e-commerce. Netflix, n 3 2
an online streaming video and DVD rental service,
a 3 5 ? 1
released a large-scale dataset containing  million rat-
ings given by about half-a-million users to thousands of Recommender Systems. Figure . User ratings matrix,
movie titles, and announced an open competition for where each cell ru,i corresponds to the rating of user u for
the best collaborative filtering algorithm in this domain. item i. The task is to predict the missing rating ra,i for the
Matrix Factorization (Bell, Koren, & Volinsky, ) active user a
techniques rooted in numerical linear algebra and sta-
tistical matrix analysis emerged as a state-of-the-art ● Hybrid approaches: These methods combine both
technique. collaborative and content-based approaches.
Currently, recommender systems remain an active
area of research, with a dedicated ACM conference, Collaborative Filtering
intersecting several subdisciplines of statistics, machine Collaborative filtering (CF) systems work by collect-
learning, data mining, and information retrievals. App- ing user feedback in the form of ratings for items
lications have been pursued in diverse domains rang- in a given domain and exploiting similarities in rat-
ing from recommending webpages to music, books, ing behavior amongst several users in determining
movies, and other consumer products. how to recommend an item. CF methods can be fur-
ther subdivided into neighborhood-based and model-
Structure of Learning System based approaches. Neighborhood-based methods are
The most general setting in which recommender sys- also commonly referred to as memory-based approaches
tems are studied is presented in Fig. . Known user (Breese, Heckerman, & Kadie, ).
preferences are represented as a matrix of n users and
m items, where each cell ru,i corresponds to the rating Neighborhood-based Collaborative Filtering In neigh-
given to item i by the user u. This user ratings matrix is borhood-based techniques, a subset of users are cho-
typically sparse, as most users do not rate most items. sen based on their similarity to the active user, and a
The recommendation task is to predict what rating a weighted combination of their ratings is used to pro-
user would give to a previously unrated item. Typically, duce predictions for this user. Most of these approaches
ratings are predicted for all items that have not been can be generalized by the algorithm summarized in the
observed by a user, and the highest rated items are pre- following steps:
sented as recommendations. The user under current
consideration for recommendations is referred to as the . Assign a weight to all users with respect to similarity
active user. with the active user.
The myriad approaches to recommender systems . Select k users that have the highest similarity with
can be broadly categorized as: the active user – commonly called the neighborhood.
. Compute a prediction from a weighted combination
● Collaborative Filtering (CF): In CF systems, a user is of the selected neighbors’ ratings.
recommended items based on the past ratings of all
users collectively. In step , the weight wa,u is a measure of similar-
● Content-based recommending: These approaches ity between the user u and the active user a. The most
recommend items that are similar in content to items commonly used measure of similarity is the Pearson
the user has liked in the past, or matched to pre- correlation coefficient between the ratings of the two
defined attributes of the user. users (Resnick, Iacovou, Sushak, Bergstrom, & Reidl,
Encyclopedia of Machine Learning Chapter No: 00338 Page Proof Page 3 22-4-2010 #3

Encyclopedia of Machine Learning 

a), defined below: () proposed item-to-item collaborative filtering


where rather than matching similar users, they match
∑i∈I (ra,i − r a )(ru,i − r u ) a user’s rated items to similar items. In practice, this
wa,u = √ ()
  approach leads to faster online systems, and often
∑i∈I (ra,i − r a ) ∑i∈I (ru,i − r u )
results in improved recommendations (Linden et al.,
where I is the set of items rated by both users, ru,i is the ; Sarwar, Karypis, Konstan, & Reidl, ).
rating given to item i by user u, and r u is the mean rating In this approach, similarities between pairs of items
given by user u. i and j are computed off-line using Pearson correlation,
In step , predictions are generally computed as given by:
the weighted average of deviations from the neighbor’s
mean, as in: ∑u∈U (ru,i − r̄i )(ru,j − r̄j )
wi,j = √ √ ()
∑u∈U (ru,i − r̄i ) ∑u∈U (ru,j − r̄j )
∑u∈K (ru,i − r u ) × wa,u
pa,i = r a + () where U is the set of all users who have rated both items
∑u∈K wa,u
i and j, ru,i is the rating of user u on item i, and r̄i is the
where pa,i is the prediction for the active user a for item average rating of the ith item across users.
i, wa,u is the similarity between users a and u, and K is Now, the rating for item i for user a can be predicted
the neighborhood or set of most similar users. using a simple weighted average, as in:
Similarity based on Pearson correlation measures
the extent to which there is a linear dependence between ∑j∈K ra,j wi,j
pa,i = ()
two variables. Alternatively, one can treat the ratings ∑j∈K ∣wi,j ∣
of two users as a vector in an m-dimensional space,
where K is the neighborhood set of the k items rated by
and compute similarity based on the cosine of the angle
a that are most similar to i.
between them, given by:
For item-based collaborative filtering too, one may
⃗ra ⋅ ⃗ru use alternative similarity metrics such as adjusted cosine
wa,u = cos(⃗ra ,⃗ru ) = similarity. A good empirical comparison of variations
∥⃗ra ∥ × ∥⃗ru ∥
m
∑i= ra,i ru,i of item-based methods can be found in Sarwar et al.
=√ √ () ().
m  m 
∑i= ra,i ∑i= ru,i
Significance Weighting: It is common for the active
When computing cosine similarity, one cannot have user to have highly correlated neighbors that are based
negative ratings, and unrated items are treated as having on very few co-rated (overlapping) items. These neigh-
a rating of zero. Empirical studies (Breese et al., ) bors based on a small number of overlapping items tend
have found that Pearson correlation generally performs to be bad predictors. One approach to tackle this prob-
better. There have been several other similarity mea- lem is to multiply the similarity weight by a significance
sures used in the literature, including Spearman rank weighting factor, which devalues the correlations based
correlation, Kendall’s τ correlation, mean squared differ- on few co-rated items (Herlocker et al., ).
ences, entropy, and adjusted cosine similarity (Herlocker, Default Voting: An alternative approach to dealing
Konstan, Borchers, & Riedl, ; Su & Khoshgoftaar, with correlations based on very few co-rated items is
). to assume a default value for the rating for items that
Several extensions to neighborhood-based CF, which have not been explicitly rated. In this way one can now
have led to improved performance are discussed below. compute correlation (Eq. ) using the union of items
Item-based Collaborative Filtering: When applied to rated by users being matched as opposed to the inter-
millions of users and items, conventional neighborhood- section. Such a default voting strategy has been shown
based CF algorithms do not scale well, because of to improve collaborative filtering by Breese et al. ().
the computational complexity of the search for sim- Inverse User Frequency: When measuring the similar-
ilar users. As a alternative, Linden, Smith, and York ity between users, items that have been rated by all (and
Encyclopedia of Machine Learning Chapter No: 00338 Page Proof Page 4 22-4-2010 #4

 Encyclopedia of Machine Learning

universally liked or disliked) are not as useful as less techniques are a class of widely successful latent factor
common items. To account for this Breese et al. () models where users and items are simultaneously rep-
introduced the notion of inverse user frequency, which resented as unknown feature vectors (column vectors)
is computed as fi = log n/ni , where ni is the number wu , hi ∈ Rk along k latent dimensions. These feature
of users who have rated item i out of the total number vectors are learnt so that inner products wuT hi approx-
of n users. To apply inverse user frequency while using imate the known preference ratings ru,i with respect
similarity-based CF, the original rating is transformed to some loss measure. The squared loss is a standard
for i by multiplying it by the factor fi . The underlying choice for the loss function, in which case the following
assumption of this approach is that items that are uni- objective function is minimized,
versally loved or hated are rated more frequently than

others. J (W, H) = ∑ (ru,i − wuT hi ) ()
(u,i)∈L
Case Amplification: In order to favor users with high
similarity to the active user, Breese et al. () intro- where W = [w . . . wn ]T is an n × k matrix, H =
duced case amplification which transforms the original [h . . . hm ] is a k ×m matrix, and L is the set of user-item
weights in Eq. () to pairs for which the ratings are known. In the imprac-
tical limit where all user-item ratings are known, the
w′a,u = wa,u ⋅ ∣wa,u ∣ ρ−
above objective function is J(W, H) = ∥R − WH∥fro
where R denotes the n × m fully known user-item
where ρ is the amplification factor, and ρ ≥ .
matrix. The solution to this problem is given by tak-
Other notable extensions to similarity-based col-
ing the truncated SVD of R, R = UDV T and setting
laborative filtering include weighted majority predic-  

tion (Nakamura & Abe, ) and imputation-boosted W = Uk Dk , H = Dk VkT where Uk , Dk , Vk contain the
CF (Su, Khoshgoftaar, Zhu, & Greiner, ). k largest singular triplets of R. However, in the realis-
tic setting where the majority of user-item ratings are
Model-based Collaborative Filtering Model-based tech- unknown, such a nice globally optimal solution cannot
niques provide recommendations by estimating param- be directly obtained, and one has to explicitly opti-
eters of statistical models for user ratings. For example, mize the non-convex objective function J(W, H). Note
Billsus and Pazzani () describe an early approach to that in this case, the objective function is a particu-
map CF to a classification problem, and build a classifier lar form of weighted loss, that is, J(W, H) = ∥S ⊙
for each active user representing items as features over (R − WH)∥fro where ⊙ denotes elementwise products,
users and available ratings as labels, possibly in con- and S is a binary matrix that equals one over known
junction with dimensionality reduction techniques to user-item pairs L, and  otherwise. Therefore, weighted
overcome data sparsity issues. Other predictive model- low-rank approximations are pertinent to this discus-
ing techniques have also been applied in closely related sion (Srebro & Jaakkola, ). Standard optimization
ways. procedures include gradient-based techniques, or pro-
More recently, ⊲latent factor and matrix factoriza- cedures like alternating least squares where H is solved
tion models have emerged as a state-of-the-art method- keeping W fixed and vice versa until a convergence cri-
ology in this class of techniques (Bell et al., ). terion is satisfied. Note that fixing either W or H turns
Unlike neighborhood based methods that generate rec- the problem of estimating the other into a weighted
ommendations based on statistical notions of simi- ⊲linear regression task. In order to avoid learning a
larity between users, or between items, latent factor model that overfits, it is common to minimize the objec-
models assume that the similarity between users and tive function in the presence of ⊲regularization terms,
items is simultaneously induced by some hidden lower- J(W, H)+γ∥W∥ + λ∥H∥ , where γ, λ are regularization
dimensional structure in the data. For example, the parameters that can be determined by cross-validation.
rating that a user gives to a movie might be assumed Once W, H are learnt, the product WH provides an
to depend on few implicit factors such as the user’s approximate reconstruction of the rating matrix from
taste across various movie genres. Matrix factorization where recommendations can be directly read off.
Encyclopedia of Machine Learning Chapter No: 00338 Page Proof Page 5 22-4-2010 #5

Encyclopedia of Machine Learning 

Different choices of loss functions, regularizers, and . The use of additional user-specific and item-specific
additional model constraints have generated a large parameters to account for systematic
body of literature on matrix factorization techniques. biases in the ratings such as popular movies receiv-
Arguably, for discrete ratings, the squared loss is not ing higher ratings on average.
the most natural loss function. The maximum margin . Incorporating temporal dynamics of rating behavior
matrix factorization (Rennie & Srebro, ) approach by introducing time-dependent variables.
uses margin-based loss functions such as the hinge loss
used in ⊲SVM classification, and its ordinal extensions J(W, H) = ∑ (ru,i (t) − bu (t) − bi (t) − r̂
(u,i)∈L
for handling multiple ordered rating categories. For rat-

ings that span over K values, this reduces to finding −wuT (t)hi )
K −  thresholds that divide the real line into consecu-
tive intervals specifying rating bins to which the output where t denotes a time-stamp and W includes time-
is mapped, with a penalty for insufficient margin of sep- dependent user dimensions.
aration. Rennie and Srebro () suggest a nonlinear
conjugate gradient algorithm to minimize a smoothed In many settings, only implicit preferences are avail-
version of this objective function. able, as opposed to explicit like–dislike ratings. For
Another class of techniques is the nonnegative example, large business organizations, typically, metic-
matrix factorization popularized by the work of Lee ulously record transactional details of products pur-
and Seung () where nonnegativity constraints are chased by their clients. This is a one-class setting since
imposed on W, H. There are weighted extensions of the business domain knowledge for negative examples
NMF that can be applied to recommendation problems. – that a client has no interest in buying a product ever
The rating behavior of each user may be viewed as being in the future – is typically not available explicitly in
a manifestation of different roles, for example, a compo- corporate databases. Moreover, such knowledge is dif-
sition of prototypical behavior in clusters of users bound ficult to gather and maintain in the first place, given
by interests or community. Thus, the ratings of each user the rapidly changing business environment. Another
are an additive sum of basis vectors of ratings in the example is recommending TV shows based on watching
item space. By disallowing subtractive basis, nonnega- habits of users, where preferences are implicit in what
tivity constraints lend a “part-based” interpretation to the users chose to see without any source of explicit
the model. NMF can be solved with a variety of loss ratings. Recently, matrix factorization techniques have
functions, but with the generalized KL-divergence loss been advanced to handle such problems (Pan & Scholz,
defined as follows, ) by formulating confidence weighted objective

function, J(W, H) = ∑(u,i) cu,i (ru,i − wuT hi ) , under
ru,i
J(W, H) = ∑ ru,i log − ru,i + wuT hi the assumption that unobserved user-item pairs may
u,i∈L wuT hi
be taken as negative examples with a certain degree of
NMF is in fact essentially equivalent to probabilis- confidence specified via cu,i .
tic latent semantic analysis (pLSA) which has also
Content-based Recommending
previously been used for collaborative filtering tasks
(Hofmann, ). Pure collaborative filtering recommenders only utilize
The recently concluded million-dollar Netflix com- the user ratings matrix, either directly, or to induce a
petition has catapulted matrix factorization techniques collaborative model. These approaches treat all users
to the forefront of recommender technologies in col- and items as atomic units, where predictions are made
laborative filtering settings (Bell et al., ). While the without regard to the specifics of individual users or
final winning solution was a complex ensemble of dif- items. However, one can make a better personalized rec-
ferent models, several enhancements to basic matrix ommendation by knowing more about a user, such as
factorization models were found to lead to improve- demographic information (Pazzani, ), or about an
ments. These included: item, such as the director and genre of a movie (Melville,
Mooney, & Nagarajan, ). For instance, given movie
Encyclopedia of Machine Learning Chapter No: 00338 Page Proof Page 6 22-4-2010 #6

 Encyclopedia of Machine Learning

genre information, and knowing that a user liked “Star simple approach is to allow both content-based and col-
Wars” and “Blade Runner,” one may infer a predilection laborative filtering methods to produce separate ranked
for science fiction and could hence recommend “Twelve lists of recommendations, and then merge their results
Monkeys.” Content-based recommenders refer to such to produce a final list (Cotter & Smyth, ). Claypool,
approaches, that provide recommendations by compar- Gokhale, and Miranda () combine the two predic-
ing representations of content describing an item to tions using an adaptive weighted average, where the
representations of content that interests the user. These weight of the collaborative component increases as the
approaches are sometimes also referred to as content- number of users accessing an item increases.
based filtering. Melville et al. () proposed a general frame-
Much research in this area has focused on recom- work for content-boosted collaborative filtering, where
mending items with associated textual content, such content-based predictions are applied to convert a
as web pages, books, and movies; where the web sparse user ratings matrix into a full ratings matrix, and
pages themselves or associated content like descrip- then a CF method is used to provide recommendations.
tions and user reviews are available. As such, several In particular, they use a Naïve Bayes classifier trained
approaches have treated this problem as an infor- on documents describing the rated items of each user,
mation retrieval (IR) task, where the content associ- and replace the unrated items by predictions from this
ated with the user’s preferences is treated as a query, classifier. They use the resulting pseudo ratings matrix to
and the unrated documents are scored with rele- find neighbors similar to the active user, and produce
vance/similarity to this query (Balabanovic & Shoham, predictions using Pearson correlation, appropriately
). In NewsWeeder (Lang, ), documents in each weighted to account for the overlap of actually rated
rating category are converted into tf-idf word vectors, items, and for the active user’s content predictions. This
and then averaged to get a prototype vector of each cat- approach has been shown to perform better than pure
egory for a user. To classify a new document, it is com- collaborative filtering, pure content-based systems, and
pared with each prototype vector and given a predicted a linear combination of the two. Within this content-
rating based on the cosine similarity to each category. boosted CF framework, Su, Greiner, Khoshgoftaar, and
An alternative to IR approaches, is to treat rec- Zhu () demonstrated improved results using a
ommending as a classification task, where each exam- stronger content-predictor, TAN-ELR, and unweighted
ple represents the content of an item, and a user’s Pearson collaborative filtering.
past ratings are used as labels for these examples. In Several other hybrid approaches are based on tra-
the domain of book recommending, Mooney and Roy ditional collaborative filtering, but also maintain a
() use text from fields such as the title, author, content-based profile for each user. These content-
synopses, reviews, and subject terms, to train a multi- based profiles, rather than co-rated items, are used
nomial ⊲naïve Bayes classifier. Ratings on a scale of to find similar users. In Pazzani’s approach (Pazzani,
 to k can be directly mapped to k classes (Melville ), each user-profile is represented by a vector of
et al., ), or alternatively, the numeric rating can weighted words derived from positive training exam-
be used to weight the training example in a proba- ples using the Winnow algorithm. Predictions are made
bilistic binary classification setting (Mooney & Roy, by applying CF directly to the matrix of user-profiles
). Other classification algorithms have also been (as opposed to the user-ratings matrix). An alternative
used for purely content-based recommending, includ- approach, Fab (Balabanovic & Shoham, ), uses rele-
ing ⊲k-nearest neighbor, ⊲decision trees, and ⊲neural vance feedback to simultaneously mold a personal filter
networks (Pazzani & Billsus, ). along with a communal “topic” filter. Documents are
initially ranked by the topic filter and then sent to a
Hybrid Approaches user’s personal filter. The user’s relevance feedback is
In order to leverage the strengths of content-based and used to modify both the personal filter and the origi-
collaborative recommenders, there have been several nating topic filter. Good et al. () use collaborative
hybrid approaches proposed that combine the two. One filtering along with a number of personalized informa-
tion filtering agents. Predictions for a user are made by
Encyclopedia of Machine Learning Chapter No: 00338 Page Proof Page 7 22-4-2010 #7

Encyclopedia of Machine Learning 

applying CF on the set of other users and the active Where pu,i is the predicted rating for user u on item i, ru,i
user’s personalized agents. is the actual rating, and N is the total number of ratings
Several hybrid approaches treat recommending as in the test set.
a classification task, and incorporate collaborative ele- A related commonly used metric, ⊲Root Mean
ments in this task. Basu, Hirsh, and Cohen () use Squared Error (RMSE), puts more emphasis on larger
Ripper, a ⊲rule induction system, to learn a function absolute errors, and is given by:
that takes a user and movie and predicts whether the √
movie will be liked or disliked. They combine collab- ∑{u,i} (pu,i − ru,i )
RMSE = ()
orative and content information, by creating features N
such as comedies liked by user and users who liked
Predictive accuracy metrics treat all items equally.
movies of genre X. In other work, Soboroff and Nicholas
However, for most recommender systems the primary
() multiply a term-document matrix representing all
concern is accurately predict the items a user will
item content with the user-ratings matrix to produce a
like. As such, researchers often view recommending
content-profile matrix. Using latent semantic Indexing,
as predicting good, that is, items with high ratings
a rank-k approximation of the content-profile matrix
versus bad or poorly rated items. In the context of
is computed. Term vectors of the user’s relevant doc-
information retrieval (IR), identifying the good from
uments are averaged to produce a user’s profile. Then,
the background of bad items can be viewed as dis-
new documents are ranked against each user’s profile in
criminating between “relevant” and “irrelevant” items;
the LSI space.
and as such, standard IR measures, like ⊲Precision,
Some hybrid approaches attempt to directly com-
⊲Recall and ⊲Area Under the ROC Curve (AUC) can
bine content and collaborative data under a single
be utilized. These, and several other measures, such
probabilistic framework. Popescul, Ungar, Pennock,
as F-measure, Pearson’s product-moment correlation,
and Lawrence () extended Hofmann’s aspect mo-
Kendall’s τ, mean average precision, half-life utility, and
del (Hofmann, ) to incorporate a three-way co-
normalized distance-based performance measure are dis-
occurrence data among users, items, and item content.
cussed in more detail by Herlocker et al. ().
Their generative model assumes that users select latent
topics, and documents and their content words are gen-
Challenges and Limitations
erated from these topics. Schein, Popescul, Ungar, and
This section, presents some of the common hurdles
Pennock () extend this approach, and focus on
in deploying recommender systems, as well as some
making recommendations for items that have not been
research directions that address them.
rated by any user.
Sparsity: Stated simply, most users do not rate most
Evaluation Metrics items and, hence, the user ratings matrix is typically
The quality of a recommender system can be evalu- very sparse. This is a problem for collaborative filter-
ated by comparing recommendations to a test set of ing systems, since it decreases the probability of find-
known user ratings. These systems are typical measured ing a set of users with similar ratings. This problem
using predictive accuracy metrics (Herlocker, Konstan, often occurs when a system has a very high item-to-
Terveen, & Riedl, ), where the predicted ratings are user ratio, or the system is in the initial stages of use.
directly compared to actual user ratings. The most com- This issue can be mitigated by using additional domain
monly used metric in the literature is ⊲Mean Absolute information (Melville et al., ; Su et al., ) or
Error (MAE) – defined as the average absolute differ- making assumptions about the data generation process
ence between predicted ratings and actual ratings, given that allows for high-quality imputation (Su et al., ).
by: The Cold-Start Problem: New items and new users
∑{u,i} ∣pu,i − ru,i ∣ pose a significant challenge to recommender systems.
MAE = ()
N Collectively these problems are referred to as the cold-
start problem (Schein et al., ). The first of these
problems arises in collaborative filtering systems, where
Encyclopedia of Machine Learning Chapter No: 00338 Page Proof Page 8 22-4-2010 #8

 Encyclopedia of Machine Learning

an item cannot be recommended unless some user has While pure content-based methods avoid some of
rated it before. This issue applies not only to new items, the pitfalls discussed above, collaborative filtering still
but also to obscure items, which is particularly detri- has some key advantages over them. Firstly, CF can
mental to users with eclectic tastes. As such the new- perform in domains where there is not much content
item problem is also often referred to as the first-rater associated with items, or where the content is diffi-
problem. Since content-based approaches (Mooney & cult for a computer to analyze, such as ideas, opinions,
Roy, ; Pazzani & Billsus, ) do not rely on rat- etc. Secondly, a CF system has the ability to provide
ings from other users, they can be used to produce serendipitous recommendations, that is, it can recom-
recommendations for all items, provided attributes of mend items that are relevant to the user, but do not
the items are available. In fact, the content-based predic- contain content from the user’s profile.
tions of similar users can also be used to further improve
predictions for the active user (Melville et al., ). Further Reading
The new-user problem is difficult to tackle, since Good surveys of the literature in the field can be found
without previous preferences of a user it is not possible in Adomavicius and Tuzhilin (); Bell et al. ();
to find similar users or to build a content-based pro- Su and Khoshgoftaar (). For extensive empirical
file. As such, research in this area has primarily focused comparisons on variations of Collaborative Filtering
on effectively selecting items to be rated by a user so refer to Breese (), Herlocker (), Sarwar et al.
as to rapidly improve recommendation performance ().
with the least user feedback. In this setting, classical
techniques from ⊲active learning can be leveraged to Recommended Reading
address the task of item selection (Harpale & Yang, Adomavicius, G., & Tuzhilin, A. (). Toward the next generation
; Jin & Si, ). of recommender systems: A survey of the state-of-the-art and
possible extensions. IEEE Transactions on Knowledge and Data
Fraud: As recommender systems are being increasingly Engineering, (), –.
adopted by commercial websites, they have started to Balabanovic, M., & Shoham, Y. (). Fab: Content-based, collabo-
play a significant role in affecting the profitability of sell- rative recommendation. Communications of the Association for
Computing Machinery, (), –.
ers. This has led to many unscrupulous vendors engag-
Basu, C., Hirsh, H., & Cohen, W. (July ). Recommendation as
ing in different forms of fraud to game recommender classification: Using social and content-based information in
systems for their benefit. Typically, they attempt to recommendation. In Proceedings of the fifteenth national con-
ference on artificial intelligence (AAAI-), Madison, Wisconsin
inflate the perceived desirability of their own products
(pp. –).
(push attacks) or lower the ratings of their competitors Bell, R., Koren, Y., & Volinsky, C. (). Matrix factorization tech-
(nuke attacks). These types of attack have been broadly niques for recommender systems. IEEE Computer ():–.
studied as shilling attacks (Lam & Riedl, ) or pro- Billsus, D., & Pazzani, M. J. (). Learning collaborative infor-
mation filters. In Proceedings of the fifteenth international con-
file injection attacks (Burke, Mobasher, Bhaumik, & ference on machine learning (ICML-), Madison, Wisconsin
Williams, ). Such attacks usually involve setting (pp. –). San Francisco: Morgan Kaufmann.
up dummy profiles, and assume different amounts of Breese, J. S., Heckerman, D., & Kadie, C. (July ). Empirical anal-
ysis of predictive algorithms for collaborative filtering. In Pro-
knowledge about the system. For instance, the average
ceedings of the fourteenth conference on uncertainty in artificial
attack (Lam & Riedl, ) assumes knowledge of the intelligence, Madison, Wisconsin.
average rating for each item; and the attacker assigns Burke, R., Mobasher, B., Bhaumik, R., & Williams, C. ().
Segment-based injection attacks against collaborative filtering
values randomly distributed around this average, along
recommender systems. In ICDM ’: Proceedings of the fifth
with a high rating for the item being pushed. Studies IEEE international conference on data mining (pp. –).
have shown that such attacks can be quite detrimental to Washington, DC: IEEE Computer Society. Houston, Texas
predicted ratings, though item-based collaborative fil- Claypool, M., Gokhale, A., & Miranda, T. (). Combining
content-based and collaborative filters in an online newspa-
tering tends to be more robust to these attacks (Lam & per. In Proceedings of the SIGIR- workshop on recommender
Riedl, ). Obviously, content-based methods, which systems: algorithms and evaluation.
only rely on a users past ratings, are unaffected by profile Cotter, P., & Smyth, B. (). PTV: Intelligent personalized TV
guides. In Twelfth conference on innovative applications of arti-
injection attacks.
ficial intelligence, Austin, Texas (pp. –).
Encyclopedia of Machine Learning Chapter No: 00338 Page Proof Page 9 22-4-2010 #9

Encyclopedia of Machine Learning 

Goldberg, D., Nichols, D., Oki, B., & Terry, D. (). Using col- SIGKDD conference on knowledge discovery and data mining
laborative filtering to weave an information tapestry. Commu- (KDD), Paris, France.
nications of the Association of Computing Machinery, (), Pazzani, M. J. (). A framework for collaborative, content-based
–. and demographic filtering. Artificial Intelligence Review, 
Good, N., Schafer, J. B., Konstan, J. A., Borchers, A., Sarwar, B., Her- (–), –.
locker, J., et al. (July ). Combining collaborative filtering Pazzani, M. J., & Billsus, D. (). Learning and revising user
with personal agents for better recommendations. In Proceed- profiles: The identification of interesting web sites. Machine
ings of the sixteenth national conference on artificial intelligence Learning, (), –.
(AAAI-), Orlando, Florida (pp. –). Popescul, A., Ungar, L., Pennock, D. M., & Lawrence, S. (). Prob-
Harpale, A. S., & Yang, Y. (). Personalized active learning for abilistic models for unified collaborative and content-based
collaborative filtering. In SIGIR ’: Proceedings of the st recommendation in sparse-data environments. In Proceedings
annual international ACM SIGIR conference on research and of the seventeenth conference on uncertainity in artificial intelli-
development in information retrieval, Singapore (pp. –). gence. University of Washington, Seattle
New York: ACM. Rennie, J., & Srebro, N. (). Fast maximum margin matrix factor-
Herlocker, J., Konstan, J., Borchers, A., & Riedl, J. (). An algo- ization for collaborative prediction. In International conference
rithmic framework for performing collaborative filtering. In on machine learning, Bonn, Germany.
Proceedings of nd international ACM SIGIR conference on Resnick, P., Iacovou, N., Sushak, M., Bergstrom, P., & Reidl, J.
research and development in information retrieval, Berkeley, (a). GroupLens: An open architecture for collaborative fil-
California (pp. –). New York: ACM. tering of netnews. In Proceedings of the  computer supported
Herlocker, J. L., Konstan, J. A., Terveen, L. G., & Riedl, J. T. (). cooperative work conference, New York. New York: ACM. ACM
Evaluating collaborative filtering recommender systems. ACM Press New York, NY, USA
Transactions on Information Systems, (), –. Resnick, P., Neophytos, I., Bergstrom, P., Mitesh, S., & Riedl, J.
Hofmann, T. (). Probabilistic latent semantic analysis. In Pro- (b). Grouplens: An open architecture for collaborative
ceedings of the Fifteenth Conference on Uncertainty in Artificial filtering of netnews. In CSCW – Conference on computer
Intelligence, Stockholm, Sweden, July -August ,  Morgan supported cooperative work, Chapel Hill (pp. –). Addison-
Kaufmann Wesley.
Hofmann, T. (). Latent semantic analysis for collaborative fil- Sarwar, B., Karypis, G., Konstan, J., & Reidl, J. (). Item-based
tering. ACM Transactions on Information Systems, (), –. collaborative filtering recommendation algorithms. In WWW
Jin, R., & Si, L. (). A Bayesian approach toward active learning ’: Proceedings of the tenth international conference on World
for collaborative filtering. In UAI ’: Proceedings of the th Wide Web (pp. –). New York: ACM. Hong Kong
conference on uncertainty in artificial intelligence, Banff, Canada Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. ().
(pp. –). Arlington: AUAI Press. Methods and metrics for cold-start recommendations. In SIGIR
Lam, S. K., & Riedl, J. (). Shilling recommender systems for fun ’: Proceedings of the th annual international ACM SIGIR
and profit. In WWW ’: Proceedings of the th international conference on research and development in information retrieval
conference on World Wide Web, New York (pp. –). New (pp. –). New York: ACM. Tampere, Finland
York: ACM. Soboroff, I., & Nicholas, C. (). Combining content and collabo-
Lang, K. (). NewsWeeder: Learning to filter netnews. In Proceed- ration in text filtering. In T. Joachims (Ed.), Proceedings of the
ings of the twelfth international conference on machine learning IJCAI’ workshop on machine learning in information filtering
(ICML-) (pp. –). San Francisco. Tahoe City, CA, USA (pp. –).
Morgan Kaufmann, ISBN --- Srebro, N., & Jaakkola, T. (). Weighted low-rank approxima-
Lee, D. D., & Seung, H. S. (). Learning the parts of objects by tions. In International conference on machine learning (ICML).
non-negative matrix factorization. Nature, , . Washington DC
Linden, G., Smith, B., & York, J. (). Amazon.com recom- Su, X., Greiner, R., Khoshgoftaar, T. M., & Zhu, X. (). Hybrid
mendations: Item-to-item collaborative filtering. IEEE Internet collaborative filtering algorithms using a mixture of experts. In
Computing, (), –. Web intelligence (pp. –).
Melville, P., Mooney, R. J., & Nagarajan, R. (). Content-boosted Su, X., & Khoshgoftaar, T. M. (). A survey of collaborative filter-
collaborative filtering for improved recommendations. In Pro- ing techniques. Advances in Artificial Intelligence, , –.
ceedings of the eighteenth national conference on artificial intel- Su, X., Khoshgoftaar, T. M., Zhu, X., & Greiner, R. ().
ligence (AAAI-), Edmonton, Alberta (pp. –). Imputation-boosted collaborative filtering using machine
Mooney, R. J., & Roy, L. (June ). Content-based book recom- learning classifiers. In SAC ’: Proceedings of the  ACM
mending using learning for text categorization. In Proceedings symposium on applied computing (pp. –). New York:
of the fifth ACM conference on digital libraries, San Antonio, ACM.
Texas (pp. –).
Nakamura, A., & Abe, N. (). Collaborative filtering using
weighted majority prediction algorithms. In ICML ’: Proceed-
ings of the fifteenth international conference on machine learning
(pp. –). San Francisco: Morgan Kaufmann. Madison,
Wisconsin
Pan, R., & Scholz, M. (). Mind the gaps: Weighting the unknown
in large-scale one-class collaborative filtering. In th ACM

You might also like