
Introduction to Recommender Systems

Guo, Guangming
guogg.good@gmail.com
Outline

• Background & Definition


• Some history worth noting
• Various applications
• Main-stream approach
• Evaluation
• Some resources

2012-12-19 Lab of Semantic Computing and Data Mining 2


Outline

• Background & Definition


– Related areas
– Challenges
– Paradigms
• Some history worth noting
• Various applications
• Main-stream approach
• Evaluation
• Some resources



Get clear on the basic concepts

• First step of learning

• Building blocks of new ideas

• Define the rules to play by

• Prerequisites for communication



Definition of Recommender Systems

• Also called recommendation systems

• A subclass of information filtering systems that seeks to predict
the 'rating' or 'preference' that a user would give to an item
(such as music, books, or movies) or social element (e.g. people
or groups) they had not yet considered, using a model built
from the characteristics of an item (content-based approaches)
or the user's social environment (collaborative filtering
approaches). Source: http://en.wikipedia.org/wiki/Recommender



More facts

• An important vertical technique in data mining

• One of the most successful solutions in industry

• Became an independent research area in the 1990s

– Many highly regarded academic conferences, such as SIGIR, KDD, ICML, WWW, and EMNLP, include it as a subtopic
– The RecSys conference is fully devoted to this area
• Data mining/machine learning approaches
– 1) Specifying heuristics that define the utility function and empirically validating its performance
– 2) Estimating the utility function that optimizes a certain performance criterion, such as the mean squared error



Challenges

• Cold start
• Long tail
• Data sparsity
• Scalability
• Social & Temporal
• Context-aware
• Personality-aware
• Being accurate is not enough



Related Research Areas

• Cognitive science
• Text mining
• Natural Language Processing
• Information retrieval
• Machine learning
• Association mining
• Approximation theory
• Management science
• Consumer choice in marketing



Paradigms of RecSys

• Content-based recommendations:
– recommend items similar to the ones the user preferred in the past
• Collaborative recommendations:
– recommend items that people with similar tastes and preferences liked in the past
• Knowledge-based recommendations:
– recommend items based on existing knowledge models that fit the needs of users
• Hybrid approaches:
– combine various input data and/or compose various mechanisms



Background

• A universal problem in the information age

– Information overload
– From search engines (SE) to recommender systems (RecSys)
– Pull vs. push
– Web 1.0 vs. Web 2.0
• Leverage existing user-generated data
– User profiles
– Behavior history on the web, ratings
– Click-through data, browsing data
• Great benefits (win-win)
– Help users find valuable information
– Help businesses make more profit



Outline

• Background & Definition


• Some history worth noting
– Netflix prize
• Various applications
• Main-stream approach
• Evaluation
• Some resources



A peak in the history

• Research on collaborative filtering algorithms reached a peak during the Netflix movie recommendation competition

• October 2, 2006 to September 21, 2009

• Metric: RMSE (root mean squared error)
– Must improve on the baseline RMSE by 10%
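
As a rough illustration of the metric above, the following sketch computes RMSE and the relative improvement over a baseline. The ratings are made-up toy values (not Netflix data), and the helper names `rmse` and `relative_improvement` are hypothetical.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))

def relative_improvement(baseline_rmse, model_rmse):
    """Fraction by which model_rmse improves on baseline_rmse."""
    return (baseline_rmse - model_rmse) / baseline_rmse

# Toy ratings, invented for illustration only.
actual = [4, 3, 5, 2]
baseline = [3, 3, 3, 3]   # e.g. always predict the same rating
model = [4, 3, 4, 3]

gain = relative_improvement(rmse(baseline, actual), rmse(model, actual))
print(f"improvement over baseline: {gain:.1%}")
```

Under the rule on this slide, a submission would qualify only if this relative improvement reached at least 10%.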



The Million Dollar Programming Prize

• The Netflix Prize


– Greatly energized research in RecSys
– Lasted from 2006 to 2009
• Winner: BellKor’s Pragmatic Chaos team
– A joint team
– Andreas Töscher and Michael Jahrer (Commendo Research & Consulting GmbH), originally team BigChaos
– Robert Bell and Chris Volinsky (AT&T) and Yehuda Koren (Yahoo), originally team BellKor
– Martin Piotte and Martin Chabbert, originally team Pragmatic Theory
• The Ensemble team
– The most accurate algorithm in 2007 used an ensemble method of 107 different algorithmic approaches





Existing applications

• News/Article recommendation
• Targeted Advertisement
• Tags Recommendation
• Mobile Recommendation

• E-commerce
– Books, movies, music…



Benefits

• Alternative to Search Engine

• Boost the profit


– Amazon et al.

• Better user experience



Outline

• Background & Definition


• Some history worth noting
• Various applications
• Main-stream approach
– Content-based
– Collaborative filtering
• Evaluation
• Some resources



Content-based

• Simply compute the similarity between items

– Cosine similarity or Pearson correlation coefficient
– TF-IDF term weighting

• Use dimensionality reduction

– LDA
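
A minimal sketch of the similarity-based idea above, assuming items are described by short tokenized texts. It weights terms with one common TF-IDF variant (term frequency times log inverse document frequency) and ranks items by cosine similarity. The item names, descriptions, and helper functions are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dict."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical item descriptions, already tokenized.
items = {
    "space_movie": "space alien ship battle".split(),
    "war_movie": "soldier battle war history".split(),
    "alien_movie": "alien ship crew horror".split(),
}
names = list(items)
vectors = dict(zip(names, tfidf_vectors([items[n] for n in names])))

def most_similar(liked):
    """Rank the other items by similarity to one the user liked."""
    others = [n for n in names if n != liked]
    return sorted(others, key=lambda n: cosine(vectors[liked], vectors[n]),
                  reverse=True)

print(most_similar("space_movie"))
```

In this toy catalog, the item sharing two terms with the liked one ranks ahead of the item sharing only one.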



Collaborative filtering

• Association mining
• Memory-based
– Nearest-neighbors
• Model-based
– Latent factor model

• Some comparison
– Space & time
– Theoretical foundation and interpretability
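
The memory-based (nearest-neighbor) approach can be sketched roughly as follows. This is one simple user-based variant, assuming cosine similarity over co-rated items and a similarity-weighted average. The ratings and function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 5, "B": 5, "C": 1, "D": 5},
    "carol": {"A": 1, "B": 2, "C": 5, "D": 1},
}

def similarity(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings."""
    neighbors = [(similarity(user, v), v) for v in ratings
                 if v != user and item in ratings[v]]
    top = sorted(neighbors, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no usable neighbors for this item
    return sum(sim * ratings[v][item] for sim, v in top) / total

print(predict("alice", "D"))
```

All ratings stay in memory and are scanned at prediction time, which is the space and time cost that model-based methods trade against training effort.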



Latent factor model

• LSI, pLSA, LDA, latent class models, topic models, and others

• A method based on matrix factorization/decomposition: $R \approx P Q^{\top}$,
where R is the rating matrix and P and Q are the factor matrices obtained after dimensionality reduction

• A low-rank approximation of the original matrix
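
Written per entry, and assuming the common convention that row $p_u$ of P holds the factors of user u and row $q_i$ of Q holds the factors of item i, the low-rank approximation can be sketched as below. The dimensions m, n, and F are illustrative symbols, not taken from the slides.

```latex
\hat{r}_{ui} \;=\; p_u^{\top} q_i \;=\; \sum_{f=1}^{F} p_{uf}\, q_{if},
\qquad P \in \mathbb{R}^{m \times F},\quad Q \in \mathbb{R}^{n \times F},\quad F \ll \min(m, n)
```

Here m is the number of users, n the number of items, and F the number of latent factors, so $P Q^{\top}$ has rank at most F.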



Computations

• Traditional SVD
– Needs a simple method to fill in the missing entries of the matrix first
– Computing the SVD of the completed dense matrix is very expensive

• The situation changed in 2006, during the Netflix Prize

– Simon Funk
– Defined a cost function over the observed training ratings

• To avoid overfitting, add a regularization term

• Use gradient descent to minimize C(p,q)
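
The steps above can be sketched as follows. The slides do not give the cost function explicitly, so this assumes one commonly used form, $C(p,q) = \sum_{(u,i) \in \text{Train}} \big[ (r_{ui} - p_u^{\top} q_i)^2 + \lambda (\lVert p_u \rVert^2 + \lVert q_i \rVert^2) \big]$, minimized by stochastic gradient descent. The function names, hyperparameters, and toy ratings are hypothetical.

```python
import random

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def cost(ratings, P, Q, reg):
    """Regularized squared error over the observed training triples."""
    total = 0.0
    for u, i, r in ratings:
        total += (r - predict(P, Q, u, i)) ** 2
        total += reg * (sum(x * x for x in P[u]) + sum(x * x for x in Q[i]))
    return total

def train_funk_svd(ratings, n_users, n_items, n_factors=2,
                   lr=0.01, reg=0.02, n_epochs=500, seed=0):
    """Fit P (users x factors) and Q (items x factors) on observed ratings only."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
         for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
         for _ in range(n_items)]
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                p_uf, q_if = P[u][f], Q[i][f]
                # Gradient step; the constant factor 2 is folded into lr.
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q

# Toy (user, item, rating) triples; missing pairs are simply absent.
train = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P0, Q0 = train_funk_svd(train, 3, 3, n_epochs=0)   # untrained factors
P, Q = train_funk_svd(train, 3, 3)
print(cost(train, P0, Q0, 0.02), cost(train, P, Q, 0.02))
```

Because the loop visits only observed ratings, the matrix never has to be completed or stored densely, which is the contrast with the traditional SVD route.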





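Evaluation Criteria

• User satisfaction, measured by questionnaire

• Accuracy
– RMSE (rating prediction)
– Top-k (ranked lists)
• Coverage
• Diversity
• Novelty
• Serendipity
– A recommendation the user would at first dismiss as nonsense but that turns out to be a pleasant surprise
• …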
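
Two of the criteria above are easy to compute offline. The sketch below assumes one common definition of each: precision over the top-k list, and catalog coverage as the share of items that appear in at least one recommendation list. The data and function names are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user actually liked."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

def catalog_coverage(recommendation_lists, catalog):
    """Fraction of catalog items that appear in at least one user's list."""
    recommended = set()
    for items in recommendation_lists:
        recommended.update(items)
    return len(recommended & set(catalog)) / len(catalog)

# Hypothetical top-3 lists and held-out liked items for two users.
top_lists = {"u1": ["A", "B", "C"], "u2": ["A", "D", "B"]}
liked = {"u1": {"A", "C"}, "u2": {"E"}}
catalog = ["A", "B", "C", "D", "E"]

for user, recs in top_lists.items():
    print(user, precision_at_k(recs, liked[user], k=3))
print("coverage:", catalog_coverage(top_lists.values(), catalog))
```

Diversity, novelty, and serendipity need extra inputs, such as item similarity or popularity, and have several competing definitions, so they are left out of this sketch.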





Hulu, Xiang Liang (项亮)



Resources

• www.recsyswiki.com

• Roundup of materials on major recommendation engines (各大推荐引擎资料汇总), by 大魁
– http://blog.csdn.net/lzt1983/article/details/7914536
