Bayesian Adaptive User Profiling with Explicit & Implicit Feedback

Philip Zigoris
Department of Computer Science, University of California, Santa Cruz
zigoris@soe.ucsc.edu

Yi Zhang
Department of Computer Science, University of California, Santa Cruz
yiz@soe.ucsc.edu
ABSTRACT
Research in information retrieval is now moving into a personalized scenario where a retrieval or filtering system maintains a separate user profile for each user. In this framework, information delivered to the user can be automatically personalized and catered to each individual user's information needs. However, a practical concern for such a personalized system is the "cold start problem": any user new to the system must endure poor initial performance until sufficient feedback from that user is provided. To solve this problem, we use both explicit and implicit feedback to build a user's profile and use Bayesian hierarchical methods to borrow information from existing users. We analyze the usefulness of implicit feedback and the adaptive performance of the model on two data sets gathered from user studies where users' interactions with a document, or implicit feedback, were recorded along with explicit feedback. Our results are two-fold: first, we demonstrate that the Bayesian modeling approach effectively trades off between shared and user-specific information, alleviating poor initial performance for each user. Second, we find that implicit feedback has very limited and unstable predictive value by itself and only marginal value when combined with explicit feedback.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Information filtering, relevance feedback
General Terms
Algorithms, Human Factors
Keywords
Information Retrieval, User Modeling, Bayesian Statistics, Implicit Feedback
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CIKM'06, November 5–11, 2006, Arlington, Virginia, USA.
Copyright 2006 ACM 1-59593-433-2/06/0011 ...$5.00.
1. INTRODUCTION
Although ad hoc retrieval systems have become part of the daily life of internet users and digital library users, it is clear that there is great variety in users' needs. Thus, such systems cannot offer the best possible service since they do not tailor information to individual user needs.

IR research is now moving into a more complex environment, with user-centered or personalized adaptive information retrieval, information filtering, and recommendation systems as major research topics. As opposed to traditional ad hoc systems, a personalized system adaptively learns a profile for each user, automatically catering to the user's specific needs and preferences [1].

To learn a reliable user-specific profile, an adaptive system usually needs a significant amount of explicit feedback (training data) from the user. However, the average user doesn't like to answer a lot of questions or provide explicit feedback on items they have seen. Meanwhile, a user does not want to endure poor performance while the system is "training" and expects a system to work reasonably well from first use. Good initial performance is an incentive for the user to continue using the system. Thus an important aspect of personalization is to develop a system that works well initially, with less explicit user feedback.

Much prior research has explored the usefulness of implicit feedback [10] because it is easy to collect and requires no extra effort from the user. On the other hand, user-independent systems perform reasonably well for most users, mainly because the system parameters have been tuned to reasonable values based on thousands of existing users. These observations suggest at least two possible approaches to improving the early-stage performance of a personalized system: using cheap implicit feedback from the user, and borrowing information from other users. Both approaches may help the system reduce its uncertainty about the user, especially when the user has just started using the system and has not provided much explicit feedback.

This paper explores both directions under a single, unified Bayesian hierarchical modeling framework. Intuitively, combining the two approaches under this framework may achieve a nice tradeoff between bias and variance, especially for a new user. This is because including implicit feedback decreases the bias and increases the variance of the learner, while borrowing information from other users through a Bayesian prior in our framework has the inverse effect. We demonstrate, empirically, that the hierarchical model controls the tradeoff between shared and personal information, thereby alleviating poor initial performance. We also evaluate the long-term usefulness of different types of feedback for predicting a user's rating for a document.

The paper is organized as follows. We begin with a brief review of related work in Section 2. We then describe the Bayesian hierarchical modeling approach to learning user-specific models (Section 3) and introduce a computationally efficient model, the hierarchical Gaussian network, as an example to be used in our experiments (Section 4). Sections 5 and 6 describe the experimental methodology and results on two real user study data sets. Section 7 concludes.
2. RELATED WORK
Implicit feedback is a broad term including any kind of natural interaction of a user with a document [16]. Examples of implicit feedback are mouse and keyboard usage, page navigation, bookmarking, and editing. While it is the focus of much emerging research, there is still some debate over the value of implicit feedback. For example, it has been observed in numerous controlled studies [3, 12, 6, 4] that there is a clear correlation between the time a user spends viewing a document and how useful they found that document. However, Kelly et al. [9] demonstrate, in a more naturalistic setting, that this correlation varies significantly across tasks and conclude that the usefulness of implicit feedback is conditioned heavily on what type of information the user is seeking. Standard machine learning techniques such as SVMs [8], neural networks [12], and graphical models [4][18] have been used to explore the usefulness of implicit feedback, with varying degrees of success. In our work we use a hierarchical Bayesian model to integrate implicit feedback.

The idea of borrowing information from other users to serve the information needs of a user is widely studied in the area of collaborative filtering, and Bayesian hierarchical modeling has been applied in this context [17]. The major differences in our work are: 1) we focus on developing complex user models that go beyond a relevance-based content model by including other explicit feedback (such as topic familiarity and readability) and implicit feedback (such as user actions and system context); 2) we focus on the adaptive learning environment and analyze the online performance of the learning system. This analysis makes it clear how an adaptive system can benefit from the hierarchical Bayesian framework; and 3) we use a Gaussian network (Gaussian user models with a Gaussian prior), while the prior work uses different functional forms, such as multinomial user models with a Dirichlet prior [17]. Our method is more computationally efficient and also raises fewer privacy concerns when sharing information across users.
3. HIERARCHICAL BAYESIAN FRAMEWORK

One concern about using user-specific models, or profiles, is that there is an initial period where the user must endure poor performance. On the other hand, existing user-independent systems seem to work well by tuning parameters for the general public. This motivates us to borrow information from existing users to serve a new user. Here we adopt a principled approach for doing this based on Bayesian hierarchical models.

Let m_u represent the model of a user u. Over time, the user interacts with the system and provides explicit and implicit feedback about documents the user has read. Let a d-dimensional random variable x_u represent the information about a document for user u. Each dimension of x_u corresponds to a feature, and the features could be user actions on the document, the document relevance scores the system derives from the user's explicit feedback on other documents, user-independent features of the document, and so on. The user u may provide a rating y_u to a document^1 x_u according to the user model m_u. For now, a model is a function that takes information about a document and returns an estimate of whether the user likes the document or not (a rating): m_u : x_u → y_u. The functional form of m_u varies for different learning systems, and we delay this discussion until Section 4. We make no assumption about the distribution of documents.

A Bayesian learning system begins with a certain prior belief, p(m | θ), about the distribution of user models. In the simplest terms, the hierarchical Bayesian framework can be written as

    m_u ~ p(m | θ)
    y = m_u(x)

and is illustrated in Figure 1. Note that this is a very general framework and can accommodate any class of functions for which a reasonable prior can be specified. This includes any function parameterized by real numbers (e.g. SVMs, neural networks with fixed topology) and regression trees [2].

A personalized system tries to learn the user model m_u for a user u. As the user uses the system, the system receives a sample of document-rating pairs, D_u = {(x_ui, y_ui) | i = 1...N_u}, where N_u is the number of training examples for user u. Using Bayes' rule, the system can update its belief about the user model based on the data and get the posterior distribution over user models:

    p(m_u | θ, D_u) = p(D_u | m_u, θ) p(m_u | θ) / p(D_u | θ)
                    = p(m_u | θ) ∏_{i=1}^{N_u} [ p(m_u(x_ui) = y_ui | m_u) / p(m_u(x_ui) = y_ui | θ) ]

To find the maximum a posteriori (MAP) model, m_u^MAP, we can ignore the denominator since it is the same for all models. Then we have:

    m_u^MAP = argmax_m p(m | θ, D_u)
            = argmax_m p(m | θ) ∏_{i=1}^{N_u} p(m(x_ui) = y_ui | m)
            = argmax_m [ log p(m | θ) + Σ_{i=1}^{N_u} log p(m(x_ui) = y_ui | m) ]    (1)

Equation 1 shows clearly how the fitness of a model decomposes into the model's prior probability and the data likelihood given the model.

Incorporating a prior distribution automatically manages the tradeoff between generic user information and user-specific information based on the amount of training data available for the specific user. When a user u starts using a system, the system has almost no information about the user. Because N_u for the user is small, the prior distribution p(m | θ) is the major contributor when estimating the user model m̂ (the first term in Equation 1). If the system has a reliable prior, it can perform well even with little data for the user. How can we find a good prior for m? We treat user u's model m_u as a sample from the distribution p(m | θ). Given a set of user models, we can choose θ to maximize their likelihood.

As the system gets more training data for this particular user, m_u^MAP is dominated by the data likelihood (the second term in Equation 1) and the prior learned from other users becomes less important. As a result, the system works like a well-tuned non-personalized system for a new user, but keeps improving the user model as more feedback from the user becomes available. Eventually, after a sufficient amount of data is collected, the user's profile should reach a fixed point.

^1 This paper sometimes refers to x_u as simply a "document" even though it includes features of the user's interaction with it.
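For a concrete sense of how Equation 1 trades off prior and likelihood, the following minimal sketch works through a hypothetical one-dimensional Gaussian instance (not from the paper; Section 4 develops the actual model): a scalar weight w with prior N(mu0, s0sq) and observations y = w·x + noise. The log-posterior is quadratic in w, so the MAP estimate is available in closed form, and the code checks that the prior mean is returned when no feedback exists while the data dominate once many examples arrive.

```python
import numpy as np

def map_weight(xs, ys, mu0=1.0, s0sq=1.0, kappa_sq=0.1):
    """MAP estimate of w for y = w*x + noise, with prior w ~ N(mu0, s0sq)
    and observation noise variance kappa_sq: a 1-D instance of Eq. 1."""
    # Setting the derivative of the log-posterior to zero gives
    #   w = (mu0/s0sq + sum(x*y)/kappa_sq) / (1/s0sq + sum(x*x)/kappa_sq)
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    num = mu0 / s0sq + np.dot(xs, ys) / kappa_sq
    den = 1.0 / s0sq + np.dot(xs, xs) / kappa_sq
    return num / den

# With no feedback yet, the estimate falls back to the prior mean.
print(map_weight([], []))  # 1.0

# With many observations generated by a "true" w = 3, the data dominate.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 3.0 * x + rng.normal(scale=0.3, size=500)
print(map_weight(x, y))
```

The same fallback-then-override behavior is what the hierarchical framework provides per user, with the prior itself learned from the existing user population.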
4. HIERARCHICAL GAUSSIAN NETWORK

Specific functional forms for m_u and p(m | θ) are needed to build a practical learning system. For our work, we let m be a simple linear regression function and let w be the parameter vector of m. We chose to represent the prior distribution as a multivariate normal, i.e. p(m | θ) = p(w | θ) = N(w; µ, Σ). The motivation for this choice is that the self-conjugacy of the normal distribution simplifies computation. For simplicity, we assume that the covariance matrix Σ is diagonal, i.e. the components of the model are independent. Thus we have:

    w_u ~ N(w; µ, Σ)
    y = x · w_u + ε
    ε ~ N(ε; 0, κ_u²)

where ε is independent zero-mean Gaussian noise. One can also view y as a random sample from the normal distribution N(y; x · w_u, κ_u²). It is possible to extend the hierarchical model to include a prior on the noise variance κ_u² and learn it from the data. For simplicity, we set κ_u² to a constant value of 0.1 for all u in our experiments.

Based on Equation 1, the MAP estimate of the user model w_u for a particular user u is [7]:

    w_u^MAP = argmax_w [ log p(w | θ) + Σ_{i=1}^{N_u} log p(x_ui, y_ui | w) ]
            = argmin_w [ (w − µ)^T Σ^{-1} (w − µ) + (1/κ_u²) Σ_{i=1}^{N_u} (x_ui · w − y_ui)² ]    (2)

Equation 2 also shows the natural tradeoff that occurs between the prior and the data. As we collect more data for user u, the second term is expected to grow linearly with N_u. This term, which corresponds to the mean squared error (L2 loss) of the model w on the training sample, will eventually dominate the first term, which is the loss associated with deviation from the prior mean.

The variance of the prior, Σ, also affects this tradeoff. For instance, if the diagonal values of Σ are very large for all j, the prior is close to a uniform distribution and w^MAP is close to the linear least squares estimator. In the reverse situation, the prior dominates the data and the MAP estimate will be closer to µ.

[Figure 1: Illustration of dependencies of variables in the hierarchical model. The rating, Y, for a document, X, is conditioned on the document and the model, M, associated with that user. Users share information about their models through the use of a prior, Θ.]

So far, we have discussed the learning process for a single user and assumed the prior is fixed. However, the system will have many users, and the prior should be learned from the other users' data. We use an initially uniform prior when the system is launched^2, i.e. p(µ, Σ) is the same for all µ, Σ. As users join the system, the prior distribution p(m | θ) is updated. At the point when the system has K existing users, the system has a set of K user models {w_u : u = 1...K}. Based on the evidence from these existing users, we can infer the parameters θ = (µ, Σ) of our prior distribution:

    µ̂ = (1/K) Σ_{u=1}^{K} w_u    (3)

    Σ̂ = (1/(K−1)) Σ_{u=1}^{K} (w_u − µ̂)(w_u − µ̂)^T    (4)

This is, in essence, the mechanism by which we share user information: the prior mean is the average of each parameter across all other user models. Similarly, the prior variance captures how consistent a parameter is across users.
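Because both the prior and the likelihood are Gaussian, the minimizer of Equation 2 has a closed form (a ridge-regression-style solve that shrinks toward the prior mean rather than zero), and Equations 3 and 4 are simply the sample mean and sample variance of the user weight vectors. The sketch below shows both steps; the function names are hypothetical, and Σ is kept diagonal as assumed above.

```python
import numpy as np

def map_user_model(X, y, mu, sigma_diag, kappa_sq=0.1):
    """Closed-form minimizer of Eq. 2 for one user.
    X: (N_u, d) feature rows x_ui; y: (N_u,) ratings; prior N(mu, diag(sigma_diag))."""
    # Setting the gradient of Eq. 2 to zero gives the linear system
    #   (Sigma^-1 + X^T X / kappa_sq) w = Sigma^-1 mu + X^T y / kappa_sq
    S_inv = np.diag(1.0 / sigma_diag)
    A = S_inv + X.T @ X / kappa_sq
    b = S_inv @ mu + X.T @ y / kappa_sq
    return np.linalg.solve(A, b)

def fit_prior(user_models):
    """Eqs. 3-4: prior mean and the diagonal of the prior covariance,
    estimated from the K existing users' weight vectors."""
    W = np.vstack(user_models)         # shape (K, d)
    mu_hat = W.mean(axis=0)            # Eq. 3
    sigma_hat = W.var(axis=0, ddof=1)  # diagonal of Eq. 4 (1/(K-1) normalizer)
    return mu_hat, sigma_hat

# A brand-new user with no feedback falls back exactly to the shared prior mean:
mu, sigma = fit_prior([np.array([1.0, 0.0]), np.array([3.0, 0.2])])
w_new = map_user_model(np.empty((0, 2)), np.empty(0), mu, sigma)
```

With an empty data matrix the solve reduces to w = µ, which is the "well-tuned non-personalized system" behavior described at the end of Section 3; as rows accumulate in X, the data term takes over.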
4.1 Finding w^MAP with Belief Propagation

Given a large pool of data, we need to find the MAP estimates of all m_u and θ in Figure 1. A simple approach is to treat it as one large optimization problem where all parameters are optimized simultaneously. If there are K users with data sets D̄ = {D_1, ..., D_K} then we can write

^2 It is, of course, possible to extend the hierarchical model to include a prior distribution for the parameters µ and Σ.
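The derivation above is cut off in this preview, but one simple way to realize the joint optimization it describes is coordinate ascent: alternately solve Equation 2 for each user's w_u given the current (µ, Σ), then re-fit the prior via Equations 3 and 4 given all w_u. The sketch below is written under that assumption; it is not the paper's belief-propagation procedure, and the function name and variance floor are hypothetical.

```python
import numpy as np

def alternate_map(datasets, d, iters=20, kappa_sq=0.1):
    """Coordinate-ascent sketch for the joint MAP problem.
    datasets: list of (X_u, y_u) pairs, one per user; d: model dimension."""
    mu = np.zeros(d)
    sigma = np.ones(d)  # start from a broad diagonal prior
    for _ in range(iters):
        models = []
        for X, y in datasets:
            # Eq. 2 closed form for one user, given the current prior:
            S_inv = np.diag(1.0 / sigma)
            A = S_inv + X.T @ X / kappa_sq
            b = S_inv @ mu + X.T @ y / kappa_sq
            models.append(np.linalg.solve(A, b))
        W = np.vstack(models)
        mu = W.mean(axis=0)                              # Eq. 3
        sigma = np.maximum(W.var(axis=0, ddof=1), 1e-6)  # Eq. 4, floored
    return mu, sigma, models

# Two hypothetical users whose noiseless feedback both implies w near 2.0:
data = [(np.array([[1.0], [2.0]]), np.array([2.0, 4.0])),
        (np.array([[1.0], [3.0]]), np.array([2.0, 6.0]))]
mu, sigma, models = alternate_map(data, d=1)
```

The floor on the variance guards against the prior collapsing to a point when the few available user models happen to agree closely.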
