David Vallet's Master Thesis



Published by dvallet
Master Thesis on personalization and the use of context in personalization techniques


Published by: dvallet on Feb 20, 2008
Copyright: Attribution Non-commercial







Personalized Information Retrieval in Context Using Ontological Knowledge

Specific, advanced mechanisms need to be developed in order to ensure that personalization is used at the right time, in the appropriate direction, and in the right amount. Users seem inclined to rely on personalized features when they need to save time, wish to spare effort, have vague needs, have limited knowledge of what can be queried for (e.g. for lack of familiarity with a repository, or with the querying system itself), or are not aware of recent content updates. Personalization is clearly not appropriate, for instance, when the user is looking for a specific, known content item, or when the user is willing to provide detailed relevance feedback, engaging in a more conscientious interactive search session. Even when personalization is appropriate, user preferences are heterogeneous, variable, and context-dependent. Furthermore, there is inherent uncertainty in the system when automatic preference learning is used. To be accurate, personalization needs to combine long-term predictive capabilities, based on past usage history, with shorter-term prediction, based on current user activity, as well as reaction to (implicit or explicit) user feedback on personalized output, in order to correct the system’s assumptions when needed. The few proposals that consider a profile based on both long-term interests (i.e. the user profile) and short-term interests (i.e. the context) do not draw a clear distinction between the two representations, either because they do not differentiate the acquisition techniques [46, 110] or because they do not differentiate the exploitation techniques [73].

The idea of contextual personalization, proposed and developed here, responds to the fact that human preferences are multiple, heterogeneous, changing, and even contradictory, and should be understood in context with the user goals and tasks at hand. Indeed, not all user preferences are relevant in all situations. For instance, if a user is consistently looking for content in the Formula 1 domain, it would not make much sense for the system to rank a Formula 1 picture with a helicopter in the background above others just because the user happens to have a general interest in aircraft. In the semantic realm of Formula 1, aircraft are out of (or at least far from) context. By taking into account further contextual information, available from prior sets of user actions, the system can provide an undisturbed, clear view of the actual user’s history and preferences, cleaned of extraordinary anomalies, distractions or “noise” preferences. We refer to this surrounding information as contextual knowledge, or just context, and it offers significant aid in the personalization process. The proposed techniques endow a personalized retrieval system with the capability to filter and focus its knowledge about user preferences on the semantic context of ongoing user activities, so as to achieve coherence with the thematic scope of user actions at runtime.

As already discussed in previous sections of this document, context is a difficult notion to grasp and capture in a software system. In our approach, we focus our efforts on this major topic of retrieval systems by restricting it to the notion of semantic runtime context [118]. The latter is the part of general context that is suitable for analysis in personalization, and can be defined as the background themes under which user activities occur within a given unit of time. From now on we shall refer to semantic runtime context as the information related to personalization tasks, and we shall use the simplified term context for it. The problems to be addressed include how to represent the context, how to determine it at runtime, and how to use it to influence the activation of user preferences, "contextualize" them, and predict or take into account the drift of preferences over time (short and long term).

As will be described in section 5.3, in our current solution to these problems a runtime context is represented as (approximated by) a set of weighted concepts from the domain ontology. How this set is determined, updated, and interpreted will be explained in section 5.4. Our approach to the contextual activation of preferences is then based on computing the semantic similarity between each user preference and the set of concepts in the context, as will be shown in section 5.5.1. In spirit, the approach consists of finding semantic paths linking preferences to context. The paths considered are made of existing semantic relations between concepts in the domain ontology. The shorter, stronger, and more numerous such connecting paths are, the more in context a preference shall be considered.
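The path-based intuition can be illustrated with a small sketch. Everything in it is hypothetical: the toy ontology, its relation strengths, the per-hop decay factor and the hop limit are assumptions for illustration, not the similarity measure defined in section 5.5.1. Each path contributes the product of its relation strengths, damped once per hop, and the contributions of all paths are merged with a probabilistic OR, 1 − ∏(1 − c), so that shorter, stronger and more numerous paths all raise the score.

```python
from math import prod

# Hypothetical toy ontology: concept -> {neighbour: relation strength in [0, 1]}.
ONTOLOGY = {
    "Formula1": {"Racing": 0.9, "Circuit": 0.8},
    "Racing": {"Formula1": 0.9, "Car": 0.7},
    "Circuit": {"Formula1": 0.8, "Car": 0.5},
    "Car": {"Racing": 0.7, "Circuit": 0.5},
    "Aircraft": {"Helicopter": 0.9},
    "Helicopter": {"Aircraft": 0.9},
}

def simple_paths(graph, source, target, max_hops):
    """Yield every cycle-free path from source to target with at most max_hops edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 == max_hops:
            continue
        for nxt in graph.get(node, {}):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))

def path_strength(graph, path, decay):
    """Product of relation strengths along the path, damped once per hop."""
    return prod(decay * graph[a][b] for a, b in zip(path, path[1:]))

def context_affinity(graph, preference, context, decay=0.8, max_hops=3):
    """Probabilistic OR over all paths linking the preference to any context concept."""
    contributions = [
        path_strength(graph, p, decay)
        for c in context
        for p in simple_paths(graph, preference, c, max_hops)
    ]
    return 1 - prod(1 - s for s in contributions)
```

With these toy values, `context_affinity(ONTOLOGY, "Car", {"Formula1"})` combines two two-hop paths into about 0.556, while `"Aircraft"` has no path to the context and scores 0, mirroring the helicopter example above.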

The proposed techniques to find these paths take advantage of a form of Constraint Spreading Activation (CSA) strategy [36], as will be explained in section 5.5. In the proposed approach, a semantic expansion of both user preferences and the context takes place, during which the involved concepts are assigned preference weights and contextual weights, which decay as the expansion grows farther from the initial sets. This process can also be understood as finding a sort of fuzzy semantic intersection between user preferences and the semantic runtime context, where the final computed weight of each concept represents the degree to which it belongs to each set.
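A minimal sketch of such an expansion follows, under stated assumptions: the toy relation table, the decay factor and the hop limit are hypothetical; activation spreads breadth-first from the initial weighted set, each relation passes on weight × strength × decay, contributions arriving at the same concept are merged with a probabilistic OR, and the fuzzy intersection uses the product t-norm. It is a generic spreading-activation illustration, not the exact CSA variant of section 5.5.

```python
from collections import deque

# Hypothetical relation strengths (symmetric toy ontology).
RELATIONS = {
    "Formula1": {"Racing": 0.9},
    "Racing": {"Formula1": 0.9, "Car": 0.7},
    "Car": {"Racing": 0.7},
    "Aircraft": {"Helicopter": 0.9},
    "Helicopter": {"Aircraft": 0.9},
}

def spread(initial, relations, decay=0.5, max_hops=2):
    """Expand a weighted concept set; weights decay with distance from the seeds."""
    weights = dict(initial)
    frontier = deque((c, w, 0) for c, w in initial.items())
    while frontier:
        concept, w, hops = frontier.popleft()
        if hops == max_hops:  # constraint: stop spreading beyond the hop limit
            continue
        for nxt, strength in relations.get(concept, {}).items():
            incoming = w * strength * decay
            old = weights.get(nxt, 0.0)
            # Probabilistic OR: a concept reached by several routes gains weight.
            weights[nxt] = old + (1 - old) * incoming
            frontier.append((nxt, incoming, hops + 1))
    return weights

def fuzzy_intersection(a, b):
    """Product t-norm: a concept survives only if it belongs to both expanded sets."""
    return {c: a[c] * b[c] for c in a.keys() & b.keys()}
```

Expanding the preferences `{"Car": 0.8, "Aircraft": 0.9}` and the context `{"Formula1": 1.0}` and intersecting the results keeps Car, Racing and Formula1 with small positive weights, while Aircraft and Helicopter drop out because the context expansion never reaches them.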

Finally, the perceived effect of contextualization should be that user interests that are out of focus under a given context are disregarded, and only those that are in the semantic scope of the ongoing user activity (the "intersection" of user preferences and runtime context) are considered for personalization. As suggested above, the inclusion or exclusion of preferences need not be binary, but may range on a continuous scale instead, where the contextual weight of a preference decreases monotonically with the semantic distance between the preference and the context.



5.1. Notation

Before continuing, we provide a few details on the mathematical notation that will be used in the sequel. Most symbols will be explained again where they are introduced, but we gather them all here, in a single place, for the reader's convenience.


$\mathcal{O}$: The domain ontology.

The set of all relations in $\mathcal{O}$.

$\mathcal{D}$: The set of all documents or content in the search space.

$M : \mathcal{D} \to [0,1]^{|\mathcal{O}|}$: A mapping between documents and their semantic annotations, i.e. $M(d) \in [0,1]^{|\mathcal{O}|}$ is the concept-vector metadata of a document $d \in \mathcal{D}$.

$\mathcal{U}$: The set of all users.

$\mathcal{P}$: The set of all possible user preferences.

$\mathcal{C}$: The set of all possible contexts.

An instantiation of $\mathcal{P}$ and $\mathcal{C}$ for the domain $\mathcal{O}$, where $\mathcal{P}$ is represented by the vector space $[-1,1]^{|\mathcal{O}|}$ and $\mathcal{C}$ by $[0,1]^{|\mathcal{O}|}$.

$P : \mathcal{U} \to \mathcal{P}$: A mapping between users and preferences, i.e. $P(u) \in \mathcal{P}$ is the preference of user $u \in \mathcal{U}$.

$C : \mathcal{U} \times \mathbb{N} \to \mathcal{C}$: A mapping between users and contexts over time, i.e. $C(u,t) \in \mathcal{C}$ is the context of a user $u \in \mathcal{U}$ at an instant $t \in \mathbb{N}$.

$EP : \mathcal{U} \to \mathcal{P}$: Extended user preferences.

$EC : \mathcal{U} \times \mathbb{N} \to \mathcal{C}$: Extended context.

$CP : \mathcal{U} \times \mathbb{N} \to \mathcal{P}$: Contextualized user preferences, also denoted as …

$v_x$, where $v \in [-1,1]^{|\mathcal{O}|}$: We shall use this vector notation for concept-vector spaces, where the concepts of an ontology $\mathcal{O}$ are the axes of the vector space. For a vector $v \in [-1,1]^{|\mathcal{O}|}$, $v_x \in [-1,1]$ is the coordinate of $v$ corresponding to the concept $x \in \mathcal{O}$. This notation will be used for all the elements ranging in the $[-1,1]^{|\mathcal{O}|}$ space, such as document metadata $M_x(d)$, user preferences $P_x(u)$, runtime context $C_x(u,t)$, and others.

$\mathcal{Q}$: The set of all possible user requests, such as queries, viewing documents, or browsing actions.

$prm : \mathcal{D} \times \mathcal{U} \times \mathbb{N} \to [-1,1]$: $prm(d,u,t)$ is the estimated contextual interest of user $u$ for the document $d$ at instant $t$.

$sim : \mathcal{D} \times \mathcal{Q} \to [0,1]$: $sim(d,q)$ is the relevance score computed for the document $d$ for a request $q$ by a retrieval system external to the personalization system.

$score : \mathcal{D} \times \mathcal{Q} \times \mathcal{U} \times \mathbb{N} \to [-1,1]$: $score(d,q,u,t)$ is the final personalized relevance score computed by a combination of $sim$ and $prm$.
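To make the notation concrete, the following sketch stores concept-vectors sparsely as Python dicts, so that v_x is simply the stored value for x, or 0 when x is absent. Two choices in it are assumptions for illustration only: prm is computed as the cosine between M(d) and the contextualized preferences CP(u,t), and score as a linear combination of prm and sim with a hypothetical weight `lam`; the notation above only says that score combines sim and prm.

```python
from math import sqrt

# Sparse concept-vector: concept -> coordinate; absent concepts have v_x = 0.
ConceptVector = dict[str, float]

def coord(v: ConceptVector, x: str) -> float:
    """v_x: coordinate of v on the axis of concept x."""
    return v.get(x, 0.0)

def cosine(u: ConceptVector, v: ConceptVector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is the zero vector."""
    dot = sum(w * coord(v, x) for x, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def prm(doc_metadata: ConceptVector, contextual_prefs: ConceptVector) -> float:
    """Assumed form of prm(d, u, t): cosine between M(d) and CP(u, t), in [-1, 1]."""
    return cosine(doc_metadata, contextual_prefs)

def score(sim: float, prm_value: float, lam: float = 0.5) -> float:
    """Assumed linear combination; stays in [-1, 1] when sim is in [0, 1] and 0 <= lam <= 1."""
    return lam * prm_value + (1 - lam) * sim
```

For instance, `score(0.6, 0.2, lam=0.5)` returns 0.4. A convex combination keeps score within [-1,1] as the notation requires, since prm ∈ [-1,1] and sim ∈ [0,1].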

5.2. Preliminaries

Our strategies for the dynamic contextualization of user preferences are based on three basic principles: a) the representation of context as a set of domain ontology concepts that the user has “touched” or followed in some manner during a session, b) the extension of this representation of context by using explicit semantic relations among concepts represented in the ontology, and c) the extension of user preferences by a similar principle. Roughly speaking, the “intersection” of these two sets of concepts, with combined weights, will be taken as the user preferences of interest under the current focus of user action. The ontology-based extension mechanisms will be formalized on the basis of an approximation to conditional probabilities, derived from the existence of relations between concepts. Before the models and mechanisms are explained in detail, we provide some preliminary ground for the calculation of combined probabilities, which shall be used in the sequel for our computations.

Given a finite set Ω and a ∈ Ω, let P(a) be the probability that a holds some condition. It can be shown that the probability that a holds the condition and is not the only element in Ω that holds it can be written as:

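$$P\Bigl(a \cap \bigcup_{x \in \Omega \setminus \{a\}} x\Bigr) = \sum_{\substack{S \subseteq \Omega \setminus \{a\} \\ S \neq \emptyset}} (-1)^{|S|+1} \prod_{x \in S} P(x) \cdot P(a \mid x)$$

Equation 4. Probability of a holding the condition, inside a finite set Ω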

provided that the events a ∩ x are mutually independent for all x ∈ Ω (the right-hand side of the above formula is based on the inclusion-exclusion principle applied to probability [128]). Furthermore, if we can assume that the probability that a is the only element in Ω that holds the condition is zero, then the previous expression is equal to P(a):

$$P(a) = \sum_{\substack{S \subseteq \Omega \setminus \{a\} \\ S \neq \emptyset}} (-1)^{|S|+1} \prod_{x \in S} P(x) \cdot P(a \mid x)$$
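The alternating sum in Equation 4 has a closed form, which is what makes it cheap to evaluate. Writing $y_x = P(x)\,P(a \mid x)$ and expanding the product $\prod_x (1 - y_x)$ over all subsets $S$ of $\Omega \setminus \{a\}$ (the empty subset contributes the leading 1) gives the following short derivation, a standard algebraic identity included here as a worked illustration:

```latex
\begin{aligned}
\prod_{x \in \Omega\setminus\{a\}} (1 - y_x)
  &= \sum_{S \subseteq \Omega\setminus\{a\}} (-1)^{|S|} \prod_{x \in S} y_x
   = 1 - \sum_{\substack{S \subseteq \Omega\setminus\{a\} \\ S \neq \emptyset}} (-1)^{|S|+1} \prod_{x \in S} y_x, \\
\sum_{\substack{S \subseteq \Omega\setminus\{a\} \\ S \neq \emptyset}} (-1)^{|S|+1} \prod_{x \in S} P(x)\,P(a \mid x)
  &= 1 - \prod_{x \in \Omega\setminus\{a\}} \bigl(1 - P(x)\,P(a \mid x)\bigr).
\end{aligned}
```

Under the assumption that a is never the only element holding the condition, this yields P(a) = 1 − ∏(1 − P(x)·P(a|x)), the same functional form as a noisy-OR combination, which can be computed in time linear in |Ω| instead of summing over all subsets.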


We shall use this form of estimating “the probability that a holds some condition” for two purposes: a) to extend user preferences for ontology concepts, and b) to determine which parts of user preferences are relevant for a given runtime context, and should therefore be activated to personalize the results (the ranking) of semantic retrieval, as part of the process described in detail in [36]. In the former case, the condition will be “the user is interested in concept a”, that is, P(a) will be interpreted as the probability that the user is interested in concept a of the ontology. In the latter case, the condition will be “concept a is relevant in the current context”. In both cases, the universe Ω will correspond to a domain ontology O (the universe of all concepts).

In both cases, Equation 4 provides a basis for estimating P(a) for all a ∈ O from an initial set of concepts x for which we know (or have an estimate of) P(x). In the case of preferences, this set is the initial set of weighted user preferences for ontology concepts, where concept weights are interpreted as the probability that the user is interested in a concept. In the case of context, the initial set is a weighted set of concepts found in elements (links, documents, queries) involved in user actions in the span of a session with the system. This set is taken as a representation of the semantic runtime context, where weights represent the estimated probability that such concepts are important in user goals. In both cases, Equation 4 will be used to implement an expansion algorithm that computes probabilities (weights) for all concepts, starting from the initially known (or assumed) probabilities for the initial set. In the second case, the algorithm will compute a context relevance probability for preferred concepts, which will be used as the degree of activation that each preference shall have (in rough terms, this is how the fuzzy intersection of context and preferences is found).
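A one-step version of this expansion can be sketched as follows. It is a hypothetical illustration: the initial weights play the role of P(x); the conditional probabilities P(a|x) are read from a toy table keyed by (x, a), standing in for the approximations that the preliminaries describe as derived from relations between concepts; each seed is assumed to support itself with P(a|a) = 1, so seeds never lose weight; and Equation 4 is evaluated in its product form 1 − ∏(1 − P(x)·P(a|x)), which equals the alternating sum when the events a ∩ x are independent.

```python
from math import prod

def expand_one_step(initial: dict[str, float],
                    cond: dict[tuple[str, str], float]) -> dict[str, float]:
    """Estimate P(a) for every concept reachable in one step from the initial set.

    initial: known (or assumed) P(x) for the seed concepts.
    cond:    hypothetical P(a | x), keyed by (x, a); seeds support themselves with 1.
    """
    targets = set(initial) | {a for (x, a) in cond if x in initial}
    estimates = {}
    for a in targets:
        # One term P(x) * P(a | x) per seed concept x.
        support = [
            p_x * (1.0 if x == a else cond.get((x, a), 0.0))
            for x, p_x in initial.items()
        ]
        # Product form of Equation 4 (probabilistic OR of the supports).
        estimates[a] = 1 - prod(1 - s for s in support)
    return estimates
```

With `initial = {"Formula1": 0.9, "Racing": 0.5}` and a toy table giving P(Car|Formula1) = 0.6, P(Car|Racing) = 0.4 and P(Racing|Formula1) = 0.8, Car is estimated at 1 − (1 − 0.54)(1 − 0.2) = 0.632 and Racing rises from 0.5 to 0.86.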

Equation 4 has some interesting properties with regard to the design of algorithms based on it. In general, for $X = \{x_i\}_{i=0}^{n}$, where $x_i \in [0,1]$, let us define

$$R(X) = \sum_{\substack{S \subseteq \{0, 1, \dots, n\} \\ S \neq \emptyset}} (-1)^{|S|+1} \prod_{i \in S} x_i$$

It is easy to see that this function has the following properties:

• $R(X) \in [0,1]$.
• $R(X) \geq x_i$ for all $i$ (in particular, this means that $R(X) = 1$ if $x_i = 1$ for some $i$).
• $R(X)$ increases monotonically with respect to the value of $x_i$, for all $i$.
• Since $R(X) = x_0 + (1 - x_0) \cdot R(\{x_i\}_{i=1}^{n})$, $R(X)$ can be computed efficiently. Note also that $R(X)$ does not vary if we reorder $X$.
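The properties listed above can be checked numerically with a small sketch that evaluates R(X) in two ways: directly from its subset-sum definition, which costs O(2^n), and by unrolling the recursion R(X) = x_0 + (1 − x_0)·R({x_i}_{i=1}^n) into a linear-time loop. The function names are illustrative, and inputs are assumed to lie in [0,1].

```python
from itertools import combinations
from math import prod

def r_by_subsets(xs: list[float]) -> float:
    """R(X) straight from the definition: alternating sum over non-empty subsets (O(2^n))."""
    return sum(
        (-1) ** (k + 1) * prod(subset)
        for k in range(1, len(xs) + 1)
        for subset in combinations(xs, k)
    )

def r_recursive(xs: list[float]) -> float:
    """R(X) = x0 + (1 - x0) * R(x1..xn), unrolled from the last element (O(n))."""
    result = 0.0  # R of the empty set
    for x in reversed(xs):
        result = x + (1 - x) * result
    return result
```

For X = [0.2, 0.5, 0.9] both functions return 0.96 (up to rounding), which is at least max(X) and equals 1 − (0.8 · 0.5 · 0.1).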
These properties will be useful for the definition of algorithms with computational

