
The contextualization techniques described in the previous sections have been implemented in an experimental prototype and tested on a medium-scale corpus. A formal evaluation of the contextualization techniques may require a significant amount of extra feedback from users in order to measure how much better a retrieval system can perform with the proposed techniques than without them. For this purpose, it would be necessary to compare retrieval performance a) without personalization, b) with simple personalization, and c) with contextual personalization. This requires building a testbed consisting of: a document corpus, a set of task descriptions, the relevance judgments for each task description, and the interaction model, either fixed or provided by the users.
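To make the testbed concrete, the following minimal Python sketch shows one possible representation of these four components; all class and field names are hypothetical illustrations, not part of the actual prototype:

```python
from dataclasses import dataclass, field

@dataclass
class TaskDescription:
    """One evaluation task: a query plus the simulated context around it."""
    task_id: str
    query: str
    context_steps: list[str] = field(default_factory=list)  # scripted user actions

@dataclass
class Testbed:
    """The four components the evaluation requires."""
    corpus: dict[str, str]            # doc_id -> document text
    tasks: list[TaskDescription]
    judgments: dict[str, set[str]]    # task_id -> ids of relevant documents
    interaction_model: str = "fixed"  # "fixed" script, or feedback from live users
```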

The document corpus consists of 145,316 documents (445 MB) from the CNN web site⁶, plus the KIM domain ontology and knowledge base (KB) publicly available as part of the KIM Platform, developed by Ontotext Lab⁷, with minor extensions. The KB contains a total of 281 RDF classes, 138 properties, 35,689 instances, and 465,848 sentences. The CNN documents are annotated with KB concepts, amounting to over three million annotation links. The relation weights R and R⁻¹ were first set manually on an intuitive basis, and tuned empirically afterwards by running a few trials.

⁶ http://dmoz.org/News/Online_Archives/CNN.com
⁷ http://www.ontotext.com/kim
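As a purely illustrative sketch of how such weights might be stored, consider the following; the relation names and values are assumptions made for the sake of the example, not the weights actually used in the prototype:

```python
# Hypothetical weights for ontology relations and their inverses.
# R maps a relation to its propagation weight; R_inv weights the
# inverse direction, which need not be symmetric.
R = {
    "locatedIn": 0.7,
    "hasSubsidiary": 0.5,
    "authorOf": 0.8,
}
R_inv = {
    "locatedIn": 0.3,      # inverse: contains
    "hasSubsidiary": 0.6,  # inverse: subsidiaryOf
    "authorOf": 0.9,       # inverse: writtenBy
}
```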


Task descriptions will be similar to the simulated situations explained in section 6.1.1; the goal of a task description is to provide sufficient query and contextual information for the user to perform the relevance assessments, and to stimulate interaction with the evaluation system.

Similarly to the TREC HARD track (see 6.1.4), we will use real users who will provide explicit relevance assessments with respect to a) query relevance, b) query relevance and general user preference (i.e. regardless of the task at hand), and c) query relevance and specific user preference (constrained to the context of her task).

The selected metrics are precision and recall values for single-query evaluation, and average precision and recall, together with mean average precision (MAP), for whole-system performance evaluation. Average precision and recall is the average value of the PR points over a set of n queries. PR values were chosen because they allow a finer analysis of the results: the values can be represented in a graph, and different levels of performance can be compared at once. For instance, a retrieval engine can have good precision, showing relevant results in the top 5 documents, whereas the same search system can lack good recall performance, being unable to find a good proportion of the relevant documents in the search corpus; see Figure 15 for a visual description of the precision and recall areas.
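For reference, these metrics can be computed as in the following sketch, where `ranking` is the ordered list of document ids retrieved for a query and `relevant` is the set of documents judged relevant for it (both names are illustrative):

```python
def precision_recall_points(ranking, relevant):
    """Precision and recall after each retrieved document (one PR point per rank).

    Assumes a non-empty set of relevant documents.
    """
    points, hits = [], 0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
        points.append((hits / k, hits / len(relevant)))  # (precision@k, recall@k)
    return points

def average_precision(ranking, relevant):
    """Average of the precision values at the ranks of the relevant documents."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: mean of the average precision over a set of (ranking, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

The PR points of the individual queries are then typically averaged at fixed recall levels to produce the average precision and recall curve used for the whole-system comparison.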

Since the contextualization techniques are applied in the course of a session, one way to evaluate them is to define a sequence of steps where the techniques are put to work. This is the approach followed in the presented set of experiments, in which the task descriptions consist of a fixed set of hypothetical context situations, detailed step by step.
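A scripted session of this kind could be driven as in the sketch below, reusing the TaskDescription sketch above; `search` and `update_context` are hypothetical stand-ins for the prototype's retrieval and context-update operations:

```python
def run_session(task, search, update_context):
    """Replay a task's scripted context steps, querying after each one."""
    context = []   # accumulated contextual evidence, initially empty
    rankings = []
    for step in task.context_steps:
        context = update_context(context, step)    # e.g. a click or a query refinement
        rankings.append(search(task.query, context))
    return rankings  # one ranking per step, scored against the relevance judgments
```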


[Figure 15: precision-recall plot. Axes: Recall (x) and Precision (y), both ranging from 0.0 to 1.0. The plot marks a precision area (good precision performance), a recall area (good recall performance), and the optimal results region.]

Figure 15. Different areas of performance for a precision and recall curve. The goal area to reach is the upper right part, as it denotes that the search engine has good precision over all the results and has achieved a good recall of documents.
