
Chapter Four

Retrieval Evaluation
Introduction to Retrieval Evaluation

 Retrieval evaluation is a set of measures for an information retrieval system, used to assess how well the search/browse results satisfy the user's query intent.
 Evaluation is the systematic determination of a subject's merit, worth, and significance, using criteria governed by a set of standards, before its implementation.
 It ascertains the degree of achievement with respect to the aims, objectives, and results of any search action that has been completed.
Retrieval evaluation …
 The evaluation of information retrieval systems measures which of two existing systems performs better and tries to assess how the performance of a given system can be improved.
 Thus, the first type of evaluation that should be considered is a functional analysis, in which the specified system functionalities are tested one by one.
 Evaluation metrics may be online metrics, which focus on users' interactions with the search system (e.g., a search engine), or offline metrics, which measure relevance over a text/document collection, typically per result or per search engine results page (SERP).
Retrieval evaluation …
 Functional analysis is a simple procedure which can be quite useful for catching programming errors. Given that the system has passed the functional analysis phase, one should proceed to evaluate the performance of the system.
 Lancaster states that the evaluation of an information retrieval system can be justified by the following three issues:-

- How well the system is satisfying its objectives

- How efficiently it is satisfying its objectives, and

- Whether the system justifies its existence


Retrieval Evaluation purposes

 The main purpose is to focus on the process of implementation rather than on its impact.
 The evaluation helps to investigate the degree to which the stated goals have been achieved over the collection of documents.


 To measure information retrieval effectiveness in the standard way, we need a test collection consisting of the following three things:-

- A document collection

- A test suite of information needs, expressible as queries

- A set of relevance judgments, standardly a binary assessment of either relevant or non-relevant for each query-document pair

There are different perspectives, as mentioned below.
Retrieval Evaluation purposes …
 Swanson states seven purposes for information retrieval evaluation:-

- To assess a set of goals, a programming plan, or a design prior to implementation

- To determine whether and how well goals or performance expectations are being fulfilled

- To determine specific reasons for success and failure

- To uncover principles underlying a successful programme

- To explore techniques for increasing programme effectiveness

- To establish a foundation for further research on the reasons for the relative success of alternative techniques, and

- To improve the …
Retrieval Evaluation purposes …

Keen puts forward three major purposes for information retrieval evaluation:-

- The need for measures with which to make merit comparisons within a single test situation. In other words, evaluation studies are conducted to compare the merits or demerits of two or more systems.

- The need for measures with which to make comparisons between results obtained in different test situations, and

- The need for assessing the merit of a real-life system.


Retrieval Evaluation purposes …

 The evaluation of information retrieval can be conducted from two different viewpoints.

- Managerial viewpoint: when evaluation is conducted from the managerial point of view, it is called a management-oriented evaluation.

- User viewpoint: when evaluation is conducted from the user point of view, it is called a user-oriented evaluation study.
Retrieval Evaluation purposes …

Lancaster (1971) proposed five evaluation criteria:-

- Coverage of the system

- Ability of the system to avoid retrieval of unwanted items (i.e., precision)

- Ability of the system to retrieve wanted items (i.e., recall)

- The response time of the system, and

- The amount of effort required by the user


 This type of evaluation is referred to as retrieval performance

evaluation.
Retrieval performance Evaluation

 When considering retrieval performance evaluation, we should first consider

the retrieval task that is to be evaluated.


 For instance, the retrieval task could consist simply of a query processed in

batch mode (i.e., the user submits a query and receives an answer back) or of
a whole interactive session (i.e., the user specifies his information need
through a series of interactive steps with the system).
 Further, the retrieval task could also comprise a combination of these two

strategies.
 Batch and interactive query tasks are quite distinct processes and thus their

evaluations are also distinct.


Retrieval performance Evaluation …

 In fact, in an interactive session, user effort, characteristics of the interface

design, guidance provided by the system, and duration of the session are
critical aspects which should be observed and measured.
 In a batch session, none of these aspects is nearly as important as the quality

of the answer set generated.


 Retrieval performance evaluation in the early days of computer-based

information retrieval systems focused primarily on laboratory experiments


designed for batch interfaces.
 In the 1990s, a lot more attention was paid to the evaluation of real-life experiments.
Retrieval performance Evaluation (Online Metrics)
 Online metrics are generally derived from search logs. They are often used to determine the success of search engine results. Some of them are:

- Session abandonment rate: the ratio of search sessions that do not result in a click.

- Click-through rate: click-through rate (CTR) is the ratio of users who click on a specific link to the number of total users who view a page, email, or advertisement. It is commonly used to measure the success of an online advertising campaign for a particular website, as well as the effectiveness of email campaigns.
Online Metric…
 Session success rate: the ratio of user sessions that lead to a successful result from the index.

 Zero result rate: the ratio of SERPs that returned no results. This metric indicates either a recall issue or that the information being searched for is not in the index. A sketch of how these online metrics might be computed from a log appears below.
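As a minimal sketch, the four online metrics above can be computed from a toy search log; the record format and field names ("results", "clicks", "success") are assumptions for this illustration, not a standard log schema:

# Hypothetical search-log records, one per session.
sessions = [
    {"results": 10, "clicks": 2, "success": True},
    {"results": 10, "clicks": 0, "success": False},  # abandoned session
    {"results": 0,  "clicks": 0, "success": False},  # zero-result session
    {"results": 8,  "clicks": 1, "success": True},
]

total = len(sessions)
session_abandonment_rate = sum(1 for s in sessions if s["clicks"] == 0) / total
session_success_rate = sum(1 for s in sessions if s["success"]) / total
zero_result_rate = sum(1 for s in sessions if s["results"] == 0) / total

# CTR here is clicks per result shown; in advertising it is usually
# clicks per impression of a specific link.
click_through_rate = (sum(s["clicks"] for s in sessions)
                      / sum(s["results"] for s in sessions))

print(session_abandonment_rate, session_success_rate,
      zero_result_rate, click_through_rate)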
Retrieval performance Evaluation (offline)
 Offline metrics are generally created from relevance judgment sessions in which judges score the quality of the search results. The most common retrieval performance evaluation measures/criteria are:-

- Recall

- Precision

- Fallout

- Generality

- F-score / F-measure: weighted harmonic mean


Precision and Recall
 In pattern recognition, information retrieval and classification (machine
learning), precision (also called positive predictive value) is the
fraction of relevant instances among the retrieved instances,
while recall (also known as sensitivity) is the fraction of the total
amount of relevant instances that were actually retrieved. Both
precision and recall are therefore based on an understanding and
measure of relevance.
 Recall

It is defined as the proportion of the total relevant documents that are retrieved.
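As a minimal sketch, both precision and recall can be computed from sets of document IDs; the IDs below are hypothetical:

# What the system retrieved vs. what the judges marked relevant
# for one query (hypothetical document IDs).
retrieved = {"d1", "d2", "d3", "d4", "d5"}
relevant = {"d1", "d3", "d7"}

relevant_retrieved = retrieved & relevant  # items that are both

precision = len(relevant_retrieved) / len(retrieved)  # 2/5 = 0.4
recall = len(relevant_retrieved) / len(relevant)      # 2/3 ≈ 0.67

print(precision, recall)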
Retrieval performance Evaluation …
 Example: if there are 100 documents in a repository that are relevant to a given query and 60 of these items are retrieved in a given search, then the recall is said to be 60%; in other words, the system has been able to retrieve 60% of the relevant items. That is, Recall = 60/100 * 100 = 60%.
 Therefore,

The number of relevant items in the collection = 100.

The number of relevant items retrieved = 60.

 The above example indicates that there are 100 relevant documents in the repository, and 60 of them were retrieved for the user according to his/her request.
Retrieval performance Evaluation …
Precision: it is defined as the proportion of retrieved documents that are relevant.

 Example:- in a given search the system retrieves 90 items; out of these, 45 are relevant and 45 are non-relevant, so the precision is 50%.

Therefore,

Precision = 45/90 * 100 = 50%

The total number of items retrieved = 90.

The number of relevant items retrieved = 45.
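Both worked examples above can be reproduced with the same arithmetic in Python:

# Recall example: 100 relevant documents, 60 of them retrieved.
recall = 60 / 100 * 100      # 60.0 (%)

# Precision example: 90 items retrieved, 45 of them relevant.
precision = 45 / 90 * 100    # 50.0 (%)

print(recall, precision)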


Retrieval performance Evaluation …

 Fall-out:- the fall-out ratio is the proportion of non-relevant items that have been retrieved in a given transaction/search:

Fall-out = non-relevant items retrieved / total non-relevant items in the collection

It is closely related to specificity (fall-out = 1 - specificity) and can be seen as the probability that a non-relevant document is retrieved by the query.
 Generality

The generality ratio is the proportion of items in the collection that are relevant to a given query.

o F-score / F-measure: it is the weighted harmonic mean of precision and recall; the traditional F-measure or balanced F-score is expressed as follows:

o F = 2 * (Precision * Recall) / (Precision + Recall)
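A minimal sketch computing the balanced F-score, treating the precision (50%) and recall (60%) from the earlier examples as if they came from the same search, purely for illustration:

precision = 0.50   # from the precision example above
recall = 0.60      # from the recall example above

# Balanced F-score: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ≈ 0.545

# Fall-out, additionally assuming a hypothetical collection of 1,000
# documents of which 100 are relevant: the 45 non-relevant items
# retrieved in the precision example come out of 900 non-relevant ones.
fallout = 45 / 900
print(fallout)  # 0.05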
Retrieval performance Evaluation …
 The F-measure is also known as the F1 measure, because recall and precision are evenly weighted.

Query:-
1. Write about and discuss the relationship between the F-measure and the E-measure of recall and precision.
2. Discuss the confusion matrices of recall and precision.
Limitations of Recall and Precision

 Different users may want different levels of recall.

 A user who is going to prepare a state-of-the-art report on a topic would like to have all the items available on the topic and therefore will go for high recall.

 Whereas a user who needs to know only a little about a given topic will prefer to have a few items and thus will not require high recall.


 Other Performance Evaluation criteria

- Effectiveness: the level up to which the given system attains its objective. It measures how far the system can retrieve relevant information while withholding non-relevant information.

- Efficiency: indicates how economically the system is achieving its objective. In an information retrieval system, efficiency can be measured in terms of cost and time.

- Usability

- Satisfaction, and

- Cost
Other Performance Evaluation criteria

 Usability: a measure that embraces the interface through which the user interacts with the system. It takes into account the users and their expectations, skills, and experiences.

 Satisfaction:

- Searching tasks

- Searching settings

- The searcher's state, which contributes to quality and satisfaction judgments in the digital environment.

- Other perspectives on satisfaction are to be found in the service quality and website quality literatures.
Other Performance Evaluation criteria …

 Cost: users may experience costs in terms of any payment that they need to make for system or document access. The most significant cost is associated with the time that they spend searching a system.
Assignment- 4 (15%)

1. Explain reference collections and the types of reference collections used for retrieval measures.

2. Explain how artificial intelligence (AI) and machine learning (ML) can help in information retrieval.

3. Write a simple programming concept using Python for retrieving from repositories / the cloud effectively.
