
Chapter Four

Retrieval Evaluation
Introduction to Retrieval Evaluation

 Retrieval evaluation is a set of measures for an information retrieval system, used to assess how well the search/browse results satisfy the user's query intent.
 Evaluation is the systematic determination of a subject's merit, worth, and significance, using criteria governed by a set of standards, before its implementation.
 It ascertains the degree of achievement with respect to the aims, objectives, and results of any search action that has been completed.
Retrieval evaluation …
 The evaluation of information retrieval systems measures which of two existing systems performs better and tries to assess how the performance of a given system can be improved.
 Thus, the first type of evaluation that should be considered is a functional analysis, in which the specified system functionalities are tested one by one.
 Evaluation metrics may be online metrics, which focus on users' interactions with the search system (e.g., a search engine), or offline metrics, which measure relevance over a text/document collection, typically per result or per search engine results page (SERP).
Retrieval evaluation …
 Functional analysis is a simple procedure which can be quite useful for catching programming errors. Given that the system has passed the functional analysis phase, one should proceed to evaluate the performance of the system.
 Lancaster states that the evaluation of an information retrieval system can be justified by the following three issues:-

- How well the system is satisfying its objectives

- How efficiently it is satisfying its objectives, and

- Whether the system justifies its existence


Retrieval Evaluation purposes

 The main purpose is to focus on the process of implementation rather than on its impact.
 The evaluation helps to investigate the degree to which the stated goals have been achieved over the collection of documents.


 To measure information retrieval effectiveness in the standard way, we need a test collection consisting of the following three things:-

- A document collection

- A test suite of information needs, expressible as queries

- A set of relevance judgments, standardly a binary assessment of either relevant or non-relevant for each query-document pair

There are different perspectives, as mentioned below.
Retrieval Evaluation purposes …
 Swanson states seven purposes for information retrieval evaluation:-

- To assess a set of goals, a programming plan, or a design prior to implementation

- To determine whether and how well goals or performance expectations are being fulfilled

- To determine specific reasons for success and failure

- To uncover principles underlying a successful programme

- To explore techniques for increasing programme effectiveness

- To establish a foundation for further research on the reasons for the relative success of alternative techniques, and

- To improve the …
Retrieval Evaluation purposes …

Keen puts forward three major purposes for information retrieval evaluation:-

- The need for measures with which to make merit comparisons within a single test situation. In other words, evaluation studies are conducted to compare the merits or demerits of two or more systems.

- The need for measures with which to make comparisons between results obtained in different test situations, and

- The need for assessing the merit of a real-life system.


Retrieval Evaluation purposes …

 The evaluation of information retrieval can be conducted from two different viewpoints.

- Managerial viewpoint: when evaluation is conducted from the managerial point of view, it is called a management-oriented evaluation.

- User viewpoint: when evaluation is conducted from the user point of view, it is called a user-oriented evaluation study.
Retrieval Evaluation purposes …

Lancaster (1971) proposed five evaluation criteria:-

- Coverage of the system

- Ability of the system to avoid retrieval of unwanted items (i.e., precision)

- Ability of the system to retrieve wanted items (i.e., recall)

- The response time of the system, and

- The amount of effort required by the user


 This type of evaluation is referred to as retrieval performance

evaluation.
Retrieval performance Evaluation

 When considering retrieval performance evaluation, we should first consider

the retrieval task that is to be evaluated.


 For instance, the retrieval task could consist simply of a query processed in

batch mode (i.e., the user submits a query and receives an answer back) or of
a whole interactive session (i.e., the user specifies his information need
through a series of interactive steps with the system).
 Further, the retrieval task could also comprise a combination of these two

strategies.
 Batch and interactive query tasks are quite distinct processes and thus their

evaluations are also distinct.


Retrieval performance Evaluation …

 In fact, in an interactive session, user effort, characteristics of the interface

design, guidance provided by the system, and duration of the session are
critical aspects which should be observed and measured.
 In a batch session, none of these aspects is nearly as important as the quality

of the answer set generated.


 Retrieval performance evaluation in the early days of computer-based

information retrieval systems focused primarily on laboratory experiments


designed for batch interfaces.
 In the 1990s, a lot more attention was paid to the evaluation of real-life experiments.
Retrieval performance Evaluation (Online Metrics)
 Online metrics are generally derived from search logs. They are often used to determine the success of search engine results. Some of them are:

- Session abandonment rate: the ratio of search sessions that do not result in a click.

- Click-through rate: click-through rate (CTR) is the ratio of users who click on a specific link to the number of total users who view a page, email, or advertisement. It is commonly used to measure the success of an online advertising campaign for a particular website, as well as the effectiveness of email campaigns.
Online Metric…
 Session success rate: the ratio of user sessions that lead to a successful result from the index.

 Zero result rate: the ratio of SERPs that returned no results. This metric indicates either a recall issue or that the information being searched for is not in the index. A sketch of how these online metrics might be computed from a log appears below.
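As a minimal sketch, the four online metrics above can be computed from a toy search log; the record format and field names ("results", "clicks", "success") are assumptions for this illustration, not a standard log schema:

# Hypothetical search-log records, one per session.
sessions = [
    {"results": 10, "clicks": 2, "success": True},
    {"results": 10, "clicks": 0, "success": False},  # abandoned session
    {"results": 0,  "clicks": 0, "success": False},  # zero-result session
    {"results": 8,  "clicks": 1, "success": True},
]

total = len(sessions)
session_abandonment_rate = sum(1 for s in sessions if s["clicks"] == 0) / total
session_success_rate = sum(1 for s in sessions if s["success"]) / total
zero_result_rate = sum(1 for s in sessions if s["results"] == 0) / total

# CTR here is clicks per result shown; in advertising it is usually
# clicks per impression of a specific link.
click_through_rate = (sum(s["clicks"] for s in sessions)
                      / sum(s["results"] for s in sessions))

print(session_abandonment_rate, session_success_rate,
      zero_result_rate, click_through_rate)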
Retrieval performance Evaluation (offline)
 Offline metrics are generally created from relevance judgment sessions in which judges score the quality of the search results. The most common retrieval performance evaluation measures/criteria are:-

- Recall

- Precision

- Fallout

- Generality

- F-score / F-measure: weighted harmonic mean


Precision and Recall
 In pattern recognition, information retrieval and classification (machine
learning), precision (also called positive predictive value) is the
fraction of relevant instances among the retrieved instances,
while recall (also known as sensitivity) is the fraction of the total
amount of relevant instances that were actually retrieved. Both
precision and recall are therefore based on an understanding and
measure of relevance.
 Recall

It is defined as the proportion of the total relevant documents that are retrieved.
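As a minimal sketch, both precision and recall can be computed from sets of document IDs; the IDs below are hypothetical:

# What the system retrieved vs. what the judges marked relevant
# for one query (hypothetical document IDs).
retrieved = {"d1", "d2", "d3", "d4", "d5"}
relevant = {"d1", "d3", "d7"}

relevant_retrieved = retrieved & relevant  # items that are both

precision = len(relevant_retrieved) / len(retrieved)  # 2/5 = 0.4
recall = len(relevant_retrieved) / len(relevant)      # 2/3 ≈ 0.67

print(precision, recall)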
Retrieval performance Evaluation …
 Example: if there are 100 documents in a repository that are relevant to a given query and 60 of these items are retrieved in a given search, then the recall is said to be 60%; in other words, the system has been able to retrieve 60% of the relevant items. That is, Recall = 60/100 * 100 = 60%.
 Therefore,

The number of relevant items in the collection = 100.

The number of relevant items retrieved = 60.

 The above example indicates that there are 100 relevant documents in the repository, and 60 of them were retrieved for the user according to his/her request.
Retrieval performance Evaluation …
Precision: it is defined as the proportion of retrieved documents that are relevant.

 Example:- in a given search the system retrieves 90 items; out of these, 45 are relevant and 45 are non-relevant, so the precision is 50%.

Therefore,

Precision = 45/90 * 100 = 50%

The total number of items retrieved = 90.

The number of relevant items retrieved = 45.
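Both worked examples above can be reproduced with the same arithmetic in Python:

# Recall example: 100 relevant documents, 60 of them retrieved.
recall = 60 / 100 * 100      # 60.0 (%)

# Precision example: 90 items retrieved, 45 of them relevant.
precision = 45 / 90 * 100    # 50.0 (%)

print(recall, precision)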


Retrieval performance Evaluation …

 Fall-out:- the fall-out ratio is the proportion of non-relevant items that have been retrieved in a given transaction/search:

Fall-out = non-relevant items retrieved / total non-relevant items in the collection

It is closely related to specificity (fall-out = 1 - specificity) and can be seen as the probability that a non-relevant document is retrieved by the query.
 Generality

The generality ratio is the proportion of items in the collection that are relevant to a given query.

o F-score / F-measure: it is the weighted harmonic mean of precision and recall; the traditional F-measure or balanced F-score is expressed as follows:

o F = 2 * (Precision * Recall) / (Precision + Recall)
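A minimal sketch computing the balanced F-score, treating the precision (50%) and recall (60%) from the earlier examples as if they came from the same search, purely for illustration:

precision = 0.50   # from the precision example above
recall = 0.60      # from the recall example above

# Balanced F-score: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ≈ 0.545

# Fall-out, additionally assuming a hypothetical collection of 1,000
# documents of which 100 are relevant: the 45 non-relevant items
# retrieved in the precision example come out of 900 non-relevant ones.
fallout = 45 / 900
print(fallout)  # 0.05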
Retrieval performance Evaluation …
 The F-measure is also known as the F1 measure, because recall and precision are evenly weighted.

Query:-
1. Write about and discuss the relationship between the F-measure and the E-measure of recall and precision.
2. Discuss the confusion matrices of recall and precision.
Limitations of Recall and Precision

 Different users may want different levels of recall.

 A user who is going to prepare a state-of-the-art report on a topic would like to have all the items available on the topic and therefore will go for high recall.

 Whereas a user who needs to know only a little about a given topic will prefer to have a few items and thus will not require high recall.


 Other Performance Evaluation criteria

- Effectiveness: the level up to which the given system attains its objective. It measures how far the system can retrieve relevant information while withholding non-relevant information.

- Efficiency: indicates how economically the system is achieving its objective. In an information retrieval system, efficiency can be measured in terms of cost and time.

- Usability

- Satisfaction, and

- Cost
Other Performance Evaluation criteria

 Usability: a measure that embraces the interface through which the user interacts with the system. It takes into account the users and their expectations, skills, and experiences.

 Satisfaction:

- Searching tasks

- Searching settings

- The searcher's state, which contributes to quality and satisfaction judgments in the digital environment.

- Other perspectives on satisfaction are to be found in the service quality and website quality literatures.
Other Performance Evaluation criteria …

 Cost: users may experience costs in terms of any payment that they need to make for system or document access. The most significant cost is associated with the time that they spend searching a system.
Assignment- 4 (15%)

1. Explain reference collections and the types of reference collections used for retrieval measures.

2. Explain how artificial intelligence (AI) and machine learning (ML) can help in information retrieval.

3. Write a simple programming concept using Python for retrieving from repositories / the cloud effectively.
