A/Prof. Jianxin Li
School of Information Technology
Deakin University
jianxin.li@Deakin.edu.au
Review of Last Week’s Class
Outline of Today’s Class
• Retrieval Evaluation
• Evaluation Criteria
– Effectiveness
– Efficiency
– Usability
• Metrics
– Precision, recall, miss, false alarm
• Harmonic Mean and the E Measure
• Self-test Question Discussion
Retrieval Evaluation
• Retrieval Performance
• Evaluations
• Precision
• Recall
• Single Value Summaries
The Importance of Evaluation
Types of Evaluation
Strategies
• System-centered studies
– Given documents, queries, and relevance judgments
– Try several variations of the system
– Measure which system returns the “best” hit list
• User-centered studies
– Given several users, and at least two retrieval systems
– Have each user try the same task on both systems
– Measure which system works the “best”
Evaluation Criteria
• Effectiveness
– How “good” are the documents that are returned?
– System only, human + system
• Efficiency
– Retrieval time, indexing time, index size
• Usability
– Learnability, frustration
– Novice vs. expert users
Good Effectiveness Measures
The Notion of Relevance
What is relevance?
[Figure: evaluation pipeline. A query and a document collection enter an IR “black box”, which outputs a ranked list; an evaluation module compares the ranked list against relevance judgments to produce a measure of effectiveness.]
Set-Based Measures
• A: relevant documents retrieved; B: non-relevant documents retrieved;
C: relevant documents not retrieved; D: non-relevant documents not retrieved
• Precision = A ÷ (A+B)
• Recall = A ÷ (A+C)
• Miss = C ÷ (A+C)
• False alarm (fallout) = B ÷ (B+D)
When is precision important?
When is recall important?
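The four ratios above can be checked with a short sketch. The counts for A, B, C and D below are illustrative values, not taken from the slides:

```python
# Set-based measures from the 2x2 contingency table.
# A: relevant & retrieved, B: non-relevant & retrieved,
# C: relevant & missed,    D: non-relevant & not retrieved.
def set_based_measures(a, b, c, d):
    return {
        "precision": a / (a + b),   # A / (A+B)
        "recall":    a / (a + c),   # A / (A+C)
        "miss":      c / (a + c),   # C / (A+C)
        "fallout":   b / (b + d),   # B / (B+D)
    }

# Hypothetical counts: 20 retrieved, 14 relevant in the collection.
m = set_based_measures(a=6, b=14, c=8, d=72)
print(m)
```

Note that miss and recall always sum to 1, since C ÷ (A+C) = 1 − A ÷ (A+C).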
Recall and Precision
• Recall
– the fraction of the relevant documents (A+C) that has been retrieved:
Recall = A ÷ (A+C)
• Precision
– the fraction of the retrieved documents (the answer set, A+B) that is relevant:
Precision = A ÷ (A+B)
Measuring Precision and Recall
• Assume there are a total of 14 relevant documents
• The user is not usually presented with all the documents in the answer set A at once

Hits 01-10
Precision 1/1 1/2 1/3 1/4 2/5 3/6 3/7 4/8 4/9 4/10
Recall 1/14 1/14 1/14 1/14 2/14 3/14 3/14 4/14 4/14 4/14

Hits 11-20
Precision 5/11 5/12 5/13 5/14 5/15 6/16 6/17 6/18 6/19 6/20
Recall 5/14 5/14 5/14 5/14 5/14 6/14 6/14 6/14 6/14 6/14
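The table above can be reproduced from a binary relevance list. A minimal sketch; the 0/1 pattern is read off the slide’s precision values (relevant hits at ranks 1, 5, 6, 8, 11, 16):

```python
# Precision and recall at each rank for the slide's example ranking.
# 1 = relevant hit, 0 = non-relevant; 14 relevant documents in total.
rels = [1,0,0,0,1,1,0,1,0,0,1,0,0,0,0,1,0,0,0,0]
TOTAL_RELEVANT = 14

hits = 0
table = []
for k, r in enumerate(rels, start=1):
    hits += r
    table.append((k, hits / k, hits / TOTAL_RELEVANT))

for k, p, r in table:
    print(f"rank {k:2d}: precision {p:.2f}  recall {r:.2f}")
```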
Graphing Precision and Recall
[Figure: sawtooth precision-recall curve; precision on the y-axis (0-0.8), recall on the x-axis (0-0.5).]
Need for Interpolation
• Two issues:
– How do you compare performance across queries?
– Is the sawtooth shape an intuitive picture of what’s going on?
[Figure: sawtooth precision-recall curve; precision on the y-axis (0-0.8), recall on the x-axis (0-1).]
Solution: Interpolation!
Interpolation
• Why?
– We have no observed data between the data points
– The strange sawtooth shape doesn’t make sense
• It is an empirical fact that, on average, as recall increases, precision decreases
• Interpolate at 11 standard recall levels
– 100%, 90%, 80%, …, 30%, 20%, 10%, 0% (!)
– How? At each standard recall level R, take the maximum precision over all observed points (R′, P′) with recall R′ ≥ R:
P(R) = max{ P′ : R′ ≥ R ∧ (R′, P′) ∈ S }
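The interpolation rule can be sketched as follows; the observed (recall, precision) points are hypothetical:

```python
# 11-point interpolated precision: at each standard recall level,
# take the maximum precision among all observed points whose
# recall is at least that level (0 if no such point exists).
def interpolate_11pt(points):
    """points: list of (recall, precision) observations."""
    levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    out = []
    for level in levels:
        candidates = [p for r, p in points if r >= level]
        out.append((level, max(candidates) if candidates else 0.0))
    return out

# Hypothetical sawtooth observations (recall, precision).
obs = [(0.1, 1.0), (0.2, 0.5), (0.3, 0.6), (0.4, 0.4)]
print(interpolate_11pt(obs))
```

Because each level takes a maximum over points further to the right, the interpolated curve is a non-increasing step function, which removes the sawtooth.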
Interpolation
[Figure: the same precision-recall data plotted before and after interpolation; precision on the y-axis (0-0.8), recall on the x-axis (0-0.5).]
Precision versus Recall Figures
[Figure: precision (%) plotted against recall (%) over the range 0-100.]
Single Value Summaries
Single Value Summaries (MAP)
• Average precision for one query: sum the precision values observed each time a relevant document is retrieved (here at ranks 1, 5, 6, 8, 11, 16), then divide by the total number of relevant documents (14)
• Mean Average Precision (MAP): the mean of the average precision over a set of queries
Hits 1-10
Precision 1/1 1/2 1/3 1/4 2/5 3/6 3/7 4/8 4/9 4/10
Hits 11-20
Precision 5/11 5/12 5/13 5/14 5/15 6/16 6/17 6/18 6/19 6/20
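Average precision for the ranking above can be computed directly; the 0/1 relevance pattern is read from the table:

```python
# Average precision for a single query: average the precision at each
# relevant document's rank over the total number of relevant documents.
def average_precision(rels, total_relevant):
    hits = 0
    precisions = []
    for k, r in enumerate(rels, start=1):
        if r:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / total_relevant

# Slide ranking: relevant hits at ranks 1, 5, 6, 8, 11, 16; 14 relevant total.
rels = [1,0,0,0,1,1,0,1,0,0,1,0,0,0,0,1,0,0,0,0]
ap = average_precision(rels, total_relevant=14)
print(round(ap, 3))  # -> 0.231
```

MAP would then be the mean of this value over all queries in the test collection.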
Single Value Summaries (R-precision)
• R-precision (Recall-precision)
– the precision at the R-th position in the ranking, where R is the total number of relevant documents for the current query
– here R = 14, so R-precision = 5/14
Hits 1-10
Precision 1/1 1/2 1/3 1/4 2/5 3/6 3/7 4/8 4/9 4/10
Hits 11-20
Precision 5/11 5/12 5/13 5/14 5/15 6/16 6/17 6/18 6/19 6/20
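A minimal sketch of R-precision on the same example ranking:

```python
# R-precision: precision at rank R, where R is the number of relevant
# documents for the query. Here R = 14, and 5 of the top 14 are relevant.
def r_precision(rels, total_relevant):
    top = rels[:total_relevant]
    return sum(top) / total_relevant

rels = [1,0,0,0,1,1,0,1,0,0,1,0,0,0,0,1,0,0,0,0]
print(r_precision(rels, total_relevant=14))  # 5/14, about 0.357
```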
Problems with Precision and Recall
The Harmonic Mean
F(j) = 2 ÷ (1/R(j) + 1/P(j))
• R(j): the recall for the j-th document in the ranking
• P(j): the precision for the j-th document in the ranking
The Harmonic Mean
F = (2 × P × R) ÷ (P + R)
• F is always in the interval [0,1]
• F is 0 when no relevant documents have been retrieved
– in which case P and R are both 0
• F is 1 when every retrieved document is relevant and every relevant document has been retrieved
– in which case P and R are both 1
• Furthermore, F is high only when both R and P are high, so finding the maximum value of F can be interpreted as finding the best possible compromise between recall and precision
E Measure (parameterized F Measure)
• A variant of the F measure that allows weighting precision relative to recall:
E = (1 + β²) × P × R ÷ (β² × P + R) = (1 + β²) ÷ (β²/R + 1/P)
• The value of β controls the trade-off:
– β = 1: weight precision and recall equally (E = F)
– β > 1: weight recall more
– β < 1: weight precision more
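The parameterized measure, as defined on the slide, can be sketched with hypothetical P and R values to see the effect of β:

```python
# Parameterized harmonic mean as on the slide:
# (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 recovers F.
def f_beta(p, r, beta=1.0):
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

p, r = 0.5, 0.25  # hypothetical precision and recall
print(f_beta(p, r))            # beta = 1: plain harmonic mean
print(f_beta(p, r, beta=2))    # pulled toward recall (the lower value here)
print(f_beta(p, r, beta=0.5))  # pulled toward precision
```

As β grows, the value approaches R; as β shrinks toward 0, it approaches P, which is why β > 1 emphasizes recall and β < 1 emphasizes precision.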
Conclusion of Today’s Class
• Retrieval Evaluation
• Evaluation Criteria
– Effectiveness
– Efficiency
– Usability
• Metrics
– Precision, recall, miss, false alarm
• Harmonic Mean and the E Measure
• Self-test Question Discussion