Human experts mark, for each query and for each doc, Relevant or Non-relevant
(or at least for a subset of the docs that some system returned for that query).
THE SIGIR MUSEUM
EVALUATION
• Precision = 6/20 (6 of the 20 retrieved docs are relevant)
• Recall = 6/8 (6 of the 8 relevant docs in the collection were retrieved)
EXAMPLE: PRECISION & RECALL
• Suppose a wife asks her husband for the dates of 4 important events:
their wedding anniversary, her birthday, and her mother-in-law's and
father-in-law's birthdays.
• The husband recalls all 4 dates, but needs 8 attempts in total.
• His recall is 100%, but his precision is 50%, i.e., 4 divided by 8.
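The arithmetic above can be checked with a short sketch; the function name is my own, and the counts (4 correct dates out of 8 attempts, 4 dates to remember) come from the example.

```python
def precision_recall(num_correct, num_attempts, num_relevant):
    """Precision = correct / attempts; recall = correct / total items sought."""
    precision = num_correct / num_attempts
    recall = num_correct / num_relevant
    return precision, recall

# The husband recalled all 4 dates, using 8 attempts in total.
p, r = precision_recall(num_correct=4, num_attempts=8, num_relevant=4)
print(p, r)  # 0.5 1.0
```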
SHOULD WE INSTEAD USE THE ACCURACY
MEASURE FOR EVALUATION?
PRECISION/RECALL
F = 1 / (α · (1/P) + (1 − α) · (1/R)) = ((β² + 1) · P · R) / (β² · P + R)
F1-Score
The Harmonic Mean of Precision and Recall
F1 = 2PR / (P + R)
People usually use the balanced F measure, i.e., with β = 1 (equivalently α = ½).
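A minimal sketch of the general F measure and the balanced F1 defined above (the function names are my own):

```python
def f_measure(precision, recall, beta=1.0):
    """General F measure: ((beta^2 + 1) * P * R) / (beta^2 * P + R)."""
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

def f1(precision, recall):
    """Balanced F1 (beta = 1): the harmonic mean 2PR / (P + R)."""
    return f_measure(precision, recall, beta=1.0)

# For the husband example (precision 0.5, recall 1.0):
print(f1(0.5, 1.0))  # 2 * 0.5 * 1.0 / 1.5 = 0.666...
```

With β > 1 the measure weights recall more heavily; with β < 1 it favors precision.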
• Here, 40 patients (P) are diabetic, and for 30 of them (TP) the medicine
worked well.
• Accuracy = (TP + TN) / (TP + FP + FN + TN) = (30 + 1000) / (30 + 50 + 10 + 1000) ≈ 95%
• The issue: the number of TN (non-diabetic) cases is very high, which pushes
accuracy up. The data is skewed.
• In such a scenario the precision-recall curve is helpful, since precision and
recall don't consider TN.
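The point can be made concrete with the confusion-matrix counts from the diabetes example:

```python
# Confusion-matrix counts from the diabetes example.
TP, FP, FN, TN = 30, 50, 10, 1000

accuracy = (TP + TN) / (TP + FP + FN + TN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)

print(round(accuracy, 3))   # 0.945 -- looks great, but TN dominates
print(round(precision, 3))  # 0.375
print(round(recall, 3))     # 0.75
```

Accuracy is near 95% even though the classifier finds only 3 of every 8 predicted-diabetic patients correctly, because the huge TN count dominates the numerator and denominator.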
ROC VS PR CURVE
Ranked list: N R R N (rank 1 nonrelevant, ranks 2 and 3 relevant, rank 4
nonrelevant), with 100 relevant docs in the corpus.
• P@1 = 0/1 = 0
• P@2 = 1/2
• P@3 = 2/3
• P@4 = 2/4
• R@1 = 0/100 = 0
• R@2 = 1/100
• R@3 = 2/100
• R@4 = 2/100
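The P@k and R@k values above can be reproduced with a short sketch, using the ranking N R R N and 100 relevant docs in the corpus (a reconstruction consistent with the slide's numbers; the function names are my own):

```python
def precision_at_k(rels, k):
    """Fraction of the top-k results that are relevant (rels: list of 0/1)."""
    return sum(rels[:k]) / k

def recall_at_k(rels, k, total_relevant):
    """Fraction of all relevant docs in the corpus found in the top k."""
    return sum(rels[:k]) / total_relevant

ranking = [0, 1, 1, 0]  # N R R N, as on the slide
for k in range(1, 5):
    print(k, precision_at_k(ranking, k), recall_at_k(ranking, k, 100))
```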
Sec. 8.4
A PRECISION-RECALL CURVE
What is happening here when precision dips without an increase in recall?
[Figure: a precision-recall curve, with precision on the y-axis and recall on
the x-axis, both ranging from 0.0 to 1.0.]
INTERPOLATED PRECISION
Interpolated precision at a given recall level is the highest precision found
at that or any higher recall level.
(Ranked list, with R marking the relevant documents.)
• (Interpolated) P@1 = 0.5
• (Interpolated) P@2 = 2/3
• Average Precision = ?
For convenience, we refer to Interpolated Average Precision when we say AP.
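The max-over-later-recall-levels interpolation can be sketched as a right-to-left pass; the function name and the raw precision values in the example are my own illustration:

```python
def interpolated_precisions(precisions):
    """Interpolated precision at each recall point: the maximum precision
    at that point or any later one (computed with a right-to-left pass)."""
    interp = list(precisions)
    for i in range(len(interp) - 2, -1, -1):
        interp[i] = max(interp[i], interp[i + 1])
    return interp

# Raw precisions measured at successive recall points (hypothetical values).
print(interpolated_precisions([0.5, 0.4, 0.67, 0.3]))
# [0.67, 0.67, 0.67, 0.3]
```

Interpolation smooths the sawtooth of a raw precision-recall curve into a non-increasing step function.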
WHAT IS THE AVERAGE PRECISION?
• Case 1:
R R R R R
• Average Precision = (½ + ½ + ½ + ½ + ½) / 5 = ½
• Case 2:
R R R
R R R
COMPUTE MAP
• Query1:
R R R R R    (only 5 relevant docs in corpus)
• Query2:
R R R
• Query3:
R R R    (only 3 relevant docs in corpus)
• Compute MAP.
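AP and MAP can be sketched as follows, using non-interpolated AP (the mean of P@k at the ranks where relevant docs appear, divided by the number of relevant docs in the corpus). The rank positions of the R marks on the slide are visual, so the Case 1 ranking below is a hypothetical one that matches its arithmetic (each relevant doc found at precision ½); the function names are my own.

```python
def average_precision(rels, total_relevant):
    """Sum of P@k over ranks k where a relevant doc appears,
    divided by the total number of relevant docs in the corpus."""
    hits, total = 0, 0.0
    for k, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / total_relevant

def mean_average_precision(queries):
    """MAP: the mean of AP over queries; each query is (ranking, total_relevant)."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)

# Case 1: e.g. the ranking N R N R N R N R N R, 5 relevant docs in the corpus;
# every relevant doc is retrieved at precision 1/2, so AP = 1/2.
case1 = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(average_precision(case1, 5))  # 0.5
```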
KAPPA EXAMPLE
Number of docs    Judge 1        Judge 2
70                Nonrelevant    Nonrelevant
20                Relevant       Nonrelevant
10                Nonrelevant    Relevant
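Kappa can be computed from such a two-judge table. The count of docs both judges marked Relevant is not shown above, so the 300 used below is a hypothetical value for illustration; the function name is my own, and chance agreement is estimated from marginals pooled across both judges.

```python
def cohen_kappa(both_rel, rel_nonrel, nonrel_rel, both_nonrel):
    """Kappa = (P(A) - P(E)) / (1 - P(E)), with P(E) from pooled marginals."""
    total = both_rel + rel_nonrel + nonrel_rel + both_nonrel
    p_agree = (both_rel + both_nonrel) / total
    # Pooled probability of a "Relevant" judgment across both judges
    p_rel = (2 * both_rel + rel_nonrel + nonrel_rel) / (2 * total)
    p_nonrel = 1 - p_rel
    p_chance = p_rel ** 2 + p_nonrel ** 2
    return (p_agree - p_chance) / (1 - p_chance)

# 70 both-nonrelevant, 20 relevant/nonrelevant, 10 nonrelevant/relevant from
# the table; 300 both-relevant is an assumed value for this sketch.
print(round(cohen_kappa(300, 20, 10, 70), 3))
```

Kappa of 1 means perfect agreement, 0 means agreement no better than chance; values above roughly 0.8 are usually taken as good agreement.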