
Least Risk Boundaries

(Unequal costs of error)


• The goal of the classifier is to assign a pattern to the most
likely class given its feature values, prior probabilities and
class-conditional probabilities.
• Choosing the most likely class minimizes the overall error rate.
• But some applications involve costs / penalties for
misclassification.
• If these costs are taken into account, the decision boundaries may
change.
• In such cases, Least-Risk (or Optimum) decision boundaries are
used.
• Suppose that in a 2-class problem, the reward for
correctly classifying a sample from class A is Rs.3 and
for class B is Rs.4. The penalty for misclassification is
Rs.8 for a class A sample and Rs.20 for a class B sample.
If, for a sample with feature x, P(A|x) = 0.6, calculate
the risk of each decision.

• Since P(A|x) = 0.6, P(B|x)=0.4

• L(A^|A) = -3, L(B^|B) = -4, L(B^|A) = 8, L(A^|B) = 20
(rewards for correct decisions enter as negative losses)


• The risk of choosing class A for the given sample x is given by

• R(A^|x) = L(A^|A) P(A|x) + L(A^|B) P(B|x)


• = (-3)(0.6) + (20)(0.4) = -1.8 + 8.0 = +6.2

• The risk of choosing class B for the given sample x is given by

• R(B^|x) = L(B^|B) P(B|x) + L(B^|A) P(A|x)


• = (-4)(0.4) + (8)(0.6) = -1.6 + 4.8 = +3.2

• Decision: Since the risk of choosing class B is lower (3.2 < 6.2),
class B is chosen for the given sample x, even though class A is the
more likely class (P(A|x) = 0.6).
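The risk computation above can be sketched in a few lines of Python; the loss values and posteriors are the ones from the example:

```python
# Conditional risk: R(decide i | x) = sum over j of L(decide i | true j) * P(j | x).
# Loss matrix keyed as (decision, true class); rewards are negative losses.
loss = {
    ("A", "A"): -3, ("A", "B"): 20,   # L(A^|A), L(A^|B)
    ("B", "B"): -4, ("B", "A"): 8,    # L(B^|B), L(B^|A)
}
posterior = {"A": 0.6, "B": 0.4}      # P(A|x), P(B|x)

def risk(decision):
    """Expected loss of choosing `decision` for the sample x."""
    return sum(loss[(decision, true)] * posterior[true] for true in posterior)

r_a, r_b = risk("A"), risk("B")
print(f"R(A|x) = {r_a:.1f}, R(B|x) = {r_b:.1f}")   # R(A|x) = 6.2, R(B|x) = 3.2
print("Decide:", "A" if r_a < r_b else "B")        # Decide: B
```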
• Exercise: Determine the optimal (least-risk) boundary between 2 simple
bivariate normal classes with the following parameters:
• P(A) = 1/3, P(B) = 2/3, L(B^|A) = 20, L(A^|B) = 10
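For this exercise, the least-risk rule decides A when L(B^|A) P(A) p(x|A) > L(A^|B) P(B) p(x|B). Note that 20·(1/3) = 10·(2/3), so the cost-weighted priors coincide and the boundary reduces to p(x|A) = p(x|B). The class means and covariances are not given above, so the sketch below assumes unit-covariance bivariate normals with hypothetical means (0, 0) and (2, 2):

```python
import math

P_A, P_B = 1/3, 2/3          # priors from the exercise
L_BA, L_AB = 20, 10          # L(B^|A), L(A^|B) from the exercise
mu_A, mu_B = (0.0, 0.0), (2.0, 2.0)   # hypothetical means (not given in the slide)

def gauss2d(x, mu):
    """Bivariate normal density with identity covariance."""
    d2 = (x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2
    return math.exp(-d2 / 2) / (2 * math.pi)

def decide(x):
    """Decide A iff the cost-weighted evidence for A exceeds that for B."""
    weight_A = L_BA * P_A * gauss2d(x, mu_A)   # expected cost of losing an A
    weight_B = L_AB * P_B * gauss2d(x, mu_B)   # expected cost of losing a B
    return "A" if weight_A > weight_B else "B"

# Since the cost-weighted priors are equal, the boundary is the
# perpendicular bisector of the two means (through the point (1, 1)).
print(decide((0.5, 0.5)), decide((1.5, 1.5)))   # A B
```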

How to evaluate the performance of a classifier
• Simple counting: every prediction is either fully correct or fully
wrong, and accuracy is the fraction of correct predictions,
e.g. 2/4 = 0.5 = 50%.
• Fractional counting: partially correct predictions can receive
partial credit.
• But is accuracy alone a reliable measure?
Confusion Matrices
• A confusion matrix summarizes the classification results on a test set.
• Example: TP = 30, TN = 930, FP = 30, FN = 10,
giving accuracy = (30 + 930)/1000 = 96%.
Why do we need Precision and Recall?
• In the cancer-detection example below, the accuracy is
(1 + 185)/247 ≈ 75%, yet only one cancer case is predicted correctly.

                      Predicted Cancer   Predicted Non-Cancer
Actual Cancer                 1                    1
Actual Non-Cancer            60                  185
• Precision tells us how many of the cases predicted as positive
actually are positive.
• Recall tells us how many of the actual positive cases the model
was able to predict correctly.
• Precision is a useful metric in cases where a False Positive is a
higher concern than a False Negative.

• Recall is a useful metric in cases where a False Negative is more
costly than a False Positive.

• Sometimes the choice between them also depends on the costs of each
type of error.
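The definitions above can be checked against the slide's two examples. For the cancer table, the rows are assumed to be the actual classes, so TP = 1, FN = 1, FP = 60, TN = 185:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)    # of predicted positives, how many are positive
    recall = tp / (tp + fn)       # of actual positives, how many were found
    return accuracy, precision, recall

# First example: TP = 30, TN = 930, FP = 30, FN = 10.
print(metrics(30, 30, 10, 930))   # accuracy 0.96, precision 0.5, recall 0.75

# Cancer example: decent accuracy, but precision collapses to 1/61.
print(metrics(1, 60, 1, 185))
```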


• Exercise: A classifier is tested on 16 flu and 24 non-flu
samples. Out of the 16 flu samples it predicts 13 as flu,
and out of the 24 non-flu samples it predicts 4 as flu.
Determine the accuracy, precision and recall.
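One way to check your answer, taking flu as the positive class (so TP = 13, FN = 3, FP = 4, TN = 20):

```python
# Flu taken as the positive class.
tp, fn = 13, 16 - 13     # 13 of the 16 flu samples predicted as flu
fp = 4                   # 4 of the 24 non-flu samples predicted as flu
tn = 24 - fp

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 33/40
precision = tp / (tp + fp)                   # 13/17
recall = tp / (tp + fn)                      # 13/16
print(accuracy, precision, recall)
```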
• Exercise: A diamond shop owner wants to segregate gems
into diamonds and stones in order to sell them. Out of 85
gems, 65 are diamonds and 20 are stones. The first
classifier classifies 45 of the 65 diamonds correctly
and 15 of the 20 stones correctly. The second classifier
classifies 55 of the 65 diamonds and 5 of the 20 stones
correctly. Choose the suitable classifier for this
scenario and justify your answer.
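A sketch of the comparison, with diamond taken as the positive class. Note that both classifiers get 60 of 85 gems right, so accuracy alone cannot separate them and the justification must rest on precision versus recall (and on which error is costlier for the shop):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return (tp + tn) / total, tp / (tp + fp), tp / (tp + fn)

# Diamond = positive class.
clf1 = metrics(tp=45, fp=5, fn=20, tn=15)    # 45/65 diamonds, 15/20 stones correct
clf2 = metrics(tp=55, fp=15, fn=10, tn=5)    # 55/65 diamonds,  5/20 stones correct

# Both accuracies are 60/85 = 70.6%; clf1 has the higher precision,
# clf2 the higher recall, so the tie-break depends on error costs.
print("clf1 acc/prec/rec:", clf1)
print("clf2 acc/prec/rec:", clf2)
```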
