Confusion Matrix:
A confusion matrix is a simple table that shows how well a classification
model is performing by comparing its predictions to the actual results. It
breaks down the predictions into four categories: correct predictions for both
classes (true positives and true negatives) and incorrect predictions (false
positives and false negatives). This helps you understand where the model
is making mistakes, so you can improve it.
Confusion matrices can be used with any classification algorithm, such as Naive Bayes,
logistic regression, decision trees, and so forth. Because they apply to any classifier,
they are one of the most widely used evaluation tools in data science and machine learning.
Why is it Called a "Confusion Matrix"?
The term "Confusion Matrix" comes from the idea that a classification model can sometimes
be confused between different classes when making predictions.
The matrix shows where the model makes correct and incorrect predictions by comparing
the actual vs. predicted values.
1. Structure of a Confusion Matrix
For a binary classification problem (e.g., predicting whether income is ≤50K or >50K), the
confusion matrix looks like this:
Actual \ Predicted Positive (1) Negative (0)
Positive (1) True Positive (TP) False Negative (FN)
Negative (0) False Positive (FP) True Negative (TN)
Definitions:
True Positive (TP): Correctly predicted positive cases (e.g., model predicts high
income, and person actually has high income).
True Negative (TN): Correctly predicted negative cases (e.g., model predicts low
income, and person actually has low income).
False Positive (FP) (Type I Error): Model incorrectly predicts positive (e.g., model
predicts high income, but person actually has low income).
False Negative (FN) (Type II Error): Model incorrectly predicts negative (e.g., model
predicts low income, but person actually has high income).
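In practice, these four counts are usually computed from the model's predictions rather than filled in by hand. The following is a minimal sketch, assuming scikit-learn is available; the label arrays are hypothetical placeholders, not real Adult Census Income predictions.

    # Minimal sketch: extract TP, FN, FP, TN from predictions.
    # Assumes scikit-learn is installed; y_actual and y_pred are hypothetical
    # placeholder arrays, not real Adult Census Income results.
    from sklearn.metrics import confusion_matrix

    y_actual = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth labels (1 = positive)
    y_pred   = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

    # labels=[1, 0] puts the positive class first, so the layout matches the
    # table above: rows = actual, columns = predicted.
    cm = confusion_matrix(y_actual, y_pred, labels=[1, 0])
    tp, fn = cm[0]
    fp, tn = cm[1]
    print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")   # TP=3, FN=1, FP=1, TN=3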
2. Performance Metrics Derived from the Confusion Matrix
1. Accuracy – How often the model is correct
Accuracy = (TP + TN) / (TP + TN + FP + FN)
2. Precision (Positive Predictive Value) – When the model predicts positive, how
often is it correct?
Precision = TP / (TP + FP)
3. Recall (Sensitivity, True Positive Rate) – How well does the model identify
actual positives?
Recall = TP / (TP + FN)
4. F1 Score – Harmonic mean of Precision and Recall
F1 = 2 × (Precision × Recall) / (Precision + Recall)
5. Specificity (True Negative Rate) – How well does the model identify actual
negatives?
Specificity = TN / (TN + FP)
6. False Positive Rate (FPR)
FPR = FP / (FP + TN)
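All six metrics follow directly from the four counts. Here is a minimal sketch in plain Python; the counts passed in at the end are hypothetical values chosen only for illustration.

    # Minimal sketch: compute the six metrics above from the four counts.
    def classification_metrics(tp, fn, fp, tn):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)                        # sensitivity / TPR
        return {
            "accuracy":    (tp + tn) / (tp + tn + fp + fn),
            "precision":   precision,
            "recall":      recall,
            "f1":          2 * precision * recall / (precision + recall),
            "specificity": tn / (tn + fp),             # true negative rate
            "fpr":         fp / (fp + tn),             # false positive rate
        }

    # Hypothetical counts, purely for illustration.
    print(classification_metrics(tp=90, fn=10, fp=20, tn=80))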
3. Example of a Confusion Matrix
Imagine we use a model to predict whether a person earns >50K (1) or ≤50K (0) using the
Adult Census Income Dataset.
Actual \ Predicted >50K (Predicted 1) ≤50K (Predicted 0)
>50K (Actual 1) 300 (TP) 50 (FN)
≤50K (Actual 0) 40 (FP) 600 (TN)
Accuracy: (300 + 600) / (300 + 600 + 40 + 50) = 900 / 990 = 90.9%
Precision: 300 / (300 + 40) = 88.2%
Recall: 300 / (300 + 50) = 85.7%
F1 Score: 2 × (88.2 × 85.7) / (88.2 + 85.7) = 86.9%
Specificity: 600 / (600 + 40) = 93.8%
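To double-check the arithmetic, the following small sketch recomputes the metrics from the counts in the example table (TP = 300, FN = 50, FP = 40, TN = 600).

    # Recompute the example metrics from the counts in the table above.
    tp, fn, fp, tn = 300, 50, 40, 600

    accuracy    = (tp + tn) / (tp + tn + fp + fn)                # 900 / 990  ≈ 0.909
    precision   = tp / (tp + fp)                                 # 300 / 340  ≈ 0.882
    recall      = tp / (tp + fn)                                 # 300 / 350  ≈ 0.857
    f1          = 2 * precision * recall / (precision + recall)  # ≈ 0.8696
    specificity = tn / (tn + fp)                                 # 600 / 640  ≈ 0.938

    for name, value in [("Accuracy", accuracy), ("Precision", precision),
                        ("Recall", recall), ("F1", f1), ("Specificity", specificity)]:
        print(f"{name}: {value:.1%}")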
Final Conclusion:
✅ The model is 90.9% accurate and has good precision (88.2%) and specificity (93.8%).