The following summarizes the evaluation results of a Naive Bayes classifier trained, using a supervised approach, to separate spam from non-spam emails. The dataset contains 4601 instances and 10 attributes, including various word frequencies, character frequencies, and capital-run-length averages. The model was evaluated on the training data itself and achieved an overall accuracy of 83.09%; because evaluation on the training set tends to overestimate performance, these figures should be read as optimistic. The detailed accuracy by class shows that the model performed markedly better on non-spam emails (class 0) than on spam emails (class 1). The true positive rate was 97.0% for class 0 (97.0% of actual non-spam emails were classified correctly) but only 61.7% for class 1 (only 61.7% of actual spam emails were classified correctly). The false positive rate was 38.3% for class 0 and 3.0% for class 1. Precision was 79.6% for class 0 (79.6% of emails predicted as non-spam really were non-spam) and considerably higher, 93.0%, for class 1 (93.0% of emails predicted as spam really were spam).
The confusion matrix shows that out of the 2788 non-spam emails, the model correctly classified 2704
(97.0%) as non-spam and misclassified 84 (3.0%) as spam. Out of the 1813 spam emails, the model
correctly classified 1119 (61.7%) as spam and misclassified 694 (38.3%) as non-spam.
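The per-class rates quoted above follow directly from this confusion matrix. A minimal plain-Python sketch of the arithmetic (no WEKA required):

```python
# Recompute the reported per-class rates from the confusion matrix
# (rows = actual class, columns = predicted class).
cm = [[2704, 84],    # actual class 0 (non-spam): predicted 0, predicted 1
      [694, 1119]]   # actual class 1 (spam):     predicted 0, predicted 1

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total        # 83.09% overall

# True positive rate (recall) per class: correct / actual count.
tp_rate_0 = cm[0][0] / (cm[0][0] + cm[0][1])    # 0.970
tp_rate_1 = cm[1][1] / (cm[1][0] + cm[1][1])    # 0.617

# False positive rate per class: instances of the *other* class
# predicted as this class, over that other class's total.
fp_rate_0 = cm[1][0] / (cm[1][0] + cm[1][1])    # 0.383
fp_rate_1 = cm[0][1] / (cm[0][0] + cm[0][1])    # 0.030

# Precision per class: correct / predicted count (column sums).
precision_0 = cm[0][0] / (cm[0][0] + cm[1][0])  # 0.796
precision_1 = cm[1][1] / (cm[0][1] + cm[1][1])  # 0.930
```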
The kappa statistic was 0.6238, indicating substantial agreement between the actual and predicted classifications. The mean absolute error was 0.1696 and the root mean squared error was 0.3984, so the model's probability estimates were reasonably close to the actual values. However, the relative absolute error was high at 35.5209%, and the root relative squared error was also high at 81.5284%: the model's errors were still a sizeable fraction of those of a trivial baseline that always predicts the class distribution.
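The kappa statistic can likewise be recomputed from the confusion matrix as (observed agreement - chance agreement) / (1 - chance agreement); a plain-Python sketch that reproduces the reported 0.6238:

```python
# Cohen's kappa from the confusion matrix above.
cm = [[2704, 84], [694, 1119]]
total = sum(sum(row) for row in cm)

observed = (cm[0][0] + cm[1][1]) / total          # observed agreement (accuracy)
row = [sum(r) for r in cm]                        # actual class counts
col = [cm[0][0] + cm[1][0], cm[0][1] + cm[1][1]]  # predicted class counts
expected = sum(r * c for r, c in zip(row, col)) / total ** 2  # chance agreement

kappa = (observed - expected) / (1 - expected)    # approximately 0.6238
```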
In summary, the Naive Bayes classifier achieved an overall accuracy of 83.09% in classifying spam and
non-spam emails, with better performance in classifying non-spam emails than spam emails. The
model's precision in classifying spam emails was higher than in classifying non-spam emails, but the false
positive rate for non-spam emails was much higher than for spam emails. The kappa statistic indicated
substantial agreement between the actual and predicted classifications, but the relative absolute error
and root relative squared error were high, indicating a significant level of error in the model's
predictions.
=== Summary of model analysis ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
Weighted Avg.  0.831    0.244    0.849      0.831   0.822      0.653  0.942     0.938

    a    b   <-- classified as
 2704   84 |  a = 0 (non-spam)
  694 1119 |  b = 1 (spam)
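The "Weighted Avg." row can be reproduced from the confusion matrix by weighting each per-class statistic by that class's actual count; a plain-Python sketch:

```python
# Derive the weighted-average row of the summary table from the
# confusion matrix (rows = actual class, columns = predicted class).
cm = [[2704, 84], [694, 1119]]
n0, n1 = sum(cm[0]), sum(cm[1])   # actual counts of class 0 and class 1
total = n0 + n1

tp = [cm[0][0] / n0, cm[1][1] / n1]              # recall per class
fp = [cm[1][0] / n1, cm[0][1] / n0]              # FP rate per class
prec = [cm[0][0] / (cm[0][0] + cm[1][0]),        # precision per class
        cm[1][1] / (cm[0][1] + cm[1][1])]
f1 = [2 * p * r / (p + r) for p, r in zip(prec, tp)]

def wavg(xs):
    """Average weighted by each class's actual count."""
    return (n0 * xs[0] + n1 * xs[1]) / total

# wavg(tp) ~ 0.831, wavg(fp) ~ 0.244, wavg(prec) ~ 0.849, wavg(f1) ~ 0.822
```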
In the filter options, select a feature selection algorithm, such as CfsSubsetEval, and
configure the options as needed
Apply the filter to your dataset and save the new feature-selected dataset for further
processing
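CfsSubsetEval scores a candidate feature subset with Hall's merit heuristic: subsets whose features correlate strongly with the class but weakly with each other score highest. The sketch below implements that heuristic in plain Python (it is an illustration of the formula, not WEKA's implementation, and the toy feature vectors are invented):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy) if sx and sy else 0.0

def cfs_merit(features, target):
    """CFS merit: k * r_cf / sqrt(k + k*(k-1) * r_ff), where r_cf is the
    mean feature-class correlation and r_ff the mean feature-feature one."""
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = (sum(abs(pearson(features[i], features[j])) for i, j in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

# Invented toy data: a feature that mirrors the class versus pure noise.
y       = [0, 0, 1, 1, 0, 1, 0, 1]
f_good  = [0, 0, 1, 1, 0, 1, 0, 1]   # perfectly tracks the class
f_noise = [1, 0, 1, 0, 0, 1, 1, 0]   # uncorrelated with the class
```

As expected, a subset containing the informative feature receives a higher merit than one containing only noise.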
Information gain and chi-squared feature selection: We can use information gain and chi-squared
feature selection filters to identify the most important features in the dataset. These filters calculate the
relevance of each feature for the classification task and select the most informative features. WEKA
provides information gain and chi-squared feature selection filters that we can use.
Choose the "Ranker" search method from the "Search" dropdown menu
The ranked list of attributes will be shown in the "Selected attributes" pane
Select at least four of the top-ranked attributes and click "Apply" to select them
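Information gain is simply the reduction in class entropy after splitting on an attribute: IG(class, attr) = H(class) - H(class | attr). The sketch below ranks attributes the way the Ranker method does, in plain Python on invented toy data (the attribute names are hypothetical, not taken from the dataset):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """H(class) - H(class | feature) for a discrete feature."""
    n = len(labels)
    cond = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

# Invented toy data: one attribute mirrors the class, one is uninformative.
y = [0, 0, 1, 1, 0, 1, 0, 1]
attrs = {"word_freq_free":  [0, 0, 1, 1, 0, 1, 0, 1],   # mirrors the class
         "word_freq_hello": [1, 0, 1, 0, 0, 1, 1, 0]}   # uninformative
ranked = sorted(attrs, key=lambda a: info_gain(attrs[a], y), reverse=True)
```

The informative attribute yields one full bit of gain and ranks first; the uninformative one yields zero.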
Algorithm selection:
Click on the "Classify" tab, then click the "Choose" button to select the classifier you
want to use
Configure the options of the classifier as needed and click "OK" to apply the classifier to
your dataset
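For intuition about what the chosen classifier computes, the following is a self-contained Bernoulli Naive Bayes sketch in plain Python. It is not WEKA's implementation (which also handles numeric attributes via per-class Gaussians), and the binary word-presence features and toy emails are invented for illustration:

```python
from math import log

def train_nb(X, y, alpha=1.0):
    """Bernoulli Naive Bayes with Laplace smoothing (alpha)."""
    prior, cond = {}, {}
    n_features = len(X[0])
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = len(rows) / len(X)
        # Smoothed estimate of P(feature_j = 1 | class c).
        cond[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                   for j in range(n_features)]
    return prior, cond

def predict_nb(model, x):
    """Return the class with the highest log-posterior for instance x."""
    prior, cond = model
    def score(c):
        s = log(prior[c])
        for xj, pj in zip(x, cond[c]):
            s += log(pj) if xj else log(1 - pj)
        return s
    return max(prior, key=score)

# Invented toy data: presence of the words ["free", "winner", "meeting"].
X = [[1, 1, 0], [1, 0, 0], [1, 1, 0],   # spam examples (class 1)
     [0, 0, 1], [0, 0, 0], [0, 0, 1]]   # non-spam examples (class 0)
y = [1, 1, 1, 0, 0, 0]
model = train_nb(X, y)
```

An email mentioning "free" and "winner" is scored as spam, while one mentioning only "meeting" is scored as non-spam.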