
Class 3: Classification

Our Vision: To be a World Class University


Classification task process

Data collection → Data pre-processing → Classification algorithm → Decision



Classification technique
• Classification method/technique
• predicts categorical labels (discrete or nominal)
• needs to be carefully trained on the training set only.

• Goal: predicting categorical (nominal or discrete) class labels.

• A classification algorithm builds a predictive model from the training data; this model is later used to classify new, unseen data.

• Note: the model produced by a classification algorithm is called a classifier (or predictive) model.

• Classification techniques are commonly used for detection and recognition tasks.
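To make the train-then-predict idea concrete, here is a minimal Python sketch. It is not a WEKA algorithm: it assumes a simple nearest-centroid rule (each class is summarized by the mean of its training vectors), and the class name `NearestCentroid` with its `fit`/`predict` methods is hypothetical.

```python
import math
from collections import defaultdict


class NearestCentroid:
    """Hypothetical minimal classifier: one mean vector (centroid) per class label."""

    def fit(self, X, y):
        # Sum the feature vectors of each class, then divide by the class count.
        sums = defaultdict(lambda: [0.0] * len(X[0]))
        counts = defaultdict(int)
        for features, label in zip(X, y):
            counts[label] += 1
            sums[label] = [s + f for s, f in zip(sums[label], features)]
        self.centroids = {
            label: [s / counts[label] for s in sums[label]] for label in sums
        }
        return self  # the trained "classifier model"

    def predict(self, X):
        # Give each unseen vector the label of the closest centroid.
        return [
            min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))
            for x in X
        ]


# Train on labelled data only, then classify data the model has never seen.
train_X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
train_y = ["tested_negative", "tested_negative", "tested_positive", "tested_positive"]
model = NearestCentroid().fit(train_X, train_y)
print(model.predict([[0.9, 1.1], [4.9, 5.1]]))  # ['tested_negative', 'tested_positive']
```

The separation between `fit` (uses labels) and `predict` (uses only features) mirrors the rule that the classifier is trained on the training set only.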



Classification technique using WEKA
To perform classification via WEKA, the following steps can be applied after data preprocessing:
1. Choose the desired classifier using the “Choose” button.
o The choice of classification algorithm (classifier) depends on the data. Algorithms shown in
gray are disabled because they cannot handle the current dataset; to apply one of them, you need
to modify the dataset so that the classifier can use it. Every WEKA classifier has "Capabilities"
that describe which data characteristics it can handle, which tells you how to modify the data to
make it suitable for that algorithm.
o Some algorithms have parameters (e.g., the pruning parameter of the “J48” decision tree
classifier or the learning rate of the “Multilayer Perceptron” classifier). You can access these
parameters by left-clicking on the algorithm name. Tweaking the parameters can improve or degrade
the performance of the algorithm.
o ZeroR (which, for a nominal class, ignores all attributes and always predicts the most frequent
class) can be used as a baseline classifier, but do not select it as one of your final classifiers.
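The baseline idea can be sketched in a few lines of Python. This only mimics ZeroR's behaviour for a nominal class (always predict the most frequent training label); the function names `zero_r` and `accuracy` are hypothetical, not WEKA's API. Any useful classifier should beat the accuracy this baseline reaches.

```python
from collections import Counter


def zero_r(train_labels):
    """ZeroR-style baseline: ignore every attribute, predict the majority class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda instance: majority  # same prediction for every instance


def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


train_labels = ["tested_negative"] * 5 + ["tested_positive"] * 3
predict = zero_r(train_labels)

test_labels = ["tested_negative", "tested_positive", "tested_negative", "tested_negative"]
baseline = accuracy([predict(None) for _ in test_labels], test_labels)
print(baseline)  # 0.75
```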



2. Select the test option to evaluate your classifier.
o 10-fold cross-validation (or “Percentage split”) is recommended in general cases. If you
have an independent test set (i.e., a file whose labels you need to predict), use
“Supplied test set”.
o The “Use training set” option is not recommended, because evaluating a classifier on the same
data it was trained on gives an overly optimistic estimate of its performance.
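A “Percentage split” shuffles the data and keeps one part for training and the rest for testing; WEKA's default is 66% for training. The Python sketch below illustrates the idea only (it will not reproduce WEKA's exact shuffle), and `percentage_split` with its `seed` parameter is a hypothetical helper.

```python
import random


def percentage_split(data, train_percent=66, seed=1):
    """Shuffle a copy of the data, then cut it into a training part and a test part."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> repeatable split
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 labelled instances
train, test = percentage_split(rows)
print(len(train), len(test))  # 66 34
```

Because the two parts never overlap, the test instances are unseen during training, which is exactly what “Use training set” fails to guarantee.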

How does 10-fold cross-validation work?

The dataset is divided randomly into 10 parts (folds). Nine parts are used for training and the
remaining tenth is reserved for testing. This procedure is repeated 10 times, each time reserving
a different tenth for testing.
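The procedure above can be sketched in Python as follows. This is a plain (unstratified) version; WEKA's own cross-validation additionally stratifies the folds when the class is nominal, which this sketch omits. The helpers `k_fold_indices`, `cross_validate`, and `majority_learner` are hypothetical names.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=1):
    """Randomly split the indices 0..n-1 into k folds of (nearly) equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, train_and_predict, k=10, seed=1):
    """Hold out each fold once for testing and train on the other k-1 folds."""
    folds = k_fold_indices(len(X), k, seed)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predictions = train_and_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct += sum(p == y[i] for p, i in zip(predictions, test_idx))
    # Every instance is tested exactly once, so divide by the dataset size.
    return correct / len(X)


def majority_learner(train_X, train_y, test_X):
    """Trivial learner used only to exercise the loop."""
    majority = Counter(train_y).most_common(1)[0][0]
    return [majority] * len(test_X)


X = [[i] for i in range(20)]
y = ["neg"] * 20
print(cross_validate(X, y, majority_learner))  # 1.0
```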



3. To see the predictions in the “Classifier output” panel, click the “More options...” button,
then choose “PlainText” for the “Output predictions” option.

4. Double-check that the correct class attribute is selected in the drop-down menu, then hit the
“Start” button to apply the classifier to the data and perform the prediction task.

• If you do not have an independent test set to label, this is usually the last step: you can
obtain the results and start the analysis.

• If you have independent test data (besides the training data), continue with the following
steps.



5. If you are happy with the results of the algorithm, save the model so you can apply it to the
test set data: right-click the entry in the “Result list” and choose “Save model” (see the figure below).



6. Load your test set data using the “Supplied test set” option (click “Set...” to choose the file).



7. Load the classifier model that you saved previously (right-click in the “Result list” and choose
“Load model”).



8. To apply the loaded model to the test set, right-click it in the “Result list” and choose
“Re-evaluate model on current test set” (see the figure below).



9. To see the final predicted labels for the test set, look at the “Classifier output” panel (see the figure below).
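The save, load, and apply workflow of steps 5 to 9 can be mirrored outside the GUI. WEKA itself stores models as serialized Java objects; the Python sketch below uses the standard `pickle` module as a loose analogue. The “model” here is just a majority-class dictionary, and the names `train`, `apply_model`, and the file name `classifier.model` are hypothetical.

```python
import os
import pickle
import tempfile
from collections import Counter


def train(labels):
    """Hypothetical 'training': the model simply remembers the majority class."""
    return {"kind": "majority", "label": Counter(labels).most_common(1)[0][0]}


def apply_model(model, instances):
    """Predict one label per test instance using the stored model."""
    return [model["label"] for _ in instances]


# Step 5: train on the training data and save the model to disk.
model = train(["tested_positive", "tested_negative", "tested_positive"])
path = os.path.join(tempfile.mkdtemp(), "classifier.model")  # hypothetical file name
with open(path, "wb") as f:
    pickle.dump(model, f)

# Steps 6-8: load the saved model and apply it to a separate test set.
with open(path, "rb") as f:
    loaded = pickle.load(f)
test_instances = [[6, 148, 72], [1, 85, 66]]  # illustrative unlabeled rows

# Step 9: inspect the predicted labels.
print(apply_model(loaded, test_instances))  # ['tested_positive', 'tested_positive']
```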



Example data
• The example data contain the following attributes:

1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class: (1) tested_negative (2) tested_positive
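As a small worked example of feature 6, and of how one instance pairs eight numeric features with a class label, here is a Python sketch. The short attribute names (`preg`, `plas`, ...) are an assumption borrowed from the copy of this dataset that ships with WEKA as diabetes.arff; the helpers `body_mass_index` and `to_record` and the feature values are illustrative only.

```python
# Assumed short names, in the same order as features 1-8 above.
ATTRIBUTES = ["preg", "plas", "pres", "skin", "insu", "mass", "pedi", "age"]
CLASSES = ("tested_negative", "tested_positive")


def body_mass_index(weight_kg, height_m):
    """Feature 6: weight in kg divided by the square of height in m."""
    return weight_kg / height_m ** 2


def to_record(values, label):
    """Name the eight numeric features and attach the class label."""
    if len(values) != len(ATTRIBUTES):
        raise ValueError("expected 8 feature values")
    if label not in CLASSES:
        raise ValueError("unknown class label")
    record = dict(zip(ATTRIBUTES, values))
    record["class"] = label
    return record


bmi = body_mass_index(70, 1.75)  # 70 / 1.75^2
print(round(bmi, 1))  # 22.9

rec = to_record([2, 120, 70, 25, 80, bmi, 0.5, 35], "tested_negative")
print(len(rec))  # 9
```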



Thank you!
