
Faculty of Science and Humanitarian Studies - Al-Aflaj
Department of Computer Science

CS 4551: Machine Learning

Classification
Classification can be binary or multiclass:
◼ Binary classification
◼ Multiclass classification
Example: Image classification
[Figure: input images (apple, pear, tomato, cow, dog, horse) mapped to their desired output labels]
Object Recognition for Automotive Vehicles

Example: Training and testing
Training set (labels known) vs. test set (labels unknown)
◼ Key challenge: generalization to unseen examples
◼ New data is classified based on the training set
The basic classification framework

y = f(x)

where x is the input, f is the classification function, and y is the output.
◼ Learning: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the parameters of the prediction function f
◼ Inference (testing): apply f to a never-before-seen test example x and output the predicted value y = f(x)
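As a sketch of this framework, the toy Python classifier below (the class name and data are invented for illustration, not part of the course material) separates the learning step, which estimates the parameters of f, from the inference step, which applies f to a new example x:

```python
# Minimal sketch of the y = f(x) framework. The classifier and the toy
# data are illustrative assumptions, not part of the slides.
class NearestMeanClassifier:
    """Learning estimates one mean vector per class; inference predicts
    the class whose mean is closest to the input x."""

    def fit(self, X, y):
        # Learning: estimate the parameters of f from labeled examples.
        self.means_ = {}
        for label in set(y):
            pts = [x for x, lab in zip(X, y) if lab == label]
            self.means_[label] = [sum(c) / len(pts) for c in zip(*pts)]
        return self

    def predict(self, x):
        # Inference: apply f to a never-before-seen example x.
        def dist2(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return min(self.means_, key=lambda lab: dist2(x, self.means_[lab]))

clf = NearestMeanClassifier().fit([[0, 0], [1, 1], [9, 9], [10, 10]],
                                  ["A", "A", "B", "B"])
print(clf.predict([0.5, 0.2]))  # -> A
```

Any classifier in this course fits the same fit/predict shape; only the internal parameters of f change.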
Classification: A Two-Step Process
1. Model construction: build the model from a training set
2. Evaluation: measure accuracy on a test set independent of the training set (otherwise the estimate is inflated by overfitting)

➢ If the accuracy is acceptable, use the model to classify new data
Classification approaches
◼ K-nearest neighbor (K-NN) classifier
◼ Decision tree
◼ Naïve Bayes classifier

◼ More advanced approaches:
 Support vector machine
 Deep learning
K-nearest neighbor (K-NN) classifier
✓ K-nearest neighbor is one of the simplest machine learning algorithms, based on the supervised learning technique.

✓ The K-NN algorithm assumes similarity between the new case and the available cases, and puts the new case into the category most similar to the available categories.

✓ The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.

✓ It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at classification time, performs the computation on the stored data.
K-nearest neighbor (K-NN) classifier
[Figure: scatter plot of two categories; the new data point (blue) must be placed in the right category]
Steps of the K-NN
1) Determine a parameter K (K is the number of nearest neighbors)

2) Calculate the distance between the query instance and all the training samples

3) Sort the distances and determine the nearest neighbors based on the K-th minimum distance

4) Use a simple majority of the categories of the nearest neighbors as the prediction for the query instance
For example: first, choose the number of neighbors, say K = 5; second, calculate the distance (e.g., the Euclidean distance) from the query point to each training point.
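The four steps above can be sketched directly in Python (a minimal illustration; the function name and toy data are assumptions, not from the slides):

```python
import math
from collections import Counter

def knn_predict(query, X_train, y_train, k=3):
    """K-NN prediction following the four steps above."""
    # Step 2: Euclidean distance from the query to every training sample.
    dists = [(math.dist(query, x), y) for x, y in zip(X_train, y_train)]
    # Step 3: sort the distances and keep the k nearest neighbors.
    nearest = sorted(dists)[:k]
    # Step 4: simple majority vote over the neighbors' categories.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy usage: two well-separated groups.
X_train = [(1, 1), (2, 1), (8, 8), (9, 8)]
y_train = ["red", "red", "blue", "blue"]
print(knn_predict((2, 2), X_train, y_train, k=3))  # -> red
```

Step 1 (choosing K) is the caller's responsibility, which matches the discussion of selecting K below.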
How to select the value of K in the K-NN algorithm?

✓ There is no particular way to determine the best value of K, so we need to try several values and pick the one that performs best. A commonly used default is K = 5.

✓ A very low value of K, such as K = 1, can be noisy and makes the model sensitive to outliers.

✓ Large values of K smooth out noise, but they can blur the boundaries between categories.
Example 1
Suppose we have a dataset of smokers, with two features (cigarettes and weight), and we want to know whether patient E is vulnerable to a heart attack.

Name  Cigarettes  Weight  Heart Attack
A     7           70      Yes
B     7           40      Yes
C     3           40      No
D     1           40      No
E     3           70      ?

1. Determine a parameter K; suppose K = 3.

2. Calculate the distance between the new case (E) and each previous patient case in the dataset.

Name  Cigarettes  Weight  Heart Attack  Distance
A     7           70      Yes
B     7           40      Yes
C     3           40      No
D     1           40      No
E     3           70      ?

3. Sort the distances.

Question: What do you think the classified result is, Yes or No?
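One way to check your answer is to carry out the steps in code (a sketch; the patient data is copied from the table above, and the variable names are illustrative):

```python
import math
from collections import Counter

# Example 1 data: (cigarettes, weight) -> heart-attack label.
train = {"A": ((7, 70), "Yes"), "B": ((7, 40), "Yes"),
         "C": ((3, 40), "No"),  "D": ((1, 40), "No")}
query = (3, 70)  # patient E

# Steps 2-3: compute and sort the Euclidean distances.
dists = sorted((math.dist(query, x), name, label)
               for name, (x, label) in train.items())
for d, name, label in dists:
    print(f"{name}: distance {d:.2f}, heart attack = {label}")

# Step 4: majority vote among the k = 3 nearest neighbors.
prediction = Counter(label for _, _, label in dists[:3]).most_common(1)[0][0]
print("Predicted:", prediction)
```

The three nearest neighbors to E are A (distance 4.00, Yes), C (30.00, No), and D (30.07, No), so the majority vote is No.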


Example 2
◼ We aim to determine the species of a new flower. We measure 4 characteristics of previously known flowers, and we obtain the following table.
◼ Training set:

     Sepal length  Sepal width  Petal length  Petal width  Species
S1   5.1           3.5          1.4           0.2          iris setosa
S2   4.7           3.2          1.3           0.2          iris setosa
S3   7.0           3.2          4.7           1.4          iris versicolor
S4   6.3           3.3          6.0           2.5          iris versicolor

◼ Test set:

     Sepal length  Sepal width  Petal length  Petal width  Species
S    4.9           3.0          1.4           0.2          ?

Question: Determine its species using the K-NN algorithm with k = 3.
Solution
Step 1: Compute the Euclidean distances

Distance(S, S1) = sqrt((4.9-5.1)^2 + (3.0-3.5)^2 + (1.4-1.4)^2 + (0.2-0.2)^2) = 0.5385
Distance(S, S2) = sqrt((4.9-4.7)^2 + (3.0-3.2)^2 + (1.4-1.3)^2 + (0.2-0.2)^2) = 0.3
Distance(S, S3) = sqrt((4.9-7.0)^2 + (3.0-3.2)^2 + (1.4-4.7)^2 + (0.2-1.4)^2) = 4.09
Distance(S, S4) = sqrt((4.9-6.3)^2 + (3.0-3.3)^2 + (1.4-6.0)^2 + (0.2-2.5)^2) = 5.33

Step 2: Sort

S2  0.3000  iris setosa
S1  0.5385  iris setosa
S3  4.09    iris versicolor
S4  5.33    iris versicolor

Step 3: Decision

Among the k = 3 nearest neighbors (S2, S1, S3), iris setosa is the most frequent species, so S belongs to the class iris setosa.
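The solution can be double-checked with a short script (a sketch; the data is taken from the Example 2 tables, and the variable names are illustrative):

```python
import math
from collections import Counter

# Training set from Example 2: 4 measurements -> species label.
train = [([5.1, 3.5, 1.4, 0.2], "iris setosa"),
         ([4.7, 3.2, 1.3, 0.2], "iris setosa"),
         ([7.0, 3.2, 4.7, 1.4], "iris versicolor"),
         ([6.3, 3.3, 6.0, 2.5], "iris versicolor")]
query = [4.9, 3.0, 1.4, 0.2]  # the test flower S

# Steps 1-2: distances from S to S1..S4, sorted ascending.
dists = sorted((math.dist(query, x), label) for x, label in train)
print([round(d, 4) for d, _ in dists])  # [0.3, 0.5385, 4.0963, 5.3385]

# Step 3: majority vote among the k = 3 nearest neighbors.
pred = Counter(label for _, label in dists[:3]).most_common(1)[0][0]
print(pred)  # -> iris setosa
```

The sorted distances match the hand calculation above, and the k = 3 vote (setosa, setosa, versicolor) gives iris setosa.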
Question
What does "multiclass classifier" mean?

a) A classifier that can predict any field that contains two separate values, such as a "default" or "non-default" prediction.

b) A classifier that can predict multiple fields with multiple separate values.

c) A classifier that can predict one field with several separate values, such as Treatment A, Treatment B, or Treatment C.