
Faculty of Science and Humanitarian Studies - Al-Aflaj
Department of Computer Science

CS 4551: Machine Learning

Classification
Classification can be binary or multiclass:
◼ Binary classification
◼ Multiclass classification
Example: Image classification
[Figure: input images (apple, pear, tomato, cow, dog, horse) mapped to their desired output labels]
Object Recognition for Automotive Vehicles

Example: Training and testing
Training set (labels known) vs. test set (labels unknown)
◼ Key challenge: generalization to unseen examples
◼ New data is classified based on the training set
The basic classification framework

y = f(x)

where x is the input, f is the classification function, and y is the output.
◼ Learning: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the parameters of the prediction function f
◼ Inference (testing): apply f to a never-before-seen test example x and output the predicted value y = f(x)
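As a sketch of this framework, the toy Python classifier below (the class name and data are invented for illustration, not part of the course material) separates the learning step, which estimates the parameters of f, from the inference step, which applies f to a new example x:

```python
# Minimal sketch of the y = f(x) framework. The classifier and the toy
# data are illustrative assumptions, not part of the slides.
class NearestMeanClassifier:
    """Learning estimates one mean vector per class; inference predicts
    the class whose mean is closest to the input x."""

    def fit(self, X, y):
        # Learning: estimate the parameters of f from labeled examples.
        self.means_ = {}
        for label in set(y):
            pts = [x for x, lab in zip(X, y) if lab == label]
            self.means_[label] = [sum(c) / len(pts) for c in zip(*pts)]
        return self

    def predict(self, x):
        # Inference: apply f to a never-before-seen example x.
        def dist2(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return min(self.means_, key=lambda lab: dist2(x, self.means_[lab]))

clf = NearestMeanClassifier().fit([[0, 0], [1, 1], [9, 9], [10, 10]],
                                  ["A", "A", "B", "B"])
print(clf.predict([0.5, 0.2]))  # -> A
```

Any classifier in this course fits the same fit/predict shape; only the internal parameters of f change.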
Classification: A Two-Step Process
1. Model construction: build the model from a training set
2. Evaluation: measure accuracy on a test set independent of the training set (otherwise the estimate is inflated by overfitting)

➢ If the accuracy is acceptable, use the model to classify new data
Classification approaches
◼ K-nearest neighbor (K-NN) classifier
◼ Decision tree
◼ Naïve Bayes classifier

◼ More advanced approaches:
 Support vector machine
 Deep learning
K-nearest neighbor (K-NN) classifier
✓ K-nearest neighbor is one of the simplest machine learning algorithms, based on the supervised learning technique.

✓ The K-NN algorithm assumes similarity between the new case and the available cases, and puts the new case into the category most similar to the available categories.

✓ The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.

✓ It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at classification time, performs the computation on the stored data.
K-nearest neighbor (K-NN) classifier
[Figure: scatter plot of two categories; the new data point (blue) must be placed in the right category]
Steps of the K-NN
1) Determine a parameter K (K is the number of nearest neighbors)

2) Calculate the distance between the query instance and all the training samples

3) Sort the distances and determine the nearest neighbors based on the K-th minimum distance

4) Use a simple majority of the categories of the nearest neighbors as the prediction for the query instance
For example: first, choose the number of neighbors, say K = 5; second, calculate the distance (e.g., the Euclidean distance) from the query point to each training point.
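The four steps above can be sketched directly in Python (a minimal illustration; the function name and toy data are assumptions, not from the slides):

```python
import math
from collections import Counter

def knn_predict(query, X_train, y_train, k=3):
    """K-NN prediction following the four steps above."""
    # Step 2: Euclidean distance from the query to every training sample.
    dists = [(math.dist(query, x), y) for x, y in zip(X_train, y_train)]
    # Step 3: sort the distances and keep the k nearest neighbors.
    nearest = sorted(dists)[:k]
    # Step 4: simple majority vote over the neighbors' categories.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy usage: two well-separated groups.
X_train = [(1, 1), (2, 1), (8, 8), (9, 8)]
y_train = ["red", "red", "blue", "blue"]
print(knn_predict((2, 2), X_train, y_train, k=3))  # -> red
```

Step 1 (choosing K) is the caller's responsibility, which matches the discussion of selecting K below.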
How to select the value of K in the K-NN algorithm?

✓ There is no particular way to determine the best value of K, so we need to try several values and pick the one that performs best. A commonly used default is K = 5.

✓ A very low value of K, such as K = 1, can be noisy and makes the model sensitive to outliers.

✓ Large values of K smooth out noise, but they can blur the boundaries between categories.
Example 1
Suppose we have a dataset of smokers, with two features (cigarettes and weight), and we want to know whether patient E is vulnerable to a heart attack.

Name  Cigarettes  Weight  Heart Attack
A     7           70      Yes
B     7           40      Yes
C     3           40      No
D     1           40      No
E     3           70      ?

1. Determine a parameter K; suppose K = 3.

2. Calculate the distance between the new case (E) and each previous patient case in the dataset.

Name  Cigarettes  Weight  Heart Attack  Distance
A     7           70      Yes
B     7           40      Yes
C     3           40      No
D     1           40      No
E     3           70      ?

3. Sort the distances.

Question: What do you think the classified result is, Yes or No?
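One way to check your answer is to carry out the steps in code (a sketch; the patient data is copied from the table above, and the variable names are illustrative):

```python
import math
from collections import Counter

# Example 1 data: (cigarettes, weight) -> heart-attack label.
train = {"A": ((7, 70), "Yes"), "B": ((7, 40), "Yes"),
         "C": ((3, 40), "No"),  "D": ((1, 40), "No")}
query = (3, 70)  # patient E

# Steps 2-3: compute and sort the Euclidean distances.
dists = sorted((math.dist(query, x), name, label)
               for name, (x, label) in train.items())
for d, name, label in dists:
    print(f"{name}: distance {d:.2f}, heart attack = {label}")

# Step 4: majority vote among the k = 3 nearest neighbors.
prediction = Counter(label for _, _, label in dists[:3]).most_common(1)[0][0]
print("Predicted:", prediction)
```

The three nearest neighbors to E are A (distance 4.00, Yes), C (30.00, No), and D (30.07, No), so the majority vote is No.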


Example 2
◼ We aim to determine the species of a new flower. We measure 4 characteristics of previously known flowers, and we obtain the following table.
◼ Training set:

     Sepal length  Sepal width  Petal length  Petal width  Species
S1   5.1           3.5          1.4           0.2          iris setosa
S2   4.7           3.2          1.3           0.2          iris setosa
S3   7.0           3.2          4.7           1.4          iris versicolor
S4   6.3           3.3          6.0           2.5          iris versicolor

◼ Test set:

     Sepal length  Sepal width  Petal length  Petal width  Species
S    4.9           3.0          1.4           0.2          ?

Question: Determine its species using the K-NN algorithm with k = 3.
Solution
Step 1: Compute the Euclidean distances

Distance(S, S1) = sqrt((4.9-5.1)^2 + (3.0-3.5)^2 + (1.4-1.4)^2 + (0.2-0.2)^2) = 0.5385
Distance(S, S2) = sqrt((4.9-4.7)^2 + (3.0-3.2)^2 + (1.4-1.3)^2 + (0.2-0.2)^2) = 0.3
Distance(S, S3) = sqrt((4.9-7.0)^2 + (3.0-3.2)^2 + (1.4-4.7)^2 + (0.2-1.4)^2) = 4.09
Distance(S, S4) = sqrt((4.9-6.3)^2 + (3.0-3.3)^2 + (1.4-6.0)^2 + (0.2-2.5)^2) = 5.33

Step 2: Sort

S2  0.3000  iris setosa
S1  0.5385  iris setosa
S3  4.09    iris versicolor
S4  5.33    iris versicolor

Step 3: Decision

Among the k = 3 nearest neighbors (S2, S1, S3), iris setosa is the most frequent species, so S belongs to the class iris setosa.
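The solution can be double-checked with a short script (a sketch; the data is taken from the Example 2 tables, and the variable names are illustrative):

```python
import math
from collections import Counter

# Training set from Example 2: 4 measurements -> species label.
train = [([5.1, 3.5, 1.4, 0.2], "iris setosa"),
         ([4.7, 3.2, 1.3, 0.2], "iris setosa"),
         ([7.0, 3.2, 4.7, 1.4], "iris versicolor"),
         ([6.3, 3.3, 6.0, 2.5], "iris versicolor")]
query = [4.9, 3.0, 1.4, 0.2]  # the test flower S

# Steps 1-2: distances from S to S1..S4, sorted ascending.
dists = sorted((math.dist(query, x), label) for x, label in train)
print([round(d, 4) for d, _ in dists])  # [0.3, 0.5385, 4.0963, 5.3385]

# Step 3: majority vote among the k = 3 nearest neighbors.
pred = Counter(label for _, label in dists[:3]).most_common(1)[0][0]
print(pred)  # -> iris setosa
```

The sorted distances match the hand calculation above, and the k = 3 vote (setosa, setosa, versicolor) gives iris setosa.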
Question
What does "multiclass classifier" mean?

a) A classifier that can predict any field that contains two separate values, such as a "default" or "non-default" prediction.

b) A classifier that can predict multiple fields with multiple separate values.

c) A classifier that can predict one field with several separate values, such as Treatment A, Treatment B, or Treatment C.