
Machine Learning

Lecture # 2
Data Normalization, KNN & Minimum Distance

Courtesy: Dr Usman Akram (NUST)
Generalization
– While classes can be specified by training samples with known labels, the goal of a recognition system is to recognize novel inputs.
– When a recognition system is over-fitted to the training samples, it may perform poorly on typical (unseen) inputs.

Overfitting

Performance Measurements
ROC Analysis

False positives – i.e. falsely predicting an event


False negatives – i.e. missing an incoming event

Similarly, we have “true positives” and “true negatives”

                 Prediction
                 0      1
Truth     0      TN     FP
          1      FN     TP
Accuracy Measures
• Accuracy
– = (TP+TN)/(P+N)
• Sensitivity or true positive rate (TPR)
– = TP/(TP+FN) = TP/P
• Specificity or true negative rate (TNR)
– = TN/(FP+TN) = TN/N
• Positive predictive value (Precision, PPV)
– = TP/(TP+FP)
• Recall (the same as sensitivity/TPR)
– = TP/(TP+FN)
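As a quick illustration (not part of the original slides), a minimal Python sketch that computes these measures from confusion-matrix counts:

def classification_metrics(tp, fp, tn, fn):
    """Compute the accuracy measures above from confusion-matrix counts."""
    p = tp + fn          # actual positives
    n = tn + fp          # actual negatives
    return {
        "accuracy":    (tp + tn) / (p + n),
        "sensitivity": tp / p,            # TPR / recall
        "specificity": tn / n,            # TNR
        "precision":   tp / (tp + fp),    # PPV
    }

# Hypothetical counts, purely for illustration
print(classification_metrics(tp=80, fp=10, tn=90, fn=20))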
ROC Curve
Choosing the threshold
• Where should we set the threshold?
• We could choose the equal error rate point, where the error rate on the positive training set equals the error rate on the negative training set
• Data classes may be very imbalanced (e.g. |D+| ≪ |D−|)
• In many applications we might be risk averse to false positives or false negatives
• We want to see all the options
• The receiver operating characteristic (ROC) curve is a standard way to do this
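A minimal sketch (not from the slides) of how ROC points can be generated, assuming NumPy arrays of classifier scores and 0/1 labels:

import numpy as np

def roc_points(scores, labels):
    """Sweep a decision threshold over the scores and return (FPR, TPR) pairs.
    `scores` are classifier outputs, `labels` are 1 (positive) / 0 (negative)."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    pos, neg = labels.sum(), (1 - labels).sum()
    points = []
    for t in sorted(scores, reverse=True):
        pred = scores >= t                      # call everything at or above the threshold "positive"
        tpr = (pred & (labels == 1)).sum() / pos
        fpr = (pred & (labels == 0)).sum() / neg
        points.append((fpr, tpr))
    return points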
Threshold
Some definitions ...
[Figure: overlapping distributions of the test result for patients without the disease and patients with the disease. Patients whose test result falls below the threshold are called "negative", those above it "positive"; successive panels highlight the true positives, false positives, true negatives, and false negatives.]
ROC curve comparison
[Figure: ROC curves (true positive rate vs. false positive rate) for a good test and a poor test.]
ROC curve extremes
[Figure: ROC curves (true positive rate vs. false positive rate) for the best test, where the class distributions don't overlap at all, and the worst test, where the distributions overlap completely.]
AUC for ROC curves
[Figure: four ROC curves (true positive rate vs. false positive rate) with areas under the curve of 100%, 90%, 65%, and 50%.]
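A minimal sketch (not from the slides) of estimating the area under an ROC curve from the (FPR, TPR) points produced by the hypothetical roc_points helper above, using the trapezoidal rule:

def auc(points):
    """Trapezoidal-rule area under an ROC curve given (FPR, TPR) points."""
    pts = sorted(points)                     # order by increasing FPR
    pts = [(0.0, 0.0)] + pts + [(1.0, 1.0)]  # anchor the curve at the corners
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area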
Data Normalization
• Between 0 and 1
(x - min(x)) / (max(x) - min(x))

• Between -1 and 1
((x - min(x)) / (max(x) - min(x))) * 2 - 1
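A small illustrative sketch of both rescalings (assuming NumPy arrays; the example heights are made up):

import numpy as np

def minmax_01(x):
    """Rescale a feature vector to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def minmax_pm1(x):
    """Rescale a feature vector to the range [-1, 1]."""
    return minmax_01(x) * 2 - 1

heights = [130, 150, 170, 190]
print(minmax_01(heights))    # [0.   0.33  0.67  1.  ] (approximately)
print(minmax_pm1(heights))   # [-1.  -0.33  0.33  1. ] (approximately)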
Classification Example
• Why recognising rugby players is (almost) the same problem as handwriting recognition
• Can we LEARN to recognise a rugby player?
• What are the “features” of a rugby player?


Rugby players = short + heavy?
Ballet dancers = tall + skinny?
Rugby players “cluster” separately in the space.
[Figure: scatter plots of height (130cm–190cm) vs. weight (60kg–90kg), with rugby players and ballet dancers forming separate clusters.]
K Nearest Neighbors
Nearest Neighbour Rule
• Consider a two-class problem where each sample consists of two measurements (x, y).
• For a given query point q, with k = 1, assign the class of the nearest neighbour.
• With k = 3, compute the k nearest neighbours and assign the class by majority vote.
The K-Nearest Neighbour Algorithm
Who’s this?
1. Measure distance to all points
2. Find closest “k” points (here k = 3, but it could be more)
3. Assign majority class
[Figure: height vs. weight scatter plot showing the query point and its k nearest neighbours.]
“Euclidean distance”

d = sqrt((w - w1)^2 + (h - h1)^2)

[Figure: the distance d between the query point (w, h) and a training point (w1, h1) in the weight–height plane.]
The K-Nearest Neighbour Algorithm

for each testing point
    measure distance to every training point
    find the k closest points
    identify the most common class among those k
    predict that class
end

• Advantage: Surprisingly good classifier!
• Disadvantage: Have to store the entire training set in memory
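A minimal runnable version of the pseudocode above (an illustration, not the original lecture code), using plain Python lists, Euclidean distance, and majority vote; the example height/weight values are made up:

import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    nearest_labels = [train_y[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Example: (weight kg, height cm) -> class
train_X = [(95, 180), (100, 175), (60, 185), (55, 190)]
train_y = ["rugby", "rugby", "ballet", "ballet"]
print(knn_predict(train_X, train_y, query=(90, 178), k=3))  # "rugby"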
Euclidean distance still works in 3-d, 4-d, 5-d, etc.

d = sqrt((x - x1)^2 + (y - y1)^2 + (z - z1)^2)

x = Height
y = Weight
z = Shoe size
Choosing the wrong features makes it difficult; too many and it’s computationally intensive.

Possible features:
- Shoe size
- Height
- Age
- Weight

[Figure: shoe size vs. age scatter plot with an unlabelled query point “?”, illustrating a poor choice of features.]
Nearest Neighbor Classifier

If the nearest instance to the previously unseen instance is a Katydid, class is Katydid; else, class is Grasshopper.

[Figure: antenna length vs. abdomen length scatter plot (axes 1–10) of Katydids and Grasshoppers.]
The nearest neighbor algorithm is sensitive to outliers…

The solution is to generalize the nearest neighbor algorithm to the K-nearest neighbor (KNN) algorithm: we measure the distance to the nearest K instances, and let them vote. K is typically chosen to be an odd number.

[Figure: the same query point with its K = 1 and K = 3 neighbourhoods.]
K-Nearest Neighbour Model
• Picking K
– Use N-fold cross validation – pick K to minimize the cross validation error
– For each of the N training examples:
  · Find its K nearest neighbours
  · Make a classification based on these K neighbours
  · Calculate the classification error
  · Output the average error over all examples
– Use the K that gives the lowest average error over the N training examples (a sketch of this procedure follows below)
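A minimal leave-one-out sketch of this procedure (illustrative only; it reuses the hypothetical knn_predict helper defined earlier):

def pick_best_k(train_X, train_y, candidate_ks=(1, 3, 5)):
    """Leave-one-out cross validation: return the K with the lowest average error."""
    errors = {}
    n = len(train_X)
    for k in candidate_ks:
        wrong = 0
        for i in range(n):
            # Hold out example i, classify it using the rest
            rest_X = train_X[:i] + train_X[i + 1:]
            rest_y = train_y[:i] + train_y[i + 1:]
            if knn_predict(rest_X, rest_y, train_X[i], k=k) != train_y[i]:
                wrong += 1
        errors[k] = wrong / n
    return min(errors, key=errors.get), errors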
K-Nearest Neighbour Model
• Example: Classify whether a customer will respond to a survey question using a 3-Nearest Neighbor classifier

Customer   Age   Income   No. credit cards   Response
John       35    35K      3                  No
Rachel     22    50K      2                  Yes
Hannah     63    200K     1                  No
Tom        59    170K     1                  No
Nellie     25    40K      4                  Yes
David      37    50K      2                  ?
K-Nearest Neighbour Model
• Example: 3-Nearest Neighbors

Customer   Age   Income   No. credit cards   Response   Distance to David
John       35    35K      3                  No         15.16
Rachel     22    50K      2                  Yes        15
Hannah     63    200K     1                  No         152.23
Tom        59    170K     1                  No         122
Nellie     25    40K      4                  Yes        15.74
David      37    50K      2                  ?
K-Nearest Neighbour Model
• Example: 3-Nearest Neighbors

The three nearest ones to David are John, Rachel, and Nellie; their responses are: No, Yes, Yes.
By majority vote, David's predicted response is Yes.
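A short sketch of how these distances and the 3-NN vote can be computed (illustrative only; it treats income in thousands, which is consistent with the distances shown above):

from math import sqrt
from collections import Counter

def euclidean(a, b):
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# (age, income in thousands, no. of credit cards, response)
customers = {
    "John":   (35, 35, 3, "No"),
    "Rachel": (22, 50, 2, "Yes"),
    "Hannah": (63, 200, 1, "No"),
    "Tom":    (59, 170, 1, "No"),
    "Nellie": (25, 40, 4, "Yes"),
}
david = (37, 50, 2)

# Distance from David to every labelled customer
dists = {name: euclidean(row[:3], david) for name, row in customers.items()}
print({n: round(d, 2) for n, d in dists.items()})
# approximately {'John': 15.17, 'Rachel': 15.0, 'Hannah': 152.24, 'Tom': 122.0, 'Nellie': 15.75}

# Majority vote among the 3 nearest neighbours
nearest3 = sorted(dists, key=dists.get)[:3]
votes = Counter(customers[name][3] for name in nearest3)
print(nearest3, "->", votes.most_common(1)[0][0])   # ['Rachel', 'John', 'Nellie'] -> Yes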
K-Nearest Neighbour Model
• Example: For the example we saw earlier, pick the best K from the set {1, 2, 3} to build a K-NN classifier

Customer   Age   Income   No. credit cards   Response
John       35    35K      3                  No
Rachel     22    50K      2                  Yes
Hannah     63    200K     1                  No
Tom        59    170K     1                  No
Nellie     25    40K      4                  Yes
David      37    50K      2                  ?
Minimum Distance
Minimum Distance Classifier
• For a test sample X, compute Dj(X) for each class j, where Dj(X) = ||X - mj|| is the Euclidean distance from X to the class mean
• Assign the class with the minimum Dj(X) value
• Here mj is the mean value of the training samples from the jth class: mj = (1/Nj) Σ x, summed over the Nj training samples x of class j
Minimum Distance Classifier
Manipulating Dj(X): since Dj(X)^2 = ||X - mj||^2 = X'X - 2 X'mj + mj'mj, and the X'X term is the same for every class, minimizing Dj(X) is equivalent to maximizing dj(X) = X'mj - (1/2) mj'mj.
• Now, instead of Dj(X), we compute the discriminant function dj(X) for each class
• Assign the class with the maximum dj(X) value
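A minimal NumPy sketch (illustrative, not the original lecture code) of both forms of the classifier, reusing the made-up rugby/ballet data from earlier:

import numpy as np

def fit_class_means(X, y):
    """Mean feature vector m_j for each class label."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_min_distance(means, x):
    """Assign the class whose mean is closest to x (minimum D_j(x))."""
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

def predict_discriminant(means, x):
    """Equivalent rule: maximize d_j(x) = x.m_j - 0.5 * ||m_j||^2."""
    return max(means, key=lambda c: x @ means[c] - 0.5 * means[c] @ means[c])

# Example with (weight kg, height cm) features
X = np.array([[95, 180], [100, 175], [60, 185], [55, 190]], dtype=float)
y = np.array(["rugby", "rugby", "ballet", "ballet"])
means = fit_class_means(X, y)
x_new = np.array([90.0, 178.0])
print(predict_min_distance(means, x_new), predict_discriminant(means, x_new))  # rugby rugby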
Acknowledgements
Material in these slides has been taken from the following resources:
• Introduction to Machine Learning, Alpaydın
• Digital Image Processing, Gonzalez, 3rd Edition
• Pattern Classification, Duda et al., John Wiley & Sons
• Some material adapted from Dr Ali Hassan’s slides

