
3/14/2023

Data Science
Unit 5

Accuracy Measures & KNN


Outline
▪ Accuracy Measures
▪ Eager and Lazy Learners
▪ Distance Computation and Normalization
▪ K-Nearest Neighbor
▪ Cross Validation

Dr. Muhammad Usman Arif, Spring 2023

Positive and Negative Tuples


▪ Positive tuples (tuples of the main class of interest)
▪ Negative tuples (all other tuples).
▪ For example,
▪ The positive tuples may be buys computer = yes
▪ The negative tuples are buys computer = no.


The Building Blocks


▪ True positives TP: These refer to the positive tuples that were correctly labeled
by the classifier as positive. Let TP be the number of true positives.
▪ True negatives TN: These are the negative tuples that were correctly labeled by
the classifier as negative. Let TN be the number of true negatives.
▪ False positives FP: These are the negative tuples that were incorrectly labeled
as positive (e.g., tuples of class buys computer = no for which the classifier
predicted buys computer = yes). Let FP be the number of false positives.
▪ False negatives FN: These are the positive tuples that were mislabeled as
negative (e.g., tuples of class buys computer = yes for which the classifier
predicted buys computer = no). Let FN be the number of false negatives.


Metrics for Performance Evaluation


▪ Focus on the predictive capability of a model
▪ Rather than how long it takes to classify or build models, scalability, etc.

▪ Confusion Matrix:

                       PREDICTED CLASS
                    Class=Yes   Class=No
  ACTUAL Class=Yes   a (TP)      b (FN)
  CLASS  Class=No    c (FP)      d (TN)

  a: TP (true positive)    b: FN (false negative)
  c: FP (false positive)   d: TN (true negative)

Metrics for Performance Evaluation…


                       PREDICTED CLASS
                    Class=Yes   Class=No
  ACTUAL Class=Yes   a (TP)      b (FN)
  CLASS  Class=No    c (FP)      d (TN)

▪ Most widely-used metric:

  Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
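As a minimal sketch (the function name and example counts are illustrative, not from the slides), the accuracy formula can be computed directly from the four cells of the confusion matrix:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all tuples, positive or negative, that are labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# e.g., 4 TP, 1 TN, 3 FP, 2 FN out of 10 tuples
print(accuracy(4, 1, 3, 2))  # 0.5
```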

Limitation of Accuracy

▪ Consider a 2-class problem


▪ Number of Class 0 examples = 9990
▪ Number of Class 1 examples = 10

▪ If the model predicts everything to be class 0, its accuracy is
  9990/10000 = 99.9 %
▪ This accuracy is misleading because the model does not detect a single
  class 1 example


Cost-Sensitive Measures

                       PREDICTED CLASS
                    Class=Yes   Class=No
  ACTUAL Class=Yes   a (TP)      b (FN)
  CLASS  Class=No    c (FP)      d (TN)

▪ Precision (p) = a / (a + c)

▪ Recall (r) = a / (a + b)

▪ F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)

▪ Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)

▪ Precision is biased towards C(Yes|Yes) & C(Yes|No)
▪ Recall is biased towards C(Yes|Yes) & C(No|Yes)
▪ F-measure is biased towards all except C(No|No)
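The weighted-accuracy formula can be sketched as follows; the default unit weights, under which it reduces to plain accuracy, and the example counts are illustrative choices, not from the slides:

```python
def weighted_accuracy(a, b, c, d, w1=1, w2=1, w3=1, w4=1):
    """a=TP, b=FN, c=FP, d=TN; the weights w1..w4 let costly cells count more."""
    return (w1 * a + w4 * d) / (w1 * a + w2 * b + w3 * c + w4 * d)

# With unit weights this reduces to plain accuracy:
print(weighted_accuracy(4, 2, 3, 1))         # 0.5
# Upweighting true positives raises the score when TPs matter most:
print(weighted_accuracy(4, 2, 3, 1, w1=10))  # 41/46
```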

Recall and Precision

  Actual   Prediction
    T          T
    T          F
    F          T
    F          F
    F          T
    T          T
    T          T
    T          F
    F          T
    T          T

▪ Precision (p) = a / (a + c)
▪ Recall (r) = a / (a + b)
▪ F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)


Recall and Precision

  Actual   Prediction
    T          T
    T          F
    F          T
    F          F
    F          T
    T          T
    T          T
    T          F
    F          T
    T          T

▪ Recall = 4 / 6
▪ Precision = 4 / 7
▪ F-Measure = 8 / 13
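The three values above can be checked with a short script over the ten (Actual, Prediction) pairs from the table (a sketch; the variable names are mine):

```python
from fractions import Fraction

# The ten (Actual, Prediction) pairs from the table above
pairs = [("T", "T"), ("T", "F"), ("F", "T"), ("F", "F"), ("F", "T"),
         ("T", "T"), ("T", "T"), ("T", "F"), ("F", "T"), ("T", "T")]

a = sum(act == "T" and pred == "T" for act, pred in pairs)  # TP = 4
b = sum(act == "T" and pred == "F" for act, pred in pairs)  # FN = 2
c = sum(act == "F" and pred == "T" for act, pred in pairs)  # FP = 3

recall = Fraction(a, a + b)                 # 4/6
precision = Fraction(a, a + c)              # 4/7
f_measure = Fraction(2 * a, 2 * a + b + c)  # 8/13
print(recall, precision, f_measure)         # 2/3 4/7 8/13
```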

Other Performance Measures


▪ Speed: This refers to the computational costs involved in
generating and using the given classifier.
▪ Robustness: This is the ability of the classifier to make correct
predictions given noisy data or data with missing values.
▪ Scalability: This refers to the ability to construct the classifier
efficiently given large amounts of data.
▪ Interpretability: This refers to the level of understanding and
insight that is provided by the classifier or predictor.


K- Nearest Neighbor

Eager Learners

▪ Eager learners, when given a set of training tuples, will construct a
  generalization (i.e., classification) model before receiving new (e.g., test)
  tuples to classify.
▪ We can think of the learned model as being ready and eager to classify
  previously unseen tuples.


Lazy Learners
▪ Imagine a contrasting lazy approach
▪ The learner instead waits until the last minute before doing any model
construction in order to classify a given test tuple.
▪ When given a training tuple, a lazy learner simply stores it (or performs only
  minor processing) and waits until it is given a test tuple.
▪ Only when it sees the test tuple does it perform generalization in order
to classify the tuple based on its similarity to the stored training tuples.


Nearest Neighbor Classifiers


▪ Basic idea:
▪ If it walks like a duck, quacks like a duck, then it’s probably a duck


Instance Based Learning


▪ In instance-based learning the training examples are stored
verbatim, and a distance function is used to determine which
member of the training set is closest to an unknown test
instance.
▪ Once the nearest training instance has been located, its class is
predicted for the test instance.
▪ The only remaining problem is defining the distance function, and
that is not very difficult to do, particularly if the attributes are
numeric.


Distance for Nominal Attributes


▪ For categorical attributes, a simple method is to compare the
  corresponding value of the attribute in tuple X1 with that in tuple X2.
▪ If the two are identical (e.g., tuples X1 and X2 both have the color
  blue), then the difference between the two is taken as 0.
▪ If the two are different (e.g., tuple X1 is blue but tuple X2 is red),
  then the difference is considered to be 1.
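This matching rule is one line of code; the function name is an illustrative choice:

```python
def nominal_diff(v1, v2):
    """Difference between two categorical values: 0 if identical, 1 otherwise."""
    return 0 if v1 == v2 else 1

print(nominal_diff("blue", "blue"))  # 0
print(nominal_diff("blue", "red"))   # 1
```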


Distance for Ordinal Attributes


▪ Consider an attribute that measures the quality of a product, e.g., a
candy bar, on the scale:
▪ {poor, fair, OK, good, wonderful}.

▪ It would seem reasonable that a product, P1, which is rated


wonderful, would be closer to a product P2, which is rated good,
than it would be to a product P3, which is rated OK.
▪ To make this observation quantitative, the values of the ordinal
attribute are often mapped to successive integers, beginning at 0 or
1, e.g.,
▪ {poor=0, fair=1, OK=2, good=3, wonderful=4}
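Using the candy-bar scale above, the integer mapping makes ordinal distances a simple subtraction (a sketch; names are mine):

```python
# The quality scale mapped to successive integers, as on the slide
QUALITY = {"poor": 0, "fair": 1, "OK": 2, "good": 3, "wonderful": 4}

def ordinal_diff(v1, v2):
    """Distance between two ordinal values after mapping them to integers."""
    return abs(QUALITY[v1] - QUALITY[v2])

print(ordinal_diff("wonderful", "good"))  # 1: P1 is close to P2
print(ordinal_diff("wonderful", "OK"))    # 2: and farther from P3
```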

Distance for Ordinal Attributes

▪ This assumes equal intervals between successive


values of the attribute, and this is not necessarily
so.
▪ Otherwise, we would have an interval or ratio attribute.
▪ But in practice, our options are limited, and in the
absence of more information, this is the standard
approach for defining proximity between ordinal
attributes.

Interval and Ratio Based Attributes

▪ Absolute Difference: d(x1, x2) = |x1 − x2|


Normalization

▪ Different interval or ratio attributes are measured on different scales
▪ The effects of some attributes might be completely dwarfed by others
  measured on larger scales
▪ Consequently, it is usual to normalize all attribute values to lie
  between 0 and 1


Normalization Techniques (Recall!)

▪ Min-Max Normalization
  ▪ v’(i) = (v(i) − min v) / (max v − min v)

▪ Decimal Scaling
  ▪ v’(i) = v(i) / 10^k
  ▪ For the smallest k such that max |v’(i)| < 1.
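Both techniques can be sketched in a few lines (function names and example values are illustrative; min-max here uses the standard (v − min)/(max − min) form):

```python
def min_max(values):
    """Min-max normalization: rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |v'| below 1."""
    k = 0
    while max(abs(v) for v in values) / 10 ** k >= 1:
        k += 1
    return [v / 10 ** k for v in values]

print(min_max([10, 20, 30]))        # [0.0, 0.5, 1.0]
print(decimal_scaling([200, -95]))  # [0.2, -0.095]
```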


Similarity Between Data Objects

Distance Function
▪ Although there are other possible choices, most instance-based
learners use Euclidean distance.
▪ Euclidean distance between two examples:
  ▪ P = [p1, p2, p3, ..., pn]
  ▪ Q = [q1, q2, q3, ..., qn]
▪ The Euclidean distance between P and Q is defined as:
  ▪ d(P, Q) = √((p1 − q1)² + (p2 − q2)² + ... + (pn − qn)²)
▪ When comparing distances it is not necessary to perform the square root
  operation; the sums of squares can be compared directly.
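Both forms fit in a line each (a sketch; names are mine). Since the square root is monotonic, `squared_distance` produces the same nearest-neighbor ordering as `euclidean` at lower cost:

```python
from math import sqrt

def euclidean(p, q):
    """Euclidean distance between two attribute vectors P and Q."""
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def squared_distance(p, q):
    """Same ordering as euclidean(), without the square root."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

print(euclidean([0, 0], [3, 4]))  # 5.0
```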


Distance Function (Contd.)

▪ One alternative to the Euclidean distance is the Manhattan or city-block
  metric, where the differences between attribute values are not squared but
  just added up (after taking the absolute value).
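The city-block metric is a drop-in replacement for the Euclidean one (sketch; name is mine):

```python
def manhattan(p, q):
    """City-block distance: sum of absolute attribute differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

print(manhattan([1, 2], [4, 6]))  # 7
```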


K-Nearest Neighbor
▪ The k-nearest-neighbor method was first described in the early
1950s.

▪ The method is labor intensive for large training sets, and did not gain
  popularity until the 1960s, when increased computing power became
  available.

▪ It has since been widely used in the area of pattern


recognition.


K-Nearest Neighbor

▪ Nearest-neighbor classifiers are based on


learning by analogy, that is, by comparing a
given test tuple with training tuples that are
similar to it.
▪ The training tuples are described by n
attributes.
▪ Each tuple represents a point in an n-
dimensional space. In this way, all of the training
tuples are stored in an n-dimensional pattern
space.

K-Nearest Neighbor

▪ When given an unknown tuple, a k-nearest-


neighbor classifier searches the pattern
space for the k training tuples that are closest
to the unknown tuple.
▪ These k training tuples are the k “nearest
neighbors” of the unknown tuple.
▪ “Closeness” is defined in terms of a distance
metric, such as Euclidean distance.
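The search-then-vote procedure above can be sketched as follows; the training data, helper names, and default k are illustrative, not from the slides:

```python
from collections import Counter
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train, unknown, k=3):
    """train: list of (attribute_tuple, class_label) pairs."""
    # Find the k training tuples closest to the unknown tuple...
    neighbors = sorted(train, key=lambda t: euclidean(t[0], unknown))[:k]
    # ...and take the majority vote of their class labels
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((1, 1), "yes"), ((1, 2), "yes"), ((2, 1), "yes"),
         ((5, 5), "no"), ((6, 5), "no")]
print(knn_classify(train, (1.5, 1.5), k=3))  # yes
```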


Majority Voting Example


▪ The test sample (green circle) should be
classified either to the first class of blue
squares or to the second class of red
triangles.
▪ If k = 3 it is assigned to the second class
because there are 2 triangles and only 1
square inside the inner circle.
▪ If k = 5 it is assigned to the first class (3 squares vs. 2 triangles inside
  the outer circle).

Example: Income Data


Majority Voting
▪ Take the majority vote of class labels among the k-nearest
neighbors
▪ If k = 5: Response -> No


Example: Continuous Attributes


How to Determine the Class Label of a Test Sample?
▪ Take the majority vote of class labels among the k-nearest
neighbors
▪ Weight the vote according to distance
  ▪ weight factor, w = 1/d²
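Distance-weighted voting replaces the simple count with a sum of 1/d² weights (a sketch; the eps guard against a zero distance and all names are my additions):

```python
from collections import defaultdict
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def weighted_knn(train, unknown, k=3, eps=1e-9):
    """Each of the k nearest neighbors votes with weight w = 1/d**2."""
    neighbors = sorted(train, key=lambda t: euclidean(t[0], unknown))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        d = euclidean(point, unknown)
        votes[label] += 1.0 / (d ** 2 + eps)  # eps guards d = 0
    return max(votes, key=votes.get)

train = [((0, 0), "a"), ((1, 0), "b"), ((2, 0), "b")]
print(weighted_knn(train, (0.1, 0), k=3))  # a
```

Note that a plain majority vote over these three neighbors would return "b"; the single very close "a" neighbor outweighs the two distant "b" neighbors.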


Right Value of K
▪ A good value for K can be determined experimentally.
▪ Starting with k = 1, we use a test set to estimate the error rate of the
classifier.
▪ This process can be repeated each time by incrementing k to allow
for one more neighbor.
▪ The k value that gives the minimum error rate may be selected.
▪ In general, the larger the number of training tuples is, the larger the
value of k will be (so that classification and prediction decisions can
be based on a larger portion of the stored tuples)
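The experimental procedure above can be sketched as a loop over candidate k values; the classifier, data, and candidate list here are illustrative:

```python
from collections import Counter
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train, x, k):
    neighbors = sorted(train, key=lambda t: euclidean(t[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def error_rate(train, test, k):
    """Fraction of test tuples the k-NN classifier gets wrong."""
    wrong = sum(knn_classify(train, x, k) != label for x, label in test)
    return wrong / len(test)

def best_k(train, test, k_values=(1, 3, 5, 7, 9)):
    """Try increasing k and keep the value with the minimum error rate."""
    return min(k_values, key=lambda k: error_rate(train, test, k))
```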

Example: Mammals vs. Non-mammals


K = 3, K = 5


Example: Play Tennis


K = 3, K = 5


Example: Non-Normalized


Example: Normalized


Low Training Time


▪ Unlike eager learning methods, lazy learners do less work when a
training tuple is presented and more work when making a
classification or prediction.

▪ Because lazy learners store the training tuples or “instances,” they are also
  referred to as instance-based learners, even though all learning is
  essentially based on instances.


High Prediction Time


▪ Although instance-based learning is simple and effective, it is often slow.
▪ The obvious way to find which member of the training set is closest to an
unknown test instance is to calculate the distance from every member of the
training set and select the smallest.
▪ This procedure is linear in the number of training instances: in other words, the
time it takes to make a single prediction is proportional to the number of training
instances.
▪ When making a classification or prediction, lazy learners can be computationally
expensive.
▪ They require efficient storage techniques and are well-suited to implementation
on parallel hardware.

Limitations
▪ It tends to be slow for large training sets, because the entire set must be
searched for each test instance

▪ It performs badly with noisy data, because the class of a test instance is
determined by its single nearest neighbor without any “averaging” to help to
eliminate noise.

▪ It performs badly when different attributes affect the outcome to different
  extents (in the extreme case, when some attributes are completely
  irrelevant), because all attributes contribute equally to the distance
  formula. The effect can be reduced by using weighted attributes.


KNN Feature Weighting


▪ Scale each feature by its importance for classification
▪ Can use our prior knowledge about which features are more important

▪ Can learn the weights wk using cross‐validation (to be covered later)


Reducing the Number of Exemplars


▪ The plain nearest-neighbor rule stores a lot of redundant exemplars: it is almost
always completely unnecessary to save all the examples seen so far.

▪ A simple variant is to classify each example with respect to the examples already
seen and to save only ones that are misclassified.

▪ Discarding correctly classified instances reduces the number of exemplars and


proves to be an effective way to prune the exemplar database.

▪ This does not work well with noisy data
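The save-only-misclassified variant can be sketched with a 1-NN rule over the stored exemplars (the data and names are illustrative, not from the slides):

```python
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def nn_classify(exemplars, x):
    """Class of the single nearest stored exemplar."""
    return min(exemplars, key=lambda t: euclidean(t[0], x))[1]

def condense(examples):
    """Store an example only if the exemplars seen so far misclassify it."""
    exemplars = []
    for x, label in examples:
        if not exemplars or nn_classify(exemplars, x) != label:
            exemplars.append((x, label))
    return exemplars

examples = [((0,), "a"), ((1,), "a"), ((10,), "b"), ((11,), "b"), ((0.5,), "a")]
print(condense(examples))  # two exemplars survive out of five
```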


Pruning Noisy Exemplars


▪ Noisy exemplars inevitably lower the performance of any nearest-neighbor
scheme that does not suppress them because they have the effect of repeatedly
misclassifying new instances.

▪ A solution is to monitor the performance of each exemplar that is stored and


discard ones that do not perform well.

▪ This can be done by keeping a record of the number of correct and incorrect
classification decisions that each exemplar makes.


Prediction Using Nearest Neighbor


▪ Nearest neighbor classifiers can also be used for prediction, that is, to return a
real-valued prediction for a given unknown tuple.

▪ In this case, the classifier returns the average value of the real-valued labels
associated with the k nearest neighbors of the unknown tuple.
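Switching from classification to prediction only changes the final step from a vote to an average (a sketch; names and data are mine):

```python
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(train, x, k=3):
    """Return the average real-valued label of the k nearest training tuples."""
    neighbors = sorted(train, key=lambda t: euclidean(t[0], x))[:k]
    return sum(label for _, label in neighbors) / k

train = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 100.0)]
print(knn_predict(train, (1,), k=3))  # 2.0
```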
