
3/14/2023

Data Science
Unit 5

Accuracy Measures & KNN


Outline
▪ Accuracy Measures
▪ Eager and Lazy Learners
▪ Distance Computation and Normalization
▪ K-Nearest Neighbor
▪ Cross Validation

Dr. Muhammad Usman Arif, Spring 2023

Positive and Negative Tuples


▪ Positive tuples (tuples of the main class of interest)
▪ Negative tuples (all other tuples).
▪ For example,
▪ The positive tuples may be buys computer = yes
▪ The negative tuples are buys computer = no.


The Building Blocks


▪ True positives TP: These refer to the positive tuples that were correctly labeled
by the classifier as positive. Let TP be the number of true positives.
▪ True negatives TN: These are the negative tuples that were correctly labeled by
the classifier as negative. Let TN be the number of true negatives.
▪ False positives FP: These are the negative tuples that were incorrectly labeled
as positive (e.g., tuples of class buys computer = no for which the classifier
predicted buys computer = yes). Let FP be the number of false positives.
▪ False negatives FN: These are the positive tuples that were mislabeled as
negative (e.g., tuples of class buys computer = yes for which the classifier
predicted buys computer = no). Let FN be the number of false negatives.


Metrics for Performance Evaluation


▪ Focus on the predictive capability of a model
▪ Rather than how long it takes to classify or build models, scalability, etc.

▪ Confusion Matrix:

                       PREDICTED CLASS
                    Class=Yes   Class=No
  ACTUAL Class=Yes   a (TP)      b (FN)
  CLASS  Class=No    c (FP)      d (TN)

  a: TP (true positive)    b: FN (false negative)
  c: FP (false positive)   d: TN (true negative)

Metrics for Performance Evaluation…


                       PREDICTED CLASS
                    Class=Yes   Class=No
  ACTUAL Class=Yes   a (TP)      b (FN)
  CLASS  Class=No    c (FP)      d (TN)

▪ Most widely-used metric:

  Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
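As a minimal sketch (the function name and example counts are illustrative, not from the slides), the accuracy formula can be computed directly from the four cells of the confusion matrix:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all tuples, positive or negative, that are labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# e.g., 4 TP, 1 TN, 3 FP, 2 FN out of 10 tuples
print(accuracy(4, 1, 3, 2))  # 0.5
```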

Limitation of Accuracy

▪ Consider a 2-class problem


▪ Number of Class 0 examples = 9990
▪ Number of Class 1 examples = 10

▪ If the model predicts everything to be class 0, its accuracy is
  9990/10000 = 99.9 %
▪ This accuracy is misleading because the model does not detect a single
  class 1 example


Cost-Sensitive Measures

                       PREDICTED CLASS
                    Class=Yes   Class=No
  ACTUAL Class=Yes   a (TP)      b (FN)
  CLASS  Class=No    c (FP)      d (TN)

▪ Precision (p) = a / (a + c)

▪ Recall (r) = a / (a + b)

▪ F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)

▪ Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)

▪ Precision is biased towards C(Yes|Yes) & C(Yes|No)
▪ Recall is biased towards C(Yes|Yes) & C(No|Yes)
▪ F-measure is biased towards all except C(No|No)
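The weighted-accuracy formula can be sketched as follows; the default unit weights, under which it reduces to plain accuracy, and the example counts are illustrative choices, not from the slides:

```python
def weighted_accuracy(a, b, c, d, w1=1, w2=1, w3=1, w4=1):
    """a=TP, b=FN, c=FP, d=TN; the weights w1..w4 let costly cells count more."""
    return (w1 * a + w4 * d) / (w1 * a + w2 * b + w3 * c + w4 * d)

# With unit weights this reduces to plain accuracy:
print(weighted_accuracy(4, 2, 3, 1))         # 0.5
# Upweighting true positives raises the score when TPs matter most:
print(weighted_accuracy(4, 2, 3, 1, w1=10))  # 41/46
```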

Recall and Precision

  Actual   Prediction
    T          T
    T          F
    F          T
    F          F
    F          T
    T          T
    T          T
    T          F
    F          T
    T          T

▪ Precision (p) = a / (a + c)
▪ Recall (r) = a / (a + b)
▪ F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)


Recall and Precision

  Actual   Prediction
    T          T
    T          F
    F          T
    F          F
    F          T
    T          T
    T          T
    T          F
    F          T
    T          T

▪ Recall = 4 / 6
▪ Precision = 4 / 7
▪ F-Measure = 8 / 13
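The three values above can be checked with a short script over the ten (Actual, Prediction) pairs from the table (a sketch; the variable names are mine):

```python
from fractions import Fraction

# The ten (Actual, Prediction) pairs from the table above
pairs = [("T", "T"), ("T", "F"), ("F", "T"), ("F", "F"), ("F", "T"),
         ("T", "T"), ("T", "T"), ("T", "F"), ("F", "T"), ("T", "T")]

a = sum(act == "T" and pred == "T" for act, pred in pairs)  # TP = 4
b = sum(act == "T" and pred == "F" for act, pred in pairs)  # FN = 2
c = sum(act == "F" and pred == "T" for act, pred in pairs)  # FP = 3

recall = Fraction(a, a + b)                 # 4/6
precision = Fraction(a, a + c)              # 4/7
f_measure = Fraction(2 * a, 2 * a + b + c)  # 8/13
print(recall, precision, f_measure)         # 2/3 4/7 8/13
```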

Other Performance Measures


▪ Speed: This refers to the computational costs involved in
generating and using the given classifier.
▪ Robustness: This is the ability of the classifier to make correct
predictions given noisy data or data with missing values.
▪ Scalability: This refers to the ability to construct the classifier
efficiently given large amounts of data.
▪ Interpretability: This refers to the level of understanding and
insight that is provided by the classifier or predictor.


K- Nearest Neighbor

Eager Learners

▪ Eager learners, when given a set of training tuples, will construct a
  generalization (i.e., classification) model before receiving new (e.g., test)
  tuples to classify.
▪ We can think of the learned model as being ready and eager to classify
  previously unseen tuples.


Lazy Learners
▪ Imagine a contrasting lazy approach
▪ The learner instead waits until the last minute before doing any model
construction in order to classify a given test tuple.
▪ When given a training tuple, a lazy learner simply stores it (or performs only
  minor processing) and waits until it is given a test tuple.
▪ Only when it sees the test tuple does it perform generalization in order
to classify the tuple based on its similarity to the stored training tuples.


Nearest Neighbor Classifiers


▪ Basic idea:
▪ If it walks like a duck, quacks like a duck, then it’s probably a duck


Instance Based Learning


▪ In instance-based learning the training examples are stored
verbatim, and a distance function is used to determine which
member of the training set is closest to an unknown test
instance.
▪ Once the nearest training instance has been located, its class is
predicted for the test instance.
▪ The only remaining problem is defining the distance function, and
that is not very difficult to do, particularly if the attributes are
numeric.


Distance for Nominal Attributes


▪ For categorical attributes, a simple method is to compare the
  corresponding value of the attribute in tuple X1 with that in tuple X2.
▪ If the two are identical (e.g., tuples X1 and X2 both have the color
  blue), then the difference between the two is taken as 0.
▪ If the two are different (e.g., tuple X1 is blue but tuple X2 is red),
  then the difference is considered to be 1.
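This matching rule is one line of code; the function name is an illustrative choice:

```python
def nominal_diff(v1, v2):
    """Difference between two categorical values: 0 if identical, 1 otherwise."""
    return 0 if v1 == v2 else 1

print(nominal_diff("blue", "blue"))  # 0
print(nominal_diff("blue", "red"))   # 1
```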


Distance for Ordinal Attributes


▪ Consider an attribute that measures the quality of a product, e.g., a
candy bar, on the scale:
▪ {poor, fair, OK, good, wonderful}.

▪ It would seem reasonable that a product, P1, which is rated


wonderful, would be closer to a product P2, which is rated good,
than it would be to a product P3, which is rated OK.
▪ To make this observation quantitative, the values of the ordinal
attribute are often mapped to successive integers, beginning at 0 or
1, e.g.,
▪ {poor=0, fair=1, OK=2, good=3, wonderful=4}
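Using the candy-bar scale above, the integer mapping makes ordinal distances a simple subtraction (a sketch; names are mine):

```python
# The quality scale mapped to successive integers, as on the slide
QUALITY = {"poor": 0, "fair": 1, "OK": 2, "good": 3, "wonderful": 4}

def ordinal_diff(v1, v2):
    """Distance between two ordinal values after mapping them to integers."""
    return abs(QUALITY[v1] - QUALITY[v2])

print(ordinal_diff("wonderful", "good"))  # 1: P1 is close to P2
print(ordinal_diff("wonderful", "OK"))    # 2: and farther from P3
```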

Distance for Ordinal Attributes

▪ This assumes equal intervals between successive


values of the attribute, and this is not necessarily
so.
▪ Otherwise, we would have an interval or ratio attribute.
▪ But in practice, our options are limited, and in the
absence of more information, this is the standard
approach for defining proximity between ordinal
attributes.

Interval and Ratio Based Attributes

▪ Absolute Difference: d(x1, x2) = |x1 − x2|


Normalization

▪ Different interval or ratio attributes are measured on different scales
▪ The effects of some attributes might be completely dwarfed by others
  measured on larger scales
▪ Consequently, it is usual to normalize all attribute values to lie
  between 0 and 1


Normalization Techniques (Recall!)

▪ Min-Max Normalization
  ▪ v’(i) = (v(i) − min v) / (max v − min v)

▪ Decimal Scaling
  ▪ v’(i) = v(i) / 10^k
  ▪ For the smallest k such that max |v’(i)| < 1.
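Both techniques can be sketched in a few lines (function names and example values are illustrative; min-max here uses the standard (v − min)/(max − min) form):

```python
def min_max(values):
    """Min-max normalization: rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |v'| below 1."""
    k = 0
    while max(abs(v) for v in values) / 10 ** k >= 1:
        k += 1
    return [v / 10 ** k for v in values]

print(min_max([10, 20, 30]))        # [0.0, 0.5, 1.0]
print(decimal_scaling([200, -95]))  # [0.2, -0.095]
```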


Similarity Between Data Objects

Distance Function
▪ Although there are other possible choices, most instance-based
learners use Euclidean distance.
▪ Euclidean distance between two examples:
  ▪ P = [p1, p2, p3, ..., pn]
  ▪ Q = [q1, q2, q3, ..., qn]
▪ The Euclidean distance between P and Q is defined as:
  ▪ d(P, Q) = √((p1 − q1)² + (p2 − q2)² + ... + (pn − qn)²)
▪ When comparing distances it is not necessary to perform the square root
  operation; the sums of squares can be compared directly.
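Both forms fit in a line each (a sketch; names are mine). Since the square root is monotonic, `squared_distance` produces the same nearest-neighbor ordering as `euclidean` at lower cost:

```python
from math import sqrt

def euclidean(p, q):
    """Euclidean distance between two attribute vectors P and Q."""
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def squared_distance(p, q):
    """Same ordering as euclidean(), without the square root."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

print(euclidean([0, 0], [3, 4]))  # 5.0
```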


Distance Function (Contd.)

▪ One alternative to the Euclidean distance is the Manhattan or city-block
  metric, where the differences between attribute values are not squared but
  just added up (after taking the absolute value).
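The city-block metric is a drop-in replacement for the Euclidean one (sketch; name is mine):

```python
def manhattan(p, q):
    """City-block distance: sum of absolute attribute differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

print(manhattan([1, 2], [4, 6]))  # 7
```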


K-Nearest Neighbor
▪ The k-nearest-neighbor method was first described in the early
1950s.

▪ The method is labor intensive for large training sets, and did not gain
  popularity until the 1960s, when increased computing power became
  available.

▪ It has since been widely used in the area of pattern


recognition.


K-Nearest Neighbor

▪ Nearest-neighbor classifiers are based on


learning by analogy, that is, by comparing a
given test tuple with training tuples that are
similar to it.
▪ The training tuples are described by n
attributes.
▪ Each tuple represents a point in an n-
dimensional space. In this way, all of the training
tuples are stored in an n-dimensional pattern
space.

K-Nearest Neighbor

▪ When given an unknown tuple, a k-nearest-


neighbor classifier searches the pattern
space for the k training tuples that are closest
to the unknown tuple.
▪ These k training tuples are the k “nearest
neighbors” of the unknown tuple.
▪ “Closeness” is defined in terms of a distance
metric, such as Euclidean distance.
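The search-then-vote procedure above can be sketched as follows; the training data, helper names, and default k are illustrative, not from the slides:

```python
from collections import Counter
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train, unknown, k=3):
    """train: list of (attribute_tuple, class_label) pairs."""
    # Find the k training tuples closest to the unknown tuple...
    neighbors = sorted(train, key=lambda t: euclidean(t[0], unknown))[:k]
    # ...and take the majority vote of their class labels
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((1, 1), "yes"), ((1, 2), "yes"), ((2, 1), "yes"),
         ((5, 5), "no"), ((6, 5), "no")]
print(knn_classify(train, (1.5, 1.5), k=3))  # yes
```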


Majority Voting Example


▪ The test sample (green circle) should be
classified either to the first class of blue
squares or to the second class of red
triangles.
▪ If k = 3 it is assigned to the second class
because there are 2 triangles and only 1
square inside the inner circle.
▪ If k = 5 it is assigned to the first class (3 squares vs. 2 triangles inside
  the outer circle).

Example: Income Data


Majority Voting
▪ Take the majority vote of class labels among the k-nearest
neighbors
▪ If k = 5: Response -> No


Example: Continuous Attributes


How to Determine the Class Label of a Test Sample?
▪ Take the majority vote of class labels among the k-nearest
neighbors
▪ Weight the vote according to distance
  ▪ weight factor, w = 1/d²
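Distance-weighted voting replaces the simple count with a sum of 1/d² weights (a sketch; the eps guard against a zero distance and all names are my additions):

```python
from collections import defaultdict
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def weighted_knn(train, unknown, k=3, eps=1e-9):
    """Each of the k nearest neighbors votes with weight w = 1/d**2."""
    neighbors = sorted(train, key=lambda t: euclidean(t[0], unknown))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        d = euclidean(point, unknown)
        votes[label] += 1.0 / (d ** 2 + eps)  # eps guards d = 0
    return max(votes, key=votes.get)

train = [((0, 0), "a"), ((1, 0), "b"), ((2, 0), "b")]
print(weighted_knn(train, (0.1, 0), k=3))  # a
```

Note that a plain majority vote over these three neighbors would return "b"; the single very close "a" neighbor outweighs the two distant "b" neighbors.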


Right Value of K
▪ A good value for K can be determined experimentally.
▪ Starting with k = 1, we use a test set to estimate the error rate of the
classifier.
▪ This process can be repeated each time by incrementing k to allow
for one more neighbor.
▪ The k value that gives the minimum error rate may be selected.
▪ In general, the larger the number of training tuples is, the larger the
value of k will be (so that classification and prediction decisions can
be based on a larger portion of the stored tuples)
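The experimental procedure above can be sketched as a loop over candidate k values; the classifier, data, and candidate list here are illustrative:

```python
from collections import Counter
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train, x, k):
    neighbors = sorted(train, key=lambda t: euclidean(t[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def error_rate(train, test, k):
    """Fraction of test tuples the k-NN classifier gets wrong."""
    wrong = sum(knn_classify(train, x, k) != label for x, label in test)
    return wrong / len(test)

def best_k(train, test, k_values=(1, 3, 5, 7, 9)):
    """Try increasing k and keep the value with the minimum error rate."""
    return min(k_values, key=lambda k: error_rate(train, test, k))
```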

Example: Mammals vs. Non-mammals


K = 3, K = 5


Example: Play Tennis


K = 3, K = 5


Example: Non-Normalized


Example: Normalized


Low Training Time


▪ Unlike eager learning methods, lazy learners do less work when a
training tuple is presented and more work when making a
classification or prediction.

▪ Because lazy learners store the training tuples or “instances,” they are also
  referred to as instance-based learners, even though all learning is
  essentially based on instances.


High Prediction Time


▪ Although instance-based learning is simple and effective, it is often slow.
▪ The obvious way to find which member of the training set is closest to an
unknown test instance is to calculate the distance from every member of the
training set and select the smallest.
▪ This procedure is linear in the number of training instances: in other words, the
time it takes to make a single prediction is proportional to the number of training
instances.
▪ When making a classification or prediction, lazy learners can be computationally
expensive.
▪ They require efficient storage techniques and are well-suited to implementation
on parallel hardware.

Limitations
▪ It tends to be slow for large training sets, because the entire set must be
searched for each test instance

▪ It performs badly with noisy data, because the class of a test instance is
determined by its single nearest neighbor without any “averaging” to help to
eliminate noise.

▪ It performs badly when different attributes affect the outcome to different
  extents (in the extreme case, when some attributes are completely
  irrelevant), because all attributes contribute equally to the distance
  formula. The effect can be reduced by using weighted attributes.


KNN Feature Weighting


▪ Scale each feature by its importance for classification
▪ Can use our prior knowledge about which features are more important

▪ Can learn the weights wk using cross‐validation (to be covered later)


Reducing the Number of Exemplars


▪ The plain nearest-neighbor rule stores a lot of redundant exemplars: it is almost
always completely unnecessary to save all the examples seen so far.

▪ A simple variant is to classify each example with respect to the examples already
seen and to save only ones that are misclassified.

▪ Discarding correctly classified instances reduces the number of exemplars and


proves to be an effective way to prune the exemplar database.

▪ This does not work well with noisy data
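The save-only-misclassified variant can be sketched with a 1-NN rule over the stored exemplars (the data and names are illustrative, not from the slides):

```python
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def nn_classify(exemplars, x):
    """Class of the single nearest stored exemplar."""
    return min(exemplars, key=lambda t: euclidean(t[0], x))[1]

def condense(examples):
    """Store an example only if the exemplars seen so far misclassify it."""
    exemplars = []
    for x, label in examples:
        if not exemplars or nn_classify(exemplars, x) != label:
            exemplars.append((x, label))
    return exemplars

examples = [((0,), "a"), ((1,), "a"), ((10,), "b"), ((11,), "b"), ((0.5,), "a")]
print(condense(examples))  # two exemplars survive out of five
```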


Pruning Noisy Exemplars


▪ Noisy exemplars inevitably lower the performance of any nearest-neighbor
scheme that does not suppress them because they have the effect of repeatedly
misclassifying new instances.

▪ A solution is to monitor the performance of each exemplar that is stored and


discard ones that do not perform well.

▪ This can be done by keeping a record of the number of correct and incorrect
classification decisions that each exemplar makes.


Prediction Using Nearest Neighbor


▪ Nearest neighbor classifiers can also be used for prediction, that is, to return a
real-valued prediction for a given unknown tuple.

▪ In this case, the classifier returns the average value of the real-valued labels
associated with the k nearest neighbors of the unknown tuple.
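Switching from classification to prediction only changes the final step from a vote to an average (a sketch; names and data are mine):

```python
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(train, x, k=3):
    """Return the average real-valued label of the k nearest training tuples."""
    neighbors = sorted(train, key=lambda t: euclidean(t[0], x))[:k]
    return sum(label for _, label in neighbors) / k

train = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 100.0)]
print(knn_predict(train, (1,), k=3))  # 2.0
```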
