
Machine Learning

Machine Learning Algorithms
• Regression

• Classification
• Classical Algorithms
• Decision Trees
• CBR (Case-Based Reasoning)
• Neural Networks
• Ensemble of Classifiers

• Clustering
• Classical Algorithms
• Density-based clustering

Regression

The missing value will be:
a) 50
b) 58
c) 55

[Figure: "Linear Regression" scatter plot with a fitted line; y-axis 0 to 80, x-axis 0 to 16]

Regression will always give you a REAL number as a result
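A minimal sketch of what such a prediction looks like in Python with NumPy. The data points below are hypothetical (the figure's exact values are not recoverable), so the predicted number is only illustrative:

```python
import numpy as np

# Hypothetical data points roughly matching the chart's axes (x: 0-16, y: 0-80);
# the slide's actual values are not recoverable from the figure
x = np.array([2, 4, 6, 8, 10, 14, 16])
y = np.array([12, 22, 31, 40, 52, 68, 79])

# Fit a line y = m*x + b by least squares
m, b = np.polyfit(x, y, 1)

# Predict the missing value at x = 12: regression returns a real number
print(m * 12 + b)  # ~59.7 for this toy data
```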


What is this?

[Figure: scatter plot of Y (1.0 to 2.0) against X (0 to 18)]

Classification

[Diagram: incoming email passes through a Filter and is labeled Spam or Non-Spam]

Classification

Features/Parameters: the inputs that help you make a decision

Classification: the output is a choice among a finite set of options

What does this image contain?


What is missing here is the supervisor (the labels)

This kind of ML is called clustering (we only have the input part and do not know the output)
Clustering

Nature of Available Data
• Features

• Example: Will you take a course taught by Dr. Hamza?
• Output: Yes / No
• Input: what will matter in making the right decision?
Very Accurate Information

Incomplete Information

Efficient Solution or Correct Solution

Nearest Neighbor Method (K-NN Algorithm)
• Basic Idea: Similar Inputs have similar Outputs
• This may work in some situations but may not work in others
• Classification Rule: for a test input x, assign the most common label amongst its k
most similar (nearest) training inputs
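A minimal sketch of this rule in Python, assuming Euclidean distance and a small hypothetical dataset (the lecture itself does not give code):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Assign to x the most common label among its k nearest training inputs."""
    # Brute-force Euclidean distance from x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training points (the set Sx)
    nearest = np.argsort(dists)[:k]
    # Most common label among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical toy data: two 2-D clusters labeled 0 and 1
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_classify(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0
```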

Nearest Neighbor Method (K-NN Algorithm)
• Formal Definition: Assuming x to be our test point, let's denote the
set of the k nearest neighbors of x as Sx

• Every point that is in D but not in Sx is at least as far from x as the
furthest point in Sx
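Stated symbolically, a minimal formalization of these two conditions (the distance function d(·,·) is left implicit by the slides, so naming it here is an assumption):

```latex
S_x \subseteq D, \qquad |S_x| = k, \qquad
\forall\, (x', y') \in D \setminus S_x:\quad
d(x, x') \;\ge\; \max_{(x'', y'') \in S_x} d(x, x'')
```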

Nearest Neighbor Method (K-NN Algorithm)
• We define the classifier h() as a function returning the most common
label in Sx:

h(x) = mode({y'' : (x'', y'') ∈ Sx})

• Where mode() means to select the label of highest occurrence

• So what do we do if there is a draw (a tie)?

• Keep k odd, or return the result of k-NN with a smaller k
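A hedged sketch of the smaller-k fallback, reusing the brute-force setup from the earlier sketch; on a tie it recurses with k-1 until the vote is unique:

```python
import numpy as np
from collections import Counter

def knn_classify_tiebreak(X_train, y_train, x, k=3):
    """k-NN majority vote; on a tie, retry with k-1 until the vote is unique."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    counts = Counter(y_train[nearest]).most_common()
    # A tie: the two most frequent labels occur equally often
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return knn_classify_tiebreak(X_train, y_train, x, k - 1)
    return counts[0][0]
```

At k = 1 the vote has a single label, so the recursion always terminates.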

K-NN Decision Boundary

K-NN Decision Boundary
One side of the line gets one label; the other side gets the other label

What happens in this case (a point exactly on the boundary)?

Then you can choose either one you like: it is the same in both cases

K-NN Algorithm Properties
• A supervised, non-parametric algorithm
• It makes no assumptions about the underlying distribution, nor does it try to
estimate it
• There are no parameters to train like in regression/Bayes
• Parameters allow models to make predictions
• There is a hyperparameter k that needs to be tuned
• Hyperparameters help with the learning/prediction process

• Used for classification and regression

• Classification: choose the most frequent class label amongst the k nearest neighbors
• Regression: take an average over the output values of the k nearest neighbors and
assign it to the test point, possibly weighted (see the sketch after this list)
• Instance based: instead of performing explicit generalization, form hypotheses by
comparing the new problem with training instances
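A minimal sketch of k-NN regression; the inverse-distance weighting is one common choice and an assumption here, since the slide only says "maybe weighted":

```python
import numpy as np

def knn_regress(X_train, y_train, x, k=3, weighted=False):
    """Predict for x by averaging the outputs of its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    if weighted:
        # Inverse-distance weights (assumed scheme); epsilon avoids division by zero
        w = 1.0 / (dists[nearest] + 1e-12)
        return np.sum(w * y_train[nearest]) / np.sum(w)
    return np.mean(y_train[nearest])
```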
K-NN Algorithm Properties
• Lazy learner:
• Delay computations on training data until a query is made, as opposed
to eager learning
• Good for continuously updated training data, e.g. recommender systems
• Slower to evaluate, and the whole training data must be stored

Classification

Examples:
• GPA → A+ / F / C
• Fruits → Apple / Orange
• Good / Bad / Honest