
MACHINE LEARNING
SUPERVISED LEARNING
LEC-7-8

Nazia Bibi
CLASSIFICATION

LEARNING A CLASS FROM EXAMPLES
• Class C of a "family car"
• Prediction: Is car x a family car?
• Knowledge extraction: What do people expect from a family car?
• Positive (+) and negative (–) examples
• Input representation: x1: price, x2: engine power

TRAINING SET X
For each car, the input is the vector

x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}

and the label is

r = \begin{cases} 1 & \text{if } x \text{ is positive} \\ 0 & \text{if } x \text{ is negative} \end{cases}

For N training examples:

X = \{ x^t, r^t \}_{t=1}^{N}

CLASS C

(p_1 \le \text{price} \le p_2) \ \text{AND} \ (e_1 \le \text{engine power} \le e_2)

for suitable values of p_1, p_2, e_1 and e_2.

Class C is defined by a rectangle in the price-engine power space.
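A hypothesis of this form can be written directly as a predicate on the two features. A minimal sketch; the boundary values below are made up for illustration only:

```python
# Sketch: class C as an axis-aligned rectangle in the (price, engine power) plane.
p1, p2 = 10_000, 20_000      # assumed price range (hypothetical values)
e1, e2 = 100, 200            # assumed engine-power range (hypothetical values)

def is_family_car(price, engine_power):
    """Return 1 if the car falls inside the rectangle, else 0."""
    return 1 if (p1 <= price <= p2) and (e1 <= engine_power <= e2) else 0

print(is_family_car(15_000, 150))  # 1: inside the rectangle
print(is_family_car(45_000, 300))  # 0: outside the rectangle
```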

CLASS C

(p_1 \le \text{price} \le p_2) \ \text{AND} \ (e_1 \le \text{engine power} \le e_2)

• This equation fixes the hypothesis class H: the set of axis-aligned rectangles.
• The learning algorithm finds a particular hypothesis h ∈ H that approximates C as closely as possible.
• The expert defines the hypothesis class; the algorithm finds its parameters.
• A "false positive" is when a good-quality item gets rejected.
• A "false negative" is when a poor-quality item gets accepted.
(Here the positive class corresponds to rejecting an item.)

HYPOTHESIS CLASS H
h(x) = \begin{cases} 1 & \text{if } h \text{ classifies } x \text{ as positive} \\ 0 & \text{if } h \text{ classifies } x \text{ as negative} \end{cases}

Training error: the predictions of h that do not match the required values in X:

E(h \mid X) = \sum_{t=1}^{N} 1\big( h(x^t) \neq r^t \big)
HYPOTHESIS CLASS H: HOW TO READ THE ERROR?

E(h \mid X) = \sum_{t=1}^{N} 1\big( h(x^t) \neq r^t \big)

The error of hypothesis h, given the training set X.
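As a sketch (not from the slides), the training error simply counts the examples on which h disagrees with the label; here h is any 0/1 classifier and X is a list of (x, r) pairs:

```python
# Sketch: empirical error E(h | X) = number of mismatches between h(x^t) and r^t.
def empirical_error(h, X):
    """h: function mapping a feature vector to 0/1; X: list of (x, r) pairs."""
    return sum(1 for x, r in X if h(x) != r)

# Hypothetical usage with the rectangle hypothesis from before:
# h = lambda x: is_family_car(x[0], x[1])
# X = [((15_000, 150), 1), ((45_000, 300), 0), ((12_000, 90), 1)]
# empirical_error(h, X)  -> 1 (the third example is misclassified)
```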

S, G, AND THE VERSION SPACE

• Most specific hypothesis, S
• Most general hypothesis, G
• Any h ∈ H between S and G is consistent with the training set; together these hypotheses make up the version space.
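For the rectangle hypothesis class, the most specific hypothesis S is the tightest axis-aligned rectangle that encloses all positive examples. A minimal sketch, with made-up data:

```python
# Sketch: S = tightest axis-aligned rectangle around the positive examples.
def most_specific_rectangle(X):
    """X: list of ((x1, x2), r) pairs; returns (p1, p2, e1, e2) describing S."""
    positives = [x for x, r in X if r == 1]
    xs1 = [x[0] for x in positives]
    xs2 = [x[1] for x in positives]
    return min(xs1), max(xs1), min(xs2), max(xs2)

# Hypothetical data: ((price, engine power), label)
X = [((12_000, 120), 1), ((18_000, 160), 1), ((40_000, 300), 0)]
print(most_specific_rectangle(X))  # (12000, 18000, 120, 160)
```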

MULTIPLE CLASSES, C_i, i = 1, ..., K

X = \{ x^t, r^t \}_{t=1}^{N}, \qquad r_i^t = \begin{cases} 1 & \text{if } x^t \in C_i \\ 0 & \text{if } x^t \in C_j,\ j \neq i \end{cases}

Train K hypotheses h_i(x), i = 1, ..., K:

h_i(x^t) = \begin{cases} 1 & \text{if } x^t \in C_i \\ 0 & \text{if } x^t \in C_j,\ j \neq i \end{cases}
MULTIPLE CLASSES, C_i, i = 1, ..., K

• A K-class problem is treated as K two-class problems, as in the sketch below.
• Positive examples for the class "Luxury Sedan"; ALL the rest are negative examples.
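A sketch of the one-vs-rest idea: each class i gets its own two-class problem in which its examples are relabelled 1 and all others 0. `train_binary_classifier` is a hypothetical placeholder for any two-class learner (for example the rectangle-fitting procedure above):

```python
# Sketch: turn a K-class problem into K two-class problems (one-vs-rest).
def one_vs_rest(X, labels, K, train_binary_classifier):
    """X: feature vectors; labels: class indices in 0..K-1.
    train_binary_classifier is a hypothetical two-class learner."""
    hypotheses = []
    for i in range(K):
        r = [1 if y == i else 0 for y in labels]   # class i vs. the rest
        hypotheses.append(train_binary_classifier(X, r))
    return hypotheses                              # h_i(x), i = 0..K-1
```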

LINEAR REGRESSION

EXAMPLE

David Beckham: 1.83m Brad Pitt: 1.83m George Bush :1.81m


Victoria Beckham: 1.68m Angelina Jolie: 1.70m Laura Bush: ?

 To predict height of the wife in a couple, based on the husband’s height


 Response (out come or dependent) variable (Y):
 height of the wife
 Predictor (explanatory or independent) variable (X): 18
 height of the husband
WHAT IS LINEAR
• Remember this?

WHAT IS LINEAR
• A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

EXAMPLE
• Dataset giving the living areas and prices of 50 houses

EXAMPLE
• We can plot this data.
• Given data like this, how can we learn to predict the prices of other houses as a function of the size of their living areas?

NOTATIONS
• The "input" variables: x^(i) (living area in this example)
• The "output" or target variable that we are trying to predict: y^(i) (price)
• A pair (x^(i), y^(i)) is called a training example.
• A list of m training examples {(x^(i), y^(i)); i = 1, ..., m} is called a training set.
• X denotes the space of input values, and Y the space of output values.

REGRESSION
Given a training set, the goal is to learn a function h : X → Y so that h(x) is a "good" predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.

CHOICE OF HYPOTHESIS
• Decision: how to represent the hypothesis h
• For linear regression, we assume that the hypothesis is linear:

h(x) = \theta_0 + \theta_1 x

HYPOTHESIS
• Generally we will have more than one input feature:

h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2

where x_1 = living area and x_2 = number of bedrooms.
HYPOTHESIS
• Hypothesis:

h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2

• To show the dependence on θ:

h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 \quad \text{OR} \quad h(x \mid \theta) = \theta_0 + \theta_1 x_1 + \theta_2 x_2

• This is the price that the hypothesis predicts for a given house with living area x_1 and number of bedrooms x_2.
HYPOTHESIS
h ( x)   0  1 x1   2 x2
 For conciseness

Define x0  1 h ( x)   0 x0  1 x1   2 x2
2
h ( x )    i xi θs are called the parameters and
i 0 are real numbers
 For n features
n Job of learning alogrithm to
h ( x )    i xi   T X find or learn these
i 0
parameters
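A minimal sketch of the vectorized hypothesis: prepend x_0 = 1 and take the dot product with θ. The parameter values are made up for illustration:

```python
# Sketch: h_theta(x) = theta^T x with x_0 = 1 prepended.
def h(theta, x):
    """theta: parameters [theta_0, ..., theta_n]; x: features [x_1, ..., x_n]."""
    x = [1.0] + list(x)                          # define x_0 = 1
    return sum(t * xi for t, xi in zip(theta, x))

# Hypothetical parameters: base price 50, +0.1 per sq.ft, +20 per bedroom
print(h([50.0, 0.1, 20.0], [2104, 3]))           # predicted price: 320.4
```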
CHOOSING THE REGRESSION LINE

• Which of these lines to choose?

[Two scatter plots of Y vs. X, each with a different candidate line]
y  h ( x)   0  1 x

CHOSING THE REGRESSION LINE


The predicted value is:
yˆ i  h ( xi )   0  1 xi

Y The true value for xi is yi

yˆi
Error or residual yˆi  yi
yi

Consider this point xi

xi X 30
CHOOSING THE REGRESSION LINE

• How to choose this best-fit line? In other words, how to choose the θ's?
• Minimize the sum of the squared distances of the points y_i from the line over the m training examples (why squared?):

\min_\theta \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)^2
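This objective can be written directly in code. A minimal sketch for a single input feature, with made-up data:

```python
# Sketch: sum of squared errors J(theta) for simple linear regression.
def sse(theta0, theta1, xs, ys):
    """Sum over the m training examples of (h_theta(x_i) - y_i)^2."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: two candidate lines, the second fits better.
xs, ys = [1, 2, 3], [2, 4, 6]
print(sse(0.0, 1.0, xs, ys))   # 1 + 4 + 9 = 14
print(sse(0.0, 2.0, xs, ys))   # 0: this line passes through every point
```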
min J ( )

GRADIENT DESCENT
 Chose initial values of θ0 and θ1 and continue moving the
direction of steepest descente
J(θ)

32
θ0
θ1
GRADIENT DESCENT

• Choose initial values of θ_0 and θ_1 and keep moving in the direction of steepest descent, as in the sketch below.
• The step size is controlled by a parameter called the learning rate.
• The starting point is important.
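A minimal batch gradient-descent sketch for the two-parameter case; the data, learning rate and iteration count are illustrative only:

```python
# Sketch: batch gradient descent for J(theta) = sum_i (theta0 + theta1*x_i - y_i)^2.
def gradient_descent(xs, ys, lr=0.01, iters=1000):
    theta0, theta1 = 0.0, 0.0                            # starting point
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = 2 * sum(errors)                          # dJ/dtheta0
        grad1 = 2 * sum(e * x for e, x in zip(errors, xs))   # dJ/dtheta1
        theta0 -= lr * grad0                             # step against the gradient
        theta1 -= lr * grad1
    return theta0, theta1

print(gradient_descent([1, 2, 3], [2, 4, 6]))            # approaches (0, 2)
```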

MODEL SELECTION

g  x   w 1x  w 0
Life is not as simple as Linear

g  x   w 2x 2  w 1x  w 0
 Non-Linear Regression

Higher order polynomial


GENERALIZATION
• Generalization: how well a model performs on new data
• Overfitting: the chosen hypothesis is too complex.
  For example: fitting a 3rd-order polynomial to linear data.
• Underfitting: the chosen hypothesis is too simple.
  For example: fitting a line to a quadratic function.

CROSS VALIDATION

• To estimate generalization error, we need data unseen during training. We split the data as:
  • Training set (50%)
  • Validation set (25%)
  • Test (publication) set (25%)
• Choose the hypothesis that is best on the validation set: cross-validation.
CROSS VALIDATION
• Example: find the right order of polynomial in regression.
  • Use the training set to estimate the coefficients.
  • Calculate the errors on the validation set.
  • Choose the order with the least validation error (see the sketch below).
• Question: what is the expected error of the chosen model?
  • We can NOT use the validation error: the validation data has been used to choose the model, so it is effectively part of training.
  • Use the TEST data set.
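A sketch of this procedure using numpy's polynomial fitting; the split sizes follow the slide (50/25/25), but the data are made up:

```python
# Sketch: pick the polynomial order with the lowest validation error,
# then report the error of the chosen model on the held-out test set.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 3, 200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=x.size)   # truly linear data

# 50% train / 25% validation / 25% test, as on the slide.
x_tr, y_tr = x[:100], y[:100]
x_val, y_val = x[100:150], y[100:150]
x_te, y_te = x[150:], y[150:]

def val_error(order):
    coeffs = np.polyfit(x_tr, y_tr, order)           # fit on the training set
    return np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)

best_order = min(range(1, 6), key=val_error)          # choose on validation error
coeffs = np.polyfit(x_tr, y_tr, best_order)
test_error = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
print(best_order, test_error)                         # report error on the TEST data
```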

SUMMARY
• Model:

h_\theta(x) \ \text{or} \ h(x \mid \theta)

• Loss function:

E(\theta \mid x) = J(\theta) = \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)^2

• Optimization:

\min_\theta E(\theta \mid x)
COVARIANCE
\text{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}

• cov(X, Y) > 0: X and Y are positively correlated
• cov(X, Y) < 0: X and Y are inversely correlated
• cov(X, Y) = 0: X and Y are uncorrelated (independence implies zero covariance, but zero covariance does not imply independence)
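A minimal sketch of this sample covariance (with the n − 1 denominator), using made-up data:

```python
# Sketch: sample covariance, cov(x, y) = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1).
def sample_cov(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

print(sample_cov([1, 2, 3], [2, 4, 6]))   # 2.0 (positively correlated)
```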

CORRELATION COEFFICIENT
• Pearson's correlation coefficient is standardized covariance:

r = \frac{\text{cov}(x, y)}{\sqrt{\text{var}(x)\,\text{var}(y)}}
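Building on the covariance sketch above, the correlation coefficient just rescales the covariance by the standard deviations (square roots of the sample variances):

```python
# Sketch: Pearson's r = cov(x, y) / sqrt(var(x) * var(y)), with sample statistics.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    return cov / sqrt(var_x * var_y)

print(pearson_r([1, 2, 3], [2, 4, 6]))    # 1.0: a perfect positive linear relationship
```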

CORRELATION COEFFICIENT
• Measures the relative strength of the linear relationship between two variables
• Unit-less
• Ranges between –1 and +1
• The closer to –1, the stronger the negative linear relationship
• The closer to +1, the stronger the positive linear relationship
• The closer to 0, the weaker any linear relationship

CORRELATION COEFFICIENT
[Four scatter plots of Y vs. X illustrating r = -0.8, r = -0.6, r = +0.8 and r = +0.2]
CORRELATION COEFFICIENT
[Scatter plots of Y vs. X contrasting strong relationships with weak relationships]
CORRELATION ANALYSIS

• Correlation coefficient (also called Pearson's product-moment coefficient):

r_{A,B} = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{n\,\sigma_A \sigma_B} = \frac{\sum_{i=1}^{n} (a_i b_i) - n\bar{A}\bar{B}}{n\,\sigma_A \sigma_B}

where n is the number of tuples, \bar{A} and \bar{B} are the respective means of A and B, σ_A and σ_B are the respective standard deviations of A and B, and Σ(a_i b_i) is the sum of the AB cross-products.

COVARIANCE
• Covariance is similar to correlation:

\text{Cov}(A, B) = E\big( (A - \bar{A})(B - \bar{B}) \big) = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{n}

• Correlation coefficient:

r_{A,B} = \frac{\text{Cov}(A, B)}{\sigma_A \sigma_B}

where n is the number of tuples, \bar{A} and \bar{B} are the respective mean or expected values of A and B, and σ_A and σ_B are the respective standard deviations of A and B.
CO-VARIANCE: AN EXAMPLE

• It can be simplified in computation as

\text{Cov}(A, B) = E(A \cdot B) - \bar{A}\bar{B} = \frac{\sum_{i=1}^{n} a_i b_i}{n} - \bar{A}\bar{B}
CO-VARIANCE: AN EXAMPLE
• Suppose two stocks A and B have the following values in one week: (2, 5), (3, 8), (5, 10), (4, 11), (6, 14).
• Question: If the stocks are affected by the same industry trends, will their prices rise or fall together?

• E(A) = (2 + 3 + 5 + 4 + 6) / 5 = 20/5 = 4
• E(B) = (5 + 8 + 10 + 11 + 14) / 5 = 48/5 = 9.6
• Cov(A, B) = (2×5 + 3×8 + 5×10 + 4×11 + 6×14)/5 − 4 × 9.6 = 4
• Thus, A and B rise together since Cov(A, B) > 0.
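The arithmetic above can be checked with a few lines of Python (a sketch using the simplified formula from the previous slide):

```python
# Sketch: verify the stock example, Cov(A, B) = (sum of a_i*b_i)/n - mean(A)*mean(B).
A = [2, 3, 5, 4, 6]
B = [5, 8, 10, 11, 14]
n = len(A)

mean_A = sum(A) / n                              # 4.0
mean_B = sum(B) / n                              # 9.6
cov_AB = sum(a * b for a, b in zip(A, B)) / n - mean_A * mean_B
print(cov_AB)                                    # 4.0 (up to float rounding): the stocks rise together
```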
ACKNOWLEDGEMENTS
• Dr. Imran Siddiqi, Bahria University, Islamabad
• Machine Learning, Andrew Ng, Stanford University
• Machine Intelligence, Dr. M. Hanif, UET, Lahore
• Lecture Slides, Introduction to Machine Learning, E. Alpaydin, MIT Press

