SUPERVISED LEARNING
LEC-7-8
Nazia Bibi
CLASSIFICATION
LEARNING A CLASS FROM EXAMPLES
Class C of a "family car"
Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a family car?
Positive (+) and negative (−) examples
Input representation: x1: price, x2: engine power
TRAINING SET X
For each car, the input is x = (x1, x2) and the label is
r = 1 if x is positive, 0 if x is negative
Training set: X = {x^t, r^t}, t = 1, ..., N
CLASS C
Class C is defined by a rectangle in the price-engine power space.
HYPOTHESIS CLASS H
h(x) = 1 if h classifies x as positive, 0 if h classifies x as negative
Empirical error: E(h | X) = Σ_{t=1}^{N} 1(h(x^t) ≠ r^t)
HYPOTHESIS CLASS H: HOW TO READ?
E(h | X) = Σ_{t=1}^{N} 1(h(x^t) ≠ r^t)
Read as: the error of hypothesis h given the training set X, i.e. the number of training examples that h misclassifies.
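The rectangle hypothesis and its empirical error can be sketched in a few lines. The rectangle bounds and the toy (price, engine power) data below are invented for illustration; the slides define only the general form.

```python
# Sketch: a rectangle hypothesis h in the (price, engine power) plane,
# and the empirical error E(h | X) = sum over t of 1(h(x^t) != r^t).
# Rectangle bounds (p1, p2, e1, e2) and the data are made up.

def h(x, p1, p2, e1, e2):
    """Return 1 if x = (price, power) falls inside the rectangle, else 0."""
    price, power = x
    return 1 if (p1 <= price <= p2 and e1 <= power <= e2) else 0

def empirical_error(X, r, p1, p2, e1, e2):
    """Count training examples the hypothesis misclassifies."""
    return sum(1 for x_t, r_t in zip(X, r) if h(x_t, p1, p2, e1, e2) != r_t)

# Toy training set: (price, engine power) pairs with labels r^t.
X = [(15, 100), (18, 120), (30, 250), (9, 60)]
r = [1, 1, 0, 0]

print(empirical_error(X, r, p1=12, p2=25, e1=80, e2=150))  # -> 0
```

A rectangle that is too general (covering everything) would misclassify both negative examples, illustrating how E(h | X) ranks hypotheses.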
S, G, AND THE VERSION SPACE
Any h ∈ H that lies between S (the most specific hypothesis) and G (the most general hypothesis) is consistent with the training set; the set of all such hypotheses is the version space.
MULTIPLE CLASSES, Ci, i = 1, ..., K
X = {x^t, r^t}, t = 1, ..., N, where
r_i^t = 1 if x^t ∈ Ci, 0 if x^t ∈ Cj, j ≠ i
Train K hypotheses hi(x), i = 1, ..., K:
hi(x^t) = 1 if x^t ∈ Ci, 0 if x^t ∈ Cj, j ≠ i
MULTIPLE CLASSES, Ci, i = 1, ..., K
A K-class problem is reduced to K two-class problems: each hypothesis hi separates class Ci from all the other classes.
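The one-vs-all reduction can be made concrete with a small sketch. The per-class hypotheses here are hypothetical 1-D interval rules invented purely to show the mechanics; any two-class learner could stand in for them.

```python
# Sketch of one-vs-all: K two-class hypotheses, each answering
# "is x in class C_i?". The interval rules are made-up stand-ins.

def make_hi(lo, hi_bound):
    """Two-class hypothesis: 1 if x falls in [lo, hi_bound], else 0."""
    return lambda x: 1 if lo <= x <= hi_bound else 0

# K = 3 classes, each claiming an interval of the real line.
hypotheses = [make_hi(0, 3), make_hi(4, 6), make_hi(7, 9)]

def predict(x):
    """Return the index i of the hypothesis that claims x, or None."""
    for i, h_i in enumerate(hypotheses):
        if h_i(x) == 1:
            return i
    return None

print(predict(5))  # -> 1 (second class claims [4, 6])
```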
LINEAR REGRESSION
WHAT IS LINEAR?
A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.
EXAMPLE
Dataset giving the living areas and prices of 50 houses. We can plot this data.
NOTATIONS
The "input" variables: x^(i) (living area in this example)
The "output" or target variable we are trying to predict: y^(i) (price)
A pair (x^(i), y^(i)) is called a training example.
REGRESSION
Given a training set, the goal is to learn a function h : X → Y so that h(x) is a "good" predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.
CHOICE OF HYPOTHESIS
Decision: how to represent the hypothesis h.
For linear regression, we assume the hypothesis is linear:
h(x) = θ0 + θ1·x
HYPOTHESIS
Generally we will have more than one input feature, e.g. x1 = living area, x2 = # of bedrooms:
h(x) = θ0 + θ1·x1 + θ2·x2
HYPOTHESIS
h(x) = θ0 + θ1·x1 + θ2·x2
To show the dependence on θ, write h_θ(x) or h(x | θ).
Define x0 = 1, so that
h_θ(x) = θ0·x0 + θ1·x1 + θ2·x2 = Σ_{i=0}^{2} θi·xi
The θi are called the parameters and are real numbers. For n features,
h_θ(x) = Σ_{i=0}^{n} θi·xi = θ^T x
It is the job of the learning algorithm to find (learn) these parameters.
CHOOSING THE REGRESSION LINE
Which of these lines to choose? [figure: several candidate lines through the same scatter of points]
ŷ = h_θ(x) = θ0 + θ1·x
The error or residual for point i is ŷi − yi. [figure: fitted line with the residual of one data point marked]
CHOOSING THE REGRESSION LINE
How to choose the best-fit line: pick θ to minimize the sum of squared errors
min_θ Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2
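For one feature, the θ minimizing the sum of squared errors has a well-known closed form: the slope is the covariance of x and y over the variance of x, and the line passes through the point of means. A minimal sketch with made-up data:

```python
# Sketch: choose theta0, theta1 minimizing sum_i (h(x_i) - y_i)^2
# via the closed-form least-squares solution for one feature.

def least_squares_fit(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    theta1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
             sum((x - mx) ** 2 for x in xs)
    theta0 = my - theta1 * mx  # line passes through (mean x, mean y)
    return theta0, theta1

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]                  # exactly y = 1 + 2x
print(least_squares_fit(xs, ys))   # -> (1.0, 2.0)
```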
GRADIENT DESCENT
min_θ J(θ)
Choose initial values of θ0 and θ1 and keep moving in the direction of steepest descent on the surface J(θ) over (θ0, θ1).
GRADIENT DESCENT
Choose initial values of θ0 and θ1 and keep moving in the direction of steepest descent.
The step size is controlled by a parameter called the learning rate.
The starting point is important.
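The steps above can be sketched as batch gradient descent on the squared-error cost J(θ) = Σ_i (θ0 + θ1·x_i − y_i)^2. The learning rate, iteration count, and toy data below are arbitrary choices for illustration.

```python
# Sketch: batch gradient descent for one-feature linear regression.
# alpha (learning rate) and iters are made-up illustrative values.

def gradient_descent(xs, ys, alpha=0.01, iters=5000):
    theta0, theta1 = 0.0, 0.0   # starting point matters in general
    for _ in range(iters):
        # Partial derivatives of J with respect to theta0 and theta1.
        g0 = sum(2 * (theta0 + theta1 * x - y) for x, y in zip(xs, ys))
        g1 = sum(2 * (theta0 + theta1 * x - y) * x for x, y in zip(xs, ys))
        theta0 -= alpha * g0    # step against the gradient
        theta1 -= alpha * g1
    return theta0, theta1

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]              # exactly y = 1 + 2x
t0, t1 = gradient_descent(xs, ys)
print(round(t0, 3), round(t1, 3))   # close to 1.0 and 2.0
```

Too large a learning rate makes the iterates diverge; too small a rate makes convergence slow, which is why the learning rate is treated as a tunable parameter.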
MODEL SELECTION
g(x) = w1·x + w0
Life is not as simple as linear:
g(x) = w2·x^2 + w1·x + w0 (non-linear regression)
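A quadratic model g(x) = w2·x^2 + w1·x + w0 is still linear in the parameters w, so the same squared-error machinery applies once (1, x, x^2) is treated as the feature vector. A minimal sketch fitting by gradient descent, with invented data and step size:

```python
# Sketch: fit g(x) = w0 + w1*x + w2*x^2 by gradient descent on the
# squared error, using (1, x, x^2) as features. Data and alpha invented.

def fit_quadratic(xs, ys, alpha=1e-3, iters=20000):
    w = [0.0, 0.0, 0.0]                      # w0, w1, w2
    feats = [(1.0, x, x * x) for x in xs]    # feature vector per example
    for _ in range(iters):
        grads = [0.0, 0.0, 0.0]
        for f, y in zip(feats, ys):
            err = sum(wj * fj for wj, fj in zip(w, f)) - y
            for j in range(3):
                grads[j] += 2 * err * f[j]
        for j in range(3):
            w[j] -= alpha * grads[j]
    return w

xs = [-2, -1, 0, 1, 2]
ys = [5, 2, 1, 2, 5]            # exactly y = 1 + x^2
w = fit_quadratic(xs, ys)
print([round(wj, 3) for wj in w])   # close to [1, 0, 1]
```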
GENERALIZATION
Generalization: how well a model performs on new data.
Overfitting: the chosen hypothesis is too complex. For example: fitting a 3rd-order polynomial to linear data.
Underfitting: the chosen hypothesis is too simple. For example: fitting a line to a quadratic function.
CROSS VALIDATION
Hold out part of the training data and evaluate the model on it to estimate how well it generalizes.
SUMMARY
Model: h_θ(x) or h(x | θ)
Loss function: E(θ | x) = J(θ) = Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2
Optimization: min_θ E(θ | x)
COVARIANCE
cov(x, y) = Σ_{i=1}^{n} (xi − X̄)(yi − Ȳ) / (n − 1)
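The sample covariance above (with n − 1 in the denominator) is a one-liner; the toy data is invented for illustration:

```python
# Sketch of the sample covariance with an (n - 1) denominator.

def cov(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

print(cov([1, 2, 3], [2, 4, 6]))  # -> 2.0
```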
CORRELATION COEFFICIENT
Pearson's correlation coefficient is standardized covariance:
r = cov(x, y) / sqrt(var(x) · var(y))
CORRELATION COEFFICIENT
Measures the relative strength of the linear relationship between two variables. Unit-less.
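Pearson's r follows directly from the definition above; the (n − 1) factors in the covariance and variances cancel, so plain sums of deviations suffice. Toy data invented:

```python
# Sketch of Pearson's r = cov(x, y) / sqrt(var(x) * var(y)).
# The (n - 1) factors cancel, so raw deviation sums are used.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # -> 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -> -1.0 (perfect negative)
```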
CORRELATION COEFFICIENT
[figure: four scatter plots illustrating r = −0.8, r = −0.6, r = +0.8, and r = +0.2]
CORRELATION COEFFICIENT
[figure: scatter plots contrasting strong relationships with weak relationships]
CORRELATION ANALYSIS
r_{A,B} = Σ_{i=1}^{n} (ai − Ā)(bi − B̄) / (n·σA·σB) = (Σ(ai·bi) − n·Ā·B̄) / (n·σA·σB)
where n is the number of tuples, Ā and B̄ are the respective means of A and B, σA and σB are the respective standard deviations of A and B, and Σ(ai·bi) is the sum of the AB cross-product.
COVARIANCE
Covariance is similar to correlation.
Example: price observations for two stocks, given as (A, B) pairs: (2, 5), (3, 8), (5, 10), (4, 11), (6, 14).
Question: If the stocks are affected by the same industry trends, will their prices rise or fall together?
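Working the example with the covariance formula from earlier (n − 1 denominator): a positive covariance indicates the two price series tend to move in the same direction.

```python
# Compute cov(A, B) for the stock example with an (n - 1) denominator.
# The sign of the result answers the question: positive covariance
# means the prices tend to rise and fall together.

prices = [(2, 5), (3, 8), (5, 10), (4, 11), (6, 14)]
a = [p for p, _ in prices]
b = [q for _, q in prices]

n = len(a)
ma, mb = sum(a) / n, sum(b) / n          # means: 4 and 9.6
cov_ab = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)
print(cov_ab)   # approximately 5: positive, so the prices rise together
```

Since cov(A, B) ≈ 5 > 0, the answer to the slide's question is yes: the two stock prices tend to rise and fall together.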