
Linear Regression with multiple variables
Multiple features
Machine Learning
Multiple features (variables).

Size (feet²)    Price ($1000)
2104            460
1416            232
1534            315
852             178
...             ...

Multiple features (variables).

Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
852            2                    1                  36                    178
...            ...                  ...                ...                   ...
Notation:
  n = number of features
  x^(i) = input (features) of the i-th training example
  x_j^(i) = value of feature j in the i-th training example
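
For example, with the table above: n = 4, x^(2) = (1416, 3, 2, 40) is the input of the second training example, and x_3^(2) = 2 is the value of the third feature (number of floors) in that example.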

Hypothesis:
Previously (one variable): h_θ(x) = θ_0 + θ_1 x
Now (multiple variables): h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n
E.g. h_θ(x) = θ_0 + θ_1 (size) + θ_2 (# bedrooms) + θ_3 (# floors) + θ_4 (age of home)

For convenience of notation, define x_0 = 1.
Then x = (x_0, x_1, ..., x_n) and θ = (θ_0, θ_1, ..., θ_n) are both (n+1)-dimensional vectors, and the hypothesis can be written compactly as
    h_θ(x) = θ^T x
where θ^T is a 1 by (n+1) matrix (a row vector).

Multivariate linear regression.
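
A minimal NumPy sketch of this vectorized hypothesis, using the housing features from the table above; the array names (X_raw, theta) are illustrative, not part of the original slides.

```python
import numpy as np

# Features from the table: size, bedrooms, floors, age
X_raw = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
], dtype=float)

# Prepend x_0 = 1 to every example so theta_0 acts as the intercept
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])   # shape (m, n+1)

theta = np.zeros(X.shape[1])                            # (n+1)-dimensional vector

# Hypothesis for all m examples at once: h = X . theta
h = X @ theta
print(h)    # one prediction per training example (in $1000s)
```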
Linear Regression with multiple variables
Gradient descent for multiple variables
Machine Learning
Hypothesis: h_θ(x) = θ^T x = θ_0 x_0 + θ_1 x_1 + ... + θ_n x_n   (with x_0 = 1)
Parameters: θ = (θ_0, θ_1, ..., θ_n), an (n+1)-dimensional vector
Cost function:
    J(θ) = (1/2m) Σ_{i=1}^{m} ( h_θ(x^(i)) - y^(i) )²

Gradient descent:
    Repeat until convergence {
        θ_j := θ_j - α ∂J(θ)/∂θ_j
             = θ_j - α (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) - y^(i) ) x_j^(i)
    }
    (simultaneously update θ_j for every j = 0, 1, ..., n)
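
A minimal NumPy sketch of this update rule, assuming a design matrix X with the x_0 = 1 column (as in the earlier sketch) and a target vector y of prices; the function name gradient_descent is illustrative.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.

    X : (m, n+1) design matrix with a leading column of ones
    y : (m,) vector of targets
    """
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        error = X @ theta - y                 # h_theta(x) - y for all examples
        gradient = (X.T @ error) / m          # partial derivatives for every theta_j
        theta = theta - alpha * gradient      # simultaneous update of all parameters
    return theta
```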
Linear Regression with multiple variables
Gradient descent in practice I: Feature Scaling
Machine Learning
Feature Scaling (a practical trick for making gradient descent work well)
Idea: Make sure features are on a similar scale.
E.g. x_1 = size (0-2000 feet²)
     x_2 = number of bedrooms (1-5)

With two features on such different scales, the contours of the cost function J(θ) take on a very skewed, elliptical shape, and if you run gradient descent on this cost function, it may end up oscillating and taking a long time before it finally finds its way to the global minimum. A useful thing to do is to scale the features, e.g.
     x_1 = size (feet²) / 2000
     x_2 = number of bedrooms / 5
so that both lie roughly in the range 0 ≤ x_i ≤ 1. The contours then look much more like circles, and gradient descent can find a much more direct path to the global minimum.
Feature Scaling
Get every feature into approximately a -1 ≤ x_i ≤ 1 range.


Mean normalization
Replace x_i with x_i - μ_i to make features have approximately zero mean (do not apply this to x_0 = 1).
More generally:
    x_i := (x_i - μ_i) / s_i
where μ_i is the average value of feature i in the training set and s_i is the range of values (max - min).
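
A minimal sketch of mean normalization in NumPy, assuming X_raw holds the raw feature columns (without the x_0 = 1 column) and using the range (max - min) as s_i, one of the choices described above.

```python
import numpy as np

def mean_normalize(X_raw):
    """Scale each feature column into roughly the range -1 <= x_i <= 1.

    Returns the scaled features plus (mu, s) so the same transform
    can be applied to new examples at prediction time.
    """
    mu = X_raw.mean(axis=0)                       # average value of each feature
    s = X_raw.max(axis=0) - X_raw.min(axis=0)     # range (max - min) of each feature
    X_scaled = (X_raw - mu) / s
    return X_scaled, mu, s
```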
Linear Regression with multiple variables
Gradient descent in practice II: Learning rate
Machine Learning
Gradient descent

•  "Debugging": How to make sure gradient descent is working correctly.
•  How to choose the learning rate α.

Making sure gradient descent is working correctly.

[Figure: plot of J(θ) versus the number of iterations (0, 100, 200, 300, 400). If gradient descent is working correctly, J(θ) should decrease after every iteration and flatten out as it converges.]

Example automatic convergence test:
Declare convergence if J(θ) decreases by less than some small threshold ε (e.g. 10^-3) in one iteration.
In practice, deciding this threshold can be hard, and the number of iterations gradient descent takes to converge depends on the application.
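
A minimal sketch of this monitoring in NumPy, assuming the cost function J(θ) defined earlier; the helper names and the 10^-3 threshold mirror the example convergence test above.

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = (1/2m) * sum of squared errors."""
    m = len(y)
    err = X @ theta - y
    return (err @ err) / (2 * m)

def gradient_descent_monitored(X, y, alpha=0.01, max_iters=10000, eps=1e-3):
    """Run gradient descent, recording J(theta) so convergence can be checked."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    J_history = [cost(X, y, theta)]
    for _ in range(max_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        J_history.append(cost(X, y, theta))
        # Automatic convergence test: stop once J decreases by less than eps
        delta = J_history[-2] - J_history[-1]
        if 0 <= delta < eps:
            break
    return theta, J_history
```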
Making sure gradient descent is working correctly.

[Figure: plots of J(θ) versus the number of iterations in which J(θ) keeps increasing, or repeatedly goes up and down.] If J(θ) is increasing or oscillating like this, gradient descent is not working; use a smaller α.

•  For sufficiently small α, J(θ) should decrease on every iteration.
•  But if α is too small, gradient descent can be slow to converge.
Summary:
•  If α is too small: slow convergence.
•  If α is too large: J(θ) may not decrease on every iteration; may not converge. (Slow convergence is also possible.)

To choose α, try a range of values such as
    ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
and pick the largest value for which J(θ) still decreases rapidly on every iteration.
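A minimal sketch of such a sweep, reusing the hypothetical gradient_descent_monitored helper from the earlier sketch and a design matrix X and target vector y already in scope; the α values mirror the list above, and on unscaled features the larger values may well diverge, which is exactly what this comparison is meant to reveal.

```python
# Try a range of learning rates and compare how J(theta) behaves for each.
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    theta, J_history = gradient_descent_monitored(X, y, alpha=alpha, max_iters=400)
    trend = "decreasing" if J_history[-1] < J_history[0] else "NOT decreasing"
    print(f"alpha={alpha:<6} final J={J_history[-1]:.4f} ({trend})")
```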
Linear Regression with multiple variables
Features and polynomial regression
Machine Learning
Housing prices prediction
    h_θ(x) = θ_0 + θ_1 (frontage) + θ_2 (depth)
Instead of using frontage and depth as two separate features, you can define a new feature, the land area x = frontage × depth, and use h_θ(x) = θ_0 + θ_1 x. Sometimes, by defining new features, you might actually get a better model.
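
A minimal sketch of this feature construction in NumPy; the frontage and depth values are hypothetical, purely for illustration.

```python
import numpy as np

frontage = np.array([60.0, 40.0, 55.0])   # hypothetical lot frontages (feet)
depth = np.array([100.0, 80.0, 120.0])    # hypothetical lot depths (feet)

area = frontage * depth                    # new feature: land area (feet^2)
X = np.column_stack([np.ones(len(area)), area])   # design matrix with x_0 = 1
```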
Polynomial regression

[Figure: housing price (y) versus size (x). It doesn't look like a straight line fits this data very well.]

Quadratic model: h_θ(x) = θ_0 + θ_1 x + θ_2 x²
Cubic model:     h_θ(x) = θ_0 + θ_1 x + θ_2 x² + θ_3 x³

A cubic model can be fit with multivariate linear regression by choosing
    x_1 = (size), x_2 = (size)², x_3 = (size)³
so that h_θ(x) = θ_0 + θ_1 (size) + θ_2 (size)² + θ_3 (size)³.
With features defined this way, feature scaling is important: size, (size)², and (size)³ take on very different ranges of values.
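
A minimal sketch of building these polynomial features in NumPy, reusing the hypothetical mean_normalize helper from the feature-scaling sketch; the sizes come from the earlier table.

```python
import numpy as np

size = np.array([2104.0, 1416.0, 1534.0, 852.0])   # sizes from the earlier table

# Build polynomial features: size, size^2, size^3
X_poly = np.column_stack([size, size**2, size**3])

# Feature scaling matters here: the columns span wildly different ranges
X_scaled, mu, s = mean_normalize(X_poly)

# Add the x_0 = 1 column and reuse ordinary multivariate linear regression
X = np.hstack([np.ones((X_scaled.shape[0], 1)), X_scaled])
```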

Choice of features

[Figure: housing price (y) versus size (x).]

Besides polynomial terms, other choices of features are possible; for example,
    h_θ(x) = θ_0 + θ_1 (size) + θ_2 √(size)
which keeps increasing as the size grows rather than eventually curving back down.
Linear Regression with multiple variables
Normal equation
Machine Learning

Gradient descent: an iterative algorithm that takes many steps, i.e. multiple iterations of gradient descent, to converge to the global minimum.

Normal equation: a method to solve for θ analytically. For some linear regression problems, the normal equation will give us a much better way to solve for the optimal value of the parameters θ.
Intuition: if θ is 1-dimensional (θ ∈ ℝ, just a scalar value), then J(θ) is a quadratic function of θ:
    J(θ) = aθ² + bθ + c
The way to minimize this quadratic function is to set its derivative equal to zero,
    d/dθ J(θ) = 0,
and solve for θ.

In general, θ is an (n+1)-dimensional vector and
    J(θ_0, θ_1, ..., θ_n) = (1/2m) Σ_{i=1}^{m} ( h_θ(x^(i)) - y^(i) )².
Set the partial derivatives to zero,
    ∂J(θ)/∂θ_j = 0   (for every j),
and solve for θ_0, θ_1, ..., θ_n.
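
Carrying that calculation out in matrix form (with the design matrix X and target vector y defined on the next slide) leads to a closed-form solution. A sketch of the derivation, assuming the same J(θ) as above:

    ∇_θ J(θ) = (1/m) X^T (Xθ - y) = 0
    ⟹ X^T X θ = X^T y
    ⟹ θ = (X^T X)^(-1) X^T y   (the "normal equation")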
Examples: m = 4.
Add an extra column for the feature x_0 = 1:

x_0   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1     2104           5                    1                  45                    460
1     1416           3                    2                  40                    232
1     1534           3                    2                  30                    315
1     852            2                    1                  36                    178

X = the design matrix containing all the input features (including x_0), an m x (n+1)-dimensional matrix
y = the vector of all the target values, an m-dimensional vector

The normal equation then gives θ = (X^T X)^(-1) X^T y.
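
A minimal NumPy sketch of this computation on the table above; because there are only m = 4 examples for n + 1 = 5 parameters, X^T X is singular here, so the pseudo-inverse is used in place of the explicit inverse.

```python
import numpy as np

X = np.array([
    [1, 2104, 5, 1, 45],
    [1, 1416, 3, 2, 40],
    [1, 1534, 3, 2, 30],
    [1,  852, 2, 1, 36],
], dtype=float)                          # m x (n+1) design matrix
y = np.array([460.0, 232.0, 315.0, 178.0])   # m-dimensional vector of prices ($1000)

# Normal equation: theta = (X^T X)^(-1) X^T y.
# With m = 4 examples and n + 1 = 5 parameters, X^T X is singular,
# so the pseudo-inverse still gives a least-squares solution.
theta = np.linalg.pinv(X) @ y
print(theta)                  # optimal parameters; predictions are X @ theta
```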

m training examples, n features.

Gradient descent:
•  Need to choose α.
•  Needs many iterations.
•  Works well even when n is large.

Normal equation:
•  No need to choose α.
•  Don't need to iterate.
•  No need to do feature scaling.
•  Need to compute (X^T X)^(-1), which is roughly an O(n³) operation.
•  Slow if n is very large.

To summarize, so long as the number of features is not too large, the normal equation gives us a great alternative method for solving for the parameters θ. Concretely, so long as the number of features is less than about 1,000, the normal equation method can be used rather than gradient descent.

As we get to more complex learning algorithms, for example classification algorithms like logistic regression, the normal equation method actually does not work for those more sophisticated learning algorithms, and we will have to resort to gradient descent.

So gradient descent is a very useful algorithm to know, both for linear regression with a large number of features and for some of the other algorithms, because for them the normal equation method just doesn't apply and doesn't work. But for this specific model of linear regression, the normal equation can give you an alternative that can be much faster than gradient descent.

So, depending on the details of the problem and how many features you have, both of these algorithms are well worth knowing about.
