
Machine Learning

Linear Regression

Na Lu
Xi’an Jiaotong University
Machine Learning

• Machine learning: the field of study that gives computers the ability to learn without being explicitly programmed.
  – Arthur Samuel (1959)
Machine Learning

• Well-posed learning problem: a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
  – Tom Mitchell (1998)
Question
Three types of learning

• Supervised learning
  – Learn to predict an output when given an input vector.
• Reinforcement learning
  – Learn to select an action to maximize payoff.
• Unsupervised learning
  – Discover a good internal representation of the input and the structure hidden in it.
Two types of supervised learning
• Each training case consists of an input vector x and a
target output y.
• Regression: The target output is a real number or a
whole vector of real numbers.
– The price of a stock.
– The temperature during a day.
– Aim: to get as close as you can to the real number.
• Classification: The target output is a class label.
– The simplest case is a choice between 1 and 0.
– Facial identities with multiple labels.
– Aim: to classify the input into the correct category.
How does supervised learning work?

• Start by choosing a model class: y = f(x; W)
  – A model class f is a family of mappings from the input vector x to the predicted output y, parameterized by W.
• The goal of learning is to reduce the discrepancy between the model's predicted output and the actual output on the given input-output pairs.
  – The squared L2 norm (least squares) is a widely used measure of the discrepancy for regression.
  – For classification, the L2 norm can be used, but other measures are usually more suitable.
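As a minimal sketch (not from the slides), the following Octave lines measure the squared-error (L2) discrepancy for a hypothetical linear model class f(x; W) = W·x; all values are purely illustrative:

x = [1 2 3 4];                        % illustrative inputs
y = [2.1 3.9 6.2 7.8];                % illustrative actual outputs
W = 2;                                % a candidate parameter for the model class f(x; W) = W*x
y_pred = W * x;                       % model predictions
discrepancy = sum((y_pred - y).^2);   % squared L2 norm of the prediction error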
Reinforcement learning
• The output is an action or a sequence of actions, and the only supervisory signal is an occasional scalar reward.
  – The goal of action selection is to maximize the expected future reward.
  – A discount factor is employed to incorporate delayed rewards.
• Difficulties in reinforcement learning:
  – The rewards are delayed, so it is hard to know where we went wrong.
  – A scalar reward does not supply much information.
  – Far fewer parameters can be learned with reinforcement learning than with supervised or unsupervised learning.
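To make the role of the discount factor concrete, the discounted return that the agent tries to maximize is commonly written as (a standard formulation, not taken from these slides):

G_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + … = ∑_{k=0}^{∞} γ^k r_{t+k+1},   with discount factor 0 ≤ γ < 1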
Unsupervised learning

• The aim of unsupervised learning is harder to define.
  – One major aim is to find an internal representation of the input for subsequent supervised or reinforcement learning.
  – Much of the related research focuses on clustering.
  – Unsupervised learning was largely ignored by the machine learning community for about 40 years.
Other goals of unsupervised learning

• To provide a compact, low-dimensional representation of the input.
  – High-dimensional inputs typically reside on or near a low-dimensional manifold (or several such manifolds).
  – Principal component analysis (PCA) is a representative linear method.
• To find clusters in the input.
  – Clusters can be interpreted as a very sparse code in which only one feature is nonzero.
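As a brief illustration (not part of the original slides), principal component analysis can be carried out with the SVD; this Octave sketch projects centered data onto its first principal component, with the random data matrix and the single-component choice as illustrative assumptions:

X = randn(100, 5);                              % illustrative data: one example per row
Xc = X - repmat(mean(X, 1), size(X, 1), 1);     % center each feature
[U, S, V] = svd(Xc);                            % SVD of the centered data
Z = Xc * V(:, 1);                               % 1-D representation along the first principal component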
Question
Supervised learning

• Data from Portland, Oregon, US
• Supervised learning: the "right answer" is given for each example
• Regression: predict a continuous-valued output (house price)
Supervised learning

• Breast cancer (malignant or benign?)
• Supervised learning: the correct labels are given
• Classification: discrete-valued output (0 or 1)
Supervised learning

• More than one feature considered

– Clump thickness
– Uniformity of cell size
– Uniformity of cell shape
– ……
Unsupervised learning
Question
Applications
Cocktail party problem

• Two overlapping sources are recorded by two microphones; the learning algorithm separates the mixed recordings into Output 1 and Output 2.
• A single line of Octave code (based on the SVD) performs the separation:

% x holds the mixed microphone recordings; the SVD recovers the separating directions in W
[W, s, v] = svd((repmat(sum(x.*x, 1), size(x, 1), 1) .* x) * x');
Problem

Housing Prices (Portland, OR)

[Figure: scatter plot of price (in 1000s of dollars, 0 to 500) against size (feet², 0 to 3000).]
Supervised Learning: given the “right answer” for each example in the data.
Regression Problem: predict a real-valued output.

Training set of housing prices (Portland, OR):

  Size in feet² (x)    Price ($) in 1000's (y)
  2104                 460
  1416                 232
  1534                 315
  852                  178
  ...                  ...
Notation:
  m = number of training examples
  x's = “input” variable / features
  y's = “output” variable / “target” variable

  (x, y): one training example
  (x^(i), y^(i)): the i-th training example
How do we represent h?

[Diagram: the Training Set is fed to a Learning Algorithm, which outputs a hypothesis h; h maps the size of a house (x) to an estimated price (y).]

h_θ(x) = θ_0 + θ_1 x

Linear regression with one variable (univariate linear regression).
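A minimal Octave sketch of this hypothesis; the parameter values are purely illustrative:

theta0 = 50;  theta1 = 0.15;          % illustrative intercept and slope
h = @(x) theta0 + theta1 * x;         % univariate linear hypothesis h_theta(x) = theta0 + theta1*x
price_estimate = h(2104);             % estimated price (in $1000s) for a 2104 ft^2 house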
Question
Linear regression with one variable

Cost function

Training Set:

  Size in feet² (x)    Price ($) in 1000's (y)
  2104                 460
  1416                 232
  1534                 315
  852                  178
  ...                  ...

Hypothesis:  h_θ(x) = θ_0 + θ_1 x

Parameters:  θ_0, θ_1

How do we choose the θ's?
[Figure: three small plots showing different straight-line hypotheses corresponding to different choices of θ_0 and θ_1.]

J(θ_0, θ_1) = (1/(2m)) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Goal:  minimize J(θ_0, θ_1) over θ_0, θ_1

Idea: choose θ_0, θ_1 so that h_θ(x) is close to y for our training examples.
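A direct Octave translation of this cost function (a minimal sketch: the name computeCost and the design-matrix convention with a leading column of ones are my own choices, not from the slides):

function J = computeCost(X, y, theta)
  % Cost J(theta_0, theta_1) for linear regression.
  % X: m-by-2 design matrix [ones(m,1), x]; y: m-by-1 targets; theta: [theta_0; theta_1].
  m = length(y);
  errors = X * theta - y;                 % h_theta(x^(i)) - y^(i) for every training example
  J = (1 / (2 * m)) * sum(errors .^ 2);   % squared-error cost
end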
Question
Linear regression with one variable

Cost function intuition I

Hypothesis:  h_θ(x) = θ_0 + θ_1 x
Simplified (set θ_0 = 0):  h_θ(x) = θ_1 x

Parameter:  θ_1

Cost function:  J(θ_1) = (1/(2m)) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Goal:  minimize J(θ_1) over θ_1
(for fixed θ_1, h_θ(x) is a function of x)        (J(θ_1) is a function of the parameter θ_1)

[Figure: left, a candidate line h_θ(x) = θ_1 x through the training points (1,1), (2,2), (3,3); right, the value of J(θ_1) plotted against θ_1.]

J(θ_1) = (1/(2m)) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
       = (1/(2m)) ∑_{i=1}^{m} (θ_1 x^(i) − y^(i))²
(for fixed θ_1 = 0.5, h_θ(x) is a function of x)        (J(θ_1) is a function of the parameter θ_1)

[Figure: left, the line h_θ(x) = 0.5·x against the training points; right, the point (0.5, J(0.5)) on the J(θ_1) curve.]

J(0.5) = (1/6) [(0.5 − 1)² + (1 − 2)² + (1.5 − 3)²]
       = 3.5/6 ≈ 0.58
(for fixed θ_1 = 0, h_θ(x) is a function of x)        (J(θ_1) is a function of the parameter θ_1)

[Figure: left, the horizontal line h_θ(x) = 0 against the training points; right, the point (0, J(0)) on the J(θ_1) curve.]

J(0) = (1/6) [1² + 2² + 3²]
     = 14/6 ≈ 2.3
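These two values can be checked numerically with the computeCost sketch given earlier, using the three training points (1,1), (2,2), (3,3) that the example implies:

x = [1; 2; 3];  y = [1; 2; 3];        % the toy training set implied by the example
X = [ones(3, 1), x];                  % prepend a column of ones for theta_0
computeCost(X, y, [0; 0.5])           % theta_0 = 0, theta_1 = 0.5  ->  about 0.58
computeCost(X, y, [0; 0])             % theta_0 = 0, theta_1 = 0    ->  about 2.33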
Question
Hypothesis:  h_θ(x) = θ_0 + θ_1 x

Parameters:  θ_0, θ_1

Cost function:  J(θ_0, θ_1) = (1/(2m)) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Goal:  minimize J(θ_0, θ_1) over θ_0, θ_1
(for fixed θ_0, θ_1, h_θ(x) is a function of x)        (J(θ_0, θ_1) is a function of the parameters)

[Figure: left, a straight-line hypothesis plotted against the housing data (price in $1000s, 0 to 500, vs. size in feet², 0 to 3000); right, the corresponding cost J(θ_0, θ_1) over the parameter space.]

[Subsequent slides repeat this view for several choices of (θ_0, θ_1): each straight-line fit on the left corresponds to one point on the contour plot of J(θ_0, θ_1) on the right.]
Linear regression with one variable

Gradient descent

Have some function J(θ_0, θ_1).
Want:  min over θ_0, θ_1 of J(θ_0, θ_1).

Outline:
• Start with some θ_0, θ_1.
• Keep changing θ_0, θ_1 to reduce J(θ_0, θ_1), until we hopefully end up at a minimum.
[Figure: two views of the surface J(θ_0, θ_1) plotted over the (θ_0, θ_1) plane; starting from different initial points, gradient descent can end up at different local minima.]
Gradient descent algorithm

Repeat until convergence:
  θ_j := θ_j − α (∂/∂θ_j) J(θ_0, θ_1)    (simultaneously for j = 0 and j = 1)

Correct (simultaneous update):
  temp0 := θ_0 − α (∂/∂θ_0) J(θ_0, θ_1)
  temp1 := θ_1 − α (∂/∂θ_1) J(θ_0, θ_1)
  θ_0 := temp0
  θ_1 := temp1

Incorrect (sequential update):
  temp0 := θ_0 − α (∂/∂θ_0) J(θ_0, θ_1)
  θ_0 := temp0
  temp1 := θ_1 − α (∂/∂θ_1) J(θ_0, θ_1)    (this now uses the already-updated θ_0)
  θ_1 := temp1
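In Octave, the simultaneous update amounts to computing both new values before overwriting either parameter; a minimal sketch, where gradJ0 and gradJ1 stand for the two partial derivatives and are assumed to have been computed already:

temp0 = theta0 - alpha * gradJ0;   % new theta0, computed from the current (theta0, theta1)
temp1 = theta1 - alpha * gradJ1;   % new theta1, also from the current (theta0, theta1)
theta0 = temp0;                    % only now overwrite the parameters
theta1 = temp1;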


Question
Linear regression with one variable

Gradient descent intuition

Gradient descent update:  θ_1 := θ_1 − α (d/dθ_1) J(θ_1)

• If α is too small, gradient descent can be slow.
• If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
Question
Gradient descent at local optima

[Figure: J(θ_1) with the current value of θ_1 sitting at a local optimum, where the derivative is zero.]

Gradient descent can converge to a local minimum, even with the learning rate α fixed: at a local optimum the derivative is zero, so the update leaves θ_1 unchanged.

As we approach a local minimum, gradient descent automatically takes smaller steps, because the derivative shrinks. So there is no need to decrease α over time.
Linear regression with one variable

Gradient descent for linear regression
Gradient descent algorithm applied to the linear regression model:

∂/∂θ_j J(θ_0, θ_1) = ∂/∂θ_j [ (1/(2m)) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))² ]
                   = ∂/∂θ_j [ (1/(2m)) ∑_{i=1}^{m} (θ_0 + θ_1 x^(i) − y^(i))² ]

j = 0:  ∂/∂θ_0 J(θ_0, θ_1) = (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))

j = 1:  ∂/∂θ_1 J(θ_0, θ_1) = (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
Gradient descent algorithm

Repeat until convergence:
  θ_0 := θ_0 − α (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))
  θ_1 := θ_1 − α (1/m) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
  (update θ_0 and θ_1 simultaneously)
[Figure: the surface and contour plots of J(θ_0, θ_1) over the (θ_0, θ_1) plane, revisited for the linear regression cost function.]
[Sequence of slides: at each step of gradient descent, the left panel shows the current hypothesis h_θ(x) (a function of x for the current θ_0, θ_1) fitted to the housing data, and the right panel shows the corresponding point moving across the contour plot of J(θ_0, θ_1) toward its minimum.]
“Batch” Gradient Descent

“Batch”: each step of gradient descent uses all of the training examples.
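As a hedged Octave sketch (the function name, alpha, and iteration count are my own illustrative choices), the batch update rules above can be implemented as follows; the vectorized expression X' * errors computes both partial derivatives over all m training examples at once:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  % Batch gradient descent for h_theta(x) = theta_0 + theta_1 * x.
  % X: m-by-2 design matrix [ones(m,1), x]; y: m-by-1 targets; theta: [theta_0; theta_1].
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    errors = X * theta - y;                              % h_theta(x^(i)) - y^(i) over all m examples
    theta = theta - (alpha / m) * (X' * errors);         % simultaneous update of theta_0 and theta_1
    J_history(iter) = (1 / (2 * m)) * sum(errors .^ 2);  % cost at the parameters used in this step
  end
end

% Example call on the toy data used earlier (alpha and the iteration count are illustrative):
% theta = gradientDescent([ones(3,1), [1;2;3]], [1;2;3], [0; 0], 0.1, 1000);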
Question
The End

Thank you
