
~ Curve Fitting ~

Least Squares Regression

Chapter 17

Copyright © 2006 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Curve Fitting
• Fit the best curve to a discrete data set and obtain estimates for other data points.

• Two general approaches:
  – Data exhibit a significant degree of scatter: find a single curve that represents the general trend of the data.
  – Data are very precise: pass a curve (or curves) exactly through each of the points.

• Two common applications in engineering:
  – Trend analysis: predicting values of the dependent variable, either extrapolation beyond the data points or interpolation between data points.
  – Hypothesis testing: comparing an existing mathematical model with measured data.
Simple Statistics
In the sciences, if several measurements are made of a particular quantity, additional insight can be gained by summarizing the data in one or more well-chosen statistics:

Arithmetic mean – the sum of the individual data points (yi) divided by the number of points:

  ȳ = (Σ yi) / n,   i = 1, …, n

Standard deviation – a common measure of spread for a sample:

  Sy = √( Σ (yi − ȳ)² / (n − 1) )

Variance – the square of the standard deviation:

  Sy² = Σ (yi − ȳ)² / (n − 1)

Coefficient of variation – quantifies the spread of the data relative to the mean (similar to relative error):

  c.v. = (Sy / ȳ) × 100%
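These three statistics can be sketched in a few lines of plain Python; the measurement values below are hypothetical, chosen only to exercise the formulas.

```python
# Mean, sample standard deviation, and coefficient of variation as defined above.
import math

def summary_stats(ys):
    n = len(ys)
    mean = sum(ys) / n                               # arithmetic mean
    s2 = sum((y - mean) ** 2 for y in ys) / (n - 1)  # variance (n - 1 in the denominator)
    sy = math.sqrt(s2)                               # standard deviation
    cv = sy / mean * 100                             # coefficient of variation, in percent
    return mean, sy, cv

mean, sy, cv = summary_stats([6.6, 6.4, 6.5, 6.7, 6.3])
```

Note the n − 1 (not n) in the denominator: these are sample statistics, matching the slide's definition.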
Linear Regression
• Fitting a straight line to a set of paired observations:
  (x1, y1), (x2, y2), …, (xn, yn)

  Line equation:    y = a0 + a1 x
  Measured value:   yi = a0 + a1 xi + e
  Error (residual): e = yi − a0 − a1 xi

  a1 : slope
  a0 : intercept

Choosing Criteria for a “Best Fit”
• Minimize the sum of the residual errors for all available data?

  Σ ei = Σ (yi − a0 − a1 xi),   i = 1, …, n

  Inadequate! Positive and negative errors cancel, so very different lines can give the same (even zero) sum.

• Minimize the sum of the absolute values of the errors?

  Σ |ei| = Σ |yi − a0 − a1 xi|

  Inadequate! The best-fit line is not unique.

• How about minimizing the maximum distance that an individual point falls from the line?

  This does not work either! It gives undue influence to a single outlying point.
• The best strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

  Sr = Σ ei² = Σ (yi,measured − yi,model)² = Σ (yi − a0 − a1 xi)²,   sums over i = 1, …, n

• Yields a unique line for a given set of data.

• Need to compute a0 and a1 such that Sr is minimized!
Least-Squares Fit of a Straight Line

Minimize the error:  Sr = Σ ei² = Σ (yi − a0 − a1 xi)²

∂Sr/∂a0 = −2 Σ (yi − a0 − a1 xi) = 0     →   Σ yi − Σ a0 − Σ a1 xi = 0
∂Sr/∂a1 = −2 Σ (yi − a0 − a1 xi) xi = 0  →   Σ xi yi − Σ a0 xi − Σ a1 xi² = 0

Since Σ a0 = n a0, these become the normal equations, which can be solved simultaneously:

(1)  n a0 + (Σ xi) a1 = Σ yi
(2)  (Σ xi) a0 + (Σ xi²) a1 = Σ xi yi

Least-Squares Fit of a Straight Line

Minimize the error:  Sr = Σ ei² = Σ (yi − a0 − a1 xi)²

Normal equations, which can be solved simultaneously:

  n a0 + (Σ xi) a1 = Σ yi
  (Σ xi) a0 + (Σ xi²) a1 = Σ xi yi

In matrix form:

  [ n      Σ xi  ] [ a0 ]   [ Σ yi    ]
  [ Σ xi   Σ xi² ] [ a1 ] = [ Σ xi yi ]

Solving for the slope:

  a1 = (n Σ xi yi − Σ xi Σ yi) / (n Σ xi² − (Σ xi)²)

Using (1), a0 can be expressed in terms of the mean values x̄ and ȳ:

  a0 = ȳ − a1 x̄
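The closed-form solution above can be sketched in plain Python; the four data points are hypothetical, used only to exercise the formulas.

```python
# Least-squares straight line from the normal equations (pure Python).
def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a0 = sy / n - a1 * sx / n                       # intercept: a0 = ybar - a1 * xbar
    return a0, a1

a0, a1 = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
```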


“Goodness” of Our Fit
The spread of the data:
(a) around the mean
(b) around the best-fit line

Notice the improvement in the error due to linear regression.

• St = total sum of the squares around the mean:  St = Σ (yi − ȳ)²
• Sr = sum of the squares of residuals around the regression line:  Sr = Σ (yi − a0 − a1 xi)²
• (St − Sr) quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value.

  r² = (St − Sr) / St

  r : correlation coefficient
  r² : coefficient of determination

• For a perfect fit, Sr = 0 and r = r² = 1, signifying that the line explains 100 percent of the variability of the data.
• For r = r² = 0, Sr = St and the fit represents no improvement.
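A minimal sketch of the St, Sr, and r² computation in plain Python; the data points and the fitted coefficients a0 = 0.15, a1 = 1.94 are hypothetical (the coefficients are the least-squares fit of those points).

```python
# St, Sr, and r**2 for a straight-line fit, as defined above (pure Python).
def r_squared(xs, ys, a0, a1):
    ybar = sum(ys) / len(ys)
    st = sum((y - ybar) ** 2 for y in ys)                     # spread around the mean
    sr = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))  # spread around the line
    return (st - sr) / st

# Hypothetical data; a0 = 0.15, a1 = 1.94 are their least-squares coefficients.
r2 = r_squared([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8], 0.15, 1.94)
```

An r² close to 1 means the line removes nearly all of the spread that was present around the mean.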
Linearization of Nonlinear Relationships
(a) Data that are ill-suited for linear least-squares regression
(b) Indication that a parabola may be more suitable

Exponential equation:

  y = α e^(βx)   →   ln y = ln α + β x

  slope = β
  intercept = ln α
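The transform above can be sketched in plain Python. The data below are synthetic, generated from y = 2 e^(0.5x), so the fit should recover α = 2 and β = 0.5.

```python
# Fit y = alpha * exp(beta * x) by regressing ln y on x (pure Python).
import math

def fit_exponential(xs, ys):
    lys = [math.log(y) for y in ys]
    n = len(xs)
    sx, sly = sum(xs), sum(lys)
    sxx = sum(x * x for x in xs)
    sxly = sum(x * ly for x, ly in zip(xs, lys))
    beta = (n * sxly - sx * sly) / (n * sxx - sx * sx)  # slope = beta
    ln_alpha = sly / n - beta * sx / n                  # intercept = ln(alpha)
    return math.exp(ln_alpha), beta

# Synthetic data generated from y = 2 e^(0.5 x).
xs = [0.0, 1.0, 2.0, 3.0]
alpha, beta = fit_exponential(xs, [2 * math.exp(0.5 * x) for x in xs])
```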

Linearization of Nonlinear Relationships

Power equation:
  y = α2 x^β2   →   log y = β2 log x + log α2

Saturation-growth-rate equation:
  y = α3 x / (β3 + x)   →   1/y = 1/α3 + (β3/α3)(1/x)

Data to be fit to the power equation:

  y = α2 x^β2   →   log y = β2 log x + log α2

  x     y     log x   log y
  1     0.5   0.000   −0.301
  2     1.7   0.301   0.230
  3     3.4   0.477   0.531
  4     5.7   0.602   0.756
  5     8.4   0.699   0.924

After the (log y)–(log x) data are obtained, linear regression yields:

  log y = 1.75 log x − 0.300

Find α2 and β2 using slope = β2 and intercept = log α2:

  β2 = 1.75
  log α2 = −0.3   →   α2 = 0.5

See Exercises.xls
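The fit in this example can be reproduced in plain Python (base-10 logs, straight-line least squares on the transformed data):

```python
# Reproduce the power-equation fit above via the log-log transform (pure Python).
import math

xs = [1, 2, 3, 4, 5]
ys = [0.5, 1.7, 3.4, 5.7, 8.4]
lx = [math.log10(x) for x in xs]
ly = [math.log10(y) for y in ys]

n = len(lx)
sx, sy = sum(lx), sum(ly)
sxx = sum(v * v for v in lx)
sxy = sum(u * v for u, v in zip(lx, ly))

beta2 = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope = beta_2
log_alpha2 = sy / n - beta2 * sx / n               # intercept = log10(alpha_2)
alpha2 = 10 ** log_alpha2                          # so y ~ alpha2 * x**beta2
```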
Polynomial Regression
• Some engineering data are poorly represented by a straight line. A curve (polynomial) may be better suited to fit the data. The least-squares method can be extended to fit the data to higher-order polynomials.
• As an example, consider a second-order polynomial fit to the data points:

  y = a0 + a1 x + a2 x²

Minimize the error:  Sr = Σ ei² = Σ (yi − a0 − a1 xi − a2 xi²)²

∂Sr/∂a0 = −2 Σ (yi − a0 − a1 xi − a2 xi²) = 0      →   n a0 + (Σ xi) a1 + (Σ xi²) a2 = Σ yi
∂Sr/∂a1 = −2 Σ xi (yi − a0 − a1 xi − a2 xi²) = 0   →   (Σ xi) a0 + (Σ xi²) a1 + (Σ xi³) a2 = Σ xi yi
∂Sr/∂a2 = −2 Σ xi² (yi − a0 − a1 xi − a2 xi²) = 0  →   (Σ xi²) a0 + (Σ xi³) a1 + (Σ xi⁴) a2 = Σ xi² yi
Polynomial Regression
• To fit the data to an mth-order polynomial, we need to solve the following system of linear equations: (m+1) equations in (m+1) unknowns.

Matrix form:

  [ n         Σ xi        …   Σ xi^m      ] [ a0 ]   [ Σ yi      ]
  [ Σ xi      Σ xi²       …   Σ xi^(m+1)  ] [ a1 ]   [ Σ xi yi   ]
  [ ⋮         ⋮               ⋮           ] [ ⋮  ] = [ ⋮         ]
  [ Σ xi^m    Σ xi^(m+1)  …   Σ xi^(2m)   ] [ am ]   [ Σ xi^m yi ]
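A sketch of this system for a quadratic (m = 2), built and solved with naive Gauss elimination in plain Python. The test data are synthetic, generated from y = 1 + 2x + 3x², so the fit should recover those coefficients. No pivoting is done, which is adequate for these small well-behaved normal matrices but not robust in general.

```python
# Build and solve the (m+1)x(m+1) normal equations with naive Gauss elimination.
def polyfit_normal(xs, ys, m):
    # Normal matrix A[j][k] = sum(x_i**(j+k)); right-hand side b[j] = sum(x_i**j * y_i)
    a = [[sum(x ** (j + k) for x in xs) for k in range(m + 1)] for j in range(m + 1)]
    b = [sum(x ** j * y for x, y in zip(xs, ys)) for j in range(m + 1)]
    for p in range(m + 1):                  # forward elimination (no pivoting)
        for r in range(p + 1, m + 1):
            f = a[r][p] / a[p][p]
            for c in range(p, m + 1):
                a[r][c] -= f * a[p][c]
            b[r] -= f * b[p]
    coeffs = [0.0] * (m + 1)
    for r in range(m, -1, -1):              # back substitution
        coeffs[r] = (b[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, m + 1))) / a[r][r]
    return coeffs  # [a0, a1, ..., am]

# Synthetic data from y = 1 + 2x + 3x^2; the fit should recover [1, 2, 3].
xs = [0, 1, 2, 3, 4]
coeffs = polyfit_normal(xs, [1 + 2 * x + 3 * x * x for x in xs], 2)
```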
Multiple Linear Regression
• A useful extension of linear regression is the case where y is a linear function of two or more independent variables. For example:

  y = a0 + a1 x1 + a2 x2 + e

• For this two-variable case, the regression “line” becomes a plane, as shown in the figure below.
Multiple Linear Regression

Example (2 variables). Minimize the error:  Sr = Σ ei² = Σ (yi − a0 − a1 x1i − a2 x2i)²

∂Sr/∂a0 = −2 Σ (yi − a0 − a1 x1i − a2 x2i) = 0      →   n a0 + (Σ x1i) a1 + (Σ x2i) a2 = Σ yi
∂Sr/∂a1 = −2 Σ x1i (yi − a0 − a1 x1i − a2 x2i) = 0  →   (Σ x1i) a0 + (Σ x1i²) a1 + (Σ x1i x2i) a2 = Σ x1i yi
∂Sr/∂a2 = −2 Σ x2i (yi − a0 − a1 x1i − a2 x2i) = 0  →   (Σ x2i) a0 + (Σ x1i x2i) a1 + (Σ x2i²) a2 = Σ x2i yi

In matrix form:

  [ n        Σ x1i        Σ x2i     ] [ a0 ]   [ Σ yi     ]
  [ Σ x1i    Σ x1i²       Σ x1i x2i ] [ a1 ] = [ Σ x1i yi ]
  [ Σ x2i    Σ x1i x2i    Σ x2i²    ] [ a2 ]   [ Σ x2i yi ]

Which method would you use to solve this linear system of equations?
Multiple Linear Regression

Example 17.6. The following data were calculated from the equation y = 5 + 4x1 − 3x2:

  x1    x2   y
  0     0    5
  2     1    10
  2.5   2    9
  1     3    0
  4     6    3
  7     2    27

Use multiple linear regression to fit these data.

Solution: Substituting the sums into the normal equations gives

  [ 6      16.5    14 ] [ a0 ]   [ 54    ]
  [ 16.5   76.25   48 ] [ a1 ] = [ 243.5 ]
  [ 14     48      54 ] [ a2 ]   [ 100   ]

This system can be solved using Gauss elimination. The result is a0 = 5, a1 = 4, a2 = −3:

  y = 5 + 4x1 − 3x2
