
~ Curve Fitting ~

Least Squares Regression

Chapter 17

Copyright © 2006 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Curve Fitting
• Fit the best curve to a discrete data set and obtain estimates for other data points.

• Two general approaches:
  – Data exhibit a significant degree of scatter: find a single curve that represents the general trend of the data.
  – Data are very precise: pass a curve (or curves) exactly through each of the points.

• Two common applications in engineering:
  – Trend analysis: predicting values of the dependent variable, either extrapolation beyond the data points or interpolation between data points.
  – Hypothesis testing: comparing an existing mathematical model with measured data.
Simple Statistics
In the sciences, if several measurements are made of a particular quantity, additional insight can be gained by summarizing the data in one or more well-chosen statistics:

Arithmetic mean – the sum of the individual data points (yi) divided by the number of points:

  ȳ = (Σ yi) / n,   i = 1, …, n

Standard deviation – a common measure of spread for a sample:

  Sy = √( Σ (yi − ȳ)² / (n − 1) )

Variance – the square of the standard deviation:

  Sy² = Σ (yi − ȳ)² / (n − 1)

Coefficient of variation – quantifies the spread of the data relative to the mean (similar to relative error):

  c.v. = (Sy / ȳ) × 100%
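These three statistics can be sketched in a few lines of plain Python; the measurement values below are hypothetical, chosen only to exercise the formulas.

```python
# Mean, sample standard deviation, and coefficient of variation as defined above.
import math

def summary_stats(ys):
    n = len(ys)
    mean = sum(ys) / n                               # arithmetic mean
    s2 = sum((y - mean) ** 2 for y in ys) / (n - 1)  # variance (n - 1 in the denominator)
    sy = math.sqrt(s2)                               # standard deviation
    cv = sy / mean * 100                             # coefficient of variation, in percent
    return mean, sy, cv

mean, sy, cv = summary_stats([6.6, 6.4, 6.5, 6.7, 6.3])
```

Note the n − 1 (not n) in the denominator: these are sample statistics, matching the slide's definition.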
Linear Regression
• Fitting a straight line to a set of paired observations:
  (x1, y1), (x2, y2), …, (xn, yn)

  Line equation:    y = a0 + a1 x
  Measured value:   yi = a0 + a1 xi + e
  Error (residual): e = yi − a0 − a1 xi

  a1 : slope
  a0 : intercept

Choosing Criteria for a “Best Fit”
• Minimize the sum of the residual errors for all available data?

  Σ ei = Σ (yi − a0 − a1 xi),   i = 1, …, n

  Inadequate! Positive and negative errors cancel, so very different lines can give the same (even zero) sum.

• Minimize the sum of the absolute values of the errors?

  Σ |ei| = Σ |yi − a0 − a1 xi|

  Inadequate! The best-fit line is not unique.

• How about minimizing the maximum distance that an individual point falls from the line?

  This does not work either! It gives undue influence to a single outlying point.
• The best strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

  Sr = Σ ei² = Σ (yi,measured − yi,model)² = Σ (yi − a0 − a1 xi)²,   sums over i = 1, …, n

• Yields a unique line for a given set of data.

• Need to compute a0 and a1 such that Sr is minimized!
Least-Squares Fit of a Straight Line

Minimize the error:  Sr = Σ ei² = Σ (yi − a0 − a1 xi)²

∂Sr/∂a0 = −2 Σ (yi − a0 − a1 xi) = 0     →   Σ yi − Σ a0 − Σ a1 xi = 0
∂Sr/∂a1 = −2 Σ (yi − a0 − a1 xi) xi = 0  →   Σ xi yi − Σ a0 xi − Σ a1 xi² = 0

Since Σ a0 = n a0, these become the normal equations, which can be solved simultaneously:

(1)  n a0 + (Σ xi) a1 = Σ yi
(2)  (Σ xi) a0 + (Σ xi²) a1 = Σ xi yi

Least-Squares Fit of a Straight Line

Minimize the error:  Sr = Σ ei² = Σ (yi − a0 − a1 xi)²

Normal equations, which can be solved simultaneously:

  n a0 + (Σ xi) a1 = Σ yi
  (Σ xi) a0 + (Σ xi²) a1 = Σ xi yi

In matrix form:

  [ n      Σ xi  ] [ a0 ]   [ Σ yi    ]
  [ Σ xi   Σ xi² ] [ a1 ] = [ Σ xi yi ]

Solving for the slope:

  a1 = (n Σ xi yi − Σ xi Σ yi) / (n Σ xi² − (Σ xi)²)

Using (1), a0 can be expressed in terms of the mean values x̄ and ȳ:

  a0 = ȳ − a1 x̄
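The closed-form solution above can be sketched in plain Python; the four data points are hypothetical, used only to exercise the formulas.

```python
# Least-squares straight line from the normal equations (pure Python).
def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a0 = sy / n - a1 * sx / n                       # intercept: a0 = ybar - a1 * xbar
    return a0, a1

a0, a1 = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
```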


“Goodness” of Our Fit
The spread of the data:
(a) around the mean
(b) around the best-fit line

Notice the improvement in the error due to linear regression.

• St = total sum of the squares around the mean:  St = Σ (yi − ȳ)²
• Sr = sum of the squares of residuals around the regression line:  Sr = Σ (yi − a0 − a1 xi)²
• (St − Sr) quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value.

  r² = (St − Sr) / St

  r : correlation coefficient
  r² : coefficient of determination

• For a perfect fit, Sr = 0 and r = r² = 1, signifying that the line explains 100 percent of the variability of the data.
• For r = r² = 0, Sr = St and the fit represents no improvement.
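A minimal sketch of the St, Sr, and r² computation in plain Python; the data points and the fitted coefficients a0 = 0.15, a1 = 1.94 are hypothetical (the coefficients are the least-squares fit of those points).

```python
# St, Sr, and r**2 for a straight-line fit, as defined above (pure Python).
def r_squared(xs, ys, a0, a1):
    ybar = sum(ys) / len(ys)
    st = sum((y - ybar) ** 2 for y in ys)                     # spread around the mean
    sr = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))  # spread around the line
    return (st - sr) / st

# Hypothetical data; a0 = 0.15, a1 = 1.94 are their least-squares coefficients.
r2 = r_squared([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8], 0.15, 1.94)
```

An r² close to 1 means the line removes nearly all of the spread that was present around the mean.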
Linearization of Nonlinear Relationships
(a) Data that are ill-suited for linear least-squares regression
(b) Indication that a parabola may be more suitable

Exponential equation:

  y = α e^(βx)   →   ln y = ln α + β x

  slope = β
  intercept = ln α
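The transform above can be sketched in plain Python. The data below are synthetic, generated from y = 2 e^(0.5x), so the fit should recover α = 2 and β = 0.5.

```python
# Fit y = alpha * exp(beta * x) by regressing ln y on x (pure Python).
import math

def fit_exponential(xs, ys):
    lys = [math.log(y) for y in ys]
    n = len(xs)
    sx, sly = sum(xs), sum(lys)
    sxx = sum(x * x for x in xs)
    sxly = sum(x * ly for x, ly in zip(xs, lys))
    beta = (n * sxly - sx * sly) / (n * sxx - sx * sx)  # slope = beta
    ln_alpha = sly / n - beta * sx / n                  # intercept = ln(alpha)
    return math.exp(ln_alpha), beta

# Synthetic data generated from y = 2 e^(0.5 x).
xs = [0.0, 1.0, 2.0, 3.0]
alpha, beta = fit_exponential(xs, [2 * math.exp(0.5 * x) for x in xs])
```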

Linearization of Nonlinear Relationships

Power equation:
  y = α2 x^β2   →   log y = β2 log x + log α2

Saturation-growth-rate equation:
  y = α3 x / (β3 + x)   →   1/y = 1/α3 + (β3/α3)(1/x)

Data to be fit to the power equation:

  y = α2 x^β2   →   log y = β2 log x + log α2

  x     y     log x   log y
  1     0.5   0.000   −0.301
  2     1.7   0.301   0.230
  3     3.4   0.477   0.531
  4     5.7   0.602   0.756
  5     8.4   0.699   0.924

After the (log y)–(log x) data are obtained, linear regression yields:

  log y = 1.75 log x − 0.300

Find α2 and β2 using slope = β2 and intercept = log α2:

  β2 = 1.75
  log α2 = −0.3   →   α2 = 0.5

See Exercises.xls
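The fit in this example can be reproduced in plain Python (base-10 logs, straight-line least squares on the transformed data):

```python
# Reproduce the power-equation fit above via the log-log transform (pure Python).
import math

xs = [1, 2, 3, 4, 5]
ys = [0.5, 1.7, 3.4, 5.7, 8.4]
lx = [math.log10(x) for x in xs]
ly = [math.log10(y) for y in ys]

n = len(lx)
sx, sy = sum(lx), sum(ly)
sxx = sum(v * v for v in lx)
sxy = sum(u * v for u, v in zip(lx, ly))

beta2 = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope = beta_2
log_alpha2 = sy / n - beta2 * sx / n               # intercept = log10(alpha_2)
alpha2 = 10 ** log_alpha2                          # so y ~ alpha2 * x**beta2
```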
Polynomial Regression
• Some engineering data are poorly represented by a straight line. A curve (polynomial) may be better suited to fit the data. The least-squares method can be extended to fit the data to higher-order polynomials.
• As an example, consider a second-order polynomial fit to the data points:

  y = a0 + a1 x + a2 x²

Minimize the error:  Sr = Σ ei² = Σ (yi − a0 − a1 xi − a2 xi²)²

∂Sr/∂a0 = −2 Σ (yi − a0 − a1 xi − a2 xi²) = 0      →   n a0 + (Σ xi) a1 + (Σ xi²) a2 = Σ yi
∂Sr/∂a1 = −2 Σ xi (yi − a0 − a1 xi − a2 xi²) = 0   →   (Σ xi) a0 + (Σ xi²) a1 + (Σ xi³) a2 = Σ xi yi
∂Sr/∂a2 = −2 Σ xi² (yi − a0 − a1 xi − a2 xi²) = 0  →   (Σ xi²) a0 + (Σ xi³) a1 + (Σ xi⁴) a2 = Σ xi² yi
Polynomial Regression
• To fit the data to an mth-order polynomial, we need to solve the following system of linear equations: (m+1) equations in (m+1) unknowns.

Matrix form:

  [ n         Σ xi        …   Σ xi^m      ] [ a0 ]   [ Σ yi      ]
  [ Σ xi      Σ xi²       …   Σ xi^(m+1)  ] [ a1 ]   [ Σ xi yi   ]
  [ ⋮         ⋮               ⋮           ] [ ⋮  ] = [ ⋮         ]
  [ Σ xi^m    Σ xi^(m+1)  …   Σ xi^(2m)   ] [ am ]   [ Σ xi^m yi ]
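A sketch of this system for a quadratic (m = 2), built and solved with naive Gauss elimination in plain Python. The test data are synthetic, generated from y = 1 + 2x + 3x², so the fit should recover those coefficients. No pivoting is done, which is adequate for these small well-behaved normal matrices but not robust in general.

```python
# Build and solve the (m+1)x(m+1) normal equations with naive Gauss elimination.
def polyfit_normal(xs, ys, m):
    # Normal matrix A[j][k] = sum(x_i**(j+k)); right-hand side b[j] = sum(x_i**j * y_i)
    a = [[sum(x ** (j + k) for x in xs) for k in range(m + 1)] for j in range(m + 1)]
    b = [sum(x ** j * y for x, y in zip(xs, ys)) for j in range(m + 1)]
    for p in range(m + 1):                  # forward elimination (no pivoting)
        for r in range(p + 1, m + 1):
            f = a[r][p] / a[p][p]
            for c in range(p, m + 1):
                a[r][c] -= f * a[p][c]
            b[r] -= f * b[p]
    coeffs = [0.0] * (m + 1)
    for r in range(m, -1, -1):              # back substitution
        coeffs[r] = (b[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, m + 1))) / a[r][r]
    return coeffs  # [a0, a1, ..., am]

# Synthetic data from y = 1 + 2x + 3x^2; the fit should recover [1, 2, 3].
xs = [0, 1, 2, 3, 4]
coeffs = polyfit_normal(xs, [1 + 2 * x + 3 * x * x for x in xs], 2)
```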
Multiple Linear Regression
• A useful extension of linear regression is the case where y is a linear function of two or more independent variables. For example:

  y = a0 + a1 x1 + a2 x2 + e

• For this two-variable case, the regression “line” becomes a plane, as shown in the figure below.
Multiple Linear Regression

Example (2 variables). Minimize the error:  Sr = Σ ei² = Σ (yi − a0 − a1 x1i − a2 x2i)²

∂Sr/∂a0 = −2 Σ (yi − a0 − a1 x1i − a2 x2i) = 0      →   n a0 + (Σ x1i) a1 + (Σ x2i) a2 = Σ yi
∂Sr/∂a1 = −2 Σ x1i (yi − a0 − a1 x1i − a2 x2i) = 0  →   (Σ x1i) a0 + (Σ x1i²) a1 + (Σ x1i x2i) a2 = Σ x1i yi
∂Sr/∂a2 = −2 Σ x2i (yi − a0 − a1 x1i − a2 x2i) = 0  →   (Σ x2i) a0 + (Σ x1i x2i) a1 + (Σ x2i²) a2 = Σ x2i yi

In matrix form:

  [ n        Σ x1i        Σ x2i     ] [ a0 ]   [ Σ yi     ]
  [ Σ x1i    Σ x1i²       Σ x1i x2i ] [ a1 ] = [ Σ x1i yi ]
  [ Σ x2i    Σ x1i x2i    Σ x2i²    ] [ a2 ]   [ Σ x2i yi ]

Which method would you use to solve this linear system of equations?
Multiple Linear Regression

Example 17.6. The following data were calculated from the equation y = 5 + 4x1 − 3x2:

  x1    x2   y
  0     0    5
  2     1    10
  2.5   2    9
  1     3    0
  4     6    3
  7     2    27

Use multiple linear regression to fit these data.

Solution: Substituting the sums into the normal equations gives

  [ 6      16.5    14 ] [ a0 ]   [ 54    ]
  [ 16.5   76.25   48 ] [ a1 ] = [ 243.5 ]
  [ 14     48      54 ] [ a2 ]   [ 100   ]

This system can be solved using Gauss elimination. The result is a0 = 5, a1 = 4, a2 = −3:

  y = 5 + 4x1 − 3x2
