
Engineering Computation

Curve Fitting by Least-Squares Regression and Spline Interpolation

Part 7

Curve Fitting

Given a set of points:
- experimental data
- tabular data
- etc.
Fit a curve (or surface) to the points so that we can easily evaluate f(x) at any x of interest.

If x is within the data range: interpolating (generally safe).
If x is outside the data range: extrapolating (often dangerous).

Curve Fitting

Two main methods will be covered:

1. Least-Squares Regression
   The function is a "best fit" to the data and does not necessarily pass through the points.
   Used for scattered (experimental) data.
   Can develop empirical models for analysis/design.
2. Interpolation
   The function passes through all (or most) points.
   Interpolates values of well-behaved (precise) data or for geometric design.

Curve Fitting & Interpolation

1. We have discussed Least-Squares Regression, where the function is a "best fit" to the points but does not necessarily pass through them.

2. We now discuss Interpolation & Extrapolation, where the function passes through all (or at least most) points.

[Figure: interpolation (within the data range) vs. extrapolation (beyond the data range)]


Curve Fitting by Least-Squares Regression

Objective:
Obtain a low-order approximation (curve or surface) that "best fits" the data.

Note:
Because the order of the approximation is less than the number of data points, the curve or surface cannot pass through all points.
We will need a consistent criterion for determining the "best fit."

Typical Usage:
Scattered (experimental) data.
Develop empirical models for analysis/design.

Least-Squares Regression

1. In the laboratory, apply x, measure y, tabulate the data.
2. Plot the data and examine the relationship.

[Figure: scatter plot of the measured data points (xi, yi)]


Least-Squares Regression

3. Develop a "model", an approximate relationship between y and x:
   y = mx + b
4. Use the model to predict or estimate y for any given x.
5. A "best fit" of the data requires:
   - An optimal way of finding the parameters (e.g., the slope and intercept of a straight line).
   - Perhaps optimizing the selection of the model form (i.e., linear, quadratic, exponential, ...).
   - That the magnitudes of the residual errors do not vary in any systematic fashion. [In statistical applications, the residual errors should be independent and identically distributed.]

Least-Squares Regression

Given: n data points (x1,y1), (x2,y2), ..., (xn,yn)
Obtain: "best fit" curve:
   f(x) = a0 Z0(x) + a1 Z1(x) + a2 Z2(x) + ... + am Zm(x)
where the ai's are the unknown parameters of the model and the Zi's are known functions of x.

We will focus on two of the many possible types of regression models:
   Simple Linear Regression:       Z0(x) = 1 and Z1(x) = x
   General Polynomial Regression:  Z0(x) = 1, Z1(x) = x, Z2(x) = x^2, ..., Zm(x) = x^m

MATLAB's regress function (Statistics Toolbox):

b = REGRESS(y,X) returns the vector of regression coefficients, b, in the linear model y = Xb (X is an n x p matrix, y is the n x 1 vector of observations).

[B,BINT,R,RINT,STATS] = REGRESS(y,X,ALPHA) uses the input ALPHA to calculate 100(1 - ALPHA) confidence intervals for B and the residual vector R, returned in BINT and RINT respectively. The vector STATS contains the R-square statistic along with the F and p values for the regression.

>> x = linspace(0,1,20)';             % 20 x-values as a column vector
>> y = 2*x + 1 + 0.1*randn(20,1);     % noisy data around y = 2x + 1
>> plot(x,y,'.')
>> xx = [ones(20,1), x];              % design matrix [Z] for a straight line
>> b = regress(y,xx)
b =
    1.0115
    1.9941
>> yy = xx*b;                         % fitted values
>> hold on
>> plot(x,yy,'k-')
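
The same coefficients can also be obtained without the Statistics Toolbox: MATLAB's backslash operator returns the least-squares solution of an overdetermined linear system. A minimal sketch, reusing the xx and y defined above:

>> b = xx\y        % least-squares solution of xx*b = y (same coefficients as regress)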

Least Squares Regression: General Procedure

For the ith data point (xi, yi) we find the set of coefficients for which:

   yi = a0 Z0(xi) + a1 Z1(xi) + ... + am Zm(xi) + ei

where ei is the residual error, i.e., the difference between the reported value and the model:

   ei = yi - a0 Z0(xi) - a1 Z1(xi) - ... - am Zm(xi)

Our "best fit" will minimize the total sum of the squares of the residuals:

   Sr = Σ ei^2     (sum over i = 1, ..., n)

Least Squares Regression: General Procedure

[Figure: the residual ei is the vertical distance between the measured value yi and the modeled value at xi]

Our "best fit" will be the function which minimizes the sum of the squares of the residuals:

   Sr = Σ ei^2 = Σ [ yi - Σj aj Zj(xi) ]^2
      = Σ [ yi - a0 Z0(xi) - a1 Z1(xi) - a2 Z2(xi) - ... - am Zm(xi) ]^2

(outer sums over i = 1, ..., n; inner sum over j = 0, ..., m)

Least Squares Regression: General Procedure

   Sr = Σ ei^2 = Σ ( yi - a0 Z0(xi) - ... - am Zm(xi) )^2     (sums over i = 1, ..., n)

To minimize this expression with respect to the unknowns a0, a1, ..., am, take the derivatives of Sr and set them to zero:

   ∂Sr/∂a0 = -2 Σ Z0(xi) ( yi - a0 Z0(xi) - ... - am Zm(xi) ) = 0
   ∂Sr/∂a1 = -2 Σ Z1(xi) ( yi - a0 Z0(xi) - ... - am Zm(xi) ) = 0
   ...
   ∂Sr/∂am = -2 Σ Zm(xi) ( yi - a0 Z0(xi) - ... - am Zm(xi) ) = 0

Least Squares: Linear Algebra

In linear-algebra form:

   {Y} = [Z]{A} + {E}   or   {E} = {Y} - [Z]{A}

where:
   {E} and {Y} are n x 1,   [Z] is n x (m+1),   {A} is (m+1) x 1
   n = number of points,    (m+1) = number of unknowns

   {E}^T = [e1 e2 ... en],   {Y}^T = [y1 y2 ... yn],   {A}^T = [a0 a1 a2 ... am]

         [ Z0(x1)  Z1(x1)  ...  Zm(x1) ]
   [Z] = [ Z0(x2)  Z1(x2)  ...  Zm(x2) ]
         [   :       :            :    ]
         [ Z0(xn)  Z1(xn)  ...  Zm(xn) ]
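
For instance, a minimal MATLAB sketch of [Z] for the quadratic polynomial basis Z0(x) = 1, Z1(x) = x, Z2(x) = x^2 (x assumed to be an n-by-1 column vector of data abscissas):

   Z = [ones(size(x)), x, x.^2];    % n x (m+1) design matrix, here m = 2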

Least Squares: Sum of Squared Errors

   {E} = {Y} - [Z]{A}

Then

   Sr = {E}^T {E} = ({Y} - [Z]{A})^T ({Y} - [Z]{A})
      = {Y}^T {Y} - {A}^T [Z]^T {Y} - {Y}^T [Z]{A} + {A}^T [Z]^T [Z]{A}
      = {Y}^T {Y} - 2 {A}^T [Z]^T {Y} + {A}^T [Z]^T [Z]{A}

Setting ∂Sr/∂ai = 0 for i = 0, ..., m yields:

   ∂Sr/∂{A} = 0 = 2 [Z]^T [Z]{A} - 2 [Z]^T {Y}

or

   [Z]^T [Z]{A} = [Z]^T {Y}

Least Squares: Normal Equations

   [Z]^T [Z]{A} = [Z]^T {Y}        (C&C Eq. 17.25)

This is the general form of the Normal Equations.
They provide (m+1) equations in (m+1) unknowns.
(Note that we end up with a system of linear equations.)
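
As an illustrative sketch (the names Z, y, and A are carried over from the sketch above, not notation from C&C), the normal equations can be formed and solved directly in MATLAB, although in practice the backslash operator applied to [Z] itself is preferred because it avoids forming [Z]^T [Z]:

   A = (Z'*Z) \ (Z'*y);   % solve [Z]^T [Z]{A} = [Z]^T {Y} for the (m+1) coefficients
   % A = Z\y;             % numerically preferable: least squares via QR factorization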

Least Squares: Simple Linear Regression

Simple Linear Regression (m = 1):

Given: n data points (x1,y1), (x2,y2), ..., (xn,yn) with n > 2
Obtain: "best fit" curve:
   f(x) = a0 + a1 x
from the n equations:
   y1 = a0 + a1 x1 + e1
   y2 = a0 + a1 x2 + e2
   ...
   yn = a0 + a1 xn + en

Or, in matrix form, [Z]^T [Z]{A} = [Z]^T {Y}:

   [ 1   1  ...  1  ]  [ 1  x1 ]  [ a0 ]     [ 1   1  ...  1  ]  [ y1 ]
   [ x1  x2 ...  xn ]  [ 1  x2 ]  [ a1 ]  =  [ x1  x2 ...  xn ]  [ y2 ]
                       [ :  :  ]                                 [ :  ]
                       [ 1  xn ]                                 [ yn ]

Least Squares: Simple Linear Regression

Simple Linear Regression (m = 1):

The Normal Equations
   [Z]^T [Z]{A} = [Z]^T {Y}
upon multiplying out the matrices become

   [ n      Σxi   ]  [ a0 ]     [ Σyi    ]
   [ Σxi    Σxi^2 ]  [ a1 ]  =  [ Σxi yi ]

(sums over i = 1, ..., n)

Normal Equations for Linear Regression, C&C Eqs. (17.4-5).
(This form works well for spreadsheets.)

Least Squares: Simple Linear Regression

Simple Linear Regression (m = 1):

   [Z]^T [Z]{A} = [Z]^T {Y}

Solving for {A}:

   a1 = [ n Σxi yi - Σxi Σyi ] / [ n Σxi^2 - (Σxi)^2 ]

   a0 = (1/n) Σyi - a1 (1/n) Σxi = ȳ - a1 x̄

(sums over i = 1, ..., n)   C&C equations (17.6) and (17.7)
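
A minimal MATLAB sketch of these formulas (x and y assumed to be n-by-1 column vectors; the variable names are illustrative):

   n  = length(x);
   a1 = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2);   % slope, C&C (17.6)
   a0 = mean(y) - a1*mean(x);                                       % intercept, C&C (17.7)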

Least Squares: Simple Linear Regression

Simple Linear Regression (m = 1):

   [Z]^T [Z]{A} = [Z]^T {Y}

A better version of the formula for the slope is:

   a1 = Σ (yi - ȳ)(xi - x̄) / Σ (xi - x̄)^2        (sums over i = 1, ..., n)

which is easier and numerically more stable, but the second equation remains the same:

   a0 = ȳ - a1 x̄
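
In MATLAB, the mean-centered form (same assumed x and y column vectors as above) reads:

   a1 = sum((x - mean(x)).*(y - mean(y))) / sum((x - mean(x)).^2);   % numerically more stable slope
   a0 = mean(y) - a1*mean(x);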

Common Nonlinear Relations

Objective: use linear equations for simplicity.
Remedy: transform the data into linear form and perform the regression.

Given: data which appears as:

(1) Exponential-like curve:
   y = a1 e^(b1 x)
   (e.g., population growth, radioactive decay, attenuation of a transmission line)
   One can then use the linear form: ln(y) = ln(a1) + b1 x
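
For example, a minimal MATLAB sketch of this log transform (x and y assumed to be n-by-1 column vectors with y > 0):

   c  = [ones(size(x)), x] \ log(y);   % linear regression of ln(y) on x
   a1 = exp(c(1));                     % c(1) = ln(a1)
   b1 = c(2);                          % c(2) = b1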

Common Nonlinear Relations

(2) Power-like curve:
   y = a2 x^b2
   ln(y) = ln(a2) + b2 ln(x)

(3) Saturation-growth-rate curve:
   y = a3 x / (b3 + x)
   (e.g., population growth under limiting conditions)
   1/y = (b3 + x) / (a3 x) = 1/a3 + (b3/a3)(1/x)

[Figure: saturation-growth-rate curves for a3 = 5, b3 = 1..10]

Be careful about the implied distribution of the errors. Always use the untransformed values for error analysis.
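
Similarly, a hedged MATLAB sketch of the double-reciprocal transform for the saturation-growth-rate model (x and y assumed to be positive n-by-1 column vectors; names illustrative):

   c  = [ones(size(x)), 1./x] \ (1./y);   % linear regression of 1/y on 1/x
   a3 = 1/c(1);                           % c(1) = 1/a3
   b3 = c(2)*a3;                          % c(2) = b3/a3
   % For error analysis, compare y against a3*x./(b3 + x), not the transformed 1/y values.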

Goodness of Fit

Major points in least-squares regression:

1. In all regression models one is solving an overdetermined system of equations, i.e., more equations than unknowns.
2. How good is the fit?
   Often based on the coefficient of determination, r^2.

Goodness of Fit

r^2 compares the spread of the data about the regression line with the spread of the data about the mean.

Spread of the data around the regression line:
   Sr = Σ ei^2 = Σ (yi - y'i)^2

Spread of the data around the mean:
   St = Σ (yi - ȳ)^2

Goodness of Fit

The coefficient of determination describes how much of the variance is explained by the regression equation:

   r^2 = (St - Sr) / St

We want r^2 close to 1.0.
This comparison doesn't work if the models have different numbers of parameters.
Be careful when using different transformations: always do the analysis on the untransformed data.
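
As a small illustrative sketch (y are the observations and yhat the modeled values; both names are assumptions):

   Sr = sum((y - yhat).^2);        % spread about the regression curve
   St = sum((y - mean(y)).^2);     % spread about the mean
   r2 = (St - Sr)/St;              % coefficient of determination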

Standard Error of the Estimate

Precision:
If the spread of the points around the line is of similar magnitude along the entire range of the data, then one can use

   sy = sqrt( Sr / (n - (m+1)) )   = standard error of the estimate
                                     (a standard deviation in y about the regression line)

to describe the precision of the regression estimate (in which m+1 is the number of coefficients calculated for the fit, e.g., m+1 = 2 for linear regression).
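
Continuing the sketch above (Sr, n, and m assumed already defined; m+1 = 2 for a straight line):

   sy = sqrt(Sr / (n - (m+1)));    % standard error of the estimate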

Statistics

Chapra and Canale, in sections PT5.2, 17.1.3 and 17.4.3, discuss the statistical interpretation of least-squares regression and some of the associated statistical concepts.
The statistical theory of least-squares regression is elegant, powerful, and widely used in the analysis of real data throughout the sciences.
See Lecture Notes pages X-14 through X-16.
