You are on page 1of 8

ECM6 Computational Methods :

Slide 1 of 5

Lecture 11b
Polynomial Regression
Brian G. Higgins
Department of Chemical Engineering & Materials Science
University of California, Davis
April 2014, Hanoi, Vietnam

ECM6Lecture11bVietnam_2014.nb

Background
We will now extend the ideas from Lecture 11a to find the best fit in a least squares sense for an
arbitrary polynomial:

P HxL = a0 + a1 x + a2 x2 + a3 x3 + + an xn
As before we will be given a set of M data points in the form

88x1 , y1 <, 8x2 , y2 <, 8x3 , y3 <, 8xM , yM <<


and our objective is to find the parameters ak that minimize the least square error. Thus
M

E2 HPL = IIa0 + a1 xk + a2 x2k + + an xnk M - yk M

k=1

We know from calculus that to minimize the least square error, the desired values of ai must satisfy

E2 HLL
ai

= 0 , i = 0, 1, , n

Note that since the xk ' s and yk ' s are constant. Thus we have n+1 linear equations for the n+1 parameters ak

ECM6Lecture11bVietnam_2014.nb

Example 1
Suppose we have the following data points
881, 5.12<, 83, 3<, 86, 2.48<, 89, 2.34<, 815, 2.18<<
and we want to find the parameters ak that give the least square error for a guess function

PHxL = a0 + a1 x + a2 x 2 + a3 x 3

Solution Step1: Define Regression Functions


We proceed as before and define the following quantities
data3 = 881, 5.12<, 83, 3<, 86, 2.48<, 89, 2.34<, 815, 2.18<<;
x@i_D := data3@@i, 1DD; y@i_D := data3@@i, 2DD; M = Length@data3D;
P@x_D := a0 + a1 x + a2 x2 + a3 x3
M

E2 @g_, M_D := Hg@x@iDD - y@iDL2


i=1

Solution Step 2: Evaluate the Least Squared Function


Evaluating the least squares function gives
E2 @P, MD
H- 5.12 + a0 + a1 + a2 + a3 L2 + H- 3 + a0 + 3 a1 + 9 a2 + 27 a3 L2 + H- 2.48 + a0 + 6 a1 + 36 a2 + 216 a3 L2 +
H- 2.34 + a0 + 9 a1 + 81 a2 + 729 a3 L2 + H- 2.18 + a0 + 15 a1 + 225 a2 + 3375 a3 L2

Solution Step 3: Determining the Equations for Parameters ai :


To take the derivative of the sum of squares with respect to a parameter we proceed as follows
D@E2 @P, MD, a0 D
2 H- 5.12 + a0 + a1 + a2 + a3 L + 2 H- 3 + a0 + 3 a1 + 9 a2 + 27 a3 L + 2 H- 2.48 + a0 + 6 a1 + 36 a2 + 216 a3 L +
2 H- 2.34 + a0 + 9 a1 + 81 a2 + 729 a3 L + 2 H- 2.18 + a0 + 15 a1 + 225 a2 + 3375 a3 L

Thus it is a simple matter to generate the various equations required to determine the parameters ai :

ECM6Lecture11bVietnam_2014.nb

eqns =
8D@E2 @P, MD, a0 D 0, D@E2 @P, MD, a1 D 0, D@E2 @P, MD, a2 D 0, D@E2 @P, MD, a3 D 0<
82 H- 5.12 + a0 + a1 + a2 + a3 L +
2 H- 3 + a0 + 3 a1 + 9 a2 + 27 a3 L + 2 H- 2.48 + a0 + 6 a1 + 36 a2 + 216 a3 L +
2 H- 2.34 + a0 + 9 a1 + 81 a2 + 729 a3 L + 2 H- 2.18 + a0 + 15 a1 + 225 a2 + 3375 a3 L 0,
2 H- 5.12 + a0 + a1 + a2 + a3 L + 6 H- 3 + a0 + 3 a1 + 9 a2 + 27 a3 L +
12 H- 2.48 + a0 + 6 a1 + 36 a2 + 216 a3 L + 18 H- 2.34 + a0 + 9 a1 + 81 a2 + 729 a3 L +
30 H- 2.18 + a0 + 15 a1 + 225 a2 + 3375 a3 L 0,
2 H- 5.12 + a0 + a1 + a2 + a3 L + 18 H- 3 + a0 + 3 a1 + 9 a2 + 27 a3 L +
72 H- 2.48 + a0 + 6 a1 + 36 a2 + 216 a3 L + 162 H- 2.34 + a0 + 9 a1 + 81 a2 + 729 a3 L +
450 H- 2.18 + a0 + 15 a1 + 225 a2 + 3375 a3 L 0,
2 H- 5.12 + a0 + a1 + a2 + a3 L + 54 H- 3 + a0 + 3 a1 + 9 a2 + 27 a3 L +
432 H- 2.48 + a0 + 6 a1 + 36 a2 + 216 a3 L + 1458 H- 2.34 + a0 + 9 a1 + 81 a2 + 729 a3 L +
6750 H- 2.18 + a0 + 15 a1 + 225 a2 + 3375 a3 L 0<

Solution Step 4: Solving for Parameters ai :


As before we use Solve to find the values of the parameters
sol3 = Flatten@Solve@eqnsDD
8a0 6.41245, a1 - 1.55201, a2 0.181872, a3 - 0.00648415<

Solution Step 5: Determining the Least Squared Error


The least square error is
E2 @P, MD . sol3
0.120889

Using Mathematicas Fit Function


Let us compare our previous results with Mathematica's Fit function
cubicFit = FitAdata3, 91, x, x2 , x3 =, xE
6.41245- 1.55201 x + 0.181872 x2 - 0.00648415 x3

The results are the same


Now let us visualize the fit graphically
plt1 = Plot@P@xD .sol3, 8x, 0, 15<, PlotStyle 8Blue, Thick<, Frame True,
FrameLabel 8Style@"x", 16D, Style@"PHxL", 16D<, PlotRange AllD;
plt2 = ListPlot@data3, PlotStyle 8PointSize@LargeD, Red<D;
Show@plt1, plt2, PlotRange AllD
6

PHxL

ECM6Lecture11bVietnam_2014.nb

10

12

14

x
It is apparent from the plots that a cubic polynomial is not an exceptionally good fit visually. How can we
evaluate the goodness of the fit quantitatively. We leave this question for later.
Note as the order of the polynomial increases, so does its ability to oscillate.

ECM6Lecture11bVietnam_2014.nb

Example 2
Problem Statement
Consider the following set of {x,y} data
882, 0.75<, 84, 0.1875<, 85, 0.1200<, 86, 0.0833<, 88, 0.0469<<
Determine the parameter values c and n in the model

y = c xn
that gives the best fit of the data in a least squares sense.

Solution Step1: Model Transformation


By taking the natural log of the model we get

Y=m+nX
where the following definitions are used

Y = Loge HyL, X = Loge HxL, m = Loge HcL

Solution Step 2: Data Transformation


To proceed we need to transform the data
data5 = 882., 0.75<, 84., 0.1875<, 85., 0.1200<, 86., 0.0833<, 88., 0.0469<<;

We will make use of rules to transform the data


data5b = data5 . 8x_, y_< 8Log@xD, Log@yD<
880.693147, - 0.287682<, 81.38629, - 1.67398<,
81.60944, - 2.12026<, 81.79176, - 2.48531<, 82.07944, - 3.05974<<

Solution Step 3: Least Squares Formulation


As before we define functions that extract out the X and Y data , and define our model and the least
squares function
X@i_D := data5b@@i, 1DD; Y@i_D := data5b@@i, 2DD; M = Length@data5bD;
Pow@X_D := m + n X
M

E2 @g_, M_D := Hg@X@iDD - Y@iDL2


i=1

Solution Step 4: Solving Least Square Problem


It is a simple matter to determine the equations for the parameters m and n:

ECM6Lecture11bVietnam_2014.nb

eqns = 8D@E2 @Pow, MD, mD 0, D@E2 @Pow, MD, nD 0<


82 H0.287682 + m + 0.693147 nL + 2 H1.67398 + m + 1.38629 nL +
2 H2.12026 + m + 1.60944 nL + 2 H2.48531 + m + 1.79176 nL + 2 H3.05974 + m + 2.07944 nL 0,
1.38629 H0.287682 + m + 0.693147 nL + 2.77259 H1.67398 + m + 1.38629 nL +
3.21888 H2.12026 + m + 1.60944 nL + 3.58352 H2.48531 + m + 1.79176 nL +
4.15888 H3.05974 + m + 2.07944 nL 0<

We solve this set using Solve


sol = Flatten@Solve@eqnsDD
8m 1.09838, n - 1.99983<

Solution Step 5: Transforming Back to Original Variables


Finally we need to transform this parameters back to the original definitions
regressParam = c m . sol
c 2.99929

Thus the model that is a best fit of the data is

y = 2.2

.999 x-1.999 3 x-2

Note instead of using the loge to transform our power function into a linear model, we could have used
log10 or for that matter log to any base.

Solution Step 6: Visualizing the Fit


Here is the plot in terms of the transformed variables (X, Y)
plt1 = Plot@Hm + n XL . sol, 8X, 0.5, 2.6<, PlotStyle 8Blue, Thick<, Frame True,
FrameLabel 8Style@"X", 16D, Style@"Y=PHXL", 16D<, PlotRange AllD;
plt2 = ListPlot@data5b, PlotStyle 8PointSize@LargeD, Red<D;
Show@plt1, plt2, PlotRange AllD
0

Y=PHXL

-1

-2

-3

-4
0.5

1.0

1.5

2.0

X
Here is the plot in terms of the original variables (x, y)

2.5

ECM6Lecture11bVietnam_2014.nb

plt1 = PlotA3 x-2 , 8x, 2, 10<, PlotStyle 8Blue, Thick<, Frame True,
FrameLabel 8Style@"x", 16D, Style@"PHxL", 16D<, PlotRange AllE;
plt2 = ListPlot@data5, PlotStyle 8PointSize@LargeD, Red<D;
Show@plt1, plt2, PlotRange AllD
0.7
0.6
0.5

PHxL

0.4
0.3
0.2
0.1
0.0
2

10

References
These notes and the examples were adapted from the follow texts:
M. J. Maron, Numerical Analysis. A Practical Approach, 2nd Edition, Macmillan Publishing Company,
1987
A. J. Pettoprezzo, Introductory Numerical Analysis, Dover Publications, 1984