
Table of Contents

Fundamentals of Curve Fitting
    Linear Least-Squares Regression
    Linearization
    Polynomial & Generalized Linear Least-Squares Regression
Polynomial Interpolation/Function Approximation
    Lagrange Interpolation Formula
    Newton Interpolating Polynomial
    Hermite Interpolation Polynomial
Optimization: Two or More Variables
Extrapolation
    Taylor Series
    Trapezoid Rule
    Romberg Integration
References
1. Fundamentals of Curve Fitting
We are often presented with discrete data for which we desire a more continuous
representation. For instance, given a set of empirical measurements, we might predict
some sort of functional relationship between the relevant parameters of the experiment by
fitting a curve to the points. As such, we are more interested in capturing the overall
trend in the data rather than capturing the actual values of each and every point in the
data set. The most common approach to curve fitting is least-squares regression, and the
process of fitting a curve to discrete data is often referred to as a regression analysis.
Regression is generally a highly statistical endeavor. The more traditional sources of
numerical error, like roundoff, are not as important in regression analyses, since bigger
issues exist with regard to the suitability of the chosen fitting function.
1.1 Linear Least-Squares Regression
Consider a set of N paired data points {x_i, y_i}, i ∈ [1, N]. We wish to fit a straight line
through the points, represented functionally as y = mx + b. If we evaluate the latter
function at every given x_i, we can construct a set of predicted values to compare to the
given y_i. The differences between the given y_i and the predicted values are referred to as
the residuals e_i (note we are careful not to refer to these as errors). We need to find the
values of m and b such that we somehow minimize the collection of all of these residuals.
The least-squares approach finds the best values by minimizing the sum of the squares
of the residuals S_r:

S_r = \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - m x_i - b)^2        (1.1.1)
Only the values of m and b are unknown in (1.1.1), so a minimization of S_r must be
with respect to these parameters. In other words, (1.1.1) is essentially a function of m
and b alone once we have chosen a data set to fit. To minimize a function of multiple
variables, we must set all first partial derivatives to zero:

\frac{\partial S_r}{\partial m} = \frac{\partial}{\partial m} \sum_{i=1}^{N} (y_i - m x_i - b)^2 = -2 \sum_{i=1}^{N} \left[ (y_i - m x_i - b)\, x_i \right] = 0        (1.1.2)

\frac{\partial S_r}{\partial b} = \frac{\partial}{\partial b} \sum_{i=1}^{N} (y_i - m x_i - b)^2 = -2 \sum_{i=1}^{N} (y_i - m x_i - b) = 0        (1.1.3)
Simplifying and distributing the summations, we arrive at two simultaneous linear
equations for m and b:
b \sum x_i + m \sum x_i^2 = \sum x_i y_i        (1.1.4)

b N + m \sum x_i = \sum y_i        (1.1.5)
The limits on the summation symbols have been omitted for simplicity in (1.1.4) and
(1.1.5), and will continue as such henceforth. A solution of (1.1.4) and (1.1.5) yields:
m = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - \left( \sum x_i \right)^2}        (1.1.6)

b = y_{mean} - m\, x_{mean}        (1.1.7)
where x_mean and y_mean are the mean (average) values of all x_i and all y_i, respectively. The
straight line constructed with these values for m and b will be the best fit to the data
amongst all other straight lines, but not necessarily amongst all functions. It is
certainly possible that some other function may fit the data better, by achieving a smaller
result for S_r. The goodness of fit indicates how well a function represents the
variability in any given data set, and provides a relative measure of suitability. Goodness
of fit is usually quantified by the correlation coefficient r, defined as:

r^2 = \frac{S_t - S_r}{S_t}, \quad \text{where } S_t = \sum (y_i - y_{mean})^2        (1.1.8)
The correlation coefficient is derived by statistical reasoning, and the expression shown
in (1.1.8) can actually be applied to any function, linear or not. A function that achieves a
value of r = 1 constitutes a perfect fit to the data, explaining 100% of the variability seen
(and in fact would pass through every given point exactly, yielding S_r = 0). A function
with r = 0 explains none of the variability, and would be a very poor fit. In relative terms,
therefore, we may say that one function is a better fit than another if it has a correlation
coefficient closer to 1. It is this concept of the goodness of fit that predominates
discussions of error in curve fitting, rather than regular numerical errors.
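As a quick illustration of equations (1.1.6)-(1.1.8), the following Python sketch computes m, b, and r for a small data set; the function name linear_fit and the sample data are illustrative assumptions, not part of the original notes.

# Linear least-squares fit y = m*x + b using Eqs. (1.1.6)-(1.1.8).
# Plain Python; no external libraries required.
def linear_fit(x, y):
    N = len(x)
    Sx  = sum(x)
    Sy  = sum(y)
    Sxy = sum(xi * yi for xi, yi in zip(x, y))
    Sxx = sum(xi * xi for xi in x)

    m = (N * Sxy - Sx * Sy) / (N * Sxx - Sx * Sx)        # Eq. (1.1.6)
    b = Sy / N - m * (Sx / N)                            # Eq. (1.1.7), b = y_mean - m*x_mean

    y_mean = Sy / N
    St = sum((yi - y_mean) ** 2 for yi in y)             # total variability
    Sr = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    r = ((St - Sr) / St) ** 0.5                          # Eq. (1.1.8)
    return m, b, r

# Illustrative data (not from the notes)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.8, 2.1, 2.9, 4.2, 4.8]
print(linear_fit(x, y))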
1.2 Linearization
The foregoing process of linear regression is a lot more powerful than it may seem at
first. We can actually use the results to fit a much wider variety of functions than just
straight lines, provided we can put the given function in a linear form by appropriate
modifications. Consider fitting a power-law representation like y = αx^β. Rather than
starting from the beginning, we can linearize the function by taking the logarithm of both
sides (any base will work; here we use log base 10):

\log_{10}(y) = \log_{10}(\alpha x^\beta) = \beta \log_{10}(x) + \log_{10}(\alpha)        (1.2.1)

Thus, instead of fitting y = αx^β to the original data set {x_i, y_i}, we will fit y = mx + b to
the modified set {log_10(x_i), log_10(y_i)} using (1.1.6) and (1.1.7). We then have the
corresponding values of α and β from (1.2.1) as β = m, α = 10^b. Or, suppose we wish to
use y = αe^(βx). This we linearize by taking the natural log:

\ln(y) = \ln(\alpha e^{\beta x}) = \beta x + \ln(\alpha)        (1.2.2)

Thus, instead of fitting y = αe^(βx) to the original data set {x_i, y_i}, we will fit y = mx + b to
the modified set {x_i, ln(y_i)} using (1.1.6) and (1.1.7). We then have the corresponding
values of α and β from (1.2.2) as β = m, α = e^b.
Where applicable, this linearization process can significantly simplify regression
analyses. Of course not all nonlinear functions can be linearized, so other methods must
be used to fit these to discrete data. Some can still be analyzed in the context of a
generalized linear least-squares regression, which fits nonlinear functions by
formulating a linear system for solution. Polynomial regression is a prime example of
the method, and is discussed in the next section. The most general approach of all
follows from generalized optimization methods, in which the residual function S_r is
minimized directly by purely numerical means. The Solver add-in for Excel is a very
versatile tool that can fit any function to any set of data simply by formulating a target
cell with the calculated value of S_r to minimize (indeed, the Solver utility can do much
more than this as well).
Like several of the more powerful features of Excel and other Microsoft Office products,
Solver is the creation of a third-party provider (as is Equation Editor). It is nothing more
than a graphical user interface to a previously developed numerical method.
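The linearization in (1.2.2) is easy to exercise in code. The sketch below fits y = αe^(βx) by running an ordinary straight-line fit on {x_i, ln(y_i)}; it uses numpy.polyfit for the degree-1 fit, and the sample data are invented for illustration.

import numpy as np

# Fit y = alpha * exp(beta * x) via the linearization ln(y) = beta*x + ln(alpha).
# numpy.polyfit returns [slope, intercept] for a degree-1 fit.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.3, 5.4, 8.9, 14.8])     # roughly 2*exp(0.5*x), illustrative only

m, b = np.polyfit(x, np.log(y), 1)
beta, alpha = m, np.exp(b)                   # slope maps to beta, intercept to ln(alpha)
print(alpha, beta)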
1.3 Polynomial & Generalized Linear Least-Squares Regression
Higher-order polynomials are very frequently used to fit data sets with curvature that
cannot be properly captured by straight lines alone. Although polynomials of order two
and above are nonlinear functions, their least-squares formulations mimic the simpler
linear case in that linear systems result for solution. Consider fitting the function
y = a + bx + cx^2 to the given data set {x_i, y_i}. S_r is formulated for this case as:

S_r = \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - a - b x_i - c x_i^2)^2        (1.3.1)
To minimize (1.3.1) with respect to the parameters a, b, and c we must satisfy three
simultaneous equations:
\frac{\partial S_r}{\partial a} = \frac{\partial}{\partial a} \sum_{i=1}^{N} (y_i - a - b x_i - c x_i^2)^2 = -2 \sum_{i=1}^{N} (y_i - a - b x_i - c x_i^2) = 0        (1.3.2)

\frac{\partial S_r}{\partial b} = \frac{\partial}{\partial b} \sum_{i=1}^{N} (y_i - a - b x_i - c x_i^2)^2 = -2 \sum_{i=1}^{N} \left[ (y_i - a - b x_i - c x_i^2)\, x_i \right] = 0        (1.3.3)

\frac{\partial S_r}{\partial c} = \frac{\partial}{\partial c} \sum_{i=1}^{N} (y_i - a - b x_i - c x_i^2)^2 = -2 \sum_{i=1}^{N} \left[ (y_i - a - b x_i - c x_i^2)\, x_i^2 \right] = 0        (1.3.4)
Following some re-arrangement:
a N + b \sum x_i + c \sum x_i^2 = \sum y_i        (1.3.5)

a \sum x_i + b \sum x_i^2 + c \sum x_i^3 = \sum x_i y_i        (1.3.6)

a \sum x_i^2 + b \sum x_i^3 + c \sum x_i^4 = \sum x_i^2 y_i        (1.3.7)
Thus, the problem of determining a least-squares second-order polynomial is equivalent
to solving a system of three simultaneous linear equations. By extension, an mth-order
polynomial results from the solution of m+1 simultaneous linear equations. It is in this
way that the nonlinear polynomial functions are still considered to be part of a
generalized linear least-squares approach. Consider fitting the following linear
combination of m shape functions f_i to some set of n observed data points:

y = c_1 f_1 + c_2 f_2 + \cdots + c_{m-1} f_{m-1} + c_m f_m        (1.3.8)
The m shape functions themselves may be nonlinear, but cannot contain any of the
unknown fit coefficients c_i. This is still considered to be a linear regression analysis,
since the resulting equations comprise a linear system for solution of the fit coefficients,
regardless of any nonlinearities in the shape functions. Let y_obs be the vector of n
observed dependent data points, and F be the matrix of function values resulting from
substitution of each of the corresponding n independent data values into each shape
function in turn:

\mathbf{F} = \begin{bmatrix} f_1(x_1) & f_2(x_1) & \cdots & f_m(x_1) \\ f_1(x_2) & f_2(x_2) & \cdots & f_m(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ f_1(x_n) & f_2(x_n) & \cdots & f_m(x_n) \end{bmatrix}        (1.3.9)
Thus, F is n × m, and not square in general. Note that once we have chosen the shape
functions, and selected a data set to fit, the values in y_obs and F are immediately
determined. By the same minimization procedure as before, it can be shown that the
following is the matrix formulation for solution of the vector c, which contains the
unknown and desired fit coefficients c_i:

\mathbf{c} = \left[ \mathbf{F}^T \mathbf{F} \right]^{-1} \mathbf{F}^T \mathbf{y}_{obs}        (1.3.10)
In this or any other case, we can always determine a correlation coefficient following the
usual definition in (1.1.8), in order to compare the goodness of fit to other possible
choices.
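A minimal Python sketch of the generalized linear least-squares solution (1.3.10) follows, using the shape functions 1, x, and x^2, which reproduces quadratic polynomial regression; the variable names and data are illustrative assumptions.

import numpy as np

# Generalized linear least squares, Eq. (1.3.10): c = (F^T F)^{-1} F^T y_obs.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_obs = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])     # illustrative data

shape_funcs = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]
F = np.column_stack([f(x) for f in shape_funcs])          # n x m matrix, Eq. (1.3.9)

c = np.linalg.solve(F.T @ F, F.T @ y_obs)                 # solve the normal equations

# Goodness of fit via Eq. (1.1.8)
Sr = np.sum((y_obs - F @ c) ** 2)
St = np.sum((y_obs - y_obs.mean()) ** 2)
r = np.sqrt((St - Sr) / St)
print(c, r)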
Polynomial Interpolation/Function Approximation
Interpolation is the process of defining a function that takes on
specified values at specified points. We all know that two points
determine a straight line. More precisely, any two points in the plane,
(x_1, y_1) and (x_2, y_2), with x_1 ≠ x_2, determine a unique first-degree
polynomial in x whose graph passes through the two points. There are
many different formulas for the polynomial, but they all lead to the
same straight line graph. This generalizes to more than two points.
Given n points in the plane, (x_k, y_k), k = 1, ..., n, with distinct x_k's,
there is a unique polynomial in x of degree less than n whose graph
passes through the points. It is easiest to remember that n, the
number of data points, is also the number of coefficients, although
some of the leading coefficients might be zero, so the degree might
actually be less than n - 1. Again, there are many different formulas
for the polynomial, but they all define the same function. This
polynomial is called the interpolating polynomial because it exactly
reproduces the given data: p(x_k) = y_k, k = 1, ..., n.
Later, we examine other polynomials, of lower degree, that only
approximate the data. They are not interpolating polynomials. The
most compact representation of the interpolating polynomial is the
Lagrange form, presented in the next section.
1.4 Lagrange Interpolation Formula
For a set of data points (x_i, y_i), i = 1, 2, ..., n, with n > 1, the elementary Lagrange
interpolation formula is

l_i^n(x) = \prod_{j=1,\, j \ne i}^{n} \frac{x - x_j}{x_i - x_j}, \quad i = 1, 2, 3, \ldots, n        (2.1.1)
l_i^n(x) is a polynomial with degree no greater than n - 1. Its value at any data
point x_k within the data set is either 1 or 0:

l_i^n(x_k) = \prod_{j=1,\, j \ne i}^{n} \frac{x_k - x_j}{x_i - x_j} = 1, \quad \text{for } i = k        (2.1.2)

l_i^n(x_k) = \prod_{j=1,\, j \ne i}^{n} \frac{x_k - x_j}{x_i - x_j} = 0, \quad \text{for } i \ne k        (2.1.3)

which is equivalent to

l_i^n(x_k) = \delta_{ik} = \begin{cases} 1, & i = k \\ 0, & i \ne k \end{cases}        (2.1.4)

The Lagrange interpolation polynomial of degree n - 1 is

p_{n-1}(x) = \sum_{i=1}^{n} y_i\, l_i^n(x)        (2.1.5)

and its values at the data points are

p_{n-1}(x_k) = \sum_{i=1}^{n} y_i\, l_i^n(x_k) = \sum_{i=1}^{n} y_i\, \delta_{ik} = y_k, \quad k = 1, 2, \ldots, n        (2.1.6)

which means p_{n-1}(x) passes through all the data points exactly.
For the simplest case where n = 2, there are only two data points
and p_1(x) is a linear function which passes through the two data
points. Thus, p_1(x) is just a straight line with its two end points being
the two data points. From (2.1.6), we have

p_1(x) = y_1 \frac{x - x_2}{x_1 - x_2} + y_2 \frac{x - x_1}{x_2 - x_1} = y_1 + \frac{y_2 - y_1}{x_2 - x_1}(x - x_1)        (2.1.7)
For n = 3, p_2(x) is the quadratic polynomial that passes through three data points:

p_2(x) = y_1 \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)} + y_2 \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)} + y_3 \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}        (2.1.8)
An advantageous property of the Lagrange interpolation polynomial is that the
data points need not be arranged in any particular order, as long as they are mutually
distinct; the order of the data points is not important. As an application of the
Lagrange interpolation polynomial, say we know y_1 and y_2, or y_1, y_2 and y_3; then
we can estimate the function y(x) anywhere in [x_1, x_2] linearly, or in [x_1, x_3]
quadratically. This is what we do in finite element analysis.
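The Lagrange form (2.1.1) and (2.1.5) translates directly into a short routine. The Python sketch below is illustrative; the function name and sample points are assumptions, not from the source.

def lagrange_eval(x_data, y_data, x):
    """Evaluate p_{n-1}(x) from Eqs. (2.1.1) and (2.1.5)."""
    n = len(x_data)
    total = 0.0
    for i in range(n):
        li = 1.0
        for j in range(n):
            if j != i:
                li *= (x - x_data[j]) / (x_data[i] - x_data[j])   # product in Eq. (2.1.1)
        total += y_data[i] * li                                   # sum in Eq. (2.1.5)
    return total

# Quadratic interpolation through three illustrative points sampled from y = x^2
xs = [1.0, 2.0, 4.0]
ys = [1.0, 4.0, 16.0]
print(lagrange_eval(xs, ys, 3.0))   # -> 9.0 (up to roundoff), since p_2 coincides with x^2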
1.5 Newton Interpolating Polynomial
Suppose there is a known polynomial p_{n-1}(x) that interpolates the data set
(x_i, y_i), i = 1, 2, ..., n. When one more data point (x_{n+1}, y_{n+1}), which is distinct
from all the old data points, is added to the data set, we can construct a new polynomial
that interpolates the new data set. Keep also in mind that the new data point need not be
at either end of the old data set. Consider the following polynomial of degree n:

p_n(x) = p_{n-1}(x) + c_n \prod_{i=1}^{n} (x - x_i)        (2.2.1)
where c_n is an unknown constant. In the case of n = 1, we specify p_0(x) as
p_0(x) = y_1, where data point 1 need not be at the beginning of the data set.

At the points of the old data set, the values of p_n(x) are the same as those of
p_{n-1}(x). This is because the second term in Equation (2.2.1) is zero there. Since we
assume p_{n-1}(x) interpolates the old data set, p_n(x) does so too.

At the new data point, we want p_n(x_{n+1}) = y_{n+1}. This can be accomplished by
setting the coefficient c_n to be

c_n = \frac{p_n(x_{n+1}) - p_{n-1}(x_{n+1})}{\prod_{i=1}^{n} (x_{n+1} - x_i)} = \frac{y_{n+1} - p_{n-1}(x_{n+1})}{\prod_{i=1}^{n} (x_{n+1} - x_i)}        (2.2.2)

which definitely exists since x_{n+1} is distinct from x_i (i = 1, 2, ..., n). Now p_n(x) is
a polynomial that interpolates the new data set.
For any given data set (x_i, y_i), i = 1, 2, ..., n, we can obtain the interpolating
polynomial by a recursive process that starts from p_0(x) and uses the above
construction to get p_1(x), p_2(x), ..., p_{n-1}(x). We will demonstrate this process
through the following example.
Example 2.2.1. Construct an interpolating polynomial for the following data set using the
formula in Equations 2.2.1 and 2.2.2
i    1    2    3    4    5
x    0    5    7    8   10
y    0    2   -1   -2   20

Step 1: for i = 1,
p_0(x) = c_0 = y_1 = 0

Step 2: adding point #2,
p_1(x) = p_0(x) + c_1 (x - x_1) = c_1 x
Applying p_1(x_2) = y_2, we get c_1 = 0.4. So
p_1(x) = 0.4 x

Step 3: adding point #3,
p_2(x) = p_1(x) + c_2 (x - x_1)(x - x_2) = 0.4 x + c_2 x(x - 5)
Applying p_2(x_3) = y_3, we get c_2 = -0.2714. So
p_2(x) = 0.4 x - 0.2714 x(x - 5)

Step 4: adding point #4,
p_3(x) = p_2(x) + c_3 (x - x_1)(x - x_2)(x - x_3) = p_2(x) + c_3 x(x - 5)(x - 7)
Applying p_3(x_4) = y_4, we get c_3 = 0.0548. So
p_3(x) = p_2(x) + 0.0548 x(x - 5)(x - 7)

Step 5: adding point #5,
p_4(x) = p_3(x) + c_4 (x - x_1)(x - x_2)(x - x_3)(x - x_4) = p_3(x) + c_4 x(x - 5)(x - 7)(x - 8)
Applying p_4(x_5) = y_5, we get c_4 = 0.0712. So
p_4(x) = p_3(x) + 0.0712 x(x - 5)(x - 7)(x - 8)

which is the final answer.
If we expand the recursive form, the r.h.s of Equation (2.2.1), we obtain the more
familiar form of a polynomial
p_{n-1}(x) = c_0 + c_1 (x - x_1) + c_2 (x - x_1)(x - x_2) + \cdots + c_{n-1} (x - x_1)(x - x_2) \cdots (x - x_{n-1})        (2.2.3)

which is called Newton's interpolation polynomial. Its constants can be determined
from the data set (x_i, y_i), i = 1, 2, ..., n:

p(x_1) = y_1 = c_0
p(x_2) = y_2 = c_0 + c_1 (x_2 - x_1)
p(x_3) = y_3 = c_0 + c_1 (x_3 - x_1) + c_2 (x_3 - x_1)(x_3 - x_2)
\vdots        (2.2.4)

which gives

c_0 = y_1
c_1 = \frac{y_2 - c_0}{x_2 - x_1} = \frac{y_2 - y_1}{x_2 - x_1}
c_2 = \frac{y_3 - c_0 - c_1 (x_3 - x_1)}{(x_3 - x_1)(x_3 - x_2)}
\vdots        (2.2.5)
We should note that forcing the polynomial through data with no regard for rates
of change in the data (i.e., derivatives) results in a C^0 continuous interpolating
polynomial. Alternatively, each data condition p(x_i) = y_i is called a C^0 constraint.

Let's use the following notation for these constants:

c_{i-1} = [x_1 x_2 \cdots x_i\, y], \quad i = 1, 2, \ldots        (2.2.6)

which has the following property

[x_1 x_2 \cdots x_i\, y] = \frac{[x_2 x_3 \cdots x_i\, y] - [x_1 x_2 \cdots x_{i-1}\, y]}{x_i - x_1}        (2.2.7)

so that it is called the divided difference. For example:

[x_1 x_2\, y] = \frac{y_2 - y_1}{x_2 - x_1}, \qquad [x_1 x_2 x_3\, y] = \frac{[x_2 x_3\, y] - [x_1 x_2\, y]}{x_3 - x_1},

etc. Using this recursion property, a table of divided differences can be generated as follows:

x_1   y_1
x_2   y_2   [x_1 x_2 y]
x_3   y_3   [x_2 x_3 y]   [x_1 x_2 x_3 y]
x_4   y_4   [x_3 x_4 y]   [x_2 x_3 x_4 y]   [x_1 x_2 x_3 x_4 y]
...        (2.2.8)
This table can be viewed as part of an n × (n+1) matrix for a data set that has n
points. The first column is the x values of the data set and the second column is the y or
function values of the data set. For the rest of the (n-1) × (n-1) lower triangle, the rule
for the construction of its elements, say element(i, j), is as follows:

(1) It takes the form of a division (for 3 ≤ j ≤ i + 1):

element(i, j) = \frac{element(i, j-1) - element(i-1, j-1)}{element(i, 1) - element(i-j+2, 1)}        (2.2.9)

where
element(i, j-1) is the element in the matrix immediately left of element(i, j);
element(i-1, j-1) is the element above and immediately left;
element(i, 1) is the x on the same row;
element(i-j+2, 1) is found by going from element(i, j) diagonally upward and leftward;
when reaching the second column, it is the element to the left.

(2) The denominator is easier to see in this form:

[x_k \cdots x_q\, y] = \frac{(\text{element on left}) - (\text{element above left})}{x_q - x_k}
Example 2.2.2. Using the table of divided differences, construct the Newton interpolating
polynomial for the data set in Example 2.2.1.
x     y
 0    0
 5    2    0.4
 7   -1   -1.5    -0.271
 8   -2   -1.0     0.167    0.055
10   20   11.0     4.0      0.767    0.071

where the number of decimal digits is reduced so that the table fits the page here. The
diagonal terms (0, 0.4, -0.271, 0.055, 0.071) are the coefficients of the Newton polynomial. Then

p_4(x) = 0 + 0.4 x - 0.2714\, x(x - 5) + 0.0548\, x(x - 5)(x - 7) + 0.0712\, x(x - 5)(x - 7)(x - 8)
       = 0 + x\big( 0.4 + (x - 5)\big( -0.2714 + (x - 7)( 0.0548 + (x - 8)(0.0712) ) \big) \big)

The second line above is called a nested form, which saves computational operations and
produces a more accurate solution numerically. Note in passing how similar this table is
to Romberg quadrature (or Richardson's extrapolation), only here the formula for
calculating the rightmost terms is a simple ratio of differences.
The algorithm is as follows:
(1) Input and plot the data set points;
(2) Declare c(n, n+1) as the matrix for the table; note c = element in Equation (2.2.9);
(3) c(i, 1) ← x_i; c(i, 2) ← y_i;
(4) Calculate the rest of the table using Equation (2.2.9);
(5) Calculate the polynomial using c(i, i+1) as the coefficients of the polynomial.
Figure 2.2.1 shows the resulting polynomial for the above example.
[Figure: the data points of Example 2.2.1 and the Newton interpolating polynomial plotted on x-y axes.]
Figure 2.2.1 Newton interpolation polynomial.
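A minimal Python sketch of the algorithm above, run on the data of Example 2.2.1, is given below; the function names are illustrative, and the table is stored as a full square matrix with y in its first column rather than the n × (n+1) layout described in the text.

def divided_difference_table(x, y):
    """Build the table of Eq. (2.2.8)/(2.2.9); column 0 holds y, later columns
    hold the divided differences. The diagonal entries are the Newton coefficients."""
    n = len(x)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        c[i][0] = y[i]
    for j in range(1, n):
        for i in range(j, n):
            c[i][j] = (c[i][j - 1] - c[i - 1][j - 1]) / (x[i] - x[i - j])
    return c

def newton_eval(x_data, coeffs, x):
    """Evaluate the Newton polynomial in nested form, cf. Eq. (2.2.3)."""
    n = len(coeffs)
    p = coeffs[n - 1]
    for k in range(n - 2, -1, -1):
        p = coeffs[k] + (x - x_data[k]) * p
    return p

x = [0.0, 5.0, 7.0, 8.0, 10.0]
y = [0.0, 2.0, -1.0, -2.0, 20.0]
table = divided_difference_table(x, y)
coeffs = [table[i][i] for i in range(len(x))]   # 0, 0.4, -0.2714..., 0.0548..., 0.0712...
print(coeffs)
print(newton_eval(x, coeffs, 7.0))              # reproduces y = -1 at x = 7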
Through the above example, we can see the advantages of the divided difference
table over the algebraic approach we used in Example 2.2.1. First, it requires fewer
computational operations: we do not need to write out the polynomial and then use the C^0
conditions to calculate the constants. Second, it is much easier to incorporate in a
computer code.
It is important to realize that both the Lagrange and Newton polynomials are C^0
continuous and each would generate the same result. One effect of C^0 continuity can be
seen in the large dip that occurs between the first two data points in the figure.
1.6 Hermite Interpolation Polynomial
The Hermite interpolation accounts for the derivatives of a given function. For
example, in the case of a beam finite element, suppose we need to obtain cubic
polynomials that satisfy the following cases:
(1) Consider: y = ax^3 + bx^2 + cx + d in [0, 1].
(2) Apply the conditions:

             @ x = 0              @ x = 1
Case 1:   y = 1, y' = 0       y = y' = 0
Case 2:   y = 0, y' = 1       y = y' = 0
Case 3:   y = 0, y' = 0       y = 1, y' = 0
Case 4:   y = 0, y' = 0       y = 0, y' = 1
(3) Solve each case for a, b, c, d.
We recall the standard Newton form for a cubic polynomial:

y(x) = c_0 + c_1 (x - x_1) + c_2 (x - x_1)(x - x_2) + c_3 (x - x_1)(x - x_2)(x - x_3)        (2.3.1)

This clearly is not what the Newton interpolation is meant for, but we can employ it as
follows: approximate y' by using the divided difference between two points, then letting
one approach the other in the limit. For example, if y'(x_i) is used, consider adding another
point x_m. Letting these two points converge to one point, we obtain

[x_i\, x_m\, y] = \frac{y_m - y_i}{x_m - x_i} \rightarrow y'(x_i) \quad \text{as } x_m \rightarrow x_i

From this we discover that the divided difference is an approximation of a derivative.
Then, in the divided difference table, we will put two entries for x_i in the data set and
not calculate [x_i x_i y] in its original form (which would involve a division by zero), but rather
simply put the y'(x_i) value there.
For the case mentioned above, the table would be

x_1   y_1
x_1   y_1   y'_1
x_2   y_2   [x_1 x_2 y]   [x_1 x_1 x_2 y]
x_2   y_2   y'_2          [x_1 x_2 x_2 y]   [x_1 x_1 x_2 x_2 y]

The tables for the four cases are best determined by hand. Then substitution of the
diagonal values for the c_i's and of the x_i's into Equation (2.3.1) yields the polynomials.
The results are given below; for each case the divided difference table is listed (x in the
first column, y in the second, then the divided differences, with the y' values inserted at
the repeated nodes), and the diagonal entries are the coefficients c_i of Equation (2.3.1).

Case 1 (y = 1, y' = 0 at x = 0; y = 0, y' = 0 at x = 1):
0   1
0   1    0
1   0   -1   -1
1   0    0    1    2
y(x) = 1 + 0·x - 1·x^2 + 2·x^2(x - 1) = 2x^3 - 3x^2 + 1

Case 2 (y = 0, y' = 1 at x = 0; y = 0, y' = 0 at x = 1):
0   0
0   0    1
1   0    0   -1
1   0    0    0    1
y(x) = 0 + 1·x - 1·x^2 + 1·x^2(x - 1) = x^3 - 2x^2 + x

Case 3 (y = 0, y' = 0 at x = 0; y = 1, y' = 0 at x = 1):
0   0
0   0    0
1   1    1    1
1   1    0   -1   -2
y(x) = 0 + 0·x + 1·x^2 - 2·x^2(x - 1) = -2x^3 + 3x^2

Case 4 (y = 0, y' = 0 at x = 0; y = 0, y' = 1 at x = 1):
0   0
0   0    0
1   0    0    0
1   0    1    1    1
y(x) = 0 + 0·x + 0·x^2 + 1·x^2(x - 1) = x^3 - x^2
The polynomials are plotted in Figure 2.3.1.
For cases involving higher-order derivatives, the principle is the same. One
thing worth noting here is that when y^(n)(x_i) is used, all lower derivatives and y(x_i) itself
must be included in the constraints. For example, you cannot have y'(x_i) as a constraint
but not y(x_i), nor y^(2)(x_i) but not y'(x_i) and y(x_i).
[Figure: four panels plotting the cubic Hermite polynomials for case 1 through case 4 on 0 ≤ x ≤ 1, with the data points marked.]
Figure 2.3.1 Hermite interpolation.
Example 2.3.1. Constructing displacements in a beam element from Hermite polynomials
Consider the beam of length L shown in Figure 2.3.2. The Hermite polynomials are:
N_1(x) = 2\left(\frac{x}{L}\right)^3 - 3\left(\frac{x}{L}\right)^2 + 1        (2.3.2)

N_2(x) = \frac{x^3}{L^2} - \frac{2x^2}{L} + x        (2.3.3)

N_3(x) = -2\left(\frac{x}{L}\right)^3 + 3\left(\frac{x}{L}\right)^2        (2.3.4)

N_4(x) = \frac{x^3}{L^2} - \frac{x^2}{L}        (2.3.5)

These polynomial interpolation functions may be thought of as the fundamental modes
of deflection, two of which are shown in Figure 2.3.2. The deflection w(x) of any
statically loaded beam can be written in terms of these modes as

w(x) = W_1 N_1 + \theta_1 N_2 + W_2 N_3 + \theta_2 N_4        (2.3.6)

where the subscripts associate quantities with positions (or nodes) 1 and 2 on the beam,
and W_i, θ_i, i = 1, 2, are the deflection and slope, respectively, at each node.

[Figure: the modes N_1 and N_3 plotted against x/L.]
Figure 2.3.2 Static bending modes N_1 and N_3 in a beam.
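The beam shape functions (2.3.2)-(2.3.5) and the deflection interpolant (2.3.6) can be sketched in a few lines of Python; the function names and the nodal values used in the check are illustrative assumptions.

def beam_shape_functions(x, L):
    """Cubic Hermite shape functions of Eqs. (2.3.2)-(2.3.5) on [0, L]."""
    s = x / L
    N1 = 2 * s**3 - 3 * s**2 + 1
    N2 = x**3 / L**2 - 2 * x**2 / L + x
    N3 = -2 * s**3 + 3 * s**2
    N4 = x**3 / L**2 - x**2 / L
    return N1, N2, N3, N4

def beam_deflection(x, L, W1, th1, W2, th2):
    """Deflection w(x) of Eq. (2.3.6) from nodal deflections and slopes."""
    N1, N2, N3, N4 = beam_shape_functions(x, L)
    return W1 * N1 + th1 * N2 + W2 * N3 + th2 * N4

# Quick check: the interpolant honours the nodal deflections at x = 0 and x = L
L = 2.0
print(beam_deflection(0.0, L, W1=1.0, th1=0.2, W2=-0.5, th2=0.0))   # -> 1.0
print(beam_deflection(L,   L, W1=1.0, th1=0.2, W2=-0.5, th2=0.0))   # -> -0.5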
Optimization: Two or More Variables
Multivariate optimization refers to a function with two or more variables that we
want to optimize (either minimize or maximize). To solve the problem, we need to look
at the derivative of the function, but because we now have two unknowns, the process is a
little more complicated.
We take the derivative of the function with respect to each variable. This is called the
Partial Derivative. When we take the partial derivative with respect to one variable, all
other variables are treated as constants.
The partial derivative of f(x, y) with respect to x is written as ∂f/∂x (the symbol ∂ is pronounced "dye").
The partial derivative of f(x, y) with respect to y is written as ∂f/∂y.
For example:
f(x, y) = 16x² + y² + 7xy + 14y + 37x - 280
∂f/∂x = 32x + 0 + 7y + 0 + 37 + 0 = 32x + 7y + 37     (remember, y is treated as a constant)
∂f/∂y = 0 + 2y + 7x + 14 + 0 + 0 = 2y + 7x + 14       (remember, x is treated as a constant)
There are two types of second derivatives when there is more than one variable in the
function.
Pure 2nd derivative: taking the partial derivative of f(x, y) twice with respect to the same variable.
Cross 2nd derivative: taking the partial derivative of ∂f/∂x with respect to y, or
taking the partial derivative of ∂f/∂y with respect to x.
The pure 2nd derivatives are written as ∂²f/∂x² and ∂²f/∂y².
The cross 2nd derivative is written as ∂²f/∂x∂y.
Note: it doesn't matter which way you calculate the cross 2nd derivative; you will get the
same answer.
For the example above:
∂²f/∂x² = 32 + 0 + 0 = 32     (y is the constant)
∂²f/∂y² = 2 + 0 + 0 = 2       (x is the constant)
∂²f/∂x∂y: first look at 32x + 7y + 37 and take the derivative with respect to y (x is the
constant), giving 0 + 7 + 0 = 7. Second, look at 2y + 7x + 14 and take the derivative with
respect to x (y is the constant), giving 0 + 7 + 0 = 7.
The stationary point(s) are found when the first derivatives are set equal to zero and then
solved for x and y.
i.e. 32x + 7y + 37 = 0, which gives 32x + 7y = -37        (Equation 1)
and 2y + 7x + 14 = 0, which gives 7x + 2y = -14           (Equation 2)
Multiply Equation 1 by 2 and Equation 2 by 7:
1) 64x + 14y = -74
2) 49x + 14y = -98
Subtracting 2) from 1) gives 15x = 24, so x = 1.6.
Then 32(1.6) + 7y = -37, so 7y = -88.2 and y = -12.6.
For a particular stationary point to be either a local maximum or local minimum, it must
meet the following condition:

(∂²f/∂x²) × (∂²f/∂y²) > (∂²f/∂x∂y)²

that is, (pure 2nd with respect to x) × (pure 2nd with respect to y) > (cross 2nd)².
Here, 32 × 2 > 7², i.e. 64 > 49.
Yes! Therefore, the extreme point exists. But is it a minimum or a maximum?
Note: if the left side were less than the right side, the point would not be an extreme point;
if the two sides were equal, we would not know.
If ∂²f/∂x² > 0, the point is a local minimum.
If ∂²f/∂x² < 0, the point is a local maximum.
Since ∂²f/∂x² = 32 > 0, the stationary point (1.6, -12.6) is a local minimum.
Moreover, the pure 2nd derivative with respect to x is a constant (32), so it is greater than
zero no matter what x is, and the product of the two pure 2nd derivatives always exceeds
the square of the cross 2nd derivative. Because this condition holds for any value of x and
y, the stationary point (1.6, -12.6) is also a global minimum.
Steps:
1. Find the partial derivatives with respect to each of the variables.
2. Calculate the pure second derivatives with respect to each of the variables.
3. Calculate the cross second derivative with respect to one of the variables.
4. Determine the stationary point(s) by setting the partial derivatives equal to zero and
solving the system of equations.
5. Check the required condition to determine whether the extreme point actually exists.
6. Check the pure second derivative with respect to x to determine whether the
extreme point is a maximum or a minimum.
7. Calculate the value of the function at the stationary point.
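A minimal Python sketch of these steps for the worked example f(x, y) = 16x² + y² + 7xy + 14y + 37x - 280 follows; the derivatives are hard-coded from the calculation above rather than computed symbolically, and the names are illustrative.

# Sketch of the seven steps for f(x, y) = 16x^2 + y^2 + 7xy + 14y + 37x - 280.
def f(x, y):
    return 16 * x**2 + y**2 + 7 * x * y + 14 * y + 37 * x - 280

# Step 1: first partial derivatives are fx = 32x + 7y + 37 and fy = 7x + 2y + 14.
# Steps 2-3: pure and cross second derivatives (all constants for this function).
fxx, fyy, fxy = 32.0, 2.0, 7.0

# Step 4: solve 32x + 7y = -37 and 7x + 2y = -14 by Cramer's rule.
det = 32 * 2 - 7 * 7                       # = 15
x_s = (-37 * 2 - 7 * (-14)) / det          # = 1.6
y_s = (32 * (-14) - 7 * (-37)) / det       # = -12.6

# Steps 5-6: second-derivative test.
if fxx * fyy > fxy**2:
    kind = "local minimum" if fxx > 0 else "local maximum"
else:
    kind = "not an extreme point (or inconclusive if equal)"

# Step 7: value of the function at the stationary point.
print(x_s, y_s, kind, f(x_s, y_s))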
Extrapolation
1.7 Taylor Series
A function y(x) can be expanded over a small interval t using the Taylor series
from a start or reference point x:

y(x + t) = y(x) + t\, y'(x) + \frac{1}{2!} t^2 y''(x) + \frac{1}{3!} t^3 y'''(x) + \frac{1}{4!} t^4 y^{(4)}(x) + \cdots        (4.1.1)
This forms an approximation to y in t, which runs, say, from x_i to x_{i+1}, and
can be used to integrate y(t). For example, if we keep the first four terms on the r.h.s.
of the above series in an integration, we get

I = \int_0^h y\, dt = \int_0^h \left[ y + t y' + \frac{1}{2!} t^2 y'' + \frac{1}{3!} t^3 y''' + O(t^4) y^{(4)} \right] dt
  = h y + \frac{h^2}{2} y' + \frac{h^3}{6} y'' + \frac{h^4}{24} y''' + O(h^5) y^{(4)}        (4.1.2)
where h = x_{i+1} - x_i, a constant. Alternatively, by setting t to h in the Taylor
series, we may also approximate the first derivative:

y'(x_i) = \frac{1}{h}\left[ y(x_i + h) - y(x_i) \right] + O(h)\, y'' = \frac{y_{i+1} - y_i}{h} + O(h)\, y''        (4.1.3)
The last expression, a first forward finite difference, will be used to complete the
trapezoid rule below. Note that it is only first order accurate.
The famous, second-order accurate central difference approximations for
derivatives are obtained by writing a backward Taylor expansion y(x-t) and letting t = h.
Then subtracting it from y(x+t) yields
y_i' = \frac{y_{i+1} - y_{i-1}}{2h} + O(h^2)\, y'''        (4.1.4)
Adding it to y(x+t) yields
y_i'' = \frac{1}{h^2}\left( y_{i+1} - 2 y_i + y_{i-1} \right) + O(h^2)\, y^{(4)}        (4.1.5)
Fig. 4.1 Finite difference techniques (forward, backward, and central differences versus the true derivative) for calculating the first derivative at x_i.
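The difference formulas (4.1.3)-(4.1.5) are easy to try numerically. The Python sketch below evaluates them for y = sin(x), an illustrative choice not taken from the notes, and compares against the exact derivatives.

import math

def y(x):
    return math.sin(x)

x, h = 1.0, 0.1
forward = (y(x + h) - y(x)) / h                      # Eq. (4.1.3), O(h) accurate
central = (y(x + h) - y(x - h)) / (2 * h)            # Eq. (4.1.4), O(h^2) accurate
second  = (y(x + h) - 2 * y(x) + y(x - h)) / h**2    # Eq. (4.1.5), O(h^2) accurate

print(forward - math.cos(x))      # error is O(h), roughly -0.04 here
print(central - math.cos(x))      # error is O(h^2), roughly -1e-3 here
print(second - (-math.sin(x)))    # error is O(h^2); exact y'' = -sin(x)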
1.8 Trapezoid Rule
The concept of the trapezoidal rule comes from approximating y(x)
at any point t in an interval using the Taylor series, and then integrating the
approximation. If the approximation is a linear function, the resulting
integration is called the trapezoidal rule.
For the data interval [x_i, x_{i+1}],

I = \int_0^h y\, dt = \int_0^h \left[ y_i + t\, y_i' + O(t^2)\, y'' \right] dt
  = h y_i + \frac{h^2}{2} y_i' + O(h^3)\, y''
  = h y_i + \frac{h^2}{2} \left[ \frac{y_{i+1} - y_i}{h} + O(h)\, y'' \right] + O(h^3)\, y''
  = \frac{h}{2} \left( y_i + y_{i+1} \right) + O(h^3)\, y''        (4.2.1)
which is the trapezoidal rule for one interval. If we use it over a range of intervals
running from a to b,
[a = x_1, x_2, \ldots, x_{n+1} = b], we get

I_a^b = \sum_{i=1}^{n} \left[ \frac{h}{2} (y_i + y_{i+1}) + O(h^3)\, y_i'' \right]
      = \frac{h}{2} \left( y_1 + 2 \sum_{i=2}^{n} y_i + y_{n+1} \right) + n\, O(h^3)\, y''_{avg}
      = \frac{h}{2} \left( y_1 + 2 \sum_{i=2}^{n} y_i + y_{n+1} \right) + c h^2        (4.2.2)
which is the composite trapezoidal rule.
Example 4.2.1. Use the trapezoidal rule to calculate the following integral
I = \int_0^1 \frac{\pi}{2} \sin(\pi x)\, dx
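Assuming the integral of Example 4.2.1 is I = ∫₀¹ (π/2) sin(πx) dx (as reconstructed above from the error values quoted in Example 4.3.1), a minimal Python sketch of the composite trapezoidal rule (4.2.2) reproduces those errors; the function name is illustrative.

import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule, Eq. (4.2.2), with n equal subintervals."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n))
    return h * s

f = lambda x: 0.5 * math.pi * math.sin(math.pi * x)   # exact integral over [0, 1] is 1

for n in (10, 20):
    I = trapezoid(f, 0.0, 1.0, n)
    print(n, I, I - 1.0)     # errors near -8.24e-3 and -2.06e-3, cf. Example 4.3.1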
1.9 Romberg Integration
Denote I(h) as the approximation of I computed with subinterval length h. Consider I(h) for two
different h:

I = I(h_1) + c h_1^2
I = I(h_2) + c h_2^2        (4.3.1)

which gives

c = \frac{I(h_2) - I(h_1)}{h_1^2 - h_2^2}        (4.3.2)
By substituting it into either of the above equations, we would get a better approximation
to I . This idea of using multiple subinterval lengths to extrapolate better
approximations is generally called the Richardson extrapolation. It will be discussed in
more detail after the following example.
Example 4.3.1. Calculate the integral in Example 4.2.1 with h = 0.1 and h = 0.05.
Then use the Richardson extrapolation to get a better approximation.

We use the composite trapezoidal rule to solve this example. We have

@ h = 0.1:   I(0.1)  = 1 - 8.2382 × 10^{-3}
@ h = 0.05:  I(0.05) = 1 - 2.0570 × 10^{-3}

where the integration values shown are written in terms of their errors relative to the exact solution
of 1. Then

c = \frac{(1 - 2.0570 \times 10^{-3}) - (1 - 8.2382 \times 10^{-3})}{0.1^2 - 0.05^2} = 0.82416

which gives

I \approx I(0.1) + c \times 0.1^2 = 1 + 3.3922 \times 10^{-6}

which is clearly an improvement over I(0.1) and I(0.05) without going
through an extensive integration process. We can improve upon this even
further.
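The same extrapolation can be verified with a few lines of Python; the two trapezoid values are taken directly from the example above, and the variable names are illustrative.

# Richardson extrapolation of the two trapezoid estimates from Example 4.3.1,
# assuming the error model I = I(h) + c*h^2 of Eq. (4.3.1).
I1, h1 = 1.0 - 8.2382e-3, 0.1
I2, h2 = 1.0 - 2.0570e-3, 0.05

c = (I2 - I1) / (h1**2 - h2**2)     # Eq. (4.3.2); about 0.82416
I_better = I1 + c * h1**2           # about 1 + 3.39e-6
print(c, I_better)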
Generally, the Richardson extrapolation rule assumes A(h) is an
approximation written as

A(h) = A(0) + c h^p + O(h^q)        (4.3.3)

where p < q and they need not be integers. A(0) is the exact solution,
and the terms to the right of it denote the error in A(h). To eliminate the
h^p term, let's introduce a fixed number r > 1. Then for an interval h/r,

A(h/r) = A(0) + c\, r^{-p} h^p + O\!\left[ (h/r)^q \right]        (4.3.4)

Solving for c from the two equations above and then substituting it back into
the first equation, we get

A(0) = A(h) - \frac{A(h/r) - A(h)}{(h/r)^p - h^p}\, h^p + O(h^q) = \frac{r^p A(h/r) - A(h)}{r^p - 1} + O(h^q)        (4.3.5)

which gives an improved version of the approximation. We can repeat this
process and get better and better approximations. This is the Richardson
extrapolation.
The Romberg integration uses the Richardson extrapolation on the
composite trapezoidal rule. The detailed proof of related theorems can be
found in many numerical analysis books such as Gautschi (1997) and
Buchanan and Turner (1992). Only the schemes are given here.
For the trapezoidal rule, the usual interval subdivision factor is 2, so let r = 2
and, following Richardson, take even powers in the sequence p = 2, 4, 6, ...,
with q = p + 2. Moreover, if we recast the approximation (the r.h.s. above) as a
sequence in n = 1, 2, 3, ..., then A(0) can be written as

A_n(h) = \frac{4^n A_{n-1}(h/2) - A_{n-1}(h)}{4^n - 1}        (4.3.6)

where p = 2n, r^p = 2^{2n} = 4^n, and the subscript on A denotes its value at each recursive step,
the sequence of which is discussed next.
If we start from A_0(h) and we want to get A_1(h), we need to calculate
A_0(h/2) first. Then to get A_2(h), we need A_1(h/2) in addition to A_1(h), and A_1(h/2)
in turn needs A_0(h/4). The scheme can be arranged in a form very similar to the table
of divided differences (see Section 1.5):

O(h^2)       O(h^4)       O(h^6)       O(h^8)
A_0(h)
A_0(h/2)     A_1(h)
A_0(h/4)     A_1(h/2)     A_2(h)
A_0(h/8)     A_1(h/4)     A_2(h/2)     A_3(h)        (4.3.7)
The A_0 column is found from the composite trapezoidal rule. The algorithm for the
implementation of this scheme is as follows:
(1) Set the desired level of approximation, nl; e.g., nl = 4 in Equation (4.3.7);
(2) Define the function to be integrated and set the interval [a, b] over which the
integration is to be done, and the initial number of intervals, n;
(3) Declare the table in (4.3.7) as a matrix r of dimension (nl, nl);
(4) Calculate the first column of r using the composite trapezoidal rule (Equations
4.3.8 and 4.3.9 given below);
(5) Calculate the rest of the matrix using Equation (4.3.6).
In step (4), we start with the first element using the composite trapezoidal rule:

r(1, 1) = \frac{h}{2}\left[ y(a) + y(b) \right] + h \sum_{k=2}^{n} y\big( a + (k-1)h \big)        (4.3.8)

where the sum goes to n because we have n+1 points. For subsequent rows i = 2, 3, ..., nl,
the calculation is done step-by-step in a loop down column 1, where each interval
length is half of the previous step. Since half of the data points and their function values
have already been calculated, we take advantage of this by revising the composite trapezoidal
rule:
r(i, 1) = \frac{h_i}{2}\left[ y(a) + y(b) \right] + h_i \sum_{k=2}^{n_i} y\big( a + (k-1)h_i \big)
        = \frac{h_i}{2}\left[ y(a) + y(b) \right] + h_i \sum_{k=3,5,7,\ldots} y\big( a + (k-1)h_i \big) + h_i \sum_{k=2,4,6,\ldots} y\big( a + (k-1)h_i \big)
        = \frac{1}{2}\left\{ \frac{h_{i-1}}{2}\left[ y(a) + y(b) \right] + h_{i-1} \sum_{j=2}^{n_{i-1}} y\big( a + (j-1)h_{i-1} \big) \right\} + h_i \sum_{k=2,4,6,\ldots} y\big( a + (k-1)h_i \big)
        = \frac{1}{2}\, r(i-1, 1) + h_i \sum_{k=2,4,6,\ldots}^{n_i} y\big( a + (k-1)h_i \big)        (4.3.9)

where n_i and h_i are the number of intervals and their length for row i, respectively, and
n_i = 2 n_{i-1}.
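A minimal Python sketch of the five-step algorithm above follows; the function name romberg and the starting subdivision n0 = 10 are illustrative assumptions, chosen so that the first column reproduces Example 4.3.2.

import math

def romberg(f, a, b, nl, n0=10):
    """Romberg table of Eq. (4.3.7): column 0 from the composite trapezoid rule
    with interval halving (Eqs. 4.3.8-4.3.9), later columns from Eq. (4.3.6)."""
    r = [[0.0] * nl for _ in range(nl)]
    n, h = n0, (b - a) / n0
    r[0][0] = 0.5 * h * (f(a) + f(b)) + h * sum(f(a + k * h) for k in range(1, n))
    for i in range(1, nl):
        n, h = 2 * n, h / 2.0
        # reuse the previous row: old points kept, only new midpoints evaluated (Eq. 4.3.9)
        r[i][0] = 0.5 * r[i - 1][0] + h * sum(f(a + k * h) for k in range(1, n, 2))
        for j in range(1, i + 1):
            r[i][j] = (4**j * r[i][j - 1] - r[i - 1][j - 1]) / (4**j - 1)
    return r

f = lambda x: 0.5 * math.pi * math.sin(math.pi * x)
table = romberg(f, 0.0, 1.0, nl=5)
for i, row in enumerate(table):
    print([row[j] - 1.0 for j in range(i + 1)])   # error matrix, cf. Example 4.3.2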
Example 4.3.2. Calculate the integral in Example 4.2.1 using the Romberg integration for
five levels.
The resulting error matrix is
-8.2382e-3
-2.0570e-3 3.3922e-6
-5.1409e-4 2.1155e-7 -4.9836e-10
-1.2851e-4 1.3214e-8 -7.7686e-12 1.8652e-14
-3.2128e-5 8.2579e-10 -1.2057e-13 8.8818e-16 8.8818e-16
In Example 4.2.1, the error for the composite trapezoidal rule with 10^6 intervals is
about 10^{-13}. We achieved 10^{-14} in just four levels of Romberg approximation. This scheme
certainly saves a lot of computational time.
References
http://livetoad.org/Courses/Documents/292d/Notes/least-squares_and_qr-decomposition.pdf
http://www.esm.psu.edu/courses/emch407/njs/notes02/ch3_2.doc
https://people.rit.edu/jdweme/emem440/Fundamentals%20of%20Curve%20Fitting.doc
http://micro5.mscc.huji.ac.il/~aries/CurveFitting.doc
http://www.esm.psu.edu/courses/emch407/njs/notes02/ch2_2.doc
http://www.busi.mun.ca/bsimmons/2401/Chapter8.pdf