Principles and Applications of Nonlinear Least Squares: An Introduction for Physical Scientists using Excel's Solver
Les Kirkup, Department of Applied Physics, Faculty of Science, University of
Technology, Sydney, New South Wales 2007, Australia.
email: Les.Kirkup@uts.edu.au
Version date: October 2003
Preamble
Least squares is an extremely powerful technique for fitting equations to data and is
carried out in laboratories every day. Routines for calculating parameter estimates
using linear least squares are most common, and many inexpensive pocket calculators
are able to do this. As we move away from fitting the familiar equation, y = a + bx to
data, we usually need to employ computer based programs such as spreadsheets, or
specialised statistical packages to do the ‘number crunching’. In situations where an
equation is complex, we may need to use nonlinear least squares to fit the equation to
experimental or observational data.
Nonlinear least squares is treated in this document with a focus on how Excel’s
Solver utility may be employed to perform this task. Though I had originally intended
to concentrate more or less exclusively on using Solver to carry out nonlinear least
squares (due to the general availability of Excel and the fact that I’d already written a
text discussing data analysis using Excel!), several other related topics emerged
including model identification, Monte Carlo simulations and uncertainty propagation.
I have included something about those topics in this document. In addition, I have
tried to include helpful worked examples to illustrate the techniques discussed.
I hope the document serves its purpose (I had senior undergraduates and graduates in
the physical sciences in mind when I wrote it) and I would appreciate any comments
as to what might have been included (or discarded).
CONTENTS

Section 1: Introduction
1.1 Reasons for fitting equations to data
Section 2: Linear Least squares
2.1 Standard errors in best estimates
Section 3: Extensions of the linear least squares technique
3.1 Using Excel to solve linear least squares problems
3.2 Limitations of linear least squares
Section 4: Excel's Solver add-in
4.1 Example of use of Solver
4.2 Limitations of Solver
4.3 Spreadsheet for the determination of standard errors in parameter estimates
4.4 Confidence intervals for parameter estimates
Section 5: More on fitting using nonlinear least squares
5.1 Local Minima in SSR
5.2 Starting values
5.3 Starting values by curve stripping
5.4 Effect of instrument resolution and noise on best estimates
5.4.1 Adding normally distributed noise to data using Excel's Random Number Generator
5.4.2 Fitting an equation to noisy data
5.4.3 Relationship between sampling density and parameter estimates
Section 6: Linear least squares meets nonlinear least squares
Section 7: Weighted nonlinear least squares
7.1 Weighted fitting using Solver
7.2 Example of weighted fitting using Solver
7.2.1 Best estimates of parameters using Solver
7.2.2 Determining the D matrix
7.2.3 The weight matrix, W
7.2.4 Calculation of (DᵀWD)⁻¹
7.2.5 Bringing it all together
Section 8: Uncertainty propagation, least squares estimates and calibration
8.1 Example of propagation of uncertainties involving parameter estimates
8.2 Uncertainties in derived quantities incorporating least squares estimates
8.3 Example of propagation of uncertainties in derived quantities
8.4 Uncertainty propagation and nonlinear least squares
8.4.1 Example of uncertainty propagation in parameter estimates obtained by nonlinear least squares
Section 9: More on Solver
9.1 Solver Options
9.2 Solver Results
Section 10: Modelling and Model Identification
10.1 Physical Modelling
10.2 Data driven approach to discovering relationships
10.3 Other forms of modelling
10.4 Competing models
10.5 Statistical Measures of Goodness of Fit
10.5.1 Adjusted Coefficient of Multiple Determination
10.5.2 Akaike's Information Criterion (AIC)
10.5.3 Example
Section 11: Monte Carlo simulations and least squares
11.1 Using Excel's Random Number Generator
11.2 Monte Carlo simulation and nonlinear least squares
11.3 Adding heteroscedastic noise using Excel's Random Number Generator
Section 12: Review
Acknowledgements
Problems
References
Section 1: Introduction
Scientists in all areas of the physical sciences search for defensible models that
describe the way nature works. As a part of that search they often investigate the
relationship between physical variables. As examples, they might want to know how the:
• electrical resistance of a superconductor depends on the temperature of the
superconductor.
• width of an absorption peak in liquid chromatography depends on the flow of
the mobile phase through a packed column.
• electrical permittivity of a solid depends on the moisture content in the solid.
• output voltage from a conductivity sensor depends on the electrical
conductivity of the liquid in which the sensor is immersed.
A model that explains or describes the relationship between physical variables may be
devised from first principles, or it may represent a new development of an established
model. Whatever the situation, once a model has been devised, it is prudent to
compare it to ‘real’ data obtained by experiment. One reason for doing this is to
establish whether predictions of the model are consistent with experimental data.
Consider a specific example in which nuclear radiation passes through material of
thickness, x. The relationship between the intensity, I, of the radiation and x can be
written,
I = I₀ exp(−µx) + B    (1.1)

I₀ is the intensity recorded in the absence of the material when the background radiation is negligible, µ is the absorption coefficient of the material and B is the background intensity.
The appropriateness (or otherwise) of equation 1.1 may be investigated for a
particular material by considering radiation intensity versus material thickness data, as
shown in figure 1.1.
[Figure: intensity (counts) plotted against thickness (cm), for thicknesses between 0.0 and 2.0 cm]
Figure 1.1: Intensity versus thickness data.
If equation 1.1 fairly describes the relationship between intensity and thickness, we should be able to find values for I₀, µ and B such that the line generated by equation 1.1, when x varies between x = 0.0 and x = 2.0, ‘fits’ the data shown in figure 1.1 (i.e. passes close to the data points). We could begin by making an intelligent guess at values for I₀, µ and B. Figure 1.2 shows the outcome of one attempt at guessing values for I₀, µ and B.
[Figure: the data of figure 1.1 with a line generated using equation 1.1, where I₀ = 800, µ = 1 and B = 10]
Figure 1.2: Line drawn through intensity versus thickness data using equation 1.1.
It would have been fortuitous had the guesses for I₀, µ and B given in figure 1.2 produced a line that passed close to the data. We could try other values for I₀, µ and B
and through a process of ‘trial and error’ improve the fit of the line to the data.
However, it must be admitted that this is an inefficient way to fit any equation to data
and that guesswork must give way to a better approach. This is the main consideration
of this document.
1.1 Reasons for fitting equations to data
It is possible to fit almost any equation to any data. However, a compelling reason for fitting an equation in the physical sciences is that it provides for an insightful interpretation of physical or chemical processes or phenomena. In particular, the fitting of an equation can assist in validating or refuting a theoretical model and allow for the determination of physically meaningful parameters¹.
The parameters in equation 1.1 have physical meaning. For example, µ is a quantity that characterises radiation absorption by a material. The
applicability of equation 1.1 to a particular material is likely to have been studied by
other workers. Therefore a value for µ as determined through analysing the data in
figure 1.1 may be compared to that reported by others.
There are situations in which an equation is fitted to data for the purpose of calibration
and no attempt is made to relate parameters in the equation to physical constants. For
example, the concentration of a particular chemical species might be determined using
Atomic Absorption Spectroscopy (AAS). An instrument is calibrated by measuring
the absorption of known concentrations of the species. A graph of absorption, y,
versus concentration, x, is plotted. The next step is to fit an equation to the data. Using
the equation, it is possible to determine species concentration from measurements
made of absorption.
¹ This issue is taken up again in section 10.
Section 2: Linear Least squares
Often in an experiment there is a known, expected or proposed relationship between
variables measured during the experiment. In perhaps the most common situation, the
relationship between the dependent (or response) variable, y, and the independent (or
predictor) variable, x, may be expressed as,
y = a + bx (2.1)
Equation 2.1 is the equation of a straight line with intercept, a, and slope, b.
In principle, we should be able to find the intercept and slope by drawing a straight
line through the points. In practice, the intercept and slope cannot be known exactly,
as this would require that we eliminate (or correct for) all sources of random and
systematic error in the data. This is not possible. If it were possible to eliminate all
sources of error, and assuming the relationship between x and y is linear, we could
write the ‘exact’ relationship between x and y as,
y = α + βx (2.2)
where α is the ‘true intercept’ and β is the ‘true slope’. α and β are often referred to as parameters² and, through applying techniques based on sound statistical principles, it is possible to establish best estimates of those parameters. We will represent the best estimates of α and β by the symbols a and b respectively³.
A powerful and widely used technique for establishing best estimates of parameters⁴ is that of least squares. The technique⁵ is versatile and allows parameters to be estimated when the relationship between x and y is more complex than that given by equation 2.1. For example, a, b and c in equations 2.3 to 2.5 may be determined using the technique of least squares.

• y = a + b/x + cx    (2.3)
• y = a + bx + cz (here both x and z are independent variables)    (2.4)
• y = a + b[1 − exp(−cx)]    (2.5)
In this discussion of least squares, the following assumptions are made:
1) There are no errors in the x values.
2) Errors in the y values are normally distributed with a mean of zero and a
constant variance. Constant variance errors are sometimes referred to as
homoscedastic errors.
3) Errors in the y values are uncorrelated, so that, for example, the error in the ith
y value is not correlated to the error in the (i+1)th y value.
² Sometimes referred to as population parameters or regression coefficients.
³ In some texts, best estimates of α and β are written as α̂ and β̂ respectively.
⁴ Refer to chapters 6 and 7 of Kirkup (2002) for more details.
⁵ The technique is also widely referred to as regression.
The ith observed y value is written as yᵢ and the ith value of x as xᵢ. The ith predicted y value found using the equation of the line is written as ŷᵢ, such that⁶,

ŷᵢ = a + bxᵢ    (2.6)

The least squares technique of fitting equations to data requires the calculation of (yᵢ − ŷᵢ)². We sum (yᵢ − ŷᵢ)² from i = 1 to i = n, where n is the number of data points. The summation is written⁷,

SSR = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²    (2.7)

SSR is the Sum of Squares of the Residuals⁸. Strictly, equation 2.7 applies to fitting by ‘unweighted’ least squares. Weighted least squares is considered in section 7.
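As a concrete illustration of equations 2.6 and 2.7, the calculation of SSR can be sketched in a few lines of Python (the data here are hypothetical and chosen so that the residuals can be checked by hand):

```python
# Sketch of equations 2.6 and 2.7: SSR for a candidate straight line.

def ssr(a, b, x, y):
    """Sum of squares of residuals for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data: three points on the line y = 2 + 3x, one point 1 above it.
x = [0, 1, 2, 3]
y = [2, 5, 8, 12]

print(ssr(2, 3, x, y))  # residuals are 0, 0, 0 and 1, so SSR = 1
```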
The next stage is to find values of a and b which minimise SSR in equation 2.7. This is the key step in any least squares analysis, as values of a and b that minimise SSR are regarded as the best estimates obtainable of the parameters in an equation⁹. Best estimates could be found by ‘trial and error’, or by a systematic numerical search using a computer. When a straight line is fitted to data, an equation for the best line can be found analytically by partially differentiating SSR with respect to a and b in turn, then setting the resulting equations equal to zero. The simultaneous equations obtained by this process are solved for a and b to give,
a = (Σxᵢ² Σyᵢ − Σxᵢ Σxᵢyᵢ) / (nΣxᵢ² − (Σxᵢ)²)    (2.8)

and,

b = (nΣxᵢyᵢ − Σxᵢ Σyᵢ) / (nΣxᵢ² − (Σxᵢ)²)    (2.9)
An elegant approach to determining a and b employs matrices. An added advantage of
the matrix approach is that it may be conveniently extended to situations in which
more complex equations are fitted to experimental data.
The equations to be solved for a and b can be expressed in matrix form as:
⁶ ŷᵢ is sometimes referred to as ‘y hat’.
⁷ In future we assume that all summations are carried out between i = 1 and i = n, and therefore we omit the limits of the summations.
⁸ yᵢ − ŷᵢ is referred to as the ith residual.
⁹ The process by which estimates are varied until some condition (such as the minimisation of SSR) is satisfied is often called ‘optimisation’.
[ n     Σxᵢ  ] [ a ]   [ Σyᵢ   ]
[ Σxᵢ   Σxᵢ² ] [ b ] = [ Σxᵢyᵢ ]    (2.10)
Equation 2.10 can be written concisely as,

AB = P    (2.11)

where,

A = [ n     Σxᵢ  ]    B = [ a ]    P = [ Σyᵢ   ]
    [ Σxᵢ   Σxᵢ² ]        [ b ]        [ Σxᵢyᵢ ]
To determine elements a and b of the matrix B, equation 2.11 is manipulated to give,

B = A⁻¹P    (2.12)

where A⁻¹ is the inverse matrix¹⁰ of the matrix A. Matrix inversion and matrix multiplication are onerous to perform manually, especially if the matrices are large. The built-in matrix functions in Excel are well suited to estimating parameters in linear least squares problems.
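A sketch of the matrix route of equations 2.10 to 2.12 (Python here, standing in for Excel's built-in matrix functions; the 2×2 inverse is written out explicitly, and the data are those of table 2.1 below):

```python
# Sketch of B = A^-1 P (equation 2.12) for a straight-line fit.

def inverse_2x2(m):
    """Inverse of a 2x2 matrix [[p, q], [r, s]]."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def fit_line_matrix(x, y):
    """Return (a, b, A_inverse) for the line y = a + b*x."""
    n = len(x)
    A = [[n, sum(x)], [sum(x), sum(xi * xi for xi in x)]]
    P = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]
    Ainv = inverse_2x2(A)
    a = Ainv[0][0] * P[0] + Ainv[0][1] * P[1]
    b = Ainv[1][0] * P[0] + Ainv[1][1] * P[1]
    return a, b, Ainv  # Ainv is reused for standard errors in section 2.1

a, b, Ainv = fit_line_matrix([2, 4, 6, 8, 10], [70, 63, 49, 42, 31])
print(round(a, 3), round(b, 3))  # → 80.7 -4.95
```

Returning A⁻¹ alongside a and b mirrors the role footnote 10 assigns to it: its diagonal elements feed the standard error calculation of equations 2.18 and 2.19.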
Exercise 1
Table 2.1 contains xy data which are shown plotted in figure 2.1.
Table 2.1: xy data.
x y
2 70
4 63
6 49
8 42
10 31
[Figure: y plotted against x for the data in table 2.1]
Figure 2.1: Linearly related xy data.
Using the data in table 2.1,
¹⁰ A⁻¹ is used in the calculation of the standard errors in parameter estimates and is sometimes referred to as the ‘error matrix’.
i) find best estimates for the intercept, a, and the slope, b, of a straight line fitted to the data using linear least squares [80.7, −4.95];
ii) draw the line of best fit through the points;
iii) calculate the sum of squares of residuals, SSR [9.9].
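The bracketed answers can be checked numerically; the sketch below applies equations 2.8, 2.9 and 2.7 directly to the data of table 2.1 (note that the slope is negative, since y decreases as x increases):

```python
# Checking the answers to exercise 1 with the data of table 2.1.
x = [2, 4, 6, 8, 10]
y = [70, 63, 49, 42, 31]
n = len(x)

sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

a = (sxx * sy - sx * sxy) / (n * sxx - sx ** 2)   # equation 2.8
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)     # equation 2.9
ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # equation 2.7

print(a, b, round(ssr, 3))  # → 80.7 -4.95 9.9
```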
2.1 Standard errors in best estimates
In addition to the best estimates, a and b, the standard errors in a and b are required, as this allows confidence intervals¹¹ to be quoted for the parameters α and β. Calculations of a and b depend on the measured y values. As a consequence, uncertainties in the y values contribute to the uncertainties in a and b. In order to calculate uncertainties in a and b, the usual starting point is to determine the standard errors in a and b, written as σa and σb respectively. σa and σb are given by¹²,
σa = σ (Σxᵢ² / Δ)^(1/2)    (2.13)

σb = σ (n / Δ)^(1/2)    (2.14)

where

Δ = nΣxᵢ² − (Σxᵢ)²    (2.15)

and

σ ≈ [ (1/(n − 2)) Σ(yᵢ − ŷᵢ)² ]^(1/2)    (2.16)
Alternatively, σa and σb may be determined using matrices¹³. The covariance matrix, V, contains elements which are the variances (as well as the covariances) of the best estimates a and b. V may be written¹⁴,

V = σ²A⁻¹    (2.17)

A⁻¹ appears in equation 2.12. σ² can be found using equation 2.16.
Standard errors in a and b are written explicitly as,

σa = σ (A⁻¹₁₁)^(1/2)    (2.18)

σb = σ (A⁻¹₂₂)^(1/2)    (2.19)
¹¹ See Kirkup (2002) p226.
¹² See Bevington and Robinson (1992).
¹³ See chapter 5 of Neter et al. (1996).
¹⁴ The covariance matrix is considered in more detail in section 9.
A⁻¹₁₁ and A⁻¹₂₂ are the diagonal elements of the A⁻¹ matrix¹⁵.
Exercise 2
Using matrices, or otherwise, determine the standard errors in the intercept and slope
of the best straight line through the data given in table 2.1. [1.9, 0.29]
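A sketch of the check, applying equations 2.13 to 2.16 to the data of table 2.1 (the intercept and slope are those found in exercise 1):

```python
# Checking exercise 2: standard errors in the intercept and slope.
import math

x = [2, 4, 6, 8, 10]
y = [70, 63, 49, 42, 31]
n = len(x)

sx = sum(x)
sxx = sum(xi * xi for xi in x)
delta = n * sxx - sx ** 2                                   # equation 2.15

a, b = 80.7, -4.95                                          # from exercise 1
sigma = math.sqrt(sum((yi - (a + b * xi)) ** 2
                      for xi, yi in zip(x, y)) / (n - 2))   # equation 2.16

sigma_a = sigma * math.sqrt(sxx / delta)                    # equation 2.13
sigma_b = sigma * math.sqrt(n / delta)                      # equation 2.14
print(round(sigma_a, 2), round(sigma_b, 2))  # → 1.91 0.29
```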
¹⁵ See Williams (1972).
Section 3: Extensions of the linear least squares technique
The technique of least squares used to fit equations to experimental data can be
extended in several ways:
• Weighting the fit. The assumption that the standard deviation in y values is the same for all values of x (a characteristic which is sometimes referred to as homoscedasticity¹⁶) may not be valid. When it is not valid, we need to ‘weight’ the fit, in effect forcing the line closer to those points that are known to higher precision. Weighted fitting is considered in section 7.
• More complex equations may be fitted to the data. Equations such as y = a + b/x + cx and y = a + bx + cx² are linear in the parameters and may be fitted using linear least squares. The added computational complexity, which can arise when there are more than two parameters to be estimated, favours fitting by matrix methods. These methods are most conveniently applied using a computer for matrix manipulation/inversion.
• Equations may be fitted using linear least squares in which the equations have more than one independent variable. As an example, the equation y = a + bx + cz may be fitted to data, where x and z are the independent variables (this is sometimes referred to as ‘multiple regression’).
3.1 Using Excel to solve linear least squares problems
Excel is capable of fitting functions to data that are linear in the parameters. This may be achieved by using one of the following features in Excel:
• The LINEST() function
• The Regression tool in the Analysis ToolPak
Excel has no built-in tool for performing weighted least squares, though a spreadsheet may be created to perform this procedure¹⁷.
Excel does not provide an easy-to-use utility for fitting an equation to data requiring the application of nonlinear least squares. However, with the aid of a powerful add-in called ‘Solver’ resident in Excel, fitting using nonlinear least squares is possible. We will deal with Solver in sections 4 and 9, but first we consider nonlinear least squares.
¹⁶ The condition where the variance in y values is not constant for all x is referred to as ‘heteroscedasticity’.
¹⁷ See Kirkup (2002), section 6.10.
3.2 Limitations of linear least squares
Quite complex functions can be fitted to data using linear least squares. As examples,

y = a + b ln x + c exp x    (3.1)

y = a + bx + c/x²    (3.2)

The equation to be fitted is inserted into equation 2.7. SSR is partially differentiated with respect to each parameter estimate in turn. The resulting equations are set equal to zero and solved to find best estimates of the parameters.
It is worth highlighting that the ‘linear’ in linear least squares does not mean that a plot of y versus x will produce a graph containing data which lie along a straight line. ‘Linear’ refers to the fact that the partial derivatives, ∂SSR/∂a, ∂SSR/∂b, etc., as described in section 2, are linear in the parameter estimates. Using this definition, equations 3.1 and 3.2 may be fitted to data using linear least squares.
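To illustrate the point, the sketch below fits equation 3.1, y = a + b ln x + c exp x, by solving the normal equations directly; the model is nonlinear in x but linear in the parameters. The data are hypothetical, generated exactly from a = 2, b = 3 and c = 0.5, so the fit should recover those values:

```python
# Fitting y = a + b*ln(x) + c*exp(x) (equation 3.1) by linear least squares.
import math

def solve3(M, v):
    """Solve the 3x3 system M beta = v by Gaussian elimination."""
    M = [row[:] + [vi] for row, vi in zip(M, v)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))  # partial pivoting
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            M[r] = [mr - f * mi for mr, mi in zip(M[r], M[i])]
    beta = [0.0] * 3
    for i in (2, 1, 0):
        beta[i] = (M[i][3] - sum(M[i][j] * beta[j]
                                 for j in range(i + 1, 3))) / M[i][i]
    return beta

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2 + 3 * math.log(xi) + 0.5 * math.exp(xi) for xi in x]  # exact data

# Basis functions: the model is linear in a, b and c.
cols = [[1.0] * len(x), [math.log(xi) for xi in x], [math.exp(xi) for xi in x]]
XtX = [[sum(ci * cj for ci, cj in zip(c1, c2)) for c2 in cols] for c1 in cols]
Xty = [sum(ci * yi for ci, yi in zip(c1, y)) for c1 in cols]

a, b, c = solve3(XtX, Xty)
print(round(a, 6), round(b, 6), round(c, 6))  # → 2.0 3.0 0.5
```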
Some relationships between physical variables require transformation before they are suitable for fitting by linear least squares. As an example, the variation of electrical resistance, R, with temperature, T, of some semiconductor materials is known to follow the relationship,

R = R₀ exp(γ/T)    (3.3)

where R₀ and γ are constants.

Taking natural logarithms of both sides of equation 3.3 and comparing the resulting equation with y = a + bx, we obtain,

ln R = ln R₀ + γ(1/T)    (3.4)

which has the form y = a + bx. Taking the y values to be ln R and the x values to be 1/T, least squares may be used to find best estimates for ln R₀ (and hence R₀) and γ. If the errors in R have constant variance, then after transformation the errors in ln R do not have constant variance. In this circumstance weighted fitting is required¹⁸.

Weighted fitting of equations using least squares matters most when the scatter in data is large. If data show small scatter, then the best estimates found using weighted least squares are very similar to the best estimates found using unweighted least squares.

¹⁸ See Dietrich (1991) p303.
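The transformation of equations 3.3 and 3.4 can be sketched as follows. The data are hypothetical, generated exactly from R₀ = 100 Ω and γ = 3000 K; with no noise added, the weighting issue mentioned above does not arise:

```python
# Fit ln(R) against 1/T (equation 3.4) with ordinary least squares.
import math

T = [250.0, 275.0, 300.0, 325.0, 350.0]          # temperatures in K
R = [100 * math.exp(3000 / Ti) for Ti in T]      # equation 3.3, exact data

xs = [1 / Ti for Ti in T]                        # x = 1/T
ys = [math.log(Ri) for Ri in R]                  # y = ln R

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(xi * xi for xi in xs)
sxy = sum(xi * yi for xi, yi in zip(xs, ys))

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)    # slope = estimate of gamma
a = (sy - b * sx) / n                            # intercept = estimate of ln R0

print(round(b, 3), round(math.exp(a), 3))  # → 3000.0 100.0
```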
Though transforming equations can assist in many situations, there are some equations that cannot be transformed into a form suitable for fitting by linear least squares. As examples,

y = a + bx²/(c + x)    (3.5)
y = a + b exp(cx)    (3.6)

y = a + b[1 − exp(−cx)]    (3.7)

y = a exp(bx) + c exp(dx)    (3.8)

For equations 3.5 to 3.8 it is not possible to obtain a set of linear equations that may be solved for best estimates of the parameters. We must therefore resort to another method of finding best estimates. That method still requires that parameter estimates are found that minimise SSR.
SSR may be considered to be a continuous function of the parameter estimates. A surface may be constructed, sometimes referred to as a hypersurface¹⁹, in M-dimensional space, where M is the number of parameters appearing in the equation to be fitted to data. The intention is to use nonlinear least squares to discover estimates a, b, c, etc. which yield a minimum in the hypersurface. As with linear least squares, these estimates are regarded as the best estimates of the parameters in the equation. Figure 3.1 shows a hypersurface which depends on estimates a and b.

[Figure: SSR plotted as a surface above the (a, b) plane, with the minimum in the hypersurface marked]
Figure 3.1: Variation of SSR as a function of parameter estimates, a and b. This figure is adapted from rcs.chph.ras.ru/nlr.ppt by Alexey Pomerantsev.

¹⁹ See Bevington and Robinson (1992).
Fitting by nonlinear least squares begins with reasonable guesses for the best
estimates of the parameters. The objective is to modify the starting values in an
iterative fashion until a minimum is found in SSR. The computational complexity of
the iteration process means that nonlinear least squares can only realistically be
carried out using a computer.
There are many documented ways in which the values of a, b, c, etc. can be found which minimise SSR, including the Grid Search (Bevington and Robinson, 1992), the Gauss-Newton method (Nielsen-Kudsk, 1983) and the Marquardt algorithm (Bates and Watts, 1988).

Nonlinear least squares is unnecessary when the derivatives of SSR with respect to the parameters are linear in the parameters. In this situation linear least squares offers a more efficient route to determining best estimates of the parameters (and the standard errors in the best estimates). Nevertheless, a linear equation can be fitted to data using nonlinear least squares. The answers obtained for the best estimates of the parameters and the standard errors in the best estimates should agree, irrespective of whether a linear equation is fitted using linear or nonlinear least squares²⁰.
²⁰ We consider this in more detail in section 6.
Section 4: Excel's Solver add-in

Solver, first introduced in 1991, is one of the many ‘add-ins’ available in Excel²¹. Originally designed for business users, Solver is a powerful and flexible optimisation tool which is capable of finding (as an example) the best estimates of parameters using least squares. It does this by iteratively altering the numerical values of variables contained in the cells of a spreadsheet until SSR is minimised. To solve nonlinear problems, Solver uses Generalized Reduced Gradient (GRG2) code developed at the University of Texas and Cleveland State University²². Features of Solver are best described by reference to a particular example.
4.1 Example of use of Solver
Consider an experiment in which the rise of air temperature in an enclosure (such as a
room) is measured as a function of time as heat passes through a window into the
enclosure. Table 4.1 contains the raw data. Figure 4.1 displays the same data in
graphical form.
Table 4.1: Variation of air temperature in an enclosure with time.
Time (minutes) Temperature (°C)
2 26.1
4 26.8
6 27.9
8 28.6
10 28.5
12 29.3
14 29.8
16 29.9
18 30.1
20 30.4
22 30.6
24 30.7
²¹ See Fylstra et al. (1998).
²² See Excel's online Help. See also Smith and Lasdon (1992).
[Figure: temperature (°C) versus time (minutes) for the data in table 4.1]
Figure 4.1: Temperature variation with time inside an enclosure.
Through a consideration of the flow of heat into and out of an enclosure, a relationship may be derived for the air temperature, T, inside the enclosure as a function of time, t. The relationship can be expressed,

T = Tₛ + k[1 − exp(−αt)]    (4.1)

where Tₛ, k and α are constants. Equation 4.1 may be written in a form consistent with other equations appearing in this document. Using x and y for the independent and dependent variables respectively, and a, b and c for the parameter estimates, equation 4.1 becomes²³,

y = a + b[1 − exp(−cx)]    (4.2)
To find best estimates, a, b and c, we proceed as follows:
1. Enter the raw data from table 4.1 into columns A and B of an Excel worksheet as shown in sheet 4.1.
2. Type =$B$15+$B$16*(1-EXP(-$B$17*A2)) into cell C2 as shown in sheet 4.1. Cells B15 to B17 contain the starting values for a, b and c respectively.
3. Use the cursor to highlight cells C2 to C13.
4. Click on the Edit menu. Click on the Fill option, then click on the Down option²⁴.

²³ Equation 4.2 is of the same form as that fitted to data obtained through fluorescence decay measurements, where the decay is characterised by a single time constant – see Walsh and Diamond (1995).
²⁴ These steps are often abbreviated in Excel texts to Edit → Fill → Down.
Sheet 4.1: Temperature (y) and time (x) data from table 4.1 entered into a spreadsheet²⁵.

A B C
1 x (mins) y (°C) ŷ (°C)
2 2 26.1 =$B$15+$B$16*(1-EXP(-$B$17*A2))
3 4 26.8
4 6 27.9
5 8 28.6
6 10 28.5
7 12 29.3
8 14 29.8
9 16 29.9
10 18 30.1
11 20 30.4
12 22 30.6
13 24 30.7
14
15 a 1
16 b 1
17 c −1
Sheet 4.2 shows the values returned in the C column. As the squares of the residuals
are required, these are calculated in column D.
Sheet 4.2: Calculation of sum of squares of residuals.
A B C D
1 x (mins) y (°C) ŷ (°C) (y − ŷ)² (°C²)
2 2 26.1 −5.38906 991.560654
3 4 26.8 −52.5982 6304.066229
4 6 27.9 −401.429 184323.2129
5 8 28.6 −2978.96 9045405.045
6 10 28.5 −22024.5 486333300.3
7 12 29.3 −162753 26498009287
8 14 29.8 −1202602 1.44632E+12
9 16 29.9 −8886109 7.89635E+13
10 18 30.1 −6.6E+07 4.31124E+15
11 20 30.4 −4.9E+08 2.35385E+17
12 22 30.6 −3.6E+09 1.28516E+19
13 24 30.7 −2.6E+10 7.01674E+20
14 SSR = 7.14765E+20
15 a 1
16 b 1
17 c −1
The sum of the squares of the residuals, SSR, is calculated in cell D14 by summing the contents of cells D2 through to D13. It is clear that the choices of starting values for a, b and c are poor, as the predicted values, ŷ, in column C of sheet 4.2 bear no resemblance to the experimental values in column B. As a consequence, SSR is very large. Choosing good starting values for parameter estimates is often crucial to the success of fitting equations using nonlinear least squares and we will return to this issue later.

²⁵ The estimated values of the dependent variable based on an equation like equation 4.2 must be distinguished from values obtained through experiment. Estimated values are represented by the symbol ŷ and experimental values by the symbol y.
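The very large SSR of sheet 4.2 can be reproduced outside the spreadsheet. A sketch, taking the starting values of sheet 4.2 to be a = 1, b = 1 and c = −1 (the value of c implied by the residuals in column D):

```python
# Reproducing the SSR of sheet 4.2 for the poor starting values.
import math

x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
y = [26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9, 30.1, 30.4, 30.6, 30.7]

def ssr(a, b, c):
    """SSR for the model of equation 4.2, y-hat = a + b*(1 - exp(-c*x))."""
    return sum((yi - (a + b * (1 - math.exp(-c * xi)))) ** 2
               for xi, yi in zip(x, y))

print('%.5e' % ssr(1, 1, -1))  # → 7.14765e+20, as in cell D14 of sheet 4.2
```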
SSR in cell D14 is reduced by carefully altering the contents of cells B15 through to B17. Solver is able to adjust the parameter estimates in cells B15 to B17 until the number in cell D14 is minimised. To accomplish this, choose Tools on Excel's Menu bar and pull down to Solver. If Solver does not appear, then on the same pull-down menu, select Add-Ins and tick the Solver Add-in box. After a short delay, Solver should be added to the Tools pull-down menu.
Click on Solver. The dialog box shown in figure 4.2 should appear.
Figure 4.2: Solver dialog box with cell references inserted.

[Annotations to figure 4.2: the value in cell D14 is to be minimised, so D14 becomes the ‘target’ cell. Solver is capable of adjusting cell contents such that the value in the target cell is maximised, minimised or reaches a specified value; for least squares analysis we require the content of the target cell to be minimised. Excel alters the values in cells B15 to B17 in order to minimise the value in cell D14. It is possible to constrain the values in one or more cells (for example, a parameter estimate can be prevented from assuming a negative value if a negative value is considered to be ‘unphysical’); no constraints are applied in this example.]

After entering the information into the dialog box, click on the Solve button. After a few seconds Solver returns with the dialog box shown in figure 4.3.
Figure 4.3: Solver dialog box indicating that fitting has been completed.
Inspection of cells B15 to B17 in the spreadsheet indicates that Solver has adjusted
the parameters. Sheet 4.3 shows the new parameters, SSR, etc.
Sheet 4.3: Best values for a, b and c returned by Solver when starting values are poor.
A B C D
1 x (mins) y (°C) ŷ (°C) (y − ŷ)² (°C²)
2 2 26.1 24.75011 1.822210907
3 4 26.8 27.99648 1.431570819
4 6 27.9 29.10535 1.452872267
5 8 28.6 29.48411 0.781649268
6 10 28.5 29.61348 1.239842411
7 12 29.3 29.65767 0.127929367
8 14 29.8 29.67277 0.01618844
9 16 29.9 29.67792 0.049318684
10 18 30.1 29.67968 0.176666436
11 20 30.4 29.68028 0.517990468
12 22 30.6 29.68049 0.845498795
13 24 30.7 29.68056 1.039257719
14 SSR = 9.500995581
15 a 15.24587
16 b 14.43473
17 c 0.5371
18
SSR in cell D14 in sheet 4.3 is almost 20 orders of magnitude smaller than that in cell
D14 in sheet 4.2. However, all is not as satisfactory as it might seem. Consider the
best line through the points which utilises the parameter estimates in cells B15
through to B17 of sheet 4.3.
Figure 4.4: Graph of y versus x showing the line based on equation 4.2, where a, b and c have the values given in sheet 4.3.

A plot of residuals (i.e. a plot of (yᵢ − ŷᵢ) versus xᵢ) is often used as an indicator of the ‘goodness of fit’ of an equation to data, with trends in the residuals indicating a poor fit²⁶. However, no plot of residuals is required in this case to reach the conclusion that the line on the graph in figure 4.4 is not a good fit to the experimental data. Solver has found a minimum in SSR, but this is a local minimum²⁷ and the parameter estimates are of little worth. The source of the problem can be traced to the poorly chosen starting values (i.e. a = b = 1, c = −1). Working from these initial estimates, Solver has discovered a minimum in SSR. However, there is another combination of parameter estimates that will produce an even lower value for SSR.
Methods by which good starting values for parameter estimates may be obtained are considered in section 5.2. In the example under consideration here, we note (by reference to equation 4.2) that when x = 0, y = a. Drawing a line ‘by eye’ through the data in figure 4.1 indicates that, when x = 0, y ≈ 25.5 °C. Starting values for b and c may also be established by a similar preliminary analysis of the data, which we will consider in section 5.2. Denoting the starting values by a₀, b₀ and c₀, we find²⁸,

a₀ = 25.5, b₀ = 5.5 and c₀ = 0.12

Inserting these values into sheet 4.2 and running Solver again gives the output shown in sheet 4.4 and in graphical form in figure 4.5.
²⁶ See Cleveland (1994) and Kirkup (2002) for a discussion of residuals.
²⁷ Local minima are discussed in section 5.1.
²⁸ All parameter estimates in this example have units (for example, the unit of c is min⁻¹, assuming time is measured in minutes). For convenience, units are omitted until the analysis is complete.
[Figure 4.4 plot: y versus x, 0 ≤ x ≤ 25, with the line ŷ = 15.25 + 14.43[1 − exp(−0.5371x)]]
Sheet 4.4: Best values for a, b and c returned by Solver when starting values for parameter estimates are good.

	A	B	C	D
1	x (mins)	y (°C)	ŷ (°C)	(y − ŷ)² (°C²)
2	2	26.1	26.07247	0.000757691
3	4	26.8	26.97734	0.031447922
4	6	27.9	27.72762	0.029716516
5	8	28.6	28.34972	0.062639751
6	10	28.5	28.86555	0.133625786
7	12	29.3	29.29326	4.54949E−05
8	14	29.8	29.64789	0.023136202
9	16	29.9	29.94195	0.001759666
10	18	30.1	30.18577	0.007356121
11	20	30.4	30.38793	0.00014558
12	22	30.6	30.55556	0.001974583
13	24	30.7	30.69456	2.96361E−05
14				SSR = 0.29263495
15	a	24.98118
16	b	6.387988
17	c	0.093668
Figure 4.5: Graph of y versus x showing line and equation of line based on a, b and c in sheet 4.4.
The sum of squares of residuals in cell D14 of sheet 4.4 is less than that in cell D14 of sheet 4.3. This indicates that the parameter estimates obtained using Solver with good starting values are rather better than those obtained when the starting values are poorly chosen. In addition, the line fitted to the data in figure 4.5 (where the line is based upon the new best estimates of the parameters) is far superior to the line fitted to the same data shown in figure 4.4. This is further reinforced by the plot of residuals shown in figure 4.6, which exhibits a random scatter about the x axis.
The equation of the line in figure 4.5 is ŷ = 24.98 + 6.388[1 − exp(−0.09367x)].
Figure 4.6: Plot of residuals, (y − ŷ) versus x, based on the data and equation in figure 4.5.
4.2 Limitations of Solver
Solver is able to efficiently solve for the best estimates of parameters in an equation,
such as those appearing in equation 4.2. However, Solver does not provide standard
errors in the parameter estimates. Standard errors in estimates are extremely
important, as without them it is not possible to quote a confidence interval for the
estimates and so we cannot decide if the estimates are ‘good enough’ for any
particular purpose.
If there are three parameters to be estimated, the standard errors in the parameter estimates can be determined with the assistance of the matrix of partial derivatives given by²⁹,

E =
⎡ Σ(∂yᵢ/∂a)²          Σ(∂yᵢ/∂a)(∂yᵢ/∂b)   Σ(∂yᵢ/∂a)(∂yᵢ/∂c) ⎤
⎢ Σ(∂yᵢ/∂b)(∂yᵢ/∂a)   Σ(∂yᵢ/∂b)²          Σ(∂yᵢ/∂b)(∂yᵢ/∂c) ⎥		(4.3)
⎣ Σ(∂yᵢ/∂c)(∂yᵢ/∂a)   Σ(∂yᵢ/∂c)(∂yᵢ/∂b)   Σ(∂yᵢ/∂c)² ⎦
The standard errors in a, b and c are obtained from the diagonal elements of the covariance matrix, V, given by equation 2.17. Explicitly³⁰,

σ_a = σ(E⁻¹₁₁)^½		(4.4)

σ_b = σ(E⁻¹₂₂)^½		(4.5)

29 Note that this approach can be extended to any number of parameters. See Neter et al. (1996), chapter 13.
30 Compare these with equations 2.18 and 2.19.
σ_c = σ(E⁻¹₃₃)^½		(4.6)

where³¹,

σ ≈ [(1/(n − 3)) Σ(yᵢ − ŷᵢ)²]^½		(4.7)

A convenient way to calculate the elements of the E matrix is to write,

E = DᵀD		(4.8)

where Dᵀ is the transpose of the matrix D.
D is given by,

D =
⎡ ∂y₁/∂a   ∂y₁/∂b   ∂y₁/∂c ⎤
⎢ ∂y₂/∂a   ∂y₂/∂b   ∂y₂/∂c ⎥
⎢ …        …        …      ⎥		(4.9)
⎢ ∂yᵢ/∂a   ∂yᵢ/∂b   ∂yᵢ/∂c ⎥
⎢ …        …        …      ⎥
⎣ ∂yₙ/∂a   ∂yₙ/∂b   ∂yₙ/∂c ⎦
The partial derivatives in equation 4.9 are evaluated on completion of fitting an equation using Solver, i.e. at the values of a, b and c that minimise SSR. It is possible in some situations to determine the partial derivatives analytically. A more flexible approach, and one that is generally more convenient, is to use the method of ‘finite differences’ to find ∂y₁/∂a, ∂y₂/∂a, etc. In general,

(∂yᵢ/∂a)|_{b,c,xᵢ} ≈ [y((1 + δ)a, b, c, xᵢ) − y(a, b, c, xᵢ)] / [(a + δa) − a]		(4.10)

As double precision arithmetic is used by Excel, the perturbation, δ, in equation 4.10 can be as small as δ = 10⁻⁶ or 10⁻⁷.
Similarly, the partial derivatives ∂yᵢ/∂b and ∂yᵢ/∂c are approximated using,

(∂yᵢ/∂b)|_{a,c,xᵢ} ≈ [y(a, (1 + δ)b, c, xᵢ) − y(a, b, c, xᵢ)] / [(b + δb) − b]		(4.11)
31 n − 3 appears in the denominator of the term in the square brackets of equation 4.7 because the estimate of the population standard deviation in the y values requires that the sum of squares of residuals be divided by the number of degrees of freedom. The number of degrees of freedom is the number of data points, n, minus the number of parameters, p, in the equation. In this example, p = 3.
and,

(∂yᵢ/∂c)|_{a,b,xᵢ} ≈ [y(a, b, (1 + δ)c, xᵢ) − y(a, b, c, xᵢ)] / [(c + δc) − c]		(4.12)
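Equations 4.10–4.12 are easy to check numerically. The sketch below (Python; the best estimates of a, b and c are taken from sheet 4.4, and equation 4.2 is assumed here to have the form y = a + b[1 − exp(−cx)]) evaluates the three partial derivatives at x₁ = 2 with δ = 10⁻⁶, for comparison with the first data row of sheet 4.6:

```python
import math

def y(a, b, c, x):
    # Equation 4.2, assumed here to be y = a + b[1 - exp(-cx)].
    return a + b * (1.0 - math.exp(-c * x))

# Best estimates from sheet 4.4 and the first x value from table 4.1.
a, b, c, x1 = 24.98118, 6.387988, 0.093668, 2.0
delta = 1e-6   # fractional perturbation, as in equations 4.10-4.12

dy_da = (y((1 + delta) * a, b, c, x1) - y(a, b, c, x1)) / (delta * a)
dy_db = (y(a, (1 + delta) * b, c, x1) - y(a, b, c, x1)) / (delta * b)
dy_dc = (y(a, b, (1 + delta) * c, x1) - y(a, b, c, x1)) / (delta * c)

print(dy_da, dy_db, dy_dc)   # approx 1, 0.17084, 10.5934 (cf. sheet 4.6)
```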
4.3 Spreadsheet for the determination of standard errors in parameter estimates
In an effort to clarify the process of estimating standard errors, we describe a step-by-step approach using an Excel spreadsheet³².
To find good approximations of the derivatives ∂y₁/∂a, ∂y₂/∂a, etc., it is necessary to perturb a slightly (say to 1.000001×a) while leaving the parameter estimates b and c at their optimum values. Sheet 4.5 shows the optimum values, as obtained by Solver, for a, b and c in cells G20 to G22. Cell H20 contains the value 1.000001×a, cell I21 contains the value 1.000001×b and cell J22 contains the value 1.000001×c.
Sheet 4.5: Modification of best estimates of parameters.
F G H I J
19 from solver b,c constant a,c constant a,b constant
20 a 24.98118 24.98120574 24.98118076 24.98118
21 b 6.387988 6.387988103 6.387994491 6.387988
22 c 0.093668 0.093668158 0.093668158 0.09367
23
We use the modified parameter estimates to calculate the numerator in equation 4.10. The denominator in equation 4.10 may be determined by entering the formula =$H$20-$G$20 into a cell on the spreadsheet.
The partial derivative (∂y₁/∂a)|_{b,c,x₁} is calculated by entering the formula =(H2-C2)/($H$20-$G$20) into cell L2 of sheet 4.6³³. By using Fill→Down, the formula may be copied into cells in the L column so that the partial derivative is calculated for every xᵢ. To obtain ∂yᵢ/∂b and ∂yᵢ/∂c, this process is repeated for columns M and N respectively of sheet 4.6. The contents of cells L2 to N13 become the elements of the D matrix given by equation 4.9.
32 It is possible to combine these steps into a macro or Visual Basic program (see Walkenbach, 2001).
33 The values in the C column of the spreadsheet are shown in sheet 4.4.
Sheet 4.6: Calculation of partial derivatives.

	H	I	J	K	L	M	N
1	ŷ with b,c constant	ŷ with a,c constant	ŷ with a,b constant		dy/da	dy/db	dy/dc
2 26.07249879 26.0724749 26.07247 1 0.17084 10.5934
3 26.9773606 26.97733762 26.97734 1 0.31249 17.5673
4 27.72764019 27.72761795 27.72762 1 0.42994 21.8493
5 28.34974564 28.34972402 28.34972 1 0.52732 24.1556
6 28.86557359 28.86555249 28.86555 1 0.60807 25.0362
7 29.29327999 29.29325932 29.29326 1 0.67503 24.911
8 29.64791909 29.64789877 29.6479 1 0.73055 24.0978
9 29.94197336 29.94195334 29.94195 1 0.77658 22.8355
10 30.18579282 30.18577304 30.18577 1 0.81475 21.3012
11 30.38795933 30.38793976 30.38794 1 0.84639 19.6247
12 30.5555887 30.55556929 30.55557 1 0.87264 17.8993
13 30.69458107 30.69456181 30.69456 1 0.89439 16.1907
Excel’s TRANSPOSE() function is used to transpose the D matrix. We proceed as follows:
• Highlight cells B24 to N26.
• In cell B24 type =TRANSPOSE(L2:N13).
• Press Ctrl+Shift+Enter to place the transpose of the contents of cells L2 to N13 into cells B24 to N26.
• Multiply Dᵀ with D (using the MMULT() matrix function in Excel) to give E, i.e.,

E = DᵀD =
⎡ 12        7.65898    246.062 ⎤
⎢ 7.65898   5.49356    160.8741 ⎥		(4.13)
⎣ 246.062   160.8741   5252.64 ⎦
The MINVERSE() function in Excel is used to find the inverse of E, i.e.,

E⁻¹ =
⎡ 2.239517   −0.48539    −0.09005 ⎤
⎢ −0.48539   1.870672    −0.03456 ⎥		(4.14)
⎣ −0.09005   −0.03456    0.0054669 ⎦
Two more steps are required to calculate the standard errors in the parameter estimates. The first is to calculate the square root of each diagonal element of the matrix E⁻¹. The second is to calculate σ using equation 4.7. Using the sum of squares of residuals appearing in cell D14 of sheet 4.4, we obtain,

σ ≈ [(1/(12 − 3)) × 0.2926]^½ = 0.1803
It follows that,

σ_a = σ(E⁻¹₁₁)^½ = 0.1803 × (2.240)^½ = 0.270		(4.15)

σ_b = σ(E⁻¹₂₂)^½ = 0.1803 × (1.871)^½ = 0.247		(4.16)

σ_c = σ(E⁻¹₃₃)^½ = 0.1803 × (0.005467)^½ = 0.0133		(4.17)
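The whole calculation of sections 4.2 and 4.3 can be collected into a few lines. The sketch below (Python with NumPy; the model form y = a + b[1 − exp(−cx)] and the numbers are taken from sheet 4.4, with analytic partial derivatives standing in for the finite differences) builds D as in equation 4.9, forms E = DᵀD, inverts it, and applies equations 4.4–4.7:

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24], dtype=float)
y = np.array([26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9,
              30.1, 30.4, 30.6, 30.7])
a, b, c = 24.98118, 6.387988, 0.093668   # best estimates from sheet 4.4

e = np.exp(-c * x)
yhat = a + b * (1.0 - e)

# D matrix (equation 4.9): one row per data point, one column per parameter.
D = np.column_stack([np.ones_like(x), 1.0 - e, b * x * e])
E = D.T @ D                 # equation 4.8
Einv = np.linalg.inv(E)

n, p = len(x), 3
sigma = np.sqrt(np.sum((y - yhat) ** 2) / (n - p))   # equation 4.7

std_errs = sigma * np.sqrt(np.diag(Einv))            # equations 4.4-4.6
print(sigma, std_errs)      # approx 0.180, [0.270, 0.247, 0.0133]
```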
4.4 Confidence intervals for parameter estimates
We use parameter estimates and their respective standard errors to quote a confidence interval for each parameter³⁴. For the parameters appearing in equation 4.1,

T_s = a ± t_{X%,ν}σ_a		(4.18)

k = b ± t_{X%,ν}σ_b		(4.19)

α = c ± t_{X%,ν}σ_c		(4.20)

t_{X%,ν} is the critical value of the t distribution for the X% confidence level with ν degrees of freedom. t values are routinely tabulated in statistical texts. In this example ν = n − 3, where n is the number of data points. In table 4.1 there are 12 points, so that ν = 9. If we choose a confidence level of 95% (the commonly chosen level),

t_{95%,9} = 2.262

Restoring the units of measurement and quoting 95% confidence intervals gives,

T_s = (24.98 ± 2.262 × 0.270) °C = (24.98 ± 0.61) °C
k = (6.388 ± 2.262 × 0.247) °C = (6.39 ± 0.56) °C
α = (0.09367 ± 2.262 × 0.0133) min⁻¹ = (0.094 ± 0.030) min⁻¹
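The interval arithmetic above is simple to reproduce. A minimal sketch (Python; the critical value 2.262 is taken from the text rather than computed, and the parameter names are those of equation 4.1):

```python
# Best estimates and standard errors from equations 4.15-4.17.
estimates = {"Ts": (24.98, 0.270), "k": (6.388, 0.247), "alpha": (0.09367, 0.0133)}
t = 2.262   # critical t value for 95% confidence, nu = 9 degrees of freedom

for name, (value, std_err) in estimates.items():
    half_width = t * std_err    # half-width of the confidence interval
    print(f"{name} = {value} +/- {half_width:.2f}")
# half-widths come out near 0.61, 0.56 and 0.03, as in the text
```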
Exercise 3
The amount of heat entering an enclosure through a window may be reduced by
applying a reflective coating to the window. An experiment is performed to establish
the effect of a reflective coating on the rise in air temperature within the enclosure.
The temperature within the enclosure as a function of time is shown in table 4.2.
Time (minutes) Temperature (°C)
2 24.9
4 25.3
6 25.4
8 25.8
10 26.0
12 26.3
14 26.4
16 26.6
18 26.5
20 26.8
22 27.0
24 26.9
Table 4.2: Data for exercise 3.
34 See Kirkup (2002), p226.

Fit equation 4.2 to the data in table 4.2. Find a, b, and c and their respective standard errors. Note that good starting values for parameter estimates are required if fitting by nonlinear least squares is to be successful.
[a = 24.503 °C, σ_a = 0.128 °C, b = 3.0613 °C, σ_b = 0.227 °C, c = 0.0682 min⁻¹, σ_c = 0.0147 min⁻¹]
Section 5: More on fitting using nonlinear least squares
There are several challenges to face when fitting equations to data using nonlinear
least squares. These can be summarised as,
1) Choosing an appropriate model to describe the relationship between x and y.
2) Avoiding local minima in SSR.
3) Establishing good starting values prior to fitting by nonlinear least squares.
We consider 2) and 3) in this section. Model identification is considered in section 10.
5.1 Local Minima in SSR
When data are noisy, or starting values are far from the best estimates, a nonlinear
least squares fitting routine can become ‘trapped’ in a local minimum.
To illustrate this situation, we draw on the analysis of data appearing in section 4.1.
Equation 4.2 is fitted to the data in table 4.1 using the starting values given in sheet
4.2 and the best estimates, a, b and c are obtained for the parameters. For clarity, the
relationship between only one parameter estimate (c) and SSR is considered. Solver
finds a minimum in SSR when c is about 0.53 and terminates the fitting procedure.
The variation of SSR with c is shown in figure 5.1.
The minimum in SSR in figure 5.1 is referred to as a local minimum as there is
another combination of parameter estimates that will give a lower value for SSR. The
lowest value of SSR obtainable corresponds to a global minimum. It is the global
minimum that we would like to identify in all least squares problems.
Figure 5.1: Variation of SSR with c when a local minimum has been found when equation 4.2 is fitted to data.
When starting values are used that are closer to the final values³⁵, the nonlinear fitting routine finds parameter estimates that produce a lower final value for SSR. Figure 5.2 shows the variation of SSR with c in the interval 0.04 < c < 0.18.

35 See section 4.1.
Figure 5.2: Variation of SSR with c when a global minimum has been found when
equation 4.2 is fitted to data.
A number of indicators can assist in identifying a local minimum, though there is no
‘foolproof’ way of deciding whether a local or global minimum has been discovered.
A good starting point is to plot the raw data along with the fitted line (as illustrated in
figure 4.4). A poor fit of the line to the data could indicate,
• A local minimum has been found.
• An inappropriate model has been fitted to the data.
When a local minimum in SSR is found, the standard errors in the parameter estimates tend to be large. As an example, the best estimates appearing in sheet 4.3 (resulting from being trapped in a local minimum), their respective standard errors and the magnitude of the ratio of these quantities (expressed as a percentage) are,

a = 15.25, σ_a = 9.27, so that σ_a/a × 100% = 61%
b = 14.43, σ_b = 9.17, so that σ_b/b × 100% = 64%
c = 0.5371, σ_c = 0.284, so that σ_c/c × 100% = 53%
When the global minimum in SSR is found (see sheet 4.4), the best estimates of the parameters, standard errors etc. are,

a = 24.98, σ_a = 0.270, so that σ_a/a × 100% = 1.1%
b = 6.388, σ_b = 0.247, so that σ_b/b × 100% = 3.9%
c = 0.09367, σ_c = 0.0133, so that σ_c/c × 100% = 14%
There is merit in fitting the same equation to data several times, each time using
different starting values for the parameter estimates. If, after fitting, there is
consistency between the final values obtained for the best estimates, then it is likely
that the global minimum has been identified.
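The restart strategy just described is easy to automate. The sketch below (Python with NumPy; a basic Gauss–Newton refit standing in for Solver, with equation 4.2 assumed to have the form y = a + b[1 − exp(−cx)] and the data taken from table 4.1) tries several starting values for c and keeps the fit with the lowest SSR; runs that diverge are simply discarded:

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24], dtype=float)
y = np.array([26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9,
              30.1, 30.4, 30.6, 30.7])

def fit(p0, n_iter=100):
    """Basic Gauss-Newton fit of y = a + b*(1 - exp(-c*x)); returns (params, SSR)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        a, b, c = p
        e = np.exp(-c * x)
        r = y - (a + b * (1.0 - e))                       # residuals
        J = np.column_stack([np.ones_like(x), 1.0 - e, b * x * e])
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        p = p + step
    ssr = float(np.sum((y - (p[0] + p[1] * (1.0 - np.exp(-p[2] * x)))) ** 2))
    return p, ssr

best = None
for c0 in (0.05, 0.1, 0.3, 0.6):       # several starting values for c
    try:
        p, ssr = fit([25.0, 6.0, c0])
        if np.isfinite(ssr) and (best is None or ssr < best[1]):
            best = (p, ssr)
    except np.linalg.LinAlgError:       # discard runs that diverge
        pass

print(best)   # lowest SSR is approx 0.293, i.e. the global minimum
```

If all restarts agree on the final SSR and parameter values, the global minimum has very likely been found; disagreement signals local minima of the kind shown in figure 5.1.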
5.2 Starting values
There are no general rules that may be applied in order to determine good starting values³⁶ for parameter estimates prior to fitting by nonlinear least squares. It is correct, but sometimes unhelpful, to remark that familiarity with the relationship being studied can assist greatly in deciding what might be reasonable starting values for parameter estimates.
A useful approach to determining starting values is to begin by plotting the experimental data. Consider the data in figure 5.3, which has a smooth line drawn through the points ‘by eye’.
Figure 5.3: Line drawn ‘by eye’ through the data given in table 4.1. The line indicates y ≈ 25.5 °C at x = 0 and y ≈ 31 °C at large x.
If the relationship between x and y is given by equation 5.1, then we are able to estimate a and b by considering the data in figure 5.3 and a ‘rough’ line drawn through the data.

y = a + b[1 − exp(cx)]		(5.1)

Equation 5.1 predicts that y = a when x is equal to zero. From figure 5.3 we see that when x = 0, y ≈ 25.5 °C, so that a ≈ 25.5 °C. When x is large (and assuming c is negative), y = a + b. Inspection of the graph in figure 5.3 indicates that when x is large, y ≈ 31.0 °C, i.e. a + b ≈ 31.0 °C. It follows that b ≈ 5.5 °C. If we write the starting values for a and b as a₀ and b₀ respectively, then a₀ = 25.5 °C and b₀ = 5.5 °C.
In order to determine a starting value for c, c₀, equation 5.1 is rearranged into the form³⁷,

36 Sometimes referred to as initial estimates.
37 Starting values, a₀ and b₀, are substituted into the equation.
ln(1 − (y − a₀)/b₀) = c₀x		(5.2)

Equation 5.2 has the form of an equation of a straight line passing through the origin (i.e. y = bx). It follows that plotting ln(1 − (y − a₀)/b₀) versus x should give a straight line with slope c₀.
Figure 5.4: Plot of ln(1 − (y − a₀)/b₀) versus x, with line of best fit y = −0.1243x + 0.2461, used to determine the starting value for c.
Figure 5.4 shows a plot of ln(1 − (y − a₀)/b₀) versus x. The line of best fit and the equation of the line have been added using the Trendline option in Excel³⁸,³⁹. The slope of the line is approximately −0.12; its magnitude provides the starting value for c. The starting values may now be stated for this example, i.e.,

a₀ = 25.5 °C, b₀ = 5.5 °C, c₀ = 0.12 min⁻¹

These starting values were used in the successful fit of equation 4.2 to the data given in table 4.1 (the output of the fit is shown in sheet 4.4).
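The graphical estimate of c₀ can be reproduced numerically. The sketch below (Python; data of table 4.1, with a₀ = 25.5 and b₀ = 5.5 as read from figure 5.3) transforms the data as in equation 5.2 and fits an ordinary straight line, mimicking what Trendline does:

```python
import math

x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
y = [26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9, 30.1, 30.4, 30.6, 30.7]
a0, b0 = 25.5, 5.5    # starting values read from figure 5.3

# Transform the data as in equation 5.2.
z = [math.log(1.0 - (yi - a0) / b0) for yi in y]

# Ordinary least-squares slope and intercept (as Trendline fits y = a + bx).
n = len(x)
sx, sz = sum(x), sum(z)
sxx = sum(xi * xi for xi in x)
sxz = sum(xi * zi for xi, zi in zip(x, z))
slope = (n * sxz - sx * sz) / (n * sxx - sx * sx)
intercept = (sz - slope * sx) / n

print(slope, intercept)   # approx -0.1243 and 0.246, as in figure 5.4
```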
5.3 Starting values by curve stripping
Establishing starting values in some situations is quite difficult and may require a significant amount of preprocessing of the data. For example, the fitting to data of an equation consisting of a sum of exponential terms, such as,

y = a exp(bx) + c exp(dx)		(5.3)

38 For details of Trendline see page 222 in Kirkup (2002).
39 An equation of the form y = a + bx was fitted to the data using Trendline in Excel. Alternatively, we could have fitted y = bx to the data. Either approach would have given an acceptable starting value for c₀.
or⁴⁰,

y = a exp(bx) + c exp(dx) + e exp(fx)		(5.4)

is particularly challenging, especially when data are noisy and/or the ratio of the parameters within the exponentials is less than approximately 3 (e.g. when the ratio b/d in equation 5.3 is less than 3)⁴¹. Fitting of equations such as equations 5.3 and 5.4 is quite common; for example, the kinetics of drug transport through the human body is routinely modelled using ‘compartmental analysis’. Compartmental analysis attempts to predict concentrations of drugs as a function of time (e.g. in blood or urine). The relationship between concentration and time is often well represented by a sum of exponential terms. In analytical chemistry, excited state lifetime measurements offer a means of identifying components in a mixture. The decay of phosphorescence with time that occurs after illumination of the mixture may be captured. The decay can be represented by a sum of exponential terms. Fitting a sum of exponentials by nonlinear least squares allows each component in the mixture to be discriminated⁴².
If an equation to be fitted to data consists of a sum of exponential terms, good starting values for parameter estimates are extremely important if local minima in SSR are to be avoided. It is also possible that, if starting values for the parameter estimates are too far from the optimum values, SSR will increase during the iterative process to such an extent that it exceeds the maximum floating point number that a spreadsheet (or other program) can handle. In this situation, fitting is terminated and an error message is returned by the spreadsheet.
Data in figure 5.5 have been gathered in an experiment in which the decay of photogenerated current in the wide band gap semiconductor cadmium sulphide (CdS) is measured as a function of time after photoexcitation of the semiconductor has ceased. There appears to be an exponential decay of the photocurrent with time. Theory indicates⁴³ that there may be more than one decay mechanism for photoconductivity. That, in turn, suggests that an equation of the form given by equation 5.3 or equation 5.4 is appropriate.
40 Here we assume b > d > f.
41 See Kirkup and Sutherland (1988).
42 See Demas (1983).
43 See Bube (1960), chapter 6.
Figure 5.5: Photocurrent (arbitrary units) versus time (ms) data for cadmium sulphide.
If equation 5.3 is to be fitted to data, how are starting values for parameter estimates established? If b is large (and negative) then the contribution of the first term in equation 5.3 to y is small when x exceeds some value, which we will designate as x′. Equation 5.3 can now be written, for x > x′,

y ≈ c exp(dx)		(5.5)

Equation 5.5 can be linearised by taking natural logarithms of both sides of the equation. The next step is to fit a straight line to the transformed data to find approximate values for c and d, which we will designate as c₀ and d₀ respectively. Now we revisit equation 5.3 and write, for x < x′,

y − c₀ exp(d₀x) = a exp(bx)		(5.6)

Transforming equation 5.6 by taking natural logarithms of both sides of the equation, then fitting a straight line to the transformed data, will yield approximate values for a and b which can serve as starting values in a nonlinear fit. For a more detailed discussion of how to determine starting values when an equation to be fitted consists of a sum of exponential terms, see Kirkup and Sutherland (1988).
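Curve stripping is easy to demonstrate on synthetic data. In the sketch below (Python), the parameter values a = 80, b = −0.2, c = 20, d = −0.02 and the cut-off x′ = 40 are illustrative choices, not values from the text; the slow exponential is fitted to the tail of the data, subtracted, and the fast exponential fitted to what remains:

```python
import math

# Synthetic two-exponential data, y = a*exp(b*x) + c*exp(d*x) (equation 5.3).
a_true, b_true, c_true, d_true = 80.0, -0.2, 20.0, -0.02   # illustrative values
xs = list(range(0, 101, 2))
ys = [a_true * math.exp(b_true * x) + c_true * math.exp(d_true * x) for x in xs]

def line_fit(pts):
    """Ordinary least-squares fit of z = p + q*x; returns (p, q)."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sz = sum(z for _, z in pts)
    sxx = sum(x * x for x, _ in pts); sxz = sum(x * z for x, z in pts)
    q = (n * sxz - sx * sz) / (n * sxx - sx * sx)
    return (sz - q * sx) / n, q

x_cut = 40.0   # beyond x', the fast term is negligible (equation 5.5)
tail = [(x, math.log(y)) for x, y in zip(xs, ys) if x > x_cut]
ln_c0, d0 = line_fit(tail)
c0 = math.exp(ln_c0)

# Strip the slow component and fit the fast one for small x (equation 5.6).
head = [(x, math.log(y - c0 * math.exp(d0 * x)))
        for x, y in zip(xs, ys) if x < 15 and y > c0 * math.exp(d0 * x)]
ln_a0, b0 = line_fit(head)

print(c0, d0, math.exp(ln_a0), b0)   # starting values near 20, -0.02, 80, -0.2
```

The recovered values are only starting values; they would normally be refined by a full nonlinear least squares fit of equation 5.3.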
5.4 Effect of instrument resolution and noise on best estimates
Errors in the dependent variable lead to uncertainties in parameter estimates⁴⁴. If errors are very large, it may not be possible to establish reasonable parameter estimates. To illustrate the effect of errors on fitting, we consider the outcome of Monte Carlo simulations in which errors, in the form of normally distributed noise, are added to ‘noise free’ data⁴⁵. After noise is added, nonlinear least squares is performed to find best estimates of the parameters.
44 In the case of a ‘model violation’ (such that the equation fitted to the data is not appropriate) there would be nonzero residuals even if the data were error free. Such nonzero residuals would translate to uncertainties in parameter estimates.
45 Monte Carlo simulations are dealt with in section 11.
To study the effect of noise on parameter estimates, data are generated in an ‘experiment’ in which the temperature of water in a vessel is monitored as it cools in a laboratory. The equation relating temperature, T, to time, t, is written,

T = T∞ + (Tₛ − T∞)exp(−kt)		(5.7)

where T∞ is the temperature at infinite time (which is equal to room temperature), Tₛ is the starting temperature, and k is the rate constant for cooling. We choose (arbitrarily),

T∞ = 26 °C, Tₛ = 62 °C, k = 0.034 min⁻¹

Noise free data generated at 5 minute intervals between t = 0 and t = 55 minutes using equation 5.7 are shown in figure 5.6.
Figure 5.6: Noise free data of temperature versus time generated using equation 5.7.
Writing equation 5.7 using our usual convention for variables and parameter estimates gives,

y = a + (b − a)exp(−cx)		(5.8)

The next step is to fit equation 5.8 to the data, using the following starting values: a₀ = 25, b₀ = 60, c₀ = 0.02. The fitting options⁴⁶ are selected using the Solver Options dialog box as shown in figure 5.7:

46 Fitting options are discussed in section 9.1.
Figure 5.7: Solver Options used to fit equation 5.8 to the data in figure 5.6.
Using Solver, the following values were recovered for best estimates of parameters
and standard errors in best estimates.
a	σ_a	b	σ_b	c	σ_c	SSR
25.99993547	3.66×10⁻⁵	61.99997243	1.45×10⁻⁵	0.0339998	7.65×10⁻⁸	2.73×10⁻⁹
Table 5.1: Best estimates of parameters and standard errors in parameters.
5.4.1 Adding normally distributed noise to data using Excel’s Random Number Generator
To investigate the effect of errors on the fitting of equations to data, normally distributed noise⁴⁷ of constant standard deviation (i.e. homoscedastic data) is added to noise free data⁴⁸.
Normally distributed noise can be added to the data by using the Random Number
Generation tool in the Analysis ToolPak. The mean and standard deviation of the
random numbers are controlled using the dialog box shown in figure 5.8. When
adding noise, it is usual to select the mean to be zero. The standard deviation can have
any value (the larger the value, the greater the ‘noise’). For this example it is
convenient to leave the standard deviation at its default value of one.
47 Also referred to as Gaussian noise.
48 Heteroscedastic noise can also be added to data with the aid of Excel’s Random Number Generation tool (see section 11.3).
Figure 5.8: Normally distributed noise with zero mean and standard deviation of one.
Noise is generated using the Random Number Generation tool in Excel’s Analysis
ToolPak.
The ‘experimental’ data in column D (i.e. the data with noise added) are obtained by adding the values in column B to those in column C. Figure 5.8 shows the formula entered into cell D2. The next step is to use Fill→Down to enter the formula into cells D3 to D13. A plot of the y values with noise added (as given in column D of figure 5.8) versus x is shown in figure 5.9.
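The same noise-addition step can be performed outside Excel. A minimal sketch (Python with NumPy standing in for the Random Number Generation tool; the seed is arbitrary):

```python
import numpy as np

# Noise-free cooling data from equation 5.7.
t = np.arange(0, 60, 5, dtype=float)            # 0, 5, ..., 55 minutes
T_inf, T_s, k = 26.0, 62.0, 0.034
T = T_inf + (T_s - T_inf) * np.exp(-k * t)

# Normally distributed noise, mean 0 and standard deviation 1,
# added to the noise-free values (the role of column C in figure 5.8).
rng = np.random.default_rng(1)                  # arbitrary seed
noise = rng.normal(loc=0.0, scale=1.0, size=t.size)
T_exp = T + noise                               # 'experimental' data, column D

print(T_exp)
```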
Figure 5.9: Data in figure 5.6 with addition of normally distributed noise.
5.4.2 Fitting an equation to noisy data
To show the effect errors have on the fitting of equation 5.8 to data, parameter
estimates (and standard errors) are compared when temperature data,
• are noise free
• are rounded to the nearest 0.1 °C, but no noise is added
• have noise of standard deviation 0.2 °C added
• have noise of standard deviation 1 °C added
• have noise of standard deviation 5 °C added.
	A	B	C	D
1	x(min)	y(°C)	Noise (°C)	y_exp (°C)
2 0 62 0.30023 =B2+C2
3 5 56.3719334 1.27768
4 10 51.62373162 0.244257
5 15 47.61784084 1.276474
6 20 44.23821173 1.19835
7 25 41.38693755 1.733133
8 30 38.98141785 2.18359
9 35 36.95196551 0.23418
10 40 35.23978797 1.095023
11 45 33.79528402 1.0867
12 50 32.57660687 0.6902
13 55 31.54845183 1.69043
Temperature data over the period x = 0 to x = 55 minutes were generated in the manner described in section 5.4.1. The starting values for all fits were a₀ = 25, b₀ = 60, c₀ = 0.02. Solver Options were as given in figure 5.7.
Noise	a	σ_a	b	σ_b	c	σ_c	SSR
None	25.9999	3.66×10⁻⁵	61.99997	1.45×10⁻⁵	0.0339998	7.65×10⁻⁸	2.73×10⁻⁹
RD:0.1⁴⁹	25.9870	0.0712	61.9995	0.0282	0.0339850	0.000149	0.01031
0.2	26.4735	0.378	61.9991	0.154	0.0345083	0.000821	0.3070
1.0	19.2820	4.54	61.1136	0.959	0.0245648	0.00487	12.63
5.0	25.4289	6.42	70.1401	4.82	0.0484832	0.0197	278.2
Table 5.1: Best estimates of parameters and standard errors in estimates.
As anticipated, the standard errors in the estimates increase as the noise increases. In order to indicate to what extent the estimates a, b and c differ from the true values, T∞ = 26 °C, Tₛ = 62 °C and k = 0.034 min⁻¹ respectively, percentage differences are presented in table 5.2.

Noise	a	|a − T∞|×100%/T∞	b	|b − Tₛ|×100%/Tₛ	c	|c − k|×100%/k
None	25.9999	0.000385	62.0000	4.84×10⁻⁵	0.0339998	0.000588
RD:0.1	25.9870	0.0500	61.9995	0.000806	0.0339850	0.0441
0.2	26.4735	1.82	61.9991	0.00145	0.0345083	1.50
1.0	19.2820	25.8	61.1136	1.43	0.0245648	27.8
5.0	25.4289	2.20	70.1401	13.1	0.0484832	42.6
Table 5.2: (Absolute) percentage difference between parameter estimates and true values.
Note that, on the whole, the percentage difference between the parameter estimates and the true values, as given in table 5.2, increases as the noise increases. However, examination of table 5.2 reveals that for noise of standard deviation 5, the estimate of T∞ is within ≈ 2% of the true value. This should be expected: as the added noise is random, there is a possibility that ‘by chance’ a good estimate for some parameter will be obtained even when the noise is quite large. However, if we were to repeat the simulation many times we would find that, on average, the percentage difference between the true values of the parameters and the parameter estimates would increase as the noise level increased.
5.4.3 Relationship between sampling density and parameter estimates
When repeat measurements are made of a single quantity (such as the time taken for a ball to free fall through a fixed distance), the standard error in the mean, σ_x̄, of the data is related to the standard deviation, σ, by⁵⁰,

σ_x̄ = σ/n^½		(5.9)

49 Denotes temperature values rounded to 0.1 °C.
50 See Kirkup (2002), ch. 1.
Equation 5.9 indicates that σ_x̄ reduces as 1/n^½, i.e. if more measurements are made, we profit by a reduction in the standard error of the mean. It is anticipated that in analysis by least squares, there is a similar reduction in the standard errors of the parameter estimates as the number of measurements increases⁵¹. To establish this, consider the analysis of data generated using equation 5.7 (with parameters T∞ = 26 °C, Tₛ = 62 °C, k = 0.034 min⁻¹) to which noise of unity standard deviation has been added, the data being ‘gathered’ in the range x = 0 to x = 60 minutes. Data are generated at evenly spaced intervals of time. The number of values was chosen to be n = 9, 16, 25, 33, 41, 49, 61, 91, 121. a, b and c and their respective standard errors were determined using (unweighted) nonlinear least squares.
Equation 5.8 was fitted to the data in order to establish best estimates and standard errors in the best estimates. Squaring the standard errors gives the variances, σ_a², σ_b² and σ_c², in the parameter estimates.
If an equation of the form given in equation 5.9 is valid for the standard errors in the parameter estimates, then plotting σ_a², σ_b² and σ_c² versus 1/n should produce a straight line. Figure 5.10 shows such plots.
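The 1/n behaviour of the variance can be illustrated directly for the simple case of a mean (equation 5.9). The sketch below (Python with NumPy; the seed and trial count are arbitrary) simulates many samples of size n and compares the scatter of the sample means with σ/n^½:

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed
sigma, n_trials = 1.0, 4000

for n in (9, 25, 121):
    # Means of n_trials samples, each containing n normally distributed values.
    means = rng.normal(0.0, sigma, size=(n_trials, n)).mean(axis=1)
    predicted = sigma / np.sqrt(n)  # equation 5.9
    print(n, means.std(ddof=1), predicted)
# the observed scatter of the means tracks sigma/sqrt(n)
```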
Figure 5.10: Variance of parameter estimates as a function of the number of data. Each graph shows the equation of the best straight line fitted to the points and the coefficient of determination, R²: a) parameter a, y = 22.942x + 0.826, R² = 0.2831 (one point marked as an outlier); b) parameter b, y = 5.8982x + 0.1142, R² = 0.892; c) parameter c, y = 0.0001x + 2×10⁻⁶, R² = 0.9164.

51 Frenkel (2002) discusses the relationship between the standard errors in the parameter estimates and the number of data, n.
With the exception of the circled data point in figure 5.10a, the points on the graphs in figures 5.10a to 5.10c appear to follow a linear relationship, indicating that the variance of the parameter estimates does decrease (at least approximately) as 1/n.
Section 6: Linear least squares meets nonlinear least squares
It is possible to use the technique of nonlinear least squares to fit linear equations to
data. In such circumstances we expect the same values to emerge for the best
estimates of the parameters and the standard errors in the estimates, irrespective of
whether fitting is carried out by linear or nonlinear least squares.
To illustrate this, we consider an example in which the van Deemter equation is fitted to the gas chromatography data in table 6.1⁵².
v (ml/min)	H (mm)
3.4 9.59
7.1 5.29
16.1 3.63
20.0 3.42
23.1 3.46
34.4 3.06
40.0 3.25
44.7 3.31
65.9 3.50
78.9 3.86
96.8 4.24
115.4 4.62
120.0 4.67
Table 6.1: Plate height versus flow rate data.
The relationship between plate height, H, and flow rate, v, can be written⁵³,

H = A + B/v + Cv		(6.1)

where A, B and C are constants. Consistent with our convention of naming variables and parameter estimates, we rewrite equation 6.1 as,

y = a + b/x + cx		(6.2)

where a, b, and c are estimates of the constants A, B and C respectively in equation 6.1.
Equation 6.2 may be fitted to the data in table 6.1 using linear least squares. A convenient way to accomplish this is to use the Regression tool in the Analysis ToolPak in Excel⁵⁴. Figure 6.1 shows an Excel spreadsheet containing the data and the output of the Regression tool. To perform (linear) least squares with this tool, we place values of 1/x and x in adjacent columns (these appear in columns B and C of figure 6.1).
52 See Moody H W (1982).
53 See Snyder et al. (1997), p46.
54 See Kirkup (2002), p373.
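The same linear least squares fit can also be done with plain matrix algebra rather than the Regression tool. The sketch below (Python with NumPy; data from table 6.1) builds a design matrix with columns 1, 1/x and x and solves for a, b and c — the nonlinear route of sections 4.1 to 4.3 should return the same numbers:

```python
import numpy as np

# Plate height versus flow rate data from table 6.1.
v = np.array([3.4, 7.1, 16.1, 20.0, 23.1, 34.4, 40.0, 44.7,
              65.9, 78.9, 96.8, 115.4, 120.0])
H = np.array([9.59, 5.29, 3.63, 3.42, 3.46, 3.06, 3.25, 3.31,
              3.50, 3.86, 4.24, 4.62, 4.67])

# Design matrix for H = A + B/v + C*v (equation 6.1): columns 1, 1/v, v.
X = np.column_stack([np.ones_like(v), 1.0 / v, v])

# Linear least squares solution, equivalent to the Regression tool's output.
beta, *_ = np.linalg.lstsq(X, H, rcond=None)
a, b, c = beta
residuals = H - X @ beta
print(a, b, c, float(np.sum(residuals ** 2)))
```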
Figure 6.1: Fitting equation 6.2 to data using the Regression tool in Excel’s Analysis
ToolPak.
Equation 6.2 is now fitted to the data in table 6.1 using Solver to perform nonlinear
least squares. The approach adopted for determining the best estimates and the
standard errors in the best estimates is as described in sections 4, 4.1 and 4.2.
As anticipated, both linear least squares and nonlinear least squares return the same
best estimates for the parameters and standard errors in the best estimates, as can be
seen by inspection of figures 6.1 and 6.2.
Figure 6.2: Spreadsheet for fitting equation 6.2 to the data in table 6.1 using nonlinear least squares. The spreadsheet shows the D, Dᵀ, E and E⁻¹ matrices, the sum of squares of residuals, the standard deviation of the y values, the best estimates a, b and c and the standard errors σ_a, σ_b and σ_c.
45
Exercise 4
The Knox equation is widely used to represent the relationship between the plate height, H, and the velocity, v, of the mobile phase of a liquid chromatograph (LC) [55]. The relationship may be written,

H = Av^(1/3) + B/v + Cv    (6.3)

where A, B and C are constants.
Table 6.2 shows LC data of plate height versus flow velocity published by Katz et al. (1983) [56].
H (cm) v (cm/s)
0.004788 0.03027
0.003704 0.04527
0.003116 0.06507
0.002526 0.10023
0.002292 0.1306
0.002176 0.1653
0.002246 0.2488
0.002360 0.3185
0.002678 0.4792
0.002856 0.6028
Table 6.2: Data from Katz et al. (1983).
Use either linear or nonlinear least squares to fit equation 6.3 to the data in table 6.2 and thereby obtain best estimates of A, B and C and the standard errors in the estimates. [0.002509 cm^(2/3)·s^(1/3), 0.0001232 cm²/s, 0.0008720 s; standard errors 0.000185 cm^(2/3)·s^(1/3), 3.12 × 10⁻⁶ cm²/s, 0.000326 s]
[55] See Kennedy and Knox (1972).
[56] The data were obtained with a benzyl acetate solute and a mobile phase of 4.48% (w/v) ethyl acetate in n-pentane.
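Exercise 4 can be checked by the same linear route, since equation 6.3 is linear in A, B and C once the columns v^(1/3), 1/v and v are formed. A short numpy sketch (an illustration, not part of the original text):

```python
import numpy as np

# table 6.2: plate height H and mobile-phase velocity v
H = np.array([0.004788, 0.003704, 0.003116, 0.002526, 0.002292,
              0.002176, 0.002246, 0.002360, 0.002678, 0.002856])
v = np.array([0.03027, 0.04527, 0.06507, 0.10023, 0.1306,
              0.1653, 0.2488, 0.3185, 0.4792, 0.6028])

# the columns v^(1/3), 1/v and v make equation 6.3 linear in A, B and C
X = np.column_stack([v**(1.0 / 3.0), 1.0 / v, v])
a, b, c = np.linalg.lstsq(X, H, rcond=None)[0]
```

The estimates should agree with those quoted at the end of the exercise.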
Section 7: Weighted nonlinear least squares
There are some occasions where the standard deviation of the errors in the y values is not constant (i.e. the errors exhibit heteroscedasticity). Such a situation may be revealed by plotting the residuals [57], (y − ŷ), versus x. If the errors are heteroscedastic, then weighted fitting is required. The purpose of weighted fitting is to obtain best estimates of the parameters by forcing the line close to the data that are known to high precision, while giving much less weight to those data that exhibit large scatter.
The starting point for weighted fitting using least squares is to define a sum of squares
of residuals that takes into account the standard deviation in the y values. We write,
χ² = Σ_i [(y_i − ŷ_i)/σ_i]²    (7.1)
We refer to χ² as the weighted sum of squares of residuals [58]. σ_i is the standard deviation in the ith y value. The purpose of weighted fitting is to find the best estimates of the parameters that minimise χ² in equation 7.1.
If σ_i is constant, as it is in unweighted fitting using least squares, equation 7.1 can be replaced by equation 2.7. In this sense, equation 7.1 can be thought of as the more general formulation of least squares.
7.1 Weighted fitting using Solver
In order to establish best estimates of parameters using Solver when weighted fitting is performed, we use an approach similar to that described in section 4. For weighted fitting, an extra column in the spreadsheet containing the standard deviations σ_i is required. It is possible that the absolute values of σ_i are unknown and that only relative standard deviations are known. For example, equations 7.2 and 7.3 are sometimes used when weighted fitting is required,

σ_i ∝ √y_i    (7.2)

σ_i ∝ y_i    (7.3)

Weighted fitting can be carried out so long as,
• the absolute standard deviations in the y values are known, or
• the relative standard deviations are known.
In order to accomplish weighted nonlinear least squares, we proceed as follows:
1) Fit the desired equation to the data by calculating χ² as given by equation 7.1. Use Solver to modify the parameter estimates so that χ² is minimised.
2) Determine the elements in the D matrix, as described in section 4.
[57] See section 6.10 of Kirkup (2002).
[58] Σ_i[(y_i − ŷ_i)/σ_i]² follows a chi-squared distribution, hence the use of the symbol χ².
3) Construct the weight matrix, W, in which the diagonal elements of the matrix contain the weights to be applied to the y values.
4) Calculate the weighted standard deviation, σ_w, where σ_w is given by,

σ_w = [χ²/(n − p)]^(1/2)    (7.4)

where χ² is given by equation 7.1, n is the number of data points and p is the number of parameters in the equation to be fitted to the data.
5) Calculate the standard errors of the parameter estimates, given by [59],

σ(B) = σ_w [(D^T W D)^(−1)]^(1/2)    (7.5)

where B is the matrix containing elements equal to the best estimates of the parameters, and σ_w is the weighted standard deviation, given by equation 7.4. (The square roots of the diagonal elements of (D^T W D)^(−1) supply the standard errors.)
6) Calculate the confidence interval for each parameter appearing in the equation
at a specified level of confidence (usually 95%).
To illustrate steps 1 to 6, we consider an example of weighted fitting using Solver.
7.2 Example of weighted fitting using Solver
The relationship between the current, I, through a tunnel diode and the voltage, V, across the diode may be written [60],

I = AV(B − V)²    (7.6)

where A and B are constants to be estimated using least squares. Table 7.1 shows current–voltage data for a germanium tunnel diode.
V(mV) I (mA)
10 4.94
20 6.67
30 10.57
40 10.11
50 10.44
60 12.90
70 10.87
80 9.73
90 7.03
100 5.61
110 3.80
120 2.36
Table 7.1: Current–voltage data for a germanium tunnel diode.
[59] See Neter et al. (1996).
[60] See Karlovsky (1962).
Equation 7.6 could be fitted to the data using unweighted nonlinear least squares (in the first instance it is usually sensible to use an unweighted fit, as the residuals may show little evidence of heteroscedasticity, in which case there is little point in performing a more complex analysis). In this example we are going to assume that the error in the y quantity is proportional to the size of the y quantity, i.e. that equation 7.3 is valid for these data. The data in table 7.1 are entered into a spreadsheet as shown in sheet 7.1 and are plotted in figure 7.1.
Sheet 7.1: Data from table 7.1 entered into a spreadsheet.
A B
1 x(mV) y(mA)
2 10 4.94
3 20 6.67
4 30 10.57
5 40 10.11
6 50 10.44
7 60 12.90
8 70 10.87
9 80 9.73
10 90 7.03
11 100 5.61
12 110 3.80
13 120 2.36
14
Figure 7.1: Current–voltage data for a germanium tunnel diode (y (mA) plotted against x (mV)).
7.2.1 Best estimates of parameters using Solver
Consistent with symbols used in other analyses in this document, we rewrite equation 7.6 as,

y = ax(b − x)²    (7.7)

We can obtain a reasonable value for b, which we will use as a starting value, b₀, by noting that equation 7.7 predicts that y = 0 when x = b. By inspection of figure 7.1 we see that when y = 0, x ≈ 130 mV, so that b₀ = 130 mV. Equation 7.7 is rearranged to give,

a = y/[x(b − x)²]    (7.8)

An approximate value for a (which we take to be the starting value, a₀) can be obtained by choosing any data pair from sheet 7.1 (say, x = 50 mV and y = 10.44 mA) and substituting these into equation 7.8 along with b₀ = 130 mV. This gives (to two significant figures) a₀ = 3.3 × 10⁻⁵.
Sheet 7.2 shows the cells containing the calculated values of current (ŷ) in column C, based on equation 7.7. The parameter estimates are the starting values (3.3 × 10⁻⁵ and 130) in cells D17 and D18. Column D of sheet 7.2 contains the individual weighted squares of residuals; their sum appears in cell D14.
Sheet 7.2: Fitted values and weighted sum of squares of residuals before optimisation
occurs.
B C D
1   y (mA)   ŷ   [(y − ŷ)/y]²
2 4.94 4.752 0.001448311
3 6.67 7.986 0.038927822
4 10.57 9.9 0.004017905
5 10.11 10.692 0.003313932
6 10.44 10.56 0.000132118
7 12.90 9.702 0.061457869
8 10.87 8.316 0.055205544
9 9.73 6.6 0.103481567
10 7.03 4.752 0.105001811
11 5.61 2.97 0.221453287
12 3.80 1.452 0.381793906
13 2.36 0.396 0.692562482
14 sum 1.668796555
15
16 solver
17 a 3.30E-05
18 b 130
Running Solver (using the default settings – see section 9.1) gives the output shown in
sheet 7.3.
Sheet 7.3: Fitted values and weighted sum of squares of residuals after optimisation
using Excel’s Solver.
B C D
1   y (mA)   ŷ   [(y − ŷ)/y]²
2 4.94 4.451251736 0.009788509
3 6.67 7.671430611 0.022541876
4 10.57 9.797888132 0.005335934
5 10.11 10.96797581 0.007201911
6 10.44 11.31904514 0.007089594
7 12.90 10.98844764 0.02195801
8 10.87 10.11353481 0.004843048
9 9.73 8.831658167 0.008524277
10 7.03 7.280169211 0.00126636
11 5.61 5.596419449 5.86015E-06
12 3.80 3.917760389 0.000960354
13 2.36 2.381543539 8.33317E-05
14 sum 0.089599066
15
16 from solver
17 a 2.289E-05
18 b 149.4440503
The weighted standard deviation is calculated using equation 7.4, i.e.

σ_w = [χ²/(n − p)]^(1/2) = [0.08959906/(12 − 2)]^(1/2) = 0.09465678    (7.9)
7.2.2 Determining the D matrix
In order to determine the matrix of partial derivatives, we calculate

(∂y_i/∂a)_(b,x_i) ≈ {y[(1 + δ)a, b, x_i] − y[a, b, x_i]}/[(1 + δ)a − a]    (7.10)

and

(∂y_i/∂b)_(a,x_i) ≈ {y[a, (1 + δ)b, x_i] − y[a, b, x_i]}/[(1 + δ)b − b]    (7.11)

where δ is chosen to be 10⁻⁶ (see section 4.2). Sheet 7.4 shows the values of the partial derivatives in the D matrix.
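The forward-difference scheme of equations 7.10 and 7.11 is easy to verify against the exact derivatives of y = ax(b − x)². A minimal sketch (Python with numpy; an illustration, not part of the original text):

```python
import numpy as np

def yhat(a, b, x):
    # the fitted equation 7.7
    return a * x * (b - x)**2

def fd_partial(f, params, i, x, delta=1e-6):
    """Forward-difference partial derivative with relative step delta,
    following equations 7.10 and 7.11."""
    bumped = list(params)
    bumped[i] = params[i] * (1.0 + delta)
    return (f(*bumped, x) - f(*params, x)) / (params[i] * delta)

x = np.array([10.0, 50.0, 120.0])     # a few of the x values from table 7.1
a, b = 2.289e-5, 149.4440503          # the fitted estimates

dy_da = fd_partial(yhat, (a, b), 0, x)
dy_db = fd_partial(yhat, (a, b), 1, x)
```

With δ = 10⁻⁶ the finite differences agree with the analytic derivatives x(b − x)² and 2ax(b − x) to several significant figures, as sheet 7.4 shows.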
Sheet 7.4: Calculation of partial derivatives used in the D matrix.
E F G H I
1   ŷ (b constant)   ŷ (a constant)   dy/da   dy/db
2 4.451256188 4.451261277 194446.4316 0.063842869
3 7.671438283 7.671448325 335115.2430 0.118528971
4 9.79789793 9.797912649 428006.4343 0.164058306
5 10.96798677 10.96800576 479120.0055 0.200430873
6 11.31905646 11.31907916 494455.9567 0.227646674
7 10.98845863 10.98848436 480014.2876 0.245705707
8 10.11354493 10.11357286 441794.9986 0.254607974
9 8.831666999 8.831696179 385798.0894 0.254353473
10 7.280176491 7.280205816 318023.5601 0.244942205
11 5.596425045 5.596453279 244471.4107 0.226374169
12 3.917764307 3.917790076 171141.6412 0.198649367
13 2.381545921 2.381567715 104034.2515 0.161767798
14
15
16 b constant a constant
17 2.289E-05 2.289E-05
18 149.4440503 149.4441997
7.2.3 The weight matrix, W
The weight matrix is a square matrix with diagonal elements proportional to 1/σ_i² and all other elements equal to zero [61]. In this example, σ_i is taken to be equal to y_i, so that the weight matrix is as given in sheet 7.5.
Sheet 7.5: Weight matrix for tunnel diode analysis (while the weights are shown to
only three decimal places, Excel retains all figures for the calculations).
C D E F G H I J K L M N
24 0.041 0 0 0 0 0 0 0 0 0 0 0
25 0 0.022 0 0 0 0 0 0 0 0 0 0
26 0 0 0.009 0 0 0 0 0 0 0 0 0
27 0 0 0 0.010 0 0 0 0 0 0 0 0
28 0 0 0 0 0.009 0 0 0 0 0 0 0
29 0 0 0 0 0 0.006 0 0 0 0 0 0
30 0 0 0 0 0 0 0.008 0 0 0 0 0
31 0 0 0 0 0 0 0 0.011 0 0 0 0
32 0 0 0 0 0 0 0 0 0.020 0 0 0
33 0 0 0 0 0 0 0 0 0 0.032 0 0
34 0 0 0 0 0 0 0 0 0 0 0.069 0
35 0 0 0 0 0 0 0 0 0 0 0 0.180
[61] For details on the weight matrix, see Neter et al. (1996).
7.2.4 Calculation of (D^T W D)^(−1)
To obtain the standard errors in the estimates a and b, we must determine (D^T W D)^(−1). Sheet 7.6 shows the several steps required to determine (D^T W D)^(−1). The steps consist of:
a) Calculation of the matrix WD. The elements of this matrix are shown in cells C37 to D48. (W is multiplied by D using the MMULT() function in Excel.)
b) Calculation of the matrix D^T WD. The elements of this matrix are shown in cells G37 to H38.
c) Inversion of the matrix D^T WD. The elements of the inverted matrix are shown in cells G41 to H42.
Sheet 7.6: Calculation of (D^T W D)^(−1).

B C D E F G H
37 WD 7967.94045 0.002616    D^T WD 2.2728E+10 15410.18847
38    7532.55853 0.002664           15410.18847 0.013460574
39    3830.89566 0.001468
40    4687.5077  0.001961
41    4536.55955 0.002089    (D^T WD)^(−1) 1.9662E-10 −0.0002251
42    2884.5279  0.001477                  −0.0002251 331.9969496
43    3739.05374 0.002155
44    4075.06361 0.002687
45    6435.00139 0.004956
46    7767.87729 0.007193
47    11851.9142 0.013757
48    18678.9449 0.029045
7.2.5 Bringing it all together
To calculate the standard errors in a and b, the weighted standard deviation (given by equation 7.4) is multiplied by the square root of the corresponding diagonal element of the (D^T W D)^(−1) matrix, i.e.

σ_a = σ_w (1.9662 × 10⁻¹⁰)^(1/2) = 0.09465678 × (1.9662 × 10⁻¹⁰)^(1/2) = 1.327 × 10⁻⁶

and

σ_b = σ_w (332.00)^(1/2) = 0.09465678 × (332.00)^(1/2) = 1.725
It follows that the 95% confidence intervals for A and B are,

A = a ± t_(95%,ν) σ_a    (7.12)

B = b ± t_(95%,ν) σ_b    (7.13)

where t_(95%,ν) is the t value corresponding to the 95% level of confidence and ν is the number of degrees of freedom.
In this example, the number of degrees of freedom, ν = n − p = 12 − 2 = 10. From statistical tables [62],

t_(95%,10) = 2.228

It follows that (inserting units),

A = (2.29 ± 0.30) × 10⁻⁵ mA/(mV)³
B = (149.4 ± 3.8) mV
Exercise 5
Equation 7.6 may be transformed into a form suitable for fitting by linear least squares.
a) Show that equation 7.6 can be rearranged into the form,

(I/V)^(1/2) = A^(1/2)B − A^(1/2)V    (7.14)

b) Plot a graph of (I/V)^(1/2) versus V.
c) Use unweighted least squares to obtain best estimates of A and B and the standard errors in the best estimates [63]. [2.30 × 10⁻⁵ mA/(mV)³, 149.8 mV, 2.0 × 10⁻⁶ mA/(mV)³, 1.5 mV]
d) Why is it preferable to use nonlinear least squares to estimate the parameters, rather than to linearise equation 7.6 and then use linear least squares to find these estimates?
[62] See, for example, Kirkup (2002), page 385.
[63] Care must be exercised when calculating the uncertainty in the estimate of B, as this requires use of both the slope and the intercept, and these are correlated. For more information see Kirkup (2002), page 232.
Section 8: Uncertainty propagation, least squares estimates and
calibration
Establishing best estimates of parameters in an equation may be the main purpose of fitting an equation to experimental data. For example, in an experiment to study the variation of resistance, R, with time, t, in a photoconductor, the primary purpose of the fitting may be to obtain best estimates for the parameters A₁, A₂, B₁ and B₂ which appear in equation 8.1 [64], which represents a possible relationship between R and t.

R = A₁ exp(B₁t) + A₂ exp(B₂t)    (8.1)
There are situations in which parameter estimates are used to calculate other
quantities of interest. A common example involves gathering xy data for the purpose
of calibration. Once the best estimates of the parameters in the calibration equation
have been determined, the equation is used to find ‘values of x’ from measured values
of y.
For example, if the relationship between x and y is

y = a + bx    (8.2)

then for a given (mean) value of y, ȳ, the corresponding value of x, x̂, can be determined. This is done by rearranging equation 8.2 and replacing y by ȳ and x by x̂, so that

x̂ = (ȳ − a)/b    (8.3)
One approach to calculating the standard error in x̂ is to assume that the errors in a, b and ȳ are uncorrelated. In this situation the standard error, σ_x̂, is given by,

σ_x̂² = (∂x̂/∂a)²σ_a² + (∂x̂/∂b)²σ_b² + (∂x̂/∂ȳ)²σ_ȳ²    (8.4)

Unfortunately, errors in the parameter estimates a and b are correlated [65], so it is not valid to use equation 8.4. To correctly determine σ_x̂, we must account for that correlation. We begin by determining the covariance matrix, V, given by,

V = σ²A⁻¹    (8.5)

where σ² is the variance in the y values and A⁻¹ is the error matrix, as discussed in section 2. σ² is found using,
[64] Equation 8.1 represents a possible relationship between R and t (Kirkup L and Cherry I, 1988).
[65] See Salter (2002).
σ² = Σ(y_i − ŷ_i)²/(n − 2)    (8.6)

where ŷ_i = a + bx_i, and n is the number of x–y data pairs.
In this example, A⁻¹ is the inverse of the matrix, A, where,

A = ( n    Σx  )
    ( Σx   Σx² )    (8.7)

There is an economical way to determine the elements in the matrix A, which is especially efficient when using a computer package that allows for matrix multiplication (such as Excel). A is written as,

A = X^T X    (8.8)

where X^T is the transpose of the matrix, X, and X is given by,

X = ( 1   x₁ )
    ( 1   x₂ )
    ( 1   x₃ )
    ( …   …  )
    ( 1   xₙ )    (8.9)
If f is some function of a and b, then [66],

σ_f² = d_f^T V d_f    (8.10)

where

d_f = ( ∂f/∂a )
      ( ∂f/∂b )    (8.11)

As V = σ²A⁻¹, equation 8.10 can be rewritten as,

σ_f² = σ² d_f^T A⁻¹ d_f    (8.12)
8.1: Example of propagation of uncertainties involving parameter estimates
Equation 8.12 is applied to data gathered in an experiment which considers the variation in pressure of a fixed mass and volume of gas as the temperature of the gas changes. The data are given in table 8.1. We will use the data to estimate the value of the temperature at which the pressure of the gas is zero (this is termed the 'absolute zero' of temperature).

[66] See Salter (2000).
Table 8.1: Pressure versus temperature data.
θ (°C)   P (kPa)
−20   211
−10   218
0   224
10   238
20   247
30   251
40   259
50   265
60   277
70   288
80   294
Assume that the relation between pressure, P, and temperature, θ, can be written,

P = A + Bθ    (8.13)

where A and B are parameters to be estimated using least squares.
We will determine,
a) best estimates for A and B (written as a and b respectively),
b) the standard errors, σ_a and σ_b, in a and b,
c) the intercept, θ̂_INT, of the best line through the data on the temperature axis,
d) the standard error in θ̂_INT, assuming errors in a and b are uncorrelated,
e) the standard error in θ̂_INT, assuming errors in a and b are correlated.
Solution
a)
a and b may be determined in several ways, including using the LINEST() function in Excel [67]. Applying the LINEST() function to the data in table 8.1 we obtain:

a = 226909 Pa
b = 836.36 Pa/°C

b)
Using LINEST() in Excel to calculate σ_a and σ_b gives,

σ_a = 993.7 Pa
σ_b = 22.80 Pa/°C

[67] See page 228 of Kirkup (2002).
c)
The intercept, θ_INT, on the temperature axis occurs when P = 0. Rearranging equation 8.13 gives,

θ_INT = −A/B    (8.14)

The best estimate of θ_INT, written as θ̂_INT, is therefore,

θ̂_INT = −a/b    (8.15)

= −226909/836.36 = −271.3 °C
d)
Assuming that errors in a and b are uncorrelated, the usual propagation of uncertainties equation gives the standard error in θ̂_INT, σ_θ̂INT, as [68],

σ²_θ̂INT = (∂θ̂_INT/∂a)²σ_a² + (∂θ̂_INT/∂b)²σ_b²    (8.16)

Now,

∂θ̂_INT/∂a = −1/b    (8.17)

and

∂θ̂_INT/∂b = a/b²    (8.18)

It follows that (using equation 8.16),

σ²_θ̂INT = (−(1/836.36) × 993.7)² + ((226909/836.36²) × 22.80)²

so that σ_θ̂INT = 7.49 °C. It follows that,

θ_INT = (−271.3 ± 7.5) °C
e)
In order to determine the standard error in θ̂_INT when the correlation between a and b is accounted for, we write (following equation 8.7),

[68] See page 390 of Kirkup (2002).
A = ( n    Σx  ) = ( 11    330   )
    ( Σx   Σx² )   ( 330   20900 )

Inverting the matrix A is accomplished using the MINVERSE() function in Excel [69]. This gives,

A⁻¹ = ( 0.172727      −0.00272727    )
      ( −0.00272727   9.09091 × 10⁻⁵ )    (8.19)
To determine σ_θ̂INT we use equation 8.12. It is convenient to rewrite equation 8.12 as,

σ²_θ̂INT = σ² d_θ̂INT^T A⁻¹ d_θ̂INT    (8.20)

Now,

σ² = Σ(P_i − P̂_i)²/(n − 2)    (8.21)

where

P̂_i = a + bθ_i    (8.22)

Values for a and b appear in part a) of this question. Using those estimates and equation 8.21, we find,

σ² = 5717171 (Pa)²    (8.23)
From equation 8.11 and equation 8.15, d_θ̂INT is given by,

d_θ̂INT = ( ∂θ̂_INT/∂a ) = ( −1/b )
          ( ∂θ̂_INT/∂b )   ( a/b² )    (8.24)

Substituting a and b obtained in part a) of this question gives,

d_θ̂INT = ( −0.0011957 )
          ( 0.32439    )
Returning to equation 8.20, we have,

σ²_θ̂INT = 5717171 × (−0.0011957   0.32439) A⁻¹ (−0.0011957   0.32439)^T

with A⁻¹ as given in equation 8.19,

[69] See page 285 in Kirkup (2002).
so that,

σ²_θ̂INT = 68.20 (°C)², or σ_θ̂INT = 8.26 °C

Now we write:

θ_INT = (−271.3 ± 8.3) °C

This may be compared with θ_INT obtained in part d) of this question, when a and b are assumed to be uncorrelated, i.e.

θ_INT = (−271.3 ± 7.5) °C

In this instance, failure to account for the correlation between a and b results in an underestimation of the standard error in θ̂_INT.
8.2 Uncertainties in derived quantities incorporating least squares estimates
Parameter estimates obtained using least squares, as well as other quantities that have uncertainty, may be brought together to determine a 'derived' quantity. The derived quantity has an uncertainty which may be calculated. As an example, consider the calibration line in figure 8.1 which is to be used to determine x̂₀ when y = ȳ₀ (in an analytical chemistry application, ȳ₀ might represent the mean detector response of an instrument and x̂₀ the predicted concentration of the analyte corresponding to that response).
Figure 8.1: Calibration line fitted to x–y data.
Assuming the relationship between x and y in figure 8.1 is linear, then,

ȳ₀ = a + b x̂₀    (8.25)
or,

x̂₀ = (ȳ₀ − a)/b    (8.26)

a and b are determined using least squares. As ȳ₀ is not correlated with a or b, we write,

σ²_x̂₀ = (∂x̂₀/∂ȳ₀)² σ²_ȳ₀ + σ² d_x̂₀^T A⁻¹ d_x̂₀    (8.27)
From equation 8.26 we have,

∂x̂₀/∂ȳ₀ = 1/b

Also,

σ²_ȳ₀ = σ²/m    (8.28)

where σ² is given by equation 8.6, and m is the number of repeat measurements made of the detector response for a particular (unknown) analyte concentration.
8.3: Example of propagation of uncertainties in derived quantities
In section 8.1 we considered data from an experiment in which the variation in pressure of a fixed mass and volume of gas was measured as the temperature of the gas changed. We will use those data and the additional information that four repeat measurements of pressure were made at an unknown temperature, such that,

mean pressure, P̄₀ = 2.54 × 10⁵ Pa.

Adapting equation 8.26, we have,

θ̂₀ = (P̄₀ − a)/b = (2.54 × 10⁵ − 226909)/836.36 = 32.39 °C
Using equation 8.28, and the value of σ² given in equation 8.23, we find [70],

σ²_P̄₀ = σ²/m = 5717171/4 = 1.429 × 10⁶ (Pa)²
Rewriting equation 8.27 in terms of the variables in this question gives,

σ²_θ̂₀ = (∂θ̂₀/∂P̄₀)² σ²_P̄₀ + σ² d_θ̂₀^T A⁻¹ d_θ̂₀    (8.29)

[70] The assumption made here is that the scatter in the y values remains constant, such that the estimate we make of the standard deviation in the y values during calibration is the same as that of the y values obtained for the unknown x value.
= (1/836.36)² × 1.429 × 10⁶ + 68.20 = 2.04 + 68.20 = 70.24 (°C)²

so that,

σ_θ̂₀ = 8.38 °C

Finally, we write,

θ₀ = (32.4 ± 8.4) °C
8.4: Uncertainty propagation and nonlinear least squares
In general, parameter estimates obtained using nonlinear least squares are correlated. Therefore, for derived quantities which incorporate parameter estimates, the covariance matrix must be used to establish the standard errors in those quantities. The first stage, as with any nonlinear fitting, is to minimise the sum of squares of residuals, SSR, as described in sections 3 and 4.
Suppose f is a function of parameter estimates obtained through nonlinear least squares. The variance in f, σ_f², may be written,

σ_f² = σ² d_f^T E⁻¹ d_f    (8.30)

E⁻¹ is the inverse of the matrix, E, where [71],

E = D^T D    (8.31)
D is given by,

D = ( ∂y₁/∂a   ∂y₁/∂b   ∂y₁/∂c )
    ( ∂y₂/∂a   ∂y₂/∂b   ∂y₂/∂c )
    ( …        …        …      )
    ( ∂yₙ/∂a   ∂yₙ/∂b   ∂yₙ/∂c )    (8.32)
and [72],

d_f = ( ∂f/∂a )
      ( ∂f/∂b )
      ( ∂f/∂c )    (8.33)

[71] See section 4.2.
8.4.1: Example of uncertainty propagation in parameter estimates obtained by nonlinear least squares
In many situations, calibration data exhibit a slight curvature, and it is a matter of debate whether it is appropriate to fit an equation of the form y = a + bx to the data. As an example, consider the data shown in table 8.2 and also in figure 8.2.

Table 8.2: Area versus concentration data for biochanin.
Conc. (x) (mg/l)   Area (y) (arbitrary units)
0.158 0.121342
0.158 0.121109
0.315 0.403550
0.315 0.415226
0.315 0.399678
0.631 1.839583
0.631 1.835114
0.631 1.835915
1.261 3.840554
1.261 3.846146
1.261 3.825760
2.522 8.523561
2.522 8.539992
2.522 8.485319
5.045 16.80701
5.045 16.69860
5.045 16.68172
10.09 34.06871
10.09 33.91678
10.09 33.70727
Close inspection of the data in figure 8.2 indicates that the relationship between area and concentration is not linear, but shows a slight but definite curvature. There are many candidates for the function that might be fitted to the data, but we must be wary of using a function with too many adjustable parameters (see section 10). We will fit the function,

y = A + Bx^C    (8.34)

[72] Equations 8.32 and 8.33 are appropriate where there are three best estimates, a, b and c, of the parameters in the equation fitted to data. Both equations may be extended if the number of parameters to be estimated exceeds three.
Figure 8.2: Calibration curve of area (arbitrary units) versus concentration (mg/l) for biochanin.
to the data in table 8.2.
Applying nonlinear least squares, the best estimates for A, B and C, represented by a, b, and c respectively, are,

a = −0.5651
b = 3.581
c = 0.9790
When repeat measurements are made of the area under a chromatogram curve, the
mean area can be determined. Using this mean we may estimate the concentration of
the biochanin. We begin by rearranging equation 8.34, so that,
x = [(y − A)/B]^(1/C)    (8.35)

Substituting a, b and c, and ȳ₀, into equation 8.35 gives the estimate of x, x̂₀, as,

x̂₀ = [(ȳ₀ − a)/b]^(1/c)    (8.36)
As ȳ₀ is not correlated with a, b or c, we can write,

σ²_x̂₀ = (∂x̂₀/∂ȳ₀)² σ²_ȳ₀ + σ² d_x̂₀^T E⁻¹ d_x̂₀    (8.37)
where,

d_x̂₀ = ( ∂x̂₀/∂a )
        ( ∂x̂₀/∂b )
        ( ∂x̂₀/∂c )    (8.38)

and,

σ² = Σ(y_i − ŷ_i)²/(n − 3)    (8.39)

Partially differentiating x̂₀ in equation 8.36 with respect to a, b, c and ȳ₀ respectively gives,

∂x̂₀/∂a = −(1/(bc)) [(ȳ₀ − a)/b]^((1/c) − 1)    (8.40)

∂x̂₀/∂b = −(1/(bc)) [(ȳ₀ − a)/b]^(1/c)    (8.41)

∂x̂₀/∂c = −(1/c²) [(ȳ₀ − a)/b]^(1/c) ln[(ȳ₀ − a)/b]    (8.42)

∂x̂₀/∂ȳ₀ = (1/(bc)) [(ȳ₀ − a)/b]^((1/c) − 1)    (8.43)
After calibration, the area under the chromatogram curve is measured four times for a sample of unknown concentration. It is found that,

ȳ₀ = 6.15513    (8.44)

Fitting using nonlinear least squares gave,

a = −0.5651
b = 3.581
c = 0.9790
Substituting for a, b, c and ȳ₀ in equation 8.36 gives the estimate of the unknown concentration, x̂₀, as,

x̂₀ = [(ȳ₀ − a)/b]^(1/c) = [(6.15513 + 0.5651)/3.581]^(1/0.9790) = 1.902 mg/l
Substituting for a, b, c and ȳ₀ into equations 8.40 to 8.43, we obtain,

∂x̂₀/∂a = −0.289083879, ∂x̂₀/∂b = −0.54244803, ∂x̂₀/∂c = −1.248914487, ∂x̂₀/∂ȳ₀ = 0.289083879
Sheet 8.1 shows the layout of a spreadsheet used to calculate x̂₀ and σ_x̂₀.

Sheet 8.1: Annotated sheet showing the calculation of x̂₀ and σ_x̂₀.
A B C D
44   a   −0.56512   σ_a   0.088470
45   b   3.58138    σ_b   0.078995
46   c   0.97901    σ_c   0.009026
47
48   d_x̂₀   −0.28908
49           −0.54245
50           −1.24891
51
52   d_x̂₀^T   −0.28908   −0.542448   −1.248914
53
54   V   0.00783     −0.00601    0.000646
55       −0.00601    0.00624     −0.000705
56       0.000646    −0.000705   8.15E-05
57
58   V d_x̂₀   0.00019
59             −0.00077
60             0.00009
61
62   d_x̂₀^T V d_x̂₀   0.00025
63
64   ∂x̂₀/∂ȳ₀   0.28908
65   (∂x̂₀/∂ȳ₀)²   0.08357
66   σ²   0.02985
67   σ²_ȳ₀   0.00746
68   ȳ₀   6.15513
69   m   4
70   x̂₀   1.90193
71   σ²_x̂₀   0.00087
72   σ_x̂₀   0.02948
(The annotations on sheet 8.1 indicate the quantities calculated: the best estimates and standard errors of the parameters; d_x̂₀ from equation 8.38; ∂x̂₀/∂ȳ₀ from equation 8.43; σ² from equation 8.39; σ²_ȳ₀ = σ²/m; V = σ²E⁻¹; x̂₀ from equation 8.36; and σ²_x̂₀ from equation 8.37.)
From sheet 8.1, x̂₀ and σ_x̂₀ are found to be:

x̂₀ = 1.90193
σ_x̂₀ = 0.029

which allows us to write: x̂₀ = (1.902 ± 0.029) mg/l
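The arithmetic of sheet 8.1 can be reproduced directly from the quantities it contains. The sketch below (Python with numpy; an illustration, not part of the original spreadsheet, and the signs of the off-diagonal elements of V are inferred from the calculation, so treat them as assumptions) evaluates equations 8.36 to 8.43 and 8.37:

```python
import numpy as np

# best estimates and covariance matrix V = sigma^2 E^-1 taken from sheet 8.1
# (off-diagonal signs of V are inferred - an assumption)
a, b, c = -0.56512, 3.58138, 0.97901
V = np.array([[ 0.00783,  -0.00601,   0.000646],
              [-0.00601,   0.00624,  -0.000705],
              [ 0.000646, -0.000705,  8.15e-05]])
y0, sigma2, m = 6.15513, 0.02985, 4   # mean area, variance, number of replicates

u = (y0 - a) / b
x0 = u**(1.0 / c)                                   # equation 8.36

# equations 8.40 to 8.42 assembled into the d vector of equation 8.38
d = np.array([-(1.0 / (b * c)) * u**(1.0 / c - 1.0),
              -(1.0 / (b * c)) * u**(1.0 / c),
              -(1.0 / c**2) * u**(1.0 / c) * np.log(u)])
dx_dy0 = (1.0 / (b * c)) * u**(1.0 / c - 1.0)       # equation 8.43

var_x0 = dx_dy0**2 * sigma2 / m + float(d @ V @ d)  # equation 8.37
se_x0 = var_x0**0.5
```

The computed x̂₀ and σ_x̂₀ agree with the values in rows 70 and 72 of sheet 8.1.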
Exercise 6
The following data were obtained during the calibration of an HPLC system using
Ibuprofen. The area under the chromatograph peak is shown as a function of known
concentrations (expressed in mass/tablet) of Ibuprofen.
Table 8.3: Area under the chromatograph peak as a function of concentration of Ibuprofen.
Mass per tablet (mg/tablet)   Area (arbitrary units)
103.9 265053
103.9 261357
139.3 345915
139.3 345669
180.1 445684
180.1 445753
200.3 494700
200.3 493846
219.9 540221
219.9 539610
278.1 683881
278.1 683991
305.7 755890
305.7 754901
Using the data in table 8.3:
a) Fit an equation of the form y = a + bx^c to the data in table 8.3, where y corresponds to the area under the chromatograph peak and x corresponds to the Ibuprofen concentration. Determine a, b and c and their respective standard errors.
b) A sample of Ibuprofen of unknown concentration is injected into the column of the calibrated HPLC. The mean area of three replicate measurements is found to be 405623. Use this information to estimate the concentration of Ibuprofen and the standard error in the estimate of the concentration.
Section 9: More on Solver
Solver was devised primarily for use by the business community, and this is reflected in the features it offers. Solver comprises three optimisation algorithms:
1) For integer problems, Solver uses the Branch and Bound method [73].
2) Where equations are linear, the Simplex method is used for optimisation [74].
3) In the case of nonlinear problems, the Generalised Reduced Gradient (GRG) method is adopted [75].
It is the GRG method that is applied in our analyses; therefore most of this section is devoted to describing the features of Solver that relate to it.
Though optimisation can often be carried out successfully with the default settings in the Solver Options dialog box, Solver possesses several options that can be adjusted by the user to assist in the optimisation process, and we will describe those next.
The Solver dialog box, as shown in figure 4.2, offers the facility to constrain parameter estimates. The application of constraints requires careful consideration, as it is possible that Solver will locate a local minimum rather than the global minimum. The best estimates returned by Solver need to be compared with 'physical reality' before being accepted. Consider an example in which a parameter in an equation represents the speed of sound, v, in air. If, after fitting, the best estimate of v is 212 m/s, it is fair to question whether this value is 'reasonable'. If it is not, then one course of action is to try new starting values for the parameter estimates. We could use the Constraints box in Solver to constrain the estimate of v so that it cannot take on negative values. This cannot guarantee that a physically meaningful value will be found for v, only that the value will be non-negative.
9.1 Solver Options
To view the Solver options shown in figure 9.1, it is necessary to click on the Options button in the Solver dialog box. This dialog box may be used to modify, for example, the method by which the optimisation takes place. This, in turn, may provide a better fit, or reduce the fitting time, compared with the default settings.
[73] See Wolsey L A (1998).
[74] See Nocedal J (1999).
[75] See Smith and Lasdon (1992).
Figure 9.1: Solver Options dialog box illustrating default settings.
We now consider some of the options in the Solver Options dialog box.
Max Time: This restricts the total time Solver spends searching for an optimum
solution. Unless there are many data, the default value of 100 s is
usually sufficient. If the maximum time is set too low, such that
Solver has not completed its search, then a message is returned 'The
maximum time limit was reached; continue anyway?'. Clicking on the
Continue button will cause Solver to carry on searching for a solution.
Iterations: This is the maximum number of iterations that Solver will execute
before terminating its search. The default value is 100, but this can be
increased to a limit of 32,767. Solver is likely to find an optimum
solution before reaching such a limit or return a message that an
optimum solution cannot be found. If the number of iterations is set
too low, such that Solver has not completed its search, then a message
will be returned 'The maximum iteration limit was reached; continue
anyway?'. Clicking on the Continue button will cause Solver to carry
on searching for a solution.
Precision and
Tolerance:
These options are applicable to situations in which constraints have
been specified. Specifying constraints is not advised and so we will
not consider these options.
Convergence: As fitting proceeds, Solver compares the most recent solution (for our application this would be the value of SSR) with previous solutions. If the fractional reduction in the solution over five iterations is less than the value in the Convergence box, Solver reports that optimisation is complete. If this value is made very small (say, 10⁻⁶) Solver will continue iterating for longer (and hence take longer to complete) than if the number is larger (say, 10⁻²).
Assume Linear Model: If this box is ticked then Solver uses the Simplex method to obtain
best estimates of parameters. If the model to be fitted to data is linear,
then fitting may be performed using the Regression tool in the
Analysis ToolPak. This is an attractive alternative, as the Regression
tool returns best estimates, standard errors in estimates, confidence
intervals and the sum of squares of residuals. If the 'Assume Linear
Model' box is ticked, Solver will attempt to establish whether the model is
indeed linear. If Solver determines that the model is nonlinear, the
message 'The conditions for Assume Linear Model are not satisfied' is
returned. To continue, it is necessary to return to the Solver Options
dialog box and untick the Assume Linear Model option.
Assume Non Negative: This constrains all estimates in an equation so that they cannot take on
negative values.
Use Automatic Scaling: In certain problems there may be many orders of magnitude
difference between the data, the parameter estimates and the value in
the target cell. This can lead to rounding problems owing to the finite
precision arithmetic performed by Excel. If the 'Use Automatic
Scaling' box is ticked, then Solver will scale values before carrying
out optimisation (and 'unscale' the solution values before entering
them into the spreadsheet). It is advisable to tick this box for all
problems.
Show Iteration Results: Ticking this box causes Solver to pause after each iteration, allowing
new estimates of parameters and the value in the Target cell to be
viewed. If parameter estimates are used to draw a line of best fit
through the data, then the line will be updated after each iteration.
Updating the fitted line on the graph after each iteration gives a
valuable insight into the progress made by Solver in finding best
estimates of the parameters in an equation.
Estimates (Tangent or Quadratic): This determines the method used to find subsequent
values of each parameter estimate at the outset of the search (i.e. either
linear or quadratic extrapolation). Both methods produce the same final
results for the examples described in this document.
Derivatives (Forward or Central): The partial derivatives of the function in the target cell
with respect to the parameter estimates are found by the method of finite
differences. It is possible to perturb the estimates 'forward' from a
particular point (similar to that described in section 4.3), or to perturb
the estimates forward and backward from the point in order to obtain better
estimates of the partial derivatives. Both methods of determining the
partial derivatives produce the same final results for the examples
described in this document.
Search (Newton or Conjugate): Specifies the search algorithm. Reference to the
quasi-Newton and Conjugate search methods used by Excel can be found in
Safizadeh and Signorile (1993) and Perry (1978) respectively. Both methods
produce the same final results for the examples described here.
Load Model and Save Model: We may wish to consider the effect on optimisation of using a
combination of options, such as Tangent (Estimates), Central (Derivatives)
and Conjugate (Search). It is tedious to record which fitting conditions
have been used, so Excel offers the facility to store the options by
clicking on Save Model, followed by specifying the cells on the spreadsheet
where the model conditions should be saved. These conditions can be
recalled by clicking on Load Model and indicating the cells which contain
the saved information.
9.2 Solver Results
Once Solver completes optimisation, it displays the Solver Results dialog box shown
in figure 9.2.
Figure 9.2: Solver Results dialog box.
Clicking on OK will retain the solution found by Solver (i.e. the starting parameters are
permanently replaced by the final parameter estimates). At this stage Excel is able to
present three reports: Answer, Sensitivity and Limits. Of the three reports, the
Answer report is the most useful, as it gives the starting values of the parameter
estimates and the associated SSR. The report also displays the final parameter
estimates and the final SSR, allowing for easy comparison with the original values. An
Answer report is shown in figure 9.3.
Figure 9.3: Answer report created by Excel.
Section 10: Modelling and Model Identification
There are several types of model that interest physical scientists. Physical and
chemical models are based on the application of physical and chemical principles.
Such principles are expected to have wide applicability and underlie phenomena
observed inside and outside the laboratory. Equations founded on physical and
chemical principles contain parameters that have physical meaning rather than simply
being anonymous constants in an equation. For example, a parameter in an equation
could represent the radius of the Earth, the energy gap of a semiconductor or a rate
constant in a chemical reaction.
There are also essentially statistically based models that may, through consideration
of experimental or observational data, assist in identifying the important variables and
lend support to an empirical relationship between variables. A useful empirical
equation is one that successfully describes the trend in the data but is not derived from
a consideration of the fundamental principles underlying the relationship between
variables.
While both types of modelling are useful, most scientists would prefer the insight and
predictive opportunities offered by good physical models to those with a purely
statistical basis or support.
10.1 Physical Modelling
If a model based on physical and chemical principles is successful, in the sense that
data gathered in experiments are consistent with the predictions of the model, then
this lends support to the validity of the underlying principles.
As an example, a physical principle described by Isaac Newton is that an attractive
force exists between all bodies. That attractive force is termed the gravitational force.
Newton went on to indicate how the gravitational force between two bodies depends
on their respective masses and the separation between the bodies. From this starting
point, it is possible to predict the value of the acceleration of a body when it is allowed to
fall freely above the Earth’s surface. It is often the case that approximations are made
so that the problem does not become too complicated[76]. In this example we might
consider the Earth to be[77]:
a) a perfect sphere
b) not rotating
c) of uniform density
Once a prediction has been made as to how the acceleration of a body varies with
distance above the Earth’s surface, the next step is to determine by careful
measurement how the acceleration actually depends on distance. If the
approximations given by a), b) and c) above are valid, then the relationship between
free fall acceleration, g(h), and height, h, can be written:
[76] Experienced physical scientists are able to simplify complex situations while retaining the key
principles necessary to understand a particular physical process or phenomenon.
[77] If it is found that the data are inconsistent with the ‘simplified’ theory, the approximations may have to
be revisited and the model revised.
g(h) = g₀ / (1 + h/R)²    (10.1)
where g₀ is the acceleration caused by gravity at the Earth’s surface (i.e. when h = 0),
and R is the radius of the Earth[78].
By gathering data of acceleration as a function of height, it should be possible to
confirm or contest the validity of equation 10.1. It is also possible to infer from
equation 10.1 that if the range of h values is too limited (much less than the radius, R)
then the acceleration, g(h), will decrease almost linearly with height[79]. Additionally,
as the radius of the Earth is one of the parameters to be estimated, this can be
compared with the known radius of the Earth as determined by other methods.
Applying physical principles in order to establish an equation that successfully relates
the variables is challenging. However, such an equation is often more satisfying and
has wider applicability than an empirical equation.
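The behaviour of equation 10.1 is easy to explore numerically. The following Python sketch (not part of the original analysis; the values of g₀ and R are assumed round figures, not fitted results) evaluates g(h) and the near-linear decrease expected when h is much less than R:

```python
# Sketch (not from the text): evaluating equation 10.1 with assumed
# round values for g0 and the Earth's radius R.
g0 = 9.81      # m/s^2, assumed surface value of g
R = 6.371e6    # m, approximate radius of the Earth

def g(h):
    """Free-fall acceleration at height h above the Earth's surface (eq 10.1)."""
    return g0 / (1.0 + h / R) ** 2

# For h << R, a binomial expansion gives g(h) ~ g0 * (1 - 2h/R),
# i.e. an almost linear decrease of g with height.
h = 20e3                            # 20 km, the largest height in figure 10.1
exact = g(h)
linear = g0 * (1 - 2 * h / R)
print(exact, linear)                # the two agree to about 0.003 %
```

Fitting g₀ and R to measured g(h) data, as suggested above, would then return an estimate of the Earth's radius for comparison with values obtained by other methods.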
10.2 Data driven approach to discovering relationships
As an alternative to a ‘physical principles’ approach to developing a relationship
between physical variables, we could try a ‘data driven’ approach such that trends
observed in the data suggest a relationship between dependent and independent
variables that might be valid. One weakness of this approach is that, even if the
correct functional relationship between acceleration and height is discovered, we
would be unlikely to recognise that hidden within parameter estimates is an important
physical constant, such as the radius of the Earth.
For example, with respect to the study involving gravity described in section 10.1, we
might carefully gather experimental data of the acceleration of free fall, g(h), for
various heights, h, then plot g(h) versus h in order to discern the type of relationship
between the two variables. Such a plot is shown in figure 10.1 for values of h in the
range 0 to 20 km.
[78] See Walker (2002), Chapter 12.
[79] This can be shown by doing a binomial expansion of equation 10.1 (see problem 11 at the end of the
article).
Figure 10.1: Variation of acceleration due to gravity with height above the Earth’s
surface
Based on the data appearing in figure 10.1, there is a relationship between g(h) and h,
but owing to the variability within the data and perhaps the limited range over which
the data were gathered, it is difficult to justify fitting an equation other than y = a + bx
to these data.
10.3 Other forms of modelling
In the physical sciences we are often able to isolate and control important independent
variables in an experiment. For example, past experience may suggest that the
thickness of an aluminium film vacuum deposited onto a glass substrate is affected by
the distance from the aluminium target to the substrate, the deposition time and the
pressure of the gas in the vacuum chamber. Such isolation and control might be
contrasted with situations often encountered in other areas of science (and in other
disciplines, such as the health or medical sciences).
Consider, as an example, the efficacy of a treatment in prolonging the life of a patient
suffering with liver cancer. There may be many variables that affect patient longevity
to be considered including patient age, sex, race, past medical history, family medical
history and socioeconomic status. In fact, identifying which are the most important
variables may be the finest achievement of the modelling/data analysis process with
little expectation that a functional relationship other than linear will emerge between
independent and dependent variables.
There are many areas of science in which a certain amount of data ‘mining’ or
‘prospecting’ is required to establish which variables are most important and which
can be safely discarded. Here we will confine our considerations to the analysis of
data which emerge from experiments in which independent variables can be carefully
controlled and measured.
10.4 Competing models
Whether equations relating variables have been developed by first considering
physical principles, past experience, or intelligent guesswork, there are circumstances
in which two or more equations compete to offer the best explanation of the
relationship between the variables. More terms can be added to an equation (including
terms that introduce extra independent variables) until the fit between equation and
data is optimised, as measured by some suitable statistic such as those described in
section 10.5.
Careful experimental design can assist in helping discriminate one equation from
another. For example, if a model predicts a slightly nonlinear relationship between
dependent and independent variables, it would be wise to make measurements over as
wide a range of values of the independent variable as possible to expose or exaggerate
that nonlinearity. Additionally, if the data show large scatter, there may be merit in
investigating ways by which the noise can be reduced in order to improve the quality
of the data.
In the situations in which we need to compare two or more equations, we can appeal
to methods of data analysis to provide us with quantifiable means of distinguishing
between models. It is these methods that we will concentrate upon for the remainder
of this section.
10.5 Statistical Measures of Goodness of Fit
There are several measures that can be used to assist in discriminating statistically
which equation gives the best fit to data, including the Schwartz criterion, Mallow’s
C_p and the Hannan and Quinn Information Criterion[80]. Here we focus on two criteria,
the Adjusted Coefficient of Multiple Determination, R²_ADJ, and the Akaike
Information Criterion (AIC), as they are quite easy to implement and interpret.
10.5.1 Adjusted Coefficient of Multiple Determination
A measure of how well an equation is able to account for the relationship between the
independent and dependent variables is given by the Coefficient of Multiple
Determination, R², given by,

R² = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)²    (10.2)

where y_i is the ith observed value of y, ŷ_i is the predicted y value found using the
equation representing the best line through the points, and ȳ is the mean of the
observed y values. Note that the numerator in the second term of equation 10.2 is the
sum of squares of residuals, SSR.
As more parameters (or independent variables) are added to the model we would
expect SSR to reduce. As a consequence, R² would tend to unity. If we were to use R²
to help choose between equations, for example between,

y = a + bx    (10.3)

and

y = a + bx + cx²    (10.4)

then equation 10.4 would always be favoured over equation 10.3, owing to the extra
flexibility the x² term provides for the line of best fit to pass close to the data points.
[80] See AlSubaihi (2002).
While the extra term in x² contributes to a reduction in SSR, it is possible that the
reduction is only marginal. It seems reasonable that, while looking for an equation
that reduces SSR, account should also be taken of the number of parameters, so as not
to unfairly discriminate against equations with only a small number of parameters.
One such statistic, the Adjusted Coefficient of Multiple Determination, R²_ADJ, is
given by[81],

R²_ADJ = [(n − 1)R² − (M − 1)] / (n − M)    (10.5)

where R² is given by equation 10.2, n is the number of data and M is the number of
parameters in the equation.

The equation that is favoured, when two or more equations are fitted to data, is that
equation that gives the largest value for R²_ADJ.
10.5.2 Akaike’s Information Criterion (AIC)
Another way to compare two (or more) equations fitted to data, where the equations
have different numbers of parameters, is to use the Akaike Information Criterion[82]
(AIC). This criterion takes into account SSR, but also includes a term proportional to
the number of parameters used. AIC may be written,

AIC = n ln(SSR) + 2M    (10.6)

where n is the number of data and M is the number of parameters in the equation.
The second term on the right hand side of equation 10.6 can be considered as a
‘penalty’ term. If the addition of another parameter to an equation reduces SSR, then
the first term on the right hand side of equation 10.6 becomes smaller. However, the
second term on the right hand side increases by two for every extra parameter used. It
follows that a modest decrease in SSR which occurs when an extra term is introduced
into an equation may be more than offset by the increase in the penalty term. We
conclude that, if two or more equations are fitted to data, then the equation producing
the smallest value for AIC is preferred.
Care must be exercised when calculating SSR: if a transformation is required to
facilitate fitting, the data must be transformed back to the original units before
calculating SSR, otherwise it is not possible to compare equations using R²_ADJ or AIC.
Additionally, if weighted fitting is to be used, then the same weighting of the data
must be used for all equations fitted to data.
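The three statistics can be computed directly from the observed and fitted values. A minimal Python sketch (the function name and data layout are my own, not from the text):

```python
import math

def fit_stats(y, y_hat, M):
    """R^2 (eq 10.2), adjusted R^2 (eq 10.5) and AIC (eq 10.6)
    for a fit with M parameters; y_hat holds the fitted values."""
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # sum of squared residuals
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - ssr / sst                                    # equation 10.2
    r2_adj = ((n - 1) * r2 - (M - 1)) / (n - M)             # equation 10.5
    aic = n * math.log(ssr) + 2 * M                         # equation 10.6
    return r2, r2_adj, aic
```

When comparing candidate equations fitted to the same (untransformed) data, the fit with the larger R²_ADJ, or equivalently the smaller AIC, is preferred.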
[81] See Neter, Kutner, Nachtsheim and Wasserman for a discussion of equation 10.5. R²_ADJ is calculated
by the Regression tool in the Analysis ToolPak in Excel (see p 373 of Kirkup).
[82] See Akaike (1974).
10.5.3 Example
As part of a study into the behaviour of electrical contacts made to a ceramic
conductor, the data in table 10.1 were obtained for the temperature variation of the
electrical resistance of the contacts.
Table 10.1: Resistance versus temperature for electrical contacts on a ceramic.
T (K) R(Ω) T (K) R(Ω)
50 4.41 190 0.69
60 3.14 200 0.85
70 2.33 210 0.94
80 2.08 220 0.78
90 1.79 230 0.74
100 1.45 240 0.77
110 1.36 250 0.68
120 1.20 260 0.66
130 0.86 270 0.84
140 1.12 280 0.77
150 1.05 290 0.75
160 1.05 300 0.86
170 0.74
180 0.88
These data are shown plotted in figure 10.2.
Figure 10.2: Resistance versus temperature data for electrical contacts made to a
ceramic material.
It is suggested that there are two possible models that can be used to describe the
variation of the contact resistance with temperature.
Model 1
The first model assumes that contacts show semiconducting behaviour, where the
relationship between R and T can be written,

R = A exp(B/T)    (10.7)

where A and B are constants.
Model 2
Another equation proposed to describe the data assumes an exponential decay of
resistance with increasing temperature, of the form,

R = α exp(−βT) + γ    (10.8)

where α, β and γ are constants.
We will use the adjusted coefficient of multiple determination and the Akaike
information criterion to determine whether equation 10.7 or equation 10.8 better fits
the data.
Solution
Both equation 10.7 and equation 10.8 were fitted using nonlinear least squares. It is
possible to linearise equation 10.7 by taking logarithms of both sides of the equation,
then performing linear least squares. However, it is more convenient to use the Solver
utility in Excel to perform nonlinear least squares, as described in section 4 of this
document.
Summarised in table 10.2 are the results of the fitting. Note that the number of data in
table 10.1 is n = 26.
Table 10.2: Parameter estimates and statistics obtained when fitting equations 10.7
and 10.8 to the data in table 10.1.
Parameter estimates    Fitting              Fitting
and other statistics   R = A exp(B/T)       R = α exp(−βT) + γ
A, σ_A                 0.4849, 0.0175       —
B, σ_B                 111.0, 2.40          —
α, σ_α                 —                    18.91, 2.14
β, σ_β                 —                    0.03391, 0.00196
γ, σ_γ                 —                    0.7974, 0.0313
SSR                    0.2709               0.3191
AIC                    −29.95               −23.70
R²                     0.9859               0.9833
R²_ADJ                 0.9853               0.9818
Inspection of table 10.2 reveals that the equation R = A exp(B/T) is superior to
R = α exp(−βT) + γ as judged by both AIC and R²_ADJ. In this example the SSR is
smaller for equation 10.7 fitted to the data than for equation 10.8. As the number of
parameters in equation 10.7 is also less than that in equation 10.8, this alone would
have been enough to encourage us to favour equation 10.7 as the better fit to the data.
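Although the fitting in this example was carried out with Solver, the same comparison can be reproduced outside Excel. A sketch using `scipy.optimize.curve_fit` (this assumes numpy and scipy are available; the starting values are chosen near the estimates in the text, and are my own choice):

```python
import numpy as np
from scipy.optimize import curve_fit

T = np.arange(50, 310, 10, dtype=float)          # temperatures from table 10.1
R = np.array([4.41, 3.14, 2.33, 2.08, 1.79, 1.45, 1.36, 1.20, 0.86, 1.12,
              1.05, 1.05, 0.74, 0.88, 0.69, 0.85, 0.94, 0.78, 0.74, 0.77,
              0.68, 0.66, 0.84, 0.77, 0.75, 0.86])

def model1(T, A, B):                              # equation 10.7
    return A * np.exp(B / T)

def model2(T, alpha, beta, gamma):                # equation 10.8
    return alpha * np.exp(-beta * T) + gamma

results = {}
for name, f, p0, M in [("model1", model1, (0.5, 100.0), 2),
                       ("model2", model2, (20.0, 0.03, 0.8), 3)]:
    p, _ = curve_fit(f, T, R, p0=p0)              # nonlinear least squares
    ssr = float(np.sum((R - f(T, *p)) ** 2))
    results[name] = (ssr, len(T) * np.log(ssr) + 2 * M)   # (SSR, AIC)
print(results)
```

As with the Solver fits, the two-parameter model should return the smaller SSR and the smaller AIC.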
Section 11: Monte Carlo simulations and least squares
How effective is the technique of least squares at providing good estimates of
parameters appearing in an equation fitted to experimental data? This question is both
challenging and important. To begin with, it is not possible to be sure that an equation
fitted to data is appropriate. Additionally, we cannot be sure that the assumptions
usually made when applying the technique of least squares (e.g. that errors in the y
values are normally distributed, with a mean of zero and a constant standard
deviation) are valid.
There is no way to be certain what the values of the parameters appearing in an
equation fitted to ‘real’ data should be. However, it is possible to contrive a situation
where we do know the underlying relationship between the dependent variable (y) and
independent variable (x), and how errors are distributed.
The starting point is to generate ‘noise free’ y values in some range of x values. The
next stage is to add ‘noise’ of known standard deviation with the aid of a random
number generator[83].
Data generated in this manner are submitted to a least squares routine which, in turn,
calculates the best estimates of parameters appearing in an equation fitted to the data.
The estimates are compared with the ‘actual’ parameters, allowing the error in the
estimates[84] to be determined. Generating and analysing data in this manner is an
example of a Monte Carlo simulation. Such simulations are widely used in science to
imitate situations that are too difficult, costly or time consuming to investigate
through conventional experiments.
The Monte Carlo approach is powerful and versatile. As examples, we may
investigate ‘experimentally’,
• the performance of data analysis tools (for example, the speed and accuracy
of rival algorithms for nonlinear least squares can be compared).
• the consequence of choosing different sampling regimes (for example, the
distribution of parameter estimates obtained when measurements are made at
evenly spaced intervals of x can be compared with the distribution of
parameter estimates obtained when replicate measurements are made at
extreme values of x).
• the effect of homo- or heteroscedasticity on parameter estimates (for example,
the consequences may be investigated of fitting an equation by unweighted
least squares to data, where the data have been influenced by heteroscedastic
noise).
• the effect of the magnitude of the ‘noise’ in the data on the standard errors of
the parameter estimates.
[83] or a pseudorandom number generator, as routinely found in statistics and spreadsheet packages.
[84] error = true value of parameter − estimated value of parameter.
11.1 Using Excel’s Random Number Generator
The Random Number Generator in Excel offers a convenient means of adding
normally distributed noise to otherwise noise free data[85]. The Random Number
Generator is one of the tools in the Analysis ToolPak. The ToolPak is found by going
to the Tools pull down menu on the Menu toolbar and clicking on Data Analysis.
Figure 11.1 shows noise free y values in column B generated using the equation:

y = 3 + 1.5x    (11.1)

Normally distributed noise with mean of zero and standard deviation of two is
generated in the C column. In the D column the noise-free y values are summed with
the noise. The x values are distributed evenly in the range x = 5 to x = 20.
A B C D
1 x y_noise_free noise y
2 5 10.5 0.00145 =B2+C2
3 6 12.0 3.08168
4 7 13.5 2.95189
5 8 15.0 0.06251
6 9 16.5 0.674352
7 10 18.0 2.402985
8 11 19.5 2.987836
9 12 21.0 3.183932
10 13 22.5 1.49775
11 14 24.0 0.441671
12 15 25.5 2.453826
13 16 27.0 1.03515
14 17 28.5 1.77266
15 18 30.0 0.43274
16 19 31.5 1.972357
17 20 33.0 0.010632
Figure 11.1: Normally distributed noise with zero mean and standard deviation of two
added to y values. x values are in the range x = 5 to x = 20, with no replicates.
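The same data can be generated outside Excel. A numpy sketch of the spreadsheet in figure 11.1 (the seed is an arbitrary choice, used only so that a run is repeatable):

```python
import numpy as np

rng = np.random.default_rng(1)              # arbitrary seed for repeatability
x = np.arange(5, 21, dtype=float)           # x = 5 to 20, evenly spaced, no replicates
y_noise_free = 3 + 1.5 * x                  # equation 11.1 (column B)
noise = rng.normal(0.0, 2.0, size=x.size)   # mean 0, standard deviation 2 (column C)
y = y_noise_free + noise                    # column D of figure 11.1
```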
Figure 11.2 shows a similar range of x values, but in this case eight replicates are
made at x = 5 and another eight at x = 20, with no values between these limits (such
that the numbers of data in figures 11.1 and 11.2 are the same). Again, normally
distributed noise with mean of zero and standard deviation of two is added to each of
the y values.
[85] The Random Number Generator allows noise with distributions other than normal to be added to
data. We will consider only normally distributed noise.
K L M N
1 x y_noise_free noise y
2 5 10.5 0.51868 =L2+M2
3 5 10.5 0.849941
4 5 10.5 0.91623
5 5 10.5 0.153223
6 5 10.5 1.798417
7 5 10.5 0.67711
8 5 10.5 2.12328
9 5 10.5 3.16988
10 20 33.0 0.11841
11 20 33.0 0.47371
12 20 33.0 2.95645
13 20 33.0 1.563158
14 20 33.0 2.03966
15 20 33.0 0.874793
16 20 33.0 1.679191
17 20 33.0 1.71295
Figure 11.2: Normally distributed noise with zero mean and standard deviation of two
added to y values. Data are generated at x = 5 and x = 20. Eight replicate y values are
generated at each x value.
Analysing the data shown in figures 11.1 and 11.2 using unweighted least squares
gives the estimates for the parameters, and standard errors in the parameters, shown
in table 11.1. Note that we refer to the data that are evenly distributed between x = 5
and x = 20, as given in figure 11.1, as ‘Even dist.’, and the data consisting of
replicates at x = 5 and x = 20 as ‘Extreme dist.’.
               a        σ_a       b        σ_b        R²
Even dist.     2.238    1.464     1.578    0.1099     0.9364
Extreme dist.  2.461    0.8297    1.538    0.05692    0.9812
Table 11.1: Parameter estimates and statistics for data in figures 11.1 and 11.2 found
using unweighted least squares.
Errors in the intercept and slope in table 11.1 are found by subtracting the estimates
from the true values (3 and 1.5 respectively), as shown in table 11.2.

               Error in a    Error in b
Even dist.     0.7615        −0.07801
Extreme dist.  0.5386        −0.03845

Table 11.2: Errors in intercept and slope.
It is possible that the simulated data are unrepresentative of the effect of evenly
distributed data compared to data gathered at extreme x values (as there are only two
sets of data and, by chance, the ‘Extreme dist.’ could have been favoured over the
‘Even dist.’). This is where the power of the Monte Carlo approach emerges. The
simulation may be repeated many times in order to establish whether designing an
experiment with replicate measurements made at extreme x values does consistently
produce parameter estimates with smaller standard errors.
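A repeated simulation of this kind might be sketched as follows (numpy assumed; the seed and run count are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(42)     # arbitrary seed for repeatability

def slope_spread(x, n_runs=50):
    """Standard deviation of the least-squares slope estimate over
    n_runs simulated data sets based on y = 3 + 1.5x with sd-2 noise."""
    slopes = []
    for _ in range(n_runs):
        y = 3 + 1.5 * x + rng.normal(0.0, 2.0, size=x.size)
        b, a = np.polyfit(x, y, 1)          # b = slope estimate
        slopes.append(b)
    return float(np.std(slopes))

x_even = np.arange(5, 21, dtype=float)          # as in figure 11.1
x_extreme = np.array([5.0] * 8 + [20.0] * 8)    # as in figure 11.2
print(slope_spread(x_even), slope_spread(x_extreme))
```

With many runs, the spread of the slope estimates for the extreme design settles near 2/√900 ≈ 0.067, against 2/√340 ≈ 0.108 for the even spacing, consistent with the standard errors in table 11.1.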
Figures 11.3 and 11.4 show histograms of the estimates of a and b, which were
determined by generating 50 sets of simulated data, based on adding noise of standard
deviation 2 to y values generated using equation 11.1.
Figure 11.3: Histogram consisting of 50 estimates of intercept found by fitting the
equation y = a + bx to simulated data.
Figure 11.4: Histogram consisting of 50 estimates of slope found by fitting the
equation y = a + bx to simulated data.
Figures 11.3 and 11.4 provide convincing evidence of the benefits (as far as reducing
standard errors in parameter estimates is concerned) of designing experiments in
which extreme x values are favoured. This finding has a sound foundation in
statistical principles. For example, the standard error in the estimate of the slope is
related to the x values, x_i, by[86],
σ_b = σ / [Σ(x_i − x̄)²]^(1/2)    (11.2)
[86] See Devore (1991).
where x̄ is the mean of the x values and σ is the standard deviation of the
experimental y values, given by,

σ = [Σ(y_i − ŷ_i)² / (n − 2)]^(1/2)    (11.3)

where n is the number of data.
Equation 11.2 indicates that, for a fixed σ, σ_b becomes smaller for large deviations of x
from the mean, i.e. for large values of |x_i − x̄|.
It is worth emphasising that the reduction of the standard errors and the improved R²
are secured at some cost. What if the underlying relationship between x and y is not
linear? Gathering data at two extremes of x assumes that the data are linearly related,
and there is no way to test the validity of this assumption with data gathered in this
manner.
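Equation 11.2 makes the design effect quantitative. A quick numerical check (x values as in figures 11.1 and 11.2; σ = 2 is the noise standard deviation assumed in the simulations):

```python
import math

def sigma_b(xs, sigma):
    """Standard error of the slope from equation 11.2."""
    xbar = sum(xs) / len(xs)
    return sigma / math.sqrt(sum((x - xbar) ** 2 for x in xs))

x_even = list(range(5, 21))             # evenly spaced, as in figure 11.1
x_extreme = [5.0] * 8 + [20.0] * 8      # replicates at the extremes, figure 11.2
print(sigma_b(x_even, 2.0), sigma_b(x_extreme, 2.0))
```

This gives σ_b ≈ 0.108 for the even spacing and σ_b = 2/30 ≈ 0.067 for the extreme design, close to the standard errors reported in table 11.1.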
11.2 Monte Carlo simulation and nonlinear least squares
Let us now consider a situation requiring fitting by nonlinear least squares. The
equation to be fitted to data is given by,

y = A₁ exp(−B₁x) + A₂ exp(−B₂x)    (11.4)

We choose (arbitrarily),

A₁ = 50, A₂ = 50, B₁ = 0.025 and B₂ = 0.010
Fifty values of y are generated in the range x = 1 to x = 200. A graph of the noise free
data with y values calculated at equal increments of x beginning at x = 1 is shown in
figure 11.5.
Figure 11.5: Noise free data generated using equation 11.4.
Next, noise is added with mean of zero and a constant standard deviation of unity
(again chosen arbitrarily). The question arises: what values of x should be chosen to
obtain estimates of A₁, A₂, etc. which have the smallest standard errors?
With normally distributed noise of zero mean and standard deviation of unity added to
the y values, the graph looks typically like that shown in figure 11.6.
Figure 11.6: x – y data as shown in figure 11.5 with noise added.
Fifty replicate data sets were generated with noise added to the y values in figure 11.5.
Upon the generation of each set, an equation of the form,

y = a₁ exp(−b₁x) + a₂ exp(−b₂x)    (11.5)

where a₁, a₂, b₁ and b₂ are estimates of A₁, A₂, B₁ and B₂ respectively, was fitted to the
data using nonlinear least squares. Note that the starting values for the nonlinear fit,
which are very important when fitting a function consisting of a sum of exponentials,
were a₁ = 50, a₂ = 50, b₁ = 0.025 and b₂ = 0.010.
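One cycle of this simulation might be sketched as follows, taking the model as a sum of decaying exponentials (numpy and scipy assumed; the fits in the text were performed with Solver, so `curve_fit` here is a stand-in, not the original method):

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a1, b1, a2, b2):                    # sum of two decaying exponentials
    return a1 * np.exp(-b1 * x) + a2 * np.exp(-b2 * x)

rng = np.random.default_rng(0)               # arbitrary seed
x = np.linspace(1.0, 200.0, 50)              # 50 equally spaced x values
y = f(x, 50, 0.025, 50, 0.010) + rng.normal(0.0, 1.0, size=x.size)  # unit-sd noise

p0 = (50, 0.025, 50, 0.010)                  # starting values, as in the text
p, _ = curve_fit(f, x, y, p0=p0)             # nonlinear least squares
print(p)                                     # estimates a1, b1, a2, b2
```

Repeating this loop 50 times and histogramming the first estimate reproduces the kind of distribution shown in figure 11.7.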
Figure 11.7 shows a histogram of the a₁ parameter estimates.
Figure 11.7: Distribution of parameter estimate a₁.
Exercise 7
An alternative sampling regime to that used in section 11.2 is to choose smaller
sample intervals in the region where the y values are changing most rapidly with x. A
sampling regime that has this characteristic is given by,

x_i = (1/λ) ln[(N + 1)/(N − i + 1)]    (11.6)

where N is the total number of data, and λ is a constant which is determined by letting
x_i equal the maximum x value, when i = N.
Repeat the example given in section 11.2 (i.e. use the same starting equation and
distribution of errors) using the new sampling regime described by equation 11.6, and
perform 50 replicates. Plot a histogram of the distribution of the parameter
estimate a₁.
a) Is the standard deviation of the parameter estimates, a₁, less than that obtained
in section 11.2, when the x_i values were evenly distributed?
b) Carry out an F test to compare the variances of the distributions of a₁ obtained
using both sampling regimes, to establish if the difference in the characteristic
width is statistically significant[87].
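The sampling points of equation 11.6 can be generated as follows (a sketch; N and the maximum x value are taken from the example in section 11.2):

```python
import math

N, x_max = 50, 200.0
lam = math.log(N + 1) / x_max     # from x_N = (1/lam) ln(N + 1) = x_max
x = [math.log((N + 1) / (N - i + 1)) / lam for i in range(1, N + 1)]
# The intervals are smallest at small x, where a decaying sum of
# exponentials changes most rapidly, and grow towards x = x_max.
```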
Exercise 8
To determine the wavelength, λ, of an ultrasonic wave, an experiment is to be
performed which exploits the phenomenon of interference of waves from two sources
of ultrasound.
[87] See pages 342 to 346, and pages 369 to 371, in Kirkup (2002).
The relationship between the separation, y, between two successive maxima of the
interfering waves and the separation of the sources of the waves, d, is given by:

y = λD/d    (11.7)

where D is a constant.
Equation 11.7 is of the form y = bx, where x ≡ 1/d and b ≡ λD.
How may values of d be chosen so as to minimise the standard error in the slope, b?
Simulation
Two approaches for choosing values of d are to be compared. The first approach
generates y values as d increases from 1 to 20 cm in steps of 1 cm. The other is to
generate y values as the ratio 1/d is increased from 1/20 cm⁻¹ to 1 cm⁻¹ (i.e. 0.05 to
1 cm⁻¹).
Taking λ equal to 0.76 cm and D = 50 cm, generate simulated values of y using
equation 11.7:
a) for d = 1 to 20 cm, in steps of 1 cm.
b) for 1/d = 1/20 cm⁻¹ to 1 cm⁻¹, in steps of 0.05 cm⁻¹.
For data generated by both methods a) and b), use Excel’s Random Number generator
to add normally distributed noise with mean of zero and standard deviation of unity.
Analysis
Use the LINEST() function in Excel to find the slope of the best line through the
origin for the two sets of data generated. Replicate the simulation and least squares
analysis fifty times and construct a histogram showing the distribution of best
estimates of the slope based on both distributions of x values.
Questions
Is there an obvious difference between the distributions of the parameter estimates
based on the two sampling regimes?
a) Support your answer to a using an F test to compares variances in the slope.
b) Do you foresee any practical problems when a ‘real’ experiment is to be
carried out using either sampling regimes?
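The simulation and analysis described above can also be run outside Excel. The sketch below is an illustrative Python version (NumPy is assumed available; the seed is arbitrary) that generates both sampling regimes, fits the slope of a line through the origin in place of LINEST(), and forms the F ratio of the two sample variances:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, D = 0.76, 50.0                 # wavelength (cm) and constant D (cm)

def slope_through_origin(x, y):
    # least squares slope for y = bx (line forced through the origin)
    return np.sum(x * y) / np.sum(x * x)

def replicate(x):
    # generate y = lam*D*x (with x = 1/d) plus unit normal noise; return fitted b
    y = lam * D * x + rng.normal(0.0, 1.0, x.size)
    return slope_through_origin(x, y)

x_regime_a = 1.0 / np.arange(1.0, 21.0)      # d = 1..20 cm, so x = 1/d crowds near 0
x_regime_b = np.arange(0.05, 1.0001, 0.05)   # 1/d = 0.05..1 in steps of 0.05

b_regime1 = np.array([replicate(x_regime_a) for _ in range(50)])
b_regime2 = np.array([replicate(x_regime_b) for _ in range(50)])

# F ratio of the two sample variances of the slope estimates
F = np.var(b_regime1, ddof=1) / np.var(b_regime2, ddof=1)
print(np.std(b_regime1, ddof=1), np.std(b_regime2, ddof=1), F)
```

With x ≡ 1/d, regime a) crowds the x values near zero, so its Σx² is smaller and the scatter in b is larger; the F ratio quantifies the difference.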
11.3 Adding heteroscedastic noise using Excel's Random Number Generator
When the standard deviation of measurement is not constant, but instead depends on the x value, the distribution of errors is said to be heteroscedastic. As far as fitting an equation to data using least squares is concerned, it is necessary to use weighted fitting88. Heteroscedasticity may be revealed by plotting the residuals versus x as shown in figure 11.8.
Figure 11.8: Residuals indicating a weighted fit is required. The trend from large residuals to small (or small to large) as x increases is a strong indication of a heteroscedastic error distribution.
Though heteroscedasticity may be revealed by a plot of residuals, the nature of the heteroscedasticity is not always clear. For example, when the dominant source of error is instrumental, it is common for the error, ei, to be proportional to the magnitude of the response, yi.
We can use a Monte Carlo simulation to study the effect of heteroscedasticity, and to establish (for example) the consequences of fitting an equation to data with heteroscedastic errors using both unweighted and weighted least squares.
We begin with an (arbitrary) equation from which we generate 'noise free' data. The equation is:

y = −2 + 4x (11.8)

Sheet 11.1 shows noise-free data generated in the range x = 1 to x = 10, together with noise created using the Random Number Generator in Excel. In the C column there are normally distributed numbers with mean equal to zero and standard deviation equal to one. The values in the D column also have a normal distribution, but the standard deviation of the distribution at each x value depends on the magnitude of the y value in column B. Specifically, the standard deviation, σi, is given by:

σi = 0.1 × yi (11.9)
88 See page 264 of Kirkup (2002).
    A    B             C          D             E
1   x    y_noise_free  homo_noise hetero_noise  y
2   1    2             −1.0787    =0.1*B2*C2    =B2+D2
3   2    6             −0.5726
4   3    10            1.15598
5   4    14            −0.0725
6   5    18            0.67552
7   6    22            0.44338
8   7    26            −0.5806
9   8    30            1.23376
10  9    34            0.18546
11  10   38            2.34588
Sheet 11.1: Generating data with heteroscedastic noise. 'Experimental' data appear in column E.
    A    B             C          D             E
1   x    y_noise_free  homo_noise hetero_noise  y
2   1    2             −1.0787    −0.21575      1.78425
3   2    6             −0.5726    −0.34356      5.65644
4   3    10            1.15598    1.15598       11.156
5   4    14            −0.0725    −0.10146      13.8985
6   5    18            0.67552    1.21594       19.2159
7   6    22            0.44338    0.97544       22.9754
8   7    26            −0.5806    −1.50944      24.4906
9   8    30            1.23376    3.70128       33.7013
10  9    34            0.18546    0.63055       34.6306
11  10   38            2.34588    8.91434       46.9143
Sheet 11.2: Completed spreadsheet based on values in sheet 11.1.
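The recipe in sheets 11.1 and 11.2 can be reproduced in a few lines of Python (an illustrative sketch, with NumPy and an arbitrary seed; note that the noise-free column 2, 6, 10, …, 38 corresponds to y = −2 + 4x):

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.arange(1, 11)                  # column A: x = 1..10
y_noise_free = -2 + 4 * x             # column B: noise-free values 2, 6, ..., 38
homo_noise = rng.normal(0, 1, 10)     # column C: N(0, 1) noise
hetero_noise = 0.1 * y_noise_free * homo_noise  # column D: sigma_i = 0.1*y_i (equation 11.9)
y = y_noise_free + hetero_noise       # column E: 'experimental' data
print(np.column_stack((x, y_noise_free, homo_noise.round(4), y.round(4))))
```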
Figure 11.9 shows a plot of y versus x based on the generated data in sheet 11.2. The line of best fit on the graph was found using Excel's Trendline option and therefore represents an unweighted fit of the equation y = a + bx to the data. Figure 11.10 shows the (unweighted) y residuals plotted versus x. The trend in the residuals indicates that the errors have a heteroscedastic distribution and therefore weighted fitting is required.
Figure 11.9: Plot of x – y data as generated by sheet 11.1. The line of best fit (found using unweighted least squares) is shown on the graph.
[Graph annotation: line of best fit y = 4.5894x + 3.7994.]
Figure 11.10: Distribution of residuals when an unweighted fit is carried out.
In order to compare unweighted and weighted fitting of an equation to heteroscedastic data, fifty sets of heteroscedastic data were generated in the manner described above. An equation of the form:

y = a + bx (11.10)

was fitted to the simulated data.
Unweighted fitting was performed using the LINEST() function in Excel. Weighted fitting was performed with the aid of Solver89, where the weighting was chosen so that the standard deviation in the ith y value was taken to be proportional to yi, i.e.,

σi ∝ yi (11.11)

Figures 11.11 and 11.12 compare the scatter in the parameter estimates when unweighted and weighted fitting is performed on heteroscedastic data. It is clear from both figures that the weighted fit produces a much narrower distribution of parameter estimates and is therefore preferred over the unweighted fit.
89 Note that the equation being fitted is linear in the parameters, so fitting could be accomplished using weighted linear least squares. However, as Excel does not possess an option that allows for easy fitting in this manner, it is easier to construct a spreadsheet that minimises (using Solver) the weighted sum of squares of residuals, SSR, where SSR = Σ[(yi − ŷi)/yi]².
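Solver is not essential for this particular comparison: because equation 11.10 is linear in a and b, minimising a weighted sum of squares in which the ith residual is scaled by 1/yi is ordinary weighted linear least squares. The sketch below (illustrative Python, arbitrary seed; NumPy's polyfit accepts exactly such residual weights) repeats the fifty-set comparison:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.arange(1, 11)
a_unw, a_w, b_unw, b_w = [], [], [], []
for _ in range(50):
    y_true = -2 + 4 * x
    y = y_true * (1 + 0.1 * rng.normal(0, 1, 10))  # heteroscedastic: sigma_i = 0.1*y_i
    b1, a1 = np.polyfit(x, y, 1)                   # unweighted fit of y = a + bx
    b2, a2 = np.polyfit(x, y, 1, w=1.0 / y)        # weights 1/y_i scale the residuals
    a_unw.append(a1); a_w.append(a2)
    b_unw.append(b1); b_w.append(b2)

print(np.std(a_unw, ddof=1), np.std(a_w, ddof=1))  # scatter in intercept estimates
print(np.std(b_unw, ddof=1), np.std(b_w, ddof=1))  # scatter in slope estimates
```

The weighted standard deviations come out markedly smaller, which is the behaviour shown in figures 11.11 and 11.12.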
Figure 11.11: Distribution of the parameter estimate, a, when unweighted and
weighted fitting is carried out on fifty data sets.
Figure 11.12: Distribution of the parameter estimate, b, when unweighted and
weighted fitting is carried out on fifty data sets.
Exercise 9
a) Use equation 11.8 to generate y values for x = 1, 2, 3 etc. up to x = 10.
b) Add normally distributed homoscedastic noise with mean of zero and standard
deviation of unity to the y values generated in part a).
c) Fit equation 11.10 to the data using both unweighted fitting and weighted least
squares. For the weighted fit, assume that the relationship for the standard
deviation in the y values given by equation 11.11 is valid.
d) Repeat parts b) and c) at least 40 times. Construct histograms of the scatter in both a and b for both weighted and unweighted fitting.
e) Calculate the mean and standard deviation of a and b that you obtained in
part d).
f) Is unweighted fitting by least squares demonstrably better than weighted
fitting in this example?
Section 12: Review
This document has focused primarily on fitting equations to data using the technique of nonlinear least squares. In particular, it has considered the Solver tool packaged with Excel and how it may be employed for nonlinear least squares fitting. For completeness, some discussion of linear least squares has been included, along with the circumstances under which linear least squares is no longer viable.
A most important aspect of fitting equations to data is to be able to determine standard
errors in the estimates made of any parameters appearing in an equation. Solver does
not provide standard errors, so this document describes the means by which standard
errors can be calculated using an Excel spreadsheet. An advantage of employing
Excel is that some aspects of fitting by nonlinear least squares which are normally
hidden from view when using a conventional computer based statistics package can
be made visible with Excel. I hope this leads to a deeper appreciation of nonlinear
least squares than simply entering numbers into a stats package and waiting for the
fruits of the analysis to emerge.
Some general issues relating to fitting by nonlinear least squares have been
discussed, such as the existence of local minima in SSR and means by which good
starting values may be established in advance of fitting.
We have also considered briefly how equations fitted to data can be compared in
order to determine which equation is the 'better' in a statistical sense, while at the
same time emphasising that any equation fitted to data should be supported on a
foundation of sound physical and/or chemical principles.
This document is not yet complete. I would like to include something in the future
about identifying and treating outliers as well as points of ‘high leverage’.
Acknowledgements
I would like to express my sincere thanks to Dr Mary Mulholland of the Faculty of
Science at UTS and Dr Paul Swift (formerly of the same Faculty) for suggesting
examples from chemistry and physics that may be usefully treated using nonlinear
least squares. From Luton University I acknowledge the assistance and
encouragement of Professor David Rawson, Dr Barry Haggert and Dr John Dilleen. I
thank my good friends John Harbottle and Peter Rowley for their excellent hospitality
while I was in the UK in 2002 preparing some of this material.
I also acknowledge a timely communication from Dr Marcel Maeder of Newcastle
University (New South Wales) who queried the omission of Excel’s Solver from my
book. I am grateful to Dr Maeder, as his query provided the spur to create this
document.
Finally, I thank the following organisations where parts of this document were
prepared: University of Technology, Sydney, University of Paisley, UK, University of
Luton, UK, and CSIRO, Lindfield, Australia.
Problems
1.
Standard addition analysis is routinely used to establish the composition of a sample. In order to establish the concentration of Fe³⁺ in water, solutions containing known concentrations of Fe³⁺ were added to water samples90. The absorbance of each solution, y, was determined for each concentration of added solution, x. The absorbance/concentration data are shown in table P1.

Concentration (ppm), x    Absorbance (arbitrary units), y
0        0.240
5.55     0.437
11.10    0.621
16.65    0.809
22.20    1.009
Table P1: Data for problem 1.
The relationship between absorbance, y, and concentration, x, may be written,

y = B(x − xC) (P1)

where B is the slope of the line of y versus x, and xC is the intercept on the x axis, which represents the concentration of Fe³⁺ in the water before additions are made.
Use nonlinear least squares to fit equation P1 to the data in table P1. Determine,
a) best estimates of B and xC [0.03441 ppm⁻¹, 7.009 ppm]
b) standard errors in B and xC [0.000277 ppm⁻¹, 0.159 ppm].
2.
Another way to analyse the data in table P1 is to write,

y = A + Bx (P2)

Here A is the intercept on the y axis at x = 0, and B is the slope. The intercept on the x axis, xC (found by setting y = 0 in equation P2), is given by,

xC = −A/B (P3)

Use linear least squares to fit equation P2 to the data in table P1. Determine,
a) best estimates of A, B and xC [0.2412, 0.03441 ppm⁻¹, 7.009 ppm]
b) standard errors in the best estimates of A, B and xC [0.00376, 0.000277 ppm⁻¹, 0.159 ppm].

90 This problem is adapted from Skoog and Leary (1992).
Note that the errors in the best estimates of slope and intercept in equation P2 are correlated, so the normal 'propagation of uncertainties' method is not valid when calculating xC (see section 8.1).
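The role of the covariance term can be made explicit. The fragment below (an illustrative Python calculation, not part of the original problem) fits equation P2 to the table P1 data with the standard linear least squares formulas, forms the covariance of the estimates of A and B, and propagates it into the standard error of xC = −A/B; the magnitudes reproduce the 7.009 ppm and 0.159 ppm quoted above:

```python
import numpy as np

x = np.array([0.0, 5.55, 11.10, 16.65, 22.20])     # concentration (ppm)
y = np.array([0.240, 0.437, 0.621, 0.809, 1.009])  # absorbance

n = x.size
Sxx = np.sum((x - x.mean())**2)
B = np.sum((x - x.mean()) * (y - y.mean())) / Sxx  # slope
A = y.mean() - B * x.mean()                        # intercept
resid = y - (A + B * x)
s2 = np.sum(resid**2) / (n - 2)                    # residual variance

var_A = s2 * np.sum(x**2) / (n * Sxx)
var_B = s2 / Sxx
cov_AB = -s2 * x.mean() / Sxx                      # the A and B estimates are correlated

xC = -A / B
# full propagation, including the covariance term:
var_xC = xC**2 * (var_A / A**2 + var_B / B**2 - 2 * cov_AB / (A * B))
print(xC, np.sqrt(var_xC))   # approx -7.01 and 0.159
```

Dropping the cov_AB term changes the standard error of xC noticeably, which is the point made in the note above.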
3.
In a study of first order kinetics, the volume of titrant required, V(t), to reach the end point of a reaction is measured as a function of time, t. The following data were obtained91.

t (s)    V(t) (ml)
145      4.0
314      7.6
638      12.2
901      15.6
1228     18.6
1691     21.6
2163     24.0
2464     24.8
Table P2: Data for problem 3.

The relationship between V and t can be written,

V(t) = V∞ − (V∞ − V0)exp(−kt) (P4)

where k is the rate constant, and V∞ and V0 are also constants.
Using nonlinear least squares, fit equation P4 to the data in table P2. Determine,
a) best estimates of V∞, V0 and k [28.22 ml, 0.9906 ml, 0.0008469 s⁻¹]
b) standard errors in the estimates of V∞, V0 and k [0.377 ml, 0.216 ml, 3.00 × 10⁻⁵ s⁻¹]

91 These data were taken from Denton (2000).
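Solver is one way to minimise SSR here; any nonlinear least squares routine will do. As an illustration (not part of the original problem), the sketch below fits equation P4 with a bare-bones Gauss–Newton iteration in Python, which is essentially what such routines automate, and then forms standard errors from s²(JᵀJ)⁻¹:

```python
import numpy as np

t = np.array([145.0, 314, 638, 901, 1228, 1691, 2163, 2464])
V = np.array([4.0, 7.6, 12.2, 15.6, 18.6, 21.6, 24.0, 24.8])

def model(p, t):
    Vinf, V0, k = p
    return Vinf - (Vinf - V0) * np.exp(-k * t)   # equation P4

p = np.array([25.0, 1.0, 0.001])                 # rough starting values
for _ in range(200):                             # Gauss-Newton iteration
    Vinf, V0, k = p
    e = np.exp(-k * t)
    # Jacobian of the model with respect to Vinf, V0 and k:
    J = np.column_stack((1.0 - e, e, (Vinf - V0) * t * e))
    step, *_ = np.linalg.lstsq(J, V - model(p, t), rcond=None)
    p = p + step
    if np.max(np.abs(step)) < 1e-12:
        break

s2 = np.sum((V - model(p, t))**2) / (t.size - 3)  # residual variance
se = np.sqrt(np.diag(s2 * np.linalg.inv(J.T @ J)))
print(p)    # best estimates of Vinf, V0 and k
print(se)   # standard errors in the estimates
```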
4.
Table P3 contains data obtained from a simulation of a chemical reaction in which noise of constant variance has been added to the data92.

Time, t (s)    Concentration, C (mol/l)
0 0.01000
20000 0.00862
40000 0.00780
60000 0.00687
80000 0.00648
100000 0.00595
120000 0.00536
140000 0.00507
160000 0.00517
180000 0.00450
200000 0.00482
220000 0.00414
240000 0.00359
260000 0.00354
280000 0.00324
300000 0.00333
320000 0.00309
340000 0.00285
360000 0.00349
380000 0.00273
400000 0.00271
Table P3: Simulated data taken from Zielinski and Allendoerfer (1997).
Assuming that the relationship between concentration, C, and time, t, can be written93,

C = C0/(1 + kC0t) (P5)

where C0 is the concentration at t = 0 and k is the second order rate constant.
Fit equation P5 to the data in table P3 to obtain best estimates for C0 and k and standard errors in the best estimates. [0.009852 mol/l, 0.0006622 l/mol·s, 0.00167 mol/l, 1.98 × 10⁻⁵ l/mol·s]
5.
Table P4 gives the temperature dependence of the energy gap of high purity crystalline silicon. The variation of energy gap with temperature can be represented by the equation,

Eg(T) = Eg(0) − αT²/(β + T) (P6)

where Eg(0) is the energy gap at absolute zero and α and β are constants.

T (K)    Eg(T) (eV)
20       1.1696
40       1.1686
60       1.1675
80       1.1657
100      1.1639
120      1.1608
140      1.1579
160      1.1546
180      1.1513
200      1.1474
220      1.1436
240      1.1392
260      1.1346
280      1.1294
300      1.1247
320      1.1196
340      1.1141
360      1.1087
380      1.1028
400      1.0970
420      1.0908
440      1.0849
460      1.0786
480      1.0723
500      1.0660
520      1.0595
Table P4: Energy gap versus temperature data.

Fit equation P6 to the data in table P4 to find best estimates of Eg(0), α and β as well as standard errors in the estimates. Use starting values 1.1, 0.0004 and 600 respectively for the estimates of Eg(0), α and β. [1.170 eV, 0.0004832 eV/K, 662 K, 7.8 × 10⁻⁵ eV, 4.7 × 10⁻⁶ eV/K, 11 K]

92 See Zielinski and Allendoerfer (1997).
93 The assumption is made that a second order kinetics model can represent the reaction.
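As in problem 3, the fit can be reproduced without Solver; the Python sketch below (illustrative only, using the same Gauss–Newton idea and the starting values suggested above) fits equation P6 to the table P4 data:

```python
import numpy as np

T = np.arange(20.0, 521.0, 20.0)
Eg = np.array([1.1696, 1.1686, 1.1675, 1.1657, 1.1639, 1.1608, 1.1579,
               1.1546, 1.1513, 1.1474, 1.1436, 1.1392, 1.1346, 1.1294,
               1.1247, 1.1196, 1.1141, 1.1087, 1.1028, 1.0970, 1.0908,
               1.0849, 1.0786, 1.0723, 1.0660, 1.0595])

p = np.array([1.1, 0.0004, 600.0])        # starting values from the problem statement
for _ in range(200):                      # Gauss-Newton iteration
    Eg0, alpha, beta = p
    f = Eg0 - alpha * T**2 / (beta + T)   # equation P6
    # Jacobian with respect to Eg(0), alpha and beta:
    J = np.column_stack((np.ones_like(T),
                         -T**2 / (beta + T),
                         alpha * T**2 / (beta + T)**2))
    step, *_ = np.linalg.lstsq(J, Eg - f, rcond=None)
    p = p + step
    if np.max(np.abs(step / p)) < 1e-12:
        break

print(p)   # approx [1.170, 4.83e-04, 662]
```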
6.
In an experiment to study phytoestrogens in Soya beans, an HPLC system was
calibrated using known concentrations of the phytoestrogen, biochanin. Table P5
contains data of the area under the chromatograph absorption peak as a function of
biochanin concentration.
Conc., x (mg/l)    Area, y (arbitrary units)
0.158 0.121342
0.158 0.121109
0.315 0.403550
0.315 0.415226
0.315 0.399678
0.631 1.839583
0.631 1.835114
0.631 1.835915
1.261 3.840554
1.261 3.846146
1.261 3.825760
2.522 8.523561
2.522 8.539992
2.522 8.485319
5.045 16.80701
5.045 16.69860
5.045 16.68172
10.09 34.06871
10.09 33.91678
10.09 33.70727
Table P5: HPLC data for biochanin.
A comparison is to be made of two equations fitted to the data in table P5. The equations are,

y = A + Bx (P7)

and

y = A + Bx^C (P8)

Assuming an unweighted fit is appropriate, fit equations P7 and P8 to the data in table P5.
For each equation fitted to the data, calculate the,
a) best estimates of the parameters [−0.4021, 3.404 (mg/l)⁻¹; −0.5650, 3.581 (mg/l)^−0.979, 0.9790]
b) standard errors in the estimates [0.0575, 0.0127 (mg/l)⁻¹; 0.0885, 0.0790 (mg/l)^−0.979, 0.00903]
c) sum of squares of residuals (SSR) [0.6652, 0.5074]
d) Akaike's information criterion [−4.15, −7.57]
e) residuals. Draw a graph of residuals versus concentration.
Which equation better fits the data?
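The quoted AIC values follow directly from the SSR values in c) if the criterion is computed as N ln SSR + 2P, where N = 20 data points and P is the number of fitted parameters. A quick illustrative Python check:

```python
import math

def aic(ssr, n, p):
    # AIC variant consistent with the answers quoted above:
    # N*ln(SSR) + 2*(number of fitted parameters)
    return n * math.log(ssr) + 2 * p

print(round(aic(0.6652, 20, 2), 2))   # equation P7 (two parameters)
print(round(aic(0.5074, 20, 3), 2))   # equation P8 (three parameters)
```

Equation P8 gives the lower (more negative) AIC here, so the extra parameter is justified by the improvement in SSR.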
7.
The relationship between critical current, Ic, and temperature, T, for a high temperature superconductor can be written,

Ic = 1.74A(1 − T/Tc)^½ tanh[0.435B(Tc/T)(1 − T/Tc)^½] (P9)

where A and B are constants and Tc is the critical temperature of the superconductor.
For a high temperature superconductor with a Tc equal to 90.1 K, the following data for critical current and temperature were obtained:
T (K) I (mA)
5 5212
10 5373
15 5203
20 4987
25 4686
30 4594
35 4245
40 4091
45 3861
50 3785
55 3533
60 3199
65 2903
70 2611
75 2279
80 1831
85 1098
90 29
Table P6: Critical current versus temperature data for a high temperature
superconductor with critical temperature of 90.1 K.
Fit equation P9 to the data in table P6 to obtain best estimates for the parameters A and B, and standard errors in the best estimates. [3199 mA, 11.7, 17.0 mA, 1.79]
8.
A sensor developed to measure the electrical conductivity of salt solutions is calibrated using solutions of sodium chloride of known conductivity, σ. Table P7 contains data of the signal output, V, of the sensor as a function of conductivity.

σ (mS/cm)    V (volts)
1.504 6.77
2.370 7.24
4.088 7.61
7.465 7.92
10.764 8.06
13.987 8.14
14.781 8.15
17.132 8.19
24.658 8.27
31.700 8.31
38.256 8.34
Table P7: Signal output from sensor as a function of electrical conductivity.
Assume that the relationship between V and σ is,

V = Vs + k[1 − exp(σ^−α)] (P10)

where Vs, k and α are constants.
Use unweighted nonlinear least squares to determine best estimates of the constants and standard errors in the best estimates. [8.689 V, 1.460 V, 0.4281, 0.0190 V, 0.00740 V, 0.0108]
9.
In a study of the propagation of an electromagnetic wave through a porous solid, the variation of relative permittivity, εr, of the solid was measured as a function of moisture content, νw (expressed as a fraction). Table P8 contains the data obtained in the experiment94.

νw       εr
0.128 8.52
0.116 7.95
0.100 7.65
0.095 7.55
0.077 7.08
0.065 6.82
0.056 6.55
0.047 6.42
0.035 5.97
0.031 5.81
0.025 5.69
0.022 5.55
0.017 5.38
0.013 5.26
0.004 5.08
Table P8: Variation of relative permittivity with moisture content.
Assume the relationship between εr and νw can be written,

εr = νw²(√εw − √εm)² + 2νw√εm(√εw − √εm) + εm (P11)

where,
εw is the relative permittivity of water
εm is the relative permittivity of the (dry) porous material.
Use (unweighted) nonlinear least squares to fit equation P11 to the data in table P8 and hence obtain best estimates of εw and εm and standard errors in the best estimates. [55.44, 1.83, 5.067, 0.043]

94 Francois Malan 2002 (private communication).
10.
Unweighted least squares requires the minimisation of SSR given by,

SSR = Σ(yi − ŷi)² (P12)

A technique sometimes adopted when optimising parameters in optical design situations is to minimise S4R, where,

S4R = Σ(yi − ŷi)⁴ (P13)

Perform a Monte Carlo simulation to compare parameter estimates obtained when equations P12 and P13 are used to fit an equation of the form y = a + bx to simulated data. More specifically,
a) Use the function y = 2.1 − 0.4x to generate y values for x = 1, 2, 3 etc. up to x = 20.
b) Add normally distributed noise of mean equal to zero and standard deviation of 0.5 to the values generated in part a).
c) Find best estimates of a and b by minimising SSR and S4R as given by equations P12 and P13. (Suggestion: Solver may be used to minimise SSR and S4R.)
d) Repeat steps b) and c) until 50 sets of parameter estimates have been obtained using equations P12 and P13.
e) Is there any significant difference between the parameter estimates obtained when minimising SSR and S4R?
f) Is there any significant difference between the variance in the parameter estimates when minimising SSR and S4R?
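The comparison in this problem can be scripted; the sketch below is one illustrative Python implementation (NumPy, arbitrary seed), using ordinary least squares for P12 and, in place of Solver, a Newton iteration for P13. Since the sum of fourth powers of residuals is convex in a and b, Newton's method started from the least squares solution behaves well:

```python
import numpy as np

rng = np.random.default_rng(11)
x = np.arange(1.0, 21.0)
X = np.column_stack((np.ones_like(x), x))   # design matrix for y = a + bx

def fit_ssr(y):
    # minimise SSR = sum(r^2): ordinary least squares
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fit_s4r(y):
    # minimise S4R = sum(r^4) by Newton's method on a convex objective
    beta = fit_ssr(y)                       # OLS solution as starting point
    for _ in range(100):
        r = y - X @ beta
        grad = -4 * X.T @ r**3
        hess = 12 * (X * (r**2)[:, None]).T @ X
        step = np.linalg.solve(hess, -grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

est2, est4 = [], []
for _ in range(50):
    y = 2.1 - 0.4 * x + rng.normal(0, 0.5, x.size)
    est2.append(fit_ssr(y))
    est4.append(fit_s4r(y))
est2, est4 = np.array(est2), np.array(est4)

print(est2.mean(axis=0), est4.mean(axis=0))           # both near (2.1, -0.4)
print(est2.var(axis=0, ddof=1), est4.var(axis=0, ddof=1))
```

Comparing the two sets of variances (e.g. with an F test, as in exercise 8) answers part f).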
11.
In section 10.1, the relationship between free fall acceleration, g(h), and height, h, was written:

g(h) = g0/(1 + h/R)² (P14)

To study the validity of equation P14, low noise data of free fall acceleration are gathered over a range of values of height, h.
For h values small compared to the radius of the Earth, R, the acceleration will decrease almost linearly with height. Applying the binomial expansion to equation P14, we obtain the first order approximation,

g(h) = g0(1 − 2h/R) (P15)

Contained in table P9 are data of the variation of acceleration with height above the Earth's surface.

h (km)    g (m/s²)
1000 7.33
2000 5.68
3000 4.53
4000 3.70
5000 3.08
6000 2.60
7000 2.23
8000 1.93
9000 1.69
10000 1.49
Table P9: Variation of acceleration due to gravity with height.
a) Use least squares to fit both equations P14 and P15 to the data in table P9 and determine best estimates for g0 and R.
b) Calculate standard errors in the best estimates.
c) Calculate and plot the residuals for each equation fitted to the data in table P9.
d) Is equation P15 a reasonable approximation to equation P14 over the range of
h values in table P9?
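For part d), it helps to see directly how equation P15 degrades as h grows. The snippet below uses illustrative values g0 = 9.81 m/s² and R = 6371 km (close to what a fit to table P9 returns, since the first tabulated point reproduces g = 7.33 m/s² at h = 1000 km) to evaluate both equations:

```python
g0, R = 9.81, 6371.0   # illustrative values (m/s^2 and km), not fitted results
for h in (1000.0, 5000.0, 10000.0):
    exact = g0 / (1 + h / R)**2      # equation P14
    approx = g0 * (1 - 2 * h / R)    # equation P15, first order in h/R
    print(h, round(exact, 2), round(approx, 2))
```

At h = 10000 km the first order expression even goes negative, so P15 can only be a reasonable approximation for h much smaller than R.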
12.
The electrical resistance, r, of a particular material at a temperature, T, may be described by,

r = A + BT (P16)

or

r = α + βT + γT² (P17)

where A, B, α, β, and γ are constants.
Table P10 shows the variation of the resistance of an alloy with temperature.
Table P10: Resistance versus temperature data for an alloy
r (Ω) 19.5 18.4 20.2 20.1 20.9 20.8 21.2 21.8 21.9 23.6 23.2 23.9 23.2 24.1 24.2 26.3 25.5 26.1 26.3 27.1 28.0
T (K) 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350
Using (unweighted) linear least squares, fit both equations P16 and P17 to the data in table P10 and determine for each equation,
a) estimates for the parameters [12.41 Ω and 4.30 × 10⁻² Ω/K; 14.0 Ω, 3.0 × 10⁻² Ω/K and 2.7 × 10⁻⁵ Ω/K²]
b) the standard error in each estimate [0.49 Ω and 0.19 × 10⁻² Ω/K; 2.2 Ω, 1.8 × 10⁻² Ω/K and 3.6 × 10⁻⁵ Ω/K²]
c) the standard deviation, σ, in each y value [0.5304 Ω, 0.5368 Ω]
d) the sum of squares of the residuals, SSR [5.344 Ω², 5.186 Ω²]
e) Akaike's information criterion, AIC [39.20, 40.57]
References
Akaike H A New Look at the Statistical Model Identification (1974) IEEE Transactions on Automatic Control 19 716–723.
Al-Subaihi A A (2002) Variable Selection in Multivariate Regression using SAS/IML http://www.jstatsoft.org/v07/i12/mv.pdf
Bard Y Nonlinear Parameter Estimation (1974) Academic Press, London.
Bates D M and Watts D G Nonlinear Regression Analysis and its Applications (1988) Wiley, New York.
Bevington P R and Robinson D K Data Reduction and Error Analysis for the Physical Sciences (1992) McGraw-Hill, New York.
Bube R H Photoconductivity of Solids (1960) Wiley, New York.
Cleveland W S The Elements of Graphing Data (1994) Hobart Press, New Jersey.
Conway D G and Ragsdale C T Modeling Optimization Problems in the Unstructured World of Spreadsheets (1997) Omega, Int. J. Mgmt. Sci. 25 313–322.
Demas J N Excited State Lifetime Measurements (1983) Academic Press, New York.
Denton P Analysis of First Order Kinetics Using Microsoft Excel Solver (2000) Journal of Chemical Education 77 1524–1525.
Dietrich C R Uncertainty, Calibration and Probability: Statistics of Scientific and Industrial Measurement 2nd edition (1991) Adam Hilger, Bristol.
Frenkel R D Statistical Background to the ISO 'Guide to the Expression of Uncertainty in Measurement' (2002) CSIRO, Sydney p 43.
Fylstra D, Lasdon L, Watson J and Waren A Design and Use of Microsoft Excel Solver (1998) Interfaces 28 29–55.
Karlovsky J Simple Method for Calculating the Tunneling Current in an Esaki Diode (1962) Phys. Rev. 127 419.
Katz E, Ogan K L and Scott R P W Peak Dispersion and Mobile Phase Velocity in Liquid Chromatography: The Pertinent Relationship for Porous Silica (1983) J. Chromatogr. 270 51–75.
Kennedy G J and Knox J H Performance of Packings in High Performance Liquid Chromatography. 1. Porous and Surface Layer Supports (1972) J. Chromatogr. Sci. 10 549–556.
Kirkup L Data Analysis with Excel: An Introduction for Physical Scientists (2002) Cambridge University Press, Cambridge.
Kirkup L and Cherry I Temperature Dependence of Photoconductive Decay in Sintered Cadmium Sulphide (1988) Eur. J. Phys. 9 64–68.
Kirkup L and Sutherland J Curve Stripping and Non-Linear Fitting of Polyexponential Functions to Data using a Microcomputer (1988) Comp. in Phys. 2 64–68.
Moody H W The Evaluation of the Parameters in the Van Deemter Equation (1982) Journal of Chemical Education 59 290–291.
Neter J, Kutner M J, Nachtsheim C J and Wasserman W Applied Linear Regression Models (1996) Times Mirror Higher Education Group Inc.
Nielsen-Kudsk F A Microcomputer Program in Basic for Iterative, Non-Linear Data Fitting to Pharmacokinetic Functions (1983) Int. J. Bio-Med. Comput. 14 95–107.
Nocedal J Numerical Optimization (1999) Springer, New York.
Perry A A Modified Conjugate Gradient Algorithm (1978) Operations Research 26 1073–1078.
Safizadeh M and Signorile R Optimization of Simulation via Quasi-Newton Methods (1994) ORSA J. Comput. 6 398–408.
Salter C Error Analysis Using the Variance–Covariance Matrix (2000) Journal of Chemical Education 77 1239–1243.
Skoog D A and Leary J J Principles of Instrumental Analysis 4th edition (1992) Harcourt Brace, Fort Worth.
Smith S and Lasdon L Solving Large Sparse Nonlinear Programs Using GRG (1992) Journal on Computing 4 2–16.
Snyder L R, Kirkland J J and Glajch J L Practical HPLC Method Development 2nd edition (1997) Wiley, New York.
Walker J S Physics (2002) Prentice Hall, New Jersey.
Walkenbach J Excel 2002 Power Programming with VBA (2001) M&T Books, New York.
Walsh S and Diamond D Nonlinear Curve Fitting Using Microsoft Excel Solver (1995) Talanta 42 561–572.
Williams I P Matrices for Scientists (1972) Hutchinson University Library, London.
Wolsey L A Integer Programming (1998) Wiley, New York.
Zielinski T J and Allendoerfer R D Least Squares Fitting of Nonlinear Data in the Undergraduate Laboratory (1997) Journal of Chemical Education 74 1001–1007.
Scientists in all areas of the physical sciences search for defensible models that describe the way nature works. As a part of that search they often investigate the relationship between physical variables. As examples, they might want to know how the, • • • • electrical resistance of a superconductor depends on the temperature of the superconductor. width of an absorption peak in liquid chromatography depends on the flow of the mobile phase through a packed column. electrical permittivity of a solid depends on the moisture content in the solid. output voltage from a conductivity sensor depends on the electrical conductivity of the liquid in which the sensor is immersed.
Section 1: Introduction
A model that explains or describes the relationship between physical variables may be devised from first principles, or it may represent a new development of an established model. Whatever the situation, once a model has been devised, it is prudent to compare it to ‘real’ data obtained by experiment. One reason for doing this is to establish whether predictions of the model are consistent with experimental data. Consider a specific example in which nuclear radiation passes through material of thickness, x. The relationship between the intensity, I, of the radiation and x can be written,
I = Io exp(− µx ) + B
(1.1)
Io is the intensity recorded in the absence of the material when the background radiation is negligible, µ is the absorption coefficient of the material and B is the background intensity. The appropriateness (or otherwise) of equation 1.1 may be investigated for a particular material by considering radiation intensity versus material thickness data, as shown in figure 1.1.
5
1200 1000 Intensity (counts) 800 600 400 200 0 0.0 0.5 1.0 Thickness (cm) 1.5 2.0
Figure 1.1: Intensity versus thickness data. If equation 1.1 fairly describes the relationship between intensity and thickness, we should be able to find values for Io, µ and B such that the line generated by equation 1.1, when x varies between x = 0.0 and x = 2.0, ‘fits’ the data shown in figure 1.1 (i.e. passes close to the data points). We could begin by making an intelligent guess at values for Io, µ and B. Figure 1.2 shows the outcome of one attempt at guessing values for Io, µ and B.
Line generated using equation 1.1, where, Io = 800 µ=1 B = 10
1200 1000 Intensity (counts) 800 600 400 200 0 0.0 0.5
1.0 Thickness (cm)
1.5
2.0
Figure 1.2: Line drawn through intensity versus thickness data using equation 1.1. It would have been fortuitous had the guesses for Io, µ and B, given in figure 1.2 produced a line that passed close to the data. We could try other values for Io, µ and B and through a process of ‘trial and error’ improve the fit of the line to the data. However, it must be admitted that this is an inefficient way to fit any equation to data
6
the parameters in equation 1.and that guesswork must give way to a better approach.1 to a particular material is likely to have been studied by other workers. In particular. For example. A graph of absorption.1 may be compared to that reported by others.1 have physical meaning. 1. 7 . There are situations in which an equation is fitted to data for the purpose of calibration and no attempt is made to relate parameters in the equation to physical constants. However. a compelling reason for fitting an equation in the physical sciences is that it provides for an insightful interpretation of physical or chemical processes or phenomena. the concentration of a particular chemical species might be determined using Atomic Absorption Spectroscopy (AAS). For example. y. is plotted. it is possible to determine species concentration from measurements made of absorption. x. the fitting of an equation can assist in validating or refuting a theoretical model and allow for the determination of physically meaningful parameters1. The next step is to fit an equation to the data. Using the equation.1 is a quantity that characterises radiation absorption by a material. Therefore a value for µ as determined through analysing the data in figure 1. versus concentration. This is the main consideration of this document. As an example.1 Reasons for fitting equations to data It is possible to fit almost any equation to any data. An instrument is calibrated by measuring the absorption of known concentrations of the species. The applicability of equation 1. 1 This issue is taken up again in section 10. µ in equation 1.
Section 2: Linear Least squares

Often in an experiment there is a known, expected or proposed relationship between variables measured during the experiment. In perhaps the most common situation, the relationship between the dependent (or response) variable, y, and the independent (or predictor) variable, x, may be expressed as,

y = a + bx        (2.1)

Equation 2.1 is the equation of a straight line with intercept, a, and slope, b. In principle, assuming the relationship between x and y is linear, we should be able to find the intercept and slope by drawing a straight line through the points. In practice, the intercept and slope cannot be known exactly, as this would require that we eliminate (or correct for) all sources of random and systematic error in the data. This is not possible. If it were possible to eliminate all sources of error, we could write the 'exact' relationship between x and y as,

y = α + βx        (2.2)

where α is the 'true intercept' and β is the 'true slope'. α and β are often referred to as parameters2 and, through applying techniques based on sound statistical principles, it is possible to establish best estimates of those parameters. We will represent the best estimates of α and β by the symbols a and b respectively3.

A powerful and widely used technique for establishing best estimates of parameters4 is that of least squares. The technique5 is versatile and allows parameters to be estimated when the relationship between x and y is more complex than that given by equation 2.1. For example,

y = a + bx + cz  (here both x and z are independent variables)        (2.3)

y = a + (b + cx)/x        (2.4)

y = a + b[1 − exp(−cx)]        (2.5)

The parameters a, b and c in equations 2.3 to 2.5 may be determined using the technique of least squares.

In this discussion of least squares, the following assumptions are made:

1) There are no errors in the x values.
2) Errors in the y values are normally distributed with a mean of zero and a constant variance. Constant variance errors are sometimes referred to as homoscedastic errors.
3) Errors in the y values are uncorrelated, so that, for example, the error in the ith y value is not correlated to the error in the (i+1)th y value.

The ith observed y value is written as yi and the ith value of x as xi. The ith predicted y value found using the equation of the line is written as ŷi6, such that,

ŷi = a + bxi        (2.6)

The least squares technique of fitting equations to data requires the calculation of Σ(yi − ŷi)², where n is the number of data points. The summation is written7,

SSR = Σ (yi − ŷi)²   (summed from i = 1 to i = n)        (2.7)

SSR is the Sum of Squares of the Residuals8. The next stage is to find the values of a and b which minimise SSR in equation 2.7. This is the key step in any least squares analysis, as the values of a and b that minimise SSR are regarded as the best estimates obtainable of the parameters in an equation9. Best estimates could be found by 'trial and error', or by a systematic numerical search using a computer. When a straight line is fitted to data, an equation for the best line can be found analytically by partially differentiating SSR with respect to a and b in turn, then setting the resulting equations equal to zero. The simultaneous equations obtained by this process are solved for a and b to give,

a = (Σxi² Σyi − Σxi Σxiyi) / (n Σxi² − (Σxi)²)        (2.8)

and,

b = (n Σxiyi − Σxi Σyi) / (n Σxi² − (Σxi)²)        (2.9)

An elegant approach to determining a and b employs matrices. An added advantage of the matrix approach is that it may be conveniently extended to situations in which more complex equations are fitted to experimental data.

2 Sometimes referred to as population parameters or regression coefficients.
3 In some texts, best estimates of α and β are written as α̂ and β̂ respectively.
4 Refer to chapters 6 and 7 of Kirkup (2002) for more details.
5 The technique is also widely referred to as regression.
6 ŷi is sometimes referred to as 'y hat'.
7 Strictly, we sum (yi − ŷi)² from i = 1 to i = n. In future we assume that all summations are carried out between i = 1 and i = n, and therefore we omit the limits of the summations.
8 yi − ŷi is referred to as the ith residual. Equation 2.7 applies to fitting by 'unweighted' least squares. Weighted least squares is considered in section 7.
9 The process by which estimates are varied until some condition (such as the minimisation of SSR) is satisfied is often called 'optimisation'.
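The closed-form expressions of equations 2.8 and 2.9 are straightforward to evaluate directly. As an illustrative sketch (in Python rather than the Excel used by this document; the function name fit_line is hypothetical), applied to the small data set of table 2.1 used in Exercise 1:

```python
def fit_line(x, y):
    """Unweighted least squares estimates of intercept a and slope b
    (equations 2.8 and 2.9)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx * sx           # common denominator of 2.8 and 2.9
    a = (sxx * sy - sx * sxy) / d   # intercept (equation 2.8)
    b = (n * sxy - sx * sy) / d     # slope (equation 2.9)
    return a, b

a, b = fit_line([2, 4, 6, 8, 10], [70, 63, 49, 42, 31])
print(round(a, 2), round(b, 2))     # 80.7 -4.95
```

The same numbers are returned by Excel's LINEST() function or the Regression tool discussed in section 3.1.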
The equations to be solved for a and b can be expressed in matrix form as:

[ n     Σxi  ] [ a ]   [ Σyi   ]
[ Σxi   Σxi² ] [ b ] = [ Σxiyi ]        (2.10)

Equation 2.10 can be written concisely as,

AB = P        (2.11)

where,

A = [ n     Σxi  ]      B = [ a ]      P = [ Σyi   ]
    [ Σxi   Σxi² ]          [ b ]          [ Σxiyi ]

To determine the elements a and b of the matrix B, equation 2.11 is manipulated to give,

B = A⁻¹P        (2.12)

where A⁻¹ is the inverse matrix10 of the matrix A. A⁻¹ is used in the calculation of the standard errors in parameter estimates and is sometimes referred to as the 'error matrix'.

Exercise 1
Table 2.1 contains xy data which are shown plotted in figure 2.1.

x     y
2     70
4     63
6     49
8     42
10    31

Table 2.1: xy data.

Figure 2.1: Linearly related xy data.

Using the data in table 2.1,

10 Matrix inversion and matrix multiplication are onerous to perform manually, especially if matrices are large. The built in matrix functions in Excel are well suited to estimating parameters in linear least squares problems.
i) find best estimates for the intercept, a, and the slope, b, of a straight line fitted to the data using linear least squares; [80.7, −4.95]
ii) draw the line of best fit through the points;
iii) calculate the sum of squares of the residuals, SSR. [9.9]

2.1 Standard errors in best estimates

In addition to the best estimates, a and b, the standard errors in a and b are required, as this allows confidence intervals11 to be quoted for the parameters α and β. Calculations of a and b depend on the measured y values. As a consequence, uncertainties in the y values contribute to the uncertainties in a and b. In order to calculate uncertainties in a and b, the usual starting point is to determine the standard errors in a and b, written as σa and σb respectively. σa and σb are given by12,

σa = σ(Σxi²/Δ)^1/2        (2.13)

σb = σ(n/Δ)^1/2        (2.14)

where

Δ = n Σxi² − (Σxi)²        (2.15)

and

σ ≈ [ (1/(n − 2)) Σ(yi − ŷi)² ]^1/2        (2.16)

Alternatively, σa and σb may be determined using matrices13. The covariance matrix14, V, contains elements which are the variances (as well as the covariances) of the best estimates of a and b. V may be written,

V = σ²A⁻¹        (2.17)

σ² can be found using equation 2.16. Standard errors in a and b are written explicitly as,

σa = σ(A⁻¹11)^1/2        (2.18)

σb = σ(A⁻¹22)^1/2        (2.19)

A⁻¹11 and A⁻¹22 are diagonal elements of the A⁻¹ matrix15.

Exercise 2
Using matrices, or otherwise, determine the standard errors in the intercept and slope of the best straight line through the data given in table 2.1. [1.9, 0.29]

11 See Kirkup (2002) p226.
12 See Bevington and Robinson (1992).
13 See chapter 5 of Neter et al. (1996).
14 The covariance matrix is considered in more detail in section 9.
15 See Williams (1972).
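As a cross-check on exercises 1 and 2, equations 2.13 to 2.16 can be evaluated directly. The following is an illustrative Python sketch (the document itself works in Excel; the function name line_fit_errors is hypothetical):

```python
import math

def line_fit_errors(x, y):
    """Standard errors sigma_a and sigma_b from equations 2.13-2.16."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    delta = n * sxx - sx * sx                    # equation 2.15
    a = (sxx * sy - sx * sxy) / delta            # equation 2.8
    b = (n * sxy - sx * sy) / delta              # equation 2.9
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(ssr / (n - 2))             # equation 2.16
    sigma_a = sigma * math.sqrt(sxx / delta)     # equation 2.13
    sigma_b = sigma * math.sqrt(n / delta)       # equation 2.14
    return sigma_a, sigma_b

sa, sb = line_fit_errors([2, 4, 6, 8, 10], [70, 63, 49, 42, 31])
print(round(sa, 1), round(sb, 2))                # 1.9 0.29
```

These agree with the bracketed answers to exercise 2.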
Section 3: Extensions of the linear least squares technique

The technique of least squares used to fit equations to experimental data can be extended in several ways:

• Weighting the fit. The assumption that the standard deviation in y values is the same for all values of x (a characteristic which is sometimes referred to as homoscedasticity16) may not be valid. When it is not valid, we need to 'weight' the fit, in effect forcing the line closer to those points that are known to higher precision. Weighted fitting is considered in section 7.
• More complex equations may be fitted to the data. Equations such as y = a + b/x + cx and y = a + bx + cx² are linear in the parameters and may be fitted using linear least squares.
• Equations may be fitted using linear least squares in which the equations have more than one independent variable. As an example, the equation y = a + bx + cz may be fitted to data, where x and z are the independent variables (this is sometimes referred to as 'multiple regression').

The added computational complexity, which can arise when there are more than two parameters to be estimated, favours fitting by matrix methods. These methods are most conveniently applied using a computer for matrix manipulation/inversion.

3.1 Using Excel to solve linear least squares problems

Excel is capable of fitting functions to data that are linear in parameters. This may be achieved by using one of the following features in Excel:

• The LINEST() function
• The Regression tool in the Analysis ToolPak

Excel has no built in tool for performing weighted least squares, though a spreadsheet may be created to perform this procedure17. Similarly, Excel does not provide an easy to use utility for fitting an equation to data requiring the application of nonlinear least squares. However, with the aid of a powerful addin called 'Solver' resident in Excel, fitting using nonlinear least squares is possible. We will deal with Solver in sections 4 and 9, but first we consider nonlinear least squares.

16 The condition where the variance in y values is not constant for all x is referred to as 'heteroscedasticity'.
17 See Kirkup (2002), section 6.10.
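Equations that are linear in the parameters, such as y = a + bx + cx² above, can all be fitted by the same normal-equation machinery. The sketch below (an illustrative Python stand-in for LINEST(); the function name and the synthetic data are hypothetical, not from this document) builds the normal equations (XᵀX)p = Xᵀy for an arbitrary set of basis functions and solves them by Gaussian elimination:

```python
def fit_linear_in_params(x, y, funcs):
    """Least squares for y = p0*f0(x) + p1*f1(x) + ...: the model is linear
    in the parameters, so the normal equations (X^T X) p = X^T y apply."""
    npts, k = len(x), len(funcs)
    X = [[f(xi) for f in funcs] for xi in x]
    A = [[sum(X[i][r] * X[i][c] for i in range(npts)) for c in range(k)]
         for r in range(k)]
    v = [sum(X[i][r] * y[i] for i in range(npts)) for r in range(k)]
    for col in range(k):                         # Gaussian elimination
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    p = [0.0] * k
    for r in range(k - 1, -1, -1):               # back substitution
        p[r] = (v[r] - sum(A[r][c] * p[c] for c in range(r + 1, k))) / A[r][r]
    return p

# Recover a = 2, b = 0.5, c = -0.1 from noise-free y = a + b*x + c*x**2
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 + 0.5 * xi - 0.1 * xi * xi for xi in xs]
a, b, c = fit_linear_in_params(xs, ys, [lambda t: 1.0, lambda t: t, lambda t: t * t])
```

For a straight line this reduces to equations 2.8 and 2.9; for the quadratic it reproduces the three parameters exactly when the data are noise-free.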
3.2 Limitations of linear least squares

Quite complex functions can be fitted to data using linear least squares. It is worth highlighting that the 'linear' in linear least squares does not mean that a plot of y versus x will produce a graph containing data which lie along a straight line. 'Linear' refers to the fact that the partial derivatives, ∂SSR/∂a, ∂SSR/∂b etc., are linear in the parameter estimates. The equation to be fitted is inserted into equation 2.7, and SSR is partially differentiated with respect to each parameter estimate in turn, as described in section 2. The resulting equations are set equal to zero and solved to find best estimates of the parameters. As examples,

y = a + b ln x + c exp x        (3.1)

y = a + bx + c/x²        (3.2)

Using this definition, equations 3.1 and 3.2 may be fitted to data using linear least squares.

Some relationships between physical variables require transformation before they are suitable for fitting by linear least squares. As an example, the variation of electrical resistance, R, of some semiconductor materials with temperature, T, is known to follow the relationship,

R = R0 exp(γ/T)        (3.3)

where R0 and γ are constants. Taking natural logarithms of both sides of equation 3.3, we obtain,

ln R = ln R0 + γ(1/T)        (3.4)

Comparing equation 3.4 with y = a + bx, and taking the y values to be ln R and the x values to be 1/T, least squares may be used to find best estimates for ln R0 (and hence R0) and γ.

Weighted fitting of equations using least squares matters most when the scatter in data is large. If data show small scatter, then the best estimates found using weighted least squares are very similar to the best estimates found by using unweighted least squares. Note, however, that if the errors in R have constant variance, then after transformation the errors in ln R do not have constant variance. In this circumstance weighted fitting is required18.

18 See Dietrich (1991) p303.
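The transformation of equations 3.3 and 3.4 can be sketched as follows (Python for illustration; the thermistor-style values of R0 and γ are hypothetical, chosen only to generate noise-free data):

```python
import math

# Hypothetical data generated from R = R0*exp(gamma/T),
# with R0 = 0.05 ohm and gamma = 3200 K chosen purely for illustration.
T = [250.0, 275.0, 300.0, 325.0, 350.0]
R = [0.05 * math.exp(3200.0 / t) for t in T]

# Transform: ln R = ln R0 + gamma*(1/T), then fit the straight line of section 2
x = [1.0 / t for t in T]
y = [math.log(r) for r in R]
n = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
gamma = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope b -> gamma
r0 = math.exp((sy - gamma * sx) / n)               # intercept a -> ln R0
```

With noise-free data the original R0 and γ are recovered essentially exactly; with real data, the caveat about weighting in the paragraph above applies.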
Though transforming equations can assist in many situations, there are some equations that cannot be transformed into a form suitable for fitting by linear least squares. As examples,

y = a + bx²/(c + x)        (3.5)

y = a + b exp(cx)        (3.6)

y = a + b[1 − exp(−cx)]        (3.7)

y = a exp(bx) + c exp(dx)        (3.8)

For equations 3.5 to 3.8 it is not possible to obtain a set of linear equations that may be solved for best estimates of the parameters. We must therefore resort to another method of finding best estimates. That method still requires that parameter estimates are found that minimise SSR. As with linear least squares, SSR may be considered to be a continuous function of the parameter estimates. A surface may be constructed, sometimes referred to as a hypersurface19 in M dimensional space, where M is the number of parameters appearing in the equation to be fitted to data. The intention is to use nonlinear least squares to discover estimates, a, b, c etc., which yield a minimum in the hypersurface; these estimates are regarded as the best estimates of the parameters in the equation. Figure 3.1 shows a hypersurface which depends on two estimates, a and b.

Figure 3.1: Variation of SSR as a function of parameter estimates, showing a minimum in the hypersurface. This figure is adapted from rcs.chph.ras.ru/nlr.ppt by Alexey Pomerantsev.

19 See Bevington and Robinson (1992).
Fitting by nonlinear least squares begins with reasonable guesses for the best estimates of the parameters. The objective is to modify the starting values in an iterative fashion until a minimum is found in SSR. There are many documented ways in which values of a, b, c etc. can be found which minimise SSR, including the Grid Search (Bevington and Robinson, 1992), the Gauss-Newton method (Nielsen-Kudsk, 1983) and the Marquardt algorithm (Bates and Watts, 1988). The computational complexity of the iteration process means that nonlinear least squares can only realistically be carried out using a computer.

Nonlinear least squares is unnecessary when the derivatives of SSR with respect to the parameters are linear in the parameters. In this situation linear least squares offers a more efficient route to determining best estimates of the parameters (and the standard errors in the best estimates). Nevertheless, a linear equation can be fitted to data using nonlinear least squares. The answers obtained for the best estimates of the parameters and the standard errors in the best estimates should agree, irrespective of whether a linear equation is fitted using linear or nonlinear least squares.20

20 We consider this in more detail in section 6.
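Of the methods listed, the grid search is the simplest to sketch: evaluate SSR at every point of a mesh of candidate parameter values and keep the combination giving the smallest sum. The following Python sketch is illustrative only (it is not the algorithm Solver uses, and the synthetic data are hypothetical, constructed so the 'true' parameters lie on the grid):

```python
import math

def ssr(a, b, c, x, y):
    """Sum of squares of residuals for y = a + b[1 - exp(-cx)] (cf. equation 3.7)."""
    return sum((yi - (a + b * (1 - math.exp(-c * xi)))) ** 2
               for xi, yi in zip(x, y))

def grid_search(x, y, a_vals, b_vals, c_vals):
    """Evaluate SSR at every grid point and keep the smallest."""
    best = None
    for a in a_vals:
        for b in b_vals:
            for c in c_vals:
                s = ssr(a, b, c, x, y)
                if best is None or s < best[1]:
                    best = ((a, b, c), s)
    return best

# Noise-free synthetic data with the 'true' parameters placed on the grid
x = list(range(1, 11))
y = [10.0 + 5.0 * (1 - math.exp(-0.3 * xi)) for xi in x]
params, s = grid_search(x, y, [9.0, 10.0, 11.0], [4.0, 5.0, 6.0], [0.2, 0.3, 0.4])
```

In practice the grid is refined around the best point found; the cost grows rapidly with the number of parameters, which is why gradient-based methods such as Gauss-Newton and Marquardt are preferred.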
Section 4: Excel's Solver addin

Solver, first introduced in 1991, is one of the many 'addins' available in Excel21. Originally designed for business users, Solver is a powerful and flexible optimisation tool which is capable of finding (as an example) the best estimates of parameters using least squares. It does this by iteratively altering the numerical values of variables contained in the cells of a spreadsheet until SSR is minimised. To solve nonlinear problems, Solver uses Generalized Reduced Gradient (GRG2) code developed at the University of Texas and Cleveland State University22. Features of Solver are best described by reference to a particular example.

4.1 Example of use of Solver

Consider an experiment in which the rise of air temperature in an enclosure (such as a room) is measured as a function of time as heat passes through a window into the enclosure. Table 4.1 contains the raw data. Figure 4.1 displays the same data in graphical form.

Time (minutes)    Temperature (°C)
2                 26.1
4                 26.8
6                 27.9
8                 28.6
10                28.5
12                29.3
14                29.8
16                29.9
18                30.1
20                30.4
22                30.6
24                30.7

Table 4.1: Variation of air temperature in an enclosure with time.

21 See Fylstra et al. (1998).
22 See Excel's online Help. See also Smith and Lasdon (1992).
Figure 4.1: Temperature variation with time inside an enclosure.

Through a consideration of the flow of heat into and out of an enclosure, a relationship may be derived for the air temperature, T, inside the enclosure as a function of time, t. The relationship can be expressed,

T = Ts + k[1 − exp(−αt)]        (4.1)

where Ts, k and α are constants. Using x and y for the independent and dependent variables respectively, and a, b and c for the parameter estimates, equation 4.1 becomes23,

y = a + b[1 − exp(−cx)]        (4.2)

To find best estimates, a, b and c, we proceed as follows:

1. Enter the raw data from table 4.1 into columns A and B of an Excel worksheet as shown in sheet 4.1.
2. Type =$B$15+$B$16*(1-EXP(-$B$17*A2)) into cell C2 as shown in sheet 4.1.
3. Use the cursor to highlight cells C2 to C13. Click on the Edit menu, click on the Fill option, then click on the Down option24.

23 Equation 4.2 is of the same form as that fitted to data obtained through fluorescent decay measurements, where the decay is characterised by a single time constant – see Walsh and Diamond (1995).
24 These steps are often abbreviated in Excel texts to Edit→Fill→Down.
Sheet 4.1: Temperature (y) and time (x) data from table 4.1 entered into a spreadsheet25.

     A         B        C
1    x(mins)   y(°C)    ŷ(°C)
2    2         26.1     =$B$15+$B$16*(1-EXP(-$B$17*A2))
3    4         26.8
4    6         27.9
5    8         28.6
6    10        28.5
7    12        29.3
8    14        29.8
9    16        29.9
10   18        30.1
11   20        30.4
12   22        30.6
13   24        30.7
15   a         1
16   b         1
17   c         −1

Cells B15 to B17 contain the starting values for a, b and c. Sheet 4.2 shows the values returned in the C column. As the squares of the residuals are required, these are calculated in column D. The sum of the squares of residuals, SSR, is calculated in cell D14 by summing the contents of cells D2 through to D13.

Sheet 4.2: Calculation of sum of squares of residuals.

x(mins)   y(°C)   ŷ(°C)       (y − ŷ)² (°C²)
2         26.1    −5.38906    991.6
4         26.8    −52.5982    6304
6         27.9    −401.429    1.84E+05
8         28.6    −2978.96    9.05E+06
10        28.5    −22024.5    4.86E+08
12        29.3    −162753     2.65E+10
14        29.8    −1202602    1.45E+12
16        29.9    −8886109    7.90E+13
18        30.1    −6.6E+07    4.31E+15
20        30.4    −4.9E+08    2.35E+17
22        30.6    −3.6E+09    1.29E+19
24        30.7    −2.6E+10    7.02E+20
SSR = 7.14765E+20

It is clear that the choices of starting values for a, b and c are poor, as the predicted values, ŷ, in column C of sheet 4.2 bear no resemblance to the experimental values in column B. As a consequence, SSR is very large. Choosing good starting values for parameter estimates is often crucial to the

25 The estimated values of the dependent variable based on an equation like equation 4.2 must be distinguished from values obtained through experiment. Estimated values are represented by the symbol ŷ and experimental values by the symbol y.
success of fitting equations using nonlinear least squares, and we will return to this issue later.

To run Solver, choose Tools on Excel's Menu bar and pull down to Solver. Click on Solver. The dialog box shown in figure 4.2 should appear. If Solver does not appear, then on the same pull down menu select AddIns and tick the Solver Addin box; Solver should then be added to the Tools pull down menu.

Solver is capable of adjusting cell contents such that the value in the target cell is maximised, minimised or reaches a specified value. For least squares analysis we require the content of the target cell to be minimised. We want to minimise the value in cell D14, so D14 becomes our 'target' cell. To accomplish this, Solver adjusts the parameter estimates in cells B15 to B17 until the number in cell D14 is minimised. It is possible to constrain the values in one or more cells (for example, a parameter estimate can be prevented from assuming a negative value, if a negative value is considered to be 'unphysical'). No constraints are applied in this example.

Figure 4.2: Solver dialog box with cell references inserted. Excel alters the values in cells B15 to B17 in order to minimise the value in cell D14.

After entering the information into the dialog box, click on the Solve button. After a short delay, Solver returns with the dialog box shown in figure 4.3.
Figure 4.3: Solver dialog box indicating that fitting has been completed.

Inspection of cells B15 to B17 in the spreadsheet indicates that Solver has adjusted the parameter estimates. SSR in cell D14 in sheet 4.3 is almost 20 orders of magnitude smaller than that in cell D14 in sheet 4.2. However, all is not as satisfactory as it might seem. Consider the best line through the points which utilises the parameter estimates in cells B15 through to B17 of sheet 4.3.

Sheet 4.3: Best values for a, b and c returned by Solver when starting values are poor.

x(mins)   y(°C)   ŷ(°C)      (y − ŷ)² (°C²)
2         26.1    24.75011   1.822210907
4         26.8    27.99648   1.431570819
6         27.9    29.10535   1.452872267
8         28.6    29.48411   0.781649268
10        28.5    29.61348   1.239842411
12        29.3    29.65767   0.127929367
14        29.8    29.67277   0.016188440
16        29.9    29.67792   0.049318684
18        30.1    29.67968   0.176666436
20        30.4    29.68028   0.517990468
22        30.6    29.68049   0.845498795
24        30.7    29.68056   1.039257719
a = 15.24587, b = 14.43473, c = 0.5371; SSR = 9.500995581
Figure 4.4: Graph of y versus x showing the line based on equation 4.2, where a, b and c have the values given in sheet 4.3: ŷ = 15.25 + 14.43[1 − exp(−0.5371x)].

A plot of residuals (i.e. a plot of (yi − ŷi) versus xi) is often used as an indicator of the 'goodness of fit' of an equation to data, with trends in the residuals indicating a poor fit26. In the example under consideration here, no plot of residuals is required to reach the conclusion that the line on the graph in figure 4.4 is not a good fit to the experimental data. Solver has found a minimum in SSR. However, there is another combination of parameter estimates that will produce an even lower value for SSR: Solver has found a local minimum27, and the parameter estimates are of little worth. The source of the problem can be traced to the poorly chosen starting values.

Methods by which good starting values for parameter estimates may be obtained are considered in section 5. In the example under consideration, we note (by reference to equation 4.2) that when x = 0, y = a. Drawing a line 'by eye' through the data in figure 4.1 indicates that, when x = 0, y ≈ 25.5 °C. Starting values for b and c may also be established by a similar preliminary analysis of the data, which we will consider in section 5. Denoting starting values by a0, b0 and c0, we find28,

a0 = 25.5, b0 = 5.5 and c0 = 0.12

For convenience, units are omitted until the analysis is complete. Inserting these values into sheet 4.2 and running Solver again gives the output shown in sheet 4.4.

26 See Cleveland (1994) and Kirkup (2002) for a discussion of residuals.
27 Local minima are discussed in section 5.
28 All parameter estimates in this example have units (for example, the unit of c is min⁻¹, assuming time is measured in minutes).
Sheet 4.4: Best values for a, b and c returned by Solver when starting values for parameter estimates are good.

x(mins)   y(°C)   ŷ(°C)      (y − ŷ)² (°C²)
2         26.1    26.07247   0.000758
4         26.8    26.97734   0.031448
6         27.9    27.72762   0.029717
8         28.6    28.34972   0.062640
10        28.5    28.86555   0.133626
12        29.3    29.29326   0.000046
14        29.8    29.64789   0.023136
16        29.9    29.94195   0.001760
18        30.1    30.18577   0.007356
20        30.4    30.38793   0.000146
22        30.6    30.55556   0.001975
24        30.7    30.69456   0.000030
a = 24.98118, b = 6.387988, c = 0.093668; SSR = 0.29263495

Figure 4.5: Graph of y versus x showing the line and the equation of the line based on a, b and c in sheet 4.4: ŷ = 24.98 + 6.388[1 − exp(−0.09367x)].

The sum of squares of residuals in cell D14 of sheet 4.4 is far less than that in cell D14 of sheet 4.3. This indicates that the parameter estimates obtained using Solver when good starting values are used are rather better than those obtained when the starting values are poorly chosen. In addition, the line fitted to the data in figure 4.5 (where the line is based upon the new best estimates of the parameters) is far superior to the line fitted to the same data shown in figure 4.4. This is further reinforced by the plot of residuals shown in figure 4.6, which exhibit a random scatter about the x axis.
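The contents of sheet 4.4 are easy to check outside Excel. A minimal sketch (Python for illustration, with the table 4.1 data and the sheet 4.4 best estimates transcribed from this document):

```python
import math

x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
y = [26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9, 30.1, 30.4, 30.6, 30.7]
a, b, c = 24.98118, 6.387988, 0.093668     # best estimates from sheet 4.4

# predicted values (column C) and SSR (cell D14) of sheet 4.4
y_hat = [a + b * (1 - math.exp(-c * xi)) for xi in x]
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
```

Evaluating this reproduces the ŷ column (for example ŷ at x = 2 is 26.072) and an SSR of about 0.2926.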
Figure 4.6: Plot of residuals based on the data and equation in figure 4.5.

4.2 Limitations of Solver

Solver is able to efficiently solve for the best estimates of parameters in an equation. However, Solver does not provide standard errors in the parameter estimates. Standard errors in estimates are extremely important, as without them it is not possible to quote a confidence interval for the estimates, and so we cannot decide if the estimates are 'good enough' for any particular purpose.

If there are three parameters to be estimated, such as those appearing in equation 4.2, the standard errors in the parameter estimates can be determined with the assistance of the matrix of partial derivatives given by29,

E = [ Σ(∂yi/∂a)²           Σ(∂yi/∂a)(∂yi/∂b)    Σ(∂yi/∂a)(∂yi/∂c) ]
    [ Σ(∂yi/∂a)(∂yi/∂b)    Σ(∂yi/∂b)²           Σ(∂yi/∂b)(∂yi/∂c) ]
    [ Σ(∂yi/∂a)(∂yi/∂c)    Σ(∂yi/∂b)(∂yi/∂c)    Σ(∂yi/∂c)²        ]        (4.3)

The variances of a, b and c are the diagonal elements of the covariance matrix, V, given by equation 2.17 (with A replaced by E). Explicitly30,

σa = σ(E⁻¹11)^1/2        (4.4)

σb = σ(E⁻¹22)^1/2        (4.5)

29 Note that this approach can be extended to any number of parameters. See Neter et al. (1996) chapter 13.
30 Compare these with equations 2.17, 2.18 and 2.19.
σc = σ(E⁻¹33)^1/2        (4.6)

where31,

σ ≈ [ (1/(n − 3)) Σ(yi − ŷi)² ]^1/2        (4.7)

A convenient way to calculate the elements of the E matrix is to write,

E = DᵀD        (4.8)

where Dᵀ is the transpose of the matrix D. In general, D is given by,

D = [ ∂y1/∂a    ∂y1/∂b    ∂y1/∂c ]
    [ ∂y2/∂a    ∂y2/∂b    ∂y2/∂c ]
    [   ...       ...       ...  ]
    [ ∂yi/∂a    ∂yi/∂b    ∂yi/∂c ]
    [   ...       ...       ...  ]
    [ ∂yn/∂a    ∂yn/∂b    ∂yn/∂c ]        (4.9)

The partial derivatives in equation 4.9 are evaluated on completion of fitting an equation using Solver, i.e. at the values of a, b and c that minimise SSR. It is possible in some situations to determine the partial derivatives analytically. A more flexible approach, and one that is generally more convenient, is to use the method of 'finite differences' to find ∂yi/∂a, ∂yi/∂b etc. The partial derivatives are approximated using,

∂yi/∂a ≈ ( y[a(1 + δ), b, c, xi] − y[a, b, c, xi] ) / ( a(1 + δ) − a )        (4.10)

Similarly,

∂yi/∂b ≈ ( y[a, b(1 + δ), c, xi] − y[a, b, c, xi] ) / ( b(1 + δ) − b )        (4.11)

As double precision arithmetic is used by Excel, the perturbation, δ, in equation 4.10 can be as small as δ = 10⁻⁶ or 10⁻⁷.

31 The n − 3 in the denominator of the term in the square brackets of equation 4.7 appears because the estimate of the population standard deviation in the y values requires that the sum of squares of residuals be divided by the number of degrees of freedom. The number of degrees of freedom is the number of data points, n, minus the number of parameters, p. In this example, p = 3.
and,

∂yi/∂c ≈ ( y[a, b, c(1 + δ), xi] − y[a, b, c, xi] ) / ( c(1 + δ) − c )        (4.12)

4.3 Spreadsheet for the determination of standard errors in parameter estimates

In an effort to clarify the process of estimating standard errors, we describe a stepbystep approach using an Excel spreadsheet32. Sheet 4.5 shows the optimum values, as obtained by Solver, for a, b and c in cells G20 to G22. To find good approximations of the derivatives ∂yi/∂a, ∂yi/∂b etc., it is necessary to perturb a slightly (say to 1.000001×a) while leaving the parameter estimates b and c at their optimum values. Cell H20 contains the value 1.000001×a, cell I21 contains the value 1.000001×b, and cell J22 contains the value 1.000001×c.

Sheet 4.5: Modification of best estimates of parameters.

     F    G (from Solver)   H (b, c constant)   I (a, c constant)   J (a, b constant)
20   a    24.98118          24.98120574
21   b    6.387988                              6.387994491
22   c    0.093668                                                  0.093668158

We use the modified parameter estimates to calculate the numerator in equation 4.10. The partial derivative ∂y1/∂a is calculated by entering the formula =(H2-C2)/($H$20-$G$20) into cell L2 of sheet 4.6. The denominator in equation 4.10 may be determined by entering the formula =$H$20-$G$20 into a cell on the spreadsheet. By using Fill→Down, the formula may be copied into cells in the L column so that the partial derivative is calculated for every xi. The contents of cells L2 to N13 become the elements of the D matrix given by equation 4.9. To obtain ∂yi/∂b and ∂yi/∂c, this process is repeated for columns M and N, respectively, of sheet 4.6.

32 It is possible to combine these steps into a macro or Visual Basic program (see Walkenbach, 2001).
Sheet 4.6: Calculation of partial derivatives. Columns H, I and J contain ŷ recalculated with a, b and c perturbed in turn; columns L, M and N contain the finite difference approximations to ∂yi/∂a, ∂yi/∂b and ∂yi/∂c (for example, at x = 2 these are 1, 0.17084 and 10.5934 respectively).

Excel's TRANSPOSE() function is used to transpose the D matrix. We proceed as follows:

• Highlight cells B24 to N26.
• In cell B24 type =TRANSPOSE(L2:N13).
• Press Ctrl+Shift+Enter to place the transpose of the contents of cells L2 to N13 into cells B24 to N26.
• Multiply Dᵀ by D (using the MMULT() matrix function in Excel) to give E.

We obtain,

E = DᵀD = [ 12       7.659    246.1  ]
          [ 7.659    5.494    160.9  ]
          [ 246.1    160.9    5252.6 ]        (4.13)

The MINVERSE() function in Excel is used to find the inverse of E, i.e.

E⁻¹ = [  2.2395     −0.48539   −0.09005   ]
      [ −0.48539     1.870672  −0.03456   ]
      [ −0.09005    −0.03456    0.0054669 ]

Two more steps are required to calculate the standard errors in the parameter estimates. The first is to calculate the square root of each diagonal element of the matrix E⁻¹. The second is to calculate σ using equation 4.7. Using the sum of squares of residuals appearing in cell D14 of sheet 4.4, we obtain,

σ ≈ [ 0.2926 / (12 − 3) ]^1/2 = 0.1803        (4.14)

It follows that,
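The whole pipeline of sections 4.2 and 4.3 — finite-difference D matrix, E = DᵀD, inversion, and the standard errors — can be sketched compactly. The following is an illustrative Python version (not the spreadsheet itself), using the table 4.1 data and sheet 4.4 estimates transcribed from this document; the helper names are hypothetical:

```python
import math

x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
y = [26.1, 26.8, 27.9, 28.6, 28.5, 29.3, 29.8, 29.9, 30.1, 30.4, 30.6, 30.7]
p = [24.98118, 6.387988, 0.093668]          # sheet 4.4 best estimates (a, b, c)

def model(p, xi):
    a, b, c = p
    return a + b * (1 - math.exp(-c * xi))  # equation 4.2

# D matrix (equation 4.9) by finite differences (equations 4.10-4.12)
delta = 1e-6
D = []
for xi in x:
    row = []
    for k in range(3):
        q = list(p)
        q[k] = p[k] * (1 + delta)           # perturb one parameter at a time
        row.append((model(q, xi) - model(p, xi)) / (q[k] - p[k]))
    D.append(row)

# E = D^T D (equation 4.8)
E = [[sum(D[i][r] * D[i][c] for i in range(len(x))) for c in range(3)]
     for r in range(3)]

def inv3(m):
    """Inverse of a 3x3 matrix via the cofactor (adjugate) formula."""
    cof = [[m[(r + 1) % 3][(c + 1) % 3] * m[(r + 2) % 3][(c + 2) % 3]
            - m[(r + 1) % 3][(c + 2) % 3] * m[(r + 2) % 3][(c + 1) % 3]
            for c in range(3)] for r in range(3)]
    det = sum(m[0][c] * cof[0][c] for c in range(3))
    return [[cof[c][r] / det for c in range(3)] for r in range(3)]

Einv = inv3(E)
n = len(x)
ssr = sum((yi - model(p, xi)) ** 2 for xi, yi in zip(x, y))
sigma = math.sqrt(ssr / (n - 3))                            # equation 4.7
sa, sb, sc = (sigma * math.sqrt(Einv[i][i]) for i in range(3))
```

Running this reproduces σ ≈ 0.180 and standard errors close to the values quoted in equations 4.15 to 4.17 below.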
σa = σ(E⁻¹11)^1/2 = 0.1803 × (2.240)^1/2 = 0.270        (4.15)

σb = σ(E⁻¹22)^1/2 = 0.1803 × (1.871)^1/2 = 0.247        (4.16)

σc = σ(E⁻¹33)^1/2 = 0.1803 × (0.005467)^1/2 = 0.0133        (4.17)

4.4 Confidence intervals for parameter estimates

We use parameter estimates and their respective standard errors to quote a confidence interval for each parameter34. For the parameters appearing in equation 4.1,

Ts = a ± tX%,ν σa        (4.18)

k = b ± tX%,ν σb        (4.19)

α = c ± tX%,ν σc        (4.20)

tX%,ν is the critical value of the t distribution for the X% confidence level with ν degrees of freedom. t values are routinely tabulated in statistical texts. In this example ν = n − 3, where n is the number of data points. In table 4.1 there are 12 points, so that ν = 9. If we choose a confidence level of 95% (the commonly chosen level), t95%,9 = 2.262.

Restoring the units of measurement and quoting 95% confidence intervals gives,

Ts = (24.98 ± 2.262 × 0.270) °C = (24.98 ± 0.61) °C
k = (6.39 ± 2.262 × 0.247) °C = (6.39 ± 0.56) °C
α = (0.094 ± 2.262 × 0.0133) min⁻¹ = (0.094 ± 0.030) min⁻¹

Exercise 3
The amount of heat entering an enclosure through a window may be reduced by applying a reflective coating to the window. An experiment is performed to establish the effect of a reflective coating on the rise in air temperature within the enclosure. The temperature within the enclosure as a function of time is shown in table 4.2.

Time (minutes)    Temperature (°C)
2                 24.8
4                 25.3
6                 25.4
8                 25.9
10                26.0
12                26.3
14                26.4
16                26.5
18                26.6
20                26.8
22                26.9
24                27.0

Table 4.2: Data for exercise 3.

34 See Kirkup (2002), p226.
Fit equation 4.2 to the data in table 4.2. Find a, b and c and their respective standard errors. Note that good starting values for parameter estimates are required if fitting by nonlinear least squares is to be successful.
[a = 24.… °C, σa = 0.0613 °C; σb = 0.227 °C; c = 0.0682 min⁻¹, σc = 0.0147 min⁻¹]
Section 5: More on fitting using nonlinear least squares

There are several challenges to face when fitting equations to data using nonlinear least squares. These can be summarised as:

1) Choosing an appropriate model to describe the relationship between x and y.
2) Avoiding local minima in SSR.
3) Establishing good starting values prior to fitting by nonlinear least squares.

Model identification is considered in section 10. We consider 2) and 3) in this section.

5.1 Local minima in SSR

When data are noisy, or starting values are far from the best estimates, a nonlinear least squares fitting routine can become 'trapped' in a local minimum. To illustrate this situation, we draw on the analysis of the data appearing in section 4. Equation 4.2 is fitted to the data in table 4.1 using the starting values given in sheet 4.1. The variation of SSR with c is shown in figure 5.1. For clarity, the relationship between only one parameter estimate (c) and SSR is considered.

[Figure 5.1: Variation of SSR with c when a local minimum has been found when equation 4.2 is fitted to the data in table 4.1.]

Solver finds a minimum in SSR when c is about 0.53 and terminates the fitting procedure. The minimum in SSR in figure 5.1 is referred to as a local minimum, as there is another combination of parameter estimates that will give a lower value for SSR. The lowest value of SSR obtainable corresponds to the global minimum. It is the global minimum that we would like to identify in all least squares problems.

When starting values are used that are closer to the final values³⁵, the nonlinear fitting routine finds parameter estimates that produce a lower final value for SSR. Figure 5.2 shows the variation of SSR with c in the interval 0.04 < c < 0.18.

35 See section 4.
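Becoming trapped in a local minimum is easy to reproduce with a toy objective. The sketch below does not use the data of table 4.1; it applies a crude downhill step search to an invented one-parameter SSR with two minima, started from two different values. One start finds only the local minimum, the other finds the global one, which is exactly why restarting the fit from several starting values is worthwhile.

```python
def ssr(c):
    # Invented one-parameter objective with a local minimum near c = 2
    # and a deeper (global) minimum near c = -1.
    return (c - 2.0) ** 2 * (c + 1.0) ** 2 + 0.5 * c

def downhill(c, step=0.01, iters=20000):
    """Naive local search: move in whichever direction lowers ssr."""
    for _ in range(iters):
        if ssr(c + step) < ssr(c):
            c += step
        elif ssr(c - step) < ssr(c):
            c -= step
        else:
            break            # neither direction improves: a minimum
    return c

print(downhill(3.0))     # settles near c = 2 (local minimum)
print(downhill(-2.0))    # settles near c = -1 (global, lower SSR)
```

Running the same search from several starting values exposes the disagreement between the two 'answers' immediately, which is the practical test suggested later in this section.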
A number of indicators can assist in identifying a local minimum, though there is no 'foolproof' way of deciding whether a local or global minimum has been discovered. A good starting point is to plot the raw data along with the fitted line (as illustrated in figure 4.4). A poor fit of the line to the data could indicate:

• A local minimum has been found.
• An inappropriate model has been fitted to the data.

When a local minimum in SSR is found, the standard errors in the parameter estimates tend to be large. As an example, the best estimates appearing in sheet 4.3 (resulting from being trapped in a local minimum), their respective standard errors and the magnitude of the ratio of these quantities (expressed as a percentage) are:

a = 15.17, σa = 9.25, so that |σa/a| × 100% = 61%
b = 14.43, σb = 9.27, so that |σb/b| × 100% = 64%
c = 0.5371, σc = 0.284, so that |σc/c| × 100% = 53%

When the global minimum in SSR is found (see sheet 4.4), the best estimates of the parameters are:

a = 24.388, σa = 0.270, so that |σa/a| × 100% = 1.1%
b = 6.98, σb = 0.247, so that |σb/b| × 100% = 3.5%
c = 0.09367, σc = 0.0133, so that |σc/c| × 100% = 14%

There is merit in fitting the same equation to data several times, each time using different starting values for the parameter estimates. If, after fitting, there is consistency between the final values obtained for the best estimates, standard errors etc., then it is likely that the global minimum has been identified.

[Figure 5.2: Variation of SSR with c when the global minimum has been found when equation 4.2 is fitted to the data.]
5.2 Starting values

There are no general rules that may be applied in order to determine good starting values³⁶ for parameter estimates prior to fitting by nonlinear least squares. It is correct, but sometimes unhelpful, to remark that familiarity with the relationship being studied can assist greatly in deciding what might be reasonable starting values for parameter estimates. A useful approach to determining starting values is to begin by plotting the experimental data.

Consider the data in figure 5.3, which has a smooth line drawn through the points 'by eye'. If the relationship between x and y is given by equation 5.1, i.e.

y = a + b[1 − exp(cx)]   (5.1)

then we are able to estimate a and b by considering the data in figure 5.3 and the 'rough' line drawn through the data. Equation 5.1 predicts that y = a when x is equal to zero. From figure 5.3 we see that when x = 0, y ≈ 25.5 °C, so that a ≈ 25.5 °C. When x is large (and assuming c is negative), then y = a + b. Inspection of the graph in figure 5.3 indicates that when x is large, y ≈ 31.0 °C, i.e. a + b ≈ 31.0 °C. It follows that b ≈ 5.5 °C.

If we write the starting values for a and b as a0 and b0 respectively, then a0 = 25.5 °C and b0 = 5.5 °C.

[Figure 5.3: Line drawn 'by eye' through the data given in table 4.1; y ≈ 25.5 °C at x = 0 and y ≈ 31 °C at large x.]

In order to determine a starting value for c, c0, equation 5.1 is rearranged into the form:

36 Sometimes referred to as initial estimates.
ln[1 − (y − a0)/b0] = c0 x   (5.2)

Equation 5.2 has the form of an equation of a straight line passing through the origin (i.e. y = bx). It follows that plotting ln[1 − (y − a0)/b0] versus x should give a straight line with slope c0.

Figure 5.4 shows a plot of ln[1 − (y − a0)/b0] versus x. The line of best fit and the equation of the line have been added using the Trendline option in Excel³⁸. An equation of the form y = a + bx was fitted to the data using Trendline; alternatively, we could have fitted y = bx to the data. Either approach would have given an acceptable starting value for c0. The slope of the line is approximately −0.12.

[Figure 5.4: Line of best fit used to determine the starting value for c; Trendline gives y = −0.1243x + 0.2461.]

The starting values may now be stated for this example:

a0 = 25.5 °C, b0 = 5.5 °C, c0 = −0.12 min⁻¹

These starting values were used in the successful fit of equation 4.2 to the data given in table 4.1. (The output of the fit is shown in sheet 4.4.)

5.3 Starting values by curve stripping

Establishing starting values in some situations is quite difficult and may require a significant amount of preprocessing of the data, for example, the fitting to data of an equation consisting of a sum of exponential terms, such as

y = a exp(bx) + c exp(dx)   (5.3)

38 For details of Trendline see page 222 in Kirkup (2002).
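The graphical reasoning of section 5.2 translates directly into code. The sketch below uses invented cooling-style data (assuming the model y = a + b[1 − exp(cx)] with c negative, as above): a0 is taken from the first reading, b0 from the plateau, and c0 from the least squares slope of the log-transformed data through the origin, sum(x·z)/sum(x²).

```python
import math

# Invented data roughly following y = 25.5 + 5.5*(1 - exp(-0.12*x))
x = [0, 2, 4, 6, 8, 10, 15, 20, 25]
y = [25.5, 26.7, 27.6, 28.3, 28.9, 29.4, 30.1, 30.6, 30.8]

a0 = y[0]            # equation 5.1 gives y = a at x = 0
b0 = y[-1] - y[0]    # at large x, y -> a + b, so b ~ plateau minus a

# Linearise: ln(1 - (y - a0)/b0) = c0 * x, a straight line through the
# origin; its least squares slope is sum(x*z)/sum(x*x).
pairs = [(xi, math.log(1.0 - (yi - a0) / b0))
         for xi, yi in zip(x, y) if (yi - a0) < b0]
c0 = sum(xi * zi for xi, zi in pairs) / sum(xi * xi for xi, _ in pairs)
print(a0, b0, round(c0, 3))
```

The recovered c0 is negative, as the model requires; it need only be near enough to the optimum for the nonlinear fit to converge.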
or⁴⁰

y = a exp(bx) + c exp(dx) + e exp(fx)   (5.4)

Fitting of equations such as equation 5.3 and equation 5.4 is quite common. In analytical chemistry, excited state lifetime measurements offer a means of identifying components in a mixture. The decay of phosphorescence with time that occurs after illumination of the mixture may be captured, and fitting a sum of exponentials by nonlinear least squares allows each component in the mixture to be discriminated⁴². The kinetics of drug transport through the human body is routinely modelled using 'compartmental analysis', which attempts to predict concentrations of drugs as a function of time (e.g. in blood or urine). The relationship between concentration and time is often well represented by a sum of exponential terms.

The data in figure 5.5 were gathered in an experiment in which the decay of photogenerated current in the wide band gap semiconductor cadmium sulphide (CdS) is measured as a function of time after photoexcitation of the semiconductor has ceased. There appears to be an exponential decay of the photocurrent with time. Theory indicates⁴³ that there may be more than one decay mechanism for photoconductivity. That, in turn, suggests that an equation of the form given by equation 5.3 or equation 5.4 is appropriate. The decay can be represented by a sum of exponential terms.

If an equation to be fitted to data consists of a sum of exponential terms, good starting values for parameter estimates are extremely important if local minima in SSR are to be avoided. Fitting an equation containing a sum of exponential terms (such as equation 5.4) is particularly challenging, especially when data are noisy and/or the ratio of the parameters within the exponentials is less than approximately 3 (e.g. when the ratio b/d in equation 5.3 is less than 3)⁴¹. It is also possible that, if the starting values for the parameter estimates are too far from the optimum values, SSR will increase during the iterative process to such an extent that it exceeds the maximum floating point number that a spreadsheet (or other program) can handle. In this situation, fitting is terminated and an error message is returned by the spreadsheet.

40 Here we assume b > d > f.
41 See Kirkup and Sutherland (1988).
42 See Demas (1983).
43 See Bube (1960), chapter 6.
[Figure 5.5: Photocurrent (arbitrary units) versus time (ms) for cadmium sulphide.]

If equation 5.3 is to be fitted to data, how are starting values for the parameter estimates established? If b is large (and negative), then the contribution of the first term in equation 5.3 to y is small when x exceeds some value, which we will designate as x̄. We can therefore write, for x > x̄,

y ≈ c exp(dx)   (5.5)

Equation 5.5 can be linearised by taking natural logarithms of both sides of the equation. The next step is to fit a straight line to the transformed data to find (approximate) values for c and d, which we will designate as c0 and d0 respectively. Now we revisit equation 5.3 and write, for x < x̄,

y − c0 exp(d0x) = a exp(bx)   (5.6)

Transforming equation 5.6 by taking natural logarithms of both sides of the equation, then fitting a straight line to the transformed data, will yield approximate values for a and b which can serve as starting values in a nonlinear fit. For a more detailed discussion of how to determine starting values when an equation to be fitted consists of a sum of exponential terms, see Kirkup and Sutherland (1988).

5.4 Effect of instrument resolution and noise on best estimates

Errors in the dependent variable lead to uncertainties in parameter estimates⁴⁴. If errors are very large, it may not be possible to establish reasonable parameter estimates. To illustrate the effect of errors on fitting, we consider the outcome of Monte Carlo simulations in which errors, in the form of normally distributed noise, are added to 'noise free' data⁴⁵. After noise is added, nonlinear least squares is performed to find best estimates of the parameters.

44 In the case of a 'model violation' (such that the equation fitted to the data is not appropriate) there would be nonzero residuals even if the data were error free. Such nonzero residuals would translate to uncertainties in parameter estimates.
45 Monte Carlo simulations are dealt with in section 11.
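The curve-stripping procedure of section 5.3 (fit the tail, subtract the slow component, fit the remainder) can be sketched outside Excel. The data below are invented, generated from a two-exponential model with the rate ratio well above 3 so that the slow term cleanly dominates the tail; x̄ is chosen by inspection.

```python
import math

def linfit(xs, ys):
    """Unweighted least squares slope and intercept for y = m*x + k."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    k = (sy - m * sx) / n
    return m, k

# Invented noise-free data: y = 80*exp(-0.20*x) + 20*exp(-0.02*x)
xs = [float(i) for i in range(0, 60, 2)]
ys = [80.0 * math.exp(-0.20 * x) + 20.0 * math.exp(-0.02 * x) for x in xs]

# 1) Tail (x > xbar): the fast term has died away, so y ~ c*exp(d*x);
#    ln y is then a straight line with slope d0 and intercept ln(c0).
xbar = 30.0
tail = [(x, math.log(y)) for x, y in zip(xs, ys) if x > xbar]
d0, lnc0 = linfit([t[0] for t in tail], [t[1] for t in tail])
c0 = math.exp(lnc0)

# 2) Strip: subtract the slow component and fit the early-time
#    remainder, y - c0*exp(d0*x) ~ a*exp(b*x), in the same way.
head = [(x, y - c0 * math.exp(d0 * x)) for x, y in zip(xs, ys) if x < xbar]
head = [(x, r) for x, r in head if r > 0]     # logs need positive values
b0, lna0 = linfit([h[0] for h in head], [math.log(r) for _, r in head])
a0 = math.exp(lna0)
print(round(a0, 1), round(b0, 3), round(c0, 1), round(d0, 3))
```

The recovered a0, b0, c0 and d0 are only approximate (the stripping leaves a little cross-contamination), but they are exactly the kind of starting values a nonlinear fit of equation 5.3 needs.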
To study the effect of noise on parameter estimates, data are generated for an 'experiment' in which the temperature of water in a vessel is monitored as it cools in a laboratory. The equation relating temperature, T, to time, t, is written

T = T∞ + (Ts − T∞) exp(−kt)   (5.7)

where T∞ is the temperature at infinite time (which is equal to room temperature), Ts is the starting temperature, and k is the rate constant for cooling. We choose (arbitrarily) Ts = 62 °C, T∞ = 26 °C and k = 0.034 min⁻¹. Noise free data generated at 5 minute intervals between t = 0 and t = 55 minutes using equation 5.7 are shown in figure 5.6.

[Figure 5.6: Noise free data of temperature (°C) versus time (minutes) generated using equation 5.7.]

Writing equation 5.7 using our usual convention for variables and parameter estimates gives

y = a + (b − a) exp(−cx)   (5.8)

The next step is to fit equation 5.8 to the data, using the following starting values: a0 = 25, b0 = 60, c0 = 0.02. The fitting options⁴⁶ are selected using the Solver Options dialog box as shown in figure 5.7.

46 Fitting options are discussed in section 9.
[Figure 5.7: Solver Options used to fit equation 5.8 to the data in figure 5.6.]

Using Solver, the following values were recovered for the best estimates of the parameters and the standard errors in the best estimates:

a = 25.99993547, σa = 3.66 × 10⁻⁵; b = 61.99997243, σb = 1.45 × 10⁻⁵; c = 0.0339998, σc = 7.65 × 10⁻⁸; SSR = 2.73 × 10⁻⁹

Table 5.1: Best estimates of parameters and standard errors in parameters.

5.4.1 Adding normally distributed noise to data using Excel's Random Number Generation tool

To investigate the effect of errors on the fitting of equations to data, normally distributed noise⁴⁷ of constant standard deviation (i.e. homoscedastic data) is added to the noise free data⁴⁸. Normally distributed noise can be added to the data by using the Random Number Generation tool in the Analysis ToolPak. The mean and standard deviation of the random numbers are controlled using a dialog box. When adding noise, it is usual to select the mean to be zero. The standard deviation can have any value (the larger the value, the greater the 'noise'). For this example it is convenient to leave the standard deviation at its default value of one.

[Figure 5.8: Normally distributed noise with zero mean and standard deviation of one, generated using the Random Number Generation tool in Excel's Analysis ToolPak. Column A holds x (min), column B the noise free y (°C), column C the noise (°C) and column D the 'experimental' values; cell D2 contains the formula =B2+C2.]

The 'experimental' data in column D (i.e. the data with noise added) are obtained by adding the values in column B to those in column C. Figure 5.8 shows the formula entered into cell D2. The next step is to use Fill > Down to enter the formula into cells D3 to D13. A plot of the y values with noise added (as given in column D of figure 5.8) versus x is shown in figure 5.9.

[Figure 5.9: Data in figure 5.6 with the addition of normally distributed noise.]

5.4.2 Fitting an equation to noisy data

To show the effect errors have on the fitting of equation 5.8 to data, parameter estimates (and standard errors) are compared when the temperature data:

• are noise free
• are rounded to the nearest 0.1 °C, but no noise is added
• have noise of standard deviation 0.2 °C added
• have noise of standard deviation 1 °C added
• have noise of standard deviation 5 °C added.

47 Also referred to as Gaussian noise.
48 Heteroscedastic noise can also be added to data with the aid of Excel's Random Number Generation tool (see section 11.3).
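The noise-generation and column-addition steps of section 5.4.1 can be mirrored outside Excel, with random.gauss playing the role of the Random Number Generation tool. A sketch using the cooling model of equation 5.7 (T∞ = 26 °C, Ts = 62 °C, k = 0.034 min⁻¹, as above):

```python
import math
import random

random.seed(1)                          # reproducible 'experiment'

T_inf, T_s, k = 26.0, 62.0, 0.034      # true values used in equation 5.7

x = list(range(0, 60, 5))              # t = 0, 5, ..., 55 minutes

# Column B of figure 5.8: noise free temperatures from equation 5.7
y_clean = [T_inf + (T_s - T_inf) * math.exp(-k * t) for t in x]

# Column C: normally distributed noise, mean 0, standard deviation 1 degC
noise = [random.gauss(0.0, 1.0) for _ in x]

# Column D: the 'experimental' values, i.e. the spreadsheet formula =B2+C2
y_exp = [b + c for b, c in zip(y_clean, noise)]
print([round(v, 2) for v in y_exp])
```

Changing the standard deviation passed to random.gauss reproduces the 0.2 °C, 1 °C and 5 °C conditions compared in section 5.4.2.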
Temperature data over the period x = 0 to x = 55 minutes were generated in the manner described in section 5.4.1, and equation 5.8 was fitted to each data set. The starting values for all fits were a0 = 25, b0 = 60 and c0 = 0.02, and the Solver Options were as given in figure 5.7.

[Table 5.1: Best estimates of parameters and standard errors in estimates for each noise condition (noise free; rounded to 0.1 °C⁴⁹; noise of standard deviation 0.2, 1.0 and 5.0 °C).]

In order to indicate to what extent the estimates a, b and c differ from the true values (T∞ = 26 °C, Ts = 62 °C and k = 0.034 min⁻¹), percentage differences are presented in table 5.2.

[Table 5.2: (Absolute) percentage difference between parameter estimates and true values, i.e. (a − T∞) × 100%/T∞, (b − Ts) × 100%/Ts and (c − k) × 100%/k, for each noise condition.]

One might expect the percentage difference between the true values of the parameters and the parameter estimates to increase as the noise level increases. However, examination of table 5.2 reveals that, for noise of standard deviation 5 °C, the estimate of T∞ is within ≈ 2% of the true value. This should be expected: as the noise added is random, there is a possibility that 'by chance' a good estimate for some parameter will be obtained even when the noise is quite large. However, if we were to repeat the simulation many times, we would find that, on average, the percentage difference between the parameter estimates and the true values increases as the noise increases. Note that, on the whole, the standard errors in the estimates also increase as the noise increases.

49 Denotes temperature values rounded to 0.1 °C.
50 See Kirkup (2002), ch1.

5.4.3 Relationship between sampling density and parameter estimates

When repeat measurements are made of a single quantity (such as the time taken for a ball to free fall through a fixed distance), the standard error in the mean, σx̄, of the data is related to the standard deviation, σ, of the data by⁵⁰

σx̄ = σ/n^1/2   (5.9)
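Equation 5.9 is easy to verify by simulation. The sketch below (with arbitrarily chosen population mean and standard deviation) estimates the scatter of the sample mean for samples of size n and 4n; quadrupling n should roughly halve the scatter.

```python
import random
import statistics

random.seed(42)

def scatter_of_means(n, trials=2000, mu=10.0, sd=2.0):
    """Standard deviation of the mean of n draws, estimated by simulation."""
    means = [statistics.fmean(random.gauss(mu, sd) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

s9, s36 = scatter_of_means(9), scatter_of_means(36)
print(round(s9 / s36, 2))   # close to 2, since sigma_xbar ~ 1/sqrt(n)
```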
Equation 5.9 indicates that σx̄ reduces as 1/n^1/2, i.e. if more measurements are made, we profit by a reduction in the standard error of the mean. It is anticipated that, in analysis by least squares, there is a similar reduction in the standard error of the parameter estimates as the number of measurements increases⁵¹.

To establish this, consider the analysis of data generated using equation 5.7 (with parameters Ts = 62 °C, T∞ = 26 °C and k = 0.034 min⁻¹), to which noise of unit standard deviation has been added. Data are 'gathered' at evenly spaced intervals in the range x = 0 to x = 60, and the number of values, n, is chosen to be 9, 16, 25, 33, 41, 49, 61, 91 and 121. Equation 5.8 was fitted to the data in order to establish best estimates a, b and c, and their respective standard errors were determined using (unweighted) nonlinear least squares. Squaring the standard errors gives the variances σa², σb² and σc².

If equation 5.9 is valid for the standard errors in the parameter estimates, then plotting σa², σb² and σc² versus 1/n should produce a straight line. Figure 5.10 shows such plots.

[Figure 5.10: Variance of the parameter estimates as a function of the number of data, plotted against 1/n: a) σa² (fitted line y = 22.942x + 0.2831, with one outlier circled); b) σb² (y = 5.8982x + 0.1142, R² = 0.892); c) σc² (y = 0.0001x + 2×10⁻⁶, R² = 0.9164).]

Each graph shows the equation of the best straight line fitted to the points and the coefficient of determination, R². With the exception of the circled data point in figure 5.10a, the points on the graphs in figures 5.10a to 5.10c appear to follow a linear relationship, indicating that the variance of the parameter estimates does decrease (at least approximately) as 1/n.

51 Frenkel (2002) discusses the relationship between the standard errors in the parameter estimates and the number of data.
Section 6: Linear least squares meets nonlinear least squares

It is possible to use the technique of nonlinear least squares to fit linear equations to data. In such circumstances we expect the same values to emerge for the best estimates of the parameters and the standard errors in the estimates, irrespective of whether fitting is carried out by linear or nonlinear least squares. To illustrate this, we consider an example in which the van Deemter equation is fitted to the gas chromatography data in table 6.1⁵².

The relationship between plate height, H, and flow rate, v, can be written⁵³

H = A + B/v + Cv   (6.1)

where A, B and C are constants.

[Table 6.1: Plate height, H (mm), versus flow rate, v (ml/min), for flow rates between about 3 ml/min and 120 ml/min.]

Consistent with our convention of naming variables and parameter estimates, we rewrite equation 6.1 as

y = a + b/x + cx   (6.2)

a, b and c are estimates of the constants A, B and C respectively in equation 6.1. Equation 6.2 may be fitted to the data in table 6.1 using linear least squares. A convenient way to accomplish this is to use the Regression tool in the Analysis ToolPak in Excel⁵⁴. To perform (linear) least squares with this tool, we place values of 1/x and x in adjacent columns (these appear in columns B and C of figure 6.1).

52 See Moody H W (1982).
53 See Snyder et al. (1997).
54 See Kirkup (2002), p373.
[Figure 6.1: Fitting equation 6.2 to data using the Regression tool in Excel's Analysis ToolPak; the output includes the best estimates a, b and c and their standard errors σa, σb and σc.]

Equation 6.2 is now fitted to the data in table 6.1 using Solver to perform nonlinear least squares. The approach adopted for determining the best estimates and the standard errors in the best estimates is as described in sections 4.1 and 4.2. As anticipated, both linear least squares and nonlinear least squares return the same best estimates for the parameters and the same standard errors in the best estimates, as can be seen by inspection of figures 6.1 and 6.2.

[Figure 6.2: Spreadsheet for fitting equation 6.2 to the data in table 6.1 using nonlinear least squares, showing the D matrix, the sum of squares of residuals, the Dᵀ matrix, E = DᵀD, the inverse of E, the standard deviation of the y values, the best estimates of the parameters and the standard errors in the estimates.]
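The equivalence can also be checked in code: fit y = a + b/x + cx by ordinary (linear) least squares using the normal equations E·t = Dᵀy, with E = DᵀD exactly as in figure 6.2. The sketch below uses invented van Deemter-style data generated from known A, B and C (not the values of table 6.1), so the fit should recover those constants exactly.

```python
def solve(M, v):
    """Gauss-Jordan solution of the 3x3 linear system M t = v."""
    A = [row[:] + [v[i]] for i, row in enumerate(M)]   # augmented matrix
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [e / p for e in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [e - f * g for e, g in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]

# Invented data from H = 1.5 + 20/v + 0.02*v (true A, B, C)
xs = [5.0, 10.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0]
ys = [1.5 + 20.0 / v + 0.02 * v for v in xs]

# Design matrix rows are [1, 1/x, x]; form E = D^T D and D^T y directly.
rows = [[1.0, 1.0 / v, v] for v in xs]
E = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Dty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]

a, b, c = solve(E, Dty)
print(round(a, 6), round(b, 6), round(c, 6))
```

Minimising SSR over a, b and c with Solver on the same data lands on the same three numbers; this is the sense in which linear and nonlinear least squares 'meet' for a model that is linear in its parameters.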
Exercise 4
The Knox equation is widely used to represent the relationship between the plate height, H, and the velocity, v, of the mobile phase of a liquid chromatograph (LC)⁵⁵. The relationship may be written

H = Av^1/3 + B/v + Cv   (6.3)

where A, B and C are constants. Table 6.2 shows LC data of plate height versus flow velocity published by Katz et al (1983)⁵⁶. The data were obtained with a benzyl acetate solute and a mobile phase of 4.48% (w/v) ethyl acetate in n-pentane.

[Table 6.2: Data from Katz et al. (1983): plate height, H (cm), versus flow velocity, v (cm/s), for velocities between about 0.03 cm/s and 0.60 cm/s.]

Use either linear or nonlinear least squares to fit equation 6.3 to the data in table 6.2 and thereby obtain best estimates of A, B and C and the standard errors in the estimates.

[A = 0.002509 cm^2/3·s^1/3, σA = 0.000185 cm^2/3·s^1/3; B = 1.12 × 10⁻⁶ cm²/s, σB = 1.232 × 10⁻⁷ cm²/s; C = 0.0008720 s, σC = 0.000326 s]

55 See Kennedy and Knox (1972).
56 See Katz et al. (1983).
There are some occasions where the standard deviation of the errors in the y values is not constant (i.e. the errors exhibit heteroscedasticity). Such a situation may be revealed by plotting the residuals, (y − ŷ), versus x⁵⁷. If the errors are heteroscedastic, then weighted fitting is required. The purpose of weighted fitting is to obtain best estimates of the parameters by forcing the line close to the data that are known to high precision, while giving much less weight to those data that exhibit large scatter.

Section 7: Weighted nonlinear least squares

The starting point for weighted fitting using least squares is to define a sum of squares of residuals that takes into account the standard deviation in the y values. We write

χ² = Σ[(ŷᵢ − yᵢ)/σᵢ]²   (7.1)

We refer to χ² as the weighted sum of squares of residuals⁵⁸. σᵢ is the standard deviation in the ith y value. The purpose of the weighted fitting is to find best estimates of the parameters that minimise χ² in equation 7.1. If σᵢ is constant, minimising equation 7.1 is equivalent to minimising the (unweighted) sum of squares of residuals, as in unweighted fitting using least squares. In this sense, equation 7.1 can be thought of as the more general formulation of least squares.

It is possible that the absolute values of σᵢ are unknown and that only relative standard deviations are known. For example,

σᵢ ∝ yᵢ^1/2   (7.2)
σᵢ ∝ yᵢ   (7.3)

Equations 7.2 and 7.3 are sometimes used when weighted fitting is required. Weighted fitting can be carried out so long as:

• the absolute standard deviations in the y values are known, or
• the relative standard deviations are known.

7.1 Weighted fitting using Solver

In order to establish best estimates of parameters using Solver when weighted fitting is performed, we use an approach similar to that described in section 4. For weighted fitting, an extra column in the spreadsheet containing the standard deviations σᵢ is required. To carry out weighted nonlinear least squares, we proceed as follows:

1) Fit the desired equation to the data by calculating χ² as given by equation 7.1, and use Solver to modify the parameter estimates so that χ² is minimised.
2) Determine the elements in the D matrix.

57 See section 6.10 of Kirkup (2002).
58 Σ[(ŷᵢ − yᵢ)/σᵢ]² follows a chi-squared distribution, hence the use of the symbol χ².
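The weighted sum of squares of equation 7.1 is a one-liner. A sketch with hypothetical observed and fitted values, using relative weights σᵢ ∝ yᵢ (equation 7.3):

```python
def chi_squared(y_obs, y_fit, sigma):
    """Weighted sum of squares of residuals, equation 7.1."""
    return sum(((f - o) / s) ** 2 for o, f, s in zip(y_obs, y_fit, sigma))

# Hypothetical observed and fitted values
y_obs = [4.0, 8.0, 10.0, 6.0, 2.0]
y_fit = [4.2, 7.7, 10.1, 6.2, 1.8]

sigma = list(y_obs)          # sigma_i proportional to y_i (equation 7.3)
print(round(chi_squared(y_obs, y_fit, sigma), 6))
```

With only relative weights, the numerical value of χ² has no absolute meaning, but the parameter estimates that minimise it are unaffected by the unknown constant of proportionality.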
59 60 See Neter et al. σw = χ2 n− p 1 2 (7. χ2 is given by equation 7. 47 .67 10. To illustrate steps 1 to 6. V across the diode may be written60.44 12. 4) Calculate the weighted standard deviation σw. (1996).voltage data for a germanium tunnel diode. 6) Calculate the confidence interval for each parameter appearing in the equation at a specified level of confidence (usually 95%). See Karlovsky (1962). W.1. V(mV) 10 20 30 40 50 60 70 80 90 100 110 120 I (mA) 4. n is the number of data points and p is the number 5) Calculate the standard errors of the parameter estimates.4) of parameters in the equation to be fitted to the data.57 10.03 5.5) B is the matrix containing elements equal to the best estimates of the parameters. I. σw is the weighted standard deviation. through a tunnel diode and the voltage.4.73 7. where σw is given by.90 10. I = AV (B − V ) 2 (7.11 10.80 2.3) Construct the weight matrix.36 Table 7. 7.6) A and B are constants to be estimated using least squares. given by59 σ (B ) = σ w (D T WD) [ −1 ] 1 2 (7.94 6.1: Current. we consider an example of weighted fitting using Solver. Table 7.87 9.1 shows current – voltage data for a germanium tunnel diode. given by equation 7.61 3.2 Example of weighted fitting using Solver The relationship between the current. in which the diagonal elements of the matrix contain the weights to be applied to the y values.
Equation 7.6 could be fitted to the data using unweighted nonlinear least squares (in the first instance it is usually sensible to use an unweighted fit, as the residuals may show little evidence of heteroscedasticity, in which case there is little point in performing a more complex analysis). In this example we are going to assume that the error in the y quantity is proportional to the size of the y quantity, i.e. that equation 7.3 is valid for these data. The data in table 7.1 are entered into a spreadsheet as shown in sheet 7.1 and are plotted in figure 7.1.

[Sheet 7.1: Data from table 7.1 entered into a spreadsheet, with x (mV) in column A and y (mA) in column B.]

[Figure 7.1: Current-voltage data for a germanium tunnel diode.]
7.2.1 Best estimates of parameters using Solver

Consistent with the symbols used in other analyses in this document, we rewrite equation 7.6 as

y = ax(b − x)²   (7.7)

We can obtain a reasonable starting value for b, b0, by noting that equation 7.7 predicts that y = 0 when x = b. By inspection of figure 7.1 we see that when y = 0, x ≈ 130 mV, so that b0 = 130 mV. An approximate value for a (which we take to be the starting value, a0) can be obtained by choosing any data pair from sheet 7.1 (say, x = 50 mV and y = 10.44 mA) and substituting these, along with b0 = 130 mV, into equation 7.8. Equation 7.7 is rearranged to give

a = y/[x(b − x)²]   (7.8)

This gives (to two significant figures) a0 = 3.3 × 10⁻⁵, which we will use as a starting value.

Sheet 7.2 shows the cells containing the calculated values of the current, ŷ, in column C, based on equation 7.7. The parameter estimates are the starting values (3.3 × 10⁻⁵ and 130) in cells D17 and D18. Column D of sheet 7.2 contains the weighted sum of squares of residuals, [(y − ŷ)/y]². The sum of these residuals appears in cell D14.

[Sheet 7.2: Fitted values and weighted sum of squares of residuals before optimisation occurs.]
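The two starting-value rules reduce to two lines of arithmetic, reproducing the numbers above:

```python
# b0: equation 7.7 gives y = 0 at x = b; the plot crosses zero near 130 mV.
b0 = 130.0                      # mV, read from figure 7.1

# a0: rearranged equation 7.8 with the data pair x = 50 mV, y = 10.44 mA.
x, y = 50.0, 10.44
a0 = y / (x * (b0 - x) ** 2)    # equation 7.8
print(a0)                        # about 3.3e-05
```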
Running Solver (using the default settings; see section 9.2.1) gives the output shown in sheet 7.3.

[Sheet 7.3: Fitted values and weighted sum of squares of residuals after optimisation using Excel's Solver: a = 2.289 × 10⁻⁵, b = 149.4440503, χ² = 0.089599066.]

The weighted standard deviation is calculated using equation 7.4, i.e.

σw = [χ²/(n − p)]^1/2 = [0.089599066/(12 − 2)]^1/2 = 0.09465678   (7.9)

7.2.2 Determining the D matrix

In order to determine the matrix of partial derivatives, we calculate

∂yᵢ/∂a ≈ {y[a(1 + δ), b, xᵢ] − y[a, b, xᵢ]}/[a(1 + δ) − a]   (7.10)

and

∂yᵢ/∂b ≈ {y[a, b(1 + δ), xᵢ] − y[a, b, xᵢ]}/[b(1 + δ) − b]   (7.11)

δ is chosen to be 10⁻⁶ (see section 4.2). Sheet 7.4 shows the values of the partial derivatives in the D matrix.

[Sheet 7.4: Calculation of the partial derivatives used in the D matrix.]

7.2.3 The weight matrix, W

The weight matrix is a square matrix with diagonal elements proportional to 1/σᵢ² and all other elements equal to zero⁶¹. In this example, σᵢ is taken to be equal to yᵢ, so that the diagonal matrix is as given in sheet 7.5.

[Sheet 7.5: Weight matrix for the tunnel diode analysis (while the weights are shown to only three decimal places, Excel retains all figures for the calculations).]

61 For details on the weight matrix, see Neter et al. (1996).
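Equations 7.10 and 7.11 are forward finite differences, which can be written once for any model. A sketch using the tunnel-diode model y = ax(b − x)², with the optimised a and b quoted above, checked against the analytic derivatives:

```python
def model(a, b, x):
    return a * x * (b - x) ** 2          # equation 7.7

def d_rows(a, b, xs, delta=1e-6):
    """One row [dy/da, dy/db] of the D matrix per x (equations 7.10, 7.11)."""
    rows = []
    for x in xs:
        y0 = model(a, b, x)
        dyda = (model(a * (1 + delta), b, x) - y0) / (a * delta)
        dydb = (model(a, b * (1 + delta), x) - y0) / (b * delta)
        rows.append([dyda, dydb])
    return rows

a, b = 2.289e-5, 149.444
xs = [10.0, 50.0, 120.0]
D = d_rows(a, b, xs)

# Analytic check: dy/da = x(b-x)^2 and dy/db = 2ax(b-x)
for x, (dyda, dydb) in zip(xs, D):
    assert abs(dyda - x * (b - x) ** 2) < 1e-3 * abs(dyda)
    assert abs(dydb - 2 * a * x * (b - x)) < 1e-3 * abs(dydb)
print(D[0])
```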
7.2.4 Calculation of (DᵀWD)⁻¹

Sheet 7.6 shows the several steps required to determine (DᵀWD)⁻¹. The steps consist of:

a) Calculation of the matrix WD. (W is multiplied with D using the MMULT() function in Excel.) The elements of this matrix are shown in cells C37 to D48.
b) Calculation of the matrix DᵀWD. The elements of this matrix are shown in cells G37 to H38.
c) Inversion of the matrix DᵀWD. The elements of the inverted matrix are shown in cells G41 to H42.

[Sheet 7.6: Calculation of (DᵀWD)⁻¹; the diagonal elements of (DᵀWD)⁻¹ are 1.9662 × 10⁻¹⁰ and 332.00.]

7.2.5 Bringing it all together

To calculate the standard errors in a and b, the weighted standard deviation (given by equation 7.4) is multiplied by the square root of the diagonal elements of the (DᵀWD)⁻¹ matrix, i.e.

σa = σw[(DᵀWD)⁻¹₁₁]^1/2 = 0.09465678 × (1.9662 × 10⁻¹⁰)^1/2 = 1.327 × 10⁻⁶
σb = σw[(DᵀWD)⁻¹₂₂]^1/2 = 0.09465678 × (332.00)^1/2 = 1.725

It follows that the 95% confidence intervals for A and B are

A = a ± t95%,ν σa   (7.12)
B = b ± t95%,ν σb   (7.13)
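For a two-parameter fit, DᵀWD is 2 × 2 and its inverse has a closed form, so steps a) to c) and the standard errors can be sketched compactly. The D matrix, weights and σw below are hypothetical placeholders, not the values of sheet 7.6:

```python
import math

# Hypothetical two-column D matrix, weights w_i = 1/sigma_i^2, and sigma_w
D = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
w = [4.0, 1.0, 0.25, 0.11]
sigma_w = 0.09

# (D^T W D)[i][j] = sum_k w_k * D[k][i] * D[k][j]
DtWD = [[sum(wk * row[i] * row[j] for wk, row in zip(w, D))
         for j in range(2)] for i in range(2)]

# Closed-form inverse of a 2x2 matrix; only the diagonal is needed here
(p, q), (r, s) = DtWD
det = p * s - q * r
inv_diag = (s / det, p / det)

# Standard errors: sigma_w times sqrt of the diagonal of (D^T W D)^-1
sigma_a = sigma_w * math.sqrt(inv_diag[0])
sigma_b = sigma_w * math.sqrt(inv_diag[1])
print(round(sigma_a, 4), round(sigma_b, 4))
```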
In this example, the number of degrees of freedom, ν = n − p = 12 − 2 = 10. From statistical tables⁶², t₉₅%,₁₀ = 2.228. It follows that (inserting units),

A = (2.29 ± 0.30) × 10⁻⁵ mA/(mV)³

B = (149.4 ± 3.8) mV

Exercise 5

Equation 7.6 may be transformed into a form suitable for fitting by linear least squares.

a) Show that equation 7.6 can be rearranged into the form,

(I/V)^(1/2) = A^(1/2)B − A^(1/2)V   (7.14)

b) Plot a graph of (I/V)^(1/2) versus V.

c) Use unweighted least squares to obtain best estimates of A and B and standard errors in the best estimates⁶³. Care must be exercised when calculating the uncertainty in the estimate of B, as this requires use of both the slope and the intercept, and these are correlated. For more information see Kirkup (2002).

d) Why is it preferable to use nonlinear least squares to estimate parameters rather than to linearise equation 7.6 followed by using linear least squares to find these estimates?

62 Kirkup (2002), page 232.
63 See, for example, Kirkup (2002), page 385.
For example. if the relationship between x and y is y = a + bx (8. x .2 and replacing y by y and x by ˆ x . 64 65 Equation 8. given by. in a photoconductor. least squares estimates and calibration (8. is given by.Establishing best estimates of parameters in an equation may be the main purpose of fitting an equation to experimental data. σ xˆ = ˆ ∂x σa ∂a 2 ˆ ∂x + σb ∂b 2 ˆ ∂x σy + ∂y 2 (8. with time. 1988).3) ˆ One approach to calculating the standard error in x is to assume that the errors in a. the primary purpose of the fitting may be to obtain best estimates for the parameters A1. the equation is used to find ‘values of x’ from measured values of y. We begin by determining the covariance matrix. y . For example. See Salter (2002). This is done by rearranging equation 8. V = σ2A1 (8. as discussed in section 2. can be determined. B1 and B2 which appear in equation 8. t. σ xˆ . In this situation the standard error.1) There are situations in which parameter estimates are used to calculate other quantities of interest. σ2 is found using. the corresponding value of x.2) ˆ then for a given (mean) value of y. V. To correctly determine σ xˆ .5) Where σ2 is the variance in the y values and A1 is the error matrix. R = A1 exp(B1t ) + A2 exp(B2 t ) Section 8: Uncertainty propagation. A2.1 represents a possible relationship between R and t (Kirkup L and Cherry I. so it is not valid to use equation 8. errors in the parameters estimates a and b are correlated65. A common example involves gathering xy data for the purpose of calibration. Once the best estimates of the parameters in the calibration equation have been determined. 54 . so that ˆ x= y−a b (8. in an experiment to study the variation of resistance.4) Unfortunately.4. we must account for that correlation. b and y are uncorrelated. R.164 which represents a possible relationship between R and t.
σ² = Σ(yᵢ − ŷᵢ)²/(n − 2)   (8.6)

where ŷᵢ = a + bxᵢ, and n is the number of x-y data. For the straight-line fit, A is written as,

A = | n    Σxᵢ  |
    | Σxᵢ  Σxᵢ² |   (8.7)

There is an economical way to determine the elements in the matrix A, which is especially efficient when using a computer package that allows for matrix multiplication (such as Excel). A may be written,

A = XᵀX   (8.8)

where Xᵀ is the transpose of the matrix X, and X is given by,

X = | 1  x₁ |
    | 1  x₂ |
    | 1  x₃ |
    | ⋮  ⋮  |
    | 1  xₙ |   (8.9)

If f is some function of a and b, then⁶⁶,

σ_f² = d_fᵀ V d_f   (8.10)

where,

d_f = ( ∂f/∂a ; ∂f/∂b )   (8.11)

As V = σ²A⁻¹, equation 8.10 can be rewritten as,

σ_f² = σ² d_fᵀ A⁻¹ d_f   (8.12)

8.1: Example of propagation of uncertainties involving parameter estimates

Equation 8.12 is applied to data gathered in an experiment which considers the variation in pressure of a fixed mass and volume of gas as the temperature of the gas changes. The data are given in table 8.1. We will use the data to estimate the value of

66 See Salter (2000).
the temperature at which the pressure of the gas is zero (this is termed the 'absolute zero' of temperature).

Table 8.1: Pressure versus temperature data.

θ (°C):  −20  −10   0   10   20   30   40   50   60   70   80
P (kPa): 211  218  224  238  247  251  259  265  277  288  294

Assume that the relation between pressure, P, and temperature, θ, can be written,

P = A + Bθ

where A and B are parameters to be estimated using least squares. We will determine:

a) best estimates for A and B (written as a and b respectively)
b) standard errors, σ_a and σ_b, in a and b
c) the intercept, θ̂_INT, of the best line through the data on the temperature axis
d) the standard error in θ̂_INT, assuming errors in a and b are uncorrelated
e) the standard error in θ̂_INT, assuming errors in a and b are correlated.

Solution

a) a and b may be determined in several ways, including using the LINEST() function in Excel⁶⁷. Applying the LINEST() function to the data in table 8.1 we obtain:

a = 226909 Pa
b = 836.36 Pa/°C

b) Using LINEST() in Excel to calculate σ_a and σ_b gives,

σ_a = 993.7 Pa
σ_b = 22.80 Pa/°C

67 See page 228 of Kirkup (2002).
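The LINEST() results of parts a) and b) can be checked outside Excel. This is a sketch in NumPy (not the document's method) using the design-matrix route of equations 8.8 and 8.9:

```python
import numpy as np

# Pressure-temperature data of table 8.1 (P converted from kPa to Pa)
theta = np.array([-20, -10, 0, 10, 20, 30, 40, 50, 60, 70, 80], float)
P = 1000.0 * np.array([211, 218, 224, 238, 247, 251, 259, 265, 277, 288, 294])

n = len(theta)
X = np.column_stack([np.ones(n), theta])   # design matrix of equation 8.9
A = X.T @ X                                # equation 8.8
a, b = np.linalg.solve(A, X.T @ P)         # least squares estimates

resid = P - (a + b * theta)
s2 = resid @ resid / (n - 2)               # sigma^2, equation 8.6
sa, sb = np.sqrt(s2 * np.diag(np.linalg.inv(A)))
# a is about 226909 Pa, b about 836.36 Pa/degC,
# sa about 993.7 Pa, sb about 22.80 Pa/degC
```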
c) The intercept, θ_INT, on the temperature axis occurs when P = 0, so that,

θ_INT = −A/B   (8.13)

Rearranging equation 8.13, the best estimate of θ_INT, written as θ̂_INT, is therefore,

θ̂_INT = −a/b   (8.14)

θ̂_INT = −226909/836.36 = −271.3 °C   (8.15)

d) Assuming that errors in a and b are uncorrelated, the usual propagation of uncertainties equation gives the standard error in θ̂_INT, σ_θ̂INT, as⁶⁸,

σ_θ̂INT = [(∂θ̂_INT/∂a · σ_a)² + (∂θ̂_INT/∂b · σ_b)²]^(1/2)   (8.16)

Now,

∂θ̂_INT/∂a = −1/b   (8.17)

∂θ̂_INT/∂b = a/b²   (8.18)

It follows that (using equation 8.16),

σ_θ̂INT = [((−1/836.36) × 993.7)² + ((226909/836.36²) × 22.80)²]^(1/2) = 7.49 °C

so that,

θ_INT = (−271.3 ± 7.5) °C

e) In order to determine σ_θ̂INT when the correlation between a and b is accounted for, we write (following equation 8.12),

68 See page 390 of Kirkup (2002).
A = | n    Σθᵢ  | = | 11    330  |
    | Σθᵢ  Σθᵢ² |   | 330  20900 |   (8.19)

Inverting the matrix A is accomplished using the MINVERSE() function in Excel⁶⁹. This gives,

A⁻¹ = |  0.172727    −0.00272727     |
      | −0.00272727   9.09091 × 10⁻⁵ |

To determine σ_θ̂INT we use equation 8.11 and equation 8.12. It is convenient to rewrite equation 8.12 as,

σ²_θ̂INT = σ² d_θ̂INTᵀ A⁻¹ d_θ̂INT   (8.20)

where,

σ² = Σ(Pᵢ − P̂ᵢ)²/(n − 2)   (8.21)

and,

P̂ᵢ = a + bθᵢ   (8.22)

Values for a and b appear in part a) of this question. Using those estimates and equation 8.21, we find,

σ² = 5717171

d_θ̂INT is given by,

d_θ̂INT = ( ∂θ̂_INT/∂a ; ∂θ̂_INT/∂b ) = ( −1/b ; a/b² )   (8.23)

Substituting a and b obtained in part a) of this question gives,

d_θ̂INT = ( −0.0011957 ; 0.32439 )   (8.24)

Returning to equation 8.20, we have,

σ²_θ̂INT = 5717171 × d_θ̂INTᵀ A⁻¹ d_θ̂INT

69 See page 285 in Kirkup (2002).
= 68.20 (°C)², so that,

σ_θ̂INT = 8.26 °C

Finally,

θ_INT = (−271.3 ± 8.3) °C

This may be compared with θ_INT obtained in part d) of this question, where a and b were assumed to be uncorrelated, i.e.

θ_INT = (−271.3 ± 7.5) °C

In this instance, failure to account for the correlation between a and b results in an underestimation of the standard error in θ̂_INT.

8.2 Uncertainties in derived quantities incorporating least squares estimates

Parameter estimates obtained using least squares, as well as other quantities that have uncertainty, may be brought together to determine a 'derived' quantity. The derived quantity has an uncertainty which may be calculated. As an example, consider the calibration line in figure 8.1, which is to be used to determine x̂ₒ when y = ȳₒ (in an analytical chemistry application, ȳₒ might represent the mean detector response of an instrument and x̂ₒ is the predicted concentration of the analyte corresponding to that response).

Figure 8.1: Calibration line fitted to x-y data.
[The figure shows a straight line through x-y data, with ȳₒ on the y axis mapping to x̂ₒ on the x axis.]

Assuming the relationship between x and y in figure 8.1 is linear, then,

ȳₒ = a + bx̂ₒ   (8.25)
or,

x̂ₒ = (ȳₒ − a)/b   (8.26)

a and b are determined using least squares. As ȳₒ is not correlated with a or b, we write,

σ²_x̂ₒ = (∂x̂ₒ/∂ȳₒ · σ_ȳₒ)² + σ² d_x̂ₒᵀ A⁻¹ d_x̂ₒ   (8.27)

From equation 8.26,

∂x̂ₒ/∂ȳₒ = 1/b

Also,

σ²_ȳₒ = σ²/m   (8.28)

where σ² is given by equation 8.6, and m is the number of repeat measurements made of the detector response for a particular (unknown) analyte concentration.

8.3: Example of propagation of uncertainties in derived quantities

In section 8.1 we considered data from an experiment in which the variation in pressure of a fixed mass and volume of gas was measured as the temperature of the gas changed. We will use that data and the additional information that four repeat measurements of pressure were made at an unknown temperature, such that,

Mean pressure P̄ₒ = 2.54 × 10⁵ Pa

Using equation 8.28, we find⁷⁰,

σ²_P̄ₒ = σ²/m = 5717171/4 = 1.429 × 10⁶ (Pa)²

Adapting equation 8.26 to the variables in this question, we have,

θ̂ₒ = (P̄ₒ − a)/b = (2.54 × 10⁵ − 226909)/836.36 = 32.39 °C

Rewriting equation 8.27 in terms of the variables in this question, σ²_θ̂ₒ

70 The assumption made here is that the scatter in the y values remains constant, such that the estimate we make of the standard deviation in the y values during calibration is the same as that of the y values obtained for the unknown x value.
may be written,

σ²_θ̂ₒ = (∂θ̂ₒ/∂P̄ₒ · σ_P̄ₒ)² + σ² d_θ̂ₒᵀ A⁻¹ d_θ̂ₒ   (8.29)

where now,

d_θ̂ₒ = ( ∂θ̂ₒ/∂a ; ∂θ̂ₒ/∂b ) = ( −1/b ; −(P̄ₒ − a)/b² )

Using the values of a, b, A⁻¹ and σ² obtained in section 8.1, the first term is (1/836.36²) × 1.429 × 10⁶ = 2.04 (°C)² and the second term is 0.75 (°C)², so that,

σ²_θ̂ₒ = 2.04 + 0.75 = 2.79 (°C)²

i.e. σ_θ̂ₒ = 1.7 °C. Finally,

θₒ = (32.4 ± 1.7) °C

Note how much smaller this standard error is than that for θ̂_INT: θ̂ₒ lies near the centre of the calibration data, whereas θ̂_INT involves a long extrapolation.

8.4: Uncertainty propagation and nonlinear least squares

In general, parameter estimates obtained using nonlinear least squares are correlated. Therefore, for derived quantities which incorporate parameter estimates, the covariance matrix must be used to establish the standard errors in those quantities. The first stage, as with any nonlinear fitting, is to minimise the sum of squares of residuals, SSR, as described in sections 3 and 4. Suppose f is a function of the parameter estimates obtained through nonlinear least squares. The variance in f, σ_f², is given by,

σ_f² = σ² d_fᵀ E⁻¹ d_f   (8.30)

where E⁻¹ is the inverse of the matrix E, where⁷¹,

E = DᵀD   (8.31)

D is given by,

D = | ∂y₁/∂a  ∂y₁/∂b  ∂y₁/∂c |
    | ∂y₂/∂a  ∂y₂/∂b  ∂y₂/∂c |
    |   ⋮        ⋮        ⋮   |
    | ∂yₙ/∂a  ∂yₙ/∂b  ∂yₙ/∂c |   (8.32)

71 See section 4.2.
and⁷²,

d_f = ( ∂f/∂a ; ∂f/∂b ; ∂f/∂c )   (8.33)

Equations 8.32 and 8.33 are appropriate where there are three best estimates, a, b and c, of the parameters in the equation fitted to data. Both equations may be extended if the number of parameters to be estimated exceeds three.

8.4.1: Example of uncertainty propagation in parameter estimates obtained by nonlinear least squares

In many situations, calibration data exhibit a slight curvature and it is a matter of debate whether it is appropriate to fit an equation of the form y = a + bx to the data. As an example, consider the data shown in table 8.2 and also in figure 8.2. Close inspection of the data in figure 8.2 indicates that the relationship between Area and Concentration is not linear, but shows a slight but definite curvature. There are many candidates for the function that might be fitted to the data, but we must be wary of using a function with too many adjustable parameters (see section 10).

Table 8.2: Area versus concentration data for biochanin.
[Table 8.2 lists replicate area readings (arbitrary units, ranging from about 0.12 to about 34.7) at each of the concentrations 0.158, 0.315, 0.631, 1.261, 2.522, 5.045 and 10.09 mg/l.]

Figure 8.2: Calibration curve of area versus concentration for biochanin.
[The figure plots Area (arbitrary units, 0 to 40) against Concentration (0 to 12 mg/l).]

We will fit the function,

y = A + Bx^C   (8.34)
to the data in table 8.2. Applying nonlinear least squares, the best estimates for A, B and C, represented by a, b and c respectively, are,

a = −0.5651
b = 3.581
c = 0.9790

When repeat measurements are made of the area under a chromatogram curve, the mean area, ȳₒ, can be determined. Using this mean we may estimate the concentration of the biochanin, x̂ₒ. We begin by rearranging equation 8.34, so that,

x = ((y − A)/B)^(1/C)   (8.35)

Substituting a, b, c and ȳₒ into equation 8.35 gives the estimate of x, x̂ₒ, as,

x̂ₒ = ((ȳₒ − a)/b)^(1/c)   (8.36)

As ȳₒ is not correlated with a, b or c, we can write,

σ²_x̂ₒ = (∂x̂ₒ/∂ȳₒ · σ_ȳₒ)² + σ² d_x̂ₒᵀ E⁻¹ d_x̂ₒ   (8.37)

where,

d_x̂ₒ = ( ∂x̂ₒ/∂a ; ∂x̂ₒ/∂b ; ∂x̂ₒ/∂c )   (8.38)

and,

σ² = Σ(yᵢ − ŷᵢ)²/(n − 3)   (8.39)

Partially differentiating x̂ₒ in equation 8.36 with respect to a, b, c and ȳₒ respectively gives,

∂x̂ₒ/∂a = −(1/(bc)) · ((ȳₒ − a)/b)^((1−c)/c)   (8.40)
ˆ ˆ ˆ ˆ ∂x ∂xo ∂xo ∂x = 0. xo . o = 1.41) 1 c ln yo − a b (8.289083879 ∂y o ∂a ∂b ∂c 64 . c and y o into equations 8. (8.ˆ ∂xo 1 yo − a =− ∂b b bc ˆ ∂xo 1 y −a =− 2 o ∂c b c ˆ ∂x o 1 yo − a = ∂y o bc b 1 c (8.9790 = 1.289083879 .9790 Substituting for a. y o = 6.15513 + 0. c and y o in equation 8.581 c = 0.15513 Fitting using nonlinear least squares gives.36 gives the estimate of the unknown ˆ concentration. b.5651 = 3.581 1 0.40 to 8. we obtain.43) After calibration.248914487 . o = 0.902 mg/l Substituting for a.5651 b = 3.44) a = 0. the area under the chromatogram curve is measured four times for a sample of unknown concentration. = −0. It is found that.42) 1− c c (8. b.43. as y −a ˆ xo = o b 1 c 6.54244803 .
Sheet 8.1: Annotated sheet showing the calculation of x̂ₒ and σ_x̂ₒ.

Sheet 8.1 shows the layout of a spreadsheet used to calculate x̂ₒ and σ_x̂ₒ. [The sheet contains the best estimates a, b and c with their standard errors σ_a, σ_b and σ_c; the elements of d_x̂ₒ; the covariance matrix V = σ²E⁻¹; σ² = Σ(yᵢ − ŷᵢ)²/(n − 3); σ²_ȳₒ = σ²/m with m = 4; ∂x̂ₒ/∂ȳₒ; and finally x̂ₒ = ((ȳₒ − a)/b)^(1/c) and σ²_x̂ₒ = (∂x̂ₒ/∂ȳₒ · σ_ȳₒ)² + σ² d_x̂ₒᵀ E⁻¹ d_x̂ₒ, as in equation 8.37.]
From sheet 8.1, x̂ₒ and σ_x̂ₒ are found to be,

x̂ₒ = 1.90193
σ_x̂ₒ = 0.029

which allows us to write: x = (1.902 ± 0.029) mg/l.

Exercise 6

The following data were obtained during the calibration of an HPLC system using Ibuprofen. The area under the chromatograph peak is shown as a function of known concentration (expressed in mass/tablet) of Ibuprofen.

Table 8.3: Area under chromatograph peak as a function of concentration of Ibuprofen.

Mass per tablet/(mg/tablet)   Area (arbitrary units)
103.3                         265053
103.9                         261357
139.1                         345915
139.3                         345669
180.1                         445684
180.3                         445753
200.1                         494700
200.3                         493846
219.3                         540221
219.9                         539610
278.3                         683881
278.9                         683991
305.1                         755890
305.7                         754901

Using the data in table 8.3:

a) Fit an equation of the form y = a + bx^c to the data, where y corresponds to the area under the chromatograph peak and x corresponds to the Ibuprofen concentration. Determine a, b and c and their respective standard errors.

b) A sample of Ibuprofen of unknown concentration is injected into the column of the calibrated HPLC. The mean area of three replicate measurements is found to be 405623. Use this information to estimate the concentration of Ibuprofen and the standard error in the estimate of the concentration.
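Before moving on, the propagation calculations of sections 8.1 and 8.3 for the linear (pressure-temperature) calibration can be gathered into one short script. This is a sketch, with my own variable names, and is not the document's spreadsheet method:

```python
import numpy as np

# Linear calibration of sections 8.1-8.3: uncertainty in the
# temperature-axis intercept and in a predicted temperature.
theta = np.array([-20, -10, 0, 10, 20, 30, 40, 50, 60, 70, 80], float)
P = 1000.0 * np.array([211, 218, 224, 238, 247, 251, 259, 265, 277, 288, 294])

n = len(theta)
X = np.column_stack([np.ones(n), theta])
A = X.T @ X
a, b = np.linalg.solve(A, X.T @ P)
s2 = np.sum((P - (a + b * theta))**2) / (n - 2)
Ainv = np.linalg.inv(A)
sa, sb = np.sqrt(s2 * np.diag(Ainv))

# Intercept on the temperature axis (section 8.1)
theta_int = -a / b
d_int = np.array([-1/b, a/b**2])               # equation 8.23
sd_uncorr = np.hypot(sa/b, a*sb/b**2)          # ignores covariance: about 7.49 degC
sd_corr = np.sqrt(s2 * d_int @ Ainv @ d_int)   # with covariance: about 8.26 degC

# Prediction from the mean of m repeat pressure readings (section 8.3)
Po, m = 2.54e5, 4
theta_o = (Po - a) / b
d_o = np.array([-1/b, -(Po - a)/b**2])
sd_theta_o = np.sqrt((s2/m)/b**2 + s2 * d_o @ Ainv @ d_o)
```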
Section 9: More on Solver

Solver was devised primarily for use by the business community and this is reflected in the features it offers. Solver comprises three optimisation algorithms:

1) For integer problems, Solver uses the Branch and Bound method⁷³.
2) Where equations are linear, the Simplex method is used for optimisation⁷⁴.
3) In the case of nonlinear problems, the Generalised Reduced Gradient (GRG) method is adopted⁷⁵.

It is the GRG method that is applied in our analyses, and therefore most of this section is devoted to describing features of Solver that relate to this.

The best estimates returned by Solver need to be compared with 'physical reality' before being accepted. Consider an example in which a parameter in an equation represents the speed of sound, v, in air. If, after fitting, the best estimate of v is −212 m/s, it is fair to question whether this value is 'reasonable'. If it is not, then one course of action is to try new starting values for the parameter estimates. The Solver dialog box, as shown in figure 4.2, offers the facility to constrain parameter estimates. We could use the Constraints box in Solver to constrain the estimate of v so that it cannot take on negative values. This cannot guarantee that a physically meaningful value will be found for v, only that the value will be non-negative.

Solver possesses several options that can be adjusted by the user to assist in the optimisation process, and we will describe those next. Though optimisation can be carried out successfully with the default settings, modifying the options may provide for a better fit or reduce the fitting time over that obtained using the default settings.

9.1 Solver Options

To view the Solver options shown in figure 9.1, it is necessary to click on the Options button in the Solver dialog box. This dialog box may be used to modify,

73 See Wolsey L A (1998).
74 See Nocedal J (1999).
75 See Smith and Lasdon (1992).
the methods by which the optimisation takes place. The application of constraints requires careful consideration, as it is possible that Solver will locate a local minimum, rather than a global minimum.
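The effect of Solver's Constraints box (a bound on a parameter) can be illustrated outside Excel. The sketch below is a deliberately simple stand-in for a bounded optimiser, not the GRG algorithm: it minimises SSR for a one-parameter model d = v·t over a non-negative range of v, so the estimate can never be negative. The distance/time data are invented:

```python
# Bounded one-parameter least squares for the invented model d = v*t,
# with the constraint v >= 0 (the role of Solver's Constraints box).
t = [0.1, 0.2, 0.3, 0.4]
d = [33.0, 70.0, 102.0, 138.0]

def ssr(v):
    # sum of squares of residuals for a trial value of v
    return sum((di - v * ti) ** 2 for ti, di in zip(t, d))

# Coarse scan over the bounded region v in [0, 1000], step 0.1
candidates = [i * 0.1 for i in range(0, 10001)]
v_best = min(candidates, key=ssr)
```

As in the text, the bound guarantees only that v_best is non-negative, not that it is physically meaningful.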
Figure 9.1: Solver Options dialog box illustrating default settings.

We now consider some of the options in the Solver Options dialog box.

Max Time: This restricts the total time Solver spends searching for an optimum solution. If the maximum time is set too low, such that Solver has not completed its search, then a message is returned 'The maximum time limit was reached, continue anyway?'. Clicking on the Continue button will cause Solver to carry on searching for a solution. Unless there are many data, the default value of 100 s is usually sufficient.

Iterations: This is the maximum number of iterations that Solver will execute before terminating its search. The default value is 100, but this can be increased to a limit of 32767. If the number of iterations is set too low, such that Solver has not completed its search, then a message will be returned 'The maximum iteration limit was reached, continue anyway?'. Clicking on the Continue button will cause Solver to carry on searching for a solution. Solver is likely to find an optimum solution before reaching such a limit, or to return a message that an optimum solution cannot be found.

Precision and Tolerance: These options are applicable to situations in which constraints have been specified. Specifying constraints is not advised and so we will not consider these options.

Convergence: As fitting proceeds, Solver compares the most recent solution (for our application this would be the value of SSR) with previous solutions. If the fractional reduction in the solution over five iterations is less than the value in the Convergence box, Solver reports that optimisation is complete. If this value is made very small (say, 10⁻⁶) Solver will continue iterating (and hence take longer to complete) than if that number is larger (say, 10⁻²).

Assume Linear Model: If this box is ticked then Solver uses the Simplex method to obtain best estimates of parameters. If the model to be fitted to data is linear, then fitting may be performed using the Regression Tool in the Analysis ToolPak. This is an attractive alternative, as the Regression Tool returns best estimates, standard errors in estimates, confidence intervals and the sum of squares of residuals. If the 'Assume Linear Model' box is ticked, Solver will attempt to establish if the model is indeed linear. If Solver determines that the model is nonlinear, the message is returned 'The conditions for Assume Linear Model are not satisfied'. To continue, it is necessary to return to the Solver Options dialog box and untick the Assume Linear Model option.

Show Iteration Results: Ticking this box causes Solver to pause after each iteration, allowing new estimates of the parameters and the value in the Target cell to be viewed. If parameter estimates are used to draw a line of best fit through the data, then the line will be updated after each iteration. Updating the fitted line on the graph after each iteration gives a valuable insight into the progress made by Solver in finding best estimates of the parameters in an equation.

Use Automatic Scaling: In certain problems there may be many orders of magnitude difference between the data, the parameter estimates and the value in the target cell. This can lead to rounding problems owing to the finite precision arithmetic performed by Excel. If the 'Use Automatic Scaling' box is ticked, then Solver will scale values before carrying out optimisation (and 'unscale' the solution values before entering them into the spreadsheet). It is advisable to tick this box for all problems.

Assume Non-Negative: This constrains all estimates in an equation so that they cannot take on negative values.

Estimates (Tangent or Quadratic): This determines the method used to find subsequent values of each parameter estimate at the outset of the search (i.e. either linear or quadratic extrapolation). Both methods produce the same final results for the examples described in this document.

Derivatives (Forward or Central): The partial derivatives of the function in the target cell with respect to the parameter estimates are found by the method of finite differences. It is possible to perturb the estimates 'forward' from a particular point (similar to that described in section 4.3) or to perturb the estimates forward and backward from the point in order to obtain better estimates of the partial derivatives. Both methods of determining the partial derivatives produce the same final results for the examples described in this document.

Search (Newton or Conjugate): Specifies the search algorithm. Reference to the Quasi-Newton and Conjugate search methods used by Excel can be found in Safizadeh and Signorile (1993) and Perry (1978) respectively. Both methods produce the same final results for the examples described here. The effect on optimisation of using a combination of options, such as Tangent (Estimates), Central (Derivatives) and Conjugate (Search), may also be worth considering.

Load Model and Save Model: It is tedious to record which fitting conditions have been used. Excel offers the facility to store the options by clicking on Save Model, followed by specifying the cells on the spreadsheet where the model conditions should be saved. These conditions can be recalled by clicking on Load Model and indicating the cells which contain the saved information.
9.2 Solver Results

Once Solver completes optimisation, it displays the Solver Results dialog box shown in figure 9.2.

Figure 9.2: Solver Results dialog box.

Clicking on OK will retain the solution found by Solver (i.e. the starting parameters are permanently replaced by the final parameter estimates). At this stage Excel is able to present three reports: Answer, Sensitivity and Limits. Of the three reports, the Answer report is the most useful, as it gives the starting values of the parameter estimates and the associated SSR. The report also displays the final parameter estimates and the final SSR, allowing for easy comparison with the original values. An Answer report is shown in figure 9.3.

Figure 9.3: Answer report created by Excel.
Section 10: Modelling and Model Identification

There are several types of model that interest physical scientists. Physical and chemical models are based on the application of physical and chemical principles. Such principles are expected to have wide applicability and underlie phenomena observed inside and outside the laboratory. Equations founded on physical and chemical principles contain parameters that have physical meaning rather than simply being anonymous constants in an equation. For example, a parameter in an equation could represent the radius of the Earth, the energy gap of a semiconductor or a rate constant in a chemical reaction. There are also essentially statistically based models that may, through consideration of experimental or observational data, assist in identifying the important variables and lend support to an empirical relationship between variables. A useful empirical equation is one that successfully describes the trend in the data but is not derived from a consideration of the fundamental principles underlying the relationship between the variables. While both types of modelling are useful, most scientists would prefer the insight and predictive opportunities offered by good physical models to those that have a purely statistical basis or support.

10.1 Physical Modelling

If a model based on physical and chemical principles is successful, in the sense that data gathered in experiments are consistent with the predictions of the model, then this lends support to the validity of the underlying principles. As an example, a physical principle described by Isaac Newton is that an attractive force exists between all bodies. That attractive force is termed the gravitational force. From this starting point, Newton went on to indicate how the gravitational force between two bodies depends on their respective masses and the separation between the bodies. Applying this principle, it is possible to predict the value of the acceleration of a body when it is allowed to fall freely above the Earth's surface. It is often the case that approximations are made so that the problem does not become too complicated⁷⁶. In this example we might consider the Earth to be⁷⁷:

a) a perfect sphere
b) not rotating
c) of uniform density

Once a prediction has been made as to how the acceleration of a body varies with distance above the Earth's surface, the next step is to determine by careful measurement how the acceleration actually depends on distance. If the approximations given by a), b) and c) above are valid, then the relationship between free fall acceleration, g(h), and height, h, can be written:

76 Experienced physical scientists are able to simplify complex situations while retaining the key principles necessary to understand a particular physical process or phenomenon.
77 If it is found that the data are inconsistent with the 'simplified' theory, the approximations may have to be revisited and the model revised.
g(h) = g₀/(1 + h/R)²   (10.1)

where g₀ is the acceleration caused by gravity at the Earth's surface (i.e. when h = 0), and R is the radius of the Earth⁷⁸. By gathering data of acceleration as a function of height, it should be possible to confirm or contest the validity of equation 10.1. Additionally, as the radius of the Earth is one of the parameters to be estimated, this can be compared with the known radius of the Earth as determined by other methods. It is also possible to infer from equation 10.1 that if the range of h values is too limited (much less than the radius, R), then the acceleration, g(h), will decrease almost linearly with height⁷⁹.

10.2 Data driven approach to discovering relationships

As an alternative to a 'physical principles' approach to developing a relationship between physical variables, we could try a 'data driven' approach, such that trends observed in the data suggest a relationship between dependent and independent variables that might be valid. For example, with respect to the study involving gravity described in section 10.1, we might carefully gather experimental data of the acceleration of free fall, g(h), for various heights, h, then plot g(h) versus h in order to discern the type of relationship between the two variables. Such a plot is shown in figure 10.1 for values of h in the range 0 to 20 km. Applying physical principles in order to establish an equation that successfully relates the variables is challenging; however, such an equation is often more satisfying and has wider applicability than an empirical equation. One weakness of the data driven approach is that, even if the correct functional relationship between acceleration and height were discovered, we would be unlikely to recognise that hidden within the parameter estimates is an important physical constant, such as the radius of the Earth.

78 See Walker (2002), Chapter 12.
79 This can be shown by doing a binomial expansion of equation 10.1 (see problem 11 at the end of the article).
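The near-linear decrease follows from the first-order binomial expansion g(h) ≈ g₀(1 − 2h/R) for h ≪ R. A quick numerical check (g₀ and R below are nominal values I have assumed, not fitted estimates):

```python
# Compare equation 10.1 with its first-order binomial expansion
# g(h) ~ g0 * (1 - 2h/R), valid for h << R.
g0 = 9.81        # m/s^2, assumed nominal surface value
R = 6.371e6      # m, assumed nominal Earth radius

def g_exact(h):
    return g0 / (1.0 + h / R) ** 2

def g_linear(h):
    return g0 * (1.0 - 2.0 * h / R)

h = 20000.0      # 20 km, the top of the range in figure 10.1
rel_diff = abs(g_exact(h) - g_linear(h)) / g_exact(h)
```

Even at 20 km the exact and linearised forms differ by only a few parts in 10⁵, which is why figure 10.1 looks almost linear.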
Figure 10.1: Variation of acceleration due to gravity with height above the Earth's surface.
[The figure plots g (m/s²), from about 9.740 to 9.820, against height (m) from 0 to 20000.]

Based on the data appearing in figure 10.1, there is a relationship between g(h) and h, but owing to the variability within the data, and perhaps the limited range over which the data were gathered, it is difficult to justify fitting an equation other than y = a + bx to these data.

10.3 Other forms of modelling

In the physical sciences we are often able to isolate and control important independent variables in an experiment. For example, past experience may suggest that the thickness of an aluminium film vacuum deposited onto a glass substrate is affected by the distance from the aluminium target to the substrate, the deposition time and the pressure of the gas in the vacuum chamber. Such isolation and control might be contrasted with situations often encountered in other areas of science (and in other disciplines, such as the health or medical sciences). There are many areas of science in which a certain amount of data 'mining' or 'prospecting' is required to establish which variables are most important and which can be safely discarded. Consider, as an example, the efficacy of a treatment in prolonging the life of a patient suffering with liver cancer. There may be many variables that affect patient longevity to be considered, including patient age, sex, race, past medical history, family medical history and socioeconomic status. In such cases, identifying which are the most important variables may be the finest achievement of the modelling/data analysis process, with little expectation that a functional relationship other than linear will emerge between independent and dependent variables. Here we will confine our considerations to the analysis of data which emerge from experiments in which independent variables can be carefully controlled and measured.

10.4 Competing models

Whether equations relating variables have been developed by first considering physical principles, past experience, or intelligent guesswork, there are circumstances in which two or more equations compete to offer the best explanation of the relationship between the variables. More terms can be added to an equation (including
terms that introduce extra independent variables) until the fit between equation and data is optimised, as measured by some suitable statistic such as those described in section 10.5. Careful experimental design can assist in helping to discriminate one equation from another. For example, if a model predicts a slightly nonlinear relationship between dependent and independent variables, it would be wise to make measurements over as wide a range of values of the independent variable as possible to expose or exaggerate that nonlinearity. Additionally, if the data show large scatter, there may be merit in investigating ways by which the noise can be reduced in order to improve the quality of the data.

In the situations in which we need to compare two or more equations, for example between,

y = a + bx   (10.3)

and

y = a + bx + cx²   (10.4)

we can appeal to methods of data analysis to provide us with quantifiable means of distinguishing between models.

10.5 Statistical Measures of Goodness of Fit

There are several measures that can be used to assist in discriminating statistically which equation gives the best fit to data, including the Schwartz criterion, Mallow's Cp and the Hannan and Quinn Information Criterion⁸⁰. Here we focus on two criteria, the Adjusted Coefficient of Multiple Determination and the Akaike Information Criterion (AIC), as they are quite easy to implement and interpret. It is these methods that we will concentrate upon for the remainder of this section.

10.5.1 Adjusted Coefficient of Multiple Determination

A measure of how well an equation is able to account for the relationship between the independent and dependent variable is given by the Coefficient of Multiple Determination, R², given by,

R² = 1 − Σ(yᵢ − ŷᵢ)²/Σ(yᵢ − ȳ)²   (10.2)

where yᵢ is the ith observed value of y, ŷᵢ is the predicted y value found using the equation representing the best line through the points, and ȳ is the mean of the observed y values. Note that the numerator in the second term of equation 10.2 is the sum of squares of residuals, SSR.

If we were to use R² to help choose between equations, then equation 10.4 would always be favoured over equation 10.3, owing to the extra flexibility the x² term provides for the line of best fit to pass close to the data points. As a consequence, R² would tend to unity. While the extra term in x² contributes to a reduction in SSR, it is possible that the reduction is only marginal. It seems reasonable that, if two or more equations are fitted to data, account should also be taken of the number of parameters, so as not to unfairly discriminate against equations with only a small number of parameters. One such statistic is the Adjusted Coefficient of Multiple Determination, R²_ADJ, given by⁸¹,

R²_ADJ = [(n − 1)R² − (M − 1)]/(n − M)   (10.5)

where R² is given by equation 10.2, n is the number of data and M is the number of parameters in the equation. R²_ADJ is calculated by the Regression Tool in the Analysis ToolPak in Excel (see p 373 of Kirkup). The equation that is favoured, when two or more equations are fitted to data, is that equation that gives the largest value for R²_ADJ.

10.5.2 Akaike's Information Criterion (AIC)

Another way to compare two (or more) equations fitted to data is the Akaike Information Criterion⁸². This criterion takes into account SSR, but also includes a term proportional to the number of parameters used. AIC may be written,

AIC = n ln SSR + 2M   (10.6)

where n is the number of data and M is the number of parameters in the equation. If the addition of another parameter in an equation reduces SSR, then the first term on the right hand side of equation 10.6 becomes smaller. The second term on the right hand side of equation 10.6 can be considered as a 'penalty' term. It follows that a modest decrease in SSR, which occurs when an extra term is introduced into an equation, may be more than offset by the increase in AIC caused by using another parameter. We conclude that, if two or more equations are fitted to data, then the equation producing the smallest value for AIC is preferred. Care must be exercised when calculating SSR: if weighted fitting is to be used, or if a transformation is required to facilitate fitting, this must be applied consistently to every equation compared, otherwise it is not possible to compare equations using R²_ADJ or AIC.

80 See Al-Subaihi (2002).
81 See Neter, Kutner, Nachtsheim and Wasserman for a discussion of equation 10.5.
82 See Akaike (1974).
while looking for an equation that reduces SSR. However the second term on the right hand side increases by two for every extra parameter used. data must be transformed back to the original units before calculating 2 SSR. then the same weighting of the data must be used for all equations fitted to data. where the equations have different numbers of parameters is to use the Akaikes Information Criterion82 (AIC).
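The two criteria are easy to compute outside a spreadsheet as well. The following Python sketch (the function name and the use of polynomial models are my own illustrative choices, not part of the original example; numpy is assumed to be available) evaluates SSR, R2, R2ADJ (equation 10.5) and AIC (equation 10.6) for polynomial fits of different orders, so that competing equations such as 10.3 and 10.4 can be compared directly:

```python
import numpy as np

def fit_poly_stats(x, y, degree):
    """Fit a polynomial by linear least squares and return the statistics
    of section 10.5: SSR, R2, adjusted R2 and AIC = n*ln(SSR) + 2M."""
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    ssr = float(np.sum((y - y_hat) ** 2))           # sum of squares of residuals
    r2 = 1.0 - ssr / float(np.sum((y - y.mean()) ** 2))
    n, M = len(x), degree + 1                       # M = number of parameters
    r2_adj = ((n - 1) * r2 - (M - 1)) / (n - M)     # equation 10.5
    aic = n * np.log(ssr) + 2 * M                   # equation 10.6
    return ssr, r2, r2_adj, aic
```

Fitting y = a + bx and y = a + bx + cx2 to the same data and comparing the two AIC values then reproduces the comparison described above: the x2 term always lowers SSR a little, but the 2M penalty decides whether the extra parameter is worthwhile.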
10.5.3 Example

As part of a study into the behaviour of electrical contacts made to a ceramic conductor, the data in table 10.1 were obtained for the temperature variation of the electrical resistance of the contacts. These data are shown plotted in figure 10.1.

Table 10.1: Resistance versus temperature data for electrical contacts made to a ceramic material (26 data: T from 50 K to 300 K in steps of 10 K, with R between about 0.05 Ω and 4.5 Ω).

[Figure 10.1: Resistance versus temperature for electrical contacts on a ceramic. Resistance (ohms), 0 to 5; temperature (K), 0 to 350.]

It is suggested that there are two possible models that can be used to describe the variation of the contact resistance with temperature.

Model 1

The first model assumes that the contacts show semiconducting behaviour, where the relationship between R and T can be written

R = A exp(B/T)    (10.7)
where A and B are constants.

Model 2

Another equation proposed to describe the data assumes an exponential decay of resistance with increasing temperature, of the form

R = α exp(−βT) + γ    (10.8)

where α, β and γ are constants.

Solution

Both equation 10.7 and equation 10.8 were fitted using nonlinear least squares. It is possible to linearise equation 10.7 by taking logarithms of both sides of the equation, then performing linear least squares. However, it is more convenient to use the Solver utility in Excel to perform nonlinear least squares, as described in section 4 of this document. We will use the adjusted coefficient of multiple determination and the Akaike information criterion to determine whether equation 10.7 or equation 10.8 better fits the data. Note that the number of data in table 10.1 is n = 26.

Summarised in table 10.2 are the results of the fitting.

Table 10.2: Parameter estimates and statistics obtained when fitting equations 10.7 and 10.8 to the data in table 10.1 (A, σA, B and σB for the fit of R = A exp(B/T); α, σα, β, σβ, γ and σγ for the fit of R = α exp(−βT) + γ; together with SSR, AIC, R2 and R2ADJ for each fit).

In this example the SSR is smaller for equation 10.7 fitted to the data than for equation 10.8. As the number of parameters in equation 10.7 is less than that in equation 10.8, this would have been enough to encourage us to favour equation 10.7 as the better fit to the data. Inspection of table 10.2 reveals that the equation R = A exp(B/T) is superior to R = α exp(−βT) + γ as judged by both AIC and R2ADJ.
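The linearisation mentioned in the solution can be sketched in a few lines. The Python function below (the name is mine, and numpy is assumed; this log-space straight-line fit is the convenient approximation mentioned above, not the full nonlinear fit performed with Solver) uses the fact that taking logarithms of equation 10.7 gives ln R = ln A + B(1/T), which ordinary linear least squares handles directly:

```python
import numpy as np

def fit_semiconductor_model(T, R):
    """Estimate A and B in R = A*exp(B/T) by fitting the straight line
    ln R = ln A + B*(1/T) with unweighted linear least squares.

    Note: this minimises the residuals of ln R, not of R itself, so the
    estimates differ slightly from a true nonlinear least squares fit."""
    slope, intercept = np.polyfit(1.0 / T, np.log(R), 1)
    return np.exp(intercept), slope   # estimates of A and B
```

Because the fit is performed on transformed data, SSR must be recalculated in the original units (as cautioned in section 10.5) before comparing this model with equation 10.8.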
Section 11: Monte Carlo simulations and least squares

How effective is the technique of least squares at providing good estimates of parameters appearing in an equation fitted to experimental data? This question is both challenging and important. There is no way to be certain of what the parameters should be that appear in any equation that is fitted to 'real' data. To begin with, it is not possible to be sure that an equation fitted to data is appropriate. Additionally, we cannot be sure that the assumptions usually made when applying the technique of least squares (e.g. that errors in the y values are normally distributed, with a mean of zero and a constant standard deviation) are valid.

However, it is possible to contrive a situation where we do know the underlying relationship between the dependent variable (y) and independent variable (x), and how errors are distributed. The starting point is to generate 'noise free' y values in some range of x values. The next stage is to add 'noise' of known standard deviation with the aid of a random number generator83. Data generated in this manner are submitted to a least squares routine which, in turn, calculates the best estimates of parameters appearing in an equation fitted to the data. The estimates are compared with the 'actual' parameters, allowing the error in the estimates84 to be determined.

Generating and analysing data in this manner is an example of a Monte Carlo simulation. Such simulations are widely used in science to imitate situations that are too difficult, costly or time consuming to investigate through conventional experiments. The Monte Carlo approach is powerful and versatile. As examples, we may investigate 'experimentally':

• the effect of the magnitude of the 'noise' in the data on the standard errors of the parameter estimates;
• the consequence of choosing different sampling regimes (for example, the distribution of parameter estimates obtained when measurements are made at evenly spaced intervals of x can be compared with the distribution of parameter estimates obtained when replicate measurements are made at extreme values of x);
• the effect of homo- or heteroscedasticity on parameter estimates (for example, the consequences of fitting an equation by unweighted least squares to data that have been influenced by heteroscedastic noise may be investigated);
• the performance of data analysis tools (for example, the speed and accuracy of rival algorithms for nonlinear least squares can be compared).

83 or a pseudo-random number generator, as routinely found in statistics and spreadsheet packages.
84 error = true value of parameter − estimated value of parameter.
11.1 Using Excel's Random Number Generator

The Random Number Generator in Excel offers a convenient means of adding normally distributed noise to otherwise noise free data85. The Random Number Generator is one of the tools in the Analysis ToolPak. The ToolPak is found by going to the Tools pull down menu on the Menu toolbar and clicking on Data Analysis.

Figure 11.1 shows noise free y values in column B generated using the equation:

y = 3 + 1.5x    (11.1)

x values are in the range x = 5 to x = 20. Normally distributed noise with mean of zero and standard deviation of two is generated in the C column. In the D column the noise-free y values are summed with the noise (cell D2 contains =B2+C2).

Figure 11.1: Normally distributed noise with zero mean and standard deviation of two added to y values. Column A holds x (5 to 20 in steps of 1), column B the noise free y values (10.5, 12.0, 13.5, ..., 33.0), column C the noise and column D the sum of columns B and C. The x values are distributed evenly in the range x = 5 to x = 20, with no replicates.

Figure 11.2 shows a similar range of x values, but in this case eight replicates are made at x = 5 and another eight at x = 20, with no values between these limits (such that the number of data in figures 11.1 and 11.2 are the same). Again, normally distributed noise with mean of zero and standard deviation of two is added to each of the y values.

85 The Random Number Generator allows noise with distributions other than normal to be added to data.
Figure 11.2: Normally distributed noise with zero mean and standard deviation of two added to y values. Column K holds the x values (eight replicates at x = 5 and eight at x = 20), column L the noise free y values (10.5 and 33.0), column M the noise and column N the sum of columns L and M (cell N2 contains =L2+M2).

Analysing the data shown in figures 11.1 and 11.2 using unweighted least squares gives estimates for the parameters and standard errors in the parameters. Note that we refer to the data that are evenly distributed between x = 5 and x = 20, as given in figure 11.1, as 'Even dist.', and to the data consisting of replicates at x = 5 and x = 20 as 'Extreme dist.'. The outcome of analysing the data using unweighted least squares is shown in table 11.1.

Table 11.1: Parameter estimates and statistics (a, b, σa, σb and R2 for each sampling regime) for the data in figures 11.1 and 11.2, found using unweighted least squares.

Errors in the intercept and slope, shown in table 11.2, are found by subtracting the estimates from the true values (3 and 1.5 respectively).

Table 11.2: Errors in intercept and slope for each sampling regime.

The simulation may be repeated many times in order to establish whether designing an experiment with replicate measurements made at extreme x values does consistently produce parameter estimates with smaller standard errors. This is where the power of the Monte Carlo approach emerges. It is possible that any single pair of simulated data sets is unrepresentative of the effect of evenly distributed data compared to data gathered at extreme x values (as there are only two sets of data and, by chance, the 'Extreme dist.' could have been favoured over the 'Even dist.').
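Repeating the simulation many times is straightforward to script outside the spreadsheet as well. The following Python sketch (the function name and the choice of 500 replicates are mine; the model y = 3 + 1.5x and noise of standard deviation 2 are as above) generates many simulated data sets for each sampling regime and compares the spread of the slope estimates:

```python
import numpy as np

def slope_spread(x, n_runs=500, seed=1):
    """Generate n_runs noisy data sets from y = 3 + 1.5x (noise sd = 2),
    fit a straight line to each, and return the standard deviation of
    the resulting slope estimates."""
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(n_runs):
        y = 3.0 + 1.5 * x + rng.normal(0.0, 2.0, size=len(x))
        slope, _ = np.polyfit(x, y, 1)
        slopes.append(slope)
    return float(np.std(slopes))

x_even = np.arange(5.0, 21.0)                   # x = 5, 6, ..., 20
x_extreme = np.array([5.0] * 8 + [20.0] * 8)    # 8 replicates at each end
```

With these two designs the theoretical standard errors of the slope are 2/√340 ≈ 0.108 for the even distribution and 2/√900 ≈ 0.067 for the extreme distribution, and the Monte Carlo spreads settle close to those values as the number of replicates grows.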
Figures 11.3 and 11.4 show histograms of the estimates of a and b, which were determined by generating 50 sets of simulated data based on adding noise of standard deviation 2 to y values generated using equation 11.1.

[Figure 11.3: Histogram consisting of 50 estimates of the intercept, a, found by fitting the equation y = a + bx to simulated data with even and extreme distributions of x values.]

[Figure 11.4: Histogram consisting of 50 estimates of the slope, b, found by fitting the equation y = a + bx to simulated data with even and extreme distributions of x values.]

Figures 11.3 and 11.4 provide convincing evidence of the benefits (as far as reducing standard errors in parameter estimates is concerned) of designing experiments in which extreme x values are favoured. This finding has a sound foundation based on statistical principles86. For example, the standard error in the estimate of the slope, σb, is related to the x values, xi, by

σb = σ/[Σ(xi − x̄)2]^(1/2)    (11.2)

where x̄ is the mean of the x values and σ is the standard deviation of the experimental y values, given by

σ = [Σ(yi − ŷi)2/(n − 2)]^(1/2)    (11.3)

where n is the number of data. Equation 11.2 indicates that, for a fixed σ, σb becomes smaller for large deviations of x from the mean, i.e. for large values of |xi − x̄|.

It is worth emphasising that the reduction of the standard errors and the improved R2 are secured at some cost. What if the underlying relationship between x and y is not linear? Gathering data at two extremes of x has assumed that the data are linearly related, and there is no way to test the validity of this assumption with data gathered in this manner.

86 See Devore, 1991.
11.2 Monte Carlo simulation and nonlinear least squares

Let us now consider a situation requiring fitting by nonlinear least squares. The equation to be fitted to data is given by

y = A1 exp(−B1x) + A2 exp(−B2x)    (11.4)

We choose (arbitrarily) A1 = 50, A2 = 50, B1 = 0.025 and B2 = 0.010. Fifty values of y are generated in the range x = 1 to x = 200, with y values calculated at equal increments of x beginning at x = 1. A graph of the noise free data is shown in figure 11.5.

[Figure 11.5: Noise free data generated using equation 11.4; y falls from about 100 at x = 1 to below 10 at x = 200.]
Next, noise is added with mean of zero and a constant standard deviation of unity (again chosen arbitrarily). The question arises: what values of x should be chosen to obtain estimates of A1, A2 etc. which have the smallest standard errors? With normally distributed noise of zero mean and standard deviation of unity added to the y values, the graph looks typically like that shown in figure 11.6.

[Figure 11.6: x–y data as shown in figure 11.5, with noise added.]

50 replicate data sets were generated with noise added to the y values in figure 11.5. Upon the generation of each set, an equation of the form

y = a1 exp(−b1x) + a2 exp(−b2x)    (11.5)

where a1, a2, b1 and b2 are estimates of A1, A2, B1 and B2 respectively, was fitted to the data using nonlinear least squares. Note that the starting values for the nonlinear fit, which are very important when fitting a function consisting of a sum of exponentials, were a1 = 50, a2 = 50, b1 = 0.025 and b2 = 0.010. Figure 11.7 shows a histogram of the a1 parameter estimates.

[Figure 11.7: Distribution of the parameter estimate a1; the 50 estimates range from about 20 to 90.]
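Readers without access to Solver can reproduce this kind of fit with a short routine. The sketch below is a damped Gauss–Newton iteration written purely for illustration (it is not the algorithm Solver itself uses, and the function name is mine); it fits the sum of two exponentials in equation 11.5 given starting values, using the analytic Jacobian of the model:

```python
import numpy as np

def fit_two_exponentials(x, y, p0, n_iter=100):
    """Fit y = a1*exp(-b1*x) + a2*exp(-b2*x) by Gauss-Newton iteration
    with step halving, starting from p0 = (a1, b1, a2, b2)."""
    p = np.array(p0, dtype=float)

    def model(q):
        a1, b1, a2, b2 = q
        return a1 * np.exp(-b1 * x) + a2 * np.exp(-b2 * x)

    def ssr(q):
        return float(np.sum((y - model(q)) ** 2))

    for _ in range(n_iter):
        a1, b1, a2, b2 = p
        e1, e2 = np.exp(-b1 * x), np.exp(-b2 * x)
        # Jacobian of the model with respect to (a1, b1, a2, b2)
        J = np.column_stack([e1, -a1 * x * e1, e2, -a2 * x * e2])
        r = y - model(p)
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        # halve the step until SSR decreases (simple damping)
        lam = 1.0
        while ssr(p + lam * step) > ssr(p) and lam > 1e-12:
            lam /= 2.0
        p = p + lam * step
    return p
```

As the text stresses, starting values matter greatly for sums of exponentials: from a poor start the iteration can converge to a local minimum of SSR, exactly the hazard discussed for Solver.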
Exercise 7

An alternative sampling regime to that used in section 11.2 is to choose smaller sample intervals in the region where the y values are changing most rapidly with x. A sampling regime that has this characteristic is given by

xi = (1/λ) ln[(N + 1)/((N − i) + 1)]    (11.6)

where N is the total number of data, and λ is a constant which is determined by letting xi equal the maximum x value when i = N. Repeat the example given in section 11.2 (i.e. use the same starting equation and distribution of errors) using the new sampling regime described by equation 11.6, and perform 50 replicates. Plot a histogram of the distribution of the parameter estimate a1.

a) Is the standard deviation of the parameter estimates, a1, less than that obtained in section 11.2 when the xi values were evenly distributed?
b) Carry out an F test to compare the variances of the distribution of a1 obtained using both sampling regimes, to establish if the difference in the characteristic width is statistically significant87.
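Equation 11.6 is easy to implement. A minimal sketch (the function name is mine), with λ fixed by requiring the last point xN to equal the maximum x value:

```python
import numpy as np

def log_spaced_x(N, x_max):
    """Return x_i = (1/lam) * ln((N + 1)/((N - i) + 1)) for i = 1..N,
    with lam chosen so that x_N = x_max (equation 11.6). The points are
    closely spaced at small x, where y changes most rapidly."""
    lam = np.log(N + 1) / x_max        # from x_N = (1/lam) * ln(N + 1)
    i = np.arange(1, N + 1)
    return np.log((N + 1) / (N - i + 1)) / lam
```

For N = 50 and a maximum x of 200 this gives points concentrated near x = 0, with the spacing growing steadily towards x = 200.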
Exercise 8

An experiment is to be performed to determine the wavelength, λ, of an ultrasonic wave, exploiting the phenomenon of interference of waves from two sources of ultrasound.

87 See pages 342 to 346, and pages 369 to 371, in Kirkup (2002).
The relationship between the separation, y, between two successive maxima of the interfering waves and the separation, d, of the sources of the waves is given by:

y = λD/d    (11.7)

where D is a constant. Equation 11.7 is of the form y = bx, where x ≡ 1/d and b ≡ λD. How may values of d be chosen so as to minimise the standard error in the slope, b?

Simulation
Two approaches for choosing values of d are to be compared. The first approach generates y values as d is increased from 1 to 20 cm in steps of 1 cm. The other is to generate y values as the ratio 1/d is increased from 1/20 cm−1 to 1 cm−1 (i.e. 0.05 to 1) in steps of 0.05 cm−1. Taking λ equal to 0.76 cm and D = 50 cm, generate simulated values of y using equation 11.7:
a) for d = 1 to 20 cm, in steps of 1 cm;
b) for 1/d = 1/20 cm−1 to 1 cm−1, in steps of 0.05 cm−1.
For data generated by both methods a) and b), use Excel's Random Number Generator to add normally distributed noise with mean of zero and standard deviation of unity.

Analysis
Use the LINEST() function in Excel to find the slope of the best line through the origin for the two sets of data generated. Replicate the simulation and least squares analysis fifty times and construct a histogram showing the distribution of the best estimates of slope based on both distributions of x values.

Questions
a) Is there an obvious difference between the distributions of the parameter estimates based on the two sampling regimes? Support your answer using an F test to compare the variances in the slope.
b) Do you foresee any practical problems when a 'real' experiment is to be carried out using either sampling regime?
11.3 Adding heteroscedastic noise using Excel's Random Number Generator

When the standard deviation of measurement is not constant, but instead depends on the x value, the distribution of errors is said to be heteroscedastic. For example, when the dominant source of error is due to instrumental error, it is common for the error, ei, to be proportional to the magnitude of the response, yi. As far as fitting an equation to data using least squares is concerned, when errors are heteroscedastic it is necessary to use weighted fitting88. Heteroscedasticity may be revealed by plotting the residuals versus x, as shown in figure 11.8, though the exact nature of the heteroscedasticity is not always clear from such a plot.

[Figure 11.8: Residuals indicating that a weighted fit is required. The trend from large residuals to small (or small to large) as x increases is a strong indication of a heteroscedastic error distribution.]

We can use a Monte Carlo simulation to study the effect of heteroscedasticity, and to establish (for example) the consequences of fitting an equation to data with heteroscedastic errors using both unweighted and weighted least squares. We begin with an (arbitrary) equation from which we generate 'noise free' data. The equation is:

y = −2 + 4x    (11.8)

Sheet 11.1 shows noise free data generated in the range x = 1 to x = 10. In the C column there are normally distributed numbers, generated using the Random Number Generator in Excel, with mean equal to zero and standard deviation equal to one. The values in the D column also have a normal distribution, but the standard deviation of the distribution at each x value depends on the magnitude of the y value in column B. Specifically, the standard deviation, σi, is given by:

σi = 0.1 × yi    (11.9)

Sheet 11.1: Generating data with heteroscedastic noise. Column A holds x (1 to 10), column B the noise free y values (2, 6, 10, ..., 38), column C the homoscedastic noise, column D the heteroscedastic noise (cell D2 contains =0.1*B2*C2) and column E the 'experimental' y values (cell E2 contains =B2+D2).

Sheet 11.2: Completed spreadsheet based on the values in sheet 11.1. 'Experimental' data appear in column E.

Figure 11.9 shows a plot of y versus x based on the data generated in sheet 11.1. The line of best fit on the graph was found using Excel's Trendline option, and therefore represents an unweighted fit of the equation y = a + bx to the data.

[Figure 11.9: Plot of the x–y data as generated by sheet 11.1, with the unweighted line of best fit y = 4.5894x − 3.7994.]

Figure 11.10 shows the (unweighted) y residuals plotted versus x. The trend in the residuals indicates that the errors have a heteroscedastic distribution and therefore weighted fitting is required.

[Figure 11.10: Distribution of residuals when an unweighted fit is carried out.]

Figures 11.11 and 11.12 compare the scatter in the parameter estimates when unweighted and weighted fitting is performed on data which are heteroscedastic. In order to compare unweighted and weighted fitting to heteroscedastic data, fifty sets of heteroscedastic data were generated in the manner described above. An equation of the form

y = a + bx    (11.10)

was fitted to the simulated data. Unweighted fitting was performed using the LINEST() function in Excel. Weighted fitting was performed with the aid of Solver89, where the weighting was chosen so that the standard deviation in the ith value was taken to be proportional to yi, i.e.

σi ∝ yi    (11.11)

Note that the equation being fitted is linear in the parameters, and so the fitting can be accomplished using weighted linear least squares. However, as Excel does not possess an option that allows for easy fitting in this manner, it is easier to construct a spreadsheet that minimises (using Solver) the sum of squares of the weighted residuals, SSR, where:

SSR = Σ[(yi − ŷi)/yi]2

It is clear from both figures that the weighted fit produces a much narrower distribution in parameter estimates and is therefore preferred over the unweighted fit.

88 See page 264 of Kirkup (2002).
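The whole comparison can also be scripted. In the sketch below (my own minimal Python version of the procedure just described; the function name and the 200 replicates are illustrative choices), heteroscedastic data are generated from the straight line used in sheet 11.1 (noise-free y values 2, 6, ..., 38, with σi = 0.1yi), and each data set is fitted both unweighted and with weights 1/yi. numpy's polyfit minimises Σ(wi(yi − ŷi))2, so wi = 1/yi reproduces the weighted SSR above:

```python
import numpy as np

def simulate_fits(n_sets=200, seed=3):
    """Generate n_sets heteroscedastic data sets from y = -2 + 4x
    (sigma_i = 0.1*y_i), fit y = a + bx to each with and without
    weighting, and return the spread of the slope estimates."""
    rng = np.random.default_rng(seed)
    x = np.arange(1.0, 11.0)
    y_clean = -2.0 + 4.0 * x
    b_unweighted, b_weighted = [], []
    for _ in range(n_sets):
        y = y_clean + 0.1 * y_clean * rng.normal(0.0, 1.0, size=len(x))
        b_u, _ = np.polyfit(x, y, 1)
        # weights 1/y_i: polyfit minimises sum((w_i*(y_i - yhat_i))^2),
        # i.e. the weighted SSR with residuals divided by y_i
        b_w, _ = np.polyfit(x, y, 1, w=1.0 / y)
        b_unweighted.append(b_u)
        b_weighted.append(b_w)
    return float(np.std(b_unweighted)), float(np.std(b_weighted))
```

With this noise model the weighted slope estimates cluster noticeably more tightly around the true value of 4 than the unweighted ones, which is the behaviour shown in figures 11.11 and 11.12.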
[Figure 11.11: Distribution of the parameter estimate, a, when unweighted and weighted fitting is carried out on fifty data sets.]

[Figure 11.12: Distribution of the parameter estimate, b, when unweighted and weighted fitting is carried out on fifty data sets.]

Exercise 9

a) Use equation 11.8 to generate y values for x = 1, 2, 3 etc., up to x = 10.
b) Add normally distributed homoscedastic noise with mean of zero and standard deviation of unity to the y values generated in part a).
c) Fit equation 11.10 to the data using both unweighted and weighted least squares. For the weighted fit, assume that the relationship for the standard deviation in the y values given by equation 11.11 is valid.
d) Repeat part c) at least 40 times.
e) Calculate the mean and standard deviation of the a and b values that you obtained in part d). Construct histograms of the scatter in both a and b for both weighted and unweighted fitting.
f) Is unweighted fitting by least squares demonstrably better than weighted fitting in this example?
Section 12: Review

This document focuses primarily on fitting equations to data using the technique of nonlinear least squares. In particular, the use of the Solver tool packaged with Excel has been considered, and how it may be employed for nonlinear least squares fitting. For completeness, some discussion of linear least squares has been included, together with the circumstances under which linear least squares is no longer viable.

A most important aspect of fitting equations to data is to be able to determine standard errors in the estimates made of any parameters appearing in an equation. Solver does not provide standard errors, so this document describes the means by which standard errors can be calculated using an Excel spreadsheet. An advantage of employing Excel is that some aspects of fitting by nonlinear least squares which are normally hidden from view when using a conventional computer based statistics package can be made visible with Excel. I hope this leads to a deeper appreciation of nonlinear least squares than simply entering numbers into a stats package and waiting for the fruits of the analysis to emerge.

Some general issues relating to fitting by nonlinear least squares have been discussed, such as the existence of local minima in SSR and the means by which good starting values may be established in advance of fitting. We have also considered briefly how equations fitted to data can be compared in order to determine which equation is the 'better' in a statistical sense, while at the same time emphasising that any equation fitted to data should be supported on a foundation of sound physical and/or chemical principles.

This document is not yet complete. I would like to include something in the future about identifying and treating outliers as well as points of 'high leverage'.

Acknowledgements

I would like to express my sincere thanks to Dr Mary Mulholland of the Faculty of Science at UTS and Dr Paul Swift (formerly of the same Faculty) for suggesting examples from chemistry and physics that may be usefully treated using nonlinear least squares. From Luton University I acknowledge the assistance and encouragement of Professor David Rawson, Dr Barry Haggert and Dr John Dilleen. I thank my good friends John Harbottle and Peter Rowley for their excellent hospitality while I was in the UK in 2002 preparing some of this material. I also acknowledge a timely communication from Dr Marcel Maeder of Newcastle University (New South Wales), who queried the omission of Excel's Solver from my book. I am grateful to Dr Maeder, as his query provided the spur to create this document.

Finally, I thank the following organisations where parts of this document were prepared: University of Technology, Sydney, Australia; University of Luton, UK; University of Paisley, UK; and CSIRO, Lindfield, Australia.
Problems

1. Standard addition analysis is routinely used to establish the composition of a sample. In order to establish the concentration of Fe3+ in water, solutions containing known concentrations of Fe3+ were added to water samples90. The absorbance of each solution, y, was determined for each concentration, x, of added solution. The absorbance/concentration data are shown in table P1.

Table P1: Data for problem 1.
Concentration (ppm), x: 0, 5.55, 11.10, 16.65, 22.20
Absorbance (arbitrary units), y: 0.240, 0.437, 0.621, 0.809, 1.009

The relationship between absorbance, y, and concentration, x, may be written

y = B(x − xC)    (P1)

where B is the slope of the line of y versus x, and xC is the intercept on the x axis, which represents the concentration of Fe3+ in the water before additions are made. Use nonlinear least squares to fit equation P1 to the data in table P1. Determine:
a) best estimates of B and xC [0.03441 ppm−1, 7.009 ppm]
b) standard errors in B and xC [0.000277 ppm−1, 0.159 ppm]

2. Another way to analyse the data in table P1 is to write

y = A + Bx    (P2)

Here A is the intercept on the y axis at x = 0, and B is the slope. The intercept on the x axis, xC (found by setting y = 0 in equation P2), is given by

xC = −A/B    (P3)

Use linear least squares to fit equation P2 to the data in table P1. Determine:
a) best estimates of A, B and xC [0.2412, 0.03441 ppm−1, 7.009 ppm]
b) standard errors in the best estimates of A, B and xC [0.00376, 0.000277 ppm−1, 0.159 ppm]

Note that the errors in the best estimates of slope and intercept in equation P2 are correlated, and so the normal 'propagation of uncertainties' method is not valid when calculating xC (see section 8.1).

90 This problem is adapted from Skoog and Leary (1992).
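The caveat about correlated errors in problem 2 can be made concrete. The sketch below (written for illustration, not part of the original problem set; numpy is assumed) fits equation P2 by linear least squares and propagates the uncertainty to xC = −A/B including the covariance term that the naive propagation formula omits:

```python
import numpy as np

def fit_with_xc(x, y):
    """Fit y = A + B*x by linear least squares, then estimate
    x_C = -A/B and its standard error, keeping the A-B covariance."""
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    B = np.sum((x - xbar) * (y - ybar)) / Sxx
    A = ybar - B * xbar
    s2 = np.sum((y - (A + B * x)) ** 2) / (n - 2)    # residual variance
    var_A = s2 * (1.0 / n + xbar ** 2 / Sxx)
    var_B = s2 / Sxx
    cov_AB = -xbar * s2 / Sxx                        # covariance of A and B
    xc = -A / B
    # first-order propagation for x_C = -A/B, including the covariance term
    var_xc = xc ** 2 * (var_A / A ** 2 + var_B / B ** 2 - 2 * cov_AB / (A * B))
    return A, B, xc, np.sqrt(var_A), np.sqrt(var_B), np.sqrt(var_xc)
```

Dropping the cov_AB term (the 'normal' propagation of uncertainties) gives a noticeably different, and wrong, standard error for xC, which is the point the note above is making.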
3. In a study of first order kinetics, the volume of titrant, V(t), required to reach the end point of a reaction is measured as a function of time, t. The following data were obtained91:

Table P2: Data for problem 3.
t (s): 145, 314, 638, 901, 1228, 1691, 2163, 2464
V(t) (ml): 4.0, 7.0, 12.2, 15.0, 18.6, 21.6, 24.0, 24.6

The relationship between V and t can be written

V(t) = V∞ − (V∞ − V0)exp(−kt)    (P4)

where k is the rate constant, and V∞ and V0 are also constants. Using nonlinear least squares, fit equation P4 to the data in table P2. Determine:
a) best estimates of V∞, V0 and k [28.22 ml, 0.377 ml, 0.0008469 s−1]
b) standard errors in the estimates of V∞, V0 and k [0.216 ml, 0.9906 ml, 3.00 × 10−5 s−1]

91 These data were taken from Denton (2000).
4. Table P3 contains data obtained from a simulation of a chemical reaction92 in which noise of constant variance has been added to the data. The assumption is made that a second order kinetics model can represent the reaction. Assuming that the relationship between concentration, C, and time, t, can be written93

C = C0/(1 + C0kt)    (P5)

where C0 is the concentration at t = 0 and k is the second order rate constant, fit equation P5 to the data in table P3 to obtain best estimates for C0 and k, and standard errors in the best estimates. [0.009852 mol/l, 0.0006622 l/mol⋅s; 0.00167 mol/l, 1.98 × 10−5 l/mol⋅s]

Table P3: Simulated data taken from Zielinski and Allendoerfer (1997).
Time, t (s): 0 to 400000 in steps of 20000
Concentration, C (mol/l): 0.01000, 0.00862, 0.00780, 0.00687, 0.00648, 0.00595, 0.00536, 0.00517, 0.00507, 0.00482, 0.00450, 0.00414, 0.00359, 0.00354, 0.00349, 0.00333, 0.00324, 0.00309, 0.00285, 0.00273, 0.00271

92, 93 See Zielinski and Allendoerfer (1997).

5. Table P4 gives the temperature dependence of the energy gap of high purity crystalline silicon. The variation of the energy gap with temperature can be represented by the equation
Eg(T) = Eg(0) − αT2/(β + T)    (P6)

where Eg(0) is the energy gap at absolute zero, and α and β are constants.

Table P4: Energy gap versus temperature data.
T (K): 20 to 520 in steps of 20
Eg(T) (eV): 1.1696, 1.1686, 1.1675, 1.1657, 1.1639, 1.1608, 1.1579, 1.1546, 1.1513, 1.1474, 1.1436, 1.1392, 1.1346, 1.1294, 1.1247, 1.1196, 1.1141, 1.1087, 1.1028, 1.0970, 1.0908, 1.0849, 1.0786, 1.0723, 1.0660, 1.0595

Fit equation P6 to the data in table P4 to find best estimates of Eg(0), α and β, as well as standard errors in the estimates. Use starting values of 1.1, 0.0004 and 600 respectively for the estimates of Eg(0), α and β. [1.170 eV, 0.0004832 eV K−1, 662 K; 1.8 × 10−5 eV, 7.7 × 10−6 eV K−1, 11 K]
6. In an experiment to study phytoestrogens in Soya beans, an HPLC system was calibrated using known concentrations of the phytoestrogen, biochanin. Table P5 contains data of the area under the chromatograph absorption peak as a function of biochanin concentration.

Table P5: HPLC data for biochanin.
Conc., x (mg/l): 0.158, 0.315, 0.631, 1.261, 2.522, 5.045 and 10.09, with replicate measurements at each concentration.
Area, y (arbitrary units): replicate peak areas ranging from about 0.12 at x = 0.158 mg/l to about 34 at x = 10.09 mg/l.

A comparison is to be made of two equations fitted to the data in table P5. The equations are

y = A + Bx    (P7)

and

y = A + BxC    (P8)

Assuming an unweighted fit is appropriate, fit equations P7 and P8 to the data in table P5. For each equation fitted to the data, calculate the:
a) best estimates of the parameters
b) standard errors in the estimates
c) sum of squares of residuals (SSR) [0.6652, 0.5074]
d) Akaike information criterion [−4.15, −7.57]
e) residuals. Draw a graph of residuals versus concentration.

Which equation better fits the data?
7. The relationship between critical current, Ic, and temperature, T, for a high temperature superconductor can be written

   Ic = 1.74A √(1 − T/Tc) tanh[0.435B (Tc/T) √(1 − T/Tc)]   (P9)

where A and B are constants and Tc is the critical temperature of the superconductor. For a high temperature superconductor with a Tc equal to 90.1 K, the following data for critical current and temperature were obtained:

T (K):  5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90
I (mA): 5212, 5373, 5203, 4987, 4686, 4594, 4245, 4091, 3861, 3785, 3533, 3199, 2903, 2611, 2279, 1831, 1098, 29

Table P6: Critical current versus temperature data for a high temperature superconductor with critical temperature of 90.1 K.

Fit equation P9 to the data in table P6 to obtain best estimates for the parameters A and B and standard errors in the best estimates.
[A = 3199 mA, B = 11; standard errors: 17 mA, 0.79]

8. A sensor developed to measure the electrical conductivity of salt solutions is calibrated using solutions of sodium chloride of known conductivity. Table P7 contains data of the signal output, V, of the sensor as a function of conductivity, σ.
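Closing out problem 7 above: Ic in equation P9 is proportional to A for fixed B, so A has a closed-form solution and only B needs a one-dimensional search. The sketch below generates synthetic currents from the assumed best-fit values A = 3199 mA and B = 11 and recovers them; a ternary search is used on the assumption that the profiled SSR is unimodal in B over the search interval.

```python
import math

TC = 90.1  # critical temperature in K

def model_shape(T, B):
    # f(T; B) such that Ic = A * f(T; B) for equation P9.
    t = 1.0 - T / TC
    return 1.74 * math.sqrt(t) * math.tanh(0.435 * B * (TC / T) * math.sqrt(t))

def profiled_ssr(Ts, Is, B):
    # For fixed B the best A follows from a single linear normal equation.
    f = [model_shape(T, B) for T in Ts]
    A = sum(fi * Ii for fi, Ii in zip(f, Is)) / sum(fi * fi for fi in f)
    ssr = sum((Ii - A * fi) ** 2 for fi, Ii in zip(f, Is))
    return ssr, A

# Synthetic data from assumed best-fit values (A in mA, B dimensionless).
A_TRUE, B_TRUE = 3199.0, 11.0
Ts = [5.0 * k for k in range(1, 19)]          # 5 K to 90 K
Is = [A_TRUE * model_shape(T, B_TRUE) for T in Ts]

# Ternary search on B (assumes the profiled SSR is unimodal on [1, 30]).
lo, hi = 1.0, 30.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if profiled_ssr(Ts, Is, m1)[0] < profiled_ssr(Ts, Is, m2)[0]:
        hi = m2
    else:
        lo = m1
B_fit = 0.5 * (lo + hi)
ssr, A_fit = profiled_ssr(Ts, Is, B_fit)
```

Profiling out the linear parameter this way halves the dimension Solver (or any search routine) has to explore, which markedly improves robustness for models like P9.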
Assume that the relationship between V and σ is

   V = Vs + k[1 − exp(−σ/α)]   (P10)

where Vs, k and α are constants.

Table P7: Signal output from the sensor as a function of electrical conductivity (σ in mS/cm, up to 38.700 mS/cm; V in volts).

Use unweighted nonlinear least squares to determine best estimates of the constants and standard errors in the best estimates.

9. In a study of the propagation of an electromagnetic wave through a porous solid, the variation of relative permittivity, εr, of the solid was measured as a function of moisture content, νw (expressed as a fraction). Table P8 contains the data obtained in the experiment (Francois Malan 2002, private communication).

Table P8: Variation of relative permittivity with moisture content (νw expressed as a fraction; εr dimensionless).

Assume the relationship between εr and νw can be written
1 – 0. y = a + bx to simulated data. S 4R = ˆ ( y i − y i )4 (P13) Perform a Monte Carlo simulation to compare parameter estimates obtained when equations P12 and P13 are used to fit an equation of the form. Unweighted least squares requires the minimisation of SSR given by. e) Is there any significant difference between the parameter estimates obtained when minimising SSR and S4R? f) Is there any significant difference between the variance in the parameter estimates when minimising SSR and S4R? 98 . 2. 1. SSR = ˆ ( y i − yi )2 (P12) A technique sometimes adopted when optimising parameters in optical design situations is to minimise S4R. b) Add normally distributed noise of mean equal to zero and standard deviation of 0. 3 etc up to x = 20.067. εw is relative permittivity of water εm is the relative permittivity of the (dry) porous material Use (unweighted) nonlinear least squares to fit equation P11 to the data in table P8 and hence obtain best estimates of εw and εm and standard errors in the best estimates [55.4x to generate y values for x =1. a) Use the function y = 2. 0. c) Find best estimates of a and b by minimising SSR and S4R as given by equations P12 and P13. More specifically.83.043] 10. 5. where.44.2 εr =νw εw − εm ( ) 2 + 2ν w ( εw − εm )ε m +ε m (P11) where.5 to the values generated in part a). d) Repeat steps b) and c) until 50 sets of parameter estimates have been obtained using equation P12 and P13. (Suggestions: Solver may be used minimise SSR and S4R.
11. In section 10.1, the relationship between free fall acceleration, g(h), and height, h, above the Earth's surface was written

   g(h) = g0 / (1 + h/R)²   (P14)

To study the validity of equation P14, low noise data of free fall acceleration are gathered over a range of values of height, h. For h values small compared to the radius of the Earth, R, the acceleration will decrease almost linearly with height. Applying the binomial expansion to equation P14, we obtain for a first order approximation

   g(h) = g0 (1 − 2h/R)   (P15)

Contained in table P9 are data of the variation of acceleration with height above the Earth's surface.

Table P9: Variation of acceleration due to gravity with height (h = 1000 km to 10000 km in steps of 1000 km; g in m/s²).

a) Use least squares to fit both equations P14 and P15 to the data in table P9 and determine best estimates for g0 and R.
b) Calculate standard errors in the best estimates.
c) Calculate and plot the residuals for each equation fitted to the data in table P9.
d) Is equation P15 a reasonable approximation to equation P14 over the range of h values in table P9?

12. The electrical resistance, r, of a particular material at a temperature, T, may be described by

   r = A + BT   (P16)

or

   r = α + βT + γT²   (P17)
where A, B, α, β and γ are constants. Table P10 shows the variation of the resistance of an alloy with temperature.

Table P10: Resistance versus temperature data for an alloy (T = 150 K to 350 K in steps of 10 K; r in Ω, values lying between about 18.5 Ω and 28.1 Ω).

Using (unweighted) linear least squares, fit both equation P16 and equation P17 to the data in table P10 and determine for each equation:

a) estimates for the parameters
b) the standard error in each estimate
c) the standard deviation, σ, in each y value [0.5304 Ω, 0.5368 Ω]
d) the sum of squares of the residuals, SSR [5.344 Ω², 5.186 Ω²]
e) the Akaike's information criterion, AIC [39.14, 40.57]
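Parts a) to e) of problem 12 hinge on a few standard relationships: SSR from the fitted residuals, σ = sqrt(SSR/(N − P)), parameter standard errors from the diagonal of σ²(XᵀX)⁻¹, and an information criterion to weigh the extra parameter. The sketch below uses synthetic resistance data generated from assumed values (not table P10), and again assumes the AIC form N ln(SSR) + 2P. The temperature is centred and scaled before fitting, a worthwhile habit whenever polynomial fits are done over a range like 150 to 350 K.

```python
import math, random

def solve(M, v):
    # Gauss-Jordan elimination with partial pivoting.
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

def poly_fit(ts, rs, degree):
    # Least-squares polynomial fit; returns estimates, standard errors, SSR.
    X = [[t ** j for j in range(degree + 1)] for t in ts]
    n, p = len(ts), degree + 1
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * r for row, r in zip(X, rs)) for i in range(p)]
    coef = solve(XtX, Xty)
    ssr = sum((r - sum(c * row[j] for j, c in enumerate(coef))) ** 2
              for row, r in zip(X, rs))
    s2 = ssr / (n - p)
    # Standard errors: sqrt of the diagonal of s2 * (X^T X)^-1, found by
    # solving XtX * column = unit vector for each parameter in turn.
    errs = [math.sqrt(s2 * solve(XtX, [1.0 if i == j else 0.0
                                       for i in range(p)])[j])
            for j in range(p)]
    return coef, errs, ssr

random.seed(3)
T = [150 + 10 * k for k in range(21)]            # 150 K to 350 K
u = [(t - 250) / 100 for t in T]                 # centred/scaled regressor
# Synthetic data from an assumed linear truth r = 12.2 + 0.048*T plus noise.
r = [12.2 + 4.8e-2 * t + random.gauss(0.0, 0.5) for t in T]

c1, e1, ssr1 = poly_fit(u, r, 1)                 # equation P16 (in u)
c2, e2, ssr2 = poly_fit(u, r, 2)                 # equation P17 (in u)
n = len(T)
sigma1 = math.sqrt(ssr1 / (n - 2))
sigma2 = math.sqrt(ssr2 / (n - 3))
aic1 = n * math.log(ssr1) + 2 * 2                # assumed AIC form
aic2 = n * math.log(ssr2) + 2 * 3
B_est = c1[1] / 100                              # slope back in ohm per K
```

Because the linear model is nested in the quadratic one, ssr2 can never exceed ssr1; only the AIC (or an F-test) can say whether the reduction earns its extra parameter.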
References

Akaike H, A New Look at the Statistical Model Identification (1974) IEEE Transactions on Automatic Control 19, 716–723.
Al-Subaihi A A, Variable Selection in Multivariate Regression using SAS/IML (2002) http://www.jstatsoft.org/v07/i12/mv.pdf
Bard Y, Nonlinear Parameter Estimation (1974) Academic Press, New York.
Bates D M and Watts D G, Nonlinear Regression Analysis and its Applications (1988) Wiley, New York.
Bevington P R and Robinson D K, Data Reduction and Error Analysis for the Physical Sciences (1992) McGraw-Hill, New York.
Bube R H, Photoconductivity of Solids (1960) Wiley, New York.
Cleveland W S, The Elements of Graphing Data (1994) Hobart Press, New Jersey.
Conway D G and Ragsdale C T, Modeling Optimization Problems in the Unstructured World of Spreadsheets (1997) Omega, Int. J. Mgmt. Sci. 25, 313–322.
Demas J N, Excited State Lifetime Measurements (1983) Academic Press, New York.
Denton P, Analysis of First Order Kinetics Using Microsoft Excel Solver (2000) Journal of Chemical Education 77, 1524–1525.
Dietrich C R, Uncertainty, Calibration and Probability: Statistics of Scientific and Industrial Measurement, 2nd edition (1991) Adam Hilger, Bristol.
Frenkel R D, Statistical Background to the ISO 'Guide to the Expression of Uncertainty in Measurement' (2002) CSIRO, Sydney, p 43.
Fylstra D, Lasdon L, Watson J and Waren A, Design and Use of Microsoft Excel Solver (1998) Interfaces 28, 29–55.
Karlovsky J, Simple Method for Calculating the Tunneling Current in an Esaki Diode (1962) Phys. Rev. 127, 419.
Katz E, Ogan K L and Scott R P W, Peak Dispersion and Mobile Phase Velocity in Liquid Chromatography: The Pertinent Relationship for Porous Silica (1983) J. Chromatogr. 270, 51–75.
Kennedy G J and Knox J H, Performance of Packings in High Performance Liquid Chromatography, I: Porous and Surface Layer Supports (1972) J. Chromatogr. Sci. 10, 549–556.
Kirkup L, Data Analysis with Excel: An Introduction for Physical Scientists (2002) Cambridge University Press, Cambridge.
Kirkup L and Cherry I, Temperature Dependence of Photoconductive Decay in Sintered Cadmium Sulphide (1988) Eur. J. Phys. 9, 64–68.
Kirkup L and Sutherland J, Curve Stripping and Non-Linear Fitting of Polyexponential Functions to Data using a Microcomputer (1988) Comput. Phys. 2, 64–68.
Moody H W, The Evaluation of the Parameters in the Van Deemter Equation (1982) Journal of Chemical Education 59, 290–291.
Neter J, Kutner M J, Nachtsheim C J and Wasserman W, Applied Linear Regression Models (1996) Times Mirror Higher Education Group Inc.
Nielsen-Kudsk F, A Microcomputer Program in Basic for Iterative, Non-Linear Data-Fitting to Pharmacokinetic Functions (1983) Int. J. Bio-Med. Comput. 14, 95–107.
Nocedal J, Numerical Optimization (1999) Springer, New York.
Perry A, A Modified Conjugate Gradient Algorithm (1978) Operations Research 26, 1073–1078.
Safizadeh M and Signorile R, Optimization of Simulation via Quasi-Newton Methods (1994) ORSA Journal on Computing 6, 398–408.
Salter C, Error Analysis using the Variance-Covariance Matrix (2000) Journal of Chemical Education 77, 1239–1243.
Skoog D A and Leary J J, Principles of Instrumental Analysis, 4th edition (1992) Harcourt Brace, Fort Worth.
Smith S and Lasdon L, Solving Large Sparse Nonlinear Programs Using GRG (1992) ORSA Journal on Computing 4, 2–16.
Snyder L R, Kirkland J J and Glajch J L, Practical HPLC Method Development, 2nd edition (1997) Wiley, New York.
Walkenbach J, Excel 2002 Power Programming with VBA (2001) M&T Books, New York.
Walker J S, Physics (2002) Prentice Hall, New Jersey.
Walsh S and Diamond D, Non-linear Curve Fitting Using Microsoft Excel Solver (1995) Talanta 42, 561–572.
Williams I P, Matrices for Scientists (1972) Hutchinson University Library, London.
Wolsey L A, Integer Programming (1998) Wiley, New York.
Zielinski T J and Allendoerfer R D, Least Squares Fitting of Nonlinear Data in the Undergraduate Laboratory (1997) Journal of Chemical Education 74, 1001–1007.