
Linear Least Squares Treatment When There are Errors in Both x and y
John A. Irvin¹ and Terry I. Quickenden
Department of Physical and Inorganic Chemistry, The University of Western Australia, Nedlands, W.A., 6009, Australia
When a straight line is fitted to experimental x, y data by the method of least squares (1-11) it is commonly assumed that the y value contains all the error in each data pair. Unfortunately, experimental data does not always conform to this restriction, and York (12) has presented examples in which the use of this assumption can cause fitted slopes to deviate by as much as 40% from the correct values. York (12) has also shown that the artifice of averaging the results of a least squares fit of y on x with the results of a fit of x on y can lead to equally large errors. Outside of the specialist statistical literature (9-11) there have been few attempts (12) to deal with this problem, and the literature of chemistry provides no practical advice on how to handle least squares calculations under these circumstances. The purpose of the present paper is to delineate the circumstances under which the conventional linear least squares treatment can be used and also to present an algorithm for carrying out linear least squares treatments in all cases when significant errors exist in both x and y.

Validity of the Conventional Treatment When Both Variables Contain Errors

It is possible to list three cases² in which the conventional linear least squares treatment can still be used even when there are significant errors in both x and y. If the fitted line is y = a + bx, these cases are:

(1) when $\sigma_{y_i}^2 \gg b^2\sigma_{x_i}^2$, where $\sigma_{y_i}$ represents the standard error in $y_i$ and $\sigma_{x_i}$ represents the standard error in $x_i$; or
(2) when $\sigma_{x_i}/\sigma_{y_i}$ is constant³ for all data points; or
(3) when $\sigma_{y_i}^2 + b^2\sigma_{x_i}^2$ is constant for all data points, providing in this case that all points are assigned the same arbitrary error in y (i.e., that an unweighted least squares fit is applicable)⁴.

A General Linear Least Squares Treatment

Unfortunately, the above conditions are not always met, and it is indeed rare for them to even be checked. In order to meet the needs of those whose data does not meet the above conditions, or who do not wish to run through these checks each time a least squares calculation is carried out, an algorithm is now presented for the generation of a relatively simple program which can be used to fit a straight line to x, y data when the errors in x and y are in any ratio whatsoever. The algorithm is presented in flow-chart form and can readily be converted to a Fortran or Basic program by a person possessing moderate programming experience.

The algorithm can also be programmed for use on portable programmable calculators possessing magnetic card memories, although the program may need to be separated into two for smaller machines. As an example, the authors have programmed the flow chart using an HP 25 calculator. Each iteration required two separate programs and convergence was generally achieved in two iterations.

The algorithm has also been used for several years as a Fortran program on a DEC 10 computer in both teaching and research laboratories. No operating problems have arisen and

¹ Present address: "Photocare," Belmont Plaza, Abernethy Rd., Belmont, W.A., 6104, Australia.
² The conventional treatment can, of course, also be applied in the more obvious case where most of the error is concentrated in x instead of y (i.e., when the reverse of the inequality in condition (1) holds) simply by carrying out the fit using y as the independent variable.
³ The treatment described by Worthing and Geffner (13), in which $\sigma_{x_i}$ and $\sigma_{y_i}$ are both constant for all data points, is a subset of this special case. It is unnecessary to use the special equations provided by these authors, as the conventional treatment in which x is assumed to be error-free gives the correct solution in this case.
⁴ The proof of this statement follows simply from an inspection of the form of the equations given in connection with the flow chart. If $w_i$ is constant for all i, then $w_i$ may be taken out of the summations and cancelled out of the numerators and denominators of all the equations.
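The three cases listed under "Validity of the Conventional Treatment" can be understood by looking at the weight that each point receives. The short derivation below is a sketch only. It assumes that the weight of point i is the reciprocal of the denominator in the weighted sum of squares of eqn. (1), and the constants k and c are symbols introduced here for illustration; they do not appear in the original text.

```latex
% Assumption: w_i = 1 / (\sigma_{y_i}^2 + b^2 \sigma_{x_i}^2),
% i.e. the reciprocal of the denominator of eqn. (1).
% k and c are illustrative constants, not symbols from the paper.
\begin{align*}
\text{Case (1): } & \sigma_{y_i}^2 \gg b^2\sigma_{x_i}^2
  \;\Rightarrow\; w_i \approx \frac{1}{\sigma_{y_i}^2} \\
\text{Case (2): } & \sigma_{x_i} = k\,\sigma_{y_i}
  \;\Rightarrow\; w_i = \frac{1}{(1 + b^2 k^2)\,\sigma_{y_i}^2}
  \;\propto\; \frac{1}{\sigma_{y_i}^2} \\
\text{Case (3): } & \sigma_{y_i}^2 + b^2\sigma_{x_i}^2 = c
  \;\Rightarrow\; w_i = \frac{1}{c}
\end{align*}
```

In cases (2) and (3) the factor common to every $w_i$ cancels from the expressions for the slope and intercept, so the fitted line coincides with that of the conventional fit weighted by $1/\sigma_{y_i}^2$ (case 2) or of the unweighted fit (case 3).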
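Equations for Use with the Flow Chartᵃ

Covariance: [expression not legible in the source]

Correlation coefficient:

$$\frac{\sum w_i \sum w_i x_i y_i - \sum w_i x_i \sum w_i y_i}{\left\{\left[\sum w_i \sum w_i x_i^2 - \left(\sum w_i x_i\right)^2\right]\left[\sum w_i \sum w_i y_i^2 - \left(\sum w_i y_i\right)^2\right]\right\}^{1/2}}$$

ᵃ $\sum_{i=1}^{n}$, in which n is the number of data points, is abbreviated by $\sum$ in both the flow chart and in the above.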
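As an illustration of how the weighted summations might be used, the following Python sketch implements the iterative scheme described in this paper: the slope b is held fixed in the weights, a weighted fit is carried out, and the new slope is fed back until it stops changing. Several details are assumptions rather than the authors' program: the function names `weighted_line_fit` and `fit_with_errors_in_both`, the convergence tolerance `tol`, the choice of $1/\sigma_{y_i}^2$ as starting weights, and the use of the standard weighted least squares expressions for slope and intercept. The errors in the slope and intercept and the covariance are omitted because their expressions are not legible here.

```python
import math


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x.

    Returns (a, b, r): intercept, slope and weighted correlation coefficient.
    Uses the standard weighted normal equations (an assumption; see lead-in).
    """
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swyy = sum(wi * yi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

    delta = sw * swxx - swx ** 2
    b = (sw * swxy - swx * swy) / delta
    a = (swxx * swy - swx * swxy) / delta
    r = (sw * swxy - swx * swy) / math.sqrt(delta * (sw * swyy - swy ** 2))
    return a, b, r


def fit_with_errors_in_both(x, y, sx, sy, max_iter=20, tol=1e-10):
    """Iterative straight-line fit with standard errors sx in x and sy in y.

    Returns (a, b, r, iterations_used).
    """
    # Starting estimate: conventional fit weighted by the y errors only
    # (hypothetical choice of starting point).
    w = [1.0 / syi ** 2 for syi in sy]
    a, b, r = weighted_line_fit(x, y, w)

    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Hold the current slope fixed in the denominator of eqn. (1) ...
        w = [1.0 / (syi ** 2 + b ** 2 * sxi ** 2) for sxi, syi in zip(sx, sy)]
        # ... and minimize with respect to a and b in the numerator.
        a_new, b_new, r = weighted_line_fit(x, y, w)
        converged = abs(b_new - b) <= tol * max(1.0, abs(b_new))
        a, b = a_new, b_new
        if converged:
            break
    return a, b, r, iteration
```

A call such as `fit_with_errors_in_both(x, y, sx, sy)` with four equal-length lists returns the intercept, slope, correlation coefficient and the number of iterations used. The `max_iter` argument plays the role of the maximum number of iterations requested in the flow chart.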

Volume 60 Number 9 September 1983 711


ample opportunity has been provided for cross-checking the output in this period. As physical scientists frequently require error estimates for the quantities derived from the slope and the intercept of a least squares line, an important feature of the algorithm is the calculation of slope and intercept errors. The correlation coefficient also forms part of the output.

[Figure: Flow chart for the linear least squares program. Legible boxes, in order: Start; Input n; "... to be fed in?"; Input ($x_i$, $y_i$) for i = 1 to n; Set $w_i$ = 1 for i = 1 to n; "... only to be fed in?" (Yes: Input ($x_i$, $y_i$, $\sigma_{y_i}$), then evaluate $w_i = 1/\sigma_{y_i}^2$ for i = 1 to n); (Yes: Input maximum number of iterations; Input ($x_i$, $y_i$, $\sigma_{y_i}$) for i = 1 to n); Evaluate the slope; "Were errors in both x and y fed in at the input step?"; "... so far and the slope"; "Has the number of iterations exceeded the maximum?"; "... i = 1 to n, using the new slope"; Evaluate the error in the slope and the intercept, the covariance and the correlation coefficient.]

Theory of the Treatment

The problem of finding the line of best fit to experimental data is usually (1, 8) solved by minimizing the weighted sum of the squares, s, of the deviation of the measured points from the fitted line. The quantity, s, is given by

$$s = \sum \frac{(y_i - a - b x_i)^2}{\sigma_{y_i}^2 + b^2\sigma_{x_i}^2} \qquad (1)$$

and is minimized with respect to the independent variables, a and b, which respectively represent the intercept and slope of the least squares line.

The exact solution to this problem requires (10) a complicated numerical approach when $\sigma_{x_i}$ is non-zero. This type of treatment requires substantial programming expertise. An approximate solution has been provided by Deming (1) but is still somewhat complex for the novice programmer. In the present work, we have used a simpler, but mathematically equivalent (9) approximation. Here, b is kept constant in the denominator of eqn. (1) and s is then minimized with respect to a and b in the numerator. The new estimate of b thus produced is inserted into the denominator and the process reiterated until convergence is obtained.

It has been shown by Powell and Macdonald (10) that even in a very adverse case, the approximate solution converges to give a slope and an intercept which each lie within a few percent of the exact solution. This is a very substantial improvement upon the performance of the conventional linear least squares method, which can introduce (12) discrepancies as high as 40% in the slope and the intercept when the error in x is ignored. The only significant deficiency in the present treatment is that it is possible in extreme cases for the standard errors in the slope and intercept to be too large by up to 40% (10). Occasional overestimation of the errors in the slope and intercept is a small price to pay for the simplicity of the algorithm presented here, and normally this will not pose a significant problem. In situations where the intercept and slope errors must be known with a high degree of reliability, the reader is referred to the exact treatment of Powell and Macdonald (10), which makes much more substantial demands on programming expertise and on computational facilities.

Acknowledgment

J. A. I. gratefully acknowledges the tenure of a postgraduate research studentship from the Australian Institute of Nuclear Science and Engineering during the earlier stages of this work.

Literature Cited

(5) Christian, S. D., Lane, E. H., and Garland, F., J. CHEM. EDUC., 51, 475 (1974).
(6) Sands, D. E., J. CHEM. EDUC., 51, 477 (1974).

712 Journal of Chemical Education
