
a1D Method of Least Squares a1D 903

The test statistic can now be calculated as

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{N}} = \frac{0.116 - 0.123}{0.0032/\sqrt{4}} = -4.375 $$

From Table a1-5, we find that the critical value of t for 3 degrees of freedom and the 95% confidence level is 3.18. Because t ≤ −3.18, we conclude that there is a significant difference at the 95% confidence level and, thus, bias in the method. Note that if we were to carry out this test at the 99% confidence level, tcrit = 5.84 (Table a1-5). Because −5.84 < −4.375, we would accept the null hypothesis at the 99% confidence level and conclude there is no difference between the experimental and the accepted values.
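The comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `one_sample_t` is hypothetical, the standard deviation is taken as s = 0.0032 (the value consistent with the quoted result t = −4.375), and the critical values are the ones quoted from Table a1-5 rather than computed.

```python
import math

def one_sample_t(x_bar, mu_0, s, n):
    """Return t = (x_bar - mu_0) / (s / sqrt(n)) for a one-sample t test."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

# Values from the text; s = 0.0032 is assumed so that t matches the quoted -4.375.
t = one_sample_t(x_bar=0.116, mu_0=0.123, s=0.0032, n=4)

# Critical values for 3 degrees of freedom, as quoted from Table a1-5.
t_crit = {"95%": 3.18, "99%": 5.84}

for level, crit in t_crit.items():
    # Two-tailed decision: reject the null hypothesis when |t| >= t_crit.
    reject = abs(t) >= crit
    print(f"{level} confidence: t = {t:.3f}, t_crit = {crit}, reject H0: {reject}")
```

The same statistic leads to rejection at the 95% level but not at the 99% level, which is the dependence on confidence level that the text points out.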

Note that in this example the outcome depends on the confidence level that is used. The choice of confidence level depends on our willingness to accept an error in the outcome.12 The significance level (0.05, 0.01, 0.001, etc.) is the probability of making an error by rejecting the null hypothesis.

Hypothesis testing is widely used in science and engineering.13 Comparison of one or two samples is carried out as described here. The principles can, however, be extended to comparisons among more than two population means. Multiple comparisons fall under the general category of analysis of variance (ANOVA).14 These methods use a single test to determine whether there is a difference among the population means rather than pairwise comparisons as is done with the t test. After ANOVA indicates a potential difference, multiple comparison procedures can be used to identify which specific population means differ from the others. Experimental design methods take advantage of ANOVA in planning and performing experiments.

FIGURE a1-6 Calibration curve for determining isooctane in hydrocarbon mixtures. [Plot of y, peak area (arbitrary units), versus x, concentration of isooctane (mol %); the residual of each point is marked as yi − (mxi + b).]
12 For a discussion of errors in hypothesis testing, see J. L. Devore, Probability and Statistics for Engineering and the Sciences, 9th ed., Chap. 8, Boston: Brooks/Cole, 2016.
13 See D. A. Skoog, D. M. West, F. J. Holler, and S. R. Crouch, Fundamentals of Analytical Chemistry, 9th ed., Chap. 7, Belmont, CA: Brooks/Cole, 2014; S. R. Crouch and F. J. Holler, Applications of Microsoft® Excel in Analytical Chemistry, 3rd ed., Belmont, CA: Cengage Learning, 2017.
14 For a discussion of ANOVA methods, see D. A. Skoog, D. M. West, F. J. Holler, and S. R. Crouch, Fundamentals of Analytical Chemistry, 9th ed., Sec. 7C, Belmont, CA: Brooks/Cole, 2014.

a1D Method of Least Squares
Most analytical methods are based on a calibration curve in which a measured quantity y is plotted as a function of the known concentration x of a series of standards. Figure a1-6 shows a typical calibration curve, which was computed for the chromatographic determination of isooctane in hydrocarbon samples. The ordinate (the dependent variable) is the area under the chromatographic peak for isooctane, and the abscissa (the independent variable) is the mole percent of isooctane. As is typical, the plot approximates a straight line. Note, however, that not all the data fall exactly on the line because of the random errors in the measurement process. Thus, we must try to find the “best” straight line through the points. Regression analysis provides the means for objectively obtaining such a line and also for specifying the uncertainties associated with its subsequent use. We consider here only the basic method of least squares for two-dimensional data.

a1D-1 Assumptions of the Least-Squares Method
Two assumptions are made in using the method of least squares. The first is that there is actually a linear relationship between the measured response y and the standard analyte concentration x. The mathematical relationship that describes this assumption is called the regression model, which may be represented as

$$ y = mx + b $$

where b is the y intercept (the value of y when x is 0) and m is the slope of the line. We also assume that any deviation of the individual points from the straight line arises from error in the measurement. That is, we assume there is no error in x values of the points (concentrations). Both of these assumptions are appropriate for many analytical methods, but bear in mind
904 Appendix 1 Evaluation of Analytical Data

that whenever there is significant uncertainty in the x data, basic linear least-squares analysis may not give the best straight line. In such a case, a more complex correlation analysis may be necessary. In addition, simple least-squares analysis may not be appropriate when the uncertainties in the y values vary significantly with x. In this case, it may be necessary to apply different weighting factors to the points and perform a weighted least-squares analysis.15

a1D-2 Finding the Least-Squares Line
As illustrated in Figure a1-6, the vertical deviation of each point from the straight line is called a residual. The line generated by the least-squares method is the one that minimizes the sum of the squares of the residuals for all the points. In addition to providing the best fit between the experimental points and the straight line, the method gives the standard deviations for m and b.

The least-squares method finds the sum of the squares of the residuals SSresid and minimizes these according to the minimization technique of calculus.16 The value of SSresid is found from

$$ SS_{\text{resid}} = \sum_{i=1}^{N} \left[ y_i - (b + m x_i) \right]^2 $$

where N is the number of points used. The calculation of the slope and intercept is simplified by defining three quantities Sxx, Syy, and Sxy as follows:

$$ S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{N} \tag{a1-31} $$

$$ S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left( \sum y_i \right)^2}{N} \tag{a1-32} $$

$$ S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\sum x_i \sum y_i}{N} \tag{a1-33} $$

where xi and yi are individual pairs of data for x and y, N is the number of pairs, and x̄ and ȳ are the average values for x and y; that is, x̄ = Σxi/N and ȳ = Σyi/N.

Note that Sxx and Syy are the sums of the squares of the deviations from the mean for individual values of x and y. The expressions shown on the far right in Equations a1-31 through a1-33 are more convenient when a calculator without a built-in regression function is being used.

Six useful quantities can be derived from Sxx, Syy, and Sxy, as follows:

1. The slope of the line, m:

$$ m = \frac{S_{xy}}{S_{xx}} \tag{a1-34} $$

2. The intercept, b:

$$ b = \bar{y} - m\bar{x} \tag{a1-35} $$

3. The standard deviation about regression, sr:

$$ s_r = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{N - 2}} \tag{a1-36} $$

4. The standard deviation of the slope, sm:

$$ s_m = \sqrt{\frac{s_r^2}{S_{xx}}} \tag{a1-37} $$

5. The standard deviation of the intercept, sb:

$$ s_b = s_r \sqrt{\frac{\sum x_i^2}{N \sum x_i^2 - \left( \sum x_i \right)^2}} = s_r \sqrt{\frac{1}{N - \left( \sum x_i \right)^2 / \sum x_i^2}} \tag{a1-38} $$

6. The standard deviation for results obtained from the calibration curve, sc:

$$ s_c = \frac{s_r}{m} \sqrt{\frac{1}{M} + \frac{1}{N} + \frac{(\bar{y}_c - \bar{y})^2}{m^2 S_{xx}}} \tag{a1-39} $$

Equation a1-39 allows us to calculate the standard deviation from the mean ȳc of a set of M replicate analyses of unknowns when a calibration curve that contains N points is used; recall that ȳ is the mean value of y for the N calibration points. This equation is only approximate and assumes that the slope and intercept are independent parameters, which is not strictly true.

The standard deviation about regression sr (Equation a1-36) is the standard deviation for y when the deviations are measured not from the mean of y (as is the usual case), but from the straight line that results from the least-squares prediction. The value of sr is related to SSresid by

$$ s_r = \sqrt{\frac{\sum_{i=1}^{N} \left[ y_i - (b + m x_i) \right]^2}{N - 2}} = \sqrt{\frac{SS_{\text{resid}}}{N - 2}} $$

In this equation, the number of degrees of freedom is N − 2 because one degree of freedom is lost in calculating m and one in determining b. The standard deviation about regression is often called the standard error of the estimate. It roughly corresponds to the size of a typical deviation from the estimated regression line. Examples a1-11 and a1-12 illustrate how these quantities are calculated and used.

15 For an Excel approach to weighted linear regression, see S. R. Crouch and F. J. Holler, Applications of Microsoft® Excel in Analytical Chemistry, 3rd ed., Belmont, CA: Cengage Learning, 2017, pp. 331–337.
16 The procedure involves differentiating SSresid with respect to first m and then b and setting the derivatives equal to 0. This yields two equations, called normal equations, in the two unknowns m and b. These are then solved to give the least-squares best estimates of these parameters.
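Equations a1-31 through a1-38 translate almost line for line into code. The sketch below is a minimal, unweighted implementation using only the Python standard library; the function name `least_squares` and the dictionary keys are illustrative choices, and the data are the five calibration points of Table a1-7. It also computes SSresid directly from the residuals so that the relation between sr and SSresid can be checked.

```python
import math

def least_squares(x, y):
    """Unweighted linear least squares following Equations a1-31 to a1-38."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)

    # Equations a1-31 to a1-33, in the calculator-friendly form
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n

    m = s_xy / s_xx                                          # slope, a1-34
    b = sum_y / n - m * sum_x / n                            # intercept, a1-35
    s_r = math.sqrt((s_yy - m ** 2 * s_xx) / (n - 2))        # a1-36
    s_m = math.sqrt(s_r ** 2 / s_xx)                         # a1-37
    s_b = s_r * math.sqrt(1.0 / (n - sum_x ** 2 / sum_x2))   # a1-38

    # Sum of squared residuals computed directly, for comparison with a1-36
    ss_resid = sum((yi - (b + m * xi)) ** 2 for xi, yi in zip(x, y))

    return {"S_xx": s_xx, "S_yy": s_yy, "S_xy": s_xy, "m": m, "b": b,
            "s_r": s_r, "s_m": s_m, "s_b": s_b, "SS_resid": ss_resid}

# Calibration data from Table a1-7 (mole percent isooctane, peak area)
x = [0.352, 0.803, 1.08, 1.38, 1.75]
y = [1.09, 1.78, 2.60, 3.03, 4.01]

fit = least_squares(x, y)
for name, value in fit.items():
    print(f"{name} = {value:.4f}")
```

Full floating-point precision is carried throughout and rounding is left to the print step, in keeping with the advice not to round until the calculation is complete.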

Example a1-11
Complete a least-squares analysis of the experimental data provided in the first two columns of Table a1-7 and plotted in Figure a1-6.

Table a1-7 Calibration Data for the Chromatographic Determination of Isooctane in a Hydrocarbon Mixture

Mole Percent      Peak Area,
Isooctane, xi     yi            xi²         yi²         xi yi
0.352             1.09          0.12390     1.1881      0.38368
0.803             1.78          0.64481     3.1684      1.42934
1.08              2.60          1.16640     6.7600      2.80800
1.38              3.03          1.90440     9.1809      4.18140
1.75              4.01          3.06250     16.0801     7.01750
5.365             12.51         6.90201     36.3775     15.81992   (column sums)

Solution
Columns 3, 4, and 5 of the table contain computed values for xi², yi², and xi yi, with their sums appearing as the last entry in each column. Note that the number of digits carried in the computed values should be the maximum allowed by the calculator or computer; that is, rounding should not be performed until the calculation is complete.

We now substitute into Equations a1-31, a1-32, and a1-33 and obtain

$$ S_{xx} = \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{N} = 6.90201 - \frac{(5.365)^2}{5} = 1.14537 $$

$$ S_{yy} = \sum y_i^2 - \frac{\left( \sum y_i \right)^2}{N} = 36.3775 - \frac{(12.51)^2}{5} = 5.07748 $$

$$ S_{xy} = \sum x_i y_i - \frac{\sum x_i \sum y_i}{N} = 15.81992 - \frac{5.365 \times 12.51}{5} = 2.39669 $$

Substitution of these quantities into Equations a1-34 and a1-35 yields

$$ m = \frac{2.39669}{1.14537} = 2.0925 \approx 2.09 $$

$$ b = \frac{12.51}{5} - 2.0925 \times \frac{5.365}{5} = 0.2567 \approx 0.26 $$

Thus, the equation for the least-squares line is

$$ y = 2.09x + 0.26 $$

Substitution into Equation a1-36 yields the standard deviation about regression:

$$ s_r = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{N - 2}} = \sqrt{\frac{5.07748 - (2.0925)^2 \times 1.14537}{5 - 2}} = 0.1442 \approx 0.14 $$

and substitution into Equation a1-37 gives the standard deviation of the slope:

$$ s_m = \sqrt{\frac{s_r^2}{S_{xx}}} = \sqrt{\frac{(0.1442)^2}{1.14537}} = 0.13 $$

Finally, we find the standard deviation of the intercept from Equation a1-38:

$$ s_b = 0.1442 \sqrt{\frac{1}{5 - (5.365)^2 / 6.90201}} = 0.16 $$

Example a1-12
The calibration curve found in Example a1-11 was used for the chromatographic determination of isooctane in a hydrocarbon mixture. A peak area of 2.65 was obtained. Calculate the mole percent of isooctane in the mixture and the standard deviation if the area was (a) the result of a single measurement and (b) the mean of four measurements.

Solution
In either case, the unknown concentration is found from rearranging the least-squares equation for the line, which gives

$$ x = \frac{y - b}{m} = \frac{y - 0.2567}{2.0925} = \frac{2.65 - 0.2567}{2.0925} = 1.144 \text{ mol \%} $$

(a) Substituting into Equation a1-39, we obtain

$$ s_c = \frac{0.1442}{2.0925} \sqrt{\frac{1}{1} + \frac{1}{5} + \frac{(2.65 - 12.51/5)^2}{(2.0925)^2 \times 1.145}} = 0.076 \text{ mol \%} $$

(b) For the mean of four measurements,

$$ s_c = \frac{0.1442}{2.0925} \sqrt{\frac{1}{4} + \frac{1}{5} + \frac{(2.65 - 12.51/5)^2}{(2.0925)^2 \times 1.145}} = 0.046 \text{ mol \%} $$

Note that the advent of powerful statistics and spreadsheet software has greatly eased the burden of performing least-squares analysis of data.17

17 See S. R. Crouch and F. J. Holler, Applications of Microsoft® Excel in Analytical Chemistry, 3rd ed., Belmont, CA: Cengage Learning, 2017.
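As a rough illustration of how Equation a1-39 is applied, the following Python sketch reproduces the numbers of Example a1-12 from the fit parameters reported in Example a1-11. The function name `calibration_std_dev` and the variable names are illustrative only; the inputs are the rounded intermediate values quoted in the examples, so the results agree with the worked solution only to the digits shown there.

```python
import math

def calibration_std_dev(s_r, m, n_replicates, n_points, y_c, y_bar, s_xx):
    """Equation a1-39: standard deviation of a result read from the calibration line."""
    return (s_r / m) * math.sqrt(
        1.0 / n_replicates + 1.0 / n_points + (y_c - y_bar) ** 2 / (m ** 2 * s_xx)
    )

# Fit parameters reported in Example a1-11
m, b = 2.0925, 0.2567
s_r, s_xx = 0.1442, 1.14537
n_points = 5
y_bar = 12.51 / n_points       # mean peak area of the calibration points

# Unknown from Example a1-12
y_c = 2.65                     # measured peak area
x_c = (y_c - b) / m            # mole percent isooctane from the rearranged line

sc_single = calibration_std_dev(s_r, m, 1, n_points, y_c, y_bar, s_xx)
sc_mean_of_4 = calibration_std_dev(s_r, m, 4, n_points, y_c, y_bar, s_xx)

print(f"x = {x_c:.3f} mol %")
print(f"s_c (single measurement) = {sc_single:.3f} mol %")
print(f"s_c (mean of four)       = {sc_mean_of_4:.3f} mol %")
```

Only the 1/M term changes between cases (a) and (b), which is why averaging four measurements reduces sc but does not halve it: the 1/N term and the distance of the unknown from ȳ still contribute.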
