that whenever there is significant uncertainty in the x data, basic linear least-squares analysis may not give the best straight line. In such a case, a more complex correlation analysis may be necessary. In addition, simple least-squares analysis may not be appropriate when the uncertainties in the y values vary significantly with x. In this case, it may be necessary to apply different weighting factors to the points and perform a weighted least-squares analysis.¹⁵

a1D-2 Finding the Least-Squares Line

As illustrated in Figure a1-6, the vertical deviation of each point from the straight line is called a residual. The line generated by the least-squares method is the one that minimizes the sum of the squares of the residuals for all the points. In addition to providing the best fit between the experimental points and the straight line, the method gives the standard deviations for $m$ and $b$.

The least-squares method finds the sum of the squares of the residuals, $SS_{\text{resid}}$, and minimizes this sum using the minimization technique of calculus.¹⁶ The value of $SS_{\text{resid}}$ is found from

$$SS_{\text{resid}} = \sum_{i=1}^{N} \left[ y_i - (b + m x_i) \right]^2$$

where $N$ is the number of points used. The calculation of the slope and intercept is simplified by defining three quantities, $S_{xx}$, $S_{yy}$, and $S_{xy}$, as follows:

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N} \tag{a1-31}$$

$$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{N} \tag{a1-32}$$

$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\sum x_i \sum y_i}{N} \tag{a1-33}$$

where $x_i$ and $y_i$ are individual pairs of data for x and y, $N$ is the number of pairs, and $\bar{x}$ and $\bar{y}$ are the average values for x and y; that is, $\bar{x} = \sum x_i / N$ and $\bar{y} = \sum y_i / N$.

Note that $S_{xx}$ and $S_{yy}$ are the sums of the squares of the deviations from the mean for individual values of x and y. The expressions shown on the far right in Equations a1-31 through a1-33 are more convenient when a calculator without a built-in regression function is being used.

Six useful quantities can be derived from $S_{xx}$, $S_{yy}$, and $S_{xy}$, as follows:

1. The slope of the line, $m$:

$$m = \frac{S_{xy}}{S_{xx}} \tag{a1-34}$$

2. The intercept, $b$:

$$b = \bar{y} - m\bar{x} \tag{a1-35}$$

3. The standard deviation about regression, $s_r$:

$$s_r = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{N - 2}} \tag{a1-36}$$

4. The standard deviation of the slope, $s_m$:

$$s_m = \sqrt{\frac{s_r^2}{S_{xx}}} \tag{a1-37}$$

5. The standard deviation of the intercept, $s_b$:

$$s_b = s_r \sqrt{\frac{\sum x_i^2}{N \sum x_i^2 - \left(\sum x_i\right)^2}} = s_r \sqrt{\frac{1}{N - \left(\sum x_i\right)^2 / \sum x_i^2}} \tag{a1-38}$$

6. The standard deviation for results obtained from the calibration curve, $s_c$:

$$s_c = \frac{s_r}{m} \sqrt{\frac{1}{M} + \frac{1}{N} + \frac{(\bar{y}_c - \bar{y})^2}{m^2 S_{xx}}} \tag{a1-39}$$

Equation a1-39 allows us to calculate the standard deviation from the mean $\bar{y}_c$ of a set of $M$ replicate analyses of unknowns when a calibration curve that contains $N$ points is used; recall that $\bar{y}$ is the mean value of y for the $N$ calibration points. This equation is only approximate and assumes that the slope and intercept are independent parameters, which is not strictly true.

The standard deviation about regression $s_r$ (Equation a1-36) is the standard deviation for y when the deviations are measured not from the mean of y (as is the usual case) but from the straight line that results from the least-squares prediction. The value of $s_r$ is related to $SS_{\text{resid}}$ by

$$s_r = \sqrt{\frac{\sum_{i=1}^{N} \left[ y_i - (b + m x_i) \right]^2}{N - 2}} = \sqrt{\frac{SS_{\text{resid}}}{N - 2}}$$

In this equation, the number of degrees of freedom is $N - 2$ because one degree of freedom is lost in calculating $m$ and one in determining $b$. The standard deviation about regression is often called the standard error of the estimate. It roughly corresponds to the size of a typical deviation from the estimated regression line. Examples a1-11 and a1-12 illustrate how these quantities are calculated and used.

¹⁵ For an Excel approach to weighted linear regression, see S. R. Crouch and F. J. Holler, Applications of Microsoft Excel in Analytical Chemistry, 3rd ed., Belmont, CA: Cengage Learning, 2017, pp. 331–337.

¹⁶ The procedure involves differentiating $SS_{\text{resid}}$ with respect to first $m$ and then $b$ and setting the derivatives equal to 0. This yields two equations, called normal equations, in the two unknowns $m$ and $b$. These are then solved to give the least-squares best estimates of these parameters.
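Equations a1-31 through a1-39 translate almost line for line into code. The sketch below is a minimal, unweighted Python version, assuming plain lists of floats; the function names `least_squares_summary` and `calibration_sd` are hypothetical, chosen only for illustration. It uses the computational forms on the far right of Equations a1-31 through a1-33 and then evaluates Equations a1-34 through a1-39. The slope and intercept it returns are the solution of the two normal equations mentioned in footnote 16. The usage lines at the end run on made-up data (not from the text) and check numerically that Equation a1-36 and $\sqrt{SS_{\text{resid}}/(N-2)}$ give the same $s_r$.

```python
import math
from typing import Sequence


def least_squares_summary(x: Sequence[float], y: Sequence[float]) -> dict:
    """Slope, intercept, and their standard deviations (Equations a1-31 to a1-38)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must be equal-length sequences of at least 3 points")

    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    # Computational forms on the far right of Equations a1-31 to a1-33
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n

    m = s_xy / s_xx                                   # (a1-34) slope
    b = sum_y / n - m * sum_x / n                     # (a1-35) intercept
    # (a1-36): N - 2 degrees of freedom; max() guards against a tiny negative
    # value from floating-point round-off when the points fit the line exactly
    s_r = math.sqrt(max(s_yy - m * m * s_xx, 0.0) / (n - 2))
    s_m = math.sqrt(s_r ** 2 / s_xx)                  # (a1-37)
    s_b = s_r * math.sqrt(sum_x2 / (n * sum_x2 - sum_x ** 2))  # (a1-38)
    return {"N": n, "ybar": sum_y / n, "Sxx": s_xx, "Syy": s_yy, "Sxy": s_xy,
            "m": m, "b": b, "sr": s_r, "sm": s_m, "sb": s_b}


def calibration_sd(fit: dict, yc_mean: float, M: int) -> float:
    """Approximate SD of a result read from the calibration line (Equation a1-39).

    yc_mean is the mean response of M replicate measurements of the unknown.
    """
    m = fit["m"]
    return (fit["sr"] / m) * math.sqrt(
        1 / M + 1 / fit["N"] + (yc_mean - fit["ybar"]) ** 2 / (m * m * fit["Sxx"])
    )


# Usage on made-up data (not from the text): Equation a1-36 and
# sqrt(SS_resid / (N - 2)) should agree up to round-off.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = least_squares_summary(x, y)
ss_resid = sum((yi - (fit["b"] + fit["m"] * xi)) ** 2 for xi, yi in zip(x, y))
print(fit["m"], fit["b"], fit["sr"], math.sqrt(ss_resid / (len(x) - 2)))
print(calibration_sd(fit, yc_mean=6.0, M=1), calibration_sd(fit, yc_mean=6.0, M=4))
```

Because `calibration_sd` evaluates Equation a1-39 directly, it inherits that equation's approximation: it treats the slope and intercept as independent parameters. Increasing `M` shrinks only the $1/M$ term, so replicate measurements of the unknown reduce $s_c$ but cannot remove the contribution from the calibration line itself.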
a1D Method of Least Squares a1D 905
$$S_{xy} = \sum x_i y_i - \frac{\sum x_i \sum y_i}{N} = 15.81992 - \frac{5.365 \times 12.51}{5} = 2.39669$$

Substitution of these quantities into Equations a1-34 and a1-35 yields

$$m = \frac{2.39669}{1.14537} = 2.0925 \approx 2.09$$

$$b = \frac{12.51}{5} - 2.0925 \times \frac{5.365}{5} = 0.2567 \approx 0.26$$

Example a1-12

The calibration curve found in Example a1-11 was used for the chromatographic determination of isooctane in a hydrocarbon mixture. A peak area of 2.65 was obtained. Calculate the mole percent of isooctane in the mixture and the standard deviation if the area was (a) the result of a single measurement and (b) the mean of four measurements.

Solution

In either case, the unknown concentration is found by rearranging the least-squares equation for the line, which gives

$$x = \frac{y - b}{m} = \frac{y - 0.2567}{2.0925} = \frac{2.65 - 0.2567}{2.0925} = 1.144 \text{ mol \%}$$
(a) Substituting into Equation a1-39, we obtain

$$s_c = \frac{0.1442}{2.0925} \sqrt{\frac{1}{1} + \frac{1}{5} + \frac{(2.65 - 12.51/5)^2}{(2.0925)^2 \times 1.145}} = 0.076 \text{ mol \%}$$

(b) For the mean of four measurements,

$$s_c = \frac{0.1442}{2.0925} \sqrt{\frac{1}{4} + \frac{1}{5} + \frac{(2.65 - 12.51/5)^2}{(2.0925)^2 \times 1.145}} = 0.046 \text{ mol \%}$$

Note that the advent of powerful statistics and spreadsheet software has greatly eased the burden of performing least-squares analysis of data.¹⁷

Table a1-7 Calibration Data for the Chromatographic Determination of Isooctane in a Hydrocarbon Mixture

| Mole Percent Isooctane, $x_i$ | Peak Area, $y_i$ | $x_i^2$ | $y_i^2$ | $x_i y_i$ |
|---|---|---|---|---|
| 0.352 | 1.09 | 0.12390 | 1.1881 | 0.38368 |
| 0.803 | 1.78 | 0.64481 | 3.1684 | 1.42934 |
| 1.08 | 2.60 | 1.16640 | 6.7600 | 2.80800 |
| 1.38 | 3.03 | 1.90440 | 9.1809 | 4.18140 |
| 1.75 | 4.01 | 3.06250 | 16.0801 | 7.01750 |
| Σ = 5.365 | Σ = 12.51 | Σ = 6.90201 | Σ = 36.3775 | Σ = 15.81992 |

¹⁷ See S. R. Crouch and F. J. Holler, Applications of Microsoft® Excel in Analytical