This action might not be possible to undo. Are you sure you want to continue?
There are lies, damn lies, and statistics...... (Anon.)
6.1 Introduction 6.2 Definitions 6.3 Basic Statistics 6.4 Statistical tests
In the preceding chapters basic elements for the proper execution of analytical work such as personnel, laboratory facilities, equipment, and reagents were discussed. Before embarking upon the actual analytical work, however, one more tool for the quality assurance of the work must be dealt with: the statistical operations necessary to control and verify the analytical procedures (Chapter 7) as well as the resulting data (Chapter 8). It was stated before that making mistakes in analytical work is unavoidable. This is the reason why a complex system of precautions to prevent errors and traps to detect them has to be set up. An important aspect of the quality control is the detection of both random and systematic errors. This can be done by critically looking at the performance of the analysis as a whole and also of the instruments and operators involved in the job. For the detection itself as well as for the quantification of the errors, statistical treatment of data is indispensable. A multitude of different statistical tools is available, some of them simple, some complicated, and often very specific for certain purposes. In analytical work, the most important common operation is the comparison of data, or sets of data, to quantify accuracy (bias) and precision. Fortunately, with a few simple convenient statistical tools most of the information needed in regular laboratory work can be obtained: the "t-test, the "F-test", and regression analysis. Therefore, examples of these will be given in the ensuing pages. Clearly, statistics are a tool, not an aim. Simple inspection of data, without statistical treatment, by an experienced and dedicated analyst may be just as useful as statistical figures on the desk of the disinterested. The value of statistics lies with organizing and simplifying data, to permit some objective estimate showing that an analysis is under control or that a change has occurred. Equally important is that the results of these statistical procedures are recorded and can be retrieved.
6.2.1 Error 6.2.2 Accuracy 6.2.3 Precision 6.2.4 Bias
Discussing Quality Control implies the use of several terms and concepts with a specific (and sometimes confusing) meaning. Therefore, some of the most important concepts will be defined first.
Error is the collective noun for any departure of the result from the "true" value*. Analytical errors can be: 1. Random or unpredictable deviations between replicates, quantified with the "standard deviation". 2. Systematic or predictable regular deviation from the "true" value, quantified as "mean difference" (i.e. the difference between the true value and the mean of replicate determinations). 3. Constant, unrelated to the concentration of the substance analyzed (the analyte). 4. Proportional, i.e. related to the concentration of the analyte. * The "true" value of an attribute is by nature indeterminate and often has only a very relative meaning. Particularly in soil science for several attributes there is no such thing as the true value as any value obtained is method-dependent (e.g. cation exchange capacity). Obviously, this does not mean that no adequate analysis serving a purpose is possible. It does, however, emphasize the need for the establishment of standard reference methods and the importance of external QC (see Chapter 9).
The "trueness" or the closeness of the analytical result to the "true" value. It is constituted by a combination of random and systematic errors (precision and bias) and cannot be quantified directly. The test result may be a mean of several values. An accurate determination produces a "true" quantitative value, i.e. it is precise and free of bias.
The closeness with which results of replicate analyses of a sample agree. It is a measure of dispersion or scattering around the mean value and usually expressed in terms of standard deviation, standard error or a range (difference between the highest and the lowest result).
The consistent deviation of analytical results from the "true" value caused by systematic errors in a procedure. Bias is the opposite but most used measure for "trueness" which is the agreement of the mean of analytical results with the true value, i.e. excluding the contribution of randomness represented in precision. There are several components contributing to bias: 1. Method bias
The difference between the (mean) test result obtained from a number of laboratories using the same method and an accepted reference value. The method bias may depend on the analyte level. 2. Laboratory bias The difference between the (mean) test result from a particular laboratory and the accepted reference value. 3. Sample bias The difference between the mean of replicate test results of a sample and the ("true") value of the target population from which the sample was taken. In practice, for a laboratory this refers mainly to sample preparation, subsampling and weighing techniques. Whether a sample is representative for the population in the field is an extremely important aspect but usually falls outside the responsibility of the laboratory (in some cases laboratories have their own field sampling personnel). The relationship between these concepts can be expressed in the following equation: Figure
The types of errors are illustrated in Fig. 6-1. Fig. 6-1. Accuracy and precision in laboratory measurements. (Note that the qualifications apply to the mean of results: in c the mean is accurate but some individual results are inaccurate)
3. the t-test. 6-2) and the main tools the F-test. . The primary parameters used are the mean (or average) and the standard deviation (see Fig.3 Basic Statistics 6.5 Propagation of errors In the discussions of Chapters 7 and 8 basic statistical treatment of data will be considered.3 Relative standard deviation.4 Confidence limits of a measurement 6.2 Standard deviation 6.1 Mean 6. The basic assumption to be made is that a set of data.3.3. has anormal or Gaussian distribution. Coefficient of variation 6. some understanding of these statistics is essential and they will briefly be discussed here. Therefore. (When the distribution is skewed statistical treatment is more complicated).3. obtained by repeated analysis of the same analyte in the same sample under the same conditions. and regression and correlation analysis.3.6.
precision) it is often more convenient to use the relative standard deviation (RSD) than the standard deviation itself.3. 6-2.) 68% of the data fall in the range ¯ x± s.5. QuattroPro.1 Mean The average of a set of n data xi: (6. there are several ways of calculation: (6.6) .3. 6. Coefficient of variation Although the standard deviation of analytical data may not vary much over limited ranges of such data.3. Therefore. but more usually as a percentage and is then called coefficient of variation (CV).3) or (6. The RSD is expressed as a fraction. Often. Operationally.7% in the range ¯x ± 3s. however.4) The calculation of the mean and the standard deviation can easily be done on a calculator but most conveniently on a PC with computer programs such as dBASE.3 Relative standard deviation. divided by n1. these terms are confused. it usually depends on the magnitude of such data: the larger the figures. 6.1) ¯ 6. and 99.2 Standard deviation This is the most commonly used measure of the spread or dispersion of data around the mean. Lotus 123. which have simple ready-to-use functions. The standard deviation is defined as the square root of the variance (V). for comparison of variations (e. (Warning: some programs use n rather than n. The variance is defined as the sum of the squared deviations from the mean. the larger s. The figure shows that (approx.g. A Gaussian or normal distribution. and others. Excel.1) or (6. 95% in the range ¯x ± 2s.Fig. 6. (6.1!).
Note. 6.847 19. of the analyte content (assuming absence of bias).828 19. A single analysis of a test sample can be regarded as literally sampling the imaginary set of a multitude of results obtained for that test sample. therefore. Therefore.04 mL .11) the variance can.0627 mL.885 19. s = standard deviation of mean of subsamples n = number of subsamples (The term is also known as the standard error of the mean.812 19. be calculated by squaring the standard deviation: V = s2 (6.2). (6. see Eq. a semi-automatic pipette installation is used with a 20 mL pipette. for the F-test.8) where = "true" value (mean of large set of replicates) ¯x = mean of subsamples t = a statistical value which depends on the number of data and the required confidence (usually 95%). the closer the mean x of the results will approach the "true" value . here referred to as ttab ). the pipette has to be calibrated.842 ± 2.4. both the accuracy (trueness) and precision have to be established.7) 6.) The critical values for t are tabulated in Appendix 1 (they are.4 Confidence limits of a measurement The more an analysis or measurement is replicated. The uncertainty of such subsampling is expressed by (6. the number of degrees of freedom has to be established by: df = n -1 (see also Section 6. i. To find the applicable value. Example For the determination of the clay content in the particle-size analysis. According to Appendix 1 for n = 10 is ttab = 2.e.3.84 ± 0.8) this calibration yields: pipette volume = 19.26 (0.797 19. of course.742 19. This volume is approximate and the operation involves the opening and closing of taps.941 19.0627/ ) = 19.829 19. When needed (e.g.26 (df = 9) and using Eq.937 19. A tenfold measurement of the volume yielded the following set of data (in mL): 19.804 The mean is 19.842 mL and the standard deviation 0.
3. certain calibrations. 6. .3. See also bias). the statistical parameters have to be obtained in another way.10) where S is the previously determined standard deviation of the large set of replicates (see also Fig. triplicate analysis will increase the confidence by a factor further discussed in Section 8. Therefore. determinations for which no previous data exist. Therefore. Duplicates are Thus.9) where = "true" value x = single measurement t = applicable ttab (Appendix 1) s = standard deviation of set of previous measurements. 6. Running duplicates will. and with analytical conditions. No laboratory will analyze a test sample 50 times to be confident that the result is reliable.8). Note: This "method-s" or s of a control sample is not a constant and may vary for different test materials.(Note that the pipette has a systematic deviation from 20 mL as this is outside the found confidence interval. Equation (6. increase the confidence of the (mean) result by a factor : where ¯x = mean of duplicates s = known standard deviation of large set Similarly.8) can be applied in various ways to determine the size of errors (confidence) in analytical work or measurements: single determinations in routine work. according to Equation (6. etc. results are usually single values obtained in batches of several test samples. In routine analytical work.5 Propagation of errors . the (95%) confidence of the result x of a single test sample (n = 1 in Eq. 6-2). In Appendix 1 can be seen that if the set of replicated measurements is large (say > 30). Most usually this is done by method validation (see Chapter 7) and/or by keeping control charts.8) is approximated by the commonly used and well known expression (6. which is basically the collection of analytical results from one or more control samples in each batch (see Chapter 8). t is close to 2. etc. analyte levels. in summary.3.8) is then reduced to (6. Equation (6.
However.2. 6. or from control charts.15.. moisture correction..5.g.15. then the total precision is expressed by the standard deviation obtained by taking the square root of the sum of individual variances (squares of standard deviation): If a (sub)measurement has a constant multiplication factor or coefficient (such as an extra dilution).3.1.5. e. Mg. The main distinction to be made is between random errors (precision) and systematic errors (bias).). c. Summation calculations If the final result x is obtained from the sum (or difference) of (sub)measurements a. etc.3.60 cmolc/kg respectively. The total precision is: .6. Na. I. (2b)2 Example The Effective Cation Exchange Capacity of soils (ECEC) is obtained by summation of the exchangeable cations: ECEC = Exch. Chapter 8).2 Propagation of systematic errors The final result of an analysis is often calculated from several measurements performed during the procedure (weighing. dilution. Because the "adding-up" of errors is usually not a simple summation. b. 0. For daily practice. 0. and 0.30. e. the treatment of summation or subtraction of factors is different from that of multiplication or division. this will be discussed. As was indicated in Section 6. the bias and precision of the whole method are usually the most relevant parameters (obtained from validation. K and (H + Al) on a certain sample. (Ca + Mg + Na + K + H + Al) Standard deviations experimentally obtained for exchangeable Ca.3.g.1. Propagation of random errors In estimating the total random error from factors in a final calculation.: x = a + b + c +. titration. For instance if one wants to change (part of) the method. are: 0. etc. then this is included to calculate the effect of the variance concerned. calibration. a control sample. instrument readings. Chapter 7.25.5. sometimes it is useful to get an insight in the contributions of the subprocedures (and then these have to be determined separately). Propagation of random errors 6. 0. the total error in an analytical result is an adding-up of the sub-errors made in the various steps.
Then the RSD of the other individual parameters have to be determined experimentally. (2RSDb)2.5%.2%. titration: 0. calculations contain both summations and multiplications. for instance: distillation: 0. but (much) less than their sum. 2. . as a fraction or as percentage.It can be seen that the total standard deviation is larger than the highest individual standard deviation. see Eqs.7): If a (sub)measurement has a constant multiplication factor or coefficient. 6.6 and 6.5) or (6. this calculation contains a subtraction also (often. qualitatively the best result can be expected from reducing the largest individual contribution.8%. The found RSDs are. the standard deviation of the titration (a -b) is determined as indicated in Section 7 above. It is also clear that if one wants to reduce the total standard deviation. molarity: 0. e. This is then transformed to RSD using Equations (6.) Firstly.g. Example The calculation of Kjeldahl-nitrogen may be as follows: where a = ml HCl required for titration sample b = ml HCl required for titration blank s = air-dry sample weight in gram M = molarity of HCl 1. Multiplication calculations If the final result x is obtained from multiplication (or subtraction) of (sub)measurements according to then the total error is expressed by the standard deviation obtained by taking the square root of the sum of the individual relative standard deviations (RSD or CV.4 = 14×10-3×100% (14 = atomic weight of N) mcf = moisture correction factor Note that in addition to multiplications. in this case the exchangeable acidity.6). then this is included to calculate the effect of the RSD concerned.
Note that some systematic errors may cancel out. pay to improve the homogeneity of the sample. .2 F-test for precision 6. the systematic error in a balance will cause a systematic error in the sample weight (as well as in the moisture determination).performance of two instruments. however.4. In practice. Some examples of comparisons in practice are: . The only way to detect or avoid systematic errors is by comparison (calibration) with independent standards and outside reference or control samples. .4 Statistical tests 6.1. the precision of the Kjeldahl method is usually considerably worse ( 2.performance of two analysts or laboratories.5 Analysis of variance (ANOVA) In analytical work a frequently recurring operation is the verification of performance by comparison of data. .g. Note. 6. e.2 Propagation of systematic errors Systematic errors of (sub)measurements contribute directly to the total bias of the result since the individual parameters in the calculation of the final result each carry their own bias. However. by careful grinding and mixing in the preparatory stage. The total calculated precision is: Here again.5%) probably mainly as a result of the heterogeneity of the sample.4 Linear correlation and regression 6. mcf: 0. It would.4. Sample heterogeneity is also represented in the moisture correction factor.3 t-Tests for bias 6. one-sided test 6.g.1 Two-sided vs. .2%. For instance. the influence of this factor on the final result is usually very small.4. e. This implies that painstaking efforts to improve subprocedures such as the titration or the preparation of standard solutions may not be very rewarding. weighings by difference may not be affected by a biased balance. the highest RSD (of distillation) dominates the total precision. The present example does not take that into account.5. 6.0% = 1.2%.4.performance of a procedure in different periods. It would imply that 2.sample weight: 0.5% or 3/5 of the total random error is due to sample heterogeneity (or other overlooked cause).performance of two methods.3.4.5% .
the t-tests. C. .. Because the F-test and the t-tests are the most basic tests they will be discussed first. "target" or "assigned" value of this sample. Three possible cases when comparing two sets of data (n1 = n2). 6-3. Same mean (no bias). has not been drawn). These tests examine if two sets of normally distributed data are similar or dissimilar (belong or not belong to the same "population") by comparing their standard deviations and means respectively.results obtained for a reference or control sample with the "true". different precision. Different mean (bias). identical sets. B. same precision. Some of the most common and convenient statistical tools to quantify such comparisons are the F-test. A. and regression analysis. Fig. This is illustrated in Fig. (The fourth case. Both mean and precision are different. 6-3.
and ¯x2. we can be confident (usually at 95% level) that A and B are not different. . This illustrates that statistical tests differ in strictness and that for proper interpretation of results in reports.5% to 5% at the end of one tail. 6-2). for instance between methods A and B.4. If we perform the one-sided test with 5% uncertainty. However. are based on the assumption that there is no significant difference (the "null hypothesis"). In this case the probability that it goes the other way than expected is assumed to be zero and.g. more correctly. i. however. whereas the one-sided test says that the result of A is significantly higher (or lower) than that of B.2 F-test for precision Because the result of the F-test may be needed to choose between the Student's t-test and the Cochran variant (see next section). the twosided test indicates no significant difference. we actually increase this 2. this is then equivalent to an uncertainty of 10% in two ways!) This difference in probability in the tests is expressed in the use of two tables of critical values for both F and t. when the difference is so small that a tabulated critical value of F or t is not exceeded. (Note that for the whole gaussian curve.4. one-sided test These tests for comparison.1 Two-sided vs. when it is expected or suspected that the mean and/or the standard deviation will go only one way. is A higher (or lower) than B? (one-sided test). the one-sided (or onetailed) test is appropriate. with the t-test: 1. the probability that it goes the expected way is doubled. the statistical techniques used. 2. are A and B different? (two-sided test) 2. and then. should always be specified.6.3).5% at the end of each tail beyond 2s. In other words. Two fundamentally different questions can be asked concerning both the comparison of the standard deviations s1 and s2 with the F-test. e. then a two times higher probability level was used in retrospect. In fact. and of the means¯x1. after a change in an analytical procedure. What actually happens is that in the first case the 2. including the confidence limits or probability. the one-sided table at 95% confidence level is equivalent to the two-sided table at 90% confidence level. which is symmetrical.5% boundary in the tail was just not exceeded. the F-test is discussed first. the uncertainty in the two-way test of 5% (or the probability of 5% that the critical value is exceeded) is divided over the two tails of the Gaussian curve (see Fig. 6. this 2. The most common case is the two-sided (also calledtwo-tailed) test: there are no particular reasons to expect that the means or the standard deviations of two data sets are different. An example is the routine comparison of a control chart with the previous one (see 8. This distinction has an important practical implication as statistically the probabilities for the two situations are different: the chance that A and B are only different ("it can go two ways") is twice as large as the chance that A is higher (or lower) than B ("it can go only one way"). subsequently.5% boundary is relaxed to 5% which is then obviously more easily exceeded. Or. In fact. therefore. and ttab. Of course it is tempting to perform this test after the results show a clear (unexpected) effect. This is underscored by the observation that in this way even contradictory conclusions may arise: if in an experiment calculated values of F and t are found within the range between the two-sided and one-sided values of Ftab. It is emphasized that the one-sided test is only appropriate when a difference in one direction is expected or aimed at.e.
9 10.0 10.2 9. and s2.6 2 9. is accepted). Thus. Table 6-1. The test makes use of the ratio of the two variances: (6. then the estimates s1. = s.7 9.df1. which can be found in statistical textbooks. It can be concluded with 95% confidence that there is no significant difference in precision between the work of Analyst 1 and 2.8 11.34 9.The F-test (or Fisher's test) is a comparison of the spread of two sets of data to test if the sets belong to the same population. To read the table it is necessary to know the applicable number of degrees of freedom for s1. Example I (two-sided test) Table 6-1 gives the data sets obtained by two analysts for the cation exchange capacity (CEC) of a control sample.2 10.11) the calculated F value is 1. In certain cases more confidence may be needed. Using Equation (6. This exceeds the calculated value and the null hypothesis (no difference) is accepted. If the performances are not very different. and s2.1 9. there is still a 5% chance that we draw the wrong conclusion.2 11.9 8.2 10.3 10. see Appendix 2).819 0.2 10. we use the F-table for the two-sided test and find Ftab = 4.11) where the larger s2 must be the numerator by convention.0 11.62. These are calculated by: df1 = n1-1 df2 = n2-1 If Fcal Ftab one can conclude with 95% confidence that there is no significant difference in precision (the "null hypothesis" that s1. 1 10.5 9. = df2 = 9). do not differ much and their ratio (and that of their squares) should not deviate much from unity.97 0.9 9.7 10. CEC values (in cmolc/kg) of a control sample determined by two analysts. As we had no particular reason to expect that the analysts would perform differently.62 . In practice.644 10 10 tcal = 1.12 ¯x: s: n: Fcal = 1. in other words if the precisions are similar or dissimilar.03 (Appendix 2.8 10. the calculated F is compared with the applicable F value in the F-table (also called thecritical value. then a 99% confidence table can be used.5 10.4 9.
51 2.4 2.7 2.4.5 2.4 B 1. A 2. Student's t-test 6.3 t-Tests for bias 6.8 2.3.6 2.2 2.5 1.9 2. 2.4.3 t-Test for large data sets (n 30) 6. Because of the nature of the rapid method we suspect it to produce a lower precision then obtained with the Scheibler method and we can.3.5 2. Contents of CaCO3 (in mass/mass %) in a soil sample determined with the Scheibler method (A) and the rapid titration method (B).3.4.5 2. Table 6-2.4.6 2.6 1. df1.3 2.12 ttab* = 2.4 2.07 (ttab* = Cochran's "alternative" ttab) 6.7 1.3 Ftab = 3. The results are given in Table 6-2.18 x: s: n: Fcal = 18. therefore.1.3) and the null hypothesis (no difference) is rejected.03 ttab = 2. perform the one sided F-test.3.2 Cochran's t-test 6.4 Paired t-test .6 2.3 2.6 1. = 12.4 2.13 0.07 (App. The applicable Ftab = 3.9 2.4. It can be concluded (with 95% confidence) that for this one sample the precision of the rapid titration method is significantly worse than that of the Scheibler method.5 2.10 Example 2 (one-sided test) The determination of the calcium carbonate content with the Scheibler standard method is compared with the simple and more rapid "acid-neutralization" method using one and the same sample.Ftab = 4.7 2.099 0. df2 = 9) which is lower than Fcal(=18.424 10 13 tcal = 3.
the "true" value of proper reference samples is accompanied by the associated standard deviation and number of replicates used to determine these parameters.12) is then replaced by the mean of a second data set. sampling nature). the paired t-test for comparison of strongly dependent sets of data.Depending on the nature of two sets of data (n.12) where ¯x = mean of test results of a sample = "true" or reference value s = standard deviation of test results n = number of test results of the sample. More commonly. In other words. To compare the mean of a data set with a reference value normally the "two-sided t-table of critical values" is used (Appendix 1). if the difference between the means ¯x1-¯x2 is small. the means of the sets can be compared for bias by several variants of the t-test. 2. usually 95%). 3. As with the F-test. the Cochran variant of the t-test when the standard deviations of the independent sets differ significantly. When the standard deviations are not sufficiently similar an alternative procedure for the t-test must be followed in which the standard deviations are not . the data are taken to belong to the same population: there is no difference and the "null hypothesis" is accepted (with the applicable probability. This is discussed next. the one-sided t-test can be performed: if tcal > ttab. Basically. when it is expected or suspected that the obtained results are higher or lower than that of the reference value. As is shown in Fig. Student's t-test for comparison of two independent sets of data with very similar standard deviations. We can then apply the more general case of comparing the means of two data sets: the "true" value in Equation (6. a choice of the type of test must be made depending on the similarity (or non-similarity) of the standard deviations of the two sets. 6-3. s. for the t-tests Equation (6. however. The applicable number of degrees of freedom here is: df = n-1 If a value for t calculated with Equation (6. The following most common types will be discussed: 1.12) does not exceed the critical value in the table. to test if two data sets belong to the same population it is tested if the two Gauss curves do sufficiently overlap. If the standard deviations are sufficiently similar they can be "pooled" and the Student t-test can be used. then the results are significantly higher (or lower) than the reference value. Similarity or non-similarity of standard deviations When using the t-test for two small sets of data (n1 and/or n2<30).8) is used but written in a different way: (6.
and s2 are similar according to Ftest.10 (App. for small data sets. Equation (6.12 which is lower than the critical value ttab of 2.13) where ¯x1 = mean of data set 1 ¯x2 = mean of data set 2 sp = "pooled" standard deviation of the sets n1 = number of data in set 1 n2 = number of data in set 2. A convenient alternative is the Cochran variant of the t-test. The pooled standard deviation sp is calculated by: 6.4. When comparing two sets of data. Student's t-test (To be applied to small data sets (n1. if the variances do or do not significantly differ. the applicable number of degrees of freedom df is here calculated by: df = n1 + n2 -2 Example The two data sets of Table 6-1 can be used: With Equations (6. 6.12) is rewritten as: (6. twosided). that is. 3).1.3 and App.2).3.4. 30) the "normal" t-test is used (see Section 6.3. n2 < 30) where s1. n2. .pooled.13) and (6. is calculated as 1. For dealing with large data sets (n1. To perform the t-test. hence the null hypothesis (no difference) is accepted and the two data sets are assumed to belong to the same population: there is no significant difference between the mean results of the two analysts (with 95% confidence).4.14 where s1 = standard deviation of data set 1 s2 = standard deviation of data set 2 n1 = number of data in set 1 n2 = number of data in set 2. the F-test should precede the t-test.14) tcal. 1. The criterion for the choice is the passing or non-passing of the F-test (see 6. df =18. the critical ttab has to be found in the table (Appendix 1). Therefore.
37 ± 0. Equation 6.97 = 0. Example The two data sets of Table 6-2 can be used.Note.8): confidence limits = 0.69.06 Note that the value 0 for the difference is situated within this confidence interval which agrees with the null hypothesis of x1 = x2 (no difference) having been accepted. < 30) where s1 and s2. In other words.3. Calculate t with: 6. Another illustrative way to perform this test for bias is to calculate if the difference between the means falls within or outside the range where this difference is still not significantly large. The measured difference between the means is 10.16 Then determine an "alternative" critical t-value: 6.2 Cochran's t-test To be applied to small data sets (n1. 6. This can be derived from Equation (6. in this approach the 95% confidence limits of the difference between the means can be calculated (cf. . if this difference is less than the least significant difference (lsd).32 and 1. are dissimilar according to F-test.34 -9.69 = -0.15 In the present example of Table 6-1. the calculation yields lsd = 0. n2.17 where t1 = ttab at n1-1 degrees of freedom t2 = ttab at n2-1 degrees of freedom Now the t-test can be performed as usual: if tcal< ttab* then the null hypothesis that the means do not significantly differ is accepted.13): 6.37 which is smaller than the lsdindicating that there is no significant difference between the performance of the analysts. In addition.4.
The proper choice of the t-test as discussed above is summarized in a flow diagram in Appendix 3. Comparison of results at each level could be done by the F and t-tests as described above. Example 1). If the analysis covers a longer range. and decreases with increasing number of data. the paired t-test can be a better tool for comparison than the "normal" t-test described in the previous sections. This is caused by the fact that the difference in result of the Student and Cochran variants of the t-test is largest when small sets of data are compared. 6. Namely. pH 7). (6. If it is expected that one set is systematically higher (or lower) than the other set. The null hypothesis is that there is no difference between the data sets. The paired t-test. comparison of two methods using different levels of analyte gives more validation information about the methods than using only one level. however. we have no idea about the bias and therefore the two-sided test is appropriate.4. the range of results should be within the same magnitude. The calculations yield tcal = 3. several powers of ten. 1 that the two-sided ttab is now close to 2).12 and ttab*= 2.4. As a rule of fist. then the one-sided test is appropriate.4.16) and comparing this with ttab atdf = n1 + n2-2.2) the conclusion happens to have been the same if the Student's t-test with pooled standard deviations had been used. with increasing number of data a better estimate of the real distribution of the population is obtained (the flatter t-distribution converges then to the standardized normal distribution).18 meaning that tcal exceeds ttab* which implies that the null hypothesis (no difference) is rejected and that the mean of the rapid analysis deviates significantly from that of the standard analysis (with 95% confidence. i.3.4.g. allows for different levels provided the concentration range is not too wide. Example 1 The "promising" rapid single-extraction method for the determination of the cation exchange capacity of soils using the silver thiourea complex (AgTU. in contrast to our expectation that the precision of the rapid test would be inferior. so the test is to see if the mean of the differences between the data deviates significantly from zero or not (twosided test). (Note in App. buffered at pH 7) was compared with the traditional ammonium acetate method (NH4OAc. when comparing Control Charts (see 8. Further investigation of the rapid method would have to include the use of more different samples and then comparison with the one-sided t-test would be justified (see 6.e. 6. regression analysis must be considered (see Section 6. also be applied to the example of Table 6-1 if the two analysts used the same analytical method at (about) the same time.3). This is for instance the case when two methods are compared by the same analyst using the same sample(s). the standard deviations differ significantly so that the Cochran variant must be used. Furthermore. It could. When n 30 for both sets.4.3.3 t-Test for large data sets (n 30) In the example above (6.3. and for this sample only). for all practical purposes the difference between the Student and Cochran variant is negligible.4). The procedure is then reduced to the "normal" t-test by simply calculating tcal with Eq. Although for certain soil . In intermediate cases.4 Paired t-test When two data sets are not independent. As stated previously.4.3.According to the F-test. e. in fact. either technique may be chosen.
e.3 5 25. nor can the F-test be applied.2 6 4. no difference). replicate determinations are needed.4 8.0 d -0.6 23.26 Using Equation (6.4 7 7. The difference d within each pair and the parameters needed for the paired t-test are given also.4 5. for other types differences seemed larger.12) and noting that d = 0 (hypothesis value of the differences.6 14.5 19.83 (App.7 9 14.8 +4. For information about precision. 1.0 +0.8 10.types the difference in results appeared insignificant.89 sd = 2.6 +1.89) exceeds the critical value of 1. i. Since such data sets do not have a normal distribution. highly weathered sesquioxide-rich soils). In Table 6-3 the results often soils with these properties are grouped to test if the CEC methods give different results.8 8 2.6 +2.5 5.6 3 10.e. Example 2 . CEC values (in cmolc/kg) obtained by the NH4OAc and AgTU methods (both at pH 7) for ten soils with ferralic properties. onesided).3 10 13.1 2 4.9 +3.6 4 2.3 -1.2 15. Note. hence the null hypothesis that the methods do not differ is rejected and it is concluded that the silver thiourea method gives significantly higher results as compared with the ammonium acetate method when applied to such highly weathered soils.0 +3. Such a suspect group were soils with ferralic (oxic) properties (i. Table 6-3.5 5. Sample NH4OAc 1 7.395 ttab = 2. the t-value can be calculated as: where = mean of differences within each pair of data sd = standard deviation of the mean of differences n = number of pairs of data The calculated t value (=2. For the same reason no information about the precision of the two methods can be obtained.4 ¯d = +2.9 +1.4 +6. df = n -1 = 9.6 AgTU 6. the "normal" t-test which compares means of sets cannot be used here (the means do not constitute a fair representation of the sets).19 tcal = 2.
6.4.21 sd = 12. there is a fundamental difference in concept: correlation analysis is applied to independent factors: if X increases. what will Y do (increase.4. Table 6-4.12) and noting that d=0 (hypothesis value of the differences. whereas regression analysis can be used to construct calibration graphs. or perhaps not change at all)? In regression analysis a unilateral response is assumed: changes in X result in changes in Y.0 85. Sample Median Lab L d 1 93.18 (Appendix 1. twosided). Although the technique is in principle the same for both. For example. .Table 6-4 shows the data of total-P in four plant tissue samples obtained by a laboratory L and the median values obtained by 123 laboratories in a proficiency (roundrobin) test. (6.2 Comparing two sets of data using many samples at different analyte levels These also belong to the most common useful statistical tools to compare effects and performances X and Y. Total-P contents (in mmol/kg) of plant tissue as determined by 123 laboratories (Median) and Laboratory L. correlation analysis can be used for comparing methods or laboratories. and the results of Laboratory L seem to agree with those of "the rest of the world" (this is a so-called third-line control). in analytical work. the t value can be calculated as: The calculated t-value is below the critical value of 3. Even more convenient are the regression programs included in statistical packages such as Statistix. hence the null hypothesis that the laboratory does not significantly differ from the group of laboratories is accepted. however.e.5 5.1 Construction of calibration graph 6. df = n .4.18 To verify the performance of the laboratory a paired t-test can be performed: Using Eq.4.6 4 175 185 10 ¯d = 7.2 -7.70 tcal =1. but changes in Y do not result in changes in X. Mathcad.4. comparison of laboratories or methods is usually also done by regression analysis. i. The calculations can be performed on a (programmed) calculator or more conveniently on a PC using a home-made program. no difference). In practice.8 2 201 224 23 3 78.4 Linear correlation and regression 6.1 = 3. decrease. Eureka.702 ttab = 3.9 84.
Statcal.18) where a = intercept of the line with the y-axis b = slope (tangent) In laboratory work ideally. As was discussed in Section 6.Genstat. correlation analysis is indicated: 1. 2. Nevertheless. When the concentration range is so wide that the errors. but since these are less common in analytical work and more complex to handle mathematically. SPSS.). they will not be discussed here. Excel. When pairing is inappropriate for other reasons. Also. However. always inspect the kind of relationship by plotting the data. are not independent (which is the assumption for the t-tests). both random and systematic. non-linear higher-order relationships are also possible. However. Laboratories or methods are in fact independent factors. This is often the case where concentration ranges of several magnitudes are involved.3. or the response of an instrument to a series of standard solutions (regression). most spreadsheet programs such as Lotus 123.4. for regression analysis one factor has to be the independent or "constant" factor (e. for example. This is the so-called "1:1 line" passing through the origin (dashed line in Fig. etc. we speak of "regression of Y on X"). The resulting line takes the general form: y = bx + a (6. notably a long time span between the two analyses (sample aging. Note: Naturally.g. The principle is to establish a statistical linear relationship between two sets of corresponding data by fitting the data to a straight line by means of the "least squares" technique. change in laboratory conditions. analytical results of two methods applied to the same samples (correlation). If the intercept a 0 then there is a systematic discrepancy (bias. either on paper or on the computer monitor. and others. whereas the other factor is then the dependent factor Y (thus. 6-5). when there is perfect positive correlation without bias. when b 1 then there is a proportional response or difference between Xand Y. or the factor with the smallest standard deviation). the reference method. and Quattro-Prohave functions for this. The correlation between X and Y is expressed by the correlation coefficient r which can be calculated with the following equation: . the intercept a = 0 and the slope = 1. to avoid misinterpretation. Such data are. error) between X and Y. such comparisons can often been done with the Student/Cochran or paired t-tests. This factor is by convention designated X.
4. we take a standard series of P (0-1.b¯x 6. for example.504) of the variation in Y is due to the variation in X.98) this must be taken as a warning and it might then be advisable to repeat or review the procedure. the calibration graph should always be plotted. The advantage of r2 is that. 6-4.20 and a = ¯y .6.21 It is worth to note that r is independent of the choice which factor is the independent factory and which is the dependent Y. reading in absorbance units. The calculation of the correlation coefficient r with Equation (6. to verify this. The data and calculated terms needed to determine the parameters of the calibration graph are given in Table 6-5. On the other hand. it indicates the percentage of variation in Y associated with variation in X. 6. the regression parameters a and do depend on this choice as the regression lines will be different (except when there is ideal 1:1 correlation).19 where xi = data X ¯x = mean of data X yi = data Y ¯y = mean of data Y It can be shown that r can vary from 1 to -1: r = 1 perfect positive linear correlation r = 0 no linear correlation (maybe other correlation) r = -1 perfect negative linear correlation Often.g. when r = 0. Such high values are common for calibration graphs. either on paper or on computer monitor. in pipetting) or the used range of the graph may not be linear. a high r may be misleading as it does not necessarily indicate linearity. Thus. the correlation coefficient r is expressed as r2: the coefficient of determination or coefficient of variance. Errors may have been made (e. The line parameters b and a are calculated with the following equations: 6. below 0.997 (r2 = 0. The line itself is plotted in Fig. Table 6-5 is presented here to give an insight in the steps and terms involved. .0 mg/L) for the spectrophotometric determination of phosphate in a Bray-I extract ("available P"). when multiplied by 100.1 Construction of calibration graph As an example. When the value is not close to 1 (say.19) yields a value of 0.995).71 about 50% (r2 = 0. Therefore.4. However.
22) Table 6-5.626x + 0.09 0.17 0.3 -0.10 x1-¯x -0.029 0.01 0.160 0.438 ¯x=0.09 0.30 -0.2754 0.006 0.20 and (6.037 (6.25 0.08 0. Note that the confidence is highest at the centroid of the graph.0 3.044 0.4 0.70 yi-¯y -0.051 0.Using Equations (6. 6-4.21) we obtain: and a = 0.0.3 0. 6-4.67 2.5 -0.35 Fig.008 0. .063 0. xi 0.05 0.43 0.350 .32 0 (yi-¯y)2 (x1-¯x)(yi-¯y) 0.0 yi 0.150 0.01 0.25 0. Calibration graph plotted from data of Table 6-5.29 0.2 0. The dashed lines delineate the 95% confidence area of the graph.5 ¯y = 0.52 0.037 Thus.6 0.006 0.06 0.102 0.090 0.14 0.0 0.004 0.1 0. the equation of the calibration line is: y = 0. Parameters of calibration graph in Fig.21 -0.313 = 0.1 0.8 1.5 0 (xi-¯x)2 0.
its use is simple: for each y value measured the corresponding concentration x can be determined either by direct reading or by calculation using Equation (6.4.2. but preferably more than 10 different samples are analyzed .The samples should rather uniformly cover the analyte level range of interest.The most precise data set is plotted on the x-axis .4.2).22). reverse the variables and run the analysis again.At least 6. multi-replicate results have to be used to calculate standard deviations (see 6. A treatise of the error or uncertainty in the regression line is given. rounding off to the last significant figure is done at the end (see instruction for rounding off in Section 8. the maximum number of decimals is used. . If these are not available then the standard deviations of the present sets could be compared (note that we are now not dealing with normally distributed sets of replicate results). Note. 6.2 Comparing two sets of data using many samples at different analyte levels Although regression analysis assumes that one factor (on the x-axis) is constant. The use of calibration graphs is further discussed in Section 7.2). These conditions are: .2. Another convenient way is to run the regression analysis on the computer. Once the calibration graph is established.During calculation. To decide which laboratory or method is the most precise. when certain conditions are met the technique can also successfully be applied to comparing two variables such as laboratories or methods.4.
The slope (= 1. the pH.97 which is very satisfactory. .18.8 unit at the high end. the paired t-test might have been considered also).03) indicates that the regression line is only slightly steeper than the 1:1 ideal regression line. If the analyte level range is incomplete. see Chapter 9) and the data compared by regression. is shown here as an illustration. 6-5. is the intercept a of -1. Scatter plot of pH data of two laboratories. (In this particular case. both given by the computer) and then use the results of the regression analysis where this variable was plotted on the x-axis. with the inherent drawback that the original analyte-sample combination may not adequately be reflected. a large number of soil samples were analyzed by two laboratories X and Y (a form of "third-line control". Fig. This implies that laboratory Y measures the pH more than a whole unit lower than laboratory X at the low end of the pH range (the intercept -1. Drawn line: regression line. Figure 6-5 shows the so-called "scatter plot" of 124 soil pH-H2O determinations by the two laboratories.Observe which variable has the lowest standard deviation (or standard error of the intercept a. dashed line: 1:1 ideal regression line.18 is at pHx = 0) which difference decreases to about 0. Very disturbing. however. The correlation coefficient r is 0. one might have to resort to spiking or standard additions. The regression line of a common attribute. Example In the framework of a performance verification programme.
The t-test for significance is as follows: For intercept a: a = 0 (null hypothesis: no bias.98 (App. 1. ideal intercept is then zero).14 (calculated by the computer).98 (App.12) we obtain: Again. two-sided. and again using Equation (6. the laboratories have a significant mutual bias. standard error = 0. df = n . ttab = 1. hence.2 = 122 (n-2 because an extra degree of freedom is lost as the data are used for both a and b) hence. . two-sided.02 (given by computer).12) we obtain: Here. df = 122). the difference between the laboratories is not significantly proportional (or: the laboratories do not have a significant difference in sensitivity). ttab = 1. 1. These results suggest that in spite of the good correlation. and using Equation (6. the two laboratories would have to look into the cause of the bias. standard error =0. For slope: b = 1 (ideal slope: null hypothesis is no difference).
It is assumed that deviations in the x-direction are negligible. In such cases. random errors are implied.4.23) where = "fitted" y-value for each xi.weighted regression (not discussed here) is more appropriate than the unweighted regression as used here. (read from graph or calculated with Eq.5) may reveal that precision can change significantly with the level of analyte (and with other factors such as sample matrix). This is. A quantification can be found in the standard deviation of these parameters. different methods within one of the laboratories). the "residual standard deviation" or "standard error of the yestimate". for further discussion the reader is referred to statistical textbooks. As a treatise of ANOVA is beyond the scope of the present Guidelines.5 Analysis of variance (ANOVA) When results of laboratories or methods are compared where more than one factor can be of influence and must be distinguished from random effects. of course. Hence.4.1 is elaborated here. see Fig. 6. only the case if the standards are very accurately prepared. then ANOVA is a powerful statistical tool to be used. samples with different pre-treatments. the scattering of the points around the regression line does not seem to change much over the whole range.3. which we assumed to be constant (but which is only approximately so. 6. 64): (6. In the present example. Now the standard deviations for the intercept a and slope b can be calculated with: . is the (vertical) deviation of the found y-values from the line. This is not always the case. Error or uncertainty in the regression line The "fitting" of the calibration graph is necessary because the response points yi. the example of the calibration graph in Section 6. To illustrate the procedure. Most computer programmes for regression will automatically produce figures for these. This indicates that the precision of laboratory Y does not change very much over the range with respect to laboratory X. n = number of calibration points. composing the line do not fall exactly on the line. Examples of such factors are: different analysts. different analyte levels.22). This is expressed by an uncertainty about the slope and intercept b and a defining the line.Note. A practical quantification of the uncertainty is obtained by calculating the standard deviation of the points on the line. Thus. some of which are given in the list of Literature. Most statistical packages for the PC can perform this analysis. Validation of a method (see Section 7. Note: Only the y-deviations of the points from the line are considered.
14 0.001364 In the present example.29 0.sb Table 6-6.43 0. (6.538 0.017 -0. using Eq. df = n -1 = 4) hence.022 0.6 0. 1. using Eq.413 0. Parameters for calculating errors due to calibration graph (use also figures of Table 6-5).003 0.0 yi 0.0002 0.23).037 ± 2. two-sided. the parameters involved are listed in Table 6-6.0003 0.9): a ± t.037 and b = 0.2 0.037 0.25 To make this procedure clear.037 ± 0. xi 0 0.24 and 6.24) and Table 6-5: and.0001 0.78 × 0. (6. (6.67 0.007 0.663 0.0219 = 0.8 1. (6.0000 0.061 .25) and Table 6-5: The applicable ttab is 2.78 × 0.sa and b ± t.0132 = 0.626 ± 2.013 -0.6.4 0.287 0.0005 0.018 0.78 (App. (6.52 0.05 0. The uncertainty about the regression line is expressed by the confidence limits of a and b according to Eq.0003 0. we calculate and. using Eq.9): a = 0.626 ± 0.162 0. using Eq.
i. a negative reading for the blank or zero-standard.78 to 2. a negative value for a is possible.Note that if sa is large enough. see Section 7. (For a discussion about the error in x resulting from a reading in y.57 (see Appendix 1).2. which is particularly relevant for reading a calibration graph. .3) The uncertainty about the line is somewhat decreased by using more calibration points (assuming sy has not increased): one more point reduces ttab from 2.e.
This action might not be possible to undo. Are you sure you want to continue?
We've moved you to where you read on your other device.
Get the full title to continue listening from where you left off, or restart the preview.