
CHAPTER THREE

STATISTICAL EVALUATION OF ANALYTICAL DATA
3. INTRODUCTION
•Any laboratory whose analytical results are used as a basis for decisions must have a quality assurance programme.

•A central part of quality assurance is the assessment of analytical data.

•When assessing analytical data we are generally most interested in learning to what extent the results are reliable, i.e. how far they agree with the actual content of the component analyzed.
•We can use statistical methods to evaluate the
random (indeterminate) errors which follow
a normal distribution

•Statistical calculations are necessary to understand the significance of the data that are collected and, therefore, to set limitations on each step of the analysis.

Some Important Terms
Replicate determinations:
A single measurement cannot be taken as an accurate result. We therefore need to determine the number of times a measurement should be replicated in order for the experimental mean to approach the true mean with a certain degree of probability.
Our confidence in an analytical result is increased by increasing the number of parallel determinations, known as replicate determinations.
That is, the more numerous the observations, the more closely their results approach the truth.
The mean is the most commonly used measure of the central value; the less commonly used measures are the median and the mode.
Second, an analysis of the variation in results helps us to estimate the uncertainty associated with the central value of the data.
 You should note that a population is the collection of all measurements (a very large number, approaching infinity, to the analyst), while a sample is a subset of these measurements (a finite number) selected from the population; it is also called a finite sample.
3.1 Mean, Median, Standard Deviation, Variance

• Mean
• The mean is the most widely used measure of the central value.
• For a finite sample (n < 30) the mean, known as the sample mean, is represented by x̄ and is the arithmetic average of all the observations in the set of data:

x̄ = (x1 + x2 + … + xn)/n = (Σxi)/n

where x1, x2, …, xn are the replicate observations, xi represents the individual value of x making up the set of n observations, and the symbol Σxi means the summation of the individual x values from i = 1 to i = n, i.e. the sum of all the individual values of x in the set of replicate analyses.
For the entire population of data, or universe of data (the number of observations approaching infinity, i.e. N → ∞), the mean, known as the population mean, is represented by μ and is given by

μ = (Σxi)/N

where N denotes the (very large) number of observations.

We will see later, after studying the probability distribution of data, that the population mean μ is the most probable value and is taken to be the true value of the measured quantity.
• Example.3.1
• What is the mean for the data in Table 3.1?
• SOLUTION
• To calculate the mean, we add the results of all measurements,
3.080 + 3.094 + 3.107 + 3.056 + 3.112 + 3.174 + 3.198 = 21.821,
and divide by the number of measurements: x̄ = 21.821/7 = 3.117.
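The arithmetic above can be checked with a few lines of Python, using only the standard library (a minimal sketch; the data values are those listed in the example):

```python
data = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]

total = sum(data)             # 21.821
mean = total / len(data)      # x-bar = (sum of xi) / n
print(round(total, 3), round(mean, 3))   # 21.821 3.117
```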
Median
• Median (M): is the middle value of an odd number of results
listed in the order of magnitude, or the average of the two middle
ones for an even number of results.

• Example: For X1, X2, X3, X4, X5, where X1 < X2 < X3 < X4 < X5, M = X3
• For X1, X2, X3, X4, X5, X6, where X1 < X2 < X3 < X4 < X5 < X6,

M = (X3 + X4)/2

• For small numbers of measurements, the median may represent the true result better than the mean does.

• This is because the median is less influenced by an outlying or divergent value (i.e., one that appears to differ from the other values) in a data set.
• Statistically it can be shown that the median of 10 observations conveys information as efficiently as the mean of 7 observations.

• The median is used advantageously when a set of analytical data contains a probable outlying result, a result that differs significantly from the others in the set.

• An outlying result does not affect the median value, since the outlying result lies at one of the extremes. On the other hand, an outlying result can have a significant effect on the mean of the set, since it is included in the calculation of the mean.

• Example.3.2
• What is the median for the data in Table 3.1?
SOLUTION
• To determine the median, we order the data from the
smallest to the largest value
3.056 3.080 3.094 3.107 3.112 3.174 3.198
• Since there is a total of seven measurements, the median is
the fourth value in the ordered data set; thus, the median is
3.107.

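The same result can be confirmed with Python's statistics module; median() sorts the data internally, so the unordered values can be passed directly (a minimal sketch):

```python
import statistics

data = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]
med = statistics.median(data)
print(med)   # 3.107, the 4th of the 7 sorted values
```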
Mode
• The observation which occurs most frequently (i.e. with
maximum frequency) in a series of observations is known as
mode.

• It is yet another quick measure of the central value if the number of observations is not too small.

• For example, in the set of data 12.6, 12.7, 12.9, 12.7, 12.6, 12.8, 13.0, 12.5, 12.6, the value 12.6 is the mode, since it occurs with maximum frequency (three times).

• Range (R): is defined as the difference between the highest and the lowest result.
R = Xhighest – Xlowest
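Both statistics can be computed directly; the data set below is the one reconstructed in the example above:

```python
import statistics

# Data set from the example above (as reconstructed on the slide)
data = [12.6, 12.7, 12.9, 12.7, 12.6, 12.8, 13.0, 12.5, 12.6]

mode = statistics.mode(data)     # most frequent value
R = max(data) - min(data)        # range: highest minus lowest result
print(mode, round(R, 1))         # 12.6 0.5
```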
Deviation
• The error of a measurement cannot be stated if the true value of
the quantity is not known.
• It is meaningful, then, to take the difference between a particular measured value (observation) and the arithmetic mean of a series of measurements; this difference is called its deviation, or apparent error.
• A deviation is generally taken without regard to sign. It is defined mathematically as

d = |x – x̄|   and   D = |x – μ|

• where d is the deviation of an observation x of a finite sample from its mean x̄, D is the deviation of an individual measurement from the population mean μ, and | | denotes that the difference is taken as absolute. The reproducibility of measurements is expressed in terms of various types of deviations.
Average Deviation
• The average deviation (a.d.) or mean deviation is the average of the individual deviations:

a.d. = (Σ|xi – x̄|)/n

• where the symbols have their usual meanings.

• The ratio of the average deviation to the mean is known as the Relative Average Deviation (RAD), which can be expressed as the percent average deviation when multiplied by 100.

 Historically the average deviation has been widely employed as an estimate of precision. However, it suffers from the disadvantage that the estimate of this statistic depends upon the number of measurements: the larger the number, the better the estimate.
Standard Deviation
• Standard deviation is the most important statistic to indicate the
precision of an analysis.

• According to the International Union of Pure and Applied Chemistry (IUPAC), the symbol σ is used for the population standard deviation and the symbol s is used for the sample standard deviation.

• When the number of observations is very large (N → ∞), the standard deviation, known as the population standard deviation, is used to express the precision of a population of data and is given by the square root of the average of the squares of the deviations:

σ = √(Σ(xi – μ)²/N) = √(ΣDi²/N)
• where xi represents the individual observations, Di the individual deviations, μ the population mean, N the number of observations, and the symbol Σ denotes the summation from i = 1 to i = N.

• For most cases in analytical chemistry a finite sample is considered, where the number of observations is finite (n < 30); for a finite sample the standard deviation, known as the sample standard deviation, s, is the square root of the sum of the squares of the deviations divided by (n – 1):

s = √(Σ(xi – x̄)²/(n – 1))

where (n – 1) is known as the number of degrees of freedom and the other terms and symbols have their usual meaning.
Example 3.1
In an iron determination (taking 1 g sample every time) the
following four replicate results were obtained: 29.8, 30.2, 28.6
and 29.7 mg iron. Calculate the standard deviation of the given data.
Solution: x̄ = (29.8 + 30.2 + 28.6 + 29.7)/4 = 29.575 mg;
s = √(Σ(xi – x̄)²/(n – 1)) = √(1.4075/3) ≈ 0.68 mg.
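The calculation can be verified with statistics.stdev(), which uses the (n − 1) denominator appropriate for a sample standard deviation (a minimal sketch):

```python
import statistics

iron = [29.8, 30.2, 28.6, 29.7]       # mg Fe, four replicates
xbar = statistics.mean(iron)          # 29.575 mg
s = statistics.stdev(iron)            # sample s, divides by n - 1 = 3
print(round(xbar, 3), round(s, 2))    # 29.575 0.68
```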
Standard Deviation of the Mean
We know that the arithmetic mean of a series of n measurements is more reliable (precise) than an individual observation.
It can be shown statistically that the mean of n results is √n times as reliable as any one of the individual results.
Precision is expressed in terms of deviation, and the smaller the deviation, the more precise the result.

In other words, the deviation in the mean of a series of 4 observations is one-half that of a single observation, and the deviation in the mean of a series of 16 observations is one-fourth that of a single observation.
The deviation of the mean of a series of n measurements is the deviation of the individual values divided by the square root of n. Thus

d(mean) = d/√n

and the standard deviation of the mean (Smean) is likewise the standard deviation of the individual values (s) divided by the square root of n:

Smean = s/√n

The standard deviation of the mean is sometimes referred to as the Standard Error.
• Variance (V)
• A term that is sometimes useful in statistics is the variance (V). This is the square of the standard deviation. The sample variance is given by

V = s² = Σ(xi – x̄)²/(n – 1)

and is an estimate of the population variance, σ².

• Coefficient of variation (CV) = relative standard deviation (RSD) = 100·s/x̄.
• Example:
• Four measurements of the weight of an object whose correct weight is
0.1026 g are 0.1021, 0.1025, 0.1019 and 0.1023 g. Calculate the mean,
the median, the range, the average deviation, the relative average
deviation (%), the standard deviation, the relative standard deviation (%),
error of mean and the relative error of the mean.

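All of the quantities requested in this example can be computed together. This is a sketch using only the standard library; the variable names (ad, rad_pct, and so on) are our own shorthand, not standard functions:

```python
import statistics

true_value = 0.1026                        # g, accepted weight
w = [0.1021, 0.1025, 0.1019, 0.1023]       # g, four measurements

mean = statistics.mean(w)                             # 0.1022 g
median = statistics.median(w)                         # 0.1022 g
R = max(w) - min(w)                                   # range, 0.0006 g
ad = sum(abs(x - mean) for x in w) / len(w)           # average deviation
rad_pct = 100 * ad / mean                             # relative average deviation, %
s = statistics.stdev(w)                               # sample standard deviation
rsd_pct = 100 * s / mean                              # relative standard deviation, %
E = mean - true_value                                 # error of the mean, -0.0004 g
Er_pct = 100 * E / true_value                         # relative error of the mean, %
```

The relative error of the mean comes out close to −0.39%, i.e. about −3.9 parts per thousand.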
• Pooled standard deviation (sp)
• When we wish to calculate a standard deviation from a number of sets of analytical data obtained from several samples of varying composition, it is preferable to use the pooled standard deviation, sp.
The pooled standard deviation is sometimes used to obtain an improved estimate of the precision of a method, and it is used for calculating the precision of two sets of data in a paired ‘t’ test.
That is, rather than relying on a single set of data to describe the precision of a method, it is sometimes preferable to perform several sets of analyses, for example, on different days, or on different samples with slightly different compositions.
• If the indeterminate (random) error is assumed to be the same for each set (i.e. the same source of random error in all the measurements), then the precision of the data of the different sets can be pooled.
• This assumption is usually valid if the samples have similar composition and have been analyzed in exactly the same way.
• This provides a more reliable estimate of the precision of a method than is obtained from a single set.
• In the pooled standard deviation calculation, one degree of freedom is lost in each subset.
• Thus, the number of degrees of freedom for the pooled s is equal to the total number of measurements minus the number of subsets.
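The rule above (squared deviations summed over every subset, divided by total measurements minus the number of subsets) can be sketched as a small function; the two data sets in the usage line are illustrative, not taken from the text:

```python
import math

def pooled_sd(*data_sets):
    """Pooled s: sqrt of the summed squared deviations of every set
    about its own mean, divided by (total measurements - number of sets)."""
    ss = 0.0        # pooled sum of squared deviations
    n_total = 0     # total number of measurements
    for data in data_sets:
        m = sum(data) / len(data)
        ss += sum((x - m) ** 2 for x in data)
        n_total += len(data)
    dof = n_total - len(data_sets)   # one degree of freedom lost per subset
    return math.sqrt(ss / dof)

# Illustrative example: two small sets analyzed the same way
sp = pooled_sd([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(sp, 3))   # 1.581
```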
• 3.2 Accuracy and Precision of measurements
• A dart board is a good way to illustrate precision and accuracy.

Fig. 1 Precision
Precision is the closeness of agreement between replicate measurements. It tells us how close multiple values are to each other. It refers to the magnitude of random errors and the reproducibility of measurements.
Figure 1 illustrates a series of results that are very close to each other, i.e. results with good precision.
Measuring Precision
Precision is usually discussed in terms of the standard deviation (SD) and the percent coefficient of variation (%CV).
Fig. 2 Accuracy
Accuracy is a measure of the agreement between the estimates of a value and the
“true” value.
• Accuracy refers to how close a value is to the “true” value.
• Figure 2 illustrates a series of results that are accurate, i.e. close to the true value.
• Accuracy is expressed in terms of either absolute or relative error.
• a) Absolute error (E): is the difference between the measured value and the accepted true value. It bears a sign (it can be positive or negative); a negative sign indicates that the experimental result is smaller than the accepted value. E = Xi – Xt, where Xi is the measured value and Xt is the accepted true value.

• b) Relative error (Er): is the absolute error divided by the true value:

Er = (Xi – Xt)/Xt = E/Xt
If the measured value is the average of several measurements, the error is called the mean error.
The relative error describes the error in relation to the magnitude of the true value, and may be more useful than considering the absolute error in isolation.
The absolute error and the relative error are measures of the accuracy of a particular measurement.

Example: 1) The result of an analysis is 36.97 g, compared with the accepted value of 37.06 g. a) What is the absolute error? b) What is the relative error in parts per thousand?
Solution: Absolute error = 36.97 g – 37.06 g = –0.09 g
Relative error = (–0.09 g / 37.06 g) × 1000‰ = –2.4 ppt (‰ indicates parts per thousand, just as % indicates parts per hundred).
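The arithmetic of this example in Python:

```python
measured, accepted = 36.97, 37.06    # g

E = measured - accepted              # absolute error, g
Er_ppt = 1000 * E / accepted         # relative error, parts per thousand
print(round(E, 2), round(Er_ppt, 1))   # -0.09 -2.4
```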
Errors in chemical analysis (analytical results)
• 2.3.1 Classification of Errors
• Two main classes of errors can affect accuracy or precision of a
measured quantity. These are systematic (or determinate) error
and random (indeterminate) error.
• A third type of error is gross error. Gross errors differ from indeterminate and determinate errors in the following respects:
 They occur only occasionally, are often large, and may cause a result to be either too high or too low.
 Gross errors lead to outliers: results that appear to differ markedly from all other data in a set of replicate measurements.
• An outlier is an occasional result in replicate measurements that differs significantly from the rest of the results.
• Gross error is normally large and essentially arises when a
significant mistake has been made with the analytical procedure
itself, so rendering the reading invalid.
• 2.3.1.1 Determinate (systematic) errors (unidirectional errors)
• There are three types of systematic errors.
• a) Personal and operational errors
• These are factors for which the individual analyst is responsible and are not connected with the method or procedure.
• Personal errors arise from the carelessness, inattention or personal limitations of the experimenter.
• Personal errors may arise from the constitutional inability of the individual to make observations accurately. Many measurements require personal judgment.

• Examples include estimating position of a pointer between two
scale divisions, the color of a solution at the end point of
titration, level of liquid with respect to a graduation in a pipette
or burette.
• Some examples of operational errors include the following: mechanical loss of materials in various steps of analysis, under-washing or over-washing of precipitates, ignition of precipitates at incorrect temperature, insufficient cooling of crucibles before weighing, allowing hygroscopic materials to absorb moisture before or during weighing,
• errors during transfer of solutions, effervescence and ‘bumping’ during sample dissolution, incomplete drying of samples, mathematical errors in calculation, and prejudice in estimating measurements.
• Most personal errors can be minimized by care and self-discipline.
b) Instrumental errors
• These arise from faulty construction of balances, the use of
uncalibrated or improperly calibrated weights, graduated glassware
and other instruments.
• Generally they include faulty instruments, uncalibrated weights and uncalibrated glassware.
• Instrument errors are caused by imperfections in measuring devices and instabilities in their power supplies; all measuring devices are potential sources of systematic errors. For example, pipettes, burettes and volumetric flasks may hold or deliver volumes slightly different from those indicated by their graduations.
• These differences typically arise from using glassware at a temperature that differs significantly from the calibration temperature, from distortion of the container walls due to heating while drying, from errors in the original calibration, or from contaminants on the inner surfaces of the container.
• Systematic instrument errors are usually found and corrected by calibration.
• Periodic calibration of equipment is always desirable, because the response of most instruments changes with time as a result of wear, corrosion or mistreatment.
c) Method errors
• Method errors often arise from non-ideal chemical or physical behavior of the analytical system.
• The non-ideal chemical or physical behavior of the reagents and reactions upon which an analysis is based often introduces systematic method errors.
• Such sources of non-ideality include
 the slowness of reactions,
 the incompleteness of reactions,
 the instability of some species,
 the non-specificity of most reagents, and
 the possible occurrence of side reactions that interfere with the measurement process.
 Some other sources of method errors include
 slight solubility of a precipitate,
 side reactions, coprecipitation and post-precipitation of impurities,
 decomposition or volatilization of weighing forms on ignition,
 incomplete reaction and impurities in reagents.

 Method errors are the most serious errors of analysis. Most personal and instrumental errors can be minimized or corrected for, but errors that are inherent in the method cannot be changed unless the conditions of the determination are altered.
• However, errors inherent in a method are often difficult to detect
and are thus the most serious of the three types of systematic
errors.
• Of the three types of systematic errors encountered in chemical
analysis, method errors are usually the most difficult to identify
and correct.
• Sometimes correction can be relatively simple, for example, by
running blank titration. Bias in an analytical method is
particularly difficult to detect.

• 2.3.1.2 Random or indeterminate errors
• Random or indeterminate errors arise from the many uncontrollable variables that are an inevitable part of every physical or chemical measurement.
• They are due to causes over which the analyst has no control, and which are so intangible that they are incapable of analysis.
• They have no specific causes. There are many contributors to random error, but none can be positively identified or measured, because most are so small that they cannot be detected individually.
• Indeterminate errors are random and cannot be avoided. Indeterminate (random) errors accompany every measurement and are due to non-permanent causes, including noise present in the measurement.
• An example is the random fluctuation of electronic signals appearing in a recorded spectrum.
• Various types of random noise may occur in measurements, such as electronic noise in a detector or noise due to non-reproducible placement of a sample cuvette in the cell holder of a spectrophotometer.
• Random errors represent the experimental uncertainty that occurs in any measurement. These errors are revealed by small differences in successive measurements made by the same analyst under virtually identical conditions, and they cannot be predicted or estimated.
• Such accidental errors follow a random distribution; therefore, the mathematical laws of probability can be applied to arrive at some conclusion regarding the most probable result of a series of measurements. If a sufficiently large number of observations (measurements) are taken, it can be shown that these errors lie on the normal or Gaussian curve.
• An inspection of this normal (Gaussian) error curve shows that (a) small errors occur more frequently than large ones and (b) positive and negative errors of the same numerical magnitude are equally likely to occur.
• Random or indeterminate error causes data to be scattered
more or less symmetrically around a mean value. They are
bidirectional (positive and negative), and therefore affect the
results irregularly.
• Random errors are decreased to a certain extent by increasing
the number of measurements, but they can’t be eliminated,
since an infinite number of measurements would be required.
• In general, random error in measurement is reflected by its
precision. The total error observed in any chemical analysis is a
combination of the determinate and the random error.

Some of the features of systematic and random errors are shown in
the table below.

Quiz
The reproducibility of a method for the determination of selenium
in foods was investigated by taking nine samples from a single
batch of brown rice and determining the selenium concentration
in each. The following results were obtained:
0.07 0.07 0.08 0.07 0.07 0.08 0.08 0.09 0.08 μg g−1
Calculate the mean, standard deviation and relative standard
deviation of these results.

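A sketch of the quiz calculation (statistics.stdev uses the n − 1 denominator, as defined earlier for a sample):

```python
import statistics

se = [0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.08, 0.09, 0.08]   # µg/g Se

mean = statistics.mean(se)        # ≈ 0.0767 µg/g
s = statistics.stdev(se)          # ≈ 0.0071 µg/g
rsd_pct = 100 * s / mean          # ≈ 9.2 %
```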
Confidence limit and test of significance
• The mean of a set of analytical results is an estimate of the true mean for
the analysis. The true mean is the mean result of an infinite number of
analyses.

• It should be remembered that the true mean of a set of analytical results is identical with the actual value, i.e., the correct concentration, only when there is no determinate error in the analysis.
• In a practical situation relatively few (typically two
to five) replicate analyses of the sample are
performed. The smaller number of replicate
analyses may lead to significant deviation between
the mean of the replicate analyses and the true
mean.

• It is possible to use statistics to determine, at a certain probability level, the upper and lower boundaries between which the true mean occurs.

• Statistical theory allows us to set limits around an experimentally determined mean, x̄, within which the population mean, μ, lies with a given degree of probability.

• Those upper and lower boundaries are the confidence limits. As
the number of analyses increases, the mean of the results
approaches the actual mean and confidence limits around the
mean move closer together.

• As the degree of certainty that the true mean is within the confidence limits of the analytical mean increases, the confidence limits form a larger range around the mean.

• The confidence interval is defined as the range between the confidence limits. As the confidence level increases, the confidence interval also increases.

• Figure 2.6 shows the sampling distribution of the mean for samples of size n. If we assume that this distribution is normal, then 95% of the sample means will lie in the range given by:

μ – 1.96σ/√n < x̄ < μ + 1.96σ/√n     (2.6)

• In practice, however, we usually have one sample, of known mean, and we require a range for μ, the true value. Equation (2.6) can be rearranged to give this:

x̄ – 1.96σ/√n < μ < x̄ + 1.96σ/√n
For large samples, the confidence limits of the mean are given by

x̄ ± z·s/√n

• where the value of z depends on the degree of confidence required.
• For 95% confidence limits, z = 1.96
• For 99% confidence limits, z = 2.58
• For 99.7% confidence limits, z = 2.97
For small samples, the confidence limits of the mean are given by

x̄ ± t(n−1)·s/√n

The subscript (n − 1) indicates that t depends on this quantity, which is known as the number of degrees of freedom, d.f. (usually given the symbol ν).
The term ‘degrees of freedom’ refers to the number of independent deviations which are used in calculating s. In this case the number is (n − 1), because when (n − 1) deviations are known the last can be deduced, since Σ(xi − x̄) = 0.
• Example 2.6.1
• Calculate the 95% and 99% confidence limits of the mean for
the nitrate ion concentration measurements in Table 2.1.
• We have x̄ = 0.500, s = 0.0165 and n = 50. Using the equation above gives the 95% confidence limits as:
x̄ ± 1.96·s/√n = 0.500 ± 1.96 × 0.0165/√50 = 0.500 ± 0.0046
and the 99% confidence limits as:
x̄ ± 2.58·s/√n = 0.500 ± 2.58 × 0.0165/√50 = 0.500 ± 0.0060
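The limits can be reproduced numerically, with the z values hard-coded from the text:

```python
import math

xbar, s, n = 0.500, 0.0165, 50     # summary values from Example 2.6.1

for z, level in [(1.96, 95), (2.58, 99)]:
    half = z * s / math.sqrt(n)    # half-width of the confidence interval
    print(f"{level}% limits: {xbar:.3f} +/- {half:.4f}")
```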
Example
• The sodium ion content of a urine specimen was determined by
using an ion-selective electrode. The following values were
obtained: 102, 97, 99, 98, 101, 106 mM. What are the 95% and
99% confidence limits for the sodium ion concentration?
• The mean and standard deviation of these values are 100.5 mM and 3.27 mM respectively.
• There are six measurements and therefore 5 degrees of freedom.
• From Table A.2 the value of t5 for calculating the 95% confidence limits is 2.57, and the 95% confidence limits of the mean are given by:
x̄ ± 2.57·s/√n = 100.5 ± 2.57 × 3.27/√6 = 100.5 ± 3.4 mM
Similarly, with t5 = 4.03 the 99% confidence limits are 100.5 ± 4.03 × 3.27/√6 = 100.5 ± 5.4 mM.
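The same calculation in Python, with the t values for 5 degrees of freedom taken from Table A.2:

```python
import math
import statistics

na = [102, 97, 99, 98, 101, 106]        # mM, six measurements
xbar = statistics.mean(na)              # 100.5 mM
s = statistics.stdev(na)                # ≈ 3.27 mM
t95, t99 = 2.57, 4.03                   # Table A.2 values for 5 d.f.

half95 = t95 * s / math.sqrt(len(na))   # ≈ 3.4 mM
half99 = t99 * s / math.sqrt(len(na))   # ≈ 5.4 mM
print(f"95%: {xbar} +/- {half95:.1f} mM; 99%: {xbar} +/- {half99:.1f} mM")
```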
• Confidence intervals can be used as a test for systematic errors as
shown in the following example.
• The absorbance scale of a spectrometer is tested at a particular
wavelength with a standard solution which has an absorbance
given as 0.470. Ten measurements of the absorbance with the spectrometer give x̄ = 0.461 and s = 0.003. Find the 95% confidence interval for the mean absorbance as measured by the spectrometer, and hence decide whether a systematic error is present. The 95% confidence limits for the absorbance as measured by the spectrometer (with t9 = 2.26) are:
x̄ ± 2.26·s/√n = 0.461 ± 2.26 × 0.003/√10 = 0.461 ± 0.002
Since this confidence interval does not include the known absorbance of 0.470, a systematic error is likely to be present.
Tests of significance:
• Experimental data rarely agree completely with those expected on
the basis of a theoretical model.

• Consequently, scientists and analysts frequently must judge whether a numerical difference is a manifestation of the random errors inevitable in all measurements or of a systematic error in the measurement process.

• During comparison of two sets of experimental data, the arithmetic difference between two values can be attributed either to determinate errors, in which case the difference is significant, or to indeterminate errors, in which case the difference is insignificant.
• The checking of the significance of the observed differences is
done with special statistical tests.

• Tests of this kind make use of the null hypothesis, which assumes that the numerical quantities being compared are, in fact, the same. The probability that the observed differences appear as a result of random errors is then computed from statistical theory.

• If the observed difference is greater than or equal to the difference that would occur 5 times in 100 (the 5% probability level), the null hypothesis is considered questionable and the difference is judged to be significant.
• Other probability levels, such as 1 in 100 or 1 in 1000, may also be adopted, depending upon the certainty desired in the judgment.

• In developing a new analytical method, it is often desirable to compare the results of that method (the new or test method) with those of an accepted (standard) method.

• How can one tell if there is a significant difference between the new method and the accepted one? Certain statistical tests are useful in sharpening our judgments.
a) Comparing the precision of two measurements: the F-test
• The F-test provides a simple method for comparing the precision of two sets of measurements.
• This test is designed to indicate whether there is a significant difference between two methods based on their standard deviations.
• The sets do not necessarily have to be obtained from the same sample, as long as the samples are sufficiently alike that the sources of random error can be assumed to be the same.
• F is defined in terms of the variances of the two methods, where the variance is the square of the standard deviation:

F = s1²/s2²
• where s1 > s2. The larger s is always used as the numerator, so that the value of F is greater than unity. There are two different degrees of freedom, ν1 and ν2, where the degrees of freedom are defined as N − 1 for each set.
• If the calculated F value exceeds the tabulated F value at the selected confidence level, then there is a significant difference between the variances of the methods, and this indicates the presence of systematic errors in the measurement.
• However, if the calculated F value is less than the tabulated F value at the selected confidence level, then there is no statistically significant difference between the variances of the methods, and the scatter in the measurements is due only to random errors.
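The F statistic can be written as a small helper; the numbers in the usage line come from the later worked example with sA = 0.210 and sB = 0.641:

```python
def f_ratio(s1, s2):
    """F statistic: ratio of the two variances, larger over smaller,
    so that F >= 1."""
    v1, v2 = s1 ** 2, s2 ** 2
    return max(v1, v2) / min(v1, v2)

F = f_ratio(0.210, 0.641)
print(round(F, 2))   # 9.32, to be compared against the tabulated F value
```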
• example
• A proposed method for the determination of the chemical oxygen
demand of wastewater was compared with the standard (mercury salt)
method. The following results were obtained for a sewage effluent
sample:

For each method eight determinations were made.
Is the precision of the proposed method significantly greater than that of the standard method?
We have to decide whether the variance of the standard method is significantly greater than that of the proposed method. F is given by the ratio of the variances.
• Both samples contain eight values so the number of degrees of
freedom in each case is 7. The critical value is F7,7 = 3.787 (P =
0.05), where the subscripts indicate the degrees of freedom of the
numerator and denominator respectively.
• Since the calculated value of F (4.8) exceeds this, the variance of
the standard method is significantly greater than that of the
proposed method at the 5% probability level, i.e. the proposed
method is more precise.

• example
• The standard deviation, sA, from one set of 11 determinations was
0.210 and the standard deviation, sB, from another 9 determinations
was 0.641. Is there any significant difference between the precision
of the two sets of results at 95% confidence level?
• Solution:
F = sB²/sA² = (0.641)²/(0.210)² = 9.32
From Table 2.5, for 10 degrees of freedom (11 measurements) in the denominator and eight degrees of freedom (9 measurements) in the numerator, the tabulated F value is 3.39. Since the calculated (experimental) F value is greater than the corresponding tabulated (theoretical) value, there is a statistically significant difference between the precision of the two sets of results.
• A new colorimetric procedure for determining the glucose
content of blood serum was developed by an analyst. The
standard Folin-Wu procedure was chosen with which to compare
the results obtained by the newly developed colorimetric method.
From the following two sets of replicate analyses on the same
sample, determine whether the variance (precision) of
colorimetric method differs significantly from that of the standard
method.

• The F-test is used to determine if two variances are statistically
different.
• The tabulated F value for ν1 = 6 and ν2 = 5 is 4.95. Since the calculated value is less than the tabulated value, there is no significant difference in the precision of the two methods, and the difference in the standard deviations is due to random error.
Comparison of an experimental mean with a
known value
• In order to decide whether the difference between the measured
and standard amounts can be accounted for by random error, a
statistical test known as a significance test can be employed. As
its name implies, this approach tests whether the difference
between the two results is significant, or whether it can be
accounted for merely by random variations. Significance tests are
widely used in the evaluation of experimental results.
• In making a significance test we are testing the truth of a
hypothesis which is known as a null hypothesis, often denoted
by H0. The term null is used to imply that there is no difference
between the observed and known values other than that which
can be attributed to random variation.
• Assuming that this null hypothesis is true, statistical theory can be used to calculate the probability that the observed difference (or a greater one) between the sample mean, x̄, and the true value, μ, arises solely as a result of random errors.

• The lower the probability that the observed difference occurs by chance, the less likely it is that the null hypothesis is true.

• Usually the null hypothesis is rejected if the probability of such a difference occurring by chance is less than 1 in 20 (i.e. 0.05 or 5%). In such a case the difference is said to be significant at the 0.05 (or 5%) level.
62
• Using this level of significance there is, on average, a 1 in 20
chance that we shall reject the null hypothesis when it is in fact
true.

• In order to be more certain that we make the correct decision, a
higher level of significance can be used, usually 0.01 or 0.001
(1% or 0.1%). The significance level is indicated by writing, for
example, P (i.e. probability) = 0.05, and gives the probability of
rejecting a true null hypothesis.

• It is important to appreciate that if the null hypothesis is
retained it has not been proved that it is true, only that it has not
been demonstrated to be false. Later in the chapter the
probability of retaining a null hypothesis when it is in fact false
will be discussed.

63
• In order to decide whether the difference between x̄ and μ is
significant, that is to test H0: population mean = μ, the statistic t is
calculated:
t = (x̄ − μ)√n/s
• where x̄ = sample mean, s = sample standard deviation and n =
sample size.
• If |t| (i.e. the calculated value of t without regard to sign) exceeds a
certain critical value then the null hypothesis is rejected.

• The critical value of t for a particular significance level can be
found from Table A.2. For example, for a sample size of 10 (i.e. 9
degrees of freedom) and a significance level of 0.01, the critical
value is t9 = 3.25.

64
• Example
• In a new method for determining selenourea in water, the following
values were obtained for tap water samples spiked with 50 ng ml−1 of
selenourea: 50.4, 50.7, 49.1, 49.0, 51.1 ng ml−1. Is there any evidence of
systematic error?
• The mean of these values is 50.06 and the standard deviation is 0.956.
Adopting the null hypothesis that there is no systematic error, i.e. μ = 50,
and using the equation above:
t = (50.06 − 50)√5/0.956 = 0.14
From Table A.2, the critical value is t4 = 2.78 (P = 0.05). Since the
observed value of |t| is less than the critical value, the null hypothesis is
retained: there is no evidence of systematic error. Note again that this does not
mean that there are no systematic errors, only that they have not been
demonstrated.
65
66
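The selenourea calculation can be reproduced directly from the data on the slide; a minimal sketch in Python (scipy is used only for the critical t value):

```python
import math
import statistics

from scipy import stats

# Selenourea recoveries (ng/ml) for tap water spiked at 50 ng/ml (slide data)
values = [50.4, 50.7, 49.1, 49.0, 51.1]
mu = 50.0
n = len(values)

mean = statistics.mean(values)
s = statistics.stdev(values)            # sample standard deviation (n - 1)
t = (mean - mu) * math.sqrt(n) / s      # t = (x-bar − mu) * sqrt(n) / s

# Two-tailed critical value at P = 0.05 with n − 1 = 4 degrees of freedom
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)
print(f"mean = {mean:.2f}, s = {s:.3f}, |t| = {abs(t):.2f}, t_crit = {t_crit:.2f}")
```

|t| comes out well below the critical value 2.78, so the null hypothesis of no systematic error is retained, matching the slide.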
Comparison of two experimental means
• Another way in which the results of a new analytical method may
be tested is by comparing them with those obtained by using a
second (perhaps a reference) method. In this case we have two
sample means, x̄1 and x̄2.

• Taking the null hypothesis that the two methods give the same
result, that is H0: μ1 = μ2, we need to test whether (x̄1 − x̄2) differs
significantly from zero.

• If the two samples have standard deviations which are not
significantly different, a pooled estimate, s, of the standard
deviation can be calculated from the two individual standard
deviations s1 and s2.
67
• In order to decide whether the difference between two sample
means is significant, that is to test the null hypothesis
H0: μ1 = μ2, the statistic t is calculated:
t = (x̄1 − x̄2)/[s√(1/n1 + 1/n2)]
• where s is calculated from:
s² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2)
• and t has n1 + n2 − 2 degrees of freedom.

• This method assumes that the samples are drawn from
populations with equal standard deviations.

68
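The pooled two-sample t statistic described above can be written out from first principles; the function name and the two example data sets below are my own illustration, not from the slides:

```python
import math

def pooled_t(sample1, sample2):
    """Two-sample t statistic using a pooled standard deviation.
    Assumes both samples come from populations with equal sigma."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)   # (n1 - 1) * s1^2
    ss2 = sum((x - m2) ** 2 for x in sample2)   # (n2 - 1) * s2^2
    # Pooled variance: s^2 = [(n1-1)s1^2 + (n2-1)s2^2] / (n1 + n2 - 2)
    s = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2                       # t and its degrees of freedom

# Made-up example: two sets of five replicate results
t, dof = pooled_t([10.2, 10.4, 10.0, 10.3, 10.1],
                  [10.5, 10.7, 10.6, 10.8, 10.4])
print(f"t = {t:.2f} with {dof} degrees of freedom")
```

The resulting t is then compared with the tabulated value for n1 + n2 − 2 degrees of freedom, exactly as in the worked examples that follow.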
• In a comparison of two methods for the determination of
chromium in rye grass, the following results (mg kg−1 Cr) were
obtained:

• Method 1: mean = 1.48; standard deviation 0.28
• Method 2: mean = 2.33; standard deviation 0.31
• For each method five determinations were made.

• Do these two methods give results having means which differ
significantly?
• The null hypothesis adopted is that the means of the results given
by the two methods are equal. The pooled value of the standard
deviation is given by:
s² = (4 × 0.28² + 4 × 0.31²)/8 = 0.0873, so s = 0.295, and
t = (2.33 − 1.48)/[0.295√(1/5 + 1/5)] = 4.56
69
• There are 8 degrees of freedom, so (Table A.2) the critical value is
t8 = 2.31 (P = 0.05): since the experimental value of |t| is
greater than this, the difference between the two results is
significant at the 5% level and the null hypothesis is rejected.

• In fact since the critical value of t for P = 0.01 is about 3.36, the
difference is significant at the 1% level. In other words, if the null
hypothesis is true the probability of such a large difference arising
by chance is less than 1 in 100.
70
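The chromium comparison can be checked from the summary statistics alone, since the pooled standard deviation needs only n and the means and standard deviations of the two methods; a sketch:

```python
import math

from scipy import stats

# Summary statistics from the slide (mg/kg Cr, five determinations each)
n1, mean1, s1 = 5, 1.48, 0.28
n2, mean2, s2 = 5, 2.33, 0.31

# Pooled standard deviation
s = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t = (mean1 - mean2) / (s * math.sqrt(1 / n1 + 1 / n2))
dof = n1 + n2 - 2

# Two-tailed critical value at P = 0.05
t_crit = stats.t.ppf(1 - 0.05 / 2, df=dof)
print(f"s = {s:.3f}, |t| = {abs(t):.2f}, t_crit = {t_crit:.2f}")
```

|t| comfortably exceeds the critical value 2.31, so the difference between the two methods is significant, as stated on the slide.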
• In a series of experiments on the determination of tin in foodstuffs,
samples were boiled with hydrochloric acid under reflux for
different times. Some of the results are shown below:

Does the mean amount of tin found differ significantly for the two boiling times?
The mean and variance (square of the standard deviation) for the two times are:

The null hypothesis is adopted that boiling has no effect on the amount of tin
found. By equation (3.3), the pooled value for the variance is given by:

71
• There are 10 degrees of freedom so the critical value is t10 =
2.23 (P = 0.05). The observed value of |t| (= 0.88) is less than
the critical value, so the null hypothesis is retained: there is no
evidence that the length of boiling time affects the recovery
rate.

72
• If the population standard deviations are unlikely to be equal
then it is no longer appropriate to pool sample standard
deviations in order to give an overall estimate of standard
deviation. An approximate method in these circumstances is
given below:
• In order to test H0: μ1 = μ2 when it cannot be assumed that the
two samples come from populations with equal standard
deviations, the statistic t is calculated:
t = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2)     (3.4)
with the number of degrees of freedom given by
df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]     (3.5)
truncated to an integer.
73
• The data below give the concentration of thiol (mM) in the blood
lysate of two groups of volunteers, the first group being ‘normal’
and the second suffering from rheumatoid arthritis:
• Normal: 1.84, 1.92, 1.94, 1.92, 1.85, 1.91, 2.07
• Rheumatoid: 2.81, 4.06, 3.62, 3.27, 3.27, 3.76

• The null hypothesis adopted is that the mean concentration of
thiol is the same for the two groups.
• The reader can check that:
n1 = 7, x̄1 = 1.921, s1 = 0.076; n2 = 6, x̄2 = 3.465, s2 = 0.440
Substitution in equation (3.4) gives t = −8.48 and substitution in equation (3.5) gives 5.3,
which is truncated to 5. The critical value is t5 = 4.03 (P = 0.01) so the null hypothesis is
rejected: there is sufficient evidence to say that the mean concentration of thiol differs
between the groups.
74
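The thiol comparison uses the unpooled (unequal-variance) form of the t-test, which scipy implements as ttest_ind with equal_var=False (Welch's test). Note that scipy uses the fractional degrees of freedom rather than truncating, so the P value may differ slightly from a table lookup:

```python
from scipy import stats

# Thiol concentrations (mM) from the slide
normal = [1.84, 1.92, 1.94, 1.92, 1.85, 1.91, 2.07]
rheumatoid = [2.81, 4.06, 3.62, 3.27, 3.27, 3.76]

# equal_var=False selects Welch's test: the standard deviations are
# NOT pooled, matching the unequal-variance formula on the slide.
t, p = stats.ttest_ind(normal, rheumatoid, equal_var=False)
print(f"t = {t:.2f}, P = {p:.5f}")
```

t reproduces the slide's value of −8.48, and P is far below 0.01, so the null hypothesis is rejected, as in the worked example.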
Outliers
• Every experimentalist is familiar with the situation in which one
(or possibly more) of a set of results appears to differ
unreasonably from the others in the set. Such a measurement is
called an outlier.
• In order to use Grubbs’ test for an outlier, that is to test H0 : all
measurements come from the same population, the statistic G is
calculated:
G = |suspect value − x̄|/s     (3.8)
• where x̄ and s are calculated with the suspect value included.
• The test assumes that the population is normal.

75
• The following values were obtained for the nitrite concentration
(mg l−1) in a sample of river water:
0.403, 0.410, 0.401, 0.380
• The last measurement is suspect: should it be rejected? The four
values have x̄ = 0.3985 and s = 0.01292, giving
G = |0.380 − 0.3985|/0.01292 = 1.43
• From Table A.5, for sample size 4, the critical value of G is
1.481 (P = 0.05). Since the calculated value of G does not
exceed 1.481, the suspect measurement should be retained.

76
• If three further measurements were added to
those given in the example above so that the
complete results became:
• 0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.408
• should 0.380 still be retained?
• The seven values have x̄ = 0.4021 and s = 0.01088.
The calculated value of G is now
G = |0.380 − 0.4021|/0.01088 = 2.03
• The critical value of G (P = 0.05) for a sample size of
7 is 2.020, so the suspect measurement is now
rejected at the 5% significance level.
77
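Grubbs’ test is simple enough to implement directly from equation (3.8); the helper function below is my own sketch, applied to the two nitrite data sets from the examples above:

```python
import statistics

def grubbs_G(values, suspect):
    """G = |suspect − mean| / s, with the mean and s computed
    with the suspect value included (equation 3.8)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)        # sample standard deviation (n - 1)
    return abs(suspect - mean) / s

# Four-value set: critical G (P = 0.05, n = 4) is 1.481 -> 0.380 retained
four = [0.403, 0.410, 0.401, 0.380]
g4 = grubbs_G(four, 0.380)

# Seven-value set: critical G (P = 0.05, n = 7) is 2.020 -> 0.380 rejected
seven = [0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.408]
g7 = grubbs_G(seven, 0.380)

print(f"n = 4: G = {g4:.3f};  n = 7: G = {g7:.3f}")
```

The extra three measurements shrink the standard deviation enough that the same suspect value crosses the critical threshold, illustrating why the decision depends on the whole data set.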
• Dixon’s test (sometimes called the Q-test) is another test for
outliers which is popular because the calculation is simple. For
small samples (size 3 to 7) the test assesses a suspect
measurement by comparing the difference between it and the
measurement nearest to it in size with the range of the
measurements.

78
• In order to use Dixon’s test for an outlier, that is to test H0 : all
measurements come from the same population, the statistic Q is
calculated:
Q = |suspect value − nearest value|/(largest value − smallest value)
• This test is valid for sample sizes 3 to 7 and assumes that the
population is normal.

• The critical values of Q for P = 0.05 for a two-sided test are given
in Table A.6. If the calculated value of Q exceeds the critical value,
the suspect value is rejected.

79
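Dixon’s Q can likewise be computed in a few lines; the helper below is my own sketch and tests whichever extreme value is the more suspect (the critical values themselves must still be looked up in Table A.6):

```python
def dixon_q(values):
    """Dixon's Q for the more extreme of the two end values:
    Q = |suspect − nearest| / range. Valid for sample sizes 3 to 7."""
    data = sorted(values)
    spread = data[-1] - data[0]
    q_low = (data[1] - data[0]) / spread      # lowest value as suspect
    q_high = (data[-1] - data[-2]) / spread   # highest value as suspect
    return max(q_low, q_high)

# Nitrite data from the Grubbs example above; suspect value is 0.380
values = [0.403, 0.410, 0.401, 0.380]
print(f"Q = {dixon_q(values):.3f}")
```

Here Q = 0.700 for the suspect value 0.380; if this does not exceed the critical value for n = 4 from Table A.6, the value is retained, consistent with the Grubbs result for the same data.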
80
