CHM025 - Topic II Evaluation of Analytical Data

CHM020 Topic II:
Evaluation of Analytical Data

“It is impossible to perform a chemical
analysis that is totally free of errors or
uncertainties. All we can hope is to minimize
these errors and to estimate their size
with acceptable accuracy.”
2nd Semester 2020-2021

Errors in Chemical Analysis
 Every measurement is influenced by many
uncertainties, which combine to produce a scatter of
results.
 Measurement uncertainties can never be completely
eliminated, so the true value for any quantity is
always unknown.
 Therefore, it would be wrong to say that blood
testing is “error-free” when presenting data to
court. Remember: data of unknown quality are
worthless.
Measures of Central Tendency
 Chemists usually carry three to five replicates
(portions) of a sample through an analytical
procedure.
 Individual results from a set of measurements are
seldom the same, so a central or “best” value is
used for the set.
1. Mean, $\bar{x}$
 arithmetic mean or average
 the quantity obtained by dividing the sum of
replicate measurements ($x_i$) by the number of
measurements (N) in the set.
Mathematically speaking:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + \dots + x_N}{N}$$
2. Median, M
 middle value of a sample of results arranged in
order of increasing/decreasing magnitude.
 odd number of results
 take the middle value
 even number of results
 take the mean of the two middle values
Example 1
Calculate the mean and the median for the following data:
20.3, 19.4, 19.8, 20.1, 19.6, 19.5
Mean = 19.8
Median = 19.7
Ideally, the mean and the median are identical.
Frequently they are not, particularly when the number of
measurements in the set is small.
The median is used advantageously when a set of
data contains an outlier. An outlier can have a
significant effect on the mean but has a smaller effect
on the median.
3. Mode
 the value that occurs most frequently in a set of
determinations.
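As a quick check of Example 1, the three measures of central tendency can be computed with Python's standard `statistics` module. This is only a minimal sketch; the variable names are illustrative and not part of the original notes.

```python
import statistics

# Replicate results from Example 1
results = [20.3, 19.4, 19.8, 20.1, 19.6, 19.5]

mean = statistics.mean(results)      # sum of the results divided by N
median = statistics.median(results)  # even N: mean of the two middle values

print(f"mean   = {mean:.1f}")
print(f"median = {median:.1f}")

# multimode returns every value that ties for most frequent. All six
# results here are distinct, so the set has no single meaningful mode.
modes = statistics.multimode(results)
print(f"values tied for mode: {len(modes)}")
```

With only six replicates and no repeated value, the mode carries no information here, which is why the mean and median are the measures reported in the example.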
Precision
 describes the reproducibility of the measurements.
 tells how close the results are to one another, provided that
they are obtained in exactly the same way.
 deals with repeatability (within-run) and reproducibility
(between-run).
 three terms are widely used to describe the precision of a set of
replicate data:
 standard deviation
 variance
 coefficient of variation
All these terms are a function of the deviation from the mean, which
is defined as:
$$d_i = |x_i - \bar{x}|$$
Accuracy
 indicates the closeness of a measurement to its
true or accepted value, $x_t$, and is expressed by
the error.
 is expressed in terms of:
a. absolute error, $E = x_i - x_t$
b. relative error, $E_r$
$$E_r = \frac{x_i - x_t}{x_t} \times 100\%$$
(in terms of %)
$$E_r = \frac{x_i - x_t}{x_t} \times 1000$$
(in terms of ppt)
Illustration
Tell whether each set of replicates shows (high or low) accuracy
and (high or low) precision.
(X represents the true value; the plotted points represent the replicates.)
[Figure: four panels, labeled in turn: low accuracy, low precision;
low accuracy, high precision; high accuracy, low precision;
high accuracy, high precision.]
Example 2
Determine the relative error (in % and ppt) and the
absolute error for the mean in Example 1 above, given that
the true value is 20.0.
Absolute error, E = 19.8 − 20.0 = −0.2
Relative error, Er = −1% = −10 ppt
(The negative sign shows that the result is lower than the true value.)
Note: Accuracy measures the agreement between a
result and its true value. Precision describes the
agreement among several results that have been
measured in the same way.
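The error calculation in Example 2 can be sketched in a few lines of Python. The function names below are hypothetical helpers written for this illustration. With the sign convention E = xi − xt, a result that lies below the true value gives a negative error.

```python
def absolute_error(x_i, x_t):
    """Absolute error E = x_i - x_t (negative when the result is low)."""
    return x_i - x_t


def relative_error(x_i, x_t, scale=100.0):
    """Relative error; scale=100 gives percent, scale=1000 gives ppt."""
    return (x_i - x_t) / x_t * scale


mean_result = 19.8  # mean from Example 1
true_value = 20.0   # true value given in Example 2

E = absolute_error(mean_result, true_value)
Er_percent = relative_error(mean_result, true_value, scale=100.0)
Er_ppt = relative_error(mean_result, true_value, scale=1000.0)

print(f"E  = {E:+.1f}")
print(f"Er = {Er_percent:+.1f} %")
print(f"Er = {Er_ppt:+.0f} ppt")
```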
Types of Errors
1. Random / Indeterminate Error
 causes data to be scattered more or less symmetrically around a mean
value.
 reflected by the precision.
2. Systematic / Determinate Error
 causes the mean of a set of data to differ from the accepted value.
 causes the results in a series of replicate measurements to be all high or
all low.
3. Gross Error
 occurs only occasionally, is often large, and may cause a result to be
either high or low.
 leads to outliers, results that obviously differ significantly from the rest of
the data of replicate measurements.
Sources of Systematic Errors
a. Instrumental error
 caused by imperfection of measuring devices and
instabilities in their power supplies.
 glassware used at temperatures that differ from its
calibration temperature
 distortion of container walls
 errors in the original calibration
 contaminants on the inner surface of the containers

b. Methodic error
 arises from non-ideal chemical or physical behavior of
analytical systems.
 due to slowness of some reactions
 incompleteness of a reaction
 instability of some species
 nonspecificity of the reagents
 possible occurrence of side reactions

c. Personal error
 results from the carelessness, inattention or personal
limitations of the experimenter.
 estimating the level of the liquid between two scale
divisions
 the color of the solution at the end point in a titration
Persons who make measurements must guard against
personal bias to preserve the integrity of the collected
data. Of the three types of systematic errors encountered
in a chemical analysis, methodic errors are usually the
most difficult to identify and correct.
Detection of Systematic Error:
a. systematic instrument error
 usually found and corrected by calibration.
 periodic calibration of the equipment is always desirable
because the response of most instruments changes with time as
a result of wear, corrosion and mistreatment.
b. systematic personal error
 can be minimized by care and self-discipline.
 a good habit is to check the instrument readings, notebook
entries, and calculations systematically.
c. systematic methodic error
 bias inherent in an analytical method is particularly difficult to detect.
 One or more of the following steps can be taken to recognize and
adjust for a systematic error in an analytical
method:
a. analysis of standard samples
 the analysis of the standard reference materials, SRM
(materials that contain one or more analytes with
exactly known concentration levels.)
 standard materials can be purchased or sometimes
prepared by synthesis, but unfortunately synthesis is often
impossible, or so difficult and time consuming that this
approach is not practical.
Standard Reference Materials (SRM)
 SRMs can be purchased from a number of governmental and
industrial sources (e.g., the National Institute of Standards and
Technology, NIST, which offers over 900 SRMs).
 The concentration of an SRM has been determined in one of
three ways:
 through analysis by a previously validated reference method,
 through analysis by two or more independent, reliable measurement
methods,
 through analysis by a network of cooperating laboratories that are
technically competent and thoroughly knowledgeable about the material
being tested.
b. independent analysis
 a second independent and reliable analytical
method is used in parallel with the method
being evaluated.
 the independent method should differ as much as possible
from the method under study.
 This minimizes the possibility that some common
factor in the sample has the same effect on both
methods.
c. blank determination
 useful for detecting certain types of constant errors.
 all steps of the analysis are performed in the absence
of the sample.
 the results from the blank are then applied as a
correction to the sample measurements.
 this reveals errors due to interfering contaminants from
the reagents and vessels employed in the analysis.
 this also allows the analyst to correct titration data for
the volume of reagent needed to cause an indicator to
change color at the end point.
d. variation in sample size
 can detect constant errors (as the size of the
measurement increases, the effect of a constant
error decreases).
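The reasoning behind varying the sample size can be illustrated with a short calculation. The numbers below are hypothetical (a constant loss of 0.5 mg and three arbitrary sample sizes); they are assumptions chosen only to show that the relative effect of a constant error shrinks as the measured quantity grows.

```python
# Hypothetical constant error: 0.5 mg is lost in every determination,
# regardless of how much material is measured.
CONSTANT_ERROR_MG = -0.5


def relative_error_percent(constant_error, measured_size):
    """Relative error (%) caused by a constant error for a given sample size."""
    return constant_error / measured_size * 100.0


# Illustrative sample sizes in mg (assumed values, not from the notes)
for size_mg in (50.0, 200.0, 500.0):
    rel = relative_error_percent(CONSTANT_ERROR_MG, size_mg)
    print(f"{size_mg:6.1f} mg measured -> relative error {rel:+.2f} %")
```

If results change systematically as the sample size is varied, a constant error is a likely cause.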
Random Errors
 arise when a system of measurement is extended to
its maximum sensitivity. This type of error is caused
by many uncontrollable variables that are an
inevitable part of every physical or chemical
measurement.
 the accumulated effect of the individual
indeterminate uncertainties, however, causes
replicate measurements to fluctuate randomly
around the mean of the set.
[Figure 6-3: A histogram showing the distribution of the 50 results in
calibrating a 10-mL pipet, together with a Gaussian curve for data having
the same mean and standard deviation as the data in the histogram.]
Sources of Random Errors
The frequency distribution of some results or measurements
can be plotted into different forms of graphs.
[Figure: (a) Frequency distribution for measurements containing 4 random
uncertainties; (b) Frequency distribution for measurements containing 10
random uncertainties; (c) Frequency distribution for measurements
containing a very large number of random uncertainties.]
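The way individual random uncertainties accumulate can be explored with a small simulation. This sketch assumes a simple model that is not stated in the notes: every uncertainty has the same magnitude U and is equally likely to be positive or negative. The function name and parameters are hypothetical.

```python
import random
from collections import Counter


def simulate_deviations(n_uncertainties, n_trials, u=1.0, seed=0):
    """Return the net deviation from the mean for each simulated measurement.

    Assumed model: each of the n_uncertainties contributes +u or -u
    with equal probability, and the contributions simply add.
    """
    rng = random.Random(seed)
    return [
        sum(rng.choice((-u, u)) for _ in range(n_uncertainties))
        for _ in range(n_trials)
    ]


n_trials = 10_000
for n in (4, 10):
    counts = Counter(simulate_deviations(n, n_trials))
    print(f"{n} random uncertainties:")
    for dev in sorted(counts):
        # relative frequency of each possible net deviation
        print(f"  {dev:+5.0f} U  {counts[dev] / n_trials:.3f}")
```

Under this model, small net deviations are the most frequent, and the distribution approaches the bell shape of a Gaussian curve as the number of uncertainties grows.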
Statistical Treatment of Random Errors
 sample – a finite number of experimental
observations; a tiny fraction of infinite number of
observations.
 population or universe – theoretical infinite number
of data.
 population mean, μ – true mean of the population;
in the absence of any systematic error, this is also
the true value for the measured quantity.
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
where N → ∞
 sample mean, $\bar{x}$ – the mean of a limited sample drawn
from the population of the data.
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$
where N is finite
Measures of Precision
 population standard deviation, σ – a measure of the
precision of a population of data and is
mathematically given by:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
 sample standard deviation, s – a measure of the
precision of a sample of data and is
mathematically given by:
$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$
 standard deviation of the mean, S or $s_m$
$$S = s_m = \frac{s}{\sqrt{N}}$$
Other ways of expressing precision:
 Variance, $s^2$ – simply the square of the standard
deviation.
$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$
 Relative standard deviation, RSD, and coefficient of
variation, CV
$$\mathrm{RSD} = \frac{s}{\bar{x}} \times 1000\ \text{ppt}$$
$$\mathrm{CV} = \frac{s}{\bar{x}} \times 100\%$$
 Spread or range, w – the difference between the largest
value and the smallest in the set of data.
w = highest value – lowest value
Example 3
The following results were obtained in the replicate
determination of the lead content of a blood sample:
0.752, 0.756, 0.752, 0.751, and 0.760 ppm Pb.
Calculate the mean, standard deviation, standard
deviation of the mean and relative standard
deviation.
Mean = 0.754 ppm Pb
Standard deviation = 0.004 ppm Pb
Standard deviation of the mean = 0.002 ppm Pb
Relative standard deviation (RSD) = 4.996 ppt
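The quantities in Example 3 can be reproduced with Python's standard library. This is a sketch for checking the arithmetic; the variable names are illustrative. Note that `statistics.stdev` and `statistics.variance` use the N − 1 denominator, which matches the sample standard deviation s rather than σ.

```python
import math
import statistics

# Replicate lead results from Example 3, in ppm Pb
lead_ppm = [0.752, 0.756, 0.752, 0.751, 0.760]
n = len(lead_ppm)

mean = statistics.mean(lead_ppm)
s = statistics.stdev(lead_ppm)            # sample standard deviation (N - 1)
variance = statistics.variance(lead_ppm)  # s squared
s_mean = s / math.sqrt(n)                 # standard deviation of the mean
rsd_ppt = s / mean * 1000                 # relative standard deviation, ppt
cv_percent = s / mean * 100               # coefficient of variation, %
spread = max(lead_ppm) - min(lead_ppm)    # range, w

print(f"mean     = {mean:.4f} ppm Pb")
print(f"s        = {s:.5f} ppm Pb")
print(f"variance = {variance:.2e}")
print(f"s_mean   = {s_mean:.5f} ppm Pb")
print(f"RSD      = {rsd_ppt:.3f} ppt")
print(f"CV       = {cv_percent:.4f} %")
print(f"range    = {spread:.3f} ppm Pb")
```

Carrying the unrounded value of s through the calculation, and rounding only at the end, is what gives the RSD quoted in the example.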
Reliability of s as a Measure of Precision
 Most of the statistical tests described are based
upon sample standard deviations, and the
probability of correctness of the results of these
tests improves as the reliability of s becomes
greater. Uncertainty in the calculated value of s
decreases as N increases. When N is greater than
20, s and σ can be assumed to be identical for all
practical purposes.
Pooling Data to Improve the Reliability of s
 data from a series of similar samples accumulated over time
can often be pooled to provide an estimate of s superior to
the value of the individual subset.
 mathematical equation of the superior s or pooled standard
deviation:
$$s_{\text{pooled}} = \sqrt{\frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{j=1}^{N_2} (x_j - \bar{x}_2)^2 + \dots}{N_1 + N_2 + N_3 + \dots - N_T}}$$
where: $N_1$ = number of data in set 1
$N_2$ = number of data in set 2
$N_T$ = number of data sets that are pooled
$N_1 + N_2 + \dots - N_T$ = degrees of freedom
Example: S pooled
Glucose levels are routinely monitored in patients suffering from
diabetes. The glucose concentrations in a patient with mildly elevated
glucose levels were determined in different months by a
spectrophotometric analytical method. The patient was placed on a
low-sugar diet to reduce the glucose levels. The following results were
obtained during a study to determine the effectiveness of the diet.
Calculate a pooled estimate of the standard deviation for the method.
[Data table not recovered. The worked solution pools four data sets
containing 7, 5, 5, and 7 results; the sums of squared deviations from
the set means are 1687.43, 1182.80, 1086.80, and 2950.86, for a total
of 6907.89.]
$$s_{\text{pooled}} = \sqrt{\frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{j=1}^{N_2} (x_j - \bar{x}_2)^2 + \dots}{N_1 + N_2 + N_3 + \dots - N_T}}$$
$$s_{\text{pooled}} = \sqrt{\frac{1687.43 + 1182.80 + 1086.80 + 2950.86}{7 + 5 + 5 + 7 - 4}} = \sqrt{\frac{6907.89}{24 - 4}} \approx 18.58$$
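The pooled standard deviation can be sketched in Python as follows. Both function names are hypothetical helpers written for this illustration. Because the individual glucose readings are not available in these notes, the check at the end uses only the sums of squares and set sizes that appear in the worked solution (1687.43, 1182.80, 1086.80, 2950.86 and 7, 5, 5, 7).

```python
import math


def pooled_std(groups):
    """Pooled standard deviation from raw replicate data.

    groups: a list of data sets, each a list of replicate results.
    One degree of freedom is lost for each set mean that is computed.
    """
    sum_sq = 0.0
    n_total = 0
    for group in groups:
        group_mean = sum(group) / len(group)
        sum_sq += sum((x - group_mean) ** 2 for x in group)
        n_total += len(group)
    return math.sqrt(sum_sq / (n_total - len(groups)))


def pooled_std_from_sums(sums_of_squares, counts):
    """Pooled standard deviation when the sums of squares are already known."""
    degrees_of_freedom = sum(counts) - len(counts)
    return math.sqrt(sum(sums_of_squares) / degrees_of_freedom)


# Values taken from the worked solution above
s_pooled = pooled_std_from_sums(
    [1687.43, 1182.80, 1086.80, 2950.86],
    [7, 5, 5, 7],
)
print(f"s_pooled = {s_pooled:.2f}")
```

Pooling assumes that every data set has the same sources of random error, so it is appropriate only when the same method is applied to similar samples.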
The Confidence Limit
 The mean, μ, of a population of
data can never be determined exactly because such
a determination requires that an infinite number of
measurements be made. Statistical theory, however,
allows us to set limits around an experimentally
determined mean within which the population mean
lies with a given degree of probability. These limits
are called confidence limits, and the interval they
define is known as the confidence interval, CI.
$$\text{CI for } \mu = \bar{x} \pm \frac{t s}{\sqrt{N}}$$
The value of t depends on the desired confidence
level and on the number of degrees of freedom (N – 1) in
the calculation of s.
Example 4
From the same set of data in Example 3:
0.752, 0.756, 0.752, 0.751, and 0.760 ppm Pb.
Mean = 0.754
Standard deviation = 0.004
Standard deviation of the mean = 0.002
Relative standard deviation (RSD) = 4.996 ppt
Calculate the 95% confidence interval.
$$\text{CI for } \mu = \bar{x} \pm \frac{t s}{\sqrt{N}}$$
For 95% confidence and N – 1 = 4 degrees of freedom, t = 2.78.
$$\text{Confidence limit} = 0.754 \pm \frac{2.78\,(0.00377)}{\sqrt{5}} = 0.754 \pm 0.005$$
Confidence interval = 0.750 to 0.759 ppm Pb
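The confidence interval of Example 4 can be checked with a short Python sketch. The helper name is hypothetical, and the t value is passed in by hand (2.78 for 95% confidence and 4 degrees of freedom, as used in the example) rather than looked up from a statistics library.

```python
import math
import statistics


def confidence_interval(data, t_value):
    """Return (lower, upper) limits: mean +/- t * s / sqrt(N)."""
    n = len(data)
    mean = statistics.mean(data)
    half_width = t_value * statistics.stdev(data) / math.sqrt(n)
    return mean - half_width, mean + half_width


lead_ppm = [0.752, 0.756, 0.752, 0.751, 0.760]
T_95_DOF4 = 2.78  # t for 95% confidence, N - 1 = 4 degrees of freedom

low, high = confidence_interval(lead_ppm, T_95_DOF4)
print(f"95% CI: {low:.4f} to {high:.4f} ppm Pb")
```

For a different confidence level or number of replicates, only the t value passed to the function changes.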
Detection of Gross Error
 When a set of data contains an outlying result that
appears to differ excessively from the average, the
decision must be made whether to retain or reject it.
It is an unfortunate fact that no universal rule can be
invoked to settle the question of retention or
rejection.
The Q-test
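 is a simple, widely used statistical test; Qexp is the
absolute value of the difference between the questionable result Xq
and its nearest neighbor Xn (with the results arranged in
increasing or decreasing order) divided by the range
or spread, w, of the entire set.
$$Q_{\text{exp}} = \frac{|x_q - x_n|}{w}$$
 if Qexp is greater than the critical value Qcrit at the chosen
confidence level, the questionable result may be rejected;
otherwise it is retained.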
Example 5
Apply the Q-test to the following set of data and determine
whether the outlying result is retained or rejected at 95%.
 41.27, 41.71, 41.84, 41.78
Arranged in order: 41.27 (Xq), 41.71 (Xn), 41.78, 41.84
w, range = 41.84 − 41.27 = 0.57
$$Q_{\text{exp}} = \frac{|41.27 - 41.71|}{0.57} = 0.772$$
Qexp = 0.772 < Qcrit = 0.829, so the result is retained.
So, Mean = 41.65 and Median = 41.75
Reported value is 41.75
What if the outlier is to be rejected? The reported value is the mean
(without the outlier) = 41.78
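The Q-test procedure can be written as a small Python function. This is a sketch under stated assumptions: the function name is hypothetical, it tests only the single most extreme value at either end of the ordered data, and the critical value must be supplied from a table (0.829 for four results at 95%, as quoted in Example 5).

```python
def q_test(data, q_crit):
    """Apply the Q-test to the most suspect end value of a data set.

    Returns (suspect, q_exp, reject), where reject is True when
    q_exp exceeds the supplied critical value q_crit.
    """
    if len(data) < 3:
        raise ValueError("the Q-test needs at least three results")
    ordered = sorted(data)
    spread = ordered[-1] - ordered[0]
    if spread == 0:
        raise ValueError("all results are identical; nothing to test")

    # Gap between each end value and its nearest neighbor
    low_gap = ordered[1] - ordered[0]
    high_gap = ordered[-1] - ordered[-2]
    if low_gap >= high_gap:
        suspect, gap = ordered[0], low_gap
    else:
        suspect, gap = ordered[-1], high_gap

    q_exp = gap / spread
    return suspect, q_exp, q_exp > q_crit


suspect, q_exp, reject = q_test([41.27, 41.71, 41.84, 41.78], q_crit=0.829)
print(f"suspect = {suspect}, Qexp = {q_exp:.3f}, reject = {reject}")
```

Supplying the critical value as an argument keeps the sketch independent of any particular table of Qcrit values.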
Recommendation for Treatment of Outliers:
 Reexamine carefully all data relating to the outlying result
to see if a gross error could have affected its value.
 If possible, estimate the precision that can be reasonably
expected from the procedure to be sure that the outlying
result actually is questionable.
 If more data cannot be secured, apply the Q-test to the existing
set to see if the doubtful result should be retained or rejected on
statistical grounds.
 If the Q-test indicates retention, consider reporting the median
of the set rather than the mean. The median has the great
virtue of allowing inclusion of all data in a set without undue
influence from an outlying value. In addition, the median of
a normally distributed set containing 3 measurements
provides a better estimate of the correct value than the
mean of the set after an outlying value has been discarded.
Least-Squares Method
(A tool for calibration plots)
 Most analytical methods are based on
experimentally determined/derived calibration
plots/curves in which a measured quantity, y, is
plotted as a function of x. The ordinate is the
dependent variable and the abscissa is the
independent variable. As is typical and desirable,
the plot approximates a straight line. Note,
however, that indeterminate errors arise in the
measurement process, and consequently not all data fall
exactly on the same line.
Assumptions:
1. There is actually a linear relationship between the measured
variable, y, and the analyte concentration, x.
Recall: Equation of the Line: y = mx + b
where b = y-intercept (the value of y when x is zero)
m = slope of the line.
2. Any deviation of the individual points from the straight line
results from error in the measurement of y; that is, there is no error
in the x values of the points.
Application of Statistics to Data Treatment
and Evaluation
Experimentalists use statistical calculations to sharpen their
judgments concerning the quality of experimental
measurements. The most common applications of statistics to
analytical chemistry include:
a. establishing confidence limits for the mean of a set of
replicate data.
b. determining the number of replications required to
decrease the confidence limit for a mean for a given level
of confidence.
c. determining at a given probability whether an
experimental mean is different from the accepted value
for the quantity being measured.
d. determining at a given probability level whether two
experimental means are different.
e. determining at a given probability level whether the precision
of two sets of measurements differs.
f. deciding whether an outlier is probably the result of a
gross error and should be discarded in calculating a mean.
g. defining and estimating detection limits.
h. treating calibration data.
i. quality control of analytical data and of industrial
products.
Exercises 1
1. Consider the following set of replicate measurements:
0.0902, 0.0980, 0.0956, 0.1000, 0.0925
Calculate the:
a. mean,
b. median,
c. range,
d. standard deviation,
e. coefficient of variation,
f. standard deviation of the mean
g. 95% confidence limit.
(Observe rules on significant figures)

Exercises 2
2. Apply Q-test to the following set of data and determine
whether the outlying result is retained or rejected at 95%
probability. What is then the reported value?
 7.290, 7.284, 7.388, 7.292
Exercises 3
3. The sulfate ion concentration in natural water can be
determined by measuring the turbidity that results when an excess
of BaCl₂ is added to a measured quantity of the sample. A
turbidimeter, the instrument used for this analysis, was calibrated
with a series of standard Na₂SO₄ solutions. The following data
were obtained for the calibration:

Cx, mg SO₄²⁻/L    Turbidimeter reading, R
 0.00             0.06
 5.00             1.48
10.00             2.28
15.00             3.98
20.00             4.61

Assume that a linear relationship exists between the instrument
reading and concentration.
a. Derive an equation of a line that tells us the relationship of
Cx and R.
b. What is the concentration of sulfate when the turbidimeter
reading was 3.67?
Answers:
a. y = 0.232x + 0.162, R² = 0.9834
[Plot: calibration data with the fitted line.]
b. $x = \frac{y - 0.162}{0.232} = \frac{3.67 - 0.162}{0.232} = 15.12$ mg SO₄²⁻/L
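The calibration line for Exercise 3 can be obtained from the standard least-squares expressions for slope and intercept. The sketch below uses plain Python; the function name and variable names are hypothetical, and it relies on the stated assumption that all of the error is in the y values.

```python
def least_squares(x, y):
    """Fit y = m*x + b by least squares; return (m, b, r_squared)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sums of squares and cross products about the means
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return slope, intercept, r_squared


# Calibration data from Exercise 3
conc = [0.00, 5.00, 10.00, 15.00, 20.00]   # Cx, mg sulfate per liter
reading = [0.06, 1.48, 2.28, 3.98, 4.61]   # instrument reading, R

m, b, r2 = least_squares(conc, reading)
print(f"R = {m:.3f} Cx + {b:.3f}   (R^2 = {r2:.4f})")

# Part b: invert the line to find the concentration for a reading of 3.67
unknown_conc = (3.67 - b) / m
print(f"Cx = {unknown_conc:.2f} mg/L")
```

The same function can be reused for any calibration set in which the response is expected to be linear in concentration.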
R², the coefficient of determination
 a better measure for best fit in a line.
 the goodness of fit is judged by the number of 9’s.
 three 9’s (0.999) or better represents an excellent fit.
References
 D.A. Skoog, D.M. West, F.J. Holler, and S.R.
Crouch, Fundamentals of Analytical Chemistry,
9th ed., Thomson Learning Asia, Singapore,
2014.
 Supplemental Notes
 Web references
