
Statistics: Error (Chpt. 5)
• Always some amount of error in every analysis (How much can you tolerate?)

• We examine error in our measurements to know reliably that a given amount of analyte is
in the sample

• To determine the error in the measurement, we run replicate samples: samples of about
the same size that are carried through an analysis in exactly the same way

• If a measurement had no error, replicate samples would yield identical answers. In practice, this does not happen

• With replicate data, we usually report the mean or average

• In some instances, we are interested in the median: middle value in a set of data that has
been arranged in order of size

• Median is important in data sets with outliers. Outliers can have large effects on the mean,
but they have little effect on the median.

• Example: Consider the masses: 3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198
What happens if 3.107 is accidentally recorded as 31.07?
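A minimal Python sketch of this example (the variable names are mine, not from the slides):

```python
# Compare mean and median for the replicate masses, with and without
# the transcription error (3.107 mis-recorded as 31.07).
from statistics import mean, median

masses = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]
typo   = [3.080, 3.094, 31.07, 3.056, 3.112, 3.174, 3.198]

print(f"correct: mean = {mean(masses):.3f}, median = {median(masses):.3f}")
# correct: mean = 3.117, median = 3.107
print(f"typo:    mean = {mean(typo):.3f}, median = {median(typo):.3f}")
# typo:    mean = 7.112, median = 3.112
```

The single bad entry shifts the mean by almost 4, but moves the median by only 0.005.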
Statistics: Error (Chpt. 5)
Precision vs. Accuracy
Precision is the closeness of data to other data that have been obtained in exactly the same
way
High-precision measurements have small standard deviations, variances, and coefficients of
variation. These quantities are functions of the deviation from the mean value and have no
relationship to the true value.
Accuracy is the closeness of a result to its true or accepted value. Accuracy reflects how
much error is in the method, not how reproducible the method is

Statistics: Error (Chpt. 5)
Error related to Accuracy
Absolute error: difference between the measured value and the true value. It bears a sign
E = xi – xt where xt is true or accepted value and xi is measured value

Relative Error: absolute error divided by the true value (aka % error)

Er = (xi – xt) / xt × 100%
Example: True value is 20.0 ppm and measured value is 19.8 ppm
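Worked out, this gives: E = 19.8 – 20.0 = –0.2 ppm, and Er = (–0.2 / 20.0) × 100% = –1.0%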

Precision is determined by comparing replicate data, but accuracy is not as easy to
determine (we usually don’t know the true value)

Different types of error


Random (or indeterminate) errors: affect the precision of measurement; non-traceable

Systematic (or determinate) errors: affect the accuracy of results; traceable; has assignable
cause; same magnitude for replicate measurements

Gross errors (aka outliers): quite large, don’t occur often, caused by human error (loss of
precipitate, etc.)

Statistics: Error (Chpt. 5)
Sources of systematic errors

1. Instrumental errors (fixed by calibration)


- volumetric glassware may differ from listed value
- electrical: increased resistance from dirty contacts or temperature changes

2. Method errors (from non-ideal behavior of reagents used in analysis)


- slow reactivity between analyte and titrant, side reactions, end point vs. equiv. point
- often most difficult to detect
- fixed by doing analysis of standard samples (standard reference material) and/or by
performing blank determinations
- also fixed by cross validation with other method

3. Personal errors (fix by taking care and doing replicates)


- incorrect reading of liquid level in a buret
- error in detecting the color change in a titration (esp. if color blind)
- prejudice in numerical readings
- incorrect significant figures

Statistics: Error (Chpt. 5)
Effect of systematic error on results

1. Constant error: same amount of error is made each time, but the relative error will change
- independent of sample size
- becomes serious as sample size decreases
Example: 0.5 mg of a precipitate (ppt) is lost as a result of a wash with liquid. Calculate
Er if the ppt weighs 500 mg or 50 mg
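Worked out: Er = (–0.5 / 500) × 100% = –0.1% for the 500 mg ppt, but (–0.5 / 50) × 100% = –1.0% for the 50 mg ppt. The same absolute loss is ten times more serious for the smaller sample.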

2. Proportional Error: the absolute error changes, but the relative error remains constant
- dependent on sample size
- absolute error grows with sample size
Example: When washing a ppt with a liquid, a proportional error occurs. If Er is
2.5%, calculate E for washing a 50 mg and a 500 mg ppt
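Worked out: E = 0.025 × 50 mg ≈ 1.2 mg for the 50 mg ppt and 0.025 × 500 mg = 12.5 mg for the 500 mg ppt; the absolute error grows with the sample, while Er stays at 2.5%.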

Random Errors (Chpt. 6)
Significant Figures
General rule: don’t report what you don’t know
See pages 134-136; you must know these, but we won’t cover them in class

1. Addition/subtraction

Do not be more specific than your least specific number

2. Multiplication/Division

the same general rules apply, but count significant figures rather than decimal places (see the examples below)
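Illustrative examples (the numbers here are mine, not from the slides): addition: 12.11 + 18.0 + 1.013 = 31.123, reported as 31.1, since 18.0 is known only to one decimal place; multiplication: 4.56 × 1.4 = 6.384, reported as 6.4, since 1.4 carries only two significant figures.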

Random Errors (Chpt. 6)
All measurements have random error (can only be minimized not eliminated)
Consider measuring the volume dispensed by a 10-mL volumetric pipet

As N grows past ~30, a histogram of the results starts to form a bell-shaped curve


Central limit theorem: distribution of measurements subject to random errors is
often a normal distribution (Gaussian distribution)
Random Errors (Chpt. 6)
Properties of a Gaussian Curve
Population (collection of all measurements of interest to an experiment) vs. sample (subset
of measurements selected from the population)

Population mean (µ) vs. sample mean (x̄)

Precision = closeness of data to other data that have been obtained in a similar manner,
expressed usually by standard deviation

Population std. dev. (σ): σ = √( Σ(xi − µ)² / N )

Random Errors (Chpt. 6)
Properties of a Gaussian Curve
z-variable: deviation from the mean relative to the standard deviation, z = (x − µ)/σ;
it describes all populations of data regardless of their standard deviation

µ ± 1σ = 68.3%
µ ± 2σ = 95.5%
µ ± 3σ = 99.7%

Sample standard deviation (s):

s = √( Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) )

A more calculator-friendly equivalent:

s = √( (Σxᵢ² − (Σxᵢ)²/n) / (n − 1) )
• Use sample std. dev. (s) with data sets of 30 points or less
• Lower value of s indicates better precision
• Scatter of the mean from the “true” value will decrease as N is increased
• What is n − 1? Degrees of freedom: each time you estimate a quantity from the data (here,
the mean), you lose one degree of freedom; n − 1 = # of data points that remain independent
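A minimal Python sketch (reusing the mass data from the first slide; variable names are mine) showing that the defining formula and the calculator-friendly rearrangement agree with the standard library routine:

```python
# Sample standard deviation three ways: defining formula,
# calculator-friendly rearrangement, and the standard library.
import math
import statistics

data = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]
n = len(data)
xbar = sum(data) / n

s_def  = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
s_calc = math.sqrt((sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1))

assert math.isclose(s_def, statistics.stdev(data))
print(f"s = {s_def:.4f}, calculator form = {s_calc:.4f}")  # both ~0.0509
```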

Random Errors (Chpt. 6)
Relative standard deviation
RSD (parts per thousand) = (s / x̄) × 1000 ppt

Coefficient of variation (% RSD) = (s / x̄) × 100%

Standard error of the mean (Sm): Sm = s / √N

- shows how the uncertainty of the mean shrinks as the number of measurements grows

Pooled Standard Deviation

sPooled is used to pool standard deviations from different measurements, done when
increasing the # of measurements is not possible (several subsets of data):

sPooled = √( (Σ(xi − x̄1)² + Σ(xj − x̄2)² + …) / (N1 + N2 + … − Nsets) )

When you have 2 sets of data, this simplifies to a calculator-friendly form (not in the book):

sPooled = √( ((N1 − 1)s1² + (N2 − 1)s2²) / (N1 + N2 − 2) )
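A minimal Python sketch of the two-set formula, using invented replicate values (the numbers are hypothetical, purely for illustration):

```python
# Pool the standard deviations of two small data sets using
# s_pooled = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2)).
import math
import statistics

set1 = [12.58, 12.62, 12.65, 12.60]  # hypothetical replicates
set2 = [12.50, 12.55, 12.53]         # hypothetical replicates

n1, n2 = len(set1), len(set2)
s1, s2 = statistics.stdev(set1), statistics.stdev(set2)

s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
print(f"s_pooled = {s_pooled:.4f}")
```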

Statistical Treatment of Data (Chpt. 7)
Scientists use statistical calculations to judge the quality of experimental measurements
These calculations are based upon means, standard deviations, Gaussian curves and test
statistics

Confidence Limits
define an interval around the experimentally determined mean that “probably” contains the
population mean (µ)

If the population standard deviation (σ) is known, the confidence interval is:

CI for µ = x̄ ± z σ/√N

Again, the CI narrows in proportion to 1/√N as N increases

The value of z depends on the confidence level of the measurement

Example: Determine the 80% and 95% confidence intervals for an experimentally determined
glucose level of 1108 mg/L if s = 19 mg/L and s is a good estimator of σ (n = 7)
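Worked out (since s is a good estimate of σ, z is used): σ/√N ≈ 19/√7 ≈ 7.2 mg/L. 80% CI: 1108 ± (1.28)(7.2) ≈ 1108 ± 9 mg/L; 95% CI: 1108 ± (1.96)(7.2) ≈ 1108 ± 14 mg/L.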

Statistical Treatment of Data (Chpt. 7)
But… s is not always a good estimator of σ
Then use the t statistic, which depends on the number of measurements:

CI for µ = x̄ ± t s/√N

Example: A chemist found the following data for the alcohol content of a sample of blood:
0.084%, 0.089%, and 0.079%. Calculate the 95% confidence interval for the mean.
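Worked out: x̄ = 0.084% and s = 0.0050%. With n − 1 = 2 degrees of freedom, the 95% t value from a t table is 4.303, so CI = 0.084 ± (4.303)(0.0050)/√3 = 0.084 ± 0.012% alcohol.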

Statistical Treatment of Data (Chpt. 7)
Often use t or z statistic to accept or reject data: Hypothesis testing

Null hypothesis: postulates that there is no difference between two observed quantities

Rules for hypothesis testing when true mean is known:


1. Write null hypothesis
2. Depending upon whether σ or s is to be used, look up corresponding test statistic
(z or t) for a given confidence level
3. Determine zcal or tcal: zcal = (x̄ − µ)/(σ/√N) or tcal = (x̄ − µ)/(s/√N)

4. If the calculated value is greater than the table value, reject the null hypothesis;
if the calculated value is less than the table value, accept the null hypothesis

Example: A new procedure for testing sulfur in fuel. A certified standard gives 0.123% S.
The new test (n = 4) gives 0.112, 0.118, 0.115, and 0.119% S. Is there a bias
at the 95% confidence level?
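Worked out: x̄ = 0.116% S and s ≈ 0.0032% S, so tcal = |0.116 − 0.123| / (0.0032/√4) ≈ 4.4. The table t for 3 degrees of freedom at the 95% level is 3.182; since 4.4 > 3.182, reject the null hypothesis: the new procedure shows a bias.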

Statistical Treatment of Data (Chpt. 7)
Oftentimes we want to compare two different experimentally determined means (N < 30).
Use sPooled and a different tcal formula, but the same t table:

tcal = (x̄1 − x̄2) / (sPooled × √(1/N1 + 1/N2))

Example: Analysis of two barrels of wine for alcohol content. 6 analyses of the 1st barrel
average 12.61%; 4 analyses of the 2nd barrel average 12.53%; sPooled for the 10 analyses is
0.070%. At the 95% CL, is there a difference between the 2 wines?
Note: number of degrees of freedom = N1 + N2 − 2
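Worked out: tcal = (12.61 − 12.53) / (0.070 × √(1/6 + 1/4)) ≈ 1.8. The table t for 8 degrees of freedom at the 95% CL is 2.306; since 1.8 < 2.306, accept the null hypothesis: no significant difference between the two wines is demonstrated.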

Statistical Treatment of Data (Chpt. 7)
Comparison of precision: F-test
Similar to t-tests, but this test compares precision of two sets of data
Can be used to test experimental and true standard dev.

Fcalc = s1² / s2², arranged with the larger variance in the numerator so that Fcalc > 1.0

If Fcalc > Ftable: reject the null hypothesis

Example: The standard method for measuring CO has a std. dev. of 0.21 ppm. This is a well-
established method that has been performed many times. A modification of the method,
performed 13 times, gives a std. dev. of 0.15 ppm. Is one method more precise?
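Worked out: the standard method’s std. dev. is treated as σ (effectively infinite degrees of freedom), while the modified method has 12. With the larger variance on top, Fcalc = (0.21)²/(0.15)² = 1.96. The table F(∞, 12) at the 95% level is 2.30; since 1.96 < 2.30, accept the null hypothesis: the modification is not demonstrably more precise.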

Statistical Treatment of Data (Chpt. 7)
Test for Outliers: Q-test
Is the outlier the result of a gross error?
For small data sets, it is best to try to collect more data; if that is not possible, apply
the Q-test:

Qexp = |xq − xn| / w

where xq is the questionable result, xn is its nearest neighbor, and w is the spread (range)
of the data set

If Qexp is greater than Qcrit, reject the questionable result: it comes from a gross error

Consider the following data set: 81, 100, 101, 102, 103. Is 81 bad at the 99% CL?
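Worked out: Qexp = |81 − 100| / (103 − 81) = 19/22 = 0.86. The critical Q for n = 5 at the 99% CL is 0.821 (from a Q table); since 0.86 > 0.821, reject 81 as the product of a gross error.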