
Chemistry 330 notes for statistical analysis, winter 2010, by V. Colvin
Any quantitative analytical chemistry experiment will rely on replicate measurements to evaluate the
precision of the reported values. Some basic, simple statistics are commonly used to describe the
reproducibility of the measurement. Let x_i be the set of individual measurements and N the total
number of measurements. N - 1 is sometimes called the degrees of freedom. We describe:
\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}        (this assumes a normal distribution in the measurements ...)
The average value is called the mean, or sometimes the arithmetic mean. To get at the spread in the
data we calculate for each measurement how far it is from the average. Because the difference is
squared, it doesn’t matter if a value is higher or lower than the mean – just how far it is from the mean:
s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}        (for situations where N < 25)
The standard deviation, s, describes the 'spread' in a dataset: roughly 68% of the measurements are
expected to fall within ± s of the average, and about 95% within ± 2s.
If you make more than 25 measurements, then N - 1 is well approximated by N, and the
measure of the standard deviation is considered precise. We now use sigma, σ, to convey the standard
deviation:
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}}        (for situations where N >= 25)
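As a quick sketch of the two formulas above (using a made-up, illustrative dataset), Python's standard library computes both: `statistics.stdev` divides by N - 1 (the small-sample s) and `statistics.pstdev` divides by N (the large-sample σ):

```python
import statistics

# Hypothetical replicate measurements (illustrative values, N = 4)
data = [10.1, 9.8, 10.3, 9.9]

mean = statistics.mean(data)     # x-bar = sum of x_i over N
s = statistics.stdev(data)       # divides by N - 1 (use when N < 25)
sigma = statistics.pstdev(data)  # divides by N (use when N >= 25)

print(f"mean = {mean:.3f}, s = {s:.3f}, sigma = {sigma:.3f}")
# prints: mean = 10.025, s = 0.222, sigma = 0.192
```

Note that s is always slightly larger than σ for the same data, since it divides the same sum of squared deviations by a smaller number.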
For the rest of this handout, I will use s with the understanding that when N is large enough it should be σ. It
is conventional for chemists to describe the uncertainty in a dataset using the error bar convention
where the reported value (referred to here as answer) is:
answer = \bar{x} \pm s        (chemical convention for error bars)
The problem with this convention is that it doesn’t reflect the level of confidence in how well the
dataset measures the ‘true’ value. For example, if N = 100 then the standard deviation quite
confidently describes reality; conversely, if N = 3 you can calculate average and standard deviations
but there is a lot of uncertainty in both values.
To handle this ambiguity and be more precise about imprecision (sorry), we need to add one additional
twist to how we think about error bars: the confidence interval. Experimental data is always limited,
and as more measurements are collected you become more and more confident that the mean value of a
dataset is equal to its 'true' value. If you have only an N of four, or three degrees of freedom (DF),
then your calculated s is not very reliable. If you have one thousand degrees of freedom, then the
range is well described by s. Statisticians have a way to make this a lot more precise by introducing:
answer = \bar{x} \pm t \frac{s}{\sqrt{N}}        (find 't' from the table, specifying your needed level of confidence)
Degrees of     Degree of confidence (%)
Freedom        90       95       99       99.9
1              6.31     12.71    63.66    636.62
2              2.92     4.30     9.93     31.60
3              2.35     3.18     5.84     12.92
4              2.13     2.78     4.60     8.61
5              2.02     2.57     4.03     6.87
6              1.94     2.45     3.71     5.96
7              1.89     2.37     3.50     5.41
8              1.86     2.31     3.36     5.04
9              1.83     2.26     3.25     4.78
10             1.81     2.23     3.17     4.59
11             1.80     2.20     3.11     4.44
12             1.78     2.18     3.06     4.32
13             1.77     2.16     3.01     4.22
14             1.76     2.14     2.98     4.14
15             1.75     2.13     2.95     4.07
16             1.75     2.12     2.92     4.02
17             1.74     2.11     2.90     3.97
18             1.73     2.10     2.88     3.92
19             1.73     2.09     2.86     3.88
20             1.72     2.09     2.85     3.85
21             1.72     2.08     2.83     3.82
22             1.72     2.07     2.82     3.79
23             1.71     2.07     2.82     3.77
24             1.71     2.06     2.80     3.75
Infinity       1.65     1.96     2.58     3.29
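Putting the pieces together (reusing a hypothetical dataset of N = 4 measurements, so DF = 3), a 95% confidence interval takes t = 3.18 from the table above:

```python
import math
import statistics

data = [10.1, 9.8, 10.3, 9.9]    # hypothetical dataset: N = 4, so DF = 3
mean = statistics.mean(data)
s = statistics.stdev(data)       # sample standard deviation (divides by N - 1)

t = 3.18                         # from the t-table: DF = 3, 95% confidence
half_width = t * s / math.sqrt(len(data))

# report: mean ± half_width at the 95% confidence level
print(f"{mean:.3f} ± {half_width:.3f}")
```

Notice how quickly t shrinks with more data: at DF = 3 the 95% interval is ± 3.18 s/√N, but with many measurements t approaches 1.96, the familiar "2 sigma" rule.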