
Statistics for Analytical

Chemistry
Lecture Notes
Dr. Ta Thi Thao

Syllabus
(2 credits)
Introduction to the analytical process
Chapter 1: Errors in analytical chemistry
Chapter 2: Descriptive statistics
Chapter 3: Basic distributions
Chapter 4: Significance tests
Chapter 5: ANOVA
Chapter 6: Correlation and regression
Chapter 7: QA/QC
Software: EXCEL, ORIGIN, MINITAB,
MATLAB, STATGRAPHICS, SPSS


What is analytical chemistry?
Almost all chemists routinely make qualitative or
quantitative measurements.
Analytical chemistry is not a separate branch of
chemistry, but simply the application of chemical
knowledge.
The craft of analytical chemistry is not in
performing a routine analysis on a routine sample
but in improving established methods, extending
existing methods to new types of samples, and
developing new methods for measuring chemical
phenomena.

The analytical process
1. Define the problem
2. Choose methods
3. Sampling
4. Sample preparation
5. Chemical separation
6. Analysis
7. Data processing and reporting of results


The analytical process (cont.)
1. Define the problem
What needs to be found?
Qualitative or quantitative?
What will the information be used for? Who will use it?
When will it be needed?
How accurate and precise does it have to be?
What funds are available?
The analysts should consult with the clients to
plan a useful and efficient analysis, including
how to obtain a useful sample.
The analytical process (cont.)
2. Choose methods
Sample type; size of sample
Sample preparation needed;
Concentration and range (sensitivity needed)
Selectivity needed (interferences)
Accuracy and precision needed
Tools/ instruments available
Expertise/ experience
Cost - Speed
Does it need to be automated?
Are methods available in the chemical literature?
Are standard methods available?
The analytical process (cont.)
3. Sampling
Sample type
Representative/ random sample
Sample size
Minimum sample number
Sampling statistics/ error
The analytical process (cont.)
4. Sample preparation
Is the sample solid, liquid, or gas?
Dissolve?
Ash or digest?
Chemical separation or masking of
interferences needed?
Need to concentrate the analyte?
Need to change the analyte's form for detection?
Need to adjust solution conditions (pH, added
reagents)?
The analytical process (cont.)
5. Chemical Separation if necessary
Distillation
Precipitation
Solvent Extraction
Solid phase extraction
Chromatography
Electrophoresis

May be done as part of the
measurement step
The analytical process (cont.)
6. Analysis
- Calibration
- Validation/controls/ blanks
- Replicates
7. Data processing and Report Results
Statistical analysis
Report the results
Types of Instrumental Methods

Signal                   Instrumental methods
Emission of radiation    Emission spectroscopy (X-ray, UV, visible, electron, Auger); fluorescence, phosphorescence, and luminescence (X-ray, UV, and visible)
Absorption of radiation  Spectrophotometry and photometry (X-ray, UV, visible, IR); photoacoustic spectroscopy; nuclear magnetic resonance and electron spin resonance spectroscopy
Scattering of radiation  Turbidimetry; nephelometry; Raman spectroscopy
Refraction of radiation  Refractometry; interferometry
Diffraction of radiation X-ray and electron diffraction methods
Rotation of radiation    Polarimetry; optical rotatory dispersion; circular dichroism
Electrical potential     Potentiometry; chronopotentiometry
Electric charge          Coulometry
Electric current         Polarography; amperometry
Electrical resistance    Conductometry
Mass-to-charge ratio     Mass spectrometry
Rate of reaction         Kinetic methods
Thermal properties       Thermal conductivity and enthalpy methods
Radioactivity            Activation and isotope dilution methods
Comparison of different analytical methods

Method               Approx. range (mol/L)  Approx. precision (%)  Selectivity  Speed      Cost       Principal uses
Gravimetry           10^-1 to 10^-2         0.1                    Poor-mod.    Slow       Low        Inorg.
Titrimetry           10^-1 to 10^-4         0.1-1                  Poor-mod.    Mod.       Low        Inorg., org.
Potentiometry        10^-1 to 10^-6         2                      Good         Fast       Low        Inorg.
Electrogravimetry,
  coulometry         10^-1 to 10^-4         0.01-2                 Moderate     Slow-mod.  Mod.       Inorg., org.
Voltammetry          10^-3 to 10^-10        2-5                    Good         Moderate   Mod.       Inorg., org.
Spectrophotometry    10^-3 to 10^-6         2                      Good-mod.    Fast-mod.  Low-mod.   Inorg., org.
Fluorometry          10^-6 to 10^-9         2-5                    Moderate     Moderate   Mod.       Org.
Atomic spectrometry  10^-3 to 10^-9         2-10                   Good         Fast       Mod.-high  Inorg., multielement
Chromatography       10^-3 to 10^-9         2-5                    Good         Fast-mod.  Mod.-high  Org., multicomponent
Kinetic methods      10^-2 to 10^-10        2-10                   Good-mod.    Fast-mod.  Mod.       Inorg., org., enzymes
Validation of a method
Precision must be checked by analyzing replicate
samples.
Accurate results must be confirmed by:
+ using proper calibration
+ analyzing spiked samples
+ comparing the sample results with those
obtained by another accepted method
+ analyzing standard reference materials of
known composition
+ running control samples at least daily
To assure method validation, apply the
guidelines of good laboratory practice (GLP).
The Laboratory Notebook
Used to record your work as an analytical chemist;
document everything you do.
Some good rules:
+ Use a hardcover notebook
+ Number pages consecutively
+ Record only in ink
+ Never tear out pages
+ Date each page, sign it, and have it signed by
someone else
+ Record the name of the project, why it is being
done, and any literature references
+ Record all data on the day you obtain it

The Laboratory Notebook
An example of laboratory notebook:
+Date of experiment
+ Name of experiment
+ Principle
+ Reaction for determination:
+ Standardization work; preparation of
chemicals and reagents
+ Calculation method; raw experimental data;
the average and standard deviation
+ The final result
Chapter 1:
Error in Anal. Chem.
1. Error
2. Absolute and Relative error
3. Systematic and random error
4. Outliers and accumulated error
5. Repeatability, reproducibility
6. Precision and accuracy
* Every measurement that is made is subject to a number of errors.
If you cannot measure it, you cannot know it.
A. Einstein

Absolute and Relative Error

absolute error:         E_A = Δx = measured value - true value
relative error:         ε_x = Δx / x_true
percent relative error: ε_x × 100 (%)
Random Error
(indeterminate error)
Cannot be determined (no control over)
A result of fluctuations (+ and -) in random variables
Multiple trials help to minimize
Random errors can be reduced by:
- Better experiments (equipment, methodology, training
of analyst)
- Large number of replicate samples

Random errors show Gaussian distribution for a large
number of replicates
Can be described using statistical parameters

Systematic Error
(determinate error)
Known cause: - Operator
- Calibration of glassware, sensor, or instrument
A result of a bias in one direction (+ or -)
When determined can be corrected
May be of a constant or proportional nature
To detect a systematic error:
Use Standard Reference Materials
Run a blank sample
Use different analytical methods
Participate in round robin experiments (different labs and
people running the same analysis)


Types of Error
Proportional error influences the slope.
Constant error influences the intercept.
If the nature of the error is not known (random or
systematic?) then the following rules will apply:
Accumulated Error
Addition and subtraction
When adding or subtracting measurements, the absolute errors are added.

Example 1:
                                x           Δx
mass of beaker plus sample   21.1184 g   0.0003 g
mass of empty beaker         15.8465 g   0.0003 g
mass of sample                5.2719 g   0.0006 g  (errors added!)

(21.1184 ± 0.0003) g - (15.8465 ± 0.0003) g = (5.2719 ± 0.0006) g
Multiplication and division
When multiplying or dividing measurements, the relative errors are added.
Consequently the absolute errors of the measurements must first be converted to relative errors.
Example 2:
A = (1.56 ± 0.04) cm,   ΔA = 0.04 cm,   ε_A = 0.04 cm / 1.56 cm = 0.0256
B = (15.8 ± 0.2) cm²,   ΔB = 0.2 cm²,   ε_B = 0.2 cm² / 15.8 cm² = 0.0127
Product of A and B: AB = (1.56 cm)(15.8 cm²) = 24.648 cm³ = 24.6 cm³ to 3 SF
Adding relative errors: ε_AB = ε_A + ε_B = 0.0256 + 0.0127 = 0.0383 ≈ 0.04
The % relative error in the product AB is therefore ≈ 4 %
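The two propagation rules above can be sketched in a few lines of Python; the helper names are my own, and the numbers are the worked examples from the text:

```python
# Minimal sketch of the slides' propagation rules (helper names are hypothetical).

def abs_error_sum(*errors):
    # Addition/subtraction: the absolute errors add.
    return sum(errors)

def rel_error_sum(*pairs):
    # Multiplication/division: the relative errors add; each pair is (value, abs error).
    return sum(err / abs(val) for val, err in pairs)

# Mass example: (21.1184 ± 0.0003) g - (15.8465 ± 0.0003) g
mass_err = abs_error_sum(0.0003, 0.0003)            # 0.0006 g

# Product example: A = (1.56 ± 0.04) cm, B = (15.8 ± 0.2) cm²
rel_ab = rel_error_sum((1.56, 0.04), (15.8, 0.2))   # ≈ 0.0383, i.e. ≈ 4 %
```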
Sampling → Preparation → Analysis
- Sampling: obtain a representative sample (homogeneous vs. heterogeneous material)
- Preparation: beware of loss and of contamination (unwanted addition)
- Analysis: measurement of the analyte; calibration of the instrument or standard solutions
How about sampling a chocolate chip cookie?
1. Static error
2. Dynamic error
3. Insertion and loading errors
4. Instrument error
5. Human error
6. Theoretical error
7. Miscellaneous error
Repeatability,
reproducibility
The closeness of agreement between
independent results obtained with the same
method on identical test material,
under the same conditions (same operator,
same apparatus, same laboratory and after
short intervals of time) (repeatability).
under different conditions (different operators,
different apparatus, different laboratories
and/or after different intervals of time)
(reproducibility).


Accuracy and Precision
True value: a standard or reference of known
value, or a theoretical value.

Accuracy: closeness to the true value.

Precision: reproducibility, or agreement among
multiple trials.



Accuracy vs. Precision

Accuracy:
- Only obtained if measured values agree with true values
- Must reduce systematic and random error to improve accuracy
- Always requires the use of, or comparison to, a known standard

Precision:
- Describes the spread of the individual measurements about the average value for the series
- Describes the reproducibility of the measurement
- Improves with reduction in random error

Exercises 1-5: What kind of error? (Fig. 1; figures not reproduced)
Chapter 2:
Descriptive statistics
How do you assess the total error?
- One way to assess total error is to
treat a reference standard as a
sample.
- The reference standard would be
carried through the entire process to
see how close the results are to the
reference value.

Accuracy and Precision
(The center of the target is the true value.)

Nature of accuracy and precision: both accurate and precise | precise only | neither accurate nor precise

Mathematical comments:
- Both accurate and precise: small standard deviation (or %CV); small %error
- Precise only: small standard deviation (or %CV); large %error
- Neither accurate nor precise: large standard deviation (or %CV); large %error

Scientific comments:
- Both accurate and precise: very small error in measurement; all results cluster about the true value (remember, a standard or true value is needed)
- Precise only: clustered multiple measurements, but consistently off from the true value; calibration of the probe or other measuring device is off, or there is an unknown systematic error
- Neither accurate nor precise: the shot-gun effect; get a new measurement system or operator
Expressing
accuracy and precision
Accuracy is expressed by: the mean (average) and the percent error.
Precision is expressed by: the range, deviation, standard deviation, and percent coefficient of variation.
(See also chapter 3)
Population vs. sample
Population = the entire collection of
items
e.g. all 100 mg vitamin C tablets produced
Sample = a portion of the
population
e.g. a bottle of vitamin C pills
Generally only data for samples is available
since it is generally impossible to obtain
data for the whole population
Standard Deviation of the Population

Population: the actual variation in the population

    σ = sqrt[ Σ (xᵢ - μ)² / N ]

Sample: a part of the population; its standard deviation estimates the variation in the population (but may not be a representative sample)

    s = sqrt[ Σ (xᵢ - x̄)² / (N - 1) ]
Why divide by N-1 when calculating s?
N-1 = degrees of freedom (Df) of sample
number of independent values on which a
result is based, or the number of values in
the final calculation of a statistic that are free
to vary
for a population Df = N
for a sample Df = N-1
one Df lost when calculating the Average of a
sample

More on Dfs
To calculate the std. dev. of a random sample, we must first calculate
the mean of that sample and then compute the sum of the several
squared deviations from that mean.
While there will be n such squared deviations only (n - 1) of them are,
in fact, free to assume any value whatsoever.
This is because the final squared deviation from the mean must include
the one value of X such that the sum of all the Xs divided by n will
equal the obtained mean of the sample.
All of the other (n - 1) squared deviations from the mean can,
theoretically, have any values whatsoever.
For these reasons, std. dev. of a sample is said to have only (n - 1)
degrees of freedom.
Population Data
For an infinite set of data, as N → ∞:
x̄ → μ and s → σ
(μ = population mean, σ = population std. dev.)
The experiment that produces a small
standard deviation is more precise.
Remember, greater precision does not imply
greater accuracy.
Experimental results are commonly
expressed in the form:
mean ± standard deviation  (x̄ ± s)
Standard deviation of the mean (standard error)
When the standard deviation of several mean
values is taken, the amount of deviation between
the mean values will be reduced by a factor
proportional to the square root of the number of
data points (N) present in each set used to
calculate each mean value:

    s_m = s / √N

s = standard deviation between individual values
s_m = standard deviation between mean values
Variance, relative standard deviation
Other ways of expressing the precision of the data:

    Variance = s²
    RSD = s / x̄
    %RSD (coefficient of variation) = (s / x̄) × 100
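The descriptive statistics above can be computed directly with Python's standard library; the replicate data below are hypothetical:

```python
import math
import statistics

data = [10.1, 10.3, 9.9, 10.2, 10.0]        # hypothetical replicate results

x_bar = statistics.mean(data)               # mean
s = statistics.stdev(data)                  # sample std. dev. (divides by N - 1)
sigma = statistics.pstdev(data)             # population std. dev. (divides by N)
s_m = s / math.sqrt(len(data))              # standard deviation of the mean
variance = s ** 2
rsd_pct = 100 * s / x_bar                   # %RSD (coefficient of variation)
```

Note that `stdev` uses the N - 1 denominator discussed in the degrees-of-freedom section, while `pstdev` uses N.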
Box and whisker plot (Minitab 14): shows the median, the range, outliers, and large vs. small variation. (Figure not reproduced.)
The same rules apply to calculations involving standard
deviations (assuming the standard deviation is due only
to random errors)
If the errors are
ALL KNOWN TO BE RANDOM ERRORS
then the following set of rules will apply
Significant Figures
The number of digits reported in a measurement reflect the
accuracy of the measurement and the precision of the
measuring device.

The results are reported to the fewest significant
figures (for multiplication and division) or fewest
decimal places (for addition and subtraction).

Significant Figures
1. Digits 1-9 are significant
2. Zeros between significant digits are significant
3. Terminal zeros to the right of the decimal are significant
4. Terminal zeros to the left of the decimal
(two schools of thought)
5. Place-holding zeros are not significant
Except log x = 0.025
Rounding off
When the answer to a calculation contains too
many significant figures, it must be rounded off.
This approach to rounding off is summarized as
follows.
If the digit is smaller than 5, drop this digit and
leave the remaining number unchanged. Thus,
1.684 becomes 1.68.
If the digit is 5 or larger, drop this digit and add
1 to the preceding digit. Thus, 1.247 becomes
1.25.
If the last digit is 5, the number is rounded off to
the nearest even digit.

Methods of Expressing
Uncertainty in Results
A. Three methods:
1. Record the absolute uncertainty
2. Record the relative uncertainty in %
3. Use significant digits:
+ record all accurately known digits, plus one
digit that is uncertain

B. This method assumes that the last digit recorded is
uncertain by ±1 unless stated differently.

Examples of presenting the data
Weight measured: 9.82 ± 0.02385 g = 9.82 ± 0.02 g
6051.78 ± 30 m/s = 6050 ± 30 m/s
For stating uncertainty:
- Round the uncertainty to one significant figure
  ...unless Δx has a 1 as a leading digit:
  if Δx = 0.14, then report Δx = 0.14, not 0.1
For calculations, though, you should retain one more
significant figure than justified.

Chapter 3: Basic Distributions
What is a Distribution?
The pattern of variation of a variable is
called its distribution, which can be
described both mathematically and
graphically.
In essence, the distribution records all
possible numerical values of a variable
and how often each value occurs (its
frequency).
Can be either discrete or continuous
Which statistical test is appropriate will
depend upon the distribution of your data.
From: http://stat.tamu.edu/stat30x/notes/node16.html
Types of Distributions
Binomial
Distribution
Normal
Distribution
Poisson
Distribution
Exponential
Distribution
Logistic
Distribution
t-Distribution
Chi-squared
Distribution
F-Distribution
Gamma Distribution
Hypergeometric
Laplace Distribution
Note that distributions can be either discrete or continuous
Binomial Distribution Graphic
From http://mathworld.wolfram.com/BinomialDistribution.html
For a large number of experiment replicates the results
approach an ideal smooth curve called the GAUSSIAN
or NORMAL DISTRIBUTION CURVE
Characterised by:
- The mean value (μ) gives the center of the distribution
- The standard deviation (σ) measures the width of the distribution

The Gaussian curve equation:

    y = [1 / (σ√(2π))] e^( -(x - μ)² / 2σ² )

1/(σ√(2π)) is the normalization factor: it guarantees
that the area under the curve is unity.
The Gaussian curve whose area is unity is called a
normal error curve (μ = 0 and σ = 1).
The probability of measuring a value in a certain
range = the area below the graph over that range.
Gaussian Distribution of Random Errors (Population)
Another way to represent a Gaussian distribution is to
relate it to a new variable, z, on the x-axis:

    z = (x - μ) / σ   (for a sample: z = (x - x̄) / s)

where z = the deviation of a data point from the mean,
stated in units of standard deviations.
Gaussian Distribution of Random Errors

Range     Percentage of measurements
μ ± 1σ    68.3
μ ± 2σ    95.5
μ ± 3σ    99.7

The more times you measure, the more confident you are
that your average value is approaching the true value.
The uncertainty decreases in proportion to 1/√n.
The standard deviation σ measures the width of the
Gaussian curve.
(The larger the value of σ, the broader the curve)
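The percentages in the table above can be verified from the error function, since the area of the normal curve within ±k standard deviations is erf(k/√2):

```python
from math import erf, sqrt

def fraction_within(k):
    # Area of the normal curve within ±k standard deviations of the mean.
    return erf(k / sqrt(2))

pct_1 = 100 * fraction_within(1)   # ≈ 68.3
pct_2 = 100 * fraction_within(2)   # ≈ 95.45 (quoted as 95.5 in the table)
pct_3 = 100 * fraction_within(3)   # ≈ 99.7
```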
Normal distribution
Data Transformation
What do you do if your data is not normally
distributed?
Use a non-parametric test
Transform your data
Logarithmic transformation:
  variable x → log(variable x + 1)
Power transformation:
  e.g. variable x → (variable x)²
Angular transformation:
  e.g. variable x → arcsine(√(variable x))
Poisson Distribution
Typically used to model the number of
random occurrences of some phenomenon
in a specified unit of space or time.
E.g. The number of birds seen in a 10 min
period
Can usually be approximated by a normal
distribution
Exponential Distribution
Describes a sample where y = xᵃ.
Messy to work with, but can (sometimes) be
transformed, or you can use a
non-parametric test.
Logistic Distribution
Typically describes a sample that fits
y = log(x).
Again, messy to work with, but can
(sometimes) be transformed, or you can use
a non-parametric test.
t-Distributions
Compared with the normal distribution:
- As N (DF) increases, the t-distribution is less spread out.
- At large N, the t-distribution approaches the shape of the Gaussian distribution.
(t-distribution, 1-sided)
F-Distribution
A distribution that
typically arises
when testing
whether two
variables have the
same variance
It is the ratio of
two independent
chi-squared
statistics
ANOVAs are based on F distributions
Chi-squared Distribution
This is also based upon degrees of
freedom
Can be used to approximate many
different distributions
For example, may be used to approximate the
sampling distribution of the likelihood ratio
statistic (may cover this later)
Chi-square Distribution examples
Estimating Random Error
The random error (Δx) in a set of data can be
estimated by multiplying s_m by a statistical
function called the Student's t-distribution function:

    Δx = t(p,ν) · s_m = t(p,ν) · s / √N
Confidence intervals
± Δx at a given confidence level (say 95%)
implies that the true value will be found
within ± Δx of the calculated mean:

    μ = x̄ ± t(p,ν) · s / √N

    x̄ - t(p,ν)·s/√N  <  μ  <  x̄ + t(p,ν)·s/√N
Chapter 4: Significance Tests

Hypothesis testing:
- The F-test compares levels of PRECISION
- The t-test compares levels of ACCURACY

How many samples/replicates to analyze?
Required number of replicate analyses — rearranging Student's t equation:

    x̄ - μ = ± t·s/√n   →   n = t²s²/e²

μ = true population mean
x̄ = measured mean
n = number of samples needed
s² = variance of the sampling operation
e = sought-for uncertainty

Since the degrees of freedom are not known at this stage,
the value t = 1.96 (for n → ∞) is used first to estimate n.
The process is then repeated a few times until a
constant value for n is found.
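The iteration just described can be sketched as follows; the abridged 95 % two-sided t-table hard-coded here is an assumption (taken from standard tables), and the helper names are my own:

```python
import math

# Abridged 95 % two-sided t-table (degrees of freedom -> t); an assumption.
T_95 = {1: 12.71, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57, 6: 2.45, 7: 2.36,
        8: 2.31, 9: 2.26, 10: 2.23, 15: 2.13, 20: 2.09, 30: 2.04}

def t_for(df):
    # Nearest tabulated value at or below df; 1.96 would apply for very large df.
    keys = [k for k in T_95 if k <= df]
    return T_95[max(keys)] if keys else 1.96

def replicates_needed(s, e):
    t = 1.96                              # start with the large-n value
    n = 0
    for _ in range(20):                   # iterate n = t²s²/e² until stable
        n_new = math.ceil(t ** 2 * s ** 2 / e ** 2)
        if n_new == n:
            return n
        n = n_new
        t = t_for(n - 1)                  # update t with DF = n - 1
    return n
```

For example, with s = 0.3 and a sought-for uncertainty e = 0.2 the estimate converges after a few rounds.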
Comparing a mean value to the true value (1t)
Calculate a t value as shown below:

    t_calc = |x̄ - μ| · √N / s

(equivalently, rearranged from μ = x̄ ± t·s/√N)

Compare to the value of t in a t-table at the
appropriate confidence level and DF.
If t_calc > t_table, the two results are significantly
different.
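A minimal sketch of this one-sample test; the certified value and measured results are hypothetical, and the t-table value is for 95 % with DF = 4:

```python
import math
import statistics

mu = 100.0                                # certified (true) value, assumed
data = [99.2, 99.8, 98.9, 99.5, 99.1]     # hypothetical measured results

x_bar = statistics.mean(data)
s = statistics.stdev(data)
t_calc = abs(x_bar - mu) * math.sqrt(len(data)) / s

t_table = 2.776                           # 95 %, DF = N - 1 = 4
biased = t_calc > t_table                 # True: the result differs from mu
```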
Comparing two sets of data
Comparison of means (T-test) (2t)
unpaired data
samples from the same population
e.g. comparing the results for the analysis of water
samples performed by two different labs (water samples
from the same population)
paired data
samples from different populations
e.g. comparing cholesterol levels in different individuals
using two analytical methods
Comparison of variances (F-test)
unpaired data only
Comparison of variances (F-test)
- Calculate F (put the larger variance in the numerator, so F ≥ 1):

    F_calc = s₁² / s₂²   (s₁ > s₂)

- Compare F_calc with F_table.
- If F_calc < F_table (two-tailed test), then s₁ and s₂ are statistically comparable.
- If F_calc > F_table, then s₁ and s₂ are significantly different (P_value < α-level = 0.05).
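The F-test can be sketched as follows; the two data sets are hypothetical, and the critical value 9.60 is the two-tailed 95 % F-table entry for DF = (4, 4):

```python
import statistics

method_1 = [10.2, 10.8, 9.6, 10.5, 10.0]   # hypothetical results, method 1
method_2 = [10.1, 10.3, 10.0, 10.2, 10.1]  # hypothetical results, method 2

v1 = statistics.variance(method_1)
v2 = statistics.variance(method_2)
F_calc = max(v1, v2) / min(v1, v2)         # larger variance on top, so F >= 1

F_table = 9.60                             # 95 % two-tailed, DF = (4, 4)
precisions_differ = F_calc > F_table       # True here: method 2 is more precise
```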
Which type of t-test should be used?
Comparing two means (unpaired data)
Textbook method: 1. comparison of variances;
2. comparison of means.
If s₁ and s₂ are not significantly different (the F-test passes), use the pooled t-test:

    s_pooled = sqrt[ ( s₁²(n₁ - 1) + s₂²(n₂ - 1) ) / (n₁ + n₂ - 2) ]

    t_calc = |x̄₁ - x̄₂| / s_pooled · sqrt[ n₁n₂ / (n₁ + n₂) ]

Once t_calc is determined, compare it to t_table;
t_table is determined for f = n₁ + n₂ - 2.
If t_calc > t_table, then the difference is significant
(P_value < α-level = 0.05).
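A sketch of the pooled calculation with hypothetical data (chosen so the two variances are comparable):

```python
import math
import statistics

lab_a = [10.2, 10.4, 10.0, 10.3, 10.1]   # hypothetical results, lab A
lab_b = [10.5, 10.7, 10.4, 10.6, 10.5]   # hypothetical results, lab B

n1, n2 = len(lab_a), len(lab_b)
x1, x2 = statistics.mean(lab_a), statistics.mean(lab_b)
v1, v2 = statistics.variance(lab_a), statistics.variance(lab_b)

s_pooled = math.sqrt((v1 * (n1 - 1) + v2 * (n2 - 1)) / (n1 + n2 - 2))
t_calc = abs(x1 - x2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
# Compare t_calc with t_table at f = n1 + n2 - 2 = 8 (2.306 at 95 %).
```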
Comparing two means (unpaired data) (cont.)
Textbook method: if s₁ and s₂ are NOT statistically comparable (the F-test fails), i.e. s₁ and s₂ are significantly different, then t_calc and the DF for t_table need to be determined as follows:

    t_calc = |x̄₁ - x̄₂| / sqrt( s₁²/n₁ + s₂²/n₂ )

    DF = ( s₁²/n₁ + s₂²/n₂ )² / [ (s₁²/n₁)²/(n₁+1) + (s₂²/n₂)²/(n₂+1) ] - 2

If t_calc > t_table, then the difference is significant
(P_value < α-level = 0.05).
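These two formulas can be sketched as one helper (the summary statistics fed in are hypothetical; note the n + 1 terms and the trailing - 2, exactly as in the formula above):

```python
def unequal_variance_t_and_df(x1, v1, n1, x2, v2, n2):
    # t and DF as in the formulas above, with DF truncated to an integer.
    t = abs(x1 - x2) / (v1 / n1 + v2 / n2) ** 0.5
    num = (v1 / n1 + v2 / n2) ** 2
    den = (v1 / n1) ** 2 / (n1 + 1) + (v2 / n2) ** 2 / (n2 + 1)
    return t, int(num / den - 2)

# Hypothetical summary stats: means 10.20 / 10.54, variances 0.025 / 0.013, n = 5 each
t_calc, df = unequal_variance_t_and_df(10.20, 0.025, 5, 10.54, 0.013, 5)
```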
Comparing two means
(unpaired data) (cont.)
Mosi method
Calculate the confidence interval for each mean
Compare the confidence intervals
The results are statistically comparable if the
intervals overlap such that each interval
overlaps with the mean value of the other
interval as shown in the diagram below.
Comparing two means (paired data)
Calculate the differences between PAIRS of data:

    dᵢ = x_Ai - x_Bi

Values can be either + or -;
do not take absolute values of the differences!
Calculate the average (d̄) and standard deviation (s_d)
of the differences:

    s_d = sqrt[ Σ (dᵢ - d̄)² / (n - 1) ]

Calculate a t value as shown below:

    t_calc = |d̄| · √N / s_d

If t_calc > t_table, the two results are
significantly different (f = N - 1).
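A sketch of the paired test with hypothetical results from two methods applied to the same five samples:

```python
import math
import statistics

method_a = [12.1, 15.3, 10.9, 20.0, 14.2]   # hypothetical paired results
method_b = [12.4, 15.0, 11.4, 20.3, 14.6]   # same samples, second method

d = [a - b for a, b in zip(method_a, method_b)]   # keep the signs!
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)
t_calc = abs(d_bar) * math.sqrt(len(d)) / s_d
# Compare with t_table at f = N - 1 = 4 (2.776 at 95 %): here not significant.
```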

Chapter 5: ANOVA (analysis of variance)
The t distribution is used to test the hypothesis of no difference
between two population/sample means.
If we wish to know about the relative effects of three or more
different treatments, can the t distribution be used?
The t-test is inadequate in several ways:
- Any statistic that is based on only part of the evidence (as
is the case when any two groups are compared) is less
stable than one based on all of the evidence.
- There are so many comparisons that some will be
significant by chance.
- It is tedious to compare all possible combinations of
groups.



The logic of ANOVA
Hypothesis testing in ANOVA is about whether the
means of the samples differ more than you would
expect if the null hypothesis were true.
This question about means is answered by analyzing
variances.
Among other reasons, you focus on variances because
when you want to know how several means differ, you are
asking about the variances among those means.

ANOVA is also used for evaluation of main/ interaction
effects.
Some ANOVA notes
Hypothesis: H₀: μ₁ = μ₂ = μ₃ = ... = μk
Hₐ: at least 2 of the means differ
(does NOT mean that all population means differ)
The term "variance" refers to the statistical method
being used, not the hypothesis being tested
(ANOVA does NOT test whether the variances of the groups are
different)
The P value in an ANOVA has many tails.
Reported as:
(one-way ANOVA, F(df between groups, df within groups) = ..., p = ...)
Assumptions of ANOVA
Samples are randomly selected from larger
populations.
Sample groups are independent.
Observations within each sample were
obtained independently.
The data from each population is normally
distributed.
Two Sources of Variability
In ANOVA, an estimate of variability between
groups is compared with variability within
groups.
Between-group variation is the variation among the
means of the different treatment conditions due to
chance (random sampling error) and treatment
effects, if any exist.
Within-group variation is the variation due to
chance (random sampling error) among individuals
given the same treatment.
ANOVA
Total variation among scores =
- Within-groups variation: variation due to chance.
- Between-groups variation: variation due to chance
plus treatment effect (if any exists).
Variability Between Groups
There is a lot of variability from one mean to the next.
Large differences between means probably are not due to
chance.
It is difficult to imagine that all six groups are random
samples taken from the same population.
The null hypothesis is rejected, indicating a treatment
effect in at least one of the groups.
One-way ANOVA formula
The one-way ANOVA fits data to this model:

    Y_ij = grand mean + group effect + ε_ij

- Y_ij = the value for the i-th subject in the j-th group
- Group effect = the difference between the mean of
population j and the grand mean
- Each ε_ij is a random value from a normally
distributed population with a mean of 0
The F Ratio

    F = (between-group variability) / (within-group variability)
      = MS_between / MS_within

MS_between = SS_between / df_between
(between-groups variation: chance plus treatment effect, if any exists)
MS_within = SS_within / df_within
(within-groups variation: chance alone)

    SS_total = SS_between + SS_within
The F Ratio: SS Between

    SS_between = Σ (T² / n) - G² / N

T² / n: find each group total, square it, and divide by the
number of subjects in the group
G: grand total (add all of the scores together, then square the total)
N: total number of subjects

Equivalently:  SS_between = Σ n (X̄_group - X̄_grand)²
The F Ratio: SS Within

    SS_within = Σ X² - Σ (T² / n)

Σ X²: square each individual score, then add up all of the
squared scores
T²: squared group total; n: number of subjects in each group

Equivalently:  SS_within = Σ (X - X̄_group)²
The F Ratio: SS Total

    SS_total = Σ X² - G² / N

Σ X²: square each score, then add all of the squared scores together
G: grand total (add all of the scores together, then square the total)
N: total number of subjects

Deviation form: (X - X̄_grand) = (X - X̄_group) + (X̄_group - X̄_grand),
so SS_total = Σ (X - X̄_grand)² partitions into SS_within + SS_between.
Degrees of Freedom
Between:  df_between = number of groups - 1
Within:   df_within = (n₁ - 1) + (n₂ - 1) + (n₃ - 1) + ...
                    = total number of subjects - total number of groups
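The sums of squares, degrees of freedom, and F ratio above can be sketched end-to-end; the three treatment groups are hypothetical:

```python
import statistics

groups = [                       # hypothetical data, k = 3 treatment groups
    [6.0, 7.0, 6.5, 7.5],
    [8.0, 9.0, 8.5, 9.5],
    [6.5, 7.5, 7.0, 8.0],
]
k = len(groups)
N = sum(len(g) for g in groups)
grand = statistics.mean([x for g in groups for x in g])

# Deviation forms of SS_between and SS_within from the slides
ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df_between, df_within = k - 1, N - k
F = (ss_between / df_between) / (ss_within / df_within)
# Compare F with the F-table at DF = (2, 9); here F ≈ 10.4 indicates a treatment effect.
```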
Two-Way ANOVA
Two-way ANOVA uses the same error term as one-way
ANOVA: the average of the within-cell variances
(SS_WC / df_WC).
The difference is that the between-cell SS is
partitioned into each main effect (rows, columns)
and the interaction:
SS_R, SS_C, SS_RxC
Latin square
Latin squares have counterbalancing built in:
- The number of rows equals the number of columns
- The letter representing each treatment appears in
each column and row only once
- Effects of treatment, order and sequence are
isolated: systematic counterbalancing

Order:  1  2  3
seq 1:  A  B  C
seq 2:  B  C  A
seq 3:  C  A  B


Chapter 6: Correlation and Linear
Regression analysis
6.1. Bivariate Correlation:
- is used to measure the strength of the
linear relationship between variables
- measures how variables or rank
orders are related
- computes Pearson's correlation
coefficient and Spearman's rank
correlation

Assumptions
Subjects are representative of a larger population
Paired samples (must have 2 variables) are
Independent observations
X and Y values must be measured independently
X values are measured but not controlled
Normal distribution (if not, use Spearman's rank
correlation)
All covariation must be linear
Note that outliers have a large influence on correlation
Scatter Diagram
Designate one variable X and the other Y.
Although it does not matter which is which, in cases
where one variable is used to predict the other, X is
the predictor variable (the variable youre predicting
from).
Draw axes of equal length for your graph.
Determine the range of values for each variable. Place
the high values of X to the right on the horizontal axis
and the high values of Y toward the top of the vertical
axis. Label convenient points along each axis.
For each pair of scores, find the point of intersection for
the X and Y values and indicate it with a dot.
Pearson correlation
- Computes the correlation coefficient (r), which
indicates the strength with which the variables are
linearly related in a sample.
- The significance test for r reveals whether
there is a linear relationship between the variables
in the population.
Pearsons r assumes an underlying linear
relationship (a relationship that can be best
represented by a straight line).
Not all relationships are linear
Correlation Analysis
With a simple two variable correlation,
you need to know the strength and
direction of the correlation

Scatterplots help illustrate the relationships between variables
Pearson's r

    r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)

Definitional formula:

    r = COV_XY / (s_x · s_y),   where  COV_XY = Σ (X - X̄)(Y - Ȳ) / n

Computational formula:

    r = [ n ΣXY - (ΣX)(ΣY) ] / sqrt[ (n ΣX² - (ΣX)²)(n ΣY² - (ΣY)²) ]
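The computational formula translates directly into code; the paired data below are hypothetical (a nearly linear set, so r is close to +1):

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical paired data
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

# Computational formula for Pearson's r
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
```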
Strength of Relationship
How can we describe the strength of the
relationship in a scatter diagram?
Pearson's r.
A number between -1 and +1 that indicates the
relationship between two variables.

The sign (- or +) indicates the direction of the relationship.
The number indicates the strength of the relationship.
-1 ------------ 0 ------------ +1
Perfect Relationship No Relationship Perfect Relationship

Spearman Rank Correlation Coefficient
- is the best-known and easiest technique
- r_s is given by the equation:

    r_s = 1 - 6 Σ dᵢ² / [ N (N² - 1) ]

where d is the difference between rankings in the two
ranking methods.
When N > 10, r_s can be used to calculate a t-score,
and the resulting t-score is used in a two-tailed test
of significance.
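A sketch of the rank formula with hypothetical rankings from two judges (no ties assumed, which is what the formula requires):

```python
judge_1 = [1, 2, 3, 4, 5, 6]      # hypothetical rankings from two judges
judge_2 = [2, 1, 4, 3, 6, 5]      # (no tied ranks)

N = len(judge_1)
sum_d2 = sum((a - b) ** 2 for a, b in zip(judge_1, judge_2))
r_s = 1 - 6 * sum_d2 / (N * (N ** 2 - 1))   # ≈ 0.829: strong agreement
```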

Kendall Rank Correlation Coefficient (τ)
- More complicated than the Spearman rank correlation
- Should be used when three or more sets of
rankings are compared
- Calculated as the proportion of concordant pairs
minus the proportion of discordant pairs
For two bivariate observations (xᵢ, yᵢ) and (xⱼ, yⱼ):
- Concordant pairs: (xᵢ - xⱼ)(yᵢ - yⱼ) is positive
- Discordant pairs: (xᵢ - xⱼ)(yᵢ - yⱼ) is negative
Scores range from -1 to 1.
Goodman and Kruskal's Lambda (λ)
- λ is used when nominal scales are used
- Spearman rank and τ won't work because
the ordering element is missing with
nominal scales
- λ can be calculated by statistical
packages
Partial Correlations (r_P)
- Indicate the degree to which two variables are linearly
related in a sample, partialling out the effects of
one or more control variables.
- To interpret a partial correlation between two
variables we must know the bivariate correlation
between the variables first.
- To conduct a partial correlation, there must be at
least three variables.
Partial correlation can be used in the following ways:
Partial correlation between two variables
Partial correlation among multiple variables within
a set
Partial correlation between sets of variables
Method of Least Squares
Assumptions
The uncertainties in the y-values are greater than those in
the x values.
The line representing the data should be drawn so that the
deviations of the y-values are minimized.
Thus the best-fit line, or least-squares line, is the
straight line that minimizes the vertical deviations
(residuals) between the points and the line. Deviations
can be positive or negative, so we should minimize the
sum of the squares of the deviations.
The linear relationship between the analyte content and the measured signal:
Y = mX + b,  or  signal = m·(conc.) + S_blank

That is we draw the straight line that has the least value
for the sum of the squares of the deviations.

Least-squares method
signal = m(Conc.) + S_blank

Linear regression: y = b + mx

m = Σ_{i=1}^{n}(x_i - x̄)(y_i - ȳ) / Σ_{i=1}^{n}(x_i - x̄)²
  = [n Σ x_i y_i - Σ x_i Σ y_i] / [n Σ x_i² - (Σ x_i)²]

b = ȳ - m x̄
  = [Σ y_i · Σ x_i² - Σ x_i · Σ x_i y_i] / [n Σ x_i² - (Σ x_i)²]
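The slope and intercept formulas above translate directly into code; this sketch uses a hypothetical four-point calibration:

```python
def fit_line(x, y):
    """Least-squares slope m and intercept b for y = b + m*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sxx - sx ** 2
    m = (n * sxy - sx * sy) / denom
    b = (sy * sxx - sx * sxy) / denom   # equivalently: y_bar - m * x_bar
    return m, b

# Hypothetical calibration points generated by signal = 2*conc + 1
m, b = fit_line([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])
print(m, b)  # 2.0 1.0
```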
Linear regression
y = (b ± S_b) + (m ± S_m)x

Finding S_y, S_m, S_b:

S_y² = Σ d_i² / (n - 2)

S_m² = n S_y² / [n Σ x_i² - (Σ x_i)²]

S_b² = S_y² Σ x_i² / [n Σ x_i² - (Σ x_i)²]

where d_i are the residuals y_i - (m x_i + b).
Important Parameters in Instrumental Analysis
1) Sensitivity
2) Detection Limit
3) Dynamic Range
4) Selectivity
5) Signal-to-noise Ratio
Detection Limit (LOD)
LOD: the minimum [analyte] that can be determined with
statistical confidence.
The analytical signal must be statistically greater than the
random noise of the blank
(i.e. analytical signal = 2 or 3 times the S.D. of the blank
measurement, approximately equal to the peak-to-peak noise).
Calculation of LOD
The minimum detectable analytical signal (S_m) is given
by: S_m = S̄_bl + k·SD_blank
To determine it experimentally:
Perform 20-30 blank measurements over an extended
period of time.
Calculate S̄_bl (mean blank signal) and SD_blank.
The detection limit (c_m) is: c_m = (S_m - S̄_bl) / m
LOQ, LOL, Dynamic Range
LOQ (limit of quantitation): the lowest concentration at
which quantitative measurements can reliably be made.
LOQ = 10 × SD of the blank.
LOL (limit of linearity): the point where the signal is no
longer proportional to concentration.
Dynamic range: from LOQ to LOL.
(c_m: detection limit)
Sensitivity

Indicates the response of the instrument to changes in
analyte concentration, i.e. a measure of a method's ability
to distinguish between small differences in concentration
in different samples.
In other words, the change in analytical signal per unit
change in [analyte].

Affected by the slope of the calibration curve & the precision:
For two methods with equal precision, the one with the
steeper calibration curve is more sensitive.
(Calibration Sensitivity)

If two methods have calibration curves with equal
slopes, the one with higher precision is more sensitive.
(Analytical Sensitivity)
Calibration Sensitivity
is the slope of the calibration curve:
S = mc + S_bl
(m = slope; c = conc.; S_bl = signal of blank)
Advantage: sensitivity independent of [analyte]
Disadvantage: does not account for precision of individual
measurements
Analytical Sensitivity
(Defined by Mandel and Stiehler)
Includes precision in the sensitivity definition: γ = m/S_s
(m = slope; S_s is the standard deviation of the measurement)

- Advantage: insensitive to amplification factors, i.e. increasing the gain
increases m but also increases S_s by the same factor, so γ
stays constant
- Disadvantage: concentration-dependent, as S_s usually varies with
[analyte]

Selectivity
Degree to which a measurement is free from
interferences by other species contained in the matrix
The analytical signal detected is the sum of the analyte signal
plus the interference signals:
S = m_a·C_a + m_b·C_b + m_c·C_c + S_blank
Selectivity is a measure of how easy it is to distinguish
between the analyte signal and the interference signals.
The selectivity of an analytical method can be described using
a figure of merit called the selectivity coefficient:
k_b,a = m_b/m_a ;  k_c,a = m_c/m_a
S = m_a(C_a + k_b,a·C_b + k_c,a·C_c) + S_blank
Selectivity coefficients range from 0 to values much greater
than 1, and can be negative if an interference reduces the
observed signal.
Standard Addition Calibration

Most useful when analyzing complex samples where
significant matrix effects are possible.
The most common form adds one or more standard aliquots
to sample aliquots (spiked samples).
If the sample is limited, the standard can be added to a
single sample aliquot.

S = k(V_s·C_s + V_x·C_x) / V_t

where k is a proportionality constant relating signal to
concentration, V_s is the volume of standard added at a concentration C_s,
V_x is the volume of unknown (aliquot) added at a concentration C_x, and
V_t is the total (final) volume.

The Standard Addition Method
(Spiking)

Technique to be used when:
Samples have substantial matrix effects.
Assay requires instrumental conditions that are difficult to
control
Procedure
A measurement is made on a portion of the sample.
Varying but known amounts (called spikes) of the assayed
substance are added to several equal portions of the
sample (standard additions).
Each solution is diluted to the same volume and measured.
The assay measurement is then plotted as a function of the
concentration of the spike.
The resulting plot is extrapolated to the concentration axis
(i.e. the x-axis).
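The extrapolation step can be sketched as follows, assuming the signal responds linearly to the spike concentration (the spike levels and signals are hypothetical):

```python
def standard_addition(c_added, signals):
    """Fit signal = m*c + b; the magnitude of the x-intercept, b/m,
    equals the analyte concentration in the (diluted) measured solutions."""
    n = len(c_added)
    sx, sy = sum(c_added), sum(signals)
    sxx = sum(c * c for c in c_added)
    sxy = sum(c * s for c, s in zip(c_added, signals))
    denom = n * sxx - sx ** 2
    m = (n * sxy - sx * sy) / denom
    b = (sy * sxx - sx * sxy) / denom
    return b / m

# Hypothetical spikes of 0-3 ppm; the analyte itself contributes
# signal equivalent to 2 ppm
print(standard_addition([0, 1, 2, 3], [2.0, 3.0, 4.0, 5.0]))  # 2.0
```

Remember to correct the result for any dilution of the original sample aliquot.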

Internal Standard Method
An internal standard is a substance that is added in a
constant amount to all samples, blanks and calibration
standards in an analysis.
Procedure:
A carefully measured quantity of the internal standard is
introduced into each standard and sample.
The solutions are diluted to the same volume and the
analytical signal is measured.
Calibration curve: plot the ratio of the analyte signal to the
internal-standard signal vs. the analyte concentration of the
standards.
The ratio for the samples is then used to obtain their
analyte concentration from the calibration plot.
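The ratio step can be sketched like this (all signal values are hypothetical, chosen so the analyte/IS ratio is exactly proportional to concentration):

```python
def signal_ratios(analyte_signals, is_signals):
    """Ratio of analyte signal to internal-standard signal for each solution."""
    return [a / s for a, s in zip(analyte_signals, is_signals)]

# Hypothetical standards at 1, 2, 3 ppm; the raw instrument response drifts,
# but the analyte/IS ratio stays proportional to concentration.
ratios = signal_ratios([2.0, 8.0, 9.0], [2.0, 4.0, 3.0])
print(ratios)  # [1.0, 2.0, 3.0] -> calibration slope of 1.0 ratio units per ppm

# An unknown with analyte signal 5.0 and IS signal 2.0 gives ratio 2.5,
# which reads back as 2.5 ppm from this calibration.
```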

Internal Standard (IS)

Internal standards are essential if we have a time-varying
instrumental response. Internal standards are very useful
if you have matrix effects.

Chapter 7: Quality Assurance / Quality
Control
QA: The planned measures that ensure a
service or product meets minimum
professional standards.
QC: The day-to-day activities that monitor
the quality of laboratory reagents, supplies
and equipment.
QA/QC: Proficiency Testing
Laboratory Accreditation
Validation

ISO 9000
An international set of standards for quality
management.
Applicable to a range of organisations from
manufacturing to service industries.
ISO 9001 applicable to organisations which
design, develop and maintain products.
ISO 9001 is a generic model of the quality
process that must be instantiated for each
organisation using the standard.
ISO 9001
Management responsibility
Quality system
Control of non-conforming products
Design control
Handling, storage, packaging and delivery
Purchasing
Purchaser-supplied products
Product identification and traceability
Process control
Inspection and testing
Inspection and test equipment
Inspection and test status
Contract review
Corrective action
Document control
Quality records
Internal quality audits
Training
Servicing
Statistical techniques
ISO 9000 certification
Quality standards and procedures should
be documented in an organisational
quality manual.
An external body may certify that an
organisation's quality manual conforms to
ISO 9000 standards.
Some customers require suppliers to be
ISO 9000 certified, although the need for
flexibility here is increasingly recognised.
Documentation standards
Particularly important - documents are the
tangible manifestation of the software.
Documentation process standards
Concerned with how documents should be
developed, validated and maintained.
Document standards
Concerned with document contents, structure, and
appearance.
Document interchange standards
Concerned with the compatibility of electronic
documents.
Document standards
Document identification standards
How documents are uniquely identified.
Document structure standards
Standard structure for project documents.
Document presentation standards
Define fonts and styles, use of logos, etc.
Document update standards
Define how changes from previous versions
are reflected in a document.
Quality in Environmental Analysis
Value of Quality Control
General QC principles.
Sources of error.
Terminology and Definitions.
Quality Control vs. Quality Assurance.


QC Terminology and Definitions
Principal Data Quality Indicators (DQIs):
Precision
Bias
Accuracy
Representativeness
Comparability
Completeness
Precision:
- The agreement between the numerical values of
two or more measurements that have been
made in an identical fashion.
- Calculated as range or standard deviation.
- Intralaboratory & interlaboratory precision.

QC Terminology and Definitions
Bias:
- The systematic or persistent distortion of a
measurement process that can cause errors in
one direction
Accuracy:
- The measure of how close an individual or
average measurement is to the true value.
- Combination of precision and bias.
- A reference material must be used in
determining accuracy.

QC Terminology and Definitions
Representativeness:
- A measure of the degree to which data
accurately and precisely represents a sampling
point or process condition.
- A measure of how closely a sample
is representative of a larger process.
Comparability:
- A qualitative term that expresses the confidence
that two data sets can contribute to a common
analysis.


QC Terminology and Definitions
Completeness:
- A measure of the amount of valid data
obtained from a measurement system,
expressed as a percentage of the valid
measurements that should have been
collected (i.e., measurements that were
planned to be collected).

Quality Control vs. Quality
Assurance
- QC is a component of QA.
- QC measures and estimates errors in a
system.
- QA is the ability to prove that the data is
as reported.

Sources of Error
- Sample errors
- Reagent errors
- Reference material errors
- Method errors
- Calibration errors
- Equipment errors
- Signal registration and recording errors
- Calculation errors
- Errors in reporting results

Sources of Error


Sample Errors
- Sample container contaminated.
- Incorrect sample location.
- Non-representative sample.
- Incorrect sample container.
- Sample mix up.
Reagent Errors
- Impure reagents or solvents.
- Improper storage of reagents.
- Neglect of reagent expiration date.
- Evaporated reagents.
- Consideration of different purities or grades.

Sources of Error
Reference Material Errors
- Impurity of reference materials.
- Errors from interfering substances.
- Changes due to improper storage.
- Errors in preparing reference material.
- Using expired reference material.

General Method Errors
- Deviating from the analysis procedure.
- Disregard for the limit of detection.
- Disregard for a blank correction.
- Calculation errors (dilutions, mixtures, additions).
- Not using the correct analytical procedure.

Sources of Error
Calibration Errors
- Volumetric measuring errors.
- Weighing errors.
- Inaccurate equipment adjustments.

Equipment Errors
-Equipment not cleaned
- Maintenance neglected.
- Temperature, electrical, and magnetic effects.
- Errors in using auto-pipettes (not calibrated, pipette tip
not correctly attached, contamination).
- Errors in using glass pipettes (damaged, bad technique,
contamination).
Sources of Error
Equipment Errors (continued)

Cuvette errors (defects not considered,
unsuitable cuvette glass, not filled to
minimum, wet on the outside, air bubbles,
contamination).
Photometer errors (wrong wavelength,
insufficient lamp intensity, dirty optics, drift
effect ignored, incorrectly set zero, light
entering the sample chamber).

Sources of Error
Signal Registration and Recording Errors
- Incorrect range setting.
- Reading errors.
- Recording errors.
- Switching of data.
Calculation Errors
- Arithmetic errors, decimal point errors, incorrect
units.
- Rounding errors.
- Not taking into account the reagent blank values.
- Error in dilution factor.
Errors in Reporting Results
- Omitting a sample error.
- No quality assurance implemented

Validation demonstrates that a procedure is
robust, reliable and reproducible
A robust method is one which produces
successful results a high percentage of the
time.
A reliable method is one that produces
accurate results.
A reproducible method produces similar
results each time a sample is tested.
QA: Does the method still work?
Control charts - Documenting and archiving
Proficiency testing
Participating in collaborative interlaboratory studies
Calculate Z-score:

z = (X̄_i - X) / S

where:
X̄_i is the mean of i replicate measurements by the laboratory
X is the accepted concentration
S is the standard deviation of the accepted concentration
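The z-score is a one-line calculation; the cut-offs in the comment follow the common proficiency-testing convention (the measurement values below are hypothetical):

```python
def z_score(lab_mean, accepted_value, accepted_sd):
    """z = (X_bar_i - X)/S. Common interpretation: |z| <= 2 satisfactory,
    2 < |z| < 3 questionable, |z| >= 3 unsatisfactory."""
    return (lab_mean - accepted_value) / accepted_sd

# Hypothetical proficiency-test result: lab mean 10.6 vs accepted 10.0 +/- 0.3
print(round(z_score(10.6, 10.0, 0.3), 2))  # 2.0 -> borderline/questionable
```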
Defining the Problem
1. What accuracy is required?
2. How much sample is available?
3. What is the concentration range of the
analyte?
4. What components of the sample will cause
interference?
5. What are the physical and chemical
properties of the sample matrix?
6. How many samples are to be analyzed?
Selecting an Analytical Method
Numerical Criteria for Selecting Analytical Methods
Parameters for method validation
Accuracy
Precision
LOD, LOQ, Sensitivity
Selectivity
Linearity
Range
Ruggedness or Robustness
Accuracy (determination)
Compare results of the method with
results of an established reference method
Positive controls (dilution must be done
separately from calibration point with
fresh reagents, different supplier of
standard or other batch than calibration)
Measurements of CRM
Spiking the sample matrix with a known
concentration of RM

Standard Operating Procedure
(SOP)
It should include:
Validity ( e.g. application in wastewater)
Short description of the main principle
Possible errors and problems
Preparation of reagents, standards, instruments
Sample preparation (sampling, enrichment,
chromatography, detection)
Quantification of the compounds
QA/QC
