Variance
Simple analysis of variance
Confidence interval of the mean
t values
Do two means differ?
F test
So far, we've assumed that all observed variance comes from a single, random source. That is not likely: there can be many sources of variance. We'll now introduce a way to analyze the variance in sample sets.
Analysis of variance
$s^2_{total} = s_1^2 + s_2^2 + \dots + s_n^2$
Sample   Replicates          Mean
1        15.9, 16.1, 16.3    16.1
2        14.9, 15.1, 15.3    15.1
3        15.8, 15.8, 15.8    15.8
4        16.2, 16.0, 15.9    16.0
In general, when sources of variance combine linearly and are independent (uncorrelated), the variances are additive. We often need to do experiments to evaluate the magnitude and sources of variance.
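As a small numerical sketch of the additivity rule, assume two independent sources with known standard deviations. The names `s_sampling` and `s_method` and their values are hypothetical, chosen only for illustration. The variances add; the standard deviations do not.

```python
import math

# Hypothetical standard deviations for two independent sources of variance
s_sampling = 0.30   # assumed sample-to-sample variability
s_method = 0.40     # assumed measurement method variability

# Variances are additive for independent sources
var_total = s_sampling**2 + s_method**2
s_total = math.sqrt(var_total)

print(f"total variance = {var_total:.2f}")
print(f"total std dev  = {s_total:.2f}")
```

Note that the total standard deviation is smaller than the plain sum of the two standard deviations.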
In this example, a series of four samples are obtained and each is analyzed in triplicate.
[Figure: sampling hierarchy for samples X1, X2, X3 and X4, with the levels labeled Level 1 and Level 2]
$SS_T = \sum_i (x_i - \bar{x}_T)^2$
$\bar{x}_T$ = grand mean, the mean of all the points
Level 1 gives us an idea of the sample variability. Level 2 tells us about the method variability.
$MS_T = \dfrac{\sum_i (x_i - \bar{x}_T)^2}{df_T}$
$df_T$ = total number of points - 1
$SS_{between} = n_r \sum_s (\bar{x}_s - \bar{x}_T)^2$
$n_r$ = # replicates per sample

Then the mean square for the samples is:
$MS_{between} = \dfrac{SS_{between}}{df_s}$
$df_s$ = # samples - 1

Since you know $SS_T$ and $SS_{between}$, you can find the within-sample sum of squares from:
$SS_T = SS_{between} + SS_{within}$

The mean sum of squares for our replicates is then:
$MS_{within} = \dfrac{SS_T - SS_{between}}{df_T - df_s}$
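The sums of squares above can be worked through by hand for the four samples analyzed in triplicate. This is a minimal sketch rather than a definitive implementation. It assumes equal numbers of replicates per sample and takes the within-sample degrees of freedom as $df_T - df_s$, that is (N - 1) - (k - 1). The variable names are illustrative.

```python
# One-way ANOVA by hand for the four samples analyzed in triplicate
samples = [
    [15.9, 16.1, 16.3],
    [14.9, 15.1, 15.3],
    [15.8, 15.8, 15.8],
    [16.2, 16.0, 15.9],
]

all_points = [x for s in samples for x in s]
n_total = len(all_points)
n_samples = len(samples)
n_r = len(samples[0])          # replicates per sample (equal sizes assumed)

grand_mean = sum(all_points) / n_total
sample_means = [sum(s) / len(s) for s in samples]

# Sums of squares
ss_total = sum((x - grand_mean) ** 2 for x in all_points)
ss_between = n_r * sum((m - grand_mean) ** 2 for m in sample_means)
ss_within = ss_total - ss_between

# Degrees of freedom
df_total = n_total - 1
df_s = n_samples - 1
df_within = df_total - df_s

# Mean squares and the F ratio
ms_between = ss_between / df_s
ms_within = ss_within / df_within
f_value = ms_between / ms_within

print(f"SS_T = {ss_total:.4f}, SS_between = {ss_between:.4f}, SS_within = {ss_within:.4f}")
print(f"MS_between = {ms_between:.4f}, MS_within = {ms_within:.4f}, F = {f_value:.2f}")
```

Computing `ss_within` by subtraction mirrors the relation $SS_T = SS_{between} + SS_{within}$; summing squared deviations inside each sample would give the same number.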
#    Replicates
1    15.9  16.1  16.3
2    14.9  15.2  15.3
3    14.8  15.8  15.8
4    16.2  16.0  15.9
We can now use the F test to determine whether there is a significant difference between the two sources of variance. F is then compared to Fc, the critical value, to see if the difference is significant. This will be covered in a bit.
$F = \dfrac{s^2_{big}}{s^2_{small}}$
F test
Using Excel
$F = \dfrac{MS_{big}}{MS_{small}} = \dfrac{MS_{sample}}{MS_{replicate}}$
Twelve chemists assayed a sample for Pb to see if they got the same results. Each used the same furnace AA and the same spiked authentic serum sample. Results are in µg Pb/L. The current Federal limit is 100 µg Pb/L in blood.
Using XLStat
Note: XLStat does not report Fc values, just the P value: loosely, the probability that your values are NOT different. Data must be ordered in a single column. Two methods are available for calculating the sum of squares for your groups, Type I and Type III. The distinction is only useful for more complex multivariable ANOVA. Might as well review them at this point.
XLStat results.

Analysis of variance:

Source            DF    Sum of squares    Mean squares    F         Pr > F
Model             11    438.943           39.904          40.264    < 0.0001
Error             36    35.678            0.991
Corrected Total   47    474.621

Fcrit = 2.07

There is a difference. Can we tell what it is? In this example, there is a <0.01% chance of there NOT being a difference.

[Figure: Lead / Standardized coefficients. Plot of standardized coefficients by variable, Chemist-A1 through Chemist-A12]

While the results have been normalized, you'd get the same basic plot with the raw data.
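The F value in the XLStat output can be checked from the sums of squares and degrees of freedom it reports. The sketch below only redoes the arithmetic with the table values and compares the result against the quoted Fcrit; the variable names are illustrative.

```python
# Values read from the XLStat ANOVA output for the twelve chemists
ss_model, df_model = 438.943, 11
ss_error, df_error = 35.678, 36
f_crit = 2.07

# Mean square = sum of squares / degrees of freedom
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# F is the ratio of the between-chemist to the within-chemist mean square
f_value = ms_model / ms_error

print(f"MS_model = {ms_model:.3f}, MS_error = {ms_error:.3f}, F = {f_value:.2f}")
print("significant" if f_value > f_crit else "not significant")
```

Since F is far larger than Fcrit, the chemists' results differ significantly, in line with the very small Pr > F.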
The Dunnett test is used to compare samples (your chemists) to a control. There actually is no control here, but the test provides a useful way of comparing results. In this case, choose Chemist A1 because his/her results were the lowest, causing the results to be positive for the others.
Dunnett test
Compares group means. Each group is pitted against one control or reference group. A t test value is calculated for each group comparison. The test typically can only be used when all groups are of equal size.
Category     Difference   Standardized   Critical   Critical     Pr > Diff   Significant
                          difference     value      difference
A1 vs A12    -10.798      -15.339        2.890      2.034        0.000       Yes
A1 vs A9      -8.078      -11.475        2.890      2.034        0.000       Yes
A1 vs A5      -6.345       -9.014        2.890      2.034        0.000       Yes
A1 vs A4      -6.328       -8.989        2.890      2.034        0.000       Yes
A1 vs A7      -5.838       -8.293        2.890      2.034        0.000       Yes
A1 vs A10     -5.227       -7.426        2.890      2.034        0.000       Yes
A1 vs A8      -4.903       -6.964        2.890      2.034        0.000       Yes
A1 vs A6      -4.475       -6.357        2.890      2.034        0.000       Yes
A1 vs A3      -2.575       -3.658        2.890      2.034        0.007       Yes
A1 vs A11     -2.165       -3.076        2.890      2.034        0.032       Yes
A1 vs A2      -0.105       -0.149        2.890      2.034        1.000       No
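The standardized difference and critical difference columns appear consistent with dividing or multiplying by the standard error of a difference between two group means, sqrt(2 x MSE / n). The sketch below checks this for the A1 vs A12 row. It assumes n = 4 results per chemist (48 results over 12 chemists, inferred from the degrees of freedom) and uses the mean square error of 0.991 from the ANOVA output. This is a reconstruction of the arithmetic, not a statement of how XLStat is implemented.

```python
import math

mse = 0.991            # mean square error from the ANOVA output
n_per_group = 4        # assumption: 48 results / 12 chemists
critical_value = 2.890 # Dunnett critical value from the table

# Standard error of a difference between two group means
se_diff = math.sqrt(2 * mse / n_per_group)

difference = -10.798   # A1 vs A12
standardized = difference / se_diff
critical_difference = critical_value * se_diff

print(f"standardized difference = {standardized:.3f}")
print(f"critical difference     = {critical_difference:.3f}")
print("significant" if abs(difference) > critical_difference else "not significant")
```

Small disagreements in the last digit come from using the rounded MSE.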
Tukey Test
Honestly Significantly Different (HSD) test. Based on pairwise comparison among means.
$t_s = \dfrac{M_i - M_j}{\sqrt{MSE / n_h}}$

$M_i - M_j$ = difference between pair means
MSE = mean square error
$n_h$ = the harmonic mean

Harmonic mean
$n_h = \dfrac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$

The harmonic mean is the weighted arithmetic mean, with each value's weight being the reciprocal of the value.
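A short sketch of the Tukey statistic for one pair, chemists A12 and A1. It assumes the values being averaged are the group sizes and that every chemist reported 4 results; both are assumptions, as the slides do not state them. The MSE and the two means are taken from the XLStat output.

```python
import math
from statistics import harmonic_mean

group_sizes = [4] * 12          # assumption: 12 chemists, 4 results each
mse = 0.991                     # mean square error from the ANOVA output
m_a12, m_a1 = 45.170, 34.373    # group means for chemists A12 and A1

# Harmonic mean of the group sizes; equals 4 when all groups are the same size
n_h = harmonic_mean(group_sizes)

# Tukey statistic for the pair
t_s = (m_a12 - m_a1) / math.sqrt(mse / n_h)

print(f"n_h = {n_h}, t_s = {t_s:.2f}")
print(f"t_s / sqrt(2) = {t_s / math.sqrt(2):.2f}")
```

Dividing by sqrt(2) gives a value close to the standardized difference XLStat reports for this pair, which suggests XLStat scales by the standard error of a difference, sqrt(2 x MSE / n), rather than sqrt(MSE / n).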
Tukey test

Contrast      Difference   Standardized   Critical   Pr > Diff   Significant
                           difference     value
A12 vs A1     10.798       15.339         3.490      < 0.0001    Yes
A12 vs A2     10.693       15.190         3.490      < 0.0001    Yes
A12 vs A11     8.633       12.263         3.490      < 0.0001    Yes
A12 vs A3      8.223       11.681         3.490      < 0.0001    Yes
A12 vs A6      6.323        8.982         3.490      < 0.0001    Yes
A12 vs A8      5.895        8.374         3.490      < 0.0001    Yes
A12 vs A10     5.570        7.913         3.490      < 0.0001    Yes
A12 vs A7      4.960        7.046         3.490      < 0.0001    Yes

Chemist   Mean     Groups
A12       45.170   A
A9        42.450   B
A5        40.718   B C
A4        40.700   B C
A7        40.210   B C
A10       39.600   C
A8        39.275   C D
A6        38.848   C D E
A3        36.948   D E
A11       36.538   E F
A2        34.478   F
A1        34.373   F
When adding more levels, things rapidly become more complex. A factorial design is required. Covered in unit 5.

$MS_{between} = \dfrac{SS_{between}}{k - 1}$
$MS_{within} = \dfrac{SS_{within}}{(N - 1) - (k - 1)}$

[Figure: multi-level design diagram with branches labeled B1, B2, B3 and C1, C2, C3]
Confidence interval
$C.I. = \mu \pm \dfrac{Z\sigma}{\sqrt{N}}$
Z = probability factor
Confidence levels listed: 95%, 99%, 99.99995%
This will tell you where most of your data should occur. A common calculation to report variability of data. A quick way of identifying outlying values.
This is for large (N>100) data sets. Z comes from the infinity row of the t table.
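For large data sets, the Z factor can also be computed from the normal distribution instead of being read from a table. The sketch below uses Python's standard library for this. The function names and the data set (mean 10.0, sigma 0.5, N = 400) are hypothetical, and a two-sided interval is assumed.

```python
import math
from statistics import NormalDist

def z_factor(confidence):
    """Two-sided Z for a confidence level given as a fraction (0.95 for 95%)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def z_interval(mu, sigma, n, confidence=0.95):
    """Return (low, high) for mu +/- Z * sigma / sqrt(n)."""
    half_width = z_factor(confidence) * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

# Hypothetical large data set: mean 10.0, sigma 0.5, N = 400
low, high = z_interval(10.0, 0.5, 400)

print(f"Z(95%) = {z_factor(0.95):.2f}, Z(99%) = {z_factor(0.99):.2f}")
print(f"95% C.I. = {low:.3f} to {high:.3f}")
```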
We seldom collect a near infinite data set. It is much more common to work with smaller sets. We can then rely on the t test.
t values
t values account for error introduced by sample size, degrees of freedom and potential sample skew. They actually use the $\chi^2$ (chi-squared) distribution: $\chi^2_{n-1} = (n-1)s^2 / \sigma^2$. This allows us to estimate population variance from sample variance. All of this is tied together in the t values.
Confidence level

Degrees of   90%      95%       99%
freedom      t.95     t.975     t.995
1            6.31     12.7      63.7
2            2.92     4.30      9.92
3            2.35     3.18      5.84
4            2.13     2.78      4.60
5            2.02     2.57      4.03
6            1.94     2.45      3.71
7            1.90     2.36      3.50
8            1.86     2.31      3.36
9            1.83     2.26      3.25
10           1.81     2.23      3.17
Example
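Data: 1.01, 1.02, 1.10, 0.95, 1.00
mean = 1.016
$s_x$ = 0.0541
$s_{\bar{x}}$ = 0.0242
t values for 4 degrees of freedom: 90% confidence = 2.13, 95% confidence = 2.78

$CI_{90\%} = 1.016 \pm \dfrac{2.13 \times 0.0541}{5^{1/2}} = 1.02 \pm 0.05$ (± 5%)
$CI_{95\%} = 1.016 \pm \dfrac{2.78 \times 0.0541}{5^{1/2}} = 1.02 \pm 0.07$ (± 7%)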
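The same example can be reproduced in a few lines. This sketch uses the five data points and the tabulated t values for 4 degrees of freedom. The t values are typed in from the table instead of being computed, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

data = [1.01, 1.02, 1.10, 0.95, 1.00]
t_90, t_95 = 2.13, 2.78                 # t values for 4 degrees of freedom

x_bar = mean(data)
s_x = stdev(data)                       # sample standard deviation (n - 1)
s_mean = s_x / math.sqrt(len(data))     # standard deviation of the mean

# Half-widths of the confidence intervals
half_90 = t_90 * s_mean
half_95 = t_95 * s_mean

print(f"mean = {x_bar:.3f}, s_x = {s_x:.4f}, s_mean = {s_mean:.4f}")
print(f"CI 90% = {x_bar:.2f} +/- {half_90:.2f}")
print(f"CI 95% = {x_bar:.2f} +/- {half_95:.2f}")
```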
You can have samples that are considered significantly different and still have the same mean.
t test example
In both examples, the populations would be considered to be different, even though the means, medians and modes are identical in the example on the right.
The F test
This test can be used to tell if two populations are different based on changes in variance.
Examples:
Has the measurement precision changed?
Has the method been altered?
Were there any significant changes due to the lab or analyst?
Calculation of F
$F = \dfrac{S^2_{larger}}{S^2_{smaller}}$
F is always 1 or greater. The critical value Fc depends on the confidence level and the degrees of freedom for both data sets. You can look up the Fc value for the desired levels or use a spreadsheet.
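A minimal sketch of the calculation, using two hypothetical sets of replicate results (the analyst data below are invented for illustration and are not from the slides). The critical value 6.39 is the tabulated F for 4 and 4 degrees of freedom at the 95% level; it is entered by hand here.

```python
from statistics import variance

# Hypothetical replicate results from two analysts (illustration only)
analyst_a = [10.1, 10.3, 9.9, 10.2, 10.0]
analyst_b = [10.4, 9.6, 10.8, 9.5, 10.2]

var_a = variance(analyst_a)   # sample variance, n - 1 in the denominator
var_b = variance(analyst_b)

# Larger variance goes in the numerator so that F >= 1
f_value = max(var_a, var_b) / min(var_a, var_b)

f_crit = 6.39   # looked-up value for 4 and 4 degrees of freedom, 95% level

print(f"var_a = {var_a:.3f}, var_b = {var_b:.3f}, F = {f_value:.1f}")
print("precision differs" if f_value > f_crit else "no significant difference in precision")
```

Putting the larger variance on top means only the upper tail of the F table is needed.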
You need to be concerned with differences in both the mean and sample variance.
As a group, did they perform differently on the first and second exam?
[Figure: two plots, each with the axis label "Grades"]