
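CHAPTER 6

Statistical Inference & Hypothesis Testing

6.1 - One Sample
Mean \(\mu\), Variance \(\sigma^2\), Proportion \(\pi\)

6.2 - Two Samples
Means \(\mu_1\) vs. \(\mu_2\), Variances \(\sigma_1^2\) vs. \(\sigma_2^2\), Proportions \(\pi_1\) vs. \(\pi_2\)

6.3 - Multiple Samples
Means \(\mu_1, \ldots, \mu_k\), Variances \(\sigma_1^2, \ldots, \sigma_k^2\), Proportions \(\pi_1, \ldots, \pi_k\)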
Example: Y = $ Cost of a certain medical service
Assume Y is known to be normally distributed at each of k = 2 health care facilities (groups).
Hospital: \(Y_1 \sim N(\mu_1, \sigma_1)\)    Clinic: \(Y_2 \sim N(\mu_2, \sigma_2)\)
Null Hypothesis \(H_0: \mu_1 = \mu_2\), i.e., \(\mu_1 - \mu_2 = 0\) ("No difference exists.")
2-sided test at significance level \(\alpha = .05\)

Data: Sample 1 = {667, 653, 614, 612, 604}; \(n_1 = 5\)    Sample 2 = {593, 525, 520}; \(n_2 = 3\)

Analysis via T-test (if equivariance holds):

Point estimates (NOTE: \(\bar{y} = \sum y_i / n\) and \(s^2 = SS/df\))

Group Means
\[ \bar{y}_1 = \frac{667 + 653 + 614 + 612 + 604}{5} = 630, \qquad \bar{y}_2 = \frac{593 + 525 + 520}{3} = 546 \]
NOTE: \(\bar{y}_1 - \bar{y}_2 = 84 > 0\)

Group Variances
\[ s_1^2 = \frac{SS_1}{df_1} = \frac{(667 - 630)^2 + \cdots + (604 - 630)^2}{5 - 1} = 788.5 \]
\[ s_2^2 = \frac{SS_2}{df_2} = \frac{(593 - 546)^2 + \cdots + (520 - 546)^2}{3 - 1} = 1663 \]
\[ F = \frac{1663}{788.5} = 2.11 < 4 \]
(Informal equivariance check: the ratio of the larger to the smaller sample variance is below 4.)

Pooled Variance
\[ s_{pooled}^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} = \frac{(5 - 1)(788.5) + (3 - 1)(1663)}{5 + 3 - 2} = 1080 \]
The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights.
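\(df_{Err} = n_1 + n_2 - 2 = 6\)

Standard Error
\[ s.e. = \sqrt{s_{pooled}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = \sqrt{1080 \left( \frac{1}{5} + \frac{1}{3} \right)} = 24 \]

p-value
\[ p = 2\,P(\bar{Y}_1 - \bar{Y}_2 \ge 84) = 2\,P\!\left( T_6 \ge \frac{84 - 0}{24} \right) = 2\,P(T_6 \ge 3.5) \]
> 2 * (1 - pt(3.5, 6))
[1] 0.01282634

Reject \(H_0\) at \(\alpha = .05\): statistically significant, Hospital > Clinic.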
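The hand calculation of the pooled T-test can be reproduced step by step in R. The following is a minimal sketch using only the example data; the variable names (s2_pooled, se, t_stat, p_value) are illustrative and not part of the original slides.

```r
# Hospital and clinic samples from the example
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
n1 <- length(y1)
n2 <- length(y2)

# Pooled variance: df-weighted average of the two sample variances
s2_pooled <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)

# Standard error of the difference in means, and the T-score under H0
se     <- sqrt(s2_pooled * (1 / n1 + 1 / n2))
t_stat <- (mean(y1) - mean(y2) - 0) / se

# Two-sided p-value on n1 + n2 - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = n1 + n2 - 2, lower.tail = FALSE)

print(c(s2_pooled = s2_pooled, se = se, t = t_stat, p = p_value))
```

Using lower.tail = FALSE is equivalent to 1 - pt(...), but avoids loss of precision when the tail area is very small.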
R code:
> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> t.test(y1, y2, var.equal = TRUE)

        Two Sample t-test

data:  y1 and y2
t = 3.5, df = 6, p-value = 0.01283
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  25.27412 142.72588
sample estimates:
mean of x mean of y
      630       546

Formal Conclusion
p-value < \(\alpha = .05\). Reject \(H_0\) at this level.

Interpretation
The samples provide evidence that the difference between mean costs is (moderately) statistically significant, at the 5% level, with the hospital being higher than the clinic (by an average of $84).
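Alternate method ~

Analysis of Variance (ANOVA)

Main Idea: Among several (\(k \ge 2\)) independent, equivariant, normally-distributed treatment groups,
Total Variability = Variability between groups + Variability within groups

[Figure: normal density curves for the treatment groups \(Y_1, Y_2, \ldots, Y_k\), with means \(\mu_1, \mu_2, \ldots, \mu_k\) and standard deviations \(\sigma_1, \sigma_2, \ldots, \sigma_k\)]

Null Hypothesis?
\(H_0: \mu_1 = \mu_2 = \cdots = \mu_k\)
\(H_A\): At least one treatment mean \(\mu_i\) is significantly different from the others.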
ANOVA F-test (if equivariance holds), applied to the same cost example:

Point estimates (NOTE: \(\bar{y} = \sum y_i / n\))

Group Means
\[ \bar{y}_1 = \frac{667 + 653 + 614 + 612 + 604}{5} = 630, \qquad \bar{y}_2 = \frac{593 + 525 + 520}{3} = 546 \]
NOTE: \(\bar{y}_1 - \bar{y}_2 = 84 > 0\)

Grand Mean
\[ \bar{y} = \frac{667 + 653 + 614 + 612 + 604 + 593 + 525 + 520}{5 + 3} = \frac{5(630) + 3(546)}{5 + 3} = 598.50 \]
The grand mean is a weighted average of the group means, using the sample sizes as the weights.
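The weighted-average description of the grand mean can be checked directly. This is a small sketch using base R's weighted.mean(); the names grand_direct and grand_weighted are illustrative only.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)

# Grand mean computed directly from all eight observations
grand_direct <- mean(c(y1, y2))

# Grand mean as a weighted average of the group means,
# with the sample sizes as the weights
grand_weighted <- weighted.mean(c(mean(y1), mean(y2)),
                                w = c(length(y1), length(y2)))

print(c(direct = grand_direct, weighted = grand_weighted))
```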

How far is the total sample from the grand mean?


\[ SS_{Tot} = (667 - 598.5)^2 + (653 - 598.5)^2 + (614 - 598.5)^2 + (612 - 598.5)^2 + (604 - 598.5)^2 + (593 - 598.5)^2 + (525 - 598.5)^2 + (520 - 598.5)^2 = 19710 \]
\(df_{Tot} = (5 + 3) - 1 = 7\)


How can we measure the variability between groups? Imagine zero variability within groups.
Replace each observation by its own group mean:
Sample 1 = {630, 630, 630, 630, 630}; \(n_1 = 5\)    Sample 2 = {546, 546, 546}; \(n_2 = 3\)

\[ SS_{Trt} = 5\,(630 - 598.5)^2 + 3\,(546 - 598.5)^2 = 13230 \]
\(df_{Trt} = 2 - 1 = 1\)

How far is each sample from its own group mean?



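\[ SS_{Err} = (667 - 630)^2 + (653 - 630)^2 + (614 - 630)^2 + (612 - 630)^2 + (604 - 630)^2 + (593 - 546)^2 + (525 - 546)^2 + (520 - 546)^2 \]
BUT recall from the T-test analysis that \(s^2 = SS/df\), so each group's sum of squares is \(SS_i = (n_i - 1)\,s_i^2\).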



\[ SS_{Err} = 4\,(788.5) + 2\,(1663) = 6480 \]
\(df_{Err} = (5 + 3) - 2 = 6\)

\(SS_{Tot} = SS_{Trt} + SS_{Err}\)  (19710 = 13230 + 6480)
\(df_{Tot} = df_{Trt} + df_{Err}\)  (7 = 1 + 6)
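The decomposition of the total sum of squares can be verified numerically from the raw data. The sketch below computes the three sums of squares from their definitions; the names ss_tot, ss_trt and ss_err are illustrative.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
y  <- c(y1, y2)
grand <- mean(y)

# Total: every observation against the grand mean
ss_tot <- sum((y - grand)^2)

# Treatment (between groups): each group mean against the grand mean,
# weighted by the group's sample size
ss_trt <- length(y1) * (mean(y1) - grand)^2 +
          length(y2) * (mean(y2) - grand)^2

# Error (within groups): every observation against its own group mean
ss_err <- sum((y1 - mean(y1))^2) + sum((y2 - mean(y2))^2)

print(c(SSTot = ss_tot, SSTrt = ss_trt, SSErr = ss_err))
print(isTRUE(all.equal(ss_tot, ss_trt + ss_err)))
```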
Mean squares and F-ratio: \(MS = SS/df\), \(F = MS_{Trt} / MS_{Err}\)

\(MS_{Trt} = 13230 / 1 = 13230\)  (\(= s_{between}^2\))
\(MS_{Err} = 6480 / 6 = 1080\)  (\(= s_{within}^2\). Note: This is also \(s_{pooled}^2\).)
\(F = 13230 / 1080 = 12.25\)    p-value = ????
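Compare the F-test for two variances:
\(H_0: \sigma_1^2 = \sigma_2^2\)    \(H_A: \sigma_1^2 \ne \sigma_2^2\)
\[ s_1^2 = \frac{SS_1}{df_1}, \qquad s_2^2 = \frac{SS_2}{df_2} \]
Test Statistic
\[ F = \frac{s_1^2}{s_2^2} \]
Sampling Distribution = ?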
Sampling distribution of the ANOVA F-ratio under \(H_0\): \(F_{1,6}\)  (\(df_{Trt} = 1\), \(df_{Err} = 6\))

[Figure: \(F_{1,6}\) density curve. The critical value 5.99 cuts off a right-tail area of \(\alpha = .05\); the p-value is the right-tail area beyond the observed F-ratio 12.25.]
ANOVA Table    (\(MS = SS/df\), \(F = MS_{Trt} / MS_{Err}\))

Source                  df      SS      MS    F-ratio    p-value
Treatment (between)      1   13230   13230      12.25  .01282634
Error (within)           6    6480    1080
Total                    7   19710

p-value (on \(F_{1,6}\)): 1 - pf(12.25, 1, 6) = .01282634 < .05

Thus, the treatment accounts for 13230 / 19710 = 67.1% of the total variability in the response Y.
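The remaining entries of the ANOVA table follow from the sums of squares and degrees of freedom. This is a minimal sketch that rebuilds them by hand; the critical-value line uses qf(), and all variable names are illustrative rather than taken from the slides.

```r
# Sums of squares and degrees of freedom from the ANOVA table
ss_trt <- 13230; df_trt <- 1
ss_err <- 6480;  df_err <- 6

# Mean squares and F-ratio
ms_trt  <- ss_trt / df_trt
ms_err  <- ss_err / df_err
f_ratio <- ms_trt / ms_err

# p-value: right-tail area of F(1, 6) beyond the observed F-ratio
p_value <- pf(f_ratio, df_trt, df_err, lower.tail = FALSE)

# Critical value for a test at alpha = .05
f_crit <- qf(0.95, df_trt, df_err)

# Proportion of total variability accounted for by the treatment
prop_trt <- ss_trt / (ss_trt + ss_err)

print(c(F = f_ratio, p = p_value, F_crit = f_crit, prop = prop_trt))
```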
R code:
# ANOVA FOR UNBALANCED DESIGN
> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> Data = data.frame(
+ Y = c(y1, y2),
+ X = factor(rep(c("y1", "y2"), times = c(length(y1),
+ length(y2))))
+ )
>
> var.test(Y ~ X, data = Data) # EQUIVARIANCE?

F test to compare two variances


data: Y by X
F = 0.4741, num df = 4, denom df = 2,
p-value = 0.4738

alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.01208057 5.04920249
sample estimates:
ratio of variances
0.4741431
R code:
# ANOVA FOR UNBALANCED DESIGN

> out = aov(Y ~ X, data = Data)
> anova(out)

Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value  Pr(>F)
X          1  13230   13230   12.25 0.01283 *
Residuals  6   6480    1080
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note: Vis-à-vis T-test vs. F-test,

The p-value is the same using either method (.01283), since the sample is unchanged!
The square of the \(T_{df}\)-score (3.5) is equal to the \(F_{1,\,df}\)-score (12.25).
(Recall that the square of the Z-score is equal to the \(\chi_1^2\)-score.)
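The agreement between the two methods can be confirmed by running both on the same data. The sketch below reuses the data frame layout from the R code in these slides; t_sq and f_val are illustrative names.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

tt <- t.test(y1, y2, var.equal = TRUE)  # pooled two-sample T-test
av <- anova(aov(Y ~ X, data = Data))    # one-way ANOVA table

# Square of the T-score versus the F-ratio in the first row of the table
t_sq  <- unname(tt$statistic)^2
f_val <- av[["F value"]][1]

print(c(t_squared = t_sq, F = f_val))
print(c(p_t = tt$p.value, p_F = av[["Pr(>F)"]][1]))
```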
Suppose this ANOVA overall F-test indicates that a significant difference exists between one (or more) of the treatment means, at \(\alpha = .05\).

How can we find out which one(s)?
Idea: Test all possible pairwise comparisons, each via a two-sample t-test.
Example: Suppose there are k = 5 treatment groups.

(1, 2) (1, 3) (1, 4) (1, 5) (2, 3) (2, 4) (2, 5) (3, 4) (3, 5) (4, 5)
p = ... p = ... p = ... p = ... p = ... p = ... p = ... p = ... p = ... p = ...

There are \(\binom{5}{2} = 10\) such comparisons. PROBLEM??? SPURIOUS SIGNIFICANCE!!!
With each test run at \(\alpha = .05\), the chance of at least one false rejection among the 10 comparisons can be well above .05.
BONFERRONI CORRECTION
Make each comparison at level \(\alpha^* = \alpha / 10\). With \(\alpha = .05\), this gives \(\alpha^* = .05 / 10 = .005\).
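The effect of the correction on the overall error rate can be illustrated numerically. The sketch below treats the 10 tests as independent, which is a simplifying assumption made here for illustration and not a claim from the slides; all variable names are hypothetical.

```r
k     <- 5     # number of treatment groups in the example
alpha <- 0.05  # overall significance level

# All pairwise comparisons: each column of 'pairs' is one pair of groups
pairs  <- combn(k, 2)
n_comp <- ncol(pairs)  # same as choose(k, 2)

# Chance of at least one false rejection if every test is run at alpha
# (ASSUMPTION: the tests are independent)
fwer_uncorrected <- 1 - (1 - alpha)^n_comp

# Bonferroni: run each comparison at alpha divided by the number of tests
alpha_star      <- alpha / n_comp
fwer_bonferroni <- 1 - (1 - alpha_star)^n_comp

print(n_comp)
print(c(uncorrected = fwer_uncorrected,
        alpha_star  = alpha_star,
        bonferroni  = fwer_bonferroni))
```

In practice, base R's pairwise.t.test() with p.adjust.method = "bonferroni" applies the equivalent adjustment to the p-values instead of to the significance level.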
Analysis of Variance (ANOVA): MODEL ASSUMPTIONS?

Main Idea: Among several (\(k \ge 2\)) independent, equivariant, normally-distributed treatment groups, test \(H_0: \mu_1 = \mu_2 = \cdots = \mu_k\).
Equivariance can be tested via an F-test very similar to the two-variances F-test in 6.2.2 (but this is very sensitive to the normality assumption), or via other tests.
If equivariance is violated, an extension of the Welch Test for two means can be used.
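As a sketch of how these checks might look in R on the cost example: bartlett.test() is one possible choice of equivariance test (picked here for illustration, not named in the slides), and oneway.test() with var.equal = FALSE performs a Welch-type comparison of means that does not assume equal variances.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

# One possible equivariance test (also sensitive to non-normality)
bt <- bartlett.test(Y ~ X, data = Data)

# Welch-type one-way test: equal variances are NOT assumed
welch <- oneway.test(Y ~ X, data = Data, var.equal = FALSE)

# Equal-variance version, for comparison with the classical ANOVA F-test
classic <- oneway.test(Y ~ X, data = Data, var.equal = TRUE)

print(bt$p.value)
print(c(welch_F = unname(welch$statistic), welch_p = welch$p.value))
print(c(classic_F = unname(classic$statistic), classic_p = classic$p.value))
```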
Normality can be tested via the usual methods.
If normality is violated, use the nonparametric Kruskal-Wallis Test.
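A minimal sketch of both steps on the cost example follows. The Shapiro-Wilk test on the ANOVA residuals is one common normality check, chosen here as an assumption since the slides do not name a specific method; kruskal.test() is base R's Kruskal-Wallis test.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

# Normality check on the residuals of the one-way ANOVA fit
# (ASSUMPTION: Shapiro-Wilk stands in for "the usual methods")
out <- aov(Y ~ X, data = Data)
sw  <- shapiro.test(residuals(out))

# Nonparametric alternative: rank-based Kruskal-Wallis test
kw <- kruskal.test(Y ~ X, data = Data)

print(sw$p.value)
print(c(H = unname(kw$statistic), p = kw$p.value))
```

With only eight observations, a normality test has little power, so a non-significant result here should be read with caution.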
Extensions of ANOVA exist for data in matched "blocks" designs, repeated measures, multiple factor levels within groups, etc.
How to identify significant group(s)? Pairwise testing, with correction (e.g., Bonferroni) for spurious significance.
Example: k = 5 groups result in 10 such tests, so let each \(\alpha^* = \alpha / 10\).