ANOVA-1

Nancy Ko, Ph.D. Assistant Professor National University of Singapore


Learning objectives

After today's lecture, you should know:
1. when ANOVA is an appropriate test to use
2. why we need to use ANOVA rather than multiple t-tests
3. what the null and alternative hypotheses for ANOVA are
4. the difference between within-groups variability and between-groups variability
5. how to perform ANOVA hypothesis testing

Example 1

A study of patients with insulin-dependent diabetes was conducted to investigate the effects of cigarette smoking on renal and retinal complications. Before examining the results of the study, you wish to compare the baseline measures of systolic blood pressure (mmHg) across four different subgroups: nonsmokers, current smokers, ex-smokers, and tobacco chewers. Assume that systolic blood pressure is normally distributed.

Example 2

We wish to examine if there is a significant difference in medication adherence in patients of different age groups (21-40, 41-60, 61 and above). Medication adherence was measured on a 0-100 scale.

Example 3

Data were collected from a study that compares the effectiveness of a control treatment to an active treatment. Three doses of the active treatment were considered in the study. We wish to test if there is a significant difference in the mean ages of participants across treatment groups.

ANOVA

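                                 Group 1       Group 2       ...   Group k
Population mean                  $\mu_1$       $\mu_2$       ...   $\mu_k$
Population standard deviation    $\sigma_1$    $\sigma_2$    ...   $\sigma_k$
Sample mean                      $\bar{x}_1$   $\bar{x}_2$   ...   $\bar{x}_k$
Sample standard deviation        $s_1$         $s_2$         ...   $s_k$
Sample size                      $n_1$         $n_2$         ...   $n_k$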

ANOVA

We do not conduct all possible independent-samples t-tests because we want to keep the overall probability of making a type I error at the predetermined level $\alpha$. With three groups, three pairwise t-tests would be needed. Assuming we set $\alpha = 0.05$ and treat the tests as independent, by the multiplicative rule the probability of failing to reject a true null hypothesis in all three t-tests would be 0.857, and the probability of rejecting $H_0$ in at least one t-test is 0.143, which is greater than 0.05.

ANOVA

Explanation: As we set $\alpha = 0.05$, for each t-test, when $H_0$ is true, the probability of rejecting $H_0$ is 0.05 (i.e., the probability of making a type I error) and the probability of failing to reject $H_0$ is therefore $1 - 0.05 = 0.95$.

P(fail to reject $H_0$ in all three tests) = $(1 - 0.05)^3 = (0.95)^3 = 0.857$

P(reject $H_0$ in at least one t-test) = P(overall type I error) = $1 - 0.857 = 0.143$
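The same arithmetic can be sketched in a few lines of Python to see how quickly the overall type I error grows with the number of groups. This is a minimal illustration that assumes, as the multiplicative rule does, that the pairwise tests are independent; the function name `familywise_error_rate` is made up for this example.

```python
from math import comb


def familywise_error_rate(k_groups, alpha=0.05):
    """P(at least one type I error) when every pair of groups gets its own t-test.

    Assumes all null hypotheses are true and the tests are independent.
    """
    n_tests = comb(k_groups, 2)          # number of pairwise comparisons
    return 1 - (1 - alpha) ** n_tests    # 1 - P(no false rejection in any test)


for k in (3, 4, 5):
    print(k, "groups:", comb(k, 2), "tests, overall error",
          round(familywise_error_rate(k), 3))
```

With three groups this gives the 0.143 from the slide; with more groups the overall error rate rises further.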

Assumptions of ANOVA

1. The populations need to be independent.
2. The populations from which the samples are obtained need to be at least approximately normally distributed, or the sample sizes need to be sufficiently large ($\geq 30$).
3. The samples are random samples from their populations.
4. The variances of the populations are equal.

Sources of Variation

Total variation can be divided into two components:
- variation of the individual values around their population means (within-groups variability)
- variation of the population means around the overall mean (between-groups variability)

If the variability within the different populations under comparison is small relative to the variability among their respective means, this suggests that the population means are in fact different.


Sources of Variation

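\[
s_b^2 = MS_b = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2}{k - 1},
\]

where $\bar{x}$ is the overall mean of all observations,

\[
\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}
\]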


ANOVA

$H_0: \mu_1 = \mu_2 = \mu_3$

For the alternative hypothesis $H_1$, you can state that:
- Not all three population means are the same, or
- At least one of the population means differs from one of the others


ANOVA

You CANNOT write the alternative hypothesis as $H_1: \mu_1 \neq \mu_2 \neq \mu_3$, because we will reject $H_0$ if any of the population means are not equal (i.e., at least one pair of means is not equal). As such, it could be $\mu_1 \neq \mu_2 = \mu_3$, or $\mu_1 = \mu_2 \neq \mu_3$. It does not have to be $\mu_1 \neq \mu_2 \neq \mu_3$.


ANOVA

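Calculate the F test statistic:

\[
F = \frac{s_b^2}{s_w^2}
\]

Look up the critical value $F_{\alpha,\, df_1,\, df_2}$ in the F table, where $k$ is the number of groups and $N$ is the total sample size:
- The numerator degrees of freedom: $df_1 = k - 1$
- The denominator degrees of freedom: $df_2 = N - k$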
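When no F table is at hand, the critical value can also be found numerically. The sketch below is one possible pure-Python approach, not part of the lecture: it evaluates the upper-tail probability of the F distribution through the regularized incomplete beta function (using a standard continued-fraction expansion) and then searches for the value whose upper tail equals $\alpha$. All function names are made up for this example. In practice a statistics library would normally be used instead, for instance `scipy.stats.f.ppf(1 - alpha, df1, df2)` in SciPy.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-13):
    """Continued fraction used by the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even-numbered term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd-numbered term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for an F distribution with (df1, df2) degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2):
    """Value f with P(F > f) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > alpha:   # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Example: three groups (k = 3) and N = 60 observations in total
print(round(f_critical(0.05, 3 - 1, 60 - 3), 2))
```

`f_upper_tail` also gives the p-value of an observed F statistic directly, which avoids the table lookup altogether.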


ANOVA

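Source of variability   Sum of Squares (SS)                                    df      Mean Squares (MS)                 F
Between groups          $SS_b = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$     k - 1   $s_b^2 = MS_b = SS_b / (k - 1)$   $F = s_b^2 / s_w^2$
Within groups           $SS_w = \sum_{i=1}^{k} df_i \cdot s_i^2$               N - k   $s_w^2 = MS_w = SS_w / (N - k)$
Total                   $SS_{total} = SS_b + SS_w$                             N - 1

Here $df_i = n_i - 1$ for group $i$.

Alternatively,

\[
SS_{total} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{\left(\sum x\right)^2}{N}
\]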
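The table entries can be sketched in Python to check that the pieces fit together. This is a minimal illustration on a small made-up data set (the three toy groups are hypothetical and not from any study); the function name `one_way_anova` is likewise an assumption of this example. It computes $SS_w$ from the raw deviations, which is the same quantity as $\sum df_i \cdot s_i^2$.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of samples."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [sum(g) / len(g) for g in groups]
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / n_total

    # Between groups: group means around the overall mean, weighted by n_i
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Within groups: individual values around their own group mean
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Total, using the shortcut formula sum(x^2) - (sum x)^2 / N
    ss_total = sum(x * x for x in all_values) - sum(all_values) ** 2 / n_total

    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (n_total - k)
    return {
        "SS_b": ss_b, "SS_w": ss_w, "SS_total": ss_total,
        "df_b": k - 1, "df_w": n_total - k, "df_total": n_total - 1,
        "MS_b": ms_b, "MS_w": ms_w, "F": ms_b / ms_w,
    }


# Hypothetical toy data, chosen only so the arithmetic is easy to follow
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
table = one_way_anova(toy_groups)
print(table)
```

For the toy data, $SS_b = 26$ and $SS_w = 6$ add up to $SS_{total} = 32$, and the degrees of freedom add up in the same way (2 + 6 = 8).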


ANOVA

Decision Rule

If F statistic < $F_{critical}$, we fail to reject the null hypothesis: there is insufficient evidence to conclude that the population means differ.

If F statistic > $F_{critical}$, we reject the null hypothesis and conclude that not all means are equal. In this case, we only know that a difference exists BUT we do not know where the differences are. We need to perform post hoc tests (known as multiple comparisons procedures) to determine where the differences lie.


Exercise

Source of variability   SS    df   MS   F
Between groups          100   5
Within groups
Total                   400   35


ANOVA-Example 1

Study objective: To examine if patients with coronary artery disease from three different hospitals have different forced expiratory volume in one second (FEV1) (liters)

Hospital 1 (n = 21): 3.23 3.47 1.86 2.47 3.01 1.69 2.10 2.81 3.28 3.36 2.61 2.91 1.98 2.57 2.08 2.47 2.47 2.74 2.88 2.63 2.53

Hospital 2 (n = 16): 3.22 2.88 1.71 2.89 3.77 3.29 3.39 3.86 2.64 2.71 2.71 3.41 2.87 2.61 3.39 3.17

Hospital 3 (n = 23): 2.79 3.22 2.25 2.98 2.47 2.77 2.95 3.56 2.88 2.63 3.38 3.07 2.81 3.17 2.23 2.19 4.06 1.98 2.81 2.85 2.43 3.20 3.53
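As a quick check on the raw data, the per-hospital mean, standard deviation, and sample size can be recomputed with Python's standard `statistics` module. This sketch assumes the 60 FEV1 values split into groups of 21, 16, and 23 as listed in the code; that grouping is an assumption of this example, chosen because it reproduces the summary statistics reported for the three hospitals.

```python
from statistics import mean, stdev

# FEV1 values in liters; the assignment of values to hospitals is an
# assumption of this sketch (group sizes 21, 16, 23)
hospital_1 = [3.23, 3.47, 1.86, 2.47, 3.01, 1.69, 2.10, 2.81, 3.28, 3.36, 2.61,
              2.91, 1.98, 2.57, 2.08, 2.47, 2.47, 2.74, 2.88, 2.63, 2.53]
hospital_2 = [3.22, 2.88, 1.71, 2.89, 3.77, 3.29, 3.39, 3.86, 2.64, 2.71, 2.71,
              3.41, 2.87, 2.61, 3.39, 3.17]
hospital_3 = [2.79, 3.22, 2.25, 2.98, 2.47, 2.77, 2.95, 3.56, 2.88, 2.63, 3.38,
              3.07, 2.81, 3.17, 2.23, 2.19, 4.06, 1.98, 2.81, 2.85, 2.43, 3.20,
              3.53]

summary = {}
for name, values in [("Hospital 1", hospital_1),
                     ("Hospital 2", hospital_2),
                     ("Hospital 3", hospital_3)]:
    # stdev() is the sample standard deviation (divides by n - 1)
    summary[name] = (mean(values), stdev(values), len(values))
    print(name, round(summary[name][0], 2), round(summary[name][1], 3),
          summary[name][2])
```

The sample standard deviations are similar across the three hospitals, which is consistent with the equal-variance assumption of ANOVA.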

ANOVA-Example 1

                     Hospital 1     Hospital 2     Hospital 3
Mean                 2.63 liters    3.03 liters    2.88 liters
Standard deviation   0.496 liters   0.523 liters   0.498 liters
Sample size          21             16             23

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all three population means are the same, or at least one of the population means differs from one of the others


ANOVA-Example 1

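                     Hospital 1     Hospital 2     Hospital 3
Mean                 2.63 liters    3.03 liters    2.88 liters
Standard deviation   0.496 liters   0.523 liters   0.498 liters
Sample size          21             16             23

\[
\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}
\]

\[
SS_b = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2
\]

\[
SS_w = \sum_{i=1}^{k} df_i \cdot s_i^2
\]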
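These three formulas need only the group means, standard deviations, and sample sizes, so they can be sketched in Python directly from the summary statistics quoted for this example. The function name `anova_from_summary` is made up for this illustration, and because the inputs are rounded summary values, the results may differ slightly in the last digits from a calculation on the raw data.

```python
def anova_from_summary(means, sds, sizes):
    """One-way ANOVA quantities from per-group means, SDs, and sample sizes."""
    k = len(means)
    n_total = sum(sizes)
    # Overall mean: group means weighted by their sample sizes
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    # SS_b = sum of n_i * (group mean - overall mean)^2
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # SS_w = sum of df_i * s_i^2, with df_i = n_i - 1
    ss_w = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (n_total - k)
    return {"grand_mean": grand_mean, "SS_b": ss_b, "SS_w": ss_w,
            "df_b": k - 1, "df_w": n_total - k,
            "MS_b": ms_b, "MS_w": ms_w, "F": ms_b / ms_w}


# Summary statistics for the three hospitals (FEV1 in liters)
result = anova_from_summary(means=[2.63, 3.03, 2.88],
                            sds=[0.496, 0.523, 0.498],
                            sizes=[21, 16, 23])
for key, value in result.items():
    print(key, round(value, 4))
```

The resulting F statistic is then compared with the critical value for $df_1 = 2$ and $df_2 = 57$.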


ANOVA-Example 1

Source of variability   Sum of Squares (SS)   Degrees of freedom (df)   Mean Squares (MS)   F
Between groups
Within groups
Total


ANOVA-Example 1

Compare the F statistic calculated with the F critical value and then make a conclusion


ANOVA-Example 2

Study objective: To examine if there is any difference in mean change in body weight among the subjects in three different weight-loss programs ($\alpha = 0.01$)

                     Program A   Program B   Program C
Mean                 -7.2 kg     -4.0 kg     0.6 kg
Standard deviation   3.7 kg      3.9 kg      3.7 kg
Sample size          42          47          42


ANOVA-Example 2

$H_0: \mu_A = \mu_B = \mu_C$
$H_1$: Not all three population means are the same, or the mean changes in total body weight are not all identical for the three populations


ANOVA-Example 2

                     Program A   Program B   Program C
Mean                 -7.2 kg     -4.0 kg     0.6 kg
Standard deviation   3.7 kg      3.9 kg      3.7 kg
Sample size          42          47          42
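As a hedged worked sketch, the overall mean and sums of squares can be written out from these summary statistics with $k = 3$ and $N = 42 + 47 + 42 = 131$. Intermediate values are rounded, so the last digits may differ slightly from a full-precision calculation and are worth re-checking by hand.

```latex
\[
\bar{x} = \frac{42(-7.2) + 47(-4.0) + 42(0.6)}{42 + 47 + 42}
        = \frac{-465.2}{131} \approx -3.551
\]
\[
SS_b = 42(-7.2 + 3.551)^2 + 47(-4.0 + 3.551)^2 + 42(0.6 + 3.551)^2 \approx 1292.4
\]
\[
SS_w = 41(3.7)^2 + 46(3.9)^2 + 41(3.7)^2 = 1822.24
\]
\[
F = \frac{SS_b / (k - 1)}{SS_w / (N - k)}
  = \frac{1292.4 / 2}{1822.24 / 128}
  \approx \frac{646.2}{14.24} \approx 45.4
\]
```

The degrees of freedom for the critical value are $df_1 = k - 1 = 2$ and $df_2 = N - k = 128$.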


ANOVA-Example 2

Source of variability   Sum of Squares (SS)   Degrees of freedom (df)   Mean Squares (MS)   F
Between groups
Within groups
Total


ANOVA-Example 2

Compare the F statistic calculated with the F critical value and then make a conclusion


