Peter Shaw RU
1 way ANOVA – What is it?
This is a parametric test, examining whether the means differ between 2 or more populations.
Do males differ from females?
Do results differ between these sites (Site 1, Site 2, Site 3)?
This is not in itself so unusual; indeed we are spoiled for choice:

                     Parametric   Nonparametric
2 classes only       t test       Mann-Whitney U
2 or more classes    anova        Kruskal-Wallis test
So why am I spending so much time on anova?
1: Because anova is the definitive analytical tool: it allows one to ask questions that cannot be asked any other way.
2: You need to be familiar with the layout of anova tables.
3: Because I want you to understand the degrees of freedom associated with anova models. There are deep pitfalls associated with allocation of dfs, and inspection of the dfs in an anova table allows one to understand immediately what model another researcher has used.
What anova actually does: It partitions the variation in the data into components, some of which can be explained by the experimenter (such as the difference between two treatments) and some of which is unexplained. The unexplained variation is called "error", but is in fact essential to performing the anova. It generates a test statistic F, which is the ratio of explained to unexplained variation. This can be thought of as a signal:noise ratio. Thus large values of F indicate a high degree of pattern within the data and imply rejection of H0. It is thus similar to the t test; in fact anova on 2 groups is equivalent to a t test [F1,n-2 = (tn-2)²].
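The F = t² equivalence is easy to check numerically. A minimal sketch, assuming Python with SciPy is available (the two samples are invented purely for illustration):

```python
# Checking that 1-way anova on 2 groups is equivalent to a t test (F = t^2).
# The sample values are hypothetical, purely for illustration.
from scipy import stats

males = [5.1, 6.3, 5.8, 7.0, 6.1]
females = [4.2, 5.0, 4.8, 5.5, 4.6]

t, p_t = stats.ttest_ind(males, females)   # classical 2-sample t test
f, p_f = stats.f_oneway(males, females)    # 1-way anova on the same 2 groups

print(abs(f - t**2) < 1e-8)   # True: F equals t squared
print(abs(p_t - p_f) < 1e-8)  # True: and the p values agree
```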
The core of anova is to partition the sum of squares of a dataset: this is the summed values of (X − mean)², otherwise known as the sum of residuals².
Linear model: each observation is the mean plus a random error: Xi = μ + ei
Total sum of squares = SStot = Σi (Xi − mean)² = Σi (ei × ei)
[Figure: datapoints 1–8 plotted as values around the overall mean (μ), with the residuals shown]
Now we split the data up into treatments.
Linear model: each observation is the mean plus a treatment effect plus random error: Xti = μ + Tt + eti
Total sum of squares = Σti (Xti − μ)² = Σti (eti × eti) + Σti (Tt × Tt) = error sum of squares + treatment sum of squares
(This is how variation is partitioned. Notice that it only works if Σti (eti) = Σ (Tt) = 0.)
[Figure: the same datapoints 1–8, now grouped into Treatment 1 and Treatment 2, with new residuals measured from each treatment mean]
Now we have one sum of squares which has been partitioned into two sources, explained and unexplained. The null hypothesis H0 says that these two sources of variation should be equally unimportant, both unexplained random noise. In order to test this we cannot simply look at the sums of squares (because the more samples you collect the more variation you may find), but first divide these by their degrees of freedom to convert SS into variance:
Total variance = total SS / total df – true but not used in most anova tables
Treatment variance = treatment SS / treatment df
Error variance = error SS / error df
F ratio (signal/noise) = treatment variance / error variance
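The partition-then-divide recipe can be verified numerically. A minimal sketch in Python with NumPy, using invented data (8 observations in 2 treatments, echoing the figures above):

```python
# Partition SStot into treatment SS + error SS, then form the F ratio.
# Data are hypothetical: 8 observations in 2 treatments of 4.
import numpy as np

x = np.array([3.0, 4.0, 5.0, 4.0, 7.0, 8.0, 6.0, 7.0])
labels = np.array([1, 1, 1, 1, 2, 2, 2, 2])    # treatment of each observation

mu = x.mean()                                  # overall mean
ss_total = ((x - mu) ** 2).sum()               # Σ (Xi − mean)²

ss_treat = sum(x[labels == t].size * (x[labels == t].mean() - mu) ** 2
               for t in (1, 2))                # Σ Tt², summed over observations
ss_error = sum(((x[labels == t] - x[labels == t].mean()) ** 2).sum()
               for t in (1, 2))                # Σ eti² about treatment means

assert abs(ss_total - (ss_treat + ss_error)) < 1e-9   # the partition is exact

df_treat, df_error = 2 - 1, x.size - 2
f = (ss_treat / df_treat) / (ss_error / df_error)     # signal / noise
print(ss_total, ss_treat, ss_error, f)                # 22.0 18.0 4.0 27.0
```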
Anova tables: exact layout varies somewhat – I dislike SPSS's version! Learn this layout parrot-fashion! It is correct for a 1-way anova with N observations and T treatments.

Source      df               SS      MS            F
Treatment   (T-1)            SStrt   SStrt/(T-1)   MStrt/MSerr
Error       by subtraction   SSerr   SSerr/dferr
Total       (N-1)

Finally, you (or the PC) consult tables or otherwise obtain a probability of obtaining this F value given the dfs for treatment and error.
It is formally possible to perform an anova by calculating the values of treatment and error for each observation in turn – I have a handout showing this. In practice no-one does it this way because there is a labour-saving shortcut that is easily learned and implemented, which I intend to show you now.
How to do an ANOVA by hand:
1: Calculate N, Σx, Σx² for the whole dataset.
2: Find the correction factor CF = (Σx × Σx) / N
3: Find the total sum of squares for the data = Σ(xi²) − CF
4: Add up the totals for each treatment in turn (Xt.), then calculate the treatment sum of squares SStrt = Σt (Xt. × Xt.)/r − CF, where Xt. = sum of all values within treatment t, and r is the number of observations that went into that total.
5: Draw up the ANOVA table, getting the error terms by subtraction.
One way ANOVA's limitations: this technique is only applicable when there is one treatment used. Note that the one treatment can be at 3, 4, … many levels. Thus fertiliser trials with 10 concentrations of fertiliser could be analysed this way, but a trial of BOTH fertiliser and insecticide could not.
Class data – your turn

T1:  7  8 11 15 12   (total 53)
T2: 14 16 19 18 15   (total 82)
T3: 20 18 22 19 16   (total 95)
(Totals given to be nice to you!)
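As a cross-check on your hand working, the five steps can be coded directly. A sketch in plain Python (no libraries), applied to the class data above:

```python
# The hand recipe, step by step, on the class data (T = 3 treatments of r = 5).
data = {
    "T1": [7, 8, 11, 15, 12],
    "T2": [14, 16, 19, 18, 15],
    "T3": [20, 18, 22, 19, 16],
}

values = [x for xs in data.values() for x in xs]
N = len(values)                            # step 1: N, Σx, Σx²
sum_x = sum(values)
sum_x2 = sum(x * x for x in values)

CF = sum_x * sum_x / N                     # step 2: correction factor
ss_total = sum_x2 - CF                     # step 3: total SS
ss_treat = sum(sum(xs) ** 2 / len(xs)      # step 4: Σt (Xt. × Xt.)/r − CF
               for xs in data.values()) - CF
ss_error = ss_total - ss_treat             # step 5: error by subtraction

T = len(data)
df_treat, df_error = T - 1, N - T
F = (ss_treat / df_treat) / (ss_error / df_error)
print(f"SStrt = {ss_treat:.2f}, SSerr = {ss_error:.2f}, F = {F:.2f}")
# SStrt = 184.93, SSerr = 78.40, F = 14.15
```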
What to do when you want to test H0: group means are the same, when the data are clearly not normally distributed? If you have 2 groups, you can fall back on Mann-Whitney's U test. BUT with 3 or more groups you can't do multiple U tests, just as you can't do multiple t tests in place of a 1-way anova. (Why not?) There are 2 good alternatives, one of which is supplied in SPSS, one of which needs special code (I have some home-written):
1: Kruskal-Wallis non-parametric anova (good and safe).
2: Use normal anova but use a Monte-Carlo approach to empirically estimate p values. (This is a perfect, safe and reliable way to generate p values, but is not widely available.)
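The Monte-Carlo idea in option 2 is simple to sketch: shuffle the group labels many times and ask how often random relabelling produces an F as large as the one observed. A hypothetical Python example (invented data; SciPy is only used to compute F):

```python
# Empirical (permutation) p value for a 1-way anova F statistic.
# Group values are invented for illustration.
import numpy as np
from scipy import stats

groups = [np.array([2.0, 3.5, 1.0]),
          np.array([8.0, 9.5, 7.0]),
          np.array([4.0, 5.0, 6.0])]

f_obs = stats.f_oneway(*groups).statistic
pooled = np.concatenate(groups)
cuts = np.cumsum([len(g) for g in groups])[:-1]

rng = np.random.default_rng(1)
n_perm = 5000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)                            # destroy any real structure
    pieces = np.split(pooled, cuts)
    if stats.f_oneway(*pieces).statistic >= f_obs:
        count += 1

p_mc = (count + 1) / (n_perm + 1)                  # empirical p value
print(f"F = {f_obs:.2f}, Monte-Carlo p = {p_mc:.4f}")
```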
Post-hoc tests: Often one runs an ANOVA on a dataset where the "treatment" variable comes at 3 or more levels. The p value tests the classification as a whole. If p>0.05 you simply assume that the groups do not differ. If however p<0.05, students often ask whether this proves some specific difference, such as showing that site 1 differs from site 2. The simple answer is "NO": you can't infer specific differences from the overall p value. If you do want to ask about a specific division within your classification you need to explore the world of post-hoc tests (= "after the event"). There are a plethora of these, and you can run them by hand, but you need to be careful of handling your significance levels.
Why you don't do multiple t tests (unless you have your eyes open…). Take random data and assemble it into 2 piles, then test H0: no difference between them. Using p = 0.05 you know that you will reject this H0 1 time in 20. That is what p = 0.05 means. Now assemble the data into 3 piles (p1, p2, p3), then test H0: no difference between each pair: P1-P2, P1-P3, P2-P3. Now p1-p2 is * 1 time in 20, p1-p3 is * 1 time in 20, and p2-p3 is * 1 time in 20.
Now we ask what the probability is that we will end up accepting H0. This involves accepting H0 in test 1 (P1-P2), AND in P1-P3, AND in P2-P3. In each case the probability of accepting H0 is 0.95 (= 1−p), but the probability of accepting all 3 together is 0.95 × 0.95 × 0.95 = 0.857375 (nearly, but not quite, 1−3p). So in random data you will reject H0 1 time in 7, not 1 in 20. So if you claim in your write-up that you used p = 0.05 you are lying, albeit probably unwittingly. It is OK to do this PROVIDING you know what you are doing, and you apply a more stringent criterion to each individual test. If you are doing N different tests on subsets of the same data, each one should run at a significance level of p = 1 − (1−α)^(1/N), where α is the final significance level.
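The arithmetic here is worth checking for yourself. A short Python sketch of both numbers (α = 0.05 and N = 3 pairwise tests, as in the 3-pile example):

```python
# Familywise error for 3 naive tests, and the stringent per-test level.
alpha = 0.05
n_tests = 3

p_accept_all = (1 - alpha) ** n_tests        # accept H0 in all 3 tests
p_reject_any = 1 - p_accept_all              # chance of a spurious "significant" result

per_test = 1 - (1 - alpha) ** (1 / n_tests)  # run each test at this level instead

print(round(p_reject_any, 6))   # 0.142625  (about 1 in 7, not 1 in 20)
print(round(per_test, 6))       # 0.016952
```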
Post-hoc tests in SPSS are hidden under "Compare means – 1 way anova".
Dissolved Fe in water draining Pelenna mine, Swansea. F6,49 = 72.9, p<0.001. But which sites differ from each other?
[Figure: boxplot of Fe (ppm) by site, sites 1–7, n = 8 per site]
Duncan's multiple range test:
1: Means are sorted into ascending order.
2: All bar 2 are in a homogeneous subgroup: site 3 is in a group by itself, as is site 2.
[SPSS Duncan output: subsets for alpha = .05. Sites 1, 6, 5, 4 and 7 form one homogeneous subset (means roughly 1.0–2.4 ppm); site 3 (mean ≈ 19 ppm) and site 2 (mean 62.5 ppm) each form a subset of their own. Means for groups in homogeneous subsets are displayed. Uses Harmonic Mean Sample Size = 8.000.]
Presentation methods:
1: Leave means sorted into order and underline those that do not differ.
[Figure: boxplot of Fe by site with the sites re-ordered by ascending mean, the homogeneous groups marked A, B, C]
00 6.05”.00 4. 1.00 62.50 19. Then you add the text “means followed by the same letter not differ at p<0.13 A C B A A A A .38 1.00 3.2: the ABC method Leave the means in their original order but indicate which group they in by giving a letter of the alphabet to each line in the graph just presented.00 2.00 5.00 7.00 1.38 1.00 2.25 1.
And if the data are very non-normal? You have always got a non-parametric anova, known as the Kruskal-Wallis test. This does not have a post-hoc test, but you can create one with care:
1: Compare every group with every other by a U or K-W test, but apply a more stringent significance test as explained earlier.
2: Sort means (or better, medians) into ascending order, and underline those which do not differ significantly as before.
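Those two steps can be sketched in Python with SciPy (the three sites and their counts are invented; kruskal and mannwhitneyu are the SciPy equivalents of the K-W and U tests):

```python
# Hand-made non-parametric post-hoc: overall K-W test, then pairwise U tests
# at the stringent per-pair significance level derived earlier.
from itertools import combinations
from scipy import stats

sites = {                       # hypothetical count data for 3 sites
    "site 1": [12, 15, 9, 20, 14, 11],
    "site 2": [3, 5, 2, 7, 4, 6],
    "site 3": [10, 13, 8, 16, 12, 9],
}

h, p = stats.kruskal(*sites.values())
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.4f}")

pairs = list(combinations(sites, 2))
alpha_adj = 1 - (1 - 0.05) ** (1 / len(pairs))   # 3 pairs -> about 0.0170
for a, b in pairs:
    u, p_pair = stats.mannwhitneyu(sites[a], sites[b], alternative="two-sided")
    verdict = "differ" if p_pair < alpha_adj else "NS"
    print(f"{a} vs {b}: p = {p_pair:.4f} -> {verdict}")
```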
Mayflies on Pelenna stream (4 sites only). P<0.05 by Kruskal-Wallis test. P values for each pairwise comparison in turn:

Site   1     2      3    4
1      -
2      NS    -
3      NS    0.036  -
4      NS    0.006  NS   -

[Figure: boxplots of mayfly counts at sites 1–4]
Adjust significance to 1 − (0.95^(1/6)) = 0.0085, and underline sites that do not differ at this level.
[Figure: boxplots of mayfly counts by site, re-ordered, with the groups A and B marked]
Or list as follows:
Site 1: AB
Site 2: A
Site 3: AB
Site 4: B