
ANOVA is a powerful set of techniques for testing differences among the means of three or more samples. When only a single factor is being considered (think of it as a table with only one row), this is called one-way ANOVA. The null hypothesis is H0: μ1 = μ2 = … = μk for k samples. The alternative hypothesis is Ha: the population means are not all equal. For Ha, the means need not all differ from each other; it may be that just one of the k population means is significantly different from the rest. Sample sizes for the different groups do not need to be equal, although it is best if they are. The assumptions of ANOVA are 1) that each sample is normally distributed and 2) that the variances of the populations are equal. Unlike the χ2 and G2 tests, which were restricted to counts only, ANOVA can be used on most types of data (e.g. continuous variables, proportions and so on).

If we had multiple samples and wanted to test differences among them, we could potentially perform multiple 2-sample t-tests. However, each test carries a type I error rate of α, and the effects accumulate: perform n tests and the probability of making at least one type I error grows to roughly nα. Also, for n samples there are n(n − 1)/2 possible comparisons; in essence, we are bound to make false conclusions. ANOVA takes a different approach and essentially makes all of these comparisons at once.
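The multiple-comparisons problem can be made concrete with a short sketch. (One hedge on the additive figure above: nα is an upper bound; for n independent tests the exact familywise rate is 1 − (1 − α)^n.)

```python
# Familywise type I error rate when running many pairwise t-tests.
# Among k samples there are k(k-1)/2 pairwise comparisons.

def n_comparisons(k):
    """Number of pairwise comparisons among k groups."""
    return k * (k - 1) // 2

def familywise_error(n_tests, alpha=0.05):
    """P(at least one type I error) across n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

for k in (3, 5, 10):
    n = n_comparisons(k)
    print(f"k={k}: {n} comparisons, "
          f"familywise error ~ {familywise_error(n):.3f} (additive bound: {n * 0.05:.2f})")
```

Even with only 5 groups there are 10 comparisons, and the chance of at least one false rejection at α = 0.05 already exceeds 40%.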

Sum of squares

ANOVA is based upon the concept of sum of squares (in longhand, sum of squared deviates). Estimated variance, i.e. Σ(xi − x̄)² / (n − 1), is a measure of the variability within a sample divided by degrees of freedom, which for present purposes we can think of as sample size. That is, variance is essentially average variability per data point. If we don't divide by degrees of freedom then we have a measure of total, absolute variability. This is sum of squares:

SS = Σ(xi − x̄)² = Σxi² − (Σxi)² / n

Partitioning of variability

When we have multiple (3 or more for ANOVA) samples, and consider the variability of the whole set of data, there are two sources of variability. First is the variability within each of the samples, and second is the variability between the samples: the result of the effect of a treatment upon a particular mean, which shifts that particular distribution. Imagine for the moment that we were just dealing with two distributions (but remember ANOVA is strictly for 3 or more).

[Figure: distributions for Treatment A and Treatment B, with the total range of data indicated.]
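The two forms of the SS formula are algebraically identical; a quick sketch using the five data points that appear later in the worked example (16, 15, 17, 15, 20) confirms they agree:

```python
# Sum of squares (sum of squared deviates) computed two equivalent ways.
data = [16, 15, 17, 15, 20]
n = len(data)
mean = sum(data) / n                                         # 16.6

ss_deviates = sum((x - mean) ** 2 for x in data)             # sum (xi - xbar)^2
ss_shortcut = sum(x * x for x in data) - sum(data) ** 2 / n  # sum xi^2 - (sum xi)^2 / n

# Both give 17.2; dividing by degrees of freedom gives the estimated variance.
variance = ss_deviates / (n - 1)                             # 17.2 / 4 = 4.3
```

The shortcut form is handy because it needs only running totals of x and x², not the mean in advance.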

Relative to the above, the following [in which only the means have been altered] would have greater variability as a result of between-treatment differences: the mean of B is higher, shifting the whole distribution to the right, but the within-treatment variability is no different from the situation above. In other words, the between-treatment variability plays a relatively greater role in explaining total variability than above. The fraction of the total variability that comes from between samples [as opposed to within] is greater than for the above example.

[Figure: Treatment B's distribution shifted to the right; the total range of data is larger.]

Conversely, if the means were the same as before but variability within samples were higher:

[Figure: wider distributions for Treatment A and Treatment B over the total range of data.]

then within-treatment variability accounts for relatively more of the variability of the whole dataset. The fraction of the total variability that comes from between samples [as opposed to within] is less than for the first example.

The above is a crude explanation of the basic idea behind ANOVA: partition the variability of the whole data set into two components, that accounted for within samples [hereafter groups] and that between groups. The way that we measure the variability is by sum of squares.

SStotal = SSwithin groups + SSbetween groups

Within-group Sum of Squares

This is easy to calculate. For each group (sample) we can calculate its SS using the formula above. (For example, for the data 16, 15, 17, 15 and 20, we have n = 5, Σxi² = 1395, Σxi = 83 and hence SS = 17.2.) We then simply sum those SS over all k groups to get a total SS within groups:

SSwithin groups = Σ (i = 1 to k) SSi

Group      Data                  n     Σx     Σx²     Mean    SS
A          16, 15, 17, 15, 20    5     83     1395    16.6    17.2
B          20, ..., 18           5     94     1782    18.8    14.8
C          18, ..., 18           5     96     1862    19.2    18.8
All data   (all 15 values)       15    273    5039    18.2    70.4
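As a check on the arithmetic, a minimal sketch of the within-group SS. Only group A's raw data are listed in the text, so the SS values for groups B and C (14.8 and 18.8) are taken from the table as given:

```python
# Within-group sum of squares: compute SS per group, then sum over groups.
def sum_of_squares(data):
    n = len(data)
    return sum(x * x for x in data) - sum(data) ** 2 / n

ss_a = sum_of_squares([16, 15, 17, 15, 20])  # 17.2
ss_b, ss_c = 14.8, 18.8                      # from the table; raw data not listed
ss_within = ss_a + ss_b + ss_c               # 50.8
```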

Therefore, SSwithin groups = 17.2 + 14.8 + 18.8 = 50.8.

Between-groups Sum of Squares

For the above example, we have three means: 16.6, 18.8 and 19.2. Essentially we have a sample, a distribution of means, and we want to measure the variability among those means. We use a slightly modified version of the SS formula. The mean of all of the data is 18.2 (= 273 / 15). We measure the variability of our set of means (16.6, 18.8 and 19.2) from the overall mean, 18.2. So initially we can think of the sum of squares between groups as (16.6 − 18.2)² + (18.8 − 18.2)² + (19.2 − 18.2)². However, those three mean values are each associated with a particular original sample size (5 for each group in this case). For our SS, we ought therefore to weight each squared difference from the 18.2 mean by the sample size, that is ngroup(Mean group i − Mean all data)². In this case the sample sizes are equal so it doesn't matter, but often it does. Therefore, in general,

SSbetween groups = Σ (over the k groups) ngroup (x̄group − x̄all data)²

Thus, for the above example we have SSbetween groups = 5(16.6 − 18.2)² + 5(18.8 − 18.2)² + 5(19.2 − 18.2)² = 12.8 + 1.8 + 5 = 19.6. From the table, notice that 70.4 is the SS of all of the data. This value was calculated in the normal way, treating it as a single sample consisting of 15 observations and using SS = Σ(xi − x̄)². Now, 19.6 (SSbetween) plus 50.8 (SSwithin) equals 70.4. We have partitioned the variability into two components, between and within, in a meaningful manner that sums to what we would quantify if it were just a single, combined sample.
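The partition can be verified numerically from the summary statistics alone, using the group means, the common group size, and the grand mean given above:

```python
# Between-groups SS: each group mean's squared distance from the grand
# mean, weighted by that group's sample size.
group_means = [16.6, 18.8, 19.2]
n_per_group = 5
grand_mean = 18.2

ss_between = sum(n_per_group * (m - grand_mean) ** 2 for m in group_means)  # 19.6
ss_within = 50.8                       # from the within-group calculation
ss_total = ss_between + ss_within      # 70.4: the partition holds
```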

The final steps

Remember that the estimated variance of a focal population = sum of squared deviates / degrees of freedom.

We can again think of dividing the sum of squares by sample size (technically, degrees of freedom) to get a sort of average squared deviate. In ANOVA, this has a special name: mean square, often denoted MS.

We have a measure, 19.6, which is SSbetween. If we divide by the relevant degrees of freedom, then we have the mean square for between treatments. There were 3 groups, so degrees of freedom = number of groups − 1 = 3 − 1 = 2. MSbetween = 19.6 / 2 = 9.8.

We have a measure, 50.8, which is SSwithin. If we divide by the relevant degrees of freedom, then we have the mean square for within. For each of the 3 groups, we have ni − 1 degrees of freedom. Total degrees of freedom is thus (5 − 1) + (5 − 1) + (5 − 1) = 4 × 3 = 12. MSwithin = 50.8 / 12 = 4.23.

To summarize, we now have a measure, mean square, for the variance within the groups (4.23) and a measure for the variance between the groups (9.8). Through the degrees of freedom, all the sample sizes have been taken into account; those two measures have each been normalized, so we can compare them directly and meaningfully. Finally, we look at the ratio between them to get an F test statistic. Note that we always take between / within. It is this ratio that we look up in a table of the F distribution. For our chosen level of significance α, the table gives us a critical F value. If our test statistic is greater than this critical value, we can reject the null hypothesis at level α. If it is less than the critical value, then we fail to reject the null hypothesis.

F = MSbetween / MSwithin = 9.8 / 4.23 = 2.32

The table for the F distribution is a little complicated because we have two degrees of freedom: that for the numerator and that for the denominator. We then have different critical values for each level of significance. For our sample we have 2 degrees of freedom for the numerator, 12 degrees of freedom for the denominator, and we will take α = 0.05. The critical value from the table is 3.89. As our test statistic, 2.32, is less than this value, we fail to reject the null hypothesis. We do not have sufficient evidence to demonstrate a significant difference, at the 5% significance level, among the different population means.
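The full chain from sums of squares to the decision can be sketched as follows; the critical value 3.89 (F with 2 and 12 degrees of freedom at α = 0.05) is read from an F table rather than computed:

```python
# From SS and degrees of freedom to mean squares, F, and the decision.
k, n_total = 3, 15
ss_between, ss_within = 19.6, 50.8

df_between = k - 1                         # 2
df_within = n_total - k                    # 12
ms_between = ss_between / df_between       # 9.8
ms_within = ss_within / df_within          # ~4.233
f_stat = ms_between / ms_within            # ~2.31

critical_value = 3.89                      # F(2, 12) at alpha = 0.05, from table
reject_null = f_stat > critical_value      # False: fail to reject H0
```

Note that dividing the unrounded mean squares gives F ≈ 2.31 (as in the Minitab output below); the text's 2.32 comes from dividing by the rounded 4.23.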

One final note: MSwithin, because it deals with variability within a sample, can be thought of as error about the mean of the population from which the sample comes. That is, an observation is the population mean, μ, plus an error term:

xi,j = μj + ei,j,   where ei,j ~ N(0, σ²)

Thus MSwithin is usually called the mean squared error and denoted MSE. This notation is often used in the output of statistical packages.

By the way, one-way ANOVAs are the easiest; they go up in complexity from here! Carl Anderson (carl@isye.gatech.edu) 10/25/02


SS = Σ(xi − x̄)² = Σxi² − (Σxi)² / n

SStotal = SSwithin groups + SSbetween groups
SSwithin groups = 17.2 + 14.8 + 18.8 = 50.8
SSbetween groups = 5(16.6 − 18.2)² + 5(18.8 − 18.2)² + 5(19.2 − 18.2)² = 12.8 + 1.8 + 5 = 19.6
SStotal = 50.8 + 19.6 = 70.4

Minitab command: Stat / ANOVA / One-way (unstacked)

One-way ANOVA: A, B, C

Analysis of Variance
Source    DF       SS      MS      F      P
Factor     2    19.60    9.80   2.31  0.141
Error     12    50.80    4.23
Total     14    70.40

Level   N    Mean
A       5  16.600
B       5  18.800
C       5  19.200

Pooled StDev = 2.058

[Minitab also plots individual 95% CIs for each mean, based on the pooled StDev, over the range 16.0 to 20.0.]

In general, the one-way ANOVA table has the following structure (values from the worked example shown):

Source               Degrees of freedom      Sum of squares   Mean squares                         F-statistic            p-value
Treatment (between)  k − 1 = 3 − 1 = 2       SSTr = 19.6      MSTr = SSTr/(k − 1) = 19.6/2 = 9.8   F = MSTr/MSE = 2.32    P(F(k−1, Nt−k) ≥ F)
Error (within)       Nt − k = 15 − 3 = 12    SSE = 50.8       MSE = SSE/(Nt − k) = 50.8/12 = 4.23
Total                Nt − 1 = 15 − 1 = 14    SST = 70.4

