
ANOVA – Analysis of Variance


ANOVA stands for 'Analysis of Variance'. It is the analysis of variation in the means of different groups of a population, or of different populations. It can be seen as an extension of the t-test: while a t-test compares two means, ANOVA can compare more than two means.
What does an ANOVA do?
It studies whether the variation between group means is due to an effect/treatment or is just chance variation. It compares the 'Between Group Variation' with the 'Within Group Variation'. If the treatment has a significant effect, the 'Between Group Variation' will be significantly higher than the 'Within Group Variation'.
How to do an ANOVA test?
Assume an educational institute wants to check whether different modes of education – visual-aided teaching, practical learning, and self-learning through library and internet – have an impact on students' performance. The management decides to assign 20 students to each of the teaching methods. Their performance is evaluated with an examination at the end of the treatment. The scores are collected, and the mean score of each method is calculated. ANOVA is then used to find out whether there is a difference between the mean values of the three groups.
As in all other hypothesis testing, the hypotheses of ANOVA are:
Null hypothesis: The means of all three methods are equal.
Alternate hypothesis: The mean of at least one of the methods differs significantly.
1. Calculate the sum of squares between the groups, SSB.
2. Calculate the sum of squares within the groups, SSW.
3. Find the degrees of freedom between the groups: df1 = number of groups - 1.
4. Find the degrees of freedom within the groups: df2 = number of groups x (number of observations - 1).
5. Calculate the mean square value between the groups: MSB = SSB / df1.
6. Calculate the mean square value within the groups: MSW = SSW / df2.
7. Calculate the F value: FActual = MSB / MSW.
8. Find from the F-table the FExpected value for the given degrees of freedom.
9. Find the significance (p) value.
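As a concrete illustration of steps 1-9, here is a minimal sketch that computes a one-way ANOVA by hand and cross-checks it with SciPy. The scores are invented for illustration (five per group, where the article's example uses 20 students per method):

```python
import numpy as np
from scipy import stats

# Hypothetical exam scores for the three teaching methods.
visual    = np.array([78, 85, 82, 88, 80])
practical = np.array([90, 87, 93, 85, 91])
self_led  = np.array([70, 75, 68, 72, 74])
groups = [visual, practical, self_led]

# Steps 1-2: sums of squares between and within the groups.
grand_mean = np.mean(np.concatenate(groups))
ss_b = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Steps 3-4: degrees of freedom (equal group sizes assumed, as in step 4).
df1 = len(groups) - 1                     # between: k - 1
df2 = len(groups) * (len(visual) - 1)     # within: k * (n - 1)

# Steps 5-7: mean squares and the F statistic.
ms_b, ms_w = ss_b / df1, ss_w / df2
f_actual = ms_b / ms_w

# Steps 8-9: critical F at alpha = 0.05 and the p value.
f_expected = stats.f.ppf(0.95, df1, df2)
p_value = stats.f.sf(f_actual, df1, df2)
print(f"F = {f_actual:.2f}, F_crit = {f_expected:.2f}, p = {p_value:.4f}")

# Cross-check the hand computation with SciPy's built-in one-way ANOVA.
print(stats.f_oneway(visual, practical, self_led))
```

If FActual exceeds the critical value (equivalently, if p is below alpha), the null hypothesis of equal means is rejected, exactly as described in the decision rules that follow.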
The table below explains how the calculations are performed and interpreted:


Figure 1: ANOVA Calculations

There are two ways to find out if the variation between group means is significant. One: FActual is greater than FExpected – the null hypothesis cannot be validated, so the alternate hypothesis is accepted and there is a significant difference between the mean values of the groups. The other: the p value is less than α.

Types of ANOVA
ANOVA has multiple uses, and there are various types that can be used for different purposes:
1. One-way ANOVA: used to compare means of groups/populations using one factor.
2. Two-way ANOVA: used to compare means of groups/populations using two factors.
3. Two-way ANOVA (repeated): used to compare means of groups/populations using two factors, with interactions among the factors.
4. Nested ANOVA: used to compare means of groups/populations that can be sub-grouped, where the interactions happen only within the sub-groups and not with other factors.
Each type of ANOVA has some variations, and the methods and interpretations will differ. Thus ANOVA can be used for various purposes; this article is just an introduction to ANOVA. A sketch of a two-way ANOVA follows below.
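Running a two-way ANOVA (types 2 and 3 above) by hand gets tedious, so statistical software is normally used. Here is a minimal sketch with the statsmodels package; the scores and the second factor (gender) are invented purely for illustration:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical scores: factor 1 is teaching method, factor 2 is gender.
df = pd.DataFrame({
    "score":  [78, 85, 90, 87, 70, 75, 82, 88, 93, 85, 68, 72],
    "method": ["visual", "visual", "practical", "practical", "self", "self"] * 2,
    "gender": ["F"] * 6 + ["M"] * 6,
})

# 'C(method) * C(gender)' expands to both main effects plus their
# interaction, matching the interaction variant in the list above.
model = smf.ols("score ~ C(method) * C(gender)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```

The ANOVA table this prints gives a sum of squares, F value and p value per factor and for the interaction, interpreted the same way as in the one-way case.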

From <http://www.whatissixsigma.net/anova/>

A Simple Model of a Variance Stable Process
John J. Hickey

Most fairly accurate descriptions of equipment and/or process lifetimes assume that failure rates follow a three-period (I, II, III) "bathtub-curve pattern" where failures/errors:
I – Decrease during the debugging or improvement time period.
II – Remain relatively constant and at their lowest levels during the normal equipment or process operating period.
III – Increase during the wearout time period.

Scientific studies of limit-based natural or complex growth patterns also suggest that many processes are inherently non-linear and subject to chaotic tendencies. The logistics map³ or parabola Xt+1 = RXt(1 - Xt) – where the next-generation measure Xt+1 is a function of the present measure Xt, R is the growth factor and t is a discrete time variable – is a simple model for these processes. When the growth factor R falls within the range 1 < R < 3 the process is stable. For R = 2 the time series iterates Xt = X1, X2, X3, ... converge to the constant value Xc = 0.5, which can be easily demonstrated (see Table 1) through the use of an Excel spreadsheet or pocket calculator. The critical growth factor value Rcr = 3.24 (5^(1/2) + 1) in Table 1 signals the start of chaotic instability in this model process, and for R = 3.8 the instability is clearly evident.

Table 1: Logistics Map – Xt+1 = RXt(1 - Xt) Calculated Iterates

            R = 1       R = 2     R = 3        R = 3.24     R = 3.8
Category    Unstable    Stable    Unstable     Unstable     Unstable
            Decreasing  Constant  Oscillation  Oscillation  Chaotic
X0          0.800       0.800     0.800        0.800        0.800
X1          0.160       0.320     0.480        0.518        0.608
X2          0.134       0.435     0.749        0.809        0.906
X3          0.116       0.492     0.564        0.501        0.325
X4          0.103       0.499     0.738        0.810        0.833
X5          0.092       0.500     0.581        0.499        0.528
X6          0.084       0.500     0.730        0.810        0.947
X7          0.077       0.500     0.591        0.499        0.191
X8          0.071       0.500     0.725        0.810        0.587
X9          0.066       0.500     0.598        0.499        0.921

Process Variance Stability
If we assume that the variance (Vt) of a process during its lifetime varies between zero and some maximum acceptable value Vm, applying the logistics parabola model to the process results in the iterate expression Vt+1 = RVt(Vm - Vt). In this case the process is stable¹ within the growth factor range 1/Vm < R < 3/Vm. Also, the process attains super-stability or constancy when its variance equals one half of the maximum acceptable value (Vt = Vm/2) and when R = 2/Vm. This is illustrated for a process with a maximum-allowed variance of Vm = 9 (standard deviation = 3) in Table 2.

Table 2: Super-stable Process Variance Map – Vt+1 = RVt(9 - Vt) Calculated Iterates

            R = 0.111   R = 0.222     R = 0.333    R = 0.360    R = 0.422
Category    Unstable    Super-stable  Unstable     Unstable     Unstable
            Decreasing  Constant      Oscillation  Oscillation  Chaotic
V0          4.50        4.50          4.50         4.50         4.50
V1          2.25        4.50          6.74         7.29         8.55
V2          1.68        4.50          5.07         4.49         1.64
V3          1.37        4.50          6.64         7.29         5.09
V4          1.16        4.50          5.22         4.49         8.40
V5          1.01        4.50          6.57         7.29         2.13
V6          0.89        4.50          5.32         4.49         6.18
V7          0.80        4.50          6.52         7.29         7.35
V8          0.73        4.50          5.38         4.49         5.11
V9          0.67        4.50          6.48         7.29         8.39

The values of R in Table 2 are obtained by scaling the R values of Table 1 by 1/Vm = 1/9. Therefore, R = 2/9 = 0.222 is the super-stable growth factor and Rcr = 3.24/9 = 0.36 is the critical factor.
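Both tables can be regenerated with a few lines of code. Here is a minimal sketch of the same recurrence, run once with the limit 1 (Table 1) and once with the limit Vm = 9 (Table 2):

```python
# Iterate x -> r * x * (limit - x); limit=1 gives the logistics map of
# Table 1, limit=9 the variance map of Table 2.
def quadratic_map(r, x0, limit=1.0, n=10):
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (limit - xs[-1]))
    return xs

for r in (1, 2, 3, 3.24, 3.8):                 # Table 1 growth factors, X0 = 0.8
    print(f"R={r}:", [round(x, 3) for x in quadratic_map(r, 0.8)])

for r in (0.111, 0.222, 0.333, 0.360, 0.422):  # Table 2: R scaled by 1/Vm = 1/9
    print(f"R={r}:", [round(v, 2) for v in quadratic_map(r, 4.5, limit=9)])
```

The printed series show the four behaviors named in the table headers: decay toward zero, constancy at the super-stable value, 2-cycle oscillation, and chaos.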

In the case of Poisson-distributed processes, the expected number of occurrences C = NP (large N, small fraction of occurrence P) is both the variance and the mean of the distribution. A conditional Poisson process that conformed to this simple non-linear model would have the variance iterates Ct+1 = RCt(Cm - Ct) and would be stable in the growth rate range 1/Cm < R < 3/Cm, where Cm is the specified maximum number of occurrences. When Ct = Cm/2 and R = 2/Cm the process is super-stable¹ and ideally Poisson, because the expected numbers of occurrences C0 = C1 = C2 = ... = Ct = Ct+1 remain constant and are time independent over the operating lifetime of the process. This condition of super-stability is analogous to "states of equilibrium" in statistical mechanics² and is illustrated by the Ct+1 = Ct intersecting line of the Figure I quadratic map.

Figure I: Super-Stable Poisson Distribution – Ct+1 = (2/Cm)Ct(Cm - Ct) (Quadratic Map)

The hypothetical model is suggestive of an ideal, super-stable six sigma process with an expected Poisson failure number of C = 1.7 PPM (N = 10^6, P = 1.7 x 10^-6), a maximum failure number of Cm = 3.4 PPM and a growth factor of R = 2/Cm ≈ 0.6.

If a process is stable with a relatively constant variance and it meets requirements, then (in my opinion) it does not need to be fixed. However, as it ages or deteriorates and becomes unstable, some deterministic chaos may be present and evident in an oscillatory pattern of variance (e.g., machine tool wear). A real-world stable process would of course exhibit random fluctuations in variance which would not be strictly deterministic.

Notes and References
1. |F'(Vt)| < 1 is the first-derivative criterion for the stability of the fixed points Vt = 0 and Vt = Vm - 1/R in the variance quadratic map Vt+1 = RVt(Vm - Vt). Since F'(Vt) = R(Vm - 2Vt), the stability ranges for the fixed points Vt = 0 and Vt = Vm - 1/R calculate as R < 1/Vm and 1/Vm < R < 3/Vm, respectively. Vt = Vm/2 is the value of the non-zero fixed point at the growth rate R = 2/Vm, and F'(Vt) = 0 when this occurs. The oscillatory behavior of the logistic parabola iterates in the unstable growth rate region R ≥ 3 is known as 2^n period doubling. It is represented mathematically by the composite function expression F^n(X0) = X0, where n is the number of cycles or iterations required for a repetition of the point X0. For the 2-cycle period, F^2(X0) = X0 with super-stable fixed point X0 = 0.5; for example, the growth rate for Vm = 1 is Rc = 5^(1/2) + 1. The cycles or "splittings" increase as the associated growth rates R1, R2, ... become larger. Chaos and infinite period doubling occur with R > 3.6/Vm.
2. Andrews, F. C., Equilibrium Statistical Mechanics, John Wiley & Sons Inc., 1975 – defines a state of equilibrium as one in which the information we have about the system has reached a time-independent minimum. In the Chapter 7 section on ensembles with minimum information, he proves that maximum ignorance (lack of assigned causes) about the system exists when the state probabilities are equal.
3. The technical literature on non-linear dynamics, fixed-point stability, logistic equations, quadratic maps, period doubling and chaos is extensive.

From <https://www.isixsigma.com/tools-templates/variation/simple-model-variance-stable-process/>

Reduce Special-cause Variation Before Experimentation
Benjamin Madrigal

For several years, a fully-automated plastic drinking cup production line used excessive amounts of raw material (plastic PET pellets) due to a wide distribution in the weight of the formed cups. When process operators and engineers tried to reduce the plastic pellet usage by reducing the average formed cup weight, many cups – because of the wide variation – fell below the customer-specified minimum weight. The process thus had to be reset to a higher weight target in order to avoid those out-of-specification cups, increasing the amount of resin used. A previous process improvement team attempted to find the sources of variation through some data collection and a couple of two-factor/two-level full-factorial experiments. They were unsuccessful, as the factors used in the experiments did not explain the response variation.

The Problem
The automated cup line has an average cup weight of 24.5 grams, more than 1 gram higher than the target of 23 grams (also the lower specification limit [LSL]) for an individual cup's weight. To avoid low-weight cup failures, the operators usually raise the target cup weight average. An additional 260,198 pounds of resin is used annually, with a cost of poor quality (COPQ) of $195,148. Figure 1 shows the current output of cup weights over 30 days (3 shifts per day).

Figure 1: Overall Cup Gram Weight – Before

Solving the Problem with DMAIC (Define, Measure, Analyze, Improve, Control)
A Six Sigma project team (comprised of machine operators, quality assurance personnel, maintenance staff and other factory subject-matter experts) was created to reduce the weight distribution variability and achieve a minimum Cp of 1.22. The team aimed to improve the production process such that the cup weight average could be retargeted closer to the 23-gram LSL, saving 0.5 grams of resin on average per cup produced. If such an improvement were achieved, the team could reduce by 50 percent the use of the additional resin – a savings of $97,750 per year. As part of the Define phase, a SIPOC (suppliers, input, process, output, customers) map was created (Figure 2).

Figure 2: SIPOC Map for Plastic Cup Forming
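The Define-phase capability goal can be made concrete with a short sketch. Only the 23-gram LSL is stated in the article, so the one-sided index Cpl = (mean - LSL) / (3σ) is used here as a stand-in for the Cp the team tracked, and the weights are simulated rather than the project's data:

```python
import numpy as np

# Simulated cup weights in grams (hypothetical mean and spread).
rng = np.random.default_rng(seed=1)
weights = rng.normal(loc=24.5, scale=0.4, size=2000)

lsl = 23.0  # lower specification limit, grams
cpl = (weights.mean() - lsl) / (3 * weights.std(ddof=1))
below_spec = (weights < lsl).mean() * 100
print(f"Cpl = {cpl:.2f}, {below_spec:.2f}% of cups below the LSL")
```

Narrowing the spread (the scale parameter here) is what lets the mean be retargeted downward without pushing more cups below the LSL – the whole premise of the project.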

The process includes these pieces of equipment:
• A plastic-pellet extruder fed with virgin PET resin pellets, colorant and regrinds (scrapped plastic cups that are reground and fed back into the process). The extruder mixes all of them and supplies a constant plastic paste.
• A chilled stainless steel hard-chromed roller system that creates a wide plastic sheet.
• A beta-ray scanner that continuously monitors the thickness of the plastic sheet and also provides closed-loop control to the extruder and roller system.
• A wide, flat infrared oven that reheats the plastic sheet to a specific target temperature.
• A 72-cavity thermoforming mold that receives the heated plastic sheet and stamps out 72 cups at each press stroke (also known as a mold shot).
• A 72-position puncher that cuts the cups from the formed plastic sheet (called webbing) and presents the separated cups in stacks to a conveyor system.
• An automatic box filler that takes the stacks of cups from the conveyor and fills up boxes with stacks of 20 cups.

During the early brainstorming sessions of the project team, changes such as mold temperature increases and mold cavity (plug assist) replacements were suggested – and implemented – but the cup-weight distribution remained the same. The team decided to get back to basics, and a multivary study was initiated.

Multivary Studies
Multivary studies make no changes to the process being studied. They do, however, require the use of detailed process and product data in order to distinguish, by categories, the source or sources of variation. The variation categories can be grouped as: time to time, shift to shift, operator to operator, lot to lot, piece to piece, within piece, etc. Data collection is designed to include all the suspected sources of variation. Using graphical tools in which the categories (A, B or C) are graphed against the output variable Y, multivary studies help identify the where or when of the biggest source of variation. The graph shown in Figure 3 is an example of a multivary study with group-to-group variation.

Figure 3: Example of Within Group Variation

In Figure 4, below, there is a time-related cyclical variation, represented by the changes between the 1, 2 and 3 groups to the 4, 5 and 6 groups, and so on.

Figure 4: Example of Time Cycle Variation

In the example of the plastic cups, the analysis of the sampled data showed that high variation was always present with no correlation to time-related categories. Next, the team looked toward positional variation, a particular type of multivary study.

Looking at Each Molded Cup
Each molded cup comes from a thermoforming mold with 72 cavity locations. Figure 5 shows a diagram of the mold, with each cavity position numbered and the sides and direction of travel indicated.

Figure 5: Diagram of Molded Cup Creation

Data was collected from cup units coming from each cavity, as shown in Figure 6.

Figure 6: Results of Cup Unit Sampling

After a few samples, it was clear that the cavity position inside the mold was the highest source of variation. The team was thus able to identify all of the cavities that fell below the minimum limit at every sampled mold shot (every 72 units coming from a single mold stroke). By creating a surface map using the average weights produced in each individual cavity, the row-to-row differences across the mold became clear. As displayed in Figure 7, rows 1 through 4 have higher average weights than rows 7 through 9, with product from one side of the mold running consistently below the shot average.

Figure 7: Surface Map of Molded Cups – Before
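A minimal sketch of the two techniques used above – a multivary-style comparison of category means, and a per-cavity surface map – might look as follows. The data are simulated with a built-in row effect, and the 9-row x 8-column arrangement of the 72 cavities is an assumption made for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)

# Hypothetical samples: 50 mold shots x 72 cavities, with a row-to-row
# weight drop built in to mimic the side-to-side pattern the team saw.
rows = np.repeat(np.arange(1, 10), 8)               # assumed 9 rows x 8 columns
shots = rng.normal(24.5, 0.15, size=(50, 72)) - 0.15 * rows

# Multivary-style comparison: the category whose group means spread the
# most is the dominant source of variation (here, mold row, by design).
shift_per_shot = np.tile(["A", "B", "C"], 17)[:50]  # shift label per shot
df = pd.DataFrame({
    "weight": shots.ravel(),
    "shift": np.repeat(shift_per_shot, 72),
    "row": np.tile(rows, 50),
})
for category in ("shift", "row"):
    means = df.groupby(category)["weight"].mean()
    print(f"{category}: spread of group means = {means.max() - means.min():.2f} g")

# Surface map: average each cavity over all shots, arranged as the mold grid.
surface = shots.mean(axis=0).reshape(9, 8)
print(np.round(surface, 2))
```

Printed this way, the shift-to-shift spread is small while the row-to-row spread is large, which is exactly the kind of signal that pointed the team at cavity position.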

Looking at Sheet Thickness Distribution
The project team began to look for clues as to why the side-to-side weight variation was occurring by returning to the SIPOC exercise. An important input variable to the formed cup weight was the extruded sheet thickness. Fortunately, a beta-ray scanner was available to monitor the thickness in a continuous raster (line-by-line) scan, and it provided reliable numbers. The team was able to compare plastic sheet thickness to cup weight and found no correlation – the extruded plastic sheet had a consistent thickness distribution along the width axis while the formed cup still displayed a side-to-side difference.

Previous work had included looking at the thermoforming electrical heaters and thermocouples, looking for any clues as to the uneven weight distribution, but they had shown no critical issue. Team members were left to examine the infrared oven and the mold itself. This time, the team decided to disassemble and inspect the infrared oven entirely. Upon inspecting the electrical components, team members found nothing wrong. The physical review, however, found that a section of the oven had a gap at the oven-mold interface, which allowed heat to escape.

Figure 8: Extruded Plastic Sheet Reheating Oven

After this physical gap was fixed and several full mold shot samples were run, a more even weight distribution was seen across the mold. As shown in Figure 9, the variance in weights narrowed after the infrared oven was repaired.

Figure 9: Surface Map of Molded Cups – During

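As an aside, the "no correlation" finding between sheet thickness and cup weight in the section above amounts to a simple correlation check on paired measurements. A sketch with simulated (deliberately independent) data:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
thickness = rng.normal(1.90, 0.02, size=200)  # sheet thickness, mm (hypothetical)
weight = rng.normal(24.5, 0.30, size=200)     # cup weight, g (independent by design)

r = np.corrcoef(thickness, weight)[0, 1]
print(f"correlation r = {r:.2f}")  # near zero: thickness does not explain weight
```

A near-zero r is what let the team rule out the extruded sheet and move upstream to the oven and mold.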
Figure 10: Cup Gram Weight After Fixing Oven Gap

Issues with Position 65
A particular mold cavity, position 65, often had a low cup weight (Figure 10). The mold was inspected, cleaned and the incumbent cavity fixture replaced. Unfortunately, the low weight behavior persisted. The team revisited the cavity location, found that a vacuum line had clogged, and fixed it by flushing the lines out. This last action allowed the team to eliminate cavity 65 as a recurring low-weight cup (Figure 11).

Figure 11: Cup Gram Weight (with Cp) After Flushing Vacuum Lines

After this step, all special-cause variation had been eliminated. The project team shifted its focus to reducing common cause variation, returning again to its SIPOC map to identify critical inputs to process variation. A standard three-factor full-factorial design of experiments (DOE) was set up with the factors of oven temperature, plastic sheet thickness and plastic pellet regrind. The experiment data analysis resulted in a good R² of 97.87 percent, with sheet thickness and regrind set point as strong contributors to the overall variation. Data analysis showed that oven settings and initial sheet thickness contributed to cup-weight variation. Further DOE work focused on fixing the oven temperatures, and working with sheet thickness and regrind levels allowed the team to establish optimal input control parameters.

Accordingly, the process Cp was increased to greater than the targeted minimum of 1.5. The average weight was reduced to the 24-gram target with none (or very few) cups going under the 23-gram LSL. The original project goal of reducing raw material usage was achieved, with a savings of $100,000.

Today, cup-weight surface mapping is more even across the mold with a tighter distribution. The low points, located at the front corners (Figure 12), cannot be improved without redesigning the mold and reducing its size from 72 to 60 cavities. This change was considered, but because it would have resulted in a 14 percent reduction in productivity, it was not pursued.

Figure 12: Surface Map of Molded Cups – After
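The kind of analysis behind the DOE result above can be sketched as a two-level, three-factor full factorial fit with interactions, computing R² from the fitted model. The eight response values below are invented, not the project's data:

```python
import itertools
import numpy as np

# Coded factor levels (-1/+1): oven temperature, sheet thickness, regrind.
runs = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
response = np.array([0.42, 0.35, 0.61, 0.55, 0.44, 0.33, 0.64, 0.52])

# Model matrix: intercept, main effects, and two-factor interactions.
a, b, c = runs.T
X = np.column_stack([np.ones(8), a, b, c, a * b, a * c, b * c])
coef, *_ = np.linalg.lstsq(X, response, rcond=None)

# R² measures how much of the response variation the factors explain.
fitted = X @ coef
ss_res = ((response - fitted) ** 2).sum()
ss_tot = ((response - response.mean()) ** 2).sum()
print("main-effect coefficients:", np.round(coef[1:4], 3))
print("R^2 =", round(1 - ss_res / ss_tot, 3))
```

A high R², like the 97.87 percent reported above, means the chosen factors account for nearly all of the observed variation – which is only meaningful once special-cause variation has been removed, as the conclusion below argues.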

Conclusion
This process improvement project demonstrates that it is important not to use sophisticated statistical tools (such as DOE) to analyze a process before reducing the special-cause variation present in that process. Otherwise, time, energy and resources may be wasted without ever finding the critical characteristics that enable the control and improvement of a process.

From <https://www.isixsigma.com/tools-templates/variation/reduce-special-cause-variation-before-experimentation/>