
Report for Statistical Data Processing

Student ID   Last Name   First Name
s210549      Gastaldi    Chiara
s223633      Umer        Muhammad

0. Table of contents
1. Introduction

2. Solved Exercise - One-way ANOVA

3. Solved Exercise - Two-way ANOVA

4. Solved Exercise - One-way MANOVA

5. Solved Exercise - Two-way MANOVA

6. Conclusions

1. Introduction
The purpose of this report is to explore the statistical tools ANOVA (one-way and two-way)
and MANOVA (one-way and two-way), to compare them, and to apply them to a test case drawn
from the authors' field of interest.

1.1 ANOVA

If the objective is to compare two groups or populations on a single factor, the t-test is
sufficient. "Analysis of Variance" (ANOVA), on the other hand, tests three or more groups for
mean differences based on a continuous (i.e. scale or interval) response variable (a.k.a.
dependent variable). The term "factor" refers to the variable that defines group
membership. The lathe machine that produced a mechanical component is an example of a
factor.

There are two main types of ANOVA:


1. "one-way" ANOVA compares levels (i.e. groups) of a single factor based on a single
continuous response variable (e.g. comparing the diameter of the mechanical
components by the lathe machine that produced them).
2. "two-way" ANOVA compares levels of two factors for mean differences on
a single continuous response variable (e.g. comparing the diameter of the mechanical
components by both the lathe machine and the operator).

1.2 MANOVA
MANOVA stands for "Multivariate Analysis of Variance". In basic terms, a MANOVA is an
ANOVA with two or more continuous response variables. Like ANOVA, MANOVA has both a
one-way flavor and a two-way flavor. The number of factor variables involved distinguishes
a one-way MANOVA from a two-way MANOVA:

1. "one-way" MANOVA compares two or more continuous response variables (e.g.
component diameter and surface roughness) by a single factor variable (e.g. lathe
machine that produced them).
2. "two-way" MANOVA compares two or more continuous response variables (e.g. component
diameter and surface roughness) by two factor variables (e.g. lathe
machine and operator).

Hypotheses of ANOVA
The following two hypotheses are considered in ANOVA:
Ho: The (population) means of all groups under consideration are equal.
Ha: The means are not all equal. (Note: this is different from saying that they are all
unequal.)

1.3 ANOVA Significance Testing with the F Statistic


Four quantities are needed to perform significance testing in one-way ANOVA:

- Alpha is chosen ahead of time (in this case set to 0.05) and represents the
willingness to falsely reject the null hypothesis when it is true. In practical terms, it is
the area of the right tail of the F distribution beyond F-critical.
- F-critical is the x coordinate of the F distribution for which the area to its right is
alpha (5%) of the total.
- The F ratio is computed from the data (see Sect. 2).
- The p-value is the area to the right of the F ratio.
F ratio
A large F ratio means that the variation among group means is more than you'd expect
to see by chance. A large F ratio is encountered both when the null hypothesis is wrong
(the data are not sampled from populations with the same mean) and when random
sampling happened to end up with large values in some groups and small values in
others.
P value
The P value tests the null hypothesis that data from all groups are drawn from
populations with identical means. Therefore, the P value answers this question:
If all the populations really have the same mean (the fact that the rods come from
different lathe machines is ineffective), what is the chance that random sampling would
result in means as far apart (or more so) as observed in this experiment?
If the overall P value is large (i.e. > 0.05), the data do not give you any reason to
conclude that the means differ. Even if the population means were equal, sample means this
far apart could easily arise by chance. This is not the same as saying that the population
means are equal: the data simply provide no compelling evidence that they differ.
If the overall P value is small, it is unlikely that the differences you observed are due
to random sampling. It is then possible to reject the idea that all the populations have
identical means.
Decision criteria

- If the p-value < alpha, the null hypothesis is rejected.
- If the computed F ratio > F-critical, the null hypothesis is rejected.

The two criteria are equivalent and always lead to the same decision.
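With two numerator degrees of freedom (three groups, as in Sect. 2), the right-tail area of the F distribution has a simple closed form, so both decision criteria can be checked with the Python standard library alone. The following is only a sketch under that assumption: the helper names are illustrative, the closed form does not apply to other numerator degrees of freedom, and the F ratio and within-group degrees of freedom are the values reported in Sect. 2.2.

```python
def f_right_tail_df1_2(f_ratio, df_within):
    """Area to the right of f_ratio for an F distribution with (2, df_within) df.

    Closed form valid ONLY for 2 numerator degrees of freedom.
    """
    return (1.0 + 2.0 * f_ratio / df_within) ** (-df_within / 2.0)


def f_critical_df1_2(alpha, df_within):
    """x coordinate leaving an area alpha to its right, for (2, df_within) df."""
    return (df_within / 2.0) * (alpha ** (-2.0 / df_within) - 1.0)


alpha = 0.05
f_ratio = 15.91649   # F ratio reported in Sect. 2.2
df_within = 18       # within-group degrees of freedom reported in Sect. 2.2

p_value = f_right_tail_df1_2(f_ratio, df_within)
f_crit = f_critical_df1_2(alpha, df_within)

# Both criteria should give the same answer.
reject_by_p_value = p_value < alpha
reject_by_f_ratio = f_ratio > f_crit

print(f"p-value = {p_value:.6f}, F-critical = {f_crit:.6f}")
print(f"reject Ho (p-value): {reject_by_p_value}, reject Ho (F ratio): {reject_by_f_ratio}")
```

For general degrees of freedom a statistics library would normally be used instead of a closed form.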
2. Solved Exercise: One Way ANOVA
In a mechanical parts manufacturing factory, a small rod with a nominal diameter of 25 mm
(the dependent variable) has been machined on 3 different lathe machines (the three levels
of the independent variable, or factor). On each lathe machine 7 parts are machined and
their diameters measured. A statistical analysis is required to establish whether the
diameters produced by the 3 machines differ.

2.1 Data
Data (rod diameter, mm)

Sample no.   lathe 1   lathe 2   lathe 3
1            25.45     25.71     25.64
2            25.20     25.50     25.73
3            25.61     25.62     25.81
4            25.55     25.73     25.91
5            25.10     25.78     25.56
6            25.17     25.66     25.70
7            25.15     25.71     25.79

Legend
df = degrees of freedom
SS = Sum of squares
MS = Mean Square
Alpha = Significance level
P-value = A value to test the null hypothesis

2.2 Solution
The Excel Data Analysis tool has been used and cross-checked with a step-by-step calculation.
ANOVA: Single Factor (Alpha=0.05)

SUMMARY
Groups    Count   Sum      Average   Variance
lathe 1   7       177.23   25.319    0.044681
lathe 2   7       179.71   25.673    0.00839
lathe 3   7       180.14   25.734    0.013362

ANOVA
Source of Variation              SS         df   MS       F          P-value    F crit
Between Groups (BG)              0.704924   2    0.3525   15.91649   0.000105   3.554557
Within Groups (WG) / Error (E)   0.3986     18   0.0221
Total (T)                        1.103524   20

2.3 Relevant formulas


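\[ SS_{BG} = \sum_{j=1}^{3} n_j (\bar{x}_j - \bar{x})^2 \]
where n_j is the size of sample j (always 7 in this case), \bar{x}_j is the mean of sample j
and \bar{x} is the overall mean

\[ SS_{WG} = SS_{E} = \sum_{j=1}^{3} \sum_{i=1}^{7} (x_{ij} - \bar{x}_j)^2 \]

\[ SS_{T} = \sum_{j=1}^{3} \sum_{i=1}^{7} (x_{ij} - \bar{x})^2 \]

\[ F = \frac{SS_{BG} / df_{BG}}{SS_{WG} / df_{WG}} \]

\[ SS_{T} = SS_{BG} + SS_{WG} \]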
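The step-by-step calculation behind these formulas can be sketched in plain Python using the diameters of Sect. 2.1. This is an illustrative sketch, not the authors' spreadsheet: the function and variable names are assumptions introduced here.

```python
# Rod diameters from Sect. 2.1, one list per lathe machine.
samples = {
    "lathe 1": [25.45, 25.20, 25.61, 25.55, 25.10, 25.17, 25.15],
    "lathe 2": [25.71, 25.50, 25.62, 25.73, 25.78, 25.66, 25.71],
    "lathe 3": [25.64, 25.73, 25.81, 25.91, 25.56, 25.70, 25.79],
}


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F ratio for a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-groups: size-weighted squared distance of each group mean from the grand mean.
    ss_bg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups (error): squared distance of each value from its own group mean.
    ss_wg = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Total: squared distance of each value from the grand mean.
    ss_t = sum((x - grand_mean) ** 2 for x in all_values)

    df_bg = len(groups) - 1
    df_wg = len(all_values) - len(groups)
    f_ratio = (ss_bg / df_bg) / (ss_wg / df_wg)
    return {"ss_bg": ss_bg, "ss_wg": ss_wg, "ss_t": ss_t,
            "df_bg": df_bg, "df_wg": df_wg, "f_ratio": f_ratio}


result = one_way_anova(list(samples.values()))
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```

The printed values can be compared directly with the SS, df and F columns of the ANOVA table in Sect. 2.2.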

2.4 Comments on Results:


Ho: All means of rod diameters from the 3 lathe machines are equal

Ha: At least one mean differs from another

Alpha = 0.05 (confidence level 95%)


Using the criteria defined in Sect. 1.3, it is possible to conclude that, since:
- the p-value is lower than 5%
- the computed F-ratio is higher than F-critical

the null hypothesis Ho is rejected in favour of Ha. The data provide
sufficient evidence to conclude that the means of the rod diameter from the 3 lathe
machines are not all the same.

3. Solved Exercise: Two Way ANOVA


In the same example, we introduce one more independent variable: there are now 2 operators
working at the 3 different lathe machines. Each operator has worked at all 3 lathe machines
(during different shifts), and six observations have been recorded for each operator on
each machine.
The purpose of this section is to establish whether operator skill/expertise has any effect
on the dependent variable (rod diameter).
ANOVA partitions the overall variance into multiple parts:
- One part is always the ERROR (the unexplained source, SSE).
- One-way ANOVA works with only one potential source of variability (groups or
columns, i.e. lathe machines, here denoted as SSBG).
- Two-way ANOVA introduces an additional way to separate the data (blocks or rows,
i.e. operators, SSBB). It thus further refines how the variance is split,
allowing for more powerful hypothesis tests.

3.1 Data
Data (rod diameter, mm)

Obs.   Operator     lathe 1   lathe 2   lathe 3
1      Operator 1   25.45     25.71     25.64
1      Operator 2   25.20     25.50     25.73
2      Operator 1   25.61     25.62     25.81
2      Operator 2   25.55     25.73     25.91
3      Operator 1   25.10     25.78     25.56
3      Operator 2   25.17     25.66     25.70
4      Operator 1   25.15     25.71     25.79
4      Operator 2   25.4      25.7      25.6
5      Operator 1   25.2      25.5      25.7
5      Operator 2   25.6      25.6      25.8
6      Operator 1   25.5      25.7      25.9
6      Operator 2   25.1      25.7      25.6

3.2 Solution
In this case MATLAB's built-in function anova2 has been used. The results have been
cross-checked with Excel's calculations.
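The partition of the total sum of squares for a balanced two-way design with replication can also be sketched in plain Python. The sketch below assumes that the table of Sect. 3.1 is read as 2 operators x 3 lathe machines with 6 replicates per cell. That reading, and all names in the code, are assumptions made here, and the figures it prints are not guaranteed to coincide with the MATLAB output, which depends on how the rows were arranged when anova2 was called.

```python
# cells[operator][lathe] -> six replicate diameters (assumed reading of Sect. 3.1).
cells = [
    [  # Operator 1
        [25.45, 25.61, 25.10, 25.15, 25.2, 25.5],  # lathe 1
        [25.71, 25.62, 25.78, 25.71, 25.5, 25.7],  # lathe 2
        [25.64, 25.81, 25.56, 25.79, 25.7, 25.9],  # lathe 3
    ],
    [  # Operator 2
        [25.20, 25.55, 25.17, 25.4, 25.6, 25.1],   # lathe 1
        [25.50, 25.73, 25.66, 25.7, 25.6, 25.7],   # lathe 2
        [25.73, 25.91, 25.70, 25.6, 25.8, 25.6],   # lathe 3
    ],
]


def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """Sums of squares, degrees of freedom and F ratios for a balanced design."""
    n_rows, n_cols, reps = len(cells), len(cells[0]), len(cells[0][0])
    everything = [x for row in cells for cell in row for x in cell]
    grand = mean(everything)
    row_means = [mean([x for cell in row for x in cell]) for row in cells]
    col_means = [mean([x for row in cells for x in row[j]]) for j in range(n_cols)]
    cell_means = [[mean(cell) for cell in row] for row in cells]
    pairs = [(i, j) for i in range(n_rows) for j in range(n_cols)]

    ss = {
        "operator": n_cols * reps * sum((m - grand) ** 2 for m in row_means),
        "lathe": n_rows * reps * sum((m - grand) ** 2 for m in col_means),
        "interaction": reps * sum(
            (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
            for i, j in pairs),
        "error": sum((x - cell_means[i][j]) ** 2 for i, j in pairs for x in cells[i][j]),
        "total": sum((x - grand) ** 2 for x in everything),
    }
    df = {
        "operator": n_rows - 1,
        "lathe": n_cols - 1,
        "interaction": (n_rows - 1) * (n_cols - 1),
        "error": n_rows * n_cols * (reps - 1),
    }
    ms_error = ss["error"] / df["error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("operator", "lathe", "interaction")}
    return ss, df, f


ss, df, f = two_way_anova(cells)
for source in ("operator", "lathe", "interaction"):
    print(f"{source}: SS = {ss[source]:.6f}, df = {df[source]}, F = {f[source]:.4f}")
print(f"error: SS = {ss['error']:.6f}, df = {df['error']}")
```

In a balanced design the four components add up to the total sum of squares, which gives a quick internal check on the arithmetic.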
3.3 Relevant formulas
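\[ SS_{BB} = \sum_{k=1}^{7} n_k (\bar{x}_k - \bar{x})^2 \]
where n_k is the size of sample k (always 3 in this case)

\[ F_{G} = \frac{SS_{BG} / df_{BG}}{SS_{E} / df_{E}} \]

\[ F_{B} = \frac{SS_{BB} / df_{BB}}{SS_{E} / df_{E}} \]

\[ SS_{T} = SS_{BG} + SS_{BB} + SS_{E} \]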

3.4 Comments on Results


HoG: All means of rod diameters from the 3 lathe machines are equal

HoB: All means of rod diameters from the 2 different operators are equal

Ha: At least one mean differs from another

Alpha = 0.05 (confidence level 95%)


Introducing blocks or rows (in this case operators) as well as groups reduces SSE,
since part of the variability assigned to error in the one-way ANOVA calculation is now
explained by the presence of different operators running the machines (SSBB).
Several conclusions can be drawn from these results:
- The introduction of blocks or rows (in this case operators) reduces the error SSE,
allowing for more powerful hypothesis tests.
- The p-value for the lathe machine (Columns) is zero to four decimal places and is
therefore lower than the chosen 5% significance level. This result is a strong
indication that the diameter varies from one lathe machine to another. HoG is
rejected.
- The p-value for the operator (Rows) is 0.8891. This value indicates that the
operators' performance is quite comparable: an F-statistic as high as the observed one
would occur by chance about 89 times out of 100 if the mean diameters were equal from
operator to operator. HoB cannot be rejected.
- The operators and lathe machines appear to have no interaction. The p-value of
0.99 means that the observed result is very likely (99 times out of 100) if there is
no interaction.

4. Solved Exercise: One Way MANOVA


In the same example, we introduce one more dependent variable (response): we are now
looking at surface roughness as well as diameter. The only factor considered here is the
lathe machine; the operators are not considered at this stage.

4.1 Data
Data Output: Diameter

Observation   lathe 1   lathe 2   lathe 3
1             25.45     25.71     25.64
2             25.20     25.50     25.73
3             25.61     25.62     25.81
4             25.55     25.73     25.91
5             25.10     25.78     25.56
6             25.17     25.66     25.70
7             25.15     25.71     25.79
8             25.4      25.7      25.6
9             25.2      25.5      25.7
10            25.6      25.6      25.8
11            25.5      25.7      25.9
12            25.1      25.7      25.6


Data Output: Surface Roughness

Observation   lathe 1   lathe 2   lathe 3
1             25.45     25.71     25.64
2             25.20     25.50     25.73
3             25.61     25.62     25.81
4             25.55     25.73     25.91
5             25.10     25.78     25.56
6             25.17     25.66     25.70
7             25.15     25.71     25.79
8             25.4      25.7      25.6
9             25.2      25.5      25.7
10            25.6      25.6      25.8
11            25.5      25.7      25.9
12            25.1      25.7      25.6

4.2 Solution
Using MATLAB's built-in function manova1:
[d,p,stats] = manova1([Diameter,Roughness],Groups,0.05);

The first output, d, is an estimate of the dimension of the space containing the group
means. If the means were all the same, the dimension would be 0, indicating that the means
are at the same point. In the example the dimension is 1, indicating that it is not
possible to reject the hypothesis that the group means fall along a line.
The largest possible dimension for the means of three groups is 2, which would indicate
that the means fall on a plane but not along a line.
The second output, p, is a vector of p-values for a sequence of tests. The first p-value
tests whether the dimension is 0, the next whether the dimension is 1, and so on. In this
case the first p-value is small (6.6510e-06), so dimension 0 is rejected, while the second
one is large (0.56), so dimension 1 cannot be rejected. That is why the estimated
dimension is 1.
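A statistic commonly used to test whether all group mean vectors coincide (dimension 0) is Wilks' lambda, the ratio det(W) / det(T) of the within-group to the total scatter matrix. Values close to 0 speak against equal means and values close to 1 are compatible with them. The sketch below only illustrates that ratio for two responses; it does not reproduce the internals of manova1. The diameters are the first four observations per lathe from Sect. 4.1, while the roughness values are hypothetical numbers invented for this illustration, and all names are assumptions.

```python
# (diameter, roughness) pairs per lathe machine.
# Diameters: first four observations per lathe from Sect. 4.1.
# Roughness: HYPOTHETICAL values, invented for illustration only.
data = {
    "lathe 1": [(25.45, 1.62), (25.20, 1.55), (25.61, 1.70), (25.55, 1.58)],
    "lathe 2": [(25.71, 1.81), (25.50, 1.77), (25.62, 1.90), (25.73, 1.74)],
    "lathe 3": [(25.64, 1.49), (25.73, 1.60), (25.81, 1.52), (25.91, 1.66)],
}


def centroid(points):
    """Mean vector of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, center):
    """2x2 matrix of summed outer products of (point - center)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = (p[0] - center[0], p[1] - center[1])
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


all_points = [p for points in data.values() for p in points]

# Total scatter: deviations from the overall mean vector.
total = scatter(all_points, centroid(all_points))

# Within-group scatter: deviations from each group's own mean vector, summed over groups.
within = [[0.0, 0.0], [0.0, 0.0]]
for points in data.values():
    s = scatter(points, centroid(points))
    for a in range(2):
        for b in range(2):
            within[a][b] += s[a][b]

wilks_lambda = det2(within) / det2(total)
print(f"Wilks' lambda = {wilks_lambda:.4f}")
```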

The MATLAB function MBoxtest confirms that the assumptions on which MANOVA is based hold
(Box's M test checks the homogeneity of the covariance matrices):

[MBox] = MBoxtest([Groups,Diameter,Roughness],0.05);
5. Solved Exercise: Two Way MANOVA
In the same example, we again consider two dependent variables (diameter and surface
roughness), and we now test for two independent factors (operator and machine).

5.1 Data
Data Output: Diameter

Obs.   Operator     lathe 1   lathe 2   lathe 3
1      Operator 1   25.45     25.71     25.64
1      Operator 2   25.20     25.50     25.73
2      Operator 1   25.61     25.62     25.81
2      Operator 2   25.55     25.73     25.91
3      Operator 1   25.10     25.78     25.56
3      Operator 2   25.17     25.66     25.70
4      Operator 1   25.15     25.71     25.79
4      Operator 2   25.4      25.7      25.6
5      Operator 1   25.2      25.5      25.7
5      Operator 2   25.6      25.6      25.8
6      Operator 1   25.5      25.7      25.9
6      Operator 2   25.1      25.7      25.6

Data Output: Surface Roughness

Obs.   Operator     lathe 1   lathe 2   lathe 3
1      Operator 1   25.45     25.71     25.64
1      Operator 2   25.20     25.50     25.73
2      Operator 1   25.61     25.62     25.81
2      Operator 2   25.55     25.73     25.91
3      Operator 1   25.10     25.78     25.56
3      Operator 2   25.17     25.66     25.70
4      Operator 1   25.15     25.71     25.79
4      Operator 2   25.4      25.7      25.6
5      Operator 1   25.2      25.5      25.7
5      Operator 2   25.6      25.6      25.8
6      Operator 1   25.5      25.7      25.9
6      Operator 2   25.1      25.7      25.6

5.2 Solution
Using the MATLAB function maov2:
maov2([Factor1,Factor2,Diam,Roughness],0.05)

No interaction appears to occur between the factors; therefore two separate multivariate analyses
of variance have been conducted, one for Factor 1 (lathe machines) and one for Factor 2 (operators).

As observed for the two-way ANOVA, there is a difference in the components due to the lathe
machines but not due to the operators.
6. Conclusions
It can therefore be concluded that, in the case observed, the lack of uniformity between the
components is not due to the operators' training but to the lathe machines. The tool of at least
one of the lathe machines appears not to be positioned properly, but this does not affect the
performance of the operators working on it.