
ANOVA

ANOVA is concerned with testing hypotheses about means. It is very similar to the t-test.

For an experiment involving two groups, the output from ANOVA and from the t-test leads to the same conclusion.

The t-test cannot be used to test hypotheses about three or more groups.

If there are just two groups, use the t-test; if there are three or more groups, use ANOVA.

Assumptions
- Data is Normally Distributed
- Variances Between the Groups are Equal
- The Sample Size is Adequate (At Least 30 Cases Per Group)

Applications in marketing research


Marketing researchers are often interested in examining the differences in the mean values of
the dependent variable for several categories of a single independent variable or factor.

For example:
- Do various segments (IV) differ in terms of their volume of product consumption (DV)?

- Do the brand evaluations (DV) of groups exposed to different commercials (IV) vary?

- Do retailers, wholesalers, and agents (IV) differ in their attitudes (DV) toward the firm’s distribution policies?

- How do consumers’ intentions (DV) to buy the brand vary with different price levels (high, medium, low - IV)?

- What is the effect of consumers’ familiarity with the store (measured as high, medium, and low - IV) on preference for the store (DV)?

Data requirements
- DV (Metric, i.e., interval or ratio level)
- IV (Non-metric, i.e., nominal or ordinal with two or more groups)
- Normally distributed data
- Homogeneity of variances
- No outliers
- Independent samples/groups (i.e., independence of observations); data from before-after comparisons cannot be analysed with this procedure

Note: 
- When the normality, homogeneity of variances, or outliers assumptions for one-way ANOVA are not met, you may want to run the nonparametric Kruskal-Wallis test instead (a quick check of these assumptions and the Kruskal-Wallis alternative are sketched below).

- ANOVA alone does not indicate which group mean is different. Determining which specific pairs of means are significantly different requires running a post hoc test.

- For example, with a factor such as Gender (Male & Female), ANOVA tells you whether the group means differ; a post hoc test, needed when there are three or more groups, tells you which group mean is higher or lower.
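
For readers who want to check these assumptions, or run the Kruskal-Wallis alternative, outside SPSS, here is a minimal Python sketch using scipy. The data frame and the column names 'group' and 'score' are hypothetical, purely for illustration; this is not part of the SPSS procedure described in these notes.

import pandas as pd
from scipy import stats

# Hypothetical data: a metric score for three groups
df = pd.DataFrame({
    'group': ['A'] * 5 + ['B'] * 5 + ['C'] * 5,
    'score': [2.1, 2.4, 1.9, 2.6, 2.2,
              2.5, 2.8, 2.4, 3.0, 2.7,
              3.1, 2.9, 3.4, 3.2, 3.0],
})
samples = [g['score'].values for _, g in df.groupby('group')]

# Normality check within each group (Shapiro-Wilk)
for name, g in df.groupby('group'):
    print(name, stats.shapiro(g['score']))

# Homogeneity of variances across groups (Levene's test)
print(stats.levene(*samples))

# Nonparametric alternative when the ANOVA assumptions are not met
print(stats.kruskal(*samples))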

Framing hypothesis
H0: µ1 = µ2 = µ3 = ... = µk ("all k population means are equal")

H1: At least one µi is different ("at least one of the k population means is not equal to the others")

µi is the population mean of the ith group (i = 1, 2, ..., k)

Example: One-way ANOVA

- Determine which of the four anxiety factors are related to the respondent’s experience.

Problem Statement
Is there any statistically significant difference among the experience groups with respect to the four anxiety factors (maths anxiety, statistics anxiety, computer anxiety, and software anxiety)?

DV: Maths anxiety, statistics anxiety, computer anxiety, and software anxiety (Metric data)

IV: Experience (Non-metric data)

Hypothesis
H0: All experience-based groups have an equal effect on the four anxiety factors

H1: At least one experience-based group has a different effect on the four anxiety factors

SPSS Commands
Go to “Analyse”
Select “Compare Means”
Select “One-Way ANOVA”
Move the four anxiety factors into the “Dependent List”
Move the experience variable into the “Factor” box
Go to “Options”
Under “Statistics”, select “Descriptive”
Under “Missing Values”, select “Exclude cases listwise”
Click “Continue”

Click the “Post Hoc” button
Check “Tukey”
Click “Continue”
Click “OK”
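
The same one-way ANOVA with a Tukey post hoc test can be sketched in Python (scipy and statsmodels) as a cross-check of the SPSS output. The data frame below is hypothetical; only the variable names ('maths_anxiety', 'experience') mirror the notes.

import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical scores for the three experience groups
df = pd.DataFrame({
    'experience': ['less than 2 years'] * 6 + ['3-5 years'] * 6 + ['more than 5 years'] * 6,
    'maths_anxiety': [2.0, 2.3, 1.8, 2.1, 2.4, 2.2,
                      2.4, 2.6, 2.3, 2.7, 2.5, 2.6,
                      2.8, 3.0, 2.7, 2.9, 3.1, 2.6],
})

# One-way ANOVA: does mean maths_anxiety differ across experience groups?
samples = [g['maths_anxiety'].values for _, g in df.groupby('experience')]
print(stats.f_oneway(*samples))

# Tukey HSD post hoc test: which pairs of groups differ, and in which direction?
print(pairwise_tukeyhsd(df['maths_anxiety'], df['experience']))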

Output Interpretation
Look at the ANOVA table and check which significance (Sig.) values are less than 0.05.

From the table we find that only “maths_anxiety” has a significance value less than 0.05.

- This means that at least one of the three experience groups has a significantly different mean on “maths_anxiety”.

In the post hoc multiple comparisons table, look at the significance values.

- For “maths_anxiety”, the mean difference for “less than 2 years – more than 5 years” (-0.438) is significant (p = 0.017).

From this we can interpret that the group with less than 2 years of experience differs from the group with more than 5 years of experience with respect to maths_anxiety.

- We can also say that the group with more than 5 years of experience has a higher mean value on maths_anxiety than the group with less than 2 years of experience.

Under Homogeneous Subsets:

- In the maths_anxiety table, we can see that “less than 2 years” and “3-5 years” fall under subset 1.

- Similarly, “3-5 years” and “more than 5 years” fall under subset 2.

- The mean value of “more than 5 years” (0.227) is higher than the mean value of “3-5 years” (0.0438), so we can say that higher maths_anxiety is associated with more than 5 years of experience.

Conclusion: Respondents with more than 5 years of experience have a higher mean on maths_anxiety.

N-Way ANOVA
An ANOVA model where two or more factors are involved.

Application in Marketing Research


In marketing research, one is often concerned with the effect of more than one factor
simultaneously.

For example
- How do consumers’ intentions (DV) to buy a brand vary with different levels of price and different levels of distribution, i.e., high, medium, low (IVs)?

- How do advertising levels (high, medium and low - IV) interact with price levels
(high, medium and low - IV) to influence brand sales (DV)?

- What is the effect of consumers’ familiarity with a department store (high, medium
and low - IV) and store image (positive, neutral and negative - IV) on preference for
the store (DV)?

Assumptions
- Data is Normally Distributed
- Variances Between the Groups are Equal
- The Sample Size is Adequate (At Least 30 Cases Per Group)

Data requirements
- DV (Metric, i.e., interval or ratio level)
- IV (Non-metric, i.e., nominal or ordinal with two or more groups)
- Normally distributed data
- Homogeneity of variances
- No outliers
- Independent samples/groups (i.e., independence of observations); data from before-after comparisons cannot be analysed with this procedure

Example: N-Way ANOVA


- Determine whether “maths_anxiety” is related to the “experience” and UG of the respondents.

Problem Statement
Is there any statistically significant difference among the experience and UG groups with respect to maths_anxiety?

DV: Maths anxiety (Metric data)

IV: Experience and UG (Non-metric data)

Hypothesis
H0a: All experience-based groups have an equal effect on maths_anxiety

H0b: All UG-based groups have an equal effect on maths_anxiety

H1a: At least one experience-based group has a different effect on maths_anxiety

H1b: At least one UG-based group has a different effect on maths_anxiety

SPSS Commands
Go to “Analyse”
Select “General Linear Model”
Select “Univariate”
Move “maths_anxiety” into the “Dependent Variable” box
Move “UG” and “Experience” into the “Fixed Factor(s)” box
Under “Post Hoc”, move both UG and Experience into the post hoc tests list
Under “Equal Variances Assumed”, check “Tukey”
Click “Continue”
Click “OK”
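
An equivalent N-way (here two-way) ANOVA can be sketched in Python with statsmodels. The data frame is hypothetical; 'experience' and 'ug' stand in for the two factors in the notes, and the C() terms mark them as categorical. The interaction term mirrors SPSS's default full-factorial model.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: two non-metric factors and one metric DV
df = pd.DataFrame({
    'experience': ['low'] * 4 + ['mid'] * 4 + ['high'] * 4,
    'ug': ['arts', 'science'] * 6,
    'maths_anxiety': [2.0, 2.2, 1.9, 2.3,
                      2.5, 2.4, 2.6, 2.3,
                      3.0, 2.8, 2.9, 3.1],
})

# Two-way ANOVA with both main effects and their interaction
model = smf.ols('maths_anxiety ~ C(experience) + C(ug) + C(experience):C(ug)', data=df).fit()
print(sm.stats.anova_lm(model, typ=2))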

Output Interpretation
In the “Tests of Between-Subjects Effects” table, check for significance values less than 0.05.

- We can see that only experience has a value less than 0.05 (0.019), which means that experience, and NOT UG, is associated with maths_anxiety.

In the post hoc tests, go to the multiple comparisons table for experience and check for significance values less than 0.05.

- We find the same result as in the one-way ANOVA, i.e., the group with more than 5 years of experience has a different mean value on maths_anxiety than the other two groups.

ANCOVA

An advanced analysis of variance procedure in which the effects of one or more metric-scaled extraneous variables are removed from the dependent variable before conducting the ANOVA.

If the set of independent variables (IVs) consists of both categorical (non-metric) and metric variables, the technique is called analysis of covariance (ANCOVA).

When examining the differences in the mean values of the dependent variable related to the
effect of the controlled independent variables, it is often necessary to take into account the
influence of uncontrolled independent variables. For example:

- In determining how consumers’ intentions (metric DV) to buy a brand vary with different levels of price (non-metric IV), attitudes toward the brand (metric covariate) may have to be taken into consideration.

- In determining how different groups (non-metric IV) exposed to different commercials evaluate a brand (metric DV), it may be necessary to control for prior knowledge (metric covariate).

- In determining how different price levels (non-metric IV) will affect a household’s cereal consumption (metric DV), it may be essential to take household size (metric covariate) into account.

In other words, if we have to add another independent variable (IV) that is metric, we can perform ANCOVA.

- The metric independent variable will be treated as a covariate in ANCOVA.

Question: Determine whether “maths_anxiety” is related to the “experience” and UG of the respondents, controlling for statistics_anxiety.

Hypothesis
H0a: Statistics_anxiety is NOT associated with maths_anxiety
H0b: Experience is NOT associated with maths_anxiety
H0c: UG is NOT associated with maths_anxiety

H1a: Statistics_anxiety is associated with maths_anxiety
H1b: Experience is associated with maths_anxiety
H1c: UG is associated with maths_anxiety

SPSS Commands
Go to “Analyse”
Select “General Linear Model”
Select “Univariate”
Move “maths_anxiety” into the “Dependent Variable” box
Move “UG” and “Experience” into the “Fixed Factor(s)” box
Move “statistics_anxiety” into the “Covariate(s)” box
Click “OK”
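
The corresponding ANCOVA can be sketched in Python by adding the metric covariate to the model formula. Again the data frame is hypothetical; only the variable names follow the notes.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: two non-metric factors, one metric covariate, one metric DV
df = pd.DataFrame({
    'experience': ['low'] * 4 + ['mid'] * 4 + ['high'] * 4,
    'ug': ['arts', 'science'] * 6,
    'statistics_anxiety': [2.2, 2.5, 2.1, 2.6,
                           2.4, 2.7, 2.3, 2.8,
                           2.6, 2.9, 2.5, 3.0],
    'maths_anxiety': [2.0, 2.2, 1.9, 2.3,
                      2.5, 2.4, 2.6, 2.3,
                      3.0, 2.8, 2.9, 3.1],
})

# ANCOVA: categorical factors plus the metric covariate statistics_anxiety
model = smf.ols('maths_anxiety ~ C(experience) + C(ug) + statistics_anxiety', data=df).fit()
print(sm.stats.anova_lm(model, typ=2))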

Output interpretation
In the “Tests of Between-Subjects Effects” table, check for a significance value less than 0.05 for statistics_anxiety; its value is 0.608, which is greater than 0.05.

So statistics_anxiety does NOT have an effect on maths_anxiety.

Only experience has a significance value less than 0.05 (0.019), so maths_anxiety is associated with experience.

DISCRIMINANT ANALYSIS

A technique for analysing marketing research data when
- the criterion or dependent variable is categorical (non-metric) and
- the predictor or independent variables are interval (metric) in nature.

When the criterion variable has two categories, the technique is known as two-group
discriminant analysis, e.g., Gender – Male & Female

When three or more categories are involved, the technique is referred to as multiple
discriminant analysis, e.g., Experience: Less than 2 years, between 3-5 years and more than 5
years.

For example,
- The dependent variable may be the choice of a brand of personal computer (brand A,
B, or C) and the independent variables may be ratings of attributes of PCs on a 7-
point Likert scale.

- In terms of demographic characteristics, how do customers who exhibit store loyalty differ from those who do not?

Discriminant Analysis Model


D = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk

where

D = discriminant score
b's = discriminant coefficients or weights
X's = predictor or independent variables

Statistics Associated with Discriminant Analysis

Eigenvalue. For each discriminant function, the eigenvalue is the ratio of between-group to
within-group sums of squares.

- Large eigenvalues imply superior functions

- Usually, an eigenvalue greater than 1 implies that the discriminant function discriminates strongly.

Wilks’ lambda (λ). Sometimes also called the U statistic, Wilks’ lambda for each predictor is the ratio of the within-group sum of squares to the total sum of squares.

- Its value varies between 0 and 1. Large values of λ (near 1) indicate that the group means do NOT seem to be different.

- Small values of λ (near 0) indicate that the group means seem TO BE different.
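
Written as formulas, these are simply restatements of the two definitions above (SS denotes a sum of squares):

\[
\text{Eigenvalue} = \frac{SS_{\text{between}}}{SS_{\text{within}}},
\qquad
\lambda = \frac{SS_{\text{within}}}{SS_{\text{total}}} = \frac{SS_{\text{within}}}{SS_{\text{between}} + SS_{\text{within}}}
\]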

Standardized discriminant function coefficients. These are the discriminant function
coefficients and are used as the multipliers when the variables have been standardized to a
mean of 0 and a variance of 1.
Centroid. The centroid is the mean values for the discriminant scores for a particular group.

- There are as many centroids as there are groups, because there is one for each group.

- The means for a group on all the functions are the group centroids.

Structure correlations / Structure matrix / discriminant loadings. The structure correlations represent the simple correlations between the predictors and the discriminant function.

Classification / confusion / prediction matrix. It contains the number of correctly classified and misclassified cases.

- The correctly classified cases appear on the diagonal, because the predicted and actual
groups are the same.

- The off-diagonal elements represent cases that have been incorrectly classified.

- The sum of the diagonal elements divided by the total number of cases represents the
hit ratio.

- The hit ratio is the percentage of cases correctly classified by discriminant analysis.
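
As a small numeric illustration of the hit ratio (the matrix below is made up, not taken from the notes' output):

import numpy as np

# Hypothetical classification matrix: rows = actual group, columns = predicted group
cm = np.array([[20,  1,  0],
               [ 2, 18,  1],
               [ 0,  1, 17]])

# Hit ratio = sum of the diagonal (correctly classified) / total number of cases
hit_ratio = np.trace(cm) / cm.sum()
print(hit_ratio)  # 55 / 60 = about 0.92, i.e. roughly 92% correctly classified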

SPSS Commands
Go to “Analyse”
Go to “Classify”
Then choose “Discriminant”
In the grouping variable box, input the cluster variable that was formed using cluster analysis
Click “Define Range” and input minimum 1 and maximum 3
Click “Continue”
Input the four anxiety factors as independents
Under “Classify”:
In “Prior Probabilities”, check only “All groups equal”
In “Use Covariance Matrix”, check only “Within-groups”
Under “Plots”, check only “Combined-groups”
Under “Display”, check “Summary table” and “Leave-one-out classification”
Click “Continue”
Under “Save”, check “Predicted group membership” and “Discriminant scores”
Click “Continue”

Click “OK”
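
A comparable discriminant analysis can be sketched in Python with scikit-learn's LinearDiscriminantAnalysis. The data below are random placeholders just to show the calls; in practice X would hold the four anxiety factors and y the cluster membership from the cluster analysis.

import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

# Placeholder data: 90 respondents, four anxiety factors, three clusters
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(90, 4)),
                 columns=['maths_anxiety', 'statistics_anxiety',
                          'computer_anxiety', 'software_anxiety'])
y = np.repeat([1, 2, 3], 30)

lda = LinearDiscriminantAnalysis().fit(X, y)

# Weights defining the two discriminant functions (one column per function)
print(lda.scalings_)

# Discriminant scores for each respondent (analogous to the saved D1 and D2)
scores = lda.transform(X)

# Classification table and hit ratio
pred = lda.predict(X)
cm = confusion_matrix(y, pred)
print(cm)
print(np.trace(cm) / cm.sum())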

Output interpretation

Look at the eigenvalues table: there are two functions, both with eigenvalues greater than 1. This means that both functions have strong discriminating power.

Under Wilks’ Lambda, check the significance values: both tests (“1 through 2” and “2”) are significant. As the Wilks’ Lambda values are close to 0, we can say that the group means seem to be different on both functions.

From the standardised canonical discriminant function coefficients table, we can write the equations for D1 and D2:

D1 = (0.441 * maths_anxiety) + (0.203 * statistics_anxiety) + (0.996 * computer_anxiety) + (-0.157 * software_anxiety)

D2 = (-0.453 * maths_anxiety) + (0.976 * statistics_anxiety) + (-0.091 * computer_anxiety) + (0.051 * software_anxiety)

Look at the structure matrix to name the functions created (D1 and D2). Computer and software anxiety load mainly on function 1, while maths and statistics anxiety load mainly on function 2.

- D1 can be named the technology anxiety function

- D2 can be named the aptitude anxiety function

Look at the canonical discriminant functions plot (combined-groups plot) to see how D1 and D2 discriminate among the three groups.

Look at the classification results table and note the value in footnote ‘c’, which is 98.3%; this is the hit ratio. So we can conclude that membership in clusters 1, 2, and 3 can be predicted well from the IVs (maths, statistics, computer, and software anxiety).

In the Variable View we can see that the discriminant score variables have been created (as new rows there).

Name them technology anxiety and aptitude anxiety.

In the Data View, the corresponding new columns contain the discriminant scores for the respective respondents. These values can be used in further analysis, e.g., regression analysis.

Characteristic Profile: An aid to interpreting discriminant analysis results by describing each group in terms of the group means for the predictor variables.
