
Excel and SPSS Assignment

Lecture: Data Analysis (1/3)


Dr. Anderson

MICROSOFT EXCEL
Open the Excel file titled Data Analysis Lab Qual Qual.
DATA ANALYSIS WITH CATEGORICAL VARIABLES:
Goodness of Fit Test
Let's start with the Chi-squared Excel tab. A chi-square goodness of fit test lets us test whether the observed proportions for a categorical variable differ from hypothesized proportions. Here we test whether a qualitative variable follows a particular distribution, the uniform distribution, so the null hypothesis is that consumers have no preference among the brands. Create the frequency table and the expected frequency counts for the hypothesized distribution by entering the formulas below in cells D1:F5.
Formulas (cells D1:F5):

Brand    f                            Expected f
1        =COUNTIF($A$2:$A$121, D2)    =$E$5/3
2        =COUNTIF($A$2:$A$121, D3)    =$E$5/3
3        =COUNTIF($A$2:$A$121, D4)    =$E$5/3
Total    =SUM(E2:E4)                  =SUM(F2:F4)

Resulting values:

Brand    f      Expected f
1        25     40
2        65     40
3        30     40
Total    120    120
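For readers who want to check the frequency table outside Excel, here is a minimal Python sketch of the same counting step. The `purchases` list is a hypothetical stand-in for column A (it is built only so that its totals match the table above, since the real worksheet data is not reproduced here), and the variable names are illustrative.

```python
from collections import Counter

# Hypothetical stand-in for cells A2:A121: one brand code per purchase.
# Only the totals are taken from the handout; the row order is made up.
purchases = [1] * 25 + [2] * 65 + [3] * 30

counts = Counter(purchases)        # plays the role of COUNTIF for each brand
total = sum(counts.values())       # plays the role of =SUM(E2:E4)
expected = total / len(counts)     # plays the role of =$E$5/3

for brand in sorted(counts):
    print(brand, counts[brand], expected)
print("Total", total)
```

Dividing the total by the number of brands is what makes the hypothesized distribution uniform; a different hypothesis would need a different expected column.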

Under the uniform distribution we expect 40 purchases of each brand (cells F2:F4). In cell D7, click Formulas, then Insert Function on the toolbar, and search for Chi. Choose CHISQ.TEST and click OK. Fill in the Function Arguments dialog box, with the observed counts (E2:E4) as the actual range and the expected counts (F2:F4) as the expected range, and click OK.

The p-value is given as 6.9623E-06, which is scientific notation for 0.0000069623. Since the p-value is less than α (.05 by default), reject the null hypothesis. There is enough evidence to show that customers have a preference among the brands. Specifically, brand 2 is purchased far more often than brands 1 and 3.
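The p-value that CHISQ.TEST returns can be cross-checked by hand. The sketch below is one way to do so in plain Python, not a description of Excel's internals. It relies on the fact that, for 2 degrees of freedom only, the upper-tail probability of the chi-square distribution is exp(-x/2). The function name is illustrative.

```python
import math

# Observed purchase counts for brands 1-3 (from the frequency table).
observed = [25, 65, 30]

def chi_square_gof_uniform(counts):
    """Return (statistic, df, p_value) for a test against equal proportions."""
    n = sum(counts)
    k = len(counts)
    expected = n / k  # 120 / 3 = 40 per brand
    stat = sum((o - expected) ** 2 / expected for o in counts)
    df = k - 1
    if df != 2:
        # Assumption of this sketch: the closed form below holds only for df = 2.
        raise ValueError("closed-form p-value is only valid for df = 2")
    p_value = math.exp(-stat / 2)  # chi-square upper tail when df = 2
    return stat, df, p_value

stat, df, p_value = chi_square_gof_uniform(observed)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4e}")
```

With these counts the statistic is 23.75 and the p-value agrees with the 6.9623E-06 reported by Excel. For other degrees of freedom a statistics library would be needed in place of the closed form.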
Checking Assumptions: Check that the expected values are all > 5. In this case, the expected values (40, 40, 40) are all > 5, so the results of the chi-squared test are valid. If this were not the case, Fisher's exact test should be used in place of the chi-squared test (after combining categories to create a 2x2 table, of course). Excel, however, cannot perform Fisher's exact test, so SPSS would need to be used.
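Effect size (r family): We will use Cramer's V since we have more than two outcomes (we would use Phi if we had only two outcomes), calculated as follows:

$$V = \sqrt{\frac{\chi^2}{n \cdot df}}$$

For example, $\chi^2 = 2.10$ with $n = 322$ and $df = 3$ gives $V = \sqrt{2.10 / (322 \cdot 3)} \approx .05$.

In cells D9 to F14 create the following table: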

Effect size (formulas):

Brand    Chi square statistic    Cramer's V
1        =((E2-F2)^2)/F2         =SQRT(E14/(120*(3-1)))
2        =((E3-F3)^2)/F3
3        =((E4-F4)^2)/F4
Total    =SUM(E11:E13)

Effect size (values):

Brand    Chi square statistic    Cramer's V
1        5.625                   0.3145764
2        15.625
3        2.5
Total    23.75

Remember that a value of 0 indicates that the sample proportions are exactly equal (a perfect fit) to the hypothesized proportions (i.e., O = E). As V increases, the degree of departure from a perfect fit increases. Cramer's V is 0.31, a moderate effect by the common r-family benchmarks (about .1 small, .3 medium, .5 large). The observed counts therefore depart noticeably from the hypothesized uniform distribution, which agrees with the rejection of the null hypothesis above.
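The effect-size arithmetic for the goodness of fit test can be retraced with a short Python sketch. It mirrors the spreadsheet formulas (per-brand contributions, their sum, then the square root of chi-square over n times df); the function name is illustrative.

```python
import math

observed = [25, 65, 30]   # f column
expected = [40, 40, 40]   # Expected f column

# Per-brand contributions, as in =((E2-F2)^2)/F2 and the rows below it.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

def cramers_v(chi_square, n, df):
    """Cramer's V as used in the handout: sqrt(chi-square / (n * df))."""
    return math.sqrt(chi_square / (n * df))

v = cramers_v(chi_square, n=sum(observed), df=len(observed) - 1)
print(contributions, chi_square, round(v, 7))
```

The contributions also show where the departure comes from: brand 2 alone accounts for 15.625 of the 23.75 total.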

Test of Independence
A manager wants to know whether there is a relationship between Gender and Brand purchased. We first need to create the contingency table (also called a crosstabulation) to be studied. Click on the Insert tab, Pivot table, and choose Pivot chart. In the Create Pivot table dialog box that appears, for Table/Range, click on the range-selection icon and highlight cells A1:B121. For Choose where you want the Pivot table report to be placed, choose Existing Worksheet, click on the range-selection icon, and choose cell H1. Click OK. The PivotTable Field List dialog box will appear; fill it out as in the diagram.

To create the 100% stacked bar graph: Highlight the contingency table, right click and COPY. Choose an area that is blank in your Excel spreadsheet (H7), right click to paste, and choose the paste option marked 123 (values only). Then calculate the relative frequencies for the gender variable as in the cells below.
Contingency table (pasted as values at H7):

Count of Gender    Column Labels
Row Labels         1     2     3     Grand Total
Female             14    19    15    48
Male               11    46    15    72
Grand Total        25    65    30    120

Relative frequencies (formulas):

          Brand 1       Brand 2       Brand 3
Female    =I9/$L$9      =J9/$L$9      =K9/$L$9      =SUM(I14:K14)
Male      =I10/$L$10    =J10/$L$10    =K10/$L$10    =SUM(I15:K15)

Relative frequencies (values):

          Brand 1    Brand 2    Brand 3
Female    29%        40%        31%        100%
Male      15%        64%        21%        100%
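The relative frequencies are row percentages: each count divided by its gender's total. As a rough cross-check, the same step can be sketched in Python; the dictionary layout and function name are illustrative choices, not part of the handout.

```python
# Observed counts from the contingency table: one row per gender, brands 1-3.
counts = {"Female": [14, 19, 15], "Male": [11, 46, 15]}

def row_percentages(row):
    """Divide each count by the row total, as in =I9/$L$9."""
    total = sum(row)
    return [value / total for value in row]

for gender, row in counts.items():
    shares = row_percentages(row)
    print(gender, " ".join(f"{share:.0%}" for share in shares))
```

Dividing by the row total rather than the grand total is what makes each gender's bar sum to 100% in the stacked graph.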

Highlight cells H13 to K15 -> click on Insert -> column -> 2-D column -> 100% stacked bar graph. In the design tab at the top of the screen, click
on the black/gray color scheme (this is best for printing), and click on Switch Row/Column. Add data labels to the graph.

It is important to change the colors/textures of the bars to help distinguish them from one
another. To do this, right click on a bar -> Format data series -> Fill -> Pattern fill -> choose a
pattern and color -> Close. You can also add a border around the bars in the format data
series dialog box.

To perform the test, Excel requires you to provide the expected values before computing a chi-square test. To do so, create the following table
with formulas in cells N1:R5.
Expected (formulas):

          Brand 1            Brand 2            Brand 3            Total
Female    =(O5*$R$3)/$R$5    =(P5*$R$3)/$R$5    =(Q5*$R$3)/$R$5    48
Male      =(O5*$R$4)/$R$5    =(P5*$R$4)/$R$5    =(Q5*$R$4)/$R$5    72
Total     25                 65                 30                 120

Expected (values):

          Brand 1    Brand 2    Brand 3    Total
Female    10         26         12         48
Male      15         39         18         72
Total     25         65         30         120
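Each expected count is (column total x row total) / grand total, which is what the formulas in the Expected table compute. A minimal Python sketch of that rule, with an illustrative function name, is:

```python
# Observed counts: rows are Female, Male; columns are brands 1-3.
observed = [[14, 19, 15],
            [11, 46, 15]]

def expected_counts(table):
    """Expected count per cell under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for label, row in zip(["Female", "Male"], expected):
    print(label, row)
```

Because the expected table is built only from the margins, its row and column totals match the observed table.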

In cell N7, click Formulas, Insert Function from the toolbar, and search for Chi. Choose CHISQ.TEST and click OK. Fill in the Function
Arguments dialog box as below and click OK.

The p-value is given as 0.029. Since the p-value is less than α (.05 by default), reject the null hypothesis. There is enough evidence to show that there is a relationship between Gender and Brand purchased. Thus, men and women prefer different brands.
Checking Assumptions: Check that the expected values are all > 5. In this case, the expected values (10, 15, 26, 39, 12, 18) are all > 5. The results of the chi-squared test are valid.
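The test of independence and its assumption check can be reproduced by hand as a sanity check. The sketch below assumes the observed and expected tables shown earlier. As in the goodness of fit case, it uses the closed form exp(-x/2) for the p-value, which is valid here only because this 2x3 table has (2-1)(3-1) = 2 degrees of freedom.

```python
import math

observed = [[14, 19, 15], [11, 46, 15]]   # Female, Male by brand
expected = [[10, 26, 12], [15, 39, 18]]   # from the Expected table

# Sum of (O - E)^2 / E over all six cells.
stat = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
assert df == 2, "closed-form p-value below is only valid for df = 2"
p_value = math.exp(-stat / 2)

# Assumption check from the handout: every expected count above 5.
assumption_ok = all(e > 5 for row in expected for e in row)

print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}, valid = {assumption_ok}")
```

This gives a statistic of about 7.06 and a p-value of about 0.029, matching the Excel result.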
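Effect size (r family): We will use Cramer's V since we have more than two outcomes (we would use Phi if we had only two outcomes), calculated as follows:

$$V = \sqrt{\frac{\chi^2}{n \cdot df}}$$

For a contingency table, df in this formula is the smaller of (rows - 1) and (columns - 1), which is 1 for this 2x3 table. In cells N9:R13 create the following table: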

Effect size (r family) Cramer's V (formulas):

          Brand 1             Brand 2             Brand 3             Cramer's V
Female    =((I9-O3)^2)/O3     =((J9-P3)^2)/P3     =((K9-Q3)^2)/Q3     =SQRT((O13/120)/1)
Male      =((I10-O4)^2)/O4    =((J10-P4)^2)/P4    =((K10-Q4)^2)/Q4
Total     =SUM(O11:Q12)

Effect size (r family) Cramer's V (values):

          Brand 1    Brand 2    Brand 3    Cramer's V
Female    1.60       1.88       0.75       0.243
Male      1.07       1.26       0.50
Total     7.06

Cramer's V is 0.243, indicating that the relationship is weak. The gender of a person explains 5.9% (.243 x .243) of the variance in brand preference. Therefore, gender does contribute to one's brand preference, but there are still other factors to consider.
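The Cramer's V figure and the 5.9% interpretation can be checked with a few lines of Python. This is a sketch under the handout's setup, where the divisor is the smaller of (rows - 1) and (columns - 1); the function name is illustrative.

```python
import math

observed = [[14, 19, 15], [11, 46, 15]]   # Female, Male by brand

def cramers_v_table(table):
    """Cramer's V for a contingency table: sqrt(chi-square / (n * min(r-1, c-1)))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count
            chi_square += (o - e) ** 2 / e
    k = min(len(table) - 1, len(table[0]) - 1)      # 1 for a 2x3 table
    return math.sqrt(chi_square / (n * k))

v = cramers_v_table(observed)
print(f"V = {v:.3f}, V squared = {v ** 2:.1%}")
```

Squaring V gives the share quoted in the text, roughly 5.9%.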
Effect size (d family): The adjusted standardized residual is used for tables that are not 2x2. Create the following table in cells N15:Q18. The formula for each cell is:

$$\frac{(O - E)/\sqrt{E}}{\sqrt{\left(1 - \frac{n_{row}}{n_{total}}\right)\left(1 - \frac{n_{column}}{n_{total}}\right)}}$$

Effect size (d family) Adjusted Standardized Residuals (formulas):

Brand 1
Female    =((I9-O3)/SQRT(O3))/SQRT((1-($R$3/$R$5))*(1-(O5/$R$5)))
Male      =((I10-O4)/SQRT(O4))/SQRT((1-($R$4/$R$5))*(1-(O5/$R$5)))

Brand 2
Female    =((J9-P3)/SQRT(P3))/SQRT((1-($R$3/$R$5))*(1-(P5/$R$5)))
Male      =((J10-P4)/SQRT(P4))/SQRT((1-($R$4/$R$5))*(1-(P5/$R$5)))

Brand 3
Female    =((K9-Q3)/SQRT(Q3))/SQRT((1-($R$3/$R$5))*(1-(Q5/$R$5)))
Male      =((K10-Q4)/SQRT(Q4))/SQRT((1-($R$4/$R$5))*(1-(Q5/$R$5)))

Effect size (d family) Adjusted Standardized Residuals (values):

          Brand 1    Brand 2    Brand 3
Female    1.84       -2.62      1.29
Male      -1.84      2.62       -1.29
We look for adjusted standardized residuals that are above +2 or below -2. Only the Brand 2 cells (-2.62 for females, 2.62 for males) pass this threshold, so the relationship between Gender and Brand Purchased is driven by the difference in interest in Brand 2 between males and females.
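The adjusted standardized residuals can be recomputed directly from the observed counts. The following Python sketch follows the cell formula used in the spreadsheet, (O - E)/sqrt(E) divided by sqrt((1 - row share)(1 - column share)), and flags cells beyond the +/-2 rule of thumb. The function name and labels are illustrative.

```python
import math

observed = [[14, 19, 15], [11, 46, 15]]   # Female, Male by brand
genders = ["Female", "Male"]
brands = ["Brand 1", "Brand 2", "Brand 3"]

def adjusted_residuals(table):
    """Adjusted standardized residual for every cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            scale = math.sqrt((1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            out.append(((o - e) / math.sqrt(e)) / scale)
        result.append(out)
    return result

residuals = adjusted_residuals(observed)
# Collect the cells whose residual is above +2 or below -2.
flagged = [(genders[i], brands[j])
           for i, row in enumerate(residuals)
           for j, value in enumerate(row) if abs(value) > 2]

for gender, row in zip(genders, residuals):
    print(gender, [round(value, 2) for value in row])
print("Beyond +/-2:", flagged)
```

In a table with only two rows, the two residuals in each column are equal in size and opposite in sign, which is why the Male row mirrors the Female row.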
