
Introduction to Chi-Square

All of the inferential statistics we have covered in past lessons are called
parametric statistics. To use these statistics we make some assumptions about the
distributions the data come from, such as that they are normally distributed. With
parametric statistics we also deal with data for the dependent variable that are at the
interval or ratio level of measurement, e.g., test scores or physical measurements.

The parametric statistics we have discussed so far in this course are:

1. the Z-score test
2. the Z-test
3. the single-sample t-test
4. the independent t-test
5. the dependent t-test
6. one-sample analysis of variance (ANOVA)

We will now consider a widely used non-parametric test, chi-square, which we can use
with data at the nominal level, that is, data that are classificatory. For example,
suppose we know the frequency with which entering freshmen, when required to purchase a
computer for college use, select Macintosh computers, IBM computers, or some other brand
of computer. We want to know whether there is a difference among the frequencies with
which these three brands are selected or whether students choose essentially equally
among the three brands. This is a problem for which we can use the chi-square statistic.

The chi-square statistic is used to compare the observed frequency of some observation
(such as frequency of buying different brands of computers) with an expected frequency
(such as buying equal numbers of each brand of computer). The comparison of observed
and expected frequencies is used to calculate the value of the chi-square statistic, which
in turn can be compared with the distribution of chi-square to make an inference about a
statistical problem.

The symbol for chi-square and the formula are as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where

O is the observed frequency, and

E is the expected frequency.

The degrees of freedom for the one-dimensional chi-square statistic are:

df = C - 1

where C is the number of categories or levels of the independent variable.
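
As a minimal sketch of the formula and the degrees of freedom above, the following Python function computes chi-square from paired lists of observed and expected frequencies. The function names and the two-category example counts are illustrative assumptions, not part of the lesson.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(num_categories):
    """Return df = C - 1 for the one-variable chi-square test."""
    return num_categories - 1


# Hypothetical counts for two categories, chosen only to show the arithmetic.
observed = [10, 20]
expected = [15, 15]

statistic = chi_square(observed, expected)  # 25/15 + 25/15
df = degrees_of_freedom(len(observed))
print(round(statistic, 3), df)
```

Keeping the statistic and the degrees of freedom in separate functions mirrors the way the lesson treats them as two separate quantities.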

One-Variable Chi-Square (goodness-of-fit test) with equal expected frequencies

We can use the chi-square statistic to test whether the observations are distributed
equally across all levels of a variable. This is the first use of the one-variable
chi-square test, which is also referred to as the goodness-of-fit test.

Consider again the example of the frequency with which entering freshmen, when required
to purchase a computer for college use, select Macintosh computers, IBM computers, or
some other brand of computer. We want to know whether there is a significant difference
among the frequencies with which these three brands are selected or whether the students
select equally among the three brands.

The data for 100 students are recorded in the table below (the observed frequencies). We
have also indicated the expected frequency for each category. Since there are 100
observations and three categories (Macintosh, IBM, and Other), the expected frequency
for each category is 100/3 or 33.333. In the last column of the table we have calculated
the squared difference between the observed and expected frequencies, divided by the
expected frequency. The sum of the last column is the value of the chi-square statistic.

Frequency with which students select computer brand

Computer             Observed     Expected     (O-E)^2/E
                     Frequency    Frequency
IBM                  47           33.333       5.604
Macintosh            36           33.333       0.213
Other                17           33.333       8.003
Total (chi-square)                             13.820
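
The table can be checked with a few lines of Python. This is only a sketch, and the variable names are ours. It assumes the expected frequency is the unrounded value 100/3, so the IBM term comes out as 5.603 rather than the 5.604 obtained with the rounded 33.333; the total still agrees with the table at 13.820.

```python
observed = {"IBM": 47, "Macintosh": 36, "Other": 17}

total = sum(observed.values())      # 100 students
expected = total / len(observed)    # equal expected frequency, 33.333...

# One (O - E)^2 / E term per brand, as in the last column of the table.
terms = {brand: (o - expected) ** 2 / expected for brand, o in observed.items()}
chi_square = sum(terms.values())

for brand, term in terms.items():
    print(f"{brand:<10} {term:.3f}")
print(f"chi-square {chi_square:.3f}")
```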

From the table we can see that chi-square = 13.820 and df = C - 1 = 3 - 1 = 2.

We can compare the obtained value of chi-square with the critical value for the .05 level
and 2 degrees of freedom obtained from Appendix Table F (Distribution of Chi
Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2
we see that the critical value for chi-square is 5.991.
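
When a printed table is not at hand, the critical value for this particular case can be computed directly. The sketch below relies on the fact that, for df = 2 only, the chi-square distribution has the closed-form upper-tail probability exp(-x/2), so the critical value is -2 ln(alpha). The function names are ours, and the shortcut does not apply to other degrees of freedom.

```python
import math


def critical_value_df2(alpha):
    """Critical chi-square value for df = 2 (closed form valid only for df = 2)."""
    return -2.0 * math.log(alpha)


def p_value_df2(chi_square):
    """Upper-tail probability of a chi-square value for df = 2."""
    return math.exp(-chi_square / 2.0)


critical = critical_value_df2(0.05)
print(round(critical, 3))  # 5.991, matching the tabled value
```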

We now have the information we need to complete the six-step process for testing
statistical hypotheses for our research problem.

1. State the null hypothesis and the alternative hypothesis based on your
research question.

Note: Our null hypothesis, for the chi-square test, states that there are no
differences between the observed and the expected frequencies. The alternative
hypothesis states that there are significant differences between the observed and
expected frequencies.

2. Set the alpha level.

alpha = .05

Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of
making a Type I error.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.

χ² = 13.820

df = C - 1 = 2

4. Write the decision rule for rejecting the null hypothesis.

Reject H0 if χ² >= 5.991.

Note: To write the decision rule we had to know the critical value for chi-square,
with an alpha level of .05 and 2 degrees of freedom. We can find it by looking at
Appendix Table F and noting the tabled value for the column for the .05 level and
the row for 2 df.

5. Write a summary statement based on the decision.

Reject H0, p < .05

Note: Since our calculated value of χ² (13.820) is greater than 5.991, we reject
the null hypothesis and accept the alternative hypothesis.

6. Write a statement of results in standard English.

There is a significant difference among the frequencies with which students
purchased three different brands of computers.
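
Steps 3 through 5 can be sketched in Python as follows. The function name and its return format are illustrative assumptions; the critical value 5.991 is the tabled value for alpha = .05 and df = 2 quoted in the lesson.

```python
def chi_square_decision(observed, expected, critical_value):
    """Compute chi-square and apply 'reject H0 if chi-square >= critical value'."""
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    decision = "Reject H0" if chi_square >= critical_value else "Fail to reject H0"
    return chi_square, decision


observed = [47, 36, 17]       # IBM, Macintosh, Other
expected = [100 / 3] * 3      # equal expected frequencies
chi_square, decision = chi_square_decision(observed, expected, critical_value=5.991)
print(f"chi-square = {chi_square:.3f}, df = {len(observed) - 1}: {decision}")
```

The critical value is passed in as an argument so that the same function can be reused with any alpha level or degrees of freedom looked up in the table.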

One-Variable Chi-Square (goodness-of-fit test) with predetermined expected frequencies

Let's look at the problem we just solved in a way that illustrates the other use of
one-variable chi-square, that is, with predetermined expected frequencies rather than
with equal frequencies. We could formulate our revised problem as follows:

In a national study, students required to buy computers for college use bought IBM
computers 50% of the time, Macintosh computers 25% of the time, and other computers
25% of the time. Of 100 entering freshmen we surveyed, 36 bought Macintosh
computers, 47 bought IBM computers, and 17 bought some other brand of computer. We
want to know whether these frequencies of computer buying are similar to or different
from the national study data.

The data for 100 students are recorded in the table below (the observed frequencies). In
this case the expected frequencies are those from the national study. To get each
expected frequency, we multiply the percentage from the national study by the total
number of subjects in the current study.

• Expected frequency for IBM = 100 × 50% = 50
• Expected frequency for Macintosh = 100 × 25% = 25
• Expected frequency for Other = 100 × 25% = 25

The expected frequencies are recorded in the Expected Frequency column of the table. As
before, we have calculated the squared difference between the observed and expected
frequencies, divided by the expected frequency, and recorded the result in the last
column of the table. The sum of the last column is the value of the chi-square statistic.

Frequency with which students select computer brand

Computer             Observed     Expected     (O-E)^2/E
                     Frequency    Frequency
IBM                  47           50           0.18
Macintosh            36           25           4.84
Other                17           25           2.56
Total (chi-square)                             7.58
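
The expected frequencies and the statistic for this version of the problem can be reproduced with the sketch below. The dictionary names are ours; the proportions are the national-study percentages given in the problem statement.

```python
observed = {"IBM": 47, "Macintosh": 36, "Other": 17}
national_share = {"IBM": 0.50, "Macintosh": 0.25, "Other": 0.25}

n = sum(observed.values())  # 100 students in the current study

# Expected frequency = national proportion times the sample size.
expected = {brand: share * n for brand, share in national_share.items()}

chi_square = sum(
    (observed[brand] - expected[brand]) ** 2 / expected[brand] for brand in observed
)
print(expected)
print(round(chi_square, 2))
```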

From the table we can see that chi-square = 7.58 and df = C - 1 = 3 - 1 = 2.

We can compare the obtained value of chi-square with the critical value for the .05 level
and 2 degrees of freedom obtained from Appendix Table F (Distribution of Chi
Square) on page 331 of the text. Looking under the column for .05 and the row for df = 2
we see that the critical value for chi-square is 5.991.

We now have the information we need to complete the six-step process for testing
statistical hypotheses for our research problem.

1. State the null hypothesis and the alternative hypothesis based on your
research question.

Note: Our null hypothesis, for the chi-square test, states that there are no
differences between the observed and the expected frequencies. The alternative
hypothesis states that there are significant differences between the observed and
expected frequencies.

2. Set the alpha level.

alpha = .05

Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of
making a Type I error.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.

χ² = 7.58

df = C - 1 = 2

4. Write the decision rule for rejecting the null hypothesis.

Reject H0 if χ² >= 5.991.

Note: To write the decision rule we had to know the critical value for chi-square,
with an alpha level of .05 and 2 degrees of freedom. We can find it by looking at
Appendix Table F and noting the tabled value for the column for the .05 level and
the row for 2 df.

5. Write a summary statement based on the decision.

Reject H0, p < .05

Note: Since our calculated value of χ² (7.58) is greater than 5.991, we reject the
null hypothesis and accept the alternative hypothesis.

6. Write a statement of results in standard English.

There is a significant difference between the frequencies with which students
purchased three different brands of computers and the proportions suggested by a
national study.

Two-Variable Chi-Square (test of independence)

Now let us consider the case of the two-variable chi-square test, also known as the test of
independence.

For example, we may wish to know whether there is a significant difference in the
frequencies with which males, as contrasted with females, come from small, medium, or
large cities. The two variables we are considering here are hometown size (small,
medium, or large) and sex (male or female). Another way of putting our research
question is: Is gender independent of size of hometown?

The data for 30 females and 6 males is in the following table.

Frequency with which males and females come from small, medium, and large cities

          Small    Medium    Large    Totals
Female    10       14        6        30
Male      4        1         1        6
Totals    14       15        7        36

The formula for chi-square is the same as before:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where

O is the observed frequency, and

E is the expected frequency.

The degrees of freedom for the two-dimensional chi-square statistic are:

df = (C - 1)(R - 1)

where C is the number of columns or levels of the first variable and R is the number of
rows or levels of the second variable.

In the table above we have the observed frequencies (six of them). Now we must
calculate the expected frequency for each of the six cells. For two-variable chi-square we
find the expected frequencies with the formula:

Expected Frequency for a Cell = (Column Total × Row Total) / Grand Total


In the table above we can see that the Column Totals are 14 (small), 15 (medium), and 7
(large), while the Row Totals are 30 (female) and 6 (male). The grand total is 36.

Using the formula we can thus find the expected frequency for each cell.

1. The expected frequency for the small female cell is 14 × 30 / 36 = 11.667
2. The expected frequency for the medium female cell is 15 × 30 / 36 = 12.500
3. The expected frequency for the large female cell is 7 × 30 / 36 = 5.833
4. The expected frequency for the small male cell is 14 × 6 / 36 = 2.333
5. The expected frequency for the medium male cell is 15 × 6 / 36 = 2.500
6. The expected frequency for the large male cell is 7 × 6 / 36 = 1.167
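
The six calculations above follow one pattern, which the sketch below applies to every cell at once. The function name is an illustrative assumption; the observed counts are those from the table of males and females by hometown size.

```python
def expected_frequencies(table):
    """Return expected cell counts: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [row_total * column_total / grand_total for column_total in column_totals]
        for row_total in row_totals
    ]


# Rows: female, male. Columns: small, medium, large.
observed = [[10, 14, 6],
            [4, 1, 1]]

expected = expected_frequencies(observed)
for row in expected:
    print([round(value, 3) for value in row])
```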

We can put these expected frequencies in our table and also include the values for
(O - E)^2/E. The sum of all these will of course be the value of chi-square.

Observed frequencies, expected frequencies, and (O - E)^2/E for males and females from
small, medium, and large cities

         Small                           Medium                          Large                           Totals
         Observed  Expected  (O-E)^2/E   Observed  Expected  (O-E)^2/E   Observed  Expected  (O-E)^2/E
Female   10        11.667    0.238       14        12.500    0.180       6         5.833     0.005       30
Male     4         2.333     1.191       1         2.500     0.900       1         1.167     0.024       6
Totals   14                              15                              7                               36

From the table we can see that chi-square = 2.538 and
df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = (2)(1) = 2.
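
The whole two-variable calculation can be sketched as a single Python function. The function name is ours. Note that computing from unrounded expected frequencies gives about 2.537; the 2.538 reported in the lesson comes from summing cell values that were already rounded to three decimals. The difference does not affect the decision.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for a two-variable table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            # Expected count for this cell from its row and column totals.
            e = row_totals[i] * column_totals[j] / grand_total
            chi_square += (o - e) ** 2 / e

    df = (len(column_totals) - 1) * (len(row_totals) - 1)
    return chi_square, df


observed = [[10, 14, 6],   # female: small, medium, large
            [4, 1, 1]]     # male:   small, medium, large

chi_square, df = chi_square_independence(observed)
print(round(chi_square, 3), df)
```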

We now have the information we need to complete the six-step process for testing
statistical hypotheses for our research problem.

1. State the null hypothesis and the alternative hypothesis based on your
research question.

Note: For the test of independence, our null hypothesis states that hometown size
and gender are independent. The alternative hypothesis states that they are not
independent.

2. Set the alpha level.

alpha = .05

3. Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.

χ² = 2.538

df = (C - 1)(R - 1) = (2)(1) = 2

4. Write the decision rule for rejecting the null hypothesis.

Reject H0 if χ² >= 5.991.

Note: To write the decision rule we had to know the critical value for chi-square,
with an alpha level of .05 and 2 degrees of freedom. We can find it by looking at
Appendix Table F and noting the tabled value for the column for the .05 level and
the row for 2 df.

5. Write a summary statement based on the decision.

Fail to reject H0

Note: Since our calculated value of χ² (2.538) is not greater than 5.991, we fail
to reject the null hypothesis and are unable to accept the alternative hypothesis.

6. Write a statement of results in standard English.

There is not a significant difference in the frequencies with which males come
from small, medium, or large towns as compared with females.
Hometown size appears to be independent of gender.

Chi-square is a useful non-parametric statistic for evaluating statistical hypotheses
involving the frequencies with which observations fall in various categories (nominal
data).
