Introduction To Chi
All of the inferential statistics we have covered in past lessons are what are called
parametric statistics. To use these statistics we make some assumptions about the
distributions the data come from, such as that they are normally distributed. With
parametric statistics we also deal with data for the dependent variable that is at the
interval or ratio level of measurement, e.g. test scores or physical measurements.
We will now consider a widely used non-parametric test, chi-square, which we can use
with data at the nominal level, that is, data that is classificatory. For example, we know
the frequency with which entering freshmen, when required to purchase a computer for
college use, select Macintosh computers, IBM computers, or some other brand of
computer. We want to know whether there is a difference among the frequencies with
which these three brands are selected or whether students choose essentially equally
among the three brands. This is a problem for which we can use the chi-square statistic.
The chi-square statistic is used to compare the observed frequency of some observation
(such as frequency of buying different brands of computers) with an expected frequency
(such as buying equal numbers of each brand of computer). The comparison of observed
and expected frequencies is used to calculate the value of the chi-square statistic, which
in turn can be compared with the distribution of chi-square to make an inference about a
statistical problem.
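The chi-square statistic is calculated as

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency and E is the expected frequency for each category.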
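The calculation just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson; the function name `chi_square` and the two-category counts used to exercise it are hypothetical.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for two categories, used only to exercise the function.
print(chi_square([60, 40], [50, 50]))  # (100 / 50) + (100 / 50) = 4.0
```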
Let us return to the example we already mentioned: the frequency with which entering
freshmen, when required to purchase a computer for college use, select Macintosh
computers, IBM computers, or some other brand of computer. We want to know if there is
a significant difference among the frequencies with which these three brands of
computers are selected or if the students select equally among the three brands.
The data for 100 students is recorded in the table below (the observed frequencies). We
have also indicated the expected frequency for each category. Since there are 100
observations and three categories (Macintosh, IBM, and Other), the expected frequency
for each category is 100/3 or 33.333. In the third column of the table we have calculated,
for each category, the square of the difference between the observed and expected
frequencies, divided by the expected frequency. The sum of the third column is the value
of the chi-square statistic.
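As a check on this arithmetic, here is a minimal Python sketch. It assumes the observed counts are the 36 Macintosh, 47 IBM, and 17 Other purchases reported later in this lesson for the same 100 students; the variable names are illustrative only.

```python
# Observed counts: assumed to be the 36 / 47 / 17 split reported later in the lesson.
observed = {"Macintosh": 36, "IBM": 47, "Other": 17}

n = sum(observed.values())      # 100 students
expected = n / len(observed)    # equal expected frequency: 100 / 3 = 33.333...

# One (O - E)^2 / E term per brand, i.e. the third column of the table.
terms = {brand: (o - expected) ** 2 / expected for brand, o in observed.items()}
chi_sq = sum(terms.values())

for brand, term in terms.items():
    print(f"{brand:<10} O = {observed[brand]:>3}  E = {expected:.3f}  (O-E)^2/E = {term:.3f}")
print(f"chi-square = {chi_sq:.3f}")  # chi-square = 13.820
```

Under that assumption the total agrees with the value of 13.820 used in the hypothesis test for this example.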
The degrees of freedom are df = C - 1 = 3 - 1 = 2, where C is the number of categories.
We can compare the obtained value of chi-square with the critical value for the .05 level
and 2 degrees of freedom, obtained from Appendix Table F (Distribution of Chi Square)
on page 331 of the text. Looking under the column for .05 and the row for df = 2, we see
that the critical value for chi-square is 5.991.
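For readers without the table at hand, the df = 2 critical value can be reproduced directly. The sketch below relies on a property that holds only for 2 degrees of freedom: the chi-square distribution with df = 2 is an exponential distribution with mean 2, so the upper-tail probability at x is exp(-x/2). The function name is illustrative, and the shortcut does not apply to other df.

```python
import math


def chi2_critical_df2(alpha):
    """Critical value for df = 2 only: solve exp(-x / 2) = alpha for x."""
    return -2.0 * math.log(alpha)


print(round(chi2_critical_df2(0.05), 3))  # 5.991
```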
We now have the information we need to complete the six step process for testing
statistical hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your
research question.
Note: Our null hypothesis, for the chi-square test, states that there are no
differences between the observed and the expected frequencies. The alternative
hypothesis states that there are significant differences between the observed and
expected frequencies.
2. Set the alpha level.
Note: As usual we will set our alpha level at .05; that is, we accept 5 chances in
100 of making a Type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.
df = C - 1 = 2
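4. Write the decision rule for rejecting the null hypothesis.
Reject the null hypothesis if the calculated chi-square is greater than 5.991.
Note: To write the decision rule we had to know the critical value for chi-square
with an alpha level of .05 and 2 degrees of freedom. We can find it by looking at
Appendix Table F and noting the tabled value in the column for the .05 level and
the row for 2 df.
5. Write a summary statement based on the decision.
Note: Since our calculated value of chi-square (13.820) is greater than 5.991, we
reject the null hypothesis and accept the alternative hypothesis.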
6. Write a statement of results in standard English.
There is a significant difference among the frequencies with which students
purchased three different brands of computers.
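The comparison at the heart of the six steps above can be written as a small helper. This is a sketch only; the function name `decide` and the wording of its return values are illustrative choices, not part of the lesson.

```python
def decide(chi_sq, critical):
    """Reject H0 when the calculated chi-square exceeds the critical value."""
    return "reject H0" if chi_sq > critical else "fail to reject H0"


# Calculated value and tabled critical value (alpha = .05, df = 2) from the text.
print(decide(13.820, 5.991))  # reject H0
```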
One-Variable Chi-Square (goodness-of-fit test) with
predetermined expected frequencies
Let's look at the problem we just solved in a way that illustrates the other use of
one-variable chi-square, that is, with predetermined expected frequencies rather than
with equal frequencies. We could formulate our revised problem as follows:
In a national study, students required to buy computers for college use bought IBM
computers 50% of the time, Macintosh computers 25% of the time, and other computers
25% of the time. Of the 100 entering freshmen we surveyed, 36 bought Macintosh
computers, 47 bought IBM computers, and 17 bought some other brand of computer. We
want to know if these frequencies of computer buying behavior are similar to or
different from the national study data.
The data for 100 students is recorded in the table below (the observed frequencies). In
this case the expected frequencies are those from the national study. To get the expected
frequency for each brand, we multiply its percentage in the national study by the total
number of subjects in the current study.
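The conversion from national percentages to expected counts can be sketched as follows. The proportions and sample size are those given in the text; the variable names are illustrative.

```python
# Proportions reported for the national study in the text.
national_proportions = {"IBM": 0.50, "Macintosh": 0.25, "Other": 0.25}
n_students = 100  # size of the current sample

# Expected frequency = national proportion x number of subjects in this study.
expected = {brand: p * n_students for brand, p in national_proportions.items()}
print(expected)  # {'IBM': 50.0, 'Macintosh': 25.0, 'Other': 25.0}
```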
The expected frequencies are recorded in the second column of the table. As before, we
have calculated the square of the difference between the observed and expected
frequencies, divided by the expected frequency, and recorded this result in the third
column of the table. The sum of the third column is the value of the chi-square statistic.
Frequency with which students select computer brand

Computer              Observed     Expected     (O - E)^2 / E
                      Frequency    Frequency
IBM                   47           50           0.18
Macintosh             36           25           4.84
Other                 17           25           2.56
Total (chi-square)                              7.58
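The rows of this table can be recomputed with a short Python sketch. The observed and expected counts are taken from the table; the variable names are illustrative.

```python
# (brand, observed, expected) rows taken from the table above.
rows = [("IBM", 47, 50), ("Macintosh", 36, 25), ("Other", 17, 25)]

total = 0.0
for brand, o, e in rows:
    term = (o - e) ** 2 / e   # third column: (O - E)^2 / E
    total += term
    print(f"{brand:<10} {o:>3} {e:>3}  {term:.2f}")

print(f"chi-square = {total:.2f}")  # chi-square = 7.58
```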
The degrees of freedom are df = C - 1 = 3 - 1 = 2, where C is the number of categories.
We can compare the obtained value of chi-square with the critical value for the .05 level
and 2 degrees of freedom, obtained from Appendix Table F (Distribution of Chi Square)
on page 331 of the text. Looking under the column for .05 and the row for df = 2, we see
that the critical value for chi-square is 5.991.
We now have the information we need to complete the six step process for testing
statistical hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your
research question.
Note: Our null hypothesis, for the chi-square test, states that there are no
differences between the observed and the expected frequencies. The alternative
hypothesis states that there are significant differences between the observed and
expected frequencies.
2. Set the alpha level.
Note: As usual we will set our alpha level at .05; that is, we accept 5 chances in
100 of making a Type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.
df = C - 1 = 2
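4. Write the decision rule for rejecting the null hypothesis.
Reject the null hypothesis if the calculated chi-square is greater than 5.991.
Note: To write the decision rule we had to know the critical value for chi-square
with an alpha level of .05 and 2 degrees of freedom. We can find it by looking at
Appendix Table F and noting the tabled value in the column for the .05 level and
the row for 2 df.
5. Write a summary statement based on the decision.
Note: Since our calculated value of chi-square (7.58) is greater than 5.991, we
reject the null hypothesis and accept the alternative hypothesis.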
6. Write a statement of results in standard English.
There is a significant difference between the frequencies with which students
purchased three different brands of computers and the frequencies expected from
the proportions in a national study.
Two-Variable Chi-Square (test of independence)
Now let us consider the case of the two-variable chi-square test, also known as the test of
independence.
For example, we may wish to know if there is a significant difference in the frequencies
with which males come from small, medium, or large cities as contrasted with females.
The two variables we are considering here are hometown size (small, medium, or large)
and sex (male or female). Another way of putting our research question is: Is gender
independent of size of hometown?
Frequency with which males and females come from small, medium, and large cities

            Small    Medium    Large    Totals
Female      10       14        6        30
Male        4        1         1        6
Totals      14       15        7        36
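The chi-square statistic is again the sum, over all cells, of the squared difference
between the observed and expected frequencies divided by the expected frequency:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

with degrees of freedom

df = (C - 1)(R - 1)

where C is the number of columns or levels of the first variable and R is the number of
rows or levels of the second variable.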
In the table above we have the observed frequencies (six of them). Now we must
calculate the expected frequency for each of the six cells. For two-variable chi-square we
find the expected frequencies with the formula:

$$E = \frac{(\text{column total})(\text{row total})}{N}$$

where N is the total number of observations (36 here).
Using the formula we can thus find the expected frequency for each cell.
1. The expected frequency for the small female cell is 14 × 30 / 36 = 11.667
2. The expected frequency for the medium female cell is 15 × 30 / 36 = 12.500
3. The expected frequency for the large female cell is 7 × 30 / 36 = 5.833
4. The expected frequency for the small male cell is 14 × 6 / 36 = 2.333
5. The expected frequency for the medium male cell is 15 × 6 / 36 = 2.500
6. The expected frequency for the large male cell is 7 × 6 / 36 = 1.167
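The six expected frequencies listed above can be reproduced from the row and column totals. The sketch below is illustrative; the dictionary layout and variable names are assumptions made for the example, while the counts come from the observed-frequency table.

```python
# Observed counts from the table: sex by hometown size.
observed = {
    "Female": {"Small": 10, "Medium": 14, "Large": 6},
    "Male": {"Small": 4, "Medium": 1, "Large": 1},
}

# Marginal totals.
row_totals = {sex: sum(cells.values()) for sex, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for size, count in cells.items():
        col_totals[size] = col_totals.get(size, 0) + count
n = sum(row_totals.values())  # 36 observations in all

# Expected frequency for each cell: column total x row total / grand total.
expected = {
    sex: {size: col_totals[size] * row_totals[sex] / n for size in col_totals}
    for sex in observed
}

for sex, cells in expected.items():
    for size, e in cells.items():
        print(f"{size:<6} {sex:<6} E = {e:.3f}")
```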
We can put these expected frequencies in our table and also include the values for
(O - E)^2 / E. The sum of all these will of course be the value of chi-square.
Observed frequencies, expected frequencies, and (O - E)^2 / E for males and females from small,
medium, and large cities
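The cell-by-cell sum can be sketched in Python as below. With unrounded expected frequencies the total comes to about 2.537; the value of 2.538 used in this lesson appears to arise when the expected frequencies are first rounded to three decimals, so the sketch computes both. The variable names are illustrative, and the rounding explanation is an inference rather than something the lesson states.

```python
# Observed counts; rows are Female, Male and columns are Small, Medium, Large.
observed = [
    [10, 14, 6],
    [4, 1, 1],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0            # uses unrounded expected frequencies
chi_sq_rounded_e = 0.0  # uses expected frequencies rounded to 3 decimals
for r, row in enumerate(observed):
    for c, o in enumerate(row):
        e = row_totals[r] * col_totals[c] / n
        chi_sq += (o - e) ** 2 / e
        e3 = round(e, 3)
        chi_sq_rounded_e += (o - e3) ** 2 / e3

df = (len(col_totals) - 1) * (len(row_totals) - 1)

print(f"chi-square (unrounded E)  = {chi_sq:.3f}")            # 2.537
print(f"chi-square (E to 3 places) = {chi_sq_rounded_e:.3f}")  # 2.538
print(f"df = {df}")                                            # df = 2
```

Either way the statistic is well below the critical value of 5.991, so the rounding does not affect the decision.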
We now have the information we need to complete the six step process for testing
statistical hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your
research question.
df = (C - 1)(R - 1) = (2)(1) = 2
4. Write the decision rule for rejecting the null hypothesis.
Reject the null hypothesis if the calculated chi-square is greater than 5.991.
Note: To write the decision rule we had to know the critical value for chi-square
with an alpha level of .05 and 2 degrees of freedom. We can find it by looking at
Appendix Table F and noting the tabled value in the column for the .05 level and
the row for 2 df.
5. Write a summary statement based on the decision.
Note: Since our calculated value of chi-square (2.538) is not greater than 5.991,
we fail to reject the null hypothesis and are unable to accept the alternative
hypothesis.
6. Write a statement of results in standard English.
There is not a significant difference in the frequencies with which males come
from small, medium, or large towns as compared with females.
We have no evidence that hometown size is related to gender; that is, hometown
size appears to be independent of gender.