
Introduction to Chi-Square

All of the inferential statistics we have covered in past lessons are what are called parametric statistics. To use these statistics we make some assumptions about the distributions the data come from, such as that they are normally distributed. With parametric statistics we also deal with data for the dependent variable that is at the interval or ratio level of measurement, e.g., test scores or physical measurements. The parametric statistics we have discussed so far in this course are:

1. the Z-score test
2. the Z-test
3. the single-sample t-test
4. the independent t-test
5. the dependent t-test
6. one-way analysis of variance (ANOVA)

We will now consider a widely used non-parametric test, chi-square, which we can use with data at the nominal level, that is, data that is classificatory. For example, we know the frequency with which entering freshmen, when required to purchase a computer for college use, select Macintosh computers, IBM computers, or some other brand of computer. We want to know if there is a difference among the frequencies with which these three brands of computers are selected, or if the students choose roughly equally among the three brands. This is a problem we can use the chi-square statistic for. The chi-square statistic is used to compare the observed frequency of some observation (such as the frequency of buying different brands of computers) with an expected frequency (such as buying equal numbers of each brand of computer). The comparison of observed and expected frequencies is used to calculate the value of the chi-square statistic, which in turn can be compared with the distribution of chi-square to make an inference about a statistical problem. The symbol for chi-square and the formula are as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency, and E is the expected frequency. The degrees of freedom for the one-dimensional chi-square statistic is:

$$df = C - 1$$

where C is the number of categories or levels of the independent variable.

One-Variable Chi-Square (Goodness-of-Fit Test) with Equal Expected Frequencies

We can use the chi-square statistic to test the distribution of measures over levels of a variable to indicate whether the distribution of measures is the same for all levels. This is the first use of the one-variable chi-square test, which is also referred to as the goodness-of-fit test. Take the example we already mentioned: entering freshmen, when required to purchase a computer for college use, select Macintosh computers, IBM computers, or some other brand of computer. We want to know if there is a significant difference among the frequencies with which these three brands of computers are selected, or if the students select equally among the three brands.

The data for 100 students is recorded in the table below (the observed frequencies). We have also indicated the expected frequency for each category. Since there are 100 measures or observations and there are three categories (Macintosh, IBM, and Other), we would indicate the expected frequency for each category to be 100/3 or 33.333. In the (O - E)²/E column of the table we have calculated the square of the observed frequency minus the expected frequency, divided by the expected frequency. The sum of that column is the value of the chi-square statistic.

Frequency with which students select computer brand

Computer     Observed Frequency   Expected Frequency   (O - E)²/E
IBM          47                   33.333               5.604
Macintosh    36                   33.333               0.213
Other        17                   33.333               8.003
Total        100                  100                  13.820 (chi-square)

From the table we can see that:

χ² = 13.82

The df = C - 1 = 3 - 1 = 2
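As a check on the table above, here is a minimal Python sketch of the goodness-of-fit calculation with equal expected frequencies. The function name `chi_square_statistic` is our own illustration and does not come from the text. The function sums (O - E)²/E over the categories. Each expected frequency is n / C.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed purchases from the table: IBM, Macintosh, Other
observed = [47, 36, 17]
n = sum(observed)   # 100 students
k = len(observed)   # C = 3 categories

# Equal expected frequencies: every category gets n / C (33.333...)
expected = [n / k] * k

chi_sq = chi_square_statistic(observed, expected)
df = k - 1

print(f"chi-square = {chi_sq:.2f}, df = {df}")  # chi-square = 13.82, df = 2
```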

We can compare the obtained value of chi-square with the critical value for the .05 level and 2 degrees of freedom, obtained from Appendix Table F (Distribution of Chi Square) on page 331 of the text. We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.

1. State the null hypothesis and the alternative hypothesis based on your research question.
   H0: O = E
   H1: O ≠ E
   Note: Our null hypothesis, for the chi-square test, states that there are no differences between the observed and the expected frequencies. The alternate hypothesis states that there are significant differences between the observed and expected frequencies.
2. Set the alpha level.
   α = .05
   Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of making a Type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
   χ² = 13.82
   df = C - 1 = 3 - 1 = 2
4. Write the decision rule for rejecting the null hypothesis.
   Reject H0 if χ² > 5.991.
   Note: To write the decision rule we had to know the critical value for chi-square with an alpha level of .05 and 2 degrees of freedom. We can find it in Appendix Table F by reading the tabled value in the column for the .05 level and the row for df = 2. That value, the critical value for chi-square, is 5.991.
5. Write a summary statement based on the decision.
   Reject H0, p < .05
   Note: Since our calculated value of χ² (13.82) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
6. Write a statement of results in standard English.
   There is a significant difference among the frequencies with which students purchased three different brands of computers.
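Steps 4 and 5 can be sketched in code for the special case used here. When df = 2, the chi-square distribution is an exponential distribution with mean 2. Its upper-tail area is therefore exp(-x/2), and the critical value for a given alpha is -2 ln(alpha). This closed form holds only for df = 2. For other df values you would read Appendix Table F or use a statistics package. The helper names below are our own, chosen for illustration.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability P(chi-square > x), valid ONLY for df = 2.

    With 2 degrees of freedom the chi-square distribution is exponential
    with mean 2, so the tail area is exp(-x / 2).
    """
    return math.exp(-x / 2)


def chi2_critical_df2(alpha):
    """Critical value c with P(chi-square > c) = alpha, valid ONLY for df = 2."""
    return -2 * math.log(alpha)


alpha = 0.05
critical = chi2_critical_df2(alpha)  # about 5.991, as in Appendix Table F
chi_sq = 13.82                       # value calculated in step 3
p_value = chi2_sf_df2(chi_sq)

# Decision rule from step 4: reject H0 if chi-square exceeds the critical value
decision = "Reject H0" if chi_sq > critical else "Fail to reject H0"
print(f"critical = {critical:.3f}, p = {p_value:.4f}, {decision}")
```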

One-Variable Chi-Square (Goodness-of-Fit Test) with Predetermined Expected Frequencies

We can revisit the problem we just solved to illustrate the other use of one-variable chi-square: testing against predetermined expected frequencies rather than equal frequencies. We could formulate our revised problem as follows. In a national study, students required to buy computers for college use bought IBM computers 50% of the time, Macintosh computers 25% of the time, and other computers 25% of the time. Of 100 entering freshmen we surveyed, 36 bought Macintosh computers, 47 bought IBM computers, and 17 bought some other brand of computer. We want to know if these frequencies of computer buying behavior are similar to or different from the national study data.

The data for 100 students is recorded in the table below (the observed frequencies). In this case the expected frequencies are those from the national study. To get each expected frequency we multiply the percentage from the national study by the total number of subjects in the current study:

• Expected frequency for IBM = 100 × 50% = 50
• Expected frequency for Macintosh = 100 × 25% = 25
• Expected frequency for Other = 100 × 25% = 25

The expected frequencies are recorded in the Expected Frequency column of the table. As before, we have calculated the square of the observed frequency minus the expected frequency, divided by the expected frequency, and recorded this result in the (O - E)²/E column. The sum of that column is the value of the chi-square statistic.

Frequency with which students select computer brand

Computer     Observed Frequency   Expected Frequency   (O - E)²/E
IBM          47                   50                   0.180
Macintosh    36                   25                   4.840
Other        17                   25                   2.560
Total        100                  100                  7.580 (chi-square)

From the table we can see that:

χ² = 7.58

The df = C - 1 = 3 - 1 = 2

We can compare the obtained value of chi-square with the critical value for the .05 level and 2 degrees of freedom, obtained from Appendix Table F (Distribution of Chi Square) on page 331 of the text. We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.

1. State the null hypothesis and the alternative hypothesis based on your research question.
   H0: O = E
   H1: O ≠ E
   Note: Our null hypothesis, for the chi-square test, states that there are no differences between the observed and the expected frequencies. The alternate hypothesis states that there are significant differences between the observed and expected frequencies.
2. Set the alpha level.
   α = .05
   Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of making a Type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
   χ² = 7.58
   df = C - 1 = 3 - 1 = 2
4. Write the decision rule for rejecting the null hypothesis.
   Reject H0 if χ² > 5.991.
   Note: To write the decision rule we had to know the critical value for chi-square with an alpha level of .05 and 2 degrees of freedom. We can find it in Appendix Table F by reading the tabled value in the column for the .05 level and the row for df = 2. That value, the critical value for chi-square, is 5.991.
5. Write a summary statement based on the decision.
   Reject H0, p < .05
   Note: Since our calculated value of χ² (7.58) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
6. Write a statement of results in standard English.
   There is a significant difference between the frequencies with which students purchased three different brands of computers and the proportions suggested by a national study.
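The expected frequencies and the χ² value used in step 3 above can be reproduced with a short Python sketch. The only change from the equal-frequency case is how E is built: each national-study proportion is multiplied by the sample size. This sketch assumes the proportions are given as decimals that sum to 1. The dictionary layout and the function name are our own, for illustration only.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = {"IBM": 47, "Macintosh": 36, "Other": 17}
national_proportions = {"IBM": 0.50, "Macintosh": 0.25, "Other": 0.25}

if abs(sum(national_proportions.values()) - 1.0) > 1e-9:
    raise ValueError("national proportions must sum to 1")

n = sum(observed.values())  # 100 students
# Expected frequency = national proportion x total number of subjects
expected = {brand: n * p for brand, p in national_proportions.items()}

brands = list(observed)
chi_sq = chi_square_statistic(
    [observed[b] for b in brands], [expected[b] for b in brands]
)
df = len(brands) - 1

print(expected)  # {'IBM': 50.0, 'Macintosh': 25.0, 'Other': 25.0}
print(f"chi-square = {chi_sq:.2f}, df = {df}")  # chi-square = 7.58, df = 2
```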

Two-Variable Chi-Square (Test of Independence)

Now let us consider the case of the two-variable chi-square test, also known as the test of independence. For example, we may wish to know if there is a significant difference in the frequencies with which males come from small, medium, or large cities as contrasted with females. The two variables we are considering here are hometown size (small, medium, or large) and sex (male or female). Another way of putting our research question is: Is gender independent of size of hometown? The data for 30 females and 6 males is in the following table.

Frequency with which males and females come from small, medium, and large cities

          Small   Medium   Large   Totals
Female    10      14       6       30
Male      4       1        1       6
Totals    14      15       7       36

The formula for chi-square is the same as before:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency, and E is the expected frequency. The degrees of freedom for the two-dimensional chi-square statistic is:

$$df = (C - 1)(R - 1)$$

where C is the number of columns or levels of the first variable and R is the number of rows or levels of the second variable.

In the table above we have the observed frequencies (six of them). Now we must calculate the expected frequency for each of the six cells. For two-variable chi-square we find the expected frequencies with the formula:

$$\text{Expected Frequency for a Cell} = \frac{\text{Column Total} \times \text{Row Total}}{\text{Grand Total}}$$
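The expected-frequency formula can be applied to every cell at once. This Python sketch stores the observed table as a list of rows (female, male), each holding the columns (small, medium, large). The function name `expected_frequencies` is our own, for illustration only.

```python
def expected_frequencies(table):
    """Expected count for each cell: (column total x row total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[c * r / grand_total for c in col_totals] for r in row_totals]


# Rows: female, male; columns: small, medium, large hometown
observed = [
    [10, 14, 6],
    [4, 1, 1],
]
E = expected_frequencies(observed)
for label, row in zip(["female", "male"], E):
    print(label, [round(e, 3) for e in row])
# female [11.667, 12.5, 5.833]
# male [2.333, 2.5, 1.167]
```

The expected frequencies always add up to the grand total (here 36). This makes a quick sanity check on hand calculations.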

In the table above we can see that the Column Totals are 14 (small), 15 (medium), and 7 (large), while the Row Totals are 30 (female) and 6 (male). The grand total is 36. Using the formula we can thus find the expected frequency for each cell.

• The expected frequency for the small female cell is 14 × 30/36 = 11.667
• The expected frequency for the small male cell is 14 × 6/36 = 2.333
• The expected frequency for the medium female cell is 15 × 30/36 = 12.500
• The expected frequency for the medium male cell is 15 × 6/36 = 2.500
• The expected frequency for the large female cell is 7 × 30/36 = 5.833
• The expected frequency for the large male cell is 7 × 6/36 = 1.167

We can put these expected frequencies in our table and also include the values for (O - E)²/E. The sum of all these values is the value of chi-square.

Observed frequencies, expected frequencies, and (O - E)²/E for males and females from small, medium, and large cities

Cell              Observed   Expected   (O - E)²/E
Small, female     10         11.667     0.238
Small, male       4          2.333      1.191
Medium, female    14         12.500     0.180
Medium, male      1          2.500      0.900
Large, female     6          5.833      0.005
Large, male       1          1.167      0.024
Totals            36         36.000     2.538 (chi-square)

From the table we can see that:

χ² = 2.538

and df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = (2)(1) = 2

We now have the information we need to complete the six-step process for testing statistical hypotheses for our research problem.

1. State the null hypothesis and the alternative hypothesis based on your research question.
   H0: O = E
   H1: O ≠ E
2. Set the alpha level.
   α = .05
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
   χ² = 2.538
   df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = (2)(1) = 2
4. Write the decision rule for rejecting the null hypothesis.
   Reject H0 if χ² > 5.991.
   Note: To write the decision rule we had to know the critical value for chi-square with an alpha level of .05 and 2 degrees of freedom. We can find it in Appendix Table F by reading the tabled value in the column for the .05 level and the row for df = 2. That value, the critical value for chi-square, is 5.991.
5. Write a summary statement based on the decision.
   Fail to reject H0
   Note: Since our calculated value of χ² (2.538) is not greater than 5.991, we fail to reject the null hypothesis and are unable to accept the alternative hypothesis.
6. Write a statement of results in standard English.
   There is not a significant difference in the frequencies with which males come from small, medium, or large towns as compared with females. In other words, we found no evidence that hometown size depends on gender.

Chi-square is a useful non-parametric statistic to help evaluate statistical hypotheses involving the frequencies with which observations fall in various categories (nominal data).
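The whole two-variable calculation from this section can be condensed into one short Python sketch. It runs from the expected frequencies through the decision in step 5. The function name `chi_square_independence` is our own. The critical value 5.991 is hard-coded from Appendix Table F as quoted in the text, so the decision line is valid only for α = .05 and df = 2.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for a two-variable contingency table.

    Expected count for each cell = (column total x row total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected = col_totals[j] * row_totals[i] / grand_total
            chi_sq += (observed_count - expected) ** 2 / expected

    df = (len(col_totals) - 1) * (len(row_totals) - 1)  # (C - 1)(R - 1)
    return chi_sq, df


# Rows: female, male; columns: small, medium, large hometown
observed = [
    [10, 14, 6],
    [4, 1, 1],
]
chi_sq, df = chi_square_independence(observed)

CRITICAL_05_DF2 = 5.991  # alpha = .05, df = 2, from Appendix Table F
decision = "Reject H0" if chi_sq > CRITICAL_05_DF2 else "Fail to reject H0"
print(f"chi-square = {chi_sq:.3f}, df = {df}, {decision}")
# chi-square = 2.537, df = 2, Fail to reject H0
```

This sketch carries full precision instead of rounding each expected frequency to three decimals. It therefore reports about 2.537 rather than the 2.538 in the table. The decision is the same.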