
Chi-squared ($\chi^2$) test for independence

Start by stating why you’ve chosen to perform a chi-squared test (i.e. what is a chi-squared
test used for?). Then state your null and alternative hypotheses. Example:

$H_0$: Favourite music genre is independent of gender

$H_1$: Favourite music genre is not independent of gender

The chi-squared test statistic is found using:

$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$

Where $f_o$ is the observed value and $f_e$ is the expected value. Therefore, we need observed
values (which we collect ourselves) and expected values (which we calculate).
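As a rough illustration, the sum in the formula can be written as a short Python function. This is only a sketch: the name `chi_squared` and the use of flat lists of cell values are assumptions made for this example.

```python
def chi_squared(observed, expected):
    """Sum (f_o - f_e)^2 / f_e over every cell of the table."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((f_o - f_e) ** 2 / f_e for f_o, f_e in zip(observed, expected))


# A single cell with observed value 18 and expected value 19
# contributes (18 - 19)^2 / 19 = 1/19 to the total.
single_cell = chi_squared([18], [19])
```

In a real test you would pass every cell of the table, not just one.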

Once collected, you can place your raw data in your appendices. Organise your raw data into
a table for observed values:

| Genre | Pop | Classical | Folk | Jazz | Totals |
|---|---|---|---|---|---|
| Male | 18 | 9 | 4 | 7 | 38 |
| Female | 22 | 6 | 7 | 7 | 42 |
| Totals | 40 | 15 | 11 | 14 | 80 |

We now need to work out the expected values. Here is an example of one; you need to
calculate them all and show the working in your IA:

The expected number of men who like pop is found by:

$$P(\text{pop}) \times P(\text{male}) \times n, \quad \text{where } n \text{ is the total frequency}$$

$$\frac{40}{80} \times \frac{38}{80} \times 80 = 19$$
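If you want to check your hand calculations, the same rule can be applied to every cell at once. The sketch below uses the observed table from this example; the variable names are just illustrative choices.

```python
# Observed frequencies from the example table (columns: Pop, Classical, Folk, Jazz).
observed = {
    "Male": [18, 9, 4, 7],
    "Female": [22, 6, 7, 7],
}

n = sum(sum(row) for row in observed.values())                  # total frequency
row_totals = {gender: sum(row) for gender, row in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]

# Expected value = P(column) * P(row) * n = row total * column total / n
expected = {
    gender: [row_totals[gender] * col_total / n for col_total in col_totals]
    for gender in observed
}

for gender, row in expected.items():
    print(gender, [round(value, 3) for value in row])
```

The first value printed for "Male" is the 19 worked out by hand above.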

This process is continued until all of the expected values are calculated. We can then calculate
$\chi^2_{calc}$:

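| Observed values ($f_o$) | Expected values ($f_e$) | $f_o - f_e$ | $(f_o - f_e)^2$ | $\frac{(f_o - f_e)^2}{f_e}$ |
|---|---|---|---|---|
| 18 | 19 | -1 | 1 | $\frac{1}{19}$ |
| 9 | | | | |
| 4 | | | | |
| 7 | | | | |
| 22 | | | | |
| 6 | | | | |
| 7 | | | | |
| 7 | | | | |
| $\chi^2_{calc}$ | | | | |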
In this case you should find $\chi^2_{calc}$ to be 1.622.

We then need to compare our $\chi^2_{calc}$ with the critical value for this test. The critical value is
determined by the degrees of freedom and the significance level. We normally choose a
significance level of 1%, 5% or 10%. The lower the level, the stronger the evidence needed before
we reject the null hypothesis, so a significant result is less likely to be due to chance.
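The value of 1.622 for $\chi^2_{calc}$ can be reproduced with a few lines of Python. This is a sketch for checking your working, not a replacement for showing it in your IA; it prints one row of the working table per cell.

```python
observed = [
    [18, 9, 4, 7],   # Male
    [22, 6, 7, 7],   # Female
]

n = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

chi2_calc = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n      # expected value for this cell
        contribution = (f_o - f_e) ** 2 / f_e        # last column of the table
        chi2_calc += contribution
        print(f"{f_o:>3} {f_e:>7.3f} {f_o - f_e:>7.3f} "
              f"{(f_o - f_e) ** 2:>7.3f} {contribution:>7.4f}")

print(round(chi2_calc, 3))  # 1.622
```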

The degrees of freedom can be found by:

Degrees of freedom = (number of rows – 1)(number of columns – 1)


= (2 – 1)(4 – 1)
= 3
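The same count can be read straight off the observed table, as long as the totals row and column are left out. A minimal sketch, with `degrees_of_freedom` as an illustrative name:

```python
def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1), totals excluded."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


# Observed values only: 2 rows (Male, Female) and 4 columns (genres).
observed = [[18, 9, 4, 7], [22, 6, 7, 7]]
df = degrees_of_freedom(observed)  # (2 - 1) * (4 - 1) = 3
```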

I’m going to select a 10% significance level for this test.

For a test with 10% significance level and 3 degrees of freedom, the critical value is 6.25.

If our $\chi^2_{calc}$ is greater than the critical value we say that this is significant and therefore reject
the null hypothesis that the variables are independent.

1.622 < 6.25, therefore we do not reject the null hypothesis that favourite music genre is independent
of gender, i.e. there is no significant evidence that a person’s gender influences their preferred music genre.
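The comparison itself is a single inequality. The sketch below wraps it in a hypothetical helper, `decide`, using the values from this example (1.622 and the critical value 6.25).

```python
def decide(chi2_calc, critical_value):
    """Return the conclusion of the test as a short sentence."""
    if chi2_calc > critical_value:
        return "reject H0: the result is significant, so the variables are not independent"
    return "do not reject H0: no significant evidence against independence"


conclusion = decide(1.622, 6.25)
print(conclusion)
```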

If you found them to be not independent, you can say the test is significant. If this
happens, an extra thing you can do is use the p-value. When you calculate your p-value (using
your GDC, and verifying it is correct by comparing your $\chi^2_{calc}$ to the value on your GDC),
you can say, “I am now 100(1 − p)% sure that my results are not due to chance”.

E.g. if p = 0.04 you can say, “I am now 96% sure that my results are not due to chance”.
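Your GDC reports the p-value directly. If you want a second check, it can also be computed in Python with only the standard library. The sketch below is an illustration under assumptions: `chi2_p_value` is a made-up helper name, and it uses the standard recurrence for the upper tail of the chi-squared distribution, which is not necessarily how a GDC does it.

```python
import math


def chi2_p_value(chi2_calc, df):
    """Upper-tail probability P(X >= chi2_calc) for df degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if chi2_calc <= 0:
        return 1.0
    half = chi2_calc / 2
    # Closed-form starting values: df = 2 for even df, df = 1 for odd df.
    if df % 2 == 0:
        p, k = math.exp(-half), 2
    else:
        p, k = math.erfc(math.sqrt(half)), 1
    # Step up two degrees of freedom at a time until df is reached.
    while k < df:
        p += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return p


p = chi2_p_value(1.622, 3)
significant_at_10_percent = p < 0.10   # False for the music genre example
```

For the music genre data the p-value is well above 0.10, which agrees with the critical value comparison.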

Things to watch out for

• You cannot have expected values less than 5 (e.g. EX8C Q4). If any are, either
collect more data or combine some of the cells.
• When you have 1 degree of freedom you use a slightly different formula, which is
referred to as Yates’ correction for continuity:

$$\chi^2_{Yates} = \sum \frac{(|f_o - f_e| - 0.5)^2}{f_e}$$
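Both points can be checked with a short sketch. To get a table with 1 degree of freedom, it combines Classical, Folk and Jazz from the music genre example into a single "Other" column. That combination is only for illustration (the example's expected values are all at least 5, so it is not actually needed), and the function names are assumptions.

```python
def expected_values(observed):
    """Expected value for each cell: row total * column total / n."""
    n = sum(map(sum, observed))
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_yates(observed):
    """Chi-squared statistic with Yates' correction (use when df = 1)."""
    expected = expected_values(observed)
    return sum(
        (abs(f_o - f_e) - 0.5) ** 2 / f_e
        for obs_row, exp_row in zip(observed, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )


full_table = [[18, 9, 4, 7], [22, 6, 7, 7]]

# Check 1: is every expected value at least 5?
smallest_expected = min(min(row) for row in expected_values(full_table))
all_at_least_5 = smallest_expected >= 5

# Check 2: a 2 x 2 table (Pop vs Other) has 1 degree of freedom,
# so Yates' correction applies.
combined = [[row[0], sum(row[1:])] for row in full_table]   # [[18, 20], [22, 20]]
chi2_yates = chi_squared_yates(combined)
```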
