
Testing the Independence of Two or More Categorical Variables
Contingency Tables

An r × c contingency table shows the observed frequencies for two variables. The observed frequencies are arranged in r rows and c columns. The intersection of a row and a column is called a cell.

The following contingency table shows a random sample of 321 fatally injured passenger vehicle drivers by age and gender.

                              Age
Gender    16–20   21–30   31–40   41–50   51–60   61 and older
Male         32      51      52      43      28             10
Female       13      22      33      21      10              6

This is a 2 × 6 contingency table.


Expected Frequency
Assuming the two variables are independent, you can use
the contingency table to find the expected frequency for
each cell.

Finding the Expected Frequency for Contingency Table Cells


The expected frequency for cell E_{r,c} in a contingency table is

    Expected frequency E_{r,c} = (Sum of row r) × (Sum of column c) / Sample size.

Example:
Find the expected frequency for each “Male” cell in the
contingency table for the sample of 321 fatally injured drivers.
Assume that the variables, age and gender, are independent.
                              Age
Gender    16–20   21–30   31–40   41–50   51–60   61 and older   Total
Male         32      51      52      43      28             10     216
Female       13      22      33      21      10              6     105
Total        45      73      85      64      38             16     321


Example continued:
                              Age
Gender    16–20   21–30   31–40   41–50   51–60   61 and older   Total
Male         32      51      52      43      28             10     216
Female       13      22      33      21      10              6     105
Total        45      73      85      64      38             16     321
    Expected frequency E_{r,c} = (Sum of row r) × (Sum of column c) / Sample size

    E_{1,1} = (216 × 45)/321 ≈ 30.28    E_{1,2} = (216 × 73)/321 ≈ 49.12    E_{1,3} = (216 × 85)/321 ≈ 57.20
    E_{1,4} = (216 × 64)/321 ≈ 43.07    E_{1,5} = (216 × 38)/321 ≈ 25.57    E_{1,6} = (216 × 16)/321 ≈ 10.77
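These expected frequencies are easy to reproduce programmatically. The sketch below, in plain Python with no outside libraries, rebuilds the expected-frequency table for the sample above under the independence assumption.

```python
# Expected frequencies for an r x c contingency table under the
# independence assumption: E(r, c) = (row r total)(column c total) / n.

observed = [
    [32, 51, 52, 43, 28, 10],  # Male
    [13, 22, 33, 21, 10, 6],   # Female
]

row_totals = [sum(row) for row in observed]        # [216, 105]
col_totals = [sum(col) for col in zip(*observed)]  # [45, 73, 85, 64, 38, 16]
n = sum(row_totals)                                # 321

expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

print(round(expected[0][0], 2))  # E(1,1) = 216 * 45 / 321 -> 30.28
```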

Chi-Square Independence Test

A chi-square independence test is used to test the independence of two variables. Using a chi-square test, you can determine whether the occurrence of one variable affects the probability of the occurrence of the other variable.

For the chi-square independence test to be used, the following must be true.
1. The observed frequencies must be obtained by using a
random sample.
2. Each expected frequency must be ≥ 5.


The Chi-Square Independence Test


If the conditions listed are satisfied, then the sampling
distribution for the chi-square independence test is
approximated by a chi-square distribution with
(r – 1)(c – 1) degrees of freedom, where r and c are the
number of rows and columns, respectively, of a contingency
table.
The test statistic for the chi-square independence test is

    χ² = Σ (O - E)² / E

where O represents the observed frequencies and E represents the expected frequencies. The test is always a right-tailed test.

Example:
The following contingency table shows a random sample
of 321 fatally injured passenger vehicle drivers by age and
gender. The expected frequencies are displayed in
parentheses. At α = 0.05, can you conclude that the
drivers’ ages are related to gender in such accidents?
                              Age
Gender    16–20    21–30    31–40    41–50    51–60   61 and older   Total
Male         32       51       52       43       28             10     216
          (30.28)  (49.12)  (57.20)  (43.07)  (25.57)        (10.77)
Female       13       22       33       21       10              6     105
          (14.72)  (23.88)  (27.80)  (20.93)  (12.43)         (5.23)
Total        45       73       85       64       38             16     321
Example continued:
Because each expected frequency is at least 5 and the
drivers were randomly selected, the chi-square
independence test can be used to test whether the
variables are independent.
H0: The drivers’ ages are independent of gender.
Ha: The drivers’ ages are dependent on gender. (Claim)

d.f. = (r – 1)(c – 1) = (2 – 1)(6 – 1) = (1)(5) = 5

With d.f. = 5 and α = 0.05, the critical value is χ²₀ = 11.071.
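If SciPy is available, the tabulated critical value can be checked numerically; this is a sketch assuming SciPy, not part of the original example.

```python
from scipy.stats import chi2

# Right-tailed chi-square critical value for d.f. = 5 at alpha = 0.05.
critical = chi2.ppf(1 - 0.05, df=5)
print(critical)  # close to the tabulated 11.071
```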


Example continued:

      O        E      O - E   (O - E)²   (O - E)²/E
     32    30.28       1.72     2.9584       0.0977
     51    49.12       1.88     3.5344       0.0720
     52    57.20      -5.20    27.0400       0.4727
     43    43.07      -0.07     0.0049       0.0001
     28    25.57       2.43     5.9049       0.2309
     10    10.77      -0.77     0.5929       0.0551
     13    14.72      -1.72     2.9584       0.2010
     22    23.88      -1.88     3.5344       0.1480
     33    27.80       5.20    27.0400       0.9727
     21    20.93       0.07     0.0049       0.0002
     10    12.43      -2.43     5.9049       0.4751
      6     5.23       0.77     0.5929       0.1134

The rejection region (α = 0.05) is χ² > χ²₀ = 11.071.

    χ² = Σ (O - E)² / E ≈ 2.84

Because 2.84 < 11.071, fail to reject H₀.

There is not enough evidence at the 5% level to conclude that age is dependent on gender in such accidents.
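The whole computation can be checked in a few lines of plain Python: sum (O - E)²/E over the twelve cells and compare the result with the critical value from the example.

```python
# Chi-square statistic for the 2 x 6 table: chi2 = sum of (O - E)^2 / E.
observed = [32, 51, 52, 43, 28, 10, 13, 22, 33, 21, 10, 6]
expected = [30.28, 49.12, 57.20, 43.07, 25.57, 10.77,
            14.72, 23.88, 27.80, 20.93, 12.43, 5.23]

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi2_stat, 2))  # about 2.84
print(chi2_stat > 11.071)   # False -> fail to reject H0
```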
Comparing Two Variances
F-Distribution

Let s₁² and s₂² represent the sample variances of two different populations. If both populations are normal and the population variances σ₁² and σ₂² are equal, then the sampling distribution of

    F = s₁² / s₂²

is called an F-distribution.
There are several properties of this distribution.
1. The F-distribution is a family of curves each of which is
determined by two types of degrees of freedom: the degrees
of freedom corresponding to the variance in the numerator,
denoted d.f.N, and the degrees of freedom corresponding to
the variance in the denominator, denoted d.f.D.


Properties of the F-distribution continued:


2. F-distributions are positively skewed.
3. The total area under each curve of an F-distribution is
equal to 1.
4. F-values are always greater than or equal to 0.
5. For all F-distributions, the mean value of F is
approximately equal to 1.

[Figure: F-distribution density curves for (d.f.N, d.f.D) = (1, 8), (8, 26), (16, 7), and (3, 11), plotted for F from 0 to 4.]
Critical Values for the F-Distribution

Finding Critical Values for the F-Distribution


1. Specify the level of significance α.
2. Determine the degrees of freedom for the numerator, d.f.N.
3. Determine the degrees of freedom for the denominator, d.f.D.
4. Use the F Table to find the critical value.


Example:
Find the critical F-value for a right-tailed test when α = 0.05, d.f.N = 5, and d.f.D = 28.
Appendix B: Table 7: F-Distribution (α = 0.05)
(rows: d.f.D, degrees of freedom, denominator; columns: d.f.N, degrees of freedom, numerator)

d.f.D       1        2        3        4        5        6
  1     161.4    199.5    215.7    224.6    230.2    234.0
  2     18.51    19.00    19.16    19.25    19.30    19.33
 ...
 27      4.21     3.35     2.96     2.73     2.57     2.46
 28      4.20     3.34     2.95     2.71     2.56     2.45
 29      4.18     3.33     2.93     2.70     2.55     2.43

The critical value is F0 = 2.56.
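The same lookup can be done without the printed table; a sketch assuming SciPy is installed:

```python
from scipy.stats import f

# Right-tailed critical F-value for alpha = 0.05, d.f.N = 5, d.f.D = 28.
critical = f.ppf(1 - 0.05, dfn=5, dfd=28)
print(round(critical, 2))  # 2.56, matching the table
```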



Example:
Find the critical F-value for a two-tailed test when α = 0.10, d.f.N = 4, and d.f.D = 6. For a two-tailed test, the area in the right tail is (1/2)α = (1/2)(0.10) = 0.05.
Appendix B: Table 7: F-Distribution (α = 0.05)
(rows: d.f.D, degrees of freedom, denominator; columns: d.f.N, degrees of freedom, numerator)

d.f.D       1        2        3        4        5        6
  1     161.4    199.5    215.7    224.6    230.2    234.0
  2     18.51    19.00    19.16    19.25    19.30    19.33
  3     10.13     9.55     9.28     9.12     9.01     8.94
  4      7.71     6.94     6.59     6.39     6.26     6.16
  5      6.61     5.79     5.41     5.19     5.05     4.95
  6      5.99     5.14     4.76     4.53     4.39     4.28
  7      5.59     4.74     4.35     4.12     3.97     3.87

The critical value is F0 = 4.53.


Two-Sample F-Test for Variances

Two-Sample F-Test for Variances


A two-sample F-test is used to compare two population variances σ₁² and σ₂² when a sample is randomly selected from each population. The populations must be independent and normally distributed. The test statistic is

    F = s₁² / s₂²

where s₁² and s₂² represent the sample variances with s₁² ≥ s₂². The degrees of freedom are

    d.f.N = n₁ - 1 and d.f.D = n₂ - 1,

where n₁ is the size of the sample having variance s₁² and n₂ is the size of the sample having variance s₂².
Two-Sample F-Test
Example:
A biologist claims that the standard deviations of the lengths of fish caught in two different seas are the same. A random sample of 13 fish from sea 1 has a standard deviation of 27.50 cm, and a random sample of 16 fish from sea 2 has a standard deviation of 29.75 cm. Can you reject the biologist's claim at α = 0.01?

Because 29.75 > 27.50, s₁² = 29.75² ≈ 885.06 and s₂² = 27.50² = 756.25.

H₀: σ₁² = σ₂² (Claim)
Hₐ: σ₁² ≠ σ₂²


Example continued:
This is a two-tailed test with (1/2)α = (1/2)(0.01) = 0.005, d.f.N = 16 - 1 = 15, and d.f.D = 13 - 1 = 12.

The critical value is F₀ = 4.72.

The test statistic is

    F = s₁² / s₂² = 885.06 / 756.25 ≈ 1.17.

Because 1.17 < 4.72, fail to reject H₀. There is not enough evidence at the 1% level to reject the claim that the standard deviations of the fish lengths in the two seas are the same.
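The arithmetic of this example is short enough to check directly in plain Python, putting the larger sample variance in the numerator as the test requires.

```python
# Two-sample F-test statistic: F = s1^2 / s2^2, with the larger
# sample variance in the numerator.
s1_sq = 29.75 ** 2   # 885.0625 (sample from sea 2, larger variance)
s2_sq = 27.50 ** 2   # 756.25   (sample from sea 1)

F_stat = s1_sq / s2_sq
print(round(F_stat, 2))  # about 1.17
print(F_stat > 4.72)     # False -> fail to reject H0
```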