
Testing the Independence of Two or More Categorical Variables
Contingency Tables

An r × c contingency table shows the observed frequencies for two variables. The observed frequencies are arranged in r rows and c columns. The intersection of a row and a column is called a cell.

The following contingency table shows a random sample of 321 fatally injured passenger vehicle drivers by age and gender.

                              Age
Gender    16–20   21–30   31–40   41–50   51–60   61 and older
Male         32      51      52      43      28             10
Female       13      22      33      21      10              6

This is a 2 × 6 contingency table.


Expected Frequency
Assuming the two variables are independent, you can use
the contingency table to find the expected frequency for
each cell.

Finding the Expected Frequency for Contingency Table Cells


The expected frequency for cell E_{r,c} in a contingency table is

    Expected frequency E_{r,c} = (Sum of row r) × (Sum of column c) / Sample size.

Example:
Find the expected frequency for each “Male” cell in the
contingency table for the sample of 321 fatally injured drivers.
Assume that the variables, age and gender, are independent.
                              Age
Gender    16–20   21–30   31–40   41–50   51–60   61 and older   Total
Male         32      51      52      43      28             10     216
Female       13      22      33      21      10              6     105
Total        45      73      85      64      38             16     321


Example continued:
                              Age
Gender    16–20   21–30   31–40   41–50   51–60   61 and older   Total
Male         32      51      52      43      28             10     216
Female       13      22      33      21      10              6     105
Total        45      73      85      64      38             16     321
    Expected frequency E_{r,c} = (Sum of row r) × (Sum of column c) / Sample size

    E_{1,1} = (216 × 45)/321 ≈ 30.28    E_{1,2} = (216 × 73)/321 ≈ 49.12    E_{1,3} = (216 × 85)/321 ≈ 57.20
    E_{1,4} = (216 × 64)/321 ≈ 43.07    E_{1,5} = (216 × 38)/321 ≈ 25.57    E_{1,6} = (216 × 16)/321 ≈ 10.77
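These expected frequencies are easy to reproduce programmatically. The sketch below, in plain Python with no outside libraries, rebuilds the expected-frequency table for the sample above under the independence assumption.

```python
# Expected frequencies for an r x c contingency table under the
# independence assumption: E(r, c) = (row r total)(column c total) / n.

observed = [
    [32, 51, 52, 43, 28, 10],  # Male
    [13, 22, 33, 21, 10, 6],   # Female
]

row_totals = [sum(row) for row in observed]        # [216, 105]
col_totals = [sum(col) for col in zip(*observed)]  # [45, 73, 85, 64, 38, 16]
n = sum(row_totals)                                # 321

expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

print(round(expected[0][0], 2))  # E(1,1) = 216 * 45 / 321 -> 30.28
```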

Chi-Square Independence Test

A chi-square independence test is used to test the independence of two variables. Using a chi-square test, you can determine whether the occurrence of one variable affects the probability of the occurrence of the other variable.

For the chi-square independence test to be used, the following must be true.
1. The observed frequencies must be obtained by using a
random sample.
2. Each expected frequency must be ≥ 5.


The Chi-Square Independence Test


If the conditions listed are satisfied, then the sampling
distribution for the chi-square independence test is
approximated by a chi-square distribution with
(r – 1)(c – 1) degrees of freedom, where r and c are the
number of rows and columns, respectively, of a contingency
table.
The test statistic for the chi-square independence test is

    χ² = Σ (O - E)² / E

where O represents the observed frequencies and E represents the expected frequencies. The test is always a right-tailed test.

Example:
The following contingency table shows a random sample
of 321 fatally injured passenger vehicle drivers by age and
gender. The expected frequencies are displayed in
parentheses. At α = 0.05, can you conclude that the
drivers’ ages are related to gender in such accidents?
                              Age
Gender    16–20    21–30    31–40    41–50    51–60   61 and older   Total
Male         32       51       52       43       28             10     216
          (30.28)  (49.12)  (57.20)  (43.07)  (25.57)        (10.77)
Female       13       22       33       21       10              6     105
          (14.72)  (23.88)  (27.80)  (20.93)  (12.43)         (5.23)
Total        45       73       85       64       38             16     321
Example continued:
Because each expected frequency is at least 5 and the
drivers were randomly selected, the chi-square
independence test can be used to test whether the
variables are independent.
H0: The drivers’ ages are independent of gender.
Ha: The drivers’ ages are dependent on gender. (Claim)

d.f. = (r – 1)(c – 1) = (2 – 1)(6 – 1) = (1)(5) = 5

With d.f. = 5 and α = 0.05, the critical value is χ²₀ = 11.071.
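If SciPy is available, the tabulated critical value can be checked numerically; this is a sketch assuming SciPy, not part of the original example.

```python
from scipy.stats import chi2

# Right-tailed chi-square critical value for d.f. = 5 at alpha = 0.05.
critical = chi2.ppf(1 - 0.05, df=5)
print(critical)  # close to the tabulated 11.071
```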


Example continued:

      O        E      O - E   (O - E)²   (O - E)²/E
     32    30.28       1.72     2.9584       0.0977
     51    49.12       1.88     3.5344       0.0720
     52    57.20      -5.20    27.0400       0.4727
     43    43.07      -0.07     0.0049       0.0001
     28    25.57       2.43     5.9049       0.2309
     10    10.77      -0.77     0.5929       0.0551
     13    14.72      -1.72     2.9584       0.2010
     22    23.88      -1.88     3.5344       0.1480
     33    27.80       5.20    27.0400       0.9727
     21    20.93       0.07     0.0049       0.0002
     10    12.43      -2.43     5.9049       0.4751
      6     5.23       0.77     0.5929       0.1134

The rejection region (α = 0.05) is χ² > χ²₀ = 11.071.

    χ² = Σ (O - E)² / E ≈ 2.84

Because 2.84 < 11.071, fail to reject H₀.

There is not enough evidence at the 5% level to conclude that age is dependent on gender in such accidents.
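The whole computation can be checked in a few lines of plain Python: sum (O - E)²/E over the twelve cells and compare the result with the critical value from the example.

```python
# Chi-square statistic for the 2 x 6 table: chi2 = sum of (O - E)^2 / E.
observed = [32, 51, 52, 43, 28, 10, 13, 22, 33, 21, 10, 6]
expected = [30.28, 49.12, 57.20, 43.07, 25.57, 10.77,
            14.72, 23.88, 27.80, 20.93, 12.43, 5.23]

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi2_stat, 2))  # about 2.84
print(chi2_stat > 11.071)   # False -> fail to reject H0
```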
Comparing Two Variances
F-Distribution

Let s₁² and s₂² represent the sample variances of two different populations. If both populations are normal and the population variances σ₁² and σ₂² are equal, then the sampling distribution of

    F = s₁² / s₂²

is called an F-distribution.
There are several properties of this distribution.
1. The F-distribution is a family of curves each of which is
determined by two types of degrees of freedom: the degrees
of freedom corresponding to the variance in the numerator,
denoted d.f.N, and the degrees of freedom corresponding to
the variance in the denominator, denoted d.f.D.


Properties of the F-distribution continued:


2. F-distributions are positively skewed.
3. The total area under each curve of an F-distribution is
equal to 1.
4. F-values are always greater than or equal to 0.
5. For all F-distributions, the mean value of F is
approximately equal to 1.

[Figure: F-distribution density curves for (d.f.N, d.f.D) = (1, 8), (8, 26), (16, 7), and (3, 11), plotted for F from 0 to 4.]
Critical Values for the F-Distribution

Finding Critical Values for the F-Distribution


1. Specify the level of significance α.
2. Determine the degrees of freedom for the numerator, d.f.N.
3. Determine the degrees of freedom for the denominator, d.f.D.
4. Use the F Table to find the critical value.


Example:
Find the critical F-value for a right-tailed test when α = 0.05, d.f.N = 5, and d.f.D = 28.
Appendix B: Table 7: F-Distribution (α = 0.05)
(rows: d.f.D, degrees of freedom, denominator; columns: d.f.N, degrees of freedom, numerator)

d.f.D       1        2        3        4        5        6
  1     161.4    199.5    215.7    224.6    230.2    234.0
  2     18.51    19.00    19.16    19.25    19.30    19.33
 ...
 27      4.21     3.35     2.96     2.73     2.57     2.46
 28      4.20     3.34     2.95     2.71     2.56     2.45
 29      4.18     3.33     2.93     2.70     2.55     2.43

The critical value is F0 = 2.56.
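The same lookup can be done without the printed table; a sketch assuming SciPy is installed:

```python
from scipy.stats import f

# Right-tailed critical F-value for alpha = 0.05, d.f.N = 5, d.f.D = 28.
critical = f.ppf(1 - 0.05, dfn=5, dfd=28)
print(round(critical, 2))  # 2.56, matching the table
```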



Example:
Find the critical F-value for a two-tailed test when α = 0.10, d.f.N = 4, and d.f.D = 6. For a two-tailed test, the area in the right tail is (1/2)α = (1/2)(0.10) = 0.05.
Appendix B: Table 7: F-Distribution (α = 0.05)
(rows: d.f.D, degrees of freedom, denominator; columns: d.f.N, degrees of freedom, numerator)

d.f.D       1        2        3        4        5        6
  1     161.4    199.5    215.7    224.6    230.2    234.0
  2     18.51    19.00    19.16    19.25    19.30    19.33
  3     10.13     9.55     9.28     9.12     9.01     8.94
  4      7.71     6.94     6.59     6.39     6.26     6.16
  5      6.61     5.79     5.41     5.19     5.05     4.95
  6      5.99     5.14     4.76     4.53     4.39     4.28
  7      5.59     4.74     4.35     4.12     3.97     3.87

The critical value is F0 = 4.53.


Two-Sample F-Test for Variances

Two-Sample F-Test for Variances


A two-sample F-test is used to compare two population variances σ₁² and σ₂² when a sample is randomly selected from each population. The populations must be independent and normally distributed. The test statistic is

    F = s₁² / s₂²

where s₁² and s₂² represent the sample variances with s₁² ≥ s₂². The degrees of freedom are

    d.f.N = n₁ - 1 and d.f.D = n₂ - 1,

where n₁ is the size of the sample having variance s₁² and n₂ is the size of the sample having variance s₂².
Two-Sample F-Test
Example:
A biologist claims that the standard deviations of the lengths of fish caught in two different seas are the same. A random sample of 13 fish from sea 1 has a standard deviation of 27.50 cm, and a random sample of 16 fish from sea 2 has a standard deviation of 29.75 cm. Can you reject the biologist's claim at α = 0.01?

Because 29.75 > 27.50, s₁² = 29.75² ≈ 885.06 and s₂² = 27.50² = 756.25.

H₀: σ₁² = σ₂² (Claim)
Hₐ: σ₁² ≠ σ₂²


Example continued:
This is a two-tailed test with (1/2)α = (1/2)(0.01) = 0.005, d.f.N = 16 - 1 = 15, and d.f.D = 13 - 1 = 12.

The critical value is F₀ = 4.72.

The test statistic is

    F = s₁² / s₂² = 885.06 / 756.25 ≈ 1.17.

Because 1.17 < 4.72, fail to reject H₀. There is not enough evidence at the 1% level to reject the claim that the standard deviations of the fish lengths in the two seas are the same.
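The arithmetic of this example is short enough to check directly in plain Python, putting the larger sample variance in the numerator as the test requires.

```python
# Two-sample F-test statistic: F = s1^2 / s2^2, with the larger
# sample variance in the numerator.
s1_sq = 29.75 ** 2   # 885.0625 (sample from sea 2, larger variance)
s2_sq = 27.50 ** 2   # 756.25   (sample from sea 1)

F_stat = s1_sq / s2_sq
print(round(F_stat, 2))  # about 1.17
print(F_stat > 4.72)     # False -> fail to reject H0
```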