Pearson's Chi-Squared Tests

Test of Goodness-of-Fit
Hypothesis tests can also assist in assessing the quality of a model. In particular, the chi-squared goodness-of-fit
test checks whether a proposed distribution agrees with observed data.

Start with n independent observations, each classified into exactly one of r mutually exclusive categories. Define $n_i$ as the number of observations classified as Category i, where i = 1, 2, ..., r. Hence, $n_i/n$ is the proportion of observations in Category i.

Now consider a model that describes the distribution among the categories. If the model is properly specified, then $p_i$, the probability that an observation belongs to Category i, should be similar to $n_i/n$ for all i. As such, the hypotheses can be written as

• H0: $p_i = \frac{n_i}{n}$ for all i = 1, 2, ..., r
• H1: $p_i \neq \frac{n_i}{n}$ for at least one i = 1, 2, ..., r

In other words, failing to reject H0 suggests that the model fits the data adequately, whereas rejecting H0
suggests that the model fits the data poorly.

Without going into the proof, this is a right-tailed test whose test statistic is

$$\sum_{i=1}^{r} \frac{(n_i - np_i)^2}{np_i}$$

which approximately follows a χ² distribution with r − 1 degrees of freedom. Therefore, reject H0 when

$$\sum_{i=1}^{r} \frac{(n_i - np_i)^2}{np_i} \ge \chi^2_{1-\alpha,\, r-1}$$

As a reminder,
• r is the number of unique categories,
• $n_i$ is the number of Category i observations,
• n is the total number of observations,
• $p_i$ is the model's probability of a Category i observation,
• α is the significance level, and
• $\chi^2_{p,\nu}$ is the 100p-th percentile of a χ² random variable with ν degrees of freedom.
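The statistic and rejection rule above translate directly into code. The following is a minimal Python sketch, not part of the original lesson; the function names `chi_squared_gof_statistic` and `reject_h0` are hypothetical, and the critical value $\chi^2_{1-\alpha,\, r-1}$ is assumed to be read from a percentile table and passed in rather than computed.

```python
from typing import Sequence


def chi_squared_gof_statistic(observed: Sequence[int], probs: Sequence[float]) -> float:
    """Pearson statistic: sum over categories of (n_i - n*p_i)^2 / (n*p_i)."""
    if len(observed) != len(probs):
        raise ValueError("observed and probs must both have length r")
    n = sum(observed)  # total number of observations
    stat = 0.0
    for n_i, p_i in zip(observed, probs):
        expected = n * p_i  # model's expected count n*p_i for Category i
        stat += (n_i - expected) ** 2 / expected
    return stat


def reject_h0(statistic: float, critical_value: float) -> bool:
    """Right-tailed rule: reject H0 when statistic >= chi^2_{1-alpha, r-1}."""
    return statistic >= critical_value
```

Because each expected count $np_i$ appears in a denominator, the sketch assumes every $p_i$ is strictly positive.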

EXAMPLE 4.2.5

The outcomes of 150 die rolls were recorded as follows:

Die Roll   Frequency
1          17
2          18
3          24
4          29
5          33
6          29

Let $\chi^2_{p,\nu}$ be the 100p-th percentile of a chi-squared random variable with ν degrees of freedom. The following table lists values of $\chi^2_{p,\nu}$ for specific combinations of p and ν:

         p = 0.94   p = 0.96   p = 0.98
ν = 5    10.596     11.644     13.388
ν = 6    12.090     13.198     15.033

Test whether the die is fair using the chi-squared goodness-of-fit test.

SOLUTION

There are six categories, one for each die roll outcome. Therefore, r = 6. In addition, a fair die implies that each outcome is equally likely, meaning $p_i = \frac{1}{6}$ for all i. With 150 observations,

i    $np_i$
1    150 ⋅ 1/6 = 25
2    150 ⋅ 1/6 = 25
⋮    ⋮
6    150 ⋅ 1/6 = 25

The test statistic is

$$\sum_{i=1}^{r} \frac{(n_i - np_i)^2}{np_i} = \frac{(17-25)^2}{25} + \frac{(18-25)^2}{25} + \cdots + \frac{(29-25)^2}{25} = \frac{64 + 49 + 1 + 16 + 64 + 16}{25} = 8.4$$

This test involves 6 − 1 = 5 degrees of freedom. Note that

8.4 < 10.596

Determine the significance level associated with 10.596.

$$10.596 = \chi^2_{0.94,\, 5} = \chi^2_{1-\alpha,\, 5} \;\Rightarrow\; \alpha = 1 - 0.94 = 0.06$$

In conclusion, we fail to reject H0 at the 6% significance level, suggesting that the assumption of a fair die seems reasonable for these 150 rolls.
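As a check on the arithmetic, this illustrative Python snippet (not part of the original solution) recomputes Example 4.2.5 with exact fractions and compares the statistic against each percentile in the example's ν = 5 row. The variable names are hypothetical.

```python
from fractions import Fraction

observed = [17, 18, 24, 29, 33, 29]   # frequencies of die faces 1..6
n = sum(observed)                      # 150 rolls
r = len(observed)                      # 6 categories
expected = [n * Fraction(1, 6) for _ in range(r)]   # fair die: n*p_i = 25

# Pearson statistic, computed exactly with fractions
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = r - 1

# Percentiles chi^2_{p,5} copied from the example's table as (p, value)
table_df5 = [(0.94, 10.596), (0.96, 11.644), (0.98, 13.388)]

for p, critical in table_df5:
    alpha = round(1 - p, 2)            # critical = chi^2_{1-alpha,5}
    decision = "reject" if statistic >= critical else "fail to reject"
    print(f"alpha = {alpha}: {decision} H0 (statistic {float(statistic)} vs {critical})")
```

All three rows report "fail to reject", since 8.4 is below even the smallest tabulated value, 10.596.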

Test of Independence
A contingency table records the frequency of observations described by two categorical variables. It is used to examine whether the two variables are dependent, using the same procedure as the goodness-of-fit test. The hypotheses are

• H0: The two variables are independent
• H1: The two variables are dependent

One variable has r categories, while the other has s. Each of the n observations belongs to one of the r × s combinations. Let
• $n_{ij}$ be the number of observations in Category i of the first variable and Category j of the second variable,
• $n_{i\cdot}$ be the subtotal number of observations in Category i of the first variable, across all categories of the second variable, and
• $n_{\cdot j}$ be the subtotal number of observations in Category j of the second variable, across all categories of the first variable,

for i = 1, 2, ..., r and j = 1, 2, ..., s. Thus, a contingency table resembles

                          Second Variable
                  Cat 1   Cat 2   ⋯   Cat s   Total
First     Cat 1   n11     n12     ⋯   n1s     n1⋅
Variable  Cat 2   n21     n22     ⋯   n2s     n2⋅
          ⋮       ⋮       ⋮       ⋱   ⋮       ⋮
          Cat r   nr1     nr2     ⋯   nrs     nr⋅
          Total   n⋅1     n⋅2     ⋯   n⋅s     n
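The subtotals in the table's margins can be computed mechanically from the cell counts. A minimal Python sketch, assuming the counts $n_{ij}$ are stored as a rectangular list of rows (the function name `margins` is hypothetical):

```python
from typing import List, Tuple


def margins(table: List[List[int]]) -> Tuple[List[int], List[int], int]:
    """Return (row totals n_i., column totals n_.j, grand total n) for an r-by-s table of n_ij."""
    row_totals = [sum(row) for row in table]          # n_i.: add across the s columns of row i
    col_totals = [sum(col) for col in zip(*table)]    # n_.j: add down the r rows of column j
    n = sum(row_totals)                               # n: every observation counted once
    return row_totals, col_totals, n


# Counts n_ij from Example 4.2.6 (rows: Cars, Motorcycles; columns: < 2015, >= 2015)
rows, cols, total = margins([[40, 60], [10, 40]])
print(rows, cols, total)  # [100, 50] [50, 100] 150
```

`zip(*table)` transposes the rows into columns, which is why the sketch assumes every row has the same length s.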

The test statistic is calculated as

$$\frac{1}{n} \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(n_{ij}\, n - n_{i\cdot}\, n_{\cdot j})^2}{n_{i\cdot}\, n_{\cdot j}}$$

which approximately follows a χ² distribution with (r − 1)(s − 1) degrees of freedom. Therefore, reject H0 when

$$\frac{1}{n} \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(n_{ij}\, n - n_{i\cdot}\, n_{\cdot j})^2}{n_{i\cdot}\, n_{\cdot j}} \ge \chi^2_{1-\alpha,\, (r-1)(s-1)}$$
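The $1/n$ factor in this formula can look unmotivated. Although the lesson does not state it, the formula is algebraically identical to $\sum_{i}\sum_{j} (n_{ij} - E_{ij})^2 / E_{ij}$, where $E_{ij} = n_{i\cdot}\, n_{\cdot j} / n$ is the expected count in cell (i, j) under independence. The Python sketch below computes both forms with exact fractions so their agreement can be checked; the function names are hypothetical, and all row and column totals are assumed to be positive.

```python
from fractions import Fraction
from typing import List


def independence_statistic(table: List[List[int]]) -> Fraction:
    """Lesson's form: (1/n) * sum_{i,j} (n_ij*n - n_i.*n_.j)^2 / (n_i.*n_.j)."""
    row = [sum(r) for r in table]          # n_i.
    col = [sum(c) for c in zip(*table)]    # n_.j
    n = sum(row)                           # n
    total = Fraction(0)
    for i, counts in enumerate(table):
        for j, n_ij in enumerate(counts):
            total += Fraction((n_ij * n - row[i] * col[j]) ** 2, row[i] * col[j])
    return total / n


def independence_statistic_oe(table: List[List[int]]) -> Fraction:
    """Equivalent form: sum_{i,j} (n_ij - E_ij)^2 / E_ij with E_ij = n_i. * n_.j / n."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    total = Fraction(0)
    for i, counts in enumerate(table):
        for j, n_ij in enumerate(counts):
            e_ij = Fraction(row[i] * col[j], n)   # expected count under independence
            total += (n_ij - e_ij) ** 2 / e_ij
    return total


def degrees_of_freedom(table: List[List[int]]) -> int:
    """(r - 1)(s - 1) for an r-by-s table."""
    return (len(table) - 1) * (len(table[0]) - 1)
```

Multiplying numerator and denominator of each $(n_{ij} - E_{ij})^2 / E_{ij}$ term by $n^2$ gives exactly the lesson's summand, divided by n.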

EXAMPLE 4.2.6

150 vehicles were stopped at random by the police for inspection and classified by type and year:

                     Year
Type          < 2015   ≥ 2015   Total
Cars              40       60     100
Motorcycles       10       40      50
Total             50      100     150

Let $\chi^2_{p,\nu}$ be the 100p-th percentile of a chi-squared random variable with ν degrees of freedom. The following table lists values of $\chi^2_{p,\nu}$ for specific combinations of p and ν:

         p = 0.95   p = 0.975   p = 0.99
ν = 1    3.841      5.024       6.635
ν = 2    5.991      7.378       9.210

Test whether the vehicle type and year are independent.

SOLUTION

Note that r = 2 and s = 2 , since each variable (type and year) has two categories.

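The test statistic is

$$\begin{aligned}
\frac{1}{n} \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(n_{ij}\, n - n_{i\cdot}\, n_{\cdot j})^2}{n_{i\cdot}\, n_{\cdot j}}
&= \frac{1}{150} \left( \frac{[40(150) - 100(50)]^2}{100(50)} + \frac{[60(150) - 100(100)]^2}{100(100)} + \frac{[10(150) - 50(50)]^2}{50(50)} + \frac{[40(150) - 50(100)]^2}{50(100)} \right) \\
&= \frac{1}{150} (200 + 100 + 400 + 200) \\
&= 6
\end{aligned}$$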

This test involves (2 − 1)(2 − 1) = 1 degree of freedom. Note that

5.024 < 6 < 6.635

Determine the significance levels associated with 5.024 and 6.635.


$$5.024 = \chi^2_{0.975,\, 1} = \chi^2_{1-\alpha,\, 1} \;\Rightarrow\; \alpha = 1 - 0.975 = 0.025$$
$$6.635 = \chi^2_{0.99,\, 1} = \chi^2_{1-\alpha,\, 1} \;\Rightarrow\; \alpha = 1 - 0.99 = 0.01$$

In conclusion, reject H0 at the 2.5% significance level but not at the 1% level, which provides strong evidence that vehicle type and year are dependent.
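The same bracketing of the statistic between tabulated percentiles can be reproduced in a few lines. This illustrative Python snippet (variable names hypothetical, not part of the original solution) recomputes Example 4.2.6 exactly and checks the statistic against each value in the ν = 1 row.

```python
from fractions import Fraction

table = [[40, 60],   # Cars: < 2015, >= 2015
         [10, 40]]   # Motorcycles: < 2015, >= 2015
row = [sum(r) for r in table]            # n_i. = [100, 50]
col = [sum(c) for c in zip(*table)]      # n_.j = [50, 100]
n = sum(row)                             # 150

# (1/n) * sum_{i,j} (n_ij*n - n_i.*n_.j)^2 / (n_i.*n_.j), kept exact
statistic = sum(
    Fraction((table[i][j] * n - row[i] * col[j]) ** 2, row[i] * col[j])
    for i in range(len(row))
    for j in range(len(col))
) / n
df = (len(row) - 1) * (len(col) - 1)     # (2 - 1)(2 - 1) = 1

# Percentiles chi^2_{p,1} copied from the example's table as (p, value)
table_df1 = [(0.95, 3.841), (0.975, 5.024), (0.99, 6.635)]
for p, critical in table_df1:
    alpha = round(1 - p, 3)              # critical = chi^2_{1-alpha,1}
    decision = "reject" if statistic >= critical else "fail to reject"
    print(f"alpha = {alpha}: {decision} H0 (statistic {statistic} vs {critical})")
```

The output rejects H0 at α = 0.05 and α = 0.025 but not at α = 0.01, matching the conclusion above.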
