Chapter 6
CHI-SQUARE DISTRIBUTIONS
Given the above characteristics, the χ² distribution has the following areas of application:
1. Testing for the equality of several proportions
2. Testing for independence between two variables
3. Goodness-of-fit tests (Binomial, Normal, and Poisson)
The steps and procedures are similar to those of hypothesis testing.
Example:
1. A company planning a TV advertising campaign wants to determine which
TV shows its target audience watches, and thereby to learn whether the choice of
TV program an individual watches is independent of the individual's income.
The supporting table is shown below. Use a 5% level of significance to test the
null hypothesis.
Medium 90 67 43 200
High 17 13 20 50
Solution
1. Ho: The choice of TV program an individual watches is independent of the
individual's income
Ha: Income and choice of TV program are not independent
2. Decision rule
α = 0.05
ν = (R - 1)(C - 1) = (3 - 1)(3 - 1) = 4
χ²(α, ν) = χ²(0.05, 4) = 9.49
Reject Ho if sample χ² > 9.49
For the RxC contingency table, the degrees of freedom are calculated as (R-1) (C-1). The degrees
of freedom refers to the number of expected frequencies that can be chosen freely provided the
row and column totals of expected frequencies are identical to the row and column totals of the
observed frequency table.
e11 = 250 × 250/500 = 125    e21 = 200 × 250/500 = 100    e31 = 50 × 250/500 = 25
e12 = 250 × 150/500 = 75     e22 = 200 × 150/500 = 60     e32 = 50 × 150/500 = 15
e13 = 250 × 100/500 = 50     e23 = 200 × 100/500 = 40     e33 = 50 × 100/500 = 10
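The expected frequencies above depend only on the marginal totals. A minimal Python sketch, assuming the row totals 250, 200, 50 and column totals 250, 150, 100 that appear in the calculations above (the function name `expected_frequencies` is illustrative and not part of the text):

```python
def expected_frequencies(row_totals, col_totals):
    """Return e_ij = (row i total * column j total) / n for every cell."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add up to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]

# Marginal totals taken from the worked calculations above.
row_totals = [250, 200, 50]
col_totals = [250, 150, 100]

expected = expected_frequencies(row_totals, col_totals)
for row in expected:
    print(row)
```

Each printed row corresponds to one row of the contingency table, so the first row holds e11, e12, and e13.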
A test of the null hypothesis that the variables are independent of one another is
based on the magnitudes of the differences between the observed frequencies
and the expected frequencies. Large differences between oij and eij provide
evidence that the null hypothesis is false. The test is based on the following
chi-square test statistic:
$$\chi^2 = \sum_{i}\sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
Or
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
Where:
oij (fo) = observed frequency for the contingency table category in row i and column j.
eij (fe) = expected frequency for the contingency table category in row i and column j.
                           Marital Status
Absence behavior    Married   Divorced   Widowed   Single   Total
Often absent           36        16         14        34      100
Seldom absent          64        34         20        82      200
Never absent           50        50         16        84      200
Total                 150       100         50       200      500
Solution
1. Ho: Voluntary absence behavior is independent of marital status
Ha: Voluntary absence behavior and marital status are dependent
2. α = 0.01
ν = (R - 1)(C - 1) = (3 - 1)(4 - 1) = 6
χ²(α, ν) = χ²(0.01, 6) = 16.81
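The remaining steps of this test can be sketched in Python. This is a minimal illustration, not the text's own worked solution: it takes the observed counts from the absence-behavior table and the critical value 16.81 from the decision rule, and the function name `chi_square_independence` is an assumption made for the example.

```python
def chi_square_independence(observed):
    """Return (chi2, degrees_of_freedom) for an R x C table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Rows: often, seldom, never absent; columns: married, divorced, widowed, single.
absence = [
    [36, 16, 14, 34],
    [64, 34, 20, 82],
    [50, 50, 16, 84],
]
CRITICAL_01_6 = 16.81  # table value quoted in the decision rule above

chi2, df = chi_square_independence(absence)
reject_h0 = chi2 > CRITICAL_01_6
print(f"chi2 = {chi2:.4f}, df = {df}, reject Ho: {reject_h0}")
```

With these counts the statistic comes out near 10.88, which is below 16.81, so under this sketch Ho would not be rejected at the 1% level.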
Ho: P1 = P1,0; P2 = P2,0; P3 = P3,0; ...; Pk = Pk,0, and the alternative hypothesis
takes the following form:
Ha: The population proportions are not all equal to the hypothesized values
The degrees of freedom are determined as ν = K - 1, where K is the number of
proportions (categories). All expected cell frequencies must be greater than or
equal to 5.
Example:
1. In the business credit institution industry, the accounts receivable for
companies are classified as being “current,” “moderately late,” “very late,” and
“uncollectible.” Industry figures show that the ratio of these four classes is
9: 3: 3: 1. ENDURANCE firm has 800 accounts receivable, with 439, 168, 133, and 60
falling in each class. Are these proportions in agreement with the industry ratio?
Let α = 0.05.
Solution
1. Ho: P1 = 9/16; P2 =3/16; P3 = 3/16; P4 = 1/16
Ha: One or more of the proportions are not equal to the proportions given in
the null hypothesis.
2. α = 0.05
ν = K - 1 = 4 - 1 = 3
χ²(α, ν) = χ²(0.05, 3) = 7.81
Reject Ho if sample χ² > 7.81
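3. Test statistic (sample χ²)
Class              Observed freq (fo)   Expected freq (fe = npi)   (fo - fe)²   (fo - fe)²/fe
Current                   439                    450                  121          0.269
Moderately late           168                    150                  324          2.160
Very late                 133                    150                  289          1.927
Uncollectible              60                     50                  100          2.000
Sample χ²                                                                          6.356
4. Do not reject Ho, because 6.356 < 7.81. The firm's proportions are consistent
with the industry ratio.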
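The arithmetic in this table can be reproduced with a few lines of Python. The sketch below is illustrative only; the helper name `chi_square_gof` is an assumption, while the ratio 9: 3: 3: 1 and the observed counts come from the example.

```python
def chi_square_gof(observed, proportions):
    """Chi-square goodness-of-fit statistic for hypothesized proportions."""
    n = sum(observed)
    expected = [n * p for p in proportions]  # fe = n * pi
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return chi2, expected

ratio = [9, 3, 3, 1]                      # industry ratio from the example
proportions = [r / sum(ratio) for r in ratio]
observed = [439, 168, 133, 60]            # the firm's 800 accounts

chi2, expected = chi_square_gof(observed, proportions)
print(expected)
print(round(chi2, 3))
```

The same helper works for any hypothesized set of proportions, provided they sum to 1 and every expected frequency is at least 5.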
2. ETHIO Plastic Factory sells its products in three primary colors: red,
blue, and yellow. The marketing manager feels that customers have no color
preference for the product. To test this hypothesis, the manager set up a test in
which 120 purchasers were given equal opportunity to buy the product in each of
the three colors. The results were that 60 bought red, 20 bought blue, and 40
bought yellow. Test the marketing manager’s null hypothesis, using α = 0.05.
Solution
1. Ho: People have no color preference with this product; P1 = P2 = P3 = 1/3
Ha: People have color preference with this product
2. α = 0.05
ν = K - 1 = 3 - 1 = 2
χ²(α, ν) = χ²(0.05, 2) = 5.99
Reject Ho if sample χ² is greater than 5.99.
3. Sample χ²
Class     Observed freq (fo)   Expected freq (fe = npi; pi = 1/3)   (fo - fe)²   (fo - fe)²/fe
Red              60                        40                          400          10.00
Blue             20                        40                          400          10.00
Yellow           40                        40                            0           0.00
Sample χ²                                                                           20.00
4. Reject Ho, because 20 > 5.99. This means that customers do have a color
preference. It appears that red is the most popular color and blue is the least
popular.
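For ν = 2 the chi-square upper-tail probability has the closed form exp(-x/2), so this example can be checked without a table. The following Python sketch is an illustration under that fact; the variable names are assumptions, and the counts are those of the color-preference example.

```python
import math

observed = {"red": 60, "blue": 20, "yellow": 40}
n = sum(observed.values())
fe = n / len(observed)                    # equal preference: 120 / 3 = 40

chi2 = sum((fo - fe) ** 2 / fe for fo in observed.values())

# For nu = 2 the chi-square upper-tail probability is exp(-x / 2),
# so the critical value and the p-value have closed forms.
alpha = 0.05
critical = -2 * math.log(alpha)
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, critical = {critical:.2f}, p = {p_value:.6f}")
```

The closed form applies only to two degrees of freedom; for other values of ν a table or a statistics library is needed.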
The chi-square test is widely used for a variety of analyses. One of the more
important uses of Chi-Square is the goodness-of-fit test. That is, it can be used to
decide whether a particular probability distribution, such as the binomial,
Poisson or normal, is the appropriate distribution. This is an important ability,
because as decision makers using statistics, we will need to choose a certain
probability distribution to represent the distribution of the data we happen to be
considering.
In tests of hypothesis (Chapter 5), we assumed that the population was normal
and tested hypotheses such as μ = μ0, p = p0, etc. But what if we want to check
the assumption of normality itself? The multinomial χ² goodness-of-fit test can be
applied.
The null hypothesis for a goodness-of-fit test is that the distribution of the
population from which a sample is taken is the one specified. The alternative
hypothesis is that the actual distribution is not the specified distribution.
Generally, a researcher specifies only the name of the distribution and uses the
sample data to estimate the particular parameters of the distribution. In this
situation one degree of freedom is lost for each parameter that has to be
estimated. However, if the researcher completely specifies the distribution,
including parameter values, then no additional degrees of freedom are lost.
Note on the color-preference example above: since the null hypothesis states that
there is no color preference, each of the three colors is expected to be preferred
by one third of the purchasers.
Example (Binomial)
1. Mrs. Tsion, a saleswoman for MOON Paper Company, has five accounts to
visit per day. It is suggested that sales by Mrs. Tsion may be described by the
binomial distribution, with the probability of selling to each account being 0.4.
Given the following frequency distribution of Mrs. Tsion’s number of sales per
day, can we conclude that the data do in fact follow the binomial distribution?
Use the 0.05 significance level.
Solution
1. Ho: The frequency distribution is Binomial with n = 5 and P = 0.4
Ha: The frequency distribution is not binomial with n = 5 and P = 0.4
2. α = 0.05
ν = K - 1 - m = 5 - 1 - 0 = 4
χ²(α, ν) = χ²(0.05, 4) = 9.49
Reject Ho if sample χ² is greater than 9.49
3. Sample χ²
4 & 5    0.0870    9    12.18    10.1124    0.8302
Sample χ² = 8.9607
4. Do not reject Ho. The data are well described by the binomial distribution
with n=5 and P=0.4.
Solution
1. Ho: The frequency distribution is best described by the binomial
distribution with n = 5, P = 0.4
Ha: The frequency distribution is not best described by the binomial distribution
with n = 5, P = 0.4
2. α = 0.05
ν = K - 1 - m = 5 - 1 - 0 = 4
χ²(α, ν) = χ²(0.05, 4) = 9.49
Reject Ho if sample χ² > 9.49
3. Sample χ²
No. of hits   No. of games with        Prob. with       Expected freq   (fo - fe)²   (fo - fe)²/fe
per game      that no. of hits (fo)    n = 5, P = 0.4   (fe = npi)
0                    12                   0.0778            7.78          17.8084       2.2890
1                    38                   0.2592           25.92         145.9264       5.6299
2                    27                   0.3456           34.56          57.1536       1.6538
3                    17                   0.2304           23.04          36.4816       1.5834
4 & 5                 6                   0.0870            8.70           7.2900       0.8379
Sample χ²                                                                              11.9940
4. Reject Ho, because 11.9940 > 9.49. The number of hits per game is not
binomially distributed with n = 5 and P = 0.4.
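The binomial goodness-of-fit calculation for the hits-per-game data can be sketched in Python as follows. This is an illustration, not the text's own procedure: it uses unrounded binomial probabilities, so the statistic comes out at about 12.00 rather than the table's 11.9940, which is based on probabilities rounded to four decimals. The helper name `binomial_pmf` is an assumption.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success prob p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_trials, p = 5, 0.4
observed = [12, 38, 27, 17, 6]             # classes 0, 1, 2, 3, and "4 & 5"
n_games = sum(observed)

probs = [binomial_pmf(k, n_trials, p) for k in range(n_trials + 1)]
probs = probs[:4] + [probs[4] + probs[5]]  # pool 4 and 5 hits into one class

expected = [n_games * q for q in probs]
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

m = 0                                      # no parameters estimated from the data
df = len(observed) - 1 - m
print(f"chi2 = {chi2:.4f}, df = {df}")
```

Pooling the last two classes keeps the smallest expected frequency above 5, in line with the rule stated for expected cell values.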
Example (Poisson)
1. It is hypothesized that the number of breakdowns per month of a
computer system at a major university follows a Poisson distribution with μ = 2.
The data below show the observed number of breakdowns per month during a
sample of 100 months. Use a 5% level of significance and test the null hypothesis.
Solution
Before solving the problem, we first have to estimate the arrival rate per
minute from the sample data; hence one degree of freedom is lost.
1. Ho: The arrival of customers at a bank is Poisson distributed with λ = 2.3
Ha: The arrival of customers at a bank is not Poisson distributed with λ = 2.3
2. α = 0.05
ν = K - 1 - m = 6 - 1 - 1 = 4
χ²(α, ν) = χ²(0.05, 4) = 9.488
Reject Ho if sample χ² > 9.488
3. Sample χ²
Number of    Observed     Prob. with   Expected freq   (fo - fe)²   (fo - fe)²/fe
arrivals     freq. (fo)   λ = 2.3      (fe = npi)
0                7         0.1003         8.4252         2.0312        0.2411
1               18         0.2306        19.3704         1.8778        0.0969
2               25         0.2652        22.2768         7.4158        0.3329
3               17         0.2033        17.0772         0.0060        0.0003
4               12         0.1169         9.8196         4.7541        0.4841
5 or more        5         0.0837         7.0308         4.1241        0.5866
Sample χ²                                                              1.7419
4. Do not reject Ho, because 1.7419 < 9.488. The arrivals are consistent with a
Poisson distribution with λ = 2.3.
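The Poisson table can be recomputed with a short Python sketch. It is illustrative only: the helper name `poisson_pmf` is an assumption, the probabilities are left unrounded, and the open-ended class "5 or more" is taken as the complement of the first five probabilities. With the column of (fo - fe)²/fe values above, the statistic is roughly 1.74.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 2.3
observed = [7, 18, 25, 17, 12, 5]          # 0, 1, 2, 3, 4, "5 or more" arrivals
n = sum(observed)

probs = [poisson_pmf(k, lam) for k in range(5)]
probs.append(1 - sum(probs))               # open-ended class: P(X >= 5)

expected = [n * q for q in probs]
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

m = 1                                      # lambda was estimated from the sample
df = len(observed) - 1 - m
print(f"chi2 = {chi2:.4f}, df = {df}")
```

Because λ is estimated from the data, the sketch subtracts one extra degree of freedom, matching ν = 6 - 1 - 1 = 4 in the decision rule.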
Example (Normal)
1. Suppose that Ato Paulos developed an overall attitude scale to determine
how his company’s employees feel toward their company. In theory the scores
can vary from 0 to 50. Ato Paulos pretests his measurement instrument on a
randomly selected group of 100 employees. He tallies the scores and summarizes
them into six categories as shown below. Are these pretest scores approximately
normally distributed with μ = 24.9 and σ = 7.194? Use α = 0.05.
Solution
1. Ho: The attitude scores are normally distributed with μ = 24.9 and σ = 7.194
Ha: The attitude scores are not normally distributed with μ = 24.9 and σ = 7.194
2. α = 0.05
ν = K - 1 - m = 6 - 1 - 0 = 5
χ²(α, ν) = χ²(0.05, 5) = 11.07
Reject Ho if sample χ² > 11.07
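3. Sample χ²
With z = (x - μ)/σ, the expected probability of each category can be obtained as
follows. The standard normal areas between the mean and the category boundaries
(negative for boundaries below the mean) are:
x = 15:  -0.41621
x = 20:  -0.25175
x = 25:  +0.00399
x = 30:  +0.26115
x = 35:  +0.41924
x = 40:  +0.48214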
The six category probabilities do not sum to 1.00. Even though observed
frequencies were obtained only for these six categories, a score less than 10 or
greater than 40 was also possible. Because half of the total probability lies on
each side of the mean, 24.9, we can use the sum of the expected probabilities on
each side to obtain the probability of the < 10 category: 0.5 – (0.06456 +
0.16446 + 0.25175) = 0.01923. Similarly, the probability of the > 40 category is
0.5 – (0.00399 + 0.25716 + 0.15809 + 0.06290) = 0.01786. Expected
frequencies can then be obtained by multiplying each expected probability by the
total frequency (100), as shown below.
As the < 10 and > 40 categories have expected frequencies of less than 5, each must
be combined with the adjacent category. As a result, the < 10 category becomes part
of the 10 – 15 category and the > 40 category becomes part of the 35 – 40 category.
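The pooling step can be sketched in Python. This is a minimal illustration that only merges classes at the two ends of the list, which is all this example needs; the function name `pool_small_ends` and the threshold argument are assumptions, and the eight probabilities are the ones derived in the text.

```python
def pool_small_ends(probs, n, minimum=5.0):
    """Merge the first/last class into its neighbor while its n*p is below minimum."""
    probs = list(probs)
    while len(probs) > 2 and n * probs[0] < minimum:
        probs[1] += probs[0]
        del probs[0]
    while len(probs) > 2 and n * probs[-1] < minimum:
        probs[-2] += probs[-1]
        del probs[-1]
    return probs

# Class probabilities: < 10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, > 40
probs = [0.01923, 0.06456, 0.16446, 0.25574,
         0.25716, 0.15809, 0.06290, 0.01786]

pooled = pool_small_ends(probs, n=100)
print([round(100 * q, 3) for q in pooled])
```

An interior class with a small expected frequency would need a different merging rule, which this sketch does not attempt.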
Score category   Probability   Expected freq (fe = npi)
10 – 15            0.08379             8.379
15 – 20            0.16446            16.446
20 – 25            0.25574            25.574
25 – 30            0.25716            25.716
30 – 35            0.15809            15.809
35 – 40            0.08076             8.076
Score       Observed     Probability   Expected freq   (fo - fe)²   (fo - fe)²/fe
category    freq. (fo)                 (fe = npi)
10 – 15        11          0.08379        8.379          6.8696        0.8199
15 – 20        14          0.16446       16.446          5.9829        0.3638
20 – 25        24          0.25574       25.574          2.4775        0.0969
25 – 30        28          0.25716       25.716          5.2167        0.2029
30 – 35        13          0.15809       15.809          7.8905        0.4991
35 – 40        10          0.08076        8.076          3.7018        0.4584
Sample χ²                                                              2.4409
4. Do not reject Ho, because 2.4409 < 11.07. The attitude scores are consistent with
a normal distribution with mean 24.9 and standard deviation 7.194.
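The normal goodness-of-fit calculation can also be sketched in Python without a z-table. This is an illustration under stated assumptions: the normal CDF is computed from `math.erf`, the two end classes are treated as open-ended (which is what pooling the < 10 and > 40 categories amounts to), and the helper name `normal_cdf` is hypothetical. Because z is not rounded to two decimals here, the expected frequencies differ slightly from the table and the statistic comes out near 2.5 rather than 2.4409.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std. dev. sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 24.9, 7.194
cuts = [15, 20, 25, 30, 35]                # interior boundaries after pooling
observed = [11, 14, 24, 28, 13, 10]
n = sum(observed)

# Cumulative probabilities at each boundary, with open-ended first and last classes.
cdf = [0.0] + [normal_cdf(c, mu, sigma) for c in cuts] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]

expected = [n * q for q in probs]
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
print([round(fe, 3) for fe in expected])
print(f"chi2 = {chi2:.4f}")
```

Either way the statistic is far below the critical value 11.07, so the conclusion of the test does not depend on the rounding.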