
UECM1223 Probability and Statistics II

Chapter 7-1

Example 7.1
To illustrate, we consider the tossing of a die. We hypothesize that the
die is honest, which is equivalent to testing the hypothesis, H0, that the
distribution of outcomes is the discrete uniform distribution

f(x) = 1/6,  x = 1, 2, ..., 6.

Suppose that the die is tossed 120 times and each outcome is recorded.
Theoretically, if the die is balanced, we would expect each face to occur
20 times. The results are given in Table 7.1.

Table 7.1: Observed and Expected Frequencies of 120 Tosses of a Die

Face        1    2    3    4    5    6
Observed   20   22   17   18   19   24
Expected   20   20   20   20   20   20
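
For reference, the goodness-of-fit statistic that measures the agreement between observed and expected frequencies is, in its standard form, the sum below. The symbols $O_i$ and $E_i$ (observed and expected frequency of the $i$-th cell) and $k$ (number of cells) are notation assumed here for illustration.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

When H0 is true and no parameters are estimated from the data, this quantity is approximately chi-squared distributed with v = k - 1 degrees of freedom.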


If the observed frequencies are close to the corresponding expected
frequencies, the $\chi^2$-value will be small, indicating a good fit. If the
observed frequencies differ considerably from the expected
frequencies, the $\chi^2$-value will be large, and the fit is poor. A good fit
leads to the acceptance of H0, whereas a poor fit leads to its rejection. The
critical region will, therefore, fall in the right tail of the chi-squared
distribution. For a level of significance equal to α, we find the critical
value $\chi^2_\alpha$, and then $\chi^2 > \chi^2_\alpha$ constitutes the critical region.

H0: the distribution of outcomes is the discrete uniform distribution.

Face      O     E    O - E   (O - E)^2   (O - E)^2 / E
1        20    20      0         0           0.00
2        22    20      2         4           0.20
3        17    20     -3         9           0.45
4        18    20     -2         4           0.20
5        19    20     -1         1           0.05
6        24    20      4        16           0.80
Total   120   120      0        34           1.70
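
The column-by-column calculation for the die data can be sketched in a few lines of Python. This is only an illustration: the significance level α = 0.05 is an assumption (the example does not state one), and 11.070 is the tabulated upper 5% point of the chi-squared distribution with 6 - 1 = 5 degrees of freedom.

```python
# Chi-square goodness-of-fit statistic for the die data in Table 7.1.
observed = [20, 22, 17, 18, 19, 24]    # observed counts for faces 1..6
n = sum(observed)                      # 120 tosses
expected = [n / 6] * 6                 # uniform model: 20 per face

# Sum of (O - E)^2 / E over the six faces.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# ASSUMPTION: alpha = 0.05; 11.070 is the table value for 5 degrees of freedom.
critical_value = 11.070
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, reject H0: {reject_h0}")
```

With these data the statistic falls well below the assumed critical value, so the hypothesis of an honest die would not be rejected at that level.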


Example 7.2
You are investigating insurance fraud that manifests itself through
claimants who file claims with respect to auto accidents with which
they were not involved. Your evidence consists of a distribution of the
observed number of claimants per accident and a standard distribution
for accidents on which fraud is known to be absent. The two
distributions are summarized below:

Number of Claimants   Standard      Observed Number
per Accident          Probability   of Accidents
1                     0.25            235
2                     0.35            335
3                     0.24            250
4                     0.11            111
5                     0.04             47
6+                    0.01             22
Total                 1.00           1000
Determine the result of a chi-square test of the null hypothesis that
there is no fraud in the observed accidents.
Solution
H0: there is no fraud in the observed accidents.
No.         Standard                                Observed -
Claimants   Probability   Observed   Hypothesized   Expected     Chi-Square
1           0.25            235         250           -15          0.900
2           0.35            335         350           -15          0.643
3           0.24            250         240            10          0.417
4           0.11            111         110             1          0.009
5           0.04             47          40             7          1.225
6+          0.01             22          10            12         14.400
Total       1.00           1000        1000             0         17.594
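
As a rough check on the fraud example, the following sketch recomputes the hypothesized counts and the statistic, and then evaluates the upper-tail probability. The helper `chi2_sf` is a hypothetical name introduced here; it uses the standard recurrence for the regularized upper incomplete gamma function and is written only for integer degrees of freedom. Taking 6 - 1 = 5 degrees of freedom assumes that no parameters were estimated from the data.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared variable, integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)               # starting value for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))    # starting value for df = 1
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1),
    # applied until a reaches df / 2.
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

probs = [0.25, 0.35, 0.24, 0.11, 0.04, 0.01]     # standard (no-fraud) distribution
observed = [235, 335, 250, 111, 47, 22]          # observed accidents
n = sum(observed)

expected = [n * p for p in probs]                # hypothesized counts
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(terms)

df = len(observed) - 1                           # ASSUMPTION: no estimated parameters
p_value = chi2_sf(chi_square, df)

print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.4f}")
```

Most of the statistic comes from the 6+ category, where 22 accidents were observed against 10 hypothesized.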


Example 7.3
You are given the following random sample of 30 auto claims:
54 140 230 560 600 1,100 1,500 1,800 1,920 2,000
2,450 2,500 2,580 2,910 3,800 3,800 3,810 3,870 4,000 4,800
7,200 7,390 11,750 12,000 15,000 25,000 30,000 32,300 35,000 55,000
You test the hypothesis that auto claims follow a continuous distribution F(x)
with the following percentiles:
x      310    500    2,498   4,876   7,498   12,930
F(x)   0.16   0.27   0.55    0.81    0.90    0.95
You group the data using the largest number of groups such that the expected
number of claims in each group is at least 5. Calculate the chi-square goodness-
of-fit statistic.

Solution
Note that the expected frequency in each group must be at least 5.
For the given intervals, based on the model probabilities, the expected counts
are 4.8, 3.3, 8.4, 7.8, 2.7, 1.5, and 1.5 (the last is for claims above 12,930).
To bring every group up to at least 5, combine the first two intervals and the
last three.
The table is
Interval          Observed   Prob   Expected   Chi-Square
0 to 500              3      0.27     8.1        3.211
500 to 2,498          8      0.28     8.4        0.019
2,498 to 4,876        9      0.26     7.8        0.185
4,876 and above      10      0.19     5.7        3.244
Total                30      1.00    30.0        6.659
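
The grouping step can be sketched as a left-to-right merge of adjacent cells: a group is closed as soon as its expected count reaches 5, and any leftover cells are added to the last group. This is one simple way to do it, not necessarily the procedure intended in the notes; the variable names are illustrative.

```python
from bisect import bisect_left

claims = [54, 140, 230, 560, 600, 1100, 1500, 1800, 1920, 2000,
          2450, 2500, 2580, 2910, 3800, 3800, 3810, 3870, 4000, 4800,
          7200, 7390, 11750, 12000, 15000, 25000, 30000, 32300, 35000, 55000]
n = len(claims)

# Percentiles of the hypothesized F(x); the last cell is open-ended (F = 1).
bounds = [310, 500, 2498, 4876, 7498, 12930]
cdf = [0.16, 0.27, 0.55, 0.81, 0.90, 0.95]

# Probability and expected count of each of the 7 cells (a, b].
cell_probs = [hi - lo for lo, hi in zip([0.0] + cdf, cdf + [1.0])]
cell_expected = [n * p for p in cell_probs]

# Observed count per cell; a value equal to a boundary goes to the cell ending there.
cell_observed = [0] * len(cell_probs)
for x in claims:
    cell_observed[bisect_left(bounds, x)] += 1

# Merge adjacent cells until each group has an expected count of at least 5.
groups = []                      # each entry: [observed, expected]
obs_acc, exp_acc = 0, 0.0
for o, e in zip(cell_observed, cell_expected):
    obs_acc += o
    exp_acc += e
    if exp_acc >= 5:
        groups.append([obs_acc, exp_acc])
        obs_acc, exp_acc = 0, 0.0
if exp_acc > 0 and groups:       # leftover cells join the last group
    groups[-1][0] += obs_acc
    groups[-1][1] += exp_acc

chi_square = sum((o - e) ** 2 / e for o, e in groups)
print(f"{len(groups)} groups, chi-square = {chi_square:.2f}")
```

For these data the merge produces four groups with observed counts 3, 8, 9, and 10.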


Example 7.4
The 400 undergraduate students in a random sample at the University
of Iowa were classified according to the college in which the students
were enrolled and according to their gender. The results are recorded
in Table 7.2, called a k x h contingency table, where, in this case, k = 2 and
h = 5. Incidentally, these data do reflect the composition of the
undergraduate colleges at Iowa, but they were modified a little to make
the computations easier in this example. Test the null hypothesis
$H_0: p_{ij} = p_{i\cdot}\,p_{\cdot j}$, i = 1, 2 and j = 1, 2, 3, 4, 5, that the college
in which a student enrols is independent of the gender of the student, at the
α = 0.01 significance level.

Table 7.2: Undergraduates at the University of Iowa

                              College
Gender   Business   Engineering   Liberal Arts   Nursing   Pharmacy   Totals
Male        21          16            145            2         6        190
Female      14           4            175           13         4        210
Totals      35          20            320           15        10        400
Solution
1. H0: the college in which a student enrols is independent of the gender
of the student.
2. H1: the college in which a student enrols is dependent on the gender
of the student.
3. α = 0.01
4. Critical region: $\chi^2 > \chi^2_{0.01} = 13.277$, with degrees of freedom
v = (k-1)(h-1) = 4.
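5. Computation:
Under H0, estimates of the probabilities are

$\hat{p}_{1\cdot} = \frac{190}{400} = 0.475$ and $\hat{p}_{2\cdot} = \frac{210}{400} = 0.525$

and

$\hat{p}_{\cdot 1} = \frac{35}{400} = 0.0875$, $\hat{p}_{\cdot 2} = \frac{20}{400} = 0.05$, $\hat{p}_{\cdot 3} = \frac{320}{400} = 0.80$,
$\hat{p}_{\cdot 4} = \frac{15}{400} = 0.0375$, $\hat{p}_{\cdot 5} = \frac{10}{400} = 0.025$.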


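The expected numbers are computed as follows:

$400\,\hat{p}_{i\cdot}\,\hat{p}_{\cdot j}$; for example, $400 \left(\frac{190}{400}\right)\left(\frac{35}{400}\right) = 16.625$ for male Business students.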

Table 7.2: Undergraduates at the University of Iowa
(expected frequencies in parentheses)

                              College
Gender   Business   Engineering   Liberal Arts   Nursing   Pharmacy   Totals
Male        21          16            145            2         6        190
         (16.625)     (9.5)          (152)        (7.125)    (4.75)
Female      14           4            175           13         4        210
         (18.375)    (10.5)          (168)        (7.875)    (5.25)
Totals      35          20            320           15        10        400

$$\chi^2 = \frac{(21 - 16.625)^2}{16.625} + \frac{(14 - 18.375)^2}{18.375} + \cdots + \frac{(4 - 5.25)^2}{5.25} = 18.93$$

Since $\chi^2 = 18.93 > \chi^2_{0.01} = 13.277$, we reject H0 at the α = 0.01 significance
level.
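
The whole independence test for the Iowa data can be reproduced with a short sketch. The variable names are illustrative, and the p-value line relies on a closed form for the chi-squared upper tail that holds only for 4 degrees of freedom.

```python
import math

# Observed counts: rows are gender, columns are the five colleges.
table = [
    [21, 16, 145, 2, 6],     # Male
    [14, 4, 175, 13, 4],     # Female
]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count under independence: (row total)(column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(table) - 1) * (len(table[0]) - 1)

# Valid ONLY for df = 4: P(X > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.4f}")
```

A p-value below 0.01 agrees with rejecting H0 at the α = 0.01 level.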


7.3 Tests of Homogeneity


In a test of homogeneity, we test whether two (or more) populations are
homogeneous (similar) with respect to the distribution of a certain
characteristic.

Suppose, for example, that we decide in advance to select 200
Democrats, 150 Republicans, and 150 Independents from the voters of
the state of North Carolina and record whether they are for a proposed
abortion law, against it, or undecided.

The observed responses are given in Table 7.3.

Table 7.3: Observed Frequencies

                       Political Affiliation
Abortion Law   Democrat   Republican   Independent   Total
For               82          70            62         214
Against           93          62            67         222
Undecided         25          18            21          64
Total            200         150           150         500

Now, rather than test for independence, we test the hypothesis that the
population proportions within each row are the same. That is, we test
the hypothesis that the proportions of Democrats, Republicans, and
Independents favouring the abortion law are the same; the proportions
of each political affiliation against the law are the same; and the
proportions of each political affiliation that are undecided are the same.
We are basically interested in determining whether the three categories
of voters are homogeneous with respect to their opinions concerning
the proposed abortion law. Such a test is called a test for homogeneity.
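
In symbols, the homogeneity hypothesis can be written as below. The notation is assumed here for illustration: $p_{ij}$ is the proportion of voters in affiliation $j$ (Democrat, Republican, Independent) holding opinion $i$ (for, against, undecided), and $e_{ij}$ is the expected frequency of cell $(i, j)$.

```latex
H_0:\; p_{i1} = p_{i2} = p_{i3}, \qquad i = 1, 2, 3
\qquad\text{with}\qquad
e_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}
```

The expected frequencies and the statistic are computed exactly as in a test for independence; the difference is that the column totals are fixed in advance by the sampling design.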

Example 7.5
Referring to the data of Table 7.3, test the hypothesis that opinions
concerning the proposed abortion law are the same within each political
affiliation. Use a 0.05 level of significance.
Solution:


1. H0: For each opinion, the proportions of Democrats, Republicans,
and Independents are the same.
2. H1: For at least one opinion, the proportions of Democrats,
Republicans, and Independents are not the same.
3. α = 0.05.
4. Critical region: $\chi^2 > 9.488$ with v = (3-1)(3-1) = 4 degrees of freedom.
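5. Computations: Using the expected cell frequency formula, we need
to compute 4 cell frequencies. All other frequencies are found by
subtraction. The observed and expected cell frequencies are displayed
in the following table, with expected frequencies in parentheses.

                       Political Affiliation
Abortion Law   Democrat     Republican   Independent   Total
For            82 (85.6)    70 (64.2)    62 (64.2)      214
Against        93 (88.8)    62 (66.6)    67 (66.6)      222
Undecided      25 (25.6)    18 (19.2)    21 (19.2)       64
Total          200          150          150            500

$$\chi^2 = \frac{(82 - 85.6)^2}{85.6} + \frac{(70 - 64.2)^2}{64.2} + \cdots + \frac{(21 - 19.2)^2}{19.2} = 1.53$$

6. Decision: Since $\chi^2 = 1.53$ is less than the critical value 9.488 (α = 0.05,
4 degrees of freedom), do not reject H0. There is insufficient evidence to
conclude that opinions on the proposed law differ among the three affiliations.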
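
The remark that only 4 cell frequencies need to be computed, with the rest found by subtraction, can be illustrated with the sketch below. It applies the formula to the upper-left 2 x 2 block and fills the last column and last row from the marginal totals. The critical value 9.488 is the tabulated upper 5% point for 4 degrees of freedom; the variable names are illustrative.

```python
# Observed counts from Table 7.3: rows are opinions, columns are affiliations.
observed = [
    [82, 70, 62],    # For
    [93, 62, 67],    # Against
    [25, 18, 21],    # Undecided
]
rows, cols = len(observed), len(observed[0])
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[0.0] * cols for _ in range(rows)]

# Step 1: formula for the (rows - 1) x (cols - 1) = 4 upper-left cells.
for i in range(rows - 1):
    for j in range(cols - 1):
        expected[i][j] = row_totals[i] * col_totals[j] / n

# Step 2: last column by subtraction from the row totals.
for i in range(rows - 1):
    expected[i][cols - 1] = row_totals[i] - sum(expected[i][:cols - 1])

# Step 3: last row by subtraction from the column totals.
for j in range(cols):
    expected[rows - 1][j] = col_totals[j] - sum(expected[i][j] for i in range(rows - 1))

chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(rows)
    for j in range(cols)
)

critical_value = 9.488           # table value: alpha = 0.05, 4 degrees of freedom
reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.2f}, reject H0: {reject_h0}")
```

Filling cells by subtraction gives the same expected frequencies as applying the formula to every cell, because the expected counts must reproduce the marginal totals.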
