
Business Statistics

10 Chi-Square distribution

Test of goodness of fit

Dr Akhter Raza
Test of Goodness of Fit

Testing whether an observed frequency distribution
follows a specific probability model

2
Test of Goodness of Fit
A non-parametric test used to find out whether
the observed values of a given phenomenon are
significantly different from the expected values. It tests
how well a theoretical distribution (such as the normal,
binomial, or Poisson) fits the empirical distribution.

3
Testing normality
A bank manager has developed a new system to
reduce the time customers spend waiting to be
served by tellers during peak business hours.
Typical waiting times during peak business hours
under the current system are roughly 9 to 10
minutes. The bank manager hopes that the new
system will lower typical waiting times to less
than six minutes and wishes to evaluate the new
system.
7
Testing normality
When the new system is operating consistently
over time, the bank manager decides to select a
sample of 100 customers that need teller service
during peak business hours. Specifically, for each
of 100 peak business hours, the first customer
that starts waiting for teller service at or after a
randomly selected time during the hour will be
chosen.

8
Bank customers waiting time

9
Testing normality
Use these data to carry out a chi-square
goodness-of-fit test to determine whether the
population of all waiting times is normally
distributed. Use 0.05 as the level of significance.

10
steps
1. Distribute the data into the intervals
(−∞, 𝜇 − 2𝜎], (𝜇 − 2𝜎, 𝜇 − 𝜎], (𝜇 − 𝜎, 𝜇],
(𝜇, 𝜇 + 𝜎], (𝜇 + 𝜎, 𝜇 + 2𝜎], (𝜇 + 2𝜎, ∞)
2. Produce an observed frequency distribution
and construct a histogram
3. Estimate the two unknown parameters
𝜇 and 𝜎 using the observed frequency table
4. Compute the interval probabilities using the normal CDF
5. Find expected frequencies 𝑓𝑒 = 𝑁 × 𝑃(interval)
11
steps
6. Find Chi-Sq using the formula χ² = Σ (𝑓𝑜 − 𝑓𝑒)² / 𝑓𝑒,
where the sum runs over all intervals
7. Find the p-value or critical region
8. Make a decision and state the conclusion

12
R-code: making data vector
bnkWtngTime<-c(1.6, 6.2, 3.2, 5.6, 7.9, 6.1, 7.2, 6.6, 5.4, 6.5, 4.4,
1.1, 3.8, 7.3, 5.6, 4.9, 2.3, 4.5, 7.2, 10.7, 4.1, 5.1, 5.4, 8.7, 6.7, 2.9,
7.5, 6.7, 3.9, .8, 4.7, 8.1, 9.1, 7.0, 3.5, 4.6, 2.5, 3.6, 4.3, 7.7, 5.3,
6.3, 6.5, 8.3, 2.7, 2.2, 4.0, 4.5, 4.3, 6.4, 6.1, 3.7, 5.8, 1.4, 4.5, 3.8,
8.6, 6.3, .4, 8.6, 7.8, 1.8, 5.1, 4.2, 6.8, 10.2, 2.0, 5.2, 3.7, 5.5, 5.8,
9.8, 2.8, 8.0, 8.4, 4.0, 3.4, 2.9, 11.6, 9.5, 6.3, 5.7, 9.3, 10.9, 4.3,
1.3, 4.4, 2.4, 7.4, 4.7, 3.1, 4.8, 5.2, 9.2, 1.8, 3.9, 5.8, 9.9, 7.4, 5.0)

13
R-code: estimate mean and standard deviation
length(bnkWtngTime)
head(bnkWtngTime)
mu=mean(bnkWtngTime)
sd=sd(bnkWtngTime)
c(mu, sd)

14
R-code: making intervals
bin=seq(mu-3*sd,mu+3*sd,sd)

15
R-code: making intervals
table(cut(bnkWtngTime,breaks = bin))

16
R-code: making frequency table
obs=hist(bnkWtngTime,breaks=bin,plot=FALSE)$counts

17
R-code: Finding probabilities
prob=c(pnorm(-2),pnorm(-1:2)-pnorm(-2:1),1-pnorm(2))
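
The line above relies on R's vectorised pnorm(). As a small sketch of what it does (the names lower, upper and middle are illustrative, not from the slides), the four middle intervals can be written out explicitly and the six probabilities checked to sum to 1:

```r
# Standardised interval limits for the four middle intervals
lower  <- -2:1                        # -2, -1, 0, 1
upper  <- -1:2                        # -1,  0, 1, 2
middle <- pnorm(upper) - pnorm(lower) # P(lower < Z <= upper)

# Add the two tails: below -2 and above +2
prob <- c(pnorm(-2), middle, 1 - pnorm(2))

length(prob)   # six intervals
sum(prob)      # should be 1 up to rounding
```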

18
R-code: Finding expected freq
exp=prob*100
exp

19
R-code: final table
ans=cbind(prob,exp,obs)
ans

20
R-code: calculating chi-square
chSq=sum((obs-exp)^2/exp)

> chSq
[1] 1.987862

21
R-code: finding p-value
pval=1-pchisq(chSq,6-2-1)

> pval=1-pchisq(chSq,6-2-1)
> pval
[1] 0.5749299
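
Step 7 also allows a critical region instead of a p-value. A minimal sketch of that route, assuming the chSq value printed on the earlier slide and the same 6-2-1 degrees of freedom (alpha, df and crit are illustrative names):

```r
chSq  <- 1.987862        # statistic reported on the slide above
alpha <- 0.05            # level of significance from the problem
df    <- 6 - 2 - 1       # 6 intervals, 2 estimated parameters

# Upper-tail critical value of the chi-square distribution
crit <- qchisq(1 - alpha, df)
crit

# TRUE would mean the statistic falls in the critical region
chSq > crit
```

Both routes should lead to the same decision, since the p-value is below alpha exactly when the statistic exceeds the critical value.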

22
Decision and Conclusion
As the p-value is larger than the 0.05 level of significance, we
fail to reject H0: the data give no evidence that the waiting time
of bank customers departs from a normal distribution

23
Assignment using R
1. A die was thrown 60 times and the following
frequency distribution was observed. Test at the 5%
level of significance whether the die is fair.

Face 1 2 3 4 5 6
Frequency 15 6 4 7 11 17
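
One possible starting point, sketched with R's built-in chisq.test() (obsDie and resDie are illustrative names). When no p argument is given, chisq.test() assumes equal probabilities for all categories, which matches the fair-die hypothesis:

```r
# Observed counts for faces 1..6, taken from the table above
obsDie <- c(15, 6, 4, 7, 11, 17)

# Goodness-of-fit test against equal probabilities (fair die)
resDie <- chisq.test(obsDie)

resDie$expected    # expected count per face
resDie$statistic   # chi-square statistic
resDie$parameter   # degrees of freedom, 6 - 1
resDie$p.value     # compare with 0.05
```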

24
2. A company routinely purchases a certain type of
bolt. The purchasing department of the company
has been instructed to spread the purchase orders
among suppliers A, B, C, and D in the ratio of
2:2:1:1. As a check, 24 purchase orders are
randomly selected, and suppliers A, B, C, D have
received 13, 4, 4, 3 orders respectively. Does this
indicate that the instructions are being followed at
the 5% level of significance?
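
A sketch of how the 2:2:1:1 ratio could be passed to chisq.test(), assuming its p and rescale.p arguments are used to turn the ratio into probabilities (obsOrd and resOrd are illustrative names). R may warn that the chi-square approximation is rough because some expected counts are small:

```r
# Orders received by suppliers A, B, C, D
obsOrd <- c(A = 13, B = 4, C = 4, D = 3)

# rescale.p = TRUE converts the ratio 2:2:1:1 into probabilities
resOrd <- chisq.test(obsOrd, p = c(2, 2, 1, 1), rescale.p = TRUE)

resOrd$expected    # expected orders per supplier
resOrd$statistic   # chi-square statistic
resOrd$p.value     # compare with 0.05
```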

25
3.

26
4. Three cards are drawn from an ordinary deck of
playing cards, with replacement, and the number Y of
spades is recorded. After repeating the experiment 64
times, the following outcomes were recorded:

Y 0 1 2 3
f 21 31 12 0

Test the hypothesis at the 0.01 level of significance that the
recorded data may be fitted by the binomial distribution
b(y; 3, 0.25), y = 0, 1, 2, 3.
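
A minimal sketch of the expected frequencies for this problem, following the same obs/exp pattern as the waiting-time example (obsY, pY, expY, chSqY and pvalY are illustrative names). No parameter is estimated here, so the sketch uses 4 - 1 degrees of freedom. Textbooks often pool cells with small expected counts, which is not done here:

```r
obsY <- c(21, 31, 12, 0)                     # observed counts, Y = 0..3
pY   <- dbinom(0:3, size = 3, prob = 0.25)   # b(y; 3, 0.25)
expY <- 64 * pY                              # expected counts

cbind(y = 0:3, obsY, expY)

chSqY <- sum((obsY - expY)^2 / expY)         # chi-square statistic
pvalY <- 1 - pchisq(chSqY, 4 - 1)            # compare with 0.01
c(chSqY, pvalY)
```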

27
5. Three marbles are selected from an urn containing 5
red marbles and 3 green marbles. After recording the
number X of red marbles, the marbles are replaced in the
urn and the experiment repeated 112 times. The results
obtained are as follows:

x 0 1 2 3
f 1 31 55 25

Test the hypothesis at the 0.05 level of significance that
the recorded data may be fitted by the hypergeometric
distribution h(x; 8, 3, 5), x = 0, 1, 2, 3.
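
R's dhyper() uses a different argument order from h(x; 8, 3, 5), so a short sketch of the mapping may help (obsX, pX and expX are illustrative names). In dhyper(x, m, n, k), m is the number of red marbles, n the number of green marbles, and k the number drawn:

```r
obsX <- c(1, 31, 55, 25)                  # observed counts, x = 0..3

# 5 red (m), 3 green (n), 3 marbles drawn (k)
pX   <- dhyper(0:3, m = 5, n = 3, k = 3)
expX <- 112 * pX                          # expected counts

cbind(x = 0:3, obsX, expX)
```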
28
6. A coin is thrown until a head occurs and the number X
of tosses recorded. After repeating the experiment 256
times, we obtained the following results:

x 1 2 3 4 5 6 7 8
f 136 60 34 12 9 1 3 1

Test the hypothesis at the 0.05 level of significance that
the observed distribution of X may be fitted by the
geometric distribution g(x; 1/2), x = 1, 2, 3, …, 8
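
One detail worth sketching: R's dgeom() counts the failures before the first success, not the number of tosses, so x tosses correspond to x - 1 failures (obsT, pT and expT are illustrative names):

```r
x    <- 1:8
obsT <- c(136, 60, 34, 12, 9, 1, 3, 1)   # observed counts, x = 1..8

# P(X = x) = (1/2)^x, written with dgeom's failure count x - 1
pT   <- dgeom(x - 1, prob = 0.5)
expT <- 256 * pT                         # expected counts

cbind(x, obsT, expT)
sum(pT)   # slightly below 1: the tail x > 8 is not included
```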

29
Questions?
