Chapter 7
GOODNESS OF FIT TESTS AND APPLICATIONS
7.1 INTRODUCTION
7.2 CHI-SQUARE TESTS FOR COUNT DATA
7.3 Goodness-of-fit tests to identify the Probability Distribution
7.4. SOME IMPORTANT APPLICATIONS: PARAMETRIC ANALYSIS
7.5 Chapter Summary
7.6 Computer Examples
Projects for Chapter 7
Exercise 7.2
7.2.1
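$p_1 = p_2 = 0.5$
$E_1 = E_2 = 200(0.5) = 100$
\[ Q^2 = \frac{(104 - 100)^2}{100} + \frac{(96 - 100)^2}{100} = \frac{16 + 16}{100} = 0.32 \]
Let $\alpha = 0.05$. With $k - 1 = 1$ degree of freedom, the rejection region is $Q^2 > \chi^2_{1,0.05} = 3.841$, and $0.32 < 3.841$.
At the 5% level of significance, there is not enough evidence to show that the coin is unfair.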
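The same statistic can be checked in R. This is a minimal sketch, assuming the observed counts 104 and 96 from the solution above; the labels `face1` and `face2` are hypothetical, since the solution does not say which count is heads.

```r
# Observed counts from Exercise 7.2.1 (labels are hypothetical)
observed <- c(face1 = 104, face2 = 96)

# Chi-square goodness-of-fit test against p1 = p2 = 0.5
fit <- chisq.test(observed, p = c(0.5, 0.5))

Q2 <- unname(fit$statistic)     # Pearson statistic
crit <- qchisq(0.95, df = 1)    # critical value at alpha = 0.05
reject <- Q2 > crit             # TRUE would mean rejecting H0

print(c(Q2 = Q2, critical = crit))
print(reject)
```

`chisq.test` also reports a p-value in `fit$p.value`, which leads to the same decision as the critical-value comparison.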
7.2.2
\[ E_{11} = \frac{(60)(90)}{200} = 27, \quad E_{12} = \frac{(60)(50)}{200} = 15, \quad E_{13} = \frac{(60)(60)}{200} = 18 \]
\[ E_{21} = \frac{(100)(90)}{200} = 45, \quad E_{22} = \frac{(100)(50)}{200} = 25, \quad E_{23} = \frac{(100)(60)}{200} = 30 \]
\[ E_{31} = \frac{(40)(90)}{200} = 18, \quad E_{32} = \frac{(40)(50)}{200} = 10, \quad E_{33} = \frac{(40)(60)}{200} = 12 \]
\[ Q^2 = \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 43.86 \]
At the 5% level of significance, there is enough evidence to show that the opinion on collective
bargaining is dependent on employee classification.
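The expected counts can be reproduced from the marginal totals alone. This is a sketch, assuming the row totals (60, 100, 40) and column totals (90, 50, 60) that appear in the solution. The observed table is not reproduced in the solution, so the helper `pearson_q2` (a hypothetical name) is defined but not applied.

```r
# Marginal totals taken from the solution to Exercise 7.2.2
row_totals <- c(60, 100, 40)
col_totals <- c(90, 50, 60)
n <- sum(row_totals)

# Expected count under independence: E_ij = (row_i total)(col_j total) / n
expected <- outer(row_totals, col_totals) / n
print(expected)

# Pearson statistic for any observed table of the same shape
pearson_q2 <- function(observed, expected) {
  sum((observed - expected)^2 / expected)
}
```

With the observed 3 x 3 table in a matrix `obs`, `pearson_q2(obs, expected)` would give the statistic, and `chisq.test(obs)` would run the full test of independence.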
7.2.3
\[ E_{11} = \frac{(205)(120)}{300} = 82, \quad E_{12} = \frac{(205)(52)}{300} = 35.533, \quad E_{13} = \frac{(205)(39)}{300} = 26.65, \quad E_{14} = \frac{(205)(89)}{300} = 60.817 \]
\[ E_{21} = \frac{(95)(120)}{300} = 38, \quad E_{22} = \frac{(95)(52)}{300} = 16.467, \quad E_{23} = \frac{(95)(39)}{300} = 12.35, \quad E_{24} = \frac{(95)(89)}{300} = 28.183 \]
\[ Q^2 = \sum_{i=1}^{2} \sum_{j=1}^{4} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 4.5979 \]
At the 1% level of significance, there is not enough evidence to show that the major of undergraduate
students is dependent on gender.
7.2.4
1 2 3 4
Obs 427.5 237.5 19 266
Exp 380 190 47.5 332.5
Here $k = 4$, d.f. $= 3$, $\alpha = 0.05$
RR: $Q^2 > \chi^2_{3,0.05} = 7.815$
\[ Q^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = 48.2125 \]
At the 5% level of significance, there is enough evidence to show that at least one of the probabilities is
different from the hypothesized value.
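The statistic and the rejection region can be recomputed directly from the table. This is a minimal sketch, using the observed and expected values listed in the solution to Exercise 7.2.4.

```r
# Observed and expected values from the table in Exercise 7.2.4
obs  <- c(427.5, 237.5, 19, 266)
expd <- c(380, 190, 47.5, 332.5)

# Pearson statistic: sum of (O - E)^2 / E over the k cells
Q2 <- sum((obs - expd)^2 / expd)

# Critical value for alpha = 0.05 with k - 1 degrees of freedom
crit <- qchisq(1 - 0.05, df = length(obs) - 1)

print(c(Q2 = Q2, critical = crit))
```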
7.2.5
(a)
H0: p1 = .2 , p2 = .2 , p3 = .3 , p4 = .2 , p5 = .1
Ha: At least one of the probabilities is different from the hypothesized value.
1 2 3 4 5
Obs 22 21 29 17 11
Exp 20 20 30 20 10
Here $k = 5$, d.f. $= 4$, $\alpha = 0.05$
RR: $Q^2 > \chi^2_{4,0.05} = 9.488$
\[ Q^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = 0.833 \]
At the 5% level of significance, there is not enough evidence to show that at least one of the probabilities
is different from the hypothesized value.
(b)
\[ E_{11} = E_{21} = \frac{(22)(50)}{100} = 11, \quad E_{12} = E_{22} = \frac{(21)(50)}{100} = 10.5 \]
\[ E_{13} = E_{23} = \frac{(29)(50)}{100} = 14.5, \quad E_{14} = E_{24} = \frac{(17)(50)}{100} = 8.5 \]
\[ E_{15} = E_{25} = \frac{(11)(50)}{100} = 5.5 \]
\[ Q^2 = \sum_{i=1}^{2} \sum_{j=1}^{5} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 3.8428 \]
At the 5% level of significance, there is not enough evidence to show that the choice of footwear by
undergraduate students is dependent on their gender.
7.2.6
\[ p_1 = \frac{125}{216}, \quad p_2 = \frac{75}{216}, \quad p_3 = \frac{15}{216}, \quad p_4 = \frac{1}{216} \]
\[ E_1 = \frac{125}{216}(150) = 86.81, \quad E_2 = \frac{75}{216}(150) = 52.08 \]
\[ E_3 = \frac{15}{216}(150) = 10.42, \quad E_4 = \frac{1}{216}(150) = 0.69 \]
Here $k = 4$, d.f. $= 3$, $\alpha = 0.05$
RR: $Q^2 > \chi^2_{3,0.05} = 7.815$
\[ Q^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = 54.15 \]
At the 5% level of significance, there is enough evidence to show that at least one of the three dice is
unfair.
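The hypothesized probabilities match a binomial model. This is a sketch under the assumption (not stated in the solution) that the four categories count 0, 1, 2 or 3 occurrences of one particular face among three fair dice, so that the count is Binomial(3, 1/6).

```r
# Assumed model: number of times a given face shows on three fair dice
p <- dbinom(0:3, size = 3, prob = 1/6)

# Expected counts for the 150 trials used in the solution
expected <- 150 * p

print(p * 216)             # numerators over 216
print(round(expected, 2))
```

Note that the last expected count is below 1, so the chi-square approximation may be rough for that cell.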
Exercise 7.3
7.3.1
Here $k = 5$, d.f. $= 4$, $\alpha = 0.01$
RR: $Q^2 > \chi^2_{4,0.01} = 13.277$
\[ Q^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = 9352.66 \]
At the 1% level of significance, there is enough evidence to show that the speed of vehicles is not
normally distributed.
7.3.2
Here $k = 5$, d.f. $= 4$, $\alpha = 0.05$
RR: $Q^2 > \chi^2_{4,0.05} = 9.488$
\[ Q^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} \approx 106 \]
At the 5% level of significance, there is enough evidence to show that the daily mean temperatures are not
normally distributed.
7.3.3
Here $k = 5$, d.f. $= 4$, $\alpha = 0.05$
RR: $Q^2 > \chi^2_{4,0.05} = 9.488$
\[ Q^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = 1.493 \]
At the 5% level of significance, there is not enough evidence to show that the lifetime of components
does not follow an exponential distribution.
7.3.4
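\[ \hat{\alpha}_{MLE} \approx \frac{0.5}{\log(\bar{x}) - \frac{1}{n}\sum_{i=1}^{n} \log(x_i)}, \qquad \hat{\beta}_{MLE} = \frac{\bar{x}}{\hat{\alpha}_{MLE}} \]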
R Output:
> x=c(27, 25, 24, 24, 22, 20, 21, 22, 21, 25, 24,
+ 26, 25, 24, 23, 22, 20, 21, 19, 21, 25, 24,
+ 26, 25, 22, 23, 22, 22, 21, 19, 21, 23, 21,
+ 26, 24, 22, 23, 22, 22, 20, 19, 21, 23, 21,
+ 26, 24, 22, 23, 21, 19, 20, 18, 20, 20, 18)
>
> q1=mean(log(x))
> q2=log(mean(x))
> s=q2-q1
>
> alpha=0.5/s
> alpha
[1] 102.0185
> beta=(mean(x))/alpha
> beta
[1] 0.2181423
O1 = 12 , O2 = 20 , O3 = 13 , O4 = 9 , O5 = 1
P1 = .15 , P2 = .31 , P3 = .32 , P4 = .16 , P5 = .05
E1 = 8.38 , E 2 = 17.31 , E 3 = 17.75 , E 4 = 8.82 , E 5 = 2.75
Q 2 = 4.37
At the 5% level of significance, there is not enough evidence to show that the data does not follow a gamma
distribution.
> x=c(27, 25, 24, 24, 22, 20, 21, 22, 21, 25, 24, 26, 25, 24, 23, 22, 20, 21, 19, 21, 25, 24,
+ 26, 25, 22, 23, 22, 22, 21, 19, 21, 23, 21, 26, 24, 22, 23, 22, 22, 20, 19, 21, 23, 21,
+ 26, 24, 22, 23, 21, 19, 20, 18, 20, 20, 18)
>
> fitdistr(x,"weibull")
shape scale
10.8855539 23.2637020
( 1.1087929) ( 0.3053506)
O1 = 12 , O2 = 20 , O3 = 13 , O4 = 9 , O5 = 1
P1 = .18 , P2 = .24 , P3 = .33 , P4 = .21 , P5 = .034
E1 = 9.65 , E 2 = 13.44 , E 3 = 18.4 , E 4 = 11.59 , E 5 = 1.92
Q 2 = 6.37
At the 5% level of significance, there is enough evidence to show that the data does not follow a Weibull
distribution.
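The two chi-square statistics can be recomputed from the observed and expected counts listed for each fit. This is a sketch; the degrees of freedom are an assumption, since the solution does not state them. Here they are taken as k - 1 - 2 = 2, subtracting one for each of the two estimated parameters, which is the choice consistent with both stated conclusions.

```r
# Observed counts and expected counts from the gamma and Weibull fits
obs         <- c(12, 20, 13, 9, 1)
exp_gamma   <- c(8.38, 17.31, 17.75, 8.82, 2.75)
exp_weibull <- c(9.65, 13.44, 18.4, 11.59, 1.92)

# Pearson statistic
q2 <- function(o, e) sum((o - e)^2 / e)

q2_gamma   <- q2(obs, exp_gamma)
q2_weibull <- q2(obs, exp_weibull)

# Assumed degrees of freedom: k - 1 - (number of estimated parameters)
crit <- qchisq(0.95, df = length(obs) - 1 - 2)

print(round(c(gamma = q2_gamma, weibull = q2_weibull, critical = crit), 3))
```

Because the expected counts above are rounded to two decimals, the recomputed Weibull statistic can differ from the reported 6.37 in the second decimal place.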
Since the data points do not fall on the straight line, we can determine that the data doesn't follow the
normal pdf.
7.3.6
> #7.3.6
> #PP Plot
> xi=c(59.5,69.5,79.5,89.5,100.5)
> fi=c(12,36,90,44,18)
> cumfreqi=c(12,48,138,182,200)
> (cihat=cumfreqi/201)
[1] 0.05970149 0.23880597 0.68656716 0.90547264 0.99502488
> midxi=c(29.5,64.5,74.5,84.5,95)
> (xbar=sum(midxi*fi)/200)
[1] 74.045
> (variance=(1/199)*sum(fi*((midxi-xbar)^2)))
[1] 200.1161
> (sd=sqrt(variance))
[1] 14.14624
> (ci=pnorm(xi,xbar,sd))
[1] 0.1519306 0.3739965 0.6501090 0.8626969 0.9692656
> plot(cihat,ci,main="Normal P-P Plot for Grades",xlab="Observed Cum Prob",ylab="Exp Cumulative Prob")
> abline(lsfit(cihat,ci))
Since the data points fall on the straight line, we can conclude that the data follow the normal pdf.
7.3.7
Q-Q Plot:
Since the data points deviate from the straight line, we can conclude that the data doesn't follow
the normal distribution.
7.3.8
> #7.3.8
> #QQ Plot
> xi=c(59.5,69.5,79.5,89.5,100.5)
> fi=c(12,36,90,44,18)
> cumfreqi=c(12,48,138,182,200)
> (cihat=cumfreqi/201)
[1] 0.05970149 0.23880597 0.68656716 0.90547264 0.99502488
> midxi=c(29.5,64.5,74.5,84.5,95)
> (xbar=sum(midxi*fi)/200)
[1] 74.045
> (variance=(1/199)*sum(fi*((midxi-xbar)^2)))
[1] 200.1161
> (sd=sqrt(variance))
[1] 14.14624
> (xihat=qnorm(cihat,xbar,sd))
[1] 52.01528 63.99907 80.92210 92.62440 110.50769
> plot(xihat,xi,main="Normal Q-Q Plot for Grades",xlab="Theoretical Quantiles",ylab="Sample Quantiles")
> abline(lsfit(xihat,xi))
Since the data points fall on the straight line, we can conclude that the data follow the normal pdf.
Exercise 7.4
7.4.1
a)
The histogram alone does not let us identify the distribution of this data. It appears to be slightly
right-skewed.
b)
Ha: The data does not follow the exponential power probability distribution.
R output for estimating the parameters (using the normalp package) and Chi-squared goodness of fit test:
> paramp(Point.Barrow)
Mean Mp Sd Sp p
353.817226 354.399367 13.809308 16.742494 4.596905
no.conv = FALSE
>
> chisq.test(cbind(Point.Barrow,dnormp(Point.Barrow,354.399367,16.742494,4.596905)))
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
exponential power probability distribution.
c)
$\hat{\mu} = 354.399367$
So the average CO2 amount at Point Barrow during the period 1974 to 2004 is approximately 354
parts per million (ppm).
7.4.2
a)
b)
Ha: The data does not follow the exponential power probability distribution.
R output for estimating the parameters (using the normalp package) and Chi-squared goodness of fit test:
> paramp(Mauna.Loa)
Mean Mp Sd Sp p
352.38910 352.78344 13.93138 16.38595 3.82405
no.conv = FALSE
> chisq.test(cbind(Mauna.Loa,dnormp(Mauna.Loa,352.78344,16.38595,3.82405)))
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
exponential power probability distribution.
c)
$\hat{\mu} = 352.78344$
So the average CO2 amount at Mauna Loa during the period 1974 to 2004 is approximately 353 ppm.
7.4.3
a)
The data does not appear to be exactly symmetric with these 8 bins. However, the pattern may change with
a larger number of bins in the histogram.
b)
Ha: The data does not follow the normal probability distribution.
> ks.test(Northern.Region,pnorm,mean(Northern.Region),sd(Northern.Region))
data: Northern.Region
D = 0.0615, p-value = 0.9989
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
normal probability distribution.
c)
$\bar{x} = 4.38703$, $s = 0.6710476$
We are 95% confident that the true mean lies between 4.194888 and 4.579173.
d)
The data points closely follow the straight line, showing that the data could follow the normal
distribution.
7.4.4
a)
b)
Ha: The data does not follow the gamma probability distribution.
R output for parameter estimation (MLE) and Kolmogorov-Smirnov goodness-of-fit test:
> q1=mean(log(Central.Region))
> q2=log(mean(Central.Region))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 33.09928
> beta=(mean(Central.Region))/alpha
> beta
[1] 0.1298179
>
> ks.test(Central.Region,pgamma,alpha,1/beta)
data: Central.Region
D = 0.0813, p-value = 0.9688
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
gamma probability distribution.
c)
$\bar{x} = 4.296879$, $s = 0.7602079$
We are 95% confident that the true mean lies between 4.079207 and 4.514551.
d)
The data points deviate from the straight line, showing that the data does not follow the normal distribution.
7.4.5
(Note: The problem asks us to use the beta distribution, but that distribution is defined only on the interval
from 0 to 1. Therefore, I solve the problem using the gamma distribution, as in the previous problem.)
a)
b)
Ha: The data does not follow the gamma probability distribution.
R output for parameter estimation (MLE) and Kolmogorov-Smirnov goodness-of-fit test:
> q1=mean(log(Southern.Region))
> q2=log(mean(Southern.Region))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 36.66364
> beta=(mean(Southern.Region))/alpha
> beta
[1] 0.1171047
>
> ks.test(Southern.Region,pgamma,alpha,1/beta)
data: Southern.Region
D = 0.0774, p-value = 0.9802
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
gamma probability distribution.
c)
$\bar{x} = 4.293485$, $s = 0.7076159$
We are 95% confident that the true mean lies between 4.090872 and 4.496098.
d)
The data points deviate from the straight line, showing that the data does not follow the normal distribution.
7.4.6
a) i)
Ha: The data does not follow the 3 parameter weibull probability distribution.
R code for the parameter estimation and the Kolmogorov-Smirnov goodness-of-fit test:
At the 5% level of significance, there is not enough evidence to show that the data does not follow the 3
parameter weibull probability distribution.
ii)
Ha: The data does not follow the 3 parameter weibull probability distribution.
At the 5% level of significance, there is not enough evidence to show that the data does not follow the 3
parameter weibull probability distribution.
b)
Both tests fail to reject the null hypothesis. In our case, with the information given, the Weibull
distribution could be a good fit for our data.
c)
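Using the estimated parameters from part a, the estimated pdf is
\[ f(x) = \frac{1.7778932}{73.5992769} \left( \frac{x - 11.7814433}{73.5992769} \right)^{0.7778932} \exp\left\{ -\left( \frac{x - 11.7814433}{73.5992769} \right)^{1.7778932} \right\}, \quad x \ge 11.7814433 \]
d) \[ E(x) = \mu + \beta\,\Gamma\left(1 + \frac{1}{\alpha}\right) = 11.78 + 73.60\,\Gamma\left(1 + \frac{1}{1.78}\right) = 77.25 \]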
7.4.7
a)
i)
Ha: The data does not follow the Rayleigh probability distribution.
R code for the parameter estimation using MLE and the Kolmogorov-Smirnov goodness-of-fit test:
> n=length(wind$Wind)
> sigsqr=(1/(2*n))*sum(wind$Wind^2)
> sig=sqrt(sigsqr)
> sig
[1] 60.87816
> ks.test(wind$Wind,prayleigh,sig)
data: wind$Wind
D = 0.0739, p-value = 0.8819
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
Rayleigh probability distribution.
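The transcript above relies on `prayleigh`, which is not part of base R and presumably comes from an add-on package. As a hedged alternative, the Rayleigh cdf and the MLE of sigma can be written in base R. The helper names `rayleigh_cdf` and `rayleigh_mle` are hypothetical.

```r
# Rayleigh cdf: F(x) = 1 - exp(-x^2 / (2 sigma^2)), x >= 0
rayleigh_cdf <- function(x, sigma) {
  1 - exp(-x^2 / (2 * sigma^2))
}

# MLE of sigma: sqrt( sum(x^2) / (2n) ), as in the transcript above
rayleigh_mle <- function(x) {
  sqrt(sum(x^2) / (2 * length(x)))
}

# Small check on a toy vector (not the wind data)
toy <- c(3, 4)
print(rayleigh_mle(toy))
```

With the wind speeds in a numeric vector `w`, `ks.test(w, rayleigh_cdf, rayleigh_mle(w))` would run the same Kolmogorov-Smirnov test, since `ks.test` accepts a cdf function followed by its parameters.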
ii)
Ha: The data does not follow the Rayleigh probability distribution.
> Fi=raycdf(orderwind,sig)
> revFi=rev(Fi)
> n=length(wind$Wind)
> i=c(1:n)
> coef1=((2*i)-1)/n
> coef2=log(Fi)+log(1-revFi)
> x2=coef1*coef2
> s=sum(x2)
>
> A=-n-s
> #Test Statistic
> A
[1] 0.54775
> AA=(1 + 0.75/n + 2.25/n^2) * A
> pval=exp(1.2937 - 5.709 * AA + 0.0186 * AA^2)
> #p-value
> pval
[1] 0.1546356
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
Rayleigh probability distribution.
b)
Both tests fail to reject the null hypothesis. In our case, with the information given, the Rayleigh
distribution might be a good fit for our data.
c)
\[ f(x) = \frac{x}{3706.151}\, e^{-x^2/7412.302}, \quad x \ge 0 \]
Comparing the result between the 3 parameter Weibull and the Rayleigh distribution, the Rayleigh
distribution had higher p-values in both tests.
d) \[ E(x) = \sigma\sqrt{\frac{\pi}{2}} = \sqrt{\frac{3706.15\,\pi}{2}} = 76.30 \]
7.4.8
a)
The data is skewed to the right. The gamma distribution might be a good fit.
b)
Ha: The data does not follow the gamma probability distribution.
R output for parameter estimation (MLE) and Kolmogorov-Smirnov goodness-of-fit test:
> q1=mean(log(Annual.Av....))
> q2=log(mean(Annual.Av....))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 18.15786
> beta=(mean(Annual.Av....))/alpha
> beta
[1] 0.3209327
>
> ks.test(Annual.Av....,pgamma,alpha,1/beta)
data: Annual.Av....
D = 0.0849, p-value = 0.8562
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
gamma probability distribution.
c)
7.4.9
a)
Looking at the histogram, it's possible that the data follows an exponential distribution.
b)
R output for the estimated parameter, Kolmogorov-Smirnov test and Chi-squared test:
> set.seed(321123)
> sizespl=sample(Tumor_SIZE,50)
> sizespl
[1] 8 20 15 24 2 9 52 10 40 36 17 1 3 1 12 18 27 10 22 21 1 40 16 21 45
[26] 10 11 16 4 20 3 7 4 19 9 55 33 40 48 18 11 11 12 9 3 12 8 27 12 11
> beta=mean(sizespl)
> ks.test(sizespl,pexp,1/beta)
data: sizespl
D = 0.164, p-value = 0.136
alternative hypothesis: two-sided
> chisq.test(cbind(sizespl,dexp(sizespl,1/beta)))
Since in both tests the p-value > .05, we fail to reject H0.
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
exponential probability distribution.
c)
\[ f(x) = \frac{1}{17.68}\, e^{-x/17.68}, \quad x \ge 0 \]
d)
$\hat{\beta}_{MLE} = 17.68$
So the expected tumor size of a breast cancer patient is 17.68 mm.
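The estimate can be reproduced from the 50 sampled values printed in the transcript for part b. This is a minimal sketch; `f_hat` is a hypothetical name for the fitted density.

```r
# The 50 sampled tumor sizes printed in the R output for part b
sizespl <- c(8, 20, 15, 24, 2, 9, 52, 10, 40, 36, 17, 1, 3, 1, 12, 18, 27,
             10, 22, 21, 1, 40, 16, 21, 45, 10, 11, 16, 4, 20, 3, 7, 4, 19,
             9, 55, 33, 40, 48, 18, 11, 11, 12, 9, 3, 12, 8, 27, 12, 11)

# For the exponential distribution with mean beta, the MLE is the sample mean
beta_hat <- mean(sizespl)

# Fitted density f(x) = (1 / beta) * exp(-x / beta)
f_hat <- function(x) dexp(x, rate = 1 / beta_hat)

print(beta_hat)
```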
7.4.10
a)
b)
Ha: The sampled data does not follow a 2-parameter Weibull probability distribution.
> set.seed(123123)
> survt=sample(SRV_TIME_YEAR,50)
> survt
[1] 4.91667 6.75000 1.75000 1.83333 4.83333 3.08333 3.83333 6.25000 5.25000
[10] 2.83333 7.50000 7.58333 6.75000 2.25000 2.16667 2.66667 4.58333 2.33333
[19] 0.16667 2.25000 3.50000 3.83333 4.33333 2.75000 3.66667 2.25000 3.66667
[28] 5.66667 4.41667 0.83333 4.00000 0.25000 3.91667 0.91667 1.83333 3.58333
[37] 3.08333 4.16667 1.66667 2.75000 5.50000 6.50000 1.75000 3.50000 3.91667
[46] 0.41667 2.41667 0.41667 1.16667 6.50000
> fitdistr(survt,"weibull")
shape scale
1.7734463 3.8544480
(0.2051998) (0.3214546)
> ks.test(survt,pweibull,1.7734463,3.8544480)
data: survt
D = 0.0895, p-value = 0.8182
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the 2
parameter weibull probability distribution.
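c) \[ f(x) = \frac{1.7734463}{3.8544480} \left( \frac{x}{3.8544480} \right)^{0.7734463} \exp\left\{ -\left( \frac{x}{3.8544480} \right)^{1.7734463} \right\}, \quad x \ge 0 \]
Once we have the pdf, we can obtain the expected values and probabilities.
\[ F(x) = 1 - \exp\left\{ -\left( \frac{x}{3.8544480} \right)^{1.7734463} \right\}, \quad x \ge 0 \]
d)
Survival function
\[ S(x) = 1 - \left[ 1 - \exp\left\{ -\left( \frac{x}{3.8544480} \right)^{1.7734463} \right\} \right], \quad x \ge 0 \]
\[ S(x) = \exp\left\{ -\left( \frac{x}{3.8544480} \right)^{1.7734463} \right\}, \quad x \ge 0 \]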
e)
From the survival function we can obtain the probabilities of surviving more than “t” years for a breast
cancer patient.
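As an illustration of how such probabilities could be obtained, here is a sketch that evaluates the fitted survival function at a few time points. The time points in `t_years` are arbitrary choices for illustration, and `surv` is a hypothetical helper name.

```r
# Fitted Weibull parameters from the fitdistr output in part b
shape <- 1.7734463
scale <- 3.8544480

# Survival function S(t) = exp(-(t / scale)^shape)
surv <- function(t) exp(-(t / scale)^shape)

# Probability of surviving more than t years, for a few illustrative t
t_years <- c(1, 3, 5)
print(surv(t_years))

# The same values from the built-in Weibull cdf (upper tail)
print(pweibull(t_years, shape = shape, scale = scale, lower.tail = FALSE))
```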
7.4.11
a)
b)
Ha: The sampled data does not follow a 2-parameter Weibull probability distribution.
data: lung.ms
D = 0.0925, p-value = 0.683
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the 2
parameter weibull probability distribution.
c)
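\[ f(x) = \frac{2.1086635}{48.2347045} \left( \frac{x}{48.2347045} \right)^{1.1086635} \exp\left\{ -\left( \frac{x}{48.2347045} \right)^{2.1086635} \right\}, \quad x \ge 0 \]
Estimated $E(x)$:
\[ E(x) = 48.2347045\,\Gamma\left(1 + \frac{1}{2.1086635}\right) = 42.72008 \]
So the average malignant tumor size of a male lung cancer patient is 42.72 mm.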
7.4.12
a)
Ha: The sampled data does not follow a 2-parameter Weibull probability distribution.
data: lung.fs
D = 0.1161, p-value = 0.3936
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the 2
parameter weibull probability distribution.
c)
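\[ f(x) = \frac{1.8975835}{39.8984887} \left( \frac{x}{39.8984887} \right)^{0.8975835} \exp\left\{ -\left( \frac{x}{39.8984887} \right)^{1.8975835} \right\}, \quad x \ge 0 \]
Estimated $E(x)$:
\[ E(x) = 39.8984887\,\Gamma\left(1 + \frac{1}{1.8975835}\right) = 35.40591 \]
So the average malignant tumor size of a female lung cancer patient is 35.41 mm.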
7.4.13
From the samples drawn, male tumors are on average about 7 mm larger than female tumors.
At the same time, the histograms show less variance in tumor size for females than for males.
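The comparison of means can be checked from the fitted Weibull parameters reported in 7.4.11 and 7.4.12. This is a sketch; `weibull_mean` is a hypothetical helper implementing E(x) = scale * Gamma(1 + 1/shape).

```r
# Mean of a 2-parameter Weibull distribution
weibull_mean <- function(shape, scale) scale * gamma(1 + 1 / shape)

# Fitted parameters reported for the male and female samples
mean_male   <- weibull_mean(shape = 2.1086635, scale = 48.2347045)
mean_female <- weibull_mean(shape = 1.8975835, scale = 39.8984887)

print(c(male = mean_male, female = mean_female,
        difference = mean_male - mean_female))
```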
7.4.14
a)
Ha: The sampled data does not follow a Weibull probability distribution.
data: lung.stms
D = 0.1457, p-value = 0.2393
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
weibull probability distribution.
c)
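\[ f(x) = \frac{0.79037722}{29.77144952} \left( \frac{x}{29.77144952} \right)^{-0.2096228} \exp\left\{ -\left( \frac{x}{29.77144952} \right)^{0.79037722} \right\}, \quad x \ge 0 \]
\[ F(x) = 1 - \exp\left\{ -\left( \frac{x}{29.77144952} \right)^{0.79037722} \right\}, \quad x \ge 0 \]
d) Survival function
\[ S(x) = 1 - \left[ 1 - \exp\left\{ -\left( \frac{x}{29.77144952} \right)^{0.79037722} \right\} \right], \quad x \ge 0 \]
\[ S(x) = \exp\left\{ -\left( \frac{x}{29.77144952} \right)^{0.79037722} \right\}, \quad x \ge 0 \]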
e)
7.4.15
a)
Ha: The sampled data does not follow a Weibull probability distribution.
data: lung.stfs
D = 0.0561, p-value = 0.9975
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
weibull probability distribution.
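c) \[ f(x) = \frac{0.9689514}{22.6733066} \left( \frac{x}{22.6733066} \right)^{-0.0310486} \exp\left\{ -\left( \frac{x}{22.6733066} \right)^{0.9689514} \right\}, \quad x \ge 0 \]
\[ F(x) = 1 - \exp\left\{ -\left( \frac{x}{22.6733066} \right)^{0.9689514} \right\}, \quad x \ge 0 \]
d) Survival function
\[ S(x) = 1 - \left[ 1 - \exp\left\{ -\left( \frac{x}{22.6733066} \right)^{0.9689514} \right\} \right], \quad x \ge 0 \]
\[ S(x) = \exp\left\{ -\left( \frac{x}{22.6733066} \right)^{0.9689514} \right\}, \quad x \ge 0 \]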
e)
7.4.16
On average, males tend to have a longer survival time than females.
7.4.17
a)
b)
Ha: The sampled data does not follow a 2-parameter Weibull probability distribution.
> colon=read.csv("seercoloncancerdata.csv",header=T)
>
> colons=split(colon,colon$SEX)
>
> colon.m=as.data.frame(colons[1])
> colon.f=as.data.frame(colons[2])
>
> set.seed(323121)
> colon.ms=sample(colon.m$X1.CS_SIZE,50)
>
> fitdistr(colon.ms,"weibull")
shape scale
1.6269871 57.0717161
( 0.1779206) ( 5.2153708)
> ks.test(colon.ms,pweibull,1.6269871,57.0717161)
data: colon.ms
D = 0.1162, p-value = 0.5096
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the 2
parameter weibull probability distribution.
c)
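\[ f(x) = \frac{1.6269871}{57.0717161} \left( \frac{x}{57.0717161} \right)^{0.6269871} \exp\left\{ -\left( \frac{x}{57.0717161} \right)^{1.6269871} \right\}, \quad x \ge 0 \]
Estimated $E(x)$:
\[ E(x) = 57.0717161\,\Gamma\left(1 + \frac{1}{1.6269871}\right) = 51.09328 \]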
7.4.18
a)
b)
Ha: The sampled data does not follow a Weibull probability distribution.
> colon=read.csv("seercoloncancerdata.csv",header=T)
>
> colons=split(colon,colon$SEX)
>
> colon.m=as.data.frame(colons[1])
> colon.f=as.data.frame(colons[2])
>
> set.seed(121323)
> colon.fs=sample(colon.f$X2.CS_SIZE,50)
>
> fitdistr(colon.fs,"weibull")
shape scale
1.7232673 48.4527073
( 0.1659436) ( 4.2061986)
> ks.test(colon.fs,pweibull,1.7232673,48.4527073)
data: colon.fs
D = 0.1357, p-value = 0.3163
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
weibull probability distribution.
c)
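\[ f(x) = \frac{1.7232673}{48.4527073} \left( \frac{x}{48.4527073} \right)^{0.7232673} \exp\left\{ -\left( \frac{x}{48.4527073} \right)^{1.7232673} \right\}, \quad x \ge 0 \]
Estimated $E(x)$:
\[ E(x) = 48.4527073\,\Gamma\left(1 + \frac{1}{1.7232673}\right) = 43.19307 \]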
7.4.19
a)
Ha: The sampled data does not follow a gamma probability distribution.
> colon=read.csv("seercoloncancerdata.csv",header=T)
>
> colons=split(colon,colon$SEX)
>
> colon.m=as.data.frame(colons[1])
> colon.f=as.data.frame(colons[2])
>
> set.seed(112233)
> colon.msts=sample(colon.m$X1.SRV_TIME_YEAR,50)
>
> q1=mean(log(colon.msts))
> q2=log(mean(colon.msts))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 1.621226
> beta=(mean(colon.msts))/alpha
> beta
[1] 1.942974
>
> ks.test(colon.msts,pgamma,alpha,1/beta)
data: colon.msts
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
gamma probability distribution.
c)
\[ f(x) = \frac{1}{\Gamma(1.621226)\,1.942974^{1.621226}}\, x^{1.621226 - 1}\, e^{-x/1.942974}, \quad x \ge 0 \]
\[ f(x) = \frac{1}{2.630375}\, x^{0.621226}\, e^{-x/1.942974}, \quad x \ge 0 \]
Estimated $E(x)$:
\[ E(x) = 1.621226 \times 1.942974 = 3.15 \]
7.4.20
a)
Ha: The sampled data does not follow a gamma probability distribution.
> colon=read.csv("seercoloncancerdata.csv",header=T)
> colons=split(colon,colon$SEX)
>
> colon.m=as.data.frame(colons[1])
> colon.f=as.data.frame(colons[2])
> set.seed(123321)
> colon.fsts=sample(colon.f$X2.SRV_TIME_YEAR,50)
>
> q1=mean(log(colon.fsts))
> q2=log(mean(colon.fsts))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 1.262591
> beta=(mean(colon.fsts))/alpha
> beta
[1] 2.47771
>
> ks.test(colon.fsts,pgamma,alpha,1/beta)
data: colon.fsts
D = 0.104, p-value = 0.6519
alternative hypothesis: two-sided
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
gamma probability distribution.
c)
\[ f(x) = \frac{1}{\Gamma(1.262591)\,2.47771^{1.262591}}\, x^{1.262591 - 1}\, e^{-x/2.47771}, \quad x \ge 0 \]
\[ f(x) = \frac{1}{2.842123}\, x^{0.262591}\, e^{-x/2.47771}, \quad x \ge 0 \]
Estimated $E(x)$:
\[ E(x) = 1.262591 \times 2.47771 = 3.128333 \]
7.4.21
On average, males and females have similar survival times, but females have a larger variance than males.
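The comparison can be checked from the fitted gamma parameters in 7.4.19 and 7.4.20. This is a sketch, using the shape-scale parameterization from the solutions, where the mean is alpha * beta and the variance is alpha * beta^2. `gamma_moments` is a hypothetical helper name.

```r
# Mean and variance of a gamma distribution with shape alpha and scale beta
gamma_moments <- function(alpha, beta) {
  c(mean = alpha * beta, variance = alpha * beta^2)
}

# Fitted parameters reported for the male and female samples
male   <- gamma_moments(alpha = 1.621226, beta = 1.942974)
female <- gamma_moments(alpha = 1.262591, beta = 2.47771)

print(rbind(male, female))
```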