
Mathematical Statistics with Applications in R, 2nd Edition

Ramachandran Solutions Manual

Chapter 7
GOODNESS OF FIT TESTS AND APPLICATIONS

7.1 INTRODUCTION
7.2 CHI-SQUARE TESTS FOR COUNT DATA
7.3 GOODNESS-OF-FIT TESTS TO IDENTIFY THE PROBABILITY DISTRIBUTION
7.4 SOME IMPORTANT APPLICATIONS: PARAMETRIC ANALYSIS
7.5 CHAPTER SUMMARY
7.6 COMPUTER EXAMPLES
Projects for Chapter 7

Exercise 7.2

7.2.1

H0: The coin is fair.

Ha: The coin is unfair.

$p_1 = p_2 = 0.5$

$E_1 = E_2 = 200(0.5) = 100$

$$Q^2 = \frac{(104-100)^2}{100} + \frac{(96-100)^2}{100} = \frac{16+16}{100} = 0.32$$

Let $\alpha = 0.05$.

Since $k = 2$, the rejection region is $Q^2 > \chi^2_{1,0.05} = 3.841$.

Since 0.32 < 3.841, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the coin is unfair.
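
The hand calculation can be checked in R. The following is a minimal sketch, assuming the observed counts are the 104 and 96 used in the computation; the labels `heads` and `tails` and the variable names are illustrative only.

```r
# Observed counts from 200 tosses (104 and 96, as used in the solution)
obs <- c(heads = 104, tails = 96)

# Pearson chi-square goodness-of-fit test against p1 = p2 = 0.5
fit <- chisq.test(obs, p = c(0.5, 0.5))

q2   <- unname(fit$statistic)   # test statistic Q^2
crit <- qchisq(0.95, df = 1)    # critical value chi^2_{1, 0.05}

q2          # 0.32
q2 > crit   # FALSE, so H0 is not rejected
```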

7.2.2

H0: Opinion on collective bargaining is independent of employee classification.

Ha: Opinion on collective bargaining is dependent on employee classification.

Here $c = 3$ and $r = 3$, so d.f. $= (3-1)(3-1) = 4$, and $\alpha = 0.05$.

RR: $Q^2 > \chi^2_{4,0.05} = 9.488$

$E_{11} = \frac{(60)(90)}{200} = 27$, $E_{12} = \frac{(60)(50)}{200} = 15$, $E_{13} = \frac{(60)(60)}{200} = 18$

$E_{21} = \frac{(100)(90)}{200} = 45$, $E_{22} = \frac{(100)(50)}{200} = 25$, $E_{23} = \frac{(100)(60)}{200} = 30$

$E_{31} = \frac{(40)(90)}{200} = 18$, $E_{32} = \frac{(40)(50)}{200} = 10$, $E_{33} = \frac{(40)(60)}{200} = 12$

$$Q^2 = \sum_{i=1}^{3}\sum_{j=1}^{3}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 43.86$$

Since 43.86 > 9.488, we reject H0.

At the 5% level of significance, there is enough evidence to show that the opinion on collective
bargaining is dependent on employee classification.
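
The expected counts for this exercise can be generated in one step from the marginal totals. This is a small sketch using only the row totals (60, 100, 40) and column totals (90, 50, 60) that appear in the formulas for $E_{ij}$; the observed table itself is not reproduced in this solution, so the test statistic is not recomputed here.

```r
# Row and column totals taken from the expected-count formulas
row_tot <- c(60, 100, 40)
col_tot <- c(90, 50, 60)
n <- sum(row_tot)                 # grand total, 200

# E_ij = (row total i) * (column total j) / n
E <- outer(row_tot, col_tot) / n
E

# All expected counts are at least 5, so the chi-square approximation applies
all(E >= 5)
```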

7.2.3

H0: Major of undergraduate students is independent of gender.

Ha: Major of undergraduate students is dependent on gender.

Here $c = 4$ and $r = 2$, so d.f. $= (4-1)(2-1) = 3$, and $\alpha = 0.01$.

RR: $Q^2 > \chi^2_{3,0.01} = 11.345$

$E_{11} = \frac{(205)(120)}{300} = 82$, $E_{12} = \frac{(205)(52)}{300} = 35.533$, $E_{13} = \frac{(205)(39)}{300} = 26.65$, $E_{14} = \frac{(205)(89)}{300} = 60.817$

$E_{21} = \frac{(95)(120)}{300} = 38$, $E_{22} = \frac{(95)(52)}{300} = 16.467$, $E_{23} = \frac{(95)(39)}{300} = 12.35$, $E_{24} = \frac{(95)(89)}{300} = 28.183$

$$Q^2 = \sum_{i=1}^{2}\sum_{j=1}^{4}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 4.5979$$

Since 4.5979 < 11.345, we fail to reject H0.

At the 1% level of significance, there is not enough evidence to show that the major of undergraduate
students is dependent on gender.
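
The rejection regions in these exercises rely on upper-tail chi-square critical values. A short sketch of how they can be looked up with `qchisq`; the helper name `crit` is hypothetical and introduced only for illustration.

```r
# Upper-tail critical value chi^2_{df, alpha}
crit <- function(df, alpha) qchisq(1 - alpha, df)

round(crit(1, 0.05), 3)   # 3.841
round(crit(3, 0.01), 3)   # 11.345
round(crit(3, 0.05), 3)   # 7.815
round(crit(4, 0.05), 3)   # 9.488

# Decision for Exercise 7.2.3: Q^2 = 4.5979 with 3 d.f. at alpha = 0.01
4.5979 > crit(3, 0.01)    # FALSE, so H0 is not rejected
```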

7.2.4

H0: p1 = .35 , p2 = .2 , p3 = .15 , p4 = .3


Ha: At least one of the probabilities is different from the hypothesized value.

Category   1       2       3      4
Obs        427.5   237.5   19     266
Exp        380     190     47.5   332.5

Here $k = 4$, d.f. $= 3$, $\alpha = 0.05$.

RR: $Q^2 > \chi^2_{3,0.05} = 7.815$

$$Q^2 = \sum_{i=1}^{4}\frac{(O_i-E_i)^2}{E_i} = 48.2125$$

Since 48.2125 > 7.815, we reject H0.

At the 5% level of significance, there is enough evidence to show that at least one of the probabilities is
different from the hypothesized value.
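
The statistic can be recomputed directly from the Obs and Exp rows of the table. A minimal sketch; the variable names are illustrative.

```r
# Observed and expected counts copied from the table
obs        <- c(427.5, 237.5, 19, 266)
exp_counts <- c(380, 190, 47.5, 332.5)

# Pearson statistic Q^2 = sum((O - E)^2 / E)
q2 <- sum((obs - exp_counts)^2 / exp_counts)
q2                                  # 48.2125

# Upper-tail p-value with k - 1 = 3 degrees of freedom
p_value <- pchisq(q2, df = 3, lower.tail = FALSE)
p_value < 0.05                      # TRUE, so H0 is rejected
```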

7.2.5

(a)

H0: p1 = .2 , p2 = .2 , p3 = .3 , p4 = .2 , p5 = .1
Ha: At least one of the probabilities is different from the hypothesized value.

Category   1    2    3    4    5
Obs        22   21   29   17   11
Exp        20   20   30   20   10

Here $k = 5$, d.f. $= 4$, $\alpha = 0.05$.

RR: $Q^2 > \chi^2_{4,0.05} = 9.488$

$$Q^2 = \sum_{i=1}^{5}\frac{(O_i-E_i)^2}{E_i} = 0.833$$

Since 0.833 < 9.488, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that at least one of the probabilities
is different from the hypothesized value.

(b)

H0: The choice of footwear by undergraduate students is independent of their gender.

Ha: The choice of footwear by undergraduate students is dependent on their gender.

Here $c = 5$ and $r = 2$, so d.f. $= (5-1)(2-1) = 4$, and $\alpha = 0.05$.

RR: $Q^2 > \chi^2_{4,0.05} = 9.488$

$E_{11} = E_{21} = \frac{(22)(50)}{100} = 11$, $E_{12} = E_{22} = \frac{(21)(50)}{100} = 10.5$

$E_{13} = E_{23} = \frac{(29)(50)}{100} = 14.5$, $E_{14} = E_{24} = \frac{(17)(50)}{100} = 8.5$

$E_{15} = E_{25} = \frac{(11)(50)}{100} = 5.5$

$$Q^2 = \sum_{i=1}^{2}\sum_{j=1}^{5}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 3.8428$$

Since 3.8428 < 9.488, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the choice of footwear by
undergraduate students is dependent on their gender.

7.2.6

H0: The dice are fair.

Ha: At least one of the three dice is unfair.

$p_1 = \frac{125}{216}$, $p_2 = \frac{75}{216}$, $p_3 = \frac{15}{216}$, $p_4 = \frac{1}{216}$

$E_1 = \frac{125}{216}(150) = 86.81$, $E_2 = \frac{75}{216}(150) = 52.08$

$E_3 = \frac{15}{216}(150) = 10.42$, $E_4 = \frac{1}{216}(150) = 0.69$

(Note: The assumption $E_i \geq 5$ is not met.)

Here $k = 4$, d.f. $= 3$, $\alpha = 0.05$.

RR: $Q^2 > \chi^2_{3,0.05} = 7.815$

$$Q^2 = \sum_{i=1}^{4}\frac{(O_i-E_i)^2}{E_i} = 54.15$$

Since 54.15 > 7.815, we reject H0.

At the 5% level of significance, there is enough evidence to show that at least one of the three dice is
unfair.
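
The hypothesized probabilities 125/216, 75/216, 15/216 and 1/216 coincide with the Binomial(3, 1/6) probabilities of 0, 1, 2 and 3 successes. The sketch below assumes that this is how the four categories were defined (the exercise statement is not reproduced here) and shows which expected count violates the $E_i \geq 5$ rule.

```r
# Binomial(3, 1/6) probabilities for 0, 1, 2, 3 successes with three fair dice
# (assumed interpretation of the four categories)
p <- dbinom(0:3, size = 3, prob = 1/6)

# Expected counts for n = 150 trials
E <- 150 * p
round(E, 2)        # 86.81 52.08 10.42 0.69

# Cells whose expected count is below 5
which(E < 5)       # 4
```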

Exercise 7.3

7.3.1

H0: The speed of vehicles is normally distributed.

Ha: The speed of vehicles is not normally distributed.

$p_1 = 0.000144$, $p_2 = 0.13$, $p_3 = 0.785$, $p_4 = 0.084$, $p_5 = 8.84 \times 10^{-5}$

$E_1 = 0.022$, $E_2 = 19.52$, $E_3 = 117.77$, $E_4 = 12.67$, $E_5 = 0.013$

(Note: The assumption $E_i \geq 5$ is not met.)

Here $k = 5$, d.f. $= 4$, $\alpha = 0.01$.

RR: $Q^2 > \chi^2_{4,0.01} = 13.277$

$$Q^2 = \sum_{i=1}^{5}\frac{(O_i-E_i)^2}{E_i} = 9352.66$$

Since 9352.66 > 13.277, we reject H0.

At the 1% level of significance, there is enough evidence to show that the speed of vehicles is not
normally distributed.

7.3.2

H0: Daily mean temperatures are normally distributed.

Ha: Daily mean temperatures are not normally distributed.

$p_1 = 8.37 \times 10^{-19}$, $p_2 = 4.33 \times 10^{-6}$, $p_3 = 0.27$, $p_4 = 0.73$, $p_5 = 2.6 \times 10^{-4}$

$E_1 = 4.18 \times 10^{-17}$, $E_2 = 6.67 \times 10^{-5}$, $E_3 = 13.5$, $E_4 = 36.48$, $E_5 = 0.013$

(Note: The assumption $E_i \geq 5$ is not met.)

Here $k = 5$, d.f. $= 4$, $\alpha = 0.05$.

RR: $Q^2 > \chi^2_{4,0.05} = 9.488$

$$Q^2 = \sum_{i=1}^{5}\frac{(O_i-E_i)^2}{E_i} \approx 106$$

Since 106 > 9.488, we reject H0.

At the 5% level of significance, there is enough evidence to show that daily mean temperatures are not
normally distributed.

7.3.3

H0: The lifetime of the components follows an exponential distribution.

Ha: The lifetime of the components does not follow an exponential distribution.

$p_1 = 0.39$, $p_2 = 0.24$, $p_3 = 0.14$, $p_4 = 0.09$, $p_5 = 0.14$

$E_1 = 11.8$, $E_2 = 7.16$, $E_3 = 4.34$, $E_4 = 2.63$, $E_5 = 4.06$

(Note: The assumption $E_i \geq 5$ is not met.)

Interval   0-100   100-200   200-300   300-400   >400
Obs        15      6         4         2         3

Here $k = 5$, d.f. $= 4$, $\alpha = 0.05$.

RR: $Q^2 > \chi^2_{4,0.05} = 9.488$

$$Q^2 = \sum_{i=1}^{5}\frac{(O_i-E_i)^2}{E_i} = 1.493$$

Since 1.493 < 9.488, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the lifetime of the components
does not follow an exponential distribution.

Using the R-code for example 7.3.1



7.3.4

a) Estimate α and β using MLE

$$\hat{\beta}_{MLE} = \frac{\bar{x}}{\hat{\alpha}}, \qquad \hat{\alpha}_{MLE} \approx \frac{0.5}{\log(\bar{x}) - \frac{1}{n}\sum_{i=1}^{n}\log(x_i)}$$

R Output:

> x=c(27, 25, 24, 24, 22, 20, 21, 22, 21, 25, 24,
+ 26, 25, 24, 23, 22, 20, 21, 19, 21, 25, 24,
+ 26, 25, 22, 23, 22, 22, 21, 19, 21, 23, 21,
+ 26, 24, 22, 23, 22, 22, 20, 19, 21, 23, 21,
+ 26, 24, 22, 23, 21, 19, 20, 18, 20, 20, 18)
>
> q1=mean(log(x))
> q2=log(mean(x))
> s=q2-q1
>
> alpha=0.5/s
> alpha
[1] 102.0185
> beta=(mean(x))/alpha
> beta
[1] 0.2181423

$\hat{\beta}_{MLE} \approx 0.2181423$, $\hat{\alpha}_{MLE} \approx 102.0185$

H0: The data follow a gamma distribution.

Ha: The data do not follow a gamma distribution.

Interval   0-20   20-22   22-24   24-26   >26
Obs        12     20      13      9       1

$O_1 = 12$, $O_2 = 20$, $O_3 = 13$, $O_4 = 9$, $O_5 = 1$
$P_1 = 0.15$, $P_2 = 0.31$, $P_3 = 0.32$, $P_4 = 0.16$, $P_5 = 0.05$
$E_1 = 8.38$, $E_2 = 17.31$, $E_3 = 17.75$, $E_4 = 8.82$, $E_5 = 2.75$
$Q^2 = 4.37$

Since $k = 5$ and two parameters were estimated, d.f. $= 5 - 1 - 2 = 2$. RR: $Q^2 > \chi^2_{2,0.05} = 5.991$

Since 4.37 < 5.991, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data do not follow a gamma
distribution.
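
The cell probabilities, expected counts and $Q^2$ in part a) can be reproduced from the fitted gamma model. The following is a sketch under the assumption that the class boundaries are 0, 20, 22, 24 and 26 as in the table and that the observed counts are the ones listed there; small differences from the printed values are rounding.

```r
x <- c(27, 25, 24, 24, 22, 20, 21, 22, 21, 25, 24,
       26, 25, 24, 23, 22, 20, 21, 19, 21, 25, 24,
       26, 25, 22, 23, 22, 22, 21, 19, 21, 23, 21,
       26, 24, 22, 23, 22, 22, 20, 19, 21, 23, 21,
       26, 24, 22, 23, 21, 19, 20, 18, 20, 20, 18)

# Approximate gamma MLEs, as in the transcript
s         <- log(mean(x)) - mean(log(x))
alpha_hat <- 0.5 / s                 # shape
beta_hat  <- mean(x) / alpha_hat     # scale

# Cell probabilities and expected counts under the fitted gamma
breaks <- c(0, 20, 22, 24, 26, Inf)
P   <- diff(pgamma(breaks, shape = alpha_hat, scale = beta_hat))
E   <- length(x) * P
obs <- c(12, 20, 13, 9, 1)           # observed counts from the table

# Pearson statistic; d.f. = 5 cells - 1 - 2 estimated parameters
q2   <- sum((obs - E)^2 / E)
crit <- qchisq(0.95, df = length(obs) - 1 - 2)
c(Q2 = q2, critical = crit)
```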

b) R Output for estimating the parameters using the MASS package:

> x=c(27, 25, 24, 24, 22, 20, 21, 22, 21, 25, 24, 26, 25, 24, 23, 22, 20, 21, 19, 21, 25, 24,
+ 26, 25, 22, 23, 22, 22, 21, 19, 21, 23, 21, 26, 24, 22, 23, 22, 22, 20, 19, 21, 23, 21,
+ 26, 24, 22, 23, 21, 19, 20, 18, 20, 20, 18)
>
> library(MASS)
> fitdistr(x,"weibull")
      shape        scale
  10.8855539   23.2637020
 ( 1.1087929) ( 0.3053506)

The MLEs are approximately 23.2637020 for the scale parameter and 10.8855539 for the shape parameter.

H0: The data follow a Weibull distribution.

Ha: The data do not follow a Weibull distribution.

Interval   0-20   20-22   22-24   24-26   >26
Obs        12     20      13      9       1

$O_1 = 12$, $O_2 = 20$, $O_3 = 13$, $O_4 = 9$, $O_5 = 1$
$P_1 = 0.18$, $P_2 = 0.24$, $P_3 = 0.33$, $P_4 = 0.21$, $P_5 = 0.034$
$E_1 = 9.65$, $E_2 = 13.44$, $E_3 = 18.4$, $E_4 = 11.59$, $E_5 = 1.92$
$Q^2 = 6.37$

Since $k = 5$ and two parameters were estimated, d.f. $= 5 - 1 - 2 = 2$. RR: $Q^2 > \chi^2_{2,0.05} = 5.991$

Since 6.37 > 5.991, we reject H0.

At the 5% level of significance, there is enough evidence to show that the data do not follow a Weibull
distribution.
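
The expected counts in part b) follow from the fitted Weibull CDF evaluated at the class boundaries. A short sketch that plugs in the `fitdistr` estimates reported in the output, so it does not require the MASS package; the variable names are illustrative.

```r
# Estimates reported by fitdistr(x, "weibull")
shape_hat <- 10.8855539
scale_hat <- 23.2637020

# Cell probabilities for the classes 0-20, 20-22, 22-24, 24-26, >26
breaks <- c(0, 20, 22, 24, 26, Inf)
P <- diff(pweibull(breaks, shape = shape_hat, scale = scale_hat))

obs <- c(12, 20, 13, 9, 1)
E   <- sum(obs) * P                  # expected counts, n = 55
round(E, 2)

# Pearson statistic compared with chi^2_{2, 0.05}
q2 <- sum((obs - E)^2 / E)
q2 > qchisq(0.95, df = 2)            # TRUE, so H0 is rejected
```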

7.3.5 P-P Plot:

Since the data points do not fall on the straight line, we can determine that the data doesn't follow the
normal pdf.

7.3.6

> #7.3.6
> #PP Plot
> xi=c(59.5,69.5,79.5,89.5,100.5)
> fi=c(12,36,90,44,18)
> cumfreqi=c(12,48,138,182,200)
> (cihat=cumfreqi/201)
[1] 0.05970149 0.23880597 0.68656716 0.90547264 0.99502488
> midxi=c(29.5,64.5,74.5,84.5,95)
> (xbar=sum(midxi*fi)/200)
[1] 74.045
> (variance=(1/199)*sum(fi*((midxi-xbar)^2)))
[1] 200.1161
> (sd=sqrt(variance))
[1] 14.14624
> (ci=pnorm(xi,xbar,sd))
[1] 0.1519306 0.3739965 0.6501090 0.8626969 0.9692656
> plot(cihat,ci,main="Normal P-P Plot for Grades",xlab="Observed Cum Prob",ylab="Exp Cumulative Prob")
> abline(lsfit(cihat,ci))

Since the data points fall on the straight line, we can conclude that the data follow the normal
distribution.

7.3.7

Q-Q Plot:

Since the data points deviate from the straight line, we can conclude that the data doesn't follow
the normal distribution.

7.3.8

> #7.3.8
> #QQ Plot
> xi=c(59.5,69.5,79.5,89.5,100.5)
> fi=c(12,36,90,44,18)
> cumfreqi=c(12,48,138,182,200)
> (cihat=cumfreqi/201)
[1] 0.05970149 0.23880597 0.68656716 0.90547264 0.99502488
> midxi=c(29.5,64.5,74.5,84.5,95)
> (xbar=sum(midxi*fi)/200)
[1] 74.045
> (variance=(1/199)*sum(fi*((midxi-xbar)^2)))
[1] 200.1161
> (sd=sqrt(variance))
[1] 14.14624
> (xihat=qnorm(cihat,xbar,sd))
[1] 52.01528 63.99907 80.92210 92.62440 110.50769
> plot(xihat,xi,main="Normal Q-Q Plot for Grades",xlab="Theoretical Quantiles",ylab="Sample Quantiles")
> abline(lsfit(xihat,xi))

Since the data points fall on the straight line, we can conclude that the data follow the normal
distribution.

Exercise 7.4

7.4.1

a)

We cannot determine the exact distribution of the data from the histogram alone. The histogram appears
to be slightly right-skewed.

b)

H0: The data follows the exponential power probability distribution.

Ha: The data does not follow the exponential power probability distribution.

R output for estimating the parameters (using the normalp package) and Chi-squared goodness of fit test:

> paramp(Point.Barrow)
Mean Mp Sd Sp p
353.817226 354.399367 13.809308 16.742494 4.596905

no.conv = FALSE

>
> chisq.test(cbind(Point.Barrow,dnormp(Point.Barrow,354.399367,16.742494,4.596905)))

Pearson's Chi-squared test

data: cbind(Point.Barrow, dnormp(Point.Barrow, 354.399367, 16.742494, 4.596905))


X-squared = 0.0348, df = 30, p-value = 1

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data does not follow the
exponential power probability distribution.

c)
$\hat{\mu} = 354.399367$

So the average CO2 amount at Point Barrow during the period from 1974 to 2004 is approximately 354
parts per million (ppm).

7.4.2

a)

The histogram is clearly skewed to the right.



b)

H0: The data follows the exponential power probability distribution.

Ha: The data does not follow the exponential power probability distribution.

R output for estimating the parameters (using the normalp package) and Chi-squared goodness of fit test:

> paramp(Mauna.Loa)
Mean Mp Sd Sp p
352.38910 352.78344 13.93138 16.38595 3.82405

no.conv = FALSE

> chisq.test(cbind(Mauna.Loa,dnormp(Mauna.Loa,352.78344,16.38595,3.82405)))

Pearson's Chi-squared test

data: cbind(Mauna.Loa, dnormp(Mauna.Loa, 352.78344, 16.38595, 3.82405))


X-squared = 0.0421, df = 30, p-value = 1

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data does not follow the
exponential power probability distribution.

c)
$\hat{\mu} = 352.78344$

So the average CO2 amount at Mauna Loa during the period from 1974 to 2004 is approximately 353 ppm.

7.4.3

a)

With these 8 bins, the data do not appear to be exactly symmetric. However, the pattern may change with
a larger number of bins in the histogram.

b)

H0: The data follows the normal probability distribution.

Ha: The data does not follow the normal probability distribution.

R output for Kolmogorov-Smirnov goodness-of-fit test:

> ks.test(Northern.Region,pnorm,mean(Northern.Region),sd(Northern.Region))

One-sample Kolmogorov-Smirnov test

data: Northern.Region
D = 0.0615, p-value = 0.9989
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data does not follow the
normal probability distribution.

c)
$\bar{x} = 4.38703$, $s = 0.6710476$

95% Confidence Interval: (4.194888 , 4.579173)

We are 95% confident that the true mean lies between 4.194888 and 4.579173.

d)

The data points closely follow the straight line, showing that the data could follow the normal
distribution.

7.4.4

a)

Data seems to be slightly skewed to the right.

b)

H0: The data follows the gamma probability distribution.

Ha: The data does not follow the gamma probability distribution.

R output for parameter estimation (MLE) and Kolmogorov-Smirnov goodness-of-fit test:

> q1=mean(log(Central.Region))
> q2=log(mean(Central.Region))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 33.09928
> beta=(mean(Central.Region))/alpha
> beta
[1] 0.1298179
>
> ks.test(Central.Region,pgamma,alpha,1/beta)

One-sample Kolmogorov-Smirnov test

data: Central.Region
D = 0.0813, p-value = 0.9688
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data does not follow the
gamma probability distribution.

c)
$\bar{x} = 4.296879$, $s = 0.7602079$

95% Confidence Interval: (4.079207 , 4.514551)

We are 95% confident that the true mean lies between 4.079207 and 4.514551.

d)

The data points deviate from the straight line, showing that the data do not follow the normal distribution.

7.4.5

(Note: The problem asks us to use the beta distribution, but the distribution has domain from 0 to 1.
Therefore, I solve the problem using the gamma distribution similarly to the previous problem.)

a)

b)

H0: The data follows the gamma probability distribution.

Ha: The data does not follow the gamma probability distribution.

R output for parameter estimation (MLE) and Kolmogorov-Smirnov goodness-of-fit test:

> q1=mean(log(Southern.Region))
> q2=log(mean(Southern.Region))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 36.66364
> beta=(mean(Southern.Region))/alpha
> beta
[1] 0.1171047
>
> ks.test(Southern.Region,pgamma,alpha,1/beta)

One-sample Kolmogorov-Smirnov test

data: Southern.Region
D = 0.0774, p-value = 0.9802
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data does not follow the
gamma probability distribution.

c)
$\bar{x} = 4.293485$, $s = 0.7076159$

95% Confidence Interval: (4.090872 , 4.496098)

We are 95% confident that the true mean lies between 4.090872 and 4.496098.

d)

The data points deviate from the straight line, showing that the data do not follow the normal distribution.

7.4.6

a) i)

H0: The data follow the three-parameter Weibull probability distribution.

Ha: The data do not follow the three-parameter Weibull probability distribution.

R code for the parameter estimation and the Kolmogorov-Smirnov goodness-of-fit test:

> w3p <- function(x, a,b,c) {c/b*((x -a)/b)^(c-1)*exp(-((x-a)/b)^c)}


> fitdistr(wind$Wind,w3p,list(a = 5, b = 65, c = 1))
a b c
11.7814433 73.5992769 1.7778932
( 4.5188223) ( 7.9794036) ( 0.2639357)
>
> ks.test(wind$Wind-11.7814433,pweibull,1.7778932,73.5992769)

One-sample Kolmogorov-Smirnov test

data: wind$Wind - 11.7814433
D = 0.0818, p-value = 0.7935
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data do not follow the
three-parameter Weibull probability distribution.

ii)

H0: The data follow the three-parameter Weibull probability distribution.

Ha: The data do not follow the three-parameter Weibull probability distribution.

R code for the Anderson-Darling goodness-of-fit test:

> #sort observations


> orderwind=sort(wind$Wind)
>
> #weibull 3 parameter CDF
> w3cdf=function(x,a,b,c) {1-exp(-((x-a)/b)^c)}
>
> Fi=w3cdf(orderwind,11.7814433,73.5992769,1.7778932)
> revFi=rev(Fi)
> n=length(wind$Wind)
> i=c(1:n)
> coef1=((2*i)-1)/n
> coef2=log(Fi)+log(1-revFi)
> x2=coef1*coef2
> s=sum(x2)
>
> A=-n-s
> #Test Statistic
> A
[1] 0.5946242
> AA=(1 + 0.75/n + 2.25/n^2) * A
> pval=exp(1.2937 - 5.709 * AA + 0.0186 * AA^2)
> #p-value
> pval
[1] 0.1180549

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data do not follow the
three-parameter Weibull probability distribution.
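
The Anderson-Darling statistic computed step by step in the transcript can be wrapped in a small reusable function. This is a sketch only: `ad_stat` and `w3cdf` are hypothetical helper names, the example runs on simulated data rather than the hurricane data set, and the p-value approximation used in the transcript is not reproduced.

```r
# Anderson-Darling statistic for a fully specified CDF (hypothetical helper)
ad_stat <- function(x, cdf, ...) {
  n  <- length(x)
  Fi <- cdf(sort(x), ...)            # fitted CDF at the ordered observations
  i  <- seq_len(n)
  -n - sum((2 * i - 1) / n * (log(Fi) + log(1 - rev(Fi))))
}

# Three-parameter Weibull CDF with location, scale and shape
w3cdf <- function(x, loc, scale, shape) 1 - exp(-((x - loc) / scale)^shape)

# Illustration on simulated data (not the data used in the exercise)
set.seed(1)
sim <- 11.78 + rweibull(40, shape = 1.78, scale = 73.6)
ad_stat(sim, w3cdf, loc = 11.78, scale = 73.6, shape = 1.78)
```

The same function can be reused for any other fitted CDF by passing a different `cdf` argument and its parameters.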

b)

Both tests fail to reject the null hypothesis. In our case, with the information given, the Weibull
distribution could be a good fit for our data.

c)
Using the estimated parameters from part a, we can estimate

$$f(x) = \frac{1.7778932}{73.5992769}\left(\frac{x-11.7814433}{73.5992769}\right)^{0.7778932}\exp\left[-\left(\frac{x-11.7814433}{73.5992769}\right)^{1.7778932}\right], \quad x > 11.7814433$$

d) $E(x) = \mu + \beta\,\Gamma\left(1+\frac{1}{\alpha}\right) = 11.78 + 73.60\,\Gamma\left(1+\frac{1}{1.78}\right) = 77.25$

The expected velocity of a category 5 hurricane is 77.25 mph.
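
The closed-form mean of the three-parameter Weibull can be cross-checked by numerical integration of $x f(x)$. A minimal sketch using the unrounded estimates from part a); the result should land close to the 77.25 mph reported in part d), with small differences attributable to rounding. The variable names and the finite upper limit of 1000 (where the fitted density is negligible) are choices made for this illustration.

```r
# Estimates from part a): location, scale, shape
loc   <- 11.7814433
scale <- 73.5992769
shape <- 1.7778932

# Closed form: E(X) = location + scale * Gamma(1 + 1/shape)
mean_closed <- loc + scale * gamma(1 + 1 / shape)

# Fitted density and numerical check of the mean
pdf_w3 <- function(x) {
  z <- (x - loc) / scale
  (shape / scale) * z^(shape - 1) * exp(-z^shape)
}
mean_numeric <- integrate(function(x) x * pdf_w3(x), lower = loc, upper = 1000)$value

c(closed_form = mean_closed, numeric = mean_numeric)
```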

7.4.7

a)

i)

H0: The data follows the Rayleigh probability distribution.

Ha: The data does not follow the Rayleigh probability distribution.

R code for the parameter estimation using MLE and the Kolmogorov-Smirnov goodness-of-fit test:

> sigsqr=(1/(2*n))*sum(wind$Wind^2)
> sig=sqrt(sigsqr)
> sig
[1] 60.87816
> ks.test(wind$Wind,prayleigh,sig)

One-sample Kolmogorov-Smirnov test

data: wind$Wind
D = 0.0739, p-value = 0.8819
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data do not follow the
Rayleigh probability distribution.

ii)

H0: The data follows the Rayleigh probability distribution.

Ha: The data does not follow the Rayleigh probability distribution.

R code for the Anderson-Darling goodness-of-fit test:

> #sort observations


> orderwind=sort(wind$Wind)
>
> #rayleigh CDF
> raycdf=function(x,b) {1-exp(-((x^2)/(2*(b^2))))}
>
> Fi=raycdf(orderwind,sig)
> revFi=rev(Fi)
> n=length(wind$Wind)
> i=c(1:n)
> coef1=((2*i)-1)/n
> coef2=log(Fi)+log(1-revFi)
> x2=coef1*coef2
> s=sum(x2)
>
> A=-n-s
> #Test Statistic
> A
[1] 0.54775
> AA=(1 + 0.75/n + 2.25/n^2) * A
> pval=exp(1.2937 - 5.709 * AA + 0.0186 * AA^2)
> #p-value
> pval
[1] 0.1546356

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data does not follow the
Rayleigh probability distribution.

b)

Both tests fail to reject the null hypothesis. In our case, with the information given, the Rayleigh
distribution might be a good fit for our data.

c)

Using the estimated parameters from part a, we can estimate

$$f(x) = \frac{x}{3706.151}\,e^{-x^2/7412.302}, \quad x > 0$$

Comparing the results for the three-parameter Weibull and the Rayleigh distributions, the Rayleigh
distribution had higher p-values in both tests.

d) $E(x) = \sigma\sqrt{\frac{\pi}{2}} = \sqrt{\frac{3706.15\,\pi}{2}} = 76.30$

The expected velocity of a category 5 hurricane is 76.30 mph.
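
The Rayleigh mean and tail probabilities follow directly from the estimated scale. A brief sketch using the estimate 60.87816 from part a); `ray_surv` is a hypothetical helper, and the 100 mph threshold is chosen only as an example.

```r
sigma_hat <- 60.87816                      # MLE of the Rayleigh scale from part a)

# Mean of a Rayleigh distribution: sigma * sqrt(pi / 2)
mean_ray <- sigma_hat * sqrt(pi / 2)
round(mean_ray, 2)                         # 76.3

# Survival function P(X > x) = exp(-x^2 / (2 * sigma^2))
ray_surv <- function(x, sigma) exp(-x^2 / (2 * sigma^2))
ray_surv(100, sigma_hat)                   # fitted probability of exceeding 100 mph
```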



7.4.8

a)

The data are skewed to the right, so the gamma distribution might be a good fit.

b)

H0: The data follows the gamma probability distribution.

Ha: The data does not follow the gamma probability distribution.

R output for parameter estimation (MLE) and Kolmogorov-Smirnov goodness-of-fit test:

> q1=mean(log(Annual.Av....))
> q2=log(mean(Annual.Av....))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 18.15786
> beta=(mean(Annual.Av....))/alpha
> beta
[1] 0.3209327
>
> ks.test(Annual.Av....,pgamma,alpha,1/beta)

One-sample Kolmogorov-Smirnov test

data: Annual.Av....
D = 0.0849, p-value = 0.8562
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data does not follow the
gamma probability distribution.

c)

Using our estimated parameters, we can estimate the expected value.

$\hat{\alpha}_{MLE} = 18.15786$, $\hat{\beta}_{MLE} = 0.3209327$

$E(x) = 18.15786 \times 0.3209327 = 5.827451$

7.4.9

a)

Looking at the histogram, it is possible that the data follow an exponential distribution.

b)

H0: The sampled data follow an exponential distribution.

Ha: The sampled data do not follow an exponential distribution.

R output for the estimated parameter, Kolmogorov-Smirnov test and Chi-squared test:

> set.seed(321123)
> sizespl=sample(Tumor_SIZE,50)
> sizespl
[1] 8 20 15 24 2 9 52 10 40 36 17 1 3 1 12 18 27 10 22 21 1 40 16 21 45
[26] 10 11 16 4 20 3 7 4 19 9 55 33 40 48 18 11 11 12 9 3 12 8 27 12 11
> beta=mean(sizespl)
> ks.test(sizespl,pexp,1/beta)

One-sample Kolmogorov-Smirnov test

data: sizespl
D = 0.164, p-value = 0.136
alternative hypothesis: two-sided

> chisq.test(cbind(sizespl,dexp(sizespl,1/beta)))

Pearson's Chi-squared test

data: cbind(sizespl, dexp(sizespl, 1/beta))


X-squared = 8.6364, df = 49, p-value = 1

Since in both tests the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data does not follow the
exponential probability distribution.

c)

$$f(x) = \frac{1}{17.68}\,e^{-x/17.68}, \quad x > 0$$

d)

$\hat{\beta}_{MLE} = 17.68$

So the expected tumor size of a breast cancer patient is 17.68 mm.
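
The estimate in part d) is simply the sample mean of the 50 sampled tumor sizes, and the fitted density gives tail probabilities directly. A small sketch using the sample printed in the R output; the 20 mm threshold is only an example.

```r
# The 50 sampled tumor sizes printed in the transcript
sizespl <- c(8, 20, 15, 24, 2, 9, 52, 10, 40, 36, 17, 1, 3, 1, 12, 18, 27,
             10, 22, 21, 1, 40, 16, 21, 45, 10, 11, 16, 4, 20, 3, 7, 4, 19,
             9, 55, 33, 40, 48, 18, 11, 11, 12, 9, 3, 12, 8, 27, 12, 11)

# MLE of the exponential mean is the sample mean
beta_hat <- mean(sizespl)
beta_hat                                              # 17.68

# Fitted probability that a tumor exceeds 20 mm: exp(-20 / beta_hat)
pexp(20, rate = 1 / beta_hat, lower.tail = FALSE)
```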

7.4.10

a)

It is possible that the data follow the two-parameter Weibull distribution.

b)

H0: The sampled data follow a two-parameter Weibull probability distribution.

Ha: The sampled data do not follow a two-parameter Weibull probability distribution.

R output for the estimated parameter and the Kolmogorov-Smirnov test:

> set.seed(123123)
> survt=sample(SRV_TIME_YEAR,50)
> survt
[1] 4.91667 6.75000 1.75000 1.83333 4.83333 3.08333 3.83333 6.25000 5.25000
[10] 2.83333 7.50000 7.58333 6.75000 2.25000 2.16667 2.66667 4.58333 2.33333
[19] 0.16667 2.25000 3.50000 3.83333 4.33333 2.75000 3.66667 2.25000 3.66667
[28] 5.66667 4.41667 0.83333 4.00000 0.25000 3.91667 0.91667 1.83333 3.58333
[37] 3.08333 4.16667 1.66667 2.75000 5.50000 6.50000 1.75000 3.50000 3.91667
[46] 0.41667 2.41667 0.41667 1.16667 6.50000
> fitdistr(survt,"weibull")
shape scale
1.7734463 3.8544480
(0.2051998) (0.3214546)
> ks.test(survt,pweibull,1.7734463,3.8544480)

One-sample Kolmogorov-Smirnov test

data: survt
D = 0.0895, p-value = 0.8182
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data do not follow the
two-parameter Weibull probability distribution.

c)

$$f(x) = \frac{1.7734463}{3.8544480}\left(\frac{x}{3.8544480}\right)^{0.7734463}\exp\left[-\left(\frac{x}{3.8544480}\right)^{1.7734463}\right], \quad x > 0$$

Once we have the pdf, we can obtain the expected values and probabilities.

$$F(x) = 1 - \exp\left[-\left(\frac{x}{3.8544480}\right)^{1.7734463}\right], \quad x > 0$$

d)
Survival function

$$S(x) = 1 - \left\{1 - \exp\left[-\left(\frac{x}{3.8544480}\right)^{1.7734463}\right]\right\}, \quad x > 0$$

$$S(x) = \exp\left[-\left(\frac{x}{3.8544480}\right)^{1.7734463}\right], \quad x > 0$$

e)

From the survival function we can obtain the probabilities of surviving more than “t” years for a breast
cancer patient.
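
As an illustration of part e), the fitted survival function can be evaluated at a few time points. This sketch uses the `fitdistr` estimates from part b); the time points 1, 3 and 5 years are arbitrary examples, and `surv` is a hypothetical helper name.

```r
# Weibull estimates from part b)
shape_hat <- 1.7734463
scale_hat <- 3.8544480

# Survival function S(t) = exp(-(t / scale)^shape)
surv <- function(t) exp(-(t / scale_hat)^shape_hat)

t_years <- c(1, 3, 5)
cbind(t = t_years, S = surv(t_years))

# Same values from the built-in Weibull CDF (upper tail)
all.equal(surv(t_years),
          pweibull(t_years, shape = shape_hat, scale = scale_hat, lower.tail = FALSE))
```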

7.4.11

a)

b)

H0: The sampled data follow a two-parameter Weibull probability distribution.

Ha: The sampled data do not follow a two-parameter Weibull probability distribution.

R output for the estimated parameter and the Kolmogorov-Smirnov test:

> lung=read.csv("Lung cancer data.csv",header=T)


>
> lungs=split(lung,lung$SEX)
>
> lung.m=as.data.frame(lungs[1])
> lung.f=as.data.frame(lungs[2])
>
> set.seed(321123)
> lung.ms=sample(lung.m$X1.Tumor,60)
>
> fitdistr(lung.ms,"weibull")
shape scale
2.1086635 48.2347045
( 0.2076375) ( 3.1234964)
> ks.test(lung.ms,pweibull,2.1086635,48.2347045)

One-sample Kolmogorov-Smirnov test

data: lung.ms
D = 0.0925, p-value = 0.683
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data do not follow the
two-parameter Weibull probability distribution.

c)

$$f(x) = \frac{2.1086635}{48.2347045}\left(\frac{x}{48.2347045}\right)^{1.1086635}\exp\left[-\left(\frac{x}{48.2347045}\right)^{2.1086635}\right], \quad x > 0$$

Estimated $E(x)$:

$$E(x) = 48.2347045\,\Gamma\left(1+\frac{1}{2.1086635}\right) = 42.72008$$

So the average malignant tumor size of a male lung cancer patient is 42.72 mm.

7.4.12

a)

b) H0: The sampled data follow a two-parameter Weibull probability distribution.

Ha: The sampled data do not follow a two-parameter Weibull probability distribution.

R output for the estimated parameter and the Kolmogorov-Smirnov test:

> lung=read.csv("Lung cancer data.csv",header=T)


>
> lungs=split(lung,lung$SEX)
>
> lung.m=as.data.frame(lungs[1])
> lung.f=as.data.frame(lungs[2])
>
> set.seed(321123)
> lung.fs=sample(lung.f$X2.Tumor,60)
>
> fitdistr(lung.fs,"weibull")
shape scale
1.8975835 39.8984887
( 0.1870386) ( 2.8626266)
> ks.test(lung.fs,pweibull,1.8975835,39.8984887)

One-sample Kolmogorov-Smirnov test

data: lung.fs
D = 0.1161, p-value = 0.3936
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data do not follow the
two-parameter Weibull probability distribution.

c)

$$f(x) = \frac{1.8975835}{39.8984887}\left(\frac{x}{39.8984887}\right)^{0.8975835}\exp\left[-\left(\frac{x}{39.8984887}\right)^{1.8975835}\right], \quad x > 0$$

Estimated $E(x)$:

$$E(x) = 39.8984887\,\Gamma\left(1+\frac{1}{1.8975835}\right) = 35.40591$$

So the average malignant tumor size of a female lung cancer patient is 35.41 mm.

7.4.13

From the samples drawn, male tumors are on average around 7 mm larger than female tumors. At the same
time, the histograms show less variance in tumor size for females than for males.

7.4.14

a)

b) H0: The sampled data follow a Weibull probability distribution.

Ha: The sampled data do not follow a Weibull probability distribution.

R output for the estimated parameter and the Kolmogorov-Smirnov test:

> lung=read.csv("Lung cancer data.csv",header=T)


>
> lungs=split(lung,lung$SEX)
>
> lung.m=as.data.frame(lungs[1])
> lung.f=as.data.frame(lungs[2])
>
> set.seed(321123)
> lung.stms=sample(lung.m$X1.SurvTimeMT,50)
>
> fitdistr(lung.stms,"weibull")
shape scale
0.79037722 29.77144952
( 0.08391815) ( 5.64803367)
> ks.test(lung.stms,pweibull,0.79037722,29.77144952)

One-sample Kolmogorov-Smirnov test

data: lung.stms
D = 0.1457, p-value = 0.2393
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data do not follow the
Weibull probability distribution.

c)

$$f(x) = \frac{0.79037722}{29.77144952}\left(\frac{x}{29.77144952}\right)^{-0.2096228}\exp\left[-\left(\frac{x}{29.77144952}\right)^{0.79037722}\right], \quad x > 0$$

$$F(x) = 1 - \exp\left[-\left(\frac{x}{29.77144952}\right)^{0.79037722}\right], \quad x > 0$$

d) Survival function

$$S(x) = 1 - \left\{1 - \exp\left[-\left(\frac{x}{29.77144952}\right)^{0.79037722}\right]\right\}, \quad x > 0$$

$$S(x) = \exp\left[-\left(\frac{x}{29.77144952}\right)^{0.79037722}\right], \quad x > 0$$

e)

7.4.15

a)

b) H0: The sampled data follow a Weibull probability distribution.

Ha: The sampled data do not follow a Weibull probability distribution.

R output for the estimated parameter and the Kolmogorov-Smirnov test:

> lung=read.csv("Lung cancer data.csv",header=T)


>
> lungs=split(lung,lung$SEX)
>
> lung.m=as.data.frame(lungs[1])
> lung.f=as.data.frame(lungs[2])
>
> set.seed(321321)
> lung.stfs=sample(lung.f$X2.SurvTimeMT,50)
>
> fitdistr(lung.stfs,"weibull")
shape scale
0.9689514 22.6733066
( 0.1067687) ( 3.4941692)
> ks.test(lung.stfs,pweibull,0.9689514,22.6733066)

One-sample Kolmogorov-Smirnov test

data: lung.stfs
D = 0.0561, p-value = 0.9975
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.


At the 5% level of significance, there is not enough evidence to show that the data do not follow the
Weibull probability distribution.

c)

$$f(x) = \frac{0.9689514}{22.6733066}\left(\frac{x}{22.6733066}\right)^{-0.0310486}\exp\left[-\left(\frac{x}{22.6733066}\right)^{0.9689514}\right], \quad x > 0$$

$$F(x) = 1 - \exp\left[-\left(\frac{x}{22.6733066}\right)^{0.9689514}\right], \quad x > 0$$

d) Survival function

$$S(x) = 1 - \left\{1 - \exp\left[-\left(\frac{x}{22.6733066}\right)^{0.9689514}\right]\right\}, \quad x > 0$$

$$S(x) = \exp\left[-\left(\frac{x}{22.6733066}\right)^{0.9689514}\right], \quad x > 0$$

e)

7.4.16

Males on average tend to have a longer survival time than the females.
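
One way to back up this comparison is to compute the mean survival time implied by each fitted Weibull model, using the standard result $E(X) = \text{scale} \cdot \Gamma(1 + 1/\text{shape})$. The sketch below plugs in the estimates from Exercises 7.4.14 and 7.4.15; `weibull_mean` is a hypothetical helper, and the means are in the same time unit as the survival-time variable in the data set.

```r
# Mean of a two-parameter Weibull: scale * Gamma(1 + 1/shape)
weibull_mean <- function(shape, scale) scale * gamma(1 + 1 / shape)

# Estimates reported by fitdistr in 7.4.14 (males) and 7.4.15 (females)
mean_male   <- weibull_mean(shape = 0.79037722, scale = 29.77144952)
mean_female <- weibull_mean(shape = 0.9689514,  scale = 22.6733066)

c(male = mean_male, female = mean_female)
mean_male > mean_female    # TRUE
```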

7.4.17

a)

b)

H0: The sampled data follow a two-parameter Weibull probability distribution.

Ha: The sampled data do not follow a two-parameter Weibull probability distribution.

R output for the estimated parameter and the Kolmogorov-Smirnov test:

> colon=read.csv("seercoloncancerdata.csv",header=T)
>
> colons=split(colon,colon$SEX)
>
> colon.m=as.data.frame(colons[1])
> colon.f=as.data.frame(colons[2])
>
> set.seed(323121)
> colon.ms=sample(colon.m$X1.CS_SIZE,50)
>
> fitdistr(colon.ms,"weibull")
shape scale
1.6269871 57.0717161
( 0.1779206) ( 5.2153708)
> ks.test(colon.ms,pweibull,1.6269871,57.0717161)

One-sample Kolmogorov-Smirnov test

data: colon.ms
D = 0.1162, p-value = 0.5096
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data do not follow the
two-parameter Weibull probability distribution.

c)

$$f(x) = \frac{1.6269871}{57.0717161}\left(\frac{x}{57.0717161}\right)^{0.6269871}\exp\left[-\left(\frac{x}{57.0717161}\right)^{1.6269871}\right], \quad x > 0$$

Estimated $E(x)$:

$$E(x) = 57.0717161\,\Gamma\left(1+\frac{1}{1.6269871}\right) = 51.09328$$

7.4.18

a)

b)

H0: The sampled data follow a Weibull probability distribution.

Ha: The sampled data do not follow a Weibull probability distribution.

R output for the estimated parameter and the Kolmogorov-Smirnov test:

> colon=read.csv("seercoloncancerdata.csv",header=T)
>
> colons=split(colon,colon$SEX)
>
> colon.m=as.data.frame(colons[1])
> colon.f=as.data.frame(colons[2])
>
> set.seed(121323)
> colon.fs=sample(colon.f$X2.CS_SIZE,50)
>
> fitdistr(colon.fs,"weibull")
shape scale
1.7232673 48.4527073
( 0.1659436) ( 4.2061986)
> ks.test(colon.fs,pweibull,1.7232673,48.4527073)

One-sample Kolmogorov-Smirnov test

data: colon.fs
D = 0.1357, p-value = 0.3163
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data do not follow the
Weibull probability distribution.

c)

$$f(x) = \frac{1.7232673}{48.4527073}\left(\frac{x}{48.4527073}\right)^{0.7232673}\exp\left[-\left(\frac{x}{48.4527073}\right)^{1.7232673}\right], \quad x > 0$$

Estimated $E(x)$:

$$E(x) = 48.4527073\,\Gamma\left(1+\frac{1}{1.7232673}\right) = 43.19307$$

7.4.19

a)

b) H0: The sampled data follow a gamma probability distribution.

Ha: The sampled data do not follow a gamma probability distribution.

R output for the estimated parameter and the Kolmogorov-Smirnov test:

> colon=read.csv("seercoloncancerdata.csv",header=T)
>
> colons=split(colon,colon$SEX)
>
> colon.m=as.data.frame(colons[1])
> colon.f=as.data.frame(colons[2])
>
> set.seed(112233)
> colon.msts=sample(colon.m$X1.SRV_TIME_YEAR,50)
>
> q1=mean(log(colon.msts))
> q2=log(mean(colon.msts))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 1.621226
> beta=(mean(colon.msts))/alpha
> beta
[1] 1.942974
>
> ks.test(colon.msts,pgamma,alpha,1/beta)

One-sample Kolmogorov-Smirnov test

data: colon.msts
D = 0.0799, p-value = 0.9069
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data does not follow the
gamma probability distribution.

c)

$$f(x) = \frac{1}{\Gamma(1.621226)\,1.942974^{1.621226}}\,x^{1.621226-1}\,e^{-x/1.942974}, \quad x > 0$$

$$f(x) = \frac{1}{2.630375}\,x^{0.621226}\,e^{-x/1.942974}, \quad x > 0$$

Estimated $E(x) = 1.621226 \times 1.942974 = 3.15$

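
The normalising constant in the simplified density can be verified numerically, and the hand-written density can be compared with R's `dgamma`. A minimal sketch using the estimates from part b); `f_hat` and the grid of evaluation points are illustrative choices.

```r
alpha_hat <- 1.621226    # shape estimate from part b)
beta_hat  <- 1.942974    # scale estimate from part b)

# Normalising constant Gamma(alpha) * beta^alpha, reported as 2.630375
const <- gamma(alpha_hat) * beta_hat^alpha_hat
const

# Fitted density written out by hand
f_hat <- function(x) x^(alpha_hat - 1) * exp(-x / beta_hat) / const

# Agreement with the built-in gamma density at a few points
x_grid <- c(0.5, 2, 5)
all.equal(f_hat(x_grid), dgamma(x_grid, shape = alpha_hat, scale = beta_hat))

# Estimated mean alpha * beta
alpha_hat * beta_hat
```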
7.4.20

a)

b) H0: The sampled data follow a gamma probability distribution.

Ha: The sampled data do not follow a gamma probability distribution.

R output for the estimated parameter and the Kolmogorov-Smirnov test:

> colon=read.csv("seercoloncancerdata.csv",header=T)
> colons=split(colon,colon$SEX)
>
> colon.m=as.data.frame(colons[1])
> colon.f=as.data.frame(colons[2])
> set.seed(123321)
> colon.fsts=sample(colon.f$X2.SRV_TIME_YEAR,50)
>
> q1=mean(log(colon.fsts))
> q2=log(mean(colon.fsts))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 1.262591
> beta=(mean(colon.fsts))/alpha
> beta
[1] 2.47771
>
> ks.test(colon.fsts,pgamma,alpha,1/beta)

One-sample Kolmogorov-Smirnov test

data: colon.fsts
D = 0.104, p-value = 0.6519
alternative hypothesis: two-sided

Since the p-value > .05, we fail to reject H0.

At the 5% level of significance, there is not enough evidence to show that the data does not follow the
gamma probability distribution.

c)

$$f(x) = \frac{1}{\Gamma(1.262591)\,2.47771^{1.262591}}\,x^{1.262591-1}\,e^{-x/2.47771}, \quad x > 0$$

$$f(x) = \frac{1}{2.842123}\,x^{0.262591}\,e^{-x/2.47771}, \quad x > 0$$

Estimated $E(x) = 1.262591 \times 2.47771 = 3.128333$

7.4.21

On average, males and females have a similar survival time, but females have a larger variance than males.
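
This comparison can be quantified from the fitted gamma models in Exercises 7.4.19 and 7.4.20, using the standard results that a gamma distribution with shape $\alpha$ and scale $\beta$ has mean $\alpha\beta$ and variance $\alpha\beta^2$. A short sketch; `gamma_summary` is a hypothetical helper.

```r
# Mean and variance of a gamma distribution with given shape and scale
gamma_summary <- function(shape, scale) {
  c(mean = shape * scale, variance = shape * scale^2)
}

# Estimates from 7.4.19 (males) and 7.4.20 (females)
male   <- gamma_summary(shape = 1.621226, scale = 1.942974)
female <- gamma_summary(shape = 1.262591, scale = 2.47771)

rbind(male, female)
```

The two means differ only slightly, while the fitted variance is larger for females.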
