Mathematical Statistics With Applications in R 2nd Edition Ramachandran Solutions Manual

Chapter 7
GOODNESS OF FIT TESTS AND APPLICATIONS
7.1 INTRODUCTION
7.2 CHI-SQUARE TESTS FOR COUNT DATA
7.3 GOODNESS-OF-FIT TESTS TO IDENTIFY THE PROBABILITY DISTRIBUTION
7.4 SOME IMPORTANT APPLICATIONS: PARAMETRIC ANALYSIS
7.5 CHAPTER SUMMARY
7.6 COMPUTER EXAMPLES
Projects for Chapter 7
Exercise 7.2
7.2.1
H0: The coin is fair.
Ha: The coin is unfair.
p1 = p2 = .5
E1 = E2 = 200 (.5 ) = 100
$Q^2 = \frac{(104 - 100)^2}{100} + \frac{(96 - 100)^2}{100} = \frac{16 + 16}{100} = 0.32$
Let $\alpha = 0.05$.
Since k = 2, the rejection region is $Q^2 \geq \chi^2_{1,0.05} = 3.841$.
Since .32 < 3.841, we fail to reject H0.
At the 5% level of significance, there is not enough evidence to show that the coin is unfair.
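The hand calculation above can be cross-checked in R. This is a minimal sketch using the observed counts 104 and 96 from the solution; the variable names are illustrative only.

```r
# Observed heads/tails counts from Exercise 7.2.1
observed <- c(heads = 104, tails = 96)

# Goodness-of-fit test against a fair coin (p1 = p2 = 0.5)
fit <- chisq.test(observed, p = c(0.5, 0.5))
Q2 <- unname(fit$statistic)      # test statistic Q^2

# Upper 5% critical value of the chi-square distribution with k - 1 = 1 d.f.
crit <- qchisq(0.95, df = 1)

# TRUE would mean "reject H0"
reject <- Q2 >= crit
```

Here `Q2` reproduces the value 0.32 and `crit` the tabulated 3.841, so `reject` is `FALSE`.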
7.2.2
H0: Opinion on collective bargaining is independent of employee classification.
Ha: Opinion on collective bargaining is dependent on employee classification.
Here c = 3 and r = 3, d.f. = (3 − 1)(3 − 1) = 4, $\alpha = 0.05$
RR: $Q^2 \geq \chi^2_{4,0.05} = 9.488$
$E_{11} = \frac{(60)(90)}{200} = 27$, $E_{12} = \frac{(60)(50)}{200} = 15$, $E_{13} = \frac{(60)(60)}{200} = 18$
$E_{21} = \frac{(100)(90)}{200} = 45$, $E_{22} = \frac{(100)(50)}{200} = 25$, $E_{23} = \frac{(100)(60)}{200} = 30$
$E_{31} = \frac{(40)(90)}{200} = 18$, $E_{32} = \frac{(40)(50)}{200} = 10$, $E_{33} = \frac{(40)(60)}{200} = 12$
$Q^2 = \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 43.86$
Since 43.86 > 9.488, we reject H0
At the 5% level of significance, there is enough evidence to show that the opinion on collective
bargaining is dependent on employee classification.
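The expected counts in 7.2.2 are each (row total)(column total)/n, which can be sketched in R as an outer product. The marginal totals are taken from the solution; which margin corresponds to opinion and which to classification is not stated here, so the names below are placeholders.

```r
# Marginal totals read off the expected-count formulas in 7.2.2
row_totals <- c(60, 100, 40)
col_totals <- c(90, 50, 60)
n <- sum(row_totals)                      # grand total, 200

# E_ij = (row total i)(column total j) / n
expected <- outer(row_totals, col_totals) / n

# Degrees of freedom and 5% critical value for the independence test
df <- (length(row_totals) - 1) * (length(col_totals) - 1)
crit <- qchisq(0.95, df)
```

With the observed table available, `sum((observed - expected)^2 / expected)` would give the statistic that is compared with `crit`.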
7.2.3
H0: Major of undergraduate students is independent of gender.
Ha: Major of undergraduate students is dependent on gender.
Here c = 4 and r = 2, d.f. = (4 − 1)(2 − 1) = 3, $\alpha = 0.01$
RR: $Q^2 \geq \chi^2_{3,0.01} = 11.345$
$E_{11} = \frac{(205)(120)}{300} = 82$, $E_{12} = \frac{(205)(52)}{300} = 35.533$
$E_{13} = \frac{(205)(39)}{300} = 26.65$, $E_{14} = \frac{(205)(89)}{300} = 60.817$
$E_{21} = \frac{(95)(120)}{300} = 38$, $E_{22} = \frac{(95)(52)}{300} = 16.467$
$E_{23} = \frac{(95)(39)}{300} = 12.35$, $E_{24} = \frac{(95)(89)}{300} = 28.183$
$Q^2 = \sum_{i=1}^{2} \sum_{j=1}^{4} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 4.5979$
Since 4.5979 < 11.345, we fail to reject H0
At the 1% level of significance, there is not enough evidence to show that the major of undergraduate
students is dependent on gender.
7.2.4
H0: p1 = .35 , p2 = .2 , p3 = .15 , p4 = .3

Ha: At least one of the probabilities is different from the hypothesized value.
Category   1       2       3      4
Obs        427.5   237.5   19     266
Exp        380     190     47.5   332.5
Here k = 4, d.f. = 3, $\alpha = 0.05$
RR: $Q^2 \geq \chi^2_{3,0.05} = 7.815$
$Q^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = 48.2125$
Since 48.2125 > 7.815, we reject H0
At the 5% level of significance, there is enough evidence to show that at least one of the probabilities is
different from the hypothesized value.
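The statistic in 7.2.4 follows directly from the tabulated observed and expected counts. A short sketch, using only the numbers printed in the table:

```r
# Observed and expected counts as tabulated in 7.2.4
observed <- c(427.5, 237.5, 19, 266)
expected <- c(380, 190, 47.5, 332.5)

# Pearson statistic: sum of (O - E)^2 / E over the k = 4 cells
Q2 <- sum((observed - expected)^2 / expected)

# 5% critical value with k - 1 = 3 degrees of freedom
crit <- qchisq(0.95, df = 3)
```

The four cell contributions are 5.9375, 11.875, 17.1 and 13.3, which add up to the reported 48.2125.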
7.2.5
(a)
H0: p1 = .2 , p2 = .2 , p3 = .3 , p4 = .2 , p5 = .1
Ha: At least one of the probabilities is different from the hypothesized value.
1 2 3 4 5
Obs 22 21 29 17 11
Exp 20 20 30 20 10
Here k = 5, d.f. = 4, $\alpha = 0.05$
RR: $Q^2 \geq \chi^2_{4,0.05} = 9.488$
$Q^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = .833$
Since .833 < 9.488, we fail to reject H0
At the 5% level of significance, there is not enough evidence to show that at least one of the probabilities
is different from the hypothesized value.
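For part (a), the same conclusion can be reached through the p-value instead of the rejection region. A minimal sketch with the counts and hypothesized probabilities given above:

```r
# Observed counts and hypothesized probabilities from 7.2.5 (a)
observed <- c(22, 21, 29, 17, 11)
p0 <- c(0.2, 0.2, 0.3, 0.2, 0.1)

fit <- chisq.test(observed, p = p0)
expected <- unname(fit$expected)     # n * p0
Q2 <- unname(fit$statistic)          # Pearson statistic
p_value <- fit$p.value               # P(chi-square with 4 d.f. >= Q2)
```

A p-value above 0.05 corresponds to a statistic below the critical value 9.488, which is the "fail to reject" decision stated in the solution.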
(b)
H0: The choice of footwear by undergraduate students is independent of their gender.
Ha: The choice of footwear by undergraduate students is dependent on their gender.
Here c = 5 and r = 2, d.f. = (5 − 1)(2 − 1) = 4, $\alpha = 0.05$
RR: $Q^2 \geq \chi^2_{4,0.05} = 9.488$
$E_{11} = E_{21} = \frac{(22)(50)}{100} = 11$, $E_{12} = E_{22} = \frac{(21)(50)}{100} = 10.5$
$E_{13} = E_{23} = \frac{(29)(50)}{100} = 14.5$, $E_{14} = E_{24} = \frac{(17)(50)}{100} = 8.5$
$E_{15} = E_{25} = \frac{(11)(50)}{100} = 5.5$
$Q^2 = \sum_{i=1}^{2} \sum_{j=1}^{5} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 3.8428$
Since 3.8428 < 9.488, we fail to reject H0
At the 5% level of significance, there is not enough evidence to show that the choice of footwear by
undergraduate students is dependent on their gender.
7.2.6
H0: The dice are fair.
Ha: At least one of the three dice is unfair.
$p_1 = \frac{125}{216}$, $p_2 = \frac{75}{216}$, $p_3 = \frac{15}{216}$, $p_4 = \frac{1}{216}$
$E_1 = 150 \left( \frac{125}{216} \right) = 86.81$, $E_2 = 150 \left( \frac{75}{216} \right) = 52.08$
$E_3 = 150 \left( \frac{15}{216} \right) = 10.42$, $E_4 = 150 \left( \frac{1}{216} \right) = .69$
(Note: The assumption $E_i \geq 5$ is not met.)
Here k = 4, d.f. = 3, $\alpha = 0.05$
RR: $Q^2 \geq \chi^2_{3,0.05} = 7.815$
$Q^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = 54.15$
Since 54.15 > 7.815, we reject H0
At the 5% level of significance, there is enough evidence to show that at least one of the three dice is
unfair.
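The four cell probabilities in 7.2.6 are the binomial probabilities for 0, 1, 2 and 3 occurrences in three rolls with success probability 1/6. The sketch below assumes that interpretation (the exercise statement itself is not reproduced here) and recomputes the expected counts for 150 trials.

```r
# Binomial(3, 1/6) probabilities for 0, 1, 2, 3 successes
p <- dbinom(0:3, size = 3, prob = 1 / 6)

# Expected counts for 150 trials
expected <- 150 * p

# Cells that violate the rule of thumb E_i >= 5
too_small <- expected < 5
```

Only the last cell (expected count about 0.69) falls below 5, which is the violation flagged in the note.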
Exercise 7.3
7.3.1
H0: The speed of vehicles is normally distributed.
Ha: The speed of vehicles is not normally distributed.
$p_1 = .000144$, $p_2 = .13$, $p_3 = .785$, $p_4 = .084$, $p_5 = 8.84 \times 10^{-5}$
$E_1 = .022$, $E_2 = 19.52$, $E_3 = 117.77$, $E_4 = 12.67$, $E_5 = .013$
(Note: The assumption $E_i \geq 5$ is not met.)
Here k = 5, d.f. = 4, $\alpha = 0.01$
RR: $Q^2 \geq \chi^2_{4,0.01} = 13.277$
$Q^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = 9352.66$
Since 9352.66 > 13.277, we reject H0
At the 1% level of significance, there is enough evidence to show that the speed of vehicles is not
normally distributed.
7.3.2
H0: Daily mean temperatures are normally distributed.
Ha: Daily mean temperatures are not normally distributed.
$p_1 = 8.37 \times 10^{-19}$, $p_2 = 4.33 \times 10^{-6}$, $p_3 = .27$, $p_4 = .73$, $p_5 = 2.6 \times 10^{-4}$
$E_1 = 4.18 \times 10^{-17}$, $E_2 = 6.67 \times 10^{-5}$, $E_3 = 13.5$, $E_4 = 36.48$, $E_5 = .013$
(Note: The assumption $E_i \geq 5$ is not met.)
Here k = 5, d.f. = 4, $\alpha = 0.05$
RR: $Q^2 \geq \chi^2_{4,0.05} = 9.488$
$Q^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} \approx 106$
Since 106 > 9.488, we reject H0
At the 5% level of significance, there is enough evidence to show that daily mean temperatures are not
normally distributed.
7.3.3
H0: The lifetime of components follows an exponential distribution.
Ha: The lifetime of components does not follow an exponential distribution.
$p_1 = .39$, $p_2 = .24$, $p_3 = .14$, $p_4 = .09$, $p_5 = .14$
$E_1 = 11.8$, $E_2 = 7.16$, $E_3 = 4.34$, $E_4 = 2.63$, $E_5 = 4.06$
(Note: The assumption $E_i \geq 5$ is not met.)
Interval   0-100   100-200   200-300   300-400   >400
Obs        15      6         4         2         3
Here k = 5, d.f. = 4, $\alpha = 0.05$
RR: $Q^2 \geq \chi^2_{4,0.05} = 9.488$
$Q^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = 1.493$
Since 1.493 < 9.488, we fail to reject H0
At the 5% level of significance, there is not enough evidence to show that the lifetime of components
does not follow an exponential distribution.
Using the R-code for example 7.3.1

7.3.4
a) Estimate α and β using MLE
$\hat{\beta}_{MLE} = \frac{\bar{x}}{\hat{\alpha}}$, $\hat{\alpha}_{MLE} \approx \frac{0.5}{\log(\bar{x}) - \frac{1}{n} \sum_{i=1}^{n} \log(x_i)}$
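The two estimators can be wrapped in a small helper, since the same approximation is reused in later exercises. This is a sketch only: `gamma_mle_approx` is a hypothetical name, and the simulated data stand in for a real sample.

```r
# Approximate gamma MLEs: alpha ~ 0.5 / (log(mean(x)) - mean(log(x))),
# beta = mean(x) / alpha (beta is the scale parameter)
gamma_mle_approx <- function(x) {
  s <- log(mean(x)) - mean(log(x))   # always >= 0 by Jensen's inequality
  alpha <- 0.5 / s
  beta <- mean(x) / alpha
  list(alpha = alpha, beta = beta)
}

# Illustrative data only (not from the exercise)
set.seed(1)
demo <- rgamma(200, shape = 4, scale = 2)
est <- gamma_mle_approx(demo)
```

By construction `est$alpha * est$beta` equals the sample mean, which is the gamma mean under this parameterization.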
R Output:
> x=c(27, 25, 24, 24, 22, 20, 21, 22, 21, 25, 24,
+ 26, 25, 24, 23, 22, 20, 21, 19, 21, 25, 24,
+ 26, 25, 22, 23, 22, 22, 21, 19, 21, 23, 21,
+ 26, 24, 22, 23, 22, 22, 20, 19, 21, 23, 21,
+ 26, 24, 22, 23, 21, 19, 20, 18, 20, 20, 18)
>
> q1=mean(log(x))
> q2=log(mean(x))
> s=q2-q1
>
> alpha=0.5/s
> alpha
[1] 102.0185
> beta=(mean(x))/alpha
> beta
[1] 0.2181423
$\hat{\beta}_{MLE} \approx 0.2181423$, $\hat{\alpha}_{MLE} \approx 102.0185$

H0: The data follow the gamma distribution.
Ha: The data do not follow the gamma distribution.
Interval   0-20   20-22   22-24   24-26   >26
Obs        12     20      13      9       1
$O_1 = 12$, $O_2 = 20$, $O_3 = 13$, $O_4 = 9$, $O_5 = 1$
$P_1 = .15$, $P_2 = .31$, $P_3 = .32$, $P_4 = .16$, $P_5 = .05$
$E_1 = 8.38$, $E_2 = 17.31$, $E_3 = 17.75$, $E_4 = 8.82$, $E_5 = 2.75$
$Q^2 = 4.37$
k = 5 and two parameters were estimated, so d.f. = 5 − 1 − 2 = 2. RR: $Q^2 \geq \chi^2_{2,0.05} = 5.991$
Since 4.37 < 5.991, we fail to reject H0
At the 5% level of significance, there is not enough evidence to show that the data do not follow the gamma
distribution.
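When parameters are estimated from the data, the degrees of freedom drop by the number of estimated parameters. The sketch below recomputes the statistic for part a) from the rounded expected counts listed above and applies that adjustment.

```r
# Observed and (rounded) expected counts for the fitted gamma model
observed <- c(12, 20, 13, 9, 1)
expected <- c(8.38, 17.31, 17.75, 8.82, 2.75)

Q2 <- sum((observed - expected)^2 / expected)

# k cells, minus 1, minus the 2 estimated parameters (alpha, beta)
df <- length(observed) - 1 - 2
crit <- qchisq(0.95, df)
```

The result agrees with the reported 4.37 up to rounding of the expected counts, and lies below the critical value 5.991.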
b) R Output for estimating the parameters using the MASS package:
> library(MASS)
> x=c(27, 25, 24, 24, 22, 20, 21, 22, 21, 25, 24, 26, 25, 24, 23, 22, 20, 21, 19, 21, 25, 24,
+ 26, 25, 22, 23, 22, 22, 21, 19, 21, 23, 21, 26, 24, 22, 23, 22, 22, 20, 19, 21, 23, 21,
+ 26, 24, 22, 23, 21, 19, 20, 18, 20, 20, 18)
>
> fitdistr(x,"weibull")
shape scale
10.8855539 23.2637020
( 1.1087929) ( 0.3053506)
$\hat{\beta}_{MLE} \approx 23.2637020$ (scale), $\hat{\alpha}_{MLE} \approx 10.8855539$ (shape)

H0: The data follow the Weibull distribution.
Ha: The data do not follow the Weibull distribution.
Interval   0-20   20-22   22-24   24-26   >26
Obs        12     20      13      9       1
$O_1 = 12$, $O_2 = 20$, $O_3 = 13$, $O_4 = 9$, $O_5 = 1$
$P_1 = .18$, $P_2 = .24$, $P_3 = .33$, $P_4 = .21$, $P_5 = .034$
$E_1 = 9.65$, $E_2 = 13.44$, $E_3 = 18.4$, $E_4 = 11.59$, $E_5 = 1.92$
$Q^2 = 6.37$
k = 5 and two parameters were estimated, so d.f. = 5 − 1 − 2 = 2. RR: $Q^2 \geq \chi^2_{2,0.05} = 5.991$
Since 6.37 > 5.991, we reject H0
At the 5% level of significance, there is enough evidence to show that the data do not follow the Weibull
distribution.
7.3.5 P-P Plot:
Since the data points do not fall on the straight line, we can determine that the data doesn't follow the
normal pdf.
7.3.6
> #7.3.6
> #PP Plot
> xi=c(59.5,69.5,79.5,89.5,100.5)
> fi=c(12,36,90,44,18)
> cumfreqi=c(12,48,138,182,200)
> (cihat=cumfreqi/201)
[1] 0.05970149 0.23880597 0.68656716 0.90547264 0.99502488
> midxi=c(29.5,64.5,74.5,84.5,95)
> (xbar=sum(midxi*fi)/200)
[1] 74.045
> (variance=(1/199)*sum(fi*((midxi-xbar)^2)))
[1] 200.1161
> (sd=sqrt(variance))
[1] 14.14624
> (ci=pnorm(xi,xbar,sd))
[1] 0.1519306 0.3739965 0.6501090 0.8626969 0.9692656
> plot(cihat,ci,main="Normal P-P Plot for Grades",xlab="Observed Cum Prob",ylab="Exp Cumulative Prob")
> abline(lsfit(cihat,ci))
Since the data points fall on the straight line, we can conclude that the data follow the normal
pdf.
7.3.7
Q-Q Plot:
Since the data points deviate from the straight line, we can conclude that the data doesn't follow
the normal distribution.
7.3.8
> #7.3.8
> #QQ Plot
> xi=c(59.5,69.5,79.5,89.5,100.5)
> fi=c(12,36,90,44,18)
> cumfreqi=c(12,48,138,182,200)
> (cihat=cumfreqi/201)
[1] 0.05970149 0.23880597 0.68656716 0.90547264 0.99502488
> midxi=c(29.5,64.5,74.5,84.5,95)
> (xbar=sum(midxi*fi)/200)
[1] 74.045
> (variance=(1/199)*sum(fi*((midxi-xbar)^2)))
[1] 200.1161
> (sd=sqrt(variance))
[1] 14.14624
> (xihat=qnorm(cihat,xbar,sd))
[1] 52.01528 63.99907 80.92210 92.62440 110.50769
> plot(xihat,xi,main="Normal Q-Q Plot for Grades",xlab="Theoretical Quantiles",ylab="Sample Quantiles")
> abline(lsfit(xihat,xi))
Since the data points fall on the straight line, we can conclude that the data follow the normal
pdf.
Exercise 7.4
7.4.1
a)
The histogram alone does not identify the distribution of these data. The histogram appears to be
slightly right skewed.
b)
H0: The data follows the exponential power probability distribution.
Ha: The data does not follow the exponential power probability distribution.
R output for estimating the parameters (using the normalp package) and Chi-squared goodness of fit test:
> paramp(Point.Barrow)
Mean Mp Sd Sp p
353.817226 354.399367 13.809308 16.742494 4.596905
no.conv = FALSE
>
> chisq.test(cbind(Point.Barrow,dnormp(Point.Barrow,354.399367,16.742494,4.596905)))
Pearson's Chi-squared test
data: cbind(Point.Barrow, dnormp(Point.Barrow, 354.399367, 16.742494, 4.596905))

X-squared = 0.0348, df = 30, p-value = 1
Since the p-value > .05, we fail to reject H0.
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
exponential power probability distribution.
c)
$\hat{\mu} = 354.399367$
So the average CO2 amount at Point Barrow during the period 1974 to 2004 is approximately 354
parts per million (ppm).
7.4.2
a)
The histogram is clearly skewed to the right.

b)
H0: The data follows the exponential power probability distribution.
Ha: The data does not follow the exponential power probability distribution.
R output for estimating the parameters (using the normalp package) and Chi-squared goodness of fit test:
> paramp(Mauna.Loa)
Mean Mp Sd Sp p
352.38910 352.78344 13.93138 16.38595 3.82405
no.conv = FALSE
> chisq.test(cbind(Mauna.Loa,dnormp(Mauna.Loa,352.78344,16.38595,3.82405)))
data: cbind(Mauna.Loa, dnormp(Mauna.Loa, 352.78344, 16.38595, 3.82405))

exponential power probability distribution.
c)
$\hat{\mu} = 352.78344$
So the average CO2 amount at Mauna Loa during the period 1974 to 2004 is approximately 353 ppm.
7.4.3
a)
The data do not appear exactly symmetric with these 8 bins. However, the pattern may change with
a larger number of bins in the histogram.
b)
H0: The data follows the normal probability distribution.
Ha: The data does not follow the normal probability distribution.
R output for Kolmogorov-Smirnov goodness-of-fit test:
> ks.test(Northern.Region,pnorm,mean(Northern.Region),sd(Northern.Region))
One-sample Kolmogorov-Smirnov test
data: Northern.Region
D = 0.0615, p-value = 0.9989
alternative hypothesis: two-sided
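Since the p-value > .05, we fail to reject H0.
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
normal probability distribution.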
c)
$\bar{x} = 4.38703$, $s = 0.6710476$
95% Confidence Interval: (4.194888 , 4.579173)
We are 95% confident that the true mean lies between 4.194888 and 4.579173.
d)
The data points closely follow the straight line, showing that the data could follow the normal
distribution.
7.4.4
a)
Data seems to be slightly skewed to the right.
b)
H0: The data follows the gamma probability distribution.
Ha: The data does not follow the gamma probability distribution.
R output for parameter estimation (MLE) and Kolmogorov-Smirnov goodness-of-fit test:
> q1=mean(log(Central.Region))
> q2=log(mean(Central.Region))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 33.09928
> beta=(mean(Central.Region))/alpha
> beta
[1] 0.1298179
>
> ks.test(Central.Region,pgamma,alpha,1/beta)
data: Central.Region
D = 0.0813, p-value = 0.9688
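Since the p-value > .05, we fail to reject H0.
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
gamma probability distribution.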
c)
$\bar{x} = 4.296879$, $s = 0.7602079$
d)
The data points deviate from the straight line, showing that the data do not follow the normal distribution.
7.4.5
(Note: The problem asks us to use the beta distribution, but that distribution has support only on the interval from 0 to 1.
Therefore, I solve the problem using the gamma distribution, as in the previous problem.)
a)
b)
> q1=mean(log(Southern.Region))
> q2=log(mean(Southern.Region))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 36.66364
> beta=(mean(Southern.Region))/alpha
> beta
[1] 0.1171047
>
> ks.test(Southern.Region,pgamma,alpha,1/beta)
data: Southern.Region
D = 0.0774, p-value = 0.9802
c)
$\bar{x} = 4.293485$, $s = 0.7076159$
d)
The data points deviate from the straight line, showing that the data do not follow the normal distribution.
7.4.6
a) i)
H0: The data follows the 3-parameter Weibull probability distribution.
Ha: The data does not follow the 3-parameter Weibull probability distribution.
R code for the parameter estimation and the Kolmogorov-Smirnov goodness-of-fit test:
> w3p <- function(x, a,b,c) {c/b*((x -a)/b)^(c-1)*exp(-((x-a)/b)^c)}

> fitdistr(wind$Wind,w3p,list(a = 5, b = 65, c = 1))
a b c
11.7814433 73.5992769 1.7778932
( 4.5188223) ( 7.9794036) ( 0.2639357)
>
> ks.test(wind$Wind-11.7814433,pweibull,1.7778932,73.5992769)

data: wind$Wind - 11.7814433

D = 0.0818, p-value = 0.7935
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
3-parameter Weibull probability distribution.
ii)
H0: The data follows the 3-parameter Weibull probability distribution.
Ha: The data does not follow the 3-parameter Weibull probability distribution.
R code for the Anderson-Darling goodness-of-fit test:
> #sort observations

> orderwind=sort(wind$Wind)
>
> #weibull 3 parameter CDF
> w3cdf=function(x,a,b,c) {1-exp(-((x-a)/b)^c)}
>
> Fi=w3cdf(orderwind,11.7814433,73.5992769,1.7778932)
> revFi=rev(Fi)
> n=length(wind$Wind)
> i=c(1:n)
> coef1=((2*i)-1)/n
> coef2=log(Fi)+log(1-revFi)
> x2=coef1*coef2
> s=sum(x2)
>
> A=-n-s
> #Test Statistic
> A
[1] 0.5946242
> AA=(1 + 0.75/n + 2.25/n^2) * A
> pval=exp(1.2937 - 5.709 * AA + 0.0186 * AA^2)
> #p-value
> pval
[1] 0.1180549
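For reference, the quantities computed by the R code above can be written out as follows. This is a transcription of what the code does, with $F$ the fitted CDF and $x_{(1)} \leq \dots \leq x_{(n)}$ the ordered sample; the constants in the p-value approximation are taken from the code as given, and whether that approximation is appropriate for a fitted Weibull model is not established here.

```latex
A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)
      \left[ \ln F\left(x_{(i)}\right) + \ln\left(1 - F\left(x_{(n+1-i)}\right)\right) \right]

A^{*} = \left(1 + \frac{0.75}{n} + \frac{2.25}{n^{2}}\right) A^2

p \approx \exp\left(1.2937 - 5.709\,A^{*} + 0.0186\,(A^{*})^{2}\right)
```

In the code, `A` corresponds to $A^2$, `AA` to $A^{*}$, and `pval` to $p$.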
b)
Both tests fail to reject the null hypothesis. With the information given, the Weibull
distribution could be a good fit for our data.
c)
Using the estimated parameters from part a, we can estimate
$f(x) = \frac{1.7778932}{73.5992769} \left( \frac{x - 11.7814433}{73.5992769} \right)^{0.7778932} \exp\left[ -\left( \frac{x - 11.7814433}{73.5992769} \right)^{1.7778932} \right]$, $x \geq 11.7814433$
d) $E(x) = \mu + \beta \, \Gamma\left(1 + \frac{1}{\alpha}\right) = 11.78 + 73.60 \times \Gamma\left(1 + \frac{1}{1.78}\right) = 77.25$
The expected velocity of a category 5 hurricane is 77.25 mph.
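The expected value of the fitted 3-parameter Weibull distribution can be checked both from the closed form and by numerical integration. A sketch using the unrounded estimates printed by `fitdistr`; the names `a`, `b` and `c_shape` are illustrative (location, scale and shape).

```r
# Fitted location (a), scale (b) and shape (c_shape) from part a
a <- 11.7814433
b <- 73.5992769
c_shape <- 1.7778932

# Closed form: E(X) = a + b * Gamma(1 + 1/c)
ev <- a + b * gamma(1 + 1 / c_shape)

# Numerical check: integrate x * f(x) over the support x >= a
w3pdf <- function(x) {
  (c_shape / b) * ((x - a) / b)^(c_shape - 1) * exp(-((x - a) / b)^c_shape)
}
ev_num <- integrate(function(x) x * w3pdf(x), lower = a, upper = Inf)$value
```

Both values agree with the reported 77.25 to within the rounding of the parameters used in the hand calculation.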
7.4.7
a)
i)
H0: The data follows the Rayleigh probability distribution.
Ha: The data does not follow the Rayleigh probability distribution.
R code for the parameter estimation using MLE and the Kolmogorov-Smirnov goodness-of-fit test:
> sigsqr=(1/(2*n))*sum(wind$Wind^2)
> sig=sqrt(sigsqr)
> sig
[1] 60.87816
> ks.test(wind$Wind,prayleigh,sig)
data: wind$Wind
D = 0.0739, p-value = 0.8819
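Since the p-value > .05, we fail to reject H0.
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
Rayleigh probability distribution.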
ii)
H0: The data follows the Rayleigh probability distribution.
Ha: The data does not follow the Rayleigh probability distribution.
R code for the Anderson-Darling goodness-of-fit test:
> #sort observations

> orderwind=sort(wind$Wind)
>
> #rayleigh CDF
> raycdf=function(x,b) {1-exp(-((x^2)/(2*(b^2))))}
>
> Fi=raycdf(orderwind,sig)
> revFi=rev(Fi)
> n=length(wind$Wind)
> i=c(1:n)
> coef1=((2*i)-1)/n
> coef2=log(Fi)+log(1-revFi)
> x2=coef1*coef2
> s=sum(x2)
>
> A=-n-s
> #Test Statistic
> A
[1] 0.54775
> AA=(1 + 0.75/n + 2.25/n^2) * A
> pval=exp(1.2937 - 5.709 * AA + 0.0186 * AA^2)
> #p-value
> pval
[1] 0.1546356
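Since the p-value > .05, we fail to reject H0.
At the 5% level of significance, there is not enough evidence to show that the data does not follow the
Rayleigh probability distribution.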
b)
Both tests fail to reject the null hypothesis. With the information given, the Rayleigh
distribution might be a good fit for our data.
c)
Using the estimated parameters from part a, we can estimate
$f(x) = \frac{x}{3706.151} \, e^{-x^2 / 7412.302}$, $x \geq 0$
Comparing the results for the 3-parameter Weibull and the Rayleigh distributions, the Rayleigh
distribution had higher p-values in both tests.
d) $E(x) = \sigma \sqrt{\frac{\pi}{2}} = \sqrt{\frac{3706.15 \, \pi}{2}} = 76.30$
The expected velocity of a category 5 hurricane is 76.30 mph.
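The constants in the fitted Rayleigh density and its mean all follow from the single estimate of sigma. A brief sketch that recomputes them and checks that the density integrates to one:

```r
# MLE of the Rayleigh scale parameter from part a
sig <- 60.87816

# Density f(x) = (x / sigma^2) * exp(-x^2 / (2 sigma^2)), x >= 0
ray_pdf <- function(x) (x / sig^2) * exp(-x^2 / (2 * sig^2))
total <- integrate(ray_pdf, lower = 0, upper = Inf)$value   # should be close to 1

# Mean of the Rayleigh distribution: sigma * sqrt(pi / 2)
ev <- sig * sqrt(pi / 2)
```

Here `sig^2` gives the 3706.15 that appears in the density, and `ev` reproduces the expected velocity of about 76.30.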

7.4.8
a)
The data are skewed to the right, so the gamma distribution might be a good fit.
b)
> q1=mean(log(Annual.Av....))
> q2=log(mean(Annual.Av....))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 18.15786
> beta=(mean(Annual.Av....))/alpha
> beta
[1] 0.3209327
>
> ks.test(Annual.Av....,pgamma,alpha,1/beta)
data: Annual.Av....
D = 0.0849, p-value = 0.8562
c)
Using our estimated parameters, we can estimate the expected value.
$\hat{\alpha}_{MLE} = 18.15786$, $\hat{\beta}_{MLE} = 0.3209327$
$E(x) = 18.15786 \times 0.3209327 = 5.827451$
7.4.9
a)
Looking at the histogram, it's possible that the data follows exponential distribution.
b)
H0: The sampled data follow an exponential distribution.
Ha: The sampled data do not follow an exponential distribution.

R output for the estimated parameter, Kolmogorov-Smirnov test and Chi-squared test:
> set.seed(321123)
> sizespl=sample(Tumor_SIZE,50)
> sizespl
[1] 8 20 15 24 2 9 52 10 40 36 17 1 3 1 12 18 27 10 22 21 1 40 16 21 45
[26] 10 11 16 4 20 3 7 4 19 9 55 33 40 48 18 11 11 12 9 3 12 8 27 12 11
> beta=mean(sizespl)
> ks.test(sizespl,pexp,1/beta)
data: sizespl
D = 0.164, p-value = 0.136
> chisq.test(cbind(sizespl,dexp(sizespl,1/beta)))
data: cbind(sizespl, dexp(sizespl, 1/beta))

Since in both tests the p-value > .05, we fail to reject H0.
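At the 5% level of significance, there is not enough evidence to show that the sampled data do not follow the
exponential probability distribution.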
c)
$f(x) = \frac{1}{17.68} \, e^{-x / 17.68}$, $x \geq 0$
d)
$\hat{\beta}_{MLE} = 17.68$
So the expected tumor size of a breast cancer patient is 17.68 mm.
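The estimate in part d) is simply the mean of the 50 sampled tumor sizes printed in the R output. The sketch below re-enters that sample by hand (rather than re-sampling from the data file, which is not available here) and refits the exponential model.

```r
# The 50 sampled tumor sizes as printed in the R output above
sizespl <- c(8, 20, 15, 24, 2, 9, 52, 10, 40, 36, 17, 1, 3, 1, 12, 18, 27,
             10, 22, 21, 1, 40, 16, 21, 45, 10, 11, 16, 4, 20, 3, 7, 4, 19,
             9, 55, 33, 40, 48, 18, 11, 11, 12, 9, 3, 12, 8, 27, 12, 11)

# MLE of the exponential mean is the sample mean
beta_hat <- mean(sizespl)

# Fitted density and K-S test; ties in the data trigger a warning, suppressed here
pdf_hat <- function(x) dexp(x, rate = 1 / beta_hat)
ks <- suppressWarnings(ks.test(sizespl, "pexp", 1 / beta_hat))
```

`beta_hat` equals 17.68, the value used in the fitted density and the expected tumor size.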
7.4.10
a)
It is possible that the data follow the 2-parameter Weibull distribution.
b)
H0: The sampled data follows the 2-parameter Weibull probability distribution.
Ha: The sampled data does not follow the 2-parameter Weibull probability distribution.
R output for the estimated parameter and the Kolmogorov-Smirnov test:
> set.seed(123123)
> survt=sample(SRV_TIME_YEAR,50)
> survt
[1] 4.91667 6.75000 1.75000 1.83333 4.83333 3.08333 3.83333 6.25000 5.25000
[10] 2.83333 7.50000 7.58333 6.75000 2.25000 2.16667 2.66667 4.58333 2.33333
[19] 0.16667 2.25000 3.50000 3.83333 4.33333 2.75000 3.66667 2.25000 3.66667
[28] 5.66667 4.41667 0.83333 4.00000 0.25000 3.91667 0.91667 1.83333 3.58333
[37] 3.08333 4.16667 1.66667 2.75000 5.50000 6.50000 1.75000 3.50000 3.91667
[46] 0.41667 2.41667 0.41667 1.16667 6.50000
> fitdistr(survt,"weibull")
shape scale
1.7734463 3.8544480
(0.2051998) (0.3214546)
> ks.test(survt,pweibull,1.7734463,3.8544480)
data: survt
D = 0.0895, p-value = 0.8182
c) $f(x) = \frac{1.7734463}{3.8544480} \left( \frac{x}{3.8544480} \right)^{0.7734463} \exp\left[ -\left( \frac{x}{3.8544480} \right)^{1.7734463} \right]$, $x \geq 0$
Once we have the pdf, we can obtain the expected values and probabilities.
$F(x) = 1 - \exp\left[ -\left( \frac{x}{3.8544480} \right)^{1.7734463} \right]$, $x \geq 0$
d)
Survival function
$S(x) = 1 - \left\{ 1 - \exp\left[ -\left( \frac{x}{3.8544480} \right)^{1.7734463} \right] \right\}$, $x \geq 0$
$S(x) = \exp\left[ -\left( \frac{x}{3.8544480} \right)^{1.7734463} \right]$, $x \geq 0$
e)
From the survival function we can obtain the probabilities of surviving more than “t” years for a breast
cancer patient.
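As an illustration of part e), the fitted survival function can be evaluated at a few time horizons. The horizons below (1, 3 and 5 years) are arbitrary choices for the sketch, not values requested by the exercise.

```r
# Fitted 2-parameter Weibull estimates from part b
shape <- 1.7734463
scale <- 3.8544480

# S(t) = exp(-(t / scale)^shape)
surv <- function(t) exp(-(t / scale)^shape)

t_years <- c(1, 3, 5)                 # illustrative horizons only
s_formula <- surv(t_years)

# Same quantity from the built-in Weibull CDF (upper tail)
s_builtin <- pweibull(t_years, shape = shape, scale = scale, lower.tail = FALSE)
```

Both computations return the same probabilities, which decrease as the horizon grows.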
7.4.11
a)
b)
> lung=read.csv("Lung cancer data.csv",header=T)

>
> lungs=split(lung,lung$SEX)
>
> lung.m=as.data.frame(lungs[1])
> lung.f=as.data.frame(lungs[2])
>
> set.seed(321123)
> lung.ms=sample(lung.m$X1.Tumor,60)
>
> fitdistr(lung.ms,"weibull")
shape scale
2.1086635 48.2347045
( 0.2076375) ( 3.1234964)
> ks.test(lung.ms,pweibull,2.1086635,48.2347045)
data: lung.ms
D = 0.0925, p-value = 0.683
c)
$f(x) = \frac{2.1086635}{48.2347045} \left( \frac{x}{48.2347045} \right)^{1.1086635} \exp\left[ -\left( \frac{x}{48.2347045} \right)^{2.1086635} \right]$, $x \geq 0$
Estimated E(x):
$= 48.2347045 \, \Gamma\left( 1 + \frac{1}{2.1086635} \right) = 42.72008$
So the average malignant tumor size of a male lung cancer patient is 42.72 mm.
7.4.12
a)
b) H0: The sampled data follows the 2-parameter Weibull probability distribution.
Ha: The sampled data does not follow the 2-parameter Weibull probability distribution.
> set.seed(321123)
> lung.fs=sample(lung.f$X2.Tumor,60)
>
> fitdistr(lung.fs,"weibull")
shape scale
1.8975835 39.8984887
( 0.1870386) ( 2.8626266)
> ks.test(lung.fs,pweibull,1.8975835,39.8984887)
data: lung.fs
D = 0.1161, p-value = 0.3936
c)
$f(x) = \frac{1.8975835}{39.8984887} \left( \frac{x}{39.8984887} \right)^{0.8975835} \exp\left[ -\left( \frac{x}{39.8984887} \right)^{1.8975835} \right]$, $x \geq 0$
Estimated E(x):
$= 39.8984887 \, \Gamma\left( 1 + \frac{1}{1.8975835} \right) = 35.40591$
So the average malignant tumor size of a female lung cancer patient is 35.41 mm.
7.4.13
From the samples drawn, male tumors are on average around 7 mm larger than female
tumors. At the same time, females show less variance than males in tumor sizes when looking at the
histograms.
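The comparison rests on the two fitted Weibull means, scale × Γ(1 + 1/shape). A short sketch recomputing both from the estimates in 7.4.11 and 7.4.12; `weibull_mean` is a hypothetical helper name.

```r
# Mean of a 2-parameter Weibull distribution
weibull_mean <- function(shape, scale) scale * gamma(1 + 1 / shape)

# Fitted estimates for male (7.4.11) and female (7.4.12) tumor sizes
male_mean <- weibull_mean(shape = 2.1086635, scale = 48.2347045)
female_mean <- weibull_mean(shape = 1.8975835, scale = 39.8984887)

# Difference in estimated mean tumor size (mm)
diff_mm <- male_mean - female_mean
```

The difference comes out at roughly 7.3 mm, consistent with the "around 7 mm" stated above.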
7.4.14
a)
b) H0: The sampled data follows the Weibull probability distribution.
Ha: The sampled data does not follow the Weibull probability distribution.
> set.seed(321123)
> lung.stms=sample(lung.m$X1.SurvTimeMT,50)
>
> fitdistr(lung.stms,"weibull")
shape scale
0.79037722 29.77144952
( 0.08391815) ( 5.64803367)
> ks.test(lung.stms,pweibull,0.79037722,29.77144952)
data: lung.stms
D = 0.1457, p-value = 0.2393
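Since the p-value > .05, we fail to reject H0.
At the 5% level of significance, there is not enough evidence to show that the sampled data does not follow the
Weibull probability distribution.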
c)
$f(x) = \frac{0.79037722}{29.77144952} \left( \frac{x}{29.77144952} \right)^{-0.2096228} \exp\left[ -\left( \frac{x}{29.77144952} \right)^{0.79037722} \right]$, $x \geq 0$
$F(x) = 1 - \exp\left[ -\left( \frac{x}{29.77144952} \right)^{0.79037722} \right]$, $x \geq 0$
d) Survival function
$S(x) = 1 - \left\{ 1 - \exp\left[ -\left( \frac{x}{29.77144952} \right)^{0.79037722} \right] \right\}$, $x \geq 0$
$S(x) = \exp\left[ -\left( \frac{x}{29.77144952} \right)^{0.79037722} \right]$, $x \geq 0$
e)
7.4.15
a)
b) H0: The sampled data follows the Weibull probability distribution.
Ha: The sampled data does not follow the Weibull probability distribution.
> set.seed(321321)
> lung.stfs=sample(lung.f$X2.SurvTimeMT,50)
>
> fitdistr(lung.stfs,"weibull")
shape scale
0.9689514 22.6733066
( 0.1067687) ( 3.4941692)
> ks.test(lung.stfs,pweibull,0.9689514,22.6733066)
data: lung.stfs
D = 0.0561, p-value = 0.9975

c) $f(x) = \frac{0.9689514}{22.6733066} \left( \frac{x}{22.6733066} \right)^{-0.0310486} \exp\left[ -\left( \frac{x}{22.6733066} \right)^{0.9689514} \right]$, $x \geq 0$
$F(x) = 1 - \exp\left[ -\left( \frac{x}{22.6733066} \right)^{0.9689514} \right]$, $x \geq 0$
d) Survival function
$S(x) = 1 - \left\{ 1 - \exp\left[ -\left( \frac{x}{22.6733066} \right)^{0.9689514} \right] \right\}$, $x \geq 0$
$S(x) = \exp\left[ -\left( \frac{x}{22.6733066} \right)^{0.9689514} \right]$, $x \geq 0$
e)
7.4.16
Males on average tend to have a longer survival time than the females.
7.4.17
a)
b)
> colon=read.csv("seercoloncancerdata.csv",header=T)
>
> colons=split(colon,colon$SEX)
>
> colon.m=as.data.frame(colons[1])
> colon.f=as.data.frame(colons[2])
>
> set.seed(323121)
> colon.ms=sample(colon.m$X1.CS_SIZE,50)
>
> fitdistr(colon.ms,"weibull")
shape scale
1.6269871 57.0717161
( 0.1779206) ( 5.2153708)
> ks.test(colon.ms,pweibull,1.6269871,57.0717161)
data: colon.ms
D = 0.1162, p-value = 0.5096
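With the shape and scale printed by `fitdistr` above, the mean of the fitted Weibull distribution can be computed in closed form and confirmed numerically. A sketch; the variable names are illustrative.

```r
# Fitted estimates for male colon tumor size, as printed by fitdistr
shape <- 1.6269871
scale <- 57.0717161

# Closed form: E(X) = scale * Gamma(1 + 1/shape)
ev <- scale * gamma(1 + 1 / shape)

# Numerical check using the built-in Weibull density
ev_num <- integrate(function(x) x * dweibull(x, shape = shape, scale = scale),
                    lower = 0, upper = Inf)$value
```

Both values are close to 51.09, the estimated expected tumor size for this sample.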
c)
$f(x) = \frac{1.6269871}{57.0717161} \left( \frac{x}{57.0717161} \right)^{0.6269871} \exp\left[ -\left( \frac{x}{57.0717161} \right)^{1.6269871} \right]$, $x \geq 0$
Estimated E(x):
$= 57.0717161 \, \Gamma\left( 1 + \frac{1}{1.6269871} \right) = 51.09328$
7.4.18
a)
b)
H0: The sampled data follows the Weibull probability distribution.
Ha: The sampled data does not follow the Weibull probability distribution.
> set.seed(121323)
> colon.fs=sample(colon.f$X2.CS_SIZE,50)
>
> fitdistr(colon.fs,"weibull")
shape scale
1.7232673 48.4527073
( 0.1659436) ( 4.2061986)
> ks.test(colon.fs,pweibull,1.7232673,48.4527073)
data: colon.fs
D = 0.1357, p-value = 0.3163
c)
$f(x) = \frac{1.7232673}{48.4527073} \left( \frac{x}{48.4527073} \right)^{0.7232673} \exp\left[ -\left( \frac{x}{48.4527073} \right)^{1.7232673} \right]$, $x \geq 0$
Estimated E(x):
$= 48.4527073 \, \Gamma\left( 1 + \frac{1}{1.7232673} \right) = 43.19307$
7.4.19
a)
b) H0: The sampled data follows gamma probability distribution.
Ha: The sampled data does not follow gamma probability distribution.
> set.seed(112233)
> colon.msts=sample(colon.m$X1.SRV_TIME_YEAR,50)
>
> q1=mean(log(colon.msts))
> q2=log(mean(colon.msts))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 1.621226
> beta=(mean(colon.msts))/alpha
> beta
[1] 1.942974
>
> ks.test(colon.msts,pgamma,alpha,1/beta)
data: colon.msts
D = 0.0799, p-value = 0.9069

c)
$f(x) = \frac{1}{\Gamma(1.621226) \, 1.942974^{1.621226}} \, x^{1.621226 - 1} \, e^{-x / 1.942974}$, $x \geq 0$
$f(x) = \frac{1}{2.630375} \, x^{0.621226} \, e^{-x / 1.942974}$, $x \geq 0$
Estimated E(x):
$= 1.621226 \times 1.942974 = 3.15$
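The normalizing constant and the mean of the fitted gamma density can be recomputed from the two estimates. A minimal sketch, checking the hand-built density against R's `dgamma` (shape/scale parameterization):

```r
# Fitted gamma estimates for male colon cancer survival time
alpha <- 1.621226   # shape
beta <- 1.942974    # scale

# Normalizing constant Gamma(alpha) * beta^alpha
norm_const <- gamma(alpha) * beta^alpha

# Fitted density and its mean alpha * beta
f_hat <- function(x) x^(alpha - 1) * exp(-x / beta) / norm_const
ev <- alpha * beta

# Agreement with the built-in density at an arbitrary point
check <- abs(f_hat(2) - dgamma(2, shape = alpha, scale = beta))
```

`norm_const` matches the 2.630375 in the simplified density, and `ev` the estimated mean of 3.15 years.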
7.4.20
a)
b) H0: The sampled data follows gamma probability distribution.
Ha: The sampled data does not follow gamma probability distribution.
> set.seed(123321)
> colon.fsts=sample(colon.f$X2.SRV_TIME_YEAR,50)
>
> q1=mean(log(colon.fsts))
> q2=log(mean(colon.fsts))
> s=q2-q1
> alpha=0.5/s
> alpha
[1] 1.262591
> beta=(mean(colon.fsts))/alpha
> beta
[1] 2.47771
>
> ks.test(colon.fsts,pgamma,alpha,1/beta)
data: colon.fsts
D = 0.104, p-value = 0.6519
c)
$f(x) = \frac{1}{\Gamma(1.262591) \, 2.47771^{1.262591}} \, x^{1.262591 - 1} \, e^{-x / 2.47771}$, $x \geq 0$
$f(x) = \frac{1}{2.842123} \, x^{0.262591} \, e^{-x / 2.47771}$, $x \geq 0$
Estimated E(x):
$= 1.262591 \times 2.47771 = 3.128333$
7.4.21
On average, males and females have similar survival times, but females have a larger variance than males.
