
Probability and Statistics with R for Engineers and Scientists, 1st Edition, Michael Akritas: Solutions Manual

Chapter 6
Fitting Models to Data

6.2 Some Estimation Concepts


1. Using the command OZ = read.table("OzoneData.txt", header=T) to read the data and mean(OZ$OzoneData) to get the sample mean, we get 286.3571. The command sd(OZ$OzoneData)/sqrt(14) gives an estimated standard error of 17.07244.

2. The difference of the average maximum penetration between the two types is estimated as 0.49 − 0.36 = 0.13, and the estimated standard error of X̄1 − X̄2 is calculated as

    S_{X̄1−X̄2} = √(S1²/n1 + S2²/n2) = √(0.19²/48 + 0.16²/42) = 0.0369.
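As a numerical cross-check of this computation (a Python stand-in for the manual's R arithmetic; the function name se_diff is ours):

```python
import math

def se_diff(s1, n1, s2, n2):
    # Estimated standard error of X1-bar minus X2-bar
    # for two independent samples.
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Values quoted in the exercise.
estimate = 0.49 - 0.36
se = se_diff(0.19, 48, 0.16, 42)
```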

3. The proof is straightforward:

    E(σ̂²) = E[((n1 − 1)S1² + (n2 − 1)S2²)/(n1 + n2 − 2)]
          = [(n1 − 1)E(S1²) + (n2 − 1)E(S2²)]/(n1 + n2 − 2)
          = [(n1 − 1)σ² + (n2 − 1)σ²]/(n1 + n2 − 2) = σ².

Copyright © 2016 Pearson Education, Inc.

4. (a) The parameter of interest is the proportion of all credit card customers who
had incurred an interest charge in the previous year due to an unpaid balance. The
empirical estimator is the proportion in a sample of credit card customers who
had incurred an interest charge in the previous year due to an unpaid
balance. Using the provided information, we can get the estimate as p̂ =
136/200 = 0.68.
(b) Yes, it is unbiased.
(c) The estimated standard error is


p̂(1 n− p̂) 0.68 × 200
(1 − 0.68)
S = = = 0.033.
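The arithmetic can be verified in a couple of lines (Python rather than the manual's R; the values are those given in the exercise):

```python
import math

n, successes = 200, 136
p_hat = successes / n                       # 0.68
se_p = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
```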


5. The standard error is S_θ̂ = S_{2X̄} = 2S_X̄ = 2θ/√(12n). θ̂ is unbiased because E(θ̂) = E(2X̄) = 2E(X̄) = 2 × θ/2 = θ.


6. (a) E(p̂1 − p̂2) = E(p̂1) − E(p̂2) = E(X)/m − E(Y)/n = mp1/m − np2/n = p1 − p2, thus p̂1 − p̂2 is an unbiased estimator of p1 − p2.
(b) The standard error of p̂1 − p̂2 is

    σ_{p̂1−p̂2} = √(σ²_{p̂1} + σ²_{p̂2}) = √(p1(1 − p1)/m + p2(1 − p2)/n).

The estimated standard error is

    S_{p̂1−p̂2} = √(p̂1(1 − p̂1)/m + p̂2(1 − p̂2)/n).

(c) From the data, we have p̂1 = X/m = 70/100 = 0.7 and p̂2 = Y/n = 160/200 = 0.8, thus the estimate of p1 − p2 is p̂1 − p̂2 = −0.1 and the estimated standard error is

    S_{p̂1−p̂2} = √(0.7 × (1 − 0.7)/100 + 0.8 × (1 − 0.8)/200) = 0.054.
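A quick check of part (c) (a Python sketch; se_two_props is our name for the estimated-standard-error formula from part (b)):

```python
import math

def se_two_props(p1, m, p2, n):
    # Estimated standard error of p1-hat minus p2-hat.
    return math.sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)

p1_hat, p2_hat = 70 / 100, 160 / 200
diff = p1_hat - p2_hat
se = se_two_props(p1_hat, 100, p2_hat, 200)
```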
7. (a) The model-free estimate is 0.5.
(b) Using the commands x=c(2.08, 2.10, 1.81, 1.98, 1.91, 2.06); 1-pnorm(2.05, mean(x), sd(x)), we get a model-based estimate of 0.2979.
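The same model-based estimate can be reproduced without R using Python's standard library, where statistics.NormalDist plays the role of pnorm:

```python
from statistics import NormalDist, mean, stdev

x = [2.08, 2.10, 1.81, 1.98, 1.91, 2.06]
# P(X > 2.05) under a normal model fitted with the sample mean and sd.
p = 1 - NormalDist(mu=mean(x), sigma=stdev(x)).cdf(2.05)
```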

8. (a) The average of the 10,000 variances was computed as 0.083 (it might be different at a different time and with a different computer), which is very close to the population variance. On the other hand, the average of the 10,000 sample standard deviations was computed as 0.235, which is not as close to the population version. Thus, we conclude that S² is unbiased but S is biased.
(b) In part (a), the bias of S is 0.235 − 0.2887 = −0.0537. When using the sample size n = 5, the average of the 10,000 sample standard deviations was computed as 0.278, with bias 0.278 − 0.2887 = −0.0107. Thus, we conclude that the bias of S decreases (in absolute value) as the sample size increases.

9. (a) Using R commands we can get P (12 < X ≤ 16) = 0.2956 and the 15th, 25th,
55th, and 95th percentiles are 6.85, 8.30, 11.50, and 17.58, respectively.
(b) The estimated value of P (12 < X ≤ 16) is 0.38 and the estimated 15th, 25th, 55th, and 95th percentiles are 6.37, 7.95, 13.09, and 18.89, respectively.
(c) We use the following commands
m = mean(x); s = sd(x);



pnorm(16, m, s) - pnorm(12, m, s)
qnorm(c(0.15, 0.25, 0.55, 0.95), m, s)


The model-based estimate of P (12 < X ≤ 16) is 0.289 and the model-based estimates of the 15th, 25th, 55th, and 95th percentiles are 6.67, 8.40, 12.22, and 19.46, respectively.
By comparing the results in (b) and (c) to those in (a), we can see that, in
general, the model-based estimators are closer to the population values.

10. (a) The normal Q-Q plot is given below.

(Figure: normal Q-Q plot of the data; Sample Quantiles plotted against Theoretical Quantiles.)

The figure suggests that the normal model for the data is appropriate.
(b) The model-based estimate of P (44 < X ≤ 46) is 0.39 and the model-based estimates of the median and 75th percentile are 45.20 and 46.52, respectively.
(c) The model-free estimate of P (44 < X ≤ 46) is 0.375 and the model-free estimates of the median and 75th percentile are 44.885 and 46.420, respectively.
(d) Since the Q-Q plot suggests that the normal assumption is appropriate, we would prefer the model-based estimates.

6.3 Methods for Fitting Models to Data


1. For the exponential(λ) distribution, μ = 1/λ. Setting X̄ = 1/λ, we can solve for the method of moments estimator of λ as λ̂ = 1/X̄. It is not an unbiased estimator because E(1/X̄) ≠ 1/E(X̄).
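The bias can be seen in a small simulation (a Python sketch; the sample size n = 5, rate λ = 2, and the seed are illustrative choices, not from the text). For the exponential, E(1/X̄) = nλ/(n − 1), so 1/X̄ overestimates λ on average:

```python
import random

random.seed(1)
lam, n, reps = 2.0, 5, 20000
est_sum = 0.0
for _ in range(reps):
    sample = [random.expovariate(lam) for _ in range(n)]
    est_sum += 1 / (sum(sample) / n)   # lambda-hat = 1 / Xbar
avg_est = est_sum / reps               # near n*lam/(n-1) = 2.5, not 2
```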

2. (a) The commands to fit the Weibull(α, β) distribution are



t=read.table("RobotReactTime.txt", header=T); t1=t$Time[t$Robot==1]
fn=function(a)
{(mu/gamma(1+1/a))**2*(gamma(1+2/a)-gamma(1+1/a)**2)-var}
library(nleqslv); mu=mean(t1); var=var(t1)
nleqslv(13, fn); mu/gamma(1+1/32.39172)
The fitted model parameters are α̂ = 32.39, and β̂ = 31.05.
(b) To fit the exponential(λ) distribution, using the results in Example 6.3-5, we
have λ̂ = 1/X̄ = 0.0328.
(c) The model-based estimate of the 80th population percentile is 31.51 under model (a) (using the command qweibull(0.8, 32.39, 31.05)) and 49.07 under model (b) (using the command qexp(0.8, 0.0328)). As for the probability P (28.15 ≤ X ≤ 29.75), the estimates under the two models are 0.1805 and 0.0203, respectively.
(d) Using the commands quantile(t1,0.8); sum(t1>=28.15&t1<=29.75)/length(t1), we get the empirical estimates of the 80th population percentile and the probability P (28.15 ≤ X ≤ 29.75) as 31.522 and 0.2727, respectively.
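The two model-based 80th-percentile values in (c) can be reproduced from the closed-form quantile functions (a Python cross-check of the R calls qweibull and qexp, using the fitted parameters from parts (a) and (b)):

```python
import math

alpha, beta, lam = 32.39, 31.05, 0.0328
p = 0.8

# Weibull(alpha, beta) quantile: beta * (-log(1 - p))**(1/alpha)
q_weibull = beta * (-math.log(1 - p)) ** (1 / alpha)
# Exponential(lambda) quantile: -log(1 - p) / lambda
q_exp = -math.log(1 - p) / lam
```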

3. For the gamma(α, β) distribution, we have μ = αβ and σ² = αβ². Thus, β = σ²/μ and α = μ²/σ². We get the estimators α̂ = X̄²/S² and β̂ = S²/X̄. For the given problem, α̂ = 113.5²/1205.55 = 10.686 and β̂ = 1205.55/113.5 = 10.622.
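The moment-matching arithmetic checks out directly (a Python stand-in for the hand computation):

```python
xbar, s2 = 113.5, 1205.55     # sample mean and variance from the exercise
alpha_hat = xbar**2 / s2      # mu^2 / sigma^2
beta_hat = s2 / xbar          # sigma^2 / mu
```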
√ √
4. (a) Since μ = θ π/2, the√ re is θ = μ 2/π. Thus, the method of moments
estimator for θ is θ̂ = X̄ 2/π. It is unbiased because
2 2 π 2 = θ.
E(θ̂) = E(X̄ ) = E(X)
π π =θ 2 π

(b) A model based estimator of the population variance is


4−π 24−π 4−π
σ̂2 = θ̂2 = X̄ 2 = X̄ 2 .
2 π 2 π
σ̂2 is not an unbiased estimator of σ2 because
4−π ¯ ¯ 4−π σ2 2 4−π
E(σ̂ ) = E(X¯ 2 )
2 2
π π n π
= (V ar(X)+ E(X) ) = +μ
σ2 π 4−π σ2 π 2 4−π
= + θ2 + 2
n 2 π = n 24−πσ π

4−π
= σ2 1 + /= σ2.



5. (a) Since X ∼ Bin(n, p), E(X) = np. Thus, we can estimate p by p̂ = X/n. It is unbiased because E(p̂) = E(X)/n = np/n = p.


(b) By part (a), p̂ = 24/37 = 0.6486.


(c) The system lasts more than 350 hours if and only if both of the two components last more than 350 hours. By the independence of the two components, this probability is p² and can be estimated as p̂². Given the information in (b), p̂² = (24/37)² = 0.4207.
(d) p̂² is not an unbiased estimator of p² because E(p̂²) = Var(p̂) + E(p̂)² = p(1 − p)/n + p² ≠ p².
6. (a) Since the PMF of the Poisson(λ) distribution is e^(−λ)λ^x/x!, the likelihood function is

    lik(λ) = (e^(−λ)λ^(x1)/x1!) × (e^(−λ)λ^(x2)/x2!) × ··· × (e^(−λ)λ^(xn)/xn!)
           = e^(−nλ) λ^(x1+x2+···+xn) / ∏_{i=1}^n xi!,

and the log-likelihood function is

    L(λ) = −nλ + (Σ_{i=1}^n xi) log λ − Σ_{i=1}^n log(xi!).

Setting the first derivative of the log-likelihood function to zero yields the equation

    −n + (1/λ) Σ_{i=1}^n xi = 0.

Solving this equation with respect to λ yields the MLE λ̂ = X̄ of λ.


(b) The MLE estimate of λ is X̄ = 2.24.

(c) The model-based estimate of the population variance is σ̂² = λ̂ = X̄ = 2.24, and the sample variance is 1.533. Assuming the Poisson model correctly describes the population distribution, we would prefer the model-based estimate.
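The result λ̂ = X̄ can also be checked numerically by maximizing the log-likelihood over a grid (a Python sketch; the counts below are hypothetical, since the exercise's data set is not reproduced here):

```python
import math

# Hypothetical counts, for illustration only.
counts = [1, 3, 2, 0, 4, 2, 3, 1]
n, total = len(counts), sum(counts)
xbar = total / n

def loglik(lam):
    # Poisson log-likelihood: -n*lam + (sum x_i) log(lam) - sum log(x_i!)
    return -n * lam + total * math.log(lam) - sum(math.lgamma(c + 1) for c in counts)

# The grid maximum lands at the sample mean, as the derivation shows.
lam_best = max((0.01 * k for k in range(1, 801)), key=loglik)
```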
7. (a) There are X + 5 helmets in total and the last one has a flaw; among the remaining X + 4 helmets, there are 4 with a flaw and X flawless. Thus, we have the probability

    P (X = x) = ((x+4) choose 4) p^5 (1 − p)^x.

Therefore, the log-likelihood function is

    L(p) = log((X+4) choose 4) + 5 log p + X log(1 − p).



Setting the first derivative of the log-likelihood function to zero yields the equation

    5/p − X/(1 − p) = 0.

Solving this equation yields the MLE p̂ = 5/(5 + X).


(b) The distribution of X is easily identified as negative binomial with r = 5 and parameter p (compare to formula (3.4.15)). Thus, E(X) = r/p = 5/p. In method of moments estimation, set X = 5/p, and we can solve for the estimator p̂ = 5/X.
(c) If X = 47, the MLE in (a) gives p̂ = 5/(5 + 47) = 0.096 and the method of moments formula in (b) gives p̂ = 5/47 = 0.106.
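The closed-form MLE from part (a) can be confirmed by a grid search over the log-likelihood, here with X = 47 as in part (c) (a Python sketch):

```python
import math

X = 47
const = math.log(math.comb(X + 4, 4))  # term that does not involve p

def loglik(p):
    # Log-likelihood from part (a): const + 5 log(p) + X log(1 - p).
    return const + 5 * math.log(p) + X * math.log(1 - p)

# The grid maximum lands near p-hat = 5/(5 + X) = 5/52.
p_best = max((0.001 * k for k in range(1, 1000)), key=loglik)
```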

8. (a) For the uniform(0, θ) distribution, E(X) = θ/2, thus the method of moments estimator is θ̂ = 2X̄. Using the commands set.seed(3333); x=runif(20, 0, 10); mean(x), we have X̄ = 5.359, thus θ̂ = 10.718. The model-based estimator of the population variance σ² is σ̂² = θ̂²/12; thus, for this dataset, the estimate is 10.718²/12 = 9.573.
(b) The sample variance is 6.781. Compared to the true value of the population variance, 10²/12 = 8.333, the model-based estimate overestimates by 1.24, while the model-free estimate underestimates by 1.552. Thus, the model-based estimate provides a better approximation.
9. (a) To get the method of moments estimator of θ, solve the equation P̂ = E(P), that is, P̂ = θ/(1 + θ), and we have the estimator

    θ̂ = P̂/(1 − P̂).

(b) For the given data, the estimate of θ is θ̂ = 0.202.

10. (a) The regression coefficients are

    β̂1 = [n Σ xi yi − (Σ xi)(Σ yi)] / [n Σ xi² − (Σ xi)²]
        = (11 × 400.5225 − 263.53 × 36.66)/(11 × 9677.4709 − 263.53²) = −0.1420,

and

    α̂1 = Ȳ − β̂1 X̄ = 36.66/11 − (−0.1420) × 263.53/11 = 6.735.

Thus, the regression line is ŷ = α̂1 + β̂1 x = 6.735 − 0.142x.
(b) The observed concentrations range from 2.50 to 55.00. The concentrations 4.5 and 34.7 are in this range, but 62.8 is not, so it is appropriate to use the regression line only at 4.5 and 34.7. The estimated expected corrosion rate at 4.5 is 6.735 − 0.142 × 4.5 = 6.096, and at 34.7 it is 6.735 − 0.142 × 34.7 = 1.808.
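The coefficient formulas can be checked directly from the summary statistics quoted above (a Python sketch; the sums of x, y, xy, and x² are the values that appear in the displayed computation):

```python
n = 11
sum_x, sum_y = 263.53, 36.66
sum_xy, sum_x2 = 400.5225, 9677.4709

# Least-squares slope and intercept from summary statistics.
beta1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
alpha1 = sum_y / n - beta1 * sum_x / n
```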

11. (a) Using the commands x=c(498,526,559,614); y=c(16, 25, 34, 39); lm(y ∼ x), we find the estimated regression line is ŷ = −78.7381 + 0.1952x. The expected number of manatee deaths in a year with 550,000 powerboat registrations is estimated as −78.7381 + 0.1952 × 550 = 28.62.


(b) The R command for (6.3.11) is sum(y**2)+78.7381*sum(y)-0.1952*sum(x*y) and it gives 24.82 as the error sum of squares. The estimate of the intrinsic error variance is SSE/(n − 2) = 24.82/(4 − 2) = 12.41.
(c) The command lm(y ∼ x)$fitted gives the fitted values as 18.49371, 23.96056, 30.40364, and 41.14209. The command lm(y ∼ x)$resid gives the residuals as −2.493712, 1.039438, 3.596365, and −2.142091. The command sum((lm(y∼x)$resid)**2) gives the sum of squared residuals, which is the same as the SSE in part (b).
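The fitted values, residuals, and error sum of squares can be re-derived from scratch (a Python stand-in for the quoted R output):

```python
x = [498, 526, 559, 614]
y = [16, 25, 34, 39]
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sx2 = sum(a * a for a in x)

# Least-squares fit, then residuals and SSE.
beta1 = (n * sxy - sx * sy) / (n * sx2 - sx**2)
alpha1 = sy / n - beta1 * sx / n
resid = [yi - (alpha1 + beta1 * xi) for xi, yi in zip(x, y)]
sse = sum(r * r for r in resid)       # error sum of squares
sigma2_hat = sse / (n - 2)            # intrinsic error variance estimate
```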

12. (a) The following shows the scatterplot of the data with the fitted regression line
drawn through it.
(Figure: scatterplot of Strength versus Modulus of Elasticity, with the fitted regression line drawn through it.)

From this graph, the linearity of the regression function and homoscedasticity
appear to hold.
(b) The LSE regression coefficients are α̂1 = 2.5801 and β̂1 = 0.1339. We can

estimate the expected strength at modulus of elasticity X = 60 as 2.5801 +


0.1339 × 60 = 10.6141.
(c) Using the commands out=lm(y ∼ x); sum(out$resid**2); sum(out$resid**2)/out$df.resid, we get that the error sum of squares and the estimate of the intrinsic error variance are 15.16757 and 0.6067028, respectively.

13. (a) The LSE regression coefficients are α̂1 = 19.9691 and β̂1 = 0.2255. We can
estimate the expected age at diameter x as ŷ = 19.9691 + 0.2255 × x.
(b) The scatterplot of the data is shown below.

(Figure: scatterplot of Age versus Diameter.)

This figure suggests that the age of the tree increases with the diameter of the tree in an approximately linear fashion. Thus, the assumption of linearity of the regression function seems to be, at least approximately, satisfied. On the other hand, the variability in the age of trees seems to increase with the diameter of the tree. Thus, the homoscedasticity assumption appears to be violated for this data set.
(c) The following shows the scatterplot of the transformed data.
(Figure: scatterplot of the logarithm of Age versus the logarithm of Diameter.)


After the log-transformation, the assumptions of the simple linear regression


model seem to be valid.

6.4 Comparing Estimators: The MSE Criterion


1. (a) Bias(θ̂1) = E(θ̂1) − θ = 2E(X̄) − θ = 2E(X) − θ = 2 × θ/2 − θ = 0. The bias of θ̂2 is Bias(θ̂2) = E(θ̂2) − θ = nθ/(n + 1) − θ = −θ/(n + 1). Thus, θ̂1 is unbiased while θ̂2 is biased.
(b) For θ̂1, we have

    MSE(θ̂1) = Var(θ̂1) + Bias(θ̂1)² = Var(2X̄) = 4Var(X̄) = 4σ²/n = 4θ²/(12n) = θ²/(3n).

For θ̂2,

    MSE(θ̂2) = Var(θ̂2) + Bias(θ̂2)² = nθ²/[(n + 1)²(n + 2)] + [θ/(n + 1)]² = 2θ²/[(n + 1)(n + 2)].

(c) When n = 5 and the true value of θ is 10, we have MSE(θ̂1) = 10²/(3 × 5) = 6.67, while MSE(θ̂2) = 2 × 10²/[(5 + 1)(5 + 2)] = 4.76. According to the MSE selection criterion, θ̂2 is preferable.
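The numbers in part (c) follow directly from the two MSE formulas (a quick Python check):

```python
n, theta = 5, 10
mse1 = theta**2 / (3 * n)                  # theta^2 / (3n)
mse2 = 2 * theta**2 / ((n + 1) * (n + 2))  # 2 theta^2 / ((n+1)(n+2))
```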
2. From the distributions of X1 , · · · , X10 and Y1 , · · · , Y10 , we have E(X̄ ) = E(Ȳ ) = μ,
Var(X̄ ) = σ2 /10, and Var(Ȳ ) = 4σ2 /10. X̄ and Ȳ are also independent. Thus,
(a) For any 0 ≤ α ≤ 1, E(μ̂) = E(αX̄ + (1 − α)Ȳ ) = αE(X̄ ) + (1 − α)E(Ȳ ) =
αμ + (1 − α)μ = μ. Thus, μ̂ is unbiased for μ.
(b) Since μ̂ is unbiased,

    MSE(μ̂) = Var(μ̂) = Var(αX̄ + (1 − α)Ȳ) = α²Var(X̄) + (1 − α)²Var(Ȳ)
           = α²(σ²/10) + (1 − α)²(4σ²/10) = (5α² − 8α + 4)(σ²/10).

(c) The estimator 0.5X̄ + 0.5Ȳ corresponds to μ̂ with α = 0.5. The MSE is

    MSE(0.5X̄ + 0.5Ȳ) = (5 × 0.5² − 8 × 0.5 + 4)(σ²/10) = 1.25σ²/10.



Since MSE(X̄ ) = Var(X̄ ) = σ2 /10 < MSE(0.5X̄ + 0.5Ȳ ), X̄ is a preferable
estimator.
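The quadratic MSE factor from part (b) can be evaluated at the two α values being compared (a small Python check; mse_factor is our name for the coefficient of σ²/10):

```python
def mse_factor(alpha):
    # Coefficient of sigma^2/10 in MSE(alpha*Xbar + (1 - alpha)*Ybar).
    return 5 * alpha**2 - 8 * alpha + 4

f_half = mse_factor(0.5)   # the estimator 0.5*Xbar + 0.5*Ybar
f_xbar = mse_factor(1.0)   # alpha = 1 recovers Xbar
```

As a side note, setting the derivative 10α − 8 to zero shows the quadratic is minimized at α = 0.8, although the exercise only compares α = 0.5 with α = 1 (i.e., X̄).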

