Probability and Statistics With R For Engineers and Scientists 1st Edition Michael Akritas Solutions Manual Download
Chapter 6
Fitting Models to Data
2. The difference of the average maximum penetration between the two types is estimated as 0.49 − 0.36 = 0.13, and the estimated standard error of X̄1 − X̄2 is calculated as
$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = \sqrt{\frac{0.19^2}{48} + \frac{0.16^2}{42}} = 0.0369.$$
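The arithmetic in Problem 2 can be checked with a few lines of R. This is only a sketch: the variable names are illustrative, and the summary statistics are the ones quoted in the solution.

```r
# Summary statistics as quoted in the solution (variable names are illustrative)
xbar1 <- 0.49; s1 <- 0.19; n1 <- 48
xbar2 <- 0.36; s2 <- 0.16; n2 <- 42

# Estimated difference of the two means and its estimated standard error
diff_hat <- xbar1 - xbar2
se_diff  <- sqrt(s1^2 / n1 + s2^2 / n2)

round(c(difference = diff_hat, std_error = se_diff), 4)
```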
4. (a) The parameter of interest is the proportion of all credit card customers who
had incurred an interest charge in the previous year due to an unpaid balance. The
empirical estimator is the proportion in a sample of credit card customers who
had incurred an interest charge in the previous year due to an unpaid
balance. Using the provided information, we can get the estimate as p̂ =
136/200 = 0.68.
(b) Yes, it is unbiased.
(c) The estimated standard error is
$$S_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.68 \times (1 - 0.68)}{200}} = 0.033.$$
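As a quick check of Problem 4, the estimate and its standard error can be computed from the counts given in the solution. The variable names below are illustrative.

```r
# Counts quoted in the solution: 136 of 200 sampled customers
successes <- 136
n <- 200

p_hat <- successes / n                   # empirical proportion
se_p  <- sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p_hat

round(c(p_hat = p_hat, std_error = se_p), 3)
```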
(c) From the data, we have p̂1 = X/m = 70/100 = 0.7 and p̂2 = Y/n = 160/200 = 0.8; thus the estimate of p1 − p2 is p̂1 − p̂2 = −0.1, and the estimated standard error is
$$S_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{0.7 \times (1 - 0.7)}{100} + \frac{0.8 \times (1 - 0.8)}{200}} = 0.054.$$
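The same calculation for the difference of two proportions can be sketched in R, using the counts quoted above. The variable names are illustrative.

```r
# Counts quoted in the solution
x_count <- 70;  m <- 100
y_count <- 160; n <- 200

p1_hat <- x_count / m
p2_hat <- y_count / n

# Estimated difference and its estimated standard error
diff_p  <- p1_hat - p2_hat
se_diff_p <- sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)

round(c(difference = diff_p, std_error = se_diff_p), 3)
```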
7. (a) The model-free estimation is 0.5.
(b) Using the commands x=c(2.08, 2.10, 1.81, 1.98, 1.91, 2.06); 1-pnorm(2.05, mean(x), sd(x)), we get a model-based estimate of 0.2979.
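Both estimates in Problem 7 can be reproduced side by side. The sketch below assumes the quantity of interest is P(X > 2.05), which is what the quoted command computes; the names model_free and model_based are illustrative.

```r
x <- c(2.08, 2.10, 1.81, 1.98, 1.91, 2.06)

# Model-free estimate: the sample proportion of observations above 2.05
model_free <- mean(x > 2.05)

# Model-based estimate: fit a normal model and use its upper-tail probability
model_based <- 1 - pnorm(2.05, mean = mean(x), sd = sd(x))

round(c(model_free = model_free, model_based = model_based), 4)
```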
8. (a) The average of the 10,000 sample variances was computed as 0.083 (the value may differ slightly from run to run), which is very close to the population variance. On the other hand, the average of the 10,000 sample standard deviations was computed as 0.235, which is not as close to the population standard deviation. Thus, we conclude that S² is unbiased but S is biased.
(b) In part (a), the bias of S is 0.235 − 0.2887 = −0.0537. When using the sample size n = 5, the average of the 10,000 sample standard deviations was computed as 0.278, with bias 0.278 − 0.2887 = −0.0107. Thus, we conclude that the magnitude of the bias of S decreases as the sample size increases.
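A simulation along the lines of Problem 8 might look like the sketch below. Two things are assumptions rather than facts from the solution: the population is taken to be uniform(0, 1), whose standard deviation 0.2887 matches the value quoted above, and the sample size in part (a) is taken to be n = 2. The seed and the helper name sim_sd_var are arbitrary.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

# Average sample variance and average sample SD over many samples of size n.
# Assumes a uniform(0, 1) population (an assumption, see the lead-in).
sim_sd_var <- function(n, reps = 10000) {
  samples <- matrix(runif(n * reps), nrow = reps)  # one sample per row
  v <- apply(samples, 1, var)
  c(mean_var = mean(v), mean_sd = mean(sqrt(v)))
}

pop_sd <- sqrt(1 / 12)   # population SD of uniform(0, 1)

res2 <- sim_sd_var(2)    # n = 2 is an assumed value for part (a)
res5 <- sim_sd_var(5)    # n = 5 as in part (b)

bias2 <- res2[["mean_sd"]] - pop_sd
bias5 <- res5[["mean_sd"]] - pop_sd

round(c(mean_var_n2 = res2[["mean_var"]], bias_S_n2 = bias2, bias_S_n5 = bias5), 4)
```

Under these assumptions both biases come out negative, with the n = 5 bias smaller in magnitude, which is the pattern the solution describes.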
9. (a) Using R commands we can get P (12 < X ≤ 16) = 0.2956 and the 15th, 25th,
55th, and 95th percentiles are 6.85, 8.30, 11.50, and 17.58, respectively.
(b) The estimated value of P (12 < X ≤ 16) is 0.38, and the estimated 15th,
25th, 55th, and 95th percentiles are 6.37, 7.95, 13.09, and 18.89, respectively.
(c) We use the following commands
m = mean(x); s = sd(x);
The model-based estimate of P (12 < X ≤ 16) is 0.289, and the model-based
estimates of the 15th, 25th, 55th, and 95th percentiles are 6.67, 8.40, 12.22, and
19.46, respectively.
By comparing the results in (b) and (c) to those in (a), we can see that, in
general, the model-based estimators are closer to the population values.
[Figure: normal Q-Q plot of the data; horizontal axis: Theoretical Quantiles.]
The figure suggests that the normal model for the data is appropriate.
(b) The model-based estimate of P (44 < X ≤ 46) is 0.39, and the model-based estimates of the median and 75th percentile are 45.20 and 46.52, respectively.
(c) The model-free estimate of P (44 < X ≤ 46) is 0.375, and the model-free estimates of the median and 75th percentile are 44.885 and 46.420, respectively.
(d) Since the Q-Q plot suggests that the normality assumption is appropriate, we would prefer the model-based estimates.
$$= \sigma^2 \left(1 + \frac{4 - \pi}{n\pi}\right) \neq \sigma^2.$$
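$$\mathrm{lik}(\lambda) = e^{-\lambda}\frac{\lambda^{x_1}}{x_1!}\; e^{-\lambda}\frac{\lambda^{x_2}}{x_2!} \cdots e^{-\lambda}\frac{\lambda^{x_n}}{x_n!} = e^{-n\lambda}\,\frac{\lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}.$$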
Setting the first derivative of the log-likelihood function to zero yields the equation
$$-n + \frac{1}{\lambda}\sum_{i=1}^{n} x_i = 0.$$
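Solving this equation for λ gives the sample mean. A numerical check of that fact might look like the sketch below; the counts in x are hypothetical and are not the data from the exercise.

```r
# Hypothetical Poisson counts (NOT the data from the exercise)
x <- c(2, 0, 3, 1, 4, 2, 1, 3)

# Poisson log-likelihood as a function of lambda
loglik <- function(lambda) sum(dpois(x, lambda, log = TRUE))

# Maximize numerically and compare with the closed-form solution mean(x)
fit <- optimize(loglik, interval = c(0.01, 20), maximum = TRUE, tol = 1e-8)
lambda_hat <- fit$maximum

c(numerical_mle = lambda_hat, sample_mean = mean(x))
```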
(c) The model-based estimate of the population variance is σ̂² = λ̂ = X̄ = 2.24, and the sample variance is 1.533. Assuming the Poisson model correctly describes the population distribution, we would prefer the model-based estimate.
7. (a) There are X + 5 helmets in total, and the last one has a flaw. Among the remaining X + 4 helmets, 4 have a flaw and X are flawless. Thus, we have the probability
$$P(X = x) = \binom{x+4}{4} p^5 (1 - p)^x.$$
Therefore, the log-likelihood function is
$$L(p) = \log\binom{X+4}{4} + 5 \log p + X \log(1 - p).$$
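The log-likelihood above is that of a negative binomial count with 5 required flawed helmets, so it can be checked against R's dnbinom and maximized numerically. This is a sketch only: the observed value x_obs is hypothetical. The closed form 5/(X + 5) is not taken from the solution text; it is what setting the derivative 5/p − X/(1 − p) to zero gives.

```r
x_obs <- 20  # hypothetical number of flawless helmets (not from the exercise)

# Log-likelihood exactly as written in the solution
loglik <- function(p) lchoose(x_obs + 4, 4) + 5 * log(p) + x_obs * log(1 - p)

# Cross-check against the built-in negative binomial log-density
p_grid <- c(0.1, 0.3, 0.5)
max_abs_diff <- max(abs(loglik(p_grid) -
                        dnbinom(x_obs, size = 5, prob = p_grid, log = TRUE)))

# Numerical maximizer versus the closed form 5 / (x_obs + 5)
fit <- optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE, tol = 1e-8)
p_hat_numeric <- fit$maximum
p_hat_closed  <- 5 / (x_obs + 5)

c(numeric = p_hat_numeric, closed_form = p_hat_closed)
```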
8. (a) For the uniform(0, θ) distribution, E(X) = θ/2; thus the method of moments estimator is θ̂ = 2X̄. With the commands set.seed(3333); x=runif(20, 0, 10); mean(x), we have X̄ = 5.359, thus θ̂ = 10.718. The model-based estimator of the population variance σ² is σ̂² = θ̂²/12; thus, for this dataset, the estimate is 10.718²/12 = 9.573.
(b) The sample variance is 6.781. Compared to the true value of the population variance, 10²/12 = 8.333, the model-based estimate overestimates it by 1.240, while the model-free estimate underestimates it by 1.552. Thus, the model-based estimate provides a better approximation.
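The full calculation for Problem 8 can be sketched as follows, using the seed and sampling command quoted in part (a). The variable names are illustrative. The solution reports X̄ = 5.359 and a sample variance of 6.781 for this seed; exact values may depend on the R version.

```r
set.seed(3333)
x <- runif(20, 0, 10)

theta_hat      <- 2 * mean(x)       # method of moments estimate of theta
sigma2_model   <- theta_hat^2 / 12  # model-based estimate of the variance
sigma2_sample  <- var(x)            # model-free estimate (sample variance)
sigma2_true    <- 10^2 / 12         # true variance of uniform(0, 10)

# Signed errors of the two estimates relative to the true variance
errors <- c(model_based = sigma2_model - sigma2_true,
            model_free  = sigma2_sample - sigma2_true)
errors
```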
9. (a) To get the method of moments estimator for θ, solve the equation P̂ = E(P), that is, P̂ = θ/(1 + θ), which gives the estimator
$$\hat{\theta} = \frac{\hat{P}}{1 - \hat{P}}.$$
and
$$\hat{\alpha}_1 = \bar{Y} - \hat{\beta}_1 \bar{X} = \frac{36.66}{11} - (-0.1420)\,\frac{263.53}{11} = 6.735.$$
Thus, the regression line is ŷ = α̂1 + β̂1 x = 6.735 − 0.142x.
(b) The observed concentrations range from 2.50 to 55.00. The concentrations 4.5 and 34.7 lie in this range, but 62.8 does not, so it is appropriate to use the regression line only at 4.5 and 34.7. The estimated expected corrosion rate at 4.5 is 6.735 − 0.142 × 4.5 = 6.096, and at 34.7 it is 6.735 − 0.142 × 34.7 = 1.808.
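The rule of predicting only inside the observed range can be sketched as a small helper. The function name predict_rate is hypothetical. The coefficients and the range are the rounded values quoted in the solution.

```r
# Rounded coefficients and observed covariate range, as quoted in the solution
alpha_hat <- 6.735
beta_hat  <- -0.142
x_range   <- c(2.50, 55.00)

# Hypothetical helper: returns the fitted value inside the observed range
# and NA outside it, where extrapolation is not justified.
predict_rate <- function(x) {
  inside <- x >= x_range[1] & x <= x_range[2]
  ifelse(inside, alpha_hat + beta_hat * x, NA_real_)
}

pred <- predict_rate(c(4.5, 34.7, 62.8))
pred
```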
Copyright © 2016 Pearson Education, Inc.
11. (a) Using the commands x = c(498,526,559,614); y=c(16, 25, 34, 39); lm(y ~ x), we find the estimated regression line is ŷ = −78.7381 + 0.1952x. The expected number of manatee deaths in a year with 550,000 powerboat registrations is estimated as −78.7381 + 0.1952 × 550 = 28.62.
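The fit in 11(a) can be reproduced and used for prediction as sketched below. Registrations are entered in thousands, as in the quoted commands. The names fit, pred_550 and pred_rounded are illustrative.

```r
x <- c(498, 526, 559, 614)  # powerboat registrations, in thousands
y <- c(16, 25, 34, 39)      # manatee deaths

fit   <- lm(y ~ x)
coefs <- coef(fit)          # intercept and slope

# Prediction at 550 (thousand) registrations using the unrounded coefficients
pred_550 <- predict(fit, newdata = data.frame(x = 550))

# The same prediction using the rounded coefficients quoted in the solution
pred_rounded <- -78.7381 + 0.1952 * 550

c(full_precision = unname(pred_550), rounded_coefficients = pred_rounded)
```

The two predictions differ slightly in the second decimal place, because the slope is rounded to four decimals before being multiplied by 550.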
12. (a) The following shows the scatterplot of the data with the fitted regression line
drawn through it.
[Figure: scatterplot of Strength versus Modulus of Elasticity, with the fitted regression line.]
From this graph, the linearity of the regression function and homoscedasticity
appear to hold.
(b) The LSE regression coefficients are α̂1 = 2.5801 and β̂1 = 0.1339. We can
13. (a) The LSE regression coefficients are α̂1 = 19.9691 and β̂1 = 0.2255. We can
estimate the expected age at diameter x as ŷ = 19.9691 + 0.2255 × x.
(b) The scatterplot of the data is shown below.
[Figure: scatterplot of Age versus Diameter.]
This figure suggests that the age of the tree increases with the diameter of the tree in an approximately linear fashion. Thus, the assumption of linearity of the regression function seems to be, at least approximately, satisfied. On the other hand, the variability in the age of trees seems to increase with the diameter of the tree. Thus, the homoscedasticity assumption appears to be violated for this data set.
(c) The following shows the scatterplot of the transformed data.
[Figure: scatterplot of logarithm of Age versus logarithm of Diameter.]
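$$\mathrm{MSE}(\hat{\theta}_2) = \mathrm{Var}(\hat{\theta}_2) + \mathrm{Bias}(\hat{\theta}_2)^2 = \frac{n\theta^2}{(n+1)^2(n+2)} + \left(-\frac{\theta}{n+1}\right)^2 = \frac{2\theta^2}{(n+1)(n+2)}.$$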
(c) When n = 5 and the true value of θ is 10, we have MSE(θ̂1) = 10²/(3 × 5) = 6.67, while MSE(θ̂2) = 2 × 10²/[(5 + 1)(5 + 2)] = 4.76. According to the MSE selection criterion, θ̂2 is preferable.
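The two MSE values can be evaluated from the formulas and compared with a simulation. The simulation part is a sketch under an assumption not stated in this excerpt: θ̂1 is taken to be 2X̄ and θ̂2 the sample maximum of a uniform(0, θ) sample, which is consistent with the MSE and bias expressions above. The seed and number of replications are arbitrary.

```r
theta <- 10
n <- 5

# MSE values from the formulas used in the solution
mse1_formula <- theta^2 / (3 * n)
mse2_formula <- 2 * theta^2 / ((n + 1) * (n + 2))

# Simulation check (ASSUMES theta1 = 2 * sample mean, theta2 = sample maximum)
set.seed(1)     # arbitrary seed
reps <- 20000   # arbitrary number of replications
samples <- matrix(runif(n * reps, 0, theta), nrow = reps)  # one sample per row
theta1 <- 2 * rowMeans(samples)
theta2 <- apply(samples, 1, max)

mse1_sim <- mean((theta1 - theta)^2)
mse2_sim <- mean((theta2 - theta)^2)

round(rbind(formula = c(mse1_formula, mse2_formula),
            simulated = c(mse1_sim, mse2_sim)), 2)
```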
2. From the distributions of X1 , · · · , X10 and Y1 , · · · , Y10 , we have E(X̄ ) = E(Ȳ ) = μ,
Var(X̄ ) = σ2 /10, and Var(Ȳ ) = 4σ2 /10. X̄ and Ȳ are also independent. Thus,
(a) For any 0 ≤ α ≤ 1, E(μ̂) = E(αX̄ + (1 − α)Ȳ ) = αE(X̄ ) + (1 − α)E(Ȳ ) =
αμ + (1 − α)μ = μ. Thus, μ̂ is unbiased for μ.
(b) Since μ̂ is unbiased,
(c) The estimator 0.5X̄ + 0.5Ȳ corresponds to μ̂ with α = 0.5. The MSE is
$$\mathrm{MSE}(0.5\bar{X} + 0.5\bar{Y}) = (5 \times 0.5^2 - 8 \times 0.5 + 4)\,\frac{\sigma^2}{10} = 1.25\,\frac{\sigma^2}{10}.$$
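The coefficient 1.25 can be checked directly from the variances stated at the start of the problem, Var(X̄) = σ²/10 and Var(Ȳ) = 4σ²/10, together with the independence of X̄ and Ȳ. The function names below are illustrative, and everything is expressed in units of σ²/10.

```r
# Variance of alpha * Xbar + (1 - alpha) * Ybar, in units of sigma^2 / 10,
# using Var(Xbar) = 1 unit, Var(Ybar) = 4 units, and independence
mse_factor <- function(alpha) alpha^2 * 1 + (1 - alpha)^2 * 4

# Expanded quadratic form appearing in the solution
quad_form <- function(alpha) 5 * alpha^2 - 8 * alpha + 4

alpha_grid <- seq(0, 1, by = 0.1)
max_gap <- max(abs(mse_factor(alpha_grid) - quad_form(alpha_grid)))

c(factor_at_half = mse_factor(0.5), max_gap = max_gap)
```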