Stat 305 Final Practice – Solutions

1. Enterprise Industries produces Fresh, a brand of liquid laundry detergent. In order to more
effectively manage its inventory, the company would like to better predict demand for Fresh. To
develop a prediction model, the company has gathered data concerning demand for Fresh over
the last 30 sales periods (each sales period is defined to be a four-week period). For this data
set, let

x1 = the price (in dollars) of Fresh as offered by Enterprise Industries in the sales period minus
the average industry price (in dollars) of competitors’ similar detergents in the sales period.

x2 = Enterprise Industries’ advertising expenditure (in hundreds of thousands of dollars) to
promote Fresh in the sales period.

y = the demand for Fresh (in hundreds of thousands of bottles) in the sales period
Refer to Output A for parts (a) – (b).
a) [4] Based on your interpretation of the scatterplots provided, state the model equations that
might adequately describe the relationship of i) y with x1 and ii) y with x2. Your answers
here should be similar in form to the following incorrect answer:
y = β0 + β1 x1 + β2 x2 + β3 x1 x2 + ε.

i) y = β0 + β1 x1 + ε

ii) y = β0 + β1 x2 + β2 x2^2 + ε

b) [2] If one were to fit the model y ≈ β0 + β1 x1 (which may or may not correctly reflect the
relationship between y and x1) to the data set, what would be the value for R^2, the coefficient of
determination?

R^2 = (0.8897)^2 = 0.7916

From among several models for y as a function of x1 and x2, the following model (Model 1) was
selected: y = β0 + β1 x2 + β2 x2^2 + β3 x1 + β4 x1 x2 + ε.
c) [5] A normal quantile plot of residuals and a plot of the residuals versus the predicted values
are shown in Output B. Describe how you may use these plots to examine whether certain
model assumptions are appropriate here. State the assumptions under consideration and
identify clearly the plot you would use for assessing each assumption.

ε ~ iid N(0, σ^2)
The constant variance assumption can be assessed by looking at the plot of the residuals versus
the predicted values. If the residuals form a fan shape, then the constant variance assumption
may not be appropriate for the data. If the assumption is appropriate for the data, then one
hopes to see the residuals forming a patternless cloud with roughly constant spread.

The normality assumption could be assessed by looking at the normal quantile plot. If the points
create a fairly linear pattern, especially in the middle of the plot, then the normality assumption
could be appropriate for the data.

Refer to Output C for parts (d) – (g).
d) [4] Predict the demand for the next sales period (in hundreds of thousands of bottles) if the
price difference will be -.20 (dollars) and the advertising expenditure for Fresh will be 5.0
(hundreds of thousands of dollars).

ŷ = 29.11 − 7.61(5) + .67(5^2) + 11.13(−0.20) − 1.48(−0.20 * 5) = 7.064

e) [5] Calculate a 90% confidence interval for β2 using the information provided. Use this interval
to test the hypotheses concerning whether a quadratic term is needed in the model or not.

0.6712 ± 1.708(0.2027), i.e. (0.32, 1.02)
Note that the value used for t is based on 25 df. Since 0 is not in the interval, we can
conclude that β2 ≠ 0. Hence the quadratic term is needed in the model.

f) [4] State the null and the alternative hypotheses concerning whether the interaction term is
needed in the model or not. Continue to follow the five-step format to perform a hypothesis test.
State your decision.

H0: β4 = 0    Ha: β4 ≠ 0
t = (−1.4777 − 0) / 0.6672 = −2.21
p-value = 0.0361
Since the p-value is less than 0.05, we can reject the null hypothesis and conclude that the
interaction term is needed in the model.

g) [2] Give the estimate for σ^2.

MSE = 0.04258
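The arithmetic in parts (d) through (f) can be checked with a few lines of Python. This is only a sketch, not part of the original solutions: the estimates, standard errors, and the t quantile 1.708 (25 df) are the values quoted in the worked answers, and the function and variable names are illustrative.

```python
# Coefficient estimates for Model 1 as quoted in the worked answer to part (d).
b0, b_x2, b_x2sq, b_x1, b_x1x2 = 29.11, -7.61, 0.67, 11.13, -1.48


def predict(x1, x2):
    """Point prediction from Model 1 for price difference x1 and advertising x2."""
    return b0 + b_x2 * x2 + b_x2sq * x2**2 + b_x1 * x1 + b_x1x2 * x1 * x2


# Part (d): price difference -0.20 dollars, advertising 5.0 (hundreds of thousands).
y_hat = predict(x1=-0.20, x2=5.0)

# Part (e): 90% confidence interval, estimate +/- (t quantile) * (standard error).
est, se, t_quantile = 0.6712, 0.2027, 1.708  # t quantile for 25 df, taken from a table
ci = (est - t_quantile * se, est + t_quantile * se)
quadratic_needed = not (ci[0] <= 0 <= ci[1])  # 0 outside the interval

# Part (f): t statistic for H0: beta4 = 0.
t_stat = (-1.4777 - 0) / 0.6672

print(round(y_hat, 3), tuple(round(v, 2) for v in ci), round(t_stat, 2))
```

The interval check mirrors the reasoning in part (e): a two-sided test at level 0.10 rejects H0 exactly when 0 falls outside the 90% interval.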

h) [6] Since Enterprise Industries has to pay someone to visit several stores and gather information
on the prices for similar detergents produced by competitors during every sales period, Enterprise
Industries is wondering if using only advertising expenditure to predict demand is equivalent to
using both advertising expenditure and price difference to predict demand. Output D contains the
output for a model that uses only advertising expenditure to predict demand, i.e.,
y = β0 + β1 x2 + β2 x2^2 + ε (Model 2). Follow the five-step format to justify using Model 1 or
Model 2 to predict demand. Note that you will need to provide the value of a test statistic that is
distributed according to an F-distribution.

H0: β3 = β4 = 0    Ha: at least one βj ≠ 0 for j = 3, 4
f = [(2.18 − 1.06) / (27 − 25)] / [1.06 / 25] = 13.2 ~ F2,25
Q(.95) = 3.39, p-value < .001
The p-value is less than 0.05 so we reject H0. Therefore, use the full model (Model 1) to predict
demand.

Output A:

Output B:

Output C:

Output D:
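The comparison of Model 1 with Model 2 in part (h) of problem 1 is a partial F-test. The sketch below, which is not part of the original key, recomputes the statistic from the error sums of squares and degrees of freedom quoted in that answer (2.18 on 27 df for Model 2, 1.06 on 25 df for Model 1). The function name `partial_f` is hypothetical, and the critical value 3.39 is the tabled Q(.95) for F2,25 quoted in the solution.

```python
def partial_f(sse_reduced, sse_full, df_reduced, df_full):
    """F statistic comparing a reduced model with the full model that contains it."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Error sums of squares and error degrees of freedom as quoted in the answer.
f_stat = partial_f(sse_reduced=2.18, sse_full=1.06, df_reduced=27, df_full=25)

critical_value = 3.39  # Q(.95) for F(2, 25), read from a table
reject_h0 = f_stat > critical_value

print(round(f_stat, 1), reject_h0)
```

A large statistic means that dropping x1 and x1 x2 increases the unexplained variation by more than chance alone would suggest, which is why the full model is kept.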

2. Fill in the blanks for the following Analysis of Variance table.

Source     DF      Sum of Squares   Mean Square   F Ratio
Model      4       400              __c__         __e__
Error      __a__   __b__            __d__
C. Total   24      800

Answers: a=20, b=400, c=100, d=20, e=5.

3. A student measured his car mileage at different combinations of speed (55, 60, 65 and 70 mph) and
octane (87 and 90). The data set consists of 24 observations: three observations for each
combination of levels. The fitted model is in the form ŷ = b0 + b1 x_speed + b2 x_octane. Refer to
Output A to answer the following questions.

a) Calculate R^2, the coefficient of determination.

R^2 = 235.14 / 246.63 = .95

b) Calculate the residual for the observation given in the first line of the data table.

y1 − ŷ1 = 30 − (−133.029 − .176 * 55 + 1.981 * 87) = 30 − 29.637 = .363

c) Give a 99% confidence interval for β1.

−0.176 ± 2.831(0.027), i.e. (−0.252, −0.100)

d) Give the estimate for σ.

√0.547 = .7396

f) Interpret the confidence interval calculated by JMP for the observation described by the last
line in the data table.

For a speed of 55 mph and 87 octane, we are 95% confident that the average mileage will be between
28.97 and 30.19 mpg.

g) To compare model y ≈ β0 + β1 x_speed + β2 x_octane to model y ≈ β0, state the null and
alternative hypotheses, the formula for the test statistic, the formula with the appropriate values
as provided by the JMP output, the p-value and the conclusion.

H0: β1 = β2 = 0    Ha: at least β1 or β2 ≠ 0
f = [(246.63 − 11.489) / (23 − 21)] / [11.489 / 21] = 214.90
P[F2,21 > f] < .0001
Since the p-value is less than 0.05, we can conclude that at least one of the parameters is not
equal to zero. Therefore, the model that includes both speed and octane along with the appropriate
parameter estimates should be used to predict average mileage.

Output A:
First five lines from the data table:

Speed   Octane   Mileage   Lower 95% Mean mileage   Upper 95% Mean mileage
55      87       30        28.9687622               30.1929045
60      87       29        28.2334503               29.164883
65      87       28        27.3517837               28.2832163
70      87       27        26.3237622               27.5479045
55      87       30.5      28.9687622               30.1929045
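The arithmetic in problems 2 and 3 can be reproduced from the quoted sums of squares and estimates. The following Python sketch is an illustration only, with hypothetical variable names. The t quantile 2.831 (21 df) is the tabled value used in the answer to 3(c), and the small difference from the printed residual .363 comes from rounding in the quoted coefficients.

```python
import math

# Problem 2: complete the ANOVA table from the given entries.
df_model, df_total = 4, 24
ss_model, ss_total = 400, 800
df_error = df_total - df_model      # blank a
ss_error = ss_total - ss_model      # blank b
ms_model = ss_model / df_model      # blank c
ms_error = ss_error / df_error      # blank d
f_ratio = ms_model / ms_error       # blank e

# Problem 3: quantities quoted from the JMP output in the worked answers.
ss_tot, sse, df_err = 246.63, 11.489, 21

r_squared = (ss_tot - sse) / ss_tot                     # part (a)
y_hat_1 = -133.029 - 0.176 * 55 + 1.981 * 87            # fitted value, first data row
residual_1 = 30 - y_hat_1                               # part (b)
half_width = 2.831 * 0.027                              # t quantile (21 df) * SE
ci_beta1 = (-0.176 - half_width, -0.176 + half_width)   # part (c)
sigma_hat = math.sqrt(sse / df_err)                     # part (d): root mean squared error
f_overall = ((ss_tot - sse) / 2) / (sse / df_err)       # part (g): overall F on (2, 21) df

print(df_error, ss_error, ms_model, ms_error, f_ratio)
print(round(r_squared, 2), round(residual_1, 2), round(sigma_hat, 2), round(f_overall, 1))
```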

4. The Department of Transportation (DOT) conducted an experiment to determine the relationship
between the curing process, which is characterized by time and temperature, and maximum compressive
strength of concrete (psi). There are four levels for time: 1, 2, 5 and 10 days. There are three
levels for temperature: 40, 60 and 80 degrees Fahrenheit. And there are three observations for each
combination of levels.

a) In order for inference about quantities such as β1 to be valid, what do we need to assume about
errors (residuals) for any linear regression model?

ε ~ iid Normal(0, σ^2)

b) Compare the model y ≈ β0 + β1 x_time + β2 x_temp (refer to Output B) to the model
y ≈ β0 + β1 x_time + β2 x_temp + β3 x_time x_temp (refer to Output C). Which model (along with the
appropriate parameter estimates) should we use to predict strength, and why? Follow the five-step
format and provide a test statistic that is distributed according to the F-distribution.

H0: β3 = 0    Ha: β3 ≠ 0
f = [(364346.2 − 321450.5) / (33 − 32)] / [321450.5 / 32] = 4.27 ~ F1,32
Using F1,30: Q(.95) = 4.17 < 4.27 < Q(.99) = 7.56 => .01 < p-value < .05
Since p-value < .05, there is enough evidence to conclude that β3 ≠ 0. Hence, we should fit
y ≈ β0 + β1 x_time + β2 x_temp + β3 x_time x_temp to the data set and use the model with the
appropriate parameter estimates to predict strength.

Output B:

Output C:
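The table lookup used in problem 4(b) brackets the p-value between tail probabilities instead of computing it exactly. The sketch below shows one way to mechanize that lookup. It is an illustration under stated assumptions, not the key's method: the helper `bracket_p_value` is hypothetical, and the two F1,30 quantiles are the tabled values quoted in the solution.

```python
# Problem 4(b): F statistic from the two error sums of squares quoted in the answer.
sse_reduced, df_reduced = 364346.2, 33   # model without the interaction (Output B)
sse_full, df_full = 321450.5, 32         # model with the interaction (Output C)
f_stat = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

# Tabled quantiles of F(1, 30) as quoted in the solution: probability -> quantile.
quantiles = {0.95: 4.17, 0.99: 7.56}


def bracket_p_value(stat, table):
    """Return (lower, upper) bounds on the upper-tail p-value using table quantiles."""
    lower, upper = 0.0, 1.0
    for prob, q in sorted(table.items()):
        if stat > q:
            upper = 1 - prob   # statistic exceeds this quantile, so p < 1 - prob
        else:
            lower = 1 - prob   # statistic is at or below it, so p >= 1 - prob
            break
    return lower, upper


low, high = bracket_p_value(f_stat, quantiles)
print(round(f_stat, 2), round(low, 2), round(high, 2))
```

Bracketing is all a printed table allows; software would report a single p-value, but the decision at the .05 level is the same either way.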

5. Fill in the blank(s) with the appropriate answer.
a) A Type __________ error occurs when one says that there is a difference between two population
means when the difference is zero as stated by the null hypothesis.
b) The letters iid stand for __________________________________________________.
c) The sample mean for large samples (samples with 30 or more values) is approximately normally
distributed according to the ____________ ____________ Theorem.

Answers: I, independently and identically distributed, Central Limit.

6. Circle either T (true) or F (false).
T F When one says (0, 10) is a 95% CI for µ, one means that the probability that µ lies within the
interval is .95.
T F Suppose a 95% confidence interval for the difference of two population means is (-1, 1).
According to the 95% confidence interval, the p-value would be less than 0.05 based on a null
hypothesis that states there is no difference.
T F A 99% confidence interval is wider than a 95% confidence interval for a given data set.

Answers: F, F, T.

7. A new experimental drug to reduce cholesterol was developed. Five people were chosen to receive
the new drug. Each person had his/her cholesterol measured before taking the drug. Then each person
took the drug for a six-week period and had his/her cholesterol measured again.

Person   Before   After
1        200      180
2        220      190
3        180      165
4        195      175
5        240      180

a) Give and interpret a 95% confidence interval for the mean difference between the before and
after cholesterol measurements.

Differences: -20, -30, -15, -20, -60
d̄ = -29    s_d^2 = 330
df = n − 1 = 5 − 1 = 4
(−29 − 2.776 √(330/5), −29 + 2.776 √(330/5))
(−51.55, −6.45)

We are 95% confident that the mean decrease in cholesterol after taking the new drug for six weeks
will be between 6.45 and 51.55.

8. An engineer is concerned about spring lifetimes (10^3 cycles) under two different levels of
stress: 900 N/mm^2 and 950 N/mm^2. Below are the data.

950 N/mm^2: 225, 171, 198, 189, 189, 135, 162, 135, 117, 162
900 N/mm^2: 216, 162, 153, 216, 225, 216, 306, 225, 243, 189

Follow the five-step format to assess the strength of evidence that the difference in mean lifetimes
between 900 N/mm^2 stress level and 950 N/mm^2 stress level is not equal to zero.

H0: µ900 − µ950 = 0    Ha: µ900 − µ950 ≠ 0
x̄950 = 168.3    s950^2 = 1098.9
x̄900 = 215.1    s900^2 = 1844.1
sp = √[(1098.9(9) + 1844.1(9)) / 18] = 38.36
t = (215.1 − 168.3 − 0) / (38.36 √(1/10 + 1/10)) = 2.73
With 18 df, .005 < .5p < .01, so .01 < p < .02.
The p-value is less than .05, so we will reject the null hypothesis and conclude that there is a
difference in mean lifetimes between 900 N/mm^2 stress level and 950 N/mm^2 stress level.

9. An engineer is concerned about spring lifetimes (10^3 cycles) under two different levels of
stress: 900 N/mm^2 and 950 N/mm^2. This time the engineer performed this experiment with a total of
100 springs. Below are the data.

x̄950 = 154.1    s950^2 = 1315.2    n950 = 60
x̄900 = 168.8    s900^2 = 1902.8    n900 = 40

Follow the five-step format to assess the strength of evidence that the difference in mean lifetimes
between 900 N/mm^2 stress level and 950 N/mm^2 stress level is not equal to zero.

H0: µ900 − µ950 = 0    Ha: µ900 − µ950 ≠ 0
z = (168.8 − 154.1 − 0) / √(1902.8/40 + 1315.2/60) = 1.76
P[|Z| > 1.76] = 2 P[Z < −1.76] = .0784
The p-value is greater than .05, so we will not reject the null hypothesis. Hence, there is not
enough evidence to conclude that there is a difference in mean lifetimes between 900 N/mm^2 stress
level and 950 N/mm^2 stress level.
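The interval in problem 7 and the test statistics in problems 8 and 9 can be recomputed with Python's standard library. This is a sketch for checking the arithmetic, not part of the original key. The t quantile 2.776 (4 df) is the tabled value used in problem 7, and the variable names are illustrative. With the summary statistics as given in problem 9, the z statistic evaluates to about 1.76.

```python
import math
from statistics import NormalDist, mean, variance

# Problem 7: paired differences (after minus before) and the 95% interval.
before = [200, 220, 180, 195, 240]
after = [180, 190, 165, 175, 180]
diffs = [a - b for a, b in zip(after, before)]
d_bar, s2_d = mean(diffs), variance(diffs)       # sample mean and sample variance
margin = 2.776 * math.sqrt(s2_d / len(diffs))    # t quantile for 4 df, from a table
paired_ci = (d_bar - margin, d_bar + margin)

# Problem 8: pooled two-sample t statistic from the raw lifetimes.
s950 = [225, 171, 198, 189, 189, 135, 162, 135, 117, 162]
s900 = [216, 162, 153, 216, 225, 216, 306, 225, 243, 189]
n900, n950 = len(s900), len(s950)
pooled_var = ((n900 - 1) * variance(s900) + (n950 - 1) * variance(s950)) / (n900 + n950 - 2)
sp = math.sqrt(pooled_var)
t_stat = (mean(s900) - mean(s950)) / (sp * math.sqrt(1 / n900 + 1 / n950))

# Problem 9: large-sample z statistic from the summary statistics.
z_stat = (168.8 - 154.1) / math.sqrt(1902.8 / 40 + 1315.2 / 60)
p_value = 2 * NormalDist().cdf(-abs(z_stat))     # two-sided p-value

print(tuple(round(v, 2) for v in paired_ci))
print(round(sp, 2), round(t_stat, 2))
print(round(z_stat, 2))
```

Problem 8 pools the variances because both samples are small and their spreads are assumed equal, while problem 9 relies on the large sample sizes and uses each sample's own variance.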