STAT 252 Practice Final Exam Guide
STAT 252
PRACTICE FINAL
Signature: __________________________________
2. You are permitted to use a NON-PROGRAMMABLE calculator, and the formula sheets and tables
provided.
6. This exam has 14 pages (including this cover and all computer output tables). Please ensure that
you have all pages.
7. Make sure your name and signature are on the front and your student ID number is at the top of
page two.
8. For questions that state you should show all steps, be sure that you do this in order to obtain full
credit. Conclusions must also be clearly stated. Your answers must have adequate justification.
9. For questions that state that you do not need to show all steps, read the question carefully and follow
the exact instructions regarding what is required.
10. If you run out of space in the blank area provided, use the back of the page to complete your answers
as needed, and label such answers so that it is clear which question they belong to.
11. Also use the reverse sides of the pages for all rough work.
BEST WISHES!!
Student ID Number: ___________________
Question 1 (2 marks): What is an indicator or dummy variable? What is its application in regression?
Question 2 (Two parts totaling 5 marks): A randomized experiment compared four methods of washing hands
by measuring the bacterial counts after washing. The output below is from an ANOVA F-test that rejected
the null hypothesis, leading to the conclusion that the mean bacterial counts differ among the four
hand-washing methods.
SUMMARY
Groups                 Count   Sum   Average   Variance
Just water             8       936   117       969.1429
Alcohol (65%)          8       300   37.5      705.4286
Anti-bacterial soap    8       740   92.5      1760.857
Regular soap           8       848   106       2205.143
ANOVA
Source of Variation    SS      df    MS         F       P-value
Between Groups         29882   3     9960.667   7.064   0.0011
Within Groups          39484   28    1410.143
Total                  69366   31
(a) (3 marks): Using the Bonferroni method at the 94% confidence level, determine the individual
comparison-wise error rate ($\alpha_I$), find the critical value (two-sided) from the appropriate statistical
table and calculate the margin of error. You do not need to perform all the steps. [Note: You only
have to calculate the margin of error once since the sample sizes are equal for all treatment groups.]
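The Bonferroni arithmetic for part (a) can be sketched as follows, assuming all six pairwise comparisons among the four groups are of interest. The critical value t_{0.005, 28} = 2.763 is typed in from a standard t-table (the Python standard library has no t quantile function), so treat it as an assumed table value rather than something computed here.

```python
import math

k = 4               # number of treatment groups
n = 8               # observations per group (balanced design)
mse = 1410.143      # Within Groups mean square from the ANOVA table
df_error = 28       # Within Groups degrees of freedom
family_conf = 0.94  # Bonferroni family-wise confidence level

alpha_family = 1 - family_conf           # 0.06
m = k * (k - 1) // 2                     # number of pairwise comparisons: 6
alpha_individual = alpha_family / m      # comparison-wise error rate alpha_I

# Two-sided critical value t_{alpha_I/2, 28} = t_{0.005, 28} (assumed t-table value).
t_crit = 2.763

se_diff = math.sqrt(mse * (1 / n + 1 / n))  # SE of a difference between two group means
margin = t_crit * se_diff
print(f"alpha_I = {alpha_individual:.4f}, t = {t_crit}, ME = {margin:.2f}")
```

Any pair of group averages differing by more than this margin would be declared different at the 94% family-wise level.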
(b) (2 marks): Develop a linear combination (contrast) to test whether there is a difference between alcohol
and the other three methods combined. You do not need to perform all steps of a hypothesis test; just
develop the contrast and calculate its estimate.
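One possible contrast for part (b) gives alcohol a coefficient of 1 and each of the other three methods -1/3, so the coefficients sum to zero. The sketch below computes its estimate from the group averages in the SUMMARY table; multiplying every coefficient by 3 (that is, 3, -1, -1, -1) is an equivalent choice that simply scales the estimate by 3.

```python
# Group averages from the SUMMARY table.
means = {
    "Just water": 117.0,
    "Alcohol (65%)": 37.5,
    "Anti-bacterial soap": 92.5,
    "Regular soap": 106.0,
}

# Contrast: alcohol versus the average of the other three methods.
coefficients = {group: -1 / 3 for group in means}
coefficients["Alcohol (65%)"] = 1.0
assert abs(sum(coefficients.values())) < 1e-12  # contrast coefficients must sum to zero

estimate = sum(coefficients[g] * means[g] for g in means)
print(f"Estimated contrast: {estimate:.3f}")  # Estimated contrast: -67.667
```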
Question 3 (Four parts totaling 16 marks): The average saturated fat consumption (in grams) and
cholesterol level (in mg/100 ml of blood) of a random sample of 8 men were recorded. The data obtained
fit the assumptions of simple linear regression analysis. SPSS output obtained after analysis of the data is
shown below, with some values missing from the tables. You may also need some of the following
information: $\bar{x} = 52.625$, $\bar{y} = 189.250$ and $S_{xx} = 1587.875$.
[Figures (done with SPSS): Scatterplot; Normal Probability Plot; Residual Plot of residuals versus fat consumption]
Model Summary
ANOVA
Model              Sum of Squares
1   Regression     5089.335
    Residual       502.165
    Total          5591.500
Coefficients
(a) (5 marks): Using a simple linear regression ANOVA test, at the 1% significance level, test whether
there is a relationship between saturated fat consumption and cholesterol level in men. In other words,
test for the significance of the slope of the regression line. Perform ALL steps of the hypothesis test.
Give both the exact P-value from the computer output and the P-value obtained from the F-table.
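For part (a), the F statistic follows directly from the sums of squares in the ANOVA table with n = 8 men. The critical value F_{0.01, 1, 6} = 13.75 and the bound F_{0.001, 1, 6} = 35.51 are entered as standard F-table values; they are assumptions of this sketch, not numbers taken from the SPSS output.

```python
# Sums of squares from the regression ANOVA table.
ss_reg, ss_res = 5089.335, 502.165
n = 8

df_reg, df_res = 1, n - 2            # simple linear regression: 1 and n - 2 df
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res             # MSE
f_stat = ms_reg / ms_res

f_crit_01 = 13.75    # F_{0.01, 1, 6} (assumed F-table value)
f_crit_001 = 35.51   # F_{0.001, 1, 6}; exceeding it means P < 0.001

print(f"F = {f_stat:.2f} on ({df_reg}, {df_res}) df")
print("Reject H0 at 1%:", f_stat > f_crit_01, "| P < 0.001:", f_stat > f_crit_001)
```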
(b) (3 marks): Calculate the Pearson correlation coefficient for the relationship between saturated fat
consumption and cholesterol level in men. You do not need to do all the steps of a hypothesis test;
just state the correlation coefficient, the P-value (both the exact P-value from the computer output and
the P-value obtained from the r-table) and your conclusion.
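For part (b), r is the square root of R^2 = SSR/SST, carrying the sign of the slope. The scatterplot and Coefficients table are not reproduced here, so the positive sign is an assumption based on the positive relationship described in part (c). The critical r is derived from an assumed t-table value through r = t / sqrt(t^2 + df), which is one way to reproduce an r-table entry.

```python
import math

ss_reg, ss_tot = 5089.335, 5591.500
n = 8
df = n - 2

r_squared = ss_reg / ss_tot
r = math.sqrt(r_squared)             # assumption: slope (and hence r) is positive

# Two-sided 1% critical r from t_{0.005, 6} = 3.707 (assumed t-table value).
t_crit = 3.707
r_crit = t_crit / math.sqrt(t_crit ** 2 + df)

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}, critical r (1%, two-sided) = {r_crit:.3f}")
```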
(c) (4 marks): Since it is fairly common knowledge that high saturated fat consumption increases
cholesterol level, perform a regression t-test, at the 1% significance level, to test the hypothesis that
there is a positive relationship between saturated fat consumption and cholesterol level in men.
Perform ALL steps of the hypothesis test. Give both the exact P-value from the computer output and
also the P-value obtained from the t-table.
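For part (c), the slope and its standard error can be recovered from the information given: SSR = b1^2 * Sxx gives |b1|, and SE(b1) = sqrt(MSE / Sxx). Taking b1 > 0 is an assumption, since the Coefficients table is not reproduced. A useful check is that t^2 equals the ANOVA F from part (a). The one-sided critical value t_{0.01, 6} = 3.143 is an assumed t-table value.

```python
import math

ss_reg, ss_res, s_xx = 5089.335, 502.165, 1587.875
n = 8

mse = ss_res / (n - 2)
b1 = math.sqrt(ss_reg / s_xx)        # |b1| from SSR = b1^2 * Sxx; sign assumed positive
se_b1 = math.sqrt(mse / s_xx)
t_stat = b1 / se_b1                  # H0: beta1 = 0 versus Ha: beta1 > 0

t_crit = 3.143                       # one-sided t_{0.01, 6} (assumed table value)
print(f"b1 = {b1:.4f}, SE(b1) = {se_b1:.4f}, t = {t_stat:.3f}")
print("Reject H0 at 1%:", t_stat > t_crit)
```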
(d) (4 marks): Calculate a 95% confidence interval for the mean response of cholesterol level for men
whose average fat consumption is 60 g/day.
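For part (d), a sketch of the confidence interval for the mean response at x0 = 60 g/day, using SE(fit) = s * sqrt(1/n + (x0 - x_bar)^2 / Sxx). It again assumes b1 is the positive square root of SSR/Sxx (the Coefficients table is not shown) and uses t_{0.025, 6} = 2.447 as an assumed t-table value.

```python
import math

x_bar, y_bar, s_xx = 52.625, 189.250, 1587.875
ss_reg, ss_res, n = 5089.335, 502.165, 8

b1 = math.sqrt(ss_reg / s_xx)        # assumption: positive slope
b0 = y_bar - b1 * x_bar              # least-squares line passes through (x_bar, y_bar)
s = math.sqrt(ss_res / (n - 2))      # standard error of the estimate

x0 = 60.0
y_hat = b0 + b1 * x0
se_fit = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)

t_crit = 2.447                       # t_{0.025, 6} (assumed table value)
lower, upper = y_hat - t_crit * se_fit, y_hat + t_crit * se_fit
print(f"y_hat = {y_hat:.2f}, SE(fit) = {se_fit:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```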
Question 4 (Two parts totaling 8 marks): An experiment was conducted to test the ultimate strength (in
MPa) of random samples of three types of metals (steel, alloy and titanium) produced by two methods
(Method 1 and Method 2). The following is incomplete SPSS output. [Note: This is a balanced design
where n = 42 and there are 7 replicates for each combination of the two factors.]
(a) (6 marks): Perform a hypothesis test, at the 1% significance level, to determine whether the overall
model is significant.
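The SPSS ANOVA table for part (a) is not reproduced, but in a balanced design the overall-model F can be approximated from the cell means and standard deviations in the Descriptive Statistics table given with part (b): SS(model) = r * sum of (cell mean - grand mean)^2 and SSE = (r - 1) * sum of s^2, with r = 7 replicates per cell. The table values are rounded, so treat the result as approximate. The 1% critical value F_{0.01, 5, 36} lies between the F-table entries 3.51 (40 df) and 3.70 (30 df).

```python
# (mean, standard deviation) for each Method x Metal cell; 7 replicates per cell.
cells = {
    ("Method 1", "Alloy"): (820.00, 40.415),
    ("Method 1", "Steel"): (864.29, 36.904),
    ("Method 1", "Titanium"): (891.43, 36.710),
    ("Method 2", "Alloy"): (824.29, 41.975),
    ("Method 2", "Steel"): (903.57, 39.761),
    ("Method 2", "Titanium"): (854.29, 35.051),
}
reps = 7

cell_means = [mean for mean, _ in cells.values()]
grand_mean = sum(cell_means) / len(cell_means)    # balanced: mean of the cell means

ss_model = reps * sum((m - grand_mean) ** 2 for m in cell_means)
ss_error = (reps - 1) * sum(sd ** 2 for _, sd in cells.values())
df_model = len(cells) - 1                         # 6 cells - 1 = 5
df_error = len(cells) * (reps - 1)                # 6 * 6 = 36

f_stat = (ss_model / df_model) / (ss_error / df_error)
print(f"SS(model) = {ss_model:.1f}, SSE = {ss_error:.1f}, F({df_model}, {df_error}) = {f_stat:.2f}")
```

As a consistency check, SS(model) + SSE comes out very close to 41 x 47.925^2, the total sum of squares implied by the overall standard deviation.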
(b) (2 marks): The table below shows the results of multiple comparisons for the difference in strength
between the three types of metals (Steel, Alloy and Titanium) both separately for Methods 1 and 2
and overall for the two methods combined. Firstly, construct a means comparison diagram
summarizing the results of multiple comparisons for the difference between the three types of metals,
regardless of the method used (that is, based on the means for the totals). Secondly, write a
conclusion in words about what the multiple comparisons show.
Descriptive Statistics
Dependent Variable: Strength
Method     Metal      Mean     Std. Deviation   N
Method 1   Alloy      820.00   40.415           7
           Steel      864.29   36.904           7
           Titanium   891.43   36.710           7
           Total      858.57   47.041           21
Method 2   Alloy      824.29   41.975           7
           Steel      903.57   39.761           7
           Titanium   854.29   35.051           7
           Total      860.71   49.932           21
Total      Alloy      822.14   39.648           14
           Steel      883.93   42.116           14
           Titanium   872.86   39.502           14
           Total      859.64   47.925           42
Question 5 (Five parts totaling 15 marks): A marine ecologist wanted to examine the relationship
between water depth, light intensity, and diatom density (response variable). At 9 different depths in the
ocean, he recorded depth (in meters), light intensity (as a percentage of the surface intensity) and diatom
density (in cells per milliliter of ocean water). The first table below shows the raw data recorded. Below that
is incomplete SPSS output of the data analysis.
Model Summary
ANOVA
Model              Sum of Squares
1   Regression     7510.626
    Residual       9.374
    Total
Coefficients
(a) (5 marks): At the 1% significance level, perform a hypothesis test to determine whether the overall
multiple regression model is significant or useful for making predictions about diatom density.
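For part (a), the model has k = 3 predictors (depth, light intensity and their interaction) and n = 9 observations, so the regression and residual degrees of freedom are 3 and 5; the blank Total sum of squares is SSR + SSE. F_{0.01, 3, 5} = 12.06 is an assumed F-table value.

```python
ss_reg, ss_res = 7510.626, 9.374
n, k = 9, 3                          # predictors: depth, light, depth x light

ss_tot = ss_reg + ss_res             # the blank Total row
df_reg, df_res = k, n - k - 1
f_stat = (ss_reg / df_reg) / (ss_res / df_res)

f_crit = 12.06                       # F_{0.01, 3, 5} (assumed table value)
print(f"SST = {ss_tot:.3f}, F({df_reg}, {df_res}) = {f_stat:.1f}, reject H0: {f_stat > f_crit}")
```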
(b) (3 marks): Calculate a 95% confidence interval for the slope of the interaction term (representing
interaction between depth and light intensity). Using this confidence interval, what conclusion can you
make about the significance of the slope of the interaction term? Explain your answer.
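The interval in part (b) has the usual form below, where b_3 and SE(b_3) would be read from the Coefficients table (not reproduced here) and the residual degrees of freedom are n - k - 1 = 9 - 3 - 1 = 5. If the interval contains 0, the interaction slope is not significant at the 5% level; if it excludes 0, it is.

```latex
b_3 \pm t_{0.025,\,5}\,\mathrm{SE}(b_3), \qquad t_{0.025,\,5} = 2.571
```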
(c) (1 mark): Find the standard error of the model (the standard error of the estimate).
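For part (c), the standard error of the estimate is the square root of the residual mean square, using the residual sum of squares from the ANOVA table and n - k - 1 = 5 residual degrees of freedom.

```python
import math

ss_res = 9.374
n, k = 9, 3                      # 9 depths; predictors depth, light, depth x light
mse = ss_res / (n - k - 1)       # residual mean square on 5 df
s = math.sqrt(mse)               # standard error of the estimate
print(f"MSE = {mse:.4f}, s = {s:.4f}")
```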
(d) (2 marks): At a depth of 70 meters and a light intensity of 18%, suppose that the observed
diatom density was 19.6 cells per milliliter. What was the residual (error) of this observation?
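The residual in part (d) is the observed value minus the fitted value. In the sketch below, b_0 to b_3 stand for the estimated coefficients from the Coefficients table (not reproduced here), and the interaction value is 70 x 18 = 1260.

```latex
e = y - \hat{y} = 19.6 - \left( b_0 + b_1(70) + b_2(18) + b_3(1260) \right)
```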
(e) (4 marks): Based on the values of the predictor variables given in part (d) (depth = 70 m, light = 18%,
interaction term = 1260), what is the 95% prediction interval for a single new observation of
diatom density at those values of the predictor variables? [Note: SE(Fit) = 0.793]
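For part (e), the prediction standard error combines the model variance with the uncertainty in the fitted mean, SE(pred) = sqrt(s^2 + SE(Fit)^2). The fitted value depends on coefficients that are not reproduced here, so the hypothetical helper `prediction_interval` takes it as an argument; t_{0.025, 5} = 2.571 is an assumed t-table value.

```python
import math

ss_res, n, k = 9.374, 9, 3
mse = ss_res / (n - k - 1)           # s^2 with 5 residual df
se_fit = 0.793                       # SE(Fit) given in the question
se_pred = math.sqrt(mse + se_fit ** 2)

t_crit = 2.571                       # t_{0.025, 5} (assumed table value)
half_width = t_crit * se_pred


def prediction_interval(y_hat: float) -> tuple:
    """Hypothetical helper: 95% prediction interval around a fitted value y_hat."""
    return (y_hat - half_width, y_hat + half_width)


print(f"SE(pred) = {se_pred:.4f}, half-width = {half_width:.3f}")
```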
Question 6 (5 marks): A certain company wanted to analyze the relationship between total sales
(response variable) and the money they spend advertising through magazines, television, and radio. All
data were recorded in millions of dollars based on a random sample of 10 business transactions. At the
5% significance level, perform the most appropriate test, showing all steps, to determine whether
magazines have any effect on sales after accounting for the effect of TV and radio advertising.
Consider the following three models and the corresponding ANOVA tables below them:
ANOVA table for Model 1: Effect of Magazines
ANOVA
Model              Sum of Squares   df   Mean Square   F   Sig.
1   Regression     353.361
    Residual       958.275
    Total          1311.636
a. Dependent Variable: Sales
b. Predictors: (Constant), Magazines
ANOVA table for Model 2: Effect of Radio and TV
ANOVA
Model              Sum of Squares   df   Mean Square   F   Sig.
1   Regression     1117.732
    Residual       193.904
    Total          1311.636
a. Dependent Variable: Sales
b. Predictors: (Constant), Radio, TV
ANOVA table for Model 3: Effect of Magazines, Radio and TV (Full Model)
ANOVA
Model              Sum of Squares   df   Mean Square   F   Sig.
1   Regression     1194.529
    Residual       117.107
    Total          1311.636
a. Dependent Variable: Sales
b. Predictors: (Constant), Radio, TV, Magazines
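One appropriate choice here is an extra-sum-of-squares (partial) F-test comparing the model with Radio and TV as the reduced model against the model with Radio, TV and Magazines as the full model. With n = 10, the residual degrees of freedom are 10 - 2 - 1 = 7 and 10 - 3 - 1 = 6. F_{0.05, 1, 6} = 5.99 is an assumed F-table value.

```python
n = 10
sse_reduced, p_reduced = 193.904, 2   # reduced model: Radio, TV
sse_full, p_full = 117.107, 3         # full model: Radio, TV, Magazines

df_reduced = n - p_reduced - 1        # 7
df_full = n - p_full - 1              # 6
extra_df = df_reduced - df_full       # 1 extra parameter (Magazines)

f_stat = ((sse_reduced - sse_full) / extra_df) / (sse_full / df_full)
f_crit = 5.99                         # F_{0.05, 1, 6} (assumed table value)
print(f"Partial F({extra_df}, {df_full}) = {f_stat:.3f}; reject H0: {f_stat > f_crit}")
```

The magazines-only model is not needed for this comparison, since it does not adjust for TV and radio.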
Solution for Question 6:
(a) (2 marks): What is the effect of magazines on sales? How would you redefine the model? No
calculations are necessary.
(b) (1 mark): What would be the null hypotheses for testing for the effect of magazines?
Question 8 (2 marks): Suppose the relationship between the annual rate of hip fractures (per 100,000
people) and age follows the model $\hat{\mu}\{\ln(\text{fractures}) \mid \text{age}\} = -2.09 + 0.0912\,\text{age}$.
For an increase in age from 40 to 50 years, how would you interpret the change in the rate of hip
fractures on the original scale?
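Because the response is on the log scale, a 10-year increase in age adds 10 x 0.0912 = 0.912 to the estimated mean of ln(fractures), which back-transforms to a multiplicative change of exp(0.912) on the original scale. A short check of the arithmetic:

```python
import math

slope = 0.0912                       # coefficient of age on the ln(fractures) scale
age_change = 50 - 40

log_change = slope * age_change      # change in the estimated mean of ln(fractures)
multiplier = math.exp(log_change)    # multiplicative change on the original scale
pct_change = (multiplier - 1) * 100
print(f"factor = {multiplier:.3f}, change = {pct_change:.1f}%")
```

The factor of about 2.49 (roughly a 149% increase) is commonly stated in terms of the median rate of hip fractures when the response has been log-transformed.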
Question 9 (6 marks): All parametric hypothesis tests have assumptions about normality. However, the
specific requirements regarding normality differ from one test to the other. For each of the hypothesis
tests mentioned below, explain what the specific requirement is regarding normality. In answering this
question, do not refer to the Central Limit Theorem; assume that the sample sizes are not large
enough to apply that theorem.
Question 10 (2 marks): There are two types of inferential statistics that can be applied to a research
problem; one type is a hypothesis test and the other is a confidence interval. What advantage does a
hypothesis test have over calculating a confidence interval? Explain your answer.