STAT 430 Final Exam
Instructor: David Shaw
Date: December 17, 2012

Instructions: Show all work, clearly and in order, if you want to get full credit. Justify your answers algebraically whenever possible to ensure full credit. Circle or otherwise indicate your final answers. Please keep your written answers brief; be clear and to the point. I will take points off for rambling and for incorrect or irrelevant statements.

Note: The four problems are given on the first page (front and back) and each refers to its respective output on the subsequent pages. The exam is out of 150 points. Good luck!

1. (35 points) After much investigation, it was posited that the explosion of the USA Space Shuttle Challenger on 28 January, 1986 could be traced to the failure of "O-rings", components of the field joints on the booster rockets. Each rocket has a total of 6 O-rings, and data from past test launches were collected into 23 observations. Many engineers believe that the probability of failure of the O-rings is related to the temperature at launch, and along with the number of compromised O-rings (numTherm), the temperature (temp, recorded in Celsius) during the test launch was also recorded. Linear regression was performed using numTherm as a response variable and temp as a predictor. The goal was to predict the number of O-rings that will fail when the temperature is below freezing (i.e., below 0).

   a. (5 pts) What is the estimated variance of the residuals? (i.e., what is σ̂²?)
   b. (5 pts) Compute the studentized residuals for observations 6 and 22.
   c. (10 pts) What point(s) would you consider outlier(s)? Influential point(s)? Given the problem of estimating the number of compromised O-rings at temperatures below freezing, would you consider removing the outlier(s)?
   d. (10 pts) Consider the quadratic model. Plot the estimated number of compromised O-rings (numTherm) vs. temperature (temp) between temp=0 and temp=30 (hint: plot points for temp at 0, 16, and 30, and connect them). Do the same for the linear model.
   e. (5 pts) What conclusions can you draw about the failure of the O-rings when the temperature is below freezing? Provide interpretations for both models.

2. (45 points) Measurements were taken on 699 benign and malignant breast cancer cell growths with multiple measurements (size, shape, etc.) for each tumor. The hope is that utilizing these measurements will allow doctors to diagnose tumors as either benign or malignant without having to perform more invasive procedures. The variable diagnosis, taking a value 'B' for a benign growth and a value 'M' for a malignant growth, was used as the response. (Source: University of Wisconsin Hospitals, Madison; Dr. William H. Wolberg)

   a. (10 pts) The variables shape and nuclei take integer values from 1 to 10. Interpret this model in terms of the probability of the tumor being benign (i.e., are you more likely to have a benign tumor if shape has a large value or a small value? What about nuclei?)
   b. (10 pts) The variable thick also takes values from 1 to 10. Interpret this model in terms of the probability of a tumor being benign. How does it compare to the first model?
   c. (5 pts) What is the minimum predicted probability from the first model? What is the maximum? (Give three significant figures.)
   d. (5 pts) Compute the AIC and the SC for the first model, and the -2 Log L for the second.
   e. (10 pts) Sensitivity and specificity values were provided for various thresholds on the predicted probability for 'B', both for the first and the second model. Sketch the ROC curves for both models on the same plot.
   f. (5 pts) Which model do you think is better? Why?

3. (35 points) The age of an abalone (sea snail) is usually determined by cutting its shell and counting its rings. This is typically a time-consuming task, so a dataset with other factors was obtained to determine if some attributes could accurately predict the age of an abalone without having to go through this process. The dataset contains various attributes (weight, sex, length, diameter, etc.) and the response considered is the number of rings (an integer).

   a. (10 pts) Compute the sum of squares for the model and the R-square for this model.
   b. (10 pts) The variables have the following meanings: weightAll - whole weight, weightIn - sum of weight of meat and viscera (portions of inside shell), weightOut - weight of shell. Interpret the model in terms of these weight measurements. In other words, how does the estimated age change when 'inside' weight changes? Shell weight? Total weight?
   c. (5 pts) Which coefficients are significantly different from zero? (Note: the 99.5th percentile of a standard normal distribution is 2.58.)
   d. (5 pts) A second model was fit with all second order interactions of these weight variables. Compute the R-square for this model and determine the significance of the coefficients (you do not have to compute exact p-values).
   e. (5 pts) Which model do you prefer? Why?

4. (35 points) A balanced experiment was performed in which 5 factors related to a wave-soldering procedure were varied. The number of times the solder skips (the variable skips) was recorded as the response variable. The square root of this variable (coded as sqskips) was used in the analysis.

   a. (5 pts) The variable Mask has four levels: A1.5, A3, B3, B6. Are the means for sqskips significantly different from one another between these groups?
   b. (10 pts) Compute the degrees of freedom for the model and the mean of squares for the model and error.
   c. (10 pts) Write the hypothesis tests that each contrast statement is testing. Interpret the results/significance of these tests.
   d. (5 pts) The plot given shows the mean of sqskips vs. Mask with different plotting characters for Opening (levels: L, M, S). Describe the interaction between Mask and Opening using the plot and the output given.
   e. (5 pts) What is the coefficient estimate for Mask in the first model (the one without the interaction term)? Why?

********** Selected Code/Output for Problem 1 **********

Parts a-c

PROC REG DATA=oring;
   MODEL numTherm = temp;
   OUTPUT OUT=outreg P=pred R=resid STUDENT=studResid COOKD=cook H=leverage;
RUN;

                        Analysis of Variance
                             Sum of        Mean
Source              DF      Squares      Square   F Value   Pr > F
Model                1      3.61750     3.61750     23.36   <.0001
Error               21      3.25208     0.15486
Corrected Total     22      6.86957

                        Parameter Estimates
                  Parameter     Standard
Variable     DF    Estimate        Error   t Value   Pr > |t|
Intercept     1     2.46286      0.45408      5.42     <.0001
temp          1    -0.09194      0.01902     -4.83     <.0001

       num                                     stud
Obs  Therm     temp      pred      resid      Resid      cook  leverage
  1      0   21.250   0.50921   -0.50921   -1.33115   0.05164   0.05508
  2      0   21.875   0.45175   -0.45175   -1.17746   0.03609   0.04948
  3      0   21.875   0.45175   -0.45175   -1.17746   0.03609   0.04948
  4      0   21.875   0.45175   -0.45175   -1.17746   0.03609   0.04948
  5      0   22.500   0.39429   -0.39429   -1.02566   0.02520   0.04571
  6      0   23.125   0.33683   -0.33683   ________   0.01753   0.04377
  7      0   23.750   0.27937   -0.27937   -0.72593   0.01203   0.04365
  8      0   23.750   0.27937   -0.27937   -0.72593   0.01203   0.04365
  9      0   25.000   0.16444   -0.16444   -0.42848   0.00472   0.04889
 10      0   25.625   0.10698   -0.10698   -0.27955   0.00224   0.05425
 11      1   15.625   1.02635   -0.02635   -0.07429   0.00064   0.18758
 12      0   26.875  -0.00794    0.00794    0.02092   0.00002   0.07044
 13      0   26.875  -0.00794    0.00794    0.02092   0.00002   0.07044
 14      1   16.250   0.96889    0.03111    0.08655   0.00074   0.16556
 15      0   27.500  -0.06540    0.06540    0.17338   0.00133   0.08127
 16      0   27.500  -0.06540    0.06540    0.17338   0.00133   0.08127
 17      0   28.750  -0.18032    0.18032    0.48527   0.01432   0.10841
 18      0   29.375  -0.23778    0.23778    0.64585   0.02972   0.12472
 19      1   19.375   0.68159    0.31841    0.84488   0.03223   0.08282
 20      0   30.625  -0.35270    0.35270    0.97954   0.09330   0.16282
 21      1   23.750   0.27937    0.72063   ________   0.08002   0.04365
 22      1   23.750   0.27937    0.72063    1.87287   0.08002   0.04365
 23      2   13.125   1.25619    0.74381    2.24940   1.05317   0.29393

Parts d-e

                        Analysis of Variance
                             Sum of        Mean
Source              DF      Squares      Square   F Value   Pr > F
Model                2      4.58565     2.29283     20.08   <.0001
Error               20      2.28391     0.11420
Corrected Total     22      6.86957

                        Parameter Estimates
                  Parameter     Standard
Variable     DF    Estimate        Error   t Value   Pr > |t|
Intercept     1     6.55300  [illegible]      4.50     0.0002
temp          1    -0.47700      0.13328     -3.58     0.0019
temp2         1     0.00869      0.00298      2.91     0.0086

********** Selected Code/Output for Problem 2 **********

Parts a, d

PROC LOGISTIC DATA=cancer;
   MODEL diagnosis(event='B') = shape nuclei;
RUN;

            Model Fit Statistics
                              Intercept
               Intercept         and
Criterion        Only         Covariates
AIC             886.350       ________
SC              890.877       ________
-2 Log L        884.350        171.985

         Analysis of Maximum Likelihood Estimates
                             Standard        Wald
Parameter    DF   Estimate      Error   Chi-Square   Pr > ChiSq
Intercept     1     6.0826     0.4991     148.5448       <.0001
shape         1    -1.0530     0.1284      67.2453       <.0001
nuclei        1    -0.727      0.0785      57.5681       <.0001

Parts b, d

PROC LOGISTIC DATA=cancer;
   MODEL diagnosis(event='B') = thick shape nuclei;
RUN;

            Model Fit Statistics
                              Intercept
               Intercept         and
Criterion        Only         Covariates
AIC             886.350        148.246
SC              890.877        166.362
-2 Log L        884.350       ________

         Analysis of Maximum Likelihood Estimates
                             Standard        Wald
Parameter    DF   Estimate      Error   Chi-Square   Pr > ChiSq
Intercept     1     8.0710     0.7856     105.541        <.0001
thick         1    -0.5871     0.1209       8.8600       <.0001
shape         1    -0.7601     0.1309      33.7039       <.0001
nuclei        1    -0.8367     0.0881      39.7273       <.0001

Parts e-f

Model 1:
              | P > [illegible] | P > [illegible] | P > [illegible]
Sensitivity   |        4        |       96        |       98
Specificity   |       90        |       86        |       95

Model 2:
              | P > [illegible] | P > [illegible] | P > .75
Sensitivity   |       95        |       35        | [illegible]
Specificity   |       95        |       89        |       97

********** Selected Output for Problem 3 **********

Parts a-c

                             Sum of          Mean
Source              DF      Squares        Square   F Value   Pr > F
Model                3     ________    7041.10036   1318.35   <.0001
Error             4173        22287       5.34084
Corrected Total   4176        43411

Root MSE           2.31103    R-Square   ________
Dependent Mean     9.93368    Adj R-Sq
Coeff Var         23.26454

                  Parameter     Standard
Variable     DF    Estimate        Error   t Value   Pr > |t|
Intercept     1     6.74866      0.07176     93.99
weightAll     1    10.39488      0.75920     13.69
weightIn      1   -16.34882      0.79034    -20.69
weightOut     1    14.24360  [illegible]     12.48

Parts d-e

                             Sum of          Mean
Source              DF      Squares        Square   F Value   Pr > F
Model                6     ________    3872.47610    800.38   <.0001
Error             4170        20176       4.83832
Corrected Total   4176        43411

Root MSE           2.19962    R-Square   ________
Dependent Mean     9.93368    Adj R-Sq
Coeff Var         22.14301

                     Parameter     Standard
Variable        DF    Estimate        Error   t Value   Pr > |t|
Intercept        1     5.32883      0.1010      52.27
weightAll        1    10.23820      1.48231      6.90
weightIn         1   -20.01826      1.6420     -12.15
weightOut        1    37.0850       2.43821     15.21
weightInOut      1   -13.63372      5.76514     -2.36
weightAllOut     1   -10.93650      2.49883     -4.38
weightInAll      1     5.41067      0.75640      7.15

********** Selected Code/Output for Problem 4 **********

Parts a-c

PROC GLM DATA=solder;
   CLASS Mask;
   MODEL sqskips = Mask;
   CONTRAST 'contrast1' Mask 1 1 -1 -1;
   CONTRAST 'contrast2' Mask 3 -1 -1 -1;
   CONTRAST 'contrast3' Mask -1 -1 0 2;
   CONTRAST 'contrast4' Mask 0 0 1 -1;
RUN;

                              Sum of
Source              DF       Squares   Mean Square   F Value   Pr > F
Model             ____    359.629526   ___________     59.26   <.0001
Error             ____   1448.323548   ___________
Corrected Total    719   1807.953074

Source              DF     Type I SS   Mean Square   F Value   Pr > F
Mask              ____   359.6295259   ___________     59.26   <.0001

Source              DF   Type III SS   Mean Square   F Value   Pr > F
Mask              ____   359.6295259   ___________     59.26   <.0001

Contrast            DF   Contrast SS   Mean Square   F Value   Pr > F
contrast1            1   281.6444688   281.6444688    139.24   <.0001
contrast2            1   135.4779362   135.4779362     66.98   <.0001
contrast3            1   346.3750302   346.3750302    171.24   <.0001
contrast4            1    72.2798271    72.2798271     35.73   <.0001

Parts d-e

Model 1:
Source              DF      Anova SS   Mean Square   F Value   Pr > F
Mask                 3   ___________   ___________   _______   ______
Opening              2   593.9657283   296.9828641    248.19   <.0001

Model 2:
Source              DF      Anova SS   Mean Square   F Value   Pr > F
Mask                 3   359.6295259   119.8765086    110.84   <.0001
Opening              2   593.9657283   296.9828641    274.60   <.0001
Mask*Opening         6    88.6599036    14.7766506     13.66   <.0001

[Figure: Interaction Plot (Problem 4d), mean of sqrt(skips) vs. Mask, with a separate plotting character for each level of Opening]
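
The columns of the Problem 1 listing can be cross-checked against one another. The following is a minimal sketch in Python (not part of the exam, which uses SAS), assuming that STUDENT= is the residual divided by its estimated standard error and that COOKD= is Cook's distance with p = 2 estimated coefficients (intercept and temp). The function names are hypothetical. The inputs are the resid and leverage values printed for observation 23 and the Error mean square 0.15486 from the linear model.

```python
import math


def studentized_residual(resid, leverage, mse):
    """Internally studentized residual: e_i / sqrt(MSE * (1 - h_i))."""
    return resid / math.sqrt(mse * (1.0 - leverage))


def cooks_distance(resid, leverage, mse, n_params):
    """Cook's distance: r_i^2 * h_i / (p * (1 - h_i)), r_i = studentized residual."""
    r = studentized_residual(resid, leverage, mse)
    return r * r * leverage / (n_params * (1.0 - leverage))


# Values read from the Problem 1 output (observation 23, linear model).
MSE = 0.15486
RESID_23 = 0.74381
LEVERAGE_23 = 0.29393

if __name__ == "__main__":
    print(round(studentized_residual(RESID_23, LEVERAGE_23, MSE), 4))
    print(round(cooks_distance(RESID_23, LEVERAGE_23, MSE, n_params=2), 4))
```

Under these assumptions the first value should agree with the studResid printed for observation 23 to about four decimal places, and the same two functions can be applied to any other row of the listing.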
