UNIVERSITY OF CAPE COAST
DEPARTMENT OF MATHEMATICS & STATISTICS
END-OF-SECOND SEMESTER EXAMINATIONS (2013/2014)
STA 408: LINEAR MODELS
Time Allowed: 2 Hours

This paper consists of two sections, A and B. Answer all the questions in Section A and any two from Section B. Unless otherwise stated, take the level of significance, $\alpha = 0.05$.

SECTION A [30 marks]
Answer all the questions in this section. Each of the items 1 to 30 is followed by options lettered A to D. Select the letter that corresponds to the correct or best option.

1. Which of the following are components of a mathematical model?
I: Random variables
II: Mathematical variables
III: Parameters
A. I and II only B. I and III only C. II and III only D. I, II and III

A model which is based on n = 30 observations is given as: $y = 3 - 5x_1 + 2x_2 + 7x_3 - x_4 + 3x_5$, with observation 27 having leverage value of 0.28. Use this information to answer Questions 2 and 3.

2. The average leverage value is
A. [illegible] B. 0.23 C. 0.13 D. 0.20

3. Which one of the following statements is true?
A. Observation 27 is an outlier with respect to the x-values.
B. Observation 27 is not an outlier with respect to the x-values.
C. Observation 27 is an outlier with respect to the y-values.
D. Observation 27 is not an outlier with respect to the y-values.

The coefficients of determination, $R^2$, of $x_1 = \beta_0 + \beta_2 x_2 + \beta_3 x_3$ and $x_2 = \beta_0 + \beta_1 x_1 + \beta_3 x_3$ are respectively $R_1^2 = 0.954$ and $R_2^2 = 0.912$. Use this information to answer Questions 4 to 6.

4. Find the Variance Inflation Factors (VIFs) for $x_1$ and $x_2$.
A. $VIF_1 = 0.954$, $VIF_2 = 0.046$
B. $VIF_1 = 0.088$, $VIF_2 = 0.912$
C. $VIF_1 = 21.739$, $VIF_2 = 11.364$
D. $VIF_1 = 23.107$, $VIF_2 = 3.241$

5. The VIFs for the two variables suggest that the presence of the two variables in a model would:
A. cause singularity problems.
B. cause homoscedastic problems.
C. cause multicollinearity problems.
D. strengthen predictive powers of the model.

6. Given that the VIF for $x_3$ is 2.100, find the mean VIF.
A. [illegible] B. [illegible] C. 9.483 D. [illegible]

A MINITAB output of regressing y on three predictor variables, $x_1$, $x_2$ and $x_3$, is shown below. Use this information to answer Questions 7 to 10.

Analysis of Variance
Source   DF   SS            MS           F      P
Model     3   20156028928   6718676480   67.59  0.000
Error     9     894568512     99396504
Total    12   21050587376

Regression analysis
Predictor   Coef     Stdev   t-ratio   p
Constant    30626    19808    1.55     0.156
x1          3.893    2.082    1.87     0.094
x2         -29607    23822   -1.24     0.245
x3          86.52    18.69    4.63     0.000

7. Write down the model relating y to $x_1$, $x_2$ and $x_3$.
A. $y = 30626 + 3.893x_1 - 29607x_2 + 86.52x_3$
B. $y = 3x_1 + 9x_2 + 12x_3$
C. $y = 3.893x_1 - 29607x_2 + 86.52x_3$
D. $y = 30626 + 2.081x_1 - 23822x_2 + 18.69x_3$

8. Find the coefficient of determination for this model.
A. 0.05% B. 95.75% C. 90.50% D. 4.25%

9. What can you say about the significance of the model?
A. The model is not significant at 1%.
B. The model is significant at 1%.
C. The model is not significant at 67.59%.
D. The model is not significant at 5%.

10. Which of the following statement(s) is/are true about the predictor variables?
I. Variable $x_1$ is significant at 5%.
II. Variable $x_2$ is not significant at 5%.
III. Variable $x_3$ is highly significant, even at 1%.
A. I only B. II only C. I and II only D. II and III only

Questions 11 to 14 are based on the following residual plots.
[Figure: four residual plots, (a) to (d); the legible axis labels are Residuals against Predictors and Residuals against Time.]

11. Which plot(s) suggest(s) that the independence assumption is violated?
A. Plots (a) and (b) B. Plot (b) C. Plot (c) D. Plot (a)

12. Which plot(s) suggest(s) that the homoscedasticity assumption is violated?
A. Plots (a) and (b) B. Plot (b) C. Plot (c) D. Plot (d)

13. Which plot suggests that the normality assumption is violated?
A. Plot (a) B. Plot (b) C.
Plot (c) D. None

14. Which plot suggests that the functional form of the model is incorrect?
A. Plot (b) B. Plot (c) C. Plot (d) D. None

15. Which of the following are the reasons for transforming response variables?
I. To stabilize the variance of the response variable when homoscedasticity assumption is violated.
II. To normalize the response variable when normality assumption is violated.
III. To linearise the regression model when original data suggest a model that is nonlinear.
A. I and II only. B. I and III only. C. II and III only. D. All of the above.

Suppose that
$Y_1 = \theta + \varepsilon_1$
$Y_2 = 2\theta - \phi + \varepsilon_2$
$Y_3 = \theta + 2\phi + \varepsilon_3$
where the $\varepsilon_i$ (i = 1, 2, 3) are independent with $E(\varepsilon_i) = 0$ and $V(\varepsilon_i) = \sigma^2$. Use this information to answer Questions 16 and 17.

16. This system of equations can be written in matrix form as
[Options A to D, each a matrix equation, are illegible in the scan.]

17. Given that [illegible], find the least squares estimate of $\theta$.
[Options A to D are illegible in the scan.]

A clinical psychologist finds that the relationship between the number of weeks spent in a therapy hospital (X = Hospital) and number of seizures per week (Y = Seizures) is described by the equation: $Y = 14.09 - 0.91X$. This is based on a sample size of 50 patients and is associated with $r = -0.93$. Use this information to answer Questions 18 to 20.

18. The proportion of variance in Seizures accounted for by Hospital (i.e., the coefficient of determination) is
A. 0.93 B. -0.93 C. 0.86 D. -0.86

19. How many Seizures per week would you predict for a person who has been at the therapy Hospital for ten weeks?
A. 4.49 B. 4.99 C. 14.09 D. 0.91

20. If you know that the Seizures variable had a standard deviation of 1.7, what would be the standard error of estimate?
A. 7.07 B. -0.93 C. 0.86 D. 0.64

Suppose the system
$y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_k x_{1k} + \varepsilon_1$
$y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_k x_{2k} + \varepsilon_2$
...
$y_n = \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \dots + \beta_k x_{nk} + \varepsilon_n$
is written in matrix notation as $Y = X\beta + \varepsilon$, where $E(\varepsilon) = 0$. Use this information to answer Questions 21 to 28.

21. The matrix X is called
A. Design matrix B. Coefficient matrix C. Correlation matrix D. Singular matrix

22. What is the vector of fitted values, $\hat{Y}$, in terms of X and Y?
A. $(X'X)^{-1}X'Y$ B. $X(X'X)^{-1}X'Y$ C. $X(X'X)^{-1}Y'X$ D. $X(X'X)^{-1}X'Y'$

23. The dimension of the matrix X is
A. $n \times (k+1)$ B. $(k+1) \times 1$ C. $(k+1) \times n$ D. $1 \times (k+1)$

24. Which two of the vectors Y, X, $\beta$, $\varepsilon$ have the same dimensions?
A. $\beta$ and $\varepsilon$ B. Y and X C. Y and $\varepsilon$ D. $\varepsilon$ and X

25. The variance-covariance matrix of $\hat{\beta}$ is given by
[Options A to D are illegible in the scan.]

26. The least squares estimate of $\beta$ is given by
[Options A to D are illegible in the scan.]

27. The hat matrix is given by
A. $(X'X)^{-1}X'Y$ B. $X(X'X)^{-1}X'Y$ C. $X(X'X)^{-1}X'$ D. [illegible]

28. The hat matrix, denoted by H, is idempotent because
A. $HH' = I$ B. $H^2 = H$ C. $H = |H|$ D. $H = H'$

Consider the model [equation largely illegible in the scan; the legible terms include the cross-product term $\beta_{12}x_{i1}x_{i2}$] and use it to answer Questions 29 and 30.

29. The most appropriate description of the model is
A. third-order model with one predictor variable.
B. second-order model with one predictor variable.
C. second-order model with two predictor variables.
D. polynomial model.

30. The cross-product term, $\beta_{12}x_{i1}x_{i2}$, is considered to be
A. a second-order term. B. a first-order term. C. a constant value. D. a double-order term.

SECTION B [30 Marks]
Answer two questions only from this section.

1. Three different regression models involving systolic blood pressure as the response variable, Y, and three explanatory variables, namely: age ($X_1$), smoking habit ($X_2$, defined as $X_2 = 1$ if a smoker and $X_2 = 0$ if a non-smoker), and body
index ($X_3$), yielded least squares estimates, $\hat{\beta}_i$ (i = 0, 1, 2, 3), and composite partial ANOVA tables as shown below.

Model     $\hat{\beta}_0$   $\hat{\beta}_1$   $\hat{\beta}_2$   $\hat{\beta}_3$
Model 1   59.092    1.605     -         -
Model 2   48.058    1.709     10.294    -
Model 3   45.103    [remaining entries illegible]

Model     Source       df   SS
Model 1   Regression    1   3,861.630
          Residual     30   2,564.338
Model 2   Regression    2   4,689.684
          Residual     29   1,736.285
Model 3   Regression    3   4,889.826
          Residual     28   1,536.143

(a) What is the sample size in this study?
(b) Use Model 3 to predict the systolic blood pressure for a fifty-year old smoker with body index 3.0.
(c) Using the ANOVA tables, compute the three coefficients of determination.
(d) Conduct an overall test for the significance of Model 1.
(e) Conduct partial F tests for (i) [illegible] and (ii) [illegible].

2. (a) Suppose that the random variables $Y_1, \dots, Y_5$ are such that [condition illegible; it involves $\sigma^2$] and that they are uncorrelated. Show that if $Z_i$ is defined as
$Z_1 = Y_1 - Y_2$
$Z_2 = Y_1 + Y_2 - 2Y_3$
$Z_3 = Y_1 + Y_2 + Y_3 - 3Y_4$
$Z_4 = Y_1 + Y_2 + Y_3 + Y_4 - 4Y_5$
$Z_5 = Y_1 + Y_2 + Y_3 + Y_4 + Y_5$
then [remainder of the statement illegible].
(b) It is reasonable to assume that
[three equations giving $E(Y_1)$, $E(Y_2)$ and $E(Y_3)$ as linear functions of $\theta$ and $\phi$; coefficients illegible in the scan]
and that the errors in observations on $Y_1$, $Y_2$, $Y_3$ are independent, each with variance $\sigma^2$. Use matrix methods to
(i) obtain the least squares estimates of $\theta$ and $\phi$; and hence
(ii) show that the covariance between the estimates is [illegible].
(iii) What are the variances of $\hat{\theta}$ and $\hat{\phi}$?

[The scan continues partway through Section A of another STA 408 paper, whose title page and year are not included.]

18. In order to test for the significance of a full regression model involving 14 independent variables and 50 observations, the numerator and denominator degrees of freedom (respectively) for the critical value of F are
A. 13 and 48. B. 13 and 48. C. 14 and 48. D. 14 and 35.

19. Which of the following is the best way to determine whether or not there is a statistically significant relationship between two quantitative variables?
A. Conduct a test of the null hypothesis that the population slope is 0.
B. Conduct a test of the null hypothesis that the population intercept is 0.
C. Calculate a regression line from a sample and see if the slope is 0.
D. Calculate the correlation coefficient and see if it is greater than 0.5 or less than -0.5.

20. If the t ratio for the slope of a simple linear regression equation is -2.48 and the critical values of the t distribution at the 1% and 5% levels, respectively, are 3.499 and 2.365, then the slope is
A. not significantly different from zero.
B. significantly different from zero at both 1% and 5% levels.
C. significantly different from zero at 1% level but not at the 5% level.
D. significantly different from zero at 5% level but not at the 1% level.

21. The standard error of regression measures the
A. variability of the independent variable(s) relative to its (their) mean.
B. variability of the dependent variable relative to its mean.
C. variability of the dependent variable relative to the regression line.
D. average error that will result if the regression line is used to predict.

22. Which of the following are components of a mathematical model?
I. Parameters
II. Random variables
III. Mathematical variables
A. I, II and III B. I and II only C. I and III only D. II and III only

23. If you toss a fair coin once, the odds of getting a "head" is
A. 0 B. 0.5 C. 1 D. None of the above

24. [The opening of this question is cut off in the scan; it concerns a regression of weekly total revenue] (in GHS1,000s) and the average percentage discount from list allowed to customers by salespersons. A 95% confidence interval on the slope calculated from the regression output ranges from 1.05 to 2.38. Based on this result, the researcher
A. can conclude that the slope is significantly different from zero at the 5% level of significance.
B. can be 95% confident that the effect of 1% increase in the average price discount will increase weekly total revenue by between GHS1,050 and GHS2,380.
C. has one chance in twenty of incorrectly concluding that the slope is within the estimated confidence interval.
D. All of the above are correct.

25.
Autocorrelation refers to a situation in which
A. successive error terms derived from the application of regression analysis to time series data are correlated.
B. there is a high degree of correlation between two or more of the independent variables included in a multiple regression model.
C. the dependent variable is highly correlated with the independent variable(s) in a regression analysis.
D. the application of a multiple regression model yields estimates that are nonlinear in form.

26. Which of the following statements is true?
A. The error term in linear regression is normally distributed, but not normally distributed in logistic regression.
B. The error term in logistic regression is normally distributed, but not normally distributed in linear regression.
C. The error terms in both linear regression and logistic regression are normally distributed.
D. The error term in both linear regression and logistic regression is not normally distributed.

27. The model $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ is based on n = 17 observations. Let the respective leverage values for observations 1, 2 and 3 be 0.6818, 0.8632 and 0.38550. Which of the following statement(s) is/are true?
I. Observation 1 is an outlier.
II. Observation 2 is not an outlier.
III. Observation 3 is an outlier.
A. I, II, III B. I and II only. C. II and III only. D. I only

28. Which of the following transforms of the original dependent variable y are plausible?
I. Logarithmic transformation.
II. Square root transformation.
III. Reciprocal transformation.
IV. Square transformation.
A. I, II, and IV. B. II, III, and IV only. C. I and IV only. D. I, II, and III only.

A model based on n = 30 observations is given as: $y = 3 - 5x_1 + 2x_2 + 7x_3 - x_4$, with observation 3 having leverage value of 0.2800. Use this information to answer Questions 29 and 30.

29. Calculate the average leverage value.
A. 0.1667 B. 0.1852 C. 0.2000 D. 0.2222

30. Which of the following statements best describes observation 3?
A. Observation 3 is an outlier with respect to the x-values.
B. Observation 3 is not an outlier with respect to the x-values.
C. Observation 3 is an outlier with respect to the y-values.
D. Observation 3 is not an outlier with respect to the y-values.

SECTION B
Answer any two questions from this section.

1. A torsion spring of a mousetrap is twisted through an angle of 180°. The torque versus angle data are given in the following table.

Torsion, T (N-m)      0.110   0.189   0.230   0.250
Angle, $\theta$ (rad)   0.10    0.50    1.10    1.50

(a) Suppose that the data is regressed with the least squares regression line $T = \beta_1\theta$. Estimate the value of $\beta_1$. [6 marks]
(b) Suppose that the data is regressed with the least squares regression line $T = \beta_0 + \beta_1\theta$. Estimate and interpret the least squares regression line. [8 marks]
(c) Estimate the standard error of estimate for the least squares equation in part (b). [4 marks]
(d) Test the significance of the slope in the least squares regression line in part (b). [7 marks]
(e) Calculate and interpret a 95% confidence interval on the mean value of T if $\theta = 1.0$. [5 marks]

2. In a study by Yoshida (1961), the oxygen consumption (Y) of wireworm larvae groups was measured at five temperatures ($X_2$). The response variable giving the rate of oxygen consumption per larvae group in millilitres per hour was transformed to 0.5 less than the common logarithm. Another predictor variable of importance was larvae group weight ($X_1$), which was also transformed to common logarithms. The relevant computer results based on the data generated are given in the following tables.

Model 1: Y on $X_1$
Source       d.f.   SS
Regression    1     0.0662
Residual     45     3.3398

Model 2: Y on $X_2$
Source       d.f.   SS
Regression    1     2.7840
Residual     45     0.6323

Model 3: Y on $X_1$ and $X_2$
Source       d.f.   SS
Regression    2     3.2104
Residual     44     0.1956

(a) The fitted regression model containing both $X_1$ and $X_2$ is given by $\hat{Y} = 0.6835 + 0.5917X_1 + 0.0393X_2$. Using this model, how much of a change in oxygen consumption would be predicted for a larvae group with fixed weight $X_1$ if the temperature was increased from $X_2 = 20$ to $X_2 = 25$? [6 marks]
(b) For a temperature of 20°C, calculate and compare the predicted values of Y for weights of 0.250 and 0.500. [6 marks]
(c) Is temperature a better predictor of oxygen consumption than weight is, or vice versa? Explain. [10 marks]
(d) Which model involving either or both $X_1$ and $X_2$ is to be preferred? [5 marks]
(e) What is the change in the coefficient of determination in going from a model involving [illegible] to a model involving both $X_1$ and $X_2$? [3 marks]

3. (a) Write down the simple linear regression model $Y_i = \beta_0 + \beta_1(x_i - \bar{x}) + \varepsilon_i$, $i = 1, 2, \dots, n$, as a general linear model $Y = X\beta + \varepsilon$, where the $\varepsilon_i$ are independent and have a $N(0, \sigma^2)$ distribution. Give the dimensions of each of X, $\beta$, and $\varepsilon$. [8 marks]
Hence, quoting (without proof) any necessary results, find
(b) the least squares estimates $\hat{\beta}_0$ and $\hat{\beta}_1$; [10 marks]
(c) $E(\hat{\beta}_0)$ and $E(\hat{\beta}_1)$; [3 marks]
(d) $Var(\hat{\beta}_0)$; [3 marks]
(e) $Var(\hat{\beta}_1)$; [3 marks]
(f) $Cov(\hat{\beta}_0, \hat{\beta}_1)$. [3 marks]

4. (a) Write down the model for
(i) multiple linear regression involving k predictor variables; and
(ii) binary logistic regression involving k predictor variables,
and clearly define the terms involved in each case. [6 marks]
(b) Consider the following research question and explain why a binary logistic regression model is more appropriate than a multiple linear regression model for answering it.
"How does the smoking habit (i.e. smoker or non-smoker) of an individual relate to his/her gender?" [3 marks]
(c) Three hundred and twenty-three graduate students in Linear Models were involved in a health survey. Their current smoking status was assessed, which we will predict with gender.
Information was collected on age, exercise, and history of smoking. Our interest is on the association between gender and current smoking status. The following is part of the regression output.

            Coeff.   S.E.    z-value   p-value   (95% conf. interval)
Intercept   -3.059   0.324   -9.450    0.000     (-3.693, -2.425)
Gender       0.968   0.455    2.130    0.033     (0.0766, 1.8593)

(i) From the output, write down the logistic regression model in terms of log odds. [2 marks]
(ii) Defining the primary predictor, gender, as gender = 1 for men and gender = 0 for women, find the odds and probability of smoking for
($\alpha$) women.
($\beta$) men. [14 marks]
(iii) Calculate and interpret the odds ratio. [5 marks]

UNIVERSITY OF CAPE COAST
DEPARTMENT OF MATHEMATICS AND STATISTICS
END OF SECOND SEMESTER EXAMINATIONS (2008/2009)
STA 408: LINEAR MODELS
TIME ALLOWED: TWO HOURS

Answer all the questions in Section A and any two questions from Section B.
Unless otherwise stated, take significance level to be 5% if required.

SECTION A
Answer all the questions in this section.

1. Which of the following are components of a mathematical model?
I: Random variables
II: Mathematical variables
III: Parameters
A. I and II only B. I and III only C. II and III only D. I, II and III

Suppose the system
$y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_k x_{1k} + \varepsilon_1$
$y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_k x_{2k} + \varepsilon_2$
...
$y_n = \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \dots + \beta_k x_{nk} + \varepsilon_n$
is written in the usual matrix notation as $Y = X\beta + \varepsilon$, where $E(\varepsilon) = 0$. Use this information to answer Questions 2 to 9.

2. The least squares estimate of $\beta$ is given by
A. [illegible] B. $\hat{\beta} = (X'X)^{-1}X'Y$ C. $\hat{\beta} = (X'X)^{-1}Y'X$ D. $\hat{\beta} = (X'X)^{-1}X'Y'$

3. What is the vector of fitted values, $\hat{Y}$, in terms of X and Y?
A. $(X'X)^{-1}X'Y$ B. $X(X'X)^{-1}X'Y$ C. $X(X'X)^{-1}Y'X$ D. $X(X'X)^{-1}X'Y'$

4. The hat matrix is given by
A. $(X'X)^{-1}X'Y$ B. $X(X'X)^{-1}X'Y$ C. $X(X'X)^{-1}X'$ D. [illegible]
5. The hat matrix, denoted by H, is idempotent because
A. $HH' = I$ B. $H^2 = H$ C. $H = |H|$ D. $H = H'$

6. The hat matrix, denoted by H, is symmetric because
A. $HH' = I$ B. $H^2 = H$ C. $H = |H|$ D. $H = H'$

7. The matrix X is called
A. Design matrix B. Coefficient matrix C. Correlation matrix D. Singular matrix

8. The dimension of the matrix X is
A. $n \times (k+1)$ B. $(k+1) \times 1$ C. $(k+1) \times n$ D. $1 \times (k+1)$

9. Which two of the following, Y, X, $\beta$ and $\varepsilon$, have the same dimensions?
A. $\beta$ and $\varepsilon$ B. Y and X C. Y and $\varepsilon$ D. $\varepsilon$ and X

A model which is based on n = 24 observations is given as: $y = 121 - 0.050x_1 + 2.917x_2 - 0.0772x_3$. The leverage value of observation 19 is calculated to be 0.3000. Use this information to answer Questions 10 and 11.

10. Find the average leverage value.
A. 0.1667 B. 0.3000 C. 0.3333 D. 0.5000

11. Which of the following statements is true?
A. Observation 19 is an influential outlier with respect to its x [value].
B. Observation 19 is an outlier with respect to its [illegible] ... influential.
C. Observation 19 is an influential outlier with respect to its y [value].
D. Observation 19 is not an outlier with respect to both x and y [values].

Answer Questions 12 to 16 using the residual plots (a), (b), (c) and (d).
[Figure: four residual plots, (a) to (d); the legible axis labels are Residuals and Time.]

12.
Which plot suggests that the independence assumption is violated?
A. Plot (d) B. Plot (c) C. Plot (b) D. Plot (a)

13. Which plot suggests that the homoscedasticity assumption is violated?
A. Plot (d) B. Plot (c) C. Plot (b) D. Plot (a)

14. Which plot suggests a positive auto-correlation?
A. Plot (d) B. Plot (c) C. Plot (b) D. Plot (a)

15. Which plot suggests a negative auto-correlation?
A. Plot (d) B. Plot (c) C. Plot (b) D. Plot (a)

16. Which plot(s) suggest(s) no autocorrelation?
A. Plots (d) and (b) B. Plots (c) and (b) C. Plot (b) D. Plot (d)

Consider the polynomial regression model $Y = \alpha + \beta X + \gamma X^2 +$ [final term illegible], where $\alpha$, $\beta$ and $\gamma$ are constants, and use it to answer Questions 17 and 18.

17. How many predictor variables are involved in this model?
A. 3 B. 2 C. 1 D. None of the above

18. What is the order of the polynomial?
A. 3 B. 2 C. 1 D. 0

Below is the model fit obtained from regressing price on time and diam. Use the information to answer Questions 19 and 20.

Predictor   Coef          SE Coef   T       P
Constant    [illegible]   6.8990    3.77    0.000
Time        1.9309        0.1690    11.43   0.000
Diam        3.9579        0.8339    4.75    0.000

Source       DF   SS       MS      F     P
Regression    2   137172   68586   225   0.000
Error        56    17085     305
Total        58   154257

19. [The question stem and the start of each option are cut off in the scan. The legible parts of the options are:]
A. ... 6.8990 Time ... 3.77 Diam
B. ... 1.9309 Time + 3.9579 Diam
C. ... 1.9309 + 0.1690 Time + 11.43 Diam
D. ... 3.9579 Time + 4.75 Diam

20. What is the multiple coefficient of determination?
A. [illegible] B. 88.50% C. [illegible] D. [illegible]

Consider the following linear probability model, $p = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$, where p is the probability of one of two possible outcomes; $\beta_i \neq 0$; $x_i$ (i = 1, 2, ..., k) can take any real value. Use this information to answer Questions 21 and 22.

21. What is the problem associated with this model?
A. There is no guarantee that p will satisfy $0 \le p \le 1$.
B. The regression parameters cannot be determined, unless k < 3.
C. The regression parameters cannot be determined, unless [illegible].
D. The regression parameters cannot be determined, unless each [$x_i$ is] a continuous variable.

22. Write down the corresponding logistic model.
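The contrast behind Questions 21 and 22 can be seen numerically. The sketch below is only an illustration: the coefficients (0.2 and 0.3), the single predictor value and the function names are hypothetical and do not come from the paper. It evaluates the same linear predictor once as a linear probability model and once through the standard logistic link, $\log\{p/(1-p)\} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$.

```python
import math

def linear_predictor(beta, x):
    """Return beta[0] + beta[1]*x[0] + ... + beta[k]*x[k-1]."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def linear_probability(beta, x):
    """Linear probability model: p is the linear predictor itself."""
    return linear_predictor(beta, x)

def logistic_probability(beta, x):
    """Logistic model: the linear predictor is the log odds of p."""
    eta = linear_predictor(beta, x)
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and predictor value, chosen only for illustration.
beta = [0.2, 0.3]
x = [5.0]

p_linear = linear_probability(beta, x)      # can fall outside [0, 1]
p_logistic = logistic_probability(beta, x)  # always strictly between 0 and 1
log_odds = math.log(p_logistic / (1.0 - p_logistic))

print("linear probability model:", p_linear)
print("logistic model:", p_logistic)
print("log odds recovered from the logistic p:", log_odds)
```

With these made-up values the linear form returns a "probability" above 1, while the logistic form stays inside (0, 1) and its log odds reproduce the linear predictor.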
A variable y is to be regressed on the variables $x_1$ and $x_2$. The correlation matrix is given below. [Matrix illegible in the scan.] Use this information to answer Questions 23 to 26.

23. [Find the] correlation coefficient between the predictor [variables].
A. [illegible] B. 0.91 C. 0.45 D. 0.87

24. Given that there are n = 15 observations, find the value of the test statistic to test the significance of the correlation between the predictor variables.
A. 2.16 B. [illegible] C. 0.14 D. 0.87

25. What is the critical value at the [remainder of the stem illegible]?
A. 2.16 B. [illegible] C. [illegible] D. 0.87

26. Which of the following statements is true?
A. If we reject the null hypothesis of no correlation between the predictor variables, then they are not correlated.
B. If we fail to reject the null hypothesis of no correlation between the predictor variables, then they are correlated.
C. If we reject the null hypothesis of no correlation between the predictor variables, then they are collinear.
D. If we fail to reject the null hypothesis of no correlation between the predictor variables, then they are not collinear.

27. Which of the following is not a reason for transforming the original variables in a regression model?
A. To stabilize the variance of the response variable when homoscedasticity assumption is violated.
B. To normalize the response variable when the normality assumption is violated.
C. To linearize the regression model when original data suggest a model that is nonlinear.
D. To facilitate easy interpretation of regression model.

Use the information in the table below to answer Questions 28 to 30.

Model                           $R^2$
1   [equation illegible]        0.954
2   [equation partly illegible] 0.912

28. Find the variance inflation factor for $x_1$.
A. 0.954 B. 21.74 C. [illegible] D. 1.60

29. Find the variance inflation factor for $x_2$.
A. 0.91 B. 5.94 C. 1.36 D. 9.83

30. The variance inflation factors for the two variables suggest that the presence of the two variables in a model would:
A. cause singularity problems.
B. cause homoscedastic problems.
C. cause multicollinearity problems.
D. strengthen predictive powers of the model.

SECTION B
Answer only two questions from this section.

1. (a) Suppose you found the following patterns in residual plots after fitting a simple linear regression to (x, y) data. What do you suspect to be wrong in each case, and what would you do to correct it?
[Figure: two residual plots, (i) Residual against Predicted Y and (ii) Residual against Predictor value.]
(b) Consider the multiple linear regression model, $Y = X\beta + \varepsilon$, where $E(\varepsilon) = 0$.
(i) Show that $\hat{\beta}$ is an unbiased estimator of $\beta$.
(ii) Show that if $\hat{\beta}$ is used as an estimator of $\beta$ when $E(\varepsilon) = \delta$ ($\delta \neq 0$), then $\hat{\beta}$ is, in general, not unbiased for $\beta$.
(c) Suppose
$Y_1 = \theta + \varepsilon_1$
$Y_2 = 2\theta - \phi + \varepsilon_2$
$Y_3 = \theta + 2\phi + \varepsilon_3$
where $\varepsilon_i$ (i = 1, 2, 3) are independent with $E(\varepsilon_i) = 0$ and $V(\varepsilon_i) = \sigma^2$.
(i) Find the least squares estimators of $\theta$ and $\phi$.
(ii) Hence find the variances of the estimates.

2. A psychiatrist wanted to know whether the level of pathology (Y) in psychotic patients six months after treatment can be predicted with reasonable accuracy from knowledge of pre-treatment symptom ratings of thinking disturbance ($X_1$) and hostile-suspiciousness ($X_2$).

The ANOVA table below summarizes the results of the analysis:

Source                 df   SS
[Regression rows partly illegible; the legible fragments are "160", "11,2386" and "2,634".]
Residual               50   11,008

(a) The least squares equation involving both independent variables is given by $\hat{Y} = -0.628 + 23.639X_1 - 7.147X_2$. Using this equation, determine the predicted level of pathology (Y) for a patient with predetermined scores of 2.80 on thinking disturbance and 7.0 on hostile suspiciousness.
(b) Using the ANOVA tables, carry out the overall regression F tests for models containing (i) both $X_1$ and $X_2$; (ii) $X_1$ alone and (iii) $X_2$ alone.
(c) Carry out the partial F test for (i) $X_1$ given $X_2$ in the model; and (ii) $X_2$ given $X_1$ in the model.
(d) Based on your results in parts (b) and (c), how would you rate the two variables in order of their importance in predicting Y?
(e) What are the $R^2$ values for the three regression models in part (b)?
(f) What is the best model involving either one or both of the independent variables?

3. In a small-scale study of the relation between degree of brand liking (Y) and moisture content ($X_1$) and sweetness ($X_2$) of the product, the following results were obtained (data are coded):
[Data table illegible in the scan.]
Assume the model $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$ with normal error terms is appropriate. Using matrix methods, find:
(a) the vector of estimated regression coefficients and hence the model for the data;
(b) the vector of residuals; and
(c) the residual sum of squares.

4. (a) Write down the model for a second-order polynomial regression involving two independent variables. Explain all the terms in the model. Why is the model considered as a non-linear model? Transform the model into a linear model.
(b) By transforming the model, $y = \alpha x^{\beta}$, into a suitable linear model, find the least squares estimates of $\alpha$ and $\beta$, using the following data.

x   50   100   250   500   1000
y   [values illegible]

Using your results, find the value of y when x = 300.

UNIVERSITY OF CAPE COAST
DEPARTMENT OF MATHEMATICS AND STATISTICS
END OF SECOND SEMESTER EXAMINATIONS (2009/2010)
[The rest of the title page is illegible in the scan.]

SECTION A
Answer all the questions in this section.

[Items 1 to 6 are largely illegible in the scan. The legible fragments are:]
"... are three predictor variables ... variance inflation factor of 3 for ..."
"2. Suppose $X_1$, $X_2$ and ..."
"... used to [determine whe]ther or not the observation is an outlier [with respect to bo]th its x and y values."
"[St]udentized residuals can be used [to determine the presenc]e of an outlier with respect to its ..."

7. A model is based on n = 23 observations and given as $y = 3 - 5x_1 + 2x_2 + 7x_3 - x_4 + 3x_5$. Observation 5 has a leverage value of 0.2[remaining digits illegible] ... is an outlier.
The correlation between $x_1$ and $x_2$ is r [value and remainder of the stem illegible]. [Use this information to] answer Questions 8 to 10.

8. The value of the appropriate test statistic is 6.1367.
9. The corresponding t-value from statistical tables is 2.16.
10. The two variables would not cause collinearity [remainder illegible].

[For items 11 to] 30, select the most [appropriate option].

[Suppose the system]
...
$y_n = \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \dots + \beta_k x_{nk} + \varepsilon_n$
[is written in the] usual matrix notation as $Y = X\beta + \varepsilon$, where [remainder illegible].

11. The dimension of the vector $\beta$ is
A. $n \times 1$ B. $n \times k$ C. $(k+1)$ [remainder and option D illegible]

[Items 12 to 14 are missing from the scan.]

[Figure: four residual plots, (a) to (d), each labelled Residuals.]

Use the residual plots above to answer Questions 15 to 18.

15. Which plot suggests a violation of the independence [assumption]?
[Options illegible in the scan.]

16. Which plot suggests that the homoscedasticity assumption in regression analysis holds?
A. Plot (d) B. Plot (c) C. Plot (b) D. Plot (a)

17. Which plot suggests that the independence assumption in regression analysis holds?
A. Plot (d) B. Plot (c) C. Plot (b) D. Plot (a)

18. [Which plot] suggests that the normality [assumption in regression analysis] holds?
[Options illegible in the scan.]

19. [Stem illegible; the options are reasons for transforming variables.]
A. To stabilize the variance of the response [variable] when homoscedasticity assumption is violated.
B. To normalize the response variable when normality assumption is violated.
C. To linearize the regression model [when original] data suggest a model that is nonlinear.
D. To facilitate easy interpretation [of regression] model.

[A MINITAB regression output follows; only the fragments "99396504", "1.87", "-1.24 0.245" and "4.63 0.000" are legible.]

[Stem and options A and C illegible.]
B. $y = 30626 + 3.893x_1 - 29607x_2 + 86.52x_3$
D. $y = 30626 + 3x_1 + 9x_2 + 12x_3$

Which of the following statements is true about the model?
A. The model is not significant at 1%.
B. The model is significant at 1%.
C. The model is not significant at 67.59%.
D. The model is [illegible] significant at 5%.

[What is t]he multiple coefficient of determination?
[Options illegible in the scan.]

[Stem partly illegible: "... in the ANOVA table is used to test ..."]

24. Which variable(s) is/are significant at the [illegible] level of significance?
[Options illegible in the scan.]

The number of smokers in a sample of 12,000 workers is 2,400. Use this information to answer Questions 26 to 28.

26. What proportion of workers are smokers?
A. 0.25 B. 0.8 C. 0.2 D. 4.0

27.
What proportion of workers are non-smokers?
A. [illegible] B. 0.8 C. 0.2 D. 1.0

28. Find the odds of a worker being a smoker.
A. 0.25 B. 0.8 C. 0.2 D. 4.0

Suppose $Y_1 = \theta + \varepsilon_1$, $Y_2 = 2\theta - \phi + \varepsilon_2$ and $Y_3 = \theta + 2\phi + \varepsilon_3$, where the $\varepsilon_i$ (i = 1, 2, 3) are independent [with $E(\varepsilon_i) = 0$ and] $V(\varepsilon_i) = \sigma^2$. Use this information to answer Questions 29 and 30.

29. This system of equations can be written in matrix form as
[Options A to D, each a matrix equation, are illegible in the scan.]

30. Given that [illegible], find the least squares estimate of $\theta$.
[Options A to D are illegible in the scan.]

SECTION B [60 MARKS]
Answer only two questions from this section.

1. (a) Write down the [remainder illegible].
(b) [Write down the] formulae for finding point estimates for the coefficients in your model given in part (a).
(c) By transforming the model, $y = \alpha x^{\beta}$, [into a suitable] linear model, find the least [squares estimates] of $\alpha$ and $\beta$, using the following [data].
[Data illegible in the scan.]

2. [The opening of this question is illegible; it ends:] ... ($X_2$), and the rate of unemployment ($X_3$). A [sample] of 20 town[s] was involved in the study.
(a) The ANOVA tables below are based on regressions involving each two-variable [combination] of the three independent variables. [Carry] out the overall regression F test for [each two-]variable regression model and the p[artial F test] concerning the addition of the second [variable], given that the first is already in the model.

Y on $X_1$ and $X_2$
[Table illegible in the scan.]

Y on $X_1$ and $X_3$
Source     df   SS
[Regression rows illegible]
Residual   17   377.82

(b) Based on the results in part (a), which two-variable model would you recommend?
(c) Calculate the $R^2$ values for each two-variable model above and relate the results to the conclusion in part (b).
(d) The ANOVA table below involves all three independent variables. Test whether the addition of $X_1$ to a model with $X_2$ and $X_3$ already [remainder illegible].

Source                              df            SS
Regression ($X_1$, $X_2$, $X_3$)    [illegible]   [illegible]
Residual                            16            [illegible]

(e) Determine and comment on [illegible] model that includes all three variables.

3. The table below contains data for
experiment on the effect of work crew [...]

[Data table: columns Trial, Crew size, Bonus pay, Crew [...]; legible entries: 2 42 [...] 3 2 43 3 51 2 43 2 53 3 61 3 60]

Assume that the model $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$ [...] normal error terms is appropriate. Using [...] methods, find:
(a) the vector of estimated regression coefficients [...] the straight-line model for the [...] residuals; and
[...] residual sum of squares.

4. (a) Write down the models for multiple [...] regression and binary logistic regression [...] terms in each model.
(b) Consider the following research question: "How does the concentration of an insecticide relate to whether or not an insect is killed?" Why is a binary logistic regression appropriate for solving this question [...] multiple linear regression model?
(c) In a survey of 17,096 students in [...] colleges, the researchers were interested in estimating the proportion of frequent binge [...] (drinking five or more [...] more times in the past two weeks was [...] as a frequent binge drinker). The [...] frequent binge drinkers in a sample was [...] to be 3314. It was also found that [...] male students were frequent binge [...] whilst the [...] figure for [...] was [...] 22.70% of the [...] female [...]
(i) Find the odds of a student being a frequent binge drinker and interpret the result.
(ii) By finding log(Odds) for males and log(Odds) for females, fit a simple binary logistic regression model of the form
$\log\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1 S$,
using the proportion, $p$, of frequent binge drinkers as the response, $y$ ($S = 1$, men and $S = 0$ [...]).

UNIVERSITY OF CAPE COAST
DEPARTMENT OF MATHEMATICS AND STATISTICS
END OF SECOND SEMESTER EXAMINATIONS (2011/2012)
STA 408: LINEAR MODELS
TIME ALLOWED: TWO HOURS
This paper consists of two sections, A and B.
Answer all questions in Section A and any two from Section B in the answer booklet provided.
[Unless otherwise stated, take $\alpha = 0.05$, if required].
SECTION A [30 Marks]
Answer all questions in this section.
Each of the items 1 to 30 is followed by options lettered A to D. Select the letter that corresponds to the correct or best option.

Consider the polynomial regression model $Y = $ [...] and use it to answer Questions 1 and 2.
1. How many predictor variables are involved in this model?
A. [...]   B. 2   C. [...]   D. None of the above
2. What is the order of the polynomial?
A. First-order   B. Second-order   C. Third-order   D. None of the above
3. Which of the following are components of a mathematical model?
I: Random variables
II: Mathematical variables
III: Parameters
A. I and II only   B. I and III only   C. II and III only   D. I, II and III

A model based on $n = 23$ observations is given as:
$\hat{y} = 3 - 5x_1 + 2x_2 + 7x_3 - x_4 + 3x_5$.
Observation 5 has a leverage value of 0.245. Use this information to answer Questions 4 and 5.
4. Calculate the average leverage value for the model.
A. 0.245   B. 0.522   C. 0.261   D. [...]
5. Which of the following statements can best be deduced from the information given?
A. Observation 5 is not an influential outlier with respect to its x value.
B. Observation 5 is an influential outlier with respect to its x value.
C. Observation 5 is not an influential outlier with respect to its y value.
D. Observation 5 is an influential outlier with respect to its y value.
6. Which of the following matrices is symmetric?
[options illegible]

Consider the following linear probability model,
$p = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$,
where $p$ is the probability of one of two possible outcomes. Use this information to answer Questions 7 and 8.
7. What is the problem associated with the model?
A. The restriction $0 \le p \le 1$ is not guaranteed on the RHS of the equation.
B. There is no error term in the equation.
C. We should use $\beta_0$ instead of $\alpha$.
D. All of the above
8. Why is the logistic model an improvement on the probability model?
A. Calculations involved in finding estimates for $\alpha, \beta_1, \ldots$ are easier.
B. It has more parameters.
C.
It allows the error term to be included.
D. It removes the restriction on $p$ in the probability model.

The following is a MINITAB output of a regression analysis of [...] data. Use this information to answer Questions 9 to 12.

Analysis of Variance
Source   DF   SS            MS           F       P
Model    3    20156028920   6718676480   67.59   0.000
Error    9    894568512     99396504
Total    12   21050597376

Regression Analysis
Predictor   Coef     StDev   t-ratio   P
Constant    30626    19808   1.55      0.156
x1          3.893    2.081   1.87      0.094
x2          -29607   23822   -1.24     0.245
x3          86.52    18.69   4.63      0.000

9. Write down the model relating the response to the predictor variables.
A. $y = 1.55 + 1.87x_1 - 1.24x_2 + 4.63x_3$
B. $y = 19808 + 2.081x_1 + 23822x_2 + 18.69x_3$
C. $y = 30626 + 3.893x_1 - 29607x_2 + 86.52x_3$
D. $y = 0.156 + 0.094x_1 + 0.245x_2$ [...]
10. Calculate the coefficient of determination for this model.
A. 0.9575   B. 1.0444   C. 0.0148   D. 0.7595
11. What can you say about the significance of the model?
A. The model is not significant at 1%.
B. The model is significant at 1%.
C. The model is not significant at 67.59%.
D. The model is not significant at 5%.
12. Which of the following statements is/are true about the predictor variables?
I. Variable $x_1$ is significant at 5%.
II. Variable $x_2$ is not significant at 5%.
III. Variable $x_3$ is highly significant, even at 1%.
A. I only   B. II only   C. I and II only   D. II and III only

Consider the usual matrix notation of the multiple linear regression model $Y = X\beta + \varepsilon$, where $E(\varepsilon) = 0$. Use this information to answer Questions 13 to 15.
13. The least squares estimate of $\beta$ is given by
[options A to D partly illegible; legible forms include $(X'X)^{-1}X'Y$ and $X(X'X)^{-1}X'$]
14. The hat matrix is given by
A. $(X'X)^{-1}X'Y$   B. $X(X'X)^{-1}X'Y$   C. $X(X'X)^{-1}X'$   D. [...]
15. The hat matrix, denoted by $H$, is idempotent because
[options partly illegible; one legible option reads $H' = H$]
16. The adjusted multiple coefficient of determination, $R^2_{adj}$, is an improvement on the multiple coefficient of determination, $R^2$, because unlike $R^2$,
A. it is very easy to calculate.
B. it is not
unduly inflated as a result of the inclusion of variables that may not have any predictive power.
C. it is robust to the effect of influential outliers.
D. it reveals multicollinearity problems.

The multiple coefficients of determination, $R_j^2$, of the models $x_1 = \beta_0 + \beta_2 x_2 + \beta_3 x_3$ and $x_2 = \beta_0 + \beta_1 x_1 + \beta_3 x_3$ are respectively $R_1^2 = 0.954$ and $R_2^2 = 0.912$. Use this information to answer Questions 17 to 19.
17. Find the variance inflation factors (VIFs) for $x_1$ and $x_2$.
A. $VIF_1 = 0.954$, $VIF_2 = 0.046$
B. $VIF_1 = 0.088$, $VIF_2 = 0.912$
C. $VIF_1 = 21.739$, $VIF_2 = 11.364$
D. $VIF_1 = 23.107$, $VIF_2 = 3.241$
18. The VIFs for the two variables suggest that the presence of the two variables in a model would
A. cause singularity problems.
B. cause homoscedastic problems.
C. cause multicollinearity problems.
D. strengthen predictive powers of the model.
19. Given that the VIF for $x_3$ is 2.100, find the mean VIF.
A. 10.333   B. [...]   C. 9.483   D. 11.734
20. [...] usual notation [...]
[options A to D: ratios built from SSE(Reduced model) - SSE(Full model), $(k - g)$, SSE(Full model), SSE(Reduced model), $(k+1)$ and $n - (k+1)$; layout illegible]

A binary logistic regression model relating the probability, $p$, of successfully identifying the sex of examination candidates by [...] to the sex, $s$, and experience, $x$, of the examiner is given by
$\log\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1 s + \beta_2 x$.
The table below shows portions of the MINITAB output for this model, which is based on some data collected.

Predictor       Coef      StDev    P       Odds Ratio
Constant        0.4270    0.1080   0.000
Sex, s          -0.1028   0.1390   0.460   0.90
Experience, x   0.0090    0.0088   0.308   0.99

Use this information to answer Questions 21 and 22.
21. The odds ratio of 0.9 suggests that
A. males are marginally better than females in identifying handwriting of examination candidates.
B. females are marginally better than males in identifying handwriting of examination candidates.
C. Sex is highly predictable.
D. Sex is normally distributed.
22. [...] a suitable reduced model to estimate the probability [...]
A. 0.605   B. 0.308   C. 0.427   D. 0.460

Suppose the system
$y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k x_{1k} + \varepsilon_1$
$y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k x_{2k} + \varepsilon_2$
$\vdots$
$y_n = \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_k x_{nk} + \varepsilon_n$
is written in matrix notation as $Y = X\beta + \varepsilon$, where $E(\varepsilon) = 0$. Use this information to answer Questions 23 to 27.
23. The matrix $X$ is called
A. Design matrix   B. Coefficient matrix   C. Correlation matrix   D. Singular matrix
24. What is the vector of fitted values, $\hat{Y}$, in terms of $X$ and [...]?
A. $(X'X)^{-1}X'Y$   B. $X(X'X)^{-1}X'Y$   C. $X(X'X)^{-1}X'$   D. [...]
25. The dimension of the matrix $X$ is
A. $n \times (k+1)$   B. $(k+1) \times 1$   C. $(k+1) \times n$   D. $1 \times (k+1)$
26. Which two of the vectors $Y$, $X$, $\beta$, $\varepsilon$ have the same dimension?
A. $\beta$ and $\varepsilon$   B. $Y$ and $X$   C. $Y$ and $\varepsilon$   D. $\varepsilon$ and $X$
27. [...] the covariance of $\hat{\beta}$ is given by
[options partly illegible; one legible option reads $\sigma^2 (X'X)^{-1}$]
28. Assuming usual notation, which of the following is the correct expression for the studentized residual?
[options illegible; legible fragments involve $s\sqrt{1 - \text{leverage value}}$]
29. [...] Cook's distance for a certain observation is less than [...] $[(k+1),\ n-(k+1)]$, where $n$ is the number of observations [...] $(k+1)$ is the number of model parameters, then the observation is
A. not an outlier
B. an outlier but not influential
C. an outlier and influential
D. not an outlier and not influential
30. [...] that the number of smokers in a sample of 12,000 [...] is 2,400. Find the odds of a worker being a non-[...]
[options illegible]

[...]
Answer any two questions from this Section

1. (a) Suppose you found the following patterns in [...] after fitting a simple linear regression to $(x, y)$ [...] you suspect to be wrong in each case, and [...] do to correct it?
[Residual plots]
(b) Consider the multiple linear regression model $Y$ [...] where $E(\varepsilon) = 0$ [...]
(i) Show that $\hat{\beta} = (X'X)^{-1}X'Y$ is an unbiased estimator [...]
(ii) Show that if $\hat{\beta}$ (as defined above) is used as an estimator
of $\beta$ when in fact $E(\varepsilon) = \delta$ $(\delta \neq 0)$, then $\hat{\beta}$ is, in [...], not unbiased for $\beta$.
(c) Suppose that
[...]
$Y_3 = \theta + 2\phi + \varepsilon_3$,
where $\varepsilon_i$ $(i = 1, 2, 3)$ are independent with $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$.
(i) Find the least squares estimators of $\theta$ and $\phi$.
(ii) Hence find the variances of the estimates.

2. (a) Write down the model for a second-order [...] involving two predictor variables, [...] terms involved. Is the model considered a linear [...] linear model? Justify your answer. If the model is non-linear, transform it into a linear model.
(b) By transforming the model $y = \alpha x^{\beta}$ into a [...] model, find the least squares estimates of $\alpha$ and $\beta$, using

x | 50    100   250   500   1000
y | 108   53    24    9     5

Using your results, find the value of $y$ when $x = 300$.
(c) The following are the residuals resulting from a simple linear regression fit to some data:
[33 69 05 5.3 33 | 33:3 3.5 2.1 -5.1 -09]
Construct a normal plot of the residuals and use it to verify whether or not the normality assumption is violated.

3. Sketch a suitable plot to show that the
(i) independence assumption in regression analysis is violated with negative autocorrelation;
(ii) homoscedasticity assumption in regression analysis holds; and
(iii) functional form of the regression model is incorrect.
Give a short description of the main features in each case.

[...] different regression models involving systolic blood pressure as the response variable, $Y$, and three [...] variables, namely: age ($x_1$), smoking habit ($x_2$), defined as $x_2 = 1$ if a smoker and $x_2 = 0$ if a non-smoker, and body index ($x_3$) [...] least squares estimates, $\hat{\beta}_i$ $(i = 0, 1, 2, 3)$, and ANOVA tables as shown.

[Table of variables and estimates by model: illegible]

ANOVA Tables
Model 1
Source       DF   SS
Regression   1    3,861.630
Residual     30   2,564.338
Total        [...]
Model 2
[ANOVA table: columns Source, DF, SS; rows Regression and Residual; entries illegible]

Model 3
Source       DF   SS
Regression   3    4,889.826
Residual     28   1,536.143
Total        [...]

[...] Using Model 3, what is the predicted systolic blood pressure for a fifty-year old smoker with body index of 3.0?
[...] Calculate and compare the multiple coefficients of determination for the three models.
[...] Test for the significance of Model 1.
[...] Carry out partial F tests for
(i) the addition of [...] to the model given [...] and [...]
(ii) the addition of [...] to the model given [...] and [...]

UNIVERSITY OF CAPE COAST
DEPARTMENT OF MATHEMATICS & STATISTICS
END-OF-SECOND SEMESTER EXAMINATIONS (2012/2013)
STA 408: LINEAR MODELS
Time Allowed: 2 Hours 30 Minutes
This paper consists of two sections, A and B. Answer all the questions in Section A and any two from Section B.
[Unless otherwise stated, use the level of significance, $\alpha = 0.05$, if required]

SECTION A [30 Marks]
Answer all the questions in this section.
Each of the items 1 to 30 is followed by options lettered A to D. Select the letter that corresponds to the correct or best option.

1. What is the meaning of the term heteroscedasticity?
A. The variance of the errors is not constant.
B. The variance of the dependent variable is not constant.
C. The errors are not linearly independent of one another.
D. The variance of the errors is a constant.
2. Which of the following are plausible approaches to dealing with a model that exhibits heteroscedasticity?
(i) Take logarithms of each of the variables.
(ii) Use suitably modified standard errors.
(iii) Use a generalised least squares procedure.
(iv) Add lagged values of the variables to the regression equation.
A. (ii) and (iv) only
B. (i) and (iii) only
C. (i), (ii), and (iii) only
D. (i), (ii), (iii), and (iv)
3. Which
of the following indicates that residuals follow a negative autocorrelation?
A. A cyclical pattern in the residuals
B. An alternating pattern in the residuals
C. A complete randomness in the residuals
D. Residuals that are all close to zero
4. Which of the following are plausible approaches to dealing with residual autocorrelation?
(i) Take logarithms of each of the variables.
(ii) Add lagged values of the variables to the regression equation.
(iii) Use dummy variables to remove outlying observations.
(iv) Try a model in first differenced form rather than in levels.
A. (ii) and (iv) only
B. (i) and (iii) only
C. (i), (ii), and (iii) only
D. (i), (ii), (iii), and (iv)
5. Near multicollinearity occurs when
A. two or more explanatory variables are perfectly correlated with one another.
B. the explanatory variables are highly correlated with the error term.
C. the explanatory variables are highly correlated with the response variable.
D. two or more explanatory variables are highly correlated with one another.
6. Which one of the following is not a plausible remedy for multicollinearity?
A. Use principal components analysis
B. Drop one of the collinear variables
C. Use a longer run of data
D. Take logarithms of each of the variables
7. Which one of the following is not an example of mis-specification of functional form?
A. Using a linear specification when y changes as a function of the squares of x.
B. Using a linear specification when a double-logarithmic model would be more appropriate.
C. Modelling y as a function of x when in fact it changes as a function of 1/x.
D. Excluding a relevant variable from a linear regression model.
8. Which of the following would be a plausible response to a finding of residual non-normality?
A. Use a logarithmic functional form instead of a linear one.
B. Add lags of
the variables on the right-hand side of the regression model.
C. Estimate the model in first differenced form.
D. Remove any large outliers from the data.
9. A parsimonious model is one that
A. includes too many variables.
B. includes as few [...] best set variables.
C. is a well-specified model.
D. is a mis-specified model.
10. Which of the following is not true?
A. The point $(\bar{X}, \bar{Y})$ always lies on the regression line.
B. The sum of the residuals is always zero.
C. The mean of the fitted values of Y is the same as the mean of the observed values of Y.
D. There are always as many points above the fitted line as there are below it.
11. The least squares estimator of the slope is unbiased. This means that
A. the estimated slope coefficient will always be equal to the true parameter value.
B. the estimated slope coefficient will get closer to the true parameter value as the sample size increases.
C. the mean of the sampling distribution of the slope parameter is zero.
D. if repeated samples of the same size are taken, on average, their value will be equal to the true parameter.
12. Consider a multiple regression model with the usual assumptions. If one predictor variable is eliminated, which of the following is correct?
A. The explained variability will increase.
B. The unexplained variability will increase or remain constant.
C. The total variability will increase.
D. The coefficient of determination will increase.
13. If in a multiple regression model the coefficient of determination is equal to 1, then:
A. $\beta_0$ is equal to 0.
B. the explained variability is equal to the unexplained variability.
C. the unexplained variability is equal to 0.
D. the explained variability must be 1.
A clinical psychologist finds that the relationship between the number of weeks spent in a therapy hospital (X = Hospital) and number of seizures per week [...]
14. The proportion of variance in Seizures accounted for by Hospital (i.e., the coefficient of determination) is
A. 0.93   B. -0.93   C. 0.86   D. -0.86
15. How many seizures per week would you predict for [...] who has been at the therapy hospital for ten weeks?
A. 4.49   B. 4.99   C. 14.09   D. 0.91
16. If you know that the Seizures variable had a standard deviation of 1.7, what would be the standard error of the estimate?
A. 7.07   B. -0.93   C. 0.86   D. 0.64
17. A regression analysis is inappropriate when
A. you have two variables that are measured on an interval or ratio scale.
B. you want to make predictions for one variable based on information about another variable.
C. the pattern of data points forms a reasonably straight line.
D. there is heteroscedasticity in the scatter plot.
18. If we use more than one predictor to predict our criterion at the same time
A. we are no longer doing statistics.
B. we are conducting multiple regression and there are established ways of determining which predictor is the best.
C. we are conducting multiple regression and there are no established ways of determining which predictor is the best.
D. the predictors will always suffer from heteroscedasticity.
19. The line described by the least squares regression equation attempts to
A. pass through as many points as possible.
B. pass through as few points as possible.
C. minimize the number of points it touches.
D. minimize the squared distance from the points.
20. I am trying to construct the regression equation for predicting grocery bills (Bills) from number of people in a household (Number). I have calculated the slope to be 33.57 and I know the mean Bill is GH¢132.71 and the mean Number is 2.71.
What is the value of the intercept?
A. 132.71   B. 33.57   C. 41.74   D. 0.00
21. The standard deviation is to the mean as the --- is to the regression line.
A. z-score
B. SSR
C. coefficient of determination
D. standard error of the estimate
22. The hat matrix, $H$, transforms the vector of the observed response values, $Y$, to the vector of fitted values, $\hat{Y}$.
A. True   B. False
23. The covariance matrix of the estimated regression coefficients, $C = \sigma^2 (X'X)^{-1}$, is a symmetric matrix whose diagonal elements, $c_{jj}$, represent the variance of the estimated $j$th regression coefficient, $\hat{\beta}_j$. The off-diagonal elements, $c_{ij}$, represent the covariance between the estimated regression coefficients, $\hat{\beta}_i$ and $\hat{\beta}_j$.
A. True   B. False
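The ideas behind Questions 20 and 22 can be checked numerically. The sketch below is only an illustration, not part of the examination paper: it uses the figures quoted in Question 20 for the intercept, and a small hypothetical data set (the lists `x` and `y`, and all function names, are assumptions made for this example) to build the hat matrix $H = X(X'X)^{-1}X'$ of a straight-line model using only the Python standard library.

```python
# Minimal sketch in pure Python (no external libraries).
# Function names and the small data set are hypothetical illustrations.

def intercept_from_means(slope, x_mean, y_mean):
    """Least squares intercept: the fitted line passes through (x_mean, y_mean)."""
    return y_mean - slope * x_mean

def hat_matrix_simple(x):
    """Hat matrix H = X (X'X)^{-1} X' for a straight-line model with intercept."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx  # determinant of the 2x2 matrix X'X
    # (X'X)^{-1} = (1/det) * [[sxx, -sx], [-sx, n]]
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    rows = [[1.0, v] for v in x]  # rows of the design matrix X
    return [[sum(ri[a] * inv[a][b] * rj[b] for a in range(2) for b in range(2))
             for rj in rows] for ri in rows]

def mat_vec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def mat_mul(a, b):
    """Matrix-matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Question 20: slope 33.57, mean Number 2.71, mean Bill 132.71.
b0 = intercept_from_means(33.57, 2.71, 132.71)
print(round(b0, 2))  # 41.74

# Hypothetical data, used only to illustrate the hat matrix.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
H = hat_matrix_simple(x)
y_hat = mat_vec(H, y)   # H maps the observed responses to the fitted values
HH = mat_mul(H, H)      # for an idempotent matrix, H H equals H
trace_H = sum(H[i][i] for i in range(len(x)))  # equals the number of parameters
```

In this sketch the trace of $H$ equals the number of model parameters (here 2), so the average leverage is the trace divided by the number of observations, which is the same $(k+1)/n$ rule used for leverage questions elsewhere in these papers.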
