UNIVERSITY EXAMINATIONS
2015/2016 ACADEMIC YEAR
THIRD YEAR SUPPLEMENTARY EXAMINATION
FOR THE DEGREE OF
BACHELOR OF SCIENCE IN
APPLIED STATISTICS WITH COMPUTING
COURSE CODE: STA 328
DATE: TIME:
INSTRUCTION TO CANDIDATES
SEE INSIDE
Page 1 of 12
1. The paper comprises 7 questions.
2. Attempt questions one and two (compulsory) and any three other questions (13 marks each).
3. Electronic scientific calculators may be used.
(i) Test the significance of each variable as it enters the model. [6 marks]
(ii) Test H0: β1 = β2 = 0 in the model. [2 marks]
(iii) Why can’t we test H0: β1 = β3 = 0 using the ANOVA table given? What
formula would you use for this test? [2 marks]
(iv) What is your overall evaluation concerning the appropriate model to use
given the results in parts (i) and (ii)? [1 mark]
(c) Consider the model
Find the least squares estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ using the matrix method. [3 marks]
(c) The usual assumptions placed on the error terms in ordinary least squares
regression are:
• Independently distributed
• Identically distributed (equal variance)
• Normally distributed
Which of these assumptions are violated when dealing with binary response
data? Explain briefly how each is violated. [3 marks]
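As a minimal illustration of the issue raised in this question, the Python sketch below tabulates the possible errors and the error variance for a 0/1 response. The probabilities in `probs` and the helper name `bernoulli_error_summary` are hypothetical, chosen only for illustration.

```python
# Illustrative sketch: error behaviour for a binary (0/1) response.
# The probabilities below are hypothetical, chosen only for illustration.

def bernoulli_error_summary(p):
    """Return the two possible errors y - p and the error variance p(1 - p)."""
    errors = (0 - p, 1 - p)   # the error can take only two values
    variance = p * (1 - p)    # the variance changes with p
    return errors, variance

probs = [0.1, 0.5, 0.9]
variances = [bernoulli_error_summary(p)[1] for p in probs]

for p, v in zip(probs, variances):
    print(f"p = {p:.1f}: error variance = {v:.2f}")
```

Because the variance p(1 - p) differs between p = 0.1 and p = 0.5, and each error takes only two values, the sketch suggests which of the listed assumptions are hard to reconcile with binary data.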
(d) The results below are the estimated coefficients for a multiple logistic regression
model using the variables AGE, weight at last menstrual period (LWL) and
number of first trimester physician visits (FTV) from a given data set.
Log-likelihood=-111.286
(i) Write down a multiple logistic regression model for the above case and
interpret it. [4 marks]
(ii) State the corresponding logit expression. [1 mark]
(iii) If the log-likelihood of the model after excluding AGE and FTV is
-111.630, test whether or not it is advantageous to include these two covariates
in our model. [3 marks]
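One way to set up the comparison in part (iii) is a likelihood ratio test. The Python sketch below is an illustration under that assumption: it uses only the two log-likelihoods quoted in the question, and the variable names are illustrative.

```python
import math

# Log-likelihoods quoted in the question.
loglik_full = -111.286      # model with AGE, LWL and FTV
loglik_reduced = -111.630   # model after excluding AGE and FTV

# Likelihood ratio statistic: G = -2 * (logL_reduced - logL_full).
G = -2.0 * (loglik_reduced - loglik_full)
df = 2  # two coefficients (AGE and FTV) are set to zero under H0

# For a chi-square variable with 2 degrees of freedom, P(X > x) = exp(-x / 2).
p_value = math.exp(-G / 2.0)

print(f"G = {G:.3f} on {df} df, p-value = {p_value:.3f}")
```

A p-value well above the usual significance levels would suggest that AGE and FTV add little once LWL is in the model.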
To fit the model, values of y and x were measured for each of 30 human subjects.
Denoting the amount of immunoglobulin by IGG and the maximal oxygen
uptake by MAXOXY, the following are R program outputs:
Output 1: Scatterplot
Output 2
Call:
lm(formula = IGG ~ MAXOXY)
Residuals:
     Min      1Q  Median      3Q     Max
 -228.16  -79.96  -11.78   83.75  211.93
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -100.345    100.450  -0.999    0.326
MAXOXY        32.743      1.932  16.947 2.97e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Output 3
Analysis of Variance Table
Response: IGG
          Df  Sum Sq Mean Sq F value    Pr(>F)
MAXOXY     1 4472047 4472047  287.21 2.973e-16 ***
Residuals 28  435982   15571
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Output 4
Analysis of Variance Table
Response: IGG
          Df  Sum Sq Mean Sq F value   Pr(>F)
MAXOXY     1 4472047 4472047 394.827  2.2e-16 ***
MAXOXY2    1  130164  130164  11.492 0.002165 **
Residuals 27  305818   11327
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Output 5
Call:
lm(formula = IGG ~ MAXOXY + MAXOXY2)
Residuals:
     Min       1Q   Median       3Q      Max
-185.375  -82.129    1.047   66.007  227.377
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1464.4042   411.4012  -3.560  0.00140 **
MAXOXY         88.3071    16.4735   5.361 1.16e-05 ***
MAXOXY2        -0.5362     0.1582  -3.390  0.00217 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(iii) Interpret the estimates. [3 marks]
(iv) Is the overall model useful for predicting IgG (y)? Use α = 0.01. [2 marks]
(v) Is there sufficient evidence of concave downward curvature in the immunity
fitness level? Use α = 0.01. [2 marks]
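The overall F statistic needed in part (iv) is not printed in the outputs, but it can be assembled from the sums of squares in Output 4. The Python sketch below shows one way to do so, and also checks that the sequential F value for MAXOXY2 agrees with the square of its t value in Output 5. Variable names are illustrative, and the critical values are left to be read from F and t tables.

```python
# Sums of squares and degrees of freedom copied from Output 4.
ss_maxoxy = 4472047.0
ss_maxoxy2 = 130164.0
ss_resid = 305818.0
df_model, df_resid = 2, 27

# Regression sum of squares and mean squares for the quadratic model.
ss_reg = ss_maxoxy + ss_maxoxy2
ms_reg = ss_reg / df_model
ms_resid = ss_resid / df_resid

# Overall F statistic for H0: both slope coefficients are zero.
f_overall = ms_reg / ms_resid

# Proportion of the total variation in IGG explained by the model.
r_squared = ss_reg / (ss_reg + ss_resid)

# Consistency check: F for the last term entered equals t squared (Output 5).
t_maxoxy2 = -3.390
f_maxoxy2 = ss_maxoxy2 / ms_resid

print(f"Overall F = {f_overall:.2f} on ({df_model}, {df_resid}) df")
print(f"R-squared = {r_squared:.4f}")
print(f"F for MAXOXY2 = {f_maxoxy2:.3f}, t squared = {t_maxoxy2 ** 2:.3f}")
```

The sign of the MAXOXY2 estimate is what matters for the direction of curvature in part (v); the t value gives the strength of the evidence.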
(a) Obtain analysis of variance table that decomposes the regression sum of squares
into extra sum of squares associated with X2; with X1 given X2; and with X3 given
X1 and X2. [7 marks]
(b) Test whether X3 can be dropped from the regression model given that X1 and X2
are retained. Use α = 0.05. [3 marks]
(c) Compute the coefficient of partial determination $r^2_{3.12}$. Comment on the
Dose Level        1  2  4   8  16  32
log2(Dose Level)  0  1  2   3   4   5
Dead or down      1  4  9  13  18  20
(a) In general, describe the relationship between the dose level and the proportion of
male moths dead or down. [1 mark]
(b) What is the observed proportion of dead or down at dose level 16? [1 mark]
(c) What are the observed odds of dead or down at dose level 16? [1 mark]
A logistic regression of the proportion dead or down on log2(Dose Level) was run.
Below is the summary from the R program.
Output 6
Coefficients:
                Value Std. Error   z value
(Intercept) -2.818555  0.5479524 -5.143796
log2dose     1.258949  0.2120484  5.937086
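To show how the coefficients in Output 6 translate into fitted quantities, the Python sketch below evaluates the fitted logistic curve. It assumes the usual logit model, logit(p) = intercept + slope × log2(dose); the function and variable names (`fitted_probability`, `ld50`) are illustrative and do not come from the paper.

```python
import math

# Coefficients copied from Output 6.
intercept = -2.818555
slope = 1.258949

def fitted_probability(log2_dose):
    """Fitted probability of dead or down at a given log2(dose)."""
    eta = intercept + slope * log2_dose      # linear predictor (logit scale)
    return 1.0 / (1.0 + math.exp(-eta))      # inverse logit

# Multiplicative change in the odds each time the dose is doubled.
odds_ratio_per_doubling = math.exp(slope)

# Fitted probability at dose level 16, i.e. log2(16) = 4.
p_at_dose_16 = fitted_probability(4)

# Dose at which the fitted probability is 0.5 (the logit is zero there).
log2_ld50 = -intercept / slope
ld50 = 2.0 ** log2_ld50

print(f"Odds ratio per doubling of dose: {odds_ratio_per_doubling:.2f}")
print(f"Fitted probability at dose 16:   {p_at_dose_16:.3f}")
print(f"Dose giving 50% dead or down:    {ld50:.2f}")
```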
(h) According to the plot, if you wish to kill or knock down 50% of the males what dose
should you use? [1 mark]
(i) According to the plot, if you wish to kill or knock down 50% of the females what
dose should you use? [1 mark]
Consider N independent binary random variables Y1, Y2, ..., YN such that the probability function of Yi is
$\pi_i^{Y_i}(1-\pi_i)^{1-Y_i}$,
where Yi = 0 or 1.
(a) Show that this probability function belongs to the exponential family of
distributions. [2 marks]
(b) Show that E(Yi)= 𝜋𝑖 [3 marks]
(c) If the link function is defined as
$g(\pi) = \log\frac{\pi}{1-\pi}$, where $\pi = \frac{\exp(X^T\beta)}{1+\exp(X^T\beta)}$,
show that $\log\frac{\pi}{1-\pi} = X^T\beta$. [3 marks]
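The relationship between the logit link and the logistic form of the probability in part (c) can be checked numerically. The Python sketch below is only a numerical illustration, not a proof; the values in `etas` are hypothetical values of the linear predictor.

```python
import math

def logit(pi):
    """Logit link: log of the odds pi / (1 - pi)."""
    return math.log(pi / (1.0 - pi))

def inverse_logit(eta):
    """Logistic function: exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical values of the linear predictor X^T beta.
etas = [-2.0, 0.0, 1.5]

# Applying the logit to the logistic probability should return eta.
round_trip = [logit(inverse_logit(eta)) for eta in etas]

for eta, back in zip(etas, round_trip):
    print(f"eta = {eta:+.1f} -> pi = {inverse_logit(eta):.4f} -> logit = {back:+.4f}")
```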
Each autumn, individuals (especially older persons or the chronically ill) are
encouraged to get a flu shot. Fifty persons are selected at random from a health clinic
client list and asked if they actually went to get a flu shot. A client who got a flu shot
has a response of Y = 1; a client who did not has a response of Y = 0. Other data collected were
age (Age) and health awareness (Aware), for which higher values indicate greater
awareness.
Simple logistic regressions were run on Age and Aware separately.
Output 7
Coefficients:
               Value Std. Error
(Intercept) -6.57492    2.12560
Age          0.13302    0.04439
Null Deviance: 68.03 on 49 degrees of freedom
Residual Deviance: 56.08 on 48 degrees of freedom
Output 8
Coefficients:
               Value Std. Error
(Intercept) -7.39019    2.09332
Aware        0.13486    0.03884
Null Deviance: 68.03 on 49 degrees of freedom
(a) In the model with Age alone, is there a significant lack of fit? Is the variable Age
significant? Use tests based on deviances to support your answers. [4 marks]
(b) If you were to pick only one variable, Age or Aware, to model the binary
response of whether a client received a flu shot, which one would you choose?
Support your choice statistically. [3 marks]
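The deviance-based tests in part (a) can be sketched from the figures in Output 7. The Python code below is an illustration under the usual chi-square approximations, which can be unreliable for a lack-of-fit test on ungrouped binary data. The helper `chi2_sf` is a hypothetical name for a small upper-tail function written here with the standard library only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability, for df = 1 or any even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        half = x / 2.0
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch supports only df = 1 or even df")

# Deviances copied from Output 7 (model with Age alone).
null_dev, null_df = 68.03, 49
resid_dev, resid_df = 56.08, 48

# Significance of Age: drop in deviance from the null model, on 1 df.
drop = null_dev - resid_dev
p_age = chi2_sf(drop, null_df - resid_df)

# Lack of fit: residual deviance referred to chi-square on 48 df.
p_lack_of_fit = chi2_sf(resid_dev, resid_df)

print(f"Drop in deviance = {drop:.2f} on 1 df, p = {p_age:.5f}")
print(f"Residual deviance = {resid_dev} on {resid_df} df, p = {p_lack_of_fit:.3f}")
```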
A multiple logistic regression was run with both Age and Aware in the model.
Output 9
Coefficients:
                Value Std. Error
(Intercept) -21.58213    6.33966
Age           0.22175    0.07360
Aware         0.20348    0.06206
Null Deviance: 68.03 on 49 degrees of freedom
Residual Deviance: 32.42 on 47 degrees of freedom
(c) What is the z-test statistic for the variable Age in this multiple logistic regression
analysis? Is it statistically significant? [3 marks]
(d) Use the change in deviance to test the significance of adding the variable Aware
to the simple model using Age. [3 marks]
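Parts (c) and (d) both reduce to short calculations on the printed output. The Python sketch below computes the Wald z statistic for Age from Output 9 and the change in deviance between the Age-only model (Output 7) and the model with Age and Aware (Output 9). It assumes a two-sided normal reference for z and a chi-square reference on 1 degree of freedom for the deviance change; variable names are illustrative.

```python
import math

# Estimate and standard error for Age, copied from Output 9.
est_age, se_age = 0.22175, 0.07360

# Wald z statistic and its two-sided normal p-value.
z_age = est_age / se_age
p_z_age = math.erfc(abs(z_age) / math.sqrt(2.0))

# Residual deviances: Age only (Output 7) and Age + Aware (Output 9).
dev_age_only = 56.08
dev_age_aware = 32.42

# Change in deviance from adding Aware, on 48 - 47 = 1 df.
change = dev_age_only - dev_age_aware

# Upper-tail chi-square probability on 1 df: erfc(sqrt(x / 2)).
p_change = math.erfc(math.sqrt(change / 2.0))

print(f"z for Age = {z_age:.3f}, two-sided p = {p_z_age:.4f}")
print(f"Change in deviance = {change:.2f} on 1 df, p = {p_change:.2e}")
```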