FACULTY OF SCIENCE
PAPER III – INDUSTRIAL STATISTICS AND LINEAR & NONLINEAR MODELS – PART B
• If a page or a part of this question paper is not printed, please inform the supervisor
immediately.
• Write your index number on all pages of the answer script.
• Part B : Question (1), (2) and (3): Answer ONLY Two (02) questions of this part.
• Write the answers to the questions on writing papers.
• All questions carry equal marks.
• Statistical tables will be available in the LMS.
• Non-programmable calculators are allowed.
• Electronic devices capable of storing and retrieving text, including electronic dictionaries and
mobile phones, are not allowed.
Part B
The Life Expectancy data set contains information used to track factors that are associated with the
life expectancy of countries in the world. Suppose we want to check how the following variables
are associated with the Life Expectancy of a country:
Assuming that the above fitted model satisfies all the necessary distributional assumptions,
answer the following questions.
a) Is at least one of the above-mentioned variables associated with Life Expectancy? Give
reasons for your answer. [20 Marks]
b) Explain how the predictor variable AdultMortality is associated with Life Expectancy
after adjusting for other variables. [10 Marks]
c) Find a 95% confidence interval for the actual coefficient of the predictor variable
HepatitisB. Interpret this confidence interval. [25 Marks]
d) One researcher says that, on average, Life Expectancy of developing countries is nearly
21 units less than that of developed countries after adjusting for other variables considered
here. Give your thoughts on this. [25 Marks]
e) Can we remove the variable Schooling from the above fitted model? [10 Marks]
f) Does this model fit data well? Give reasons for your answer. [10 Marks]
i. In which situation can we apply simple linear regression technique to model the
relationship between two variables? [5 marks]
ii. Suppose the correlation coefficient between the two variables 𝑋 and 𝑌 is 0.8.
What would be the sign of the coefficient 𝛽1? Give reasons for your answer. [5 marks]
iii. One student says that the 𝑌𝑖 s of the above model are independent and identically
distributed. Is this statement correct? Give reasons for your answer. [5 marks]
Note: Here you may consider that 𝑋𝑖 s are not random.
iv. The least squares estimate of $\beta_1$ can be written as
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})\,Y_i}{\sum_{i=1}^{n}(X_i-\bar{X})^2}.$$
One student says that $\hat{\beta}_1$ is a random variable. Is the student’s statement correct? Give
reasons for your answer. [5 marks]
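The behaviour of the slope estimate under repeated sampling can be explored with a small simulation. The sketch below assumes the usual simple linear regression model with fixed 𝑋𝑖 values and, purely for illustration, normally distributed errors. The values `BETA0`, `BETA1`, `SIGMA` and the design points `XS` are hypothetical and do not come from this paper.

```python
import random


def slope_estimate(xs, ys):
    """Least squares slope: sum((x - xbar) * y) / sum((x - xbar)^2)."""
    x_bar = sum(xs) / len(xs)
    numerator = sum((x - x_bar) * y for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical values chosen only for illustration (not from the paper).
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
XS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # fixed, non-random design


def simulate_slopes(n_samples, seed=0):
    """Redraw the error terms n_samples times while XS stays fixed."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in XS]
        slopes.append(slope_estimate(XS, ys))
    return slopes


slopes = simulate_slopes(2000)
mean_slope = sum(slopes) / len(slopes)
print(f"min={min(slopes):.3f}  max={max(slopes):.3f}  mean={mean_slope:.3f}")
```

Each simulated sample gives a different slope even though the 𝑋𝑖 values never change, because the estimate is a linear combination of the random 𝑌𝑖 values.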
b) Suppose that the error terms ($\varepsilon_i$) are independent and identically distributed as normal
with mean 0 and variance $\sigma^2$.
i. What is the distribution of 𝑌𝑖 when it is given that 𝑋𝑖 = 5 ? [5 marks]
ii. The least squares estimates of $\beta_1$ and $\beta_0$ can be written as
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})\,Y_i}{\sum_{i=1}^{n}(X_i-\bar{X})^2},$$
c) A distribution belongs to the Exponential family if its density can be expressed in the form:
𝑓(𝑦; 𝜃) = 𝑒𝑥𝑝[𝑎(𝑦)𝑏(𝜃) + 𝑐(𝜃) + 𝑑(𝑦)]
For the following density functions, show whether they belong to the Exponential family of
distributions (write down the functions for 𝑎(𝑦), 𝑏(𝜃), 𝑐(𝜃), 𝑑(𝑦)) and justify whether they are
in the canonical form.
i. $f(y|\theta) = \dfrac{e^{\theta y}}{1+e^{\theta}}$ [10 marks]
ii. $f(y|\theta, k) = \dfrac{\theta k^{\theta}}{y^{\theta+1}}$ [10 marks]
d) Find E[a(Y)] and Var[a(Y)] for the distributions given in (c). Note that for the Exponential family
of distributions: [30 marks]
$$E[a(Y)] = -\frac{c'(\theta)}{b'(\theta)}$$
$$\mathrm{Var}[a(Y)] = \frac{-b''(\theta)\,E[a(Y)] - c''(\theta)}{[b'(\theta)]^{2}}$$
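As an illustration of how these two formulas are applied, the following worked example uses the Poisson distribution, which is not one of the densities in (c) and is chosen here only as a familiar case. It follows the same notation for a(y), b(θ), c(θ) and d(y).

```latex
% Poisson density written in exponential family form
\begin{align*}
f(y;\theta) &= \frac{\theta^{y} e^{-\theta}}{y!}
             = \exp\left[\, y \log\theta - \theta - \log y! \,\right],
             \qquad y = 0, 1, 2, \dots \\
a(y) &= y, \quad b(\theta) = \log\theta, \quad c(\theta) = -\theta, \quad d(y) = -\log y! \\
b'(\theta) &= \frac{1}{\theta}, \quad b''(\theta) = -\frac{1}{\theta^{2}}, \quad
c'(\theta) = -1, \quad c''(\theta) = 0 \\
E[a(Y)] &= -\frac{c'(\theta)}{b'(\theta)} = -\frac{-1}{1/\theta} = \theta \\
\mathrm{Var}[a(Y)] &= \frac{-b''(\theta)\,E[a(Y)] - c''(\theta)}{[b'(\theta)]^{2}}
                    = \frac{(1/\theta^{2})\,\theta - 0}{1/\theta^{2}} = \theta
\end{align*}
```

Since a(y) = y, this density is in canonical form, and the results agree with the known Poisson mean and variance.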
Defaulting on credit card payments is a serious issue faced by credit card companies and banks. A
bank is interested in identifying how customer characteristics relate to defaulting on payments.
The following data were collected from the current list of credit card holders.
Credit: credit amount given (in dollars)
Gender: (1 = male; 2 = female).
Education: (1 = graduate school; 2 = university; 3 = high school).
Marital status: (1 = married; 2 = single).
Age: (in years)
Default: (1 = yes; 0 = no).
A binary logistic model was fitted to this data using all the above predictors. The results of the
model fitted are given below.
                                            Hypothesis Test
Parameter        B           Std. Error    Wald Chi-Square   df   Sig.
(Intercept)      -1.035      .0728         201.974           1    .000
[Gender=1]       .169        .0292         33.556            1    .000
[Gender=2]       0^a         .             .                 .    .
[Education=1]    -.013       .0450         .083              1    .774
[Education=2]    .022        .0405         .304              1    .581
[Education=3]    0^a         .             .                 .    .
[Marital=1]      .207        .0328         39.952            1    .000
[Marital=2]      0^a         .             .                 .    .
Credit           -3.352E-6   1.3457E-7     620.367           1    .000
Age              .004        .0018         4.161             1    .041
(Scale)          1^b
a) Using the results given in Table 5, write down which variables have a significant association
with defaulting of payments. Give reason/s for your answer. [10 marks]
b) Out of the combinations of gender and marital status, which combination (group) has the
highest risk of defaulting? Give reason/s for your answer. [10 marks]
c) Briefly explain how you can obtain an improved model for this data. [5 marks]
d) After following a model selection procedure, an optimal model with only the variables credit,
gender, education, and marital status was obtained. Table 6 and Table 7 below give the model fit
statistics of the first model (with all the predictor variables) and the second model (with only credit,
gender, education and marital status) respectively.
Perform a suitable test and justify that the second model is better than the first model. [15 marks]
Hint: $\chi^2_{7-5,\,5\%} = 5.99$
Table 6: Goodness of Fit of Full Model
                                        Value        df     Value/df
Deviance                                10845.440    9756   1.112
Scaled Deviance                         10845.440    9756
Pearson Chi-Square                      10594.609    9756   1.086
Scaled Pearson Chi-Square               10594.609    9756
Log Likelihood^b                        -8135.594
Akaike's Information Criterion (AIC)    16285.188
Finite Sample Corrected AIC (AICC)      16285.192
Bayesian Information Criterion (BIC)    16343.152
Consistent AIC (CAIC)                   16350.152

                                        Value        df     Value/df
Deviance                                10846.622    9758   1.112
Scaled Deviance                         10846.622    9758
Pearson Chi-Square                      10599.341    9758   1.086
Scaled Pearson Chi-Square               10599.341    9758
Log Likelihood^b                        -8136.185
Akaike's Information Criterion (AIC)    16282.370
Finite Sample Corrected AIC (AICC)      16282.372
Bayesian Information Criterion (BIC)    16323.773
Consistent AIC (CAIC)                   16328.773
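
One way to compare two nested models of this kind is a deviance difference (likelihood ratio) test. The sketch below is a minimal illustration, not a model answer. It copies the deviance, df and log likelihood values from the two goodness-of-fit tables above and takes the 5% critical value from the hint. The dictionary names `full` and `reduced` are labels introduced here for illustration.

```python
import math

# Values copied from the two goodness-of-fit tables above.
full = {"deviance": 10845.440, "df": 9756, "log_lik": -8135.594}
reduced = {"deviance": 10846.622, "df": 9758, "log_lik": -8136.185}

CRITICAL_5PCT = 5.99  # chi-square critical value quoted in the hint

# Test statistic: increase in deviance when moving to the smaller model.
deviance_diff = reduced["deviance"] - full["deviance"]
df_diff = reduced["df"] - full["df"]

# The same statistic computed from the log likelihoods, as a cross-check.
lr_stat = 2 * (full["log_lik"] - reduced["log_lik"])

# For exactly 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-deviance_diff / 2) if df_diff == 2 else None

reject_smaller_model = deviance_diff > CRITICAL_5PCT

print(f"deviance difference = {deviance_diff:.3f} on {df_diff} df")
print(f"LR statistic from log likelihoods = {lr_stat:.3f}")
print(f"reject the smaller model at 5%? {reject_smaller_model}")
```

With these figures the statistic falls well below 5.99, so the test gives no evidence that the dropped terms are needed. The lower AIC and BIC of the second model point the same way.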
e) The parameter estimates of the second model are given below in Table 8.
i. Write down the model formula using the parameter estimates given in the Table 8. [10
marks]
ii. Out of the two predictors ‘Age’ and ‘Credit’, which one is more significantly associated
with the default of payments? Give reason/s. [10 marks]
iii. Interpret the parameter estimates for Age and Gender with respect to the odds of defaulting
payments. [20 marks]
iv. Using the second model formula, predict/calculate the odds of defaulting payments for a
married, female client who is 40 years old and has a credit amount of $ 20000. [20 marks]
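
The mechanics of turning logistic regression coefficients into odds can be sketched as follows. The Table 8 estimates are not reproduced in this text, so the sketch substitutes the coefficients of the first model (the parameter table given earlier) purely as stand-ins. The question does not state the client's education level, so the reference level (3 = high school) is assumed here. The names `COEF`, `log_odds` and `odds` are hypothetical helpers.

```python
import math

# ASSUMPTION: first-model coefficients used as stand-ins, because the
# Table 8 estimates of the second model are not available in this text.
COEF = {
    "intercept": -1.035,
    "gender": {1: 0.169, 2: 0.0},        # 1 = male, 2 = female (reference)
    "education": {1: -0.013, 2: 0.022, 3: 0.0},
    "marital": {1: 0.207, 2: 0.0},       # 1 = married, 2 = single (reference)
    "credit": -3.352e-6,                 # per dollar of credit
    "age": 0.004,                        # per year of age
}


def log_odds(gender, education, marital, credit, age):
    """Linear predictor: log of the odds of default."""
    return (
        COEF["intercept"]
        + COEF["gender"][gender]
        + COEF["education"][education]
        + COEF["marital"][marital]
        + COEF["credit"] * credit
        + COEF["age"] * age
    )


def odds(**client):
    """Odds of default = exp(linear predictor)."""
    return math.exp(log_odds(**client))


# Married, female, 40 years old, credit of 20000 dollars.
# ASSUMPTION: education = 3 (high school), since the question leaves it unstated.
client = dict(gender=2, education=3, marital=1, credit=20000, age=40)
eta = log_odds(**client)
client_odds = odds(**client)

# Odds ratio for males versus females, holding the other predictors fixed.
gender_odds_ratio = math.exp(COEF["gender"][1])

print(f"log odds = {eta:.5f}, odds = {client_odds:.4f}")
print(f"odds ratio (male vs female) = {gender_odds_ratio:.3f}")
```

Exponentiating a single coefficient gives the multiplicative change in the odds for a one-unit increase in that predictor, or for a category relative to its reference level. The same steps apply once the Table 8 values are substituted.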