
UNIVERSITY OF COLOMBO, SRI LANKA

FACULTY OF SCIENCE

MASTER OF SCIENCE IN APPLIED STATISTICS – SEMESTER II – 2020 INTAKE

PAPER III – INDUSTRIAL STATISTICS AND LINEAR & NONLINEAR MODELS – PART B

ONE HOUR AND THIRTY MINUTES

Answer ONLY Two (02) questions


No. of pages: 07, No. of questions: 03

Important Instructions to the Candidates:

• If a page or a part of this question paper is not printed, please inform the supervisor
immediately.
• Write your index number on all pages of the answer script.
• Part B, Questions (1), (2) and (3): Answer ONLY Two (02) questions from this part.
• Write the answers to the questions on writing papers.
• All questions carry equal marks.
• Statistical tables will be available in the LMS.
• Non-programmable calculators are allowed.
• Electronic devices capable of storing and retrieving text, including electronic dictionaries and
mobile phones, are not allowed.
Part B

Question 01 (100 Marks)

The life expectancy data set contains information used to track factors that are associated with
the life expectancy of countries in the world. Suppose we want to check how the following
variables are associated with the Life Expectancy of a country:

Status: status of the country (Developing or Developed)
AdultMortality: adult mortality rate of the country
Schooling: number of years of schooling
HepatitisB: Hepatitis B immunization coverage among 1-year-olds (%)


The estimated coefficients of the multiple regression model are given below:

                             Estimate   Std. Error   t value   P value
(Intercept)                 74.238500     7.089331    10.472   < 2e-16
AdultMortality              -0.034776     0.003131   -11.109   < 2e-16
HepatitisB                   0.047357     0.011222     4.220   4.12e-05
StatusDeveloping           -20.831125     7.238383    -2.878   0.00456
Schooling                    0.267322     0.430463     0.621   0.53549
StatusDeveloping:Schooling   1.236431     0.446663     2.768   0.00632

Residual standard error: 3.253 on 157 degrees of freedom


Multiple R-squared: 0.8243, Adjusted R-squared: 0.8187
F-statistic: 147.3 on 5 and 157 DF, p-value: < 2.2e-16

Assuming that the above fitted model satisfies all the necessary distributional assumptions,
answer the following questions.

a) Is at least one of the above-mentioned variables associated with Life Expectancy? Give
reasons for your answer. [20 Marks]

b) Explain how the predictor variable AdultMortality is associated with Life Expectancy
after adjusting for other variables. [10 Marks]

c) Find a 95% confidence interval for the actual coefficient of the predictor variable
HepatitisB. Interpret this confidence interval. [25 Marks]
d) One researcher says that, on average, Life Expectancy of developing countries is nearly
21 units less than that of developed countries after adjusting for other variables considered
here. Give your thoughts on this. [25 Marks]

e) Can we remove the variable Schooling from the above fitted model? [10 Marks]

f) Does this model fit the data well? Give reasons for your answer. [10 Marks]
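
As an illustration of the interval construction referred to in part (c), the following is a minimal sketch applied to the AdultMortality row of the output above (not the row asked about). It assumes the standard normal quantile is an acceptable stand-in for the t quantile on 157 degrees of freedom; the function name `approx_confidence_interval` is hypothetical, and a candidate would use the t value from the statistical tables instead.

```python
from statistics import NormalDist


def approx_confidence_interval(estimate, std_error, level=0.95):
    """Return (lower, upper) for estimate +/- z * std_error.

    Assumption: the normal quantile approximates the t quantile, which is
    reasonable only because the residual degrees of freedom (157) are large.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    margin = z * std_error
    return estimate - margin, estimate + margin


# Estimate and Std. Error taken from the AdultMortality row of the output.
lower, upper = approx_confidence_interval(-0.034776, 0.003131)
print(f"approximate 95% CI: ({lower:.6f}, {upper:.6f})")
```

Using the exact t quantile would widen the interval very slightly; the interpretation of the interval is unchanged.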

Question 02 (100 Marks)

a) Consider the simple linear regression model given below:


𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝜀𝑖 , where i=1,2,…,n and error terms ( 𝜀𝑖 ) are independent.

i. In which situations can we apply the simple linear regression technique to model the
relationship between two variables? [5 marks]
ii. Suppose the correlation coefficient between the two variables 𝑋 and 𝑌 is 0.8.
What would be the sign of the coefficient 𝛽1? Give reasons for your answer. [5 marks]
iii. One student says that the 𝑌𝑖s of the above model are independent and identically
distributed. Is this statement correct? Give reasons for your answer. [5 marks]
Note: Here you may consider that the 𝑋𝑖s are not random.
iv. The least squares estimate of $\beta_1$ can be written as
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})Y_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2}.$$
One student says that $\hat{\beta}_1$ is a random variable. Is the student's statement correct? Give
reasons for your answer. [5 marks]

b) Suppose that the error terms (𝜀𝑖) are independent and identically distributed as normal
with mean 0 and variance 𝜎².
i. What is the distribution of 𝑌𝑖 when it is given that 𝑋𝑖 = 5 ? [5 marks]

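ii. The least squares estimates of $\beta_1$ and $\beta_0$ can be written as
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})Y_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}$$
respectively. Find the variance of $\hat{\beta}_1$. [10 marks]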


iii. Show that 𝐶𝑜𝑣 (𝑌̅, 𝛽̂1 ) = 0. [10 marks]
iv. Show that $Cov(\hat{\beta}_0, \hat{\beta}_1) = -\dfrac{\bar{X}\sigma^2}{S_{XX}}$. [5 marks]

c) A distribution belongs to the Exponential family if its density can be expressed in the form:
𝑓(𝑦; 𝜃) = 𝑒𝑥𝑝[𝑎(𝑦)𝑏(𝜃) + 𝑐(𝜃) + 𝑑(𝑦)]
For the following density functions, show whether they belong to the Exponential family of
distributions (write down the functions for 𝑎(𝑦), 𝑏(𝜃), 𝑐(𝜃), 𝑑(𝑦)) and justify whether they are
in the canonical form.
i. $f(y \mid \theta) = \dfrac{e^{\theta y}}{1 + e^{\theta}}$ [10 marks]

ii. $f(y \mid \theta, k) = \dfrac{\theta k^{\theta}}{y^{\theta + 1}}$ [10 marks]

d) Find $E[a(Y)]$ and $Var[a(Y)]$ for the distributions given in (c). [30 marks]
Note that for the Exponential family of distributions:
$$E[a(Y)] = -\frac{c'(\theta)}{b'(\theta)}$$
$$Var[a(Y)] = \frac{-b''(\theta)\,E[a(Y)] - c''(\theta)}{[b'(\theta)]^2}$$
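
To show how the two identities above are applied, here is a short worked check on a distribution that is not part of this question: the Poisson distribution with mean $\theta$. The choice of example is an assumption made purely for illustration.

```latex
% Poisson density written in exponential-family form
f(y;\theta) = \frac{\theta^{y} e^{-\theta}}{y!}
            = \exp\left[\, y \log\theta - \theta - \log y! \,\right]

% Identify the four functions
a(y) = y, \quad b(\theta) = \log\theta, \quad c(\theta) = -\theta, \quad d(y) = -\log y!

% Derivatives needed by the identities
b'(\theta) = \frac{1}{\theta}, \quad b''(\theta) = -\frac{1}{\theta^{2}}, \quad
c'(\theta) = -1, \quad c''(\theta) = 0

% Mean
E[a(Y)] = -\frac{c'(\theta)}{b'(\theta)} = -\frac{-1}{1/\theta} = \theta

% Variance
Var[a(Y)] = \frac{-b''(\theta)\,E[a(Y)] - c''(\theta)}{[b'(\theta)]^{2}}
          = \frac{(1/\theta^{2})\,\theta - 0}{1/\theta^{2}} = \theta
```

Since $a(y) = y$, this form is canonical, and both results agree with the known Poisson mean and variance.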

Question 03 (100 Marks)

Defaulting on credit card payments is a serious issue faced by credit card companies and banks.
A bank is interested in identifying the relationship between customer characteristics and
defaulting on payments. The following data were collected from its current list of credit card holders.
Credit: credit amount given (in dollars)
Gender: (1 = male; 2 = female).
Education: (1 = graduate school; 2 = university; 3 = high school).

Marital status: (1 = married; 2 = single).
Age: (years)
Default: (1 = Yes; 0 = No)
A binary logistic model was fitted to these data using all the above predictors. The results of the
fitted model are given below.

Table 5: Parameter Estimates of Full Model

                                          Hypothesis Test
Parameter        B           Std. Error   Wald Chi-Square   df   Sig.
(Intercept)      -1.035      .0728        201.974           1    .000
[Gender=1]       .169        .0292        33.556            1    .000
[Gender=2]       0^a         .            .                 .    .
[Education=1]    -.013       .0450        .083              1    .774
[Education=2]    .022        .0405        .304              1    .581
[Education=3]    0^a         .            .                 .    .
[Marital=1]      .207        .0328        39.952            1    .000
[Marital=2]      0^a         .            .                 .    .
Credit           -3.352E-6   1.3457E-7    620.367           1    .000
Age              .004        .0018        4.161             1    .041
(Scale)          1^b

a) Using the results given in Table 5, write down which variables have a significant association
with defaulting of payments. Give reason/s for your answer. [10 marks]

b) Out of the combinations of gender and marital status, which combination (group) has the
highest risk of defaulting? Give reason/s for your answer. [10 marks]

c) Briefly explain how you can obtain an improved model for this data. [5 marks]

d) After following a model selection procedure, an optimal model with only the variables credit,
gender, education, and marital status was obtained. Table 6 and Table 7 below give the model fit
statistics of the first model (with all the predictor variables) and the second model (with only credit,
gender, education and marital status) respectively.

Perform a suitable test to justify that the second model is better than the first model. [15 marks]
Hint: $\chi^2_{7-5,\,5\%} = 5.99$
Table 6: Goodness of Fit of Full Model

                                       Value        df     Value/df
Deviance                               10845.440    9756   1.112
Scaled Deviance                        10845.440    9756
Pearson Chi-Square                     10594.609    9756   1.086
Scaled Pearson Chi-Square              10594.609    9756
Log Likelihood^b                       -8135.594
Akaike's Information Criterion (AIC)   16285.188
Finite Sample Corrected AIC (AICC)     16285.192
Bayesian Information Criterion (BIC)   16343.152
Consistent AIC (CAIC)                  16350.152

Table 7: Goodness of Fit of Final Model

                                       Value        df     Value/df
Deviance                               10846.622    9758   1.112
Scaled Deviance                        10846.622    9758
Pearson Chi-Square                     10599.341    9758   1.086
Scaled Pearson Chi-Square              10599.341    9758
Log Likelihood^b                       -8136.185
Akaike's Information Criterion (AIC)   16282.370
Finite Sample Corrected AIC (AICC)     16282.372
Bayesian Information Criterion (BIC)   16323.773
Consistent AIC (CAIC)                  16328.773
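
The comparison of two nested models through their deviances can be sketched as follows. This is only an illustration under stated assumptions: the function name `deviance_difference_test` is hypothetical, the critical value 5.99 is the one supplied in the hint, and the closed-form p-value used here is valid only when the difference in degrees of freedom is exactly 2.

```python
import math


def deviance_difference_test(dev_reduced, df_reduced, dev_full, df_full, critical_value):
    """Compare nested models: large deviance increase favours the full model."""
    statistic = dev_reduced - dev_full   # increase in deviance from dropping terms
    df_diff = df_reduced - df_full       # number of parameters dropped
    reject_reduced = statistic > critical_value
    return statistic, df_diff, reject_reduced


# Deviance and df values taken from Table 6 (full) and Table 7 (final).
stat, df_diff, reject = deviance_difference_test(
    10846.622, 9758, 10845.440, 9756, critical_value=5.99
)

# For a chi-square variable with exactly 2 df, P(X > x) = exp(-x / 2).
p_value = math.exp(-stat / 2.0) if df_diff == 2 else None

print(f"statistic = {stat:.3f}, df = {df_diff}, reject reduced model: {reject}")
```

The same statistic can be obtained as twice the difference of the two log likelihoods reported in the tables.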

e) The parameter estimates of the second model are given below in Table 8.
i. Write down the model formula using the parameter estimates given in Table 8. [10 marks]

ii. Out of the two predictors ‘Age’ and ‘Credit’, which one is more significantly associated
with the default of payments? Give reason/s. [10 marks]

iii. Interpret the parameter estimates for Age and Gender with respect to the odds of defaulting
payments. [20 marks]

iv. Using the second model formula, predict/calculate the odds of defaulting on payments for a
married, female client who is 40 years old and has a credit amount of $20,000. [20 marks]

Table 8: Parameter estimates of the second model

                                       Hypothesis Test
Parameter      B           Std. Error   Wald Chi-Square   df   Sig.   Exp(B)
(Intercept)    -1.022      .0585        305.775           1    .000   .360
[Gender=1]     .169        .0292        33.396            1    .000   1.184
[Gender=2]     0^a         .            .                 .    .      1
[Marital=1]    .212        .0325        42.778            1    .000   1.237
[Marital=2]    0^a         .            .                 .    .      1
Credit         -3.381E-6   1.2845E-7    692.618           1    .000   1.000
Age            .004        .0017        4.151             1    .042   1.004
(Scale)        1^b
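
The way odds are obtained from a fitted logistic model can be sketched as below, using the B column of Table 8. The client profile in the example (single, female, 30 years old, credit of $50,000) is hypothetical and chosen only for illustration; it is not the profile asked about in part (iv). The names `COEF`, `log_odds` and `odds` are likewise illustrative.

```python
import math

# Estimates (column B) from Table 8. Reference levels (female, single)
# have coefficient 0 and therefore do not appear.
COEF = {
    "intercept": -1.022,
    "male": 0.169,       # [Gender=1]
    "married": 0.212,    # [Marital=1]
    "credit": -3.381e-6,
    "age": 0.004,
}


def log_odds(male, married, credit, age):
    """Linear predictor: the log of the odds of defaulting."""
    return (
        COEF["intercept"]
        + COEF["male"] * int(male)
        + COEF["married"] * int(married)
        + COEF["credit"] * credit
        + COEF["age"] * age
    )


def odds(male, married, credit, age):
    """Odds of defaulting: exponential of the linear predictor."""
    return math.exp(log_odds(male, married, credit, age))


# Hypothetical client: single female, aged 30, credit amount 50,000.
example_log_odds = log_odds(male=False, married=False, credit=50_000, age=30)
example_odds = odds(male=False, married=False, credit=50_000, age=30)
example_probability = example_odds / (1.0 + example_odds)
print(f"log-odds = {example_log_odds:.5f}, odds = {example_odds:.4f}")
```

Exponentiating a single coefficient gives the multiplicative change in the odds for a one-unit increase in that predictor, which is what the Exp(B) column reports.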

***************** End of Question Paper ***************
