
UNIVERSITY OF ZIMBABWE

HSTS 412
B.Sc. Honours in Statistics

Dynamic Regression: Generalized Linear Models Tutorial

April 2023
Time : 2 hours

A1. Define the following terms:

(a) linear mixed model, [2]
(b) two-stage analysis, and [2]
(c) compound symmetric matrix. [2]

A2. Distinguish between the following:

(a) general linear model and generalized linear model, [2]
(b) long and wide longitudinal data, [2]
(c) balanced and unbalanced longitudinal data, [2]
(d) retrospective and prospective study, and [2]
(e) random intercept and random slope model. [2]

A3. (a) Describe the features of longitudinal data that make them require special
analysis. [4]
(b) Explain the term link function, with the aid of examples for generalized linear
models. [4]

A4. (a) Give two advantages and two disadvantages of nonlinear models. [4]
(b) Why is it not advisable to fit a linearized model for nonlinear regression? [2]
(c) Linearize the following functions:
(i) $y = \dfrac{\theta_1 x}{\theta_2 + x}$ [3]
(ii) $y = \theta_1 x^{\theta_2}$ [3]


A5. In a study to investigate weight loss programs, 100 male subjects were randomly as-
signed to maintain their current eating and exercise habits (Control), to follow a rig-
orous program of modified diet and exercise (Diet+Exercise), or to adopt a modified
diet (Diet Alone). All participants were weighed at baseline (month 0) and then again
at months 3, 6, 9, and 12.

(a) Using symbols for fixed effects, random effects, and residual error terms, write
down two expressions (i.e., one for each program) describing the trend of weight
loss in the two programs. Please use clear notation, making sure that all symbols
used are defined. [8]
(b) An analyst considered a model with different linear time trends for the three
groups as well as a random intercept to account for correlation of the measure-
ments. The following is part of the output from R.

Parameter                        Estimate   P-value

Intercept                         240.844     0.000
Program: Diet+Exercise              3.354     0.532
Program: Diet Alone                -4.061     0.413
Month                               0.549     0.015
Program: Diet+Exercise: Month      -7.987     0.000
Program: Diet Alone: Month         -3.023     0.000

(i) Write an explicit expression for the estimated population average trend for
each of the three groups. Please be clear in your notation. [9]
(ii) Give a numerical estimate of the difference of population mean weight after
8 weeks between the “Diet Alone” and the “Control” programs. [5]
(c) Suppose we have the following statistical model

$$Y_{ij} = \beta_0 + b_i + \epsilon_{ij},$$

where $Y_{ij}$ is a measurement for subject $i$ at time $j$, $i = 1, \dots, m$ and $j = 1, \dots, n$,
$\beta_0$ is an intercept, $b_i$ are random effects such that $b_i \sim N(0, \sigma_b^2)$, $\epsilon_{ij}$ are error
components such that $\epsilon_{ij} \sim N(0, \sigma^2)$, $b_i$ and $\epsilon_{ij}$ are statistically independent for
each $i$ and $j$, and $\epsilon_{ij}$ and $\epsilon_{ik}$ are statistically independent for any two distinct values
$j, k = 1, \dots, n$. The standard deviations of the random intercept and of the residuals are
18.46174 and 12.41623, respectively.
(i) Find the variance of $Y_{ij}$. [4]
(ii) Find the correlation between any two values $Y_{ij}$ and $Y_{ik}$, $j \neq k$. [4]

END OF TUTORIAL QUESTIONS.

A5 (a) For the Diet+Exercise program:

$$\text{Weight}_i(t) = \beta_0 + \beta_1\,\text{DietExercise} + \beta_2 t + u_i + \epsilon_i(t)$$

where:
∗ $\text{Weight}_i(t)$ is the weight of subject $i$ at time $t$
∗ DietExercise is a binary variable indicating whether the subject
is in the Diet+Exercise program (1) or not (0)
∗ $t$ is the time in months ($t = 0, 3, 6, 9, 12$)
∗ $\beta_0$ is the intercept term
∗ $\beta_1$ is the fixed effect for the Diet+Exercise program
∗ $\beta_2$ is the fixed effect for time
∗ $u_i$ is the random effect for subject $i$
∗ $\epsilon_i(t)$ is the residual error term for subject $i$ at time $t$
For the Diet Alone program:

$$\text{Weight}_i(t) = \gamma_0 + \gamma_1\,\text{DietAlone} + \gamma_2 t + v_i + \eta_i(t)$$

where:
∗ $\text{Weight}_i(t)$ is the weight of subject $i$ at time $t$
∗ DietAlone is a binary variable indicating whether the subject is
in the Diet Alone program (1) or not (0)
∗ $t$ is the time in months ($t = 0, 3, 6, 9, 12$)
∗ $\gamma_0$ is the intercept term
∗ $\gamma_1$ is the fixed effect for the Diet Alone program
∗ $\gamma_2$ is the fixed effect for time
∗ $v_i$ is the random effect for subject $i$
∗ $\eta_i(t)$ is the residual error term for subject $i$ at time $t$
In both models, the random effects terms, $u_i$ and $v_i$, capture the
individual differences in weight loss that are not accounted for by the
fixed effects. The residual error terms, $\epsilon_i(t)$ and $\eta_i(t)$, capture the
unexplained variation in weight loss at each time point for each subject.

(b) The model considered by the analyst can be written as a linear mixed
effects model with fixed effects for program and time, a random
intercept to account for correlation of the measurements, and
interaction terms between program and time to allow for different time
trends in the three groups. The model can be written as:

$$\text{Weight}_{ij} = \beta_0 + \beta_1\,\text{DietExercise}_i + \beta_2\,\text{DietAlone}_i + \beta_3 t_{ij} + \beta_4 (\text{DietExercise}_i \times t_{ij}) + \beta_5 (\text{DietAlone}_i \times t_{ij}) + b_{0i} + \epsilon_{ij}$$

where:

∗ $\text{Weight}_{ij}$ is the weight of subject $i$ at measurement occasion $j$
∗ $\text{DietExercise}_i$ is a binary variable indicating whether subject
$i$ is in the Diet+Exercise program (1) or not (0)
∗ $\text{DietAlone}_i$ is a binary variable indicating whether subject $i$ is
in the Diet Alone program (1) or not (0)
∗ $t_{ij}$ is the time in months of measurement occasion $j$ for subject $i$
($t_{ij} = 0, 3, 6, 9, 12$)
∗ $\beta_0$ is the overall intercept term
∗ $\beta_1$ is the fixed effect for the Diet+Exercise program
∗ $\beta_2$ is the fixed effect for the Diet Alone program
∗ $\beta_3$ is the fixed effect for time
∗ $\beta_4$ is the fixed effect for the interaction between the Diet+Exercise
program and time
∗ $\beta_5$ is the fixed effect for the interaction between the Diet Alone
program and time
∗ $b_{0i}$ is the random intercept for subject $i$, accounting for
correlation of the measurements within subjects
∗ $\epsilon_{ij}$ is the residual error term for subject $i$ at measurement occasion $j$
The random intercept term, $b_{0i}$, allows for correlation of the
measurements within subjects by assuming that the subject-specific
deviations from the overall intercept follow a normal distribution with
mean zero and variance $\sigma_b^2$. The residual error term, $\epsilon_{ij}$, captures
the unexplained variation in weight at each time point for each subject.

The interaction terms, $\beta_4 (\text{DietExercise}_i \times t_{ij})$ and $\beta_5 (\text{DietAlone}_i \times t_{ij})$,
allow for different time trends in the three groups. Because the
response is weight, a faster rate of weight loss corresponds to a more
negative slope. For example, if $\beta_4$ is negative and significant, it
would indicate that weight declines faster over time in the
Diet+Exercise program than in the Control program. Similarly, if $\beta_5$ is
negative and significant, it would indicate that weight declines faster
over time in the Diet Alone program than in the Control program.
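
As a short worked step, substituting the indicator values into the model above and taking expectations gives one mean line per group. This sketch relies only on the random intercept and the residual error having mean zero, and on both indicators being 0 for the Control group:

```latex
\begin{align*}
\text{Control:} \quad
  & E[\text{Weight}_{ij}] = \beta_0 + \beta_3 t_{ij} \\
\text{Diet+Exercise:} \quad
  & E[\text{Weight}_{ij}] = (\beta_0 + \beta_1) + (\beta_3 + \beta_4)\, t_{ij} \\
\text{Diet Alone:} \quad
  & E[\text{Weight}_{ij}] = (\beta_0 + \beta_2) + (\beta_3 + \beta_5)\, t_{ij}
\end{align*}
```

Read this way, $\beta_1$ and $\beta_2$ are baseline differences from Control, and $\beta_4$ and $\beta_5$ are slope differences from Control.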

Solution

(i) The estimated population average trend for each of the three
groups can be expressed using the parameter estimates from the
R output, with $t$ denoting time in months.

For the Control group:

$$\text{Weight} = 240.844 + 0.549\,t$$

where Weight is the average weight of the Control group at time
$t$.

For the Diet+Exercise group:

$$\text{Weight} = 240.844 + 3.354\,\text{DietExercise} - 7.987\,(\text{DietExercise} \times t) + 0.549\,t$$

where Weight is the average weight of the Diet+Exercise group
at time $t$, and DietExercise is a binary variable indicating whether
the subject is in the Diet+Exercise program (1) or not (0).
Setting DietExercise = 1 gives $\text{Weight} = 244.198 - 7.438\,t$.

For the Diet Alone group:

$$\text{Weight} = 240.844 - 4.061\,\text{DietAlone} - 3.023\,(\text{DietAlone} \times t) + 0.549\,t$$

where Weight is the average weight of the Diet Alone group at
time $t$, and DietAlone is a binary variable indicating whether
the subject is in the Diet Alone program (1) or not (0).
Setting DietAlone = 1 gives $\text{Weight} = 236.783 - 2.474\,t$.

In these expressions, the intercept 240.844 is the estimated average
weight of the Control group at baseline (month 0), and the program
main effects (3.354 and -4.061) are the estimated baseline differences
from the Control group. The coefficient of $t$, 0.549, is the estimated
monthly change in average weight in the Control group, and each
interaction term is the difference in the rate of weight change
between that program and the Control group.
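
The arithmetic behind the three trend lines can be checked with a few lines of Python. This is only an illustrative sketch: the variable and function names are made up here, and the numbers are the estimates copied from the R output table in the question.

```python
# Fixed-effect estimates copied from the R output table in the question.
# All names below are illustrative, not part of any R or Python library.
intercept = 240.844   # Control mean weight at month 0
month = 0.549         # Control slope (change in mean weight per month)
main_effect = {"Control": 0.0, "Diet+Exercise": 3.354, "Diet Alone": -4.061}
interaction = {"Control": 0.0, "Diet+Exercise": -7.987, "Diet Alone": -3.023}


def group_trend(program):
    """Return (baseline mean, slope per month) for one program."""
    return intercept + main_effect[program], month + interaction[program]


trends = {program: group_trend(program) for program in main_effect}
for program, (baseline, slope) in trends.items():
    print(f"{program}: Weight = {baseline:.3f} + ({slope:.3f}) * t")
```

Each program's line is the Control line shifted by that program's main effect (baseline) and interaction (slope).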

(ii) To estimate the difference in population mean weight after 8
weeks between the “Diet Alone” and the “Control” programs,
we calculate the predicted average weight for each group
at 8 weeks, taken here as $t = 2$ months, and then take the
difference between the two predictions.

For the Control group, the estimated population average trend
is:
$$\text{Weight} = 240.844 + 0.549\,t$$
Substituting $t = 2$ (2 months), we get:
$$\text{Weight}_{\text{Control}} = 240.844 + 0.549 \times 2 = 241.942$$
For the Diet Alone group, the estimated population average
trend is:
$$\text{Weight} = 240.844 - 4.061\,\text{DietAlone} - 3.023\,(\text{DietAlone} \times t) + 0.549\,t$$
Substituting $t = 2$ and DietAlone = 1 (since we are interested
in the Diet Alone group), we get:
$$\text{Weight}_{\text{DietAlone}} = 240.844 - 4.061 \times 1 - 3.023 \times 1 \times 2 + 0.549 \times 2 = 231.835$$

Therefore, the predicted average weight for the Control group
after 8 weeks is 241.942 and the predicted average weight for
the Diet Alone group after 8 weeks is 231.835. The difference
between the two predictions is:

$$\text{Weight}_{\text{DietAlone}} - \text{Weight}_{\text{Control}} = 231.835 - 241.942 = -10.107$$

Therefore, the estimated difference of population mean weight
after 8 weeks between the “Diet Alone” and the “Control”
programs is -10.107 pounds. This means that, on average, the Diet
Alone group is predicted to have a weight that is 10.107 pounds
lower than the Control group after 8 weeks.
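
The same difference can be obtained without computing the two predictions separately, because the intercept and the common Month term cancel when the Control line is subtracted from the Diet Alone line. The sketch below uses hypothetical names and assumes, as in the solution, that 8 weeks is treated as t = 2 months.

```python
# Estimates copied from the R output table; the names are illustrative only.
BETA_DIET_ALONE = -4.061          # baseline shift relative to Control
BETA_DIET_ALONE_X_MONTH = -3.023  # slope shift relative to Control


def diet_alone_minus_control(t_months):
    """Difference in population mean weight (Diet Alone minus Control) at time t."""
    return BETA_DIET_ALONE + BETA_DIET_ALONE_X_MONTH * t_months


# Assumption carried over from the solution: 8 weeks is taken as 2 months.
diff_at_2_months = diet_alone_minus_control(2)
print(round(diff_at_2_months, 3))
```

Only the two Diet Alone coefficients enter the difference, so it does not depend on the intercept or on the Control slope.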

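(c) (i) To find the variance of $Y_{ij}$, we can use the fact that $\beta_0$ is a
constant and that the random effects and error components are
independent with variances $\sigma_b^2$ and $\sigma^2$, respectively. Therefore, we have:

$$\operatorname{Var}(Y_{ij}) = \operatorname{Var}(\beta_0 + b_i + \epsilon_{ij}) = \operatorname{Var}(b_i) + \operatorname{Var}(\epsilon_{ij}) = \sigma_b^2 + \sigma^2$$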

Using the given standard deviation of the random intercept,
$\sigma_b = 18.46174$, and the standard deviation of the residuals,
$\sigma = 12.41623$, we can find the variance of $Y_{ij}$ as:

$$\operatorname{Var}(Y_{ij}) = \sigma_b^2 + \sigma^2 = 18.46174^2 + 12.41623^2 = 340.8358 + 154.1628 = 494.9986$$

Therefore, the variance of $Y_{ij}$ is 494.9986.
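
A quick numerical check of this step, written as a minimal sketch with illustrative variable names; the two standard deviations are the ones given in the question.

```python
# Standard deviations given in the question.
sd_intercept = 18.46174   # random intercept b_i
sd_residual = 12.41623    # residual error

# Variances are the squared standard deviations; independent components add.
var_intercept = sd_intercept ** 2
var_residual = sd_residual ** 2
var_y = var_intercept + var_residual

print(round(var_intercept, 4), round(var_residual, 4), round(var_y, 4))
```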
(ii) To find the correlation between any two values $Y_{ij}$ and $Y_{ik}$, $j \neq k$,
we can use the fact that the random effects and error components
are independent, and hence uncorrelated. Therefore, we have:

$$\operatorname{Cov}(Y_{ij}, Y_{ik}) = \operatorname{Cov}(\beta_0 + b_i + \epsilon_{ij},\ \beta_0 + b_i + \epsilon_{ik}) = \operatorname{Var}(b_i) + 0 = \sigma_b^2$$

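Using the formula for the correlation coefficient, we have:

$$\operatorname{Corr}(Y_{ij}, Y_{ik}) = \frac{\operatorname{Cov}(Y_{ij}, Y_{ik})}{\operatorname{SD}(Y_{ij}) \cdot \operatorname{SD}(Y_{ik})} = \frac{\sigma_b^2}{\sqrt{(\sigma_b^2 + \sigma^2)(\sigma_b^2 + \sigma^2)}} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}$$

Therefore, the correlation between any two values $Y_{ij}$ and $Y_{ik}$,
$j \neq k$, is $\dfrac{\sigma_b^2}{\sigma_b^2 + \sigma^2}$.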
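
The closed-form correlation can be sanity-checked by simulation. The sketch below is an illustration only: it draws two measurements per subject from a random-intercept model using Python's standard library, with hypothetical standard deviations (2.0 and 1.0) and an arbitrary seed chosen for the example, and compares the sample correlation with the ratio of the intercept variance to the total variance.

```python
import random


def simulate_pair_correlation(sd_b, sd_e, beta0=0.0, m=20000, seed=1):
    """Simulate m subjects with two measurements each and return the sample
    correlation between the first and the second measurement."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(m):
        b = rng.gauss(0.0, sd_b)  # random intercept shared by both measurements
        first.append(beta0 + b + rng.gauss(0.0, sd_e))
        second.append(beta0 + b + rng.gauss(0.0, sd_e))
    mean1 = sum(first) / m
    mean2 = sum(second) / m
    cov = sum((x - mean1) * (y - mean2) for x, y in zip(first, second)) / (m - 1)
    var1 = sum((x - mean1) ** 2 for x in first) / (m - 1)
    var2 = sum((y - mean2) ** 2 for y in second) / (m - 1)
    return cov / (var1 * var2) ** 0.5


# Hypothetical values, used only to illustrate the formula.
sd_b, sd_e = 2.0, 1.0
theoretical = sd_b ** 2 / (sd_b ** 2 + sd_e ** 2)
simulated = simulate_pair_correlation(sd_b, sd_e)
print(theoretical, simulated)
```

With a large number of simulated subjects the two values should agree closely, since the shared random intercept is the only source of dependence between the two measurements.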

Using the standard deviation of the random intercept, $\sigma_b = 18.46174$,
and the standard deviation of the residuals, $\sigma = 12.41623$, we can find
the correlation between any two values $Y_{ij}$ and $Y_{ik}$, $j \neq k$, as:

$$\operatorname{Corr}(Y_{ij}, Y_{ik}) = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2} = \frac{340.8358}{340.8358 + 154.1628} = 0.6885591$$

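Therefore, the correlation between any two values $Y_{ij}$ and $Y_{ik}$,
$j \neq k$, is 0.6885591. This is the intraclass correlation: about 69%
of the total variance of $Y_{ij}$ is due to between-subject variation.
It indicates a moderate positive correlation between two measurements
on the same subject, which means that when one measurement is
above the population mean, the other tends to be above it as well.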
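
For reference, the numerical value can be reproduced directly from the two standard deviations given in the question. This is a minimal sketch with illustrative variable names; the last digits may differ slightly from the hand calculation because the variances are not rounded first.

```python
# Standard deviations given in the question.
sd_intercept = 18.46174   # random intercept b_i
sd_residual = 12.41623    # residual error

# Within-subject correlation: share of total variance due to the random intercept.
icc = sd_intercept ** 2 / (sd_intercept ** 2 + sd_residual ** 2)
print(icc)
```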
