HSTS 412
B.Sc.Honours in Statistics
April 2023
Time : 2 hours
A3. (a) Describe the features of longitudinal data that make them require special
analysis. [4]
(b) Explain the term link function, with the aid of examples, for generalized linear
models. [4]
A4. (a) Give two advantages and two disadvantages of nonlinear models. [4]
(b) Why is it not advisable to fit a linearized model for nonlinear regression? [2]
(c) Linearize the following functions:
(i) $y = \dfrac{\theta_1 x}{\theta_2 + x}$ [3]
(ii) $y = \theta_1 x^{\theta_2}$ [3]
A5. In a study to investigate weight loss programs, 100 male subjects were randomly as-
signed to maintain their current eating and exercise habits (Control), to follow a rig-
orous program of modified diet and exercise (Diet+Exercise), or to adopt a modified
diet (Diet Alone). All participants were weighed at baseline (month 0) and then again
at months 3, 6, 9, and 12.
(a) Using symbols for fixed effects, random effects, and residual error terms, write
down two expressions (i.e., one for each program) describing the trend of weight
loss in the two programs. Please use clear notation, making sure that all symbols
used are defined. [8]
(b) An analyst considered a model with different linear time trends for the three
groups as well as a random intercept to account for correlation of the measure-
ments. The following is part of the output from R.
(i) Write an explicit expression for the estimated population average trend for
each of the three groups. Please be clear in your notation. [9]
(ii) Give a numerical estimate of the difference of population mean weight after
8 weeks between the “Diet Alone” and the “Control” programs. [5]
(c) Suppose we have the following statistical model
$Y_{ij} = \beta_0 + b_i + \epsilon_{ij},$
A5 (a) For the Diet+Exercise program:
$\mathrm{Weight}_i(t) = \beta_0 + \beta_1\,\mathrm{DietExercise} + \beta_2 t + u_i + \epsilon_i(t)$
where:
∗ Weight_i(t) is the weight of subject i at time t
∗ DietExercise is a binary variable indicating whether the subject
is in the Diet+Exercise program (1) or not (0)
∗ t is the time in months (t = 0, 3, 6, 9, 12)
∗ β0 is the intercept term
∗ β1 is the fixed effect for the Diet+Exercise program
∗ β2 is the fixed effect for time
∗ ui is the random effect for subject i
∗ ϵi (t) is the residual error term for subject i at time t
For the Diet Alone program:
$\mathrm{Weight}_i(t) = \gamma_0 + \gamma_1\,\mathrm{DietAlone} + \gamma_2 t + v_i + \eta_i(t)$
where:
∗ Weight_i(t) is the weight of subject i at time t
∗ DietAlone is a binary variable indicating whether the subject is
in the Diet Alone program (1) or not (0)
∗ t is the time in months (t = 0, 3, 6, 9, 12)
∗ γ0 is the intercept term
∗ γ1 is the fixed effect for the Diet Alone program
∗ γ2 is the fixed effect for time
∗ vi is the random effect for subject i
∗ ηi (t) is the residual error term for subject i at time t
In both models, the random effects terms, ui and vi , capture the in-
dividual differences in weight loss that are not accounted for by the
fixed effects. The residual error terms, ϵi (t) and ηi (t), capture the un-
explained variation in weight loss at each time point for each subject.
(b) The model considered by the analyst can be written as a linear mixed
effects model with fixed effects for program and time, a random in-
tercept to account for correlation of the measurements, and an in-
teraction term between program and time to allow for different time
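trends in the three groups. The model can be written as:
$\mathrm{Weight}_{ij} = \beta_0 + \beta_1\,\mathrm{DietExercise}_i + \beta_2\,\mathrm{DietAlone}_i + \beta_3 t_{ij} + \beta_4\,(\mathrm{DietExercise}_i \times t_{ij}) + \beta_5\,(\mathrm{DietAlone}_i \times t_{ij}) + b_{0i} + \epsilon_{ij}$
where: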
∗ Weight_ij is the weight of subject i at measurement occasion j
∗ DietExercise_i is a binary variable indicating whether subject i is
in the Diet+Exercise program (1) or not (0)
∗ DietAlonei is a binary variable indicating whether subject i is
in the Diet Alone program (1) or not (0)
∗ t_ij is the time in months of measurement occasion j for subject i
(t_ij = 0, 3, 6, 9, 12)
∗ β0 is the overall intercept term
∗ β1 is the fixed effect for the Diet+Exercise program
∗ β2 is the fixed effect for the Diet Alone program
∗ β3 is the fixed effect for time
∗ β4 is the fixed effect for the interaction between the Diet+Exercise
program and time
∗ β5 is the fixed effect for the interaction between the Diet Alone
program and time
∗ b0i is the random intercept for subject i, accounting for correla-
tion of the measurements within subjects
∗ ϵ_ij is the residual error term for subject i at measurement occasion j
The random intercept term, b_0i, allows for correlation of the
measurements within subjects by assuming that the intercepts for each
subject follow a normal distribution with mean zero and variance $\sigma_b^2$.
The residual error term, ϵij , captures the unexplained variation in
weight loss at each time point for each subject.
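The way the fixed effects combine into one straight line per group can be sketched in a few lines of Python. This is only an illustration: it assumes dummy coding with Control as the reference group, as the definitions of β0 to β5 above suggest. The function names and the coefficient values in `example_beta` are hypothetical and are not the estimates from the R output.

```python
from typing import Dict, Tuple


def group_trends(beta: Tuple[float, float, float, float, float, float]
                 ) -> Dict[str, Tuple[float, float]]:
    """Return (intercept, slope) of the population-average line per group.

    beta = (b0, b1, b2, b3, b4, b5) in the order defined in the text,
    with Control assumed to be the reference group.
    """
    b0, b1, b2, b3, b4, b5 = beta
    return {
        "Control": (b0, b3),                  # both indicators are 0
        "Diet+Exercise": (b0 + b1, b3 + b4),  # DietExercise = 1
        "Diet Alone": (b0 + b2, b3 + b5),     # DietAlone = 1
    }


def mean_weight(beta, group: str, t: float) -> float:
    """Population-average weight of a group at time t (months)."""
    intercept, slope = group_trends(beta)[group]
    return intercept + slope * t


# Hypothetical coefficients for illustration only (NOT from the R output).
example_beta = (100.0, -1.0, 0.5, -0.2, -1.5, -0.8)
trends = group_trends(example_beta)
```

Under this coding, the difference between the Diet Alone and Control means at time t reduces to β2 + β5·t, because the random intercept and the residual both have mean zero.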
Solution
(i) The estimated population average trend for each of the three
groups can be expressed using the parameter estimates from the
R output as follows:
where Weight is the average weight of the Control group at time
t.
Therefore, the predicted average weight for the Control group
after 8 weeks is 241.942 and the predicted average weight for
the Diet Alone group after 8 weeks is 231.835. The difference
between the two predictions (Diet Alone minus Control) is:
231.835 − 241.942 = −10.107
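(c) (i) To find the variance of Y_ij, we can use the fact that the random
effects and error components are independent and have variances
of $\sigma_b^2$ and $\sigma^2$, respectively. Therefore, we have:
$\mathrm{Var}(Y_{ij}) = \mathrm{Var}(\beta_0 + b_i + e_{ij}) = \mathrm{Var}(b_i) + \mathrm{Var}(e_{ij}) = \sigma_b^2 + \sigma^2$
For two measurements on the same subject, only the shared random
intercept contributes to the covariance:
$\mathrm{Cov}(Y_{ij}, Y_{ik}) = \mathrm{Cov}(\beta_0 + b_i + e_{ij},\ \beta_0 + b_i + e_{ik}) = \mathrm{Var}(b_i) + 0 = \sigma_b^2$
Therefore, the correlation between any two values Y_ij and Y_ik,
j ≠ k, is $\dfrac{\sigma_b^2}{\sigma_b^2 + \sigma^2}$.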
$\mathrm{Corr}(Y_{ij}, Y_{ik}) = \dfrac{\sigma_b^2}{\sigma_b^2 + \sigma^2} = \dfrac{\sigma_{\beta_0}^2}{\sigma_{\beta_0}^2 + \sigma_e^2} = \dfrac{340.8358}{340.8358 + 154.1628} = 0.6885591$
Therefore, the correlation between any two values Y_ij and Y_ik,
j ≠ k, is 0.6885591. This indicates a moderately strong positive
correlation between any two measurements on the same subject: a
subject whose weight is above the population mean at one occasion
tends to be above it at the others, and about 69% of the total
variance is due to differences between subjects.
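The calculation can be checked numerically with a short Python sketch. It uses the two variance components quoted in the solution (340.8358 and 154.1628); the function names, the number of simulated subjects, and the seed are assumptions made for illustration. The simulation draws two measurements per subject from a random-intercept model and compares their sample correlation with the closed-form value.

```python
import random


def intraclass_correlation(var_between: float, var_within: float) -> float:
    """Corr(Y_ij, Y_ik) for j != k under a random-intercept model."""
    return var_between / (var_between + var_within)


def simulate_pair_correlation(var_between: float, var_within: float,
                              n_subjects: int = 20000, seed: int = 1) -> float:
    """Sample correlation of two measurements sharing one random intercept."""
    rng = random.Random(seed)
    sd_b, sd_e = var_between ** 0.5, var_within ** 0.5
    first, second = [], []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, sd_b)               # subject-level intercept b_i
        first.append(b + rng.gauss(0.0, sd_e))   # Y_ij minus the constant beta_0
        second.append(b + rng.gauss(0.0, sd_e))  # Y_ik minus the constant beta_0
    m1 = sum(first) / n_subjects
    m2 = sum(second) / n_subjects
    cov = sum((x - m1) * (y - m2) for x, y in zip(first, second))
    ss1 = sum((x - m1) ** 2 for x in first)
    ss2 = sum((y - m2) ** 2 for y in second)
    return cov / (ss1 * ss2) ** 0.5


icc = intraclass_correlation(340.8358, 154.1628)
simulated = simulate_pair_correlation(340.8358, 154.1628)
print(f"closed form: {icc:.7f}, simulated: {simulated:.3f}")
```

The intercept β0 is omitted from the simulation because adding a constant does not change a correlation. The simulated value should lie close to the closed-form one, with the gap shrinking as `n_subjects` grows.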