
Section I: Review of exploratory data analysis (EDA) techniques for longitudinal data

(a) Plot the outcome over time and describe any patterns.

Figure 1 The trend of outcome for both Gabapentin and Placebo groups

Figure 2 The Boxplots of outcome for both Gabapentin and Placebo groups

There is one obvious outlier, id 318 in the placebo group. In the top diagram, this
individual's points are outliers at every time point, although the individual's hot flash
score declined as time progressed. The downward-sloping black line shows the general trend
across patients: hot flash scores seem to improve over time. In the bottom diagram, many
individuals have lower scores at time 2, where the data are more concentrated. There are also
several outliers at each time point. For instance, at time point 0, Q3 + 1.5*(Q3 - Q1) = 46.05,
so any point above 46.05 is flagged as an outlier. At each time point the mean exceeds the
median, which indicates that the outcomes follow a right-skewed distribution. Lastly, the
scores decline between times 0 and 2 and stay roughly constant between times 3 and 4.
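The outlier rule used above (upper fence = Q3 + 1.5*(Q3 - Q1)) can be sketched as follows. This is a minimal illustration on made-up scores, not the study data; the function names are hypothetical, and the quartile definition used here (Python's "inclusive" method) may differ slightly from the one used by the software that produced the box plots.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical hot flash scores (assumption: NOT taken from the study).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(tukey_fences(scores))   # fences for the hypothetical data
print(flag_outliers(scores))  # values beyond the fences
```

Applied per time point, the same function would reproduce the kind of cutoff quoted for time 0, provided the quartile definition matches.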
(b) Plot the outcome over time in each respective treatment group and describe any patterns.

Figure 3 Spaghetti plots of outcome over time for Placebo and Gabapentin (separate version)

Figure 4 Boxplots of outcome over time for Placebo and Gabapentin (separate version)

Overall, the hot flash scores for the placebo group seem to decline over time, but not by
much according to its spaghetti plot. Its box plot shows that the hot flash scores actually
increased slightly after time 2. In contrast, the Gabapentin group shows a decreasing trend in
hot flash scores over time in both its spaghetti plot and its box plot. This suggests that
Gabapentin is working.
(c) For each time point, create a pair of side-by-side box plots of the outcome in each
treatment group.

Figure 5 Boxplots of outcome over time for Placebo and Gabapentin (combined version)

In the placebo group, the hot flash scores decline between time 0 and time 2 and increase
slightly between time 3 and time 4. There are several outliers at each time point.
In the Gabapentin group, the hot flash scores decline gradually at each time point. This
group also has several outliers, but they are less spread out than those of the placebo group
at each time point. At times 3 and 4, the Gabapentin box plots cover a lower range of hot
flash scores than the placebo box plots. Thus, it seems that Gabapentin is working.
(d) The variance-covariance structure of the repeated measures outcome overall:
Table 1 The Estimated R matrix
Estimated R Matrix for id(treatment) 165 Gabapentin
Row Col1 Col2 Col3 Col4 Col5
1 233.05 148.42 113.41 119.92 112.91
2 148.42 128.72 101.96 105.30 99.9560
3 113.41 101.96 96.2750 98.3692 94.2333
4 119.92 105.30 98.3692 114.86 108.72
5 112.91 99.9560 94.2333 108.72 111.74

Table 2 The Estimated R Correlation Matrix
Estimated R Correlation Matrix for id(treatment) 165 Gabapentin
Row Col1 Col2 Col3 Col4 Col5
1 1.0000 0.8569 0.7572 0.7330 0.6997
2 0.8569 1.0000 0.9159 0.8660 0.8335
3 0.7572 0.9159 1.0000 0.9354 0.9086
4 0.7330 0.8660 0.9354 1.0000 0.9596
5 0.6997 0.8335 0.9086 0.9596 1.0000

Measurements taken at adjacent time points tend to be highly correlated. For instance, the
row 1, column 2 entry (measurements at time 0 and time 1) is 0.8569. The correlation
decreases as the two measurements get farther apart: the row 1, column 5 entry
(measurements at time 0 and 4) is 0.6997, lower than the row 1, column 2 entry.
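As a check on how Table 2 follows from Table 1, each correlation is the covariance divided by the product of the two standard deviations. The sketch below copies the Table 1 entries; the helper name cov_to_corr is an illustrative choice, and small differences in the fourth decimal are expected because the inputs are already rounded.

```python
import math

# Estimated R (covariance) matrix copied from Table 1, times 0 to 4.
R = [
    [233.05, 148.42, 113.41, 119.92, 112.91],
    [148.42, 128.72, 101.96, 105.30, 99.9560],
    [113.41, 101.96, 96.2750, 98.3692, 94.2333],
    [119.92, 105.30, 98.3692, 114.86, 108.72],
    [112.91, 99.9560, 94.2333, 108.72, 111.74],
]

def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]

corr = cov_to_corr(R)
for row in corr:
    print(" ".join(f"{c:.4f}" for c in row))
```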
(e) The variance-covariance structure of the repeated measures outcome within each of the
treatment groups:
Table 3 The Estimated R matrix
Estimated R Matrix for id(treatment) 165 Gabapentin
Row Col1 Col2 Col3 Col4 Col5
1 176.37 91.3819 77.2150 85.3764 79.1819
2 91.3819 80.3184 71.5092 70.1267 63.6879
3 77.2150 71.5092 79.2628 80.6509 73.5679
4 85.3764 70.1267 80.6509 96.5993 87.6164
5 79.1819 63.6879 73.5679 87.6164 87.3776

Table 4 The Estimated R Correlation Matrix


Estimated R Correlation Matrix for id(treatment) 165 Gabapentin
Row Col1 Col2 Col3 Col4 Col5
1 1.0000 0.7678 0.6531 0.6541 0.6378
2 0.7678 1.0000 0.8962 0.7961 0.7602
3 0.6531 0.8962 1.0000 0.9217 0.8840
4 0.6541 0.7961 0.9217 1.0000 0.9537
5 0.6378 0.7602 0.8840 0.9537 1.0000

Two measurements that are close to each other tend to have a higher correlation than two
measurements that are far apart (compare the row 5, column 4 entry, 0.9537, with the row 5,
column 1 entry, 0.6378).
Table 5 The Estimated R matrix
Estimated R Matrix for id(treatment) 364 Placebo
Row Col1 Col2 Col3 Col4 Col5
1 290.30 206.03 149.99 154.88 147.01
2 206.03 177.55 132.71 140.79 136.46
3 149.99 132.71 113.50 116.34 115.07
4 154.88 140.79 116.34 133.41 130.00
5 147.01 136.46 115.07 130.00 136.18

Table 6 The Estimated R Correlation Matrix


Estimated R Correlation Matrix for id(treatment) 364 Placebo
Row Col1 Col2 Col3 Col4 Col5
1 1.0000 0.9075 0.8263 0.7870 0.7394
2 0.9075 1.0000 0.9349 0.9148 0.8776
3 0.8263 0.9349 1.0000 0.9454 0.9256
4 0.7870 0.9148 0.9454 1.0000 0.9645
5 0.7394 0.8776 0.9256 0.9645 1.0000

Two measurements that are close to each other tend to have a higher correlation than two
measurements that are far apart (compare the row 5, column 4 entry, 0.9645, with the row 5,
column 1 entry, 0.7394). Thus, we can conclude that the correlation patterns in the two
treatment groups are similar.
Section II: Bivariate Models
(a) Consider the association between each predictor (individually) and the outcome.
Suppose the variance-covariance structure R is unstructured and suppose ddfm = kr.

Table 7 The Association


Effect Estimate Association Standard Error DF Pr>|t|
Categorical
Intercept 11.3501 0.7706 197 <.0001
Time 0 7.6213 Positive 0.8071 196 <.0001
Time 1 2.0023 Positive 0.4617 193 <.0001
Time 2 0.000476 Positive 0.3262 191 0.9988
Time 3 -0.03141 Negative 0.2204 191 0.8868
Time 4 0 (reference)
Continuous
Time -0.5245 Negative 0.1450 192 0.0004
Intercept 11.7893 0.9985 194 <.0001
Gabapentin -2.8216 Negative 1.4118 195 0.0471
Placebo 0 (reference)
Intercept 11.4481 2.1970 197 <.0001
Smoker 0 -0.9728 Negative 2.3224 196 0.6758
Smoker 1 0 (reference)
hrt_months 0.003434 Positive 0.01723 193 0.8423
wbc -0.2585 Negative 0.4277 197 0.5463
age 0.2265 Positive 0.1579 195 0.1530

(b) Consider using time as a categorical and a continuous predictor.

We continue to assume that the variance-covariance structure R is unstructured and that
ddfm = kr.
Time is categorical (setting time 4 as the reference group)

Table 8 Solution for Fixed Effects


Effect time Estimate Standard Error DF t Value Pr > |t|
Intercept 11.3501 0.7706 197 14.73 <.0001
time 0 7.6213 0.8071 196 9.44 <.0001
time 1 2.0023 0.4617 193 4.34 <.0001
time 2 0.000476 0.3262 191 0.00 0.9988
time 3 -0.03141 0.2204 191 -0.14 0.8868
time 4 0 . . . .

Time is continuous

Table 9 Solution for Fixed Effects


Effect Estimate SE DF t Value Pr > |t|
Intercept 11.8682 0.7942 197 14.94 <.0001
time -0.5245 0.1450 192 -3.62 0.0004

Deviance statistic = (-2 Res Log Likelihood of the continuous-time model) - (-2 Res Log
Likelihood of the categorical-time model) = 6013.8 - 5935.2 = 78.6, which is referred to a
$\chi^2_4$ distribution. The p-value is 3.330669e-16, which means that the categorical-time
model fits better than the continuous-time model. In other words, the model is not improved
by changing time to a continuous predictor. We can assess the linearity of the relationship
with a scatterplot overlaid with a smoothed curve.
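The p-value for a deviance difference is the upper-tail probability of a chi-square distribution. A minimal standard-library sketch is given below; the function name chi2_sf is an illustrative choice, and the use of 4 degrees of freedom for the time comparison is an assumption based on the magnitude of the reported p-value.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1.

    Starts from the closed forms for df = 1 (erfc) or df = 2 (exp) and adds
    one term per two extra degrees of freedom.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        k, p = 1, math.erfc(math.sqrt(half))
    else:
        k, p = 2, math.exp(-half)
    while k < df:
        p += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return p

# Deviance difference reported above; df = 4 is an assumption.
deviance_diff = 6013.8 - 5935.2
print(chi2_sf(deviance_diff, 4))
```

A value of this size is near the limit of double precision when computed as one minus a cumulative probability, so small discrepancies from the reported figure are expected.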

Figure 6 Outcome and Time Relationship

Here time and outcome seem to have a negative linear relationship. In the categorical case, the
estimate at time 3 is negative, indicating a negative association between time and outcome.
This seems reasonable because we expect the outcome to go down as time goes on.
Section III: Multivariable Models
(a) Considering some of the variable selection methods discussed to date.
There are two common variable selection methods: forward selection and backward elimination.
We start with forward selection, which begins with no covariates and checks AIC, AICC, and BIC
as each new predictor is added to the model. This continues until no further predictor can be
added. We then apply backward elimination using the likelihood-ratio statistic. The same
procedure is followed for the interaction terms.

Table 10 Model Selection Part I


Step I: Single Covariates
Model Predictors AIC BIC
1 none 5050.6 6099.8
2 time 5965.2 6014.5
3 treatment 6044.4 6093.6
4 smoker 6046.9 6096.1
5 hrt_months 6056.8 6106.1
6 wbc 6050.1 6099.3
7 age 6022.8 6071.9
Step II: 2 Covariates
8 time+trt 5958.8 6008.1
9 time+smoker 5961.6 6010.8
10 time+hrt_months 6050.0 6099.3
11 time+wbc 6043.3 6092.5
12 time+age 6016.2 6065.3
Step III: 3 Covariates
13 time+trt+smoker 5955.2 6004.4
14 time+trt+hrt_months 5965.2 6014.4
15 time+trt+wbc 5958.4 6007.7
16 time+trt+age 5932.2 5981.4
Step IV: 4 Covariates
17 time+trt+age+smoker 5928.3 5977.5
18 time+trt+age+hrt_months 5938.5 5987.6
19 time+trt+age+wbc 5931.7 5980.9
Step V: 5 Covariates
20 time+trt+age+smoker+hrt_months 5934.6 5983.8
21 time+trt+age+smoker+wbc 5927.6 5976.8
Step VI: 6 Covariates
22 time+trt+age+smoker+wbc+hrt_months 5933.9 5983.0

Now we compare model 17 and model 21 by their deviance difference to see which model is better;
interaction terms are then considered by forward selection. LRT = deviance17 - deviance21 =
5898.3 - 5897.6 = 0.7, which is below the $\chi^2_1$ critical value of 3.84, and the p-value is
0.4027837, which indicates that the simpler model 17 is better. Thus, model 21 is eliminated.
We then compare model 16 with model 17. LRT = deviance16 - deviance17 = 5902.2 - 5898.3 = 3.9,
which exceeds the $\chi^2_1$ critical value of 3.84, and the p-value is 0.048. This value is
borderline, so it is up to the researcher to decide whether smoker is an important covariate
in the study. In this case, the p-value indicates that model 17 is better.
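The two single-degree-of-freedom likelihood-ratio tests above can be reproduced with the identity P(chi-square with 1 df > x) = erfc(sqrt(x/2)). The sketch below uses the deviances quoted in the text; the function name lrt_1df is hypothetical.

```python
import math

def lrt_1df(deviance_reduced, deviance_full):
    """Return (statistic, p-value) for a likelihood-ratio test with 1 df."""
    stat = max(deviance_reduced - deviance_full, 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Deviances quoted in the text for the nested comparisons.
stat_a, p_a = lrt_1df(5898.3, 5897.6)  # smaller model vs. model 21
stat_b, p_b = lrt_1df(5902.2, 5898.3)  # model 16 vs. the model adding smoker
print(round(stat_a, 1), round(p_a, 4))
print(round(stat_b, 1), round(p_b, 4))
```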

Table 11 Model Selection Part II


Step VII: Interaction Term 1
Model Interaction Terms AIC BIC
23 time*age 5939.4 5988.6
24 time*treatment 5900.3 5949.5
25 time*smoker 5914.6 5963.8
26 age*treatment 5928.9 5978.0
27 age*smoker 5978.0 5975.9
28 smoker*treatment 5922.0 5971.2
Step VII: Interaction Term 2
29 time*age*treatment 5916.5 5965.7
30 time*age*smoker 5928.7 5977.9
31 time*smoker*treatment 5866.4 5915.5
32 age*treatment*smoker 5900.6 5949.8
Step VIII: Interaction Term 3
33 age*treatment*smoker*time 5875.7 5924.8

Now we compare models 31 and 24: LRT = deviance24 - deviance31 = 5870.3 - 5836.4 = 33.9, which
exceeds the $\chi^2_9$ critical value of 16.92, and the p-value is 9.3e-05. This suggests that
the more complex model is actually better. However, the new interaction uses up 9 degrees of
freedom and may change the meaning of the lower-order coefficients. None of its terms is
significant, so to avoid further complicating the model, this interaction term is not
included. Thus, model 24 is the final model.

Table 12 Solution for Fixed Effects
Effect treatment time smoker Estimate SE DF t Value Pr > |t|
Intercept 1.9005 8.4802 195 0.22 0.8229
time 0 4.7988 1.1134 193 4.31 <.0001
time 1 1.3312 0.6533 190 2.04 0.0430
time 2 -0.9312 0.4543 188 -2.05 0.0418
time 3 -0.4226 0.3111 187 -1.36 0.1759
time 4 0 . . . .
treatment Gabapentin -3.9803 1.5242 193 -2.61 0.0097
treatment Placebo 0 . . . .
age 0.2390 0.1580 193 1.51 0.1321
smoker 0 -1.3726 2.3160 193 -0.59 0.5541
smoker 1 0 . . . .
treatment*time Gabapentin 0 5.5836 1.5732 194 3.55 0.0005
treatment*time Gabapentin 1 1.3022 0.9248 191 1.41 0.1607
treatment*time Gabapentin 2 1.8485 0.6432 189 2.87 0.0045
treatment*time Gabapentin 3 0.7687 0.4403 189 1.75 0.0824

Section IV: Linear Mixed Models


(a) From the above exercise, adding time, treatment, age, smoker, and the interaction term
treatment*time results in the second-lowest AIC and BIC of all the models tested. The
estimation method used in proc mixed is REML. The treatment, time, and treatment*time
terms are mostly statistically significant. Age and smoker are added to better fit the data,
as suggested by AIC.

The linear mixed model has the form $y = X\beta + Z\gamma + \epsilon$, where y is the vector
of outcomes, X is the design matrix of the predictor variables, $\beta$ is the vector of
fixed-effect coefficients, Z is the random-effects design matrix, $\gamma$ is the vector of
random effects, and $\epsilon$ is the vector of residuals.

Now we have to consider the random effects. Consider treating the time points as random
effects. We treat time as random because our repeated observations are correlated, and a
random time effect models the correlation among repeated measures within the same subject.
The random-effects design matrix has one column per time point and, if included, an
intercept column. An entry equals 1 if the observation belongs to that time point and 0
otherwise. If we pick time without an intercept for the design matrix Z, then the Z matrix
is 196 by 5. If we pick time with an intercept, then the Z matrix is 196 by 6. If we pick
only the intercept, then the design matrix Z is 196 by 1. Recall that the variance of y is
$V = ZGZ' + R$. In our case, we fit the random portion of the model by specifying the terms
that define the random-effects design matrix Z and the structure of the matrix G. We then
compare which model has the lowest AIC and BIC. Let's assume complete independence across
subjects.
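To make the role of Z, G, and R concrete, the sketch below computes V = ZGZ' + R for a single subject with five time points and a random intercept only. The variance values (4.0 for the random intercept, 2.0 for the residual) are hypothetical and are not estimates from this study; the helper names are illustrative.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def marginal_cov(Z, G, R):
    """Return V = Z G Z' + R."""
    zgzt = matmul(matmul(Z, G), transpose(Z))
    n = len(R)
    return [[zgzt[i][j] + R[i][j] for j in range(n)] for i in range(n)]

n_times = 5
Z = [[1.0] for _ in range(n_times)]   # random intercept: a single column of ones
G = [[4.0]]                           # hypothetical random-intercept variance
R = [[2.0 if i == j else 0.0 for j in range(n_times)]
     for i in range(n_times)]         # hypothetical independent residual variance

V = marginal_cov(Z, G, R)
for row in V:
    print(row)
```

With these choices, V has a constant variance on the diagonal and a constant covariance off the diagonal, which is the compound symmetry pattern; richer Z and G choices, such as one column per time point with an unstructured G, allow the variances and covariances to differ across time points.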

Table 13 Model Selection


Covariance-Structure Z Matrix AIC BIC
VC Intercept 6433.0 6439.5
Time 7396.3 7402.9
Intercept + Time 6435.0 6444.8
Autoregressive(1) Intercept 6435.0 6444.8
CS Intercept 6435.0 6444.8
Time 6435.0 6444.8
Intercept + Time 6435.0 6444.8
Unstructured Intercept 6433.0 6439.5
Time 5901.6 5954.0
Intercept + Time 5911.6 5980.4
Toeplitz Intercept 6433.0 6439.5
Time 6190.8 6210.5
Intercept + Time 6144.0 6167.0
The linear mixed model with the smallest AIC and BIC is the one with an unstructured G
matrix and a 196 by 5 Z matrix (one column for each time point).
(b) Assess the adequacy of the fit

Figure 7 Diagnostic plots for the linear mixed model

The points in the Residual vs Predicted Mean plot scatter randomly around the 0 line, although
there are several outliers. The Q-Q plot looks curved, with some outliers at each end. We can
also look at the residual percent plot. The majority of the residuals fall between -10 and 10.
The distribution is right-skewed, which indicates that some positive outliers are distorting
the normality of the residuals. This right skewness violates the normality assumption of the
linear mixed model. To improve the model, we removed the outlier id 318 and applied a log
transformation.

Figure 8 Adjusted diagnostic plots for the linear mixed model

The AIC and BIC both improved, and the residual percent plot suggests that the residuals
became closer to normal. However, the curvature in the Q-Q plot is still apparent. The Residual
vs Predicted Mean plot also improved after the log transformation and the removal of the outlier.
Section V: Generalized Estimating Equations (GEEs)
(a) GEE models are an extension of generalized linear models to correlated data. The additional
specification in a GEE model is the working correlation structure of the repeated measures.
In our case, the working correlation matrix Ri is 5 by 5 and is the same for all
individuals. This Ri matrix models the correlation among the repeated measures within a
subject. We build our GEE model based on the Quasi-likelihood under the Independence
Model Criterion (QIC), which helps researchers choose the best correlation structure: the
smaller the QIC, the better. We keep the same covariates as in the previous questions.
According to the QIC, the exchangeable working correlation structure provides the smallest
QIC.

Table 14 Model Selection
Correlation Structure Types QIC
Autoregressive(1) 983.6723
Exchangeable or CS 981.3564
Independent 982.6149
Unstructured 2468.5068
M-dependent 1014.9139

The GEE method yields consistent estimates of the regression coefficients and their standard
errors even when the correlation structure is misspecified. However, efficiency, such as
statistical power, is reduced if the choice of R is incorrect. This loss of efficiency
lessens as the number of subjects gets large.
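For reference, three of the working correlation structures compared in Table 14 can be written out explicitly for five time points. The sketch below is illustrative only: the function names are hypothetical, and the correlation parameter alpha = 0.8 is an arbitrary value, not an estimate from the fitted GEE.

```python
def independent(n):
    """Independence: identity matrix, no within-subject correlation."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def exchangeable(alpha, n):
    """Exchangeable (compound symmetry): every pair shares the same correlation."""
    return [[1.0 if i == j else alpha for j in range(n)] for i in range(n)]

def ar1(alpha, n):
    """AR(1): correlation decays as alpha ** |i - j| with the time lag."""
    return [[alpha ** abs(i - j) for j in range(n)] for i in range(n)]

alpha = 0.8  # hypothetical value, for illustration only
for name, mat in [("exchangeable", exchangeable(alpha, 5)), ("AR(1)", ar1(alpha, 5))]:
    print(name)
    for row in mat:
        print(" ".join(f"{c:.3f}" for c in row))
```

The exchangeable structure uses one parameter regardless of lag, while AR(1) forces the correlation to shrink with distance in time; an unstructured matrix would instead estimate all ten off-diagonal correlations separately.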
(b) Assess the adequacy of the fitted model

Figure 9 Diagnostic plots for the generalized estimating equation

The raw residual and Pearson residual plots show that the residuals are randomly scattered
around the 0 line. However, there are several outliers that are distorting the model. Moreover,
the Cook's distance plot and the leverage plot show a potentially influential observation. This
observation needs further investigation because it may distort the results and accuracy of the
GEE model. After deleting the extreme observation id 318, the GEE model's QIC improved to
974.3188. However, the diagnostic statistics for the outcome below suggest that there may still
be other influential observations in the data.

Figure 10 Adjusted diagnostic plots for the generalized estimating equation

Section VI: Contrasts

(a) Let's suppose that the LMM has G and R matrices that follow the compound symmetry
structure. For the contrast, we can write the model as follows:
$Y_{ij} = b_0 + b_1 t_0 + b_2 t_1 + b_3 t_2 + b_4 t_3 + b_5 t_4 + b_6 G + b_7 P + b_8 G t_0 + b_9 G t_1 + b_{10} G t_2 + b_{11} G t_3 + b_{12} G t_4 + b_{13} P t_0 + b_{14} P t_1 + b_{15} P t_2 + b_{16} P t_3 + b_{17} P t_4 + e_{ij}$
We want the difference between time 1 and time 2: T1 is $b_0 + b_2$ and T2 is $b_0 + b_3$. The
difference between them is $T2 - T1 = (b_0 + b_3) - (b_0 + b_2) = b_3 - b_2$. In SAS, the L
matrix for this contrast is time 0 -1 1 0 0.
Below is the solution for the fixed effects of the LMM by proc mixed:
Table 15 Solution for Fixed Effects
Effect treatment time Estimate Standard Error DF t Value Pr > |t|
Intercept 13.2827 1.1903 195 11.16 <.0001
treatment Gabapentin -4.0538 1.6818 756 -2.41 0.0162
treatment Placebo 0 . . . .
time 0 4.9826 0.7467 756 6.67 <.0001
time 1 1.4167 0.7474 756 1.90 0.0584
time 2 -0.9010 0.7474 756 -1.21 0.2284
time 3 -0.3938 0.7474 756 -0.53 0.5985
time 4 0 . . . .
treatment*time Gabapentin 0 5.5101 1.0594 756 5.20 <.0001
treatment*time Gabapentin 1 1.2626 1.0590 756 1.19 0.2335
treatment*time Gabapentin 2 1.8179 1.0598 756 1.72 0.0867
treatment*time Gabapentin 3 0.7359 1.0598 756 0.69 0.4877
treatment*time Gabapentin 4 0 . . . .

Table 16 Contrasts
Label Num DF Den DF F Value Pr > F
Time 2 - Time 1 1 756 14.84 0.0001

The p-value is 0.0001, and this indicates that the difference in hot flash scores between time 1
and time 2 is significant.
(b) Let's suppose the working correlation structure is AR(1). The model for the contrast is the
same as in part (a), and in SAS the L matrix, time 0 -1 1 0 0, also stays the same.

Below is the GEE Parameter Estimates generated by proc genmod:


Table 17 The GEE Model
Parameter Estimate Standard Error Z Pr > |Z|
Intercept 13.3617 1.1783 11.34 <.0001
treatment Gabapentin -4.0226 1.5118 -2.66 0.0078
treatment Placebo 0.0000 0.0000 . .
time 0 4.9036 1.1621 4.22 <.0001
time 1 1.3290 0.6459 2.06 0.0396
time 2 -0.9542 0.4501 -2.12 0.0340
time 3 -0.4180 0.3144 -1.33 0.1836
time 4 0.0000 0.0000 . .
treatment*time Gabapentin 0 5.4402 1.5627 3.48 0.0005
treatment*time Gabapentin 1 1.2330 0.9163 1.35 0.1784
treatment*time Gabapentin 2 1.8186 0.6372 2.85 0.0043
treatment*time Gabapentin 3 0.7362 0.4351 1.69 0.0907
treatment*time Gabapentin 4 0.0000 0.0000 . .

Table 18 Contrast Results for GEE Analysis


Contrast DF Chi-Square Pr > ChiSq Type
Time 2 - Time 1 1 30.37 <.0001 Score

The p-value is <.0001, and this indicates that the difference in hot flash scores between time 1
and time 2 is significant.
(c) Let's suppose that the LMM has G and R matrices that follow the compound symmetry
structure. For the contrast, we can write the model as follows:
$Y_{ij} = b_0 + b_1 t_0 + b_2 t_1 + b_3 t_2 + b_4 t_3 + b_5 t_4 + b_6 G + b_7 P + b_8 G t_0 + b_9 G t_1 + b_{10} G t_2 + b_{11} G t_3 + b_{12} G t_4 + b_{13} P t_0 + b_{14} P t_1 + b_{15} P t_2 + b_{16} P t_3 + b_{17} P t_4 + e_{ij}$

The time difference within the Gabapentin group:
G2: $b_0 + b_3 + b_6 + b_{10}$ and G1: $b_0 + b_2 + b_6 + b_9$
G2 - G1: $(b_0 + b_3 + b_6 + b_{10}) - (b_0 + b_2 + b_6 + b_9) = b_3 - b_2 + b_{10} - b_9$
The time difference within the Placebo group:
P2: $b_0 + b_3 + b_7 + b_{15}$ and P1: $b_0 + b_2 + b_7 + b_{14}$
P2 - P1: $(b_0 + b_3 + b_7 + b_{15}) - (b_0 + b_2 + b_7 + b_{14}) = b_3 - b_2 + b_{15} - b_{14}$
(G2 - G1) - (P2 - P1): $(b_3 - b_2 + b_{10} - b_9) - (b_3 - b_2 + b_{15} - b_{14}) = b_{10} - b_{15} - b_9 + b_{14}$
In SAS, the L matrix for this contrast is treatment*time 0 -1 1 0 0 0 1 -1 0 0.

Using the same LMM generated by proc mixed,


Table 19 Contrasts
Label Num DF Den DF F Value Pr > F
(G2-G1)-(P2-P1) 1 756 0.27 0.6002

The p-value is 0.6002, and this indicates that the difference in hot flash scores between
time 1 and time 2 for treatment group versus control group is not significant.
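The point estimate behind this contrast is the L vector multiplied by the treatment*time estimates. The sketch below uses the Gabapentin interaction estimates from Table 15 and assumes the Placebo interaction terms are reference levels equal to 0 (they are not listed in the table); the variable names are illustrative, and the standard error and F test are not reproduced here because they require the estimated covariance matrix of the coefficients.

```python
# treatment*time estimates, ordered Gabapentin t0..t4 then Placebo t0..t4.
# Gabapentin values are from Table 15; Placebo values are assumed to be 0
# because Placebo is the reference treatment level.
estimates = [5.5101, 1.2626, 1.8179, 0.7359, 0.0,
             0.0, 0.0, 0.0, 0.0, 0.0]

# L vector for (G2 - G1) - (P2 - P1), as given in the text.
L = [0, -1, 1, 0, 0,
     0, 1, -1, 0, 0]

contrast = sum(l * b for l, b in zip(L, estimates))
print(round(contrast, 4))
```

The same figure can be cross-checked from the least squares means, as the Gabapentin change from time 1 to time 2 minus the Placebo change over the same interval.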

(d) Let's suppose the working correlation structure is AR(1). The model for the contrast is the
same as in part (c). In SAS, the L matrix is treatment*time 0 -1 1 0 0 0 1 -1 0 0. The GEE
parameter estimates generated by proc genmod remain the same as in part (b).
Table 20 Contrast Results for GEE Analysis
Contrast DF Chi-Square Pr > ChiSq Type
(G2-G1)-(P2-P1) 1 0.78 0.3763 Score

The p-value is 0.3763, and this indicates that the difference in hot flash scores between
time 1 and time 2 for treatment group versus control group is not significant.
Section VII: Conditional Expectations
(a) The linear mixed model is the same as in Section VI, part (a). The question asks for the
conditional mean of the response at time 3 in the control group.

$E[y_{ij} \mid \text{Treatment} = \text{Placebo}, \text{Time} = 3] = b_0 + b_4 + b_7 + b_{16} = 13.2827 - 0.3938 \approx 12.8890$

Table 21 Estimates
Label Estimate SE DF t Value Pr > |t|
Placebo at Time 3 12.8890 1.1903 756 10.83 <.0001

(b) The generalized estimating equation model is the same as in Section VI, part (b). The
question asks for the conditional mean of the response at time 3 in the control group.

$E[y_{ij} \mid \text{Treatment} = \text{Placebo}, \text{Time} = 3] = b_0 + b_4 + b_7 + b_{16} = 13.3617 - 0.4180 = 12.9437$

Table 22 Contrast Estimate Results


Label Mean Estimate Mean Confidence Limits Standard Error Chi-Square Pr > ChiSq
Placebo at Time 3 12.9437 10.6610 15.2264 1.1647 123.52 <.0001

Section VIII: Least Squares Means


(a) Least Squares Means for the Outcome at Each Time Point in Each Treatment for LMM
Note that the LMM above is still the same as in Section VI, part (a).

Table 23 Least Squares Means


Effect treatment time Estimate Standard Error DF t Value Pr > |t|
treatment*time Gabapentin 0 19.7216 1.1814 756 16.69 <.0001
treatment*time Gabapentin 1 11.9082 1.1864 756 10.04 <.0001
treatment*time Gabapentin 2 10.1457 1.1882 756 8.54 <.0001
treatment*time Gabapentin 3 9.5710 1.1882 756 8.06 <.0001
treatment*time Gabapentin 4 9.2289 1.1882 756 7.77 <.0001
treatment*time Placebo 0 18.2653 1.1859 756 15.40 <.0001
treatment*time Placebo 1 14.6994 1.1903 756 12.35 <.0001
treatment*time Placebo 2 12.3817 1.1903 756 10.40 <.0001
treatment*time Placebo 3 12.8890 1.1903 756 10.83 <.0001
treatment*time Placebo 4 13.2827 1.1903 756 11.16 <.0001

To see how these values are calculated, take for instance treatment Gabapentin at time 0. Then,
$E[y_{ij} \mid \text{Trt} = \text{Gab}, \text{Time} = 0] = b_0 + b_1 + b_6 + b_8 = 13.2827 + 4.9826 - 4.0538 + 5.5101 = 19.7216$
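The same arithmetic extends to every cell of Table 23. The sketch below rebuilds each least squares mean from the fixed-effect estimates in Table 15, with reference levels set to 0; the dictionary and function names are illustrative. Because the inputs are rounded to four decimals, a rebuilt mean can differ from the table in the last digit.

```python
# Fixed-effect estimates from Table 15 (reference levels are 0).
intercept = 13.2827
treatment = {"Gabapentin": -4.0538, "Placebo": 0.0}
time = {0: 4.9826, 1: 1.4167, 2: -0.9010, 3: -0.3938, 4: 0.0}
interaction = {("Gabapentin", 0): 5.5101, ("Gabapentin", 1): 1.2626,
               ("Gabapentin", 2): 1.8179, ("Gabapentin", 3): 0.7359}

def cell_mean(trt, t):
    """Model-based mean for one treatment group at one time point."""
    return intercept + treatment[trt] + time[t] + interaction.get((trt, t), 0.0)

for trt in ("Gabapentin", "Placebo"):
    for t in range(5):
        print(trt, t, round(cell_mean(trt, t), 4))
```

This works directly here because the model contains only treatment, time, and their interaction; with additional covariates, the least squares means would also depend on the covariate values at which they are evaluated.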

Checking Differences in the Means

Table 24 Differences of Least Squares Means


Effect treatment time treatment time Estimate SE DF t Value Pr > |t|
treatment*time Gabapentin 0 Gabapentin 1 7.8134 0.7487 756 10.44 <.0001
treatment*time Gabapentin 0 Gabapentin 2 9.5759 0.7515 756 12.74 <.0001
treatment*time Gabapentin 0 Gabapentin 3 10.1506 0.7515 756 13.51 <.0001
treatment*time Gabapentin 0 Gabapentin 4 10.4927 0.7515 756 13.96 <.0001
treatment*time Gabapentin 0 Placebo 0 1.4563 1.6740 756 0.87 0.3846
treatment*time Gabapentin 0 Placebo 1 5.0222 1.6770 756 2.99 0.0028
treatment*time Gabapentin 0 Placebo 2 7.3399 1.6770 756 4.38 <.0001
treatment*time Gabapentin 0 Placebo 3 6.8326 1.6770 756 4.07 <.0001
treatment*time Gabapentin 0 Placebo 4 6.4389 1.6770 756 3.84 0.0001
treatment*time Gabapentin 1 Gabapentin 2 1.7624 0.7502 756 2.35 0.0191
treatment*time Gabapentin 1 Gabapentin 3 2.3372 0.7502 756 3.12 0.0019
treatment*time Gabapentin 1 Gabapentin 4 2.6793 0.7502 756 3.57 0.0004
treatment*time Gabapentin 1 Placebo 0 -6.3571 1.6775 756 -3.79 0.0002
treatment*time Gabapentin 1 Placebo 1 -2.7912 1.6806 756 -1.66 0.0972
treatment*time Gabapentin 1 Placebo 2 -0.4735 1.6806 756 -0.28 0.7782
treatment*time Gabapentin 1 Placebo 3 -0.9808 1.6806 756 -0.58 0.5596
treatment*time Gabapentin 1 Placebo 4 -1.3746 1.6806 756 -0.82 0.4137
treatment*time Gabapentin 2 Gabapentin 3 0.5747 0.7513 756 0.76 0.4445
treatment*time Gabapentin 2 Gabapentin 4 0.9168 0.7513 756 1.22 0.2227
treatment*time Gabapentin 2 Placebo 0 -8.1196 1.6788 756 -4.84 <.0001
treatment*time Gabapentin 2 Placebo 1 -4.5537 1.6818 756 -2.71 0.0069
treatment*time Gabapentin 2 Placebo 2 -2.2360 1.6818 756 -1.33 0.1841
treatment*time Gabapentin 2 Placebo 3 -2.7432 1.6818 756 -1.63 0.1033
treatment*time Gabapentin 2 Placebo 4 -3.1370 1.6818 756 -1.87 0.0625
treatment*time Gabapentin 3 Gabapentin 4 0.3421 0.7513 756 0.46 0.6490
treatment*time Gabapentin 3 Placebo 0 -8.6943 1.6788 756 -5.18 <.0001
treatment*time Gabapentin 3 Placebo 1 -5.1284 1.6818 756 -3.05 0.0024
treatment*time Gabapentin 3 Placebo 2 -2.8107 1.6818 756 -1.67 0.0951
treatment*time Gabapentin 3 Placebo 3 -3.3180 1.6818 756 -1.97 0.0489
treatment*time Gabapentin 3 Placebo 4 -3.7117 1.6818 756 -2.21 0.0276
treatment*time Gabapentin 4 Placebo 0 -9.0364 1.6788 756 -5.38 <.0001
treatment*time Gabapentin 4 Placebo 1 -5.4705 1.6818 756 -3.25 0.0012
treatment*time Gabapentin 4 Placebo 2 -3.1528 1.6818 756 -1.87 0.0612
treatment*time Gabapentin 4 Placebo 3 -3.6601 1.6818 756 -2.18 0.0298
treatment*time Gabapentin 4 Placebo 4 -4.0538 1.6818 756 -2.41 0.0162
treatment*time Placebo 0 Placebo 1 3.5659 0.7467 756 4.78 <.0001
treatment*time Placebo 0 Placebo 2 5.8836 0.7467 756 7.88 <.0001
treatment*time Placebo 0 Placebo 3 5.3763 0.7467 756 7.20 <.0001
treatment*time Placebo 0 Placebo 4 4.9826 0.7467 756 6.67 <.0001
treatment*time Placebo 1 Placebo 2 2.3177 0.7474 756 3.10 0.0020
treatment*time Placebo 1 Placebo 3 1.8104 0.7474 756 2.42 0.0157
treatment*time Placebo 1 Placebo 4 1.4167 0.7474 756 1.90 0.0584
treatment*time Placebo 2 Placebo 3 -0.5073 0.7474 756 -0.68 0.4975
treatment*time Placebo 2 Placebo 4 -0.9010 0.7474 756 -1.21 0.2284
treatment*time Placebo 3 Placebo 4 -0.3938 0.7474 756 -0.53 0.5985

The yellow highlighted rows show pairs whose means are not significantly different.
Comparing Gabapentin at time point 0 with the later time points shows differences
between the means, as indicated by the small p-values. This suggests that Gabapentin is
lowering the hot flash scores. The two groups start out similar, as indicated by the p-value
of 0.3846. More interestingly, the two groups have significantly different means at time
point 4, as seen from the blue highlight. This difference is calculated as
9.2289 - 13.2827 = -4.0538. The Gabapentin group has a lower mean score at time point 4,
which suggests that Gabapentin helps reduce the hot flash scores more. Moreover, looking at
the last four rows for Placebo, the differences in means between times 1 & 4, 2 & 3, 2 & 4,
and 3 & 4 are not significant. Thus, Placebo seems to be ineffective at reducing the hot
flash scores.

(b) Least Squares Means for the Outcome at Each Time Point in Each Treatment for GEE
Note that the GEE above is still the same as in Section VI, part (b).

Table 25 treatment*time Least Squares Means


treatment time Estimate Standard Error z Value Pr > |z|
Gabapentin 0 19.6829 1.3355 14.74 <.0001
Gabapentin 1 11.9011 0.8934 13.32 <.0001
Gabapentin 2 10.2035 0.8971 11.37 <.0001
Gabapentin 3 9.6573 0.9905 9.75 <.0001
Gabapentin 4 9.3391 0.9471 9.86 <.0001
Placebo 0 18.2653 1.7123 10.67 <.0001
Placebo 1 14.6907 1.3399 10.96 <.0001
Placebo 2 12.4075 1.0722 11.57 <.0001
Placebo 3 12.9437 1.1647 11.11 <.0001
Placebo 4 13.3617 1.1783 11.34 <.0001

To see how these values are calculated, take for instance treatment Gabapentin at time 0. Then,
$E[y_{ij} \mid \text{Trt} = \text{Gab}, \text{Time} = 0] = b_0 + b_1 + b_6 + b_8 = 13.3617 + 4.9036 - 4.0226 + 5.4402 = 19.6829$

Table 26 Differences of treatment*time Least Squares Means


treatment time treatment time Estimate Standard Error z Value Pr > |z|
Gabapentin 0 Gabapentin 1 7.7818 0.8811 8.83 <.0001
Gabapentin 0 Gabapentin 2 9.4794 1.0289 9.21 <.0001
Gabapentin 0 Gabapentin 3 10.0257 1.0294 9.74 <.0001
Gabapentin 0 Gabapentin 4 10.3438 1.0448 9.90 <.0001
Gabapentin 0 Placebo 0 1.4176 2.1716 0.65 0.5139
Gabapentin 0 Placebo 1 4.9922 1.8918 2.64 0.0083
Gabapentin 0 Placebo 2 7.2755 1.7127 4.25 <.0001
Gabapentin 0 Placebo 3 6.7393 1.7720 3.80 0.0001
Gabapentin 0 Placebo 4 6.3212 1.7810 3.55 0.0004
Gabapentin 1 Gabapentin 2 1.6976 0.4170 4.07 <.0001
Gabapentin 1 Gabapentin 3 2.2438 0.6190 3.63 0.0003
Gabapentin 1 Gabapentin 4 2.5620 0.6499 3.94 <.0001
Gabapentin 1 Placebo 0 -6.3642 1.9314 -3.30 0.0010
Gabapentin 1 Placebo 1 -2.7896 1.6104 -1.73 0.0832
Gabapentin 1 Placebo 2 -0.5064 1.3956 -0.36 0.7167
Gabapentin 1 Placebo 3 -1.0426 1.4678 -0.71 0.4775
Gabapentin 1 Placebo 4 -1.4606 1.4787 -0.99 0.3233
Gabapentin 2 Gabapentin 3 0.5462 0.3900 1.40 0.1613
Gabapentin 2 Gabapentin 4 0.8644 0.4511 1.92 0.0553
Gabapentin 2 Placebo 0 -8.0618 1.9331 -4.17 <.0001
Gabapentin 2 Placebo 1 -4.4872 1.6125 -2.78 0.0054
Gabapentin 2 Placebo 2 -2.2040 1.3980 -1.58 0.1149
Gabapentin 2 Placebo 3 -2.7402 1.4701 -1.86 0.0623
Gabapentin 2 Placebo 4 -3.1582 1.4810 -2.13 0.0330
Gabapentin 3 Gabapentin 4 0.3182 0.3008 1.06 0.2902
Gabapentin 3 Placebo 0 -8.6080 1.9781 -4.35 <.0001
Gabapentin 3 Placebo 1 -5.0334 1.6662 -3.02 0.0025
Gabapentin 3 Placebo 2 -2.7502 1.4597 -1.88 0.0596
Gabapentin 3 Placebo 3 -3.2864 1.5289 -2.15 0.0316
Gabapentin 3 Placebo 4 -3.7044 1.5393 -2.41 0.0161
Gabapentin 4 Placebo 0 -8.9262 1.9568 -4.56 <.0001
Gabapentin 4 Placebo 1 -5.3516 1.6408 -3.26 0.0011
Gabapentin 4 Placebo 2 -3.0684 1.4306 -2.14 0.0320
Gabapentin 4 Placebo 3 -3.6046 1.5011 -2.40 0.0163
Gabapentin 4 Placebo 4 -4.0226 1.5118 -2.66 0.0078
Placebo 0 Placebo 1 3.5746 0.7567 4.72 <.0001
Placebo 0 Placebo 2 5.8578 1.0307 5.68 <.0001
Placebo 0 Placebo 3 5.3216 1.0785 4.93 <.0001
Placebo 0 Placebo 4 4.9036 1.1621 4.22 <.0001
Placebo 1 Placebo 2 2.2833 0.5124 4.46 <.0001
Placebo 1 Placebo 3 1.7470 0.5482 3.19 0.0014
Placebo 1 Placebo 4 1.3290 0.6459 2.06 0.0396
Placebo 2 Placebo 3 -0.5362 0.3839 -1.40 0.1625
Placebo 2 Placebo 4 -0.9542 0.4501 -2.12 0.0340
Placebo 3 Placebo 4 -0.4180 0.3144 -1.33 0.1836

The yellow highlighted rows show pairs whose means are not significantly different. The
GEE leads to conclusions similar to those of the LMM. Comparing Gabapentin at time point 0
with the later time points shows differences between the means, as indicated by the small
p-values. This suggests that Gabapentin is lowering the hot flash scores. The two groups start
out similar, as indicated by the p-value of 0.5139. The two groups have significantly
different means at time point 4, as seen from the blue highlight. This difference is
calculated as 9.3391 - 13.3617 = -4.0226. The Gabapentin group has a lower mean score at time
point 4, which suggests that Gabapentin helps reduce the hot flash scores more.

(c) We still use the same LMM but add more covariates to the model. The solution for the fixed
effects of the model is as follows:

Table 27 Solution for Fixed Effects


Effect treatment time smoker Estimate SE DF t Value Pr > |t|
Intercept 3.3879 9.7921 190 0.35 0.7297
treatment Gabapentin -3.8848 1.7166 752 -2.26 0.0239
treatment Placebo 0 . . . .
time 0 4.8835 0.7515 752 6.50 <.0001
time 1 1.3695 0.7522 752 1.82 0.0691
time 2 -0.9305 0.7522 752 -1.24 0.2165
time 3 -0.4147 0.7522 752 -0.55 0.5816
time 4 0 . . . .
smoker 0 -1.6177 2.5915 752 -0.62 0.5327
smoker 1 0 . . . .
wbc -0.3722 0.4793 752 -0.78 0.4377
age 0.2572 0.1763 752 1.46 0.1449
hrt_months 0.001944 0.0195 752 0.10 0.9207
treatment*time Gabapentin 0 5.6118 1.0635 752 5.28 <.0001
treatment*time Gabapentin 1 1.3083 1.0630 752 1.23 0.2188
treatment*time Gabapentin 2 1.8474 1.0638 752 1.74 0.0829
treatment*time Gabapentin 3 0.7568 1.0638 752 0.71 0.4770
treatment*time Gabapentin 4 0 . . . .

Table 28 Least Squares Means


Effect treatment time Estimate SE DF t Value Pr > |t|
treatment*time Gabapentin 0 20.4683 1.5601 752 13.12 <.0001
treatment*time Gabapentin 1 12.6509 1.5658 752 8.08 <.0001
treatment*time Gabapentin 2 10.8899 1.5669 752 6.95 <.0001
treatment*time Gabapentin 3 10.3152 1.5669 752 6.58 <.0001
treatment*time Gabapentin 4 9.9731 1.5669 752 6.36 <.0001
treatment*time Placebo 0 18.7413 1.5914 752 11.78 <.0001
treatment*time Placebo 1 15.2273 1.5938 752 9.55 <.0001
treatment*time Placebo 2 12.9273 1.5938 752 8.11 <.0001
treatment*time Placebo 3 13.4431 1.5938 752 8.43 <.0001
treatment*time Placebo 4 13.8579 1.5938 752 8.69 <.0001

The adjusted LS means are higher than the ones calculated without the additional covariates
in the model, and their standard errors are also larger. However, the trend of each
treatment across the time points is still the same.
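
The link between Table 27 and Table 28 can be made explicit. Because Placebo and time 4 are the reference levels, and the covariates are held at the same values for both groups when LS means are computed, the Gabapentin minus Placebo difference at each time point should equal the treatment coefficient plus the matching treatment*time coefficient. The Python sketch below checks this with the printed values. The variable and function names are illustrative only, and agreement is expected only up to rounding.

```python
# Fixed-effect estimates copied from Table 27 (reference levels: Placebo, time 4)
treatment_gabapentin = -3.8848
time_effect = {0: 4.8835, 1: 1.3695, 2: -0.9305, 3: -0.4147, 4: 0.0}
interaction_gabapentin = {0: 5.6118, 1: 1.3083, 2: 1.8474, 3: 0.7568, 4: 0.0}

# LS means copied from Table 28
lsmeans = {
    "Gabapentin": {0: 20.4683, 1: 12.6509, 2: 10.8899, 3: 10.3152, 4: 9.9731},
    "Placebo": {0: 18.7413, 1: 15.2273, 2: 12.9273, 3: 13.4431, 4: 13.8579},
}


def group_difference_from_coefficients(t):
    """Gabapentin minus Placebo at time t, built from the Table 27 coefficients."""
    return treatment_gabapentin + interaction_gabapentin[t]


def group_difference_from_lsmeans(t):
    """Gabapentin minus Placebo at time t, taken directly from Table 28."""
    return lsmeans["Gabapentin"][t] - lsmeans["Placebo"][t]


for t in range(5):
    from_coef = group_difference_from_coefficients(t)
    from_ls = group_difference_from_lsmeans(t)
    print(f"time {t}: coefficients {from_coef:.4f}, LS means {from_ls:.4f}")
```

The same logic applies within the placebo group: the placebo LS mean at time t minus the placebo LS mean at time 4 matches the time t coefficient.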
The same GEE is refitted with the additional covariates. The parameter estimates are as follows:

Table 29 Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits (Lower Upper) Z Pr > |Z|
Intercept 2.0743 10.9789 -19.4439 23.5925 0.19 0.8501
treatment Gabapentin -3.8648 1.5612 -6.9248 -0.8049 -2.48 0.0133
treatment Placebo 0.0000 0.0000 0.0000 0.0000 . .
time 0 4.8097 1.1728 2.5111 7.1083 4.10 <.0001
time 1 1.2886 0.6526 0.0094 2.5678 1.97 0.0483
time 2 -0.9796 0.4544 -1.8703 -0.0889 -2.16 0.0311
time 3 -0.4371 0.3173 -1.0590 0.1848 -1.38 0.1683
time 4 0.0000 0.0000 0.0000 0.0000 . .
smoker 0 -1.6176 2.3691 -6.2609 3.0258 -0.68 0.4947
smoker 1 0.0000 0.0000 0.0000 0.0000 . .
wbc -0.3124 0.4109 -1.1178 0.4930 -0.76 0.4471
age 0.2767 0.2239 -0.1622 0.7156 1.24 0.2166
hrt_months 0.0008 0.0148 -0.0283 0.0298 0.05 0.9590
treatment*time Gabapentin 0 5.5379 1.5681 2.4644 8.6113 3.53 0.0004
treatment*time Gabapentin 1 1.2732 0.9196 -0.5292 3.0757 1.38 0.1662
treatment*time Gabapentin 2 1.8439 0.6401 0.5894 3.0984 2.88 0.0040
treatment*time Gabapentin 3 0.7553 0.4371 -0.1014 1.6119 1.73 0.0840
treatment*time Gabapentin 4 0.0000 0.0000 0.0000 0.0000 . .

Table 30 treatment*time Least Squares Means


treatment time Estimate Standard Error z Value Pr > |z|
Gabapentin 0 20.4213 1.5436 13.23 <.0001
Gabapentin 1 12.6356 1.2794 9.88 <.0001
Gabapentin 2 10.9381 1.2622 8.67 <.0001
Gabapentin 3 10.3919 1.3172 7.89 <.0001
Gabapentin 4 10.0738 1.2874 7.83 <.0001
Placebo 0 18.7483 2.0688 9.06 <.0001
Placebo 1 15.2272 1.7089 8.91 <.0001
Placebo 2 12.9590 1.5360 8.44 <.0001
Placebo 3 13.5015 1.5956 8.46 <.0001
Placebo 4 13.9386 1.5909 8.76 <.0001

As with the LMM, the adjusted GEE LS means are higher than the ones calculated without the
additional covariates in the model, and their standard errors are also larger. However, the
trend of each treatment across the time points is still the same.
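
The 95% confidence limits reported in Table 29 can be reproduced from the estimate and its empirical standard error. The Python sketch below assumes the limits are Wald intervals, estimate ± z * SE with z taken from the standard normal distribution. The helper name wald_ci is illustrative and not part of the original analysis, and the rows are copied from Table 29.

```python
from statistics import NormalDist


def wald_ci(estimate, se, level=0.95):
    """Return (lower, upper) Wald confidence limits for a normal-based estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return estimate - z * se, estimate + z * se


# parameter: (estimate, standard error, reported lower, reported upper) from Table 29
table29 = {
    "treatment Gabapentin": (-3.8648, 1.5612, -6.9248, -0.8049),
    "time 1": (1.2886, 0.6526, 0.0094, 2.5678),
    "time 2": (-0.9796, 0.4544, -1.8703, -0.0889),
}

for name, (est, se, lo_reported, hi_reported) in table29.items():
    lo, hi = wald_ci(est, se)
    print(f"{name}: computed ({lo:.4f}, {hi:.4f}), "
          f"reported ({lo_reported:.4f}, {hi_reported:.4f})")
```

An interval that excludes zero corresponds to a p-value below 0.05 in the same row, as for the treatment Gabapentin effect.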

/* Section I */
/* Plot a spaghetti plot for outcome over time for all patients*/
proc sgplot data = work.mydata;
title 'Outcome Over Time';
series x=time y=outcome / group=id grouplc=treatment name='grouping';
keylegend 'grouping' / type=linecolor;
reg x=time y=outcome/nomarkers lineattrs=(color=black thickness=4);
run;

/* Box plot for outcome over time for all patients */
proc sgplot data = work.mydata;
vbox outcome/category = time connect=mean;
title 'Outcome Over Time';
run;

/* Plot a spaghetti plot for outcome over time for each group separately */
proc sgplot data = work.mydata;
where treatment = 'Placebo';
title 'Outcome Over Time';
series x=time y=outcome / group=id grouplc=treatment name='grouping';
keylegend 'grouping' / type=linecolor;
reg x=time y=outcome/nomarkers lineattrs=(color=black thickness=4);
run;

proc sgplot data = work.mydata;
where treatment = 'Gabapentin';
title 'Outcome Over Time';
series x=time y=outcome / group=id grouplc=treatment name='grouping';
keylegend 'grouping' / type=linecolor;
reg x=time y=outcome/nomarkers lineattrs=(color=black thickness=4);
run;

proc sgplot data = work.mydata;
where treatment = 'Gabapentin';
vbox outcome /category = time group = treatment connect=mean;
title 'Outcome Over Time for Gabapentin';
run;

/* Box Plot for outcome over time for each group separately */
proc sgplot data = work.mydata;
where treatment = 'Placebo';
vbox outcome /category = time group = treatment connect=mean;
title 'Outcome Over Time for Placebo';
run;

/* Box Plot for outcome over time for each group */
proc sgplot data = work.mydata;
vbox outcome /category = time group = treatment connect=mean;
title 'Outcome Over Time for Each Group';
run;

/* Estimate the variance-covariance structure and correlation of the repeated
measures outcome overall */
proc mixed data=work.mydata covtest;
class id treatment time;
model outcome = treatment time treatment*time/s;
repeated/subject=id(treatment) type=un r rcorr;
run;
/* Estimate the variance-covariance structure of the repeated measures
outcome within each of the treatment groups */
proc mixed data=work.mydata covtest;
class id treatment time;
model outcome = treatment time treatment*time/s;
/* r=1,190 and rcorr=1,190 print the R and correlation blocks for subjects 1 and 190 */
repeated/subject=id(treatment) type=un group=treatment r=1,190 rcorr=1,190;
run;

/* Section II */
/* The association between each predictor individually and the outcome */
proc mixed data=work.mydata covtest;
class id treatment time;
model outcome=time/ddfm=kr s;
repeated/subject=id(treatment) type=un r rcorr;
run;

proc mixed data=work.mydata covtest;
class id treatment;
model outcome=treatment/ddfm=kr s;
repeated/subject=id(treatment) type=un r rcorr;
run;

proc mixed data=work.mydata covtest;
class id treatment smoker;
model outcome=smoker /ddfm=kr s;
repeated/subject=id(treatment) type=un r rcorr;
run;

proc mixed data=work.mydata covtest;
class id treatment;
model outcome=hrt_months /ddfm=kr s;
repeated/subject=id(treatment) type=un r rcorr;
run;

proc mixed data=work.mydata covtest;
class id treatment;
model outcome=wbc /ddfm=kr s;
repeated/subject=id(treatment) type=un r rcorr;
run;

proc mixed data=work.mydata covtest;
class id treatment;
model outcome=age /ddfm=kr s;
repeated/subject=id(treatment) type=un r rcorr;
run;

/* Scatterplot of time and outcome */
title 'Scatterplot of Time and Outcome';
proc loess data=work.mydata;
model outcome=time;
run;

/* Final Model */

proc mixed data=work.mydata covtest;
class id treatment time smoker ;
model outcome= time treatment age smoker time*treatment /ddfm=kr s;
repeated/subject=id(treatment) type=un r rcorr;
run;

/* Section IV */
/* The Final Linear Mixed Model */
proc mixed data=work.mydata;
class id treatment time smoker;
model outcome=time treatment age smoker time*treatment;
random time/type=un subject=id(treatment) g;
run;

/* Assess the adequacy of the fit */
proc mixed data=work.mydata covtest;
class id treatment time smoker;
model outcome=time treatment age smoker time*treatment/residual outpm=out;
random time/type=un subject=id(treatment);
run;

/* After log transformation and the removal of outlier id 318 */
data hf;
set work.hf1;
logout = log(outcome);
run;

ods graphics on;
ods html style=statistical;
/* Model the log-transformed outcome (logout) created in the data step above */
proc mixed data=hf covtest;
class id treatment time smoker;
model logout=time treatment age smoker time*treatment/residual outpm=out;
random time/type=un subject=id(treatment);
run;

/* Section V */
/* The Final Generalized Estimating Equation */
proc genmod data=work.mydata;
class id time treatment smoker;
model outcome=time treatment age smoker time*treatment /d=n link=identity;
repeated subject=id/type=cs within=time covb corrw;
run;

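/* Assess the adequacy of the fit */
/* A function call such as log10(outcome) is not allowed in the MODEL statement,
so the transformed response is created in a data step first.
The names mydata_log and log10out are illustrative. */
data work.mydata_log;
set work.mydata;
log10out = log10(outcome);
run;

ods graphics on;
proc genmod plots = all data=work.mydata_log;
class id treatment time (ref="0")/param=ref;
model log10out=time treatment age smoker time*treatment /d=n link=identity;
repeated subject=id(treatment) /type=cs within=time;
run;
ods graphics off;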

/* After log transformation and the removal of outlier id 318 */
ods graphics on;
proc genmod plots = all data=hf;
class id time treatment smoker;
/* logout = log(outcome), created in the hf data step */
model logout=time treatment age smoker time*treatment /d=n link=identity;
repeated subject=id/type=cs within=time covb corrw;
run;
ods graphics off;

/* Sections VI, VII, and VIII */


/* Linear Mixed Model */
proc mixed data=work.mydata covtest;
class id treatment time;
model outcome=treatment time treatment*time/s;
random int/s subject=id type=cs;
repeated /subject=id type=cs;
contrast 'Time 2 - Time 1' time 0 -1 1 0 0;
contrast '(G2-G1)-(P2-P1)' treatment*time 0 -1 1 0 0 0 1 -1 0 0;
estimate 'Placebo at Time 3' intercept 1
treatment 0 1
time 0 0 0 1 0
treatment*time 0 0 0 0 0 0 0 0 1 0;
lsmeans treatment*time/diff;
run;

/* Generalized Estimating Equations */
proc genmod data=work.mydata;
class id treatment time;
model outcome=treatment time treatment*time /d=n link=identity;
repeated subject=id(treatment)/type=ar(1) within=time;
contrast 'Time 2 - Time 1' time 0 -1 1 0 0;
contrast '(G2-G1)-(P2-P1)' treatment*time 0 -1 1 0 0 0 1 -1 0 0;
estimate 'Placebo at Time 3' intercept 1
treatment 0 1
time 0 0 0 1 0
treatment*time 0 0 0 0 0 0 0 0 1 0;
lsmeans treatment*time/diff;
run;

/* Section VIII */
/* Linear Mixed Model */
proc mixed data=work.mydata covtest;
class id treatment time smoker;
model outcome=treatment time smoker wbc age hrt_months treatment*time/s;
random int/s subject=id type=cs;
repeated /subject=id type=cs;
lsmeans treatment*time/diff;
run;

/* Generalized Estimating Equations */
proc genmod data=work.mydata;
class id treatment time smoker;
model outcome=treatment time smoker wbc age hrt_months treatment*time/d=n
link=identity;
repeated subject=id(treatment)/type=ar(1) within=time;
lsmeans treatment*time/diff;
run;
