(a) Plot the outcome over time and describe any patterns.
Figure 1 The trend of outcome for both Gabapentin and Placebo groups
Figure 2 The Boxplots of outcome for both Gabapentin and Placebo groups
There is one obvious outlier, id 318 in the placebo group. In the top diagram, this
individual's scores lie far above the others at every time point, although they also decline as
time progresses. The downward-sloping black line shows the general trend across patients: hot
flash scores improve over time. In the bottom diagram, scores are lower and more concentrated
by time 2. There are also several outliers at each time point. For instance, at time point 0,
Q3 + 1.5*(Q3 - Q1) = 46.05, so any score above 46.05 is flagged as an outlier. At each time
point the mean exceeds the median, which indicates that the outcome is right-skewed. Lastly,
the scores decline between times 0 and 2 and stay roughly constant between times 3 and 4.
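The outlier rule quoted above, Q3 + 1.5*(Q3 - Q1), can be sketched in a few lines of Python. The scores below are hypothetical values chosen for illustration, not the study data, and the quartile definition (`method="inclusive"`) is an assumption that may differ from the one used by the software that produced the box plots.

```python
import statistics

def upper_fence(values):
    """Return Q3 + 1.5 * (Q3 - Q1), the upper cutoff for box-plot outliers."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)

# Hypothetical hot flash scores at a single time point (NOT the study data).
scores = [2.0, 5.5, 8.0, 10.0, 12.5, 15.0, 18.0, 21.0, 60.0]

fence = upper_fence(scores)
outliers = [s for s in scores if s > fence]
print("upper fence:", fence)
print("flagged as outliers:", outliers)
```

With these made-up values, Q1 = 8 and Q3 = 18, so the cutoff is 33 and only the score of 60 is flagged.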
(b) Plot the outcome over time in each respective treatment group and describe any patterns.
Figure 3 Spaghetti plots of outcome over time for Placebo and Gabapentin (separate version)
Figure 4 Boxplots of outcome over time for Placebo and Gabapentin (separate version)
The hot flash scores for the placebo group seem to decline over time overall, but its
spaghetti plot shows only a modest decline, and its box plot shows that the scores actually
increase slightly after time 2. In contrast, the Gabapentin group shows a decreasing trend in
hot flash scores in both its spaghetti plot and its box plot. This suggests that Gabapentin
is working.
(c) For each time point, create a pair of side-by-side box plots of the outcome in each
treatment group.
Figure 5 Boxplots of outcome over time for Placebo and Gabapentin (combined version)
In the placebo group, the hot flash scores decline between time 0 and time 2 and then increase
slightly between time 3 and time 4. There are also several outliers at each time point.
In the Gabapentin group, there is a gradual decline in hot flash scores at each time point.
This group also has several outliers, but they are less spread out than those of the placebo
group at each time point. At times 3 and 4, the box plots show lower ranges of hot flash
scores for Gabapentin than for the placebo group. Thus, it seems that Gabapentin is working.
(d) The variance-covariance structure of the repeated measures outcome overall:
Table 1 The Estimated R matrix
Estimated R Matrix for id(treatment) 165 Gabapentin
Row Col1 Col2 Col3 Col4 Col5
1 233.05 148.42 113.41 119.92 112.91
2 148.42 128.72 101.96 105.30 99.9560
3 113.41 101.96 96.2750 98.3692 94.2333
4 119.92 105.30 98.3692 114.86 108.72
5 112.91 99.9560 94.2333 108.72 111.74
Table 2 The Estimated R Correlation Matrix
Estimated R Correlation Matrix for id(treatment) 165 Gabapentin
Row Col1 Col2 Col3 Col4 Col5
1 1.0000 0.8569 0.7572 0.7330 0.6997
2 0.8569 1.0000 0.9159 0.8660 0.8335
3 0.7572 0.9159 1.0000 0.9354 0.9086
4 0.7330 0.8660 0.9354 1.0000 0.9596
5 0.6997 0.8335 0.9086 0.9596 1.0000
Measurements taken close together in time tend to be highly correlated. For instance, the
entry row1-col2 (measurements at time 0 and time 1) is 0.8569. The correlation decreases as
the two measurements get farther apart: the entry row1-col5 (measurements at time 0 and
time 4) is 0.6997, which is lower than the entry row1-col2.
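As a check on how Table 2 relates to Table 1, the sketch below converts the estimated covariance matrix into a correlation matrix by dividing each entry by the product of the two standard deviations. The numbers are copied from Table 1; the function name `cov_to_corr` is just an illustrative choice.

```python
import math

# Estimated R (covariance) matrix, copied from Table 1.
R = [
    [233.05, 148.42, 113.41, 119.92, 112.91],
    [148.42, 128.72, 101.96, 105.30, 99.9560],
    [113.41, 101.96, 96.2750, 98.3692, 94.2333],
    [119.92, 105.30, 98.3692, 114.86, 108.72],
    [112.91, 99.9560, 94.2333, 108.72, 111.74],
]

def cov_to_corr(cov):
    """corr[i][j] = cov[i][j] / (sd_i * sd_j), with sd_i = sqrt(cov[i][i])."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]

corr = cov_to_corr(R)
for row in corr:
    print(" ".join(f"{value:.4f}" for value in row))
```

For example, the time 0 and time 1 entry is 148.42 / sqrt(233.05 * 128.72), which agrees with the 0.8569 in Table 2 up to the rounding of the printed covariances.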
(e) The variance-covariance structure of the repeated measures outcome within each of the
treatment groups:
Table 3 The Estimated R matrix
Estimated R Matrix for id(treatment) 165 Gabapentin
Row Col1 Col2 Col3 Col4 Col5
1 176.37 91.3819 77.2150 85.3764 79.1819
2 91.3819 80.3184 71.5092 70.1267 63.6879
3 77.2150 71.5092 79.2628 80.6509 73.5679
4 85.3764 70.1267 80.6509 96.5993 87.6164
5 79.1819 63.6879 73.5679 87.6164 87.3776
Two measurements that are close to each other in time tend to be more strongly related than
two measurements that are far apart (compare entries row5-col1 and row5-col4).
Table 5 The Estimated R matrix
Estimated R Matrix for id(treatment) 364 Placebo
Row Col1 Col2 Col3 Col4 Col5
1 290.30 206.03 149.99 154.88 147.01
2 206.03 177.55 132.71 140.79 136.46
3 149.99 132.71 113.50 116.34 115.07
4 154.88 140.79 116.34 133.41 130.00
5 147.01 136.46 115.07 130.00 136.18
Two measurements that are close to each other in time tend to be more strongly related than
two measurements that are far apart (compare entries row5-col1 and row5-col4). Thus, we can
conclude that the correlation patterns are similar in the two treatment groups.
Section II: Bivariate Models
(a) Consider the association between each predictor (individually) and the outcome.
Suppose the variance-covariance structure R is unstructured and suppose ddfm = kr.
(b) Consider using time as a categorical and a continuous predictor.
Time is continuous
Deviance statistic = (-2 Res Log Likelihood of the continuous-time model) - (-2 Res Log
Likelihood of the categorical-time model) = 6013.8 - 5935.2 = 78.6, which exceeds the
$\chi^2_4$ critical value. The p-value is 3.330669e-16, which means that the categorical
time model fits better than the continuous time model.
In other words, the model is not improved by changing time to continuous. We can assess
the linearity of the relationship through a scatterplot with an overlaid smoothed curve.
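The deviance comparison above can be reproduced with a short sketch. It assumes the reference distribution is chi-square with 4 degrees of freedom, for which the upper-tail probability has the closed form exp(-x/2) * (1 + x/2). The function name is illustrative.

```python
import math
import sys

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 df."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

deviance_continuous = 6013.8   # -2 Res Log Likelihood, time as continuous
deviance_categorical = 5935.2  # -2 Res Log Likelihood, time as categorical

statistic = deviance_continuous - deviance_categorical
p_value = chi2_sf_df4(statistic)

print("deviance difference:", round(statistic, 1))
print("p-value (closed form):", p_value)
print("1.5 * machine epsilon:", 1.5 * sys.float_info.epsilon)
```

The closed form gives a p-value of roughly 3.4e-16. The reported 3.330669e-16 coincides with 1.5 times double-precision machine epsilon, which is what one might see if the p-value was obtained as one minus a cumulative probability. Either way the conclusion is the same: the p-value is essentially zero.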
Here time and outcome seem to have a negative linear relationship. In the categorical case, at
time 3, there is a negative association between time and outcome. This seems reasonable because
we expect the outcome to go down as the time point goes up.
Section III: Multivariable Models
(a) Consider some of the variable selection methods discussed to date.
There are two common variable selection methods: forward selection and backward elimination.
We start with forward selection, which begins with no covariates. We check AIC, AICC, and BIC
as each new predictor is added to the model, and continue until no further predictor can be
added. Then we apply backward elimination using the likelihood-ratio statistic. The same
procedure is used for the interaction terms.
17 time+trt+age+smoker 5928.3 5977.5
18 time+trt+age+hrt_months 5938.5 5987.6
19 time+trt+age+wbc 5931.7 5980.9
Step V: 5 Covariates
20 time+trt+age+smoker+hrt_months 5934.6 5983.8
21 time+trt+age+smoker+wbc 5927.6 5976.8
Step VI: 6 Covariates
22 time+trt+age+smoker+wbc+hrt_months 5933.9 5983.0
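The two numeric columns in the listing above are not labeled in this copy. The sketch below checks one plausible reading: that they are AIC and BIC computed from the deviance (-2 Res Log Likelihood) under an unstructured 5 by 5 covariance matrix. Three things are assumptions here: that the penalty counts only the 15 covariance parameters, that BIC uses the log of the number of subjects, and that there are 196 subjects (the row count quoted for Z elsewhere in the report). The deviances 5898.3 and 5897.6 are the ones quoted for models 17 and 21 in the model comparison.

```python
import math

N_COV_PARAMS = 5 * (5 + 1) // 2  # 15 free parameters in an unstructured 5x5 matrix
N_SUBJECTS = 196                 # assumption: number of subjects in the study

def aic(deviance, k=N_COV_PARAMS):
    """AIC = deviance + 2k."""
    return deviance + 2 * k

def bic(deviance, k=N_COV_PARAMS, n=N_SUBJECTS):
    """BIC = deviance + k * log(n)."""
    return deviance + k * math.log(n)

for label, deviance in [("model 17", 5898.3), ("model 21", 5897.6)]:
    print(label, round(aic(deviance), 1), round(bic(deviance), 1))
```

Under these assumptions the sketch reproduces the listed pairs for models 17 and 21, which supports reading the columns as AIC and BIC.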
Now we compare model 17 and model 21 by their deviance difference to see which model is better.
Then we can consider interaction terms by forward selection. LRT = deviance17 - deviance21 =
5898.3 - 5897.6 = 0.7 < $\chi^2_1$ = 3.84, and the p-value is 0.4027837, which indicates that
the simpler model 17 is better. Thus, model 21 is eliminated. We then compare model 16 with
model 17. LRT = deviance16 - deviance17 = 5902.2 - 5898.3 = 3.9 > $\chi^2_1$ = 3.84, and the
p-value is 0.048. This value is borderline, so it is up to the researcher to decide whether
smoker is an important covariate in the study. In this case, the p-value indicates that model
17 is better.
Now we compare models 31 and 24 by LRT = deviance24 - deviance31 = 5870.3 - 5836.4 = 33.9 >
$\chi^2_9$ = 16.92, and the p-value is 9.3e-05. This suggests that the more complex model is
actually better. However, the new interaction uses up 9 degrees of freedom and may change the
meaning of the lower-order coefficients. None of its terms is significant, so to avoid further
complicating the model, the interaction term is not included. Thus, model 24 is the final model.
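The likelihood-ratio p-values quoted in this section can be recomputed with the standard library alone. The sketch below uses the finite-sum expressions for the chi-square upper tail at integer degrees of freedom; the function name `chi2_sf` is an illustrative choice, and the degrees of freedom (1, 1, and 9) are the ones implied by the critical values 3.84 and 16.92.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x.
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total

comparisons = [
    ("deviance17 - deviance21", 5898.3 - 5897.6, 1),
    ("deviance16 - deviance17", 5902.2 - 5898.3, 1),
    ("deviance24 - deviance31", 5870.3 - 5836.4, 9),
]
for label, lrt, df in comparisons:
    print(f"{label}: LRT = {lrt:.1f}, df = {df}, p = {chi2_sf(lrt, df):.6g}")
```

The three results agree with the p-values reported in the text (about 0.403, 0.048, and 9.3e-05).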
Table 12 Solution for Fixed Effects
Effect treatment time smoker Estimate SD DF t Value Pr > |t|
The linear mixed model has the form $y = X\beta + Z\gamma + \varepsilon$, where y is the
vector of outcomes, X is the design matrix of the predictor variables, $\beta$ is the vector
of fixed-effect coefficients, Z is the random-effects design matrix, $\gamma$ is the vector
of random effects, and $\varepsilon$ is the vector of residuals.
Now we have to consider the random effects. Consider treating the time points as random
effects. The reason is that our repeated observations are correlated, and a random time
effect is one way to model the correlation among repeated measures within the same subject.
The random-effects design matrix has one indicator column per time point, plus an intercept
column if one is included. An entry equals 1 if the observation belongs to that time point
and 0 otherwise. If we pick time without an intercept for the design matrix Z, then the Z
matrix is 196 by 5. If we pick time with an intercept, then the Z matrix is 196 by 6. If we
only pick time, then the design matrix Z is 196 by 1. Recall that the variance of y is
$V = ZGZ^\top + R$. In our case, we fit the random portion of the model by specifying the
terms that define the random design matrix Z and specifying the structure of the matrix G.
Then we compare the candidate models and choose the one with the lowest AIC and BIC. Let us
assume complete independence across subjects.
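To make the variance formula concrete, the sketch below builds V = Z G Z' + R for a single subject with five time points in the simplest case, a random intercept only. The variance values are hypothetical and chosen only for illustration; they are not estimates from the report.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

n_times = 5
sigma2_b = 90.0  # hypothetical random-intercept variance (assumption)
sigma2_e = 25.0  # hypothetical residual variance (assumption)

Z = [[1.0] for _ in range(n_times)]  # 5 x 1: intercept column only
G = [[sigma2_b]]                     # 1 x 1 covariance of the random intercept
R = [[sigma2_e if i == j else 0.0 for j in range(n_times)]
     for i in range(n_times)]        # independent residuals

ZGZt = matmul(matmul(Z, G), transpose(Z))
V = [[ZGZt[i][j] + R[i][j] for j in range(n_times)] for i in range(n_times)]

for row in V:
    print(row)
print("implied within-subject correlation:", V[0][1] / V[0][0])
```

In this special case every diagonal entry of V equals the sum of the two variances and every off-diagonal entry equals the random-intercept variance, which is the compound symmetry pattern. Richer choices of Z and G give more flexible patterns.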
The Residual vs Predicted Mean plot shows residuals scattered randomly around the 0 line.
However, there are several outliers. The Q-Q plot looks curved, with some outliers at each end.
We can also look at the residual percent plot. The majority of the residuals fall between -10
and 10. The distribution is right-skewed, which indicates that some positive outliers are
distorting the normality of the residuals. This right skewness violates the normality
assumption of the linear mixed model. To improve the model, we removed the outlier id 318 and
applied a log transformation.
The AIC and BIC both improved, and the residual percent plot suggests that the residuals became
closer to normal. However, the curvature of the Q-Q plot is still apparent. The Residual vs
Predicted Mean plot also improved after the log transformation and the removal of the outlier.
Section V: Generalized Estimating Equations (GEEs)
(a) GEE models are an extension of generalized linear models to correlated data. The additional
specification in a GEE model is the working correlation structure of the repeated measures.
In our case, the working correlation matrix Ri is 5 by 5 and is the same for all
individuals. This Ri matrix models the correlation among the repeated measures within a
subject. We build our GEE model based on the Quasi-likelihood under the Independence
Model Criterion (QIC), which helps researchers choose the best correlation structure:
the smaller the QIC, the better. We keep the same covariates as in the previous
questions. According to the QIC, the exchangeable working correlation structure provides
the smallest QIC.
Table 14 Model Selection
Correlation Structure Types QIC
Autoregressive(1) 983.6723
Exchangeable or CS 981.3564
Independent 982.6149
Unstructured 2468.5068
M-dependent 1014.9139
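The selection rule in Table 14 (smaller QIC is better) can be written out as a small sketch. The QIC values are copied from the table; the gap to the best structure is added only to show how close the leading candidates are.

```python
# QIC values copied from Table 14.
qic = {
    "Autoregressive(1)": 983.6723,
    "Exchangeable or CS": 981.3564,
    "Independent": 982.6149,
    "Unstructured": 2468.5068,
    "M-dependent": 1014.9139,
}

best = min(qic, key=qic.get)
ranked = sorted(qic.items(), key=lambda item: item[1])

print("smallest QIC:", best)
for name, value in ranked:
    print(f"{name:20s} {value:10.4f}  (gap to best: {value - qic[best]:.4f})")
```

The exchangeable structure has the smallest QIC, although the independent and AR(1) structures are within about 2.4 units of it.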
The GEE method yields consistent estimates of the regression coefficients and standard errors,
even when the correlation structure is misspecified. However, efficiency (and therefore
statistical power) is reduced if the choice of R is incorrect. This loss of efficiency
lessens as the number of subjects gets large.
(b) Assess the adequacy of the fitted model
The raw residual and Pearson residual plots show that the residuals are randomly scattered
around the 0 line. However, there are several outliers that may be distorting the model.
Moreover, the Cook's distance plot and the leverage plot show a potentially influential
observation. This observation needs further investigation because it may distort the results
and accuracy of the GEE model. After deleting the extreme observation id 318, the GEE model's
QIC improved to 974.3188. However, the diagnostic statistics for the outcome below suggest
that there might still be other influential observations in the data.
Figure 10 Adjusted diagnostic plots for the generalized estimating equation
(a) Let us suppose that the LMM has G and R matrices that follow the compound symmetry
structure. For the contrast, we can write the model as follows:
Yij = b0 + b1t0 + b2t1 + b3t2 + b4t3 + b5t4 + b6G + b7P + b8Gt0 + b9Gt1 + b10Gt2 + b11Gt3 + b12Gt4 +
b13Pt0 + b14Pt1 + b15Pt2 + b16Pt3 + b17Pt4 + eij
We want the difference between time 1 and time 2: T1 is b0 + b2 and T2 is b0 + b3. The
difference between them is T2 - T1 = (b0 + b3) - (b0 + b2) = b3 - b2. In SAS, the L matrix
for this contrast is time 0 -1 1 0 0.
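The contrast b3 - b2 is just the dot product of the L vector with the time coefficients. The sketch below applies the L vector 0 -1 1 0 0 to the time estimates from Table 15. Because the model also contains a treatment*time interaction and Placebo is the reference level, this number is best read as the Placebo-group change from time 1 to time 2. The F test reported for the contrast may be based on coefficients that also involve the interaction terms, so the sketch is not claimed to reproduce that test.

```python
# Time coefficients for time 0..4, copied from Table 15 (time 4 is the reference).
time_coef = [4.9826, 1.4167, -0.9010, -0.3938, 0.0]

# L vector for "time 2 minus time 1".
L_time = [0, -1, 1, 0, 0]

def contrast(L, beta):
    """Linear combination L'beta."""
    return sum(l * b for l, b in zip(L, beta))

time2_minus_time1 = contrast(L_time, time_coef)
print(round(time2_minus_time1, 4))
```

The result, about -2.3177, matches in magnitude the Placebo time 1 versus Placebo time 2 difference of 2.3177 in the table of pairwise differences.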
Below is the solution for the fixed effects of the LMM by proc mixed:
Table 15 Solution for Fixed Effects
Effect treatment time Estimate Standard Error DF t Value Pr > |t|
Intercept 13.2827 1.1903 195 11.16 <.0001
treatment Gabapentin -4.0538 1.6818 756 -2.41 0.0162
treatment Placebo 0 . . . .
time 0 4.9826 0.7467 756 6.67 <.0001
time 1 1.4167 0.7474 756 1.90 0.0584
time 2 -0.9010 0.7474 756 -1.21 0.2284
time 3 -0.3938 0.7474 756 -0.53 0.5985
time 4 0 . . . .
treatment*time Gabapentin 0 5.5101 1.0594 756 5.20 <.0001
treatment*time Gabapentin 1 1.2626 1.0590 756 1.19 0.2335
treatment*time Gabapentin 2 1.8179 1.0598 756 1.72 0.0867
treatment*time Gabapentin 3 0.7359 1.0598 756 0.69 0.4877
treatment*time Gabapentin 4 0 . . . .
Table 16 Contrasts
Label Num DF Den DF F Value Pr > F
Time 2 - Time 1 1 756 14.84 0.0001
The p-value is 0.0001, and this indicates that the difference in hot flash scores between time 1
and time 2 is significant.
(b) Let us suppose the working correlation structure is AR(1). The model for the contrast is
the same as in part (a), and in SAS the L matrix, time 0 -1 1 0 0, also stays the same.
The p-value is 0.0001, and this indicates that the difference in hot flash scores between time 1
and time 2 is significant.
(c) Let us suppose that the LMM has G and R matrices that follow the compound symmetry
structure. For the contrast, we can write the model as follows:
Yij = b0 + b1t0 + b2t1 + b3t2 + b4t3 + b5t4 + b6G + b7P + b8Gt0 + b9Gt1 + b10Gt2 + b11Gt3 + b12Gt4 +
b13Pt0 + b14Pt1 + b15Pt2 + b16Pt3 + b17Pt4 + eij
The p-value is 0.6002, and this indicates that the difference in hot flash scores between
time 1 and time 2 for treatment group versus control group is not significant.
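The quantity being tested here is a difference in differences, (G2 - G1) - (P2 - P1). As a sketch, it can be computed from the treatment*time estimates in Table 15 using the L vector 0 -1 1 0 0 0 1 -1 0 0, where the first five positions refer to Gabapentin at times 0 to 4 and the last five to Placebo. Placebo is the reference level, so its interaction coefficients are all zero.

```python
# treatment*time coefficients from Table 15, ordered by time 0..4.
gabapentin_by_time = [5.5101, 1.2626, 1.8179, 0.7359, 0.0]
placebo_by_time = [0.0, 0.0, 0.0, 0.0, 0.0]  # reference level

beta = gabapentin_by_time + placebo_by_time
L = [0, -1, 1, 0, 0, 0, 1, -1, 0, 0]

diff_in_diff = sum(l * b for l, b in zip(L, beta))
print(round(diff_in_diff, 4))

# Cross-check against the pairwise differences table:
# Gabapentin 1 vs 2 is 1.7624 and Placebo 1 vs 2 is 2.3177.
cross_check = (-1.7624) - (-2.3177)
print(round(cross_check, 4))
```

Both routes give about 0.5553: the change from time 1 to time 2 differs by roughly half a point between the groups, which is small relative to the standard errors and is consistent with the non-significant p-value.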
(d) Let us suppose the working correlation structure is AR(1). The model for the contrast is
the same as in part (c). In SAS, the L matrix is treatment*time 0 -1 1 0 0 0 1 -1 0 0. The
GEE parameter estimates generated by proc genmod remain the same as in part (b).
Table 20 Contrast Results for GEE Analysis
Contrast DF Chi-Square Pr > ChiSq Type
(G2-G1)-(P2-P1) 1 0.78 0.3763 Score
The p-value is 0.3763, and this indicates that the difference in hot flash scores between
time 1 and time 2 for treatment group versus control group is not significant.
Section VII: Conditional Expectations
(a) The linear mixed model is the same as in Section VI's part (a). The question asks for the
conditional mean of the response at time 3 in the control group.
Table 21 Estimates
Label Estimate SE DF t Value Pr > |t|
Placebo at Time 3 12.8890 1.1903 756 10.83 <.0001
(b) The generalized estimating equation model is the same as in Section VI's part (b). The
question asks for the conditional mean of the response at time 3 in the control group.
To see the calculation above, for instance we take treatment Gabapentin at time 0. Then,
E[yij| Trt = Gab, Time = 0] = b0 + b1 + b6 + b8 = 13.2827 + 4.9826 - 4.0538 + 5.5101 = 19.7216
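The same sum of coefficients gives the model-based mean for any treatment and time combination. The sketch below uses the LMM estimates from Table 15; the helper name `cell_mean` is illustrative, and reference levels (Placebo, time 4) contribute zero.

```python
# Fixed-effect estimates copied from Table 15.
INTERCEPT = 13.2827
TREATMENT = {"Gabapentin": -4.0538, "Placebo": 0.0}
TIME = {0: 4.9826, 1: 1.4167, 2: -0.9010, 3: -0.3938, 4: 0.0}
INTERACTION = {
    ("Gabapentin", 0): 5.5101,
    ("Gabapentin", 1): 1.2626,
    ("Gabapentin", 2): 1.8179,
    ("Gabapentin", 3): 0.7359,
    ("Gabapentin", 4): 0.0,
}

def cell_mean(treatment, time):
    """Intercept + treatment effect + time effect + interaction (0 for Placebo)."""
    return (INTERCEPT + TREATMENT[treatment] + TIME[time]
            + INTERACTION.get((treatment, time), 0.0))

for trt in ("Gabapentin", "Placebo"):
    print(trt, [round(cell_mean(trt, t), 4) for t in range(5)])
```

This reproduces 19.7216 for Gabapentin at time 0 and 9.2289 for Gabapentin at time 4. For Placebo at time 3 it gives 12.8889, which agrees with the estimate of 12.8890 in Table 21 up to rounding of the printed coefficients.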
treatment*time Gabapentin 0 Placebo 2 7.3399 1.6770 756 4.38 <.0001
treatment*time Gabapentin 0 Placebo 3 6.8326 1.6770 756 4.07 <.0001
treatment*time Gabapentin 0 Placebo 4 6.4389 1.6770 756 3.84 0.0001
treatment*time Gabapentin 1 Gabapentin 2 1.7624 0.7502 756 2.35 0.0191
treatment*time Gabapentin 1 Gabapentin 3 2.3372 0.7502 756 3.12 0.0019
treatment*time Gabapentin 1 Gabapentin 4 2.6793 0.7502 756 3.57 0.0004
treatment*time Gabapentin 1 Placebo 0 -6.3571 1.6775 756 -3.79 0.0002
treatment*time Gabapentin 1 Placebo 1 -2.7912 1.6806 756 -1.66 0.0972
treatment*time Gabapentin 1 Placebo 2 -0.4735 1.6806 756 -0.28 0.7782
treatment*time Gabapentin 1 Placebo 3 -0.9808 1.6806 756 -0.58 0.5596
treatment*time Gabapentin 1 Placebo 4 -1.3746 1.6806 756 -0.82 0.4137
treatment*time Gabapentin 2 Gabapentin 3 0.5747 0.7513 756 0.76 0.4445
treatment*time Gabapentin 2 Gabapentin 4 0.9168 0.7513 756 1.22 0.2227
treatment*time Gabapentin 2 Placebo 0 -8.1196 1.6788 756 -4.84 <.0001
treatment*time Gabapentin 2 Placebo 1 -4.5537 1.6818 756 -2.71 0.0069
treatment*time Gabapentin 2 Placebo 2 -2.2360 1.6818 756 -1.33 0.1841
treatment*time Gabapentin 2 Placebo 3 -2.7432 1.6818 756 -1.63 0.1033
treatment*time Gabapentin 2 Placebo 4 -3.1370 1.6818 756 -1.87 0.0625
treatment*time Gabapentin 3 Gabapentin 4 0.3421 0.7513 756 0.46 0.6490
treatment*time Gabapentin 3 Placebo 0 -8.6943 1.6788 756 -5.18 <.0001
treatment*time Gabapentin 3 Placebo 1 -5.1284 1.6818 756 -3.05 0.0024
treatment*time Gabapentin 3 Placebo 2 -2.8107 1.6818 756 -1.67 0.0951
treatment*time Gabapentin 3 Placebo 3 -3.3180 1.6818 756 -1.97 0.0489
treatment*time Gabapentin 3 Placebo 4 -3.7117 1.6818 756 -2.21 0.0276
treatment*time Gabapentin 4 Placebo 0 -9.0364 1.6788 756 -5.38 <.0001
treatment*time Gabapentin 4 Placebo 1 -5.4705 1.6818 756 -3.25 0.0012
treatment*time Gabapentin 4 Placebo 2 -3.1528 1.6818 756 -1.87 0.0612
treatment*time Gabapentin 4 Placebo 3 -3.6601 1.6818 756 -2.18 0.0298
treatment*time Gabapentin 4 Placebo 4 -4.0538 1.6818 756 -2.41 0.0162
treatment*time Placebo 0 Placebo 1 3.5659 0.7467 756 4.78 <.0001
treatment*time Placebo 0 Placebo 2 5.8836 0.7467 756 7.88 <.0001
treatment*time Placebo 0 Placebo 3 5.3763 0.7467 756 7.20 <.0001
treatment*time Placebo 0 Placebo 4 4.9826 0.7467 756 6.67 <.0001
treatment*time Placebo 1 Placebo 2 2.3177 0.7474 756 3.10 0.0020
treatment*time Placebo 1 Placebo 3 1.8104 0.7474 756 2.42 0.0157
treatment*time Placebo 1 Placebo 4 1.4167 0.7474 756 1.90 0.0584
treatment*time Placebo 2 Placebo 3 -0.5073 0.7474 756 -0.68 0.4975
treatment*time Placebo 2 Placebo 4 -0.9010 0.7474 756 -1.21 0.2284
treatment*time Placebo 3 Placebo 4 -0.3938 0.7474 756 -0.53 0.5985
The yellow highlighted rows show comparisons where there is no difference between the means.
Comparing Gabapentin at time point 0 with the other time points shows differences between
the means, as indicated by the small p-values. This suggests that Gabapentin is lowering
the hot flash scores. The two groups start out similar, as shown by the p-value of 0.3846.
More interestingly, the two groups at time point 4 have different means, and the difference
is significant, as seen from the blue highlight. This is calculated by 9.2289 - 13.2827 =
-4.0538. The Gabapentin group has a lower mean score at time point 4, which suggests that
Gabapentin helps to reduce the hot flash scores more. Moreover, looking at the last
four rows for Placebo, the differences in means between times 1 & 4, 2 & 3, 2 & 4, and
3 & 4 are not significant. Thus, Placebo seems to be ineffective at reducing the hot flash
scores.
(b) Least Squares Means for the Outcome at Each Time Point in Each Treatment for GEE
Note that the GEE above is still the same as in Section VI's part (b).
To see the calculation above, take for instance treatment Gabapentin at time 0. Then,
E[yij| Trt = Gab, Time = 0] = b0 + b1 + b6 + b8 = 13.3617 + 4.9036 - 4.0226 + 5.4402 = 19.6829
The yellow highlighted rows show comparisons where there is no difference between the means.
The GEE leads to conclusions similar to those of the LMM. Comparing Gabapentin at time point 0
with the other time points shows differences between the means, as indicated by the small
p-values. This suggests that Gabapentin is lowering the hot flash scores. The two groups start
out similar, as shown by the p-value of 0.5139. The two groups at time point 4 have
different means, and the difference is significant, as seen from the blue highlight. This is
calculated by 9.3391 - 13.3617 = -4.0226. The Gabapentin group has a lower mean score at time
point 4, which suggests that Gabapentin helps to reduce the hot flash scores more.
(c) We still use the same LMM, but add more covariates to the model. The solution for the fixed
effects of the model is as follows:
The adjusted LS means estimates are higher than the ones calculated without the additional
covariates in the model. The standard errors also increased in the adjusted LS means
estimates. However, the trend of each treatment across the time points is still the same.
We still use the same GEE, but add more covariates to the model. The model is as follows:
Table 29 Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept 2.0743 10.9789 -19.4439 23.5925 0.19 0.8501
treatment Gabapentin -3.8648 1.5612 -6.9248 -0.8049 -2.48 0.0133
treatment Placebo 0.0000 0.0000 0.0000 0.0000 . .
time 0 4.8097 1.1728 2.5111 7.1083 4.10 <.0001
time 1 1.2886 0.6526 0.0094 2.5678 1.97 0.0483
time 2 -0.9796 0.4544 -1.8703 -0.0889 -2.16 0.0311
time 3 -0.4371 0.3173 -1.0590 0.1848 -1.38 0.1683
time 4 0.0000 0.0000 0.0000 0.0000 . .
smoker 0 -1.6176 2.3691 -6.2609 3.0258 -0.68 0.4947
smoker 1 0.0000 0.0000 0.0000 0.0000 . .
wbc -0.3124 0.4109 -1.1178 0.4930 -0.76 0.4471
age 0.2767 0.2239 -0.1622 0.7156 1.24 0.2166
hrt_months 0.0008 0.0148 -0.0283 0.0298 0.05 0.9590
treatment*time Gabapentin 0 5.5379 1.5681 2.4644 8.6113 3.53 0.0004
treatment*time Gabapentin 1 1.2732 0.9196 -0.5292 3.0757 1.38 0.1662
treatment*time Gabapentin 2 1.8439 0.6401 0.5894 3.0984 2.88 0.0040
treatment*time Gabapentin 3 0.7553 0.4371 -0.1014 1.6119 1.73 0.0840
treatment*time Gabapentin 4 0.0000 0.0000 0.0000 0.0000 . .
The adjusted LS means estimates are higher than the ones calculated without the additional
covariates in the model. The standard errors also increased in the adjusted LS means
estimates. However, the trend of each treatment across the time points is still the same.
/* Section I */
/* Plot a spaghetti plot for outcome over time for all patients*/
proc sgplot data = work.mydata;
title 'Outcome Over Time';
series x=time y=outcome / group=id grouplc=treatment name='grouping';
keylegend 'grouping' / type=linecolor;
reg x=time y=outcome/nomarkers lineattrs=(color=black thickness=4);
run;
/* Plot a spaghetti plot for outcome over time for each group separately */
proc sgplot data = work.mydata;
where treatment = 'Placebo';
title 'Outcome Over Time';
series x=time y=outcome / group=id grouplc=treatment name='grouping';
keylegend 'grouping' / type=linecolor;
reg x=time y=outcome/nomarkers lineattrs=(color=black thickness=4);
run;
/* Box Plot for outcome over time for each group separately */
proc sgplot data = work.mydata;
where treatment = 'Placebo';
vbox outcome /category = time group = treatment connect=mean;
title 'Outcome Over Time for Placebo';
run;
model outcome = treatment time treatment*time/s;
repeated/subject=id(treatment) type=un r rcorr;
run;
/* Estimate the variance-covariance structure of the repeated measures
outcome within each of the treatment groups*/
proc mixed data=work.mydata covtest;
class id treatment time;
model outcome = treatment time treatment*time/s;
repeated/subject=id(treatment) type=un group=treatment r=1,190 rcorr=1,190;
run;
/* Section II */
/* The association between each predictor individually and the outcome */
proc mixed data=work.mydata covtest;
class id treatment time;
model outcome=time/ddfm=kr s;
repeated/subject=id(treatment) type=un r rcorr;
run;
/* Final Model */
proc mixed data=work.mydata covtest;
class id treatment time smoker ;
model outcome= time treatment age smoker time*treatment /ddfm=kr s;
repeated/subject=id(treatment) type=un r rcorr;
run;
/* Section IV */
/* The Final Linear Mixed Model */
proc mixed data=work.mydata;
class id treatment time smoker;
model outcome=time treatment age smoker time*treatment;
random time/type=un subject=id(treatment) g;
run;
/* Section V */
/* The Final Generalized Estimating Equation */
proc genmod data=work.mydata;
class id time treatment smoker;
model outcome=time treatment age smoker time*treatment /d=n link=identity;
repeated subject=id/type=cs within=time covb corrw;
run;
model outcome=time treatment age smoker time*treatment /d=n link=identity;
repeated subject=id/type=cs within=time covb corrw;
run;
ods graphics off;
/* Section VIII */
/* Linear Mixed Model */
proc mixed data=work.mydata covtest;
class id treatment time smoker;
model outcome=treatment time smoker wbc age hrt_months treatment*time/s;
random int/s subject=id type=cs;
repeated /subject=id type=cs;
lsmeans treatment*time/diff;
run;