In short,
children who had high blood lead levels were randomized to receive a placebo (control group) or a new
chelating agent, succimer (treatment/intervention group). Children were followed over time, and blood lead
levels (outcome, or dependent variable) were recorded at baseline, week 1, week 4, and week 6. The overall
goal is to determine if succimer is effective at decreasing blood lead levels. We also expect placebo to have
negligible impact on blood lead levels. Therefore, we want to compare treatment and control subjects over
time. Because a relatively large number of subjects were randomized, there should be no systematic difference
between the groups at baseline.
We have a dataset of 100 subjects. The first 10 subjects are shown below. Note that this layout, with one
observation (row) per subject, is not the proper setup for the dataset when we actually analyze the data. You
will learn how to set up the dataset later. Subject id is denoted by id. Treatment type is denoted by trt:
P for placebo or A for succimer (note that the ids do not match the ids in Table 1.1 of your book, and your
book uses S for succimer, not A). Blood lead level values (outcome / dependent variable) are given by y0, y1,
y4, and y6 for baseline, week 1, week 4, and week 6, respectively.
proc print data=long.tlc (obs=10); run; /*Print only the first 10 subjects*/
Obs   id   trt     y0     y1     y4     y6
  1    1   P     30.8   26.9   25.8   23.8
  2    2   A     26.5   14.8   19.5   21.0
  3    3   A     25.8   23.0   19.1   23.2
  4    4   P     24.7   24.5   22.0   22.5
  5    5   A     20.4    2.8    3.2    9.4
  6    6   A     20.4    5.4    4.5   11.9
  7    7   P     28.6   20.8   19.2   18.4
  8    8   P     33.7   31.6   28.5   25.1
  9    9   P     19.7   14.9   15.3   14.7
 10   10   P     31.1   31.2   29.2   30.1
trt=A

Variable    N          Mean       Std Dev       Minimum       Maximum
y0         50    26.5400000     5.0209358    19.7000000    41.1000000
y1         50    13.5220000     7.6724870     2.8000000    39.0000000
y4         50    15.5140000     7.8522065     3.0000000    40.4000000
y6         50    20.7620000     9.2463316     4.1000000    63.9000000

trt=P

Variable    N          Mean       Std Dev       Minimum       Maximum
y0         50    26.2720000     5.0241068    19.7000000    38.1000000
y1         50    24.6600000     5.4611803    14.9000000    40.8000000
y4         50    24.0700000     5.7531269    15.3000000    38.6000000
y6         50    23.6460000     5.6398079    13.5000000    43.3000000
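The code that produced the summary statistics above, and the means_data dataset used by the plotting code that follows, is not shown in these notes. Here is a minimal sketch of one way to create both. The dataset names tlc_sorted and wide_means, the variable names m0, m1, m4, and m6, and the layout of means_data (variables Treatment, Time, and Mean, inferred from the plot statement) are assumptions, not the original code.

```sas
/* Sort by treatment so that BY-group processing can be used */
proc sort data=long.tlc out=tlc_sorted;
  by trt;
run;

/* Print N, mean, SD, minimum, and maximum of each measurement within each arm */
proc means data=tlc_sorted n mean std min max;
  by trt;
  var y0 y1 y4 y6;
run;

/* Save the arm-specific means, one row per arm (m0 m1 m4 m6 are assumed names) */
proc means data=tlc_sorted noprint;
  by trt;
  var y0 y1 y4 y6;
  output out=wide_means mean=m0 m1 m4 m6;
run;

/* Reshape to one row per arm per occasion (assumed layout of means_data) */
data means_data;
  set wide_means;
  length Treatment $8;
  if trt='P' then Treatment='Placebo';
  else Treatment='Succimer';
  Mean=m0; Time=0; output;
  Mean=m1; Time=1; output;
  Mean=m4; Time=4; output;
  Mean=m6; Time=6; output;
  keep Treatment Time Mean;
run;
```

With this coding, 'Placebo' sorts before 'Succimer', so the first symbol and the first legend entry should correspond to the placebo arm, in line with the legend text used in the plot code.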
/*Plotting the sample means by group across time. Same type of plot as Figure 1.1 in your text.*/
goptions reset=all gunit=pct cback=white colors=(black) border ftext=zapf htext=4;
title 'Estimated Mean Time Trend for Blood Lead Levels by Trial Arm';
symbol1 color=red interpol=join width=1 line=1 value=dot;
symbol2 color=blue interpol=join width=1 line=2 value=dot;
legend1 label=('Legend') frame position=(top right inside) /*value=(font=swiss)*/
mode=protect value=('Placebo' 'Succimer') across=1;
axis1 value=(font=swiss) major=(height=2 width=2) minor=(height=1)
label= (height=2 font=swiss 'Time (Weeks)') width=2;
axis2 value=(font=swiss) major=(height=2 width=2) minor=(height=1)
label= (height=2 font=swiss 'Estimated Mean Blood Lead Levels') width=2;
proc gplot data=means_data;
plot Mean*Time=Treatment/legend=legend1 haxis=axis1 vaxis=axis2;
run;
quit;
Sample means are approximately the same at baseline (26.5 for succimer and 26.3 for placebo), as expected
under randomization. There is little change over time in the placebo group. In the treatment group there is a
large drop from baseline to week 1, but the mean then increases through week 6. Your book explains that this
rebound in blood lead levels is due to the mobilization of lead that has been stored in tissues and bones.
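The size of these changes can be read off the summary tables: the mean change from baseline to week 1 is 13.522 - 26.540 = -13.018 in the succimer group and 24.660 - 26.272 = -1.612 in the placebo group. As a rough sketch, the same changes could be computed per subject and then summarized by arm. The dataset name tlc_change and the variable names d1, d4, and d6 are assumptions introduced for illustration.

```sas
/* Per-subject change from baseline at each follow-up (d1, d4, d6 are assumed names) */
data tlc_change;
  set long.tlc;
  d1 = y1 - y0;  /* week 1 minus baseline */
  d4 = y4 - y0;  /* week 4 minus baseline */
  d6 = y6 - y0;  /* week 6 minus baseline */
run;

proc sort data=tlc_change;
  by trt;
run;

/* Mean and standard deviation of the change scores within each arm */
proc means data=tlc_change n mean std;
  by trt;
  var d1 d4 d6;
run;
```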
Now let's look at a plot of the trajectories of several subjects over time. This is called a spaghetti plot. First, we
need to set up the data in a form that allows us to make this plot.
/*Suppose we want to produce a spaghetti plot. We first need to set up the data so we can make this plot. Note
that the data need to be set up in a way that also lets us analyze them. To do this, we need one new
column that denotes time, and a single column with the corresponding outcome values. This SAS code is based on
the code in Table 5.10 (page 137) of your text.*/
data tlc; set long.tlc;
y=y0; time=0; output; /*Observation for baseline*/
y=y1; time=1; output; /*Observation for Week 1*/
y=y4; time=4; output; /*Observation for Week 4*/
y=y6; time=6; output; /*Observation for Week 6*/
drop y0 y1 y4 y6;
run;
proc print data=tlc; run;
Excerpt of the output (subjects 2 and 3):

id   trt      y   time
 2   A     26.5      0
 2   A     14.8      1
 2   A     19.5      4
 2   A     21.0      6
 3   A     25.8      0
 3   A     23.0      1
 3   A     19.1      4
 3   A     23.2      6
/*Plotting the trajectories for the first four treatment subjects (2, 3, 5, and 6)*/
goptions reset=all gunit=pct cback=white colors=(black) border ftext=zapf htext=2.5;
title 'Time Trends for Blood Lead Levels for Select Subjects';
symbol1 color=red interpol=join width=1 line=1 value=dot;
symbol2 color=blue interpol=join width=1 line=2 value=dot;
symbol3 color=black interpol=join width=1 line=1 value=dot;
symbol4 color=purple interpol=join width=1 line=2 value=dot;
legend1 label=('Legend') frame position=(top right inside)
mode=protect value=('ID 2' 'ID 3' 'ID 5' 'ID 6') across=1;
axis1 value=(font=swiss) major=(height=2 width=2) minor=(height=1)
label= (height=2 font=swiss 'Time (Weeks)') width=2;
axis2 value=(font=swiss) major=(height=2 width=2) minor=(height=1)
label= (height=2 font=swiss 'Blood Lead Levels') width=2;
proc gplot data=tlc;
plot y*time=id/legend=legend1 haxis=axis1 vaxis=axis2;
where id<7 and trt="A";
run;
quit;
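For readers working in a SAS release that includes the ODS statistical graphics procedures, a similar plot can be sketched with proc sgplot instead of proc gplot. This is an alternative offered for illustration, not the code used for these notes. It assumes the long-format dataset tlc created by the data step above, with rows already ordered by time within id.

```sas
/* Alternative sketch: spaghetti plot of subjects 2, 3, 5, and 6 with PROC SGPLOT */
title 'Time Trends for Blood Lead Levels for Select Subjects';
proc sgplot data=tlc;
  where id<7 and trt="A";                 /* same subset as the gplot call */
  series x=time y=y / group=id markers;   /* one connected line per subject */
  xaxis label='Time (Weeks)';
  yaxis label='Blood Lead Levels';
run;
title;  /* clear the title so it does not carry over to later output */
```

The group=id option draws a separate line for each subject, so no symbol statements are needed.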
This spaghetti plot is also called a time plot (page 34 of your text). Notice both the between-subject and the
within-subject variability. Different subjects clearly have different trajectories over time. Between-subject
variability can be seen in the fact that some individuals tend to have consistently lower, and others
consistently higher, blood lead levels over time. This is one major source of the correlation among outcomes
from the same subject.
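One way to look at this correlation directly is to compute the pairwise correlations among the four measurement occasions within each arm. The sketch below uses the one-row-per-subject dataset long.tlc, since proc corr expects each occasion in its own column. The dataset name tlc_sorted is an assumption.

```sas
/* Sort by treatment so that correlations can be computed separately for each arm */
proc sort data=long.tlc out=tlc_sorted;
  by trt;
run;

/* Pairwise Pearson correlations among baseline, week 1, week 4, and week 6 */
proc corr data=tlc_sorted;
  by trt;
  var y0 y1 y4 y6;
run;
```

If subjects do tend to stay consistently low or consistently high, the off-diagonal entries of each correlation matrix should be positive.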