
Stat 303/403 SAS Programming and Applied Statistics

Homework 5

Due Thursday 10/7/2010

Problem 1
We have a data set called BLOOD that contains from one to five observations per subject. Each
observation contains the variables ID, GROUP, TIME, WBC (white blood cells), and RBC (red blood
cells). Run the following program to create this data set.
***Program to create data set BLOOD;
DATA BLOOD;
LENGTH GROUP $ 1;
INPUT ID GROUP $ TIME WBC RBC @@;
DATALINES;
1 A 1 8000 4.5 1 A 2 8200 4.8 1 A 3 8400 5.2
1 A 4 8300 5.3 1 A 5 8400 5.5
2 A 1 7800 4.9 2 A 2 7900 5.0
3 B 1 8200 5.4 3 B 2 8300 5.4 3 B 3 8300 5.2
3 B 4 8200 4.9 3 B 5 8300 5.0
4 B 1 8600 5.5
5 A 1 7900 5.2 5 A 2 8000 5.2 5 A 3 8200 5.4
5 A 4 8400 5.5
;

Create a data set that contains the mean WBC and RBC for each subject. This new data set should
contain the variables ID, GROUP, M_WBC, and M_RBC, where M_WBC and M_RBC are the mean
values for the subject. Finally, we want to exclude any subjects from this data set who have two or
fewer observations in the original data set (assume there are no missing values).

Answer
PROC MEANS DATA=BLOOD NWAY NOPRINT;
CLASS ID;
ID GROUP;
VAR WBC RBC;
OUTPUT OUT=TEMP(WHERE=(_FREQ_ GT 2)
DROP=_TYPE_)
MEAN = M_WBC M_RBC;
RUN;
PROC PRINT DATA=TEMP NOOBS;
TITLE "Listing of data set TEMP";
RUN;

Problem 2
Using data set BLOOD from problem 1, run the following program to create a new dataset
NEWBLOOD.
PROC SORT DATA=BLOOD;
BY ID TIME;
RUN;
DATA NEWBLOOD;
SET BLOOD;
BY ID;
IF FIRST.ID AND LAST.ID THEN DELETE;
IF FIRST.ID OR LAST.ID THEN DO;
D_WBC = WBC - LAG(WBC);
D_RBC = RBC - LAG(RBC);
END;
RUN;
The first 4 values of D_WBC and D_RBC are missing. Explain why. The 5th values of D_WBC and
D_RBC are 400 and 1.0 respectively. Explain how these numbers are calculated.

Answer
The first values of D_WBC and D_RBC are missing because this is the first observation for ID 1
and the first time LAG is executed, so LAG(WBC) and LAG(RBC) return missing values.
Subtracting a missing value yields a missing value for D_WBC and D_RBC.

The second through fourth values of D_WBC and D_RBC are missing because the condition "IF
FIRST.ID OR LAST.ID" is false for these observations, so the statements between "DO" and
"END" are not executed. D_WBC and D_RBC are never assigned and remain missing.

The 5th observation is the last one for ID 1, so the DO group is executed again. LAG returns
the value stored the last time it was executed, not the value from the previous observation,
and its last execution was on the first observation. The 5th value of D_WBC is therefore
8400 - 8000 = 400, and the 5th value of D_RBC is 5.5 - 4.5 = 1.0.
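
The behavior of LAG described above can be checked with a small sketch. The data set name LAG_DEMO and the variables X, PREV_OBS, and PREV_CALL are hypothetical and used only for illustration; the five values are the WBC readings for ID 1.

```sas
DATA LAG_DEMO;
   INPUT X @@;
   /* Unconditional call: the LAG queue is updated on every observation */
   PREV_OBS = LAG(X);
   /* Conditional call: this LAG queue is updated only on observations 1 and 5 */
   IF _N_ IN (1, 5) THEN PREV_CALL = LAG(X);
DATALINES;
8000 8200 8400 8300 8400
;
PROC PRINT DATA=LAG_DEMO NOOBS;
RUN;
```

PREV_OBS holds the value from the previous observation (missing, 8000, 8200, 8400, 8300). PREV_CALL is missing for the first four observations and 8000 for the fifth, because the conditional LAG last ran on observation 1.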

Problem 3
The following table shows average amounts of bread consumed per person per week in London from
1960 to 1980.

                            Year
Type of bread   1960   1965   1970   1975   1980
White           1040    975    915    785    620
Brown             70     80     70     75    115
Wholemeal         25     20     15     20     45
Other            155     80     85     75    105

(a) Create a SAS data set called BREAD with three variables: YEAR (Num), TYPE (Char), and
AMOUNT (Num). Provide your SAS code here.
DATA BREAD;
LENGTH TYPE $ 9;
INPUT YEAR TYPE$ AMOUNT @@;
DATALINES;
1960 White 1040 1960 Brown 70 1960 Wholemeal 25 1960 Other 155
1965 White 975 1965 Brown 80 1965 Wholemeal 20 1965 Other 80
1970 White 915 1970 Brown 70 1970 Wholemeal 15 1970 Other 85
1975 White 785 1975 Brown 75 1975 Wholemeal 20 1975 Other 75
1980 White 620 1980 Brown 115 1980 Wholemeal 45 1980 Other 105
;

(b) Produce histograms for the variable AMOUNT: first for all types of bread together, then
separately for each type. Provide your SAS code here.

PROC UNIVARIATE DATA=BREAD;
VAR AMOUNT;
HISTOGRAM AMOUNT;
RUN;

PROC UNIVARIATE DATA=BREAD;
CLASS TYPE;
VAR AMOUNT;
HISTOGRAM AMOUNT;
RUN;

(c) Create a new data set named DIFF, which includes all three variables in data set BREAD and
an additional new variable DIFF_AMOUNT, which is the difference in AMOUNT from one year to the
next for each type of bread. Do not output the first observation for each type of bread. Provide your
SAS code here.

PROC SORT DATA=BREAD; BY TYPE YEAR; RUN;
DATA DIFF;
SET BREAD;
BY TYPE;
DIFF_AMOUNT = AMOUNT - LAG(AMOUNT);
IF NOT FIRST.TYPE THEN OUTPUT;
RUN;
PROC PRINT DATA = DIFF; RUN;
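
As an alternative sketch, SAS also provides the DIF function, which returns its argument minus the lagged value, so DIF(AMOUNT) is equivalent to AMOUNT - LAG(AMOUNT). The example assumes data set BREAD from part (a) already exists; the output name DIFF2 is hypothetical and chosen so that DIFF is not overwritten.

```sas
PROC SORT DATA=BREAD;
   BY TYPE YEAR;
RUN;

DATA DIFF2;   /* DIFF2 is a hypothetical name */
   SET BREAD;
   BY TYPE;
   /* DIF runs on every observation so its queue stays in step with the data */
   DIFF_AMOUNT = DIF(AMOUNT);
   /* The first row of each type would mix two bread types, so it is not output */
   IF NOT FIRST.TYPE THEN OUTPUT;
RUN;
```

As with LAG, DIF should be called unconditionally here; placing it inside an IF would change which earlier value it subtracts.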

(d) Create a new data set named CHANGE, which has one observation per type of bread with the
difference in AMOUNT between the years 1960 and 1980.

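/* BREAD must be sorted by TYPE and YEAR so that FIRST.TYPE is 1960 and LAST.TYPE is 1980 */
PROC SORT DATA=BREAD; BY TYPE YEAR; RUN;
DATA CHANGE;
SET BREAD;
BY TYPE;
IF FIRST.TYPE AND LAST.TYPE THEN DELETE;
/* LAG runs only on the first and last rows of each type, so on the last row it returns the 1960 value */
IF FIRST.TYPE OR LAST.TYPE
THEN D_AMOUNT = AMOUNT - LAG(AMOUNT);
IF LAST.TYPE THEN OUTPUT;
RUN;
PROC PRINT DATA = CHANGE; RUN;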

(e) Create a new data set named SUMM, which has one observation per type of bread, with the mean,
median, and sum of AMOUNT.

PROC MEANS DATA=BREAD NOPRINT NWAY;
CLASS TYPE;
VAR AMOUNT;
OUTPUT OUT=SUMM (DROP=_TYPE_)
MEAN=
MEDIAN=
SUM=/AUTONAME;
RUN;
PROC PRINT DATA=SUMM;
RUN;

Problem 4
Find mistakes (5 in total) in the following SAS program.

(1) DATA RATS;
(2) INPUT @1 RAT NO 1.
(3) @3 DOB DATE9.
(4) @13 DISEASE DATE9.
(5) @23 DEATH DATE9.
(6) @33 GROUP $1.;
(7) BIR_TO_D = DISEASE - DOB;
(8) DIS_TO_D = DEATH - DISEASE;
(9) AGE = DEATH - DOB;
(10) AGE_NOW = ROUND(YRDIF(DOB,TODAY,'ACTUAL'));
(11) FORMAT DOB DISEASE DEATH MMDDYY10;
(12) DATALINES;
1 23MAY1990 23JUN1990 28JUN1990 A
2 21MAY1990 27JUN1990 05JUL1990 A
3 23MAY1990 25JUN1990 01JUL1990 A
4 27MAY1990 07JUL1990 15JUL1990 A
5 22MAY1990 29JUN1990 22JUL1990 B
6 26MAY1990 03JUL1990 03AUG1990 B
7 24MAY1990 01JUL1990 29JUL1990 B
8 29MAY1990 15JUL1990 18AUG1990 B
;
(13) PROC MEANS RATS;
(14) BY GROUP;
(15) VAR BIR_TO_D AGE_NOW;
(16) RUN;

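In line (2), the variable name cannot contain a blank; it should be
INPUT @1 RAT_NO 1.
In line (10), TODAY is a function and needs parentheses; it should be
AGE_NOW = ROUND(YRDIF(DOB,TODAY(),'ACTUAL'));
In line (11), the format name needs a trailing period; it should be
FORMAT DOB DISEASE DEATH MMDDYY10.;
In line (13), the data set must be named with DATA=; it should be
PROC MEANS DATA = RATS;
In line (14), BY GROUP assumes the data are already sorted by GROUP and the program has no
PROC SORT step; it should be
CLASS GROUP;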
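
For reference, the program with the five corrections applied might look like the sketch below. It assumes those five changes are the only ones needed and that the data lines keep the original column positions (dates starting in columns 3, 13, and 23, and GROUP in column 33).

```sas
DATA RATS;
   INPUT @1  RAT_NO  1.
         @3  DOB     DATE9.
         @13 DISEASE DATE9.
         @23 DEATH   DATE9.
         @33 GROUP   $1.;
   /* Differences between SAS date values are counts of days */
   BIR_TO_D = DISEASE - DOB;
   DIS_TO_D = DEATH - DISEASE;
   AGE = DEATH - DOB;
   /* Age in whole years as of the day the program is run */
   AGE_NOW = ROUND(YRDIF(DOB, TODAY(), 'ACTUAL'));
   FORMAT DOB DISEASE DEATH MMDDYY10.;
DATALINES;
1 23MAY1990 23JUN1990 28JUN1990 A
2 21MAY1990 27JUN1990 05JUL1990 A
3 23MAY1990 25JUN1990 01JUL1990 A
4 27MAY1990 07JUL1990 15JUL1990 A
5 22MAY1990 29JUN1990 22JUL1990 B
6 26MAY1990 03JUL1990 03AUG1990 B
7 24MAY1990 01JUL1990 29JUL1990 B
8 29MAY1990 15JUL1990 18AUG1990 B
;
PROC MEANS DATA=RATS;
   CLASS GROUP;
   VAR BIR_TO_D AGE_NOW;
RUN;
```

Because AGE_NOW depends on TODAY(), its value changes with the run date.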
