You are on page 1of 58

Week 11 SAS Procedures to Summarize Data

Unit 5
SAS for Data Description

Week 11: Procedures to Summarize Data

Welcome!

Data summarization is a significant portion of data management activities. It serves a variety of

purposes, including (1) monitoring and tracking of a study cohort, (2) informing project planning,

and (3) cohort description. Data summaries are also useful because they provide clues to data

cleaning and analysis.

Data summarization might take the form of a listing of the data, the reporting of averages (and

accompanying standard deviations), tabulations of frequency distributions, or graphical summaries

(e.g. scatter plots).

The better summaries are those that are self-explanatory. They are well labeled (have titles and

variable values identified) and are straightforward to understand. It’s also helpful to accompany the

summarization with documentation of the data source (name and version of data set) and the name

of program that generated the summary.

The SAS procedures discussed in this reading, Unit 5 week 1 (week 11 of course), are PRINT,

MEANS, SUMMARY, UNIVARIATE, TABLUATE, FREQ, FORMS, REPORT, CHART, and PLOT.

Week 11 Page 1 of 58
Week 11 SAS Procedures to Summarize Data

In the Unit 5 week 2 (week 12 of course) reading, the procedures discussed are CHART and

PLOT. The Unit 5 week 2 (week 12 of course) reading also includes a brief introduction to using the

SAS ANALYST for producing graphics with the SASGRAPH module. These are higher quality

graphics than the printer character charts and plots produced with PROCs CHART and PLOT.

TIP - Most procedures produce results that appear in the output window. Along with directly

producing output, many of the procedures to be discussed can produce new SAS data sets (output

data), which in turn can be used in other procedures, such as PRINT or TABULATE. In this way, it

is possible to have more control over the format in which results are printed. Output data sets can

also be modified in subsequent DATA steps, to add labels or formats before printing.

TIP – Be sure to see descriptions of these procedures in the SAS Procedures Guide. Learn to use

the SAS manuals - there are many options to use with all procedures. These course notes

are not all inclusive!

Week 11 Page 2 of 58
Week 11 SAS Procedures to Summarize Data

Goals of Week 11: Procedures to Summarize Data

1. to be competent in using SAS procedures to “write out” data values in a manner that is
easy to read;

2. to appreciate the utility of “writing out” data as a preliminary to data quality assessment;

3. to be competent in using SAS procedures to produce frequency distributions and cross-


tabulations of data that are self-explanatory to an independent reader; and

4. to be (at least a little) competent in using SAS procedures to “write out” data as part of the
production of forms (admittedly MS ACCESS might be a better tool in this regard);

Week 11 Outline – Procedures to Summarize Data


Section Topic Page

1. How to Produce Data Listings (PROC PRINT) ……………………………. 4


a. Printing with variable labels and ID statement …………………..….… 5
b. Printing with a BY statement …………………………………………….. 7
c. Introduction to the WHERE statement: How to Print a Subset ……… 13

2. How to Produce Summary Statistics ………………………………………… 15


a. How to Use PROC MEANS and PROC SUMMARY ……………… 15
b. How to Use PROC UNIVARIATE ………………………………….. 21
c. How to Use PROC TABULATE …………………………………….. 27

3. How to Produce Frequency Tables, Cross-Tabulations (PROC FREQ) 39

4. How to Print Forms (PROC FORMS) …….……………………………….. 50

5. How to Use PROC REPORT ………………………………………………. 54

Week 11 Page 3 of 58
Week 11 SAS Procedures to Summarize Data

1. How to Produce Data Listings: PROC PRINT

PROC PRINT is a simple way to get a listing of data in a SAS data set.

• We have already used PROC PRINT to get a listing of data (see Unit 4 week 1)

• More generally, PROC PRINT is used to list:

(1) all of the variable values for all of the observations in a SAS data set,

(2) some of the variable values for all of the observations in a SAS data set, or

(3) some of the variable values for some of the observations in a SAS data set.

• There are options in SAS to control the format of the listing.

• Selected options for PROC PRINT are illustrated in the examples that follow.

Examples – The examples below use data on two neurological assessment scales used in a

cardiopulmonary bypass study. Data on pre-op, post-op and follow-up scores are printed in

different ways to illustrate some of the options available for printing in SAS.

Week 11 Page 4 of 58
Week 11 SAS Procedures to Summarize Data

a Printing with variable labels and ID statement

Example

• The data in this example are arranged with one record per subject. Included in this record are

pre, post, and follow-up scores for each of two assessment scales.

• Scores are printed for all three periods, for the two scales.

• When the keyword LABEL is included in the PROC PRINT statement, the variable labels are

used in the column heading instead of the variable name.

• TIP - It is also possible to assign new labels in the PROC PRINT procedure.

• TIP - A split character can be used when creating variable labels. This character is used to

split the labels into two or more lines for printing. To do this:

- Instead of writing LABEL on the PROC PRINT statement,

- Write SPLIT=’ ’, where the split character (which can be a space) is enclosed

within the single quotes.

• In this example, new labels using the split character * were defined for all variables. You can

see the advantage of using a split: the column width for printing would be determined by the

length of the variable label if it were not split.

• The VAR statement is used to

- select the variables to print, and to

- control the order in which variables appear.

• When the ID statement is used, no observation number is printed. Instead, the variable named

after ID appears in the leftmost column, before the variables in the VAR statement.

• IF NO VAR statement is written, ALL of the variables in the dataset are printed, and they are

printed in the order in which they are stored.

Week 11 Page 5 of 58
Week 11 SAS Procedures to Summarize Data

Example -
*__________________________________________________;
** print neurologic summary scores for pre, post, **;
** & follow-up using label and ID options **;
**************************************************************
* Use ‘*’ to define split character **;
* print only first 10 observations **;
PROC PRINT DATA=MNSCORE(OBS=10) SPLIT='*';

** define labels indicating where to split into two lines **;


LABEL PATID=’PATIENT*ID’
MSCORE1='PRE-OP*MATHEW*SCORE'
MSCORE2='POST-OP*MATHEW*SCORE'
MSCORE3='FOLLOW-UP*MATHEW*SCORE'
NSCORE1='PRE-OP*NEURO*SCORE'
NSCORE2='POST-OP*NEURO*SCORE'
NSCORE3='FOLLOW-UP*NEURO*SCORE';

** define the variable to put in first column in place of obs number **;
ID PATID;

** name variables in order for printing **;


VAR MSCORE1 MSCORE2 MSCORE3 NSCORE1 NSCORE2 NSCORE3;

TITLE1 'LISTING OF NEUROLOGICAL TOTALS SCORES';


TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
RUN;

The following output is produced:

LISTING OF NEUROLOGICAL TOTALS SCORES


FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES

PREOP POSTOP FOLLOWUP PREOP POSTOP FOLLOWUP


PATID MATHEW MATHEW MATHEW NEURO NEURO NEURO
ID SCORE SCORE SCORE SCORE SCORE SCORE
24 100 97 97 100 85 85
28 99 87 98 100 85 90
60 100 100 100 100 100 100
65 100 98 100 100 90 100
70 99 97 100 100 95 100
40 97 99 98 95 100 95
190 100 100 97 100 100 90
196 100 100 99 100 100 100
210 98 85 99 90 80 95
240 100 98 97 100 95 90

Week 11 Page 6 of 58
Week 11 SAS Procedures to Summarize Data

b Printing with a BY statement

Any use of a BY statement must be preceded with a sort of the data!

Use of a BY statement allows you to print data that is sorted.

• To print with a BY statement, the data must first be sorted by the BY variable(s). For

example, if you wanted to print data that is sorted by LNAME (e.g. LNAME is the variable

name for “last name”), the print instruction must be preceded by a sort instruction. This

might look something like

proc sort data=temp; by lname; run;

proc print data=temp; by lname; run;

• When a BY variable is used for printing, the data are grouped under a header line the gives

information on the grouping or BY variable.

• The second example uses the same data, but in this case it is arranged with multiple records

per subject, for pre, post, and follow-up status. Here the data are printed in two ways, first

by patient id number, and second by patient status.

• Tip - When the same variable is named in both an ID statement and a BY statement, the

grouping variable is listed in the first column, and not repeated for subsequent observations.

• Each by-group is separated by a blank line; this makes clear the separation of groups.

Week 11 Page 7 of 58
Week 11 SAS Procedures to Summarize Data

• Here, labels have been created with the split character as part of the PROC PRINT

instruction; however, this is not required.

• Tip - Do include a split character in variable labels; it makes your output easier to read!.

• Tip – And while you’re at it, nclude the variable name within the label. For example: LABEL

PATID = ‘PATID:*Patient*ID’; This also enhances readability since it produces the variable

name as well as its accompanying description in the output.

Week 11 Page 8 of 58
Week 11 SAS Procedures to Summarize Data

*************************************************************;
** print neurologic summary scores for pre, post, & **;
** follow-up using label and ID options and BY statements **;
*************************************************************;

** create formats for patient status **;


PROC FORMAT;
VALUE PFMT 1='PRE-OP'
2='POST-OP'
3='FOLLOW-UP';
RUN;

** SORT to print grouped by PATID **;


PROC SORT DATA=MNS2;
BY PATID;
RUN;

* define split char, suppress printing obs # with NOOBS;


* print grouped by patid;
* print first 9 obs ;
PROC PRINT DATA=MNS2(OBS=9) SPLIT='*' NOOBS;
BY PATID;

** format patient status **;


FORMAT PSTATUS PFMT.;

** define labels indicating where to split into two lines **;


LABEL MATTOTAL='MATHEW*TOTAL SCORE'
NTOTAL='NEUROLOGICAL*TOTAL SCORE'
PSTATUS='PATIENT*STATUS';

** name variables in order for printing **;


VAR PSTATUS MATTOTAL NTOTAL;

TITLE1 'LISTING OF NEUROLOGICAL TOTALS SCORES';


TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
TITLE3 'LISTED BY SUBJECT';
RUN;

Week 11 Page 9 of 58
Week 11 SAS Procedures to Summarize Data

Output follows. Note that no observation number is listed, due to the option NOOBS on the PROC
PRINT statement.

LISTING OF NEUROLOGICAL TOTALS SCORES


FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES
LISTED BY SUBJECT

--------------------- PATIENT ID NUMBER =24 ---------------------

PATIENT MATHEW NEUROLOGICAL


STATUS TOTAL SCORE TOTAL SCORE

PRE-OP 100 100


POST-OP 97 85
FOLLOW-UP 97 85

--------------------- PATIENT ID NUMBER = 28 --------------------

PATIENT MATHEW NEUROLOGICAL


STATUS TOTAL SCORE TOTAL SCORE

PRE-OP 99 100
POST-OP 87 85
FOLLOW-UP 98 90

--------------------- PATIENT ID NUMBER = 60 --------------------

PATIENT MATHEW NEUROLOGICAL


STATUS TOTAL SCORE TOTAL SCORE

PRE-OP 100 100


POST-OP 100 100
FOLLOW-UP 100 100

Week 11 Page 10 of 58
Week 11 SAS Procedures to Summarize Data

Recall - When the same variable is named in both an ID statement and a BY statement, the

grouping variable is listed in the first column, and not repeated for subsequent observations.

Example – The following example uses BY statement and ID statement together, and groups the

data by status rather than by patient.

** print the same data grouped by pstatus **;


PROC SORT DATA=MNS2;
BY PSTATUS;
RUN;

* print grouped by pstatus ;


* define split char ;
* ID the pstatus in first column ;
PROC PRINT DATA=MNS2 SPLIT='*';
BY PSTATUS;
ID PSTATUS;
** assign format for patient status **;
FORMAT PSTATUS PFMT.;

** define labels indicating where to split into two lines **;


LABEL PATID ='PATIENT*ID NO.'
MATTOTAL='MATHEW*TOTAL SCORE'
NTOTAL='NEUROLOGICAL*TOTAL SCORE'
PSTATUS='PATIENT*STATUS';

** name variables in order for printing **;


VAR PATID MATTOTAL NTOTAL;

TITLE1 'LISTING OF NEUROLOGICAL TOTALS SCORES';


TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
TITLE3 'LISTED BY STATUS, ID STATUS';
RUN;

Week 11 Page 11 of 58
Week 11 SAS Procedures to Summarize Data

LISTING OF NEUROLOGICAL TOTALS SCORES


FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES
LISTED BY STATUS, ID STATUS

PATIENT PATIENT MATHEW NEUROLOGICAL


STATUS ID NO. TOTAL SCORE TOTAL SCORE

PRE-OP 24 100 100


28 99 100
60 100 100
65 100 100
74 99 100

POST-OP 24 97 85
28 87 85
60 100 100
65 98 90
74 97 95

FOLLOW-UP 24 97 85
28 98 90
60 100 100
65 100 100
74 100 100

Note the variable listed in both the BY and ID statements is listed to the left, and written only once

for each group.

There are several more options to control printing, including line spacing (e.g., double or single),

printing subtotals and column totals

• See the PRINT procedure in the SAS Procedures Guide or use the online HELP.

Week 11 Page 12 of 58
Week 11 SAS Procedures to Summarize Data

c Introduction to the WHERE statement: How to Print a Subset of Observations

The WHERE statement instructs SAS to perform its task on a selected set of observations.

• WHERE is used to select which observations will be used in the procedure being performed.

• For example, if we are interested in listing only data for the pre-operative assessment, this

can be done by creating a subset data file – using a data step to take a subset of data with

PSTATUS=1, and then using this new data set in PROC PRINT.

• Alternatively, PROC PRINT can be used with a WHERE statement, with the condition:

WHERE PSTATUS=1;

to select the subset of interest.

• Note - In the following example, notice the difference in observation numbers that appear on

the data listing (if the OBS column is printed).

* Part 1: create subset of data and print it *;


DATA MNPSTAT1;
SET MNS2;
IF PSTATUS=1;
RUN;
* define split char, print subset ;
PROC PRINT DATA=MNPSTAT1 SPLIT='*';
FORMAT PSTATUS PFMT.;
LABEL PATID ='PATIENT*ID NO.'
MATTOTAL='MATHEW*TOTAL SCORE'
NTOTAL='NEUROLOGICAL*TOTAL SCORE'
PSTATUS='PATIENT*STATUS';
VAR PSTATUS PATID MATTOTAL NTOTAL;
TITLE1 'LISTING OF NEUROLOGICAL TOTALS SCORES';
TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
TITLE3 'PRE-OP SCORES ONLY - SUBSET DATA;
RUN;

* Part 2: print subset using WHERE statement *;


PROC PRINT DATA=MNS2 SPLIT='*';
WHERE PSTATUS=1;

Week 11 Page 13 of 58
Week 11 SAS Procedures to Summarize Data

FORMAT PSTATUS PFMT.;


LABEL PATID ='PATIENT*ID NO.'
MATTOTAL='MATHEW*TOTAL SCORE'
NTOTAL='NEUROLOGICAL*TOTAL SCORE'
PSTATUS='PATIENT*STATUS';
VAR PSTATUS PATID MATTOTAL NTOTAL;
TITLE1 'LISTING OF NEUROLOGICAL TOTALS SCORES';
TITLE2 'PRE-OP SCORES ONLY - USING WHERE STATEMENT;
RUN;

LISTING OF NEUROLOGICAL TOTALS SCORES


FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES
PRE-OP SCORES ONLY - SUBSET DATA

PATIENT PATIENT MATHEW NEUROLOGICAL


OBS STATUS ID NO. TOTAL SCORE TOTAL SCORE

1 PRE-OP 24 100 100


2 PRE-OP 28 99 100
3 PRE-OP 60 100 100
4 PRE-OP 65 100 100
5 PRE-OP 74 99 100
----------------------------------------------------------------------------------------------------------------------------

LISTING OF NEUROLOGICAL TOTALS SCORES


PRE-OP SCORES ONLY - USING WHERE STATEMENT

PATIENT PATIENT MATHEW NEUROLOGICAL


OBS STATUS ID NO. TOTAL SCORE TOTAL SCORE

1 PRE-OP 24 100 100


4 PRE-OP 28 99 100
7 PRE-OP 60 100 100
10 PRE-OP 65 100 100
13 PRE-OP 74 99 100

Note that the observation numbers differ. In this case, the data file MNS2 had been previously

sorted by patient id and patient status, so that the WHERE statement selected every third

observation, corresponding to pre-op status.

Week 11 Page 14 of 58
Week 11 SAS Procedures to Summarize Data

2. How to Produce Summary Statistics

There are four procedures that provide basic descriptive statistics for continuous variables.

• These are MEANS, SUMMARY, UNIVARIATE, and TABULATE. The SAS Procedures Guide

provides a detailed comparison of the statistics available in each procedure.

• The procedures differ in the choice of statistics that can be produced. They also differ in the

formatting of results.

• MEANS, SUMMARY and UNIVARIATE can be used to create output datasets containing

summary statistics that can be used in other procedures, such as PROCs PRINT and

REPORT, which give many options for controlling the formatting of the data.

a. How to Use PROC MEANS and PROC SUMMARY

• PROCs MEANS and SUMMARY can be used to compute means, minimums, quantiles,

maximums, standard deviations and standard errors, range, number of missing values,

confidence intervals, plus several other statistics.

• CLASS and/or BY statements can be used to compute the statistics separately for subgroups

of observations.

• For example, when computing statistics on subject AGE, using the statement BY SEX; would

provide separate statistics on AGE for males and females, for a data set previously sorted

by the variable SEX.

• CLASS statements produce separate statistics for subgroups, along with overall statistics for

the whole group. For example using the statement CLASS SEX; would produce statistics

on AGE for all subjects, as well as for males and females separately.

• Note - Use of a CLASS statement does not require a prior sort of the data.

Week 11 Page 15 of 58
Week 11 SAS Procedures to Summarize Data

The primary difference between PROC MEANS and PROC SUMMARY is in the defaults for

printing.

• PROC MEANS, by default, provides results in the output window, although an output data set

containing the summary statistics can be requested.

• PROC SUMMARY, by default, produces only an output data set, although results can be

requested in the output window using the PRINT option.

The primary difference between CLASS and BY statements is that the format for printing is

different.

• A BY statement requires previous sorting and produces no overall statistics.

Each of the procedures, MEAN and SUMMARY produce a default set of statistics; however, you

can request specific statistics.

• This gives you control of the order the statistics appear in on the output, when the statistics

are printed.

• “How to” - Statistics are requested on the PROC statement, before the first semi-colon.

• See the SAS Procedures Manual or the online documentation for details of available

statistics.

Example - Producing summary statistics for two neurological assessment scales

• This example uses produces summary statistics on the neurologic assessment scales using

PROC MEANS and the PROC SUMMARY.

• Creation of an output data set, subsequently printed with PROC PRINT is also illustrated.

Week 11 Page 16 of 58
Week 11 SAS Procedures to Summarize Data

• TIP - Use the MAXDEC option to control decimal places printed in the output! This is

illustrated in the example. If you don’t use the MAXDEC option, the default is 8 places after

the decimal point – and no one should have to look at that much nonsense.

OPTIONS LINESIZE=78 PAGESIZE=60;


LIBNAME CPB 'C:\temp';

*************************************************************;
** get means of neurologic scores by patient status **;
** do this with PROC MEANS and SUMMARY to show options **;
*************************************************************;

** create formats for patient status **;


PROC FORMAT;
VALUE PFMT 1='PRE-OP'
2='POST-OP'
3='FOLLOW-UP';
RUN;
** sort by pstatus **;
PROC SORT DATA=MNS2;
BY PSTATUS;
RUN;

* name statistics, allow 2 dec places for printing *;


PROC MEANS DATA=MNS2 N MEAN STD STDERR MIN MAX MAXDEC=2;
* compute separately for each period;
BY PSTATUS;

** format patient status **;


FORMAT PSTATUS PFMT.;

** name variables to compute statistics for;


VAR MATTOTAL NTOTAL;

** define output data set, variable names for statistics **;


OUTPUT OUT=MNMEANS MEAN=MEANMAT MEANN
STD=STDMAT STDN
STDERR=SE_MAT SE_N
MIN=MINMAT MINN
MAX=MAXMAT MAXN;
TITLE1 'SUMMARY STATISTICS';
TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
TITLE3 'USING PROC MEANS WITH A BY STATEMENT';

Week 11 Page 17 of 58
Week 11 SAS Procedures to Summarize Data

RUN;

PROC PRINT DATA=MNMEANS;


ID PSTATUS;
FORMAT PSTATUS PFMT.;
TITLE1 'OUTPUT DATA SET FROM MEANS PROCEDURE WITH BY STATEMENT';
RUN;

Following is the output.


SUMMARY STATISTICS
FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES
USING PROC MEANS WITH A BY STATEMENT

--------------------- STATUS PRE, POST OR FOLLOW-UP=PRE-OP ---------------------

N Obs Variable Label N Minimum Maximum Mean Std Dev Std Error
------------------------------------------------------------------------------------
32 MATTOTAL MATHEW TOTAL SCORE 32 95.00 100.00 99.03 1.37 0.24
NTOTAL NEUROLOGICAL TOTAL 32 90.00 100.00 98.43 3.22 0.57
------------------------------------------------------------------------------------

---------------------- STATUS PRE, POST OR FOLLOW-UP=POST-OP -----------------------

N Obs Variable Label N Minimum Maximum Mean Std Dev Std Error
------------------------------------------------------------------------------------
31 MATTOTAL MATHEW TOTAL SCORE 31 73.00 100.00 96.48 5.73 1.03
NTOTAL NEUROLOGICAL TOTAL 31 80.00 100.00 94.67 6.31 1.13
------------------------------------------------------------------------------------

--------------------- STATUS PRE, POST OR FOLLOW-UP=FOLLOW-UP ----------------------

N Obs Variable Label N Minimum Maximum Mean Std Dev Std Error
------------------------------------------------------------------------------------
28 MATTOTAL MATHEW TOTAL SCORE 28 94.00 100.00 98.39 1.39 0.26
NTOTAL NEUROLOGICAL TOTAL 28 85.00 100.00 96.42 4.48 0.84
-----------------------------------------------------------------------------------

OUTPUT DATA SET FROM MEANS PROCEDURE WITH BY STATEMENT


PSTATUS _TYPE_ _FREQ_ MEANMAT MEANN STDMAT STDN SE_MAT SE_N MINMAT MINN MAXMAT MAXN

PRE-OP 0 32 99.03 98.43 1.37 3.22 0.24 0.56 95 90 100 100


POST-OP 0 31 96.48 94.67 5.73 6.31 1.03 1.13 73 80 100 100
FOLLOW-UP 0 28 98.39 96.42 1.39 4.48 0.26 0.84 94 85 100 100

Week 11 Page 18 of 58
Week 11 SAS Procedures to Summarize Data

************************************************************;
** repeat, using PROC SUMMARY with a class statement **;
************************************************************;
* name input data set, & print results *;
* to output window *;
PROC SUMMARY DATA=MNS2 PRINT MAXDEC=2
N MEAN STD STDERR MIN MAX ; /* name statistics*/

* compute separately for each period and overall;


CLASS PSTATUS;

** format patient status **;


FORMAT PSTATUS PFMT.;

** name variables to compute statistics for;


VAR MATTOTAL NTOTAL;

** create output for using in subsequent step, giving name **;


** for each statistic for each variable **;
OUTPUT OUT=MNSUMM MEAN=MEANMAT MEANN
STD=STDMAT STDN
STDERR=SE_MAT SE_N
MIN=MINMAT MINN
MAX=MAXMAT MAXN;
RUN;
PROC PRINT DATA=MNSUMM;
FORMAT PSTATUS PFMT.;
ID PSTATUS;
TITLE1 'SUMMARY STATISTICS';
TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
TITLE3 'USING SUMMARY WITH A CLASS STATEMENT';
RUN;

Following is the output.

SUMMARY STATISTICS
FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES
USING SUMMARY WITH A CLASS STATEMENT

PSTATUS _TYPE_ _FREQ_ MEANMAT MEANN STDMAT STDN SE_MAT SE_N MINMAT MINN MAXMAT MAXN

0 91 97.96 96.53 2.48 4.78 0.26 0.50 73 80 100 100


PRE-OP 1 32 99.03 98.43 1.37 3.22 0.24 0.56 95 90 100 100
POST-OP 1 31 96.48 94.67 5.73 6.31 1.03 1.13 73 80 100 100
FOLLOW 1 28 98.39 96.42 1.39 4.48 0.26 0.84 94 85 100 100

Week 11 Page 19 of 58
Week 11 SAS Procedures to Summarize Data

• MEANS produces results to the output window, by default, and this is shown, along with a

printing of the requested output data set.

• SUMMARY produces overall statistics for all observations, in addition to subgroup statistics,

when the class statement is used.

• In this example, the overall statistics are not particularly meaningful, so the advantages of

using a CLASS statement are not important.

• Explanation of the _TYPE_ Variable - The _TYPE_ variable is a SAS produced variable. It

is produced by both MEANS and SUMMARY when an output dataset is requested. _TYPE_

indicates the level of breakdown. 0 indicates overall statistics; 1 indicates that 1 level of sub-

grouping is used, etc. When several variables are used in a CLASS statement, such as

CLASS SEX PSTATUS; then statistics will be produced for females pre-op/ females post-

op/ females follow-up which would have _TYPE_=2, for a 2 level breakdown. Statistics for

all females and for all males would be produced with _TYPE_=1, as well as for all pre-op,

etc. would be produced.

• Again, many options are available for controlling the ways in which variable groups and

subgroups are defined, which makes these procedures very powerful for summarizing data.

Week 11 Page 20 of 58
Week 11 SAS Procedures to Summarize Data

b How to Use PROC UNIVARIATE

PROC UNIVARIATE also produces descriptive statistics for continuous numeric variables –along

with greater detail (including selected graphical descriptions) on the distribution of the variables.

• UNIVARIATE can be used to produce percentiles, such as the 10th and 90th percentiles

(or any other percentile you desire). There are several options for computing percentiles.

• Tests for normality of the distribution of the data are also available, along with a normal

probability plot, stem-and-leaf and box-and-whisker plots.

• The five smallest and five largest values can also be identified by an ID variable – which is

useful when identifying cases with outliers. The number and percent missing values for a

variable are also reported.

• UNIVARIATE can also be used with a BY statement, for previously sorted data, to produce

separate statistics for each group.

• Tip – Use a PROC UNIVARIATE with a BY statement to get a side-by-side box-and-whisker

plots; these allow you to compare visually the distribution of groups on a variable of interest.

• WARNING !!! PROC UNIVARIATE can take a lot of time, and produce tons of pages of

output. This is especially true if you are not careful in defining your variable list, or you use

a BY variable that has many groups.

• WARNING or TIP ??? (you decide) PROC UNIVARIATE will produce statistics for the

group (or groups) defined by missing values for the BY variable. This is because missing

values define a group.

• Along with producing output, PROC UNIVARIATE can be used to produce an output data

set. You can specify any set of percentile values to be included in the output data set can

be requested, along with the default set of output variables.

Week 11 Page 21 of 58
Week 11 SAS Procedures to Summarize Data

Example - The example that follows uses data from a study of peri-operative beta blocker use in

surgical patients.

options pagesize=55 linesize=78 nocenter nodate nonumber;


***********************************************************;
** Example using Proc Univariate **;
***********************************************************;
libname bb 'C:\bblocker';

proc univariate plot normal data=bb.bblock1;


id counter;
var age;
title1 'Univariate statistics on Age';
run;

proc sort data=bb.bblock1;


by gender;
run;

** summary stats separately by gender **;


proc univariate plot data=bb.bblock1;
id counter;
by gender;
var age;
title1 'Grouping by Patient Gender';
run;

Week 11 Page 22 of 58
Week 11 SAS Procedures to Summarize Data

The following output is produced.

Univariate statistics on Age

The UNIVARIATE Procedure


Variable: AGE (AGE)

Moments

N 158 Sum Weights 158


Mean 60.2088608 Sum Observations 9513
Std Deviation 15.6742532 Variance 245.682214
Skewness -0.1982307 Kurtosis -0.7009588
Uncorrected SS 611339 Corrected SS 38572.1076
Coeff Variation 26.0331337 Std Error Mean 1.24697663

Week 11 Page 23 of 58
Week 11 SAS Procedures to Summarize Data

Basic Statistical Measures


Location Variability

Mean 60.20886 Std Deviation 15.67425


Median 62.00000 Variance 245.68221
Mode 70.00000 Range 72.00000
Interquartile Range 25.00000

NOTE: The mode displayed is the smallest of 2 modes with a count of 6.

Tests for Location: Mu0=0

Test -Statistic- -----p Value------

Student's t t 48.28387 Pr > |t| <.0001


Sign M 79 Pr >= |M| <.0001
Signed Rank S 6280.5 Pr >= |S| <.0001

Tests for Normality

Test --Statistic--- -----p Value------

Shapiro-Wilk W 0.980877 Pr < W 0.0275


Kolmogorov-Smirnov D 0.085311 Pr > D <0.0100
Cramer-von Mises W-Sq 0.205514 Pr > W-Sq <0.0050
Anderson-Darling A-Sq 1.096303 Pr > A-Sq 0.0073

Quantiles (Definition 5)

Quantile Estimate

100% Max 93
99% 92
95% 85
90% 79
75% Q3 72
50% Median 62
25% Q1 47
10% 40
5% 35
1% 25
0% Min 21

Week 11 Page 24 of 58
Week 11 SAS Procedures to Summarize Data

Extreme Observations
----------Lowest--------- ---------Highest---------
Value COUNTER Obs Value COUNTER Obs

21 1 1 87 18 18
25 84 84 87 27 27
25 8 8 88 102 102
27 109 109 92 65 65
31 75 75 93 9 9

Stem Leaf # Boxplot


9 23 2 |
8 5566778 7 |
8 1114 4 |
7 55666778888899999 17 |
7 00000011122222333333444 23 +-----+
6 5556667777788899999 19 | |
6 0001122233344 13 *--+--*
5 555566666778899 15 | |
5 000111112244 12 | |
4 5556667777889999 16 +-----+
4 011122223333444 15 |
3 56778999 8 |
3 112 3 |
2 557 3 |
2 1 1 |
----+----+----+----+---
Multiply Stem.Leaf by 10**+1
Variable: AGE (AGE)

Normal Probability Plot


92.5+ +++* *
| **+* *
| +**
| *****
| *****+
| ****++
| ***++
57.5+ ***
| +**
| ****
| *****
| ****+
| ***
| *+**
22.5+*++
+----+----+----+----+----+----+----+----+----+----+
-2 -1 0 +1 +2

Week 11 Page 25 of 58
Week 11 SAS Procedures to Summarize Data

When a BY statement is used, the same output is generated separately for each group defined by

the BY variable.

• In addition, a side-by-side Box-and-Whisker plot to compare all groups is produced, as in

the example below.

• Note - The separate statistics for the 2 gender groups are not shown in the output that

follows, as it is too many pages.

Grouping by Patient Gender


The UNIVARIATE Procedure
Schematic Plots
100 +
|
|
| |
90 + |
| |
| | |
| | |
80 + | |
| | |
| | |
| | +-----+
70 + +-----+ | |
| | | | |
| | | | |
| | | *-----*
60 + *--+--* | + |
| | | | |
| | | | |
| | | | |
50 + +-----+ | |
| | | |
| | +-----+
| | |
40 + | |
| | |
| | |
| | |
30 + | |
| | |
| | |
| |
20 + |
------------+-----------+-----------
GENDER 1 2

Week 11 Page 26 of 58
Week 11 SAS Procedures to Summarize Data

c How to Use PROC TABULATE

PROC TABULATE procedure is initially confusing to use but bear with it; it is a powerful tool for

producing nicely formatted tables of descriptive statistics for groups and subgroups of classification

variables.

• In particular, PROC TABULATE can be used to produce formatted tables that can be

incorporated into a report directly, or with only minor modification.

• Moreover, while all of the statistics available in TABULATE (plus more) can be produced in

other procedures, TABULATE provides tremendous flexibility in the formulation of tables

• Counts and percentages for categorical variables can also be reported using PROC

TABULATE.

• TIP - PROC TABULATE can also be used to print tables of results from other

procedures (such as regression analyses, correlations, p-values, etc)! This is possible

when these results have been stored in output data sets.

There is a higher start up learning time for using PROC TABULATE than the other procedures that

produce summary statistics, but it’s worth it!!

• The reason is – once you’ve survived the learning time, you are later spared the task of

copying numbers, or repetitive cut and paste from crudely formatted output into tables for a

report. This is especially true for summary reports that are produced at regular intervals

throughout a study.

Week 11 Page 27 of 58
Week 11 SAS Procedures to Summarize Data

PROC TABULATE requires the specification of CLASS (categorical) variables used to form groups

and subgroups, and continuous numeric analysis variables (identified on a VAR statement) for

which statistics are produced.

• A TABLE statement is used to define the rows, columns, and pages of a TABLE, along with

the statistics to be produced, and the format for printing them.

• LABEL and FORMAT statements can be used to provide more descriptive information for

variable names and values.

• KEYLABEL can be used to provide more descriptive row and column titles for the statistics

requested. For example in the place of N as a column heading for the number of

observations, a KEYLABEL statement would allow you to use the phrase "NO. OF OBS".

• PROC TABULATE is described in the SAS Procedures Guide. In addition, there is a

separate manual devoted entirely to PROC TABULATE. Description of the options is also

available in the online documentation.

Following are a few examples of PROC TABULATE. They illustrate some of the different ways of

printing the same summary statistics for a data set.

Example -

• The data for this example come from a study of functional status outcome six months post-

cardiac catheterization procedure.

• Patients completed a functional status questionnaire pre-operatively, and again, 6 months

post-operatively.

Week 11 Page 28 of 58
Week 11 SAS Procedures to Summarize Data

• Of interest was a comparison of diabetic cardiac catheterization patients versus non-diabetic

cardiac catheterization patients with respect to their change over time in functional status.

Functional status was assessed using 2 scores: physical functioning and mental

functioning.

• In the tables that follow, summary statistics are printed for change in physical and mental

function scores. These summary statistics are reported for (1) diabetic patients, (2) non-

diabetic patients, (3) groups defined by age group, and (4) the entire study cohort.

• The examples that follow serve to illustrate the control in table formatting that is available.

TIP - While it is possible to name and create several tables in a single PROC TABULATE

procedure (before the RUN; statement), it is recommended that you request separate TABLE

statements; this has to do with the titles statements.

• Only one set of titles can be specified for a procedure -- to produce new titles for a new table,

the PROC TABULATE, CLASS and VAR statements must be repeated.

Guidelines for specification of the TABLE statement in PROC TABULATE.

• There are no default statistics produced with PROC TABULATE.

• Table rows - statement factors listed before a comma (,) define the table rows.

• Table columns – statement factors listed after the comma (,) define the table columns.

• An asterisk is used to separate sub-grouping within rows or columns.

Week 11 Page 29 of 58
Week 11 SAS Procedures to Summarize Data

• How to format the printing of a statistic – To accomplish formatting, the statistic name is

followed by a formatting definition.

(1) For example, MEAN*F=8.2 requests that the mean be printed, using 8 columns

(including 1 for the decimal place) with 2 of the 8 after the decimal place (i.e.,

#####.##).

(2) Alternatively, a single format can be defined for printing all statistics, by using

the phrase FORMAT=8.2 on the PROC TABULATE statement (see final example).

• The keyword ALL is used to get overall statistics in addition to subgroup statistics (e.g., DIAB

ALL will produce statistics for diabetics, non-diabetics and overall).

* Example 1: Proc Tabulate *;


* Statistics in columns, rows defined by class variables *;
PROC TABULATE DATA=FD2 ;

CLASS DIAB AGEGROUP ; * define grouping variables *;


VAR PF2_1 MF2_1; * define analysis variables *

* define table with rows for diab status, *;


* and sub-rows for age group *;
TABLES (PF2_1) * ((DIAB ALL)*(AGEGROUP ALL)) ,
(N*F=5. MEAN*F=8.1 STD*F=8.2 MIN*F=8. MAX*F=8.);

* assign formats and labels to variables *;


FORMAT DIAB dfmt.;

LABEL PF2_1='PF2_1: Physical Function Change'


MF2_1='MF2_1: Mental Function Change';

TITLE1 'SUMMARY STATISTICS FOR CHANGE SCORES';


TITLE2 'Example 1: Age Groups within Diabetic Status';
TITLE3 ' Using default column headings';
RUN;

* repeat for mental function change, using KEYLABEL *;

Week 11 Page 30 of 58
Week 11 SAS Procedures to Summarize Data

* separate procedure used to illustrate options, change titles *;


PROC TABULATE DATA=FD2 ;
CLASS DIAB AGEGROUP ;
VAR PF2_1 MF2_1;
TABLES (MF2_1) * ((DIAB ALL)*(AGEGROUP ALL)) ,
(N*F=5. MEAN*F=8.1 STD*F=8.2 MIN*F=8. MAX*F=8.);
FORMAT DIAB dfmt.;
LABEL PF2_1='PF2_1: Physical Function Change'
MF2_1='MF2_1: Mental Function Change';
KEYLABEL ALL=TOTAL
N='# of OBS';
TITLE1 'SUMMARY STATISTICS FOR CHANGE SCORES';
TITLE2 'Example 1: Age Groups within Diabetic Status';
TITLE3 ' using KEYLABEL to rename col and row headings';
RUN;

The following output is produced.


SUMMARY STATISTICS FOR CHANGE SCORES
Example 1: Age Groups within Diabetic Status
Using default column headings
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ†
‚ ‚ N ‚ Mean ‚ Std ‚ Min ‚ Max ‚
‡ƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚PF2_1: ‚DIAB ‚AGEGRO-‚ ‚ ‚ ‚ ‚ ‚
‚Physic-‡ƒƒƒƒƒƒƒ‰UP ‚ ‚ ‚ ‚ ‚ ‚
‚al ‚Nondia-‡ƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚
‚Functi-‚betic ‚<65 ‚ 345‚ 5.7‚ 13.03‚ -29‚ 41‚
‚on ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚Change ‚ ‚>=65 ‚ 287‚ 2.2‚ 11.57‚ -28‚ 36‚
‚ ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚ ‚All ‚ 632‚ 4.1‚ 12.50‚ -29‚ 41‚
‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Diabet-‚AGEGRO-‚ ‚ ‚ ‚ ‚ ‚
‚ ‚ic ‚UP ‚ ‚ ‚ ‚ ‚ ‚
‚ ‚ ‡ƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚
‚ ‚ ‚<65 ‚ 79‚ 1.6‚ 12.16‚ -28‚ 37‚
‚ ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚ ‚>=65 ‚ 98‚ 0.7‚ 11.85‚ -28‚ 36‚
‚ ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚ ‚All ‚ 177‚ 1.1‚ 11.96‚ -28‚ 37‚
‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚All ‚AGEGRO-‚ ‚ ‚ ‚ ‚ ‚
‚ ‚ ‚UP ‚ ‚ ‚ ‚ ‚ ‚
‚ ‚ ‡ƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚
‚ ‚ ‚<65 ‚ 424‚ 4.9‚ 12.95‚ -29‚ 41‚
‚ ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚ ‚>=65 ‚ 385‚ 1.8‚ 11.64‚ -28‚ 36‚
‚ ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚ ‚All ‚ 809‚ 3.4‚ 12.44‚ -29‚ 41‚
Šƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒ‹ƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒŒ

Week 11 Page 31 of 58
Week 11 SAS Procedures to Summarize Data

SUMMARY STATISTICS FOR CHANGE SCORES


Example 1: Age Groups within Diabetic Status
using KEYLABEL to rename col and row headings

„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ†
‚ ‚# of ‚ ‚ ‚ ‚ ‚
‚ ‚ OBS ‚ Mean ‚ Std ‚ Min ‚ Max ‚
‡ƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚MF2_1: ‚DIAB ‚AGEGRO-‚ ‚ ‚ ‚ ‚ ‚
‚Mental ‡ƒƒƒƒƒƒƒ‰UP ‚ ‚ ‚ ‚ ‚ ‚
‚Functi-‚Nondia-‡ƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚
‚on ‚betic ‚<65 ‚ 345‚ -0.2‚ 11.48‚ -52‚ 32‚
‚Change ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚ ‚>=65 ‚ 287‚ 0.5‚ 9.90‚ -30‚ 28‚
‚ ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚ ‚TOTAL ‚ 632‚ 0.1‚ 10.79‚ -52‚ 32‚
‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Diabet-‚AGEGRO-‚ ‚ ‚ ‚ ‚ ‚
‚ ‚ic ‚UP ‚ ‚ ‚ ‚ ‚ ‚
‚ ‚ ‡ƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚
‚ ‚ ‚<65 ‚ 79‚ -0.2‚ 10.17‚ -26‚ 20‚
‚ ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚ ‚>=65 ‚ 98‚ 0.6‚ 8.85‚ -18‚ 21‚
‚ ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚ ‚TOTAL ‚ 177‚ 0.2‚ 9.45‚ -26‚ 21‚
‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚TOTAL ‚AGEGRO-‚ ‚ ‚ ‚ ‚ ‚
‚ ‚ ‚UP ‚ ‚ ‚ ‚ ‚ ‚
‚ ‚ ‡ƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚
‚ ‚ ‚<65 ‚ 424‚ -0.2‚ 11.23‚ -52‚ 32‚
‚ ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚ ‚>=65 ‚ 385‚ 0.6‚ 9.64‚ -30‚ 28‚
‚ ‚ ‡ƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚ ‚TOTAL ‚ 809‚ 0.1‚ 10.50‚ -52‚ 32‚
Šƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒ‹ƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒŒ

Week 11 Page 32 of 58
Week 11 SAS Procedures to Summarize Data

* Example 2: Proc Tabulate *;


* Statistics in rows, groups and subgroups in columns *;
PROC TABULATE DATA=FD2 ;

CLASS DIAB AGEGROUP ; * define grouping variables *;


VAR PF2_1 MF2_1; * define analysis variables *;

* define table with rows for change scores *;


* and statistics and columns for groups, *;
* and sub-groups *;
TABLES (PF2_1 MF2_1) * (N*F=5. MEAN*F=8.1 STD*F=8.2 MIN*F=8. MAX*F=8.) ,
((DIAB ALL)*(AGEGROUP ALL)) ;

* assign formats and labels to variables *;


* and keylabels to headers *;
FORMAT DIAB dfmt.;

LABEL PF2_1='PF2_1: Physical Function Change'


MF2_1='MF2_1: Mental Function Change';

KEYLABEL ALL=TOTAL
N='# of OBS';

TITLE1 'SUMMARY STATISTICS FOR CHANGE SCORES';


TITLE2 'Example 2: Subgroups in Columns';
RUN;

Week 11 Page 33 of 58
Week 11 SAS Procedures to Summarize Data
SUMMARY STATISTICS FOR CHANGE SCORES
Example 2: statistics in rows and Subgroups in Columns

„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ DIAB ‚ ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚
‚ ‚ Nondiabetic ‚ Diabetic ‚ TOTAL ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ‰
‚ ‚ AGEGROUP ‚ ‚ AGEGROUP ‚ ‚ AGEGROUP ‚ ‚
‚ ‡ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ‰ ‡ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ‰ ‡ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ‰ ‚
‚ ‚ <65 ‚ >=65 ‚ TOTAL ‚ <65 ‚ >=65 ‚ TOTAL ‚ <65 ‚ >=65 ‚ TOTAL ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚PF2_1: Physical ‚# of OBS ‚ 345‚ 287‚ 632‚ 79‚ 98‚ 177‚ 424‚ 385‚ 809‚
‚Function Change ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Mean ‚ 5.7‚ 2.2‚ 4.1‚ 1.6‚ 0.7‚ 1.1‚ 4.9‚ 1.8‚ 3.4‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Std ‚ 13.03‚ 11.57‚ 12.50‚ 12.16‚ 11.85‚ 11.96‚ 12.95‚ 11.64‚ 12.44‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Min ‚ -29‚ -28‚ -29‚ -28‚ -28‚ -28‚ -29‚ -28‚ -29‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Max ‚ 41‚ 36‚ 41‚ 37‚ 36‚ 37‚ 41‚ 36‚ 41‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚MF2_1: Mental ‚# of OBS ‚ 345‚ 287‚ 632‚ 79‚ 98‚ 177‚ 424‚ 385‚ 809‚
‚Function Change ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Mean ‚ -0.2‚ 0.5‚ 0.1‚ -0.2‚ 0.6‚ 0.2‚ -0.2‚ 0.6‚ 0.1‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Std ‚ 11.48‚ 9.90‚ 10.79‚ 10.17‚ 8.85‚ 9.45‚ 11.23‚ 9.64‚ 10.50‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Min ‚ -52‚ -30‚ -52‚ -26‚ -18‚ -26‚ -52‚ -30‚ -52‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Max ‚ 32‚ 28‚ 32‚ 20‚ 21‚ 21‚ 32‚ 28‚ 32‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒŒ

Week 11 Page 34 of 58
Week 11 SAS Procedures to Summarize Data

* Example 3: Proc Tabulate *;


nd
* Statistics in and 1 group in rows, 2 group in columns *;

* reset linesize option so table will fit across page *;


options ls=175;
PROC TABULATE DATA=FD2 ;

CLASS DIAB AGEGROUP ;


VAR PF2_1 MF2_1;

TABLES (PF2_1 MF2_1) * (DIAB ALL) , (AGEGROUP ALL) *


(N*F=5. MEAN*F=8.1 STD*F=8.2 MIN*F=8. MAX*F=8.);

FORMAT DIAB dfmt.;

LABEL PF2_1='PF2_1: Physical Function Change'


MF2_1='MF2_1: Mental Function Change';

KEYLABEL ALL=TOTAL
N='# of OBS';

TITLE1 'SUMMARY STATISTICS FOR CHANGE SCORES';


TITLE2 'Example 3: Cross-classifying Diab (row) and Age Groups (Col)';

RUN;

Week 11 Page 35 of 58
Week 11 SAS Procedures to Summarize Data

SUMMARY STATISTICS FOR CHANGE SCORES


Example 3: Cross-classifying Diab (row) and Age Groups (Col)

„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ AGEGROUP ‚ ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚
‚ ‚ <65 ‚ >=65 ‚ TOTAL ‚
‚ ‡ƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒˆƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒˆƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ‰
‚ ‚# of ‚ ‚ ‚ ‚ ‚# of ‚ ‚ ‚ ‚ ‚# of ‚ ‚ ‚ ‚ ‚
‚ ‚ OBS ‚ Mean ‚ Std ‚ Min ‚ Max ‚ OBS ‚ Mean ‚ Std ‚ Min ‚ Max ‚ OBS ‚ Mean ‚ Std ‚ Min ‚ Max ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚PF2_1: Physical ‚DIAB ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚
‚Function Change ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚
‚ ‚Nondiabetic ‚ 345‚ 5.7‚ 13.03‚ -29‚ 41‚ 287‚ 2.2‚ 11.57‚ -28‚ 36‚ 632‚ 4.1‚ 12.50‚ -29‚ 41‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Diabetic ‚ 79‚ 1.6‚ 12.16‚ -28‚ 37‚ 98‚ 0.7‚ 11.85‚ -28‚ 36‚ 177‚ 1.1‚ 11.96‚ -28‚ 37‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚TOTAL ‚ 424‚ 4.9‚ 12.95‚ -29‚ 41‚ 385‚ 1.8‚ 11.64‚ -28‚ 36‚ 809‚ 3.4‚ 12.44‚ -29‚ 41‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚MF2_1: Mental ‚DIAB ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚
‚Function Change ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚ ‚
‚ ‚Nondiabetic ‚ 345‚ -0.2‚ 11.48‚ -52‚ 32‚ 287‚ 0.5‚ 9.90‚ -30‚ 28‚ 632‚ 0.1‚ 10.79‚ -52‚ 32‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚Diabetic ‚ 79‚ -0.2‚ 10.17‚ -26‚ 20‚ 98‚ 0.6‚ 8.85‚ -18‚ 21‚ 177‚ 0.2‚ 9.45‚ -26‚ 21‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚ ‚TOTAL ‚ 424‚ -0.2‚ 11.23‚ -52‚ 32‚ 385‚ 0.6‚ 9.64‚ -30‚ 28‚ 809‚ 0.1‚ 10.50‚ -52‚ 32‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒŒ

Week 11 Page 36 of 58
Week 11 SAS Procedures to Summarize Data

Additional options in TABULATE let you define if and where you want lines in the tables.

• For example you may choose not to have vertical or horizontal separators. This is

useful for producing tables for publication – some journals ask that tables be

presented without lines.

The next example illustrates reporting counts (N) and percentages (PCTN) using PROC

TABULATE.

• Note that all the variables are categorical, or CLASS variables.

• In this example a single format is defined for the whole table, on the PROC line, rather

than a different format for each statistic. This works here, because counts will always

be whole numbers, and I’m content to report percents rounded to a whole percent.

• How to get column percents - Column percents are defined by listing the ROW

variables within < > after the keyword PCTN.

• How to get row percents - To get row percents, the Column variable (DIAB in this

case) would be listed (e.g., PCTN<DIAB ALL>) instead of the row variables.

• The KEYLABEL statement is used to replace the word ‘ALL’ on the output with the

word ‘TOTAL’.

Week 11 Page 37 of 58
Week 11 SAS Procedures to Summarize Data

** Example 4: Counts and Percentages **;


** Using only CLASS (categorical) variables **;
** Rows: sex, race, age group **;
** Cols: diabetic status **;
** row percents reported **;
PROC TABULATE DATA=FD2 FORMAT=8.0;

CLASS DIAB SEX RACE2 AGEGROUP ;

TABLES (SEX RACE2 AGEGROUP ALL),


(DIAB ALL)*(N PCTN<SEX RACE2 AGEGROUP ALL>='PERCENT') / RTS=18;
FORMAT SEX $SEXF.
DIAB DIABF.
RACE2 RACEF.;
KEYLABEL ALL=TOTAL;
TITLE1 'DESCRIPTIVE TABLE FOR DEMOGRAPHICS';
TITLE2 'Example 4: Counts and Percentages ';
RUN;

DESCRIPTIVE TABLE FOR DEMOGRAPHICS


Example 4: Counts and Percentages
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ DIABETIC*STATUS ‚ ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚
‚ ‚ Nondiabetic ‚ Diabetic ‚ TOTAL ‚
‚ ‡ƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚PERCENT ‚ N ‚PERCENT ‚ N ‚PERCENT ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚SEX ‚ ‚ ‚ ‚ ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚ ‚
‚Female ‚ 837‚ 31‚ 400‚ 41‚ 1237‚ 34‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚Male ‚ 1875‚ 69‚ 576‚ 59‚ 2451‚ 66‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚RACE2 ‚ ‚ ‚ ‚ ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚ ‚
‚Caucasian ‚ 2589‚ 95‚ 844‚ 86‚ 3433‚ 93‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚Other ‚ 123‚ 5‚ 132‚ 14‚ 255‚ 7‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚AGEGROUP ‚ ‚ ‚ ‚ ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚ ‚ ‚
‚<65 ‚ 1566‚ 58‚ 496‚ 51‚ 2062‚ 56‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚>=65 ‚ 1146‚ 42‚ 480‚ 49‚ 1626‚ 44‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒ‰
‚TOTAL ‚ 2712‚ 100‚ 976‚ 100‚ 3688‚ 100‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒŒ

Week 11 Page 38 of 58
Week 11 Procedures to Summarize Data

3 How to Produce Frequency Tables and Cross-Tabulations (PROC FREQ)

PROC FREQ produces frequency table summaries of the distributions of discrete numeric or character

variables.

• PROC FREQ can produce cross-tabulations, as well as one-way frequency listings.

• Counts and percentages are produced for each group defined by variable values, or by the crossing of

variable values.

• Chi-square tests and other measures of association can also be produced by PROC FREQ (these will

not be discussed in this course).

• PROC FREQ can also be used to produce an output data set in addition to, or in place of results in the

output window.

A TABLES statement is used to define the frequency tables.

• To produce one-way tables, simply list the variable(s) on the TABLES statement separated by

spaces.

• To get cross-tabulations, name the variables separated by an asterisk (i.e., VAR1 * VAR2).

To define rows - The first named variable (var1) defines the rows.

To define columns - The second named variable (var2) defines the columns.

• You can list any combination of individual variables, cross-tabulations and multi-way tables on a single

TABLES statement. Multiple TABLES statements can be given in the same procedure.

• TIP - Only one set of titles may be given for all the tables requested in a PROC FREQ Thus, titles

may be the limiting factor in the number of tables requested in a single procedure.

Week 11 Page 39 of 58
Week 11 Procedures to Summarize Data

There are shorthand methods for listing variables in the TABLES statement.

• If a set of variables has the same prefix with sequential numbering, name the first and last variable,

separated by a single hyphen, no spaces. For example, the statement

TABLES score1-score10;

would produce frequency tables for all ten variables, score1 through score10.

• For variables without a common prefix, list the first and last in order as they appear in a data set

(use PROC CONTENTS POSITION to know correct order), separated by a double hyphen, no

spaces. All the variables listed in this manner must be of the same type, character or numeric. For

example, the line

TABLES age--insur;

would produce tables for age, insur, and all variables in order between them in the data set, as long

as they are all numeric or all categorical.

There are also shorthand ways of listing cross-tabulations.

• For example if you want cross-tabulations of sex by a whole set of variables use

TABLES SEX * (firstvar--lastvar);

• WARNING!! Be careful requesting a group of variables crossed by another group – listing (var1--varn)

* (varlist1--varlistn) – you may end up with a lot more tables than you bargained for!

Week 11 Page 40 of 58
Week 11 Procedures to Summarize Data

• For multi-way tables, (i.e., A*B*C), the first variable defines the table, the second the rows, and the third

the columns. For example the statement

TABLES SEX * AGEGR * HSMOKE;

would produce two cross-tabulations, one table for males and one for females, of age group against

smoking status (light, moderate, heavy).

• An alternative way to achieve the result would be to request tables of AGEGR * HSMOKE, using a

separate BY SEX statement. This requires that the data set be previously sorted by sex, and would

produce different statistics, should you be doing analyses as well as summary tables (e.g., different

chi-square tests).

Missing values are NOT included in frequency tables unless the options MISSPRINT or MISSING are

used.

• Options are listed in the TABLES statement and appear after a slash (/).

• If you use the MISSPRINT option -

- Missing values will be included in the table, BUT

- Missing values will not be used in computing percentages.

• If you use the MISSING option -

- Missing values will be included in the table AND

- Missing values will be used in computing percentages.

• If neither option is specified (default)

- Missing values are not included in the table AND

- Missing values are not used in computing percentages.

Week 11 Page 41 of 58
Week 11 Procedures to Summarize Data

Example Illustrating the Options for Handling Missing Data -

• This example uses data from a study of patients transferred from midwifery care during the course of a

pregnancy. One reason patients left midwifery care was early loss. If a woman suffered an early loss,

more detail about the loss was requested. Thus, for other women this question was not applicable.

• Two questions of interest in this study might be: 1) What proportion of all study women had a

therapeutic abortion? 2) What proportion of early pregnancy losses were therapeutic abortions?

• Recall that options are listed on a TABLES statement following a slash; the slash appears after the

listing of variables.

• The default leaves missing values out of the table altogether. The MISSPRINT option includes

these in the table, so that the difference between those for whom the question is not applicable, and

those with missing data can be seen.

• Note that the WHERE statement is used in this example; this is done so that the PROC FREQ

operates on a selected subset of the observations only.

Week 11 Page 42 of 58
Week 11 Procedures to Summarize Data

***********************************************************;
* example to illustrate different missing options in FREQ *;
* 1st define formats, including formats for missing codes *;
***********************************************************;
PROC FORMAT;
VALUE EARLYFMT 1='TAB'
2='SAB <11 WEEKS'
3='SAB 12-19 WEEKS'
4='SAB 20-24 WEEKS'
.='MISSING'
.N='NOT APPLICABLE';
RUN;
** with and without missprint and missing options **;
PROC FREQ DATA=CNMT;
FORMAT EARLYLOS EARLYFMT.;
TABLES EARLYLOS;
TITLE1 'CNM TRANSFER STUDY';
TITLE2 ‘DEFAULT OPTION FOR MISSING VALUES’;
RUN;

PROC FREQ DATA=CNMT;


FORMAT EARLYLOS EARLYFMT.;
TABLES EARLYLOS / MISSPRINT;
TITLE2 ‘MISSPRINT OPTION’;
RUN;

PROC FREQ DATA=CNMT;


FORMAT EARLYLOS EARLYFMT.;
TABLES EARLYLOS / MISSING;
TITLE2 ‘MISSING OPTION';
RUN;

PROC FREQ DATA=CNMT;


WHERE EARLYLOS NE .N;
FORMAT EARLYLOS EARLYFMT.;
TABLES EARLYLOS / MISSING;
TITLE2 ‘MISSING OPTION';
TITLE3 ‘WHERE used to select applicable cases only’;
RUN;

Week 11 Page 43 of 58
Week 11 Procedures to Summarize Data

CNM TRANSFER STUDY


DEfAULT OPTION FOR MISSING VALUES

EARLY PREGNANCY LOSS


Cumulative Cumulative
EARLYLOS Frequency Percent Frequency Percent
TAB 11 19.6 11 19.6
SAB <11 WEEKS 16 28.6 27 48.2
SAB 12-19 WEEKS 19 33.9 46 82.1
SAB 20-24 WEEKS 10 17.9 56 100.0

Frequency Missing = 286

CNM TRANSFER STUDY


MISSPRINT OPTION

EARLY PREGNANCY LOSS


Cumulative Cumulative
EARLYLOS Frequency Percent Frequency Percent
MISSING 6 . . .
NOT APPLICABLE 280 . . .
TAB 11 19.6 11 19.6
SAB <11 WEEKS 16 28.6 27 48.2
SAB 12-19 WEEKS 19 33.9 46 82.1
SAB 20-24 WEEKS 10 17.9 56 100.0

Frequency Missing = 286

Week 11 Page 44 of 58
Week 11 Procedures to Summarize Data

CNM TRANSFER STUDY


MISSING OPTION

EARLY PREGNANCY LOSS


Cumulative Cumulative
EARLYLOS Frequency Percent Frequency Percent
MISSING 6 1.8 6 1.8
NOT APPLICABLE 280 81.9 286 83.6
TAB 11 3.2 297 86.8
SAB <11 WEEKS 16 4.7 313 91.5
SAB 12-19 WEEKS 19 5.6 332 97.1
SAB 20-24 WEEKS 10 2.9 342 100.0
CNM TRANSFER STUDY
MISSING OPTION
WHERE used to select applicable cases only

EARLY PREGNANCY LOSS


Cumulative Cumulative
EARLYLOS Frequency Percent Frequency Percent
MISSING 6 9.7 6 9.7
TAB 11 17.7 17 27.4
SAB <11 WEEKS 16 25.8 33 53.2
SAB 12-19 WEEKS 19 30.7 52 83.9
SAB 20-24 WEEKS 10 16.1 62 100.0

• In the first two tables the computed percentages are identical, but use of the MISSPRINT option

distinguishes the not applicable (did not have an early loss) from those with missing information.

From these we can see, for example that of the patients with an early loss of known type, 19.6

percent had therapeutic abortions (TAB).

• The third table uses the MISSING option, which includes missing values in computation of percentages.

From this table we can see that of all the transferred patients, 3.2 percent had therapeutic

abortions.

• The final table uses the WHERE statement as well as the MISSING option, to create a table only on

patients with applicable data – in this case only those with early loss. Since 6 patients had early loss

of unknown type or age, these can now be included in the table to show that among the patients with

Week 11 Page 45 of 58
Week 11 Procedures to Summarize Data

an early loss, 9.7% were of unknown type/age, and that among all patients with early loss, 17.7%

were known to have therapeutic abortions.

• TIP - The availability of different options is useful, depending on what you want to know.

There are other options to control what is printed in the tables

• NOCUM suppresses printing of cumulative frequencies and percentages. This is especially appropriate

for nominal data, inasmuch as cumulative frequencies and cumulative percentages don’t make much

sense.

• NOFREQ suppresses printing of cell counts, NOROW, NOCOL, and NOPERCENT suppress row,

column and overall cell percentages, respectively.

Example Illustrating the Creation of Cross-Tabulations -

• The first example produces two tables, trimester by age group, and trimester by earlylos.

• The second TABLES statement produces separate tables of age group by earlylos for each level of

trimester. Note that there is no table for the third trimester, since by definition no one in the third

trimester can have an early loss.

• This example illustrates choosing to suppress percentagles.

** Example Crosstabulations **;


PROC FORMAT;
VALUE AGEFMT 1='<15 YRS'
2='15-17 YRS'
3='18-19 YRS'
4='20+ YRS';
VALUE EARLYFMT 1='TAB'
2='SAB <11 WEEKS'
3='SAB 12-19 WEEKS'
4='SAB 20-24 WEEKS'
.='MISSING'

Week 11 Page 46 of 58
Week 11 Procedures to Summarize Data

.N='NOT APPLICABLE';
RUN;
** cross-tabulations **;
PROC FREQ DATA=CNMT;
FORMAT AGEGR AGEFMT. EARLYLOS EARLYFMT.;

* look at trimester by 2 different factors *;


TABLES TRIMES * (AGEGR EARLYLOS);

* look at tables of age group by earlylos, separately for each trimester *;


TABLES TRIMES * AGEGR * EARLYLOS / NOPERCENT NOROW NOCOL;
TITLE1 ‘CNM TRANSFER STUDY’;
RUN;

CNM TRANSFER STUDY

TABLE OF TRIMES BY AGEGR


TRIMES(TRIMESTER OF PREGNANCY AT TRANSFER)
AGEGR(PATIENT AGE AT TRANSFER)
Frequency|
Percent |
Row Pct |
Col Pct |<15 YRS |15-17 YR|18-19 YR|20+ YRS |
| |S |S | | Total
---------+--------+--------+--------+--------+
1 | 0 | 9 | 7 | 3 | 19
| 0.00 | 5.20 | 4.05 | 1.73 | 10.98
| 0.00 | 47.37 | 36.84 | 15.79 |
| 0.00 | 14.29 | 13.73 | 6.25 |
---------+--------+--------+--------+--------+
2 | 8 | 19 | 23 | 20 | 70
| 4.62 | 10.98 | 13.29 | 11.56 | 40.46
| 11.43 | 27.14 | 32.86 | 28.57 |
| 72.73 | 30.16 | 45.10 | 41.67 |
---------+--------+--------+--------+--------+
3 | 3 | 35 | 21 | 25 | 84
| 1.73 | 20.23 | 12.14 | 14.45 | 48.55
| 3.57 | 41.67 | 25.00 | 29.76 |
| 27.27 | 55.56 | 41.18 | 52.08 |
---------+--------+--------+--------+--------+
Total 11 63 51 48 173
6.36 36.42 29.48 27.75 100.00

Frequency Missing = 169

Week 11 Page 47 of 58
Week 11 Procedures to Summarize Data

TABLE OF TRIMES BY EARLYLOS


TRIMES(TRIMESTER OF PREGNANCY AT TRANSFER)
EARLYLOS(EARLY PREGNANCY LOSS)
Frequency|
Percent |
Row Pct |
Col Pct |TAB |SAB <11 |SAB 12-1|SAB 20-2|
| |WEEKS |9 WEEKS |4 WEEKS | Total
---------+--------+--------+--------+--------+
1 | 3 | 12 | 5 | 0 | 20
| 6.25 | 25.00 | 10.42 | 0.00 | 41.67
| 15.00 | 60.00 | 25.00 | 0.00 |
| 50.00 | 92.31 | 26.32 | 0.00 |
---------+--------+--------+--------+--------+
2 | 3 | 1 | 14 | 10 | 28
| 6.25 | 2.08 | 29.17 | 20.83 | 58.33
| 10.71 | 3.57 | 50.00 | 35.71 |
| 50.00 | 7.69 | 73.68 | 100.00 |
---------+--------+--------+--------+--------+
3 | 0 | 0 | 0 | 0 | 0
| 0.00 | 0.00 | 0.00 | 0.00 | 0.00
| . | . | . | . |
| 0.00 | 0.00 | 0.00 | 0.00 |
---------+--------+--------+--------+--------+
Total 6 13 19 10 48
12.50 27.08 39.58 20.83 100.00

Frequency Missing = 294

Week 11 Page 48 of 58
Week 11 Procedures to Summarize Data

CNM TRANSFER STUDY


TABLE 1 OF AGEGR BY EARLYLOS
CONTROLLING FOR TRIMES=1
AGEGR(PATIENT AGE AT TRANSFER) EARLYLOS(EARLY PREGNANCY LOSS)

Frequency |TAB |SAB <11 |SAB 12-1|SAB 20-2|


| |WEEKS |9 WEEKS |4 WEEKS | Total
----------+--------+--------+--------+--------+
<15 YRS | 0 | 0 | 0 | 0 | 0
----------+--------+--------+--------+--------+
15-17 YRS | 1 | 6 | 0 | 0 | 7
----------+--------+--------+--------+--------+
18-19 YRS | 2 | 2 | 1 | 0 | 5
----------+--------+--------+--------+--------+
20+ YRS | 0 | 1 | 0 | 0 | 1
----------+--------+--------+--------+--------+
Total 3 9 1 0 13

Frequency Missing = 15

TABLE 2 OF AGEGR BY EARLYLOS


CONTROLLING FOR TRIMES=2
AGEGR(PATIENT AGE AT TRANSFER) EARLYLOS(EARLY PREGNANCY LOSS)

Frequency |TAB |SAB <11 |SAB 12-1|SAB 20-2|


| |WEEKS |9 WEEKS |4 WEEKS | Total
----------+--------+--------+--------+--------+
<15 YRS | 1 | 0 | 2 | 1 | 4
----------+--------+--------+--------+--------+
15-17 YRS | 2 | 0 | 6 | 0 | 8
----------+--------+--------+--------+--------+
18-19 YRS | 0 | 1 | 3 | 0 | 4
----------+--------+--------+--------+--------+
20+ YRS | 0 | 0 | 1 | 1 | 2
----------+--------+--------+--------+--------+
Total 3 1 12 2 18

Frequency Missing = 77

TABLE 3 OF AGEGR BY EARLYLOS


CONTROLLING FOR TRIMES=3

Effective Sample Size = 0


Frequency Missing = 127

Week 11 Page 49 of 58
Week 11 Procedures to Summarize Data

4. How to Print Forms: PROC FORMS

Proc FORMS can be used to print mailing labels, file cards – any printer forms that have a regular pattern,

and that use information stored in SAS data sets.

• Beware! It may take some trial and error (with settings) to obtain the exact look (e.g. spacing)

that you are after. In some instances, it may be easier to export your data from SAS to ACCESS,

and use the special features that allow you to specify the mailing label format name, so spacing is

defined for you.

• There are options within the procedure to define the form dimensions, spacing, indentation, number of

units per page, and more. An example for printing mailing labels is shown.

• An alternative approach to producing “forms” is to use a PUT statement in a DATA _NULL_; step .

When you are printing combinations of text and variable values, this can sometimes be easier to use.

The PROC FORMS statement names the data file to use, followed by a series of options to control page and

line size, number of forms to print down and across a page, number of lines to skip between forms (in this

example, SKIP=2), as well as others options.

• LINE statements give a line number, followed by the variable names to be printed on the line. Options

for printing follow the slash (/). This example uses the LASTNAME feature to reorder a name given

as last, first. The option puts the text after a comma first, followed by the text before the comma, and

the comma is not printed. Alternatively, the first and last names could have been read in as separate

variables.

• The option PACK removes extra spaces that would appear, if a character variable doesn’t use all of the

available character variable length – as shown in the second example, without the options.

• Note that the statement TITLE1; must be used to remove all titles.

Week 11 Page 50 of 58
Week 11 Procedures to Summarize Data

** PROC FORMS EXAMPLE **;


** read in ASCII data **;
data mail;
input sid
name & $20.
addr1 & $25.
addr2 & $25.
zip;
cards;
01 Johnson, Lee. R. P.O. Box 243, Montgomery, Al 36113
02 Abbott, Brenda K. 568 Trillion Ct., Denver, CO 80237
03 Rodriquez, Juan 619 Powell Dr., Charleston, SC 29412
04 Stevenson, Mary K. 22 Meredith Blvd., Austin, TX 78702
05 Hawks, Patrick E. Rt. 1, Box 523, Taylorsville, NC 28681
06 Lee, Chen 123 Maple St. Raleigh, NC 27606
07 Weinstein, Joseph M. Rt. 4, Box 466, Dixon, IL 61021
08 Baskowshi, Bonnie G. P.O. Box 42, Sacramento, CA 85841
;
RUN;
* print subject list *;
proc print data=mail;
id sid;
var name addr1 addr2 zip;
title1 'List of subjects';
RUN;

* proc forms with options *;


proc forms data=mail skip=2;
line 1 name / lastname;
line 2 addr1 ;
line 3 addr2 zip/pack;
title1;
run;

* without options *;
proc forms data=mail;
line 1 name ;
line 2 addr1 ;
line 3 addr2 zip;
run;

Week 11 Page 51 of 58
Week 11 Procedures to Summarize Data

List of subjects

sid name addr1 addr2 zip

1 Johnson, Lee. R. P.O. Box 243, Montgomery, Al 36113


2 Abbott, Brenda K. 568 Trillion Ct., Denver, CO 80237
3 Rodriquez, Juan 619 Powell Dr., Charleston, SC 29412
4 Stevenson, Mary K. 22 Meredith Blvd., Austin, TX 78702
5 Hawks, Patrick E. Rt. 1, Box 523, Taylorsville, NC 28681
6 Lee, Chen 123 Maple St. Raleigh, NC 27606
7 Weinstein, Joseph M. Rt. 4, Box 466, Dixon, IL 61021
8 Baskowshi, Bonnie G. P.O. Box 42, Sacramento, CA 85841

(First example using options)


Lee. R. Johnson
P.O. Box 243,
Montgomery, Al 36113

Brenda K. Abbott
568 Trillion Ct.,
Denver, CO 80237

Juan Rodriquez
619 Powell Dr.,
Charleston, SC 29412

Mary K. Stevenson
22 Meredith Blvd.,
Austin, TX 78702

Patrick E. Hawks
Rt. 1, Box 523,
Taylorsville, NC 28681

Chen Lee
123 Maple St.
Raleigh, NC 27606

Joseph M. Weinstein
Rt. 4, Box 466,
Dixon, IL 61021

Bonnie G. Baskowshi
P.O. Box 42,
Sacramento, CA 85841

Week 11 Page 52 of 58
Week 11 Procedures to Summarize Data

(2nd example without options)


Johnson, Lee. R.
P.O. Box 243,
Montgomery, Al 36113
Abbott, Brenda K.
568 Trillion Ct.,
Denver, CO 80237
Rodriquez, Juan
619 Powell Dr.,
Charleston, SC 29412
Stevenson, Mary K.
22 Meredith Blvd.,
Austin, TX 78702
Hawks, Patrick E.
Rt. 1, Box 523,
Taylorsville, NC 28681
Lee, Chen
123 Maple St.
Raleigh, NC 27606
Weinstein, Joseph M.
Rt. 4, Box 466,
Dixon, IL 61021
Baskowshi, Bonnie G.
P.O. Box 42,
Sacramento, CA 85841

Week 11 Page 53 of 58
Week 11 Procedures to Summarize Data

5. How to Use PROC REPORT

PROC REPORT is another PROC that is worth the time and effort required to learn.

• When used well, reports can be generated directly as SAS output that require little or no further editing

before presentation.

• PROC REPORT encompasses many of the features of Procs PRINT, MEANS and TABULATE.

• Its features allow the presentation of detail (individual observations) and summary data, incorporated

into an organized report.

• The flexibility in reporting using PROC REPORT is a little better than that for PROC TABULATE.

However, both save later “cut and paste” work to create a project document.

• TIP – Consider using this procedure when you need to generate regular, multiple status reports. PROC

REPORT is handy inasmuch as, once a report has been designed, and the programming statements

saved, it can easily be re-generated.

• PROC REPORT is also nice for having features that allow comprehensive control over fonts,

background and text colors (or output text or html files) as well as the controls it provides over

spacing, page breaks and other formatting tools. PROC FORMAT also has features akin to a PUT

statement (LINE in Proc Report); this allows you to insert your own text in the report.

Week 11 Page 54 of 58
Week 11 Procedures to Summarize Data

When using PROC REPORT, plan in advance the layout of your report.

• A report’s layout is largely determined by the designation of variables into various categories.

• For each variable, a DEFINE statement is used to designate the display category for the report:

ƒ DISPLAY – A row appears for every observation for variables with this designation. By default,
all variables are considered DISPLAY variables, unless another designation is given.
ƒ ORDER – A row appears for every observation for variables with this designation, as a display
variable, but this designation will order by value.
ƒ ACROSS – Variables with this designation determine columns for the report – one for each
distinct value of the variable present in the input data. ACROSS variables are comparable to
CLASS variables in other procedures, but used to define columns in PROC REPORT.
ƒ GROUP – This designation groups on variable values to determine rows in the report, akin to
using a CLASS variable in other procedures.
ƒ ANALYSIS – Numeric variables that are used for computation of summary statistics, for each
cell of a report produced by ACROSS by GROUP designations.

Example

• This example uses data from the study of change in functional status, 6 months post-cardiac
catheterization among diabetic and non-diabetic patients. This study was described in the

section on PROC TABULATE (see, p 27) .

• The example is in three parts. Each demonstrates a few new features.


Part 1 - Prints means for physical function change scores by sex by age group by diabetic

status.

Part 2 - Adds formatting features and more summary statistics, and

Part 3 - Displays other statistics and t-test results.

Week 11 Page 55 of 58
Week 11 Procedures to Summarize Data

* example part 1 *;

* use NOWD to suppress windows, headskip to skip line after header row*;
PROC REPORT DATA=FD2 NOWD HEADSKIP;

* define columns for report *;


* nominal vars: diab, age group and sex *;
* numeric var: change in pf score *;
COLUMN DIAB AGEGROUP SEX PF2_1 ;

* define nominal variables to create rows, assign formats *;


DEFINE DIAB / GROUP FORMAT=DFMT.;
DEFINE AGEGROUP / GROUP ;
DEFINE SEX / GROUP FORMAT=$SEXF.;

* define statistic to print for pf change as the mean *;


* assign title to column *;
DEFINE PF2_1 / ANALYSIS MEAN 'PF CHANGE MEAN';
TITLE1 'EXAMPLE USING PROC REPORT';
RUN;

EXAMPLE USING PROC REPORT


PF
AGEG CHANGE
DIAB ROUP SEX MEAN

Diabetic <65 Female 1.4254753


Male 1.7350271
>=65 Female 0.7417697
Male 0.722176
Nondiabetic <65 Female 5.9010155
Male 5.5899143
>=65 Female 1.5194106
Male 2.5627356

Week 11 Page 56 of 58
Week 11 Procedures to Summarize Data

* EXAMPLE part 2: ADDING FORMATTING FEATURES and more statistics *;


PROC REPORT DATA=FD2 NOWD HEADSKIP;

* Define several columns for pf change, to print several statistics *;


COLUMN DIAB AGEGROUP SEX PF2_1=PFN PF2_1=PFMEAN PF2_1=PFMiN PF2_1=PFMAX;

* define nominal variables adding column labels and formats *;


DEFINE DIAB / GROUP 'DIABETIC STATUS' FORMAT=DFMT.;
DEFINE AGEGROUP / GROUP 'AGE GROUP' FORMAT=$6.;
DEFINE SEX / GROUP FORMAT=$SEXF.;

* define each of the statistics to use for PF change *;


* and give column header and format *;
DEFINE PFN / ANALYSIS N 'N' FORMAT=3.;
DEFINE PFMEAN / ANALYSIS MEAN 'PF CHANGE MEAN' FORMAT=6.1;
DEFINE PFMIN / ANALYSIS MIN 'PF CHANGE MIN' FORMAT=6.1;
DEFINE PFMAX / ANALYSIS MAX 'PF CHANGE MAX' FORMAT=6.1;

* create line breaks after each group *;


BREAK AFTER AGEGROUP / SKIP SUPPRESS;
BREAK AFTER DIAB / SUPPRESS SKIP;
TITLE2 'ADDING FORMATTING FEATURES AND MULTIPLE STATISTICS';
RUN;

EXAMPLE USING PROC REPORT


ADDING FORMATTING FEATURES AND MULTIPLE STATISTICS

PF PF PF
DIABETIC AGE CHANGE CHANGE CHANGE
STATUS GROUP SEX N MEAN MIN MAX

Diabetic <65 Female 34 1.4 -28.1 27.3


Male 45 1.7 -20.8 37.0

>=65 Female 40 0.7 -20.5 30.8


Male 58 0.7 -27.7 36.4

Nondiabetic <65 Female 92 5.9 -29.3 41.1


Male 253 5.6 -28.4 38.0

>=65 Female 100 1.5 -24.2 25.4


Male 187 2.6 -28.0 36.3

Week 11 Page 57 of 58
Week 11 Procedures to Summarize Data

** EXAMPLE part 3: REPORTING TEST STATISTICS **;

PROC REPORT DATA=FD2 NOWD HEADSKIP;

** Define column variables, including several columns for pf change *;


COLUMN DIAB AGEGROUP SEX PF2_1=PFN PF2_1=PFMEAN PF2_1=PFSTDERR
PF2_1=PFTEST;

** define nominal vars with column labels and formats **;


DEFINE DIAB / GROUP 'DIABETIC STATUS' FORMAT=DFMT.;
DEFINE AGEGROUP / GROUP 'AGE GROUP' FORMAT=$6.;
DEFINE SEX / GROUP FORMAT=$SEXF.;

** define statistics for pf change, **;


** including test that mean change is different from 0 **;
DEFINE PFN / ANALYSIS N 'N' FORMAT=3.;
DEFINE PFMEAN / ANALYSIS MEAN 'PF CHANGE MEAN' FORMAT=6.1;
DEFINE PFSTDERR / ANALYSIS STDERR 'STD ERROR PF CHANGE' FORMAT=6.1;
DEFINE PFTEST / ANALYSIS PRT 'P-VALUE: MEAN=0' FORMAT=8.4;

BREAK AFTER AGEGROUP / SKIP SUPPRESS;


BREAK AFTER DIAB / SUPPRESS SKIP;
TITLE3 'ADDING TEST OF MEAN CHANGE DIFFERENT FROM ZERO';
RUN;

EXAMPLE USING PROC REPORT


ADDING FORMATTING FEATURES AND MULTIPLE STATISTICS
ADDING TEST OF MEAN CHANGE DIFFERENT FROM ZERO

STD
PF ERROR
DIABETIC AGE CHANGE PF P-VALUE:
STATUS GROUP SEX N MEAN CHANGE MEAN=0

Diabetic <65 Female 34 1.4 2.4 0.5522


Male 45 1.7 1.6 0.2906

>=65 Female 40 0.7 1.8 0.6876


Male 58 0.7 1.6 0.6521

Nondiabetic <65 Female 92 5.9 1.4 0.0001


Male 253 5.6 0.8 0.0001

>=65 Female 100 1.5 1.1 0.1602


Male 187 2.6 0.9 0.0039

Week 11 Page 58 of 58