Week 11 SAS Procedures to Summarize Data

Unit 5
SAS for Data Description

Week 11: Procedures to Summarize Data

Welcome!

Data summarization is a significant portion of data management activities. It serves a variety of

purposes, including (1) monitoring and tracking of a study cohort, (2) informing project planning,

and (3) cohort description. Data summaries are also useful because they provide clues to data

cleaning and analysis.

Data summarization might take the form of a listing of the data, the reporting of averages (and

accompanying standard deviations), tabulations of frequency distributions, or graphical summaries

(e.g. scatter plots).

The better summaries are those that are self-explanatory. They are well labeled (have titles and

variable values identified) and are straightforward to understand. It’s also helpful to accompany the

summarization with documentation of the data source (name and version of data set) and the name

of the program that generated the summary.

The SAS procedures discussed in this reading, Unit 5 week 1 (week 11 of course), are PRINT,

MEANS, SUMMARY, UNIVARIATE, TABULATE, FREQ, FORMS, and REPORT.


In the Unit 5 week 2 (week 12 of course) reading, the procedures discussed are CHART and

PLOT. The Unit 5 week 2 (week 12 of course) reading also includes a brief introduction to using the

SAS ANALYST application for producing graphics with the SAS/GRAPH module. These are higher quality

graphics than the printer character charts and plots produced with PROCs CHART and PLOT.

TIP - Most procedures produce results that appear in the output window. Along with directly

producing output, many of the procedures to be discussed can produce new SAS data sets (output

data), which in turn can be used in other procedures, such as PRINT or TABULATE. In this way, it

is possible to have more control over the format in which results are printed. Output data sets can

also be modified in subsequent DATA steps, to add labels or formats before printing.
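
The workflow in this tip can be sketched as follows. This is only an illustration under stated assumptions: the data set SCORES, the variables GROUP and SCORE, and the output names SCORE_STATS, MEAN_SCORE and SD_SCORE are hypothetical and are not part of the course data.

```sas
/* Hypothetical data: SCORES, GROUP and SCORE are assumed for illustration */
data scores;
  input group score;
  datalines;
1 10
1 14
2 20
2 26
;
run;

/* NOPRINT suppresses the printed report; only the output data set is created */
proc means data=scores noprint;
  class group;
  var score;
  output out=score_stats mean=mean_score std=sd_score;
run;

/* Modify the output data set in a DATA step: add labels and a format */
data score_stats;
  set score_stats;
  label mean_score = 'Mean*Score'
        sd_score   = 'Std Dev*of Score';
  format mean_score sd_score 8.2;
run;

/* Print the summary statistics with full control over the layout */
proc print data=score_stats split='*' noobs;
  var group _type_ _freq_ mean_score sd_score;
  title1 'Summary statistics printed from an output data set';
run;
```

The same pattern should work with any procedure that offers an OUTPUT statement; only the statistic keywords and variable names change.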

TIP – Be sure to see descriptions of these procedures in the SAS Procedures Guide. Learn to use

the SAS manuals - there are many options to use with all procedures. These course notes

are not all inclusive!


Goals of Week 11: Procedures to Summarize Data

1. to be competent in using SAS procedures to “write out” data values in a manner that is
easy to read;

2. to appreciate the utility of “writing out” data as a preliminary to data quality assessment;

3. to be competent in using SAS procedures to produce frequency distributions and cross-
tabulations of data that are self-explanatory to an independent reader; and

4. to be (at least a little) competent in using SAS procedures to “write out” data as part of the
production of forms (admittedly MS ACCESS might be a better tool in this regard).

Week 11 Outline - Procedures to Summarize Data
Section Topic

1. How to Produce Data Listings (PROC PRINT)
a. Printing with variable labels and ID statement
b. Printing with a BY statement
c. Introduction to the WHERE statement: How to Print a Subset

2. How to Produce Summary Statistics
a. How to Use PROC MEANS and PROC SUMMARY
b. How to Use PROC UNIVARIATE
c. How to Use PROC TABULATE

3. How to Produce Frequency Tables, Cross-Tabulations (PROC FREQ)

4. How to Print Forms (PROC FORMS)

5. How to Use PROC REPORT


1. How to Produce Data Listings: PROC PRINT

PROC PRINT is a simple way to get a listing of data in a SAS data set.

• We have already used PROC PRINT to get a listing of data (see Unit 4 week 1)

• More generally, PROC PRINT is used to list:

(1) all of the variable values for all of the observations in a SAS data set,

(2) some of the variable values for all of the observations in a SAS data set, or

(3) some of the variable values for some of the observations in a SAS data set.

• There are options in SAS to control the format of the listing.

• Selected options for PROC PRINT are illustrated in the examples that follow.

Examples – The examples below use data on two neurological assessment scales used in a

cardiopulmonary bypass study. Data on pre-op, post-op and follow-up scores are printed in

different ways to illustrate some of the options available for printing in SAS.


a. Printing with variable labels and ID statement

Example

• The data in this example are arranged with one record per subject. Included in this record are

pre, post, and follow-up scores for each of two assessment scales.

• Scores are printed for all three periods, for the two scales.

• When the keyword LABEL is included in the PROC PRINT statement, the variable labels are

used in the column heading instead of the variable name.

• TIP - It is also possible to assign new labels in the PROC PRINT procedure.

• TIP - A split character can be used when creating variable labels. This character is used to

split the labels into two or more lines for printing. To do this:

- Instead of writing LABEL on the PROC PRINT statement,

- Write SPLIT=' ', where the split character (which can be a space) is enclosed

within the single quotes.

• In this example, new labels using the split character * were defined for all variables. You can

see the advantage of using a split: the column width for printing would be determined by the

length of the variable label if it were not split.

• The VAR statement is used to

- select the variables to print, and to

- control the order in which variables appear.

• When the ID statement is used, no observation number is printed. Instead, the variable named

after ID appears in the leftmost column, before the variables in the VAR statement.

• If no VAR statement is written, all of the variables in the dataset are printed, and they are

printed in the order in which they are stored.


Example -

**************************************************************;
** print neurologic summary scores for pre, post,           **;
** & follow-up using label and ID options                   **;
*__________________________________________________;

* print only first 10 observations **;
* Use '*' to define split character **;
PROC PRINT DATA=MNSCORE(OBS=10) SPLIT='*';

  ** define the variable to put in first column in place of obs number **;
  ID PATID;

  ** name variables in order for printing **;
  VAR MSCORE1 MSCORE2 MSCORE3 NSCORE1 NSCORE2 NSCORE3;

  ** define labels indicating where to split into two lines **;
  LABEL PATID='PATIENT*ID'
        MSCORE1='PRE-OP*MATHEW*SCORE'
        MSCORE2='POST-OP*MATHEW*SCORE'
        MSCORE3='FOLLOW-UP*MATHEW*SCORE'
        NSCORE1='PRE-OP*NEURO*SCORE'
        NSCORE2='POST-OP*NEURO*SCORE'
        NSCORE3='FOLLOW-UP*NEURO*SCORE';

  TITLE1 'LISTING OF NEUROLOGICAL TOTALS SCORES';
  TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
RUN;

The following output is produced:

LISTING OF NEUROLOGICAL TOTALS SCORES
FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES

         PREOP    POSTOP   FOLLOWUP   PREOP   POSTOP   FOLLOWUP
PATID    MATHEW   MATHEW   MATHEW     NEURO   NEURO    NEURO
ID       SCORE    SCORE    SCORE      SCORE   SCORE    SCORE

 24       100       97        97       100      85        85
 28        99       87        98       100      85        90
 60       100      100       100       100     100       100
 65       100       98       100       100      90       100
 70        99       97       100       100      95       100
 40        97       99        98        95     100        95
190       100      100        97       100     100        90
196       100      100        99       100     100       100
210        98       85        99        90      80        95
240       100       98        97       100      95        90

b. Printing with a BY statement

Any use of a BY statement must be preceded with a sort of the data!

Use of a BY statement allows you to print data that is sorted. For example, if you wanted to print data that is sorted by LNAME (e.g. LNAME is the variable name for "last name"), the print instruction must be preceded by a sort instruction. This might look something like

proc sort data=temp;
  by lname;
run;

proc print data=temp;
  by lname;
run;

• To print with a BY statement, the data must first be sorted by the BY variable(s).

• When a BY variable is used for printing, the data are grouped under a header line that gives information on the grouping or BY variable.

• Each by-group is separated by a blank line; this makes clear the separation of groups.

• Tip - When the same variable is named in both an ID statement and a BY statement, the grouping variable is listed in the first column, and not repeated for subsequent observations.

• The second example uses the same data, but in this case it is arranged with multiple records per subject, for pre, post, and follow-up status. Here the data are printed in two ways, first by patient id number, and second by patient status.

• Here, labels have been created with the split character as part of the PROC PRINT instruction; however, this is not required.

• Tip - Do include a split character in variable labels; it makes your output easier to read!

• Tip - And while you're at it, include the variable name within the label. For example: LABEL PATID = 'PATID:*Patient*ID'; This also enhances readability since it produces the variable name as well as its accompanying description in the output.

*************************************************************;
** print neurologic summary scores for pre, post, &        **;
** follow-up using label and ID options and BY statements  **;
*************************************************************;

** create formats for patient status **;
PROC FORMAT;
  VALUE PFMT 1='PRE-OP' 2='POST-OP' 3='FOLLOW-UP';
RUN;

** SORT to print grouped by PATID **;
PROC SORT DATA=MNS2;
  BY PATID;
RUN;

* print first 9 obs, define split char, suppress printing obs # with NOOBS;
PROC PRINT DATA=MNS2(OBS=9) SPLIT='*' NOOBS;
  BY PATID;                      * print grouped by patid;

  ** name variables in order for printing **;
  VAR PSTATUS MATTOTAL NTOTAL;

  ** format patient status **;
  FORMAT PSTATUS PFMT.;

  ** define labels indicating where to split into two lines **;
  LABEL MATTOTAL='MATHEW*TOTAL SCORE'
        NTOTAL='NEUROLOGICAL*TOTAL SCORE'
        PSTATUS='PATIENT*STATUS';

  TITLE1 'LISTING OF NEUROLOGICAL TOTALS SCORES';
  TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
  TITLE3 'LISTED BY SUBJECT';
RUN;

Output follows. Note that no observation number is listed, due to the option NOOBS on the PROC PRINT statement.

LISTING OF NEUROLOGICAL TOTALS SCORES
FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES
LISTED BY SUBJECT

--------------------- PATIENT ID NUMBER = 24 ---------------------

PATIENT      MATHEW         NEUROLOGICAL
STATUS       TOTAL SCORE    TOTAL SCORE

PRE-OP           100            100
POST-OP           97             85
FOLLOW-UP         97             85

--------------------- PATIENT ID NUMBER = 28 ---------------------

PATIENT      MATHEW         NEUROLOGICAL
STATUS       TOTAL SCORE    TOTAL SCORE

PRE-OP            99            100
POST-OP           87             85
FOLLOW-UP         98             90

--------------------- PATIENT ID NUMBER = 60 ---------------------

PATIENT      MATHEW         NEUROLOGICAL
STATUS       TOTAL SCORE    TOTAL SCORE

PRE-OP           100            100
POST-OP          100            100
FOLLOW-UP        100            100

Recall - When the same variable is named in both an ID statement and a BY statement, the grouping variable is listed in the first column, and not repeated for subsequent observations.

Example - The following example uses a BY statement and an ID statement together, and groups the data by status rather than by patient.

** print the same data grouped by pstatus **;
PROC SORT DATA=MNS2;
  BY PSTATUS;
RUN;

PROC PRINT DATA=MNS2 SPLIT='*';    * define split char;
  BY PSTATUS;                      * print grouped by pstatus;
  ID PSTATUS;                      * ID the pstatus in first column;

  ** name variables in order for printing **;
  VAR PATID MATTOTAL NTOTAL;

  ** assign format for patient status **;
  FORMAT PSTATUS PFMT.;

  ** define labels indicating where to split into two lines **;
  LABEL PATID='PATIENT*ID NO.'
        MATTOTAL='MATHEW*TOTAL SCORE'
        NTOTAL='NEUROLOGICAL*TOTAL SCORE'
        PSTATUS='PATIENT*STATUS';

  TITLE1 'LISTING OF NEUROLOGICAL TOTALS SCORES';
  TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
  TITLE3 'LISTED BY STATUS, ID STATUS';
RUN;

LISTING OF NEUROLOGICAL TOTALS SCORES
FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES
LISTED BY STATUS, ID STATUS

PATIENT      PATIENT    MATHEW         NEUROLOGICAL
STATUS       ID NO.     TOTAL SCORE    TOTAL SCORE

PRE-OP         24           100            100
               28            99            100
               60           100            100
               65           100            100
               74            99            100

POST-OP        24            97             85
               28            87             85
               60           100            100
               65            98             90
               74            97             95

FOLLOW-UP      24            97             85
               28            98             90
               60           100            100
               65           100            100
               74           100            100

Note the variable listed in both the BY and ID statements is listed to the left, and written only once for each group.

• There are several more options to control printing, including line spacing (e.g., double or single), and printing subtotals and column totals.

• See the PRINT procedure in the SAS Procedures Guide or use the online HELP.
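
As a rough illustration of the extra printing options just mentioned, the sketch below uses the DOUBLE and N options on the PROC PRINT statement together with a SUM statement. The data set VISITS and its variables CLINIC, PATID and CHARGE are hypothetical; they are used here only because totals of the neurological scores would not be meaningful.

```sas
/* Hypothetical data: clinic visits with a charge per visit */
data visits;
  input clinic $ patid charge;
  datalines;
A 101 120
A 102 80
B 201 150
B 202 95
;
run;

/* BY-group printing still requires a prior sort */
proc sort data=visits;
  by clinic;
run;

/* DOUBLE double-spaces the listing; N reports the number of observations */
proc print data=visits double n noobs;
  by clinic;
  var patid charge;
  sum charge;          /* subtotal for each BY group plus a grand total */
  title1 'Charges by clinic with subtotals';
run;
```

Dropping DOUBLE returns the listing to single spacing; dropping the BY statement leaves only the column total.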

c. Introduction to the WHERE statement: How to Print a Subset of Observations

The WHERE statement instructs SAS to perform its task on a selected set of observations.

• WHERE is used to select which observations will be used in the procedure being performed.

• For example, if we are interested in listing only data for the pre-operative assessment, this can be done by creating a subset data file - using a data step to take a subset of data with PSTATUS=1, and then using this new data set in PROC PRINT.

• Alternatively, PROC PRINT can be used with a WHERE statement, to select the subset of interest, with the condition: WHERE PSTATUS=1;

• Note - In the following example, notice the difference in observation numbers that appear on the data listing (if the OBS column is printed).

* Part 1: create subset of data and print it *;
DATA MNPSTAT1;
  SET MNS2;
  IF PSTATUS=1;
RUN;

PROC PRINT DATA=MNPSTAT1 SPLIT='*';    * define split char, print subset;
  VAR PSTATUS PATID MATTOTAL NTOTAL;
  FORMAT PSTATUS PFMT.;
  LABEL PATID='PATIENT*ID NO.'
        MATTOTAL='MATHEW*TOTAL SCORE'
        NTOTAL='NEUROLOGICAL*TOTAL SCORE'
        PSTATUS='PATIENT*STATUS';
  TITLE1 'LISTING OF NEUROLOGICAL TOTALS SCORES';
  TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
  TITLE3 'PRE-OP SCORES ONLY - SUBSET DATA';
RUN;

* Part 2: print subset using WHERE statement *;
PROC PRINT DATA=MNS2 SPLIT='*';
  WHERE PSTATUS=1;
  VAR PSTATUS PATID MATTOTAL NTOTAL;
  FORMAT PSTATUS PFMT.;
  LABEL PATID='PATIENT*ID NO.'
        MATTOTAL='MATHEW*TOTAL SCORE'
        NTOTAL='NEUROLOGICAL*TOTAL SCORE'
        PSTATUS='PATIENT*STATUS';
  TITLE1 'LISTING OF NEUROLOGICAL TOTALS SCORES';
  TITLE2 'PRE-OP SCORES ONLY - USING WHERE STATEMENT';
RUN;

LISTING OF NEUROLOGICAL TOTALS SCORES
FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES
PRE-OP SCORES ONLY - SUBSET DATA

       PATIENT    PATIENT    MATHEW         NEUROLOGICAL
OBS    STATUS     ID NO.     TOTAL SCORE    TOTAL SCORE

  1    PRE-OP       24           100            100
  2    PRE-OP       28            99            100
  3    PRE-OP       60           100            100
  4    PRE-OP       65           100            100
  5    PRE-OP       74            99            100

----------------------------------------------------------------------

LISTING OF NEUROLOGICAL TOTALS SCORES
PRE-OP SCORES ONLY - USING WHERE STATEMENT

       PATIENT    PATIENT    MATHEW         NEUROLOGICAL
OBS    STATUS     ID NO.     TOTAL SCORE    TOTAL SCORE

  1    PRE-OP       24           100            100
  4    PRE-OP       28            99            100
  7    PRE-OP       60           100            100
 10    PRE-OP       65           100            100
 13    PRE-OP       74            99            100

Note that the observation numbers differ. In this case, the data file MNS2 had been previously sorted by patient id and patient status, so that the WHERE statement selected every third observation, corresponding to pre-op status.

2. How to Produce Summary Statistics

There are four procedures that provide basic descriptive statistics for continuous variables.

• These are MEANS, SUMMARY, UNIVARIATE, and TABULATE.

• The procedures differ in the choice of statistics that can be produced. They also differ in the formatting of results. The SAS Procedures Guide provides a detailed comparison of the statistics available in each procedure.

• MEANS, SUMMARY and UNIVARIATE can be used to create output datasets containing summary statistics that can be used in other procedures, such as PROCs PRINT and REPORT, which give many options for controlling the formatting of the data.

a. How to Use PROC MEANS and PROC SUMMARY

• PROCs MEANS and SUMMARY can be used to compute means, standard deviations and standard errors, minimums, maximums, range, quantiles, confidence intervals, number of missing values, plus several other statistics.

• CLASS and/or BY statements can be used to compute the statistics separately for subgroups of observations.

• For example, when computing statistics on subject AGE, for a data set previously sorted by the variable SEX, using the statement BY SEX; would provide separate statistics on AGE for males and females.

• CLASS statements produce separate statistics for subgroups, along with overall statistics for the whole group. For example, using the statement CLASS SEX; would produce statistics on AGE for all subjects, as well as for males and females separately.

• Note - Use of a CLASS statement does not require a prior sort of the data.
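
The contrast between CLASS and BY can be sketched as follows. The data set SUBJECTS and its values are hypothetical and serve only to make the example self-contained; note that with CLASS the overall row is most easily seen in an output data set (where it has _TYPE_=0), an aspect this sketch does not show.

```sas
/* Hypothetical data: SUBJECTS with SEX and AGE */
data subjects;
  input sex $ age;
  datalines;
F 34
M 51
F 47
M 62
F 58
;
run;

/* CLASS: subgroup statistics without sorting the data first */
proc means data=subjects n mean std maxdec=1;
  class sex;
  var age;
  title1 'AGE by SEX using a CLASS statement';
run;

/* BY: the data must be sorted by the BY variable before the procedure */
proc sort data=subjects;
  by sex;
run;

proc means data=subjects n mean std maxdec=1;
  by sex;
  var age;
  title1 'AGE by SEX using a BY statement';
run;
```

The two steps report the same subgroup statistics; the printed layout differs, with the BY version producing a separate block under a header line for each group.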

The primary difference between CLASS and BY statements is that the format for printing is different, when the statistics are printed.

• A BY statement requires previous sorting and produces no overall statistics.

The primary difference between PROC MEANS and PROC SUMMARY is in the defaults for printing.

• PROC MEANS, by default, provides results in the output window, although an output data set containing the summary statistics can be requested.

• PROC SUMMARY, by default, produces only an output data set, although results can be requested in the output window using the PRINT option.

• "How to" - Statistics are requested on the PROC statement, before the first semi-colon.

• Each of the procedures, MEANS and SUMMARY, produces a default set of statistics; however, you can request specific statistics.

• This gives you control of the order in which the statistics appear on the output.

• See the SAS Procedures Manual or the online documentation for details of available statistics.

Example - Producing summary statistics for two neurological assessment scales

• This example produces summary statistics on the neurologic assessment scales using PROC MEANS and PROC SUMMARY.

• Creation of an output data set, subsequently printed with PROC PRINT, is also illustrated.

• TIP - Use the MAXDEC option to control decimal places printed in the output! This is illustrated in the example. If you don't use the MAXDEC option, the default is 8 places after the decimal point - and no one should have to look at that much nonsense.

OPTIONS LINESIZE=78 PAGESIZE=60;
LIBNAME CPB 'C:\temp';

** create formats for patient status **;
PROC FORMAT;
  VALUE PFMT 1='PRE-OP' 2='POST-OP' 3='FOLLOW-UP';
RUN;

*************************************************************;
** get means of neurologic scores by patient status        **;
** do this with PROC MEANS and SUMMARY to show options     **;
*************************************************************;

** sort by pstatus **;
PROC SORT DATA=MNS2;
  BY PSTATUS;
RUN;

* name statistics, allow 2 dec places for printing *;
PROC MEANS DATA=MNS2 N MEAN STD STDERR MIN MAX MAXDEC=2;
  BY PSTATUS;                  * compute separately for each period;

  ** name variables to compute statistics for **;
  VAR MATTOTAL NTOTAL;

  ** define output data set, variable names for statistics **;
  OUTPUT OUT=MNMEANS MEAN=MEANMAT MEANN
                     STD=STDMAT STDN
                     STDERR=SE_MAT SE_N
                     MIN=MINMAT MINN
                     MAX=MAXMAT MAXN;

  ** format patient status **;
  FORMAT PSTATUS PFMT.;

  TITLE1 'SUMMARY STATISTICS';
  TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
  TITLE3 'USING PROC MEANS WITH A BY STATEMENT';
RUN;

PROC PRINT DATA=MNMEANS;
  ID PSTATUS;
  FORMAT PSTATUS PFMT.;
  TITLE1 'OUTPUT DATA SET FROM MEANS PROCEDURE WITH BY STATEMENT';
RUN;

Following is the output.

SUMMARY STATISTICS
FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES
USING PROC MEANS WITH A BY STATEMENT

--------------------- STATUS PRE, POST OR FOLLOW-UP=PRE-OP ---------------------

N Obs  Variable  Label                  N   Minimum   Maximum    Mean   Std Dev   Std Error
-------------------------------------------------------------------------------------------
   32  MATTOTAL  MATHEW TOTAL SCORE    32     95.00    100.00   99.03      1.37        0.24
       NTOTAL    NEUROLOGICAL TOTAL    32     90.00    100.00   98.43      3.22        0.57
-------------------------------------------------------------------------------------------

--------------------- STATUS PRE, POST OR FOLLOW-UP=POST-OP --------------------

N Obs  Variable  Label                  N   Minimum   Maximum    Mean   Std Dev   Std Error
-------------------------------------------------------------------------------------------
   31  MATTOTAL  MATHEW TOTAL SCORE    31     73.00    100.00   96.48      5.73        1.03
       NTOTAL    NEUROLOGICAL TOTAL    31     80.00    100.00   94.67      6.31        1.13
-------------------------------------------------------------------------------------------

--------------------- STATUS PRE, POST OR FOLLOW-UP=FOLLOW-UP ------------------

N Obs  Variable  Label                  N   Minimum   Maximum    Mean   Std Dev   Std Error
-------------------------------------------------------------------------------------------
   28  MATTOTAL  MATHEW TOTAL SCORE    28     94.00    100.00   98.39      1.39        0.26
       NTOTAL    NEUROLOGICAL TOTAL    28     85.00    100.00   96.42      4.48        0.84
-------------------------------------------------------------------------------------------

OUTPUT DATA SET FROM MEANS PROCEDURE WITH BY STATEMENT

PSTATUS    _TYPE_ _FREQ_ MEANMAT MEANN STDMAT STDN SE_MAT SE_N MINMAT MINN MAXMAT MAXN

PRE-OP        0     32    99.03  98.43  1.37  3.22  0.24  0.56   95    90   100   100
POST-OP       0     31    96.48  94.67  5.73  6.31  1.03  1.13   73    80   100   100
FOLLOW-UP     0     28    98.39  96.42  1.39  4.48  0.26  0.84   94    85   100   100

************************************************************;
** repeat, using PROC SUMMARY with a class statement      **;
************************************************************;

* name input data set, & print results to output window *;
PROC SUMMARY DATA=MNS2 PRINT MAXDEC=2 N MEAN STD STDERR MIN MAX;   /* name statistics*/
  CLASS PSTATUS;               * compute separately for each period and overall;

  ** name variables to compute statistics for **;
  VAR MATTOTAL NTOTAL;

  ** create output for using in subsequent step, giving name **;
  ** for each statistic for each variable **;
  OUTPUT OUT=MNSUMM MEAN=MEANMAT MEANN
                    STD=STDMAT STDN
                    STDERR=SE_MAT SE_N
                    MIN=MINMAT MINN
                    MAX=MAXMAT MAXN;

  ** format patient status **;
  FORMAT PSTATUS PFMT.;

  TITLE1 'SUMMARY STATISTICS';
  TITLE2 'FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES';
  TITLE3 'USING SUMMARY WITH A CLASS STATEMENT';
RUN;

PROC PRINT DATA=MNSUMM;
  ID PSTATUS;
  FORMAT PSTATUS PFMT.;
RUN;

Following is the output.

SUMMARY STATISTICS
FOR MATHEW AND NEUROLOGIC STANDARD ASSESSMENT SCALES
USING SUMMARY WITH A CLASS STATEMENT

PSTATUS    _TYPE_ _FREQ_ MEANMAT MEANN STDMAT STDN SE_MAT SE_N MINMAT MINN MAXMAT MAXN

              0     91    97.96  96.53  2.48  4.78  0.26  0.50   73    80   100   100
PRE-OP        1     32    99.03  98.43  1.37  3.22  0.24  0.56   95    90   100   100
POST-OP       1     31    96.48  94.67  5.73  6.31  1.03  1.13   73    80   100   100
FOLLOW        1     28    98.39  96.42  1.39  4.48  0.26  0.84   94    85   100   100

• MEANS produces results to the output window, by default, and this is shown, along with a printing of the requested output data set.

• SUMMARY produces overall statistics for all observations, in addition to subgroup statistics, when the CLASS statement is used.

• In this example, the overall statistics are not particularly meaningful, so the advantages of using a CLASS statement are not important.

• Explanation of the _TYPE_ Variable - The _TYPE_ variable is a SAS produced variable. It is produced by both MEANS and SUMMARY when an output dataset is requested. _TYPE_ identifies which CLASS variables define the subgroup for each row of the output data set. 0 indicates overall statistics; with a single CLASS variable, 1 indicates statistics for each level of that variable. When several variables are used in a CLASS statement, such as CLASS SEX PSTATUS, each combination of CLASS variables gets its own _TYPE_ value: statistics for all pre-op, all post-op, etc. are produced with _TYPE_=1; statistics for all females and for all males are produced with _TYPE_=2; and the 2 level breakdown (females pre-op / females post-op / females follow-up, etc.) is produced with _TYPE_=3.

• Again, many options are available for controlling the ways in which variable groups and subgroups are defined, which makes these procedures very powerful for summarizing data.
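
A small sketch may help in reading _TYPE_ when two CLASS variables are used. The data set SUBJECTS2, its values, and the output names TYPE_DEMO, N_SCORE and MEAN_SCORE are hypothetical. In the printed data set, rows with _TYPE_=0 are the overall statistics, _TYPE_=1 rows are broken down by PSTATUS only (the last CLASS variable), _TYPE_=2 rows by SEX only, and _TYPE_=3 rows by SEX and PSTATUS together.

```sas
/* Hypothetical data: SEX, PSTATUS (1=pre-op, 2=post-op) and SCORE */
data subjects2;
  input sex $ pstatus score;
  datalines;
F 1 100
F 2 95
M 1 98
M 2 90
F 1 99
M 2 85
;
run;

/* PROC SUMMARY creates only the output data set unless PRINT is requested */
proc summary data=subjects2;
  class sex pstatus;
  var score;
  output out=type_demo n=n_score mean=mean_score;
run;

/* Listing _TYPE_ first makes the level of breakdown easy to read */
proc print data=type_demo noobs;
  var _type_ sex pstatus _freq_ n_score mean_score;
  title1 'Output data set showing the _TYPE_ variable';
run;
```

A WHERE statement such as WHERE _TYPE_=3; on the PROC PRINT step is one way to list only the fully cross-classified rows.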

b. How to Use PROC UNIVARIATE

PROC UNIVARIATE also produces descriptive statistics for continuous numeric variables - along with greater detail (including selected graphical descriptions) on the distribution of the variables.

• UNIVARIATE can be used to produce percentiles, stem-and-leaf and box-and-whisker plots, along with a normal probability plot. There are several options for computing percentiles.

• Tests for normality of the distribution of the data are also available.

• The five smallest and five largest values can also be identified by an ID variable - which is useful when identifying cases with outliers. The number and percent missing values for a variable are also reported.

• UNIVARIATE can also be used with a BY statement, for previously sorted data, to produce separate statistics for each group.

• Tip - Use PROC UNIVARIATE with a BY statement to get side-by-side box-and-whisker plots; these allow you to compare visually the distribution of groups on a variable of interest.

• Along with producing output, PROC UNIVARIATE can be used to produce an output data set. Any set of percentile values can be requested for inclusion in the output data set, such as the 10th and 90th percentiles (or any other percentile you desire), along with the default set of output variables.

• WARNING !!! PROC UNIVARIATE can take a lot of time, and produce tons of pages of output. This is especially true if you are not careful in defining your variable list, or you use a BY variable that has many groups.

• WARNING or TIP ??? (you decide) PROC UNIVARIATE will produce statistics for the group (or groups) defined by missing values for the BY variable. This is because missing values define a group.
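
Requesting specific percentiles in an output data set might look like the sketch below. The data set AGES and its values are hypothetical, as are the output names AGE_PCTLS, N_AGE, MEAN_AGE and MEDIAN_AGE. PCTLPTS= lists the percentiles wanted and PCTLPRE= supplies the prefix for their variable names, so this step should create AGE_P10 and AGE_P90.

```sas
/* Hypothetical data: one AGE value per subject */
data ages;
  input counter age;
  datalines;
1 21
2 35
3 47
4 52
5 60
6 62
7 70
8 72
9 85
10 93
;
run;

/* NOPRINT suppresses the many pages of printed output */
proc univariate data=ages noprint;
  var age;
  output out=age_pctls n=n_age mean=mean_age median=median_age
         pctlpts=10 90 pctlpre=age_p;
run;

proc print data=age_pctls noobs;
  title1 'Selected statistics and percentiles for AGE';
run;
```

Using NOPRINT here is a design choice that addresses the warning about output volume: only the one-row output data set is listed.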

Example - The example that follows uses data from a study of peri-operative beta blocker use in surgical patients.

***********************************************************;
** Example using Proc Univariate                         **;
***********************************************************;

libname bb 'C:\bblocker';
options pagesize=55 linesize=78 nocenter nodate nonumber;

proc univariate plot normal data=bb.bblock1;
  var age;
  id counter;
  title1 'Univariate statistics on Age';
run;

** summary stats separately by gender **;
proc sort data=bb.bblock1;
  by gender;
run;

proc univariate plot data=bb.bblock1;
  by gender;
  var age;
  id counter;
  title1 'Grouping by Patient Gender';
run;

The following output is produced.

Univariate statistics on Age

The UNIVARIATE Procedure
Variable: AGE (AGE)

                          Moments

N                       158    Sum Weights             158
Mean             60.2088608    Sum Observations       9513
Std Deviation    15.6742532    Variance         245.682214
Skewness         -0.1982307    Kurtosis         -0.7009588
Uncorrected SS       611339    Corrected SS     38572.1076
Coeff Variation  26.0331337    Std Error Mean   1.24697663

              Basic Statistical Measures

    Location                    Variability

Mean     60.20886     Std Deviation            15.67425
Median   62.00000     Variance                245.68221
Mode     70.00000     Range                    72.00000
                      Interquartile Range      25.00000

NOTE: The mode displayed is the smallest of 2 modes with a count of 6.

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  48.28387    Pr > |t|    <.0001
Sign           M        79    Pr >= |M|   <.0001
Signed Rank    S    6280.5    Pr >= |S|   <.0001

                Tests for Normality

Test                  --Statistic---    -----p Value------

Shapiro-Wilk          W     0.980877    Pr < W      0.0275
Kolmogorov-Smirnov    D     0.085311    Pr > D     <0.0100
Cramer-von Mises      W-Sq  0.205514    Pr > W-Sq  <0.0050
Anderson-Darling      A-Sq  1.096303    Pr > A-Sq   0.0073

Quantiles (Definition 5)

Quantile      Estimate

100% Max            93
99%                 92
95%                 85
90%                 79
75% Q3              72
50% Median          62
25% Q1              47
10%                 40
5%                  35
1%                  25
0% Min              21

               Extreme Observations

----------Lowest---------    ---------Highest---------

Value   COUNTER     Obs      Value   COUNTER     Obs

   21         1       1         87        18      18
   25        84      84         87        27      27
   25         8       8         88       102     102
   27       109     109         92        65      65
   31        75      75         93         9       9

 Stem Leaf                      #  Boxplot
    9 23                        2     |
    8 5566778                   7     |
    8 1114                      4     |
    7 55666778888899999        17     |
    7 00000011122222333333444  23  +-----+
    6 5556667777788899999      19  |     |
    6 0001122233344            13  *--+--*
    5 555566666778899          15  |     |
    5 000111112244             12  |     |
    4 5556667777889999         16  +-----+
    4 011122223333444          15     |
    3 56778999                  8     |
    3 112                       3     |
    2 557                       3     |
    2 1                         1     |
      ----+----+----+----+---
  Multiply Stem.Leaf by 10**+1

Variable: AGE (AGE)

[Normal Probability Plot: character plot of AGE, vertical axis from 22.5 to 92.5, horizontal axis from -2 to +2; not reproduced here]

When a BY statement is used, the same output is generated separately for each group defined by the BY variable.

• In addition, a side-by-side Box-and-Whisker plot to compare all groups is produced, as in the example below.

• Note - The separate statistics for the 2 gender groups are not shown in the output that follows, as it is too many pages.

Grouping by Patient Gender

The UNIVARIATE Procedure
Schematic Plots

[Schematic plot: side-by-side box-and-whisker plots of AGE for GENDER = 1 and GENDER = 2, vertical axis from 20 to 100; not reproduced here]

c. How to Use PROC TABULATE

PROC TABULATE is initially confusing to use but bear with it; it is a powerful tool for producing nicely formatted tables of descriptive statistics for groups and subgroups of classification variables. There is a higher start up learning time for using PROC TABULATE than the other procedures that produce summary statistics, but it's worth it!!

• The reason is - once you've survived the learning time, you are later spared the task of copying numbers, or repetitive cut and paste from crudely formatted output into tables for a report.

• In particular, while all of the statistics available in TABULATE (plus more) can be produced in other procedures, TABULATE provides tremendous flexibility in the formulation of tables.

• Counts and percentages for categorical variables can also be reported using PROC TABULATE.

• Moreover, PROC TABULATE can be used to produce formatted tables that can be incorporated into a report directly, or with only minor modification. This is especially true for summary reports that are produced at regular intervals throughout a study.

• TIP - PROC TABULATE can also be used to print tables of results from other procedures (such as regression analyses, correlations, p-values, etc)! This is possible when these results have been stored in output data sets.

PROC TABULATE requires the specification of CLASS (categorical) variables used to form groups and subgroups, and continuous numeric analysis variables (identified on a VAR statement) for which statistics are produced.

• A TABLE statement is used to define the rows, columns, and pages of a TABLE, along with the statistics to be produced, and the format for printing them.

• LABEL and FORMAT statements can be used to provide more descriptive information for variable names and values.

• KEYLABEL can be used to provide more descriptive row and column titles for the statistics requested. For example, in the place of N as a column heading for the number of observations, a KEYLABEL statement would allow you to use the phrase "NO. OF OBS".

• PROC TABULATE is described in the SAS Procedures Guide. In addition, there is a separate manual devoted entirely to PROC TABULATE. Description of the options is also available in the online documentation.

Following are a few examples of PROC TABULATE. They illustrate some of the different ways of printing the same summary statistics for a data set.

Example -

• The data for this example come from a study of functional status outcome six months post-cardiac catheterization procedure.

• Patients completed a functional status questionnaire pre-operatively, and again, 6 months post-operatively.

Functional status was assessed using 2 scores: physical functioning and mental functioning.
• Of interest was a comparison of diabetic cardiac catheterization patients versus non-diabetic cardiac catheterization patients with respect to their change over time in functional status.
• In the tables that follow, summary statistics are printed for change in physical and mental function scores. These summary statistics are reported for (1) diabetic patients, (2) non-diabetic patients, (3) groups defined by age group, and (4) the entire study cohort.
• The examples that follow serve to illustrate the control in table formatting that is available.

TIP - While it is possible to name and create several tables in a single PROC TABULATE procedure (before the RUN; statement), it is recommended that you request each table in its own PROC TABULATE step; this has to do with the TITLE statements.
• Only one set of titles can be specified for a procedure. To produce new titles for a new table, the PROC TABULATE, CLASS and VAR statements must be repeated.

Guidelines for specification of the TABLE statement in PROC TABULATE.
• Table rows - statement factors listed before a comma (,) define the table rows.
• Table columns - statement factors listed after the comma (,) define the table columns.
• An asterisk is used to separate sub-grouping within rows or columns.
• PROC TABULATE does not produce a default set of descriptive statistics. If no statistic is named, only N (for CLASS variables) or SUM (for VAR variables) is reported, so statistics such as MEAN or STD must be requested explicitly.

• The keyword ALL is used to get overall statistics in addition to subgroup statistics (e.g., DIAB ALL will produce statistics for diabetics, non-diabetics and overall).
• How to format the printing of a statistic - To accomplish formatting, the statistic name is followed by a formatting definition.
  (1) For example, MEAN*F=8.2 requests that the mean be printed using 8 columns (including 1 for the decimal place) with 2 of the 8 after the decimal place (i.e., #####.##).
  (2) Alternatively, a single format can be defined for printing all statistics, by using the phrase FORMAT=8.2 on the PROC TABULATE statement (see final example).

* Example 1: Proc Tabulate *;
* Statistics in columns, rows defined by class variables *;
PROC TABULATE DATA=FD2 ;
  * define grouping variables *;
  CLASS DIAB AGEGROUP ;
  * define analysis variables *;
  VAR PF2_1 MF2_1;
  * define table with rows for diab status, *;
  * and sub-rows for age group *;
  TABLES (PF2_1) * ((DIAB ALL)*(AGEGROUP ALL)) ,
         (N*F=5. MEAN*F=8.1 STD*F=8.2 MIN*F=8. MAX*F=8.);
  * assign formats and labels to variables *;
  FORMAT DIAB dfmt.;
  LABEL PF2_1='PF2_1: Physical Function Change'
        MF2_1='MF2_1: Mental Function Change';
  TITLE1 'SUMMARY STATISTICS FOR CHANGE SCORES';
  TITLE2 'Example 1: Age Groups within Diabetic Status';
  TITLE3 ' Using default column headings';
RUN;
* repeat for mental function change, using KEYLABEL *;

* separate procedure used to illustrate options, change titles *;
PROC TABULATE DATA=FD2 ;
  CLASS DIAB AGEGROUP ;
  VAR PF2_1 MF2_1;
  TABLES (MF2_1) * ((DIAB ALL)*(AGEGROUP ALL)) ,
         (N*F=5. MEAN*F=8.1 STD*F=8.2 MIN*F=8. MAX*F=8.);
  FORMAT DIAB dfmt.;
  LABEL PF2_1='PF2_1: Physical Function Change'
        MF2_1='MF2_1: Mental Function Change';
  KEYLABEL ALL=TOTAL N='# of OBS';
  TITLE1 'SUMMARY STATISTICS FOR CHANGE SCORES';
  TITLE2 'Example 1: Age Groups within Diabetic Status';
  TITLE3 ' using KEYLABEL to rename col and row headings';
RUN;

The following output is produced.

SUMMARY STATISTICS FOR CHANGE SCORES
Example 1: Age Groups within Diabetic Status
Using default column headings

+-----------------+-------------+----------+-----+--------+--------+--------+--------+
|                 |DIAB         |AGEGROUP  |    N|    Mean|     Std|     Min|     Max|
+-----------------+-------------+----------+-----+--------+--------+--------+--------+
|PF2_1: Physical  |Nondiabetic  |<65       |  345|     5.7|   13.03|     -29|      41|
|Function Change  |             |>=65      |  287|     2.2|   11.57|     -28|      36|
|                 |             |All       |  632|     4.1|   12.50|     -29|      41|
|                 +-------------+----------+-----+--------+--------+--------+--------+
|                 |Diabetic     |<65       |   79|     1.6|   12.16|     -28|      37|
|                 |             |>=65      |   98|     0.7|   11.85|     -28|      36|
|                 |             |All       |  177|     1.1|   11.96|     -28|      37|
|                 +-------------+----------+-----+--------+--------+--------+--------+
|                 |All          |<65       |  424|     4.9|   12.95|     -29|      41|
|                 |             |>=65      |  385|     1.8|   11.64|     -28|      36|
|                 |             |All       |  809|     3.4|   12.44|     -29|      41|
+-----------------+-------------+----------+-----+--------+--------+--------+--------+

SUMMARY STATISTICS FOR CHANGE SCORES
Example 1: Age Groups within Diabetic Status
using KEYLABEL to rename col and row headings

+-----------------+-------------+----------+-----+--------+--------+--------+--------+
|                 |DIAB         |AGEGROUP  |# of |        |        |        |        |
|                 |             |          | OBS |    Mean|     Std|     Min|     Max|
+-----------------+-------------+----------+-----+--------+--------+--------+--------+
|MF2_1: Mental    |Nondiabetic  |<65       |  345|    -0.2|   11.48|     -52|      32|
|Function Change  |             |>=65      |  287|     0.5|    9.90|     -30|      28|
|                 |             |TOTAL     |  632|     0.1|   10.79|     -52|      32|
|                 +-------------+----------+-----+--------+--------+--------+--------+
|                 |Diabetic     |<65       |   79|    -0.2|   10.17|     -26|      20|
|                 |             |>=65      |   98|     0.6|    8.85|     -18|      21|
|                 |             |TOTAL     |  177|     0.2|    9.45|     -26|      21|
|                 +-------------+----------+-----+--------+--------+--------+--------+
|                 |TOTAL        |<65       |  424|    -0.2|   11.23|     -52|      32|
|                 |             |>=65      |  385|     0.6|    9.64|     -30|      28|
|                 |             |TOTAL     |  809|     0.1|   10.50|     -52|      32|
+-----------------+-------------+----------+-----+--------+--------+--------+--------+

* Example 2: Proc Tabulate *;
* Statistics in rows, groups and subgroups in columns *;
PROC TABULATE DATA=FD2 ;
  * define grouping variables *;
  CLASS DIAB AGEGROUP ;
  * define analysis variables *;
  VAR PF2_1 MF2_1;
  * define table with rows for change scores *;
  * and statistics and columns for groups, *;
  * and sub-groups *;
  TABLES (PF2_1 MF2_1) * (N*F=5. MEAN*F=8.1 STD*F=8.2 MIN*F=8. MAX*F=8.) ,
         ((DIAB ALL)*(AGEGROUP ALL)) ;
  * assign formats and labels to variables *;
  * and keylabels to headers *;
  FORMAT DIAB dfmt.;
  LABEL PF2_1='PF2_1: Physical Function Change'
        MF2_1='MF2_1: Mental Function Change';
  KEYLABEL ALL=TOTAL N='# of OBS';
  TITLE1 'SUMMARY STATISTICS FOR CHANGE SCORES';
  TITLE2 'Example 2: Subgroups in Columns';
RUN;

SUMMARY STATISTICS FOR CHANGE SCORES
Example 2: statistics in rows and Subgroups in Columns

+-----------------+----------+--------------------------+--------------------------+--------------------------+
|                 |DIAB      |       Nondiabetic        |         Diabetic         |          TOTAL           |
|                 +----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|                 |AGEGROUP  |     <65|    >=65|   TOTAL|     <65|    >=65|   TOTAL|     <65|    >=65|   TOTAL|
+-----------------+----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|PF2_1: Physical  |# of OBS  |     345|     287|     632|      79|      98|     177|     424|     385|     809|
|Function Change  |Mean      |     5.7|     2.2|     4.1|     1.6|     0.7|     1.1|     4.9|     1.8|     3.4|
|                 |Std       |   13.03|   11.57|   12.50|   12.16|   11.85|   11.96|   12.95|   11.64|   12.44|
|                 |Min       |     -29|     -28|     -29|     -28|     -28|     -28|     -29|     -28|     -29|
|                 |Max       |      41|      36|      41|      37|      36|      37|      41|      36|      41|
+-----------------+----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|MF2_1: Mental    |# of OBS  |     345|     287|     632|      79|      98|     177|     424|     385|     809|
|Function Change  |Mean      |    -0.2|     0.5|     0.1|    -0.2|     0.6|     0.2|    -0.2|     0.6|     0.1|
|                 |Std       |   11.48|    9.90|   10.79|   10.17|    8.85|    9.45|   11.23|    9.64|   10.50|
|                 |Min       |     -52|     -30|     -52|     -26|     -18|     -26|     -52|     -30|     -52|
|                 |Max       |      32|      28|      32|      20|      21|      21|      32|      28|      32|
+-----------------+----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+

* Example 3: Proc Tabulate *;
* Analysis variables and 1st group in rows, 2nd group and statistics in columns *;
options ls=175;
* reset linesize option so table will fit across page *;
PROC TABULATE DATA=FD2 ;
  CLASS DIAB AGEGROUP ;
  VAR PF2_1 MF2_1;
  TABLES (PF2_1 MF2_1) * (DIAB ALL) ,
         (AGEGROUP ALL) * (N*F=5. MEAN*F=8.1 STD*F=8.2 MIN*F=8. MAX*F=8.);
  FORMAT DIAB dfmt.;
  LABEL PF2_1='PF2_1: Physical Function Change'
        MF2_1='MF2_1: Mental Function Change';
  KEYLABEL ALL=TOTAL N='# of OBS';
  TITLE1 'SUMMARY STATISTICS FOR CHANGE SCORES';
  TITLE2 'Example 3: Cross-classifying Diab (row) and Age Groups (Col)';
RUN;

SUMMARY STATISTICS FOR CHANGE SCORES
Example 3: Cross-classifying Diab (row) and Age Groups (Col)

+-----------------+-------------+--------------------------------+--------------------------------+--------------------------------+
|                 |AGEGROUP     |              <65               |              >=65              |             TOTAL              |
|                 |             +-----+------+-------+-----+-----+-----+------+-------+-----+-----+-----+------+-------+-----+-----+
|                 |             |# of |      |       |     |     |# of |      |       |     |     |# of |      |       |     |     |
|                 |DIAB         | OBS |  Mean|    Std|  Min|  Max| OBS |  Mean|    Std|  Min|  Max| OBS |  Mean|    Std|  Min|  Max|
+-----------------+-------------+-----+------+-------+-----+-----+-----+------+-------+-----+-----+-----+------+-------+-----+-----+
|PF2_1: Physical  |Nondiabetic  |  345|   5.7|  13.03|  -29|   41|  287|   2.2|  11.57|  -28|   36|  632|   4.1|  12.50|  -29|   41|
|Function Change  |Diabetic     |   79|   1.6|  12.16|  -28|   37|   98|   0.7|  11.85|  -28|   36|  177|   1.1|  11.96|  -28|   37|
|                 |TOTAL        |  424|   4.9|  12.95|  -29|   41|  385|   1.8|  11.64|  -28|   36|  809|   3.4|  12.44|  -29|   41|
+-----------------+-------------+-----+------+-------+-----+-----+-----+------+-------+-----+-----+-----+------+-------+-----+-----+
|MF2_1: Mental    |Nondiabetic  |  345|  -0.2|  11.48|  -52|   32|  287|   0.5|   9.90|  -30|   28|  632|   0.1|  10.79|  -52|   32|
|Function Change  |Diabetic     |   79|  -0.2|  10.17|  -26|   20|   98|   0.6|   8.85|  -18|   21|  177|   0.2|   9.45|  -26|   21|
|                 |TOTAL        |  424|  -0.2|  11.23|  -52|   32|  385|   0.6|   9.64|  -30|   28|  809|   0.1|  10.50|  -52|   32|
+-----------------+-------------+-----+------+-------+-----+-----+-----+------+-------+-----+-----+-----+------+-------+-----+-----+

Additional options in TABULATE let you define if and where you want lines in the tables.
• For example, you may choose not to have vertical or horizontal separators. This is useful for producing tables for publication, since some journals ask that tables be presented without lines.

The next example illustrates reporting counts (N) and percentages (PCTN) using PROC TABULATE.
• Note that all the variables are categorical, or CLASS variables.
• In this example a single format is defined for the whole table, on the PROC line, rather than a different format for each statistic. This works here, because counts will always be whole numbers, and I'm content to report percents rounded to a whole percent.
• The KEYLABEL statement is used to replace the word 'ALL' on the output with the word 'TOTAL'.
• How to get column percents - Column percents are defined by listing the ROW variables within < > after the keyword PCTN.
• How to get row percents - To get row percents, the COLUMN variable (DIAB in this case) would be listed (e.g., PCTN<DIAB ALL>) instead of the row variables.

** Example 4: Counts and Percentages **;
** Using only CLASS (categorical) variables **;
PROC TABULATE DATA=FD2 FORMAT=8.0;
  CLASS DIAB SEX RACE2 AGEGROUP ;
  ** Rows: sex, race, age group **;
  ** Cols: diabetic status **;
  ** column percents reported (row variables listed in the PCTN denominator) **;
  TABLES (SEX RACE2 AGEGROUP ALL),
         (DIAB ALL)*(N PCTN<SEX RACE2 AGEGROUP ALL>='PERCENT') / RTS=18;
  KEYLABEL ALL=TOTAL;
  FORMAT SEX $SEXF. RACE2 RACEF. DIAB DIABF.;
  TITLE1 'DESCRIPTIVE TABLE FOR DEMOGRAPHICS';
  TITLE2 'Example 4: Counts and Percentages ';
RUN;

DESCRIPTIVE TABLE FOR DEMOGRAPHICS
Example 4: Counts and Percentages

+----------------+-----------------------------------+-----------------+
|                |          DIABETIC*STATUS          |                 |
|                +-----------------+-----------------+                 |
|                |   Nondiabetic   |    Diabetic     |      TOTAL      |
|                +--------+--------+--------+--------+--------+--------+
|                |       N| PERCENT|       N| PERCENT|       N| PERCENT|
+----------------+--------+--------+--------+--------+--------+--------+
|SEX             |        |        |        |        |        |        |
|Female          |     837|      31|     400|      41|    1237|      34|
|Male            |    1875|      69|     576|      59|    2451|      66|
+----------------+--------+--------+--------+--------+--------+--------+
|RACE2           |        |        |        |        |        |        |
|Caucasian       |    2589|      95|     844|      86|    3433|      93|
|Other           |     123|       5|     132|      14|     255|       7|
+----------------+--------+--------+--------+--------+--------+--------+
|AGEGROUP        |        |        |        |        |        |        |
|<65             |    1566|      58|     496|      51|    2062|      56|
|>=65            |    1146|      42|     480|      49|    1626|      44|
+----------------+--------+--------+--------+--------+--------+--------+
|TOTAL           |    2712|     100|     976|     100|    3688|     100|
+----------------+--------+--------+--------+--------+--------+--------+
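
The row-percent variant described before Example 4 can be sketched as follows. This is only an illustration, not part of the original handout: it assumes the FD2 data set and the $SEXF., RACEF. and DIABF. formats used in Example 4 already exist, the TITLE2 text and the 'ROW PCT' heading are made up, and NOSEPS is added to show one of the options that removes horizontal separator lines from listing output.

```sas
* Sketch only - assumes FD2 and the Example 4 formats are available;
PROC TABULATE DATA=FD2 FORMAT=8.0 NOSEPS;
  CLASS DIAB SEX RACE2 AGEGROUP;
  * The denominator names the COLUMN variable, so percents add to 100 across each row;
  TABLES (SEX RACE2 AGEGROUP ALL),
         (DIAB ALL)*(N PCTN<DIAB ALL>='ROW PCT') / RTS=18;
  KEYLABEL ALL=TOTAL;
  FORMAT SEX $SEXF. RACE2 RACEF. DIAB DIABF.;
  TITLE1 'DESCRIPTIVE TABLE FOR DEMOGRAPHICS';
  TITLE2 'Sketch: Counts and Row Percentages';
RUN;
```

With the counts shown in Example 4, the Female row would be expected to report roughly 68 percent nondiabetic (837 of 1237) and 32 percent diabetic (400 of 1237), with 100 in the TOTAL column.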

3. How to Produce Frequency Tables and Cross-Tabulations (PROC FREQ)

PROC FREQ produces frequency table summaries of the distributions of discrete numeric or character variables.
• Counts and percentages are produced for each group defined by variable values, or by the crossing of variable values.
• PROC FREQ can produce cross-tabulations, as well as one-way frequency listings.
• Chi-square tests and other measures of association can also be produced by PROC FREQ (these will not be discussed in this course).
• PROC FREQ can also be used to produce an output data set in addition to, or in place of, results in the output window.

A TABLES statement is used to define the frequency tables.
• To produce one-way tables, simply list the variable(s) on the TABLES statement separated by spaces.
• To get cross-tabulations, name the variables separated by an asterisk (i.e., VAR1 * VAR2).
  To define rows - The first named variable (var1) defines the rows.
  To define columns - The second named variable (var2) defines the columns.
• You can list any combination of individual variables, cross-tabulations and multi-way tables on a single TABLES statement. Multiple TABLES statements can be given in the same procedure.
• TIP - Only one set of titles may be given for all the tables requested in a PROC FREQ. Thus, titles may be the limiting factor in the number of tables requested in a single procedure.
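
As a small illustration of saving PROC FREQ results to an output data set instead of the output window, the sketch below uses SASHELP.CLASS, a sample data set shipped with SAS, rather than the course data. The data set name SEXAGE_COUNTS is hypothetical.

```sas
* Sketch only - SASHELP.CLASS stands in for the course data;
proc freq data=sashelp.class noprint;        /* NOPRINT suppresses the printed tables */
  tables sex*age / out=sexage_counts;        /* counts and percents saved to a data set */
run;

* The saved counts can now be printed or passed to another procedure;
proc print data=sexage_counts;
  title1 'Counts saved by PROC FREQ';
run;
title1;
```

The output data set holds one observation per combination of SEX and AGE, with the variables COUNT and PERCENT added by the procedure.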

There are shorthand methods for listing variables in the TABLES statement.
• If a set of variables has the same prefix with sequential numbering, name the first and last variable, separated by a single hyphen, no spaces. For example, the statement TABLES score1-score10; would produce frequency tables for all ten variables, score1 through score10, as long as they are all numeric or all character.
• For variables without a common prefix, list the first and last in order as they appear in a data set (use PROC CONTENTS POSITION to know correct order), separated by a double hyphen, no spaces. For example, the line TABLES age--insur; would produce tables for age, insur, and all variables in order between them in the data set. All the variables listed in this manner must be of the same type, character or numeric.

There are also shorthand ways of listing cross-tabulations.
• For example, if you want cross-tabulations of sex by a whole set of variables use TABLES SEX * (firstvar--lastvar);
• WARNING!! Be careful requesting a group of variables crossed by another group, i.e. listing (var1--varn) * (varlist1--varlistn). You may end up with a lot more tables than you bargained for!
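
The shorthand lists above can be tried on a tiny made-up data set. Everything in this sketch (the SCORES data set, its variables and values) is hypothetical and exists only to show the three forms of list.

```sas
* Sketch only - hypothetical data with numbered and unnumbered variables;
data scores;
  input sex $ age insur score1-score3;
  datalines;
F 34 1 2 3 1
M 51 2 1 1 3
F 47 1 3 2 2
;
run;

proc freq data=scores;
  tables score1-score3;        /* numbered range: score1, score2, score3          */
  tables age--score1;          /* name range: age, insur, score1 in stored order  */
  tables sex*(age--insur);     /* sex crossed with each variable in the range     */
run;
```

The last TABLES statement requests two cross-tabulations (sex by age and sex by insur), which shows how quickly the number of tables grows when a list is crossed with another variable or list.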

• For multi-way tables (i.e., A*B*C), the first variable defines the table, the second the rows, and the third the columns. For example, the statement TABLES SEX * AGEGR * HSMOKE; would produce two cross-tabulations, one table for males and one for females, of age group against smoking status (light, moderate, heavy).
• An alternative way to achieve the result would be to request tables of AGEGR * HSMOKE, using a separate BY SEX statement. This requires that the data set be previously sorted by sex, and would produce different statistics, should you be doing analyses as well as summary tables (e.g., different chi-square tests).

Missing values are NOT included in frequency tables unless the options MISSPRINT or MISSING are used.
• Options are listed in the TABLES statement and appear after a slash (/).
• If neither option is specified (default)
  - Missing values are not included in the table AND
  - Missing values are not used in computing percentages.
• If you use the MISSPRINT option
  - Missing values will be included in the table, BUT
  - Missing values will not be used in computing percentages.
• If you use the MISSING option
  - Missing values will be included in the table AND
  - Missing values will be used in computing percentages.
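
Returning to the multi-way table and its BY-statement alternative, the two requests can be compared side by side. This sketch is an assumption-laden stand-in: it uses SASHELP.HEART (a sample data set shipped with SAS) with BP_STATUS and SMOKING_STATUS playing the roles of AGEGR and HSMOKE, and HEART_SORTED is a hypothetical name.

```sas
* Sketch only - SASHELP.HEART stands in for the course data;

* Three-way request: one BP_STATUS by SMOKING_STATUS table per level of SEX;
proc freq data=sashelp.heart;
  tables sex*bp_status*smoking_status / nopercent;
run;

* BY-group alternative: the data must first be sorted by the BY variable;
proc sort data=sashelp.heart out=heart_sorted;
  by sex;
run;

proc freq data=heart_sorted;
  by sex;
  tables bp_status*smoking_status / nopercent;
run;
```

Both runs give one cross-tabulation for each sex. The BY version treats each sex as a separate analysis, which is why any requested test statistics can differ from those of the three-way request.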

Example Illustrating the Options for Handling Missing Data -
• This example uses data from a study of patients transferred from midwifery care during the course of a pregnancy. One reason patients left midwifery care was early loss. If a woman suffered an early loss, more detail about the loss was requested; for other women this question was not applicable.
• Two questions of interest in this study might be:
  1) What proportion of all study women had a therapeutic abortion?
  2) What proportion of early pregnancy losses were therapeutic abortions?
• Recall that options are listed on a TABLES statement following a slash. Thus, the slash appears after the listing of variables.
• The default leaves missing values out of the table altogether. The MISSPRINT option includes these in the table, so that the difference between those for whom the question is not applicable, and those with missing data, can be seen.
• Note that the WHERE statement is used in this example; this is done so that the PROC FREQ operates on a selected subset of the observations only.

***********************************************************;
* example to illustrate different missing options in FREQ *;
** with and without missprint and missing options **;
***********************************************************;
* 1st define formats, including formats for missing codes *;
PROC FORMAT;
  VALUE EARLYFMT 1='TAB'
                 2='SAB <11 WEEKS'
                 3='SAB 12-19 WEEKS'
                 4='SAB 20-24 WEEKS'
                 .='MISSING'
                 .N='NOT APPLICABLE';
RUN;

PROC FREQ DATA=CNMT;
  TABLES EARLYLOS;
  FORMAT EARLYLOS EARLYFMT.;
  TITLE1 'CNM TRANSFER STUDY';
  TITLE2 'DEFAULT OPTION FOR MISSING VALUES';
RUN;

PROC FREQ DATA=CNMT;
  TABLES EARLYLOS / MISSPRINT;
  FORMAT EARLYLOS EARLYFMT.;
  TITLE2 'MISSPRINT OPTION';
RUN;

PROC FREQ DATA=CNMT;
  TABLES EARLYLOS / MISSING;
  FORMAT EARLYLOS EARLYFMT.;
  TITLE2 'MISSING OPTION';
RUN;

PROC FREQ DATA=CNMT;
  WHERE EARLYLOS NE .N;
  TABLES EARLYLOS / MISSING;
  FORMAT EARLYLOS EARLYFMT.;
  TITLE2 'MISSING OPTION';
  TITLE3 'WHERE used to select applicable cases only';
RUN;

CNM TRANSFER STUDY
DEFAULT OPTION FOR MISSING VALUES

EARLY PREGNANCY LOSS

                                          Cumulative   Cumulative
EARLYLOS           Frequency   Percent     Frequency      Percent
------------------------------------------------------------------
TAB                       11      19.6            11         19.6
SAB <11 WEEKS             16      28.6            27         48.2
SAB 12-19 WEEKS           19      33.9            46         82.1
SAB 20-24 WEEKS           10      17.9            56        100.0

Frequency Missing = 286

CNM TRANSFER STUDY
MISSPRINT OPTION

EARLY PREGNANCY LOSS

                                          Cumulative   Cumulative
EARLYLOS           Frequency   Percent     Frequency      Percent
------------------------------------------------------------------
MISSING                    6        .             .           .
NOT APPLICABLE           280        .             .           .
TAB                       11      19.6            11         19.6
SAB <11 WEEKS             16      28.6            27         48.2
SAB 12-19 WEEKS           19      33.9            46         82.1
SAB 20-24 WEEKS           10      17.9            56        100.0

Frequency Missing = 286

CNM TRANSFER STUDY
MISSING OPTION

EARLY PREGNANCY LOSS

                                          Cumulative   Cumulative
EARLYLOS           Frequency   Percent     Frequency      Percent
------------------------------------------------------------------
MISSING                    6       1.8             6          1.8
NOT APPLICABLE           280      81.9           286         83.6
TAB                       11       3.2           297         86.8
SAB <11 WEEKS             16       4.7           313         91.5
SAB 12-19 WEEKS           19       5.6           332         97.1
SAB 20-24 WEEKS           10       2.9           342        100.0

CNM TRANSFER STUDY
MISSING OPTION
WHERE used to select applicable cases only

EARLY PREGNANCY LOSS

                                          Cumulative   Cumulative
EARLYLOS           Frequency   Percent     Frequency      Percent
------------------------------------------------------------------
MISSING                    6       9.7             6          9.7
TAB                       11      17.7            17         27.4
SAB <11 WEEKS             16      25.8            33         53.2
SAB 12-19 WEEKS           19      30.6            52         83.9
SAB 20-24 WEEKS           10      16.1            62        100.0

• In the first two tables the computed percentages are identical, but use of the MISSPRINT option distinguishes the not applicable (did not have an early loss) from those with missing information. From these we can see, for example, that of the patients with an early loss of known type, 19.6 percent had therapeutic abortions (TAB).
• The third table uses the MISSING option, which includes missing values in computation of percentages. From this table we can see that of all the transferred patients, 3.2 percent had therapeutic abortions.
• The final table uses the WHERE statement as well as the MISSING option, to create a table only on patients with applicable data, in this case only those with early loss. Since 6 patients had early loss of unknown type or age, these can now be included in the table to show that among the patients with an early loss, 17.7% were known to have therapeutic abortions, and that among all patients with early loss, 9.7% were of unknown type/age.

There are other options to control what is printed in the tables.
• NOCUM suppresses printing of cumulative frequencies and percentages. This is especially appropriate for nominal data, inasmuch as cumulative frequencies and cumulative percentages don't make much sense.
• NOFREQ suppresses printing of cell counts.
• NOROW, NOCOL, and NOPERCENT suppress row, column and overall cell percentages, respectively.
• TIP - The availability of different options is useful, depending on what you want to know.

Example Illustrating the Creation of Cross-Tabulations -
• The first TABLES statement produces two tables, trimester by age group, and trimester by earlylos.
• The second TABLES statement produces separate tables of age group by earlylos for each level of trimester. Note that there is no table for the third trimester, since by definition no one in the third trimester can have an early loss.
• This example illustrates choosing to suppress percentages.

** Example Crosstabulations **;
PROC FORMAT;
  VALUE AGEFMT   1='<15 YRS' 2='15-17 YRS' 3='18-19 YRS' 4='20+ YRS';
  VALUE EARLYFMT 1='TAB'
                 2='SAB <11 WEEKS'
                 3='SAB 12-19 WEEKS'
                 4='SAB 20-24 WEEKS'
                 .='MISSING'

                 .N='NOT APPLICABLE';
RUN;

** cross-tabulations **;
PROC FREQ DATA=CNMT;
  * look at trimester by 2 different factors *;
  TABLES TRIMES * (AGEGR EARLYLOS);
  * look at tables of age group by earlylos, separately for each trimester *;
  TABLES TRIMES * AGEGR * EARLYLOS / NOPERCENT NOROW NOCOL;
  FORMAT AGEGR AGEFMT. EARLYLOS EARLYFMT.;
  TITLE1 'CNM TRANSFER STUDY';
RUN;

CNM TRANSFER STUDY

TABLE OF TRIMES BY AGEGR

TRIMES(TRIMESTER OF PREGNANCY AT TRANSFER)
          AGEGR(PATIENT AGE AT TRANSFER)

Frequency|
Percent  |
Row Pct  |
Col Pct  |<15 YRS |15-17 YR|18-19 YR|20+ YRS |
         |        |S       |S       |        |  Total
---------+--------+--------+--------+--------+
       1 |      0 |      9 |      7 |      3 |     19
         |   0.00 |   5.20 |   4.05 |   1.73 |  10.98
         |   0.00 |  47.37 |  36.84 |  15.79 |
         |   0.00 |  14.29 |  13.73 |   6.25 |
---------+--------+--------+--------+--------+
       2 |      8 |     19 |     23 |     20 |     70
         |   4.62 |  10.98 |  13.29 |  11.56 |  40.46
         |  11.43 |  27.14 |  32.86 |  28.57 |
         |  72.73 |  30.16 |  45.10 |  41.67 |
---------+--------+--------+--------+--------+
       3 |      3 |     35 |     21 |     25 |     84
         |   1.73 |  20.23 |  12.14 |  14.45 |  48.55
         |   3.57 |  41.67 |  25.00 |  29.76 |
         |  27.27 |  55.56 |  41.18 |  52.08 |
---------+--------+--------+--------+--------+
Total          11       63       51       48      173
             6.36    36.42    29.48    27.75   100.00

Frequency Missing = 169

TABLE OF TRIMES BY EARLYLOS

TRIMES(TRIMESTER OF PREGNANCY AT TRANSFER)
          EARLYLOS(EARLY PREGNANCY LOSS)

Frequency|
Percent  |
Row Pct  |
Col Pct  |TAB     |SAB <11 |SAB 12-1|SAB 20-2|
         |        |WEEKS   |9 WEEKS |4 WEEKS |  Total
---------+--------+--------+--------+--------+
       1 |      3 |     12 |      5 |      0 |     20
         |   6.25 |  25.00 |  10.42 |   0.00 |  41.67
         |  15.00 |  60.00 |  25.00 |   0.00 |
         |  50.00 |  92.31 |  26.32 |   0.00 |
---------+--------+--------+--------+--------+
       2 |      3 |      1 |     14 |     10 |     28
         |   6.25 |   2.08 |  29.17 |  20.83 |  58.33
         |  10.71 |   3.57 |  50.00 |  35.71 |
         |  50.00 |   7.69 |  73.68 | 100.00 |
---------+--------+--------+--------+--------+
       3 |      0 |      0 |      0 |      0 |      0
         |   0.00 |   0.00 |   0.00 |   0.00 |   0.00
         |    .   |    .   |    .   |    .   |
         |   0.00 |   0.00 |   0.00 |   0.00 |
---------+--------+--------+--------+--------+
Total           6       13       19       10       48
            12.50    27.08    39.58    20.83   100.00

Frequency Missing = 294

CNM TRANSFER STUDY

TABLE 1 OF AGEGR BY EARLYLOS
CONTROLLING FOR TRIMES=1

AGEGR(PATIENT AGE AT TRANSFER)
          EARLYLOS(EARLY PREGNANCY LOSS)

Frequency |TAB     |SAB <11 |SAB 12-1|SAB 20-2|
          |        |WEEKS   |9 WEEKS |4 WEEKS |  Total
----------+--------+--------+--------+--------+
<15 YRS   |      0 |      0 |      0 |      0 |      0
----------+--------+--------+--------+--------+
15-17 YRS |      1 |      6 |      0 |      0 |      7
----------+--------+--------+--------+--------+
18-19 YRS |      2 |      2 |      1 |      0 |      5
----------+--------+--------+--------+--------+
20+ YRS   |      0 |      1 |      0 |      0 |      1
----------+--------+--------+--------+--------+
Total            3        9        1        0       13

Frequency Missing = 15

TABLE 2 OF AGEGR BY EARLYLOS
CONTROLLING FOR TRIMES=2

AGEGR(PATIENT AGE AT TRANSFER)
          EARLYLOS(EARLY PREGNANCY LOSS)

Frequency |TAB     |SAB <11 |SAB 12-1|SAB 20-2|
          |        |WEEKS   |9 WEEKS |4 WEEKS |  Total
----------+--------+--------+--------+--------+
<15 YRS   |      1 |      0 |      2 |      1 |      4
----------+--------+--------+--------+--------+
15-17 YRS |      2 |      0 |      6 |      0 |      8
----------+--------+--------+--------+--------+
18-19 YRS |      0 |      1 |      3 |      0 |      4
----------+--------+--------+--------+--------+
20+ YRS   |      0 |      0 |      1 |      1 |      2
----------+--------+--------+--------+--------+
Total            3        1       12        2       18

Frequency Missing = 77

TABLE 3 OF AGEGR BY EARLYLOS
CONTROLLING FOR TRIMES=3

Effective Sample Size = 0
Frequency Missing = 127
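
The NOCUM option listed among the printing options applies to one-way tables and is not exercised in the cross-tabulation example. A minimal sketch follows, assuming the CNMT data set and the EARLYFMT. format defined above are available. The TITLE2 text is made up for illustration.

```sas
* Sketch only - assumes CNMT and the EARLYFMT. format already exist;
PROC FREQ DATA=CNMT;
  TABLES EARLYLOS / NOCUM;       /* frequency and percent only, no cumulative columns */
  FORMAT EARLYLOS EARLYFMT.;
  TITLE1 'CNM TRANSFER STUDY';
  TITLE2 'NOCUM OPTION';
RUN;
```

The table would contain the same Frequency and Percent columns as the default table shown earlier, without the two Cumulative columns.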

4. How to Print Forms: PROC FORMS

Proc FORMS can be used to print mailing labels, file cards, or any printer forms that have a regular pattern and that use information stored in SAS data sets.
• There are options within the procedure to define the form dimensions, spacing, number of forms to print down and across a page, and more.
• Beware! It may take some trial and error (with settings) to obtain the exact look (e.g. spacing) that you are after.
• An alternative approach to producing "forms" is to use a PUT statement in a DATA _NULL_ step. When you are printing combinations of text and variable values, this can sometimes be easier to use.

In some instances, it may be easier to export your data from SAS to ACCESS, and use the special features that allow you to specify the mailing label format name, so spacing is defined for you.

An example for printing mailing labels is shown. The PROC FORMS statement names the data file to use, followed by a series of options to control page and line size, number of units per page, indentation, number of lines to skip between forms (in this example, SKIP=2), as well as other options.
• LINE statements give a line number, followed by the variable names to be printed on the line. Options for printing follow the slash (/). This example uses the LASTNAME feature to reorder a name given as last, first. The option puts the text after a comma first, followed by the text before the comma, and the comma is not printed. Alternatively, the first and last names could have been read in as separate variables.
• The option PACK removes extra spaces that would appear if a character variable doesn't use all of the available character variable length, as shown in the second example, without the options.
• Note that the statement TITLE1; must be used to remove all titles.

** PROC FORMS EXAMPLE **;
** read in ASCII data **;
data mail;
  input sid name & $20. addr1 & $25. addr2 & $25. zip;
cards;
01 Johnson, R.  P.O. Box 243,  Montgomery, Al  36113
02 Abbott, Brenda K.  568 Trillion Ct.,  Denver, CO  80237
03 Rodriquez, Juan  619 Powell Dr.,  Charleston, SC  29412
04 Stevenson, Mary K.  22 Meredith Blvd.,  Austin, TX  78702
05 Hawks, Patrick E.  Rt. 1, Box 42,  Taylorsville, NC  28681
06 Lee, Chen  123 Maple St.,  Raleigh, NC  27606
07 Weinstein, Joseph M.  Rt. 4, Box 466,  Dixon, IL  61021
08 Baskowshi, Bonnie G.  P.O. Box 523,  Sacramento, CA  85841
;
run;

* print subject list *;
proc print data=mail;
  id sid;
  var name addr1 addr2 zip;
  title1 'List of subjects';
run;

* proc forms with options *;
proc forms data=mail skip=2;
  line 1 name / lastname;
  line 2 addr1 ;
  line 3 addr2 zip/pack;
  title1;
RUN;

* without options *;
proc forms data=mail;
  line 1 name ;
  line 2 addr1 ;
  line 3 addr2 zip;
RUN;

List of subjects

sid  name                   addr1              addr2             zip
  1  Johnson, Mary K.       P.O. Box 243       Montgomery, Al    36113
  2  Abbott, Brenda K.      568 Trillion Ct.   Denver, CO        80237
  3  Rodriquez, Juan        619 Powell Dr.     Charleston, SC    29412
  4  Stevenson, Patrick E.  22 Meredith Blvd.  Austin, TX        78702
  5  Hawks, Bonnie G.       Rt. 4, Box 523     Taylorsville, NC  28681
  6  Lee, Chen              123 Maple St.      Raleigh, NC       27606
  7  Weinstein, R.          Rt. 1, Box 42      Dixon, IL         61021
  8  Baskowshi, Joseph M.   P.O. Box 466       Sacramento, CA    85841

(First example using options)

Mary K. Johnson
P.O. Box 243
Montgomery, Al 36113


Brenda K. Abbott
568 Trillion Ct.
Denver, CO 80237


Juan Rodriquez
619 Powell Dr.
Charleston, SC 29412


Patrick E. Stevenson
22 Meredith Blvd.
Austin, TX 78702


Bonnie G. Hawks
Rt. 4, Box 523
Taylorsville, NC 28681


Chen Lee
123 Maple St.
Raleigh, NC 27606


R. Weinstein
Rt. 1, Box 42
Dixon, IL 61021


Joseph M. Baskowshi
P.O. Box 466
Sacramento, CA 85841

(2nd example without options)

Johnson, Mary K.
P.O. Box 243
Montgomery, Al            36113

Abbott, Brenda K.
568 Trillion Ct.
Denver, CO                80237

Rodriquez, Juan
619 Powell Dr.
Charleston, SC            29412

Stevenson, Patrick E.
22 Meredith Blvd.
Austin, TX                78702

Hawks, Bonnie G.
Rt. 4, Box 523
Taylorsville, NC          28681

Lee, Chen
123 Maple St.
Raleigh, NC               27606

Weinstein, R.
Rt. 1, Box 42
Dixon, IL                 61021

Baskowshi, Joseph M.
P.O. Box 466
Sacramento, CA            85841
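The DATA _NULL_ alternative mentioned in the bullets above could look roughly like the following. This is a minimal sketch, not part of the original example: it assumes the MAIL data set created in the PROC FORMS example, and the variable names FNAME and LNAME are introduced here only for illustration.

```sas
* sketch: mailing labels written with PUT statements *;
* assumes the MAIL data set from the PROC FORMS example *;
title1;
data _null_;
  set mail;
  * send output to the print destination, without titles *;
  file print notitles;
  length fname lname $20;
  * split a name stored as last, first into two parts *;
  lname = scan(name, 1, ',');
  fname = left(scan(name, 2, ','));
  * list-style PUT trims trailing blanks, much like PACK *;
  put fname lname;
  put addr1;
  put addr2 zip;
  * two blank lines between labels, similar to SKIP=2 *;
  put;
  put;
run;
```

Because each PUT statement writes one line, text constants can be mixed freely with variable values (for example, put 'Attn: ' fname lname;), which is where this approach tends to be easier than PROC FORMS.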

5. How to Use PROC REPORT

PROC REPORT is another PROC that is worth the time and effort required to learn.

• PROC REPORT encompasses many of the features of Procs PRINT, MEANS and TABULATE.
• Its features allow the presentation of detail (individual observations) and summary data, incorporated into an organized report.
• The flexibility in reporting using PROC REPORT is a little better than that for PROC TABULATE. However, both save later "cut and paste" work to create a project document.
• PROC REPORT is also nice for having features that allow comprehensive control over fonts, background and text colors (for output text or html files), as well as the controls it provides over spacing, page breaks and other formatting tools. PROC REPORT also has a feature akin to a PUT statement (the LINE statement); this allows you to insert your own text in the report.
• When used well, reports can be generated directly as SAS output that require little or no further editing before presentation.
• TIP: Consider using this procedure when you need to generate regular, multiple status reports. PROC REPORT is handy inasmuch as, once a report has been designed and the programming statements saved, it can easily be re-generated.
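As an illustration of the LINE feature, the following sketch adds a closing note to a small summary report. It is not taken from the course example: it uses SASHELP.CLASS, a sample data set shipped with SAS, so that it can be run on its own, and the title and note text are made up for the illustration. The DEFINE statements it uses are explained in the material that follows.

```sas
* sketch: LINE statements add custom text to a report *;
* uses SASHELP.CLASS, a sample data set shipped with SAS *;
title1 'SKETCH: CUSTOM TEXT WITH THE LINE STATEMENT';
proc report data=sashelp.class nowd headskip;
  column sex age height;
  define sex    / group 'SEX';
  define age    / analysis mean 'MEAN AGE' format=6.1;
  define height / analysis mean 'MEAN HEIGHT' format=6.1;
  * COMPUTE AFTER with no variable runs once, at the end of the report *;
  compute after;
    line ' ';
    line 'Source: SASHELP.CLASS, means by sex';
  endcomp;
run;
```

LINE statements are only allowed inside a COMPUTE block, so the text is tied to a location in the report (here, the end) rather than to a position in the program.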

When using PROC REPORT, plan in advance the layout of your report.

• A report's layout is largely determined by the designation of variables into various categories.
• For each variable, a DEFINE statement is used to designate the display category for the report:
  - DISPLAY: A row appears for every observation for variables with this designation. By default, character variables are treated as DISPLAY variables (numeric variables default to ANALYSIS with the SUM statistic), unless another designation is given.
  - ORDER: A row appears for every observation for variables with this designation, as for a display variable, but this designation will order the rows by value.
  - GROUP: This designation groups on variable values to determine rows in the report, akin to using a CLASS variable in other procedures.
  - ACROSS: Variables with this designation determine columns for the report, one for each distinct value of the variable present in the input data. ACROSS variables are comparable to CLASS variables in other procedures, but are used to define columns in PROC REPORT.
  - ANALYSIS: Numeric variables that are used for computation of summary statistics, for each cell of a report produced by ACROSS by GROUP designations.

Example

• This example uses data from the study of change in functional status, 6 months post-cardiac catheterization among diabetic and non-diabetic patients. This study was described in the section on PROC TABULATE (see p. 27).
• The example is in three parts. Each demonstrates a few new features. Part 1 prints means for physical function change scores by sex by age group by diabetic status, Part 2 adds formatting features and more summary statistics, and Part 3 displays other statistics and t-test results.

* example part 1 *;
TITLE1 'EXAMPLE USING PROC REPORT';
* use NOWD to suppress windows, headskip to skip line after header row *;
PROC REPORT DATA=FD2 NOWD HEADSKIP;
* define columns for report *;
* nominal vars: diab, age group and sex *;
* numeric var: change in pf score *;
COLUMN DIAB AGEGROUP SEX PF2_1;
* define nominal variables to create rows, assign formats *;
DEFINE DIAB / GROUP FORMAT=DFMT.;
DEFINE AGEGROUP / GROUP;
DEFINE SEX / GROUP FORMAT=$SEXF.;
* define statistic to print for pf change as the mean *;
* assign title to column *;
DEFINE PF2_1 / ANALYSIS MEAN 'PF CHANGE MEAN';
RUN;

EXAMPLE USING PROC REPORT

                                    PF
             AGEG               CHANGE
DIAB         ROUP  SEX            MEAN

Diabetic     <65   Female    1.7417697
                   Male      0.7350271
             >=65  Female    0.4254753
                   Male       1.722176
Nondiabetic  <65   Female    5.9010155
                   Male      5.5899143
             >=65  Female    1.5194106
                   Male      2.5627356

* EXAMPLE part 2: ADDING FORMATTING FEATURES and more statistics *;
TITLE2 'ADDING FORMATTING FEATURES AND MULTIPLE STATISTICS';
PROC REPORT DATA=FD2 NOWD HEADSKIP;
* Define several columns for pf change, to print several statistics *;
COLUMN DIAB AGEGROUP SEX PF2_1=PFN PF2_1=PFMEAN PF2_1=PFMIN PF2_1=PFMAX;
* define nominal variables adding column labels and formats *;
DEFINE DIAB / GROUP 'DIABETIC STATUS' FORMAT=DFMT.;
DEFINE AGEGROUP / GROUP 'AGE GROUP' FORMAT=$6.;
DEFINE SEX / GROUP FORMAT=$SEXF.;
* define each of the statistics to use for PF change *;
* and give column header and format *;
DEFINE PFN / ANALYSIS N 'N' FORMAT=3.;
DEFINE PFMEAN / ANALYSIS MEAN 'PF CHANGE MEAN' FORMAT=6.1;
DEFINE PFMIN / ANALYSIS MIN 'PF CHANGE MIN' FORMAT=6.1;
DEFINE PFMAX / ANALYSIS MAX 'PF CHANGE MAX' FORMAT=6.1;
* create line breaks after each group *;
BREAK AFTER DIAB / SUPPRESS SKIP;
BREAK AFTER AGEGROUP / SKIP SUPPRESS;
RUN;

EXAMPLE USING PROC REPORT
ADDING FORMATTING FEATURES AND MULTIPLE STATISTICS

                                    PF      PF      PF
DIABETIC     AGE                CHANGE  CHANGE  CHANGE
STATUS       GROUP  SEX      N    MEAN     MIN     MAX

Diabetic     <65    Female  34     1.7     ...     ...
                    Male    58     0.7     ...     ...

             >=65   Female  40     0.4     ...     ...
                    Male    45     1.7     ...     ...

Nondiabetic  <65    Female  92     5.9     ...     ...
                    Male   253     5.6     ...     ...

             >=65   Female 100     1.5     ...     ...
                    Male   187     2.6     ...     ...

** EXAMPLE part 3: REPORTING TEST STATISTICS **;
TITLE3 'ADDING TEST OF MEAN CHANGE DIFFERENT FROM ZERO';
PROC REPORT DATA=FD2 NOWD HEADSKIP;
** Define column variables, including several columns for pf change **;
COLUMN DIAB AGEGROUP SEX PF2_1=PFN PF2_1=PFMEAN PF2_1=PFSTDERR PF2_1=PFTEST;
** define nominal vars with column labels and formats **;
DEFINE DIAB / GROUP 'DIABETIC STATUS' FORMAT=DFMT.;
DEFINE AGEGROUP / GROUP 'AGE GROUP' FORMAT=$6.;
DEFINE SEX / GROUP FORMAT=$SEXF.;
** define statistics for pf change, **;
** including test that mean change is different from 0 **;
DEFINE PFN / ANALYSIS N 'N' FORMAT=3.;
DEFINE PFMEAN / ANALYSIS MEAN 'PF CHANGE MEAN' FORMAT=6.1;
DEFINE PFSTDERR / ANALYSIS STDERR 'STD ERROR PF CHANGE' FORMAT=6.1;
DEFINE PFTEST / ANALYSIS PRT 'P-VALUE: MEAN=0' FORMAT=8.4;
BREAK AFTER DIAB / SUPPRESS SKIP;
BREAK AFTER AGEGROUP / SKIP SUPPRESS;
RUN;

EXAMPLE USING PROC REPORT
ADDING FORMATTING FEATURES AND MULTIPLE STATISTICS
ADDING TEST OF MEAN CHANGE DIFFERENT FROM ZERO

                                             STD
                                    PF     ERROR
DIABETIC     AGE                CHANGE        PF  P-VALUE:
STATUS       GROUP  SEX      N    MEAN    CHANGE    MEAN=0

Diabetic     <65    Female  34     1.7       ...    0.6876
                    Male    58     0.7       ...    0.2906

             >=65   Female  40     0.4       ...    0.5522
                    Male    45     1.7       ...    0.6521

Nondiabetic  <65    Female  92     5.9       ...    0.0001
                    Male   253     5.6       ...    0.0001

             >=65   Female 100     1.5       ...    0.1602
                    Male   187     2.6       ...    0.0039
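The three parts above use only GROUP and ANALYSIS designations. The ACROSS designation described earlier could be tried on the same data along the following lines. This is a sketch only, assuming the same FD2 data set and the DFMT. and $SEXF. formats used above; the TITLE2 text is made up for the illustration, and no output is shown because this variant was not part of the original example.

```sas
* sketch only: SEX as an ACROSS variable, one column per sex *;
* assumes data set FD2 and formats DFMT. and $SEXF. from the example *;
TITLE1 'EXAMPLE USING PROC REPORT';
TITLE2 'SKETCH: SEX AS AN ACROSS VARIABLE';
PROC REPORT DATA=FD2 NOWD HEADSKIP;
* the comma places the pf change mean under each value of sex *;
COLUMN DIAB AGEGROUP SEX,PF2_1;
DEFINE DIAB / GROUP 'DIABETIC STATUS' FORMAT=DFMT.;
DEFINE AGEGROUP / GROUP 'AGE GROUP' FORMAT=$6.;
DEFINE SEX / ACROSS 'SEX' FORMAT=$SEXF.;
DEFINE PF2_1 / ANALYSIS MEAN 'PF CHANGE MEAN' FORMAT=6.1;
RUN;
```

With this layout the report should have one row per diabetic status by age group combination, with the Female and Male means side by side, which makes comparison between the sexes easier than in the stacked layout of Parts 1 to 3.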