Advanced SAS Programming Techniques

A Workshop Presented to the
Alaska Chapter
The American Fisheries Society
E. Barry Moser
Department of Experimental Statistics
Louisiana State University
and
Louisiana State University Agricultural Center
Baton Rouge, LA 70803
Phone: 504-388-8376
FAX: 504-388-8344
E-mail: barry@stat.lsu.edu
September 29-October 3, 1997

Contents
1 Introduction .... 3
2 The DATA Step .... 4
  2.1 The DATA STEP process .... 4
    2.1.1 An implicit loop .... 4
    2.1.2 RETURN, DELETE, and OUTPUT .... 5
    2.1.3 Compound Statements .... 7
    2.1.4 Data Set Options .... 8
    2.1.5 DROP, KEEP, and RETAIN .... 10
  2.2 Input/Output .... 10
    2.2.1 List Input .... 10
    2.2.2 Column Input .... 13
    2.2.3 Pointer Control and Formatted Input .... 14
    2.2.4 The PUT Statement .... 18
    2.2.5 SAS Formats and Informats .... 19
  2.3 SAS Functions .... 21
    2.3.1 Mathematical Functions .... 21
    2.3.2 Random Number Generators .... 22
    2.3.3 String Functions .... 23
    2.3.4 Date and Time Functions .... 24
    2.3.5 PUT and INPUT Functions .... 25
  2.4 Looping and Arrays .... 26
    2.4.1 Univariate and Multivariate Data Views .... 27
    2.4.2 Indeterminate DO Loops .... 32
  2.5 The NULL Data Set .... 33
  2.6 Data Step Examples .... 35
    2.6.1 Simple Random Sampling Without Replacement .... 35
    2.6.2 Data Recoding .... 36
3 Working With Files .... 38
  3.1 External Files .... 38
    3.1.1 FTP Access .... 44
    3.1.2 WWW Access .... 45
  3.2 Including External SAS Code .... 45
  3.3 The SAS Data Library .... 45
    3.3.1 The LIBNAME Statement .... 46
    3.3.2 Library Procedures .... 47
  3.4 File Import/Export/Transport .... 51
    3.4.1 Import/Export .... 51
    3.4.2 Transport .... 53
  3.5 The X Files .... 55
4 The Macro Language .... 57
  4.1 Macro Variables .... 57
  4.2 Macro Procedures .... 59
  4.3 Bootstrap Example .... 62
  4.4 Cluster Dendrogram .... 66
5 SAS Special Files .... 70
  5.1 Autoexec.sas .... 70
  5.2 Config.sas .... 72
  5.3 Profile.sct .... 75
6 SAS Internet Tools .... 76
  6.1 Capturing OUTPUT for the Web .... 76
Chapter 1
Introduction
The SAS[1] system, composed of many diverse components, is a very powerful environment
for programming, for data management and data analysis, and for report generation and
graphics presentation. This manuscript was developed for a short course in
"advanced SAS." Obviously the coverage will have to be quite limited. It is
designed around material that I have encountered through my teaching, research, and
statistical consulting work and that I believe will be relevant and useful for others
dealing with basic data management and statistical analysis needs. This manuscript is not
intended as a SAS language or SAS system reference manual; it hardly scratches the surface.
Nor is it designed to show how to do statistical analysis with the SAS system. The manuscript
will first focus on the data step, as a lot of the power of the SAS environment can be
demonstrated through the data step. Next, the input/output and library system will be
discussed. Later the macro language will be introduced. Finally, several chapters dealing
with various parts of the system, graphics, data analysis procedures, and the internet will
be introduced. As this is an "advanced" course, some items will be introduced before they
are actually covered in detail. This was done on purpose so as to avoid completely
artificial-looking or contrived examples (although a few do exist, sorry). Further, since not
all of the "basics" are covered, keep copies of the SAS manuals available. The best way to
learn the SAS system and to benefit from this course is to experiment with the examples
and to create your own. Again, DO NOT hesitate to modify the examples and to create
new ones.
[1] SAS, SAS/BASE, SAS/GRAPH, SAS/ACCESS, SAS/ASSIST, SAS/FSP, SAS/INSIGHT, SAS/OR,
SAS/ETS, SAS/IntrNet, SAS/IML, and SAS/STAT are registered trademarks or copyrights of SAS Institute,
Inc., Cary, NC.
Chapter 2
The DATA Step

2.1 The DATA STEP process
2.1.1 An implicit loop
To understand much of what happens in the data step, one first needs to understand its
overall design. When originally conceived, the SAS data step was designed to get data
stored in some "raw" format into the SAS data format and to perform any transformations
and computations on that data prior to data analysis with procedures that would follow.
Thus, the data step was designed with an implicit loop around the data input. That is,
rather than the programmer having to explicitly write a loop around the input code, as
would need to be done with FORTRAN (and most other programming languages), the loop
was already assumed to be needed, and was, therefore, automatically supplied. At the
end of the implied loop, the resulting data, in the form of variables, is output to the new
SAS data set. The programmer writes the code needed to process a single observation of
data, and the data step will then automatically repeat this same code for each observation,
outputting each new observation in turn into the new SAS data set. This same basic process
is also followed when a SAS data set is created from an existing SAS data set, such as when
several SAS data sets are concatenated or merged together. The example below illustrates
the basic looping process using a portion of the flier data set.
Title2 "Simple Data Step";
Data One;
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
;
Proc Print Data=One;
Run;
Data Step Examples

Simple Data Step
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL
1 7 21 74 2 7 3 0 5 3.5 1.4 57.00
2 1 9 76 2 3 2 0 1 5.3 3.0 0.00
3 12 18 74 2 4 1 0 5 5.4 3.4 83.20
4 2 15 76 2 1 1 0 5 6.0 5.7 111.00
5 9 13 75 2 2 2 1 5 10.1 23.4 203.00
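To watch the implicit loop at work, it can help to print something on each pass. The sketch
below is not part of the original example. It relies on the automatic variable _N_, which
counts passes through the data step, and uses a PUT statement to write one line to the SAS
LOG per pass. The title text is our own choice, and only the first three flier records are used.

```sas
Title2 "Watching the Implicit Loop";
Data One;
   Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
   /* _N_ is supplied automatically and counts passes through the data step. */
   /* Named output (variable=) writes each name along with its value.        */
   Put "Pass: " _N_= Lt= Wt=;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
;
Run;
```

One line should appear in the LOG for each data line read, showing that the statements
between DATA and DATALINES are executed once per observation without any explicit loop.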
2.1.2 RETURN, DELETE, and OUTPUT

The behavior of the data step loop can be modified by several statements. The RETURN
statement causes execution of a loop to "return" to a specific point in a loop. When the
loop is the implicit data step loop, execution returns immediately to the beginning of the
data step loop. If no OUTPUT statements are contained in the data step, then the RETURN
statement also outputs the current observation, in whatever state it is in, into the SAS data
set.
Title2 "RETURN Statement";
Data One;
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
If TSL=0 Then RETURN;
ConvFact=Lt/TSL;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
;
Proc Print Data=One;
Run;
Data Step Examples

RETURN Statement
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT
1 7 21 74 2 7 3 0 5 3.5 1.4 57.00 0.061404

2 1 9 76 2 3 2 0 1 5.3 3.0 0.00 .
3 12 18 74 2 4 1 0 5 5.4 3.4 83.20 0.064904
4 2 15 76 2 1 1 0 5 6.0 5.7 111.00 0.054054
5 9 13 75 2 2 2 1 5 10.1 23.4 203.00 0.049754
Notice that the conversion factor, CONVFACT, for the second observation is missing (rep-
resented by a period). This observation could be dropped from the data set by several
methods. We'll consider a couple to further illustrate the behavior of the data step. The
OUTPUT statement can be used to output an observation to a data set or data sets. When it
is present in a data step, observations will ONLY BE OUTPUT when the OUTPUT statement
is executed. Consider the example below using both the RETURN and OUTPUT statements.
Title2 "RETURN and OUTPUT Statements";
Data One;
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
If TSL=0 Then RETURN;
ConvFact=Lt/TSL;
OUTPUT;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
;
Proc Print Data=One;
Run;
Data Step Examples

RETURN and OUTPUT Statements
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT
1 7 21 74 2 7 3 0 5 3.5 1.4 57.00 0.061404
2 12 18 74 2 4 1 0 5 5.4 3.4 83.20 0.064904
3 2 15 76 2 1 1 0 5 6.0 5.7 111.00 0.054054
4 9 13 75 2 2 2 1 5 10.1 23.4 203.00 0.049754
Note that the second observation was never output to the new SAS data set. Now, what
if we would like the observations with zero (or no) total scale length (TSL) to be placed
into one data set and the others to be placed into another data set? Let's use the OUTPUT
statement to do this for us.
Title2 "OUTPUT to Different Data Sets";
Data WithTSL NoTSL;
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
ConvFact=Lt/TSL;
If TSL=0 Then OUTPUT NoTSL;
Else OUTPUT WithTSL;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
;
Title3 "With TSL > 0";
Proc Print Data=WithTSL;
Run;
Title3 "TSL Not Measured";

Proc Print Data=NoTSL;
Run;
Data Step Examples

OUTPUT to Different Data Sets
With TSL > 0
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT
1 7 21 74 2 7 3 0 5 3.5 1.4 57.00 0.061404
2 12 18 74 2 4 1 0 5 5.4 3.4 83.20 0.064904
3 2 15 76 2 1 1 0 5 6.0 5.7 111.00 0.054054
4 9 13 75 2 2 2 1 5 10.1 23.4 203.00 0.049754
Data Step Examples

OUTPUT to Different Data Sets
TSL Not Measured
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT
1 1 9 76 2 3 2 0 1 5.3 3 0 .
One problem that we will have here is that the conversion factor will be computed on
each and every observation and so we get a divide by zero error on the second observation
(TSL=0). Further, since no conversion factor is possible when the scale length is not
measured, we should like to drop the CONVFACT variable from the NoTSL data set.
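The DELETE statement, the third statement named in the heading of this section, gives
another way to drop the observation with TSL=0. The following is a minimal sketch (the
title is our own choice). DELETE stops processing the current observation, writes nothing
to the data set, and returns to the top of the data step loop, so no explicit OUTPUT
statement is needed and the division is never attempted for that observation.

```sas
Title2 "DELETE Statement";
Data One;
   Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
   /* Skip this observation entirely and begin the next pass of the loop. */
   If TSL=0 Then DELETE;
   ConvFact=Lt/TSL;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
;
Proc Print Data=One;
Run;
```

The resulting data set should hold the same four observations as the RETURN and OUTPUT
example, since the implicit output at the end of the loop still applies to every
observation that is not deleted.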
2.1.3 Compound Statements

A compound statement is a programming statement that consists of several simple state-
ments. In the SAS Language, compound statements are constructed using the DO and END
statements. Often a compound statement is the object of a conditional statement such as
the IF statement. Using the Flier data set from the previous example, we can prohibit the
divide by zero error and output the observations to the proper data sets using a compound
statement.
Title2 "Compound Statements";
Data WithTSL NoTSL;
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
If TSL=0 Then OUTPUT NoTSL;
Else
Do;
ConvFact=Lt/TSL;
OUTPUT WithTSL;
End;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
;
Title3 "With TSL > 0";
Proc Print Data=WithTSL;
Run;
Title3 "TSL Not Measured";
Proc Print Data=NoTSL;
Run;
Data Step Examples

Compound Statements
With TSL > 0
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT
1 7 21 74 2 7 3 0 5 3.5 1.4 57.00 0.061404
2 12 18 74 2 4 1 0 5 5.4 3.4 83.20 0.064904
3 2 15 76 2 1 1 0 5 6.0 5.7 111.00 0.054054
4 9 13 75 2 2 2 1 5 10.1 23.4 203.00 0.049754
Data Step Examples

Compound Statements
TSL Not Measured
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT
1 1 9 76 2 3 2 0 1 5.3 3 0 .
Unfortunately, the CONVFACT variable is still present in the NoTSL data set, although it
was not computed for any observations in this data set. A DROP statement could be used
to remove the CONVFACT variable from ALL output data sets, but this is not what we
would like either.
2.1.4 Data Set Options

There are a number of options that can be specified for accessing SAS data sets, including
passwords to protect a data set from being read, edited, or written to. These options are
specified within parentheses following the data set name and can be used within a data step
or a procedure step. Some options that are commonly used are
KEEP= Specifies a list of variables to be retained in a new data set.
DROP= Specifies a list of variables to be excluded from a new data set.
LABEL= Specifies a label that is to be stored in the SAS data set and will be shown
using various data set utilities.
RENAME= Permits variable names to be changed. The old names are available in the
current data step, while the new names will actually be stored in the new data set.
Options that are used for the processing of an existing SAS data set, such as when using
SET, MERGE, and UPDATE statements, include
KEEP= Specifies a list of variables to be accessible from an existing data set.
DROP= Specifies a list of variables to be inaccessible from an existing data set.
FIRSTOBS= Specifies the observation number with which processing should begin.
OBS= Specifies the last observation number to be processed, after which processing
will stop.
IN= Names a new variable that will have the value 1 when the current observation
is read from the data set and 0 when the current observation is read from
another data set.
Now we can use this information to update our program so that we can have a different set
of variables for the two new data sets.
Title2 "DROP Data Set Option";
Data WithTSL NoTSL(DROP=ConvFact);
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
If TSL=0 Then OUTPUT NoTSL;
Else
Do;
ConvFact=Lt/TSL;
OUTPUT WithTSL;
End;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
;
Title3 "With TSL > 0";
Proc Print Data=WithTSL;
Run;
Title3 "TSL Not Measured";
Proc Print Data=NoTSL;
Run;
Data Step Examples

DROP Data Set Option
With TSL > 0
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL CONVFACT
1 7 21 74 2 7 3 0 5 3.5 1.4 57.00 0.061404
2 12 18 74 2 4 1 0 5 5.4 3.4 83.20 0.064904
3 2 15 76 2 1 1 0 5 6.0 5.7 111.00 0.054054
4 9 13 75 2 2 2 1 5 10.1 23.4 203.00 0.049754
Data Step Examples

DROP Data Set Option
TSL Not Measured
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL
1 1 9 76 2 3 2 0 1 5.3 3 0
Finally, this is what we had wanted. Note that the KEEP= option could also have been used,
but would have required that we list each of the variables that we wanted to keep in the
new data set. The choice between KEEP and DROP is usually based upon which list is shorter
or easier to write.
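The options for reading an existing data set can be sketched in the same way. The example
below is an illustration only. The data set names Flier, Subset, and Both, the new variable
name FishLt, and the variables FromTop and Source are assumptions of ours, not names used
elsewhere in this manuscript.

```sas
Data Flier;
   Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
;
Run;

Data Subset;
   /* Read observations 2 through 4 and only three variables.        */
   /* KEEP= uses the old name (Lt); the data step sees the new name. */
   Set Flier(FIRSTOBS=2 OBS=4 KEEP=Lt Wt TSL RENAME=(Lt=FishLt));
Run;

Data Both;
   /* Concatenate two pieces of Flier. FromTop is 1 when the current */
   /* observation was read from the first piece and 0 otherwise.     */
   Set Flier(OBS=2 IN=FromTop) Flier(FIRSTOBS=3);
   Source = FromTop;   /* IN= variables are temporary, so copy the value to keep it */
Run;

Proc Print Data=Subset;
Run;
Proc Print Data=Both;
Run;
```

Because the variable named by IN= is not written to the new data set, its value is copied
into an ordinary variable here when it is wanted later.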
2.1.5 DROP, KEEP, and RETAIN
There are a few statements in the SAS data step that are not executable statements and can
be placed anywhere within the SAS data step. As with the DROP= and KEEP= data set options,
the DROP and KEEP statements are used to specify which variables are to be excluded or
included in the new data set or data sets being created. Others include the FORMAT, LENGTH,
and ARRAY statements. The RETAIN statement can also be placed anywhere but can serve
several roles. Normally, after the variables have been output for an observation at the end
of the data step loop, the variables' values are reset to missing before the next observation
is processed. The RETAIN statement alters this behavior by not resetting the data values for
any variables given in its list of variables. Another use of the RETAIN statement is to give
initial values to specific variables. Examples of the RETAIN statement will be encountered
later.
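As a small preview, the sketch below uses RETAIN for both purposes at once: it gives a
variable an initial value and keeps that value from one pass of the data step loop to the
next. The data set name Running and the variable TotalWt are our own hypothetical names.

```sas
Title2 "RETAIN Statement Sketch";
Data Running;
   /* TotalWt starts at 0 and is NOT reset to missing between observations. */
   Retain TotalWt 0;
   Input Lt Wt;
   TotalWt = TotalWt + Wt;   /* running total of the weights read so far */
Datalines;
3.5 1.4
5.3 3.0
5.4 3.4
;
Proc Print Data=Running;
Run;
```

Without the RETAIN statement, TotalWt would be reset to missing at the start of every pass,
and adding Wt to a missing value would give a missing result for every observation.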
2.2 Input/Output
One of the many powerful features of the SAS language is the diversity of methods, and
modifications to them, for inputting data into a SAS data set. We will review several of the
more important methods and then consider some options and modifications that become
more and more useful as the data become more complicated to read. For reading data from
a spreadsheet, for example, you may want to skip to the section on file import and export
on page 51. We will next look at output methods that can be very useful for generating
reports that have to be in very specific formats.
The basic statement used for reading raw data from a file is the INPUT statement. It takes as
its arguments a list of variables, pointer placement instructions, and informat information.
For options that are used to modify the standard behavior of the input process, the INFILE
statement is used.
2.2.1 List Input

The list input method is the simplest of methods and works in very many situations.
Essentially, one specifies the variable names in the order in which they occur in the data
set with no pointer placement instructions. The values on a line will be read in order and
placed into the variables according to their order in the INPUT statement. This is seen in
the "Simple Data Step" example on page 4.
The behavior of the input process, however, depends upon having at least as many data
values on each line as there are variables in the INPUT statement. If the number of data
values is less than the number of variables specified in the INPUT statement, then, by default,
the input pointer will be moved to the beginning of the next line of input and the remainder
of the variables will be filled using data on this new line. The SAS data step parser will
report the following information in the SAS LOG:
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
This may not be what you have intended. Consider the example below where there are
several missing data values scattered throughout the input data set. Note that the data
values become mismatched with the variables to which they should correspond.
Title2 "List Input / Missing Data";
Data One;
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
Datalines;
7 21 74 2 7 3 0 5 3.5
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 23.4 203.00
1 28 75 2 3 1 1 5 13.5 33.2 242.00
;

Proc Print Data=One;
Run;
Data Step Examples

List Input / Missing Data
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL
1 7 21 74 2 7 3 0.0 5.0 3.5 1 9
2 12 18 74 1 0 5 5.4 3.4 83.2 2 15
3 9 13 75 2 2 2 1.0 5.0 23.4 203 1
First of all, the month and day information for observation 2 was used as the weight and
total scale length data for observation 1. Second, observe that the remainder of the data for
observation 2 was discarded. Third, observation 3 has missing data for the area and station
variables, but the sex and age data were used for these variables. Finally, we note that
instead of 6 observations, the final data set has only 3. What has happened is the following.
When list input is used, the input processor simply scans a line of data until it comes to
non-blank information. It will then place this data into the next variable, in order, from
the INPUT statement. It does not matter in which columns the data are placed on the line,
except that their left-to-right order corresponds with the order of the variables in the INPUT
statement. If insufficient values are found to fill the variables, then the processor moves on
to the next line of data. When all variables have been filled, the processor stops reading
in data and the current line of data being read is, by default, discarded. This is why the
remainder of line 2 was not used. You should now be able to duplicate the assignment of
the data to the variables by following the above rules.
How can list input be modified to handle the missing data? If the missing data all occur at
the end of a data line, as might happen with repeated measurements data where dropouts
would have no data after dropping out, then the MISSOVER option of the INFILE statement
can be used. This option specifies that the pointer is NOT to be moved to the next line if
no more values are found on a data line; rather, the remaining variables are to be assigned
the value for missing data (by default, a period for numeric data and a blank for character
data).
Title2 "Missing Data / Missover Option";
Data Trout;
/* Flack Lake trout Catch Data*/
Infile Datalines MISSOVER;
Input Year Age3-Age9;
Datalines;
1975 0 105 674 446 16 2 2
1976 46 422 838 726 70 4 4
1977 3 310 1224 1068 65
1978 14 354 1264 1172 69 0 6
1979 6 429 1222 1067 192
;
Proc Print Data=Trout;
Run;
Data Step Examples

Missing Data / Missover Option
OBS YEAR AGE3 AGE4 AGE5 AGE6 AGE7 AGE8 AGE9
1 1975 0 105 674 446 16 2 2
2 1976 46 422 838 726 70 4 4
3 1977 3 310 1224 1068 65 . .
4 1978 14 354 1264 1172 69 0 6
5 1979 6 429 1222 1067 192 . .
Notice that we did get 5 observations with the correct data values. In this case the missing
values are not truly missing, but rather are zero values. The data step below would change
the missing values to zeros. See the section on looping and arrays on page 26 to see how to
solve this problem more generally and easily.
Title2 "Missing Data / Missover Option";
Data Trout;
Infile Datalines MISSOVER;
Input Year Age3-Age9;
If Age8=. Then Age8=0;
If Age9=. Then Age9=0;
Datalines;
1975 0 105 674 446 16 2 2
1976 46 422 838 726 70 4 4
1977 3 310 1224 1068 65
1978 14 354 1264 1172 69 0 6
1979 6 429 1222 1067 192
;
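As a preview of the array approach mentioned above, the same recoding can be sketched with
an ARRAY statement and a DO loop. This is only an illustration under our own naming (the
array name Ages and the index variable i are assumptions), not the treatment given in the
later section.

```sas
Title2 "Missing Data / Array Sketch";
Data Trout;
   Infile Datalines MISSOVER;
   Input Year Age3-Age9;
   Array Ages{*} Age3-Age9;            /* treat the seven counts as one array */
   Do i = 1 To Dim(Ages);
      If Ages{i} = . Then Ages{i} = 0; /* recode every missing count to zero */
   End;
   Drop i;                             /* the loop index is not needed in the data set */
Datalines;
1975 0 105 674 446 16 2 2
1976 46 422 838 726 70 4 4
1977 3 310 1224 1068 65
1978 14 354 1264 1172 69 0 6
1979 6 429 1222 1067 192
;
Proc Print Data=Trout;
Run;
```

The loop checks all seven age classes, so it does not matter which columns happen to be
missing on a given line.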
Now what if the missing data are interior to other data line values? In order to continue
to use list input, one needs to enter the missing value symbol in the data lines for these
values. This is illustrated below.
Title2 "List Input / Missing Data";
Data One;
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
Datalines;
7 21 74 2 7 3 0 5 3.5 . .
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 . . 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 . 23.4 203.00
1 28 75 2 3 1 1 5 13.5 33.2 242.00
;

Proc Print Data=One;
Run;
Data Step Examples

List Input / Missing Data
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL
1 7 21 74 2 7 3 0 5 3.5 . .
2 1 9 76 2 3 2 0 1 5.3 3.0 0.0
3 12 18 74 . . 1 0 5 5.4 3.4 83.2
4 2 15 76 2 1 1 0 5 6.0 5.7 111.0
5 9 13 75 2 2 2 1 5 . 23.4 203.0
6 1 28 75 2 3 1 1 5 13.5 33.2 242.0
2.2.2 Column Input

When the values for variables are placed in specific columns, as can be produced from
a printed spreadsheet or data base program, then column input can be very valuable. In
addition, since it only looks for data in the specified columns, it is easy to skip over unwanted
information and to handle missing data. Consider the data set above with the internal
missing values, then read the data using column input. If you are using the SAS Program
Editor window to edit the data and you have "line numbers" turned on, then you can enter
the "cols" line command over any of the line numbers and you will get a reference line
from which to determine the columns.
Title2 "Column Input / Missing Data";
Data One;
Input Mo 2-3 Day 6-7 Yr 10-11 Ar 14 St 18
Sex 22 Age 27 Sn 31 Lt 32-36 Wt 37-42 TSL 43-50;
/* Some comment lines to help us find the columns */
/*       1         2         3         4         5*/
/*345678901234567890123456789012345678901234567890*/
Datalines;
  7  21  74  2   7   3    0   5  3.5
  1   9  76  2   3   2    0   1  5.3   3.0    0.00
 12  18  74          1    0   5  5.4   3.4   83.20
  2  15  76  2   1   1    0   5  6.0   5.7  111.00
  9  13  75  2   2   2    1   5       23.4  203.00
  1  28  75  2   3   1    1   5 13.5  33.2  242.00
;

Proc Print Data=One;
Run;
Data Step Examples

Column Input / Missing Data
OBS MO DAY YR AR ST SEX AGE SN LT WT TSL
1 7 21 74 2 7 3 0 5 3.5 . .
2 1 9 76 2 3 2 0 1 5.3 3.0 0.0
3 12 18 74 . . 1 0 5 5.4 3.4 83.2
4 2 15 76 2 1 1 0 5 6.0 5.7 111.0
5 9 13 75 2 2 2 1 5 . 23.4 203.0
6 1 28 75 2 3 1 1 5 13.5 33.2 242.0
One can also mix the various input methods. Thus, the input line could have been written
as
Input Mo Day Yr Ar 14 St 18 Sex Age Sn Lt 32-36 Wt 37-42 TSL 43-50;
since the date, sex, age, and scale number information were never missing. It is more
common to use either all list or all column input.
2.2.3 Pointer Control and Formatted Input

There are also a number of special pointer controls that move the pointer around in the
input line or change its behavior. Similar to column input, the pointer can be positioned
at a particular column using @n, where n is the column number; list (or formatted) input
can then be used to read the data. One can also use +n to move the pointer forward n columns
along the line and +(-n) to move it backward. When the data for an observation fall
on more than a single physical input line, the pointer can be moved on to additional lines
using the symbol #n, where n is the line number counted from the first line of the current
observation. For character data, the symbol & can be used to tell the pointer that a single
space does not separate variable values for a specific character variable. The flier data set
is modified so that we can illustrate these techniques. The data for a single fish will now be
placed on 2 physical lines for input and the first variable will be the scale reader's name.
Title2 "Pointer Control / Multiple Input Lines";
Data One;
Length Name $13;
Input @1 Name & Mo Day Yr @26 Ar 1. @30 St 1.
Sex Age Sn #2 @1 Lt 5.1 @6 Wt 6.1 @12 TSL 8.2;
/* Some comment lines to help us find the columns */
/*       1         2         3         4         5*/
/*345678901234567890123456789012345678901234567890*/
Datalines;
Bill Smith     7 21 74   2   7 3 0 5
  3.5
Bill Smith     1  9 76   2   3 2 0 1
  5.3   3.0    0.00
John P. Doe   12 18 74         1 0 5
  5.4   3.4   83.20
John P. Doe    2 15 76   2   1 1 0 5
  6.0   5.7  111.00
John P. Doe    9 13 75   2   2 2 1 5
       23.4  203.00
Bill Smith     1 28 75   2   3 1 1 5
 13.5  33.2  242.00
;

Proc Print Data=One;
Run;
Data Step Examples

Pointer Control / Multiple Input Lines
OBS NAME MO DAY YR AR ST SEX AGE SN LT WT TSL
1 Bill Smith 7 21 74 2 7 3 0 5 3.5 . .
2 Bill Smith 1 9 76 2 3 2 0 1 5.3 3.0 0.0
3 John P. Doe 12 18 74 . . 1 0 5 5.4 3.4 83.2
4 John P. Doe 2 15 76 2 1 1 0 5 6.0 5.7 111.0
5 John P. Doe 9 13 75 2 2 2 1 5 . 23.4 203.0
6 Bill Smith 1 28 75 2 3 1 1 5 13.5 33.2 242.0
Note that we had to leave at least 2 spaces separating the reader's name from the month
value since we used the & symbol; otherwise the month value would have been read as part
of the name. Secondly, since we did not specify a character informat, we needed to define
the character variable and its length using the LENGTH statement. Without it, Name would
have to be marked as character with a $ on the INPUT statement and would receive the
default length of 8 characters, truncating the longer names. Using formatted character
input we could have used the INPUT statement
Input @1 Name $Char13. Mo Day Yr @26 Ar 1. @30 St 1.
and have dropped the LENGTH statement from the program. SAS also has input formats,
called "informats," for reading in other types of data such as social security numbers, phone
numbers, and dates and times. We will give some examples of working with SAS dates and
times later. The input formats can be explicitly given on the INPUT statement, or can be
assigned to the variables using the INFORMAT statement.
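The second route can be sketched as follows. This is an illustration under our own
assumptions: the data set name Dates and the variable name Sampled are hypothetical, and the
two data lines are taken from the length and weight values used elsewhere in this chapter.

```sas
Title2 "INFORMAT Statement Sketch";
Data Dates;
   /* The informat is attached to the variable here, so the INPUT */
   /* statement can stay in simple list form.                     */
   Informat Sampled MMDDYY8.;
   Format Sampled DATE9.;      /* the display format need not match the informat */
   Input Sampled Lt Wt;
Datalines;
4/6/97 3.5 1.4
5/12/97 10.5 29.1
;
Proc Print Data=Dates;
Run;
```

With list input, the value is still delimited by blanks; the informat only tells SAS how to
interpret the characters it finds.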
When reading complex data files, controlling the pointer can be very important. Normally
after the input statement has been executed for an observation, the physical input line
is discarded and the next physical input line is moved into the input buffer. Sometimes,
however, you would like to read what is on the physical line at specific points, say looking
for special key words, and then use an input statement that depends upon the key word.
The "trailing" @ and @@ signs can be used to hold the pointer on the current input line.
The single @ sign holds the line until another input statement releases it or until the data
step loop restarts. The double @@ sign holds the current line even after the data step loop
restarts. They are called "trailing" because the symbol is placed at the very end of the
input statement (just before the semicolon).
Example: Multiple observations per input line

In this example more than one observation is placed on a single physical line. To keep the
example simple, we will assume that there are no missing data, or that if missing data occur
they are indicated in the data set by a "." surrounded by spaces. The data set consists of
lengths and weights of flier sunfish.
Title2 "Multiple Observations Per Input Line";
Data One;
Input Lt Wt @@;
Datalines;
3.5 1.4 5.3 3.0 5.4 3.4 6.0 5.7 10.1 23.4
13.5 33.2 8.7 16.8 11.9 44.4 12.0 40.9 16.0 103.4
9.1 17.9 10.5 29.1 17.2 127.1 17.9 132.7
12.1 47.9 17.2 136.4 17.3 138.3 17.5 134.4
13.2 42.2 16.4 110.0 16.7 101.6 15.3 92.9
15.4 76.3 18.0 125.8 18.5 131.6
;
Proc Print Data=One;
Run;
Data Step Examples

Multiple Observations Per Input Line
OBS LT WT
1 3.5 1.4
2 5.3 3.0
3 5.4 3.4
4 6.0 5.7
5 10.1 23.4
6 13.5 33.2
7 8.7 16.8
8 11.9 44.4
9 12.0 40.9
10 16.0 103.4
11 9.1 17.9
12 10.5 29.1
13 17.2 127.1
14 17.9 132.7
15 12.1 47.9
16 17.2 136.4
17 17.3 138.3
18 17.5 134.4
19 13.2 42.2
20 16.4 110.0
21 16.7 101.6
22 15.3 92.9
23 15.4 76.3
24 18.0 125.8
25 18.5 131.6
Example: Conditional input

In this example, we have data on lengths and weights of fish collected on different sampling
dates. To keep the amount of typing to a minimum, the data were coded such that the date
of collection falls on one line by itself and then on separate lines come the length-weight
data pairs measured on that date. Then the next date appears followed by its length-weight
data pairs. Since there need not be the same number of fish measured on each date, we must
test to see whether the input line contains a date or contains length-weight data. Since all
of the date lines contain a "/" while none of the length-weight data lines do, we can test
for the presence of the "/" on the input line using the INDEX() function.
Title2 "Conditional Input";
Data One;
Drop Test;
Retain Date;
Length Test $8;
Input Test @;
If Index(Test,'/') Then /* is a date value */
Input @1 Date MMDDYY8.;
Else /* is a fish LT WT observation */
Do;
Input @1 Lt Wt;
Output;
End;
Datalines;
4/6/97
3.5 1.4
5.3 3.0
5.4 3.4
6.0 5.7
10.1 23.4
13.5 33.2
8.7 16.8
11.9 44.4
12.0 40.9
16.0 103.4
9.1 17.9
5/12/97
10.5 29.1
17.2 127.1
17.9 132.7
12.1 47.9
17.2 136.4
17.3 138.3
17.5 134.4
6/4/97
13.2 42.2
16.4 110.0
16.7 101.6
15.3 92.9
15.4 76.3
18.0 125.8
18.5 131.6
;
Proc Print Data=One;
Format Date MMDDYY8.;
Run;
Data Step Examples

Conditional Input
OBS DATE LT WT
1 04/06/97 3.5 1.4
2 04/06/97 5.3 3.0
3 04/06/97 5.4 3.4
4 04/06/97 6.0 5.7
5 04/06/97 10.1 23.4
6 04/06/97 13.5 33.2
7 04/06/97 8.7 16.8
8 04/06/97 11.9 44.4
9 04/06/97 12.0 40.9
10 04/06/97 16.0 103.4
11 04/06/97 9.1 17.9
12 05/12/97 10.5 29.1
13 05/12/97 17.2 127.1
14 05/12/97 17.9 132.7
15 05/12/97 12.1 47.9
16 05/12/97 17.2 136.4
17 05/12/97 17.3 138.3
18 05/12/97 17.5 134.4
19 06/04/97 13.2 42.2
20 06/04/97 16.4 110.0
21 06/04/97 16.7 101.6
22 06/04/97 15.3 92.9
23 06/04/97 15.4 76.3
24 06/04/97 18.0 125.8
25 06/04/97 18.5 131.6
Note that we needed to retain the date variable so that its value would be maintained
through each loop of the data step. The date is only changed when a date value is found on
the input line. Also note that in this example the date format for the DATE variable was
specified in the PROC PRINT section rather than in the data step. If the FORMAT statement
is placed in the data step, then the format will be used by future procedures. If it is placed
in a procedure step, then the format is local to that procedure and will not carry on to
future procedures.
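The difference between the two placements can be sketched as below. The data set name Dates
and the variable name Sampled are hypothetical names of ours. The colon modifier on the
INPUT statement is used so that the informat is applied to a blank-delimited value rather
than to a fixed run of 8 columns.

```sas
Title2 "FORMAT Placement Sketch";
Data Dates;
   Input Sampled : MMDDYY8. Lt Wt;
   Format Sampled MMDDYY8.;      /* stored with the data set */
Datalines;
4/6/97 3.5 1.4
5/12/97 10.5 29.1
;
Proc Print Data=Dates;           /* uses the stored MMDDYY8. format */
Run;
Proc Print Data=Dates;
   Format Sampled DATE9.;        /* applies to this procedure only */
Run;
```

The second listing should show the dates in DATE9. form, while any later procedure without
its own FORMAT statement goes back to the format stored in the data set.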
2.2.4 The PUT Statement

The PUT statement provides the output interface for the SAS data step. It works very
similar to the INPUT statement in that a list of variables and constants can be given, or
pointer control can be used to position the pointer to get data placed into specific positions,
or a combination of them can be used. Additionally, SAS formats (to be discussed later
on page 19) can also be used. This all provides for a very powerful report writing tool. A
simple example will be used to illustrate some of this methodology.
Title2 "Report Writing";
Data One;
Input @1 Name $Char13. Mo Day Yr @26 Ar 1. @30 St 1.
Sex Age Sn #2 @1 Lt 5.1 @6 Wt 6.1 @12 TSL 8.2;
File PRINT; /* Send to the OUTPUT window */
If _N_ = 1 Then Put // @10 "Flier Sunfish Scale Report";
PUT @5 Name $13. @20 Mo 2. '/' Day 2. '/' Yr 2.
@30 "Area=" Ar @41 St= / @5 Sex Age Sn
Lt 10.3 Wt 10.0 TSL 10.2;
Datalines;
Bill Smith 7 21 74 2 7 3 0 5
3.5
Bill Smith 1 9 76 2 3 2 0 1
5.3 3.0 0.00
John P. Doe 12 18 74 1 0 5
5.4 3.4 83.20
John P. Doe 2 15 76 2 1 1 0 5
6.0 5.7 111.00
John P. Doe 9 13 75 2 2 2 1 5
23.4 203.00
Bill Smith 1 28 75 2 3 1 1 5
13.5 33.2 242.00
;
Data Step Examples

Report Writing
Flier Sunfish Scale Report

Bill Smith 7/21/74 Area=2 ST=7
3 0 5 3.500 . .
Bill Smith 1/ 9/76 Area=2 ST=3
2 0 1 5.300 3 0.00
John P. Doe 12/18/74 Area=. ST=.
1 0 5 5.400 3 83.20
John P. Doe 2/15/76 Area=2 ST=1
1 0 5 6.000 6 111.00
John P. Doe 9/13/75 Area=2 ST=2
2 1 5 . 23 203.00
Bill Smith 1/28/75 Area=2 ST=3
1 1 5 13.500 33 242.00
There are several features to notice in this particular example, besides its very unattractive
appearance. Note that string constants can be printed in the report simply by enclosing
the text within quotes. Note also that the output format does not have to be the same
as the input format. Further, the name of a variable along with its value can be obtained
simply by listing the variable's name on the PUT statement followed immediately by the
equals sign. The pointer control symbol / is used to move the pointer to the next line. The
FILE statement can also be used to redirect the output report to a file or device such as a
printer.
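The features just listed can be tried in isolation with a smaller sketch. The names and
measurements below are invented for illustration, and the file name in the comment is a
hypothetical example of redirecting the report.

```sas
Data _Null_;
   File Print;                 /* or, e.g., File "report.txt"; (hypothetical name) */
   Input Name $ Lt Wt;
   If _N_=1 Then
      Put @5 "Name" @16 "Length" @26 "Weight" /;   /* string constants, then a blank line */
   Put @5 Name $8. @16 Lt 6.1 @26 Wt 6.1           /* positioned, formatted output */
     / @5 Name= Lt= Wt=;                           /* named output on the next line */
Datalines;
Bob 3.5 1.4
Erin 5.3 3.0
;
Run;
```

Each observation produces two report lines: one with the values placed at fixed columns and
one in the NAME=value form.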
2.2.5 SAS Formats and Informats

The SAS System provides a large number of built-in formats, as well as informats. The date
and time formats can be especially useful. When date data are input using a date informat,
the date is actually stored internally as a numerical value representing the number of days
since a specific date. We'll see what that date is in a moment. This has several
consequences. First, it becomes easy to sort the data by date without having to worry
about character values such as "01/19/97" being sorted before "05/03/92". Secondly, it is
easy to compute the number of days between any two SAS dates: simply take the difference
between them. Thirdly, the output format can be changed from that used upon input. Let's
look at an example.
Title2 "SAS Dates";
Data One;
Retain Now;
If _N_=1 Then Now=Today(); /* Get the current day */
Input Name $Char10. Birthday mmddyy8.;
If Birthday=. Then Birthday=0;
Bday=Birthday; Sday=Birthday; MDYday=Birthday;
DaysOld=Now-Birthday;
Format Bday WeekDatX29. Sday Date7. Now MDYday mmddyy8.;
Datalines;
Bob 12/08/87
Erica 3/15/92
Sammy 06/7/57
Keith .
;
Proc Sort Data=One;
By Birthday;
Run;
Proc Print Data=One NoObs;
Var Name MDYday Sday Bday Birthday Now DaysOld;
Run;
Data Step Examples

SAS Dates
NAME MDYDAY SDAY BDAY BIRTHDAY NOW DAYSOLD
Sammy 06/07/57 07JUN57 Friday, 7 June 1957 -938 09/16/97 14711
Keith 01/01/60 01JAN60 Friday, 1 January 1960 0 09/16/97 13773
Bob 12/08/87 08DEC87 Tuesday, 8 December 1987 10203 09/16/97 3570
Erica 03/15/92 15MAR92 Sunday, 15 March 1992 11762 09/16/97 2011
Now we discover that SAS dates are relative to January 1, 1960, as this is the date
corresponding to the SAS date of zero. Note that because we did not assign a SAS date format
to the BIRTHDAY variable, the actual SAS date value was printed. The example also
demonstrated the SAS formats WEEKDATXn. and DATEn., where n is the format width. If the
width is not sufficient to write out the entire names, as requested, then abbreviations will
be used where possible. SAS also stores time values (seconds since midnight) and date-time
values (seconds since midnight, January 1, 1960) as numbers. See the BASE SAS documentation
for the corresponding formats. With the DATETIME informat, SAS date-time values are entered
in a data set as DDMMMYY:hh:mm:ss, where DDMMMYY is the date (such as 29SEP97), while hh is
the hour, mm is the minutes, and ss is the seconds in the time. When typed within the data
step, such as in an IF statement, a date is written in the DATE form, enclosed in quotes, and
followed by the letter d, such as "29SEP97"d, while times are followed by the letter t, such
as "8:30"t. Note that hour takes on the values 0 through 23, where 0 is midnight.
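A short sketch of date constants and date arithmetic follows. The variable names D1 and D2
are arbitrary, and four-digit years are used in the constants so that the result does not
depend on how the system interprets two-digit years.

```sas
Data _Null_;
   D1="29SEP1997"d;            /* stored as 13786 days after 01JAN1960 */
   D2="03OCT1997"d;
   Days=D2-D1;                 /* 4 */
   Put D1= D1= mmddyy8. D1= WeekDatX29.;   /* same value, three appearances */
   Put Days=;
   If D1 < "01JAN2000"d Then Put "D1 falls before 1 January 2000";
Run;
```

The first PUT statement writes the same internal value three ways: unformatted, with
MMDDYY8., and with WEEKDATX29.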
Formats can also be constructed using PROC FORMAT. Once constructed, these formats can
be used like any other format. They provide a convenient way to code data very simply for
input, yet produce reports with descriptive labels for the values. We'll create a format
for the SEX variable used in the flier data and use it to write out the labels on the
printout.
Title2 "Proc FORMAT";
Proc Format;
Value Sex 1="Male" 2="Female" 3="Unknown";
Run;
Data One;
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
Format Sex Sex.;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
;
Proc Print;
Run;
Data Step Examples

Proc FORMAT
1 7 21 74 2 7 Unknown 0 5 3.5 1.4 57.00
2 1 9 76 2 3 Female 0 1 5.3 3.0 0.00
3 12 18 74 2 4 Male 0 5 5.4 3.4 83.20
4 2 15 76 2 1 Male 0 5 6.0 5.7 111.00
5 9 13 75 2 2 Female 1 5 10.1 23.4 203.00
2.3 SAS Functions

The SAS System provides a very large number of functions for a wide variety of
computations. Some of the more common functions encountered in basic data analysis
and management are described below. Keep in mind that there are many more functions
and function families than are described here.
2.3.1 Mathematical Functions

ABS(value) returns the absolute value of the numeric argument.
COS(value) returns the cosine of value, where value is in radians.
EXP(value) returns the constant e raised to the power given by value.
INT(value) returns the integer part of a real number.
LOG(value) returns the natural logarithm of value.
LOG10(value) returns the base 10 logarithm of value.
MOD(value,divisor) returns the remainder when value is divided by divisor.
ROUND(value,decimals) returns value rounded off to the nearest multiple of the
decimals value. For example, ROUND(123.456,0.1) returns 123.5 and ROUND(123.456,10.0)
returns 120. If the decimals value is omitted, a value of 1 is assumed.
SIN(value) returns the sine of value, where value is in radians.
SQRT(value) returns the square root of value.
SUM(v1,v2,...,vn) returns the sum of the non-missing values contained in the
argument list.
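A few of these functions can be checked quickly in a data step that reads no data. The
variable names are arbitrary; note in particular how SUM() differs from the + operator
when a value is missing.

```sas
Data _Null_;
   M=.;                        /* a missing value */
   A=Round(123.456,0.1);       /* 123.5 */
   B=Round(123.456,10);        /* 120   */
   C=Round(123.456);           /* 123   */
   D=Int(-3.7);                /* -3: the fractional part is dropped */
   E=Mod(17,5);                /* 2     */
   F=Sum(1,M,3);               /* 4: the missing value is ignored */
   G=1+M+3;                    /* missing: + propagates missing values */
   Put A= B= C= D= E= F= G=;
Run;
```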
2.3.2 Random Number Generators

All of the random number generators require a seed to start them. A seed of zero can be
used to seed the generator with a value derived from the system clock. For a given positive
integer seed, a generator will return exactly the same random number series each time the
program is run. There are other generators available, and through programming and the use
of existing generators, variates from other distributions can be generated.
NORMAL(seed) returns a standard normal random variate. seed is the value which
specifies the position within the pseudorandom number stream from which the variates
are selected.
RANBIN(seed,n,p) returns a binomial random variate from the binomial distribution
with parameters n and p.
RANNOR(seed) is the same as the NORMAL(seed) function.
RANPOI(seed,lambda) returns a Poisson random variate from the Poisson distribution
with parameter lambda.
RANUNI(seed) returns a continuous uniform random variate from the interval (0, 1).
UNIFORM(seed) is the same as RANUNI(seed).
A simple example to generate a standard normal variate Z, and from it a normal variate Y
with mean u and standard deviation s, is given below. Assume u=5 and s=3.
Title2 "Normal Random Variates";
Data One;
Drop I u s;
Retain u 5 s 3;
Do I=1 To 25;
Z=Normal(0); /* Use system clock as seed */
Y=u + s*Z;
Output;
End;
Run;
Proc Print Data=One;
Run;
Proc Means Data=One Mean Std;
Var Z Y;
Run;
Data Step Examples

Normal Random Variates
OBS Z Y
1 2.01531 11.0459
2 0.58587 6.7576
3 0.18383 5.5515
4 -1.08207 1.7538
5 -1.87971 -0.6391
6 -0.87702 2.3689
7 0.63108 6.8932
8 1.53379 9.6014
9 -0.34128 3.9761
10 0.47535 6.4261
11 1.26282 8.7885
12 0.64412 6.9323
13 2.11873 11.3562
14 0.25801 5.7740
15 0.08347 5.2504
16 -1.37542 0.8737
17 -1.00606 1.9818
18 -0.68600 2.9420
19 -0.91837 2.2449
20 -1.28510 1.1447
21 -1.69460 -0.0838
22 -1.37999 0.8600
23 0.35292 6.0588
24 0.76663 7.2999
25 0.00566 5.0170
Data Step Examples

Normal Random Variates
Variable Mean Std Dev
------------------------------------
Z -0.0643208 1.1334234
Y 4.8070375 3.4002701
------------------------------------
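The earlier remark that variates from other distributions can be built from the existing
generators can be sketched as follows. This is only an illustration: the data set name TWO,
the seed 12345, the interval (10, 20), and the exponential mean of 2 are all assumed values.
The exponential variate uses the inverse transform method, which is one common approach
rather than anything specific to this workshop.

```sas
Data Two;
   Drop I Seed;
   Retain Seed 12345;                 /* fixed positive seed: reruns repeat the series */
   Do I=1 To 1000;
      U=Ranuni(Seed);                 /* uniform on (0,1) */
      V=10 + (20-10)*Ranuni(Seed);    /* uniform on (10,20) */
      X=-2*Log(Ranuni(Seed));         /* exponential with mean 2 */
      Output;
   End;
Run;
Proc Means Data=Two Mean Std Min Max;
   Var U V X;
Run;
```

With 1000 variates the sample means should fall near 0.5, 15, and 2, although the exact
values depend on the seed.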
2.3.3 String Functions

String functions can be very useful for processing complex data sets and for subsetting
data sets according to values contained within strings. Some commonly used string functions
include:
COMPRESS(string) returns a new string that has its blanks removed; the result is then
padded with blanks at the end of the string.
INDEX(string,value) returns the position in the string where the first character of
value begins within the string. If value is not contained within the string, then the
function returns zero. Thus, it is very useful for checking for the presence or absence
of a certain value in a string variable, such as testing that a string variable contains
the last name of a particular person.
LEFT(string) will left align a character string by moving any leading blanks to the
end of the string. Note that the string's length is not changed by this function.
LENGTH(string) returns the "length" of a string. Here, the "length" is defined to
be the position of the right-most non-blank character in the string, rather than the
number of characters reserved for storage of the string.
LOWCASE(string) converts all uppercase characters to lowercase characters in string.
RIGHT(string) will right align a character string by removing trailing blanks from
the end of the string and inserting them at the beginning of the string. Thus, it does
not change the length of the string.
SCAN(string,n,delimiters) returns the nth word from the character string string,
where words are delimited by the characters in delimiters. If delimiters is omitted
from the function, then blanks and most punctuation and special characters are used
as the delimiters. Consult the SAS help or BASE SAS documentation for the complete list.
If there are fewer words in the string than given by n, then a blank character string is
returned.
SUBSTR(string,start,n) returns a substring or part of string beginning with the
character at the position start in the string and continuing for n characters. If n is
omitted, then the remainder of the string is extracted.
SUBSTR(string1,start)=string2 replaces the characters in string1 beginning at
position start in string1 with string2.
TRIM(string) returns a new string whose trailing blanks have been removed and
whose length corresponds with the position of the last non-blank character in string.
A blank string, however, is returned as a string with one blank character.
TRIMN(string) is like TRIM(string), but a blank string is returned as a null string
(length of zero).
UPCASE(string) returns a new string with any lowercase characters replaced with their
uppercase counterparts.
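Several of these functions are applied below to a single name. The variable names are
arbitrary and chosen only for this sketch.

```sas
Data _Null_;
   Length Name $13 Last $13;
   Name="John P. Doe";
   Pos=Index(Name,"Doe");            /* 9 */
   Len=Length(Name);                 /* 11, although 13 characters are stored */
   Last=Upcase(Scan(Name,3," "));    /* DOE */
   First4=Substr(Name,1,4);          /* John */
   NoBlank=Compress(Name);           /* JohnP.Doe */
   Substr(Name,6)="Q. Doe";          /* Name is now John Q. Doe */
   Put Pos= Len= Last= First4= NoBlank= Name=;
Run;
```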
2.3.4 Date and Time Functions

DATE() with no argument returns the current system date as a SAS date value.
DATETIME() with no argument returns the current system date and time as a SAS
date-time value.
DHMS(date,hour,minute,second) returns a SAS date-time value by combining a SAS
date value with the hour, minute, and second values.
MDY(month,day,year) returns a SAS date from month, day, and year values.
TIME() with no argument returns the current system time as a SAS time value.
TODAY() is the same as DATE().
WEEKDAY(date) returns an integer from 1 to 7 (1=Sunday, 7=Saturday) corresponding
to the day of the week for the SAS date date.
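As a quick check of these functions, the sketch below builds a date and a date-time value
for the first morning of this workshop. The variable names are arbitrary.

```sas
Data _Null_;
   D=MDY(9,29,1997);           /* 13786, i.e., 29 September 1997 */
   W=Weekday(D);               /* 2, a Monday */
   DT=DHMS(D,8,30,0);          /* 8:30 a.m. on that date, stored in seconds */
   Now=Datetime();             /* changes every time the step runs */
   Put D= mmddyy8. W= DT= Datetime16. Now= Datetime16.;
Run;
```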
2.3.5 PUT and INPUT Functions

The PUT(value,format.) and INPUT(value,informat.) functions permit input/output
(I/O) to variables rather than to some I/O device. The value can be a variable or a
constant value, and the format or informat should conform to the value's type.
The example below uses the date function MDY(), the string functions COMPRESS(), SCAN(),
and UPCASE(), the I/O function PUT(), and the numeric function LOG(). It also uses the
string operator || that concatenates or joins two strings together.
Title2 "SAS Functions";
Data One;
Length Name $13 Chardate $8;
Input @1 Name & Mo Day Yr @26 Ar 1. @30 St 1.
SASdate=MDY(Mo,Day,Yr);
Chardate=Compress(Put(Mo,Z2.)||"/"||Put(Day,Z2.)||"/"||Put(Yr,Z2.));
SASdate2=Input(Chardate,mmddyy8.);
If Wt NE . Then LnWt=Log(Wt);
Lastname=Upcase(Scan(Name,3," "));
If Lastname=" " Then Lastname=Upcase(Scan(Name,2," "));
Format SASdate SASdate2 mmddyy8.;
Datalines;
Bill Smith 7 21 74 2 7 3 0 5
3.5
Bill Smith 1 9 76 2 3 2 0 1
5.3 3.0 0.00
John P. Doe 12 18 74 1 0 5
5.4 3.4 83.20
John P. Doe 2 15 76 2 1 1 0 5
6.0 5.7 111.00
John P. Doe 9 13 75 2 2 2 1 5
23.4 203.00
Bill Smith 1 28 75 2 3 1 1 5
13.5 33.2 242.00
;
Proc Print;
Var Name Lastname Mo Day Yr SASdate Chardate SASdate2 Wt;
Run;
Data Step Examples

SAS Functions
OBS NAME LASTNAME MO DAY YR SASDATE CHARDATE SASDATE2 WT
1 Bill Smith SMITH 7 21 74 07/21/74 07/21/74 07/21/74 .
2 Bill Smith SMITH 1 9 76 01/09/76 01/09/76 01/09/76 3.0
3 John P. Doe DOE 12 18 74 12/18/74 12/18/74 12/18/74 3.4
4 John P. Doe DOE 2 15 76 02/15/76 02/15/76 02/15/76 5.7
5 John P. Doe DOE 9 13 75 09/13/75 09/13/75 09/13/75 23.4
6 Bill Smith SMITH 1 28 75 01/28/75 01/28/75 01/28/75 33.2
Note the use of the Zn. format to supply the leading zeros needed to mimic the look of
the MMDDYYn. format. Recognize that time calculations and ordering (sorting) can be made
with the SASDATE and SASDATE2 variables, while the CHARDATE variable is simply a character
representation of the date. If we had wanted to "flag" all observations with a date prior to
"04JUL75"d, then it is easy using the SAS date variables,
If SASDATE < "04JUL75"d Then Flag="PRIOR";
Else Flag="AFTER";
while much more programming is needed if we use the character variable. Basically, we
would have to create our own "SAS date" representation of the character variable for such
comparisons.
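One way to build that representation is the INPUT() function, as sketched below. The data
set name FLAGGED and the four dates are made up for illustration; a four-digit year is used
in the date constant so that only the two-digit years in the character data depend on the
system's year cutoff setting.

```sas
Data Flagged;
   Length Chardate $8 Flag $5;
   Input Chardate $;
   /* Convert the character date to a SAS date before comparing */
   If Input(Chardate,mmddyy8.) < "04JUL1975"d Then Flag="PRIOR";
   Else Flag="AFTER";
Datalines;
07/21/74
01/09/76
09/13/75
01/28/75
;
Proc Print Data=Flagged;
Run;
```

Comparing the character values directly would not work: as text, "01/09/76" sorts before
"07/04/75" even though the date it represents is later.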
2.4 Looping and Arrays

In a number of circumstances the same task needs to be performed multiple times over a
set of observations, variables, or times. The implicit loop of the data step has already been
seen to provide a loop over the observations in an input data set. SAS arrays can be used
to facilitate looping over a set of variables. The ARRAY statement lists the names of
variables to be treated as a set so that they can be referenced through an index into the
array. The syntax of the ARRAY statement is
ARRAY arrayname{number|* <,number,...>} list_of_variables;
where arrayname is the name for the array and {number|*} is either the number of variables
in that dimension of the array or is "*", indicating that all variables listed are to be used
in a one-dimensional array. Using the Flier sunfish data, let's assume that we wished to
create a set of new variables that were the logarithmic transformation of a set of the original
variables. Rather than write a number of assignment statements, we can use arrays, a single
assignment statement, and a loop to complete the task. The following SAS code illustrates
the brute force method that we wish to avoid.
Title2 "Brute Force Approach to Repetitive Task";
Data One;
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
If TSL=0 Then DELETE;
LnLt=Log(Lt); LnWt=Log(Wt); LnTSL=Log(TSL);
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
;
Now let's re-write this program using arrays and the basic \do" loop.
Title2 "Arrays and Do Loop";
Data One(DROP=I);
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
If TSL=0 Then DELETE;
Array RawVars{3} Lt Wt TSL;
Array NewVars{3} LnLt LnWt LnTSL;
Do I=1 To 3;
NewVars(I)=Log(RawVars(I));
End;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
;
The data sets produced by each of these programs are the same. However, imagine that
instead of 3 variables that needed to be transformed, there were many more. Multidimen-
sional arrays are also allowed simply by specifying multiple subscript sizes.
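A two-dimensional array might be sketched as follows. The array name X, its variables
X1-X6, and the initial values are invented for illustration; the point is only that the
listed variables fill the array one row at a time.

```sas
Data _Null_;
   Array X{2,3} X1-X6 (1 2 3 4 5 6);   /* row 1 is X1-X3, row 2 is X4-X6 */
   Do R=1 To 2;
      Do C=1 To 3;
         Value=X{R,C};
         Put R= C= Value=;             /* e.g., R=2 C=1 Value=4 */
      End;
   End;
Run;
```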
2.4.1 Univariate and Multivariate Data Views

In many instances measurements are made at the same location or on the same individuals
through time. These repeated measures data can be viewed and analyzed using both
univariate and multivariate approaches. In the multivariate approach, each measurement made
on the same individual is treated as a different variable, while in the univariate approach,
each measurement is treated as a separate observation made on the same individual. In the
latter case, one variable is used to identify the individual while another is used to hold the
value of the measurement. Below we will treat the Flack lake trout data in both
multivariate and univariate formats. This data set has catch data reported on trout aged 3
through 9 years old for the years 1968 through 1979. First consider the multivariate view of
the data where each catch number for each age is kept in separate variables.
Title2 "Multivariate Data View";
Data TroutM;
/* Flack Lake Trout Catch Data*/
Input Year Age3-Age9;
Datalines;
1968 13 129 646 954 99 19 4
1969 19 169 416 1031 243 47 18
1970 40 354 606 479 152 18 7
1971 32 606 1424 644 157 23 17
1972 0 226 1178 1156 116 16 5
1973 2 165 593 982 428 22 11
1974 53 209 560 410 30 0 4
1975 0 105 674 446 16 2 2
1976 46 422 838 726 70 4 4
1977 3 310 1224 1068 65 0 0
1978 14 354 1264 1172 69 0 6
1979 6 429 1222 1067 192 0 0
;
Proc Print Data=TroutM;
Run;
Data Step Examples
Multivariate Data View
OBS YEAR AGE3 AGE4 AGE5 AGE6 AGE7 AGE8 AGE9
1 1968 13 129 646 954 99 19 4
2 1969 19 169 416 1031 243 47 18
3 1970 40 354 606 479 152 18 7
4 1971 32 606 1424 644 157 23 17
5 1972 0 226 1178 1156 116 16 5
6 1973 2 165 593 982 428 22 11
7 1974 53 209 560 410 30 0 4
8 1975 0 105 674 446 16 2 2
9 1976 46 422 838 726 70 4 4
10 1977 3 310 1224 1068 65 0 0
11 1978 14 354 1264 1172 69 0 6
12 1979 6 429 1222 1067 192 0 0
Now, let's rearrange the data into the univariate view, where one variable will contain the
age of the catch and another will contain the number caught of that age. We will start with
the raw data and use a loop to read in the data.
Title2 "Multivariate To Univariate Data View I";
Data TroutU;
Input Year @;
Do Age=3 To 9;
Input Number @;
Output;
End;
Datalines;
1968 13 129 646 954 99 19 4
1969 19 169 416 1031 243 47 18
1970 40 354 606 479 152 18 7
1971 32 606 1424 644 157 23 17
1972 0 226 1178 1156 116 16 5
1973 2 165 593 982 428 22 11
1974 53 209 560 410 30 0 4
1975 0 105 674 446 16 2 2
1976 46 422 838 726 70 4 4
1977 3 310 1224 1068 65 0 0
1978 14 354 1264 1172 69 0 6
1979 6 429 1222 1067 192 0 0
;
Proc Print Data=TroutU(Obs=22);
Run;
Data Step Examples

Multivariate To Univariate Data View I
OBS YEAR AGE NUMBER
1 1968 3 13
2 1968 4 129
3 1968 5 646
4 1968 6 954
5 1968 7 99
6 1968 8 19
7 1968 9 4
8 1969 3 19
9 1969 4 169
10 1969 5 416
11 1969 6 1031
12 1969 7 243
13 1969 8 47
14 1969 9 18
15 1970 3 40
16 1970 4 354
17 1970 5 606
18 1970 6 479
19 1970 7 152
20 1970 8 18
21 1970 9 7
22 1971 3 32
Here we only listed the first 22 observations of the data set to illustrate the format of the
univariate view.
Title2 "Multivariate To Univariate Data View II";
Data TroutU;
Drop Age3-Age9;
Array Ages{3:9} Age3-Age9;
Set TroutM;
Do Age=3 To 9;
Number=Ages(Age);
Output;
End;
Run;
Proc Print Data=TroutU;
Run;
This data step will produce the same data set as the previous one. In this instance
we are accessing a SAS data set that already has the data in the multivariate view. Notice
that the ARRAY statement specifies the beginning and ending value of the array index (3
and 9). Notice also that the DROP statement is needed to remove the "old" variables AGE3
through AGE9 from the new data set.
PROC TRANSPOSE
Before leaving this section it is worth looking at a procedure developed to convert between
the univariate and multivariate data views. Although not appropriate for all problems, it
can be very useful for many. PROC TRANSPOSE takes an input data set, and based upon
some structure commands, creates a new data set with a different configuration. For this
example we will again use the lake trout data and we will input it into the multivariate
view. Then PROC TRANSPOSE will be called to convert it to the univariate view. The DATA=
and OUT= options on the PROC TRANSPOSE statement specify the input and output SAS data
sets, respectively. The VAR statement lists the variables that will be transposed. The BY
statement instructs PROC TRANSPOSE to treat each year separately, i.e., we want to transpose
all of the values of the specified variables before moving on to the next year.
Title2 "PROC TRANSPOSE";
Data TroutM;
/* Flack Lake Trout Catch Data*/
Input Year Age3-Age9;
Datalines;
1968 13 129 646 954 99 19 4
1969 19 169 416 1031 243 47 18
1970 40 354 606 479 152 18 7
1971 32 606 1424 644 157 23 17
1972 0 226 1178 1156 116 16 5
1973 2 165 593 982 428 22 11
1974 53 209 560 410 30 0 4
1975 0 105 674 446 16 2 2
1976 46 422 838 726 70 4 4
1977 3 310 1224 1068 65 0 0
1978 14 354 1264 1172 69 0 6
1979 6 429 1222 1067 192 0 0
;
Proc Transpose Data=TroutM Out=TroutU;
By Year Notsorted;
Var Age3-Age9;
Run;
Proc Print Data=TroutU(Obs=14);
Run;
Data Step Examples

PROC TRANSPOSE
OBS YEAR _NAME_ COL1
1 1968 AGE3 13
2 1968 AGE4 129
3 1968 AGE5 646
4 1968 AGE6 954
5 1968 AGE7 99
6 1968 AGE8 19
7 1968 AGE9 4
8 1969 AGE3 19
9 1969 AGE4 169
10 1969 AGE5 416
11 1969 AGE6 1031
12 1969 AGE7 243
13 1969 AGE8 47
14 1969 AGE9 18
Only the first 14 observations are listed. The _NAME_ variable contains the names of the
original variables, while the COL1 variable contains the values that those variables held.
There are some options on the PROC TRANSPOSE statement that can be used to make
the variable names more attractive. However, we will use the data step below to make the
data set look like one that we might have created from scratch if our original intent was
to have a univariate view. This will make the reverse transpose that follows look more
realistic. Note the use of the INPUT() and SUBSTR() functions to convert values such as
AGE4 into a numeric value, here 4. We then drop the _NAME_ variable, as it is no longer
needed, and rename the COL1 variable to NUMBER.
/* Make the data set look more like one that would
have come from reading the data directly into
the univariate view. This makes the example a
little more realistic. */
Data TroutU;
Set TroutU;
Age=Input(Substr(_NAME_,4),2.);
Rename Col1=Number;
Drop _NAME_;
Run;
Proc Print Data=TroutU(Obs=14);
Run;
Data Step Examples

PROC TRANSPOSE
OBS YEAR NUMBER AGE
1 1968 13 3
2 1968 129 4
3 1968 646 5
4 1968 954 6
5 1968 99 7
6 1968 19 8
7 1968 4 9
8 1969 19 3
9 1969 169 4
10 1969 416 5
11 1969 1031 6
12 1969 243 7
13 1969 47 8
14 1969 18 9
Again, only the first 14 observations are listed here. In transposing the data set back, we
would like to have the same variable names as before. This would have been easy had we
not adjusted the data set as above. However, that is not particularly realistic, since you
would not likely transpose a data set and then immediately reverse the transpose.
Since variable names cannot be numbers, we need a way to handle the numeric
values for AGE. PROC TRANSPOSE will use formats when available and this will be our way
out. For a very large number of levels of AGE, it might be more productive to construct a
character variable (like _NAME_) containing the names of the variables that we wish to
create. The ID statement gives the variable containing the names of the new variables. Since
we used a FORMAT statement for AGE, the formatted values will be used.
Proc Format;
Value Ages 0="AGE0" 1="AGE1" 2="AGE2" 3="AGE3" 4="AGE4"
5="AGE5" 6="AGE6" 7="AGE7" 8="AGE8" 9="AGE9";
Proc Transpose Data=TroutU Out=TroutM;
By Year Notsorted;
Var Number;
Id Age;
Format Age Ages.;
Run;
Proc Print Data=TroutM;
Run;
Data Step Examples

PROC TRANSPOSE
OBS YEAR _NAME_ AGE3 AGE4 AGE5 AGE6 AGE7 AGE8 AGE9
1 1968 NUMBER 13 129 646 954 99 19 4
2 1969 NUMBER 19 169 416 1031 243 47 18
3 1970 NUMBER 40 354 606 479 152 18 7
4 1971 NUMBER 32 606 1424 644 157 23 17
5 1972 NUMBER 0 226 1178 1156 116 16 5
6 1973 NUMBER 2 165 593 982 428 22 11
7 1974 NUMBER 53 209 560 410 30 0 4
8 1975 NUMBER 0 105 674 446 16 2 2
9 1976 NUMBER 46 422 838 726 70 4 4
10 1977 NUMBER 3 310 1224 1068 65 0 0
11 1978 NUMBER 14 354 1264 1172 69 0 6
12 1979 NUMBER 6 429 1222 1067 192 0 0
With the exception of the _NAME_ variable, the data set looks like the data set we started
with. Thus, it is relatively easy to go either direction with PROC TRANSPOSE when dealing
with univariate and multivariate views.
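The options mentioned earlier for tidying the variable names can be sketched as follows.
This assumes that the multivariate data set TROUTM from above exists; the names TROUTU2,
TROUTM2, and AGENAME are hypothetical and chosen only for this illustration. The NAME=
option names the variable that would otherwise be called _NAME_, and the RENAME= data set
option renames COL1 as the output data set is written.

```sas
/* Multivariate to univariate, with friendlier variable names */
Proc Transpose Data=TroutM
               Out=TroutU2(Rename=(Col1=Number))
               Name=AgeName;
   By Year Notsorted;
   Var Age3-Age9;
Run;

/* And back again: the character variable AgeName supplies the new names */
Proc Transpose Data=TroutU2 Out=TroutM2(Drop=_Name_);
   By Year Notsorted;
   Var Number;
   Id AgeName;
Run;
Proc Print Data=TroutM2;
Run;
```

Because AGENAME already holds character values such as AGE3, no format is needed for the
ID statement in this version.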
2.4.2 Indeterminate DO Loops

There are occasions when a loop is needed, but it is not known in advance how many
iterations will be needed. This is usually determined from the data themselves,
or by some other mechanism that triggers the end of the looping process. The SAS data
step has DO WHILE and DO UNTIL statements to handle indeterminate loops. The DO UNTIL
loop is always executed at least once, because its condition is tested at the bottom of the
loop, while the DO WHILE loop is executed only if the WHILE condition is met. The end of
the scope of the loop is given by an END statement.
As an example, consider a data set that has as physical data lines, a part-time employee's
name and a list of the hours worked by the employee. We would like to create a data set
that gives the employee name and hours such that each hour value is treated as a separate
observation. Since the number of values on each physical line is unknown, we read until
there is nothing left to read on the line. The MISSOVER option is used to keep the pointer
from moving to the next line to read new data.
Title2 "DO UNTIL Loop";
Data Pay;
Length Employee $15;
Infile Datalines Missover;
Input Employee & Hours @;
Do Until (Hours=.);
Output;
Input Hours @;
End;
Datalines;
Bob Jones 3.5 8 8 2 3
Erin Walsh 6 7.5 1.5
Tom N. Smith 2 3 3 1 6 7 3
;
Proc Print;
Run;
Data Step Examples

DO UNTIL Loop
OBS EMPLOYEE HOURS
1 Bob Jones 3.5
2 Bob Jones 8.0
3 Bob Jones 8.0
4 Bob Jones 2.0
5 Bob Jones 3.0
6 Erin Walsh 6.0
7 Erin Walsh 7.5
8 Erin Walsh 1.5
9 Tom N. Smith 2.0
10 Tom N. Smith 3.0
11 Tom N. Smith 3.0
12 Tom N. Smith 1.0
13 Tom N. Smith 6.0
14 Tom N. Smith 7.0
15 Tom N. Smith 3.0
The use of the single @ sign keeps the pointer on the same line until all data for an employee
have been read. Then the loop exits and the data step begins its next iteration. Because a
single @ sign was used, the held line is released at that point and a new one is read.
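For comparison, the same task can be sketched with a DO WHILE loop. The data set name PAY2
and the employee with no hours are invented for illustration. Because the condition is
tested before each pass, a line with a name but no hours produces no observation, whereas
the DO UNTIL version would output one observation with a missing value for HOURS.

```sas
Data Pay2;
   Length Employee $15;
   Infile Datalines Missover;
   Input Employee & Hours @;
   Do While (Hours NE .);      /* tested before each pass */
      Output;
      Input Hours @;
   End;
Datalines;
Bob Jones  3.5 8 8
Ann Lee
Erin Walsh  6 7.5
;
Proc Print Data=Pay2;
Run;
```

This sketch yields five observations, three for Bob Jones and two for Erin Walsh. Note the
two blanks that separate each name from the hours, as required by the & modifier.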
2.5 The NULL Data Set

In some problems the power of the SAS data step is needed, but no new or changed SAS
data set will be produced. This happens, for example, when the data step is used to produce
a report, for system management tasks such as part of an interactive program, or to obtain
information to be used by the macro language. The reserved data set name _NULL_ is used
as the name for the "new" data set when one is not wanted. As an example with the flier
data set, let's say that we wanted to list all observations in a data set for which the sex of
the fish is female and that we wanted the average weight of these fish printed at the end of
the report.
Title2 "Null Data Set";
Data _NULL_;
File Print;
Infile Datalines EOF=EOF;
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
If _N_=1 Then Put " DATE WEIGHT";
If Sex=2;
Date=MDY(Mo,Day,Yr);
Put Date MMDDYY8. +2 Wt 8.1;
N+1;
TotWt+Wt;
Return;
Eof:
AveWt=TotWt/N;
Put // "Average Weight =" AveWt;
Return;
Datalines;
7 21 74 2 7 3 0 5 3.5 1.4 57.00
1 9 76 2 3 2 0 1 5.3 3.0 0.00
12 18 74 2 4 1 0 5 5.4 3.4 83.20
2 15 76 2 1 1 0 5 6.0 5.7 111.00
9 13 75 2 2 2 1 5 10.1 23.4 203.00
... lines omitted ...
;
Data Step Examples

Null Data Set
DATE WEIGHT
01/09/76 3.0
09/13/75 23.4
04/18/76 16.8
01/28/75 40.9
04/18/76 17.9
04/18/75 47.9
03/22/75 101.6
04/19/75 76.3
06/24/75 125.8
Average Weight =50.4
This example also demonstrates several other features of the SAS language that have not
yet been discussed. The two lines N+1; and TOTWT+WT; are called sum statements. They take
the value of the expression to the right of the plus sign and add it to the value of the
variable to the left of the plus sign. They are not exactly like, for example, N=N+1; because
when the data step loop begins again, N will lose its value unless it is retained. This does
not happen with variables in sum statements. They are automatically retained.
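The difference can be seen in a small sketch. The data set name COUNTS and the four
weights are made up for illustration.

```sas
Data Counts;
   Input Wt @@;
   N + 1;          /* sum statement: N starts at 0 and is retained */
   M = M + 1;      /* assignment: M is reset to missing on every iteration */
   TotWt + Wt;     /* a missing Wt is ignored rather than propagated */
Datalines;
1.4 3.0 . 5.7
;
Proc Print Data=Counts;
Run;
```

Here N takes the values 1, 2, 3, 4 and TOTWT takes the values 1.4, 4.4, 4.4, 10.1, while M
is missing in every observation.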
Note also the strange IF SEX=2; statement that appears to be missing the THEN clause. This
statement is equivalent to IF SEX NE 2 THEN DELETE;. It is called a subsetting IF
statement.
Lastly there is the data step "subroutine." The data step does not have subroutines that are
separated from the data step; rather, they are contained within it and all variables are global
to the main and subroutine sections. These "subroutines" behave like goto statements,
but control can return to the code immediately following the call. These subroutines are
called from input/output conditions, such as above where the condition is an "End of File"
condition and the INFILE option EOF= is used to point to a subroutine to be executed when the
condition is met, or from LINK and GOTO statements. If the LINK statement is used, control
returns to the spot immediately following the LINK statement, while the GOTO statement
simply redirects program execution through the subroutine and control usually returns
to the top of the data step loop. Note the use of the RETURN statement. The first use
prevents execution from continuing into the subroutine, while the second marks the end of
the subroutine. Had the first RETURN been left out, a "running" average of the fish weights
would have been generated for the females.
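A LINK-style subroutine might look like the following sketch. The data set name LOGS, the
label SAFELOG, and the helper variables VALUE and RESULT are hypothetical names chosen for
this illustration.

```sas
Data Logs;
   Input Lt Wt;
   Value=Lt; Link Safelog; LnLt=Result;   /* control comes back after each LINK */
   Value=Wt; Link Safelog; LnWt=Result;
   Drop Value Result;
   Return;                  /* keeps execution out of the subroutine */
Safelog:
   If Value > 0 Then Result=Log(Value);
   Else Result=.;
   Return;                  /* back to the statement after the LINK */
Datalines;
3.5 1.4
5.3 0
;
Proc Print Data=Logs;
Run;
```

In the second observation LNWT is missing, since the subroutine declines to take the log
of zero.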
2.6 Data Step Examples

2.6.1 Simple Random Sampling Without Replacement
This example will demonstrate how to take a simple random sample (SRS) without
replacement from a frame or list of the population under study. Here, we will treat our flier
data set as a population and take a simple random sample from it. For real problems, the frame
might be a list of names and addresses of persons on a license sales list. Those selected
from the list would then be sent a questionnaire. The example demonstrates several features
covered in this chapter, as well as showing some very simple macro language features.
Title2 "Simple Random Sample";
Data FLIER;
Datalines;
... data go here ...
;
%Let Size=12; /* Define The Sample Size */

Data SRS;
Drop Seed K NN Prob;
Retain Seed 123459; /* Random Number Seed */
Retain K &Size; /* Sample Size */
NN=N; /* Population Size */
Call Symput("PopSize",Trim(Left(Put(NN,20.))));
I=0;
Do Until ((I>=N) or (K=0));
I+1;
Set FLIER Point=I NObs=N;
Prob=K/NN;
If Ranuni(Seed) <= Prob Then /* Select This One */
Do;
Output;
K=K-1; /* We Need One Less In The Sample */
End;
NN=NN-1; /* We Have One Less To Choose From */
End;
Stop; /* We Are Done */
Run;
Title3 "Sample of Size &Size From a Population of Size &Popsize";

Proc Print Data=SRS;
Run;
Data Step Examples

Simple Random Sample
Sample of Size 12 From a Population of Size 25
1 7 21 74 2 7 3 0 5 3.5 1.4 57.00
2 1 9 76 2 3 2 0 1 5.3 3.0 0.00
3 2 15 76 2 1 1 0 5 6.0 5.7 111.00
4 9 13 75 2 2 2 1 5 10.1 23.4 203.00
5 1 28 75 2 3 1 1 5 13.5 33.2 242.00
6 4 18 76 3 3 1 2 5 11.9 44.4 244.00
7 1 28 75 2 3 1 2 5 16.0 103.4 307.80
8 4 18 76 3 2 2 3 5 9.1 17.9 171.40
9 2 23 75 2 3 1 3 5 17.2 127.1 346.00
10 4 18 75 2 1 2 4 5 12.1 47.9 266.60
11 2 8 75 2 3 1 4 4 17.5 134.4 357.75
12 1 28 75 2 3 1 6 1 18.5 131.6 409.00
This particular method uses unequal probability methods in the selection of each element
into the sample, but because each sample of size 12 has the same probability of selection as
every other sample of size 12, the sample is a simple random sample. Since we do not know
when the 12 elements will be selected (it could be the first 12 or the last 12 observations),
we use an indeterminate loop, the DO UNTIL loop. Note also that the data step loop is
executed only once here. Otherwise, multiple samples of size 12 would have been taken.
The STOP statement is used to keep the data step from looping again by simply stopping
execution of the data step.
2.6.2 Data Recoding

Reversing a Likert scale
A Likert scale is often used in questionnaires to measure a degree of belief or agreement
with an idea or statement. If one wanted to measure the degree of satisfaction with a fishing
trip, a questionnaire might ask
Would you consider this to be the best fishing trip that you have had in the last
5 years?
1=Strongly Disagree
2=Disagree Somewhat
3=Neutral
4=Agree Somewhat
5=Strongly Agree
A typical way of analyzing a collection of such types of questions asked of the same
individuals is to create a "scale", which is often the simple sum of the scores (1-5) over each
question. However, in many questionnaires, some questions may be worded in such a way
that a 5 means "positive" or "agree", while for others, a 5 means "negative" or "disagree".
Thus, some questions may need the Likert scale reversed. This is very easily accomplished
using a mathematical transformation. Let's make the example more interesting by stating
that variables Q1, Q20, Q32, and Q184 need reverse coding.
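Data One;
  Input Subject Q1-Q200;
  /* The array is named Rev so it does not clash with the REVERSE function */
  Array Rev {*} Q1 Q20 Q32 Q184;   /* Items needing reverse coding */
  Do I=1 To Dim(Rev);
    Rev{I}=6-Rev{I};               /* 1<->5, 2<->4, 3 unchanged */
  End;
  Drop I;
  Datalines;
data follow here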
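The simple-sum "scale" described above can be sketched in the same way. The block below is only a minimal illustration under stated assumptions: the four-item questionnaire, the choice of Q2 as the reverse-worded item, the two data lines, and the names Scale and Score are all hypothetical.

```sas
/* Hypothetical 4-item questionnaire; Q2 is the reverse-worded item */
Data Scale;
  Input Subject Q1-Q4;
  Q2=6-Q2;                 /* Reverse the Likert coding of Q2 */
  Score=Sum(Of Q1-Q4);     /* Simple-sum scale over the four items */
  Datalines;
1 5 1 4 5
2 2 4 3 1
;
Run;
Proc Print Data=Scale;
Run;
```

Note that the SUM function skips missing responses, whereas the expression Q1+Q2+Q3+Q4 would return a missing score if any item were missing. Which behavior is wanted depends on how the scale is to be interpreted.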
Using formats
For some problems the data recoding can be particularly complicated. Either a simple
mathematical transformation is not possible (as could be done above), or one may need
to convert between numeric and character data. IF-THEN-ELSE statements or a SELECT
group can be used for these purposes, but often require a lot of programming. Often
the PUT() and INPUT() functions can be used to simplify this process. Assume that
we have a data set on vegetation collected from transects run across a marsh in which the
species of plant and coverage along a 5m stick are recorded. To facilitate data entry, only
a 2-character abbreviation for the species is input. However, the scientist would like the
complete name in the data set. The example below illustrates one approach to accomplishing
this.
Title2 "Data Recoding";
Proc Format;
Value $Veg "wg"="Wire Grass" "sp"="Spartina patens"
"br"="Bull Rush" "wi"="Widgeon Grass";
Run;
Data One;
Input Sp $ Coverage @@;
Length Species $20.;
Species=Put(Sp,$Veg.);
Drop Sp;
Datalines;
wi 0.4 wi 0.6 wg 1.2 br 0.2 sp 4.8
;
Run;
Data Step Examples

Data Recoding
OBS COVERAGE SPECIES
1 0.4 Widgeon Grass

2 0.6 Widgeon Grass
3 1.2 Wire Grass
4 0.2 Bull Rush
5 4.8 Spartina patens
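The example above uses PUT() with a format. The reverse direction, from character codes to numbers, can be sketched with INPUT() and a user-written informat. This is only an illustration: the abundance classes, the informat name Abund, and the data set and variable names are hypothetical and are not part of the vegetation data.

```sas
Proc Format;
  /* Numeric informat: maps a character class to a number */
  Invalue Abund "rare"=1 "common"=2 "abundant"=3 Other=.;
Run;
Data Two;
  Input Sp $ Class :$8.;
  Score=Input(Class,Abund.);   /* Character class -> numeric score */
  Datalines;
wi rare
wg common
br abundant
;
Run;
```

The matching is case sensitive, so a value such as "Rare" would fall into the OTHER category and be set to missing.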
Chapter 3
Working With Files

The SAS System works with data stored in a special format called a SAS data set. Although
it is possible for SAS to work with data in other program formats using SAS/VIEWS and
"data engines", it is much more common to work with data in the SAS data set format.
The SAS data set is specially formatted to permit the SAS System to quickly and easily
work with the data. For example, SAS procedures can determine whether the data have
been sorted without first having to read the entire data set.
3.1 External Files

Often the first step in a SAS program is the input of "raw" data into a SAS data set for
analysis. Later in the program, we may wish to create an external file or report for use
elsewhere. For these actions we must learn about some of the interfaces that the SAS system
has with external (non-SAS) files.
For input into a data step, external files can typically be referenced using several different
methods, some of which depend upon the platform. On the WIN95 and Unix platforms,
the filename for the "raw" data can simply be placed in the INFILE statement. Assume
that the file of interest is named raw.dat and is stored in the subdirectory fish. On the
WIN95 platform we could use the statement
Infile "c:\fish\raw.dat";
while on the Unix platform it might be referenced as

Infile "fish/raw.dat";
where the fish subdirectory is relative to our current working directory on Unix.
Alternatively, the FILENAME statement can be used to link a file reference to the actual file. This
can make programs much easier to use on multiple platforms and to update and modify
later. The basic statement looks like
FILENAME fileref "path-and-file-name";
Using the WIN95 file given earlier, a skeleton program would look like
FILENAME fish "c:\fish\raw.dat";
Data one;
Infile fish;
Input ....;
Run;
Notice that the INFILE statement is still needed, but it specifies the file reference rather
than the actual file name. The FILENAME statement can also be used to reference more than
one file in a subdirectory. Study the following example.
FILENAME fish "c:\fish"; /* specify the subdirectory only */
Data Halibut;
Infile Fish(Halibut.dat);
Input ...;
Run;
Data Coho;
Infile Fish(Coho.dat);
Input ...;
Run;
The SAS system assumes that data files will have the extension .DAT, and so the extension
can be dropped from the INFILE statement. Thus the code could have been written,
FILENAME fish "c:\fish"; /* specify the subdirectory only */
Data Halibut;
Infile Fish(Halibut);
Input ...;
Run;
Data Coho;
Infile Fish(Coho);
Input ...;
Run;
An example of these methods is shown for the flier data set. The original data set was also
broken into 2 pieces called FLIERS1.DAT and FLIERS2.DAT, and the variable headings were
removed from each of these two data sets. To better illustrate the results, the SAS log for
the code is shown. The first method uses only the INFILE statement.
238 Title2 "External File Reference With Infile";
239 Data Flier;
240 Infile "c:\projects\alaska97\flier.dat" Firstobs=2;
241 Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
242 Run;
NOTE: The infile "c:\projects\alaska97\flier.dat" is:

FILENAME=c:\projects\alaska97\flier.dat,
RECFM=V,LRECL=256
NOTE: 664 records were read from the infile "c:\projects\alaska97\flier.dat".

The minimum record length was 102.
The maximum record length was 102.
NOTE: The data set WORK.FLIER has 664 observations and 11 variables.
NOTE: The DATA statement used 0.77 seconds.
243
244 Proc Print Data=Flier(Obs=10);
245 Run;
NOTE: The PROCEDURE PRINT used 0.11 seconds.

Next, the file reference is moved to the FILENAME statement.
247 Title2 "FILENAME File Reference";
248 Filename Fish "c:\projects\alaska97\flier.dat";
249 Data Flier;
250 Infile Fish Firstobs=2;
252 Run;
NOTE: The infile FISH is:

FILENAME=c:\projects\alaska97\flier.dat,
RECFM=V,LRECL=256
NOTE: 664 records were read from the infile FISH.

The FILENAME statement can also be used to refer to directories. When directories are
used, the INFILE statement is then used to select the member to process. This method also
permits several directories to be concatenated together to search for files, and the same file
reference can be used for multiple files, and for reading and writing.
254 Title2 "FILENAME Directory Reference";
255 Filename Fish "c:\projects\alaska97";
256 Data Flier;
257 Infile Fish(Flier.dat) Firstobs=2;
259 Run;
NOTE: The infile library FISH is:

DIRECTORY=c:\projects\alaska97
NOTE: The infile FISH(Flier.dat) is:

DIRECTORY=c:\projects\alaska97,
MEMBERNAME=c:\projects\alaska97\Flier.dat,
RECFM=V,LRECL=256
NOTE: A total of 664 records were read from the infile library FISH.
NOTE: 664 records were read from the infile FISH(Flier.dat).
Since the SAS system assumes that data files end with .DAT, the suffix can be dropped
from the member name. However, for names that do not have a suffix, the name should be
enclosed within quotes.
261 Data Flier;
262 Infile Fish(Flier) Firstobs=2;
264 Run;

NOTE: The infile FISH(Flier) is:

MEMBERNAME=c:\projects\alaska97\Flier.DAT,
RECFM=V,LRECL=256
NOTE: 664 records were read from the infile FISH(Flier).
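The concatenation of directories mentioned earlier can be sketched as follows. This is a hedged illustration only: the second directory, c:\projects\alaska96, is hypothetical, and the sketch assumes that the listed directories are searched in order for the requested member.

```sas
/* The second directory is hypothetical and is used only for illustration */
Filename Fish ("c:\projects\alaska97","c:\projects\alaska96");
Data Flier;
  Infile Fish(Flier) Firstobs=2;   /* Member is looked up in the listed directories */
  Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL;
Run;
```

The SAS log reports the full physical name of the member that was actually opened, which is a convenient check on which directory supplied the file.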
Sometimes the data that we would like to work with exist in more than one physical raw
file. For example, the catch data might be kept in separate spreadsheets for each harvest
year. To analyze the data together, the data sets must be concatenated. Below, the two
flier data sets (the original data set split into 2 pieces) will be input and concatenated using
3 different data steps. Note that the file reference from above is being reused.
266 Title2 "Data Set Concatenation of Files";
267 Data Flier1;
268 Infile Fish(Fliers1);
270 Run;

NOTE: The infile FISH(Fliers1) is:

MEMBERNAME=c:\projects\alaska97\Fliers1.DAT,
RECFM=V,LRECL=256
NOTE: 298 records were read from the infile FISH(Fliers1).
NOTE: The data set WORK.FLIER1 has 298 observations and 11 variables.
271 Data Flier2;

272 Infile Fish(Fliers2);
274 Run;

NOTE: The infile FISH(Fliers2) is:

MEMBERNAME=c:\projects\alaska97\Fliers2.DAT,
RECFM=V,LRECL=256
NOTE: 366 records were read from the infile FISH(Fliers2).
275 Data Flier;

276 Set Flier1 Flier2;
277 Run;
As the FILENAME statement can concatenate directories together, it can also concatenate
physical files together. The method is to put the list of filenames within parentheses, each
separated by a comma.
279 Title2 "FILENAME Concatenation of Files";
280 Filename Fish
("c:\projects\alaska97\fliers1.dat","c:\projects\alaska97\fliers2.dat");
281 Data Flier;
282 Infile Fish;
284 Run;

FILENAME=c:\projects\alaska97\fliers1.dat,
RECFM=V,LRECL=256

FILENAME=c:\projects\alaska97\fliers2.dat,
RECFM=V,LRECL=256

File references can also be listed as

66 Title2 "List All Defined File References";
67 Filename _ALL_ List;
NOTE: Fileref= FISH
Physical Name= c:\projects\alaska97\Fliers2.dat
c:\projects\alaska97\Fliers1.dat
NOTE: Fileref= TMP1
Physical Name= C:\PROJECTS\Alaska97\Files.sas
and can be cleared using

69 Title2 "Clear The FISH File Reference";
70 Filename Fish Clear;
NOTE: Fileref FISH has been deassigned.
Clearing a file reference can free up some memory, but it matters more in complicated
programs, where it helps ensure that an important file is not written over, or the wrong file
read as input, because of a programming error.
Files can also be created and appended to using techniques similar to those above for input.
Basically, define a file reference to receive the data and use the FILE statement to direct
output to the reference. The example below first writes the FLIER1 data to an external ASCII
file and then (this could be at some later time, for example) appends the FLIER2 data to
the same external file.
72 Title2 "Creating An External File";
74 Data _NULL_;
75 File Fish(Flier1N2);
76 Set Flier1;
77 Put (Mo Day Yr) (3.) (Ar St Sex Age Sn) (2.) (Lt Wt TSL) (7.3);
78 Run;
NOTE: The file library FISH is:

NOTE: The file FISH(Flier1N2) is:

MEMBERNAME=c:\projects\alaska97\Flier1N2.DAT,
RECFM=V,LRECL=256
NOTE: A total of 298 records were written to the file library FISH.
NOTE: 298 records were written to the file FISH(Flier1N2).
80 Title2 "Appending To An Existing External File";

81 Data _NULL_;
82 File Fish(Flier1N2) Mod;
83 Set Flier2;
84 Put (Mo Day Yr) (3.) (Ar St Sex Age Sn) (2.) (Lt Wt TSL) (7.3);
85 Run;
NOTE: The file library FISH is:

NOTE: The file FISH(Flier1N2) is:

MEMBERNAME=c:\projects\alaska97\Flier1N2.DAT,
RECFM=V,LRECL=256
NOTE: A total of 366 records were written to the file library FISH.
NOTE: 366 records were written to the file FISH(Flier1N2).
A sample of what the output data set looks like is given below.
7 21 74 2 7 3 0 5 3.500 1.400 57.000
1 9 76 2 3 2 0 1 5.300 3.000 0.000
12 18 74 2 4 1 0 5 5.400 3.400 83.200
2 15 76 2 1 1 0 5 6.000 5.700111.000
12 14 75 2 1 1 0 5 6.100 6.600109.000
Notice the use of the grouped formats to associate a single format with several different
variables that appear together. A slightly larger field width for TSL appears to be needed,
since values of 100 or more run into the preceding column. To
append the second data set to the first, it was necessary to specify the MOD option on the
FILE statement.
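One way to give TSL the extra column is to take it out of the last format group and write it with its own, wider format. The sketch below assumes that FISH is still assigned to the directory and that the FLIER1 data set exists, as in the log above; the width of 8 is simply one more than the original 7.

```sas
Data _NULL_;
  File Fish(Flier1N2);     /* Without MOD, this replaces the existing file */
  Set Flier1;
  /* Lt and Wt keep the 7.3 format; TSL gets one extra column */
  Put (Mo Day Yr) (3.) (Ar St Sex Age Sn) (2.) (Lt Wt) (7.3) TSL 8.3;
Run;
```

With this change a value such as 111.000 is written with a leading blank and no longer touches the weight column.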
3.1.1 FTP Access

The FILENAME statement can also be used to access, modify, and create information stored
or to be stored on other computers using TCP/IP network methods. The FTP option can
be used to open an FTP (file transfer protocol) connection with another computer. Data
can then be read from the FTP server or can be written to the FTP server, just as if the file
were stored on the local machine. The syntax for writing the file flier.dat to the Unix1
computer at LSU for user bill and to prompt for the password would be
Title2 "FTP Access To Write A File";
Filename Out FTP "flier.dat" /* Name of data set */
host="unix1.sncc.lsu.edu" /* Host name */
user="bill" /* User login name */
prompt /* Prompt for password */
cd="/u/bill/alaska/stuff" /* Change directory first */
rcmd="ascii" /* Remote command to exec */
recfm=v; /* Use variable record length */
Data _NULL_;
File Out;
Set Flier1;
Put (Mo Day Yr) (3.) (Ar St Sex Age Sn) (2.) (Lt Wt TSL) (7.3);
Run;
This technique can be used not only to input files via FTP, but also to retrieve any
information that can be obtained via FTP. To get a directory listing from the remote machine
above, we might write
Title2 "FTP Access To List A Directory";
Filename Dir FTP " " /* Null data set (required) */
ls /* FTP command */
host="unix1.sncc.lsu.edu" /* Host name */
user="bill" /* User login name */
prompt /* Prompt for password */
cd="/u/bill/alaska/stuff"; /* Change directory first */
Data _NULL_;
File Log;
Infile Dir;
Input;
Put _INFILE_;
Run;
The data step simply reads then writes the directory listing to the SAS Log. Note that the
data could have been placed into a variable or variables, and then processed using the data
step.
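As a hedged sketch of that last remark, the listing could be captured in a data set rather than echoed to the log. The block assumes the DIR file reference defined above is still assigned; the data set name RemoteDir, the variable name Entry, and the width of 200 are hypothetical choices.

```sas
Data RemoteDir;
  Length Entry $200;
  Infile Dir Truncover;      /* Lines shorter than 200 characters are still kept */
  Input Entry $Char200.;     /* One directory line per observation */
Run;
```

Each observation then holds one line of the remote listing, which can be picked apart with the string functions described in Chapter 2.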
3.1.2 WWW Access
Not to be outdone by the FTP access method, the FILENAME statement can also reference
a URL (uniform resource locator), otherwise known as a web page. The page might be a
hypertext document or might be a file being distributed via a WWW server. In the example
below, we will assume that the flier data set is stored at the URL
http://www.stat.lsu.edu/faculty/moser/flier.dat
and we wish to read these data into our program.

Title2 "HTTP Access To Read A File";
Filename Fish URL "http://www.stat.lsu.edu/faculty/moser/flier.dat";
Data Flier;
Infile Fish Firstobs=2;
Run;
An important aspect of FTP and URL access is that data may be stored in some
"distant" location, yet be reachable by network and accessed at run time. This could be
very important for real-time data, such as from data loggers, or when
many people are accessing the data and a single master copy is needed. From a systems
programmer's perspective, it means that the SAS system can be used to develop an interface
to the internet.
3.2 Including External SAS Code

As you will begin to see later, we may wish to reuse the same SAS code in many different
programs. This may be especially true of SAS macros that we may have written to solve
various problems. In other cases, the program that we are working on is very large and
we would like to maintain the program in separate pieces or files. SAS code external to
the currently executing SAS session can be included into the session using the %INCLUDE
statement. The argument to the statement can either be a file name or a file reference.
Notice that if a file reference is used, then all of the powerful methods described above for
reading in data are available. It is assumed that SAS program file names end with the .sas
extension. A couple of examples are given below.
%Include "c:\projects\alaska97\fishdemo.sas" / Nosource2;
Filename Fishdemo URL "http://www.stat.lsu.edu/faculty/moser/fishdemo.sas";

%Include Fishdemo;
The NOSOURCE2 option to the %INCLUDE statement is used to suppress printing the included
code to the SAS log.
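Because a file reference may also point to a directory, a member of that directory can be included in the same way that members were selected on the INFILE statement. The following is a small sketch under stated assumptions: the file reference name Progs is hypothetical, and fishdemo.sas is assumed to sit in the directory shown.

```sas
Filename Progs "c:\projects\alaska97";   /* Directory holding the .sas files */
/* The .sas extension is assumed, so the member is given as fishdemo */
%Include Progs(fishdemo) / Source2;
```

Here SOURCE2 is the counterpart of NOSOURCE2 and asks that the included statements be written to the SAS log, which can help when tracking down an error in included code.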
3.3 The SAS Data Library

SAS data sets are stored in SAS libraries. When the SAS System was initially being
developed on the IBM mainframes, a SAS library was a specially formatted MVS data set
whose members were SAS data sets. Now, a SAS library may be nothing more than a
reference to a subdirectory in a filesystem on a PC or a Unix workstation, and may contain
much more than SAS data sets. The idea behind the library is to organize and reference a
collection of similar objects, such as data sets, in a common way. If I have several different
data sets on fishing effort, say one data set for each year since 1975, I may want to keep
the data in each year separated, but be able to find all of the data very easily and be able
to combine any or all of it very easily as well. On a Windows PC you can identify the SAS
data sets by the suffix "ssd", while on an AIX Unix workstation the files have the suffix
"ssd001". Because they are in a special format, they are not to be read or edited using
non-SAS tools.
3.3.1 The LIBNAME Statement

The LIBNAME statement is used to specify the name and location of a SAS library. It has
the basic format
LIBNAME libname "path";
where "libname" is some 1-8 character name and "path" is the path in the filesystem
where the library is to be kept. Upon startup, the SAS System internally issues a LIBNAME
statement defining a library named WORK which it uses as the default library. By default, at
the end of the SAS session, the WORK library is cleaned of all SAS files. Thus, to keep any
SAS data sets permanently, you should store them in a library that you specify. To do this,
a two-level data set name is used in the SAS program. A two-level name has two names
which are separated by a period. The first name is the library name defined in the LIBNAME
statement, and the second name is the member or data set name.
/* Point to the file containing the raw effort data */
FILENAME raw "c:\temp\effort97.csv";
/* Point to the SAS Library to store the effort data */
LIBNAME effort "c:\fishing\effort";
DATA effort.year1997;
LENGTH location $30;
INFILE raw delimiter=",";
/* The colon modifier makes the date informat stop at the comma */
INPUT date :mmddyy8. location $ effort;
RUN;
Here, the raw data are input into the SAS data set effort.year1997, which can be used
later in the same program or in some future program for analysis, without the need for the
above data step. For example, the SAS code below could be used to print the data set at
some future time.
LIBNAME effort "c:\fishing\effort";
PROC PRINT DATA=effort.year1997;
RUN;
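Combining several years that are kept as separate members of the same library can be sketched as below. This is only an illustration: the member effort.year1996 and the output name work.both are hypothetical, and the sketch assumes that both yearly data sets contain the same variables.

```sas
LIBNAME effort "c:\fishing\effort";
DATA work.both;
  /* Observations from 1996 are followed by those from 1997 */
  SET effort.year1996 effort.year1997;
RUN;
```

The yearly members stay untouched in the permanent library, while the combined copy lives in WORK and disappears at the end of the session.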
3.3.2 Library Procedures
There are several procedures for working with SAS libraries. The procedures permit update
of the information about a data set and utility operations such as copying, deletion or
renaming a member. You can also append one member to another.
PROC DATASETS
PROC DATASETS does many things related to the management of data sets. The usual way
to operate with this procedure is in full-screen mode (the default). In full-screen mode you
can easily modify parameters of the data set, copy, rename, and delete data sets, and list
various parameters associated with them. A couple of these tasks are shown below in a
non-full-screen session.
1777 Title2 "Proc Datasets";
1778 Proc Datasets Library=Work Nofs;
-----Directory-----
Libref: WORK
Engine: V612
Physical Name: C:\SAS\SASWORK\#TD16077
# Name Memtype Indexes
1 CURSTAT CATALOG
2 FLIER DATA
3 FLIER1 DATA
4 FLIER2 DATA
5 SASMACR CATALOG
1779 Modify Flier(Label="James Geaghan Flier Data Set");
1780 Run;
1781 Delete Flier1;

1782 Run;
NOTE: Deleting WORK.FLIER1 (memtype=DATA).

1783 Quit;
PROC CONTENTS
PROC CONTENTS produces a printout of information about the structure of
a data set, such as the variable names, types, lengths, labels, and formats, and the
number of observations in the data set, etc. This listing can also be produced from within
PROC DATASETS.
1785 Title2 "Proc Contents";
1786 Proc Contents Data=Work.Flier;
1787 Run;
Proc Contents
CONTENTS PROCEDURE
Data Set Name: WORK.FLIER Observations: 664

Member Type: DATA Variables: 11
Engine: V612 Indexes: 0
Created: 18:55 Thursday, September 25, 1997 Observation Length: 88
Last Modified: 18:55 Thursday, September 25, 1997 Deleted Observations: 0
Protection: Compressed: NO
Data Set Type: Sorted: NO
Label: James Geaghan Flier Data Set
-----Engine/Host Dependent Information-----
Data Set Page Size: 8192

Number of Data Set Pages: 8
File Format: 607
First Data Page: 1
Max Obs per Page: 92
Obs in First Data Page: 73
-----Alphabetic List of Variables and Attributes-----
# Variable Type Len Pos

------------------------------------
7 AGE Num 8 48
4 AR Num 8 24
2 DAY Num 8 8
9 LT Num 8 64
1 MO Num 8 0
6 SEX Num 8 40
8 SN Num 8 56
5 ST Num 8 32
11 TSL Num 8 80
10 WT Num 8 72
3 YR Num 8 16
This type of listing can be very important for documentation purposes and for finding out
what is in a permanent SAS data set.
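The same listing can be requested from within PROC DATASETS with its CONTENTS statement. The lines below are a minimal sketch that assumes the FLIER data set is still present in the WORK library.

```sas
Proc Datasets Library=Work Nofs;
  Contents Data=Flier;     /* Same information as PROC CONTENTS */
Run;
Quit;
```

This is convenient when the listing is wanted in the middle of other library housekeeping, since the procedure stays active until the QUIT statement.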
PROC COPY
PROC COPY is used to copy SAS data set members (and catalogs) from one data library to
another. This is useful for making backups. When combined with import/export engines,
it can also be used to convert data from one form into another. The example below simply
copies a data set from one library to another.
1789 Libname Perm "c:\temp";
NOTE: Libref PERM was successfully assigned as follows:
Engine: V612
Physical Name: c:\temp
1790 Proc Copy In=Work Out=Perm;
1791 Select Flier;
1792 Run;
NOTE: Copying WORK.FLIER to PERM.FLIER (MEMTYPE=DATA).

NOTE: The data set PERM.FLIER has 664 observations and 11 variables.
PROC APPEND
PROC APPEND is normally used to append one SAS data set to another SAS data set called
the base. If the base is a new data set, then a simple copy is used. In the example below,
the FLIER2 data set is appended to the original FLIER data set.
1795 Title2 "Proc Append";
1796 Proc Append Base=Perm.Flier Data=Flier2;
1797 Run;
NOTE: Appending WORK.FLIER2 to PERM.FLIER.

NOTE: 366 observations added.
NOTE: The data set PERM.FLIER has 1030 observations and 11 variables.
PROC DELETE
The DELETE procedure simply deletes a SAS data set.
1799 Title2 "Proc Delete";
1800 Proc Delete Data=Perm.Flier;
1801 Run;
NOTE: Deleting PERM.FLIER (memtype=DATA).

PROC CATALOG
The CATALOG procedure is necessary for working with SAS catalogs. Catalogs contain a
wide variety of information. Rather than data, per se, they contain settings for various
aspects of the SAS system. They may contain the help files used by SAS. They can contain
information that you have created for a SAS/AF program. We will look at what might
show up in a user's SAS Profile.
2361 Title2 "Proc Catalog";
2362 Proc Catalog Cat=sasuser.profile;
2363 Contents;
2364 Run;
2365 Quit;
Files and Libraries

Proc Catalog
Contents of Catalog SASUSER.PROFILE
# Name Type Date Description
1 AF AFGO 09/08/97
2 DMKEYS KEYS 04/06/97 Function Key Definitions
3 PASSIST SLIST 09/25/97 User profile
4 _VPLAY_ SLIST 06/25/97 VIDEO: player preferences
5 MRUWSAVE WSAVE 09/25/97
Some of these catalogs might be accessible using functions found in PROC CATALOG, but typ-
ically the parameters that they contain are to be set using the programs that these catalogs
are associated with (such as the Display Manager). This procedure is more commonly used
in its interactive mode.
3.4 File Import/Export/Transport
3.4.1 Import/Export
Many users enter their data using spreadsheets or data base software such as Microsoft
Excel, Lotus 1-2-3, or Borland Paradox. If you have SAS/ACCESS installed, special
import/export filters are available to directly access and use the data stored in several
other software formats. A SAS/AF application is provided on the File->Import and
File->Export pull-down menu that automates the programming of this task.
The task of importing an Excel spreadsheet containing age and growth information of Flier
sunfish, courtesy of Dr. James Geaghan, LSU, where the first non-blank row of the
spreadsheet contains the variable names (SAS compatible) and the data follow in the remaining
rows, is given below using PROC ACCESS.
PROC ACCESS DBMS=EXCEL;
CREATE work.xcell.ACCESS;
PATH='C:\fishing\Flierdat.xls';
WORKSHEET='flier';
GETNAMES YES;
SCANTYPE=YES;
CREATE work.xcell.VIEW;
SELECT ALL;
RUN;
DATA work.flier;
SET work.xcell;
RUN;
The PROC ACCESS step creates a view into the data set and specifies the particular worksheet
to import, whether variable names are to be gotten from the worksheet, etc. The library
name WORK is included to show that the data view is actually two catalog members within
the library member XCELL. Note that the data step is the process that actually converts
the data to a SAS data set. It is possible to use the data directly from the spreadsheet by
referencing the data view as in the SET statement.
PROC PRINT DATA=work.xcell;
RUN;
In general, this would not be the most efficient way to access the data as the "conversion"
would need to be performed on each data access.
It is also possible to export the data to other program formats. PROC DBLOAD is used below
to convert the flier data set back into an Excel spreadsheet.
PROC DBLOAD DBMS=EXCEL DATA=work.flier;
PATH='C:\fishing\newflier.xls';
PUTNAMES YES;
LIMIT=0;
LOAD;
RUN;
Also there are a number of other options for both PROC ACCESS and PROC DBLOAD that
can be used to control the variable names, types, data ranges, and other input/output
information.
What if you do not have SAS/ACCESS? It is not too difficult to input a spreadsheet using
the SAS data step. If the data contain no commas, then the "comma separated values"
(CSV) format produced by most spreadsheets and many data bases is often convenient.
The first step is to save the spreadsheet into the CSV format (don't remove the original
spreadsheet file; the CSV file is just an intermediate step). Next, write a SAS data step
to input the data using an INFILE statement containing the DELIMITER="," option. As an
example, let's assume that the flier data set has been saved as a CSV file called "flierdat.csv".
A portion of the data set is shown below.
Mo,Day,Yr,Ar,St,Sex,Age,Sn,Lt,Wt,TSL,Size1,Size2,Size3,Size4,Size5,Size6,Edge,No
7,21,74,2,7,3,0,5,3.5,1.4,57.00,0.00,0.00,0.00,0.00,0.00,0.00,34.72,1
1,9,76,2,3,2,0,1,5.3,3.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,2
12,18,74,2,4,1,0,5,5.4,3.4,83.20,0.00,0.00,0.00,0.00,0.00,0.00,47.79,3
2,15,76,2,1,1,0,5,6.0,5.7,111.00,0.00,0.00,0.00,0.00,0.00,0.00,60.97,4
12,14,75,2,1,1,0,5,6.1,6.6,109.00,0.00,0.00,0.00,0.00,0.00,0.00,60.04,5
Now we can write a SAS data step to read in these data. Note that we need to skip over
the first line of the data set as it contains the spreadsheet column headings.
FILENAME in "flierdat.csv";
DATA flier;
INFILE in FIRSTOBS=2 DELIMITER=",";
INPUT Mo Day Yr Ar St Sex Age Sn Lt Wt TSL
Size1 Size2 Size3 Size4 Size5 Size6 Edge No;
RUN;
If character data were included in the data set, then appropriate "informats" would need
to be used for reading those variables.
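A hedged sketch of such a data step follows. Everything specific in it is hypothetical: the file catch.csv and the columns Species, Caught, Lt, and Wt are assumptions made for illustration and do not describe the flier file.

```sas
/* catch.csv and its four columns are hypothetical */
FILENAME in "catch.csv";
DATA catch;
  /* DSD treats two adjacent commas as a missing value and */
  /* removes quotes placed around character values         */
  INFILE in FIRSTOBS=2 DELIMITER="," DSD;
  /* The colon modifier applies the informat but stops at the delimiter */
  INPUT Species :$20. Caught :MMDDYY8. Lt Wt;
  FORMAT Caught MMDDYY8.;
RUN;
```

Without the colon modifier an informat such as $20. would read a fixed 20 columns and run across the commas into the neighboring fields.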
The third-party software product DBMS/COPY is designed to move data between a number
of different data formats including SAS data sets. This introduces the need for a "third"
program, but can greatly ease the data export/import process, especially when the same
data are needed in several different formats.
The SAS/CONNECT product, which interconnects a network of PCs, Unix workstations,
and/or mainframes, can also be used to move SAS data sets from one machine type to
another. This product contains PROC UPLOAD and PROC DOWNLOAD for easily moving SAS
data sets, but they can also be used to handle ASCII files. This product also permits
you to execute code and work with data on several different platforms at the same time.
3.4.2 Transport
Copying a SAS data set from one system to another may lead to trouble if the systems and
SAS versions are not exactly compatible with one another. For example, the structure of a
SAS data set on an IBM MVS mainframe is quite different from that on a WIN95 PC. To
move or copy a SAS data set, a transport data library can be created. Since it is a library,
it may contain more than one SAS data set. The CPORT and CIMPORT procedures are used,
respectively, to create and import a transport data library. The process is to first create
the transport library, transport it to the other host system, then import the library. A
real advantage to this methodology is that you can send someone a SAS data set and know
exactly what is in the data set, while sending them a "raw" data set requires that they
know how to read in the data correctly. In other instances, the data may not exist in its
"raw" form, but may have been entered directly into a SAS data set. Below is an example
that creates a transport library of the flier data set, then imports it as if we had moved
the transport library to another system. First, let's create the transport data library. The
LIBRARY argument specifies which SAS data library will be used, the FILE argument specifies
the destination for the transport image, and the MT argument specifies what types of things
from the SAS data library are to be transported. Here we specified that only SAS data sets
are to be considered.
668 Title2 "Create A Transport Data Library";
669 Filename XPT "c:\projects\alaska97\flier.xpt";
670 Proc CPort Library=Work File=XPT MT=Data;
671 Run;
NOTE: Proc CPORT begins to transport data set WORK.FLIER

NOTE: The data set contains 11 variables and 664 observations.
Logical record length is 88.
NOTE: Proc CPORT begins to transport data set WORK.FLIER1

NOTE: Proc CPORT begins to transport data set WORK.FLIER2

672 Filename XPT Clear;

NOTE: Fileref XPT has been deassigned.
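The log above shows that every data set in the WORK library (FLIER, FLIER1, and FLIER2) was transported. If only one data set is wanted, a sketch using the DATA= argument in place of LIBRARY= might look like the following; it assumes the same file name as above and that WORK.FLIER exists.

```sas
Filename XPT "c:\projects\alaska97\flier.xpt";
Proc CPort Data=Work.Flier File=XPT;   /* Transport a single data set */
Run;
Filename XPT Clear;
```

The resulting transport file is imported with PROC CIMPORT in the same way as a transport file that holds several members.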
Now we can reverse the process using the CIMPORT procedure. This would be the same type
of code that would be used on the "other" platform.
674 Title2 "Import From A Transport Data Library";
675 Filename IMPT "c:\projects\alaska97\flier.xpt";
676 Proc CImport Library=Work File=IMPT;
677 Run;
NOTE: Proc CIMPORT begins to create/update data set WORK.FLIER

NOTE: Data set contains 11 variables and 664 observations.
NOTE: Proc CIMPORT begins to create/update data set WORK.FLIER1

NOTE: Proc CIMPORT begins to create/update data set WORK.FLIER2

Note that we could couple some of this code with the FTP file access method described
earlier to let the SAS system move the file for us. When moving the transport library,
use a binary transfer method so that the internal structure of the transport library is not
changed.
3.5 The X Files
The SAS system provides, through various methods, access to the operating system and its
commands. The X statement can be used to pass a command along to the operating system
and have the command executed as if it were given at a system command prompt (such
as at an MS-DOS, Unix, or terminal window). The SAS program will wait, by default,
until the command has completed. Also by default, the user will have to "manually" close
the external shell or program started by the X statement. The program below will use the
WIN95 operating system to generate a directory listing of the flier data sets and will redirect
the output to a file named flier.dir. The NOXWAIT system option tells the SAS
system to close the command window automatically, so that processing of SAS statements
continues once the command has completed without the user having to intervene, while
the XSYNC option prevents the SAS system from processing more statements before
the command has completed. We then open the flier.dir file, read its contents,
and write them to the SAS log.
652 Options XSync NoXWait;
653 X "dir c:\projects\alaska97\flier*.dat > c:\projects\alaska97\flier.dir"
653 ;
655 Data _NULL_;
656 Infile Fish("flier.dir") EOF=LF;
657 File Log;
658 Input;
659 If _N_=1 Then Link LF;
660 Put _INFILE_;
661 Return;
662 LF:
663 Put //;
664 Return;
665 Run;

NOTE: The infile FISH("flier.dir") is:

MEMBERNAME=c:\projects\alaska97\flier.dir,
RECFM=V,LRECL=256
Volume in drive C is RON RICO

Volume Serial Number is 1BF6-1260
Directory of C:\PROJECTS\Alaska97
FLIER DAT 69,160 09-22-97 5:57p FLIER.dat

FLIERS2 DAT 38,064 09-22-97 6:20p Fliers2.dat
FLIER1N2 DAT 27,888 09-22-97 9:32p Flier1N2.DAT
FLIERS1 DAT 30,992 09-22-97 6:19p Fliers1.dat
4 file(s) 166,104 bytes
0 dir(s) 345,735,168 bytes free
NOTE: 11 records were read from the infile FISH("flier.dir").
Note that with very little programming we could have read the directory listing into variables
in a SAS data set and have processed them in some fashion, say to compute the disk space
occupied by the files.
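A rough sketch of that idea is given below. It assumes the layout of the WIN95 DIR listing shown above (name, extension, then size with commas) and that FISH is assigned to the c:\projects\alaska97 directory; the data set and variable names are hypothetical, and a different DIR layout would need a different INPUT statement.

```sas
Data DirList;
  Infile Fish("flier.dir") Truncover;
  Length Name Ext $8 BytesC $15;
  Input Name $ Ext $ BytesC $;         /* First three blank-separated fields */
  If Upcase(Ext)="DAT";                /* Keep only the lines for .DAT files */
  Bytes=Input(BytesC, ?? Comma15.);    /* "69,160" becomes 69160 */
  Drop BytesC;
Run;
Proc Means Data=DirList N Sum;
  Var Bytes;
Run;
```

For the listing shown above, the sum should agree with the 166,104 bytes that DIR itself reports for the four files. The ?? modifier simply suppresses the error messages that an unreadable size would otherwise generate.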
Chapter 4
The Macro Language

Macro languages are programming languages that lie on top of another programming lan-
guage. Their purpose, typically, is to control which code may be compiled at compile time
(say, certain aspects of the programming code might be machine dependent) and to do
variable substitution. That is, the macro language looks for special symbols that indicate
that it is simply to replace this set of special symbols with some other code. The SAS
macro language can accomplish these tasks, but generally is used to control the generation
of the SAS language itself. In fact, the macro language often writes SAS language code.
Thus, entire applications can be developed with the macro language, which upon execu-
tion, generates SAS language that might be a mixture of macro commands, data steps, and
procedure steps.
4.1 Macro Variables

Probably the easiest way to begin with the SAS macro language is to start with macro
variables. A macro variable holds an assigned value, and when its reference is encountered,
the assigned value is substituted for the reference. In the macro language a macro variable
reference looks like &NAME, where NAME is some variable name that you have assigned. The
assignment of a macro variable is often made through the %LET macro statement. Notice
that macro statement elements begin with a percent sign to differentiate them from SAS
language statements. Let's look at a simple example. We will assume that the flier data
set has been read in and is available.
%Let DSName=Flier;
Title2 "Listing of Data Set &DSNAME";
Title3 'Single Quotes for &DSNAME';
Proc Print Data=&DSName(Obs=10);
Run;
NOTE: The PROCEDURE PRINT printed page 1.
The SAS Macro Language 1

Listing of Data Set Flier
Single Quotes for &DSNAME
1 7 21 74 2 7 3 0 5 3.5 1.4 57.0

2 1 9 76 2 3 2 0 1 5.3 3.0 0.0
3 12 18 74 2 4 1 0 5 5.4 3.4 83.2
4 2 15 76 2 1 1 0 5 6.0 5.7 111.0
5 12 14 75 2 1 1 0 5 6.1 6.6 109.0
6 12 18 74 2 4 1 0 4 6.2 5.8 104.5
7 9 13 75 2 2 1 0 5 6.4 6.7 122.8
8 12 14 75 2 1 1 0 5 6.4 7.6 118.4
9 12 14 75 2 1 1 0 5 6.5 6.8 130.6
10 2 15 76 2 1 1 0 5 6.8 8.1 118.0
The %LET statement has the name of the variable to receive the macro assignment, the equals
sign, then the value to be assigned to the macro variable. Note that the variable is written
without any special symbols (at least for now). Also note that the value is given without
any quotes. We'll address more complicated issues as we move along. In this particular
instance, if we had several different data sets that we might print, it becomes easy to specify
a different data set and, at the same time, have its name included in the title. The type
of quotes used, however, is very important. You definitely must use double quotes if you
want the macro variable to resolve to its value. Single quotes will print the ampersand and
name as written.
Normally, there is nothing written on the SAS log to indicate what happened with respect
to the macro substitution. To see what was actually substituted we can turn on the
SYMBOLGEN system option. Then upon re-running the code we see the following:
21 Options Symbolgen;
22 %Let DSName=Flier;
SYMBOLGEN: Macro variable DSNAME resolves to Flier
25 Run;
Now we can see what value was substituted for &DSNAME. Let's look at the following code.
27 %Let Somecode=%Str(Proc Print Data=Flier(Obs=10); Run;);
28 %Unquote(&Somecode);
SYMBOLGEN: Macro variable SOMECODE resolves to Proc Print
Data=Flier(Obs=10); Run;
SYMBOLGEN: Some characters in the above value which were subject to macro
quoting have been unquoted for printing.
Thus, macro variables can actually hold very complicated expressions. The macro quoting
function %STR() was used to quote the entire expression so that the internal parentheses
and semicolons would not cause any problems. However, when we reference the variable,
the value remains quoted, which means that its contents won't be executed as normal SAS
language statements. To get rid of the special quoting, the macro function %UNQUOTE was
used.
Although these variables are quite powerful with respect to substitution, they have their
limits for writing reusable software. Macro procedures give us even more control of code
generation and variable substitution.
4.2 Macro Procedures

Macro procedures consist of a procedure definition and a body. The body can be "plain"
SAS language code, or can make use of the macro language. In fact, a number of the macro
statements are only permitted within the body of a macro. A macro procedure begins with
the %MACRO statement and ends with the %MEND statement.
Let's rewrite our earlier program as a SAS macro named PRT.
%Macro PRT(DSName);
Title2 "Listing of Data Set &DSName";
Proc Print Data=&DSName(Obs=10);
Run;
%Mend PRT;
%PRT(Flier)
Again, it is difficult to determine what is actually happening in the macro. The SYMBOLGEN
results do help, but they appear out of place. This is because macros are actually compiled
before they are executed. This is also why the complexity of macro programming can be
greater within a macro procedure. You should notice that to perform the same task on a
different data set would only require another call to the procedure. For example:
%PRT(Bluegill);
The MPRINT system option can be used to print out the SAS language statements generated
by the macro language. The code below turns on this option and turns off the SYMBOLGEN
option.
37 Options NoSymbolGen MPrint;
38 %PRT(Flier)
MPRINT(PRT): TITLE2 "Listing of Data Set Flier";
MPRINT(PRT): PROC PRINT DATA=FLIER(OBS=10);
MPRINT(PRT): RUN;
Not only did we get the macro variables resolved, but we also have them listed in the context
of the SAS language statements.
It is also possible to give a macro optional (keyword) arguments. This permits you to assign
initial values to these macro variables, which can be overridden when the macro is called.
The example below makes the data set name optional.
%Macro PRT(DSName=Flier);
Title2 "Listing of Data Set &DSName";
Proc Print Data=&DSName(Obs=10);
Run;
%Mend PRT;
%PRT;
MPRINT(PRT): TITLE2 "Listing of Data Set Flier";
MPRINT(PRT): PROC PRINT DATA=FLIER(OBS=10);
MPRINT(PRT): RUN;
Note that the macro was called without replacing the data set name. Had we wanted to
use the BLUEGILL data set, for example, we would have made the call
%PRT(DSNAME=BLUEGILL);
We can even include required and optional parameters in the same macro. The required or
positional parameters must be called in the same order as given in the %MACRO statement,
while the optional parameters can be given in any order following after the positional
parameters.
%Macro Sex(Sex,DSName=Flier);
Data &DSName&Sex;
Set &DSName;
Where (Sex=&Sex);
Run;
%Mend Sex;
Now we can call the new macro and specify which sex we would like to create the data set
for. Notice here the complicated name &DSName&Sex. This is actually two macro names
joined together. When executed, both macro names will be resolved then joined together
to form a single name. Let's call the macro and look at the results produced.
53 %Sex(1,DSNAME=Flier);
MPRINT(SEX): DATA FLIER1;
MPRINT(SEX): SET FLIER;
MPRINT(SEX): WHERE (SEX=1);
MPRINT(SEX): RUN;
54 %Sex(2);
MPRINT(SEX): RUN;
55 %Sex(3);
MPRINT(SEX): RUN;

56
57 Proc Datasets Library=Work;
-----Directory-----
Libref: WORK
Engine: V612
Physical Name: C:\SAS\SASWORK\#TD41921
# Name Memtype Indexes

----------------------------
1 FLIER DATA
2 FLIER1 DATA
3 FLIER2 DATA
4 FLIER3 DATA
5 SASMACR CATALOG
58 Run;
59 Quit;
The macro created each of the 3 data sets, one for each sex category. Now if we knew
that we would be doing this, there might be other ways to accomplish this with the macro
language. The macro language also has looping statements like the SAS language does.
%Macro Sex(DSName=Flier);
%Do Sex=1 %To 3;
Data &DSName&Sex;
Set &DSName;
Where (Sex=&Sex);
Run;
%End;
%Mend Sex;
which when called generates the code below.

69 %Sex;

MPRINT(SEX): RUN;

MPRINT(SEX): RUN;

MPRINT(SEX): RUN;
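The joined reference &DSName&Sex used in the macros of this section can be tried out in open code with %PUT, which writes the resolved text to the log without running any step. The sketch below is only an illustration; the suffix _F in the last line is a made-up example of a period ending a macro variable name so that ordinary text can follow it.

```sas
%Let DSName=Flier;
%Let Sex=2;
%Put Data set name: &DSName&Sex;     /* log shows: Data set name: Flier2 */
%Put With a delimiter: &DSName._F;   /* the period ends the name DSName  */
```

Checking a reference this way before using it in a DATA statement is a cheap way to catch a name that does not resolve as expected.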

4.3 Bootstrap Example
In this section we will write a more complicated macro. This macro will compute a Monte
Carlo or bootstrap test for the equality of two means (mean weight of male and female
flier fish) using the ANOVA F statistic as the criterion. Under the null hypothesis, there
is only one population and so it doesn't matter which observations you classify as males
and which you classify as females. The algorithm will be the following.
1. Compute the test statistic for the original sample.
2. Select a simple random sample with replacement equal to the number of males in the
sample and label these observations as males.
3. Select a simple random sample with replacement equal to the number of females in
the sample and label these observations as females.
4. Concatenate the two samples together and compute the ANOVA test statistic testing
for differences between the sexes.
5. Save the test statistic then start again at (2) until a large number of iterations have
been performed.
6. Rank order the test statistics from smallest to largest and record the rank of the
original test statistic.
7. If the original test statistic is within the largest 5% of these test statistics, then reject
the null hypothesis and claim that the weight means are different between the sexes.
The following macro was written to implement this basic scheme. You should be aware that
much more efficient schemes exist to complete this task. For example, one could generate
all of the Monte Carlo samples (iterations) in a single data step and use BY processing to
compute the ANOVAs, one per iteration.
%Macro MonteT(MaxIter=99);
%* Get Rid of Unknowns First;
Data _NoUnk_;
Set Flier;
If Sex NE 3;
Run;
%* Compute Test Statistic for Observed Data;

Proc ANOVA Data=_NOUNK_ NoPrint OutStat=AllStats;
Class Sex;
Model Wt=Sex;
Run;
Quit;
%* Determine the Number of Males and Females;

%* We should preserve the sample sizes;
Proc Means Data=_NoUnk_ Noprint;
Class Sex;
Var Wt;
Output Out=Temp N=N;
Run;
%* Get The Results From Above Into Macro Variables;

Data _NULL_;
Set Temp;
Select (Sex);
When (1) Call Symput("Males",Put(N,5.));
When (2) Call Symput("Females",Put(N,5.));
Otherwise;
End;
Run;
%Put The Data Set Contains &Males Males.;
%Put The Data Set Contains &Females Females.;
%* This macro actually selects the sample;

%Macro TakeSamp(Sex,Number);
Sex=&Sex;
Do I=1 To &Number;
Obs=Floor(Ranuni(0)*N+1);
Set _NoUnk_(Drop=Sex) Point=Obs Nobs=N;
Output;
End;
%Mend TakeSamp;
%* This loop controls the sampling;

%Do Iter=1 %To &MaxIter;
Data BootSamp;
%TakeSamp(1,&Males);
%TakeSamp(2,&Females);
Stop;
Run;
%* Compute the test statistic based upon the pseudo-sample;

Proc ANOVA Data=BootSamp NoPrint OutStat=One;
Class Sex;
Model Wt=Sex;
Run;
Quit;
%* Append the results to the stats data set;

Proc Append Base=AllStats Data=One;
Run;
%End;
%* Print Out the Original Statistics;

Title2 "Listing of All ANOVA Statistics";
Proc Print Data=AllStats(Obs=10);
Run;
%* Keep Only The F Statistics For Sex;

Data AllStats;
Retain Iter -1;
Set AllStats;
If _Source_="SEX";
Iter+1;
Run;
%* Reorder The Runs By F Statistic Value;

Proc Sort Data=AllStats;
By F;
Run;
%* List In Increasing Order of F;
%* Find Iter 0 And See Where It Falls In Relation;
%* To The Other F Values.;
Title2 "Listing of Ordered ANOVA Results";
Title3 "Iter Number 0 Is From Original Analysis";
Proc Print Data=AllStats;
Run;
Data _NULL_;
Set AllStats;
If Iter=0 Then
Do;
MonteP=1-(_N_-1)/(&MaxIter+1);
Put "Monte Carlo P-value is " MonteP 6.4;
End;
Run;
%Mend MonteT;
Some of the first code generated by the macro is the following.

171 %MonteT;
MPRINT(MONTET): DATA _NOUNK_;

MPRINT(MONTET): SET FLIER;
MPRINT(MONTET): IF SEX NE 3;
MPRINT(MONTET): RUN;
NOTE: The data set WORK._NOUNK_ has 646 observations and 11 variables.
MPRINT(MONTET): PROC ANOVA DATA=_NOUNK_ NOPRINT OUTSTAT=ALLSTATS;

MPRINT(MONTET): CLASS SEX;
MPRINT(MONTET): MODEL WT=SEX;
MPRINT(MONTET): QUIT;
NOTE: The data set WORK.ALLSTATS has 2 observations and 7 variables.
MPRINT(MONTET): PROC MEANS DATA=_NOUNK_ NOPRINT;

MPRINT(MONTET): VAR WT;
MPRINT(MONTET): OUTPUT OUT=TEMP N=N;
NOTE: The data set WORK.TEMP has 3 observations and 4 variables.
MPRINT(MONTET): DATA _NULL_;

MPRINT(MONTET): SET TEMP;
MPRINT(MONTET): SELECT (SEX);
MPRINT(MONTET): WHEN (1) CALL SYMPUT("Males",PUT(N,5.));
MPRINT(MONTET): WHEN (2) CALL SYMPUT("Females",PUT(N,5.));
MPRINT(MONTET): OTHERWISE;
MPRINT(MONTET): END;
The Data Set Contains 306 Males.

The Data Set Contains 340 Females.
MPRINT(MONTET): DATA BOOTSAMP;
MPRINT(TAKESAMP): SEX=1;
MPRINT(TAKESAMP): DO I=1 TO 306;
MPRINT(TAKESAMP): OBS=FLOOR(RANUNI(0)*N+1);
MPRINT(TAKESAMP): SET _NOUNK_(DROP=SEX) POINT=OBS NOBS=N;
MPRINT(TAKESAMP): OUTPUT;
MPRINT(TAKESAMP): END;
MPRINT(MONTET): ;
MPRINT(TAKESAMP): SEX=2;
MPRINT(TAKESAMP): DO I=1 TO 340;
MPRINT(TAKESAMP): OBS=FLOOR(RANUNI(0)*N+1);
MPRINT(TAKESAMP): SET _NOUNK_(DROP=SEX) POINT=OBS NOBS=N;
MPRINT(TAKESAMP): OUTPUT;
MPRINT(TAKESAMP): END;
MPRINT(MONTET): ;
MPRINT(MONTET): STOP;
NOTE: The data set WORK.BOOTSAMP has 646 observations and 12 variables.
MPRINT(MONTET): PROC ANOVA DATA=BOOTSAMP NOPRINT OUTSTAT=ONE;

MPRINT(MONTET): MODEL WT=SEX;
MPRINT(MONTET): QUIT;
NOTE: The data set WORK.ONE has 2 observations and 7 variables.
MPRINT(MONTET): PROC APPEND BASE=ALLSTATS DATA=ONE;

NOTE: Appending WORK.ONE to WORK.ALLSTATS.

NOTE: 2 observations added.
The last 3 steps will continue until all iterations have been completed. Once they are
completed, the code generated will be:
MPRINT(MONTET): TITLE2 "Listing of All ANOVA Statistics";
MPRINT(MONTET): PROC PRINT DATA=ALLSTATS(OBS=10);
MPRINT(MONTET): DATA ALLSTATS;

MPRINT(MONTET): RETAIN ITER -1;
MPRINT(MONTET): SET ALLSTATS;
MPRINT(MONTET): IF _SOURCE_="SEX";
MPRINT(MONTET): ITER+1;
MPRINT(MONTET): PROC SORT DATA=ALLSTATS;

MPRINT(MONTET): BY F;
MPRINT(MONTET): TITLE2 "Listing of Ordered ANOVA Results";

MPRINT(MONTET): TITLE3 "Iter Number 0 Is From Original Analysis";
MPRINT(MONTET): PROC PRINT DATA=ALLSTATS;
NOTE: The PROCEDURE PRINT printed pages 8-10.
MPRINT(MONTET): DATA _NULL_;

MPRINT(MONTET): SET ALLSTATS;
MPRINT(MONTET): IF ITER=0 THEN DO;
MPRINT(MONTET): MONTEP=1-(_N_-1)/(99+1);
MPRINT(MONTET): PUT "Monte Carlo P-value is " MONTEP 6.4;
MPRINT(MONTET): END;
Monte Carlo P-value is 0.0800
The final page of the output produced had the following observations, including the original
analysis results (ITER=0),
OBS ITER _NAME_ _SOURCE_ _TYPE_ DF SS F PROB
80 56 WT SEX ANOVA 1 933.89 1.56952 0.21073

81 76 WT SEX ANOVA 1 1063.30 1.63018 0.20214
82 79 WT SEX ANOVA 1 999.67 1.82098 0.17767
83 27 WT SEX ANOVA 1 1281.41 2.03846 0.15385
84 15 WT SEX ANOVA 1 1452.89 2.28557 0.13107
85 8 WT SEX ANOVA 1 1488.45 2.55722 0.11028
86 94 WT SEX ANOVA 1 1658.02 2.69699 0.10103
87 13 WT SEX ANOVA 1 1437.03 2.73300 0.09878
88 4 WT SEX ANOVA 1 1690.12 2.80394 0.09452
89 18 WT SEX ANOVA 1 1827.85 3.08179 0.07965
90 51 WT SEX ANOVA 1 1857.69 3.10496 0.07853
91 5 WT SEX ANOVA 1 1817.26 3.33083 0.06846
92 63 WT SEX ANOVA 1 1926.93 3.53462 0.06055
93 0 WT SEX ANOVA 1 2318.17 3.83827 0.05053
94 26 WT SEX ANOVA 1 2148.51 3.85926 0.04990
95 87 WT SEX ANOVA 1 2675.15 4.32986 0.03784
96 43 WT SEX ANOVA 1 2799.96 4.38115 0.03673
97 6 WT SEX ANOVA 1 3672.40 6.04351 0.01422
98 24 WT SEX ANOVA 1 3720.97 6.30796 0.01226
99 83 WT SEX ANOVA 1 4204.62 8.06348 0.00466
100 14 WT SEX ANOVA 1 5745.28 8.91242 0.00294
The Monte Carlo P-value of 0.08 is similar to the P-value of 0.0505 obtained by the original
ANOVA (ITER=0 above). With many more iterations we would obtain a more precise estimate
of the sampling distribution of the test statistic. The macro does take a while to execute. Its
speed can be increased by suppressing some of the printed output, for example the macro
printed output along with the notes.
Options NoNotes NoMPrint NoSymbolgen;
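The more efficient scheme mentioned earlier in this section, in which all of the pseudo-samples are generated in one data step and the ANOVAs are then computed with BY processing, might look roughly like the sketch below. It is not the macro used for the results above. It assumes that the _NoUnk_ data set created by the macro still exists, and the data set names AllSamp and BootStats are made up for the illustration. The counts of 306 males and 340 females are taken from the log shown in this section.

```sas
%Let MaxIter=99;      /* number of Monte Carlo iterations           */
%Let Males=306;       /* sample sizes reported in the log above     */
%Let Females=340;

/* Draw every pseudo-sample in a single data step */
Data AllSamp;
  Do Iter=1 To &MaxIter;
    Do I=1 To &Males+&Females;
      Obs=Floor(Ranuni(0)*N+1);              /* random observation number */
      Set _NoUnk_(Keep=Wt) Point=Obs Nobs=N; /* sample with replacement   */
      If I LE &Males Then Sex=1;             /* relabel to keep the       */
      Else Sex=2;                            /* original sample sizes     */
      Output;
    End;
  End;
  Stop;
  Drop I;
Run;

/* One ANOVA per iteration; OutStat= keeps the BY variable Iter */
Proc ANOVA Data=AllSamp NoPrint OutStat=BootStats;
  By Iter;
  Class Sex;
  Model Wt=Sex;
Run;
Quit;
```

Because the samples are written in order of Iter, no sort is needed before the BY statement, and the repeated PROC APPEND steps of the macro version are avoided.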
4.4 Cluster Dendrogram

There are many SAS macros that have been written by users, as well as by SAS Institute
staff, to fill "holes" in the collection of SAS procedures. For example, the dendrogram
generated by PROC TREE is not particularly readable and is very frustrating to users who
are already familiar with dendrograms. The dendro macro, available at the SAS WWW
site, uses SAS/GRAPH to produce a high resolution dendrogram. The following program
calls this macro and produces the attached graph.
*---------------------------------------------------------*
| Average Linkage Cluster Analysis. |
| Analysis of a subset of the 1980 Uniform Crime Report |
| data base. Only cities with populations between 100,000|
| and 249,999 are used. The largest of these per state is|
| selected for clustering. Reported crimes are |
| standardized to total offenses for each city. |
*---------------------------------------------------------*;
Options PS=55 LS=78 NoDate NoNumber;
Proc Format;
Value State
1="ALABAMA" 2="ARIZONA" 3="ARKANSAS"
4="CALIFORNIA" 5="COLORADO" 6="CONNECTICUT"
7="DELAWARE" 8="WASHINGTON, D.C." 9="FLORIDA"
10="GEORGIA" 11="IDAHO" 12="ILLINOIS"
13="INDIANA" 14="IOWA" 15="KANSAS"
16="KENTUCKY" 17="LOUISIANA" 18="MAINE"
19="MARYLAND" 20="MASSACHUSETTS" 21="MICHIGAN"
22="MINNESOTA" 23="MISSISSIPPI" 24="MISSOURI"
25="MONTANA" 26="NEBRASKA" 27="NEVADA"
28="NEW HAMPSHIRE" 29="NEW JERSEY" 30="NEW MEXICO"
31="NEW YORK" 32="NORTH CAROLINA" 33="NORTH DAKOTA"
34="OHIO" 35="OKLAHOMA" 36="OREGON"
37="PENNSYLVANIA" 38="RHODE ISLAND" 39="SOUTH CAROLINA"
40="SOUTH DAKOTA" 41="TENNESSEE" 42="TEXAS"
43="UTAH" 44="VERMONT" 45="VIRGINIA"
46="WASHINGTON" 47="WEST VIRGINIA" 48="WISCONSIN"
49="WYOMING" 50="ALASKA" 51="HAWAII";
Run;
Title1 'Reported Offenses from the 1980 Uniform Crime Report';

Title2 'for Moderate Sized Cities';
Data Crime;
Input state division citysize
murder manslght rape robbery assault burglary larceny motor total;
Array crimes{8} murder manslght rape robbery assault burglary larceny motor;
Do i=1 to 8; /* standardize the reported crimes */
Crimes{i}=crimes{i}/total*100;
End;
Drop i;
Label manslght='manslaughter';
Format state state.;
Datalines;
1 6 58040 54 0 144 956 2270 7130 10189 1094 21837
1 6 58730 37 1 58 306 1015 3671 7597 663 13348
1 6 41840 28 0 69 276 1781 4161 7491 827 14633
2 8 56210 10 0 62 193 1273 2773 7947 567 12825
--- Data Deleted Here ---
46 9 85450 11 0 127 409 1521 4110 10278 869 17325
48 3 52470 4 0 76 244 1136 3646 10125 590 15821
50 9 1900 15 0 117 296 1581 2611 7322 1055 12997
;
Proc Sort;
By state citysize;
Run;
/* Now keep the largest city from each state */

Data crime;
Set crime;
By state;
If first.state;
Drop citysize;
Run;
Proc print;
Format murder manslght rape robbery assault burglary larceny motor 6.2;
Run;
Proc Cluster Data=crime Method=Average Outtree=Tree;

Var murder manslght rape robbery assault burglary larceny motor;
Id state;
Run;
%EPS; /* my own macro to turn on encapsulated postscript */
%Include "dendro.sas";
%dendro;
Quit;
[Figure: dendrogram produced by the dendro macro; title line "for Moderate Sized Cities"]

Chapter 5
SAS Special Files

There are several files that are useful to be aware of and to make use of. The first is the
AUTOEXEC.SAS file.
5.1 Autoexec.sas
The AUTOEXEC.SAS file is typically stored in a user's home directory or in the main SAS
subdirectory. Once the SAS system initializes, it will, by default, read and execute the
statements contained in the AUTOEXEC.SAS file. This makes it a very convenient way to
set up your environment with the basic settings that you might like. You can define your
graphics drivers here as well as printout size and other options. In developing applications,
it can be a way to autostart a program for persons who are non-programmers but need to
use a pre-built SAS application.
Below is an example AUTOEXEC.SAS file that I use on my Unix system. Some parts I use
frequently and others much less frequently.
*************************************************;
* AUTOEXEC.SAS *;
*************************************************;
%Macro CGM;
%************************************************;
%* CGM options are CGMFRMA - Monochrome *;
%* CGMFRGA - Gray Scale *;
%* CGMFRCA - Color *;
%************************************************;
Filename GSASFile '/tmp/sas.cgm';
GOptions Device=CGMFRMA GAccess=GSASFile GSFMode=Replace
FText=HWCGM010 HText=1 CText=Black
FTitle=HWCGM010 HTitle=1 CTitle=Black;
%* FText = HWCGM010 is for Helvetica;
%* Set FText to HWCGM001 For SansSerif Font;
%Put %Str( );
%Put %Str(NOTE: Graphics Device is: CGMFRMA);
%Put %Str(NOTE: Writing Graphics Output To: /tmp/sas.cgm);
%Put %Str( );
%Mend CGM;
%Macro CGMX;
%************************************************;
%* Display device is XCOLOR - Then print to CGM *;
%* CGM options are CGMFRMA - Monochrome *;
%* CGMFRGA - Gray Scale *;
%* CGMFRCA - Color *;
%************************************************;
Filename GSASFile '/tmp/sas.cgm';
GOptions Device=XColor TargetDevice=CGMFRMA GAccess=GSASFile
GSFMode=Replace FText=HWCGM010 HText=1 CText=Black;
%Put %Str( );
%Put %Str(NOTE: Graphics Device is: XCOLOR);
%Put %Str(NOTE: Target Device is: CGMFRMA);
%Put %Str(NOTE: Writing Target Output To: /tmp/sas.cgm);
%Put %Str( );
%Mend CGMX;
%Macro EPS;
%************************************************;
%* EPS -Generate Encapsulated Postscript Output.*;
%************************************************;
FILENAME GSASFile "graph.eps";
GOPTIONS Device=PSEPSF TargetDevice=PSEPSF
CBack=White Colors=(Black)
GAccess=GSASFile NoPrompt GSFMode=Replace;
%Put %Str( );
%Put %Str(NOTE: Graphics Device is: PSEPSF);
%Put %Str(Note: Graphics Output to: graph.eps);
%Put %Str( );
%Mend EPS;
%Macro EPSX;
%************************************************;
%* EPSX-Generate Encapsulated Postscript Output.*;
%************************************************;
FILENAME GSASFile "graph.eps";
GOPTIONS Device=XCOLOR TargetDevice=PSEPSF
%Put %Str( );
%Put %Str(NOTE: Graphics Device is: XCOLOR);
%Put %Str(Note: Graphics Output to: graph.eps);
%Put %Str( );
%Mend EPSX;
%Macro PS;
%************************************************;
%* PS - Generate Postscript Output For Printing.*;
%************************************************;
FILENAME GSASFile pipe 'lpr -Pps -h';
GOPTIONS Device=PS1200 TargetDevice=PS1200
GProlog='252150532D41646F62652D0D0A'x
%Put %Str( );
%Put %Str(NOTE: Graphics Device is: PS1200);
%Put %Str( );
%Mend PS;
%Macro PSX;
%************************************************;
%************************************************;
FILENAME GSASFile pipe 'lpr -Pps -h';
GOPTIONS Device=XColor
TargetDevice=PS1200 CBack=White Colors=(Black)
GProlog='252150532D41646F62652D0D0A'x
%Put %Str( );
%Put %Str(Note: Graphics Device is: XCOLOR);
%Put %Str(NOTE: Hardcopy Graphics Device is: PS1200);
%Put %Str( );
%Mend PSX;
%Macro PSCX;
%************************************************;
%************************************************;
FILENAME GSASFile pipe 'lpr -Ppsc0 -J/nff/nb';
GOPTIONS Device=XColor
TargetDevice=PSCOLOR CBack=White
GProlog='252150532D41646F62652D0D0A'x
%Put %Str( );
%Put %Str(Note: Graphics Device is: XCOLOR);
%Put %Str(NOTE: Hardcopy Graphics Device is: PSCOLOR);
%Put %Str( );
%Mend PSCX;
%Macro HP;
%************************************************;
%* HP - Generate HPLJS3 Output For Printing. *;
%************************************************;
FILENAME GSASFile pipe 'lpr -Php1 -J/nff/nb';
GOPTIONS Device=HPLJS3 GAccess=GSASFile NoPrompt GSFMode=Append;
%Put %Str( );
%Put %Str(NOTE: Graphics Device is: HPLJS3);
%Put %Str( );
%Mend HP;
Options /* Forms=SASUSER.PROFILE.DEFAULT.FORM */ LPTYPE=BSD;

Libname Insight "/usr/lpp/sas612/samples/insight";
The macros make it easy for me to assign the graphics devices that I want to use. For
incorporating graphics into FrameMaker I use the %CGM and %CGMX macros. For incorporating
graphics into LaTeX I use %EPS and %EPSX, while for direct printing I might use %PS.
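As a usage illustration, a graph could be routed to an encapsulated PostScript file by calling one of these macros before a SAS/GRAPH procedure. The sketch below assumes that the %EPS macro from the AUTOEXEC.SAS file above has been defined in the session and that the flier data set, with the variables Wt and Lt, is available.

```sas
%EPS;                     /* sets the device and opens graph.eps       */
Proc GPlot Data=Flier;    /* any SAS/GRAPH procedure could appear here */
  Plot Wt*Lt;
Run;
Quit;
```

Since %EPS sets GSFMode=Replace, each new graph would overwrite graph.eps, so the file should be copied or renamed before the next graph is produced.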
5.2 Config.sas

The CONFIG.SAS file contains the configuration data that a user might customize, such as
the locations of the SAS system files and memory allocations. It also specifies where the
SAS WORK and SASUSER libraries are to be located. On networked systems, this can be very
useful for defining independent configurations for each user.
An example CONFIG.SAS file is shown below.
/*
* This file, config.sas, holds default configuration options
* for the SAS System. These options are overridden by options on the
* command line, or options specified through the SAS612_OPTIONS
* environment variable.
*
*/
/*
* -maps specifies the pathname of the map datasets used by PROC GMAP.
*/
-maps !SASROOT/maps
/*
* -msg specifies the directory where the SAS System will search
* for the files containing the text for all error messages
* and notes. These messages are stored in an internal format.
*/
-msg !SASROOT/sasmsg
/*
* -news specifies the name of a text file that will automatically
* be displayed in the log when SAS is invoked.
*/
-news !SASROOT/misc/base/news
/*
* -sasautos establishes the path(s) to director(ies) for automatic
* macro definitions to be searched by the macro facility when
* an unknown macro is referenced.
*/
-sasautos !SASROOT/sasautos
/*
* -sashelp specifies the pathname for the directory containing on-line
* help and menu screens for the SAS System.
*/
-sashelp !SASROOT/sashelp
/*
* -sasuser specifies the pathname for the directory used by the SAS
* System as a default place to store files, such as the SAS
* user profile catalog. See your SAS System documentation
* for more information.
*/
-sasuser ~/sasuser
/*
* -work specifies where to create the SAS work library. This
* library is temporary and any SAS data sets created there
* will be deleted when the system terminates. The unique name
* 'SAS_workANNNN' is assigned to each SAS work library. 'A' is
* some letter and 'NNNN' is the hexadecimal representation of the
* process ID of the SAS process.
*/
-work /tmp
/*
* -sasscript specifies the location to search for SAS/CONNECT scripts.
*/
-sasscript !SASROOT/misc/connect
/*
* -dms specifies that you are running SAS in Display Manager mode.
*/
-dms
/*
*
* -memsize limits the amount of memory that will be allocated by the
* SAS System.
*/
-memsize 32m
/*
*
* -sortsize limits the amount of memory that will be allocated during
* sorting operations.
*/
-sortsize 16m
/*
* Default windowing system to use.
*/
-fsdevice x11
/*
* -helpenv specifies that native help should be used when help is
* invoked.
*/
-helpenv helplus
/*
* -helploc specifies the location of the native help files for helplus.
* You would have to specify '-helpenv helplus' to use those files.
*/
-helploc !SASROOT/X11/native_help
/*
* -samploc specifies the location of the sample files for helplus.
*/
-samploc !SASROOT/X11/native_help
/*
* -path specifies the search path that the SAS System will use
* to find the dynamically loaded modules. Each -path
* specification indicates one directory. They will be
* searched in the order in which they are given.
*/
-path !SASROOT/sasexe/base
-path !SASROOT/sasexe/graph
-path !SASROOT/sasexe/stat
-path !SASROOT/sasexe/fsp
-path !SASROOT/sasexe/af
-path !SASROOT/sasexe/insight
-path !SASROOT/sasexe/ets
-path !SASROOT/sasexe/eis
-path !SASROOT/sasexe/iml
-path !SASROOT/sasexe/connect
-path !SASROOT/sasexe/or
-path !SASROOT/sasexe/qc
-path !SASROOT/sasexe/dbi
-path !SASROOT/sasexe/english
-path !SASROOT/sasexe/fsc
-path !SASROOT/sasexe/gis
-path !SASROOT/sasexe/image
-path !SASROOT/sasexe/lab
-path !SASROOT/sasexe/nvi
-path !SASROOT/sasexe/pub
-path !SASROOT/sasexe/share
-path !SASROOT/sasexe/trader
-path !SASROOT/sasexe/toolkit
-path !SASROOT/sasexe/spectraview
-path !SASROOT/sasexe/unixdb
-path !SASROOT/sasexe/mddbserver
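To check which of the settings in a configuration file such as the one above are actually in effect for a session, the values can be queried from within SAS. The following is a sketch only; it assumes that the PATHNAME function and %SYSFUNC are available in the release being used, and the exact log output will differ from system to system.

```sas
/* Show the current values of two options set in CONFIG.SAS */
Proc Options Option=MemSize;
Run;
Proc Options Option=SortSize;
Run;

/* Show where the WORK and SASUSER libraries were placed */
%Put WORK library location: %Sysfunc(PathName(Work));
%Put SASUSER library location: %Sysfunc(PathName(SASUser));
```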
5.3 Profile.sct

The PROFILE.SCT is a SAS catalog that contains lots of information about your SAS session.
It will contain your keyboard mapping settings, your window color settings, printer
definition information, and lots of other information of this sort. Typically, its values are
modified through the various utilities and pull-down menus available in the SAS display
manager.
Chapter 6
SAS Internet Tools

SAS Institute is developing many products to WEB-enable the SAS system. The products
enable the system to, for example, produce HTML output. That is, you can read the SAS
output with a web browser. Other products permit the use of HTML forms to interact with
SAS data bases. Thus you can issue data base queries against a SAS data set where the
commands are obtained from an HTML form. Other products allow an HTML document
to invoke and run commands on the local SAS system. This would permit development of
web applications that would actually be executed from within SAS.
6.1 Capturing OUTPUT for the Web

The easiest way to get started with a web-enabled SAS system is to use the macros for
generating HTML output. In the current versions (6.11 and 6.12) the macros must first
be installed. The necessary files can be obtained from the SAS Institute WWW server
at the URL http://www.sas.com/rnd/web/intro.html. Once they are installed, the process is
relatively straightforward: call the OUT2HTM() macro to turn on the capturing of the
output (or log), then call it again to turn off the capturing and to generate the HTML
code.
An example of capturing a printout of the flier data set is given below. The FORMCHAR option
is reset to the values below so that the line plots and tables look properly constructed.
Options PS=55 LS=78 PageNo=1 NoDate;
Options Formchar="|----|+|---+=|-/\<>*";
Title1;
Data Flier;
Infile "c:\projects\alaska97\flierdat.csv" Delimiter="," Firstobs=2;
Input Mo Day Yr Ar St Sex Age Sn Lt Wt TSL
Size1 Size2 Size3 Size4 Size5 Size6 Edge No;
Run;
%out2htm(capture=on);
%*Title1 "<center><font face=arial size=2 color=red>Flier Data</center>";

Title1 "Flier Data";
Proc Print Data=Flier Uniform;

Run;
Proc Plot Data=Flier;

Plot Wt*Lt;
Run; Quit;
%out2htm(capture=off,proploc=sasuser.htmlgen.outprop.slist,
encode=n,htmlfile=flier.htm);
Note that HTML codes can be embedded within a TITLE statement, as in the commented-out
Title1 line, so as to modify the default settings. The first call to OUT2HTM turns capturing
on. The second call ends capturing, specifies the properties catalog that will control the
appearance of the HTML code, and also specifies the output file to contain the results. It is
also possible to specify modifications to the properties directly in the OUT2HTM macro as below
%out2htm(capture=off,encode=n,dface=COURIER,hface=COURIER,htag=PREFORMATTED TEXT,
htmlfile=flier.htm,ttag=NO FORMATTING);
The DFACE variable controls the typeface for the data, the HFACE variable controls the type-
face for the headings, HTAG controls the tag type(s) that will be used around the headings,
and the TTAG controls the tag type(s) around the titles. The ENCODE variable determines
whether or not the angle brackets (greater than and less than signs) are encoded so as to
print in HTML as angle brackets. To capture the SAS log, use the option WINDOW=LOG. To
append HTML output to an existing le, specify OPENMODE=APPEND. There are many other
options that are possible.
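As an illustration of the last two options, the sketch below captures the SAS log of a PROC MEANS step and appends it to the HTML file created earlier. It assumes that the OUT2HTM macro is installed, that the flier data set exists, and that WINDOW=LOG is given on both the call that starts capturing and the call that ends it; that pairing is an assumption and should be checked against the macro documentation.

```sas
%out2htm(capture=on, window=log);     /* start capturing the log */
Proc Means Data=Flier;
  Var Lt Wt;
Run;
%out2htm(capture=off, window=log,     /* stop capturing and      */
         openmode=append,             /* add to the existing file */
         htmlfile=flier.htm);
```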
For the settings that I used (fixed-width fonts and black color), the following page was
observed in my browser.
Since a properties catalog can be constructed with the properties information, once you
have found the properties you like for a particular type of report, you may want to enter
them into a properties catalog. An easy way to do this is interactively. From within the
display manager, issue the following macro call
%out2htm(runmode=I);
This will bring up a dialog box within which you can specify the properties catalog to use.
Then select the properties button on the dialog box and make whatever modifications
you have decided upon. The first dialog box that you encounter looks like the following.
Click on the properties button to move to the next window.

Then select the TEXT tab to get to the HTML definitions for the text.
From this dialog box you can update the properties very easily then save them away. Later
when you wish to assign those properties to the SAS output or log, simply refer to this
properties list and you'll not have to make a long list of properties in the macro call.


SQRT(value) returns the square root of value.

RANPOI(seed,lambda) returns a Poisson random variate from the Poisson distribution

UNIFORM(seed) is the same as RANUNI(seed).

WEEKDAY(date) returns an integer from 1 to 7, 1=Sunday, 7=Saturday, corresponding