
10/14/2019 Introduction to DATA Step Processing: How the DATA Step Works: A Basic Introduction :: Step-by-Step Programming with Base SAS(R)…


How the DATA Step Works: A Basic Introduction

Overview of the DATA Step

The DATA step consists of a group of SAS statements that begins with a DATA statement. The DATA statement begins
the process of building a SAS data set and names the data set. The statements that make up the DATA step are
compiled, and the syntax is checked. If the syntax is correct, then the statements are executed. In its simplest form, the
DATA step is a loop with an automatic output and return action. The following figure illustrates the flow of action in a
typical DATA step.

Flow of Action in a Typical DATA Step
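
The automatic output and return action can be written out explicitly. The following is a minimal sketch, not part of the original example: the data set name LOOP_DEMO, the variables Id and Weight, and the two data lines are hypothetical. When a DATA step contains an OUTPUT statement, SAS no longer writes an observation automatically at the end of the step, so this step should behave the same as one that omits both statements.

```sas
data loop_demo;               /* hypothetical data set name */
   input Id Weight;           /* hypothetical variables, read with list input */
   output;                    /* explicit form of the automatic output */
   return;                    /* explicit form of the automatic return to the top */
   datalines;
101 189
102 145
;
```

Each pass through the statements reads one record and writes one observation, so LOOP_DEMO should contain two observations.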

support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001290590.htm 1/7

During the Compile Phase

When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is,
automatically translates the statements into machine code. SAS further processes the code, and creates the following
three items:
input buffer
   is a logical area in memory into which SAS reads each record of data from a raw data file when the program
   executes. (When SAS reads from a SAS data set, however, the data is written directly to the program data vector.)

program data vector
   is a logical area of memory where SAS builds a data set, one observation at a time. When a program executes,
   SAS reads data values from the input buffer or creates them by executing SAS language statements. SAS assigns
   the values to the appropriate variables in the program data vector. From here, SAS writes the values to a SAS
   data set as a single observation.

   The program data vector also contains two automatic variables, _N_ and _ERROR_. The _N_ variable counts the
   number of times the DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an error caused
   by the data during execution. These automatic variables are not written to the output data set.

descriptor information
   is information about each SAS data set, including data set attributes and variable attributes. SAS creates and
   maintains the descriptor information.
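
One way to look at descriptor information is with the CONTENTS procedure, which reports data set attributes and variable attributes. The sketch below assumes that the WEIGHT_CLUB data set created in the example later in this section already exists in the WORK library.

```sas
/* List the data set attributes and variable attributes of WEIGHT_CLUB. */
proc contents data=weight_club;
run;
```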

During the Execution Phase

All executable statements in the DATA step are executed once for each iteration. If your input file contains raw data, then
SAS reads a record into the input buffer. SAS then reads the values in the input buffer and assigns the values to the
appropriate variables in the program data vector. SAS also calculates values for variables created by program
statements, and writes these values to the program data vector. When the program reaches the end of the DATA step,
three actions occur by default that make using the SAS language different from using most other programming
languages:

1. SAS writes the current observation from the program data vector to the data set.

2. The program loops back to the top of the DATA step.

3. Variables in the program data vector are reset to missing values.

Note: The following exceptions apply:

Variables that you specify in a RETAIN statement are not reset to missing values.

The automatic variables _N_ and _ERROR_ are not reset to missing.
For information about the RETAIN statement, see Using a Value in a Later Observation.

If there is another record to read, then the program executes again. SAS builds the second observation, and continues
until there are no more records to read. The data set is then closed, and SAS goes on to the next DATA or PROC step.
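
The reset to missing and the RETAIN exception can be observed in the SAS log. This is a minimal sketch under stated assumptions: the data set RESET_DEMO, the variables Score, Doubled, and Total, and the three data lines are all hypothetical. The PUT statements write the current values from the program data vector to the log.

```sas
data reset_demo;                     /* hypothetical data set name */
   retain Total 0;                   /* Total keeps its value across iterations */
   put 'Top of step: ' _N_= Score= Doubled= Total=;
   input Score;                      /* reads the next record */
   Doubled = Score * 2;              /* recalculated in every iteration */
   Total = Total + Score;            /* running sum that relies on RETAIN */
   put 'End of step: ' _N_= Score= Doubled= Total=;
   datalines;
10
20
30
;
```

At the top of the second iteration, the log should show Score and Doubled as missing (a period) while Total still holds 10. The step also begins a fourth iteration, so a fourth "Top of step" line appears before the INPUT statement finds no more records and the step ends.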

Example of a DATA Step

The DATA Step

The following simple DATA step produces a SAS data set from the data collected for a health and fitness club. As
discussed earlier, the input data contains each participant's identification number, name, team name, and weight at the
beginning and end of a 16-week weight program:

data weight_club;                                                 /* 1 */
   input IdNumber 1-4 Name $ 6-24 Team $ StartWeight EndWeight;   /* 2 */
   Loss = StartWeight - EndWeight;                                /* 3 */
   /* 4: the DATALINES statement through the closing semicolon */
   datalines;
1023 David Shaw          red    189 165
1049 Amelia Serrano      yellow 145 124
1219 Alan Nance          red    210 192
1246 Ravi Sinha          yellow 194 177
1078 Ashley McKnight     red    127 118
1221 Jim Brown           yellow 220   .
1095 Susan Stewart       blue   135 127
1157 Rosa Gomez          green  155 141
1331 Jason Schock        blue   187 172
1067 Kanoko Nagasaka     green  135 122
1251 Richard Rose        blue   181 166
1333 Li-Hwa Lee          green  141 129
1192 Charlene Armstrong  yellow 152 139
1352 Bette Long          green  156 137
1262 Yao Chen            blue   196 180
1087 Kim Sikorski        red    148 135
1124 Adrienne Fink       green  156 142
1197 Lynne Overby        red    138 125
1133 John VanMeter       blue   180 167
1036 Becky Redding       green  135 123
1057 Margie Vanhoy       yellow 146 132
1328 Hisashi Ito         red    155 142
1243 Deanna Hicks        blue   134 122
1177 Holly Choate        red    141 130
1259 Raoul Sanchez       green  189 172
1017 Jennifer Brooks     blue   138 127
1099 Asha Garg           yellow 148 132
1329 Larry Goss          yellow 188 174
;

The Statements

The following list corresponds to the numbered items in the preceding program:
1. The DATA statement begins the DATA step and names the data set that is being created.

2. The INPUT statement creates five variables, indicates how SAS reads the values from the input buffer, and
assigns the values to variables in the program data vector.

3. The assignment statement creates an additional variable called Loss, calculates the value of Loss during each
iteration of the DATA step, and writes the value to the program data vector.

4. The DATALINES statement marks the beginning of the input data. The single semicolon marks the end of the
input data and the DATA step.
Note: A DATA step that does not contain a DATALINES statement must end with a RUN statement.
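
As an illustration of the note, the following sketch reads an existing SAS data set instead of in-stream data, so it ends with a RUN statement. It assumes that WEIGHT_CLUB has already been created. The data set name WEIGHT_LOSS_PCT and the variable LossPct are hypothetical, and the SET statement is not otherwise covered in this section.

```sas
data weight_loss_pct;                    /* hypothetical data set name */
   set weight_club;                      /* read observations from an existing data set */
   LossPct = 100 * Loss / StartWeight;   /* hypothetical variable: percent of starting weight lost */
run;                                     /* no DATALINES statement, so RUN ends the step */
```

For an observation in which Loss is missing, LossPct is also missing, and SAS writes a note about generated missing values to the log.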

The Process

When you submit a DATA step for execution, SAS automatically compiles the DATA step and then executes it. At compile
time, SAS creates the input buffer, program data vector, and descriptor information for the data set WEIGHT_CLUB. As
the following figure shows, the program data vector contains the variables that are named in the INPUT statement, as
well as the variable Loss. The values of the _N_ and the _ERROR_ variables are automatically generated for every
DATA step. The _N_ automatic variable represents the number of times that the DATA step has iterated. The _ERROR_
automatic variable acts like a binary switch whose value is 0 if no errors exist in the DATA step, or 1 if one or more errors
exist. These automatic variables are not written to the output data set.

All variable values, except _N_ and _ERROR_, are initially set to missing. Note that missing numeric values are
represented by a period, and missing character values are represented by a blank.
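
The contents of the program data vector can be written to the log with PUT statements. The sketch below is not part of the original example: the data set name PDV_DEMO is hypothetical, only the first two records are used, and a LENGTH statement is added so that the first PUT statement can reference the character variables before the INPUT statement defines them.

```sas
data pdv_demo;                       /* hypothetical data set name */
   /* Declare the variables first so that PUT can name them before INPUT. */
   length IdNumber 8 Name $ 19 Team $ 8 StartWeight EndWeight Loss 8;
   put 'Before INPUT: ' _N_= _ERROR_= IdNumber= Name= Team=
       StartWeight= EndWeight= Loss=;
   input IdNumber 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
   Loss = StartWeight - EndWeight;
   put 'After Loss:   ' _N_= _ERROR_= IdNumber= Name= Team=
       StartWeight= EndWeight= Loss=;
   datalines;
1023 David Shaw          red    189 165
1049 Amelia Serrano      yellow 145 124
;
```

In the first "Before INPUT" line, _N_ should be 1 and _ERROR_ should be 0, the numeric variables should appear as a period, and Name and Team should appear as blank. Because the step begins a third iteration before the INPUT statement finds no more records, a third "Before INPUT" line also appears.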

Variable Values Initially Set to Missing


The syntax is correct, so the DATA step executes. As the following figure illustrates, the INPUT statement causes SAS to
read the first record of raw data into the input buffer. Then, according to the instructions in the INPUT statement, SAS
reads the data values in the input buffer and assigns them to variables in the program data vector.

Values Assigned to Variables by the INPUT Statement

When SAS assigns values to all variables that are listed in the INPUT statement, SAS executes the next statement in the
program:

Loss = StartWeight - EndWeight;

This assignment statement calculates the value for the variable Loss and writes that value to the program data vector, as
the following figure shows.

Value Computed and Assigned to the Variable Loss

SAS has now reached the end of the DATA step, and the program automatically does the following:

writes the first observation to the data set

loops back to the top of the DATA step to begin the next iteration

increments the _N_ automatic variable by 1

resets the _ERROR_ automatic variable to 0

except for _N_ and _ERROR_, sets variable values in the program data vector to missing values, as the following
figure shows

Values Set to Missing


Execution continues. The INPUT statement looks for another record to read. If there are no more records, then SAS
closes the data set and the system goes on to the next DATA or PROC step. In this example, however, more records
exist and the INPUT statement reads the second record into the input buffer, as the following figure shows.

Second Record Is Read into the Input Buffer

The following figure shows that SAS assigned values to the variables in the program data vector and calculated the value
for the variable Loss, building the second observation just as it did the first one.

Results of Second Iteration of the DATA Step

This entire process continues until SAS detects the end of the file. The DATA step iterates as many times as there are
records to read. Then SAS closes the data set WEIGHT_CLUB, and SAS looks for the beginning of the next DATA or
PROC step.

Now that SAS has transformed the collected data from raw data into a SAS data set, it can be processed by a SAS
procedure. The following output, produced with the PRINT procedure, shows the data set that has just been created.

proc print data=weight_club;
   title 'Fitness Center Weight Club';
run;

PROC PRINT Output of the WEIGHT_CLUB Data Set

                 Fitness Center Weight Club                 1

       Id                                 Start    End
Obs  Number  Name                Team    Weight  Weight  Loss

  1    1023  David Shaw          red        189     165    24
  2    1049  Amelia Serrano      yellow     145     124    21
  3    1219  Alan Nance          red        210     192    18
  4    1246  Ravi Sinha          yellow     194     177    17
  5    1078  Ashley McKnight     red        127     118     9
  6    1221  Jim Brown           yellow     220       .     .
  7    1095  Susan Stewart       blue       135     127     8
  8    1157  Rosa Gomez          green      155     141    14
  9    1331  Jason Schock        blue       187     172    15
 10    1067  Kanoko Nagasaka     green      135     122    13
 11    1251  Richard Rose        blue       181     166    15
 12    1333  Li-Hwa Lee          green      141     129    12
 13    1192  Charlene Armstrong  yellow     152     139    13
 14    1352  Bette Long          green      156     137    19
 15    1262  Yao Chen            blue       196     180    16
 16    1087  Kim Sikorski        red        148     135    13
 17    1124  Adrienne Fink       green      156     142    14
 18    1197  Lynne Overby        red        138     125    13
 19    1133  John VanMeter       blue       180     167    13
 20    1036  Becky Redding       green      135     123    12
 21    1057  Margie Vanhoy       yellow     146     132    14
 22    1328  Hisashi Ito         red        155     142    13
 23    1243  Deanna Hicks        blue       134     122    12
 24    1177  Holly Choate        red        141     130    11
 25    1259  Raoul Sanchez       green      189     172    17
 26    1017  Jennifer Brooks     blue       138     127    11
 27    1099  Asha Garg           yellow     148     132    16
 28    1329  Larry Goss          yellow     188     174    14
