
SASTechies

info@sastechies.com
http://www.sastechies.com
 You can use a DATA step to read raw data
into a SAS data set from multiple sources:

◦ Instream data – CARDS / DATALINES with INPUT
◦ External file – INFILE / INPUT
◦ DBMS – SAS/ACCESS to a DBMS (Oracle, SQL Server, etc.)
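
As a minimal sketch of the first source (instream data), the step below reads two records that follow a DATALINES statement. The data set name work.instream_demo and the values are made up for illustration.

```sas
/* Hypothetical example: read instream data with DATALINES */
data work.instream_demo;
   input Name $ Age;      /* Name is character, Age is numeric */
   datalines;
Alice 34
Bob 27
;
run;

/* List the resulting data set */
proc print data=work.instream_demo;
run;
```

Reading an external file follows the same pattern, with an INFILE statement naming the file in place of DATALINES.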

SAS Techies 2009 11/13/09 2
To read the raw data file, the DATA step must give the following
instructions to the SAS System:
◦ reference the external text file to be read
◦ name the SAS data set
◦ identify the external file
◦ describe the data values to be read.

Filename fileref "C:\Temp\some.txt";
Data readdata;
   Infile fileref;
   Input var1 $ var2;
Run;

 During the compilation phase, each
statement is scanned for syntax errors. Most
syntax errors prevent further processing of
the DATA step.

 If the DATA step compiles successfully, then
the execution phase begins. A DATA step
executes once for each observation in the
input data set, unless otherwise directed.

 An input buffer, an area of memory, is created to hold one
record at a time from the external file. It is a logical concept.

Note: The input buffer is created only when raw data is
read, not when a SAS data set is read.

 Then the PDV is created. The program data vector is the
area of memory where SAS software builds a data set,
one observation at a time.

 Program Data Vector (PDV): a logical framework that the SAS
System uses when creating SAS data sets.
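
One way to watch the PDV at work is to write its contents to the log with PUT _ALL_. The sketch below is illustrative only; the data set name work.pdv_demo, the variable names, and the values are assumptions.

```sas
/* Hypothetical example: print the PDV to the log on each iteration */
data work.pdv_demo;
   put 'Before INPUT: ' _all_;   /* PDV at the top of the iteration */
   input Item $ Qty;
   Double = Qty * 2;
   put 'After INPUT:  ' _all_;   /* PDV after the record has been read */
   datalines;
Mug 6
Tray 12
;
run;
```

The log lines include the automatic variables _N_ and _ERROR_ alongside Item, Qty, and Double. The first PUT runs one more time than the second, because the step stops only when INPUT tries to read past the last record.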

 During the compilation phase, SAS software
also scans each statement in the DATA step,
looking for syntax errors. Syntax errors
include:
• missing or misspelled keywords
• invalid variable names
• missing or invalid punctuation
• invalid options.

Variable attributes such as length and type are
determined the first time that a variable is
encountered.
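
A small sketch of why the first encounter matters: below, Type is first seen in the assignment of 'Low', so it becomes a character variable of length 3 and a longer value is truncated. The data set and variable names are hypothetical.

```sas
/* Hypothetical example: length is fixed at the first reference */
data work.length_demo;
   Code = 2;
   if Code = 1 then Type = 'Low';   /* first reference: character, length 3 */
   else Type = 'Medium';            /* value is truncated to 'Med' */
   put Type=;
run;

/* Declaring the length before the first use avoids the truncation */
data work.length_fixed;
   length Type $ 6;
   Code = 2;
   if Code = 1 then Type = 'Low';
   else Type = 'Medium';
   put Type=;
run;
```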

data perm.update;
   infile invent;
   input Item $ 1-13 IDnum $ 15-19 InStock 21-22 BackOrd 24-25;
   Total=instock+backord;
run;

Data Set Descriptor
Data Set Name:      PERM.UPDATE
Member Type:        DATA
Engine:             V8
Created:            11:25 Friday, August 7, 1998
Observations:       0
Variables:          5
Indexes:            0
Observation Length: 30

The attributes of Total are determined by the expression in
the assignment statement: because the expression is arithmetic,
Total is numeric with the default length of 8.

 During execution, each observation in the input data set is
processed, stored in the PDV, and then written to the new
data set as an observation, unless otherwise directed.
 The DATA step executes once for each observation in the
input data set, unless otherwise directed.
 At the beginning of the execution phase, the value of _N_ is
1. Because there are no data errors, the value of _ERROR_ is
0.
The remaining variables are initialized to missing.

 Next, the INFILE statement identifies the location of the raw
data.

When an INPUT statement begins to read data values from a record,
it uses an input pointer to keep track of its position.

data perm.update;
   infile invent;
   input Item $ 1-13 IDnum $ 15-19
         Instock 21-22 BackOrd 24-25;
   Total=instock+backord;
run;

Raw Data File Invent
----+----1----+----2----+
Bird Feeder   LG088  3 20
6 Glass Mugs  SB082  6 12
Glass Tray    BQ049 12  6
Padded Hangrs MN256 15 20
Jewelry Box   AJ498 23  0
Red Apron     AQ072  9 12
Crystal Vase  AQ672 27  0
Picnic Basket LS930 21  0
Brass Clock   AN910  2 10

At the end of the DATA step, three default actions occur.
First, the values in the PDV are written to the SAS data set as an observation.

Next, control returns to the top of the DATA step. Then the variable
values in the program data vector are reset to missing.

SAS Data Set

Item         IDnum  InStock  BackOrd  Total
Bird Feeder  LG088        3       20     23

 When reading raw data, SAS software sets the value of each variable
in the DATA step to missing at the beginning of each iteration, with
these exceptions:
◦ variables named in a RETAIN statement
◦ variables created in a sum statement
◦ data elements in a _TEMPORARY_ array
◦ any variables created with options in the FILE or INFILE statements
◦ automatic variables.
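
The first two exceptions can be sketched as follows. The data set name, variable names, and values are made up for illustration.

```sas
/* Hypothetical example: retained vs. non-retained variables */
data work.running;
   input Amount;
   retain MaxSoFar;                          /* kept across iterations, starts missing */
   RunningTotal + Amount;                    /* sum statement: retained, starts at 0 */
   MaxSoFar = max(MaxSoFar, Amount);         /* MAX ignores missing values */
   NotRetained = sum(NotRetained, Amount);   /* reset to missing every iteration */
   datalines;
10
25
5
;
run;
```

Here RunningTotal accumulates (10, 35, 40) and MaxSoFar carries forward (10, 25, 25), while NotRetained simply equals Amount on every observation because it is reset to missing at the top of each iteration.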

 The execution phase continues in this manner until there
are no more records in the raw data file to be read and
the data portion of the new data set is complete.

 At the end of the execution phase, the SAS log confirms
that the raw data file was read and displays the number
of observations and variables in the data set.

SAS log

NOTE: 9 records were read from the infile INVENT.
NOTE: The data set PERM.UPDATE has 9 observations and 5 variables.

 When reading raw data, use the INFILE statement to indicate
which file the data is in.

INFILE file-specification <options>;

Ex: Infile fileref dlm="," dsd missover lrecl= obs=

Options:
Obs=       LENGTH=
Pad        LINESIZE=
Lrecl=     MISSOVER
End=       N=
DLM=       _INFILE_
DSD        EOF=
FILEVAR=   FIRSTOBS=
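
A short sketch of a few of these options working together. It reads instream lines through INFILE DATALINES so that it is self-contained; the data set name, variable names, and values are assumptions for illustration.

```sas
/* Hypothetical example: DLM=, DSD, MISSOVER and FIRSTOBS= on comma-delimited data */
data work.delim_demo;
   infile datalines dlm=',' dsd missover firstobs=2;   /* firstobs=2 skips the header line */
   input Name :$20. City :$20. Sales;
   datalines;
Name,City,Sales
"Smith, J",Boston,1200
Lee,,950
Park,Austin
;
run;
```

DSD treats the quoted comma in "Smith, J" as data and reads the two consecutive commas as a missing City. MISSOVER sets Sales to missing for the last record instead of moving to the next line to look for a value.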

 INPUT variable <$> startcol-endcol . . . ;
where
◦ variable is the SAS name you assign to the field
◦ the dollar sign ($) identifies the variable type as character
(nothing appears here if the variable is numeric)
◦ startcol represents the starting column location in the
data line for this variable
◦ endcol represents the ending column location in the data
line for this variable
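
A minimal sketch of column input, using the INPUT statement and two of the inventory records shown earlier in this deck. Reading from DATALINES and the data set name work.column_demo are assumptions made to keep the example self-contained.

```sas
/* Column input: each variable is read from fixed column positions */
data work.column_demo;
   input Item $ 1-13 IDnum $ 15-19 InStock 21-22 BackOrd 24-25;
   datalines;
Bird Feeder   LG088  3 20
6 Glass Mugs  SB082  6 12
;
run;
```

Because the columns are fixed, Item can contain embedded blanks ("Bird Feeder") without any delimiter handling.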

data finance.duejan;
   set finance.loans;
   Interest=amount*(rate/12);
run;

Start of Compilation Phase

SAS Data Set Finance.Loans

Account    Amount  Rate    Months  Payment
101-1092    22000  0.1000      60   467.43
101-1731   114000  0.0950     360   958.57
101-1289    10000  0.1050      36   325.02
101-3144     3500  0.1050      12   308.52

When the SET statement is compiled, a slot is added to the
program data vector for each variable in the input data set.

 At the bottom of the DATA step (in this example, when
the RUN statement is encountered), the compilation
phase is complete and the descriptor portion of the new
SAS data set is created.

 The descriptor portion of the data set includes:
◦ name of the data set
◦ number of observations and variables
◦ names and attributes of the variables.

 Remember, _N_ and _ERROR_ are not written to the data
set. There are no observations because the DATA step
has not yet executed.

 During execution, each observation in the input data set is processed,
stored in the program data vector, and then written to the new data
set as an observation, unless otherwise directed.
 The SET statement reads the first observation from the input data set
and writes the values to the program data vector.

 First, the values in the program data vector are written to the new
data set as the first observation.
 Second, control returns to the top of the DATA step.
 Third, SAS retains the values of variables that were read from a SAS
data set with the SET statement, or that were created by a sum
statement. All other variable values, such as the variable Interest,
are set to missing.
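
The retain behavior can be observed with a PUT statement placed before the SET statement. This is a sketch under assumptions: work.loans is a small stand-in built from two rows of the Finance.Loans listing, and the data set names are hypothetical.

```sas
/* Stand-in input data set with two of the loan records */
data work.loans;
   input Account $ Amount Rate;
   datalines;
101-1092 22000 0.1000
101-1731 114000 0.0950
;
run;

/* Show what is in the PDV at the top of each iteration */
data work.duejan;
   put 'Top of iteration: ' _n_= Amount= Interest=;
   set work.loans;
   Interest = Amount*(Rate/12);
run;
```

At the top of the second iteration the log shows Amount still holding the value from the first observation (retained because it came from SET), while Interest is missing again.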

 At the beginning of the second iteration, the value of _N_
is set to 2 and the value of _ERROR_ is reset to 0.

 Remember, the automatic variable _N_ keeps track of
the number of times the DATA step has begun to
execute.

 At the end of each iteration, SAS writes the observation to the output
data set, control returns to the top of the DATA step, and so on.

 SAS Log: A note in the SAS log displays the number of observations
and variables in the new data set. The log also reports any errors that
occurred during compilation or execution.

 Recognizing Errors in a DATA Step Program: This section teaches
you how to debug common DATA step programming errors. After
completing this section, you will be able to
◦ recognize and diagnose syntax errors
◦ recognize and diagnose execution-time errors
◦ diagnose errors in programming logic.

 Compile-time errors, including syntax errors such as
missing or invalid punctuation or misspelled keywords.

 Execution-time errors, such as illegal mathematical
operations or processing a character variable as a
numeric variable. Execution-time errors are detected
after compilation, during the execution of the DATA step.

 In addition, any errors in your program logic can
sometimes cause a DATA step program to produce
results that are different from what you expect.
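
As an illustration of an execution-time error, the sketch below divides by zero on its second record. The step compiles cleanly and the problem only appears while it runs. The data set name, variable names, and values are hypothetical.

```sas
/* Hypothetical example: an execution-time error (division by zero) */
data work.error_demo;
   input Sales Units;
   PricePerUnit = Sales / Units;   /* Units=0 on the second record */
   put _n_= PricePerUnit= _error_=;
   datalines;
100 4
50 0
;
run;
```

SAS writes a division-by-zero note to the log, sets PricePerUnit to missing and _ERROR_ to 1 for that observation, and still creates the data set with both observations. A compile-time error, by contrast (for example a misspelled keyword), would be reported before any record is read.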

 When the DATA step compiles, the SAS data set Work.Annual is created.
However, due to the syntax error, the DATA step does not execute. The
new data set contains no observations or variables.

 Note that SAS does not correct the misspelled word in your program.

 If no syntax errors are detected or if SAS can interpret the syntax
errors, the DATA step compiles and then executes.

 Most execution-time errors produce warning messages but allow the
SAS program to continue executing.
Note: If you process a DATA step in noninteractive mode, execution-time
errors may cause the program to stop processing.

 The new data set is created and contains nine observations, even
though some values are missing.

NOTE: Invalid data for RecHR in line 14 35-37.
RULE: ----+----1----+----2----+----3----+----4----+----5---
14 2575 Quigley, M 74 152 Q13 11 26 I

ID=2575 Name=Quigley, M RestHR=74 MaxHR=152 RecHR=. TimeMin=11
TimeSec=26 Tolerance=I _ERROR_=1 _N_=14

NOTE: 21 records were read from the infile TESTS.
The minimum record length was 45.
The maximum record length was 45.
NOTE: The data set CLINIC.STRESS has 21 observations and 8 variables.
NOTE: DATA statement used:
      real time 2.04 seconds
      cpu time  0.06 seconds

 PUT Statement: When the source of program errors may not be apparent,
you can use the PUT statement to examine variable values and generate
your own message in the log.

data test;
   if code='1' then Type='Variable';
   else if code='2' then Type='Fixed';
   else put 'MY NOTE: invalid value: ' code=;
run;

Data step Debugger

 By default, a PROC PRINT step lists all the variables in a data set.
You can select variables and control the order in which they appear by
using a VAR statement in your PROC PRINT step.
 To change the text of the Obs column heading, you can specify the
OBS= option.
 To remove the Obs column, you can specify the NOOBS option.

proc print data=clinic.admit obs='Patient' label double split='*';
   var age height weight fee;
   where age>30;
   by age;      /* SUMBY needs a BY statement; data must be sorted by age */
   sum fee;
   sumby age;
   label age='Age Today';
run;

Sample Output:

Patient  Age  Height  Weight     Fee
      1   27      72     168   85.20
      2   34      66     152  124.80
      3   31      61     123  149.75
      4   43      63     137  149.75
      5   51      71     158  124.80
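
The OBS= and NOOBS options can be tried on any data set. The sketch below assumes the SASHELP.CLASS sample data set that ships with SAS; the heading text 'Student' is made up.

```sas
/* Replace the Obs column heading with custom text */
proc print data=sashelp.class obs='Student';
   var name age height;
run;

/* Suppress the Obs column entirely */
proc print data=sashelp.class noobs;
   var name age height;
run;
```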

 IF condition THEN statement;
 IF … THEN … ELSE …;
 DO i=1 TO 10 BY 3; …statements… END;
 DO WHILE …
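
The control statements above can be combined in one small DATA step. This is a sketch only; the data set name, the variable names, and the threshold are hypothetical.

```sas
/* Hypothetical example: IF/THEN/ELSE, iterative DO, and DO WHILE */
data work.loop_demo;
   do i = 1 to 10 by 3;              /* i takes the values 1, 4, 7, 10 */
      if i > 5 then Size = 'High';
      else Size = 'Low';
      output;                        /* write one observation per pass */
   end;

   n = 0;
   do while (n < 3);                 /* condition is checked before each pass */
      n + 1;
   end;
   put n=;                           /* writes n=3 to the log */
run;
```

Because the step contains an explicit OUTPUT statement, observations are written only inside the first loop, giving four observations in work.loop_demo.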
