You are on page 1of 108

Accessing Data

Last Updated : 29 June, 2004

Center of
Excellence

Agenda
Creating datasets using DATA step.
Infile statement
Different input styles.
Combining datasets using DATA step

What Is the SAS System?


The SAS System is an integrated system of software
products that enables you to perform
data entry, retrieval, and management
report writing and graphics
statistical and mathematical analysis
business planning, forecasting, and decision support
operations research and project management
quality improvement
applications development.

What Is the SAS System?


In addition, you can integrate with SAS
many SAS business solutions that enable
you to perform large scale business
functions, such as
Data Warehousing and Data Mining
Human Resources Management and Decision
Support
Financial Management and Decision Support

Overview of Base SAS Software


The core of the SAS System is base SAS software
SAS language
a programming language that you use to manage your
data.

SAS procedures
software tools for data analysis and reporting.

Macro facility
a tool for extending and customizing SAS software
programs and for reducing text in your programs.

DATA step debugger


a programming tool that helps you find logic problems in
DATA step programs.

Output Delivery System (ODS)


a system that delivers output in a variety of easy-toaccess formats, such as SAS data sets, listing files, or
Hypertext Markup Language (HTML).

Base SAS
data access
management
analysis
presentation

Components of the SAS


Language
SAS Files
Files with formats or structures known to SAS.
All SAS files reside in a SAS data library.
SAS data set
is structured in a format that SAS can process.

SAS catalog
Many different kinds of information that are used in a SAS job

are stored in SAS catalogs, such as instructions for reading and


printing data values, or function key settings that you use in
the SAS windowing environment.

SAS stored program


contains compiled code that you create and save for repeated

use.

Components of the SAS


Language
SAS Data Sets
SAS data file
both describes and physically stores data values.
descriptor portion

describes the contents of the SAS data set to SAS.


Data portion

data that has been collected or calculated.


An observation is a collection of data values that usually
relate to a single object. A variable is the set of data
values that describe a given characteristic.

SAS data view


does not actually store values but create logical SAS

data sets without using the storage space required by


SAS data files.

Structure of SAS Data Sets

Descriptor
Portion

SAS Data Set


General Data Set Information
Name
Number of Obs.
*Label
Number of Variables
*Date/Time Created
Storage Information
Information for Each Variable
Type
Length
Position
Name
*Label *Format *Informat
IDNUM

Data
Portion

NAME

1351
Farr, Sue
161
212
Moore, Ron
2512
Ruth, G H
...
...
...
5151
Coxe, Susan

WAGECAT

WAGERATE

S
S
S
S

3392.50
5093.75
.
1572.50

3163.00

Components of the SAS


Language
External Files
Data files that you use to read and write data, but which
are in a structure unknown to SAS.
External files can be used for storing
raw data that you want to read into a SAS data file
SAS program statements
procedure output

Database Management System Files


SAS software is able to read and write data to and from

other vendors' software, such as many common database


management system (DBMS) files.
In addition to base SAS software, you must license the
SAS/ACCESS software for your DBMS and operating
environment.

Components of the SAS


Language
SAS Language Elements
DATA step
consists of a group of statements in the SAS language that

reads raw data or existing SAS data sets to create a SAS data
set.

PROC step
A group of procedure statements used to analyze data in SAS

data sets to produce statistics, tables, reports, charts, and


plots, to create SQL queries, and to perform other analyses and
operations on your data. They also provide ways to manage
and print SAS files.

SAS Macro Facility


a powerful programming tool for extending and customizing

your SAS programs, and for reducing the amount of code that
you must enter to do common tasks. Macros are SAS files that
contain compiled macro program statements and stored text.

What Can the DATA Step Do?


You can use the DATA step in the following
ways to transform your information:
Read from a raw data file into the SAS system.
SAS Data Set

Raw Data File

Descriptor

DATA Step

What Can the DATA Step Do?


Create multiple SAS data sets in one DATA
step.

DATA Step

What Can the DATA Step Do?


Combine existing data sets.
SAS Data Set 2

SAS Data Set 1

DATA Step

What Can the DATA Step Do?


You can also add or augment information
in a variety of ways.
Create accumulating totals.
SaleDate
SaleDate

Sale
Sale
Amt
Amt

Mth2Dte
Mth2Dte

01APR2001
01APR2001
02APR2001
02APR2001
03APR2001
03APR2001
04APR2001
04APR2001
05APR2001
05APR2001

498.49
498.49
946.50
946.50
994.97
994.97
564.59
564.59
783.01
783.01

498.49
498.49
1444.99
1444.99
2439.96
2439.96
3004.55
3004.55
3787.56
3787.56

What Can the DATA Step Do?


Manipulate numeric values.

BirthDay
4253
SAS Function

Age
30

What Can the DATA Step Do?


Summarize data sets.

Salary
Salary
42000
42000
34000
34000
27000
27000
20000
20000
19000
19000
19000
19000

Div
Div
HUMRES
HUMRES
FINACE
FINACE
FLTOPS
FLTOPS
FINACE
FINACE
FINACE
FINACE
FLTOPS
FLTOPS

DATA Step

Div
Div

DivSal
DivSal

FINACE
FINACE
FLTOPS
FLTOPS
HUMRES
HUMRES

42000
42000
46000
46000
73000
73000

What Can the DATA Step Do?


presenting your data
file management
And much, much more

Words in the SAS Language


A word or token in the SAS language is a collection of
characters that communicates a meaning to SAS.
A word or token ends when SAS encounters one of the
following:
the beginning of a new token
a blank after a name or a number token
the ending quotation mark of a literal token.

Each word or token in the SAS language classified into


four categories.
Names - a series of characters that begin with a letter or an
underscore. Ex.: data, _old, yearcutoff, _n_, year_04, descending
Literal - consists of 1 to 32,767 characters enclosed in single or
double quotation marks ( Bangalore, 2003-04, Wipros
Plan,
"Report for the Third Quarter" )

Words in the SAS Language


Number - in general is composed entirely of
numeric digits, with an optional decimal point and a
leading plus or minus sign. SAS also recognizes
numeric values in the following forms as number
tokens:
scientific
(E-)
notation,
hexadecimal
notation, missing value symbols, and date and time
literals.
Ex: 1234, -2004, 1.25, 5.4E-1, 30jun04'd

Special character - is usually any single keyboard


character other than letters, numbers, the
underscore, and the blank. In general, each special
character is a single token, although some twocharacter operators, such as ** and <=, form single
tokens.
Ex: =, :, @, , +, /

Names in the SAS Language


A SAS name is a name token that represents

variables
SAS data sets
formats or informats
SAS procedures
options
arrays
statement labels
SAS macros or macro variables
SAS catalog entries
librefs or filerefs.

There are two kinds of names in SAS.


names of elements of the SAS language
names supplied by SAS users.

Names in the SAS Language


Rules for User-Supplied SAS Names

Members of SAS data libraries - 32


(SAS data sets, views, catalogs, indexes)
Generation data sets
- 28
Catalog entries
- 32
Engines, Librefs, Filerefs, Passwords - 8
DATA step variables - 32
DATA step variable labels - 256
DATA step statement labels
- 32
Arrays - 32
Functions
- 16
Formats
-8
Informats
-7
Macros, Macro variables - 32

Names in the SAS Language


Rules for User-Supplied SAS Names
The first character must be a letter (A, B, C, . . ., Z) or
underscore (_). Subsequent characters can be letters,
numeric digits (0, 1, . . ., 9), or underscores.
You can use upper or lowercase letters. SAS processes
names as uppercase regardless of how you type them.
Blanks cannot appear in SAS names.
SAS reserves a few names for automatic variables and
variable lists, SAS data sets, and librefs.
When creating variables, do not use the names of special

SAS automatic variables (for example, _N_ and _ERROR_) or


special variable list names (for example, _CHARACTER_,
_NUMERIC_, and _ALL_).
When associating a libref with a SAS data library, do not
use SASHELP, SASMSG, SASUSER, WORK .
When you create SAS data sets, do not use _NULL_, _DATA_,
_LAST_.

Names in the SAS Language


Rules for User-Supplied SAS Names
Special characters, except for the underscore,
are not allowed. In filerefs only, you can use
the dollar sign ($), pound sign (#), and at sign
(@).
When assigning a fileref to an external file, do
not use:
SASCAT.
When you create a macro variable, do not use
names that begin with SYS

SAS Dates
SAS dates are special numeric values
representing the number of days between
January 1, 1960 and a specified date.
1jan1959

1jan1960

1jan1961

1jan2000

DATE9. Informat
-365

0
SAS Date
Values

366

14610

SAS Date
Values

MMDDYY10. Format
01
01
01
01
/01
/01
/01
/01
/19
/19
/19
/20
59
60
61
00

Standard Data
The term standard data refers to character and
numeric data that SAS recognizes automatically.
Some examples of standard numeric data
include
35469.93
3E5 (exponential notation)
-46859

Standard character data is any character you


can type on your keyboard. Standard character
values are always left-justified by SAS.

Nonstandard Data
The term nonstandard data refers to
character and numeric data that SAS does
not recognize automatically.
Examples of nonstandard numeric data
include
12/12/2012
29FEB2000
4,242
$89,000

Create a SAS Data Set from a Raw


Data File
E1232
E2341
E3452
E6781
E8321
E1052
E1062
E8172
E1091

15OCT1999
01JUN1997
26OCT1993
16SEP1992
26NOV1996
27FEB1997
10MAY1987
06JAN2000
20AUG1991

raw
data
file
61065 A
contains
employee
91688
32639
information
for
28305
the level 1 flight
40440
attendants. Use the
39461
raw data file to
41463
create
the
40650
work.fltat1 SAS data
40950
set.

Desired Output
Obs

EmpID

Hire
Date

1
2
3
4
5
6
7
8
9

E1232
E2341
E3452
E6781
E8321
E1052
E1062
E8172
E1091

14532
13666
12352
11947
13479
13572
9991
14615
11554

Salary

Bonus

61065
91688
32639
28305
40440
39461
41463
40650
40950

3053.25
4584.40
1631.95
1415.25
2022.00
1973.05
2073.15
2032.50
2047.50

The DATA Statement


A DATA step always begins with a DATA
statement.
General form of a DATA statement:

DATA
DATASAS-data-set;
SAS-data-set;

The DATA statement starts the DATA step


and names the SAS data set being created.

The INFILE Statement


If you are reading data from a raw data file,
you need an INFILE statement.
General form of an INFILE statement:

INFILE
INFILE 'raw-data-file'
'raw-data-file' <options>;
<options>;

The INFILE statement points to the raw data


file being read. Options in the INFILE
statement affect how SAS reads the raw data
file.

The INPUT Statement


When you are reading from a raw data file, the
INPUT statement follows the INFILE statement.
General form of an INPUT statement:

INPUT
INPUTvariable-specification
variable-specification;
;

The INPUT statement describes the raw data


fields and specifies how you want them
converted into SAS variables.

Formatted Input
The input style tells SAS where to find the
fields and how to read them into SAS.
INPUT
INPUT@n
@nvariable-name
variable-nameinformat.
informat....;
...;

@n - moves the pointer to the starting point


of the field.
variable-name - names the SAS variable
being created.
Informat - specifies how many positions to
read and how to convert the raw data into a
SAS value.

The INPUT Statement


Common SAS informats:
$w. - reads a standard character field,
where w specifies the width of the field in
bytes.
W - reads a standard numeric field, where w
specifies the width of the field in bytes.
DATE9. - reads dates in the form
31DEC2012.

The Assignment Statement


To create a new variable in the DATA step,
use an assignment statement:

variable-name=expression;
variable-name=expression;

The assignment statement creates a SAS


variable and specifies how to calculate that
variable's value.

Create a SAS Data Set from a


Raw Data File
data work.fltat1;
infile 'raw-data-file';
input @1 EmpID $5.
@7 HireDate date9.
@17 Salary 5.;
Bonus=.05*Salary;
run;

Create a SAS Data Set from a


Raw Data File
Partial Log
NOTE: 9 records were read from the infile
'fltat1.dat'.
The minimum record length was 21.
The maximum record length was 21.
NOTE: The data set WORK.FLTAT1 has
9 observations and 4 variables.

Overview of DATA Step


Processing

Processing the DATA Step


The SAS System processes the DATA step in two
phases:
compilation
execution.

When you submit a DATA step for execution, SAS


checks the syntax of the SAS statements and
compiles them. During the compile phase, SAS
creates the following three items
input buffer
is a logical area in memory into which SAS reads each

record of raw data when SAS executes an INPUT statement.

program data vector (PDV)


is a logical area in memory where SAS builds a data set, one

observation at a time. When a program executes, SAS reads


data values from the input buffer or creates them by
executing SAS language statements.

DATA Step Processing


The

data values are assigned to the appropriate


variables in the program data vector. From here, SAS
writes the values to a SAS data set as a single
observation Along with data set variables and
computed variables, the PDV contains two automatic
variables, _N_ and _ERROR_. The _N_ variable counts
the number of times the DATA step begins to iterate.
The _ERROR_ variable signals the occurrence of an
error caused by the data during execution.

descriptor information
is information that SAS creates and maintains about

each SAS data set, including data set attributes and


variable attributes. It contains, for example, the name
of the data set and its member type, the date and time
that the data set was created, and the number, names
and data types (character or numeric) of the variables.

DATA Step Processing


The flow of action in the Execution Phase of a simple
DATA step
The DATA step begins with a DATA statement. Each time the
DATA statement executes, a new iteration of the DATA step
begins, and the _N_ automatic variable is incremented by 1.
SAS sets the newly created program variables to missing in
the program data vector (PDV).
SAS reads a data record from a raw data file into the input
buffer, or it reads an observation from a SAS data set
directly into the program data vector. You can use an INPUT,
MERGE, SET, MODIFY, or UPDATE statement to read a
record.
SAS executes any subsequent programming statements for
the current record.

DATA Step Processing


At the end of the statements, an output, return, and
reset occur automatically. SAS writes an observation
to the SAS data set, the system automatically
returns to the top of the DATA step, and the values of
variables
created
by
INPUT
and
assignment
statements are reset to missing in the program data
vector. Note that variables that you read with a SET,
MERGE, MODIFY, or UPDATE statement are not reset
to missing here.
SAS counts another iteration, reads the next record
or observation, and executes the subsequent
programming
statements
for
the
current
observation.
The DATA step terminates when SAS encounters the
end-of-file in a SAS data set or a raw data file.

DATA step
Reading External File Data

data bonus_04; [1]


infile 'your-input-file'; [2]
input IDnumber name $ salary ; [3]
bonus=salary * 0.25; [4]
run; [5]

1- Begin the DATA step and create a SAS data set


called bonus_04.
2- Specify the external file that contains your data.
3- Read a record and assign values to three
variables.
4- Calculate a value for variable bonus.
5- Execute the DATA step.

DATA step Input Styles


The INPUT statement reads raw data from instream
data lines or external files into a SAS data set and
input styles depending on the layout of data values in
the records.
INPUT, Formatted - Reads input values from specified columns
and assigns them to the corresponding SAS variables
INPUT, Column - Reads input values with specified informats
and assigns them to the corresponding SAS variables
INPUT, List - Scans the input data record for input values and
assigns them to the corresponding SAS variables
INPUT, Named - Reads data values that appear after a
variable name that is followed by an equal sign and assigns
them to corresponding SAS variables

DATA step
An informat is an instruction that SAS uses to
read data values into a variable.
The INPUT statement with an informat after a
variable name is the simplest way to read values
into a variable.
$w.
DATEw.

Reads standard character data


Reads date values in the form ddmmmyy

or

ddmmmyyyy
MMDDYYw.-

Reads date values in the form mmddyy or

mmddyyyy
w.d

COMMAw.d

Reads standard numeric data


Removes embedded characters

Accessing Data List input


List input uses a scanning method for locating data
values. Data values are must be separated by at
least one blank (or other defined delimiter). List
input requires only that you specify the variable
names and a dollar sign ($), if defining a character
variable.

Libname new /wipro/dw/data;


data new.scores;
length name $ 12;
input name $ score1 score2;
datalines;
aaaaaa 1132 1187
bbbbbbbb 1015 1102
cccc 246 357
;
Run;

Modified List Input


data scores;
infile datalines dsd;
input Name : $9. Score1-Score3 Team ~ $25. Div $;
datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
run;
output
Name Score1
Smith 12 22
Mitchel
23
Jones 09
17

Score2
Score3
Team Div
46
"Green Hornets, AtlantaAAA
19
25
"High Volts, Portland
54
"Vulcans, Las Vegas
AA

AAA

Modified List Input


The : (colon) format modifier enables you to use list input
but also to specify an informat after a variable name,
whether character or numeric. SAS reads until it
encounters a blank column.
The ~ (tilde) format modifier enables you to read and retain
single quotation marks, double quotation marks, and
delimiters within character values.
If you want SAS to read consecutive delimiters as though
there is a missing value between them, specify the DSD
option in the INFILE statment.
To read and store a character input value longer than 8
bytes, define a variable's length by using a LENGTH,
INFORMAT.
Character values cannot contain embedded blanks when
the file is delimited by blanks.
Fields must be read in order.
Data must be in standard numeric or character format.

Data Accessing - Column Input


Column input enables you to read standard data values that
are aligned in columns in the data records. Specify the
variable name, followed by a dollar sign ($) if it is a
character variable, and specify the columns in which the
data values are located in each record:
data scores;
infile datalines truncover;
input name $ 1-12 score1 17-20 score2 27-30;
datalines;
123456789101112131415161718192021222324252627282930
Riley
1132
987
Henderson
1015
1102
;
run;

Data Accessing - Column Input


To use column input, data values must be
in the same field on all the input lines
in standard numeric or character form.

Features of column input include the following


Character values can contain embedded blanks.
Character values can be from 1 to 32,767 characters long.
Placeholders, such as a single period (.), are not required for
missing data.
Input values can be read in any order, regardless of their
position in the record.
Values or parts of values can be reread.
Both leading and trailing blanks within the field are ignored.
Values do not need to be separated by blanks or other
delimiters.
Use the TRUNCOVER option on the INFILE statement to ensure
that SAS handles data values of varying lengths
appropriately.

Data Accessing - Formatted


Input
Formatted input combines the flexibility of using
informats with many of the features of column
input. By using formatted input, you can read
nonstandard data for which SAS requires additional
instructions. Formatted input is typically used with
pointer controls that enable you to control the
position of the input pointer in the input buffer
when you read data.
data scores;
input name $12. +4 score1 comma5. +6 score2
comma5.;
datalines;
Riley
1,132
1,187
Henderson
1,015
1,102
;

Data Accessing - Formatted


Input
Important points about formatted input are
Characters values can contain embedded blanks.
Character values can be from 1 to 32,767
characters long.
Placeholders, such as a single period (.), are not
required for missing data.
With the use of pointer controls to position the
pointer, input values can be read in any order,
regardless of their positions in the record.
Values or parts of values can be reread.
Formatted input enables you to read data stored
in nonstandard form, such as packed decimal or
numbers with commas.

Data Accessing - Named Input


You can use named input to read records in which
data values are preceded by the name of the
variable and an equal sign (=). The following
INPUT statement reads the data lines containing
equal signs.
data games;
input name=$ score1= score2=;
datalines;
name=abc score1=1132 score2=1187
;
run;

The MISSOVER Option


The MISSOVER option prevents SAS from loading
a new record when the end of the current record
is reached.
General form of the INFILE statement with the
MISSOVER option:
INFILE
INFILE raw-data-file
raw-data-fileMISSOVER;
MISSOVER;

If SAS reaches the end of the row without finding


values for all fields, variables without values are
set to missing.

Using the MISSOVER Option


data airplanes3;
length ID $ 5;
infile 'raw-data-file' dlm=',' missover;
input ID $
InService : date9.
PassCap CargoCap;
run;
Raw Data File
50001 ,25feb1989,132, 530
50002, 11nov1989,152
50003, 22oct1991,168, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520

Using the MISSOVER Option


Partial SAS Log
NOTE: 6 records were read from the infile
'aircraft3.dat'.
The minimum record length was 19.
The maximum record length was 26.
NOTE: The data set WORK.AIRPLANES3 has 6 observations
and 4 variables.
NOTE: DATA statement used:
real time
0.42 seconds
cpu time
0.07 seconds

Missing Values without


Placeholders
There is missing data represented by two
consecutive delimiters.
50001 ,25feb1989,, 540
50002, 11nov1989,132, 530
50003, 22oct1991,168, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180,520

Missing Values without


Placeholders
By default, SAS treats two consecutive
delimiters as one. Missing data should be
represented by a placeholder.
5 0 0 0 1 ,25feb1989 , . ,

530

The DSD Option


General form of the DSD option in the INFILE
statement:
INFILE
INFILEfile-name
file-nameDSD;
DSD;

Missing Values without


Placeholders
The DSD option
sets the default delimiter to a comma
treats consecutive delimiters as missing values
enables SAS to read values with embedded
delimiters if the value is
surrounded by double quotes.

Using the DSD Option


data airplanes4;
length ID $ 5;
infile 'raw-data-file' dsd;
input ID $
InService : date9.
PassCap CargoCap;
run;
50001 ,25feb1989,, 540
50002, 11nov1989,132, 530
50003, 22oct1991,168, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180,520

INFILE Statement Options


Problem

Option

Non-blank delimiters

DLM='delimiter(s)'

Missing data at end of row

MISSOVER

Missing data represented by


consecutive delimiters
or
Embedded delimiters where values
are surrounded by double quotes

DSD

These options can be used separately or


together in the INFILE statement.

Multiple Records Per


Observation
Farr, Sue
Anaheim, CA
869-7008
Anderson, Kay B.
Chicago, IL
483-3321
Tennenbaum, Mary Ann
Jefferson, MO
589-9030

A raw data file has


three
records
per
employee. Record 1
contains the first and
last names, record 2
contains the city and
state of residence,
and record 3 contains
the
employees
phone number.

Desired Output
The SAS data set should have one
observation per employee.
LName

FName

Farr
Anderson
Tennenbaum

Sue
Kay B.
Mary Ann

City
Anaheim
Chicago
Jefferson

State
CA
IL
MO

Phone
869-7008
483-3321
589-9030

Multiple INPUT Statements


data address;
length LName FName $ 20
City $ 25 State $ 2
Phone $ 8;
infile 'raw-data-file' dlm=',';
Load Record input LName $ FName $;
Load Record input City $ State $;
Load Record input Phone $;
run;

Line Pointer Controls


You can also use line pointer controls to
control when SAS loads a new record.
DATA
DATASAS-data-set;
SAS-data-set;
INPUT
INPUTvar-1
var-1 var-2
var-2var-3
var-3// var-4
var-4var-5;
var-5;
additional
additionalSAS
SASstatements
statements

SAS loads the next record when it


encounters a forward slash.

Reading Multiple Records Per


Observation
data address;
length LName FName $ 20
City $ 25 State $ 2
Phone $ 8;
infile 'raw-data-file' dlm=',';
Load Record input LName $ FName $ / Load Record
City $ State $ / Load Record
Phone $;
run;

Reading Multiple Records Per


Observation
Partial Log

NOTE: 9 records were read from


the infile 'addresses.dat'.
The minimum record length was 8.
The maximum record length was 20.
NOTE: The data set WORK.ADDRESS has
3 observations and 5 variables.

Reading Raw Data Files with


Multiple Records Per
Observation
Mixed Record Types
101 USA 1-20-1999 3295.50
3034 EUR 30JAN1999 1876,30
101 USA 1-30-1999 2938.00
128 USA 2-5-1999 2908.74
1345 EUR 6FEB1999 3145,60
109 USA 3-17-1999 2789.10

Reading Raw Data Files with


Multiple Records Per
Observation
Desired Output
Sales
ID
101
3034
101
128
1345
109

Location

Sale
Date

Amount

USA
EUR
USA
USA
EUR
USA

14264
14274
14274
14280
14281
14320

3295.50
1876.30
2938.00
2908.74
3145.60
2789.10

Reading Raw Data Files with


Multiple Records Per
Observation
The Single Trailing @
The single trailing @ option holds a raw data record in the
input buffer until SAS
executes an INPUT statement with no trailing @
reaches the bottom of the DATA step.
General form of an INPUT statement with the single
trailing @:

INPUT
INPUTvar1
var1var2
var2var3
var3
@;
@;

Reading Raw Data Files with


Multiple Records Per
Observation
Processing the Trailing @

Load next record.

Hold record for next


INPUT statement.

input SalesID $ Location $ @;


if location='USA' then
input SaleDate : mmddyy10.
Amount;
else if Location='EUR' then
input SaleDate : date9.
Amount : commax8.;

Multiple Observations Per


Record
The raw data file RETIRE contains each
employees identification number and this
years contribution to his or her
retirement plan. Each record contains
E00973
1400 E09872
2003 E73150
2400
information
for three
employees.
E45671 4500 E34805 1980 E47200 4371

Desired Output

EmpID

Contrib

E00973
E09872
E73150
E45671
E34805

1400
2003
2400
4500
1980

Multiple Observations Per


Record
Processing: What Is Required?
E00973 1400 E09872 2003 E73150 2400
Read for
Obs. 1

Read for
Obs. 2

Read for
Obs. 3

Process
Other
Statements

Process
Other
Statements

Process
Other
Statements

Output

Output

Output

Multiple Observations Per


Record
The Double Trailing @
The double trailing @ holds the raw data
record across iterations of the DATA step until
the line pointer moves past the end of the line.
INPUT
INPUTvar1
var1var2
var2 var3
var3
@@;
@@;

data work.retire;
length EmpID $ 6;
infile 'raw-data-file';
input EmpID $ Contrib @@;
run;
Hold until end of
record.

Multiple Observations Per


Record
Partial Log
NOTE: 2 records were read from the
infile 'retire.dat'.
The minimum record length was 35.
The maximum record length was 36.
NOTE: SAS went to a new line when INPUT
statement reached past the end of
a line.
NOTE: The data set WORK.RETIRE has
6 observations and 2 variables.
The SAS went to a new line message is expected because the
@@ option indicates that SAS should read until the end of each
record.

Multiple Observations Per


Record
Trailing @ Versus Double Trailing @
Option
Trailing @
INPUT var-1... @;
Double trailing @
INPUT var-1 ... @@;

Effect
Holds raw data record until
1) an INPUT statement with no trailing @
2) the bottom of the DATA step.
Holds raw data records in input buffer until
SAS reads past end of line.

Reading Hierarchical Raw Data


Files
Processing Hierarchical Files
Many files are hierarchical in structure, consisting of a
header record
one or more related detail records.
Header
Typically, each record contains a field
that identifies whether it is a header
record or a detail record.

Detail
Detail
Header
Header
Detail
Header
Detail
Detail

Reading Hierarchical Raw Data


Files
Processing Hierarchical Files

You can read a hierarchical file into a SAS data set by creating
observation per detail record and storing the header informatio
of each observation.
Hierarchical File

Header 1
Detail 1
Detail 2
Detail 3
Header 2
Detail 1
Header 3
Detail 1
Detail 2

SAS Data Set

Header
Variables

Detail
Variables

Header 1
Header 1
Header 1
Header 2
Header 3
Header 3

Detail 1
Detail 2
Detail 3
Detail 1
Detail 1
Detail 2

Reading Hierarchical Raw Data


Files
Creating One Observation Per Detail
E:Adams:Susan
D:Michael:C
D:Lindsay:C
E:Porter:David
D:Susan:S
E:Lewis:Dorian D.
D:Richard:C
E:Dansky:Ian
E:Nicholls:James
D:Roberta:C
E:Slaydon:Marla
D:John:S

The raw data file


DEPENDANTS has a
header
record
containing the name
of the employee and
a detail record for
each dependant on
the
employees
health insurance.

Reading Hierarchical Raw Data


Files
Desired Output
Personnel would like a list of all the dependants and the
name of the
associated employee.

EmpLName
Adams
Adams
Porter
Lewis
Nicholls
Slaydon

EmpFName
Susan
Susan
David
Dorian D.
James
Marla

DepName
Michael
Lindsay
Susan
Richard
Roberta
John

Relation
C
C
S
C
C
S

Reading Hierarchical Raw Data


Files
The RETAIN Statement
General form of the RETAIN statement:

RETAIN
RETAIN variable-name
variable-name<initial-value>;
<initial-value>;

The RETAIN statement prevents SAS from reinitializing the


values of
new variables at the top of the DATA step. This means that
values from
previous records are available for processing.

Reading Hierarchical Raw Data


Files
Hold EmpLName and EmpFName
data dependants(drop=Type);
length EmpLName EmpFName
DepName $ 20 Relation $ 1;
retain EmpLName EmpFName;
infile 'raw-data-file' dlm=':';
input Type $ @;
if Type='E' then
input EmpLName $ EmpFName $;
else do;
input DepName $ Relation $;
output;
end;
run;

Example Modified List Input


This example explains
Compilation Phase
Execution Phase.

Example Modified List Input


Raw Data File
50001
50002
50003
50004
50005
50006

Compile

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

Input Buffer

ID
$
5

PDV

Raw Data File


50001
50002
50003
50004
50005
50006

Compile

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

Input Buffer

ID
$
5

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

...

Raw Data File


50001
50002
50003
50004
50005
50006

Execute

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

Input Buffer

ID
$
5

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

.
...

Raw Data File


50001
50002
50003
50004
50005
50006

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

Input Buffer
5 0 0 0 1

ID
$
5

4 f e b 1 9 8 9

1 3 2

5 3 0

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

.
...

Raw Data File


50001
50002
50003
50004
50005
50006

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

Input Buffer
5 0 0 0 1

ID
$
5

50001

4 f eb 1 9 8 9

1 3 2

5 3 0

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

10627
.

.
132

530
.
...

Raw Data File


50001
50002
50003
50004
50005
50006

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

Implicit return

Input Buffer
5 0 0 0 1

ID
$
5

50001

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

4 f eb 1 9 8 9

1 3 2

Implicit
output

5 3 0

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

10627
.

530
.
Write out observation to airplanes.
.
132

...

Raw Data File


50001
50002
50003
50004
50005
50006

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
Implicit
output

Input Buffer
5 0 0 0 1

ID
$
5

50001

4 f eb 1 9 8 9

1 3 2

5 3 0

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

10627
.

530
.
Write out observation to airplanes.
.
132

...

Raw Data File


50001
50002
50003
50004
50005
50006

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

Implicit return

Input Buffer
5 0 0 0 1

ID
$
5

50001

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

4 f eb 1 9 8 9

1 3 2

5 3 0

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

10627
.

.
132

530
.
...

Raw Data File


50001
50002
50003
50004
50005
50006

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

Input Buffer
5 0 0 0 1

ID
$
5

4 f eb 1 9 8 9

1 3 2

5 3 0

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

.
...

Raw Data File


50001
50002
50003
50004
50005
50006

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

Input Buffer
5 0 0 0 2

ID
$
5

50002

1 1 n o v 1 98 9

1 5 2

5 4 0

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

10907
.

.
152

540
.
...

Raw Data File


50001
50002
50003
50004
50005
50006

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

Implicit return

Input Buffer
5 0 0 0 2

ID
$
5

50002

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

1 1 n o v 1 98 9

1 5 2

Implicit
output

5 4 0

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

10907
.

540
.
Write out observation to airplanes.
.
152

...

Raw Data File


50001
50002
50003
50004
50005
50006

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
Implicit
output

Input Buffer
5 0 0 0 2

ID
$
5

50002

1 1 n o v 1 98 9

1 5 2

5 4 0

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

10907
.

.
152

540
.
...

Raw Data File


50001
50002
50003
50004
50005
50006

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

Implicit return

Input Buffer
5 0 0 0 2

ID
$
5

50002

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

1 1 n o v 1 98 9

1 5 2

5 4 0

PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

10907
.

.
152

540
.
...

Raw Data File


50001
50002
50003
50004
50005
50006

4feb1989 132 530


11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520

data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;

Input Buffer
5 0 0 0 2

ID
$
5

Continue processing until


1 1 n o v 1 98 9
1 5 2
5 4 0
end of the raw
data file.
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8

Output of Dataset
proc print data=airplanes noobs;
run;

ID
50001
50002
50003
50004
50005
50006

In
Service

Pass
Cap

Cargo
Cap

10627
10907
11617
12088
12228
12772

132
152
168
172
170
180

530
540
530
550
510
520

Combining SAS Data Sets


Create a Data set from two or more
existing data sets by joining observation
side-by-side or appends the observations
from one data set to another data set.
Methods to combine SAS data sets
concatenating
interleaving
one-to-one reading
one-to-one merging
match merging
updating.

Combine SAS data sets

Concatenating
Concatenating the data
sets appends the
observations from one
data set to another data
set. The DATA step reads
DATA1 sequentially until
all observations have
been processed, and
then reads DATA2. Data
set COMBINED contains
the results of the
concatenation.

Combine SAS data sets


Interleaving
intersperses
observations from two
or more data sets,
based on one or more
common variables.

Combine SAS data sets


One-to-One Reading and Oneto-One Merging.
One-to-one reading combines
observations from two or more
SAS data sets by creating
observations that contain all
of the variables from each
contributing data set.
Observations are combined
based on their relative
position in each data set. The
DATA step stops after it has
read the last observation from
the smallest data set. One-toone merging is similar to a
one-to-one reading, with two
exceptions: you use the
MERGE statement instead of
multiple SET statements, and
the DATA step reads all
observations from all data
sets.

Combine SAS data sets


Match merging
combines observations
from two or more SAS
data sets into a single
observation in a new
data set based on the
values of one or more
common variables.

Combine SAS data sets


Identifying Data Set Contributors
When you read multiple SAS data sets in one DATA step,
you can use the IN= data set option to detect which data
set contributed to an observation.
General form of the IN= data set option:

SAS-data-set(IN=variable)
SAS-data-set(IN=variable)
where variable is any valid SAS variable name.
Variable is a temporary numeric variable with a value of:
0 to indicate false; the data set did not contribute to the

current
observation
1 to indicate true; the data set did contribute to the current
observation

Combine SAS data sets


IN= Data Set Option
Transact
Num
111
111
113
114
116

Trans
D
C
C
D
C

Amnt
126.32
560
235
14.56
371.69

Branch
Num
111
112
114
115
116

Branch
M.G.Road
Sivaji Nagar
Madiwala
Koramangala
BTM

A data set named Newtrans shows this weeks


transactions.
A data set named noactiv shows accounts with no
transactions
this week.
A data set named noacct shows transactions with no
matching

Combine SAS data sets


Num
111
111
113
114
116

Trans
D
C
C
D
C

Amnt
126.32
560
235
14.56
371.69

Num
111
112
114
115
116

Branch
M.G.Road
Sivaji Nagar
Madiwala
Koramangala
BTM

Branch
data Transact
newtrans noactiv (drop=Trans Amnt) noacct
(drop=Branch);
merge prog2.transact(in=InTrans) prog2.branch(in=InBanks);
by ActNum;
if InTrans and InBanks then output newtrans;
else if InBanks and not InTrans then output noactiv;
else if InTrans and not InBanks then output noacct;
run;

Combine SAS data sets


A data set named Newtrans shows this
weeks transactions.
Num
Trans Amnt
Branch
111
111
114
116

D
C
D
C

M.G.Road
M.G.Road
Madiwala
BTM

126.32
560
14.56
371.69

A data set named noactiv shows accounts


Num . Branch
with no transactions this week
112
115

Sivaji Nagar
Koramangala

A data set named noacct shows


transactions with no matching account
Num
Trans Amnt
number.
113

235

Questions

You might also like