Accessing Data

Last Updated : 29 June, 2004

Center of Excellence

Agenda
Creating datasets using DATA step. Infile statement Different input styles. Combining datasets using DATA step

What Is the SAS System?
 The SAS System is an integrated system of software products that enables you to perform
 data entry, retrieval, and management  report writing and graphics  statistical and mathematical analysis  business planning, forecasting, and decision support  operations research and project management  quality improvement  applications development.

What Is the SAS System?
In addition, you can integrate with SAS many SAS business solutions that enable you to perform large scale business functions, such as
 Data Warehousing and Data Mining  Human Resources Management and Decision Support  Financial Management and Decision Support

Overview of Base SAS Software
 The core of the SAS System is base SAS software  SAS language
 a programming language that you use to manage your data.

 SAS procedures
 software tools for data analysis and reporting.

 Macro facility
 a tool for extending and customizing SAS software programs and for reducing text in your programs.

 DATA step debugger
 a programming tool that helps you find logic problems in DATA step programs.

 Output Delivery System (ODS)
 a system that delivers output in a variety of easy-toaccess formats, such as SAS data sets, listing files, or Hypertext Markup Language (HTML).

Base SAS data access management analysis presentation

Components of the SAS Language
 SAS Files  Files with formats or structures known to SAS.  All SAS files reside in a SAS data library.
 SAS data set
 is structured in a format that SAS can process.

 SAS catalog
 Many different kinds of information that are used in a SAS job

are stored in SAS catalogs, such as instructions for reading and printing data values, or function key settings that you use in the SAS windowing environment.

 SAS stored program
 contains compiled code that you create and save for repeated

use.

Components of the SAS Language
SAS Data Sets SAS data file
 both describes and physically stores data values.
 descriptor portion

– describes the contents of the SAS data set to SAS.
 Data portion

– data that has been collected or calculated. – An observation is a collection of data values that usually relate to a single object. A variable is the set of data values that describe a given characteristic.

 SAS data view
 does not actually store values but create logical SAS data

sets without using the storage space required by SAS data files.

Structure of SAS Data Sets
SAS Data Set General Data Set Information Name Number of Obs. *Label Number of Variables *Date/Time Created Storage Information Information for Each Variable Type Length Position Name *Label *Format *Informat IDNUM Data Portion NAME WAGECAT S S S S S WAGERATE 3392.50 5093.75 . 1572.50 3163.00 1351 Farr, Sue 161 212 Moore, Ron 2512 Ruth, G H ... ... ... 5151 Coxe, Susan

Descriptor Portion

Components of the SAS Language
 External Files
 Data files that you use to read and write data, but which are in a structure unknown to SAS.  External files can be used for storing
 raw data that you want to read into a SAS data file  SAS program statements  procedure output

 Database Management System Files
 SAS software is able to read and write data to and from

other vendors' software, such as many common database management system (DBMS) files.  In addition to base SAS software, you must license the SAS/ACCESS software for your DBMS and operating environment.

Components of the SAS Language
 SAS Language Elements
 DATA step
 consists of a group of statements in the SAS language that reads

raw data or existing SAS data sets to create a SAS data set.

 PROC step

A group of procedure statements used to analyze data in SAS data sets to produce statistics, tables, reports, charts, and plots, to create SQL queries, and to perform other analyses and operations on your data. They also provide ways to manage and print SAS files. a powerful programming tool for extending and customizing your SAS programs, and for reducing the amount of code that you must enter to do common tasks. Macros are SAS files that contain compiled macro program statements and stored text.

 SAS Macro Facility

What Can the DATA Step Do?
You can use the DATA step in the following ways to transform your information:
 Read from a raw data file into the SAS system. SAS Data Set
Descriptor

Raw Data File

DATA Step

What Can the DATA Step Do?
 Create multiple SAS data sets in one DATA step.

DATA Step

What Can the DATA Step Do?
 Combine existing data sets.
SAS Data Set 1 SAS Data Set 2

DATA Step

What Can the DATA Step Do?
You can also add or augment information in a variety of ways.
 Create accumulating totals. SaleDate SaleDate 01APR2001 01APR2001 02APR2001 02APR2001 03APR2001 03APR2001 04APR2001 04APR2001 05APR2001 05APR2001 Sale Sale Amt Amt 498.49 498.49 946.50 946.50 994.97 994.97 564.59 564.59 783.01 783.01 Mth2Dte Mth2Dte 498.49 498.49 1444.99 1444.99 2439.96 2439.96 3004.55 3004.55 3787.56 3787.56

What Can the DATA Step Do?
 Manipulate numeric values.

BirthDay 4253 SAS Function Age 30

What Can the DATA Step Do?
 Summarize data sets.

Salary Salary 42000 42000 34000 34000 27000 27000 20000 20000 19000 19000 19000 19000

Div Div HUMRES HUMRES FINACE FINACE FLTOPS FLTOPS FINACE FINACE FINACE FINACE FLTOPS FLTOPS Div Div DATA Step FINACE FINACE FLTOPS FLTOPS HUMRES HUMRES DivSal DivSal 42000 42000 46000 46000 73000 73000

What Can the DATA Step Do?
presenting your data  file management And much, much more……………

Words in the SAS Language
 A word or token in the SAS language is a collection of characters that communicates a meaning to SAS.  A word or token ends when SAS encounters one of the following:
 the beginning of a new token  a blank after a name or a number token  the ending quotation mark of a literal token.

 Each word or token in the SAS language classified into four categories.
 Names - a series of characters that begin with a letter or an underscore. Ex.: data, _old, yearcutoff, _n_, year_04, descending  Literal - consists of 1 to 32,767 characters enclosed in single or double quotation marks ( ‘Bangalore’, “2003-04”, ‘ Wipro”s Plan’,  "Report for the Third Quarter" )

Words in the SAS Language
 Number - in general is composed entirely of numeric digits, with an optional decimal point and a leading plus or minus sign. SAS also recognizes numeric values in the following forms as number tokens: scientific (E-) notation, hexadecimal notation, missing value symbols, and date and time literals.
 Ex: 1234, -2004, 1.25, 5.4E-1, ’30jun04'd

 Special character - is usually any single keyboard character other than letters, numbers, the underscore, and the blank. In general, each special character is a single token, although some twocharacter operators, such as ** and <=, form single tokens.
 Ex: =, :, @, ‘, +, /

Names in the SAS Language
 A SAS name is a name token that represents
          variables SAS data sets formats or informats SAS procedures options arrays statement labels SAS macros or macro variables SAS catalog entries librefs or filerefs.

 There are two kinds of names in SAS.
 names of elements of the SAS language  names supplied by SAS users.

Names in the SAS Language
 Rules for User-Supplied SAS Names
             Members of SAS data libraries (SAS data sets, views, catalogs, indexes) Generation data sets Catalog entries Engines, Librefs, Filerefs, Passwords DATA step variables - 32 DATA step variable labels - 256 DATA step statement labels Arrays - 32 Functions - 16 Formats Informats -7 Macros, Macro variables - 32 - 32 - 28 - 32 -8

- 32

-8

Names in the SAS Language
 Rules for User-Supplied SAS Names
 The first character must be a letter (A, B, C, . . ., Z) or underscore (_). Subsequent characters can be letters, numeric digits (0, 1, . . ., 9), or underscores.  You can use upper or lowercase letters. SAS processes names as uppercase regardless of how you type them.  Blanks cannot appear in SAS names.  SAS reserves a few names for automatic variables and variable lists, SAS data sets, and librefs.
 When creating variables, do not use the names of special SAS automatic variables (for

example, _N_ and _ERROR_) or special variable list names (for example, _CHARACTER_, _NUMERIC_, and _ALL_).  When associating a libref with a SAS data library, do not use SASHELP, SASMSG, SASUSER, WORK .  When you create SAS data sets, do not use _NULL_, _DATA_, _LAST_.

Names in the SAS Language
Rules for User-Supplied SAS Names
 Special characters, except for the underscore, are not allowed. In filerefs only, you can use the dollar sign ($), pound sign (#), and at sign (@).  When assigning a fileref to an external file, do not use:  SASCAT.  When you create a macro variable, do not use names that begin with SYS

SAS Dates
SAS dates are special numeric values representing the number of days between January 1, 1960 and a specified date.
1jan1959 1jan1960 1jan1961 1jan2000 DATE9. Informat -365 0
SAS Date Values

366

14610

SAS Date Values

MMDDYY10. Format 01 01 01 01 /01 /01 /01 /01 /19 /19 /19 /20 59 60 61 00

Standard Data
The term standard data refers to character and numeric data that SAS recognizes automatically. Some examples of standard numeric data include
 35469.93  3E5 (exponential notation)  -46859

Standard character data is any character you can type on your keyboard. Standard character values are always left-justified by SAS.

Nonstandard Data
The term nonstandard data refers to character and numeric data that SAS does not recognize automatically. Examples of nonstandard numeric data include
 12/12/2012  29FEB2000  4,242  $89,000

Create a SAS Data Set from a Raw Data File
E1232 E2341 E3452 E6781 E8321 E1052 E1062 E8172 E1091 15OCT1999 01JUN1997 26OCT1993 16SEP1992 26NOV1996 27FEB1997 10MAY1987 06JAN2000 20AUG1991

raw data file 61065 A contains employee 91688 information for 32639 the level 1 flight 28305 attendants. Use the 40440 raw data file to 39461 41463 create the work.fltat1 40650 SAS data set. 40950

Desired Output
Obs 1 2 3 4 5 6 7 8 9 EmpID E1232 E2341 E3452 E6781 E8321 E1052 E1062 E8172 E1091 Hire Date 14532 13666 12352 11947 13479 13572 9991 14615 11554 Salary 61065 91688 32639 28305 40440 39461 41463 40650 40950 Bonus 3053.25 4584.40 1631.95 1415.25 2022.00 1973.05 2073.15 2032.50 2047.50

The DATA Statement
A DATA step always begins with a DATA statement. General form of a DATA statement:

DATA SAS-data-set; DATA SAS-data-set;

The DATA statement starts the DATA step and names the SAS data set being created.

The INFILE Statement
If you are reading data from a raw data file, you need an INFILE statement. General form of an INFILE statement:
INFILE 'raw-data-file' <options>; INFILE 'raw-data-file' <options>;

The INFILE statement points to the raw data file being read. Options in the INFILE statement affect how SAS reads the raw data file.

The INPUT Statement
When you are reading from a raw data file, the INPUT statement follows the INFILE statement. General form of an INPUT statement:

INPUT variable-specification …; INPUT variable-specification …;

The INPUT statement describes the raw data fields and specifies how you want them converted into SAS variables.

Formatted Input
The input style tells SAS where to find the fields and how to read them into SAS.
INPUT @n variable-name informat. ...; INPUT @n variable-name informat. ...;

@n - moves the pointer to the starting point of the field. variable-name - names the SAS variable being created. Informat - specifies how many positions to read and how to convert the raw data into a SAS value.

The INPUT Statement
Common SAS informats: $w. - reads a standard character field, where w specifies the width of the field in bytes. W - reads a standard numeric field, where w specifies the width of the field in bytes. DATE9. - reads dates in the form 31DEC2012.

The Assignment Statement
To create a new variable in the DATA step, use an assignment statement:

variable-name=expression; variable-name=expression;

The assignment statement creates a SAS variable and specifies how to calculate that variable's value.

Create a SAS Data Set from a Raw Data File
data work.fltat1; infile 'raw-data-file'; input @1 EmpID $5. @7 HireDate date9. @17 Salary 5.; Bonus=.05*Salary; run;

Create a SAS Data Set from a Raw Data File
Partial Log NOTE: 9 records were read from the infile 'fltat1.dat'. The minimum record length was 21. The maximum record length was 21. NOTE: The data set WORK.FLTAT1 has 9 observations and 4 variables.

Overview of DATA Step Processing

Processing the DATA Step
 The SAS System processes the DATA step in two phases:
 compilation  execution.

 When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them. During the compile phase, SAS creates the following three items
 input buffer
 is a logical area in memory into which SAS reads each

record of raw data when SAS executes an INPUT statement.

 program data vector (PDV)
 is a logical area in memory where SAS builds a data set, one

observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements.

DATA Step Processing
 The

data values are assigned to the appropriate variables in the program data vector. From here, SAS writes the values to a SAS data set as a single observation Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_. The _N_ variable counts the number of times the DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an error caused by the data during execution.

 descriptor information
 is information that SAS creates and maintains about

each SAS data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

DATA Step Processing
 The flow of action in the Execution Phase of a simple DATA step
 The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.  SAS sets the newly created program variables to missing in the program data vector (PDV).  SAS reads a data record from a raw data file into the input buffer, or it reads an observation from a SAS data set directly into the program data vector. You can use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.  SAS executes any subsequent programming statements for the current record.

DATA Step Processing
 At the end of the statements, an output, return, and reset occur automatically. SAS writes an observation to the SAS data set, the system automatically returns to the top of the DATA step, and the values of variables created by INPUT and assignment statements are reset to missing in the program data vector. Note that variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing here.  SAS counts another iteration, reads the next record or observation, and executes the subsequent programming statements for the current observation.  The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.

DATA step
 Reading External File Data
     data bonus_04; [1] infile 'your-input-file'; [2] input IDnumber name $ salary ; [3] bonus=salary * 0.25; [4] run; [5]

 1- Begin the DATA step and create a SAS data set called bonus_04.  2- Specify the external file that contains your data.  3- Read a record and assign values to three variables.  4- Calculate a value for variable bonus.  5- Execute the DATA step.

DATA step Input Styles
 The INPUT statement reads raw data from instream data lines or external files into a SAS data set and input styles depending on the layout of data values in the records.
 INPUT, Formatted - Reads input values from specified columns and assigns them to the corresponding SAS variables  INPUT, Column - Reads input values with specified informats and assigns them to the corresponding SAS variables  INPUT, List - Scans the input data record for input values and assigns them to the corresponding SAS variables  INPUT, Named - Reads data values that appear after a variable name that is followed by an equal sign and assigns them to corresponding SAS variables

DATA step
An informat is an instruction that SAS uses to read data values into a variable.
 The INPUT statement with an informat after a variable name is the simplest way to read values into a variable.
 $w.  DATEw.

Reads standard character data Reads date values in the form ddmmmyy or Reads date values in the form mmddyy or

ddmmmyyyy
 MMDDYYw.

mmddyyyy
 w.d -

Reads standard numeric data  COMMAw.d Removes embedded characters

Accessing Data – List input
 List input uses a scanning method for locating data values. Data values are must be separated by at least one blank (or other defined delimiter). List input requires only that you specify the variable names and a dollar sign ($), if defining a character variable.
          Libname new “/wipro/dw/data”; data new.scores; length name $ 12; input name $ score1 score2; datalines; aaaaaa 1132 1187 bbbbbbbb 1015 1102 cccc 246 357 ; Run;

Modified List Input
data scores; infile datalines dsd; input Name : $9. Score1-Score3 Team ~ $25. Div $; datalines; Smith,12,22,46,"Green Hornets, Atlanta",AAA Mitchel,23,19,25,"High Volts, Portland",AAA Jones,09,17,54,"Vulcans, Las Vegas",AA ; run; output Name Score1 Smith 12 22 Mitchel 23 Jones 09 17 Score2 Score3 Team Div 46 "Green Hornets, Atlanta“ AAA 19 25 "High Volts, Portland“ 54 "Vulcans, Las Vegas“ AA

AAA

Modified List Input
 The : (colon) format modifier enables you to use list input but also to specify an informat after a variable name, whether character or numeric. SAS reads until it encounters a blank column.  The ~ (tilde) format modifier enables you to read and retain single quotation marks, double quotation marks, and delimiters within character values.  If you want SAS to read consecutive delimiters as though there is a missing value between them, specify the DSD option in the INFILE statment.  To read and store a character input value longer than 8 bytes, define a variable's length by using a LENGTH, INFORMAT.  Character values cannot contain embedded blanks when the file is delimited by blanks.  Fields must be read in order.  Data must be in standard numeric or character format.

Data Accessing - Column Input
 Column input enables you to read standard data values that are aligned in columns in the data records. Specify the variable name, followed by a dollar sign ($) if it is a character variable, and specify the columns in which the data values are located in each record: data scores; infile datalines truncover; input name $ 1-12 score1 17-20 score2 27-30; datalines; 123456789101112131415161718192021222324252627282930 Riley 1132 987 Henderson 1015 1102 ; run;

Data Accessing - Column Input
 To use column input, data values must be
 in the same field on all the input lines  in standard numeric or character form.

 Features of column input include the following
 Character values can contain embedded blanks.  Character values can be from 1 to 32,767 characters long.  Placeholders, such as a single period (.), are not required for missing data.  Input values can be read in any order, regardless of their position in the record.  Values or parts of values can be reread.  Both leading and trailing blanks within the field are ignored.  Values do not need to be separated by blanks or other delimiters.  Use the TRUNCOVER option on the INFILE statement to ensure that SAS handles data values of varying lengths appropriately.  

Data Accessing - Formatted Input
 Formatted input combines the flexibility of using informats with many of the features of column input. By using formatted input, you can read nonstandard data for which SAS requires additional instructions. Formatted input is typically used with pointer controls that enable you to control the position of the input pointer in the input buffer when you read data. data scores; input name $12. +4 score1 comma5. +6 score2 comma5.; datalines; Riley 1,132 1,187 Henderson 1,015 1,102 ;

Data Accessing - Formatted Input
Important points about formatted input are
 Characters values can contain embedded blanks.  Character values can be from 1 to 32,767 characters long.  Placeholders, such as a single period (.), are not required for missing data.  With the use of pointer controls to position the pointer, input values can be read in any order, regardless of their positions in the record.  Values or parts of values can be reread.  Formatted input enables you to read data stored in nonstandard form, such as packed decimal or numbers with commas.

Data Accessing - Named Input
 You can use named input to read records in which data values are preceded by the name of the variable and an equal sign (=). The following INPUT statement reads the data lines containing equal signs. data games; input name=$ score1= score2=; datalines; name=abc score1=1132 score2=1187 ; run;

The MISSOVER Option
 The MISSOVER option prevents SAS from loading a new record when the end of the current record is reached.  General form of the INFILE statement with the MISSOVER option: INFILE ‘raw-data-file’ MISSOVER; INFILE ‘raw-data-file’ MISSOVER;

 If SAS reaches the end of the row without finding values for all fields, variables without values are set to missing.

Using the MISSOVER Option
data airplanes3; length ID $ 5; infile 'raw-data-file' dlm=',' missover; input ID $ InService : date9. PassCap CargoCap; run;
Raw Data File
50001 ,25feb1989,132, 530 50002, 11nov1989,152 50003, 22oct1991,168, 530 50004, 4feb1993,172 50005, 24jun1993, 170, 510 50006, 20dec1994, 180, 520

Using the MISSOVER Option
Partial SAS Log NOTE: 6 records were read from the infile 'aircraft3.dat'. The minimum record length was 19. The maximum record length was 26. NOTE: The data set WORK.AIRPLANES3 has 6 observations and 4 variables. NOTE: DATA statement used: real time 0.42 seconds cpu time 0.07 seconds

Missing Values without Placeholders
There is missing data represented by two consecutive delimiters. 50001 ,25feb1989,, 540 50002, 11nov1989,132, 530 50003, 22oct1991,168, 530 50004, 4feb1993,172, 550 50005, 24jun1993,, 510 50006, 20dec1994, 180,520

Missing Values without Placeholders
By default, SAS treats two consecutive delimiters as one. Missing data should be represented by a placeholder. 5 0 0 0 1 ,25feb1989 , . , The DSD Option
 General form of the DSD option in the INFILE statement: INFILE ‘file-name’ DSD; INFILE ‘file-name’ DSD;

530

Missing Values without Placeholders
The DSD option
 sets the default delimiter to a comma  treats consecutive delimiters as missing values  enables SAS to read values with embedded delimiters if the value is  surrounded by double quotes.

Using the DSD Option
data airplanes4; length ID $ 5; infile 'raw-data-file' dsd; input ID $ InService : date9. PassCap CargoCap; run;
50001 ,25feb1989,, 540 50002, 11nov1989,132, 530 50003, 22oct1991,168, 530 50004, 4feb1993,172, 550 50005, 24jun1993,, 510 50006, 20dec1994, 180,520

INFILE Statement Options
Problem Non-blank delimiters Missing data at end of row Missing data represented by consecutive delimiters or Embedded delimiters where values are surrounded by double quotes Option DLM='delimiter(s)' MISSOVER DSD

These options can be used separately or together in the INFILE statement.

Multiple Records Per Observation
Farr, Sue Anaheim, CA 869-7008 Anderson, Kay B. Chicago, IL 483-3321 Tennenbaum, Mary Ann Jefferson, MO 589-9030
A raw data file has three records per employee. Record 1 contains the first and last names, record 2 contains the city and state of residence, and record 3 contains the employee’s phone number.

Desired Output
The SAS data set should have one observation per employee.
LName Farr Anderson Tennenbaum FName Sue Kay B. Mary Ann City Anaheim Chicago Jefferson State CA IL MO Phone 869-7008 483-3321 589-9030

Multiple INPUT Statements
data address; length LName FName $ 20 City $ 25 State $ 2 Phone $ 8; infile 'raw-data-file' dlm=','; Load Record input LName $ FName $; Load Record input City $ State $; Load Record input Phone $; run;

Line Pointer Controls
You can also use line pointer controls to control when SAS loads a new record.
DATA SAS-data-set; DATA SAS-data-set; INPUT var-1 var-2 var-3 // var-4 var-5; INPUT var-1 var-2 var-3 var-4 var-5; additional SAS statements additional SAS statements

SAS loads the next record when it encounters a forward slash.

Reading Multiple Records Per Observation
data address; length LName FName $ 20 City $ 25 State $ 2 Phone $ 8; infile 'raw-data-file' dlm=','; Load Record input LName $ FName $ / Load Record City $ State $ / Load Record Phone $; run;

Reading Multiple Records Per Observation
Partial Log

NOTE: 9 records were read from the infile 'addresses.dat'. The minimum record length was 8. The maximum record length was 20. NOTE: The data set WORK.ADDRESS has 3 observations and 5 variables.

Reading Raw Data Files with Multiple Records Per Observation
Mixed Record Types

101 USA 1-20-1999 3295.50 3034 EUR 30JAN1999 1876,30 101 USA 1-30-1999 2938.00 128 USA 2-5-1999 2908.74 1345 EUR 6FEB1999 3145,60 109 USA 3-17-1999 2789.10

Reading Raw Data Files with Multiple Records Per Observation
Desired Output
Sales ID 101 3034 101 128 1345 109 Sale Date 14264 14274 14274 14280 14281 14320

Location USA EUR USA USA EUR USA

Amount 3295.50 1876.30 2938.00 2908.74 3145.60 2789.10

Reading Raw Data Files with Multiple Records Per Observation
The Single Trailing @
 The single trailing @ option holds a raw data record in the input buffer until SAS  executes an INPUT statement with no trailing @  reaches the bottom of the DATA step.  General form of an INPUT statement with the single trailing @:

INPUT var1 var2 var3 … @; INPUT var1 var2 var3 … @;

Reading Raw Data Files with Multiple Records Per Observation
Processing the Trailing @

Load next record.

Hold record for next INPUT statement.

input SalesID $ Location $ @; if location='USA' then input SaleDate : mmddyy10. Amount; else if Location='EUR' then input SaleDate : date9. Amount : commax8.;

Multiple Observations Per Record
The raw data file RETIRE contains each employee’s identification number and this year’s contribution to his or her retirement plan. Each record contains E00973 1400 E09872 2003 E73150 2400 information for three employees.
E45671 4500 E34805 1980 E47200 4371 EmpID Contrib 1400 2003 2400 4500 1980

Desired Output

E00973 E09872 E73150 E45671 E34805

Multiple Observations Per Record
 Processing: What Is Required?
E00973 1400 E09872 2003 E73150 2400 Read for Obs. 1 Read for Obs. 2 Read for Obs. 3

Process Other Statements Output

Process Other Statements Output

Process Other Statements Output

Multiple Observations Per Record
The Double Trailing @
 The double trailing @ holds the raw data record across iterations of the DATA step until the line pointer moves past the end of the line. INPUT var1 var2 var3 … @@; INPUT var1 var2 var3 … @@;

data work.retire; length EmpID $ 6; infile 'raw-data-file'; input EmpID $ Contrib @@; run; Hold until end of
record.

Multiple Observations Per Record
Partial Log
NOTE: 2 records were read from the infile 'retire.dat'. The minimum record length was 35. The maximum record length was 36. NOTE: SAS went to a new line when INPUT statement reached past the end of a line. NOTE: The data set WORK.RETIRE has 6 observations and 2 variables.
The “SAS went to a new line” message is expected because the @@ option indicates that SAS should read until the end of each record.

Multiple Observations Per Record
Trailing @ Versus Double Trailing @
Option Trailing @ INPUT var-1... @; Double trailing @ INPUT var-1 ... @@; Effect Holds raw data record until 1) an INPUT statement with no trailing @ 2) the bottom of the DATA step. Holds raw data records in input buffer until SAS reads past end of line.

Reading Hierarchical Raw Data Files
Processing Hierarchical Files
•Many files are hierarchical in structure, consisting of a •header record •one or more related detail records. Header Typically, each record contains a field that identifies whether it is a header record or a detail record.

Detail Detail Header Header Detail Header Detail Detail

Reading Hierarchical Raw Data Files
Processing Hierarchical Files
Heade

Heade You can read a hierarchical file into a SAS data set by creating r1 r observationDetail per detail record and storing Detail the header informatio Variab Variab 1 of each observation. les les Detail 2 Hierarchica Detail l File 3 Heade r2 Detail 1 Heade r3 Detail 1 Detail 2 Heade Detail r 1 SAS Data1 Set Heade Detail r1 2 Heade Detail r1 3 Heade Detail r2 1 Heade Detail r3 1 Heade Detail r3 2

Reading Hierarchical Raw Data Files
Creating One Observation Per Detail
E:Adams:Susan D:Michael:C D:Lindsay:C E:Porter:David D:Susan:S E:Lewis:Dorian D. D:Richard:C E:Dansky:Ian E:Nicholls:James D:Roberta:C E:Slaydon:Marla D:John:S
raw data file  The DEPENDANTS has a header record containing the name of the employee and a detail record for each dependant on the employee’s health insurance.

Reading Hierarchical Raw Data Files
 Desired Output
 Personnel would like a list of all the dependants and the name of the  associated employee.

EmpLName Adams Adams Porter Lewis Nicholls Slaydon

EmpFName Susan Susan David Dorian D. James Marla

DepName Michael Lindsay Susan Richard Roberta John

Relation C C S C C S

Reading Hierarchical Raw Data Files
The RETAIN Statement
General form of the RETAIN statement:

RETAIN variable-name <initial-value>; RETAIN variable-name <initial-value>;

The RETAIN statement prevents SAS from reinitializing the values of new variables at the top of the DATA step. This means that values from previous records are available for processing.

Reading Hierarchical Raw Data Files
Hold EmpLName and EmpFName
data dependants(drop=Type); length EmpLName EmpFName DepName $ 20 Relation $ 1; retain EmpLName EmpFName; infile 'raw-data-file' dlm=':'; input Type $ @; if Type='E' then input EmpLName $ EmpFName $; else do; input DepName $ Relation $; output; end; run;

Example – Modified List Input
This example explains
 Compilation Phase  Execution Phase.

Example – Modified List Input
Raw Data File
50001 50002 50003 50004 50005 50006

Compile

4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;

Input Buffer

ID $ 5

PDV

Raw Data File
50001 50002 50003 50004 50005 50006

Compile

4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;

Input Buffer

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

...

Raw Data File
50001 50002 50003 50004 50005 50006

Execute

4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;

Input Buffer

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

.

.

.
...

Raw Data File
50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;

Input Buffer
5 0 0 0 1 4 f e b 1 9 8 9 1 3 2 5 3 0

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

.

.

.
...

Raw Data File
50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;

Input Buffer
5 0 0 0 1 4 f eb 1 9 8 9 1 3 2 5 3 0

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

50001

10627 .

. 132

530 .
...

Raw Data File
50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;
Implicit return Implicit output

Input Buffer
5 0 0 0 1

4 f eb 1 9 8 9

1 3 2

5 3 0

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

50001

530 . Write out observation to airplanes. . 132

10627 .

...

Raw Data File
50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;
Implicit output

Input Buffer
5 0 0 0 1 4 f eb 1 9 8 9 1 3 2

5 3 0

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

50001

530 . Write out observation to airplanes. . 132

10627 .

...

Raw Data File
50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;
Implicit return

Input Buffer
5 0 0 0 1

4 f eb 1 9 8 9

1 3 2

5 3 0

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

50001

10627 .

. 132

530 .
...

Raw Data File
50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;

Input Buffer
5 0 0 0 1 4 f eb 1 9 8 9 1 3 2 5 3 0

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

.

.

.
...

Raw Data File
50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;

Input Buffer
5 0 0 0 2 1 1 n o v 1 98 9 1 5 2 5 4 0

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

50002

10907 .

. 152

540 .
...

Raw Data File
50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;
Implicit return Implicit output

Input Buffer
5 0 0 0 2

1 1 n o v 1 98 9

1 5 2

5 4 0

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

50002

540 . Write out observation to airplanes. . 152

10907 .

...

Raw Data File
50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;
Implicit output

Input Buffer
5 0 0 0 2 1 1 n o v 1 98 9 1 5 2

5 4 0

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

50002

10907 .

. 152

540 .
...

Raw Data File
50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;
Implicit return

Input Buffer
5 0 0 0 2

1 1 n o v 1 98 9

1 5 2

5 4 0

ID $ 5

PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

50002

10907 .

. 152

540 .
...

Raw Data File
50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520

data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;

Input Buffer
5 0 0 0 2

Continue processing until 1 1 n o v 1 98 9 1 2 5 4 end of the raw5data file. 0 PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8

ID $ 5

.

.

.

Output of Dataset
proc print data=airplanes noobs; run; In Service 10627 10907 11617 12088 12228 12772 Pass Cap 132 152 168 172 170 180 Cargo Cap 530 540 530 550 510 520

ID 50001 50002 50003 50004 50005 50006

Combining SAS Data Sets
Create a Data set from two or more existing data sets by joining observation side-by-side or appends the observations from one data set to another data set. Methods to combine SAS data sets
 concatenating  interleaving  one-to-one reading  one-to-one merging  match merging  updating.

Combine SAS data sets

 Concatenating
 Concatenating the data sets appends the observations from one data set to another data set. The DATA step reads DATA1 sequentially until all observations have been processed, and then reads DATA2. Data set COMBINED contains the results of the concatenation.

Combine SAS data sets
 Interleaving
 intersperses observations from two or more data sets, based on one or more common variables.

Combine SAS data sets
 One-to-One Reading and One-toOne Merging.
 One-to-one reading combines observations from two or more SAS data sets by creating observations that contain all of the variables from each contributing data set. Observations are combined based on their relative position in each data set. The DATA step stops after it has read the last observation from the smallest data set. One-toone merging is similar to a one-toone reading, with two exceptions: you use the MERGE statement instead of multiple SET statements, and the DATA step reads all observations from all data sets.

Combine SAS data sets
 Match merging
 combines observations from two or more SAS data sets into a single observation in a new data set based on the values of one or more common variables.

Combine SAS data sets
 Identifying Data Set Contributors
 When you read multiple SAS data sets in one DATA step, you can use the IN= data set option to detect which data set contributed to an observation.  General form of the IN= data set option:

SAS-data-set(IN=variable) SAS-data-set(IN=variable)
 where variable is any valid SAS variable name.  Variable is a temporary numeric variable with a value of:
 0 to indicate false; the data set did not contribute to the current

observation  1 to indicate true; the data set did contribute to the current observation

Combine SAS data sets
 IN= Data Set Option Transact
Num 111 111 113 114 116
s

Branch
Num 111 112 114 115 116 Branch M.G.Road Sivaji Nagar Madiwala Koramangala BTM

Trans D C C D C

Amnt 126.32 560 235 14.56 371.69

A data set named Newtrans shows this week’s transactions. s A data set named noactiv shows accounts with no transactions this week. s A data set named noacct shows transactions with no matching

Combine SAS data sets
Num 111 111 113 114 116 Trans D C C D C Amnt 126.32 560 235 14.56 371.69 Num 111 112 114 115 116 Branch M.G.Road Sivaji Nagar Madiwala Koramangala BTM

Branch data Transact noactiv (drop=Trans Amnt) noacct (drop=Branch); newtrans merge prog2.transact(in=InTrans) prog2.branch(in=InBanks); by ActNum; if InTrans and InBanks then output newtrans; else if InBanks and not InTrans then output noactiv; else if InTrans and not InBanks then output noacct; run;

Combine SAS data sets
A data set named Newtrans shows this week’s transactions. Num Trans Amnt Branch
111 111 114 116 D C D C 126.32 560 14.56 371.69 M.G.Road M.G.Road Madiwala BTM

A data set named noactiv shows accounts Num with no transactions this week. Branch
112 115 Sivaji Nagar Koramangala

A data set named noacct shows transactions with no matching account Num Trans Amnt number.
113 C 235

Questions

Master your semester with Scribd & The New York Times

Special offer for students: Only $4.99/month.

Master your semester with Scribd & The New York Times

Cancel anytime.