
(7) Create SAS Data From Raw Data (I)

Two methods to create SAS data from raw data


SAS cannot work with raw data directly; the data must first be converted into a
SAS data set before SAS software can use them
The two methods are:
-Using the job stream (data entered in the program itself)
-Using an external file

Job Stream
Use the INPUT statement to specify the variables, their types and their order
Use CARDS or DATALINES to mark the beginning of the raw data
Enter the data one line at a time; each line becomes one record in the SAS data set
This method is useful when you have a small volume of data
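The job stream method described above can be sketched as follows. This is a minimal illustration, not taken from the original slides: the data set name work.scores, the variable names and the data values are all made up.

```sas
/* In-stream data: the raw records sit inside the program itself.   */
/* work.scores and all variable names here are hypothetical.        */
data work.scores;
    input name $ age score;   /* name is character; age and score are numeric */
    datalines;
Alice 34 88.5
Bob 29 91
Carol 41 79.5
;
run;

/* Print the result to check that three records were read. */
proc print data=work.scores;
run;
```

The lone semicolon after the last data line marks the end of the raw data.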
External File
Use the INFILE statement to specify the full path of the external (raw data) file
Use the INPUT statement to specify the variables, their types and their order
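The external file method might look like the sketch below. The path C:\mydata\scores.txt is a hypothetical placeholder, and the file is assumed to hold one blank-separated record per line (a name, an age and a score).

```sas
/* Read raw data from an external file instead of in-stream lines.  */
/* The path and the variable names are assumptions for illustration. */
data work.scores;
    infile 'C:\mydata\scores.txt';   /* full path of the raw data file */
    input name $ age score;          /* same INPUT statement as for in-stream data */
run;
```

The INFILE statement has to come before the INPUT statement, so that SAS knows where to read from when INPUT executes.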
Variable Types
-Numeric is the default
-Character: put a dollar sign ($) after the variable name when reading a character variable
Four Input Styles
List Input
Column Input
Formatted Input
Named Input

List Input
Input values must be separated by at least one blank
Character values cannot contain embedded blanks
Values must be read in order, from left to right
Values must be in standard character or numeric format
A missing character or numeric value must be represented by a period (.)
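A short sketch of list input under the rules above. The data set, the variable names and the values are made up; the second and third records use a period as the placeholder for a missing value.

```sas
/* List input: values are separated by blanks and read in order. */
/* All names and values are hypothetical.                        */
data work.list_demo;
    input id name $ weight;
    datalines;
101 Smith 72.5
102 . 68
103 Jones .
;
run;
```

If the period in the second record were left out, SAS would read 68 as the name and look for the weight on the next line, which is why the placeholder is required.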

Column Input
Data values must occupy the same columns on every input line; in other words, the
data should line up vertically
Data values must be in standard character or numeric format
Character values can contain embedded blanks
Data values can be read in any order, regardless of their position in the record
Values need not be separated by blanks or other delimiters
No placeholder, such as a period, is needed to represent missing data
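Column input might be sketched as below. The column ranges and the data are assumptions chosen for illustration: name occupies columns 1-12, age columns 14-15 and city columns 17-26.

```sas
/* Column input: each variable is read from a fixed range of columns. */
/* Names, column ranges and values are hypothetical.                  */
data work.column_demo;
    input name $ 1-12 age 14-15 city $ 17-26;
    datalines;
Mary Ann     34 New York
John         29 Boston
Li Wei          Chicago
;
run;
```

In the third record the age columns are simply left blank, so age is read as missing without a period. Because every variable carries its own column range, the variables could also be listed in a different order, for example `input city $ 17-26 name $ 1-12 age 14-15;`.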

Formatted Input
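Used when the values in the raw data file are formatted (nonstandard), for example
numbers written with dollar signs or commas, or dates
In the INPUT statement, follow each such variable with an instruction that tells SAS
how to read the value; this instruction is called an informat
SAS uses the informat to convert the formatted value in the raw data into a numeric value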
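Formatted input might look like the following sketch. The layout is an assumption: name in columns 1-8, a salary with dollar sign and commas in columns 10-19, and a hire date in columns 21-30. COMMA10. and MMDDYY10. are standard SAS informats; the data set and variable names are made up.

```sas
/* Formatted input: an informat after each variable tells SAS how to read it. */
/* Names, layout and values are hypothetical.                                 */
data work.formatted_demo;
    input name $8.            /* columns 1-8, character                  */
          +1 salary comma10.  /* skip 1 column, then strip $ and commas  */
          +1 hired mmddyy10.; /* skip 1 column, then read a date         */
    format hired date9.;      /* display the stored date value readably  */
    datalines;
Smith    $45,000.00 01/15/2020
Jones    $52,500.50 11/03/2018
;
run;
```

Both salary and hired end up as numeric variables: the salary as a plain number and the date as a SAS date value, which the FORMAT statement only affects for display.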

Named Input
Read data in which data values are preceded by the variable name and an equal sign
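A minimal sketch of named input, with made-up names and values. Each variable in the INPUT statement is followed by an equal sign, and character variables also take a dollar sign.

```sas
/* Named input: every value in the raw data is written as variable=value. */
/* All names and values are hypothetical.                                 */
data work.named_demo;
    input name=$ age= score=;
    datalines;
name=Alice age=34 score=88.5
score=91 name=Bob age=29
;
run;
```

Because each value carries its variable name, the second record can list the values in a different order from the INPUT statement.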

Using Modifiers in List Input

By default, when using List Input, character values cannot contain embedded
blanks, and character values longer than 8 characters are truncated
You can change these defaults by using modifiers
There are two modifiers:
-Ampersand (&) modifier
-Colon (:) modifier
The ampersand (&) modifier is used to read character values that contain embedded
blanks
The value is read until two or more consecutive blanks are encountered
No other delimiter can be used to mark the end of a field
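The ampersand modifier can be sketched as follows. The data set name, the variable names and the population figures are illustrative only; the $20. informat is added here so that city names longer than 8 characters are kept whole.

```sas
/* Ampersand modifier: a character value may contain single embedded blanks. */
/* The value ends at two or more consecutive blanks.                         */
/* Names and figures are hypothetical.                                       */
data work.amp_demo;
    input city & $20. population;
    datalines;
New York  8000000
Los Angeles  4000000
;
run;
```

Note the two blanks between each city and its number. With only one blank, SAS would treat the number as part of the city name.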

The colon (:) modifier is used to read nonstandard data values and character
values that are longer than 8 characters
The value is read until a delimiter is encountered or the full width of the
informat is reached, whichever occurs first
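A sketch of the colon modifier, using made-up names and values. Each variable combines list input (values separated by a blank) with an informat, so the values do not have to line up in columns.

```sas
/* Colon modifier: read up to the next delimiter, applying the informat. */
/* Names and values are hypothetical.                                    */
data work.colon_demo;
    input name : $15. salary : comma10. hired : mmddyy10.;
    format hired date9.;
    datalines;
Christopher $45,000 01/15/2020
Al $5,250.75 3/7/2019
;
run;
```

Without the colon, `$15.` would read exactly 15 columns for name and pull part of the salary into it. With the colon, reading stops at the first blank after each value.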

Reading raw data with a delimiter other than a space
A delimiter is a symbol or character that separates fields or items
By default, the delimiter is a blank (space)
Other common delimiters are:
Comma
Tab

To read raw data with a delimiter other than a space, use the DLM= option in the
INFILE statement to specify the delimiter.
Use DLM=',' to read a raw data file with a comma as the delimiter; such a file is also
called a comma delimited file or CSV file
Use DLM='09'x (the hexadecimal code of the tab character on ASCII systems) to read a
raw data file with a tab as the delimiter; such a file is also called a tab delimited file
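The two cases above might be sketched as follows. The first step reads comma-delimited in-stream data by naming DATALINES in the INFILE statement; the second assumes a tab-delimited external file at a hypothetical path. All names and values are made up.

```sas
/* Comma-delimited data read from in-stream lines. */
data work.csv_demo;
    infile datalines dlm=',';
    input name $ age score;
    datalines;
Alice,34,88.5
Bob,29,91
;
run;

/* Tab-delimited external file; the path is a hypothetical placeholder. */
data work.tab_demo;
    infile 'C:\mydata\scores_tab.txt' dlm='09'x;   /* '09'x is the tab character in ASCII */
    input name $ age score;
run;
```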
