SAS cannot use raw data directly; raw data must first be converted into a SAS data set before SAS software can work with it. There are two methods for creating a SAS data set from raw data:
-Using Job Stream
-Using External File
Job Stream
-Use the INPUT statement to specify the variables, their types, and their sequence
-Use CARDS or DATALINES to indicate the beginning of the raw data
-Enter the data one line at a time; each line represents one record in the SAS data set
-This method is useful when you have a small volume of data

External File
-Use the INFILE statement to specify the full path of the external (raw data) file
-Use the INPUT statement to specify the variables, their types, and their sequence

Variable Types
-Default is numeric
-Character: put a dollar sign ($) after the variable name when reading character variables

Four Input Styles
-List Input
-Column Input
-Formatted Input
-Named Input
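The two methods described above (Job Stream and External File) can be sketched as follows. This is a minimal illustration only: the data set names, variable names, and the file path are hypothetical and not taken from the original material.

```sas
/* Job stream: the raw records follow DATALINES inside the program */
data work.scores;                  /* hypothetical data set name */
    input name $ age score;        /* $ marks name as character; age and score are numeric */
    datalines;
Ann 23 88
Bob 31 92
;
run;

/* External file: INFILE points to the raw data file (hypothetical path) */
data work.scores_ext;
    infile 'C:\data\scores.txt';
    input name $ age score;        /* same INPUT statement as above */
run;
```

In both steps the INPUT statement is identical; only the source of the records changes.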
List Input
-Input values must be separated by at least one blank
-Character values cannot contain embedded blanks
-Values must be read in order
-Values must be in standard character or numeric format
-A missing character or numeric value must be represented by a period (.)
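As a small sketch of the List Input rules above, the following step reads blank-separated values in order and uses a period as the placeholder for a missing value. The data set name, variable names, and data values are made up for illustration.

```sas
data work.patients;                /* hypothetical data set name */
    input id gender $ weight;      /* values are read left to right, in this order */
    datalines;
101 F 130
102 . 175
103 M .
;
run;
```

In the second record the period stands for a missing character value (gender); in the third it stands for a missing numeric value (weight).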
Column Input
-Data values must be in the same columns on every input line; in other words, the data should line up vertically
-Data values must be in standard character or numeric format
-Character values can contain embedded blanks
-Data values can be read in any order, regardless of their position in the record
-Values need not be separated by blanks or other delimiters
-No placeholder, such as a period, is needed to represent missing data
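The Column Input rules above might look like this in practice. The column ranges, names, and values are assumptions chosen for the example; the point is that each variable is tied to fixed columns, so values can be read in any order, can contain embedded blanks, and can simply be left blank when missing.

```sas
data work.staff;                   /* hypothetical data set name */
    /* salary is read first even though it is the last field in the record */
    input salary 20-25 name $ 1-12 dept $ 14-18;
    datalines;
Mary Smith   SALES  52000
John Lee     HR
Ana de Souza IT     61000
;
run;
```

The second record has no salary; because columns 20-25 are blank, salary is set to missing without any placeholder.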
Formatted Input
-The values in the raw data file are in a nonstandard (formatted) form, such as numbers with dollar signs or commas, or dates
-In the INPUT statement, associate each such variable with an informat, which tells SAS how to read the value
-SAS uses the informat to convert the formatted value in the raw data into a numeric value
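A minimal sketch of Formatted Input, assuming made-up data in which the amount carries a dollar sign and comma and the date is written as mm/dd/yyyy. The column pointers (@n), the choice of the COMMA9. and MMDDYY10. informats, and the FORMAT statement are illustrative choices, not part of the original material.

```sas
data work.sales;                   /* hypothetical data set name */
    input @1  item     $8.         /* character informat, 8 columns          */
          @10 amount   comma9.     /* strips the dollar sign and the comma   */
          @20 saledate mmddyy10.;  /* converts the date to a SAS date value  */
    format saledate date9.;        /* display only; the stored value is numeric */
    datalines;
Widget   $1,250.50 01/15/2024
Gadget   $980.00   02/03/2024
;
run;
```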
Named Input
-Reads data in which each data value is preceded by the variable name and an equal sign

Usage of modifiers in List Input
-By default, with List Input, character values cannot contain embedded blanks, and the longest character value that can be read without truncation is 8 characters
-You can change these defaults by using modifiers
-There are two modifiers:
-Ampersand (&) modifier
-Colon (:) modifier
-The ampersand (&) modifier is used to read character values that contain embedded blanks
-The value is read until two or more consecutive blanks are encountered
-No other delimiter can be used to indicate the end of each field
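A short sketch of the ampersand (&) modifier described above. The data are invented, and the $20. informat is added here as an assumption so that names longer than the default 8 characters are not truncated. Note the two blanks that separate each city name from the number that follows it.

```sas
data work.cities;                  /* hypothetical data set name */
    /* & lets city contain single embedded blanks;
       the value ends at two or more consecutive blanks */
    input city & $20. population;
    datalines;
New York  8336817
Los Angeles  3979576
;
run;
```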
-The colon (:) modifier is used to read nonstandard data values and character values that are longer than 8 characters
-The value is read until a delimiter is encountered or the full length of the informat is reached, whichever occurs first
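The colon (:) modifier could be used as in the sketch below. The names, values, and informats are illustrative assumptions; each value is still separated by a single blank, as in ordinary List Input, but an informat is applied to it.

```sas
data work.employees;               /* hypothetical data set name */
    input lastname : $15.          /* character value longer than 8 characters */
          salary   : comma10.      /* nonstandard numeric value with commas    */
          hired    : date9.;       /* date written as ddMONyyyy                */
    datalines;
Montgomery 72,500 15MAR2019
Li 1,105,000 02JAN2021
;
run;
```

Because reading stops at the next delimiter, the short name Li does not cause SAS to read 15 columns and swallow part of the salary.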
Reading raw data with a delimiter other than a space
-A delimiter is a symbol or character that separates the fields or items in a record
-By default, the blank (space) is the delimiter
-Other common delimiters are:
-Comma
-Tab
-To read raw data with a delimiter other than a blank, use the DLM= option in the INFILE statement to specify the delimiter
-Use DLM=',' to read a raw data file with a comma as the delimiter; such a file is also called a comma-delimited file or CSV file
-Use DLM='09'x to read a raw data file with a tab as the delimiter; such a file is also called a tab-delimited file
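The DLM= option above can be sketched as follows. The data set names, variables, and the file path are hypothetical. The first step applies the option to in-stream data by naming DATALINES in the INFILE statement; the second assumes a tab-delimited external file.

```sas
/* Comma-delimited data supplied in the job stream */
data work.csv_demo;
    infile datalines dlm=',';
    input name $ age city $;
    datalines;
Ann,23,Boston
Bob,31,Denver
;
run;

/* Tab-delimited external file (hypothetical path); '09'x is the hexadecimal code for tab */
data work.tab_demo;
    infile 'C:\data\scores.tab' dlm='09'x;
    input name $ age city $;
run;
```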