
The Hang Seng University of Hong Kong

Department of Mathematics, Statistics and Insurance


AMS2640 Statistical Computing in Practice

Study Notes

Lesson 3 – Import SAS Dataset


✓ Entering Data with Viewtable Window
✓ Reading Raw Data Using List Input
✓ Reading Raw Data Using Column Input
✓ Reading Non-Standard Format Raw Data
✓ Reading Raw Data Using INFILE Options
✓ Reading Files Using IMPORT Options

HSUHK 1 AMS2640 Study Notes


Before SAS can analyse your data, the data must be in a special form called a SAS data set. If
your data is not stored in the form of a SAS data set, then you need to create a SAS data set by
entering data, by reading raw data, or by accessing external files. Once your data have been
read into a SAS data set, SAS keeps track of what is where and in what form.

✓ Entering Data with Viewtable Window


As discussed in the previous lesson, the Viewtable window displays tables (another name
for data sets) in a tabular format. You can start typing data into this default table, and SAS
will automatically figure out if your columns are numeric or character. However, it’s a good
idea to tell SAS about your data with the Column Attributes window.
To open the empty Viewtable window
(1) Select Table Editor from the Tools menu
(2) Right-click a column letter to open the Column Attributes window
(3) Select Save As from the File menu to save the dataset
(4) Select a library, and then specify the member name of your dataset

To browse or edit an existing table


(1) Select Table Editor from the Tools menu to open the Viewtable window
(2) Select Open from the File menu
(3) Click the library you want and then the table name.

SAS Code Example 3.1:
Tables that you create in Viewtable can be used in programs. Try typing data into the
Viewtable window for three variables: US president name, political party, and presidency
number. If you save the table in the SASUSER library under the name uspresident, you can
print it with this program:
* Print the results ;
PROC PRINT DATA=Sasuser.uspresident;
RUN;

✓ Reading Raw Data Using List Input


You may use the DATA step to read the data. SAS assumes external files have a record
length (number of characters, including spaces, in a data line) of 256 or less; this default
depends on the SAS release and operating environment, and SAS 9.4 raises it to 32,767.
To read raw data, the DATA step must provide the following information to SAS:
(1) the location or name of the external text file
(2) a name for the new SAS data set
(3) a reference that identifies the external file
(4) a DATALINES statement instead, if the data are internal to the program
(5) a description of the data values to be read.
When reading raw data, use the INFILE statement to tell SAS the filename and path, if
appropriate, of the external file containing the data. The INFILE statement follows the
DATA statement and must precede the INPUT statement. After the INFILE keyword, the
file path and name are enclosed in quotation marks.
INFILE 'drive:\directory\filename.dat';

Windows: INFILE 'c:\MyDir\President.dat';


UNIX: INFILE '/home/mydir/president.dat';
Open VMS: INFILE '[username.mydir]president.dat';
z/OS: INFILE 'MYID.PRESIDEN.DAT';

The INPUT statement, which is part of the DATA step, tells SAS how to read your raw data.
To write an INPUT statement using list input, simply list the variable names after the
INPUT keyword in the order they appear in the data file. If the values are character (not
numeric), then place a dollar sign ($) after the variable name. Leave at least one space
between names, and remember to place a semicolon at the end of the statement.

SAS Code Example 3.2 (same dataset as example 3.1):
Suppose the following data are in a file called US_president.dat in the directory
C:\Users\elainemo (Windows). The following program uses the INFILE statement to read
the external data file. You can open .dat files in Notepad to inspect the data carefully before
reading them into SAS.
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35
* Read data from external file into SAS data set;
DATA uspresid3_2a;
INFILE 'C:\Users\elainemo\US_president.dat';
INPUT President $ Party $ Number;
RUN;
PROC PRINT DATA = uspresid3_2a;
RUN;
This INPUT statement tells SAS to read three data values. The $ after President and Party
indicates that both variables are character, whereas Number is standard numeric. You can
confirm this in the log. Alternatively, we can use the FILENAME statement from the previous
lesson to assign a fileref and read the data through it.

FILENAME uspresid 'C:\Users\elainemo\US_president.dat';


DATA uspresid3_2b;
INFILE uspresid;
INPUT President $ Party $ Number;
RUN;

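To save a copy of a data set as a permanent SAS data set, give its path in quotation marks
in the DATA statement:
DATA 'drive:\directory\filename';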
SET datasetname;
RUN;
If your data lines are long, then use the LRECL= option in the INFILE statement to specify
the longest record in your data file.
INFILE 'drive:\directory\filename.dat' LRECL=2000;

• Comma-separated values files
The Import Wizard can read all types of delimited files, including comma-separated
values (.csv) files, which are a common file type for moving data between applications.
To use Import Wizard
(1) Choose Import Data from the File menu
(2) Select the type of file you are importing
(3) Specify the location of the file that you want to import
(4) By default, SAS uses the first row in the file as the variable names for the SAS data
set and starts reading data in the second row.
(5) Choose the SAS library and member name for the SAS data set that will be created
(6) Choose whether to save the PROC IMPORT statements used for importing the file

SAS Code Example 3.3 (similar dataset as example 3.1, but comma separated):
Try to import US_president.csv into the SASUSER library using the Import Wizard. Then print
the result as in example 3.1 and check the log.

• Text, ASCII, sequential, or flat files
If you type raw data directly in your SAS program, then the data are internal to your
program.
To indicate internal data using the DATALINES statement
(1) The DATALINES statement must be the last statement in the DATA step
(2) All lines in the SAS program following the DATALINES statement are considered
data until SAS encounters a line containing a semicolon
(3) The CARDS statement and the DATALINES statement are synonymous

SAS Code Example 3.4 (similar dataset as example 3.1, now different spacing):
* Read internal data into SAS data set uspresid3_4;
DATA uspresid3_4;
INPUT President $ Party $ Number;
DATALINES;
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35
;
RUN;
PROC PRINT DATA = uspresid3_4;
RUN;

SAS Code Example 3.5:


Suppose public exam results are collected from the schools in a district, with the variables
exam id, female or not, race, school type, and scores in reading, writing, mathematics, and
science. Try to read the following data using the CARDS statement and print the data. Check
the log and the Results Viewer window.
* Read internal data into SAS data set hsb3_5;
DATA hsb3_5;
INPUT id female race schtype $ read write math science;
CARDS;
147 1 1 pub 47 62 53 53
108 0 1 pri 34 33 41 36
18 . 3 pub 50 33 49 44
53 0 1 pub 39 . 40 39
50 0 2 pri 50 59 42 53
51 1 2 pri 42 36 42 31
102 0 1 . 52 41 51 53
;
RUN;
PROC PRINT DATA = hsb3_5;
TITLE 'Example 3.5';
RUN;

The TITLE statement after the PROC PRINT tells SAS to put the text enclosed in quotation
marks on the top of each page of output. If you had no TITLE statement in your program, SAS
would put the words “The SAS System” at the top of each page. Using TITLE without
arguments cancels all existing titles.

• Space-delimited files
If the values in your raw data file are all separated by at least one space, then you can use
list input (also called free-format input). List input is an easy way to read raw data into SAS,
but with ease come a few limitations. You must read all the data in a record, with no skipping
over unwanted values. Any missing data must be indicated with a period.

SAS Code Example 3.6 (same dataset as example 3.5, now in hsb.dat):
* Create a SAS data set named hsb3_6;
* Read the data file hsb.dat using list input;
DATA hsb3_6;
INFILE 'C:\Users\elainemo\hsb.dat';
INPUT id female race schtype $ read write math science;
RUN;

PROC PRINT DATA = hsb3_6;
TITLE 'Example 3.6';
RUN;

✓ Reading Raw Data Using Column Input
• Data arranged in columns
Some raw data files do not have spaces (or other delimiters) between all the values or
periods for missing data — so the files can’t be read using list input. But if each of the
variable’s values is always found in the same place in the data line, then you can use column
input as long as all the values are character or standard numeric.
Column input has the following advantages over list input
(1) Spaces are not required between values
(2) Missing values can be left blank
(3) Character data can have embedded spaces
(4) Unwanted variables can be skipped
The columns are the positions of the characters or numbers in the data line. To judge
whether a data file is suited to column input rather than list input, we can add a ruler
above the first row.
The dataset on the left is in free format, meaning data that is not arranged in columns.
Notice that the values for a particular field do not begin and end in the same columns. You
cannot use column input to read this file. The dataset on the right is arranged in columns or
fixed fields. You can specify a beginning and ending column for each field.

To do column input using INPUT statement


(1) After the INPUT keyword, list the first variable’s name
(2) If the variable is character, leave a space; then place a $
(3) After the $, or variable name if it is numeric, leave a space; then list the column or
range of columns for that variable
(4) Repeat this for all the variables you want to read

Column input has several features that make it useful for reading raw data. It can be used to
read character variable values that contain embedded blanks. Fields can be read in any order.
No placeholder is required for missing data; a blank field is read as missing. Fields or
parts of fields can be re-read. Fields do not have to be separated by blanks or other delimiters.
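
These features can be sketched with a small step. The in-stream data below are hypothetical (not one of the course files): a name sits in columns 1-16 and an age in columns 17-19. The data set name colfeatures and the variable Initial are assumptions made only for illustration.

```sas
* Hypothetical data: name in columns 1-16, age in columns 17-19;
DATA colfeatures;
   INPUT Age 17-19          /* fields can be read in any order       */
         Name $ 1-16        /* the embedded blank stays in the value */
         Initial $ 1;       /* column 1 is re-read as a new variable */
   DATALINES;
Alicia Grossman  13
Matthew Lee
Lori Newcombe     6
;
RUN;
PROC PRINT DATA = colfeatures;
RUN;
```

The second data line has nothing in columns 17-19, so Age should be missing for that observation without any period as a placeholder.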

SAS Code Example 3.7 (same dataset as example 3.4):
The following is the data file named US_president_Column.dat with ruler added on top (actual
dataset does not have the ruler).
1---+----10-
Adams   F  2
Lincoln R 16
Grant   R 18
Kennedy D 35
* Create a SAS data set named uspresid3_7;
* Read the US_president_Column.dat using column input;
DATA uspresid3_7;
INFILE 'C:\Users\elainemo\US_president_Column.dat';
INPUT President $ 1-7 Party $ 8-9 Number 10-12;
RUN;
PROC PRINT DATA = uspresid3_7;
TITLE 'Example 3.7';
RUN;

SAS Code Example 3.8 (similar dataset as example 3.5, now different spacing):
The following is the data file named hsb_Column.dat with ruler added on top (actual dataset
does not have the ruler).
1---+----10---+----20--
147 1 1 pub 47 62 53 53
108 0 1 pri 34 33 41 36
 18 . 3 pub 50 33 49 44
 53 0 1 pub 39    40 39
 50 0 2 pri 50 59 42 53
 51 1 2 pri 42 36 42 31
102 0 1   . 52 41 51 53
* Create a SAS data set named hsb3_8;
* Read the hsb_Column.dat using column input;
DATA hsb3_8;
INFILE 'C:\Users\elainemo\hsb_Column.dat';
INPUT id 1-3 schtype $ 8-11 female 4-5 race 6-7
read 12-14 write 15-17 math 18-20 science 21-23;
RUN;
PROC PRINT DATA = hsb3_8;
TITLE 'Example 3.8';
RUN;

✓ Reading Non-Standard Format Raw Data
• Column pointer, line pointer, and trailing
Column and formatted input do not require spaces (or other delimiters) between variables
and can read embedded blanks. Sometimes you use one style, sometimes another, and
sometimes the easiest way is to use a combination of styles. SAS is so flexible that you can
mix and match any of the input styles for your own convenience.
To read mixed input styles using column pointers
(1) Use @n to move to a specific column number
(2) Use +n to move forward n columns from the current position
(3) Use @'character' to move to the column immediately after a specific character string
(4) Column pointers can be used to move backwards or forwards within a data line
(5) They can skip over unneeded data, or read a variable twice using different informats
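
The @n and +n pointers can be sketched with two lines in the layout of hsb_Column.dat, typed here as in-stream data. The data set name pointer_demo and the variable idtext are hypothetical and exist only for this illustration.

```sas
* Sketch of absolute (@n) and relative (+n) column pointers;
DATA pointer_demo;
   INPUT @1 id 3.           /* @1: start at column 1, read columns 1-3    */
         +5 schtype $3.     /* +5: skip from column 4 to column 9         */
         @1 idtext $3.;     /* @1: move back, re-read columns 1-3 as text */
   DATALINES;
147 1 1 pub 47 62 53 53
108 0 1 pri 34 33 41 36
;
RUN;
PROC PRINT DATA = pointer_demo;
RUN;
```

After id is read, the pointer sits at column 4, so +5 lands on column 9 where the school type begins. Reading columns 1-3 a second time with a character informat gives the same field as both a number and a text value.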
SAS Code Example 3.9 (same dataset as example 3.8):
Instead of reading all the variables, use the column pointer to read the variables ID, school type,
math, and science.
1---+----10---+----20--
147 1 1 pub 47 62 53 53
108 0 1 pri 34 33 41 36
 18 . 3 pub 50 33 49 44
 53 0 1 pub 39    40 39
 50 0 2 pri 50 59 42 53
 51 1 2 pri 42 36 42 31
102 0 1   . 52 41 51 53
DATA hsb3_9;
INFILE 'C:\Users\elainemo\hsb_Column.dat';
INPUT id @8 schtype $ @18 math science;
/* Read the data file using column pointers */
RUN;
PROC PRINT DATA = hsb3_9;
TITLE 'Example 3.9';
RUN;

SAS Code Example 3.10:
The following data lines are part of a web log for a dog care business website. The data lines
start with the IP address of the computer accessing the web page followed by other information
including the date the file was accessed and the file name.
130.192.70.235 - - [08/Jun/2008:23:51:32 -0700] "GET /rover.jpg HTTP/1.1" 200 66820
128.32.236.8 - - [08/Jun/2008:23:51:40 -0700] "GET /grooming.html HTTP/1.0" 200 8471
128.32.236.8 - - [08/Jun/2008:23:51:40 -0700] "GET /Icons/brush.gif HTTP/1.0" 200 89
128.32.236.8 - - [08/Jun/2008:23:51:40 -0700] "GET /H_poodle.gif HTTP/1.0" 200 1852
118.171.121.37 - - [08/Jun/2008:23:56:46 -0700] "GET /bath.gif HTTP/1.0" 200 14079
128.123.121.37 - - [09/Jun/2008:00:57:49 -0700] "GET /lobo.gif HTTP/1.0" 200 18312
128.123.121.37 - - [09/Jun/2008:00:57:49 -0700] "GET /statemnt.htm HTTP/1.0" 200 238
128.75.226.8 - - [09/Jun/2008:01:59:40 -0700] "GET /Icons/leash.gif HTTP/1.0" 200 98

DATA weblogs;
INFILE 'C:\Users\elainemo\dogweblogs.dat';
INPUT @'[' AccessDate DATE11. @'GET' File :$20.;
/* Read the data file using column pointers */
RUN;

PROC PRINT DATA = weblogs;
TITLE 'Example 3.10';
RUN;

To read multiple lines of raw data per observation using line pointers
(1) Use a slash (/) to skip to the next line
(2) Use pound-n (#n) to specify the number of the line of raw data for that observation
(3) The #n pointer can be used to move backwards or forwards between multiple data lines

To read multiple observations per line of raw data using a double trailing at sign
(1) Use a double trailing at sign (@@) at the end of the INPUT statement
(2) SAS will continue to read observations from that line until it either runs out of data or
reaches an INPUT statement that does not end with a double trailing at sign

To read part of a raw data file
(1) Use a trailing at sign (@) at the end of the INPUT statement
(2) SAS will hold that line of data until it reaches either the end of the DATA step, or
an INPUT statement that does not end with a trailing at sign
(3) Test the values read so far with an IF statement to decide whether to keep the observation
(4) Read data for the remaining variables with a second INPUT statement

SAS Code Example 3.11 (same dataset as example 3.5, now different spacing):
The public exam results are recorded in a different layout in hsb_Line.dat, with each
observation spread over three lines. Try to read the data with the slash and #n line pointers.
147 1 1
pub 47 62
53 53
108 0 1
pri 34 33
41 36
18 . 3
pub 50 33
49 44
53 0 1
pub 39 .
40 39
50 0 2
pri 50 59
42 53
51 1 2
pri 42 36
42 31
102 0 1
. 52 41
51 53
DATA hsb3_11;
INFILE 'C:\Users\elainemo\hsb_Line.dat';
INPUT id female race
/ schtype $ read write
#3 math science;
/* Read the data file using line pointers */
RUN;

PROC PRINT DATA = hsb3_11;
TITLE 'Example 3.11';
RUN;

SAS Code Example 3.12 (same dataset as example 3.5, now different spacing):
The public exam results are recorded in a different layout in hsb_Trailing.dat, with more than
one observation per line. Try to read the data with the double trailing at sign.
147 1 1 pub 47 62 53 53 108 0 1 pri 34 33 41 36
18 . 3 pub 50 33 49 44 53 0 1 pub 39 . 40 39
50 0 2 pri 50 59 42 53 51 1 2 pri 42 36 42 31
102 0 1 . 52 41 51 53
* Input more than one observation from each record;
DATA hsb3_12;
INFILE 'C:\Users\elainemo\hsb_Trailing.dat';
INPUT id female race schtype $ read write
math science @@;
/* Input more than one observation from each record */
RUN;

PROC PRINT DATA = hsb3_12;
TITLE 'Example 3.12';
TITLE2 'High School Band';
/* Separate the title into 2 lines */
RUN;

SAS Code Example 3.13 (same dataset as example 3.8):
The data file hsb_Column.dat is used. Suppose you want to exclude the observations for private schools.
1---+----10---+----20--
147 1 1 pub 47 62 53 53
108 0 1 pri 34 33 41 36
 18 . 3 pub 50 33 49 44
 53 0 1 pub 39    40 39
 50 0 2 pri 50 59 42 53
 51 1 2 pri 42 36 42 31
102 0 1   . 52 41 51 53
DATA hsb3_13;
INFILE 'C:\Users\elainemo\hsb_Column.dat';
INPUT id 1-3 female 4-5 race 6-7 schtype $ 8-11 @;
IF schtype = 'pri' THEN DELETE;
/* Use a trailing @ to hold the INPUT statement,
then delete school type private */
INPUT read 12-14 write 15-17 math 18-20 science 21-23;
RUN;

PROC PRINT DATA = hsb3_13;
TITLE 'Example 3.13';
RUN;

Sometimes raw data are not straightforward numeric or character values. Examples are the
non-standard numeric data mentioned in the previous lesson, with values that contain percent
signs, dollar signs, and commas, or date and time values. We can tell SAS how to read such
values in the INPUT statement.

• Informats for non-standard data.
There are three general types of informats: character, numeric, and date, with the general forms:
Character       Numeric         Date
$informatw.     informatw.d     informatw.
The $ indicates character informats, informat is the name of the informat, w is the total
width, and d is the number of decimal places (numeric informats only). Two informats do
not have names: $w., which reads standard character data, and w.d, which reads standard
numeric data. The period is a very important part of the informat name. Some selected
informats are listed in the table below:
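
As a hedged illustration of the three general forms, the step below reads two hypothetical in-stream lines with a character informat ($10.), and two numeric informats for non-standard values (COMMA7. and PERCENT4.). The data, the data set name informat_demo, and the variable names are assumptions made for this sketch.

```sas
* Hypothetical data: item in columns 1-10, price in 12-18, share in 20-23;
DATA informat_demo;
   INPUT @1  Item  $10.        /* standard character data, 10 columns wide */
         @12 Price COMMA7.     /* strips the dollar sign and the comma     */
         @20 Share PERCENT4.;  /* reads 12% as the proportion 0.12         */
   DATALINES;
Widget      $1,250  12%
Gadget        $980   7%
;
RUN;
PROC PRINT DATA = informat_demo;
RUN;
```

Each informat ends with a period. Without it, SAS would treat COMMA7 as a variable name rather than an informat.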

Dates are perhaps the most common non-standard data. Using date informats, SAS will
convert conventional forms of dates like 10-31-2007 or 31OCT07 into a number, the
number of days since January 1, 1960. This number is referred to as a SAS date value.
SAS stores time values similar to the way it stores date values. A SAS time value is stored
as the number of seconds since midnight. A SAS datetime is a special value that combines
both date and time information. A SAS datetime value is stored as the number of seconds
between midnight on January 1, 1960, and a given date and time.
Let's look at some informat examples below:
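
A minimal sketch of date and time informats follows, using hypothetical in-stream values. It assumes the default YEARCUTOFF= setting, so the two-digit years 07 and 60 are read as 2007 and 1960. The data set name date_demo and all variable names are invented for illustration.

```sas
* Read dates and times, then print the stored numbers;
DATA date_demo;
   INPUT @1  EnterDate MMDDYY10.   /* dates such as 10-31-2007 */
         @12 ShortDate DATE7.      /* dates such as 31OCT07    */
         @20 StartTime TIME8.;     /* times such as 12:30:00   */
   Readable = EnterDate;           /* same number, shown with a date format */
   FORMAT Readable DATE9.;
   DATALINES;
10-31-2007 31OCT07 12:30:00
01-01-1960 01JAN60 00:01:00
;
RUN;
PROC PRINT DATA = date_demo;
RUN;
```

In the second observation both dates are January 1, 1960, so they are stored as 0, and the time 00:01:00 is stored as 60 seconds after midnight. EnterDate prints as a plain number because it has no format, while Readable shows the same value as a date.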

SAS Code Example 3.14
Results from a local pumpkin-carving contest are collected. Each line includes the contestant’s
name, age, type (carved or decorated), the date the pumpkin was entered, and the scores from
each of five judges. The following is a sample of the data file named Pumpkin_Column.dat.
1---+----10---+----20---+----30---+----40---+----50-
Alicia Grossman  13 c 10-28-2008 7.8 6.5 7.2 8.0 7.9
Matthew Lee       9 D 10-30-2008 6.5 5.9 6.8 6.0 8.1
Elizabeth Garcia 10 C 10-29-2008 8.9 7.9 8.5 9.0 8.8
Lori Newcombe     6 D 10-30-2008 6.7 5.6 4.9 5.2 6.1
Jose Martinez     7 d 10-31-2008 8.9 9.510.0 9.7 9.0
Brian Williams   11 C 10-29-2008 7.8 8.4 8.5 7.9 8.0
DATA contest3_14;
INFILE 'C:\Users\elainemo\Pumpkin_Column.dat';
INPUT Name $16. Age 3. +1 Type $1. +1 Date MMDDYY10.
(Score1 Score2 Score3 Score4 Score5) (4.1);
/* Read the data file using informat */
RUN;

PROC PRINT DATA = contest3_14;
TITLE 'Example 3.14';
RUN;

The variable Name has an informat of $16., meaning that it is a character variable 16 columns
wide. Variable Age has an informat of 3., so it is numeric, three columns wide, and has no
decimal places. The +1 skips over one column. Variable Type is character, and it is one column
wide. Variable Date has an informat MMDDYY10. and reads dates in the form 10-31-2007 or
10/31/2007, each 10 columns wide. The remaining variables, Score1 through Score5, all
require the same informat, 4.1. By putting the variables and the informat in separate sets of
parentheses, you only have to list the informat once.

• Modifying List Input
Apart from using column input with informats, we can use informats with modified list input.
To read character values that contain embedded blanks
(1) Use the ampersand (&) modifier with list input to read character values that might contain
one or more single embedded blanks
(2) The value is read until two or more consecutive blanks are encountered
(3) You must use two consecutive blanks as delimiters when you use the & modifier
(4) Use a LENGTH statement to define the length of the variable

To read nonstandard data or character values longer than eight characters
(1) Use the colon (:) modifier to read a value until a blank (or other delimiter) is encountered
(2) It can read nonstandard data values and character values that are longer than eight
characters
(3) Specify the informat for that variable after the colon (:) modifier

SAS Code Example 3.15 (similar dataset as example 3.14, with different spacing):
The following is a sample of the data file named Pumpkin_List.dat.
Alicia Grossman  13 c 10-28-2008 7.8 6.5 7.2 8.0 7.9
Matthew Lee  9 D 10-30-2008 6.5 5.9 6.8 6.0 8.1
Elizabeth Garcia  10 C 10-29-2008 8.9 7.9 8.5 9.0 8.8
Lori Newcombe  6 D 10-30-2008 6.7 5.6 4.9 5.2 6.1
Jose Martinez  7 d 10-31-2008 8.9 9.5 10.0 9.7 9.0
Brian Williams  11 C 10-29-2008 7.8 8.4 8.5 7.9 8.0
DATA contest3_15;
INFILE 'C:\Users\elainemo\Pumpkin_List.dat';
LENGTH Name $ 16;
INPUT Name & Age Type $ Date : MMDDYY10.
Score1 Score2 Score3 Score4 Score5;
/* Read the data file using modified list input */
RUN;
PROC PRINT DATA = contest3_15;
TITLE 'Example 3.15';
RUN;

✓ Reading Raw Data Using INFILE Options
• Delimited files
Delimited files are raw data files that have a special character separating data values, often
a comma or a tab character.
To read your delimited data using list input
(1) Use the DELIMITER= or DLM= option in the INFILE statement to read
data files with delimiters other than spaces
(2) Read data files with any delimiter character by enclosing the delimiter character
in quotation marks
(3) Use the DLMSTR= option if the delimiter is a string of characters

By default, SAS interprets two or more delimiters in a row as a single delimiter. For
delimiter-sensitive data files, in which two delimiters in a row indicate a
missing value, we can use the DSD option.
To read delimiter-sensitive data, use the DSD option in the INFILE statement, which
(1) Ignores delimiters in data values enclosed in quotation marks
(2) Does not read quotation marks as part of the data value
(3) Treats two delimiters in a row as a missing value
(4) Assumes the delimiter is a comma
(5) Can be combined with the DLM= option to specify other delimiters. For example,
DLM='09'X specifies a tab character (hexadecimal 09 in ASCII)
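
The DSD behaviours listed above can be sketched with in-stream data, so no external file is needed. The rows are hypothetical: names are written with an embedded comma inside quotation marks, and one party value is left out on purpose. The data set name dsd_demo is an assumption for this illustration.

```sas
* Sketch of the DSD option on comma-delimited in-stream data;
DATA dsd_demo;
   INFILE DATALINES DSD;      /* DSD assumes a comma as the delimiter */
   LENGTH Name $ 20;          /* room for values longer than 8 characters */
   INPUT Name $ Party $ Number;
   DATALINES;
"Adams, John",F,2
Lincoln,,16
"Grant, Ulysses",R,18
;
RUN;
PROC PRINT DATA = dsd_demo;
RUN;
```

The comma inside the quotation marks should stay part of Name, the quotation marks themselves should be dropped, and the two commas in a row should leave Party blank for Lincoln. For a tab-delimited file, the same step would use DLM='09'X together with DSD.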

SAS Code Example 3.16 (same dataset as example 3.3):


The US_president_Comma.dat file is comma delimited as below. Try to read it using the DLM= option in the INFILE statement.
Adams,F,2
Lincoln,R,16
Grant,R,18
Kennedy,D,35
DATA uspresid3_16;
INFILE 'C:\Users\elainemo\US_president_Comma.dat'
DLM = ',';
/* Read csv data from external file into SAS data set*/
INPUT President $ Party $ Number;
RUN;

PROC PRINT DATA = uspresid3_16;
TITLE 'Example 3.16';
RUN;

SAS Code Example 3.17 (similar dataset as example 3.3, now with missing data):
The US_president_Miss.dat file is comma delimited as below. Try to read it using the DSD option in the INFILE statement.
Adams,,2
Lincoln,R,16
Grant,R,18
Kennedy,D,35
DATA uspresid3_17;
INFILE 'C:\Users\elainemo\US_president_Miss.dat' DSD;
* treats two consecutive delimiters as a missing value;
INPUT President $ Party $ Number;
RUN;

PROC PRINT DATA = uspresid3_17;
TITLE 'Example 3.17';
RUN;

• Messy raw data


When reading raw data files, SAS makes certain assumptions, but some data files can’t be
read using the default assumptions. By default, SAS starts reading with the first data line
and, if SAS runs out of data on a line, it automatically goes to the next line to read values
for the rest of the variables. The options in the INFILE statement change the way SAS
reads raw data files.
To begin reading data not at the first line
(1) Use the FIRSTOBS=n option, where n is the number of the first data line to read
(2) This is useful if the data file contains descriptive text or header information at the
beginning

To read only a part of your data file
(1) Use the OBS=n option to stop reading when SAS gets to line n in the raw data file
(2) n is the number of the last data line to read, which does not necessarily correspond to
the number of observations

To keep SAS from going to the next line when it runs out of data on a line
(1) Use the MISSOVER option if values may be missing at the end of a data line; SAS
assigns missing values to the remaining variables instead of reading the next line
(2) Use the TRUNCOVER option when you are reading data using column or formatted
input and some data lines are shorter than others
(3) Both assign missing values to variables if the data line ends before the variable’s
field starts
(4) When the data line ends in the middle of a variable field, TRUNCOVER will take as
much as is there, whereas MISSOVER will assign the variable a missing value.
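
The default behaviour and the effect of MISSOVER can be sketched with list input on hypothetical in-stream data in which the second line has no value for Number. The data set name missover_demo is an assumption for this illustration.

```sas
* Sketch: MISSOVER with list input when a line ends early;
DATA missover_demo;
   INFILE DATALINES MISSOVER;   /* do not go to the next line for missing values */
   INPUT President $ Party $ Number;
   DATALINES;
Adams F 2
Lincoln R
Grant R 18
;
RUN;
PROC PRINT DATA = missover_demo;
RUN;
```

With MISSOVER, Number is set to missing for Lincoln and the Grant line is read as its own observation. If the INFILE statement is removed, SAS goes to the Grant line to look for Lincoln's Number, reports invalid data in the log, and the step produces fewer observations.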

SAS Code Example 3.18:
The Address.dat file contains addresses and must be read using column or formatted input
because the street names have embedded blanks. Note that the data lines are all different lengths:
some of the addresses stop before the end of the variable Street’s field (columns 22 - 37).
1---+----10---+----20---+----30---+--
John Garcia    114   Maple Ave.
Sylvia Chung   1302  Washington Drive
Martha Newton  45    S.E. 14th St.
DATA address3_18a;
INFILE 'C:\Users\elainemo\Address.dat' TRUNCOVER;
/* Read Street values that do not all end in the same column */
INPUT Name $ 1-15 Number 16-19 Street $ 22-37;
RUN;
PROC PRINT DATA = address3_18a;
TITLE 'Example 3.18a';
RUN;

DATA address3_18b;
INFILE 'C:\Users\elainemo\Address.dat' MISSOVER;
/* MISSOVER assigns a missing value when the line ends
before the end of the variable's field */
INPUT Name $ 1-15 Number 16-19 Street $ 22-37;
RUN;
PROC PRINT DATA = address3_18b;
TITLE 'Example 3.18b';
RUN;

SAS Code Example 3.19:
The Address_Miss.dat file contains the same addresses, but the first data line ends after the
house number, so the value of Street is missing at the end of that line. The MISSOVER option
assigns a missing value to Street for that line instead of reading the next line.
1---+----10---+----20---+----30---+--
John Garcia    114
Sylvia Chung   1302  Washington Drive
Martha Newton  45    S.E. 14th St.
DATA address3_19;
INFILE 'C:\Users\elainemo\Address_Miss.dat' MISSOVER;
/* missing value at the end of a data line */
INPUT Name $ 1-15 Number 16-19 Street $ 22-37;
RUN;
PROC PRINT DATA = address3_19;
TITLE 'Example 3.19';
RUN;

SAS Code Example 3.20 (similar dataset as example 3.2, with data description):
The US_president_Obs.dat has a description of the data in the first two lines and a remark at
the end of the file that was not part of the data.
Information of selected US presidents
President Party Number
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35
Data copy from wiki
DATA uspresid3_20;
INFILE 'C:\Users\elainemo\US_president_Obs.dat'
FIRSTOBS=3 OBS=6;
/* Read the data from line 3 to 6 */
INPUT President $ Party $ Number;
RUN;

PROC PRINT DATA = uspresid3_20;
TITLE 'Example 3.20';
RUN;

✓ Reading Files Using IMPORT Options
The previous section introduced how to read delimited data files using the DATA step. We
can also read delimited and PC files using the IMPORT procedure. The general form and
default behaviour of PROC IMPORT are:
PROC IMPORT DATAFILE = 'filename' OUT = data-set
DBMS = identifier REPLACE;
o Scans the first 20 rows by default
o Determines the variable types (character or numeric), assigns lengths to the
character variables, and can recognize some date formats
o Treats two consecutive delimiters in your data file as a missing value
o Reads values enclosed by quotation marks, and assigns missing values to variables
when it runs out of data on a line.
Options to specify within the PROC IMPORT statement:
DATAFILE = 'filename'   The file you want to read
OUT = data-set          Name of the SAS data set you want to create
DBMS = identifier       The type of file; if omitted, SAS determines the file type
                        from the extension of the file
REPLACE                 Tells SAS to replace the SAS data set named in the
                        OUT= option if it already exists

• Delimited files
Delimited files are raw data files that have a special character separating data values, often
with commas or tab characters for delimiters. We specify different DBMS identifiers
for SAS to read the following types of file:
Type of File                            Extension   Identifier
Comma-delimited                         .csv        CSV
Tab-delimited                           .txt        TAB
Delimiters other than commas or tabs    -           DLM

Apart from the file type, there are some optional statements we can specify after
the PROC IMPORT statement:

Optional statements                  Description
DATAROWS = n;                        Start reading data in row n. Default is 1.
DELIMITER = 'delimiter-character';   Delimiter for DLM files. Default is space.
GETNAMES = NO;                       Do not get variable names from the first line of
                                     input file. Default is YES. If NO, then variables
                                     are named VAR1, VAR2, VAR3, and so on.
GUESSINGROWS = n;                    Use n rows to determine variable types. Default is
                                     20.
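
A minimal sketch of these optional statements follows. To keep it self-contained, it first writes a small pipe-delimited file to a temporary fileref and then imports it. The fileref demo, the data set name uspresid_dlm, and the choice of '|' as the delimiter are assumptions made for illustration.

```sas
FILENAME demo TEMP;                 /* temporary file managed by SAS */

DATA _NULL_;                        /* write a small pipe-delimited file */
   FILE demo;
   PUT 'President|Party|Number';
   PUT 'Adams|F|2';
   PUT 'Lincoln|R|16';
RUN;

PROC IMPORT DATAFILE = demo OUT = uspresid_dlm
            DBMS = DLM REPLACE;     /* DLM: delimiter other than comma or tab */
   DELIMITER = '|';                 /* the delimiter used in the file  */
   GETNAMES = YES;                  /* first line holds variable names */
   GUESSINGROWS = 20;               /* rows scanned to decide types    */
RUN;

PROC PRINT DATA = uspresid_dlm;
RUN;
```

Because a temporary fileref has no meaningful extension, DBMS= is given explicitly here rather than left for SAS to infer.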

• PC files
You can use the IMPORT procedure to import several types of PC files in the Windows
and UNIX operating environments. An alternative method of reading some types of PC
files in the Windows operating environment which does not require SAS/ACCESS is
Dynamic Data Exchange (DDE), which we will not cover here. Again, we specify different
DBMS identifiers for SAS to read the following types of file:

Type of File       Extension   Identifier
Microsoft Excel    .xls        EXCEL or XLS
dBase              .dbf        DBF
JMP                .jmp        JMP
Lotus              .wk4        WK4
Paradox            .db         PARADOX
SPSS Save file     .sav        SAV
Stata              .dta        DTA
The Microsoft Excel XLS identifier looks at all rows in the file to determine the column
type, whereas the EXCEL identifier only looks at the first 8 data rows by default.

Optional statements              Description
SHEET = "sheet-name";            Specify which sheet to read if you have more than one sheet
                                 in your workbook
RANGE = "sheet-name$UL:LR";      Specify a range to read only specific cells in the sheet. The
                                 range can be a named range (if defined), or you can specify
                                 the upper-left and lower-right cells for the range
GETNAMES=NO;                     Do not get variable names from the first line of input file.
                                 Default is YES. If NO, then variables are named VAR1,
                                 VAR2, VAR3, and so on.
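
The SHEET= and RANGE= statements might be used as in the sketch below. It assumes that the workbook from example 3.22 exists, that it contains a sheet named Sheet1 with data in cells A1 to C5, and that your SAS installation can read .xlsx files. The sheet name, the cell range, and the output data set names are all hypothetical.

```sas
/* Read one named sheet; the sheet name Sheet1 is an assumption */
PROC IMPORT DATAFILE = 'C:\Users\elainemo\US_president.xlsx'
            DBMS = XLSX OUT = uspresid_sheet REPLACE;
   SHEET = "Sheet1";
   GETNAMES = YES;
RUN;

/* Read only a block of cells; the range A1:C5 is an assumption */
PROC IMPORT DATAFILE = 'C:\Users\elainemo\US_president.xlsx'
            DBMS = XLSX OUT = uspresid_range REPLACE;
   RANGE = "Sheet1$A1:C5";
   GETNAMES = YES;
RUN;
```

Use SHEET= when the whole sheet is wanted and RANGE= when only part of it holds the data, for example when notes or totals sit beside the table.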

SAS Code Example 3.21 (same dataset as example 3.3):


PROC IMPORT DATAFILE ='C:\Users\elainemo\US_president.csv'
OUT = uspresid3_21 REPLACE;
/* create a data set called uspresid3_21 */
RUN;
PROC PRINT DATA = uspresid3_21;
TITLE 'Example 3.21';
RUN;

SAS Code Example 3.22 (similar dataset as example 3.1, but saved in excel .xlsx file):
PROC IMPORT DATAFILE = 'C:\Users\elainemo\US_president.xlsx'
DBMS=XLSX OUT = uspresid3_22 REPLACE;
/* file type extension .xlsx is excel file */
RUN;
PROC PRINT DATA = uspresid3_22;
TITLE 'Example 3.22';
RUN;
