
SASTechies

info@sastechies.com
http://www.sastechies.com
 Character data with specified lengths

 Standard numeric data values can contain only
◦ numbers
◦ decimal points
◦ numbers in scientific, or E, notation (23E4)
◦ plus or minus signs.

 Nonstandard numeric data include
◦ values that contain special characters, such as percent signs (%), dollar signs ($), and commas (,)
◦ date and time values
◦ data in fraction, integer binary and real binary, and hexadecimal forms.

SAS Techies 2009 11/13/09 2
External File Data
Raw data can be organized in several different ways.
This external file contains data that is free-format, meaning data that is not arranged in columns. Notice that the values for a particular field do not begin and end in the same columns. Column input cannot be used to read data organized in this way.

>----+----10---+----20
BARNES NORTH 360.98
FARLSON WEST 243.94
LAWRENCE NORTH 195.04
NELSON EAST 169.30
STEWART SOUTH 238.45
TAYLOR WEST 318.87

This external file contains data that is arranged in columns or fixed fields. You can specify a beginning and ending column for each field. Let's look at how column input can be used to read this data.

>----+----10---+----20
2810 61 MOD  F
2804 38 HIGH F
2807 42 LOW  M
2816 26 HIGH M
2833 32 MOD  F
2823 29 HIGH M

 Column Input
To use column input, your data must be standard character or
numeric values in fixed fields.
input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
>----+----10---+----20

 2810 61 MOD  F
 2804 38 HIGH F
 2807 42 LOW  M
 2816 26 HIGH M
 2833 32 MOD  F
 2823 29 HIGH M

 One of the features of column input is the capability to read
fields in any order.
 Character variable values can be up to 32K (32,767 bytes) long and can contain
embedded blanks.
 No placeholder is required for missing data. A blank field is
read as missing and does not cause other fields to be read
incorrectly.
 Fields or parts of fields can be reread.
 Fields do not have to be separated by blanks or other
delimiters.
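
As a minimal sketch of these features, the step below reads a few of the fixed-field records shown above. It assumes the records are supplied inline with DATALINES so the example is self-contained; the data set name work.activity and the variable IDPrefix are hypothetical names chosen for illustration. The fields are read out of order, and columns 1-2 are read a second time.

```sas
data work.activity;
   /* Fields are read in a different order than they appear in the record, */
   /* and columns 1-2 are reread into IDPrefix (a hypothetical variable).  */
   input Sex $ 14 ActLevel $ 9-12 Age 6-7 ID $ 1-4 IDPrefix $ 1-2;
   datalines;
2810 61 MOD  F
2804 38 HIGH F
2807 42 LOW  M
;
run;

proc print data=work.activity;
run;
```

The variables appear in the data set in the order in which the INPUT statement names them, not in the order of the columns in the raw data.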

 You can use formatted input, which combines the
features of column input with the ability to read
nonstandard, as well as standard data.

 Whenever you encounter raw data that is organized into
fixed fields, you can use
 column input to read standard data only
 formatted input to read both standard and nonstandard
data.

 INPUT pointer-control variable informat.;

>----+----10---+----20---+--
ENVELOPE       $13.25   500   4
DISKETTES      $29.50    10   3
BANDS           $2.50   600   2
RIBBON         $94.20    12   1
PAPER          $15.95   250  10

 The @n is an absolute pointer control that moves the input pointer to a specific column number. You can use the @n to move a pointer forward or backward when reading a record.

input Name $14. @16 Amount comma6.2 damout var

 The +n is a relative pointer control that moves the input pointer forward to a column number relative to the current position.

input Name $14. +2 Amount comma6.2 damout var

>----+----10---+----20---+--
ENVELOPE       $13.25   500   4
DISKETTES      $29.50    10   3
BANDS           $2.50   600   2
RIBBON         $94.20    12   1
PAPER          $15.95   250  10

 The $w. informat enables you to read character data.
 The w represents the field width of the data value or the total number of columns that contain the raw data field.

Note the difference:

input Name $ 1-14 +2 Amount
or
input Name $14. +2 Amount

 The informat for reading standard numeric
data is the w.d informat.

Example: the raw data value 34.0008 read with the 7.4 informat is stored as 34.0008.

 The COMMAw.d informat is used to read numeric values and remove embedded
◦ blanks
◦ commas
◦ dashes
◦ dollar signs
◦ percent signs
◦ right parentheses
◦ left parentheses, which are converted to minus signs.

Example: the raw data value $34,000 read with the COMMA7. informat is stored as 34000.
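
The pointer controls and informats described above can be combined in one INPUT statement. The following is a sketch only: it assumes the records are supplied inline with DATALINES, that Name occupies columns 1-14, Amount columns 16-21, and a quantity columns 24-26. The data set name work.supplies and the variable Quantity are hypothetical.

```sas
data work.supplies;
   /* $14.    reads columns 1-14 as character data             */
   /* @16     moves the pointer to column 16 (absolute)        */
   /* comma6. reads 6 columns and removes the dollar sign      */
   /* +2      skips two columns from the current position      */
   /* 3.      reads a standard numeric value 3 columns wide    */
   input Name $14. @16 Amount comma6. +2 Quantity 3.;
   datalines;
ENVELOPE       $13.25  500
DISKETTES      $29.50   10
BANDS           $2.50  600
;
run;

proc print data=work.supplies;
run;
```

Because COMMA6. strips the dollar sign, Amount is stored as an ordinary numeric value (13.25, 29.5, 2.5) that can be used in calculations.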

 External files with a fixed-length record format have an
end-of-record marker after a predetermined number of
columns.

 A typical record length is 80 columns.

>----+----10---+----20---+---------------
 BIRD FEEDER   LG088   3 20
 GLASS MUGS    SB082   6 12
 GLASS TRAY    BQ049 12 6
 PADDED HANGRS MN256 15 20
 JEWELRY BOX   AJ498 23  0
 RED APRON     AQ072 9 12
 CRYSTAL VASE   AQ672 27  0
 PICNIC BASKET LS930 21   0

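input Department $ 1-11 @13 TotalReceipts comma8.;

 Files with a variable-length record format have an imaginary end-of-record marker (shown here as *) after the last field in each record.

>----+----10---+---V20-------------
BED/BATH    1,354.93*
HOUSEWARES  2,464.05*
GARDEN        923.34*
GRILL         598.34*
SHOES       1,345.82*
SPORTS*
TOYS        6,536.53*

◦ Beware of errors: when a record ends before the field that the INPUT statement is reading (as in the SPORTS record), SAS moves to the next record to look for the value.
◦ The PAD option pads each record with blanks to the full record length: infile receipts pad;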
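
A minimal sketch of the PAD option follows. So that it can run on its own, it first writes a small variable-length file to a temporary fileref; the use of FILENAME ... TEMP, the four sample records, and the data set name work.receipts are assumptions made for illustration only.

```sas
/* Write a small variable-length file; the SPORTS record has no amount. */
filename receipts temp;

data _null_;
   file receipts;
   put 'BED/BATH    1,354.93';
   put 'GARDEN        923.34';
   put 'SPORTS';
   put 'TOYS        6,536.53';
run;

/* PAD fills each short record with blanks, so the INPUT statement      */
/* finds blank columns instead of running past the end of the record.   */
data work.receipts;
   infile receipts pad;
   input Department $ 1-11 @13 TotalReceipts comma8.;
run;

proc print data=work.receipts;
run;
```

With PAD in effect, the SPORTS record produces its own observation with a missing TotalReceipts instead of borrowing a value from the TOYS record.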

 List input reads raw data that is free-format; that is, it is not arranged in fixed fields. The fields may be separated by blanks or some other delimiter.

>----+----10---+----20---+----
ABRAMS*L.*MARKETING*$8,209
BARCLAY*M.*MARKETING*$8,435
COURTNEY*W.*MARKETING*$9,006
FARLEY*J.*PUBLICATIONS*$8,305
HEINS*W.*PUBLICATIONS*$9,539

>V---+----10---+----20
MALE 27 1 8 0 0
FEMALE 29 3 14 5 10
FEMALE 34 2 10 3 3

infile credit dlm=' ';
input Gender $ Age Bankcard FreqBank Deptcard FreqDept;

 Limitations
◦ Missing data values must be specified with a period
(.) for both character and numeric data.
◦ Although the width of a field can be greater than
eight characters, both character and numeric
variables have a default length of 8. Character
values longer than eight characters will be
truncated.
◦ Data must be in standard numeric or character
format.
◦ Character values cannot contain embedded blanks.
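
Two of these limitations can be worked around in a short step. The sketch below assumes asterisk-delimited records supplied inline with DATALINES; the data set name work.staff and the variable names are hypothetical. A LENGTH statement prevents truncation of long character values, and the colon modifier with an informat reads the nonstandard salary values.

```sas
data work.staff;
   /* Without this LENGTH statement, PUBLICATIONS would be  */
   /* truncated to the default length of 8 characters.      */
   length Department $ 12;
   infile datalines dlm='*';
   /* The colon modifier applies the COMMA. informat to the */
   /* delimited value, removing the dollar sign and comma.  */
   input LastName $ Initial $ Department $ Salary : comma.;
   datalines;
ABRAMS*L.*MARKETING*$8,209
FARLEY*J.*PUBLICATIONS*$8,305
HEINS*W.*PUBLICATIONS*$9,539
;
run;

proc print data=work.staff;
run;
```

Because the LENGTH statement comes first, Department is the first variable in the resulting data set.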

 The MISSOVER option is used to handle missing values at the end of a record. (In the records below, an asterisk marks the position of a missing value.)

>V---+----10---+----20
MALE 27 1 8 * *
FEMALE 29 3 14 5 10
FEMALE 34 2 10 3 3

data perm.survey;
infile credit missover;
input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
run;

 If the missing value is in the middle of the record, then edit the raw data file and insert a period (.) as a placeholder for the missing value.

>V---+----10---+----20
MALE 27 1 8 92 39
FEMALE * 3 14 5 10
FEMALE 34 2 10 3 3
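
Both cases can be tried in one step. This is a sketch under the assumption that the records are supplied inline with DATALINES; the data set name work.survey is hypothetical. The first record is missing its last two values, and the second record uses a period as the placeholder for a missing value in the middle.

```sas
data work.survey;
   /* MISSOVER sets the remaining variables to missing when a */
   /* record ends early, instead of reading the next record.  */
   infile datalines missover;
   input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
   datalines;
MALE 27 1 8
FEMALE . 3 14 5 10
FEMALE 34 2 10 3 3
;
run;

proc print data=work.survey;
run;
```

The step creates three observations: Deptcard and FreqDept are missing in the first, and Age is missing in the second.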

 You can make list input more versatile by using modified list input. There are two modifiers that can be used with list input.
 The ampersand (&) modifier is used to read character values that contain embedded blanks. The value must be followed by two or more blanks to mark its end.
 The colon (:) modifier is used to read nonstandard data values and character values longer than eight characters, but without embedded blanks.

data perm.cityrank;
infile topten;
input Rank City & $12. Pop86 : comma.;
run;

>----+----10---+----20---+--
 1 NEW YORK  7,262,700
 2 LOS ANGELES  3,259,340
 3 CHICAGO  3,009,530
 4 HOUSTON  1,728,910
 5 PHILADELPHIA  1,642,900
 6 DETROIT  1,086,220
 7 SAN DIEGO  1,015,190
 8 DALLAS  1,003,520
 9 SAN ANTONIO  914,350
10 PHOENIX  894,070

 When you read a date using a SAS informat, SAS software
converts it to a numeric date value. A SAS date value is the
number of days from January 1, 1960, to the given date.

Date Expression   SAS Date Informat   SAS Date Value

02Jan00           DATEw.              14611
01-02-2000        MMDDYYw.            14611
02/01/00          DDMMYYw.            14611
2000/01/02        YYMMDDw.            14611
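
The values in this table can be checked with the INPUT function, which applies an informat to a character string. This is a small sketch; the widths chosen for each informat (7, 10, 8, 10) and the variable names d1-d4 are assumptions made for illustration, and it assumes the session's YEARCUTOFF= setting places the two-digit year 00 in 2000.

```sas
data _null_;
   /* Each expression represents January 2, 2000. */
   d1 = input('02Jan00',    date7.);
   d2 = input('01-02-2000', mmddyy10.);
   d3 = input('02/01/00',   ddmmyy8.);
   d4 = input('2000/01/02', yymmdd10.);
   /* Writes the unformatted SAS date values to the log: */
   /* d1=14611 d2=14611 d3=14611 d4=14611                */
   put d1= d2= d3= d4=;
run;
```

All four informats produce the same number because a SAS date value depends only on the date, not on how the date was written in the raw data.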

 SAS software stores
time values similar to
the way it stores date
values. A SAS time value
is stored as the number
of seconds since
midnight.
 A SAS datetime is a
special value that
combines both date and
time information. A SAS
datetime value is stored
as the number of
seconds between
midnight on January 1,
1960, and a given date
and time.

 Date7. Informat
 Mmddyyn8.

 When a two-digit year value is read, SAS software defaults to a year within a 100-year span determined by the YEARCUTOFF= system option.
 The value of the YEARCUTOFF= system option only affects two-digit year values. A date value that contains a four-digit year value will be interpreted correctly even if it does not fall within the 100-year span set by the YEARCUTOFF= system option.

Date Expression   Interpreted As

12/07/41          12/07/1941
18Dec15           18Dec2015
04/15/30          04/15/1930
15Apr95           15Apr1995
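
The effect of the option can be seen in a short step. The sketch below assumes YEARCUTOFF=1920, which is one setting consistent with the interpretations in the table (the slides do not state which value was in effect). The variable names and the four-digit test date are hypothetical.

```sas
/* Assumed setting: two-digit years fall in the span 1920-2019. */
/* The OPTIONS statement changes the setting for the session.   */
options yearcutoff=1920;

data _null_;
   d1 = input('12/07/41',  mmddyy8.);
   d2 = input('18Dec15',   date7.);
   d3 = input('04/15/30',  mmddyy8.);
   /* A four-digit year is unaffected by YEARCUTOFF=. */
   d4 = input('15Apr1895', date9.);
   /* Log: d1=12/07/1941 d2=18DEC2015 d3=04/15/1930 d4=15APR1895 */
   put d1= mmddyy10. d2= date9. d3= mmddyy10. d4= date9.;
run;
```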

Because dates are stored as numeric values, any meaningful arithmetic calculation can be performed on them.
Ex: Days=dateout-datein+1;

>----+----10---+----
ABRAMS   THOMAS
MARKETING     SR01
$25,209.03
BARCLAY  ROBERT
EDUCATION     IN01
$24,435.71
COURTNEY MARK
PUBLICATIONS  TW01
$24,006.16

 Write multiple INPUT statements

input Lname $ 1-8 Fname $ 10-15;
input Department $ 1-12 JobCode $ 15-19;
input Salary comma10.;

 Or write one INPUT statement that contains a line pointer control to specify the record(s) from which values are to be read.

 You use the forward slash (/) line pointer control to read multiple records in sequential order.

input Lname $ 1-8 Fname $ 10-15 / Department $ 1-12 JobCode $ 15-19 / Salary comma10.;

 You use the #n line pointer control to name the record from which each group of values is read.

input
#1 Lname $ 1-8 Fname $ 10-15
#2 Department $ 1-12 JobCode $
#3 Salary comma10.;
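
Both line pointer controls can be compared side by side. The following sketch assumes three-record groups supplied inline with DATALINES; the data set names work.staff_slash and work.staff_n are hypothetical. Each step creates one observation per group of three records.

```sas
/* The / control advances to the next record in sequence. */
data work.staff_slash;
   input Lname $ 1-8 Fname $ 10-15
       / Department $ 1-12 JobCode $ 15-19
       / Salary comma10.;
   datalines;
ABRAMS   THOMAS
MARKETING     SR01
$25,209.03
BARCLAY  ROBERT
EDUCATION     IN01
$24,435.71
;
run;

/* The #n control names the record within each group, */
/* so the records can be read in any order.           */
data work.staff_n;
   input #3 Salary comma10.
         #1 Lname $ 1-8 Fname $ 10-15
         #2 Department $ 1-12 JobCode $ 15-19;
   datalines;
ABRAMS   THOMAS
MARKETING     SR01
$25,209.03
BARCLAY  ROBERT
EDUCATION     IN01
$24,435.71
;
run;
```

In the second step the highest #n value (3) tells SAS how many records make up one observation, even though record 3 is read first.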

 Repeating blocks of data that represent separate observations

>----+----10---+----20---+----30--
01APR90 68 02APR90 67 03APR90 78
04APR90 74 05APR90 72 06APR90 73
07APR90 71 08APR90 75 09APR90 76

 An ID field followed by an equal number of repeating fields that represent separate observations

>----+----10---+----20---+----30--
001 WALKING AEROBICS CYCLING
002 SWIMMING CYCLING SKIING
003 TENNIS SWIMMING AEROBICS

 An ID field followed by a varying number of repeating fields that represent separate observations

>----+----10---+----20---+----30--
001 WALKING
002 SWIMMING CYCLING SKIING
003 TENNIS SWIMMING

 The SAS System provides two line-hold specifiers.

The trailing @ enables the next INPUT statement to
read from the current record in the same iteration of
the DATA step.

Ex: input name $20. @;

The double trailing at sign (@@) enables the next INPUT
statement to read from the current record across
further iterations of the DATA step.

input name $20. @@;

 Normally, each time a DATA step executes, the INPUT statement reads a new record. But when you use the @@, the INPUT statement holds the current record and reads the next value.

 A record held by the double trailing at sign (@@) is not released until
◦ the input pointer moves past the end of the record. Then the input pointer moves down to the next record.
◦ an INPUT statement without a line-hold specifier executes.

input ID $4. @@;
.
.
input Department 5.;

data perm.april90;
infile tempdata;
input Date : date. HighTemp @@;
format date date7.;
run;
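
A self-contained variant of this step is sketched below. It assumes the temperature records are supplied inline with DATALINES instead of through the tempdata fileref; the data set name work.april90 is hypothetical.

```sas
data work.april90;
   /* @@ holds the record across iterations of the DATA step, */
   /* so each Date/HighTemp pair becomes its own observation. */
   input Date : date. HighTemp @@;
   format Date date7.;
   datalines;
01APR90 68 02APR90 67 03APR90 78
04APR90 74 05APR90 72 06APR90 73
;
run;

proc print data=work.april90;
run;
```

Two records with three pairs each produce six observations.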

 Like the @@, the single trailing @
◦ enables the next INPUT statement to read from the
same record
◦ releases the current record when a subsequent
INPUT statement executes without a line-hold
specifier.

 Unlike the @@, the single @ also releases a
record when control returns to the top of the
DATA step for the next iteration.

data perm.sales97;
infile data97;
input ID $4. @;
do Quarter=1 to 4;
input Sales : comma. @;
output;
end;
run;
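
The same logic can be run without an external file. In this sketch the records are supplied inline with DATALINES, and the ID and sales figures are invented sample values; the data set name work.sales97 is hypothetical.

```sas
data work.sales97;
   /* Read the ID and hold the record with a single trailing @. */
   input ID $4. @;
   do Quarter = 1 to 4;
      /* Each pass reads the next sales value from the held record. */
      input Sales : comma. @;
      output;
   end;
   datalines;
0734 1,323.34 2,472.85 3,276.65 4,345.33
0943 1,908.34 2,560.38 3,472.28 5,290.86
;
run;

proc print data=work.sales97;
run;
```

Each record yields four observations, one per quarter, and the held record is released when the DATA step returns to the top for the next iteration.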

Raw Data File
>----+----10---+----
H 321 S. MAIN ST
P MARY E    21 F
P WILLIAM M 23 M
P SUSAN K    3 F
H 324 S. MAIN ST
P THOMAS H  79 M
P WALTER S  46 M
P ALICE A   42 F
P MARYANN A 20 F
P JOHN S    16 M
H 325A S. MAIN ST

 H indicates a header record that contains a street address and P indicates a detail record that contains information about a person living at that address.

SAS Data Set
Obs  Address          Name       Age Gender

 1   321 S. MAIN ST   MARY E     21    F
 2   321 S. MAIN ST   WILLIAM M  23    M
 3   321 S. MAIN ST   SUSAN K     3    F
 4   324 S. MAIN ST   THOMAS H   79    M
 5   324 S. MAIN ST   WALTER S   46    M
 6   324 S. MAIN ST   ALICE A    42    F
 7   324 S. MAIN ST   MARYANN A  20    F
 8   324 S. MAIN ST   JOHN S     16    M
 9   325A S. MAIN ST  JAMES L    34    M
10  325A S. MAIN ST  LIZA A     31    F
11  325B S. MAIN ST  MARGO K    27    F

 You want to keep the header record as a part of each observation until the next header record is encountered.
 RETAIN variable1 variable2; If no variable is mentioned, then RETAIN applies to ALL variables.
 When a RETAIN statement specifies variables, new variables are created. Therefore, you must name any variables used in a RETAIN statement exactly as you want them stored in the data set. You might need to drop the extra variables.

>----+----10---+----
H 321 S. MAIN ST
P MARY E    21 F
P WILLIAM M 23 M
P SUSAN K    3 F

data perm.people;
infile census;
retain Address;

data perm.people (drop=type);
infile census;
retain Address;
input type $1. @;
if type='H' then input @3 Address $15.;
if type='P' then do;
input @3 Name $10. @13 Age 3. @15 Gender $1.;
output;
end;
run;

Raw Data File
>----+----10---+---20
H 321 S. MAIN ST
P MARY E    21 F
P WILLIAM M 23 M
P SUSAN K    3 F
H 324 S. MAIN ST
P THOMAS H  79 M
P WALTER S  46 M
P ALICE A   42 F
P MARYANN A 20 F
P JOHN S    16 M
H 325A S. MAIN ST
P JAMES L   34 M
P LIZA A    31 F
H 325B S. MAIN ST
P MARGO K   27 F
P WILLIAM R 27 M
P ROBERT W   1 M

SAS Data Set
Address           Total
321 S. MAIN ST    3
324 S. MAIN ST    5
325A S. MAIN ST   2
325B S. MAIN ST   3
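
One way to build this summary data set is sketched below. It is an illustration, not necessarily the approach the slides intended: it assumes a temporary file written with FILENAME ... TEMP so the step can run on its own, uses the END= option to detect the last record, and adds TRUNCOVER so that short header records do not cause SAS to read into the next record. The data set name work.residents is hypothetical.

```sas
/* Write a shortened copy of the raw data to a temporary file. */
filename census temp;

data _null_;
   file census;
   put 'H 321 S. MAIN ST';
   put 'P MARY E    21 F';
   put 'P WILLIAM M 23 M';
   put 'P SUSAN K    3 F';
   put 'H 324 S. MAIN ST';
   put 'P THOMAS H  79 M';
   put 'P WALTER S  46 M';
run;

data work.residents (drop=Type);
   infile census end=Last truncover;
   retain Address;
   input Type $1. @;
   if Type='H' then do;
      /* A new header means the previous address is complete. */
      if _n_ > 1 then output;
      Total = 0;
      input @3 Address $15.;
   end;
   /* The sum statement counts detail records and retains Total. */
   else if Type='P' then Total + 1;
   /* Write the final address when the last record has been read. */
   if Last then output;
run;

proc print data=work.residents;
run;
```

With this shortened sample the result has two observations: 321 S. MAIN ST with a Total of 3 and 324 S. MAIN ST with a Total of 2.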

>----+----10---V----20
1802 JOHNSON2123
1803 BARKER2142
1804 EDMUNDSON2325
1805 RIVERS2543
1806 MASON2646
1807 JACKSON2049
1808 LEVY2856
1809 THOMAS2222

 When you use the $VARYINGw. informat, it's important to specify a w value that is large enough to accommodate the longest value.

data perm.phones;
infile phondat length=reclen;
input ID 4. @;
namelen=reclen-9;
input Name $varying10. namelen PhoneExt;
run;

Each repeating Date and BP block occupies 15 columns: 14 columns of data plus one separating blank.

>----+----10---+----V0---+----30---V----40---+----V0
1234 13MAR89 120/80
1443 12FEB89 120/70 03FEB90 125/80 07OCT90 125/99
1681 11JAN90 120/80 05JUN90 110/70
2034 19NOV88 130/70 12MAY89 150/90 23MAR90 130/80

data perm.health;
infile bpdata length=reclen;
input ID 4. @;
do index=6 to reclen by 15;
input Date : date. BP $ @;
output;
end;
run;
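
The loop bounds can be checked against the record lengths. The sketch below assumes a temporary file written with FILENAME ... TEMP (LENGTH= would report a padded length for inline DATALINES records); the data set name work.health and the three sample records are used for illustration only.

```sas
/* Write three variable-length records: 19, 49, and 34 columns long. */
filename bpdata temp;

data _null_;
   file bpdata;
   put '1234 13MAR89 120/80';
   put '1443 12FEB89 120/70 03FEB90 125/80 07OCT90 125/99';
   put '1681 11JAN90 120/80 05JUN90 110/70';
run;

data work.health (drop=index);
   infile bpdata length=reclen;
   input ID 4. @;
   /* One pass per 15-column block: index takes the values 6, 21, 36 */
   /* for as long as it does not exceed the record length.           */
   do index = 6 to reclen by 15;
      input Date : date. BP $ @;
      output;
   end;
   format Date date7.;
run;

proc print data=work.health;
run;
```

The 19-column record gives one pass of the loop, the 49-column record three, and the 34-column record two, so the step creates six observations.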
