SAS Online Samples
SAS Online Samples enables you to download the sample programs from many SAS books by using one of three facilities: Anonymous FTP, SASDOC-L, or the World Wide Web.

Anonymous FTP

Anonymous FTP enables you to download ASCII files and binary files (SAS data libraries in transport format). To use anonymous FTP, connect to FTP.SAS.COM. Once connected, enter the following responses as you are prompted:

   Name (ftp.sas.com:user-id): anonymous
   Password: <your e-mail address>

Next, change to the publications directory:

   >cd pub/publications

For general information about files, download the file Info:

   >get Info <target-filename>

For a list of available sample programs, download the file Index:

   >get Index <target-filename>

Once you know the name of the file you want, issue a GET command to download the file. Note: Filenames are case sensitive.

   To download ...          Issue this command ...
   compressed ASCII file    >get filename.Z <target-filename>
   ASCII file               >get filename <target-filename>
   binary transport file    >binary
                            >get filename <target-filename>

SASDOC-L

SASDOC-L is a listserv maintained by the Publications Division at SAS Institute. As a subscriber, you can request ASCII files that contain sample programs. You also receive notification when sample programs from a new book become available.

To use SASDOC-L, send e-mail, with no subject, to LISTSERV@VM.SAS.COM. The body of the message should be one of the lines listed below.

To subscribe to SASDOC-L, send this message:

   SUBSCRIBE SASDOC-L <firstname lastname>

To get general information about files, download the file INFO by sending this message:

   GET INFO EXAMPLES SASDOC-L

To get a list of available sample programs, download the file INDEX by sending this message:

   GET INDEX EXAMPLES SASDOC-L

Once you know the name of the file you want, send the message

   GET filename EXAMPLES SASDOC-L

World Wide Web

The SAS Institute World Wide Web information server can be accessed at the following URL:

   http://www.sas.com/

The sample programs are available from the Support Services portion of the Institute's server.
Combining and Modifying SAS® Data Sets: Examples
Version 6
First Edition

SAS Institute Inc.
SAS Campus Drive
Cary, NC 27513
The correct bibliographic citation for this manual is as follows: SAS Institute Inc., Combining and Modifying SAS® Data Sets: Examples, Version 6, First Edition, Cary, NC: SAS Institute Inc., 1995. 197 pp.
Combining and Modifying SAS® Data Sets: Examples, Version 6, First Edition

ISBN 1-55544-220-X
All rights reserved. Printed in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
Restricted Rights Legend. Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013.
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.
1st printing, August 1995
The SAS® System is an integrated system of software providing complete control over data access, management, analysis, and presentation. Base SAS software is the foundation of the SAS System. Products within the SAS System include SAS/ACCESS, SAS/AF, SAS/ASSIST, SAS/CALC, SAS/CONNECT, SAS/CPE, SAS/DMI, SAS/EIS, SAS/ENGLISH, SAS/ETS, SAS/FSP, SAS/GRAPH, SAS/IMAGE, SAS/IML, SAS/IMS-DL/I, SAS/INSIGHT, SAS/LAB, SAS/NVISION, SAS/OR, SAS/PH-Clinical, SAS/QC, SAS/REPLAY-CICS, SAS/SESSION, SAS/SHARE, SAS/SPECTRAVIEW, SAS/STAT, SAS/TOOLKIT, SAS/TRADER, SAS/TUTOR, SAS/DB2, SAS/GEO, SAS/GIS, SAS/PH-Kinetics, SAS/SHARE*NET, and SAS/SQL-DS software. Other SAS Institute products are SYSTEM 2000 Data Management Software, with basic SYSTEM 2000, CREATE, Multi-User, QueX, Screen Writer, and CICS interface software; InfoTap software; NeoVisuals software; JMP, JMP IN, JMP Serve, and JMP Design software; SAS/RTERM software; and the SAS/C Compiler and the SAS/CX Compiler; VisualSpace software; and Emulus software. MultiVendor Architecture and MVA are trademarks of SAS Institute Inc. SAS Institute also offers SAS Consulting, SAS Video Productions, Ambassador Select, and On-Site Ambassador services. Authorline, Books by Users, The Encore Series, JMPer Cable, Observations, SAS Communications, SAS Training, SAS Views, the SASware Ballot, and SelecText documentation are published by SAS Institute Inc. The SAS Video Productions logo and the Books by Users SAS Institute's Author Service logo are registered service marks and the Helplus logo and The Encore Series logo are trademarks of SAS Institute Inc. All trademarks above are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

The Institute is a private company devoted to the support and further development of its software and related services.

Other brand and product names are registered trademarks or trademarks of their respective companies.
…Variable 50
Example 3.6 Applying Transactions to a Master Data Set Using an Index 53
Example 3.7 Removing Observations from a Master Data Set Based on Values in a Transaction Data Set 56
Example 3.8 Performing a Table Lookup with a Small Lookup Data Set 60

…Sort) 158
Example 6.6 Creating Equal-Sized Random Samples and Producing Equal-Sized Subsets or Exact-Sized Subsets 161
Example 6.7 Counting the Occurrences of a String within the Values of a Variable 164
Example 6.8 Extracting a Character String without Breaking the Text in the Middle of a Word 166
Example 6.9 Creating SAS Datetime Values 168
Example 6.10 Creating a SAS Time Value from a Character Value 170
Example 6.11 Calculating a Person's Age 172
Appendix  Error Checking When Using MODIFY or SET with KEY= 175
   Why Error Checking? 176
   New Error-Checking Tools 176
Index 185
Your Turn
Credits

Documentation

…iam F. Heffner, Kevin Hobbs, Charles A. Jacobs, Paul M. Kent, Susan Marshall, Rick Matthews, Denise J. Moorman, Lynn H. Patrick, Amy S. Peters, Jon C. Schiltz, Bruce Tindall, and Michael Williams
Recognition

This book was conceived and planned by members of the Technical Support and Publications Divisions. The SAS code was written by members of the Technical Support Division and of Research and Development. The book was reviewed by members of the Technical Support, Education, and Research and Development Divisions.

Without the advice, expertise, and skill in coding from members of other divisions within SAS Institute, this book would not have been possible. The Publications Division is grateful for the talent and expertise that helped create this book and would especially like to recognize the serious commitment of time and resources on the part of the Technical Support Division.
An Introduction to Data Relationships, Access Methods, and Techniques for Data Manipulation

Overview 2
Data Relationships 2
   One-to-One 3
   One-to-Many and Many-to-One 3
   Many-to-Many 4
Access Methods: Sequential versus Direct 5
   Sequential Access 5
   Direct Access 5
Overview

Many applications, including Decision Support and Executive Information Systems, require input data to be in a specific format before it can be processed to produce meaningful results. Even if all of your data are already in SAS data sets, the data to support these systems typically come from multiple sources and may be in different formats. Therefore, you often, if not always, have to take intermediate steps to logically relate and process the data before you can analyze them or create reports from them.

Application requirements vary, but there are common denominators for all applications that access, combine, and process data. Once you have determined what you want the output to look like, you must
□ discover how the input data are related
□ select the appropriate access method to process the input data
□ select the appropriate SAS tools to complete the task.
Data Relationships

Relationships among multiple sources of input data exist when the sources each contain common data, either at the physical or logical level. For example, employee data and department data could be related through an employee ID variable that shares common values. Another data set could contain numeric sequence numbers whose partial values logically relate it to a separate data set by observation number. Relationships that exist fall into one of four categories:
□ one-to-one
□ one-to-many
□ many-to-one
□ many-to-many.

… identify the existing relationships in your data, since this knowledge is crucial to understanding how input data can be processed to produce desired results.
Data Relationships, Access Methods, and Data Manipulation □ Data Relationships 3
One-to-One

In a one-to-one relationship, typically a single observation in one data set is related to a single observation from another based on the values of one or more selected variables. A one-to-one relationship implies that each value of the selected variable occurs no more than once in each data set. When working with multiple selected variables, this relationship implies that each combination of values occurs no more than once in each data set.
Figure 1.1 One-to-One
Observations in SALARY and TAXES are related by common values for EMPNUM. (SALARY contains EMPNUM and SALARY; TAXES contains EMPNUM and a tax-bracket variable.)
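Because a one-to-one relationship requires each EMPNUM value to occur at most once in each data set, it can be worth checking that assumption before combining. The following is a minimal sketch with illustrative SALARY values (not the figure's data). It sorts by EMPNUM and uses the FIRST. and LAST. variables that a BY statement creates to flag any key value that repeats; the output data set name DUPKEYS is hypothetical.

```sas
/* Illustrative SALARY data; EMPNUM should be unique for a one-to-one match */
data salary;
  input empnum salary;
  datalines;
4876 32000
1234 28500
5555 41000
;
run;

proc sort data=salary;
  by empnum;
run;

/* A key value is unique only if its observation is both first and last in its BY group */
data dupkeys;
  set salary;
  by empnum;
  if not (first.empnum and last.empnum) then output;
run;
```

If DUPKEYS has zero observations, EMPNUM is unique in SALARY; the same check can be run against TAXES.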
One-to-Many and Many-to-One

A one-to-many or many-to-one relationship between input data sets implies that one data set has at most one observation with a specific value of the selected variable, but the other input data set may have more than one occurrence of each value. When working with multiple selected variables, this relationship implies that each combination of values occurs no more than once in one data set but may occur more than once in the other data set. The order in which the input data sets are processed determines whether the relationship is one-to-many or many-to-one.
Figure 1.2 One-to-Many
Observations in ONE (variables A, B, and C) and TWO (variables A, E, and F) are related by common values for variable A. Values of A are unique in data set ONE but not in TWO.
Figure 1.3 One-to-Many and Many-to-One
Observations in data sets ONE, TWO, and THREE are related by common values for variable ID. Values for ID are unique in ONE and THREE but not in TWO. For values 2 and 3 of ID, a one-to-many relationship exists between observations in data sets ONE and TWO, and a many-to-one relationship exists between observations in data sets TWO and THREE.
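A minimal sketch of a one-to-many match-merge, assuming illustrative values for data sets ONE (A, B, C) and TWO (A, E, F) modeled on Figure 1.2. Because A is unique in ONE, the DATA step carries the B and C values from ONE onto every matching observation from TWO. The output name ONEMANY is hypothetical.

```sas
/* Illustrative data: A is unique in ONE but repeats in TWO; both are sorted by A */
data one;
  input a b c;
  datalines;
1 5 6
2 3 4
;
run;

data two;
  input a e f;
  datalines;
1 2 10
1 3 20
2 4 30
2 5 40
;
run;

/* Within each BY group, the single ONE observation is matched to every TWO observation */
data onemany;
  merge one two;
  by a;
run;
```

ONEMANY has four observations, one per observation in TWO, each carrying the B and C values of the matching A from ONE. Reversing the order in the MERGE statement describes the same data as a many-to-one relationship.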
Many-to-Many

The many-to-many category implies that multiple observations from each input data set may be related based on values of one or more common variables.
Figure 1.4 Many-to-Many
Observations in data sets BREAKDWN and MAINT …
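For a many-to-many relationship, a DATA step MERGE with a BY statement does not form every pairing. It matches observations one-to-one within each BY group and writes a log note that more than one data set has repeats of BY values. A PROC SQL join does produce every combination. This sketch assumes a key variable ID and date variables BDATE and MDATE (all hypothetical names), uses MAINT as an assumed name for the second data set, and uses illustrative values.

```sas
data breakdwn;
  input id $ bdate :date7.;
  format bdate date7.;
  datalines;
DDD 22JUN94
DDD 19SEP94
;
run;

data maint;
  input id $ mdate :date7.;
  format mdate date7.;
  datalines;
DDD 01MAR94
DDD 01AUG94
DDD 01NOV94
;
run;

/* Every BREAKDWN row is paired with every MAINT row that has the same ID */
proc sql;
  create table allpairs as
  select b.id, b.bdate, m.mdate
  from breakdwn as b, maint as m
  where b.id = m.id;
quit;
```

With two BREAKDWN rows and three MAINT rows sharing ID DDD, ALLPAIRS has 2 × 3 = 6 rows.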
Sequential Access

The simplest and perhaps most common way to process data with a DATA step is to read observations in a data set sequentially. You can read observations sequentially using the SET, MERGE, UPDATE, or MODIFY statements.
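A minimal sketch of sequential access, assuming a small illustrative data set YEARS. Each DATA step iteration reads the next observation in order. The END= variable (here the hypothetical name LAST) is set to 1 when the final observation is read.

```sas
data years;
  input year;
  datalines;
1991
1992
1993
;
run;

data running;
  set years end=last;    /* LAST is 1 only while the final observation is processed */
  count + 1;             /* sum statement: COUNT is retained across iterations */
  if last then put 'Observations read: ' count;
run;
```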
Direct Access

… two methods:
□ by an observation number
□ by the value of one or more variables through a simple or composite index.
To access observations directly by their observation number, use the POINT= option with the SET or MODIFY statement. The POINT= option names a variable whose current value determines which observation a SET or MODIFY statement reads.
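A minimal sketch of direct access by observation number, assuming an illustrative data set YEARS, that reads every second observation. NOBS= supplies the observation count at compile time. A STOP statement is required: a POINT= read never detects end of file, so without STOP the DATA step would loop indefinitely.

```sas
data years;
  do year = 1991 to 1995;
    output;
  end;
run;

data odd;
  do i = 1 to total by 2;            /* observation numbers 1, 3, 5 */
    set years point=i nobs=total;    /* TOTAL is set at compile time */
    output;
  end;
  stop;                              /* required: POINT= never reaches end of file */
run;
```

ODD contains the years 1991, 1993, and 1995. The POINT= and NOBS= variables (I and TOTAL) are not written to the output data set.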
To access observations directly based on the values of one or more specified variables, you must first create an index for the variables and then read the data set using the KEY= option with the SET or MODIFY statement. An index is a separate structure that contains the data values of the key variable or variables paired with a location identifier for the observations containing the value.
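A minimal sketch of direct access through an index, assuming illustrative MASTER and TRANS data sets that share a key variable ID. PROC DATASETS creates a simple index on ID, and SET with KEY= looks up each transaction's ID. The automatic variable _IORC_ is compared with the %SYSRC autocall macro. A failed lookup sets _ERROR_, and MASTER's variables keep their previous values, so the sketch resets both. The output names FOUND and NOTFOUND are hypothetical.

```sas
data master;
  input id $ name $;
  datalines;
A1 Fred
B2 Diana
C3 Vien
;
run;

/* Create a simple index on ID so that KEY=ID can locate observations directly */
proc datasets library=work nolist;
  modify master;
  index create id;
quit;

data trans;
  input id $;
  datalines;
C3
Z9
A1
;
run;

data found notfound;
  set trans;                    /* read transactions sequentially */
  set master key=id;            /* direct read of MASTER through the ID index */
  if _iorc_ = %sysrc(_sok) then output found;
  else do;
    _error_ = 0;                /* an unsuccessful lookup sets _ERROR_ to 1 */
    name = ' ';                 /* NAME still holds the previous match's value */
    output notfound;
  end;
run;
```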
□ updating.

Figures 1.5 through 1.9 show basic illustrations of all of these methods for combining SAS data sets.
Figure 1.5 Concatenating SAS Data Sets
Concatenating appends the observations from one data set to another data set. DATA1 is initially read sequentially until all observations have been processed. Likewise, data sets in the SET statement are processed sequentially in the order in which they are listed. (DATA1 and DATA2, each containing YEAR, are combined into ALL.)
Figure 1.6 Interleaving
Interleaving intersperses observations from two or more data sets, based on one or more common variables.

   data all;
      set data1 data2;
      by year;
   run;
Figure 1.7 One-to-One Reading or Merging
One-to-one reading combines observations from two or more data sets by creating observations that contain all of the variables from each contributing data set. Observations are combined based on their relative position in each data set, that is, the first observation in one data set with the first in the other, and so on. With one-to-one reading, the DATA step stops after it has read the last observation from the smallest data set. (DATA1 contains VARX, DATA2 contains VARY, and ALL contains both.)

   data all;
      merge data1 data2;
   run;
Figure 1.8 Match-Merging
Match-merging combines observations from two or more data sets into a single observation in a new data set based on the values of one or more common variables. (DATA1 contains YEAR and VARX, DATA2 contains YEAR and VARY, and ALL contains YEAR, VARX, and VARY.)
Figure 1.9 Updating
Updating uses observations in a transaction data set to delete, add, or alter information in observations in a master data set. (MASTER contains YEAR, VARX, and VARY.)

   data master;
      update master trans;
      by year;
   run;
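The five methods in Figures 1.5 through 1.9 can be tried side by side. This is a minimal sketch that reuses the figures' data set names DATA1, DATA2, and TRANS but invents the values; the output names CONCAT, INTER, ONETOONE, MATCHED, and UPDATED are hypothetical. The comments give the observation count each step produces with this data.

```sas
/* Illustrative inputs, all sorted by YEAR */
data data1;
  input year varx $;
  datalines;
1991 X1
1992 X2
1993 X3
;
run;

data data2;
  input year vary $;
  datalines;
1991 Y1
1993 Y3
1994 Y4
1996 Y6
;
run;

data trans;
  input year varx $;
  datalines;
1992 X2new
1995 X5
;
run;

/* Concatenating: all DATA1 observations, then all DATA2 observations (7) */
data concat;
  set data1 data2;
run;

/* Interleaving: the same 7 observations, arranged by YEAR */
data inter;
  set data1 data2;
  by year;
run;

/* One-to-one reading: pairs by position and stops when DATA1 runs out (3);
   YEAR from DATA2 overwrites YEAR from DATA1 */
data onetoone;
  set data1;
  set data2;
run;

/* Match-merging: one observation per YEAR value found in either input (5) */
data matched;
  merge data1 data2;
  by year;
run;

/* Updating: TRANS changes 1992 and adds 1995 (4) */
data updated;
  update data1 trans;
  by year;
run;
```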
Table 1.1 Tools for Combining SAS Data Sets

Statement or Proc | Access Method: Sequential | Access Method: Direct | Can Use with BY Statement | Comments
Array Processing

When you want to process several variables in the same way, use array processing. Processing variables in arrays can save you time and simplify your code. Use an ARRAY statement to define a temporary grouping of variables as an array. Then use a DO loop to perform a task repetitively on all or selected elements in the array.
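A minimal sketch, assuming an illustrative data set SCORES with numeric variables Q1 through Q4. The ARRAY statement groups the four variables under the hypothetical array name Q, and a single DO loop replaces every missing value with 0.

```sas
data scores;
  input q1-q4;
  datalines;
5 . 3 4
. 2 . 1
;
run;

data filled;
  set scores;
  array q{4} q1-q4;          /* temporary grouping; Q itself is not a variable */
  do i = 1 to dim(q);
    if missing(q{i}) then q{i} = 0;
  end;
  drop i;
run;
```

Without the array, the same task would need one IF/THEN statement per variable.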
Choosing between UPDATE and MODIFY

You can use either the UPDATE or MODIFY statement to update a master data set with information in a transaction data set. The UPDATE statement is a more familiar tool. Its only application is to update a master data set. MODIFY, a newer and more powerful tool, has many more applications. You can use the MODIFY statement to
□ process a file sequentially to apply updates in place (without a BY statement)
□ make changes to a master data set in place by applying transactions from a transaction data set
□ update the values of variables by directly accessing observations based on observation numbers
□ update the values of variables by directly accessing observations based on the values of one or more key variables.
… more powerful tool than UPDATE, UPDATE is still the tool of choice in some cases. Table 1.2 helps you choose whether to use UPDATE or MODIFY with BY.
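Unlike UPDATE, MODIFY changes the master data set in place instead of writing a new copy. This is a minimal sketch, assuming illustrative MASTER and TRANS data sets keyed by YEAR. Matched years are rewritten with REPLACE, and unmatched transactions are appended with OUTPUT after _IORC_ is checked with the %SYSRC autocall macro (return codes _SOK and _DSENMR).

```sas
data master;
  input year varx $;
  datalines;
1991 X1
1992 X2
1993 X3
;
run;

data trans;
  input year varx $;
  datalines;
1992 X2new
1995 X5
;
run;

/* Apply TRANS to MASTER in place, matching on YEAR */
data master;
  modify master trans;
  by year;
  select (_iorc_);
    when (%sysrc(_sok)) replace;        /* match found: rewrite the observation */
    when (%sysrc(_dsenmr)) do;          /* no match in MASTER: append the transaction */
      output;
      _error_ = 0;
    end;
    otherwise do;
      put 'ERROR: unexpected return code ' _iorc_=;
      stop;
    end;
  end;
run;
```

Because explicit REPLACE and OUTPUT statements are present, MODIFY does not perform its default replace action. The appendix on error checking covers _IORC_ and %SYSRC in more detail.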
The following sources contain more complete explanations of topics covered briefly in this chapter:

□ … Language Statements," in SAS Language: Reference, Version 6, First Edition.

□ Combining SAS Data Sets. For a complete description and examples of concatenating, interleaving, one-to-one reading, one-to-one merging, match-merging, and updating, see pp. 137-160 in SAS Language: Reference, Version 6, First Edition. For more examples, see Part 4, "Combining SAS Data Sets," in SAS Language and Procedures: Usage, Version 6, First Edition.

□ Creating an index for a SAS data set. For a discussion of indexes and how to create them, see pp. 217-225 in SAS Language: Reference, Version 6, First Edition. Also see the description of the INDEX= option in SAS Technical Report P-242, SAS Software: Changes and Enhancements, Release 6.08, pp. 31-32.

□ _IORC_ automatic variable and SYSRC autocall macro. These error-checking tools were originally documented in SAS Technical Report P-222, Changes and Enhancements to Base SAS Software, Release 6.07. Both detailed descriptions and examples are in the appendix in this book. Also see Jacobs III, Charles A. (1992), "DATA Step Programming Using …

□ MERGE statement. For complete reference documentation, see Chapter 9, "SAS Language Statements," in SAS Language: Reference, Version 6, First Edition.

□ MODIFY statement. For complete reference documentation, see pages 1-10 in SAS Technical Report P-242, SAS Software: Changes and Enhancements, Release 6.08. Also see Jacobs III, Charles A. (1992), "DATA Step Programming Using the MODIFY Statement," Observations, 2 (1), 4-11.

□ PROC SQL procedure. If you are unfamiliar with Structured Query Language, see Getting Started with the SQL Procedure, Version 6, First Edition. For complete documentation on PROC SQL, see SAS Guide to the SQL Procedure: Usage and Reference, Version 6, First Edition.

□ SET statement. For complete reference documentation, see Chapter 9, "SAS Language Statements," in SAS Language: Reference, Version 6, First Edition. For information on the KEY= option, see p. 43 in SAS Technical Report P-222, Changes and Enhancements to Base SAS Software, Release 6.07. The UNIQUE option is described in SAS Technical Report P-242, SAS Software: Changes and Enhancements, Release 6.08, p. 14.

□ UPDATE statement. For complete reference documentation, see Chapter 9, "SAS Language Statements," in SAS Language: Reference, Version 6, First Edition.
CHAPTER 2

Combining Single Observations with Single Observations

…
2.2 Combining Observations When Variable Values Do Not Match Exactly
2.3 Combining Observations When There Is No Common Variable 24
2.4 Performing a Table Lookup When the Lookup Data Set Is Indexed 27
2.5 Performing a Table Lookup When the Lookup Data Set Is Not Indexed 31
2.6 Matching Observations Randomly 35
Example 2.1 Merging Data Sets by a Common Variable, Specifying Their Origin, and Replacing Missing Values
Goal  Combine observations from two data sets based on a variable common to both. To make the new data set more informative, create a new variable whose values indicate the origin of each observation, and replace the missing values that result from the merge operation with meaningful values.
Strategy  Use the MERGE and BY statements to match-merge the observations from two data sets. Use the IN= data set option to indicate which data sets contribute …

   1  …    Miguel   A12  Document       1  111  Fred      35
   2  111  Fred     B45  Survey         2  222  Diana     40
   3  222  Diana    B45  Document       3  777  Steve      0
   4  888  Monique  A12  Document       4  888  Monique   37
   5  999  Vien     D03  Survey         5  999  Vien      42
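A minimal sketch of this strategy. The data set names STAFF and HOURS and the variable names ID, NAME, DEPT, PROJECT, and HOURS are hypothetical, and the values are illustrative (modeled on the listing above). The IN= variables record which data set contributed each observation, a new variable ORIGIN stores that information, and the missing PROJECT and HOURS values produced by the merge are replaced with None and 0.

```sas
data staff;
  input id $ name $ dept $ project $;
  datalines;
000 Miguel A12 Document
111 Fred B45 Survey
222 Diana B45 Document
888 Monique A12 Document
999 Vien D03 Survey
;
run;

data hours;
  input id $ name $ hours;
  datalines;
111 Fred 35
222 Diana 40
777 Steve 0
888 Monique 37
999 Vien 42
;
run;

data combined;
  merge staff(in=instaff) hours(in=inhours);   /* both inputs sorted by ID */
  by id;
  length origin $ 8;
  /* record which data set(s) contributed this observation */
  if instaff and inhours then origin = 'both';
  else if instaff then origin = 'staff';
  else origin = 'hours';
  /* replace missing values created by the merge */
  if missing(project) then project = 'None';
  if missing(hours) then hours = 0;
run;
```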
Strategy  Sort each data set by the variable you are comparing. Read an observation from each one, then compare the values of the appropriate variable. (Remember to rename variables common to both data sets so that values from one data set do not overwrite values from the other.) If the difference between the compared values is within an acceptable range, write an observation containing values from both data sets. If it isn't within an acceptable range, test to see which of the two observations should come first. Write to the data set an observation that contains those values, setting the values from the other input data set to missing to indicate that no appropriate match was found. Then read another observation from the data set that contributed the values you've just written to the output data set, and test again to see if you have a close match.
Because you need to read from one data set, from a second data set, or from both, based on the result of a comparison, there are three different points in the code from which you may need to execute a SET statement to read an observation. To simplify the code, put the SET statement in a group of statements following a label, and use the LINK statement at each point in the program where a read should occur to branch execution to the appropriate SET statement.

In each labeled group, precede the SET statement with an IF/THEN statement that prevents SET from attempting to read past the end of a data set. Otherwise, the DATA step might automatically end before all observations are processed from both data sets.

Use the END= variable to determine when you've read the last observation in a data set. Create another variable to indicate that an observation has been read and processed. Test the value of that variable for each data set so that you can end the DATA step only after the last observation has been read and processed from each input data set.

Using the SQL procedure, you can perform the same task with less code. See "Related Technique."
Note: Due to the variability of data and the number of conditions that determine the path chosen by the PROC SQL optimizer, it is not always possible to determine the most efficient method without first testing with your data.
Combining Single Observations with Single Observations □ Example 2.2 19
   ONE                                  TWO
   OBS  TIME              SAMPLE        OBS  TIME              SAMPLE
    1   23NOV94:09:01:00   100           1   23NOV94:09:00:00   200
    2   23NOV94:10:03:00   101           2   23NOV94:09:59:00   201
    3   23NOV94:10:58:00   102           3   23NOV94:11:04:00   202
    4   23NOV94:11:59:00   103           4   23NOV94:12:02:00   203
    5   23NOV94:13:00:00   104           5   23NOV94:14:01:00   204
    6   23NOV94:14:02:00   105           6   23NOV94:14:59:00   205
    7   23NOV94:16:00:00   106           7   23NOV94:15:59:00   206
                                         8   23NOV94:16:59:00   207
                                         9   23NOV94:18:00:00   208
                              MATCH1
   OBS  TIME1          TIME2          SAMPLE1  SAMPLE2
     1  23NOV94:09:01  23NOV94:09:00    100      200
     2  23NOV94:10:03  23NOV94:09:59    101      201
     3  23NOV94:10:58  .                102      .
     4  .              23NOV94:11:04    .        202
     5  23NOV94:11:59  23NOV94:12:02    103      203
     6  23NOV94:13:00  .                104      .
     7  23NOV94:14:02  23NOV94:14:01    105      204
     8  .              23NOV94:14:59    .        205
     9  23NOV94:16:00  23NOV94:15:59    106      206
    10  .              23NOV94:16:59    .        207
    11  .              23NOV94:18:00    .        208
Output 2.2b MATCH2 Data Set
MATCH2 was created with PROC SQL. Columns: OBS, TIME1, SAMPLE1, TIME2, SAMPLE2. (Listing not recoverable.)
Program  The objective is to combine observations from data sets ONE and TWO when the values of the variable TIME from both data sets are within five minutes of each other. First, sort both data sets by TIME. Rename the variables TIME and SAMPLE so that values do not overlay each other in the program data vector when they are read from both data sets. Read an observation from each data set, and write an observation to the MATCH1 data set if the values of TIME meet the criteria. Then read again from each data set, and continue comparing values and writing observations when appropriate.
When the TIME values are not within five minutes of each other, test to determine which is earlier. Because the data are sorted by TIME, every later value in the other data set is larger still, so the earlier value cannot be matched. Write an observation that contains the earlier TIME value and its associated SAMPLE value, along with missing values to represent the other set of TIME and SAMPLE values. Then read again from the data set that contributed the earlier value, and again test to see if the match is close enough.
Prevent the DATA step from automatically ending when it reaches the end of the smallest data set by using an IF/THEN statement to test the value of a variable that indicates when the last observation is read. Because you have to prevent the SET statement from reading past the end of a data set, and because you may need to read a new observation from data set ONE, TWO, or both from multiple points in the program, you can place these statements in a labeled group and branch to it as appropriate:
Create MATCH1. Route execution to a group of statements that reads an observation from ONE and then to another group that reads from TWO. Both groups prevent the DATA step from stopping before reaching the end of a data …

   data match1 (keep = time1 time2 sample1 sample2);
      link getone;
      link gettwo;

Format the datetime variables. Set to 0 the two variables that will be used to indicate that the last observation from data set ONE or TWO has been both read and …

      format time1 time2 datetime13.;
      onedone=0; twodone=0;
   …

If there are more observations in TWO, read another observation. If the last observation has already been read, set TWODONE to 1 to indicate that the last observation was both read and processed, and then prevent the SET statement from executing and attempting to read past the end of data set TWO. This code segment uses the same logic as the previous one but applies to data set TWO.

   gettwo: if last2 then
              do;
                 twodone=1;
                 return;
              end;
           set two (rename=(time=tempt2 sample=temps2)) end=last2;
           return;
   run;
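The following is a complete sketch of the same technique. It is restructured so that a single DO WHILE loop drives all comparisons within one DATA step iteration, so it is a variant written for illustration, not the book's program. It assumes small illustrative data sets ONE and TWO already sorted by TIME. It keeps the renamed variables TEMPT1, TEMPS1, TEMPT2, and TEMPS2 and the labels GETONE and GETTWO from the text; LAST1 is an assumed END= variable name for ONE.

```sas
data one;
  input time :datetime16. sample;
  datalines;
23NOV94:09:01:00 100
23NOV94:10:03:00 101
23NOV94:13:00:00 104
;
run;

data two;
  input time :datetime16. sample;
  datalines;
23NOV94:09:00:00 200
23NOV94:11:04:00 202
23NOV94:12:58:00 203
;
run;

data match1(keep=time1 time2 sample1 sample2);
  format time1 time2 datetime13.;
  onedone = 0;              /* becomes 1 after the last obs of ONE is written */
  twodone = 0;              /* becomes 1 after the last obs of TWO is written */
  link getone;
  link gettwo;
  do while (not (onedone and twodone));
    if not onedone and not twodone and abs(tempt1 - tempt2) <= 5*60 then do;
      /* within five minutes: write both, then advance both data sets */
      time1 = tempt1; sample1 = temps1;
      time2 = tempt2; sample2 = temps2;
      output;
      link getone;
      link gettwo;
    end;
    else if twodone or (not onedone and tempt1 < tempt2) then do;
      /* ONE holds the earlier, unmatchable time */
      time1 = tempt1; sample1 = temps1;
      time2 = .;      sample2 = .;
      output;
      link getone;
    end;
    else do;
      /* TWO holds the earlier, unmatchable time */
      time1 = .;      sample1 = .;
      time2 = tempt2; sample2 = temps2;
      output;
      link gettwo;
    end;
  end;
  stop;                     /* every observation from both data sets is written */

getone:
  if last1 then do;
    onedone = 1;
    return;
  end;
  set one(rename=(time=tempt1 sample=temps1)) end=last1;
  return;

gettwo:
  if last2 then do;
    twodone = 1;
    return;
  end;
  set two(rename=(time=tempt2 sample=temps2)) end=last2;
  return;
run;
```

With this data, MATCH1 has four observations. Two are matched pairs (100 with 200, and 104 with 203). One holds only sample 101, and one holds only sample 202. The sketch assumes neither input data set is empty.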
Related Technique  The following PROC SQL step uses considerably less code to produce the same output as the DATA step, although the rows and columns are in a different order in the resulting data set.

PROC SQL joins the tables to produce a new table,* MATCH2. Conceptually, the join results in an internal table that matches every row in ONE with every row in TWO. The ON clause subsets that internal table to those rows where the two times differ by five minutes or less.
This join is a full outer join, which returns rows that satisfy the condition in the ON clause. In addition, a full outer join returns all of the rows from each table that do not match a row from the other table, based on the condition in the ON clause. For example, the rows with SAMPLE values 202, 205, 207, and 208 in table TWO have no row in table ONE within five minutes of their TIME values. Likewise, the rows with SAMPLE values 102 and 104 in table ONE have no row in table TWO within five minutes.
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows are observations.
   proc sql;
      create table match2 as
      select *
      from one(rename=(time=time1 sample=sample1)) full join
           two(rename=(time=time2 sample=sample2))
      /* keep pairs whose times differ by at most 5 minutes (300 seconds) */
      on abs(time1-time2)<=5*60;
   quit;
Note: In PROC SQL, SELECT statements automatically produce a report. SELECT clauses, which follow CREATE TABLE or CREATE VIEW statements, do not automatically produce a report.
Example 2.3 Combining Observations When There Is No Common Variable

Goal  Combine observations based on some criteria, even when there is no common variable in the two data sets.
Strategy  Use the looping action of the DATA step to access an observation from one data set on each iteration while reading all observations from a second data set to look for a match. To read the second data set, use the SET statement with the POINT= and NOBS= options in a DO loop to access observations sequentially by observation number until a match is found. For each observation read, test a condition to determine whether combining the information from the current observation of each data set is appropriate, and write an observation to a new data set when the condition is met. Optionally, you can write a note to the SAS log when no match for a project is found.

You can perform the same task with PROC SQL, except for writing a note to the log under a certain condition. See "Related Technique."

Note: Due to the variability of data and the number of conditions that determine the path chosen by the PROC SQL optimizer, it is not always possible to determine the most efficient method without first testing with the data.
I
Resulting Data Sets
Output 2.3a COMBINE1 Data Set
COMBINE1 was created with the DATA step.
                              COMBINE1
OBS  PROJECT   STDATE    ENDDATE   WORKID  COMPDATE     CHARGE
 1   BASEMENT  01/09/95  01/27/95   1234   01/17/95    $944.80
 2   ROOFING   02/15/95  02/20/95   2225   02/18/95  $1,280.94
 3   WIRE      03/02/95  03/05/95   3879   03/04/95    $888.90
 4   BRICK     03/07/95  03/29/95   8888   03/21/95  $2,280.87
Output 2.3b COMBINE2 Data Set
COMBINE2 was created with PROC SQL.
                              COMBINE2
OBS  PROJECT   STDATE    ENDDATE   WORKID  COMPDATE     CHARGE
 1   BASEMENT  01/09/95  01/27/95   1234   01/17/95    $944.80
 2   ROOFING   02/15/95  02/20/95   2225   02/18/95  $1,280.94
 3   WIRE      03/02/95  03/05/95   3879   03/04/95    $898.90
 4   BRICK     03/07/95  03/29/95   8888   03/21/95  $2,280.87
Program The objective is to bill charges to the correct phase of a construction project by
creating a new data set that contains the appropriate information from
PROJECTS and BILLS. Read each observation in PROJECTS and compare
the values of the STDATE and ENDDATE variables to the value of
COMPDATE in each observation in BILLS. If the completion date falls within
the range of dates indicated by the project start and end date values, write an
observation to COMBINE1. Set FOUND to 1 and use that condition
to stop the DO UNTIL loop so that no more observations are read from BILLS after a
match is found.
Create COMBINE1. Read an observation from PROJECTS. Set FOUND back to 0.
FOUND will be used to stop the DO UNTIL loop after a match has been found.
   data combine1(drop=found);
      set projects;
      found=0;
When the condition is met, set FOUND to 1 and write an observation to COMBINE1.
FOUND is set to 1 when a match is found. This condition stops the DO UNTIL loop so
no more observations are read from BILLS on this iteration of the DATA step.
         if stdate <= compdate <= enddate then
            do;
               found=1;
               output;
            end;
      end;
If no observations match, write a note to the log.
      if not found then put 'No bills exist for ' project
         'with start date ' stdate 'and end date ' enddate +(-1) '.';
   run;
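The DATA step above reads one project per iteration. It then scans BILLS by observation number and stops at the first bill whose completion date falls inside the project's date range. The following is a minimal Python model of that control flow, not SAS code. The inner `for`/`break` stands in for the DO UNTIL loop with SET POINT=. The BASEMENT and ROOFING values come from Output 2.3a, and the PAINT project is hypothetical.

```python
from datetime import date


def combine_first_match(projects, bills):
    """Model of the COMBINE1 DATA step.

    For each project, scan BILLS in observation order (the POINT= loop) and stop at
    the first bill whose COMPDATE lies between STDATE and ENDDATE.
    Returns (combined rows, log notes).
    """
    combined, log = [], []
    for proj in projects:                    # SET projects: one project per iteration
        found = False                        # found=0
        for bill in bills:                   # observations 1..NOBS of BILLS
            if proj["stdate"] <= bill["compdate"] <= proj["enddate"]:
                combined.append({**proj, **bill})  # OUTPUT the combined observation
                found = True
                break                        # UNTIL(found): read no more bills
        if not found:
            log.append(f"No bills exist for {proj['project']} with start date "
                       f"{proj['stdate']} and end date {proj['enddate']}.")
    return combined, log


if __name__ == "__main__":
    projects = [
        {"project": "BASEMENT", "stdate": date(1995, 1, 9), "enddate": date(1995, 1, 27)},
        {"project": "PAINT", "stdate": date(1995, 6, 1), "enddate": date(1995, 6, 5)},  # hypothetical
    ]
    bills = [
        {"workid": 1234, "compdate": date(1995, 1, 17), "charge": 944.80},
        {"workid": 2225, "compdate": date(1995, 2, 18), "charge": 1280.94},
    ]
    rows, notes = combine_first_match(projects, bills)
    print(rows)
    print(notes)
```

Because of the `break`, a project produces at most one combined row. A join that keeps every qualifying pair could return several rows for the same project.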
Related Technique If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. You cannot, however, write a note
to the log under certain conditions as you can with the DATA step example.
PROC SQL joins the PROJECTS and BILLS tables to produce a new table,*
COMBINE2. Conceptually, the join results in an internal table that matches
every row in PROJECTS with every row in BILLS. Using that internal table,
the WHERE clause keeps only the rows whose value of
COMPDATE is between STDATE and ENDDATE in the resulting
table.
proc sql;
   create table combine2 as
   select *
      from projects, bills
      where compdate between stdate and enddate;
quit;
Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.
Example 2.4 Performing a Table Lookup When the Lookup Data Set Is Indexed
Goal Combine two data sets using a table lookup technique that directly accesses the
lookup data set through an index on a key variable. This lookup technique is
appropriate for a large lookup data set.
Strategy Perform a table lookup using an index to locate observations that have key
values equal to the current value of the key variable. Read from the primary
file sequentially. To read the lookup data set, use the SET statement with the
KEY= option to access the observations directly. Write all observations from
the primary data set to the output data set even when no match is found, and
write a warning message to the SAS log. Before writing an observation, you
can calculate a value for a new variable based on values from a variable in
each data set. Use error-checking logic to direct execution to the appropriate
code path.
You can perform the same task with PROC SQL, with the exception of writing
a warning message to the SAS log when no match is found. See "Related
Technique."
Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with the
data.
Resulting Data Sets
Output 2.4a FINAL1 Data Set
FINAL1 was created with the DATA step.
                 FINAL1
OBS  EMPNUM  SALARY   TAXBRCKT  NET
 1    1234   $55,000    0.28    $39,600
 2    3333   $72,000    0.32    $48,960
 3    4876   $32,000    0.24    $24,320
 4    5489   $17,000     .         .
Output 2.4b FINAL2 Data Set
FINAL2 was created with PROC SQL.
                 FINAL2
OBS  EMPNUM  SALARY   TAXBRCKT  NET
 1    1234   $55,000    0.28    $39,600
 2    3333   $72,000    0.32    $48,960
 3    4876   $32,000    0.24    $24,320
 4    5489   $17,000     .         .
Program The objective is to create a new data set that includes all of the information
from PRIMARY, only the corresponding descriptive information from
LOOKUP, and the values of a new calculated variable. The resulting data set,
FINAL1, contains the employee's number, salary, tax bracket, and net adjusted
income.
First, read an observation from PRIMARY. Then use the SET statement with
the KEY= option to read an observation from LOOKUP based on the current
value of EMPNUM. To verify whether a matching value in LOOKUP has been
located for the current value of EMPNUM in PRIMARY, use the %SYSRC
autocall macro and the _IORC_ automatic variable.* When a match is found,
calculate a value for NET based on the current values of SALARY from
PRIMARY and TAXBRCKT from LOOKUP. When no match is found, set
TAXBRCKT to missing and write a message to the SAS log.
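The following is a minimal Python model of the keyed lookup described above, not SAS code. A dictionary stands in for the index on EMPNUM. A failed lookup keeps the PRIMARY observation, leaves TAXBRCKT and NET missing (None), and records a warning. The data values are taken from Output 2.4a; the warning text is hypothetical.

```python
def lookup_net(primary, lookup):
    """Model of the FINAL1 step.

    primary: list of (empnum, salary) pairs, read sequentially.
    lookup:  dict empnum -> taxbrckt; the dict plays the role of the index on EMPNUM.
    Returns (rows, log) where each row is (empnum, salary, taxbrckt, net).
    """
    rows, log = [], []
    for empnum, salary in primary:
        taxbrckt = lookup.get(empnum)         # keyed access, like SET lookup KEY=empnum
        if taxbrckt is None:                  # no match: keep the observation anyway
            log.append(f"WARNING: EMPNUM {empnum} not found in LOOKUP")
            net = None                        # TAXBRCKT and NET stay missing
        else:
            net = round(salary * (1 - taxbrckt), 2)
        rows.append((empnum, salary, taxbrckt, net))
    return rows, log


if __name__ == "__main__":
    primary = [(1234, 55000), (3333, 72000), (4876, 32000), (5489, 17000)]
    lookup = {1234: 0.28, 3333: 0.32, 4876: 0.24}
    rows, log = lookup_net(primary, lookup)
    for row in rows:
        print(row)
    print(log)
```

Setting the value to missing explicitly matters in the SAS version as well. Without it, the TAXBRCKT value retrieved for the previous employee would still be in the program data vector.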
Create FINAL1. Read an observation from PRIMARY.
   data final1;
      set primary;
Read an observation from LOOKUP based on the value of the key variable,
EMPNUM. The SET statement with KEY= accesses an observation directly through the
index, using the current value of EMPNUM.
      set lookup key=empnum;
value from the observation most recently
retrieved from LOOKUP is written as part of the current observation. _ERROR_ is
reset to 0 to prevent an error condition that would write the contents of the program
data vector to the SAS log.
   select primary.empnum, primary.salary, taxbrckt,
          salary*(1-taxbrckt) as net format=dollar7.
      from primary left join lookup
      on primary.empnum=lookup.empnum;
quit;
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.
Example 2.5 Performing a Table Lookup When the
Lookup Data Set Is Not Indexed
Goal Subset the observations from one data set into one of two output data sets,
based on specified criteria.
Strategy Load into an array the data that will be used to determine which subset an
observation belongs to. Read the input data set sequentially, performing a lookup
into the array structure. Compare values in the current observation to the
appropriate values from the array to determine whether they fall within a
specified range. Then write the current observation to the appropriate output
data set.
 7   Baucom   M   70   170   3
 8   Blair    M   69   133   1
 9   Blalock  M   68   148   2
10   Bostic   F   74   170   3
Resulting Data Sets
Output 2.5a INSHAPE Data Set
              INSHAPE
OBS  HEIGHT  LNAME      WEIGHT  TYPE
 1     69    Apple        139     1
 2     70    Baucom       170     3
 3     69    Blair        133     1
 4     68    Blalock      148     2
Output 2.5b OUTSHAPE Data Set
              OUTSHAPE
OBS  HEIGHT  LNAME      WEIGHT  TYPE
 1     67    lldw         160     2
 2     69    Alexander    115     1
 3     66    Avory        152     2
 4     68    Barefoot     158     2
Program The objective is to create subsets from the BTEAM data set, based on whether
a male team member is considered to be in shape or out of shape. The IDEAL
data set contains three WEIGHT values for each HEIGHT, based on an ideal
male weight for each body TYPE. These values are used to determine whether
an observation from the BTEAM data set should be written to the INSHAPE
or OUTSHAPE data set.
So that all of the values from IDEAL are available for comparison with the
WEIGHT value in each observation in BTEAM, load values from IDEAL into
a temporary array. A subsetting IF statement ensures that the only observations
processed are those for males with values for HEIGHT and WEIGHT that are
within a specified range. Use expressions to determine whether a WEIGHT value is
within five pounds above or below the ideal weight for that body
type. IF-THEN, ELSE, and OUTPUT statements write each observation to the
appropriate data set.
Create INSHAPE and OUTSHAPE.
   data inshape outshape;
      keep lname height weight type;
On the first DATA step iteration, load a two-dimensional temporary array from
the information in IDEAL. The DO loop reads each observation from IDEAL and
loads the WT array. The assignment statements assign weight values from IDEAL
to the correct array cells. There are three weight
      array wt(66:75,3) _temporary_;
      if _n_=1 then
         do i=1 to all;
            set ideal nobs=all;
            wt(height,1)=small;
            wt(height,2)=medium;
To help you visualize processing in this example, Figure 2.1 represents the
two-dimensional array WT, beginning with the lower bound of 66. If you
compare it to the IDEAL data set, you can see how it was constructed.
This statement processes the array:
   if wt(height,type)-5 le weight le wt(height,type)+5
      then output inshape;
   else output outshape;
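The array WT uses height values directly as subscripts because its first dimension starts at 66. The following is a minimal Python model of that idea, not SAS code. Python lists start at 0, so the lookup subtracts the lower bound. The model assumes the column name `large` for the third IDEAL weight, and the IDEAL row and team members are hypothetical. It also omits the subsetting IF that selects males within the valid HEIGHT and WEIGHT range.

```python
LOW = 66    # lower bound of the first dimension, as in ARRAY WT(66:75,3)
HIGH = 75


def build_wt(ideal):
    """ideal: (height, small, medium, large) rows -> list indexed by height - LOW."""
    wt = [[None, None, None] for _ in range(HIGH - LOW + 1)]
    for height, small, medium, large in ideal:
        wt[height - LOW] = [small, medium, large]   # wt(height,1..3)
    return wt


def split_by_shape(team, wt):
    """team: (lname, height, weight, body_type) rows with body_type in 1..3."""
    inshape, outshape = [], []
    for lname, height, weight, body_type in team:
        ideal = wt[height - LOW][body_type - 1]     # WT(height, type)
        if ideal - 5 <= weight <= ideal + 5:        # within five pounds either way
            inshape.append(lname)
        else:
            outshape.append(lname)
    return inshape, outshape


if __name__ == "__main__":
    wt = build_wt([(70, 150, 160, 170)])            # hypothetical IDEAL row
    print(split_by_shape([("Doe", 70, 158, 2), ("Roe", 70, 140, 2)], wt))
```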
On the first iteration of the DATA step, the first observation from BTEAM is
processed:
Where to Go from Here □ Two-Dimensional Arrays. For a discussion, see pp. 165-169 in SAS
Language: Reference, Version 6, First Edition. For examples, see
Chapter 7, "Grouping Variables to Perform Repetitive Tasks Easily," in
SAS Language and Procedures: Usage 2, Version 6, First Edition and
Example 14, "Expense Report," in SAS Guide to Report Writing:
Examples, Version 6, First Edition.
□ Temporary Array Elements. For an explanation and an example, see
pp. 129-131 in SAS Language and Procedures: Usage 2, Version 6, First
Edition. For a short example, see pp. 170-171 in SAS Language:
Reference, Version 6, First Edition.
Example 2.6 Matching Observations Randomly
Goal Randomly pair observations from transaction and master data sets until a good
match is found. Create a new data set containing the results of the match.
Update the value of a variable in the master data set appropriately.
Strategy Sequentially process observations from the transaction data set. Access the
master data set directly by observation number, using the MODIFY statement
with the POINT= and NOBS= options; MODIFY allows the data set to be
updated in place. Use the RANUNI function to randomly generate a number,
and use the CEIL function to return it as an integer; assign the resulting integer
to the POINT= variable. Use IF-THEN logic to test a variable for a condition.
Continue reading observations until one meets the condition, then write an
observation to a new data set, assign a value to a new variable, and update a
value in the master data set.
    Prad      16
5   Kia       60
6   Monique   13
7   Sofus     23
Output 2.6b ASSIGN Data Set
        ASSIGN
OBS  PROJID  ENGINEER
 1   Aero    Monique
 2   Brandx  NONE
 3   Chem    Jane
 4   Contra  Kia
 5   Eng2    Eduardo
 6   Eng3    Kia
Program The data set ENGINEER lists engineers and their available hours. The data set
PROJECT lists each project by ID and the hours needed to complete that
project. The objective is to use a random direct access technique to match a
project with an engineer who has sufficient hours to complete that project,
output the results to the new data set ASSIGN, and update ENGINEER to
reflect the hours remaining for each engineer after assignment. The random
direct access technique causes the program to produce different output each
time it is executed.
FOUND is then set to 1 and the next project is accessed. If the loop ends with
FOUND=0, no engineer with sufficient hours was found, so the program
assigns the value 'NONE' to ENGINEER and writes an observation
to ASSIGN.
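The following is a minimal Python model of the random direct-access match, not SAS code. `rng.randint(1, nobs)` stands in for turning a RANUNI draw into an observation number with CEIL. Each chosen engineer's hours are reduced in place, as MODIFY updates ENGINEER in place. The loop that stops the search is not shown in this section, so the model caps the number of random draws per project with a hypothetical `tries_per_project` parameter. The names and hours in the usage are hypothetical.

```python
import random


def assign_randomly(projects, engineers, tries_per_project=20, seed=None):
    """Model of the random direct-access match.

    projects:  list of (projid, hours_needed), read sequentially.
    engineers: list of [name, hours_available]; updated in place, as MODIFY does.
    tries_per_project is a hypothetical cap on random draws per project.
    """
    rng = random.Random(seed)
    nobs = len(engineers)                        # NOBS=
    assign = []
    for projid, needed in projects:
        found = False
        for _ in range(tries_per_project):
            point = rng.randint(1, nobs)         # stands in for CEIL(RANUNI(seed)*nobs)
            engineer = engineers[point - 1]      # direct access by observation number
            if engineer[1] >= needed:            # enough hours for this project
                engineer[1] -= needed            # update the master observation
                assign.append((projid, engineer[0]))
                found = True
                break
        if not found:
            assign.append((projid, "NONE"))
    return assign


if __name__ == "__main__":
    engineers = [["Jane", 40], ["Kia", 60]]      # hypothetical ENGINEER rows
    print(assign_randomly([("Aero", 30), ("Huge", 1000)], engineers, seed=1))
    print(engineers)
```

With a different seed, or none, the pairing can change from run to run. This is the same behavior the text describes for the DATA step.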
Example 2.7 Combining Observations Based on a
Calculation on Variables Contributed by
Two Data Sets
Goal Use one-to-many matching on columns* in two tables and perform a
calculation that shows the relationship between values in columns that are
unique to each table. Produce a table that includes only those rows that meet a
specified condition.
Strategy Use the SQL procedure to join two tables. The join produces a Cartesian
product, which is a combination of each row from the first table with every
row from the second table. During the join, you can perform mathematical
computations to create a new column using values from columns that are
common to both tables. Subset the join to get only those rows that have a
specified value of the new column.
ONE TWO
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.
Program The objective is to join ONE and TWO to get a row for every combination of
house and store. In this example, the join results in an internal table of 16 rows,
four for each house.
For each row, calculate the distance between each house and each store by
performing mathematical calculations on the X and Y coordinates. Lastly,
determine which store is closest to each house. Select only those rows whose
values for DIST represent the minimum distance from a specific house to the
closest store.
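The following is a minimal Python model of the three steps just described, not SAS code: form the Cartesian product, compute the distance, and keep each house's minimum. The coordinates of house1 (1,1) and store1 (6,1) follow the worked calculation in "A Closer Look." The extra store `storeX` is hypothetical. On a tie, this model keeps the first store it sees. A query that selects every row equal to the minimum could return more than one row in that case.

```python
import math
from itertools import product


def closest_stores(houses, stores):
    """houses, stores: dict name -> (x, y).  Returns {house: (store, dist)}."""
    # Cartesian product: every house paired with every store, with its distance.
    # abs() mirrors the SAS expression; squaring alone would give the same result.
    pairs = [
        (house, store, math.sqrt(abs(sx - hx) ** 2 + abs(sy - hy) ** 2))
        for (house, (hx, hy)), (store, (sx, sy)) in product(houses.items(), stores.items())
    ]
    best = {}
    for house, store, dist in pairs:
        if house not in best or dist < best[house][1]:   # keep the minimum distance
            best[house] = (store, dist)
    return best


if __name__ == "__main__":
    print(closest_stores({"house1": (1, 1)}, {"store1": (6, 1), "storeX": (1, 4)}))
```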
Invoke PROC SQL and create a table. The CREATE TABLE statement creates the
table FINAL to store the results of the subsequent query.
   proc sql;
      create table final as
Select the columns. The SELECT clause selects the HOUSE and STORE columns
from tables ONE and TWO, respectively.
      select one.house, two.store label='Closest Store',
Calculate a new column. The arithmetic expression uses the square root function
             sqrt((abs(two.x-one.x)**2)+(abs(two.y-one.y)**2)) as dist
                label='Distance' format=4.2
A Closer Look: Calculate a New Column
It may help you to visualize the plot of the location of the houses and stores
and to actually see how the distance between a specific house and store is
calculated. The following plot shows the position of each house and store:
[Plot: positions of houses h1-h3 and stores s1-s4 on a grid of X and Y coordinates from 1 to 6]
As an example, this is the calculation for the distance between house1 and
store1:
   sqrt((abs(two.x-one.x)**2)+(abs(two.y-one.y)**2))
 = sqrt((abs(6-1)**2)+(abs(1-1)**2))
 = sqrt((5**2)+(0**2))
 = sqrt(25+0)
 = sqrt(25)
 = 5
CHAPTER 3
Combining a Single Observation with Multiple
Observations
Goal Efficiently combine values from a single observation in one data set with all
observations in another data set.
Strategy On the first iteration of the DATA step, read the values of all variables from a
single observation in one data set once to place those values into the program
data vector. Then read each observation in the second data set, outputting a
new observation that contains the combined values.
Program The objective is to take the data set that contains information about sales
representatives and add to each observation the same values for two new
variables, STORE and DEPT. The only observation in DEPT_ID contains
values for STORE and DEPT. The IF-THEN statement with the _N_ automatic variable
executes the SET statement to read from DEPT_ID only once, on the first
iteration of the DATA step. The values for STORE and DEPT remain in the
program data vector for the duration of the DATA step execution because
values read with the SET statement are automatically retained until another
observation is read from that data set. Each iteration reads an observation from
SALESREP and writes an observation that contains the same values for STORE
and DEPT and all the data for a single sales representative:
Read from DEPT_ID only on the first iteration.
Since DEPT_ID contains only one observation, using IF-THEN and _N_ to
read from it only once avoids prematurely ending the DATA step when end-of-file is
reached. The variable values from DEPT_ID are retained throughout the
DATA step.
   if _n_=1 then set dept_id;
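The following is a minimal Python model of why the read-once pattern matters, not SAS code. `add_single_obs` reads the single DEPT_ID row once and reuses its values for every SALESREP row. `set_every_iteration` models reading DEPT_ID on every iteration: like a DATA step whose SET statement hits end-of-file, it stops as soon as the shorter input runs out. The STORE and DEPT values are hypothetical, and the names come from SALESREP.

```python
def add_single_obs(dept_id, salesrep):
    """IF _N_=1 THEN SET dept_id: read the only DEPT_ID row once and reuse its values."""
    retained = dept_id[0]                    # read on the first iteration only
    return [{**rep, **retained} for rep in salesrep]


def set_every_iteration(dept_id, salesrep):
    """Unconditional SET dept_id; SET salesrep; the step ends when either runs out."""
    return [{**rep, **dept} for dept, rep in zip(dept_id, salesrep)]


if __name__ == "__main__":
    dept_id = [{"store": "S1", "dept": "D1"}]        # hypothetical single observation
    salesrep = [{"name": n} for n in ("Harvey", "Lou", "Mary", "Sam")]
    print(len(add_single_obs(dept_id, salesrep)))       # one row per sales rep
    print(len(set_every_iteration(dept_id, salesrep)))  # stops after the first row
```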
Example 3.2 Adding Values from the Last Observation in
a Data Set to All Observations in Another
Data Set
Goal Efficiently combine values from the last observation in one data set with all
observations in another data set.
Strategy Read the values of all variables from the last observation in one data set once
to place those values into the program data vector. You can use the POINT=
and NOBS= options with the SET statement to go directly to the last
observation in the data set. Then read each observation in the second data set,
writing a new observation that contains the combined values.
Each salesperson in SALESREP works in the store and department shown in the last
observation in DEPT_ID.
DEPT_ID                     SALESREP
OBS  STORE  DEPT            OBS  NAME    MONTH  TOTSALES
 1    02    AUTO             1   Harvey  Jan    $25,375
 2    07    HSEWARES         2   Lou     Jan     $9,950
 3    10    AUDIO            3   Mary    Jan    $27,985
 4    13    VIDEO            4   Sam     Jan     $8,795
Resulting Data Set
Program The objective is to take the data set that contains information about sales
representatives and add to each observation the same values for store and
department, which are read from another data set. The last observation in
DEPT_ID contains the correct store and department values, so that observation
is read on the first iteration of the DATA step. The values for STORE and
DEPT remain in the program data vector for the duration of the DATA step
execution because values read with the SET statement are automatically
retained until other values for those same variables are read to replace them.
Each DATA step iteration reads an observation from SALESREP and writes
an observation that contains the same values for STORE and DEPT and all of
the data for a single sales representative:
Create SALES_ID. Read the last observation from DEPT_ID on the first
iteration. NOBS= sets the value of LAST to 4, the last observation in the data set.
POINT= allows you direct access to observation 4.
   data sales_id;
      if _n_=1 then set dept_id point=last nobs=last;
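The trick in this SET statement is that NOBS= supplies the observation count before any data are read, so the same variable can serve as the POINT= target. The following is a minimal Python model of that idea, not SAS code. It uses the DEPT_ID and SALESREP values shown above.

```python
def add_last_obs(dept_id, salesrep):
    """Model of SET dept_id POINT=last NOBS=last on the first iteration."""
    last = len(dept_id)                    # NOBS= is known before any observation is read
    retained = dept_id[last - 1]           # POINT= reads observation LAST directly
    return [{**rep, **retained} for rep in salesrep]


if __name__ == "__main__":
    dept_id = [{"store": "02", "dept": "AUTO"}, {"store": "07", "dept": "HSEWARES"},
               {"store": "10", "dept": "AUDIO"}, {"store": "13", "dept": "VIDEO"}]
    salesrep = [{"name": "Harvey", "month": "Jan", "totsales": 25375},
                {"name": "Lou", "month": "Jan", "totsales": 9950},
                {"name": "Mary", "month": "Jan", "totsales": 27985},
                {"name": "Sam", "month": "Jan", "totsales": 8795}]
    for row in add_last_obs(dept_id, salesrep):
        print(row)
```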
Resulting Data Set
Output 3.3 FINAL Data Set
                   FINAL
OBS  ID  NAME             SALE     BONUS
 1    1  Nay Rong         $28,000  $2,000
 2    2  Kelly Windsor    $30,000  $4,000
 3    2  Kelly Windsor    $40,000  $4,000
 4    3  Julio Meraz      $15,000  $3,000
 5    3  Julio Meraz      $20,000  $3,000
 6    3  Julio Meraz      $25,000  $3,000
 7       Richard Krabil   $35,000  $2,500
 8    5  Rita Giuliano    $40,000  $2,800
Program The objective is to create a new data set that matches each individual with the
correct sale and bonus based on corresponding ID values. This program
match-merges the data sets ONE, TWO, and THREE to create a single data set
that contains the variables ID, NAME, SALE, and BONUS.
Data set TWO contains multiple occurrences of some values of ID, while ONE
and THREE contain only one occurrence. Because values of NAME and
BONUS (which are read from ONE and THREE) are automatically retained
across the BY group, multiple observations with the same value for ID contain
the correct NAME and BONUS values:
Create FINAL. Combine observations from the three data sets based on the
matching values for ID to create the FINAL data set.
   data final;
      merge one two three;
      by id;
   run;
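The following is a minimal Python model of the one-to-many match-merge, not SAS code. Within each BY group, it writes as many rows as the longest contributor has. A data set that runs out of observations keeps contributing the values of its last observation in that group, which is how NAME and BONUS repeat. The model assumes that inputs are already sorted by ID, as BY requires, and that the data sets share no variables other than ID. Unlike SAS, it simply omits, rather than sets to missing, the variables of a data set that has no observations for a given ID. The data come from Output 3.3.

```python
from itertools import groupby


def match_merge(*datasets, key="id"):
    """Model of MERGE ...; BY key; for data sets already sorted by key.

    Within a BY group, a data set that runs out of observations keeps contributing
    the values from its last observation, so one-to-many matches repeat them.
    """
    grouped = [{k: list(g) for k, g in groupby(ds, key=lambda r: r[key])}
               for ds in datasets]
    out = []
    for k in sorted(set().union(*grouped)):
        n = max(len(g.get(k, [])) for g in grouped)   # rows in this BY group
        for i in range(n):
            row = {}
            for g in grouped:                           # later data sets overwrite earlier ones
                rows = g.get(k, [])
                if rows:
                    row.update(rows[min(i, len(rows) - 1)])
            out.append(row)
    return out


if __name__ == "__main__":
    one = [{"id": 1, "name": "Nay Rong"}, {"id": 2, "name": "Kelly Windsor"},
           {"id": 3, "name": "Julio Meraz"}]
    two = [{"id": 1, "sale": 28000}, {"id": 2, "sale": 30000}, {"id": 2, "sale": 40000},
           {"id": 3, "sale": 15000}, {"id": 3, "sale": 20000}, {"id": 3, "sale": 25000}]
    three = [{"id": 1, "bonus": 2000}, {"id": 2, "bonus": 4000}, {"id": 3, "bonus": 3000}]
    for row in match_merge(one, two, three):
        print(row)
```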
Example 3.4 Applying Transactions to a Master Data Set
Based on a Common Variable
Goal Use a common variable to update the values of variables in a master data set
with the values of variables in a transaction data set without writing missing
values to the revised master data set and without overlaying variable values in
the program data vector.
Strategy Use the MERGE and BY statements to update the values of a master data set
with the values of a transaction data set. Use the IN= data set option to indicate
whether the transaction data set contributed to the current observation. If it did not
contribute, use IF-THEN logic with a DO group to preserve the original values
from the master data set. You must rename variables with the RENAME=
option because the master and transaction data sets contain the same variables.
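The following is a minimal Python model of the outcome this strategy aims for, not SAS code. It assumes that every MASTER observation whose key appears in TRANS takes the TRANS values, and that every other MASTER observation keeps its own values unchanged, so no missing values are written. The model assumes one TRANS observation per key. The variable name `itemb` and all data values are hypothetical.

```python
def apply_transactions(master, trans, key="itema"):
    """Model of the MERGE/BY/IN= update: each MASTER observation takes the TRANS
    values for its key when TRANS contributed (the IN= flag is true) and keeps its
    own values otherwise, so no missing values are written."""
    by_key = {t[key]: t for t in trans}        # assumes one TRANS observation per key
    revised = []
    for m in master:
        t = by_key.get(m[key])
        if t is not None:                      # IN= variable for TRANS is 1
            revised.append({**m, **t})
        else:                                  # TRANS did not contribute
            revised.append(dict(m))
    return revised


if __name__ == "__main__":
    master = [{"itema": 1, "itemb": 2}, {"itema": 1, "itemb": 3}, {"itema": 2, "itemb": 1}]
    trans = [{"itema": 1, "itemb": 5}]         # hypothetical transaction
    print(apply_transactions(master, trans))
```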
MASTER                  TRANS
1   1   2    0          1   1   5   6
2   1   3   99          2   3   3   4
3   1   4   88
4   1   5   77
5   2   1   66
6   2   2   55
7   3   4   44
Program The objective is to update all variable values in observations from MASTER
with the values of variables contained in observations in TRANS, based on the
values of the BY variable ITEMA. Because both data sets contain the same
variables, you must rename variables other than the BY variable so that
variable values from observations in MASTER will not be overlaid.
Special handling is required when TRANS does not contain a matched value
for ITEMA in MASTER. Use IF-THEN processing and the IN= option to
from MASTER:
Goal Reshape the transaction data set, turning related observations into single ones,
and match each collapsed observation from the transaction data set with an
appropriate observation from the master data set, based on the value of a key
variable.
Strategy Combine data from two SAS data sets based on the key variable. Read the
master data set sequentially, while using the KEY= option to directly access
observations in an indexed transaction data set. When the transaction data set
contains multiple observations for the same key value, you can collapse them
into a single observation as you combine it with an observation from the
MASTER data set. Use error-checking logic to direct execution to the
appropriate code path.
Input Data Sets
SSN is common to both TRANS and MASTER. MASTER contains only unique
values of SSN, but TRANS can contain up to three observations with the same value
for SSN.
Because the program depends on accessing observations in TRANS directly using
KEY=SSN, TRANS must be indexed on the variable SSN. Sorting both data sets
by SSN is not required but is recommended for performance.
MASTER                       TRANS
OBS  SSN          NAME       OBS  SSN          RECDATE
 1   215-15-0007  David       1   202-36-5566    89
 2   221-27-1234  Jane        2   215-15-0007    92
 3   231-18-1345  Susan       3   215-15-0007    90
 4   233-44-3215  Paula       4   215-15-0007    89
 5   243-09-8956  Joe         5   221-27-1234    92
                              6   221-27-1234    90
                              7   221-27-1234    89
                              8   231-18-1345    93
                              9   231-18-1345    92
                             10   243-09-8956    93
                             11   243-09-8956    92
                             12   243-09-8956    91
When Only Matched Observations Are
OBS  DATE1  DATE2  DATE3  SSN          NAME
                          221-27-1234  Jane
                          231-18-1345  Susan
                     91   243-09-8956  Joe
Program The objective is to combine data from the TRANS data set with the
appropriate observations from the MASTER data set. MASTER contains one
observation for each value of SSN, while TRANS contains multiple
observations for some values of SSN. Instead of creating multiple observations
in MASTER, this application collapses multiple observations from TRANS that
contain the same SSN value into a single observation.
First, read one observation from MASTER. Then, by designating SSN as the
KEY= variable, read all of the observations from TRANS with the same SSN
value, and write a single observation to FINAL1. Use an array to assign each
RECDATE value from TRANS to DATE1, DATE2, or DATE3, as
appropriate. This application assumes that the TRANS data set has been
indexed on SSN. It also assumes that no more than three records exist in
TRANS for each value of SSN, so the DATEFLD array contains only three
elements:
Create FINAL1 and define array DATEFLD. Read an observation from MASTER.
In preparation for collapsing multiple observations in TRANS into single
observations for each SSN value, the array DATEFLD is defined.
   data final1(drop=i recdate);
      array datefld(*) date1-date3;
      set master;
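The following is a minimal Python model of collapsing TRANS into DATE1-DATE3, not SAS code. A dictionary of lists stands in for the SSN index, and dates are kept in TRANS observation order. Every MASTER observation is written, as in FINAL1, with missing dates shown as None. Like the program, the model assumes at most three dates per SSN, and it silently ignores any extras. The data are the input data sets shown above.

```python
from collections import defaultdict


def collapse_dates(master, trans, max_dates=3):
    """Model of FINAL1: one output row per MASTER observation with up to three
    RECDATE values from TRANS in DATE1-DATE3 (None = missing).

    master: (ssn, name) pairs; trans: (ssn, recdate) pairs in observation order.
    """
    dates = defaultdict(list)
    for ssn, recdate in trans:                   # the dict stands in for the SSN index
        dates[ssn].append(recdate)
    final = []
    for ssn, name in master:
        found = dates.get(ssn, [])[:max_dates]   # DATEFLD has only three elements
        datefld = found + [None] * (max_dates - len(found))
        final.append((ssn, name, *datefld))
    return final


if __name__ == "__main__":
    master = [("215-15-0007", "David"), ("221-27-1234", "Jane"), ("231-18-1345", "Susan"),
              ("233-44-3215", "Paula"), ("243-09-8956", "Joe")]
    trans = [("202-36-5566", 89), ("215-15-0007", 92), ("215-15-0007", 90),
             ("215-15-0007", 89), ("221-27-1234", 92), ("221-27-1234", 90),
             ("221-27-1234", 89), ("231-18-1345", 93), ("231-18-1345", 92),
             ("243-09-8956", 93), ("243-09-8956", 92), ("243-09-8956", 91)]
    for row in collapse_dates(master, trans):
        print(row)
```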
Related Technique The program shown previously produces a resulting data set (FINAL1) that
includes all observations in MASTER, due to the logic of the DO group in the
WHEN statement:
Include all observations from MASTER in the output data set. See FINAL1 in
Output 3.5a.
   when (%sysrc(_dsenom)) do;
      output;
   end;
   _error_=0;
To produce a resulting data set (FINAL2) that includes only those observations
from MASTER whose SSN values occur in TRANS, change the statements in
the WHEN statement:
Goal Update a master data set in place using values supplied by a transaction data
set to locate observations in the master data set that are to be replaced.
set. When no match occurs, you can write a note to the log.
Step-by-Step Strategy
Read the transaction data set sequentially, but use the SET statement with the
POINT= option to access observations directly by observation number. Place
the SET statement within an iterative DO loop so that the index variable for the DO loop
can supply values to the POINT= variable. Using direct access is important
because it makes it possible for you to reread the current observation from the
transaction data set when there is a match.
Next, use the MODIFY statement to read an observation from the master data
set, retrieving a match based on the values of the KEY= variable. Use the
_IORC_ automatic variable and the SYSRC autocall macro in a SELECT
group to execute the appropriate statements, based on whether a match is
found.
When a match is found, reread the current observation from the transaction
data set so that the new values for these variables will overlay the values read
from the master data set. Use SET with POINT= to reread the same
observation from the transaction data set, this time bringing the values of all
variables into the program data vector.
Finally, use the REPLACE statement to update the observation in place in the
master data set.
Input Data Sets
Both X and Y are common to the MASTER and TRANS data sets. MASTER contains
multiple observations with duplicate values for the key variable X. Because the program
depends on directly accessing observations in MASTER by using KEY=X, MASTER
must be indexed on the variable X.
MASTER             TRANS
OBS  X  Y          OBS  X  Y
 1   1  2           1   1  8
 2   1  3           2   3  9
 3   2  4           3   5  2
 4   3  5
 5   1  2
Resulting Data Set
Output 3.6 Updated Version of MASTER Data Set
    MASTER
OBS  X  Y
 1   1  8
 2   1  8
 3   2  4
 4   3  9
 5   1  8
Program The objective is to update the MASTER data set in place, replacing any
observation whose value of the key variable X matches the value of X stored in
the TRANS data set. TRANS contains one observation for each value of X,
while MASTER contains multiple observations for some values of X.
Use the KEEP= option to read only the variable X from TRANS. Then read an
observation from MASTER using the MODIFY statement and the KEY=
option. If there is a match for the key variable X in MASTER, reread the same
observation from TRANS, this time including all of the variables. These values
overlay the existing values in the program data vector that were read from
MASTER, and the REPLACE statement updates in place the current
observation in MASTER.
To read selected observations from TRANS twice, use the SET statement with
the POINT= option to access observations directly by observation number. Use
an iterative DO loop to supply values to the POINT= variable for the first SET
statement. Then use this same variable (P) as the POINT= variable in the
second SET statement, allowing it to read the same observation in TRANS in
its entirety. The entire program runs in one iteration of the DATA step.
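The following is a minimal Python model of the net effect of this step, not SAS code. Each TRANS observation replaces Y in every MASTER observation with the same X. A key with no match in MASTER produces a log note instead. The list comprehension stands in for repeated KEY= retrievals, and the log text is hypothetical. Run against the input data sets shown above, the model reproduces Output 3.6.

```python
def replace_in_place(master, trans):
    """Model of the MODIFY/REPLACE step for Example 3.6.

    master: list of [x, y] rows, changed in place (MASTER may repeat X values).
    trans:  list of (x, y) rows, one per X value, read sequentially.
    Returns log notes for X values that have no match in MASTER.
    """
    log = []
    for x_new, y_new in trans:
        hits = [row for row in master if row[0] == x_new]  # every match for this key
        if not hits:
            log.append(f"No match in MASTER for X={x_new}")
        for row in hits:
            row[1] = y_new     # TRANS values overlay MASTER values; then REPLACE
    return log


if __name__ == "__main__":
    master = [[1, 2], [1, 3], [2, 4], [3, 5], [1, 2]]
    notes = replace_in_place(master, [(1, 8), (3, 9), (5, 2)])
    print(master)
    print(notes)
```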
Use the _IORC_ automatic variable and the SYSRC autocall macro in a
SELECT group to route execution to the appropriate code path based on
whether a match is found:
Set
Goal Update a master data set in place using values supplied by a transaction data
set to locate observations in the master data set that are to be deleted.
Strategy Process observations from the transaction data set sequentially, supplying
values to locate observations that are to be deleted in the master data set. Use
the MODIFY statement to update a data set in place and the KEY= option to
directly access the master data set by using an index on the KEY= variable.
Verify the results of the MODIFY execution using the automatic variable
_IORC_ and the SYSRC autocall macro. Use the REMOVE statement to
delete any match in the master data set. Use an iterative process to access all
observations in the master data set that have a match for the current key value.
When all observations for the current key value have been deleted or when the
master data set does not contain the key value supplied by the transaction data
set, continue processing with the next key value from the transaction data set. The
task is complete when all observations from the transaction data set have been
processed.
You can perform the same task with PROC SQL; see "Related Technique."
Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with your
data.
Program The objective is to update the MASTER data set in place, removing any
observation whose value of the key variable CUST matches the value of CUST
stored in the TRANS data set. TRANS contains one observation for each value
of CUST, while MASTER contains multiple observations for some values of
CUST.
Read an observation from TRANS to obtain a value of CUST. Execute the
MODIFY statement with the KEY= option to directly access MASTER using
the index defined for CUST. To verify whether a match has been located, use
the SYSRC autocall macro and the _IORC_ automatic variable. When a match
occurs, the REMOVE statement deletes the observation just retrieved and
updates MASTER in place. When no match occurs, FLAG is set to prevent
further retrievals for the current value of CUST.
Remove observations in MASTER based on the value of the key variable CUST. The
DO UNTIL loop executes and processes

   flag=0;
   do until (flag);
      modify master key=cust;
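Only part of the DATA step appears above. As an illustration in Python rather than SAS, the following minimal sketch models the same control flow under stated assumptions: MASTER and TRANS are lists of dictionaries, and a dictionary from CUST values to row positions stands in for the SAS index on CUST. The function name `delete_by_key` is hypothetical, and unlike MODIFY, which updates MASTER in place, the sketch returns a new list.

```python
def delete_by_key(master, trans, key="cust"):
    """Hypothetical sketch: drop every MASTER row whose key appears in TRANS."""
    # Stand-in for the SAS index on CUST: key value -> positions in MASTER.
    index = {}
    for pos, obs in enumerate(master):
        index.setdefault(obs[key], []).append(pos)

    removed = set()
    for tobs in trans:                      # read TRANS sequentially
        candidates = index.get(tobs[key], [])
        flag = False                        # like FLAG=0
        while not flag:                     # like DO UNTIL (FLAG)
            # Retrieve the next not-yet-deleted observation with this key.
            pos = next((p for p in candidates if p not in removed), None)
            if pos is None:
                flag = True                 # no (more) matches for this key
            else:
                removed.add(pos)            # like REMOVE
    return [obs for pos, obs in enumerate(master) if pos not in removed]
```

For example, with MASTER rows for customers A, B, A, and C and TRANS values A and D, the result keeps only the B and C rows; the key D simply finds no match and processing moves on.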
Related Technique If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. You can use a DELETE statement
in PROC SQL to delete rows from a table.* The rows that meet the criteria
specified in the WHERE clause are deleted.
   proc sql;
      delete from master
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.
Goal Combine two data sets by using the value of a specific variable to look up
information in a small auxiliary or lookup data set and add it to information in
the primary data set to create a new data set.
Strategy Sequentially process observations in a primary data set while using direct
access to read observations in the lookup data set until a match is found. This
table lookup technique directly accesses the lookup data based on observation
number and avoids reading subsequent observations from the lookup data set
once a match has been found. This technique is best used with a small lookup
data set because there is the possibility of having to read many records from
the lookup data set when trying to find a match.
To read the primary data set, use the SET statement to read one observation on
each iteration of the DATA step. To read the lookup data set, use the SET
statement with the NOBS= and POINT= options and an iterative DO loop to
access each observation by observation number. Then you can test a condition
to determine whether combining the information from the current observation
of each data set is appropriate and write an observation to a new data set only
when the condition is met. Use the RENAME= option to rename the common
variable from the lookup data set so that the value read does not overwrite the
value read from the primary data set.
You can perform the same task with PROC SQL; see "Related Technique."
Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with your
data.
Input Data Sets
Both data sets have the common variable PARTNO.

          PRIMARY                          LOOKUP

   OBS  PARTNO  QUANTITY          OBS  PARTNO  DESC
    1    A220       4               1   A401   tuning peg
    2    A498       4               2   A025   bridge
    3    A063       8               3   A203   nut
    4    A810       4               4   A220   neck
                                    5   A810   pick guard
                                    6   A063   pickup
                                    7   A047   pot
                                    8   A608   volume knob
                                    9   A097   toggle switch
                                   10   A498   body
   OBS  PARTNO  QUANTITY  DESC
    1    A220       4     neck
    2    A810       4     pick guard
    3    A063       8     pickup
    4    A498       4     body
Program The objective is to create a new data set that includes all the information
from PRIMARY and only the corresponding descriptive information from
LOOKUP. The resulting data set, REPORT1, contains the part number,
quantity, and description. Read an observation from PRIMARY and
subsequently read observations from LOOKUP until a match is found. Use
RENAME= to rename PARTNO in LOOKUP so PARTNO values from
LOOKUP and PRIMARY are both retained. Use IF-THEN logic to compare
the values and to output only matching observations:
Write an observation to REPORT1 if the condition is met. Set FOUND to 1 so the
DO loop stops and no more observations are read from LOOKUP until the next
observation from PRIMARY is read.

      if partno=pn then
         do;
            output;
            found=1;
         end;
   end;
Write a note to the log when there is no match.

   if not found then put 'No match for PARTNO=' partno 'in LOOKUP.'
                         'Observation not added to REPORT1 data set.';
   run;
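The opening statements of this DATA step are not reproduced above, so the loop bounds in the following Python sketch (not SAS) are an assumption drawn from the Strategy text: NOBS= supplies the number of LOOKUP observations, and POINT= reads them by observation number inside an iterative loop that stops once FOUND is set. The function `lookup_by_point` is hypothetical, and data sets are modeled as lists of dictionaries.

```python
def lookup_by_point(primary, lookup):
    """Hypothetical sketch of the POINT= table lookup in Example 3.8."""
    report, log = [], []
    nobs = len(lookup)                      # like NOBS=
    for pobs in primary:                    # one DATA step iteration per PRIMARY row
        found = False
        i = 0
        while i < nobs and not found:       # stop scanning once a match is found
            lobs = lookup[i]                # like SET LOOKUP POINT=I
            pn = lobs["partno"]             # like RENAME=(PARTNO=PN)
            if pobs["partno"] == pn:
                report.append({**pobs, "desc": lobs["desc"]})
                found = True
            i += 1
        if not found:
            log.append(f"No match for PARTNO={pobs['partno']} in LOOKUP.")
    return report, log
```

The early exit is what keeps the technique reasonable for a small lookup data set: each PRIMARY row costs at most one pass over LOOKUP.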
Related Technique If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. PROC SQL joins the tables* to
produce a new table, REPORT2. The REPORT2 table is the same as
REPORT1, except for the order of the data. The difference in the order is a
result of the different processing techniques.

Conceptually, the join results in an internal table that matches every row in
PRIMARY with every row in LOOKUP. The WHERE clause determines that
only the rows that have matching values for PARTNO will be in the resulting
table. The table REPORT2 has the quantity and description for each part that is
   proc sql;
      create table report2 as
      select *
      from primary, lookup
      where primary.partno=lookup.partno;
   quit;
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.
Example 3.9 Performing a Table Lookup with Large
Nonindexed Data Sets
Goal
values remain fairly constant.
Strategy Use a table lookup technique that relates data using a user-written format
rather than sequentially processing both data sets. Dynamically build the
format and retrieve the formatted values. This technique is efficient when you
have a large data set whose retrieved values remain fairly constant and when
no index is otherwise needed for the data sets.

First, create a data set that you can use to pass information from the lookup file
to the FORMAT procedure to dynamically build the format. Specify this data
set in the CNTLIN= option as input to the FORMAT procedure. PROC
FORMAT uses the data in the input control data set to build the format. Create
a new data set by reading observations from the primary file and using the
PUT function to apply the formatted values of the common variable to a new
variable.
See "A Closer Look" for more information on dynamically building
formats and retrieving values.
You can perform the same task with PROC SQL; see "Related Technique."
Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with your
data.
Program The objective is to create a new data set that includes all of the data from
PRIMARY and only the corresponding descriptive information from
LOOKUP. The resulting data set, REPORT1, contains the part number,
quantity, and description. Create the data set FORMATS, which takes the
information contained in LOOKUP, renaming variables and adding a
FMTNAME variable so that PROC FORMAT can use it to dynamically build
the format $PARTS. Execute PROC FORMAT. Then create REPORT1, which
reads from PRIMARY and applies the formatted values of the key variable
PARTNO from LOOKUP to the new variable DESC:
Create the control data set FORMATS. Read from LOOKUP and rename the
variables that are required for the CNTLIN= data set. Rename PARTNO to
START and rename DESC to LABEL. Assign the required variable FMTNAME
the value $PARTS.

   data formats;
      set lookup(rename=(partno=start desc=label));
      fmtname='$parts';
   run;
Use CNTLIN= to build the format $PARTS. dynamically.

   proc format cntlin=formats;
   run;
Create the data set REPORT1. Read from PRIMARY and create the new variable
DESC. The PUT function applies the format $PARTS., which was built from
LOOKUP, to the common variable PARTNO, and the results are stored in the
new character variable DESC.

   data report1;
      set primary;
      desc=put(partno,$parts.);
   run;
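A user-defined format behaves much like a dictionary from START values to LABEL values. The following Python sketch (not SAS) mimics the CNTLIN= step and the PUT call; `build_format`, `put`, and `report_with_format` are hypothetical names. Returning the unformatted value when a key has no entry is an assumption meant to approximate a format with no OTHER range; check the SAS documentation for the exact behavior in your release.

```python
def build_format(lookup, start="partno", label="desc"):
    """Like a CNTLIN= control data set: one START -> LABEL entry per row."""
    return {row[start]: row[label] for row in lookup}


def put(value, fmt):
    """Apply the 'format'; unmatched values are returned unchanged (assumption)."""
    return fmt.get(value, value)


def report_with_format(primary, lookup):
    parts = build_format(lookup)            # analogue of PROC FORMAT CNTLIN=FORMATS
    # analogue of DESC=PUT(PARTNO,$PARTS.) applied to every PRIMARY row
    return [{**row, "desc": put(row["partno"], parts)} for row in primary]
```

LOOKUP is read once to build the mapping, after which each PRIMARY row costs a single lookup, which is the intuition behind using a format when no index is available.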
Related Technique If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. Using the PRIMARY table,* PROC
SQL creates a new table, REPORT2, that has a new column, DESC. The PUT
function assigns values to DESC by using the $PARTS. format, created earlier,
with the PARTNO column. The $PARTS. format contains a description for
each part represented in the PARTNO column:
   proc sql;
      create table report2 as
      select *, put(partno,$parts.) as desc
      from primary;
   quit;
Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.
* For an explanation of the behavior of SET with KEY= when duplicates exist, see SAS
Technical Report P-242, SAS Software: Changes and Enhancements, Release 6.08, page 14.
Resulting Data Sets

Output 3.10a REPORT1 Data Set
REPORT1 was created with the DATA step.

                             REPORT1

   OBS  STORNAME       CITY           ITEM     AMOUNT
     1  Lynn's Finest  St Thomas      DEBIT      $350
     2  Lynn's Finest  St Thomas      DEBIT      $550
     3  Lynn's Finest  San Diego      DEBIT      $550
     4  Lynn's Finest  San Diego      DEBIT      $250
     5  Lynn's Finest  St Thomas      CREDIT     $450
     6  Lynn's Finest  St Thomas      CREDIT     $300
     7  Just 4 You     San Francisco  DEBIT       $20
     8  Just 4 You     San Francisco  DEBIT       $10
     9  Just 4 You     New York       CREDIT     $775
    10  Just 4 You     New York       CREDIT     $995
    11  Just 4 You     Boston         CREDIT   $1,000
    12  Just 4 You     Boston         CREDIT   $1,500
Output 3.10b REPORT2 Data Set
REPORT2 was created with PROC SQL.

                             REPORT2

   OBS  STORNAME       CITY           ITEM     AMOUNT
     1  Lynn's Finest  St Thomas      DEBIT      $350
     2  Lynn's Finest  St Thomas      DEBIT      $550
     3  Lynn's Finest  San Diego      DEBIT      $550
     4  Lynn's Finest  San Diego      DEBIT      $250
     5  Lynn's Finest  St Thomas      CREDIT     $450
     6  Lynn's Finest  St Thomas      CREDIT     $300
     7  Just 4 You     San Francisco  DEBIT       $20
     8  Just 4 You     San Francisco  DEBIT       $10
     9  Just 4 You     New York       CREDIT     $775
    10  Just 4 You     New York       CREDIT     $995
    11  Just 4 You     Boston         CREDIT   $1,000
    12  Just 4 You     Boston         CREDIT   $1,500
Program The objective is to create a new data set that includes information from
PRIMARY and the corresponding descriptive information from LOOKUP.
The resulting data set, REPORT1, contains the store name, city, item, and
amount. Read an observation from PRIMARY using sequential access. Using
the composite index STORLOC, read an observation from LOOKUP directly
based on the current values of variables STORE and LOC. Because
PRIMARY contains duplicate values, you must begin each search on the
STORLOC index for the LOOKUP data set at the beginning. Otherwise, you
would miss matches in LOOKUP for consecutive duplicate values of
STORLOC in PRIMARY.
Create REPORT1. Read an observation from PRIMARY.

   data report1(drop=store loc);
      set primary;

Read from LOOKUP with direct access, based on values in the composite index
STORLOC. UNIQUE causes the search to always begin at the beginning of the
index, so that consecutive duplicate values in PRIMARY will not miss a match in
LOOKUP.

      set lookup key=storloc/unique;
When the current values of STORE and LOC from PRIMARY match a STORLOC
index value from LOOKUP, write an observation to REPORT1. When the value
of _IORC_ corresponds to _SOK, there is a match.

      select (_iorc_);
         when (%sysrc(_sok)) output;
When the current values of STORE and LOC from PRIMARY do not match a
STORLOC index value from LOOKUP, write a warning message to the log. When
the value of _IORC_ corresponds to _DSENOM, there is no match. The PUT
statement writes a message to the log. Setting _ERROR_ to 0 prevents the error
condition from writing the entire contents of the program data vector to the log.

         when (%sysrc(_dsenom))
            do;
               put 'WARNING: New Location not in Table' store= loc=;
               _error_=0;
            end;
In case of an unexpected _IORC_ condition, write an error message to the SAS
log and stop execution. When _IORC_ corresponds to anything other than
_DSENOM or _SOK, an unexpected condition has been encountered, so an error
message is written to the SAS log and the STOP statement terminates the DATA
step. _ERROR_ is reset to 0 to prevent an error condition that would write the
contents of the program data vector to the log.

         otherwise
            do;
               put 'Unexpected ERROR: _IORC_ = ' _iorc_;
               _error_=0;
               stop;
            end;
      end;
   run;
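The following Python sketch (not SAS) mirrors the match and no-match branches of the SELECT on _IORC_, using a dictionary keyed on the (STORE, LOC) pair as a stand-in for the composite index STORLOC. The function name and the store codes in the usage are hypothetical, because the PRIMARY and LOOKUP input data are not shown in this section. The sketch assumes each (STORE, LOC) pair appears at most once in LOOKUP; a dictionary lookup always starts fresh, which is the behavior UNIQUE enforces for consecutive duplicates in PRIMARY. The unexpected-condition branch has no analogue here.

```python
def lookup_composite(primary, lookup):
    """Hypothetical sketch of Example 3.10: lookup on a composite key."""
    storloc = {(row["store"], row["loc"]): row for row in lookup}  # "composite index"
    report, log = [], []
    for pobs in primary:
        match = storloc.get((pobs["store"], pobs["loc"]))
        if match is not None:               # like _IORC_ = %SYSRC(_SOK)
            combined = {**pobs, **match}
            # like DROP=STORE LOC on the output data set
            report.append({k: v for k, v in combined.items() if k not in ("store", "loc")})
        else:                               # like _IORC_ = %SYSRC(_DSENOM)
            log.append(f"WARNING: New Location not in Table store={pobs['store']} loc={pobs['loc']}")
    return report, log
```

Both Lynn's Finest rows for the same store and location find their match, which is the duplicate-key case the UNIQUE option addresses in the DATA step.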
Related Technique If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. PROC SQL joins the tables* to
produce a new table, REPORT2. The REPORT2 table is the same as
REPORT1.

Conceptually, the join results in an internal table that matches every row in
PRIMARY with every row in LOOKUP. However, you want only the rows
where the values for STORE and LOC are the same. The WHERE clause
returns the rows from the join that have the same values for STORE and LOC.
Thus, the result is a table that includes information from both tables, based on
the columns they have in common:
   proc sql;
      create table report2 as
      select storname, city, item, amount
      from primary p, lookup l
      where p.store=l.store and p.loc=l.loc;
   quit;
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.
Example 3.11 Performing a Table Lookup with a Large
Lookup Data Set That Is Indexed

Goal Efficiently combine two data sets when the lookup data set is large and has an
index.
Strategy Use a table lookup technique that is especially appropriate for a large lookup
data set. Perform a table lookup using an index to locate observations that have
key values equal to the current value of the key variable. Read from the
primary file sequentially. Then, to read the lookup data set, use the SET
statement with the KEY= option to access the observations directly.
Observations are written to the output data set only when a match occurs in the
lookup data set for the key value supplied by the primary data set. Use
error-checking logic to direct execution to the appropriate code path.
You can perform the same task with PROC SQL; see "Related Technique."

Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with your
data.
REPORT1 was created with the DATA step.

   OBS  PARTNO  QUANTITY  DESC
     1   A063       8     pickup
     2   A220       4     neck
     3   A498       4     body
     4   A810       4     pick guard

REPORT2 was created with PROC SQL.

   OBS  PARTNO  QUANTITY  DESC
     1   A063       8     pickup
     2   A220       4     neck
     3   A498       4     body
     4   A810       4     pick guard
Program The objective is to create a new data set that includes all the information
from PRIMARY and only the corresponding descriptive information from
LOOKUP. The resulting data set, REPORT1, contains the part number,
quantity, and description.
First, read an observation from PRIMARY. Then, use the SET statement with
the KEY= option to read an observation from LOOKUP based on the current
value of PARTNO. To verify whether a matching value in LOOKUP has been
located for the current value of PARTNO in PRIMARY, use the SYSRC
autocall macro and the _IORC_ automatic variable. When a match is found,
write the observation. When no match is found, write a warning message to the
SAS log, reset _ERROR_ to 0, and continue processing. When an unexpected
condition is encountered, write an error message and stop execution:
Create REPORT1. Read an observation from PRIMARY.

   data report1;
      set primary;
When no match is found, write a warning message to the SAS log. When the
value of _IORC_ corresponds to _DSENOM, no observations in LOOKUP contain the

         when (%sysrc(_dsenom))
            do;
               put 'WARNING: Part number ' partno 'is not in lookup table.';
               _error_=0;
Related Technique If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. PROC SQL joins the tables* to
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.
The many-to-many category implies that multiple observations from each
input data set may be related based on the values of a common variable.
4.1 Adding Variables from a Transaction Data Set to a Master Data Set
4.2 Updating a Master Data Set with Only Nonmissing Values from a
    Transaction Data Set
4.3 Generating Every Combination of Observations (Cartesian Product)
    between Data Sets
4.4 Generating Every Combination of Observations between Data Sets
    Based on a Common Variable
4.5 Delaying Final Disposition of Observations Until All Processing
    Is Complete
4.6 Generating Every Combination between Data Sets, Based on
    a Common Variable When an Index Is Available
4.7 Combining Multiple Data Sets without a Variable Common to All
    the Data Sets
Example 4.1 Adding Variables from a Transaction Data
Set to a Master Data Set
Goal Based on the values of a common variable, produce a new data set by
combining variables from a master data set and a transaction data set. Include
only observations that the master data set contains.
Strategy Use the MERGE statement with the BY statement to combine the observations
from the two data sets. Use the IN= data set option to indicate whether the
master data set contributed to an observation. To get the desired results, set the
value of the temporary variable created with IN= to 0 at the top of the DATA
step so that its value is not carried over from the previous iteration within the
same BY group. While merging observations within each BY group, use the
subsetting IF statement to allow the DATA step to complete the current iteration
and to write an observation only when the master data set has contributed to it.
This match-merge operation requires that each data set either have an index on
the BY variable or be sorted by the values of the BY variable.
The data sets may contain duplicate values for the BY variable NAME.

        MASTER                    TRANS

   OBS  NAME   Y           OBS  NAME   Z
    1   John  1111          1   John  89
    2   John  2222          2   John  94
    3   John  3333          3   John  83
    4   Mary  1111          4   Mary  77
                            5   Mary  88
                            6   Mary  99
Resulting Data Set

Output 4.1a COMBINED Data Set

         COMBINED

   OBS  NAME   Y     Z
    1   John  1111  89
    2   John  2222  94
    3   John  3333  83
    4   Mary  1111  77
Program The objective is to combine observations from MASTER and TRANS based
on the values of a common variable, including only those observations to
which MASTER contributes. Use the MERGE and BY statements to combine
observations from the two data sets. Use the subsetting IF statement and the
IN= data set option to determine when MASTER has contributed variables.
Reset the IN= variable to 0 at the top of the DATA step. Otherwise this value,
which is retained until the BY group changes, may cause the DATA step to
write additional observations to COMBINED from TRANS.
Create COMBINED. Combine observations from MASTER and TRANS based on the
matching values for the BY variable NAME. IN= creates INMAST, which is set to 1
when an observation from MASTER contributes to the current observation. INMAST
is set to 0 at the top of the DATA step so that a previous value

   data combined;
      inmast=0;
      merge master(in=inmast) trans;
      by name;
Related Technique The preceding program writes an observation to COMBINED only if the
MASTER data set contributed. In the input data sets, TRANS contains three
observations with the value Mary for NAME, but MASTER contains only
one. If in your application you want the resulting output data set to contain
multiple observations when the transaction data set does but the master data set
does not, then simply remove the assignment statement that sets the value of
the IN= variable to 0. In this example, if you do not reset INMAST to 0, its
value is retained throughout the BY group. Therefore, three observations
containing the value Mary for NAME are created and written to the output
data set COMBINE2. See Output 4.1b.
   data combine2;
      merge master(in=inmast) trans;
      by name;
      if inmast;
   run;
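To see why resetting INMAST matters, the following Python sketch (not SAS) imitates the match-merge described above: within a BY group, rows are paired in order, values from a data set that has run out of rows are carried forward, and the IN= flag is either reset on every iteration or left to persist. It simplifies SAS MERGE semantics and assumes both inputs are already sorted by the BY variable; `merge_with_in` is a hypothetical name.

```python
from itertools import groupby


def merge_with_in(master, trans, key, reset_in=True):
    """Hypothetical sketch of MERGE ... BY with IN= on MASTER and a subsetting IF.

    Assumes both inputs are already sorted by `key`.
    """
    def by_groups(rows):
        return {k: list(g) for k, g in groupby(rows, key=lambda r: r[key])}

    m_groups, t_groups = by_groups(master), by_groups(trans)
    out = []
    for k in sorted(set(m_groups) | set(t_groups)):
        m_rows, t_rows = m_groups.get(k, []), t_groups.get(k, [])
        pdv = {}                            # values are carried forward within a BY group
        inmast = False
        for i in range(max(len(m_rows), len(t_rows))):
            if reset_in:
                inmast = False              # like INMAST=0 at the top of the step
            if i < len(m_rows):
                pdv.update(m_rows[i])
                inmast = True               # MASTER contributed this iteration
            if i < len(t_rows):
                pdv.update(t_rows[i])
            if inmast:                      # like IF INMAST;
                out.append(dict(pdv))
    return out
```

Run against the MASTER and TRANS values shown earlier, `reset_in=True` yields the four rows of COMBINED, while `reset_in=False` also keeps the second and third Mary rows, each still paired with Y=1111, as described for COMBINE2.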
Strategy Use the MERGE statement with the BY statement to update values in a master
data set with values from a transaction data set. Use IF-THEN logic in
conjunction with the RENAME= data set option to apply transaction values
only if they are not missing values.

This match-merge operation requires that each data set either have an index on
the BY variable or be sorted by the values of the BY variable.
          COMBINE

   OBS  ITEM    PRICE
    1   apple   $1.99
    2   apple   $2.89
    3   apple   $1.49
    4   banana  $1.05
    5   grapes  $2.75
    6   grapes  $2.75
    7   orange  (illegible)
    8   orange  $1.89
    9   orange  $2.39
Program The objective is to update the observations in MASTER whose ITEM values
have a match in TRANS, except when the value of PRICE, the variable
being updated, has a missing value in TRANS. The variable PRICE in TRANS
is renamed NEWPRICE so that, in the program data vector, its value does not
automatically overlay the value of PRICE read from MASTER. When the
value of NEWPRICE is not missing in TRANS, use IF-THEN processing to
assign its value to the PRICE variable in MASTER. Otherwise, use the
existing value of PRICE in MASTER:
Create COMBINE. Combine observations from MASTER and TRANS based on the
matching values for ITEM. RENAME= renames the variable PRICE in TRANS for
later processing with the IF-THEN statement.

   data combine(drop=newprice);
      merge master trans(rename=(price=newprice));
      by item;
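Renaming PRICE to NEWPRICE makes both values visible at once, so a conditional assignment can choose between them. The following Python sketch (not SAS) isolates that rule; it uses None as a stand-in for a SAS missing value, and `update_nonmissing` is a hypothetical name. It is a simplification rather than a full MERGE: it assumes at most one TRANS row per ITEM and returns only ITEM values present in MASTER.

```python
MISSING = None  # stand-in for a SAS missing value (assumption)


def update_nonmissing(master, trans, key="item"):
    """Hypothetical sketch: take TRANS prices only when they are not missing."""
    newprice = {row[key]: row["price"] for row in trans}   # like RENAME=(PRICE=NEWPRICE)
    out = []
    for row in master:
        candidate = newprice.get(row[key], MISSING)
        # like IF NEWPRICE is not missing THEN PRICE=NEWPRICE
        price = candidate if candidate is not MISSING else row["price"]
        out.append({**row, "price": price})                # NEWPRICE itself is dropped
    return out
```

A missing transaction value and an absent transaction row behave the same way here: the MASTER price survives.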
Goal Combine two tables* that have no common columns in order to produce every
possible combination of rows.

Strategy Join the two tables with PROC SQL. When you join two tables without
specifying join criteria in a WHERE clause, you get a Cartesian product. A
Cartesian product shows every possible combination of rows from the tables
being joined. PROC SQL joins the tables listed in the FROM clause.
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.
Resulting Data Set

Output 4.3 FLIGHTS Table

                           FLIGHTS

   OBS  DEST           TRAVCODE  NAME            LVL
     1  DETROIT        C751      Kreuger, John    1
     2  DETROIT        C751      Angler, Erica    2
     3  DETROIT        C751      Ng, Sebastian    1
     4  DETROIT        C751      Sook, Joy        3
     5  DETROIT        C751      Silverton, Lou   2
     6  SAN FRANCISCO  C288      Kreuger, John    1
     7  SAN FRANCISCO  C288      Angler, Erica    2
     8  SAN FRANCISCO  C288      Ng, Sebastian    1
     9  SAN FRANCISCO  C288      Sook, Joy        3
    10  SAN FRANCISCO  C288      Silverton, Lou   2
    11  ST THOMAS      A054      Kreuger, John    1
    12  ST THOMAS      A054      Angler, Erica    2
    13  ST THOMAS      A054      Ng, Sebastian    1
    14  ST THOMAS      A054      Sook, Joy        3
    15  ST THOMAS      A054      Silverton, Lou   2
    16  HAWAII         P003      Kreuger, John    1
    17  HAWAII         P003      Angler, Erica    2
    18  HAWAII         P003      Ng, Sebastian    1
    19  HAWAII         P003      Sook, Joy        3
    20  HAWAII         P003      Silverton, Lou   2
    21  BERMUDA        A059      Kreuger, John    1
    22  BERMUDA        A059      Angler, Erica    2
    23  BERMUDA        A059      Ng, Sebastian    1
    24  BERMUDA        A059      Sook, Joy        3
    25  BERMUDA        A059      Silverton, Lou   2
Program Because each flight attendant in ATTENDS flies to each destination, the
objective is to produce a table that shows every possible combination of
NAME and DEST:
Invoke PROC SQL and create a table. The CREATE TABLE statement creates the
table FLIGHTS to store the results of the subsequent query.

   proc sql;
      create table flights as
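A Cartesian product of m rows with n rows always has m x n rows. The following Python sketch (not SAS) reproduces the shape of FLIGHTS: 5 destinations times 5 attendants gives 25 rows. The input lists are reconstructed from the values visible in Output 4.3, since the input tables are not shown here, and the attendants' level column is omitted; `cartesian` is a hypothetical name.

```python
from itertools import product


def cartesian(left, right):
    """Every combination of a row from `left` with a row from `right`."""
    return [{**a, **b} for a, b in product(left, right)]


destinations = [
    {"dest": "DETROIT", "travcode": "C751"},
    {"dest": "SAN FRANCISCO", "travcode": "C288"},
    {"dest": "ST THOMAS", "travcode": "A054"},
    {"dest": "HAWAII", "travcode": "P003"},
    {"dest": "BERMUDA", "travcode": "A059"},
]
attends = [{"name": n} for n in
           ["Kreuger, John", "Angler, Erica", "Ng, Sebastian", "Sook, Joy", "Silverton, Lou"]]
flights = cartesian(destinations, attends)
```

Because `product` varies the right-hand rows fastest, the rows come out grouped by destination, in the same order as Output 4.3.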
Example 4.4 Generating Every Combination of
Observations between Data Sets Based on a
Common Variable
Goal Combine two tables* that have a common column. The common column has
duplicate values in both tables. Produce a table that shows the possible
combinations of rows where the values from the common column match.
Strategy Join the two tables with PROC SQL. The join produces all possible
combinations of rows from both tables. Use a WHERE clause to choose only
those rows where the values from the common column match. Order the query
result to make the data easier to process in subsequent steps. You do not have
to sort the data prior to joining the tables.
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.
   OBS  STUDENT  GRADE  HOMEROOM  LOCATION
     4  Cindy      10       4     room2
     5  Cindy      10       3     room1
     6  Cindy      10       2     cafe
     7  Denise     10       4     room2
     8  Denise     10       2     cafe
     9  Denise     10       3     room1
    10  Ginny      11       5     room3
    11  Ginny      11       6     room4
    12  Ginny      11       7     shop
    13  Jake       12       8     library
    14  Jon        11       6     room4
    15  Jon        11       5     room3
    16  Jon        11       7     shop
    17  Lynn       12       8     library
    18  Michael    11       5     room3
    19  Michael    11       6     room4
    20  Michael    11       7     shop
    21  Rick        9       1     gym
    22  Susan      12       8     library
Program The objective is to produce a table that shows all of the possible homeroom
locations for each student, based on grade.

Join the two tables to find all of the possible combinations of STUDENT and
LOCATION. Use the GRADE column to join the tables. Choose only those
rows where the values for GRADE match. Order the data by STUDENT:
Invoke PROC SQL and create a table. The CREATE TABLE statement creates the
table ASSIGN to store the results of the subsequent query.

   proc sql;
      create table assign as
Select the columns. The SELECT clause selects the specified columns from the
tables specified in the FROM clause. Because GRADE is in both tables, you need
to qualify the name by prefixing the table name to the column name.

      select student, roster.grade, homeroom, location
Name the tables to join and query.

      from roster, schedule
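The following Python sketch (not SAS) expresses the same query as a filtered Cartesian product followed by a sort. The sample rows reuse names, grades, homerooms, and locations that appear in the ASSIGN output, but the input tables ROSTER and SCHEDULE are not shown in this section, so their layout here is an assumption; `equijoin` is a hypothetical name.

```python
def equijoin(roster, schedule, key="grade", order_by="student"):
    """Hypothetical sketch of FROM roster, schedule WHERE keys match, ORDER BY student."""
    rows = [
        {**s, **r}                          # one output row per matching pair
        for r in roster
        for s in schedule
        if r[key] == s[key]                 # the WHERE condition
    ]
    return sorted(rows, key=lambda row: row[order_by])   # ORDER BY
```

Python's sort is stable, so rows for the same student keep their join order; in SQL, ORDER BY STUDENT alone does not guarantee any particular order within a student.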
Example 4.5 Delaying Final Disposition of Observations
Until All Processing Is Complete
Goal Search through a data set multiple times to find the closest match based on
calculated criteria, not on matching values of common variables. Flag
observations for subsequent processing based on those criteria.
Strategy Flag observations in a data set for further processing by reading one data set
sequentially and another data set directly using the POINT= option. Set up an
array with one element for each observation in the second data set, the one you
read directly.

Read an observation from the first data set, then begin reading observations
from the second data set, looking for values of one or more variables that meet
a certain condition set by a value from the observation in the first data set. Use
the iterative DO loop and the POINT= option to read all observations not
marked as already used from the second data set. Continue reading
observations and comparing values to see if a better match occurs in the second
data set.

After the entire second data set has been processed to locate the best match for
the current observation in the first data set, write an observation that contains
the best match to the output data set. Mark the selected observation from the
   OBS  ROOM  DEMOFAC  CAPACITY
     1  R100     N         10
     2  R200     Y         15
     3  R301     Y         30
     4  R305     N         50
     5  R400     Y         60
     6  R420     Y        100

ROOMS contains only the rooms that remain unassigned.

              ROOMS

   OBS  ROOM  DEMOFAC  CAPACITY
     1  R100     N         10
     2  R301     Y         30
Program The objective is to find the most suitable room for a meeting based on the
number of attendees and the need for demo facilities. After the first suitable
match is found in ROOMS for an observation in MEETINGS, the rest of the
observations in ROOMS are searched in case there is an even more appropriate
match. "More appropriate" means that the room is closer in size to the number
of attendees or that demo facilities are not scheduled unless they are needed. In
this application, keeping demo facilities available was the highest priority.

First, determine the number of observations in ROOMS by using the NOBS=
option and write that number to a macro variable using CALL SYMPUT. In
the second DATA step, use sequential access to read an observation from
MEETINGS. Use the value of the macro variable to create an array with one
element for each observation in ROOMS. Create an iterative DO loop that
iterates once for each observation in ROOMS. When a room has been
scheduled for a meeting, the value of the appropriate element in the array
indicates that the room is currently scheduled. If a room is not already tagged
as scheduled, read an observation from ROOMS directly using the POINT=
option. Determine if the room is large enough; if it has demo facilities,
determine if the meeting requires them. Continue reading other observations
from ROOMS, testing to see if a more appropriate room is available. Set up
temporary variables to hold values for seating capacity and availability of
demo facilities so that you can compare those values to ones read from the next
observation as you search for an even more suitable room.

At the end of each DATA step iteration, write an observation to ASSIGN. If a
match was found, set the USED array element for the appropriate observation
from ROOMS to 1. If no match was found, write the observation and indicate
that no room was assigned.
Determine if the current room is a better fit than the previous choice. If the
CAPACITY value of the current room is smaller than that of the previous choice
(TEMPCAP) and if the status of demo facility is the same or not needed, then the
current room is a better choice. TEMPOBS is set to the value of I, the number of
the current observation. It will be used later to set the appropriate member of
the USED array to indicate that the room has been selected. The final END
statement ends a DO group.

         if (capacity < tempcap) or
            (demo='N' and tempdemo='Y') then
            do;
               tempobs=i;
            end;
      end;
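The essential idea is that no room is marked as used until all of ROOMS has been scanned for the current meeting. The following Python sketch (not SAS) shows that delayed-disposition pattern. The preference rule (avoid tying up demo facilities, then choose the smallest adequate room) is an assumption based on the Program description, not a transcription of the book's comparison; the function name and the meetings used to exercise it are hypothetical, because MEETINGS is not shown in this section.

```python
def assign_rooms(meetings, rooms):
    """Hypothetical sketch of best-fit assignment with a USED flag per room."""
    used = [False] * len(rooms)              # like the USED array
    assignments = []

    def fit(room):
        # Assumed preference: avoid demo rooms, then prefer the smaller room.
        return (room["demo"] == "Y", room["capacity"])

    for m in meetings:                       # MEETINGS read sequentially
        tempobs = None                       # best candidate found so far
        for i, room in enumerate(rooms):     # like DO I=1 TO NOBS with POINT=I
            if used[i] or room["capacity"] < m["attendees"]:
                continue
            if m["needs_demo"] and room["demo"] != "Y":
                continue
            if tempobs is None or fit(room) < fit(rooms[tempobs]):
                tempobs = i                  # remember the better fit, keep searching
        if tempobs is not None:
            used[tempobs] = True             # delayed disposition: mark only at the end
            assignments.append((m["name"], rooms[tempobs]["room"]))
        else:
            assignments.append((m["name"], None))   # no room assigned
    return assignments, [r for r, u in zip(rooms, used) if not u]
```

With the six rooms listed above, a 40-person meeting that does not need demo facilities gets R305 rather than the larger demo rooms R400 or R420, which stay available for later meetings.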
Example 4.6 Generating Every Combination between Data
Sets, Based on a Common Variable When an
Index Is Available
Goal Create a new data set that is a Cartesian product* of two input data sets.
Strategy The overall strategy is to process the first data set sequentially using BY-group processing and to process the second data set directly based on the value of a key variable. (The variable common to both data sets is the BY variable for the first data set and the key variable for the second data set.) Each time you find a match, write an observation to the output data set. If there are consecutive duplicate values for the common variable in the first data set, you must force the pointer to return to the beginning of the index so that matching values in the second data set will be retrieved and paired with the appropriate observations in the first data set.
In detail, sort the first data set on the BY variable and index the second data set on the same variable. Read observations from the first data set sequentially, executing the SET statement in each iteration of the DATA step. Read an observation from the second data set using SET with the KEY= option in a DO UNTIL loop. Continue reading observations until there is no match for the common variable.

Use the SELECT group to conditionally execute statements based on whether a match is found. If a match is found, write an observation to the output data set. If a match is not found, take different actions, based on whether the current observation from the first data set is the last one in the current BY group. When it is the last in the BY group, take no additional action. The DO UNTIL loop will end.
When the current observation from the first data set is not the last in the current BY group, you must force the index to be repositioned to the beginning. Otherwise, consecutive duplicate values for the common variable in the first data set cannot be paired with matching values in the second data set.

You can perform the same task with PROC SQL. See "Related Technique."
Note: Due to the variability of data and the number of conditions that determine the path chosen by the PROC SQL optimizer, it is not always possible to determine the most efficient method without first testing with your data.

* In this example, a Cartesian product is a new data set that consists of every possible combination of observations from the two input data sets, based on the value of a BY variable.
Combining Multiple Observations with Multiple Observations □ Example 4.6 87
Input Data Sets

The SALES data set is sorted by PRODUCT.

SALES
OBS  PRODUCT  SALESREP  ORDERNUM

SHIPLIST was created with the DATA step.

OBS  PRODUCT  SALESREP  ORDERNUM  PRDTDESC              PCDESC
  1    310    Polanski  RAL5447   oak pedestal table    tabletop
  2    310    Polanski  RAL5447   oak pedestal table    pedestal
  3    310    Polanski  RAL5447   oak pedestal table    2 leaves
  4    310    Alvarez   CH1443    oak pedestal table    tabletop
  5    310    Alvarez   CH1443    oak pedestal table    pedestal
  6    310    Alvarez   CH1443    oak pedestal table    2 leaves
  7    312    Corrigan  DUR5523   brass floor lamp      lamp base
  8    312    Corrigan  DUR5523   brass floor lamp      lamp shade
  9    313    Corrigan  DUR5524   oak bookcase, short   bookcase
 10    313    Corrigan  DUR5524   oak bookcase, short   2 shelves
 11    313    Polanski  RAL5498   oak bookcase, short   bookcase
 12    313    Polanski  RAL5498   oak bookcase, short   2 shelves

SHIPLST was created with PROC SQL.

OBS  PRODUCT  SALESREP  ORDERNUM  PRDTDESC              PCDESC
  1    310    Polanski  RAL5447   oak pedestal table    tabletop
  2    310    Alvarez   CH1443    oak pedestal table    tabletop
  3    310    Polanski  RAL5447   oak pedestal table    pedestal
  4    310    Alvarez   CH1443    oak pedestal table    pedestal
  5    310    Polanski  RAL5447   oak pedestal table    2 leaves
  6    310    Alvarez   CH1443    oak pedestal table    2 leaves
  7    312    Corrigan  DUR5523   brass floor lamp      lamp base
  8    312    Corrigan  DUR5523   brass floor lamp      lamp shade
  9    313    Corrigan  DUR5524   oak bookcase, short   bookcase
 10    313    Polanski  RAL5498   oak bookcase, short   bookcase
 11    313    Corrigan  DUR5524   oak bookcase, short   2 shelves
 12    313    Polanski  RAL5498   oak bookcase, short   2 shelves
Program The objective is to create a shipping list data set from one data set that shows each item sold and from another data set that shows how many pieces need to be packed for shipping each item. For example, an observation in SALES shows that an oak pedestal table, item 310, was sold, and STOCK shows that item 310 consists of three pieces: a top, a base, and two leaves. The resulting data set SHIPLIST, therefore, will contain three observations for the first sold item recorded in SALES.
Then read observations from STOCK directly. Use the SET statement and specify PRODUCT as the key variable with the KEY= option. Place this statement in a DO UNTIL loop that executes until there are no matches in STOCK for the current value of PRODUCT in SALES. Each time a match is found, write an observation to SHIPLIST. When no match occurs, take one of two actions based on whether you've finished processing the current BY group in SALES.
If the current observation is the last observation in SALES for the current BY group, the DO UNTIL loop condition is met, the loop ends, and processing returns to the top of the DATA step to read the first observation from the next BY group in SALES.

If the current observation is not the last observation in SALES for the current BY group, you must force the pointer to return to the beginning of the index so that observations with matching PRODUCT values in STOCK will be found and matched with observations from SALES. See "A Closer Look" for more detail.
Create SHIPLIST. Read an observation from SALES. Specify PRODUCT as the BY variable. Set DUMMY to 0 at the top of each DATA step iteration. (The next DO loop uses the value of DUMMY.)

   data shiplist(drop=dummy);
      set sales;
      by product;
      dummy=0;
The SELECT group in the DO UNTIL loop begins this process. When there are no more matches in the index on STOCK for the current value of PRODUCT in SALES, determine if there are more observations in the current BY group in SALES. If there are more observations to process in the same BY group in SALES and DUMMY has not already been set to 1, assign values to _IORC_ and the variable DUMMY:
         when (%sysrc(_dsenom))
            do;
               _error_=0;
               if not last.product and not dummy then
                  do;
                     dummy=1;
                     _iorc_=0;
                  end;
            end;
By changing the value of _IORC_ to 0, you cause the DO UNTIL loop to iterate again:

      do until(_iorc_=%sysrc(_dsenom));
         if dummy then product=99999;
         set stock key=product;
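The fragments above belong to one DATA step. The sketch below assembles them into a single runnable program, but it is a reconstruction under assumptions rather than the book's listing: the SELECT group's match branch and its OTHERWISE branch, the sample data (rebuilt from the SHIPLIST output), and the sentinel value 99999 (assumed never to occur in STOCK) are all assumptions. By default, a KEY= search starts again at the top of the index only when the key value changes, so one lookup of the sentinel is what lets the next SALES observation with the same PRODUCT see every matching STOCK observation again.

```sas
/* Sample data (assumed): rebuilt from the SHIPLIST output above */
data sales;
   input product salesrep :$10. ordernum :$8.;
   datalines;
310 Polanski RAL5447
310 Alvarez CH1443
312 Corrigan DUR5523
313 Corrigan DUR5524
313 Polanski RAL5498
;

data stock;
   infile datalines dlm='|';   /* '|' so descriptions may contain blanks and commas */
   input product prdtdesc :$20. pcdesc :$12.;
   datalines;
310|oak pedestal table|tabletop
310|oak pedestal table|pedestal
310|oak pedestal table|2 leaves
312|brass floor lamp|lamp base
312|brass floor lamp|lamp shade
313|oak bookcase, short|bookcase
313|oak bookcase, short|2 shelves
;

/* Sort the BY data set and build a simple index on the KEY= variable */
proc sort data=sales;
   by product;
run;

proc datasets library=work nolist;
   modify stock;
   index create product;
run;
quit;

data shiplist(drop=dummy);
   set sales;
   by product;
   dummy=0;
   do until(_iorc_=%sysrc(_dsenom));
      /* Looking up a key that does not exist repositions the index */
      if dummy then product=99999;
      set stock key=product;
      select (_iorc_);
         when (%sysrc(_sok)) output;        /* match found: write it */
         when (%sysrc(_dsenom))
            do;                             /* no (more) matches     */
               _error_=0;
               if not last.product and not dummy then
                  do;
                     dummy=1;
                     _iorc_=0;              /* one more pass: sentinel lookup */
                  end;
            end;
         otherwise
            do;
               put 'ERROR: unexpected ' _iorc_=;
               _error_=0;
               stop;
            end;
      end;
   end;
run;
```

With this data, SHIPLIST should contain the twelve observations shown in the DATA step output above, in the same order.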
Related Technique If you are familiar with Structured Query Language (SQL), you may want to use PROC SQL instead of the DATA step. PROC SQL joins the tables* to produce a new table, SHIPLST, which includes information from both input tables.
Conceptually, the join results in an internal table that matches every row in SALES with every row in STOCK. However, you want only the rows where the values for PRODUCT are the same in both tables. The WHERE clause returns the rows from the join that have the same values for PRODUCT:
   proc sql;
      create table shiplst as
         select *
            from sales as a, stock as b
            where a.product=b.product;
   quit;
Note: In PROC SQL, SELECT statements automatically produce a report. SELECT clauses, which follow CREATE TABLE or CREATE VIEW statements, do not automatically produce a report.

You do not have to sort the data prior to joining the tables.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows are observations.

Input Data Sets
DAILY

OBS  IDNUM  ITEMNO  QUANTITY
1 ~~-~ 101' 2
2 3~~ .103, 1
3 511 101' 1
4 ?1~ 103 1
5 5112 105 1
6 5132" '.to5'·, 1
7 3551" ...l~t 1
8 ~S5~ -@$i 2
9 ~78~ 104
•. i·
1
10 ·34~ :1ot 2
11 Sll 1.0·i: 1
12 ~1~ :103• 3
13 511~ 10S; 1
14 5112 ro1:; 3
15 5132° ~o~ 2
16 3551 @J; 1
17 ~551 @Ji. 2
18
19
355~
,3782
®:
104: 1
2
.
. i
20 3782 -105 ! 3 Ii
PRICES

OBS  ITEMNO  PRICE
 1    101    0.30
 2    102    0.65
 3    103    2.75
 4    104    1.25
 5    105    0.85
Resulting Data Set

Output 4.7 CHARGE Table

CHARGE

OBS  ID    NAME             LOCATION             TOTAL    TYPE
 1   341   Kteur,ar, John   Bldr, A, 111111111   $3.95    cash charge
 2   3551  Sook, Joy        Bldg I, Rm 2533      $11.40   cash charge
 3   3782  Comuzzi, James   Bldg 1, Rm 1101      $5.05    payroll deduction
 4   511   Olazweald, Joe   Bldg A, Rm 1234      $11.60   payroll deduction
 5   5112  Nuhn, Len        Bldg A, Rm 2123      $2.60    payroll deduction
 6   5132  Nguyen, Luan     Bldg B, Rm 5022      $2.55    payroll deduction
Program The objective is to join the EMPLOYEE, DAILY, and PRICES tables to learn the total charges for each employee. Use the common columns to join all three tables.* As a result of the join, all columns from all three tables are available to process. By joining DAILY and PRICES, you can multiply QUANTITY and PRICE to get a dollar amount for each purchase. By joining DAILY and EMPLOYEE, you can get the name of the employee who made each purchase.

Group the rows so that you can perform a summary calculation on each group and get the total charges for each employee. Grouping the data also eliminates duplicate rows.

Two employees, 381 and 5151, have no charges in the DAILY table. Therefore, there are no rows with these two IDNUMs that satisfy the WHERE conditions.
Invoke PROC SQL and create a table. The CREATE TABLE statement creates the table CHARGE to store the results of the subsequent query.

   proc sql;
      create table charge as
Begin to specify the columns to be in the query result. Because ID, NAME, and LOCATION occur only in the EMPLOYEE table, you do not have to prefix the table alias to their names.

         select id, name, location,
Create a new column with an arithmetic expression. The SUM function sums the values that result from multiplying QUANTITY and PRICE. The column TOTAL shows the total charges for each employee. Because the data are grouped, the value of TOTAL is computed separately for each group. (If the data were not grouped, the value of TOTAL would be the total for the entire table.)

                sum(quantity*price) as total format=dollar8.2,

* The columns that you join on do not have to have the same name.
Join the tables. ITEMNO is common to the DAILY and PRICES tables. IDNUM and ID are common columns in the DAILY and EMPLOYEE tables, respectively.

            where p.itemno=d.itemno and id=idnum

Group the data to get the total for each employee. The GROUP BY clause returns one row for each employee. In the GROUP BY clause, if you list each column specified in the SELECT clause, PROC SQL has to make only one pass of the data.

            group by id, name, location, type;
   quit;
Note: In PROC SQL, SELECT statements automatically produce a report. SELECT clauses, which follow CREATE TABLE or CREATE VIEW statements, do not automatically produce a report.
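Assembled into one step, the query might look like the sketch below. The FROM clause and the TYPE column at the end of the SELECT list are assumptions (TYPE appears in the GROUP BY clause and in the CHARGE output, but the listing above does not show where it is selected). The EMPLOYEE and DAILY rows are hypothetical, chosen only so the totals match the CHARGE rows for IDs 5112 and 5132; the PRICES values come from the table above.

```sas
/* Hypothetical sample data; employee 381 deliberately has no purchases */
data employee;
   infile datalines dlm='|';
   input id name :$20. location :$20. type :$20.;
   datalines;
5112|Nuhn, Len|Bldg A, Rm 2123|payroll deduction
5132|Nguyen, Luan|Bldg B, Rm 5022|payroll deduction
381|Doe, Pat|Bldg C, Rm 100|cash charge
;

data daily;
   input idnum itemno quantity;
   datalines;
5112 105 1
5132 105 1
5112 105 1
5112 101 3
5132 105 2
;

data prices;
   input itemno price;
   datalines;
101 0.30
102 0.65
103 2.75
104 1.25
105 0.85
;

proc sql;
   create table charge as
      select id, name, location,
             sum(quantity*price) as total format=dollar8.2,
             type
         from employee, daily as d, prices as p   /* FROM clause assumed */
         where p.itemno=d.itemno and id=idnum
         group by id, name, location, type;
quit;
```

Because the WHERE clause is an inner-join condition, employee 381 produces no row, which mirrors what the text says about IDs 381 and 5151.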
Example 4.8 Interleaving Nonsorted Data Sets

Goal Combine two tables* that contain columns with the same names. Create a new table from the result. Put the data in order according to the values of two of the columns.

Strategy PROC SQL provides set operators that enable you to work with the results of two independent queries. Use the OUTER UNION set operator to concatenate the two independent query results returned by the SELECT clauses. Use the CORR keyword to overlay like-named columns.
Input Data Sets

ONE   TWO

Output 4.8a SCHEDULE Data Set

SCHEDULE was created with PROC SQL.

SCHEDULE

OBS  DATE     DEPART  FLIGHT
  1  01JAN93   7:10    114
  2  01JAN93   8:21    176
  3  01JAN93  10:43    202
  4  01JAN93  12:16    439
  5  02JAN93   7:10    114
  6  02JAN93   9:10    176
  7  02JAN93  10:45    202
  8  03JAN93   8:21    176
  9  04JAN93   9:31    176
 10  05JAN93   8:13    176

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows are observations.
Output 4.8b SCHED Data Set

SCHED was created with the DATA step.

SCHED

OBS  DATE     DEPART  FLIGHT
  1  01JAN93   7:10    114
  2  01JAN93   8:21    176
  3  01JAN93  10:43    202
  4  01JAN93  12:16    439
  5  02JAN93   7:10    114
  6  02JAN93   9:10    176
  7  02JAN93  10:45    202
  8  03JAN93   8:21    176
  9  04JAN93   9:31    176
 10  05JAN93   8:13    176
Program The objective is to combine the tables so that all the flight information is in one table.

To make the table more useful, order the data by the date, and then by the departure time:
Invoke PROC SQL and create a table. The CREATE TABLE statement creates the table SCHEDULE to store the results of the subsequent queries and set operation.

   proc sql;
      create table schedule as

Select all columns from table ONE.

         select *
            from one

Concatenate the two query results. The OUTER UNION set operator concatenates the queries returned by the two SELECT clauses. CORR overlays columns that have the same name.

         outer union corr

The DATA step that created SCHED interleaves ONE and TWO by DATE and DEPART:

   data sched;
      set one two;
      by date depart;
   run;
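Complete versions of both approaches might look like the sketch below. The second SELECT, the ORDER BY clause, the PROC SORT steps, and the way the ten flights are split between ONE and TWO are assumptions for illustration. The DATA step version sorts both inputs first because a SET statement with BY requires each input to be in BY order; the PROC SQL version does not.

```sas
/* Hypothetical, unsorted inputs built from the flights in Output 4.8a */
data one;
   input date :date7. depart :time5. flight;
   format date date7. depart time5.;
   datalines;
02JAN93 10:45 202
01JAN93 7:10 114
04JAN93 9:31 176
01JAN93 12:16 439
02JAN93 7:10 114
;

data two;
   input date :date7. depart :time5. flight;
   format date date7. depart time5.;
   datalines;
05JAN93 8:13 176
01JAN93 8:21 176
03JAN93 8:21 176
01JAN93 10:43 202
02JAN93 9:10 176
;

/* PROC SQL: concatenate, overlay same-named columns, then order */
proc sql;
   create table schedule as
      select * from one
      outer union corr
      select * from two
      order by date, depart;
quit;

/* DATA step: sort each input, then interleave with BY */
proc sort data=one out=one_sorted;
   by date depart;
run;

proc sort data=two out=two_sorted;
   by date depart;
run;

data sched;
   set one_sorted two_sorted;
   by date depart;
run;
```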
Note: If you use PROC SQL, you do not have to sort or index the data. If the data sets are not sorted (regardless of being indexed), it is typically more efficient to use PROC SQL. If the data sets are sorted, it is typically more efficient to use the DATA step and the SET statement.
Example 4.9 Interleaving Data Sets Based on a Common Variable
Goal Interleave two data sets containing a common variable. Also, demonstrate that testing the value of an existing variable instead of a new variable can produce unexpected results when using BY-group processing.
Strategy Sort the input data by the BY variable. Specify both input data sets in a single SET statement. Use the IN= data set option with one of the data sets to create a variable that indicates when that data set has contributed to an observation.

Two examples of the same program illustrate how unexpected results can occur. The first program tests the value of and updates a variable read from the input data set. It produces unexpected results. The revised version produces accurate results by testing and resetting the value of a variable created during the DATA step.

CAUTION!
Variables read from input SAS data sets are retained across DATA step iterations. Testing or resetting those variables can produce unexpected results. ■
OBS COMMON
1 A
2 C
Desired Results

Output 4.9a COMBINED Data Set

COMBINED

TEST contains the value TRUE only in

OBS  COMMON  SWITCH  TEST
Original Program The objective is to interleave data sets ONE_A and TWO, based on the values of the BY variable COMMON. Read the input data sets with BY-group processing by using the SET and BY statements. Use the IN= data set option to create variable IN2, which will be set to 1 (true) for each observation that originates from data set TWO. With an IF statement, test the value of IN2 and the value of the variable SWITCH to determine whether to set the value of the existing variable TEST to 'TRUE':
Create COMBINED. Read an observation from data set ONE_A and data set TWO, using BY-group processing. Variable IN2 will be set to 1 for each observation to which TWO contributes. COMMON is the BY variable.

   data combined;
      set one_a two(in=in2);
      by common;

If data set TWO has contributed to an observation (IN2 is 1) and if the value of SWITCH is 'Y', then set the current value of TEST to 'TRUE'. The assignment statement assigns a value to the existing variable TEST.

      if in2 and switch='Y' then test='TRUE';
   run;
Unexpected Results
At first glance, Output 4.9b seems to show that the IF condition did not work correctly, since observations 4 and 7 contain the incorrect value of TRUE. Actually, these observations contain incorrect values for TEST because its value is retained in the program data vector throughout the life of the current BY group. It is replaced only when a new observation is read from data set ONE_A. Because TWO contains multiple observations with the same value of the BY variable while ONE_A contains unique values of the BY variable, the value of TEST is duplicated across all remaining observations in the current BY group.
Revised Program The revised program uses the same code, but different input data. It reads data set ONE_B, which does not contain TEST. This program tests and changes the value of TEST as a variable that is created during the DATA step, not read from an existing SAS data set. The assignment statement creates TEST in this example. Its value, therefore, is not retained throughout the current BY group, so testing and setting its value does not produce incorrect results when subsequent observations in the same BY group are read from data set TWO. See Output 4.9a.
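The revised approach can be tried with the sketch below. ONE_B and TWO here are hypothetical stand-ins for the book's input data sets, and their observations are invented. The point being illustrated is that TEST is created by an assignment statement rather than read by SET, so SAS resets it to missing at the start of every iteration and a 'TRUE' cannot carry over to later observations in the BY group.

```sas
/* Hypothetical inputs: ONE_B holds only the BY variable; TWO repeats BY values */
data one_b;
   input common $;
   datalines;
A
C
;

data two;
   input common $ switch $;
   datalines;
A Y
A N
C N
C Y
;

data combined;
   set one_b two(in=in2);
   by common;
   /* TEST is not read from an input data set, so it starts as missing */
   /* on every iteration and is set only when this condition is true. */
   if in2 and switch='Y' then test='TRUE';
run;
```

Within each BY group the ONE_B observation is read first, followed by the TWO observations, so TEST should be 'TRUE' only in observations 2 and 6.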
Example 4.10 Comparing All Observations with the Same BY Values
Goal Create a new data set by merging two data sets, each of which may contain multiple observations with the same BY values, and by comparing all observations with the same BY values.
Strategy Begin with two data sets that may contain multiple observations for each unique value of the common variable. In the DATA step, read these data sets to create new data sets that contain one observation per BY group, with variables whose values identify the observation number of the first and last observation for each BY group. Merge these two new data sets so that all the information about BY groups in both data sets is in one location. Include only observations to which the first data set contributed.

To create the final data set, read all three data sets: the two original ones and the merged one that identifies the first and last observation in each BY group. Because each of the original data sets may contain multiple observations with duplicate values of the BY variable, you must loop through the BY groups in each data set multiple times to compare each observation in a given BY group with each observation in the same BY group in the other data set. Therefore, use the POINT= option to directly access both data sets by observation number.
You can perform the same task with PROC SQL; see "Related Technique."

Note: Due to the variability of data and the number of conditions that determine the path chosen by the PROC SQL optimizer, it is not always possible to determine the most efficient method without first testing with your data.
Input Data Sets

Both BREAKDWN and MAINT contain multiple observations for certain values of the BY variable VEHICLE. Each is sorted by VEHICLE. Within VEHICLE, each is also sorted by date of breakdown (BRKDNDT) or maintenance (MNTDATE).

BREAKDWN                        MAINT

OBS  BRKDNDT  VEHICLE           OBS  MNTDATE  VEHICLE
 1   02MAR94  AAA                1   03JAN94  AAA
 2   20MAY94  AAA                2   05APR94  AAA
 3   19JUN94  AAA                3   10AUG94  AAA
 4   29NOV94  AAA                4   28JAN94  CCC
 5   04JUL94  BBB                5   16MAY94  CCC
 6   31MAY94  CCC                6   07OCT94  CCC
 7   24DEC94  CCC                7   24FEB94  DDD
                                 8   22JUN94  DDD
                                 9   19SEP94  DDD
BRKKEY                              MAINTKEY

OBS  VEHICLE  FIRST1  LAST1         OBS  VEHICLE  FIRST2  LAST2
 1   AAA        1       4            1   AAA        1       3
 2   BBB        5       5            2   CCC        4       6
 3   CCC        6       7            3   DDD        7       9

KEYS

OBS  VEHICLE  FIRST1  LAST1  FIRST2  LAST2
 1   AAA        1       4      1       3
 2   BBB        5       5      .       .
 3   CCC        6       7      4       6
Resulting Data Sets

Output 4.10a FINAL1 Data Set

FINAL2 was created with PROC SQL.

OBS  VEHICLE  BRKDNDT  LASTMNT
 1   AAA      02MAR94  03JAN94
 2   AAA      20MAY94  05APR94
 3   AAA      19JUN94  05APR94
 4   AAA      29NOV94  10AUG94
 5   BBB      04JUL94        .
 6   CCC      31MAY94  16MAY94
 7   CCC      24DEC94  07OCT94
Program The objective is to create a data set that shows the most recent maintenance date for each time a vehicle had a breakdown.

First, create data sets (BRKKEY and MAINTKEY) that contain one observation for each BY group and two additional variables that identify the observation numbers of the first and last observations for each BY group. Merge these two data sets into a single data set (KEYS) so that you will be able to compare all observations in each BY group between the two data sets. Then read this merged data set and use the FIRST1, LAST1, FIRST2, and LAST2 values to directly access all observations in each BY group in data sets BREAKDWN and MAINT. Then compare the values of MNTDATE and BRKDNDT so that you can determine the correct value for LASTMNT, the most recent maintenance date prior to each time the vehicle needed repairs.
Create BRKKEY. Read an observation from BREAKDWN, using VEHICLE as the BY variable. Create variables FIRST1 and LAST1 whose values represent the observation number of the first and last observation in each BY group. After reading the last observation in each BY

   data brkkey(keep=vehicle first1 last1);
      set breakdwn;
      by vehicle;
      retain first1;
      if first.vehicle then first1=_n_;
      if last.vehicle then
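Putting the strategy together, the full DATA step approach might look like the sketch below. BREAKDWN and MAINT use the values from the input tables above. The MAINTKEY and KEYS steps and the final step are written by analogy with the BRKKEY fragment and are assumptions, as are the IN= variable INBRK, the guard for vehicles with no maintenance records, and the use of MAX to keep the latest qualifying maintenance date.

```sas
data breakdwn;
   input brkdndt :date7. vehicle $;
   format brkdndt date7.;
   datalines;
02MAR94 AAA
20MAY94 AAA
19JUN94 AAA
29NOV94 AAA
04JUL94 BBB
31MAY94 CCC
24DEC94 CCC
;

data maint;
   input mntdate :date7. vehicle $;
   format mntdate date7.;
   datalines;
03JAN94 AAA
05APR94 AAA
10AUG94 AAA
28JAN94 CCC
16MAY94 CCC
07OCT94 CCC
24FEB94 DDD
22JUN94 DDD
19SEP94 DDD
;

/* One observation per BY group: first and last observation numbers */
data brkkey(keep=vehicle first1 last1);
   set breakdwn;
   by vehicle;
   retain first1;
   if first.vehicle then first1=_n_;
   if last.vehicle then do; last1=_n_; output; end;
run;

data maintkey(keep=vehicle first2 last2);
   set maint;
   by vehicle;
   retain first2;
   if first.vehicle then first2=_n_;
   if last.vehicle then do; last2=_n_; output; end;
run;

/* Keep only vehicles that had at least one breakdown */
data keys;
   merge brkkey(in=inbrk) maintkey;
   by vehicle;
   if inbrk;
run;

data final1(keep=vehicle brkdndt lastmnt);
   set keys;
   do i=first1 to last1;                 /* every breakdown in the group */
      set breakdwn point=i;
      lastmnt=.;
      if not missing(first2) then        /* BBB has no maintenance rows */
         do j=first2 to last2;           /* every maintenance date      */
            set maint point=j;
            if mntdate <= brkdndt then lastmnt=max(lastmnt, mntdate);
         end;
      output;
   end;
   format lastmnt date7.;
run;
```

With these inputs, FINAL1 should match Output 4.10a: seven observations, with LASTMNT missing for vehicle BBB.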
Related Technique If you are familiar with Structured Query Language (SQL), you may want to use PROC SQL instead of the DATA step. PROC SQL joins the tables* to produce a new table, FINAL2. Conceptually, a join results in an internal table that matches every row in BREAKDWN with every row in MAINT.

This example shows a left join, which returns all rows that meet the ON clause criteria and the rows from the left table (BREAKDWN) that do not match any row in the right table (MAINT).

The ON clause specifies that the resulting table will contain only those rows where the values of VEHICLE match and where the breakdown date is the same as or later than the maintenance date.

The HAVING clause ensures that you get the row with the latest maintenance date for each breakdown of each vehicle.
To understand how this join works, consider the matches for the breakdown date of 20MAY94 in the BREAKDWN table. ON specifies that the join will return only rows from the internal table where the value of VEHICLE is the same and where the breakdown date is the same as or later than the maintenance date. Only two rows from the internal table meet both of those criteria:

   AAA  20MAY94  05APR94
   AAA  20MAY94  03JAN94

Because the HAVING clause further restricts the result to include only the row that has the latest maintenance date, only the row with the maintenance date 05APR94 appears in the final result.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows are observations.
The row for vehicle BBB is the only row returned by the join from the left table that does not have a match in the right table.

Here is the PROC SQL step that creates FINAL2:
   proc sql;
      create table final2 as
         select b.vehicle, b.brkdndt, m.mntdate as lastmnt
            from breakdwn b left join maint m
               on b.vehicle=m.vehicle and b.brkdndt >= m.mntdate
            group by b.vehicle, b.brkdndt
            having m.mntdate = max(m.mntdate);
   quit;
Note: In PROC SQL, SELECT statements automatically produce a report. SELECT clauses, which follow CREATE TABLE or CREATE VIEW statements, do not automatically produce a report.
□ LEAVE statement. For a complete description with an example, see pp. 34-35 in SAS Technical Report P-222, Changes and Enhancements to Base SAS Software, Release 6.07.
CHAPTER 5
You can work with the data in a single data set in many ways to enhance it or reshape it as you need. For example, you can calculate new values from existing variables, apply common operations to a group of variables, collapse observations, or expand observations. For a complete list of the tasks covered in this chapter, see the example titles below:
5.1 Performing a Simple Subset
5.2 Separating Unique Observations from Duplicate Observations
5.3 Accessing a Specific Number of Observations from the Beginning and End of a Data Set
5.4 Adding New Observations to the End of a Data Set
5.5 Adding Observations to a Data Set Based on the Value of a Variable
5.6 Simulating the LEAD Function by Comparing the Value of a Variable to Its Value in the Next Observation
5.7 Obtaining the Lag (Previous Value) of a Variable within a BY Group
5.9 ... Cumulative Total
5.10 Calculating the Percentage That One Observation Contributes to the Total of a BY Group
5.11 Adding a New Variable that Contains the Frequency of a BY-Group Value
Example 5.1 Performing a Simple Subset

Goal Create a subset of a SAS data set efficiently by selecting for processing only observations that meet a particular condition.

Strategy To subset a SAS data set based on a variable value, you can use the WHERE statement with the SET statement to specify a condition that the data must satisfy before observations are read into the program data vector. Using a WHERE statement is efficient because it takes effect before the SET statement executes on each DATA step iteration. Instead of reading all observations, the SET statement then reads only the observations from the input data set whose data meet the specified condition.
Input Data Set NEWHIRES

Resulting Data Set

Output 5.1 TOYDEPT Data Set

TOYDEPT

OBS  NAME              DEPT  ID
 1   Estefon, Emilio   Toys  5'3(5
 2   Harper, Chang     Toys  45434
 3   Smart, Matthew    Toys  45412
 4   Ochman, Jindra    Toys  45413
Program The objective is to create a subset of the data set NEWHIRES that includes only employees in the Toys department. The WHERE statement allows only observations that have a value of Toys for DEPT to be read by the SET statement:

Create TOYDEPT. Read an observation from NEWHIRES only if the employee works in the toy department. The WHERE statement prevents unneeded observations from being read into the program data vector.

   data toydept;
      set newhires;
      where dept='Toys';
   run;
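The WHERE= data set option is an equivalent way to apply the same filter: the condition is attached to the data set named in the SET statement rather than written as a separate statement. In this sketch the NEWHIRES observations are hypothetical, and the non-Toys row in particular is invented so the filter has something to exclude.

```sas
/* Hypothetical NEWHIRES data; the Shoes row is invented */
data newhires;
   infile datalines dlm='|';
   input name :$20. dept :$10. id;
   datalines;
Harper, Chang|Toys|45434
Lee, Morgan|Shoes|45500
Smart, Matthew|Toys|45412
Ochman, Jindra|Toys|45413
;

/* Same subset as the WHERE statement, expressed as a data set option */
data toydept;
   set newhires(where=(dept='Toys'));
run;
```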
Example 5.2 Separating Unique Observations from Duplicate Observations

Goal Identify duplicate and nonduplicate observations in a data set and write each to the appropriate data set.

Strategy Sort the input data set by the BY variables. Read the input data set with the SET and BY statements. Use the FIRST. and LAST. variables for the appropriate BY variable to determine when an entire observation is a duplicate in the data set. When both FIRST.variable and LAST.variable for the appropriate BY variable are equal to 1 (true), then you know that the observation is not a duplicate, so write it to a data set. Write all other observations to a data set for duplicates.
CLASDATA

  1  3456  Amber   CHEM101
  2  3456  Amber   MATH102
  3  3456  Amber   MATH102
  4  4567  Denise  ENGL201
  5  4567  Denise  ENGL201
  6  2345  Ginny   CHEM101
  7  2345  Ginny   ENGL201
  8  2345  Ginny   MATH102
  9  1234  Lynn    CHEM101
 10  1234  Lynn    CHEM101
 11  1234  Lynn    MATH102
 12  5678  Rick    CHEM101
 13  5678  Rick    HIST300
 14  5678  Rick    HIST300
Program The objective is to determine which observations in CLASDATA are duplicates. A student's name may be in the data set more than once, but no two observations should contain both the same student name and the same class.

First, sort CLASDATA by NAME and CLASS. Then use BY-group processing to create the FIRST. and LAST. variables for the BY variables. When FIRST.CLASS and LAST.CLASS are both equal to 1, you know that the observation is the only one with these values for NAME and CLASS in the data set. Write it to the NODUPS data set. If these variables are not both equal to 1, the observation is a duplicate, so write it to the DUPS data set:
Create DUPS and NODUPS. Read an observation from CLASDATA using the SET statement and BY-group processing. Specify NAME and CLASS as BY variables.

   data dups nodups;
      set clasdata;
      by name class;

Compare the values of the FIRST.CLASS and LAST.CLASS variables. Write an observation to NODUPS or DUPS, depending on the outcome of the comparison.

      if first.class and last.class then output nodups;
      else output dups;
   run;
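A runnable version that includes the sort step described above might look like the sketch below. The values come from the CLASDATA listing; the name ID for the first column is an assumption, since the listing does not show its header.

```sas
/* CLASDATA values from the listing above; the variable name ID is assumed */
data clasdata;
   input id name $ class $;
   datalines;
3456 Amber CHEM101
3456 Amber MATH102
3456 Amber MATH102
4567 Denise ENGL201
4567 Denise ENGL201
2345 Ginny CHEM101
2345 Ginny ENGL201
2345 Ginny MATH102
1234 Lynn CHEM101
1234 Lynn CHEM101
1234 Lynn MATH102
5678 Rick CHEM101
5678 Rick HIST300
5678 Rick HIST300
;

proc sort data=clasdata;
   by name class;
run;

data dups nodups;
   set clasdata;
   by name class;
   /* An observation alone in its NAME/CLASS group is both first and last */
   if first.class and last.class then output nodups;
   else output dups;
run;
```

With this data, NODUPS should receive 6 observations and DUPS should receive 8 (both copies of each duplicated name and class pair).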
Example 5.3 Accessing a Specific Number of Observations from the Beginning and End of a Data Set

Goal Process only the first five and last five observations in a data set efficiently by not reading the entire data set.
Strategy Process specific observations rather than all observations sequentially by using the POINT= option in the SET statement. Use the NOBS= option in the SET statement to assign to a variable the number of observations in the data set. Use DO loops to read only the first five and last five observations in the data set. Because the application calls for reading at least ten observations, use IF-THEN logic to avoid reading some observations twice when a data set contains fewer than ten observations. Reduce redundancy in your program by using the LINK statement to repeatedly route execution to a group of data-reading and data-writing statements.
Program The objective is to create a subset of the SALES data set that contains only the first five and last five observations. Because SALES has more than ten observations, you must set the values of the variables STARTOBS and ENDOBS to indicate which observations to read: the first five observations and the last five observations. After these values are set, link to a set of labeled statements that read and write five observations.

If SALES has fewer than ten observations, you can simply read from the beginning to the end of the data set. However, by using the same method of access, direct instead of sequential, regardless of the size of the data set, you can link to the same block of data-reading and data-writing statements, making your code more compact:
For data sets with more than ten observations, process the first five and last five observations. The assignment statements set the appropriate values for STARTOBS and ENDOBS, which will be used to control the DO loop that reads and writes observations. The first LINK statement causes the statements that follow the label GETOBS to execute and process the first five observations. Execution then returns to the statement following the LINK statement. The STARTOBS and ENDOBS values are reset, based on the value of NUMOBS, the NOBS= variable. (See the SET statement later in this program.) When the program is compiled, NUMOBS is assigned a value equal to the number of observations in data set SALES. The second LINK statement causes the labeled statements to execute again, this time processing the last five observations.

   if numobs > 10 then
      do;
         startobs=1;
         endobs=5;
         link getobs;
         startobs=numobs-4;
         endobs=numobs;
         link getobs;
      end;

For data sets with ten or fewer observations, process all observations. The LINK statement causes the statements that follow the label GETOBS to execute and process all of the observations in data set SALES.

   else
      do;
         startobs=1;
         endobs=numobs;
         link getobs;
      end;
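The excerpt above omits the DATA statement and the labeled GETOBS block that holds the SET statement. A minimal sketch of how the whole step might fit together follows. The output name FIRSTLST is hypothetical, and the GETOBS block is reconstructed from the Strategy description rather than copied from the book's listing.

```sas
data firstlst(drop=startobs endobs);   /* FIRSTLST: hypothetical output name */
   if numobs > 10 then
      do;
         startobs=1;                   /* first five observations */
         endobs=5;
         link getobs;
         startobs=numobs-4;            /* last five observations */
         endobs=numobs;
         link getobs;
      end;
   else
      do;
         startobs=1;                   /* ten or fewer: read them all */
         endobs=numobs;
         link getobs;
      end;
   stop;                               /* POINT= never detects end of file */

   getobs:
      do i=startobs to endobs;
         set sales point=i nobs=numobs;   /* NUMOBS is set at compile time */
         output;
      end;
   return;                             /* resume after the calling LINK */
run;
```

The STOP statement matters here: a SET statement with POINT= never reaches an end-of-file condition, so without STOP the DATA step would not end. The POINT= and NOBS= variables are not written to the output data set, so only STARTOBS and ENDOBS need to be dropped.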
Example 5.4 Adding New Observations to the End of a Data Set
Goal Add new observations to the end of a data set while retaining the original name of the data set.
Strategy Use the END= option in the SET statement to determine when the end of the data set has been reached. Then use a DO loop to generate new observations and append them to the end of the data set. If you want the resulting data set to retain the same name, specify the same data set name in the DATA and SET statements.

You can also create a new data set to contain the new observations and then add those to the original data set by using PROC APPEND. See "Related Technique."
Input Data Set TEST1

OBS   X    Y
  1   1    2
  2   2    4
  3   3    6
  4   4    8
  5   5   10
Resulting Data Set TEST1

OBS    X    Y
  1    1    2
  2    2    4
  3    3    6
  4    4    8
  5    5   10
  6    6   12
  7    7   14
  8    8   16
  9    9   18
 10   10   20
 11   11   22
 12   12   24
 13   13   26
 14   14   28
 15   15   30
Program The objective is to use the value of the END= variable to determine when the last observation from TEST1 has been read. Then execute an iterative DO loop to generate ten new observations and add them to the end of TEST1.

You can also use a DATA step to create the new data set and PROC APPEND to add observations from the second data set to the end of the first one. See "Related Technique."
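The program itself is not reproduced above. A minimal sketch of the approach just described, assuming the END= variable is named LASTOBS (a hypothetical name), could look like this:

```sas
data test1;
   set test1 end=lastobs;   /* LASTOBS is 1 while the last observation is read */
   output;                  /* explicit OUTPUT keeps each original observation */
   if lastobs then
      do i=1 to 10;         /* ten new observations after the last one */
         x=x+1;
         y=y+2;
         output;
      end;
   drop i;
run;
```

Because the step contains OUTPUT statements, SAS no longer writes an observation automatically at the end of each iteration, which is why the first OUTPUT is needed to keep the original five observations.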
Related Technique If your original data set is very large, it is probably more efficient to use a DATA step to create the additional observations and then add them to the end of the original data set by using PROC APPEND. So that you can initialize the values of X and Y to their values in the last observation of TEST1, read only that observation from TEST1 on the first iteration. Use a DO loop to create ten new observations. Because there is no end-of-file condition to stop this DATA step, you must use a STOP statement:
data test2(drop=i);
   if _n_=1 then set test1 point=lastobs nobs=lastobs;
   do i = 1 to 10;
      x = x + 1;
      y = y + 2;
      output;
   end;
   stop;
run;
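The PROC APPEND step that attaches TEST2 to TEST1 is not shown above. A minimal version, assuming TEST1 and TEST2 contain the same variables (X and Y):

```sas
/* add the observations in TEST2 to the end of TEST1 */
proc append base=test1 data=test2;
run;
```

PROC APPEND adds the new observations without reading the observations already in the BASE= data set, which is the source of the efficiency gain for a large TEST1.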
Example 5.5 Adding Observations to a Data Set Based on the Value of a Variable
Goal Add a specific number of observations to a data set, based on the value of one of its variables, so that the resulting data set retains the name of the original.
Strategy Read an observation from the data set. Use the value of an existing variable to determine how many times the DO loop should iterate and write an observation. If you want the resulting data set to have the same name, specify the same data set name in the DATA and SET statements.
Input Data Set TASKS

OBS   DAYS   JOB
  1     1    wiring
  2     2    drywall
  3     4    flooring
  4     2    trimwork
  5     3    painting
Resulting Data Set TASKS

OBS   DATE        DAYS   JOB
  1   10JUL1995     1    wiring
  2   11JUL1995     2    drywall
  3   12JUL1995     2    drywall
  4   13JUL1995     4    flooring
  5   14JUL1995     4    flooring
  6   17JUL1995     4    flooring
  7   18JUL1995     4    flooring
  8   19JUL1995     2    trimwork
  9   20JUL1995     2    trimwork
 10   21JUL1995     3    painting
 11   24JUL1995     3    painting
 12   25JUL1995     3    painting
Program The objective is to use the value of DAYS to determine how many observations to generate for each JOB and, beginning with the current day, determine on what days the job will be done. The value of DAYS determines how many times the DO loop iterates, writing an observation each time:
Create an output data set with the same name as the original one. Read an observation from TASKS.

data tasks(drop=i testday);
   format date date9.;
   set tasks;

Write one observation for each day that the task requires and increase the DATE value appropriately. Use the WEEKDAY function to derive the day of the week from the DATE variable. If the weekday is Saturday (7) or Sunday (1), add 2 or 1, respectively, to its value so that the new value is the date for the following Monday (2). The sum statement (date+1;) increases the value of DATE by 1 and also causes the value of DATE to be automatically retained across iterations of the DATA step.

   do i=1 to days;
      testday=weekday(date);
      if testday=7 then date=date+2;
      if testday=1 then date=date+1;
      output;
      date+1;
   end;
run;
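The statements shown do not make clear where DATE gets its starting value. A self-contained sketch follows. It assumes that TASKS contains only DAYS and JOB and that the schedule starts on Monday, 10 July 1995, the first date in the output. The RETAIN statement that supplies this start date is an assumption and is not taken from the book.

```sas
/* recreate the input data set from the table above */
data tasks;
   input days job $;
   datalines;
1 wiring
2 drywall
4 flooring
2 trimwork
3 painting
;
run;

data tasks(drop=i testday);
   format date date9.;
   retain date '10jul1995'd;          /* assumed start date (a Monday) */
   set tasks;
   do i=1 to days;
      testday=weekday(date);
      if testday=7 then date=date+2;  /* Saturday -> Monday */
      if testday=1 then date=date+1;  /* Sunday -> Monday */
      output;
      date+1;                         /* next calendar day, retained */
   end;
run;
```

Tracing the loop by hand with this start date gives the same twelve dates shown in the resulting TASKS data set. The flooring job, for example, skips from 14JUL1995 (a Friday) to 17JUL1995.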
Example 5.6
Goal Within the same data set, look ahead from a variable value in one observation to return the value of the same variable in the observation that immediately follows it. Then compare the returned value with the current observation or use it in a calculation on the current observation.
Strategy You can use DATA step processing to simulate a look-ahead (LEAD) function, the counterpart of the LAG function. To look ahead from one observation to the next within the same data set, merge the data set with itself by specifying the same data set name twice in the MERGE statement. In the second reference to the data set, use the data set option FIRSTOBS= to start processing with the second observation in the same data set. Because the program does not contain a BY statement, SAS software performs a one-to-one merge, so the pointer in the second reference to the data set is always one observation ahead of the first reference.

In the second reference to the data set, use the RENAME= and KEEP= options. RENAME= gives the look-ahead variable a unique name, thus preventing the look-ahead value from overwriting the value read from the first reference. KEEP= allows only the look-ahead variable from the second reference to the data set to be brought into the program data vector. If you kept all variables from the look-ahead read, you would overlay values of variables with the same names that you just read from the first reference to the data set.
Input Data Set ONE

OBS    X    Y
  1    5    1
  2    5    2
  3   10    1
  4    2    1
  5    2    2
  6   19    1
Resulting Data Set TWO

OBS    X    Y   NEXTX   MATCH
  1    5    1      5    YES
  2    5    2     10    NO
  3   10    1      2    NO
  4    2    1      2    YES
  5    2    2     19    NO
  6   19    1      .    NO
Program The objective is to create a new data set, TWO, in which each observation contains the value of X for the current observation and the next observation. To create data set TWO, merge data set ONE with itself. In the second reference to data set ONE, do three things:

1. Start reading with the second observation (FIRSTOBS=2).
2. Give the look-ahead variable a unique name, NEXTX (RENAME=).
3. Bring only the look-ahead variable into the program data vector (KEEP=). Otherwise, you would overwrite all the other variables with values from the next observation.

Then use IF-THEN/ELSE logic to compare the original and look-ahead values and to report a match:
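The program listing is not reproduced above. A minimal sketch that applies the three options to the second reference of ONE and produces the columns shown in TWO:

```sas
data two;
   merge one
         one(firstobs=2 keep=x rename=(x=nextx));  /* one observation ahead */
   if x=nextx then match='YES';
   else match='NO';
run;
```

For input data sets, KEEP= is applied before RENAME=, so KEEP= names the original variable X. On the last iteration the second reference has no more observations, so NEXTX is missing and MATCH is NO, which matches observation 6 in TWO.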
Strategy Use the LAGn function in conjunction with BY-group processing and array processing to create lagged values for a variable within each BY group. After each BY group is processed, use an IF-THEN statement to reinitialize the lagged values to missing.
1 1 2
2 1 1
3 1 3
4 1 4
5 1 10
6 1 5
7 2 1
8 2 2
9 3 1
10 3 3
11 3 2
12 3 4
13 3 5
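The program for this technique is not reproduced here, and the variable names in the input rows above are not shown. The sketch below applies the strategy under stated assumptions: it uses a hypothetical input data set LAGIN, sorted by a BY variable GRP, with an analysis variable X, and computes three lags. The array, counter, and variable names are all hypothetical.

```sas
data lagged(drop=count j);
   set lagin;                         /* hypothetical: GRP and X, sorted by GRP */
   by grp;
   array lags(3) lag1x lag2x lag3x;
   /* call every LAGn on every observation so each queue stays in step */
   lag1x=lag1(x);
   lag2x=lag2(x);
   lag3x=lag3(x);
   /* COUNT = position of this observation within its BY group */
   if first.grp then count=0;
   count+1;
   /* lags that reach back into the previous group are set to missing */
   do j=count to dim(lags);
      lags(j)=.;
   end;
run;
```

Calling the LAGn functions unconditionally is the key design choice. LAGn returns the value from the nth previous execution of that particular function call, so calling it only inside IF-THEN logic would skip values.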
OBS   NAME    TEST1  TEST2  TEST3  TEST4  TEST5  TEST6  TEST7  TEST8  TEST9  TEST10
  1   Betty     78     88     94     57     89     77     79     81     89     82
  2   James     74     82     88     71     88     81     72     84     91     77
  3   Fred      69     71     81     64     79     74     66     77     81     95

CURVE

CURVE contains scores with curved values for seven tests. Scores for Tests 3, 5, and 9 are not curved.

OBS   TEST3  TEST5  TEST9  NAME    TEST1  TEST2  TEST4  TEST6  TEST7  TEST8  TEST10
  1     94     89     89   Betty     88     98     67     87     89     91     92
  2     88     88     91   James     84     92     81     91     82     94     87
  3     81     79     81   Fred      79     81     74     84     76     87    100
Program The objective is to curve the values of seven out of ten test scores in a data set. First, move the three scores you don't want curved to the front of the program data vector by using the RETAIN statement before the SET statement. Then define the array ALLTEST by using _NUMERIC_ so that you do not have to list the numeric variables explicitly. Use a DO loop to begin processing with the fourth numeric element in the ALLTEST array. Then execute whatever formula you want on the fourth through tenth array elements:
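The program itself is not reproduced here. The sketch below follows the steps just described. The input name TESTS is hypothetical, and the curve formula (add 10 points, capped at 100) is an assumption that happens to be consistent with the CURVE values shown above.

```sas
data curve;
   retain test3 test5 test9;         /* these move to the front of the PDV */
   set tests;                        /* TESTS: hypothetical input name */
   array alltest(*) _numeric_;       /* TEST3 TEST5 TEST9, then the other seven */
   do i=4 to dim(alltest);           /* skip the three uncurved scores */
      if alltest(i) ne . then
         alltest(i)=min(alltest(i)+10, 100);   /* assumed curve formula */
   end;
   drop i;
run;
```

_NUMERIC_ refers only to the numeric variables defined so far when the ARRAY statement is compiled, so the DO-loop index I, which is created later, is not part of ALLTEST. The missing-value check keeps MIN from turning a missing score into 100, because MIN ignores missing arguments.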
Strategy The input data must be sorted on the BY variable. In a DATA step, use a BY statement to create FIRST. and LAST. variables for the BY variable. Using the values of these variables with IF-THEN logic, you can process observations in groups. By using SUM statements and creating new variables to contain running totals, you can accumulate the values of each variable as you process the BY groups.

So that the new data set contains only the grand totals for each BY group, use an OUTPUT statement with IF-THEN logic to write only the last observation for each BY group. Rename the original variables if you want the new variables containing the totals to have the same names as the original variables.

To create an output data set that contains a running (cumulative) total for every observation in each BY group, remove the IF-THEN logic that causes only the last observation from each BY group to be written to the output data set. See "Related Technique."
Input Data Set SCORES

OBS   ID   GAME1   GAME2   GAME3
  1   A      2       3       4
  2   A      5       6       7
  3   B      1       2       3
  4   C      1       2       3
  5   C      4       5       6
  6   C      7       8       9
Program The objective is to create an output data set that contains the grand totals for the variables GAME1, GAME2, and GAME3 for each BY group. The data must be sorted by ID, the BY variable. Create new variables that will contain accumulated totals. Use IF-THEN processing and the value of FIRST.ID to reset these variables to 0 each time a new BY group begins. Use the SUM statement to create running totals. Use the IF-THEN and OUTPUT statements to write only the last observation for each BY group to the output data set. Rename the original variables from SCORES so that the variables in the new data set that contain the accumulated totals can preserve the original variable names:
Create GRANDTOT and drop the variables that represent the GAME1-GAME3 values from SCORES. Read an observation from SCORES and rename the original variables containing the game score values. The BY statement specifies ID as the BY-group variable and creates the variables FIRST.ID and LAST.ID.

data grandtot(drop=temp1 temp2 temp3);
   set scores(rename=(game1=temp1 game2=temp2 game3=temp3));
   by id;

When reading the first observation of each BY group, reset the values of GAME1-GAME3 to zero so that the total from the previous BY group is not retained.

   if first.id then
      do;
         game1=0;
         game2=0;
         game3=0;
      end;

Add the current values of TEMP1-TEMP3 to the running totals. Write only the last observation for each value of ID to GRANDTOT. The three sum statements add the values of TEMP1-TEMP3 to GAME1-GAME3 and also cause the values of GAME1-GAME3 to be automatically retained across iterations of the DATA step.

   game1+temp1;
   game2+temp2;
   game3+temp3;
   if last.id then output;
run;
Related Technique You can produce a data set that contains running totals of GAME1-GAME3 for each BY group by removing the last IF-THEN statement from the end of the DATA step in the previous program:

   if last.id then output;

The DATA statement was also changed to produce the CUMTOT data set. See Output 5.9b.
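With both changes applied, the step might read as follows. This is a sketch reconstructed from the description, not a listing from the book.

```sas
data cumtot(drop=temp1 temp2 temp3);
   set scores(rename=(game1=temp1 game2=temp2 game3=temp3));
   by id;
   if first.id then
      do;
         game1=0;
         game2=0;
         game3=0;
      end;
   game1+temp1;
   game2+temp2;
   game3+temp3;
   /* no IF LAST.ID THEN OUTPUT: every observation is written,
      each carrying the running total for its BY group so far */
run;
```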
Example 5.10 Calculating the Percentage That One Observation Contributes to the Total of a BY Group
Goal Calculate BY-group totals for a variable and then create a variable that shows the percentage that each observation contributes to the total for that BY group.

Strategy Use PROC MEANS to calculate a total for each BY group and to create a new data set that contains one observation for each BY group. Then use a one-to-many merge to merge this data set with the original data set and calculate each observation's percentage of its BY-group total.

You can perform the same task with PROC SQL. See "Related Technique."

Note: Due to the variability of data and the number of conditions that determine the path chosen by the PROC SQL optimizer, it is not always possible to determine the most efficient method without first testing with your data.
OBS   REGION   REPID   AMOUNT
  3   NORTH    1001    $1,000,000
  4   NORTH    1002    $1,100,000
  5   NORTH    1003    $1,550,000
  6   NORTH    1008    $1,250,000
  7   NORTH    1005      $900,000
  8   SOUTH    1007    $2,105,000
  9   SOUTH    1010      $875,000
 10   SOUTH    1012    $1,655,000
Output 5.10b PERCENT2 Data Set

PERCENT2 was created with PROC SQL.

PERCENT2

OBS   REGION   REPID   AMOUNT       REGTOTAL     REGPCT
  1   EAST     1051    $2,508,000   $4,313,000    58.15
  2   EAST     1055    $1,805,000   $4,313,000    41.85
  3   NORTH    1001    $1,000,000   $5,800,000    17.24
  4   NORTH    1002    $1,100,000   $5,800,000    18.97
  5   NORTH    1003    $1,550,000   $5,800,000    26.72
  6   NORTH    1008    $1,250,000   $5,800,000    21.55
  7   NORTH    1005      $900,000   $5,800,000    15.52
  8   SOUTH    1007    $2,105,000   $4,635,000    45.42
  9   SOUTH    1010      $875,000   $4,635,000    18.88
 10   SOUTH    1012    $1,655,000   $4,635,000    35.71
Program The objective is to produce a data set that shows not only the sales amount produced by each sales representative but also what percentage it represented in the total for the region.

First, use PROC MEANS to produce REGTOT, an output data set that contains totals calculated for the AMOUNT variable for each BY group in SALES. Then merge REGTOT with SALES by REGION and calculate the percentage of region total (REGTOTAL) for the amount sold by each sales representative.

Due to match-merging behavior, the value of REGTOTAL is retained until the value of the BY variable REGION changes. The value of REGTOTAL, therefore, is available for calculating the value of REGPCT for each observation.
Create REGTOT, a data set that contains one observation for each REGION. Create a new variable, REGTOTAL, that contains the total AMOUNT for each REGION.

proc means data=sales noprint nway;
   var amount;
   by region;
   output out=regtot(keep=regtotal region) sum=regtotal;
run;

Create PERCENT1 by merging REGTOT with SALES, based on the value of the BY variable REGION.

data percent1;
   merge sales regtot;
   by region;

Calculate the percentage each observation contributed to the total for the appropriate region. AMOUNT is the amount contributed by each sales representative, and REGTOTAL is the sales total for that region.

   regpct = (amount / regtotal) * 100;
   format regpct 6.2 amount regtotal dollar10.;
run;
Related Technique If you are familiar with Structured Query Language (SQL), you may want to use PROC SQL instead of the DATA step. Using the SALES table,* PROC SQL creates a new table that contains two new columns of summary data, REGTOTAL and REGPCT. Because the data are grouped by REGION, the summary function SUM sums data in each group, not the entire table. Thus, to get the total for each region, simply use the SUM function on the AMOUNT column. The region total becomes the values of the REGTOTAL column. To calculate a percentage for each REPID, divide AMOUNT by the sum of AMOUNT for the region and multiply by 100. The percentage of the total becomes the values of the REGPCT column:

proc sql;
   create table percent2 as
      select *, sum(amount) as regtotal
                format=dollar10.,
             100*(amount/sum(amount)) as regpct format=6.2
         from sales
         group by region;
quit;
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows are observations.

Example 5.11
Goal For each row in a table, determine the number of occurrences of a column's value, and store the number in a new column.*

Strategy Use the COUNT function in the SQL procedure to obtain a frequency count. Create a new table that includes a column that shows the frequency count. Use the GROUP BY clause so that the frequency count will be for each group.
Resulting Data Set
Output 5.11 FINAL Table
FINAL

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows are observations.
Program The objective is to create a new table that shows how many times each employee appears in the original table. Group the data by values of ID. Count the number of rows in each group. Create a new column that shows how many times the employee appears in the original table:
Invoke PROC SQL, and create a table. The CREATE TABLE statement creates the table FINAL to store the results of the subsequent query.

proc sql;
   create table final as

Note: In PROC SQL, SELECT statements automatically produce a report. SELECT clauses, which follow CREATE TABLE or CREATE VIEW statements, do not produce a report.
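The SELECT clause that completes this query is not shown above. One way to write the full query follows. It assumes the input table is named EMPLIST and the new count column is named IDCOUNT; both names are hypothetical.

```sas
proc sql;
   create table final as
      select *, count(id) as idcount   /* IDCOUNT: hypothetical column name */
         from emplist                  /* EMPLIST: hypothetical table name */
         group by id;
quit;
```

Because SELECT * mixes detail columns with the COUNT summary function, PROC SQL remerges each group's count back onto every row in that group. It also writes a note about remerging to the log.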
Example 5.12
Goal Group the data in a table* and create a new column that contains an average for each group. Use the average for each group to subset the table.

Strategy Using the SQL procedure, create a new table that contains a column that gives the average of values from a specified column. Use the GROUP BY clause to group the data and find the average for each group. Use the HAVING clause to subset the table and to return only rows from each group that meet a specific search criterion.
  1   Nikos      A1    $32,456.00
  2   Paul       NA2   $53,798.00
  3   Jody       T2    $25,147.00
  4   Olga       T1    $19,810.00
  5   Yao        NA1   $4?,433.00
  6   Natasha    A1    $3?,987.00
  7   Tom        T2    $23,596.00
  8   Kendrick   NA1   $4?,690.00
  9   Kesha      A1    $33,067.00
 10   Klaus      T1    $2?,230.00
 11   Kyle       NA2   $51,081.00
 12   Carla      NA2   $5?,270.00
 13   Anne       T2    $24,876.00
 14   Gunner     NA1   $4?,345.00
 15   Candice    A1    $34,567.00
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows are observations.
Program The objective is to find which employees make higher salaries than the average salary for their jobcode. Calculate the average salary for each jobcode and create a new column that contains the average. Use the average salary for each jobcode and each employee's salary to subset the table:
Invoke PROC SQL, and create a table. The CREATE TABLE statement creates the table FINAL to store the results of the subsequent query.

proc sql;
   create table final as

Select the columns. The SELECT clause selects all the columns from EMPLOYEE and creates an additional column, AVERAGE. The AVG function calculates the average for all the values of SALARY for each JOBCODE, which becomes the value of AVERAGE.

      select *, avg(salary) as average format=dollar10.
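The FROM, GROUP BY, and HAVING clauses that complete the query are not shown above. The sketch below shows the complete query. It assumes the table is EMPLOYEE and that the goal is to keep employees whose salary exceeds the average for their jobcode.

```sas
proc sql;
   create table final as
      select *, avg(salary) as average format=dollar10.
         from employee
         group by jobcode
         having salary > avg(salary);   /* keep above-average earners only */
quit;
```

Repeating AVG(SALARY) in the HAVING clause compares each row's salary with the average for its own JOBCODE group, because the GROUP BY clause makes the summary function operate per group.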
Really Rounding Numbers

Strategy Introduce a fuzz factor** that allows you to obtain numeric calculations that are closer to the pencil-and-paper results that you expect. In a macro, use the regular ROUND function, but first add a fuzz factor to the number.

it against the results of rounding technique.
Input Data Set AMOUNTS

OBS         AMT1          AMT2
  1    0.0000540     0.0000500
  2    0.0000550     0.0000600
  3    0.0000560     0.0000600
  4    0.9998050     0.9998100
  5    0.9998060     0.9998100
  6   17.9998050    17.9998100
  7   17.9998060    17.9998100
  8   18.9998050    18.9998100
  9   18.9998060    18.9998100
 10   18.9999050    18.9999100
 11   18.9999060    18.9999100
Resulting Data Sets

Output 5.13a MACRND Data Set

MACRND shows rounding with a fuzz factor.

MACRND

OBS        AMT1        AMT2     M_ROUND   MATCH
  1    0.000054    0.000050    0.000050   yes
  2    0.000055    0.000060    0.000060   yes
  3    0.000056    0.000060    0.000060   yes
  4    0.999805    0.999810    0.999810   yes
  5    0.999806    0.999810    0.999810   yes
  6   17.999805   17.999810   17.999810   yes
  7   17.999806   17.999810   17.999810   yes
  8   18.999805   18.999810   18.999810   yes
  9   18.999806   18.999810   18.999810   yes
 10   18.999905   18.999910   18.999910   yes
 11   18.999906   18.999910   18.999910   yes
Program The objective is to produce rounded values that are the same as pencil-and-paper results. The program uses a macro to introduce a fuzzing factor when rounding a value. For comparison purposes, data set AMOUNTS contains the variable AMT2, whose value is the correct pencil-and-paper rounded value of AMT1. The program rounds AMT1 with a macro, produces M_ROUND, and then compares its value with AMT2. MATCH is set to 'yes' when the values match so that you can easily see the result of fuzz-factor rounding:
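The program listing is not reproduced here. The sketch below assembles it from two pieces that appear later in this example: the MACROUND macro definition and the REGRND comparison step, with the macro call substituted for the plain ROUND call. The book's exact program may differ.

```sas
%macro macround(var,unit,fuzz=1e-10);
   round((&var+(sign(&var)*&fuzz)),&unit)
%mend;

data macrnd;
   format amt1 amt2 m_round 10.6;
   set amounts;
   m_round=%macround(amt1,.00001);   /* ROUND after adding the fuzz factor */
   if amt2=m_round then match='yes';
   else match='no';
run;
```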
A Closer Look: Numeric Precision and the ROUND Function

We are introducing a fuzz factor because of a numeric precision problem common to computer applications, not because the ROUND function produces inaccurate results. The problem of numeric precision arises because of hardware limitations in the way computers store real numbers. Basically, a finite set of numbers must be used to represent the infinite real number system.
the usual pencil-and-paper results. Numeric precision is an issue across all platforms and is not consistent from machine to machine.
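A small illustration of representation error, independent of rounding: on platforms that use IEEE binary floating point, neither 0.1, 0.2, nor 0.3 is stored exactly. As a result, a sum that is equal on paper typically compares as unequal in the DATA step. This is a sketch for illustration. Because the result depends on the platform, no specific output is claimed here.

```sas
data _null_;
   x=0.1+0.2;
   y=0.3;
   diff=x-y;                      /* tiny, but typically not zero */
   if x=y then put 'x and y are equal';
   else put 'x and y differ by ' diff=;
run;
```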
The key to defining your own rounding routine is to determine how much of a fuzz factor should be added. You want to add enough so that values are rounded up when they should be, but you do not want to add so much that values that should be rounded down are also rounded up instead. See "The MACROUND Macro and the Fuzz Factor" for a discussion of the rounding routine used in this program.
So that you can compare the results, the following program is the same as the original one except that it uses the ROUND function with no fuzzing factor:

data regrnd;
   format amt1 amt2 r_round 10.6;
   set amounts;
   r_round=round(amt1,.00001);
   if amt2=r_round then match='yes';
   else match='no';
run;

Output 5.13b shows that the rounded numbers in REGRND do not match the pencil-and-paper answers as well as the MACRND values do in Output 5.13a.
The MACROUND Macro and the Fuzz Factor

The MACROUND macro uses the ROUND function with an additional fuzz factor to produce pencil-and-paper results when rounding a number. When the macro is defined, the parameters in the %MACRO statement define three macro variables: VAR, UNIT, and FUZZ. FUZZ is a keyword parameter, so it is assigned its default value (1e-10) at that time.
%macro macround(var,unit,fuzz=1e-10);
   round((&var+(sign(&var)*&fuzz)),&unit)
%mend;
When you invoke the MACROUND macro, values for VAR and UNIT are supplied:

m_round=%macround(amt1,.00001);

At macro execution time, the macro variables are resolved:

m_round=round((amt1+(sign(amt1)*1e-10)),.00001);
The routine used here ensures that a value that should be rounded up will be rounded correctly, even when its stored value is slightly less because of representation error. The SIGN function is important here because it ensures that negative values are rounded in the correct direction: SIGN returns -1 when AMT1 is negative, so the fuzz factor is subtracted rather than added. Without it, a negative value for AMT1 would be rounded in the wrong direction.
As an example, here's how the expression works when it executes for the first observation in this example:

m_round=round(0.000054+(sign(0.000054)*(1e-10)),.00001)
m_round=round(0.000054+.0000000001,.00001)
m_round=round(0.0000540001,.00001)
m_round=.000050
Where to Go from Here
□ Numeric precision. For a discussion of this issue, see pages 88-95 in SAS Language: Reference, Version 6, First Edition. For more discussion, see Klenz, Brad (1992), "Handling Numeric Representation Error in SAS Applications," Observations, 1(3), 19-30.

□ Macro processing. For useful introductions, see Chapter 1, "Introducing the Macro Facility," in SAS Macro Facility Tips and Techniques. For complete reference information, see SAS Guide to Macro Processing, Version 6, First Edition.
Example 5.14 Collapsing Observations within a BY Group into a Single Observation

Goal Rearrange a data set by changing a single variable in a group of observations to a group of variables in one observation. Reshape data by collapsing observations within a BY group into a single observation in order to simplify data analysis and report generation.

Strategy Collapse multiple observations that share a common BY-variable value into a single observation. The data set must be sorted by the BY variable before the DATA step. Use a BY statement to create BY groups and the FIRST.variable and LAST.variable. Use array processing to assign the current value of a certain variable to the appropriate new variable. Use the FIRST.variable to control a DO loop that reinitializes the retained array values and resets the array subscript variable. Use the LAST.variable to output the newly created observation.
Input Data Set STUDENTS

OBS   NAME      SCORE
  1   Deborah     89
  2   Deborah     90
  3   Deborah     95
  4   Martin      90
  5   Stefan      89
  6   Stefan      76
Resulting Data Set
Output 5.14 SCORES Data Set

SCORES

OBS   NAME      SCORE1   SCORE2   SCORE3
  1   Deborah     89       90       95
  2   Martin      90        .        .
  3   Stefan      89       76        .
Program Each observation in the data set STUDENTS currently stores a single score. The objective is to reshape STUDENTS so that an observation contains all test scores for an individual student. Use array processing to create and assign values to the new variables SCORE1, SCORE2, and SCORE3. The values are retained with each iteration of the DATA step, and the record is written only when the last observation in the BY group is processed. This program assumes that STUDENTS contains no more than three observations with the same value for NAME:
Create SCORES. Use the RETAIN statement to create the variables SCORE1-SCORE3 and to retain the values of NAME and SCORE1-SCORE3 from one iteration of the DATA step to the next.

data scores(keep=name score1-score3);
   retain name score1-score3;

Create the array SCORES. SCORE1-SCORE3 will receive their values from the variable SCORE in STUDENTS.

   array scores(*) score1-score3;

Read observations from STUDENTS. Use the BY statement to create BY groups and the variables FIRST.NAME and LAST.NAME.

   set students;
   by name;

At the beginning of each BY group, use the assignment statement to set the value of the array subscript I to 1. Reinitialize the values of SCORE1-SCORE3 to missing.

   if first.name then
      do;
         i=1;
         do j=1 to 3;
            scores(j)=.;
         end;
      end;
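The remaining statements of this step are not shown above. A sketch of the complete step follows. The last three statements, which store the score, write the observation, and advance the subscript, are an assumed completion consistent with the Strategy and with the SCORES output rather than the book's listing.

```sas
data scores(keep=name score1-score3);
   retain name score1-score3;
   array scores(*) score1-score3;
   set students;
   by name;
   if first.name then
      do;
         i=1;
         do j=1 to 3;
            scores(j)=.;
         end;
      end;
   scores(i)=score;            /* assumed: store this score in the next slot */
   if last.name then output;   /* one observation per student */
   i+1;                        /* sum statement: I is also retained */
run;
```

Tracing Martin (one observation) shows why the reset matters: SCORE1 is 90, and SCORE2 and SCORE3 stay missing instead of carrying over Deborah's 90 and 95.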
Goal Reshape data by creating multiple observations from a single observation in the input data set and by assigning variable names as values in the output data set.

Strategy Use an array and a DO loop to create multiple observations from each single observation in the input data set. Use the CALL routine VNAME to assign variable names from the input data set as values of a new variable in the output data set.
SURVEY

OBS   NAME    CEREAL   PASTRY   BAGEL
  1   John      10        9        8
  2   Sam        2        8        4
  3   Sally      5        7        6
Resulting Data Set

SURVEY2

SURVEY2 contains reshaped survey data.

OBS   NAME    BREAKFST   RESPONSE
  1   John    CEREAL        10
  2   John    PASTRY         9
  3   John    BAGEL          8
  4   Sam     CEREAL         2
  5   Sam     PASTRY         8
  6   Sam     BAGEL          4
  7   Sally   CEREAL         5
  8   Sally   PASTRY         7
  9   Sally   BAGEL          6
Program The objective is to read SURVEY and create a new data set in which the variable names CEREAL, PASTRY, and BAGEL become values for the new variable BREAKFST and in which three observations are created for each one in SURVEY. The numeric values of the variables CEREAL, PASTRY, and BAGEL are written to the new variable RESPONSE:
Create SURVEY2, dropping the variables whose values are being written to the new variable RESPONSE and dropping the variable used by the iterative DO loop. Read an observation from SURVEY.

data survey2(drop=cereal pastry bagel i);
   set survey;

Define the array NUM. Define the character variable BREAKFST and give it a length of eight.

   array num(*) cereal pastry bagel;
   length breakfst $ 8;
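The DO loop that finishes this step is not shown above. A sketch of the complete step follows, using CALL VNAME as the Strategy describes. The loop body is a reconstruction, not the book's listing.

```sas
data survey2(drop=cereal pastry bagel i);
   set survey;
   array num(*) cereal pastry bagel;
   length breakfst $ 8;
   do i=1 to dim(num);
      call vname(num(i), breakfst);   /* the variable's name becomes a value */
      response=num(i);                /* the variable's value goes to RESPONSE */
      output;                         /* three observations per input observation */
   end;
run;
```

Declaring BREAKFST with a LENGTH statement before CALL VNAME assigns to it ensures that the variable is character with room for the longest name.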
Example 5.16 Reshaping Observations into Multiple Variables
Strategy Sometimes data that are stored as numerical values actually have greater significance and potential usability than is apparent. To improve meaning and usability, use a multi-step process to transpose* a data set so that information originally stored in a few numeric variables with numerous observations is stored as multiple variables with fewer observations.

First, use the TRANSPOSE procedure with the VAR statement to reshape the variable so that its values are stored in a series of new variables. Use one variable as a BY variable to create one observation for each unique value of that variable. Write the results to an output data set.

Second, transpose all of the variables in the output data set so that each variable name is now a value of a new, character variable. Store the values of each variable in another series of variables and create a new output data set.
Input Data Set ONE

OBS   CATEGORY   RATING
  1       1         6
  2       1         9
  3       1         8
  4       1         9
  5       1         8
  6       1        10
  7       1        10
  8       2         7
  9       2         8
 10       2         8
 11       2        10
 12       2         9
 13       2         8
 14       2        10
 15       3         6
 16       3         9
 17       3         9
 18       3         9
 19       3         8
 20       3         9
 21       3         9
* To transpose is to reshape data by turning columns (variables) of information into rows (observations).
Resulting Data Sets

Output 5.16a INTERIM, First Transposed Data Set

INTERIM

OBS   FORD   NISSAN   MAZDA   SAAB   SATURN   HONDA   TOYOTA
  1     6      9        8       9      8       10       10
  2     7      8        8      10      9        8       10
  3     6      9        9       9      8        9        9

Output 5.16b FINAL, Final Transposed Data Set

FINAL

OBS   MAKE     DEPEND   APPEAL   PERFORM
  1   FORD        6        7        6
  2   NISSAN      9        8        9
  3   MAZDA       8        8        9
  4   SAAB        9       10        9
  5   SATURN      8        9        8
  6   HONDA      10        8        9
  7   TOYOTA     10       10        9
Program The data set ONE contains survey data that have been collected on the dependability, overall appeal, and performance of cars from seven manufacturers. However, the results are stored simply as values in the numeric variables CATEGORY and RATING. Not only are the data not useful in their current form for further processing, but their meaning is actually buried.

Reshape the values in a two-step process to reflect their real meaning and to make the data usable. In the first step, transpose RATING so that its values are stored in a series of new variables, each reflecting a car manufacturer. Do not transpose the values of CATEGORY. Simply collapse the values into one observation corresponding to each BY group. The resulting data set INTERIM has seven variables, one for each car manufacturer, and three observations.

In the second step, transpose the seven car variables so that each variable name is now the value of a new variable, MAKE. Store the values of each car variable in the three new variables DEPEND, APPEAL, and PERFORM, which reflect the qualities surveyed. The resulting data set, FINAL, has four variables, one for the make of car and three reflecting the three qualities. It has seven observations.
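The two transpositions can be illustrated outside SAS with plain dictionaries. The following Python sketch is not the book's program and does not reproduce PROC TRANSPOSE; it only shows the same reshaping. It assumes, based on the INTERIM and FINAL data sets, that the ratings within each CATEGORY are listed in the order FORD through TOYOTA and that categories 1, 2, and 3 correspond to DEPEND, APPEAL, and PERFORM.

```python
from collections import defaultdict

# Assumptions taken from the INTERIM and FINAL listings, not from the program:
# the make order within each category, and category 1/2/3 = DEPEND/APPEAL/PERFORM.
MAKES = ["FORD", "NISSAN", "MAZDA", "SAAB", "SATURN", "HONDA", "TOYOTA"]
QUALITIES = ["DEPEND", "APPEAL", "PERFORM"]


def transpose_by_category(rows):
    """Step 1 (like INTERIM): one record per CATEGORY, one field per make."""
    grouped = defaultdict(list)
    for category, rating in rows:          # rows arrive in BY-group order
        grouped[category].append(rating)
    # zip() pairs each rating with a make name; extra values would be ignored.
    return {cat: dict(zip(MAKES, ratings)) for cat, ratings in grouped.items()}


def transpose_makes(interim):
    """Step 2 (like FINAL): one record per make, one field per quality."""
    categories = sorted(interim)
    return {
        make: {QUALITIES[i]: interim[cat][make] for i, cat in enumerate(categories)}
        for make in MAKES
    }


# (CATEGORY, RATING) pairs from data set ONE.
ONE = [(1, 6), (1, 9), (1, 8), (1, 9), (1, 8), (1, 10), (1, 10),
       (2, 7), (2, 8), (2, 8), (2, 10), (2, 9), (2, 8), (2, 10),
       (3, 6), (3, 9), (3, 9), (3, 9), (3, 8), (3, 9), (3, 9)]

INTERIM = transpose_by_category(ONE)
FINAL = transpose_makes(INTERIM)
```

With these inputs, FINAL["FORD"] holds DEPEND 6, APPEAL 7, and PERFORM 6, which matches the first row of the FINAL data set.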
This chapter is not strictly about combining observations from different SAS data sets. It contains, however, examples of commonly asked questions about dealing with data values, such as extracting character strings from a variable value, converting a numeric variable to a character variable and vice versa, performing a bubble sort, and determining someone's age from a SAS date value, among others.

6.1 Converting Variable Types from Character to Numeric and Vice Versa
6.2 Determining the Type of a Variable's Content
Goal Read the value of a character variable and write its value to a numeric variable, and vice versa.

Strategy You cannot directly change the type of a variable. You must create a new variable of the desired type. Use the PUT or INPUT function and a specified format or informat, respectively, to convert a value. Use an assignment statement to assign that value to the new variable. When converting character to numeric, use the INPUT function and a numeric informat. To do the reverse, use the PUT function and a numeric format.
Data set ONE contains a single character variable. Data set TWO contains a single numeric variable.

ONE

OBS  XCHAR
 1   0123
 2   12345
 3   123456
 4   123A45

TWO

OBS   YNUM
 1      123
 2    12345
 3      999
 4        .

Output 6.1a CHAR2NUM Data Set

CHAR2NUM

OBS  XCHAR     XNUM
 1   0123        123
 2   12345     12345
 3   123456   123456
 4   123A45        .
Output 6.1b NUM2CHAR Data Set

NUM2CHAR

OBS   YNUM   YCHAR1   YCHAR2
 1      123  000123      123
 2    12345  012345    12345
 3      999  000999      999
 4        .
Output 6.1c Data Set

OBS    XCHAR
 1       123
 2     12345
 3    123456
 4         .
Program The objective is to read a character value and write it, if possible, as a numeric value to a new variable, or vice versa. To write the character value of XCHAR to the numeric variable XNUM, use the INPUT function to return the existing character value as it is read with the 8. numeric informat. To write the value of numeric variable YNUM as a value for character variables YCHAR1 and YCHAR2, use the PUT function and the numeric format Z6. to return the existing numeric value with leading zeros, or use the PUT function with the standard numeric format 6. to return the numeric value without leading zeros. In all three cases, use an assignment statement to save the returned value to the new variable XNUM, YCHAR1, or YCHAR2, respectively.
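The three conversions can be mimicked in Python to show what each one returns. This is a rough sketch, not SAS behavior. The regular expression is an assumed approximation of what a standard numeric informat such as 8. accepts, it ignores the informat width, and None stands in for a SAS missing value. Unlike the ?? modifier, the Python function writes no log message either way.

```python
import re

# Assumed approximation of a standard numeric informat: optional sign,
# digits with an optional decimal point, and an optional exponent.
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def char_to_num(xchar):
    """Rough analogue of INPUT(xchar, ?? 8.): a number, or None (standing in
    for a missing value) when the text is not valid numeric data."""
    text = xchar.strip()
    return float(text) if NUMBER.fullmatch(text) else None


def num_to_char(ynum, zero_pad):
    """Rough analogue of PUT(ynum, Z6.) when zero_pad is true, otherwise
    PUT(ynum, 6.). A missing value (None) becomes a blank string, as the
    example does explicitly for YCHAR1."""
    if ynum is None:
        return ""
    return f"{ynum:06.0f}" if zero_pad else f"{ynum:6.0f}"
```

For example, char_to_num("123A45") returns None, which corresponds to the missing XNUM in observation 4 of CHAR2NUM, and num_to_char(12345, zero_pad=True) returns "012345".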
Create CHAR2NUM. Read an observation from ONE.
   data char2num;
      set one;
Read the value of XCHAR with a numeric informat and assign it as a numeric value to XNUM, a numeric variable. The INPUT function reads the value of XCHAR with the numeric informat 8. and returns a numeric value. The ?? format modifier suppresses the invalid data messages and prevents the automatic variable _ERROR_ from being set to 1 if XCHAR doesn't contain valid numeric data.
      xnum=input(xchar,?? 8.);
   run;
Create NUM2CHAR. Read an observation from TWO.
   data num2char;
      set two;
If the value for YNUM is missing, then explicitly assign YCHAR1 a value of blank. Assigning a blank to YCHAR1 overrides the default value of a period (for missing).
      if ynum=. then ychar1=' ';
   run;
Related Technique You can drop and rename variables so that the new variable with the numeric value can retain the same name as the original variable it was created from.
Goal Determine whether a character variable's value contains numeric data, character data, or missing data.

Strategy To determine the contents of a character variable's value for each observation, first test the value to see if it is missing (blank). If it is missing, classify it as undefined. If it is not missing, then use the INPUT function to read the value with a numeric informat. If that result is not missing, it is a valid numeric value. If it is missing, the value is classified as character.
OBS  X
 1   1234
 2   12E5
 3
 4   124ABC
 5   124
 6   ABCDEFGH

Resulting Data Set

Output 6.2 NEW Data Set

NEW

OBS  TYPE       X
 1   Numeric    1234
 2   Numeric    12E5
 3   Undefined
 4   Character  124ABC
 5   Numeric    124
 6   Character  ABCDEFGH
Program The objective is to determine the data type of the value of variable X in each observation in data set OLD. Read each observation and test to see if the value of X is missing (blank). If it is missing, set a new variable named TYPE to "Undefined". For all other observations, assign a value to a temporary variable by using the INPUT function and a numeric informat to return a numeric value. If the value is not missing, it is a valid numeric value. If it is missing, it is a character value.
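The classification rule reduces to three outcomes. The following Python sketch is a rough stand-in for "readable by a numeric informat" and rests on an assumed regular expression, not on SAS's exact rules. It reproduces the TYPE values of the NEW data set for the six values in OLD.

```python
import re

# Assumed approximation of what a standard numeric informat accepts.
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def classify(x):
    """Mirror the TYPE logic: blank -> Undefined; readable as a number ->
    Numeric; anything else -> Character."""
    text = x.strip()
    if text == "":
        return "Undefined"
    return "Numeric" if NUMBER.fullmatch(text) else "Character"


# Values of X from data set OLD; the third observation is blank.
OLD = ["1234", "12E5", "", "124ABC", "124", "ABCDEFGH"]
NEW = [(classify(x), x) for x in OLD]
```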
Create NEW. Read an observation from OLD. The LENGTH statement assigns a length of 9 to the new character variable TYPE. Without it, TYPE would take its length from the first value assigned to it in the DATA step, which could truncate longer values.
   data new(drop=tempvar);
      length type $ 9;
      set old;
Goal Determine whether a variable is character or numeric to ensure that you have the right type of data for your application.

Strategy Query the table* DICTIONARY.COLUMNS in PROC SQL to determine the variable's type. Use the INTO clause to store the variable's type in a macro variable. Use the macro variable in a subsequent DATA step to create a new variable of the other type that contains the same data as the original variable.
Resulting Data Sets

Output 6.3a NUM2CHAR Data Set

NUM2CHAR

OBS  X_NUM  Y_CHAR  CHARVAL
 1   12345  12345   12345

Output 6.3b CHAR2NUM Data Set

CHAR2NUM

OBS  X_NUM  Y_CHAR  NUMVAL
 1   12345  12345   12345
Program The objective is to ensure that you are using a certain type of data in your application.

The table DICTIONARY.COLUMNS contains information about all variables in all SAS data sets in the current SAS session. For each variable in data set ONE, query DICTIONARY.COLUMNS to determine its type. You must subset the query to get the type for only one variable. (Typically, you subset queries of dictionary tables with a WHERE clause because they are very large tables.) The query returns the value num or the value char. Store the value in a macro variable. Use the macro variable in a subsequent DATA step. The DATA step creates a new data set and a new variable that contains the same data as the original variable, but of a different type.

In many cases, SAS automatically converts data from one type to another, but it is more efficient if you do it yourself. In addition, if you control the conversion, you can avoid possible unexpected results when numeric data defaults to using the BEST12. format.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows are observations.
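The same idea, looking up a variable's type in metadata and then choosing the conversion explicitly, can be sketched in Python. Everything here is hypothetical: COLUMNS is a tiny stand-in for DICTIONARY.COLUMNS, and the function names are invented for illustration. As in the dictionary table, the stored names are uppercase, so the lookup uppercases its arguments.

```python
# Hypothetical stand-in for DICTIONARY.COLUMNS, reduced to the filter columns.
COLUMNS = [
    {"LIBNAME": "WORK", "MEMNAME": "ONE", "NAME": "X_NUM", "TYPE": "num"},
    {"LIBNAME": "WORK", "MEMNAME": "ONE", "NAME": "Y_CHAR", "TYPE": "char"},
]


def column_type(libname, memname, name):
    """Like SELECT type ... WHERE libname=... AND memname=... AND name=...:
    return the single matching TYPE, or None when there is no such column."""
    key = (libname.upper(), memname.upper(), name.upper())
    for col in COLUMNS:
        if (col["LIBNAME"], col["MEMNAME"], col["NAME"]) == key:
            return col["TYPE"]
    return None


def convert_to_other_type(value, vartype):
    """Convert explicitly instead of relying on an automatic conversion.
    The 'g' format is only a Python stand-in for choosing an explicit format."""
    if vartype == "char":
        return float(value)
    if vartype == "num":
        return format(value, "g")
    raise ValueError(f"unexpected type {vartype!r}")
```

In this sketch, column_type("WORK", "ONE", "Y_CHAR") plays the role of the macro variable, and its value selects which conversion to apply.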
Query DICTIONARY.COLUMNS to learn the variable's type.
   proc sql;
Subset the query. To get the type for only X_NUM, subset the query to return only the row for that variable. The values in the dictionary table are in uppercase, so the WHERE clause must use uppercase as well.
      where libname='WORK' and
            memname='ONE' and
            name='X_NUM';
   quit;
Subset the query. To get the type for only Y_CHAR, subset the query to return only the row for that variable.
      where libname='WORK' and
            memname='ONE' and
            name='Y_CHAR';
   quit;
Create a new data set and a new variable that contains the same data as the original variable but of a different type. The INPUT function returns the value of Y_CHAR as numeric. The numeric value is stored in the variable NUMVAL.
   data char2num;
      set one;
      if "&vartype"='char' then numval=input(y_char,5.);
   run;
SELECT clauses, which follow CREATE TABLE or CREATE VIEW statements, do not automatically produce a report.

Where to Go from Here
□ Dictionary tables.
   □ For an example that describes dictionary tables more thoroughly, see Example 6.4, "Creating a SAS Data Set Whose Variables Contain the Attributes of Variables from Another Data Set," in this book.
   □ For a complete description and examples of dictionary tables, see pp. 286-291 and pp. 294-295 in Chapter 37, "The SQL Procedure," in SAS Technical Report P-222, Changes and Enhancements to Base SAS Software, Release 6.07.
   □ For examples that use dictionary tables, see Chapter 11, "Five Nifty Reports Using PROC SQL Views in the SASHELP Library" by Bernadette Johnson in Reporting from the Field: SAS Software Experts Present Real-World Report-Writing Applications.
Utilities and Functions □ Example 6.4 155
Resulting SAS Log

Output 6.4a Columns in DICTIONARY.COLUMNS
The descriptions of the columns used in this example are shaded.

NOTE: SQL table DICTIONARY.COLUMNS was created like:

create table DICTIONARY.COLUMNS
  (
   LIBNAME char(8) label='Library Name',
   MEMNAME char(8) label='Member Name',
   MEMTYPE char(8) label='Member Type',
   NAME char(8) label='Column Name',
   TYPE char(4) label='Column Type',
   LENGTH num label='Column Length',
   NPOS num label='Column Position',
   VARNUM num label='Column Number in Table',
   LABEL char(40) label='Column Label',
   FORMAT char(16) label='Column Format',
   INFORMAT char(16) label='Column Informat',
   IDXUSAGE char(9) label='Column Index Type'
  );
ATTR

       Column  Column  Column
OBS    Type    Name    Length
 1     char    MONTH      5
 2     char    CROP       5
 3     char    MARKET     9
 4     num     LAST       8
 5     num     LOW        8
 6     num     HIGH       8
Program Examine your SAS log to see that you need the TYPE, NAME, and LENGTH columns from DICTIONARY.COLUMNS to get the type, name, and length of the variables in WORK.PRICES. In addition, use the LIBNAME and MEMNAME columns to subset DICTIONARY.COLUMNS so that it produces information about the columns in WORK.PRICES only. Lastly, order the table so that the character columns and the numeric columns are grouped together.
Select the appropriate columns.
      select type, name, length
Name the dictionary table to query. You do not have to assign the libref DICTIONARY.
         from dictionary.columns
Order the data values by variable type.
         order by type;
   quit;
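The filter-then-order query can be pictured in Python. In this sketch, COLUMNS holds invented rows: the column order within WORK.PRICES and the extra WORK.OTHER row exist only for illustration. Only the names, types, and lengths shown in ATTR come from the example. Python's sorted() is stable, so variables of the same type keep their original order. SQL's ORDER BY makes no such promise.

```python
# Hypothetical rows standing in for DICTIONARY.COLUMNS:
# (LIBNAME, MEMNAME, TYPE, NAME, LENGTH)
COLUMNS = [
    ("WORK", "PRICES", "char", "MONTH", 5),
    ("WORK", "PRICES", "num", "LAST", 8),
    ("WORK", "PRICES", "char", "CROP", 5),
    ("WORK", "PRICES", "num", "LOW", 8),
    ("WORK", "PRICES", "char", "MARKET", 9),
    ("WORK", "PRICES", "num", "HIGH", 8),
    ("WORK", "OTHER", "char", "ID", 4),
]


def attr(libname, memname):
    """SELECT type, name, length ... WHERE libname/memname match ... ORDER BY type."""
    rows = [(typ, name, length)
            for lib, mem, typ, name, length in COLUMNS
            if lib == libname and mem == memname]
    # 'char' sorts before 'num'; ties keep their input order (stable sort).
    return sorted(rows, key=lambda row: row[0])
```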
Example 6.5 Sorting Variable Values within an Observation (Bubble Sort)
Goal Sort the values of variables within an observation.

Strategy To sort the values of variables within an observation, use a technique called bubble sort. Create an array that contains the variables that you want to sort. Then use nested DO UNTIL and iterative DO loops to compare the value of each variable in an observation with the next variable value until all have been compared and placed in ascending order.

For an enhanced version of this program that will increase efficiency for processing larger data sets, see "Related Technique."
OBS  CODE1  CODE2  CODE3  CODE4  CODE5  CODE6
 1     3      1      5      4      6      2
 2     9      8      6      5      7      4
 3     3      2      1      9      0      7
 4     8      2      6      4      0      1
 5     5      7      4      3      8      2
Resulting Data Sets

Output 6.5a VARSORT Data Set

VARSORT

OBS  CODE1  CODE2  CODE3  CODE4  CODE5  CODE6
 1     1      2      3      4      5      6
 2     4      5      6      7      8      9
 3     0      1      2      3      7      9
 4     0      1      2      4      6      8
 5     2      3      4      5      7      8

Output 6.5b VARSORT2 Data Set
VARSORT2 was produced by a technique shown in "Related Technique" that requires more coding but that is more efficient for larger data sets.

VARSORT2

OBS  CODE1  CODE2  CODE3  CODE4  CODE5  CODE6
 1     1      2      3      4      5      6
 2     4      5      6      7      8      9
 3     0      1      2      3      7      9
 4     0      1      2      4      6      8
 5     2      3      4      5      7      8
Program The objective is to reorder the values of CODE1 through CODE6 for each observation so that they are in ascending order. First, create the CODE array to contain the values for these six variables in each observation.

Use a DO UNTIL loop that iterates until the data are completely sorted. Within that loop, nest an iterative DO loop that iterates five times, once for every comparison that needs to be made (CODE1 to CODE2, and so on). This DO loop makes the comparisons by processing the CODE array. Values are reordered if the next value in sequence is smaller than its immediate predecessor.
Create VARSORT. Define array CODE. Read an observation from ONE.
   data varsort(keep=code1-code6);
      array code(*) code1-code6;
      set one;
Begin a DO UNTIL loop that iterates until all of the variable values within an observation have been sorted. Set SORTED to 1. SORTED will be set to 0 each time the DO group executes to reorder values. When that code does not execute, the array is already sorted. In that case, SORTED will remain 1 and prevent the DO UNTIL loop from executing again.
      do until (sorted);
         sorted=1;
Compare each value of an element in the CODE array (values of variables CODE1 through CODE6) with the value of the next variable. If the first element is larger, reorder the values and set SORTED to 0. The variable TEMP holds an array element while you assign the larger value to the second element and the smaller value to the first element. SORTED is set to 0 so that the DO UNTIL loop continues iterating. This DO group only executes when a value is greater than its immediate successor. After all values are in order, this block does not execute. SORTED is, therefore, not reset to 0, causing the DO UNTIL loop to stop.
         do i=1 to dim(code)-1;
            if code(i) > code(i+1) then
               do;
                  temp=code(i+1);
                  code(i+1)=code(i);
                  code(i)=temp;
                  sorted=0;
               end;
         end;
      end;
   run;
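The loop structure can be checked outside SAS. This Python sketch of the same bubble sort works on one observation's values at a time. A Python while loop tests its condition at the top, whereas DO UNTIL tests at the bottom. Starting with is_sorted set to False makes the first pass run unconditionally, so the behavior is the same. The tuple assignment does the job that TEMP does in the SAS code.

```python
def bubble_sort(codes):
    """Return the values sorted ascending, using VARSORT's loop structure:
    repeat full passes until a pass makes no swaps."""
    code = list(codes)                    # like ARRAY CODE(*) CODE1-CODE6
    is_sorted = False
    while not is_sorted:                  # DO UNTIL (SORTED)
        is_sorted = True                  # SORTED=1
        for i in range(len(code) - 1):    # compare each element with the next
            if code[i] > code[i + 1]:
                code[i], code[i + 1] = code[i + 1], code[i]  # swap (TEMP in SAS)
                is_sorted = False         # SORTED=0: another pass is needed
    return code


# Rows of data set ONE (CODE1-CODE6 for each observation).
ONE = [[3, 1, 5, 4, 6, 2], [9, 8, 6, 5, 7, 4], [3, 2, 1, 9, 0, 7],
       [8, 2, 6, 4, 0, 1], [5, 7, 4, 3, 8, 2]]
VARSORT = [bubble_sort(obs) for obs in ONE]
```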
Related Technique If you are sorting a small data set, the technique described in "A Closer Look" is simple and useful. But if you are sorting a larger data set, the gain in efficiency can make it worth the effort to limit the comparisons performed to only those that are necessary. Set the upper bound of the iterative DO loop that compares values and switches them when necessary so that only pairs before the last pair switched are rechecked.

First, create two additional variables, HBND and MOVEHIGH in this example, that you can use to prevent the iterative DO loop from rechecking pairs unnecessarily. Use HBND to control how many times the DO loop that compares pairs of values iterates. Initially set HBND to the highest number necessary, the number of the next-to-last element in the array. In the DO group that switches values when necessary, set the value of MOVEHIGH to I, the number of the iteration and, therefore, of the element in the array being processed. Use that value to reset the value of HBND. The next time the DO loop iterates, it will not check more pairs than are necessary:
   data varsort2(keep=code1-code6);
      array code(*) code1-code6;
      set one;
      hbnd=dim(code)-1;
      do until (sorted);
         sorted=1;
         do i=1 to hbnd;
            if code(i) > code(i+1) then
               do;
                  temp=code(i+1);
                  code(i+1)=code(i);
                  code(i)=temp;
                  movehigh=i;
                  sorted=0;
               end;
         end;
         hbnd=movehigh-1;
      end;
   run;
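This Python sketch of the bounded version also counts comparisons, which makes the saving visible. Python indexes from 0, so after a pass whose last swap was at pair index movehigh, the next pass checks range(movehigh). This is the 0-based equivalent of HBND=MOVEHIGH-1 in the 1-based SAS array. The comparison counter is an addition for illustration only.

```python
def bubble_sort_bounded(codes):
    """Like VARSORT2: after each pass, recheck only the pairs that lie before
    the last swap, because everything from there on is already in place.
    Returns the sorted values and the number of comparisons made."""
    code = list(codes)
    hbnd = len(code) - 1              # number of adjacent pairs to check (HBND)
    is_sorted = False
    comparisons = 0
    while not is_sorted:
        is_sorted = True
        movehigh = 0                  # 0-based index of the last pair swapped
        for i in range(hbnd):
            comparisons += 1
            if code[i] > code[i + 1]:
                code[i], code[i + 1] = code[i + 1], code[i]
                movehigh = i
                is_sorted = False
        hbnd = movehigh               # SAS: HBND=MOVEHIGH-1 with 1-based indexes
    return code, comparisons
```

For the first observation of ONE, bubble_sort_bounded([3, 1, 5, 4, 6, 2]) needs 15 comparisons. Five full passes of five comparisons each, as in VARSORT, would need 25.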
Example 6.6 Creating Equal-Sized Random Samples and Producing Equal-Sized Subsets or Exact-Sized Subsets

Goal Create equal-sized subsets from randomly chosen observations from a data set. You can also create exact-sized subsets.

Note: You can create equal-sized subsets only if the number of observations is divisible by the number of subsets you want to create.
Strategy Create a new version of the data set by adding a new variable whose values are randomly generated with the RANUNI function. Sort the new data set based on the values of that variable. Then read the sorted data set and calculate the value of a new variable for each observation, based on the remainder of the current iteration number, _N_, divided by the number of subsets.
ONE

OBS  NAME
 1   Duke
 2   Virginia
 3   Georgia Tech

Output 6.6c THREE Data Set

THREE

OBS  NAME
 1   Maryland
 2   UNC
 3   Clemson

Output 6.6d SIMPLE Data Set

SIMPLE

OBS  NAME
 1   Wake Forest
 2   Maryland
 3   Duke
 4   NCSU
 5   UNC
Create data sets ONE, TWO, and THREE. Read an observation from RANDOM.
   data one two three;
      set random;
Drop variables you do not need in the output data sets. Create the variable CLASS. The MOD function returns the remainder of _N_, the number of the current iteration, divided by 3, the number of subsets being created.
      drop x class;
      class=mod(_n_,3);
   run;
Related Technique To create a randomly selected subset of an exact size, use the OBS= data set option to read only a specific number of observations from RANDOM, in this case 5:

   data simple(keep=name);
      set random(obs=5);
   run;
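The technique can be sketched in Python. Each name gets a uniform random key, like RANUNI, and the names are sorted on that key, like PROC SORT. The shuffled names are then dealt into groups by the remainder of their 1-based position, like MOD(_N_, 3). Taking the first n shuffled names gives an exact-sized subset, like OBS=5. The function name and the sample names are hypothetical. The group numbers are the remainders 0, 1, and 2, because the routing of each remainder to ONE, TWO, or THREE is not shown in the example.

```python
import random


def random_subsets(names, k, seed=None):
    """Shuffle by sorting on a random key, then deal into k groups by the
    remainder of the 1-based position divided by k."""
    rng = random.Random(seed)
    shuffled = sorted(names, key=lambda _name: rng.random())   # RANUNI + SORT
    groups = {r: [] for r in range(k)}
    for n, name in enumerate(shuffled, start=1):               # n plays _N_
        groups[n % k].append(name)                             # MOD(_N_, k)
    return shuffled, groups


NAMES = [f"obs{n}" for n in range(1, 10)]      # hypothetical 9 observations
SHUFFLED, GROUPS = random_subsets(NAMES, 3, seed=42)
EXACT = SHUFFLED[:5]                           # like SET RANDOM(OBS=5)
```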
Program The objective is to count all occurrences of the string MY in the values of MEMNAME in data set ONE. First, use the TRANWRD function to change all occurrences of MY in each value of MEMNAME to an ampersand (&). Assign this value to a new variable, NEWMNAME. Use the COMPRESS function to remove all instances of & from the value of NEWMNAME. Use the LENGTH function to determine the length of NEWMNAME both with and without the compression. Assign the difference to COUNT, which indicates how many times MY occurred in each value of MEMNAME.
Create TWO. Read an observation from ONE.
   data two;
      set one;
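The length-difference trick is easy to verify in Python. In this sketch, str.replace stands in for TRANWRD and for COMPRESS with a single character. The sample values are hypothetical, and the target string is a parameter. Like the SAS version, the sketch assumes that the marker character never appears in the original value, because any such character would be counted too.

```python
def count_occurrences(value, target, marker="&"):
    """Count non-overlapping occurrences of target the way the example does:
    replace each one with a one-character marker, strip the markers, and
    compare the two lengths. Assumes marker never appears in value."""
    newmname = value.replace(target, marker)   # like TRANWRD(value, target, '&')
    compressed = newmname.replace(marker, "")  # like COMPRESS(newmname, '&')
    return len(newmname) - len(compressed)
```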
Goal Extract from a variable a character string that is no longer than a specified length and that does not end in the middle of a word.

Strategy Use the SUBSTR function and assignment statements to create two new variables from a character variable. One contains the character that would be last in the extracted string; the other contains the first character following the extracted string. If the last character or the first character following the string is a punctuation mark or a blank, assign the extracted string to a new variable. If not, use a DO loop to search backward in the string until a blank or punctuation mark is found, and then write to a variable the new, shorter string that does not end in the middle of a word.
Input Data Set

Variable COMMENTS contains a character value. The twenty-first character in each value contains either a blank, a punctuation mark, or a letter within a word.

SURVEY

OBS  COMMENTS
 1   The food was served in a timely manner,
 2   The service was good! Food was great!!
 3   The waiter was very helpful and courteous.
 4   My chicken is great, but service is slow!!!
 5   I love the restaurant!!! Service is great!!
does not end in the middle of a word. Use the SUBSTR function and an
variable NEWCOMNT.
   length newcomnt $ 20;
Assign values to NEXTCHAR and CUTPT. The SUBSTR function reads a specified number of characters beginning at a specified location in variable COMMENTS.
   nextchar=substr(comments,21,1);
   cutpt=substr(comments,20,1);
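The extraction rule can be sketched in Python and checked against the SURVEY comments. Several details are assumptions, because the book's full DATA step is not shown. The set of break characters is one. Keeping a trailing punctuation mark while dropping trailing blanks is another. So is the fallback to a hard cut when no break exists.

```python
# Assumption: the characters treated as "a blank or a punctuation mark".
BREAKS = set(" .,;:!?")


def extract_whole_words(text, limit=20):
    """Return at most `limit` characters of `text` without ending mid-word."""
    if len(text) <= limit:
        return text.rstrip()
    cutpt, nextchar = text[limit - 1], text[limit]   # like CUTPT and NEXTCHAR
    if cutpt in BREAKS or nextchar in BREAKS:
        return text[:limit].rstrip()
    # Search backward for a blank or punctuation mark.
    for i in range(limit - 1, -1, -1):
        if text[i] in BREAKS:
            return text[:i + 1].rstrip()
    return text[:limit]   # assumption: one long word, so fall back to a hard cut
```

For observation 5, the 20th and 21st characters are both letters, so the backward search stops at the blank after "the" and returns "I love the".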
Resulting Data Set

Output 6.9 RESULTS Data Set

RESULTS

OBS  DATIMVAL
 1   19JUL94:16:00:00
 2   25DEC94:14:23:05
 3   01JAN95:23:01:00
 4   09J~95:09:35:01
Program The objective is to use a SAS date value and a SAS time value to create a new variable that contains a datetime value. The DHMS function accepts four numeric values that provide values for date, hour, minute, and seconds, respectively. It returns a single value in the form of a SAS datetime value. In this example, the DATEVAL variable supplies the date value and the TIMEVAL variable supplies the time, which is stored as an integer representing the number of seconds since midnight.
Create RESULTS. Read an observation from ONE.
   data results(keep=datimval);
      set one;
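The arithmetic behind DHMS is easy to show in Python. SAS date values count days since January 1, 1960, and SAS datetime values count seconds since midnight on that date. A datetime value is therefore the date value times 86400, plus the time of day in seconds. The sketch assumes, because the example's assignment statement is not shown, that TIMEVAL is passed as the seconds argument with hour and minute set to zero. The helper names are hypothetical.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH = date(1960, 1, 1)   # day 0 for SAS date values


def sas_date(d):
    """SAS date value: days since January 1, 1960."""
    return (d - SAS_EPOCH).days


def dhms(dateval, hour, minute, second):
    """Like DHMS: seconds since midnight, January 1, 1960."""
    return dateval * 86400 + hour * 3600 + minute * 60 + second


def as_datetime(sas_datetime):
    """Convert a SAS datetime value to a Python datetime for display."""
    return datetime(1960, 1, 1) + timedelta(seconds=sas_datetime)


# Assumed form of the call: TIMEVAL (seconds since midnight) as the seconds argument.
DATIMVAL = dhms(sas_date(date(1994, 7, 19)), 0, 0, 16 * 3600)
```

Displayed with as_datetime, DATIMVAL corresponds to 19JUL94:16:00:00, the first row of RESULTS.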
Where to Go from Here
□ SAS datetime values. For a complete discussion of how SAS handles time, date, and datetime values, see "Using SAS Date and Time Values" on p. 85 and "Understanding SAS Date and Time Values" on pp. 129-131 in SAS Language: Reference, Version 6, First Edition.
Creating a SAS Time Value from a Character Value

Goal Read a character value to create a SAS time value.
Strategy Converting a character value to a SAS time value is a multistep process if the value is not in a form that can be read with an existing SAS time informat or function, such as the TIMEw. informat or the HMS function. If the data values are not in such a form, you must first create a picture format that writes values in the form expected by the SAS informat TIME11. (or the HMS function). Then use a series of assignment statements that contain INPUT and PUT functions to transform a character value into a numeric value and then into a SAS time value.
TIME2

OBS      SASTIME    TIMECHAR
 1    0:00:33.49    33.49
 2    0:01:13.69    1:13.69
 3   13:00:00.33    13:00:00.33
 4    1:13:43.45    1:13:43.45
Program The objective is to create a SAS time value from the character value TIMECHAR in data set TIME1. First, use the PICTURE statement in PROC FORMAT to create a user-defined format that you will use later to put a value into the proper form for a SAS time value (hours:minutes:seconds) so that it can be read with the TIME11. informat.

Use the COMPRESS function to remove the colons from the values of TIMECHAR. Then read this value with the INPUT function and a numeric informat to return a numeric value that can be written with a picture format. Use the PUT function and the picture format to write the numeric value to variable TEMP2, so that it can be read as a SAS time value with the TIME11. informat and assigned to a variable named SASTIME. As an example, a value that contains only seconds and hundredths of seconds (33.34) will be expanded to contain leading zeros for hours and minutes (00:00:33.34).

The value that is assigned to SASTIME is not only numeric, it is also a valid SAS time value. For example, the SASTIME value that prints as 13:00:00.33 is stored as 46800.33.
Create the TME. format. Use the PICTURE statement to create a format that can be used as a template for writing numbers. In this case, the format will be used to write a value so that it can be read as a SAS time value.
   proc format;
      picture tme other='99:99:99.99';
   run;
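The conversion chain (remove the colons, read a number, zero-pad it into hours, minutes, and seconds, then read it as a time) can be traced in Python. This sketch follows the example's route but is not the SAS program. It assumes at most two decimal places and fewer than 100 hours, and the function name is hypothetical. A SAS time value is the number of seconds since midnight.

```python
def char_to_sas_time(timechar):
    """Convert text such as '1:13:43.45' or '33.49' to seconds since midnight.

    Route: drop the colons (COMPRESS), read the digits as a number (INPUT),
    lay the number out as HHMMSS.ss with leading zeros (the job of the TME.
    picture format), then read hours, minutes, and seconds by position
    (the job of the TIME11. informat).
    """
    value = float(timechar.replace(":", ""))   # '1:13:43.45' -> 11343.45
    padded = f"{value:09.2f}"                  # 11343.45 -> '011343.45'
    hours = int(padded[0:2])
    minutes = int(padded[2:4])
    seconds = float(padded[4:])
    return round(hours * 3600 + minutes * 60 + seconds, 2)
```

As in the text, "13:00:00.33" becomes 46800.33.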
Calculating a Person's Age

Strategy Determine the current age of each person in a data set by subtracting the SAS date value of the date of birth from the current date. Use the TODAY function to obtain the SAS date value of the current date. Use the INTCK function to count the number of months between the date of birth and the current date. Divide the number of months by 12 to produce the number of years. Use the MONTH function to determine whether the month of the birthday and the current date are the same. If they are, determine whether the birthday has occurred this year. If it hasn't, adjust the age by subtracting one year.
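The month-counting approach can be sketched in Python. The month difference below is what INTCK('MONTH', ...) counts, namely month boundaries crossed. Integer division by 12 gives whole years. The only case that needs a correction is when today falls in the birth month before the birthday. The function name is hypothetical, and passing date.today() plays the role of the TODAY function.

```python
from datetime import date


def age_on(birth, today):
    """Whole years between two dates, following the steps in the strategy."""
    # Month boundaries crossed, which is what INTCK('MONTH', birth, today) counts.
    months = (today.year - birth.year) * 12 + (today.month - birth.month)
    age = months // 12
    # Same calendar month: the birthday may not have arrived yet this year.
    if today.month == birth.month and today.day < birth.day:
        age -= 1
    return age


# Usage, like the TODAY function: age_on(date(1970, 6, 1), date.today())
```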
APPENDIX 1

Error Checking When Using MODIFY or SET with KEY=

Why Error Checking?
New Error-Checking Tools
Example 1: Routing Execution When an Unexpected Condition Occurs
Example 2: Using Error Checking on All Statements That Use KEY=

Why Error Checking?
When reading observations with the SET statement and KEY= option or with the MODIFY statement, error checking is imperative for several reasons. The most important reason is that because these tools use nonsequential access methods, there is no guarantee that an observation will be located that satisfies the request. Error checking enables you to direct execution to specific paths, depending on the outcome of the I/O operation. Your program will continue execution for expected conditions and terminate execution when unexpected results occur.
New Error-Checking Tools
Two tools have been created to make error checking easier when you use the MODIFY statement or the SET statement with the KEY= option to process SAS data sets:

□ _IORC_ automatic variable
□ SYSRC autocall macro.
_IORC_ is created automatically when you use the MODIFY statement or the SET statement with KEY=. The value of _IORC_ is a numeric return code that indicates the status of the I/O operation from the most recently executed MODIFY or SET statement with KEY=. Checking the value of this variable for abnormal I/O conditions enables you to detect them and direct execution down specific code paths instead of having the application terminate abnormally. For example, if the KEY= variable value does match between two observations, you might want to combine them and output an observation. If they don't match, however, you may want only to write a note to the log.

Because the values of the _IORC_ automatic variable are internal and subject to change, the SYSRC macro was created to enable you to test for specific I/O conditions while protecting your code from future changes in _IORC_ values. Using SYSRC, you can check the value of _IORC_ by specifying one of the mnemonics listed in Table A.1.
Error Checking When Using MODIFY or SET with KEY= □ Example 1: Routing Execution When an Unexpected Condition Occurs 177
Table A.1 List of Most Common Mnemonic Values of _IORC_ for DATA Step Processing

Mnemonic Value   Meaning of Return Code

_DSENMR   The TRANSACTION data set observation does not exist in the MASTER data set. (This return code occurs when MODIFY with BY is used and no match occurs.)

_DSEMTR   Multiple TRANSACTION data set observations with the same BY variable value do not exist in the MASTER data set. (This return code occurs when MODIFY with BY is used and consecutive observations with the same BY values do not find a match in the first data set. In this situation, the first observation to fail to find a match returns _DSENMR. Following ones return _DSEMTR.)

_DSENOM   No matching observation was found in the MASTER data set. (This return code occurs when SET or MODIFY with KEY= finds no match.)

_SOK      The I/O operation was successful. (This return code occurs when a match is found.)
* This program works as expected only if the master and transaction data sets contain no consecutive observations with the same value for the common variable. For an explanation of the behavior of MODIFY with KEY= when duplicates exist, see SAS Technical Report P-242, SAS Software: Changes and Enhancements, Release 6.08, pages 4 and 8-10.
Original Program The objective is to update the MASTER data set with information from the TRANS data set. The program reads TRANS sequentially. MASTER is read directly, not sequentially, using the MODIFY statement and the KEY= option. Only observations with matching values for PARTNO, the KEY= variable, are read from MASTER.

Open MASTER for update. Read an observation from TRANS. Match observations from MASTER based on the values of PARTNO. Update the information on QUANTITY by adding the new values from TRANS.
   data master;
      set trans;
      modify master key=partno;
      quantity=quantity+addquant;
   run;
Resulting Log

Output A.1b Log Message about DATA Step Ending
This program has correctly updated one observation but stopped when it could not find a match for PARTNO value 6.

ERROR: No matching observation was found in MASTER data set.
PARTNO=6 ADDQUANT=16 QUANTITY=70 _ERROR_=1 _IORC_=1230015 _N_=2
NOTE: The SAS System stopped processing this step because of errors.
NOTE: The data set WORK.MASTER has been updated. There were 1 observations rewritten, 0 observations added and 0 observations deleted.
Resulting Data Set

Output A.1c Incorrectly Updated MASTER

MASTER
Revised Program The objective is to apply two updates and one addition to MASTER, preventing the DATA step from stopping when it does not find a match in MASTER for the PARTNO value 6 in TRANS. By adding error checking, this DATA step is allowed to complete normally and produce a correctly revised version of MASTER. This program uses the _IORC_ automatic variable and the SYSRC autocall macro in a SELECT group to check the value of the _IORC_ variable and execute the appropriate code based on whether or not a match is found.
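The control flow of the revised program (look up by key, inspect a named status, and branch instead of stopping) can be sketched in Python. This is only an analogy. The enum values are the mnemonic names, not SAS's internal _IORC_ numbers, and the data are hypothetical. The sketch also assumes that a new part is added with QUANTITY set to ADDQUANT, because the book's SELECT group is not shown here.

```python
from enum import Enum


class IORC(Enum):
    """Named statuses in the spirit of the SYSRC mnemonics."""
    SOK = "_SOK"
    DSENOM = "_DSENOM"


def keyed_read(master, key):
    """Direct access by key that reports a status instead of stopping."""
    return IORC.SOK if key in master else IORC.DSENOM


def apply_transactions(master, trans):
    """Update matches, add non-matches, and stop only on an unexpected status."""
    rewritten = added = 0
    for partno, addquant in trans:
        status = keyed_read(master, partno)
        if status is IORC.SOK:            # match: update QUANTITY in place
            master[partno] += addquant
            rewritten += 1
        elif status is IORC.DSENOM:       # no match: add a new record
            master[partno] = addquant     # assumption: new QUANTITY = ADDQUANT
            added += 1
        else:                             # unexpected condition: stop loudly
            raise RuntimeError(f"unexpected status {status} for PARTNO {partno}")
    return rewritten, added


MASTER = {1: 10, 2: 20, 3: 30}            # hypothetical PARTNO -> QUANTITY
COUNTS = apply_transactions(MASTER, [(1, 5), (3, 2), (6, 16)])
```

Like the revised DATA step, the sketch finishes with two records rewritten and one added instead of halting at the first unmatched PARTNO.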
~t
I, ;
j :, '.
Resulting Log

Output A.1d Log Message

    NOTE: The data set WORK.MASTER has been updated. There were 2
          observations rewritten, 1 observations added and 0 observations
          deleted.

The DATA step executed without error, and observations were appropriately updated and added.
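%SYSRC lets a program compare _IORC_ against a readable mnemonic instead of a raw return code. As a small check, assuming the autocall library that ships with base SAS is available, the macro can be resolved in open code to show the numbers that a SELECT group like the one described above compares against:

```sas
/* Write the numeric values behind two %SYSRC mnemonics to the SAS log */
%put _SOK=%sysrc(_sok) _DSENOM=%sysrc(_dsenom);
```

The _DSENOM value is the code reported as _IORC_=1230015 in Output A.1b, where no matching observation was found.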
The ORDER data set contains values for all parts in a single order. Only ORDER contains the PARTNO value 8.

    MASTER                   ORDER           DESCRPTN

    OBS  PARTNO  QUANTITY    OBS  PARTNO     OBS  PARTNO  DESC
     1      1       10        1      2        1      4    nuts
     2      2       20        2      4        2      3    bolts
     3      3       30        3      1        3      2    screws
     4      4       40        4      3        4      6    washers
     5      5       50        5      8
                              6      5
                              7      6
Correctly Created COMBINE Data Set

Output A.2a COMBINE Data Set

Note that COMBINE does not contain an observation with the PARTNO value 8. This value does not occur in either MASTER or DESCRPTN.

    COMBINE

    OBS  PARTNO  DESC            QUANTITY
     1      2    screws             20
     2      4    nuts               40
     3      1    No description     10
     4      3    bolts              30
     5      5    No description     50
     6      6    washers             0
The objective is to create a data set that contains the description and number in stock for each part in a single order, except for parts that are found in neither of the two lookup data sets, MASTER and DESCRPTN. A transaction data set, ORDER, contains the part numbers of all parts in a single order. DESCRPTN is read to retrieve the description of each part, and MASTER is read to retrieve the quantity in stock.
The program reads the ORDER data set sequentially and then uses SET with the KEY= option to read the MASTER and DESCRPTN data sets directly, based on the key value of PARTNO. When a match occurs, an observation is written that contains all the necessary information for each value of PARTNO.
Error Checking When Using MODIFY or SET with KEY=  ■  Example 2: Using Error Checking on All Statements That Use KEY=  181
Resulting Log

Output A.2b Log Message

This program creates an output data set but executes with one error.

    PARTNO=1 DESC=nuts QUANTITY=10 _ERROR_=1 _IORC_=0 _N_=3
    PARTNO=5 DESC=No description QUANTITY=50 _ERROR_=1 _IORC_=0 _N_=6
Resulting Data Set

Output A.2c Incorrectly Created COMBINE

Observation 5 should not be in this data set. PARTNO value 8 does not exist in either MASTER or DESCRPTN, so no QUANTITY should be listed for it. Also, observations 3 and 7 contain descriptions from observations 2 and 6, respectively.

    COMBINE

    OBS  PARTNO  DESC            QUANTITY
     1      2    screws             20
     2      4    nuts               40
     3      1    nuts               10
     4      3    bolts              30
     5      8    No description     30
     6      5    No description     50
     7      6    No description     50
Revised Program

To create an accurate output data set, this example performs error checking on both SET statements that use the KEY= option.

Create COMBINE. Read an observation from ORDER. Read an observation from DESCRPTN, using PARTNO as the key variable. FOUNDES is created so that its value can be used later to indicate when a PARTNO value has a match in DESCRPTN.

    data combine(drop=foundes);
       set order;
       foundes = 0;
       set descrptn key=partno;

Take the correct course of action based on whether a matching value for PARTNO is found in DESCRPTN. The SELECT group directs execution to the correct code based on the value of _IORC_. When a match occurs (_SOK), the value of PARTNO in the observation being read from DESCRPTN matches the current value from ORDER, so FOUNDES is set to 1 to indicate that DESCRPTN contributed to the current observation. When there is no match (_DSENOM), DESC is set to 'No description' and _ERROR_ is reset to 0.

       select (_iorc_);
          when (%sysrc(_sok)) do;
             foundes = 1;
          end;
          when (%sysrc(_dsenom)) do;
             desc = 'No description';
             _error_ = 0;
          end;
          otherwise do;
             put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
             ...

Read an observation from MASTER, using PARTNO as a key variable.

       set master key=partno;

Take the correct course of action based on whether a matching value for PARTNO is found in MASTER.

       select (_iorc_);
       ...
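The listing above breaks off after the second SELECT statement. The following is a minimal, self-contained sketch of a complete version that is consistent with the annotations and with Output A.2a; it is not necessarily the book's own code. The handling of the MASTER lookup (output on a match, QUANTITY set to 0 when only DESCRPTN matched, no output when neither matched) is an assumption inferred from that output. The sketch first builds the three input data sets from the values listed earlier, giving DESC a length of 14 so that 'No description' is not truncated (also an assumption, since the original attributes are not shown), and creates the PARTNO indexes that KEY= needs.

```sas
/* Input data from the listings above; lengths and indexes are assumptions */
data order;
   input partno @@;
   datalines;
2 4 1 3 8 5 6
;

data master(index=(partno));          /* KEY= lookups need an index on PARTNO */
   input partno quantity;
   datalines;
1 10
2 20
3 30
4 40
5 50
;

data descrptn(index=(partno));
   length desc $ 14;                  /* room for 'No description'           */
   input partno desc $;
   datalines;
4 nuts
3 bolts
2 screws
6 washers
;

data combine(drop=foundes);
   set order;
   foundes = 0;
   set descrptn key=partno;
   select (_iorc_);
      when (%sysrc(_sok)) foundes = 1;       /* DESCRPTN contributed         */
      when (%sysrc(_dsenom)) do;            /* no description on file       */
         desc = 'No description';
         _error_ = 0;
      end;
      otherwise do;
         put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
         _error_ = 0;
         stop;
      end;
   end;
   set master key=partno;
   select (_iorc_);
      when (%sysrc(_sok)) output;            /* quantity found in MASTER     */
      when (%sysrc(_dsenom)) do;
         _error_ = 0;
         if foundes then do;                 /* described but not stocked    */
            quantity = 0;
            output;
         end;
         else put 'WARNING: PARTNO ' partno 'is in neither DESCRPTN nor MASTER.';
      end;
      otherwise do;
         put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
         _error_ = 0;
         stop;
      end;
   end;
run;

proc print data=combine;
run;
```

Tracing the seven ORDER observations by hand, this step writes the six observations shown in Output A.2a, and PARTNO 8 produces only a warning in the log. Every branch overwrites DESC or QUANTITY before output, so no value retained from an earlier iteration can leak into the result, which is the problem visible in Output A.2c.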
Your Turn

For suggestions about the software, please return the photocopy to

SAS Institute Inc.
Technical Support Division
SAS Campus Drive
Cary, NC 27513
email: suggest@unx.sas.com
Additional Documentation

For a complete list of SAS® publications, refer to the current Publications Catalog. The catalog is produced twice a year. You can order a free copy of the catalog by writing, calling, or faxing the Institute (or access the online version of the Publications Catalog via the World Wide Web):

SAS Institute Inc.
Book Sales Department
SAS Campus Drive
Cary, NC 27513
Telephone: 919-677-8000, then press 7001
Fax: 919-677-8166
E-mail: sasbook@vm.sas.com
WWW: http://www.sas.com/
■ Jacobs III, Charles A. (1992), "DATA Step Programming Using the MODIFY Statement," Observations, 2(1), 10-42 (order #A56305)
  provides numerous examples and an in-depth explanation of the MODIFY statement.

■ "MODIFY and Indexes: Beyond the Basics," in "SAS Technical Tips," (1994), SAS Communications, 20(1).

■ SAS® Programming Tips: A Guide to Efficient SAS® Processing (order #A56150)
  provides more than 100 tips for improving the efficiency of your SAS programs.

■ SAS® Language and Procedures: Usage, Version 6, First Edition (order #A56075)
■ SAS® Language and Procedures: Usage 2, Version 6, First Edition (order #A56078)
  provide task-oriented examples of the major features of base SAS software.

  ... Display Manager System; the SAS Text Editor; and any other element of base SAS software except for procedures.

■ SAS® Procedures Guide, Version 6, Third Edition (order #A56080)
  provides detailed information about the procedures available in the SAS System.

■ Getting Started with the SQL Procedure, Version 6, First Edition (order #A55042)
  introduces users to the SQL procedure, a base SAS procedure that implements Structured Query Language in Version 6 of the SAS System.

■ SAS® Guide to the SQL Procedure: Usage and Reference, Version 6, First Edition (order #A56070)
  fully describes the SAS System's implementation of Structured Query Language. It illustrates SQL through simple queries and provides documentation on advanced features.

■ SAS® Software: Changes and Enhancements, Release 6.10 (order #A55120)
  documents the changes and enhancements to several SAS software products for Release 6.10.

■ SAS® Technical Report P-252, SAS® Software: Changes and Enhancements, Release 6.09 (order #A59169)
  documents the changes and enhancements to several SAS software products for Release 6.09.

■ SAS® Technical Report P-242, SAS® Software: Changes and Enhancements

■ SAS® Technical Report P-222, Changes and Enhancements to Base SAS® Software, Release 6.07 (order #A59139)
  provides the latest features and changes for Release 6.07 of base SAS software.
SAS Institute
SAS Campus Drive
Cary, NC 27513

■ Use BY-group processing effectively

ISBN 1-55544-220-X