Combining and Modifying SAS Data Sets

SAS Online Samples

SAS Online Samples enables you to download the sample programs from many SAS books by using one of three facilities: Anonymous FTP, SASDOC-L, or the World Wide Web.

Anonymous FTP

Anonymous FTP enables you to download ASCII files and binary files (SAS data libraries in transport format). To use anonymous FTP, connect to FTP.SAS.COM. Once connected, enter the following responses as you are prompted:

   Name (ftp.sas.com:user-id): anonymous
   Password: <your e-mail address>

Next, change to the publications directory:

   >cd pub/publications

For general information about files, download the file Info:

   >get Info <target-filename>

For a list of available sample programs, download the file Index:

   >get Index <target-filename>

Once you know the name of the file you want, issue a GET command to download the file. Note: Filenames are case sensitive.

   To download ...           Issue this command ...
   compressed ASCII file     >get filename.Z <target-filename>
   ASCII file                >get filename <target-filename>
   binary transport file     >binary
                             >get filename <target-filename>

SASDOC-L

SASDOC-L is a listserv maintained by the Publications Division at SAS Institute. As a subscriber, you can request ASCII files that contain sample programs. You also receive notification when sample programs from a new book become available.

To use SASDOC-L, send e-mail, with no subject, to LISTSERV@VM.SAS.COM. The body of the message should be one of the lines listed below.

To subscribe to SASDOC-L, send this message:

   SUBSCRIBE SASDOC-L <firstname lastname>

To get general information about files, download the file INFO by sending this message:

   GET INFO EXAMPLES SASDOC-L

To get a list of available sample programs, download the file INDEX by sending this message:

   GET INDEX EXAMPLES SASDOC-L

Once you know the name of the file you want, send the message

   GET filename EXAMPLES SASDOC-L

World Wide Web

The SAS Institute World Wide Web information server can be accessed at the following URL:

   http://www.sas.com/

The sample programs are available from the Support Services portion of the Institute's server.
Combining and Modifying SAS® Data Sets: Examples

Version 6
First Edition

SAS Institute Inc.
SAS Campus Drive
Cary, NC 27513

The correct bibliographic citation for this manual is as follows: SAS Institute Inc., Combining and Modifying SAS® Data Sets: Examples, Version 6, First Edition, Cary, NC: SAS Institute Inc., 1995. 197 pp.

Combining and Modifying SAS® Data Sets: Examples, Version 6, First Edition

Copyright © 1995 by SAS Institute Inc., Cary, NC, USA.
ISBN 1-55544-220-X

All rights reserved. Printed in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

Restricted Rights Legend. Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013.

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.

1st printing, August 1995

The SAS System is an integrated system of software providing complete control over data access, management, analysis, and presentation. Base SAS software is the foundation of the SAS System. Products within the SAS System include SAS/ACCESS, SAS/AF, SAS/ASSIST, SAS/CALC, SAS/CONNECT, SAS/CPE, SAS/DMI, SAS/EIS, SAS/ENGLISH, SAS/ETS, SAS/FSP, SAS/GRAPH, SAS/IMAGE, SAS/IML, SAS/IMS-DL/I, SAS/INSIGHT, SAS/LAB, SAS/NVISION, SAS/OR, SAS/PH-Clinical, SAS/QC, SAS/REPLAY-CICS, SAS/SESSION, SAS/SHARE, SAS/SPECTRAVIEW, SAS/STAT, SAS/TOOLKIT, SAS/TRADER, SAS/TUTOR, SAS/DB2, SAS/GEO, SAS/GIS, SAS/PH-Kinetics, SAS/SHARE*NET, and SAS/SQL-DS software. Other SAS Institute products are SYSTEM 2000 Data Management Software, with basic SYSTEM 2000, CREATE, Multi-User, QueX, Screen Writer, and CICS interface software; InfoTap software; NeoVisuals software; JMP, JMP IN, JMP Serve, and JMP Design software; SAS/RTERM software; and the SAS/C Compiler and the SAS/CX Compiler; VisualSpace software; and Emulus software. MultiVendor Architecture and MVA are trademarks of SAS Institute Inc. SAS Institute also offers SAS Consulting, SAS Video Productions, Ambassador Select, and On-Site Ambassador services. Authorline, Books by Users, The Encore Series, JMPer Cable, Observations, SAS Communications, SAS Training, SAS Views, the SASware Ballot, and SelecText documentation are published by SAS Institute Inc. The SAS Video Productions logo and the Books by Users SAS Institute's Author Service logo are registered service marks and the Helplus logo and The Encore Series logo are trademarks of SAS Institute Inc. All trademarks above are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

The Institute is a private company devoted to the support and further development of its software and related services.

Other brand and product names are registered trademarks or trademarks of their respective companies.

Doc P4, 11JUL95

Contents

Credits
Recognition

CHAPTER 1 • An Introduction to Data Relationships, Access Methods, and Techniques for Data Manipulation
   Overview
   Data Relationships
   Access Methods: Sequential versus Direct
   An Overview of Methods for Combining SAS Data Sets
   An Overview of Tools for Combining SAS Data Sets
   Where to Go from Here

CHAPTER 2 • Combining Single Observations with Single Observations
   Example 2.1 Merging Data Sets by a Common Variable, Specifying Their Origin, and Replacing Missing Values
   Example 2.2 Combining Observations When Variable Values Do Not Match Exactly
   Example 2.3 Combining Observations When There Is No Common Variable
   Example 2.4 Performing a Table Lookup When the Lookup Data Set is Indexed
   Example 2.5 Performing a Table Lookup When the Lookup Data Set is Not Indexed
   Example 2.6 Matching Observations Randomly
   Example 2.7 Combining Observations Based on a Calculation on Variables Contributed by Two Data Sets

CHAPTER 3 • Combining a Single Observation with Multiple Observations
   Example 3.1 Adding Values to All Observations in a Data Set
   Example 3.2 Adding Values from the Last Observation in a Data Set to All Observations in Another Data Set
   Example 3.3 Merging Observations from Multiple Data Sets Based on a Common Variable
   Example 3.4 Applying Transactions to a Master Data Set Based on a Common Variable
   Example 3.5 Combining and Collapsing Observations Based on a Common Variable
   Example 3.6 Applying Transactions to a Master Data Set Using an Index
   Example 3.7 Removing Observations from a Master Data Set Based on Values in a Transaction Data Set
   Example 3.8 Performing a Table Lookup with a Small Lookup Data Set
   Example 3.9 Performing a Table Lookup with Large Nonindexed Data Sets
   Example 3.10 Performing a Table Lookup Using a Composite Index When the Transaction Data Set Contains Duplicate Values
   Example 3.11 Performing a Table Lookup with a Large Lookup Data Set That Is Indexed

CHAPTER 4 • Combining Multiple Observations with Multiple Observations
   Example 4.1 Adding Variables from a Transaction Data Set to a Master Data Set
   Example 4.2 Updating a Master Data Set with Only Nonmissing Values from a Transaction Data Set
   Example 4.3 Generating Every Combination of Observations (Cartesian Product) between Data Sets
   Example 4.4 Generating Every Combination of Observations between Data Sets Based on a Common Variable
   Example 4.5 Delaying Final Disposition of Observations Until All Processing Is Complete
   Example 4.6 Generating Every Combination between Data Sets, Based on a Common Variable When an Index Is Available
   Example 4.7 Combining Multiple Data Sets without a Variable Common to All the Data Sets
   Example 4.8 Interleaving Nonsorted Data Sets
   Example 4.9 Interleaving Data Sets Based on a Common Variable
   Example 4.10 Comparing All Observations with the Same BY Values

CHAPTER 5 • Manipulating Data From a Single Source
   Example 5.1 Performing a Simple Subset
   Example 5.2 Separating Unique Observations from Duplicate Observations
   Example 5.3 Accessing a Specific Number of Observations from the Beginning and End of a Data Set
   Example 5.4 Adding New Observations to the End of a Data Set
   Example 5.5 Adding Observations to a Data Set Based on the Value of a Variable
   Example 5.6 Simulating the LEAD Function by Comparing the Value of a Variable to Its Value in the Next Observation
   Example 5.7 Obtaining the Lag (Previous Value) of a Variable within a BY Group
   Example 5.8 Applying Common Operations to a Group of Variables
   Example 5.9 Calculating Totals across a BY Group to Produce Either a Grand or Cumulative Total
   Example 5.10 Calculating the Percentage That One Observation Contributes to the Total of a BY Group
   Example 5.11 Adding a New Variable that Contains the Frequency of a BY-Group Value
   Example 5.12 Subsetting a Data Set Based on the Calculated Average of a BY Group
   Example 5.13 Really Rounding Numbers
   Example 5.14 Collapsing Observations within a BY Group into a Single Observation
   Example 5.15 Expanding Single Observations into Multiple Observations
   Example 5.16 Reshaping Observations into Multiple Variables

CHAPTER 6 • Utilities and Functions
   Example 6.1 Converting Variable Types from Character to Numeric and Vice Versa
   Example 6.2 Determining the Type of a Variable's Content
   Example 6.3 Determining Whether a Variable is Character or Numeric
   Example 6.4 Creating a SAS Data Set Whose Variables Contain the Attributes of Variables from Another SAS Data Set
   Example 6.5 Sorting Variable Values within an Observation (Bubble Sort)
   Example 6.6 Creating Equal-Sized Random Samples and Producing Equal-Sized Subsets or Exact-Sized Subsets
   Example 6.7 Counting the Occurrences of a String within the Values of a Variable
   Example 6.8 Extracting a Character String without Breaking the Text in the Middle of a Word
   Example 6.9 Creating SAS Datetime Values
   Example 6.10 Creating a SAS Time Value from a Character Value
   Example 6.11 Calculating a Person's Age

Appendix • Error Checking When Using MODIFY or SET with KEY=
   Why Error Checking?
   New Error-Checking Tools
   Example 1: Routing Execution When an Unexpected Condition Occurs
   Example 2: Using Error Checking on All Statements That Use KEY=

Index
Your Turn

Credits

Documentation

Design and Production
   Design, Production, and Printing Services

Style Programming
   Publications Technology Development

Planning and Prototyping
   Ginny Dunn, Amber Elam, Brenda C. Kalt, Carol Austin Linden, Rick Matthews, Denise J. Moorman, Lynn H. Patrick, and Helen Weeks

Programming
   Ginny Dunn, Amber Elam, William F. Heffner, Charles A. Jacobs, Paul M. Kent, Susan Marshall, Rick Matthews, Denise J. Moorman, Lynn H. Patrick, Jon C. Schiltz, and Michael Williams

Writing and Editing
   ... Poole, Philip R. Shelton, Helen Weeks, and John M. West

Technical Review
   Ginny Dunn, Amber Elam, William F. Heffner, Kevin Hobbs, Charles A. Jacobs, Paul M. Kent, Susan Marshall, Rick Matthews, Denise J. Moorman, Lynn H. Patrick, Amy S. Peters, Jon C. Schiltz, Bruce Tindall, and Michael Williams

Recognition

This book was conceived and planned by members of the Technical Support and Publications Divisions. The SAS code was written by members of the Technical Support Division and of Research and Development. The book was reviewed by members of the Technical Support, Education, and Research and Development Divisions.

Without the advice, expertise, and skill in coding from members of other divisions within SAS Institute, this book would not have been possible. The Publications Division is grateful for the talent and expertise that helped create this book and would especially like to recognize the serious commitment of time and resources on the part of the Technical Support Division.

An Introduction to Data Relationships, Access Methods, and Techniques for Data Manipulation

Overview
Data Relationships
   One-to-One
   One-to-Many and Many-to-One
   Many-to-Many
Access Methods: Sequential versus Direct
   Sequential Access
   Direct Access
An Overview of Methods for Combining SAS Data Sets
An Overview of Tools for Combining SAS Data Sets
   Tools for Processing Information in Groups
      BY-Group Processing
      MODIFY and BY
      Array Processing
   Choosing between UPDATE and MODIFY
Where to Go from Here

Overview

Many applications, including Decision Support and Executive Information Systems, require input data to be in a specific format before it can be processed to produce meaningful results. Even if all of your data are already in SAS data sets, the data to support these systems typically come from multiple sources and may be in different formats. Therefore, you often, if not always, have to take intermediate steps to logically relate and process the data before you can analyze them or create reports from them.

Application requirements vary, but there are common denominators for all applications that access, combine, and process data. Once you have determined what you want the output to look like, you must

□ discover how the input data are related
□ select the appropriate access method to process the input data
□ select the appropriate SAS tools to complete the task.

Data Relationships

Relationships among multiple sources of input data exist when the sources each contain common data, either at the physical or logical level. For example, employee data and department data could be related through an employee ID variable that shares common values. Another data set could contain numeric sequence numbers whose partial values logically relate it to a separate data set by observation number. Once data relationships exist, they fall into one of four categories:

□ one-to-one
□ one-to-many
□ many-to-one
□ many-to-many.

The categories are characterized by how observations relate among the data sets. All related data fall into one of these categories. You must be able to identify the existing relationships in your data since this knowledge is crucial to understanding how input data can be processed to produce desired results.

One-to-One

In a one-to-one relationship, typically a single observation in one data set is related to a single observation from another based on the values of one or more selected variables. A one-to-one relationship implies that each value of the selected variable occurs no more than once in each data set. When working with multiple selected variables, this relationship implies that each combination of values occurs no more than once in each data set.

Figure 1.1 One-to-One
Observations in SALARY and TAXES are related by common values for EMPNUM.
[Figure: data set SALARY with variables EMPNUM and SALARY; data set TAXES with variables EMPNUM and TAXBRCK.]

One-to-Many and Many-to-One

A one-to-many or many-to-one relationship between input data sets implies that one data set has at most one observation with a specific value of the selected variable, but the other input data set may have more than one occurrence of each value. When working with multiple selected variables, this relationship implies that each combination of values occurs no more than once in one data set but may occur more than once in the other data set. The order in which the input data sets are processed determines whether the relationship is one-to-many or many-to-one.

Figure 1.2 One-to-Many
Observations in ONE and TWO are related by common values for variable A. Values of A are unique in data set ONE but not in TWO.
[Figure: data set ONE with variables A, B, and C; data set TWO with variables A, E, and F.]

Figure 1.3 One-to-Many and Many-to-One
Observations in data sets ONE, TWO, and THREE are related by common values for variable ID. Values for ID are unique in ONE and THREE but not in TWO. For values 2 and 3 of ID, a one-to-many relationship exists between observations in data sets ONE and TWO and a many-to-one relationship exists between observations in data sets TWO and THREE.
[Figure: data sets ONE, TWO, and THREE, related by ID; the variables shown include SALES and QUOTA.]

Many-to-Many

The many-to-many category implies that multiple observations from each input data set may be related based on values of one or more common variables.

Figure 1.4 Many-to-Many
Observations in data sets BREAKDWN and MAINT are related by common values for variable VEHICLE. Values of VEHICLE are not unique in either data set. A many-to-many relationship exists between observations in these data sets for values AAA and CCC of VEHICLE.
[Figure: data sets BREAKDWN and MAINT, related by VEHICLE; MAINT includes the variable MNTDATE.]

Access Methods: Sequential versus Direct

Once you have established data relationships, the next step is to determine the best mode of data access to relate the data. You can access observations sequentially in the order in which they appear in the physical file. Or you can access them directly, that is, you can go straight to an observation in a SAS data set without having to process each observation that precedes it.

Sequential Access

The simplest and perhaps most common way to process data with a DATA step is to read observations in a data set sequentially. You can read observations sequentially using the SET, MERGE, UPDATE, or MODIFY statements.

Direct Access

You can access observations directly by one of two methods:

□ by an observation number
□ by the value of one or more variables through a simple or composite index.

To access observations directly by their observation number, use the POINT= option with the SET or MODIFY statement. The POINT= option names a variable whose current value determines which observation a SET or MODIFY statement reads.
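
As a minimal sketch of direct access by observation number, the following DATA step reads only observations 2 and 4. The data set SCORES, its variables, and the variable name OBSNUM are hypothetical and are used only for illustration.

```sas
/* Hypothetical sample data, for illustration only */
data scores;
   input id score;
   datalines;
1 90
2 85
3 70
4 60
5 95
;
run;

data picked;
   do obsnum = 2, 4;             /* observation numbers to fetch           */
      set scores point=obsnum;   /* direct access by observation number    */
      output;
   end;
   stop;                         /* POINT= never detects end of file, so   */
                                 /* end the step explicitly                */
run;
```

The STOP statement matters here: a SET statement with POINT= does not detect the end of the file, so without STOP the step could loop continuously. The POINT= variable is temporary and is not written to the output data set.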

To access observations directly based on the values of one or more specified variables, you must first create an index for the variables and then read the data set using the KEY= option with the SET or MODIFY statement. An index is a separate structure that contains the data values of the key variable or variables paired with a location identifier for the observations containing the value.
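
The following sketch shows one way direct access through an index might look. The data sets LOOKUP and ORDERS and all of their variables are hypothetical. The INDEX= data set option creates a simple index on PARTNO when LOOKUP is created, and the second SET statement then uses KEY= to read LOOKUP directly.

```sas
/* Hypothetical lookup data set with a simple index on PARTNO */
data lookup (index=(partno));
   input partno desc $;
   datalines;
101 bolt
102 nut
103 washer
;
run;

/* Hypothetical driving data set */
data orders;
   input partno qty;
   datalines;
103 5
101 2
;
run;

data filled;
   set orders;                 /* sequential read of the driving data set  */
   set lookup key=partno;      /* direct read of LOOKUP through the index  */
   if _iorc_ ne 0 then do;     /* no matching key value was found          */
      desc = ' ';              /* do not carry over a value from a         */
                               /* previous match                           */
      _error_ = 0;             /* clear the error flag set by the failed   */
                               /* lookup                                   */
   end;
run;
```

Testing _IORC_ after the keyed SET statement is one way to handle a key value that has no match. Error checking with KEY= is the subject of the Appendix.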

An Overview of Methods for Combining SAS Data Sets

You can use these methods to combine SAS data sets:

□ concatenating
□ interleaving
□ one-to-one reading
□ one-to-one merging
□ match merging
□ updating.

Figures 1.5 through 1.9 show basic illustrations of all of these methods for combining SAS data sets.

Figure 1.5 Concatenating SAS Data Sets
Concatenating appends the observations from one data set to another data set. DATA1 is read sequentially until all observations have been processed. Likewise, data sets in the SET statement are processed sequentially in the order in which they are listed.
[Figure: DATA1 + DATA2 = ALL; each data set contains the variable YEAR.]

Figure 1.6 Interleaving
Interleaving intersperses observations from two or more data sets, based on one or more common variables.
[Figure: DATA1 + DATA2 = ALL; each data set contains the variable YEAR.]

   data all;
      set data1 data2;
      by year;
   run;
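
To contrast concatenating with interleaving, the following sketch runs both methods on the same input. The values in DATA1 and DATA2 and the output names CONCAT and INTER are hypothetical. The only difference between the two steps is the BY statement.

```sas
/* Hypothetical input data, each already sorted by YEAR */
data data1;
   input year;
   datalines;
1991
1993
1995
;
run;

data data2;
   input year;
   datalines;
1992
1994
;
run;

/* Concatenating: all of DATA1, then all of DATA2 */
data concat;
   set data1 data2;
run;

/* Interleaving: the same SET statement plus a BY statement */
data inter;
   set data1 data2;
   by year;
run;
```

With these values, CONCAT holds the years in the order 1991, 1993, 1995, 1992, 1994, while INTER holds them in ascending order from 1991 through 1995. Interleaving relies on each input data set being sorted or indexed on the BY variable.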

Figure 1.7 One-to-One Reading or Merging
One-to-one reading combines observations from two or more data sets by creating observations that contain all of the variables from each contributing data set. Observations are combined based on their relative position in each data set, that is, the first observation in one data set with the first in the other, and so on. The DATA step stops after it has read the last observation from the smallest data set.

One-to-one merging is the same as a one-to-one reading, with two exceptions: you use the MERGE statement instead of multiple SET statements, and the DATA step reads all observations from all data sets.
[Figure: DATA1 (variable VARX) + DATA2 (variable VARY) = ALL (variables VARX and VARY).]

   data all;
      merge data1 data2;
   run;
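
The difference between one-to-one reading and one-to-one merging shows up when the input data sets have different numbers of observations. The following sketch uses hypothetical values and hypothetical output names (READ11 and MERGE11) to illustrate it.

```sas
/* Hypothetical input data: DATA1 has three observations, DATA2 has two */
data data1;
   input varx $;
   datalines;
X1
X2
X3
;
run;

data data2;
   input vary $;
   datalines;
Y1
Y2
;
run;

/* One-to-one reading: the step stops when the smaller data set ends */
data read11;
   set data1;
   set data2;
run;

/* One-to-one merging: the step reads all observations from both */
data merge11;
   merge data1 data2;
run;
```

READ11 contains two observations. MERGE11 contains three, and VARY is missing in the third because DATA2 has no third observation to contribute.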

Figure 1.8 Match Merging
Match-merging combines observations from two or more data sets into a single observation in a new data set based on the values of one or more common variables.
[Figure: DATA1 (variables YEAR and VARX) + DATA2 (variables YEAR and VARY) = ALL (variables YEAR, VARX, and VARY).]
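
A match-merge can be sketched as follows. The data values and the variables FROM1 and FROM2 are hypothetical. The IN= data set option is used here only to show which input data set contributed to each output observation.

```sas
/* Hypothetical input data, each already sorted by YEAR */
data data1;
   input year varx $;
   datalines;
1991 X1
1992 X2
1993 X3
;
run;

data data2;
   input year vary $;
   datalines;
1991 Y1
1993 Y3
1994 Y4
;
run;

data all;
   merge data1 (in=in1) data2 (in=in2);
   by year;          /* match observations on the common variable     */
   from1 = in1;      /* IN= variables are temporary, so copy them to  */
   from2 = in2;      /* keep them in the output data set              */
run;
```

With these values, ALL contains one observation for each year from 1991 through 1994. VARY is missing for 1992 and VARX is missing for 1994, because only one data set contributed to those BY groups.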

Figure 1.9 Updating
Updating uses information from observations in a transaction data set to delete, add, or alter information in observations in a master data set.

Note that MASTER and TRANS are both sorted by YEAR. Updating a data set requires that the data be sorted or indexed on the common variable. You can update a master data set by using the UPDATE statement or the MODIFY statement.

Note also that UPDATE and MODIFY do not replace nonmissing values in a master data set with missing values in a transaction data set.
[Figure: MASTER (variables YEAR, VARX, and VARY) + TRANS (variables YEAR, VARX, and VARY) = updated MASTER.]

   data master;
      update master trans;
      by year;
   run;
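
The following sketch illustrates how UPDATE treats missing values in the transaction data set. The data values and the output name NEWMAST are hypothetical. The result is written to a new data set so that you can compare it with the original MASTER.

```sas
/* Hypothetical master data set, sorted by YEAR */
data master;
   input year varx $ vary $;
   datalines;
1991 X1 Y1
1992 X1 Y1
1993 X1 Y1
;
run;

/* Hypothetical transactions; the period is read as a missing VARY */
data trans;
   input year varx $ vary $;
   datalines;
1992 X2 .
1994 X2 Y2
;
run;

data newmast;
   update master trans;
   by year;
run;
```

With these values, the 1992 observation in NEWMAST has VARX=X2 and keeps VARY=Y1, because the missing transaction value does not replace the nonmissing master value. The 1994 transaction has no matching master observation, so it is added as a new observation.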

An Overview of Tools for Combining SAS Data Sets

Once you understand the basics of establishing relationships among data, the ways to access data, and the ways you can combine SAS data sets, you can choose from a variety of SAS tools for accessing, combining, and processing your data. Table 1.1 lists and briefly describes the primary tools featured in this book.

Table 1.1 Tools for Combining SAS Data Sets

Statement or Proc | Action | Sequential Access | Direct Access | Can Use with BY Statement | Comments
SET | reads an observation from one or more SAS data sets. | X | X | X | Use KEY= or POINT= for directly accessing data.
MERGE | reads observations from two or more SAS data sets and joins them into single observations. | X | | X | When using MERGE with BY, the data must be sorted or indexed on the BY variable.
UPDATE | applies transactions to observations in a master SAS data set. UPDATE does not update observations in place; it produces an updated copy of the current data set. | X | | X | Both the master and transaction data sets must be sorted by or indexed on the BY variable.
MODIFY | manipulates observations in a SAS data set in place. (Contrast with UPDATE.) | X | X | X | Sorted and indexed data are not required for direct access or use with BY, but are recommended for performance.
PROC SQL* | reads an observation from one or more SAS data sets; reads observations from up to 16 SAS data sets and joins them into single observations; manipulates observations in a SAS data set in place. | X | X | X | All three access methods are available in PROC SQL, but the access method is chosen by the internal optimizer.
BY | controls the operation of a SET, MERGE, UPDATE, or MODIFY statement in the DATA step and sets up special grouping variables. | NA | NA | NA | BY-group processing is a means of processing observations that have the same values of one or more variables.
_IORC_** | an automatic variable created when you use the MODIFY statement or when you use the SET statement with the KEY= option. | NA | NA | NA | The value of this variable is a numeric return code that indicates the status of the most recent I/O operation that used MODIFY or KEY=.
SYSRC** | an autocall macro that you use in conjunction with _IORC_ to test for specific I/O conditions. | NA | NA | NA |

* PROC SQL is the SAS System implementation of Structured Query Language. In addition to expected SQL capabilities, PROC SQL includes additional capabilities specific to SAS such as the use of formats and SAS macro language.

** _IORC_ and SYSRC are documented in detail in the Appendix.

Tools for Processing Information in Groups

BY-Group Processing

When combining SAS data sets, it is often convenient to process observations in BY groups, that is, groups of observations that have the same value for one or more selected variables. Many examples in this book use BY-group processing with one or more SAS data sets to create a new data set.

The BY statement identifies one or more BY variables. When using the BY statement with the SET, MERGE, or UPDATE statement, your data must be sorted or indexed on the BY variable or variables.

In a DATA step, the SAS System identifies the beginning and end of each BY group by creating two temporary variables for each BY variable: FIRST.variable and LAST.variable. These variables are set to 1 if true and 0 if false to indicate if that observation is the first or last in the current BY group. Using programming logic, you can test FIRST.variable and LAST.variable to determine if the current observation is the first, last, or both first and last in the current BY group. Testing the values of these variables in conditional processing lets you perform certain operations at the beginning or end of a BY group.
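
As a minimal sketch of testing FIRST.variable and LAST.variable, the following step computes one total per BY group. The data set SALES and its variables are hypothetical.

```sas
/* Hypothetical input data, already sorted by REGION */
data sales;
   input region $ amount;
   datalines;
East 10
East 20
West 5
West 15
West 30
;
run;

data totals;
   set sales;
   by region;
   if first.region then total = 0;   /* reset at the start of each BY group */
   total + amount;                   /* sum statement retains TOTAL         */
   if last.region then output;       /* write one observation per BY group  */
   drop amount;
run;
```

With these values, TOTALS contains two observations: East with a total of 30 and West with a total of 50.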

MODIFY and BY

Internally the MODIFY statement handles BY-group processing differently from the SET, MERGE, and UPDATE statements. MODIFY creates a dynamic WHERE clause, making it possible for you to use BY without either sorting or indexing your data first. However, processing based on FIRST.variables and LAST.variables can result in multiple BY groups for the same BY values if your data are not sorted. You may not, therefore, get the expected results unless you use sorted data. And even though sorting is not required, it is often useful for improved performance.

Array Processing

When you want to process several variables in the same way, use array processing. Processing variables in arrays can save you time and simplify your code. Use an ARRAY statement to define a temporary grouping of variables as an array. Then use a DO loop to perform a task repetitively on all or selected elements in the array.
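
For illustration, the following sketch applies the same conversion to three variables by using an array and a DO loop. The data set TEMPS, its variables, and the Fahrenheit-to-Celsius task are hypothetical.

```sas
/* Hypothetical input data: three Fahrenheit readings per city */
data temps;
   input city $ t1-t3;
   datalines;
Cary 32 50 68
Rome 41 59 77
;
run;

data celsius;
   set temps;
   array t{3} t1-t3;                 /* temporary grouping of T1, T2, T3    */
   do i = 1 to 3;
      t{i} = (t{i} - 32) * 5 / 9;    /* same operation on every element     */
   end;
   drop i;                           /* the loop index is not needed        */
run;
```

With these values, the first observation in CELSIUS has the readings 0, 10, and 20, and the second has 5, 15, and 25.
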
Choosing between UPDATE and MODIFY

You can use either the UPDATE or MODIFY statement to update a master data set with information in a transaction data set. The UPDATE statement is a more familiar tool. Its only application is to update a master data set. MODIFY, a newer and more powerful tool, has many more applications. You can use the MODIFY statement to

□ process a file sequentially to apply updates in place (without a BY statement)
□ make changes to a master data set in place by applying transactions from a transaction data set
□ update the values of variables by directly accessing observations based on observation numbers
□ update the values of variables by directly accessing observations based on the values of one or more key variables.

Only one application of MODIFY is comparable to UPDATE: using MODIFY with the BY statement to apply transactions to a data set. While MODIFY is a more powerful tool than UPDATE, UPDATE is still the tool of choice in some cases. Table 1.2 helps you choose whether to use UPDATE or MODIFY with BY.


Table 1.2  UPDATE versus MODIFY with BY

Issue                MODIFY with BY                       UPDATE

Disk space           saves disk space because it          requires more disk space because
                     updates data in place                it produces an updated copy of
                                                          the data set

Sort and index       for good performance, it is          requires only that both data
                     strongly recommended that both       sets be sorted
                     data sets be sorted and that the
                     master data set be indexed

When to use          use only when you expect to          use if you expect to need to
                     process a SMALL portion of the       process most of the data set
                     data set

Duplicate BY-values  allows duplicate BY-values in        allows duplicate BY-values in
                     both the master and transaction      only the transaction data set
                     data sets

Scope of changes     cannot change the data set           can make changes that require a
                     descriptor information, so           change in the descriptor portion
                     changes such as adding or            of a data set, such as adding
                     deleting variables or variable       new variables, etc.
                     labels, etc., are not valid

Error-checking       has new error-checking               needs no error checking because
                     capabilities using the _IORC_        transactions without a
                     automatic variable and the SYSRC     corresponding master record are
                     autocall macro                       not applied but are added to the
                                                          data set

Data set integrity   data may only be partially           no data loss occurs because
                     updated due to an abnormal task      UPDATE works on a copy of the
                     termination                          data
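
For comparison with MODIFY, here is a minimal sketch of the UPDATE statement.
The data sets MASTER and TRANS and their values are hypothetical; both are
entered in ID order because UPDATE requires sorted input.

```sas
data master;
   input id qty;
   datalines;
1 10
2 20
;
run;

data trans;
   input id qty;
   datalines;
2 25
3 5
;
run;

/* UPDATE writes a new copy of MASTER. The transaction for ID 3 */
/* has no master record, so it is added as a new observation.   */
data master;
   update master trans;
   by id;
run;
```

The same transaction for ID 3 would need explicit _IORC_ handling in a
MODIFY step, which is the trade-off the error-checking row of the table
describes.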

Where to Go from Here
The following sources contain more complete explanations of topics covered
briefly in this chapter:

□ Array processing and the ARRAY statement. For a complete
  discussion, see pp. 160-171 in SAS Language: Reference, Version 6, First
  Edition. For a complete description of the ARRAY statement, see
  Chapter 9, "SAS Language Statements," in SAS Language: Reference,
  Version 6, First Edition. For a basic explanation and simple examples, see
  Chapter 12, "Finding Shortcuts in Programming," in SAS Language and
  Procedures: Usage, Version 6, First Edition. For more extensive
  examples, see Chapters 7, 9, and 10 in SAS Language and Procedures:
  Usage 2, Version 6, First Edition.
□ BY-group processing and the BY statement. For a complete discussion,
  see pp. 131-136 in SAS Language: Reference, Version 6, First Edition.
  For a complete description of the BY statement, see Chapter 9, "SAS
  Language Statements," in SAS Language: Reference, Version 6, First
  Edition.
□ Combining SAS Data Sets. For a complete description and examples of
  concatenating, interleaving, one-to-one reading, one-to-one merging,
  match-merging, and updating, see pp. 137-160 in SAS Language:
  Reference, Version 6, First Edition. For more examples, see Part 4,
  "Combining SAS Data Sets," in SAS Language and Procedures: Usage,
  Version 6, First Edition.
□ Creating an index for a SAS data set. For a discussion of indexes and
  how to create them, see pp. 217-225 in SAS Language: Reference,
  Version 6, First Edition. Also see the description of the INDEX= option in
  SAS Technical Report P-242, SAS Software: Changes and Enhancements,
  Release 6.08, pp. 31-32.
□ _IORC_ automatic variable and SYSRC autocall macro. These
  error-checking tools were originally documented in SAS Technical Report
  P-222, Changes and Enhancements to Base SAS Software, Release 6.07.
  Both detailed descriptions and examples are in the appendix in this book.
  Also see Jacobs III, Charles A. (1992), "DATA Step Programming Using
  the MODIFY Statement," Observations, 2 (1), 4-11.
□ MERGE statement. For complete reference documentation, see
  Chapter 9, "SAS Language Statements," in SAS Language: Reference,
  Version 6, First Edition.
□ MODIFY statement. For complete reference documentation, see pages
  1-10 in SAS Technical Report P-242, SAS Software: Changes and
  Enhancements, Release 6.08. Also see Jacobs III, Charles A. (1992),
  "DATA Step Programming Using the MODIFY Statement,"
  Observations, 2 (1), 4-11.
□ PROC SQL procedure. If you are unfamiliar with Structured Query
  Language, see Getting Started with the SQL Procedure, Version 6, First
  Edition. For complete documentation on PROC SQL, see SAS Guide to the
  SQL Procedure: Usage and Reference, Version 6, First Edition.
□ SET statement. For complete reference documentation, see Chapter 9,
  "SAS Language Statements," in SAS Language: Reference, Version 6,
  First Edition. For information on the KEY= option, see p. 43 in SAS
  Technical Report P-222, Changes and Enhancements to Base SAS
  Software, Release 6.07. The UNIQUE option is described in SAS
  Technical Report P-242, SAS Software: Changes and Enhancements,
  Release 6.08, p. 14.
□ UPDATE statement. For complete reference documentation, see
  Chapter 9, "SAS Language Statements," in SAS Language: Reference,
  Version 6, First Edition.

CHAPTER 2

Combining Single Observations with Single Observations

In a one-to-one relationship, typically a single observation in one data set is
related to a single observation from another based on the value of a chosen
variable. A one-to-one relationship implies that each value of this variable
occurs only once in each data set.

2.1 Merging Data Sets by a Common Variable, Specifying Their Origin, and
    Replacing Missing Values
2.2 Combining Observations When Variable Values Do Not Match Exactly
2.3 Combining Observations When There Is No Common Variable
2.4 Performing a Table Lookup When the Lookup Data Set Is Indexed
2.5 Performing a Table Lookup When the Lookup Data Set Is Not Indexed
2.6 Matching Observations Randomly
2.7 Combining Observations Based on a Calculation on Variables
    Contributed by Two Data Sets

Example 2.1  Merging Data Sets by a Common Variable,
             Specifying Their Origin, and Replacing
             Missing Values

Goal
Combine observations from two data sets based on a variable common to both.
To make the new data set more informative, create a new variable whose
values indicate the origin of each observation and replace the missing values
that result from the merge operation with meaningful values.

Strategy
Use the MERGE and BY statements to match-merge the observations from
two data sets. Use the IN= data set option to indicate which data sets contribute
to an observation. Use IF-THEN/ELSE logic to specify the origin of the
observation and to handle missing values that result from the merge operation.
This task requires that each data set either have an index on the BY variable or
be sorted by the values of the BY variable.

Input Data Sets
Both ONE and TWO are sorted by ID.

               ONE                                  TWO

OBS  ID   NAME     DEPT  PROJECT      OBS  ID   NAME     PROJHRS
  1  000  Miguel   A12   Document       1  111  Fred          35
  2  111  Fred     B45   Survey         2  222  Diana         40
  3  222  Diana    B45   Document       3  777  Steve          0
  4  888  Monique  A12   Document       4  888  Monique       37
  5  999  Vien     D03   Survey         5  999  Vien          42
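
To experiment with this example, the two input data sets could be created
with DATA steps such as the following. This is only a sketch: the book does
not show how ONE and TWO were built, and treating ID as a character variable
(so that the leading zeros in 000 are kept) is an assumption.

```sas
/* Sketch of data set ONE; values copied from the table above */
data one;
   input id $ name $ dept $ project $;
   datalines;
000 Miguel A12 Document
111 Fred B45 Survey
222 Diana B45 Document
888 Monique A12 Document
999 Vien D03 Survey
;
run;

/* Sketch of data set TWO; values copied from the table above */
data two;
   input id $ name $ projhrs;
   datalines;
111 Fred 35
222 Diana 40
777 Steve 0
888 Monique 37
999 Vien 42
;
run;
```

Both data sets are entered in ID order, so no PROC SORT step is needed
before the match-merge.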

Resulting Data Set

Output 2.1  COMBINE Data Set

                          COMBINE

OBS  ORIGIN  ID   NAME     DEPT  PROJECT   PROJHRS
  1  one     000  Miguel   A12   Document        0
  2  both    111  Fred     B45   Survey         35
  3  both    222  Diana    B45   Document       40
  4  two     777  Steve    NEW   NONE            0
  5  both    888  Monique  A12   Document       37
  6  both    999  Vien     D03   Survey         42

Program
The objective is to create a single data set that matches each individual with
the correct departmental and project information based on corresponding ID
values, to add a new variable indicating the origin of that information, and to
add meaningful information where values are missing. The program
match-merges the data sets ONE and TWO to create the data set COMBINE
with the variables ID, NAME, DEPT, PROJECT, PROJHRS, and the new
variable ORIGIN. Use the IN= data set option to determine which input data
set contributes to the observation output to COMBINE. Use IF-THEN/ELSE
logic to specify the values for ORIGIN. Use IF-THEN logic to supply
meaningful values in the place of missing values that resulted from the merge.

Create COMBINE by merging observations from ONE and TWO based on the
matching values for ID. IN= creates IN1, which is set to 1 when ONE
contributes an observation, and IN2, which is set to 1 when TWO contributes
an observation. ID is the BY variable.

   data combine;
      length origin $ 4;
      merge one(in=in1) two(in=in2);
      by id;

Assign values to ORIGIN according to the specified conditions. Set the value
of ORIGIN to indicate whether the current observation to be output to
COMBINE received a contribution from data set ONE, data set TWO, or both
data sets.

      if in1 and in2 then origin='both';
      else if in1 then origin='one';
      else origin='two';

Replace missing values with more meaningful values.

      if dept=' ' then dept='NEW';
      if project=' ' then project='NONE';
      if projhrs=. then projhrs=0;
   run;

Example 2.2  Combining Observations When Variable
             Values Do Not Match Exactly

Goal
Perform a fuzzy merge by merging observations from two data sets based on
data values that do not exactly match.

Strategy
Sort each data set by the variable you are comparing. Read an observation from
each one, then compare the values of the appropriate variable. (Remember to
rename variables common to both data sets so that values from one data set do
not overwrite values from the other.) If the difference between the compared
values is within an acceptable range, write an observation containing values
from both data sets. If it isn't within an acceptable range, test to see which of
the two observations should come first. Write to the data set an observation
that contains those values, setting the values from the other input data set to
missing to indicate that no appropriate match was found. Then read another
observation from the data set that contributed the values you've just written to
the output data set, and test again to see if you have a close match.

Because you need to read from one data set, from a second data set, or from
both, based on the result of a comparison, there are three different points in the
code from which you may need to execute a SET statement to read an
observation. To simplify the code, put the SET statement in a group of
statements following a label and use the LINK statement at each point in the
program where a read should occur to branch execution to the appropriate SET
statement.
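
The branching mechanism described above can be tried in isolation with a
sketch like this one. The label SHOW and the variable X are hypothetical;
the step reads no data and only writes to the SAS log.

```sas
data _null_;
   do x=1 to 3;
      link show;   /* branch to the labeled group, then come back */
   end;
   return;         /* keeps the main flow from falling into SHOW  */
show:
   put 'x=' x;
   return;         /* resume at the statement after the LINK      */
run;
```

The first RETURN matters: without it, execution would continue past the loop
into the labeled group one extra time.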

In each labeled group, precede the SET statement with an IF/THEN
statement that prevents SET from attempting to read past the end of a data set.
Otherwise, the DATA step might automatically end before all observations are
processed from both data sets.

Use the END= variable to determine when you've read the last observation in
a data set. Create another variable to indicate that an observation has been read
and processed. Test the value of that variable for each data set so that you can
end the DATA step only after the last observation has been read and processed
from each input data set.

Using the SQL procedure, you can perform the same task with less code. See
"Related Technique."

Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with your
data.

Input Data Sets
Both ONE and TWO must be sorted by TIME.

             ONE                              TWO

OBS  TIME              SAMPLE     OBS  TIME              SAMPLE
  1  23NOV94:09:01:00     100       1  23NOV94:09:00:00     200
  2  23NOV94:10:03:00     101       2  23NOV94:09:59:00     201
  3  23NOV94:10:58:00     102       3  23NOV94:11:04:00     202
  4  23NOV94:11:59:00     103       4  23NOV94:12:02:00     203
  5  23NOV94:13:00:00     104       5  23NOV94:14:01:00     204
  6  23NOV94:14:02:00     105       6  23NOV94:14:59:00     205
  7  23NOV94:16:00:00     106       7  23NOV94:15:59:00     206
                                    8  23NOV94:16:59:00     207
                                    9  23NOV94:18:00:00     208
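
The input data sets for this example could be built with steps like the
following sketch. The book does not show how ONE and TWO were created; the
DATETIME16. informat and format are assumptions, the TIME values for TWO that
are illegible in the table are taken from the output listings, and the
two-digit year is assumed to resolve to 1994 under the session's YEARCUTOFF=
setting.

```sas
/* Sketch of data set ONE, already in TIME order */
data one;
   input time datetime16. sample;
   format time datetime16.;
   datalines;
23NOV94:09:01:00 100
23NOV94:10:03:00 101
23NOV94:10:58:00 102
23NOV94:11:59:00 103
23NOV94:13:00:00 104
23NOV94:14:02:00 105
23NOV94:16:00:00 106
;
run;

/* Sketch of data set TWO, already in TIME order */
data two;
   input time datetime16. sample;
   format time datetime16.;
   datalines;
23NOV94:09:00:00 200
23NOV94:09:59:00 201
23NOV94:11:04:00 202
23NOV94:12:02:00 203
23NOV94:14:01:00 204
23NOV94:14:59:00 205
23NOV94:15:59:00 206
23NOV94:16:59:00 207
23NOV94:18:00:00 208
;
run;
```

Datetime values are stored as seconds, which is why the program later
compares differences against 300 for a five-minute window.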

Resulting Data Sets

Output 2.2a  MATCH1 Data Set
MATCH1 was created with the DATA step.

                       MATCH1

OBS  TIME1          TIME2          SAMPLE1  SAMPLE2
  1  23NOV94:09:01  23NOV94:09:00      100      200
  2  23NOV94:10:03  23NOV94:09:59      101      201
  3  23NOV94:10:58              .      102        .
  4              .  23NOV94:11:04        .      202
  5  23NOV94:11:59  23NOV94:12:02      103      203
  6  23NOV94:13:00              .      104        .
  7  23NOV94:14:02  23NOV94:14:01      105      204
  8              .  23NOV94:14:59        .      205
  9  23NOV94:16:00  23NOV94:15:59      106      206
 10              .  23NOV94:16:59        .      207
 11              .  23NOV94:18:00        .      208

Output 2.2b  MATCH2 Data Set
MATCH2 was created with PROC SQL.

                       MATCH2

OBS  TIME1          SAMPLE1  TIME2          SAMPLE2
  1  23NOV94:09:01      100  23NOV94:09:00      200
  2  23NOV94:10:03      101  23NOV94:09:59      201
  3  23NOV94:11:59      103  23NOV94:12:02      203
  4  23NOV94:14:02      105  23NOV94:14:01      204
  5  23NOV94:16:00      106  23NOV94:15:59      206
  6              .        .  23NOV94:11:04      202
  7              .        .  23NOV94:14:59      205
  8              .        .  23NOV94:16:59      207
  9  23NOV94:10:58      102              .        .
 10  23NOV94:13:00      104              .        .
 11              .        .  23NOV94:18:00      208

Program
The objective is to combine observations from data sets ONE and TWO when
the values of the variable TIME from both data sets are within five minutes of
each other. First, sort both data sets by TIME. Rename the variables TIME and
SAMPLE so that values do not overlay each other in the program data vector
when they are read from both data sets. Read an observation from each data
set, and write an observation to the MATCH1 data set if the values of TIME
meet the criteria. Then read again from each data set and continue comparing
values and writing observations when appropriate.

When the TIME values are not within five minutes of each other, test to
determine which is earliest. Since the data are sorted by TIME, you know you
won't find a closer match later in the other data set, so write an observation
that contains the earliest TIME value and its associated SAMPLE value along
with missing values to represent the other set of TIME and SAMPLE values.
Then read again from the data set that contributed the earliest value, and again
test to see if the match is close enough.

Prevent the DATA step from automatically ending when it reaches the end of
the smallest data set by using an IF/THEN statement to test the value of a
variable that indicates when the last observation is read. Because you have to
prevent the SET statement from reading past the end of a data set and because
you may need to read a new observation from data set ONE or TWO, or both
data sets from multiple points in the program, you can place these statements
in a labeled group and branch to it as appropriate:

Create MATCH1. Link execution to a group of statements that read an
observation from ONE and then to another group that reads from TWO. Both
groups prevent the DATA step from stopping before reaching the end of a data
set.

   data match1 (keep = time1 time2 sample1 sample2);
      link getone;
      link gettwo;

Format the datetime variables. Set to 0 the two variables that will be used to
indicate that the last observation from data set ONE or TWO has been both
read and processed. ONEDONE and TWODONE are not END= variables. The
END= variables indicate only that the last observation has been read. These
variables are set to 1 after the last observation has been processed.

      format time1 time2 datetime13.;
      onedone=0; twodone=0;

Check the value of TEMPT1 against TEMPT2. If there is less than a 5-minute
(300-second) difference between them, assign the values of these "temp"
variables to the variables that you want to write to the output data set, and
then write an observation. Execute the LINK statements to read a new
observation from ONE and from TWO. The ABS function returns the difference
between TEMPT1 and TEMPT2 as a nonnegative value (regardless of which
value is larger) so that you can compare it with the limit. Because this DO
WHILE condition will always be true, this DATA step must be explicitly
stopped later when processing is complete.

      do while (1=1);
         if abs(tempt1-tempt2) < 300 then
            do;
               time1=tempt1;
               time2=tempt2;
               sample1=temps1;
               sample2=temps2;
               output;
               link getone;
               link gettwo;
            end;

If the difference between TEMPT1 and TEMPT2 is five minutes or more, test for
further conditions. If the conditions are met, write an observation that
contains the actual values from TWO but missing values from ONE. If the time
value from ONE is greater than the time value from TWO and if the program
has not already processed all observations from TWO, or if you have already
reached the end of ONE, you know that you are not going to find a match for
these values in data set ONE. So write an observation to MATCH1 that contains
actual values from TWO and missing values from ONE. Then link to statements
that read another observation from TWO so you can continue comparing.

         else if (tempt1 > tempt2 and twodone=0) or onedone then
            do;
               time1=.;
               time2=tempt2;
               sample1=.;
               sample2=temps2;
               output;
               link gettwo;
            end;

If conditions have not been met in the previous IF-THEN or ELSE-IF/THEN
statements, test for further conditions. If the conditions are met, write an
observation that contains the actual values from ONE but missing values from
TWO. This code segment uses the same logic as the previous one but writes
values from ONE and links to statements that read another observation from
ONE.

         else if (tempt1 < tempt2 and onedone=0) or twodone then
            do;
               time1=tempt1;
               time2=.;
               sample1=temps1;
               sample2=.;
               output;
               link getone;
            end;

When you have processed all observations from both ONE and TWO, stop the
DATA step. Because the DO WHILE condition is always true, this DATA step
must be explicitly stopped by this STOP statement.

         if onedone and twodone then stop;
      end;   /* ends the DO WHILE loop */
      return;

If there are more observations in ONE, read another observation. If the last
observation has already been read, set ONEDONE to 1 to indicate that the last
observation was both read and processed and then prevent the SET statement
from executing and attempting to read past the end of data set ONE. This
strategy prevents the DATA step from ending automatically when there are no
more observations to read. The RETURN statement causes execution to return
to the statement that follows the LINK statement that branched execution to
this label. Rename variables TIME and SAMPLE so that their values are not
overwritten when variables of the same name are read from TWO. END=
creates LAST1, a variable that is set to 1 when the last observation is read
from ONE.

   getone: if last1 then
         do;
            onedone=1;
            return;
         end;
      set one (rename=(time=tempt1 sample=temps1)) end=last1;
      return;

If there are more observations in TWO, read another observation. If the last
observation has already been read, set TWODONE to 1 to indicate that the last
observation was both read and processed, and then prevent the SET statement
from executing and attempting to read past the end of data set TWO. This code
segment uses the same logic as the previous one but applies to data set TWO.

   gettwo: if last2 then
         do;
            twodone=1;
            return;
         end;
      set two (rename=(time=tempt2 sample=temps2)) end=last2;
      return;
   run;

A Closer Look: Stopping the DATA Step

You want to stop the DATA step after all observations have been processed
from both data sets, not after all observations have been read. The END=
variables, LAST1 and LAST2, are set to 1 after the last observation has been
read. But the statements that process each observation are in a different
location in the program from the SET statement and must operate conditionally
based on whether the last observation has been read and processed, not just
read. Otherwise, the last observation is read but never processed. To enable the
last observation to be processed, the variables ONEDONE and TWODONE
are created and are set to 1 only after the last observation in ONE and TWO,
respectively, has been processed. The final IF-THEN statement in the program
tests the variables ONEDONE and TWODONE and stops the DATA step
when the values of both indicate that processing is complete.

Related Technique
The following PROC SQL step uses considerably less code to produce the
same output as the DATA step, although the rows and columns are in a
different order in the resulting data set.

PROC SQL joins the tables to produce a new table,* MATCH2. Conceptually,
the join results in an internal table that matches every row in ONE with every
row in TWO. The ON clause subsets that internal table by those rows where
the time difference is five minutes or less.

This join is a full outer join, which returns rows that satisfy the condition in
the ON clause. In addition, a full outer join returns all of the rows from each
table that do not match with a row from the other table, based on the condition
in the ON clause. For example, for rows 202, 205, 207, and 208 in table
TWO, there are no rows in table ONE that they can match with that result in a
time differential of five minutes or less. Likewise, for rows 102 and 104 from
table ONE, there are no rows in table TWO that they can match with that
result in a time differential of five minutes or less.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are
  variables and rows are observations.

   proc sql;
      create table match2 as
         select *
            from one(rename=(time=time1 sample=sample1)) full join
                 two(rename=(time=time2 sample=sample2))
            on abs(time1-time2)<=5*60;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.
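
If the row and column order of the DATA step result is wanted from PROC SQL,
one possible variation is to list the columns explicitly and add an ORDER BY
clause. This is a sketch rather than a technique shown in this book: the
table name MATCH2S is hypothetical, it assumes the input tables ONE and TWO
from this example exist, and ordering by the first nonmissing time is only one
reasonable choice.

```sas
proc sql;
   create table match2s as
      /* List columns in the order used by MATCH1 */
      select time1, time2, sample1, sample2
         from one(rename=(time=time1 sample=sample1)) full join
              two(rename=(time=time2 sample=sample2))
         on abs(time1-time2)<=5*60
         /* COALESCE returns TIME1 when present, otherwise TIME2 */
         order by coalesce(time1,time2);
quit;
```

Sorting adds work for PROC SQL, so this form may be slower than the
unordered join on large tables.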

Example 2.3  Combining Observations When There Is No
             Common Variable

Goal
Combine observations based on some criteria, even when there is no common
variable in the two data sets.

Strategy
Use the looping action of the DATA step to access an observation from one
data set on each iteration while reading all observations from a second data set
to look for a match. To read the second data set, use the SET statement with
the POINT= and NOBS= options in a DO loop to access all observations
sequentially by observation number until a match is found. Then you can test a
condition for each one to determine whether combining the information from
the current observation of each data set is appropriate and write an observation
to a new data set when the condition is met. Optionally, you can write a note to
the SAS log when no match for a project is found.
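
The POINT= and NOBS= options can be tried on their own with a sketch such as
the following. The data set DEMO and its variables are hypothetical and are
not part of this example's input.

```sas
/* Small hypothetical data set to read by observation number */
data demo;
   input workid charge;
   datalines;
1234 944.80
2225 1280.94
;
run;

data _null_;
   /* NOBS= sets N to the number of observations at compile time */
   do i=1 to n;
      set demo point=i nobs=n;   /* read observation I directly */
      put i= workid= charge=;
   end;
   stop;   /* needed: with POINT= alone, end of file is never detected */
run;
```

In the example program no STOP statement is needed, because the other SET
statement in that step reaches the end of PROJECTS and ends the step.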

You can perform the same task with PROC SQL, with the exception of writing
a note to the log under a certain condition. See "Related Technique."

Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with the
data.

Input Data Sets
The PROJECTS and BILLS data sets have no common variable. The dates
associated with each project in PROJECTS do not overlap.

            PROJECTS                                  BILLS

OBS  STDATE    ENDDATE   PROJECT      OBS  WORKID  COMPDATE     CHARGE
  1  01/09/95  01/27/95  BASEMENT       1    1234  01/17/95    $944.80
  2  02/01/95  02/12/95  FRAME          2    2225  02/18/95  $1,280.94
  3  02/15/95  02/20/95  ROOFING        3    3879  03/04/95    $888.90
  4  02/22/95  02/28/95  PLUMB          4    8888  03/21/95  $2,280.87
  5  03/02/95  03/05/95  WIRE
  6  03/07/95  03/29/95  BRICK
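
The input data sets could be created along the following lines. This is a
sketch under several assumptions: the book does not show these steps, the
informats and formats are guesses that reproduce the displayed values, and the
second project name is illegible in the source, so FRAME is a placeholder.

```sas
/* Sketch of PROJECTS; the colon modifier applies an informat */
/* while still reading blank-delimited values                 */
data projects;
   input stdate :mmddyy8. enddate :mmddyy8. project $;
   format stdate enddate mmddyy8.;
   datalines;
01/09/95 01/27/95 BASEMENT
02/01/95 02/12/95 FRAME
02/15/95 02/20/95 ROOFING
02/22/95 02/28/95 PLUMB
03/02/95 03/05/95 WIRE
03/07/95 03/29/95 BRICK
;
run;

/* Sketch of BILLS; COMMA10. strips the dollar sign and commas */
data bills;
   input workid compdate :mmddyy8. charge :comma10.;
   format compdate mmddyy8. charge dollar10.2;
   datalines;
1234 01/17/95 $944.80
2225 02/18/95 $1,280.94
3879 03/04/95 $888.90
8888 03/21/95 $2,280.87
;
run;
```

Storing the dates as SAS date values rather than character strings is what
makes the range comparison in the program meaningful.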

Resulting Data Sets

Output 2.3a  COMBINE1 Data Set
COMBINE1 was created with the DATA step.

                              COMBINE1

OBS  PROJECT   STDATE    ENDDATE   WORKID  COMPDATE     CHARGE
  1  BASEMENT  01/09/95  01/27/95    1234  01/17/95    $944.80
  2  ROOFING   02/15/95  02/20/95    2225  02/18/95  $1,280.94
  3  WIRE      03/02/95  03/05/95    3879  03/04/95    $888.90
  4  BRICK     03/07/95  03/29/95    8888  03/21/95  $2,280.87

Output 2.3b  COMBINE2 Data Set
COMBINE2 was created with PROC SQL.

                              COMBINE2

OBS  PROJECT   STDATE    ENDDATE   WORKID  COMPDATE     CHARGE
  1  BASEMENT  01/09/95  01/27/95    1234  01/17/95    $944.80
  2  ROOFING   02/15/95  02/20/95    2225  02/18/95  $1,280.94
  3  WIRE      03/02/95  03/05/95    3879  03/04/95    $888.90
  4  BRICK     03/07/95  03/29/95    8888  03/21/95  $2,280.87

Program
The objective is to bill charges to the correct phase of a construction project by
creating a new data set that contains the appropriate information from
PROJECTS and BILLS. Read each observation in PROJECTS and compare
the values of the STDATE and ENDDATE variables to the value of
COMPDATE in each observation in BILLS. If the completion date falls within
the range of dates indicated by the project start and end date values, write an
observation to COMBINE1. Set FOUND to 1 and use that condition to stop the
DO UNTIL loop so that no more observations are read from BILLS after a
match is found.

Create COMBINE1. Read an observation from PROJECTS. Set FOUND back
to 0. FOUND will be used to stop the DO UNTIL loop after a match has been
found.

   data combine1(drop=found);
      set projects;
      found=0;

Read observations from BILLS until a match is found or until all observations
have been read. POINT= references a variable (I) whose value provides direct
access to each observation in BILLS by observation number. NOBS= assigns
the number of observations in BILLS to the variable N; the DO loop iterates
once for each observation in BILLS until a match is found.

      do i=1 to n until (found);
         set bills point=i nobs=n;

When the condition is met, set FOUND to 1 and write an observation to
COMBINE1. FOUND is set to 1 when a match is found. This condition stops
the DO UNTIL loop so no more observations are read from BILLS on this
iteration of the DATA step.

         if stdate <= compdate <= enddate then
            do;
               found=1;
               output;
            end;
      end;

If no observations match, write a note to the log.

      if not found then put 'No bills exist for: ' project
         'with start date ' stdate 'and enddate ' enddate +(-1) '.';
   run;

Related Technique
If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. You cannot, however, write a note
to the log under certain conditions as you can with the DATA step example.

PROC SQL joins the PROJECTS and BILLS tables to produce a new table,*
COMBINE2. Conceptually, the join results in an internal table that matches
every row in PROJECTS with every row in BILLS. Using that internal table,
the WHERE clause determines that only the rows that have a value of
COMPDATE that is between STDATE and ENDDATE will be in the resulting
table.

   proc sql;
      create table combine2 as
         select *
            from projects, bills
            where compdate between stdate and enddate;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are
  variables and rows are observations.

Example 2.4  Performing a Table Lookup When the
             Lookup Data Set Is Indexed

Goal
Combine two data sets using a table lookup technique that directly accesses the
lookup data set through an index on a key variable. This lookup technique is
appropriate for a large lookup data set.

Strategy
Perform a table lookup using an index to locate observations that have key
values equal to the current value of the key variable. Read from the primary
file sequentially. To read the lookup data set, use the SET statement with the
KEY= option to access the observations directly. Write all observations from
the primary data set to the output data set even when no match is found and
write a warning message to the SAS log. Before writing an observation, you
can calculate a value for a new variable based on values from a variable in
each data set. Use error-checking logic to direct execution to the appropriate
code path.

You can perform the same task with PROC SQL, with the exception of writing
a warning message to the SAS log when no match is found. See "Related
Technique."

Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with the
data.

Input Data Sets
EMPNUM is common to both the PRIMARY and LOOKUP data sets.
PRIMARY contains no consecutive duplicate values for EMPNUM.* Because
the program depends on directly accessing observations in LOOKUP by using
KEY=EMPNUM, LOOKUP must be indexed on EMPNUM.

        PRIMARY                      LOOKUP

OBS  EMPNUM   SALARY       OBS  EMPNUM  TAXBRCKT
  1    1234  $55,000         1    1111      0.18
  2    3333  $72,000         2    1234      0.28
  3    4876  $32,000         3    3333      0.32
  4    5489  $17,000         4    4222      0.18
                             5    4876      0.24

* This program works as expected only if PRIMARY contains no consecutive
  observations with the same value for EMPNUM. For an explanation of the
  behavior of SET with KEY= when duplicates exist, see SAS Technical Report
  P-242, SAS Software: Changes and Enhancements, Release 6.08, page 14.
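
One way to build these input data sets, including the index that KEY= needs,
is sketched below. The book does not show these steps; the informat and
format for SALARY and the use of the INDEX= data set option (rather than
PROC DATASETS or PROC SQL) are assumptions made for illustration.

```sas
/* Sketch of PRIMARY; COMMA8. strips the dollar sign and comma */
data primary;
   input empnum salary :comma8.;
   format salary dollar8.;
   datalines;
1234 $55,000
3333 $72,000
4876 $32,000
5489 $17,000
;
run;

/* INDEX= builds a simple index on EMPNUM while LOOKUP is created */
data lookup(index=(empnum));
   input empnum taxbrckt;
   datalines;
1111 0.18
1234 0.28
3333 0.32
4222 0.18
4876 0.24
;
run;
```

PROC CONTENTS on LOOKUP can be used to confirm that the index exists before
running the lookup program.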

Resulting Data Sets

Output 2.4a  FINAL1 Data Set
FINAL1 was created with the DATA step.

                    FINAL1

OBS  EMPNUM   SALARY  TAXBRCKT      NET
  1    1234  $55,000      0.28  $39,600
  2    3333  $72,000      0.32  $48,960
  3    4876  $32,000      0.24  $24,320
  4    5489  $17,000         .        .

Output 2.4b  FINAL2 Data Set
FINAL2 was created with PROC SQL.

                    FINAL2

OBS  EMPNUM   SALARY  TAXBRCKT      NET
  1    1234  $55,000      0.28  $39,600
  2    3333  $72,000      0.32  $48,960
  3    4876  $32,000      0.24  $24,320
  4    5489  $17,000         .        .

Program
The objective is to create a new data set that includes all of the information
from PRIMARY, only the corresponding descriptive information from
LOOKUP, and the values of a new calculated variable. The resulting data set,
FINAL1, contains the employee's number, salary, tax bracket, and net adjusted
income.

First, read an observation from PRIMARY. Then use the SET statement with
the KEY= option to read an observation from LOOKUP based on the current
value of EMPNUM. To verify whether a matching value in LOOKUP has been
located for the current value of EMPNUM in PRIMARY, use the %SYSRC
autocall macro and the _IORC_ automatic variable.* When a match is found,
calculate a value for NET based on the current values of SALARY from
PRIMARY and TAXBRCKT from LOOKUP. When no match is found, set
TAXBRCKT to missing and write a message to the SAS log.
Create FINAL1. Read an observation from PRIMARY.

   data final1;
      set primary;

Read an observation from LOOKUP based on the value of the key variable,
EMPNUM. The SET statement with KEY= accesses an observation directly through
the index, using the current value of EMPNUM.

      set lookup key=empnum;

When an observation from LOOKUP has been successfully located and retrieved,
calculate a value for NET. When the value of _IORC_ corresponds to _SOK, the
value of EMPNUM in the observation retrieved from LOOKUP matches the current
EMPNUM value from PRIMARY.*

      select(_iorc_);
         when (%sysrc(_sok))
            do;
               net=salary*(1-taxbrckt);
            end;

When no match is found, set TAXBRCKT to missing and write a warning message to
the SAS log. When the value of _IORC_ corresponds to _DSENOM, no observations
in LOOKUP contain the current value of EMPNUM. If you do not set TAXBRCKT
to missing when no match is found, the value from the observation most recently
retrieved from LOOKUP is written as part of the current observation. _ERROR_ is
reset to 0 to prevent an error condition that would write the contents of the
program data vector to the SAS log.

         when (%sysrc(_dsenom))
            do;
               taxbrckt=.;
               put 'WARNING: No tax information for empnum ' empnum;
               _error_=0;
            end;

In case of an unexpected _IORC_ condition, write an error message and stop
execution. When _IORC_ corresponds to anything other than _DSENOM or _SOK,
an unexpected condition has been met, so an error message is written to the SAS
log and the STOP statement executes to terminate the DATA step.

         otherwise
            do;
               put 'Unexpected ERROR: _IORC_ = ' _iorc_;
               stop;
            end;
      end;   /* ends the SELECT group */
   run;

* _IORC_ and %SYSRC are documented in detail in the Appendix.
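
For readers who want to check the lookup logic outside SAS, the following is a minimal Python sketch, not the book's method. It assumes a dictionary can stand in for the index on LOOKUP; the function name build_final and the warning text are illustrative only. The data values are copied from the PRIMARY and LOOKUP listings.

```python
# Illustrative stand-ins for the PRIMARY data set and the indexed LOOKUP data set.
primary = [(1234, 55000), (3333, 72000), (4876, 32000), (5489, 17000)]
lookup = {1111: 0.18, 1234: 0.28, 3333: 0.32, 4222: 0.18, 4876: 0.24}


def build_final(primary_rows, lookup_index):
    """Return (rows, warnings); rows are (empnum, salary, taxbrckt, net)."""
    rows, warnings = [], []
    for empnum, salary in primary_rows:
        taxbrckt = lookup_index.get(empnum)      # direct access by key
        if taxbrckt is not None:                 # analogous to the _SOK branch
            net = round(salary * (1 - taxbrckt), 2)
        else:                                    # analogous to the _DSENOM branch
            net = None                           # explicitly reset, nothing carried over
            warnings.append(f"WARNING: No tax information for empnum {empnum}")
        rows.append((empnum, salary, taxbrckt, net))
    return rows, warnings


final1, log = build_final(primary, lookup)
for row in final1:
    print(row)
```

Resetting the tax bracket to a missing value inside the no-match branch mirrors the reason the DATA step sets TAXBRCKT to missing: otherwise a value from an earlier match would leak into the current row.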

Related Technique If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. PROC SQL joins the tables to
produce a new table,* FINAL2. Conceptually, the join results in an internal
table that matches every row in PRIMARY with every row in LOOKUP. The
ON clause subsets that internal table to include only those rows for employees
who are in both tables.

This join is a left outer join, which returns rows that satisfy the condition in the
ON clause. In addition, a left outer join returns all of the rows from the left
table (first table listed in the FROM clause) that do not match with a row from
the right table (second table listed in the FROM clause). Thus, the resulting
table has a row for employee 5489, even though there is no row for 5489 in
the LOOKUP table.

   proc sql;
      create table final2 as
         select primary.empnum, primary.salary, taxbrckt,
                salary*(1-taxbrckt) as net format=dollar7.
            from primary left join lookup
            on primary.empnum=lookup.empnum;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
  are observations.
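
The conceptual description of the left outer join can be traced with a small Python sketch. This is an illustration under stated assumptions, not how PROC SQL is implemented: the helper name left_join is hypothetical, tables are plain lists of tuples, and None stands in for a SAS missing value.

```python
# Rows copied from the PRIMARY and LOOKUP listings: (empnum, value).
primary = [(1234, 55000), (3333, 72000), (4876, 32000), (5489, 17000)]
lookup = [(1111, 0.18), (1234, 0.28), (3333, 0.32), (4222, 0.18), (4876, 0.24)]


def left_join(left, right):
    """Pair each left row with every matching right row; keep unmatched left rows."""
    result = []
    for empnum, salary in left:
        # The ON condition: keep only right rows with the same EMPNUM.
        matches = [tax for r_empnum, tax in right if r_empnum == empnum]
        if matches:
            for tax in matches:
                result.append((empnum, salary, tax, round(salary * (1 - tax))))
        else:
            # A left outer join still returns the left row, with missing values.
            result.append((empnum, salary, None, None))
    return result


final2 = left_join(primary, lookup)
for row in final2:
    print(row)
```

Rows that exist only in the right table (1111 and 4222 here) never appear in the result, which is the difference between a left outer join and a full outer join.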
Example 2.5 Performing a Table Lookup When the
Lookup Data Set Is Not Indexed

Goal Subset the observations from one data set into one of two output data sets,
based on specified criteria.

Strategy Load into an array the data that will be used to determine in which subset an
observation belongs. Read the input data set sequentially, performing a lookup
into the array structure. Compare values in the current observation to the
appropriate values from the array to determine whether they fall within a
specified range. Then write the current observation to the appropriate output
data set.

Input Data Sets

BTEAM contains data on team members' height, weight, and body type. IDEAL
shows the ideal male weight for each height, based on one of three body types.
IDEAL is loaded into an array.

                       BTEAM

   OBS   LNAME       SEX   HEIGHT   WEIGHT   TYPE
     1   Adams        M      67      160      2
     2   Alexander    M      69      115      1
     3   Apple        M      69      139      1
     4   Arthur       F      66      125      2
     5   Avery        M      66      152      2
     6   Barefoot     M      68      158      2
     7   Baucom       M      70      170      3
     8   Blair        M      69      133      1
     9   Blalock      M      68      148      2
    10   Bostic       F      74      170      3

                       IDEAL

   OBS   HEIGHT   SMALL   MEDIUM   LARGE
     1     66      126     138      149
     2     67      130     141      154
     3     68      134     145      158
     4     69      138     149      162
     5     70      142     153      167
     6     71      146     157      172
     7     72      150     161      177
     8     73      154     165      181
     9     74      158     169      185
    10     75      162     173      189

Resulting Data Sets

Output 2.5a INSHAPE Data Set

                  INSHAPE

   OBS   HEIGHT   LNAME       WEIGHT   TYPE
    1      69     Apple        139      1
    2      70     Baucom       170      3
    3      69     Blair        133      1
    4      68     Blalock      148      2

Output 2.5b OUTSHAPE Data Set

                  OUTSHAPE

   OBS   HEIGHT   LNAME       WEIGHT   TYPE
    1      67     Adams        160      2
    2      69     Alexander    115      1
    3      66     Avery        152      2
    4      68     Barefoot     158      2

Program The objective is to create subsets from the BTEAM data set, based on whether
a male team member is considered to be in shape or out of shape. The IDEAL
data set contains three WEIGHT values for each HEIGHT, based on an ideal
male weight for each body TYPE. Those values are used to determine whether
an observation from the BTEAM data set should be written to the INSHAPE
or OUTSHAPE data set.

So that all of the values from IDEAL are available for comparing to the
WEIGHT value in each observation in BTEAM, load values from IDEAL into
a temporary array. A subsetting IF statement ensures that the only observations
processed are those for males with values for HEIGHT and WEIGHT that are
within a specified range. Use expressions to determine if a WEIGHT value is
within a range of five pounds above or below the ideal weight for that body
type. IF-THEN, ELSE, and OUTPUT statements write each observation to the
appropriate data set.

Create INSHAPE and OUTSHAPE.

   data inshape outshape;
      keep lname height weight type;

On the first DATA step iteration, load a two-dimensional temporary array from
the information in IDEAL. The DO loop reads each observation from IDEAL and
loads the WT array. The assignment statements assign weight values from IDEAL
to the correct array cells. There are three weight values for each height, one
for each of three frame sizes. Note that the bounds of the first dimension of
WT are 66 and 75, the smallest and largest HEIGHT values in inches.

      array wt(66:75,3) _temporary_;
      if _n_=1 then
         do i=1 to all;
            set ideal nobs=all;
            wt(height,1)=small;
            wt(height,2)=medium;
            wt(height,3)=large;
         end;

Read an observation from BTEAM.

      set bteam;

Determine whether a male qualifies as in shape or out of shape and write the
observation to INSHAPE or OUTSHAPE. The subsetting IF statement allows only
observations for males to be processed. The other conditions ensure that TYPE
and HEIGHT are valid values for this application. Otherwise, invalid values
might be used to locate values in the array, causing an error that would
terminate the DATA step. The IF-THEN statement writes to INSHAPE each
observation that meets the criteria. The ELSE statement writes all other
observations to OUTSHAPE.

      if sex='M' and 3 ge type ge 1 and 75 ge height ge 66;
      if wt(height,type)-5 le weight le wt(height,type)+5
         then output inshape;
      else output outshape;
   run;
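
The array lookup and the range test can be checked with a short Python sketch. It is offered only as an illustration under stated assumptions: a dictionary keyed by height replaces the two-dimensional temporary array, the function name split_team is hypothetical, and the data values are copied from the BTEAM and IDEAL listings.

```python
# IDEAL as a lookup table: height -> (small, medium, large) ideal weights.
ideal = {
    66: (126, 138, 149), 67: (130, 141, 154), 68: (134, 145, 158),
    69: (138, 149, 162), 70: (142, 153, 167), 71: (146, 157, 172),
    72: (150, 161, 177), 73: (154, 165, 181), 74: (158, 169, 185),
    75: (162, 173, 189),
}

# BTEAM rows: (lname, sex, height, weight, type).
bteam = [
    ("Adams", "M", 67, 160, 2), ("Alexander", "M", 69, 115, 1),
    ("Apple", "M", 69, 139, 1), ("Arthur", "F", 66, 125, 2),
    ("Avery", "M", 66, 152, 2), ("Barefoot", "M", 68, 158, 2),
    ("Baucom", "M", 70, 170, 3), ("Blair", "M", 69, 133, 1),
    ("Blalock", "M", 68, 148, 2), ("Bostic", "F", 74, 170, 3),
]


def split_team(team, ideal_weights, tolerance=5):
    """Return (inshape, outshape) lists of last names for valid male rows."""
    inshape, outshape = [], []
    for lname, sex, height, weight, body_type in team:
        # Same role as the subsetting IF: skip rows that cannot index the table.
        if sex != "M" or not 1 <= body_type <= 3 or height not in ideal_weights:
            continue
        target = ideal_weights[height][body_type - 1]
        if target - tolerance <= weight <= target + tolerance:
            inshape.append(lname)
        else:
            outshape.append(lname)
    return inshape, outshape


inshape, outshape = split_team(bteam, ideal)
print(inshape)
print(outshape)
```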
A Closer Look Processing the Two-Dimensional Array WT

To help you visualize processing in this example, Figure 2.5 represents the
two-dimensional array WT, beginning with the lower bound of 66. If you
compare it to the IDEAL data set, you can see how it was constructed.

Figure 2.5 Representation of Two-Dimensional Array WT
[Figure: grid with one column for each HEIGHT value, 66 through 75, and one
row for each TYPE value, 1 through 3]

This statement processes the array:

   if wt(height,type)-5 le weight le wt(height,type)+5
      then output inshape;

On the first iteration of the DATA step, the first observation from BTEAM is
processed:

   Adams   M   67   160   2

The cell in the array that is the intersection of column 67 (HEIGHT) and row 2
(TYPE) contains the weight value 141. The IF-THEN statement processes
these values:

   if (141-5) le 160 le (141+5)
      then output inshape;

Temporary Arrays
When elements in an array are constants that are only needed during the
duration of the DATA step, you can save execution time by using temporary
arrays instead of creating variables, as shown in this ARRAY statement:

   array wt(66:75,3) _temporary_;

In addition to saving execution time, temporary array elements differ from
variables in the following ways:

□ They are not written to the output data set.
□ They do not have names and can be referenced only by their array names
  and dimensions.
□ They are automatically retained, instead of reset to missing, at the
  beginning of the DATA step.
Where to Go from Here
□ Two-Dimensional Arrays. For a discussion, see pp. 165-169 in SAS
  Language: Reference, Version 6, First Edition. For examples, see
  Chapter 7, "Grouping Variables to Perform Repetitive Tasks Easily," in
  SAS Language and Procedures: Usage 2, Version 6, First Edition and
  Example 14, "Expense Report," in SAS Guide to Report Writing:
  Examples, Version 6, First Edition.

□ Temporary Array Elements. For an explanation and an example, see
  pp. 129-131 in SAS Language and Procedures: Usage 2, Version 6, First
  Edition. For a short example, see pp. 170-171 in SAS Language:
  Reference, Version 6, First Edition.

Example 2.6 Matching Observations Randomly

Goal Randomly pair observations from transaction and master data sets until a good
match is found. Create a new data set containing the result of the match.
Update the value of a variable in the master data set appropriately.

Strategy Sequentially process observations from the transaction data set. Access the
master data set directly by observation number, using the MODIFY statement
with the POINT= and NOBS= options; MODIFY allows the data set to be
updated in place. Use the RANUNI function to randomly generate a number,
and use the CEIL function to return it as an integer; assign the resulting integer
to the POINT= variable. Use IF-THEN logic to test a variable for a condition.
Continue reading observations until one meets the condition, write an
observation to a new data set, assign a value to a new variable, and update a
value in the master data set.

Input Data Sets

          ENGINEER                        PROJECTS

   OBS   ENGINEER   AVAILHRS       OBS   PROJID   HOURS
    1    Inge          33           1    AERO       31
    2    Jane         100           2    BRANDX    150
    3    Eduardo       12           3    CHEM       18
    4    Fred          16           4    CONTRA     41
    5    Kia          130           5    ENG2        6
    6    Monique       44           6    ENG3       29
    7    Sofus         23

Resulting Data Sets

Output 2.6a Updated Version of ENGINEER Data Set

          ENGINEER

   OBS   ENGINEER   AVAILHRS
    1    Inge          33
    2    Jane          82
    3    Eduardo        6
    4    Fred          16
    5    Kia           60
    6    Monique       13
    7    Sofus         23

Output 2.6b ASSIGN Data Set

           ASSIGN

   OBS   PROJID    ENGINEER
    1    Aero      Monique
    2    Brandx    NONE
    3    Chem      Jane
    4    Contra    Kia
    5    Eng2      Eduardo
    6    Eng3      Kia

Program The data set ENGINEER lists engineers and their available hours. The data set
PROJECTS lists each project by ID and the hours needed to complete that
project. The objective is to use a random direct access technique to match a
project with an engineer who has sufficient hours to complete that project,
output the results to the new data set ASSIGN, and update ENGINEER to
reflect the hours remaining for each engineer after assignment. The random
direct access technique causes the program to produce different output each
time it is executed.

Read an observation from PROJECTS sequentially. Use the RANUNI and
CEIL functions to generate a random integer and assign its value to X, the
POINT= variable. This value is the observation number. Use IF-THEN logic to
write an observation to ASSIGN when the engineer hours equal or exceed the
hours needed to complete the task. The DO loop continues iterating until one
engineer is selected for a project. Decrease the engineer's available hours by
the value of HOURS, and use the REPLACE statement to update ENGINEER
accordingly.

FOUND is then set to 1 and the next project is accessed. Exiting the loop when
FOUND=0 means that no engineer with sufficient hours was found; in that case
the program assigns the value 'NONE' to ENGINEER and writes an observation
to ASSIGN.

Open ENGINEER for update and create ASSIGN. Read an observation from
PROJECTS.

   data engineer assign(keep=engineer projid);
      set projects;
      found=0;

Process observations from ENGINEER until an engineer is selected for the
current project or until it iterates 1,000 times.

      do i=1 to 1000 while (not found);

Generate random values that will be used to access ENGINEER by observation
number and assign the results to the variable X. The RANUNI function
randomly generates numbers and returns a value based on a seed. Multiplying that
value by the number of engineers (N) and then using the CEIL function returns an
integer between 1 and N.

         x=ceil(ranuni(12345)*n);

Use the value of X to access observations in ENGINEER by observation number.
POINT= uses the variable X, whose value provides direct access to observations
in ENGINEER by observation number. NOBS= assigns the number of observations
in ENGINEER to N.

         modify engineer point=x nobs=n;

When the available engineering hours exceed or equal the hours needed to
complete the project, write the observation to ASSIGN, calculate a new value for
AVAILHRS, and update ENGINEER. The assignment statement assigns a new value
to AVAILHRS, and the REPLACE statement updates the current observation read
from ENGINEER with the new value for AVAILHRS. FOUND is set to 1, so the DO
loop will stop because an engineer has been selected.

         if availhrs>=hours then
            do;
               output assign;
               availhrs=availhrs-hours;
               replace engineer;
               found=1;
            end;
      end;   /* ends the iterative DO loop */

When no engineer's available hours equal or exceed the hours needed for the
current project, write an observation to ASSIGN indicating that no engineer was
assigned.

      if found=0 then do;
         engineer="NONE";
         output assign;
      end;
   run;
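
The control flow of the random matching loop can be sketched in Python as follows. This is a hedged illustration, not a translation of MODIFY: the function name assign_projects and the tiny data sets are hypothetical, a dictionary is updated in place of the ENGINEER data set, and Python's random module does not reproduce the RANUNI number stream, so assignments will differ from the SAS output.

```python
import random


def assign_projects(projects, engineers, seed=12345, max_tries=1000):
    """Randomly probe engineers until one has enough hours for each project."""
    rng = random.Random(seed)
    hours_left = dict(engineers)            # engineer -> remaining hours
    names = list(hours_left)
    assignments = []
    for projid, hours in projects:
        chosen = "NONE"                     # value written when no match is found
        for _ in range(max_tries):          # same bound as DO I=1 TO 1000
            name = names[rng.randrange(len(names))]   # random direct access
            if hours_left[name] >= hours:
                hours_left[name] -= hours   # analogous to REPLACE ENGINEER
                chosen = name
                break
        assignments.append((projid, chosen))
    return assignments, hours_left


# Hypothetical data, chosen so that the outcome is easy to reason about.
engineers = [("A", 10), ("B", 40)]
projects = [("P1", 30), ("P2", 100), ("P3", 10)]
assignments, hours_left = assign_projects(projects, engineers)
print(assignments)
print(hours_left)
```

The retry bound matters: without it, a project that no engineer can cover (P2 here) would loop forever.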
Example 2.7 Combining Observations Based on a
Calculation on Variables Contributed by
Two Data Sets

Goal Use one-to-many matching on columns* in two tables and perform a
calculation that shows the relationship between values in columns that are
unique to each table. Produce a table that includes only those rows that meet a
specified condition.

Strategy Use the SQL procedure to join two tables. The join produces a Cartesian
product, which is a combination of each row from the first table with every
row from the second table. During the join, you can perform mathematical
computations to create a new column using values from columns that are
common to both tables. Subset the join to get only those rows that have a
specified value of the new column.

Input Data Sets

           ONE                          TWO

   OBS   HOUSE    X   Y         OBS   STORE    X   Y
    1    house1   1   1          1    store1   6   1
    2    house2   3   3          2    store2   5   2
    3    house3   2   3          3    store3   3   5
    4    house4   7   7          4    store4   7   5

Resulting Data Set

Output 2.7 FINAL Data Set

                FINAL

                   Closest
   OBS   HOUSE     Store     Distance
    1    house1    store2      4.12
    2    house2    store3      2.00
    3    house3    store3      2.24
    4    house4    store4      2.00

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
  are observations.

Program The objective is to join ONE and TWO to get a row for every combination of
house and store. In this example, the join results in an internal table of 16 rows,
four for each house.

For each row, calculate the distance between each house and each store by
performing mathematical calculations on the X and Y coordinates. Lastly,
determine which store is closest to each house. Select only those rows whose
values for DIST represent the minimum distance from a specific house to the
closest store.

Invoke PROC SQL and create a table. The CREATE TABLE statement creates the
table FINAL to store the results of the subsequent query.

   proc sql;
      create table final as

Select the columns. The SELECT clause selects the HOUSE and STORE columns
from tables ONE and TWO, respectively.

         select one.house, two.store label='Closest Store',

Calculate a new column. The arithmetic expression uses the square root function
(SQRT) to create an additional column, DIST, that contains the distance from
HOUSE to STORE for each row.

                sqrt((abs(two.x-one.x)**2)+(abs(two.y-one.y)**2)) as dist
                   label='Distance' format=4.2

Name the tables to join and query.

            from one, two

Group the data by values of HOUSE and subset the grouped data. The HAVING
clause subsets the grouped data by selecting the row with the lowest value for
DIST from each group. CALCULATED takes the place of the mathematical
expression that calculates the values of DIST.

            group by house
            having calculated dist=min(dist);
   quit;

Note: In PROC SQL, SELECT statements automatically print output.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically print output.

A Closer Look Calculate a New Column

It may help you to visualize the plot of the location of the houses and stores
and to actually see how the distance between a specific house and store is
calculated. The following plot shows the position of each house and store:

Figure 2.7 Plot Showing Position of Houses and Stores
[Plot: X and Y axes from 1 to 8, with h1 at (1,1), s1 at (6,1), s2 at (5,2),
h3 at (2,3), h2 at (3,3), s3 at (3,5), s4 at (7,5), and h4 at (7,7)]

As an example, this is the calculation for the distance between house1 and
store1:

   sqrt((abs(two.x-one.x)**2)+(abs(two.y-one.y)**2))
   sqrt((abs(6 - 1)**2)+(abs(1 - 1)**2))
   sqrt((5**2) + (0**2))
   sqrt(25 + 0)
   sqrt(25)
   =5

Note: A double asterisk (**) represents exponentiation.
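
The same distance calculation, followed by the minimum-per-house selection, can be checked with a small Python sketch. It is an illustration only: the function name closest_store is hypothetical, dictionaries stand in for tables ONE and TWO, and the coordinates are copied from the input listings.

```python
import math

# Coordinates copied from tables ONE (houses) and TWO (stores).
houses = {"house1": (1, 1), "house2": (3, 3), "house3": (2, 3), "house4": (7, 7)}
stores = {"store1": (6, 1), "store2": (5, 2), "store3": (3, 5), "store4": (7, 5)}


def closest_store(house_xy, store_xy):
    """For each house, return (closest store, distance rounded to 2 places)."""
    result = {}
    for house, (hx, hy) in house_xy.items():
        # Every house is compared with every store, as in the Cartesian product.
        dist, store = min(
            (math.hypot(sx - hx, sy - hy), name)
            for name, (sx, sy) in store_xy.items()
        )
        result[house] = (store, round(dist, 2))
    return result


final = closest_store(houses, stores)
for house, (store, dist) in final.items():
    print(house, store, dist)
```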
CHAPTER 3

Combining a Single Observation with Multiple
Observations

A one-to-many or many-to-one relationship between input data sets implies
that specific values of one or more chosen variables are unique in one data set
but may occur in multiple observations in the other data set. The order in
which the input data sets are processed determines whether the relationship is
one-to-many or many-to-one.

3.1  Adding Values to All Observations in a Data Set

3.2  Adding Values from the Last Observation in a Data Set to All
     Observations in Another Data Set

3.3  Merging Observations from Multiple Data Sets Based on a
     Common Variable

3.4  Applying Transactions to a Master Data Set Based on a Common
     Variable

3.5  Combining and Collapsing Observations Based on a Common
     Variable

3.6  Applying Transactions to a Master Data Set Using an Index

3.7  Removing Observations from a Master Data Set Based on Values in a
     Transaction Data Set

3.8  Performing a Table Lookup with a Small Lookup Data Set

3.9  Performing a Table Lookup with Large Nonindexed
     Data Sets

3.10 Performing a Table Lookup Using a Composite Index When the
     Transaction Data Set Contains Duplicate Values

3.11 Performing a Table Lookup with a Large Lookup Data Set
     That Is Indexed

Example 3.1 Adding Values to All Observations in a Data
Set

Goal Efficiently combine values from a single observation in one data set with all
observations in another data set.
Strategy On the first iteration of the DATA step, read the values of all variables from a
single observation in one data set once to place those values into the program
data vector. Then read each observation in the second data set, outputting a
new observation that contains the combined values.
Input Data Sets

Each salesperson in SALESREP works in the store and department identified in
DEPT_ID.

        DEPT_ID                          SALESREP

   OBS   STORE   DEPT         OBS   NAME     MONTH   TOTSALES
    1     13     VIDEO         1    Harvey   Jan      $25,375
                               2    Lou      Jan       $9,950
                               3    Mary     Jan      $27,985
                               4    Sam      Jan       $8,795

Resulting Data Set

Output 3.1 SALES_ID Data Set

                        SALES_ID

   OBS   STORE   DEPT    NAME     MONTH   TOTSALES
    1     13     VIDEO   Harvey   Jan      $25,375
    2     13     VIDEO   Lou      Jan       $9,950
    3     13     VIDEO   Mary     Jan      $27,985
    4     13     VIDEO   Sam      Jan       $8,795

Program The objective is to take the data set that contains information about sales
representatives and add to each observation the same values for two new
variables, STORE and DEPT. The only observation in DEPT_ID contains
values for STORE and DEPT. The IF-THEN statement with the _N_ option
executes the SET statement to read from DEPT_ID only once, on the first
iteration of the DATA step. The values for STORE and DEPT remain in the
program data vector for the duration of the DATA step execution because
values read with the SET statement are automatically retained until another
observation is read from that data set. Each iteration reads an observation from
SALESREP and writes an observation that contains the same value for STORE
and DEPT and all the data for a single sales representative:

Create SALES_ID. Read an observation from DEPT_ID only on the first iteration.
Since DEPT_ID contains only one observation, using IF-THEN and _N_ to
read from it only once avoids prematurely ending the DATA step when end-of-file
is reached. The variable values from DEPT_ID are retained throughout the
DATA step.

   data sales_id;
      if _n_=1 then set dept_id;

Read an observation from SALESREP and automatically write an observation to
SALES_ID.

      set salesrep;
   run;

Example 3.2 Adding Values from the Last Observation in
a Data Set to All Observations in Another
Data Set

Goal Efficiently combine values from the last observation in one data set with all
observations in another data set.

Strategy Read the values of all variables from the last observation in one data set once
to place those values into the program data vector. You can use the POINT=
and NOBS= options with the SET statement to go directly to the last
observation in the data set. Then read each observation in the second data set,
writing a new observation that contains the combined values.

Input Data Sets

Each salesperson in SALESREP works in the store and department shown in the
last observation in DEPT_ID.

        DEPT_ID                            SALESREP

   OBS   STORE   DEPT           OBS   NAME     MONTH   TOTSALES
    1     02     AUTO            1    Harvey   Jan      $25,375
    2     07     HSEWARES        2    Lou      Jan       $9,950
    3     10     AUDIO           3    Mary     Jan      $27,985
    4     13     VIDEO           4    Sam      Jan       $8,795

Resulting Data Set

Output 3.2 SALES_ID Data Set

                        SALES_ID

   OBS   STORE   DEPT    NAME     MONTH   TOTSALES
    1     13     VIDEO   Harvey   Jan      $25,375
    2     13     VIDEO   Lou      Jan       $9,950
    3     13     VIDEO   Mary     Jan      $27,985
    4     13     VIDEO   Sam      Jan       $8,795

Program The objective is to take the data set that contains information about sales
representatives and add to each observation the same value for store and
department, which are read from another data set. The last observation in
DEPT_ID contains the correct store and department values, so that observation
is read on the first iteration of the DATA step. The values for STORE and
DEPT remain in the program data vector for the duration of the DATA step
execution because values read with the SET statement are automatically
retained until other values for those same variables are read to replace them.
Each DATA step iteration reads an observation from SALESREP and writes
an observation that contains the same value for STORE and DEPT and all of
the data for a single sales representative:

Create SALES_ID. Read the last observation from DEPT_ID on the first
iteration. NOBS= sets the value of LAST to 4, the last observation in the data
set. POINT= allows you direct access to observation 4.

   data sales_id;
      if _n_=1 then set dept_id point=last nobs=last;

Read an observation from SALESREP and automatically write an observation to
SALES_ID.

      set salesrep;
   run;

Example 3.3 Merging Observations from Multiple Data
Sets Based on a Common Variable

Goal Combine observations from multiple data sets based on a variable common to
each contributing data set. This task requires that each data set either have an
index on the BY variable or be sorted by the values of the BY variable.

Strategy Use the BY statement and the MERGE statement to match-merge the
observations from the data sets specified in the MERGE statement by the
values of the BY variable.

Input Data Sets

The data sets must be sorted by the values of ID. Each value of ID occurs only
once in data sets ONE and THREE but may occur multiple times in TWO.

             ONE                           TWO

   OBS   ID   NAME                 OBS   ID   SALE
    1     1   Nay Rong              1     1   $28,000
    2     2   Kelly Windsor         2     2   $30,000
    3     3   Julio Meraz           3     2   $40,000
    4     4   Richard Krabill       4     3   $15,000
    5     5   Rita Giuliano         5     3   $20,000
                                    6     3   $25,000
                                    7     4   $35,000
                                    8     5   $40,000

            THREE

   OBS   ID   BONUS
    1     1   $2,000
    2     2   $4,000
    3     3   $3,000
    4     4   $2,500
    5     5   $2,800

Resulting Data Set

Output 3.3 FINAL Data Set

                         FINAL

   OBS   ID   NAME               SALE      BONUS
    1     1   Nay Rong           $28,000   $2,000
    2     2   Kelly Windsor      $30,000   $4,000
    3     2   Kelly Windsor      $40,000   $4,000
    4     3   Julio Meraz        $15,000   $3,000
    5     3   Julio Meraz        $20,000   $3,000
    6     3   Julio Meraz        $25,000   $3,000
    7     4   Richard Krabill    $35,000   $2,500
    8     5   Rita Giuliano      $40,000   $2,800

Program The objective is to create a new data set that matches each individual with the
correct sale and bonus based on corresponding ID values. This program
match-merges the data sets ONE, TWO, and THREE to create a single data set
that contains variables ID, NAME, SALE, and BONUS.

Data set TWO contains multiple occurrences of some values for ID, while ONE
and THREE contain only one occurrence. Because values of NAME and
BONUS (which are read from ONE and THREE) are automatically retained
across the BY group, multiple observations with the same value for ID contain
the correct NAME and BONUS values:

Create FINAL. Combine observations from the three data sets based on the
matching values for ID to create the FINAL data set.

   data final;
      merge one two three;
      by id;
   run;
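
The effect of retaining NAME and BONUS across each BY group can be illustrated with a short Python sketch. It is a simplified illustration, not a full model of MERGE: it assumes every ID appears in TWO and exactly once in ONE and THREE (true for the data shown), and the function name match_merge is hypothetical.

```python
# ONE and THREE have one row per ID, so dictionaries keyed by ID suffice.
one = {1: "Nay Rong", 2: "Kelly Windsor", 3: "Julio Meraz",
       4: "Richard Krabill", 5: "Rita Giuliano"}
three = {1: 2000, 2: 4000, 3: 3000, 4: 2500, 5: 2800}

# TWO may repeat an ID: (id, sale), already sorted by ID.
two = [(1, 28000), (2, 30000), (2, 40000), (3, 15000),
       (3, 20000), (3, 25000), (4, 35000), (5, 40000)]


def match_merge(names, sales, bonuses):
    """Return one (id, name, sale, bonus) row per row of the many-side table."""
    merged = []
    for id_, sale in sorted(sales, key=lambda row: row[0]):
        # NAME and BONUS repeat for every sale in the same ID group.
        merged.append((id_, names.get(id_), sale, bonuses.get(id_)))
    return merged


final = match_merge(one, two, three)
for row in final:
    print(row)
```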
Example 3.4 Applying Transactions to a Master Data Set
Based on a Common Variable

Goal Use a common variable to update the values of variables in a master data set
with the values of variables in a transaction data set without writing missing
values to the revised master data set and without overlaying variable values in
the program data vector.

Strategy Use the MERGE and BY statements to update the values of a master data set
with the values of a transaction data set. Use the IN= data set option to indicate
whether the transaction data set contributed to this observation. If it did not
contribute, use IF-THEN logic with a DO group to preserve the original values
from the master data set. You must rename variables with the RENAME=
option because the master and transaction data sets contain the same variables.
Input Data Sets

Both data sets have the same variables.

            MASTER                            TRANS

   OBS   ITEMA   ITEMB   ITEMC       OBS   ITEMA   ITEMB   ITEMC
    1      1       2       0          1      1       5       6
    2      1       3      99          2      3       3       4
    3      1       4      88
    4      1       5      77
    5      2       1      66
    6      2       2      55
    7      3       4      44

Resulting Data Set

Output 3.4 FINAL Data Set

             FINAL

   OBS   ITEMA   ITEMB   ITEMC
    1      1       5       6
    2      1       5       6
    3      1       5       6
    4      1       5       6
    5      2       1      66
    6      2       2      55
    7      3       3       4

Program The objective is to update all variable values in observations from MASTER
with the values of variables contained in observations in TRANS, based on the
values of the BY variable ITEMA. Because both data sets contain the same
variables, you must rename variables other than the BY variable so that
variable values from observations in MASTER will not be overlaid.

Special handling is required when TRANS does not contain a matched value
for ITEMA in MASTER. Use IF-THEN processing and the IN= option to
determine when TRANS contributes to an observation. If TRANS doesn't
contribute, then reset the values of ITEMB and ITEMC to their original values
from MASTER:

Create FINAL. Combine observations from MASTER and TRANS based on the
matching values for the BY variable ITEMA. RENAME= renames the variables
ITEMB and ITEMC in MASTER to preserve the values for ITEMB and ITEMC
from MASTER. IN= creates the temporary variable IN2, which is set to 1 when an
observation from TRANS contributes to the current observation.

   data final(drop=oldb oldc);
      merge master(rename=(itemb=oldb itemc=oldc)) trans(in=in2);
      by itema;

When TRANS does not contribute values for ITEMB and ITEMC based on the
current value of ITEMA, use the values from MASTER. If the value of IN2 does
not equal 1 (is not true), the values of ITEMB and ITEMC are set to the original
values from MASTER.

      if not in2 then
         do;
            itemb=oldb;
            itemc=oldc;
         end;
   run;
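
The role of the IN= flag can be traced with a brief Python sketch. It is a hedged illustration rather than a model of MERGE internals: it assumes TRANS holds at most one observation per ITEMA value (true for the data shown), uses a dictionary for TRANS, and the function name apply_transactions is hypothetical.

```python
# MASTER rows: (itema, itemb, itemc). TRANS keyed by ITEMA: (itemb, itemc).
master = [(1, 2, 0), (1, 3, 99), (1, 4, 88), (1, 5, 77),
          (2, 1, 66), (2, 2, 55), (3, 4, 44)]
trans = {1: (5, 6), 3: (3, 4)}


def apply_transactions(master_rows, trans_by_key):
    """Overlay transaction values where the key matches; otherwise keep master values."""
    final = []
    for itema, oldb, oldc in master_rows:
        if itema in trans_by_key:            # same role as IN2 being 1
            itemb, itemc = trans_by_key[itema]
        else:                                # TRANS did not contribute
            itemb, itemc = oldb, oldc
        final.append((itema, itemb, itemc))
    return final


final = apply_transactions(master, trans)
for row in final:
    print(row)
```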
Example 3.5 Combining and Collapsing Observations
Based on a Common Variable

Goal Reshape the transaction data set, turning related observations into single ones,
and match each collapsed observation from the transaction data set with an
appropriate observation from the master data set, based on the value of a key
variable.

Strategy Combine data from two SAS data sets based on the key variable. Read the
master data set sequentially, while using the KEY= option to directly access
observations in an indexed transaction data set. When the transaction data set
contains multiple observations for the same key value, you can collapse them
into a single observation as you combine it with an observation from the
MASTER data set. Use error-checking logic to direct execution to the
appropriate code path.

Input Data Sets

SSN is common to both MASTER and TRANS. MASTER contains only unique
values of SSN, but TRANS can contain up to three observations with the same
value for SSN.

Because the program depends on accessing observations in TRANS directly using
KEY=SSN, TRANS must be indexed on the variable SSN. That both data sets be
sorted by SSN is not required but is recommended for performance.

              MASTER                             TRANS

   OBS    SSN            NAME         OBS    SSN            RECDATE

    1     215-15-0007    David         1     202-36-5566       89
    2     221-27-1234    Jane          2     215-15-0007       92
    3     231-18-1345    Susan         3     215-15-0007       90
    4     233-44-3215    Paula         4     215-15-0007       89
    5     243-09-8956    Joe           5     221-27-1234       92
                                       6     221-27-1234       90
                                       7     221-27-1234       89
                                       8     231-18-1345       93
                                       9     231-18-1345       92
                                      10     243-09-8956       93
                                      11     243-09-8956       92
                                      12     243-09-8956       91
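
The index requirement stated above can be met in several ways. The following is a minimal sketch of one of them, assuming that TRANS is stored in the WORK library (the library is an assumption; the source does not say where the data sets live). It adds a simple index on SSN to the existing data set without rewriting it:

```sas
/* Sketch only: assumes TRANS already exists in the WORK library. */
proc datasets library=work nolist;
   modify trans;
   index create ssn;      /* simple index, named after its key variable */
quit;
```

A simple index always takes the name of its key variable, which is why KEY=SSN can then be used on the SET statement.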
Resulting Data Sets

Output 3.5a   FINAL1
When All Observations Are Included from MASTER

                            FINAL1

   OBS    DATE1    DATE2    DATE3    SSN            NAME

    1       92       90       89     215-15-0007    David
    2       92       90       89     221-27-1234    Jane
    3       93       92        .     231-18-1345    Susan
    4        .        .        .     233-44-3215    Paula
    5       93       92       91     243-09-8956    Joe

Output 3.5b   FINAL2
When Only Matched Observations Are Included from MASTER. See "Related
Technique."

                            FINAL2

   OBS    DATE1    DATE2    DATE3    SSN            NAME

    1       92       90       89     215-15-0007    David
    2       92       90       89     221-27-1234    Jane
    3       93       92        .     231-18-1345    Susan
    4       93       92       91     243-09-8956    Joe

Program   The objective is to combine data from the TRANS data set with the
appropriate observations from the MASTER data set. MASTER contains one
observation for each value of SSN while TRANS contains multiple
observations for some values of SSN. Instead of creating multiple observations
in MASTER, this application collapses into a single observation multiple
observations from TRANS that contain the same SSN value. This program
assumes that TRANS contains no more than three observations with the same
value for SSN.

First, read one observation from MASTER. Then, by designating SSN as the
KEY variable, read all of the observations from TRANS with the same SSN
value, and write a single observation to FINAL1. Use an array to assign each
RECDATE value from TRANS to DATE1, DATE2, or DATE3, as
appropriate. This application assumes that the TRANS data set has been
indexed on SSN. It also assumes that no more than three records exist in
TRANS for each value of SSN, so the DATEFLD array contains only three
elements:

Create FINAL1 and define array DATEFLD. Read an observation from
MASTER. In preparation for collapsing multiple observations in TRANS into
single observations for each SSN value, the array DATEFLD is defined.

   data final1(drop=i recdate);
      array datefld(*) date1-date3;
      set master;

Reset I to 0 so that each time you process an observation from MASTER, you
begin processing the array with the first element. Read an observation from
TRANS, based on the value of the key variable SSN. The DO UNTIL loop
executes and processes observations from TRANS until all observations with
the current value of SSN have been read. (For information on _IORC_ and
SYSRC, see the Appendix.)

      i=0;
      do until (_iorc_=%sysrc(_dsenom));
         set trans key=ssn;

When an observation from TRANS with the current value of SSN has been
read, store RECDATE values from TRANS into an element of the DATEFLD
array (DATE1, DATE2, DATE3). When the value of _IORC_ corresponds to
_SOK, the value of SSN in the observation being read from TRANS matches
the current SSN value from MASTER.

         select (_iorc_);
            when (%sysrc(_sok)) do;
               i + 1;
               datefld(i) = recdate;
            end;

Write an observation to FINAL1 when all observations from TRANS that have
a matching value for the current value of SSN from MASTER have been read.
When the value of _IORC_ corresponds to _DSENOM, no observations in
TRANS contain the current value of SSN, so the current observation is written
to FINAL1. _ERROR_ is reset to 0 to prevent an error condition that would
write the contents of the program data vector to the SAS log.

            when (%sysrc(_dsenom)) do;
               output;
               _error_ = 0;
            end;

In case of an unexpected _IORC_ condition, write an error message and stop
execution. When _IORC_ corresponds to anything other than _DSENOM or
_SOK, an unexpected condition has been encountered, so an error message is
written to the SAS log and the STOP statement terminates the DATA step.

            otherwise do;
               put 'Unexpected ERROR: _IORC_ = ' _iorc_;
               stop;
            end;
         end;   /* ends SELECT group */
      end;      /* ends DO UNTIL loop */
   run;
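
The program relies on the stated assumption that no SSN value occurs more than three times in TRANS. If a fourth matching observation were read, the reference DATEFLD(I) would fall outside the array and the DATA step would stop with an array subscript error. The following variation is a sketch of one way to guard against that case; the warning text and the decision to ignore the extra observations are illustrative choices, not part of the original example:

```sas
/* Sketch: same logic as the program above, with a bounds check added. */
data final1(drop=i recdate);
   array datefld(*) date1-date3;
   set master;
   i=0;
   do until (_iorc_=%sysrc(_dsenom));
      set trans key=ssn;
      select (_iorc_);
         when (%sysrc(_sok)) do;
            /* store the value only while an array element is still free */
            if i < dim(datefld) then do;
               i + 1;
               datefld(i) = recdate;
            end;
            else put 'WARNING: extra TRANS observation ignored for ' ssn=;
         end;
         when (%sysrc(_dsenom)) do;
            output;
            _error_ = 0;
         end;
         otherwise do;
            put 'Unexpected ERROR: _IORC_ = ' _iorc_;
            stop;
         end;
      end;   /* ends SELECT group */
   end;      /* ends DO UNTIL loop */
run;
```

With the sample data shown in this example, the check never fires, so the resulting data set is the same as FINAL1 in Output 3.5a.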
Related Technique   The program shown previously produces a resulting data set (FINAL1) that
includes all observations in MASTER, due to the logic of the DO group in the
WHEN statement:

Include all observations from MASTER in output data set. See FINAL1 in
Output 3.5a.

            when (%sysrc(_dsenom)) do;
               output;
               _error_ = 0;
            end;

To produce a resulting data set (FINAL2) that includes only those observations
from MASTER whose SSN values occur in TRANS, change the statements in
the WHEN statement:

Include only matching observations in output data set. See FINAL2 in
Output 3.5b.

            when (%sysrc(_dsenom)) do;
               if i > 0 then output;
               _error_ = 0;
            end;

Example 3.6   Applying Transactions to a Master Data Set
              Using an Index

Goal   Update a master data set in place using values supplied by a transaction data
set to locate observations in the master data set that are to be replaced.

Strategy   Overall Strategy

To increase I/O efficiency, read an observation from the transaction data set
and keep only the variable that is the key variable in the master data set.
Because the master data set can contain multiple occurrences of any value of
the key variable, read observations from it until there are no more matches.
Each time there is a match, read the current observation from the transaction
data set in its entirety and replace the matching observation in the master data
set. When no match occurs, you can write a note to the log.

Step-by-Step Strategy

Read the transaction data set sequentially, but use the SET statement with the
POINT= option to access observations directly by observation number. Place
the SET within an iterative DO loop so that the index variable for the DO loop
can supply values to the POINT= variable. Using direct access is important
because it makes it possible for you to reread the current observation from the
transaction data set when there is a match.

Next, use the MODIFY statement to read an observation from the master data
set, retrieving a match based on the values of the KEY= variable. Use the
_IORC_ automatic variable and the SYSRC autocall macro in a SELECT
group to execute the appropriate statements, based on whether a match is
found.

When a match is found, reread the current observation from the transaction
data set so that the new values for these variables will overlay the values read
from the master data set. Use SET with POINT= to reread the same
observation from the transaction data set, this time bringing the values of all
variables into the program data vector.

Finally, use the REPLACE statement to update the observation in place in the
master data set.

Input Data Sets

Both X and Y are common to the MASTER and TRANS data sets. MASTER
contains multiple observations with duplicate values for the key variable X.
Because the program depends on directly accessing observations in MASTER
by using KEY=X, MASTER must be indexed on the variable X.

        MASTER                 TRANS

   OBS    X    Y          OBS    X    Y

    1     1    2           1     1    8
    2     1    3           2     3    9
    3     2    4           3     5    2
    4     3    5
    5     1    2

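
To try the example, the two input data sets can be built with the values listed above. The following is a minimal sketch that assumes both data sets are created in the default WORK library; it uses the INDEX= data set option so that MASTER receives its simple index on X as it is created:

```sas
/* Sketch: builds the sample data; the index on X is required by KEY=X. */
data master(index=(x));
   input x y;
   datalines;
1 2
1 3
2 4
3 5
1 2
;
run;

data trans;
   input x y;
   datalines;
1 8
3 9
5 2
;
run;
```

Because MODIFY updates MASTER in place, rerun the first step to restore the original values before repeating the example.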
Resulting Data Set

Output 3.6   Updated Version of MASTER Data Set

        MASTER

   OBS    X    Y

    1     1    8
    2     1    8
    3     2    4
    4     3    9
    5     1    8

Program   The objective is to update the MASTER data set in place, replacing any
observation whose value of the key variable X matches the value of X stored in
the TRANS data set. TRANS contains one observation for each value of X,
while MASTER contains multiple observations for some values of X.

Use the KEEP= option to read only the variable X from TRANS. Then read an
observation from MASTER using the MODIFY statement and the KEY=
option. If there is a match for the key variable X in MASTER, reread the same
observation from TRANS, this time including all of the variables. These values
overlay the existing values in the program data vector that were read from
MASTER, and the REPLACE statement updates in place the current
observation in MASTER.

To read selected observations from TRANS twice, use the SET statement with
the POINT= option to access observations directly by observation number. Use
an iterative DO loop to supply values to the POINT= variable for the first SET
statement. Then use this same variable (P) as the POINT= variable in the
second SET statement, allowing it to read the same observation in TRANS in
its entirety. The entire program runs in one iteration of the DATA step.

Use the _IORC_ automatic variable and the SYSRC autocall macro in a
SELECT group to route execution to the appropriate code path based on
whether a match is found:

Open MASTER for update and execute the iterative DO loop to sequentially
process observations from TRANS using direct access by observation number.

   data master;
      do p = 1 to totobs;
         flag = 0;
         _iorc_ = 0;
         set trans(keep=x) point=p nobs=totobs;

Update MASTER in place based on the value of the key variable X. The
DO WHILE loop executes and processes observations from MASTER as long
as values of X in MASTER match the current value of X in TRANS. (For
information on _IORC_ and SYSRC, see the Appendix.)

         do while (_iorc_=%sysrc(_sok));
            modify master key=x;

When an observation from MASTER whose key value matches that of TRANS
has been read, reread the current observation from TRANS and replace the
observation in MASTER. When the value of _IORC_ corresponds to _SOK, the
value of X in the observation being read from MASTER matches the current X
value from TRANS. The value of the POINT= variable allows you to reread the
current observation from TRANS, this time reading all of the variables so their
values can overlay the existing values from MASTER.

            select (_iorc_);
               when (%sysrc(_sok)) do;
                  set trans point=p;
                  flag=1;
                  replace;
               end;

When no match is found, no further attempts are made to retrieve observations
from MASTER and the DO WHILE loop ends. An appropriate note is written
to the log indicating that no matches exist for the current key value. When the
value of _IORC_ corresponds to _DSENOM, no observations in MASTER
contain the current value of X. The _ERROR_ automatic variable is reset to 0
to prevent an error condition that would write the contents of the program data
vector to the SAS log. The value of FLAG determines which note is written to
the log.

               when (%sysrc(_dsenom)) do;
                  if flag then
                     put 'NOTE: No more matches for KEY=' x;
                  else
                     put 'NOTE: No match for KEY=' x;
                  _error_ = 0;
               end;

In case of an unexpected _IORC_ condition, write an error message to the
SAS log and stop execution. When _IORC_ corresponds to anything other than
_DSENOM or _SOK, an unexpected condition has been encountered, so an
error message is written to the SAS log and the STOP statement terminates the
DATA step. _ERROR_ is reset to 0 to prevent an error condition that would
write the contents of the program data vector to the log. The second STOP
statement is necessary to end the DATA step upon exiting the iterative DO loop
because there is no end-of-file condition to stop the DATA step when the SET
statement uses POINT=.

               otherwise do;
                  put 'ERROR: _IORC_ = ' _iorc_ / 'Program halted.';
                  _error_ = 0;
                  stop;
               end;
            end;   /* ends SELECT group */
         end;      /* ends DO WHILE loop */
      end;         /* ends iterative DO loop */
      stop;
   run;

Example 3.7   Removing Observations from a Master Data
              Set Based on Values in a Transaction Data
              Set

Goal   Update a master data set in place using values supplied by a transaction data
set to locate observations in the master data set that are to be deleted.

Strategy   Process observations from the transaction data set sequentially, supplying
values to locate observations that are to be deleted in the master data set. Use
the MODIFY statement to update a data set in place and the KEY= option to
directly access the master data set by using an index on the KEY= variable.
Verify the results of the MODIFY execution using the automatic variable
_IORC_ and the SYSRC autocall macro. Use the REMOVE statement to
delete any match in the master data set. Use an iterative process to access all
observations in the master data set that have a match for the current key value.
When all observations for the current key value have been deleted or when the
master data set does not contain the key value supplied by the transaction data
set, continue processing to the next key value from the transaction data set. The
task is complete when all observations from the transaction data set have been
processed.

You can perform the same task with PROC SQL; see "Related Technique."

Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with your
data.

Input Data Sets

CUST is common to both TRANS and MASTER, and MASTER contains multiple
observations with the same value for CUST. Because the program depends on
directly accessing observations in MASTER by using KEY=CUST, MASTER
must be indexed on the variable CUST.

        MASTER                  TRANS

   OBS    CUST    X          OBS    CUST

    1      1      1           1      1
    2      1      1           2      3
    3      1      1
    4      2      2
    5      2      2
    6      2      2
    7      2      2
    8      3      3
    9      3      3
   10      4      2

Resulting Data Set

Output 3.7a   Updated Version of MASTER Data Set
MASTER was updated with the DATA step.

        MASTER

   OBS    CUST    X

    4      2      2
    5      2      2
    6      2      2
    7      2      2
   10      4      2

Output 3.7b   Updated Version of MASTER Data Set
MASTER was updated with PROC SQL.

        MASTER

   OBS    CUST    X

    4      2      2
    5      2      2
    6      2      2
    7      2      2
   10      4      2

Program   The objective is to update the MASTER data set in place, removing any
observation whose value of the key variable CUST matches the value of CUST
stored in the TRANS data set. TRANS contains one observation for each value
of CUST, while MASTER contains multiple observations for some values of
CUST.

Read an observation from TRANS to obtain a value of CUST. Execute the
MODIFY statement with the KEY= option to directly access MASTER using
the index defined for CUST.

To verify whether a match has been located, use the SYSRC autocall macro
and the _IORC_ automatic variable. When a match occurs, the REMOVE
statement deletes the observation just retrieved and updates MASTER in place.
When no match occurs, FLAG is set to prevent further retrievals for the current
value of CUST.

To delete multiple observations when MASTER contains duplicates, enclose
the MODIFY statement in the DO UNTIL loop to continue execution for the
current value of CUST. After all occurrences have been retrieved and deleted,
the no-match condition _DSENOM is encountered, FLAG is set, and the loop
terminates. The DATA step iterates and processing continues with the next
value of CUST retrieved from TRANS. The DATA step processing terminates
when the end-of-file condition is encountered for TRANS:

Open MASTER for update. Read an observation from TRANS.

   data master;
      set trans;

Remove observations in MASTER based on the value of the key variable
CUST. The DO UNTIL loop executes and processes observations from
MASTER until all observations with the current value of CUST have been read
and deleted. (For information on _IORC_ and SYSRC, see the Appendix.)

      flag=0;
      do until (flag);
         modify master key=cust;

When an observation from MASTER has been read whose key value matches
that of TRANS, remove the observation from MASTER. When the value of
_IORC_ corresponds to _SOK, the value of CUST in the observation being
read from MASTER matches the current CUST value from TRANS.

         select (_iorc_);
            when (%sysrc(_sok)) remove;

When no match is found, no further observations are retrieved and the
DO UNTIL loop ends. When the value of _IORC_ corresponds to _DSENOM,
no observations in MASTER contain the current value of CUST. _ERROR_ is
reset to 0 to prevent an error condition that would write the contents of the
program data vector to the SAS log. When the value of FLAG is 1, the
DO UNTIL loop will not begin a new iteration.

            when (%sysrc(_dsenom))
               do;
                  _error_=0;
                  flag=1;
               end;

In case of an unexpected _IORC_ condition, write an error message and stop
execution. When _IORC_ corresponds to anything other than _DSENOM or
_SOK, an unexpected condition has been encountered, so an error message is
written to the SAS log and the STOP statement terminates the DATA step.

            otherwise
               do;
                  put 'Unexpected ERROR: _iorc_= ' _iorc_;
                  stop;
               end;
         end;   /* ends SELECT group */
      end;      /* ends DO UNTIL loop */
   run;

Related Technique   If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. You can use a DELETE statement
in PROC SQL to delete rows from a table.* The rows that meet the criteria
specified in the WHERE clause are deleted.

   proc sql;
      delete from master
         where cust in (select cust from trans);
   quit;

The WHERE clause in this example uses a subquery, which is a query that
returns one or more values. First, PROC SQL evaluates the subquery and
returns all the values for CUST from the TRANS table. The WHERE clause
then evaluates to where cust in (1,3). Thus, all rows in MASTER
that have values of 1 or 3 for CUST are deleted.

Note: In PROC SQL, DELETE statements do not automatically produce a
report.

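
Because DELETE produces no report, it can be useful to check how many rows the WHERE clause selects before running it. The following query is a sketch that reuses the same subquery; the column alias ROWS_TO_DELETE is an illustrative name that does not appear in the original example:

```sas
/* Sketch: counts the rows that the DELETE statement would remove. */
proc sql;
   select count(*) as rows_to_delete
      from master
      where cust in (select cust from trans);
quit;
```

Run against the original MASTER and TRANS data sets, the count corresponds to the five observations (1, 2, 3, 8, and 9) that are absent from Output 3.7b.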
Where to Go from Here   □ MODIFY with KEY=. For a discussion of processing using MODIFY
with KEY=, see Chapter 1, "SAS Language Statements," in SAS
Technical Report P-242, SAS Software: Changes and Enhancements,
Release 6.08.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

Example 3.8   Performing a Table Lookup with a Small
              Lookup Data Set

Goal Combine two data sets by using the value of a specific variable to look up
information in a small auxiliary or lookup data set and add it to information in
the primary data set to create a new data set.
Strategy   Sequentially process observations in a primary data set while using direct
access to read observations in the lookup data set until a match is found. This
table lookup technique directly accesses the lookup data based on observation
number and avoids reading subsequent observations from the lookup data set
once a match has been found. This technique is best used with a small lookup
data set because there is the possibility of having to read many records from
the lookup data set when trying to find a match.

To read the primary data set, use the SET statement to read one observation on
each iteration of the DATA step. To read the lookup data set, use the SET
statement with the NOBS= and POINT= options and an iterative DO loop to
access each observation by observation number. Then you can test a condition
to determine whether combining the information from the current observation
of each data set is appropriate and write an observation to a new data set only
when the condition is met. Use the RENAME= option to rename the common
variable from the lookup data set so that the value read does not overwrite the
value read from the primary data set.

You can perform the same task with PROC SQL; see "Related Technique."

Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with your
data.

Input Data Sets

Both data sets have the common variable PARTNO.

           PRIMARY                          LOOKUP

   OBS    PARTNO    QUANTITY       OBS    PARTNO    DESC

    1      A220         4           1      A401     tuning peg
    2      A498         4           2      A025     bridge
    3      A063         8           3      A203     nut
    4      A810         4           4      A220     neck
                                    5      A810     pick guard
                                    6      A063     pickup
                                    7      A047     pot
                                    8      A608     volume knob
                                    9      A097     toggle switch
                                   10      A498     body

Resulting Data Sets

Output 3.8a   REPORT1 Data Set
REPORT1 was created with the DATA step.

                    REPORT1

   OBS    PARTNO    QUANTITY    DESC

    1      A220         4       neck
    2      A498         4       body
    3      A063         8       pickup
    4      A810         4       pick guard

Output 3.8b   REPORT2 Data Set
REPORT2 was created with PROC SQL.

                    REPORT2

   OBS    PARTNO    QUANTITY    DESC

    1      A220         4       neck
    2      A810         4       pick guard
    3      A063         8       pickup
    4      A498         4       body

Program   The objective is to create a new data set that includes all of the information
from PRIMARY and only the corresponding descriptive information from
LOOKUP. The resulting data set, REPORT1, contains the part number,
quantity, and description. Read an observation from PRIMARY and
subsequently read observations from LOOKUP until a match is found. Use
RENAME= to rename PARTNO in LOOKUP so PARTNO values from
LOOKUP and PRIMARY are both retained. Use IF-THEN logic to compare
the values and to output only matching observations:

Create REPORT1. Read an observation from PRIMARY.

   data report1(drop=pn found);
      set primary;

Set FOUND to 0. Read observations from LOOKUP until a match is found
based on the value of the common variable, PARTNO. The POINT= option
creates a variable (N) whose value provides direct access to each observation in
LOOKUP by observation number. NOBS= assigns the number of observations
in LOOKUP to the variable NUMOBS. The DO loop iterates once for each
observation in LOOKUP or until a match is found. RENAME= renames
PARTNO in LOOKUP so the common variable values from LOOKUP do not
overwrite the values from PRIMARY. Use IF-THEN logic to compare
PARTNO from each data set and output the appropriate values to REPORT1.

      found=0;
      do n=1 to numobs until (found);
         set lookup(rename=(partno=pn)) nobs=numobs point=n;

Write an observation to REPORT1 if the condition is met. Set FOUND to 1 so
the DO loop stops and no more observations are read from LOOKUP until the
next observation from PRIMARY is read.

         if partno=pn then
            do;
               output;
               found=1;
            end;
      end;

Write a note to the log when there is no match.

      if not found then put 'No match for PARTNO=' partno 'in LOOKUP.'
                            'Observation not added to REPORT1 data set.';
   run;

Related Technique   If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. PROC SQL joins the tables* to
produce a new table, REPORT2. The REPORT2 table is the same as
REPORT1, except for the order of the data. The difference in the order is a
result of the different processing techniques.

Conceptually, the join results in an internal table that matches every row in
PRIMARY with every row in LOOKUP. The WHERE clause determines that
only the rows that have matching values for PARTNO will be in the resulting
table. The table REPORT2 has the quantity and description for each part that is
in both input tables:

   proc sql;
      create table report2 as
         select *
            from primary, lookup
            where primary.partno=lookup.partno;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

Example 3.9   Performing a Table Lookup with Large
              Nonindexed Data Sets
Goal
values remain fairly constant. :
Strategy   Use a table lookup technique that relates data using a user-written format
rather than sequentially processing both data sets. Dynamically build the
format and retrieve the formatted values. This technique is efficient when you
have a large data set whose retrieved values remain fairly constant and when
no index is otherwise needed for the data sets.

First, create a data set that you can use to pass information from the lookup file
to the FORMAT procedure to dynamically build the format. Specify this data
set in the CNTLIN= option as input to the FORMAT procedure. PROC
FORMAT uses the data in the input control data set to build the format. Create
a new data set by reading observations from the primary file and using the
PUT function to apply the formatted values of the common variable to a new
variable.

See "A Closer Look" for more information on dynamically building
formats and retrieving values.

You can perform the same task with PROC SQL; see "Related Technique."

Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with your
data.

Input Data Sets

Both data sets have the common variable PARTNO.

           PRIMARY                          LOOKUP

   OBS    PARTNO    QUANTITY       OBS    PARTNO    DESC

    1      A220         4           1      A401     tuning peg
    2      A498         4           2      A025     bridge
    3      A063         8           3      A203     nut
    4      A810         4           4      A220     neck
                                    5      A810     pick guard
                                    6      A063     pickup
                                    7      A047     pot
                                    8      A608     volume knob
                                    9      A097     toggle switch
                                   10      A498     body
Resulting Data Sets

Output 3.9a   REPORT1 Data Set
REPORT1 was created with the DATA step.

                    REPORT1

   OBS    PARTNO    QUANTITY    DESC

    1      A220         4       neck
    2      A498         4       body
    3      A063         8       pickup
    4      A810         4       pick guard

Output 3.9b   REPORT2 Data Set
REPORT2 was created with PROC SQL.

                    REPORT2

   OBS    PARTNO    QUANTITY    DESC

    1      A220         4       neck
    2      A498         4       body
    3      A063         8       pickup
    4      A810         4       pick guard

Program   The objective is to create a new data set that includes all of the data from
PRIMARY and only the corresponding descriptive information from
LOOKUP. The resulting data set, REPORT1, contains the part number,
quantity, and description. Create the data set FORMATS, which takes the
information contained in LOOKUP, renaming variables and adding a
FMTNAME variable so that PROC FORMAT can use it to dynamically build
the format $PARTS. Execute PROC FORMAT. Then create REPORT1, which
reads from PRIMARY and applies the formatted values of the key variable
PARTNO from LOOKUP to the new variable DESC:

Create the control data set FORMATS. Read from LOOKUP and rename the
variables that are required for the CNTLIN= data set. Rename PARTNO to
START and rename DESC to LABEL. Assign the required variable FMTNAME
the value $PARTS.

   data formats;
      set lookup(rename=(partno=start desc=label));
      fmtname='$parts';
   run;

Use CNTLIN= to build the format $PARTS. dynamically.

   proc format cntlin=formats;
   run;

Create the data set REPORT1. Read from PRIMARY and create the new
variable DESC. The PUT function relates the values of the format $PARTS. to
the common variable PARTNO from LOOKUP, and the results are stored in the
new character variable DESC.

   data report1;
      set primary;
      desc=put(partno,$parts.);
   run;

A Closer Look   Using Formats to Perform a Table Lookup

When you need to perform a table lookup, a common technique is to use a
merge operation with the IN= option and the BY statement. However, when
you have a large primary data set and a small, unsorted lookup data set, using
formats is much more efficient. A user-written format created with PROC
FORMAT uses a binary search technique to take the input value and match it
with the appropriate output value. The binary search searches half or less of the
master file sequentially and does not require that you sort the data. On average,
a binary search requires int(log2(n)) seek operations to find the desired key
value. This solution is preferable as the size of the lookup table increases.

Building a Format Dynamically with CNTLIN=

This example uses FORMATS, a temporary data set, which is specified with
the CNTLIN= option as input to the FORMAT procedure. PROC FORMAT
uses the contents of this data set to construct formats and informats. A data set
specified with CNTLIN= must contain the required variables START,
LABEL, and FMTNAME. Once the data set is created with the required
variables, you can use CNTLIN= to build formats dynamically (in this
example, $PARTS.).

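
A format built this way returns the original value unchanged for any PARTNO that has no entry in LOOKUP, so an unmatched part number would appear in DESC as if it were a description. The following variation is a sketch of one way to make such cases visible: it appends one extra row to the control data set and marks it, through the optional CNTLIN= variable HLO, as the OTHER range. The fallback text and the length of 20 for LABEL are assumptions made for illustration; adjust the length to fit the longest description in your data.

```sas
/* Sketch: control data set with an added OTHER row for unmatched values. */
data formats;
   length label $ 20;              /* assumed length, see note above     */
   set lookup(rename=(partno=start desc=label)) end=last;
   fmtname='$parts';
   output;                         /* one row per lookup entry           */
   if last then do;
      hlo='O';                     /* O marks the OTHER range            */
      label='** not found **';     /* illustrative fallback text         */
      output;
   end;
run;

proc format cntlin=formats;
run;
```

With the sample data in this example every PARTNO in PRIMARY has a match in LOOKUP, so the resulting DESC values do not change.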
Related Technique   If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. Using the PRIMARY table,* PROC
SQL creates a new table, REPORT2, that has a new column, DESC. The PUT
function assigns values to DESC by using the $PARTS. format, created earlier,
with the PARTNO column. The $PARTS. format contains a description for
each part represented in the PARTNO column:

   proc sql;
      create table report2 as
         select *, put(partno,$parts.) as desc
            from primary;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

Example 3.10   Performing a Table Lookup Using a
               Composite Index When the Transaction Data
               Set Contains Duplicate Values

Goal   Combine two data sets by using the value of specific variables to locate
information in an auxiliary or lookup data set and add it to information from
the primary data set.

Strategy   Use the iterative action of the DATA step to read the primary file sequentially.
Directly access observations in the lookup file by using a composite index.
Specify the composite index with the KEY= option on the SET statement and
use the UNIQUE option to force each search for a match to begin at the
beginning of the index. Because the primary file contains consecutive
duplicate values of the variables represented in the composite index, some
existing matches might not be found unless each search begins at the beginning
of the index. Use the _IORC_ automatic variable and the SYSRC autocall
macro in error-checking logic to direct execution to the appropriate code path.

You can perform the same task with PROC SQL; see "Related Technique."

Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with your
data.
Input Data Sets

STORE and LOC are common to both the PRIMARY and LOOKUP data sets.
LOOKUP has a composite index on STORE and LOC. PRIMARY contains
duplicate observations with the same values for STORE and LOC.

                    PRIMARY

   OBS    STORE    LOC    ITEM      AMOUNT

    1       1      233    DEBIT       $350
    2       1      233    DEBIT       $550
    3       1      735    DEBIT       $650
    4       1      735    DEBIT       $250
    5       1      233    CREDIT      $450
    6       1      233    CREDIT      $300
    7       2      222    DEBIT        $20
    8       2      222    DEBIT        $10
    9       2      444    CREDIT      $775
   10       2      444    CREDIT      $995
   11       2      399    CREDIT    $1,000
   12       2      399    CREDIT    $2,500

                    LOOKUP

   OBS    STORE    LOC    STORNAME         CITY

    1       1      233    Lynn's Finest    St Thomas
    2       1      735    Lynn's Finest    San Diego
    3       1      234    Lynn's Finest    Orlando
    4       2      222    Just 4 You       San Francisco
    5       2      444    Just 4 You       New York
    6       2      399    Just 4 You       Boston
• For an explanation of the behavior of SET ,vilh KBY= when duplicates exist, see SAS
Technical Report P-242, SAS Software: Ch;"ges 011d E,1/1a11ceme11/s, Releose 6.08, page 14.
Combining a Single Observation with Multiple Observations □ Example 3.10

Resulting Data Sets

Output 3.10a  REPORT1 Data Set

REPORT1 was created with the DATA step.

                            REPORT1

   OBS   STORNAME         CITY             ITEM      AMOUNT
     1   Lynn's Finest    St Thomas        DEBIT       $350
     2   Lynn's Finest    St Thomas        DEBIT       $550
     3   Lynn's Finest    San Diego        DEBIT       $650
     4   Lynn's Finest    San Diego        DEBIT       $250
     5   Lynn's Finest    St Thomas        CREDIT      $450
     6   Lynn's Finest    St Thomas        CREDIT      $300
     7   Just 4 You       San Francisco    DEBIT        $20
     8   Just 4 You       San Francisco    DEBIT        $10
     9   Just 4 You       New York         CREDIT      $775
    10   Just 4 You       New York         CREDIT      $995
    11   Just 4 You       Boston           CREDIT    $1,000
    12   Just 4 You       Boston           CREDIT    $2,500

Output 3.10b  REPORT2 Data Set

REPORT2 was created with PROC SQL.

                            REPORT2

   OBS   STORNAME         CITY             ITEM      AMOUNT
     1   Lynn's Finest    St Thomas        DEBIT       $350
     2   Lynn's Finest    St Thomas        DEBIT       $550
     3   Lynn's Finest    San Diego        DEBIT       $650
     4   Lynn's Finest    San Diego        DEBIT       $250
     5   Lynn's Finest    St Thomas        CREDIT      $450
     6   Lynn's Finest    St Thomas        CREDIT      $300
     7   Just 4 You       San Francisco    DEBIT        $20
     8   Just 4 You       San Francisco    DEBIT        $10
     9   Just 4 You       New York         CREDIT      $775
    10   Just 4 You       New York         CREDIT      $995
    11   Just 4 You       Boston           CREDIT    $1,000
    12   Just 4 You       Boston           CREDIT    $2,500

Program  The objective is to create a new data set that includes information from
PRIMARY and the corresponding descriptive information from LOOKUP.
The resulting data set, REPORT1, contains the store name, city, item, and
amount. Read an observation from PRIMARY using sequential access. Using
the composite index STORLOC, read an observation from LOOKUP directly
based on the current values of variables STORE and LOC. Because
PRIMARY contains duplicate values, you must begin each search on the
STORLOC index for the LOOKUP data set at the beginning. Otherwise, you
would miss matches in LOOKUP for consecutive duplicate values of
STORE and LOC in PRIMARY.
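
The program assumes that LOOKUP already has a composite index named STORLOC.
This section does not show how that index was built. One common way, sketched
here under the assumption that LOOKUP is stored in the WORK library and does
not yet have the index, is the INDEX CREATE statement in PROC DATASETS:

```sas
/* Sketch: build the composite index STORLOC on LOOKUP.          */
/* Assumption: LOOKUP is in the WORK library and is not indexed. */
proc datasets library=work nolist;
   modify lookup;
   index create storloc=(store loc);   /* index name=(key variables) */
quit;
```

The index name on the left of the equal sign is the name that KEY= refers to in
the SET statement. Running PROC CONTENTS on LOOKUP afterward is one way to
confirm that the index exists.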

Create REPORT1. Read an observation from PRIMARY.

   data report1(drop=store loc);
      set primary;

Read from LOOKUP with direct access, based on values in the composite index
STORLOC. UNIQUE causes the search to always begin at the beginning of the index,
so that consecutive duplicate values in PRIMARY will not miss a match in
LOOKUP.

      set lookup key=storloc/unique;

When the current values of STORE and LOC from PRIMARY match a STORLOC
index value from LOOKUP, write an observation to REPORT1. When the value
of _IORC_ corresponds to _SOK, there is a match.

      select (_iorc_);
         when (%sysrc(_sok)) output;

When the current values of STORE and LOC from PRIMARY do not match a
STORLOC index value from LOOKUP, write a warning message to the log. When
the value of _IORC_ corresponds to _DSENOM, there is no match. The PUT
statement writes a message to the log. Setting _ERROR_ to 0 prevents the error
condition from writing the entire contents of the program data vector to the log.

         when (%sysrc(_dsenom))
            do;
               put 'WARNING: New Location not in Table ' store= loc=;
               _error_=0;
            end;

In case of an unexpected _IORC_ condition, write an error message to the
SAS log and stop execution. When _IORC_ corresponds to anything other than
_DSENOM or _SOK, an unexpected condition has been encountered, so an error
message is written to the SAS log and the STOP statement terminates the DATA
step. _ERROR_ is reset to 0 to prevent an error condition that would write the
contents of the program data vector to the log.

         otherwise
            do;
               put 'Unexpected ERROR: _IORC_ = ' _iorc_;
               _error_=0;
               stop;
            end;
      end;
   run;

Related Technique  If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. PROC SQL joins the tables* to
produce a new table, REPORT2. The REPORT2 table is the same as
REPORT1.

Conceptually, the join results in an internal table that matches every row in
PRIMARY with every row in LOOKUP. However, you want only the rows
where the values for STORE and LOC are the same. The WHERE clause
returns the rows from the join that have the same values for STORE and LOC.
Thus, the result is a table that includes information from both tables, based on
the columns they have in common:

   proc sql;
      select storname, city, item, amount
         from primary p, lookup l
         where p.store=l.store and p.loc=l.loc;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.
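
As printed here, the query displays a report but does not itself store the
result. A minimal sketch of one way to save the joined rows as the table
REPORT2, assuming PRIMARY and LOOKUP exist as shown in the input data sets, is
to place a CREATE TABLE statement in front of the same SELECT clause:

```sas
/* Sketch: store the join result as REPORT2 instead of printing it. */
proc sql;
   create table report2 as
      select storname, city, item, amount
         from primary p, lookup l
         where p.store=l.store and p.loc=l.loc;
quit;
```

Because the SELECT clause now follows CREATE TABLE, no report is produced; a
separate PROC PRINT step can be used to display REPORT2.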

Example 3.11  Performing a Table Lookup with a Large Lookup Data Set That
Is Indexed

Goal  Efficiently combine two data sets when the lookup data set is large and has an
index.

Strategy  Use a table lookup technique that is especially appropriate for a large lookup
data set. Perform a table lookup using an index to locate observations that have
key values equal to the current value of the key variable. Read from the
primary file sequentially. Then to read the lookup data set, use the SET
statement with the KEY= option to access the observations directly.
Observations are written to the output data set only when a match occurs in the
lookup data set for the key value supplied by the primary data set. Use
error-checking logic to direct execution to the appropriate code path.

You can perform the same task with PROC SQL; see "Related Technique."

Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with your
data.

Input Data Sets

PARTNO is common to both the PRIMARY and LOOKUP data sets.
PRIMARY contains no consecutive duplicate values for PARTNO.* Because
the program depends on directly accessing observations in LOOKUP by using
KEY=PARTNO, LOOKUP must be indexed on the variable PARTNO.

            PRIMARY

   OBS   PARTNO   QUANTITY
     1    A063        8
     2    A220        4
     3    A498        4
     4    A777        3
     5    A810        4

            LOOKUP

   OBS   PARTNO   DESC
     1    A401    tuning peg
     2    A025    bridge
     3    A203    nut
     4    A220    neck
     5    A810    pick guard
     6    A063    pickup
     7    A047    pot
     8    A608    volume knob
     9    A09?    toggle switch
    10    A498    body

* This program works as expected only if PRIMARY contains no consecutive observations
with the same value for PARTNO. For an explanation of the behavior of SET with KEY=
when duplicates exist, see SAS Technical Report P-242, SAS Software: Changes and
Enhancements, Release 6.08, page 14.

Resulting Data Sets

Output 3.11a  REPORT1 Data Set

REPORT1 was created with the DATA step.

               REPORT1

   OBS   PARTNO   QUANTITY   DESC
     1    A063        8      pickup
     2    A220        4      neck
     3    A498        4      body
     4    A810        4      pick guard

Output 3.11b  REPORT2 Data Set

REPORT2 was created with PROC SQL.

               REPORT2

   OBS   PARTNO   QUANTITY   DESC
     1    A063        8      pickup
     2    A220        4      neck
     3    A498        4      body
     4    A810        4      pick guard

Program  The objective is to create a new data set that includes all of the information
from PRIMARY and only the corresponding descriptive information from
LOOKUP. The resulting data set, REPORT1, contains the part number,
quantity, and description.

First, read an observation from PRIMARY. Then, use the SET statement with
the KEY= option to read an observation from LOOKUP based on the current
value of PARTNO. To verify whether a matching value in LOOKUP has been
located for the current value of PARTNO in PRIMARY, use the SYSRC
autocall macro and the _IORC_ automatic variable. When a match is found,
write the observation. When no match is found, write a warning message to the
SAS log, reset _ERROR_ to 0, and continue processing. When an unexpected
condition is encountered, write an error message and stop execution:
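
The program requires an index on PARTNO in LOOKUP, but this example does not
show the index being created. As a minimal sketch, assuming LOOKUP already
exists and has no index yet, a simple index can be defined with PROC SQL:

```sas
/* Sketch: create a simple index on PARTNO in LOOKUP.              */
/* For a simple index, the index name must match the column name.  */
proc sql;
   create index partno on lookup(partno);
quit;
```

Once the index exists, SET LOOKUP KEY=PARTNO can retrieve the matching
observation directly instead of reading LOOKUP sequentially.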

Create REPORT1. Read an observation from PRIMARY.

   data report1;
      set primary;

Read an observation from LOOKUP based on the value of the key variable, PARTNO.
The SET statement with KEY= accesses an observation in LOOKUP directly through
the index, using the current value of PARTNO.

      set lookup key=partno;

When an observation from LOOKUP has been successfully located and retrieved,
write it to REPORT1. When the value of _IORC_ corresponds to _SOK, the value of
PARTNO in the observation being read from LOOKUP matches the current
PARTNO value from PRIMARY. (For information on _IORC_ and SYSRC, see
the Appendix.)

      select (_iorc_);
         when (%sysrc(_sok)) output;

When no match is found, write a warning message to the SAS log. When the value of
_IORC_ corresponds to _DSENOM, no observations in LOOKUP contain the
current value of PARTNO. _ERROR_ is reset to 0 to prevent an error condition that
would write the contents of the program data vector to the log.

         when (%sysrc(_dsenom))
            do;
               put 'WARNING: Part number ' partno 'is not in lookup table.';
               _error_=0;
            end;

In case of an unexpected _IORC_ condition, write an error message to the
SAS log and stop execution. When _IORC_ corresponds to anything other than
_DSENOM or _SOK, an unexpected condition has been encountered, so an error
message is written to the SAS log and the STOP statement terminates the DATA step.

         otherwise
            do;
               put 'Unexpected ERROR: _IORC_ = ' _iorc_;
               stop;
            end;
      end;
   run;

Related Technique  If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. PROC SQL joins the tables* to
produce a new table, REPORT2. The REPORT2 table is the same as
REPORT1, except for the order of the data. The difference in order is a result
of the join method chosen by the internal optimizer.

Conceptually, a join results in an internal table that matches every row in
PRIMARY with every row in LOOKUP. The WHERE clause determines that
the join will return only the rows that have matching values for PARTNO. The
table REPORT2 has the quantity and description for each part that is in both
input tables:

   proc sql;
      select *
         from primary, lookup
         where primary.partno=lookup.partno;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

Chapter 4  Combining Multiple Observations with Multiple Observations

The many-to-many category implies that multiple observations from each
input data set may be related based on the values of a common variable.

4.1   Adding Variables from a Transaction Data Set to a Master Data Set
4.2   Updating a Master Data Set with Only Nonmissing Values from a
      Transaction Data Set
4.3   Generating Every Combination of Observations (Cartesian Product)
      between Data Sets
4.4   Generating Every Combination of Observations between Data Sets
      Based on a Common Variable
4.5   Delaying Final Disposition of Observations Until All Processing
      Is Complete
4.6   Generating Every Combination between Data Sets, Based on
      a Common Variable When an Index Is Available
4.7   Combining Multiple Data Sets without a Variable Common to All
      the Data Sets
4.8   Interleaving Nonsorted Data Sets
4.9   Interleaving Data Sets Based on a Common Variable
4.10  Comparing All Observations with the Same BY Values

Example 4.1  Adding Variables from a Transaction Data Set to a Master Data Set

Goal  Based on the values of a common variable, produce a new data set by
combining variables from a master data set and a transaction data set. Include
only observations that the master data set contains.

Strategy  Use the MERGE statement with the BY statement to combine the observations
from the two data sets. Use the IN= data set option to indicate whether the
master data set contributed to an observation. To get the desired results, set the
value of the temporary variable created with IN= to 0 at the top of the DATA
step to reset the value when the BY variable changes. While merging
observations within each BY group, use the subsetting IF statement to allow
the DATA step to complete the current iteration and to write an observation
only when the master data set has contributed to it.

This match-merge operation requires that each data set either have an index on
the BY variable or be sorted by the values of the BY variable.
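
If the input data sets are neither indexed nor already in order, the sorting
requirement can be met before the merge. The following is a minimal sketch,
assuming that MASTER and TRANS exist in the WORK library and that sorting them
in place is acceptable:

```sas
/* Sketch: put both inputs in BY-variable order before the match-merge. */
proc sort data=master;
   by name;
run;

proc sort data=trans;
   by name;
run;
```

PROC SORT replaces each data set with its sorted version unless the OUT=
option names a different output data set.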

Input Data Sets

Both the master and transaction data sets may contain duplicate values for the BY
variable NAME.

         MASTER                    TRANS

   OBS   NAME     Y          OBS   NAME    Z
     1   John   1111           1   John   89
     2   John   2222           2   John   94
     3   John   3333           3   John   83
     4   Mary   1111           4   Mary   77
                               5   Mary   88
                               6   Mary   99

Resulting Data Set

Output 4.1a  COMBINED Data Set

           COMBINED

   OBS   NAME     Y      Z
     1   John   1111    89
     2   John   2222    94
     3   John   3333    83
     4   Mary   1111    77

Output 4.1b  COMBINE2 Data Set

COMBINE2 contains two additional observations for MARY that did not exist in
MASTER. See "Related Technique."

           COMBINE2

   OBS   NAME     Y      Z
     1   John   1111    89
     2   John   2222    94
     3   John   3333    83
     4   Mary   1111    77
     5   Mary   1111    88
     6   Mary   1111    99

Program  The objective is to combine observations from MASTER and TRANS based
on the values of a common variable, including only those observations to
which MASTER contributes. Use the MERGE and BY statements to combine
observations from the two data sets. Use the subsetting IF statement and the
IN= data set option to determine when MASTER has contributed variables.
Reset the IN= variable to 0 at the top of the DATA step. Otherwise this value,
which is retained until the BY group changes, may cause the DATA step to
write additional observations to COMBINED from TRANS.

Create COMBINED. Combine observations from MASTER and TRANS
based on the matching values for the BY variable NAME. IN= creates INMAST,
which is set to 1 when an observation from MASTER contributes to the current
observation. INMAST is set to 0 at the top of the DATA step so that a previous value
of 1 is not retained until the BY group changes.

   data combined;
      inmast=0;
      merge master(in=inmast) trans;
      by name;

Allow the DATA step to complete the current iteration and write an observation
to COMBINED only if MASTER has contributed to it.

      if inmast;
   run;

Related Technique  The preceding program writes an observation to COMBINED only if the
MASTER data set contributed. In the input data sets, TRANS contains three
observations with the value of MARY for NAME, but MASTER contains only
one. If in your application you want the resulting output data set to contain
multiple observations when the transaction data set does but the master data set
does not, then simply remove the assignment statement that sets the value of
the IN= variable to 0. In this example, if you do not reset INMAST to 0, its
value is retained throughout the BY group. Three observations, therefore,
containing the value MARY for NAME are created and written to the output
data set COMBINE2. See Output 4.1b.

   data combine2;
      merge master(in=inmast) trans;
      by name;
      if inmast;
   run;

Example 4.2  Updating a Master Data Set with Only Nonmissing Values from a
Transaction Data Set

Goal  Update a master data set with values from a transaction data set, except when
the transaction data set contains missing values for the variable being updated.

Strategy  Use the MERGE statement with the BY statement to update values in a master
data set with values from a transaction data set. Use IF-THEN logic in
conjunction with the RENAME= data set option to apply transaction values
only if they are not missing values.

This match-merge operation requires that each data set either have an index on
the BY variable or be sorted by the values of the BY variable.

Input Data Sets

Both MASTER and TRANS contain duplicate values for the BY variable ITEM.
The data sets are sorted by the values of ITEM.

          MASTER                       TRANS

   OBS   ITEM      PRICE        OBS   ITEM      PRICE
     1   apple     $1.99          1   banana    $1.05
     2   apple     $2.89          2   grapes    $2.75
     3   apple     $1.49          3   orange    $1.49
     4   grapes    $1.69          4   orange      .
     5   grapes    $2.46          5   orange    $2.39
     6   orange    $2.29
     7   orange    $1.89
     8   orange    $2.19
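
To experiment with this example, the two input data sets can be re-created
from the listings above. The following is a sketch only: the INPUT layout (list
input, with a period standing for the missing price) and the DOLLAR5.2 format
on the input data sets are assumptions, not something the example specifies.

```sas
/* Sketch: re-create MASTER and TRANS, already in ITEM order. */
data master;
   input item $ price;
   format price dollar5.2;
   datalines;
apple 1.99
apple 2.89
apple 1.49
grapes 1.69
grapes 2.46
orange 2.29
orange 1.89
orange 2.19
;
run;

data trans;
   input item $ price;      /* a period is read as a missing value */
   format price dollar5.2;
   datalines;
banana 1.05
grapes 2.75
orange 1.49
orange .
orange 2.39
;
run;
```

Because the data lines are entered in ITEM order, no PROC SORT step is needed
before the match-merge.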

Resulting Data Set

Output 4.2  COMBINE Data Set

          COMBINE

   OBS   ITEM      PRICE
     1   apple     $1.99
     2   apple     $2.89
     3   apple     $1.49
     4   banana    $1.05
     5   grapes    $2.75
     6   grapes    $2.75
     7   orange    $1.49
     8   orange    $1.89
     9   orange    $2.39

Program  The objective is to update the observations in MASTER whose ITEM values
have a match in TRANS, except when PRICE, the variable
being updated, has a missing value in TRANS. The variable PRICE in TRANS
is renamed NEWPRICE so that in the program data vector its value does not
automatically overlay the value of PRICE read from MASTER. When the
value of NEWPRICE is not missing in TRANS, use IF-THEN processing to
assign its value to the PRICE variable in MASTER. Otherwise, use the
existing value of PRICE in MASTER:

Create COMBINE. Combine observations from MASTER and TRANS based on the
matching values for ITEM. RENAME= renames the variable PRICE in TRANS for
later processing with the IF-THEN statement.

   data combine(drop=newprice);
      merge master trans(rename=(price=newprice));
      by item;

When NEWPRICE is not equal to missing, use its value to update the MASTER value
of PRICE based on the current value of ITEM. If the value of NEWPRICE is
missing, PRICE retains its original value from MASTER.

      if newprice ne . then price=newprice;
      format price dollar5.2;
   run;

Example 4.3  Generating Every Combination of Observations (Cartesian Product)
between Data Sets

Goal  Combine two tables* that have no common columns in order to produce every
possible combination of rows.

Strategy  Join the two tables with PROC SQL. When you join two tables without
specifying join criteria in a WHERE clause, you get a Cartesian product. A
Cartesian product shows every possible combination of rows from the tables
being joined. PROC SQL joins the tables listed in the FROM clause.

Input Data Sets

              TRIPS                               ATTENDS

   OBS   DEST             TRAVCODE        OBS   NAME              LEVEL
     1   DETROIT            C751            1   Kreuger, John       1
     2   SAN FRANCISCO      C288            2   Angler, Erica       2
     3   ST THOMAS          A054            3   Ng, Sebastian       1
     4   HAWAII             P003            4   Sook, Joy           3
     5   BERMUDA            A059            5   Silverton, Lou      2

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

Resulting Data Set

Output 4.3  FLIGHTS Table

                         FLIGHTS

   OBS   DEST             TRAVCODE   NAME              LEVEL
     1   DETROIT            C751     Kreuger, John       1
     2   DETROIT            C751     Angler, Erica       2
     3   DETROIT            C751     Ng, Sebastian       1
     4   DETROIT            C751     Sook, Joy           3
     5   DETROIT            C751     Silverton, Lou      2
     6   SAN FRANCISCO      C288     Kreuger, John       1
     7   SAN FRANCISCO      C288     Angler, Erica       2
     8   SAN FRANCISCO      C288     Ng, Sebastian       1
     9   SAN FRANCISCO      C288     Sook, Joy           3
    10   SAN FRANCISCO      C288     Silverton, Lou      2
    11   ST THOMAS          A054     Kreuger, John       1
    12   ST THOMAS          A054     Angler, Erica       2
    13   ST THOMAS          A054     Ng, Sebastian       1
    14   ST THOMAS          A054     Sook, Joy           3
    15   ST THOMAS          A054     Silverton, Lou      2
    16   HAWAII             P003     Kreuger, John       1
    17   HAWAII             P003     Angler, Erica       2
    18   HAWAII             P003     Ng, Sebastian       1
    19   HAWAII             P003     Sook, Joy           3
    20   HAWAII             P003     Silverton, Lou      2
    21   BERMUDA            A059     Kreuger, John       1
    22   BERMUDA            A059     Angler, Erica       2
    23   BERMUDA            A059     Ng, Sebastian       1
    24   BERMUDA            A059     Sook, Joy           3
    25   BERMUDA            A059     Silverton, Lou      2

Program  Because each flight attendant in ATTENDS flies to each destination, the
objective is to produce a table that shows every possible combination of
NAME and DEST:

Invoke PROC SQL and create a table. The CREATE TABLE statement creates the
table FLIGHTS to store the results of the subsequent query.

   proc sql;
      create table flights as

Select the columns. The SELECT clause selects all of the columns from the tables
specified in the FROM clause.

         select *

Name the tables to join and query.

            from trips, attends;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.
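
For comparison, the same Cartesian product can be sketched in a DATA step by
reading TRIPS sequentially and rereading all of ATTENDS with direct access for
each trip. This is an illustrative alternative, not the method the example
uses; FLIGHTS2 is a hypothetical data set name chosen so that FLIGHTS is not
replaced.

```sas
/* Sketch: DATA step equivalent of the Cartesian join.              */
/* FLIGHTS2 is a hypothetical name; I and N are temporary variables */
/* (POINT= and NOBS=) and are not written to the output data set.   */
data flights2;
   set trips;                        /* read TRIPS sequentially       */
   do i=1 to n;                      /* N is assigned at compile time */
      set attends point=i nobs=n;    /* read ATTENDS by obs number    */
      output;                        /* one row per TRIPS x ATTENDS   */
   end;
run;
```

With the five observations in TRIPS and five in ATTENDS shown above, this step
writes 25 observations. The DATA step ends when the sequential SET statement
reaches the end of TRIPS, so no STOP statement is needed.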

Example 4.4  Generating Every Combination of Observations between Data Sets
Based on a Common Variable

Goal  Combine two tables* that have a common column. The common column has
duplicate values in both tables. Produce a table that shows the possible
combination of rows where the values from the common column match.

Strategy  Join the two tables with PROC SQL. The join produces all possible
combinations of rows from both tables. Use a WHERE clause to choose only
those rows where the values from the common column match. Order the query
result to make the data easier to process in subsequent steps. You do not have
to sort the data prior to joining the tables.

This technique of showing the possible combinations of observations is useful
for producing a table that can be manipulated further.

Input Data Sets

          ROSTER                          SCHEDULE

   OBS   GRADE   STUDENT        OBS   GRADE   HOMEROOM   LOCATION
     1     11    Jon              1     11        6      room4
     2      9    Rick             2     10        3      room1
     3     10    Amber            3     12        8      library
     4     12    Susan            4     10        4      room2
     5     10    Cindy            5     11        5      room3
     6     11    Ginny            6     10        2      cafe
     7     10    Denise           7     11        7      shop
     8     12    Lynn             8      9        1      gym
     9     11    Michael
    10     12    Jake

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

Resulting Data Set

Output 4.4  ASSIGN Table

                      ASSIGN

   OBS   STUDENT   GRADE   HOMEROOM   LOCATION
     1   Amber       10        2      cafe
     2   Amber       10        4      room2
     3   Amber       10        3      room1
     4   Cindy       10        4      room2
     5   Cindy       10        3      room1
     6   Cindy       10        2      cafe
     7   Denise      10        4      room2
     8   Denise      10        2      cafe
     9   Denise      10        3      room1
    10   Ginny       11        5      room3
    11   Ginny       11        6      room4
    12   Ginny       11        7      shop
    13   Jake        12        8      library
    14   Jon         11        6      room4
    15   Jon         11        5      room3
    16   Jon         11        7      shop
    17   Lynn        12        8      library
    18   Michael     11        5      room3
    19   Michael     11        6      room4
    20   Michael     11        7      shop
    21   Rick         9        1      gym
    22   Susan       12        8      library

Program  The objective is to produce a table that shows all of the possible homeroom
locations for each student, based on grade.

Join the two tables to find all of the possible combinations of STUDENT and
LOCATION. Use the GRADE column to join the tables.* Choose only those
rows where the values for GRADE match. Order the data by STUDENT:

Invoke PROC SQL and create a table. The CREATE TABLE statement creates the
table ASSIGN to store the results of the subsequent query.

   proc sql;
      create table assign as

Select the columns. The SELECT clause selects the specified columns from the
tables specified in the FROM clause. Because GRADE is in both tables, you need
to qualify the name by prefixing the table name to the column name.

         select student, roster.grade, homeroom, location

Name the tables to join and query.

            from roster, schedule

Specify the join criterion.

            where roster.grade=schedule.grade

Order the resulting rows by the students' names.

            order by student;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.

* The columns that you join on do not have to have the same name.
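
The same join can also be written with the explicit INNER JOIN ... ON syntax
that PROC SQL accepts, which keeps the join criterion next to the table names.
This is a sketch of an equivalent form, not the program the example uses;
ASSIGN2 is a hypothetical table name chosen so that ASSIGN is not replaced.

```sas
/* Sketch: the same query with explicit join syntax.              */
/* ASSIGN2 is a hypothetical name used to keep ASSIGN unchanged.  */
proc sql;
   create table assign2 as
      select student, roster.grade, homeroom, location
         from roster inner join schedule
            on roster.grade=schedule.grade
         order by student;
quit;
```

If the join columns had different names in the two tables, only the ON
expression would change; each side would name its own column.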

Example 4.5  Delaying Final Disposition of Observations Until All Processing
Is Complete

Goal  Search through a data set multiple times to find the closest match based on
calculated criteria, not on matching values of common variables. Flag
observations for subsequent processing based on those criteria.

Strategy  Flag observations in a data set for further processing by reading one data set
sequentially and another data set directly using the POINT= option. Set up an
array with one element for each observation in the second data set, the one you
read directly.

Read an observation from the first data set, then begin reading observations
from the second data set, looking for values of one or more variables that meet
a certain condition set by a value from the observation in the first data set. Use
the iterative DO loop and the POINT= option to read all observations not
marked as already used from the second data set. Continue reading
observations and comparing values to see if a better match occurs in the second
data set.

After the entire second data set has been processed to locate the best match for
the current observation in the first data set, write an observation that contains
the best match to the output data set. Mark the selected observation from the
second data set as used.

After all observations from the first data set have been processed, write to
another output data set all observations from the second data set that were not
paired with observations from the first data set.

Input Data Sets

ROOMS indicates the seating capacity and availability of demo facilities for six
currently unscheduled meeting rooms.

                 ROOMS

   OBS   ROOM   DEMOFAC   CAPACITY
     1   R100      N          10
     2   R200      Y          15
     3   R301      Y          30
     4   R305      N          50
     5   R400      Y          60
     6   R420      Y         100

MEETINGS contains an observation for each meeting that needs to be scheduled, the
number expected to attend, and whether demo facilities are needed.

                    MEETINGS

   OBS   NUMATT   DEMO   DESC
     1     10      Y     Operator Training
     2     12      N     Sales Meeting
     3     40      Y     Marketing Presentation
     4     60      N     Division Meeting
     5     45      N     Employee Orientation

Resulting Data Sets

Output 4.5a  ASSIGN Data Set

ASSIGN lists the meetings and their assigned rooms. No room was found for the
last meeting.

                                   ASSIGN

   OBS   NUMATT   DEMO   DESC                      ROOM   DEMOFAC   CAPACITY
     1     10      Y     Operator Training         R200      Y          15
     2     12      N     Sales Meeting             R305      N          50
     3     40      Y     Marketing Presentation    R400      Y          60
     4     60      N     Division Meeting          R420      Y         100
     5     45      N     Employee Orientation      NONE

Output 4.5b  ROOMS Data Set

ROOMS contains only the rooms that remain unassigned.

                 ROOMS

   OBS   ROOM   DEMOFAC   CAPACITY
     1   R100      N          10
     2   R301      Y          30

Program  The objective is to find the most suitable room for a meeting based on the
number of attendees and the need for demo facilities. After the first suitable
match is found in ROOMS for an observation in MEETINGS, the rest of the
observations in ROOMS are searched in case there is an even more appropriate
match. "More appropriate" means that the room is closer in size to the number
of attendees or that demo facilities are not scheduled unless they are needed. In
this application, keeping demo facilities available was the highest priority.

First, determine the number of observations in ROOMS by using the NOBS=
option and write that number to a macro variable using CALL SYMPUT. In
the second DATA step, use sequential access to read an observation from
MEETINGS. Use the value of the macro variable to create an array with one
element for each observation in ROOMS. Create an iterative DO loop that
iterates once for each observation in ROOMS. When a room has been
scheduled for a meeting, the value of the appropriate element in the array
indicates that the room is currently scheduled. If a room is not already tagged
as scheduled, read an observation from ROOMS directly using the POINT=
option. Determine if the room is large enough; if it has demo facilities,
determine if the meeting requires them. Continue reading other observations
from ROOMS, testing to see if a more appropriate room is available. Set up
temporary variables to hold values for seating capacity and availability of
demo facilities so that you can compare those values to ones read from the next
observation as you search for an even more suitable room.

At the end of each DATA step iteration, write an observation to ASSIGN. If a
match was found, set the USED array element for the appropriate observation
from ROOMS to 1. If it wasn't, write the observation and indicate that no
room was assigned.

After processing all observations in MEETINGS, reread ROOMS with direct
access using SET with POINT= in a DO loop. Write observations to the new
version of ROOMS only for those rooms that remain unassigned.

Determine the number of observations in ROOMS and store that value in macro
variable NUM. No observations are actually read from ROOMS because 0 is never
true. ROOMS is opened so that the number of observations can be captured
from the data set descriptor information. By using CALL SYMPUT to store this number
in a macro variable, you can pass it to the next DATA step.

   data _null_;
      if 0 then set rooms nobs=nobs;
      call symput('num',left(put(nobs,8.)));
      stop;
   run;
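
In SAS 9 and later releases, the same value can be stored a little more
compactly with CALL SYMPUTX, which converts the number to text and removes
leading and trailing blanks on its own. This is a sketch of an alternative for
newer releases, not the statement the example uses; the %PUT line is added only
to display the result in the log.

```sas
/* Sketch for SAS 9 or later: CALL SYMPUTX replaces the          */
/* LEFT(PUT(...)) conversion used with CALL SYMPUT.              */
data _null_;
   if 0 then set rooms nobs=nobs;   /* never executes; NOBS= is set at compile time */
   call symputx('num', nobs);
   stop;
run;

%put NOTE: ROOMS has &num observations.;
```

Either form leaves NUM without surrounding blanks, which matters because the
next DATA step uses it to build the variable list USED1-USED&NUM.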
Create ASSIGN and a new version of ROOMS. Define the array USED. Retain
the values of the elements in the array across iterations and initialize their
values to 0. The macro variable NUM is equal to 6, the number of observations
in ROOMS. So USED1-USED6 are created to contain a value that indicates whether
an observation is flagged as already matched.

   data rooms(keep=room demofac capacity)
        assign(keep=desc room numatt demo demofac capacity);
      array used(*) used1-used&num;
      retain used1-used&num 0;

Read all observations from MEETINGS. Create a variable (DONE) that will equal 1
when the last observation is being processed. After the last observation has
been read, conditional processing can execute statements that create another
data set containing unassigned rooms.

      set meetings end=done;

Initialize temporary variables that will be used to store values of CAP, DEMO,
and OBS so that values can be compared between observations from ROOMS.

      tempcap = 999;
      tempdemo = demo;
      tempobs = .;

Begin a DO loop that will read each observation from ROOMS that is not
already flagged as used. This DO loop allows the program to scan the entire
ROOMS data set in an attempt to find the best unassigned match for each
observation in MEETINGS. USED(I) is true (equals 1) when an observation has
been designated as already assigned to a meeting. Because POINT= is used,
ROOMS is read with direct access instead of sequential access. If ROOMS were
processed sequentially here, reaching the end of it would end the DATA step,
and we would not be able to reread it multiple times.

      do i=1 to nobs;
         if used(i) ne 1 then
            do;
               set rooms point=i nobs=nobs;

Determine if the current room is large enough and if it has demo facilities if
they are necessary. CAPACITY and DEMOFAC are variables from ROOMS that contain
information about the meeting room in each observation. NUMATT and DEMO are
variables from MEETINGS that show how many people the room must accommodate
and if demo equipment is needed.

               if capacity >= numatt and (demofac=demo or demo='N') then
                  do;

Determine if the current room is a better fit than the previous choice. If the
CAPACITY value of the current room is smaller than that of the previous choice
(TEMPCAP) and if the status of demo facility is the same or not needed, then
the current room is a better choice. TEMPOBS is set to the value of I, the
number of the current observation. It will be used later to set the
appropriate member of the USED array to indicate that the room has been
selected.

                     if (capacity < tempcap) or
                        (demo='N' and tempdemo='Y') then
                        do;
                           tempcap = capacity;
                           tempdemo = demofac;
                           tempobs = i;
                        end;
                  end;   /* ends a DO group */

If an exact match is found, leave the iterative DO loop because there is no
need to search further.

               if tempcap=numatt and tempdemo=demo then leave;
            end;   /* ends a DO group */
      end;   /* ends an iterative DO loop */

If a room (from ROOMS) has been found for the current meeting (from
MEETINGS), then use the value of TEMPOBS to locate the appropriate
observation from ROOMS and reread it. Write the current observation
(containing information from MEETINGS and ROOMS) to ASSIGN. Set the value of
the USED array element that corresponds to the current observation from ROOMS
to 1.

      if tempobs ne . then
         do;
            set rooms point=tempobs;
            output assign;
            used(tempobs)=1;
         end;

If no room was selected, reset the values of ROOM, CAPACITY, and DEMOFAC
appropriately, and write the observation to ASSIGN, indicating that no
appropriate room was available.

      else
         do;
            room = 'NONE';
            capacity = .;
            demofac = ' ';
            output assign;
         end;

After all observations in MEETINGS have been processed to locate the best
available meeting room in ROOMS, use direct access to read each observation in
ROOMS. Write to the new version of the ROOMS data set only those observations
that are not flagged as used. The DIM function returns the number of elements
in an array. Using DIM prevents you from having to change the upper bound of
an iterative DO group if you later change the number of array elements.

      if done then
         do i=1 to dim(used);
            if not used(i) then
               do;
                  set rooms point=i;
                  output rooms;
               end;
         end;
   run;

Where to Go from Here   LEAVE statement. For a complete description with an example, see
pp. 34-35 in SAS Technical Report P-222, Changes and Enhancements to
Base SAS Software, Release 6.07.
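
As a quick illustration of how LEAVE behaves, the sketch below exits an iterative DO loop as soon as a condition is met. It is a stand-alone toy step, not taken from the technical report; the condition and the variable I are chosen only for illustration.

```sas
data _null_;
   do i = 1 to 10;
      if i*i > 20 then leave;   /* exit the loop at the first i whose square exceeds 20 */
   end;
   put i=;                      /* writes i=5 to the SAS log */
run;
```

After LEAVE executes, processing continues with the first statement after the END of the loop, and the index variable keeps the value it had when the loop was exited.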
Example 4.6
Generating Every Combination between Data
Sets, Based on a Common Variable When an
Index Is Available

Goal   Create a new data set that is a Cartesian product* of two input data sets.

Strategy   The overall strategy is to process the first data set sequentially using BY-group
processing and to process the second data set directly based on the value of a
key variable. (The variable common to both data sets is the BY variable for the
first data set and the key variable for the second data set.) Each time you find a
match, write an observation to the output data set. If there are consecutive
duplicate values for the common variable in the first data set, you must force
the pointer to return to the beginning of the index so that matching values in
the second data set will be retrieved and paired with the appropriate
observations in the first data set.

In detail, sort the first data set on the BY variable and index the second data set
on the same variable. Read observations from the first data set sequentially,
executing the SET statement in each iteration of the DATA step. Read an
observation from the second data set using SET with the KEY= option in a
DO UNTIL loop. Continue reading observations until there is no match for the
common variable.
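
The preparation described above (sorting the first data set and indexing the second) could be done as follows. This is a sketch under the assumption that SALES and STOCK, the data sets used later in this example, reside in the WORK library; the example itself does not show these steps.

```sas
/* Sort the sequentially read data set by the BY variable */
proc sort data=sales;
   by product;
run;

/* Create a simple index on the key variable of the directly read data set */
proc datasets library=work nolist;
   modify stock;
   index create product;
quit;
```

A simple index takes the name of its key variable, so KEY=PRODUCT in a later SET statement refers to this index.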

Use the SELECT group to conditionally execute statements based on whether
a match is found. If a match is found, write an observation to the output data
set. If a match is not found, take different actions, based on whether the current
observation from the first data set is the last one in the current BY group.
When it is the last in the BY group, take no additional action. The DO UNTIL
loop will end.

When the current observation from the first data set is not the last in the current
BY group, you must force repositioning in the index to the beginning.
Otherwise, consecutive duplicate values for the common variable in the first
data set cannot be paired with matching values in the second data set.

You can perform the same task with PROC SQL. See "Related Technique."

Note: Due to the variability of data and the number of conditions that
determine the path chosen by the PROC SQL optimizer, it is not always
possible to determine the most efficient method without first testing with your
data.

* In this example, a Cartesian product is a new data set that consists of every possible
combination of observations from the two input data sets, based on the value of a BY
variable.
Input Data Sets

The SALES data set is sorted by PRODUCT.

                      SALES

   OBS    PRODUCT    SALESREP    ORDERNUM
    1       310      Polanski    RAL5447
    2       310      Alvarez     CH1443
    3       312      Corrigan    DUR5523
    4       313      Corrigan    DUR5524
    5       313      Polanski    RAL5498

The STOCK data set is indexed by PRODUCT.

                      STOCK

   OBS    PRODUCT    PRDTDESC               PIECE
    1       310      oak pedestal table     310.01
    2       310      oak pedestal table     310.02
    3       310      oak pedestal table     310.03
    4       312      brass floor lamp       312.01
    5       312      brass floor lamp       312.02
    6       313      oak bookcase, short    313.01
    7       313      oak bookcase, short    313.02

Resulting Data Set

Output 4.6a SHIPLIST Data Set
SHIPLIST was created with the DATA step.

                                SHIPLIST

   OBS  PRODUCT  SALESREP  ORDERNUM  PRDTDESC             PCDESC
    1     310    Polanski  RAL5447   oak pedestal table   tabletop
    2     310    Polanski  RAL5447   oak pedestal table   pedestal
    3     310    Polanski  RAL5447   oak pedestal table   2 leaves
    4     310    Alvarez   CH1443    oak pedestal table   tabletop
    5     310    Alvarez   CH1443    oak pedestal table   pedestal
    6     310    Alvarez   CH1443    oak pedestal table   2 leaves
    7     312    Corrigan  DUR5523   brass floor lamp     lamp base
    8     312    Corrigan  DUR5523   brass floor lamp     lamp shade
    9     313    Corrigan  DUR5524   oak bookcase, short  bookcase
   10     313    Corrigan  DUR5524   oak bookcase, short  2 shelves
   11     313    Polanski  RAL5498   oak bookcase, short  bookcase
   12     313    Polanski  RAL5498   oak bookcase, short  2 shelves

Output 4.6b SHIPLST Data Set
SHIPLST was created with PROC SQL.

                                SHIPLST

   OBS  PRODUCT  SALESREP  ORDERNUM  PRDTDESC             PCDESC
    1     310    Polanski  RAL5447   oak pedestal table   tabletop
    2     310    Alvarez   CH1443    oak pedestal table   tabletop
    3     310    Polanski  RAL5447   oak pedestal table   pedestal
    4     310    Alvarez   CH1443    oak pedestal table   pedestal
    5     310    Polanski  RAL5447   oak pedestal table   2 leaves
    6     310    Alvarez   CH1443    oak pedestal table   2 leaves
    7     312    Corrigan  DUR5523   brass floor lamp     lamp base
    8     312    Corrigan  DUR5523   brass floor lamp     lamp shade
    9     313    Corrigan  DUR5524   oak bookcase, short  bookcase
   10     313    Polanski  RAL5498   oak bookcase, short  bookcase
   11     313    Corrigan  DUR5524   oak bookcase, short  2 shelves
   12     313    Polanski  RAL5498   oak bookcase, short  2 shelves

Program   The objective is to create a shipping list data set from one data set that shows
each item sold and from another data set that shows how many pieces need to
be packed for shipping each item. For example, an observation in SALES
shows that an oak pedestal table, item 310, was sold, and STOCK shows that
item 310 consists of three pieces: a top, a base, and two leaves. The resulting
data set SHIPLIST, therefore, will contain three observations for the first sold
item recorded in SALES.

First, SALES must be sorted by PRODUCT, and STOCK must be indexed on
PRODUCT. In the DATA step, read observations sequentially from SALES,
using the SET statement to read one observation on each DATA step iteration.
Specify PRODUCT in the BY statement so that you can use BY-group
processing.

Then read observations from STOCK directly. Use the SET statement and
specify PRODUCT as the key variable with the KEY= option. Place this
statement in a DO UNTIL loop that executes until there are no matches in
STOCK for the current value of PRODUCT in SALES. Each time a match is
found, write an observation to SHIPLIST. When no match occurs, take one of
two actions based on whether you've finished processing the current BY group
in SALES.

If the current observation is the last observation in SALES for the current BY
group, the DO UNTIL loop condition is met, the loop ends, and processing
returns to the top of the DATA step to read the first observation from the next
BY group in SALES.

If the current observation is not the last observation in SALES for the current
BY group, you must force the pointer to return to the beginning of the index so
that observations with matching PRODUCT values in STOCK will be found
and matched with observations from SALES. See "A Closer Look" for more
detail.

Create SHIPLIST. Read an observation from SALES. Specify PRODUCT as the
BY variable. Set DUMMY to 0 at the top of each DATA step iteration. (The next
DO loop uses the value of DUMMY.)

   data shiplist(drop=dummy);
      set sales;
      by product;
      dummy=0;

Attempt to read an observation from STOCK, based on the value of the key
variable PRODUCT. Repeat the process until the value of PRODUCT from SALES
does not match any value of PRODUCT from STOCK. When DUMMY is true
(equals 1), set PRODUCT to a nonexistent value. The DO UNTIL loop executes and
processes observations from STOCK until no observations contain the current
value of PRODUCT. (For information on _IORC_ and %SYSRC, see the Appendix.)
DUMMY is true when there are more consecutive observations in SALES that
contain the same value for PRODUCT. Changing the value of the KEY= variable
forces the pointer to return to the beginning of the index, so that later
observations in SALES can find matches for the same value of PRODUCT in STOCK.

      do until(_iorc_=%sysrc(_dsenom));
         if dummy then product=99999;
         set stock key=product;

Use the value of _IORC_ to conditionally process observations. When the value
of PRODUCT from SALES matches a PRODUCT value from STOCK, write an
observation to SHIPLIST. When the value of _IORC_ corresponds to _SOK, the
value of PRODUCT in the observation being read from STOCK matches the current
PRODUCT value from SALES.

         select (_iorc_);
            when (%sysrc(_sok)) output;

When the current observation from SALES has no matching value for PRODUCT in
STOCK, set _ERROR_ to 0. If the current observation from SALES is not the last
in the current BY group and if DUMMY is not true (does not equal 1), then set
the values of DUMMY and _IORC_ accordingly. When the value of _IORC_
corresponds to _DSENOM, no observation in STOCK contains the current value of
PRODUCT from SALES. _ERROR_ is reset to 0 to prevent an error condition that
would write the contents of the program data vector to the SAS log.

            when (%sysrc(_dsenom))
               do;
                  _error_=0;
                  if not last.product and not dummy then
                     do;
                        dummy=1;
                        _iorc_=0;
                     end;
               end;

In case of an unexpected _IORC_ condition, write an error message and stop
execution. When _IORC_ corresponds to anything other than _DSENOM or _SOK,
an unexpected condition has occurred, so an error message is written to the
SAS log and the STOP statement terminates the DATA step.

            otherwise
               do;
                  put 'Unexpected ERROR: _IORC_ = ' _iorc_;
                  stop;
               end;
         end;   /* ends the SELECT group */
      end;      /* ends the DO UNTIL loop */
   run;

Finding a Match for Consecutive Duplicate Values

Much of the logic of this program focuses on the need to successfully match
observations containing consecutive duplicate values of the BY variable
PRODUCT in SALES with observations in STOCK that contain the same
value for PRODUCT. Unless you reposition the pointer at the beginning of the
PRODUCT index for STOCK, consecutive duplicate values of PRODUCT in
SALES will not be successfully matched.

The SELECT group in the DO UNTIL loop begins this process. When there
are no more matches in the index on STOCK for the current value of
PRODUCT in SALES, determine if there are more observations in the current
BY group in SALES. If there are more observations to process in the same BY
group in SALES and DUMMY has not already been set to 1, assign values to
_IORC_ and the variable DUMMY:

   when (%sysrc(_dsenom))
      do;
         _error_=0;
         if not last.product and not dummy then
            do;
               dummy=1;
               _iorc_=0;
            end;
      end;

By changing the value of _IORC_ to 0, you cause the DO UNTIL loop to
iterate again:

   do until(_iorc_=%sysrc(_dsenom));
      if dummy then product=99999;
      set stock key=product;

Because DUMMY is true (equals 1), PRODUCT is set to 99999, a nonexistent
value. When the SET statement executes again, the pointer is forced to the
beginning of the index on STOCK because the value of PRODUCT (the KEY=
variable) has changed. No match is found for 99999, so the DO UNTIL loop
ends and processing returns to the top of the DATA step. Then the next
observation is read from SALES:

   data shiplist(drop=dummy);
      set sales;
      by product;
      dummy=0;

Because the pointer is at the beginning of the index on STOCK, the
observation with a consecutive duplicate value for PRODUCT in SALES finds
the appropriate match in STOCK. DUMMY is reset to 0 at the top of the
DATA step so that its value does not trigger a change in the value of
PRODUCT when it is not needed.

Related Technique   If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. PROC SQL joins the tables* to
produce a new table, SHIPLST, which includes information from both input
tables.

Conceptually, the join results in an internal table that matches every row in
SALES with every row in STOCK. However, you want only the rows where
the values for PRODUCT are the same in both tables. The WHERE clause
returns the rows from the join that have the same values for PRODUCT:

   proc sql;
      create table shiplst as
         select *
            from sales as a, stock as b
            where a.product=b.product;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.
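
Because PRODUCT exists in both tables, SELECT * asks for the same column name twice. A variant that names the columns from STOCK explicitly avoids that. The sketch below assumes that STOCK contains the PCDESC column shown in Output 4.6b; it is an illustration, not the query used to produce that output.

```sas
proc sql;
   create table shiplst as
      select a.*,                    /* PRODUCT, SALESREP, ORDERNUM from SALES      */
             b.prdtdesc, b.pcdesc    /* assumed STOCK columns, PRODUCT not repeated */
         from sales as a, stock as b
         where a.product = b.product;
quit;
```

Listing columns also makes the structure of the new table independent of any columns later added to STOCK.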
* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

Example 4.7
Combining Multiple Data Sets without a
Variable Common to All the Data Sets

Goal   Combine three tables* that do not share a common column. One table has one
column common with each of the other tables. Use this relationship to
combine all three tables. Group the data and use a summary function to
summarize numeric data for each group.

Strategy   Join the three tables using the SQL procedure. Use the compound WHERE
clause to create a three-way join. Use the GROUP BY clause to group the data.
Summarize the numeric data in the groups using the SUM function.

You do not have to sort the data prior to joining the tables.

Input Data Sets

The DAILY table has the ITEMNO column in common with the PRICES table and
IDNUM in common with the EMPLOYEE table. In EMPLOYEE, the ID column
matches the IDNUM column in DAILY.

                        EMPLOYEE

   OBS   ID     NAME              EMPTYPE   LOCATION
    1    341    Kreuger, John        H      Bldg A, Rm 1111
    2    511    Olszweski, Joe       S      Bldg A, Rm 1234
    3    5112   Nuhn, Len            S      Bldg A, Rm 2123
    4    5132   Nguyen, Luan         S      Bldg B, Rm 5022
    5    5151   Oveida, Susan        S      Bldg D, Rm 2013
    6    3551   Sook, Joy            H      Bldg E, Rm 2533
    7    3782   Comuzzi, James       S      Bldg E, Rm 1101
    8    381    Smith, Ann           S      Bldg C, Rm 3321

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

DAILY i
i
OBS ~DNUM UBMNO
. '
QUAN'l'ITY iI
1 ~~-~ 101' 2
2 3~~ .103, 1
3 511 101' 1
4 ?1~ 103 1
5 5112 105 1
6 5132" '.to5'·, 1
7 3551" ...l~t 1
8 ~S5~ -@$i 2
9 ~78~ 104
•. i·
1
10 ·34~ :1ot 2
11 Sll 1.0·i: 1
12 ~1~ :103• 3
13 511~ 10S; 1
14 5112 ro1:; 3
15 5132° ~o~ 2
16 3551 @J; 1
17 ~551 @Ji. 2
18
19
355~
,3782
®:
104: 1
2
.
. i
20 3782 -105 ! 3 Ii
          PRICES

   OBS    ITEMNO    PRICE
    1      101      0.30
    2      102      0.65
    3      103      2.75
    4      104      1.25
    5      105      0.85

Resulting Data Set

Output 4.7 CHARGE Table

                                  CHARGE

   OBS   ID     NAME             LOCATION           TOTAL    TYPE
    1    341    Kreuger, John    Bldg A, Rm 1111    $3.95    cash charge
    2    3551   Sook, Joy        Bldg E, Rm 2533   $11.40    cash charge
    3    3782   Comuzzi, James   Bldg E, Rm 1101    $5.05    payroll deduction
    4    511    Olszweski, Joe   Bldg A, Rm 1234   $11.60    payroll deduction
    5    5112   Nuhn, Len        Bldg A, Rm 2123    $2.60    payroll deduction
    6    5132   Nguyen, Luan     Bldg B, Rm 5022    $2.55    payroll deduction

Program   The objective is to join the EMPLOYEE, DAILY, and PRICES tables to
learn the total charges for each employee. Use the common columns to join all
three tables.* As a result of the join, all columns from all three tables are
available to process. By joining DAILY and PRICES, you can multiply
QUANTITY and PRICE to get a dollar amount for each purchase. By joining
DAILY and EMPLOYEE, you can get the name of the employee who made
each purchase.

Group the rows so that you can perform a summary calculation on each group
and get the total charges for each employee. Grouping the data also eliminates
duplicate rows.

Two employees, 381 and 5151, have no charges in the DAILY table.
Therefore, there are no rows with these two IDNUMs that satisfy the WHERE
conditions:

Invoke PROC SQL and create a table. The CREATE TABLE statement creates the
table CHARGE to store the results of the subsequent query.

   proc sql;
      create table charge as

Begin to specify the columns to be in the query result. Because ID, NAME, and
LOCATION occur only in the EMPLOYEE table, you do not have to prefix the table
alias to their names.

         select id, name, location,

Create a new column with an arithmetic expression. The SUM function sums the
values that result from multiplying QUANTITY and PRICE. The column
TOTAL shows the total charges for each employee. Because the data are grouped,
the value of TOTAL is for each group. (If the data were not grouped, the value
of TOTAL would be the total for the entire table.)

                sum(quantity*price) as total format=dollar8.2,

Create a column from EMPTYPE. The CASE expression creates a character
column, TYPE, based on the values of EMPTYPE.

                case emptype
                   when 'H' then 'cash charge'
                   when 'S' then 'payroll deduction'
                   else 'special'
                end as type

Name the tables to join and query. The number of tables that are specified in
the FROM clause indicates how many tables you are joining. The AS clause
specifies an alias for each table. Table aliases provide a shorthand method for
referring to a table in other clauses.

            from employee as e, daily as d, prices as p

* The columns that you join on do not have to have the same name.

Join the tables. ITEMNO is common to the DAILY and PRICES tables. IDNUM and
ID are common columns in the DAILY and EMPLOYEE tables, respectively.

            where p.itemno=d.itemno and id=idnum

Group the data to get the total for each employee. The GROUP BY clause returns
one row for each employee. In the GROUP BY clause, if you list each column
specified in the SELECT clause, PROC SQL has to make only one pass of the data.

            group by id, name, location, type;
   quit;
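
The query above leaves out employees 381 and 5151 because they have no rows in DAILY. If those employees should appear with a zero total, one possible variation, offered only as a sketch and not as part of the example, is to replace the WHERE-clause join with left joins. It assumes that ID and IDNUM have the same data type and that the three tables are defined as shown in the input data sets.

```sas
proc sql;
   select e.id, e.name,
          /* employees without purchases get 0 instead of a missing total */
          coalesce(sum(d.quantity*p.price), 0) as total format=dollar8.2
      from employee as e
           left join daily as d
              on e.id = d.idnum
           left join prices as p
              on d.itemno = p.itemno
      group by e.id, e.name;
quit;
```

A left join keeps every row of EMPLOYEE, so employees with no matching DAILY rows still form a group, and COALESCE converts their missing sum to 0.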

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.

Example 4.8
Interleaving Nonsorted Data Sets

Goal   Combine two tables* that contain columns with the same names. Create a new
table from the result. Put the data in order according to the values of two of the
columns.

Strategy   PROC SQL provides set operators that enable you to work with the results of
two independent queries. Use the OUTER UNION set operator to concatenate
the two independent query results returned by the SELECT clauses. Use the
CORR keyword to overlay like-named columns.

Input Data Sets

               ONE                                   TWO

   OBS   DATE      DEPART   FLIGHT      OBS   DATE      DEPART   FLIGHT
    1    01JAN93    7:10     114         1    01JAN93    8:21     176
    2    01JAN93   10:43     202         2    02JAN93    9:10     176
    3    01JAN93   12:16     439         3    03JAN93    8:21     176
    4    02JAN93    7:10     114         4    04JAN93    9:31     176
    5    02JAN93   10:45     202         5    05JAN93    8:13     176

Resulting Data Set

Output 4.8a SCHEDULE Data Set
SCHEDULE was created with PROC SQL.

              SCHEDULE

   OBS   DATE      DEPART   FLIGHT
    1    01JAN93    7:10     114
    2    01JAN93    8:21     176
    3    01JAN93   10:43     202
    4    01JAN93   12:16     439
    5    02JAN93    7:10     114
    6    02JAN93    9:10     176
    7    02JAN93   10:45     202
    8    03JAN93    8:21     176
    9    04JAN93    9:31     176
   10    05JAN93    8:13     176

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

Output 4.8b SCHED Data Set
SCHED was created with the DATA step.

               SCHED

   OBS   DATE      DEPART   FLIGHT
    1    01JAN93    7:10     114
    2    01JAN93    8:21     176
    3    01JAN93   10:43     202
    4    01JAN93   12:16     439
    5    02JAN93    7:10     114
    6    02JAN93    9:10     176
    7    02JAN93   10:45     202
    8    03JAN93    8:21     176
    9    04JAN93    9:31     176
   10    05JAN93    8:13     176

Program   The objective is to combine the tables so that all the flight information is in
one table. To make the table more useful, order the data by the date, and then
by the departure time:

Invoke PROC SQL and create a table. The CREATE TABLE statement creates the
table SCHEDULE to store the results of the subsequent queries and set operation.

   proc sql;
      create table schedule as

Select all columns from table ONE.

         select *
            from one

Concatenate the two query results. The OUTER UNION set operator concatenates
the query results returned by the two SELECT clauses. CORR overlays columns
that have the same name. This operator must come between the two SELECT
clauses.

         outer union corr

Select all columns from table TWO.

         select *
            from two

Order the concatenation by the values of DATE and DEPART. Without
ORDER BY, the rows from table TWO appear after all of the rows from table ONE.

         order by date, depart;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.

Related Technique   Using OUTER UNION CORR is equivalent to using the SET statement in the
DATA step. The DATA step requires that the data be sorted by the BY
variables. This DATA step produces the same output as the PROC SQL step:

   data sched;
      set one two;
      by date depart;
   run;
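
If ONE or TWO is not already in DATE and DEPART order, the DATA step with a BY statement stops with an error. A minimal sketch of the full DATA step route, assuming DATE and DEPART are stored as numeric SAS date and time values so that sorted order is chronological:

```sas
/* Sort each input data set by the BY variables first */
proc sort data=one;
   by date depart;
run;

proc sort data=two;
   by date depart;
run;

/* Interleave the sorted data sets */
data sched;
   set one two;
   by date depart;
run;
```

The two PROC SORT steps are the extra cost that the PROC SQL approach avoids when the inputs are unsorted.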

Note: If you use PROC SQL, you do not have to sort or index the data. If the
data sets are not sorted (regardless of being indexed), it is typically more
efficient to use PROC SQL. If the data sets are sorted, it is typically more
efficient to use the DATA step and the SET statement.

Example 4.9
Interleaving Data Sets Based on a Common Variable

Goal   Interleave two data sets containing a common variable. Also, demonstrate that
testing the value of an existing variable instead of a new variable can produce
unexpected results when using BY-group processing.

Strategy   Sort the input data by the BY variable. Specify both input data sets in a single
SET statement. Use the IN= data set option with one of the data sets to create a
variable that indicates when that data set has contributed to an observation.

Two examples of the same program illustrate how unexpected results can
occur. The first program tests the value of and updates a variable read from the
input data set. It produces unexpected results. The revised version produces
accurate results by testing and resetting the value of a variable created during
the DATA step.

CAUTION!
Variables read from input SAS data sets are retained across DATA step
iterations. Testing or resetting those variables can produce unexpected
results.

Input Data Sets

Data sets ONE_A, ONE_B, and TWO contain the variable COMMON. Data set
ONE_A contains the variable TEST while ONE_B does not. The first program reads
data set ONE_A and produces unexpected results. Both programs use data set TWO.

          ONE_A                          TWO

   OBS   COMMON   TEST          OBS   COMMON   SWITCH
    1      A      AAAA           1      A        N
    2      C      cccc           2      A        Y
                                 3      A        N
          ONE_B                  4      B        N
                                 5      B        Y
   OBS   COMMON                  6      B        N
    1      A
    2      C

Desired Results

Output 4.9a COMBINED Data Set
TEST contains the value TRUE only in observations 3 and 6.

            COMBINED

   OBS   COMMON   SWITCH   TEST
    1      A
    2      A        N
    3      A        Y      TRUE
    4      A        N
    5      B        N
    6      B        Y      TRUE
    7      B        N
    8      C

Original Program   The objective is to interleave data sets ONE_A and TWO, based on the values
of the BY variable COMMON. Read the input data sets with BY-group
processing by using the SET and BY statements. Use the IN= data set option to
create variable IN2, which will be set to 1 (true) for each observation that
originates from data set TWO. With an IF statement, test the value of IN2 and
the value of the variable SWITCH to determine whether to set the value of the
existing variable TEST to 'TRUE':

Create COMBINED. Read an observation from data set ONE_A and data set TWO,
using BY-group processing. Variable IN2 will be set to 1 for each observation
to which TWO contributes. COMMON is the BY variable.

   data combined;
      set one_a two(in=in2);
      by common;

If data set TWO has contributed to an observation (IN2 is true) and if the value
of SWITCH is 'Y', then set the current value of TEST to 'TRUE'. The assignment
statement assigns a value to the existing variable TEST.

      if in2 and switch = 'Y' then test = 'TRUE';
   run;

Unexpected Results

Output 4.9b COMBINED Data Set
In observations 4 and 7, TEST incorrectly contains the value TRUE. Only
observations 3 and 6 should contain this value.

            COMBINED

   OBS   COMMON   TEST   SWITCH
    1      A      AAAA
    2      A               N
    3      A      TRUE     Y
    4      A      TRUE     N
    5      B               N
    6      B      TRUE     Y
    7      B      TRUE     N
    8      C      cccc

At first glance, Output 4.9b seems to show that the IF condition did not work
correctly since observations 4 and 7 contain the incorrect value of TRUE.
Actually, these observations contain incorrect values for TEST because its
value is retained in the program data vector throughout the life of the current
BY group. It is replaced only when a new observation is read from data set
ONE_A. Because TWO contains multiple observations with the same value of
the BY variable while ONE_A contains unique values of the BY variable, the
value of TEST is duplicated across all remaining observations in the current
BY group.

Revised Program   The revised program uses the same code, but different input data. It reads data
set ONE_B, which does not contain TEST. This program sets and changes the
value of TEST as a variable that is created during the DATA step, not read
from an existing SAS data set. The assignment statement creates TEST in this
example. Its value, therefore, is not retained throughout the current BY group,
so testing and setting its value does not produce incorrect results when
subsequent observations in the same BY group are read from data set TWO.
See Output 4.9a.

Create COMBINED. Read an observation from data set ONE_B and data set TWO,
using BY-group processing. Variable IN2 will be set to 1 for each observation
to which TWO contributes. COMMON is the BY variable.

   data combined;
      set one_b two(in=in2);
      by common;

If data set TWO has contributed to an observation (IN2 is true) and if the value
of SWITCH is 'Y', then assign TEST a value of 'TRUE'. The assignment statement
creates TEST and assigns it a value. Its value is reset to missing upon each
iteration of the DATA step.

      if in2 and switch = 'Y' then test = 'TRUE';
   run;
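
When the input data set must keep its TEST variable, as ONE_A does, a possible alternative is to rename the incoming variable so that TEST is still created by the DATA step. This is only a sketch of that idea; the name OLDTEST is hypothetical and does not appear in the example.

```sas
data combined;
   /* OLDTEST is a hypothetical name for the TEST values read from ONE_A */
   set one_a(rename=(test=oldtest)) two(in=in2);
   by common;
   length test $ 4;   /* TEST is now created in the DATA step, not read from input */
   if in2 and switch = 'Y' then test = 'TRUE';
run;
```

Because TEST is no longer read from a data set, it is reset to missing at the start of each iteration, so only the observations that satisfy the IF condition receive the value TRUE. OLDTEST, being read from input, is still subject to the retain behavior described in the CAUTION above.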

Example 4.10
Comparing All Observations with the Same BY Values

Goal   Create a new data set by merging two data sets, each of which may contain
multiple observations with the same BY values, and by comparing all
observations with the same BY values.

Strategy   Begin with two data sets that may contain multiple observations for each
unique value of the common variable. In the DATA step, read these data sets
to create new data sets that contain one observation per BY group, with
variables whose values identify the observation number of the first and last
observation for each BY group. Merge these two new data sets so that all the
information about BY groups in both data sets is in one location. Include only
observations to which the first data set contributed.

To create the final data set, read all three data sets: the two original ones and
the merged one that identifies the first and last observation in each BY group.
Because each of the original data sets may contain multiple observations with
duplicate values of the BY variable, you must loop through the BY groups in
each data set multiple times to compare each observation in a given BY group
with each observation in the same BY group in the other data set. Therefore,
use the POINT= option to directly access both data sets by observation
number.
'
;
'I You can perform the same task with P,ROC SQL; see "Related Technique,"
·I !
I
<I Note: Due to the variability of data ~nd the number of conditions that
;j
determine the path chosen by the PRQC SQL optimizer, it is not always
I possible to determine the most efficient method without first testing with your
i data. · 1
I
I
. I '
Input Data Sets
Both BREAKDWN and MAINT contain multiple observations for certain values of
the BY variable VEHICLE. Each is sorted by VEHICLE. Within VEHICLE, each is
also sorted by date of breakdown (BRKDNDT) or maintenance (MNTDATE).

          BREAKDWN                          MAINT

   OBS    BRKDNDT    VEHICLE         OBS    MNTDATE    VEHICLE
    1     02MAR94    AAA              1     03JAN94    AAA
    2     20MAY94    AAA              2     05APR94    AAA
    3     19JUN94    AAA              3     10AUG94    AAA
    4     29NOV94    AAA              4     28JAN94    CCC
    5     04JUL94    BBB              5     16MAY94    CCC
    6     31MAY94    CCC              6     07OCT94    CCC
    7     24DEC94    CCC              7     24FEB94    DDD
                                      8     22JUN94    DDD
                                      9     19SEP94    DDD

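To try the programs in this example, the two input data sets can be built from the listing above. The following is only a sketch: it assumes the values read from the listing are correct and that the YEARCUTOFF= system option is set so that two-digit years such as 94 are interpreted as 1994.

```sas
/* Sketch: build BREAKDWN and MAINT from the listing above.      */
/* Assumes YEARCUTOFF= maps two-digit year 94 to 1994.           */
data breakdwn;
   input brkdndt : date7. vehicle $;
   format brkdndt date7.;
   datalines;
02MAR94 AAA
20MAY94 AAA
19JUN94 AAA
29NOV94 AAA
04JUL94 BBB
31MAY94 CCC
24DEC94 CCC
;
run;

data maint;
   input mntdate : date7. vehicle $;
   format mntdate date7.;
   datalines;
03JAN94 AAA
05APR94 AAA
10AUG94 AAA
28JAN94 CCC
16MAY94 CCC
07OCT94 CCC
24FEB94 DDD
22JUN94 DDD
19SEP94 DDD
;
run;
```

Both data sets are entered already sorted by VEHICLE and, within VEHICLE, by date, which is the order the example relies on.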
BRKKEY and MAINTKEY identify the observation number of the first and last
observation for each BY group.

              BRKKEY                              MAINTKEY

   OBS   VEHICLE   FIRST1   LAST1         OBS   VEHICLE   FIRST2   LAST2
    1    AAA          1       4            1    AAA          1       3
    2    BBB          5       5            2    CCC          4       6
    3    CCC          6       7            3    DDD          7       9

KEYS is the result of merging BRKKEY and MAINTKEY.

                         KEYS

   OBS   VEHICLE   FIRST1   LAST1   FIRST2   LAST2
    1    AAA          1       4        1       3
    2    BBB          5       5        .       .
    3    CCC          6       7        4       6

Resulting Data Sets

Output 4.10a FINAL1 Data Set
FINAL1 was created with the DATA step.

                   FINAL1

   OBS   VEHICLE   BRKDNDT   LASTMNT
    1    AAA       02MAR94   03JAN94
    2    AAA       20MAY94   05APR94
    3    AAA       19JUN94   05APR94
    4    AAA       29NOV94   10AUG94
    5    BBB       04JUL94         .
    6    CCC       31MAY94   16MAY94
    7    CCC       24DEC94   07OCT94

Output 4.10b FINAL2 Data Set
FINAL2 was created with PROC SQL.

                   FINAL2

   OBS   VEHICLE   BRKDNDT   LASTMNT
    1    AAA       02MAR94   03JAN94
    2    AAA       20MAY94   05APR94
    3    AAA       19JUN94   05APR94
    4    AAA       29NOV94   10AUG94
    5    BBB       04JUL94         .
    6    CCC       31MAY94   16MAY94
    7    CCC       24DEC94   07OCT94

Program The objective is to create a data set that shows the most recent maintenance
date for each time a vehicle had a breakdown.

First, create data sets (BRKKEY and MAINTKEY) that contain one
observation for each BY group and two additional variables that identify the
observation numbers of the first and last observations for that BY group.
Merge these two data sets into a single data set (KEYS) so that you will be
able to compare all observations in each BY group between two data sets.
Then read this merged data set and use the FIRST1, LAST1, FIRST2, and
LAST2 values to directly access all observations in each BY group in data sets
BREAKDWN and MAINT. Then compare the values of MNTDATE and
BRKDNDT so that you can determine the correct value for LASTMNT, the
most recent maintenance date prior to each time the vehicle needed repairs.

Create BRKKEY. Read an observation from BREAKDWN, using VEHICLE as
the BY variable. Create variables FIRST1 and LAST1 whose values represent the
observation number of the first and last observation in each BY group. After
reading the last observation in each BY group, write an observation that includes
only VEHICLE and FIRST1 and LAST1. RETAIN retains the value of FIRST1 across
DATA step iterations so that it is still available when LAST1 obtains a value and
the observation is written.

   data brkkey (keep=vehicle first1 last1);
      set breakdwn;
      by vehicle;
      retain first1;
      if first.vehicle then first1=_n_;
      if last.vehicle then
         do;
            last1=_n_;
            output;
         end;
   run;

Create MAINTKEY. Use the same logic as in the preceding DATA step. In this DATA
step, the variables whose values represent the observation number of the first and last
observation in each BY group are named FIRST2 and LAST2.

   data maintkey (keep=vehicle first2 last2);
      set maint;
      by vehicle;
      retain first2;
      if first.vehicle then first2=_n_;
      if last.vehicle then
         do;
            last2=_n_;
            output;
         end;
   run;

Create KEYS by merging data sets BRKKEY and MAINTKEY, based on the
value of VEHICLE. Include only observations to which data set BRKKEY
contributed. The IN= data set option creates a variable that is set to a value of 1 for each
iteration in which data set BRKKEY contributes to the current observation. The
subsetting IF statement allows only observations to which BRKKEY
contributed to be written to KEYS.

   data keys;
      merge brkkey(in=in1) maintkey;
      by vehicle;
      if in1;
   run;

Create FINAL1. Read an observation from KEYS. Each iteration of this DATA step
processes an entire BY group.

   data final1;
      drop first1 last1 first2 last2 mntdate;
      set keys;

Read an observation from BREAKDWN in the current BY group. POINT= enables you
to read data set BREAKDWN using direct access. For each DATA step iteration, this
DO loop reads all observations in the current BY group in BREAKDWN.
LASTMNT is initialized to missing for each observation in the current BY group of
BREAKDWN in preparation for determining the most recent MNTDATE in
MAINT prior to the current BRKDNDT from BREAKDWN.

      do i=first1 to last1;
         set breakdwn point=i;
         format lastmnt date7.;
         lastmnt=.;

If data set MAINT contributed to the current observation, then execute this
nested DO loop to process all observations from MAINT for this BY group. On each
iteration, read an observation from MAINT and compare the values of
MNTDATE, LASTMNT, and BRKDNDT. If the MNTDATE value is greater than
LASTMNT and no later than BRKDNDT, then set LASTMNT equal to the value of
MNTDATE. If MNTDATE is greater than BRKDNDT, then you know there are no
more maintenance dates prior to the breakdown date, so stop processing this
DO loop. On each DATA step iteration, this DO loop executes if FIRST2 does
not have a missing value, that is, if MAINT contributed to the BY group. The DO loop
reads and processes all observations in the BY group until it reaches a MNTDATE that
is past the BRKDNDT or until all observations have been processed. The
LEAVE statement allows processing to exit the DO loop and to begin executing the next
statement in the DATA step, which writes the current observation to FINAL1. Because
MAINT is sorted by date of maintenance within VEHICLE, it is
appropriate to exit the DO loop when the value of MNTDATE exceeds BRKDNDT.

         if first2 ne . then
            do j=first2 to last2;
               set maint point=j;
               if mntdate gt lastmnt and mntdate le brkdndt
                  then lastmnt=mntdate;
               else if mntdate gt brkdndt then leave;
            end;
         output;
      end;   /* ends the outer iterative DO loop */
   run;

Related Technique If you are familiar with Structured Query Language (SQL), you may want to
use PROC SQL instead of the DATA step. PROC SQL joins the tables* to
produce a new table, FINAL2. Conceptually, a join results in an internal table
that matches every row in BREAKDWN with every row in MAINT.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

This example shows a left join, which returns all rows that meet the ON clause
criteria and the rows from the left table (BREAKDWN) that do not match any
row in the right table (MAINT).

The ON clause specifies that the resulting table will contain only those rows
where the values of VEHICLE match and where the breakdown date is the same as
or later than the maintenance date.

The HAVING clause ensures that you get the row with the latest maintenance
date for each breakdown of each vehicle.

To understand how this join works, consider the matches for the breakdown
date of 20MAY94 in the BREAKDWN table. ON specifies that the join will
return only rows from the internal table where the value of VEHICLE is the same
and where the breakdown date is the same as or later than the maintenance date. Only two
rows from the internal table meet both of those criteria:

   VEHICLE   BRKDNDT   LASTMNT
   AAA       20MAY94   05APR94
   AAA       20MAY94   03JAN94

Because the HAVING clause further restricts the result to include only the row
that has the latest maintenance date, only the row with the maintenance date
05APR94 appears in the final result.

The row for vehicle BBB is the only row returned by the join from the left table
that does not have a match in the right table.

Here is the PROC SQL step that creates FINAL2:

   proc sql;
      create table final2 as
         select b.vehicle, b.brkdndt, m.mntdate as lastmnt
            from breakdwn b left join maint m
            on b.vehicle=m.vehicle and b.brkdndt >= m.mntdate
            group by b.vehicle, b.brkdndt
            having m.mntdate = max(m.mntdate);
   quit;

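One way to check that the DATA step and PROC SQL approaches agree is to compare the two results with PROC COMPARE. This is a sketch rather than part of the original example; it assumes both FINAL1 and FINAL2 already exist in the WORK library and that their rows are in the same order. PROC SQL does not guarantee row order without an ORDER BY clause, so a sort of each table by VEHICLE and BRKDNDT is included first.

```sas
/* Sketch: confirm that FINAL1 (DATA step) and FINAL2 (PROC SQL) match. */
/* Sorting first removes any dependence on the row order of FINAL2.     */
proc sort data=final1;
   by vehicle brkdndt;
run;

proc sort data=final2;
   by vehicle brkdndt;
run;

proc compare base=final1 compare=final2;
run;
```

PROC COMPARE matches variables by name, so the report covers VEHICLE, BRKDNDT, and LASTMNT in both tables.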
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.

□ LEAVE statement. For a complete description with an example, see
pp. 34-35 in SAS Technical Report P-222, Changes and Enhancements to
Base SAS Software, Release 6.07.

CHAPTER 5

Manipulating Data From a Single Source

You can work with the data in a single data set in many ways to enhance it or
reshape it as you need. For example, you can calculate new values from
existing variables, apply common operations to a group of variables, collapse
observations, or expand observations. For a complete list of the tasks covered
in this chapter, see the example titles below:

5.1 Performing a Simple Subset
5.2 Separating Unique Observations from Duplicate Observations
5.3 Accessing a Specific Number of Observations from the Beginning and
End of a Data Set
5.4 Adding New Observations to the End of a Data Set
5.5 Adding Observations to a Data Set Based on the Value of a
Variable
5.6 Simulating the LEAD Function by Comparing the Value of a Variable to
Its Value in the Next Observation
5.7 Obtaining the Lag (Previous Value) of a Variable within a BY
Group
5.8 Applying Common Operations to a Group of Variables
5.9 Cumulative Total
5.10 Calculating the Percentage That One Observation Contributes to the
Total of a BY Group
5.11 Adding a New Variable that Contains the Frequency of a BY-Group
Value
5.12 Subsetting a Data Set Based on the Calculated Average of a BY
Group
5.13 Really Rounding Numbers
5.14 Collapsing Observations within a BY Group into a Single
Observation
5.15 Expanding Single Observations into Multiple Observations
5.16 Reshaping Observations into Multiple Variables

Example 5.1 Performing a Simple Subset

Goal Create a subset of a SAS data set efficiently by selecting for processing only
observations that meet a particular condition.

Strategy To subset a SAS data set based on a variable value, you can use the WHERE
statement with the SET statement to specify a condition that the data must
satisfy before observations are read into the program data vector. Using a
WHERE statement is efficient because it takes effect before the SET statement
executes on each DATA step iteration. Instead of reading all observations, the
SET statement then reads only the observations from the input data set whose
data meet the specified condition.

Input Data Set

                    NEWHIRES

   OBS   NAME               DEPT            ID
    1    Estefon, Emilio    Toys          54345
    2    Wentworth, Guy     Hardware      43454
    3    Nay, Rong          Automotive    23234
    4    Harper, Chang      Toys          45434
    5    Smart, Matthew     Toys          45412
    6    Ochman, Andre      Toys          45413
    7    Welk, Liz Ann      Hardware      32322
    8    Jordan, Erica      Linens        31012

Resulting Data Set

Output 5.1 TOYDEPT Data Set

                    TOYDEPT

   OBS   NAME               DEPT      ID
    1    Estefon, Emilio    Toys    54345
    2    Harper, Chang      Toys    45434
    3    Smart, Matthew     Toys    45412
    4    Ochman, Andre      Toys    45413

Program The objective is to create a subset of the data set NEWHIRES that includes
only employees in the Toys department. The WHERE statement allows only
observations that have a value of Toys for DEPT to be read by the SET
statement:

Create TOYDEPT. Read an observation from NEWHIRES only if the employee
works in the toy department. The WHERE statement prevents unneeded observations
from being read into the program data vector.

   data toydept;
      set newhires;
      where dept='Toys';
   run;
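
For comparison, the same subset can be requested in two other ways. The sketch below is an illustration, not part of the original example; the output names TOYDEPT2 and TOYDEPT3 are hypothetical. The WHERE= data set option filters observations as they are read, like the WHERE statement, while a subsetting IF statement tests each observation only after it has been read into the program data vector.

```sas
/* WHERE= data set option: the condition is applied as the data set is read. */
data toydept2;                        /* hypothetical output name */
   set newhires(where=(dept='Toys'));
run;

/* Subsetting IF: every observation is read first and then tested. */
data toydept3;                        /* hypothetical output name */
   set newhires;
   if dept='Toys';
run;
```

All three forms select the same observations from NEWHIRES; they differ in when the condition is evaluated, which is why the WHERE forms are usually preferred for a simple subset.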
Example 5.2 Separating Unique Observations from Duplicate Observations

Goal Identify duplicate and nonduplicate observations in a data set and write each to
the appropriate data set.

Strategy Sort the input data set by the BY variables. Read the input data set with the
SET and BY statements. Use the FIRST. and LAST. variables for the
appropriate BY variable to determine when an entire observation is a duplicate
in the data set. When both FIRST.variable and LAST.variable for the
appropriate BY variable are equal to 1 (true), then you know that the
observation is not a duplicate, so write it to a data set. Write all other
observations to a data set for duplicates.

Input Data Set
CLASDATA must be sorted by NAME and CLASS within NAME.

               CLASDATA

   OBS    ID     NAME      CLASS
     1    3456   Amber     CHEM101
     2    3456   Amber     MATH102
     3    3456   Amber     MATH102
     4    4567   Denise    ENGL201
     5    4567   Denise    ENGL201
     6    2345   Ginny     CHEM101
     7    2345   Ginny     ENGL201
     8    2345   Ginny     MATH102
     9    1234   Lynn      CHEM101
    10    1234   Lynn      CHEM101
    11    1234   Lynn      MATH102
    12    5678   Rick      CHEM101
    13    5678   Rick      HIST300
    14    5678   Rick      HIST300

Resulting Data Sets

Output 5.2a DUPS Data Set

                 DUPS

   OBS    ID     NAME      CLASS
     1    3456   Amber     MATH102
     2    3456   Amber     MATH102
     3    4567   Denise    ENGL201
     4    4567   Denise    ENGL201
     5    1234   Lynn      CHEM101
     6    1234   Lynn      CHEM101
     7    5678   Rick      HIST300
     8    5678   Rick      HIST300

Output 5.2b NODUPS Data Set

                NODUPS

   OBS    ID     NAME      CLASS
     1    3456   Amber     CHEM101
     2    2345   Ginny     CHEM101
     3    2345   Ginny     ENGL201
     4    2345   Ginny     MATH102
     5    1234   Lynn      MATH102
     6    5678   Rick      CHEM101

Program The objective is to determine which observations in CLASDATA are
duplicates. A student's name may be in the data set more than once, but no two
observations should contain both the same student name and the same class.

First, sort CLASDATA by NAME and CLASS. Then use BY-group
processing to create the FIRST. and LAST. variables for the BY variables.
When FIRST.CLASS and LAST.CLASS are both equal to 1, you know that
the observation is the only one with these values for NAME and CLASS in the
data set. Write it to the NODUPS data set. If these variables are not both equal
to 1, the observation is a duplicate, so write it to the DUPS data set:

Create DUPS and NODUPS. Read an observation from CLASDATA using the
SET statement and BY-group processing. Specify NAME and CLASS as BY
variables.

   data dups nodups;
      set clasdata;
      by name class;

Compare the values of the FIRST.CLASS and LAST.CLASS variables. Write an
observation to NODUPS or DUPS, depending on the outcome of the
comparison.

      if first.class and last.class then output nodups;
      else output dups;
   run;
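
The program assumes that CLASDATA is already in sorted order. If it is not, a sort step along the following lines would come first. This is a minimal sketch; it sorts CLASDATA in place, which is an assumption about how you want to handle the input.

```sas
/* Sketch: put CLASDATA in the order required by the BY statement. */
proc sort data=clasdata;
   by name class;
run;
```

Without the sort, the DATA step with BY NAME CLASS would stop with an error as soon as it read an observation that was out of order.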
Example 5.3 Accessing a Specific Number of Observations from the Beginning
and End of a Data Set

Goal Process only the first five and last five observations in a data set efficiently by
not reading the entire data set.

Strategy Process specific observations rather than all observations sequentially by using
the POINT= option in the SET statement. Use the NOBS= option in the SET
statement to assign to a variable the number of observations in the data set.
Use DO loops to read only the first five and last five observations in the data
set. Because the application calls for reading at least ten observations, use
IF-THEN logic to avoid reading some observations twice when a data set
contains fewer than ten observations. Reduce redundancy in your program by
using the LINK statement to repeatedly route execution to a group of
data-reading and data-writing statements.

Input Data Set

                 SALES

   OBS   NAME               DAYSALES
     1   Ball, George            674
     2   Lee, Chin              1800
     3   Placa, Ace             2500
     4   Leung, Ho              3000
     5   Wagner, Willie          850
     6   DuBois, Grace          2000
     7   Jernigan, Alec          750
     8   Tilldale, Jules        1000
     9   Brown, Dick             555
    10   Hammer, Danny           400
    11   Wills, Wesley           800
    12   Grant, Heber           3500
    13   Mooney, Hal             400

Resulting Data Set

Output 5.3 SUBSET Data Set
The first five and last five observations from SALES.

                 SUBSET

   OBS   NAME               DAYSALES
     1   Ball, George            674
     2   Lee, Chin              1800
     3   Placa, Ace             2500
     4   Leung, Ho              3000
     5   Wagner, Willie          850
     6   Brown, Dick             555
     7   Hammer, Danny           400
     8   Wills, Wesley           800
     9   Grant, Heber           3500
    10   Mooney, Hal             400

Program The objective is to create a subset of the SALES data set that contains only the
first five and last five observations. Because SALES has more than ten
observations, you must set the values of the variables STARTOBS and
ENDOBS to indicate which observations to read: the first five observations
and the last five observations. After these values are set, link to a set of
labeled statements that read and write five observations.

If SALES has fewer than ten observations, you can simply read from the
beginning to the end of the data set. However, by using the same method of
access, direct instead of sequential, regardless of the size of the data set, you
can link to the same block of data-reading and data-writing statements, making
your code more compact:

Create SUBSET.

   data subset (drop=startobs endobs);

For data sets with more than ten observations, process the first five and last
five observations. The assignment statements set the appropriate values for
STARTOBS and ENDOBS, which will be used to control the DO loop that reads and
writes observations. The first LINK statement causes the statements that follow
the label GETOBS to execute and process the first five observations. Execution then
returns to the statement following the LINK statement. The STARTOBS and ENDOBS
values are reset, based on the value of NUMOBS, the NOBS= variable. (See the
SET statement later in this program.) When the program is compiled, NUMOBS is
assigned a value equal to the number of observations in data set SALES. The second
LINK statement causes the labeled statements to execute again, this time
processing the last five observations.

      if numobs > 10 then
         do;
            startobs=1;
            endobs=5;
            link getobs;
            startobs=numobs-4;
            endobs=numobs;
            link getobs;
         end;

For data sets with ten or fewer observations, process all observations. The
LINK statement causes the statements that follow the label GETOBS to execute and
process all of the observations in data set SALES.

      else
         do;
            startobs=1;
            endobs=numobs;
            link getobs;
         end;

Prevent the DATA step from continuous looping. Because there is no end-of-file
condition when direct access is used to read data, you must use a STOP statement to
prevent continuous looping. (See the SET statement with the POINT= option later in
the program.)

      stop;

Read and write each observation, as indicated by the values of STARTOBS and
ENDOBS. The LINK GETOBS statements cause these statements to execute. The
POINT= option makes direct access possible. The NOBS= option creates a
variable named NUMOBS whose value is the number of observations in the SALES
data set. The RETURN statement that precedes the label prevents any statements
that follow the label from executing, except when a LINK statement routes execution
there. (This RETURN statement is not necessary in this program, but using it is
good practice because it's often necessary.) The final RETURN statement signals the
end of the section labeled GETOBS and routes execution to the statement following
the LINK statement that linked to this block of code.

      return;
      getobs:
         do i=startobs to endobs;
            set sales point=i nobs=numobs;
            output;
         end;
      return;
   run;

Example 5.4 Adding New Observations to the End of a Data Set

Goal Add new observations to the end of a data set, while retaining the original
name of the data set.

Strategy Use the END= option in the SET statement to determine when the end of the
data set has been reached. Then use a DO loop to generate new observations
and append them to the end of the data set. If you want the resulting data set to
retain the same name, specify the same data set name in the DATA and SET
statements.

You can also create a new data set to contain the new observations and then
add those to the original data set by using PROC APPEND. See "Related
Technique."

Input Data Set

        TEST1

   OBS    X     Y
     1    1     2
     2    2     4
     3    3     6
     4    4     8
     5    5    10

Resulting Data Set

Output 5.4a TEST1 Data Set, New Version
TEST1 was produced with the DATA step.

        TEST1

   OBS    X     Y
     1    1     2
     2    2     4
     3    3     6
     4    4     8
     5    5    10
     6    6    12
     7    7    14
     8    8    16
     9    9    18
    10   10    20
    11   11    22
    12   12    24
    13   13    26
    14   14    28
    15   15    30

Output 5.4b TEST1 Data Set, New Version
TEST1 was produced with the DATA step and PROC APPEND.

        TEST1

   OBS    X     Y
     1    1     2
     2    2     4
     3    3     6
     4    4     8
     5    5    10
     6    6    12
     7    7    14
     8    8    16
     9    9    18
    10   10    20
    11   11    22
    12   12    24
    13   13    26
    14   14    28
    15   15    30

Program The objective is to use the value of the END= variable to determine when the
last observation from TEST1 has been read. Then execute an iterative DO loop
to generate ten new observations and add them to the end of TEST1.

You can also use a DATA step to create the new data set and PROC APPEND
to add observations from the second data set to the end of the first one. See
"Related Technique."

Specify an output data set with the same name as the input data set. Read an
observation from TEST1. END= defines a variable (LASTONE) that is set to 1 when
the last observation has been read from TEST1. The OUTPUT statement is required
because use of an explicit OUTPUT statement in the DO loop disables the
automatic OUTPUT executed for each iteration of the DATA step.

   data test1(drop=i);
      set test1 end=lastone;
      output;

After the last observation has been read, generate new observations and write them
to TEST1. When LASTONE equals 1, this DO loop executes to assign values to X and
Y and to write ten new observations.

      if lastone then do;
         do i=1 to 10;
            x=x+1;
            y=y+2;
            output;
         end;
      end;
   run;

Related Technique If your original data set is very large, it is probably more efficient to use a
DATA step to create the additional observations and then add them to the end
of the original data set by using PROC APPEND. So that you can initialize the
values of X and Y to their values in the last observation of TEST1, read only
that observation from TEST1 on the first iteration. Use a DO loop to create ten
new observations. Because there is no end-of-file condition to stop this DATA
step, you must use a STOP statement:

   data test2 (drop=i);
      if _n_=1 then set test1 point=lastobs nobs=lastobs;
      do i=1 to 10;
         x=x+1;
         y=y+2;
         output;
      end;
      stop;
   run;

   proc append base=test1 data=test2;
   run;

Example 5.5 Adding Observations to a Data Set Based on the Value of a Variable

Goal Add a specific number of observations to a data set, based on the value of one
of its variables, so that the resulting data set retains the name of the original.

Strategy Read an observation from the data set. Use the value of an existing variable to
determine how many times the DO loop should iterate and write an
observation. If you want the resulting data set to have the same name, specify
the same data set name in the DATA and SET statements.

Input Data Set
Each observation has a unique value for JOB.

           TASKS

   OBS   DAYS   JOB
     1     1    wiring
     2     2    drywall
     3     4    flooring
     4     2    trimwork
     5     3    painting

Resulting Data Set

Output 5.5 TASKS Data Set, New Version

                TASKS

   OBS   DATE        DAYS   JOB
     1   10JUL1995     1    wiring
     2   11JUL1995     2    drywall
     3   12JUL1995     2    drywall
     4   13JUL1995     4    flooring
     5   14JUL1995     4    flooring
     6   17JUL1995     4    flooring
     7   18JUL1995     4    flooring
     8   19JUL1995     2    trimwork
     9   20JUL1995     2    trimwork
    10   21JUL1995     3    painting
    11   24JUL1995     3    painting
    12   25JUL1995     3    painting

Program The objective is to use the value of DAYS to determine how many
observations to generate for each JOB and, beginning with the current day,
determine on what days the job will be done. The value of DAYS determines
how many times the DO loop iterates, writing an observation each time:

Create an output data set with the same name as the original one. Read an
observation from TASKS.

   data tasks(drop=i testday);
      format date date9.;
      set tasks;

On the first iteration, set the value of DATE. The TODAY function returns the
SAS date value for the current day.

      if _n_=1 then date=today();

Write one observation for each day that the task requires and increase the DATE value
appropriately. Use the WEEKDAY function to derive the day of the week from
the DATE variable. If the weekday is Saturday (7) or Sunday (1), then add either
2 or 1 to its value so that the new value is the date for the following Monday (2). The
sum statement (date+1;) increases the value of DATE by 1 and also causes the
value of DATE to be automatically retained across iterations of the DATA step.

      do i=1 to days;
         testday=weekday(date);
         if testday=7 then date=date+2;
         if testday=1 then date=date+1;
         output;
         date+1;
      end;
   run;
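
Because the program starts from TODAY(), its dates change each time it runs. The sketch below shows one way to reproduce the dates in Output 5.5; it is an illustration under stated assumptions, not the original program. The data set names TASKS_IN and SCHEDULE are hypothetical, and the fixed start date of 10JUL1995 (a Monday) is taken from the first row of Output 5.5.

```sas
/* Sketch: a copy of the input data under a hypothetical name. */
data tasks_in;
   input days job $;
   datalines;
1 wiring
2 drywall
4 flooring
2 trimwork
3 painting
;
run;

/* Same logic as the example, with a fixed start date instead of TODAY(). */
data schedule(drop=i testday);      /* hypothetical output name */
   format date date9.;
   set tasks_in;
   if _n_=1 then date='10JUL1995'd; /* assumed start date, a Monday */
   do i=1 to days;
      testday=weekday(date);
      if testday=7 then date=date+2;   /* Saturday: move to Monday */
      if testday=1 then date=date+1;   /* Sunday: move to Monday   */
      output;
      date+1;                          /* sum statement retains DATE */
   end;
run;
```

Writing to a separate output data set also leaves the input unchanged, which makes the step safe to rerun while experimenting.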

Example 5.6 Simulating the LEAD Function by Comparing the Value of a
Variable to Its Value in the Next Observation

Goal Within the same data set, look ahead from a variable value in one observation
to return the value of the same variable in the observation that immediately
follows it. Then compare the returned value with the current observation or use
it in a calculation on the current observation.

Strategy You can use DATA step processing to simulate the LEAD function. To look
ahead from one observation to the next within the same data set, merge the
data set with itself by specifying the same data set name twice in the MERGE
statement. In the second reference to the data set, use the data set option
FIRSTOBS= to start processing with the second observation in the same data
set. Because the program does not contain the BY statement, SAS software
performs a one-to-one merge, but the pointer in the second reference to the
data set will always be one observation ahead of the first reference.

In the second reference to the data set, use the RENAME= and KEEP=
options. RENAME= gives the look-ahead variable a unique name, thus
preventing the look-ahead value from overwriting the value read from the first
reference. KEEP= allows only the look-ahead variable from the second
reference to the data set to be brought into the program data vector. If you kept
all variables from the look-ahead read, you would overlay values of variables
with the same names that you just read from the first reference to the data set.

Input Data Set

        ONE

   OBS    X    Y
    1     5    1
    2     5    2
    3    10    1
    4     2    1
    5     2    2
    6    19    1

Resulting Data Set

Output 5.6 TWO Data Set

               TWO

   OBS    X    Y   NEXTX   MATCH
    1     5    1      5    YES
    2     5    2     10    NO
    3    10    1      2    NO
    4     2    1      2    YES
    5     2    2     19    NO
    6    19    1      .    NO

Program The objective is to create a new data set, TWO, in which each observation
contains the value of X for the current observation and for the next
observation. To create data set TWO, merge data set ONE with itself. In the
second reference to data set ONE, do three things:

1. Begin reading at the second observation.

2. Rename X to NEXTX so that you can store the look-ahead values of X
without overwriting the value of X from the current observation in the first
reference.

3. Bring only the look-ahead variable into the program data vector.
Otherwise, you would overwrite all the other variables with values from
the next observation.

Then use IF-THEN/ELSE logic to compare the original and look-ahead values
and to report a match:

Create TWO by merging ONE with itself. Begin reading the second reference to data
set ONE with the second observation. Use FIRSTOBS=2 to start the look-ahead
process at the second observation. This example looks ahead only one observation,
but by setting the FIRSTOBS= option differently, you could read ahead any
number of observations. RENAME= renames X so that the look-ahead value
doesn't overwrite the current value of X in the program data vector. KEEP= ensures
that only the value of X from the look-ahead observation will be brought into
the program data vector.

   data two;
      merge one
            one(firstobs=2 keep=x rename=(x=nextx));

For each observation, compare the original and look-ahead values of X and
create the new variable MATCH to report the comparison.

      if x=nextx then match='YES';
      else match='NO';
   run;

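The same idea extends to looking further ahead. As a sketch only, the step below adds a third reference to ONE that starts at the third observation, so each observation also carries the value of X from two observations later. The output name TWO_AHEAD and the variable name NEXTX2 are hypothetical.

```sas
/* Sketch: look ahead one and two observations in the same step. */
data two_ahead;                                   /* hypothetical name */
   merge one
         one(firstobs=2 keep=x rename=(x=nextx))
         one(firstobs=3 keep=x rename=(x=nextx2)); /* hypothetical NEXTX2 */
run;
```

Because this is a one-to-one merge without a BY statement, NEXTX is missing in the last observation and NEXTX2 is missing in the last two observations, where no look-ahead observation exists.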
Example 5.7 Obtaining the Lag (Previous Value) of a Variable within a BY Group

Goal Create lagged values* for variables within a BY group.

Strategy Use the LAGn function in conjunction with BY-group processing and array
processing to create lagged values for a variable within each BY group. After
each BY group is processed, use an IF-THEN statement to reinitialize the
lagged values to missing.

Input Data Set i ! :

'I .
INFORMS is sorted ; slrART.
!
INFORMS
OBS START END
1 1 2
2 1 1
3 1 3
4 1 4
5 1 10
6 1 5
7 2 1
8 2 2
9 3 1
10 3 3
11 3 2
12 3 4
13 3 5
Resulting Data Set

Output 5.7 SHOWLAG Data Set

SHOWLAG contains lagged values within BY groups, based on START.

SHOWLAG
OBS  START  END  ENDLAG1  ENDLAG2  ENDLAG3  ENDLAG4
  1      1    2        .        .        .        .
  2      1    1        2        .        .        .
  3      1    3        1        2        .        .
  4      1    4        3        1        2        .
  5      1   10        4        3        1        2
  6      1    5       10        4        3        1
  7      2    1        .        .        .        .
  8      2    2        1        .        .        .
  9      3    1        .        .        .        .
 10      3    3        1        .        .        .
 11      3    2        3        1        .        .
 12      3    4        2        3        1        .
 13      3    5        4        2        3        1

* A lagged value is an earlier value for a given variable.

Program  The objective is to process data set INFORMS in BY groups based on the value of START and create lagged values for END within each BY group. The input data set INFORMS must be sorted by START. Use the LAGn function to create the necessary lagged values. Use an array and an iterative DO loop to reset variables that hold lagged values so that lagged values are not held across BY groups.

This program generates up to four lagged values. By increasing the size of the array and the number of assignment statements that use the LAGn functions, you can generate as many lagged values as needed:

Create SHOWLAG. Read an observation from INFORMS. Specify START as the BY variable.

   data showlag(drop=i count);
      set informs;
      by start;

Define the array GROUP. Create and assign values to four new variables. Use ENDLAG1-ENDLAG4 to store lagged values of END, from the most recent to the fourth preceding value.

      array group{*} endlag1-endlag4;
      endlag1=lag1(end);
      endlag2=lag2(end);
      endlag3=lag3(end);
      endlag4=lag4(end);

When the first observation in each BY group is processed, reset COUNT to 1. This value is used by the following DO loop to set appropriate array elements to missing.

      if first.start then count=1;

On each iteration, set to missing array elements that have not yet received a lagged value for the current BY group. Increase COUNT by 1. If these array elements are not set to missing before an observation is written, they would still contain lagged values from the previous BY group. The DIM function returns the number of elements in an array. Using DIM prevents you from having to change the upper bound of an iterative DO group if you later change the number of array elements.

      do i=count to dim(group);
         group(i)=.;
      end;
      count + 1;
   run;
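The result of the reset logic can be cross-checked outside SAS. The following Python sketch is an assumption-laden illustration, not the book's method: instead of filling a lag queue and blanking stale elements, it keeps a per-group history and pads with `None`, which yields the same table. The function name `lags_by_group` is hypothetical; the input pairs are the START and END values of INFORMS.

```python
def lags_by_group(rows, n_lags=4):
    """rows: (start, end) pairs already sorted by start.

    Returns (start, end, lag1, ..., lagN) with None where no lagged
    value exists yet inside the current BY group.
    """
    out = []
    history = []      # END values seen so far in the current BY group
    current = None
    first = True
    for start, end in rows:
        if first or start != current:
            # New BY group: forget values from the previous group.
            history = []
            current = start
            first = False
        recent = history[::-1][:n_lags]          # most recent value first
        lags = recent + [None] * (n_lags - len(recent))
        out.append((start, end, *lags))
        history.append(end)
    return out


informs = [(1, 2), (1, 1), (1, 3), (1, 4), (1, 10), (1, 5),
           (2, 1), (2, 2),
           (3, 1), (3, 3), (3, 2), (3, 4), (3, 5)]
showlag = lags_by_group(informs)
for row in showlag:
    print(row)
```

Each printed row corresponds to one observation of SHOWLAG, with `None` standing in for a missing value.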
Example 5.8 Applying Common Operations to a Group of Variables

Goal  Apply an arithmetic operation to selected numeric variables in a data set by using an array without explicitly listing the variable names.

Strategy  Move the variables you do not want processed to the beginning of the program data vector by listing them in the RETAIN statement that precedes the SET statement. Then define a numeric array so that no intervening character variables are processed. By specifying _NUMERIC_, you can define an array of numeric variables without specifying the names of the variables. Set the DO loop to begin with the first variable you want processed.

Input Data Set

GRADES
OBS  NAME   TEST1  TEST2  TEST3  TEST4  TEST5  TEST6  TEST7  TEST8  TEST9  TEST10
  1  Betty     78     88     94     57     89     77     79     81     89      82
  2  James     74     82     88     71     88     81     72     84     91      77
  3  Fred      69     71     81     64     79     74     66     77     81      95
Resulting Data Set

Output 5.8 CURVE Data Set

CURVE contains scores with curved values for seven tests. Scores for Tests 3, 5, and 9 are not curved.

CURVE
OBS  TEST3  TEST5  TEST9  NAME   TEST1  TEST2  TEST4  TEST6  TEST7  TEST8  TEST10
  1     94     89     89  Betty     88     98     67     87     89     91      92
  2     88     88     91  James     84     92     81     91     82     94      87
  3     81     79     81  Fred      79     81     74     84     76     87     100

Program  The objective is to curve the values of seven out of ten test scores in a data set. First, move the three scores you don't want curved to the front of the program data vector by using the RETAIN statement before the SET statement. Then define the array ALLTEST by using _NUMERIC_ so that you do not have to list the numeric variables explicitly. Use a DO loop to begin processing with the fourth numeric element in the ALLTEST array. Then execute whatever formula you want on the fourth through tenth array elements:

Create CURVE. Move variables you don't want processed to the front of the program data vector.

   data curve(drop=i);
      retain test3 test5 test9;

Read an observation from GRADES and define the numeric array ALLTEST.

      set grades;
      array alltest _numeric_;

Process variables in the array ALLTEST, beginning with the fourth element. Apply the curve by adding 10 to each test score. Set all scores above 100 to 100. The DO loop begins with 4 so that the first three numeric variables in the array will not be processed. The DIM function returns the number of elements in an array. Using DIM prevents you from having to change the upper bound of an iterative DO group if you later change the number of array elements.

      do i=4 to dim(alltest);
         alltest(i)+10;
         if alltest(i) > 100 then alltest(i) = 100;
      end;
   run;
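The arithmetic of the curve can be checked outside SAS with a short Python sketch. This is only an illustration under stated assumptions: it skips the uncurved tests by name rather than by position in the program data vector, and the function name `curve` and its `skip`, `bonus`, and `cap` parameters are hypothetical.

```python
def curve(row, skip=("TEST3", "TEST5", "TEST9"), bonus=10, cap=100):
    """Add `bonus` to every numeric score not listed in `skip`, capped at `cap`."""
    curved = {}
    for name, value in row.items():
        if name in skip or not isinstance(value, (int, float)):
            curved[name] = value                    # left as is (includes NAME)
        else:
            curved[name] = min(value + bonus, cap)  # curve, then cap at 100
    return curved


# Fred's row from the GRADES data set.
fred = {"NAME": "Fred", "TEST1": 69, "TEST2": 71, "TEST3": 81, "TEST4": 64,
        "TEST5": 79, "TEST6": 74, "TEST7": 66, "TEST8": 77, "TEST9": 81,
        "TEST10": 95}
curved_fred = curve(fred)
print(curved_fred)
```

TEST10 shows the cap at work: 95 plus 10 would be 105, so the curved score is held at 100.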
Example 5.9 Calculating Totals across a BY Group to Produce Either a Grand or Cumulative Total

Goal  Create a data set that collapses each BY group into a single observation and produces grand totals for variables in each BY group. You can also create a new data set that contains cumulative totals for observations within a BY group.

Strategy  The input data must be sorted on the BY variable. In a DATA step, use a BY statement to create FIRST. and LAST. variables for the BY variable. Using the values of these variables with IF-THEN logic, you can process observations in groups. By using SUM statements and creating new variables to contain running totals, you can accumulate the values of each variable as you process the BY groups.

So that the new data set contains only the grand totals for each BY group, use an OUTPUT statement with IF-THEN logic to write only the last observation for each BY group. Rename the original variables if you want the new variables containing the totals to have the same name as the original variables.

To create an output data set that contains a running or cumulative total for each BY group, remove the IF-THEN logic that causes only the last observation from each BY group to be written to the output data set. See "Related Technique."
Input Data Set

SCORES is sorted by ID.

SCORES
OBS  ID  GAME1  GAME2  GAME3
  1  A       2      3      4
  2  A       5      6      7
  3  B       1      2      3
  4  C       1      2      3
  5  C       4      5      6
  6  C       7      8      9
Resulting Data Set

Output 5.9a GRANDTOT Data Set

GRANDTOT contains the game grand totals for each ID.

GRANDTOT
OBS  ID  GAME1  GAME2  GAME3
  1  A       7      9     11
  2  B       1      2      3
  3  C      12     15     18

Output 5.9b CUMTOT Data Set

Cumulative game totals for each ID.

CUMTOT
OBS  ID  GAME1  GAME2  GAME3
  1  A       2      3      4
  2  A       7      9     11
  3  B       1      2      3
  4  C       1      2      3
  5  C       5      7      9
  6  C      12     15     18

Program  The objective is to create an output data set that contains the grand totals for the variables GAME1, GAME2, and GAME3 for each BY group. The data must be sorted by ID, the BY variable. Create new variables that will contain accumulated totals. Use IF-THEN processing and the value of FIRST.ID to reset these variables to 0 each time a new BY group begins. Use the SUM statement to create running totals. Use the IF-THEN and OUTPUT statements to write only the last observation for each BY group to the output data set. Rename the original variables from SCORES so that the variables in the new data set that contain the accumulated totals can preserve the original variable names:

Create GRANDTOT and drop the variables that represent the GAME1-GAME3 values from SCORES. Read an observation from SCORES and rename the original variables containing the game score values. The BY statement specifies ID as the BY-group variable and creates the variables FIRST.ID and LAST.ID.

   data grandtot(drop=temp1 temp2 temp3);
      set scores(rename=(game1=temp1 game2=temp2 game3=temp3));
      by id;

When reading the first observation of each BY group, reset the values of GAME1-GAME3 to zero so that the total from the previous BY group is not retained.

      if first.id then
         do;
            game1=0;
            game2=0;
            game3=0;
         end;

Add the current value of TEMP1-TEMP3 to the running totals. Write only the last observation for each value of ID to GRANDTOT. The three sum statements add the values of TEMP1-TEMP3 to GAME1-GAME3 and also cause the values of GAME1-GAME3 to be automatically retained across iterations of the DATA step.

      game1 + temp1;
      game2 + temp2;
      game3 + temp3;
      if last.id then output;
   run;

Related Technique  You can produce a data set that contains running totals of GAME1-GAME3 for each BY group by removing the last IF-THEN statement from the end of the DATA step in the previous program:

   if last.id then output;

The DATA statement was also changed to produce the CUMTOT data set. See Output 5.9b.
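The relationship between the cumulative and grand totals can be checked outside SAS. The Python sketch below is an illustration under stated assumptions, not the DATA step itself: it uses `itertools.groupby` in place of FIRST.ID and LAST.ID, and the function names `cumulative_totals` and `grand_totals` are hypothetical. The rows are those of the SCORES data set.

```python
from itertools import groupby

scores = [("A", 2, 3, 4), ("A", 5, 6, 7), ("B", 1, 2, 3),
          ("C", 1, 2, 3), ("C", 4, 5, 6), ("C", 7, 8, 9)]


def cumulative_totals(rows):
    """Running totals of the game columns, restarted for each ID."""
    out = []
    for key, group in groupby(rows, key=lambda r: r[0]):
        totals = [0, 0, 0]                      # reset at the start of a BY group
        for _, *games in group:
            totals = [t + g for t, g in zip(totals, games)]
            out.append((key, *totals))
    return out


def grand_totals(rows):
    """Keep only the last running-total row of each ID."""
    cum = cumulative_totals(rows)
    following = cum[1:] + [None]
    return [row for row, nxt in zip(cum, following)
            if nxt is None or nxt[0] != row[0]]


cumtot = cumulative_totals(scores)
grandtot = grand_totals(scores)
print(cumtot)
print(grandtot)
```

Writing every row gives the cumulative table; keeping only the last row of each group gives the grand totals, which is the only difference between the two programs.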
Example 5.10 Calculating the Percentage That One Observation Contributes to the Total of a BY Group

Goal  Calculate BY-group totals for a variable and then create a variable that shows the percentage that each observation contributes to the total for that BY group.

Strategy  Use PROC MEANS to calculate a total for each BY group and to create a new data set that contains one observation for each BY group. Then use a one-to-many merge to merge this data set with the original data set and calculate the percentage that a variable in each observation contributes to the BY-group total. You can calculate this percentage for each observation because match-merging causes the variables to be retained throughout each BY group.

You can perform the same task with PROC SQL. See "Related Technique."

Note: Due to the variability of data and the number of conditions that determine the path chosen by the PROC SQL optimizer, it is not always possible to determine the most efficient method without first testing with your data.
Input Data Sets

Both SALES and REGTOT are sorted by the variable REGION. The REGTOT data set, created with PROC MEANS, contains the total amount of sales for each region.

SALES
OBS  REGION  REPID      AMOUNT
  1  EAST     1051  $2,508,000
  2  EAST     1055  $1,805,000
  3  NORTH    1001  $1,000,000
  4  NORTH    1002  $1,100,000
  5  NORTH    1003  $1,550,000
  6  NORTH    1008  $1,250,000
  7  NORTH    1005    $900,000
  8  SOUTH    1007  $2,105,000
  9  SOUTH    1010    $875,000
 10  SOUTH    1012  $1,655,000

REGTOT
OBS  REGION    REGTOTAL
  1  EAST    $4,313,000
  2  NORTH   $5,800,000
  3  SOUTH   $4,635,000

Resulting Data Sets

Output 5.10a PERCENT1 Data Set

PERCENT1 was created with the DATA step.

PERCENT1
OBS  REGION  REPID      AMOUNT    REGTOTAL  REGPCT
  1  EAST     1051  $2,508,000  $4,313,000   58.15
  2  EAST     1055  $1,805,000  $4,313,000   41.85
  3  NORTH    1001  $1,000,000  $5,800,000   17.24
  4  NORTH    1002  $1,100,000  $5,800,000   18.97
  5  NORTH    1003  $1,550,000  $5,800,000   26.72
  6  NORTH    1008  $1,250,000  $5,800,000   21.55
  7  NORTH    1005    $900,000  $5,800,000   15.52
  8  SOUTH    1007  $2,105,000  $4,635,000   45.42
  9  SOUTH    1010    $875,000  $4,635,000   18.88
 10  SOUTH    1012  $1,655,000  $4,635,000   35.71

Output 5.10b PERCENT2 Data Set

PERCENT2 was created with PROC SQL.

PERCENT2
OBS  REGION  REPID      AMOUNT    REGTOTAL  REGPCT
  1  EAST     1051  $2,508,000  $4,313,000   58.15
  2  EAST     1055  $1,805,000  $4,313,000   41.85
  3  NORTH    1001  $1,000,000  $5,800,000   17.24
  4  NORTH    1002  $1,100,000  $5,800,000   18.97
  5  NORTH    1003  $1,550,000  $5,800,000   26.72
  6  NORTH    1008  $1,250,000  $5,800,000   21.55
  7  NORTH    1005    $900,000  $5,800,000   15.52
  8  SOUTH    1007  $2,105,000  $4,635,000   45.42
  9  SOUTH    1010    $875,000  $4,635,000   18.88
 10  SOUTH    1012  $1,655,000  $4,635,000   35.71

Program  The objective is to produce a data set that shows not only the sales amount produced by each sales representative but also what percentage it represented in the total for the region.

First, use PROC MEANS to produce REGTOT, an output data set that contains totals calculated for the AMOUNT variable for each BY group in SALES. Then merge REGTOT with SALES by REGION and calculate the percentage of region total (REGTOTAL) for the amount sold by each sales representative.

Due to match-merging behavior, the value of REGTOTAL is retained until the value of the BY variable REGION changes. The value of REGTOTAL, therefore, is available for calculating the value of REGPCT for each observation.

Create REGTOT, a data set that contains one observation for each REGION. Create a new variable, REGTOTAL, that contains the total AMOUNT for each REGION.

   proc means data=sales noprint nway;
      var amount;
      by region;
      output out=regtot(keep=regtotal region) sum=regtotal;
   run;

Create PERCENT1 by merging REGTOT with SALES, based on the value of the BY variable REGION.

   data percent1;
      merge sales regtot;
      by region;

Calculate the percentage each observation contributed to the total for the appropriate region. AMOUNT is the amount contributed by each sales representative and REGTOTAL is the sales total for that region.

      regpct = (amount / regtotal) * 100;
      format regpct 6.2 amount regtotal dollar10.;
   run;

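The percentages in Output 5.10a can be verified outside SAS. The following Python sketch is an illustration only: a dictionary of regional totals plays the role of REGTOT, and looking up the total for each row plays the role of the one-to-many merge. The function name `percent_of_group` is hypothetical; the rows are those of the SALES data set.

```python
sales = [
    ("EAST", 1051, 2508000), ("EAST", 1055, 1805000),
    ("NORTH", 1001, 1000000), ("NORTH", 1002, 1100000),
    ("NORTH", 1003, 1550000), ("NORTH", 1008, 1250000),
    ("NORTH", 1005, 900000),
    ("SOUTH", 1007, 2105000), ("SOUTH", 1010, 875000),
    ("SOUTH", 1012, 1655000),
]


def percent_of_group(rows):
    """Return (region, repid, amount, regtotal, regpct) for each row."""
    # Step 1: one total per region (the role of PROC MEANS and REGTOT).
    totals = {}
    for region, _, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    # Step 2: attach the regional total to every row and compute the share.
    return [
        (region, repid, amount, totals[region],
         round(100 * amount / totals[region], 2))
        for region, repid, amount in rows
    ]


percent1 = percent_of_group(sales)
for row in percent1:
    print(row)
```

Because the regional total is attached to every row of its region, each percentage uses the same denominator within a region.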
Related Technique  If you are familiar with Structured Query Language (SQL), you may want to use PROC SQL instead of the DATA step. Using the SALES table,* PROC SQL creates a new table that contains two new columns of summary data, REGTOTAL and REGPCT. Because the data are grouped by REGION, the summary SUM function sums data in each group, not the entire table. Thus, to get the total for each region, simply use the SUM function on the AMOUNT column. The region total becomes the values of the REGTOTAL column. To calculate a percentage for each REPID, divide AMOUNT by the sum of AMOUNT for the region and multiply by 100. The percentage of the total becomes the values of the REGPCT column:

   proc sql;
      create table percent2 as
         select *, sum(amount) as regtotal format=dollar10.,
                100*(amount/sum(amount)) as regpct format=6.2
            from sales
            group by region;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report. SELECT clauses, which follow CREATE TABLE or CREATE VIEW statements, do not automatically produce a report.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows are observations.

Example 5.11 Adding a New Variable that Contains the Frequency of a BY-Group Value

Goal  For each row in a table, determine the number of occurrences of a column's value, and store the number in a new column.*

Strategy  Use the COUNT function in the SQL procedure to obtain a frequency count. Create a new table that includes a column that shows the frequency count. Use the GROUP BY clause so that the frequency count will be for each group.
Input Data Set

ONE
OBS  ID  NAME           LOCATION      HOURS
  1   1  John Krueger   Tech Support      5
  2   2  Joe Olszweski  Marketing         3
  3   1  John Krueger   Tech Support     10
  4   3  Len Nuhn       Sales            30
  5   3  Len Nuhn       Sales             1
  6   2  Joe Olszweski  Marketing        20
  7   1  John Krueger   Tech Support     30
  8   1  John Krueger   Tech Support     40
  9   4  Luan Nguyen    Development      40
Resulting Data Set

Output 5.11 FINAL Table

FINAL
OBS  ID  NAME           LOCATION      HOURS  COUNT
  1   1  John Krueger   Tech Support     30      4
  2   1  John Krueger   Tech Support     10      4
  3   1  John Krueger   Tech Support      5      4
  4   1  John Krueger   Tech Support     40      4
  5   2  Joe Olszweski  Marketing        20      2
  6   2  Joe Olszweski  Marketing         3      2
  7   3  Len Nuhn       Sales             1      2
  8   3  Len Nuhn       Sales            30      2
  9   4  Luan Nguyen    Development      40      1

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows are observations.

Program  The objective is to create a new table that shows how many times each employee appears in the original table. Group the data by values of ID. Count the number of rows in each group. Create a new column that shows how many times the employee appears in the original table:

Invoke PROC SQL, and create a table. The CREATE TABLE statement creates the table FINAL from the subsequent query.

   proc sql;
      create table final as

Select the columns. The SELECT clause selects all columns from table ONE and creates an additional column, COUNT. For each row, the COUNT function uses the row's value of ID to return a frequency count that shows the total number of rows that have the same value of ID.

         select *, count(id) as count

Name the table to query.

            from one

Group the data by values of ID. Because the values are grouped by ID, the COUNT function will return the total number of rows for each value of ID. Without the GROUP BY clause, the COUNT function returns the total number of rows in the table.

            group by id;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report. SELECT clauses, which follow CREATE TABLE or CREATE VIEW statements, do not automatically produce a report.

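The frequency counts can be cross-checked outside SAS with a few lines of Python. This sketch is an illustration only: `collections.Counter` stands in for COUNT with GROUP BY, and attaching the count back to every row mirrors the way the summary value appears on each row of the new table. The variable names are hypothetical; the ID values are those of table ONE, in their original order.

```python
from collections import Counter

# ID column of table ONE, in input order.
ids = [1, 2, 1, 3, 3, 2, 1, 1, 4]

# One count per distinct ID (the grouped summary).
counts = Counter(ids)

# Attach the group count to every original row.
with_count = [(row_id, counts[row_id]) for row_id in ids]

print(dict(counts))
print(with_count)
```

Unlike the SQL result, this sketch keeps the rows in input order rather than ordering them by group.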
Example 5.12 Subsetting a Data Set Based on the Calculated Average of a BY Group

Goal  Group the data in a table* and create a new column that contains an average for each group. Use the average for each group to subset the table.

Strategy  Using the SQL procedure, create a new table that contains a column that gives the average of values from a specified column. Use the GROUP BY clause to group the data and find the average for each group. Use the HAVING clause to subset the table and to return only rows from each group that meet a specific search criterion.
Input Data Set

EMPLOYEE
OBS  NAME      JOBCODE      SALARY
  1  Nikos     A1       $32,456.00
  2  Paul      NA2      $53,798.00
  3  Jody      T2       $25,147.00
  4  Olga      T1       $19,810.00
  5  Yao       NA1      $43,433.00
  6  Natasha   A1       $31,987.00
  7  Tom       T2       $23,596.00
  8  Kendrick  NA1      $41,690.00
  9  Kesha     A1       $33,067.00
 10  Klaus     T1       $20,230.00
 11  Kyle      NA2      $51,081.00
 12  Carla     NA2      $52,270.00
 13  Anne      T2       $24,876.00
 14  Gunner    NA1      $42,345.00
 15  Candice   A1       $34,567.00
Resulting Data Set

Output 5.12 FINAL Table

FINAL
OBS  NAME     JOBCODE      SALARY     AVERAGE
  1  Kesha    A1       $33,067.00  $33,019.25
  2  Candice  A1       $34,567.00  $33,019.25
  3  Yao      NA1      $43,433.00  $42,489.33
  4  Paul     NA2      $53,798.00  $52,383.00
  5  Klaus    T1       $20,230.00  $20,020.00
  6  Anne     T2       $24,876.00  $24,539.67
  7  Jody     T2       $25,147.00  $24,539.67

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows are observations.

Program  The objective is to find which employees make higher salaries than the average salary for their jobcode. Calculate the average salary for each jobcode and create a new column that contains the average. Use the average salary for each jobcode and each employee's salary to subset the table:

Invoke PROC SQL, and create a table. The CREATE TABLE statement creates the table FINAL from the subsequent query.

   proc sql;
      create table final as

Select the columns. The SELECT clause selects all the columns from EMPLOYEE and creates an additional column, AVERAGE. The AVG function calculates the average for all the values of SALARY for each JOBCODE, which becomes the value of AVERAGE.

         select *, avg(salary) as average format=dollar10.2

Name the table to query.

            from employee

Group the data by JOBCODE, and subset the grouped data. GROUP BY specifies that AVERAGE will contain the average salary for each JOBCODE. The HAVING clause returns all rows where the employee's salary is greater than the average for their jobcode. CALCULATED takes the place of the mathematical computation in the SELECT clause. (HAVING requires that the data be grouped.)

            group by jobcode
            having salary > calculated average;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report. SELECT clauses, which follow CREATE TABLE or CREATE VIEW statements, do not automatically produce a report.

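The subsetting rule can be checked outside SAS. The Python sketch below is an illustration under stated assumptions: it uses only the T1 and T2 rows of EMPLOYEE as a small subset, computes the group average first, and then keeps rows whose salary exceeds it. The function name `above_group_average` is hypothetical.

```python
# Subset of EMPLOYEE: (name, jobcode, salary) for jobcodes T1 and T2 only.
employee = [
    ("Jody", "T2", 25147.00), ("Olga", "T1", 19810.00),
    ("Tom", "T2", 23596.00), ("Klaus", "T1", 20230.00),
    ("Anne", "T2", 24876.00),
]


def above_group_average(rows):
    """Return (name, jobcode, salary, average) for salaries above the jobcode average."""
    # Collect salaries per jobcode, then average each group.
    by_code = {}
    for _, code, salary in rows:
        by_code.setdefault(code, []).append(salary)
    averages = {code: sum(vals) / len(vals) for code, vals in by_code.items()}
    # Keep only rows that beat their own group's average.
    return [
        (name, code, salary, round(averages[code], 2))
        for name, code, salary in rows
        if salary > averages[code]
    ]


final = above_group_average(employee)
for row in final:
    print(row)
```

The comparison is always against the average of the row's own jobcode, never the average of the whole table.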
Example 5.13 Really Rounding Numbers

Goal  Produce pencil-and-paper results* when rounding the results of numeric calculations.

Strategy  Introduce a fuzz factor** that allows you to obtain numeric calculations that are closer to the pencil-and-paper results that you expect. In a macro, use the regular ROUND function, but first add a fuzz factor to the number.

Due to factors of numeric precision, the ROUND function without an added fuzz factor does not always produce pencil-and-paper results. See "A Closer Look."
Input Data Set

AMT2 contains the pencil-and-paper rounded version of the number in AMT1 and is in the data set so that we can compare it against the results of the rounding technique.

AMOUNTS
OBS        AMT1        AMT2
  1   0.0000540   0.0000500
  2   0.0000550   0.0000600
  3   0.0000560   0.0000600
  4   0.9998050   0.9998100
  5   0.9998060   0.9998100
  6  17.9998050  17.9998100
  7  17.9998060  17.9998100
  8  18.9998050  18.9998100
  9  18.9998060  18.9998100
 10  18.9999050  18.9999100
 11  18.9999060  18.9999100
Resulting Data Sets

Output 5.13a MACRND Data Set

MACRND shows rounding with a fuzz factor.

MACRND
OBS       AMT1       AMT2    M_ROUND  MATCH
  1   0.000054   0.000050   0.000050  yes
  2   0.000055   0.000060   0.000060  yes
  3   0.000056   0.000060   0.000060  yes
  4   0.999805   0.999810   0.999810  yes
  5   0.999806   0.999810   0.999810  yes
  6  17.999805  17.999810  17.999810  yes
  7  17.999806  17.999810  17.999810  yes
  8  18.999805  18.999810  18.999810  yes
  9  18.999806  18.999810  18.999810  yes
 10  18.999905  18.999910  18.999910  yes
 11  18.999906  18.999910  18.999910  yes

* Pencil-and-paper results are produced by manual calculations.

** Fuzz factor refers to adding an amount to a value so that it is rounded up appropriately when calculations are performed.

Output 5.13b REGRND Data Set

REGRND shows rounding with the ROUND function and no fuzzing.

REGRND
OBS       AMT1       AMT2    R_ROUND  MATCH
  1   0.000054   0.000050   0.000050  yes
  2   0.000055   0.000060   0.000060  yes
  3   0.000056   0.000060   0.000060  yes
  4   0.999805   0.999810   0.999810  yes
  5   0.999806   0.999810   0.999810  yes
  6  17.999805  17.999810  17.999800  no
  7  17.999806  17.999810  17.999810  yes
  8  18.999805  18.999810  18.999800  no
  9  18.999806  18.999810  18.999810  yes
 10  18.999905  18.999910  18.999900  no
 11  18.999906  18.999910  18.999910  yes

Program  The objective is to produce rounded values that are the same as pencil-and-paper results. The program uses a macro to introduce a fuzzing factor when rounding a value. For comparison purposes, data set AMOUNTS contains the variable AMT2, whose value is the correct pencil-and-paper rounded value of AMT1. The program rounds AMT1 with a macro, produces M_ROUND, and then compares its value with AMT2. MATCH is set to 'yes' when the values match so that you can easily see the result of fuzz-factor rounding:

Create the MACROUND macro.

   %macro macround(var,unit,fuzz=1e-10);
      round((&var+(sign(&var)*&fuzz)),&unit)
   %mend;

Create MACRND. Read an observation from AMOUNTS.

   data macrnd;
      format amt1 amt2 m_round 10.6;
      set amounts;

Use the MACROUND macro to round the value of AMT1 to the nearest .00001 (displayed with six decimal places). Assign the value to M_ROUND.

      m_round=%macround(amt1,.00001);

To show how well the rounding worked, test the pencil-and-paper value in AMT2 against M_ROUND for a match.

      if amt2=m_round then match='yes';
      else match='no';
   run;

A Closer Look

Numeric Precision and the ROUND Function

We are introducing a fuzz factor because of a numeric precision problem common to computer applications, not because the ROUND function produces inaccurate results. The problem of numeric precision occurs because of hardware limitations in the way computers store real numbers. Basically, a finite set of numbers must be used to represent the infinite real number system.

Most software packages and spreadsheet applications introduce a hidden fuzz factor to account for problems in numeric precision. The SAS ROUND function, however, does not use an automatic fuzzing mechanism. It simply accepts the original value as it had been stored on the machine. Unless you add this fuzz factor, however, numeric precision may prevent you from achieving the usual pencil-and-paper results. Numeric precision is an issue across all platforms and is not consistent from machine to machine.

The key to defining your own rounding routine is to determine how much of a fuzz factor should be added. You want to add enough so that values are rounded up when they should be, but you do not want to add so much that values that should be rounded down are also rounded up instead. See "The MACROUND Macro and the Fuzz Factor" for a discussion of the rounding routine used in this program.

So that you can compare the results, the following program is the same as the original one except that it uses the ROUND function with no fuzzing factor:

   data regrnd;
      format amt1 amt2 r_round 10.6;
      set amounts;
      r_round=round(amt1,.00001);
      if amt2=r_round then match='yes';
      else match='no';
   run;

Output 5.13b shows that the rounded numbers in REGRND do not match the pencil-and-paper answers as well as the MACRND values do in Output 5.13a.

The MACROUND Macro and the Fuzz Factor

The MACROUND macro uses the ROUND function with an additional fuzz factor to produce pencil-and-paper results when rounding a number. When the macro is defined, macro parameters in the %MACRO statement define three macro variables: VAR, UNIT, and FUZZ. The variable FUZZ is assigned a value at that time.

   %macro macround(var,unit,fuzz=1e-10);
      round((&var+(sign(&var)*&fuzz)),&unit)
   %mend;

When you invoke the MACROUND macro, values for VAR and UNIT are supplied:

   m_round=%macround(amt1,.00001);

At macro execution time, the macro variables are resolved:

   m_round=round((amt1+(sign(amt1)*(1e-10))),.00001)

The routine used here ensures that a value that should be rounded up will be rounded correctly, even when its value was slightly less because of the representation error. The SIGN function is important here because it ensures that negative values are rounded in the correct direction. The SIGN function returns a negative value when AMT1 is negative. Without it, a negative value for AMT1 would be rounded in the wrong direction.

As an example, here's how the expression works when it executes for the first observation in this example:

   m_round=round(0.000054+(sign(0.000054)*(1e-10)),.00001)
   m_round=round(0.000054+.0000000001,.00001)
   m_round=round(0.0000540001,.00001)
   m_round=.000050

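The effect of a signed fuzz factor can also be seen outside SAS. The Python sketch below is an illustration only and does not reproduce SAS behavior: Python's built-in `round` takes a number of decimal places rather than a rounding unit, and the value 2.675 is a hypothetical example, chosen because it is stored slightly below 2.675 in binary floating point. The function name `fuzzy_round` is an assumption.

```python
def fuzzy_round(value, ndigits, fuzz=1e-10):
    """Round after nudging the value away from zero by a small fuzz factor."""
    # sign is -1, 0, or 1, like the SIGN function in the macro.
    sign = (value > 0) - (value < 0)
    return round(value + sign * fuzz, ndigits)


plain = round(2.675, 2)            # stored value is just under 2.675
fuzzed = fuzzy_round(2.675, 2)     # fuzz lifts it back over the halfway point
negative = fuzzy_round(-2.675, 2)  # signed fuzz pushes away from zero
unchanged = fuzzy_round(2.674, 2)  # fuzz is too small to change this result

print(plain, fuzzed, negative, unchanged)
```

The fuzz must stay far smaller than the rounding unit, which is the trade-off described above: large enough to repair representation error, small enough not to round up values that should round down.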
Where to Go from Here

□ Numeric precision. For a discussion of this issue, see pp. 88-95 in SAS Language: Reference, Version 6, First Edition. For more discussion, see Klenz, Brad (1992), "Handling Numeric Representation Error in SAS Applications," Observations, 1(3), 19-30.

□ Macro processing. For useful introductions, see Chapter 1, "Introducing the Macro Facility," in SAS Macro Facility Tips and Techniques. For complete reference information, see SAS Guide to Macro Processing, Version 6, First Edition.
Example 5.14 Collapsing Observations within a BY Group into a Single Observation

Goal  Rearrange a data set by changing a single variable in a group of observations to a group of variables in one observation. Reshape data by collapsing observations within a BY group into a single observation in order to simplify data analysis and report generation.

Strategy  Collapse multiple observations with a common BY variable value into a single observation. The data set must be sorted prior to the DATA step. Use a BY statement to create BY groups and the FIRST.variable and LAST.variable. Use array processing to assign the current value of a certain variable to the appropriate new variable. Use the FIRST.variable to control a DO loop that reinitializes the retained array values and resets the array subscript variable. Use the LAST.variable to output the newly created observation.

Input Data Set

STUDENTS
OBS  NAME     SCORE
  1  Deborah     89
  2  Deborah     90
  3  Deborah     95
  4  Martin      90
  5  Stefan      89
  6  Stefan      76
Resulting Data Set

Output 5.14 SCORES Data Set

SCORES
OBS  NAME     SCORE1  SCORE2  SCORE3
  1  Deborah      89      90      95
  2  Martin       90       .       .
  3  Stefan       89      76       .

Program  Each observation in the data set STUDENTS currently stores a single score. The objective is to reshape STUDENTS so that an observation contains all test scores for an individual student. Use array processing to create and assign values to the new variables SCORE1, SCORE2, and SCORE3. The values are retained with each iteration of the DATA step, and the record is written only when the last observation in the BY group is processed. This program assumes that STUDENTS contains no more than three observations with the same value for NAME:

Create SCORES. Use the RETAIN statement to create the variables SCORE1-SCORE3 and to retain the values of NAME and SCORE1-SCORE3 from one iteration of the DATA step to the next.

   data scores(keep=name score1-score3);
      retain name score1-score3;

Create the array SCORES. SCORE1-SCORE3 will receive their values from the variable SCORE in STUDENTS.

      array scores(*) score1-score3;

Read observations from STUDENTS. Use the BY statement to create BY groups and the variables FIRST.NAME and LAST.NAME.

      set students;
      by name;

At the beginning of each BY group, use the assignment statement to set the value of the array subscript I to 1. Reinitialize to missing the values of SCORE1-SCORE3.

      if first.name then do;
         i=1;
         do j=1 to 3;
            scores(j)=.;
         end;
      end;

Assign the current value of SCORE to SCORE1, SCORE2, or SCORE3.

      scores(i)=score;

After processing the last observation in a BY group, write an observation to SCORES.

      if last.name then output;

Increase the value of I by 1. In addition to increasing the value, the sum statement also causes it to be automatically retained across iterations of the DATA step.

      i+1;
   run;

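The reshaping can be checked outside SAS. The Python sketch below is an illustration under stated assumptions, not the DATA step: `itertools.groupby` stands in for the BY statement with FIRST.NAME and LAST.NAME, and padding with `None` stands in for reinitializing SCORE1-SCORE3 to missing. The function name `collapse` and its `width` parameter are hypothetical; the rows are those of STUDENTS.

```python
from itertools import groupby

students = [("Deborah", 89), ("Deborah", 90), ("Deborah", 95),
            ("Martin", 90), ("Stefan", 89), ("Stefan", 76)]


def collapse(rows, width=3):
    """Turn sorted (name, score) rows into one (name, score1..scoreN) row per name."""
    out = []
    for name, group in groupby(rows, key=lambda r: r[0]):
        scores = [score for _, score in group]
        if len(scores) > width:
            # Same assumption as the program: no more than `width` rows per name.
            raise ValueError(f"more than {width} scores for {name}")
        padding = [None] * (width - len(scores))   # unused slots stay missing
        out.append((name, *scores, *padding))
    return out


collapsed = collapse(students)
for row in collapsed:
    print(row)
```

The explicit error for a fourth score makes visible the assumption that the DATA step leaves implicit, where a fourth observation would produce an array subscript out of range.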
Example 5.15 Expanding Single Observations into Multiple Observations

Goal  Reshape data by creating multiple observations from a single observation in the input data set and by assigning variable names as values in the output data set.

Strategy  Use an array and a DO loop to create multiple observations from each single observation in the input data set. Use the CALL routine VNAME to assign variable names from the input data set as values of a new variable in the output data set.

Input Data Set

SURVEY

OBS    NAME     CEREAL    PASTRY    BAGEL
 1     John       10         9        8
 2     Sam         2         8        4
 3     Sally       5         7        6

Resulting Data Set

Output 5.15 SURVEY2 Data Set

SURVEY2 contains reshaped survey data.

SURVEY2

OBS    NAME     BREAKFST    RESPONSE
 1     John     CEREAL         10
 2     John     PASTRY          9
 3     John     BAGEL           8
 4     Sam      CEREAL          2
 5     Sam      PASTRY          8
 6     Sam      BAGEL           4
 7     Sally    CEREAL          5
 8     Sally    PASTRY          7
 9     Sally    BAGEL           6

Program The objective is to read SURVEY and create a new data set in which the
variable names CEREAL, PASTRY, and BAGEL become values for the new
variable BREAKFST and in which three observations are created for each one
in SURVEY. The numeric values of the variables CEREAL, PASTRY, and
BAGEL are written to the new variable RESPONSE:

Create SURVEY2, dropping the variables whose values are being written to the
new variable RESPONSE and dropping the variable used by the iterative DO
loop. Read an observation from SURVEY.

   data survey2(drop=cereal pastry bagel i);
      set survey;

Define the array NUM. Define character variable BREAKFST and give it a length
of eight.

      array num(*) cereal pastry bagel;
      length breakfst $ 8;

Assign values to RESPONSE and BREAKFST. Write an observation to SURVEY2
each time the DO loop iterates. This DO loop iterates three times, once for
each element in the array. In each loop, the assignment statement uses the NUM
array to write the numeric value of either CEREAL, PASTRY, or BAGEL to the new
numeric variable RESPONSE. The CALL routine VNAME assigns the name of that
variable (CEREAL, PASTRY, or BAGEL) to BREAKFST.

      do i=1 to dim(num);
         response=num[i];
         call vname(num[i],breakfst);
         output;
      end;
   run;
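
To try the program, the input data set can be built from the values shown under
"Input Data Set". The step below is a minimal sketch, not part of the original
example. It uses list input and defines the variable names in uppercase on the
assumption that, in SAS releases that preserve the case of variable names, CALL
VNAME returns the name as it was first written.

```sas
/* Sketch: build SURVEY from the values listed under "Input Data Set". */
/* Uppercase names are used so that CALL VNAME yields CEREAL, PASTRY,  */
/* and BAGEL in releases that keep the case of variable names.         */
data survey;
   input NAME $ CEREAL PASTRY BAGEL;
   datalines;
John 10 9 8
Sam 2 8 4
Sally 5 7 6
;

/* After running the SURVEY2 step, list the reshaped data. */
proc print data=survey2;
run;
```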

Reshaping Observations into Multiple Variables

Goal Transpose* a data set to make the information more meaningful and usable for
further processing. The process of transposing is repeated until the final goal is
met.

Strategy Sometimes data that are stored as numerical values actually have greater
significance and potential usability than is apparent. To improve meaning and
usability, use a multi-step process to transpose a data set so that information
originally stored in a few numeric variables with numerous observations is
stored as multiple variables with fewer observations.

First, use the TRANSPOSE procedure with the VAR statement to reshape the
variable so that its values are stored in a series of new variables. Use one
variable as a BY variable to create one observation for each unique value of
that variable. Write the results to an output data set.

Second, transpose all of the variables in the output data set so that each
variable name is now a value of a new character variable. Store the values of
each variable in another series of variables and create a new output data set.

Input Data Set

ONE

OBS CATEGORY RATING
1 1 6
2 1 9
3 1 8
4 1 9
5 1 8
6 1 10
7 1 10
8 2 7
9 2 8
10 2 8
11 2 10
12 2 9
13 2 8
14 2 10
15 3 6
16 3 9
17 3 9
18 3 9
19 3 8
20 3 9
21 3 9

* To transpose is to reshape data by turning columns (variables) of information into rows
(observations).

Resulting Data Sets

Output 5.16a INTERIM, First Transposed Data Set

INTERIM

OBS    FORD    NISSAN    MAZDA    SAAB    SATURN    HONDA    TOYOTA
 1       6        9        8        9        8       10        10
 2       7        8        8       10        9        8        10
 3       6        9        9        9        8        9         9

Output 5.16b FINAL, Final Transposed Data Set

FINAL

OBS    MAKE      DEPEND    APPEAL    PERFORM
 1     FORD         6         7         6
 2     NISSAN       9         8         9
 3     MAZDA        8         8         9
 4     SAAB         9        10         9
 5     SATURN       8         9         8
 6     HONDA       10         8         9
 7     TOYOTA      10        10         9

Program The data set ONE contains survey data that have been collected on the
dependability, overall appeal, and performance of cars from seven
manufacturers. However, the results are stored simply as values in the numeric
variables CATEGORY and RATING. Not only are the data not useful in their
current form for further processing, but their meaning is actually buried.

Reshape the values in a two-step process to reflect their real meaning and to
make the data usable. In the first step, transpose RATING so that its values are
stored in a series of new variables, each reflecting a car manufacturer. Do not
transpose the values of CATEGORY. Simply collapse the values into one
observation corresponding to each BY group. The resulting data set INTERIM
has seven variables, one for each car manufacturer, and three observations.

In the second step, transpose the seven car variables so that each variable name
is now the value of a new variable, MAKE. Store the values of each car
variable in the three new variables DEPEND, APPEAL, and PERFORM,
which reflect the qualities surveyed. The resulting data set, FINAL, has four
variables, one for the make of car and three reflecting the three qualities. It has
seven observations.

Create INTERIM by reshaping ONE. Transpose the variable RATING. Use
CATEGORY as a BY variable to create an observation for each unique value of
CATEGORY. The INTERIM data set will have three observations; the BY statement
used with PROC TRANSPOSE generates a single observation for each BY group of
CATEGORY. The VAR statement specifies that only the variable RATING is to be
transposed. The variables COL1-COL7 are created automatically to contain the
values of RATING from every observation of the BY group. _NAME_ is created
automatically to identify the name of the variable being transposed. Because it
is unneeded, it is dropped.

   proc transpose data=one
        out=interim(drop=_name_ category
                    rename=(col1=Ford col2=Nissan col3=Mazda
                            col4=Saab col5=Saturn
                            col6=Honda col7=Toyota));
      by category;
      var rating;
   run;

Create FINAL by reshaping INTERIM, this time transposing all of the variables.
Without a VAR statement or another statement, the seven numeric variables,
FORD-TOYOTA, are all transposed. _NAME_ is created automatically. Its
values are the names of the transposed variables FORD-TOYOTA. The variables
COL1-COL3 are created automatically to contain values for the three qualities.
_NAME_ and COL1-COL3 are renamed appropriately.

   proc transpose data=interim
        out=final(rename=(_name_=make col1=depend
                          col2=appeal col3=perform));
   run;
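
To reproduce the two transposed data sets, ONE can be built from the 21
CATEGORY and RATING pairs listed under "Input Data Set". The step below is a
sketch rather than part of the original example; it uses the double trailing @
so that several observations can be read from each data line. Run it before the
two PROC TRANSPOSE steps.

```sas
/* Sketch: build ONE from the CATEGORY/RATING pairs in the input listing. */
/* The @@ holds the data line so each pair becomes one observation.       */
data one;
   input category rating @@;
   datalines;
1 6  1 9  1 8  1 9  1 8  1 10  1 10
2 7  2 8  2 8  2 10  2 9  2 8  2 10
3 6  3 9  3 9  3 9  3 8  3 9  3 9
;
```

Because the observations are already grouped by CATEGORY in ascending order, the
BY statement in the first PROC TRANSPOSE step needs no preliminary sort.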

CHAPTER 6

Utilities and Functions

This chapter is not strictly about combining observations from different SAS
data sets. It contains, however, examples of commonly asked questions about
dealing with data values, such as extracting character strings from a variable
value, converting a numeric variable to a character variable and vice versa,
performing a bubble sort, and determining someone's age from a SAS date
value, among others.

6.1  Converting Variable Types from Character to Numeric and Vice Versa
6.2  Determining the Type of a Variable's Content
6.3  Determining Whether a Variable is Character or Numeric
6.4  Creating a SAS Data Set Whose Variables Contain the Attributes
     of Variables from Another Data Set
6.5  Sorting Variable Values within an Observation (Bubble Sort)
6.6  Creating Equal-Sized Random Samples and Producing Equal-Sized
     Subsets or Exact-Sized Subsets
6.7  Counting the Occurrences of a String within the Values of a Variable
6.8  Extracting a Character String without Breaking the Text in the
     Middle of a Word
6.9  Creating SAS Datetime Values
6.10 Creating a SAS Time Value from a Character Value
6.11 Calculating a Person's Age

Converting Variable Types from Character to Numeric and Vice Versa

Goal Read the value of a character variable and write its value to a numeric variable,
and vice versa.

Strategy You cannot directly change the type of a variable. You must create a new
variable of the desired type. Use the PUT or INPUT function and a specified
format or informat, respectively, to convert a value. Use an assignment
statement to assign that value to the new variable. When converting character
to numeric, use the INPUT function and a numeric informat. To do the reverse,
use the PUT function and a numeric format.

Input Data Sets

Data set ONE contains a single character variable. Data set TWO contains a single
numeric variable.

ONE                      TWO

OBS    XCHAR             OBS     YNUM
 1     0123               1       123
 2     12345              2     12345
 3     123456             3       999
 4     123A45             4         .

Resulting Data Sets

Output 6.1a CHAR2NUM Data Set

CHAR2NUM

OBS    XCHAR       XNUM
 1     0123         123
 2     12345      12345
 3     123456    123456
 4     123A45         .

Output 6.1b NUM2CHAR Data Set

NUM2CHAR

OBS     YNUM    YCHAR1    YCHAR2
 1       123    000123       123
 2     12345    012345     12345
 3       999    000999       999
 4         .                   .

Output 6.1c CHAR2NM2 Data Set

CHAR2NM2

OBS     XCHAR
 1        123
 2      12345
 3     123456
 4          .

Program The objective is to read a character value and write it, if possible, as a numeric
value to a new variable, or vice versa. To write the character value of XCHAR
to the numeric variable XNUM, use the INPUT function to return the existing
character value as it is read with the numeric informat 8. To write the value of
numeric variable YNUM as a value for character variables YCHAR1 and
YCHAR2, use the PUT function and the numeric format Z6. to return the
existing numeric value with leading zeros, or use the PUT function with the
standard numeric format 6. to return the numeric value without leading zeros.
In all three cases, use an assignment statement to save the returned value to the
new variable XNUM, YCHAR1, or YCHAR2, respectively:

Create CHAR2NUM. Read an observation from ONE.

   data char2num;
      set one;

Read the value of XCHAR with a numeric informat and assign it as a numeric
value to XNUM, a numeric variable. The INPUT function reads the value of XCHAR
with the numeric informat 8. and returns a numeric value. The ?? format modifier
suppresses the invalid data messages and prevents the automatic variable
_ERROR_ from being set to 1 if XCHAR doesn't contain valid numeric data.

      xnum = input(xchar,?? 8.);
   run;

Create NUM2CHAR. Read an observation from TWO.

   data num2char;
      set two;

Assign values to YCHAR1 and YCHAR2 by formatting the current value of YNUM
with the formats Z6. and 6., respectively. The PUT function writes the current
value of the specified variable with the specified format. The format Z6. formats
the value of YNUM with leading zeros. The format 6. is the standard numeric
format.

      ychar1 = put(ynum,z6.);
      ychar2 = put(ynum,6.);

If the value for YNUM is missing, then explicitly assign YCHAR1 a value of
blank. Assigning a blank to YCHAR1 overrides the default value of a period (for
missing).

      if ynum=. then ychar1=' ';
   run;

Related Technique You can drop and rename variables so that the new variable with the numeric
value can retain the same name as the original variable it was created from:

   data char2nm2(drop=x);
      set one(rename=(xchar=x));
      xchar = input(x,?? 8.);
   run;

See Output 6.1c.
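
The behavior of the two functions can be checked in isolation with a DATA _NULL_
step that writes to the SAS log. This is a small sketch, not part of the original
example; the variable names C, N, Z, and S are hypothetical.

```sas
/* Sketch: exercise INPUT with the ?? modifier and PUT with Z6. and 6. */
data _null_;
   length c $ 8;
   do c='0123', '123A45';
      n=input(c, ?? 8.);        /* missing when C is not a valid number */
      if n=. then put c= 'cannot be read as a number';
      else put c= n=;
   end;
   z=put(123, z6.);             /* leading zeros: 000123          */
   s=put(123, 6.);              /* right-aligned in six positions */
   put z= s=;
run;
```

Testing the result of INPUT for a missing value, as done here, is the same check
that Example 6.2 uses to classify values.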

Example 6.2 Determining the Type of a Variable's Content

Goal Determine whether a character variable's value contains numeric data,
character data, or missing data.

Strategy To determine the contents of a character variable's value for each observation,
first test the value to see if it is missing (blank). If it is missing, classify it as
undefined. If it is not missing, then use the INPUT function to read the value
with a numeric informat. If that result is not missing, it is a valid numeric value.
If it is missing, it is classified as a character value.

Input Data Set

OLD contains character variable X.

OLD

OBS    X
 1     1234
 2     12E5
 3
 4     124ABC
 5     124
 6     ABCDEFGH

Resulting Data Set

Output 6.2 NEW Data Set

NEW

OBS    TYPE         X
 1     Numeric      1234
 2     Numeric      12E5
 3     Undefined
 4     Character    124ABC
 5     Numeric      124
 6     Character    ABCDEFGH

Program The objective is to determine the data type of the value of variable X in each
observation in data set OLD. Read each observation and test to see if the value
of X is missing (blank). If it is missing, set a new variable named TYPE to
"Undefined". For all other observations, assign a value to a temporary variable
by using the INPUT function and a numeric informat to return a numeric
value. If the value is not missing, it is a valid numeric value. If it is missing, it
is a character value.

Create NEW. Read an observation from OLD. The LENGTH statement assigns a
length of 9 to the new character variable TYPE. By default, TYPE would have been
created with a length of 8.

   data new(drop=tempvar);
      length type $ 9;
      set old;

If the value of X is blank, it is neither character nor numeric so set TYPE
appropriately. There is no need to do further checking, so return to the top of
the DATA step.

      if x=' ' then
         do;
            type='Undefined';
            return;
         end;

Create variable TEMPVAR and assign it a value by using the INPUT function to
return the value of X read with the numeric informat 8. If the value is not
missing, assign a value to TYPE that indicates that it is numeric. Otherwise,
indicate that it is character. The INPUT function converts the character values
to numeric values. The ?? format modifier suppresses the invalid data messages
and prevents the automatic variable _ERROR_ from being set to 1 when invalid
data are read.

      tempvar=input(x,?? 8.);
      if tempvar ne . then
         type='Numeric';
      else
         type='Character';
   run;
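
To run the program, OLD can be built from the six values shown under "Input
Data Set". The step below is a sketch, not part of the original example; it
assigns the values directly so that the blank value in the third observation is
unambiguous.

```sas
/* Sketch: build OLD with the six values of X from the input listing. */
data old;
   length x $ 8;
   do x='1234', '12E5', ' ', '124ABC', '124', 'ABCDEFGH';
      output;                  /* one observation per listed value */
   end;
run;
```

The value 12E5 is classified as Numeric because the numeric informat accepts
scientific notation, so INPUT returns a nonmissing number for it.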

Example 6.3 Determining Whether a Variable is Character or Numeric

Goal Determine if a variable is character or numeric to ensure that you have the
right type of data for your application.

Strategy Query the table* DICTIONARY.COLUMNS in PROC SQL to determine the
variable's type. Use the INTO clause to store the variable's type in a macro
variable. Use the macro variable in a subsequent DATA step to create a new
variable of the other type that contains the same data as the original variable.

Input Data Set

Data set ONE contains two variables: the numeric variable X_NUM and the
character variable Y_CHAR.

ONE

OBS    X_NUM    Y_CHAR
 1     12345    12345

Resulting Data Sets

Output 6.3a NUM2CHAR Data Set

NUM2CHAR

OBS    X_NUM    Y_CHAR    CHARVAL
 1     12345    12345     12345

Output 6.3b CHAR2NUM Data Set

CHAR2NUM

OBS    X_NUM    Y_CHAR    NUMVAL
 1     12345    12345      12345

Program The objective is to ensure that you are using a certain type of data in your
application.

The table DICTIONARY.COLUMNS contains information about all variables
in all SAS data sets in the current SAS session. For each variable in data set
ONE, query DICTIONARY.COLUMNS to determine its type. You must
subset the query to get the type for only one variable. (Typically, you subset
queries of dictionary tables with a WHERE clause because they are very large
tables.) The query returns the value num or the value char. Store the value in a
macro variable. Use the macro variable in a subsequent DATA step. The
DATA step creates a new data set and a new variable that contains the same
data as the original variable, but of a different type.

In many cases, SAS automatically changes data from one type to another, but
it is more efficient if you do it. In addition, if you control the conversion, you
can avoid possible unexpected results when numeric data defaults to using the
BEST12. format.

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

Query DICTIONARY.COLUMNS to learn the type of X_NUM. The column TYPE in
DICTIONARY.COLUMNS contains the variable type for all variables active in the
current SAS session. The INTO clause puts the value num in the macro variable
VARTYPE.

   proc sql;
      select type into :vartype
         from dictionary.columns

Subset the query. To get the type for only X_NUM, subset the query to return
only the row for that variable. The values in the dictionary table are in upper
case, so the WHERE clause must use upper case as well.

         where libname='WORK' and
               memname='ONE' and
               name='X_NUM';
   quit;

Create a new data set and a new variable that contains the same data as the
original variable, but of a different type. The PUT function returns the value of
X_NUM as character data. The character data is stored in the variable CHARVAL.
The value num for VARTYPE is lower case, so the IF statement must use lower
case as well.

   data num2char;
      set one;
      if "&vartype"="num" then charval=put(x_num,5.);
   run;

Query DICTIONARY.COLUMNS to learn the type of Y_CHAR. The column TYPE in
DICTIONARY.COLUMNS contains the variable type for all variables active in the
current SAS session. The INTO clause puts the value char in the macro variable
VARTYPE.

   proc sql;
      select type into :vartype
         from dictionary.columns

Subset the query. To get the type for only Y_CHAR, subset the query to return
only the row for that variable.

         where libname='WORK' and
               memname='ONE' and
               name='Y_CHAR';
   quit;

Create a new data set and a new variable that contains the same data as the
original variable but of a different type. The INPUT function returns the value
of Y_CHAR as numeric. The numeric value is stored in the variable NUMVAL.

   data char2num;
      set one;
      if "&vartype"="char" then numval=input(y_char,5.);
   run;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.
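
In SAS releases that provide the variable information functions, the type of a
variable can also be tested inside a DATA step without a dictionary query. The
sketch below is an alternative under that assumption, not the method of this
example; it uses the VTYPE function, which returns N for a numeric variable and
C for a character variable. The names XTYPE and YTYPE are hypothetical.

```sas
/* Alternative sketch: test variable types with the VTYPE function.   */
/* Assumes a SAS release that includes VTYPE; ONE is the input above. */
data _null_;
   set one(obs=1);
   length xtype ytype $ 1;
   xtype=vtype(x_num);     /* N for a numeric variable   */
   ytype=vtype(y_char);    /* C for a character variable */
   put xtype= ytype=;
run;
```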

Where to Go from Here

□ Dictionary tables.
   □ For an example that describes dictionary tables more thoroughly, see
     Example 6.4, "Creating a SAS Data Set Whose Variables Contain the
     Attributes of Variables from Another Data Set," in this book.
   □ For a complete description and examples of dictionary tables, see
     pp. 286-291 and pp. 294-295 in Chapter 37, "The SQL Procedure,"
     in SAS Technical Report P-222, Changes and Enhancements to Base
     SAS Software, Release 6.07.
   □ For examples that use dictionary tables, see Chapter 11, "Five Nifty
     Reports Using PROC SQL Views in the SASHELP Library" by
     Bernadette Johnson in Reporting from the Field: SAS Software
     Experts Present Real-World Report-Writing Applications.

Example 6.4 Creating a SAS Data Set Whose Variables Contain the Attributes
of Variables from Another SAS Data Set

Goal Create a table* that has information about the name, type, and length of
columns in another table.

Strategy PROC SQL provides dictionary tables, which contain information about the
SAS files in the current SAS session. Dictionary tables are accessed by the
predefined libref DICTIONARY. Dictionary tables are no different from other
PROC SQL tables, except that the information in them is gathered and
maintained by the SAS System.

The table DICTIONARY.COLUMNS contains information about all
of the columns in all the tables in the current SAS session. Use the
DESCRIBE TABLE statement on the DICTIONARY.COLUMNS table to
determine the column names that you need in your query.

CAUTION!
Because DICTIONARY.COLUMNS is usually a very large table, use a
WHERE clause to restrict the query to only one table.

Input Data Set

PRICES

OBS CROP MARKET HIGH LOW LAST MONTH
1 Wheat Farmville 2.96

I6i Iii 1 2.7 jul94
Resulting SAS Log

Output 6.4a Columns in DICTIONARY.COLUMNS

This example uses the LIBNAME, MEMNAME, NAME, TYPE, and LENGTH columns.

NOTE: SQL table DICTIONARY.COLUMNS was created like:

create table DICTIONARY.COLUMNS
  (
   LIBNAME char(8) label='Library Name',
   MEMNAME char(8) label='Member Name',
   MEMTYPE char(8) label='Member Type',
   NAME char(8) label='Column Name',
   TYPE char(4) label='Column Type',
   LENGTH num label='Column Length',
   NPOS num label='Column Position',
   VARNUM num label='Column Number in Table',
   LABEL char(40) label='Column Label',
   FORMAT char(16) label='Column Format',
   INFORMAT char(16) label='Column Informat',
   IDXUSAGE char(9) label='Column Index Type'
  );

* A PROC SQL table is a SAS data set. In SQL terminology, columns are variables and rows
are observations.

ATTR

       Column    Column    Column
OBS    Type      Name      Length
 1     char      MONTH        5
 2     char      CROP         5
 3     char      MARKET       9
 4     num       LAST         8
 5     num       LOW          8
 6     num       HIGH         8

Program Look at your SAS log to determine that you need to use the TYPE, NAME,
and LENGTH columns from DICTIONARY.COLUMNS to get the type,
name, and length of variables in WORK.PRICES. In addition, use the
LIBNAME and MEMNAME columns to subset DICTIONARY.COLUMNS
to produce information about the columns in WORK.PRICES only. Lastly,
order the table so that the character columns and the numeric columns are
listed together:

Invoke PROC SQL and determine the column names in DICTIONARY.COLUMNS. The
DESCRIBE TABLE statement writes a description of the table to the SAS log.

   proc sql;
      describe table dictionary.columns;

Create a table. The CREATE TABLE statement creates the table ATTR to store
the results of the subsequent query.

      create table attr as

Select the appropriate columns.

         select type, name, length

Name the dictionary table to query. You do not have to assign the libref
DICTIONARY.

            from dictionary.columns

Subset the query. The values of LIBNAME and MEMNAME must be uppercase.

            where libname='WORK' and memname='PRICES'

Order the data values by variable type.

            order by type;
   quit;

Note: In PROC SQL, SELECT statements automatically produce a report.
SELECT clauses, which follow CREATE TABLE or CREATE VIEW
statements, do not automatically produce a report.

Where to Go from Here

□ Dictionary tables. For a complete description and examples of dictionary
  tables, see pp. 286-291 and pp. 294-295 in Chapter 37, "The SQL
  Procedure," in SAS Technical Report P-222, Changes and Enhancements
  to Base SAS Software, Release 6.07.
  For examples that use dictionary tables, see Chapter 11, "Five Nifty
  Reports Using PROC SQL Views in the SASHELP Library" by
  Bernadette Johnson in Reporting from the Field: SAS Software Experts
  Present Real-World Report-Writing Applications.

Example 6.5 Sorting Variable Values within an Observation (Bubble Sort)

Goal Sort the values of variables within an observation.

Strategy To sort the values of variables within an observation, use a technique called
bubble sort. Create an array that contains the variables that you want to sort.
Then use nested DO UNTIL and iterative DO loops to compare the value of
each variable in an observation with the next variable value until all have been
compared and placed in ascending order.

For an enhanced version of this program that will increase efficiency for
processing larger data sets, see "Related Technique."

Input Data Set

ONE

OBS    CODE1    CODE2    CODE3    CODE4    CODE5    CODE6
 1       3        1        5        4        6        2
 2       9        8        6        5        7        4
 3       3        2        1        9        0        7
 4       8        2        6        4        0        1
 5       5        7        4        3        8        2

Resulting Data Sets

Output 6.5a VARSORT Data Set

VARSORT

OBS    CODE1    CODE2    CODE3    CODE4    CODE5    CODE6
 1       1        2        3        4        5        6
 2       4        5        6        7        8        9
 3       0        1        2        3        7        9
 4       0        1        2        4        6        8
 5       2        3        4        5        7        8

Output 6.5b VARSORT2 Data Set

VARSORT2 was produced by a technique shown in "Related Technique" that requires
more coding but that is more efficient for larger data sets.

VARSORT2

OBS    CODE1    CODE2    CODE3    CODE4    CODE5    CODE6
 1       1        2        3        4        5        6
 2       4        5        6        7        8        9
 3       0        1        2        3        7        9
 4       0        1        2        4        6        8
 5       2        3        4        5        7        8

Program The objective is to reorder the values of CODE1 through CODE6 for each
observation so that they are in ascending order. First, create the CODE array
to contain the values for these six variables in each observation. Use a
DO UNTIL loop that iterates until the data are completely sorted. Within that
loop, nest an iterative DO loop that iterates five times, once for every
comparison that needs to be made (CODE1 to CODE2, and so on). This DO
loop makes the comparisons by processing the CODE array. Values are
reordered if a value is larger than its immediate successor.

Create VARSORT. Define array CODE. Read an observation from ONE.

   data varsort(keep=code1-code6);
      array code(*) code1-code6;
      set one;

Begin a DO UNTIL loop that iterates until all of the variable values within an
observation have been sorted. Set SORTED to 1. SORTED will be set to 0
each time the DO group executes to reorder values. When that code does not
execute, the array is already sorted. In that case, SORTED will remain 1 and
prevent the DO UNTIL loop from executing again.

      do until (sorted);
         sorted=1;

Begin an iterative DO loop that iterates five times, once for each comparison
that needs to be made in each observation. The DIM function returns the number
of elements in CODE (6). Using DIM prevents you from having to change the upper
bound of an iterative DO group if you later change the number of array elements.

         do i = 1 to dim(code)-1;

Compare each value of an element in the CODE array (values of variables CODE1
through CODE6) with the value of the next variable. If the first element is
larger, reorder the values and set SORTED to 0. The variable TEMP holds an array
element while you assign the larger value to the second element and the smaller
value to the first element. SORTED is set to 0 so that the DO UNTIL loop
continues iterating. This DO group only executes when a value is greater than
its immediate successor. After all values are in order, this block does not
execute. SORTED is, therefore, not reset to 0, causing the DO UNTIL loop to
stop.

            if code(i) > code(i+1) then
               do;
                  temp=code(i+1);
                  code(i+1)=code(i);
                  code(i)=temp;
                  sorted=0;
               end;
         end;
      end;
   run;

A Closer Look: Processing the Array the Correct Number of Times

The key to processing the array the necessary number of times is in where the
variable SORTED is set to 1. The value of SORTED controls the DO UNTIL
loop that processes each observation. When SORTED is true (equal to 1), the
DO UNTIL loop stops processing. SORTED is set to 1 before the code that
reorders values. SORTED is set to 0 within the DO loop that reorders values.
On the last time through, the reordering code never executes because the tested
value is never greater than the following value. Because SORTED is not reset
to 0 in that case, it equals 1 when the DO UNTIL statement executes and that
ends the DO UNTIL loop. The entire DATA step iterates and another
observation is read for processing.

Related Technique If you are sorting a small data set, the technique described in "A Closer Look"
is simple and useful. But if you are sorting a larger data set, the gain in
efficiency can make it worth the effort to limit the comparisons performed to
only those that are necessary. Set the upper bound of the iterative DO loop that
compares values and switches them when necessary so that only pairs before
the last pair switched are rechecked.

First, create two additional variables, HBND and MOVEHIGH in this
example, that you can use to prevent the iterative DO loop from rechecking
pairs unnecessarily. Use HBND to control how many times the DO loop that
compares pairs of values iterates. Initially set HBND to the highest number
necessary, the next-to-last element in the array. In the DO group that switches
values when necessary, set the value of MOVEHIGH to I, the number of the
iteration and, therefore, of the element in the array being processed. Use that
value to reset the value of HBND. The next time the DO loop iterates, it will
not check more pairs than are necessary:

   data varsort2(keep=code1-code6);
      array code(*) code1-code6;
      set one;
      hbnd=dim(code)-1;
      do until (sorted);
         sorted=1;
         do i = 1 to hbnd;
            if code(i) > code(i+1) then
               do;
                  temp=code(i+1);
                  code(i+1)=code(i);
                  code(i)=temp;
                  movehigh=i;
                  sorted=0;
               end;
         end;
         hbnd=movehigh-1;
      end;
   run;
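
In SAS releases that include the CALL SORTN routine, the same within-observation
sort can be written without explicit loops. The sketch below is an alternative
under that assumption, not the bubble sort technique of this example; the
output data set name VARSORT3 is hypothetical.

```sas
/* Alternative sketch: sort CODE1-CODE6 within each observation with */
/* CALL SORTN. Assumes a SAS release that provides the routine.      */
data varsort3;
   set one;
   call sortn(of code1-code6);   /* values are placed in ascending order */
run;
```

The bubble sort remains useful as a pattern when the routine is unavailable or
when the comparison rule is more involved than a plain ascending order.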

Example 6.6 Creating Equal-Sized Random Samples and Producing Equal-Sized
Subsets or Exact-Sized Subsets

Goal Create equal-sized subsets from randomly chosen observations from a data set.
You can also create exact-sized subsets.

Note: You can create equal-sized subsets only if the number of observations
is divisible by the number of subsets you want to create.

Strategy Create a new version of the data set by adding a new variable whose values are
randomly generated with the RANUNI function. Sort the new data set based on
the values of that variable. Then read the sorted data set and calculate the value
of a new variable for each observation, based on the remainder of the current
value of _N_ divided by the number of subsets you want to create. Use
conditional processing to write each observation to one of three data sets.

To create an exact-sized subset, simply use the OBS= data set option so that
only a certain number of observations are read and then written to an output
data set. See "Related Techniques."

Note: This technique is not efficient for large data sets. For more efficient
sampling examples for larger data sets, see Chapter 10, "Processing Large
Data Sets with SAS Software," in the SAS Applications Guide,
1987 Edition.

Input Data Sets

RANDOM was created by reading MASTER and using the RANUNI function
to generate values for X. It is sorted by X.

         MASTER                        RANDOM

   OBS   NAME                    OBS   NAME
    1    NCSU                     1    Wake Forest
    2    Clemson                  2    Maryland
    3    Georgia Tech             3    Duke
    4    Duke                     4    NCSU
    5    Maryland                 5    UNC
    6    Virginia                 6    Virginia
    7    Wake Forest              7    Florida State
    8    Florida State            8    Clemson
    9    UNC                      9    Georgia Tech

Resulting Data Sets

Output 6.6a  ONE

         ONE
   OBS   NAME
    1    Duke
    2    Virginia
    3    Georgia Tech

Output 6.6b  TWO

         TWO
   OBS   NAME
    1    Wake Forest
    2    NCSU
    3    Florida State

Output 6.6c  THREE

         THREE
   OBS   NAME
    1    Maryland
    2    UNC
    3    Clemson

Output 6.6d  SIMPLE

         SIMPLE
   OBS   NAME
    1    Wake Forest
    2    Maryland
    3    Duke
    4    NCSU
    5    UNC

Program  The objective is to create three equal-sized subsets from
data set MASTER.

In the DATA step, read each observation in MASTER and use the RANUNI
function to generate random values of variable X between 0 and 1. Sort the
data set by X. Use another DATA step to read the new data set RANDOM.
Create the variable CLASS and calculate its value by using the MOD function
to divide _N_ by 3 and return the remainder. Then use conditional processing
to output each observation to a subset, based on the value of CLASS:

Create RANDOM. Read an observation from MASTER. Generate random
numbers for variable X. The RANUNI function randomly generates numbers and
returns a value based on the seed.

   data random;
      set master;
      x=ranuni(12345);
   run;

Sort RANDOM by X.

   proc sort data=random;
      by x;
   run;

Create data sets ONE, TWO, and THREE. Read an observation from RANDOM.
Drop variables you do not need in the output data sets. Create the variable
CLASS. The MOD function returns the remainder of _N_, the number of the
current iteration, divided by 3, the number of subsets being created.

   data one two three;
      set random;
      drop x class;
      class=mod(_n_,3);

Write an observation to ONE, TWO, or THREE, based on the value of CLASS.

      select (class);
         when (0) output one;
         when (1) output two;
         otherwise output three;
      end;
   run;

Related Technique  To create a randomly selected subset of an exact size, use the OBS= data set
option to read only a specific number of observations from RANDOM, in this
case 5:

   data simple(keep=name);
      set random(obs=5);
   run;

See the resulting data set in Output 6.6d.
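
In later SAS releases that include SAS/STAT software, PROC SURVEYSELECT can draw an exact-sized simple random sample in a single step. The following is a minimal sketch under that assumption. It is an alternative to the technique shown here, not part of it, and the output data set name SIMPLE2 and the seed value are chosen only for illustration.

```sas
/* Alternative sketch (assumes SAS/STAT is available): draw exactly 5  */
/* observations from MASTER by simple random sampling.                 */
proc surveyselect data=master
                  out=simple2(keep=name)   /* SIMPLE2 is a hypothetical name     */
                  method=srs               /* simple random sampling, no replacement */
                  n=5                      /* exact sample size                  */
                  seed=12345;              /* fixed seed so the draw is repeatable */
run;
```

Because the procedure selects the observations itself, no random variable, sort, or OBS= option is needed. The names selected will generally differ from those in Output 6.6d.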

Example 6.7  Counting the Occurrences of a String within
the Values of a Variable

Goal  Count the number of occurrences of a character string or of a single character
in the value of a variable.

Strategy  First, use the TRANWRD function to substitute a single character for the
search string in the variable value you're searching. Use COMPRESS to
remove that character from the string and use the LENGTH function to check
the length of the string both before and after the character has been removed.
The difference in lengths indicates how many occurrences of the character
string were in the original variable value.

Note: You must choose a substitute character that does not occur in your
original character values.

Input Data Set

Each value of MEMNAME contains one or more occurrences of the string my. The
second observation contains the string My to demonstrate that the search is case
sensitive.

         ONE

   OBS   MEMNAME
    1    _my_lib_my
    2    _my_lib_My
    3    mylibmylibmylib
    4    my_libmylib_my

Resulting Data Set

Output 6.7  TWO Data Set

The value of COUNT indicates the number of times my occurs in the value of
MEMNAME.

                        TWO

   OBS   MEMNAME            NEWMNAME        COUNT
    1    _my_lib_my         _&_lib_&          2
    2    _my_lib_My         _&_lib_My         1
    3    mylibmylibmylib    &lib&lib&lib      3
    4    my_libmylib_my     &_lib&lib_&       3

Program  The objective is to count all occurrences of the string my in the values of
MEMNAME in data set ONE. First, use the TRANWRD function to change
all occurrences of my in each value of MEMNAME to an ampersand. Assign
this value to a new variable, NEWMNAME. Use the COMPRESS function to
remove all instances of & from the value of NEWMNAME. Use the LENGTH
function to determine the length of NEWMNAME both with and without the
compression. Assign this difference to COUNT, which indicates how many
times my occurred in each value of MEMNAME:

Create TWO. Read an observation from ONE.

   data two;
      set one;

Determine how many times my occurs in MEMNAME. Change each instance of my
in MEMNAME to &. Assign that value to NEWMNAME. Remove all ampersands
(&) from NEWMNAME. Subtract the compressed length from the original
length of NEWMNAME. Assign that value to COUNT. The TRANWRD function
searches the current value of MEMNAME for my and changes each occurrence
into an ampersand (&). The LENGTH function returns the length of a variable.
The COMPRESS function removes all ampersands from NEWMNAME.

      newmname=tranwrd(memname,'my','&');
      count=length(newmname)-length(compress(newmname,'&'));
   run;
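
In SAS 9 and later releases, the COUNT function returns the number of times a substring occurs in a string, which gives the same result without a substitute character. The following sketch assumes such a release. It is an alternative to the Version 6 technique above, and the data set name TWO_ALT and the variable name COUNT2 are hypothetical names introduced for this illustration.

```sas
/* Alternative sketch (assumes SAS 9 or later, where COUNT is available). */
data two_alt;
   set one;
   /* COUNT is case sensitive by default, so 'My' is not counted as 'my' */
   count2=count(memname,'my');
run;
```

For the four values of MEMNAME in data set ONE, COUNT2 would be 2, 1, 3, and 3, matching COUNT in Output 6.7. This form also avoids having to choose a substitute character that does not occur in the data.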

Example 6.8  Extracting a Character String without
Breaking the Text in the Middle of a Word

Goal  Extract from a variable a character string that is no longer than a specified
length and that does not end in the middle of a word.

Strategy  Use the SUBSTR function and assignment statements to create two new
variables from a character variable. One contains the character that would be
last in the extracted string; the other contains the first character following the
extracted string. If the last character or the first character following the string is
a punctuation mark or a blank, assign the extracted string to a new variable. If
not, use a DO loop to search backward in the string until a blank or
punctuation mark is found, and then write to a variable the new, shorter string
that does not end in the middle of a word.

Input Data Set

Variable COMMENTS contains a character value. The twenty-first character in each
value contains either a blank, a punctuation mark, or a letter within a word.

                        SURVEY

   OBS   COMMENTS
    1    The food was served in a timely manner.
    2    The service was good! Food was great!!
    3    The waiter was very helpful and courteous.
    4    My chicken is great, but service is slow!!!
    5    I love the restaurant!!! Service is great!!

Resulting Data Set

Output 6.8  NEW Data Set

                                   NEW

   OBS   COMMENTS                                   NEWCOMNT
    1    The food was served in a timely manner.    The food was served
    2    The service was good! Food was great!!     The service was good
    3    The waiter was very helpful and courteou   The waiter was very
    4    My chicken is great, but service is slow   My chicken is great,
    5    I love the restaurant!!! Service is grea   I love the

Program  The objective is to create variable NEWCOMNT by extracting a character
string from variable COMMENTS that is no longer than 20 characters and that
does not end in the middle of a word. Use the SUBSTR function and an
assignment statement to create two variables: NEXTCHAR consists of the
twenty-first character, and CUTPT consists of the twentieth character. If
NEXTCHAR or CUTPT is a blank or a punctuation mark, the string doesn't
break in the middle of a word, so assign NEWCOMNT a value with the full 20
characters and write the observation. Otherwise, use an iterative DO loop to
process NEWCOMNT again, on each iteration reading one less character and
testing to see if it is a blank or a punctuation mark. When it is, assign
NEWCOMNT a value whose length is determined by the iterative process, and
write the observation with the new shorter value for NEWCOMNT:

Create NEW. Read an observation from SURVEY. Assign length of 20 to character
variable NEWCOMNT.

   data new(keep=comments newcomnt);
      set survey;
      length newcomnt $ 20;

Assign values to NEXTCHAR and CUTPT. The SUBSTR function reads a
number of specified characters beginning at a specified location in variable
COMMENTS.

      nextchar=substr(comments,21,1);
      cutpt=substr(comments,20,1);

Test variables CUTPT and NEXTCHAR. If either of them contains a blank or a
punctuation mark, then write a value to NEWCOMNT that contains the full 20
characters. Take no additional action and allow the automatic output at the
bottom of the DATA step to write an observation with the current value of
NEWCOMNT.

      if cutpt in (' ',',',';','.','?','!') or
         nextchar in (' ',',',';','.','?','!') then
         do;
            newcomnt=substr(comments,1,20);
         end;

If the previous condition is not true, then NEWCOMNT ends in the middle of a
word. Read backwards through the string until reaching a blank or a punctuation
mark. Assign the new shorter value to NEWCOMNT. An observation with this
new value for NEWCOMNT is written to data set NEW automatically.

      else do;
         do i=19 to 1 by -1 until (cutpt in (' ',',',';','.','?','!'));
            cutpt=substr(comments,i,1);
         end;
         newcomnt=substr(comments,1,i);
      end;
   run;

Example 6.9  Creating SAS Datetime Values

Goal  Create a new variable that contains a SAS datetime value from a variable that
contains a SAS date value and a variable that contains a SAS time value.

Strategy  Use the DHMS function to combine a SAS date value and a SAS time value
into a single SAS datetime value.

Input Data Set

In data set ONE, DATEVAL contains a SAS date value and TIMEVAL contains a
SAS time value.

              ONE

   OBS   DATEVAL      TIMEVAL
    1    19JUL1994    16:00:00
    2    25DEC1994    14:22:05
    3    01JAN1995    23:01:00
    4    09JAN1995     9:35:01

Resulting Data Set

Output 6.9  RESULTS Data Set

         RESULTS

   OBS   DATIMVAL
    1    19JUL94:16:00:00
    2    25DEC94:14:22:05
    3    01JAN95:23:01:00
    4    09JAN95:09:35:01

Program  The objective is to use a SAS date value and a SAS time value to create a new
variable that contains a datetime value. The DHMS function accepts four
numeric values that provide values for date, hour, minute, and seconds,
respectively. It returns a single value in the form of a SAS datetime value. In
this example, the DATEVAL variable supplies the date value and the
TIMEVAL variable supplies the time, which is stored as an integer
representing the number of seconds since midnight:

Create RESULTS. Read an observation from ONE.

   data results(keep=datimval);
      set one;

Create a new variable, DATIMVAL, to contain a SAS datetime value. The DHMS
function returns a SAS datetime value from numeric values that represent the
date, hour, minute, and second. Because the values of TIMEVAL are SAS time
values and these values contain the number of seconds since the previous
midnight, it is not necessary for the hour and minute arguments to have a
value. Zeros are used in their place and the time variable TIMEVAL is used for
the seconds argument. The FORMAT statement permanently associates the
DATETIME. format with DATIMVAL so that it will always display in that format.

      datimval=dhms(dateval,0,0,timeval);
      format datimval datetime.;
   run;
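
Zeros work for the hour and minute arguments because a SAS datetime value counts seconds from midnight, January 1, 1960, while a SAS date value counts days from the same date. The following sketch checks that arithmetic for the first observation in ONE. The variable names VIADHMS, BYHAND, and SAME are illustrative only.

```sas
/* Sketch: DHMS(date,0,0,time) equals date*86400 + time. */
data _null_;
   dateval='19JUL1994'd;
   timeval='16:00:00't;                   /* 57600 seconds since midnight */
   viadhms=dhms(dateval,0,0,timeval);
   byhand=dateval*86400 + timeval;        /* 86400 seconds in a day       */
   same=(viadhms=byhand);                 /* 1 when the two values agree  */
   put viadhms= datetime. byhand= datetime. same=;
run;
```

Both values should print as 19JUL94:16:00:00, the first row of Output 6.9, and SAME should be 1.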

Where to Go from Here
□ SAS datetime values. For a complete discussion of how SAS handles
  time, date, and datetime values, see "Using SAS Date and Time Values"
  on p. 85 and "Understanding SAS Date and Time Values" on pp.
  129-131 in SAS Language: Reference, Version 6, First Edition.

Example 6.10  Creating a SAS Time Value from a Character Value

Goal  Read a character value to create a SAS time value.

Strategy  Converting a character value to a SAS time value is a multistep process if the
value is not in a form that can be read with an existing SAS time informat or
function, such as the TIMEw. informat or the HMS function. If the data values
are not in such a form, you must first create a picture format that is in the form
expected by the SAS informat TIME11. (or HMS function). Then use a series
of assignment statements that contain INPUT and PUT functions to transform
a character value into a numeric value and then into a SAS time value.

Input Data Set

TIME1 contains the character variable TIMECHAR.

         TIME1

   OBS   TIMECHAR
    1    33.49
    2    1:13.69
    3    13:00:00.33
    4    1:13:43.45

Resulting Data Set

Output 6.10  TIME2 Data Set

              TIME2

   OBS   SASTIME        TIMECHAR
    1     0:00:33.49    33.49
    2     0:01:13.69    1:13.69
    3    13:00:00.33    13:00:00.33
    4     1:13:43.45    1:13:43.45

Program  The objective is to create a SAS time value from the character value
TIMECHAR in data set TIME1. First, use the PICTURE statement in
PROC FORMAT to create a user-defined format that you will use later to put a
value into the proper form for a SAS time value (hours:minutes:seconds) so
that it can be read with the TIME11. informat.

Use the COMPRESS function to remove the colons from the values of
TIMECHAR. Then read this value with the INPUT function and a numeric
informat to return a numeric value that can be written with a picture format.
Use the PUT function and a picture format to write the numeric value to
variable TEMP2, so that it can be read as a SAS time value with the TIME11.
informat and assigned to a variable named SASTIME. As an example, a value
that contains only seconds and fractions of a second (33.34) will be expanded to
contain leading zeros for hours and minutes (00:00:33.34).

The value that is assigned to SASTIME is not only numeric, it is also a valid
SAS time value. For example, the SASTIME value that prints as 13:00:00.33
is stored as 46800.33.

Create the TME. format. Use the PICTURE statement to create a format that
can be used as a template for writing numbers. In this case, the format will be
used to write a value so that it can be read as a SAS time value.

   proc format;
      picture tme other='99:99:99.99';
   run;

Create TIME2. Permanently associate the TIME11.2 format with the new variable
SASTIME. Read an observation from TIME1.

   data time2(drop=temp1 temp2);
      format sastime time11.2;
      set time1;

Create a SAS time value and assign it to the new variable SASTIME. The
COMPRESS function returns the value of TIMECHAR without the colons, and the
assignment statement assigns the new value to TEMP1. In the second assignment
statement, TEMP2 is assigned a character value as the result of two functions. The
INPUT function returns a numeric value by reading the character value TEMP1
with the numeric informat 11.2. The PUT function then returns a character value
written with the user-defined format TME. The last assignment statement assigns
a valid SAS time value to SASTIME by using the INPUT function to read the
value of TEMP2 with the TIME11. informat.

      temp1=compress(timechar,':');
      temp2=put(input(temp1,11.2),tme.);
      sastime=input(temp2,time11.);
   run;
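
A SAS time value is the number of seconds since midnight, which is why the value that prints as 13:00:00.33 is stored as 46800.33. As a small check of that arithmetic, the following sketch builds the same value with the HMS function. The variable name T is illustrative only.

```sas
/* Sketch: 13 hours, 0 minutes, 0.33 seconds expressed as seconds since midnight. */
data _null_;
   t=hms(13,0,0.33);          /* 13*3600 + 0*60 + 0.33             */
   put t= 8.2;                /* the stored numeric value          */
   put t= time11.2;           /* the same value written as a time  */
run;
```

The first PUT statement should show 46800.33 and the second 13:00:00.33.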

Where to Go from Here
□ PROC FORMAT and the PICTURE statement.
   o For complete reference information and examples, see Chapter 18,
     "The FORMAT Procedure," in SAS Procedures Guide, Version 6,
     Third Edition.
   o For another example, see pp. 356-358 in SAS Language and
     Procedures: Usage 2, Version 6, First Edition.
   o For an example of eliminating the leading zeros on numbers between
     zero and one, see the Input/Output article by Philip Shelton and Jason
     Sharpe, Observations, Fourth Quarter 1993, 57-60.
□ INPUT and PUT functions. For complete reference information on
  these functions, see Chapter 11, "SAS Functions," in SAS Language:
  Reference, Version 6, First Edition. For numerous examples and
  explanations, see SAS Language and Procedures: Usage 2, Version 6,
  First Edition.
□ SAS date and time values. For a complete discussion of how SAS
  handles time, date, and datetime values, see "Using SAS Date and Time
  Values" on p. 85 and "Understanding SAS Date and Time Values" on
  pp. 129-131 in SAS Language: Reference, Version 6, First Edition.

Example 6.11  Calculating a Person's Age

Goal  Determine a person's current age using their date of birth.

Strategy  Determine the current age of each person in a data set by subtracting the SAS
date value of the date of birth from the current date. Use the TODAY function
to obtain the SAS date value of the current date. Use the INTCK function to
count the number of months between the date of birth and the current date.
Divide the number of months by 12 to produce the number of years. Use the
MONTH function to determine if the month of the birthday and the current
date are the same. If they are, determine if the birthday has occurred this year.
If it hasn't, adjust the age by subtracting one year.

Input Data Set

              BIRTH

   OBS   NAME       BDAY
    1    Miguel     December 31, 1973
    2    Joe        February 28, 1976
    3    Rutger     March 29, 1976
    4    Broguen    March 1, 1976
    5    Susan      December 12, 1976
    6    Michael    February 14, 1971
    7    LeCe       November 9, 1967
    8    Hans       July 2, 1955
    9    Lou        July 30, 1960

Resulting Data Set

Output 6.11  AGES Data Set

                              AGES

   OBS   NAME       BDAY                 CURRENT          AGE
    1    Miguel     December 31, 1973    July 10, 1995     21
    2    Joe        February 28, 1976    July 10, 1995     19
    3    Rutger     March 29, 1976       July 10, 1995     19
    4    Broguen    March 1, 1976        July 10, 1995     19
    5    Susan      December 12, 1976    July 10, 1995     18
    6    Michael    February 14, 1971    July 10, 1995     24
    7    LeCe       November 9, 1967     July 10, 1995     27
    8    Hans       July 2, 1955         July 10, 1995     40
    9    Lou        July 30, 1960        July 10, 1995     34

Program  The objective is to determine the current age of each person in data set BIRTH.
Use the TODAY function in an assignment statement to assign the value of the
current date to CURRENT. Use the INTCK function to return the number of
months between the person's date of birth and the current date. If the birth
month and the current month are the same, adjust for the fact that the birthday
may not yet have occurred. Use the MONTH function to return the month from
each date and the DAY function to return the day from each date:

Create AGES. Read an observation from BIRTH.

   data ages;
      set birth;

Set CURRENT to a SAS date value representing today's date. Assign a SAS
date format to CURRENT. It is always advisable to associate a format with a date
value.

      current=today();
      format current worddate20.;

Assign a value to AGE. The INTCK function calculates and returns the number
of month intervals between a person's date of birth and the current date. After that
number is divided by 12 to produce the age in years, the INT function returns the
integer portion.

      age=int(intck('month',bday,current)/12);

When the current month is the same as the birth month, adjust the value of age,
based on whether the birthday has occurred. The MONTH function returns a value
between 1 and 12 that represents the month of the date values of BDAY and
CURRENT. When adjusting the value of AGE, the assignment statement uses
Boolean logic to return a value of 0 or 1. If the day of the month for BDAY is
greater than that for CURRENT, then the birthday has not yet occurred, so a
value of 1 is subtracted from AGE. If the birthday has occurred, a value of 0 is
subtracted from AGE.

      if month(bday)=month(current) then
         age=age-(day(bday)>day(current));
   run;
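
Because TODAY() changes from run to run, the ages in Output 6.11 can be reproduced only by fixing the current date. The following sketch applies the same INTCK calculation and Boolean adjustment with CURRENT set to July 10, 1995, the date shown in Output 6.11, for two of the birth dates in BIRTH. The data set name AGECHECK and the use of DATALINES are assumptions made for this illustration.

```sas
/* Sketch: same age logic as the program, with a fixed current date. */
data agecheck;
   input name $ bday :date9.;
   current='10JUL1995'd;               /* fixed date instead of today() */
   age=int(intck('month',bday,current)/12);
   /* same month: subtract 1 when the day of birth is still ahead */
   if month(bday)=month(current) then
      age=age-(day(bday)>day(current));
   format bday current worddate20.;
   datalines;
Hans 02JUL1955
Lou 30JUL1960
;
run;

proc print data=agecheck;
run;
```

AGE should be 40 for Hans, whose birthday has already passed in July, and 34 for Lou, whose birthday has not, as in Output 6.11.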

Appendix 1

Error Checking When Using MODIFY or SET with KEY=

   Why Error Checking?
   New Error-Checking Tools
   Example 1: Routing Execution When an Unexpected Condition Occurs
   Example 2: Using Error Checking on All Statements That Use KEY=

Why Error Checking?

When reading observations with the SET statement and KEY= option or with
the MODIFY statement, error checking is imperative for several reasons. The
most important reason is that because these tools use nonsequential access
methods, there is no guarantee that an observation will be located that satisfies
the request. Error checking enables you to direct execution to specific paths,
depending on the outcome of the I/O operation. Your program will continue
execution for expected conditions and terminate execution when unexpected
results occur.

New Error-Checking Tools

Two tools have been created to make error checking easier when you use the
MODIFY statement or the SET statement with the KEY= option to process
SAS data sets:

□ _IORC_ automatic variable
□ SYSRC autocall macro.

_IORC_ is created automatically when you use the MODIFY statement or the
SET statement with KEY=. The value of _IORC_ is a numeric return code that
indicates the status of the I/O operation from the most recently executed
MODIFY or SET statement with KEY=. Checking the value of this variable
for abnormal I/O conditions enables you to detect them and direct execution
down specific code paths instead of having the application terminate
abnormally. For example, if the KEY= variable value does match between two
observations, you might want to combine them and output an observation. If
they don't match, however, you may want only to write a note to the log.

Because the values of the _IORC_ automatic variable are internal and subject
to change, the SYSRC macro was created to enable you to test for specific I/O
conditions while protecting your code from future changes in _IORC_ values.
Using SYSRC, you can check the value of _IORC_ by specifying one of the
mnemonics listed in Table A.1.

Table A.1  List of Most Common Mnemonic Values of _IORC_ for DATA
Step Processing

   Mnemonic Value   Meaning of Return Code

   _DSENMR          The TRANSACTION data set observation does
                    not exist in the MASTER data set. (This return
                    code occurs when MODIFY with BY is used and
                    no match occurs.)

   _DSEMTR          Multiple TRANSACTION data set observations
                    with the same BY variable value do not exist in
                    the MASTER data set. (This return code occurs
                    when MODIFY with BY is used and consecutive
                    observations with the same BY values do not find
                    a match in the first data set. In this situation, the
                    first observation to fail to find a match returns
                    _DSENMR. Following ones return _DSEMTR.)

   _DSENOM          No matching observation was found in MASTER
                    data set. (This return code occurs when SET or
                    MODIFY with KEY= finds no match.)

   _SOK             The I/O operation was successful. (This return
                    code occurs when a match is found.)
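
As a minimal illustration of how these mnemonics are used, the following sketch looks up each observation of a transaction data set in an indexed data set and writes a note to the log when no match is found. The data set names LOOKUP, WANTED, and FOUND, the variables ID and VALUE, and the data values are all hypothetical. Only the _IORC_ variable, the SYSRC macro, and the mnemonics come from the text.

```sas
data lookup(index=(id));         /* hypothetical indexed data set */
   input id value;
   datalines;
1 100
2 200
;
run;

data wanted;                     /* hypothetical transaction data set */
   input id;
   datalines;
2
9
;
run;

data found;
   set wanted;
   set lookup key=id;            /* direct read by key value */
   if _iorc_=%sysrc(_sok) then output;
   else if _iorc_=%sysrc(_dsenom) then do;
      put 'NOTE: no match in LOOKUP for ' id=;
      _error_=0;                 /* clear the error flag set by the failed read */
   end;
run;
```

With these values, FOUND should contain one observation (ID 2), and the log should carry a note for ID 9 rather than an error.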

Example 1: Routing Execution When an Unexpected Condition Occurs

This example shows how to prevent an unexpected condition from terminating
the DATA step. The goal is to update a master data set with new information
from a transaction data set. This application assumes that there are no duplicate
values for the common variable in either data set.*

Input Data Sets

The TRANS data set contains three observations: two updates to information in
MASTER and a new observation about PARTNO value 6 that needs to be added.
MASTER is indexed on PARTNO. There are no duplicate values of PARTNO in
MASTER or TRANS.

            MASTER                          TRANS

   OBS   PARTNO   QUANTITY       OBS   PARTNO   ADDQUANT
    1       1        10           1       4        14
    2       2        20           2       6        16
    3       3        30           3       2        12
    4       4        40
    5       5        50

* This program works as expected only if the master and transaction data sets contain no
  consecutive observations with the same value for the common variable. For an explanation of
  the behavior of MODIFY with KEY= when duplicates exist, see SAS Technical Report
  P-242, SAS Software: Changes and Enhancements, Release 6.08, pages 4 and 8-10.

Correctly Updated MASTER Data Set

Output A.1a  MASTER Data Set

MASTER contains updated quantities for PARTNO values 2 and 4 and a new
observation for PARTNO value 6.

            MASTER

   OBS   PARTNO   QUANTITY
    1       1        10
    2       2        32
    3       3        30
    4       4        54
    5       5        50
    6       6        16

Original Program  The objective is to update the MASTER data set with information from the
TRANS data set. The program reads TRANS sequentially. MASTER is read
directly, not sequentially, using the MODIFY statement and the KEY= option.
Only observations with matching values for PARTNO, the KEY= variable, are
read from MASTER.

Open MASTER for update. Read an observation from TRANS. Match
observations from MASTER based on the values of PARTNO. Update the
information on QUANTITY by adding the new values from TRANS.

   data master;
      set trans;
      modify master key=partno;
      quantity = quantity + addquant;
   run;

Resulting Log

Output A.1b  Log Message about DATA Step Ending

This program has correctly updated one observation but stopped when it could not
find a match for PARTNO value 6.

   ERROR: No matching observation was found in MASTER data set.
   PARTNO=6 ADDQUANT=16 QUANTITY=70 _ERROR_=1 _IORC_=1230015 _N_=2
   NOTE: The SAS System stopped processing this step because of errors.
   NOTE: The data set WORK.MASTER has been updated. There were 1
         observations rewritten, 0 observations added and 0 observations
         deleted.

Resulting Data Set

Output A.1c  Incorrectly Updated MASTER

The updated master has five observations. One observation was updated correctly, a
new one was not added, and a second update was not made.

            MASTER

   OBS   PARTNO   QUANTITY
    1       1        10
    2       2        20
    3       3        30
    4       4        54
    5       5        50

Revised Program  The objective is to apply two updates and one addition to MASTER,
preventing the DATA step from stopping when it does not find a match in
MASTER for the PARTNO value 6 in TRANS. By adding error checking, this
DATA step is allowed to complete normally and produce a correctly revised
version of MASTER. This program uses the _IORC_ automatic variable and
the SYSRC autocall macro in a SELECT group to check the value of the
_IORC_ variable and execute the appropriate code based on whether or not a
match is found.

Open MASTER for update. Read an observation from TRANS. Match
observations from MASTER based on the value of PARTNO.

   data master;
      set trans;
      modify master key=partno;

Take the correct course of action based on whether a matching value for PARTNO is
found in MASTER. Update QUANTITY by adding the new values from TRANS.
The SELECT group directs execution to the correct code. When a match occurs
(_SOK), update QUANTITY and replace the original observation in MASTER.
When there is no match (_DSENOM), set QUANTITY equal to the ADDQUANT
amount from TRANS, and append a new observation. _ERROR_ is reset to 0 to
prevent an error condition that would write the contents of the program data
vector to the SAS log. When an unexpected condition occurs, write messages and
the contents of the program data vector to the log, and stop the DATA step.

      select (_iorc_);
         when (%sysrc(_sok)) do;
            quantity = quantity + addquant;
            replace;
         end;
         when (%sysrc(_dsenom)) do;
            quantity = addquant;
            _error_ = 0;
            output;
         end;
         otherwise do;
            put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
            put 'Program terminating. Data step iteration # ' _n_;
            put _all_;
            stop;
         end;
      end;
   run;
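
To try the revised program, MASTER must exist with an index on PARTNO and TRANS must hold the three transactions. The following sketch builds both from the values shown under Input Data Sets. The use of DATALINES and of the INDEX= data set option is an assumption about how the data sets might be created; the example itself does not say how they were built.

```sas
/* Sketch: create the input data sets for Example 1. */
data master(index=(partno));    /* simple index so that KEY=PARTNO can be used */
   input partno quantity;
   datalines;
1 10
2 20
3 30
4 40
5 50
;
run;

data trans;
   input partno addquant;
   datalines;
4 14
6 16
2 12
;
run;
```

Run these two steps before the revised program. Printing MASTER afterward should show the six observations in Output A.1a.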

Resulting Log

Output A.1d  Log Message

The DATA step executed without error and observations were appropriately updated
and added.

   NOTE: The data set WORK.MASTER has been updated. There were 2
         observations rewritten, 1 observations added and 0 observations
         deleted.

See the correctly updated version of MASTER in Output A.1a.

Example 2: Using Error Checking on All Statements That Use KEY=

This example shows how important it is to use error checking on all statements
that use the KEY= option when reading data.

Input Data Sets

The MASTER and DESCRPTN data sets are both indexed on PARTNO. The
ORDER data set contains values for all parts in a single order. Only ORDER
contains the PARTNO value 8.

            MASTER                     ORDER

   OBS   PARTNO   QUANTITY       OBS   PARTNO
    1       1        10           1       2
    2       2        20           2       4
    3       3        30           3       1
    4       4        40           4       3
    5       5        50           5       8
                                  6       5
                                  7       6

           DESCRPTN

   OBS   PARTNO   DESC
    1       4     nuts
    2       3     bolts
    3       2     screws
    4       6     washers

Correctly Created COMBINE Data Set

Output A.2a  COMBINE Data Set

Note that COMBINE does not contain an observation with the PARTNO value 8.
This value does not occur in either MASTER or DESCRPTN.

                    COMBINE

   OBS   PARTNO   DESC              QUANTITY
    1       2     screws               20
    2       4     nuts                 40
    3       1     No description       10
    4       3     bolts                30
    5       5     No description       50
    6       6     washers               0

Original Program  The objective is to create a data set that contains the description and number in
stock for each part in a single order, except for the parts that are not found in
either of the two input data sets, MASTER and DESCRPTN. A transaction
data set contains the part numbers of all parts in a single order. One data set is
read to retrieve the description of the part and another is read to retrieve the
quantity in stock.

The program reads the ORDER data set sequentially and then uses SET with
the KEY= option to read the MASTER and DESCRPTN data sets directly,
based on the key value of PARTNO. When a match occurs, an observation is
written that contains all the necessary information for each value of PARTNO
in ORDER. This first attempt at a solution uses error checking for only one of
the two SET statements that use KEY= to read a data set:

Create COMBINE. Read an observation from ORDER. Read an observation from
DESCRPTN and one from MASTER based on a matching value for PARTNO,
the key variable. Note that no error checking occurs after an observation is read
from DESCRPTN.

   data combine;
      set order;
      set descrptn key=partno;
      set master key=partno;

Take the correct course of action, based on whether a matching value for PARTNO is
found in MASTER or DESCRPTN. (This logic is based on the erroneous assumption
that this SELECT group performs error checking for both of the preceding SET
statements that contain the KEY= option. It actually performs error checking for
only the most recent one.) The SELECT group directs execution to the correct code.
When a match occurs (_SOK), the value of PARTNO in the observation being read
from MASTER matches the current PARTNO value from ORDER. So, output
an observation. When there is no match (_DSENOM), no observations in MASTER
contain the current value of PARTNO, so set the value of DESC appropriately and
output an observation. _ERROR_ is reset to 0 to prevent an error condition that
would write the contents of the program data vector to the SAS log. When an
unexpected condition occurs, write messages and the contents of the program data
vector to the log, and stop the DATA step.

      select (_iorc_);
         when (%sysrc(_sok)) do;
            output;
         end;
         when (%sysrc(_dsenom)) do;
            desc = 'No description';
            _error_ = 0;
            output;
         end;
         otherwise do;
            put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
            put 'Program terminating.';
            put _all_;
            stop;
         end;
      end;
   run;

Resulting Log

Output A.2b  Log Message

This program creates an output data set but executes with one error.

PARTNO=1 DESC=nuts QUANTITY=10 _ERROR_=1 _IORC_=0 _N_=3
PARTNO=5 DESC=No description QUANTITY=50 _ERROR_=1 _IORC_=0 _N_=6

Resulting Data Set

Output A.2c  Incorrectly Created COMBINE

Observation 5 should not be in this data set. PARTNO value 8 does not exist in
either MASTER or DESCRPTN, so no QUANTITY should be listed for it. Also,
observations 3 and 7 contain descriptions from observations 2 and 6, respectively.

COMBINE

OBS    PARTNO    DESC              QUANTITY
  1       2      screws                20
  2       4      nuts                  40
  3       1      nuts                  10
  4       3      bolts                 30
  5       8      No description        30
  6       5      No description        50
  7       6      No description        50

Revised Program

To create an accurate output data set, this example performs error checking on
both SET statements that use the KEY= option:

Create COMBINE. Read an observation from ORDER. Read an observation from
DESCRPTN, using PARTNO as the key variable. FOUNDES is created so that its
value can be used later to indicate when a PARTNO value has a match in
DESCRPTN.

   data combine(drop=foundes);
      set order;
      foundes = 0;
      set descrptn key=partno;

Take the correct course of action based on whether a matching value for PARTNO is
found in DESCRPTN. The SELECT group directs execution to the correct code based
on the value of _IORC_. When a match occurs (_SOK), the value of PARTNO in
the observation being read from DESCRPTN matches the current value from ORDER.
FOUNDES is set to 1 to indicate that DESCRPTN contributed to the current
observation. When there is no match (_DSENOM), no observations in DESCRPTN
contain the current value of PARTNO, so the description is set appropriately.
_ERROR_ is reset to 0 to prevent an error condition that would write the contents
of the program data vector to the SAS log. Any other _IORC_ value indicates that
an unexpected condition has been met, so messages are written to the log and the
DATA step is stopped.

      select (_iorc_);
         when(%sysrc(_sok)) do;
            foundes = 1;
         end;
         when(%sysrc(_dsenom)) do;
            desc = 'No description';
            _error_ = 0;
         end;
         otherwise do;
            put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
            put 'Program terminating. Data set accessed is DESCRPTN';
            put _all_;
            _error_ = 0;
            stop;
         end;
      end;

Read an observation from MASTER, using PARTNO as a key variable.

      set master key=partno;

Take the correct course of action based on whether a matching value for PARTNO is
found in MASTER. When a match is found (_SOK) between the current PARTNO
value from ORDER and from MASTER, write an observation. When a match isn't
found (_DSENOM) in MASTER, test the value of FOUNDES. If FOUNDES is not
true, then a value wasn't found in DESCRPTN either, so write a message to
the log but do not write an observation. If FOUNDES is true, however, the value
is in DESCRPTN but not MASTER. So write an observation but set QUANTITY to 0.
Again, if an unexpected condition occurs, write a message and stop the DATA step.

      select (_iorc_);
         when(%sysrc(_sok)) do;
            output;
         end;
         when(%sysrc(_dsenom)) do;
            if not foundes then do;
               _error_ = 0;
               put 'WARNING: PARTNO ' partno 'is not in'
                   ' DESCRPTN or MASTER.';
            end;
            else do;
               quantity = 0;
               _error_ = 0;
               output;
            end;
         end;
         otherwise do;
            put 'ERROR: Unexpected value for _IORC_= ' _iorc_;
            put 'Program terminating. Data set accessed is MASTER';
            put _all_;
            _error_ = 0;
            stop;
         end;
      end;   /* ends the SELECT group */
   run;

Resulting Log

Output A.2d  Log Message

The DATA step executed without error. Six observations were correctly created
and a message was written to the log.

WARNING: PARTNO 8 is not in DESCRPTN or MASTER.

NOTE: The data set WORK.COMBINE has 6 observations and 3 variables.

See the correctly created version of COMBINE in Output A.2a.
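
One way to compare the result with Output A.2a is to print COMBINE after the
revised DATA step has run. This is a small sketch rather than part of the
original example; it assumes that COMBINE is in the WORK library, as the log
note indicates.

```sas
/* Hypothetical check step: list COMBINE for comparison with Output A.2a. */
proc print data=work.combine;
   var partno desc quantity;   /* FOUNDES is not listed; the DATA step dropped it */
run;
```

With the input data sets shown earlier, the listing should contain six
observations and no row for PARTNO 8.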
1naex 1: I
.'. i!•!' !1
, HI
A collapsing observations within BY

single observation 140-141 :
jJlI: I :into
·
ABS function, example 20 : comparing all observations with sanie ,BX
access methods
direct 5
va~ues 102-106 : Ii L
subsetting data set based on calculated,l , •
sequential 5 ;
age of person, calculating 172 average of BY group 134--135 !.1;1.1:
DY statement ! ii :
APPEND procedure, example 116 collapsing observations within BY gronpiinto
arithmetic operations, applying to group of
variables, example 124-';-125
single observation, example 14:1.11 :i;:
requirements for BY-group processing ;! p
array processing · :
processing array correct number of times
reshaping observations into multiple; L
variables, example 146 : :ii!
l~I
160 ! separating unique observations from; j; i;:
processing two-dimensional array 33 . duplicate observations, example: Ii. I'
purpose and use 11 i 110-111 ; :! I;
array processmg examp es I
• ] I
applying common operations to group of

summary of use, table 10 ! lil !:
BY statement with MERGE statement : 1:l Ii
variables 124-125 ; adding variables from transaction data ~~t to
collapsing observations within BY group into
single observation 141
master data set, example 74-75, 1 :i I:
applying transactions to master data ·set based
combining and collapsing ~bservations by on common variable 48-49 ! 1:i I:
common variable 51-52 merging dala sets by common variable) I'. i
delaying final processing o~ observations
83-85 t
example 16-17 1 !j I
merging observations. based on comm.oh I'
expanding single observati<?ns into multiple variable, example 46-47 i I! i '.
observations 142-143 updating master data sets with nonmis~iilg
obtaining lag value of variable within BY
group 122-123
values, example 76-:77 i i,! Ii
BY-group processing , i ! 1:
sorting variable values within one FIRST.variable and LAST.variable 11ii i:
observation 159 ·
table lookups with unindexed lookup data set
tools for BY-group processing 11 ! !i
• I'·•I
Ii
unexpected results when testmg or res 17tting
32-33
ARRAY statement
i
I
variables 99-101
BY-group processing examples
! lij
!r I!,I' 1 I :
Ii
defining arrays 11 i generating every combination betwee!1 <l~ta
temporary arrays, example i 34 sets with index available 86-91: Ji I!
AS clause, example 94 [ obtaining lag value of variable within B,Yj !
asterisks (**) representing e~onentiation 40 group 122-123 H',
AVG function, example 135i
I
reshaping observations into multiple I! ;
variables 144-146 ,j ,
separatmg' uruque
• ob servations room 1i,;' 1
\:i !:
B
BEST12. format 153
'
duplicate observations 111
r It
binary searc;hing technique ini formats 65

bubble sort '
C li
!.i 11.·
Ji
definition 158 calculated columns 1• '
limiting the number of comparisons calculation, example 40 J;'!
performed 160 f combining observations based on calculation
sorting variable values within one of varia}Jles, example 38-39 ! 1•.11:.
observation, example 158-160 . Cartesian product :. . ; 1·.
BY group examples i definition 3 8 i i: I
adding new variable containing frequency of generating every combination of observatmns
I· 1:
DY-group value 132-133 between data sets,. exa_mple. 78:-79( !
calculating percentage of BY group total for generating every combmation WJth mdex ;
each observation 129-~31 available, example 86-91 1
:i ·
calculating totals across BY group for CASE expression, example 94 ·!
producing totals 126- ~ 28 CEIL function, example 36-36
:I
I
Iii\
11 I
I".
;, 1
I: I
r!I·[
J•:
f'.
um inaex
character variables table lookups with indexed lookup data set

conve1·ting variable type from character lo '30
nuineric, examplD 148-149, 152-153 tablJ lookups with large inde.,:ed data set 72
creating SAS time value fronl character table lookups with large nonindexed lookup
value, example 170-171 data set 65
determining type of variable's content, table lookups with small lookup data set 62
example 150-151
determining whether variable is character or ;
11umedc, example 152-154 !

CNTLIN= option, FORMAT procedure
D I'
building formats dynamically 65 '
data relationships 2-4
table lookups with large nonindexed data set, categories 2
example 63-64 many-to-many 4
collapsing observations one-to-many and many-to-one 3-4
See observations, collapsing one-t~-one 3
colwnns data sets
creating table with attribute information from See SAS data sets
another table, example 155-157 DATA step
overlaying columns having same name, compared with OUTER UNION set
example 96-97 operator 97
columns, calculated ensuring processing of last observation 22
colculation, example 40 prev~nting from ending automafic~lly 20
combining observations based on calculation stopping explicitly 22 ·
of variables, example 38-39 stopping explicitly, example 20, 21, 55
combining data sets unexpected results when testing or resetting
See SAS data sets, combining variables 99-101
composite index DATA ~tep examples
table lookup with composite index and addin:g values from last observation to all
duplicate values in data set 66-69 observations 45
COMPRESS function addin~ values to all observations 42-43
counting occurrences of string within calcu~ating totals across BY group for
variable, example 164-165 P,roducing totals 126-128
creating SAS time value from character removing obsei-vations from master data
value, example 170-171 · s:7-58
concatenation of data sets 6 routing execution upon occurrence of
CORR keyword, example 96-97 unexpected condition 177-179
COUNT function, example 132~133 simulating LEAD function by looking ahead
CREATE TABLE statement examples at observations 120-121
adding new variabl~. containing frequency of table iookup witl1 composite index and
BY-group volue 133 duplicate values in data set 66-68
calculating percentage of BY group .total for data vec~o1·
each observation 131 adding values to all observations, example
combining multiple data sets without variable 42-43
common to all 94 applyihg transactions to master data set based
combining observations based on calculated o~ common val'iable 48-49
variable, example 39 date Val'-;1-flS
combining observations with inexactly combining observations with no common
matching variables 23 v~riable, example 25
combining observations with no common DATETIME format 169
variable 26 datetim,; values
comparing all observations with same BY creati~g SAS datetime values, example
values 106 . 168-169
creating table with attribute information from datetime variable formatting, example 20
another table.155-156 DELETE: statement, SQL
generating every combination between data removing obse1·vations from master data,
sets with index 91 example 68-59
·generating every combination of observations reports not produced automatically 59
between data sets 78-79 DESCRIBE TABLE statement, SQL
generating every combination of observations creatirig table with attribute information from
with common column 81 a~tller table, example 155:...156
intel'leaving nonsorted data sets 97 DHMS function, example 168-169
subsetting data set based on calculated DICTIONARY libref 155
average of BY group 135 dictionary tables
table lookups with composite index and creating table with attribute infor1nation from
duplicate values in data ~t 68 a~otlier table, example 165-157
1na._ex I UH
; I: I
determining whether variable is character or DROP= data set option : !! I;
numeric, example 152-154 converting character variables to numJi-tc
subsetting with WHERE clause 152-153, variables, example 149 i Iii I.
155
expanding single observations into multi._.·r' e
DIM function examples , observations, example 143 ;! : i 1.
applying common operations to group of -DSEMI'R return code, table 177 , ii .

variables 125 · _])SENMR return code, table 177 :i I I
delaying final processing of observations 85 -DSENOM return code · i ·
obtaining lag value of variable within BY error checking for all statements using 'I .
group 123 KEY= option 181-183 ' '!
• 1·• '
direct access 5 : purpose, table J77 • i! :
by observation number 5 i removing observations from master data, :
by value of variables 6 i
DO loop examples I
example 57-58 : j:
routing execution upon occurrence of i I:
Ii
accessing observations fron:i beginning and ' unexpected condition, example -179' i
end of data set 112-114 ! Ii :
adding observations to data sets based on . 11.1 ::
I
,t
. specific variable value 1118-119
adding observations to end 1of data set
115-116 :
applying common operations to group of
:... ma-
See also IF-THEN/ELSE logic examples!! 1,
, Ii
variables 124-125 ;I table lookups with unindexed lookup data set,
. .. 1·
applying transactions to master data set based example 32-33 · I:! 1 I
on common variable 48-49 end of data set j:i I:
applying transactions to ma~ter data using preventing SET statement from readingft,ast
index 53-54 j end of data set 18, 20-22 '. II i !:
collapsing observations witlµn BY group into END= option, SET statement · ' • 1.
single observation 140~141 adding observations to end of data sel I:
!;
combining observations witji inexactly example 115-116 j I. I:
matching variables 20-j-22 END= variable I I• I;
combining observations witlt no common determining last observation of data set' l !
variable 24-26 ! example 18, 20, 22 i !:
comparing all observations ~th same BY error checking 176-183 , I
values 104-105 i See also ....lORC- automatic variable ;i
delaying final processing of :observations error checking for all statements using Ir
82-85 i KEY= option, example 180-183, j 1
expanding single observatio~s into multiple reasons for using 176 f'!
observations 142-143 I routing execution upon unexpected conditlon,
extracting character string without breaking example 177-179 • ii I;
text in middle of word Ii166-167 tools for error-checking 176-177 :_ -11;; :_
limiting the number of comparisons -ERROIL variable examples ! i I•
performed in bubble sort 160 applying transactions to master data usi~i 1
obtaining lag value of variable within BY index 55 ' Ii,:
group 123 I combining and collapsing observationi by I:
random matching of observations 36-37 common variable 52 . l 1·
sorting variable values within one error checking for all statements using -, :I
observation 158-160 ; KEY= option 181 ; j j;
table lookup with indexed Iobkup data set
29 I
table lookups with small lookup data set
generating every combination between 1 data
sets with index,. example 89 i II
rem?ving obse;vations from master d~t~ i 158
r
60-62 r routmg execution upon occurrence of ; !:1 I'
table lookups with unindexe~ lookup data unexpected condition 179 , t'i 1:
set 32 ! table lookup with composite index and. Iii
,_
DO UNrIL loop examples I duplicate values in data set 6B ; •I 1:
combining and collapsing ob~ervations by table lookup with indexed lookup data 1set t;
common variable 51-52 29 1 Id I:
generating every combination between data table lookup with large indexed lookuJ d~t~
sets with index available 86-91 set 71-72 · 11
: 1, I
I
processing array correct number of times
160 I . I'.I' I
I:·
removing observations from rilaster data
57-58 :
F ,/
sorting variable values within one FIRST.variable 11 · Ii
i 11 I
1
observation 158-160 ! calculating totals across BY group for '. i I I·

DO WHILE loop, example 54--;55 producing totals, example 126-1~7•1!,
I 1!
1
I
11
II
i I
11
:I
ii II
.uns 1naex
FIRST. variable (continued) full outer join 22

collapsing observations within BY group into fuzz f~ctor for rounding
single observation, example 140-141 MACROUND macro and fuzz factor
! I purpose and use 11
separating unique observations from
;138-139
nu~eric precision considerations 137-138
I! duplicate observations, example rounding numbers with paper-and-pencil
110-111 'results, example 136-137
FIRSTOBS= option, example 120-121 fuzzy merge of data sets, example 18-23
FORMAT procedure
binary searcbing technique for formats 66
building formats dynamically 66 G
control data set for creating formats
dynanucally 64, 65 GROUP BY clause examples
creating SAS time value from character adding new variable containing frequency of
value, example 170-171 BY-group value 132-133
table lookups with large nonindexed data set, calculating percentage of BY group total for
example 63-64 each observation .131
FORMAT statement combining multiple data sets without variable
permanently associating DATETIME format, common to all 92-96
example 169, 171 combining observations based on calculation
formats ilf variables 39
binary searching technique for table comparing all observations with same BY
lookups 65 values 106
converting character variables to numeric subs~tting data set bnsed on calculated
•i variables, example 148-149 ~verugc of BY group 134-135
creating SAS time value from character
value, example .170-171
datetime variables, example 20
determining type of variable's content, I
example 150-151 HAVIN'.G clnuse examples

table lookups with large nonindexed data set, com~ining observations based on calculation
example 63-64 of variables 39
unexpected results from '<iefault BEST12, comp~ring all observations with same BY
format 163 ".alues 105-106
frequency count of BY-group value, example subsetting data set based on calculated
132-133 ~verage of BY group 134-135
FROM clause examples
adding new variable containing frequency of
BY-group value 133 i
calculating percentage of BY group totnl for
each observation 131 IF statement, subsetting
combining multiple data sets without variable adding variables from transaction data set to
common to all 94 master data set, example 74-75
combining observations with inexactly comparing all observations with same BY
matching variables 23 v~ues, example 104
combining observations with no common table lookups with unindexed lookup data set,
variable 26 ekample 32-33
comparing all observations with same BY IF-THE~/ELSE logic examples
. ; vallles 106 combining observations with fuzzy merge
generating every combination between data 21
sets with index 91 delaying final processing of observations 85
generating every combination of observations merging data sets by common variable
between data sets 79 16-17
generating every combination of observations processing two--dlmensional arrays 33
with common column 81 simulating LEAD function by looking ahead
subsetting data set based on calculated ai observations 121
average of BY group 135 IF-THEN logic examples
table lookups with composite index and accessing observations from beginning and
duplicate values in data set 68 e11d of data set 112-114
table lookups with indexed lookup data set applying transactions to master data set based
30 011. common variable 48-49
table lookups with large indexed data set 72 calcul~ting totals across BY group for
table lookups with large nonindexed lookup producing totals 126-127
data set 65 random matching of observations 35-37
table lookups witli small lookup data set 62 resetting variables to original values 49
i ,1..1.U,A..CiA: I .1.Ui;I
I
i ; 1:: !
table lookups with. small lo~kup data set INPUT function examples j dI:
61-62 ! converting character variables to num~ric
updating master data sets ':'ith nonmissing
values 76-77 j
.variables 148-149 j
creating SAS time value from character: ·
Ii •·
IF-THEN statement 1 value 170-171 i I!
ensuring processing of lastjobservation 22 determining type of variable's content 11:i
_N_ option 43 I 150-1S1 : · ,I
IF-THEN statement examples] INTCK function, example 172 i l:l
adding values to all observations 43 interleaving
combining observations wi~h fuzzy merge See SAS do.ta sets, interleaving
20-21 . INTO clause, example 152-154
obtaining lag value of variable within BY _IQRC.... automatic variable
group 122-123 : checking value with %SYSRC autocall
preventing SET statement from reading past macro 176 :: !
end of data set 18; 20 error-checking capabilities 176 !I
processing two-dimensional! array 33 summary of use, table 10 ,
table lookups with unindextd lookup data set, _IQRC.... automatic variable examples ·j 1
. example 32-33 I See also unexpected _IQRC_ conditiclnj'

examples : !',
IN= data set option, examples
adding variables from tran~action data set to applying transactions to master data using
master data set 74-75i index 53-55 I Ii I/
applying transactions to m3iiter data set based combining and collapsing observation~ bYi:
on common variable 4'8-49 common variable 61-S2 i I I1I
error checking for a.11 statements using 1i 1'.
comparing all observations I'with same BY
values 104 !
KEY= option 181-183 i
1
i1:
generating every combination between da ii.
interleaving data sets based .ion common
1 sets with index, example 89-90 ! I'! :
variable 99-101
removing observations from master dat~; 56,
merging data sets by commbn variable
10-11 I 68
routing execution upon occurrence of: !
i
index
accessing data. directly 5
unexpected condition 179 !. i
table lookup with composite index and Ii
definition · 5 1
forcing pointer to beginning of index,
duplicate values in date. set 66, 68 i
table lookup with ind.exed lookup dat~ s~ti i
example 86-90 :
index requirements for combining data sets
.28-29 ·• i Ii ii
table lookup with large indexed lookup data.
BY-group processing 11 I
generating every combinati~n between data
~et .11-12 :
I
I! :
,,· ,
sets with index, example 88 ' 11
merging data sets by commtin variable, I!
exa.mple 16 ! J 1-i
I.I
merging observations based :on common joining tables
variable, example 46 1 · See tables, joining
'II
updating data sets 9 i
indexed data set examples 1
I
applying tnnsactions to master data using
index 58-55 i K I
combining and collapsing o~servations by KEEP= option examples . !.
common variable 50-52 applying transactiops to master data usi ili:
generating every combinatioh between data
sets with common variable, example
· in~ex 54 • . ; !ii•
simulating LEAD function by looking ahead
86-91 i at observations 120-121 '! li
removing observations fromimaster data 56 KEY= option, MODIFY statement :i ·
table lookup with composite '.index nnd See also error checking
duplicate values in data :set 66-69 accessing data directly 5 ! 1;
table lookup with indexed lobkup data set applying transactions to master data using :
27-30 ,
!
I
Index 53-55 ! Ii
table lookup with lnrge inde1ed lookup data automatic creation of -IORC.... a.utomatio I
set 70-72 i variable 176 I ::
informat examples I error checking for all stateme~ts using :t .
converting character variables to numeric KEY= option, example 180-183; :! .:
variables 148-149 I removing observations from master da~a.:j ji:.
creating SAS time value fron;i character example 56-S8 : h•
value 170-171 . i routing ·execution upon occurrence of i l:i j:
determining type of variable' content 1S1 unexpected condition, example 178~ ·79
I
' i
I
J.UU 1naex
KEY= option, SET statement many-to-one relationships

See also error checking See inwtiple observations, combining with
accessing data directly 5 multiple observations ·
automatic creation of JORC_ automatic See ~ne-to-many or many-to-one
variable 176 relationships
combining and collapsing observations by master: data set examples
common variable, example 50-51 addi~g variables from transaction data set to
error checking for all statements using master data set 74-75
KEY= option, example 180-183 applying transactions to master data set based
generating every combination between data on common variable 48-49
sets with index, example 86-91 applying transactions to master data using
table lookup with composite index and index 63-55
duplicate values, example 66-67 coml?ining and collapsing observations by
table lookup with indexed lookup data set, common variable 50-52
example 27-28 resetting variables to original values 49
table lookup with large indexed lookup data routing execution upon occurrence of
set, example 70-71 unexpected condition 178-179
key variables 27 updating observations in place 53-54
updating with nonmissing values from
transaction data set 76-77
L match-merging of data sets
See also SAS data sets, merging
!AG function, exainple 122-123
adding variables from transaction data set to
Ing values, obtaining for variable within BY
lllaster data set, example 74-75
group, example 122-123
calcu~ating percentage of BY gi:oup total for
LAST.vnriabie 11
~ach observation, example 129-131
calculating totals across BY group for
finding match for consecutive duplicate
producing totals, example 126-127
values 91
collapsing observations',within BY group into
merging data sets by common variable,.
single obse1·vation, example 140-141
example 16-17
purpose and use 11
separating unique observations from merging observations based on common
duplicate observations, example ':ariable, example 46-47
110-111 principle of match-merging, illustration 8
LEAD function, simulating 120-121 updating master data sets with nonmissing
LEAVE statement values, example 76-77
comparing all observations with same BY MEANS, procedure, example 129-131
values, example 105 MERGE statement
delaying final processing of observations, one-to-one merging 7
example 85 simulating LEAD function by looking ahead
left join, example 105-106 af observations, example 120-121
left outer join 30 summary of use, table 10
LENGTH function ' MERGE' statement with BY statement,
I
counting occurrences of string within examples

variable 164-165 adding variables from transaction data set to
i LENGTH function examples ~aster data set 74-75
I'' I assigning length to converted character appl;i,ng transactions to master data set based
. on common variable 48-49
'1 variable 161
!1i
f LINK statement calcul~ting percentage of BY group total for
accessing observations from beginning aud e~ch observation 131
!1 mergihg data sets by common variable
ii end of data set, example 112-114
: :I combining observations with fuzzy merge, 16-17
d ! example 18, 20 mergiilg observations based on common
~ ·i
::1 returning execution to LINK statement, v~riable 46-4 7
example 21' updati;ng master data sets with nonmissing
r1 values 76-77
1!i 1 mergingidata sets
1 1,M See SAS data sets, merging
missing values
iii' l
I,:1'I macro variables determining type of variable's content,
' 1· 1 converting numedc to character variables example 150-151
l:. ' and vice versa, example 152-153 norunissing values not replaced in transaction
!:1 l manipulating data sets d~ta set during update 9
1!'i , See SAS data set~, manipulating replacing period (.) witll blank, example
:il : many-to-many relationships 4 149
!
i
i
.tllU~4! I' .l.iJ .1.
i
replacing .with more meaningful values,
. : I.;.
conv:erting variable types from characte~ to
example 17 I numeric, example 152-153 I .!i I
updating' master data sets 'With only determining type 'of vario.ble's c~nteni, ii!'
nonmissing values, ~xaitiple 48-49,
76-:-77 .. , i.
exam1>le 159:-151 ; i Ii
determining whether variable is char~ctJxl or
MOD function, example 161..:.l"163
MODIFY statement : .
,numeric, example 152-154 · : i'
-NUMERIC- keyword, example 124-125!
See also KEY= option, MODIFY statement • ,, • ' I !I
See also POINT= option, MODIFY !-i
statement ! 11
BY-group processing_ 11 j
compared with UPDATE stiitement 11-12
o •
:, II·
nonmissing values not repla~ed in transaction OBS= data set option, example 161-1631
data set during update j 9 observations :
purpos'! and use 11 j accessing data directly using observati~~ i
summary of use, table 10 i numbers . 5 , : Ij •
accessing from beginning and end of dnia !set,
updating data sets 9 i
MONTH function, example 1:12 example 112-114 [ j i [·
multiple observations, combining with multiple adding to data sets based on specific var.i~. le
observations 73-106 i value, example 118-119 iH1·
adding variables from transaction data set to adding to end of data set, example ll_Sf.1~7
· master data set · 74' I calculating percentage of BY group totamlr
combining multiple data sets without common each observation, example' 129-iail .
va11&ble 92-;-95 , i I , . ,I
comparing all observations with same B¥J ·
comparing all observations with same BY values, example 102-106 1 1:1 .
. variable 102-:-106 · , I delaying fi~ disp~sition until completio4 1of
delaying fin.al disposition of pbservations processing, example 82-85 i \j I-
82-:85 , : I ensuring processing of last observation 12 I~
  generating every combination between data sets with index available 86-91
  generating every combination of observation based on common variable 80-81
  generating every combination of observation between data sets 78-79
  interleaving data sets based on common variable 99-101
  interleaving nonsorted data sets 96-98
  updating master data set with only nonmissing values 76-77
multiple observations, combining with single observations
  See single observations, combining with multiple observations

N

_N_ option, IF-THEN statement 43
NOBS= option, MODIFY statement 35-36
NOBS= option, SET statement
  accessing observations from beginning and end of data set, example 112-114
  adding values from last observations to all observations, example 44-45
  combining observations with no common variable, example 24-25
  delaying final processing of observations, example 83-84
  table lookups with small lookup data set, example 60-61
numeric precision and ROUND function 137-138
numeric variables
  converting character variable to numeric, example 148-149

  expanding single observations into multiple observations, example 142-143
  removing observations from master data, example 56-59
  reshaping observations into multiple variables, example 144-146
  separating unique observations from duplicate observations, example 110-111
  simulating LEAD function by looking ahead at observations, example 120-121
  updating in place 53-54
observations, collapsing
  calculating totals across BY group for producing totals, example 126-128
  collapsing observations in BY group into single observation, example 140-141
  combining and collapsing based on common variable, example 50-52
  including only matching observations in output data, example 52
observations, combining
  See multiple observations, combining with multiple observations
  See single observations, combining with multiple observations
  See single observations, combining with single observations
ON clause, example 105
one-to-many matching
  calculating percentage of BY group total for each observation, example 129-131
  combining observations based on calculation of variables, example 38-40
one-to-many or many-to-one relationships 3
one-to-one merging of data sets
  purpose and use 7
  simulating LEAD function by looking ahead at observations, example 120-121
one-to-one reading of data sets 7
one-to-one relationships 3
ORDER BY statement, SQL
  creating table with attribute information from another table, example 156
  generating every combination of observations with common column, example 81
  interleaving nonsorted data sets, example 97
outer join
  full 22
  left 30
OUTER UNION set operator
  compared with SET statement in DATA step 97
  interleaving nonsorted data sets, example 96-97
OUTPUT statement examples
  adding observations to end of data set 116
  calculating totals across BY group for producing totals 126-127
  table lookups with unindexed lookup data set 32-33

P

percentage of BY group total, calculating for each observation, example 129-131
PICTURE statement
  creating SAS time value from character value, example 170-171
POINT= option, MODIFY statement
  accessing data directly 5
  random matching of observations, example 35-36
POINT= option, SET statement
  accessing data directly 5
  accessing observations from beginning and end of data set, example 112-114
  adding values from last observations to all observations, example 44-46
  applying transactions to master data using index, example 53-55
  combining observations with no common variable, example 24-25
  comparing all observations with same BY values, example 102-106
  delaying final processing of observations, example 82-86
  table lookups with small lookup data set, example 60-61
processing information in groups
  See array processing
  See BY-group processing
punctuation marks
  extracting character string without breaking text, example 166-167
PUT function examples
  applying formatted values to new variable 63, 64
  converting character variables to numeric variables 148-149
  creating SAS time value from character value 170-171
  determining numeric vs. character variables 153
  table lookups with large nonindexed data set 65

Q

question mark (??) format modifier, example 149, 151

R

random samples
  creating equal-sized random samples, example 161-163
  random matching of observations, example 35-37
RANUNI function
  creating equal-sized random samples, example 161-163
  random matching of observations, example 35-36
REMOVE statement, example 56-58
RENAME= data set option
  applying transactions to master data set based on common variable 48-49
  calculating totals across BY group for producing totals, example 127
  converting character variables to numeric variables, example 149
  simulating LEAD function by looking ahead at observations, example 120-121
  table lookups with small lookup data set, example 60-61
  updating master data sets with nonmissing values, example 76-77
REPLACE statement
  applying transactions to master data using index, example 53-55
  random matching of observations, example 36-37
RETAIN statement
  comparing all observations with same BY values, example 104
RETAIN statement examples
  applying common operations to group of variables 124-125
  collapsing observations within BY group into single observation 141
RETURN statement
  returning execution to LINK statement, example 21
ROUND function
  MACROUND macro and fuzz factor 138-139
  numeric precision considerations 137-138
  rounding numbers with paper-and-pencil results, example 136-137
  without fuzz factor, example 138

S

SAS data sets
  compared with SQL tables (note) 22
  creating table with attribute information from another table, example 155-157
  creating user-defined formats 64, 65
  preventing SET statement from reading past end of data set 18, 20, 21
  transposing, example 144-146
SAS data sets, combining 6-9
  See also SAS data sets, merging
  See also tables, joining
  adding values from last observations to all observations 44-45
  adding values to all observations 42-43
  adding variables from transaction data set to master data set 74-75
  applying transactions to master data set based on common variable 48-49
  applying transactions to master data set using index 53-55
  based on calculated column 38-40
  choosing between UPDATE and MODIFY statements 11-12
  combining and collapsing observations based on common variable 50-52
  combining multiple data sets without common variable 92-95
  comparing all observations with same BY variable 102-106
  concatenating 6
  delaying final disposition of observations 82-85
  generating every combination between data sets with index available 86-91
  generating every combination of observation based on common variable 80-81
  generating every combination of observation between data sets 78-79
  including only matching observations in output data, example 52
  interleaving 7
  interleaving based on common variable 99-101
  interleaving nonsorted data sets 96-98
  match-merging 8
  methods for combining 6-9
  no common variable 24-26
  one-to-one merging 7
  one-to-one reading 7
  processing information in groups 11
  random matching of observations 35-37
  removing observations from master data set 56-59
  table lookup using composite index 66-69
  table lookup with indexed lookup data set 27-30
  table lookup with large indexed lookup data set 70-72
  table lookup with large nonindexed data set 63-65
  table lookup with small lookup data set 60-62
  table lookup with unindexed lookup data set 31-34
  tools for combining 9-12
  tools for combining, table 10
  updating 9
  updating master data set with only nonmissing values 76-77
  variable values not matching exactly 18-23
SAS data sets, interleaving
  based on common variable, example 99-101
  nonsorted data sets, example 96-98
  purpose and use 7
SAS data sets, manipulating 107-146
  accessing beginning and ending observations 112-114
  adding observations based on specific variable value 115-117
  adding observations to end of data set 115-117
  adding variable containing frequency of BY-group value 132-133
  applying common operations to group of variables 124-126
  calculating percentage of BY group total for each observation 129-131
  calculating totals across BY group 126-128
  collapsing observations in BY group into single observation 140-141
  comparing variable values by looking ahead 120-121
  expanding single observations into multiple observations 142-143
  obtaining lag value of variable within BY group 122-123
  reshaping observations into multiple variables 144-146
  rounding numbers to pencil-and-paper results 136-139
  separating unique observations from duplicate observations 110-111
  simulating LEAD function 120-121
  subsetting 108-109
  subsetting based on calculated average of BY group 134-135
SAS data sets, merging
  See also match-merging of data sets
  adding variables from transaction data set to master data set 74-75
  applying transactions to master data set based on common variable 48-49
  calculating percentage of BY group total for each observation, example 129-131
  comparing all observations with same BY values 102-106
  fuzzy merge of data sets 18-23
  match-merging 8
  merging data sets by common variable 16-17
  merging observations based on common variable 46-47
SAS data sets, merging (continued)
  one-to-one merging 7
  simulating LEAD function by looking ahead at observations 120-121
  updating master data sets with nonmissing values 76-77
SAS data sets, subsetting
  based on calculated average of BY group 134-135
  creating equal-sized random samples or subsets 161-163
  producing equal-sized or exact-sized subsets 161-163
  simple subset 108-109
SAS data sets, updating
  applying transactions to master data set based on common variable 48-49
  principles of updating, illustration 9
  requirements 9
SAS datetime values, creating 168-169
SAS time value, creating from character value 170-171
SELECT clause examples
  adding new variable containing frequency of BY-group value 133
  calculating percentage of BY group total for each observation 131
  combining multiple data sets without variable common to all 94
  combining observations based on calculation of variables 39
  combining observations with inexactly matching variables 23
  combining observations with no common variable 26
  comparing all observations with same BY values 106
  generating every combination between data sets with index 91
  generating every combination of observations between data sets 79
  generating every combination of observations with common column 81
  interleaving nonsorted data sets 96-97
  reports not produced automatically 23
  subsetting data set based on calculated average of BY group 135
  table lookups with composite index and duplicate values in data set 68
  table lookups with indexed lookup data set 30
  table lookups with large indexed data set 72
  table lookups with large nonindexed lookup data set 65
  table lookups with small lookup data set 62
SELECT statement examples
  error checking for all statements using KEY= option 181, 182
  error checking with _IORC_ automatic variable 53-55
  generating every combination between data sets based on common variable 86, 89-90
  routing execution upon occurrence of unexpected condition 179
SELECT statements, SQL
  automatic production of reports 23
sequential access 5
SET statement
  See also KEY= option, SET statement
  See also NOBS= option, SET statement
  See also POINT= option, SET statement
  adding values to all observations, example 43
  combining observations with fuzzy merge, example 18
  compared with OUTER UNION set operator 97
  generating every combination between data sets with index, example 86-91
  interleaving data sets based on common variable, example 99-101
  preventing from reading past end of data set 18, 20, 21
  subsetting data sets, example 108-109
  summary of use, table 10
  table lookups with small lookup data set, example 60-61
SIGN function, example 138
single observations, combining with multiple observations 41-72
  adding values from last observations to all observations 44-45
  adding values to all observations 42-43
  applying transactions to master data set based on common variable 48
  applying transactions to master data using index 53-55
  combining and collapsing observations based on common variable 60
  including only matching observations in output data 52
  merging observations based on common variable 45-47
  removing observations from master data 56-59
  table lookup using composite index 66-69
  table lookup with large indexed lookup data set 70-72
  table lookup with large nonindexed data set 63-66
  table lookup with small lookup data set 60-62
single observations, combining with single observations 15-40
  based on calculated column 38-40
  merging by common variable 16-17
  no common variable 24-26
  random matching of observations 36-37
  table lookup with indexed lookup data set 27-30
  table lookup with unindexed lookup data set 31-34
  variable values do not match exactly 18-23
_SOK return code
  error checking for all statements using KEY= option 181-183
  purpose, table 177
  routing execution upon occurrence of unexpected condition, example 179
sorting requirements for manipulating data sets
  See also ORDER BY statement, SQL
  BY-group processing 11
  calculating totals across BY group, example 126-127
  collapsing observations within BY group into single observation, example 140
  combining and collapsing observations by common variable, example 50
  merging data sets by common variable, example 16
  separating unique observations from duplicate observations, example 110
  updating data sets 9
sorting variables within observation
  See bubble sort
SQL procedure
  conditions affecting PROC SQL optimizer 18
  summary of use, table 10
SQL procedure examples
  adding new variable containing frequency of BY-group value 132-133
  calculating percentage of BY group total for each observation 131
  combining multiple data sets without variable common to all 92-95
  combining observations based on calculation of variables 38-40
  combining observations with inexactly matching variables 22-23
  combining observations with no common variable 25-26
  comparing all observations with same BY values 105-106
  generating every combination between data sets with index 91
  generating every combination of observations between data sets 78-79
  generating every combination of observations with common column 80-81
  interleaving nonsorted data sets 96-98
  removing observations from master data 58-59
  subsetting data set based on calculated average of BY group 134-135
  table lookups with composite index and duplicate values in data set 68-69
  table lookups with indexed lookup data set 30
  table lookups with large indexed data set 72
  table lookups with large nonindexed data set 65
  table lookups with small lookup data set 62
SQRT function, example 39
STOP statement
  explicitly stopping DATA step, example 21, 55
  stopping execution after unexpected _IORC_ condition 29
  stopping execution after unexpected _IORC_ condition, example 58
string function examples
  counting occurrences of string within variable 164-165
  extracting character string without breaking text in middle of word 166-167
subquery, SQL WHERE clause 59
subsetting data sets
  See SAS data sets, subsetting
  See WHERE clause, SQL
SUBSTR function, example 166-167
SUM function examples
  calculating percentage of BY group total for each observation 131
  combining multiple data sets without variable common to all 92-95
SUM statement examples
  adding to data sets based on specific variable value 119
  calculating totals across BY group for producing totals 126-127
  collapsing observations within BY group into single observation 141
SYMPUT routine, example 83-84
%SYSRC autocall macro
  applying transactions to master data using index, example 53-55
  checking value of _IORC_ automatic variable 176
  removing observations from master data, example 56, 58
  summary of use, table 10
  table lookup with composite index and duplicate values, example 66, 68
  table lookup with indexed lookup data set, example 28-29

T

table aliases 94
table lookup examples
  composite index and duplicate values in transaction data set 66-69
  large indexed lookup data set 70-72
  large nonindexed data set 63-65
  lookup data set indexed 27-30
  lookup data set unindexed 31-34
  small lookup data set 60-62
tables, joining
  combining multiple data sets without variable common to all, example 92-95
  combining observations based on calculation of variables, example 38-40
  comparing all observations with same BY values, example 105-106
  full outer join 22
  fuzzy merge of data sets, example 22-23
  generating every combination of observations between data sets, example 78-79
  generating every combination of observations with common column, example 80-81
  interleaving nonsorted data sets, example 96-98
  left outer join 30
tables, joining (continued)
  table lookup with composite index and duplicate values, example 68
  table lookup with indexed lookup data set, example 30
  table lookup with large indexed data set, example 72
  table lookup with small lookup data set, example 62
tables, SQL
  compared with SAS data sets (note) 22
temporary arrays
  advantages 34
  compared with variables 34
  table lookups with unindexed lookup data set, example 32-33
time values
  creating SAS datetime values, example 168-169
  creating SAS time value from character value, example 170-171
  fuzzy merge of data sets, example 19-22
TIME11. informat, example 170-171
TODAY function
  adding observations to data set, example 119
  calculating age of person, example 172
tools for combining data sets 9-12
  choosing between UPDATE and MODIFY statements 11-12
  error-checking tools 176-177
  list of tools, table 10
  processing information in groups 11
totals, calculating across BY group, example 126-128
transaction data set examples
  adding variables from transaction data set to master data set 74-75
  applying transactions to master data set based on common variable 48-49
  applying transactions to master data using index 53-55
  combining and collapsing observations by common variable 50-52
  table lookup with composite index and duplicate values in data set 66-69
  updating master data set with nonmissing values 76-77
TRANSPOSE procedure, example 144-146
TRANWRD function, example 164-165
two-dimensional arrays
  processing two-dimensional array 33
  table lookup with unindexed lookup data set, example 32-33

U

unexpected _IORC_ condition examples
  applying transactions to master data using index 55
  combining and collapsing observations by common variable 52
  generating every combination between data sets with index 89
  removing observations from master data 58
  routing execution upon occurrence of unexpected condition 177-179
  table lookup with composite index and duplicate values in data set 68
  table lookup with indexed lookup data set 29
  table lookup with large indexed lookup data set 72
UNIQUE option, SET statement 66-67
UPDATE statement
  compared with MODIFY statement with BY statement 11-12
  nonmissing values not placed in transaction data set during update 9
  summary of use, table 10
  updating data sets 9
updating data sets
  See SAS data sets, updating
user-defined formats, example 63-64
utility and function examples 147-174
  bubble sort of variable values within one observation 158-160
  calculating age of person 172
  converting variable types from character to numeric 148-149
  counting occurrences of string within variable 164-165
  creating equal-sized random samples or subsets 161-163
  creating SAS datetime values 168-169
  creating SAS time value from character value 170-171
  creating table with attribute information from another table 155-157
  determining numeric vs. character variables 152-154
  determining type of variable's content 150-151
  extracting character string without breaking text in middle of word 166-167

V

VAR statement, TRANSPOSE procedure 144-146
variables
  adding new variable containing frequency of BY-group value, example 132-133
  applying common operations to group of variables, example 124-125
  bubble sort of variable values within one observation, example 158-160
  compared with temporary arrays 34
  comparing variable values by looking ahead, example 120-121
  converting variable types from character to numeric, example 148-149, 152-153
  counting occurrences of string within variable, example 164-165
  creating SAS time value from character value, example 170-171
  determining numeric vs. character variables, example 152-154
  determining type of variable's content, example 150-151
  directly accessing data based on variable values 5
  obtaining lag value of variable within BY group, example 122-123
  renaming to avoid overwriting, example 60-61
  resetting to original values, example 49
  reshaping observations into multiple variables, example 144-146
  unexpected results during BY-group processing 99-101
VNAME call routine
  expanding single observations into multiple observations, example 142-143

W

WEEKDAY function, example 119
WHEN statement
  combining and collapsing observations by common variable, example 52
  producing equal-sized or exact-sized subsets, example 163
WHERE clause, SQL
  BY-group processing with MODIFY statement 11
  subquery 59
WHERE clause examples
  combining multiple data sets without variable common to all 92-95
  combining observations with no common variable 26
  generating every combination between data sets with index 91
  generating every combination of observations with common column 80-81
  removing observations from master data 58-59
  subsetting queries of dictionary tables 152-153, 155
  table lookups with composite index and duplicate values in data set 68
  table lookups with large indexed data set 72
  table lookups with small lookup data set 62
WHERE statement, example 108-109

Z

zs. informat, example 149

6

6. informat, example 149

8

8. informat, example 149

Special Characters

** (asterisks) representing exponentiation
?? format modifier, example 149, 151
Your Turn

If you have comments or suggestions about Combining and Modifying SAS
Data Sets: Examples, Version 6, First Edition, please send them to us on a
photocopy of this page or send us electronic mail.

For comments about this book, please return the photocopy to

SAS Institute Inc.
Publications Division
SAS Campus Drive
Cary, NC 27513
email: yourturn@unx.sas.com

For suggestions about the software, please return the photocopy to

SAS Institute Inc.
Technical Support Division
SAS Campus Drive
Cary, NC 27513
email: suggest@unx.sas.com
Additional Documentation

For a complete list of SAS publications, you should refer to the
current Publications Catalog. The catalog is produced twice a
year. You can order a free copy of the catalog by writing, calling,
or faxing the Institute (or access the online version of the
Publications Catalog via the World Wide Web):

SAS Institute Inc.
Book Sales Department
SAS Campus Drive
Cary, NC 27513
Telephone: 919-677-8000, then press 7001
Fax: 919-677-8166
E-mail: sasbook@vm.sas.com
WWW: http://www.sas.com/

Jacobs III, Charles A. (1992), "DATA Step Programming Using the MODIFY Statement," Observations, 2(1), 10-42 (order #A56305)
provides numerous examples and an in-depth explanation of the MODIFY statement.

"MODIFY and Indexes: Beyond the Basics," in "SAS Technical Tips," (1994), SAS Communications, 20(1).

SAS Programming Tips: A Guide to Efficient SAS Processing (order #A56150)
provides more than 100 tips for improving the efficiency of your SAS programs.

SAS Language and Procedures: Usage, Version 6, First Edition (order #A56075)
SAS Language and Procedures: Usage 2, Version 6, First Edition (order #A56078)
provide task-oriented examples of the major features of base SAS software.

SAS Language: Reference, Version 6, First Edition (order #A56076)
provides detailed reference information about SAS language statements, functions, formats, and informats; the SAS Display Manager System; the SAS Text Editor; and any other element of base SAS software except for procedures.

SAS Procedures Guide, Version 6, Third Edition (order #A56080)
provides detailed information about the procedures available in the SAS System.

Getting Started with the SQL Procedure, Version 6, First Edition (order #A55042)
introduces users to the SQL procedure, a base SAS procedure that implements Structured Query Language in Version 6 of the SAS System.

SAS Guide to the SQL Procedure: Usage and Reference, Version 6, First Edition (order #A56070)
fully describes the SAS System's implementation of Structured Query Language. It illustrates SQL through simple queries and provides documentation on advanced features.

SAS Software: Changes and Enhancements, Release 6.10 (order #A55120)
documents the changes and enhancements to several SAS software products for Release 6.09.

SAS Technical Report P-252, SAS Software: Changes and Enhancements, Release 6.09 (order #A59169)
documents the changes and enhancements to several SAS software products for Release 6.09.

SAS Technical Report P-242, SAS Software: Changes and Enhancements, Release 6.08 (order #A59159)
contains documentation on the MODIFY statement and provides the latest features and changes in Release 6.08 base SAS software, SAS/GRAPH software, SAS Screen Control Language, and other products.

SAS Technical Report P-222, Changes and Enhancements to Base SAS Software, Release 6.07 (order #A59139)
provides the latest features and changes in Release 6.07 of base SAS software.
, ;,---- ------------- · o•, • •··· 1 ~ 1F - - -.- - - . ·-·• -,,--.-:,- - -··-·- •1r. -·•- . ._- - - -·--~ ·
,,, ... ,.,.,,..,,""'-'W''"' ., . ..-., ..
!.,- ,,, •...•.. ~..... ,.,,,._, ....... t,.,,.-,•
•. -11:i;;~.
cjj•<•'A··, J'i.,,sAS-lnstiti:Jte
1-, . . . . . .·
ln~··•·n··.....a:, . :,.•- .,, .. ,.,. ,J':. ,-.,..:.",~:,"....:.......,.. ,.,.,.~ •.. :., ...........,.,,;,., .. ;........ }j
, _, ·, .. ......-.,:......... ,. •~.•-·····••''-'"';.... ,.-.. 1.•••,... ,- .. 1J .. ·,..... , . ,.•; .. ,,,." •. ,-.,1 ••,1~ .., .. ,~ .. ,,,.1,--: -f
i · f~ : .1. I SA~H;.amp·u,s Drive
~
.:) ' :' -· ·., .· . _: , : t\
. · ~• ® ;; Ca.ry, NC,27513 .. / ·.,. · . · t:,[
:~"!: ~~i
_ -.): <1A
"t;.i" ·.;1g
.:.;::rl
t~
::.:h1
•::,·"!4
(I
.'.)~.: .. :
.;i.,:
-
:-: .. ·
· · ■. i,se ~gfyupproc_e"ss,itig e.[fecfftJ(!.ly_. . L ( :ft·
·i.. • i·
!(·'.
·:;.
h·. ·. )'.·... ,. __
;.:
). • handling~Ja.S:iwell as:the. SOL pi:ocedµr.ei · HC ···· •,. - ..'.;, ·· · •· :l!)

f::
h·.
;_,
(
~:.
r
!; : ·., \.~
'I !I .
ISBN 1-55544-220-X


Copyright © 1995 by SAS Institute Inc., Cary, NC, USA.

Doc P4, 11JUL95
