SAS Programming 1 - SAS Programming Essentials PDF
SAS® Programming 1: Essentials
Course Notes
SAS® Programming 1: Essentials Course Notes was developed by Michele Ensor and Susan Farmer.
Additional contributions were made by Michelle Buchecker, Christine Dillon, Marty Hultgren, Marya
Ilgen-Lieth, Mike Kalt, Natalie McGowan, Linda Mitterling, Georg Morsing, Dr. Sue Rakes, Warren
Repole, and Larry Stewart. Editing and production support was provided by the Curriculum Development
and Support Department.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product
names are trademarks of their respective companies.
SAS® Programming 1: Essentials Course Notes
Copyright © 2009 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States of
America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written
permission of the publisher, SAS Institute Inc.
Book code E1516, course code LWPRG1/PRG1, prepared date 18Jun2009. LWPRG1_002
ISBN 978-1-60764-194-0
Table of Contents
Prerequisites .... xi
1.1 Course Logistics .... 1-3
1.2 An Overview of Foundation SAS .... 1-7
1.3 Chapter Review .... 1-10
1.4 Solutions .... 1-11
   Solutions to Chapter Review .... 1-11
Chapter 2 Getting Started with SAS .... 2-1
2.1 Introduction to SAS Programs .... 2-3
2.2 Submitting a SAS Program .... 2-11
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – Windows .... 2-16
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – UNIX .... 2-21
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – z/OS (OS/390) .... 2-25
   Demonstration: Submitting a SAS Program with SAS Enterprise Guide .... 2-30
   Exercises .... 2-43
   Exercises .... 3-19
3.3 Chapter Review .... 3-20
3.4 Solutions .... 3-21
   Solutions to Exercises .... 3-21
   Solutions to Student Activities (Polls/Quizzes) .... 3-23
   Solutions to Chapter Review .... 3-25
Chapter 4 Getting Familiar with SAS Data Sets .... 4-1
4.1 Examining Descriptor and Data Portions .... 4-3
   Exercises .... 4-13
4.2 Accessing SAS Data Libraries .... 4-16
   Demonstration: Accessing and Browsing SAS Data Libraries – Windows .... 4-24
   Demonstration: Accessing and Browsing SAS Data Libraries – UNIX .... 4-28
   Demonstration: Accessing and Browsing SAS Data Libraries – z/OS (OS/390) .... 4-31
   Exercises .... 4-33
   Exercises .... 5-26
   Exercises .... 5-41
5.5 Chapter Review .... 5-44
5.6 Solutions .... 5-45
   Solutions to Exercises .... 5-45
   Solutions to Student Activities (Polls/Quizzes) .... 5-50
   Solutions to Chapter Review .... 5-54
Chapter 6 Reading Excel Worksheets .... 6-1
6.1 Using Excel Data as Input .... 6-3
   Demonstration: Reading Excel Worksheets – Windows .... 6-15
6.2
   Exercises .... 6-17
   Exercises .... 6-36
   Solutions to Chapter Review .... 7-55
Chapter 8 Validating and Cleaning Data .... 8-1
8.1 Introduction to Validating and Cleaning Data .... 8-3
8.2 Examining Data Errors When Reading Raw Data Files .... 8-9
   Demonstration: Examining Data Errors .... 8-14
8.3 Validating Data with the PRINT and FREQ Procedures .... 8-19
   Exercises .... 8-30
8.4
   Exercises .... 8-38
8.5 Cleaning Invalid Data .... 8-41
   Demonstration: Using the Viewtable Window to Clean Data – Windows (Self-Study) .... 8-44
   Demonstration: Using the Viewtable Window to Clean Data – UNIX (Self-Study) .... 8-47
   Demonstration: Using the FSEDIT Window to Clean Data – z/OS (OS/390) (Self-Study) .... 8-49
   Exercises .... 8-64
   Exercises .... 9-51
9.4 Chapter Review .... 9-53
9.5 Solutions .... 9-54
   Solutions to Exercises .... 9-54
   Solutions to Student Activities (Polls/Quizzes) .... 9-62
   Solutions to Chapter Review .... 9-66
Chapter 10 Combining SAS Data Sets .... 10-1
10.1 Introduction to Combining Data Sets .... 10-3
10.2 Appending a Data Set (Self-Study) .... 10-7
   Exercises .... 10-17
   Exercises .... 10-41
   Exercises .... 11-16
   Exercises .... 11-29
11.3 Creating User-Defined Formats .... 11-32
   Exercises .... 11-43
11.4 Subsetting and Grouping Observations .... 11-46
   Exercises .... 11-52
11.5 Directing Output to External Files .... 11-55
   Demonstration: Creating HTML, PDF, and RTF Files .... 11-64
   Demonstration: Using Options with the EXCELXP Destination (Self-Study) .... 11-80
   Exercises .... 11-83
11.6 Chapter Review .... 11-88
   Solutions to Exercises .... 11-89
   Solutions to Student Activities (Polls/Quizzes) .... 11-103
   Solutions to Chapter Review .... 11-109
   Solutions to Chapter Review .... 12-78
Chapter 13 Introduction to Graphics Using SAS/GRAPH (Self-Study) .... 13-1
13.1 Introduction .... 13-3
13.2 Creating Bar and Pie Charts .... 13-9
   Demonstration: Creating Bar and Pie Charts .... 13-10
13.3 Creating Plots .... 13-21
   Demonstration: Creating Plots .... 13-22
13.4 Enhancing Output .... 13-25
   Demonstration: Enhancing Output .... 13-26
Chapter 14
14.2 Beyond This Course .... 14-6
Course Description
This course is for users who want to learn how to write SAS programs. It is the entry point to learning
SAS programming and is a prerequisite to many other SAS courses. If you do not plan to write SAS
programs and you prefer a point-and-click interface, you should attend the SAS® Enterprise Guide® 1:
Querying and Reporting course.
To learn more…
A full curriculum of general and statistical instructor-based training is available at any of the Institute's training facilities. Institute instructors can also provide on-site training.
For information on other courses in the curriculum, contact the SAS Education Division at 1-800-333-7660, or send e-mail to training@sas.com. You can also find this information on the Web at support.sas.com/training/ as well as in the Training Course Catalog.
For a list of other SAS books that relate to the topics covered in this Course Notes, USA customers can contact our SAS Publishing Department at 1-800-727-3228 or send e-mail to sasbook@sas.com. Customers outside the
Also, see the Publications Catalog on the Web at support.sas.com/pubs for a
Prerequisites
Before attending this course, you should have experience using computer software. Specifically, you
should be able to
• understand file structures and system commands on your operating systems
• access data files on your operating systems.
No prior SAS experience is needed. If you do not feel comfortable with the prerequisites or are new to
programming and think that the pace of this course might be too demanding, you can take the
Introduction to Programming Concepts Using SAS® Software course before attending this course.
Introduction to Programming Concepts Using SAS® Software is designed to introduce you to computer
programming and presents a portion of the SAS® Programming 1: Essentials material at a slower pace.
Chapter 1 Introduction
1.3 Chapter Review .... 1-10
1.4 Solutions .... 1-11
   Solutions to Chapter Review .... 1-11
1.1 Course Logistics 1-3
Objectives
Explain the naming convention that is used for the course files.
Compare the three levels of exercises that are used in the course.
Describe at a high level how data is used and stored at Orion Star Sports & Outdoors.
Navigate to the Help facility.
Filename Conventions
p104d01x
   p1 = course ID
   04 = chapter #
   d  = type
   01 = item #
   x  = placeholder

Code   Type
a      Activity
d      Demo
e      Exercise
s      Solution

Example: The Programming 1 course ID is p1, so p104d01 = Programming 1 Chapter 4, Demo 1.

Sample filenames: p104a01, p104a02, p104a02s, p104d01, p104d02, p104e01, p104e02, p104s01, p104s02
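The naming convention can be illustrated with a short SAS sketch that splits a course filename into its parts. This is only an illustration under stated assumptions: the data set name Work.FileParts and the variable names are hypothetical, and the SUBSTR and INPUT functions used here go beyond what this chapter covers.

```sas
/* Illustrative sketch: decode course filenames such as p104d01.       */
/* Work.FileParts and all variable names are hypothetical.             */
data work.FileParts;
   length FileName $ 8 Course_ID $ 2 Type_Code $ 1 Type $ 8;
   input FileName $;
   Course_ID = substr(FileName, 1, 2);              /* p1 = Programming 1 */
   Chapter   = input(substr(FileName, 3, 2), 2.);   /* chapter number     */
   Type_Code = substr(FileName, 5, 1);              /* a, d, e, or s      */
   Item      = input(substr(FileName, 6, 2), 2.);   /* item number        */
   select (Type_Code);
      when ('a') Type = 'Activity';
      when ('d') Type = 'Demo';
      when ('e') Type = 'Exercise';
      when ('s') Type = 'Solution';
      otherwise  Type = 'Unknown';
   end;
   datalines;
p104a02
p104d01
p104e02
p104s01
;
run;

proc print data=work.FileParts noobs;
run;
```

Any trailing placeholder character after the item number is read into FileName but is not interpreted by this sketch.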
Level 3 Only the task you are to perform or the results to be obtained are provided. Typically, you will need to use the Help facility.
You are not expected to complete all of the exercises in the time allotted. Choose the exercise or exercises that are at the level you are most comfortable with.
Orion Star Sports & Outdoors
Orion Star Sports & Outdoors is a fictitious global sports and outdoors retailer with traditional stores, an online store, and a large catalog business.
The corporate headquarters is located in the United States with offices and stores in many countries throughout the world.
Orion Star has about 1,000 employees and 90,000 customers, processes approximately 150,000 orders annually, and purchases products from 64 suppliers.
warehouse.
Data marts were created to meet the needs of specific departments such as Marketing.
The SAS Help Facility
The Help facility can also be accessed from a Web browser at the following link:
http://support.sas.com/documentation/index.html
1.01 Poll
Were you able to open the Help facility in your SAS session?
Yes
No
1.2 An Overview of Foundation SAS 1-7
Objectives
Describe the structure and design of Foundation SAS.
Describe the functionality of Foundation SAS.
What Is Foundation SAS?
Foundation SAS is a highly flexible and integrated software environment that can be used in virtually any setting to access, manipulate, manage, store, analyze, and report on data.
• the access to virtually any data source such as DB2, Oracle, SYBASE, Teradata, SAP, and Microsoft Excel
• the support for most widely used character encodings for globalization
What Is Foundation SAS?
At the core of Foundation SAS is Base SAS software.
Components of Foundation SAS
• Base SAS (at the core)
• Reporting and Graphics
• Data Access and Management
• User Interfaces
• Analytics
• Visualization and Discovery
• Business Solutions
• Application Development
• Web Enablement
Base SAS capabilities can be extended with additional components.
1.02 Poll
Are you currently using SAS?
Yes
No
Chapter Review
1. How can you open the Help facility?
1.4 Solutions
1. How can you open the Help facility?
The Help facility can be opened within your SAS session from Help > SAS Help and Documentation. The Help facility can also be accessed from a Web browser at http://support.sas.com/documentation/index.html.
2. What is at the core of Foundation SAS?
Base SAS is at the core of Foundation SAS.
Chapter 2 Getting Started with SAS
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – Windows .... 2-16
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – UNIX .... 2-21
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – z/OS (OS/390) .... 2-25
   Demonstration: Submitting a SAS Program with SAS Enterprise Guide .... 2-30
   Exercises .... 2-43
2.3 Chapter Review .... 2-45
2.4 Solutions .... 2-46
   Solutions to Exercises .... 2-46
   Solutions to Chapter Review .... 2-50
2.1 Introduction to SAS Programs 2-3
Objectives
List the components of a SAS program.
State the modes in which you can run a SAS program.
SAS Programs
A SAS program is a sequence of steps that the user submits for execution.
Raw Data / SAS Data Set → DATA Step → SAS Data Set → PROC Step → Report
DATA steps are typically used to create SAS data sets.
PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, manage data, and sort data).
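The flow above can be sketched with a small program that contains one DATA step and two PROC steps. This is a minimal sketch, assuming the Sashelp.Class sample table that ships with SAS is available in your session; the output data set name Work.Teens is hypothetical.

```sas
/* DATA step: create a new SAS data set from an existing one.   */
/* Assumes the Sashelp.Class sample table is available.         */
data work.Teens;
   set sashelp.class;
   where Age >= 13;
run;

/* PROC step: manage data by sorting the data set. */
proc sort data=work.Teens;
   by Age;
run;

/* PROC step: generate a listing report. */
proc print data=work.Teens;
run;
```

Each step ends with a RUN statement, so the program has three steps in total.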
2.01 Quiz
How many steps are in this program?
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;
p102d01
SAS Program Example
r b S
p t
This DATA step creates a temporary SAS data set named
o dis SA
Work.NewSalesEmps by reading four fields from a
f
r
data work.NewSalesEmps;
S f o o
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;
e
infile 'newemps.csv' dlm=',';
A
S No tsit
input First_Name $ Last_Name $
run;
d
Job_Title $ Salary;
ou
proc means data=work.NewSalesEmps;
class Job_Title;
var Salary;
run;
8
The raw data filename specified in the INFILE statement needs to be specific to your operating
environment.
Examples of raw data filenames:
Windows s:\workshop\newemps.csv
UNIX /users/userid/newemps.csv
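One way to keep the environment-specific path in a single place is to assign it to a fileref with a FILENAME statement and reference the fileref in the INFILE statement. The sketch below is an assumption about how you might organize the program, not part of the course files; the fileref name rawemps is hypothetical, and the paths are the sample locations listed above.

```sas
/* Hypothetical fileref: keep only the line that matches your environment. */
filename rawemps 's:\workshop\newemps.csv';          /* Windows example */
/* filename rawemps '/users/userid/newemps.csv'; */  /* UNIX example    */

data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile rawemps dlm=',';   /* the fileref replaces the quoted filename */
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;
```

With this layout, moving the program to another operating environment means editing only the FILENAME statement.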
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;
SAS Program Example
This PROC MEANS step creates a summary report of the Work.NewSalesEmps data set with statistics for the variable Salary for each value of Job_Title.
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;
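If newemps.csv is not at hand, the same two steps can be tried with a few in-stream records. The following is a sketch under stated assumptions: the data set name Work.SampleEmps is hypothetical, and the six records are taken from the partial PROC PRINT output shown in these notes, so the statistics will not match the full report.

```sas
/* Sketch: read comma-delimited in-stream data instead of newemps.csv. */
/* Work.SampleEmps is a hypothetical name; only six records are used.  */
data work.SampleEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile datalines dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
   datalines;
Shani,Duckett,Sales Rep. I,25795
Fang,Wilson,Sales Rep. II,26810
Michael,Minas,Sales Rep. I,26970
Amanda,Liebman,Sales Rep. II,27465
Vincent,Eastley,Sales Rep. III,29695
Viney,Barbis,Sales Rep. III,30265
;
run;

/* Summary statistics for Salary within each value of Job_Title. */
proc means data=work.SampleEmps;
   class Job_Title;
   var Salary;
run;
```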
Step Boundaries
SAS steps begin with either of the following:
• a DATA statement
• a PROC statement
SAS detects the end of a step when it encounters one of the following:
• a RUN statement (for most steps)
• a QUIT statement (for some procedures)
• the beginning of another step (DATA statement or PROC statement)
A SAS program executed in batch or noninteractive mode might not require any RUN statements
to execute successfully. However, this practice is not recommended.
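The boundary rules can be seen in a short annotated program. This is an illustrative sketch only; the data set Work.Demo and the variable x are hypothetical, and PROC SQL is used simply as one example of a procedure that ends with QUIT.

```sas
data work.Demo;               /* DATA statement: a step begins            */
   x = 1;
run;                          /* RUN statement: the DATA step ends        */

proc print data=work.Demo;    /* PROC statement: a new step begins        */
                              /* no RUN statement here ...                */
proc means data=work.Demo;    /* ... so this PROC statement ends the      */
   var x;                     /* PROC PRINT step implicitly               */
run;                          /* RUN statement: the PROC MEANS step ends  */

proc sql;                     /* some procedures end with QUIT            */
   select x from work.Demo;
quit;                         /* QUIT statement: the PROC SQL step ends   */
```

Adding an explicit RUN statement after PROC PRINT would make the boundary visible and is the recommended practice.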
Step Boundaries
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
2.02 Quiz
How does SAS detect the end of the PROC MEANS step?
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;
Step Boundaries
SAS detects the end of the PROC MEANS step when it encounters the RUN statement.
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;
SAS Windowing Environment
In the SAS windowing environment,
log, output, and Help facility, and more.
SAS Enterprise Guide provides a menu-driven system for creating and running your SAS programs.
Batch Mode
Batch mode is a method of running SAS programs in which you prepare a file that contains SAS statements and submit the file to the operating system.
Partial z/OS (OS/390) Example:
//jobname JOB accounting info,name …
// EXEC SAS
//SYSIN DD *
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile '.workshop.rawdata(newemps)' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;
Appropriate JCL is placed before SAS statements.
Noninteractive Mode
In noninteractive mode, SAS program statements are
stored in an external file and are executed immediately
after you issue a SAS command referencing the file.
Directory-based Example:
SAS filename
z/OS (OS/390) Example:
SAS INPUT(filename)
The command for invoking SAS at your site might be different from the default shown above. Ask your SAS administrator for the command to invoke SAS at your site.
2.03 Multiple Answer Poll
Which mode(s) will you use for running SAS programs?
a. SAS windowing environment
b. SAS Enterprise Guide
c. batch mode
d. noninteractive mode
e. other
f. unknown
2.2 Submitting a SAS Program 2-11
Objectives
Include a SAS program in your session.
Submit a program and browse the results.
Navigate the SAS windowing environment.
Navigate SAS Enterprise Guide.
SAS Windowing Environment
The SAS windowing environment is used to write, include, and submit SAS programs and examine SAS results.
processing of the SAS program, including any warning and error messages.
SAS program.
Editor Windows
Enhanced Editor
• Only available in the Windows operating environment
• Default editor for Windows operating environment
• Multiple instances of the editor can be open at one time
• Code does not disappear after it is submitted
• Incorporates color-coding as you type
Program Editor
• Available in all operating environments
• Default editor for all operating environments except Windows
• Only one instance of the editor can be open at one time
• Code disappears after it is submitted
• Incorporates color-coding after you press ENTER
Log Window
Partial SAS Log
33   data work.NewSalesEmps;
34      length First_Name $ 12 Last_Name $ 18
35             Job_Title $ 25;
36      infile 'newemps.csv' dlm=',';
37      input First_Name $ Last_Name $
38            Job_Title $ Salary;
39   run;

NOTE: The infile 'newemps.csv' is:
      File Name=S:\Workshop\newemps.csv,
      RECFM=V,LRECL=256

NOTE: The data set WORK.NEWSALESEMPS has 71 observations and 4 variables.

40
41   proc print data=work.NewSalesEmps;
42   run;

NOTE: There were 71 observations read from the data set WORK.NEWSALESEMPS.
Output Window
Partial PROC PRINT Output
Obs   First_Name   Last_Name   Job_Title        Salary
  6   Shani        Duckett     Sales Rep. I      25795
  7   Fang         Wilson      Sales Rep. II     26810
  8   Michael      Minas       Sales Rep. I      26970
  9   Amanda       Liebman     Sales Rep. II     27465
 10   Vincent      Eastley     Sales Rep. III    29695
 11   Viney        Barbis      Sales Rep. III    30265
 12   Skev         Rusli       Sales Rep. II     26580
 13   Narelle      James       Sales Rep. III    29990
 14   Gerry        Snellings   Sales Rep. I      26445
 15   Leonid       Karavdic    Sales Rep. II     27860
Output Window
PROC MEANS Output
The MEANS Procedure

                   N
Job_Title        Obs    N        Mean        Std Dev     Minimum     Maximum
-----------------------------------------------------------------------------
Sales Rep. I      21   21    26418.81    713.1898498    25275.00    27475.00
Sales Rep. II      9    9    26902.22    592.9487283    26080.00    27860.00
Sales Rep. III    11   11    29345.91    989.4311956    28025.00    30785.00
Sales Rep. IV      6    6    31215.00    545.4997709    30305.00    31865.00
-----------------------------------------------------------------------------
d. other
e. unknown
Demonstration: Submitting a SAS Program with SAS Windowing Environment – Windows
• Start a SAS session.
• Include and submit a SAS program.
• Examine the results.
• Use the Help facility.
Starting a SAS Session
1. Double-click the SAS icon to start your SAS session.
The method that you use to invoke SAS varies by your operating environment and any
customizations in effect at your site.
1. To open a SAS program into your SAS session, select File > Open Program or click the Open icon on the toolbar, and then select the file that you want to include. To open a program, your Enhanced Editor must be active.
You can also issue the INCLUDE command to open (include) a program into your SAS session.
a. With the Enhanced Editor active, on the command bar type include and the name of the file containing the program.
b. Press ENTER.
The program is included in the Enhanced Editor.
You can use the Enhanced Editor to do the following:
• access and edit existing SAS programs
• write new SAS programs
• submit SAS programs
• save SAS programs to a file
In the Enhanced Editor, the syntax in your program is color-coded to show these items:
• step boundaries
• keywords
• variable and data set names
2. To submit the program for execution, issue the SUBMIT command, click , or select Run
Submit. The output from the program is displayed in the Output window.
2-18 Chapter 2 Getting Started with SAS
To scroll horizontally in the Output window, use the horizontal scroll bar or issue the RIGHT
ia l
and LEFT commands.
t e r
In the Windows environment, the Output window displays the last page of output generated by the
submitted program.
M a
r y i o n
t a u t
r i e r i b S
p
o dis SA t
P r f
S f o r o
A
S No tsit d e
To scroll vertically in the Output window, use the vertical scroll bar, issue the FORWARD and
BACKWARD commands, or use the PAGE UP or PAGE DOWN keys on the keyboard.
You also can use the TOP and BOTTOM commands to scroll vertically in the Output window.
1. Scroll to the top to view the output from the PRINT procedure.
2. To open the Log window and browse the messages that the program generated, issue the
LOG command, select Window > Log, or click the Log window.
The Log window
• is one of the primary windows and is open by default
• acts as an audit trail of your SAS session; messages are written to the log in the order in which
they are generated by the program.
3. To clear the contents of the window, issue the CLEAR command, select Edit > Clear All, or
click the New icon.
The Log window contains the programming statements that are submitted, as well as notes about the
following:
• any files that were read
• the records that were read
• the program execution and results
In this example, the Log window contains no warning or error messages. If the program contains
errors, relevant warning and error messages are also written to the SAS log.
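As a small illustration of how messages reach the log, the following sketch writes one line of its own to the SAS log. It assumes only that the PUTLOG statement is available in your release; the message text is made up for this example and is not part of the course program.

```sas
/* A minimal sketch: _NULL_ means that no data set is created. */
data _null_;
   /* PUTLOG writes the quoted text to the SAS log. */
   putlog 'NOTE: This line was written by the PUTLOG statement.';
run;
```

After you submit the step, the statements and the message appear in the Log window in the order in which they were generated.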
1. To open the Help facility, select Help > SAS Help and Documentation or click the Help icon.
The primary Base SAS syntax books are the Base SAS 9.2 Procedures Guide and SAS 9.2 Language
Reference: Dictionary. The SAS 9.2 Language Reference: Concepts and Step-by-Step Programming
with Base SAS Software are recommended to learn SAS concepts.
4. For example, select Base SAS 9.2 Procedures Guide > Procedures > The PRINT Procedure to
find the documentation for the PRINT procedure.
• Start a SAS session.
• Include and submit a SAS program.
• Examine the results.
• Use the Help facility.
Starting a SAS Session
1. In your UNIX session, type the appropriate command to start a SAS session.
The method that you use to invoke SAS varies by your operating environment and any
customizations in effect at your site.
1. To open a SAS program in your SAS session, select File > Open or click the Open icon, and then
select the file that you want to include. To open a program, your Program Editor must be active.
You can also issue the INCLUDE command to open (include) a SAS program into your SAS session.
a. With the Program Editor active, on the command bar type include and the name of the file
containing the program.
b. Press ENTER.
The program is included in the Program Editor window.
You can use the Program Editor window to do the following:
• access and edit existing SAS programs
• write new SAS programs
• submit SAS programs
• save SAS programs to a file
Within the Program Editor, the syntax in your program is color-coded to show these items:
• step boundaries
• keywords
• variable and data set names
2. To submit the program for execution, issue the SUBMIT command, click the Submit icon, or select
Run > Submit. The output from the program is displayed in the Output window.
To clear the contents of the window, issue the CLEAR command, select Edit > Clear All, or click
the New icon.
To scroll horizontally in the Output window, use the horizontal scroll bar or issue the RIGHT and LEFT
commands.
To scroll vertically within the Output window, use the vertical scroll bar or issue the FORWARD
and BACKWARD commands.
You also can use the TOP and BOTTOM commands to scroll vertically in the Output window.
1. Scroll to the top to view the output from the PRINT procedure.
2. To open the Log window and browse the messages that the program generated, issue the
LOG command or select View > Log.
The Log window
• is one of the primary windows and is open by default
• acts as a record of your SAS session; messages are written to the log in the order in which
they are generated by the program.
3. To clear the contents of the window, issue the CLEAR command, select Edit > Clear All,
or click the New icon.
The Log window contains the programming statements that were most recently submitted, as
well as notes about the following:
• any files that were read
• the records that were read
• the program execution and results
In this example, the Log window contains no warning or error messages. If your program contains
errors, relevant warning and error messages are also written to the SAS log.
4. Issue the END command or select View > Program Editor to return to the Program Editor window.
Using the Help Facility
1. To open the Help facility, select Help > SAS Help and Documentation or click the Help icon.
2. Select the Contents tab.
3. From the Contents tab, select SAS Products > Base SAS.
The primary Base SAS syntax books are the Base SAS 9.2 Procedures Guide and SAS 9.2 Language
Reference: Dictionary. The SAS 9.2 Language Reference: Concepts and Step-by-Step Programming
with Base SAS Software are recommended to learn SAS concepts.
4. For example, select Base SAS 9.2 Procedures Guide > Procedures > The PRINT Procedure to
find the documentation for the PRINT procedure.
The Help facility can also be accessed from a Web browser at the following link:
http://support.sas.com/documentation/index.html
From this Web page, Base SAS can be selected. The SAS syntax books are available in
HTML or PDF version.
• Start a SAS session.
• Include and submit a SAS program.
• Examine the results.
• Use the Help facility.
Starting a SAS Session
1. Type the appropriate command to start your SAS session.
The method that you use to invoke SAS varies by your operating environment and any
customizations in effect at your site.
1. To include (copy) a SAS program into your SAS session, issue the INCLUDE command.
a. Type include and the name of the file that contains your program on the command line
of the Program Editor.
b. Press ENTER.
The program is included in the Program Editor.
You can use the Program Editor to do the following:
• access and edit existing SAS programs
• write new SAS programs
• submit SAS programs
• save programming statements in a file
The program contains three steps: a DATA step and two PROC steps.
Issue the SUBMIT command to execute your program.
2. The first page of the output from your program is displayed in the Output window.
Examining the Results
The Output window
• is one of the primary windows and is open by default
• becomes the active window each time that it receives output
• automatically accumulates output in the order in which it is generated.
You can issue the CLEAR command or select Edit > Clear All to clear the contents of the window.
To scroll horizontally in the Output window, issue the RIGHT and LEFT commands.
To scroll vertically in the Output window, issue the FORWARD and BACKWARD commands.
You also can use the TOP and BOTTOM commands to scroll vertically within the Output
window.
1. Issue the END command. If the PRINT procedure produces more than one page of output, you
are taken to the last page of output. If the PRINT procedure produces only one page of output,
the END command enables the MEANS procedure to execute and produce its output.
You can issue an AUTOSCROLL 0 command on the command line of the Output window to
have all of your SAS output from one submission placed in the Output window at one time.
This eliminates the need to issue an END command to run each step separately.
The AUTOSCROLL command is in effect for the duration of your SAS session. If you want
this every time that you invoke SAS, you can save this setting by typing autoscroll 0;
wsave on the command line of the Output window.
2. Issue the END command to return to the Program Editor.
After the program executes, you can view messages in the Log window.
The Log window
• is one of the primary windows and is open by default.
• acts as a record of your SAS session; messages are written to the log in the order in which they are
generated by the program.
You can issue the CLEAR command to clear the contents of the window.
The Log window contains the programming statements that were recently submitted, as well as notes
about the following:
• any files that were read
• the records that were read
• the program execution and results
In this example, the Log window contains no warning or error messages. If your program contains
errors, relevant warning and error messages are also written to the SAS log.
1. To open the Help facility, select Help > SAS Help and Documentation or click the Help icon.
2. Select the Contents tab.
3. From the Contents tab, select SAS Products > Base SAS.
The primary Base SAS syntax books are the Base SAS 9.2 Procedures Guide and SAS 9.2 Language
Reference: Dictionary. The SAS 9.2 Language Reference: Concepts and Step-by-Step Programming
with Base SAS Software are recommended to learn SAS concepts.
4. For example, select Base SAS 9.2 Procedures Guide > Procedures > The PRINT Procedure to
find the documentation for the PRINT procedure.
The Help facility can also be accessed from a Web browser at the following link:
http://support.sas.com/documentation/index.html
From this Web page, Base SAS can be selected. The SAS syntax books are available in
HTML or PDF version.
• Manage a project (optional).
• Use the Help facility.
Starting SAS Enterprise Guide
1. Double-click the Enterprise Guide icon to start your SAS session.
The method that you use to invoke Enterprise Guide varies by any customizations in effect at
your site. This demo is based on Enterprise Guide 4.2.
1. To open a SAS program in Enterprise Guide, select File > Open > Program, or click the Open icon
and select Program, and then select the file that you want to include.
The program is included in the Program tab of the workspace area.
You can use the Program tab to do the following:
• access and edit existing SAS programs
• write new SAS programs
• submit SAS programs
• save SAS programs to a file
In the Program tab, the syntax in your program is color-coded to show these items:
• step boundaries
• keywords
• variable and data set names
3. Modify the INFILE statement in the Program tab to include the path location of the CSV file.
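The change to the INFILE statement might look like the following sketch. The folder C:\workshop is a hypothetical location used only for illustration; substitute the path where newemps.csv is stored at your site.

```sas
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   /* Hypothetical path: replace C:\workshop with your own folder. */
   infile 'C:\workshop\newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;
```

Only the quoted file specification changes. The DLM= option and the other statements in the step stay the same.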
4. To submit the program for execution, select Program > Run On Local, click the Run icon in the
Program tab, or press the F8 key. The output from the program is displayed in the Results tab of the
workspace area.
To scroll vertically in the Results tab, use the vertical scroll bar or use the PAGE UP or PAGE DOWN
keys on the keyboard.
By default, the result format is set to SAS Report (an XML file specific to SAS) in SAS Enterprise Guide
4.2.
1. To change the result format, select Tools > Options > Results > Results General.
2. Select the desired result formats, such as HTML and Text Output, and select OK.
3. Resubmit the program.
5. Select each Results tab to view the different result formats.
The Log tab contains the statements specific to SAS Enterprise Guide and the programming statements
that are submitted as well as notes about the following:
• any files that were read
• the records that were read
• the program execution and results.
In this example, the Log tab contains no warning or error messages. If the program contains errors,
relevant warning and error messages are also written to the SAS log.
To scroll horizontally in the Log tab, use the horizontal scroll bar.
To scroll vertically in the Log tab, use the vertical scroll bar or use the PAGE UP or PAGE DOWN keys
on the keyboard.
The Project Tree window displays the active project and its associated programs. SAS Enterprise Guide
uses projects to manage each collection of related data, tasks, code, and results.
Multiple programs can be added to one project.
1. To add a new program to the existing project, select New Program.
2. Enter a PROC FREQ step on the Program tab.
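The notes do not show the PROC FREQ step that is entered here. As one possible sketch, the following step counts the rows for each job title, assuming that the work.NewSalesEmps data set from the earlier program still exists in the session.

```sas
/* A minimal sketch: one-way frequency table of Job_Title. */
proc freq data=work.NewSalesEmps;
   tables Job_Title;
run;
```

The choice of Job_Title as the table variable is an assumption made for illustration; any variable in the data set can be listed in the TABLES statement.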
4. To save the program, right-click on Program in the Project Tree and select Save Program As….
Then, supply a location and name for the program.
5. To save the project, select File > Save Project As…. Then, supply a location and name for the
project.
6. To maneuver between programs, double-click the desired program in the Project Tree.
7. To delete a program, right-click on the program in the Project Tree and select Delete.
You will need to select Yes to delete all the items associated with the program.
1. To open the Help facility for SAS Enterprise Guide, select Help > SAS Enterprise Guide Help.
2. To open the Help facility for SAS Syntax, select Help > SAS Syntax Help.
3. Select the Contents tab.
5. For example, select Base SAS 9.2 Procedures Guide > Procedures > The PRINT Procedure to
find the documentation for the PRINT procedure.
Exercises
Level 1
a. With the appropriate Editor window active, include a SAS program.
Windows: Select File > Open Program and select the p102e01.sas program.
UNIX: Select File > Open and select the p102e01.sas program.
z/OS (OS/390): Issue the command: include '.workshop.sascode(p102e01)'.
b. Submit the program for execution. Based on the report in the Output window, how many rows
and columns are in the report?
rows: __________ columns: __________
c. Examine the Log window. Based on the log notes, how many observations and variables are in the
Work.country data set?
d. Clear the Log and Output windows.
e. Use the Help facility to find documentation about the LINESIZE= option.
Go to the Contents tab in the SAS Help and Documentation. Select SAS Products >
Base SAS > SAS 9.2 Language Reference: Dictionary >
Dictionary of Language Elements > SAS System Options > LINESIZE= System Option.
What is an alias for the LINESIZE= system option? ____________________
Level 2
The SETINIT procedure produces a list of the SAS components licensed at a given site.
c. If you see SAS/GRAPH in the list of components in the log, include a SAS program.
Windows: Select File > Open Program and select the p102e02.sas program.
d. Submit the program for execution. View the results in the GRAPH window.
e. Close the GRAPH window.
3. Setting Up Function Keys
a. Issue the KEYS command or select Tools > Options > Keys to open the KEYS window.
The KEYS window is a secondary window used to browse or change function key
definitions.
b. Add the following commands to the F12 key:
clear log; clear output
c. Close the KEYS window.
d. Press the F12 key and confirm that the Log and Output windows are cleared.
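The same two commands can also be issued from program code instead of a function key. The sketch below assumes the DM statement, which passes windowing environment commands from a submitted program; it is an alternative to the exercise, not part of it, and it has an effect only in an interactive windowing session.

```sas
/* A minimal sketch: issue windowing commands from code. */
/* Clears the Log and Output windows, like the F12 key above. */
dm 'clear log; clear output';
```

Placing this statement at the top of a program is one way to start each submission with empty windows.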
Level 3
4. Exploring Your SAS Environment – Windows
a. Customize the appearance and functionality of the Enhanced Editor by selecting Tools >
Options > Enhanced Editor. For example, select the Appearance tab to modify the
font size.
b. In the Help facility, look up the documentation for the Enhanced Editor.
From the Contents tab, select Using SAS Software in Your Operating Environment >
SAS 9.2 Companion for Windows > Running SAS under Windows >
Using the SAS Editors > Using the Enhanced Editor.
5. Exploring Your SAS Environment – UNIX and z/OS (OS/390)
a. From a Web browser, access the following link: http://support.sas.com/documentation/.
b. Select Base SAS.
c. Select the HTML version of Step-by-Step Programming with Base SAS Software.
d. On the Contents tab, select Understanding Your SAS Environment >
Using the SAS Windowing Environment > Working with SAS Programs.
e. Refer to Command Line Commands and the Editor and Line Commands and the Editor.
2.3 Chapter Review 2-45
Chapter Review
1. What are the two components of a SAS program?
3. How can you include a program in the SAS windowing
environment?
4. How can you submit a program in the SAS windowing
environment?
5. What are the three primary windows in the SAS
windowing environment?
2.4 Solutions
Solutions to Exercises
1. Submitting a Program and Using the Help Facility
a. Include a SAS program.
options linesize=95 pagesize=52;

data work.country;
   length Country_Code $ 2 Country_Name $ 48;
   input Country_Code $ Country_Name $;
run;

proc print data=work.country;
run;
b. Submit the program.
rows: 238
columns: 3
c. Examine the Log window.
observations: 238 variables: 2
e. Use the Help facility.
What is an alias for the LINESIZE= system option? LS=
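As a brief sketch of the alias in use, the OPTIONS statement from this exercise can be written with short forms. PS= is used here as the short form of PAGESIZE=, and the PROC OPTIONS step is added only to display the current setting in the log.

```sas
/* Same settings as: options linesize=95 pagesize=52; */
options ls=95 ps=52;

/* Writes the current LINESIZE= value to the SAS log. */
proc options option=linesize;
run;
```

Both spellings set the same system options, so either form can appear in a program.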
2. Identifying SAS Components
a. Type the following SAS program:
proc setinit;
run;
b. Submit the program.
Partial SAS Log
Product expiration dates:
---Base Product 31DEC2008
---SAS/STAT 31DEC2008
---SAS/GRAPH 31DEC2008
---SAS/ETS 31DEC2008
goptions reset=all;
proc gchart data=work.SalesEmps;
   vbar3d Job_Title / sumvar=Salary type=mean;
   hbar Job_Title / group=Gender sumvar=Salary
                    patternid=midpoint;
   pie3d Job_Title / noheading;
   where Job_Title contains 'Sales Rep';
   label Job_Title='Job Title';
   format Salary dollar12.;
   title 'Orion Star Sales Employees';
run;
quit;
d. Submit the program.
e. Close the GRAPH window.
3. Setting Up Function Keys
a. Issue the KEYS command.
b. Add a command to the F12 key.
Keys Window (Windows):
5. Exploring Your SAS Environment – UNIX and z/OS (OS/390)
a. From a Web browser, access the following link: http://support.sas.com/documentation/.
b. Select Base SAS.
c. Select the HTML version of Step-by-Step Programming with Base SAS Software.
d. On the Contents tab, select Understanding Your SAS Environment >
Using the SAS Windowing Environment > Working with SAS Programs.
e. Refer to Command Line Commands and the Editor and Line Commands and the Editor.
Partial Documentation
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;            PROC Step
run;

proc means data=work.NewSalesEmps;            PROC Step
   class Job_Title;
   var Salary;
run;
3 steps
p102d01
2.02 Quiz – Correct Answer
How does SAS detect the end of the PROC MEANS step?
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;
proc print data=work.NewSalesEmps;
SAS does not detect the end of the PROC MEANS step.
SAS needs a RUN statement to detect the end.
Chapter Review Answers
Batch, noninteractive, and interactive modes
3. How can you include a program in the SAS windowing
environment?
INCLUDE command, File > Open, or the Open icon
4. How can you submit a program in the SAS windowing
environment?
SUBMIT command, Run > Submit, or the Submit icon
5. What are the three primary windows in the SAS
windowing environment?
LOG, OUTPUT, and EDITOR windows
Chapter 3 Working with SAS Syntax
Demonstration: Diagnosing and Correcting Syntax Errors ........ 3-12
Demonstration: Diagnosing and Correcting Syntax Errors ........ 3-15
3.3 Chapter Review ........ 3-20
3.4 Solutions ........ 3-21
Solutions to Exercises ........ 3-21
Solutions to Student Activities (Polls/Quizzes) ........ 3-23
Solutions to Chapter Review ........ 3-25
3.1 Mastering Fundamental Concepts 3-3
Objectives
Identify the characteristics of SAS statements.
Explain SAS syntax rules.
Insert SAS comments using two methods.
SAS Programs
A SAS program is a sequence of steps.
data work.NewSalesEmps;                       DATA Step
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;            PROC Step
run;

proc means data=work.NewSalesEmps;            PROC Step
   class Job_Title;
   var Salary;
run;
Statements
SAS statements have these characteristics:
usually begin with an identifying keyword
always end with a semicolon
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;
p103d01
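Statements usually, but not always, begin with a keyword. The assignment statement is a common exception, as the following sketch suggests. The data set work.Bonus, the variable Bonus, and the 10 percent rate are hypothetical names and values chosen for illustration.

```sas
data work.Bonus;                 /* begins with the keyword DATA */
   set work.NewSalesEmps;        /* begins with the keyword SET  */
   Bonus = Salary * 0.10;        /* assignment: no keyword       */
run;                             /* begins with the keyword RUN  */
```

Each of the four statements still ends with a semicolon, which is the rule that has no exceptions.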
3.01 Quiz
How many statements are in the DATA step?
a. 1
b. 3
c. 5
d. 7
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;
SAS programming statements are easier to read if you begin DATA, PROC, and RUN statements
in column one and indent the other statements.
SAS Syntax Rules
SAS statements are free-format.
• One or more blanks or special characters can be used to separate words.
• They can begin and end in any column.
• A single statement can span multiple lines.
• Several statements can be on the same line.
Unconventional Formatting
data work.NewSalesEmps;
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;
infile 'newemps.csv' dlm=',';
input First_Name $ Last_Name $
Job_Title $ Salary;
run;
proc print data=work.NewSalesEmps; run;
proc means data =work.NewSalesEmps;
class Job_Title; var Salary;run;
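To see that layout does not change the result, consider the following sketch, which writes the PROC MEANS step from the course program twice. Both versions contain the same four statements and are expected to produce the same report.

```sas
/* All four statements on one line. */
proc means data=work.NewSalesEmps; class Job_Title; var Salary; run;

/* The same step, one statement per line and indented. */
proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;
```

The second layout is the conventional one because the step boundaries and the statements inside the step are easy to pick out.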
SAS Comments
SAS comments are text that SAS ignores during
processing. You can use comments anywhere in
a SAS program to document the purpose of the
program, explain segments of the program, or
mark SAS code as non-executing text.
/* comment */
* comment ;
Avoid placing the /* comment symbols in columns 1 and 2. On some operating environments,
SAS might interpret these symbols as a request to end the SAS job or session.
SAS Comments
This program contains four comments.
*----------------------------------------*
| This program creates and uses the      |
| data set called work.NewSalesEmps.     |
*----------------------------------------*;
data work.NewSalesEmps;
   length First_Name $ 12 Last_Name $ 18
          Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary /*numeric*/;
run;
/*
proc print data=work.NewSalesEmps;
run;
*/
proc means data=work.NewSalesEmps;
   *class Job_Title;
   var Salary;
run;
p103d02
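The two comment styles behave differently around semicolons, which the following sketch tries to make visible. It assumes that work.NewSalesEmps exists; the VAR statement and the comment wording are illustrative only.

```sas
   /* Block comment: it can contain semicolons; and it can span
      several lines. It is indented so that the opening symbols
      do not fall in columns 1 and 2. */
proc print data=work.NewSalesEmps;
   * Statement comment that ends at the first semicolon;
   var First_Name Last_Name;   /* block comment after a statement */
run;
```

Because a statement comment ends at the first semicolon, the block style is usually the safer choice for disabling code that itself contains semicolons.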
3.02 Multiple Choice Poll
Which statement is true concerning the DATALINES
statement based on reading the comment?
a. The DATALINES statement is used when reading
data located in a raw data file.
b. The DATALINES statement is used when reading
data located directly in the program.
Objectives
Identify SAS syntax errors.
Diagnose and correct a program with errors.
Save the corrected program.
3.2 Diagnosing and Correcting Syntax Errors 3-11
Syntax Errors
Syntax errors occur when program statements
do not conform to the rules of the SAS language.
Examples of syntax errors include the following:
• unmatched quotation marks
• missing semicolons
• invalid options
When SAS encounters a syntax error, SAS prints
a warning or an error message to the log.
ERROR 22-322: Syntax error, expecting one of the following:
              a name, a quoted string, (, /, ;, _DATA_, _LAST_,
              _NULL_.
When SAS encounters a syntax error, SAS underlines the error and the following information is written
to the SAS log:
• the word ERROR or WARNING
• the location of the error
• an explanation of the error
3.03 Quiz
This program has three syntax errors.
What are the errors?
daat work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;
p103d03
Submitting a SAS Program with Errors
daat work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps
run;

proc means data=work.NewSalesEmps average max;
   class Job_Title;
   var Salary;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(newemps)' dlm=',';
The SAS log contains error messages and warnings.
36   daat work.NewSalesEmps;
     ----
     14
WARNING 14-169: Assuming the symbol DATA was misspelled as daat.
37      length First_Name $ 12
38             Last_Name $ 18 Job_Title $ 25;
39      infile 'newemps.csv' dlm=',';
40      input First_Name $ Last_Name $
41            Job_Title $ Salary;
42   run;
43
44   proc print data=work.NewSalesEmps
45   run;
     ---
     22
     202
ERROR 22-322: Syntax error, expecting one of the following: ;, (, BLANKLINE, DATA, DOUBLE,
              HEADING, LABEL, N, NOOBS, OBS, ROUND, ROWS, SPLIT, STYLE, SUMLABEL, UNIFORM,
              WIDTH.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
46
NOTE: The SAS System stopped processing this step because of errors.
47   proc means data=work.NewSalesEmps average max;
                                       -------
                                       22
                                       202
ERROR 22-322: Syntax error, expecting one of the following: ;, (, ALPHA, CHARTYPE, CLASSDATA,
              CLM, COMPLETETYPES, CSS, CV, DATA, DESCEND, DESCENDING, DESCENDTYPES, EXCLNPWGT,
              EXCLNPWGTS, EXCLUSIVE, FW, IDMIN, KURTOSIS, LCLM, MAX, MAXDEC, MEAN, MEDIAN, MIN,
              NWAY, ORDER, P1, P10, P25, P5, P50, P75, P90, P95, P99, PCTLDEF, PRINT, PRINTALL,
              QRANGE, RANGE, SKEWNESS, STDDEV, STDERR, SUM, SUMSIZE, SUMWGT, T, THREADS, UCLM,
              USS, VAR, VARDEF.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
48      class Job_Title;
49      var Salary;
50   run;
NOTE: The SAS System stopped processing this step because of errors.
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps mean max;
   class Job_Title;
   var Salary;
run;
3. Submit the program. It runs successfully without errors and generates output.
Saving the Corrected Program
P r
You can use the FILE command to save your program to a file. The program must be in the Enhanced
f
r
Editor or Program Editor before you issue the FILE command. If the code is not in the Program Editor,
S o o
recall your program before saving the program.
f e
A Windows or UNIX
S No tsit d
z/OS (OS/390)
file '.workshop.sascode(myprog)'
Save As.
ou
A note appears that indicates that the statements are saved to the file.
3.2 Diagnosing and Correcting Syntax Errors 3-15
Submitting a SAS Program that Contains Unbalanced Quotation Marks
The closing quotation mark for the DLM= option in the INFILE statement is missing.
data work.NewSalesEmps;
   length First_Name $ 12 Last_Name $ 18
          Job_Title $ 25;
   infile 'newemps.csv' dlm=',;
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;
SAS Log
51   data work.NewSalesEmps;
52      length First_Name $ 12 Last_Name $ 18
53             Job_Title $ 25;
54      infile 'newemps.csv' dlm=',;
55      input First_Name $ Last_Name $
56            Job_Title $ Salary;
57   run;
58
59   proc print data=work.NewSalesEmps;
60   run;
61
62   proc means data=work.NewSalesEmps;
63      class Job_Title;
64      var Salary;
65   run;
There are no notes in the SAS log because all of the SAS statements after the DLM= option became part
of the quoted delimiter.
The banner in the window indicates that the DATA step is still running, and it is still running
because the RUN statement was not recognized.
You can correct the unbalanced quotation marks programmatically by adding the following code before your previous statements:
*';*";run;
If the quotation mark counter within SAS has an uneven number of quotation marks, as seen in the above program, SAS reads the quotation mark in the code above as the matching quotation mark in the quotation mark counter. SAS then has an even number of quotation marks in the quotation mark counter and runs successfully, assuming no other errors occur. Both single quotation marks and double quotation marks are used in case you submitted double quotation marks instead of single quotation marks.
Point-and-Click Approaches to Balancing Quotation Marks
Windows
1. To correct the problem in the Windows environment, click the break icon or press the CTRL and Break keys.
2. Select 1. Cancel Submitted Statements in the Tasking Manager window and select OK.
UNIX
1. To correct the problem in the UNIX operating environment, open the SAS: Session Management
window and select Interrupt.
2. Select 1 in the SAS: Tasking Manager window.
3. Select Y.
z/OS (OS/390)
1. To correct the problem in the z/OS (OS/390) operating environment, press the Attention key
or issue the ATTENTION command.
2. Type 1 to select 1. Cancel Submitted Statements and press the ENTER key.
3. Type Y and press ENTER.
Resubmitting the Program
1. In the appropriate Editor window, add a closing quotation mark to the DLM= option in the INFILE statement.
data work.NewSalesEmps;
   length First_Name $ 12 Last_Name $ 18
          Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;
2. Resubmit the program.
When you have changed the program in the Enhanced Editor but have not saved the new version, the window bar and the top border of the window show an asterisk (*) beside the window name. When you save the program, the asterisk disappears.
Exercises
Level 1
b. Submit the program.
c. Use the notes in the SAS log to identify the error.
d. Correct the error and resubmit the program.
Level 2
2. Diagnosing and Correcting a Missing Statement
a. With the appropriate Editor window active, include the SAS program p103e02.
b. Submit the program.
c. Are there any errors in the SAS log? _____________________________________________
d. Notice the message in the title bar of the Editor window.
e. Why is PROC PRINT running? _________________________________________________
f. Add the missing statement to execute the PROC PRINT step.
g. Submit the added statement.
h. Confirm that the output was created for the program by viewing the Log and Output windows.
Level 3
3. Using the Help Facility to Determine the Types of Errors in SAS
a. In the Help facility, type syntax errors on the Index tab.
Chapter Review
1. With what do SAS statements usually begin?
3. What are two methods of commenting?
4. Name four types of syntax errors.
5. How do you save a program?
3.4 Solutions 3-21
3.4 Solutions
Solutions to Exercises
1. Diagnosing and Correcting a Misspelled Word
a. Include the SAS program.
b. Submit the program.
c. Use the notes in the SAS log to identify the error.
d. Correct the error.
data work.country;
   length Country_Code $ 2 Country_Name $ 48;
   infile 'country.dat' dlm='!';
   input Country_Code $ Country_Name $;
run;

proc print data=work.country;
run;
2. Diagnosing and Correcting a Missing Statement
a. Include the SAS program.
b. Submit the program.
c. Are there any errors in the SAS log? No
d. Notice the message in the title bar.
e. Why is PROC PRINT running? The PROC PRINT step is missing a RUN statement.
f. Add the missing statement.
data work.donations;
   infile 'donation.dat';
   input Employee_ID Qtr1 Qtr2 Qtr3 Qtr4;
   Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
run;
Semantic: when the language element is correct, but the element might not be valid for a particular usage (detected at compile time)
Execution-time: when SAS attempts to execute a program and execution fails (detected at execution time)
Data: when data values are invalid (detected at execution time)
Macro-related: when you use the macro facility incorrectly
d. 7
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;
3.02 Multiple Choice Poll – Correct Answer
Which statement is true concerning the DATALINES statement based on reading the comment?
a. The DATALINES statement is used when reading data located in a raw data file.
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps
run;

proc means data=work.NewSalesEmps average max;
   class Job_Title;
   var Salary;
run;
p103d03
semicolon
3. What are two methods of commenting?
/* comment */
* comment;
Chapter Review Answers
4. Name four types of syntax errors.
misspelled keywords
unmatched quotation marks
missing semicolons
5. How do you save a program?
Issue the FILE command.
Select File > Save As.
Chapter 4 Getting Familiar with SAS
Data Sets
   Exercises
4.2 Accessing SAS Data Libraries
   Demonstration: Accessing and Browsing SAS Data Libraries – Windows
   Demonstration: Accessing and Browsing SAS Data Libraries – UNIX
   Demonstration: Accessing and Browsing SAS Data Libraries – z/OS (OS/390)
   Exercises
4.3 Accessing Relational Databases (Self-Study)
4.4 Chapter Review
4.5 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
4-2 Chapter 4 Getting Familiar with SAS Data Sets
4.1 Examining Descriptor and Data Portions 4-3
Objectives
• Define the components of a SAS data set.
• Define a SAS variable.
• Identify a missing value and a SAS date value.
• State the naming conventions for SAS data sets and variables.
• Browse the descriptor portion of SAS data sets by using the CONTENTS procedure.
• Browse the data portion of SAS data sets by using the PRINT procedure.
SAS Data Set
A SAS data set is a file that SAS creates and processes.
Partial Work.NewSalesEmps
Descriptor Portion
Data Set Name    WORK.NEWSALESEMPS
Engine           V9
Created          Fri, Feb 08, 2008 01:40 PM
Observations     71
Variables        4
...
First_Name    Last_Name     Job_Title        Salary
$ 12          $ 18          $ 25             N 8
Data Portion
Satyakam      Denny         Sales Rep. II    26780
Monica        Kletschkus    Sales Rep. IV    30890
Kevin         Lyon          Sales Rep. I     26955
Petrea        Soltau        Sales Rep. II    27440
Data must be in the form of a SAS data set to be processed by many SAS procedures and some
DATA step statements.
A SAS data set is a specially structured file that contains data values.
Descriptor Portion
The descriptor portion of a SAS data set contains the following:
• general information about the SAS data set (such as data set name and number of observations)
• variable information (such as name, type, and length)
Partial Work.NewSalesEmps
General Information
Data Set Name    WORK.NEWSALESEMPS
Engine           V9
Created          Fri, Feb 08, 2008 01:40 PM
Observations     71
Variables        4
...
Variable Information
First_Name    Last_Name     Job_Title        Salary
$ 12          $ 18          $ 25             N 8
Browsing the Descriptor Portion
The CONTENTS procedure displays the descriptor portion of a SAS data set.
General form of the CONTENTS procedure:
PROC CONTENTS DATA=SAS-data-set;
RUN;
Example:
proc contents data=work.NewSalesEmps;
run;
p104d01
Last Modified    Wed, Jan 16, 2008 02:14:20 PM    Deleted Observations    0
Protection                                        Compressed              NO
Data Set Type                                     Sorted                  NO
Label

Alphabetic List of Variables and Attributes
#    Variable      Type    Len
1    First_Name    Char     12
3    Job_Title     Char     25
2    Last_Name     Char     18
4    Salary        Num       8
This is a partial view of the default PROC CONTENTS output. PROC CONTENTS output also contains information about the physical location of the file and other data set information.
The descriptor portion contains the metadata of the data set.
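The default variable list is alphabetical, with the # column giving each variable's position in the data set. As a small sketch (assuming Work.NewSalesEmps was created as shown earlier), the VARNUM option of PROC CONTENTS lists the variables in creation order instead:

```sas
/* Descriptor portion with variables listed by position (VARNUM) */
/* rather than alphabetically.                                   */
proc contents data=work.NewSalesEmps varnum;
run;
```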
4.01 Quiz
How many observations are in the data set Work.donations?
to view the descriptor portion of
Work.donations.
Submit the program and review the results.
Data Portion
The data portion of a SAS data set is a rectangular table of character and/or numeric data values.
Partial Work.NewSalesEmps
Variable names:
First_Name    Last_Name     Job_Title        Salary
Variable values:
Satyakam      Denny         Sales Rep. II    26780
Monica        Kletschkus    Sales Rep. IV    30890
Kevin         Lyon          Sales Rep. I     26955
Petrea        Soltau        Sales Rep. II    27440
First_Name, Last_Name, and Job_Title hold character values. Salary holds numeric values.
The data values are organized as a table of observations (rows) and variables (columns).
Variable names are part of the descriptor portion, not the data portion.
SAS Variable Values
There are two types of variables:
character
   Contain any value: letters, numbers, special characters, and blanks.
   Character values are stored with a length of 1 to 32,767 bytes.
   One byte equals one character.
numeric
   Stored as floating point numbers in 8 bytes of storage by default.
   Eight bytes of floating point storage provide space for 16 or 17 significant digits.
   You are not restricted to 8 digits.
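The two variable types can be illustrated with a short sketch. The data set name Work.TypeDemo and the values are hypothetical and exist only for this illustration; the point is that a character variable holds at most its declared length, while a numeric variable takes 8 bytes by default.

```sas
data work.TypeDemo;          /* hypothetical data set name          */
   length Code $ 3;          /* character variable, 3 bytes         */
   Code = 'ABCDE';           /* stored as 'ABC': truncated to the   */
                             /* declared length of 3                */
   Amount = 123456789.25;    /* numeric variable, 8 bytes by default */
run;

/* The descriptor portion shows Type and Len for each variable. */
proc contents data=work.TypeDemo;
run;
```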
SAS Date Values
SAS stores date values as numeric values.
Date         Stored value    Displayed value
01JAN1959    -365            01/01/1959
01JAN1960    0               01/01/1960
01JAN1961    366             01/01/1961
A SAS date value is stored as the number of days between January 1, 1960, and a specific date.
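A brief sketch of the stored and displayed forms, using the 01JAN1961 value from the slide above. The variable name d is arbitrary; the date constant and the MMDDYY10. format are standard SAS.

```sas
data _null_;
   d = '01JAN1961'd;      /* SAS date constant                      */
   put d=;                /* writes d=366 (1960 was a leap year)    */
   put d= mmddyy10.;      /* writes d=01/01/1961                    */
run;
```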
4.03 Quiz
What is the numeric value for today’s date?
Missing Data Values
A value must exist for every variable for each observation.
Missing values are valid values in a SAS data set.
Partial Work.NewSalesEmps
First_Name    Last_Name     Job_Title        Salary
Satyakam      Denny         Sales Rep. II    26780
Monica        Kletschkus    Sales Rep. IV    .
Kevin         Lyon          Sales Rep. I     26955
Petrea        Soltau                         27440
A character missing value is displayed as a blank.
A numeric missing value is displayed as a period.
A period is the default display for a missing numeric value. The default display can be altered by
changing the MISSING= SAS system option.
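As an illustration of the note above, the following sketch creates one observation with both kinds of missing values and changes how the numeric one is displayed. The data set name Work.MissingDemo is hypothetical; the hyphen is just one possible display character.

```sas
data work.MissingDemo;       /* hypothetical data set name     */
   length Job_Title $ 25;
   Job_Title = ' ';          /* character missing value (blank) */
   Salary = .;               /* numeric missing value (period)  */
run;

options missing='-';         /* display numeric missing values as a hyphen */
proc print data=work.MissingDemo;
run;
options missing='.';         /* restore the default display     */
```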
Special characters can be used in variable names if you put the name in quotation marks followed immediately by the letter N.
Example: class 'Flight#'n;
In order to use special characters in variable names, the VALIDVARNAME option must be set to ANY.
Example: options validvarname=any;
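Putting the two examples together, a minimal sketch might look like the following. The data set name Work.FlightDemo and the value 'IA112' are made up for illustration; V7 is assumed to be the setting to restore afterward.

```sas
options validvarname=any;    /* allow special characters in variable names */

data work.FlightDemo;        /* hypothetical data set name */
   'Flight#'n = 'IA112';     /* name literal: quoted name followed by N */
run;

proc print data=work.FlightDemo;
   var 'Flight#'n;
run;

options validvarname=v7;     /* assumed original setting */
```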
4.04 Multiple Answer Poll
Which variable names are valid?
a. data5mon
b. 5monthsdata
c. data#5
d. five months data
e. five_months_data
f. FiveMonthsData
Data Set       Table
Observation    Row
Variable       Column
The terminology of data set, observation, and variable is specific to SAS.
The terminology of table, row, and column is common among databases.
Browsing the Data Portion
The PRINT procedure displays the data portion of a SAS data set.
By default, PROC PRINT displays the following:
• all observations
• all variables
• an Obs column on the left side
Example:
proc print data=work.NewSalesEmps;
run;
p104d02
Browsing the Data Portion
Partial PROC PRINT Output
Obs    First_Name    Last_Name     Job_Title         Salary
  1    Satyakam      Denny         Sales Rep. II      26780
  2    Monica        Kletschkus    Sales Rep. IV      30890
  3    Kevin         Lyon          Sales Rep. I       26955
  4    Petrea        Soltau        Sales Rep. II      27440
  5    Marina        Iyengar       Sales Rep. III     29715
  6    Shani         Duckett       Sales Rep. I       25795
  7    Fang          Wilson        Sales Rep. II      26810
  8    Michael       Minas         Sales Rep. I       26970
  9    Amanda        Liebman       Sales Rep. II      27465
 10    Vincent       Eastley       Sales Rep. III     29695
 11    Viney         Barbis        Sales Rep. III     30265
 12    Skev          Rusli         Sales Rep. II      26580
 13    Narelle       James         Sales Rep. III     29990
 14    Gerry         Snellings     Sales Rep. I       26445
 15    Leonid        Karavdic      Sales Rep. II      27860
The NOOBS option suppresses the observation numbers on the left side of the report.
The VAR statement selects variables that appear in the report and determines their order.
Browsing the Data Portion
proc print data=work.NewSalesEmps noobs;
   var Last_Name First_Name Salary;
run;
p104d03
Partial PROC PRINT Output
Last_Name     First_Name    Salary
Denny         Satyakam       26780
Kletschkus    Monica         30890
Lyon          Kevin          26955
Soltau        Petrea         27440
Iyengar       Marina         29715
Duckett       Shani          25795
Wilson        Fang           26810
Minas         Michael        26970
Liebman       Amanda         27465
Eastley       Vincent        29695
Exercises
Level 1
a. Retrieve the starter program p104e01.
b. After the PROC CONTENTS step, add a PROC PRINT step to display all observations, all variables, and the Obs column for the data set named Work.donations.
Partial PROC PRINT Output (First 10 of 124 Observations)
       Employee_
Obs    ID           Qtr1    Qtr2    Qtr3    Qtr4    Total
  1    120265          .       .       .      25       25
  2    120267         15      15      15      15       60
  3    120269         20      20      20      20       80
  4    120270         20      10       5       .       35
  5    120271         20      20      20      20       80
  6    120272         10      10      10      10       40
  7    120275         15      15      15      15       60
  8    120660         25      25      25      25      100
  9    120662         10       .       5       5       20
 10    120663          .       .       5       .        5
d. In the PROC PRINT step, add a VAR statement and the NOOBS option to display only the
Employee_ID and Total variables.
Partial PROC PRINT Output (First 10 of 124 Observations)
Employee_
ID           Total
120265          25
120267          60
120269          80
120270          35
120271          80
120272          40
120275          60
120660         100
120662          20
120663           5
Level 2
c. Submit the program and answer the following questions:
How many observations are in the data set? ______________________________________
How many variables are in the data set? _________________________________________
What is the length (byte-size) of the variable Product_Name? _____________________
d. After the PROC CONTENTS step, add a PROC PRINT step with appropriate statements and options to display part of the data portion of Work.newpacks.
e. Submit the program to create the following PROC PRINT report:
Product_Name                                   Supplier_Name
Black/Black                                    Top Sports
X-Large Bottlegreen/Black                      Top Sports
Commanche Women's 6000 Q Backpack. Bark        Top Sports
Expedition Camp Duffle Medium Backpack         Miller Trading Inc
Feelgood 55-75 Litre Black Women's Backpack    Toto Outdoor Gear
Medium Black/Bark Backpack
Toto Outdoor Gear
Top Sports
Medium Gold Black/Gold Backpack                Top Sports
Victor Grey/Olive Women's Backpack             Top Sports
Deer Backpack                                  Luna sastreria S.A.
Deer Waist Bag                                 Luna sastreria S.A.
Hammock Sports Bag                             Luna sastreria S.A.
Sioux Men's Backpack 26 Litre.                 Miller Trading Inc
Level 3
c. Use the Help facility to find documentation on how times and datetimes are stored in SAS.
Go to the CONTENTS tab in the SAS Help and Documentation and select SAS Products > Base SAS > SAS 9.2 Language Reference: Concepts > SAS System Concepts > Dates, Times, and Intervals > About SAS Date, Time, and Datetime Values.
d. Complete the following sentences:
A SAS time value is a value representing the number of _________________________________
______________________________________________________________________________.
A SAS datetime value is a value representing the number of ______________________________
______________________________________________________________________________.
Objectives
• Explain the concept of a SAS data library.
• Assign a library reference name to a SAS data library by using the LIBNAME statement.
• State the difference between a permanent library and a temporary library.
• Browse the contents of a SAS data library by using the SAS Explorer window.
• Investigate a SAS data library by using the CONTENTS procedure.
SAS Data Libraries
A SAS data library is a collection of SAS files that are recognized as a unit by SAS.
Directory-based System
A SAS data library is a directory.
Windows Example: s:\workshop
UNIX Example: /users/userid
4.2 Accessing SAS Data Libraries 4-17
Assigning a Libref
Regardless of which host operating system you use, you identify SAS data libraries by assigning a library reference name (libref) to each library.
Sasuser – permanent library
The Work library and its SAS data files are deleted after your SAS session ends.
SAS data sets in permanent libraries such as the Sasuser library are saved after your SAS session ends.
SAS Data Libraries
You can also create and access your own permanent libraries.
Assigning a Libref
You can use the LIBNAME statement to assign a library
reference name (libref) to a SAS data library.
LIBNAME libref 'SAS-data-library' <options>;
• The libref must be 8 characters or less.
• The libref must begin with a letter or underscore.
• The remaining characters must be letters, numerals, or underscores.
For Windows and UNIX, SAS can only make an association between a libref and an existing directory. The LIBNAME statement does not create a new directory.
z/OS (OS/390) users can use a DD statement or TSO ALLOCATE command instead of issuing a LIBNAME statement.
Assigning a Libref
Examples:
Windows
libname orion 's:\workshop';
UNIX
libname orion '/users/userid';
z/OS (OS/390)
libname orion 'userid.workshop.sasdata';
Figure: the orion libref points to s:\workshop (Windows), /users/userid (UNIX), or userid.workshop.sasdata (z/OS), alongside the Work and Sasuser libraries.
When your session ends, the link between the libref and the physical location of your files is broken.
4.05 Poll
During an interactive SAS session, every time that you submit a program you must also resubmit the LIBNAME statement.
True
False
The second name (filename) refers to the file in the library.
Temporary SAS Filename
The default libref is Work if the libref is omitted.
NewSalesEmps is interpreted as work.NewSalesEmps.
data NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;
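To contrast one-level and two-level names, the sketch below assumes the orion library from the earlier Windows example contains a data set named sales, and that NewSalesEmps was created in the current session. Adjust the path for your host.

```sas
libname orion 's:\workshop';        /* path from the earlier Windows example */

/* Two-level name: libref orion, file sales (permanent library). */
proc contents data=orion.sales;
run;

/* One-level name: the libref is omitted, so SAS looks in Work. */
proc contents data=NewSalesEmps;
run;
```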
available during your current SAS session
navigate to see all members of a specific library
display the descriptor portion of a SAS data set
The SAS windowing environment opens the SAS Explorer by default on many hosts. You can issue the SAS EXPLORER command to invoke this window if it does not appear by default.
The SAS Explorer can be opened by selecting
• View > Contents Only
or
• View > Explorer.
In the Contents Only view, the SAS Explorer is a single-paned window that contains the contents of your SAS environment. As you open folders, the folder contents replace the previous contents in the same window.
In the Explorer view of the SAS Explorer window, folders appear in the tree view on the left and folder contents appear in the list view on the right.
of the data sets.
NODS is only used in conjunction with the keyword _ALL_.
If you are using a noninteractive or batch SAS session, the CONTENTS procedure is an alternative to the EXPLORER command.
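A minimal sketch of browsing a library with PROC CONTENTS, assuming the orion libref points to the Windows path used in the earlier examples. The _ALL_ keyword requests every member of the library, and NODS limits the output to the directory listing.

```sas
libname orion 's:\workshop';   /* adjust the path for your host */

/* List the members of the orion library without printing */
/* the descriptor portion of each data set.               */
proc contents data=orion._all_ nods;
run;
```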
2. Check the log to confirm that the orion libref was assigned.
libname orion 's:\workshop';
NOTE: Libref ORION was successfully assigned as follows:
      Engine:        V9
      Physical Name: s:\workshop
3. View the PROC CONTENTS output in the Output window.
Partial PROC CONTENTS Output
The CONTENTS Procedure
Directory
Libref           ORION
Engine           V9
Physical Name    s:\workshop
File Name        s:\workshop

                                 Member    File
 #    Name                       Type      Size     Last Modified
 1    BUDGET                     DATA       5120    12Feb08:00:57:25
 2    COUNTRY                    DATA      17408    12Feb08:00:57:25
      COUNTRY                    INDEX     17408    12Feb08:00:57:25
 3    CUSTOMER                   DATA      33792    12Feb08:00:57:25
 4    CUSTOMER_DIM               DATA      33792    12Feb08:00:57:25
 5    CUSTOMER_TYPE              DATA      17408    12Feb08:00:57:25
      CUSTOMER_TYPE              INDEX      9216    12Feb08:00:57:25
 6    EMPLOYEE_ADDRESSES         DATA      74752    12Feb08:00:57:25
 7    EMPLOYEE_DONATIONS         DATA      25600    12Feb08:00:57:25
 8    EMPLOYEE_ORGANIZATION      DATA      41984    12Feb08:00:57:25
 9    EMPLOYEE_PAYROLL           DATA      33792    12Feb08:00:57:25
10    LOOKUP_COUNTRY             DATA      37888    12Feb08:00:57:25
11    MNTH7_2007                 DATA       5120    12Feb08:00:57:25
12    MNTH8_2007                 DATA       5120    12Feb08:00:57:25
13    MNTH9_2007                 DATA       5120    12Feb08:00:57:25
14    NONSALES                   DATA      33792    12Feb08:00:57:25
4. Select the Explorer tab on the SAS window bar to activate the SAS Explorer or select View > Contents Only.
5. Double-click Libraries to show all available libraries.
7. Right-click on the Sales data set and select Properties.
This default view provides general information about the data set, such as the library in which it is
stored, the type of information it contains, its creation date, the number of observations and variables,
and so on. You can request specific information about the columns in the data table by selecting the
Columns tab at the top of the Properties window.
9. Double-click on the Sales data set or right-click on the file and select Open.
This opens the data set in a VIEWTABLE window. A view of orion.sales is shown below.
In addition to browsing SAS data sets, you can use the VIEWTABLE window to edit data sets, create data sets, and customize your view of a SAS data set. For example, you can do the following:
• sort your data
• change the color and fonts of variables
• display variable labels versus variable names
• remove and add variables
Variable labels are displayed by default. Display variable names instead of variable labels by selecting View > Column Names.
2. Check the log to confirm that the orion libref was assigned.
3. View the PROC CONTENTS output in the Output window.
4. Select View > Contents Only to activate the SAS Explorer.
5. Double-click Libraries to show all available libraries.
7. Right-click on the Sales data set and select Properties.
8. Select to close the Properties window.
9. Double-click on the Sales data set or right-click on the file and select Open.
This opens the data set in a VIEWTABLE window. A view of orion.sales is shown below.
10. Select to close the VIEWTABLE window.
11. With the SAS Explorer active, select on the Toolbox to return to Libraries.
2. Check the log to confirm that the orion libref was assigned.
3. View the PROC CONTENTS output in the Output window.
4. Type explorer on the command line and press ENTER to activate the SAS Explorer.
5. Type s beside Orion and press ENTER to show all members of that library.
7. Select OK to close the Properties window.
8. Type ? beside Sales and press ENTER.
9. Select Open to open the FSVIEW window.
10. Type end to close the FSVIEW window.
Exercises
Level 1
a. Write and submit the appropriate LIBNAME statement to provide access to the orion libref.
Fill in the blank with the location of your SAS data library.
libname orion '_____________________________';
Possible location of your SAS data library:
Windows          s:\workshop
UNIX             /users/userid
z/OS (OS/390)    .workshop.sasdata
b. Check the log to confirm that the SAS data library was assigned.
NOTE: Libref ORION was successfully assigned as follows:
c. Add a PROC CONTENTS step to list all the SAS data sets in the orion library. Do not display the descriptor portions of the individual data sets.
d. Add another PROC CONTENTS step to display the descriptor portion of the data set orion.sales.
e. Use the SAS Explorer window to view the contents of the orion library.
Level 2
5. Reviewing Concepts
a. SAS statements usually begin with a(n) _________________.
b. Every SAS statement ends with a _________________.
c. The descriptor portion of a SAS data set can be viewed using the _________________ procedure.
d. Character variable values can be up to _________________ characters long and use _________________ byte(s) of storage per character.
e. By default, numeric variables are stored in _________________ bytes of storage.
f. The internally stored SAS date value for January 3, 1960, is _________________.
k. A libref name must be _________________ characters or less.
l. What are the two kinds of steps? ____________________________________________________
m. What are the three primary windows in the SAS windowing environment? __________________
n. What are the two portions of every SAS data set? ______________________________________
o. What are the two types of variables? _________________________________________________
p. True or False: If a SAS program produces output, then the program ran successfully and there is no need to check the SAS log.
q. True or False: There are two methods for commenting in a SAS program.
r. True or False: Omitting a semicolon never causes errors.
s. True or False: A library reference name (libref) references a particular data set.
t. True or False: If a data set is referenced with a one-level name, Work is the implied libref.
u. True or False: The _ALL_ keyword is used with the PRINT procedure.
Level 3
6. Investigating the LIBNAME Statement
a. Use the Help facility to find documentation about the LIBNAME statement.
Go to the CONTENTS tab in the SAS Help and Documentation and select
SAS Products ⇒ Base SAS ⇒ SAS 9.2 Language Reference: Dictionary ⇒
Dictionary of Language Elements ⇒ Statements ⇒ LIBNAME Statement.
b. Answer the following questions:
What argument disassociates one or more currently assigned librefs? ___________________
What system option provides you the convenience of specifying only a one-level name for
permanent SAS files? _________________________________________________________
c. Write and submit a LIBNAME statement that shows the attributes of all currently assigned
SAS libraries in the SAS log. ___________________________________________________
4.3 Accessing Relational Databases (Self-Study) 4-35
Objectives
Assign a library reference name to a relational
database by using the LIBNAME statement.
Reference a relational database table using
a SAS two-level name.
The LIBNAME Statement (Review)
The LIBNAME statement assigns a library reference name (libref) to a SAS data library.
General form of the LIBNAME statement:
LIBNAME libref 'SAS-data-library';
4-36 Chapter 4 Getting Familiar with SAS Data Sets
After a database is associated with a libref, you can use a SAS two-level name to specify any table
in the database and then work with the table as you would with a SAS data set.
The SAS/ACCESS interface to relational databases is a family of interfaces (each of which is licensed
separately) that enable you to interact with data in other vendors' databases from within SAS.
The engine-name such as Oracle or DB2 is the SAS/ACCESS component that reads from and writes to
your DBMS. The engine name is required. Because the SAS/ACCESS LIBNAME statement associates a
libref with a SAS/ACCESS engine that supports connections to a particular DBMS, it requires a
DBMS-specific engine name.
4.3 Accessing Relational Databases (Self-Study) 4-37
Oracle Example
This example uses the LIBNAME statement as supported
in the SAS/ACCESS interface to Oracle.
libname oralib oracle
        user=edu001 pw=edu001
        path=dbmssrv schema=educ;
In this statement, oralib is the libref, oracle is the engine, and USER=, PW=, PATH=, and SCHEMA=
are options.
p104d05
USER= specifies an optional Oracle user name. If the user name contains blanks or national characters,
enclose it in quotation marks. USER= must be used with PASSWORD=.
PASSWORD= or PW= specifies an optional Oracle password that is associated with the Oracle user
name.
PATH= specifies the Oracle driver, node, and database. SAS/ACCESS uses the same Oracle path
designation that you use to connect to Oracle directly. See your database administrator to determine
the databases that are set up in your operating environment, and to determine the default values
if you do not specify a database.
SCHEMA= enables you to read database objects, such as tables and views, in the specified schema. If
this option is omitted, you connect to the default schema for your DBMS. The values for SCHEMA=
are usually case sensitive, so use care when you specify this option.
4-38 Chapter 4 Getting Familiar with SAS Data Sets
Oracle Example
Any table in this Oracle database can be referenced
using a SAS two-level name.
Oracle Example
libname oralib oracle
        user=edu001 pw=edu001
        path=dbmssrv schema=educ;

proc print data=oralib.supervisors;
run;

data work.staffpay;
   merge oralib.staffmaster
         oralib.payrollmaster;
   by empid;
run;

libname oralib clear;
p104d05
The CLEAR option in the LIBNAME statement disassociates the libref. Disassociating the libref
disconnects the database engine from the database and closes any resources that are associated with
that libref's connection.
4.3 Accessing Relational Databases (Self-Study) 4-39
4.06 Quiz
Which option in the LIBNAME statement specifies a
user’s password when accessing an Informix database?
LIBNAME Statement Specifics for Informix).
4-40 Chapter 4 Getting Familiar with SAS Data Sets
Chapter Review
1. What structure is a SAS data library in UNIX or Windows?
2. What structure is a SAS data library in z/OS (OS/390)?
3. What is the name of the permanent SAS data library that SAS creates for you?
4. What can you do with the SAS Explorer?
5. What window enables you to interactively browse a SAS data set?
4.5 Solutions 4-41
4.5 Solutions
Solutions to Exercises
1. Examining the Data Portion
a. Retrieve the starter program.
b. After the PROC CONTENTS step, add a PROC PRINT step.
data work.donations;
   infile 'donation.dat';
   input Employee_ID Qtr1 Qtr2 Qtr3 Qtr4;
   Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
run;

proc contents data=work.donations;
run;

proc print data=work.donations;
run;
c. Submit the program.
d. In the PROC PRINT step, add a VAR statement and the NOOBS option.
proc print data=work.donations noobs;
   var Employee_ID Total;
run;
e. Submit the program.
4-42 Chapter 4 Getting Familiar with SAS Data Sets
Expedition Camp Duffle Medium Backpack
Feelgood 55-75 Litre Black Women's Backpack
Toto Outdoor Gear     AU   Jaguar 50-75 Liter Blue Women's Backpack
Top Sports            DK   Medium Black/Bark Backpack
Top Sports            DK   Medium Gold Black/Gold Backpack
Top Sports            DK   Medium Olive Olive/Black Backpack
Toto Outdoor Gear     AU   Trekker 65 Royal Men's Backpack
Top Sports            DK   Victor Grey/Olive Women's Backpack
Luna sastreria S.A.   ES   Deer Backpack
Luna sastreria S.A.   ES   Deer Waist Bag
Luna sastreria S.A.   ES   Hammock Sports Bag
Miller Trading Inc    US   Sioux Men's Backpack 26 Litre.
run;

proc contents data=work.newpacks;
run;
c. Submit the program and answer the following questions:
How many observations are in the data set? 15
How many variables are in the data set? 3
What is the length (byte-size) of the variable Product_Name? 43
c. Use the Help facility to find documentation on how times and datetimes are stored in SAS.
d. Complete the following sentences:
A SAS time value is a value representing the number of seconds since midnight of the current
day.
A SAS datetime value is a value representing the number of seconds between January 1, 1960,
and an hour/minute/second within a specified date.
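The two sentences above, together with the date value from the concept review, can be checked with a short DATA step. This is a minimal sketch: the variable names t, dt, and d are chosen only for illustration, and the step writes to the log without creating a data set.

```sas
/* Sketch: inspect how SAS stores time, datetime, and date constants. */
/* The variable names are illustrative and not part of the course.    */
data _null_;
   t  = '00:01:00't;              /* time constant: one minute after midnight   */
   dt = '01JAN1960:00:01:00'dt;   /* datetime constant: one minute after origin */
   d  = '03JAN1960'd;             /* date constant: two days after Jan 1, 1960  */
   put t= dt= d=;                 /* writes t=60 dt=60 d=2 to the log           */
run;
```

Time and datetime values count seconds, whereas date values count days, which is why the same one-minute offset is stored as 60 in both t and dt.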
4. Accessing a SAS Data Library
a. Write and submit the appropriate LIBNAME statement.
libname orion 'SAS-data-library';
b. Check the log to confirm that the SAS data library was assigned.
c. Add a PROC CONTENTS step to list all the SAS data sets in the orion library.
proc contents data=orion._all_ nods;
run;
d. Add another PROC CONTENTS step to display the descriptor portion of orion.sales.
proc contents data=orion.sales;
run;
e. Use the SAS Explorer window to view the contents of the orion library.
5. Reviewing Concepts
a. SAS statements usually begin with an identifying keyword.
b. Every SAS statement ends with a semicolon.
c. The descriptor portion of a SAS data set can be viewed using the CONTENTS procedure.
d. Character variable values can be up to 32,767 characters long and use 1 byte(s) of storage per
character.
e. By default, numeric variables are stored in 8 bytes of storage.
f. The internally stored SAS date value for January 3, 1960, is 2.
g. A SAS variable name has 1 to 32 characters and begins with a letter or an underscore.
h. A missing character value is displayed as a blank.
i. A missing numeric value is displayed as a period.
j. When a SAS session starts, SAS automatically creates the temporary library called Work.
s. True or False: A library reference name (libref) references a particular data set. False
t. True or False: If a data set is referenced with a one level name, Work is the implied libref. True
u. True or False: The _ALL_ keyword is used with the PRINT procedure. False
6. Investigating the LIBNAME Statement
a. Use the Help facility.
b. Answer the following questions:
What argument disassociates one or more currently assigned librefs? CLEAR
What system option provides you the convenience of specifying only a one-level name for
permanent SAS files? USER=
c. Write and submit a LIBNAME statement.
libname _all_ list;
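The LIST and CLEAR arguments from the answers above can be tried together in one short session. This is a sketch that assumes the Windows location used elsewhere in these notes; substitute the path of your own SAS data library.

```sas
/* Sketch: assign, inspect, and release a libref.                     */
/* The path is the course's Windows example; replace it as needed.    */
libname orion 's:\workshop';

libname orion list;     /* writes the attributes of one libref to the log   */
libname _all_ list;     /* writes the attributes of every assigned libref   */

libname orion clear;    /* disassociates the libref from the library        */
```

After the CLEAR statement runs, a reference such as orion.sales no longer resolves until the libref is assigned again.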
4.5 Solutions 4-45
data work.donations;
   infile 'donation.dat';
   input Employee_ID Qtr1 Qtr2 Qtr3 Qtr4;
   Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
run;

proc contents data=work.donations;
run;
p104a01s
4.02 Multiple Choice Poll – Correct Answer
Which variable type do you think SAS uses to store date values?
a. character
b. numeric
4-46 Chapter 4 Getting Familiar with SAS Data Sets
Example:
If the current date is February 1, 2008, the numeric value
is 17563.
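The example value can be reproduced with a date constant. The sketch below uses illustrative variable names (d and origin) and writes to the log only.

```sas
/* Sketch: a SAS date value is the number of days since January 1, 1960. */
data _null_;
   d = '01FEB2008'd;
   put d=;                 /* writes d=17563                      */
   put d= date9.;          /* writes d=01FEB2008                  */
   origin = 0;
   put origin= date9.;     /* writes origin=01JAN1960             */
run;
```

The stored value never changes; the DATE9. format only changes how the number is displayed.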
4.04 Multiple Answer Poll – Correct Answer
Which variable names are valid?
a. data5mon
b. 5monthsdata
c. data#5
d. five months data
e. five_months_data
f. FiveMonthsData
4.5 Solutions 4-47
True
False
The LIBNAME statement remains in effect until canceled, changed, or your SAS session ends.
4.06 Quiz – Correct Answer
Which option in the LIBNAME statement specifies a user’s password when accessing an Informix
database?
The USING= option specifies the password that is associated with the Informix user. USING= can
also be specified with the PASSWORD= and PWD= aliases.
4-48 Chapter 4 Getting Familiar with SAS Data Sets
Chapter Review Answers
2. What structure is a SAS data library in z/OS (OS/390)?
   an operating system file
3. What is the name of the permanent SAS data library that SAS creates for you?
   Sasuser
4. What can you do with the SAS Explorer?
   view a list of all the libraries available to your SAS session
   navigate to see all members of a specific library
   display the descriptor portion of a SAS data set
5. What window enables you to interactively browse a SAS data set?
   the VIEWTABLE window
Chapter 5 Reading SAS Data Sets
5.3 Subsetting Observations and Variables ..................................................................... 5-12
Exercises .............................................................................................................................. 5-26
5.4 Adding Permanent Attributes ...................................................................................... 5-29
Exercises .............................................................................................................................. 5-41
5.5 Chapter Review ............................................................................................................. 5-44
5.6 Solutions ....................................................................................................................... 5-45
Solutions to Exercises .......................................................................................................... 5-45
Solutions to Chapter Review ................................................................................................ 5-54
5-2 Chapter 5 Reading SAS Data Sets
5.1 Introduction to Reading Data 5-3
Objectives
   Define the concept of reading from a data source to create a SAS data set.
   Define the business scenario that will be used when reading from a SAS data set,
   an Excel worksheet, and a raw data file.
Business Scenario
An existing data source contains information on Orion Star sales employees from Australia and
the United States.
A new SAS data set needs to be created that contains a subset of this existing data source.
This new SAS data set must contain the following:
   only the employees from Australia who are Sales Representatives
   the employee's first name, last name, salary, job title, and hired date
   labels and formats in the descriptor portion
5-4 Chapter 5 Reading SAS Data Sets
Business Scenario
Reading SAS Data Sets
Reading Excel Worksheets
Reading Delimited Raw Data Files
Three different input data sources are used to create the new SAS data set.
First, a SAS data set is used as the input data source.
Second, an Excel workbook is used as the input data source.
Third, a raw data file is used as the input data source.
5.1 Introduction to Reading Data 5-5
Business Scenario
Reading SAS Data Sets
libname ;
data ;
   set ;
   ...
run;

Reading Excel Worksheets
libname ;
data ;
   set ;
   ...
run;

Reading Delimited Raw Data Files
data ;
   infile ;
   input ;
   ...
run;

The DATA step is used to accomplish the scenario regardless of the input data source. Additional
statements are added to the DATA step to complete all of the requirements.
The LIBNAME statement references a SAS data library when reading a SAS data set, and an Excel
workbook when reading an Excel worksheet.
5.01 Multiple Answer Poll
Which types of files will you read into SAS?
a. SAS data sets
b. Excel worksheets
c. raw data files
d. other
e. not sure
5-6 Chapter 5 Reading SAS Data Sets
Objectives
Use the DATA step to create a SAS data set from
an existing SAS data set.
Business Scenario
Reading SAS Data Sets
Reading Excel Worksheets
Reading Delimited Raw Data Files
5.2 Using SAS Data as Input 5-7
Business Scenario Syntax
Use the following statements to complete the scenario:
LIBNAME libref 'SAS-data-library';
DATA output-SAS-data-set;
   SET input-SAS-data-set;
   WHERE where-expression;
   KEEP variable-list;
   LABEL variable = 'label'
         variable = 'label'
         variable = 'label';
   FORMAT variable(s) format;
RUN;
5-8 Chapter 5 Reading SAS Data Sets
Part 1   LIBNAME libref 'SAS-data-library';
         DATA output-SAS-data-set;
            SET input-SAS-data-set;
Part 2      WHERE where-expression;
            KEEP variable-list;
Part 3      LABEL variable = 'label'
                  variable = 'label'
                  variable = 'label';
            FORMAT variable(s) format;
         RUN;
Instead of writing the program all at once, break the program into three parts. Test each part before you
move to the next part.
The LIBNAME Statement (Review)
A library reference name (libref) is needed if a permanent data set is being read or created.
LIBNAME libref 'SAS-data-library';
DATA output-SAS-data-set;
   SET input-SAS-data-set;
   <additional SAS statements>
RUN;
5.2 Using SAS Data as Input 5-9
DATA output-SAS-data-set;
   SET input-SAS-data-set;
   <additional SAS statements>
RUN;
The DATA statement can create temporary or permanent data sets.
The SET Statement
The SET statement reads observations from a SAS data set for further processing in the DATA step.
LIBNAME libref 'SAS-data-library';
DATA output-SAS-data-set;
   SET input-SAS-data-set;
   <additional SAS statements>
RUN;
By default, the SET statement reads all observations and all variables from the input data set.
The SET statement can read temporary or permanent data sets.
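The default behavior of the DATA and SET statements can be seen with a small stand-alone program. This is a sketch only: work.staff and its two rows are made-up test data, not part of the Orion Star library.

```sas
/* Hypothetical input data so that the sketch runs on its own. */
data work.staff;
   input Name $ Salary;
   datalines;
Ann 52000
Bob 48000
;
run;

/* One-level names: Work is the implied libref, so this step reads   */
/* work.staff and writes the temporary data set work.staff_copy.     */
/* With no other statements, every observation and variable is kept. */
data staff_copy;
   set staff;
run;

/* A permanent copy would use an assigned libref instead, for example */
/* data orion.staff_copy; (assuming the orion libref is assigned).    */
```

The log for the second step should report the same number of observations and variables for the input and output data sets.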
5-10 Chapter 5 Reading SAS Data Sets
data work.subset1;
   set orion.sales;
run;

Partial SAS Log
9    data work.subset1;
10      set orion.sales;
11   run;

NOTE: There were 165 observations read from the data set ORION.SALES.
NOTE: The data set WORK.SUBSET1 has 165 observations and 9 variables.
p105d01
The LIBNAME statement needs to reference a SAS data library specific to your operating environment.
For example:
Windows          libname orion 's:\workshop';
UNIX             libname orion '/users/userid';
z/OS (OS/390)
Business Scenario Part 1
proc print data=work.subset1;
run;

Partial PROC PRINT Output
                    First_                                                                Birth_   Hire_
Obs   Employee_ID   Name        Last_Name    Gender   Salary   Job_Title        Country   Date     Date
  1   120102        Tom         Zhou         M        108255   Sales Manager    AU          3510   10744
  2   120103        Wilson      Dawes        M         87975   Sales Manager    AU         -3996    5114
  3   120121        Irenie      Elvish       F         26600   Sales Rep. II    AU         -5630    5114
  4   120122        Christina   Ngan         F         27475   Sales Rep. II    AU         -1984    6756
  5   120123        Kimiko      Hotstone     F         26190   Sales Rep. I     AU          1732    9405
  6   120124        Lucian      Daymond      M         26480   Sales Rep. I     AU          -233    6999
  7   120125        Fong        Hofmeister   M         32040   Sales Rep. IV    AU         -1852    6999
  8   120126        Satyakam    Denny        M         26780   Sales Rep. II    AU         10490   17014
  9   120127        Sharryn     Clarkson     F         28100   Sales Rep. II    AU          6943   14184
 10   120128        Monica      Kletschkus   F         30890   Sales Rep. IV    AU          9691   17106
 11   120129        Alvin       Roebuck      M         30070   Sales Rep. III   AU          1787    9405
 12   120130        Kevin       Lyon         M         26955   Sales Rep. I     AU          9114   16922
 13   120131        Marinus     Surawski     M         26910   Sales Rep. I     AU          7207   15706
 14   120132        Fancine     Kaiser       F         28525   Sales Rep. III   AU         -3923    6848
 15   120133        Petrea      Soltau       F         27440   Sales Rep. II    AU          9608   17075
p105d01
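Birth_Date and Hire_Date appear in the listing as raw SAS date values because no format is stored for them yet. As a sketch of how such values can be made readable in a report, the step below applies a format inside PROC PRINT. The data set work.dates is hypothetical; its two rows reuse Employee_ID and Hire_Date values from the listing above.

```sas
/* Hypothetical two-row data set built from values in the listing. */
data work.dates;
   input Employee_ID Hire_Date;
   datalines;
120102 10744
120103 5114
;
run;

/* A FORMAT statement in a PROC step affects only that step's output. */
proc print data=work.dates;
   format Hire_Date date9.;
run;
```

Because the format is applied only in the PROC PRINT step, the descriptor portion of work.dates is unchanged.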
5.2 Using SAS Data as Input 5-11
5.02 Poll
The DATA step reads a temporary SAS data set to create a permanent SAS data set.
   True
   False
5-12 Chapter 5 Reading SAS Data Sets
Objectives
Subset observations by using the WHERE statement.
Subset variables by using the DROP and KEEP
statements.
Business Scenario Syntax
Use the following statements to complete the scenario:
Part 1   LIBNAME libref 'SAS-data-library';
         DATA output-SAS-data-set;
            SET input-SAS-data-set;
Part 2      WHERE where-expression;
            KEEP variable-list;
Part 3      LABEL variable = 'label'
                  variable = 'label'
                  variable = 'label';
            FORMAT variable(s) format;
         RUN;
5.3 Subsetting Observations and Variables 5-13
9    data work.subset1;
10      set orion.sales;
11   run;

NOTE: There were 165 observations read from the data set ORION.SALES.
NOTE: The data set WORK.SUBSET1 has 165 observations and 9 variables.

By adding statements to the DATA step, the number of observations and variables can be reduced.

NOTE: The data set WORK.SUBSET1 has 61 observations and 5 variables.
The WHERE Statement
The WHERE statement subsets observations that meet a particular condition.
WHERE where-expression;
The where-expression is a sequence of operands and operators that form a set of instructions that
define a condition for selecting observations.
   Operands include constants and variables.
   Operators are symbols that request a comparison, arithmetic calculation, or logical operation.
5-14 Chapter 5 Reading SAS Data Sets
Operands
A constant operand is a fixed value.
Character values must be enclosed in quotation marks
and are case sensitive.
Numeric values do not use quotation marks.
Examples:
where Gender = 'M';
where Salary > 50000;
In each example, the operand on the left (Gender, Salary) is a variable and the operand on the
right ('M', 50000) is a constant.
Comparison Operators
Comparison operators compare a variable with a value or with another variable.
Symbol      Mnemonic   Definition
=           EQ         equal to
^= ¬= ~=    NE         not equal to
>           GT         greater than
<           LT         less than
>=          GE         greater than or equal
<=          LE         less than or equal
            IN         equal to one of a list
5.3 Subsetting Observations and Variables 5-15
Comparison Operators
Examples:
where Salary ne .;
where Salary >= 50000;
where Country in ('AU','US');
where Country in ('AU' 'US');
Values in the IN list must be separated by commas or blanks.
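The comparison operators can be tried on a small test data set. The sketch below is stand-alone: work.demo and its three rows are made up for illustration, and the output data set names are hypothetical.

```sas
/* Hypothetical test data; Cy has a missing Salary. */
data work.demo;
   input Name $ Country $ Salary;
   datalines;
Ann AU 52000
Bob US 48000
Cy  DE .
;
run;

/* The symbol and the mnemonic select the same observations (Ann). */
data work.high_symbol;
   set work.demo;
   where Salary >= 50000;
run;

data work.high_mnemonic;
   set work.demo;
   where Salary ge 50000;
run;

/* IN with a blank-separated list, combined with a test for     */
/* non-missing Salary: selects Ann and Bob.                     */
data work.au_us;
   set work.demo;
   where Country in ('AU' 'US') and Salary ne .;
run;
```

Comparing the log notes for work.high_symbol and work.high_mnemonic is a quick way to confirm that the two forms are interchangeable.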
Arithmetic Operators
Arithmetic operators indicate that an arithmetic calculation is performed.
Symbol   Definition
**       exponentiation
*        multiplication
/        division
+        addition
-        subtraction
5-16 Chapter 5 Reading SAS Data Sets
Arithmetic Operators
Examples:
where Salary + Bonus <= 10000;
Logical Operators
Logical operators combine or modify expressions.
Symbol   Mnemonic   Definition
&        AND        logical and
|        OR         logical or
^ ¬ ~    NOT        logical not
5.3 Subsetting Observations and Variables 5-17
Logical Operators
Examples:
where Country not in ('AU' 'US');
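When AND and OR appear in the same expression, AND is evaluated before OR, so parentheses are often needed to express the intended condition. The following sketch uses made-up test data (work.demo) and hypothetical output names to show the difference.

```sas
/* Hypothetical test data. */
data work.demo;
   input Name $ Country $ Salary;
   datalines;
Ann AU 42000
Bob US 48000
Cy  DE 61000
Dee US 70000
;
run;

/* Parentheses group the OR first: country is AU or US, and the  */
/* salary exceeds 50000. Only Dee qualifies.                     */
data work.grouped;
   set work.demo;
   where (Country='AU' or Country='US') and Salary > 50000;
run;

/* Without parentheses, AND binds first: country is AU, or       */
/* (country is US and salary exceeds 50000). Ann and Dee qualify.*/
data work.ungrouped;
   set work.demo;
   where Country='AU' or Country='US' and Salary > 50000;
run;
```

Writing the parentheses explicitly, even when they match the default order, makes the intent of a WHERE expression easier to read.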
5.03 Quiz
Which WHERE statement correctly subsets the numeric values for May, June, or July and missing
character names?
a. where Months in (5-7)
         and Names = .;
b. where Months in (5,6,7)
         and Names = ' ';
c. where Months in ('5','6','7')
         and Names = '.';
5-18 Chapter 5 Reading SAS Data Sets
?        CONTAINS   a character string
         LIKE       a character pattern
BETWEEN-AND Operator
The BETWEEN-AND operator selects observations in which the value of a variable falls within an
inclusive range of values.
Examples:
where salary between 50000 and 100000;
where salary not between 50000 and 100000;
Equivalent Expressions:
where salary between 50000 and 100000;
where 50000 <= salary <= 100000;
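The inclusive range, and what NOT BETWEEN-AND selects, can be checked on a few test rows. This is a sketch with made-up data; work.demo and the output names are hypothetical.

```sas
/* Hypothetical test data; Dee has a missing Salary. */
data work.demo;
   input Name $ Salary;
   datalines;
Ann 50000
Bob 48000
Cy  120000
Dee .
;
run;

/* The range is inclusive, so Ann (exactly 50000) is selected. */
data work.in_range;
   set work.demo;
   where Salary between 50000 and 100000;
run;

/* Selects Bob and Cy. Dee is selected as well, because a missing */
/* numeric value compares lower than any number and therefore     */
/* falls outside the range.                                       */
data work.out_of_range;
   set work.demo;
   where Salary not between 50000 and 100000;
run;
```

If missing values should be excluded from the second result, a condition such as Salary is not missing can be added with AND.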
5.3 Subsetting Observations and Variables 5-19
Examples:
where Employee_ID is null;
where Employee_ID is missing;
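IS NULL and IS MISSING work for both numeric and character variables. The sketch below uses made-up, comma-delimited test data so that a missing character value can be entered; the data set names are hypothetical.

```sas
/* Hypothetical test data: the second row has no Name and the */
/* third row has a missing Salary.                            */
data work.demo;
   infile datalines dsd;
   input Name $ Salary;
   datalines;
Ann,52000
,48000
Cy,.
;
run;

/* Numeric variable: selects the row for Cy, the same row that */
/* where Salary = .; would select here.                        */
data work.no_salary;
   set work.demo;
   where Salary is missing;
run;

/* Character variable: selects the row with the blank Name,   */
/* the same row that where Name = ' '; would select.          */
data work.no_name;
   set work.demo;
   where Name is null;
run;
```

Using IS MISSING or IS NULL avoids having to remember which missing-value notation applies to each variable type.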
CONTAINS Operator
The CONTAINS (?) operator selects observations that include the specified substring.
   The position of the substring within the variable's values is not important.
   The operator is case sensitive when you make comparisons.
Example:
where Job_Title contains 'Rep';
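The symbol form of the operator and one way to make the comparison case insensitive can be sketched as follows. The data set work.titles is made-up test data, and using UPCASE in the WHERE expression is an illustrative choice rather than something these notes prescribe.

```sas
/* Hypothetical test data. */
data work.titles;
   infile datalines truncover;
   input Job_Title $25.;
   datalines;
Sales Rep. II
Sales Manager
Trainee
;
run;

/* ? is the symbol form of CONTAINS; selects Sales Rep. II. */
data work.reps;
   set work.titles;
   where Job_Title ? 'Rep';
run;

/* Uppercasing the value before the comparison removes the   */
/* case sensitivity; the stored values are not changed.      */
data work.reps_anycase;
   set work.titles;
   where upcase(Job_Title) contains 'REP';
run;
```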
5-20 Chapter 5 Reading SAS Data Sets
5.04 Quiz
Which value will not be returned based on the WHERE statement?
where Job_Title contains 'Rep';
a. Office Rep
b. Sales Rep. IV
c. service rep III
d. Representative
LIKE Operator
The LIKE operator selects observations by comparing character values to specified patterns.
There are two special characters available for specifying a pattern:
   A percent sign (%) replaces any number of characters.
   An underscore (_) replaces one character.
A percent sign and an underscore can be specified in the same pattern.
The operator is case sensitive.
5.3 Subsetting Observations and Variables 5-21
LIKE Operator
Examples:
where Name like 'T_M%';
This WHERE statement selects observations that begin with a T, followed by a single character,
followed by an M, followed by any number of characters.
Starting in SAS 9.2, the LIKE operator supports an escape character, which enables you to search for the
percent sign (%) and the underscore (_) characters in values.
An escape character is a single character that in a sequence of characters signifies that what is to follow
takes an alternative meaning. For the LIKE operator, an escape character signifies to search for literal
instances of the % and _ characters in the variable's values instead of performing the special-character
function.
To specify an escape character, you include the character in the pattern-matching expression and then the
keyword ESCAPE followed by the escape character expression. When you include an escape character,
the pattern-matching expression must be enclosed in quotation marks and it cannot contain a column
name. The escape character expression is an expression that evaluates to a single character. The operands
must be character or string literals. If it is a single character, it must be enclosed in quotation marks.
For example, if the variable X contains the values abc, a_b, and axb, the following LIKE operator using
an escape character selects only the value a_b. The escape character (/) specifies that the pattern search
for a '_' instead of matching any single character:
where x like 'a/_b' escape '/';
Without an escape character, the following LIKE operator would select the values a_b and axb. The
special character underscore in the search pattern matches any single character, including the value with
the underscore:
where x like 'a_b';
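The escape-character example can be run as written by first building the three values it describes. This is a sketch: the data set names are hypothetical, and the ESCAPE clause requires SAS 9.2 or later, as noted above.

```sas
/* The three values of X described in the text. */
data work.codes;
   input x $;
   datalines;
abc
a_b
axb
;
run;

/* The escape character makes the underscore literal: only a_b. */
data work.literal;
   set work.codes;
   where x like 'a/_b' escape '/';
run;

/* Without an escape character the underscore is a wildcard:    */
/* a_b and axb are selected.                                     */
data work.wildcard;
   set work.codes;
   where x like 'a_b';
run;
```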
5-22 Chapter 5 Reading SAS Data Sets
5.05 Quiz
Which WHERE statement will return all the observations that have a first name starting with the
letter M for the given values? (Name values have the form last name, first name.)
a. where Name like '_, M_';
c. where Name like '_, M%';
d. where Name like '%, M_';
Business Scenario Part 2
Include only the employees from Australia who have the word Rep in their job title.
data work.subset1;
   set orion.sales;
   where Country='AU' and
         Job_Title contains 'Rep';
run;

Partial SAS Log
NOTE: There were 61 observations read from the data set ORION.SALES.
      WHERE (Country='AU') and Job_Title contains 'Rep';
NOTE: The data set WORK.SUBSET1 has 61 observations and 9 variables.
p105d02
5.3 Subsetting Observations and Variables 5-23
                    First_                                                                Birth_   Hire_
Obs   Employee_ID   Name        Last_Name    Gender   Salary   Job_Title        Country   Date     Date
  1   120121        Irenie      Elvish       F         26600   Sales Rep. II    AU         -5630    5114
  2   120122        Christina   Ngan         F         27475   Sales Rep. II    AU         -1984    6756
  3   120123        Kimiko      Hotstone     F         26190   Sales Rep. I     AU          1732    9405
  4   120124        Lucian      Daymond      M         26480   Sales Rep. I     AU          -233    6999
  5   120125        Fong        Hofmeister   M         32040   Sales Rep. IV    AU         -1852    6999
  6   120126        Satyakam    Denny        M         26780   Sales Rep. II    AU         10490   17014
  7   120127        Sharryn     Clarkson     F         28100   Sales Rep. II    AU          6943   14184
  8   120128        Monica      Kletschkus   F         30890   Sales Rep. IV    AU          9691   17106
  9   120129        Alvin       Roebuck      M         30070   Sales Rep. III   AU          1787    9405
 10   120130        Kevin       Lyon         M         26955   Sales Rep. I     AU          9114   16922
 11   120131        Marinus     Surawski     M         26910   Sales Rep. I     AU          7207   15706
 12   120132        Fancine     Kaiser       F         28525   Sales Rep. III   AU         -3923    6848
 13   120133        Petrea      Soltau       F         27440   Sales Rep. II    AU          9608   17075
 14   120134        Sian        Shannan      M         28015   Sales Rep. II    AU         -3861    5114
 15   120135        Alexei      Platts       M         32490   Sales Rep. IV    AU          3313   13788
p105d02
The DROP and KEEP Statements
The DROP statement specifies the names of the variables to omit from the output data set(s).
DROP variable-list;
The KEEP statement specifies the names of the variables to write to the output data set(s).
KEEP variable-list;
The variable-list specifies the variables to drop or keep, respectively, in the output data set.
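DROP and KEEP are two ways of describing the same result, so the shorter list is usually the more convenient one. The sketch below uses a made-up four-variable data set (work.staff) to show both statements producing the same two output variables.

```sas
/* Hypothetical test data with four variables. */
data work.staff;
   input Employee_ID First_Name $ Salary Country $;
   datalines;
120121 Irenie 26600 AU
120123 Kimiko 26190 AU
;
run;

/* KEEP lists the variables to write to the output data set. */
data work.keep_version;
   set work.staff;
   keep First_Name Salary;
run;

/* DROP lists the variables to omit; the result has the same  */
/* two variables as work.keep_version.                        */
data work.drop_version;
   set work.staff;
   drop Employee_ID Country;
run;
```

Running PROC CONTENTS on both output data sets should show identical variable lists.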
5-24 Chapter 5 Reading SAS Data Sets
Business Scenario Part 2
Include only the employee's first name, last name, salary, job title, and hired date in the data set
Work.subset1.
data work.subset1;
   set orion.sales;
   where Country='AU' and
         Job_Title contains 'Rep';
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
run;

NOTE: There were 61 observations read from the data set ORION.SALES.
      WHERE (Country='AU') and Job_Title contains 'Rep';
NOTE: The data set WORK.SUBSET1 has 61 observations and 5 variables.
p105d03
5.3 Subsetting Observations and Variables 5-25
      First_                                             Hire_
Obs   Name        Last_Name    Salary   Job_Title        Date
  1   Irenie      Elvish        26600   Sales Rep. II     5114
  2   Christina   Ngan          27475   Sales Rep. II     6756
  3   Kimiko      Hotstone      26190   Sales Rep. I      9405
  4   Lucian      Daymond       26480   Sales Rep. I      6999
  5   Fong        Hofmeister    32040   Sales Rep. IV     6999
  6   Satyakam    Denny         26780   Sales Rep. II    17014
  7   Sharryn     Clarkson      28100   Sales Rep. II    14184
  8   Monica      Kletschkus    30890   Sales Rep. IV    17106
  9   Alvin       Roebuck       30070   Sales Rep. III    9405
 10   Kevin       Lyon          26955   Sales Rep. I     16922
 11   Marinus     Surawski      26910   Sales Rep. I     15706
 12   Fancine     Kaiser        28525   Sales Rep. III    6848
p105d03
5-26 Chapter 5 Reading SAS Data Sets
Exercises
Level 1
1. Subsetting Observations and Variables Using the WHERE and KEEP Statements
a. Retrieve and submit the starter program p105e01.
What is the variable name that contains gender values? ________________________________
What are the two possible gender values? ___________________________________________
b. Add a DATA step before the PROC PRINT step to read the data set orion.customer_dim to
create a new data set called Work.youngadult.
c. Modify the PROC PRINT step to refer to the new data set.
d. Submit the program and confirm that Work.youngadult was created with 77 observations and
11 variables.
e. Add a WHERE statement to the DATA step to subset for female customers.
f. Submit the program and confirm that Work.youngadult was created with 30 observations and
11 variables.
g. Modify the WHERE statement to subset for female customers whose Customer_Age is
between 18 and 36.
h. Submit the program and confirm that Work.youngadult was created with 15 observations and
11 variables.
i. Modify the WHERE statement to subset for female 18- to 36-year-old customers who have the
word Gold in their Customer_Group.
j. Submit the program and confirm that Work.youngadult was created with 5 observations and
11 variables.
k. Modify the DATA step so that Work.youngadult contains only Customer_Name,
Customer_Age, Customer_BirthDate, Customer_Gender, and Customer_Group.
l. Submit the program and confirm that Work.youngadult was created with 5 observations and 5
variables.
5.3 Subsetting Observations and Variables 5-27
Level 2
2. Subsetting Observations and Variables Using the WHERE and DROP Statements
a. Write a DATA step to read the data set orion.product_dim to create a new data set called
Work.sports.
Work.sports should include only those observations with Supplier_Country from Great
Britain (GB), Spain (ES), or Netherlands (NL) and Product_Category values that end in the
word Sports.
Partial PROC PRINT Output (First 10 of 30 Observations)
      Product_                                              Supplier_
Obs   Category          Product_Name                        Country
  1   Children Sports   Butch T-Shirt with V-Neck           ES
  2   Children Sports   Children's Knit Sweater             ES
  3   Children Sports   Gordon Children's Tracking Pants    ES
  4   Children Sports   O'my Children's T-Shirt with Logo   ES
  5   Children Sports   Strap Pants BBO                     ES
  6   Indoor Sports     Abdomen Shaper                      NL
  7   Indoor Sports     Fitness Dumbbell Foam 0.90          NL
  8   Indoor Sports     Letour Heart Bike                   NL
  9   Indoor Sports     Letour Trimag Bike                  NL
 10   Indoor Sports     Weight 5.0 Kg                       NL
Level 3
3. Using the SOUNDS-LIKE Operator and the KEEP= Option
a. Write a DATA step to read the data set orion.customer_dim to create a new data set called
ou
Work.tony.
b. Add a WHERE statement to the DATA step to subset the observations with the
Customer_FirstName value that sounds like Tony.
Documentation on the SOUNDS-LIKE operator can be found in the SAS Help and
Documentation from the Index tab by typing sounds-like operator.
c. Add a KEEP= data set option in the SET statement to read only the Customer_FirstName
and Customer_LastName variables.
Documentation on the KEEP= data set option can be found in the SAS Help and
Documentation from the Contents tab. (Select SAS Products ⇒ Base SAS ⇒
SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
SAS Data Set Options ⇒ KEEP= Data Set Option.)
1 Tonie Asmussen
2 Tommy Mcdonald
5.4 Adding Permanent Attributes
Objectives
   Add labels to the descriptor portion of a SAS data set by using the LABEL statement.
   Add formats to the descriptor portion of a SAS data set by using the FORMAT statement.
Business Scenario Syntax
Use the following statements to complete the scenario:
LIBNAME libref 'SAS-data-library';
DATA output-SAS-data-set;                 (Part 1)
   SET input-SAS-data-set;
   WHERE where-expression;                (Part 2)
   KEEP variable-list;
   LABEL variable = 'label'               (Part 3)
         variable = 'label'
         variable = 'label';
   FORMAT variable(s) format;
RUN;
Partial PROC CONTENTS Output
Alphabetic List of Variables and Attributes
#    Variable      Type    Len    Format       Label
1    First_Name    Char     12
5    Hire_Date     Num       8    DDMMYY10.    Date Hired
4    Job_Title     Char     25                 Sales Title
2    Last_Name     Char     18
3    Salary        Num       8    COMMAX8.
Adding Permanent Attributes
When displaying reports,
   a label changes the appearance of a variable name
   a format changes the appearance of a variable value.
Partial PROC PRINT Output
       First_
Obs    Name      Last_Name     Salary    Sales Title      Date Hired
  3    Kimiko    Hotstone      26.190    Sales Rep. I     01/10/1985
  4    Lucian    Daymond       26.480    Sales Rep. I     01/03/1979
  5    Fong      Hofmeister    32.040    Sales Rep. IV    01/03/1979
Label: the column headers Sales Title and Date Hired.
Format: the displayed Salary and Date Hired values.
LABEL variable = 'label'
      variable = 'label'
      variable = 'label';
A label can have as many as 256 characters.
Any number of variables can be associated with labels in a single LABEL statement.
Using a LABEL statement in a DATA step permanently associates labels with variables
by storing the label in the descriptor portion of the SAS data set.
Business Scenario Part 3
Include labels in the descriptor portion of Work.subset1.
data work.subset1;
   set orion.sales;
   where Country='AU' and
         Job_Title contains 'Rep';
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
   label Job_Title='Sales Title'
         Hire_Date='Date Hired';
run;
p105d04
#    Variable      Type    Len    Label
1    First_Name    Char     12
5    Hire_Date     Num       8    Date Hired
4    Job_Title     Char     25    Sales Title
2    Last_Name     Char     18
3    Salary        Num       8
Business Scenario Part 3
In order to use labels in the PRINT procedure, a LABEL
option needs to be added to the PROC PRINT statement.
proc print data=work.subset1 label;
run;
Partial PROC PRINT Output
       First_                                                  Date
Obs    Name         Last_Name     Salary    Sales Title        Hired
  1    Irenie       Elvish         26600    Sales Rep. II       5114
  2    Christina    Ngan           27475    Sales Rep. II       6756
  3    Kimiko       Hotstone       26190    Sales Rep. I        9405
  4    Lucian       Daymond        26480    Sales Rep. I        6999
  5    Fong         Hofmeister     32040    Sales Rep. IV       6999
  6    Satyakam     Denny          26780    Sales Rep. II      17014
  7    Sharryn      Clarkson       28100    Sales Rep. II      14184
  8    Monica       Kletschkus     30890    Sales Rep. IV      17106
  9    Alvin        Roebuck        30070    Sales Rep. III      9405
 10    Kevin        Lyon           26955    Sales Rep. I       16922
data values.
Using a FORMAT statement in a DATA step
permanently associates formats with variables
by storing the format in the descriptor portion
of the SAS data set.
SAS Formats
SAS formats have the following form:
<$>format<w>.<d>
$         indicates a character format.
format    names the SAS format or user-defined format.
w         specifies the total format width, including decimal places and special characters.
d         specifies the number of decimal places in numeric formats.
.         is a required delimiter.
SAS Formats
Selected SAS formats:
Format        Definition
$w.           writes standard character data.
COMMAw.d      writes numeric values with a comma that separates every
              three digits and a period that separates the decimal fraction.
COMMAXw.d     writes numeric values with a period that separates every three
              digits and a comma that separates the decimal fraction.
DOLLARw.d     writes numeric values with a leading dollar sign, a comma that
              separates every three digits, and a period that separates the
              decimal fraction.
EUROXw.d      writes numeric values with a leading euro symbol (€), a period
              that separates every three digits, and a comma that separates
              the decimal fraction.

Format        Stored Value    Displayed Value
$4.           Programming     Prog
12.           27134.2864      27134
12.2          27134.2864      27134.29
COMMA12.2     27134.2864      27,134.29
COMMAX12.2    27134.2864      27.134,29
DOLLAR12.2    27134.2864      $27,134.29
EUROX12.2     27134.2864      €27.134,29
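One way to try these formats is to write a value to the SAS log with PUT statements. The following is a minimal sketch and not one of the course programs; the variable names Name and Amount are hypothetical, and the values are the stored values from the table. Numeric values are right-aligned within the format width, so the log lines can start with blanks.

```sas
data _null_;
   /* Hypothetical variables that hold the stored values from the table */
   Name='Programming';
   Amount=27134.2864;
   /* Each PUT statement writes one line to the SAS log */
   put Name $4.;
   put Amount 12.;
   put Amount 12.2;
   put Amount comma12.2;
   put Amount commax12.2;
   put Amount dollar12.2;
run;
```

Because DATA _NULL_ does not create a data set, nothing is stored; the formats here only affect what is written to the log.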
SAS Formats
If you do not specify a format width that is large enough
to accommodate a numeric value, the displayed value
is automatically adjusted to fit into the width.
Format        Stored Value    Displayed Value
DOLLAR12.2    27134.2864      $27,134.29
DOLLAR9.2     27134.2864      $27134.29
DOLLAR8.2     27134.2864      27134.29
DOLLAR5.2     27134.2864      27134
DOLLAR4.2     27134.2864      27E3
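The width adjustment can be observed by writing the same value with progressively narrower DOLLAR formats. This is a small sketch under the same assumptions as the slide (the stored value 27134.2864); the variable name Amount and the text prefixes are only illustrative.

```sas
data _null_;
   /* Hypothetical variable that holds the stored value from the slide */
   Amount=27134.2864;
   /* The quoted text labels each log line; the format follows the variable */
   put 'DOLLAR12.2: ' Amount dollar12.2;
   put 'DOLLAR9.2:  ' Amount dollar9.2;
   put 'DOLLAR8.2:  ' Amount dollar8.2;
   put 'DOLLAR5.2:  ' Amount dollar5.2;
   put 'DOLLAR4.2:  ' Amount dollar4.2;
run;
```

Comparing the log lines with the table shows which elements (comma, dollar sign, decimals) are dropped first as the width shrinks.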
5.06 Quiz
Which numeric format writes standard numeric data with
leading zeros?
Documentation on formats can be found in the
SAS Help and Documentation from the Contents tab
(SAS Products ⇒ Base SAS ⇒ SAS 9.2 Language
Reference: Dictionary ⇒ Dictionary of Language
Elements ⇒ Formats ⇒ Formats by Category).
Format        Stored Value    Displayed Value
MMDDYY8.      0               01/01/60
MMDDYY10.     0               01/01/1960
DDMMYY6.      365             311260
DDMMYY8.      365             31/12/60
DDMMYY10.     365             31/12/1960
SAS Date Formats
Additional date formats:
Format        Stored Value    Displayed Value
DATE7.        -1              31DEC59
DATE9.        -1              31DEC1959
WORDDATE.     0               January 1, 1960
WEEKDATE.     0               Friday, January 1, 1960
MONYY7.       0               JAN1960
YEAR4.        0               1960
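The date formats can be explored the same way, because a SAS date value is just a number of days relative to January 1, 1960. The sketch below is illustrative only; the variable names Day0, DayBefore, and LastDay are hypothetical.

```sas
data _null_;
   /* 0 is the SAS date value for January 1, 1960 */
   Day0=0;
   put Day0 mmddyy10.;
   put Day0 worddate.;
   put Day0 weekdate.;
   put Day0 monyy7.;
   put Day0 year4.;
   /* -1 is the day before: December 31, 1959 */
   DayBefore=-1;
   put DayBefore date9.;
   /* A date constant; December 31, 1960 is stored as 365 */
   LastDay='31DEC1960'd;
   put LastDay=;
   put LastDay= ddmmyy10.;
run;
```

The two final PUT statements write the same variable twice, first as the stored number and then with the DDMMYY10. format applied.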
5.07 Quiz
Which FORMAT statement creates the output?
a. format Birth_Date Hire_Date ddmmyy9.
          Term_Date mmyy7.;
b. format Birth_Date Hire_Date ddmmyyyy.
          Term_Date mmmyyyy.;
c. format Birth_Date Hire_Date ddmmyy10.
          Term_Date monyy7.;
Output
Birth_Date    Hire_Date     Term_Date
21/05/1969    15/10/1992    MAR2007
              German_Germany          01. Januar 1960
NLDATEMNw.    English_UnitedStates    January
              German_Germany          Januar
NLDATEWw.     English_UnitedStates    Fri, Jan 01, 60
              German_Germany          Fr, 01. Jan 60
NLDATEWNw.    English_UnitedStates    Friday
              German_Germany          Freitag
p105d05
National Language Support (NLS) is a set of features that enable a software product to function
properly in every global market for which the product is targeted. SAS contains NLS features to
ensure that SAS applications conform to local language conventions.
A locale reflects the language, local conventions, and culture for a geographical region. Local
conventions might include specific formatting rules for dates. Dates have many representations,
depending on the conventions that are accepted in a culture.
The LOCALE= system option is used to specify the locale, which reflects the local conventions,
language, and culture of a geographical region. For example, a locale value of English_Canada
represents the country of Canada with a language of English, and a locale value of French_Canada
represents the country of Canada with a language of French. English_UnitedStates represents
the country of United States with a language of English. German_Germany represents the country
of Germany with a language of German.
The LOCALE= system option can be specified in a configuration file, at SAS invocation, in the
OPTIONS statement, or in the SAS System Options window.
For more information, refer to the SAS 9.2 National Language Support Reference Guide in SAS Help
and Documentation.
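As a rough sketch of how the locale affects the NLS date formats, the following program sets LOCALE= in an OPTIONS statement and writes the same date value with several NLDATE formats. The variable name Day0 is hypothetical, and resetting the option to English_UnitedStates at the end assumes that this was the session's original locale.

```sas
/* Switch the session to German conventions */
options locale=German_Germany;

data _null_;
   /* 0 is the SAS date value for January 1, 1960 */
   Day0=0;
   put Day0 nldate.;
   put Day0 nldatemn.;
   put Day0 nldatew.;
   put Day0 nldatewn.;
run;

/* Assumption: the session originally used English_UnitedStates */
options locale=English_UnitedStates;
```

Submitting the DATA step once under each locale makes it easy to compare the log lines with the table of NLDATE examples.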
5.08 Quiz
How many date and time formats start with EUR?
Business Scenario Part 3
Include formats in the descriptor portion of Work.subset1.
data work.subset1;
   set orion.sales;
   where Country='AU' and
         Job_Title contains 'Rep';
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
   label Job_Title='Sales Title'
         Hire_Date='Date Hired';
   format Salary commax8. Hire_Date ddmmyy10.;
run;
p105d06
#    Variable      Type    Len    Format       Label
1    First_Name    Char     12
5    Hire_Date     Num       8    DDMMYY10.    Date Hired
4    Job_Title     Char     25                 Sales Title
2    Last_Name     Char     18
3    Salary        Num       8    COMMAX8.
Business Scenario Part 3
proc print data=work.subset1 label;
run;
Partial PROC PRINT Output
       First_
Obs    Name         Last_Name     Salary    Sales Title       Date Hired
  1    Irenie       Elvish        26.600    Sales Rep. II     01/01/1974
  2    Christina    Ngan          27.475    Sales Rep. II     01/07/1978
  3    Kimiko       Hotstone      26.190    Sales Rep. I      01/10/1985
  4    Lucian       Daymond       26.480    Sales Rep. I      01/03/1979
  5    Fong         Hofmeister    32.040    Sales Rep. IV     01/03/1979
  6    Satyakam     Denny         26.780    Sales Rep. II     01/08/2006
  7    Sharryn      Clarkson      28.100    Sales Rep. II     01/11/1998
  8    Monica       Kletschkus    30.890    Sales Rep. IV     01/11/2006
  9    Alvin        Roebuck       30.070    Sales Rep. III    01/10/1985
 10    Kevin        Lyon          26.955    Sales Rep. I      01/05/2006
 11    Marinus      Surawski      26.910    Sales Rep. I      01/01/2003
 12    Fancine      Kaiser        28.525    Sales Rep. III    01/10/1978
Exercises
Level 1
Notice the format and labels stored in the descriptor portion of Work.youngadult.
b. Add a LABEL statement and a FORMAT statement to the DATA step to create the following
PROC PRINT report:
                                                                                 Customer
Obs    Gender    Customer Name         Date of Birth        Member Level                  Age
  1    F         Sandrina Stephano     July 9, 1979         Orion Club Gold members        28
  2    F         Cornelia Krahl        February 27, 1974    Orion Club Gold members        33
  3    F         Dianne Patchin        May 6, 1979          Orion Club Gold members        28
  4    F         Annmarie Leveille     July 16, 1984        Orion Club Gold members        23
  5    F         Sanelisiwe Collier    July 7, 1988         Orion Club Gold members        19
The labels need to be changed for Customer_Gender, Customer_BirthDate, and
Customer_Group.
The format needs to be changed for Customer_BirthDate.
Hint: Do not forget the option in the PROC PRINT step that enables the labels to appear.
Why do Customer_Name and Customer_Age appear with a space in the column header
but do not need labels? __________________________________________________________
Level 2
5. Adding Permanent Attributes to Work.sports
Variable Label
Product_Category Sports Category
Product_Name Product Name (Abbrev)
c. Add a FORMAT statement to the DATA step to display only the first 15 letters of
Product_Name and Supplier_Name.
  1    Children Sports    Butch T-Shirt w    ES    Luna sastreria
  2    Children Sports    Children's Knit    ES    Luna sastreria
  3    Children Sports    Gordon Children    ES    Luna sastreria
  4    Children Sports    O'my Children's    ES    Luna sastreria
  5    Children Sports    Strap Pants BBO    ES    Sportico
  6    Indoor Sports      Abdomen Shaper     NL    TrimSport B.V.
  7    Indoor Sports      Fitness Dumbbel    NL    TrimSport B.V.
  8    Indoor Sports      Letour Heart Bi    NL    TrimSport B.V.
  9    Indoor Sports      Letour Trimag B    NL    TrimSport B.V.
 10    Indoor Sports      Weight 5.0 Kg      NL    TrimSport B.V.
e. Add a PROC CONTENTS step to the end of the program to verify that the labels and formats
are stored in the descriptor portion.
Level 3
6. Using the $UPCASEw. Format and the SPLIT= Option
a. Retrieve the starter program p105e06.
b. Add a FORMAT statement to display Customer_FirstName and Customer_LastName in
uppercase values.
Documentation on the $UPCASEw. format can be found in the SAS Help and
Documentation from the Contents tab (SAS Products ⇒ Base SAS
c. Add a LABEL statement to add the following labels:
Variable Label
Customer_FirstName CUSTOMER*FIRST NAME
d. In the PROC PRINT statement, replace the LABEL option with the SPLIT= option and reference
the asterisk as the split character.
Documentation on the SPLIT= option can be found in the SAS Help and Documentation
from the Contents tab (SAS Products ⇒ Base SAS ⇒
Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The PRINT Procedure).
CUSTOMER CUSTOMER
Obs FIRST NAME LAST NAME
1 TONIE ASMUSSEN
2 TOMMY MCDONALD
Chapter Review
1. What statement is used to read from a SAS data set in the DATA step?
2. What statement is used to write to a SAS data set in the DATA step?
3. What does the WHERE statement do?
4. What are examples of logical operators?
5. How can you limit the variables written to an output data set in the DATA step?
5.6 Solutions
Solutions to Exercises
1. Subsetting Observations and Variables Using the WHERE and KEEP Statements
a. Retrieve and submit the program.
What is the variable name that contains gender values? Customer_Gender
b. Add a DATA step.
data work.youngadult;
   set orion.customer_dim;
run;

proc print data=orion.customer_dim;
run;
c. Modify the PROC PRINT step.
proc print data=work.youngadult;
run;
d. Submit the program.
e. Add a WHERE statement to subset for female customers.
data work.youngadult;
   set orion.customer_dim;
   where Customer_Gender='F';
run;
f. Submit the program.
g. Modify the WHERE statement to subset for female 18- to 36-year-old customers.
data work.youngadult;
   set orion.customer_dim;
   where Customer_Gender='F' and
         Customer_Age between 18 and 36;
run;
h. Submit the program.
i. Modify the WHERE statement to subset for female 18- to 36-year-old customers who have the
word Gold in their Customer_Group.
data work.youngadult;
set orion.customer_dim;
where Customer_Gender='F' and
Customer_Age between 18 and 36 and
Customer_Group contains 'Gold';
run;
j. Submit the program.
k. Keep only five variables.
data work.youngadult;
   set orion.customer_dim;
   where Customer_Gender='F' and
         Customer_Age between 18 and 36 and
         Customer_Group contains 'Gold';
   keep Customer_Name Customer_Age Customer_BirthDate
        Customer_Gender Customer_Group;
run;
l. Submit the program.
2. Subsetting Observations and Variables Using the WHERE and DROP Statements
a. Write a DATA step.
data work.sports;
   set orion.product_dim;
   where Supplier_Country in ('GB','ES','NL') and
         Product_Category like '%Sports';
   drop Product_ID Product_Line Product_Group
        Supplier_Name Supplier_ID;
run;
b. Write a PROC PRINT step.
proc print data=work.sports;
run;
3. Using the SOUNDS-LIKE Operator and the KEEP= Option
a. Write a DATA step.
data work.tony;
set orion.customer_dim;
run;
b. Add a WHERE statement to the DATA step.
data work.tony;
set orion.customer_dim;
where Customer_FirstName =* 'Tony';
run;
a. Retrieve and submit the starter program.
b. Add a LABEL statement and a FORMAT statement to the DATA step.
data work.youngadult;
   set orion.customer_dim;
   where Customer_Gender='F' and
         Customer_Age between 18 and 35 and
         Customer_Group contains 'Gold';
   keep Customer_Name Customer_Age Customer_BirthDate
        Customer_Gender Customer_Group;
   label Customer_Gender='Gender'
         Customer_BirthDate='Date of Birth'
         Customer_Group='Member Level';
   format Customer_BirthDate worddate.;
run;

proc contents data=work.youngadult;
run;

proc print data=work.youngadult label;
run;
Why do Customer_Name and Customer_Age appear with a space in the column header but
do not need labels? These variables already have permanent labels assigned in the data set
descriptor portion.
5. Adding Permanent Attributes to Work.sports
b. Add a LABEL statement to the DATA step and a LABEL option to PROC PRINT.
data work.sports;
set orion.product_dim;
where Supplier_Country in ('GB','ES','NL') and
Product_Category like '%Sports';
drop Product_ID Product_Line Product_Group Supplier_ID;
label Product_Category='Sports Category'
Product_Name='Product Name (Abbrev)'
Supplier_Name='Supplier Name (Abbrev)';
run;

proc print data=work.sports label;
run;
c. Add a FORMAT statement to the DATA step.
data work.sports;
   set orion.product_dim;
   where Supplier_Country in ('GB','ES','NL') and
         Product_Category like '%Sports';
   drop Product_ID Product_Line Product_Group Supplier_ID;
   label Product_Category='Sports Category'
         Product_Name='Product Name (Abbrev)'
         Supplier_Name='Supplier Name (Abbrev)';
   format Product_Name Supplier_Name $15.;
run;
e. Add a PROC CONTENTS step.
proc contents data=work.sports;
run;
6. Using the $UPCASEw. Format and the SPLIT= Option
a. Retrieve the starter program.
b. Add a FORMAT statement.
data work.tony;
set orion.customer_dim(keep=Customer_FirstName Customer_LastName);
where Customer_FirstName =* 'Tony';
format Customer_FirstName Customer_LastName $upcase.;
run;
run;
e. Submit the program.
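For reference, steps b through d of this exercise could be combined along the following lines. This is a hedged sketch rather than the course's printed solution: the label for Customer_LastName is an assumption inferred from the column header in the expected report (CUSTOMER over LAST NAME), because only the Customer_FirstName label appears in the exercise text.

```sas
data work.tony;
   set orion.customer_dim(keep=Customer_FirstName Customer_LastName);
   where Customer_FirstName =* 'Tony';
   format Customer_FirstName Customer_LastName $upcase.;
   /* The asterisk marks where the column header should break */
   /* Assumption: the Customer_LastName label mirrors the first-name label */
   label Customer_FirstName='CUSTOMER*FIRST NAME'
         Customer_LastName='CUSTOMER*LAST NAME';
run;

/* SPLIT= names the split character and also causes labels to be used */
proc print data=work.tony split='*';
run;
```

With SPLIT= specified, the LABEL option is not needed in the PROC PRINT statement.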
True
False
data work.mycustomers;
   set orion.customer;
run;

proc print data=work.mycustomers;
   var Customer_ID Customer_Name
       Customer_Address;
run;
5.03 Quiz – Correct Answer
Which WHERE statement correctly subsets the numeric
values for May, June, or July and missing character
names?
a. where Months in (5-7)
         and Names = .;
b. where Months in (5,6,7)
         and Names = ' ';
c. where Months in ('5','6','7')
         and Names = '.';
The correct answer is b.
where Job_Title contains 'Rep';
5.05 Quiz – Correct Answer
Which WHERE statement will return all the observations
that have a first name starting with the letter M for the
given values?
where Name like '_, M_';
a. Answer
where Name like '%, M%';
b. Answer
where Name like '%, M_';
d. Answer
first name
last name
Zw.d
of blanks.
5.07 Quiz – Correct Answer
Which FORMAT statement creates the output?
a. format Birth_Date Hire_Date ddmmyy9.
          Term_Date mmyy7.;
b. format Birth_Date Hire_Date ddmmyyyy.
          Term_Date mmmyyyy.;
c. format Birth_Date Hire_Date ddmmyy10.
          Term_Date monyy7.;
Output
Birth_Date    Hire_Date     Term_Date
21/05/1969    15/10/1992    MAR2007
The correct answer is c.
Nine
Example:
The EURDFDDw. format writes international SAS
date values in the form dd.mm.yy or dd.mm.yyyy.
Chapter Review Answers
2. What statement is used to write to a SAS data set in the DATA step?
   DATA statement
3. What does the WHERE statement do?
   The WHERE statement subsets observations that meet a certain condition.
4. What are examples of logical operators?
   AND
   OR
   NOT
5. How can you limit the variables written to an output data set in the DATA step?
   DROP or KEEP statement
Chapter 6 Reading Excel Worksheets
      Exercises
6.2   Doing More with Excel Worksheets (Self-Study)
      Exercises
6.3   Chapter Review
6.4   Solutions
      Solutions to Exercises
      Solutions to Student Activities (Polls/Quizzes)
      Solutions to Chapter Review
6.1 Using Excel Data as Input
Objectives
   Use the DATA step to create a SAS data set from an Excel worksheet.
   Use the SAS/ACCESS LIBNAME statement to read from an Excel worksheet as though it were
   a SAS data set.
Business Scenario
An existing data source contains information on
Orion Star sales employees from Australia and
the United States.
A new SAS data set needs to be created that contains
a subset of this existing data source.
This new SAS data set must contain the following:
   only the employees from Australia who are Sales Representatives
   the employee's first name, last name, salary, job title, and hired date
   labels and formats in the descriptor portion
Business Scenario
Reading SAS Data Sets
   libname ;
   data ;
      set ;
      ...
   run;
Reading Excel Worksheets
   libname ;
   data ;
      set ;
      ...
   run;
Reading Delimited Raw Data Files
   data ;
      infile ;
      input ;
      ...
   run;
The LIBNAME statement references a SAS data library when reading a SAS data set
and an Excel workbook when reading an Excel worksheet.
DATA output-SAS-data-set;
   SET input-SAS-data-set;
   WHERE where-expression;
   KEEP variable-list;
   LABEL variable = 'label'
         variable = 'label'
         variable = 'label';
   FORMAT variable(s) format;
RUN;
sales.xls: an Excel workbook with two worksheets and cells formatted as dates.
Example:
libname orion 's:\workshop';
   orion            libref
   's:\workshop'    physical location of SAS data library
The LIBNAME statement needs to reference a SAS data library specific to your operating environment.
This enables you to reference worksheets directly in
a DATA step or SAS procedure, and to read from and
write to a Microsoft Excel worksheet as though it were
a SAS data set.
SAS/ACCESS options can be used in the LIBNAME statement.
For example,
MIXED=YES | NO
   specifies whether to convert numeric data values into character data values for a column
   The default is NO, which means that numeric data will be imported as missing values in a
   character column. If MIXED=YES, the engine assigns a SAS character type for the column
   and converts all numeric data values to character data.
The following Technical Support Usage Note addresses column data that is imported as missing:
http://support.sas.com/kb/6/123.html
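As an illustration of where such an option goes, the sketch below adds MIXED=YES to a LIBNAME statement that points at an Excel workbook. The libref and path are taken from the course example for sales.xls; whether any column in that workbook actually contains mixed data is not stated, so this only shows the placement of the option.

```sas
/* MIXED=YES follows the workbook path in the LIBNAME statement */
libname orionxls 's:\workshop\sales.xls' mixed=yes;

/* List the worksheets and their column attributes */
proc contents data=orionxls._all_;
run;

/* Release the workbook so that it can be opened in Excel again */
libname orionxls clear;
```

Comparing the Type column of the PROC CONTENTS output with and without the option is one way to see its effect on a column that mixes numbers and text.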
Example:
libname orionxls 's:\workshop\sales.xls';
   orionxls                     libref
   's:\workshop\sales.xls'      physical file name of the Excel workbook
p106d01
SAS/ACCESS Interface to PC File Formats enables you to read data from PC files, to use that data in
SAS reports or applications, and to use SAS data sets to create PC files in various formats.
SAS/ACCESS Interface to PC File Formats gives access to Microsoft Excel, Microsoft Access, dBase,
JMP, Lotus 1-2-3, SPSS, Stata, and Paradox.
To determine if you have a license for SAS/ACCESS Interface to PC File Formats, submit the following
step:
proc setinit;
run;
After submitting, look in the SAS log for the products that are licensed for your site.
SAS/ACCESS Interface to PC File Formats on Linux and UNIX enables access to PC files from
the Linux and UNIX operating environments. A PC Files Server is used to access the PC data.
The PC Files Server runs on a Microsoft Windows server, and SAS/ACCESS Interface to PC File
Formats runs on a Linux or UNIX client server.
Worksheet names appear with a dollar sign at the end of the name.
The CONTENTS Procedure
proc contents data=orionxls._all_;
run;
The CONTENTS Procedure
Directory
Libref           ORIONXLS
Engine           EXCEL
Physical Name    sales.xls
User             Admin

                         DBMS
     Member    Member    Member
#    Name      Type      Type
#    Variable       Type    Len    Format    Informat    Label
8    Birth_Date     Num       8    DATE9.    DATE9.      Birth Date
7    Country        Char      2    $2.       $2.         Country
1    Employee_ID    Num       8                          Employee ID
2    First_Name     Char     10    $10.      $10.        First Name
4    Gender         Char      1    $1.       $1.         Gender
9    Hire_Date      Num       8    DATE9.    DATE9.      Hire Date
6    Job_Title      Char     14    $14.      $14.        Job Title
3    Last_Name      Char     12    $12.      $12.        Last Name
5    Salary         Num       8                          Salary
The Excel LIBNAME engine converts worksheet dates to SAS date values and assigns
the DATE9. format.
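The DATE9. format assigned by the engine is only a default. As a small sketch, a FORMAT statement in a PROC step can override it for that step alone; this assumes that the orionxls libref from the earlier LIBNAME example is still assigned, and the choice of variables and of MMDDYY10. is purely illustrative.

```sas
/* Assumes: libname orionxls 's:\workshop\sales.xls'; was submitted earlier */
proc print data=orionxls.'Australia$'n;
   var First_Name Last_Name Hire_Date;
   /* Temporary format: applies to this PROC PRINT step only */
   format Hire_Date mmddyy10.;
run;
```

Because the FORMAT statement is in a PROC step rather than a DATA step, nothing is changed in the worksheet or in any descriptor portion.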
The CONTENTS Procedure
Data Set Name          ORIONXLS.'UnitedStates$'n    Observations            .
Member Type            DATA                         Variables               9
Engine                 EXCEL                        Indexes                 0
Created                .                            Observation Length      0
Last Modified          .                            Deleted Observations    0
Protection                                          Compressed              NO
Data Set Type                                       Sorted                  NO
Label
Data Representation    Default
Encoding               Default
Alphabetic List of Variables and Attributes
orionxls.'Australia$'n
The PRINT Procedure
proc print data=orionxls.'Australia$'n;
run;
Partial PROC PRINT Output
       Employee_                                                                        Birth_
Obs    ID           First_Name    Last_Name     Gender    Salary    Job_Title         Country    Date         Hire_Date
  1    120102       Tom           Zhou          M         108255    Sales Manager     AU         11AUG1969    01JUN1989
  2    120103       Wilson        Dawes         M          87975    Sales Manager     AU         22JAN1949    01JAN1974
  3    120121       Irenie        Elvish        F          26600    Sales Rep. II     AU         02AUG1944    01JAN1974
  4    120122       Christina     Ngan          F          27475    Sales Rep. II     AU         27JUL1954    01JUL1978
  5    120123       Kimiko        Hotstone      F          26190    Sales Rep. I      AU         28SEP1964    01OCT1985
  6    120124       Lucian        Daymond       M          26480    Sales Rep. I      AU         13MAY1959    01MAR1979
  7    120125       Fong          Hofmeister    M          32040    Sales Rep. IV     AU         06DEC1954    01MAR1979
  8    120126       Satyakam      Denny         M          26780    Sales Rep. II     AU         20SEP1988    01AUG2006
  9    120127       Sharryn       Clarkson      F          28100    Sales Rep. II     AU         04JAN1979    01NOV1998
 10    120128       Monica        Kletschkus    F          30890    Sales Rep. IV     AU         14JUL1986    01NOV2006
 11    120129       Alvin         Roebuck       M          30070    Sales Rep. III    AU         22NOV1964    01OCT1985
 12    120130       Kevin         Lyon          M          26955    Sales Rep. I      AU         14DEC1984    01MAY2006
 13    120131       Marinus       Surawski      M          26910    Sales Rep. I      AU         25SEP1979    01JAN2003
 14    120132       Fancine       Kaiser        F          28525    Sales Rep. III    AU         05APR1949    01OCT1978
 15    120133       Petrea        Soltau        F          27440    Sales Rep. II     AU         22APR1986    01OCT2006
6.01 Quiz
Which PROC PRINT step displays the worksheet
containing employees from the United States?
c. proc print data=orionxls.'UnitedStates'n;
   run;
d. proc print data=orionxls.'UnitedStates$'n;
   run;
Business Scenario
Create a temporary SAS data set named Work.subset2
from the Excel workbook named sales.xls.
libname orionxls 's:\workshop\sales.xls';

data work.subset2;
   set orionxls.'Australia$'n;
   where Job_Title contains 'Rep';
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
   label Job_Title='Sales Title'
         Hire_Date='Date Hired';
   format Salary comma10. Hire_Date weekdate.;
run;
p106d02
Business Scenario
proc contents data=work.subset2;
run;
#    Variable      Type    Len    Format       Informat    Label
1    First_Name    Char     10    $10.         $10.        First Name
5    Hire_Date     Num       8    WEEKDATE.    DATE9.      Date Hired
4    Job_Title     Char     14    $14.         $14.        Sales Title
2    Last_Name     Char     12    $12.         $12.        Last Name
3    Salary        Num       8    COMMA10.                 Salary
Business Scenario
proc print data=work.subset2 label;
run;
Partial PROC PRINT Output
Obs    First Name    Last Name     Salary    Sales Title       Date Hired
  1    Irenie        Elvish        26,600    Sales Rep. II     Tuesday, January 1, 1974
  2    Christina     Ngan          27,475    Sales Rep. II     Saturday, July 1, 1978
  3    Kimiko        Hotstone      26,190    Sales Rep. I      Tuesday, October 1, 1985
  4    Lucian        Daymond       26,480    Sales Rep. I      Thursday, March 1, 1979
  5    Fong          Hofmeister    32,040    Sales Rep. IV     Thursday, March 1, 1979
  6    Satyakam      Denny         26,780    Sales Rep. II     Tuesday, August 1, 2006
  7    Sharryn       Clarkson      28,100    Sales Rep. II     Sunday, November 1, 1998
  8    Monica        Kletschkus    30,890    Sales Rep. IV     Wednesday, November 1, 2006
  9    Alvin         Roebuck       30,070    Sales Rep. III    Tuesday, October 1, 1985
 10    Kevin         Lyon          26,955    Sales Rep. I      Monday, May 1, 2006
 11    Marinus       Surawski      26,910    Sales Rep. I      Wednesday, January 1, 2003
 12    Fancine       Kaiser        28,525    Sales Rep. III    Sunday, October 1, 1978
Disassociating a Libref
If SAS has a libref assigned to an Excel workbook, the
workbook cannot be opened in Excel. To disassociate a
libref, use a LIBNAME statement and specify the libref
and the CLEAR option.
libname orionxls 's:\workshop\sales.xls';

data work.subset2;
   set orionxls.'Australia$'n;
   ...
run;

libname orionxls clear;
SAS disconnects from the data source and closes any
resources that are associated with that libref's connection.
1. Submit the following program except for the last LIBNAME statement.
libname orionxls 'sales.xls';

data work.subset2;
   set orionxls.'Australia$'n;
   where Job_Title contains 'Rep';
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
   label Job_Title='Sales Title'
         Hire_Date='Date Hired';
   format Salary comma10. Hire_Date weekdate.;
run;

proc contents data=work.subset2;
run;

proc print data=work.subset2 label;
run;

libname orionxls clear;
2. Review the PROC CONTENTS and PROC PRINT results in the Output window.
3. Select the Explorer tab on the SAS window bar to activate the SAS Explorer or select
View ⇒ Contents Only.
5. Double-click on the Orionxls library to show all Excel worksheets of that library.
6. Submit the last LIBNAME statement to disassociate the libref.
Exercises
Level 1
b. Add a LIBNAME statement before the PROC CONTENTS step to create a libref called
CUSTFM that references the Excel workbook named custfm.xls.
c. Submit the LIBNAME statement and the PROC CONTENTS step to create the following partial
PROC CONTENTS output:
Page 1 of 3
The CONTENTS Procedure
Directory
Libref    CUSTFM
Engine    EXCEL
User      Admin
                        DBMS
              Member    Member
#  Name       Type      Type
1  Females$   DATA      TABLE
2  Males$     DATA      TABLE
d. Add a SET statement in the DATA step to read the worksheet containing the male data.
e. Add a KEEP statement in the DATA step to include only the First_Name, Last_Name, and
Birth_Date variables.
f. Add a FORMAT statement in the DATA step to display the Birth_Date as a four-digit year.
g. Add a LABEL statement to change the column header of Birth_Date to Birth Year.
h. Submit the program including the last LIBNAME statement and create the following PROC
PRINT report:
Partial PROC PRINT Output (First 5 of 47 Observations)
Birth
Obs First Name Last Name Year
Level 2
c. Submit the program to determine the names of the four worksheets in products.xls.
d. Write a DATA step to read the worksheet containing sports data to create a new data set called
Work.golf.
The data set Work.golf should
• include only the observations where Category is equal to Golf
• not include the Category variable
• include a label of Golf Products for the Name variable.
e. Write a LIBNAME statement to clear the PROD libref.
f. Write a PROC PRINT step to create the following report:
Partial PROC PRINT Output (First 10 of 56 Observations)
Obs  Golf Products
  1  Ball Bag
  2  Red/White/Black Staff 9 Bag
  3  Tee Holder
  4  Bb Softspikes - Xp 22-pack
  5  Bretagne Performance Tg Men's Golf Shoes L.
  6  Bretagne Soft-Tech Men's Glove, left
  7  Bretagne St2 Men's Golf Glove, left
  8  Bretagne Stabilites 2000 Goretex Shoes
  9  Bretagne Stabilities Tg Men's Golf Shoes
 10  Bretagne Stabilities Women's Golf Shoes
Level 3
Documentation on the appropriate option can be found in the SAS Help and
Documentation from the Contents tab (SAS Products > SAS/ACCESS >
SAS/ACCESS 9.2 for PC Files: Reference > LIBNAME Statement and
Pass-Through Facility on 32-Bit Microsoft Windows > File-Specific Reference >
Microsoft Excel Workbook Files).
b. Write a PROC CONTENTS step to view all of the contents of XLSDATA.
c. Submit the program. Any member not containing a dollar sign in the name is an Excel range.
d. Write a DATA step to read the range containing Germany (DE) data to create a new data set
called Work.germany. Add appropriate labels and formats based on the desired report.
Obs  Customer ID  Country  Gender  First Name   Last Name  Birth Date
  1            9  DE       F       Cornelia     Krahl      27/02/74
  2           11  DE       F       Elke         Wallstab   16/08/74
  3           13  DE       M       Markus       Sepke      21/07/88
  4           16  DE       M       Ulrich       Heyde      16/01/39
  5           19  DE       M       Oliver S.    Füßling    23/02/64
  6           33  DE       M       Rolf         Robak      24/02/39
  7           42  DE       M       Thomas       Leitmann   09/02/79
  8           50  DE       M       Gert-Gunter  Mendler    16/01/34
  9           61  DE       M       Carsten      Maestrini  08/07/44
 10           65  DE       F       Ines         Deisser    20/07/69
Objectives
Use the DATA step to create an Excel worksheet
from a SAS data set.
Use the COPY procedure to create an Excel
worksheet from a SAS data set.
Use the IMPORT Wizard and procedure to read
an Excel worksheet.
Use the EXPORT Wizard and procedure to create
an Excel worksheet.
Creating Excel Worksheets
In addition to reading an Excel worksheet, the
SAS/ACCESS LIBNAME statement with the DATA
step can be used to create an Excel worksheet.
libname orionxls
   's:\workshop\qtr2007a.xls';
data orionxls.qtr1_2007;
   set orion.qtr1_2007;
run;
data orionxls.qtr2_2007;
   set orion.qtr2_2007;
run;
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 22 observations read from the data set ORION.QTR1_2007.
NOTE: The data set ORIONXLS.qtr1_2007 has 22 observations and 5 variables.
74   data orionxls.qtr2_2007;
75      set orion.qtr2_2007;
76   run;
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 36 observations read from the data set ORION.QTR2_2007.
NOTE: The data set ORIONXLS.qtr2_2007 has 36 observations and 6 variables.
Creating Excel Worksheets
Partial PROC CONTENTS Output
Directory
Libref          ORIONXLS
Engine          EXCEL
Physical Name   qtr2007a.xls
User            Admin
                            DBMS
                  Member    Member
#  Name           Type      Type
1  qtr1_2007      DATA      TABLE    (named range)
2  qtr1_2007$     DATA      TABLE    (worksheet)
3  qtr2_2007      DATA      TABLE    (named range)
4  qtr2_2007$     DATA      TABLE    (worksheet)
Creating Excel Worksheets
As an alternative to the DATA step, the COPY procedure
can be used to create an Excel worksheet.
libname orionxls
   's:\workshop\qtr2007b.xls';
proc copy in=orion out=orionxls;
   select qtr1_2007 qtr2_2007;
run;
proc contents data=orionxls._all_;
run;
libname orionxls clear;
p106d03
6.2 Doing More with Excel Worksheets (Self-Study) 6-23
NOTE: The data set ORIONXLS.QTR1_2007 has 22 observations and 5 variables.
NOTE: Copying ORION.QTR2_2007 to ORIONXLS.QTR2_2007 (memtype=DATA).
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 36 observations read from the data set ORION.QTR2_2007.
NOTE: The data set ORIONXLS.QTR2_2007 has 36 observations and 6 variables.
Import/Export Wizards and Procedures
The Import/Export Wizards and IMPORT/EXPORT
procedures enable you to read and write data between
SAS data sets and external PC files.
The Import/Export Wizards and procedures are part of
Base SAS and enable access to delimited files. If you
have a license to SAS/ACCESS Interface to PC File
Formats, you can also access Microsoft Excel, Microsoft
Access, dBASE, JMP, Lotus 1-2-3, SPSS, Stata, and
Paradox files.
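As an illustration of the Base SAS case described above, the following minimal sketch reads a delimited file with PROC IMPORT. It assumes a comma-delimited file named sales.csv with no header row in the current directory; the output name work.sales_csv is hypothetical.

```sas
/* Sketch only: the file location and the output data set name are assumptions. */
proc import datafile='sales.csv'
            out=work.sales_csv
            dbms=csv replace;
   getnames=no;   /* no header row, so SAS names the variables VAR1, VAR2, ... */
run;

proc print data=work.sales_csv;
run;
```

Because DBMS=CSV is handled by Base SAS, a step like this should not require the SAS/ACCESS Interface to PC File Formats license.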
select File and Import Data or
Export Data.
The Import Wizard
The Import Wizard enables you to read data from an
external data source and write it to a SAS data set.
Steps of the Import Wizard:
1. Select the type of file you are importing.
2. Locate the input file.
3. Select the table range or worksheet from which to
import data.
4. Select a location to store the imported file.
5. Save the generated PROC IMPORT code. (Optional)
The Import Wizard
2. Locate the input file.
The Import Wizard
4. Select a location to store the imported file.
The Import Wizard
SAS Log
NOTE: WORK.SUBSET2A data set was successfully created.
proc print data=work.subset2a;
run;
Partial PROC PRINT Output
Obs  Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title      Country  Birth_Date  Hire_Date
  1       120102  Tom         Zhou       M       108255  Sales Manager  AU       11AUG1969   01JUN1989
  2       120103  Wilson      Dawes      M        87975  Sales Manager  AU       22JAN1949   01JAN1974
  3       120121  Irenie      Elvish     F        26600  Sales Rep. II  AU       02AUG1944   01JAN1974
  4       120122  Christina   Ngan       F        27475  Sales Rep. II  AU       27JUL1954   01JUL1978
  5       120123  Kimiko      Hotstone   F        26190  Sales Rep. I   AU       28SEP1964   01OCT1985
p106d04
#  Variable     Type  Len  Format  Informat  Label
8  Birth_Date   Num     8  DATE9.  DATE9.    Birth Date
7  Country      Char    2  $2.     $2.       Country
1  Employee_ID  Num     8                    Employee ID
2  First_Name   Char   10  $10.    $10.      First Name
4  Gender       Char    1  $1.     $1.       Gender
9  Hire_Date    Num     8  DATE9.  DATE9.    Hire Date
6  Job_Title    Char   14  $14.    $14.      Job Title
3  Last_Name    Char   12  $12.    $12.      Last Name
5  Salary       Num     8                    Salary
USEDATE=YES;
SCANTIME=YES;
RUN;
p106d04a
OUT=<libref.>SAS-data-set
identifies the output SAS data set.
DATAFILE="filename"
specifies the complete path and filename or a fileref for the input PC file, spreadsheet,
or delimited external file.
DBMS=identifier
specifies the type of data to import. To import a DBMS table, you must specify DBMS= using
a valid database identifier. For example, DBMS=EXCEL specifies to import a Microsoft Excel
worksheet.
REPLACE
overwrites an existing SAS data set. If you do not specify REPLACE, PROC IMPORT does
not overwrite an existing data set.
RANGE="range-name | absolute-range"
subsets a spreadsheet by identifying the rectangular set of cells to import from the specified
spreadsheet.
GETNAMES=YES | NO
for spreadsheets and delimited external files, determines whether to generate SAS variable
names from the column names in the input file's first row of data. Note that if a column name
contains special characters that are not valid in a SAS name, such as a blank, SAS converts
the character to an underscore.
MIXED=YES | NO
converts numeric data values into character data values for a column that contains mixed data
types. The default is NO, which means that numeric data will be imported as missing values in
a character column. If MIXED=YES, then the engine will assign a SAS character type for the
column and convert all numeric data values to character data values.
SCANTEXT=YES | NO
scans the length of text data for a data source column and uses the longest string data that is
found as the SAS column width.
USEDATE=YES | NO
specifies which format to use. If USEDATE=YES, then the DATE. format is used for date/time
columns in the data source table while importing data from an Excel workbook. If
USEDATE=NO, then a DATETIME. format is used for date/time.
SCANTIME=YES | NO
scans all row values for a DATETIME data type field and automatically determines the TIME
data type if only time values (that is, no date or datetime values) exist in the column.
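The statements and options described above can be combined in a single PROC IMPORT step. The following is a minimal sketch, not the wizard's generated code; it assumes the sales.xls workbook and its Australia worksheet used earlier in this chapter, a license for SAS/ACCESS Interface to PC File Formats, and a hypothetical output data set named work.australia.

```sas
/* Sketch only: the workbook path and the output data set name are assumptions. */
proc import out=work.australia
            datafile='s:\workshop\sales.xls'
            dbms=excel replace;
   range='Australia$';   /* worksheet names end with a dollar sign */
   getnames=yes;         /* take variable names from the first row */
   mixed=no;             /* do not convert numeric values in mixed columns */
   scantext=yes;         /* use the longest text value as the column width */
   usedate=yes;          /* DATE. format for date/time columns */
   scantime=yes;         /* detect columns that hold only time values */
run;
```

REPLACE is included so that the step can be resubmitted without failing on an existing work.australia data set.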
The Export Wizard
The Export Wizard reads data from a SAS data set and
writes it to an external file source.
1. Select the data set from which you want to export
data.
2. Select the type of data source to which you want to
export files.
3. Assign the output file.
4. Assign the table name.
5. Save the generated PROC EXPORT code. (Optional)
The Export Wizard
2. Select the type of data source to which you want
to export files.
The Export Wizard
4. Assign the table name.
The Export Wizard
SAS Log
NOTE: File "S:\Workshop\qtr2007c.xls" will be created if the export
process succeeds.
NOTE: "qtr1" table was successfully created.
in the EXPORT procedure.
p106d04b
DATA=<libref.>SAS-data-set
identifies the input SAS data set.
OUTFILE="filename"
specifies the complete path and filename or a fileref for the output PC file, spreadsheet,
or delimited external file.
DBMS=identifier
specifies the type of data to export. To export a DBMS table, you must specify DBMS= by
using a valid database identifier. For example, DBMS=EXCEL specifies to export a table
into a Microsoft Excel worksheet.
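A PROC EXPORT step that uses these options might look like the following minimal sketch. The workbook path and the qtr1 table name are taken from the log shown earlier; the choice of orion.qtr1_2007 as the input data set is an assumption, and the step is not the wizard's generated code.

```sas
/* Sketch only: the input data set is an assumption. */
proc export data=orion.qtr1_2007
            outfile='s:\workshop\qtr2007c.xls'
            dbms=excel replace;
   sheet='qtr1';   /* name of the worksheet (table) to create */
run;
```

Without a SHEET= statement, the worksheet is typically named after the SAS data set.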
Exercises
Level 1
a. Write a LIBNAME statement to create a libref called MNTH that references a new Excel
workbook named mnth2007.xls.
b. Write a PROC COPY step that copies orion.mnth7_2007, orion.mnth8_2007,
and orion.mnth9_2007 to the new Excel workbook.
c. Write a PROC CONTENTS step to view all of the contents of MNTH.
d. Write a LIBNAME statement to clear the MNTH libref.
Level 2
5. Using the Import Wizard to Read an Excel Worksheet
a. Use the Import Wizard to read the products.xls workbook.
1) Select the worksheet containing children data.
2) Name the new data set Work.children.
3) Save the generated PROC IMPORT code to a file called children.sas.
b. Write a PROC PRINT step to create a report of the new data set.
c. Open children.sas to view the PROC IMPORT code.
Level 3
6. Using the EXPORT Procedure to Create an Excel Worksheet
a. Write a PROC EXPORT step to export the data set orion.mnth7_2007 to an Excel
workbook called mnth7.xls.
b. Submit the program and confirm in the log that the mnth7_2007 worksheet was successfully
created in mnth7.xls.
6.3 Chapter Review 6-37
Chapter Review
1. What statement is used to point to a physical filename
including the path, filename, and extension of an Excel
workbook?
2. What character appears at the end of an Excel
worksheet name in the SAS Explorer?
3. What is an example of a SAS name literal?
4. How do you disassociate a libref?
6.4 Solutions
Solutions to Exercises
1. Reading an Excel Worksheet
a. Retrieve the starter program.
b. Add a LIBNAME statement.
libname custfm 'custfm.xls';
proc contents data=custfm._all_;
run;
data work.males;
run;
proc print data=work.males label;
run;
libname custfm clear;
c. Submit the LIBNAME statement and the PROC CONTENTS step.
d. Add a SET statement in the DATA step.
data work.males;
   set custfm.'Males$'n;
run;
e. Add a KEEP statement in the DATA step.
data work.males;
   set custfm.'Males$'n;
   keep First_Name Last_Name Birth_Date;
run;
f. Add a FORMAT statement in the DATA step.
data work.males;
set custfm.'Males$'n;
keep First_Name Last_Name Birth_Date;
format Birth_Date year4.;
run;
g. Add a LABEL statement.
data work.males;
set custfm.'Males$'n;
keep First_Name Last_Name Birth_Date;
format Birth_Date year4.;
label Birth_Date='Birth Year';
run;
h. Submit the program.
data work.golf;
   set prod.'Sports$'n;
   where Category='Golf';
   drop Category;
   label Name='Golf Products';
run;
e. Write a LIBNAME statement.
libname prod clear;
f. Write a PROC PRINT step.
proc print data=work.golf label;
run;
3. Reading a Range of an Excel Worksheet
a. Write a LIBNAME statement.
libname xlsdata 'custcaus.xls' header=no;
b. Write a PROC CONTENTS step.
proc contents data=xlsdata._all_;
run;
c. Submit the program.
d. Write a DATA step.
data work.germany;
set xlsdata.DE;
label F1='Customer ID'
F2='Country'
F3='Gender'
F4='First Name'
F5='Last Name'
F6='Birth Date';
format F6 ddmmyy8.;
run;
e. Write a LIBNAME statement.
libname xlsdata clear;
   select mnth7_2007 mnth8_2007 mnth9_2007;
run;
c. Write a PROC CONTENTS step.
proc contents data=mnth._all_;
run;
d. Write a LIBNAME statement.
libname mnth clear;
5. Using the Import Wizard to Read an Excel Worksheet
a. Use the Import Wizard.
b. Write a PROC PRINT step.
proc print data=work.children;
run;
c. Open children.sas.
PROC IMPORT OUT= WORK.children
            DATAFILE= "S:\Workshop\products.xls"
            DBMS=EXCEL REPLACE;
     RANGE="Children$";
     GETNAMES=YES;
     MIXED=NO;
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;
6. Using the EXPORT Procedure to Create an Excel Worksheet
a. Write a PROC EXPORT step.
proc export data=orion.mnth7_2007
outfile='mnth7.xls'
dbms=excel replace;
run;
b. Submit the program.
run;
b. proc print data=orionxls.'UnitedStates$';
run;
c. proc print data=orionxls.'UnitedStates'n;
run;
d. proc print data=orionxls.'UnitedStates$'n;
run;
Chapter Review Answers
1. What statement is used to point to a physical filename
including the path, filename, and extension of an Excel
workbook?
a LIBNAME statement
2. What character appears at the end of an Excel
worksheet name in the SAS Explorer?
$
3. What is an example of a SAS name literal?
orionxls.'Australia$'n
4. How do you disassociate a libref?
CLEAR option
Chapter 7 Reading Delimited Raw
Data Files
Exercises
7.2 Using Nonstandard Delimited Data as Input
Exercises
7.3 Chapter Review
7.4 Solutions
Solutions to Exercises
Solutions to Student Activities (Polls/Quizzes)
Solutions to Chapter Review
7-2 Chapter 7 Reading Delimited Raw Data Files
7.1 Using Standard Delimited Data as Input 7-3
Objectives
Use the DATA step to create a SAS data set from
a delimited raw data file.
Examine the compilation and execution phases
of the DATA step when reading a raw data file.
Explicitly define the length of a variable by using
the LENGTH statement.
Business Scenario
An existing data source contains information on
Orion Star sales employees from Australia and
the United States.
A new SAS data set needs to be created that contains
a subset of this existing data source.
This new SAS data set must contain the following:
• only the employees from Australia who are Sales
  Representatives
• the employee’s first name, last name, salary, job title,
  and hired date
• labels and formats in the descriptor portion
Business Scenario
Reading SAS Data Sets
Reading Excel Worksheets
Reading Delimited Raw Data Files
Business Scenario
Reading SAS Data Sets
libname ;
data ;
   set ;
   ...
run;
Reading Excel Worksheets
libname ;
data ;
   set ;
   ...
run;
Reading Delimited Raw Data Files
data ;
   infile ;
   input ;
   ...
run;
sales.csv
Partial sales.csv (comma delimited)
120102,Tom,Zhou,M,108255,Sales Manager,AU,11AUG1969,06/01/1989
120103,Wilson,Dawes,M,87975,Sales Manager,AU,22JAN1949,01/01/1974
120121,Irenie,Elvish,F,26600,Sales Rep. II,AU,02AUG1944,01/01/1974
120122,Christina,Ngan,F,27475,Sales Rep. II,AU,27JUL1954,07/01/1978
120123,Kimiko,Hotstone,F,26190,Sales Rep. I,AU,28SEP1964,10/01/1985
120124,Lucian,Daymond,M,26480,Sales Rep. I,AU,13MAY1959,03/01/1979
120125,Fong,Hofmeister,M,32040,Sales Rep. IV,AU,06DEC1954,03/01/1979
120126,Satyakam,Denny,M,26780,Sales Rep. II,AU,20SEP1988,08/01/2006
120127,Sharryn,Clarkson,F,28100,Sales Rep. II,AU,04JAN1979,11/01/1998
120128,Monica,Kletschkus,F,30890,Sales Rep. IV,AU,14JUL1986,11/01/2006
120129,Alvin,Roebuck,M,30070,Sales Rep. III,AU,22NOV1964,10/01/1985
120130,Kevin,Lyon,M,26955,Sales Rep. I,AU,14DEC1984,05/01/2006
120131,Marinus,Surawski,M,26910,Sales Rep. I,AU,25SEP1979,01/01/2003
120132,Fancine,Kaiser,F,28525,Sales Rep. III,AU,05APR1949,10/01/1978
120133,Petrea,Soltau,F,27440,Sales Rep. II,AU,22APR1986,10/01/2006
120134,Sian,Shannan,M,28015,Sales Rep. II,AU,06JUN1949,01/01/1974
120135,Alexei,Platts,M,32490,Sales Rep. IV,AU,26JAN1969,10/01/1997
The raw data filename needs to be specific to your operating environment.
Business Scenario Syntax
Use the following statements to complete the scenario:
DATA output-SAS-data-set;
LENGTH variable(s) $ length;
INFILE 'raw-data-file-name';
INPUT specifications;
LABEL variable = 'label';
FORMAT variable(s) format;
RUN;
DATA output-SAS-data-set;
INFILE 'raw-data-file-name';
INPUT specifications;
<additional SAS statements>
RUN;
The DATA statement can create temporary or permanent
data sets.
The INFILE Statement
The INFILE statement identifies the physical name
of the raw data file to read with an INPUT statement.
DATA output-SAS-data-set;
INFILE 'raw-data-file-name';
INPUT specifications;
<additional SAS statements>
RUN;
environment uses to access the file.
z/OS (OS/390)    infile '.workshop.rawdata(sales)';
The INPUT Statement
The INPUT statement describes the arrangement of
values in the raw data file and assigns input values
to the corresponding SAS variables.
DATA output-SAS-data-set;
INFILE 'raw-data-file-name';
INPUT specifications;
<additional SAS statements>
RUN;
The following are input specifications:
• column input
• formatted input
• list input
Column input enables you to read standard data values that are aligned in columns in the raw data file.
Formatted input combines the flexibility of using informats with many of the features of column input.
By using formatted input, you can read nonstandard data for which SAS requires additional instructions.
List input uses a scanning method for locating data values. Data values are not required to be aligned
in columns, but must be separated by at least one blank or other defined delimiter.
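The three input styles described above can be compared side by side. The following is a minimal sketch that reads in-stream data with the DATALINES statement instead of an external file; the data set names are hypothetical, and the values are borrowed from sales.csv for illustration only.

```sas
/* Column input: each value occupies fixed columns in the record. */
data work.col_demo;
   input Name $ 1-8 Salary 10-14;
   datalines;
Irenie   26600
Kimiko   26190
;
run;

/* Formatted input: an informat tells SAS how to read a nonstandard value. */
data work.fmt_demo;
   input @1 Name $8. @10 Hired mmddyy10.;
   format Hired date9.;
   datalines;
Irenie   01/01/1974
Kimiko   10/01/1985
;
run;

/* List input: SAS scans for the delimiter (a blank by default). */
data work.list_demo;
   input Name $ Salary;
   datalines;
Irenie 26600
Kimiko 26190
;
run;
```

Only the list input step tolerates values that are not aligned in columns, which is why it suits delimited files such as sales.csv.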
d. none
e. not sure
List Input
To read with list input, data values
must be separated with a delimiter.
Partial sales.csv
120102,Tom,Zhou,M,108255,Sales Manager,AU,11AUG1969,06/01/1989
120103,Wilson,Dawes,M,87975,Sales Manager,AU,22JAN1949,01/01/1974
120121,Irenie,Elvish,F,26600,Sales Rep. II,AU,02AUG1944,01/01/1974
120122,Christina,Ngan,F,27475,Sales Rep. II,AU,27JUL1954,07/01/1978
120123,Kimiko,Hotstone,F,26190,Sales Rep. I,AU,28SEP1964,10/01/1985
120124,Lucian,Daymond,M,26480,Sales Rep. I,AU,13MAY1959,03/01/1979
120125,Fong,Hofmeister,M,32040,Sales Rep. IV,AU,06DEC1954,03/01/1979
120126,Satyakam,Denny,M,26780,Sales Rep. II,AU,20SEP1988,08/01/2006
120127,Sharryn,Clarkson,F,28100,Sales Rep. II,AU,04JAN1979,11/01/1998
Delimiter
A space (blank) is the default delimiter.
DATA output-SAS-data-set;
INFILE 'raw-data-file-name' DLM='delimiter';
INPUT specifications;
<additional SAS statements>
RUN;
The DLM= option is an alias for the DELIMITER= option.
To specify a tab delimiter on Windows or UNIX, type dlm='09'x.
To specify a tab delimiter on z/OS (OS/390), type dlm='05'x.
The DSD (delimiter-sensitive data) option changes how SAS treats delimiters when you use LIST input
and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters
as a missing value and removes quotation marks from character values. The DSD option specifies that
when data values are enclosed in quotation marks, delimiters within the value be treated as character data.
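The effect of the DSD option can be seen in a small experiment. The following is a minimal sketch that uses in-stream data; the data set name work.dsd_demo and the three records are made up for illustration.

```sas
/* Sketch only: the records below are illustrative, not taken from sales.csv. */
data work.dsd_demo;
   /* DSD sets the default delimiter to a comma, treats two consecutive
      commas as a missing value, and strips quotation marks. */
   infile datalines dsd;
   input Employee_ID Name $ Salary;
   datalines;
120121,Irenie,26600
120122,,27475
120123,"Kimiko",26190
;
run;

proc print data=work.dsd_demo;
run;
```

With DLM=',' alone, consecutive delimiters are treated as a single delimiter, so the second record would not yield a missing Name.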
Standard and Nonstandard Data
Standard data is data that SAS can read without
any special instructions.
value rather than as a numeric value.
The default length for character and numeric variables
is eight bytes.
List Input for Standard Data
Partial sales.csv
120102,Tom,Zhou,M,108255,Sales Manager,AU,11AUG1969,06/01/1989
120103,Wilson,Dawes,M,87975,Sales Manager,AU,22JAN1949,01/01/1974
120121,Irenie,Elvish,F,26600,Sales Rep. II,AU,02AUG1944,01/01/1974
120122,Christina,Ngan,F,27475,Sales Rep. II,AU,27JUL1954,07/01/1978
120123,Kimiko,Hotstone,F,26190,Sales Rep. I,AU,28SEP1964,10/01/1985
120124,Lucian,Daymond,M,26480,Sales Rep. I,AU,13MAY1959,03/01/1979
120125,Fong,Hofmeister,M,32040,Sales Rep. IV,AU,06DEC1954,03/01/1979
120126,Satyakam,Denny,M,26780,Sales Rep. II,AU,20SEP1988,08/01/2006
120127,Sharryn,Clarkson,F,28100,Sales Rep. II,AU,04JAN1979,11/01/1998
input Employee_ID First_Name $ Last_Name $
      Gender $ Salary Job_Title $ Country $;
Business Scenario
Create a temporary SAS data set named Work.subset3
from the delimited raw data file named sales.csv.
data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $;
run;
p107d01
Business Scenario
281  data work.subset3;
282     infile 'sales.csv' dlm=',';
283     input Employee_ID First_Name $ Last_Name $
284           Gender $ Salary Job_Title $ Country $;
285  run;
NOTE: The infile 'sales.csv' is:
      File Name=S:\Workshop\sales.csv,
      RECFM=V,LRECL=256
NOTE: 165 records were read from the infile 'sales.csv'.
      The minimum record length was 61.
      The maximum record length was 80.
NOTE: The data set WORK.SUBSET3 has 165 observations and 7 variables.
Business Scenario
proc print data=work.subset3;
run;
Obs  Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
  1       120102  Tom         Zhou       M       108255  Sales Ma   AU
  2       120103  Wilson      Dawes      M        87975  Sales Ma   AU
  3       120121  Irenie      Elvish     F        26600  Sales Re   AU
  4       120122  Christin    Ngan       F        27475  Sales Re   AU
  5       120123  Kimiko      Hotstone   F        26190  Sales Re   AU
  6       120124  Lucian      Daymond    M        26480  Sales Re   AU
  7       120125  Fong        Hofmeist   M        32040  Sales Re   AU
  8       120126  Satyakam    Denny      M        26780  Sales Re   AU
  9       120127  Sharryn     Clarkson   F        28100  Sales Re   AU
 10       120128  Monica      Kletschk   F        30890  Sales Re   AU
 11       120129  Alvin       Roebuck    M        30070  Sales Re   AU
 12       120130  Kevin       Lyon       M        26955  Sales Re   AU
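The truncated values in the report (Christin, Hofmeist, Sales Ma) follow from the eight-byte default length for character variables read with list input. A LENGTH statement placed before the INPUT statement can reserve more room. The following is a minimal sketch with in-stream data; the data set name and the lengths 12 and 25 are assumptions chosen for illustration.

```sas
/* Sketch only: lengths 12 and 25 are illustrative choices. */
data work.length_demo;
   length First_Name $ 12 Job_Title $ 25;   /* must precede the INPUT statement */
   infile datalines dlm=',';
   input Employee_ID First_Name $ Job_Title $ Salary;
   datalines;
120122,Christina,Sales Rep. II,27475
120125,Fong,Sales Rep. IV,32040
;
run;

proc print data=work.length_demo;
run;
```

Because the LENGTH statement is the first place these variables appear, First_Name and Job_Title become the first two columns of the data set.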
DATA Step Processing
The DATA step is processed in two phases:
• compilation
• execution
[Flowchart: Compile Program; Initialize Variables to Missing (PDV); Execute Other Statements; Output to SAS Data Set; branches labeled No and Next Step]
Compilation
During the compilation phase, SAS
• checks the syntax of the DATA step statements
• creates an input buffer to hold the current raw
  data file record that is being processed
• creates a program data vector (PDV) to hold the
  current SAS observation
• creates the descriptor portion of the output data set.
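One way to watch the program data vector during execution is to write its contents to the log with PUT _ALL_. This is a minimal sketch, not part of the course program; it uses in-stream data and a hypothetical data set name.

```sas
/* Sketch only: PUT _ALL_ writes every PDV variable, including _ERROR_ and _N_. */
data work.pdv_demo;
   infile datalines dlm=',';
   put 'Before INPUT: ' _all_;   /* values are still missing at this point */
   input Employee_ID First_Name $ Last_Name $;
   put 'After INPUT:  ' _all_;   /* values loaded from the input buffer */
   datalines;
120102,Tom,Zhou
120103,Wilson,Dawes
;
run;
```

The log lines written before the INPUT statement show the PDV as it is initialized at the start of each iteration.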
Compilation
data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $;
run;
Compilation
data work.subset3;
infile 'sales.csv' dlm=',';
input Employee_ID First_Name $ Last_Name $
Gender $ Salary Job_Title $ Country $;
run;
Input Buffer
         1         2
1234567890123456789012345
Compilation
data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $;
run;
Input Buffer
         1         2
1234567890123456789012345
PDV
Employee_ID
N 8
The default length for numeric variables is eight bytes.
Compilation
data work.subset3;
infile 'sales.csv' dlm=',';
input Employee_ID First_Name $ Last_Name $
Gender $ Salary Job_Title $ Country $;
run;
Input Buffer
         1         2
1234567890123456789012345
PDV
Employee_ID  First_Name
N 8          $ 8
For list input, the default length for character variables is eight bytes.
Compilation
data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $;
run;
Input Buffer
         1         2
1234567890123456789012345
PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
Compilation
data work.subset3;
infile 'sales.csv' dlm=',';
input Employee_ID First_Name $ Last_Name $
Gender $ Salary Job_Title $ Country $;
run;
Descriptor Portion Work.subset3
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
7.02 Multiple Choice Poll
Which statement is true?
a. An input buffer is only created if you are reading data
from a raw data file.
b. The PDV at compile time holds the variable name,
type, byte size, and initial value.
c. The descriptor portion is the first item that is created
at compile time.
Execution
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...
data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $
         Last_Name $ Gender $
         Salary Job_Title $
         Country $;
run;
Initialize PDV
Input Buffer
         1         2
1234567890123456789012345
PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
.                                           .
Execution
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...

data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $
         Last_Name $ Gender $
         Salary Job_Title $
         Country $;
run;

Input Buffer
         1         2
1234567890123456789012345

PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
.                                           .
Execution
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...

data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $
         Last_Name $ Gender $
         Salary Job_Title $
         Country $;
run;

Input Buffer
         1         2
1234567890123456789012345
120102,Tom,Zhou,M,108255,

PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
.                                           .
Execution
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...

data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $
         Last_Name $ Gender $
         Salary Job_Title $
         Country $;
run;

Input Buffer
         1         2
1234567890123456789012345
120102,Tom,Zhou,M,108255,

PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
120102       Tom         Zhou       M       108255  Sales Ma   AU
Execution
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...

data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $
         Last_Name $ Gender $
         Salary Job_Title $
         Country $;
run;
Implicit OUTPUT;
Implicit RETURN;

Input Buffer
         1         2
1234567890123456789012345
120102,Tom,Zhou,M,108255,

PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
120102       Tom         Zhou       M       108255  Sales Ma   AU
Execution
Output SAS Data Set after First Iteration of DATA Step
Work.subset3
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
120102       Tom         Zhou       M       108255  Sales Ma   AU
Execution
Reinitialize PDV
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...

data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $
         Last_Name $ Gender $
         Salary Job_Title $
         Country $;
run;

Input Buffer
         1         2
1234567890123456789012345
120102,Tom,Zhou,M,108255,

PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
.                                           .
Execution
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...

data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $
         Last_Name $ Gender $
         Salary Job_Title $
         Country $;
run;

Input Buffer
         1         2
1234567890123456789012345
120102,Tom,Zhou,M,108255,

PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
.                                           .
Execution
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...

data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $
         Last_Name $ Gender $
         Salary Job_Title $
         Country $;
run;

Input Buffer
         1         2
1234567890123456789012345
120103,Wilson,Dawes,M,879

PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
.                                           .
Execution
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...

data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $
         Last_Name $ Gender $
         Salary Job_Title $
         Country $;
run;

Input Buffer
         1         2
1234567890123456789012345
120103,Wilson,Dawes,M,879

PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
120103       Wilson      Dawes      M       87975   Sales Ma   AU
Execution
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...

data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $
         Last_Name $ Gender $
         Salary Job_Title $
         Country $;
run;
Implicit OUTPUT;
Implicit RETURN;

Input Buffer
         1         2
1234567890123456789012345
120103,Wilson,Dawes,M,879

PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
120103       Wilson      Dawes      M       87975   Sales Ma   AU
Execution
Output SAS Data Set after Second Iteration of DATA Step
Work.subset3
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
120102       Tom         Zhou       M       108255  Sales Ma   AU
120103       Wilson      Dawes      M       87975   Sales Ma   AU
Execution
Continue until EOF
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...

data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $
         Last_Name $ Gender $
         Salary Job_Title $
         Country $;
run;

Input Buffer
         1         2
1234567890123456789012345
120103,Wilson,Dawes,M,879

PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8
120103       Wilson      Dawes      M       87975   Sales Ma   AU
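The execution walkthrough can be observed directly in the SAS log. The following is a minimal sketch, not part of the course programs: it assumes the same sales.csv file and adds a PUT _ALL_ statement after the INPUT statement, so that each iteration writes the current PDV contents (including the automatic variables _N_ and _ERROR_) to the log.

```sas
data work.subset3;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $;
   /* Illustration only: write every PDV variable to the log */
   /* once per iteration of the DATA step.                   */
   put _all_;
run;
```

Because no LENGTH statement is used here, the character values in the log are truncated to eight bytes, which matches the Sales Ma value shown in the PDV above.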
7.03 Multiple Choice Poll
Which statement is true?
a. Data is read directly from the raw data file to the PDV.
b. At the bottom of the DATA step, the contents of the
   PDV are output to the output SAS data set.
c. When SAS returns to the top of the DATA step, any
   variable coming from a SAS data set is set to
   missing.
LENGTH variable(s) $ length;
Example:
length First_Name Last_Name $ 12
       Gender $ 1;
Business Scenario
Create a temporary SAS data set named Work.subset3
from the delimited raw data file named sales.csv.
data work.subset3;
   length First_Name $ 12 Last_Name $ 18
          Gender $ 1 Job_Title $ 25
          Country $ 2;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $;
run;
p107d02
Business Scenario
proc print data=work.subset3;
run;

Obs  First_Name  Last_Name   Gender  Job_Title       Country  Employee_ID  Salary
  1  Tom         Zhou        M       Sales Manager   AU            120102  108255
  2  Wilson      Dawes       M       Sales Manager   AU            120103   87975
  3  Irenie      Elvish      F       Sales Rep. II   AU            120121   26600
  4  Christina   Ngan        F       Sales Rep. II   AU            120122   27475
  5  Kimiko      Hotstone    F       Sales Rep. I    AU            120123   26190
  6  Lucian      Daymond     M       Sales Rep. I    AU            120124   26480
  7  Fong        Hofmeister  M       Sales Rep. IV   AU            120125   32040
  8  Satyakam    Denny       M       Sales Rep. II   AU            120126   26780
  9  Sharryn     Clarkson    F       Sales Rep. II   AU            120127   28100
 10  Monica      Kletschkus  F       Sales Rep. IV   AU            120128   30890
 11  Alvin       Roebuck     M       Sales Rep. III  AU            120129   30070
 12  Kevin       Lyon        M       Sales Rep. I    AU            120130   26955
p107d02
Compilation
data work.subset3;
   length First_Name $ 12 Last_Name $ 18
          Gender $ 1 Job_Title $ 25
          Country $ 2;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $;
run;

PDV
First_Name  Last_Name  Gender  Job_Title  Country
$ 12        $ 18       $ 1     $ 25       $ 2
Compilation
data work.subset3;
   length First_Name $ 12 Last_Name $ 18
          Gender $ 1 Job_Title $ 25
          Country $ 2;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $;
run;

PDV
First_Name  Last_Name  Gender  Job_Title  Country  Employee_ID  Salary
$ 12        $ 18       $ 1     $ 25       $ 2      N 8          N 8
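One way to confirm the effect of the LENGTH statement is to inspect the descriptor portion of the finished data set. The following is a small sketch, assuming Work.subset3 has already been created by the program above; the VARNUM option of PROC CONTENTS lists the variables in creation order, so the LENGTH variables should appear first, followed by Employee_ID and Salary.

```sas
/* Illustration only: list variables in creation order */
/* together with their type and length.                */
proc contents data=work.subset3 varnum;
run;
```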
Exercises
Level 1
b. Add the appropriate LENGTH, INFILE, and INPUT statements to read the comma-delimited
   raw data file named the following:
   Windows or UNIX    newemps.csv
   z/OS (OS/390)      .workshop.rawdata(newemps)
   Partial Raw Data File
   Satyakam,Denny,Sales Rep. II,26780
   Monica,Kletschkus,Sales Rep. IV,30890
   Kevin,Lyon,Sales Rep. I,26955
   Petrea,Soltau,Sales Rep. II,27440
   Marina,Iyengar,Sales Rep. III,29715
   The following variables should be read into the program data vector:
   Name    Type       Length
   First   Character  12
   Last    Character  18
   Title   Character  25
   Salary  Numeric    8
c. Submit the program to create the following PROC PRINT report:
   Partial PROC PRINT Output (First 5 of 71 Observations)
   Obs First Last Title Salary
Level 2
   z/OS (OS/390)      .workshop.rawdata(donation)
   120267 15 15 15 15
   120269 20 20 20 20
   120270 20 10 5 .
   120271 20 20 20 20
   The following variables should be read into the program data vector:
   Name   Type       Length
   IDNum  Character  6
   Qtr1   Numeric    8
   Qtr2   Numeric    8
   Qtr3   Numeric    8
   Qtr4   Numeric    8
b. Write a PROC PRINT step to create the following report:
   Partial PROC PRINT Output (First 10 of 124 Observations)
Obs IDNum Qtr1 Qtr2 Qtr3 Qtr4
1 120265 . . . 25
2 120267 15 15 15 15
3 120269 20 20 20 20
4 120270 20 10 5 .
5 120271 20 20 20 20
6 120272 10 10 10 10
7 120275 15 15 15 15
8 120660 25 25 25 25
9 120662 10 . 5 5
10 120663 . . 5 .
Level 3
   z/OS (OS/390)      .workshop.rawdata(supplier)
   Use column input in the INPUT statement to read the fixed column data.
   Documentation on column input can be found in the SAS Help
   and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
   SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
   Statements ⇒ INPUT Statement, Column).
   Partial Raw Data File
   50  Scandinavian Clothing A/S       NO
  109  Petterson AB                    SE
  316  Prime Sports Ltd                GB
  755  Top Sports                      DK
  772  AllSeasons Outdoor Clothing     US
   The following is the layout of the raw data file:
   Name     Starting Column  Ending Column
   ID       1                5
   Name     8                37
   Country  40               41
b. Write a PROC PRINT step to create the following report:
   Partial PROC PRINT Output (First 10 of 52 Observations)
   Obs ID Name Country
Objectives
Use informats to read nonstandard data.
Add additional SAS statements to perform further
processing in the DATA step.
Use the DSD option with list input to read consecutive
delimiters as missing values.
Use the MISSOVER option to recognize missing
values at the end of a record (Self-Study).
Standard and Nonstandard Data
Standard data is data that SAS can read without
any special instructions.
Examples of standard numeric data:
58  -23  67.23  00.99  5.67E5  1.2E-2
Nonstandard data is data that SAS cannot read
without a special instruction.
7.2 Using Nonstandard Delimited Data as Input 7-31
An informat is an instruction that SAS uses to read
data values into a variable.
The width of the informat can be omitted.
For character variables, the width of the informat
determines the variable length, if it has not been
previously defined.
SAS Informats
SAS informats have the following form:
<$>informat<w>.<d>
$         indicates a character informat.
informat  names the informat.
w         specifies the number of columns to read in the
          input data.
.         is a required delimiter.
SAS Informats
Selected SAS Informats:
Informat     Definition
$w.          reads standard character data.
w.d          reads standard numeric data.
COMMAw.d     reads nonstandard numeric data and removes
DOLLARw.d    embedded commas, blanks, dollar signs, percent
             signs, and dashes.
COMMAXw.d    reads nonstandard numeric data and removes
DOLLARXw.d   embedded periods, blanks, dollar signs, percent signs,
             and dashes.
EUROXw.d     reads nonstandard numeric data and removes
             embedded characters in European currency.
SAS Informats
In list input, informats are used to convert nonstandard
numeric data to SAS numeric values.
Informat   Raw Data Value  SAS Data Value
COMMA.     $12,345         12345
DOLLAR.
COMMAX.    $12.345         12345
DOLLARX.
EUROX.     €12.345         12345
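The conversions in the table can be tried without an external file. The following is a minimal sketch, not one of the course programs: the data set name Work.informat_demo and the variables Amount1 and Amount2 are hypothetical, and the two values are read from in-stream data separated by a blank.

```sas
data work.informat_demo;
   /* The colon modifier reads each value up to the next blank */
   /* and applies the informat that follows it.                */
   input Amount1 :comma. Amount2 :commax.;
   datalines;
$12,345 $12.345
;
run;

proc print data=work.informat_demo;
run;
```

According to the table, both variables should hold the numeric value 12345: COMMA. strips the dollar sign and the comma, and COMMAX. strips the dollar sign and the period.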
SAS Informats
SAS uses date informats to read and convert dates
to SAS date values.
Informat   Raw Data Value  SAS Date Value
MMDDYY.    01/01/60        0
           01/01/1960
DDMMYY.    311260          365
           31/12/60
           31/12/1960
DATE.      31DEC59         -1
           31DEC1959
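A SAS date value is the number of days between January 1, 1960 and the date that is read. The following is a short sketch using in-stream data rather than a course file; the data set name Work.date_demo and the variable names D1 through D3 are hypothetical.

```sas
data work.date_demo;
   /* Each value is read with the colon modifier and a date informat. */
   input D1 :mmddyy. D2 :ddmmyy. D3 :date.;
   datalines;
01/01/1960 31/12/1960 31DEC1959
;
run;

/* No FORMAT statement is used, so the raw date values are printed. */
proc print data=work.date_demo;
run;
```

Per the table above, the three values should print as 0, 365, and -1.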
List Input for Nonstandard Data
Partial sales.csv
120102,Tom,Zhou,M,108255,Sales Manager,AU,11AUG1969,06/01/1989
120103,Wilson,Dawes,M,87975,Sales Manager,AU,22JAN1949,01/01/1974
120121,Irenie,Elvish,F,26600,Sales Rep. II,AU,02AUG1944,01/01/1974
120122,Christina,Ngan,F,27475,Sales Rep. II,AU,27JUL1954,07/01/1978
120123,Kimiko,Hotstone,F,26190,Sales Rep. I,AU,28SEP1964,10/01/1985
120124,Lucian,Daymond,M,26480,Sales Rep. I,AU,13MAY1959,03/01/1979
120125,Fong,Hofmeister,M,32040,Sales Rep. IV,AU,06DEC1954,03/01/1979
120126,Satyakam,Denny,M,26780,Sales Rep. II,AU,20SEP1988,08/01/2006
120127,Sharryn,Clarkson,F,28100,Sales Rep. II,AU,04JAN1979,11/01/1998

input Employee_ID First_Name $ Last_Name $
      Gender $ Salary Job_Title $ Country $
      Birth_Date :date.
      Hire_Date :mmddyy.;
7.04 Quiz
Which INPUT statement correctly uses list input to read
the space-delimited raw data file?
Raw Data
Donny 5MAY2008 25 FL $43,132.50
Margaret 20FEB2008 43 NC 65,150

         state $ salary comma.;
b. input name $ hired :date. age
         state $ salary :comma.;
Business Scenario
Create a temporary SAS data set named Work.subset3
from the delimited raw data file named sales.csv.
data work.subset3;
   length First_Name $ 12 Last_Name $ 18
          Gender $ 1 Job_Title $ 25
          Country $ 2;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $
         Birth_Date :date.
         Hire_Date :mmddyy.;
run;
p107d03
Business Scenario
proc print data=work.subset3;
run;

Obs  First_Name  Last_Name   Gender  Job_Title       Country  Employee_ID  Salary  Birth_Date  Hire_Date
  1  Tom         Zhou        M       Sales Manager   AU            120102  108255        3510      10744
  2  Wilson      Dawes       M       Sales Manager   AU            120103   87975       -3996       5114
  3  Irenie      Elvish      F       Sales Rep. II   AU            120121   26600       -5630       5114
  4  Christina   Ngan        F       Sales Rep. II   AU            120122   27475       -1984       6756
  5  Kimiko      Hotstone    F       Sales Rep. I    AU            120123   26190        1732       9405
  6  Lucian      Daymond     M       Sales Rep. I    AU            120124   26480        -233       6999
  7  Fong        Hofmeister  M       Sales Rep. IV   AU            120125   32040       -1852       6999
  8  Satyakam    Denny       M       Sales Rep. II   AU            120126   26780       10490      17014
  9  Sharryn     Clarkson    F       Sales Rep. II   AU            120127   28100        6943      14184
 10  Monica      Kletschkus  F       Sales Rep. IV   AU            120128   30890        9691      17106
 11  Alvin       Roebuck     M       Sales Rep. III  AU            120129   30070        1787       9405
 12  Kevin       Lyon        M       Sales Rep. I    AU            120130   26955        9114      16922
 13  Marinus     Surawski    M       Sales Rep. I    AU            120131   26910        7207      15706
 14  Fancine     Kaiser      F       Sales Rep. III  AU            120132   28525       -3923       6848
p107d03
Additional SAS Statements
Additional SAS statements can be added to perform
further processing in the DATA step.
data work.subset3;
   length First_Name $ 12 Last_Name $ 18
          Gender $ 1 Job_Title $ 25
          Country $ 2;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $
         Birth_Date :date.
         Hire_Date :mmddyy.;
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
   label Job_Title='Sales Title'
         Hire_Date='Date Hired';
   format Salary dollar12. Hire_Date monyy7.;
run;
p107d04
Obs  First_Name  Last_Name   Sales Title      Salary  Date Hired
  1  Tom         Zhou        Sales Manager  $108,255  JUN1989
  2  Wilson      Dawes       Sales Manager   $87,975  JAN1974
  3  Irenie      Elvish      Sales Rep. II   $26,600  JAN1974
  4  Christina   Ngan        Sales Rep. II   $27,475  JUL1978
  5  Kimiko      Hotstone    Sales Rep. I    $26,190  OCT1985
  6  Lucian      Daymond     Sales Rep. I    $26,480  MAR1979
  7  Fong        Hofmeister  Sales Rep. IV   $32,040  MAR1979
  8  Satyakam    Denny       Sales Rep. II   $26,780  AUG2006
  9  Sharryn     Clarkson    Sales Rep. II   $28,100  NOV1998
 10  Monica      Kletschkus  Sales Rep. IV   $30,890  NOV2006
p107d04
Additional SAS Statements
The WHERE statement is used to obtain a subset
of observations from an input data set.
The WHERE statement cannot be used to select
records from a raw data file.
The subsetting IF can subset data that is in the PDV.
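Because the values read from a raw data file exist only in the PDV, a subsetting IF statement is the tool for filtering them. The following is a sketch based on the p107d03 program; the condition Salary > 30000 is a hypothetical cutoff chosen only for illustration.

```sas
data work.subset3;
   length First_Name $ 12 Last_Name $ 18
          Gender $ 1 Job_Title $ 25
          Country $ 2;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $
         Birth_Date :date.
         Hire_Date :mmddyy.;
   /* Subsetting IF (hypothetical condition): only iterations   */
   /* where the condition is true continue to the implicit      */
   /* OUTPUT at the bottom of the step.                         */
   if Salary > 30000;
run;
```

The IF statement is placed after the INPUT statement so that Salary already has a value in the PDV when the condition is evaluated.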
James Kvarniq,(704) 293-8126,(701) 281-8923
Sandrina Stephano,, (919) 271-4592
Cornelia Krahl,(212) 891-3241,(212) 233-5413
Karen Ballinger,, (714) 644-9090
Elke Wallstab,(910) 763-5561,(910) 545-3421
7.05 Quiz
Open and submit p107a01.
Examine the SAS log.
How many input records were read and how many
observations were created?
data contacts;
   length Name $ 20 Phone Mobile $ 14;
   infile 'phone2.csv' dlm=',';
   input Name $ Phone $ Mobile $;
run;

proc print data=contacts noobs;
run;
Unexpected Results
The missing phone numbers caused unexpected results
in the output.
PROC PRINT Output
Name               Phone           Mobile
Sandrina Stephano  (919) 871-7830  Cornelia Krahl
Karen Ballinger    (714) 344-4321  Elke Wallstab

NOTE: 5 records were read from the infile 'phone2.csv'.
      The minimum record length was 31.
      The maximum record length was 44.
NOTE: SAS went to a new line when INPUT statement reached
      past the end of a line.
NOTE: The data set WORK.CONTACTS has 3 observations and 3
      variables.
Consecutive Delimiters in List Input
By default, list input treats two or more consecutive
delimiters as a single delimiter and not as a
missing value.
The two consecutive commas are
not being read as a missing value.
phone2.csv
         1    1    2    2    3    3    4    4
1---5----0----5----0----5----0----5----0----5
James Kvarniq,(704) 293-8126,(701) 281-8923
Sandrina Stephano,, (919) 271-4592
Cornelia Krahl,(212) 891-3241,(212) 233-5413
Karen Ballinger,, (714) 644-9090
Elke Wallstab,(910) 763-5561,(910) 545-3421
INFILE 'raw-data-file-name' DSD;
Using the DSD Option
Adding the DSD option will correctly read the
phone2.csv data file.
data contacts;
   length Name $ 20 Phone Mobile $ 14;
   infile 'phone2.csv' dsd;
   input Name $ Phone $ Mobile $;
run;

proc print data=contacts noobs;
run;
The DLM=',' option is no longer needed in the
INFILE statement because the DSD option sets
the default delimiter to a comma.
p107d05
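The effect of DSD can be checked without the phone2.csv file. The following is a minimal sketch that reads two of the records as in-stream data; the data set name Work.dsd_demo is hypothetical, and INFILE DATALINES is used only so that the example is self-contained.

```sas
data work.dsd_demo;
   length Name $ 20 Phone Mobile $ 14;
   /* DSD treats two consecutive commas as a missing value */
   /* and sets the default delimiter to a comma.           */
   infile datalines dsd;
   input Name $ Phone $ Mobile $;
   datalines;
James Kvarniq,(704) 293-8126,(701) 281-8923
Sandrina Stephano,,(919) 271-4592
;
run;

proc print data=work.dsd_demo noobs;
run;
```

With DSD in effect, the second record should produce a blank Phone value and keep the mobile number in Mobile, giving one observation per record.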
Results
Adding the DSD option gives the expected results.
PROC PRINT Output
Name Phone Mobile
Partial SAS Log
NOTE: 5 records were read from the infile 'phone2.csv'.
      The minimum record length was 31.
      The maximum record length was 44.
NOTE: The data set WORK.CONTACTS has 5 observations and
      3 variables.
Missing Values at the End of a Record (Self-Study)
The data values in phone.csv are separated by commas.
Each record has a contact name, then
a phone number, and finally a mobile number.
The mobile number and its
comma delimiter are missing
from some of the lines of data.
phone.csv
         1    1    2    2    3    3    4    4
1---5----0----5----0----5----0----5----0----5
James Kvarniq,(704) 293-8126,(701) 281-8923
Sandrina Stephano,(919) 871-7830
Cornelia Krahl,(212) 891-3241,(212) 233-5413
Karen Ballinger,(714) 344-4321
Elke Wallstab,(910) 763-5561,(910) 545-3421
data contacts;
   length Name $ 20 Phone Mobile $ 14;
   infile 'phone.csv' dsd;
   input Name $ Phone $ Mobile $;
run;

proc print data=contacts noobs;
run;
Unexpected Results (Self-Study)
The missing mobile phone numbers caused unexpected
results in the output.
PROC PRINT Output
Name               Phone           Mobile
James Kvarniq      (704) 293-8126  (701) 281-8923
Sandrina Stephano  (919) 871-7830  Cornelia Krahl
Karen Ballinger    (714) 344-4321  Elke Wallstab

      The maximum record length was 44.
NOTE: SAS went to a new line when INPUT statement
      reached past the end of a line.
NOTE: The data set WORK.CONTACTS has 3 observations and
      3 variables.
The MISSOVER Option (Self-Study)
The MISSOVER option prevents SAS from loading a
new record when the end of the current record is reached.
General form of an INFILE statement with a MISSOVER
option:
INFILE 'raw-data-file-name' MISSOVER;
If SAS reaches the end of the row without finding values
for all fields, variables without values are set to missing.
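The behavior described above can be tried with in-stream data. The following is a small sketch, not one of the course programs: the data set name Work.missover_demo is hypothetical, and the two records are copied from phone.csv.

```sas
data work.missover_demo;
   length Name $ 20 Phone Mobile $ 14;
   /* MISSOVER: when a record ends early, the remaining variables */
   /* are set to missing instead of being read from the next line. */
   infile datalines dsd missover;
   input Name $ Phone $ Mobile $;
   datalines;
James Kvarniq,(704) 293-8126,(701) 281-8923
Sandrina Stephano,(919) 871-7830
;
run;

proc print data=work.missover_demo noobs;
run;
```

The second record has no mobile number and no trailing comma, so Mobile should be blank for that observation and the step should create two observations from two records.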
data contacts;
   length Name $ 20 Phone Mobile $ 14;
   infile 'phone.csv' dsd;
   input Name $ Phone $ Mobile $;
run;

proc print data=contacts noobs;
run;
Results (Self-Study)
Adding the MISSOVER option gives the expected results.
PROC PRINT Output
Name               Phone           Mobile
James Kvarniq      (704) 293-8126  (701) 281-8923
Sandrina Stephano  (919) 871-7830
Cornelia Krahl     (212) 891-3241  (212) 233-5413
Karen Ballinger    (714) 344-4321
Elke Wallstab      (910) 763-5561  (910) 545-3421
Partial SAS Log
NOTE: 5 records were read from the infile 'phone.csv'.
      The minimum record length was 31.
      The maximum record length was 44.
NOTE: The data set WORK.CONTACTS has 5 observations and
      3 variables.
Exercises
Level 1
b. Add the appropriate LENGTH, INFILE, and INPUT statements to read the comma-delimited
   raw data file named the following:
   Windows or UNIX    custca.csv
   z/OS (OS/390)      .workshop.rawdata(custca)
   Partial Raw Data File
   Bill,Cuddy,11171,M,16/10/1986,21,15-30 years
   Susan,Krasowski,17023,F,09/07/1959,48,46-60 years
   Andreas,Rennie,26148,M,18/07/1934,73,61-75 years
   Lauren,Krasowski,46966,F,24/10/1986,21,15-30 years
   Lauren,Marx,54655,F,18/08/1969,38,31-45 years
   The following variables should be read into the program data vector:
   Name       Type       Length
   First      Character  20
   Last       Character  20
   ID         Numeric    8
   Gender     Character  1
   BirthDate  Numeric    8
   Age        Numeric    8
   AgeGroup   Character  12
c. Add a FORMAT statement and a DROP statement in the DATA step to create a data set that
   resembles the following when used in the PROC PRINT step:
   Partial PROC PRINT Output (First 5 of 15 Observations)
   Obs  First    Last       Gender  AgeGroup     BirthDate
     3  Andreas  Rennie     M       61-75 years  JUL1934
     4  Lauren   Krasowski  F       15-30 years  OCT1986
     5  Lauren   Marx       F       31-45 years  AUG1969
Level 2
5. Reading a Space-Delimited Raw Data File with Spaces in Data Values
a. Write a DATA step to create a new data set named Work.us_customers by reading the space-
   delimited raw data named the following:
   Windows or UNIX    custus.dat
   z/OS (OS/390)      .workshop.rawdata(custus)
   Some of the data values contain spaces. Use an option in the INFILE statement to specify that
   when data values are enclosed in quotation marks, delimiters within the value are treated as part
   of the data value.
   Partial Raw Data File
   "James Kvarniq" 4 M 27JUN1974 33 "31-45 years"
   "Sandrina Stephano" 5 F 09JUL1979 28 "15-30 years"
   "Karen Ballinger" 10 F 18OCT1984 23 "15-30 years"
   "David Black" 12 M 12APR1969 38 "31-45 years"
   "Jimmie Evans" 17 M 17AUG1954 53 "46-60 years"
   The following variables should be read into the program data vector:
Name Type Length
Name Character 20
ID Numeric 8
Gender Character 1
BirthDate Numeric 8
Age Numeric 8
AgeGroup Character 12
b. Add a FORMAT statement in the DATA step to make the BirthDate resemble a three-character
month with a four-digit year.
c. Write a PROC PRINT step with a VAR statement to create the following report:
   Partial PROC PRINT Output (First 7 of 28 Observations)
   Obs  Name             Gender  BirthDate  AgeGroup     Age
     7  Michael Dineley  M       APR1959    46-60 years   48
Level 3
6. Reading Missing Values at the End of a Record
a. Write a DATA step to create a new data set named Work.prices by reading the asterisk-
   delimited raw data file named the following:
   Windows or UNIX    prices.dat
   z/OS (OS/390)      .workshop.rawdata(prices)
   Some of the records do not have a value for UnitSalesPrice and the last delimiter is
   missing.
   Documentation on the INFILE statement options can be found in the SAS Help
   and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
   SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
   Statements ⇒ INFILE Statement).
   Partial Raw Data File
   210200100009*09JUN2007*31DEC9999*$15.50*$34.70
   210200100017*24JAN2007*31DEC9999*$17.80
   210200200023*04JUL2007*31DEC9999*$8.25*$19.80
   210200600067*27OCT2007*31DEC9999*$28.90
   210200600085*28AUG2007*31DEC9999*$17.85*$39.40
   The following variables should be read into the program data vector:
   UnitSalesPrice  Numeric  8
b. Write a PROC PRINT step and add a LABEL and a FORMAT statement in the DATA step
   to create a data set that resembles the following when used in the PROC PRINT step:
   Partial PROC PRINT Output (First 10 of 259 Observations)
   Obs  Product ID    Start of Date Range  End of Date Range  Cost Price per Unit  Sales Price per Unit
     1  210200100009  06/09/2007           12/31/9999                       15.50                 34.70
     2  210200100017  01/24/2007           12/31/9999                       17.80                   .
     3  210200200023  07/04/2007           12/31/9999                        8.25                 19.80
     4  210200600067  10/27/2007           12/31/9999                       28.90                   .
     5  210200600085  08/28/2007           12/31/9999                       17.85                 39.40
     6  210200600112  01/04/2007           12/31/9999                        9.25                 21.80
     7  210200900033  09/17/2007           12/31/9999                        6.45                 14.20
     8  210200900038  02/01/2007           12/31/9999                        9.30                 20.30
     9  210201000050  04/02/2007           12/31/9999                        9.00                 19.60
    10  210201000126  04/22/2007           12/31/9999                        2.30                  6.50
Chapter Review
1. What statement identifies the physical filename of the
   raw data file to read?
2. What statement describes the arrangement of values
   in the raw data file?
3. What is the default delimiter when the DLM= option is
   not used?
4. What are the two phases of DATA step processing?
5. What is a program data vector (PDV)?
6. Why would you use a LENGTH statement?
7. What is an instruction that SAS uses to read data
   values into a variable?
7.4 Solutions 7-49
7.4 Solutions
Solutions to Exercises
1. Reading a Comma-Delimited Raw Data File
a. Retrieve the starter program.
b. Add the appropriate LENGTH, INFILE, and INPUT statements.
data work.NewEmployees;
   length First $ 12 Last $ 18 Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First $ Last $ Title $ Salary;
run;

proc print data=work.NewEmployees;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(newemps)' dlm=',';
c. Submit the program.
2. Reading a Space-Delimited Raw Data File
a. Write a DATA step.
data work.QtrDonation;
   length IDNum $ 6;
   infile 'donation.dat';
   input IDNum $ Qtr1 Qtr2 Qtr3 Qtr4;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(donation)';
b. Write a PROC PRINT step.
proc print data=work.QtrDonation;
run;
3. Using Column Input to Read a Fixed Column Raw Data File
a. Write a DATA step.
data work.supplier_info;
infile 'supplier.dat';
input ID 1-5 Name $ 8-37 Country $ 40-41;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(supplier)';
data work.canada_customers;
   length First Last $ 20 Gender $ 1 AgeGroup $ 12;
   infile 'custca.csv' dlm=',';
   input First $ Last $ ID Gender $
         BirthDate :ddmmyy. Age AgeGroup $;
run;

proc print data=work.canada_customers;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(custca)' dlm=',';
c. Add a FORMAT statement and a DROP statement.
data work.canada_customers;
   length First Last $ 20 Gender $ 1 AgeGroup $ 12;
   infile 'custca.csv' dlm=',';
   input First $ Last $ ID Gender $
         BirthDate :ddmmyy. Age AgeGroup $;
   format BirthDate monyy7.;
   drop ID Age;
run;
5. Reading a Space-Delimited Raw Data File with Spaces in Data Values
a. Write a DATA step.
data work.us_customers;
length Name $ 20 Gender $ 1 AgeGroup $ 12;
infile 'custus.dat' dlm=' ' dsd;
input Name $ ID Gender $ BirthDate :date.
Age AgeGroup $;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(custus)' dlm=' ' dsd;
   var Name Gender BirthDate AgeGroup Age;
data work.prices;
   infile 'prices.dat' dlm='*' missover;
   input ProductID StartDate :date. EndDate :date.
         UnitCostPrice :dollar. UnitSalesPrice :dollar.;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(prices)' dlm='*' missover;
b. Write a PROC PRINT step and add a LABEL and a FORMAT statement in the DATA step.
data work.prices;
   infile 'prices.dat' dlm='*' missover;
   input ProductID StartDate :date. EndDate :date.
         UnitCostPrice :dollar. UnitSalesPrice :dollar.;
   label ProductID='Product ID'
         StartDate='Start of Date Range'
         EndDate='End of Date Range'
         UnitCostPrice='Cost Price per Unit'
         UnitSalesPrice='Sales Price per Unit';
   format StartDate EndDate mmddyy10.
          UnitCostPrice UnitSalesPrice 8.2;
run;
7.02 Multiple Choice Poll – Correct Answer
Which statement is true?
a. An input buffer is only created if you are reading data
   from a raw data file.
b. The PDV at compile time holds the variable name,
   type, byte size, and initial value.
c. The descriptor portion is the first item that is created
   at compile time.
7.03 Multiple Choice Poll – Correct Answer
Which statement is true?
a. Data is read directly from the raw data file to the PDV.
b. At the bottom of the DATA step, the contents of the
   PDV are output to the output SAS data set.
c. When SAS returns to the top of the DATA step, any
   variable coming from a SAS data set is set to
   missing.
7.4 Solutions 7-53
Raw Data
Donny 5MAY2008 25 FL $43,132.50
Margaret 20FEB2008 43 NC 65,150
state $ salary comma.;
b. input name $ hired :date. age
state $ salary :comma.;
7.05 Quiz – Correct Answer
Open and submit p107a01.
Examine the SAS log.
How many input records were read and how many
observations were created?
Five records were read from the input file and three
observations were created.
Five records were read from the input file, and three
observations were created.
7.07 Quiz – Correct Answer (Self-Study)
Open p107a03 and add the MISSOVER option to
the INFILE statement. Submit the program and examine
the SAS log. How many input records were read and
how many observations were created?
data contacts;
length Name $ 20 Phone Mobile $ 14;
infile 'phone.csv' dsd missover;
proc print data=contacts noobs;
run;
2. What statement describes the arrangement of values
in the raw data file?
an INPUT statement
3. What is the default delimiter when the DLM= option is
not used?
a blank delimiter
Chapter Review Answers
4. What are the two phases of DATA step processing?
compilation
execution
5. What is a program data vector (PDV)?
A logical area in memory where SAS holds the
current observation
Chapter 8 Validating and Cleaning Data
8.2 Examining Data Errors When Reading Raw Data Files
Demonstration: Examining Data Errors
8.3 Validating Data with the PRINT and FREQ Procedures
Exercises
8.4 Validating Data with the MEANS and UNIVARIATE Procedures
Exercises
8.5 Cleaning Invalid Data
Demonstration: Using the Viewtable Window to Clean Data – Windows (Self-Study)
Demonstration: Using the Viewtable Window to Clean Data – UNIX (Self-Study)
Demonstration: Using the FSEDIT Window to Clean Data – z/OS (OS/390) (Self-Study)
8.6 Chapter Review
8.1 Introduction to Validating and Cleaning Data
Objectives
Define data errors in a raw data file.
Identify procedures for validating data.
Identify techniques for cleaning data.
Define the business scenario that will be used
with validating and cleaning data.
Business Scenario
A delimited raw data file containing information on Orion
Star non-sales employees from Australia and the United
States needs to be read to create a data set.
Requirements of non-sales employee data:
Employee_ID, Salary, Birth_Date, and
Hire_Date must be numeric variables.
First, Last, Gender, Job_Title, and
Country must be character variables.
8.01 Quiz
What problems will SAS have reading the numeric data
Salary and Hire_Date?
Partial nonsales.csv
120101,Patrick,Lu,M,163040,Director,AU,18AUG1976,01JUL2003
120104,Kareen,Billington,F,46230,Administration Manager,au,11MAY1954,01JAN1981
120105,Liz,Povey,F,27110,Secretary I,AU,21DEC1974,01MAY1999
120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC1944,01JAN1974
120107,Sherie,Sheedy,F,30475,Office Assistant III,AU,01FEB1978,21JAN1953
120108,Gladys,Gromek,F,27660,Warehouse Assistant II,AU,23FEB1984,01AUG2006
120108,Gabriele,Baker,F,26495,Warehouse Assistant I,AU,15DEC1986,01OCT2006
120110,Dennis,Entwisle,M,28615,Warehouse Assistant III,AU,20NOV1949,01NOV1979
120111,Ubaldo,Spillane,M,26895,Security Guard II,AU,23JUL1949,99NOV1978
120112,Ellis,Glattback,F,26550, ,AU,17FEB1969,01JUL1990
120113,Riu,Horsey,F,26870,Security Guard II,AU,10MAY1944,01JAN1974
120114,Jeannette,Buddery,G,31285,Security Manager,AU,08FEB1944,01JAN1974
120115,Hugh,Nichollas,M,2650,Service Assistant I,AU,08MAY1984,01AUG2005
.,Austen,Ralston,M,29250,Service Assistant II,AU,13JUN1959,01FEB1980
120117,Bill,Mccleary,M,31670,Cabinet Maker III,AU,11SEP1964,01APR1986
Data Errors
Data errors occur when data values are not appropriate
for the SAS statements that are specified in a program.
SAS detects data errors during program execution.
When a data error is detected, SAS continues to
execute the program.
A data error example is defining a variable as numeric,
but the data value is actually character.
NOTE: Invalid data for Salary in line 4 23-29.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6
4 120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC19
61 44,01JAN1974 72
Employee_ID=120106 First=John Last=Hornsey Gender=M Salary=.
Job_Title=Office Assistant II Country=AU Birth_Date=23/12/1944
Hire_Date=01/01/1974 _ERROR_=1 _N_=4
NOTE: Invalid data for Hire_Date in line 9 63-71.
9 120111,Ubaldo,Spillane,M,26895,Security Guard II,AU,23JUL194
61 9,99NOV1978 71
Employee_ID=120111 First=Ubaldo Last=Spillane Gender=M Salary=26895
Job_Title=Security Guard II Country=AU Birth_Date=23/07/1949
Hire_Date=. _ERROR_=1 _N_=9
Business Scenario
Additional requirements of non-sales employee data:
Employee_ID must be unique and not missing.
Gender must have a value of F or M.
Salary must be in the numeric range of 24000 –
500000.
Job_Title must not be missing.
Country must have a value of AU or US.
Birth_Date value must occur before
Hire_Date value.
Hire_Date must have a value of 01/01/1974 or
later.
8.02 Quiz
What problems exist with the data in this partial data set?
Hint: There are nine data problems.
SAS procedures for validating data:
PROC PRINT
PROC FREQ
PROC MEANS
PROC UNIVARIATE
The PRINT Procedure
The PRINT procedure can show the job titles that are
missing and the hire dates that occur before the birth
dates.
Obs  Employee_ID  Job_Title             Birth_Date  Hire_Date
  5  120107       Office Assistant III  01/02/1978  21/01/1953
  9  120111       Security Guard II     23/07/1949  .
 10  120112                             17/02/1969  01/07/1990
p108d01
                                 Cumulative  Cumulative
Gender   Frequency    Percent     Frequency     Percent
-------------------------------------------------------
F              110      47.01           110       47.01
G                1       0.43           111       47.44
M              123      52.56           234      100.00
Frequency Missing = 1
                                 Cumulative  Cumulative
Country  Frequency    Percent     Frequency     Percent
-------------------------------------------------------
AU              33      14.04            33       14.04
US             196      83.40           229       97.45
au               3       1.28           232       98.72
us               3       1.28           235      100.00
The MEANS Procedure
The MEANS procedure can show if any salaries are not
in the range of 24000 to 500000.
The MEANS Procedure
Analysis Variable : Salary
  N   N Miss     Minimum      Maximum
234        1     2401.00    433800.00
p108d01
Variable: Salary
Extreme Observations
-----Lowest----    -----Highest----
Value       Obs     Value       Obs
 2401        20    163040         1
 2650        13    194885       231
24025        25    207885        28
24100        19    268455        29
24390       228    433800        27
p108d01
Cleaning the Data
After the data is validated, the invalid data needs to be
cleaned.
Techniques for cleaning data:
Editing raw data file outside of SAS
Interactively editing data set using VIEWTABLE
Programmatically editing data set using the DATA step
Programmatically editing data set using the SQL
procedure
Using the SAS DataFlux product dfPower Studio
Ideally, invalid data should be cleaned in the original data source and not in the SAS data set.
Integrity constraints can be placed on a data set to eliminate the possibility of invalid data in the data set.
Integrity constraints are a set of data validation rules that you can specify in order to restrict the data
values that can be stored for a variable in a SAS data file. Integrity constraints help you preserve the
validity and consistency of your data. SAS enforces the integrity constraints when the values associated
with an integrity constraint variable are added, updated, or deleted.
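As an illustration only, integrity constraints that mirror some of the business-scenario rules might be sketched as follows. The data set name work.nonsales_ic and the constraint names are hypothetical (they do not appear in the course data), and the sketch assumes that the data set already satisfies the rules, because SAS does not add a constraint that existing rows violate.

```sas
/* Hypothetical sketch: work.nonsales_ic is an assumed, already-clean */
/* copy of the employee data; the constraint names are made up.       */
proc datasets library=work nolist;
   modify nonsales_ic;
   /* Check constraints restrict the values that a variable can hold */
   ic create valid_gender  = check(where=(Gender in ('F','M')));
   ic create valid_country = check(where=(Country in ('AU','US')));
   ic create valid_salary  = check(where=(Salary between 24000 and 500000));
   /* A primary key requires unique, nonmissing values */
   ic create unique_id     = primary key(Employee_ID);
quit;
```

After the constraints exist, a later attempt to add or update a row with, for example, Gender='G' should be rejected and reported in the SAS log.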
Section 8.5 addresses the situation where you have no choice but to correct the data in the SAS data set.
8.2 Examining Data Errors When Reading Raw Data Files
Objectives
Identify data errors.
Demonstrate what happens when a data error is
encountered.
Direct the observations with data errors to a different
data set than the observations without data errors.
(Self-Study)
Business Scenario
A delimited raw data file containing information on Orion
Star non-sales employees from Australia and the United
States needs to be read to create a data set.
Requirements of non-sales employee data:
Employee_ID, Salary, Birth_Date,
and Hire_Date must be numeric variables.
First, Last, Gender, Job_Title, and
Country must be character variables.
Data Errors
One type of data error is when the INPUT statement
encounters invalid data in a field.
A note that describes the error is printed in the
SAS log.
The input record (contents of the input buffer) being
read is displayed in the SAS log.
The values in the SAS observation (contents of the
PDV) being created are displayed in the SAS log.
A missing value is assigned to the appropriate
SAS variable.
Execution continues.
Data Errors
A note that describes the error is printed in the
SAS log.
Partial SAS Log
NOTE: Invalid data for Salary in line 4 23-29.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6
4 120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC19
61 44,01JAN1974 72
Employee_ID=120106 First=John Last=Hornsey Gender=M Salary=.
Job_Title=Office Assistant II Country=AU Birth_Date=23/12/1944
Hire_Date=01/01/1974 _ERROR_=1 _N_=4
This note indicates that invalid data was found for variable
Salary in line 4 of the raw data file in columns 23-29.
Data Errors
The input record (contents of the input buffer) being read
is displayed in the SAS log.
Partial SAS Log
NOTE: Invalid data for Salary in line 4 23-29.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6
4 120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC19
61 44,01JAN1974 72
Employee_ID=120106 First=John Last=Hornsey Gender=M Salary=.
Job_Title=Office Assistant II Country=AU Birth_Date=23/12/1944
Hire_Date=01/01/1974 _ERROR_=1 _N_=4
A ruler is drawn above the raw data record that contains
the invalid data.
Data Errors
The values in the SAS observation (contents of the PDV)
being created are displayed in the SAS log.
Partial SAS Log
NOTE: Invalid data for Salary in line 4 23-29.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6
4 120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC19
61 44,01JAN1974 72
Employee_ID=120106 First=John Last=Hornsey Gender=M Salary=.
Job_Title=Office Assistant II Country=AU Birth_Date=23/12/1944
Hire_Date=01/01/1974 _ERROR_=1 _N_=4
Data Errors
A missing value is assigned to the appropriate
SAS variable.
Partial SAS Log
NOTE: Invalid data for Salary in line 4 23-29.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6
4 120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC19
61 44,01JAN1974 72
Employee_ID=120106 First=John Last=Hornsey Gender=M Salary=.
Job_Title=Office Assistant II Country=AU Birth_Date=23/12/1944
Hire_Date=01/01/1974 _ERROR_=1 _N_=4
Data Errors
During the processing of every DATA step, SAS
automatically creates the following temporary variables:
the _N_ variable, which counts the number of times
the DATA step begins to iterate
the _ERROR_ variable, which signals the occurrence
of an error caused by the data during execution
0 indicates that no errors exist.
1 indicates that one or more errors occurred.
Partial SAS Log
NOTE: Invalid data for Salary in line 4 23-29.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6
4 120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC19
61 44,01JAN1974 72
Employee_ID=120106 First=John Last=Hornsey Gender=M Salary=.
Job_Title=Office Assistant II Country=AU Birth_Date=23/12/1944
Hire_Date=01/01/1974 _ERROR_=1 _N_=4
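The behavior of _N_ and _ERROR_ can be observed with a small self-contained step. This is a minimal sketch: the data set name work.demo and the three in-stream records are made up for illustration and are not part of the course files.

```sas
/* Minimal sketch: work.demo and the in-stream records are hypothetical. */
data work.demo;
   infile datalines dlm=',';
   input Employee_ID Salary;
   /* _ERROR_ is 1 on an iteration where INPUT met invalid data */
   if _error_ = 1 then putlog 'Data error found: ' _n_= Employee_ID=;
   datalines;
120105,27110
120106,unknown
120107,30475
;
run;
```

The second record has a character value where a numeric Salary is expected, so the PUTLOG statement should write a line for that iteration in addition to the standard invalid-data note. Because _N_ and _ERROR_ are temporary, neither is written to work.demo.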
data work.nonsales;
length Employee_ID 8 First $ 12 Last $ 18
Gender $ 1 Salary 8 Job_Title $ 25 Country $ 2
Birth_Date Hire_Date 8;
infile 'nonsales.csv' dlm=',';
input Employee_ID First $ Last $
Gender $ Salary Job_Title $ Country $
Birth_Date :date9.
Hire_Date :date9.;
format Birth_Date Hire_Date ddmmyy10.;
run;
proc print data=work.nonsales;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(nonsales)' dlm=',';
Partial PROC PRINT Output
Obs Employee_ID First Last Gender Salary Job_Title Country Birth_Date Hire_Date
189 Hire_Date :date9.;
190 format Birth_Date Hire_Date ddmmyy10.;
191 run;
NOTE: The infile 'nonsales.csv' is:
File Name=s:\workshop\nonsales.csv,
RECFM=V,LRECL=256
NOTE: Invalid data for Salary in line 4 23-29.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+
4 120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC1944,01JAN1974 72
Employee_ID=120106 First=John Last=Hornsey Gender=M Salary=. Job_Title=Office Assistant II
Country=AU Birth_Date=23/12/1944 Hire_Date=01/01/1974 _ERROR_=1 _N_=4
NOTE: Invalid data for Hire_Date in line 9 63-71.
9 120111,Ubaldo,Spillane,M,26895,Security Guard II,AU,23JUL1949,99NOV1978 71
Employee_ID=120111 First=Ubaldo Last=Spillane Gender=M Salary=26895 Job_Title=Security Guard II
Country=AU Birth_Date=23/07/1949 Hire_Date=. _ERROR_=1 _N_=9
NOTE: 235 records were read from the infile 'nonsales.csv'.
The minimum record length was 55.
NOTE: The data set WORK.NONSALES has 235 observations and 9 variables.
8.04 Multiple Choice Poll
Which statement best describes the invalid data?
a. The data in the raw data file is bad.
b. The programmer incorrectly read the data.
input Employee_ID First $ Last $
Gender $ Salary Job_Title $ Country $
Birth_Date :date9.
Hire_Date :date9.;
format Birth_Date Hire_Date ddmmyy10.;
if _error_=1 then output work.baddata;
p108d03
The raw data filename specified in the INFILE statement must be specific to your operating environment.
Outputting to Multiple Data Sets (Self-Study)
An OUTPUT statement can be used in a conditional
statement to write the current observation to a specific
data set that is listed in the DATA statement.
data work.baddata work.gooddata;
length Employee_ID 8 First $ 12 Last $ 18
Gender $ 1 Salary 8 Job_Title $ 25
Country $ 2 Birth_Date Hire_Date 8;
infile 'nonsales.csv' dlm=',';
input Employee_ID First $ Last $
Gender $ Salary Job_Title $ Country $
Birth_Date :date9.
Hire_Date :date9.;
format Birth_Date Hire_Date ddmmyy10.;
if _error_=1 then output work.baddata;
else output work.gooddata;
run;
p108d03
Hire_Date=01/01/1974 _ERROR_=1 _N_=4
NOTE: Invalid data for Hire_Date in line 9 63-71.
9 120111,Ubaldo,Spillane,M,26895,Security Guard II,AU,23JUL194
61 9,99NOV1978 71
Employee_ID=120111 First=Ubaldo Last=Spillane Gender=M Salary=26895
Job_Title=Security Guard II Country=AU Birth_Date=23/07/1949
Hire_Date=. _ERROR_=1 _N_=9
NOTE: 235 records were read from the infile
's:\workshop\nonsales.csv'.
The minimum record length was 55.
The maximum record length was 82.
NOTE: The data set WORK.BADDATA has 2 observations and 9 variables.
NOTE: The data set WORK.GOODDATA has 233 observations and 9 variables.
8.3 Validating Data with the PRINT and FREQ Procedures
Objectives
Validate data by using the PRINT procedure with the
WHERE statement.
Validate data by using the FREQ procedure with the
TABLES statement.
Business Scenario
Additional requirements of non-sales employee data:
Employee_ID must be unique and not missing.
Gender must have a value of F or M.
Salary must be in the numeric range of 24000 –
500000.
Job_Title must not be missing.
Country must have a value of AU or US.
Birth_Date value must occur before
Hire_Date value.
Hire_Date must have a value of 01/01/1974 or
later.
PROC MEANS step with VAR statement: detects invalid numeric values by using
summary statistics.
PROC UNIVARIATE step with VAR statement: detects invalid numeric values by
looking at extreme values.
The PRINT Procedure
The PRINT procedure produces detail reports based
on SAS data sets.
General form of the PRINT procedure:
PROC PRINT DATA=SAS-data-set ;
VAR variable(s) ;
WHERE where-expression ;
RUN;
The VAR statement selects variables to include in
the report and determines their order in the report.
The WHERE statement is used to obtain a subset
of observations.
WHERE where-expression ;
A where-expression is a sequence of operands and
operators that form a set of instructions that define a
condition for selecting observations.
Operands include constants and variables.
Operators are symbols that request a comparison,
arithmetic calculation, or logical operation.
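The three kinds of operators can be combined in a single where-expression. The following sketch uses the SASHELP.CLASS sample data set that ships with SAS rather than the course data, so the variable names (Name, Sex, Age, Height) and the thresholds are illustrative assumptions only.

```sas
/* Sketch using the SASHELP.CLASS sample data, not the Orion data. */
proc print data=sashelp.class;
   var Name Sex Age Height;
   /* comparison (=, >=, >), arithmetic (*), and logical (and) operators; */
   /* 'F', 13, 2.54, and 150 are constants, the other operands variables  */
   where Sex = 'F' and Age >= 13 and Height * 2.54 > 150;
run;
```

Only the observations for which the entire expression is true are printed.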
The WHERE Statement
The following PROC PRINT step retrieves observations
that have missing values for Job_Title.
proc print data=orion.nonsales;
var Employee_ID Last Job_Title;
where Job_Title = ' ';
run;
Obs  Employee_ID  Last       Job_Title
 10  120112       Glattback
p108d04
What is the numeric SAS date value for January 1, 1974?
A SAS date constant is used to convert a calendar date
to a SAS date value.
SAS Date Constant
To write a SAS date constant, enclose a date in quotation
marks in the form ddMMMyyyy and immediately follow the
final quotation mark with the letter d.
dd is a one- or two-digit value for the day.
MMM is a three-letter abbreviation for the month.
yyyy is a four-digit value for the year.
d is required to convert the quoted string to a SAS date.
Example:
The date constant for January 1, 1974, is '01JAN1974'd .
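To see the numeric value behind a date constant, the constant can be assigned to a variable and written to the log with and without a format. This is a small sketch; the variable name cutoff is arbitrary.

```sas
/* Sketch: cutoff is an arbitrary variable name. */
data _null_;
   cutoff = '01JAN1974'd;               /* date constant becomes a number */
   put 'Unformatted: ' cutoff;          /* days since 01JAN1960           */
   put 'Formatted:   ' cutoff date9.;   /* same value shown as a date     */
run;
```

Because SAS dates count days from January 1, 1960, the unformatted value should be 5114 (14 years of 365 days plus 4 leap days).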
where Hire_Date < '01JAN1974'd;
run;
Obs  Employee_ID  Birth_Date  Hire_Date
  5  120107       01/02/1978  21/01/1953
  9  120111       23/07/1949  .
214  121011       11/03/1944  01/01/1968
p108d04
8.05 Multiple Choice Poll
Which data requirement cannot be achieved with
the PRINT procedure using a WHERE statement?
a. Employee_ID must be unique and not missing.
b. Gender must have a value of F or M.
c. Salary must be in the numeric range of 24000 – 500000.
d. Job_Title must not be missing.
e. Country must have a value of AU or US.
f. Birth_Date value must occur before Hire_Date value.
g. Hire_Date must have a value of 01/01/1974 or later.
Data Requirements
Data Requirement                                     where-expression to obtain invalid data
Employee_ID must be unique and not missing.          Employee_ID = .  (Does not account for uniqueness.)
Gender must have a value of F or M.                  Gender not in ('F','M')
Salary must be in the range of 24000 – 500000.       Salary not between 24000 and 500000
Job_Title must not be missing.                       Job_Title = ' '
Country must have a value of AU or US.               Country not in ('AU','US')
Birth_Date must occur before Hire_Date.              Birth_Date > Hire_Date
Hire_Date must have a value of 01/01/1974 or later.  Hire_Date < '01JAN1974'd
Data Requirements
The following PROC PRINT step accounts for all of the data
requirements except the Employee_ID being unique.
proc print data=orion.nonsales;
var Employee_ID Gender Salary Job_Title
Country Birth_Date Hire_Date;
where Employee_ID = . or
Gender not in ('F','M') or
Salary not between 24000 and 500000 or
Job_Title = ' ' or
Country not in ('AU','US') or
Birth_Date > Hire_Date or
Hire_Date < '01JAN1974'd;
run;
The OR operator is used between expressions. Only one
expression needs to be true to account for an observation
with invalid data.
p108d04
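A numeric missing value compares as smaller than any number, so a condition such as Hire_Date < '01JAN1974'd also selects observations whose Hire_Date is missing. The sketch below illustrates this with three made-up records; the data set work.dates and its values are hypothetical.

```sas
/* Sketch: work.dates and its three records are hypothetical. */
data work.dates;
   input ID Hire_Date :date9.;
   format Hire_Date ddmmyy10.;
   datalines;
1 01JAN1968
2 .
3 01JUL1990
;
run;

proc print data=work.dates;
   /* selects ID 1 (before the cutoff) and ID 2 (missing) */
   where Hire_Date < '01JAN1974'd;
run;
```

This is why no separate Hire_Date = . condition is needed to catch a missing hire date with this where-expression.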
Data Requirements
Sixteen observations need the data cleaned.
Obs  Employee_ID  Gender  Salary  Job_Title                 Country  Birth_Date  Hire_Date
 12  120114       G        31285  Security Manager          AU       08/02/1944  01/01/1974
 13  120115       M         2650  Service Assistant I       AU       08/05/1984  01/08/2005
 14  .            M        29250  Service Assistant II      AU       13/06/1959  01/02/1980
 20  120191       F         2401  Trainee                   AU       17/01/1959  01/01/2003
 84  120695       M        28180  Warehouse Assistant II    au       13/07/1964  01/07/1989
 87  120698       M        26160  Warehouse Assistant I     au       17/05/1954  01/08/1976
101  120723                33950  Corp. Comm. Specialist II US       10/08/1949  01/01/1974
125  120747       F        43590  Financial Controller I    us       20/06/1974  01/08/1995
197  120994       F        31645  Office Administrator I    us       16/06/1974  01/11/1994
200  120997       F        27420  Shipping Administrator I  us       21/11/1974  01/09/1996
214  121011       M        25735  Service Assistant I       US       11/03/1944  01/01/1968
The FREQ Procedure
The FREQ procedure produces one-way to n-way
frequency tables.
General form of the FREQ procedure:
PROC FREQ DATA=SAS-data-set <NLEVELS>;
TABLES variable(s);
RUN;
The TABLES statement specifies the frequency tables
to produce.
The NLEVELS option displays a table that provides
the number of distinct values for each variable named
in the TABLES statement.
produces a frequency table for each variable.
p108d05
The FREQ Procedure
Two observations need the data cleaned for Gender and
six observations need the data cleaned for Country.
The FREQ Procedure
                                 Cumulative  Cumulative
Gender   Frequency    Percent     Frequency     Percent
-------------------------------------------------------
F              110      47.01           110       47.01
G                1       0.43           111       47.44
M              123      52.56           234      100.00
Frequency Missing = 1
                                 Cumulative  Cumulative
Country  Frequency    Percent     Frequency     Percent
-------------------------------------------------------
AU              33      14.04            33       14.04
US             196      83.40           229       97.45
au               3       1.28           232       98.72
us               3       1.28           235      100.00
The FREQ Procedure
Partial PROC FREQ Output
The FREQ Procedure
                                     Cumulative  Cumulative
Employee_ID  Frequency    Percent     Frequency     Percent
-----------------------------------------------------------
120101               1       0.43             1        0.43
120104               1       0.43             2        0.86
120105               1       0.43             3        1.29
120106               1       0.43             4        1.72
120107               1       0.43             5        2.15
120108               2       0.85             7        2.99
120110               1       0.43             8        3.43
120111               1       0.43             9        3.86
120112               1       0.43            10        4.29
120113               1       0.43            11        4.72
The FREQ Procedure
                                     Cumulative  Cumulative
Employee_ID  Frequency    Percent     Frequency     Percent
-----------------------------------------------------------
121130               1       0.43           224       96.14
121131               1       0.43           225       96.57
121132               1       0.43           226       97.00
121133               1       0.43           227       97.42
121134               1       0.43           228       97.85
121141               1       0.43           229       98.28
121142               1       0.43           230       98.71
121146               1       0.43           232       99.14
121147               1       0.43           233       99.57
121148               1       0.43           234      100.00
Frequency Missing = 1
run;
The NLEVELS option displays a table that provides the
number of distinct values for each variable named in the
TABLES statement.
p108d05
To display the number of levels without displaying the frequency counts, add the NOPRINT option to the
TABLES statement.
proc freq data=orion.nonsales nlevels;
tables Gender Country Employee_ID / noprint;
run;
To display the number of levels for all variables without displaying any frequency counts, use the _ALL_
keyword and the NOPRINT option in the TABLES statement.
proc freq data=orion.nonsales nlevels;
tables _all_ / noprint;
run;
Number of Variable Levels
                        Missing  Nonmissing
Variable      Levels     Levels      Levels
-------------------------------------------
Gender             4          1           3
Country            4          0           4
Employee_ID      234          1         233
There are 235 employees but there are only 234 distinct
Employee_ID values. Therefore, there is one
duplicate value for Employee_ID.
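The NLEVELS table shows that a duplicate exists but not which value is duplicated. One possible way to list it, sketched here under the assumption that orion.nonsales is available, is to write the one-way counts to a data set with the OUT= option of the TABLES statement and print only the values that occur more than once. The output data set name work.id_counts is hypothetical.

```sas
/* Sketch: work.id_counts is a hypothetical output data set name. */
proc freq data=orion.nonsales noprint;
   /* OUT= stores each Employee_ID with its COUNT and PERCENT */
   tables Employee_ID / out=work.id_counts;
run;

proc print data=work.id_counts;
   where Count > 1;   /* Employee_ID values that appear more than once */
run;
```

With the data shown in this chapter, the report would be expected to list Employee_ID 120108, which has a frequency of 2 in the PROC FREQ output.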
Exercises
Level 1
b. The data in orion.shoes_tracker should meet the following requirements:
• Product_Category must not be missing.
• Supplier_Country must have a value of GB or US.
Add a WHERE statement to the PROC PRINT step to find any observations that
do not meet the above requirements.
c. Add a VAR statement to create the following PROC PRINT report:
Obs  Product_Category  Supplier_Name         Supplier_Country  Supplier_ID
  1  Shoes             3Top Sports           us                          .
  2                    3Top Sports           US                       2963
  5  Shoes             3Top Sports           UT                       2963
 10  Shoes             Greenline Sports Ltd  gB                      14682
How many observations have missing Product_Category? ___________________________
How many observations have invalid values of Supplier_Country? ___________________
d. Add a PROC FREQ step with a TABLES statement to create frequency tables for
Supplier_Name and Supplier_ID of orion.shoes_tracker. Include
the NLEVELS option.
The data in orion.shoes_tracker should meet the following requirements:
• Supplier_Name must be 3Top Sports or Greenline Sports Ltd.
• Supplier_ID must be 2963 or 14682.
Level 2
a. Write a PROC PRINT step with a WHERE statement to validate the data in
orion.qtr2_2007.
The WHERE statement should find any observations that do not meet the above requirements.
b. Submit the program to create the following PROC PRINT report:
Obs  Order_ID    Order_Type  Employee_ID  Customer_ID  Order_Date  Delivery_Date
  5  1242012259           1       121040           10  18APR2007   12APR2007
 22  1242449327           3     99999999           27  26JUL2007   26JUL2007
How many observations have Delivery_Date values occurring before Order_Date values?
______________________________________________________________________________
How many observations have Order_Date values out of the range of April 1, 2007 – June 30,
2007? ________________________________________________________________________
c. Add a PROC FREQ step with a TABLES statement to create frequency tables for Order_ID and
Order_Type of orion.qtr2_2007. Include the NLEVELS option.
d. Submit the PROC FREQ step.
The data in orion.qtr2_2007 should meet the following requirements:
• Order_ID must be unique (36 distinct values) and not missing.
What invalid data exists for Order_ID and Order_Type? ____________________________
Level 3
3. Using the PROPCASE Function, Two-Way Frequency Table, and MISSING Option
a. Write a PROC PRINT step with a WHERE statement to validate the data in
orion.shoes_tracker. All Product_Name values should be written in proper case.
c. Add a PROC FREQ step with a TABLES statement to create the following two-way frequency
table with Supplier_Name and Supplier_ID of orion.shoes_tracker:
The FREQ Procedure
Supplier_Name(Supplier Name)     Supplier_ID(Supplier ID)
Frequency
Percent
Row Pct
Col Pct                     .      2963     14682     Total
3Top Sports                 1         5         1         7
                        10.00     50.00     10.00     70.00
                        14.29     71.43     14.29
                       100.00     71.43     50.00
3op Sports                  0         2         0         2
                         0.00     20.00      0.00     20.00
                         0.00    100.00      0.00
                         0.00     28.57      0.00
Greenline Sports Ltd        0         0         1         1
                         0.00      0.00     10.00     10.00
                         0.00      0.00    100.00
                         0.00      0.00     50.00
Total                       1         7         2        10
                        10.00     70.00     20.00    100.00
Documentation on two-way frequency tables and the MISSING option can be found
in the SAS Help and Documentation from the Contents tab (SAS Products
Base SAS Base SAS Procedures Guide: Statistical Procedures
The FREQ Procedure Syntax: FREQ Procedure TABLES Statement).
The data in orion.shoes_tracker should meet the following requirements:
• A Supplier_Name of 3Top Sports must have a Supplier_ID of 2963.
• A Supplier_Name of Greenline Sports Ltd must have a Supplier_ID
of 14682.
Objectives
Validate data by using the MEANS procedure with
the VAR statement.
Validate data by using the UNIVARIATE procedure
with the VAR statement.
The MEANS Procedure
The MEANS procedure produces summary reports that
display descriptive statistics.
General form of the MEANS procedure:
PROC MEANS DATA=SAS-data-set <statistics>;
VAR variable(s);
RUN;
Analysis Variable : Salary
N Mean
38354.77 2401.00 433800.00
Without the VAR statement, PROC MEANS
analyzes all numeric variables in the data set.
p108d06
The MEANS Procedure
By default, the MEANS procedure creates a report with
N (number of nonmissing values), MEAN, STDDEV, MIN,
and MAX.
For validating data, the following descriptive statistics are
beneficial:
N, number of nonmissing values
NMISS, number of missing values
MIN
MAX
run;
The MEANS Procedure
Analysis Variable : Salary
  N   N Miss     Minimum      Maximum
-------------------------------------
234        1     2401.00    433800.00
-------------------------------------
p108d06
The UNIVARIATE Procedure
The UNIVARIATE procedure produces summary reports that display descriptive statistics.
General form of the UNIVARIATE procedure:
PROC UNIVARIATE DATA=SAS-data-set;
VAR variable(s);
RUN;
The VAR statement specifies the analysis variables and their order in the results.
Without the VAR statement, PROC UNIVARIATE analyzes all numeric variables.
p108d06
The UNIVARIATE Procedure
The UNIVARIATE procedure can produce the following sections of output:
• Moments
• Basic Statistical Measures
• Tests for Location
• Quantiles
• Extreme Observations
• Missing Values
For validating data, the Extreme Observations and Missing Values sections are beneficial.
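As a hedged sketch of how to keep only those two sections, an ODS SELECT statement can name their output objects before the step runs. The names ExtremeObs and MissingValues are assumed here to be the output object names that ODS TRACE reports for PROC UNIVARIATE; confirm them in your own SAS log before relying on them.

```sas
/* Sketch: limit PROC UNIVARIATE output to the sections useful for validation. */
/* ExtremeObs and MissingValues are assumed output object names; verify them   */
/* with ODS TRACE ON in your own session.                                      */
ods select ExtremeObs MissingValues;
proc univariate data=orion.nonsales;
   var Salary;
run;
```

The ODS SELECT statement applies only to the next procedure step, so later steps produce their full output again.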
Extreme Observations
-----Lowest----      -----Highest----
 Value      Obs       Value      Obs
  2401       20      163040        1
  2650       13      194885      231
 24025       25      207885       28
 24100       19      268455       29
 24390      228      433800       27
Missing Values
                     -----Percent Of-----
Missing                           Missing
  Value      Count     All Obs        Obs
      .          1        0.43     100.00
NEXTROBS=n
specifies the number of extreme observations that PROC UNIVARIATE lists in the table of
extreme observations. The table lists the n lowest observations and the n highest observations.
The default value is 5, and n can range between 0 and half the maximum number of
observations. You can specify NEXTROBS=0 to suppress the table of extreme observations.
For example:
proc univariate data=orion.nonsales nextrobs=8;
   var Salary;
run;
Exercises
Level 1
b. Add a VAR statement to the PROC MEANS step to validate Unit_Cost_Price,
Unit_Sales_Price, and Factor.
c. Add statistics to the PROC MEANS statement to create the following PROC MEANS report:
The MEANS Procedure
Variable          Label                        N       Minimum       Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Unit_Cost_Price   Unit Cost Price            171     2.3000000   315.1500000
Unit_Sales_Price  Unit Sales Price           170     6.5000000       5730.00
Factor            Yearly increase in Price   171     0.0100000   100.0000000
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
The data in orion.price_current should meet the following requirements:
• Unit_Sales_Price must be in the numeric range of 3 – 800.
• Factor must be in the numeric range of 1 – 1.05.
What variables have invalid data? _________________________________________________
d. Add a PROC UNIVARIATE step with a VAR statement to validate Unit_Sales_Price and
Factor.
e. Submit the PROC UNIVARIATE step and find the Extreme Observations output.
How many values of Unit_Sales_Price are over the maximum of 800? _______________
How many values of Factor are over the maximum of 1.05? __________________________
Level 2
b. Add the MIN, MAX, and RANGE statistics to the PROC MEANS statement.
c. Add FW=15 to the PROC MEANS statement. The FW= option specifies the field width
to display the statistics in printed or displayed output.
d. Add the following CLASS statement to group the data by Supplier_Name:
class Supplier_Name;
Documentation on the FW= option and the CLASS statement can be found in the
SAS Help and Documentation from the Contents tab (SAS Products > Base SAS >
Base SAS 9.2 Procedures Guide > Procedures > The MEANS Procedure).
e. Submit the program to create the following PROC MEANS report:
The MEANS Procedure
Analysis Variable : Product_ID Product ID
                          N
Supplier Name           Obs          Minimum          Maximum            Range
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
3Top Sports               7      22020030007    2202003001290    2179982971283
3op Sports                2     220200300015     220200300116     101.00000000
Greenline Sports Ltd      1     220200300157     220200300157                0
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Which Supplier_Name has invalid Product_ID values assuming Product_ID must have
only twelve digits? ______________________________________________________________
f. Add a PROC UNIVARIATE step with a VAR statement to validate Product_ID of
orion.shoes_tracker.
g. Submit the PROC UNIVARIATE step and find the Extreme Observations output.
How many values of Product_ID are too small? _____________________________________
Level 3
6. Selecting Only the Extreme Observations Output from the UNIVARIATE Procedure
a. Write a PROC UNIVARIATE step with a VAR statement to validate Product_ID of
orion.shoes_tracker.
b. Before the PROC UNIVARIATE step, add the following ODS statement:
ods trace on;
c. After the PROC UNIVARIATE step, add the following ODS statement:
ods trace off;
d. Submit the program and notice the trace information in the SAS log.
What is the name of the last Output Added in the SAS log? ______________________________
e. Add an ODS SELECT statement immediately before the PROC UNIVARIATE step to select only
the Extreme Observations output object.
Documentation on the ODS TRACE and ODS SELECT statements can be found in the
SAS Help and Documentation from the Contents tab (SAS Products > Base SAS >
SAS 9.2 Output Delivery System User’s Guide > ODS Language Statements >
Dictionary of ODS Language Statements).
f. Submit the program to create the following PROC UNIVARIATE report:
Extreme Observations
--------Lowest-------      -------Highest------
      Value       Obs           Value       Obs
2.20200E+10         4      2.2020E+11         6
2.20200E+11         1      2.2020E+11         7
2.20200E+11         2      2.2020E+11         9
2.20200E+11         3      2.2020E+11        10
2.20200E+11         5      2.2020E+12         8
8.5 Cleaning Invalid Data 8-41
Objectives
• Clean data by using the Viewtable window. (Self-Study)
• Clean data by using assignment statements in the DATA step.
• Clean data by using IF-THEN/ELSE statements in the DATA step.
Invalid Data to Clean
The orion.nonsales data set contains invalid data that needs to be cleaned.
After you validate the data and find the invalid data, the correct data values are needed.
Variable    Obs                            Invalid Value   Correct Value
Country     2, 84, 87, 125, 197, and 200   au or us        AU or US
Salary      4                              .               26960
Salary      13                             2650            26500
Salary      20                             2401            24015
Hire_Date   5                              21/01/1953      21/01/1995
Hire_Date   9                              .               01/11/1978
Hire_Date   214                            01/01/1968      01/01/1998
Interactively Cleaning Data (Self-Study)
If you are using the SAS windowing environment, the Viewtable window can be used
to interactively clean data.
Use the Viewtable window to interactively clean the following five observations:
Variable      Obs   Invalid Value   Correct Value
Employee_ID   7     120108          120109
Employee_ID   14    .               120116
Gender        12    G               F
Gender        101   (blank)         F
Job_Title     10    (blank)         Security Guard I
For z/OS (OS/390), the FSEDIT window is used to clean data. If FSEDIT is not available, the data can be
cleaned programmatically instead of interactively. If you use batch mode, cleaning the data interactively
is not an option.
1. Select the Explorer tab on the SAS window bar to activate the SAS Explorer window or select
View > Contents Only.
2. Double-click Libraries to show all available libraries.
4. Double-click on the Nonsales data set to open the data set in the Viewtable window.
6. Go to the following observations and make the desired changes:
Obs   Variable      Correct Value
7     Employee_ID   120109
12    Gender        F
14    Employee_ID   120116
101   Gender        F
7. Close the Viewtable window. The changes are saved to the data set.
2. Double-click Libraries to show all available libraries.
3. Double-click on the Orion library to show all members of that library.
4. Double-click on the Nonsales data set to open the data set in the Viewtable window.
5. Select Edit Mode from the Edit menu.
6. Go to the following observations and make the desired changes:
Obs   Variable      Correct Value
7     Employee_ID   120109
12    Gender        F
14    Employee_ID   120116
101   Gender        F
7. Close the Viewtable window. The changes are saved to the data set.
2. Go to the following observations and make the desired changes:
Obs   Variable      Correct Value
7     Employee_ID   120109
12    Gender        F
14    Employee_ID   120116
101   Gender        F
Type the observation number on the command line and press ENTER to go to an observation.
3. Type end on the command line and press ENTER to close the FSEDIT window. The changes
are saved to the data set.
Programmatically Cleaning Data
The DATA step can be used to programmatically clean the invalid data.
Use the DATA step to clean the following observations:
Variable    Obs                            Invalid Value   Correct Value
Country     2, 84, 87, 125, 197, and 200   au or us        AU or US
Salary      4                              .               26960
Salary      13                             2650            26500
Salary      20                             2401            24015
Hire_Date   5                              21/01/1953      21/01/1995
Hire_Date   9                              .               01/11/1978
The Assignment Statement
General form of the assignment statement:
variable = expression;
variable names an existing or new variable.
expression is a sequence of operands and operators that form a set of instructions
that produce a value.
Assignment statements evaluate the expression on the right side of the equal sign and store
the result in the variable that is specified on the left side of the equal sign.
The Assignment Statement Expression
Operands are
• character constants
• numeric constants
• date constants
• character variables
• numeric variables.
Operators are
• symbols that represent an arithmetic calculation
• SAS functions.
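The operand and operator types listed above can be sketched in one DATA step. This is an illustration only: the output data set work.operands_demo and the new variables Region, Bonus, Review_Date, Country_Upper, and New_Salary are hypothetical names that do not appear in the course data.

```sas
/* Sketch: one assignment statement per operand or operator type.        */
/* All new variable names and the output data set name are hypothetical. */
data work.operands_demo;
   set orion.nonsales;
   Region        = 'Pacific';          /* character constant                     */
   Bonus         = 500;                /* numeric constant                       */
   Review_Date   = '01JAN2010'd;       /* date constant                          */
   Country_Upper = upcase(Country);    /* SAS function on a character variable   */
   New_Salary    = Salary * 1.05;      /* arithmetic operator on a numeric variable */
run;
```

Each statement evaluates the right side first and then stores the result in the variable on the left side, creating that variable if it does not already exist.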
Country = upcase(Country);
In this assignment statement, Country is a variable and upcase is a SAS function.
SAS Functions
A SAS function is a routine that returns a value that is determined from specified arguments.
The UPCASE function converts all letters in an argument to uppercase.
General form of the UPCASE function:
UPCASE(argument)
The argument specifies any SAS character expression.
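A small sketch can show what a character function returns without touching the course data. It uses a DATA _NULL_ step with literal values; the variable names Upper and Proper are hypothetical, and PROPCASE is included only because the exercise solutions in this chapter use it.

```sas
/* Sketch: write the results of UPCASE and PROPCASE to the SAS log. */
/* Variable names are hypothetical; no data set is created.         */
data _null_;
   Country = 'au';
   Name    = 'greenline sports ltd';
   Upper   = upcase(Country);     /* all letters converted to uppercase */
   Proper  = propcase(Name);      /* first letter of each word capitalized */
   put Upper= Proper=;            /* log: Upper=AU Proper=Greenline Sports Ltd */
run;
```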
The Assignment Statement
All the values of Country in the data set orion.nonsales need to be uppercase.
data work.clean;
   set orion.nonsales;
   Country=upcase(Country);
run;
p108d07
PDV for Employee_ID 120101 (Country is already uppercase, so the value is unchanged):
Employee_ID   ...   Job_Title                Country   ...
120101        ...   Director                 AU        ...
PDV for Employee_ID 120104 before the assignment statement executes:
Employee_ID   ...   Job_Title                Country   ...
120104        ...   Administration Manager   au        ...
PDV for Employee_ID 120104 after the assignment statement executes, upcase(au):
Employee_ID   ...   Job_Title                Country   ...
120104        ...   Administration Manager   AU        ...
Obs   Employee_ID   Job_Title                 Country
84    120695        Warehouse Assistant II    AU
85    120696        Warehouse Assistant I     AU
86    120697        Warehouse Assistant IV    AU
87    120698        Warehouse Assistant I     AU
88    120710        Business Analyst II       US
89    120711        Business Analyst III      US
90    120712        Marketing Manager         US
91    120713        Marketing Assistant III   US
The assignment statement executed for every observation regardless of whether the value
needed to be uppercased or not.
Programmatically Cleaning Data
For Country, the assignment statement was applied to all observations.
For Salary and Hire_Date, the assignment statement needs to be applied to specific observations.
8.07 Quiz
Which variable can be used to specifically identify
the observations with invalid salary values?
Obs  Employee_ID  Gender  Salary  Job_Title                  Country  Birth_Date  Hire_Date
  9  120111       M        26895  Security Guard II          AU       23/07/1949  .
 10  120112       F        26550                             AU       17/02/1969  01/07/1990
 12  120114       G        31285  Security Manager           AU       08/02/1944  01/01/1974
 13  120115       M         2650  Service Assistant I        AU       08/05/1984  01/08/2005
 14  .            M        29250  Service Assistant II       AU       13/06/1959  01/02/1980
 20  120191       F         2401  Trainee                    AU       17/01/1959  01/01/2003
 84  120695       M        28180  Warehouse Assistant II     au       13/07/1964  01/07/1989
 87  120698       M        26160  Warehouse Assistant I      au       17/05/1954  01/08/1976
101  120723                33950  Corp. Comm. Specialist II  US       10/08/1949  01/01/1974
125  120747       F        43590  Financial Controller I     us       20/06/1974  01/08/1995
197  120994       F        31645  Office Administrator I     us       16/06/1974  01/11/1994
200  120997       F        27420  Shipping Administrator I   us       21/11/1974  01/09/1996
214  121011       M        25735  Service Assistant I        US       11/03/1944  01/01/1968
IF-THEN Statements
The IF-THEN statement executes a SAS statement for observations that meet specific conditions.
General form of the IF-THEN statement:
IF expression THEN statement;
expression is a sequence of operands and operators that form a set of instructions
that define a condition for selecting observations.
statement is any executable statement such as the assignment statement.
If the condition in the IF clause is met, the IF-THEN statement executes a SAS statement
for that observation.
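As a hedged illustration of an expression that defines a condition, the step below flags salaries outside the valid range instead of correcting them. The output data set work.flagged and the variable Salary_Flag are hypothetical names, and the range 24000 to 500000 is the Salary requirement stated in this chapter.

```sas
/* Sketch: flag observations whose Salary falls outside the required range. */
/* work.flagged and Salary_Flag are hypothetical names.                     */
data work.flagged;
   set orion.nonsales;
   length Salary_Flag $ 7;
   /* A missing numeric value compares lower than any number in SAS, */
   /* so a missing Salary also satisfies Salary < 24000.             */
   if Salary < 24000 or Salary > 500000 then Salary_Flag='Invalid';
run;
```

Observations that do not meet the condition keep a blank Salary_Flag, because the assignment statement after THEN is not executed for them.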
IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.
data work.clean;
   set orion.nonsales;
   if Employee_ID=120106 then Salary=26960;
   if Employee_ID=120115 then Salary=26500;
   if Employee_ID=120191 then Salary=24015;
run;
p108d07
PDV for Employee_ID 120105:
Employee_ID   ...   Salary   Job_Title             ...
120105        ...   27110    Secretary I           ...
All three IF expressions are FALSE, so Salary is unchanged.
PDV for Employee_ID 120106 before the IF-THEN statements execute:
Employee_ID   ...   Salary   Job_Title             ...
120106        ...   .        Office Assistant II   ...
The first IF expression is TRUE, so Salary is set to 26960:
Employee_ID   ...   Salary   Job_Title             ...
120106        ...   26960    Office Assistant II   ...
The second and third IF expressions are still evaluated, and both are FALSE.
IF-THEN Statements
When an IF expression is TRUE in this IF-THEN statement series, there is no reason
to check the remaining IF-THEN statements when checking Employee_ID.
data work.clean;
   set orion.nonsales;
   if Employee_ID=120106 then Salary=26960;
   if Employee_ID=120115 then Salary=26500;
   if Employee_ID=120191 then Salary=24015;
run;
The word ELSE can be placed before the word IF, causing SAS to execute conditional
statements until it encounters the first true statement.
When a series of IF expressions represents mutually exclusive events, it is not necessary
to check all expressions when one is found to be true.
IF-THEN/ELSE Statements
All the values of Salary must be in the range of 24000 – 500000.
data work.clean;
   set orion.nonsales;
   if Employee_ID=120106 then Salary=26960;
   else if Employee_ID=120115 then Salary=26500;
   else if Employee_ID=120191 then Salary=24015;
run;
p108d07
PDV for Employee_ID 120106 before the IF-THEN/ELSE statements execute:
Employee_ID   ...   Salary   Job_Title             ...
120106        ...   .        Office Assistant II   ...
The first IF expression is TRUE, so Salary is set to 26960 and SAS skips the two
ELSE IF statements:
Employee_ID   ...   Salary   Job_Title             ...
120106        ...   26960    Office Assistant II   ...
Variable    Obs                            Invalid Value   Correct Value
Country     2, 84, 87, 125, 197, and 200   au or us        AU or US
Salary      4                              .               26960
Salary      13                             2650            26500
Salary      20                             2401            24015
Hire_Date   5                              21/01/1953      21/01/1995
Hire_Date   9                              .               01/11/1978
Hire_Date   214                            01/01/1968      01/01/1998
IF-THEN/ELSE Statements
All the values of Hire_Date must have a value of 01/01/1974 or later.
data work.clean;
   set orion.nonsales;
   Country=upcase(Country);
   if Employee_ID=120106 then Salary=26960;
   else if Employee_ID=120115 then Salary=26500;
   else if Employee_ID=120191 then Salary=24015;
   else if Employee_ID=120107 then
      Hire_Date='21JAN1995'd;
   else if Employee_ID=120111 then
      Hire_Date='01NOV1978'd;
   else if Employee_ID=121011 then
      Hire_Date='01JAN1998'd;
run;
p108d07
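One way to check the result, sketched here under the assumption that the three rules stated in this section are the ones being verified, is a PROC PRINT step that lists any observation in work.clean that still breaks a rule. The WHERE conditions restate the Salary, Hire_Date, and Country requirements from this chapter; the VAR list is an arbitrary choice for readability.

```sas
/* Sketch: list observations in work.clean that still violate a rule.  */
/* If the corrections covered every invalid value, the step should     */
/* select no observations for these three rules.                       */
proc print data=work.clean;
   where Salary not between 24000 and 500000 or
         Hire_Date < '01JAN1974'd or
         Country not in ('AU','US');
   var Employee_ID Salary Country Hire_Date;
run;
```

Missing values of Salary or Hire_Date also satisfy these conditions, so any value that is still missing would appear in the listing.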
Exercises
Level 1
b. Add two conditional statements to the DATA step to correct the following invalid data:
Variable        Obs   Invalid Value   Correct Value   Reference Variable
Delivery_Date   5     12APR2007       12MAY2007       Order_ID=1242012259
Order_Date      22    26JUL2007       26JUN2007       Order_ID=1242449327
c. Submit the program. Verify that zero observations were returned from the PROC PRINT step.
Level 2
8. Cleaning Data from orion.price_current
a. Retrieve the starter program p108e08.
b. Add a DATA step prior to the PROC steps to read orion.price_current to create
Work.price_current. In the DATA step, include two conditional IF-THEN statements
to correct the following invalid data:
Variable           Obs   Invalid Value   Correct Value   Reference Variable
Unit_Sales_Price
c. Submit the program. Verify that Unit_Sales_Price is in the numeric range of 3 – 800.
Level 3
Variable           Obs    Invalid Value     Correct Value   Reference Variable
Supplier_Country          mixed case        upper case
Supplier_Country   5      UT                US              Supplier_Country='UT'
Product_Category
Supplier_ID        1      .                 2963            Supplier_ID=.
Supplier_Name      3, 7   3op Sports        3Top Sports     Supplier_Name='3op Sports'
Product_ID         4      22020030007       220200300079    _N_=4
Product_ID         8      2202003001290     220200300129    _N_=8
Product_Name              not proper case   proper case
Supplier_Name
c. Submit the program. Verify that the data requirements are all met.
• Supplier_Name must be 3Top Sports or Greenline Sports Ltd.
• Supplier_ID must be 2963 or 14682.
• A Supplier_Name of 3Top Sports must have a Supplier_ID of 2963.
• A Supplier_Name of Greenline Sports Ltd must have a Supplier_ID of
14682.
• Product_ID must have only 12 digits.
Chapter Review
1. What procedures can be used to detect invalid data?
2. What happens when SAS encounters a data error?
3. Why would you need a SAS date constant?
4. How can you clean invalid data?
5. What symbol is required in an assignment statement?
6. Why would you use IF-THEN/ELSE statements instead of IF-THEN statements?
8.7 Solutions
Solutions to Exercises
1. Validating orion.shoes_tracker with the PRINT and FREQ Procedures
proc print data=orion.shoes_tracker;
   where Product_Category=' ' or
         Supplier_Country not in ('GB','US');
run;
c. Add a VAR statement.
proc print data=orion.shoes_tracker;
   where Product_Category=' ' or
         Supplier_Country not in ('GB','US');
   var Product_Category Supplier_Name Supplier_Country Supplier_ID;
run;
How many observations have missing Product_Category? One (observation 2)
How many observations have invalid values of Supplier_Country? Three (observations 1,
5, and 10)
d. Add a PROC FREQ step with a TABLES statement.
proc freq data=orion.shoes_tracker nlevels;
   tables Supplier_Name Supplier_ID;
run;
What invalid data exists for Supplier_Name and Supplier_ID?
• two invalid values for Supplier_Name (3op Sports)
• one missing value for Supplier_ID
2. Validating orion.qtr2_2007 with the PRINT and FREQ Procedures
a. Write a PROC PRINT step with a WHERE statement.
proc print data=orion.shoes_tracker;
   where propcase(Product_Name) ne Product_Name;
run;
b. Add a VAR statement.
proc print data=orion.shoes_tracker;
   where propcase(Product_Name) ne Product_Name;
   var Product_ID Product_Name;
run;
c. Add a PROC FREQ step with a TABLES statement.
proc freq data=orion.shoes_tracker;
   tables Supplier_Name*Supplier_ID / missing;
run;
What invalid data exists for Supplier_Name and Supplier_ID?
• two invalid values for Supplier_Name (3op Sports should be 3Top Sports)
• one missing value for Supplier_ID (. should be 2963)
• one wrong value for Supplier_ID (14682 should be 2963)
proc univariate data=orion.price_current;
   var Unit_Sales_Price Factor;
run;
e. Find the Extreme Observations output.
How many values of Unit_Sales_Price are over the maximum of 800? One (5730)
How many values of Factor are under the minimum of 1? One (0.01)
How many values of Factor are over the maximum of 1.05? Two (10.20 and 100.00)
5. Validating orion.shoes_tracker with the MEANS and UNIVARIATE Procedures
a. Write a PROC MEANS step with a VAR statement.
proc means data=orion.shoes_tracker;
   var Product_ID;
run;
b. Add the MIN, MAX, and RANGE statistics to the PROC MEANS statement.
proc means data=orion.shoes_tracker min max range;
   var Product_ID;
run;
c. Add FW=15 to the PROC MEANS statement.
proc means data=orion.shoes_tracker min max range fw=15;
   var Product_ID;
run;
d. Add the CLASS statement.
proc means data=orion.shoes_tracker min max range fw=15;
   var Product_ID;
   class Supplier_Name;
run;
e. Submit the program.
Which Supplier_Name has invalid Product_ID values assuming Product_ID must have
only twelve digits? 3Top Sports
How many values of Product_ID are too large? One (2.2020E+12)
6. Selecting Only the Extreme Observations Output from the UNIVARIATE Procedure
a. Write a PROC UNIVARIATE step with a VAR statement.
proc univariate data=orion.shoes_tracker;
   var Product_ID;
run;
b. Before the PROC UNIVARIATE step, add an ODS statement.
ods trace on;
proc univariate data=orion.shoes_tracker;
   var Product_ID;
run;
c. After the PROC UNIVARIATE step, add an ODS statement.
ods trace on;
proc univariate data=orion.shoes_tracker;
   var Product_ID;
run;
ods trace off;
d. Submit the program.
What is the name of the last Output Added in the SAS log? ExtremeObs
e. Add an ODS SELECT statement.
ods trace on;
proc print data=work.qtr2_2007;
   where Order_Date>Delivery_Date or
         Order_Date<'01APR2007'd or
         Order_Date>'30JUN2007'd;
run;
8. Cleaning Data from orion.price_current
a. Retrieve the starter program.
b. Add a DATA step.
data work.price_current;
   set orion.price_current;
   if Product_ID=220200200022 then Unit_Sales_Price=57.30;
   else if Product_ID=240200100056 then Unit_Sales_Price=41.20;
run;
proc means data=work.price_current n min max;
   var Unit_Sales_Price;
run;
proc univariate data=work.price_current;
   var Unit_Sales_Price;
run;
c. Submit the program.
   if Supplier_Name='3op Sports' then Supplier_Name='3Top Sports';
   if _n_=4 then Product_ID=220200300079;
   else if _n_=8 then Product_ID=220200300129;
   Product_Name=propcase(Product_Name);
   if Supplier_ID=14682 and Supplier_Name='3Top Sports'
      then Supplier_Name='Greenline Sports Ltd';
run;
proc print data=work.shoes_tracker;
   where Product_Category=' ' or
         Supplier_Country not in ('GB','US') or
         propcase(Product_Name) ne Product_Name;
run;
proc freq data=work.shoes_tracker;
   tables Supplier_Name*Supplier_ID / missing;
run;
proc means data=work.shoes_tracker min max range fw=15;
   var Product_ID;
   class Supplier_Name;
run;
proc univariate data=work.shoes_tracker;
   var Product_ID;
run;
c. Submit the program.
Partial nonsales.csv
120101,Patrick,Lu,M,163040,Director,AU,18AUG1976,01JUL2003
120104,Kareen,Billington,F,46230,Administration Manager,au,11MAY1954,01JAN1981
120105,Liz,Povey,F,27110,Secretary I,AU,21DEC1974,01MAY1999
120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC1944,01JAN1974
120107,Sherie,Sheedy,F,30475,Office Assistant III,AU,01FEB1978,21JAN1953
120108,Gladys,Gromek,F,27660,Warehouse Assistant II,AU,23FEB1984,01AUG2006
120108,Gabriele,Baker,F,26495,Warehouse Assistant I,AU,15DEC1986,01OCT2006
120110,Dennis,Entwisle,M,28615,Warehouse Assistant III,AU,20NOV1949,01NOV1979
120111,Ubaldo,Spillane,M,26895,Security Guard II,AU,23JUL1949,99NOV1978
120112,Ellis,Glattback,F,26550, ,AU,17FEB1969,01JUL1990
120113,Riu,Horsey,F,26870,Security Guard II,AU,10MAY1944,01JAN1974
120114,Jeannette,Buddery,G,31285,Security Manager,AU,08FEB1944,01JAN1974
120115,Hugh,Nichollas,M,2650,Service Assistant I,AU,08MAY1984,01AUG2005
.,Austen,Ralston,M,29250,Service Assistant II,AU,13JUN1959,01FEB1980
120117,Bill,Mccleary,M,31670,Cabinet Maker III,AU,11SEP1964,01APR1986
8.02 Quiz – Correct Answer
What problems exist with the data in this partial data set?
Hint: There are nine data problems.
8.04 Multiple Choice Poll – Correct Answer
Which statement best describes the invalid data?
a. The data in the raw data file is bad.
b. The programmer incorrectly read the data.
Partial SAS Log
404     input Employee_ID First $ Last;
405  run;
RULE:
1
Employee_ID=120101 First=Patrick Last=. _ERROR_=1 _N_=1
NOTE: Invalid data for Last in line 2 15-24.
2 120104,Kareen,Billington,F,46230,Administration Manager,au,1
61 1MAY1954,01JAN1981 78
Employee_ID=120104 First=Kareen Last=. _ERROR_=1 _N_=2
e. Country must have a value of AU or US.
f. Birth_Date value must occur before Hire_Date value.
g. Hire_Date must have a value of 01/01/1974 or later.
8.06 Quiz – Correct Answer (Self-Study)
Open the VIEWTABLE window for orion.nonsales.
Use the VIEWTABLE window to interactively clean the following observation:
8.07 Quiz – Correct Answer
Which variable can be used to specifically identify the observations with invalid salary values?
Employee_ID because the values are unique.
Chapter Review Answers
1. What procedures can be used to detect invalid data?
• PRINT
• FREQ
• MEANS
• UNIVARIATE
2. What happens when SAS encounters a data error?
• A note that describes the error is printed in the SAS log.
• The input record (contents of the input buffer) being read is displayed in the SAS log.
• The values in the SAS observation (contents of the PDV) being created are displayed in the SAS log.
• A missing value is assigned to the appropriate SAS variable.
• Execution continues.
IF-THEN/ELSE statements in the DATA step
5. What symbol is required in an assignment statement?
= (Equal sign)
6. Why would you use IF-THEN/ELSE statements instead of IF-THEN statements?
exclusive, it is not necessary to check all expressions when one is found to be true.
Chapter 9 Manipulating Data
9.3 Subsetting Observations
Exercises
9.4 Chapter Review
9.5 Solutions
Solutions to Exercises
Solutions to Student Activities (Polls/Quizzes)
Solutions to Chapter Review
9-2 Chapter 9 Manipulating Data
9.1 Creating Variables
Objectives
Create SAS variables with the assignment statement in the DATA step.
Create data values by using operators including SAS functions.
Subset variables by using the DROP and KEEP statements.
Examine the compilation and execution phases of the DATA step when you read a SAS data set.
Subset variables by using the DROP= and KEEP= options. (Self-Study)
Business Scenario
A new SAS data set named Work.comp needs to be created by reading the orion.sales data set.
Work.comp must include the following new variables:
Bonus, which is equal to a constant 500
Compensation, which is the combination of the employee's salary and bonus
BonusMonth, which is equal to the month that the employee was hired
Work.comp must not include the Gender, Salary, Job_Title, Country, Birth_Date, and Hire_Date variables from orion.sales.
Business Scenario
Partial orion.sales
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title      Country  Birth_Date  Hire_Date
120102       Tom         Zhou       M       108255  Sales Manager  AU             3510      10744
120103       Wilson      Dawes      M        87975  Sales Manager  AU            -3996       5114
120121       Irenie      Elvish     F        26600  Sales Rep. II  AU            -5630       5114
Partial Work.comp
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth
120102       Tom         Zhou         500        108755           6
120103       Wilson      Dawes        500         88475           1
120121       Irenie      Elvish       500         27100           1
Assignment Statements (Review)
Assignment statements are used in the DATA step to update existing variables or create new variables.
DATA output-SAS-data-set;
   SET input-SAS-data-set;
   variable = expression;
RUN;
DATA output-SAS-data-set;
   INFILE 'raw-data-file-name';
   INPUT specifications;
   variable = expression;
RUN;
variable = expression;
variable names an existing or new variable.
expression is a sequence of operands and operators that form a set of instructions that produce a value.
An assignment statement evaluates the expression on the right side of the equal sign and stores the result
in the variable that is specified on the left side of the equal sign.
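The following is a minimal sketch of both uses of the assignment statement, creating a new variable and updating an existing one. The data set name work.demo and the values are hypothetical and were chosen only for illustration.

```sas
/* Hypothetical example: work.demo and its values are assumptions. */
data work.demo;
   Price = 20;                  /* creates the numeric variable Price      */
   Quantity = 3;                /* creates the numeric variable Quantity   */
   Revenue = Quantity * Price;  /* right side is evaluated first: 3*20=60  */
   Quantity = Quantity + 1;     /* updates the existing variable Quantity  */
run;
```

Because the right side is evaluated before the result is stored, a variable can appear on both sides of the equal sign, as Quantity does in the last assignment.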
Operands (Review)
Operands are constants (character, numeric, or date) and variables (character or numeric).
Examples:
Bonus = 500;                 (500 is a numeric constant)
Gender = 'M';                ('M' is a character constant)
Hire_Date = '01APR2008'd;    ('01APR2008'd is a date constant)
In each example, the name on the left side of the equal sign is a variable.
Operators (Review)
Operators are symbols that represent an arithmetic calculation and SAS functions.
Examples:
Revenue = Quantity * Price;
NewCountry = upcase(Country);
Arithmetic Operators
Arithmetic operators indicate that an arithmetic calculation is performed.
Symbol  Definition       Priority
**      exponentiation   I
-       negative prefix  I
*       multiplication   II
/       division         II
+       addition         III
-       subtraction      III
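As a small illustration of the priority column, the sketch below compares an expression evaluated by priority with one where parentheses change the order. The variable names a and b are hypothetical, and the use of parentheses is an addition that the table above does not cover.

```sas
/* Hypothetical example: a and b exist only for this illustration. */
data _null_;
   a = 2 + 3 * 4;      /* multiplication (II) runs before addition (III): 14 */
   b = (2 + 3) * 4;    /* parentheses force the addition to run first: 20    */
   put a= b=;          /* writes the two results to the SAS log              */
run;
```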
9.01 Quiz
What is the result of the assignment statement?
num = 4 + 10 / 2;
a. . (missing)
b. 0
c. 7
d. 9
9.02 Quiz
What is the result of the assignment statement given the values of var1 and var2?
num = var1 + var2 / 2;
var1  var2
.     10
a. . (missing)
b. 0
c. 5
d. 10
function-name(argument1, argument2, …)
Depending on the function, zero, one, or many arguments are used.
Arguments are separated with commas.
Descriptive Statistics Function
The SUM function returns the sum of the arguments.
General form of the SUM function:
SUM(argument1,argument2, ...)
The arguments must be numeric values.
Missing values are ignored by some of the descriptive statistics functions.
Example:
Compensation=sum(Salary,Bonus);
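The difference between the SUM function and the + operator shows up when an argument is missing. The sketch below is only an illustration; the variable names WithPlus and WithSum are hypothetical.

```sas
/* Hypothetical example: WithPlus and WithSum are illustrative names. */
data _null_;
   Salary = 26600;
   Bonus = .;                        /* numeric missing value               */
   WithPlus = Salary + Bonus;        /* + propagates the missing value: .   */
   WithSum = sum(Salary, Bonus);     /* SUM ignores the missing one: 26600  */
   put WithPlus= WithSum=;
run;
```

This is one reason the course example computes Compensation with sum(Salary,Bonus) rather than Salary + Bonus.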
Date Functions
SAS date functions can be used to
extract information from SAS date values
create SAS date values.
Calendar Date    01JAN1959  01JAN1960  01JAN1961
SAS Date Value        -365          0        366
Date Functions – Extracting Information
YEAR(SAS-date)     extracts the year from a SAS date and returns a four-digit value for year.
QTR(SAS-date)      extracts the quarter from a SAS date and returns a number from 1 to 4.
MONTH(SAS-date)    extracts the month from a SAS date and returns a number from 1 to 12.
DAY(SAS-date)      extracts the day of the month from a SAS date and returns a number from 1 to 31.
WEEKDAY(SAS-date)  extracts the day of the week from a SAS date and returns a number from 1 to 7, where 1 represents Sunday, and so on.
Example:
BonusMonth=month(Hire_Date);
Example:
AnnivBonus=mdy(month(Hire_Date),15,2008);
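The sketch below applies the extraction functions and MDY to one value. It assumes the Hire_Date value 10744 shown for employee 120102 in the partial orion.sales listing; the variable names HireYear and HireMonth are hypothetical.

```sas
/* Sketch only: HireYear and HireMonth are illustrative names. */
data _null_;
   Hire_Date = 10744;                     /* SAS date value for 01JUN1989     */
   HireYear = year(Hire_Date);            /* 1989                             */
   HireMonth = month(Hire_Date);          /* 6                                */
   AnnivBonus = mdy(HireMonth, 15, 2008); /* builds the SAS date 15JUN2008    */
   put HireYear= HireMonth= AnnivBonus= date9.;
run;
```

MDY takes the month, day, and year as three numeric arguments and returns a SAS date value, so the DATE9. format is applied in the PUT statement to display it as a calendar date.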
Business Scenario
Create Bonus, Compensation, and BonusMonth.
data work.comp;
   set orion.sales;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;
1700  data work.comp;
1701     set orion.sales;
1702     Bonus=500;
1703     Compensation=sum(Salary,Bonus);
1704     BonusMonth=month(Hire_Date);
1705  run;
NOTE: There were 165 observations read from the data set ORION.SALES.
NOTE: The data set WORK.COMP has 165 observations and 12 variables.
orion.sales has 9 variables.
p109d01
9.03 Quiz
What statement needs to be added to the DATA step
to eliminate six of the 12 variables?
The DROP and KEEP Statements (Review)
The DROP statement specifies the names of the variables to omit from the output data set(s).
DROP variable-list ;
The KEEP statement specifies the names of the variables to write to the output data set(s).
KEEP variable-list ;
The variable-list specifies the variables to drop or keep, respectively, in the output data set.
Business Scenario
Drop Gender, Salary, Job_Title, Country, Birth_Date, and Hire_Date.
data work.comp;
   set orion.sales;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
run;
NOTE: There were 165 observations read from the data set ORION.SALES.
NOTE: The data set WORK.COMP has 165 observations and 6 variables.
p109d01
Business Scenario
proc print data=work.comp;
run;
Partial PROC PRINT Output
Obs  Employee_ID  First_Name  Last_Name   Bonus  Compensation  BonusMonth
  1  120102       Tom         Zhou          500        108755           6
  2  120103       Wilson      Dawes         500         88475           1
  3  120121       Irenie      Elvish        500         27100           1
  4  120122       Christina   Ngan          500         27975           7
  5  120123       Kimiko      Hotstone      500         26690          10
  6  120124       Lucian      Daymond       500         26980           3
  7  120125       Fong        Hofmeister    500         32540           3
  8  120126       Satyakam    Denny         500         27280           8
  9  120127       Sharryn     Clarkson      500         28600          11
 10  120128       Monica      Kletschkus    500         31390          11
p109d01
data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;
9.04 Poll
Are the correct results produced when the DROP statement is placed after the SET statement?
Yes
No
Figure: PDV → DROP and KEEP statements → Output SAS Data Set
Compilation
data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;
The PDV is built one statement at a time as the DATA step is compiled. The SET statement adds the nine variables of orion.sales, each assignment statement adds one new variable, and the DROP statement sets a drop flag (D) on the listed variables.
PDV at the end of compilation
Variable      Type and Length  Added by                          Drop Flag
Employee_ID   N 8              set orion.sales;
First_Name    $ 12             set orion.sales;
Last_Name     $ 18             set orion.sales;
Gender        $ 1              set orion.sales;                  D
Salary        N 8              set orion.sales;                  D
Job_Title     $ 25             set orion.sales;                  D
Country       $ 2              set orion.sales;                  D
Birth_Date    N 8              set orion.sales;                  D
Hire_Date     N 8              set orion.sales;                  D
Bonus         N 8              Bonus=500;
Compensation  N 8              Compensation=sum(Salary,Bonus);
BonusMonth    N 8              BonusMonth=month(Hire_Date);
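One way to confirm which variables reached the output data set, and with what type and length, is to list its descriptor portion. This is a sketch that assumes Work.comp was created by the DATA step above.

```sas
/* Lists the variables, types, and lengths stored in work.comp.      */
/* Assumes work.comp already exists from the DATA step shown above.  */
proc contents data=work.comp;
run;
```

The variables flagged with D in the PDV should not appear in the listing, because the drop flag only affects what is written to the output data set.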
Execution
data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;
Partial orion.sales
Employee_ID  ...  Hire_Date
120102       ...      10744
120103       ...       5114
120121       ...       5114
120122       ...       6756
PDV during the first iteration (D marks a variable flagged by the DROP statement)
Step                             Employee_ID  ...  Gender (D)  ...  Hire_Date (D)  Bonus  Compensation  BonusMonth
Initialize PDV                   .                                  .              .      .             .
set orion.sales;                 120102            M                10744          .      .             .
Bonus=500;                       120102            M                10744          500    .             .
Compensation=sum(Salary,Bonus);  120102            M                10744          500    108755        .
BonusMonth=month(Hire_Date);     120102            M                10744          500    108755        6
run; (Implicit OUTPUT; Implicit RETURN;)
Work.comp after the first iteration
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth
120102       Tom         Zhou         500        108755           6
Reinitialize PDV
Step              Employee_ID  ...  Gender (D)  ...  Hire_Date (D)  Bonus  Compensation  BonusMonth
Reinitialize PDV  120102            M                10744          .      .             .
SAS reinitializes variables in the PDV at the start of every DATA step iteration. Variables created by an assignment statement are reset to missing, but variables that are read with a SET statement are not reset to missing.
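The reinitialization rule can be observed by writing PDV values to the log at the top of each iteration. The sketch below is an illustration only: it uses DATA _NULL_ so that no data set is created, and the OBS=2 data set option limits the input to the first two observations.

```sas
/* Sketch: traces selected PDV values before the SET statement runs. */
data _null_;
   putlog 'Top of iteration: ' _N_= Employee_ID= Bonus= Compensation=;
   set orion.sales(obs=2);            /* read only the first two rows */
   Bonus=500;
   Compensation=sum(Salary,Bonus);
run;
```

From the second iteration on, the log line should show the Employee_ID retained from the previous observation, while Bonus and Compensation are shown as missing again.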
Execution
PDV during the second iteration (D marks a variable flagged by the DROP statement)
Step                             Employee_ID  ...  Gender (D)  ...  Hire_Date (D)  Bonus  Compensation  BonusMonth
set orion.sales;                 120103            M                5114           .      .             .
Bonus=500;                       120103            M                5114           500    .             .
Compensation=sum(Salary,Bonus);  120103            M                5114           500    88475         .
BonusMonth=month(Hire_Date);     120103            M                5114           500    88475         1
run; (Implicit OUTPUT; Implicit RETURN;)
Work.comp after the second iteration
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth
120102       Tom         Zhou         500        108755           6
120103       Wilson      Dawes        500         88475           1
Continue until EOF
DROP= and KEEP= Options (Self-Study)
Alternatives to the DROP and KEEP statements are the DROP= and KEEP= data set options placed in the DATA statement.
The DROP= data set option in the DATA statement excludes the variables for writing to the output data set.
DATA output-SAS-data-set (DROP = variable-list) ;
The KEEP= data set option in the DATA statement specifies the variables for writing to the output data set.
DATA output-SAS-data-set (KEEP = variable-list) ;
When specified for a data set named in the DATA statement, the DROP= and KEEP= data set options are
similar to DROP and KEEP statements. However, the DROP= and KEEP= data set options can be used in
situations where the DROP and KEEP statements cannot. For example, the DROP= and KEEP= data set
options can be used in a PROC step to control which variables are available for processing by the
procedure.
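As a sketch of the PROC step case mentioned above, the KEEP= data set option can limit the variables that a procedure sees. The choice of variables here is only an example.

```sas
/* KEEP= on the input data set: PROC PRINT only has these four   */
/* variables available, so no VAR statement is needed to limit   */
/* the report. The variable list is an illustrative choice.      */
proc print data=orion.sales(keep=Employee_ID First_Name Last_Name Salary);
run;
```

A DROP or KEEP statement could not be used here, because those statements are valid only in a DATA step.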
The KEEP= data set option in the SET statement specifies the variables for processing in the PDV.
SET input-SAS-data-set (KEEP = variable-list) ;
DROP= and KEEP= Options (Self-Study)
data work.comp(drop=Salary Hire_Date);
   set orion.sales(keep=Employee_ID First_Name
                        Last_Name Salary Hire_Date);
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;
orion.sales
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country  Birth_Date  Hire_Date
PDV
Employee_ID  First_Name  Last_Name  Salary  Hire_Date  Bonus  Compensation  BonusMonth
Work.comp
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth
p109d02
Figure: DROP and KEEP statements, and DROP= and KEEP= options in the DATA statement → Output SAS Data Set
Exercises
Level 1
b. In the DATA step, create two new variables, Increase and NewSalary.
• Increase is the Salary multiplied by 0.10.
• NewSalary is Salary added with Increase.
c. Include only the following variables: Employee_ID, Salary, Increase, and NewSalary.
d. Store formats displaying commas for Salary, Increase, and NewSalary.
e. Submit the program to create the following PROC PRINT report:
Partial PROC PRINT Output (First 10 of 424 Observations)
                  Employee
                    Annual
Obs  Employee_ID    Salary  Increase  NewSalary
  1  120101        163,040    16,304    179,344
  2  120102        108,255    10,826    119,081
  3  120103         87,975     8,798     96,773
  4  120104         46,230     4,623     50,853
  5  120105         27,110     2,711     29,821
  6  120106         26,960     2,696     29,656
  7  120107         30,475     3,048     33,523
  8  120108         27,660     2,766     30,426
  9  120109         26,495     2,650     29,145
 10  120110         28,615     2,862     31,477
Level 2
b. In the DATA step, create three new variables, Bday2009, BdayDOW2009, and Age2009.
• Bday2009 is the combination of the month of Birth_Date, the day of Birth_Date, and the constant of 2009 in the MDY function.
• BdayDOW2009 is the day of the week of Bday2009.
• Age2009 is the age of the customer in 2009. Subtract Birth_Date from Bday2009 and then divide by 365.25.
Obs  Customer_Name      Birth_Date  Bday2009   BdayDOW2009  Age2009
  1  James Kvarniq      27JUN1974   27JUN2009            7       35
  2  Sandrina Stephano  09JUL1979   09JUL2009            5       30
  3  Cornelia Krahl     27FEB1974   27FEB2009            6       35
  4  Karen Ballinger    18OCT1984   18OCT2009            1       25
  5  Elke Wallstab      16AUG1974   16AUG2009            1       35
  6  David Black        12APR1969   12APR2009            1       40
  7  Markus Sepke       21JUL1988   21JUL2009            3       21
  8  Ulrich Heyde       16JAN1939   16JAN2009            6       70
  9  Jimmie Evans       17AUG1954   17AUG2009            2       55
 10  Tonie Asmussen     02FEB1954   02FEB2009            2       55
Level 3
3. Using the CATX and INTCK Functions to Create Variables
a. Write a DATA step to read orion.sales to create Work.employees.
b. In the DATA step, create the new variable FullName, which is the combination of First_Name, a space, and Last_Name. Use the CATX function.
Documentation on the CATX function can be found in the SAS Help and Documentation from the Contents tab (SAS Products > Base SAS > SAS 9.2 Language Reference: Dictionary > Dictionary of Language Elements > Functions and CALL Routines > CATX Function).
c. In the DATA step, create the new variable Yrs2012, which is the number of years between January 1, 2012, and Hire_Date. Use the INTCK function.
f. Write a PROC PRINT step with a VAR statement to create the following report:
Partial PROC PRINT Output (First 10 of 165 Observations)
                                      Years of
                                    Employment
Obs  FullName           Hire_Date      in 2012
  3  Irenie Elvish      01/01/1974          38
  4  Christina Ngan     01/07/1978          34
  5  Kimiko Hotstone    01/10/1985          27
  6  Lucian Daymond     01/03/1979          33
  7  Fong Hofmeister    01/03/1979          33
  8  Satyakam Denny     01/08/2006           6
  9  Sharryn Clarkson   01/11/1998          14
 10  Monica Kletschkus  01/11/2006           6
Objectives
Execute statements conditionally by using IF-THEN and IF-THEN DO statements.
Give alternate actions if the previous THEN clause is not executed by using the ELSE statement.
Control the length of character variables by using the LENGTH statement.
Business Scenario
A new SAS data set named Work.bonus needs to be created by reading the orion.sales data set.
Work.bonus must include a new variable named Bonus that is equal to
500 for United States employees
300 for Australian employees.
9.2 Creating Variables Conditionally 9-31
IF expression THEN statement;
expression is a sequence of operands and operators that form a set of instructions that define a condition for selecting observations.
statement is any executable statement such as the assignment statement.
If the condition in the IF clause is met, the IF-THEN statement executes a SAS statement for that
observation.
IF-THEN/ELSE Statements (Review)
The ELSE statement gives an alternate action if the previous THEN clause is not executed.
General form of the IF-THEN/ELSE statements:
IF expression THEN statement;
ELSE IF expression THEN statement;
Using IF-THEN statements without the ELSE statement causes SAS to evaluate all IF-THEN statements.
Using IF-THEN statements with the ELSE statement causes SAS to evaluate the IF-THEN statements only until it encounters the first true expression.
Business Scenario
Create the new variable Bonus.
data work.bonus;
   set orion.sales;
   if Country='US' then Bonus=500;
   else if Country='AU' then Bonus=300;
run;
1819  data work.bonus;
1820     set orion.sales;
1821     if Country='US' then Bonus=500;
1822     else if Country='AU' then Bonus=300;
1823  run;
NOTE: There were 165 observations read from the data set ORION.SALES.
NOTE: The data set WORK.BONUS has 165 observations and 10 variables.
p109d03
Business Scenario
proc print data=work.bonus;
   var First_Name Last_Name Country Bonus;
run;
Partial PROC PRINT Output
Obs  First_Name  Last_Name  Country  Bonus
 60  Billy       Plested    AU         300
 61  Matsuoka    Wills      AU         300
 62  Vino        George     AU         300
 63  Meera       Body       AU         300
 64  Harry       Highpoint  US         500
 65  Julienne    Magolan    US         500
 66  Scott       Desanctis  US         500
 67  Cherda      Ridley     US         500
 68  Priscilla   Farren     US         500
 69  Robert      Stevens    US         500
p109d03
9.05 Quiz
Why are some of the Bonus values missing in
the PROC PRINT output for orion.nonsales?
Submit program p109a02.
Review the results.
ELSE Statements
The conditional clause does not have to be in an ELSE statement.
For example:
data work.bonus;
   set orion.sales;
   if Country='US' then Bonus=500;
   else Bonus=300;
run;
All observations not equal to US get a bonus of 300.
p109d03
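Because an unconditional ELSE catches every value that is not exactly US, lowercase codes such as us or au, or a missing Country, would also receive 300. The sketch below is one possible guard, not the course solution: it assumes that country codes should be compared without regard to case and that any other value should leave Bonus missing.

```sas
/* Sketch under stated assumptions: case-insensitive comparison, */
/* and no bonus for codes other than US or AU.                   */
data work.bonus;
   set orion.sales;
   if upcase(Country)='US' then Bonus=500;
   else if upcase(Country)='AU' then Bonus=300;
   else Bonus=.;     /* unexpected or missing Country: leave Bonus missing */
run;
```

UPCASE is applied only inside the comparison, so the stored Country values are left unchanged.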
Business Scenario
A new SAS data set named Work.bonus needs to be created by reading the orion.sales data set.
Work.bonus must include a new variable named Bonus that is equal to
500 for United States employees
300 for Australian employees.
Work.bonus must include another new variable named Freq that is equal to
Once a Year for United States employees
Twice a Year for Australian employees.
IF-THEN/ELSE Statements
Only one executable statement is allowed in IF-THEN/ELSE statements.
IF expression THEN statement;
ELSE IF expression THEN statement;
ELSE statement;
For the given business scenario, two statements need to be executed for each true expression.
if Country='US' then
   Bonus=500;
   Freq='Once per Year';
Written this way, only Bonus=500; is conditional. The Freq assignment is a separate statement that executes for every observation.
IF expression THEN DO;
   executable statements
END;
ELSE IF expression THEN DO;
   executable statements
END;
Each DO group can contain multiple statements that apply to the expression.
Each DO group ends with an END statement.
Business Scenario
Create another new variable named Freq.
data work.bonus;
   set orion.sales;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';
   end;
   else if Country='AU' then do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;
p109d04
Business Scenario
proc print data=work.bonus;
   var First_Name Last_Name
       Country Bonus Freq;
run;
Partial PROC PRINT Output
Obs  First_Name  Last_Name  Country  Bonus  Freq
 60  Billy       Plested    AU         300  Twice a Yea
 61  Matsuoka    Wills      AU         300  Twice a Yea
 62  Vino        George     AU         300  Twice a Yea
 63  Meera       Body       AU         300  Twice a Yea
 64  Harry       Highpoint  US         500  Once a Year
 65  Julienne    Magolan    US         500  Once a Year
 66  Scott       Desanctis  US         500  Once a Year
 67  Cherda      Ridley     US         500  Once a Year
 68  Priscilla   Farren     US         500  Once a Year
 69  Robert      Stevens    US         500  Once a Year
p109d04
Compilation
data work.bonus;
   set orion.sales;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';
   end;
   else if Country='AU' then do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;
PDV after the SET statement is compiled
Employee_ID  First_Name  ...  Hire_Date
N 8          $ 12             N 8
PDV after Bonus=500; is compiled
Employee_ID  First_Name  ...  Hire_Date  Bonus
N 8          $ 12             N 8        N 8
PDV after Freq='Once a Year'; is compiled ('Once a Year' has 11 characters)
Employee_ID  First_Name  ...  Hire_Date  Bonus  Freq
N 8          $ 12             N 8        N 8    $ 11
9.06 Quiz
How would you prevent Freq from being truncated?
The LENGTH Statement (Review)
The LENGTH statement defines the length of a variable explicitly.
General form of the LENGTH statement:
LENGTH variable(s) $ length;
Example:
length First_Name Last_Name $ 12
       Gender $ 1;
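The position of the LENGTH statement matters, because the length of a character variable is fixed the first time the variable is encountered during compilation. A minimal sketch, using the hypothetical data set work.demo:

```sas
/* Hypothetical example: work.demo exists only for illustration. */
data work.demo;
   length Freq $ 12;        /* declared before the first assignment      */
   Freq='Once a Year';      /* 11 characters                             */
   Freq='Twice a Year';     /* 12 characters fit, so nothing is cut off  */
   put Freq=;
run;
```

If the LENGTH statement were placed after the first assignment, Freq would already have been created with a length of 11 from 'Once a Year'.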
Business Scenario
Set the length of the variable Freq to avoid truncation.
data work.bonus;
   set orion.sales;
   length Freq $ 12;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';
   end;
   else if Country='AU' then do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;
p109d04
Business Scenario
proc print data=work.bonus;
   var First_Name Last_Name
       Country Bonus Freq;
run;
Partial PROC PRINT Output
Obs  First_Name  Last_Name  Country  Bonus  Freq
 60  Billy       Plested    AU         300  Twice a Year
 61  Matsuoka    Wills      AU         300  Twice a Year
 62  Vino        George     AU         300  Twice a Year
 63  Meera       Body       AU         300  Twice a Year
 64  Harry       Highpoint  US         500  Once a Year
 65  Julienne    Magolan    US         500  Once a Year
 66  Scott       Desanctis  US         500  Once a Year
 67  Cherda      Ridley     US         500  Once a Year
 68  Priscilla   Farren     US         500  Once a Year
 69  Robert      Stevens    US         500  Once a Year
p109d04
ELSE Statements
The conditional clause does not have to be in an ELSE statement.
data work.bonus;
   set orion.sales;
   length Freq $ 12;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';
   end;
   else do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;
All observations not equal to US execute the statements in the second DO group.
p109d04
9.2 Creating Variables Conditionally 9-41
Exercises
Level 1
b. In the DATA step, create three new variables, Discount, DiscountType, and Region.
If Country is equal to CA or US,
• Discount is equal to 0.10
• DiscountType is equal to Required
• Region is equal to North America.
If Country is equal to any other value,
• Discount is equal to 0.05
• DiscountType is equal to Optional
• Region is equal to Not North America.
c. Include only the following variables: Supplier_Name, Country, Discount,
DiscountType, and Region.
d. Submit the program to create the following PROC PRINT report:
Partial PROC PRINT Output (First 10 of 52 Observations)
Obs Supplier_Name Country Region Discount DiscountType
1 Scandinavian Clothing A/S NO Not North America 0.05 Optional
2 Petterson AB SE Not North America 0.05 Optional
3 Prime Sports Ltd GB Not North America 0.05 Optional
4 Top Sports DK Not North America 0.05 Optional
5 AllSeasons Outdoor Clothing US North America 0.10 Required
6 Sportico ES Not North America 0.05 Optional
7 British Sports Ltd GB Not North America 0.05 Optional
8 Eclipse Inc US North America 0.10 Required
9 Magnifico Sports PT Not North America 0.05 Optional
10 Pro Sportswear Inc US North America 0.10 Required
Level 2
b. Create the new variable DayOfWeek, which is equal to the week day of Order_Date.
• Retail Sale if Order_Type is equal to 3.
• Mail if Order_Type is equal to 1
• Email if Order_Type is equal to 2.
e. Do not include Order_Type, Employee_ID, and Customer_ID.
f. Write a PROC PRINT step to create the following report:
Partial PROC PRINT Output (First 20 of 490 Observations)
Obs Order_ID Order_Date Delivery_Date Type SaleAds DayOfWeek
1 1230058123 11JAN2003 11JAN2003 Catalog Sale Mail 7
2 1230080101 15JAN2003 19JAN2003 Internet Sale Email 4
3 1230106883 20JAN2003 22JAN2003 Internet Sale Email 2
4 1230147441 28JAN2003 28JAN2003 Catalog Sale Mail 3
5 1230315085 27FEB2003 27FEB2003 Catalog Sale Mail 5
6 1230333319 02MAR2003 03MAR2003 Internet Sale Email 1
7 1230338566 03MAR2003 08MAR2003 Internet Sale Email 2
8 1230371142 09MAR2003 11MAR2003 Internet Sale Email 1
9 1230404278 15MAR2003 15MAR2003 Catalog Sale Mail 7
10 1230440481 22MAR2003 22MAR2003 Catalog Sale Mail 7
11 1230450371 24MAR2003 26MAR2003 Internet Sale Email 2
12 1230453723 24MAR2003 25MAR2003 Internet Sale Email 2
13 1230455630 25MAR2003 25MAR2003 Catalog Sale Mail 3
14 1230478006 28MAR2003 30MAR2003 Internet Sale Email 6
15 1230498538 01APR2003 01APR2003 Catalog Sale Mail 3
16 1230500669 02APR2003 03APR2003 Retail Sale 4
17 1230503155 02APR2003 03APR2003 Internet Sale Email 4
18 1230591673 18APR2003 23APR2003 Internet Sale Email 6
19 1230591675 18APR2003 20APR2003 Retail Sale 6
20 1230591684 18APR2003 18APR2003 Catalog Sale Mail 6
Level 3
b. Create two new variables, Gift1 and Gift2, using a SELECT group with WHEN statements.
If Gender is equal to F,
• Gift1 is equal to Perfume
• Gift2 is equal to Cookware.
If Gender is equal to M,
• Gift1 is equal to Cologne
• Gift2 is equal to Lawn Equipment.
If Gender is not equal to F or M,
• Gift1 is equal to Coffee
• Gift2 is equal to Calendar.
Documentation on the SELECT group with WHEN statements can be found in the
SAS Help and Documentation from the Contents tab (SAS Products > Base SAS >
SAS 9.2 Language Reference: Dictionary > Dictionary of Language Elements >
Statements > SELECT Statement).
c. Include only the following variables: Employee_ID, First, Last, Gift1, and Gift2.
d. Write a PROC PRINT step to create the following report:
Partial PROC PRINT Output (First 15 of 235 Observations)
Obs Employee_ID First Last Gift1 Gift2
1 120101 Patrick Lu Cologne Lawn Equipment
2 120104 Kareen Billington Perfume Cookware
3 120105 Liz Povey Perfume Cookware
4 120106 John Hornsey Cologne Lawn Equipment
5 120107 Sherie Sheedy Perfume Cookware
6 120108 Gladys Gromek Perfume Cookware
7 120108 Gabriele Baker Perfume Cookware
8 120110 Dennis Entwisle Cologne Lawn Equipment
9 120111 Ubaldo Spillane Cologne Lawn Equipment
10 120112 Ellis Glattback Perfume Cookware
11 120113 Riu Horsey Perfume Cookware
12 120114 Jeannette Buddery Coffee Calendar
13 120115 Hugh Nichollas Cologne Lawn Equipment
14 . Austen Ralston Cologne Lawn Equipment
15 120117 Bill Mccleary Cologne Lawn Equipment
Objectives
Subset observations by using the WHERE statement.
Subset observations by using the subsetting IF
statement.
Subset observations by using the IF-THEN DELETE
statement. (Self-Study)
Business Scenario
A new SAS data set named Work.december needs
to be created by reading the orion.sales data set.
Work.december must include the following new
variables:
Bonus, which is equal to a constant 500.
Compensation, which is the combination of the
employee's salary and bonus.
BonusMonth, which is equal to the month the
employee was hired.
Work.december must include only the employees
from Australia who have a bonus month in December.
9.3 Subsetting Observations 9-45
The WHERE expression is a sequence of operands and operators
that form a set of instructions that define a
condition for selecting observations.
Operands include constants and variables.
Operators are symbols that request a comparison,
arithmetic calculation, or logical operation.
Processing the WHERE Statement
The WHERE statement selects observations before they
are brought into the program data vector.
Input SAS Data Set -> WHERE statement -> PDV -> Output SAS Data Set
9.07 Quiz
Why does the WHERE statement not work in this
DATA step?
data work.december;
   set orion.sales;
   BonusMonth=month(Hire_Date);
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   where Country='AU' and BonusMonth=12;
run;
p109d05
The Subsetting IF Statement
The subsetting IF statement continues processing
only those observations that meet the condition.
IF expression;
The expression is a sequence of operands and operators
that form a set of instructions that define a condition for
selecting observations.
Operands include constants and variables.
Operators are symbols that request a comparison,
arithmetic calculation, or logical operation.
if Hire_Date = '15APR2008'd;
if upcase(Gender)='M';
if BirthMonth = 5 or BirthMonth = 6;
if 40000 <= Compensation <= 80000;
if sum(Salary,Bonus) < 43000;
Special WHERE operators such as BETWEEN-AND, IS NULL, IS MISSING, CONTAINS, and LIKE
cannot be used with the subsetting IF statement.
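As a minimal sketch of that restriction, the two steps below select the same salary range from orion.sales. The output data set names work.midrange_where and work.midrange_if are hypothetical. The WHERE statement can use the special BETWEEN-AND operator, while the subsetting IF statement has to express the same range with ordinary comparison operators.

```sas
/* WHERE statement with a special WHERE operator */
data work.midrange_where;
   set orion.sales;
   where Salary between 40000 and 80000;
run;

/* Subsetting IF statement with ordinary comparison operators */
data work.midrange_if;
   set orion.sales;
   if 40000 <= Salary <= 80000;
run;
```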
Processing the Subsetting IF Statement
The subsetting IF statement determines if observations
continue being processed in the program data vector.
Input SAS Data Set -> WHERE statement -> PDV -> subsetting IF statement -> Output SAS Data Set
IF expression TRUE: Continue processing the observation and output the observation to the SAS data set.
IF expression FALSE: A false IF expression causes the contents of the PDV to not be output.
Business Scenario
Include only the employees from Australia who have
a bonus month in December.
data work.december;
   set orion.sales;
   where Country='AU';
   BonusMonth=month(Hire_Date);
   if BonusMonth=12;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
run;
Partial SAS Log
NOTE: There were 63 observations read from the data set ORION.SALES.
      WHERE Country='AU';
NOTE: The data set WORK.DECEMBER has 3 observations and 12 variables.
p109d05
9.08 Quiz
Could you write only an IF statement?
Yes
No
data work.december;
   set orion.sales;
   where Country='AU';
   BonusMonth=month(Hire_Date);
   if BonusMonth=12;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
run;
data work.december;
   set orion.sales;
   BonusMonth=month(Hire_Date);
   Bonus=500;
   if BonusMonth=12 and Country='AU';
   Compensation=sum(Salary,Bonus);
run;
p109d05
WHERE Statement versus Subsetting IF Statement
Step and Usage                                WHERE   IF
PROC step                                     Yes     No
DATA step (source of variable)
   INPUT statement                            No      Yes
   assignment statement                       No      Yes
   SET statement (single data set)            Yes     Yes
   SET/MERGE statement (multiple data sets)
      Variable in ALL data sets               Yes     Yes
      Variable not in ALL data sets           No      Yes
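Two rows of this comparison can be sketched with orion.sales. This is only an illustration, not part of the course programs: the first step shows a WHERE statement in a PROC step, and the second shows a variable created by an assignment statement, which only the subsetting IF statement can test.

```sas
/* PROC step: WHERE is allowed, the subsetting IF is not */
proc print data=orion.sales;
   where Country='AU';
   var First_Name Last_Name Country Salary;
run;

/* BonusMonth comes from an assignment statement, so it is tested with IF */
data work.december;
   set orion.sales;
   BonusMonth=month(Hire_Date);
   if BonusMonth=12;
run;
```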
The DELETE statement stops processing the current
observation.
When the DELETE statement executes, the current observation is not written to a data set, and SAS
returns immediately to the beginning of the DATA step for the next iteration.
The IF-THEN DELETE Statement (Self-Study)
data work.december;
   set orion.sales;
   where Country='AU';
   BonusMonth=month(Hire_Date);
   if BonusMonth ne 12 then delete;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
run;
equivalent
data work.december;
   set orion.sales;
   where Country='AU';
   BonusMonth=month(Hire_Date);
   if BonusMonth=12;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
run;
p109d06
Exercises
Level 1
b. In the DATA step, write a statement to select only the observations that have Emp_Hire_Date
greater than or equal to July 1, 2006. Subset the observations as they are being read into the
program data vector.
c. In the DATA step, write another statement to select only the observations that have an increase
greater than 3000.
d. Submit the program to create the following PROC PRINT report:
                   Employee
                   Annual     Employee
Obs  Employee ID   Salary     Hire Date   Increase  NewSalary
1    120128        30,890     01NOV2006   3,089     33,979
2    120144        30,265     01OCT2006   3,027     33,292
3    120161        30,785     01OCT2006   3,079     33,864
4    120264        37,510     01DEC2006   3,751     41,261
5    120761        30,960     01JUL2006   3,096     34,056
6    120995        34,850     01AUG2006   3,485     38,335
7    121055        30,185     01AUG2006   3,019     33,204
8    121062        30,305     01AUG2006   3,031     33,336
9    121085        32,235     01JAN2007   3,224     35,459
10   121107        31,380     01JUL2006   3,138     34,518
Level 2
8. Subsetting Observations Based on Three Conditions
a. Write a DATA step to read orion.orders to create Work.delays.
b. Create the new variable Order_Month, which is equal to the month of Order_Date.
c. Use a WHERE statement and a subsetting IF statement to select only the observations that
meet all of the following conditions:
• Delivery_Date values that are more than four days beyond Order_Date
• Employee_ID values that are equal to 99999999
• Order_Month values occurring in August
6 1233514453 3 99999999 70201 15AUG2004 20AUG2004 8
7 1236673732 3 99999999 9 10AUG2005 15AUG2005 8
8 1240051245 3 99999999 71 30AUG2006 05SEP2006 8
9 1243165497 3 99999999 70201 24AUG2007 29AUG2007 8
Level 3
9. Using an IF-THEN DELETE Statement to Subset Observations
a. Write a DATA step to read orion.employee_donations to create
Work.bigdonations.
b. Create the new variable Total, which is equal to the sum of Qtr1, Qtr2, Qtr3, and Qtr4.
c. Create the new variable NoDonation, which is equal to the count of missing values in Qtr1,
Qtr2, Qtr3, and Qtr4. Use the NMISS function.
Documentation on the NMISS function can be found in the SAS Help and
Documentation from the Contents tab (SAS Products > Base SAS >
SAS 9.2 Language Reference: Dictionary > Dictionary of Language Elements >
Functions and CALL Routines > NMISS Function).
d. The final data set should contain only observations meeting the following two conditions:
• Total values greater than or equal to 50
• NoDonation values equal to 0.
Use an IF-THEN DELETE statement to eliminate the observations where the conditions are not
met.
The IF-THEN DELETE statement is mentioned at the end of this section in a self-study
section.
e. Write a PROC PRINT step with a VAR statement to create the following report:
Partial PROC PRINT Output (First 7 of 50 Observations)
Obs Employee_ID Qtr1 Qtr2 Qtr3 Qtr4 Total NoDonation
1 120267 15 15 15 15 60 0
2 120269 20 20 20 20 80 0
3 120271 20 20 20 20 80 0
4 120275 15 15 15 15 60 0
5 120660 25 25 25 25 100 0
6 120669 15 15 15 15 60 0
7 120671 20 20 20 20 80 0
9.4 Chapter Review 9-53
Chapter Review
1. What is an advantage in using the SUM function
instead of an arithmetic operator?
from the input or output data set in the DATA step?
3. When would you use DO group statements in the
DATA step?
4. What is the default length of a numeric variable
created in an assignment statement?
9.5 Solutions
Solutions to Exercises
1. Creating Two New Variables
a. Retrieve the starter program.
b. Create two new variables.
data work.increase;
   set orion.staff;
   Increase=Salary*0.10;
   NewSalary=sum(Salary,Increase);
run;
proc print data=work.increase label;
run;
c. Include only four variables.
data work.increase;
   set orion.staff;
   Increase=Salary*0.10;
   NewSalary=sum(Salary,Increase);
   keep Employee_ID Salary Increase NewSalary;
run;
proc print data=work.increase label;
run;
d. Format three variables.
data work.increase;
   set orion.staff;
   Increase=Salary*0.10;
   NewSalary=sum(Salary,Increase);
   keep Employee_ID Salary Increase NewSalary;
   format Salary Increase NewSalary comma10.;
run;
   Bday2009=mdy(month(Birth_Date),day(Birth_Date),2009);
   BdayDOW2009=weekday(Bday2009);
   Age2009=(Bday2009-Birth_Date)/365.25;
   keep Customer_Name Birth_Date Bday2009 BdayDOW2009 Age2009;
run;
data work.birthday;
   set orion.customer;
   Bday2009=mdy(month(Birth_Date),day(Birth_Date),2009);
   BdayDOW2009=weekday(Bday2009);
   Age2009=(Bday2009-Birth_Date)/365.25;
   keep Customer_Name Birth_Date Bday2009 BdayDOW2009 Age2009;
   format Bday2009 date9. Age2009 3.;
run;
e. Write a PROC PRINT step.
proc print data=work.birthday;
run;
3. Using the CATX and INTCK Functions to Create Variables
a. Write a DATA step.
data work.employees;
set orion.sales;
run;
b. Create the new variable FullName.
data work.employees;
set orion.sales;
FullName=catx(' ',First_Name,Last_Name);
run;
c. Create the new variable Yrs2012.
data work.employees;
set orion.sales;
FullName=catx(' ',First_Name,Last_Name);
Yrs2012=intck('year',Hire_Date,'01JAN2012'd);
run;
d. Format Hire_Date.
data work.employees;
set orion.sales;
FullName=catx(' ',First_Name,Last_Name);
Yrs2012=intck('year',Hire_Date,'01JAN2012'd);
format Hire_Date ddmmyy10.;
run;
e. Add a label.
data work.employees;
   set orion.sales;
   FullName=catx(' ',First_Name,Last_Name);
   Yrs2012=intck('year',Hire_Date,'01JAN2012'd);
   format Hire_Date ddmmyy10.;
   label Yrs2012='Years of Employment in 2012';
run;
proc print data=work.employees label;
   var FullName Hire_Date Yrs2012;
run;
4. Creating Variables Conditionally
a. Retrieve the starter program.
b. Create three new variables.
data work.region;
   set orion.supplier;
   length Region $ 17;
   if Country in ('CA','US') then do;
      Discount=0.10;
      DiscountType='Required';
      Region='North America';
   end;
   else do;
      Discount=0.05;
      DiscountType='Optional';
      Region='Not North America';
   end;
run;
      DiscountType='Optional';
      Region='Not North America';
   end;
   keep Supplier_Name Country
        Discount DiscountType Region;
run;
proc print data=work.region;
run;
d. Submit the program.
5. Creating Variables Unconditionally and Conditionally
a. Write a DATA step.
data work.ordertype;
   set orion.orders;
run;
b. Create the new variable DayOfWeek.
data work.ordertype;
   set orion.orders;
   DayOfWeek=weekday(Order_Date);
run;
c. Create the new variable Type.
data work.ordertype;
set orion.orders;
length Type $ 13;
DayOfWeek=weekday(Order_Date);
if Order_Type=1 then do;
Type='Catalog Sale';
end;
else if Order_Type=2 then do;
Type='Internet Sale';
end;
else if Order_Type=3 then do;
Type='Retail Sale';
end;
run;
   end;
   else if Order_Type=2 then do;
      Type='Internet Sale';
      SaleAds='Email';
   end;
   else if Order_Type=3 then do;
      Type='Retail Sale';
   end;
run;
e. Do not include three variables.
data work.ordertype;
   set orion.orders;
   length Type $ 13 SaleAds $ 5;
   DayOfWeek=weekday(Order_Date);
   if Order_Type=1 then do;
      Type='Catalog Sale';
      SaleAds='Mail';
   end;
   else if Order_Type=2 then do;
      Type='Internet Sale';
      SaleAds='Email';
   end;
   else if Order_Type=3 then do;
      Type='Retail Sale';
   end;
   drop Order_Type Employee_ID Customer_ID;
run;
f. Write a PROC PRINT step.
proc print data=work.ordertype;
run;
6. Using WHEN Statements in a SELECT Group to Create Variables Conditionally
a. Write a DATA step.
data work.gifts;
set orion.nonsales;
run;
         Gift2='Lawn Equipment';
      end;
      otherwise do;
         Gift1='Coffee';
         Gift2='Calendar';
      end;
   end;
run;
c. Include only five variables.
data work.gifts;
   set orion.nonsales;
   length Gift1 Gift2 $ 15;
   select(Gender);
      when('F') do;
         Gift1='Perfume';
         Gift2='Cookware';
      end;
      when('M') do;
         Gift1='Cologne';
         Gift2='Lawn Equipment';
      end;
      otherwise do;
         Gift1='Coffee';
         Gift2='Calendar';
      end;
   end;
keep Employee_ID First Last Gift1 Gift2;
run;
d. Write a PROC PRINT step.
proc print data=gifts;
run;
   format Salary Increase NewSalary comma10.;
run;
proc print data=work.increase label;
run;
c. Write another statement to select only the observations based on Increase.
data work.increase;
   set orion.staff;
   where Emp_Hire_Date>='01JUL2006'd;
   Increase=Salary*0.10;
   if Increase>3000;
   NewSalary=sum(Salary,Increase);
   keep Employee_ID Emp_Hire_Date Salary Increase NewSalary;
   format Salary Increase NewSalary comma10.;
run;
proc print data=work.increase label;
run;
d. Submit the program.
8. Subsetting Observations Based on Three Conditions
a. Write a DATA step.
data work.delays;
set orion.orders;
run;
b. Create a new variable.
data work.delays;
set orion.orders;
Order_Month=month(Order_Date);
run;
run;
9. Using an IF-THEN DELETE Statement to Subset Observations
a. Write a DATA step.
data work.bigdonations;
   set orion.employee_donations;
run;
b. Create the new variable Total.
data work.bigdonations;
   set orion.employee_donations;
   Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
run;
c. Create the new variable NoDonation.
data work.bigdonations;
   set orion.employee_donations;
   Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
   NoDonation=nmiss(Qtr1,Qtr2,Qtr3,Qtr4);
run;
d. Use an IF-THEN DELETE statement.
data work.bigdonations;
set orion.employee_donations;
Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
NoDonation=nmiss(Qtr1,Qtr2,Qtr3,Qtr4);
if Total < 50 or NoDonation > 0 then delete;
run;
e. Write a PROC PRINT step.
proc print data=work.bigdonations;
var Employee_ID Qtr1 Qtr2 Qtr3 Qtr4 Total NoDonation;
run;
d. 9
Division and multiplication are evaluated first, from left to right,
followed by addition and subtraction.
Parentheses can be used to control the order of
operations.
num = (4 + 10) / 2;
9.02 Quiz – Correct Answer
What is the result of the assignment statement given
the values of var1 and var2?
num = var1 + var2 / 2;
var1  var2
.     10
a. 0
b. 5
c. . (missing)
d. 10
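The assignment statement in this quiz yields a missing value because an arithmetic operator returns missing when any operand is missing, whereas the SUM function ignores missing arguments. A minimal sketch, using a hypothetical one-observation data set (work.missing_demo and the names num_plus and num_sum are assumptions for illustration):

```sas
/* Hypothetical data set, for illustration only */
data work.missing_demo;
   var1=.;
   var2=10;
   num_plus=var1 + var2 / 2;      /* arithmetic operator: missing propagates */
   num_sum=sum(var1, var2 / 2);   /* SUM function: missing argument is ignored, result is 5 */
run;

proc print data=work.missing_demo;
run;
```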
9.04 Poll – Correct Answer
Are the correct results produced when the DROP
statement is placed after the SET statement?
Yes
No
Yes, the DROP statement specifies the names
of the variables to omit from the output data set.
data work.bonus;
   set orion.nonsales;
   if upcase(Country)='US'
      then Bonus=500;
   else if upcase(Country)='AU'
      then Bonus=300;
run;
p109a02s
9.06 Quiz – Correct Answer
How would you prevent Freq from being truncated?
Possible solutions:
Pad the first occurrence of the Freq value with
blanks to be the length of the longest possible
value.
Switch conditional statements to place the
longest value of Freq in the first conditional
statement.
Add a LENGTH statement to declare the byte size
of the variable up front.
data work.december;
   set orion.sales;
   BonusMonth=month(Hire_Date);
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   where Country='AU' and BonusMonth=12;
run;
The WHERE statement can subset only on variables that
come from an existing data set.
ERROR: Variable BonusMonth is not on file ORION.SALES.
p109d05
9.08 Quiz – Correct Answer
Could you write only an IF statement?
Yes
No
Yes, but the program using both the
WHERE and IF statements is more efficient.
Both methods create a data set with three
observations. The program using both statements
reads 63 observations into the PDV. The program
using only the IF statement reads 165 observations
into the PDV.
p109d05
from the input or output data set in the DATA step?
3. When would you use DO group statements in the
DATA step?
Use DO group statements when you want to
execute multiple statements as a result of a true
IF expression in an IF-THEN statement.
Chapter Review Answers
4. What is the default length of a numeric variable
created in an assignment statement?
8 bytes
Chapter 10 Combining SAS Data Sets
10.1 Introduction to Combining Data Sets
10.2 Appending a Data Set (Self-Study)
Exercises
10.3 Concatenating Data Sets
Exercises
10.4 Merging Data Sets One-to-One
10.5 Merging Data Sets One-to-Many
Exercises
10.6 Merging Data Sets with Nonmatches
Exercises
10.7 Chapter Review
10.8 Solutions
Solutions to Exercises
10.1 Introduction to Combining Data Sets
Objectives
Define the methods for combining SAS data sets.
Appending and Concatenating
Appending and concatenating involve combining SAS
data sets, one after the other, into a single SAS data set.
Appending adds the observations
in the second data set directly to
the end of the original data set.
Concatenating copies all
observations from the first data
set and then copies all
observations from one or more
successive data sets into a new
data set.
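The difference between the two methods can be sketched with the Emps and Emps2008 data sets used in this chapter. The output name EmpsAll is a hypothetical name chosen for illustration. The two steps are alternatives, not a sequence meant to be run as one program.

```sas
/* Concatenating: a new data set is built by reading both inputs in order */
data EmpsAll;
   set Emps Emps2008;
run;

/* Appending: observations from Emps2008 are added to the end of the
   existing Emps data set; no new data set is created */
proc append base=Emps
            data=Emps2008;
run;
```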
Merging
Merging involves combining observations from two or
more SAS data sets into a single observation in a new
SAS data set.
Observations can be merged based on their positions in
the original data sets or merged by one or more common
variables.
Example: Appending a Data Set
One data set is appended to a master data set.
Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Emps2008
First   Gender  HireYear
Brett   M       2008
Renee   F       2008
Emps (after appending)
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Brett   M       2008
Renee   F       2008
EmpsDK
First   Gender  Country
Lars    M       Denmark
Kari    F       Denmark
Jonas   M       Denmark
EmpsFR
First   Gender  Country
Pierre  M       France
Sophie  F       France
EmpsAll1
First   Gender  Country
Lars    M       Denmark
Kari    F       Denmark
Jonas   M       Denmark
Pierre  M       France
Sophie  F       France
Example: Merging Data Sets
Two data sets are merged to create a new data set.
EmpsAU
First  Gender  EmpID
Togar  M       121150
Kylie  F       121151
Birin  M       121152
PhoneH
EmpID   Phone
121150  +61(2)5555-1793
121151  +61(2)5555-1849
121152  +61(2)5555-1665
EmpsAUH
First  Gender  EmpID   Phone
Togar  M       121150  +61(2)5555-1793
Kylie  F       121151  +61(2)5555-1849
Birin  M       121152  +61(2)5555-1665
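One way the merge in this example could be written is a match-merge on the common variable EmpID. This is a sketch rather than the course program; it assumes EmpsAU and PhoneH exist as shown above and sorts them first, because a MERGE statement with a BY statement requires sorted input.

```sas
/* Both inputs must be sorted by the BY variable before a match-merge */
proc sort data=EmpsAU;
   by EmpID;
run;

proc sort data=PhoneH;
   by EmpID;
run;

/* Observations with the same EmpID are combined into one observation */
data EmpsAUH;
   merge EmpsAU PhoneH;
   by EmpID;
run;
```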
10.01 Quiz
Which method (appending, concatenating, or merging)
should be used for the given business scenario?
The Sales data set needs to be combined
with the Target data set by month to
compare the sales data to the target data.
The OctSales data set needs to be
added to the YTD data set.
10.2 Appending a Data Set (Self-Study)
Objectives
Append one SAS data set to another SAS data set
by using the APPEND procedure.
Append a SAS data set containing additional variables
to another SAS data set by using the FORCE option
with the APPEND procedure.
The APPEND Procedure
PROC APPEND BASE = SAS-data-set
            DATA = SAS-data-set;
RUN;
BASE= names the data set to which observations
are added.
DATA= names the data set containing observations
that are added to the base data set.
Requirements:
Only two data sets can be used at a time in one step.
The observations in the base data set are not read.
The variable information in the descriptor portion of
the base data set cannot change.
Business Scenario
Emps is a master data set
that contains employees
hired in 2006 and 2007.
Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
The employees hired in
2008, 2009, and 2010
need to be appended.
Emps2008
First   Gender  HireYear
Brett   M       2008
Renee   F       2008
Emps2009
First   HireYear
Sara    2009
Dennis  2009
Emps2010
First   HireYear  Country
Rose    2010      Spain
Eric    2010      Spain
10.02 Quiz
How many observations will be in Emps after appending
the three data sets?
Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Emps2008
First   Gender  HireYear
Brett   M       2008
Renee   F       2008
Emps2009
First   HireYear
Sara    2009
Dennis  2009
Emps2010
First   HireYear  Country
Rose    2010      Spain
Eric    2010      Spain
10.03 Quiz
How many variables will be in Emps after appending
the three data sets?
Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Emps2008
First   Gender  HireYear
Brett   M       2008
Renee   F       2008
Emps2009
First   HireYear
Sara    2009
Dennis  2009
Emps2010
First   HireYear  Country
Rose    2010      Spain
Eric    2010      Spain
proc append base=Emps
            data=Emps2008;
run;
p110d01
Like-Structured Data Sets
84   proc append base=Emps
85               data=Emps2008;
86   run;
NOTE: Appending WORK.EMPS2008 to WORK.EMPS.
NOTE: There were 2 observations read from the data set
      WORK.EMPS2008.
NOTE: The data set WORK.EMPS has 5 observations and 3 variables.
Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Brett   M       2008
Renee   F       2008
proc append base=Emps
            data=Emps2009;
run;
p110d01
Unlike-Structured Data Sets
90   proc append base=Emps
91               data=Emps2009;
92   run;
NOTE: Appending WORK.EMPS2009 to WORK.EMPS.
WARNING: Variable Gender was not found on DATA file.
NOTE: There were 2 observations read from the data set
      WORK.EMPS2009.
NOTE: 2 observations added.
NOTE: The data set WORK.EMPS has 7 observations and 3 variables.
Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Brett   M       2008
Renee   F       2008
Sara            2009
Dennis          2009
The DATA= data set has a variable that is not in the
BASE= data set.
proc append base=Emps
            data=Emps2010;
run;
p110d01
Unlike-Structured Data Sets
96   proc append base=Emps
97               data=Emps2010;
98   run;
NOTE: Appending WORK.EMPS2010 to WORK.EMPS.
WARNING: Variable Country was not found on BASE file. The
         variable will not be added to the BASE file.
WARNING: Variable Gender was not found on DATA file.
ERROR: No appending done because of anomalies listed above.
       Use FORCE option to append these files.
NOTE: 0 observations added.
NOTE: The data set WORK.EMPS has 7 observations and 3 variables.
NOTE: Statements not processed because of errors noted above.
NOTE: The SAS System stopped processing this step because of
      errors.
10-14 Chapter 10 Combining SAS Data Sets
l
PROC APPEND BASE = SAS-data-set
ia
DATA = SAS-data-set FORCE;
RUN;
t e r
The FORCE option causes the extra variables to be
dropped and issues a warning message.
proc append base=Emps
M a
data=Emps2010 force;
32
run;
r y i o n p110d01
t a u t
The FORCE option is needed when the DATA= data set contains variables that either
• are not in the BASE= data set
• do not have the same type as the variables in the BASE= data set
• are longer than the variables in the BASE= data set.
If the length of a variable is longer in the DATA= data set than in the BASE= data set, SAS truncates
values from the DATA= data set to fit them into the length that is specified in the BASE= data set.
If the type of a variable in the DATA= data set is different than in the BASE= data set, SAS replaces
all values for the variable in the DATA= data set with missing values and keeps the variable type of
the variable specified in the BASE= data set.
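The following is a minimal sketch of the FORCE behavior described above. The Emps rows are copied from the course tables; the Country values in Emps2010 and the variable lengths are assumptions made only so that the step can run on its own.

```sas
/* Base data set: rows copied from the Emps table in these notes.   */
/* A single period in list input is read as a missing value.        */
data Emps;
   length First $ 8 Gender $ 1;
   input First $ Gender $ HireYear;
   datalines;
Stacey F 2006
Gloria F 2007
James M 2007
Brett M 2008
Renee F 2008
Sara . 2009
Dennis . 2009
;
run;

/* DATA= data set: has Country (not in Emps) and no Gender.         */
/* The Country values are hypothetical placeholders.                */
data Emps2010;
   length First $ 8 Country $ 12;
   input First $ HireYear Country $;
   datalines;
Rose 2010 Spain
Eric 2010 Spain
;
run;

/* Without FORCE this step stops with an error; with FORCE the      */
/* extra variable Country is dropped and a warning is issued.       */
proc append base=Emps
            data=Emps2010 force;
run;

proc print data=Emps;
run;
```

After the step, Emps should hold nine observations and still only three variables, with Gender missing for the appended rows.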
10.2 Appending a Data Set (Self-Study) 10-15
NOTE: FORCE is specified, so dropping/truncating will occur.
NOTE: There were 2 observations read from the data set
      WORK.EMPS2010.
NOTE: 2 observations added.
NOTE: The data set WORK.EMPS has 9 observations and 3 variables.
Unlike-Structured Data Sets
Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Brett   M       2008
Renee   F       2008
Sara            2009
Dennis          2009
Rose            2010
Eric            2010
10-16 Chapter 10 Combining SAS Data Sets
If the DATA= data set contains variables that are not in the
BASE= data set, use the FORCE option in the PROC APPEND
statement to force the concatenation of the two data sets.
The statement drops the extra variables and issues a
warning message.
10.04 Quiz
How many observations will be in Emps if the program
is submitted a second time?
Submitting this program once appends six observations
to the Emps data set, which results in a total of nine
observations.
run;
proc append base=Emps
            data=Emps2008;
run;
proc append base=Emps
            data=Emps2010 force;
run;
7 obs + 2 obs = 9 obs
10.2 Appending a Data Set (Self-Study) 10-17
Exercises
Level 1
b. Submit the two PROC CONTENTS steps to compare the variables in the two data sets.
How many variables are in orion.price_current? ______________________________
How many variables are in orion.price_new? ___________________________________
Does orion.price_new contain any variables that are not in
orion.price_current?_____________________________________________________
c. Add a PROC APPEND step after the PROC CONTENTS steps to append orion.price_new
to orion.price_current. The FORCE option is not needed.
Why is the FORCE option not needed? _____________________________________________
d. Submit the program and confirm that 88 observations from orion.price_new were added to
orion.price_current, which should now have 259 observations (171 original observations
plus 88 appended observations).
Level 2
2. Appending Unlike-Structured Data Sets
a. Write and submit two PROC CONTENTS steps to compare the variables in
orion.qtr1_2007 and orion.qtr2_2007.
c. Submit the PROC APPEND step and confirm that 22 observations were copied to Work.ytd.
d. Write another PROC APPEND step to append orion.qtr2_2007 to Work.ytd. The FORCE
option is needed.
Why is the FORCE option needed? _________________________________________________
10-18 Chapter 10 Combining SAS Data Sets
e. Submit the second PROC APPEND step and confirm that 36 observations from
orion.qtr2_2007 were added to Work.ytd, which should now have 58 observations.
Level 3
orion.shoes_eclipse and orion.shoes_tracker to orion.shoes.
Documentation on the DATASETS procedure can be found in the SAS Help
and Documentation from the Contents tab (SAS Products > Base SAS >
Base SAS 9.2 Procedures Guide > Procedures > The DATASETS Procedure).
c. Submit the PROC DATASETS step and confirm that orion.shoes contains 34 observations
(10 original observations plus 14 observations from orion.shoes_eclipse and 10
observations from orion.shoes_tracker).
10.3 Concatenating Data Sets 10-19
Objectives
Concatenate two or more SAS data sets by using
the SET statement in a DATA step.
Change the names of variables by using the
RENAME= data set option.
Compare the APPEND procedure to the SET
statement. (Self-Study)
Interleave two or more SAS data sets by using the
SET and BY statements in a DATA step. (Self-Study)
Appending and Concatenating
Appending and concatenating involve combining SAS
data sets, one after the other, into a single SAS data set.
Appending adds the observations in the second data set
directly to the end of the original data set.
Concatenating copies all observations from the first data
set and then copies all observations from one or more
successive data sets into a new data set.
10-20 Chapter 10 Combining SAS Data Sets
DATA SAS-data-set;
   SET SAS-data-set1 SAS-data-set2 . . .;
   <additional SAS statements>
RUN;
Any number of data sets can be in the SET statement.
The observations from the first data set in the SET
statement appear first in the new data set. The
observations from the second data set follow those
from the first data set, and so on.
You must know your data. By default, a compile-time error occurs if a variable
does not have the same type in all SAS data sets in the SET statement.
Like-Structured Data Sets
Concatenate EmpsDK and EmpsFR to create a new
data set named EmpsAll1.
EmpsDK                      EmpsFR
First  Gender  Country      First   Gender  Country
Lars   M       Denmark      Pierre  M       France
Kari   F       Denmark      Sophie  F       France
Jonas  M       Denmark
The data sets contain the same variables.
data EmpsAll1;
   set EmpsDK EmpsFR;
run;
p110d02
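As a self-contained sketch of the concatenation above, the program below first builds EmpsDK and EmpsFR from in-stream data and then stacks them. The rows come from the tables shown; the default character length of 8 bytes from list input is an assumption, because the notes do not state the variable lengths.

```sas
/* Rows copied from the EmpsDK table above */
data EmpsDK;
   input First $ Gender $ Country $;
   datalines;
Lars M Denmark
Kari F Denmark
Jonas M Denmark
;
run;

/* Rows copied from the EmpsFR table above */
data EmpsFR;
   input First $ Gender $ Country $;
   datalines;
Pierre M France
Sophie F France
;
run;

/* All EmpsDK observations are written first, then all EmpsFR observations */
data EmpsAll1;
   set EmpsDK EmpsFR;
run;

proc print data=EmpsAll1;
run;
```

EmpsAll1 should contain five observations: the three from EmpsDK followed by the two from EmpsFR.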
10.3 Concatenating Data Sets 10-21
Compilation
EmpsDK                      EmpsFR
First  Gender  Country      First   Gender  Country
Lars   M       Denmark      Pierre  M       France
Kari   F       Denmark      Sophie  F       France
Jonas  M       Denmark
data EmpsAll1;
   set EmpsDK EmpsFR;
run;
PDV
First  Gender  Country
EmpsAll1
First  Gender  Country
Execution
Initialize PDV
PDV
First  Gender  Country
EmpsAll1
First  Gender  Country
10-22 Chapter 10 Combining SAS Data Sets
Execution
EmpsDK                      EmpsFR
First  Gender  Country      First   Gender  Country
Lars   M       Denmark      Pierre  M       France
Kari   F       Denmark      Sophie  F       France
Jonas  M       Denmark
data EmpsAll1;
   set EmpsDK EmpsFR;
run;
PDV
First  Gender  Country
Lars   M       Denmark
EmpsAll1
First  Gender  Country
Execution
Implicit OUTPUT;
Implicit RETURN;
PDV
First  Gender  Country
Lars   M       Denmark
EmpsAll1
First  Gender  Country
Lars   M       Denmark
SAS reinitializes variables in the PDV at the start of every DATA step iteration. Variables created by
assignment statements are reset to missing, but variables that are read with a SET statement are not reset
to missing until the input SAS data set changes.
10.3 Concatenating Data Sets 10-23
Execution
EmpsDK                      EmpsFR
First  Gender  Country      First   Gender  Country
Lars   M       Denmark      Pierre  M       France
Kari   F       Denmark      Sophie  F       France
Jonas  M       Denmark
data EmpsAll1;
   set EmpsDK EmpsFR;
run;
PDV
First  Gender  Country
Kari   F       Denmark
EmpsAll1
First  Gender  Country
Lars   M       Denmark
Execution
Implicit OUTPUT;
Implicit RETURN;
PDV
First  Gender  Country
Kari   F       Denmark
EmpsAll1
First  Gender  Country
Lars   M       Denmark
Kari   F       Denmark
10-24 Chapter 10 Combining SAS Data Sets
Execution
EmpsDK                      EmpsFR
First  Gender  Country      First   Gender  Country
Lars   M       Denmark      Pierre  M       France
Kari   F       Denmark      Sophie  F       France
Jonas  M       Denmark
data EmpsAll1;
   set EmpsDK EmpsFR;
run;
PDV
First  Gender  Country
Jonas  M       Denmark
EmpsAll1
First  Gender  Country
Lars   M       Denmark
Kari   F       Denmark
Execution
Implicit OUTPUT;
Implicit RETURN;
PDV
First  Gender  Country
Jonas  M       Denmark
EmpsAll1
First  Gender  Country
Lars   M       Denmark
Kari   F       Denmark
Jonas  M       Denmark
10.3 Concatenating Data Sets 10-25
Execution
EmpsDK                      EmpsFR
First  Gender  Country      First   Gender  Country
Lars   M       Denmark      Pierre  M       France
Kari   F       Denmark      Sophie  F       France
Jonas  M       Denmark
EOF
data EmpsAll1;
   set EmpsDK EmpsFR;
run;
PDV
First  Gender  Country
Jonas  M       Denmark
EmpsAll1
First  Gender  Country
Lars   M       Denmark
Kari   F       Denmark
Jonas  M       Denmark
Execution
Reinitialize PDV
PDV
First  Gender  Country
EmpsAll1
First  Gender  Country
Lars   M       Denmark
Kari   F       Denmark
Jonas  M       Denmark
10-26 Chapter 10 Combining SAS Data Sets
Execution
EmpsDK                      EmpsFR
First  Gender  Country      First   Gender  Country
Lars   M       Denmark      Pierre  M       France
Kari   F       Denmark      Sophie  F       France
Jonas  M       Denmark
data EmpsAll1;
   set EmpsDK EmpsFR;
run;
PDV
First   Gender  Country
Pierre  M       France
EmpsAll1
First  Gender  Country
Lars   M       Denmark
Kari   F       Denmark
Jonas  M       Denmark
Execution
PDV
First   Gender  Country
Pierre  M       France
EmpsAll1
First   Gender  Country
Lars    M       Denmark
Kari    F       Denmark
Jonas   M       Denmark
Pierre  M       France
10.3 Concatenating Data Sets 10-27
Execution
EmpsDK                      EmpsFR
First  Gender  Country      First   Gender  Country
Lars   M       Denmark      Pierre  M       France
Kari   F       Denmark      Sophie  F       France
Jonas  M       Denmark
data EmpsAll1;
   set EmpsDK EmpsFR;
run;
PDV
First   Gender  Country
Sophie  F       France
EmpsAll1
First   Gender  Country
Lars    M       Denmark
Kari    F       Denmark
Jonas   M       Denmark
Pierre  M       France
Execution
PDV
First   Gender  Country
Sophie  F       France
EmpsAll1
First   Gender  Country
Lars    M       Denmark
Kari    F       Denmark
Jonas   M       Denmark
Pierre  M       France
Sophie  F       France
10-28 Chapter 10 Combining SAS Data Sets
Execution
EmpsDK                      EmpsFR
First  Gender  Country      First   Gender  Country
Lars   M       Denmark      Pierre  M       France
Kari   F       Denmark      Sophie  F       France
Jonas  M       Denmark      EOF
data EmpsAll1;
   set EmpsDK EmpsFR;
run;
PDV
First   Gender  Country
Sophie  F       France
EmpsAll1
First   Gender  Country
Lars    M       Denmark
Kari    F       Denmark
Jonas   M       Denmark
Pierre  M       France
Sophie  F       France
Unlike-Structured Data Sets
Concatenate EmpsCN and EmpsJP to create a new
data set named EmpsAll2.
EmpsCN                      EmpsJP
First  Gender  Country      First  Gender  Region
Chang  M       China        Cho    F       Japan
Li     M       China        Tomi   M       Japan
Ming   F       China
data EmpsAll2;
   set EmpsCN EmpsJP;
run;
p110d03
10.3 Concatenating Data Sets 10-29
10.05 Quiz
How many variables will be in EmpsAll2
after concatenating EmpsCN and EmpsJP?
EmpsCN                      EmpsJP
First  Gender  Country      First  Gender  Region
Chang  M       China        Cho    F       Japan
Li     M       China        Tomi   M       Japan
Ming   F       China
data EmpsAll2;
   set EmpsCN EmpsJP;
run;
Compilation
EmpsCN                      EmpsJP
First  Gender  Country      First  Gender  Region
Chang  M       China        Cho    F       Japan
Li     M       China        Tomi   M       Japan
Ming   F       China
data EmpsAll2;
   set EmpsCN EmpsJP;
run;
PDV
First  Gender  Country
10-30 Chapter 10 Combining SAS Data Sets
Compilation
EmpsCN                      EmpsJP
First  Gender  Country      First  Gender  Region
Chang  M       China        Cho    F       Japan
Li     M       China        Tomi   M       Japan
Ming   F       China
data EmpsAll2;
   set EmpsCN EmpsJP;
run;
PDV
First  Gender  Country  Region
Final Results
EmpsAll2
First  Gender  Country  Region
Chang  M       China
Li     M       China
Ming   F       China
Cho    F                Japan
Tomi   M                Japan
10.3 Concatenating Data Sets 10-31
SAS-data-set (RENAME = (old-name-1 = new-name-1
                        old-name-2 = new-name-2
                        ...
                        old-name-n = new-name-n))
The RENAME= option must be specified in parentheses
immediately after the appropriate SAS data set name.
If the RENAME= option is associated with an input data
set in the SET statement, the action applies to the data
set that is being read.
The RENAME= Data Set Option
SET statement examples:
set EmpsCN(rename=(Country=Region))
    EmpsJP;
set EmpsCN(rename=(First=Fname
                   Country=Region))
    EmpsJP(rename=(First=Fname));
set EmpsCN
    EmpsJP(rename=(Region=Country));
10-32 Chapter 10 Combining SAS Data Sets
10.06 Quiz
Which statement has correct syntax?
a.
set EmpsCN(rename(Country=Location))
    EmpsJP(rename(Region=Location));
b.
set EmpsCN(rename=(Country=Location))
    EmpsJP(rename=(Region=Location));
c.
set EmpsCN rename=(Country=Location)
    EmpsJP rename=(Region=Location);
Compilation
EmpsCN                      EmpsJP
First  Gender  Country      First  Gender  Region
Chang  M       China        Cho    F       Japan
Li     M       China        Tomi   M       Japan
Ming   F       China
data EmpsAll2;
   set EmpsCN EmpsJP(rename=(Region=Country));
run;
PDV
First  Gender  Country
p110d03
10.3 Concatenating Data Sets 10-33
Compilation
EmpsCN                      EmpsJP
First  Gender  Country      First  Gender  Region
Chang  M       China        Cho    F       Japan
Li     M       China        Tomi   M       Japan
Ming   F       China
data EmpsAll2;
   set EmpsCN EmpsJP(rename=(Region=Country));
run;
PDV
First  Gender  Country
Final Results
EmpsAll2
First  Gender  Country
Chang  M       China
Li     M       China
Ming   F       China
Cho    F       Japan
Tomi   M       Japan
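The RENAME= concatenation above can be tried end to end with the sketch below. The rows are copied from the EmpsCN and EmpsJP tables; the 8-byte character lengths that list input assigns by default are an assumption.

```sas
/* Rows copied from the EmpsCN table above */
data EmpsCN;
   input First $ Gender $ Country $;
   datalines;
Chang M China
Li M China
Ming F China
;
run;

/* Rows copied from the EmpsJP table above; note Region, not Country */
data EmpsJP;
   input First $ Gender $ Region $;
   datalines;
Cho F Japan
Tomi M Japan
;
run;

/* Region is renamed to Country as EmpsJP is read, so the PDV holds */
/* only First, Gender, and Country                                   */
data EmpsAll2;
   set EmpsCN EmpsJP(rename=(Region=Country));
run;

proc print data=EmpsAll2;
run;
```

Removing the RENAME= option from the SET statement should reproduce the earlier four-variable result, with Region missing for the EmpsCN rows and Country missing for the EmpsJP rows.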
10-34 Chapter 10 Combining SAS Data Sets
procedure does not process the observations from
the BASE= data set.
The two methods are significantly different when the
variables differ between data sets.
APPEND Procedure versus SET Statement (Self-Study)
Criterion                  APPEND Procedure              SET Statement
Number of data sets that   Uses two data sets.           Uses any number
you can concatenate                                      of data sets.
Handling of data           Uses all variables in the     Uses all variables
sets that contain          BASE= data set and            and assigns missing
                           appropriate; cannot include
                           variables found only in the
                           DATA= data set.
10.3 Concatenating Data Sets 10-35
Interleaving (Self-Study)
Interleaving intersperses observations from two or more
data sets, based on one or more common variables.
interleaves SAS data sets.
DATA SAS-data-set;
   SET SAS-data-set1 SAS-data-set2 . . .;
   BY <DESCENDING> by-variable(s);
   <additional SAS statements>
RUN;
The data sets must be sorted by the BY variable.
Use the SORT procedure to sort the data sets by the BY variable.
Typically, it is more efficient to sort small SAS data sets and then interleave them as opposed
to concatenating several SAS data sets and then sorting the resultant larger file.
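A minimal interleaving sketch follows, under the assumption that the input rows are those of the EmpsCN and EmpsJP tables used in this section. The two PROC SORT steps are included so that the program does not depend on the order of the in-stream data.

```sas
/* Rows copied from the EmpsCN and EmpsJP tables in this section */
data EmpsCN;
   input First $ Gender $ Country $;
   datalines;
Chang M China
Li M China
Ming F China
;
run;

data EmpsJP;
   input First $ Gender $ Region $;
   datalines;
Cho F Japan
Tomi M Japan
;
run;

/* Each input data set must be sorted by the BY variable */
proc sort data=EmpsCN;
   by First;
run;

proc sort data=EmpsJP;
   by First;
run;

/* SET plus BY interleaves the observations in order of First */
data EmpsAll2;
   set EmpsCN EmpsJP(rename=(Region=Country));
   by First;
run;

proc print data=EmpsAll2;
run;
```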
10-36 Chapter 10 Combining SAS Data Sets
Interleaving (Self-Study)
EmpsCN                      EmpsJP
First  Gender  Country      First  Gender  Region
Chang  M       China        Cho    F       Japan
Li     M       China        Tomi   M       Japan
Ming   F       China
data EmpsAll2;
   set EmpsCN EmpsJP(rename=(Region=Country));
   by First;
run;
Which value comes first?
Chang
PDV
First  Gender  Country
Chang  M       China
p110d03
Interleaving (Self-Study)
Reinitialize PDV
Which value comes first?
Cho
PDV
First  Gender  Country
10.3 Concatenating Data Sets 10-37
Interleaving (Self-Study)
EmpsCN                      EmpsJP
First  Gender  Country      First  Gender  Region
Chang  M       China        Cho    F       Japan
Li     M       China        Tomi   M       Japan
Ming   F       China
data EmpsAll2;
   set EmpsCN EmpsJP(rename=(Region=Country));
   by First;
run;
Which value comes first?
Cho
PDV
First  Gender  Country
Cho    F       Japan
Interleaving (Self-Study)
Reinitialize PDV
Which value comes first?
Li
PDV
First  Gender  Country
10-38 Chapter 10 Combining SAS Data Sets
Interleaving (Self-Study)
EmpsCN                      EmpsJP
First  Gender  Country      First  Gender  Region
Chang  M       China        Cho    F       Japan
Li     M       China        Tomi   M       Japan
Ming   F       China
data EmpsAll2;
   set EmpsCN EmpsJP(rename=(Region=Country));
   by First;
run;
Which value comes first?
Li
PDV
First  Gender  Country
Li     M       China
Interleaving (Self-Study)
Which value comes first?
Ming
PDV
First  Gender  Country
Ming   F       China
10.3 Concatenating Data Sets 10-39
Interleaving (Self-Study)
EmpsCN                      EmpsJP
First  Gender  Country      First  Gender  Region
Chang  M       China        Cho    F       Japan
Li     M       China        Tomi   M       Japan
Ming   F       China
EOF
data EmpsAll2;
   set EmpsCN EmpsJP(rename=(Region=Country));
   by First;
run;
Reinitialize PDV
Which value comes first?
Tomi
PDV
First  Gender  Country
Interleaving (Self-Study)
Which value comes first?
Tomi
PDV
First  Gender  Country
Tomi   M       Japan
10-40 Chapter 10 Combining SAS Data Sets
Interleaving (Self-Study)
EmpsAll2
First  Gender  Country
Chang  M       China      (from EmpsCN)
Cho    F       Japan      (from EmpsJP)
Li     M       China      (from EmpsCN)
Ming   F       China      (from EmpsCN)
Tomi   M       Japan      (from EmpsJP)
In the case where the data values are equal, the observation is always read first
from the first data set listed in the SET statement.
10.3 Concatenating Data Sets 10-41
Exercises
Level 1
and orion.mnth9_2007 to create a new data set called Work.thirdqtr.
How many observations in Work.thirdqtr are from orion.mnth8_2007? ____________
How many observations in Work.thirdqtr are from orion.mnth9_2007? ____________
b. Write and submit a PROC PRINT step to create the following report:
Partial PROC PRINT Output (First 10 of 32 Observations)
                   Order_                               Order_     Delivery_
Obs    Order_ID     Type    Employee_ID   Customer_ID     Date         Date
  1   1242691897      2       99999999          90     02JUL2007    04JUL2007
  2   1242736731      1         121107          10     07JUL2007    07JUL2007
  3   1242773202      3       99999999          24     11JUL2007    14JUL2007
  4   1242782701      3       99999999          27     12JUL2007    17JUL2007
  5   1242827683      1         121105          10     17JUL2007    17JUL2007
  6   1242836878      1         121027          10     18JUL2007    18JUL2007
  7   1242838815      1         120195          41     19JUL2007    19JUL2007
  8   1242848557      2       99999999        2806     19JUL2007    23JUL2007
  9   1242923327      3       99999999       70165     28JUL2007    29JUL2007
 10   1242938120      1         120124         171     30JUL2007    30JUL2007
Level 2
c. Add a DATA step after the PROC CONTENTS steps to concatenate orion.sales and
orion.nonsales to create a new data set called Work.allemployees.
Use a RENAME= data set option to change the names of the different variables in
orion.nonsales.
d. Add a PROC PRINT step to create the following report:
Partial PROC PRINT Output (First 10 of 400 Observations)
                      First_
Obs   Employee_ID     Name        Last_Name     Salary    Job_Title
  1      120102       Tom         Zhou          108255    Sales Manager
  2      120103       Wilson      Dawes          87975    Sales Manager
  3      120121       Irenie      Elvish         26600    Sales Rep. II
  4      120122       Christina   Ngan           27475    Sales Rep. II
  5      120123       Kimiko      Hotstone       26190    Sales Rep. I
  6      120124       Lucian      Daymond        26480    Sales Rep. I
  7      120125       Fong        Hofmeister     32040    Sales Rep. IV
  8      120126       Satyakam    Denny          26780    Sales Rep. II
  9      120127       Sharryn     Clarkson       28100    Sales Rep. II
 10      120128       Monica      Kletschkus     30890    Sales Rep. IV
Level 3
6. Interleaving Data Sets
Interleaving data sets is mentioned at the end of this section in a self-study section.
Further documentation can be found in the SAS Help and Documentation from the
Index tab by typing interleaving data sets.
a. Retrieve the starter program p110e06.
b. Add a PROC SORT step after the PROC SORT step in the starter program. The PROC SORT
step needs to sort orion.shoes_tracker by Product_Name to create a new data set
called Work.trackersort.
5 Eclipse Shoes Big Guy Men's Air Terra Sebec Shoes 1303
6 Eclipse Shoes Big Guy Men's International Triax Shoes 1303
7 Eclipse Shoes Big Guy Men's Multicourt Ii Shoes 1303
8 Eclipse Shoes Cnv Plus Men's Off Court Tennis 1303
9 Tracker Shoes Hardcore Junior/Women's Street Shoes Large 14682
10 Tracker Shoes Hardcore Men's Street Shoes Large 14682
The order of the observations will be different for z/OS (OS/390).
10-44 Chapter 10 Combining SAS Data Sets
Objectives
Define the different types of match-merging.
Prepare data sets for merging using the SORT
procedure.
Merge SAS data sets one-to-one based on a common
variable by using the MERGE and BY statements in
a DATA step.
Eliminate duplicate observations using the SORT
procedure. (Self-Study)
Merging
Merging involves combining observations from two
or more SAS data sets into a single observation in
a new SAS data set.
Observations can be merged based on their positions
in the original data sets or merged by one or more
common variables.
10.4 Merging Data Sets One-to-One 10-45
Match-Merging
Match-merging combines observations from two or more
SAS data sets into a single observation in a new data set
based on the values of one or more common variables.
[Diagram: three pairs of data sets, one with variables A B C and one with variables C D E, matched on the common variable C.]
Match-Merging
One-to-One
A B C    C D E
    1    1
    2    2
    3    3
A single observation in one data set is
related to one and only one observation
from another data set based on the
values of one or more selected variables.
One-to-Many or Many-to-One
A B C    C D E
    1    1
    2    1
    2    2
A single observation in one data set is
related to more than one observation
from another data set based on the
values of one or more selected variables
and vice versa.
Nonmatches
A B C    C D E
    1    2
    2    3
    4    4
At least one single observation in one
data set is unrelated to any observation
from another data set based on the
values of one or more selected variables.
10-46 Chapter 10 Combining SAS Data Sets
Match-Merging
In order to perform match-merging, the observations in
each data set must be sorted by the one or more common
variables that are being matched.
General form of the SORT procedure:
PROC SORT DATA=input-SAS-data-set
          <OUT=output-SAS-data-set>;
   BY <DESCENDING> by-variable(s);
RUN;
The SORT procedure orders SAS data set observations
by the values of one or more variables.
The SORT Procedure
PROC SORT DATA=input-SAS-data-set
          <OUT=output-SAS-data-set>;
   BY <DESCENDING> by-variable(s);
RUN;
The SORT procedure
rearranges the observations in a SAS data set
either replaces the original data set or creates
a new data set
can sort on multiple variables
can sort in ascending (default) or descending order
does not generate printed output.
10.4 Merging Data Sets One-to-One 10-47
10.08 Quiz
Which step is sorting the observations in a SAS data set
and overwriting the same SAS data set?
a.
proc sort data=work.EmpsAU
          out=work.sorted;
   by First;
run;
b.
proc sort data=work.EmpsAU
          out=orion.EmpsAU;
   by First;
run;
c.
proc sort data=work.EmpsAU;
   by First;
run;
The BY Statement
The BY statement specifies the sorting variables.
PROC SORT first arranges the data set by the values
in ascending order, by default, of the first BY variable.
PROC SORT then arranges any observations that
have the same value of the first BY variable by the
values of the second BY variable in ascending order.
This sorting continues for every specified BY variable.
The DESCENDING option reverses the sort order for the
variable that immediately follows in the statement so that
observations are sorted from the largest value to the
smallest value.
10-48 Chapter 10 Combining SAS Data Sets
The BY Statement
BY statement examples:
by Last First;
by Last descending First;
by descending Last descending First;
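To see how DESCENDING applies only to the variable that follows it, the sketch below sorts a small data set with the second BY statement example. The data set Staff, the output name StaffSorted, and all of the rows are hypothetical and exist only for this illustration.

```sas
/* Hypothetical data: two last names, two first names each */
data Staff;
   input Last $ First $;
   datalines;
Smith Anna
Jones Carl
Smith Bob
Jones Amy
;
run;

/* Ascending Last; within each Last value, descending First */
proc sort data=Staff
          out=StaffSorted;
   by Last descending First;
run;

/* Expected order: Jones Carl, Jones Amy, Smith Bob, Smith Anna */
proc print data=StaffSorted;
run;
```

Because OUT= is specified, Staff itself is left in its original order.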
Setup for the Poll
Retrieve program p110a01.
Add a BY statement to the PROC SORT step to sort
the observations first by ascending Gender and
then by descending Employee_ID within the
values of Gender.
Complete the PROC PRINT statement to reference
the sorted data set.
Submit the program and confirm the sort order in the
PROC PRINT output.
10.4 Merging Data Sets One-to-One 10-49
c. 121144
d. 121145
The MERGE and BY Statements
The MERGE statement in a DATA step joins observations
from two or more SAS data sets into single observations.
DATA SAS-data-set;
   MERGE SAS-data-set1 SAS-data-set2 . . .;
   BY <DESCENDING> by-variable(s);
   <additional SAS statements>
RUN;
a match-merge.
10-50 Chapter 10 Combining SAS Data Sets
One-to-One Merge
Merge EmpsAU and PhoneH by EmpID to create
a new data set named EmpsAUH.
EmpsAU                      PhoneH
First  Gender  EmpID        EmpID   Phone
Togar  M       121150       121150  +61(2)5555-1793
Kylie  F       121151       121151  +61(2)5555-1849
Birin  M       121152       121152  +61(2)5555-1665
data EmpsAUH;
   merge EmpsAU PhoneH;
   by EmpID;
run;
p110d05
10.4 Merging Data Sets One-to-One 10-51
Final Results
EmpsAUH
First Gender EmpID Phone
Togar M 121150 +61(2)5555-1793
Kylie F 121151 +61(2)5555-1849
Birin M 121152 +61(2)5555-1665
10.10 Quiz
Retrieve program p110a02.
Complete the program to match-merge the sorted
SAS data sets referenced in the PROC SORT steps.
Submit the program. Correct and resubmit, if
necessary.
What are the modified, completed statements?
10-52 Chapter 10 Combining SAS Data Sets
The EQUALS option maintains the relative order
of the observations within the input data set in
the output data set for observations with identical
BY values.
Eliminating Duplicates with the
SORT Procedure (Self-Study)
proc sort data=EmpsDUP
          out=EmpsDUP1 nodupkey equals;
   by EmpID;
run;
EmpsDUP                     EmpsDUP1
First  Gender  EmpID        First  Gender  EmpID
Matt   M       121160       Matt   M       121160
Julie  F       121161       Julie  F       121161
Brett  M       121162       Brett  M       121162
Julie  F       121161       Julie  F       121163
Chris  F       121161
Julie  F       121163
p110d04
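The duplicate-elimination step above can be run on its own with the sketch below, which builds EmpsDUP from the rows shown in the table. The character lengths are the list-input defaults and are an assumption.

```sas
/* Rows copied from the EmpsDUP table above */
data EmpsDUP;
   input First $ Gender $ EmpID;
   datalines;
Matt M 121160
Julie F 121161
Brett M 121162
Julie F 121161
Chris F 121161
Julie F 121163
;
run;

/* NODUPKEY keeps one observation per EmpID value; EQUALS keeps the */
/* observations with equal EmpID values in their input order, so    */
/* the first observation of each BY group is the one retained       */
proc sort data=EmpsDUP
          out=EmpsDUP1 nodupkey equals;
   by EmpID;
run;

proc print data=EmpsDUP1;
run;
```

For EmpID 121161 the retained row should be the first Julie observation, which matches the EmpsDUP1 table.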
10.5 Merging Data Sets One-to-Many 10-53
Objectives
Merge SAS data sets one-to-many based on a
common variable by using the MERGE and BY
statements in a DATA step.
One-to-Many Merge
Merge EmpsAU and PhoneHW by EmpID to create
a new data set named EmpsAUHW.
EmpsAU                      PhoneHW
First  Gender  EmpID        EmpID   Type  Phone
Togar  M       121150       121150  Home  +61(2)5555-1793
Kylie  F       121151       121150  Work  +61(2)5555-1794
Birin  M       121152       121151  Home  +61(2)5555-1849
                            121151  Work  +61(2)5555-1850
                            121152  Home  +61(2)5555-1665
                            121152  Work  +61(2)5555-1666
The data sets are sorted by EmpID.
data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;
p110d06
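A runnable sketch of this one-to-many merge is shown below. The rows are copied from the EmpsAU and PhoneHW tables; the 15-character length given to Phone is an assumption chosen to fit the values shown, because the notes do not state the variable lengths.

```sas
/* Rows copied from the EmpsAU table above (already in EmpID order) */
data EmpsAU;
   input First $ Gender $ EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

/* Rows copied from the PhoneHW table above; :$15. reads up to 15   */
/* characters so that the phone numbers are not truncated           */
data PhoneHW;
   input EmpID Type $ Phone :$15.;
   datalines;
121150 Home +61(2)5555-1793
121150 Work +61(2)5555-1794
121151 Home +61(2)5555-1849
121151 Work +61(2)5555-1850
121152 Home +61(2)5555-1665
121152 Work +61(2)5555-1666
;
run;

/* Each EmpsAU observation is combined with every PhoneHW           */
/* observation that has the same EmpID value                        */
data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

proc print data=EmpsAUHW;
run;
```

EmpsAUHW should contain six observations, two for each employee.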
10-54 Chapter 10 Combining SAS Data Sets
Execution
EmpsAU                      PhoneHW
First  Gender  EmpID        EmpID   Type  Phone
Togar  M       121150       121150  Home  +61(2)5555-1793
Kylie  F       121151       121150  Work  +61(2)5555-1794
Birin  M       121152       121151  Home  +61(2)5555-1849
                            121151  Work  +61(2)5555-1850
                            121152  Home  +61(2)5555-1665
                            121152  Work  +61(2)5555-1666
data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;
Initialize PDV
PDV
First  Gender  EmpID  Type  Phone
               .
Execution
Do the EmpIDs match?
Yes
PDV
First  Gender  EmpID  Type  Phone
               .
10.5 Merging Data Sets One-to-Many 10-55
Execution
EmpsAU                      PhoneHW
First  Gender  EmpID        EmpID   Type  Phone
Togar  M       121150       121150  Home  +61(2)5555-1793
Kylie  F       121151       121150  Work  +61(2)5555-1794
Birin  M       121152       121151  Home  +61(2)5555-1849
                            121151  Work  +61(2)5555-1850
                            121152  Home  +61(2)5555-1665
                            121152  Work  +61(2)5555-1666
data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;
Reads one observation from each matching data set
PDV
First  Gender  EmpID   Type  Phone
Togar  M       121150  Home  +61(2)5555-1793
10-56 Chapter 10 Combining SAS Data Sets
Execution
EmpsAU                      PhoneHW
First  Gender  EmpID        EmpID   Type  Phone
Togar  M       121150       121150  Home  +61(2)5555-1793
Kylie  F       121151       121150  Work  +61(2)5555-1794
Birin  M       121152       121151  Home  +61(2)5555-1849
                            121151  Work  +61(2)5555-1850
                            121152  Home  +61(2)5555-1665
                            121152  Work  +61(2)5555-1666
data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;
Implicit OUTPUT;
Implicit RETURN;
PDV
First  Gender  EmpID   Type  Phone
Togar  M       121150  Home  +61(2)5555-1793
SAS reinitializes variables in the PDV at the start of every DATA step iteration. Variables created by
an assignment statement are reset to missing, but variables that are read with a MERGE statement are
not reset to missing.
Before reading additional observations during a match-merge, SAS first determines whether there are
observations remaining for the current BY group.
• If there are observations remaining for the current BY group, they are read into the PDV, processed,
and written to the output data set.
• If there are no more observations for the current BY group, SAS reinitializes the remainder of the
PDV, identifies the next BY group, and reads the corresponding observations.
Execution
EmpsAU                      PhoneHW
First  Gender  EmpID        EmpID   Type  Phone
Togar  M       121150       121150  Home  +61(2)5555-1793
Kylie  F       121151       121150  Work  +61(2)5555-1794
Birin  M       121152       121151  Home  +61(2)5555-1849
                            121151  Work  +61(2)5555-1850
                            121152  Home  +61(2)5555-1665
                            121152  Work  +61(2)5555-1666
PDV
First  Gender  EmpID   Type  Phone
Togar  M       121150  Home  +61(2)5555-1793
10.5 Merging Data Sets One-to-Many 10-57
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;

Yes

PDV
First  Gender  EmpID   Type  Phone
Togar  M       121150  Home  +61(2)5555-1793
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

Reads the observation from the appropriate data set

PDV
First  Gender  EmpID   Type  Phone
Togar  M       121150  Work  +61(2)5555-1794
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

Implicit OUTPUT;
Implicit RETURN;

PDV
First  Gender  EmpID   Type  Phone
Togar  M       121150  Work  +61(2)5555-1794
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

Do the EmpIDs match? Yes

PDV
First  Gender  EmpID   Type  Phone
Togar  M       121150  Work  +61(2)5555-1794
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;

No

PDV
First  Gender  EmpID   Type  Phone
Togar  M       121150  Work  +61(2)5555-1794
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

Reinitialize PDV

PDV
First  Gender  EmpID   Type  Phone
               .
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

Reads one observation from each matching data set

PDV
First  Gender  EmpID   Type  Phone
Kylie  F       121151  Home  +61(2)5555-1849
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

Implicit OUTPUT;
Implicit RETURN;

PDV
First  Gender  EmpID   Type  Phone
Kylie  F       121151  Home  +61(2)5555-1849
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

Do the EmpIDs match? No

PDV
First  Gender  EmpID   Type  Phone
Kylie  F       121151  Home  +61(2)5555-1849
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

Is either EmpID the same as the EmpID currently in the PDV? Yes

PDV
First  Gender  EmpID   Type  Phone
Kylie  F       121151  Home  +61(2)5555-1849
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

Reads the observation from the appropriate data set

PDV
First  Gender  EmpID   Type  Phone
Kylie  F       121151  Work  +61(2)5555-1850
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

Implicit OUTPUT;
Implicit RETURN;

PDV
First  Gender  EmpID   Type  Phone
Kylie  F       121151  Work  +61(2)5555-1850
Execution
EmpsAU                   PhoneHW
First  Gender  EmpID     EmpID   Type  Phone
Togar  M       121150    121150  Home  +61(2)5555-1793
Kylie  F       121151    121150  Work  +61(2)5555-1794
Birin  M       121152    121151  Home  +61(2)5555-1849
                         121151  Work  +61(2)5555-1850
                         121152  Home  +61(2)5555-1665
                         121152  Work  +61(2)5555-1666

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

Continue until EOF on both data sets

PDV
First  Gender  EmpID   Type  Phone
Kylie  F       121151  Work  +61(2)5555-1850
Final Results
EmpsAUHW
First  Gender  EmpID   Type  Phone
Togar  M       121150  Home  +61(2)5555-1793
Togar  M       121150  Work  +61(2)5555-1794
Kylie  F       121151  Home  +61(2)5555-1849
Kylie  F       121151  Work  +61(2)5555-1850
Birin  M       121152  Home  +61(2)5555-1665
Birin  M       121152  Work  +61(2)5555-1666
Exercises
Level 1
b. Submit the two PROC CONTENTS steps to determine the common variable among the
   two data sets.
c. Add a DATA step after the two PROC CONTENTS steps and prior to the PROC PRINT step to
   merge orion.orders and orion.order_item by the common variable to create a new
   data set called Work.allorders.
d. Submit the program and confirm that Work.allorders was created with 732 observations
   and 12 variables.
Level 2
8. Merging orion.product_level and orion.product_list in a One-to-Many Merge
a. Write a PROC SORT step to sort orion.product_list by Product_Level to create
   a new data set called Work.product_list.
b. Write a DATA step to merge orion.product_level with the previous sorted data set
   by the appropriate common variable. Create a new data set called Work.listlevel.
c. Write a PROC PRINT step with a VAR statement to create the following report:
Partial PROC PRINT Output (First 10 of 556 Observations)
                                  Product_  Product_Level_
Obs  Product_ID    Product_Name   Level     Name
Level 3
Base SAS 9.2 Procedures Guide Procedures The SQL Procedure).
b. Write a PROC PRINT step to create the following report:
Partial PROC PRINT Output (First 10 of 556 Observations)
                                                         Product_  Product_Level_
Obs  Product_ID    Product_Name                          Level     Name
  1  210000000000  Children                              4         Product Line
  2  210100000000  Children Outdoors                     3         Product Category
  3  210100100000  Outdoor things, Kids                  2         Product Group
  4  210200000000  Children Sports                       3         Product Category
  5  210200100000  A-Team, Kids                          2         Product Group
  6  210200100009  Kids Sweat Round Neck,Large Logo      1         Product
  7  210200100017  Sweatshirt Children's O-Neck          1         Product
  8  210200200000  Bathing Suits, Kids                   2         Product Group
  9  210200200022  Sunfit Slow Swimming Trunks           1         Product
 10  210200200023  Sunfit Stockton Swimming Trunks Jr.   1         Product
Objectives
• Control the observations in the output data set by using the IN= option.
• Output observations to multiple data sets using the IN= option and the OUTPUT statement. (Self-Study)
• Compare the results of a many-to-many merge performed with the DATA step and with the SQL procedure.
  (Self-Study)
Nonmatches Merge
Merge EmpsAU and PhoneC by EmpID to create a new data set named EmpsAUC.

EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;
p110d07
10.6 Merging Data Sets with Nonmatches 10-67
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Initialize PDV

PDV
First  Gender  EmpID   Phone
               .
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Yes

PDV
First  Gender  EmpID   Phone
               .
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Reads one observation from each matching data set

PDV
First  Gender  EmpID   Phone
Togar  M       121150  +61(2)5555-1795
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Implicit OUTPUT;
Implicit RETURN;

PDV
First  Gender  EmpID   Phone
Togar  M       121150  +61(2)5555-1795
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Do the EmpIDs match? No

PDV
First  Gender  EmpID   Phone
Togar  M       121150  +61(2)5555-1795
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Is either EmpID the same as the EmpID currently in the PDV? No

PDV
First  Gender  EmpID   Phone
Togar  M       121150  +61(2)5555-1795
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Reinitialize PDV

PDV
First  Gender  EmpID   Phone
               .
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Which EmpID sequentially comes first? 121151

PDV
First  Gender  EmpID   Phone
               .
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Reads the observation from the EmpID that

PDV
First  Gender  EmpID   Phone
Kylie  F       121151
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Implicit OUTPUT;
Implicit RETURN;

PDV
First  Gender  EmpID   Phone
Kylie  F       121151
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Do the EmpIDs match? Yes

PDV
First  Gender  EmpID   Phone
Kylie  F       121151
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Is either EmpID the same as the EmpID currently in the PDV? No

PDV
First  Gender  EmpID   Phone
Kylie  F       121151
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Reinitialize PDV

PDV
First  Gender  EmpID   Phone
               .
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Reads one observation from each matching data set

PDV
First  Gender  EmpID   Phone
Birin  M       121152  +61(2)5555-1667
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Implicit OUTPUT;
Implicit RETURN;

PDV
First  Gender  EmpID   Phone
Birin  M       121152  +61(2)5555-1667
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348
EOF

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Is the EmpID the same as the EmpID currently in the PDV? No

PDV
First  Gender  EmpID   Phone
Birin  M       121152  +61(2)5555-1667
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348
EOF

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Reinitialize PDV

PDV
First  Gender  EmpID   Phone
               .
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348
EOF

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Reads the observation from the appropriate data set

PDV
First  Gender  EmpID   Phone
               121153  +61(2)5555-1348
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348
EOF

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Implicit OUTPUT;
Implicit RETURN;

PDV
First  Gender  EmpID   Phone
               121153  +61(2)5555-1348
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348
EOF                      EOF

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

PDV
First  Gender  EmpID   Phone
               121153  +61(2)5555-1348
Final Results
EmpsAUC
First  Gender  EmpID   Phone
Togar  M       121150  +61(2)5555-1795
Kylie  F       121151
Birin  M       121152  +61(2)5555-1667
               121153  +61(2)5555-1348

Matches are observations that contain data from both input data sets.
Nonmatches are observations that contain data from only one input data set.
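The final results above can be reproduced with a short self-contained program. This is a sketch only: it assumes the two input data sets are rebuilt inline with DATALINES, and the PROC PRINT step is added for inspection.

```sas
/* Inline copies of the slide data sets (an assumption made so the example runs on its own). */
data EmpsAU;
   input First $ Gender $ EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

data PhoneC;
   input EmpID Phone :$15.;
   datalines;
121150 +61(2)5555-1795
121152 +61(2)5555-1667
121153 +61(2)5555-1348
;
run;

/* Match-merge without IN=: matches and nonmatches are all written. */
data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

proc print data=EmpsAUC;
run;
```

The report should list four observations: two matches (121150 and 121152) and two nonmatches (121151 with a missing Phone, and 121153 with missing First and Gender).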
10.11 Quiz
How many observations in the final data set EmpsAUC are considered nonmatches?
a. 1
b. 2
c. 3
d. 4

EmpsAUC
First  Gender  EmpID   Phone
Togar  M       121150  +61(2)5555-1795
Kylie  F       121151
Birin  M       121152  +61(2)5555-1667
               121153  +61(2)5555-1348
two possible values:
0   indicates that the data set did not contribute to the current observation.
1   indicates that the data set did contribute to the current observation.

The variable created with the IN= data set option is temporary. Therefore, the variable
is only available during the execution phase and is not written to the SAS data set.
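Because IN= variables are temporary, one way to inspect them is to copy their values into ordinary variables. The sketch below assumes inline copies of EmpsAU and PhoneC; the names InEmps and InCell are hypothetical and are not part of the course program.

```sas
/* Inline copies of the slide data sets (an assumption made so the example runs on its own). */
data EmpsAU;
   input First $ Gender $ EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

data PhoneC;
   input EmpID Phone :$15.;
   datalines;
121150 +61(2)5555-1795
121152 +61(2)5555-1667
121153 +61(2)5555-1348
;
run;

data EmpsAUC;
   merge EmpsAU(in=Emps) PhoneC(in=Cell);
   by EmpID;
   /* Emps and Cell are dropped automatically, so copy them into */
   /* permanent variables (hypothetical names) to see the values. */
   InEmps = Emps;
   InCell = Cell;
run;

proc print data=EmpsAUC;
run;
```

The output data set should contain InEmps and InCell but not Emps or Cell, which exist only while the DATA step executes.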
The IN= Data Set Option
MERGE statement examples:

merge EmpsAU(in=Emps)
      PhoneC(in=Cell);

merge EmpsAU(in=E)
      PhoneC(in=P);

merge EmpsAU(in=AU)
      PhoneC;
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
run;
p110d07

PDV
First  Gender  EmpID   D Emps  Phone            D Cell
Togar  M       121150  1       +61(2)5555-1795  1
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
run;

PDV
First  Gender  EmpID   D Emps  Phone            D Cell
Kylie  F       121151  1                        0
Execution
EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
run;

PDV
First  Gender  EmpID   D Emps  Phone            D Cell
Birin  M       121152  1       +61(2)5555-1667  1
10.12 Quiz
What are the values of Emps and Cell?

EmpsAU                   PhoneC
First  Gender  EmpID     EmpID   Phone
Togar  M       121150    121150  +61(2)5555-1795
Kylie  F       121151    121152  +61(2)5555-1667
Birin  M       121152    121153  +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
run;

PDV
First  Gender  EmpID   D Emps  Phone            D Cell
               121153          +61(2)5555-1348
PDV Results
PDV
First  Gender  EmpID   D Emps  Phone            D Cell
Togar  M       121150  1       +61(2)5555-1795  1
Kylie  F       121151  1                        0
Birin  M       121152  1       +61(2)5555-1667  1
               121153  0       +61(2)5555-1348  1

The variables created with the IN= data set option are only available during execution and are not
written to the SAS data set.
10.13 Quiz
Which subsetting IF statement can be added to the DATA step to only output the matches?
a. if Emps=1 and Cell=0;
b. if Emps=1 and Cell=1;
c. if Emps=1;
d. if Cell=0;

PDV
First  Gender  EmpID   D Emps  Phone            D Cell
Togar  M       121150  1       +61(2)5555-1795  1
Kylie  F       121151  1                        0
Birin  M       121152  1       +61(2)5555-1667  1
               121153  0       +61(2)5555-1348  1
Matches Only
data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
   if Emps=1 and Cell=1;
run;
p110d07

EmpsAUC
First  Gender  EmpID   Phone
Togar  M       121150  +61(2)5555-1795
Birin  M       121152  +61(2)5555-1667
The subsetting IF controls which observations are further processed by the DATA step. In this example,
the only processing that remains is the implied output at the bottom of the DATA step. Therefore, if the
condition evaluates to true, the observation is written to the SAS data set. If the condition evaluates
to false, the observation is not written to the SAS data set.
This subsetting IF statement can be rewritten as follows:
if Emps and Cell;
EmpsAUC
First  Gender  EmpID   Phone
Kylie  F       121151
p110d07
This subsetting IF statement can be rewritten as follows:
if Emps and not Cell;
Nonmatches from PhoneC Only
data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
   if Emps=0 and Cell=1;
run;
p110d07

EmpsAUC
First  Gender  EmpID   Phone
               121153  +61(2)5555-1348
All Nonmatches
data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
   if Emps=0 or Cell=0;
run;
p110d07

EmpsAUC
First  Gender  EmpID   Phone
Kylie  F       121151
               121153  +61(2)5555-1348

The subsetting IF statement can be rewritten as follows:
if not Emps or not Cell;
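Since IN= variables hold only 0 or 1, the all-nonmatches condition can also be written as the negation of the matches condition. The sketch below assumes inline copies of EmpsAU and PhoneC; the form `if not (Emps and Cell);` is an equivalent offered here for illustration and does not appear in the course program.

```sas
/* Inline copies of the slide data sets (an assumption made so the example runs on its own). */
data EmpsAU;
   input First $ Gender $ EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

data PhoneC;
   input EmpID Phone :$15.;
   datalines;
121150 +61(2)5555-1795
121152 +61(2)5555-1667
121153 +61(2)5555-1348
;
run;

data EmpsAUC;
   merge EmpsAU(in=Emps) PhoneC(in=Cell);
   by EmpID;
   /* Same logic as: if Emps=0 or Cell=0;  (keep everything that is not a match) */
   if not (Emps and Cell);
run;

proc print data=EmpsAUC;
run;
```

The result should contain the same two observations as the All Nonmatches example: EmpID 121151 and EmpID 121153.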
10.14 Quiz
Write an appropriate IF statement to create the desired data sets.

dataA          dataB
X  Y  Z        X  W
1 10 20        1 50
3 30 40        2 60

data new;
   merge dataA(in=A)
         dataB(in=B);
   by X;
run;

new
X  Y  Z  W
1 10 20 50
2       60
3 30 40

Desired SAS Data Sets
X  Y  Z  W        X  Y  Z  W
3 30 40           2       60

X  Y  Z  W        X  Y  Z  W
1 10 20 50        1 10 20 50
                  3 30 40

Answer for the data set that contains only X=3:
if A=1 and B=0;
OR
if A and not B;
   if Emps=1 and Cell=1
      then output EmpsAUC;
   else if Emps=1 and Cell=0
      then output EmpsOnly;
   else if Emps=0 and Cell=1
      then output PhoneOnly;
run;
p110d07
Outputting to Multiple Data Sets (Self-Study)
An OUTPUT statement can be used in a conditional statement to write the current observation to a specific
data set that is listed in the DATA statement.

data EmpsAUC EmpsOnly PhoneOnly;
   merge EmpsAU(in=Emps) PhoneC(in=Cell);
   by EmpID;
   if Emps=1 and Cell=1
      then output EmpsAUC;
   else if Emps=1 and Cell=0
      then output EmpsOnly;
   else if Emps=0 and Cell=1
      then output PhoneOnly;
run;
p110d07
EmpsOnly
First  Gender  EmpID   Phone
Kylie  F       121151

PhoneOnly
First  Gender  EmpID   Phone
               121153  +61(2)5555-1348
Many-to-Many Merge (Self-Study)
Merge EmpsAUUS and PhoneO by Country to create a new data set named EmpsOfc.

EmpsAUUS                    PhoneO
First   Gender  Country     Country  Phone
Togar   M       AU          AU       +61(2)5555-1500
Kylie   F       AU          AU       +61(2)5555-1600
Stacey  F       US          AU       +61(2)5555-1700
Gloria  F       US          US       +1(305)555-1500
James   M       US          US       +1(305)555-1600

The data sets are sorted by Country.

data EmpsOfc;
   merge EmpsAUUS PhoneO;
   by Country;
run;
p110d08
Kylie   F       AU       +61(2)5555-1700
Stacey  F       US       +1(305)555-1500
Gloria  F       US       +1(305)555-1600
James   M       US       +1(305)555-1600
Many-to-Many Merge (Self-Study)
The SQL procedure creates different results than the DATA step for a many-to-many merge.

EmpsAUUS                    PhoneO
First   Gender  Country     Country  Phone
Togar   M       AU          AU       +61(2)5555-1500
Kylie   F       AU          AU       +61(2)5555-1600
Stacey  F       US          AU       +61(2)5555-1700
Gloria  F       US          US       +1(305)555-1500
James   M       US          US       +1(305)555-1600

proc sql;
   create table EmpsOfc as
   select First, Gender, PhoneO.Country, Phone
      from EmpsAUUS, PhoneO
      where EmpsAUUS.Country=PhoneO.Country;
quit;
p110d08
The SQL procedure is the SAS implementation of Structured Query Language. PROC SQL is part of Base
SAS software, and you can use it with any SAS data set. Often, PROC SQL can be an alternative to other
SAS procedures or the DATA step.
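The difference between the two approaches can be checked by running both against the same input. This is a minimal sketch, assuming inline copies of EmpsAUUS and PhoneO; the output names EmpsOfcMerge and EmpsOfcSQL are hypothetical and are used only to keep the two results apart.

```sas
/* Inline copies of the slide data sets (an assumption made so the example runs on its own). */
data EmpsAUUS;
   input First $ Gender $ Country $;
   datalines;
Togar M AU
Kylie F AU
Stacey F US
Gloria F US
James M US
;
run;

data PhoneO;
   input Country $ Phone :$15.;
   datalines;
AU +61(2)5555-1500
AU +61(2)5555-1600
AU +61(2)5555-1700
US +1(305)555-1500
US +1(305)555-1600
;
run;

/* DATA step: observations are paired sequentially within each BY group. */
data EmpsOfcMerge;
   merge EmpsAUUS PhoneO;
   by Country;
run;

/* PROC SQL: every employee is paired with every phone number of the same country. */
proc sql;
   create table EmpsOfcSQL as
   select First, Gender, PhoneO.Country, Phone
      from EmpsAUUS, PhoneO
      where EmpsAUUS.Country=PhoneO.Country;
quit;
```

With this input, the DATA step result should have 6 observations (3 for AU, 3 for US), and the PROC SQL result should have 12 (2×3 for AU plus 3×2 for US).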
Togar   M       AU       +61(2)5555-1700
Kylie   F       AU       +61(2)5555-1500
Kylie   F       AU       +61(2)5555-1600
Kylie   F       AU       +61(2)5555-1700
Stacey  F       US       +1(305)555-1500
Stacey  F       US       +1(305)555-1600
Gloria  F       US       +1(305)555-1500
Gloria  F       US       +1(305)555-1600
James   M       US       +1(305)555-1500
James   M       US       +1(305)555-1600
Exercises
Level 1
b. Add a DATA step after the PROC SORT step to merge Work.product and
   orion.supplier by Supplier_ID to create a new data set called Work.prodsup.
c. Submit the program and confirm that Work.prodsup was created with 556 observations
   and 10 variables.
d. Modify the DATA step to output only observations that are in Work.product but
   not orion.supplier. A subsetting IF statement that references IN= variables in the MERGE
   statement needs to be added.
e. Submit the program and confirm that Work.prodsup was created with 75 observations
   and 10 variables. The supplier information will be missing in the PROC PRINT output.
Level 2
11. Merging Using the IN= and RENAME= Options
a. Write a PROC SORT step to sort orion.customer by Country to create a new data set
   called Work.customer.
b. Write a DATA step to merge the previous sorted data set with orion.lookup_country
   by Country to create a new data set called Work.allcustomer.
   In the orion.lookup_country data set, Start needs to be renamed to Country and
   Label needs to be renamed to Country_Name.
   Include only the following four variables: Customer_ID, Country, Customer_Name, and
   Country_Name.
 1    .  AD                      Andorra
 2    .  AE                      United Arab Emirates
 3    .  AF                      Afghanistan
 4    .  AG                      Antigua/Barbuda
 5    .  AI                      Anguilla
 6    .  AL                      Albania
 7    .  AM                      Armenia
 8    .  AN                      Netherlands Antilles
 9    .  AO                      Angola
10    .  AQ                      Antarctica
11    .  AR                      Argentina
12    .  AS                      American Samoa
13    .  AT                      Austria
14   29  AU  Candy Kinsey        Australia
15   41  AU  Wendell Summersby   Australia
d. Modify the DATA step to store only the observations that contain both customer information
   and country information. A subsetting IF statement that references IN= variables in the MERGE
   statement needs to be added.
e. Submit the program to create the following report:
Partial PROC PRINT Output (First 7 of 77 Observations)
                                                 Country_
Obs  Customer_ID  Country  Customer_Name         Name
  1           29  AU       Candy Kinsey          Australia
  2           41  AU       Wendell Summersby     Australia
  3           53  AU       Dericka Pockran       Australia
  4          111  AU       Karolina Dokter       Australia
  5          171  AU       Robert Bowerman       Australia
  6          183  AU       Duncan Robertshawe    Australia
  7          195  AU       Cosi Rimmington       Australia
Level 3
Create two new data sets: Work.allorders and Work.noorders.
The data set Work.allorders should include all observations from Work.orders,
regardless of matches or nonmatches from the orion.staff data set.
The data set Work.noorders should include the observations from orion.staff that do
not have a match in Work.orders.
Include only the following six variables: Employee_ID, Job_Title, Gender, Order_ID,
Order_Type, and Order_Date.
Outputting to multiple data sets is mentioned at the end of this section in a self-study
section.
c. Using the new data sets, write two PROC PRINT steps to create two reports.
d. Submit the program and confirm that Work.allorders was created with 490 observations
   and 6 variables and Work.noorders was created with 324 observations and 6 variables.
Chapter Review
1. What are the three methods for combining SAS data sets?
of a variable?
3. What is a requirement of the input SAS data sets prior to match-merging?
4. Which three statements must be used in a DATA step to perform a match-merge?
5. Which data set option can be used to prevent nonmatches from being written to the output data sets
   in a match-merge?
10.8 Solutions 10-93
10.8 Solutions
Solutions to Exercises
1. Appending Like-Structured Data Sets
a. Retrieve the starter program.
b. Submit the two PROC CONTENTS steps.
proc contents data=orion.price_current;
run;

proc contents data=orion.price_new;
run;
How many variables are in orion.price_current? 6
How many variables are in orion.price_new? 5
Does orion.price_new contain any variables that are not in orion.price_current? No
c. Add a PROC APPEND step.
proc append base=orion.price_current
            data=orion.price_new;
run;
Why is the FORCE option not needed? The variables in the DATA= data set are all in the
d. Submit the program.
2. Appending Unlike-Structured Data Sets
a. Write and submit two PROC CONTENTS steps.
proc contents data=orion.qtr1_2007;
run;
a. Write and submit three PROC CONTENTS steps.
proc contents data=orion.shoes_eclipse;
run;

proc contents data=orion.shoes_tracker;
run;

proc contents data=orion.shoes;
run;
b. Write a PROC DATASETS step.
proc datasets library=orion nolist;
   append base=shoes data=shoes_eclipse;
   append base=shoes data=shoes_tracker force;
quit;
c. Submit the PROC DATASETS step.
4. Concatenating Like-Structured Data Sets
a. Write and submit a DATA step.
data work.thirdqtr;
   set orion.mnth7_2007 orion.mnth8_2007 orion.mnth9_2007;
run;
How many observations in Work.thirdqtr are from orion.mnth7_2007? 10
orion.sales     orion.nonsales
First_Name      First
Last_Name       Last
c. Add a DATA step.
data work.allemployees;
   set orion.sales
       orion.nonsales(rename=(First=First_Name Last=Last_Name));
   keep Employee_ID First_Name Last_Name Job_Title Salary;
run;
d. Add a PROC PRINT step.
proc print data=work.allemployees;
run;
6. Interleaving Data Sets
a. Retrieve the starter program.
b. Add a PROC SORT step after the PROC SORT step in the starter program.
proc sort data=orion.shoes_eclipse
          out=work.eclipsesort;
   by Product_Name;
run;
ia l
proc contents data=orion.order_item;
run;
t e r
data work.allorders;
M a
c. Add a DATA step prior to the PROC PRINT step.
merge orion.orders
r y
orion.order_item;
i o n
t
by Order_ID;
run;
e t a u
r i r i b
proc print data=work.allorders;
S
var Order_ID Order_Item_Num Order_Type
t
run;
r p
Order_Date Quantity Total_Retail_Price;
o dis SA
d. Submit the program.
S P o r o f
8. Merging orion.product_level and orion.product_list in a One-to-Many Merge
a. Write a PROC SORT step.
proc sort data=orion.product_list
          out=work.product_list;
   by Product_Level;
run;
b. Write a DATA step.
data work.listlevel;
merge orion.product_level work.product_list;
by Product_Level;
run;
c. Write a PROC PRINT step.
proc print data=work.listlevel;
var Product_ID Product_Name Product_Level Product_Level_Name;
run;
10.8 Solutions 10-97
10. Merging Using the IN= Option
a. Retrieve the starter program.
b. Add a DATA step.
proc sort data=orion.product_list
          out=work.product;
   by Supplier_ID;
run;
data work.prodsup;
   merge work.product
         orion.supplier;
   by Supplier_ID;
run;
proc print data=work.prodsup;
   var Product_ID Product_Name Supplier_ID Supplier_Name;
run;
c. Submit the program.
d. Modify the DATA step.
data work.prodsup;
merge work.product(in=P)
orion.supplier(in=S);
by Supplier_ID;
if P=1 and S=0;
run;
e. Submit the program.
10-98 Chapter 10 Combining SAS Data Sets
         orion.lookup_country(rename=(Start=Country
                                      Label=Country_Name));
   by Country;
   keep Customer_ID Country Customer_Name Country_Name;
run;
c. Write a PROC PRINT step.
proc print data=work.allcustomer;
run;
d. Modify the DATA step.
data work.allcustomer;
   merge work.customer(in=Cust)
         orion.lookup_country(rename=(Start=Country
                                      Label=Country_Name)
                              in=Ctry);
   by Country;
   keep Customer_ID Country Customer_Name Country_Name;
   if Cust=1 and Ctry=1;
run;
e. Submit the program.
12. Merging and Outputting to Multiple Data Sets
a. Write a PROC SORT step.
proc sort data=orion.orders
out=work.orders;
by Employee_ID;
run;
b. Write a DATA step.
data work.allorders work.noorders;
merge orion.staff(in=Staff) work.orders(in=Ord);
by Employee_ID;
if Ord=1 then output work.allorders;
else if Staff=1 and Ord=0 then output work.noorders;
keep Employee_ID Job_Title Gender Order_ID Order_Type Order_Date;
run;
Business Scenario
1. MarSales data sets need to be combined to create the Qtr1Sales data set. (concatenating)
2. The Sales data set needs to be combined with the Target data set by month to compare the sales data to the target data. (merging)
3. The OctSales data set needs to be added to the YTD data set. (appending)
10.02 Quiz – Correct Answer
How many observations will be in Emps after appending the three data sets?
9 observations
Emps
First    Gender  HireYear
Stacey   F       2006
Gloria   F       2007
James    M       2007
Emps2008
First    Gender  HireYear
Brett    M       2008
Renee    F       2008
Emps2009
First    HireYear
Sara     2009
Dennis   2009
Emps2010
First    HireYear  Country
Rose     2010      Spain
Eric     2010      Spain
The base data set variable information cannot change.
10.04 Quiz – Correct Answer
How many observations will be in Emps if the program is submitted a second time?
15 observations (9 + 2 + 2 + 2)
Be careful; observations are added to the BASE= data set every time that you submit the program.
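The appending behavior described in the quiz answers above can be tried on small test data. The following is a minimal sketch, not the course program: the Work data sets are hypothetical copies of the quiz tables, and it assumes that a DATA= data set with fewer variables than BASE= appends without FORCE, while one with extra variables requires FORCE.

```sas
/* Hypothetical test data modeled on the quiz tables */
data work.emps;
   input First $ Gender $ HireYear;
   datalines;
Stacey F 2006
Gloria F 2007
James M 2007
;
run;

data work.emps2009;
   input First $ HireYear;
   datalines;
Sara 2009
Dennis 2009
;
run;

data work.emps2010;
   input First $ HireYear Country $;
   datalines;
Rose 2010 Spain
Eric 2010 Spain
;
run;

/* DATA= lacks Gender: no FORCE needed; Gender is missing for the new rows */
proc append base=work.emps data=work.emps2009;
run;

/* DATA= has Country, which is not in BASE=: FORCE is required */
/* and Country is not added, because BASE= variables cannot change */
proc append base=work.emps data=work.emps2010 force;
run;

proc print data=work.emps;
run;
```

Resubmitting only the two PROC APPEND steps adds the same observations again, which is the situation that the 10.04 quiz warns about.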
EmpsCN                      EmpsJP
First  Gender  Country      First  Gender  Region
Chang  M       China        Cho    F       Japan
Li     M       China        Tomi   M       Japan
Ming   F       China
Four variables: First, Gender, Country, and Region
10.06 Quiz – Correct Answer
Which statement has correct syntax?
a. set EmpsCN(rename(Country=Location))
       EmpsJP(rename(Region=Location));
b. set EmpsCN(rename=(Country=Location))
       EmpsJP(rename=(Region=Location));
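The RENAME= data set option from the quiz can be checked with a short concatenation. This sketch uses hypothetical Work copies of the EmpsCN and EmpsJP tables; the output data set name work.empsall is an assumption made for illustration.

```sas
/* Hypothetical copies of the quiz tables */
data work.empscn;
   input First $ Gender $ Country $;
   datalines;
Chang M China
Li M China
Ming F China
;
run;

data work.empsjp;
   input First $ Gender $ Region $;
   datalines;
Cho F Japan
Tomi M Japan
;
run;

/* RENAME=(old-name=new-name) aligns Country and Region into one variable */
data work.empsall;
   set work.empscn(rename=(Country=Location))
       work.empsjp(rename=(Region=Location));
run;

proc print data=work.empsall;
run;
```

Without the RENAME= options, the concatenated data set would have four variables, with Region missing for the EmpsCN rows and Country missing for the EmpsJP rows.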
Example:
data EmpsBonus;
   set EmpsDK EmpsFR;
   if Country='Denmark'
      then Bonus=300;
   else Bonus=500;
run;
10.08 Quiz – Correct Answer
Which step is sorting the observations in a SAS data set and overwriting the same SAS data set?
a. proc sort data=work.EmpsAU
             out=work.sorted;
      by First;
   run;
b. proc sort data=work.EmpsAU
             out=orion.EmpsAU;
      by First;
   run;
c. proc sort data=work.EmpsAU;
      by First;
   run;
c. 121144
d. 121145
proc sort data=orion.sales
          out=work.sortsales;
   by Gender descending Employee_ID;
run;
proc print data=work.sortsales;
   var Gender Employee_ID First_Name
       Last_Name Salary;
run;
p110a01s
10.10 Quiz – Correct Answer
What are the modified, completed statements?
proc sort data=orion.employee_payroll
          out=work.payroll;
   by Employee_ID;
run;
proc sort data=orion.employee_addresses
          out=work.addresses;
   by Employee_ID;
run;
data work.payadd;
   merge work.payroll work.addresses;
   by Employee_ID;
run;
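The sort-then-merge pattern in the answer above can be run without the Orion library by substituting small test tables. In this sketch the data values and the City variable are hypothetical; only the structure (sort both inputs by the BY variable, then MERGE with BY) follows the quiz answer.

```sas
/* Hypothetical, deliberately unsorted test data */
data work.payroll;
   input Employee_ID Salary;
   datalines;
120103 87975
120101 163040
120102 108255
;
run;

data work.addresses;
   input Employee_ID City $;
   datalines;
120102 Sydney
120101 Sydney
120103 Melbourne
;
run;

/* Both inputs must be sorted by the BY variable before match-merging */
proc sort data=work.payroll;
   by Employee_ID;
run;

proc sort data=work.addresses;
   by Employee_ID;
run;

data work.payadd;
   merge work.payroll work.addresses;
   by Employee_ID;
run;

proc print data=work.payadd;
run;
```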
c. 3
d. 4
EmpsAUC
First  Gender  EmpID   Phone
Togar  M       121150  +61(2)5555-1795
Kylie  F       121151
Birin  M       121152  +61(2)5555-1667
               121153  +61(2)5555-1348
10.12 Quiz – Correct Answer
What are the values of Emps and Cell?
EmpsAU                     PhoneC
First  Gender  EmpID       EmpID   Phone
Togar  M       121150      121150  +61(2)5555-1795
Kylie  F       121151      121152  +61(2)5555-1667
Birin  M       121152      121153  +61(2)5555-1348
data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
run;
PDV
First  Gender  EmpID   D Emps  Phone            D Cell
               121153  0       +61(2)5555-1348  1
First  Gender  EmpID   D Emps  Phone            D Cell
Togar  M       121150  1       +61(2)5555-1795  1
Kylie  F       121151  1                        0
Birin  M       121152  1       +61(2)5555-1667  1
               121153  0       +61(2)5555-1348  1
10.14 Quiz – Correct Answer
Write an appropriate IF statement to create the desired data sets.
dataA            dataB
X   Y   Z        X   W
1  10  20        1  50
3  30  40        2  60
data new;
   merge dataA(in=A)
         dataB(in=B);
   by X;
run;
new
X   Y   Z   W
1  10  20  50
2          60
3  30  40
Desired SAS Data Sets
X   Y   Z   W
3  30  40
if A=1 and B=0;   OR   if A and not B;
X   Y   Z   W
2          60
if A=0 and B=1;   OR   if not A and B;
X   Y   Z   W
1  10  20  50
3  30  40
if A=1;   OR   if A;
X   Y   Z   W
1  10  20  50
2          60
if B=1;   OR   if B;
X   Y   Z   W
1  10  20  50
if A=1 and B=1;   OR   if A and B;
X   Y   Z   W
2          60
3  30  40
if A=0 or B=0;   OR   if not A or not B;
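The IN= conditions in the quiz can also route observations to separate data sets in one DATA step. This is a sketch under stated assumptions: work.dataA and work.dataB are hypothetical copies of the quiz tables, and the output names work.both, work.onlyA, and work.onlyB are invented for illustration.

```sas
/* Hypothetical copies of dataA and dataB, already sorted by X */
data work.dataA;
   input X Y Z;
   datalines;
1 10 20
3 30 40
;
run;

data work.dataB;
   input X W;
   datalines;
1 50
2 60
;
run;

/* A and B are temporary 0/1 flags; they are not written to the output */
data work.both work.onlyA work.onlyB;
   merge work.dataA(in=A)
         work.dataB(in=B);
   by X;
   if A and B then output work.both;    /* matches            */
   else if A then output work.onlyA;    /* in dataA only      */
   else output work.onlyB;              /* in dataB only      */
run;
```

With this test data, X=1 goes to work.both, X=3 to work.onlyA, and X=2 to work.onlyB, which agrees with the subsetting IF results shown in the quiz.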
the RENAME= data set option
Chapter Review Answers
3. What is a requirement of the input SAS data sets prior to match-merging?
The input SAS data sets must be sorted by the BY variable.
DATA
MERGE
BY
Chapter 11 Enhancing Reports
11.3 Creating User-Defined Formats
   Exercises
11.4 Subsetting and Grouping Observations
   Exercises
11.5 Directing Output to External Files
   Demonstration: Creating HTML, PDF, and RTF Files
   Demonstration: Creating Files That Open in Excel
   Demonstration: Using Options with the EXCELXP Destination (Self-Study)
   Exercises
11.6 Chapter Review
11.7 Solutions
   Solutions to Exercises
11.1 Using Global Statements
Objectives
Identify SAS statements that are used with most reporting procedures.
Enhance reports by using SAS system options.
Enhance reports by adding titles and footnotes.
Add dates and times to titles. (Self-Study)
Creating Reports
A procedure step is a primary method for creating reports.
SAS Procedures for Creating Reports: PROC PRINT, PROC TABULATE, PROC FREQ, PROC MEANS, ...
Obs  Employee_ID  First_Name  Last_Name   Salary
  1     120102    Tom         Zhou        108255
  2     120103    Wilson      Dawes        87975
  3     120121    Irenie      Elvish       26600
  4     120122    Christina   Ngan         27475
  5     120123    Kimiko      Hotstone     26190
  6     120124    Lucian      Daymond      26480
  7     120125    Fong        Hofmeister   32040
  8     120126    Satyakam    Denny        26780
  9     120127    Sharryn     Clarkson     28100
 10     120128    Monica      Kletschkus   30890
p111d01
Example of an Enhanced Report
options nocenter;
ods html file='enhanced.html' style=sasweb;
proc print data=orion.sales label;
   var Employee_ID First_Name Last_Name Salary;
   title1 'Orion Sales Employees';
   title2 'Males Only';
   footnote 'Confidential';
   label Employee_ID='Sales ID'
         First_Name='First Name'
         Last_Name='Last Name'
         Salary='Annual Salary';
   format Salary dollar8.;
   where Gender='M';
   by Country;
run;
ods html close;
p111d01
Statements That Enhance Reports
Many statements are used with most reporting procedures to enhance the report. Some of these statements are global statements.
options nocenter;
ods html file='enhanced.html' style=sasweb;
proc print data=orion.sales label;
   var Employee_ID First_Name Last_Name Salary;
   title1 'Orion Sales Employees';
   title2 'Males Only';
   footnote 'Confidential';
   label Employee_ID='Sales ID'
         First_Name='First Name'
         Last_Name='Last Name'
         Salary='Annual Salary';
   format Salary dollar8.;
   where Gender='M';
   by Country;
run;
ods html close;
Global Statements
The following are global statements that enhance reports:
OPTIONS
TITLE
FOOTNOTE
ODS
Global statements are specified anywhere in your SAS program and they remain in effect until canceled, changed, or your SAS session ends.
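The persistence of global statements can be seen by running two procedure steps in a row. This is a minimal sketch; it assumes that the SASHELP.CLASS sample data set is available, and the title and footnote text is invented for illustration.

```sas
/* Global statements: set once, outside any step */
title 'Class Roster';
footnote 'Internal use only';

proc print data=sashelp.class(obs=3);
run;

/* The same title and footnote are still in effect for this step */
proc means data=sashelp.class;
   var Height;
run;

/* Cancel them so that they do not carry into later output */
title;
footnote;
```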
The OPTIONS Statement
The OPTIONS statement changes the value of one or more SAS system options.
OPTIONS option(s);
Some SAS system options change the appearance of a report.
The OPTIONS statement is not usually included in a PROC or DATA step.
NUMBER (default)   prints page numbers on the first line of each page of SAS output.
NONUMBER           does not print page numbers on the first line of each page of SAS output.
PAGENO=n           defines a beginning page number (n) for the next page of SAS output.
SAS System Options for Reporting
Selected SAS System Options:
CENTER (default)   centers SAS output.
NOCENTER           left-aligns SAS output.
PAGESIZE=n
PS=n               defines the number of lines (n) per page of SAS output.
LINESIZE=width
LS=width           defines the line size (width) for the SAS log and SAS output.
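The options listed above can be combined in a single OPTIONS statement. The sketch below assumes that SASHELP.CLASS is available; the specific values (30 lines, 80 columns) are arbitrary choices for illustration, and the defaults that you restore afterward depend on your environment.

```sas
/* Suppress date and page number; set page and line size */
options nodate nonumber pagesize=30 linesize=80;

/* PROC OPTIONS writes the current setting of one option to the log */
proc options option=pagesize;
run;

proc freq data=sashelp.class;
   tables Sex;
run;

/* Turn the date and page number back on for later output */
options date number;
```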
Analysis Variable : Salary
  N          Mean       Std Dev       Minimum       Maximum
-------------------------------------------------------------------
165      31160.12      20082.67      22710.00     243190.00
-------------------------------------------------------------------
80 characters wide
p111d02
SAS System Options for Reporting
options nodate pageno=1;
proc freq data=orion.sales;
   tables Country;
run;
                                                                  1
The FREQ Procedure
                                    Cumulative    Cumulative
Country    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
AU               63       38.18            63        38.18
US              102       61.82           165       100.00
80 characters wide
p111d02
DTRESET              updates date and time at the top of each page of SAS output.
NODTRESET (default)  does not update date and time at the top of each page of SAS output.
11.01 Poll
Did the date and/or time change?
Yes
No
The value of n can be from 1 to 10.
An unnumbered TITLE is equivalent to TITLE1.
Titles remain in effect until they are changed, canceled, or you end your SAS session.
The FOOTNOTE Statement
The FOOTNOTE statement specifies footnote lines for SAS output.
FOOTNOTEn 'text';
Footnotes appear at the bottom of the page.
No footnote is printed unless one is specified.
The value of n can be from 1 to 10.
An unnumbered FOOTNOTE is equivalent to FOOTNOTE1.
Footnotes remain in effect until they are changed, canceled, or you end your SAS session.
run;
p111d03
The TITLE and FOOTNOTE Statements
Orion Star Sales Employees
The MEANS Procedure
Analysis Variable : Salary
  N          Mean       Std Dev       Minimum       Maximum
---------------------------------------------------------------
165      31160.12      20082.67      22710.00     243190.00
---------------------------------------------------------------
By Human Resource Department
Confidential
Canceling All Titles and Footnotes
The null TITLE statement cancels all titles.
title;
The null FOOTNOTE statement cancels all footnotes.
footnote;
   title2 'The Next Line';           The Next Line
run;
proc print data=orion.sales;
   title 'The Top Line';             The Top Line
run;
proc print data=orion.sales;
   title3 'The Third Line';          The Top Line
run;
proc print data=orion.sales;
   title;
run;
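The way a new TITLEn statement replaces the existing line n and cancels all higher-numbered titles can be traced step by step. This sketch assumes SASHELP.CLASS is available; the comments describe the behavior as documented for the TITLE statement, not output captured from a run.

```sas
title1 'The First Line';
title2 'The Second Line';
proc print data=sashelp.class(obs=2);   /* titles: First, Second */
run;

title2 'The Next Line';                 /* replaces title2 only */
proc print data=sashelp.class(obs=2);   /* titles: First, Next */
run;

title 'The Top Line';                   /* replaces title1, cancels title2 and above */
proc print data=sashelp.class(obs=2);   /* titles: Top */
run;

title3 'The Third Line';                /* title1 kept, line 2 left blank */
proc print data=sashelp.class(obs=2);   /* titles: Top, (blank), Third */
run;

title;                                  /* cancels all titles */
proc print data=sashelp.class(obs=2);
run;
```

The same replacement rule applies to FOOTNOTEn statements.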
11.02 Quiz
Which footnote(s) appears in the second procedure output?
c. Non Confidential
Sales Employees
b.
Orion Star
Non Sales Employees d. Orion Star
Non Sales Employees
Confidential
footnote1 'Orion Star';
proc print data=orion.sales;
   footnote2 'Sales Employees';
   footnote3 'Confidential';
run;
proc print data=orion.nonsales;
   footnote2 'Non Sales Employees';
run;
Example Title Output:
Orion Star Employee Listing
Created on 11MAR2008 at 15:53
p111d04
Titles with Dates and Times (Self-Study)
The %LET statement can be used with %SYSFUNC and the TODAY function or the TIME function to create a macro variable with the current date or time.
%LET macro-variable = %SYSFUNC(today(), date-format);
%LET macro-variable = %SYSFUNC(time(), time-format);
The %LET statement creates a macro variable and assigns it a value without leading or trailing blanks.
%SYSFUNC is a macro function that executes SAS functions outside of a step.
   title2 "Created &currentdate";
   title3 "at &currenttime";
run;
Orion Star Employee Listing
Created March 11, 2008
at 4:09:43 PM
p111d04
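A complete version of the date and time technique might look like the following. This is a sketch, not the course program p111d04: the WORDDATE. and TIMEAMPM. formats are assumptions chosen to match the style of the sample titles, SASHELP.CLASS stands in for the Orion data, and the title text is invented.

```sas
/* Create macro variables when the statements execute */
%let currentdate=%sysfunc(today(), worddate.);
%let currenttime=%sysfunc(time(), timeampm.);

proc print data=sashelp.class(obs=5);
   title1 "Class Listing";
   /* Double quotation marks are required so that &name references resolve */
   title2 "Created &currentdate";
   title3 "at &currenttime";
run;

title;
```

Because the macro variables are assigned once, the titles show the time at which the %LET statements ran, not the time at which each page was produced.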
Exercises
Level 1
b. Use the OPTIONS statement to establish these system options for the PROC MEANS report:
1) Suppress the page numbers that appear at the top of each output page.
2) Suppress the date and time that appear at the top of each output page.
3) Limit the number of lines per page to 18 for the report. Reset the option value to 52 after the PROC MEANS step finishes.
c. Specify the following title for the report: Orion Star Sales Report.
d. Specify the following footnote for the report: Report by SAS Programming Student.
e. After the PROC MEANS step finishes, cancel the footnote.
f. Submit the program to create the following PROC MEANS report:
Analysis Variable : Total_Retail_Price Total Retail Price for This Product
-------------------------------------------------------------------
617     162.2001053     233.8530183       2.6000000         1937.20
-------------------------------------------------------------------
Level 2
c. Request that each report contain page numbers starting at 1.
d. Request that the current date and time be displayed at the top of each page, not the date and time that the SAS session began.
e. Specify the following title to appear in both reports: Orion Star Sales Analysis.
f. Specify a secondary title to appear in the first report with a blank line between the titles:
Catalog Sales Only
g. Specify the following footnote for the first report:
Based on the previous day's posted data
The text specified for a title or footnote can be enclosed in single quotation marks or double quotation marks. Use double quotation marks when the text contains an apostrophe.
h. Specify a secondary title to appear in the second report with a blank line between the titles:
Analysis Variable : Total_Retail_Price Total Retail Price for This Product
  N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------
170     199.5961765     282.9680817       2.6000000         1937.20
-------------------------------------------------------------------
Based on the previous day's posted data
Orion Star Sales Analysis
The MEANS Procedure
  N
123
Level 3
3. Inserting Dates and Times into Titles
a. Use the OPTIONS procedure to verify that the date and time will not be automatically
displayed at the top of each page. If the option is not set correctly, change it.
Documentation about the OPTIONS procedure can be found in the SAS Help
and Documentation from the Contents tab (SAS Products Base SAS
Base SAS 9.2 Procedures Guide Procedures The OPTIONS Procedure).
Look for an option in the PROC OPTIONS statement that can display the current
setting of a single option.
b. Retrieve the starter program p111e03.
c. Add a title with the following text, substituting the current date and time:
Sales Report as of 4:57 PM on Monday, January 28, 2008
An example of this technique is shown in the self-study material at the end of this
section.
d. Submit the program to create the following report:
PROC MEANS Output
Sales Report as of 4:57 PM on Monday, January 28, 2008
The MEANS Procedure
617     162.2001053     233.8530183       2.6000000         1937.20
-------------------------------------------------------------------
Objectives
Display descriptive column headings using the LABEL statement.
Display formatted values using the FORMAT statement.
Labels and Formats (Review)
When displaying reports,
a label changes the appearance of a variable name
a format changes the appearance of a variable value.
                                     Annual
Obs  Employee_ID  Job_Title          Salary     (Label)
  1     120102    Sales Manager    $108,255
  2     120103    Sales Manager     $87,975
  3     120121    Sales Rep. II     $26,600
  4     120122    Sales Rep. II     $27,475     (Format)
  5     120123    Sales Rep. I      $26,190
11.2 Adding Labels and Formats
A label can be up to 256 characters.
Labels are used automatically by many procedures.
The PRINT procedure uses labels when the LABEL or SPLIT= option is specified in the PROC PRINT statement.
Assigning Temporary Labels
PROC FREQ automatically uses labels.
proc freq data=orion.sales;
   tables Gender;
   label Gender='Sales Employee Gender';
run;
The FREQ Procedure
Sales Employee Gender
                                  Cumulative    Cumulative
Gender    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
F               68       41.21            68        41.21
M               97       58.79           165       100.00
p111d05
Partial PROC PRINT Output
Obs  Employee_ID  Job_Title        Salary
  1     120102    Sales Manager    108255
  2     120103    Sales Manager     87975
  3     120121    Sales Rep. II     26600
  4     120122    Sales Rep. II     27475
  5     120123    Sales Rep. I      26190
p111d05
Assigning Temporary Labels
The LABEL option tells PROC PRINT to use labels.
proc print data=orion.sales label;
   var Employee_ID Job_Title Salary;
   label Employee_ID='Sales ID'
         Job_Title='Job Title'
         Salary='Annual Salary';
run;
Partial PROC PRINT Output
                                   Annual
Obs  Sales ID  Job Title           Salary
  1   120102   Sales Manager       108255
  2   120103   Sales Manager        87975
  3   120121   Sales Rep. II        26600
  4   120122   Sales Rep. II        27475
  5   120123   Sales Rep. I         26190
p111d05
SPLIT='split-character'
Without the SPLIT= option, PROC PRINT can split the headers at special characters such as the blank or underscore or in mixed-case values when going from lowercase to uppercase.
Assigning Temporary Labels
The SPLIT= option makes PROC PRINT use labels.
proc print data=orion.sales split='*';
   var Employee_ID Job_Title Salary;
   label Employee_ID='Sales ID'
         Job_Title='Job*Title'
         Salary='Annual*Salary';
run;
Partial PROC PRINT Output
               Job      Annual
Obs  Sales ID  Title    Salary
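The SPLIT= option can be tried on any data set that has numeric columns with long headings. This sketch assumes SASHELP.CLASS is available; the label text is invented for illustration.

```sas
/* SPLIT= implies LABEL and breaks each heading at the split character */
proc print data=sashelp.class(obs=5) split='*';
   var Name Height Weight;
   label Height='Height*(inches)'
         Weight='Weight*(pounds)';
run;
```

The asterisk itself is not printed; it only marks where the column heading wraps to a new line.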
   Bonus=Salary*0.10;
   label Salary='Annual*Salary'
         Bonus='Annual*Bonus';
   keep Employee_ID First_Name
        Last_Name Salary Bonus;
run;
proc print data=orion.bonus split='*';
run;
p111d05
Assigning Permanent Labels (Review)
Partial PROC PRINT Output
                   First_                   Annual     Annual
Obs  Employee_ID   Name       Last_Name     Salary      Bonus
  1     120102     Tom        Zhou          108255    10825.5
  2     120103     Wilson     Dawes          87975     8797.5
  3     120121     Irenie     Elvish         26600     2660.0
  4     120122     Christina  Ngan           27475     2747.5
  5     120123     Kimiko     Hotstone       26190     2619.0
  6     120124     Lucian     Daymond        26480     2648.0
  7     120125     Fong       Hofmeister     32040     3204.0
  8     120126     Satyakam   Denny          26780     2678.0
  9     120127     Sharryn    Clarkson       28100     2810.0
 10     120128     Monica     Kletschkus     30890     3089.0
11.03 Quiz
Which statement is true concerning the PROC PRINT output for Bonus?
data orion.bonus;
   set orion.sales;
   Bonus=Salary*0.10;
   label Bonus='Annual Bonus';
run;
proc print data=orion.bonus label;
   label Bonus='Mid-Year Bonus';
run;
p111d05
The FORMAT Statement (Review)
The FORMAT statement assigns formats to variable values.
General form of the FORMAT statement:
FORMAT variable(s) format;
A format is an instruction that SAS uses to write data values.
11.04 Quiz
Which displayed value is incorrect for the given format?
Format      Stored Value   Displayed Value
$3.         Wednesday      Wed
6.1         1234.345       1234.3
COMMAX5.    1234.345       1.234
DOLLAR9.2   1234.345       $1,234.35
DDMMYY8.    0              01/01/1960
DATE9.      0              01JAN1960
YEAR4.      0              1960
Assigning Temporary Formats
proc print data=orion.sales label;
   var Employee_ID Job_Title Salary
       Country Birth_Date Hire_Date;
   format Salary dollar10.0
          Birth_Date Hire_Date monyy7.;
run;
Partial PROC PRINT Output
                                    Annual              Date of    Date of
Obs  Sales ID  Job Title            Salary   Country    Birth      Hire
  1   120102   Sales Manager      $108,255   AU         AUG1969    JUN1989
  2   120103   Sales Manager       $87,975   AU         JAN1949    JAN1974
  3   120121   Sales Rep. II       $26,600   AU         AUG1944    JAN1974
  4   120122   Sales Rep. II       $27,475   AU         JUL1954    JUL1978
  5   120123   Sales Rep. I        $26,190   AU         SEP1964    OCT1985
p111d06
                                     Cumulative    Cumulative
Hire_Date    Frequency     Percent     Frequency      Percent
--------------------------------------------------------------
     1974           23       13.94            23        13.94
     1975            2        1.21            25        15.15
     1976            4        2.42            29        17.58
     1977            3        1.82            32        19.39
     1978            7        4.24            39        23.64
     1979            3        1.82            42        25.45
p111d06
Assigning Permanent and Temporary Formats
Using a FORMAT statement in a DATA step permanently associates formats with variables by storing the format in the descriptor portion of the SAS data set.
data orion.bonus;
   set orion.sales;
   Bonus=Salary*0.10;
   format Salary Bonus comma8.;
   keep Employee_ID First_Name
        Last_Name Salary Bonus;
run;
proc print data=orion.bonus;
   format Bonus dollar8.;
run;
  4     120122     Christina  Ngan           27,475     $2,748
  5     120123     Kimiko     Hotstone       26,190     $2,619
  6     120124     Lucian     Daymond        26,480     $2,648
  7     120125     Fong       Hofmeister     32,040     $3,204
  8     120126     Satyakam   Denny          26,780     $2,678
  9     120127     Sharryn    Clarkson       28,100     $2,810
 10     120128     Monica     Kletschkus     30,890     $3,089
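The difference between a permanent and a temporary format can be confirmed with PROC CONTENTS. This is a sketch on hypothetical test data: work.bonus and its two rows are stand-ins for orion.bonus, with salary values taken from the sample output.

```sas
/* FORMAT in the DATA step stores COMMA8. in the descriptor portion */
data work.bonus;
   input Employee_ID Salary;
   Bonus=Salary*0.10;
   format Salary Bonus comma8.;
   datalines;
120102 108255
120103 87975
;
run;

/* The Format column shows COMMA8. for Salary and Bonus */
proc contents data=work.bonus;
run;

/* A FORMAT statement in a PROC step overrides Bonus for this step only */
proc print data=work.bonus;
   format Bonus dollar8.;
run;

/* Without an override, the permanent COMMA8. format is used again */
proc print data=work.bonus;
run;
```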
Exercises
Level 1
b. Modify the column heading for each variable as shown in the sample output that follows.
c. Display all dates in the form ddMONyyyy. If you are running SAS 9.2, specify a width of 11 for the format to obtain the hyphens as shown in the sample output that follows. Otherwise, use a width of 9; the hyphens will not appear.
d. Display each salary with dollar signs, commas, and two decimal places as shown in the sample output that follows. No salary in the data set exceeds $500,000.
e. Submit the program to produce the following report:
Partial PROC PRINT Output
      Employee      Annual                                Termination
Obs   Number        Salary    Birth Date     Hire Date    Date
  9   120109    $26,495.00    15-DEC-1986    01-OCT-2006            .
 11   120111    $26,895.00    23-JUL-1949    01-NOV-1974            .
 12   120112    $26,550.00    17-FEB-1969    01-JUL-1990            .
 14   120114    $31,285.00    08-FEB-1944    01-JAN-1974            .
 18   120118    $28,090.00    03-JUN-1959    01-JUL-1984            .
 20   120120    $27,645.00    05-MAY-1944    01-JAN-1974            .
 23   120123    $26,190.00    28-SEP-1964    01-OCT-1985    31-JAN-2005
 35   120135    $32,490.00    26-JAN-1969    01-OCT-1997    30-APR-2004
 47   120147    $26,580.00    19-JAN-1988    01-OCT-2006            .
 51   120151    $26,520.00    21-NOV-1944    01-JAN-1974            .
Level 2
d. Show the customer's ID with exactly six digits, including leading zeros if necessary.
Partial PROC PRINT Output
      Customer   First     Last          Birth
Obs   ID         Initial   Name          Year
 47   000544     A         Argac         1964
 48   000908     A         Umran         1979
 49   000928     B         Urfalioglu    1969
 50   001033     S         Okay          1979
 51   001100     A         Canko         1964
 52   001684     C         Aydemir       1974
 55   002788     S         Yucel         1944
Level 3
6. Applying Permanent Labels and Formats
a. Retrieve the starter program p111e06.
b. Add permanent variable labels and formats to the Work.otherstatus data set so that those attributes need not be repeated in subsequent steps.
1) Variable labels:
• Employee_ID Employee Number
• Employee_Hire_Date Hired
c. Override the permanent attributes within the PROC FREQ step so that the hire dates are grouped by calendar quarter in the form yyyyQq and the report explicitly states that the counts are by quarter as shown in the sample output that follows.
d. Submit the program to produce the following reports. Verify that the variable attributes appear in the PROC CONTENTS output.
Partial PROC PRINT Output
Employees who are listed with Marital Status=O
      Employee
Obs   Number      Hired
  1   120102      1989.06.01
  2   120117      1986.04.01
  3   120126      2006.08.01
  4   120145      1985.06.01
  5   120149      1993.01.01
Partial PROC CONTENTS Output
Employees who are listed with Marital Status=O
The CONTENTS Procedure
#   Variable             Type   Len   Format       Label
2   Employee_Hire_Date   Num    8     YYMMDDP10.   Hired
1   Employee_ID          Num    8     12.          Employee Number
Partial PROC FREQ Output
Employees who are listed with Marital Status=O
Quarter Hired
Objectives
Create user-defined formats using the FORMAT procedure.
Apply user-defined formats to variables in reports.
User-Defined Formats
A user-defined format needs to be created for Country.
Current Report (partial output)
                                          Annual              Date of    Date of
Obs  Sales ID  Job Title                  Salary   Country    Birth      Hire
 61   120179   Sales Rep. III            $28,510   AU         MAR1974    JAN2004
 62   120180   Sales Rep. II             $26,970   AU         JUN1954    DEC1978
 63   120198   Sales Rep. III            $28,025   AU         JAN1988    DEC2006
 64   120261   Chief Sales Officer      $243,190   US         FEB1969    AUG1987
 65   121018   Sales Rep. II             $27,560   US         JAN1944    JAN1974
 66   121019   Sales Rep. IV             $31,320   US         JUN1986    JUN2004
11.3 Creating User-Defined Formats
User-Defined Formats
To create and use your own formats, do the following:
statement in the reporting procedure.
The FORMAT Procedure
The FORMAT procedure is used to create user-defined formats.
as the first character
cannot end in a number
cannot be the name of a SAS format
does not end with a period in the VALUE statement.
Format names prior to SAS®9 are limited to 8 characters.
11.05 Multiple Answer Poll
Which user-defined format names are invalid?
a. $stfmt
b. $3levels
c. _4years
d. salranges
e. dollar
Labels
can be up to 32,767 characters in length
are typically enclosed in quotation marks, although it is not required.
Character User-Defined Format
proc format;
   value $ctryfmt 'AU' = 'Australia'
                  'US' = 'United States'
                  other = 'Miscoded';
run;
Callouts: character format name ($ctryfmt), discrete character values ('AU', 'US'), keyword (other), labels ('Australia', 'United States', 'Miscoded').
The OTHER keyword matches all values that do not match any other value or range.
p111d07
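The $CTRYFMT format, including the OTHER keyword, can be checked against a few test codes. In this sketch work.codes and the code ZZ are hypothetical; the format definition repeats the one shown above.

```sas
proc format;
   value $ctryfmt 'AU' = 'Australia'
                  'US' = 'United States'
                  other = 'Miscoded';
run;

/* Hypothetical test values; ZZ is not listed in the format */
data work.codes;
   input Country $;
   datalines;
AU
US
ZZ
;
run;

/* Note the period after the format name when it is applied */
proc print data=work.codes;
   format Country $ctryfmt.;
run;
```

The stored values remain AU, US, and ZZ; only the displayed values change, and ZZ is displayed as Miscoded.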
Part 2
   label Employee_ID='Sales ID'
         Job_Title='Job Title'
         Salary='Annual Salary'
         Birth_Date='Date of Birth'
         Hire_Date='Date of Hire';
   format Salary dollar10.0
          Birth_Date Hire_Date monyy7.
          Country $ctryfmt.;
run;
Character User-Defined Format
Partial PROC PRINT Output
                                          Annual                    Date of    Date of
Obs  Sales ID  Job Title                  Salary   Country          Birth      Hire
 60   120178   Sales Rep. II             $26,165   Australia        NOV1954    APR1974
 61   120179   Sales Rep. III            $28,510   Australia        MAR1974    JAN2004
 62   120180   Sales Rep. II             $26,970   Australia        JUN1954    DEC1978
 63   120198   Sales Rep. III            $28,025   Australia        JAN1988    DEC2006
 64   120261   Chief Sales Officer      $243,190   United States    FEB1969    AUG1987
 65   121018   Sales Rep. II             $27,560   United States    JAN1944    JAN1974
 66   121019   Sales Rep. IV             $31,320   United States    JUN1986    JUN2004
 67   121020   Sales Rep. IV             $31,750   United States    FEB1984    MAY2002
 68   121021   Sales Rep. IV             $32,985   United States    DEC1974    MAR1994
 69   121022   Sales Rep. IV             $32,210   United States    OCT1979    FEB2002
 70   121023   Sales Rep. I              $26,010   United States    MAR1964    MAY1989
 71   121024   Sales Rep. II             $26,600   United States    SEP1984    MAY2004
 72   121025   Sales Rep. II             $28,295   United States    OCT1949    SEP1975
proc format;
   value tiers 20000-49999 = 'Tier 1'
               50000-99999 = 'Tier 2'
               100000-250000 = 'Tier 3';
run;
In the VALUE statement, tiers is the numeric format name, and the quoted strings to the right of each equal sign are the labels.
p111d07
11.06 Quiz
If you have a value of 99999.87, how will it be displayed
if the TIERS format is applied to the value?
a. Tier 2
b. Tier 3
c. a missing value
d. none of the above
proc format;
   value tiers 20000-49999 = 'Tier 1'
               50000-99999 = 'Tier 2'
               100000-250000 = 'Tier 3';
run;
Range               Lower endpoint    Upper endpoint
50000 -< 100000     Includes 50000    Excludes 100000
50000 <- 100000     Excludes 50000    Includes 100000
50000 <-< 100000    Excludes 50000    Excludes 100000
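The less-than symbol in a range can be tried out on a few boundary values. This is a small sketch under stated assumptions: the format name bands and its labels are hypothetical and exist only for this illustration.

```sas
proc format;
   /* BANDS is a hypothetical format name used only for this sketch */
   value bands 50000 -< 100000 = 'Range A (50000 -< 100000)'
               100000 - 150000 = 'Range B (100000 - 150000)';
run;

data _null_;
   do Salary = 50000, 99999.87, 100000;
      /* PUT function returns the label of the range that contains Salary */
      Band = put(Salary, bands.);
      put Salary= Band=;
   end;
run;
```

Because the first range excludes its upper endpoint, 99999.87 still falls in Range A, while 100000 falls in Range B. No value between the two ranges is left without a label.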
11.07 Quiz
If you have a value of 100000, how will it be displayed
if the TIERS format is applied to the value?
a. Tier 2
b. Tier 3
c. 100000
d. a missing value
proc format;
   value tiers 20000-<50000 = 'Tier 1'
               50000- 100000 = 'Tier 2'
               100000<-250000 = 'Tier 3';
run;
proc format;
   value tiers low-<50000 = 'Tier 1'
               50000- 100000 = 'Tier 2'
               100000<-high = 'Tier 3';
run;
LOW and HIGH are keywords.
LOW encompasses the lowest possible value.
HIGH encompasses the highest possible value.
p111d07
LOW does not include missing values for numeric variables.
LOW does include missing values for character variables.
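The statement that LOW does not include numeric missing values can be checked with a short test. The sketch below assumes a hypothetical format name, tiersx, so that it does not replace the TIERS format used in the course programs.

```sas
proc format;
   /* TIERSX is a hypothetical name; the ranges mirror the TIERS format */
   value tiersx low-<50000 = 'Tier 1'
                50000-100000 = 'Tier 2'
                100000<-high = 'Tier 3';
run;

data _null_;
   /* a numeric missing value, a low salary, and a high salary */
   do Salary = ., 25000, 250000;
      Tier = put(Salary, tiersx.);
      put Salary= Tier=;
   end;
run;
```

In the log, the missing value should not be labeled Tier 1, because no range in this VALUE statement covers a numeric missing value. A separate entry such as . = 'Missing salary' would be needed to label it.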
Numeric User-Defined Format
Part 1
proc format;
   value tiers low-<50000 = 'Tier 1'
               50000- 100000 = 'Tier 2'
               100000<-high = 'Tier 3';
run;
Part 2
       Country Birth_Date Hire_Date;
   label Employee_ID='Sales ID'
         Job_Title='Job Title'
         Salary='Annual Salary'
         Birth_Date='Date of Birth'
         Hire_Date='Date of Hire';
   format Birth_Date Hire_Date monyy7.
          Salary tiers.;
run;
 65    121018    Sales Rep. II     Tier 1    US    JAN1944    JAN1974
 66    121019    Sales Rep. IV     Tier 1    US    JUN1986    JUN2004
 67    121020    Sales Rep. IV     Tier 1    US    FEB1984    MAY2002
 68    121021    Sales Rep. IV     Tier 1    US    DEC1974    MAR1994
 69    121022    Sales Rep. IV     Tier 1    US    OCT1979    FEB2002
 70    121023    Sales Rep. I      Tier 1    US    MAR1964    MAY1989
 71    121024    Sales Rep. II     Tier 1    US    SEP1984    MAY2004
 72    121025    Sales Rep. II     Tier 1    US    OCT1949    SEP1975
Other User-Defined Format Examples
proc format;
   value $grade 'A'     = 'Good'
                'B'-'D' = 'Fair'
                'F'     = 'Poor'
                'I','U' = 'See Instructor'
                other   = 'Miscoded';
run;
proc format;
   value mnthfmt 1,2,3    = 'Qtr 1'
                 4,5,6    = 'Qtr 2'
                 7,8,9    = 'Qtr 3'
                 10,11,12 = 'Qtr 4'
                 .        = 'missing'
                 other    = 'unknown';
run;
                  other = 'Miscoded';
   value tiers low-<50000 = 'Tier 1'
               50000- 100000 = 'Tier 2'
               100000<-high = 'Tier 3';
run;
p111d07
Multiple User-Defined Formats
proc print data=orion.sales label;
   . . .
          Country $ctryfmt.
          Salary tiers.;
run;
Partial PROC PRINT Output
                                        Annual                     Date of    Date of
Obs    Sales ID  Job Title              Salary    Country          Birth      Hire
 60    120178    Sales Rep. II          Tier 1    Australia        NOV1954    APR1974
 61    120179    Sales Rep. III         Tier 1    Australia        MAR1974    JAN2004
 62    120180    Sales Rep. II          Tier 1    Australia        JUN1954    DEC1978
 63    120198    Sales Rep. III         Tier 1    Australia        JAN1988    DEC2006
 64    120261    Chief Sales Officer    Tier 3    United States    FEB1969    AUG1987
 65    121018    Sales Rep. II          Tier 1    United States    JAN1944    JAN1974
 66    121019    Sales Rep. IV          Tier 1    United States    JUN1986    JUN2004
 67    121020    Sales Rep. IV          Tier 1    United States    FEB1984    MAY2002
p111d07
Australia              63       38.18            63        38.18
United States         102       61.82           165       100.00
                                         Cumulative    Cumulative
Salary          Frequency     Percent     Frequency      Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Tier 1                159       96.36           159        96.36
Tier 2                  4        2.42           163        98.79
Tier 3                  2        1.21           165       100.00
Exercises
Level 1
a. Retrieve the starter program p111e07.
b. Create a character format named $gender that displays gender codes as follows:
F    Female
M    Male
c. Create a numeric format named moname that displays month numbers as follows:
1    January
2    February
3    March
d. In the PROC FREQ step, apply these two user-defined formats to the Employee_Gender
and BirthMonth variables, respectively.
e. Submit the program to produce the following report:
PROC FREQ Output
Employees with Birthdays in Q1
Birth Cumulative Cumulative
Month Frequency Percent Frequency Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
January 44 38.94 44 38.94
February 34 30.09 78 69.03
March 35 30.97 113 100.00
Level 2
F                  Female
M                  Male
Any other value    Invalid code
c. Create a numeric format named salrange that displays salary ranges as follows:
At least 20,000 but less than 100,000    Below $100,000
At least 100,000 and up to 500,000       $100,000 or more
missing                                  Missing salary
Any other value                          Invalid salary
d. In the PROC PRINT step, apply these two user-defined formats to the Gender and Salary
variables, respectively.
e. Submit the program to produce the following report:
Partial PROC PRINT Output
Obs
1 120101 Director $100,000 or more Male
2 120104 Administration Manager Below $100,000 Female
3 120105 Secretary I Below $100,000 Female
4 120106 Office Assistant II Missing salary Male
5 120107 Office Assistant III Below $100,000 Female
6 120108 Warehouse Assistant II Below $100,000 Female
7 120108 Warehouse Assistant I Below $100,000 Female
8 120110 Warehouse Assistant III Below $100,000 Male
9 120111 Security Guard II Below $100,000 Male
10 120112 Below $100,000 Female
11 120113 Security Guard II Below $100,000 Female
12 120114 Security Manager Below $100,000 Invalid code
13 120115 Service Assistant I Invalid salary Male
The PROC PRINT output might not have an invalid Gender value if the data was
previously cleaned.
Level 3
missing    Display the text None.
Documentation about the FORMAT procedure can be found in the SAS Help
and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The FORMAT Procedure).
The documentation for the VALUE statement describes how to use an existing
format as the label for a range.
c. Apply the new format to the Employee_Term_Date variable in the PROC FREQ step.
d. Submit the program to produce the following report:
An option is required in the TABLES statement in order to display missing values as part
of the main frequency report. Documentation about the FREQ procedure can be found in
the SAS Help and Documentation from the Contents tab (SAS Products ⇒ Base SAS
PROC FREQ Output
The FREQ Procedure
Employee_                         Cumulative    Cumulative
Term_Date   Frequency   Percent   Frequency     Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
None 308 72.64 308 72.64
2002 6 1.42 314 74.06
2003 29 6.84 343 80.90
2004 18 4.25 361 85.14
2005 21 4.95 382 90.09
2006 20 4.72 402 94.81
JAN2007 3 0.71 405 95.52
FEB2007 3 0.71 408 96.23
MAR2007 7 1.65 415 97.88
APR2007 3 0.71 418 98.58
MAY2007 4 0.94 422 99.53
JUN2007 2 0.47 424 100.00
Objectives
Display selected observations in reports by using the WHERE statement.
Display groups of observations in reports by using the BY statement.
The WHERE Statement (Review)
For subsetting observations in a report, the WHERE statement is used to select
observations that meet a certain condition.
General form of the WHERE statement:
WHERE where-expression;
The where-expression is a sequence of operands and operators that form a set of
instructions that define a condition for selecting observations.
Operands include constants and variables.
Operators are symbols that request a comparison, arithmetic calculation, or logical operation.
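As a quick illustration of operands and operators in a where-expression, the sketch below uses the orion.sales data set and variables that appear elsewhere in this chapter. The cutoff values and the 1.1 multiplier are made-up numbers chosen only for the example.

```sas
proc print data=orion.sales;
   var First_Name Last_Name Country Salary;
   /* comparison operators (= and >) joined by a logical operator (AND) */
   where Country = 'AU' and Salary > 50000;
run;

proc print data=orion.sales;
   var First_Name Last_Name Country Salary;
   /* arithmetic operator (*) applied to a variable and a constant;  */
   /* the 1.1 factor and the 90000 cutoff are illustrative values     */
   where Salary * 1.1 > 90000;
run;
```

In both steps the variables (Country, Salary) and the constants ('AU', 50000, 1.1, 90000) are the operands, and the symbols between them are the operators.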
11.4 Subsetting and Grouping Observations 11-47
11.08 Quiz
Which of the following WHERE statements have
invalid syntax?
a. where Salary ne .;
b. where Hire_Date >= '01APR2008'd;
c. where Country in (AU US);
d. where Salary + Bonus <= 10000;
e. where Gender ne 'M' Salary >= 50000;
f. where Name like '%N';
Subsetting Observations
proc print data=orion.sales;
   var First_Name Last_Name
       Job_Title Country Salary;
   where Salary > 75000;
run;
       First_
Obs    Name      Last_Name      Job_Title               Country    Salary
  1    Tom       Zhou           Sales Manager           AU         108255
  2    Wilson    Dawes          Sales Manager           AU          87975
 64    Harry     Highpoint      Chief Sales Officer     US         243190
163    Louis     Favaron        Senior Sales Manager    US          95090
164    Renee     Capachietti    Sales Manager           US          83505
165    Dennis    Lansberry      Sales Manager           US          84260
p111d08
Subsetting Observations
proc means data=orion.sales;
var Salary;
where Country = 'AU';
run;
Analysis Variable : Salary
 N            Mean         Std Dev         Minimum         Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
63        30158.97        12699.14        25185.00       108255.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
p111d08
Setup for the Poll
Retrieve and submit program p111a02.
View the log to determine how SAS handles multiple WHERE statements.
proc freq data=orion.sales;
   tables Gender;
   where Salary > 75000;
   where Country = 'US';
run;
The BY Statement
For grouping observations in a report, the BY statement is used to produce
separate sections of the report for each BY group.
General form of the BY statement:
BY <DESCENDING> by-variable(s);
The observations in the data set must be sorted by the variables specified
in the BY statement.
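The sorting requirement can be sketched as a two-step pattern: sort first, then use the same BY variable in the reporting step. The output data set name work.sales_by_country is a hypothetical name chosen for this sketch.

```sas
/* Step 1: sort into a new data set so that orion.sales is not overwritten */
/* (work.sales_by_country is a hypothetical name)                          */
proc sort data=orion.sales out=work.sales_by_country;
   by Country;
run;

/* Step 2: the BY statement produces one section of output per Country value */
proc means data=work.sales_by_country;
   by Country;
   var Salary;
run;
```

If the PROC MEANS step read the unsorted orion.sales data set instead, SAS would be expected to stop with an error as soon as it found observations out of BY-group order.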
Grouping Observations
proc sort data=orion.sales out=work.sort;
by Country descending Gender Last_Name;
run;
p111d09
Grouping Observations
Partial PROC PRINT Output
---------------------- Country=AU Gender=M ----------------------
                      First_
Obs    Employee_ID    Name        Last_Name    Salary
  1      120145       Sandy       Aisbitt       26060
  2      120144       Viney       Barbis        30265
  3      120146       Wendall     Cederlund     25985
  4      120172       Edwin       Comber        28345
  5      120175       Andrew      Conolly       25745
  6      120103       Wilson      Dawes         87975
  7      120124       Lucian      Daymond       26480
  8      120126       Satyakam    Denny         26780
---------------------- Country=AU Gender=F ----------------------
Obs    Employee_ID    First_Name    Last_Name       Salary
 37      120168       Selina        Barcoe           25275
 38      120198       Meera         Body             28025
 39      120149       Judy          Chantharasy      26390
 40      120127       Sharryn       Clarkson         28100
 41      120138       Shani         Duckett          25795
 42      120121       Irenie        Elvish           26600
 43      120154       Caterina      Hayawardhana     30490
 44      120123       Kimiko        Hotstone         26190
11.10 Quiz
Which is a valid BY statement for the PROC FREQ step?
a. by Country Gender;
c. by Country;
d. by Gender;
proc sort data=orion.sales out=work.sort;
   by Country descending Gender Last_Name;
run;
proc freq data=work.sort;
   tables Gender;
run;
Exercises
Level 1
a. Retrieve the starter program p111e10.
b. Add a PROC SORT step to sort the observations in orion.order_fact based
on the Order_Type variable.
To avoid overwriting the orion.order_fact data set, be sure to use
the OUT= option to create a new data set containing the sorted observations.
Remember to use the new data set in the PROC MEANS step.
c. Restrict the PROC MEANS analysis to two Order_Type values: 2 and 3.
d. Modify the PROC MEANS step to generate the summary analysis separately for each
selected Order_Type value in the sorted data set.
e. Submit the program to produce the following output:
PROC MEANS Output
Orion Star Sales Summary
----------------------------------------- Order Type=2 -----------------------------------------
The MEANS Procedure
N Mean Std Dev Minimum Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
170 199.5961765 282.9680817 2.6000000 1937.20
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Level 2
Create a new data set containing the sorted observations. Do not overwrite
the orion.order_fact data set. Remember to use the new data set in
the PROC PRINT step.
c. Divide the PROC PRINT report based on Order_Type using a BY statement. The
orders for each order type should be displayed in reverse chronological order, that is,
with more recent orders near the top of the report.
d. Limit the observations in the PROC PRINT report based on the following criteria:
1) Orders placed in the first four months of 2005 (January 1 to April 30)
2) Orders that were delivered exactly two days after the order was placed
e. Add a second title to clarify that filters were applied to the data.
f. Submit the program to produce the following report:
PROC PRINT Output
Orion Star Sales Details
Order_ Delivery_
Obs Order_ID Date Date
409 1235611754 27APR2005 29APR2005
410 1235611754 27APR2005 29APR2005
411 1235591214 25APR2005 27APR2005
412 1235591214 25APR2005 27APR2005
413 1234972570 24FEB2005 26FEB2005
415 1234659163 24JAN2005 26JAN2005
417 1234588648 17JAN2005 19JAN2005
418 1234588648 17JAN2005 19JAN2005
419 1234538390 12JAN2005 14JAN2005
Order_ Delivery_
Obs Order_ID Date Date
Level 3
c. Augment the existing WHERE criteria by further restricting the report to product names
that contain either the word Street or the word Running.
To add clauses to an existing WHERE statement without retyping or editing it,
use the SAME-AND operator in a separate WHERE statement within the same step.
See the documentation in the SAS Help and Documentation from the Contents tab
(SAS Products ⇒ Base SAS ⇒ SAS 9.2 Language Reference: Concepts ⇒
SAS System Concepts ⇒ WHERE-Expression Processing ⇒
Syntax of WHERE Expression).
d. Submit the program to produce the following report:
Partial PROC PRINT Output
Orion Star Products: Children Sports
---------- Supplier Name=Greenline Sports Ltd Supplier ID=14682 Country=Great Britain ----------
Obs Product_ID Product_Name
50 210200600015
--------------- Supplier Name=3Top Sports Supplier ID=2963 Country=United States ---------------
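The SAME-AND operator named in the Level 3 hint can be sketched on a different data set, so that the exercise itself is left open. The example below assumes the orion.sales data set and variables used earlier in this chapter, and it is only an illustration of the mechanism.

```sas
proc print data=orion.sales;
   var First_Name Last_Name Job_Title Country Salary;
   where Salary > 75000;
   /* WHERE SAME AND adds a condition to the WHERE statement above */
   /* instead of replacing it                                      */
   where same and Country = 'US';
run;
```

With this form, both conditions are in effect for the step. Two ordinary WHERE statements in one step do not combine in this way, which is why the operator is useful when the first statement should stay untouched.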
Objectives
Direct output to ODS destinations by using ODS statements.
Specify a style definition by using the STYLE= option.
Create ODS files that can be opened in Microsoft Excel.
Output Delivery System
Output can be sent to a variety of destinations by using ODS statements.
[Figure: procedure output routed to the LISTING, HTML, and RTF destinations]
RTF    Rich Text Format    Word processors such as Microsoft Word
Default ODS Destination
The LISTING destination is the default ODS destination.
ods listing;
proc freq data=orion.sales;
   tables Country;
run;
proc gchart data=orion.sales;
   hbar Country / nostats;
run;
p111d10
11.5 Directing Output to External Files 11-57
Default ODS Destination
The ODS LISTING CLOSE statement stops sending
output to the OUTPUT and GRAPH windows.
ods listing close;
proc freq data=orion.sales;
   tables Country;
run;
proc gchart data=orion.sales;
   hbar Country / nostats;
run;
p111d10
24
25   proc freq data=orion.sales;
26      tables Country;
27   run;
WARNING: No output destinations active.
NOTE: There were 165 observations read from the data set ORION.SALES.
HTML, PDF, and RTF Destinations
ODS destinations such as HTML, PDF, and RTF
are opened and closed in the following manner:
ODS destination FILE = ' filename.ext ' <options>;
SAS code to generate a report(s)
ODS destination CLOSE;
The filename specified in the FILE= option needs to be specific to your operating environment.
HTML Destination
ods html file='myreport.html';
proc freq data=orion.sales;
tables Country;
run;
ods html close;
p111d11
Always terminate steps with an explicit step boundary before closing the destination.
Otherwise, the file is closed before the step executes.
PDF Destination
ods pdf file='myreport.pdf';
proc freq data=orion.sales;
   tables Country;
run;
ods pdf close;
p111d11
RTF Destination
ods rtf file='myreport.rtf';
proc freq data=orion.sales;
tables Country;
run;
ods rtf close;
p111d11
The traditional RTF destination does not control vertical measurement, so page breaking is
controlled by the word processor. Starting in SAS 9.2, there is a new destination, TAGSETS.RTF,
which does control vertical measurement. Documentation about the TAGSETS.RTF destination
can be found in the SAS Help and Documentation from the Contents tab (SAS Products ⇒
Base SAS ⇒ SAS 9.2 Output Delivery System User’s Guide ⇒ ODS Language Statements
11.11 Quiz
What is the problem with this program?
proc print data=orion.sales;
run;
ods close;
Single Destination
Output can be sent to only one destination.
ods listing close;
ods html close;
ods listing;
It is a good habit to open the LISTING destination at the end of
a program to guarantee an open destination for the next submission.
p111d11
Multiple Destinations
Output can be sent to many destinations.
ods listing;
ods pdf file='example.pdf';
ods rtf file='example.rtf';
proc freq data=orion.sales;
   tables Country;
run;
ods pdf close;
ods rtf close;
To view the results, all destinations except the LISTING
destination must be closed.
p111d11
Multiple Destinations
Use _ALL_ in the ODS CLOSE statement to close all
open destinations including the LISTING destination.
ods listing;
ods pdf file='example.pdf';
ods rtf file='example.rtf';
run;
p111d11
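A complete version of the _ALL_ pattern might look like the sketch below. It assumes the same file names and the PROC FREQ step used on the neighboring slides, and it reopens LISTING afterward because _ALL_ closes that destination too.

```sas
ods listing;
ods pdf file='example.pdf';
ods rtf file='example.rtf';

proc freq data=orion.sales;
   tables Country;
run;

/* closes PDF, RTF, and LISTING in one statement */
ods _all_ close;

/* reopen LISTING so that the next submission has an open destination */
ods listing;
```

The trade-off is brevity against control: ODS _ALL_ CLOSE avoids one CLOSE statement per destination, but it leaves no destination open unless LISTING is reopened explicitly.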
Multiple Procedures
Output from many procedures can be sent to ODS
destinations.
ods listing;
ods pdf file='example.pdf';
ods rtf file='example.rtf';
proc freq data=orion.sales;
   tables Country;
run;
proc means data=orion.sales;
   var Salary;
run;
File Location
A path can be specified to control where the file is stored.
ods html file='s:\workshop\example.html';
   tables Country;
run;
run;
If no path is specified, the file is saved in the current
default directory.
p111d11
The path and filename specified in the FILE= option need to be specific to your operating environment.
Operating Environments
The Output Delivery System works on all operating
environments.
z/OS (OS/390) Example:
ods html file='.workshop.report(example)'
         rs=none;
run;
RS=NONE writes one line of markup output at a time to the file. This enables the file to be read
with a text editor. Without the option, the lines of markup output run together.
RS=NONE is not needed with the PDF destination. It is needed with other destinations such as
CSVALL, MSOFFICE2K, and EXCELXP.
ods pdf file='myreport.pdf';
ods rtf file='myreport.rtf';
proc freq data=orion.sales;
   tables Country;
   title 'Report 1';
run;
proc means data=orion.sales;
   var Salary;
   title 'Report 2';
run;
proc print data=orion.sales;
   var First_Name Last_Name
       Job_Title Country Salary;
   where Salary > 75000;
   title 'Report 3';
run;
ods _all_ close;
ods listing;
For z/OS (OS/390), the following ODS statements are used:
ods html file='.workshop.report(myhtml)' rs=none;
ods pdf file='.workshop.report(mypdf)';
ods rtf file='.workshop.report(myrtf)' rs=none;
If you are using the SAS windowing environment on the Windows operating environment,
the Results Viewer window can be used to view the HTML, PDF, and RTF files.
1. After submitting the program, go to the Results window.
2. Right-click on the word Results within the Results window and select Expand All.
3. Double-click on the HTML file icon, the PDF file icon, or the RTF file icon.
PDF File in the Results Viewer
STYLE= Option
Use a STYLE= option in the ODS destination statement
to specify a style definition.
A style definition describes the presentation aspects, such as colors and fonts,
of SAS output.
STYLE= cannot be used with the LISTING destination.
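As a brief sketch of where the option goes, the following statements send a PROC FREQ report to an HTML file using the SASWEB style definition. The file name and the choice of SASWEB are illustrative; any supplied style definition could be substituted.

```sas
/* STYLE= is written on the ODS statement that opens the destination */
ods html file='myreport.html' style=sasweb;

proc freq data=orion.sales;
   tables Country;
run;

ods html close;
```

Only the ODS statement changes. The procedure step is the same one that would be used without a style, which makes it easy to compare two style definitions by rerunning the program with a different STYLE= value.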
SAS Supplied Style Definitions
Analysis           Astronomy        Banker           BarrettsBlue
Beige              blockPrint       Brick            Brown
Curve              D3d              Default          Education
EGDefault
FestivalPrinter    Gears            Journal          Magnify
Meadow             MeadowPrinter    Minimal          Money
NoFontDefault      Normal           NormalPrinter    Printer
Rsvp               Rtf              sansPrinter      sasdocPrinter
Sasweb             Science          Seaside          SeasidePrinter
serifPrinter       Sketch           Statdoc          Statistical
Theme              Torn             Watercolor
Some of the style definitions changed between SAS 9.1.3 and SAS 9.2.
• For example, the SAS 9.1.3 Analysis style definition produces different results than
the SAS 9.2 Analysis style definition. The SAS 9.1.3 Analysis style definition is similar
to the SAS 9.2 Ocean style definition.
• The SAS 9.1.3 Statistical style definition produces different results than
the SAS 9.2 Statistical style definition. The SAS 9.1.3 Statistical style definition is similar
to the SAS 9.2 Harvest style definition.
HTML Examples
STYLE=DEFAULT
STYLE=SASWEB
p111d13
PDF Examples
STYLE=PRINTER
STYLE=JOURNAL
p111d13
By default, the PDF destination uses the PRINTER style definition.
RTF Examples
STYLE=RTF
STYLE=OCEAN
p111d13
Submit the program and review the results.
Modify the STYLE= option to use one of the
following style definitions:
Education    Harvest    Rsvp    Solutions
Submit the program and review the results.
11.12 Poll
Did you notice a difference in the presentation aspects
between the two style definitions?
Yes
No
[Figure: procedure output routed to the CSVALL, MSOFFICE2K, and EXCELXP destinations]
Destinations Used with Excel
Destination    Type of File                 Viewed In
CSVALL                                      Editor or Microsoft Excel
MSOFFICE2K     Hypertext Markup Language    Web Browser or Microsoft Word or Microsoft Excel
EXCELXP
CSVALL Destination
ods csvall file='myexcel.csv';
proc means data=orion.sales;
   var Salary;
run;
p111d14
CSVALL Destination
CSVALL does not include any style information.
MSOFFICE2K Destination
ods msoffice2k file='myexcel.html';
proc means data=orion.sales;
   var Salary;
run;
p111d14
Microsoft Excel 97 or greater is needed to open an MSOFFICE2K file.
MSOFFICE2K Destination
MSOFFICE2K keeps the style information, including
spanning headers.
EXCELXP Destination
ods tagsets.excelxp file='myexcel.xml';
proc means data=orion.sales;
   var Salary;
run;
p111d14
Microsoft Excel 2002 or greater is needed to open an EXCELXP file.
EXCELXP Destination
EXCELXP keeps the style information, and the output of each
procedure is written to a separate sheet.
Keep in Mind
The file you are creating is not an Excel file.
CSVALL
MSOFFICE2K
EXCELXP
ods csvall file='myexcel.csv';
ods msoffice2k file='myexcel.html';
ods tagsets.excelxp file='myexcel.xml';
proc means data=orion.sales;
   var Salary;
   title 'Report 2';
run;
proc print data=orion.sales;
   var First_Name Last_Name
       Job_Title Country Salary;
   where Salary > 75000;
   title 'Report 3';
run;
ods _all_ close;
ods listing;
For z/OS (OS/390), the following ODS statements are used:
ods csvall file='.workshop.report(mycsv)' rs=none;
ods msoffice2k file='.workshop.report(myhtml)' rs=none;
ods tagsets.excelxp file='.workshop.report(myxml)' rs=none;
HTML File in Microsoft Excel 2002
********** Documentation Option **********;
ods listing close;
ods tagsets.excelxp file='myexcel1.xml'
                    style=sasweb
                    options(doc='help');
proc freq data=orion.sales;
   tables Country;
   title 'Report 1';
run;
proc means data=orion.sales;
   var Salary;
   title 'Report 2';
run;
ods tagsets.excelxp close;
ods listing;
********** Other Options **********;
ods listing close;
ods tagsets.excelxp file='myexcel2.xml'
                    style=sasweb
                    options(embedded_titles='yes'
                            sheet_Name='First Report');
proc freq data=orion.sales;
   tables Country;
   title 'Report 1';
run;
Exercises
Level 1
b. Create the PDF version of the PROC PRINT report by adding ODS statements.
Use the following naming convention when creating the PDF file:
Windows or UNIX    p111s13p.pdf
z/OS (OS/390)      .workshop.report(p111s13p)
c. Submit the program to produce the following report in PDF form as displayed in Adobe Reader:
Partial PROC PRINT Output
Compare this PDF output to the equivalent report that appears in the Output window.
d. Modify your ODS statements to create the RTF version of the PROC PRINT report.
Use the following naming convention when creating the RTF file:
e. Suppress the default Output window listing before generating the RTF report, and then
re-establish the Output window as the report destination after the RTF report is complete.
f. Submit the program to produce the following report in RTF form as displayed in Microsoft Word:
Partial PROC PRINT Output
What happens if the RUN statement is moved to the end of the program?
g. Add the STYLE= option to the ODS RTF statement to use a style definition such
as Curve, Gears, Money, or Torn.
h. Submit the program and view the report in RTF form in Microsoft Word.
Level 2
2) the reports in a single worksheet or multiple worksheets.
If selecting a destination that supports style information, specify the Listing style definition.
Use the following naming convention when creating the file. For Windows or UNIX, choose
an appropriate extension for the file depending on the type of file that is created.
Windows or UNIX    p111s14.xxx
z/OS (OS/390)      .workshop.report(p111s14)
c. Submit the program to produce the output file.
d. Open the file with Microsoft Excel. The report should resemble the following results. Your
output will look different depending on the ODS destination you choose.
Level 3
15. Adding HTML-Specific Features to ODS Output
a. Retrieve the starter program p111e15.
b. Create the HTML version of the PROC PRINT report by adding ODS statements.
Use the following naming convention when creating the HTML file:
c. Customize the title so that it becomes a clickable hyperlink when displayed in a Web browser.
The hyperlink should point to the URL http://www.sas.com (the SAS home page).
Partial PROC PRINT Output
16. Implementing Cascading Style Sheets with ODS Output
a. Retrieve the starter program p111e16.
b. Modify the ODS HTML statement so that the output generated by the program uses a cascading
style sheet. The CSS definition applies a yellow background to data cells in the ODS output.
Use the following cascading style sheet:
Windows or UNIX    p111e16c.css
z/OS (OS/390)      .workshop.report(p111e16c)
The syntax required to reference an existing CSS file can be found in the
SAS Help and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
SAS 9.2 Output Delivery System User's Guide ⇒ ODS Language Statements ⇒
Dictionary of ODS Language Statements ⇒ ODS HTML Statement). Follow the
documentation links for the STYLESHEET= option and its URL= suboption.
c. Submit the program to produce the following HTML report as displayed in Internet Explorer:
PROC PRINT Output
Chapter Review
1. What are some examples of global statements that
enhance reports?
3. How can you force a line break in a column header in
PROC PRINT?
4. What is the difference between using a FORMAT
statement in a PROC step versus a DATA step?
continued...
Chapter Review
5. How can you create a descriptive label for values of
a variable such as a department name instead of a
department code?
6. What are some examples of ODS destinations?
11.7 Solutions
Solutions to Exercises
1. Specifying Titles, Footnotes, and System Options
a. Retrieve the starter program.
b. Use the OPTIONS statement.
options nonumber nodate pagesize=18;
proc means data=orion.order_fact;
var Total_Retail_Price;
run;
options pagesize=52;
c. Specify a title.
options nonumber nodate pagesize=18;
title 'Orion Star Sales Report';
proc means data=orion.order_fact;
var Total_Retail_Price;
run;
options pagesize=52;
d. Specify a footnote.
options nonumber nodate pagesize=18;
title 'Orion Star Sales Report';
footnote 'Report by SAS Programming Student';
proc means data=orion.order_fact;
var Total_Retail_Price;
run;
options pagesize=52;
e. Cancel the footnote.
options nonumber nodate pagesize=18;
title 'Orion Star Sales Report';
footnote 'Report by SAS Programming Student';
proc means data=orion.order_fact;
var Total_Retail_Price;
run;
options pagesize=52;
footnote;
f. Submit the program.
run;
options pagesize=52;
c. Request page numbers starting at 1.
options pagesize=18 number pageno=1;
proc means data=orion.order_fact;
where Order_Type=2;
var Total_Retail_Price;
run;
options pageno=1;
proc means data=orion.order_fact;
where Order_Type=3;
var Total_Retail_Price;
run;
options pagesize=52;
d. Request the current date and time.
options pagesize=18 number pageno=1 date dtreset;
e. Specify a title in both reports.
options pagesize=18 number pageno=1 date dtreset;
title1 'Orion Star Sales Analysis';
f. Specify a secondary title in the first report.
options pagesize=18 number pageno=1 date dtreset;
title1 'Orion Star Sales Analysis';
proc means data=orion.order_fact;
where Order_Type=2;
var Total_Retail_Price;
title3 'Catalog Sales Only';
run;
options pagesize=18 number pageno=1 date dtreset;
title1 'Orion Star Sales Analysis';
proc means data=orion.order_fact;
where Order_Type=2;
var Total_Retail_Price;
title3 'Catalog Sales Only';
footnote "Based on the previous day's posted data";
run;
options pageno=1;
proc means data=orion.order_fact;
where Order_Type=3;
var Total_Retail_Price;
title3 'Internet Sales Only';
run;
options pagesize=52;
i. Cancel all footnotes for the second report.
options pagesize=18 number pageno=1 date dtreset;
title1 'Orion Star Sales Analysis';
proc means data=orion.order_fact;
where Order_Type=2;
var Total_Retail_Price;
title3 'Catalog Sales Only';
footnote "Based on the previous day's posted data";
run;
options pageno=1;
proc means data=orion.order_fact;
where Order_Type=3;
var Total_Retail_Price;
title3 'Internet Sales Only';
footnote;
run;
options pagesize=52;
j. Submit the program.
c. Add a title.
%let currentdate=%sysfunc(today(),weekdate.);
%let currenttime=%sysfunc(time(),timeampm8.);
proc means data=orion.order_fact;
title "Sales Report as of &currenttime on &currentdate";
var Total_Retail_Price;
run;
d. Submit the program.
4. Applying Labels and Formats in Reports
a. Retrieve the starter program.
b. Modify the column heading for each variable.
proc print data=orion.employee_payroll label;
where Dependents=3;
title 'Employees with 3 Dependents';
var Employee_ID Salary
Birth_Date Employee_Hire_Date Employee_Term_Date;
label Employee_ID='Employee Number'
Salary='Annual Salary'
Birth_Date='Birth Date'
Employee_Hire_Date='Hire Date'
Employee_Term_Date='Termination Date';
run;
c. Display all dates in the form ddMONyyyy.
proc print data=orion.employee_payroll label;
where Dependents=3;
title 'Employees with 3 Dependents';
var Employee_ID Salary
Birth_Date Employee_Hire_Date Employee_Term_Date;
label Employee_ID='Employee Number'
Salary='Annual Salary'
Birth_Date='Birth Date'
Employee_Hire_Date='Hire Date'
Employee_Term_Date='Termination Date';
format Birth_Date Employee_Hire_Date Employee_Term_Date date11.;
run;
d. Display each salary with dollar signs, commas, and two decimal places.
proc print data=orion.employee_payroll label;
where Dependents=3;
title 'Employees with 3 Dependents';
var Employee_ID Salary
Birth_Date Employee_Hire_Date Employee_Term_Date;
label Employee_ID='Employee Number'
Salary='Annual Salary'
Birth_Date='Birth Date'
Employee_Hire_Date='Hire Date'
Employee_Term_Date='Termination Date';
format Birth_Date Employee_Hire_Date Employee_Term_Date date11.
Salary dollar11.2;
run;
e. Submit the program.
5. Overriding Existing Labels and Formats
a. Retrieve the starter program.
b. Display only the year portion of the birth dates.
proc print data=orion.customer;
where Country='TR';
title 'Customers from Turkey';
var Customer_ID Customer_FirstName Customer_LastName
Birth_Date;
format Birth_Date year4.;
run;
c. Display only the first initial of each customer’s first name.
proc print data=orion.customer;
where Country='TR';
title 'Customers from Turkey';
var Customer_ID Customer_FirstName Customer_LastName
Birth_Date;
format Birth_Date year4.
Customer_FirstName $1.;
run;
d. Show the customer’s ID with exactly six digits.
proc print data=orion.customer;
where Country='TR';
title 'Customers from Turkey';
var Customer_ID Customer_FirstName Customer_LastName
Birth_Date;
format Birth_Date year4.
Customer_FirstName $1.
Customer_ID z6.;
run;
Customer_FirstName $1.
Customer_ID z6.;
run;
f. Submit the program.
6. Applying Permanent Labels and Formats
a. Retrieve the starter program.
b. Add permanent variable labels and formats.
data otherstatus;
set orion.employee_payroll;
keep Employee_ID Employee_Hire_Date;
if Marital_Status='O';
label Employee_ID='Employee Number'
Employee_Hire_Date='Hired';
format Employee_Hire_Date yymmddp10.;
run;
title 'Employees who are listed with Marital Status=O';
proc print data=otherstatus label;
run;
proc contents data=otherstatus;
run;
proc format;
value $gender
'F'='Female'
'M'='Male';
run;
c. Create a numeric format.
data Q1Birthdays;
set orion.employee_payroll;
BirthMonth=month(Birth_Date);
if BirthMonth le 3;
run;
proc format;
value $gender
'F'='Female'
'M'='Male';
value moname
1='January'
2='February'
3='March';
run;
d. In the PROC FREQ step, apply these two user-defined formats.
proc freq data=Q1Birthdays;
tables BirthMonth Employee_Gender;
format Employee_Gender $gender.
BirthMonth moname.;
title 'Employees with Birthdays in Q1';
run;
e. Submit the program.
other='Invalid code';
run;
c. Create a numeric format.
proc format;
value $gender
'F'='Female'
'M'='Male'
other='Invalid code';
value salrange
.='Missing salary'
20000-<100000='Below $100,000'
100000-500000='$100,000 or more'
other='Invalid salary';
run;
d. In the PROC PRINT step, apply these two user-defined formats.
proc print data=orion.nonsales;
var Employee_ID Job_Title Salary Gender;
format Salary salrange. Gender $gender.;
title1 'Distribution of Salary and Gender Values';
title2 'for Non-Sales Employees';
run;
e. Submit the program.
9. Creating a Nested Format Definition
a. Retrieve the starter program.
b. Create a user-defined format.
proc format;
value dategrp
.='None'
low-'31dec2006'd=[year4.]
'01jan2007'd-high=[monyy7.]
;
run;
b. Add a PROC SORT step.
proc sort data=orion.order_fact out=order_sorted;
by order_type;
run;
c. Restrict the PROC MEANS analysis.
proc means data=order_sorted;
where order_type in (2,3);
var Total_Retail_Price;
title 'Orion Star Sales Summary';
run;
d. Modify the PROC MEANS step.
proc means data=order_sorted;
by order_type;
var Total_Retail_Price;
where order_type in (2,3);
title 'Orion Star Sales Summary';
run;
e. Submit the program.
11. Subsetting and Grouping by Multiple Variables
a. Retrieve the starter program.
b. Sort the data set.
proc sort data=orion.order_fact out=order_sorted;
by Order_Type descending Order_Date;
run;
c. Divide the PROC PRINT report.
proc print data=order_sorted;
by Order_Type;
var Order_ID Order_Date Delivery_Date;
title1 'Orion Star Sales Details';
run;
by Order_Type;
var Order_ID Order_Date Delivery_Date;
where Delivery_Date - Order_Date = 2
and Order_Date between '01jan2005'd and '30apr2005'd;
title1 'Orion Star Sales Details';
title2 '2-Day Deliveries from January to April 2005';
run;
f. Submit the program.
12. Adding Subsetting Conditions
a. Retrieve the starter program.
b. Reorder the variables in the PROC PRINT step.
proc format;
value $country
"CA"="Canada"
"DK"="Denmark"
"ES"="Spain"
"GB"="Great Britain"
"NL"="Netherlands"
"SE"="Sweden"
"US"="United States";
run;
"US"="United States";
run;
proc sort data=orion.shoe_vendors out=vendors_by_country;
by Supplier_Country Supplier_Name;
run;
proc print data=vendors_by_country;
where Product_Line=21;
where same
and Product_Name ? 'Street' or Product_Name ? 'Running';
by Supplier_Name Supplier_ID Supplier_Country notsorted;
var Product_ID Product_Name;
title1 'Orion Star Products: Children Sports';
run;
d. Submit the program.
13. Directing Output to the PDF and RTF Destinations
a. Retrieve the starter program.
b. Create the PDF version of the PROC PRINT report.
ods pdf file='p111s13p.pdf';
proc print data=orion.customer;
title 'Customer Information';
run;
ods pdf close;
For z/OS (OS/390), the following ODS PDF statement is used:
ods pdf file='.workshop.report(p111s13p)';
c. Submit the program.
d. Modify your ODS statements to create the RTF version of the PROC PRINT report.
ods rtf file='p111s13r.rtf';
proc print data=orion.customer;
title 'Customer Information';
run;
ods rtf close;
For z/OS (OS/390), the following ODS RTF statement is used:
ods rtf file='.workshop.report(p111s13r)' rs=none;
What happens if the RUN statement is moved to the end of the program?
The file is empty.
g. Add the STYLE= option.
ods listing close;
ods rtf file='p111s13r.rtf' style=curve;
proc print data=orion.customer;
title 'Customer Information';
run;
ods rtf close;
ods listing;
h. Submit the program.
14. Creating ODS Output Compatible with Microsoft Excel
a. Retrieve the starter program.
b. Add ODS statements to send the report to a file that can be viewed in Microsoft Excel.
ods csvall file='p111s14.csv';
proc print data=orion.customer_type;
title 'Customer Type Definitions';
run;
proc print data=orion.country;
title 'Country Definitions';
run;
ods csvall close;
For z/OS (OS/390), the following ODS MSOFFICE2K statement is used:
ods msoffice2k file='.workshop.report(p111s14)' rs=none;
ods tagsets.excelxp file='p111s14.xml' style=Listing;
proc print data=orion.customer_type;
title 'Customer Type Definitions';
run;
proc print data=orion.country;
title 'Country Definitions';
run;
ods tagsets.excelxp close;
For z/OS (OS/390), the following ODS TAGSETS.EXCELXP statement is used:
ods tagsets.excelxp file='.workshop.report(p111s14)' rs=none;
c. Submit the program.
d. Open the file with Microsoft Excel.
15. Adding HTML-Specific Features to ODS Output
a. Retrieve the starter program.
b. Create the HTML version of the PROC PRINT report.
ods html file='p111s15.html';
proc print data=orion.customer;
title 'Customer Information';
run;
ods html close;
For z/OS (OS/390), the following ODS HTML statement is used:
ods html file='.workshop.report(p111s15)' rs=none;
16. Implementing Cascading Style Sheets with ODS Output
a. Retrieve the starter program.
b. Modify the ODS HTML statement.
ods html file='p111s16.html' stylesheet=(url='p111e16c.css');
proc print data=orion.customer_type;
title 'Customer Type Definitions';
run;
ods html close;
For z/OS (OS/390), the following ODS HTML statement is used:
ods html file='.workshop.report(p111s16)' rs=none
stylesheet=(url='.workshop.report(p111e16c)');
c. Submit the program.
The DTRESET option uses the current date and time
versus the SAS invocation date and time.
options date number pageno=1 ls=100 dtreset;
p111a01s
11.02 Quiz – Correct Answer
Which footnotes appear in the second procedure output?
a. Non Sales Employees
b. Orion Star
   Non Sales Employees
c. Non Sales Employees
   Confidential
d. Orion Star
   Non Sales Employees
   Confidential
footnote1 'Orion Star';
proc print data=orion.sales;
footnote2 'Sales Employees';
footnote3 'Confidential';
run;
proc print data=orion.nonsales;
footnote2 'Non Sales Employees';
run;
11.04 Quiz – Correct Answer
Which displayed value is incorrect for the given format?
Format       Stored Value    Displayed Value
$3.          Wednesday       Wed
6.1          1234.345        1234.3
COMMAX5.     1234.345        1.234
DOLLAR9.2    1234.345        $1,234.35
DDMMYY8.     0               01/01/1960
DATE9.       0               01JAN1960
YEAR4.       0               1960
d. salranges
e. dollar
Character formats must have a dollar sign as the first character and a letter or underscore as the second character.
User-defined formats cannot be the name of a SAS supplied format.
11.06 Quiz – Correct Answer
If you have a value of 99999.87, how will it be displayed if the TIERS format is applied to the value?
a. Tier 2
b. Tier 3
c. a missing value
d. none of the above
proc format;
value tiers
100000-250000 = 'Tier 3';
run;
proc format;
value tiers
20000-<50000 = 'Tier 1'
50000- 100000 = 'Tier 2'
run;
11.08 Quiz – Correct Answer
Which of the following WHERE statements have invalid syntax?
a. where Salary ne .;
b. where Hire_Date >= '01APR2008'd;
c. where Country in (AU US);
d. where
e. where Gender ne 'M' Salary >= 50000;
f. where Name like '%N';
1000  proc freq data=orion.sales;
1001  tables Gender;
1002  where Salary > 75000;
1003  where Country = 'US';
NOTE: Where clause has been replaced.
1004  run;
NOTE: There were 102 observations read from the data set ORION.SALES.
      WHERE Country='US';
11.10 Quiz – Correct Answer
Which is a valid BY statement for the PROC FREQ step?
a. by Country Gender;
b. by Gender Last_Name;
c. by Country;
d. by Gender;
by Country descending Gender Last_Name;
run;
11.12 Poll – Correct Answer
Did you notice a difference in the presentation aspects between the two style definitions?
Yes
No
The first group of style definitions did not use color.
The second group of style definitions did use color.
Chapter Review Answers
FOOTNOTE
ODS
2. What is the maximum number of title or footnote lines?
10
3. How can you force a line break in a column header in PROC PRINT?
Use the SPLIT= option in the PROC PRINT statement in combination with the LABEL statement or a permanent label.
4. What is the difference between using a FORMAT statement in a PROC step versus a DATA step?
A FORMAT statement in a PROC step applies a format temporarily, for the duration of that step. A FORMAT statement in a DATA step assigns a permanent format that is stored with the data set.
6. What are some examples of ODS destinations?
LISTING
HTML
PDF
RTF
CSVALL
MSOFFICE2K
EXCELXP
Chapter 12 Producing Summary Reports
12.1 Using the FREQ Procedure
   Exercises
12.2 Using the MEANS Procedure
   Exercises
12.3 Using the TABULATE Procedure (Self-Study)
   Exercises
12.4 Chapter Review
12.5 Solutions
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review
12.1 Using the FREQ Procedure
Objectives
Produce one-way and two-way frequency tables with the FREQ procedure.
Enhance frequency tables with options.
Produce output data sets by using the OUT= option in the TABLES and OUTPUT statements. (Self-Study)
The FREQ Procedure
The FREQ procedure can do the following:
produce one-way to n-way frequency and crosstabulation (contingency) tables
compute chi-square tests for one-way to n-way tables
the output in a SAS data set
Variables: Employee_ID, First_Name, Last_Name, Gender, Salary, Job_Title, Country, Birth_Date, Hire_Date
p112d01
By default, PROC FREQ creates a report on every variable in the data set. For example, the Employee_ID report displays every unique value of Employee_ID, counts how many observations have each value, and provides percentages and cumulative statistics. This is not a useful report because each employee has his or her own unique employee ID.
You do not typically create frequency reports for variables with a large number of distinct values, such as Employee_ID, or for analysis variables, such as Salary. You usually create frequency reports for categorical variables, such as Job_Title. You can group the values of a variable into categories by creating and applying formats.
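As a minimal sketch of grouping values with a format, the step below collapses Salary into two categories before counting. The format name SALGRP, its labels, and the 50,000 cutoff are hypothetical choices for illustration and are not part of the course data.

```sas
/* Hypothetical format: the name, cutoff, and labels are illustrative only */
proc format;
   value salgrp low-<50000 = 'Under 50,000'
                50000-high = '50,000 or more';
run;

/* PROC FREQ counts observations by formatted value, not by raw salary */
proc freq data=orion.sales;
   tables Salary;
   format Salary salgrp.;
run;
```

Because the format is applied only in the PROC FREQ step, the stored Salary values are unchanged.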
The TABLES Statement
The TABLES statement specifies the frequency and crosstabulation tables to produce.
proc freq data=orion.sales;
tables Gender Country;   /* one-way frequency tables */
run;
p112d01
                                  Cumulative    Cumulative
Gender    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
F               68       41.21            68        41.21
M               97       58.79           165       100.00

                                   Cumulative    Cumulative
Country    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
AU               63       38.18            63        38.18
US              102       61.82           165       100.00
The TABLES Statement
An n-way frequency table produces cell frequencies, cell percentages, cell percentages of row frequencies, and cell percentages of column frequencies, plus total frequencies and percentages.
proc freq data=orion.sales;
tables Gender*Country;
run;
Gender     Country

Frequency|
Percent  |
Row Pct  |
Col Pct  |AU      |US      |  Total
---------+--------+--------+
F        |     27 |     41 |     68
         |  16.36 |  24.85 |  41.21
         |  39.71 |  60.29 |
         |  42.86 |  40.20 |
---------+--------+--------+
M        |     36 |     61 |     97
         |  21.82 |  36.97 |  58.79
         |  37.11 |  62.89 |
         |  57.14 |  59.80 |
---------+--------+--------+
Total          63      102      165
            38.18    61.82   100.00
12.01 Multiple Choice Poll
Which of the following statements cannot be added to the PROC FREQ step to enhance the report?
a. FORMAT
b. SET
c. TITLE
d. WHERE
proc freq data=orion.sales;
tables Gender*Country;
where Job_Title contains 'Rep';
format Country $ctryfmt.;
title 'Sales Rep Frequency Report';
run;
ods html close;
p112d01
Additional SAS Statements
HTML Output
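The pieces that precede such a PROC FREQ step can be sketched as follows. This is a hedged reconstruction, not the course program: the $CTRYFMT definition is assumed from the Australia and United States labels that appear in later output, and the HTML file name is hypothetical.

```sas
/* Assumed definition of $CTRYFMT (inferred from labels in later output) */
proc format;
   value $ctryfmt 'AU' = 'Australia'
                  'US' = 'United States';
run;

/* Hypothetical file name; ODS HTML must be opened before the step runs */
ods html file='salesrep.html';

proc freq data=orion.sales;
   tables Gender*Country;
   where Job_Title contains 'Rep';
   format Country $ctryfmt.;
   title 'Sales Rep Frequency Report';
run;

/* Closing the destination completes the HTML file */
ods html close;
```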
Option       Description
NOCUM        suppresses the display of cumulative frequency and cumulative percentage.
NOPERCENT    suppresses the display of percentage, cumulative percentage, and total percentage.
NOFREQ       suppresses the display of the cell frequency and total frequency.
NOROW        suppresses the display of the row percentage.
NOCOL        suppresses the display of the column percentage.
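A short sketch of where these options go: they follow a forward slash in the TABLES statement and apply to the tables requested in that statement. The particular combinations below are illustrative choices, not taken from a course program.

```sas
proc freq data=orion.sales;
   /* One-way table without the two cumulative columns */
   tables Country / nocum;
   /* Two-way table showing only cell frequencies and overall percentages */
   tables Gender*Country / norow nocol;
run;
```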
Options to Suppress Display of Statistics
                                   Cumulative    Cumulative
Country    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
AU               63       38.18            63        38.18
US              102       61.82           165       100.00
NOCUM suppresses the Cumulative Frequency and Cumulative Percent columns.
NOPERCENT suppresses the Percent and Cumulative Percent columns.
Gender     Country

Frequency|
Percent  |
Row Pct  |
Col Pct  |AU      |US      |  Total
---------+--------+--------+
F        |     27 |     41 |     68     NOFREQ suppresses
         |  16.36 |  24.85 |  41.21     NOPERCENT suppresses
         |  39.71 |  60.29 |            NOROW suppresses
         |  42.86 |  40.20 |            NOCOL suppresses
---------+--------+--------+
M        |     36 |     61 |     97
         |  21.82 |  36.97 |  58.79
         |  37.11 |  62.89 |
         |  57.14 |  59.80 |
---------+--------+--------+
Total          63      102      165
            38.18    61.82   100.00
12.02 Quiz
Which TABLES statement correctly creates the report?
a. tables Gender nocum;
b. tables Gender nocum nopercent;
c. tables Gender / nopercent;
d. tables Gender / nocum nopercent;

Gender    Frequency
-------------------
F               68
M               97
p112d01
Option       Description
LIST         displays n-way tables in list format.
CROSSLIST    displays n-way tables in column format.
FORMAT=      formats the frequencies in n-way tables.
LIST and CROSSLIST Options
tables Gender*Country / list;
                                                Cumulative    Cumulative
Gender    Country          Frequency     Percent     Frequency      Percent
----------------------------------------------------------------------------
F         Australia               27       16.36            27        16.36
F         United States           41       24.85            68        41.21
M         Australia               36       21.82           104        63.03
M         United States           61       36.97           165       100.00

tables Gender*Country / crosslist;
                                                       Row     Column
Gender    Country          Frequency    Percent    Percent    Percent
----------------------------------------------------------------------
F         Australia               27      16.36      39.71      42.86
          United States           41      24.85      60.29      40.20
          Total                   68      41.21     100.00
----------------------------------------------------------------------
M         Australia               36      21.82      37.11      57.14
          United States           61      36.97      62.89      59.80
          Total                   97      58.79     100.00
----------------------------------------------------------------------
Total     Australia               63      38.18                100.00
          United States          102      61.82                100.00
FORMAT= Option
Partial PROC FREQ Outputs
tables Gender*Country;
Frequency|
Percent  |
Row Pct  |
Col Pct  |Australi|United S|  Total
         |a       |tates   |
---------+--------+--------+
F        |     27 |     41 |     68
         |  16.36 |  24.85 |  41.21
         |  39.71 |  60.29 |
         |  42.86 |  40.20 |
---------+--------+--------+

tables Gender*Country / format=12.;
Frequency|
Percent  |
Row Pct  |
Col Pct  |Australia    |United States|  Total
---------+-------------+-------------+
F        |          27 |          41 |     68
         |       16.36 |       24.85 |  41.21
         |       39.71 |       60.29 |
         |       42.86 |       40.20 |
---------+-------------+-------------+
p112d01
PROC FREQ Statement Options
Options can also be placed in the PROC FREQ statement.
Option      Description
NLEVELS     displays a table that provides the number of levels for each variable named in the TABLES statement.
PAGE        displays only one table per page.
COMPRESS    begins the display of the next one-way frequency table on the same page as the preceding one-way table if there is enough space.
NLEVELS Option
proc freq data=orion.sales nlevels;
tables Gender Country Employee_ID;
run;
Variable       Levels
-----------------------
Gender              2
Country             2
Employee_ID       165
p112d01
To display the number of levels without displaying the frequency counts, add the NOPRINT option to the TABLES statement.
proc freq data=orion.sales nlevels;
tables Gender Country Employee_ID / noprint;
run;
To display the number of levels for all variables without displaying any frequency counts, use the _ALL_ keyword and the NOPRINT option in the TABLES statement.
proc freq data=orion.sales nlevels;
tables _all_ / noprint;
run;
PAGE Option
proc freq data=orion.sales;
tables Gender Country Employee_ID;
run;
proc freq data=orion.sales page;
tables Gender Country Employee_ID;
run;
Gender: page 1, Country: page 2, Employee_ID: pages 3-6
p112d01
COMPRESS Option
proc freq data=orion.sales;
tables Gender Country Employee_ID;
run;
Gender: page 1, Country: page 1, Employee_ID: pages 2-5
proc freq data=orion.sales compress;
tables Gender Country Employee_ID;
run;
Gender: page 1, Country: page 1, Employee_ID: pages 1-4
p112d01
TABLES variables / OUT=SAS-data-set <options>;
The OUTPUT statement with an OUT= option is used to create a data set with specified statistics such as the chi-square statistic.
OUTPUT OUT=SAS-data-set <options>;
TABLES Statement OUT= Option (Self-Study)
The OUT= option in the TABLES statement creates an output data set with the following variables:
BY variables
TABLES statement variables
the automatic variables COUNT and PERCENT
other frequency and percentage variables requested with options in the TABLES statement
If more than one table request appears in the TABLES statement, the contents of the data set correspond to the last table request.
Obs    Country    COUNT    PERCENT
 1       AU         63     38.1818
 2       US        102     61.8182
The NOPRINT option suppresses the display of all output.
p112d02
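A data set like the one listed above could be produced by a step along the following lines. This is a sketch rather than the course program: the data set name work.freq1 is assumed by analogy with the work.freq2 and work.freq3 names used nearby.

```sas
/* NOPRINT suppresses the printed frequency report;   */
/* OUT= writes the counts and percentages to a data set */
proc freq data=orion.sales noprint;
   tables Country / out=work.freq1;   /* work.freq1 is an assumed name */
run;

/* The data set holds one row per Country value, with COUNT and PERCENT */
proc print data=work.freq1;
run;
```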
TABLES Statement OUT= Option (Self-Study)
proc freq data=orion.sales noprint;
tables Gender*Country / out=work.freq2;
run;
proc print data=work.freq2;
run;
PROC PRINT Output
Obs    Gender    Country    COUNT    PERCENT
 1       F         AU         27     16.3636
 2       F         US         41     24.8485
 3       M         AU         36     21.8182
 4       M         US         61     36.9697
p112d02
Option    Description
OUTCUM    includes the cumulative frequency and cumulative percentage in the output data set for one-way frequency tables.
OUTPCT    includes the percentage of column frequency and row frequency in the output data set for n-way frequency tables.
TABLES Statement OUT= Option (Self-Study)
proc freq data=orion.sales noprint;
tables Country / out=work.freq3 outcum;
run;
proc print data=work.freq3;
run;
PROC PRINT Output
Obs    Country    COUNT    PERCENT    CUM_FREQ    CUM_PCT
 1       AU         63     38.1818        63       38.182
 2       US        102     61.8182       165      100.000
p112d02
PROC PRINT Output
Obs    Gender    Country    COUNT    PERCENT    PCT_ROW    PCT_COL
 1       F         AU         27     16.3636    39.7059    42.8571
 2       F         US         41     24.8485    60.2941    40.1961
 3       M         AU         36     21.8182    37.1134    57.1429
 4       M         US         61     36.9697    62.8866    59.8039
p112d02
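The PCT_ROW and PCT_COL columns above are what the OUTPCT option adds for an n-way table. A sketch of a step that would produce them follows; the data set name work.freq4 is assumed, continuing the naming pattern of the earlier examples.

```sas
/* OUTPCT adds the row and column percentages to the OUT= data set */
proc freq data=orion.sales noprint;
   tables Gender*Country / out=work.freq4 outpct;   /* assumed name */
run;

proc print data=work.freq4;
run;
```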
OUTPUT Statement OUT= Option (Self-Study)
The OUT= option in the OUTPUT statement creates an output data set with the following variables:
BY variables
the variables requested in the TABLES statement
variables that contain the specified statistics.
OUTPUT OUT=SAS-data-set <options>;
If more than one table request appears in the TABLES statement, the contents of the data set correspond to the last table request.
If there are multiple TABLES statements, the contents of the data set correspond to the last TABLES statement.
association based on chi-square.
p112d03
OUTPUT Statement OUT= Option (Self-Study)
PROC FREQ Output
The FREQ Procedure

                                   Cumulative    Cumulative
Country    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
AU               63       38.18            63        38.18
US              102       61.82           165       100.00

   Chi-Square Test
for Equal Proportions
---------------------
Chi-Square     9.2182
DF                  1
Pr > ChiSq     0.0024
chi-square p-value
degrees of freedom
When you request a statistic, the OUTPUT data set contains that test statistic plus any associated standard error, confidence limits, p-values, and degrees of freedom.
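The following sketch shows one way the OUTPUT statement could be used to capture the chi-square results shown above for Country. The data set name work.freq5 is assumed for illustration. The CHISQ option appears in both places: in the TABLES statement so that the test is computed, and in the OUTPUT statement so that its results are written out.

```sas
proc freq data=orion.sales;
   /* Request the chi-square test for equal proportions */
   tables Country / chisq;
   /* Write the test statistic, degrees of freedom, and p-value */
   /* to a data set (work.freq5 is an assumed name)             */
   output out=work.freq5 chisq;
run;

proc print data=work.freq5;
run;
```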
12.03 Quiz
Retrieve and submit program p112a01.
proc freq data=orion.sales;
tables Gender / chisq out=freq6 outcum;
output out=freq7 chisq;
run;
proc print data=freq6;
run;
proc print data=freq7;
run;
Review the PROC PRINT output from the TABLES statement OUT= option.
Review the PROC PRINT output from the OUTPUT statement OUT= option.
1    F           68    41.2121     .          .
2    M           97    58.7879     .          .
3    AU          63    38.1818     .          .
4    US         102    61.8182     .          .
5    Gender       .      .        5.09697    0.023968
6    Country      .      .        9.21818    0.002396
Exercises
Level 1
b. Modify the program to produce two separate reports:
1) Display the number of distinct levels of Customer_ID and Employee_ID for retail orders.
a) Use a WHERE statement to limit the report to retail sales by specifying the condition Order_Type=1.
b) Display this report title: Unique Customers and Salespersons for Retail Sales.
If you do not want to see the counts for individual levels of Customer_ID and Employee_ID, add the NOPRINT option to the TABLES statement after a forward slash.
a) Use a WHERE statement to limit the report to catalog and Internet sales by specifying the condition corresponding to Order_Type values other than 1.
b) Display this report title: Unique Customers for Catalog and Internet.
If you do not want to see the counts for individual levels of Customer_ID, add the NOPRINT option to the TABLES statement after a forward slash.
Variable       Label          Levels
--------------------------------------
Customer_ID    Customer ID        31
Employee_ID    Employee ID       100

The FREQ Procedure
Number of Variable Levels
--------------------------------------
Customer_ID    Customer ID        63
Level 2
2. Producing Frequency Reports with PROC FREQ
a. Retrieve the starter program p112e02.
b. Add TABLES statements to the PROC FREQ step to produce three frequency reports:
1) Number of orders in each year: Apply the YEAR4. format to the Order_Date variable to
combine all orders within the same year.
2) Number of orders of each order type: Apply the ordertypes. format defined in the starter
program to the Order_Type variable. Suppress the cumulative frequency and percentages.
3) Number of orders for each combination of year and order type: Suppress all percentages that
normally appear in each cell of an n-way table.
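As a rough sketch of the three requests above (not the course solution), the step below uses one TABLES statement per report. The data set name orion.orders and the Catalog and Internet codes in the format definition are assumptions, because the starter program is not reproduced here; only Order_Type=1 for retail is stated in these notes.

```sas
/* Assumed stand-in for the format defined in the starter program. */
proc format;
   value ordertypes 1='Retail'
                    2='Catalog'     /* assumed code */
                    3='Internet';   /* assumed code */
run;

proc freq data=orion.orders;        /* assumed data set name */
   tables Order_Date;                                       /* orders per year */
   tables Order_Type / nocum;                               /* no cumulative columns */
   tables Order_Date*Order_Type / nopercent norow nocol;    /* cell counts only */
   format Order_Date year4. Order_Type ordertypes.;
run;
```

Because PROC FREQ groups on formatted values, applying YEAR4. is enough to combine all dates in the same year into one level.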
                                 Cumulative   Cumulative
Order_Date  Frequency  Percent   Frequency    Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
2003        104        21.22     104          21.22
2004         87        17.76     191          38.98
2005         70        14.29     261          53.27
2006        113        23.06     374          76.33
2007        116        23.67     490         100.00
Order Type
Order_
Type      Frequency  Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Retail    260        53.06
Catalog   132        26.94
Internet   98        20.00
Order_Date ‚ Retail ‚ Catalog ‚ Internet ‚ Total
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
2003 ‚ 45 ‚ 41 ‚ 18 ‚ 104
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
2004 ‚ 51 ‚ 20 ‚ 16 ‚ 87
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
2005 ‚ 27 ‚ 23 ‚ 20 ‚ 70
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
2006 ‚ 67 ‚ 33 ‚ 13 ‚ 113
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
2007 ‚ 70 ‚ 15 ‚ 31 ‚ 116
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
Total 260 132 98 490
Level 3
Customer Demographics
                                        Cumulative   Cumulative
Customer_Country  Frequency   Percent   Frequency    Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU                 8          10.39       8          10.39
CA                15          19.48      23          29.87
DE                10          12.99      33          42.86
IL                 5           6.49      38          49.35
TR                 7           9.09      45          58.44
US                28          36.36      73          94.81
ZA                 4           5.19      77         100.00
Customer Type Name
                                                               Cumulative   Cumulative
Customer_Type                             Frequency   Percent   Frequency    Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Internet/Catalog Customers                 8          10.39       8          10.39
Orion Club members high activity          11          14.29      19          24.68
Orion Club members medium activity        20          25.97      39          50.65
Orion Club Gold members high activity     10          12.99      49          63.64
Orion Club Gold members low activity       5           6.49      54          70.13
Orion Club Gold members medium activity    6           7.79      60          77.92
Orion Club members low activity           17          22.08      77         100.00
c. What are the two most common values for each variable?
1) Country ___________________ ____________________
Documentation about the FREQ procedure can be found in the SAS Help and
Documentation from the Contents tab (SAS Products > Base SAS >
Base SAS Procedures Guide: Statistical Procedures > The FREQ Procedure).
Look for an option in the PROC FREQ statement that can perform the requested
action.
e. Submit the modified program.
f. What are the two most common values for each variable?
1) Country ___________________ ____________________
2) Customer Type ___________________ ____________________
3) Customer Age Group ___________________ ____________________
Do these answers match the previous set of answers?
Which report was easier to use to answer the questions correctly?
4. Creating an Output Data Set with PROC FREQ
a. Retrieve the starter program p112e04.
b. Create an output data set containing the frequency counts based on Product_ID.
Creating an output data set from PROC FREQ results is discussed in the self-study
content at the end of this section.
c. Combine the output data set with orion.product_list to obtain the Product_Name
value for each Product_ID code.
d. Sort the merged data so that the most frequently ordered products appear at the top of the
resulting data set. Print the first 10 observations, that is, those that represent the 10 products
ordered most often.
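The count, merge, sort, and print sequence described in this exercise might look like the sketch below. It is an illustration under assumptions, not the course solution: the input data set orion.order_fact and every WORK data set name are hypothetical. The TABLES statement OUT= option writes one observation per Product_ID with the variables COUNT and PERCENT.

```sas
/* Sketch only: orion.order_fact and the work.* names are assumptions. */
proc freq data=orion.order_fact noprint;
   tables Product_ID / out=work.product_orders;   /* creates COUNT and PERCENT */
run;

proc sort data=orion.product_list out=work.product_list;
   by Product_ID;                                 /* MERGE needs sorted input */
run;

data work.product_names;
   merge work.product_orders(in=inOrders) work.product_list;
   by Product_ID;
   if inOrders;                                   /* keep only ordered products */
   keep Product_ID Product_Name Count;
run;

proc sort data=work.product_names;
   by descending Count;                           /* most frequently ordered first */
run;

proc print data=work.product_names(obs=10);
   var Count Product_ID Product_Name;
run;
```

The PROC FREQ output data set is already in Product_ID order, so only the product list needs sorting before the merge.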
             Product
Obs  Orders  Number        Product
 1   6       230100500056  Knife
 2   6       230100600030  Outback Sleeping Bag, Large,Left,Blue/Black
 3   5       230100600022  Expedition10,Medium,Right,Blue Ribbon
 4   5       240400300035  Smasher Shorts
 5   4       230100500082  Lucky Tech Intergal Wp/B Rain Pants
 6   4       230100600005  Basic 10, Left , Yellow/Black
 7   4       230100600016  Expedition Zero,Medium,Right,Charcoal
 8   4       230100600028  Expedition 20,Medium,Right,Forestgreen
 9   4       230100700008  Family Holiday 4
10   4       230100700011  Hurricane 4
12.2 Using the MEANS Procedure 12-27
Objectives
Calculate summary statistics and multilevel summaries
with the MEANS procedure.
Enhance summary tables with options.
Produce output data sets by using the OUT= option in
the OUTPUT statement. (Self-Study)
Compare the SUMMARY procedure to the MEANS
procedure. (Self-Study)
The MEANS Procedure
The MEANS procedure provides data summarization
tools to compute descriptive statistics for variables across
all observations and within groups of observations.
General form of the MEANS procedure:
PROC MEANS DATA=SAS-data-set <statistic(s)> <option(s)>;
VAR analysis-variable(s);
CLASS classification-variable(s);
RUN;
The MEANS Procedure
Variable     N    Mean       Std Dev      Minimum    Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Employee_ID  165  120713.90  450.0866939  120102.00  121145.00
Salary       165  31160.12   20082.67     22710.00   243190.00
Birth_Date   165  3622.58    5456.29      -5842.00   10490.00
Hire_Date    165  12054.28   4619.94      5114.00    17167.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
p112d05
The VAR Statement
The VAR statement identifies the analysis variables
and their order in the results.
proc means data=orion.sales;
   var Salary;
run;
The MEANS Procedure
N    Mean      Std Dev   Minimum   Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
165  31160.12  20082.67  22710.00  243190.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
p112d05
Analysis Variable : Salary
                 N
Gender  Country  Obs  N   Mean      Std Dev   Minimum   Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
F       AU       27   27  27702.41  1728.23   25185.00  30890.00
        US       41   41  29460.98  8847.03   25390.00  83505.00
M       AU       36   36  32001.39  16592.45  25745.00  108255.00
        US       61   61  33336.15  29592.69  22710.00  243190.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
p112d05
The CLASS Statement
proc means data=orion.sales;
   var Salary;
   class Gender Country;
run;
The MEANS Procedure
Analysis Variable : Salary
                 N
Gender  Country  Obs  N   Mean      Std Dev   Minimum   Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
F       AU       27   27  27702.41  1728.23   25185.00  30890.00
        US       41   41  29460.98  8847.03   25390.00  83505.00
M       AU       36   36  32001.39  16592.45  25745.00  108255.00
        US       61   61  33336.15  29592.69  22710.00  243190.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
In this output, Gender and Country are the classification variables, Salary is the
analysis variable, and the remaining columns are the statistics for the analysis variable.
The CLASS statement adds the N Obs column, which is the number
of observations for each unique combination of the class variables.
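The difference between the N Obs and N columns can be seen with a small made-up data set. The sketch below is not from the course: work.demo and its values are invented purely for illustration. One AU observation has a missing Salary, so it counts toward N Obs but not toward N.

```sas
/* Hypothetical data: three AU rows (one with a missing Salary) and one US row. */
data work.demo;
   input Country $ Salary;
   datalines;
AU 100
AU .
AU 300
US 200
;
run;

proc means data=work.demo n nmiss mean;
   class Country;
   var Salary;
run;
/* For AU: N Obs = 3 (all rows in the class level),
   N = 2 (rows with a nonmissing Salary), N Miss = 1, Mean = 200. */
```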
12.04 Quiz
For a given data set, there are 63 observations with
a Country value of AU. Of those 63 observations,
only 61 observations have a value for Salary.
Which output is correct?
         N                       N
Country  Obs  N         Country  Obs  N
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ   ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU       63   61        AU       61   63
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ   ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Additional SAS Statements
Additional statements can be added to enhance the reports.
proc format;
   value $ctryfmt 'AU'='Australia'
                  'US'='United States';
run;
options nodate pageno=1;
ods html file='p112d05.html';
proc means data=orion.sales;
   var Salary;
   class Gender Country;
   where Job_Title contains 'Rep';
   format Country $ctryfmt.;
   title 'Sales Rep Summary Report';
run;
ods html close;
p112d05
PROC MEANS Statistics
The statistics to compute and the order to display them
can be specified in the PROC MEANS statement.
proc means data=orion.sales sum mean range;
   var Salary;
   class Country;
run;
The MEANS Procedure
         N
Country  Obs  Sum         Mean      Range
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU       63   1900015.00  30158.97  83070.00
p112d05
Quantile Statistic Keywords
MEDIAN | P50   P1   P5   P10   Q1 | P25
Q3 | P75   P90
Hypothesis Testing Keywords
PROBT   T
PROC MEANS Statement Options
Options can also be placed in the PROC MEANS
statement.
Option    Description
MAXDEC=   specifies the number of decimal places to use in
          printing the statistics.
FW=       specifies the field width to use in displaying the statistics.
MAXDEC= Option
proc means data=orion.sales maxdec=0;
The MEANS Procedure
Analysis Variable : Salary
         N
Country  Obs  N    Mean     Std Dev  Minimum  Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU       63   63   30159    12699    25185    108255
US       102  102  31778    23556    22710    243190
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
proc means data=orion.sales maxdec=1;
The MEANS Procedure
Analysis Variable : Salary
         N
Country  Obs  N    Mean     Std Dev  Minimum  Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU       63   63   30159.0  12699.1  25185.0  108255.0
US       102  102  31778.5  23555.8  22710.0  243190.0
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
p112d05
FW= Option
proc means data=orion.sales;
The MEANS Procedure
Analysis Variable : Salary
         N
Country  Obs  N    Mean      Std Dev   Minimum   Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU       63   63   30158.97  12699.14  25185.00  108255.00
US       102  102  31778.48  23555.84  22710.00  243190.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
proc means data=orion.sales fw=15;
The MEANS Procedure
         N
Country  Obs  N   Mean            Std Dev         Minimum         Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU       63   63  30158.96825397  12699.13932690  25185.00000000  108255
p112d05
NONOBS Option
proc means data=orion.sales;
The MEANS Procedure
         N
Country  Obs  N    Mean      Std Dev   Minimum   Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU       63   63   30158.97  12699.14  25185.00  108255.00
US       102  102  31778.48  23555.84  22710.00  243190.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
proc means data=orion.sales nonobs;
The MEANS Procedure
Country  N    Mean      Std Dev   Minimum   Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU       63   30158.97  12699.14  25185.00  108255.00
US       102  31778.48  23555.84  22710.00  243190.00
p112d05
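The three PROC MEANS statement options shown on the preceding slides can be combined in one step. The following is a minimal sketch, assuming the same orion.sales data; the VAR and CLASS statements and the particular option values are chosen for illustration and are not taken from program p112d05.

```sas
/* Sketch: MAXDEC=, FW=, and NONOBS together in one PROC MEANS statement. */
proc means data=orion.sales maxdec=1 fw=12 nonobs;
   var Salary;        /* analysis variable */
   class Country;     /* one row per country; N Obs column suppressed */
run;
```

MAXDEC= limits the decimal places, FW= sets the width of each statistic column, and NONOBS drops the N Obs column that the CLASS statement would otherwise add.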
Output Data Sets (Self-Study)
PROC MEANS produces output data sets using the
following method:
OUTPUT OUT=SAS-data-set <options>;
The output data set contains the following variables:
BY variables
class variables
the automatic variables _TYPE_ and _FREQ_
the variables requested in the OUTPUT statement
proc print data=work.means1;
run;
p112d06
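The PROC MEANS step that creates work.means1 did not survive in these notes. A step along the following lines would produce a data set of that shape; treat it as an assumption rather than the contents of p112d06. When the OUTPUT statement names no statistics, the output data set holds the default statistics (N, MIN, MAX, MEAN, STD), one observation per statistic, identified by the _STAT_ variable.

```sas
/* Assumed reconstruction: OUTPUT with OUT= only, no statistic keywords. */
proc means data=orion.sales noprint;
   var Salary;
   class Gender Country;
   output out=work.means1;   /* default statistics, identified by _STAT_ */
run;
```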
OUTPUT Statement OUT= Option (Self-Study)
Partial PROC PRINT Output (default statistics)
Obs  Gender  Country  _TYPE_  _FREQ_  _STAT_  Salary
 1                    0       165     N       165.00
 2                    0       165     MIN     22710.00
 3                    0       165     MAX     243190.00
 4                    0       165     MEAN    31160.12
 5                    0       165     STD     20082.67
 6           AU       1       63      N       63.00
 7           AU       1       63      MIN     25185.00
 8           AU       1       63      MAX     108255.00
 9           AU       1       63      MEAN    30158.97
10           AU       1       63      STD     12699.14
11           US       1       102     N       102.00
12           US       1       102     MIN     22710.00
13           US       1       102     MAX     243190.00
14           US       1       102     MEAN    31778.48
15           US       1       102     STD     23555.84
16   F                2       68      N       68.00
17   F                2       68      MIN     25185.00
18   F                2       68      MAX     83505.00
19   F                2       68      MEAN    28762.72
20   F                2       68      STD     6974.15
proc means data=orion.sales noprint;
   var Salary;
   class Gender Country;
   output out=work.means2
          min=minSalary max=maxSalary
          sum=sumSalary mean=aveSalary;
run;
proc print data=work.means2;
run;
The NOPRINT option suppresses the display of all output.
p112d06
OUTPUT Statement OUT= Option (Self-Study)
PROC PRINT Output
                                      min     max     sum      ave
Obs  Gender  Country  _TYPE_  _FREQ_  Salary  Salary  Salary   Salary
1                     0       165     22710   243190  5141420  31160.12
2            AU       1       63      25185   108255  1900015  30158.97
3            US       1       102     22710   243190  3241405  31778.48
4    F                2       68      25185   83505   1955865  28762.72
5    M                2       97      22710   243190  3185555  32840.77
6    F       AU       3       27      25185   30890   747965   27702.41
7    F       US       3       41      25390   83505   1207900  29460.98
8    M       AU       3       36      25745   108255  1152050  32001.39
9    M       US       3       61      22710   243190  2033505  33336.15
1                     0       165     22710   243190  5141420  31160.12
2            AU       1       63      25185   108255  1900015  30158.97   summary by Country only
3            US       1       102     22710   243190  3241405  31778.48
4    F                2       68      25185   83505   1955865  28762.72   summary by Gender only
5    M                2       97      22710   243190  3185555  32840.77
6    F       AU       3       27      25185   30890   747965   27702.41   summary by Country and Gender
7    F       US       3       41      25390   83505   1207900  29460.98
8    M       AU       3       36      25745   108255  1152050  32001.39
9    M       US       3       61      22710   243190  2033505  33336.15
OUTPUT Statement OUT= Option (Self-Study)
PROC PRINT Output:
                                      min     max     sum      ave
Obs  Gender  Country  _TYPE_  _FREQ_  Salary  Salary  Salary   Salary
1                     0       165     22710   243190  5141420  31160.12
2            AU       1       63      25185   108255  1900015  30158.97
3            US       1       102     22710   243190  3241405  31778.48
4    F                2       68      25185   83505   1955865  28762.72
5    M                2       97      22710   243190  3185555  32840.77
6    F       AU       3       27      25185   30890   747965   27702.41
7    F       US       3       41      25390   83505   1207900  29460.98
8    M       AU       3       36      25745   108255  1152050  32001.39
9    M       US       3       61      22710   243190  2033505  33336.15
For this example:
_TYPE_  Type of Summary                 _FREQ_
0       overall summary                 165
1       summary by Country only         63 AU + 102 US = 165
2       summary by Gender only          68 F + 97 M = 165
3       summary by Country and Gender   27 F AU + 41 F US + 36 M AU + 61 M US = 165
Option        Description
NWAY          specifies that the output data set contain only
              statistics for the observations with the highest
              _TYPE_ value.
DESCENDTYPES  orders the output data set by descending _TYPE_
              value.
CHARTYPE      specifies that the _TYPE_ variable in the output
              data set is a character representation of the binary
              value of _TYPE_.
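These options go in the PROC MEANS statement. As a minimal sketch (the output data set name work.means_nway is hypothetical, and the remaining statements mirror the earlier OUTPUT example), NWAY keeps only the most detailed summary level:

```sas
/* Sketch: NWAY keeps only the rows with the highest _TYPE_ value. */
proc means data=orion.sales noprint nway;
   var Salary;
   class Gender Country;
   output out=work.means_nway          /* hypothetical data set name */
          min=minSalary max=maxSalary
          sum=sumSalary mean=aveSalary;
run;

proc print data=work.means_nway;
run;
```

Replacing NWAY with DESCENDTYPES or CHARTYPE in the same position changes the ordering or the representation of _TYPE_ without changing the statistics.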
OUTPUT Statement OUT= Option (Self-Study)
without options
                                      min     max     sum      ave
Obs  Gender  Country  _TYPE_  _FREQ_  Salary  Salary  Salary   Salary
1                     0       165     22710   243190  5141420  31160.12
2            AU       1       63      25185   108255  1900015  30158.97
3            US       1       102     22710   243190  3241405  31778.48
4    F                2       68      25185   83505   1955865  28762.72
5    M                2       97      22710   243190  3185555  32840.77
6    F       AU       3       27      25185   30890   747965   27702.41
7    F       US       3       41      25390   83505   1207900  29460.98
8    M       AU       3       36      25745   108255  1152050  32001.39
9    M       US       3       61      22710   243190  2033505  33336.15
with NWAY
                                      min     max     sum      ave
Obs  Gender  Country  _TYPE_  _FREQ_  Salary  Salary  Salary   Salary
1    F       AU       3       27      25185   30890   747965   27702.41
2    F       US       3       41      25390   83505   1207900  29460.98
3    M       AU       3       36      25745   108255  1152050  32001.39
4    M       US       3       61      22710   243190  2033505  33336.15
p112d06
6    M                2       97      22710   243190  3185555  32840.77
7            AU       1       63      25185   108255  1900015  30158.97
8            US       1       102     22710   243190  3241405  31778.48
9                     0       165     22710   243190  5141420  31160.12
p112d06
OUTPUT Statement OUT= Option (Self-Study)
with CHARTYPE
                                      min     max     sum      ave
Obs  Gender  Country  _TYPE_  _FREQ_  Salary  Salary  Salary   Salary
1                     00      165     22710   243190  5141420  31160.12
2            AU       01      63      25185   108255  1900015  30158.97
3            US       01      102     22710   243190  3241405  31778.48
4    F                10      68      25185   83505   1955865  28762.72
5    M                10      97      22710   243190  3185555  32840.77
6    F       AU       11      27      25185   30890   747965   27702.41
7    F       US       11      41      25390   83505   1207900  29460.98
8    M       AU       11      36      25745   108255  1152050  32001.39
9    M       US       11      61      22710   243190  2033505  33336.15
p112d06
12.05 Quiz
Retrieve and submit program p112a02.
Review the PROC PRINT output.
Add a WHERE statement to the PROC PRINT step
to subset _TYPE_ for observations summarized
by Gender only.
Submit the program and verify the results.
OUTPUT Statement OUT= Option (Self-Study)
Program p112d07 is an example of merging a PROC
MEANS output data set with a detail data set to create the
following report:
                                        Comparison      Comparison
     First_                             to Country      to Gender
Obs  Name       Last_Name   Salary      Salary Average  Salary Average
 1   Tom        Zhou        108255      Above Average   Above Average
 2   Wilson     Dawes       87975       Above Average   Above Average
 3   Irenie     Elvish      26600       Below Average   Below Average
 4   Christina  Ngan        27475       Below Average   Below Average
 5   Kimiko     Hotstone    26190       Below Average   Below Average
 6   Lucian     Daymond     26480       Below Average   Below Average
 7   Fong       Hofmeister  32040       Above Average   Below Average
 8   Satyakam   Denny       26780       Below Average   Below Average
 9   Sharryn    Clarkson    28100       Below Average   Below Average
10   Monica     Kletschkus  30890       Above Average   Above Average
VAR analysis-variable(s);
CLASS classification-variable(s);
RUN;
The SUMMARY Procedure (Self-Study)
The SUMMARY procedure uses the same syntax
as the MEANS procedure.
The only differences between the two procedures are
the following:
PROC MEANS
The PRINT option is set by default, which displays output.
Omitting the VAR statement analyzes all the numeric variables.
PROC SUMMARY
The NOPRINT option is set by default, which displays no output.
Omitting the VAR statement produces a simple count of observations.
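Because PROC SUMMARY defaults to NOPRINT, a report appears only when the PRINT option is added. A minimal sketch, assuming the same orion.sales data used in the PROC MEANS examples:

```sas
/* Sketch: PROC SUMMARY with PRINT behaves like the equivalent PROC MEANS step. */
proc summary data=orion.sales print;
   var Salary;
   class Gender Country;
run;
```

Without PRINT and without an OUTPUT statement, this step would have nothing to produce, which is why PROC SUMMARY is typically used to create output data sets.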
Exercises
Level 1
a. Retrieve the starter program p112e05.
b. Display only the SUM statistic for the Total_Retail_Price variable.
c. Display separate statistics for the combination of Order_Date and Order_Type. Apply the
ORDERTYPES. format so that the order types are displayed as text descriptions, not numbers.
Apply the YEAR4. format so that order dates are displayed as years, not individual dates.
d. Submit the program to produce the following report:
Partial PROC MEANS Output
Revenue (in U.S. Dollars) Earned from All Orders
The MEANS Procedure
Analysis Variable : Total_Retail_Price Total Retail Price for This Product
Date Order was placed by   Order   N
Catalog   52  10668.08
Internet  23  4124.05
Catalog   23  3494.60
Internet  22  3275.70
Catalog   33  6569.98
Internet  23  4626.40
Level 2
c. Suppress any decimal places in the displayed statistics.
d. Display separate statistics for each value of Gender.
e. Suppress the output column that displays the total number of observations in each classification
group.
f. Submit the program to produce the following report:
PROC MEANS Output
Number of Missing and Non-Missing Date Values
The MEANS Procedure
Employee                                            N
Gender    Variable       Label                      Miss  N
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
F         Birth_Date     Employee Birth Date        0     191
          Emp_Hire_Date  Employee Hire Date         0     191
          Emp_Term_Date  Employee Termination Date  139   52
M         Birth_Date     Employee Birth Date        0     233
          Emp_Hire_Date  Employee Hire Date         0     233
          Emp_Term_Date  Employee Termination Date  169   64
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Level 3
7. Analyzing All Possible Classification Levels with PROC MEANS
a. Retrieve the starter program p112e07.
b. Display the following statistics in the report:
1) Lower Confidence Limit for the Mean
2) Mean
3) Upper Confidence Limit for the Mean
c. Change the value for the confidence limits to 0.10, resulting in a 90% confidence limit.
d. Display all countries stored in the Work.countries data set in the report, even
if there are no customers from that country.
Documentation about the MEANS procedure can be found in the SAS Help and
Documentation from the Contents tab (SAS Products > Base SAS >
Base SAS 9.2 Procedures Guide > Procedures > The MEANS Procedure).
Look for options in the PROC MEANS statement that can perform the requested actions.
e. Submit the program to produce the following report:
PROC MEANS Output
Average Age of Customers in Each Country
Customer  N    Lower 90%                Upper 90%
Country   Obs  CL for Mean  Mean        CL for Mean
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU        8    42.4983854   52.3750000  62.2516146
BE        0    .            .           .
CA        15   31.2270622   40.0000000  48.7729378
DE        10   35.2564025   46.6000000  57.9435975
DK        0    .            .           .
ES        0    .            .           .
FR        0    .            .           .
GB        0    .            .           .
IL        5    30.1150331   40.0000000  49.8849669
NL        0    .            .           .
NO        0    .            .           .
PT        0    .            .           .
SE        0    .            .           .
Creating an output data set from PROC MEANS results is discussed in the self-study
content at the end of this section.
c. Combine the output data set with orion.product_list to obtain the Product_Name
value for each Product_ID code.
d. Sort the merged data so that the products with higher revenues appear at the top of the resulting
data set. Print the first 10 observations, that is, those that represent the ten products with the most
revenue.
To limit the number of observations displayed by PROC PRINT, apply the
OBS= data set option, as in the following:
proc print data=work.mydataset(obs=10);
e. Display the revenue values with a leading euro symbol (€), a period that separates every three
digits, and a comma that separates the decimal fraction.
f. Submit the program to produce the following report:
PROC MEANS Output
Top Ten Products by Revenue
                Product
Obs  Revenue    Number        Product
 1   €3.391,80  230100700009  Family Holiday 6
 2   €3.080,30  230100700008  Family Holiday 4
 3   €2.250,00  230100700011  Hurricane 4
 4   €1.937,20  240200100173  Proplay Executive Bi-Metal Graphite
 5   €1.796,00  240200100076  Expert Men's Firesole Driver
 6   €1.561,80  240300300090  Top R&D Long Jacket
 7   €1.514,40  240300300070  Top Men's R&D Ultimate Jacket
 8   €1.510,80  240100400098  Rollerskate Roller Skates Ex9 76mm/78a Biofl
 9   €1.424,40  240100400129  Rollerskate Roller Skates Sq9 80-76mm/78a
10   €1.343,30  240100400043  Perfect Fit Men's Roller Skates
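The revenue layout in this report (leading euro symbol, period between groups of three digits, comma before the decimal fraction) is the layout written by the EUROXw.d format. The sketch below applies it to a small invented data set; work.revenue_demo and its values are made up for illustration and are not part of the exercise.

```sas
/* Hypothetical data to illustrate the EUROXw.d format. */
data work.revenue_demo;
   input Revenue;
   format Revenue eurox10.2;   /* euro symbol, period as separator, comma as decimal point */
   datalines;
3391.80
2250
;
run;

proc print data=work.revenue_demo;
run;
```

In the exercise itself, the same FORMAT statement would be attached to the revenue variable in the PROC PRINT step or in the step that creates the merged data set.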
Objectives
Create one-, two-, and three-dimensional tabular
reports using the TABULATE procedure.
Produce output data sets by using the OUT= option
in the PROC statement.
The TABULATE Procedure
The TABULATE procedure displays descriptive statistics
in tabular format.
General form of the TABULATE procedure:
PROC TABULATE DATA=SAS-data-set <options>;
CLASS classification-variable(s);
VAR analysis-variable(s);
TABLE page-expression,
row-expression,
column-expression </ option(s)>;
RUN;
The TABULATE procedure computes many of the same statistics that are computed by other descriptive
statistical procedures such as PROC MEANS and PROC FREQ.
A CLASS statement or a VAR statement must be specified, but both statements together
are not required.
12.3 Using the TABULATE Procedure (Self-Study) 12-47
Dimensional Tables
The TABULATE procedure produces one-, two-, or three-dimensional tables.
                    page        row         column
                    dimension   dimension   dimension
one-dimensional                             X
two-dimensional                 X           X
three-dimensional   X           X           X
One-Dimensional Table
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ Country ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ AU ‚ US ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ 63.00‚ 102.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
Country is in the column dimension.
Two-Dimensional Table
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 27.00‚ 41.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 36.00‚ 61.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
Country is in the column dimension.
Gender is in the row dimension.
Three-Dimensional Table
Job_Title Sales Rep. I
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 8.00‚ 13.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 13.00‚ 29.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
Country is in the column dimension.
Gender is in the row dimension.
Job_Title is in the page dimension.
Commas separate the dimension expressions.
Every variable that is part of a dimension expression
must be specified as a classification variable (CLASS
statement) or an analysis variable (VAR statement).
The TABLE Statement
table page-expression , row-expression , column-expression ;
Examples:
table Country;
table Gender , Country;
table Job_Title , Gender , Country;
CLASS classification-variable(s);
N, the number of nonmissing values, is the default
statistic for classification variables.
Examples of classification variables:
Job_Title, Gender, and Country
Class variables
• can be numeric or character
• identify classes or categories on which calculations are done
• represent discrete categories if they are numeric (for example, Year).
VAR analysis-variable(s);
Examples of analysis variables:
Salary and Bonus
Analysis variables
• are always numeric
• tend to be continuous
• are appropriate for calculating averages, sums, or other statistics.
One-Dimensional Table
proc tabulate data=orion.sales;
   class Country;
   table Country;
run;
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ Country ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ AU ‚ US ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ 63.00‚ 102.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
p112d08
95
If there are only class variables in the TABLE statement, the default statistic is N, or number
of nonmissing values.
Two-Dimensional Table
proc tabulate data=orion.sales;
class Gender Country;
table Gender, Country;
run;
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 27.00‚ 41.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 36.00‚ 61.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
p112d08
Three-Dimensional Table
proc tabulate data=orion.sales;
   class Job_Title Gender Country;
   table Job_Title, Gender, Country;
run;
p112d08
Three-Dimensional Table
Partial PROC TABULATE Output
Job_Title Sales Rep. I
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 8.00‚ 13.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 13.00‚ 29.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
Job_Title Sales Rep. II
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 10.00‚ 14.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 8.00‚ 14.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
Dimension Expression
Elements that can be used in a dimension expression:
classification variables
analysis variables
the universal class variable ALL
keywords for statistics
Operators that can be used in a dimension expression:
blank, which concatenates table information
asterisk *, which crosses table information
parentheses (), which group elements
Dimension Expression
proc tabulate data=orion.sales;
class Gender Country;
var Salary;
table Gender all, Country*Salary;
run;
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ Salary ‚ Salary ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ Sum ‚ Sum ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 747965.00‚ 1207900.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 1152050.00‚ 2033505.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚All ‚ 1900015.00‚ 3241405.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
p112d08
If there are analysis variables in the TABLE statement, the default statistic is SUM.
PROC TABULATE Statistics
Descriptive Statistic Keywords
CSS       CV         KURTOSIS    LCLM        MAX
MEAN      RANGE      SKEWNESS    STDDEV      STDERR
SUM       SUMWGT     UCLM        USS         VAR
PCTN      REPPCTN    PAGEPCTN    ROWPCTN     COLPCTN
PCTSUM    REPPCTSUM  PAGEPCTSUM  ROWPCTSUM   COLPCTSUM
Quantile Statistic Keywords
Hypothesis Testing Keywords
PROBT   T
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ Salary ‚ Salary ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ Min ‚ Max ‚ Min ‚ Max ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚
‚F ‚ 25185.00‚ 30890.00‚ 25390.00‚ 83505.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 25745.00‚ 108255.00‚ 22710.00‚ 243190.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚All ‚ 25185.00‚ 108255.00‚ 22710.00‚ 243190.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
p112d06
Additional SAS Statements
Additional statements can be added to enhance the report.
proc format;
   value $ctryfmt 'AU'='Australia'
                  'US'='United States';
run;
options nodate pageno=1;
ods html file='p112d08.html';
proc tabulate data=orion.sales;
   class Gender Country;
   var Salary;
   table Gender all, Country*Salary*(min max);
   where Job_Title contains 'Rep';
   label Salary='Annual Salary';
   format Country $ctryfmt.;
   title 'Sales Rep Tabular Report';
run;
ods html close;
p112d08
Output Data Sets
PROC TABULATE produces output data sets using the following method:
PROC TABULATE DATA=SAS-data-set OUT=SAS-data-set <options>;
The output data set contains the following variables:
BY variables
class variables
the automatic variables _TYPE_, _PAGE_, and _TABLE_
calculated statistics
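The OUT= option can be sketched as follows. This is a minimal illustration, not the course program: the output data set name work.tabout is hypothetical, and the three TABLE statements are an assumption chosen so that _TABLE_ takes the values 1, 2, and 3.

```sas
/* Sketch only: work.tabout is a hypothetical output data set name. */
proc tabulate data=orion.sales out=work.tabout;
   class Job_Title Gender Country;
   table Country;                      /* rows written with _TABLE_ = 1 */
   table Gender, Country;              /* rows written with _TABLE_ = 2 */
   table Job_Title, Gender, Country;   /* rows written with _TABLE_ = 3 */
run;

/* List only the rows that came from the third TABLE statement. */
proc print data=work.tabout;
   where _TABLE_ = 3;
run;
```

Because the output is an ordinary SAS data set, it can be subset, sorted, or merged like any other data set.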
table Job_Title, Gender, Country;
run;
p112d09
PROC Statement OUT= Option
Partial PROC PRINT Output
Obs  Job_Title       Gender  Country  _TYPE_  _PAGE_  _TABLE_   N
  1                          AU       001          1        1  61
  2                          US       001          1        1  98
  3                  F       AU       011          1        2  27
  4                  F       US       011          1        2  40
  5                  M       AU       011          1        2  34
  6                  M       US       011          1        2  58
  7  Sales Rep. I    F       AU       111          1        3   8
  8  Sales Rep. I    F       US       111          1        3  13
  9  Sales Rep. I    M       AU       111          1        3  13
 10  Sales Rep. I    M       US       111          1        3  29
 11  Sales Rep. II   F       AU       111          2        3  10
 12  Sales Rep. II   F       US       111          2        3  14
 13  Sales Rep. II   M       AU       111          2        3   8
 14  Sales Rep. II   M       US       111          2        3  14
 15  Sales Rep. III  F       AU       111          3        3   7
 16  Sales Rep. III  F       US       111          3        3   8
 17  Sales Rep. III  M       AU       111          3        3  10
 18  Sales Rep. III  M       US       111          3        3   9
Obs  Job_Title       Gender  Country  _TYPE_  _PAGE_  _TABLE_   N
  1                          AU       001          1        1  61
  2                          US       001          1        1  98
  3                  F       AU       011          1        2  27
  4                  F       US       011          1        2  40
  5                  M       AU       011          1        2  34
  6                  M       US       011          1        2  58
In the _TYPE_ value 011, the digits are 0 for Job_Title, 1 for Gender, and 1 for Country.
PROC Statement OUT= Option
_PAGE_ is a numeric variable that shows the logical page number that contains that observation.
Partial PROC PRINT Output
Obs  Job_Title       Gender  Country  _TYPE_  _PAGE_  _TABLE_   N
  7  Sales Rep. I    F       AU       111          1        3   8
  8  Sales Rep. I    F       US       111          1        3  13
  9  Sales Rep. I    M       AU       111          1        3  13
 10  Sales Rep. I    M       US       111          1        3  29
 11  Sales Rep. II   F       AU       111          2        3  10
 12  Sales Rep. II   F       US       111          2        3  14
 13  Sales Rep. II   M       AU       111          2        3   8
 14  Sales Rep. II   M       US       111          2        3  14
 15  Sales Rep. III  F       AU       111          3        3   7
 16  Sales Rep. III  F       US       111          3        3   8
 17  Sales Rep. III  M       AU       111          3        3  10
 18  Sales Rep. III  M       US       111          3        3   9
Page 1 for Sales Rep. I
Page 2 for Sales Rep. II
Page 3 for Sales Rep. III
Obs  Job_Title       Gender  Country  _TYPE_  _PAGE_  _TABLE_   N
  1                          AU       001          1        1  61
  2                          US       001          1        1  98
  3                  F       AU       011          1        2  27
  4                  F       US       011          1        2  40
  5                  M       AU       011          1        2  34
  6                  M       US       011          1        2  58
  7  Sales Rep. I    F       AU       111          1        3   8
  8  Sales Rep. I    F       US       111          1        3  13
  9  Sales Rep. I    M       AU       111          1        3  13
 10  Sales Rep. I    M       US       111          1        3  29
1 for first TABLE statement
2 for second TABLE statement
3 for third TABLE statement
Exercises
Level 1
b. Add a CLASS statement to enable Customer_Group and Customer_Gender as classification variables.
c. Add a VAR statement to enable Customer_Age as an analysis variable.
d. Add a TABLE statement to create a report with the following characteristics:
1) Customer_Group defines the rows.
2) An extra row that combines all groups appears at the bottom of the table.
3) Customer_Gender defines the columns.
4) The N and MEAN statistics based on Customer_Age are displayed for each combination of Customer_Group and Customer_Gender.
e. Submit the program to produce the following report:
PROC TABULATE Output
Ages of Customers by Group and Gender
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Customer Gender ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ F ‚ M ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ Customer Age ‚ Customer Age ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ Mean ‚ N ‚ Mean ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Customer Group Name ‚ ‚ ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚
‚Internet/Catalog ‚ ‚ ‚ ‚ ‚
‚Customers ‚ 4.00‚ 49.25‚ 4.00‚ 54.25‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club Gold ‚ ‚ ‚ ‚ ‚
‚members ‚ 11.00‚ 35.36‚ 10.00‚ 38.90‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club members ‚ 15.00‚ 32.53‚ 33.00‚ 47.03‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚All ‚ 30.00‚ 35.80‚ 47.00‚ 45.91‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
Level 2
3) The column dimension should display the number of customers and the percentage of customers in each category (COLPCTN).
Change the headers for the statistic columns with a KEYLABEL statement.
Documentation about the KEYLABEL statement can be found in the SAS Help and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒ Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The TABULATE Procedure).
c. Submit the program to produce the following two-page report:
PROC TABULATE Output
Customers by Group and Gender
Customer Gender F
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Number ‚ Percentage ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Customer Group Name ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚Internet/Catalog ‚ ‚ ‚
‚Customers ‚ 4.00‚ 13.33‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club Gold ‚ ‚ ‚
‚members ‚ 11.00‚ 36.67‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club members ‚ 15.00‚ 50.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
Customer Gender M
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Number ‚ Percentage ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Customer Group Name ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚Internet/Catalog ‚ ‚ ‚
‚Customers ‚ 4.00‚ 8.51‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club Gold ‚ ‚ ‚
‚members ‚ 10.00‚ 21.28‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club members ‚ 33.00‚ 70.21‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
Level 3
11. Creating a Customized Tabular Report with PROC TABULATE
a. Retrieve the starter program p112e11.
b. Modify the label for the Total_Retail_Price variable.
c. Suppress the labels for the Order_Date and Product_ID variables.
d. Suppress the label for the SUM keyword.
e. Insert this text into the box above the row titles: High Cost Products (Unit Cost > $250). Suppress all titles.
f. Display all calculated cell values with the DOLLAR12. format.
g. Display $0 in all cells that have no calculated value.
Documentation about the TABULATE procedure can be found in the SAS Help and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒ Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The TABULATE Procedure).
Look for features of the PROC TABULATE statement, the TABLE statement, and the KEYLABEL statement that can perform the requested actions.
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚2005 ‚ $2,057‚ $2,256‚ $0‚ $0‚
ia
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
r
‚2006 ‚ $0‚ $1,136‚ $0‚ $0‚
e
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
t
‚2007 ‚ $519‚ $0‚ $1,066‚ $0‚
a
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ
M n
12. Creating an Output Data Set with PROC TABULATE
a. Retrieve the starter program p112e12.
b. Create an output data set from the PROC TABULATE results. The output data set should contain average salaries for each combination of Company and Employee_Gender, plus overall averages for each Company.
Creating an output data set from PROC TABULATE results is discussed in the self-study
c. Sort the data set by average salary.
d. Print the sorted data set. Assign a format and column header to the average salary column.
                         Employee    Average
Obs  Company             Gender       Salary
  2  Orion USA           F           $29,167
  3  Orion Australia                 $30,574
  4  Orion USA                       $31,226
  5  Orion USA           M           $32,534
  6  Orion Australia     M           $32,963
  7  Concession          F           $33,375
  8  Purchasing          M           $33,462
  9  Concession                      $33,839
 10  Concession          M           $34,650
 11  Purchasing                      $38,408
 12  Logistics           F           $39,055
 13  Purchasing          F           $41,556
 14  Marketing           M           $42,645
 15  Logistics                       $43,128
 16  Shared Functions    M           $43,428
 17  Marketing                       $44,390
 18  Shared Functions                $44,631
 19  Shared Functions    F           $46,016
 20  Marketing           F           $47,132
 21  Logistics           M           $47,630
 22  Board of Directors  F           $68,370
 23  Board of Directors             $134,034
 24  Board of Directors  M          $212,831
12.4 Chapter Review 12-65
Chapter Review
1. What statistics are produced by default by PROC FREQ?
3. What is the purpose of the VAR statement in PROC MEANS?
4. What is the purpose of the CLASS statement in PROC MEANS?
5. How can you change which statistics are displayed in PROC MEANS output?
12.5 Solutions
Solutions to Exercises
1. Counting Levels of a Variable with PROC FREQ
a. Retrieve the starter program.
b. Modify the program to produce two separate reports.
proc freq data=orion.orders nlevels;
where Order_Type=1;
tables Customer_ID Employee_ID / noprint;
title1 'Unique Customers and Salespersons for Retail Sales';
run;
proc freq data=orion.orders nlevels;
where Order_Type ne 1;
tables Customer_ID / noprint;
title1 'Unique Customers for Catalog and Internet Sales';
run;
c. Submit the program.
2. Producing Frequency Reports with PROC FREQ
a. Retrieve the starter program.
b. Add TABLES statements to the PROC FREQ step.
proc format;
value ordertypes
1='Retail'
2='Catalog'
3='Internet';
run;
proc freq data=orion.orders;
tables Order_Date;
tables Order_Type / nocum;
tables Order_Date*Order_Type / nopercent norow nocol;
format Order_Date year4. Order_Type ordertypes.;
title 'Order Summary by Year and Type';
run;
c. Submit the program.
The top two customer types are Orion Club members medium activity (20) and
Orion Club members low activity (17).
The top two customer age groups are 31-45 years (27) and 15-30 years (22).
proc freq data=orion.customer_dim order=freq;
tables Customer_Country Customer_Type Customer_Age_Group;
title1 'Customer Demographics';
title3 '(Top two levels for each variable?)';
run;
e. Submit the modified program.
f. What are the two most common values for each variable?
The top two countries are US (United States, 28 customers) and CA (Canada, 15 customers).
The top two customer types are Orion Club members medium activity (20) and
Orion Club members low activity (17).
The top two customer age groups are 31-45 years (27) and 15-30 years (22).
Which report was easier to use to answer the questions correctly?
The ORDER=FREQ option in the PROC FREQ statement sequences the frequency output
in descending count order. Because the levels that occur most often appear near the top of
each report, the most common data values can be identified more easily.
4. Creating an Output Data Set with PROC FREQ
a. Retrieve the starter program.
b. Create an output data set.
proc freq data=orion.order_fact noprint;
tables Product_ID / out=product_orders;
run;
c. Combine the output data set with orion.product_list.
data product_names;
merge product_orders orion.product_list;
by Product_ID;
keep Product_ID Product_Name Count;
run;
run;
e. Submit the program.
5. Creating a Summary Report with PROC MEANS
a. Retrieve the starter program.
b. Display only the SUM statistic for the Total_Retail_Price variable.
proc format;
value ordertypes
1='Retail'
2='Catalog'
3='Internet';
run;
proc means data=orion.order_fact sum;
var Total_Retail_Price;
title 'Revenue (in U.S. Dollars) Earned from All Orders';
run;
c. Display separate statistics for the combination of Order_Date and Order_Type.
proc means data=orion.order_fact sum;
var Total_Retail_Price;
class Order_Date Order_Type;
format Order_Date year4. Order_Type ordertypes.;
title 'Revenue (in U.S. Dollars) Earned from All Orders';
run;
d. Submit the program.
6. Analyzing Missing Numeric Values with PROC MEANS
a. Retrieve the starter program.
b. Display the number of missing values and the number of nonmissing values.
proc means data=orion.staff nmiss n;
var Birth_Date Emp_Hire_Date Emp_Term_Date;
title 'Number of Missing and Non-Missing Date Values';
run;
var Birth_Date Emp_Hire_Date Emp_Term_Date;
class Gender;
title 'Number of Missing and Non-Missing Date Values';
run;
e. Suppress the output column that displays the total number of observations.
proc means data=orion.staff nmiss n maxdec=0 nonobs;
var Birth_Date Emp_Hire_Date Emp_Term_Date;
class Gender;
title 'Number of Missing and Non-Missing Date Values';
run;
f. Submit the program.
7. Analyzing All Possible Classification Levels with PROC MEANS
a. Retrieve the starter program.
b. Display statistics in the report.
data work.countries(keep=Customer_Country);
set orion.supplier;
Customer_Country=Country;
run;
proc means data=orion.customer_dim
lclm mean uclm;
class Customer_Country;
var Customer_Age;
title 'Average Age of Customers in Each Country';
run;
b. Create an output data set.
proc means data=orion.order_fact noprint nway;
class Product_ID;
var Total_Retail_Price;
output out=product_orders sum=Product_Revenue;
run;
c. Combine the output data set with orion.product_list.
data product_names;
merge product_orders orion.product_list;
by Product_ID;
keep Product_ID Product_Name Product_Revenue;
run;
d. Sort the merged data and print the first 10 observations.
proc sort data=product_names;
by descending Product_Revenue;
run;
proc print data=product_names(obs=10) label;
var Product_Revenue Product_ID Product_Name;
label Product_ID='Product Number'
Product_Name='Product'
Product_Revenue='Revenue';
title 'Top Ten Products by Revenue';
run;
e. Display the revenue values with a leading euro symbol.
proc print data=product_names(obs=10) label;
var Product_Revenue Product_ID Product_Name;
label Product_ID='Product Number'
Product_Name='Product'
Product_Revenue='Revenue';
format Product_Revenue eurox12.2;
title 'Top Ten Products by Revenue';
run;
title 'Ages of Customers by Group and Gender';
run;
c. Add a VAR statement.
proc tabulate data=orion.customer_dim;
class Customer_Group Customer_Gender;
var Customer_Age;
title 'Ages of Customers by Group and Gender';
run;
d. Add a TABLE statement.
proc tabulate data=orion.customer_dim;
class Customer_Group Customer_Gender;
var Customer_Age;
table Customer_Group all,
Customer_Gender*Customer_Age*(n mean);
title 'Ages of Customers by Group and Gender';
run;
10. Creating a Three-Dimensional Tabular Report with PROC TABULATE
a. Retrieve the starter program.
b. Define a tabular report.
proc tabulate data=orion.customer_dim;
class Customer_Gender Customer_Group;
table Customer_Gender, Customer_Group, (n colpctn);
keylabel colpctn='Percentage' N='Number';
title 'Customers by Group and Gender';
run;
c. Submit the program.
label Total_Retail_Price='Revenue for Each Product';
title;
run;
c. Suppress the labels for the Order_Date and Product_ID variables.
proc tabulate data=orion.order_fact;
where CostPrice_Per_Unit > 250;
class Product_ID Order_Date;
format Order_Date year4.;
var Total_Retail_Price;
table Order_Date=' ', Total_Retail_Price*sum*Product_ID=' ';
label Total_Retail_Price='Revenue for Each Product';
title;
run;
d. Suppress the label for the SUM keyword.
proc tabulate data=orion.order_fact;
where CostPrice_Per_Unit > 250;
class Product_ID Order_Date;
format Order_Date year4.;
var Total_Retail_Price;
table Order_Date=' ', Total_Retail_Price*sum*Product_ID=' ';
label Total_Retail_Price='Revenue for Each Product';
keylabel Sum=' ';
title;
run;
e. Insert text into the box above the row titles.
proc tabulate data=orion.order_fact;
where CostPrice_Per_Unit > 250;
class Product_ID Order_Date;
format Order_Date year4.;
var Total_Retail_Price;
table Order_Date=' ', Total_Retail_Price*sum*Product_ID=' '
/ box='High Cost Products (Unit Cost > $250)';
label Total_Retail_Price='Revenue for Each Product';
keylabel Sum=' ';
title;
run;
run;
g. Display $0 in all cells that have no calculated value.
proc tabulate data=orion.order_fact format=dollar12.;
where CostPrice_Per_Unit > 250;
class Product_ID Order_Date;
format Order_Date year4.;
var Total_Retail_Price;
table Order_Date=' ', Total_Retail_Price*sum*Product_ID=' '
/ misstext='$0'
box='High Cost Products (Unit Cost > $250)';
label Total_Retail_Price='Revenue for Each Product';
keylabel Sum=' ';
title;
run;
12. Creating an Output Data Set with PROC TABULATE
a. Retrieve the starter program.
b. Create an output data set.
proc tabulate data=orion.Organization_Dim format=dollar12.
out=work.Salaries;
class Employee_Gender Company;
var Salary;
table Company, (Employee_Gender all)*Salary*mean;
title 'Average Employee Salaries';
run;
c. Sort the data set.
proc sort data=work.Salaries;
by Salary_Mean;
run;
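The printing step described in part d of the exercise could look like the following. This is a hedged sketch rather than the published solution: the column label and the title text are assumptions, while Salary_Mean is the variable name used in the PROC SORT step above.

```sas
/* Sketch only: the label and title text are illustrative choices. */
proc print data=work.Salaries label;
   var Company Employee_Gender Salary_Mean;
   label Salary_Mean='Average Salary';     /* column header for the statistic */
   format Salary_Mean dollar12.;           /* display as whole dollars */
   title 'Average Employee Salaries';
run;
```

The LABEL option in the PROC PRINT statement is what makes the assigned label appear as the column header.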
c. TITLE
d. WHERE
12.02 Quiz – Correct Answer
Which TABLES statement correctly creates the report?
a. tables Gender nocum;
b. tables Gender nocum nopercent;
d. tables Gender / nocum nopercent;
Gender Frequency
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
F 68
M 97
p112d01
1 F 68 41.2121 68 41.212
2 M 97 58.7879 165 100.000
12.04 Quiz – Correct Answer
For a given data set, there are 63 observations with a Country value of AU. Of those 63 observations,
Which output is correct?
a. Analysis Variable : Salary
            N
Country   Obs    N
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU         63   61
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
b. Analysis Variable : Salary
            N
Country   Obs    N
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU         61   63
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
run;
                                      min      max      sum       ave
Obs  Gender  Country  _TYPE_  _FREQ_  Salary   Salary   Salary    Salary
  4  F                10          68
  5  M                10          97   22710   243190  3185555  32840.77
p112a02s
Cumulative Percent
2. How can you produce a two-way frequency table using PROC FREQ?
By using an asterisk between two variables in a TABLES statement, you can produce a two-way frequency table in PROC FREQ.
Example: tables Gender*Country;
Chapter Review Answers
3. What is the purpose of the VAR statement in PROC MEANS?
The VAR statement identifies the analysis variables and their order in the PROC MEANS results.
4. What is the purpose of the CLASS statement in PROC MEANS?
The CLASS statement identifies variables whose values define subgroups for the PROC MEANS analysis.
Chapter 13 Introduction to Graphics Using SAS/GRAPH (Self-Study)
13.1 Introduction................................................................................................................... 13-3
13.2 Creating Bar and Pie Charts ........................................................................................ 13-9
Demonstration: Creating Bar and Pie Charts..................................................................... 13-10
13.3 Creating Plots ............................................................................................................. 13-21
Demonstration: Creating Plots ........................................................................................... 13-22
13.4 Enhancing Output ...................................................................................................... 13-25
Demonstration: Enhancing Output ..................................................................................... 13-26
13-2 Chapter 13 Introduction to Graphics Using SAS/GRAPH (Self-Study)
13.1 Introduction
contour plots
three-dimensional scatter and surface plots
maps
text slides
custom graphs
Bar Charts (GCHART Procedure)
Scatter and Line Plots (GPLOT Procedure)
Three-Dimensional Surface and Scatter Plots (G3D Procedure)
Maps (GMAP Procedure)
SAS/GRAPH Programs
General form of a SAS/GRAPH program:
GOPTIONS options;
global statements
graphics procedure steps
For example:
goptions cback=white;
title 'Number of Employees by Job Title';
proc gchart data=orion.staff;
vbar Job_Title;
run;
quit;
RUN-Group Processing
Many SAS/GRAPH procedures can use RUN-group
processing, which means that the following are true:
The procedure executes the group of statements
following the PROC statement when a RUN
statement is encountered.
Additional statements followed by another RUN
statement can be submitted without resubmitting the
PROC statement.
The procedure stays active until a PROC, DATA, or
QUIT statement is encountered.
Example of RUN-Group Processing
proc gchart data=orion.staff;
vbar Job_Title;
run;
pie Job_Title;
title 'Pie Chart of Job Titles';
run;
quit;
13.2 Creating Bar and Pie Charts 13-9
Use one of these statements to specify the chart type:
HBAR chart-variable . . . </ options>;
HBAR3D chart-variable . . . </ options>;
VBAR chart-variable . . . </ options>;
VBAR3D chart-variable . . . </ options>;
PIE chart-variable . . . </ options>;
PIE3D chart-variable . . . </ options>;
The chart variable determines the number of bars or slices produced within a graph. The chart variable
can be character or numeric. By default, the height, length, or slice represents a frequency count of the
values of the chart variable.
Because the chart variable is a character variable, PROC GCHART displays one bar for
each value of Job_Title.
goptions reset=all;
proc gchart data=orion.staff;
vbar Job_Title;
where Job_Title =:'Sales Rep';
title 'Number of Employees by Job Title';
run;
quit;
The RESET=ALL option resets all graphics options to their default settings and clears any titles
or footnotes that are in effect.
2. Submit the second PROC GCHART step to create a three-dimensional horizontal bar chart.
The HBAR3D statement creates a three-dimensional horizontal bar chart showing the same
information as the previous bar chart. Notice that the HBAR and HBAR3D statements automatically
display statistics to the right of the chart.
goptions reset=all;
proc gchart data=orion.staff;
hbar3d Job_Title;
where Job_Title =:'Sales Rep';
title 'Number of Employees by Job Title';
run;
quit;
3. Submit the third PROC GCHART step to suppress the display of statistics on the horizontal bar chart.
The NOSTATS option in the HBAR3D statement suppresses the display of statistics on the chart.
goptions reset=all;
proc gchart data=orion.staff;
hbar3d Job_Title / nostats;
title 'Number of Employees by Job Title';
where Job_Title =:'Sales Rep';
run;
quit;
4. Submit the fourth PROC GCHART step to use a numeric chart variable.
The VBAR3D statement creates a vertical bar chart showing the distribution of values of the variable
Salary. Because the chart variable is numeric, PROC GCHART divides the values of Salary into
ranges and displays one bar for each range. The value under the bar represents the midpoint of the
range. The FORMAT statement assigns the DOLLAR9. format to Salary.
goptions reset=all;
proc gchart data=orion.staff;
vbar3d salary / autoref;
where Job_Title =:'Sales Rep';
format salary dollar9.;
title 'Salary Distribution Midpoints for Sales Reps';
run;
quit;
5. Submit the fifth PROC GCHART step to specify ranges for a numeric chart variable and add
reference lines.
The HBAR3D statement creates a three-dimensional horizontal bar chart.
The LEVELS= option in the HBAR3D statement divides the values of Salary into five ranges and
displays a bar for each range of values.
The RANGE option in the HBAR3D statement displays the range of values, rather than the midpoint,
under each bar.
The AUTOREF option displays reference lines at each major tick mark on the horizontal (response)
axis.
goptions reset=all;
proc gchart data=orion.staff;
hbar3d salary/levels=5 range autoref;
where Job_Title =:'Sales Rep';
format salary dollar9.;
title 'Salary Distribution Ranges for Sales Reps';
run;
quit;
To display a bar for each unique value of the chart variable, specify the DISCRETE option instead
of the LEVELS= option in the VBAR, VBAR3D, HBAR, or HBAR3D statement. The DISCRETE
option should be used only when the chart variable has a relatively small number of unique values.
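As a hedged illustration of the DISCRETE option, the step below charts a numeric variable with one bar per formatted value. Choosing Emp_Hire_Date with the YEAR4. format is an assumption made for this sketch, and the title text is illustrative; neither comes from the course demonstration.

```sas
/* Sketch only: variable choice and title are illustrative. */
goptions reset=all;
proc gchart data=orion.staff;
   vbar Emp_Hire_Date / discrete;     /* one bar per distinct (formatted) value */
   where Job_Title =: 'Sales Rep';
   format Emp_Hire_Date year4.;       /* group the dates by year */
   title 'Sales Reps by Year Hired';
run;
quit;
```

Without the format, each distinct hire date would get its own bar, which is the situation the note warns against.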
6. Submit the sixth PROC GCHART step to create bar charts based on statistics.
A vertical bar chart displays one bar for each value of the variable Job_Title by specifying
Job_Title as the chart variable in the VBAR statement. The height of the bar should be based
on the mean value of the variable Salary for each job title.
The SUMVAR= option in the VBAR statement specifies the variable whose values control the
height or length of the bars. This variable (Salary, in this case) is known as the analysis variable.
The TYPE= option in the VBAR statement specifies the statistic for the analysis variable that
dictates the height or length of the bars. Possible values for the TYPE= option are SUM and
MEAN.
A LABEL statement assigns labels to the variables Job_Title and Salary.
goptions reset=all;
proc gchart data=orion.staff;
vbar Job_Title / sumvar=salary type=mean;
where Job_Title =:'Sales Rep';
format salary dollar9.;
label Job_Title='Job Title'
Salary='Salary';
title 'Average Salary by Job Title';
run;
quit;
If the SUMVAR= option is specified and the TYPE= option is omitted, TYPE= defaults to SUM.
7. Submit the seventh PROC GCHART step to assign a different color to each bar and display the
mean statistic on the top of each bar.
The PATTERNID=MIDPOINT option in the VBAR statement causes PROC GCHART to assign
a different pattern or color to each value of the midpoint (chart) variable.
The MEAN option in the VBAR statement displays the mean statistic on the top of each bar. Other
options such as SUM, FREQ, and PERCENT can be specified to display other statistics on the top
of the bars.
goptions reset=all;
proc gchart data=orion.staff;
vbar Job_Title / sumvar=salary type=mean patternid=midpoint mean;
where Job_Title =:'Sales Rep';
format salary dollar9.;
title 'Average Salary by Job Title';
run;
quit;
Only one statistic can be displayed on top of each vertical bar. For horizontal bar charts, you can
specify multiple statistics, which are displayed to the right of the bars.
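A horizontal version with more than one statistic might look like the sketch below. It reuses the data set, WHERE condition, and format from the step above; the particular pair of statistics (FREQ and MEAN) is an assumption chosen for illustration.

```sas
/* Sketch only: requests two statistics beside each horizontal bar. */
goptions reset=all;
proc gchart data=orion.staff;
   hbar Job_Title / sumvar=salary type=mean freq mean;
   where Job_Title =: 'Sales Rep';
   format salary dollar9.;
   title 'Average Salary by Job Title';
run;
quit;
```

Here the bar length still reflects the mean salary, while the frequency count and the mean are both listed to the right of each bar.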
8. Submit the eighth PROC GCHART step to divide the bars into subgroups.
A VBAR statement creates a vertical bar chart that shows the number of sales representatives for
each value of Job_Title.
The SUBGROUP= option in the VBAR statement divides the bar into sections, where each section
represents the frequency count for each value of the variable Gender.
goptions reset=all;
proc gchart data=orion.staff;
vbar Job_Title/subgroup=Gender;
where Job_Title =:'Sales Rep';
title 'Frequency of Job Title, Broken Down by Gender';
run;
quit;
The GROUP= option in the VBAR statement displays a separate set of bars for each value of
Job_Title.
The PATTERNID=MIDPOINT option in the VBAR statement displays each value of Gender
(the midpoint or chart variable) with the same pattern.
goptions reset=all;
proc gchart data=orion.staff;
vbar gender/group=Job_Title patternid=midpoint;
where Job_Title =:'Sales Rep';
title 'Frequency of Job Gender, Grouped by Job Title';
run;
quit;
Submitting the following statements (reversing the chart and group variables) would produce a chart
with two groups of four bars:
proc gchart data=orion.staff;
vbar Job_Title/group=Gender patternid=midpoint;
10. Submit the final PROC GCHART step to create multiple pie charts using RUN-group processing.
A PIE and a PIE3D statement in the same PROC GCHART step produces both a two-dimensional pie
chart and a three-dimensional pie chart showing the number of sales representatives for each value of
Job_Title.
The TITLE2 statements specify different subtitles for each chart. The NOHEADING option in
the PIE3D statement suppresses the FREQUENCY of Job_Title heading.
Because RUN-group processing is in effect, note the following:
• It is not necessary to submit a separate PROC GCHART statement for each graph.
• The WHERE statement is applied to both charts.
• The TITLE statement is applied to both charts.
• A separate TITLE2 statement is used for each chart.
goptions reset=all;
proc gchart data=orion.staff;
   pie Job_Title;
   where Job_Title =: 'Sales Rep';
   title 'Frequency Distribution of Job Titles';
   title2 '2-D Pie Chart';
run;
   pie3d Job_Title / noheading;
   title2 '3-D Pie Chart';
run;
quit;
13.3 Creating Plots 13-21
PROC GPLOT DATA=SAS-data-set;
   PLOT vertical-variable*horizontal-variable </ options>;
RUN;
QUIT;
13-22 Chapter 13 Introduction to Graphics Using SAS/GRAPH (Self-Study)
Creating Plots
p113d02
1. Submit the first PROC GPLOT step to create a simple scatter plot.
The PROC GPLOT step creates a plot displaying the values of the variable Yr2007 on the vertical axis
and Month on the horizontal axis. The points are displayed using the default plotting symbol (a plus
sign).
A FORMAT statement assigns the DOLLAR12. format to Yr2007.
goptions reset=all;
proc gplot data=orion.budget;
   plot Yr2007*Month;
   format Yr2007 dollar12.;
   title 'Plot of Budget by Month';
run;
quit;
13.3 Creating Plots 13-23
2. Submit the second PROC GPLOT step to specify plot symbols and interpolation lines.
The SYMBOL statement specifies an alternate plotting symbol and draws an interpolation
line joining the plot points. The options in the SYMBOL statement are as follows:
• V= specifies the plotting symbol (a dot).
• I= specifies the interpolation method to be used to connect the points (join).
• CV= specifies the color of the plotting symbol.
• CI= specifies the color of the interpolation line.
The LABEL statement assigns a label to the variable Yr2007.
goptions reset=all;
symbol1 v=dot i=join cv=red ci=blue;
proc gplot data=orion.budget;
   plot Yr2007*Month / haxis=1 to 12;
   label Yr2007='Budget';
   format Yr2007 dollar12.;
   title 'Plot of Budget by Month';
run;
quit;
13-24 Chapter 13 Introduction to Graphics Using SAS/GRAPH (Self-Study)
3. Submit the third PROC GPLOT step to overlay multiple plot lines on the same set of axes.
The PLOT statement specifies two separate plot requests (Yr2006*Month and Yr2007*Month),
which results in an overlay plot with the variables Yr2006 and Yr2007 on the vertical axis and
Month on the horizontal axis.
• The VREF= option specifies a value on the vertical axis where a reference line should be drawn.
• The CFRAME= option specifies a color to be used for the background within the plot axes.
goptions reset=all;
symbol1 i=join v=dot ci=blue cv=blue;
symbol2 i=join v=triangle ci=red cv=red;
proc gplot data=orion.budget;
   plot Yr2006*Month Yr2007*Month / overlay haxis=1 to 12
                                    vref=3000000
                                    cframe="very light gray";
   label Yr2006='Budget';
   format Yr2006 dollar12.;
   title 'Plot of Budget by Month for 2006 and 2007';
run;
quit;
13.4 Enhancing Output 13-25
defaults using the following methods:
• specifying an ODS style
• specifying default attributes in a GOPTIONS statement
• specifying attributes and options in global statements and procedure statements
13-26 Chapter 13 Introduction to Graphics Using SAS/GRAPH (Self-Study)
Enhancing Output
p113d03
1. Submit the first PROC GPLOT step to use ODS styles to control the appearance of the output.
Specifying a style with the STYLE= option in the ODS LISTING statement applies a different
ODS style to the output.
ods listing style=gears;
goptions reset=all;
proc gplot data=orion.budget;
   plot Yr2007*Month;
   format Yr2007 dollar12.;
   label Yr2007='Budget';
   title 'Plot of Budget by Month';
run;
quit;
13.4 Enhancing Output 13-27
2. Submit the second PROC GPLOT step to specify the options in the TITLE and FOOTNOTE
statements to control the text appearance.
The TITLE and FOOTNOTE statements override the default fonts, colors, height, and text
justification. The following options are used:
• F= (or FONT=) specifies a font.
• C= (or COLOR=) specifies text color.
• H= (or HEIGHT=) specifies text height. Units of height can be specified as a percent of the
display (PCT), inches (IN), centimeters (CM), cells (CELLS), or points (PT).
• J= (or JUSTIFY=) specifies text justification. Valid values are LEFT (L), CENTER (C),
and RIGHT (R).
proc gplot data=orion.budget;
   plot Yr2007*Month / vref=3000000;
   label Yr2007='Budget';
   format Yr2007 dollar12.;
   title f=centbi h=5 pct 'Budget by Month';
   footnote c=green j=left 'Data for 2007';
run;
quit;
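As a further illustration of the H= units and J= values listed in this step, the following is a minimal sketch rather than part of the course program. It assumes the same orion.budget data set; the specific heights, the TITLE2 text, and the footnote text are hypothetical values chosen only to show each unit once.

```sas
goptions reset=all;
proc gplot data=orion.budget;
   plot Yr2007*Month;
   format Yr2007 dollar12.;
   /* Options apply to the text string that follows them. */
   title1 h=4 pct j=left 'Budget by Month';        /* height as percent of display */
   title2 h=12 pt c=blue j=center '2007';          /* height in points (hypothetical text) */
   footnote h=0.5 cm j=right 'Data for 2007';      /* height in centimeters */
run;
quit;
```

Changing only the unit keyword (PCT, PT, CM, IN, or CELLS) while keeping the number the same can change the text size considerably, so it is usually safest to state the unit explicitly.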
13-28 Chapter 13 Introduction to Graphics Using SAS/GRAPH (Self-Study)
3. Submit the third PROC GPLOT step to use a GOPTIONS statement to control the appearance
of the output.
A GOPTIONS statement specifies options to control the appearance of all text in the graph.
The following options are used:
• FTEXT= specifies a font for all text.
• CTEXT= specifies the color for all text.
• HTEXT= specifies the height for all text.
If a text option is specified both in a GOPTIONS statement and in a TITLE or FOOTNOTE
statement, the option specified in the TITLE or FOOTNOTE statement overrides the value in the
GOPTIONS statement only for that title or footnote.
ods listing style=gears;
goptions reset=all ftext=centb htext=3 pct ctext=dark_blue;
proc gplot data=orion.budget;
   plot Yr2007*Month / vref=3000000;
   label Yr2007='Budget';
   format Yr2007 dollar12.;
   title f=centbi 'Budget by Month';
run;
quit;
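The override rule for GOPTIONS text options can be sketched with one more variation. This is an illustrative example under stated assumptions, not a program from the course files: it reuses orion.budget, and the footnote color and height are hypothetical values.

```sas
goptions reset=all ftext=centb htext=3 pct ctext=dark_blue;
proc gplot data=orion.budget;
   plot Yr2007*Month;
   format Yr2007 dollar12.;
   /* No text options here, so the title uses the GOPTIONS settings. */
   title 'Budget by Month';
   /* C= and H= override CTEXT= and HTEXT= for this footnote only;
      the font still comes from FTEXT=. */
   footnote c=red h=2 pct 'Data for 2007';
run;
quit;
```

Axis labels and tick values have no statement-level text options in this step, so they continue to use the GOPTIONS values.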
Chapter 14 Learning More
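14.1 SAS Resources ............................................................................................................ 14-3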
14.2 Beyond This Course ..................................................................................................... 14-6
14.1 SAS Resources 14-3
Objectives
Identify areas of support that SAS offers.
Education
Comprehensive training to deliver greater value to your organization
• More than 200 course offerings
• World-class instructors
• Multiple delivery methods: instructor-led and self-paced
• Training centers around the world
http://support.sas.com/training/
14-4 Chapter 14 Learning More
SAS Publishing
SAS offers a complete selection of publications to help
customers use SAS software to its fullest potential:
other publishers, and distributors
http://support.sas.com/publishing/
SAS Global Certification Program
SAS offers several globally recognized certifications.
• Computer-based certification exams – typically 60-70 questions and 2-3 hours in length
• Preparation materials and practice exams available
• Worldwide directory of SAS Certified Professionals
http://support.sas.com/certify/
14.1 SAS Resources 14-5
Support
SAS provides a variety of self-help and assisted-help
resources.
SAS Technical Support
http://support.sas.com/techsup/
User Groups
SAS supports many local, regional, international, and special-interest SAS user groups.
• SAS Global Forum
• Online SAS Community: www.sasCommunity.org
http://support.sas.com/usergroups/
14-6 Chapter 14 Learning More
Objectives
Identify the next set of courses that follow this course.
Next Steps
SAS® Programming 1: Essentials is the entry point to most areas of the SAS curriculum.
[Diagram: SAS® Programming 1: Essentials at the center, connected to Applications Development
Curriculum, Accessing and Manipulating Data, Web Enablement Curriculum, Data Warehousing
Curriculum, Statistical Analysis Curriculum, and Presenting Your Information]
14.2 Beyond This Course 14-7
Next Steps
To learn more about this:                           Enroll in the following:
Creating text-based reports                         SAS® Report Writing 1: Using Procedures and ODS
Creating graphic reports with SAS/GRAPH software
Language (SQL)
...
Next Steps
In addition, there are prerecorded short technical discussions and demonstrations called e-lectures.
http://support.sas.com/training/
Appendix A Index
_
_ALL_ keyword, 8-28, 12-12
A
accessing SAS data library, 4-16–4-32
APPEND procedure, 10-8, 10-34
appending data sets, 10-7–10-16
   like-structured, 10-11
   unlike-structured, 10-12–10-16
arithmetic operators, 5-15–5-16, 9-6
assignment statements, 8-51–8-55, 9-4–9-5
B
bar charts, 13-3, 13-9–13-20
   with line plot overlay, 13-5
batch mode, 2-9
BETWEEN-AND operator, 5-18
BY statement, 10-47–10-50, 11-49
character variables, 4-6
charts
   bar, 13-3
   bar with line plot overlay, 13-5
   creating bar and pie charts, 13-9–13-20
   pie, 13-4
CLASS statement, 12-50
   MEANS procedure, 12-29
   specifying classification variables, 12-50
classification variables
   specifying, 12-50
cleaning data, 8-3–8-8
   programmatically, 8-50, 8-55–8-56, 8-63
   UNIX example, 8-47–8-48
   Windows example, 8-44–8-46
   z/OS (OS/390) example, 8-49
cleaning invalid data, 8-41–8-63
combining data sets, 10-3–10-6
comments
   in SAS code, 3-8
comparison operators, 5-14–5-15
COMPRESS option
   FREQ procedure, 12-11
concatenating data sets, 10-19–10-40
constant operand, 5-14
CONTAINS operator, 5-19
CONTENTS procedure, 4-4–4-5, 6-9–6-10
   browsing a data library, 4-23–4-32
   NODS option, 4-23
CROSSLIST option
   TABLES statement, 12-10
D
data
   browsing with the PRINT procedure, 4-10–4-12
   cleaning invalid, 8-41–8-63
   cleaning programmatically, 8-50
   compilation phase, 7-12–7-16
   errors, 8-4
   execution phase, 7-17–7-23
   missing values, 4-8
   nonstandard, 7-9, 7-30–7-31
   reading, 5-3–5-5
   reading nonstandard delimited, 7-30–7-43
   standard, 7-9–7-10, 7-30
   validating with the FREQ procedure, 8-25–8-29
   validating with the MEANS procedure, 8-33–8-35
   validating with the PRINT procedure, 8-20–8-25
   validating with the UNIVARIATE procedure, 8-35–8-37
   validity and cleaning, 8-3–8-8
data access
   data-driven tasks, 1-6
data analysis
   data-driven tasks, 1-6
data errors, 8-9–8-18
data library
   browsing, 4-22–4-32
   browsing in UNIX, 4-28–4-30
   browsing in Windows, 4-24–4-27
   browsing in z/OS (OS/390), 4-31–4-32
data management
   data-driven tasks, 1-6
data presentation
eliminating duplicates, 10-52
interleaving, 10-35–10-40
like-structured, 10-20–10-28
match-merging, 10-45–10-46
merging, 10-4
merging one-to-many, 10-53–10-63
merging one-to-one, 10-44–10-52
merging with non-matches, 10-66–10-88
names, 4-9
output, 12-56–12-59
outputting to multiple, 8-17–8-18, 10-85–10-86
renaming with RENAME= option, 10-31
terminology, 4-10
unlike-structured, 10-28
DATA statement, 5-9, 7-6
   DROP= option, 9-24–9-26
   KEEP= option, 9-24–9-26
DATA step, 5-6–5-11
   assignment statements, 9-4–9-5
   processing, 7-12
data-driven tasks
   data access, 1-6
   data analysis, 1-6
   data management, 1-6
   data presentation, 1-6
date functions, 9-9–9-10
date values, 4-7
delimiter
   default, 7-9
dimension expression, 12-53–12-54
DO groups, 9-35–9-37
documentation, 2-24
DROP statement, 5-23, 9-11
   processing, 9-14–9-24
   subsetting variables, 5-12–5-25
DROP= option
   DATA statement, 9-24–9-26
DSD option
   INFILE statement, 7-39–7-40
Excel worksheets
   creating a SAS data set, 6-3–6-16
   creating from SAS data sets, 6-20–6-35
   reading in Windows, 6-15–6-16
EXPORT procedure, 6-23
F
FILE command, 3-14, 3-18
   saving programs, 3-14, 3-18
FOOTNOTE statement
   SAS statements, 11-10–11-13
FORMAT option
   TABLES statement, 12-10
FORMAT procedure
   creating user-defined formats, 11-33–11-42
   VALUE statement, 11-33
FORMAT statement, 5-29–5-40
   SAS statements, 11-25
formats
   assigning permanent, 11-27–11-28
   assigning temporary, 11-26–11-28
   user-defined, 11-32–11-42
FREQ procedure, 8-7, 12-3–12-20
   COMPRESS option, 12-11
   NLEVELS option, 12-11
   PAGE option, 12-11
   TABLES statement, 12-4–12-6
   validating data, 8-25–8-29
frequency reports
   creating, 12-12
   one-way, 12-4
frequency tables
   producing and enhancing, 12-3–12-20
FSEDIT window
   using to clean data, 8-49
functions
   date, 9-9–9-10
   sum, 9-8
fundamentals of SAS statements, 3-3–3-9
G
G3D procedure
   SAS/GRAPH software, 13-5
GBARLINE procedure
   SAS/GRAPH software, 13-5
GCHART procedure
   SAS/GRAPH software, 13-3–13-4, 13-9–13-20
GCONTOUR procedure
   SAS/GRAPH software, 13-6
GMAP procedure
   SAS/GRAPH software, 13-6
GPLOT procedure
   SAS/GRAPH software, 13-4, 13-21–13-24
graph
   multiple on a page, 13-7
GREPLAY procedure
   SAS/GRAPH software, 13-7
Help facility, 2-24
I
IF-THEN DELETE statements, 9-50
IF-THEN statements, 8-57–8-63, 9-31
IF-THEN/ELSE DO statements, 9-35
IF-THEN/ELSE statements, 9-31, 9-34
IMPORT procedure, 6-29–6-30
Import wizard, 6-23–6-30
IN= data set option, 10-78–10-84
INCLUDE command, 2-17
INFILE statement, 7-6–7-7
   DSD option, 7-39–7-40
   MISSOVER option, 7-42
informats, 7-31–7-33
INPUT statement, 7-7
interleaving data sets, 10-35–10-40
invalid data
   cleaning, 8-41–8-63
IS MISSING operator, 5-19
IS NULL operator, 5-19
K
KEEP statement, 5-23, 9-11
   processing, 9-14–9-24
   subsetting variables, 5-12–5-25
KEEP= option
   DATA statement, 9-24–9-26
L
LABEL statement, 5-29–5-40, 11-21
labels
   assigning permanent, 11-24
   assigning temporary, 11-21–11-23
LENGTH statement, 7-24, 9-38
LIBNAME statement, 4-35–4-39
   assigning a libref, 4-19
   SAS/ACCESS, 6-7–6-8
libref
   assigning, 4-17, 4-19, 4-36
LIKE operator, 5-20–5-21
line plots, 13-4
list input, 7-8
LIST option
   TABLES statement, 12-10
M
maps
   GMAP procedure, 13-6
match-merging, 10-45–10-46
MEANS procedure, 8-7, 12-27–12-41
   CLASS statement, 12-29
   output data sets, 12-34–12-39
   OUTPUT statement, 12-35–12-39
   validating data, 8-33–8-35
   VAR statement, 12-28
MERGE statement, 10-49–10-50
merging data sets, 10-4
   one-to-many, 10-53–10-63
   one-to-one, 10-44–10-52
   with non-matches, 10-66–10-88
missing data values, 4-8
MISSOVER option
   INFILE statement, 7-42
N
names
   data set and variable, 4-9
NLEVELS option, 8-28–8-29
   FREQ procedure, 12-11
NOCOL option
   TABLES statement, 12-8
NOCUM option
   TABLES statement, 12-8
NOFREQ option
   TABLES statement, 12-8
noninteractive mode, 2-10
observations
   subsetting, 9-44–9-50
   subsetting and grouping, 11-46–11-51
ODS
   See Output Delivery System, 11-55
one-way frequency reports, 12-4
operands, 9-5
   constant, 5-14
   variable, 5-14
operators
   arithmetic, 5-15–5-16, 9-6
   BETWEEN-AND, 5-18
   comparison, 5-14–5-15
   CONTAINS, 5-19
   IS MISSING, 5-19
   IS NULL, 5-19
   LIKE, 5-20–5-21
   logical, 5-16–5-17
   WHERE, 5-18
OPTIONS
   SAS statements, 11-6
OUT= option
   OUTPUT statement, 12-17–12-19, 12-35–12-39
   TABLES statement, 12-14–12-17
   TABULATE procedure, 12-56–12-59
output
   directing to external files, 11-55–11-82
   enhancing with SAS/GRAPH software, 13-25–13-28
output data sets, 12-56–12-59
   MEANS procedure, 12-34–12-39
Output Delivery System, 11-55–11-58
   creating HTML, PDF, and RTF files, 11-64–11-67
   destinations used with Excel, 11-72–11-77
   HTML, PDF, and RTF destinations, 11-58–11-60
   STYLE= option, 11-68
OUTPUT statement
   MEANS procedure, 12-35–12-39
creating with the GPLOT procedure, 13-21–13-24
line, 13-4
scatter, 13-4
three-dimensional contour, 13-6
three-dimensional surface and scatter, 13-5
PRINT procedure, 4-10–4-12, 6-11, 8-6
   validating data, 8-20–8-25
PROC MEANS statement
   computing and displaying statistics, 12-31–12-34
Program Editor, 2-12, 2-17
Q
quotation marks
   balanced, 3-16–3-18
   unbalanced, 3-16
R
raw data files
   errors when reading, 8-9–8-18
reading data, 5-3–5-5
relational database
   accessing, 4-35–4-39
RENAME= option, 10-31
reports
   creating, 11-3–11-15
   enhancing, 11-5–11-15, 11-20–11-28
   tabular, 12-46–12-59
RUN-group processing
   SAS/GRAPH software, 13-8
S
SAS code
   comments, 3-8
SAS data library
   accessing, 4-16–4-32
   permanent, 4-18
   temporary, 4-18
SAS data sets, 4-3
   temporary, 4-21
SAS formats, 5-33–5-40
SAS functions, 8-52, 9-8
SAS Help facility, 2-24
SAS informats, 7-31–7-33
SAS names
   data set and variable, 4-9
SAS programs
   debugging, 3-12–3-14
   editor windows, 2-13
   examples, 2-4–2-5
   fundamentals, 3-3–3-9
   including, 2-17
   introduction, 2-3–2-9
   noninteractive mode, 2-10
   running in batch mode, 2-9
   saving, 3-14, 3-18
   step boundaries, 2-6–2-7
   submitting, 2-11–2-14
   submitting under Windows, 2-16–2-20, 2-16–2-20
starting, 2-16, 2-21, 2-25, 2-30
SAS statements
   BY, 11-49
   DROP, 9-11
   ELSE, 9-33, 9-40
   FOOTNOTE, 11-10–11-13
   FORMAT, 11-25
   global, 11-3–11-15
   IF-THEN, 9-31
   IF-THEN DELETE, 9-50
   IF-THEN/ELSE, 9-31, 9-34
   IF-THEN/ELSE DO, 9-35
   KEEP, 9-11
   LABEL, 11-21
   LENGTH statement, 9-38
   OPTIONS, 11-6
SAS/GRAPH software, 13-3–13-8
   enhancing output, 13-25–13-28
   G3D procedure, 13-5
   GBARLINE procedure, 13-5
   GCHART procedure, 13-3–13-4, 13-9–13-20
   GCONTOUR procedure, 13-6
   GREPLAY procedure, 13-7
   RUN-group processing, 13-8
saving programs, 3-14, 3-18
   FILE command, 3-14, 3-18
scatter plots, 13-4
SET statement, 5-9, 10-20, 10-34
SORT procedure, 10-46, 10-52
   BY statement, 10-47–10-48
specifying classification variables
   CLASS statement, 12-50
subsetting IF statement, 9-46
subsetting observations, 9-44–9-50
   WHERE statement, 5-12–5-25
subsetting variables
   DROP statement, 5-12–5-25
   KEEP statement, 5-12–5-25
SUM function, 9-8
SUMMARY procedure, 12-41
syntax
   diagnosing and correcting errors, 3-10–3-18
syntax rules, 3-5–3-7
T
TABLE statement, 12-49
NOCOL option, 12-8
NOCUM option, 12-8
NOFREQ option, 12-8
NOPERCENT option, 12-8
NOROW option, 12-8
OUT= option, 12-14–12-17
TABULATE procedure, 12-46–12-59
   OUT= option, 12-56–12-59
   statistics, 12-54–12-55
terminology
   SAS data sets, 4-10
three-dimensional contour plots, 13-6
three-dimensional surface and scatter plots, 13-5
TITLE statement, 11-10–11-13
unbalanced quotation marks, 3-16
UNIVARIATE procedure, 8-8
   validating data, 8-35–8-37
VAR statement, 12-51
   MEANS procedure, 12-28
variable names, 4-9
   assigning descriptive labels, 11-21–11-25
variable operand, 5-14
variables
   character, 4-6
   creating, 9-3–9-26
   writing to output data set, 5-23–5-24
Viewtable window
   cleaning data interactively, 8-42–8-43
W
WHERE operators, 5-18
WHERE statement, 7-36, 9-45, 11-46–11-48
   subsetting observations, 5-12–5-25
   validating data, 8-21–8-23