SAS® Programming 1: Essentials
Course Notes

SAS® Programming 1: Essentials Course Notes was developed by Michele Ensor and Susan Farmer.
Additional contributions were made by Michelle Buchecker, Christine Dillon, Marty Hultgren, Marya
Ilgen-Lieth, Mike Kalt, Natalie McGowan, Linda Mitterling, Georg Morsing, Dr. Sue Rakes, Warren
Repole, and Larry Stewart. Editing and production support was provided by the Curriculum Development
and Support Department.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product
names are trademarks of their respective companies.
SAS® Programming 1: Essentials Course Notes

Copyright © 2009 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States of
America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written
permission of the publisher, SAS Institute Inc.

Book code E1516, course code LWPRG1/PRG1, prepared date 18Jun2009. LWPRG1_002

ISBN 978-1-60764-194-0

For Your Information iii

Table of Contents

Course Description

Prerequisites

Chapter 1 Introduction
1.1 Course Logistics
1.2 An Overview of Foundation SAS
1.3 Chapter Review
1.4 Solutions
   Solutions to Chapter Review

Chapter 2 Getting Started with SAS
2.1 Introduction to SAS Programs
2.2 Submitting a SAS Program
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – Windows
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – UNIX
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – z/OS (OS/390)
   Demonstration: Submitting a SAS Program with SAS Enterprise Guide
   Exercises
2.3 Chapter Review
2.4 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

Chapter 3 Working with SAS Syntax
3.1 Mastering Fundamental Concepts
3.2 Diagnosing and Correcting Syntax Errors
   Demonstration: Diagnosing and Correcting Syntax Errors
   Demonstration: Diagnosing and Correcting Syntax Errors
   Exercises
3.3 Chapter Review
3.4 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

Chapter 4 Getting Familiar with SAS Data Sets
4.1 Examining Descriptor and Data Portions
   Exercises
4.2 Accessing SAS Data Libraries
   Demonstration: Accessing and Browsing SAS Data Libraries – Windows
   Demonstration: Accessing and Browsing SAS Data Libraries – UNIX
   Demonstration: Accessing and Browsing SAS Data Libraries – z/OS (OS/390)
   Exercises
4.3 Accessing Relational Databases (Self-Study)
4.4 Chapter Review
4.5 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

Chapter 5 Reading SAS Data Sets
5.1 Introduction to Reading Data
5.2 Using SAS Data as Input
5.3 Subsetting Observations and Variables
   Exercises
5.4 Adding Permanent Attributes
   Exercises
5.5 Chapter Review
5.6 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

Chapter 6 Reading Excel Worksheets
6.1 Using Excel Data as Input
   Demonstration: Reading Excel Worksheets – Windows
   Exercises
6.2 Doing More with Excel Worksheets (Self-Study)
   Exercises
6.3 Chapter Review
6.4 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

Chapter 7 Reading Delimited Raw Data Files
7.1 Using Standard Delimited Data as Input
   Exercises
7.2 Using Nonstandard Delimited Data as Input
   Exercises
7.3 Chapter Review
7.4 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

Chapter 8 Validating and Cleaning Data
8.1 Introduction to Validating and Cleaning Data
8.2 Examining Data Errors When Reading Raw Data Files
   Demonstration: Examining Data Errors
8.3 Validating Data with the PRINT and FREQ Procedures
   Exercises
8.4 Validating Data with the MEANS and UNIVARIATE Procedures
   Exercises
8.5 Cleaning Invalid Data
   Demonstration: Using the Viewtable Window to Clean Data – Windows (Self-Study)
   Demonstration: Using the Viewtable Window to Clean Data – UNIX (Self-Study)
   Demonstration: Using the FSEDIT Window to Clean Data – z/OS (OS/390) (Self-Study)
   Exercises
8.6 Chapter Review
8.7 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

Chapter 9 Manipulating Data
9.1 Creating Variables
   Exercises
9.2 Creating Variables Conditionally
   Exercises
9.3 Subsetting Observations
   Exercises
9.4 Chapter Review
9.5 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

Chapter 10 Combining SAS Data Sets
10.1 Introduction to Combining Data Sets
10.2 Appending a Data Set (Self-Study)
   Exercises
10.3 Concatenating Data Sets
   Exercises
10.4 Merging Data Sets One-to-One
10.5 Merging Data Sets One-to-Many
   Exercises
10.6 Merging Data Sets with Nonmatches
   Exercises
10.7 Chapter Review
10.8 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

Chapter 11 Enhancing Reports
11.1 Using Global Statements
   Exercises
11.2 Adding Labels and Formats
   Exercises
11.3 Creating User-Defined Formats
   Exercises
11.4 Subsetting and Grouping Observations
   Exercises
11.5 Directing Output to External Files
   Demonstration: Creating HTML, PDF, and RTF Files
   Demonstration: Creating Files That Open in Excel
   Demonstration: Using Options with the EXCELXP Destination (Self-Study)
   Exercises
11.6 Chapter Review
11.7 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

Chapter 12 Producing Summary Reports
12.1 Using the FREQ Procedure
   Exercises
12.2 Using the MEANS Procedure
   Exercises
12.3 Using the TABULATE Procedure (Self-Study)
   Exercises
12.4 Chapter Review
12.5 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

Chapter 13 Introduction to Graphics Using SAS/GRAPH (Self-Study)
13.1 Introduction
13.2 Creating Bar and Pie Charts
   Demonstration: Creating Bar and Pie Charts
13.3 Creating Plots
   Demonstration: Creating Plots
13.4 Enhancing Output
   Demonstration: Enhancing Output

Chapter 14 Learning More
14.1 SAS Resources
14.2 Beyond This Course

Appendix A Index

Course Description
This course is for users who want to learn how to write SAS programs. It is the entry point to learning
SAS programming and is a prerequisite to many other SAS courses. If you do not plan to write SAS
programs and you prefer a point-and-click interface, you should attend the SAS® Enterprise Guide® 1:
Querying and Reporting course.

To learn more…

A full curriculum of general and statistical instructor-based training is available
at any of the Institute’s training facilities. Institute instructors can also provide
on-site training.

For information on other courses in the curriculum, contact the SAS Education
Division at 1-800-333-7660, or send e-mail to training@sas.com. You can also
find this information on the Web at support.sas.com/training/ as well as in the
Training Course Catalog.

For a list of other SAS books that relate to the topics covered in this
Course Notes, USA customers can contact our SAS Publishing Department at
1-800-727-3228 or send e-mail to sasbook@sas.com. Customers outside the
USA, please contact your local SAS office.

Also, see the Publications Catalog on the Web at support.sas.com/pubs for a
complete list of books and a convenient order form.

Prerequisites
Before attending this course, you should have experience using computer software. Specifically, you
should be able to
• understand file structures and system commands on your operating systems
• access data files on your operating systems.
No prior SAS experience is needed. If you do not feel comfortable with the prerequisites or are new to
programming and think that the pace of this course might be too demanding, you can take the
Introduction to Programming Concepts Using SAS® Software course before attending this course.

Introduction to Programming Concepts Using SAS® Software is designed to introduce you to computer
programming and presents a portion of the SAS® Programming 1: Essentials material at a slower pace.

Chapter 1 Introduction

1.1 Course Logistics
1.2 An Overview of Foundation SAS
1.3 Chapter Review
1.4 Solutions
   Solutions to Chapter Review

1.1 Course Logistics

Objectives
• Explain the naming convention that is used for the course files.
• Compare the three levels of exercises that are used in the course.
• Describe at a high level how data is used and stored at Orion Star Sports & Outdoors.
• Navigate to the Help facility.

Filename Conventions

Course file names follow the pattern p104d01x:

p1   course ID
04   chapter #
d    type
01   item #
x    placeholder

Code   Type
a      Activity
d      Demo
e      Exercise
s      Solution

Sample file names: p104a01, p104a02, p104a02s, p104d01, p104d02, p104e01, p104e02, p104s01, p104s02

Example: The Programming 1 course ID is p1, so p104d01 = Programming 1 Chapter 4, Demo 1.
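
As a rough illustration of the naming convention, the following DATA step sketch splits a few of the sample file names into their parts. It is not one of the course files: the data set name work.decoded and the variable names (FileName, CourseID, Chapter, TypeCode, Type, Item) are hypothetical, and the optional trailing placeholder character is simply ignored.

```sas
/* Illustrative sketch only; data set and variable names are hypothetical. */
data work.decoded;
   length FileName $ 8 CourseID $ 2 Chapter 8 TypeCode $ 1 Type $ 8 Item 8;
   input FileName $;
   CourseID = substr(FileName, 1, 2);               /* positions 1-2: course ID  */
   Chapter  = input(substr(FileName, 3, 2), 2.);    /* positions 3-4: chapter #  */
   TypeCode = substr(FileName, 5, 1);               /* position 5: type code     */
   Item     = input(substr(FileName, 6, 2), 2.);    /* positions 6-7: item #     */
   /* Translate the one-letter code by using the Code/Type table above */
   select (TypeCode);
      when ('a') Type = 'Activity';
      when ('d') Type = 'Demo';
      when ('e') Type = 'Exercise';
      when ('s') Type = 'Solution';
      otherwise  Type = 'Unknown';
   end;
   datalines;
p104a01
p104d01
p104e02
p104s01
;
run;

proc print data=work.decoded noobs;
run;
```

Under these assumptions, the row for p104d01 should show course ID p1, chapter 4, type Demo, and item 1, which matches the example above.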

Three Levels of Exercises


Level 1   The exercise mimics an example presented in the section.

Level 2   Less information and guidance are provided in the exercise instructions.

Level 3   Only the task you are to perform or the results to be obtained are provided.
          Typically, you will need to use the Help facility.

You are not expected to complete all of the exercises in the time allotted. Choose the
exercise or exercises that are at the level you are most comfortable with.

Orion Star Sports & Outdoors

Orion Star Sports & Outdoors is a fictitious global sports and outdoors retailer with
traditional stores, an online store, and a large catalog business.

The corporate headquarters is located in the United States with offices and stores in
many countries throughout the world.

Orion Star has about 1,000 employees and 90,000 customers, processes approximately
150,000 orders annually, and purchases products from 64 suppliers.

Orion Star Data


As is the case with most organizations, Orion Star has
a large amount of data about its customers, suppliers,
products, and employees. Much of this information is
stored in transactional systems in various formats.

Using applications and processes such as SAS Data Integration Studio, this transactional
information was extracted, transformed, and loaded into a data warehouse.

Data marts were created to meet the needs of specific departments such as Marketing.

The SAS Help Facility
The Help facility can also be accessed from a Web browser at the following link:
http://support.sas.com/documentation/index.html

Setup for the Poll


Start your SAS session.
Open the Help facility.

1.01 Poll
Were you able to open the Help facility in your SAS session?
Yes
No

1.2 An Overview of Foundation SAS

Objectives
• Describe the structure and design of Foundation SAS.
• Describe the functionality of Foundation SAS.

What Is Foundation SAS?

Foundation SAS is a highly flexible and integrated software environment that can be used
in virtually any setting to access, manipulate, manage, store, analyze, and report on data.

Foundation SAS provides the following:
• a graphical user interface for administering SAS tasks
• a highly flexible and extensible programming language
• a rich library of prewritten, ready-to-use SAS procedures
• the flexibility to run on all major operating environments such as Windows, UNIX, and z/OS (OS/390)
• access to virtually any data source such as DB2, Oracle, SYBASE, Teradata, SAP, and Microsoft Excel
• support for the most widely used character encodings for globalization

At the core of Foundation SAS is Base SAS software.

Components of Foundation SAS (diagram: Base SAS at the center, surrounded by the following):
• Reporting and Graphics
• Data Access and Management
• User Interfaces
• Analytics
• Visualization and Discovery
• Business Solutions
• Application Development
• Web Enablement

Base SAS capabilities can be extended with additional components.

1.02 Poll
Are you currently using SAS?
Yes
No


1.3 Chapter Review

Chapter Review
1. How can you open the Help facility?

2. What is at the core of Foundation SAS?


1.4 Solutions

Solutions to Chapter Review

Chapter Review Answers

1. How can you open the Help facility?
   The Help facility can be accessed from a Web browser at
   http://support.sas.com/documentation/index.html.
   The Help facility can be opened within your SAS session from
   Help ⇒ SAS Help and Documentation.

2. What is at the core of Foundation SAS?
   Base SAS is at the core of Foundation SAS.

Chapter 2 Getting Started with SAS

2.1 Introduction to SAS Programs
2.2 Submitting a SAS Program
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – Windows
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – UNIX
   Demonstration: Submitting a SAS Program with SAS Windowing Environment – z/OS (OS/390)
   Demonstration: Submitting a SAS Program with SAS Enterprise Guide
   Exercises
2.3 Chapter Review
2.4 Solutions
   Solutions to Exercises
   Solutions to Student Activities (Polls/Quizzes)
   Solutions to Chapter Review

2.1 Introduction to SAS Programs

Objectives
List the components of a SAS program.
State the modes in which you can run a SAS program.

SAS Programs
A SAS program is a sequence of steps that the user submits for execution.

Raw Data -> DATA Step -> SAS Data Set -> PROC Step -> Report

DATA steps are typically used to create SAS data sets.
PROC steps are typically used to process SAS data sets (that is, generate reports
and graphs, manage data, and sort data).

2.01 Quiz
How many steps are in this program?
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;

p102d01

SAS Program Example
This DATA step creates a temporary SAS data set named
Work.NewSalesEmps by reading four fields from a raw data file.

data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

The raw data filename specified in the INFILE statement needs to be specific to your operating
environment.
Examples of raw data filenames:

Windows s:\workshop\newemps.csv

UNIX /users/userid/newemps.csv

z/OS (OS/390) userid.workshop.rawdata(newemps)
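
As a minimal sketch, the DATA step from the example could reference the raw data file by its full Windows path from the list above. The path is only the course example and is assumed to exist; substitute the location used at your site.

```sas
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   /* Windows example path from the list above; adjust for your site */
   infile 's:\workshop\newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;
```

On UNIX or z/OS, only the quoted filename in the INFILE statement would change to the corresponding form shown above.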



SAS Program Example


This PROC PRINT step creates a listing report
of the Work.NewSalesEmps data set.

proc print data=work.NewSalesEmps;
run;

SAS Program Example

t r i S
This PROC MEANS step creates a summary report of the

r o dis SA
Work.NewSalesEmps data set with statistics for the
variable Salary for each value of Job_Title.

S P o r o f
data work.NewSalesEmps;
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;

A t f e
infile 'newemps.csv' dlm=',';
input First_Name $ Last_Name $

d
S No tsi
Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;


run;

ou
proc means data=work.NewSalesEmps;
class Job_Title;
var Salary;
run;
10
2-6 Chapter 2 Getting Started with SAS

Step Boundaries
SAS steps begin with either of the following:
• a DATA statement
• a PROC statement

SAS detects the end of a step when it encounters one of the following:
• a RUN statement (for most steps)
• a QUIT statement (for some procedures)
• the beginning of another step (DATA statement or PROC statement)

A SAS program executed in batch or noninteractive mode might not require any RUN statements
to execute successfully. However, this practice is not recommended.
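
The three kinds of step boundary can be sketched in one short program. This is an illustration only: the data set work.demo and its variable x are hypothetical, and PROC DATASETS is used here simply as one example of a procedure that ends with a QUIT statement.

```sas
/* DATA step: ends at the RUN statement */
data work.demo;            /* work.demo is a hypothetical data set */
   x = 1;
run;

/* PROC step: ends at the RUN statement */
proc print data=work.demo;
run;

/* PROC DATASETS stays active after RUN, so QUIT ends the step */
proc datasets library=work nolist;
   contents data=demo;
run;
quit;
```

Ending every step explicitly, rather than relying on the start of the next step, keeps the log messages for each step together with the step that produced them.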

Step Boundaries
SAS detects the end of the DATA step when it encounters
the RUN statement.

data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;

SAS detects the end of the PROC PRINT step when it
encounters the beginning of the PROC MEANS step.

2.02 Quiz
How does SAS detect the end of the PROC MEANS step?
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;

Step Boundaries
SAS detects the end of the PROC MEANS step when it
encounters the RUN statement.

data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;

Running a SAS Program


You can invoke SAS in the following ways:
• interactive mode (for example, SAS windowing
  environment and SAS Enterprise Guide)
• batch mode
• noninteractive mode

SAS Windowing Environment
In the SAS windowing environment, windows are used to edit and execute
programming statements, display the log, output, and Help facility, and more.


SAS Enterprise Guide

SAS Enterprise Guide provides a menu-driven system for creating
and running your SAS programs.

Batch Mode
Batch mode is a method of running SAS programs in
which you prepare a file that contains SAS statements
plus any necessary operating system control statements
and submit the file to the operating system.

Partial z/OS (OS/390) Example:

//jobname JOB accounting info,name …
// EXEC SAS
//SYSIN DD *
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile '.workshop.rawdata(newemps)' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

Appropriate JCL is placed before SAS statements.


Noninteractive Mode
In noninteractive mode, SAS program statements are
stored in an external file and are executed immediately
after you issue a SAS command referencing the file.

Directory-based Example:

SAS filename

z/OS (OS/390) Example:

SAS INPUT(filename)

The command for invoking SAS at your site might be different from the default shown
above. Ask your SAS administrator for the command to invoke SAS at your site.

2.03 Multiple Answer Poll

Which mode(s) will you use for running SAS programs?
a. SAS windowing environment
b. SAS Enterprise Guide
c. batch mode
d. noninteractive mode
e. other
f. unknown

2.2 Submitting a SAS Program

Objectives
Include a SAS program in your session.
Submit a program and browse the results.
Navigate the SAS windowing environment.
Navigate SAS Enterprise Guide.

SAS Windowing Environment
The SAS windowing environment is used to write, include, and submit SAS
programs and examine SAS results.

Three Primary Windows


In the SAS windowing environment, you submit and view
the results of a SAS program using three primary windows.

Editor window: contains the SAS program to submit.
Log window: contains information about the processing of the SAS program,
including any warning and error messages.
Output window: contains reports generated by the SAS program.

Editor Windows

Enhanced Editor
• Only available in the Windows operating environment
• Default editor for Windows operating environment
• Multiple instances of the editor can be open at one time
• Code does not disappear after it is submitted
• Incorporates color-coding as you type

Program Editor
• Available in all operating environments
• Default editor for all operating environments except Windows
• Only one instance of the editor can be open at one time
• Code disappears after it is submitted
• Incorporates color-coding after you press ENTER

Log Window
Partial SAS Log

33   data work.NewSalesEmps;
34      length First_Name $ 12 Last_Name $ 18
35             Job_Title $ 25;
36      infile 'newemps.csv' dlm=',';
37      input First_Name $ Last_Name $
38            Job_Title $ Salary;
39   run;

NOTE: The infile 'newemps.csv' is:
      File Name=S:\Workshop\newemps.csv,
      RECFM=V,LRECL=256

NOTE: 71 records were read from the infile 'newemps.csv'.
      The minimum record length was 28.
      The maximum record length was 47.
NOTE: The data set WORK.NEWSALESEMPS has 71 observations and 4 variables.

40
41   proc print data=work.NewSalesEmps;
42   run;

NOTE: There were 71 observations read from the data set WORK.NEWSALESEMPS.


Output Window
Partial PROC PRINT Output
Obs    First_Name    Last_Name     Job_Title         Salary

  1    Satyakam      Denny         Sales Rep. II      26780
  2    Monica        Kletschkus    Sales Rep. IV      30890
  3    Kevin         Lyon          Sales Rep. I       26955
  4    Petrea        Soltau        Sales Rep. II      27440
  5    Marina        Iyengar       Sales Rep. III     29715
  6    Shani         Duckett       Sales Rep. I       25795
  7    Fang          Wilson        Sales Rep. II      26810
  8    Michael       Minas         Sales Rep. I       26970
  9    Amanda        Liebman       Sales Rep. II      27465
 10    Vincent       Eastley       Sales Rep. III     29695
 11    Viney         Barbis        Sales Rep. III     30265
 12    Skev          Rusli         Sales Rep. II      26580
 13    Narelle       James         Sales Rep. III     29990
 14    Gerry         Snellings     Sales Rep. I       26445
 15    Leonid        Karavdic      Sales Rep. II      27860

Output Window
PROC MEANS Output
                              The MEANS Procedure

                           Analysis Variable : Salary

                      N
Job_Title           Obs     N        Mean        Std Dev      Minimum      Maximum
----------------------------------------------------------------------------------
Sales Rep. I         21    21    26418.81    713.1898498     25275.00     27475.00
Sales Rep. II         9     9    26902.22    592.9487283     26080.00     27860.00
Sales Rep. III       11    11    29345.91    989.4311956     28025.00     30785.00
Sales Rep. IV         6     6    31215.00    545.4997709     30305.00     31865.00
Temp. Sales Rep.     24    24    26265.83    732.6480659     25020.00     27480.00
----------------------------------------------------------------------------------

2.04 Multiple Answer Poll


Which operating environment(s) will you use with SAS?
a. Windows
b. UNIX
c. z/OS (OS/390)
d. other
e. unknown

Submitting a SAS Program with SAS Windowing Environment – Windows
p102d01

• Start a SAS session.
• Include and submit a SAS program.
• Examine the results.
• Use the Help facility.

Starting a SAS Session

1. Double-click the SAS icon to start your SAS session.

The method that you use to invoke SAS varies by your operating environment and any
customizations in effect at your site.

Including and Submitting a SAS Program

1. To open a SAS program into your SAS session, select File > Open Program or click the
corresponding toolbar icon, and then select the file that you want to include. To open a
program, your Enhanced Editor must be active.
You can also issue the INCLUDE command to open (include) a program into your SAS session.
a. With the Enhanced Editor active, on the command bar type include and the name of the file
containing the program.
b. Press ENTER.

The program is included in the Enhanced Editor.

You can use the Enhanced Editor to do the following:
• access and edit existing SAS programs
• write new SAS programs
• submit SAS programs
• save SAS programs to a file
In the Enhanced Editor, the syntax in your program is color-coded to show these items:
• step boundaries
• keywords
• variable and data set names

2. To submit the program for execution, issue the SUBMIT command, click the corresponding
toolbar icon, or select Run > Submit. The output from the program is displayed in the Output window.

Examining the Results

The Output window
• is one of the primary windows and is open by default
• becomes the active window each time that it receives output
• automatically accumulates output in the order in which it is generated.
You can issue the CLEAR command or select Edit > Clear All to clear the contents of the window, or
you can click the NEW icon.

To scroll horizontally in the Output window, use the horizontal scroll bar or issue the RIGHT
and LEFT commands.

In the Windows environment, the Output window displays the last page of output generated by the
submitted program.

To scroll vertically in the Output window, use the vertical scroll bar, issue the FORWARD and
BACKWARD commands, or use the PAGE UP or PAGE DOWN keys on the keyboard.

You also can use the TOP and BOTTOM commands to scroll vertically in the Output window.

1. Scroll to the top to view the output from the PRINT procedure.

2. To open the Log window and browse the messages that the program generated, issue the
LOG command, select Window > Log, or click on the log.
The Log window
• is one of the primary windows and is open by default
• acts as an audit trail of your SAS session; messages are written to the log in the order in which
they are generated by the program.
3. To clear the contents of the window, issue the CLEAR command, select Edit > Clear All, or
click the NEW icon.

The Log window contains the programming statements that are submitted, as well as notes about the
following:
• any files that were read
• the records that were read
• the program execution and results
In this example, the Log window contains no warning or error messages. If the program contains
errors, relevant warning and error messages are also written to the SAS log.

Using the Help Facility

1. To open the Help facility, select Help > SAS Help and Documentation or click the
corresponding toolbar icon.

2. Select the Contents tab.

3. From the Contents tab, select SAS Products > Base SAS.

The primary Base SAS syntax books are the Base SAS 9.2 Procedures Guide and SAS 9.2 Language
Reference: Dictionary. The SAS 9.2 Language Reference: Concepts and Step-by-Step Programming
with Base SAS Software are recommended to learn SAS concepts.

4. For example, select Base SAS 9.2 Procedures Guide > Procedures > The PRINT Procedure to
find the documentation for the PRINT procedure.

Submitting a SAS Program with SAS Windowing Environment – UNIX
p102d01

• Start a SAS session.
• Include and submit a SAS program.
• Examine the results.
• Use the Help facility.

Starting a SAS Session

1. In your UNIX session, type the appropriate command to start a SAS session.

The method that you use to invoke SAS varies by your operating environment and any
customizations in effect at your site.

Including and Submitting a SAS Program

1. To open a SAS program into your SAS session, select File > Open or click the corresponding
toolbar icon, and then select the file that you want to include. To open a program, your
Program Editor must be active.
You can also issue the INCLUDE command to open (include) a SAS program into your SAS session.
a. With the Program Editor active, on the command bar type include and the name of the file
containing the program.
b. Press ENTER.

The program is included in the Program Editor window.

You can use the Program Editor window to do the following:
• access and edit existing SAS programs
• write new SAS programs
• submit SAS programs
• save SAS programs to a file
Within the Program Editor, the syntax in your program is color-coded to show these items:
• step boundaries
• keywords
• variable and data set names

2. To submit the program for execution, issue the SUBMIT command, click the corresponding
toolbar icon, or select Run > Submit. The output from the program is displayed in the Output window.

Examining the Results

The Output window
• is one of the primary windows and is open by default
• becomes the active window each time that it receives output
• automatically accumulates output in the order in which it is generated.

To clear the contents of the window, issue the CLEAR command, select Edit > Clear All, or click
the corresponding toolbar icon.

To scroll horizontally in the Output window, use the horizontal scroll bar or issue the RIGHT and LEFT
commands.

To scroll vertically within the Output window, use the vertical scroll bar or issue the FORWARD
and BACKWARD commands.

You also can use the TOP and BOTTOM commands to scroll vertically in the Output window.

1. Scroll to the top to view the output from the PRINT procedure.

2. To open the Log window and browse the messages that the program generated, issue the
LOG command or select View > Log.
The Log window
• is one of the primary windows and is open by default
• acts as a record of your SAS session; messages are written to the log in the order in which
they are generated by the program.
3. To clear the contents of the window, issue the CLEAR command, select Edit > Clear All,
or click the corresponding toolbar icon.

The Log window contains the programming statements that were most recently submitted, as
well as notes about the following:
• any files that were read
• the records that were read
• the program execution and results

In this example, the Log window contains no warning or error messages. If your program contains
errors, relevant warning and error messages are also written to the SAS log.

4. Issue the END command or select View > Program Editor to return to the Program Editor window.

Using the Help Facility

1. To open the Help facility, select Help > SAS Help and Documentation or click the
corresponding toolbar icon.

2. Select the Contents tab.

3. From the Contents tab, select SAS Products > Base SAS.

The primary Base SAS syntax books are the Base SAS 9.2 Procedures Guide and SAS 9.2 Language
Reference: Dictionary. The SAS 9.2 Language Reference: Concepts and Step-by-Step Programming
with Base SAS Software are recommended to learn SAS concepts.

4. For example, select Base SAS 9.2 Procedures Guide > Procedures > The PRINT Procedure to
find the documentation for the PRINT procedure.

The Help facility can also be accessed from a Web browser at the following link:
http://support.sas.com/documentation/index.html

From this Web page, Base SAS can be selected. The SAS syntax books are available in
HTML or PDF version.

Submitting a SAS Program with SAS Windowing Environment – z/OS (OS/390)
.workshop.sascode(p102d01)

• Start a SAS session.
• Include and submit a SAS program.
• Examine the results.
• Use the Help facility.

Starting a SAS Session

1. Type the appropriate command to start your SAS session.

The method that you use to invoke SAS varies by your operating environment and any
customizations in effect at your site.

Including and Submitting a SAS Program

1. To include (copy) a SAS program into your SAS session, issue the INCLUDE command.
a. Type include and the name of the file that contains your program on the command line
of the Program Editor.
b. Press ENTER.

The program is included in the Program Editor.

You can use the Program Editor to do the following:
• access and edit existing SAS programs
• write new SAS programs
• submit SAS programs
• save programming statements in a file
The program contains three steps: a DATA step and two PROC steps.
Issue the SUBMIT command to execute your program.

2. The first page of the output from your program is displayed in the Output window.

Examining the Results

The Output window
• is one of the primary windows and is open by default
• becomes the active window each time that it receives output
• automatically accumulates output in the order in which it is generated.
You can issue the CLEAR command or select Edit > Clear All to clear the contents of the window.

To scroll horizontally in the Output window, issue the RIGHT and LEFT commands.

To scroll vertically in the Output window, issue the FORWARD and BACKWARD commands.

You also can use the TOP and BOTTOM commands to scroll vertically within the Output
window.

1. Issue the END command. If the PRINT procedure produces more than one page of output, you
are taken to the last page of output. If the PRINT procedure produces only one page of output,
the END command enables the MEANS procedure to execute and produce its output.

You can issue an AUTOSCROLL 0 command on the command line of the Output window to
have all of your SAS output from one submission placed in the Output window at one time.
This eliminates the need to issue an END command to run each step separately.

The AUTOSCROLL command is in effect for the duration of your SAS session. If you want
this every time that you invoke SAS, you can save this setting by typing autoscroll 0;
wsave on the command line of the Output window.

2. Issue the END command to return to the Program Editor.
After the program executes, you can view messages in the Log window.

The Log window
• is one of the primary windows and is open by default.
• acts as a record of your SAS session; messages are written to the log in the order in which they are
generated by the program.
You can issue the CLEAR command to clear the contents of the window.
The Log window contains the programming statements that were recently submitted, as well as notes
about the following:
• any files that were read
• the records that were read
• the program execution and results
In this example, the Log window contains no warning or error messages. If your program contains
errors, relevant warning and error messages are also written to the SAS log.

Issue the END command to return to the Program Editor.

Using the Help Facility

1. To open the Help facility, select Help > SAS Help and Documentation or click the
corresponding toolbar icon.
2. Select the Contents tab.
3. From the Contents tab, select SAS Products > Base SAS.

The primary Base SAS syntax books are the Base SAS 9.2 Procedures Guide and SAS 9.2 Language
Reference: Dictionary. The SAS 9.2 Language Reference: Concepts and Step-by-Step Programming
with Base SAS Software are recommended to learn SAS concepts.

4. For example, select Base SAS 9.2 Procedures Guide > Procedures > The PRINT Procedure to
find the documentation for the PRINT procedure.

The Help facility can also be accessed from a Web browser at the following link:
http://support.sas.com/documentation/index.html

From this Web page, Base SAS can be selected. The SAS syntax books are available in
HTML or PDF version.

Submitting a SAS Program with SAS Enterprise Guide
p102d01

• Start SAS Enterprise Guide.
• Include and submit a SAS program.
• Examine the results.
• Manage a project (optional).
• Use the Help facility.

Starting SAS Enterprise Guide

1. Double-click the Enterprise Guide icon to start your SAS session.

The method that you use to invoke Enterprise Guide varies by any customizations in effect at
your site. This demo is based on Enterprise Guide 4.2.

2. Close the Welcome to SAS Enterprise Guide window by selecting the close icon.


Including and Submitting a SAS Program

1. To open a SAS program into Enterprise Guide, select File > Open > Program or select the
corresponding toolbar icon and then Program, and then select the file that you want to include.
The program is included in the Program tab of the workspace area.

You can use the Program tab to do the following:
• access and edit existing SAS programs
• write new SAS programs
• submit SAS programs
• save SAS programs to a file
In the Program tab, the syntax in your program is color-coded to show these items:
• step boundaries
• keywords
• variable and data set names

3. Modify the INFILE statement in the Program tab to include the path location of the CSV file.

4. To submit the program for execution, select Program > Run On Local, click the corresponding
toolbar icon in the Program tab, or press the F8 key. The output from the program is displayed
in the Results tab of the workspace area.

Examining the Results

The Results tab
• displays the output of the code that you run in SAS Enterprise Guide
• becomes the active tab each time that it receives output.

To scroll vertically in the Results tab, use the vertical scroll bar or use the PAGE UP or PAGE DOWN
keys on the keyboard.

By default, the result format is set to SAS Report (an XML file specific to SAS) in SAS Enterprise Guide
4.2.
1. To change the result format, select Tools > Options > Results > Results General.
2. Select the desired result formats such as HTML and Text Output and select OK.
3. Resubmit the program.

4. Select Yes to replace the results from the previous run.
5. View the multiple Results tabs to view the different result formats.

The Log tab contains the statements specific to SAS Enterprise Guide and the programming statements
that are submitted as well as notes about the following:
• any files that were read
• the records that were read
• the program execution and results.
In this example, the Log tab contains no warning or error messages. If the program contains errors,
relevant warning and error messages are also written to the SAS log.

To scroll horizontally in the Log tab, use the horizontal scroll bar.
To scroll vertically in the Log tab, use the vertical scroll bar or use the PAGE UP or PAGE DOWN keys
on the keyboard.

Managing a Project (Optional)

The Project Tree window displays the active project and its associated programs. SAS Enterprise Guide
uses projects to manage each collection of related data, tasks, code, and results.
Multiple programs can be added to one project.
1. To add a new program to the existing project, select New Program.
2. Enter a PROC FREQ step on the Program tab.
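
The step entered here is not reproduced in these notes. As a hedged sketch, a PROC FREQ step against the data set created earlier might look like the following; the choice of Job_Title as the table variable is an assumption made for illustration.

```sas
/* One-way frequency table of job titles (illustrative choice of variable) */
proc freq data=work.NewSalesEmps;
   tables Job_Title;
run;
```

This step assumes that Work.NewSalesEmps was already created by running the DATA step from p102d01 in the same session.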


3. Submit the program and review the results.

4. To save the program, right-click on Program in the Project Tree and select Save Program As….
Then, supply a location and name for the program.

5. To save the project, select File > Save Project As…. Then, supply a location and name for the
project.
6. To maneuver between programs, double-click the desired program in the Project Tree.


7. To delete a program, right-click on the program in the Project Tree and select Delete.

You will need to select Yes to delete all the items associated with the program.


Using the Help Facility

1. To open the Help facility for SAS Enterprise Guide, select Help > SAS Enterprise Guide Help.
2. To open the Help facility for SAS Syntax, select Help > SAS Syntax Help.
3. Select the Contents tab.
4. From the Contents tab, select SAS Products > Base SAS.
5. For example, select Base SAS 9.2 Procedures Guide > Procedures > The PRINT Procedure to
find the documentation for the PRINT procedure.

Exercises

Level 1

1. Submitting a Program and Using the Help Facility

a. With the appropriate Editor window active, include a SAS program.

Windows          Select File ⇒ Open Program and select the p102e01.sas program.

UNIX             Select File ⇒ Open and select the p102e01.sas program.

z/OS (OS/390)    Issue the command: include '.workshop.sascode(p102e01)'.

b. Submit the program for execution. Based on the report in the Output window, how many rows
and columns are in the report?

rows: __________ columns: __________

c. Examine the Log window. Based on the log notes, how many observations and variables are in the
Work.country data set?

observations: __________ variables: __________

d. Clear the Log and Output windows.

e. Use the Help facility to find documentation about the LINESIZE= option.

Go to the Contents tab in SAS Help and Documentation. Select SAS Products ⇒
Base SAS ⇒ SAS 9.2 Language Reference: Dictionary ⇒
Dictionary of Language Elements ⇒ SAS System Options ⇒ LINESIZE= System Option.

What is an alias for the LINESIZE= system option? ____________________

Level 2

2. Identifying SAS Components


a. With the appropriate Editor window active, type the following SAS program:
proc setinit;
run;
b. Submit the program for execution, and then look at the results in the Log window.

The SETINIT procedure produces a list of the SAS components licensed at a given site.

c. If you see SAS/GRAPH in the list of components in the log, include a SAS program.

Windows          Select File ⇒ Open Program and select the p102e02.sas program.

UNIX             Select File ⇒ Open and select the p102e02.sas program.

z/OS (OS/390)    Issue the command: include '.workshop.sascode(p102e02)'.

d. Submit the program for execution. View the results in the GRAPH window.
e. Close the GRAPH window.
3. Setting Up Function Keys
a. Issue the KEYS command or select Tools ⇒ Options ⇒ Keys to open the KEYS window.

The KEYS window is a secondary window used to browse or change function key
definitions.

b. Add the following commands to the F12 key:

clear log; clear output

c. Close the KEYS window.

d. Press the F12 key and confirm that the Log and Output windows are cleared.

Level 3

4. Exploring Your SAS Environment – Windows

a. Customize the appearance and functionality of the Enhanced Editor by selecting Tools ⇒
Options ⇒ Enhanced Editor. For example, select the Appearance tab to modify the
font size.

b. In the Help facility, look up the documentation for the Enhanced Editor.
From the Contents tab, select Using SAS Software in Your Operating Environment ⇒
SAS 9.2 Companion for Windows ⇒ Running SAS under Windows ⇒
Using the SAS Editors ⇒ Using the Enhanced Editor.
5. Exploring Your SAS Environment – UNIX and z/OS (OS/390)
a. From a Web browser, access the following link: http://support.sas.com/documentation/.
b. Select Base SAS.
c. Select the HTML version of Step-by-Step Programming with Base SAS Software.
d. On the Contents tab, select Understanding Your SAS Environment ⇒
Using the SAS Windowing Environment ⇒ Working with SAS Programs.
e. Refer to Command Line Commands and the Editor and Line Commands and the Editor.

2.3 Chapter Review

Chapter Review
1. What are the two components of a SAS program?

2. In which modes can you run a SAS program?

3. How can you include a program in the SAS windowing
environment?

4. How can you submit a program in the SAS windowing
environment?

5. What are the three primary windows in the SAS
windowing environment?

2.4 Solutions

Solutions to Exercises
1. Submitting a Program and Using the Help Facility
a. Include a SAS program.

options linesize=95 pagesize=52;

data work.country;
   infile 'country.dat' dlm='!';
   length Country_Code $ 2 Country_Name $ 48;
   input Country_Code $ Country_Name $;
run;

proc print data=work.country;
run;

b. Submit the program.

rows: 238 columns: 3

c. Examine the Log window.

observations: 238 variables: 2

d. Clear the Log and Output windows.

e. Use the Help facility.

What is an alias for the LINESIZE= system option? LS=
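
As a small sketch of how the alias is used, the two OPTIONS statements below should set the same values. PS= as the alias for PAGESIZE= and the PROC OPTIONS check are not part of the exercise; they are included here only as an illustration.

```sas
/* Long form, as in the exercise program. */
options linesize=95 pagesize=52;

/* Same settings written with aliases: LS= for LINESIZE=, PS= for PAGESIZE=. */
options ls=95 ps=52;

/* Write the current LINESIZE= value to the log to confirm the setting. */
proc options option=linesize;
run;
```

The long form is easier to read in shared programs; the alias is mostly a typing convenience.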

2. Identifying SAS Components
a. Type the following SAS program:
proc setinit;
run;
b. Submit the program.
Partial SAS Log
Product expiration dates:
---Base Product 31DEC2008
---SAS/STAT 31DEC2008
---SAS/GRAPH 31DEC2008
---SAS/ETS 31DEC2008

c. Include a SAS program.


data work.SalesEmps;
   length Job_Title $ 25;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $;
run;

goptions reset=all;
proc gchart data=work.SalesEmps;
   vbar3d Job_Title / sumvar=Salary type=mean;
   hbar Job_Title / group=Gender sumvar=Salary
                    patternid=midpoint;
   pie3d Job_Title / noheading;
   where Job_Title contains 'Sales Rep';
   label Job_Title='Job Title';
   format Salary dollar12.;
   title 'Orion Star Sales Employees';
run;
quit;

d. Submit the program.

e. Close the GRAPH window.

3. Setting Up Function Keys

a. Issue the KEYS command.

b. Add a command to the F12 key.

Keys Window (Windows):

c. Close the KEYS window.

d. Press the F12 key.
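
The same two commands can also be issued from program code. The sketch below assumes an interactive SAS windowing session and uses the DM statement, which is not covered in this exercise; treat it as an illustration rather than part of the solution.

```sas
/* DM passes a windowing-environment command to SAS from program code. */
/* These are the same commands that the exercise assigns to the F12 key. */
dm 'clear log';
dm 'clear output';
```

A function key is more convenient while you work interactively; the DM form can be placed at the top of a program so that each submission starts with empty windows.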

4. Exploring Your SAS Environment – Windows


a. Customize the appearance and functionality of the Enhanced Editor.
b. Use the Help facility.

5. Exploring Your SAS Environment – UNIX and z/OS (OS/390)

a. From a Web browser, access the following link: http://support.sas.com/documentation/.

b. Select Base SAS.

c. Select the HTML version of Step-by-Step Programming with Base SAS Software.

d. On the Contents tab, select Understanding Your SAS Environment ⇒
Using the SAS Windowing Environment ⇒ Working with SAS Programs.

e. Refer to Command Line Commands and the Editor and Line Commands and the Editor.

Partial Documentation

Solutions to Student Activities (Polls/Quizzes)

2.01 Quiz – Correct Answer


How many steps are in this program?
DATA step:
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

PROC step:
proc print data=work.NewSalesEmps;
run;

PROC step:
proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;

3 steps

p102d01

2.02 Quiz – Correct Answer

How does SAS detect the end of the PROC MEANS step?

data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;

SAS does not detect the end of the PROC MEANS step.
SAS needs a RUN statement to detect the end.
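
The quiz answer can be sketched with a small self-contained program. The data set name work.BoundaryDemo and its three rows are hypothetical; they are read from in-stream data so that the example does not depend on newemps.csv.

```sas
/* Small in-stream data set (hypothetical values). */
data work.BoundaryDemo;
   input Job_Title $ Salary;
   datalines;
Rep 26500
Rep 28100
Mgr 54000
;
run;

/* No RUN statement here: the PROC MEANS statement that follows */
/* marks the end of the PROC PRINT step.                        */
proc print data=work.BoundaryDemo;

/* Last step in the program: nothing follows it, so the RUN     */
/* statement is what tells SAS that the step is complete.       */
proc means data=work.BoundaryDemo;
   class Job_Title;
   var Salary;
run;
```

Ending every step with an explicit RUN statement avoids relying on the next step to close the previous one.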

Solutions to Chapter Review

Chapter Review Answers


1. What are the two components of a SAS program?
DATA step and PROC step

2. In which modes can you run a SAS program?
Batch, noninteractive, and interactive modes

3. How can you include a program in the SAS windowing
environment?
INCLUDE command, File ⇒ Open, or [toolbar icon]

4. How can you submit a program in the SAS windowing
environment?
SUBMIT command, Run ⇒ Submit, or [toolbar icon]

5. What are the three primary windows in the SAS
windowing environment?
LOG, OUTPUT, and EDITOR windows
Chapter 3 Working with SAS Syntax

3.1 Mastering Fundamental Concepts

3.2 Diagnosing and Correcting Syntax Errors

    Demonstration: Diagnosing and Correcting Syntax Errors

    Demonstration: Diagnosing and Correcting Syntax Errors

    Exercises

3.3 Chapter Review

3.4 Solutions

    Solutions to Exercises

    Solutions to Student Activities (Polls/Quizzes)

    Solutions to Chapter Review


3.1 Mastering Fundamental Concepts

Objectives
Identify the characteristics of SAS statements.
Explain SAS syntax rules.
Insert SAS comments using two methods.

SAS Programs

A SAS program is a sequence of steps.

DATA step:
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

PROC step:
proc print data=work.NewSalesEmps;
run;

PROC step:
proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;

A step is a sequence of SAS statements.



Statements
SAS statements have these characteristics:
usually begin with an identifying keyword
always end with a semicolon
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;

p103d01

3.01 Quiz

How many statements are in the DATA step?
a. 1
b. 3
c. 5
d. 7

data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;


SAS Syntax Rules


Structured, consistent spacing makes a SAS program
easier to read.

Conventional Formatting

data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;

SAS programming statements are easier to read if you begin DATA, PROC, and RUN statements
in column one and indent the other statements.

SAS Syntax Rules

SAS statements are free-format.
• One or more blanks or special characters can
  be used to separate words.
• They can begin and end in any column.
• A single statement can span multiple lines.
• Several statements can be on the same line.

Unconventional Formatting

data work.NewSalesEmps;
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;
infile 'newemps.csv' dlm=',';
input First_Name $ Last_Name $
Job_Title $ Salary;
run;
proc print data=work.NewSalesEmps; run;
proc means data =work.NewSalesEmps;
class Job_Title; var Salary;run;
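
As a hedged illustration of the free-format rules, the two PROC PRINT steps below contain the same three statements laid out in two ways, so SAS should treat them identically. The sketch assumes that the SASHELP.CLASS sample data set, which normally ships with SAS, is available; it is not one of the course files.

```sas
/* Conventional layout: one statement per line, indented inside the step. */
proc print data=sashelp.class;
   var Name Age;
run;

/* Unconventional layout: the same three statements, with one statement   */
/* spanning two lines and several statements sharing a line.              */
proc print
      data=sashelp.class; var Name
   Age;run;
```

Both steps print the Name and Age columns. Only the first layout is easy to read and maintain.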

SAS Comments
SAS comments are text that SAS ignores during
processing. You can use comments anywhere in
a SAS program to document the purpose of the
program, explain segments of the program, or
mark SAS code as non-executing text.

Two methods of commenting:

/* comment */

* comment ;

Avoid placing the /* comment symbols in columns 1 and 2. On some operating environments,
SAS might interpret these symbols as a request to end the SAS job or session.

SAS Comments

This program contains four comments.

*----------------------------------------*
| This program creates and uses the      |
| data set called work.NewSalesEmps.     |
*----------------------------------------*;

data work.NewSalesEmps;
   length First_Name $ 12 Last_Name $ 18
          Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary /*numeric*/;
run;
/*
proc print data=work.NewSalesEmps;
run;
*/
proc means data=work.NewSalesEmps;
   *class Job_Title;
   var Salary;
run;

p103d02
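
One practical difference between the two styles can be sketched as follows: a comment that starts with an asterisk ends at the first semicolon, so it disables a single statement, whereas /* */ can enclose several statements. The example assumes that the SASHELP.CLASS sample data set is available; it is not one of the course files.

```sas
proc means data=sashelp.class;
   *class Sex;                /* asterisk style: only this CLASS statement is disabled */
   var Height /*numeric*/ ;   /* block style also works in the middle of a statement   */
run;

   /* Block style: the whole PROC PRINT step (two statements) is disabled.
proc print data=sashelp.class;
run;
   */
```

The block comment is started in column 4 rather than column 1, in line with the note above about columns 1 and 2.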

Setup for the Poll


Retrieve program p103a01.
Read the comment concerning DATALINES.
Submit the program and view the log to confirm
that the PROC CONTENTS step did not execute.

3.02 Multiple Choice Poll

Which statement is true concerning the DATALINES
statement based on reading the comment?

a. The DATALINES statement is used when reading
   data located in a raw data file.
b. The DATALINES statement is used when reading
   data located directly in the program.


3.2 Diagnosing and Correcting Syntax Errors

Objectives
Identify SAS syntax errors.
Diagnose and correct a program with errors.
Save the corrected program.


Syntax Errors
Syntax errors occur when program statements
do not conform to the rules of the SAS language.

Examples of syntax errors:


misspelled keywords
unmatched quotation marks
missing semicolons
invalid options

When SAS encounters a syntax error, SAS prints
a warning or an error message to the log.

ERROR 22-322: Syntax error, expecting one of the following:
              a name, a quoted string, (, /, ;, _DATA_, _LAST_,
              _NULL_.

When SAS encounters a syntax error, SAS underlines the error and the following information is written
to the SAS log:
• the word ERROR or WARNING
• the location of the error
• an explanation of the error

3.03 Quiz

This program has three syntax errors.
What are the errors?

daat work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps
run;

proc means data=work.NewSalesEmps average max;
   class Job_Title;
   var Salary;
run;

p103d03


Diagnosing and Correcting Syntax Errors


p103d03

• Submit a SAS program with errors.


• Diagnose and correct the errors.
• Save the corrected program.

Submitting a SAS Program with Errors

daat work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps
run;

proc means data=work.NewSalesEmps average max;
   class Job_Title;
   var Salary;
run;

For z/OS (OS/390), the following INFILE statement is used:

infile '.workshop.rawdata(newemps)' dlm=',';

The SAS log contains error messages and warnings.

36   daat work.NewSalesEmps;
     ----
     14
WARNING 14-169: Assuming the symbol DATA was misspelled as daat.

37      length First_Name $ 12
38             Last_Name $ 18 Job_Title $ 25;
39      infile 'newemps.csv' dlm=',';
40      input First_Name $ Last_Name $
41            Job_Title $ Salary;
42   run;

NOTE: The infile 'newemps.csv' is:
      Filename=S:\Workshop\newemps.csv,
      RECFM=V,LRECL=256,File Size (bytes)=2604,
      Last Modified=02Apr2008:09:10:12,
      Create Time=02Apr2008:09:10:12

NOTE: 71 records were read from the infile 'newemps.csv'.
      The minimum record length was 28.
      The maximum record length was 47.
NOTE: The data set WORK.NEWSALESEMPS has 71 observations and 4 variables.

43
44   proc print data=work.NewSalesEmps
45   run;
     ---
     22
     202
ERROR 22-322: Syntax error, expecting one of the following: ;, (, BLANKLINE, DATA, DOUBLE,
              HEADING, LABEL, N, NOOBS, OBS, ROUND, ROWS, SPLIT, STYLE, SUMLABEL, UNIFORM,
              WIDTH.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
46

NOTE: The SAS System stopped processing this step because of errors.

47   proc means data=work.NewSalesEmps average max;
                                       -------
                                       22
                                       202
ERROR 22-322: Syntax error, expecting one of the following: ;, (, ALPHA, CHARTYPE, CLASSDATA,
              CLM, COMPLETETYPES, CSS, CV, DATA, DESCEND, DESCENDING, DESCENDTYPES, EXCLNPWGT,
              EXCLNPWGTS, EXCLUSIVE, FW, IDMIN, KURTOSIS, LCLM, MAX, MAXDEC, MEAN, MEDIAN, MIN,
              MISSING, MODE, N, NDEC, NMISS, NOLABELS, NONOBS, NOPRINT, NOTHREADS, NOTRAP,
              NWAY, ORDER, P1, P10, P25, P5, P50, P75, P90, P95, P99, PCTLDEF, PRINT, PRINTALL,
              PRINTALLTYPES, PRINTIDS, PRINTIDVARS, PROBT, Q1, Q3, QMARKERS, QMETHOD, QNTLDEF,
              QRANGE, RANGE, SKEWNESS, STDDEV, STDERR, SUM, SUMSIZE, SUMWGT, T, THREADS, UCLM,
              USS, VAR, VARDEF.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
48      class Job_Title;
49      var Salary;
50   run;

NOTE: The SAS System stopped processing this step because of errors.

Diagnosing and Correcting the Errors

The log indicates that SAS


• assumed that the keyword DATA was misspelled and executed the DATA step
• interpreted the word RUN as an option in the PROC PRINT statement (because there was a missing
semicolon), so PROC PRINT was not executed
• did not recognize the word AVERAGE as a valid option in the PROC MEANS statement, so the PROC
MEANS step was not executed.
1. If you are using the Enhanced Editor, the program remains in the editor.
However, if you use the Program Editor, the code disappears with each submission. Use the RECALL
command or select Run Recall Last Submit to recall the program that you submitted. The original
program is copied into the Program Editor.

2. Edit the program.


a. Correct the spelling of DATA.
b. Put a semicolon at the end of the PROC PRINT statement.
c. Change the word AVERAGE to MEAN in the PROC MEANS statement.
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps mean max;
   class Job_Title;
   var Salary;
run;

3. Submit the program. It runs successfully without errors and generates output.

Saving the Corrected Program

You can use the FILE command to save your program to a file. The program must be in the Enhanced
Editor or Program Editor before you issue the FILE command. If the code is not in the Program Editor,
recall your program before saving the program.

Windows or UNIX    file 'myprog.sas'

z/OS (OS/390)      file '.workshop.sascode(myprog)'

You can also select File ⇒ Save As.

A note appears that indicates that the statements are saved to the file.

Diagnosing and Correcting Syntax Errors


p103d04

• Submit a SAS program that contains unbalanced quotation marks.


• Diagnose and correct the error.
• Resubmit the program.

Submitting a SAS Program that Contains Unbalanced Quotation Marks

The closing quotation mark for the DLM= option in the INFILE statement is missing.

data work.NewSalesEmps;
   length First_Name $ 12 Last_Name $ 18
          Job_Title $ 25;
   infile 'newemps.csv' dlm=',;
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;

SAS Log
51   data work.NewSalesEmps;
52      length First_Name $ 12 Last_Name $ 18
53             Job_Title $ 25;
54      infile 'newemps.csv' dlm=',;
55      input First_Name $ Last_Name $
56            Job_Title $ Salary;
57   run;
58
59   proc print data=work.NewSalesEmps;
60   run;
61
62   proc means data=work.NewSalesEmps;
63      class Job_Title;
64      var Salary;
65   run;

Diagnosing and Correcting the Errors

There are no notes in the SAS log because all of the SAS statements after the DLM= option became part
of the quoted delimiter.

The banner in the window indicates that the DATA step is still running, and it is still running
because the RUN statement was not recognized.

You can correct the unbalanced quotation marks programmatically by adding the following code before
your previous statements:
*';*";run;

If the quotation mark counter within SAS has an uneven number of quotation marks, as seen in the above
program, SAS reads the quotation mark in the code above as the matching quotation mark in the quotation
mark counter. SAS then has an even number of quotation marks in the quotation mark counter and runs
successfully, assuming no other errors occur. Both single quotation marks and double quotation marks are
used in case you submitted double quotation marks instead of single quotation marks.

Point-and-Click Approaches to Balancing Quotation Marks

Windows

1. To correct the problem in the Windows environment, click the break icon or press
the CTRL and Break keys.

2. Select 1. Cancel Submitted Statements in the Tasking Manager window and select OK.

3. Select Y to cancel submitted statements, and then select OK.

UNIX

1. To correct the problem in the UNIX operating environment, open the SAS: Session Management
window and select Interrupt.

2. Select 1 in the SAS: Tasking Manager window.

3. Select Y.

z/OS (OS/390)

1. To correct the problem in the z/OS (OS/390) operating environment, press the Attention key
or issue the ATTENTION command.
2. Type 1 to select 1. Cancel Submitted Statements and press the ENTER key.

3. Type Y and press ENTER.

Resubmitting the Program

1. In the appropriate Editor window, add a closing quotation mark to the DLM= option in the INFILE
statement.

data work.NewSalesEmps;
   length First_Name $ 12 Last_Name $ 18
          Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;

proc means data=work.NewSalesEmps;
   class Job_Title;
   var Salary;
run;
2. Resubmit the program.

If you change the program in the Enhanced Editor and have not saved the new
version, an asterisk (*) appears beside the window name in the window bar and
the top border of the window. When you save the program, the * disappears.

Exercises

Level 1

1. Diagnosing and Correcting a Misspelled Word


a. With the appropriate Editor window active, include the SAS program p103e01.

b. Submit the program.

c. Use the notes in the SAS log to identify the error.

d. Correct the error and resubmit the program.

Level 2

2. Diagnosing and Correcting a Missing Statement

a. With the appropriate Editor window active, include the SAS program p103e02.

b. Submit the program.

c. Are there any errors in the SAS log? _____________________________________________

d. Notice the message in the title bar of the Editor window.

e. Why is PROC PRINT running? _________________________________________________

f. Add the missing statement to execute the PROC PRINT step.

g. Submit the added statement.

h. Confirm that the output was created for the program by viewing the Log and Output windows.

Level 3

3. Using the Help Facility to Determine the Types of Errors in SAS
a. In the Help facility, type syntax errors on the Index tab.

b. Double-click syntax errors in the results box.


c. In the Topics Found pop-up box, select Error Processing and Debugging: Types of Errors in
SAS.
d. Name the five types of errors.
____________________________________________________________________________

3.3 Chapter Review

Chapter Review
1. With what do SAS statements usually begin?

2. With what do SAS statements always end?

3. What are two methods of commenting?

4. Name four types of syntax errors.

5. How do you save a program?


3.4 Solutions

Solutions to Exercises
1. Diagnosing and Correcting a Misspelled Word
a. Include the SAS program.
b. Submit the program.
c. Use the notes in the SAS log to identify the error.

d. Correct the error.

data work.country;
   length Country_Code $ 2 Country_Name $ 48;
   infile 'country.dat' dlm='!';
   input Country_Code $ Country_Name $;
run;

proc print data=work.country;
run;

2. Diagnosing and Correcting a Missing Statement

a. Include the SAS program.

b. Submit the program.

c. Are there any errors in the SAS log? No

d. Notice the message in the title bar.

e. Why is PROC PRINT running? The PROC PRINT step is missing a RUN statement.

f. Add the missing statement.

data work.donations;
   infile 'donation.dat';
   input Employee_ID Qtr1 Qtr2 Qtr3 Qtr4;
   Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
run;

proc print data=work.donations;
run;

g. Submit the added statement.

h. Confirm that the output was created.

3. Using the Help Facility to Determine the Types of Errors in SAS


a. In the Help facility, type syntax errors on the Index tab.

b. Double-click syntax errors in the results box.


c. Select Error Processing and Debugging: Types of Errors in SAS.
d. Name the five types of errors.
Syntax: when programming statements do not conform to the rules of the SAS language (compile time)
Semantic: when the language element is correct, but the element might not be valid for a
particular usage (compile time)
Execution-time: when SAS attempts to execute a program and execution fails (execution time)
Data: when data values are invalid (execution time)
Macro-related: when you use the macro facility incorrectly
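
To contrast a data error with the syntax errors in this chapter, here is a minimal sketch. The data set name work.DataErrorDemo and its rows are hypothetical and are read from in-stream data. The program is syntactically valid, so nothing is flagged at compile time; the problem appears only when the second row is read.

```sas
/* Hypothetical in-stream rows; the second Salary value is not numeric. */
data work.DataErrorDemo;
   input First_Name $ Salary;
   datalines;
Ana 26500
Ben unknown
Chen 54000
;
run;

/* The DATA step still completes. SAS writes a note about invalid data  */
/* to the log and stores Salary for the second row as missing (.).      */
proc print data=work.DataErrorDemo;
run;
```

Checking the log after every submission matters for this reason: a data error does not stop the step, so the output alone might not reveal it.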


Solutions to Student Activities (Polls/Quizzes)

3.01 Quiz – Correct Answer


How many statements are in the DATA step?
a. 1
b. 3
c. 5 (correct answer)
d. 7

data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

3.02 Multiple Choice Poll – Correct Answer

Which statement is true concerning the DATALINES
statement based on reading the comment?

a. The DATALINES statement is used when reading
   data located in a raw data file.
b. (correct answer) The DATALINES statement is used when reading
   data located directly in the program.
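
A minimal sketch of reading data located directly in the program follows. The data set name work.NewSalesEmpsDemo and the three rows are hypothetical; the variable names and lengths mirror the chapter's example, and INFILE DATALINES is used so that the comma delimiter still applies to the in-stream rows.

```sas
data work.NewSalesEmpsDemo;
   length First_Name $ 12 Last_Name $ 18 Job_Title $ 25;
   /* DATALINES in the INFILE statement points to the rows below */
   /* instead of to a raw data file such as newemps.csv.         */
   infile datalines dlm=',';
   input First_Name $ Last_Name $ Job_Title $ Salary;
   datalines;
Ana,Silva,Sales Rep. I,26500
Ben,Okoro,Sales Rep. II,28100
Chen,Li,Sales Manager,54000
;
run;

proc print data=work.NewSalesEmpsDemo;
run;
```

Because the rows travel with the program, this form is convenient for small tests and examples; larger data normally stays in a raw data file.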


3.03 Quiz – Correct Answer


This program has three syntax errors.
What are the errors?
daat work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps
run;

proc means data=work.NewSalesEmps average max;
   class Job_Title;
   var Salary;
run;

p103d03

The three syntax errors:
• DATA is misspelled as daat.
• The PROC PRINT statement is missing its semicolon.
• AVERAGE is not a valid option in the PROC MEANS statement.


Solutions to Chapter Review

Chapter Review Answers


1. With what do SAS statements usually begin?
identifying keyword

2. With what do SAS statements always end?
semicolon

3. What are two methods of commenting?
/* comment */
* comment;

4. Name four types of syntax errors.
misspelled keywords
unmatched quotation marks
missing semicolons
invalid options

5. How do you save a program?
Issue the FILE command.
Select File ⇒ Save As.

Chapter 4 Getting Familiar with SAS
Data Sets

4.1 Examining Descriptor and Data Portions
    Exercises

4.2 Accessing SAS Data Libraries
    Demonstration: Accessing and Browsing SAS Data Libraries – Windows
    Demonstration: Accessing and Browsing SAS Data Libraries – UNIX
    Demonstration: Accessing and Browsing SAS Data Libraries – z/OS (OS/390)
    Exercises

4.3 Accessing Relational Databases (Self-Study)

4.4 Chapter Review

4.5 Solutions
    Solutions to Exercises
    Solutions to Student Activities (Polls/Quizzes)
    Solutions to Chapter Review

4.1 Examining Descriptor and Data Portions

Objectives
• Define the components of a SAS data set.
• Define a SAS variable.
• Identify a missing value and a SAS date value.
• State the naming conventions for SAS data sets
  and variables.
• Browse the descriptor portion of SAS data sets
  by using the CONTENTS procedure.
• Browse the data portion of SAS data sets by using
  the PRINT procedure.

SAS Data Set
A SAS data set is a file that SAS creates and processes.

Partial Work.NewSalesEmps

Descriptor Portion
Data Set Name   WORK.NEWSALESEMPS
Engine          V9
Created         Fri, Feb 08, 2008 01:40 PM
Observations    71
Variables       4
...
First_Name   Last_Name    Job_Title       Salary
$ 12         $ 18         $ 25            N 8

Data Portion
Satyakam     Denny        Sales Rep. II   26780
Monica       Kletschkus   Sales Rep. IV   30890
Kevin        Lyon         Sales Rep. I    26955
Petrea       Soltau       Sales Rep. II   27440

Data must be in the form of a SAS data set to be processed by many SAS procedures and some
DATA step statements.
A SAS data set is a specially structured file that contains data values.

Descriptor Portion
The descriptor portion of a SAS data set contains the
following:
• general information about the SAS data set (such
  as data set name and number of observations)
• variable information (such as name, type, and length)

Partial Work.NewSalesEmps

General Information
Data Set Name   WORK.NEWSALESEMPS
Engine          V9
Created         Fri, Feb 08, 2008 01:40 PM
Observations    71
Variables       4
...

Variable Information
First_Name   Last_Name    Job_Title       Salary
$ 12         $ 18         $ 25            N 8

Browsing the Descriptor Portion
The CONTENTS procedure displays the descriptor
portion of a SAS data set.

General form of the CONTENTS procedure:

   PROC CONTENTS DATA=SAS-data-set;
   RUN;

Example:

   proc contents data=work.NewSalesEmps;
   run;

p104d01


Browsing the Descriptor Portion


Partial PROC CONTENTS Output

                          The CONTENTS Procedure

Data Set Name    WORK.NEWSALESEMPS                Observations            71
Member Type      DATA                             Variables               4
Engine           V9                               Indexes                 0
Created          Wed, Jan 16, 2008 02:14:20 PM    Observation Length      64
Last Modified    Wed, Jan 16, 2008 02:14:20 PM    Deleted Observations    0
Protection                                        Compressed              NO
Data Set Type                                     Sorted                  NO
Label

            Alphabetic List of Variables and Attributes

            #    Variable      Type    Len

            1    First_Name    Char     12
            3    Job_Title     Char     25
            2    Last_Name     Char     18
            4    Salary        Num       8

This is a partial view of the default PROC CONTENTS output. PROC CONTENTS output
also contains information about the physical location of the file and other data set information.

The descriptor portion contains the metadata of the data set.
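
To try PROC CONTENTS without the course files, a small data set can be built in-stream first. The following is a minimal sketch, not the course program: the data set name work.demo_emps is hypothetical, and its four rows are copied from the partial listing shown earlier. The VARNUM option, which lists variables in creation order instead of alphabetically, is not covered in this section and appears only as an illustration.

```sas
/* Hypothetical data set built from in-stream data so that no raw data file is needed */
data work.demo_emps;
   length First_Name $ 12 Last_Name $ 18 Job_Title $ 25;
   infile datalines dlm=',';
   input First_Name $ Last_Name $ Job_Title $ Salary;
   datalines;
Satyakam,Denny,Sales Rep. II,26780
Monica,Kletschkus,Sales Rep. IV,30890
Kevin,Lyon,Sales Rep. I,26955
Petrea,Soltau,Sales Rep. II,27440
;
run;

/* Default output: the variable list is in alphabetical order */
proc contents data=work.demo_emps;
run;

/* VARNUM lists the variables in the order in which they were created */
proc contents data=work.demo_emps varnum;
run;
```

Comparing the two reports shows that the # column of the default output records each variable's position in the data set even though the rows are sorted by name.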

4.01 Quiz
How many observations are in the data set
Work.donations?

• Retrieve program p104a01.
• After the DATA step, add a PROC CONTENTS step
  to view the descriptor portion of
  Work.donations.
• Submit the program and review the results.


Data Portion
The data portion of a SAS data set is a rectangular table
of character and/or numeric data values.

Partial Work.NewSalesEmps

Variable names:
First_Name   Last_Name    Job_Title       Salary

Variable values:
Satyakam     Denny        Sales Rep. II   26780
Monica       Kletschkus   Sales Rep. IV   30890
Kevin        Lyon         Sales Rep. I    26955
Petrea       Soltau       Sales Rep. II   27440

First_Name, Last_Name, and Job_Title hold character values. Salary holds numeric values.

The data values are organized as a table of observations
(rows) and variables (columns).

Variable names are part of the descriptor portion, not the data portion.

SAS Variable Values
There are two types of variables:

character   Contain any value: letters, numbers,
            special characters, and blanks.
            Character values are stored with
            a length of 1 to 32,767 bytes.
            One byte equals one character.

numeric     Stored as floating point numbers
            in 8 bytes of storage by default.
            Eight bytes of floating point storage
            provide space for 16 or 17 significant
            digits.
            You are not restricted to 8 digits.
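
The two variable types can be seen side by side in a short DATA step. This is only a sketch: the data set work.types_demo and the variables Code and Amount are hypothetical names chosen for the illustration.

```sas
/* Hypothetical data set with one character and one numeric variable */
data work.types_demo;
   length Code $ 3;          /* character variable stored in 3 bytes */
   Code   = 'A1#';           /* letters, numerals, and special characters are allowed */
   Amount = 1234567890123;   /* 13 digits: a numeric value is not limited to 8 digits */
   put Code= Amount= best16.;   /* write both values to the log */
run;

/* The Type and Len columns report Char 3 for Code and Num 8 for Amount */
proc contents data=work.types_demo;
run;
```

The length of 8 reported for Amount is the number of bytes of storage, not the number of digits that the variable can hold.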

4.02 Multiple Choice Poll

Which variable type do you think SAS uses to store date
values?
a. character
b. numeric

SAS Date Values
SAS stores date values as numeric values.

Date         Stored value   Displayed value
01JAN1959        -365       01/01/1959
01JAN1960           0       01/01/1960
01JAN1961         366       01/01/1961

A SAS date value is stored as the number of days
between January 1, 1960, and a specific date.

SAS can perform calculations on dates starting from 1582 A.D.


SAS can read either two- or four-digit year values. If SAS encounters a two-digit year, the
YEARCUTOFF= system option specifies the 100-year span to which the two-digit year
is attributed. For example, when the YEARCUTOFF= option is set to 1950, the
100-year span from 1950 to 2049 is used for two-digit year values.
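
The stored values on the slide and the effect of YEARCUTOFF= can be checked in the log with a DATA step that creates no data set. This is a minimal sketch. The variable names are hypothetical, and the OPTIONS statement changes the setting for the rest of the session.

```sas
options yearcutoff=1950;   /* two-digit years are placed in the span 1950-2049 */

data _null_;
   d1960 = '01JAN1960'd;   /* date constant for the reference date */
   d1961 = '01JAN1961'd;
   d1959 = '01JAN1959'd;
   put d1960= d1961= d1959=;   /* no format: the stored day counts are written */

   two_digit = input('01/01/49', mmddyy8.);   /* two-digit year 49 */
   put two_digit= date9.;                     /* DATE9. shows the four-digit year */
run;
```

The first PUT statement writes 0, 366, and -365, matching the slide. With the cutoff at 1950, the two-digit year 49 is read as 2049.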

4.03 Quiz
What is the numeric value for today’s date?

• Submit program p104a02.
• View the output to retrieve the current date as
  a numeric value referencing January 1, 1960.

Missing Data Values
A value must exist for every variable for each observation.
Missing values are valid values in a SAS data set.

Partial Work.NewSalesEmps

First_Name   Last_Name    Job_Title       Salary
Satyakam     Denny        Sales Rep. II   26780
Monica       Kletschkus   Sales Rep. IV       .
Kevin        Lyon         Sales Rep. I    26955
Petrea       Soltau                       27440

A character missing value is displayed as a blank.
A numeric missing value is displayed as a period.

A period is the default display for a missing numeric value. The default display can be altered by
changing the MISSING= SAS system option.
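
A short sketch of both kinds of missing value and of the MISSING= option follows. The data set work.missing_demo is hypothetical. Its two observations mirror the slide, with one missing Salary and one missing Job_Title, and the hyphen is an arbitrary choice of display character.

```sas
/* Hypothetical data set with one numeric and one character missing value */
data work.missing_demo;
   length First_Name $ 12 Job_Title $ 25;
   First_Name = 'Monica'; Job_Title = 'Sales Rep. IV'; Salary = .;     output;
   First_Name = 'Petrea'; Job_Title = ' ';             Salary = 27440; output;
run;

/* Default display: a period for the missing Salary, a blank for the missing Job_Title */
proc print data=work.missing_demo;
run;

options missing='-';   /* display missing numeric values as a hyphen instead */

proc print data=work.missing_demo;
run;

options missing='.';   /* restore the default display character */
```

Only the display changes between the two reports. The stored value of Salary is a numeric missing value in both cases.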

SAS Data Set and Variable Names


SAS names have these characteristics:
• can be 1 to 32 characters long.
• must start with a letter or underscore. Subsequent
  characters can be letters, underscores, or numerals.
• can be uppercase, lowercase, or mixed case.
• are not case sensitive.

Special characters can be used in variable names if you put the name in quotation marks followed
immediately by the letter N.

Example: class 'Flight#'n;

In order to use special characters in variable names, the VALIDVARNAME option must be set to ANY.

Example: options validvarname=any;
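
The two examples above can be combined into a runnable sketch. The data set work.flights and its values are hypothetical, and the final OPTIONS statement assumes that V7 was the setting in effect before the example ran.

```sas
options validvarname=any;   /* allow special characters in variable names */

/* Hypothetical data set that uses a name literal */
data work.flights;
   'Flight#'n = 101;        /* quoted name followed immediately by the letter n */
   Passengers = 87;
run;

proc print data=work.flights;
   var 'Flight#'n Passengers;   /* the name literal is required wherever the variable is referenced */
run;

options validvarname=v7;    /* assumption: V7 was the previous setting */
```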


4.04 Multiple Answer Poll
Which variable names are valid?
a. data5mon
b. 5monthsdata
c. data#5
d. five months data
e. five_months_data
f. FiveMonthsData


SAS Data Set Terminology


Comparable Terminology:

SAS Data Set    SAS Table
Observation     Row
Variable        Column

The terminology of data set, observation, and variable
is specific to SAS.
The terminology of table, row, and column is common
among databases.

Browsing the Data Portion
The PRINT procedure displays the data portion
of a SAS data set.

By default, PROC PRINT displays the following:
• all observations
• all variables
• an Obs column on the left side


Browsing the Data Portion


General form of the PRINT procedure:

   PROC PRINT DATA=SAS-data-set;
   RUN;

Example:

   proc print data=work.NewSalesEmps;
   run;

p104d02

Browsing the Data Portion
Partial PROC PRINT Output

Obs    First_Name    Last_Name     Job_Title         Salary

  1    Satyakam      Denny         Sales Rep. II      26780
  2    Monica        Kletschkus    Sales Rep. IV      30890
  3    Kevin         Lyon          Sales Rep. I       26955
  4    Petrea        Soltau        Sales Rep. II      27440
  5    Marina        Iyengar       Sales Rep. III     29715
  6    Shani         Duckett       Sales Rep. I       25795
  7    Fang          Wilson        Sales Rep. II      26810
  8    Michael       Minas         Sales Rep. I       26970
  9    Amanda        Liebman       Sales Rep. II      27465
 10    Vincent       Eastley       Sales Rep. III     29695
 11    Viney         Barbis        Sales Rep. III     30265
 12    Skev          Rusli         Sales Rep. II      26580
 13    Narelle       James         Sales Rep. III     29990
 14    Gerry         Snellings     Sales Rep. I       26445
 15    Leonid        Karavdic      Sales Rep. II      27860


Browsing the Data Portion


Options and statements can be added to the PRINT
procedure.

   PROC PRINT DATA=SAS-data-set NOOBS;
      VAR variable(s);
   RUN;

• The NOOBS option suppresses the observation
  numbers on the left side of the report.
• The VAR statement selects variables that appear
  in the report and determines their order.

Browsing the Data Portion
proc print data=work.NewSalesEmps noobs;
   var Last_Name First_Name Salary;
run;

Partial PROC PRINT Output

Last_Name     First_Name    Salary

Denny         Satyakam       26780
Kletschkus    Monica         30890
Lyon          Kevin          26955
Soltau        Petrea         27440
Iyengar       Marina         29715
Duckett       Shani          25795
Wilson        Fang           26810
Minas         Michael        26970
Liebman       Amanda         27465
Eastley       Vincent        29695

p104d03
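
For readers without access to newemps.csv, the same two reports can be reproduced on a few rows of in-stream data. This is a sketch under that assumption: work.print_demo is a hypothetical data set whose rows are taken from the partial output above.

```sas
/* Hypothetical data set built from three rows of the report above */
data work.print_demo;
   length First_Name $ 12 Last_Name $ 18 Job_Title $ 25;
   infile datalines dlm=',';
   input First_Name $ Last_Name $ Job_Title $ Salary;
   datalines;
Satyakam,Denny,Sales Rep. II,26780
Monica,Kletschkus,Sales Rep. IV,30890
Kevin,Lyon,Sales Rep. I,26955
;
run;

/* Default report: all observations, all variables, and the Obs column */
proc print data=work.print_demo;
run;

/* NOOBS drops the Obs column; VAR selects and orders the variables */
proc print data=work.print_demo noobs;
   var Last_Name First_Name Salary;
run;
```

Because the VAR statement controls column order, Last_Name appears first in the second report even though First_Name comes first in the data set.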

Exercises

Level 1

1. Examining the Data Portion

   a. Retrieve the starter program p104e01.

   b. After the PROC CONTENTS step, add a PROC PRINT step to display all observations,
      all variables, and the Obs column for the data set named Work.donations.

   c. Submit the program to create the following PROC PRINT report:

      Partial PROC PRINT Output (First 10 of 124 Observations)

             Employee_
      Obs       ID        Qtr1    Qtr2    Qtr3    Qtr4    Total

        1     120265        .       .       .      25       25
        2     120267       15      15      15      15       60
        3     120269       20      20      20      20       80
        4     120270       20      10       5       .       35
        5     120271       20      20      20      20       80
        6     120272       10      10      10      10       40
        7     120275       15      15      15      15       60
        8     120660       25      25      25      25      100
        9     120662       10       .       5       5       20
       10     120663        .       .       5       .        5

   d. In the PROC PRINT step, add a VAR statement and the NOOBS option to display only the
      Employee_ID and Total variables.

   e. Submit the program to create the following PROC PRINT report:

      Partial PROC PRINT Output (First 10 of 124 Observations)

      Employee_
         ID        Total

       120265        25
       120267        60
       120269        80
       120270        35
       120271        80
       120272        40
       120275        60
       120660       100
       120662        20
       120663         5


Level 2

2. Examining the Descriptor and Data Portions


   a. Retrieve the starter program p104e02.

   b. After the DATA step, add a PROC CONTENTS step to display the descriptor portion of
      Work.newpacks.

   c. Submit the program and answer the following questions:

      How many observations are in the data set? ______________________________________

      How many variables are in the data set? _________________________________________

      What is the length (byte-size) of the variable Product_Name? _____________________

   d. After the PROC CONTENTS step, add a PROC PRINT step with appropriate statements and
      options to display part of the data portion of Work.newpacks.

   e. Submit the program to create the following PROC PRINT report:

      Product_Name                                   Supplier_Name

      Black/Black                                    Top Sports
      X-Large Bottlegreen/Black                      Top Sports
      Commanche Women's 6000 Q Backpack. Bark        Top Sports
      Expedition Camp Duffle Medium Backpack         Miller Trading Inc
      Feelgood 55-75 Litre Black Women's Backpack    Toto Outdoor Gear
      Jaguar 50-75 Liter Blue Women's Backpack       Toto Outdoor Gear
      Medium Black/Bark Backpack                     Top Sports
      Medium Gold Black/Gold Backpack                Top Sports
      Medium Olive Olive/Black Backpack              Top Sports
      Trekker 65 Royal Men's Backpack                Toto Outdoor Gear
      Victor Grey/Olive Women's Backpack             Top Sports
      Deer Backpack                                  Luna sastreria S.A.
      Deer Waist Bag                                 Luna sastreria S.A.
      Hammock Sports Bag                             Luna sastreria S.A.
      Sioux Men's Backpack 26 Litre.                 Miller Trading Inc


Level 3

3. Working with Times and Datetimes


   a. Retrieve and submit the starter program p104e03.

   b. Notice the values of CurrentTime and CurrentDateTime in the PROC PRINT output.

   c. Use the Help facility to find documentation on how times and datetimes are stored in SAS.
      Go to the CONTENTS tab in the SAS Help and Documentation and select SAS Products ⇒
      Base SAS ⇒ SAS 9.2 Language Reference: Concepts ⇒ SAS System Concepts ⇒
      Dates, Times, and Intervals ⇒ About SAS Date, Time, and Datetime Values.

   d. Complete the following sentences:

      A SAS time value is a value representing the number of _________________________________

      ______________________________________________________________________________.

      A SAS datetime value is a value representing the number of ______________________________

      ______________________________________________________________________________.


4.2 Accessing SAS Data Libraries

Objectives
• Explain the concept of a SAS data library.
• Assign a library reference name to a SAS data library
  by using the LIBNAME statement.
• State the difference between a permanent library
  and a temporary library.
• Browse the contents of a SAS data library by using
  the SAS Explorer window.
• Investigate a SAS data library by using the
  CONTENTS procedure.

SAS Data Libraries
A SAS data library is a collection of SAS files that
are recognized as a unit by SAS.

Directory-based System
A SAS data library is a directory.
Windows Example: s:\workshop
UNIX Example: /users/userid

z/OS (OS/390)
A SAS data library is an operating system file.
z/OS (OS/390) Example: userid.workshop.sasdata


SAS Data Libraries


You can think of a SAS data library as a drawer in a filing
cabinet and a SAS data set as one of the file folders in
the drawer.
[Figure: filing cabinet illustration with the labels Libraries and Files]

Assigning a Libref
Regardless of which host operating system you use, you
identify SAS data libraries by assigning a library reference
name (libref) to each library.


SAS Data Libraries


When a SAS session starts, SAS automatically creates
one temporary and at least one permanent SAS data
library that you can access.

• Work – temporary library
• Sasuser – permanent library

The Work library and its SAS data files are deleted after your SAS session ends.

SAS data sets in permanent libraries such as the Sasuser library are saved after
your SAS session ends.

SAS Data Libraries
You can also create and access your own permanent
libraries.

• Work
• Sasuser
• orion – permanent library


Assigning a Libref
You can use the LIBNAME statement to assign a library
reference name (libref) to a SAS data library.

General form of the LIBNAME statement:

   LIBNAME libref 'SAS-data-library' <options>;

Rules for naming a libref:
• The name must be 8 characters or less.
• The name must begin with a letter or underscore.
• The remaining characters must be letters, numerals,
  or underscores.

For Windows and UNIX, SAS can only make an association between a libref and
an existing directory. The LIBNAME statement does not create a new directory.

z/OS (OS/390) users can use a DD statement or TSO ALLOCATE command instead
of issuing a LIBNAME statement.

Assigning a Libref
Examples:

Windows
   libname orion 's:\workshop';

UNIX
   libname orion '/users/userid';

z/OS (OS/390)
   libname orion 'userid.workshop.sasdata';
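
The paths above exist only in the classroom environment. On a directory-based host (Windows or UNIX), one way to experiment with a libref that is certain to point at an existing directory is to reuse the physical location of the Work library. The following is a sketch under that assumption. The libref mylib and the data set name sample are hypothetical. PATHNAME and LIBREF are SAS functions, called here through %SYSFUNC.

```sas
/* PATHNAME(WORK) returns the directory of the Work library, which already exists */
libname mylib "%sysfunc(pathname(work))";

/* LIBREF returns 0 when the libref is currently assigned */
%put NOTE: LIBREF(mylib) returned %sysfunc(libref(mylib));

/* The libref can now be used as the first level of a data set name */
data mylib.sample;
   x = 1;
run;

proc print data=mylib.sample;
run;
```

Because mylib and Work point to the same directory in this sketch, mylib.sample is deleted with the other Work files when the session ends. A libref that points to your own directory keeps its data sets between sessions.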


Making the Connection


When you submit the LIBNAME statement, a connection
is made between a libref in SAS and the physical location
of files on your operating system.

[Figure: the librefs Work, Sasuser, and orion; orion is connected to the physical location
s:\workshop, /users/userid, or userid.workshop.sasdata]

When your session ends, the link between the libref and the physical location of your files is broken.


4.05 Poll
During an interactive SAS session, every time that
you submit a program you must also resubmit
the LIBNAME statement.

• True
• False


Two-Level SAS Filenames


Every SAS file has a two-level name: libref.filename

The data set orion.sales is a
SAS file in the orion library.

• The first name (libref)
  refers to the library.
• The second name (filename)
  refers to the file in the library.

Temporary SAS Filename
The default libref is Work if the libref is omitted.

NewSalesEmps is the same as work.NewSalesEmps

data NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;
run;
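
The equivalence of the one-level and two-level names can be checked without the raw data file. This is a minimal sketch: the data set name one_level is hypothetical, and it assumes the default configuration in which no USER library has been defined.

```sas
/* One-level name: SAS stores the data set in the Work library */
data one_level;
   x = 1;
run;

/* Two-level name: refers to the same data set */
proc print data=work.one_level;
run;

/* The descriptor portion reports the data set name as WORK.ONE_LEVEL */
proc contents data=one_level;
run;
```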

Browsing a SAS Data Library


The SAS Explorer enables you to
manage your files in the
windowing environment.

In the SAS Explorer,
you can do the following:
• view a list of all the libraries
  available during your current
  SAS session
• navigate to see all members
  of a specific library
• display the descriptor portion
  of a SAS data set

The SAS windowing environment opens the SAS Explorer by default on many hosts. You can
issue the SAS EXPLORER command to invoke this window if it does not appear by default.

The SAS Explorer can be opened by selecting
• View ⇒ Contents Only
or
• View ⇒ Explorer.

In the Contents Only view, the SAS Explorer is a single-paned window that contains the contents
of your SAS environment. As you open folders, the folder contents replace the previous contents
in the same window.

In the Explorer view of the SAS Explorer window, folders appear in the tree view on the left and
folder contents appear in the list view on the right.


Browsing a SAS Data Library


The CONTENTS procedure with the _ALL_ keyword
produces a list of all the SAS files in the data library.

   PROC CONTENTS DATA=libref._ALL_ NODS;
   RUN;

• The NODS option suppresses the descriptor portions
  of the data sets.
• NODS is only used in conjunction with the keyword
  _ALL_.

If you are using a noninteractive or batch SAS session, the CONTENTS procedure
is an alternative to the EXPLORER command.
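
The general form can be tried on a library that is available in every SAS session. The sketch below uses the Sashelp library of sample data, which is not part of the course data and serves only as a stand-in for orion.

```sas
/* List the members of the Sashelp library without their descriptor portions */
proc contents data=sashelp._all_ nods;
run;

/* Display the descriptor portion of one member of that library */
proc contents data=sashelp.class;
run;
```

The first step produces only the directory listing. The second step shows the full descriptor portion, because a single data set is named and NODS is not used.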


Accessing and Browsing SAS Data Libraries – Windows


p104d04
1. Retrieve and submit the program p104d04.

   libname orion 's:\workshop';

   proc contents data=orion._all_ nods;
   run;

2. Check the log to confirm that the orion libref was assigned.

   1    libname orion 's:\workshop';
   NOTE: Libref ORION was successfully assigned as follows:
         Engine:        V9
         Physical Name: s:\workshop

3. View the PROC CONTENTS output in the Output window.

   Partial PROC CONTENTS Output

                         The CONTENTS Procedure

                               Directory

                   Libref           ORION
                   Engine           V9
                   Physical Name    s:\workshop
                   File Name        s:\workshop

                                    Member     File
    #    Name                       Type       Size    Last Modified

    1    BUDGET                     DATA       5120    12Feb08:00:57:25
    2    COUNTRY                    DATA      17408    12Feb08:00:57:25
         COUNTRY                    INDEX     17408    12Feb08:00:57:25
    3    CUSTOMER                   DATA      33792    12Feb08:00:57:25
    4    CUSTOMER_DIM               DATA      33792    12Feb08:00:57:25
    5    CUSTOMER_TYPE              DATA      17408    12Feb08:00:57:25
         CUSTOMER_TYPE              INDEX      9216    12Feb08:00:57:25
    6    EMPLOYEE_ADDRESSES         DATA      74752    12Feb08:00:57:25
    7    EMPLOYEE_DONATIONS         DATA      25600    12Feb08:00:57:25
    8    EMPLOYEE_ORGANIZATION      DATA      41984    12Feb08:00:57:25
    9    EMPLOYEE_PAYROLL           DATA      33792    12Feb08:00:57:25
   10    LOOKUP_COUNTRY             DATA      37888    12Feb08:00:57:25
   11    MNTH7_2007                 DATA       5120    12Feb08:00:57:25
   12    MNTH8_2007                 DATA       5120    12Feb08:00:57:25
   13    MNTH9_2007                 DATA       5120    12Feb08:00:57:25
   14    NONSALES                   DATA      33792    12Feb08:00:57:25

4. Select the Explorer tab on the SAS window bar to activate the SAS Explorer or select
   View ⇒ Contents Only.

5. Double-click Libraries to show all available libraries.

6. Double-click on the Orion library to show all members of that library.

7. Right-click on the Sales data set and select Properties.

   This default view provides general information about the data set, such as the library in which it is
   stored, the type of information it contains, its creation date, the number of observations and variables,
   and so on. You can request specific information about the columns in the data table by selecting the
   Columns tab at the top of the Properties window.

8. Select [icon] to close the Properties window.

9. Double-click on the Sales data set or right-click on the file and select Open.
   This opens the data set in a VIEWTABLE window. A view of orion.sales is shown below.

   In addition to browsing SAS data sets, you can use the VIEWTABLE window to edit data sets,
   create data sets, and customize your view of a SAS data set. For example, you can do the following:
   • sort your data
   • change the color and fonts of variables
   • display variable labels versus variable names
   • remove and add variables

   Variable labels are displayed by default. Display variable names instead of variable labels
   by selecting View ⇒ Column Names.

10. Select [icon] to close the VIEWTABLE window.

11. With the Explorer window active, select [icon] to return to Libraries.


Accessing and Browsing SAS Data Libraries – UNIX


p104d04
1. Retrieve and submit the program p104d04.

   libname orion '/users/userid';

   proc contents data=orion._all_ nods;
   run;

2. Check the log to confirm that the orion libref was assigned.

3. View the PROC CONTENTS output in the Output window.

4. Select View ⇒ Contents Only to activate the SAS Explorer.

5. Double-click Libraries to show all available libraries.

6. Double-click on the Orion library to show all members of that library.

7. Right-click on the Sales data set and select Properties.

8. Select [icon] to close the Properties window.

9. Double-click on the Sales data set or right-click on the file and select Open.
   This opens the data set in a VIEWTABLE window. A view of orion.sales is shown below.

10. Select [icon] to close the VIEWTABLE window.

11. With the SAS Explorer active, select [icon] on the Toolbox to return to Libraries.


Accessing and Browsing SAS Data Libraries – z/OS (OS/390)


.workshop.sascode(p104d04)
1. Retrieve and submit the program p104d04.

   libname orion '.workshop.sasdata';

   proc contents data=orion._all_ nods;
   run;

2. Check the log to confirm that the orion libref was assigned.

3. View the PROC CONTENTS output in the Output window.

4. Type explorer on the command line and press ENTER to activate the SAS Explorer.

5. Type s beside Orion and press ENTER to show all members of that library.

6. Type s beside Sales and press ENTER to display the properties.

7. Select OK to close the Properties window.

8. Type ? beside Sales and press ENTER.

9. Select Open to open the FSVIEW window.

10. Type end to close the FSVIEW window.

11. Type end to close the SAS Explorer.


Exercises

Level 1

4. Accessing a SAS Data Library

   a. Write and submit the appropriate LIBNAME statement to provide access to the orion libref.

      Fill in the blank with the location of your SAS data library.

         libname orion '_____________________________';

      Possible location of your SAS data library:

         Windows          s:\workshop
         UNIX             /users/userid
         z/OS (OS/390)    .workshop.sasdata

   b. Check the log to confirm that the SAS data library was assigned.

         NOTE: Libref ORION was successfully assigned as follows:

   c. Add a PROC CONTENTS step to list all the SAS data sets in the orion library. Do not display
      the descriptor portions of the individual data sets.

   d. Add another PROC CONTENTS step to display the descriptor portion of the data set
      orion.sales.

   e. Use the SAS Explorer window to view the contents of the orion library.

Level 2

5. Reviewing Concepts
   a. SAS statements usually begin with a(n) ____________________.
   b. Every SAS statement ends with a ____________________.
   c. The descriptor portion of a SAS data set can be viewed using the _________________ procedure.
   d. Character variable values can be up to __________ characters long and use
      __________ byte(s) of storage per character.
   e. By default, numeric variables are stored in __________ bytes of storage.
   f. The internally stored SAS date value for January 3, 1960, is __________.
   g. A SAS variable name has __________ to __________ characters and begins with a
      ____________________ or an ____________________.
   h. A missing character value is displayed as a ____________________.
   i. A missing numeric value is displayed as a ____________________.
   j. When a SAS session starts, SAS automatically creates the temporary library called ___________.
   k. A libref name must be _________________ characters or less.
   l. What are the two kinds of steps? ____________________________________________________
   m. What are the three primary windows in the SAS windowing environment? __________________
   n. What are the two portions of every SAS data set? ______________________________________
   o. What are the two types of variables? _________________________________________________
   p. True or False: If a SAS program produces output, then the program ran successfully and there is
      no need to check the SAS log.
   q. True or False: There are two methods for commenting in a SAS program.
   r. True or False: Omitting a semicolon never causes errors.
   s. True or False: A library reference name (libref) references a particular data set.
   t. True or False: If a data set is referenced with a one-level name, Work is the implied libref.
   u. True or False: The _ALL_ keyword is used with the PRINT procedure.

Level 3

6. Investigating the LIBNAME Statement

   a. Use the Help facility to find documentation about the LIBNAME statement.
      Go to the CONTENTS tab in the SAS Help and Documentation and select
      SAS Products ⇒ Base SAS ⇒ SAS 9.2 Language Reference: Dictionary ⇒
      Dictionary of Language Elements ⇒ Statements ⇒ LIBNAME Statement.

   b. Answer the following questions:

      What argument disassociates one or more currently assigned librefs? ___________________

      What system option provides you the convenience of specifying only a one-level name for
      permanent SAS files? _________________________________________________________

   c. Write and submit a LIBNAME statement that shows the attributes of all currently assigned
      SAS libraries in the SAS log. ___________________________________________________


4.3 Accessing Relational Databases (Self-Study)

Objectives
• Assign a library reference name to a relational
  database by using the LIBNAME statement.
• Reference a relational database table using
  a SAS two-level name.

The LIBNAME Statement (Review)
The LIBNAME statement assigns a library reference
name (libref) to a SAS data library.

General form of the LIBNAME statement:

   LIBNAME libref 'SAS-data-library' <options>;


The SAS/ACCESS LIBNAME Statement


The SAS/ACCESS LIBNAME statement assigns a library
reference name (libref) to a relational database.

General form of the SAS/ACCESS LIBNAME statement:

LIBNAME libref engine-name <SAS/ACCESS-options>;

After a database is associated with a libref, you can use a
SAS two-level name to specify any table in the database
and then work with the table as you would with a SAS
data set.

The SAS/ACCESS interface to relational databases is a family of interfaces (each of which is licensed
separately) that enable you to interact with data in other vendors' databases from within SAS.

The engine-name, such as Oracle or DB2, is the SAS/ACCESS component that reads from and writes to
your DBMS. The engine name is required. Because the SAS/ACCESS LIBNAME statement associates a
libref with a SAS/ACCESS engine that supports connections to a particular DBMS, it requires a
DBMS-specific engine name.


Oracle Example
This example uses the LIBNAME statement as supported
in the SAS/ACCESS interface to Oracle.

libname oralib oracle
        user=edu001 pw=edu001
        path=dbmssrv schema=educ;

In this statement, oralib is the libref, oracle is the engine name, and the remaining
settings (user=, pw=, path=, and schema=) are options.

p104d05

USER= specifies an optional Oracle user name. If the user name contains blanks or national characters,
enclose it in quotation marks. USER= must be used with PASSWORD=.

PASSWORD= or PW= specifies an optional Oracle password that is associated with the Oracle user
name.

PATH= specifies the Oracle driver, node, and database. SAS/ACCESS uses the same Oracle path
designation that you use to connect to Oracle directly. See your database administrator to determine
the databases that are set up in your operating environment, and to determine the default values
if you do not specify a database.

SCHEMA= enables you to read database objects, such as tables and views, in the specified schema. If
this option is omitted, you connect to the default schema for your DBMS. The values for SCHEMA=
are usually case sensitive, so use care when you specify this option.

Oracle Example
Any table in this Oracle database can be referenced
using a SAS two-level name.

libname oralib oracle
        user=edu001 pw=edu001
        path=dbmssrv schema=educ;

proc print data=oralib.supervisors;
run;

data work.staffpay;
   merge oralib.staffmaster
         oralib.payrollmaster;
   by empid;
run;

libname oralib clear;

p104d05

The CLEAR option in the LIBNAME statement disassociates the libref. Disassociating the libref
disconnects the database engine from the database and closes any resources that are associated with
that libref's connection.

4.06 Quiz
Which option in the LIBNAME statement specifies a
user’s password when accessing an Informix database?

Documentation on SAS/ACCESS for Informix can be found
in the SAS Help and Documentation from the Contents tab
(SAS Products > SAS/ACCESS >
SAS/ACCESS 9.2 for Relational Databases Reference >
DBMS-Specific Reference >
SAS/ACCESS for Informix >
LIBNAME Statement Specifics for Informix).

4.4 Chapter Review

Chapter Review
1. What structure is a SAS data library in UNIX or
Windows?

2. What structure is a SAS data library in z/OS
(OS/390)?

3. What is the name of the permanent SAS data library
that SAS creates for you?

4. What can you do with the SAS Explorer?

5. What window enables you to interactively browse a
SAS data set?

4.5 Solutions

Solutions to Exercises
1. Examining the Data Portion
a. Retrieve the starter program.
b. After the PROC CONTENTS step, add a PROC PRINT step.
data work.donations;
   infile 'donation.dat';
   input Employee_ID Qtr1 Qtr2 Qtr3 Qtr4;
   Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
run;

proc contents data=work.donations;
run;

proc print data=work.donations;
run;

c. Submit the program.

d. In the PROC PRINT step, add a VAR statement and the NOOBS option.

proc print data=work.donations noobs;
   var Employee_ID Total;
run;

e. Submit the program.

2. Examining the Descriptor and Data Portions


a. Retrieve the starter program.
b. After the DATA step, add a PROC CONTENTS step.
data work.newpacks;
   input Supplier_Name $ 1-20 Supplier_Country $ 23-24
         Product_Name $ 28-70;
   datalines;
Top Sports            DK   Black/Black
Top Sports            DK   X-Large Bottlegreen/Black
Top Sports            DK   Commanche Women's 6000 Q Backpack. Bark
Miller Trading Inc    US   Expedition Camp Duffle Medium Backpack
Toto Outdoor Gear     AU   Feelgood 55-75 Litre Black Women's Backpack
Toto Outdoor Gear     AU   Jaguar 50-75 Liter Blue Women's Backpack
Top Sports            DK   Medium Black/Bark Backpack
Top Sports            DK   Medium Gold Black/Gold Backpack
Top Sports            DK   Medium Olive Olive/Black Backpack
Toto Outdoor Gear     AU   Trekker 65 Royal Men's Backpack
Top Sports            DK   Victor Grey/Olive Women's Backpack
Luna sastreria S.A.   ES   Deer Backpack
Luna sastreria S.A.   ES   Deer Waist Bag
Luna sastreria S.A.   ES   Hammock Sports Bag
Miller Trading Inc    US   Sioux Men's Backpack 26 Litre.
run;

proc contents data=work.newpacks;
run;

c. Submit the program and answer the following questions:

How many observations are in the data set? 15
How many variables are in the data set? 3
What is the length (byte-size) of the variable Product_Name? 43

d. After the PROC CONTENTS step, add a PROC PRINT step.

proc print data=work.newpacks noobs;
   var Product_Name Supplier_Name;
run;

e. Submit the program.

3. Working with Times and Datetimes


a. Retrieve and submit the starter program.
b. Notice the values of CurrentTime and CurrentDateTime in the PROC PRINT output.

c. Use the Help facility to find documentation on how times and datetimes are stored in SAS.
d. Complete the following sentences:

A SAS time value is a value representing the number of seconds since midnight of the current
day.

A SAS datetime value is a value representing the number of seconds between January 1, 1960,
and an hour/minute/second within a specified date.
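
As a quick check of these two definitions, the following sketch writes a time constant and a datetime constant to the log as unformatted numbers. The variable names simply mirror the ones used in the exercise; any names would do.

```sas
data _null_;
   /* one minute after midnight, stored as a count of seconds */
   CurrentTime     = '00:01:00't;
   /* one minute after midnight on January 1, 1960, also stored as seconds */
   CurrentDateTime = '01JAN1960:00:01:00'dt;
   put CurrentTime= CurrentDateTime=;
run;
```

Both values are written to the log as 60, because each constant is 60 seconds past its reference point.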

4. Accessing a SAS Data Library
a. Write and submit the appropriate LIBNAME statement.

libname orion 'SAS-data-library';

b. Check the log to confirm that the SAS data library was assigned.
c. Add a PROC CONTENTS step to list all the SAS data sets in the orion library.

proc contents data=orion._all_ nods;
run;

d. Add another PROC CONTENTS step to display the descriptor portion of orion.sales.

proc contents data=orion.sales;
run;

e. Use the SAS Explorer window to view the contents of the orion library.

5. Reviewing Concepts

a. SAS statements usually begin with an identifying keyword.
b. Every SAS statement ends with a semicolon.
c. The descriptor portion of a SAS data set can be viewed using the CONTENTS procedure.
d. Character variable values can be up to 32,767 characters long and use 1 byte(s) of storage per
character.
e. By default, numeric variables are stored in 8 bytes of storage.
f. The internally stored SAS date value for January 3, 1960, is 2.
g. A SAS variable name has 1 to 32 characters and begins with a letter or an underscore.
h. A missing character value is displayed as a blank.
i. A missing numeric value is displayed as a period.
j. When a SAS session starts, SAS automatically creates the temporary library called Work.

k. A libref name must be 8 characters or less.



l. What are the two kinds of steps? DATA and PROC


m. What are the three primary windows in the SAS windowing environment? Editor, Log, and
Output
n. What are the two portions of every SAS data set? Descriptor and Data
o. What are the two types of variables? Character and Numeric
p. True or False: If a SAS program produces output, then the program ran successfully and there is
no need to check the SAS log. False
q. True or False: There are two methods for commenting in a SAS program. True
r. True or False: Omitting a semicolon never causes errors. False
s. True or False: A library reference name (libref) references a particular data set. False
t. True or False: If a data set is referenced with a one level name, Work is the implied libref. True
u. True or False: The _ALL_ keyword is used with the PRINT procedure. False

6. Investigating the LIBNAME Statement

a. Use the Help facility.
b. Answer the following questions:
What argument disassociates one or more currently assigned librefs? CLEAR
What system option provides you the convenience of specifying only a one-level name for
permanent SAS files? USER=
c. Write and submit a LIBNAME statement.

libname _all_ list;

Solutions to Student Activities (Polls/Quizzes)

4.01 Quiz – Correct Answer


How many observations are in the data set
Work.donations?
124 observations

data work.donations;
   infile 'donation.dat';
   input Employee_ID Qtr1 Qtr2 Qtr3 Qtr4;
   Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
run;

proc contents data=work.donations;
run;

p104a01s

4.02 Multiple Choice Poll – Correct Answer

Which variable type do you think SAS uses to store date
values?

a. character
b. numeric (correct answer)

4.03 Quiz – Correct Answer


What is the numeric value for today’s date?
The answer depends on the current date.

Example:
If the current date is February 1, 2008, the numeric value
is 17563.
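
A minimal sketch for checking this yourself: the TODAY function returns the current date as a SAS date value, and a date constant shows the stored number for a fixed date. The variable names are illustrative.

```sas
data _null_;
   today_value   = today();        /* changes from day to day */
   example_value = '01FEB2008'd;   /* date constant for February 1, 2008 */
   put today_value= example_value=;
run;
```

The log shows example_value=17563, matching the example above, and today_value holds the count of days from January 1, 1960, to the day the step runs.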

4.04 Multiple Answer Poll – Correct Answer
Which variable names are valid?

a. data5mon (valid)
b. 5monthsdata
c. data#5
d. five months data
e. five_months_data (valid)
f. FiveMonthsData (valid)

4.05 Poll – Correct Answer


During an interactive SAS session, every time that
you submit a program you must also resubmit
the LIBNAME statement.

True
False (correct answer)

The LIBNAME statement remains in effect until
canceled, changed, or your SAS session ends.
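
The sketch below illustrates this behavior in one interactive session. It assumes the course path 's:\workshop' used elsewhere in these notes; substitute the location that applies to your site.

```sas
libname orion 's:\workshop';   /* submit once per SAS session */

proc contents data=orion.sales;
run;

/* Submitted later in the same session: the libref is still assigned */
proc print data=orion.sales;
run;

libname orion clear;           /* cancels the libref */
```

After the CLEAR statement runs, a step that refers to orion.sales fails until the libref is assigned again.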

4.06 Quiz – Correct Answer
Which option in the LIBNAME statement specifies a
user’s password when accessing an Informix database?

The USING= option specifies the password that is
associated with the Informix user. USING= can also
be specified with the PASSWORD= and PWD= aliases.

Solutions to Chapter Review

Chapter Review Answers


1. What structure is a SAS data library in UNIX or
Windows?
a directory

2. What structure is a SAS data library in z/OS
(OS/390)?
an operating system file

3. What is the name of the permanent SAS data library
that SAS creates for you?
Sasuser

4. What can you do with the SAS Explorer?
view a list of all the libraries available to your
SAS session
navigate to see all members of a specific library
display the descriptor portion of a SAS data set

5. What window enables you to interactively browse a
SAS data set?
the VIEWTABLE window

Chapter 5 Reading SAS Data Sets

5.1 Introduction to Reading Data ........................................................................................ 5-3

5.2 Using SAS Data as Input ................................................................................................ 5-6

5.3 Subsetting Observations and Variables ..................................................................... 5-12
    Exercises .............................................................................................................................. 5-26

5.4 Adding Permanent Attributes ...................................................................................... 5-29
    Exercises .............................................................................................................................. 5-41

5.5 Chapter Review ............................................................................................................. 5-44

5.6 Solutions ....................................................................................................................... 5-45
    Solutions to Exercises .......................................................................................................... 5-45
    Solutions to Student Activities (Polls/Quizzes) ..................................................................... 5-50
    Solutions to Chapter Review ................................................................................................ 5-54

5.1 Introduction to Reading Data

Objectives
Define the concept of reading from a data source
to create a SAS data set.
Define the business scenario that will be used when
reading from a SAS data set, an Excel worksheet,
and a raw data file.

Business Scenario

An existing data source contains information on
Orion Star sales employees from Australia and
the United States.

A new SAS data set needs to be created that contains
a subset of this existing data source.
This new SAS data set must contain the following:
only the employees from Australia who are Sales Representatives
the employee's first name, last name, salary, job title, and hired date
labels and formats in the descriptor portion

Business Scenario

Reading SAS Data Sets
Reading Excel Worksheets
Reading Delimited Raw Data Files

Three different input data sources are used to create the new SAS data set.

First, a SAS data set is used as the input data source.

Second, an Excel workbook is used as the input data source.

Third, a raw data file is used as the input data source.


Business Scenario
Reading SAS Data Sets:
   libname ;
   data ;
      set ;
      ...
   run;

Reading Excel Worksheets:
   libname ;
   data ;
      set ;
      ...
   run;

Reading Delimited Raw Data Files:
   data ;
      infile ;
      input ;
      ...
   run;

The DATA step is used to accomplish the scenario regardless of the input data source. Additional
statements are added to the DATA step to complete all of the requirements.

The LIBNAME statement references a SAS data library when reading a SAS data set, and an Excel
workbook when reading an Excel worksheet.

5.01 Multiple Answer Poll
Which types of files will you read into SAS?

a. SAS data sets
b. Excel worksheets
c. raw data files
d. other
e. not sure

5.2 Using SAS Data as Input

Objectives
Use the DATA step to create a SAS data set from
an existing SAS data set.

Business Scenario Syntax
Use the following statements to complete the scenario:

LIBNAME libref 'SAS-data-library';

DATA output-SAS-data-set;
   SET input-SAS-data-set;
   WHERE where-expression;
   KEEP variable-list;
   LABEL variable = 'label'
         variable = 'label'
         variable = 'label';
   FORMAT variable(s) format;
RUN;

Part 1: the DATA and SET statements
Part 2: the WHERE and KEEP statements
Part 3: the LABEL and FORMAT statements

Instead of writing the program all at once, break the program into three parts. Test each part before you
move to the next part.

The LIBNAME Statement (Review)

A library reference name (libref) is needed if a permanent
data set is being read or created.

LIBNAME libref 'SAS-data-library';

DATA output-SAS-data-set;
   SET input-SAS-data-set;
   <additional SAS statements>
RUN;

The LIBNAME statement assigns a libref to a SAS data
library.

The DATA Statement


The DATA statement begins a DATA step and provides
the name of the SAS data set being created.

LIBNAME libref 'SAS-data-library';

DATA output-SAS-data-set;
   SET input-SAS-data-set;
   <additional SAS statements>
RUN;

The DATA statement can create temporary or permanent
data sets.

The SET Statement
The SET statement reads observations from a SAS data
set for further processing in the DATA step.

LIBNAME libref 'SAS-data-library';

DATA output-SAS-data-set;
   SET input-SAS-data-set;
   <additional SAS statements>
RUN;

By default, the SET statement reads all observations
and all variables from the input data set.
The SET statement can read temporary or permanent
data sets.
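
To make the temporary versus permanent distinction concrete, here is a small sketch. It assumes the course library path 's:\workshop'; the data set name sales_copy is hypothetical and chosen only for illustration.

```sas
libname orion 's:\workshop';

/* Temporary output: Work.sales_copy is deleted when the SAS session ends */
data work.sales_copy;
   set orion.sales;          /* permanent input data set */
run;

/* Permanent output: orion.sales_copy is stored in the library behind the libref */
data orion.sales_copy;
   set work.sales_copy;      /* temporary input data set */
run;
```

The only difference between the two steps is the libref in the two-level names. Work refers to the temporary library, and any other assigned libref refers to a permanent one.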

Business Scenario Part 1


Create a temporary SAS data set named Work.subset1
from the permanent SAS data set named orion.sales.
libname orion 's:\workshop';

data work.subset1;
set orion.sales;
run;

Partial SAS Log
9    data work.subset1;
10      set orion.sales;
11   run;

NOTE: There were 165 observations read from the data set ORION.SALES.
NOTE: The data set WORK.SUBSET1 has 165 observations and 9 variables.

Both data sets contain 165 observations and 9 variables.

p105d01

The LIBNAME statement needs to reference a SAS data library specific to your operating environment.

For example:

Windows         libname orion 's:\workshop';
UNIX            libname orion '/users/userid';
z/OS (OS/390)   libname orion '.workshop.sasdata';

Business Scenario Part 1
proc print data=work.subset1;
run;

Partial PROC PRINT Output

Obs  Employee_ID  First_Name  Last_Name   Gender  Salary  Job_Title       Country  Birth_Date  Hire_Date

  1  120102       Tom         Zhou        M       108255  Sales Manager   AU             3510      10744
  2  120103       Wilson      Dawes       M        87975  Sales Manager   AU            -3996       5114
  3  120121       Irenie      Elvish      F        26600  Sales Rep. II   AU            -5630       5114
  4  120122       Christina   Ngan        F        27475  Sales Rep. II   AU            -1984       6756
  5  120123       Kimiko      Hotstone    F        26190  Sales Rep. I    AU             1732       9405
  6  120124       Lucian      Daymond     M        26480  Sales Rep. I    AU             -233       6999
  7  120125       Fong        Hofmeister  M        32040  Sales Rep. IV   AU            -1852       6999
  8  120126       Satyakam    Denny       M        26780  Sales Rep. II   AU            10490      17014
  9  120127       Sharryn     Clarkson    F        28100  Sales Rep. II   AU             6943      14184
 10  120128       Monica      Kletschkus  F        30890  Sales Rep. IV   AU             9691      17106
 11  120129       Alvin       Roebuck     M        30070  Sales Rep. III  AU             1787       9405
 12  120130       Kevin       Lyon        M        26955  Sales Rep. I    AU             9114      16922
 13  120131       Marinus     Surawski    M        26910  Sales Rep. I    AU             7207      15706
 14  120132       Fancine     Kaiser      F        28525  Sales Rep. III  AU            -3923       6848
 15  120133       Petrea      Soltau      F        27440  Sales Rep. II   AU             9608      17075

p105d01

Setup for the Poll


Retrieve program p105a01.
Submit the program and confirm that a new SAS data
set was created with 77 observations and 12 variables.

5.02 Poll
The DATA step reads a temporary SAS data set to create
a permanent SAS data set.

True
False

5.3 Subsetting Observations and Variables

Objectives
Subset observations by using the WHERE statement.
Subset variables by using the DROP and KEEP
statements.


Subsetting Observations and Variables


By default, the SET statement reads all observations
and all variables from the input data set.

9 data work.subset1;
10 set orion.sales;
11 run;

NOTE: There were 165 observations read from the data set ORION.SALES.
NOTE: The data set WORK.SUBSET1 has 165 observations and 9 variables.

By adding statements to the DATA step, the number of
observations and variables can be reduced.

NOTE: The data set WORK.SUBSET1 has 61 observations and 5 variables.

The WHERE Statement
The WHERE statement subsets observations that meet
a particular condition.

General form of the WHERE statement:

WHERE where-expression ;

The where-expression is a sequence of operands and
operators that form a set of instructions that define a
condition for selecting observations.
Operands include constants and variables.
Operators are symbols that request a comparison,
arithmetic calculation, or logical operation.

Operands
A constant operand is a fixed value.
Character values must be enclosed in quotation marks
and are case sensitive.
Numeric values do not use quotation marks.

A variable operand must be a variable coming from an
input data set.

Examples:

where Gender = 'M';
where Salary > 50000;

In each example, the operand on the left (Gender, Salary) is a variable and the operand
on the right ('M', 50000) is a constant.

Comparison Operators
Comparison operators compare a variable with a value
or with another variable.

Symbol      Mnemonic   Definition
=           EQ         equal to
^= ¬= ~=    NE         not equal to
>           GT         greater than
<           LT         less than
>=          GE         greater than or equal
<=          LE         less than or equal
            IN         equal to one of a list

Comparison Operators
Examples:

where Gender = 'M';

where Gender eq ' ';

where Salary ne .;

where Salary >= 50000;

where Country in ('AU','US');

where Country in ('AU' 'US');

Values in the IN list must be separated by commas or blanks.

Arithmetic Operators

Arithmetic operators indicate that an arithmetic calculation
is performed.

Symbol   Definition
**       exponentiation
*        multiplication
/        division
+        addition
-        subtraction

Arithmetic Operators
Examples:

where Salary / 12 < 6000;

where Salary / 12 * 1.10 >= 7500;

where (Salary / 12 ) * 1.10 >= 7500;

where Salary + Bonus <= 10000;

Logical Operators
Logical operators combine or modify expressions.

Symbol   Mnemonic   Definition
&        AND        logical and
|        OR         logical or
^ ¬ ~    NOT        logical not

Logical Operators
Examples:

where Gender ne 'M' and Salary >=50000;

where Gender ne 'M' or Salary >= 50000;

where Country = 'AU' or Country = 'US';

where Country not in ('AU' 'US');
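
When AND and OR appear in the same where-expression, SAS evaluates AND before OR, so parentheses are worth adding whenever the intended grouping differs. The sketch below assumes that the orion libref is already assigned and uses the orion.sales variables shown earlier.

```sas
/* Without parentheses this is read as:                    */
/* (Country = 'AU' and Gender = 'F') or Salary >= 50000    */
proc print data=orion.sales;
   where Country = 'AU' and Gender = 'F' or Salary >= 50000;
run;

/* With parentheses: Australian employees who are female   */
/* or who earn at least 50000                              */
proc print data=orion.sales;
   where Country = 'AU' and (Gender = 'F' or Salary >= 50000);
run;
```

The first step can return employees from any country whose salary is at least 50000. The second step returns only employees from Australia.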

5.03 Quiz
Which WHERE statement correctly subsets the numeric
values for May, June, or July and missing character
names?

a. where Months in (5-7)
         and Names = .;

b. where Months in (5,6,7)
         and Names = ' ';

c. where Months in ('5','6','7')
         and Names = '.';

Special WHERE Operators


Special WHERE operators are operators that can only
be used in a where-expression.

Symbol   Mnemonic      Definition
         BETWEEN-AND   an inclusive range
         IS NULL       missing value
         IS MISSING    missing value
?        CONTAINS      a character string
         LIKE          a character pattern

BETWEEN-AND Operator
The BETWEEN-AND operator selects observations in
which the value of a variable falls within an inclusive
range of values.

Examples:

where salary between 50000 and 100000;

where salary not between 50000 and 100000;

Equivalent Expressions:

where salary between 50000 and 100000;

where 50000 <= salary <= 100000;
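
The NOT BETWEEN form can be rewritten with comparison operators in the same way. The following sketch assumes that the orion libref is assigned; the two steps should select the same observations.

```sas
/* Salaries outside the inclusive range 50000 to 100000 */
proc print data=orion.sales;
   where Salary not between 50000 and 100000;
run;

/* The same condition written with comparison operators */
proc print data=orion.sales;
   where Salary < 50000 or Salary > 100000;
run;
```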


IS NULL and IS MISSING Operators


The IS NULL and IS MISSING operators select
observations in which the value of a variable is missing.
The operator can be used for both character and
numeric variables.
You can combine the NOT logical operator with
IS NULL or IS MISSING to select nonmissing values.

Examples:
where Employee_ID is null;

where Employee_ID is missing;
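
A short sketch of the NOT combination mentioned above, assuming the orion libref is assigned. One step uses a numeric variable and the other a character variable to show that the same operators work for both types.

```sas
/* Numeric variable: keep observations with a nonmissing Employee_ID */
proc print data=orion.sales;
   where Employee_ID is not missing;
run;

/* Character variable: keep observations with a nonmissing Last_Name */
proc print data=orion.sales;
   where Last_Name is not null;
run;
```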

CONTAINS Operator

The CONTAINS (?) operator selects observations that
include the specified substring.
The position of the substring within the variable’s
values is not important.
The operator is case sensitive when you make
comparisons.

Example:
where Job_Title contains 'Rep';
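
Because CONTAINS is case sensitive, one common way to match a substring regardless of case is to compare against an uppercased copy of the value. The sketch below assumes the orion libref is assigned. Using the UPCASE function here is a suggestion and is not part of the course example.

```sas
/* The ? symbol is shorthand for CONTAINS */
proc print data=orion.sales;
   where Job_Title ? 'Rep';
run;

/* Case-insensitive variant: uppercase the value before comparing */
proc print data=orion.sales;
   where upcase(Job_Title) contains 'REP';
run;
```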


5.04 Quiz
Which value will not be returned based on the
WHERE statement?

where Job_Title contains 'Rep';

a. Office Rep
b. Sales Rep. IV
c. service rep III
d. Representative

LIKE Operator
The LIKE operator selects observations by comparing
character values to specified patterns.

There are two special characters available for specifying
a pattern:
A percent sign (%) replaces any number of characters.
An underscore (_) replaces one character.

Consecutive underscores can be specified.
A percent sign and an underscore can be specified in the
same pattern.
The operator is case sensitive.

LIKE Operator
Examples:

where Name like '%N';

This WHERE statement selects observations that begin
with any number of characters and end with an N.

where Name like 'T_M%';

This WHERE statement selects observations that begin
with a T, followed by a single character, followed by an M,
followed by any number of characters.

Starting in SAS 9.2, the LIKE operator supports an escape character, which enables you to search for the
percent sign (%) and the underscore (_) characters in values.

An escape character is a single character that in a sequence of characters signifies that what is to follow
takes an alternative meaning. For the LIKE operator, an escape character signifies to search for literal
instances of the % and _ characters in the variable's values instead of performing the special-character
function.

To specify an escape character, you include the character in the pattern-matching expression and then the
keyword ESCAPE followed by the escape character expression. When you include an escape character,
the pattern-matching expression must be enclosed in quotation marks and it cannot contain a column
name. The escape character expression is an expression that evaluates to a single character. The operands
must be character or string literals. If it is a single character, it must be enclosed in quotation marks.

For example, if the variable X contains the values abc, a_b, and axb, the following LIKE operator using
an escape character selects only the value a_b. The escape character (/) specifies that the pattern search
for a '_' instead of matching any single character:
where x like 'a/_b' escape '/';
Without an escape character, the following LIKE operator would select the values a_b and axb. The
special character underscore in the search pattern matches any single character, including the value with
the underscore:
where x like 'a_b';
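
The pattern rules for LIKE can be tried on a small, self-contained data set. The data set name Work.like_demo and its four values are made up for illustration and are not part of the course data.

```sas
data work.like_demo;
   input Name $ 1-10;
   datalines;
TOM
TIMOTHY
Tom
THOMAS
;
run;

/* T, then any single character, then M, then any number of characters */
proc print data=work.like_demo;
   where Name like 'T_M%';
run;
```

This step selects TOM and TIMOTHY. Tom is excluded because the comparison is case sensitive, and THOMAS is excluded because its third character is not an M.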

5.05 Quiz
Which WHERE statement will return all the observations
that have a first name starting with the letter M for the
given values?

The given values have the form last name, first name.

a. where Name like '_, M_';
b. where Name like '%, M%';
c. where Name like '_, M%';
d. where Name like '%, M_';

Business Scenario Part 2
Include only the employees from Australia who have the
word Rep in their job title.

data work.subset1;
   set orion.sales;
   where Country='AU' and
         Job_Title contains 'Rep';
run;

Partial SAS Log
NOTE: There were 61 observations read from the data set ORION.SALES.
      WHERE (Country='AU') and Job_Title contains 'Rep';
NOTE: The data set WORK.SUBSET1 has 61 observations and 9 variables.

p105d02

Business Scenario Part 2


proc print data=work.subset1;
run;

Partial PROC PRINT Output


Obs  Employee_ID  First_Name  Last_Name   Gender  Salary  Job_Title       Country  Birth_Date  Hire_Date

  1  120121       Irenie      Elvish      F        26600  Sales Rep. II   AU            -5630       5114
  2  120122       Christina   Ngan        F        27475  Sales Rep. II   AU            -1984       6756
  3  120123       Kimiko      Hotstone    F        26190  Sales Rep. I    AU             1732       9405
  4  120124       Lucian      Daymond     M        26480  Sales Rep. I    AU             -233       6999
  5  120125       Fong        Hofmeister  M        32040  Sales Rep. IV   AU            -1852       6999
  6  120126       Satyakam    Denny       M        26780  Sales Rep. II   AU            10490      17014
  7  120127       Sharryn     Clarkson    F        28100  Sales Rep. II   AU             6943      14184
  8  120128       Monica      Kletschkus  F        30890  Sales Rep. IV   AU             9691      17106
  9  120129       Alvin       Roebuck     M        30070  Sales Rep. III  AU             1787       9405
 10  120130       Kevin       Lyon        M        26955  Sales Rep. I    AU             9114      16922
 11  120131       Marinus     Surawski    M        26910  Sales Rep. I    AU             7207      15706
 12  120132       Fancine     Kaiser      F        28525  Sales Rep. III  AU            -3923       6848
 13  120133       Petrea      Soltau      F        27440  Sales Rep. II   AU             9608      17075
 14  120134       Sian        Shannan     M        28015  Sales Rep. II   AU            -3861       5114
 15  120135       Alexei      Platts      M        32490  Sales Rep. IV   AU             3313      13788

p105d02

The DROP and KEEP Statements
The DROP statement specifies the names of the variables
to omit from the output data set(s).

DROP variable-list;

The KEEP statement specifies the names of the variables
to write to the output data set(s).

KEEP variable-list;

The variable-list specifies the variables to drop or keep,
respectively, in the output data set.

The DROP and KEEP Statements


Examples:
drop Employee_ID Gender
Country Birth_Date;

keep First_Name Last_Name
     Salary Job_Title
     Hire_Date;
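
For the nine-variable orion.sales data set shown earlier, the two statements above describe the same result from opposite directions. The sketch below assumes the orion libref is assigned; the output data set names are hypothetical.

```sas
/* List the five variables to write to the output data set */
data work.keep_version;
   set orion.sales;
   keep First_Name Last_Name Salary Job_Title Hire_Date;
run;

/* List the four variables to leave out; the same five remain */
data work.drop_version;
   set orion.sales;
   drop Employee_ID Gender Country Birth_Date;
run;
```

Choosing between the two is mostly a matter of which list is shorter. KEEP also has the effect that variables added to the input data set later are not carried into the output unless they are listed.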

Business Scenario Part 2
Include only the employee's first name, last name, salary,
job title, and hired date in the data set Work.subset1.

data work.subset1;
   set orion.sales;
   where Country='AU' and
         Job_Title contains 'Rep';
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
run;

Partial SAS Log
NOTE: There were 61 observations read from the data set ORION.SALES.
      WHERE (Country='AU') and Job_Title contains 'Rep';
NOTE: The data set WORK.SUBSET1 has 61 observations and 5 variables.

p105d03

Business Scenario Part 2


proc print data=work.subset1;
run;

Partial PROC PRINT Output


Obs  First_Name  Last_Name   Salary  Job_Title       Hire_Date

  1  Irenie      Elvish       26600  Sales Rep. II        5114
  2  Christina   Ngan         27475  Sales Rep. II        6756
  3  Kimiko      Hotstone     26190  Sales Rep. I         9405
  4  Lucian      Daymond      26480  Sales Rep. I         6999
  5  Fong        Hofmeister   32040  Sales Rep. IV        6999
  6  Satyakam    Denny        26780  Sales Rep. II       17014
  7  Sharryn     Clarkson     28100  Sales Rep. II       14184
  8  Monica      Kletschkus   30890  Sales Rep. IV       17106
  9  Alvin       Roebuck      30070  Sales Rep. III       9405
 10  Kevin       Lyon         26955  Sales Rep. I        16922
 11  Marinus     Surawski     26910  Sales Rep. I        15706
 12  Fancine     Kaiser       28525  Sales Rep. III       6848

p105d03

Exercises

Level 1

1. Subsetting Observations and Variables Using the WHERE and KEEP Statements
a. Retrieve and submit the starter program p105e01.

What is the variable name that contains gender values? ________________________________

What are the two possible gender values? ___________________________________________

b. Add a DATA step before the PROC PRINT step to read the data set orion.customer_dim to
create a new data set called Work.youngadult.

c. Modify the PROC PRINT step to refer to the new data set.

d. Submit the program and confirm that Work.youngadult was created with 77 observations and
11 variables.

e. Add a WHERE statement to the DATA step to subset for female customers.

f. Submit the program and confirm that Work.youngadult was created with 30 observations and
11 variables.

g. Modify the WHERE statement to subset for female customers whose Customer_Age is
between 18 and 36.

h. Submit the program and confirm that Work.youngadult was created with 15 observations and
11 variables.

i. Modify the WHERE statement to subset for female 18- to 36-year-old customers who have the
word Gold in their Customer_Group.

j. Submit the program and confirm that Work.youngadult was created with 5 observations and
11 variables.

k. Modify the DATA step so that Work.youngadult contains only Customer_Name,
Customer_Age, Customer_BirthDate, Customer_Gender, and Customer_Group.

l. Submit the program and confirm that Work.youngadult was created with 5 observations and 5
variables.

Level 2

2. Subsetting Observations and Variables Using the WHERE and DROP Statements
a. Write a DATA step to read the data set orion.product_dim to create a new data set called
Work.sports.

Work.sports should include only those observations with Supplier_Country from Great
Britain (GB), Spain (ES), or Netherlands (NL) and Product_Category values that end in the
word Sports.

Work.sports should not include the following variables: Product_ID, Product_Line,
Product_Group, Supplier_Name, and Supplier_ID.

b. Write a PROC PRINT step to create the following report:

Partial PROC PRINT Output (First 10 of 30 Observations)

       Product_                                               Supplier_
Obs    Category           Product_Name                        Country

  1    Children Sports    Butch T-Shirt with V-Neck           ES
  2    Children Sports    Children's Knit Sweater             ES
  3    Children Sports    Gordon Children's Tracking Pants    ES
  4    Children Sports    O'my Children's T-Shirt with Logo   ES
  5    Children Sports    Strap Pants BBO                     ES
  6    Indoor Sports      Abdomen Shaper                      NL
  7    Indoor Sports      Fitness Dumbbell Foam 0.90          NL
  8    Indoor Sports      Letour Heart Bike                   NL
  9    Indoor Sports      Letour Trimag Bike                  NL
 10    Indoor Sports      Weight 5.0 Kg                       NL

Level 3
3. Using the SOUNDS-LIKE Operator and the KEEP= Option
a. Write a DATA step to read the data set orion.customer_dim to create a new data set called
Work.tony.

b. Add a WHERE statement to the DATA step to subset the observations with the
Customer_FirstName value that sounds like Tony.

Documentation on the SOUNDS-LIKE operator can be found in the SAS Help and
Documentation from the Index tab by typing sounds-like operator.

c. Add a KEEP= data set option in the SET statement to read only the Customer_FirstName
and Customer_LastName variables.

Documentation on the KEEP= data set option can be found in the SAS Help and
Documentation from the Contents tab. (Select SAS Products ⇒ Base SAS ⇒
SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
SAS Data Set Options ⇒ KEEP= Data Set Option.)

d. Write a PROC PRINT step to create the following report:


Customer_ Customer_
Obs FirstName LastName

1 Tonie Asmussen
2 Tommy Mcdonald


5.4 Adding Permanent Attributes

Objectives
Add labels to the descriptor portion of a SAS data set
by using the LABEL statement.
Add formats to the descriptor portion of a SAS data set
by using the FORMAT statement.

Business Scenario Syntax

Use the following statements to complete the scenario:

LIBNAME libref 'SAS-data-library';

DATA output-SAS-data-set;
   SET input-SAS-data-set;
   WHERE where-expression;
   KEEP variable-list;
   LABEL variable = 'label'
         variable = 'label'
         variable = 'label';
   FORMAT variable(s) format;
RUN;

Part 1 covers the statements through WHERE, Part 2 covers the KEEP statement,
and Part 3 covers the LABEL and FORMAT statements.


Adding Permanent Attributes


The descriptor portion of the SAS data set stores variable
attributes including the name, type (character or numeric),
and length of the variable.

Labels and formats can also be stored in the descriptor
portion.

Partial PROC CONTENTS Output

Alphabetic List of Variables and Attributes

#    Variable      Type    Len    Format       Label

1    First_Name    Char     12
5    Hire_Date     Num       8    DDMMYY10.    Date Hired
4    Job_Title     Char     25                 Sales Title
2    Last_Name     Char     18
3    Salary        Num       8    COMMAX8.

Adding Permanent Attributes

When displaying reports,
a label changes the appearance of a variable name
a format changes the appearance of a variable value.

Partial PROC PRINT Output

       First_
Obs    Name         Last_Name     Salary    Sales Title       Date Hired

  1    Irenie       Elvish        26.600    Sales Rep. II     01/01/1974
  2    Christina    Ngan          27.475    Sales Rep. II     01/07/1978
  3    Kimiko       Hotstone      26.190    Sales Rep. I      01/10/1985
  4    Lucian       Daymond       26.480    Sales Rep. I      01/03/1979
  5    Fong         Hofmeister    32.040    Sales Rep. IV     01/03/1979

In this output, the column headings Sales Title and Date Hired come from labels,
and the Salary and Date Hired values are displayed with formats.

The LABEL Statement


The LABEL statement assigns descriptive labels to
variable names.
General form of the LABEL statement:

LABEL variable = 'label'
      variable = 'label'
      variable = 'label';

A label can have as many as 256 characters.
Any number of variables can be associated with labels
in a single LABEL statement.
Using a LABEL statement in a DATA step permanently
associates labels with variables by storing the label in
the descriptor portion of the SAS data set.

Business Scenario Part 3

Include labels in the descriptor portion of Work.subset1.

data work.subset1;
   set orion.sales;
   where Country='AU' and
         Job_Title contains 'Rep';
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
   label Job_Title='Sales Title'
         Hire_Date='Date Hired';
run;

p105d04

Business Scenario Part 3


proc contents data=work.subset1;
run;

Partial PROC CONTENTS Output


Alphabetic List of Variables and Attributes

#    Variable      Type    Len    Label

1    First_Name    Char     12
5    Hire_Date     Num       8    Date Hired
4    Job_Title     Char     25    Sales Title
2    Last_Name     Char     18
3    Salary        Num       8

p105d04

Business Scenario Part 3

In order to use labels in the PRINT procedure, a LABEL
option needs to be added to the PROC PRINT statement.

proc print data=work.subset1 label;
run;

Partial PROC PRINT Output

       First_                                                 Date
Obs    Name         Last_Name     Salary    Sales Title       Hired

  1    Irenie       Elvish        26600     Sales Rep. II     5114
  2    Christina    Ngan          27475     Sales Rep. II     6756
  3    Kimiko       Hotstone      26190     Sales Rep. I      9405
  4    Lucian       Daymond       26480     Sales Rep. I      6999
  5    Fong         Hofmeister    32040     Sales Rep. IV     6999
  6    Satyakam     Denny         26780     Sales Rep. II     17014
  7    Sharryn      Clarkson      28100     Sales Rep. II     14184
  8    Monica       Kletschkus    30890     Sales Rep. IV     17106
  9    Alvin        Roebuck       30070     Sales Rep. III    9405
 10    Kevin        Lyon          26955     Sales Rep. I      16922

p105d04

The FORMAT Statement


The FORMAT statement assigns formats to variable
values.
General form of the FORMAT statement:

FORMAT variable(s) format;

A format is an instruction that SAS uses to write
data values.
Using a FORMAT statement in a DATA step
permanently associates formats with variables
by storing the format in the descriptor portion
of the SAS data set.

SAS Formats

SAS formats have the following form:

<$>format<w>.<d>

$         indicates a character format.
format    names the SAS format or user-defined format.
w         specifies the total format width including decimal
          places and special characters.
.         is a required delimiter.
d         specifies the number of decimal places in numeric
          formats.

SAS Formats
Selected SAS formats:
Format       Definition
$w.          writes standard character data.
w.d          writes standard numeric data.
COMMAw.d     writes numeric values with a comma that separates every
             three digits and a period that separates the decimal fraction.
COMMAXw.d    writes numeric values with a period that separates every three
             digits and a comma that separates the decimal fraction.
DOLLARw.d    writes numeric values with a leading dollar sign, a comma that
             separates every three digits, and a period that separates the
             decimal fraction.
EUROXw.d     writes numeric values with a leading euro symbol (€), a period
             that separates every three digits, and a comma that separates
             the decimal fraction.

SAS Formats

Selected SAS formats:

Format        Stored Value    Displayed Value
$4.           Programming     Prog
12.           27134.2864      27134
12.2          27134.2864      27134.29
COMMA12.2     27134.2864      27,134.29
COMMAX12.2    27134.2864      27.134,29
DOLLAR12.2    27134.2864      $27,134.29
EUROX12.2     27134.2864      €27.134,29

SAS Formats
If you do not specify a format width that is large enough
to accommodate a numeric value, the displayed value
is automatically adjusted to fit into the width.

Format        Stored Value    Displayed Value
DOLLAR12.2    27134.2864      $27,134.29
DOLLAR9.2     27134.2864      $27134.29
DOLLAR8.2     27134.2864      27134.29
DOLLAR5.2     27134.2864      27134
DOLLAR4.2     27134.2864      27E3
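
The effect of the format width can be checked without creating a data set. The following is a minimal sketch, assuming only that the value 27134.2864 from the tables above is assigned to a variable (the name Amount is illustrative). The PUT statement writes each formatted value to the SAS log, where the results can be compared with the Displayed Value column.

```sas
/* DATA _NULL_ runs the step without creating an output data set. */
data _null_;
   Amount = 27134.2864;      /* stored value used in the tables above */

   /* Same stored value, different formats */
   put Amount 12.2;
   put Amount comma12.2;
   put Amount commax12.2;
   put Amount dollar12.2;

   /* Same format, shrinking width: SAS adjusts the displayed value to fit */
   put Amount dollar9.2;
   put Amount dollar8.2;
   put Amount dollar5.2;
   put Amount dollar4.2;
run;
```

In every case the stored value is unchanged; only the way that the value is written differs.
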
5.06 Quiz

Which numeric format writes standard numeric data with
leading zeros?

Documentation on formats can be found in the
SAS Help and Documentation from the Contents tab
(SAS Products ⇒ Base SAS ⇒ SAS 9.2 Language
Reference: Dictionary ⇒ Dictionary of Language
Elements ⇒ Formats ⇒ Formats by Category).


SAS Date Formats


SAS date formats display SAS date values in standard
date forms.

Format       Stored Value    Displayed Value
MMDDYY6.     0               010160
MMDDYY8.     0               01/01/60
MMDDYY10.    0               01/01/1960
DDMMYY6.     365             311260
DDMMYY8.     365             31/12/60
DDMMYY10.    365             31/12/1960

SAS Date Formats

Additional date formats:

Format       Stored Value    Displayed Value
DATE7.       -1              31DEC59
DATE9.       -1              31DEC1959
WORDDATE.    0               January 1, 1960
WEEKDATE.    0               Friday, January 1, 1960
MONYY7.      0               JAN1960
YEAR4.       0               1960
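
The date format tables above can be reproduced in the SAS log. The following is a minimal sketch that uses the same stored values (0, 365, and -1); the variable name d is illustrative. The slash in the PUT statement moves to a new log line.

```sas
data _null_;
   /* Stored value 0 is January 1, 1960 */
   d = 0;
   put d mmddyy6. / d mmddyy8. / d mmddyy10.;
   put d worddate. / d weekdate. / d monyy7. / d year4.;

   /* Stored value 365 is 365 days after January 1, 1960 */
   d = 365;
   put d ddmmyy6. / d ddmmyy8. / d ddmmyy10.;

   /* Stored value -1 is one day before January 1, 1960 */
   d = -1;
   put d date7. / d date9.;
run;
```

The log lines should match the Displayed Value column for each format.
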

5.07 Quiz
Which FORMAT statement creates the output?
a.
format Birth_Date Hire_Date ddmmyy9.
       Term_Date mmyy7.;

b.
format Birth_Date Hire_Date ddmmyyyy.
       Term_Date mmmyyyy.;

c.
format Birth_Date Hire_Date ddmmyy10.
       Term_Date monyy7.;

Output

Birth_Date    Hire_Date     Term_Date

21/05/1969    15/10/1992    MAR2007

SAS Date Formats


The SAS National Language Support (NLS) date formats
convert SAS date values to a locale-sensitive date string.
Format        Locale                  Example
NLDATEw.      English_UnitedStates    January 01, 1960
              German_Germany          01. Januar 1960
NLDATEMNw.    English_UnitedStates    January
              German_Germany          Januar
NLDATEWw.     English_UnitedStates    Fri, Jan 01, 60
              German_Germany          Fr, 01. Jan 60
NLDATEWNw.    English_UnitedStates    Friday
              German_Germany          Freitag

p105d05

National Language Support (NLS) is a set of features that enable a software product to function
properly in every global market for which the product is targeted. SAS contains NLS features to
ensure that SAS applications conform to local language conventions.

A locale reflects the language, local conventions, and culture for a geographical region. Local
conventions might include specific formatting rules for dates. Dates have many representations,
depending on the conventions that are accepted in a culture.

The LOCALE= system option is used to specify the locale, which reflects the local conventions,
language, and culture of a geographical region. For example, a locale value of English_Canada
represents the country of Canada with a language of English, and a locale value of French_Canada
represents the country of Canada with a language of French. English_UnitedStates represents
the United States with a language of English. German_Germany represents the country
of Germany with a language of German.

The LOCALE= system option can be specified in a configuration file, at SAS invocation, in the
OPTIONS statement, or in the SAS System Options window.

For more information, refer to the SAS 9.2 National Language Support Reference Guide in SAS Help
and Documentation.
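
The interaction between the LOCALE= system option and the NLS date formats can be sketched as follows. This is an illustration under assumptions, not the course program p105d05: it sets the locale in an OPTIONS statement, writes the stored value 0 with three of the NLS formats at their default widths, and then switches to English_UnitedStates, which is assumed here to be the locale that the session should return to.

```sas
/* Switch the session locale to German (Germany). */
options locale=German_Germany;

data _null_;
   d = 0;              /* January 1, 1960 */
   put d nldate.;      /* locale-sensitive full date  */
   put d nldatemn.;    /* locale-sensitive month name */
   put d nldatewn.;    /* locale-sensitive day name   */
run;

/* Assumption: English_UnitedStates is the locale to restore. */
options locale=English_UnitedStates;

data _null_;
   d = 0;
   put d nldate.;
   put d nldatemn.;
   put d nldatewn.;
run;
```

The two sets of log lines can be compared with the Example column in the NLS format table.
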

5.08 Quiz
How many date and time formats start with EUR?

Documentation on NLS formats can be found in the
SAS Help and Documentation from the Contents tab
(SAS Products ⇒ Base SAS ⇒ SAS 9.2 Language
Reference: Dictionary ⇒ Dictionary of Language
Elements ⇒ Formats ⇒ Formats Documented in
Other SAS Publications).

Business Scenario Part 3

Include formats in the descriptor portion of
Work.subset1.

data work.subset1;
   set orion.sales;
   where Country='AU' and
         Job_Title contains 'Rep';
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
   label Job_Title='Sales Title'
         Hire_Date='Date Hired';
   format Salary commax8. Hire_Date ddmmyy10.;
run;

p105d06

Business Scenario Part 3


proc contents data=work.subset1;
run;

Partial PROC CONTENTS Output


Alphabetic List of Variables and Attributes

#    Variable      Type    Len    Format       Label

1    First_Name    Char     12
5    Hire_Date     Num       8    DDMMYY10.    Date Hired
4    Job_Title     Char     25                 Sales Title
2    Last_Name     Char     18
3    Salary        Num       8    COMMAX8.

p105d06

Business Scenario Part 3

proc print data=work.subset1 label;
run;

Partial PROC PRINT Output

       First_
Obs    Name         Last_Name     Salary    Sales Title       Date Hired

  1    Irenie       Elvish        26.600    Sales Rep. II     01/01/1974
  2    Christina    Ngan          27.475    Sales Rep. II     01/07/1978
  3    Kimiko       Hotstone      26.190    Sales Rep. I      01/10/1985
  4    Lucian       Daymond       26.480    Sales Rep. I      01/03/1979
  5    Fong         Hofmeister    32.040    Sales Rep. IV     01/03/1979
  6    Satyakam     Denny         26.780    Sales Rep. II     01/08/2006
  7    Sharryn      Clarkson      28.100    Sales Rep. II     01/11/1998
  8    Monica       Kletschkus    30.890    Sales Rep. IV     01/11/2006
  9    Alvin        Roebuck       30.070    Sales Rep. III    01/10/1985
 10    Kevin        Lyon          26.955    Sales Rep. I      01/05/2006
 11    Marinus      Surawski      26.910    Sales Rep. I      01/01/2003
 12    Fancine      Kaiser        28.525    Sales Rep. III    01/10/1978

p105d06

Exercises

Level 1

4. Adding Permanent Attributes to Work.youngadult

a. Retrieve and submit the starter program p105e04.

Notice the format and labels stored in the descriptor portion of Work.youngadult.

b. Add a LABEL statement and a FORMAT statement to the DATA step to create the following
PROC PRINT report:

                                                                               Customer
Obs    Gender  Customer Name       Date of Birth      Member Level             Age

  1    F       Sandrina Stephano   July 9, 1979       Orion Club Gold members  28
  2    F       Cornelia Krahl      February 27, 1974  Orion Club Gold members  33
  3    F       Dianne Patchin      May 6, 1979        Orion Club Gold members  28
  4    F       Annmarie Leveille   July 16, 1984      Orion Club Gold members  23
  5    F       Sanelisiwe Collier  July 7, 1988       Orion Club Gold members  19

The labels need to be changed for Customer_Gender, Customer_BirthDate, and
Customer_Group.

The format needs to be changed for Customer_BirthDate.

Hint: Do not forget the option in the PROC PRINT step that enables the labels to appear.

Why do Customer_Name and Customer_Age appear with a space in the column header
but do not need labels? __________________________________________________________

Level 2

5. Adding Permanent Attributes to Work.sports

a. Retrieve the starter program p105e05.


b. Add a LABEL statement to the DATA step and a LABEL option to the PROC PRINT step
to add the following labels:

Variable Label
Product_Category Sports Category
Product_Name Product Name (Abbrev)

Supplier_Name Supplier Name (Abbrev)



c. Add a FORMAT statement to the DATA step to display only the first 15 letters of
Product_Name and Supplier_Name.

d. Submit the program to create the following PROC PRINT report:


Partial PROC PRINT Output (First 10 of 30 Observations)

                          Product Name       Supplier    Supplier Name
Obs    Sports Category    (Abbrev)           Country     (Abbrev)

  1    Children Sports    Butch T-Shirt w    ES          Luna sastreria
  2    Children Sports    Children's Knit    ES          Luna sastreria
  3    Children Sports    Gordon Children    ES          Luna sastreria
  4    Children Sports    O'my Children's    ES          Luna sastreria
  5    Children Sports    Strap Pants BBO    ES          Sportico
  6    Indoor Sports      Abdomen Shaper     NL          TrimSport B.V.
  7    Indoor Sports      Fitness Dumbbel    NL          TrimSport B.V.
  8    Indoor Sports      Letour Heart Bi    NL          TrimSport B.V.
  9    Indoor Sports      Letour Trimag B    NL          TrimSport B.V.
 10    Indoor Sports      Weight 5.0 Kg      NL          TrimSport B.V.

e. Add a PROC CONTENTS step to the end of the program to verify that the labels and formats
are stored in the descriptor portion.

Level 3

6. Using the $UPCASEw. Format and the SPLIT= Option

a. Retrieve the starter program p105e06.

b. Add a FORMAT statement to display Customer_FirstName and Customer_LastName in
uppercase values.

Documentation on the $UPCASEw. format can be found in the SAS Help and
Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
Formats ⇒ $UPCASEw. Format).

c. Add a LABEL statement to add the following labels:

Variable Label
Customer_FirstName CUSTOMER*FIRST NAME

Customer_LastName CUSTOMER*LAST NAME

d. In the PROC PRINT statement, replace the LABEL option with the SPLIT= option and reference
the asterisk as the split character.

Documentation on the SPLIT= option can be found in the SAS Help and Documentation
from the Contents tab (SAS Products ⇒ Base SAS ⇒
Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The PRINT Procedure).

e. Submit the program to create the following PROC PRINT report:

CUSTOMER CUSTOMER
Obs FIRST NAME LAST NAME

1 TONIE ASMUSSEN
2 TOMMY MCDONALD


5.5 Chapter Review

Chapter Review
1. What statement is used to read from a SAS data set
in the DATA step?

2. What statement is used to write to a SAS data set in
the DATA step?

3. What does the WHERE statement do?

4. What are examples of logical operators?

5. How can you limit the variables written to an output
data set in the DATA step?

5.6 Solutions

Solutions to Exercises
1. Subsetting Observations and Variables Using the WHERE and KEEP Statements
a. Retrieve and submit the program.
What is the variable name that contains gender values? Customer_Gender

What are the two possible gender values? F and M

b. Add a DATA step.
data work.youngadult;
   set orion.customer_dim;
run;

proc print data=orion.customer_dim;
run;
c. Modify the PROC PRINT step.
proc print data=work.youngadult;
run;
d. Submit the program.
e. Add a WHERE statement to subset for female customers.
data work.youngadult;
   set orion.customer_dim;
   where Customer_Gender='F';
run;
f. Submit the program.
g. Modify the WHERE statement to subset for female 18- to 36-year-old customers.
data work.youngadult;
   set orion.customer_dim;
   where Customer_Gender='F' and
         Customer_Age between 18 and 36;
run;
h. Submit the program.

i. Modify the WHERE statement to subset for female 18- to 36-year-old customers who have the
word Gold in their Customer_Group.
data work.youngadult;
set orion.customer_dim;
where Customer_Gender='F' and
Customer_Age between 18 and 36 and
Customer_Group contains 'Gold';
run;
j. Submit the program.
k. Keep only five variables.
data work.youngadult;
   set orion.customer_dim;
   where Customer_Gender='F' and
         Customer_Age between 18 and 36 and
         Customer_Group contains 'Gold';
   keep Customer_Name Customer_Age Customer_BirthDate
        Customer_Gender Customer_Group;
run;
l. Submit the program.
2. Subsetting Observations and Variables Using the WHERE and DROP Statements

a. Write a DATA step.
data work.sports;
   set orion.product_dim;
   where Supplier_Country in ('GB','ES','NL') and
         Product_Category like '%Sports';
   drop Product_ID Product_Line Product_Group
        Supplier_Name Supplier_ID;
run;
b. Write a PROC PRINT step.
proc print data=work.sports;
run;
3. Using the SOUNDS-LIKE Operator and the KEEP= Option
a. Write a DATA step.
data work.tony;
set orion.customer_dim;
run;
b. Add a WHERE statement to the DATA step.
data work.tony;
set orion.customer_dim;
where Customer_FirstName =* 'Tony';
run;

c. Add a KEEP= data set option in the SET statement.


data work.tony;
set orion.customer_dim(keep=Customer_FirstName Customer_LastName);
where Customer_FirstName =* 'Tony';
run;
d. Write a PROC PRINT step.
proc print data=work.tony;
run;
4. Adding Permanent Attributes to Work.youngadult

a. Retrieve and submit the starter program.
b. Add a LABEL statement and a FORMAT statement to the DATA step.
data work.youngadult;
   set orion.customer_dim;
   where Customer_Gender='F' and
         Customer_Age between 18 and 35 and
         Customer_Group contains 'Gold';
   keep Customer_Name Customer_Age Customer_BirthDate
        Customer_Gender Customer_Group;
   label Customer_Gender='Gender'
         Customer_BirthDate='Date of Birth'
         Customer_Group='Member Level';
   format Customer_BirthDate worddate.;
run;

proc contents data=work.youngadult;
run;

proc print data=work.youngadult label;
run;
Why do Customer_Name and Customer_Age appear with a space in the column header but
do not need labels? These variables already have permanent labels assigned in the data set
descriptor portion.
5. Adding Permanent Attributes to Work.sports

a. Retrieve the starter program.

b. Add a LABEL statement to the DATA step and a LABEL option to PROC PRINT.
data work.sports;
set orion.product_dim;
where Supplier_Country in ('GB','ES','NL') and
Product_Category like '%Sports';
drop Product_ID Product_Line Product_Group Supplier_ID;
label Product_Category='Sports Category'
Product_Name='Product Name (Abbrev)'
Supplier_Name='Supplier Name (Abbrev)';
run;

proc print data=work.sports label;
run;
c. Add a FORMAT statement to the DATA step.
data work.sports;
   set orion.product_dim;
   where Supplier_Country in ('GB','ES','NL') and
         Product_Category like '%Sports';
   drop Product_ID Product_Line Product_Group Supplier_ID;
   label Product_Category='Sports Category'
         Product_Name='Product Name (Abbrev)'
         Supplier_Name='Supplier Name (Abbrev)';
   format Product_Name Supplier_Name $15.;
run;
d. Submit the program.
e. Add a PROC CONTENTS step.
proc contents data=work.sports;
run;
6. Using the $UPCASEw. Format and the SPLIT= Option
a. Retrieve the starter program.

b. Add a FORMAT statement.
data work.tony;
set orion.customer_dim(keep=Customer_FirstName Customer_LastName);
where Customer_FirstName =* 'Tony';
format Customer_FirstName Customer_LastName $upcase.;
run;

proc print data=work.tony label;
run;

c. Add a LABEL statement.


data work.tony;
set orion.customer_dim(keep=Customer_FirstName Customer_LastName);
where Customer_FirstName =* 'Tony';
format Customer_FirstName Customer_LastName $upcase.;
label Customer_FirstName='CUSTOMER*FIRST NAME'
Customer_LastName='CUSTOMER*LAST NAME';
run;
d. Replace the LABEL option with the SPLIT= option.
proc print data=work.tony split='*';
run;
e. Submit the program.

Solutions to Student Activities (Polls/Quizzes)

5.02 Poll – Correct Answer


The DATA step reads a temporary SAS data set to create
a permanent SAS data set.

True
False (correct answer)

data work.mycustomers;
   set orion.customer;
run;

proc print data=work.mycustomers;
   var Customer_ID Customer_Name
       Customer_Address;
run;

The DATA step reads the permanent data set orion.customer and creates the
temporary data set work.mycustomers.

5.03 Quiz – Correct Answer

Which WHERE statement correctly subsets the numeric
values for May, June, or July and missing character
names?

a.
where Months in (5-7)
      and Names = .;

b. (correct answer)
where Months in (5,6,7)
      and Names = ' ';

c.
where Months in ('5','6','7')
      and Names = '.';

5.04 Quiz – Correct Answer


Which value will not be returned based on the
WHERE statement?
a. Office Rep
b. Sales Rep. IV
c. service rep III (correct answer)
d. Representative

where Job_Title contains 'Rep';

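
The quiz above turns on the fact that the CONTAINS operator is case sensitive. The following is a minimal sketch that checks this with the four quiz values. Work.titles and Work.rep_only are hypothetical data set names, and the values are typed in with DATALINES so that the example can run on its own.

```sas
/* Hypothetical data set that holds the four quiz values. */
data work.titles;
   input Job_Title $25.;
   datalines;
Office Rep
Sales Rep. IV
service rep III
Representative
;
run;

/* The WHERE statement subsets the observations read by the SET statement. */
data work.rep_only;
   set work.titles;
   where Job_Title contains 'Rep';
run;

proc print data=work.rep_only;
run;
```

The value service rep III is expected to be left out of Work.rep_only because it contains rep in lowercase only.
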
5.05 Quiz – Correct Answer

Which WHERE statement will return all the observations
that have a first name starting with the letter M for the
given values?

a.
where Name like '_, M_';

b. (correct answer)
where Name like '%, M%';

c.
where Name like '_, M%';

d.
where Name like '%, M_';

In each pattern, the part before the comma matches the last name and the part after
the comma matches the first name.
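
The difference between the two LIKE wildcards can be tried on a small data set. In the sketch below, the percent sign matches any number of characters and the underscore matches exactly one character. Work.names and Work.m_names are hypothetical data set names, and the values are assumed to be stored as last name, comma, blank, first name, which is the layout that the quiz patterns suggest.

```sas
/* Hypothetical names stored in the form "Last, First". */
data work.names;
   input Name $30.;
   datalines;
Elvish, Irenie
Kletschkus, Monica
Surawski, Marinus
Lyon, Kevin
;
run;

/* % before the comma: a last name of any length.          */
/* M% after the comma: a first name that starts with M.    */
data work.m_names;
   set work.names;
   where Name like '%, M%';
run;

proc print data=work.m_names;
run;
```

With these four values, only the Monica and Marinus observations are expected in Work.m_names. Replacing either % with an underscore would require a last name or a remainder of the first name that is exactly one character long.
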

5.06 Quiz – Correct Answer


Which numeric format writes standard numeric data with
leading zeros?

Zw.d

The Zw.d format is similar to the w.d format except
that Zw.d pads right-aligned output with zeros instead
of blanks.

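
A short sketch of the difference between the w.d and Zw.d formats follows. The variable name Code and the value 42 are illustrative only.

```sas
data _null_;
   Code = 42;        /* hypothetical value */
   put Code 8.;      /* right-aligned in 8 columns, padded with blanks */
   put Code z8.;     /* right-aligned in 8 columns, padded with zeros  */
run;
```

The second log line is expected to show 00000042.
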
5.07 Quiz – Correct Answer

Which FORMAT statement creates the output?

a.
format Birth_Date Hire_Date ddmmyy9.
       Term_Date mmyy7.;

b.
format Birth_Date Hire_Date ddmmyyyy.
       Term_Date mmmyyyy.;

c. (correct answer)
format Birth_Date Hire_Date ddmmyy10.
       Term_Date monyy7.;

Output

Birth_Date    Hire_Date     Term_Date

21/05/1969    15/10/1992    MAR2007

5.08 Quiz – Correct Answer


How many date and time formats start with EUR?

Nine

Example:
The EURDFDDw. format writes international SAS
date values in the form dd.mm.yy or dd.mm.yyyy.


Solutions to Chapter Review

Chapter Review Answers


1. What statement is used to read from a SAS data set
in the DATA step?
SET statement

2. What statement is used to write to a SAS data set in
the DATA step?
DATA statement

3. What does the WHERE statement do?
The WHERE statement subsets observations that
meet a certain condition.

4. What are examples of logical operators?
AND
OR
NOT

5. How can you limit the variables written to an output
data set in the DATA step?
DROP or KEEP statement

Chapter 6 Reading Excel Worksheets

6.1 Using Excel Data as Input
    Demonstration: Reading Excel Worksheets – Windows
    Exercises

6.2 Doing More with Excel Worksheets (Self-Study)
    Exercises

6.3 Chapter Review

6.4 Solutions
    Solutions to Exercises
    Solutions to Student Activities (Polls/Quizzes)
    Solutions to Chapter Review

6.1 Using Excel Data as Input

Objectives
Use the DATA step to create a SAS data set from
an Excel worksheet.
Use the SAS/ACCESS LIBNAME statement to
read from an Excel worksheet as though it were
a SAS data set.

Business Scenario

An existing data source contains information on
Orion Star sales employees from Australia and
the United States.
A new SAS data set needs to be created that contains
a subset of this existing data source.
This new SAS data set must contain the following:
only the employees from Australia who are Sales
Representatives
the employee's first name, last name, salary, job title,
and hired date
labels and formats in the descriptor portion

Business Scenario

Reading SAS
Data Sets

Reading Excel
Worksheets
ia l
t e r
Reading Delimited
Raw Data Files

M a
5

r y i o n
t a u t
r i e
Business Scenario
r i b S
p
o dis SA t libname ;

r
Reading SAS data ;
set ;
Data Sets

P f
...

r
run;

S f o e o libname ;

A
S No tsit
Reading Excel

d
Worksheets
data
set
...
run;
;
;

ou
data ;
Reading Delimited infile ;
input ;
Raw Data Files ...
run;
6

The LIBNAME statement references a SAS data library when reading a SAS data set
and an Excel workbook when reading an Excel worksheet.

Business Scenario Syntax


Use the following statements to complete the scenario:

LIBNAME libref 'physical-file-name';

DATA output-SAS-data-set;
   SET input-SAS-data-set;
   WHERE where-expression;
   KEEP variable-list;
   LABEL variable = 'label'
         variable = 'label'
         variable = 'label';
   FORMAT variable(s) format;
RUN;

Figure: the Excel workbook sales.xls, which contains two worksheets and cells formatted as dates.

The LIBNAME Statement (Review)


The LIBNAME statement assigns a library reference
name (libref) to a SAS data library.

General form of the LIBNAME statement:

LIBNAME libref 'SAS-data-library' <options>;

Example:

libname orion 's:\workshop';

In this example, orion is the libref and 's:\workshop' is the physical location of the
SAS data library.

The LIBNAME statement needs to reference a SAS data library specific to your operating environment.


The SAS/ACCESS LIBNAME Statement


The SAS/ACCESS LIBNAME statement extends the
LIBNAME statement to support assigning a library
reference name (libref) to Microsoft Excel workbooks.

General form of the SAS/ACCESS LIBNAME statement:

LIBNAME libref 'physical-file-name' <options>;

This enables you to reference worksheets directly in a DATA step or SAS procedure, and to
read from and write to a Microsoft Excel worksheet as though it were a SAS data set.

SAS/ACCESS options can be used in the LIBNAME statement.

For example,

MIXED=YES | NO
specifies whether to convert numeric data values into character data values for a column
with mixed data types. The default is NO, which means that numeric data will be imported as
missing values in a character column. If MIXED=YES, the engine assigns a SAS character type
for the column and converts all numeric data values to character data.

The following Technical Support Usage Note addresses column data that is imported as missing:

http://support.sas.com/kb/6/123.html
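
As a minimal sketch of the MIXED= option described above, the program below assigns the course workbook with MIXED=YES and then inspects the resulting variable types. The path is the one used elsewhere in this chapter. Whether any column in sales.xls really holds mixed data is an assumption made only for illustration.

```sas
/* Assign the libref with MIXED=YES so that a column holding both */
/* numeric and character cells is read as character data.         */
libname orionxls 's:\workshop\sales.xls' mixed=yes;

/* Inspect the variable types that the engine assigned. */
proc contents data=orionxls.'Australia$'n;
run;

/* Release the workbook when finished. */
libname orionxls clear;
```

Comparing the Type column of this PROC CONTENTS report with a report produced under the default (MIXED=NO) is one way to see which columns the option affected.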


The SAS/ACCESS LIBNAME Statement


SAS/ACCESS Interface to PC File Formats is required in
order to use the SAS/ACCESS LIBNAME statement to
access Excel workbooks.

Example:
libname orionxls 's:\workshop\sales.xls';

In this example, orionxls is the libref and 's:\workshop\sales.xls' is the physical file name
of the Excel workbook, including path, filename, and extension.

p106d01

SAS/ACCESS Interface to PC File Formats enables you to read data from PC files, to use that data in
SAS reports or applications, and to use SAS data sets to create PC files in various formats.

SAS/ACCESS Interface to PC File Formats gives access to Microsoft Excel, Microsoft Access, dBASE,
JMP, Lotus 1-2-3, SPSS, Stata, and Paradox.

To determine if you have a license for SAS/ACCESS Interface to PC File Formats, submit the following
step:

proc setinit;
run;

After submitting, look in the SAS log for the products that are licensed for your site.

SAS/ACCESS Interface to PC File Formats on Linux and UNIX enables access to PC files from
the Linux and UNIX operating environments. A PC Files Server is used to access the PC data.
The PC Files Server runs on a Microsoft Windows server, and SAS/ACCESS Interface to PC File
Formats runs on a Linux or UNIX client server.
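
On Linux or UNIX, the connection through a PC Files Server is typically made with the PCFILES engine in the LIBNAME statement. The sketch below is illustrative only: the host name winhost and the workbook path are hypothetical placeholders, and it assumes that a PC Files Server is already running on that Windows machine.

```sas
/* winhost and the path are hypothetical; substitute the values */
/* that apply at your site.                                     */
libname orionxls pcfiles server=winhost
        path='c:\workshop\sales.xls';

/* List the worksheets that are visible through the server. */
proc contents data=orionxls._all_;
run;

libname orionxls clear;
```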

SAS Explorer Window


Each worksheet in the Excel
workbook is treated as though
it is a SAS data set.

Worksheet names appear with a dollar sign at the end of the name.

The CONTENTS Procedure
proc contents data=orionxls._all_;
run;

The CONTENTS Procedure

Directory

Libref         ORIONXLS
Engine         EXCEL
Physical Name  sales.xls
User           Admin

                                 DBMS
                        Member   Member
#  Name                 Type     Type

1  Australia$           DATA     TABLE
2  UnitedStates$        DATA     TABLE

p106d01

The CONTENTS Procedure

Data Set Name        ORIONXLS.'Australia$'n      Observations          .
Member Type          DATA                        Variables             9
Engine               EXCEL                       Indexes               0
Created              .                           Observation Length    0
Last Modified        .                           Deleted Observations  0
Protection                                       Compressed            NO
Data Set Type                                    Sorted                NO
Label
Data Representation  Default
Encoding             Default

Alphabetic List of Variables and Attributes

#  Variable     Type  Len  Format  Informat  Label

8  Birth_Date   Num     8  DATE9.  DATE9.    Birth Date
7  Country      Char    2  $2.     $2.       Country
1  Employee_ID  Num     8                    Employee ID
2  First_Name   Char   10  $10.    $10.      First Name
4  Gender       Char    1  $1.     $1.       Gender
9  Hire_Date    Num     8  DATE9.  DATE9.    Hire Date
6  Job_Title    Char   14  $14.    $14.      Job Title
3  Last_Name    Char   12  $12.    $12.      Last Name
5  Salary       Num     8                    Salary

The Excel LIBNAME engine converts worksheet dates to SAS date values and assigns
the DATE9. format.
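
Because the engine delivers ordinary SAS date values, the DATE9. format that it assigns can be overridden like any other format. The following sketch assumes that the orionxls libref from this section is still assigned. The choice of MMDDYY10. is arbitrary and only for illustration.

```sas
/* Display the worksheet dates with a different format. */
/* The stored date values are not changed.              */
proc print data=orionxls.'Australia$'n;
   var First_Name Last_Name Birth_Date Hire_Date;
   format Birth_Date Hire_Date mmddyy10.;
run;
```

A FORMAT statement in a procedure step applies only to that step, so the worksheet and its default DATE9. display are left untouched.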
The CONTENTS Procedure

Data Set Name        ORIONXLS.'UnitedStates$'n   Observations          .
Member Type          DATA                        Variables             9
Engine               EXCEL                       Indexes               0
Created              .                           Observation Length    0
Last Modified        .                           Deleted Observations  0
Protection                                       Compressed            NO
Data Set Type                                    Sorted                NO
Label
Data Representation  Default
Encoding             Default

Alphabetic List of Variables and Attributes

#  Variable     Type  Len  Format  Informat  Label

8  Birth_Date   Num     8  DATE9.  DATE9.    Birth Date
7  Country      Char    2  $2.     $2.       Country
1  Employee_ID  Num     8                    Employee ID
2  First_Name   Char   12  $12.    $12.      First Name
4  Gender       Char    1  $1.     $1.       Gender
9  Hire_Date    Num     8  DATE9.  DATE9.    Hire Date
6  Job_Title    Char   20  $20.    $20.      Job Title
3  Last_Name    Char   18  $18.    $18.      Last Name
5  Salary       Num     8                    Salary


SAS Name Literals


By default, special characters such as the $ are not
allowed in data set names.
SAS name literals enable special characters to be
included in data set names.
A SAS name literal is a name token that is expressed as
a string within quotation marks, followed by the letter n.

orionxls.'Australia$'n

In this example, 'Australia$'n is the SAS name literal.

The PRINT Procedure
proc print data=orionxls.'Australia$'n;
run;

Partial PROC PRINT Output

     Employee_                                                                   Birth_
Obs         ID  First_Name  Last_Name   Gender  Salary  Job_Title       Country  Date       Hire_Date

  1     120102  Tom         Zhou        M       108255  Sales Manager   AU       11AUG1969  01JUN1989
  2     120103  Wilson      Dawes       M        87975  Sales Manager   AU       22JAN1949  01JAN1974
  3     120121  Irenie      Elvish      F        26600  Sales Rep. II   AU       02AUG1944  01JAN1974
  4     120122  Christina   Ngan        F        27475  Sales Rep. II   AU       27JUL1954  01JUL1978
  5     120123  Kimiko      Hotstone    F        26190  Sales Rep. I    AU       28SEP1964  01OCT1985
  6     120124  Lucian      Daymond     M        26480  Sales Rep. I    AU       13MAY1959  01MAR1979
  7     120125  Fong        Hofmeister  M        32040  Sales Rep. IV   AU       06DEC1954  01MAR1979
  8     120126  Satyakam    Denny       M        26780  Sales Rep. II   AU       20SEP1988  01AUG2006
  9     120127  Sharryn     Clarkson    F        28100  Sales Rep. II   AU       04JAN1979  01NOV1998
 10     120128  Monica      Kletschkus  F        30890  Sales Rep. IV   AU       14JUL1986  01NOV2006
 11     120129  Alvin       Roebuck     M        30070  Sales Rep. III  AU       22NOV1964  01OCT1985
 12     120130  Kevin       Lyon        M        26955  Sales Rep. I    AU       14DEC1984  01MAY2006
 13     120131  Marinus     Surawski    M        26910  Sales Rep. I    AU       25SEP1979  01JAN2003
 14     120132  Fancine     Kaiser      F        28525  Sales Rep. III  AU       05APR1949  01OCT1978
 15     120133  Petrea      Soltau      F        27440  Sales Rep. II   AU       22APR1986  01OCT2006

p106d01

6.01 Quiz
Which PROC PRINT step displays the worksheet
containing employees from the United States?

a. proc print data=orionxls.'UnitedStates';
   run;

b. proc print data=orionxls.'UnitedStates$';
   run;

c. proc print data=orionxls.'UnitedStates'n;
   run;

d. proc print data=orionxls.'UnitedStates$'n;
   run;

Business Scenario

Create a temporary SAS data set named Work.subset2 from the Excel workbook named sales.xls.

libname orionxls 's:\workshop\sales.xls';

data work.subset2;
   set orionxls.'Australia$'n;
   where Job_Title contains 'Rep';
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
   label Job_Title='Sales Title'
         Hire_Date='Date Hired';
   format Salary comma10. Hire_Date weekdate.;
run;

p106d02

Business Scenario
proc contents data=work.subset2;
run;

Partial PROC CONTENTS Output

Alphabetic List of Variables and Attributes

#  Variable    Type  Len  Format     Informat  Label

1  First_Name  Char   10  $10.       $10.      First Name
5  Hire_Date   Num     8  WEEKDATE.  DATE9.    Date Hired
4  Job_Title   Char   14  $14.       $14.      Sales Title
2  Last_Name   Char   12  $12.       $12.      Last Name
3  Salary      Num     8  COMMA10.             Salary

p106d02

Business Scenario
proc print data=work.subset2 label;
run;

Partial PROC PRINT Output

Obs  First Name  Last Name   Salary  Sales Title     Date Hired

  1  Irenie      Elvish      26,600  Sales Rep. II   Tuesday, January 1, 1974
  2  Christina   Ngan        27,475  Sales Rep. II   Saturday, July 1, 1978
  3  Kimiko      Hotstone    26,190  Sales Rep. I    Tuesday, October 1, 1985
  4  Lucian      Daymond     26,480  Sales Rep. I    Thursday, March 1, 1979
  5  Fong        Hofmeister  32,040  Sales Rep. IV   Thursday, March 1, 1979
  6  Satyakam    Denny       26,780  Sales Rep. II   Tuesday, August 1, 2006
  7  Sharryn     Clarkson    28,100  Sales Rep. II   Sunday, November 1, 1998
  8  Monica      Kletschkus  30,890  Sales Rep. IV   Wednesday, November 1, 2006
  9  Alvin       Roebuck     30,070  Sales Rep. III  Tuesday, October 1, 1985
 10  Kevin       Lyon        26,955  Sales Rep. I    Monday, May 1, 2006
 11  Marinus     Surawski    26,910  Sales Rep. I    Wednesday, January 1, 2003
 12  Fancine     Kaiser      28,525  Sales Rep. III  Sunday, October 1, 1978

p106d02

Disassociating a Libref
If SAS has a libref assigned to an Excel workbook, the
workbook cannot be opened in Excel. To disassociate a
libref, use a LIBNAME statement and specify the libref
and the CLEAR option.
libname orionxls 's:\workshop\sales.xls';

data work.subset2;
   set orionxls.'Australia$'n;
   ...
run;

libname orionxls clear;

SAS disconnects from the data source and closes any resources that are associated with
that libref's connection.

p106d02
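
One way to confirm that a libref has been released is the LIBREF function, which returns 0 when the libref is assigned and a nonzero value otherwise. The sketch below calls it through %SYSFUNC before and after the CLEAR. It assumes the same workbook path that is used in this section.

```sas
libname orionxls 's:\workshop\sales.xls';

/* 0 is written to the log because the libref is assigned. */
%put Return code before CLEAR: %sysfunc(libref(orionxls));

libname orionxls clear;

/* A nonzero value is written because the libref is no longer assigned. */
%put Return code after CLEAR: %sysfunc(libref(orionxls));
```

Once the second message shows a nonzero return code, the workbook is no longer held by SAS and can be opened in Excel.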

Reading Excel Worksheets – Windows


p106d02

1. Submit the following program except for the last LIBNAME statement.
libname orionxls 'sales.xls';

data work.subset2;
   set orionxls.'Australia$'n;
   where Job_Title contains 'Rep';
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
   label Job_Title='Sales Title'
         Hire_Date='Date Hired';
   format Salary comma10. Hire_Date weekdate.;
run;

proc contents data=work.subset2;
run;

proc print data=work.subset2 label;
run;

libname orionxls clear;

2. Review the PROC CONTENTS and PROC PRINT results in the Output window.

3. Select the Explorer tab on the SAS window bar to activate the SAS Explorer or select
   View ⇨ Contents Only.

4. Double-click Libraries to show all available libraries.

5. Double-click on the Orionxls library to show all Excel worksheets of that library.

6. Submit the last LIBNAME statement to disassociate the libref.

Exercises

Level 1

1. Reading an Excel Worksheet


a. Retrieve the starter program p106e01.

b. Add a LIBNAME statement before the PROC CONTENTS step to create a libref called
   CUSTFM that references the Excel workbook named custfm.xls.
c. Submit the LIBNAME statement and the PROC CONTENTS step to create the following partial
   PROC CONTENTS report:

Page 1 of 3

The CONTENTS Procedure

Directory

Libref         CUSTFM
Engine         EXCEL
Physical Name  custfm.xls
User           Admin

                          DBMS
                 Member   Member
#  Name          Type     Type

1  Females$      DATA     TABLE
2  Males$        DATA     TABLE

d. Add a SET statement in the DATA step to read the worksheet containing the male data.
e. Add a KEEP statement in the DATA step to include only the First_Name, Last_Name, and
   Birth_Date variables.
f. Add a FORMAT statement in the DATA step to display the Birth_Date as a four-digit year.
g. Add a LABEL statement to change the column header of Birth_Date to Birth Year.
h. Submit the program including the last LIBNAME statement and create the following PROC
   PRINT report:

Partial PROC PRINT Output (First 5 of 47 Observations)

                             Birth
Obs  First Name  Last Name   Year

  1  James       Kvarniq     1974
  2  David       Black       1969
  3  Markus      Sepke       1988
  4  Ulrich      Heyde       1939
  5  Jimmie      Evans       1954

Level 2

2. Reading an Excel Worksheet


a. Write a LIBNAME statement to create a libref called PROD that references the Excel workbook
named products.xls.
b. Write a PROC CONTENTS step to view all of the contents of PROD.

c. Submit the program to determine the names of the four worksheets in products.xls.
d. Write a DATA step to read the worksheet containing sports data to create a new data set called
   Work.golf.

   The data set Work.golf should
   • include only the observations where Category is equal to Golf
   • not include the Category variable
   • include a label of Golf Products for the Name variable.
e. Write a LIBNAME statement to clear the PROD libref.
f. Write a PROC PRINT step to create the following report:

Partial PROC PRINT Output (First 10 of 56 Observations)

Obs  Golf Products

  1  Ball Bag
  2  Red/White/Black Staff 9 Bag
  3  Tee Holder
  4  Bb Softspikes - Xp 22-pack
  5  Bretagne Performance Tg Men's Golf Shoes L.
  6  Bretagne Soft-Tech Men's Glove, left
  7  Bretagne St2 Men's Golf Glove, left
  8  Bretagne Stabilites 2000 Goretex Shoes
  9  Bretagne Stabilities Tg Men's Golf Shoes
 10  Bretagne Stabilities Women's Golf Shoes

Level 3

3. Reading a Range of an Excel Worksheet


a. Write a LIBNAME statement to create a libref called XLSDATA that references the Excel
workbook named custcaus.xls. The worksheets in this Excel workbook do not have column
names. Add the appropriate option to the LIBNAME statement that specifies not to use the first
row of data as column names.

Documentation on the appropriate option can be found in the SAS Help and
Documentation from the Contents tab (SAS Products ⇨ SAS/ACCESS ⇨
SAS/ACCESS 9.2 for PC Files: Reference ⇨ LIBNAME Statement and
Pass-Through Facility on 32-Bit Microsoft Windows ⇨ File-Specific Reference ⇨
Microsoft Excel Workbook Files).
b. Write a PROC CONTENTS step to view all of the contents of XLSDATA.
c. Submit the program. Any member not containing a dollar sign in the name is an Excel range.
d. Write a DATA step to read the range containing Germany (DE) data to create a new data set
   called Work.germany. Add appropriate labels and formats based on the desired report.
e. Write a LIBNAME statement to clear the XLSDATA libref.
f. Write a PROC PRINT step to create the following report:

     Customer                                                   Birth
Obs        ID  Country  Gender  First Name   Last Name   Date

  1         9  DE       F       Cornelia     Krahl       27/02/74
  2        11  DE       F       Elke         Wallstab    16/08/74
  3        13  DE       M       Markus       Sepke       21/07/88
  4        16  DE       M       Ulrich       Heyde       16/01/39
  5        19  DE       M       Oliver S.    Füßling     23/02/64
  6        33  DE       M       Rolf         Robak       24/02/39
  7        42  DE       M       Thomas       Leitmann    09/02/79
  8        50  DE       M       Gert-Gunter  Mendler     16/01/34
  9        61  DE       M       Carsten      Maestrini   08/07/44
 10        65  DE       F       Ines         Deisser     20/07/69


6.2 Doing More with Excel Worksheets (Self-Study)

Objectives
• Use the DATA step to create an Excel worksheet from a SAS data set.
• Use the COPY procedure to create an Excel worksheet from a SAS data set.
• Use the IMPORT Wizard and procedure to read an Excel worksheet.
• Use the EXPORT Wizard and procedure to create an Excel worksheet.

Creating Excel Worksheets

In addition to reading an Excel worksheet, the SAS/ACCESS LIBNAME statement with the DATA
step can be used to create an Excel worksheet.

libname orionxls
        's:\workshop\qtr2007a.xls';

data orionxls.qtr1_2007;
   set orion.qtr1_2007;
run;

data orionxls.qtr2_2007;
   set orion.qtr2_2007;
run;

proc contents data=orionxls._all_;
run;

libname orionxls clear;

p106d03

Creating Excel Worksheets


Partial SAS Log
70   data orionxls.qtr1_2007;
71      set orion.qtr1_2007;
72
73   run;

NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 22 observations read from the data set ORION.QTR1_2007.
NOTE: The data set ORIONXLS.qtr1_2007 has 22 observations and 5 variables.

74   data orionxls.qtr2_2007;
75      set orion.qtr2_2007;
76   run;

NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 36 observations read from the data set ORION.QTR2_2007.
NOTE: The data set ORIONXLS.qtr2_2007 has 36 observations and 6 variables.

Creating Excel Worksheets
Partial PROC CONTENTS Output

The CONTENTS Procedure

Directory

Libref         ORIONXLS
Engine         EXCEL
Physical Name  qtr2007a.xls
User           Admin

                          DBMS
                 Member   Member
#  Name          Type     Type

1  qtr1_2007     DATA     TABLE
2  qtr1_2007$    DATA     TABLE
3  qtr2_2007     DATA     TABLE
4  qtr2_2007$    DATA     TABLE

Members 1 and 3 (no dollar sign) are named ranges. Members 2 and 4 (names ending in $) are
worksheets.

In Excel, a named range is a descriptive name for a range of cells.
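
Both kinds of member can be read through the libref. The sketch below assumes that qtr2007a.xls was created as shown above. It reads the named range with an ordinary one-level member name and the worksheet with a SAS name literal.

```sas
libname orionxls 's:\workshop\qtr2007a.xls';

/* Named range: a valid SAS name, so no name literal is needed. */
proc print data=orionxls.qtr1_2007;
run;

/* Worksheet: the trailing $ requires a SAS name literal. */
proc print data=orionxls.'qtr1_2007$'n;
run;

libname orionxls clear;
```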


Creating Excel Worksheets
As an alternative to the DATA step, the COPY procedure can be used to create an Excel worksheet.

libname orionxls
        's:\workshop\qtr2007b.xls';

proc copy in=orion out=orionxls;
   select qtr1_2007 qtr2_2007;
run;

proc contents data=orionxls._all_;
run;

libname orionxls clear;

p106d03

Creating Excel Worksheets


Partial SAS Log
82   proc copy in=orion out=orionxls;
83      select qtr1_2007 qtr2_2007;
84   run;

NOTE: Copying ORION.QTR1_2007 to ORIONXLS.QTR1_2007 (memtype=DATA).
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 22 observations read from the data set ORION.QTR1_2007.
NOTE: The data set ORIONXLS.QTR1_2007 has 22 observations and 5 variables.
NOTE: Copying ORION.QTR2_2007 to ORIONXLS.QTR2_2007 (memtype=DATA).
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 36 observations read from the data set ORION.QTR2_2007.
NOTE: The data set ORIONXLS.QTR2_2007 has 36 observations and 6 variables.

Import/Export Wizards and Procedures
The Import/Export Wizards and IMPORT/EXPORT procedures enable you to read and write data
between SAS data sets and external PC files.

The Import/Export Wizards and procedures are part of Base SAS and enable access to delimited
files. If you have a license to SAS/ACCESS Interface to PC File Formats, you can also access
Microsoft Excel, Microsoft Access, dBASE, JMP, Lotus 1-2-3, SPSS, Stata, and Paradox files.

Import/Export Wizards and Procedures


The wizards and procedures have similar capabilities; the wizards are point-and-click interfaces
and the procedures are code-based.

To invoke the wizards from the SAS windowing environment, select File and Import Data or
Export Data.

The Import Wizard
The Import Wizard enables you to read data from an external data source and write it to
a SAS data set.

Steps of the Import Wizard:
1. Select the type of file you are importing.
2. Locate the input file.
3. Select the table range or worksheet from which to import data.
4. Select a location to store the imported file.
5. Save the generated PROC IMPORT code. (Optional)

The Import Wizard
SAS Log

NOTE: WORK.SUBSET2A data set was successfully created.

proc print data=work.subset2a;
run;

Partial PROC PRINT Output

     Employee_                                                                   Birth_
Obs         ID  First_Name  Last_Name   Gender  Salary  Job_Title       Country  Date       Hire_Date

  1     120102  Tom         Zhou        M       108255  Sales Manager   AU       11AUG1969  01JUN1989
  2     120103  Wilson      Dawes       M        87975  Sales Manager   AU       22JAN1949  01JAN1974
  3     120121  Irenie      Elvish      F        26600  Sales Rep. II   AU       02AUG1944  01JAN1974
  4     120122  Christina   Ngan        F        27475  Sales Rep. II   AU       27JUL1954  01JUL1978
  5     120123  Kimiko      Hotstone    F        26190  Sales Rep. I    AU       28SEP1964  01OCT1985

p106d04

The Import Wizard


proc contents data=work.subset2a;
run;

Partial PROC CONTENTS Output

Alphabetic List of Variables and Attributes

#  Variable     Type  Len  Format  Informat  Label

8  Birth_Date   Num     8  DATE9.  DATE9.    Birth Date
7  Country      Char    2  $2.     $2.       Country
1  Employee_ID  Num     8                    Employee ID
2  First_Name   Char   10  $10.    $10.      First Name
4  Gender       Char    1  $1.     $1.       Gender
9  Hire_Date    Num     8  DATE9.  DATE9.    Hire Date
6  Job_Title    Char   14  $14.    $14.      Job Title
3  Last_Name    Char   12  $12.    $12.      Last Name
5  Salary       Num     8                    Salary

p106d04

The IMPORT Procedure


The program p106d04a was created from the Import Wizard.
PROC IMPORT OUT= WORK.subset2a
            DATAFILE= "S:\Workshop\sales.xls"
            DBMS=EXCEL REPLACE;
     RANGE="Australia$";
     GETNAMES=YES;
     MIXED=NO;
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;

p106d04a

OUT=<libref.>SAS-data-set
identifies the output SAS data set.
DATAFILE="filename"
specifies the complete path and filename or a fileref for the input PC file, spreadsheet,
or delimited external file.
DBMS=identifier
specifies the type of data to import. To import a DBMS table, you must specify DBMS= using
a valid database identifier. For example, DBMS=EXCEL specifies to import a Microsoft Excel
worksheet.
REPLACE
overwrites an existing SAS data set. If you do not specify REPLACE, PROC IMPORT does
not overwrite an existing data set.
RANGE="range-name | absolute-range"
subsets a spreadsheet by identifying the rectangular set of cells to import from the specified
spreadsheet.
GETNAMES=YES | NO
for spreadsheets and delimited external files, determines whether to generate SAS variable
names from the column names in the input file's first row of data. Note that if a column name
contains special characters that are not valid in a SAS name, such as a blank, SAS converts
the character to an underscore.

MIXED=YES | NO
converts numeric data values into character data values for a column that contains mixed data
types. The default is NO, which means that numeric data will be imported as missing values in
a character column. If MIXED=YES, then the engine will assign a SAS character type for the
column and convert all numeric data values to character data values.
SCANTEXT=YES | NO
scans the length of text data for a data source column and uses the longest string data that is
found as the SAS column width.
USEDATE=YES | NO
specifies which format to use. If USEDATE=YES, then the DATE. format is used for date/time
columns in the data source table while importing data from an Excel workbook. If
USEDATE=NO, then a DATETIME. format is used for date/time.
SCANTIME=YES | NO
scans all row values for a DATETIME data type field and automatically determines the TIME
data type if only time values (that is, no date or datetime values) exist in the column.
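
The RANGE= statement described above also accepts an absolute range. The sketch below is illustrative only: the output data set name work.subset2b and the cell range A1:F6 are hypothetical choices, and the range assumes that the worksheet data starts in cell A1 with column names in the first row.

```sas
/* Import only the first six columns and first five data rows */
/* of the Australia worksheet (row 1 supplies variable names). */
proc import out=work.subset2b
            datafile="S:\Workshop\sales.xls"
            dbms=excel replace;
   range="Australia$A1:F6";
   getnames=yes;
run;

proc print data=work.subset2b;
run;
```

Restricting the range at import time avoids reading cells that a later KEEP or WHERE statement would discard anyway.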

The Export Wizard
The Export Wizard reads data from a SAS data set and writes it to an external file source.

Steps of the Export Wizard:
1. Select the data set from which you want to export data.
2. Select the type of data source to which you want to export files.
3. Assign the output file.
4. Assign the table name.
5. Save the generated PROC EXPORT code. (Optional)

The Export Wizard
SAS Log

NOTE: File "S:\Workshop\qtr2007c.xls" will be created if the export process succeeds.
NOTE: "qtr1" table was successfully created.


The EXPORT Procedure


The program p106d04b was created from the Export Wizard.
PROC EXPORT DATA= ORION.QTR1_2007
            OUTFILE= "S:\Workshop\qtr2007c.xls"
            DBMS=EXCEL REPLACE;
     RANGE="qtr1";
RUN;

The RANGE statement is not supported and is ignored in the EXPORT procedure.

p106d04b

DATA=<libref.>SAS-data-set
identifies the input SAS data set.
OUTFILE="filename"
specifies the complete path and filename or a fileref for the output PC file, spreadsheet,
or delimited external file.
DBMS=identifier
specifies the type of data to export. To export a DBMS table, you must specify DBMS= by
using a valid database identifier. For example, DBMS=EXCEL specifies to export a table
into a Microsoft Excel worksheet.
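
Because RANGE= is ignored by PROC EXPORT, the SHEET= statement is the usual way to name the worksheet that is created. The sketch below assumes that the orion libref is assigned. The output file name qtr2007d.xls is a hypothetical name chosen so that the files created earlier in this section are not overwritten.

```sas
/* Export the data set to a worksheet named qtr1. */
proc export data=orion.qtr1_2007
            outfile="S:\Workshop\qtr2007d.xls"
            dbms=excel replace;
   sheet="qtr1";
run;
```

If SHEET= is omitted, the worksheet typically takes the name of the SAS data set.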


Exercises

Level 1

4. Using PROC COPY to Create an Excel Worksheet

a. Write a LIBNAME statement to create a libref called MNTH that references a new Excel
   workbook named mnth2007.xls.
b. Write a PROC COPY step that copies orion.mnth7_2007, orion.mnth8_2007,
   and orion.mnth9_2007 to the new Excel workbook.
c. Write a PROC CONTENTS step to view all of the contents of MNTH.
d. Write a LIBNAME statement to clear the MNTH libref.

Level 2

5. Using the Import Wizard to Read an Excel Worksheet

a. Use the Import Wizard to read the products.xls workbook.
   1) Select the worksheet containing children data.
   2) Name the new data set Work.children.
   3) Save the generated PROC IMPORT code to a file called children.sas.
b. Write a PROC PRINT step to create a report of the new data set.
c. Open children.sas to view the PROC IMPORT code.

Level 3

6. Using the EXPORT Procedure to Create an Excel Worksheet

a. Write a PROC EXPORT step to export the data set orion.mnth7_2007 to an Excel
   workbook called mnth7.xls.
b. Submit the program and confirm in the log that the mnth7_2007 worksheet was successfully
   created in mnth7.xls.

6.3 Chapter Review

Chapter Review
1. What statement is used to point to a physical filename including the path, filename, and
   extension of an Excel workbook?
2. What character appears at the end of an Excel worksheet name in the SAS Explorer?
3. What is an example of a SAS name literal?
4. How do you disassociate a libref?


6.4 Solutions

Solutions to Exercises
1. Reading an Excel Worksheet
a. Retrieve the starter program.
b. Add a LIBNAME statement.
libname custfm 'custfm.xls';

proc contents data=custfm._all_;
run;

data work.males;

run;

proc print data=work.males label;
run;

libname custfm clear;
c. Submit the LIBNAME statement and the PROC CONTENTS step.
d. Add a SET statement in the DATA step.
data work.males;
   set custfm.'Males$'n;
run;
e. Add a KEEP statement in the DATA step.
data work.males;
   set custfm.'Males$'n;
   keep First_Name Last_Name Birth_Date;
run;
f. Add a FORMAT statement in the DATA step.
data work.males;
   set custfm.'Males$'n;
   keep First_Name Last_Name Birth_Date;
   format Birth_Date year4.;
run;
g. Add a LABEL statement.
data work.males;
   set custfm.'Males$'n;
   keep First_Name Last_Name Birth_Date;
   format Birth_Date year4.;
   label Birth_Date='Birth Year';
run;
h. Submit the program.


2. Reading an Excel Worksheet


a. Write a LIBNAME statement.
libname prod 'products.xls';
b. Write a PROC CONTENTS step.
proc contents data=prod._all_;
run;
c. Submit the program.
d. Write a DATA step.

data work.golf;
   set prod.'Sports$'n;
   where Category='Golf';
   drop Category;
   label Name='Golf Products';
run;
e. Write a LIBNAME statement.
libname prod clear;
f. Write a PROC PRINT step.
proc print data=work.golf label;
run;

3. Reading a Range of an Excel Worksheet

a. Write a LIBNAME statement.
libname xlsdata 'custcaus.xls' header=no;
b. Write a PROC CONTENTS step.
proc contents data=xlsdata._all_;
run;
c. Submit the program.
d. Write a DATA step.
data work.germany;
   set xlsdata.DE;
   label F1='Customer ID'
         F2='Country'
         F3='Gender'
         F4='First Name'
         F5='Last Name'
         F6='Birth Date';
   format F6 ddmmyy8.;
run;
e. Write a LIBNAME statement.
libname xlsdata clear;

f. Write a PROC PRINT step.


proc print data=work.germany label;
run;
4. Using PROC COPY to Create an Excel Worksheet
a. Write a LIBNAME statement.
libname mnth 'mnth2007.xls';
b. Write a PROC COPY step.
proc copy in=orion out=mnth;

ia l
r
select mnth7_2007 mnth8_2007 mnth9_2007;

e
run;

a t
c. Write a PROC CONTENTS step.
proc contents data=mnth._all_;
run;

y M n
o
d. Write a LIBNAME statement.

a
libname mnth clear;

t r t i
i e i b u
5. Using the Import Wizard to Read an Excel Worksheet

p r t r
a. Use the Import Wizard.

b. Write a PROC PRINT step.
proc print data=work.children;
run;
c. Open children.sas.
PROC IMPORT OUT= WORK.children
DATAFILE= "S:\Workshop\products.xls"
DBMS=EXCEL REPLACE;
RANGE="Children$";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
6. Using the EXPORT Procedure to Create an Excel Worksheet
a. Write a PROC EXPORT step.
proc export data=orion.mnth7_2007
outfile='mnth7.xls'
dbms=excel replace;
run;
b. Submit the program.

Solutions to Student Activities (Polls/Quizzes)

6.01 Quiz – Correct Answer


Which PROC PRINT step displays the worksheet
containing employees from the United States?

a. proc print data=orionxls.'UnitedStates';
run;
b. proc print data=orionxls.'UnitedStates$';
run;
c. proc print data=orionxls.'UnitedStates'n;
run;
d. proc print data=orionxls.'UnitedStates$'n;
run;

Correct answer: d

Solutions to Chapter Review

Chapter Review Answers


1. What statement is used to point to a physical filename
including the path, filename, and extension of an Excel
workbook?
a LIBNAME statement

2. What character appears at the end of an Excel
worksheet name in the SAS Explorer?
$

3. What is an example of a SAS name literal?
orionxls.'Australia$'n

4. How do you disassociate a libref?
CLEAR option
Chapter 7 Reading Delimited Raw Data Files

7.1 Using Standard Delimited Data as Input ............ 7-3
Exercises ............ 7-27
7.2 Using Nonstandard Delimited Data as Input ............ 7-30
Exercises ............ 7-44
7.3 Chapter Review ............ 7-48
7.4 Solutions ............ 7-49
Solutions to Exercises ............ 7-49
Solutions to Student Activities (Polls/Quizzes) ............ 7-52
Solutions to Chapter Review ............ 7-55

7.1 Using Standard Delimited Data as Input

Objectives
Use the DATA step to create a SAS data set from a delimited raw data file.
Examine the compilation and execution phases of the DATA step when reading a raw data file.
Explicitly define the length of a variable by using the LENGTH statement.

Business Scenario
An existing data source contains information on
Orion Star sales employees from Australia and
the United States.

A new SAS data set needs to be created that contains
a subset of this existing data source.
This new SAS data set must contain the following:
only the employees from Australia who are Sales Representatives
the employee's first name, last name, salary, job title, and hired date
labels and formats in the descriptor portion

Business Scenario

Reading SAS Data Sets
libname ;
data ;
set ;
...
run;

Reading Excel Worksheets
libname ;
data ;
set ;
...
run;

Reading Delimited Raw Data Files
data ;
infile ;
input ;
...
run;

sales.csv
Partial sales.csv (comma delimited)
120102,Tom,Zhou,M,108255,Sales Manager,AU,11AUG1969,06/01/1989
120103,Wilson,Dawes,M,87975,Sales Manager,AU,22JAN1949,01/01/1974
120121,Irenie,Elvish,F,26600,Sales Rep. II,AU,02AUG1944,01/01/1974
120122,Christina,Ngan,F,27475,Sales Rep. II,AU,27JUL1954,07/01/1978
120123,Kimiko,Hotstone,F,26190,Sales Rep. I,AU,28SEP1964,10/01/1985
120124,Lucian,Daymond,M,26480,Sales Rep. I,AU,13MAY1959,03/01/1979
120125,Fong,Hofmeister,M,32040,Sales Rep. IV,AU,06DEC1954,03/01/1979
120126,Satyakam,Denny,M,26780,Sales Rep. II,AU,20SEP1988,08/01/2006
120127,Sharryn,Clarkson,F,28100,Sales Rep. II,AU,04JAN1979,11/01/1998
120128,Monica,Kletschkus,F,30890,Sales Rep. IV,AU,14JUL1986,11/01/2006
120129,Alvin,Roebuck,M,30070,Sales Rep. III,AU,22NOV1964,10/01/1985
120130,Kevin,Lyon,M,26955,Sales Rep. I,AU,14DEC1984,05/01/2006
120131,Marinus,Surawski,M,26910,Sales Rep. I,AU,25SEP1979,01/01/2003
120132,Fancine,Kaiser,F,28525,Sales Rep. III,AU,05APR1949,10/01/1978
120133,Petrea,Soltau,F,27440,Sales Rep. II,AU,22APR1986,10/01/2006
120134,Sian,Shannan,M,28015,Sales Rep. II,AU,06JUN1949,01/01/1974
120135,Alexei,Platts,M,32490,Sales Rep. IV,AU,26JAN1969,10/01/1997

The raw data filename needs to be specific to your operating environment.

Business Scenario Syntax
Use the following statements to complete the scenario:

DATA output-SAS-data-set;
LENGTH variable(s) $ length;
INFILE 'raw-data-file-name';
INPUT specifications;
KEEP variable-list;
LABEL variable = 'label'
variable = 'label'
variable = 'label';
FORMAT variable(s) format;
RUN;

The DATA Statement (Review)

The DATA statement begins a DATA step and provides
the name of the SAS data set being created.

DATA output-SAS-data-set;
INFILE 'raw-data-file-name';
INPUT specifications;
<additional SAS statements>
RUN;

The DATA statement can create temporary or permanent
data sets.

The INFILE Statement
The INFILE statement identifies the physical name
of the raw data file to read with an INPUT statement.

DATA output-SAS-data-set;
INFILE 'raw-data-file-name';
INPUT specifications;
<additional SAS statements>
RUN;

The physical name is the name that the operating
environment uses to access the file.

Examples:

Windows          infile 's:\workshop\sales.csv';
UNIX             infile '/users/userid/sales.csv';
z/OS (OS/390)    infile '.workshop.rawdata(sales)';

The INPUT Statement
The INPUT statement describes the arrangement of
values in the raw data file and assigns input values
to the corresponding SAS variables.

DATA output-SAS-data-set;
INFILE 'raw-data-file-name';
INPUT specifications;
<additional SAS statements>
RUN;

The following are input specifications:
column input
formatted input
list input

Column input enables you to read standard data values that are aligned in columns in the raw data file.
Formatted input combines the flexibility of using informats with many of the features of column input.
By using formatted input, you can read nonstandard data for which SAS requires additional instructions.
List input uses a scanning method for locating data values. Data values are not required to be aligned
in columns, but must be separated by at least one blank or other defined delimiter.
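
The three input styles described above can be compared side by side. The following is a minimal sketch, not part of the course programs: the data set names, variables, and records are hypothetical, and the DATALINES statement is used only so that each step carries its own sample data instead of reading an external raw data file.

```sas
/* List input: values are located by scanning for the delimiter. */
data work.list_demo;
   infile datalines dlm=',';
   input ID Name $ Amount;
   datalines;
101,Ann,250
102,Bob,75
;
run;

/* Column input: each value is read from fixed column positions. */
data work.column_demo;
   input ID 1-3 Name $ 5-7 Amount 9-11;
   datalines;
101 Ann 250
102 Bob  75
;
run;

/* Formatted input: a column pointer (@n) and an informat for each value. */
/* COMMA5. reads the nonstandard value 1,250 as the number 1250.          */
data work.formatted_demo;
   input @1 ID 3. @5 Name $3. @9 Amount comma5.;
   datalines;
101 Ann 1,250
102 Bob    75
;
run;
```

Only the formatted input step can read the embedded comma in 1,250, because column input and simple list input expect standard data.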

7.01 Multiple Answer Poll


Which types of raw data files do you read?
a. delimited raw data files
b. raw data files aligned in columns
c. other
d. none
e. not sure

List Input
To read with list input, data values
must be separated with a delimiter
can be in standard or nonstandard form.

Partial sales.csv
120102,Tom,Zhou,M,108255,Sales Manager,AU,11AUG1969,06/01/1989
120103,Wilson,Dawes,M,87975,Sales Manager,AU,22JAN1949,01/01/1974
120121,Irenie,Elvish,F,26600,Sales Rep. II,AU,02AUG1944,01/01/1974
120122,Christina,Ngan,F,27475,Sales Rep. II,AU,27JUL1954,07/01/1978
120123,Kimiko,Hotstone,F,26190,Sales Rep. I,AU,28SEP1964,10/01/1985
120124,Lucian,Daymond,M,26480,Sales Rep. I,AU,13MAY1959,03/01/1979
120125,Fong,Hofmeister,M,32040,Sales Rep. IV,AU,06DEC1954,03/01/1979
120126,Satyakam,Denny,M,26780,Sales Rep. II,AU,20SEP1988,08/01/2006
120127,Sharryn,Clarkson,F,28100,Sales Rep. II,AU,04JAN1979,11/01/1998

Delimiter
A space (blank) is the default delimiter.

The DLM= option can be added to the INFILE statement
to specify an alternate delimiter.

DATA output-SAS-data-set;
INFILE 'raw-data-file-name' DLM='delimiter';
INPUT specifications;
<additional SAS statements>
RUN;

The DLM= option is an alias for the DELIMITER= option.

To specify a tab delimiter on Windows or UNIX, type dlm='09'x.
To specify a tab delimiter on z/OS (OS/390), type dlm='05'x.

The DSD (delimiter-sensitive data) option changes how SAS treats delimiters when you use list input
and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters
as a missing value and removes quotation marks from character values. The DSD option specifies that
when data values are enclosed in quotation marks, delimiters within the value be treated as character data.

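
The effect of the DSD option is easiest to see on a record that contains both a quoted value and consecutive delimiters. The following is a small sketch with hypothetical data set, variable, and record values; it reads inline records with DATALINES, and it uses a LENGTH statement (covered later in this section) so that the character values are not truncated at the default eight bytes.

```sas
data work.dsd_demo;
   length Name $ 20 City $ 12;   /* explicit lengths avoid truncation at 8 bytes */
   infile datalines dsd;          /* DSD sets the default delimiter to a comma    */
   input Name $ City $ Amount;
   datalines;
"Zhou, Tom",Sydney,100
Dawes,,250
;
run;

proc print data=work.dsd_demo;
run;
```

In the first record, the quotation marks are removed and the comma inside them is kept as part of Name. In the second record, the two consecutive commas give City a missing value, and Amount is still read as 250.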
Standard and Nonstandard Data
Standard data is data that SAS can read without
any special instructions.

Examples of standard numeric data:
58 -23 67.23 00.99 5.67E5 1.2E-2

Nonstandard data is any data that SAS cannot read
without a special instruction.

Examples of nonstandard numeric data:
5,823 (23) $67.23 01/12/1999 12MAY2006


List Input for Standard Data


List input specification:

INPUT variable <$>;

Variables must be specified in the order that they
appear in the raw data file, left to right.
$ indicates to store a variable value as a character
value rather than as a numeric value.
The default length for character and numeric variables
is eight bytes.

List Input for Standard Data
Partial sales.csv
120102,Tom,Zhou,M,108255,Sales Manager,AU,11AUG1969,06/01/1989
120103,Wilson,Dawes,M,87975,Sales Manager,AU,22JAN1949,01/01/1974
120121,Irenie,Elvish,F,26600,Sales Rep. II,AU,02AUG1944,01/01/1974
120122,Christina,Ngan,F,27475,Sales Rep. II,AU,27JUL1954,07/01/1978
120123,Kimiko,Hotstone,F,26190,Sales Rep. I,AU,28SEP1964,10/01/1985
120124,Lucian,Daymond,M,26480,Sales Rep. I,AU,13MAY1959,03/01/1979
120125,Fong,Hofmeister,M,32040,Sales Rep. IV,AU,06DEC1954,03/01/1979
120126,Satyakam,Denny,M,26780,Sales Rep. II,AU,20SEP1988,08/01/2006
120127,Sharryn,Clarkson,F,28100,Sales Rep. II,AU,04JAN1979,11/01/1998

input Employee_ID First_Name $ Last_Name $
Gender $ Salary Job_Title $ Country $;

Business Scenario
Create a temporary SAS data set named Work.subset3
from the delimited raw data file named sales.csv.

data work.subset3;
infile 'sales.csv' dlm=',';
input Employee_ID First_Name $ Last_Name $
Gender $ Salary Job_Title $ Country $;
run;

p107d01

Business Scenario
281 data work.subset3;
282 infile 'sales.csv' dlm=',';
283 input Employee_ID First_Name $ Last_Name $
284 Gender $ Salary Job_Title $ Country $;
285 run;

NOTE: The infile 'sales.csv' is:
      File Name=S:\Workshop\sales.csv,
      RECFM=V,LRECL=256

NOTE: 165 records were read from the infile 'sales.csv'.
      The minimum record length was 61.
      The maximum record length was 80.
NOTE: The data set WORK.SUBSET3 has 165 observations and 7 variables.

Business Scenario
proc print data=work.subset3;
run;

Partial PROC PRINT Output

     Employee_  First_    Last_                       Job_
Obs  ID         Name      Name      Gender  Salary  Title     Country
  1  120102     Tom       Zhou      M       108255  Sales Ma  AU
  2  120103     Wilson    Dawes     M        87975  Sales Ma  AU
  3  120121     Irenie    Elvish    F        26600  Sales Re  AU
  4  120122     Christin  Ngan      F        27475  Sales Re  AU
  5  120123     Kimiko    Hotstone  F        26190  Sales Re  AU
  6  120124     Lucian    Daymond   M        26480  Sales Re  AU
  7  120125     Fong      Hofmeist  M        32040  Sales Re  AU
  8  120126     Satyakam  Denny     M        26780  Sales Re  AU
  9  120127     Sharryn   Clarkson  F        28100  Sales Re  AU
 10  120128     Monica    Kletschk  F        30890  Sales Re  AU
 11  120129     Alvin     Roebuck   M        30070  Sales Re  AU
 12  120130     Kevin     Lyon      M        26955  Sales Re  AU

p107d01

DATA Step Processing
The DATA step is processed in two phases:
compilation
execution

[Flowchart: Compile Program; Initialize Variables to Missing (PDV); Execute Read Statement; End of File? Yes: Next Step. No: Execute Other Statements; Output to SAS Data Set.]

Compilation
During the compilation phase, SAS
checks the syntax of the DATA step statements
creates an input buffer to hold the current raw
data file record that is being processed
creates a program data vector (PDV) to hold the
current SAS observation
creates the descriptor portion of the output data set.

data work.subset3;
infile 'sales.csv' dlm=',';
input Employee_ID First_Name $ Last_Name $
Gender $ Salary Job_Title $ Country $;
run;

Input Buffer
         1         2
1234567890123456789012345

PDV
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8

The default length for numeric variables is eight bytes.
For list input, the default length for character variables is eight bytes.

Descriptor Portion Work.subset3
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8

7.02 Multiple Choice Poll
Which statement is true?
a. An input buffer is only created if you are reading data from a raw data file.
b. The PDV at compile time holds the variable name, type, byte size, and initial value.
c. The descriptor portion is the first item that is created at compile time.


Execution
Partial sales.csv
120102,Tom,Zhou, ...
120103,Wilson,Dawes, ...
120121,Irenie,Elvish, ...
120122,Christina,Ngan, ...
120123,Kimiko,Hotstone, ...
120124,Lucian,Daymond, ...
120125,Fong,Hofmeister, ...

data work.subset3;
infile 'sales.csv' dlm=',';
input Employee_ID First_Name $
Last_Name $ Gender $
Salary Job_Title $
Country $;
run;

PDV layout
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
N 8          $ 8         $ 8        $ 8     N 8     $ 8        $ 8

Initialize PDV
Input Buffer: (empty)
PDV:
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
.                                           .

The INPUT statement reads the first record into the input buffer.
Input Buffer
         1         2
1234567890123456789012345
120102,Tom,Zhou,M,108255,
PDV:
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
.                                           .

The INPUT statement assigns the values to the PDV.
PDV:
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
120102       Tom         Zhou       M       108255  Sales Ma   AU

Implicit OUTPUT;
Implicit RETURN;

Output SAS Data Set after First Iteration of DATA Step
Work.subset3
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
120102       Tom         Zhou       M       108255  Sales Ma   AU

Reinitialize PDV
Input Buffer
         1         2
1234567890123456789012345
120102,Tom,Zhou,M,108255,
PDV:
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
.                                           .

The INPUT statement reads the second record into the input buffer.
Input Buffer
         1         2
1234567890123456789012345
120103,Wilson,Dawes,M,879
PDV:
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
.                                           .

The INPUT statement assigns the values to the PDV.
PDV:
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
120103       Wilson      Dawes      M       87975   Sales Ma   AU

Implicit OUTPUT;
Implicit RETURN;

Output SAS Data Set after Second Iteration of DATA Step
Work.subset3
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country
120102       Tom         Zhou       M       108255  Sales Ma   AU
120103       Wilson      Dawes      M       87975   Sales Ma   AU

Continue until EOF

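
The execution steps traced above can be observed directly in the SAS log. The following is a hedged sketch rather than one of the course programs: the data set name work.trace_demo is hypothetical, the records are a shortened copy of the first two lines of sales.csv supplied with DATALINES, and the PUTLOG statement is used here only to write the contents of the PDV to the log.

```sas
data work.trace_demo;
   infile datalines dlm=',';
   putlog 'Top of iteration (before INPUT):';
   putlog _all_;                 /* PDV right after (re)initialization */
   input Employee_ID First_Name $ Last_Name $;
   putlog 'After INPUT:';
   putlog _all_;                 /* PDV after values are assigned      */
   datalines;
120102,Tom,Zhou
120103,Wilson,Dawes
;
run;
```

Each pass through the step writes the PDV twice, including the automatic variables _N_ and _ERROR_. The variables read by INPUT are missing at the top of every iteration. A third "before INPUT" message also appears, because the step stops only when the INPUT statement attempts to read past the last record.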
7.03 Multiple Choice Poll

Which statement is true?
a. Data is read directly from the raw data file to the PDV.
b. At the bottom of the DATA step, the contents of the PDV are output to the output SAS data set.
c. When SAS returns to the top of the DATA step, any variable coming from a SAS data set is set to missing.


The LENGTH Statement


The LENGTH statement defines the length of a variable
explicitly.

General form of the LENGTH statement:

LENGTH variable(s) $ length;

Example:

length First_Name Last_Name $ 12
Gender $ 1;

Business Scenario

Create a temporary SAS data set named Work.subset3
from the delimited raw data file named sales.csv.

data work.subset3;
length First_Name $ 12 Last_Name $ 18
Gender $ 1 Job_Title $ 25
Country $ 2;
infile 'sales.csv' dlm=',';
input Employee_ID First_Name $ Last_Name $
Gender $ Salary Job_Title $ Country $;
run;

p107d02

Business Scenario
proc print data=work.subset3;
run;

Partial PROC PRINT Output

     First_                                                  Employee_
Obs  Name       Last_Name   Gender  Job_Title       Country  ID         Salary
  1  Tom        Zhou        M       Sales Manager   AU       120102     108255
  2  Wilson     Dawes       M       Sales Manager   AU       120103      87975
  3  Irenie     Elvish      F       Sales Rep. II   AU       120121      26600
  4  Christina  Ngan        F       Sales Rep. II   AU       120122      27475
  5  Kimiko     Hotstone    F       Sales Rep. I    AU       120123      26190
  6  Lucian     Daymond     M       Sales Rep. I    AU       120124      26480
  7  Fong       Hofmeister  M       Sales Rep. IV   AU       120125      32040
  8  Satyakam   Denny       M       Sales Rep. II   AU       120126      26780
  9  Sharryn    Clarkson    F       Sales Rep. II   AU       120127      28100
 10  Monica     Kletschkus  F       Sales Rep. IV   AU       120128      30890
 11  Alvin      Roebuck     M       Sales Rep. III  AU       120129      30070
 12  Kevin      Lyon        M       Sales Rep. I    AU       120130      26955

p107d02

Compilation
data work.subset3;
length First_Name $ 12 Last_Name $ 18
Gender $ 1 Job_Title $ 25
Country $ 2;
infile 'sales.csv' dlm=',';
input Employee_ID First_Name $ Last_Name $
Gender $ Salary Job_Title $ Country $;
run;

PDV after the LENGTH statement is compiled
First_Name  Last_Name  Gender  Job_Title  Country
$ 12        $ 18       $ 1     $ 25       $ 2

PDV after the INPUT statement is compiled
First_Name  Last_Name  Gender  Job_Title  Country  Employee_ID  Salary
$ 12        $ 18       $ 1     $ 25       $ 2      N 8          N 8

Because the LENGTH statement precedes the INPUT statement, the character variables are added to the PDV first. Employee_ID and Salary are added when the INPUT statement is compiled, which is why they appear last in the PROC PRINT output.


Exercises

Level 1

1. Reading a Comma-Delimited Raw Data File


a. Retrieve the starter program p107e01.

b. Add the appropriate LENGTH, INFILE, and INPUT statements to read the comma-delimited
raw data file named the following:

Windows or UNIX     newemps.csv
z/OS (OS/390)       .workshop.rawdata(newemps)

Partial Raw Data File
Satyakam,Denny,Sales Rep. II,26780
Monica,Kletschkus,Sales Rep. IV,30890
Kevin,Lyon,Sales Rep. I,26955
Petrea,Soltau,Sales Rep. II,27440
Marina,Iyengar,Sales Rep. III,29715

The following variables should be read into the program data vector:

Name    Type       Length
First   Character  12
Last    Character  18
Title   Character  25
Salary  Numeric    8

c. Submit the program to create the following PROC PRINT report:
Partial PROC PRINT Output (First 5 of 71 Observations)

Obs  First     Last        Title           Salary
  1  Satyakam  Denny       Sales Rep. II    26780
  2  Monica    Kletschkus  Sales Rep. IV    30890
  3  Kevin     Lyon        Sales Rep. I     26955
  4  Petrea    Soltau      Sales Rep. II    27440
  5  Marina    Iyengar     Sales Rep. III   29715


Level 2

2. Reading a Space-Delimited Raw Data File


a. Write a DATA step to create a new data set named Work.QtrDonation by reading
the space-delimited raw data file named the following:

Windows or UNIX     donation.dat
z/OS (OS/390)       .workshop.rawdata(donation)

Partial Raw Data File
120265 . . . 25
120267 15 15 15 15
120269 20 20 20 20
120270 20 10 5 .
120271 20 20 20 20

The following variables should be read into the program data vector:

Name   Type       Length
IDNum  Character  6
Qtr1   Numeric    8
Qtr2   Numeric    8
Qtr3   Numeric    8
Qtr4   Numeric    8

b. Write a PROC PRINT step to create the following report:
Partial PROC PRINT Output (First 10 of 124 Observations)

Obs  IDNum   Qtr1  Qtr2  Qtr3  Qtr4
  1  120265     .     .     .    25
  2  120267    15    15    15    15
  3  120269    20    20    20    20
  4  120270    20    10     5     .
  5  120271    20    20    20    20
  6  120272    10    10    10    10
  7  120275    15    15    15    15
  8  120660    25    25    25    25
  9  120662    10     .     5     5
 10  120663     .     .     5     .


Level 3

3. Using Column Input to Read a Fixed Column Raw Data File


a. Write a DATA step to create a new data set named Work.supplier_info by reading
the fixed column raw data file named the following:

Windows or UNIX     supplier.dat
z/OS (OS/390)       .workshop.rawdata(supplier)

Use column input in the INPUT statement to read the fixed column data.

Documentation on column input can be found in the SAS Help
and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
Statements ⇒ INPUT Statement, Column).

Partial Raw Data File
50     Scandinavian Clothing A/S       NO
109    Petterson AB                    SE
316    Prime Sports Ltd                GB
755    Top Sports                      DK
772    AllSeasons Outdoor Clothing     US

The following is the layout of the raw data file:

Name     Starting Column  Ending Column
ID       1                5
Name     8                37
Country  40               41

b. Write a PROC PRINT step to create the following report:
Partial PROC PRINT Output (First 10 of 52 Observations)

Obs    ID  Name                         Country
  1    50  Scandinavian Clothing A/S    NO
  2   109  Petterson AB                 SE
  3   316  Prime Sports Ltd             GB
  4   755  Top Sports                   DK
  5   772  AllSeasons Outdoor Clothing  US
  6   798  Sportico                     ES
  7  1280  British Sports Ltd           GB
  8  1303  Eclipse Inc                  US
  9  1684  Magnifico Sports             PT
 10  1747  Pro Sportswear Inc           US


7.2 Using Nonstandard Delimited Data as Input

Objectives
Use informats to read nonstandard data.
Add additional SAS statements to perform further processing in the DATA step.
Use the DSD option with list input to read consecutive delimiters as missing values.
Use the MISSOVER option to recognize missing values at the end of a record (Self-Study).

Standard and Nonstandard Data
Standard data is data that SAS can read without
any special instructions.

Examples of standard numeric data:
58 -23 67.23 00.99 5.67E5 1.2E-2

Nonstandard data is any data that SAS cannot read
without a special instruction.

Examples of nonstandard numeric data:
5,823 (23) $67.23 01/12/1999 12MAY2006


List Input for Nonstandard Data


List input specification:

INPUT variable <$> variable < :informat >;

The : format modifier enables you to use an informat to read nonstandard delimited data.
An informat is an instruction that SAS uses to read data values into a variable.
The width of the informat can be omitted.
For character variables, the width of the informat determines the variable length, if it has not been previously defined.

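The effect of the : modifier can be tried with in-stream data. The following is a minimal sketch, not part of the course programs; the data set name work.colon_demo, the variable names, and the data values are hypothetical.

```sas
/* Hypothetical example: data set, variables, and values are illustrative only. */
data work.colon_demo;
   length Name $ 12;
   /* The : modifier reads each field up to the next delimiter (a blank here)
      and then applies the informat, so no informat width is needed. */
   input Name $ Start_Date :date. Bonus :comma.;
   datalines;
Ana 12MAY2006 $5,823
Raj 3JAN2007 1,250.75
;
run;

proc print data=work.colon_demo;
   format Start_Date date9.;
run;
```

Without the colon, date. and comma. would be applied as formatted input with their default widths instead of reading up to the delimiter.
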
SAS Informats

SAS informats have the following form:

<$>informat<w>.<d>

$          indicates a character informat.
informat   names the SAS informat or user-defined informat.
w          specifies the number of columns to read in the input data.
.          is a required delimiter.
d          specifies an optional decimal scaling factor in the numeric informats.

SAS Informats
Selected SAS Informats:

Informat      Definition
$w.           reads standard character data.
w.d           reads standard numeric data.
COMMAw.d      reads nonstandard numeric data and removes embedded commas,
DOLLARw.d     blanks, dollar signs, percent signs, and dashes.
COMMAXw.d     reads nonstandard numeric data and removes embedded periods,
DOLLARXw.d    blanks, dollar signs, percent signs, and dashes.
EUROXw.d      reads nonstandard numeric data and removes embedded characters
              in European currency.

SAS Informats
In list input, informats are used to convert nonstandard numeric data to SAS numeric values.

Informat    Raw Data Value   SAS Data Value
COMMA.      $12,345          12345
DOLLAR.
COMMAX.     $12.345          12345
DOLLARX.
EUROX.      €12.345          12345

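One quick way to try a numeric informat is the INPUT function, which applies an informat to a character string. The sketch below is an illustration only; the variable names are hypothetical, and the width of 7 is an assumption chosen to match the length of each string.

```sas
/* Hypothetical check of two numeric informats; nothing is written to a data set. */
data _null_;
   US_Style   = input('$12,345', comma7.);   /* comma is the thousands separator  */
   Euro_Style = input('$12.345', commax7.);  /* period is the thousands separator */
   put US_Style= Euro_Style=;                /* writes both values to the SAS log */
run;
```

According to the table of raw and SAS data values, both variables should be written to the log as 12345.
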

SAS Informats
SAS uses date informats to read and convert dates
to SAS date values.

Informat    Raw Data Value   SAS Data Value
MMDDYY.     010160           0
            01/01/60
            01/01/1960
DDMMYY.     311260           365
            31/12/60
            31/12/1960
DATE.       31DEC59          -1
            31DEC1959

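A SAS date value is the number of days since January 1, 1960, which is why the three informats in the table produce 0, 365, and -1. The following sketch, with hypothetical variable names and explicit widths chosen as an assumption, reproduces those conversions with the INPUT function.

```sas
/* Hypothetical check of the date informats; the values go to the log only. */
data _null_;
   d1 = input('01/01/1960', mmddyy10.);   /* month/day/year */
   d2 = input('31/12/1960', ddmmyy10.);   /* day/month/year */
   d3 = input('31DEC1959', date9.);       /* ddMONyyyy      */
   put d1= d2= d3=;                        /* unformatted SAS date values     */
   put d1= date9. d2= date9. d3= date9.;   /* the same values shown as dates  */
run;
```

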
List Input for Nonstandard Data
Partial sales.csv
120102,Tom,Zhou,M,108255,Sales Manager,AU,11AUG1969,06/01/1989
120103,Wilson,Dawes,M,87975,Sales Manager,AU,22JAN1949,01/01/1974
120121,Irenie,Elvish,F,26600,Sales Rep. II,AU,02AUG1944,01/01/1974
120122,Christina,Ngan,F,27475,Sales Rep. II,AU,27JUL1954,07/01/1978
120123,Kimiko,Hotstone,F,26190,Sales Rep. I,AU,28SEP1964,10/01/1985
120124,Lucian,Daymond,M,26480,Sales Rep. I,AU,13MAY1959,03/01/1979
120125,Fong,Hofmeister,M,32040,Sales Rep. IV,AU,06DEC1954,03/01/1979
120126,Satyakam,Denny,M,26780,Sales Rep. II,AU,20SEP1988,08/01/2006
120127,Sharryn,Clarkson,F,28100,Sales Rep. II,AU,04JAN1979,11/01/1998

input Employee_ID First_Name $ Last_Name $
      Gender $ Salary Job_Title $ Country $
      Birth_Date :date.
      Hire_Date :mmddyy.;

7.04 Quiz
Which INPUT statement correctly uses list input to read
the space-delimited raw data file?

Raw Data
Donny 5MAY2008 25 FL $43,132.50
Margaret 20FEB2008 43 NC 65,150

a. input name $ hired date. age
         state $ salary comma.;

b. input name $ hired :date. age
         state $ salary :comma.;

Business Scenario
Create a temporary SAS data set named Work.subset3 from the delimited raw data file named sales.csv.

data work.subset3;
   length First_Name $ 12 Last_Name $ 18
          Gender $ 1 Job_Title $ 25
          Country $ 2;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $
         Birth_Date :date.
         Hire_Date :mmddyy.;
run;
p107d03

Business Scenario
proc print data=work.subset3;
run;

Partial PROC PRINT Output


     First_                                                  Employee_          Birth_   Hire_
Obs  Name       Last_Name   Gender  Job_Title       Country         ID  Salary    Date    Date
  1  Tom        Zhou        M       Sales Manager   AU          120102  108255    3510   10744
  2  Wilson     Dawes       M       Sales Manager   AU          120103   87975   -3996    5114
  3  Irenie     Elvish      F       Sales Rep. II   AU          120121   26600   -5630    5114
  4  Christina  Ngan        F       Sales Rep. II   AU          120122   27475   -1984    6756
  5  Kimiko     Hotstone    F       Sales Rep. I    AU          120123   26190    1732    9405
  6  Lucian     Daymond     M       Sales Rep. I    AU          120124   26480    -233    6999
  7  Fong       Hofmeister  M       Sales Rep. IV   AU          120125   32040   -1852    6999
  8  Satyakam   Denny       M       Sales Rep. II   AU          120126   26780   10490   17014
  9  Sharryn    Clarkson    F       Sales Rep. II   AU          120127   28100    6943   14184
 10  Monica     Kletschkus  F       Sales Rep. IV   AU          120128   30890    9691   17106
 11  Alvin      Roebuck     M       Sales Rep. III  AU          120129   30070    1787    9405
 12  Kevin      Lyon        M       Sales Rep. I    AU          120130   26955    9114   16922
 13  Marinus    Surawski    M       Sales Rep. I    AU          120131   26910    7207   15706
 14  Fancine    Kaiser      F       Sales Rep. III  AU          120132   28525   -3923    6848

Additional SAS Statements
Additional SAS statements can be added to perform further processing in the DATA step.

data work.subset3;
   length First_Name $ 12 Last_Name $ 18
          Gender $ 1 Job_Title $ 25
          Country $ 2;
   infile 'sales.csv' dlm=',';
   input Employee_ID First_Name $ Last_Name $
         Gender $ Salary Job_Title $ Country $
         Birth_Date :date.
         Hire_Date :mmddyy.;
   keep First_Name Last_Name Salary
        Job_Title Hire_Date;
   label Job_Title='Sales Title'
         Hire_Date='Date Hired';
   format Salary dollar12. Hire_Date monyy7.;
run;
p107d04

Additional SAS Statements


proc print data=work.subset3 label;
run;

Partial PROC PRINT Output


     First_                                           Date
Obs  Name       Last_Name   Sales Title       Salary  Hired
  1  Tom        Zhou        Sales Manager   $108,255  JUN1989
  2  Wilson     Dawes       Sales Manager    $87,975  JAN1974
  3  Irenie     Elvish      Sales Rep. II    $26,600  JAN1974
  4  Christina  Ngan        Sales Rep. II    $27,475  JUL1978
  5  Kimiko     Hotstone    Sales Rep. I     $26,190  OCT1985
  6  Lucian     Daymond     Sales Rep. I     $26,480  MAR1979
  7  Fong       Hofmeister  Sales Rep. IV    $32,040  MAR1979
  8  Satyakam   Denny       Sales Rep. II    $26,780  AUG2006
  9  Sharryn    Clarkson    Sales Rep. II    $28,100  NOV1998
 10  Monica     Kletschkus  Sales Rep. IV    $30,890  NOV2006

Additional SAS Statements
The WHERE statement is used to obtain a subset of observations from an input data set.
The WHERE statement cannot be used to select records from a raw data file.
The subsetting IF can subset data that is in the PDV.
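Because the values read from a raw data file exist only in the PDV, a subsetting IF is the statement to use when records are being read with INFILE and INPUT. The following is a minimal sketch under stated assumptions: the data set name work.high_paid and the 30000 threshold are hypothetical, and the in-stream records are shortened versions of sales.csv rows.

```sas
/* Hypothetical example: subsetting while reading delimited raw data. */
data work.high_paid;
   length First_Name $ 12 Last_Name $ 18 Job_Title $ 25;
   infile datalines dlm=',';
   input Employee_ID First_Name $ Last_Name $ Salary Job_Title $;
   /* Subsetting IF: processing continues (and the observation is written)
      only when the condition is true for the values in the PDV. */
   if Salary > 30000;
   datalines;
120102,Tom,Zhou,108255,Sales Manager
120121,Irenie,Elvish,26600,Sales Rep. II
120125,Fong,Hofmeister,32040,Sales Rep. IV
;
run;

proc print data=work.high_paid;
run;
```

A WHERE statement in the same DATA step would not work here, because there is no input SAS data set for it to filter.
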

Missing Values in the Middle of the Record


Each record in phone2.csv has a contact name, phone number, and a mobile number. The phone number is missing from some of the records.

Missing data is indicated by two consecutive delimiters.

phone2.csv
         1    1    2    2    3    3    4    4
1---5----0----5----0----5----0----5----0----5
James Kvarniq,(704) 293-8126,(701) 281-8923
Sandrina Stephano,,(919) 271-4592
Cornelia Krahl,(212) 891-3241,(212) 233-5413
Karen Ballinger,,(714) 644-9090
Elke Wallstab,(910) 763-5561,(910) 545-3421

7.05 Quiz
Open and submit p107a01. Examine the SAS log. How many input records were read and how many observations were created?

data contacts;
   length Name $ 20 Phone Mobile $ 14;
   infile 'phone2.csv' dlm=',';
   input Name $ Phone $ Mobile $;
run;

proc print data=contacts noobs;
run;

Unexpected Results
The missing phone numbers caused unexpected results
in the output.
PROC PRINT Output
Name                Phone            Mobile
James Kvarniq       (704) 293-8126   (701) 281-8923
Sandrina Stephano   (919) 271-4592   Cornelia Krahl
Karen Ballinger     (714) 644-9090   Elke Wallstab

Partial SAS Log
NOTE: 5 records were read from the infile 'phone2.csv'.
      The minimum record length was 31.
      The maximum record length was 44.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.CONTACTS has 3 observations and 3 variables.

Consecutive Delimiters in List Input
By default, list input treats two or more consecutive delimiters as a single delimiter, not as a missing value.

The two consecutive commas are not being read as a missing value.

phone2.csv
         1    1    2    2    3    3    4    4
1---5----0----5----0----5----0----5----0----5
James Kvarniq,(704) 293-8126,(701) 281-8923
Sandrina Stephano,,(919) 271-4592
Cornelia Krahl,(212) 891-3241,(212) 233-5413
Karen Ballinger,,(714) 644-9090
Elke Wallstab,(910) 763-5561,(910) 545-3421

The DSD Option


The DSD option for the INFILE statement
   sets the default delimiter to a comma
   treats consecutive delimiters as missing values
   enables SAS to read values with embedded delimiters if the value is surrounded by quotation marks.

General form of a DSD option in an INFILE statement:

INFILE 'raw-data-file-name' DSD;

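The second and third behaviors of DSD can be seen together with in-stream data. This is a minimal sketch, not a course program: the data set name work.dsd_demo is hypothetical, and the first record has been rewritten with a quoted name that contains a comma purely for illustration.

```sas
/* Hypothetical example: DSD with a quoted value and consecutive delimiters. */
data work.dsd_demo;
   length Name $ 20 Phone Mobile $ 14;
   infile datalines dsd;            /* DSD sets the default delimiter to a comma */
   input Name $ Phone $ Mobile $;
   datalines;
"Kvarniq, James",(704) 293-8126,(701) 281-8923
Sandrina Stephano,,(919) 271-4592
;
run;

proc print data=work.dsd_demo noobs;
run;
```

In the first record the comma inside the quotation marks is kept as part of Name and the quotation marks are removed. In the second record the two consecutive commas leave Phone missing.
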
Using the DSD Option
Adding the DSD option will correctly read the phone2.csv data file.

data contacts;
   length Name $ 20 Phone Mobile $ 14;
   infile 'phone2.csv' dsd;
   input Name $ Phone $ Mobile $;
run;

proc print data=contacts noobs;
run;

The DLM=',' option is no longer needed in the INFILE statement because the DSD option sets the default delimiter to a comma.
p107d05

Results
Adding the DSD option gives the expected results.
PROC PRINT Output
Name                Phone            Mobile
James Kvarniq       (704) 293-8126   (701) 281-8923
Sandrina Stephano                    (919) 271-4592
Cornelia Krahl      (212) 891-3241   (212) 233-5413
Karen Ballinger                      (714) 644-9090
Elke Wallstab       (910) 763-5561   (910) 545-3421

Partial SAS Log
NOTE: 5 records were read from the infile 'phone2.csv'.
      The minimum record length was 31.
      The maximum record length was 44.
NOTE: The data set WORK.CONTACTS has 5 observations and 3 variables.

Missing Values at the End of a Record (Self-Study)
The data values in phone.csv are separated by commas. Each record has a contact name, and then a phone number, and finally a mobile number.

The mobile number and comma delimiter are missing from some of the lines of data.

phone.csv
         1    1    2    2    3    3    4    4
1---5----0----5----0----5----0----5----0----5
James Kvarniq,(704) 293-8126,(701) 281-8923
Sandrina Stephano,(919) 871-7830
Cornelia Krahl,(212) 891-3241,(212) 233-5413
Karen Ballinger,(714) 344-4321
Elke Wallstab,(910) 763-5561,(910) 545-3421

7.06 Quiz (Self-Study)


Open and submit p107a02. Examine the SAS log. How many input records were read and how many observations were created?

data contacts;
   length Name $ 20 Phone Mobile $ 14;
   infile 'phone.csv' dsd;
   input Name $ Phone $ Mobile $;
run;

proc print data=contacts noobs;
run;

Unexpected Results (Self-Study)
The missing mobile phone numbers caused unexpected results in the output.
PROC PRINT Output
Name                Phone            Mobile
James Kvarniq       (704) 293-8126   (701) 281-8923
Sandrina Stephano   (919) 871-7830   Cornelia Krahl
Karen Ballinger     (714) 344-4321   Elke Wallstab

Partial SAS Log
NOTE: 5 records were read from the infile 'phone.csv'.
      The minimum record length was 31.
      The maximum record length was 44.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.CONTACTS has 3 observations and 3 variables.

Missing Values at the End of a Record (Self-Study)
By default, when there is missing data at the end of a row, SAS does the following:
   loads the next record to finish the observation
   writes a note to the log

The MISSOVER Option (Self-Study)
r S
The MISSOVER option prevents SAS from loading a new record when the end of the current record is reached.

General form of an INFILE statement with a MISSOVER option:

INFILE 'raw-data-file-name' MISSOVER;

If SAS reaches the end of the row without finding values for all fields, variables without values are set to missing.
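The behavior described above can be sketched with in-stream data. The data set name work.missover_demo, the variable names, and the records are hypothetical; the second record deliberately ends after the second field.

```sas
/* Hypothetical example: a short record read with MISSOVER. */
data work.missover_demo;
   length Item $ 10;
   infile datalines dlm=',' missover;
   input Item $ Quantity Price;
   datalines;
Tent,4,34.70
Stove,2
Lantern,6,12.50
;
run;

proc print data=work.missover_demo;
run;
```

With MISSOVER, the record for Stove should produce its own observation with Price set to missing, instead of SAS reading the Lantern record to finish the observation.
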

7.07 Quiz (Self-Study)


Open p107a03 and add the MISSOVER option to the INFILE statement. Submit the program and examine the SAS log. How many input records were read and how many observations were created?

data contacts;
   length Name $ 20 Phone Mobile $ 14;
   infile 'phone.csv' dsd;
   input Name $ Phone $ Mobile $;
run;

proc print data=contacts noobs;
run;

Results (Self-Study)
Adding the MISSOVER option gives the expected results.
PROC PRINT Output
Name                Phone            Mobile
James Kvarniq       (704) 293-8126   (701) 281-8923
Sandrina Stephano   (919) 871-7830
Cornelia Krahl      (212) 891-3241   (212) 233-5413
Karen Ballinger     (714) 344-4321
Elke Wallstab       (910) 763-5561   (910) 545-3421

Partial SAS Log
NOTE: 5 records were read from the infile 'phone.csv'.
      The minimum record length was 31.
      The maximum record length was 44.
NOTE: The data set WORK.CONTACTS has 5 observations and 3 variables.

Exercises

Level 1

4. Reading a Comma-Delimited Raw Data File


a. Retrieve the starter program p107e04.

b. Add the appropriate LENGTH, INFILE, and INPUT statements to read the comma-delimited raw data file named the following:

Windows or UNIX   custca.csv
z/OS (OS/390)     .workshop.rawdata(custca)

Partial Raw Data File
Bill,Cuddy,11171,M,16/10/1986,21,15-30 years
Susan,Krasowski,17023,F,09/07/1959,48,46-60 years
Andreas,Rennie,26148,M,18/07/1934,73,61-75 years
Lauren,Krasowski,46966,F,24/10/1986,21,15-30 years
Lauren,Marx,54655,F,18/08/1969,38,31-45 years

The following variables should be read into the program data vector:

Name       Type       Length
First      Character  20
Last       Character  20
ID         Numeric    8
Gender     Character  1
BirthDate  Numeric    8
Age        Numeric    8
AgeGroup   Character  12

c. Add a FORMAT statement and a DROP statement in the DATA step to create a data set that
resembles the following when used in the PROC PRINT step:
Partial PROC PRINT Output (First 5 of 15 Observations)
                                              Birth
Obs  First    Last       Gender  AgeGroup     Date
  1  Bill     Cuddy      M       15-30 years  OCT1986
  2  Susan    Krasowski  F       46-60 years  JUL1959
  3  Andreas  Rennie     M       61-75 years  JUL1934
  4  Lauren   Krasowski  F       15-30 years  OCT1986
  5  Lauren   Marx       F       31-45 years  AUG1969

Level 2

5. Reading a Space-Delimited Raw Data File with Spaces in Data Values

a. Write a DATA step to create a new data set named Work.us_customers by reading the space-delimited raw data file named the following:

Windows or UNIX   custus.dat
z/OS (OS/390)     .workshop.rawdata(custus)

Some of the data values contain spaces. Use an option in the INFILE statement to specify that when data values are enclosed in quotation marks, delimiters within the value are treated as part of the data value.

Partial Raw Data File
"James Kvarniq" 4 M 27JUN1974 33 "31-45 years"
"Sandrina Stephano" 5 F 09JUL1979 28 "15-30 years"
"Karen Ballinger" 10 F 18OCT1984 23 "15-30 years"
"David Black" 12 M 12APR1969 38 "31-45 years"
"Jimmie Evans" 17 M 17AUG1954 53 "46-60 years"

The following variables should be created in the data set Work.us_customers:

Name       Type       Length
Name       Character  20
ID         Numeric    8
Gender     Character  1
BirthDate  Numeric    8
Age        Numeric    8
AgeGroup   Character  12

b. Add a FORMAT statement in the DATA step to make the BirthDate resemble a three-character
month with a four-digit year.

c. Write a PROC PRINT step with a VAR statement to create the following report:
Partial PROC PRINT Output (First 7 of 28 Observations)
                                Birth
Obs  Name               Gender  Date     AgeGroup     Age
  1  James Kvarniq      M       JUN1974  31-45 years   33
  2  Sandrina Stephano  F       JUL1979  15-30 years   28
  3  Karen Ballinger    F       OCT1984  15-30 years   23
  4  David Black        M       APR1969  31-45 years   38
  5  Jimmie Evans       M       AUG1954  46-60 years   53
  6  Tonie Asmussen     M       FEB1954  46-60 years   53
  7  Michael Dineley    M       APR1959  46-60 years   48

Level 3

6. Reading Missing Values at the End of a Record

a. Write a DATA step to create a new data set named Work.prices by reading the asterisk-delimited raw data file named the following:

Windows or UNIX   prices.dat
z/OS (OS/390)     .workshop.rawdata(prices)

Some of the records do not have a value for UnitSalesPrice and the last delimiter is missing.

Documentation on the INFILE statement options can be found in the SAS Help and Documentation from the Contents tab (SAS Products > Base SAS > SAS 9.2 Language Reference: Dictionary > Dictionary of Language Elements > Statements > INFILE Statement).

Partial Raw Data File
210200100009*09JUN2007*31DEC9999*$15.50*$34.70
210200100017*24JAN2007*31DEC9999*$17.80
210200200023*04JUL2007*31DEC9999*$8.25*$19.80
210200600067*27OCT2007*31DEC9999*$28.90
210200600085*28AUG2007*31DEC9999*$17.85*$39.40

The following variables should be read into the program data vector:

Name            Type     Length
ProductID       Numeric  8
StartDate       Numeric  8
EndDate         Numeric  8
UnitCostPrice   Numeric  8
UnitSalesPrice  Numeric  8

b. Write a PROC PRINT step and add a LABEL and a FORMAT statement in the DATA step to create a data set that resembles the following when used in the PROC PRINT step:
Partial PROC PRINT Output (First 10 of 259 Observations)
                                                           Sales
                   Start of    End of      Cost Price  Price per
Obs  Product ID    Date Range  Date Range    per Unit       Unit
  1  210200100009  06/09/2007  12/31/9999       15.50      34.70
  2  210200100017  01/24/2007  12/31/9999       17.80          .
  3  210200200023  07/04/2007  12/31/9999        8.25      19.80
  4  210200600067  10/27/2007  12/31/9999       28.90          .
  5  210200600085  08/28/2007  12/31/9999       17.85      39.40
  6  210200600112  01/04/2007  12/31/9999        9.25      21.80
  7  210200900033  09/17/2007  12/31/9999        6.45      14.20
  8  210200900038  02/01/2007  12/31/9999        9.30      20.30
  9  210201000050  04/02/2007  12/31/9999        9.00      19.60
 10  210201000126  04/22/2007  12/31/9999        2.30       6.50

7.3 Chapter Review

Chapter Review
1. What statement identifies the physical filename of the raw data file to read?

2. What statement describes the arrangement of values in the raw data file?

3. What is the default delimiter when the DLM= option is not used?

4. What are the two phases of DATA step processing?

5. What is a program data vector (PDV)?

6. Why would you use a LENGTH statement?

7. What is an instruction that SAS uses to read data values into a variable?

8. When would you use a : modifier?

7.4 Solutions

Solutions to Exercises
1. Reading a Comma-Delimited Raw Data File
a. Retrieve the starter program.
b. Add the appropriate LENGTH, INFILE, and INPUT statements.
data work.NewEmployees;
   length First $ 12 Last $ 18 Title $ 25;
   infile 'newemps.csv' dlm=',';
   input First $ Last $ Title $ Salary;
run;

proc print data=work.NewEmployees;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(newemps)' dlm=',';
c. Submit the program.
2. Reading a Space-Delimited Raw Data File
a. Write a DATA step.
data work.QtrDonation;
   length IDNum $ 6;
   infile 'donation.dat';
   input IDNum $ Qtr1 Qtr2 Qtr3 Qtr4;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(donation)';
b. Write a PROC PRINT step.
proc print data=work.QtrDonation;
run;
3. Using Column Input to Read a Fixed Column Raw Data File
a. Write a DATA step.
data work.supplier_info;
   infile 'supplier.dat';
   input ID 1-5 Name $ 8-37 Country $ 40-41;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(supplier)';

b. Write a PROC PRINT step.
proc print data=work.supplier_info;
run;
4. Reading a Comma-Delimited Raw Data File
a. Retrieve the starter program.
b. Add the appropriate LENGTH, INFILE, and INPUT statements.
data work.canada_customers;
   length First Last $ 20 Gender $ 1 AgeGroup $ 12;
   infile 'custca.csv' dlm=',';
   input First $ Last $ ID Gender $
         BirthDate :ddmmyy. Age AgeGroup $;
run;

proc print data=work.canada_customers;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(custca)' dlm=',';
c. Add a FORMAT statement and a DROP statement.
data work.canada_customers;
   length First Last $ 20 Gender $ 1 AgeGroup $ 12;
   infile 'custca.csv' dlm=',';
   input First $ Last $ ID Gender $
         BirthDate :ddmmyy. Age AgeGroup $;
   format BirthDate monyy7.;
   drop ID Age;
run;
5. Reading a Space-Delimited Raw Data File with Spaces in Data Values
a. Write a DATA step.
data work.us_customers;
   length Name $ 20 Gender $ 1 AgeGroup $ 12;
   infile 'custus.dat' dlm=' ' dsd;
   input Name $ ID Gender $ BirthDate :date.
         Age AgeGroup $;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(custus)' dlm=' ' dsd;

b. Add a FORMAT statement.
data work.us_customers;
   length Name $ 20 Gender $ 1 AgeGroup $ 12;
   infile 'custus.dat' dlm=' ' dsd;
   input Name $ ID Gender $ BirthDate :date.
         Age AgeGroup $;
   format BirthDate monyy7.;
run;
c. Write a PROC PRINT step.
proc print data=work.us_customers;
   var Name Gender BirthDate AgeGroup Age;
run;
6. Reading Missing Values at the End of a Record
a. Write a DATA step.
data work.prices;
   infile 'prices.dat' dlm='*' missover;
   input ProductID StartDate :date. EndDate :date.
         UnitCostPrice :dollar. UnitSalesPrice :dollar.;
run;
For z/OS (OS/390), the following INFILE statement is used:
infile '.workshop.rawdata(prices)' dlm='*' missover;
b. Write a PROC PRINT step and add a LABEL and a FORMAT statement in the DATA step.
data work.prices;
   infile 'prices.dat' dlm='*' missover;
   input ProductID StartDate :date. EndDate :date.
         UnitCostPrice :dollar. UnitSalesPrice :dollar.;
   label ProductID='Product ID'
         StartDate='Start of Date Range'
         EndDate='End of Date Range'
         UnitCostPrice='Cost Price per Unit'
         UnitSalesPrice='Sales Price per Unit';
   format StartDate EndDate mmddyy10.
          UnitCostPrice UnitSalesPrice 8.2;
run;

proc print data=work.prices label;
run;

Solutions to Student Activities (Polls/Quizzes)

7.02 Multiple Choice Poll – Correct Answer


Which statement is true?
a. An input buffer is only created if you are reading data from a raw data file.
b. The PDV at compile time holds the variable name, type, byte size, and initial value.
c. The descriptor portion is the first item that is created at compile time.

7.03 Multiple Choice Poll – Correct Answer
Which statement is true?
a. Data is read directly from the raw data file to the PDV.
b. At the bottom of the DATA step, the contents of the PDV are output to the output SAS data set.
c. When SAS returns to the top of the DATA step, any variable coming from a SAS data set is set to missing.

7.04 Quiz – Correct Answer


Which INPUT statement correctly uses list input to read
the space-delimited raw data file?

Raw Data
Donny 5MAY2008 25 FL $43,132.50
Margaret 20FEB2008 43 NC 65,150

a. input name $ hired date. age
         state $ salary comma.;

b. input name $ hired :date. age
         state $ salary :comma.;

7.05 Quiz – Correct Answer
Open and submit p107a01. Examine the SAS log. How many input records were read and how many observations were created?

Five records were read from the input file and three observations were created.

7.06 Quiz – Correct Answer (Self-Study)


Open and submit p107a02. Examine the SAS log. How many input records were read and how many observations were created?

Five records were read from the input file, and three observations were created.

7.07 Quiz – Correct Answer (Self-Study)
Open p107a03 and add the MISSOVER option to the INFILE statement. Submit the program and examine the SAS log. How many input records were read and how many observations were created?

data contacts;
   length Name $ 20 Phone Mobile $ 14;
   infile 'phone.csv' dsd missover;
   input Name $ Phone $ Mobile $;
run;

proc print data=contacts noobs;
run;

Five input records were read and five observations were created.
p107a03s

Solutions to Chapter Review

Chapter Review Answers

1. What statement identifies the physical filename of the raw data file to read?
an INFILE statement

2. What statement describes the arrangement of values in the raw data file?
an INPUT statement

3. What is the default delimiter when the DLM= option is not used?
a blank delimiter

4. What are the two phases of DATA step processing?
compilation
execution

5. What is a program data vector (PDV)?
A logical area in memory where SAS holds the current observation

6. Why would you use a LENGTH statement?
A LENGTH statement is used to define a variable length when the default length is not adequate.

7. What is an instruction that SAS uses to read data values into a variable?
an informat

8. When would you use a : modifier?
You use a : modifier with nonstandard raw data that requires list input and an informat.

Chapter 8 Validating and Cleaning Data

8.1 Introduction to Validating and Cleaning Data
8.2 Examining Data Errors When Reading Raw Data Files
    Demonstration: Examining Data Errors
8.3 Validating Data with the PRINT and FREQ Procedures
    Exercises
8.4 Validating Data with the MEANS and UNIVARIATE Procedures
    Exercises
8.5 Cleaning Invalid Data
    Demonstration: Using the Viewtable Window to Clean Data – Windows (Self-Study)
    Demonstration: Using the Viewtable Window to Clean Data – UNIX (Self-Study)
    Demonstration: Using the FSEDIT Window to Clean Data – z/OS (OS/390) (Self-Study)
    Exercises
8.6 Chapter Review
8.7 Solutions
    Solutions to Exercises
    Solutions to Student Activities (Polls/Quizzes)
    Solutions to Chapter Review

8.1 Introduction to Validating and Cleaning Data

Objectives
Define data errors in a raw data file.
Identify procedures for validating data.
Identify techniques for cleaning data.
Define the business scenario that will be used with validating and cleaning data.

Business Scenario
A delimited raw data file containing information on Orion Star non-sales employees from Australia and the United States needs to be read to create a data set.

Requirements of non-sales employee data:
Employee_ID, Salary, Birth_Date, and Hire_Date must be numeric variables.
First, Last, Gender, Job_Title, and Country must be character variables.

8.01 Quiz
What problems will SAS have reading the numeric data
Salary and Hire_Date?

Partial nonsales.csv
120101,Patrick,Lu,M,163040,Director,AU,18AUG1976,01JUL2003
120104,Kareen,Billington,F,46230,Administration Manager,au,11MAY1954,01JAN1981
120105,Liz,Povey,F,27110,Secretary I,AU,21DEC1974,01MAY1999
120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC1944,01JAN1974
120107,Sherie,Sheedy,F,30475,Office Assistant III,AU,01FEB1978,21JAN1953
120108,Gladys,Gromek,F,27660,Warehouse Assistant II,AU,23FEB1984,01AUG2006
120108,Gabriele,Baker,F,26495,Warehouse Assistant I,AU,15DEC1986,01OCT2006
120110,Dennis,Entwisle,M,28615,Warehouse Assistant III,AU,20NOV1949,01NOV1979
120111,Ubaldo,Spillane,M,26895,Security Guard II,AU,23JUL1949,99NOV1978
120112,Ellis,Glattback,F,26550, ,AU,17FEB1969,01JUL1990
120113,Riu,Horsey,F,26870,Security Guard II,AU,10MAY1944,01JAN1974
120114,Jeannette,Buddery,G,31285,Security Manager,AU,08FEB1944,01JAN1974
120115,Hugh,Nichollas,M,2650,Service Assistant I,AU,08MAY1984,01AUG2005
.,Austen,Ralston,M,29250,Service Assistant II,AU,13JUN1959,01FEB1980
120117,Bill,Mccleary,M,31670,Cabinet Maker III,AU,11SEP1964,01APR1986

Data Errors

Data errors occur when data values are not appropriate for the SAS statements that are specified in a program.
SAS detects data errors during program execution.
When a data error is detected, SAS continues to execute the program.

An example of a data error is defining a variable as numeric when the data value is actually character.

NOTE: Invalid data for Salary in line 4 23-29.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6
4         120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC19
      61  44,01JAN1974 72
Employee_ID=120106 First=John Last=Hornsey Gender=M Salary=.
Job_Title=Office Assistant II Country=AU Birth_Date=23/12/1944
Hire_Date=01/01/1974 _ERROR_=1 _N_=4
NOTE: Invalid data for Hire_Date in line 9 63-71.
9         120111,Ubaldo,Spillane,M,26895,Security Guard II,AU,23JUL194
      61  9,99NOV1978 71
Employee_ID=120111 First=Ubaldo Last=Spillane Gender=M Salary=26895
Job_Title=Security Guard II Country=AU Birth_Date=23/07/1949
Hire_Date=. _ERROR_=1 _N_=9

Business Scenario
Additional requirements of non-sales employee data:
Employee_ID must be unique and not missing.
Gender must have a value of F or M.
Salary must be in the numeric range of 24000 – 500000.
Job_Title must not be missing.
Country must have a value of AU or US.
Birth_Date value must occur before Hire_Date value.
Hire_Date must have a value of 01/01/1974 or later.

8.02 Quiz

What problems exist with the data in this partial data set?
Hint: There are nine data problems.

Validating the Data


In general, SAS procedures analyze data, produce output,
or manage SAS files.

In addition, SAS procedures can be used to detect invalid
data.

SAS procedures for validating data:
   PROC PRINT
   PROC FREQ
   PROC MEANS
   PROC UNIVARIATE

The PRINT Procedure
The PRINT procedure can show the job titles that are
missing and the hire dates that occur before the birth
dates.

Obs  Employee_ID  Job_Title             Birth_Date  Hire_Date
  5       120107  Office Assistant III  01/02/1978  21/01/1953
  9       120111  Security Guard II     23/07/1949           .
 10       120112                        17/02/1969  01/07/1990

p108d01

The FREQ Procedure


The FREQ procedure can show if any genders are not
F or M and if any countries are not AU or US.

The FREQ Procedure

                                    Cumulative    Cumulative
Gender     Frequency     Percent     Frequency       Percent
------------------------------------------------------------
F                110       47.01           110         47.01
G                  1        0.43           111         47.44
M                123       52.56           234        100.00

                   Frequency Missing = 1

                                    Cumulative    Cumulative
Country    Frequency     Percent     Frequency       Percent
------------------------------------------------------------
AU                33       14.04            33         14.04
US               196       83.40           229         97.45
au                 3        1.28           232         98.72
us                 3        1.28           235        100.00

The MEANS Procedure

The MEANS procedure can show if any salaries are not
in the range of 24000 to 500000.

The MEANS Procedure

Analysis Variable : Salary

          N
  N    Miss         Minimum         Maximum
-------------------------------------------
234       1         2401.00       433800.00
-------------------------------------------

p108d01

The UNIVARIATE Procedure


The UNIVARIATE procedure can show if any salaries
are not in the range of 24000 to 500000.

Partial PROC UNIVARIATE Output

The UNIVARIATE Procedure
Variable: Salary

         Extreme Observations

-----Lowest-----     -----Highest-----
 Value       Obs       Value       Obs
  2401        20      163040         1
  2650        13      194885       231
 24025        25      207885        28
 24100        19      268455        29
 24390       228      433800        27

p108d01

Cleaning the Data
After the data is validated, the invalid data needs to be
cleaned.

Techniques for cleaning data:
   Editing raw data file outside of SAS
   Interactively editing data set using VIEWTABLE
   Programmatically editing data set using the DATA step
   Programmatically editing data set using the SQL procedure
   Using the SAS DataFlux product dfPower Studio

Ideally, invalid data should be cleaned in the original data source and not in the SAS data set.
Integrity constraints can be placed on a data set to eliminate the possibility of invalid data in the data set.
Integrity constraints are a set of data validation rules that you can specify in order to restrict the data
values that can be stored for a variable in a SAS data file. Integrity constraints help you preserve the
validity and consistency of your data. SAS enforces the integrity constraints when the values associated
with an integrity constraint variable are added, updated, or deleted.
Section 8.5 addresses the situation where you have no choice but to correct the data in the SAS data set.
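
The integrity constraints described above can be sketched with the IC CREATE statement of PROC DATASETS. This is a minimal illustration, not part of the course programs: it assumes that a data set named work.nonsales already exists, and the constraint names (gender_ck, country_ck, jobtitle_nn) are hypothetical. SAS checks the existing rows before it adds a constraint, so these statements would succeed only after the invalid values have been cleaned.

```sas
/* Sketch only: constraint names are hypothetical, and work.nonsales  */
/* is assumed to exist and to contain no rows that violate the rules. */
proc datasets library=work nolist;
   modify nonsales;
   /* Gender must have a value of F or M */
   ic create gender_ck   = check(where=(Gender in ('F','M')));
   /* Country must have a value of AU or US */
   ic create country_ck  = check(where=(Country in ('AU','US')));
   /* Job_Title must not be missing */
   ic create jobtitle_nn = not null(Job_Title);
quit;
```

After the constraints are in place, an attempt to add or update an observation with, for example, Gender='G' is rejected.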

8.2 Examining Data Errors When Reading Raw Data Files

Objectives
Identify data errors.
Demonstrate what happens when a data error is
encountered.
Direct the observations with data errors to a different
data set than the observations without data errors.
(Self-Study)

Business Scenario

A delimited raw data file containing information on Orion
Star non-sales employees from Australia and the United
States needs to be read to create a data set.

Requirements of non-sales employee data:
Employee_ID, Salary, Birth_Date,
and Hire_Date must be numeric variables.
First, Last, Gender, Job_Title, and
Country must be character variables.

8.03 Multiple Choice Poll


Which statements are used to read a delimited
raw data file and create a SAS data set?
a. DATA and SET only
b. DATA and INFILE only
c. DATA, SET, and INPUT only
d. DATA, INFILE, and INPUT only

Data Errors
One type of data error is when the INPUT statement
encounters invalid data in a field.

When SAS encounters a data error, these events occur:
   A note that describes the error is printed in the
   SAS log.
   The input record (contents of the input buffer) being
   read is displayed in the SAS log.
   The values in the SAS observation (contents of the
   PDV) being created are displayed in the SAS log.
   A missing value is assigned to the appropriate
   SAS variable.
   Execution continues.

Data Errors
A note that describes the error is printed in the
SAS log.

Partial SAS Log

NOTE: Invalid data for Salary in line 4 23-29.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6
4         120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC19
       61 44,01JAN1974 72
Employee_ID=120106 First=John Last=Hornsey Gender=M Salary=.
Job_Title=Office Assistant II Country=AU Birth_Date=23/12/1944
Hire_Date=01/01/1974 _ERROR_=1 _N_=4

This note indicates that invalid data was found for variable
Salary in line 4 of the raw data file in columns 23-29.

The input record (contents of the input buffer) being read
is displayed in the SAS log. A ruler is drawn above the raw
data record that contains the invalid data.

The values in the SAS observation (contents of the PDV)
being created are displayed in the SAS log.

A missing value is assigned to the appropriate
SAS variable (Salary=. in the log above).

During the processing of every DATA step, SAS
automatically creates the following temporary variables:
   the _N_ variable, which counts the number of times
   the DATA step begins to iterate
   the _ERROR_ variable, which signals the occurrence
   of an error caused by the data during execution
      0 indicates that no errors exist.
      1 indicates that one or more errors occurred.
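
The _N_ and _ERROR_ variables can also be used in a program's own logic. The sketch below is not one of the course programs; it reuses this chapter's LENGTH, INFILE, and INPUT statements and assumes that nonsales.csv is available in the current directory. The PUTLOG statement writes one extra line to the log for each record that SAS flagged with a data error.

```sas
data work.nonsales;
   length Employee_ID 8 First $ 12 Last $ 18
          Gender $ 1 Salary 8 Job_Title $ 25 Country $ 2
          Birth_Date Hire_Date 8;
   infile 'nonsales.csv' dlm=',';   /* path is an assumption */
   input Employee_ID First $ Last $
         Gender $ Salary Job_Title $ Country $
         Birth_Date :date9.
         Hire_Date :date9.;
   format Birth_Date Hire_Date ddmmyy10.;
   /* _ERROR_ is 1 when the INPUT statement hit invalid data on this iteration. */
   /* _N_ is the iteration number; this step reads one record per iteration.    */
   if _error_ = 1 then putlog 'Check this record: ' _n_= Employee_ID=;
run;
```

Neither _N_ nor _ERROR_ is written to the output data set; both exist only while the DATA step executes.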

Examining Data Errors


p108d02
Submit the following program and review the results in the Output and Log windows.
data work.nonsales;
   length Employee_ID 8 First $ 12 Last $ 18
          Gender $ 1 Salary 8 Job_Title $ 25 Country $ 2
          Birth_Date Hire_Date 8;
   infile 'nonsales.csv' dlm=',';
   input Employee_ID First $ Last $
         Gender $ Salary Job_Title $ Country $
         Birth_Date :date9.
         Hire_Date :date9.;
   format Birth_Date Hire_Date ddmmyy10.;
run;

proc print data=work.nonsales;
run;

For z/OS (OS/390), the following INFILE statement is used:
   infile '.workshop.rawdata(nonsales)' dlm=',';

Partial PROC PRINT Output
Obs  Employee_ID  First      Last        Gender  Salary  Job_Title                Country  Birth_Date  Hire_Date

  1       120101  Patrick    Lu          M       163040  Director                 AU       18/08/1976  01/07/2003
  2       120104  Kareen     Billington  F        46230  Administration Manager   au       11/05/1954  01/01/1981
  3       120105  Liz        Povey       F        27110  Secretary I              AU       21/12/1974  01/05/1999
  4       120106  John       Hornsey     M            .  Office Assistant II      AU       23/12/1944  01/01/1974
  5       120107  Sherie     Sheedy      F        30475  Office Assistant III     AU       01/02/1978  21/01/1953
  6       120108  Gladys     Gromek      F        27660  Warehouse Assistant II   AU       23/02/1984  01/08/2006
  7       120108  Gabriele   Baker       F        26495  Warehouse Assistant I    AU       15/12/1986  01/10/2006
  8       120110  Dennis     Entwisle    M        28615  Warehouse Assistant III  AU       20/11/1949  01/11/1979
  9       120111  Ubaldo     Spillane    M        26895  Security Guard II        AU       23/07/1949           .
 10       120112  Ellis      Glattback   F        26550                           AU       17/02/1969  01/07/1990
 11       120113  Riu        Horsey      F        26870  Security Guard II        AU       10/05/1944  01/01/1974
 12       120114  Jeannette  Buddery     G        31285  Security Manager         AU       08/02/1944  01/01/1974
 13       120115  Hugh       Nichollas   M         2650  Service Assistant I      AU       08/05/1984  01/08/2005
 14            .  Austen     Ralston     M        29250  Service Assistant II     AU       13/06/1959  01/02/1980
 15       120117  Bill       Mccleary    M        31670  Cabinet Maker III        AU       11/09/1964  01/04/1986

Partial SAS Log


181  data work.nonsales;
182     length Employee_ID 8 First $ 12 Last $ 18
183            Gender $ 1 Salary 8 Job_Title $ 25 Country $ 2
184            Birth_Date Hire_Date 8;
185     infile 'nonsales.csv' dlm=',';
186     input Employee_ID First $ Last $
187           Gender $ Salary Job_Title $ Country $
188           Birth_Date :date9.
189           Hire_Date :date9.;
190     format Birth_Date Hire_Date ddmmyy10.;
191  run;

NOTE: The infile 'nonsales.csv' is:
      File Name=s:\workshop\nonsales.csv,
      RECFM=V,LRECL=256

NOTE: Invalid data for Salary in line 4 23-29.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+
4         120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC1944,01JAN1974 72
Employee_ID=120106 First=John Last=Hornsey Gender=M Salary=. Job_Title=Office Assistant II
Country=AU Birth_Date=23/12/1944 Hire_Date=01/01/1974 _ERROR_=1 _N_=4
NOTE: Invalid data for Hire_Date in line 9 63-71.
9         120111,Ubaldo,Spillane,M,26895,Security Guard II,AU,23JUL1949,99NOV1978 71
Employee_ID=120111 First=Ubaldo Last=Spillane Gender=M Salary=26895 Job_Title=Security Guard II
Country=AU Birth_Date=23/07/1949 Hire_Date=. _ERROR_=1 _N_=9
NOTE: 235 records were read from the infile 'nonsales.csv'.
      The minimum record length was 55.
      The maximum record length was 82.
NOTE: The data set WORK.NONSALES has 235 observations and 9 variables.

Setup for the Poll


Submit program p108a01.
Determine the reason for the invalid data that appears
in the SAS log.

8.04 Multiple Choice Poll
Which statement best describes the invalid data?
a. The data in the raw data file is bad.
b. The programmer incorrectly read the data.

Outputting to Multiple Data Sets (Self-Study)


The DATA statement can specify multiple output data sets.
An OUTPUT statement can be used in a conditional
statement to write the current observation to a specific
data set that is listed in the DATA statement.

data work.baddata work.gooddata;
   length Employee_ID 8 First $ 12 Last $ 18
          Gender $ 1 Salary 8 Job_Title $ 25
          Country $ 2 Birth_Date Hire_Date 8;
   infile 'nonsales.csv' dlm=',';
   input Employee_ID First $ Last $
         Gender $ Salary Job_Title $ Country $
         Birth_Date :date9.
         Hire_Date :date9.;
   format Birth_Date Hire_Date ddmmyy10.;
   if _error_=1 then output work.baddata;
   else output work.gooddata;
run;

p108d03

The raw data filename specified in the INFILE statement must be specific to your operating environment.

Outputting to Multiple Data Sets (Self-Study)


Partial SAS Log
NOTE: Invalid data for Salary in line 4 23-29.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6
4         120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC19
       61 44,01JAN1974 72
Employee_ID=120106 First=John Last=Hornsey Gender=M Salary=.
Job_Title=Office Assistant II Country=AU Birth_Date=23/12/1944
Hire_Date=01/01/1974 _ERROR_=1 _N_=4
NOTE: Invalid data for Hire_Date in line 9 63-71.
9         120111,Ubaldo,Spillane,M,26895,Security Guard II,AU,23JUL194
       61 9,99NOV1978 71
Employee_ID=120111 First=Ubaldo Last=Spillane Gender=M Salary=26895
Job_Title=Security Guard II Country=AU Birth_Date=23/07/1949
Hire_Date=. _ERROR_=1 _N_=9
NOTE: 235 records were read from the infile
      's:\workshop\nonsales.csv'.
      The minimum record length was 55.
      The maximum record length was 82.
NOTE: The data set WORK.BADDATA has 2 observations and 9 variables.
NOTE: The data set WORK.GOODDATA has 233 observations and 9 variables.
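
As a follow-up sketch (not one of the course programs), the observations that were routed to work.baddata can be listed to see which values were set to missing. It assumes that the DATA step that creates work.baddata and work.gooddata has already run in the current session.

```sas
/* Assumes work.baddata was created by the multiple-output DATA step. */
proc print data=work.baddata;
   var Employee_ID Salary Hire_Date;
run;
```

Based on the log for that step, the report should contain two observations: Employee_ID 120106 with a missing Salary and Employee_ID 120111 with a missing Hire_Date.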

8.3 Validating Data with the PRINT and FREQ Procedures

Objectives
Validate data by using the PRINT procedure with the
WHERE statement.
Validate data by using the FREQ procedure with the
TABLES statement.

Business Scenario

Additional requirements of non-sales employee data:
Employee_ID must be unique and not missing.
Gender must have a value of F or M.
Salary must be in the numeric range of 24000 – 500000.
Job_Title must not be missing.
Country must have a value of AU or US.
Birth_Date value must occur before Hire_Date value.
Hire_Date must have a value of 01/01/1974 or later.

SAS Procedures for Validating Data


SAS procedures can be used to detect invalid data.

Procedure step                     What it detects
---------------------------------  ------------------------------------------
PROC PRINT step with               invalid character and numeric values, by
VAR and WHERE statements           subsetting observations based on conditions

PROC FREQ step with                invalid character and numeric values, by
TABLES statement                   looking at distinct values

PROC MEANS step with               invalid numeric values, by using
VAR statement                      summary statistics

PROC UNIVARIATE step with          invalid numeric values, by looking at
VAR statement                      extreme values

The PRINT Procedure
The PRINT procedure produces detail reports based
on SAS data sets.

General form of the PRINT procedure:

   PROC PRINT DATA=SAS-data-set ;
      VAR variable(s) ;
      WHERE where-expression ;
   RUN;

The VAR statement selects variables to include in
the report and determines their order in the report.
The WHERE statement is used to obtain a subset
of observations.

The WHERE Statement


For validating data, the WHERE statement is used
to retrieve the observations that do not meet the data
requirements.
General form of the WHERE statement:

   WHERE where-expression ;

The where-expression is a sequence of operands and
operators that form a set of instructions that define a
condition for selecting observations.
   Operands include constants and variables.
   Operators are symbols that request a comparison,
   arithmetic calculation, or logical operation.

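To make the operand and operator terms concrete, here is a small illustrative step. It is a sketch that assumes the orion.nonsales data set used throughout this chapter; the condition itself is chosen only for illustration. Salary and Country are variable operands, 24000 and 'AU' are constant operands, < and = are comparison operators, and AND is a logical operator.

```sas
proc print data=orion.nonsales;
   var Employee_ID Salary Country;
   /* variable < constant, combined with variable = constant */
   where Salary < 24000 and Country = 'AU';
run;
```

Because a missing numeric value compares as less than any number, an observation in Australia with a missing Salary also satisfies this condition.
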
The WHERE Statement
The following PROC PRINT step retrieves observations
that have missing values for Job_Title.

proc print data=orion.nonsales;
   var Employee_ID Last Job_Title;
   where Job_Title = ' ';
run;

Obs  Employee_ID  Last       Job_Title
 10       120112  Glattback

p108d04

The WHERE Statement


A WHERE statement might need to reference
a SAS date value.

For example, the PRINT procedure needs to retrieve
observations that have values of Hire_Date less
than January 1, 1974.

What is the numeric SAS date value for January 1, 1974?

A SAS date constant is used to convert a calendar date
to a SAS date value.

SAS Date Constant

To write a SAS date constant, enclose a date in quotation
marks in the form ddMMMyyyy and immediately follow the
final quotation mark with the letter d.

   dd     is a one- or two-digit value for the day.
   MMM    is a three-letter abbreviation for the month.
   yyyy   is a four-digit value for the year.
   d      is required to convert the quoted string to a SAS date.

Example:
The date constant for January 1, 1974, is '01JAN1974'd.
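
The numeric value behind a date constant can be checked with a short DATA _NULL_ step. This is an illustrative sketch; the variable name cutoff is hypothetical. A SAS date value is the number of days since January 1, 1960, so the first PUT statement writes the day count and the second writes the same value with the DATE9. format.

```sas
data _null_;
   cutoff = '01JAN1974'd;     /* hypothetical variable name             */
   put cutoff=;               /* writes cutoff=5114 to the log          */
   put cutoff= date9.;        /* writes cutoff=01JAN1974 to the log     */
run;
```

The value 5114 is the count of days from January 1, 1960, to January 1, 1974 (14 years of 365 days plus 4 leap days).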

SAS Date Constant


The following PROC PRINT step retrieves observations
that have values of Hire_Date that are less than
January 1, 1974.

proc print data=orion.nonsales;
   var Employee_ID Birth_Date Hire_Date;
   where Hire_Date < '01JAN1974'd;
run;

Obs  Employee_ID  Birth_Date  Hire_Date
  5       120107  01/02/1978  21/01/1953
  9       120111  23/07/1949           .
214       121011  11/03/1944  01/01/1968

Observation 9 is selected because a missing numeric value
compares as less than any nonmissing value.

p108d04

8.05 Multiple Choice Poll
Which data requirement cannot be achieved with
the PRINT procedure using a WHERE statement?
a. Employee_ID must be unique and not missing.
b. Gender must have a value of F or M.
c. Salary must be in the numeric range of 24000 – 500000.
d. Job_Title must not be missing.
e. Country must have a value of AU or US.
f. Birth_Date value must occur before Hire_Date value.
g. Hire_Date must have a value of 01/01/1974 or later.

Data Requirements
Data Requirement                  where-expression to obtain invalid data
--------------------------------  ---------------------------------------
Employee_ID must be unique        Employee_ID = .
and not missing.                  (Does not account for uniqueness.)
Gender must have a value          Gender not in ('F','M')
of F or M.
Salary must be in the range       Salary not between 24000 and 500000
of 24000 – 500000.
Job_Title must not be missing.    Job_Title = ' '
Country must have a value         Country not in ('AU','US')
of AU or US.
Birth_Date must occur before      Birth_Date > Hire_Date
Hire_Date.
Hire_Date must have a value       Hire_Date < '01JAN1974'd
of 01/01/1974 or later.

Data Requirements
The following PROC PRINT step accounts for all of the data
requirements except the Employee_ID being unique.

proc print data=orion.nonsales;
   var Employee_ID Gender Salary Job_Title
       Country Birth_Date Hire_Date;
   where Employee_ID = . or
         Gender not in ('F','M') or
         Salary not between 24000 and 500000 or
         Job_Title = ' ' or
         Country not in ('AU','US') or
         Birth_Date > Hire_Date or
         Hire_Date < '01JAN1974'd;
run;

The OR operator is used between expressions. Only one
expression needs to be true to account for an observation
with invalid data.

p108d04

Data Requirements
Sixteen observations need the data cleaned.

Obs  Employee_ID  Gender  Salary  Job_Title                  Country  Birth_Date  Hire_Date

  2       120104  F        46230  Administration Manager     au       11/05/1954  01/01/1981
  4       120106  M            .  Office Assistant II        AU       23/12/1944  01/01/1974
  5       120107  F        30475  Office Assistant III       AU       01/02/1978  21/01/1953
  9       120111  M        26895  Security Guard II          AU       23/07/1949           .
 10       120112  F        26550                             AU       17/02/1969  01/07/1990
 12       120114  G        31285  Security Manager           AU       08/02/1944  01/01/1974
 13       120115  M         2650  Service Assistant I        AU       08/05/1984  01/08/2005
 14            .  M        29250  Service Assistant II       AU       13/06/1959  01/02/1980
 20       120191  F         2401  Trainee                    AU       17/01/1959  01/01/2003
 84       120695  M        28180  Warehouse Assistant II     au       13/07/1964  01/07/1989
 87       120698  M        26160  Warehouse Assistant I      au       17/05/1954  01/08/1976
101       120723           33950  Corp. Comm. Specialist II  US       10/08/1949  01/01/1974
125       120747  F        43590  Financial Controller I     us       20/06/1974  01/08/1995
197       120994  F        31645  Office Administrator I     us       16/06/1974  01/11/1994
200       120997  F        27420  Shipping Administrator I   us       21/11/1974  01/09/1996
214       121011  M        25735  Service Assistant I        US       11/03/1944  01/01/1968

The FREQ Procedure

The FREQ procedure produces one-way to n-way
frequency tables.

General form of the FREQ procedure:

   PROC FREQ DATA=SAS-data-set <NLEVELS>;
      TABLES variable(s);
   RUN;

The TABLES statement specifies the frequency tables
to produce.
The NLEVELS option displays a table that provides
the number of distinct values for each variable named
in the TABLES statement.

The FREQ Procedure


The following PROC FREQ step will show whether there
are any invalid values for Gender and Country.

proc freq data=orion.nonsales;
   tables Gender Country;
run;

Without the TABLES statement, PROC FREQ
produces a frequency table for each variable.

p108d05

The FREQ Procedure
Two observations need the data cleaned for Gender and
six observations need the data cleaned for Country.

The FREQ Procedure

                                    Cumulative    Cumulative
Gender     Frequency     Percent     Frequency       Percent
------------------------------------------------------------
F                110       47.01           110         47.01
G                  1        0.43           111         47.44
M                123       52.56           234        100.00

                   Frequency Missing = 1

                                    Cumulative    Cumulative
Country    Frequency     Percent     Frequency       Percent
------------------------------------------------------------
AU                33       14.04            33         14.04
US               196       83.40           229         97.45
au                 3        1.28           232         98.72
us                 3        1.28           235        100.00

If a format is permanently assigned to a variable, PROC FREQ automatically groups
the report by the formatted values.

The FREQ Procedure


This PROC FREQ step will show whether there are any
duplicates for Employee_ID.

proc freq data=orion.nonsales;
   tables Employee_ID;
run;

p108d05

The FREQ Procedure
Partial PROC FREQ Output (first rows)

The FREQ Procedure

                                         Cumulative    Cumulative
Employee_ID    Frequency     Percent      Frequency       Percent
-----------------------------------------------------------------
     120101            1        0.43              1          0.43
     120104            1        0.43              2          0.86
     120105            1        0.43              3          1.29
     120106            1        0.43              4          1.72
     120107            1        0.43              5          2.15
     120108            2        0.85              7          2.99
     120110            1        0.43              8          3.43
     120111            1        0.43              9          3.86
     120112            1        0.43             10          4.29
     120113            1        0.43             11          4.72

Partial PROC FREQ Output (last rows)

                                         Cumulative    Cumulative
Employee_ID    Frequency     Percent      Frequency       Percent
-----------------------------------------------------------------
     121130            1        0.43            224         96.14
     121131            1        0.43            225         96.57
     121132            1        0.43            226         97.00
     121133            1        0.43            227         97.42
     121134            1        0.43            228         97.85
     121141            1        0.43            229         98.28
     121142            1        0.43            230         98.71
     121146            1        0.43            232         99.14
     121147            1        0.43            233         99.57
     121148            1        0.43            234        100.00

                     Frequency Missing = 1

The NLEVELS Option


If the number of desired distinct values is known, the
NLEVELS option can help to determine whether there are
any duplicates.

proc freq data=orion.nonsales nlevels;
   tables Gender Country Employee_ID;
run;

The NLEVELS option displays a table that provides the
number of distinct values for each variable named in the
TABLES statement.

p108d05

To display the number of levels without displaying the frequency counts, add the NOPRINT option to the
TABLES statement.

proc freq data=orion.nonsales nlevels;
   tables Gender Country Employee_ID / noprint;
run;

To display the number of levels for all variables without displaying any frequency counts, use the _ALL_
keyword and the NOPRINT option in the TABLES statement.

proc freq data=orion.nonsales nlevels;
   tables _all_ / noprint;
run;

The NLEVELS Option


The Number of Variable Levels table appears before the
individual frequency tables.

Partial PROC FREQ Output

The FREQ Procedure

          Number of Variable Levels

                           Missing    Nonmissing
Variable        Levels      Levels        Levels
------------------------------------------------
Gender               4           1             3
Country              4           0             4
Employee_ID        234           1           233

There are 235 employees but there are only 234 distinct
Employee_ID values. Therefore, there is one
duplicate value for Employee_ID.
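
The NLEVELS table shows that a duplicate exists but not which value is duplicated. One way to find it, sketched here as an illustration rather than as a course program, is to write the frequency counts to a data set with the OUT= option in the TABLES statement and print only the values that occur more than once. The data set name work.id_counts is hypothetical; COUNT is the variable that PROC FREQ creates in the output data set.

```sas
/* Write one row per distinct Employee_ID, with its frequency in COUNT. */
proc freq data=orion.nonsales noprint;
   tables Employee_ID / out=work.id_counts;   /* hypothetical data set name */
run;

/* List only the Employee_ID values that appear more than once. */
proc print data=work.id_counts;
   where Count > 1;
run;
```

For the data shown in this section, the frequency table for Employee_ID suggests that 120108 is the value that would be listed.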

Exercises

Level 1

1. Validating orion.shoes_tracker with the PRINT and FREQ Procedures

a. Retrieve the starter program p108e01.

b. The data in orion.shoes_tracker should meet the following requirements:
   • Product_Category must not be missing.
   • Supplier_Country must have a value of GB or US.

   Add a WHERE statement to the PROC PRINT step to find any observations that
   do not meet the above requirements.

c. Add a VAR statement to create the following PROC PRINT report:

   Obs  Product_Category  Supplier_Name         Supplier_Country  Supplier_ID
     1  Shoes             3Top Sports           us                          .
     2                    3Top Sports           US                       2963
     5  Shoes             3Top Sports           UT                       2963
    10  Shoes             Greenline Sports Ltd  gB                      14682

   How many observations have missing Product_Category? ___________________________

   How many observations have invalid values of Supplier_Country? ___________________

d. Add a PROC FREQ step with a TABLES statement to create frequency tables for
   Supplier_Name and Supplier_ID of orion.shoes_tracker. Include
   the NLEVELS option.

   The data in orion.shoes_tracker should meet the following requirements:
   • Supplier_Name must be 3Top Sports or Greenline Sports Ltd.
   • Supplier_ID must be 2963 or 14682.

   What invalid data exist for Supplier_Name and Supplier_ID? ______________________

Level 2

2. Validating orion.qtr2_2007 with the PRINT and FREQ Procedures

a. Write a PROC PRINT step with a WHERE statement to validate the data in
   orion.qtr2_2007.

   The data in orion.qtr2_2007 should meet the following requirements:
   • Delivery_Date values must be equal to or greater than Order_Date values.
   • Order_Date values must be in the range of April 1, 2007 – June 30, 2007.

   The WHERE statement should find any observations that do not meet the above requirements.

b. Submit the program to create the following PROC PRINT report:

   Obs    Order_ID  Order_Type  Employee_ID  Customer_ID  Order_Date  Delivery_Date
     5  1242012259           1       121040           10   18APR2007      12APR2007
    22  1242449327           3     99999999           27   26JUL2007      26JUL2007

   How many observations have Delivery_Date values occurring before Order_Date values?
   ______________________________________________________________________________

   How many observations have Order_Date values out of the range of April 1, 2007 – June 30,
   2007? ________________________________________________________________________

c. Add a PROC FREQ step with a TABLES statement to create frequency tables for Order_ID and
   Order_Type of orion.qtr2_2007. Include the NLEVELS option.

d. Submit the PROC FREQ step.

   The data in orion.qtr2_2007 should meet the following requirements:
   • Order_ID must be unique (36 distinct values) and not missing.
   • Order_Type must have a value of 1, 2, or 3.

   What invalid data exists for Order_ID and Order_Type? ____________________________

Level 3

3. Using the PROPCASE Function, Two-Way Frequency Table, and MISSING Option
a. Write a PROC PRINT step with a WHERE statement to validate the data in
orion.shoes_tracker. All Product_Name values should be written in proper case.

Documentation on the PROPCASE function can be found in the SAS Help


and Documentation from the Contents tab (SAS Products Base SAS
SAS 9.2 Language Reference: Dictionary Dictionary of Language Elements
Functions and CALL Routines PROPCASE Function).
8-32 Chapter 8 Validating and Cleaning Data

b. Add a VAR statement to create the following PROC PRINT report:


Obs Product_ID Product_Name

3 220200300015 men's running shoes piedmont


6 220200300096 Mns.raptor Precision Sg Football

c. Add a PROC FREQ step with a TABLES statement to create the following two-way frequency
table with Supplier_Name and Supplier_ID of orion.shoes_tracker:
The FREQ Procedure

Table of Supplier_Name by Supplier_ID

Supplier_Name(Supplier Name)     Supplier_ID(Supplier ID)

Frequency        |
Percent          |
Row Pct          |
Col Pct          |       .|    2963|   14682|  Total
-----------------+--------+--------+--------+
3Top Sports      |      1 |      5 |      1 |      7
                 |  10.00 |  50.00 |  10.00 |  70.00
                 |  14.29 |  71.43 |  14.29 |
                 | 100.00 |  71.43 |  50.00 |
-----------------+--------+--------+--------+
3op Sports       |      0 |      2 |      0 |      2
                 |   0.00 |  20.00 |   0.00 |  20.00
                 |   0.00 | 100.00 |   0.00 |
                 |   0.00 |  28.57 |   0.00 |
-----------------+--------+--------+--------+
Greenline Sports |      0 |      0 |      1 |      1
Ltd              |   0.00 |   0.00 |  10.00 |  10.00
                 |   0.00 |   0.00 | 100.00 |
                 |   0.00 |   0.00 |  50.00 |
-----------------+--------+--------+--------+
Total                   1        7        2       10
                    10.00    70.00    20.00   100.00

Documentation on two-way frequency tables and the MISSING option can be found in the SAS Help and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒ Base SAS Procedures Guide: Statistical Procedures ⇒ The FREQ Procedure ⇒ Syntax: FREQ Procedure ⇒ TABLES Statement).
The data in orion.shoes_tracker should meet the following requirements:
• A Supplier_Name of 3Top Sports must have a Supplier_ID of 2963.
• A Supplier_Name of Greenline Sports Ltd must have a Supplier_ID
of 14682.

What invalid data exists for Supplier_Name and Supplier_ID? _____________________


8.4 Validating Data with the MEANS and UNIVARIATE Procedures

Objectives
• Validate data by using the MEANS procedure with the VAR statement.
• Validate data by using the UNIVARIATE procedure with the VAR statement.

The MEANS Procedure

The MEANS procedure produces summary reports that display descriptive statistics.

General form of the MEANS procedure:

PROC MEANS DATA=SAS-data-set <statistics>;
   VAR variable(s);
RUN;

• The VAR statement specifies the analysis variables and their order in the results.
• The statistics to display can be specified in the PROC MEANS statement.
The MEANS Procedure

This PROC MEANS step shows default descriptive statistics for Salary.

proc means data=orion.nonsales;
   var Salary;
run;
p108d06

                        The MEANS Procedure

                    Analysis Variable : Salary

  N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------
234        43954.60        38354.77         2401.00       433800.00
-------------------------------------------------------------------

Without the VAR statement, PROC MEANS analyzes all numeric variables in the data set.

The MEANS Procedure

By default, the MEANS procedure creates a report with N (number of nonmissing values), MEAN, STDDEV, MIN, and MAX.

For validating data, the following descriptive statistics are beneficial:
• N, number of nonmissing values
• NMISS, number of missing values
• MIN
• MAX
The MEANS Procedure

The following PROC MEANS step shows whether there are any Salary values not in the range of 24000 through 500000.

proc means data=orion.nonsales n nmiss min max;
   var Salary;
run;
p108d06

          The MEANS Procedure

      Analysis Variable : Salary

           N
  N     Miss         Minimum         Maximum
--------------------------------------------
234        1         2401.00       433800.00
--------------------------------------------
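PROC MEANS shows that a value below 24000 and a missing value exist, but not which observations hold them. One way to list those observations is a PROC PRINT step with a WHERE statement. The following is only a sketch: it assumes that the orion library is already assigned, and the variables in the VAR statement are an illustrative choice.

```sas
/* Sketch: list the observations behind the PROC MEANS results.      */
/* Assumes the orion library is already assigned.                    */
proc print data=orion.nonsales;
   /* A missing numeric value compares lower than any number, so the */
   /* first condition also selects observations with missing Salary. */
   where Salary < 24000 or Salary > 500000;
   var Employee_ID Salary;   /* illustrative choice of variables */
run;
```

The Obs column of the report gives the observation numbers that are needed later when the data is cleaned.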

The UNIVARIATE Procedure

The UNIVARIATE procedure produces summary reports that display descriptive statistics.

General form of the UNIVARIATE procedure:

PROC UNIVARIATE DATA=SAS-data-set;
   VAR variable(s);
RUN;

The VAR statement specifies the analysis variables and their order in the results.
The UNIVARIATE Procedure

The following PROC UNIVARIATE step shows default descriptive statistics for Salary.

proc univariate data=orion.nonsales;
   var Salary;
run;
p108d06

Without the VAR statement, SAS will analyze all numeric variables.

The UNIVARIATE Procedure

The UNIVARIATE procedure can produce the following sections of output:
• Moments
• Basic Statistical Measures
• Tests for Locations
• Quantiles
• Extreme Observations
• Missing Values

For validating data, the Extreme Observations and Missing Values sections are beneficial.
The UNIVARIATE Procedure

Partial PROC UNIVARIATE Output

            Extreme Observations

-----Lowest-----        -----Highest-----

 Value      Obs          Value      Obs

  2401       20         163040        1
  2650       13         194885      231
 24025       25         207885       28
 24100       19         268455       29
 24390      228         433800       27

               Missing Values

                       -----Percent Of-----
Missing                             Missing
  Value       Count     All Obs         Obs

      .           1        0.43      100.00

NEXTROBS=n
   specifies the number of extreme observations that PROC UNIVARIATE lists in the table of extreme observations. The table lists the n lowest observations and the n highest observations. The default value is 5, and n can range between 0 and half the maximum number of observations. You can specify NEXTROBS=0 to suppress the table of extreme observations.

For example:

proc univariate data=orion.nonsales nextrobs=8;
   var Salary;
run;

Exercises

Level 1

4. Validating orion.price_current with the MEANS and UNIVARIATE Procedures

a. Retrieve the starter program p108e04.

b. Add a VAR statement to the PROC MEANS step to validate Unit_Cost_Price, Unit_Sales_Price, and Factor.

c. Add statistics to the PROC MEANS statement to create the following PROC MEANS report:

                                The MEANS Procedure

Variable           Label                          N        Minimum        Maximum
---------------------------------------------------------------------------------
Unit_Cost_Price    Unit Cost Price              171      2.3000000    315.1500000
Unit_Sales_Price   Unit Sales Price             170      6.5000000        5730.00
Factor             Yearly increase in Price     171      0.0100000    100.0000000
---------------------------------------------------------------------------------

The data in orion.price_current should meet the following requirements:
• Unit_Cost_Price must be in the numeric range of 1 – 400.
• Unit_Sales_Price must be in the numeric range of 3 – 800.
• Factor must be in the numeric range of 1 – 1.05.

What variables have invalid data? _________________________________________________

d. Add a PROC UNIVARIATE step with a VAR statement to validate Unit_Sales_Price and Factor.

e. Submit the PROC UNIVARIATE step and find the Extreme Observations output.

How many values of Unit_Sales_Price are over the maximum of 800? _______________

How many values of Factor are under the minimum of 1? ____________________________

How many values of Factor are over the maximum of 1.05? __________________________

Level 2

5. Validating orion.shoes_tracker with the MEANS and UNIVARIATE Procedures

a. Write a PROC MEANS step with a VAR statement to validate Product_ID of


orion.shoes_tracker.

b. Add the MIN, MAX, and RANGE statistics to the PROC MEANS statement.

c. Add FW=15 to the PROC MEANS statement. The FW= option specifies the field width
to display the statistics in printed or displayed output.
d. Add the following CLASS statement to group the data by Supplier_Name:
class Supplier_Name;

Documentation on the FW= option and the CLASS statement can be found in the SAS Help and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒ Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The MEANS Procedure).
e. Submit the program to create the following PROC MEANS report:

                               The MEANS Procedure

                    Analysis Variable : Product_ID Product ID

                          N
Supplier Name           Obs            Minimum            Maximum              Range
------------------------------------------------------------------------------------
3Top Sports               7        22020030007      2202003001290      2179982971283
3op Sports                2       220200300015       220200300116       101.00000000
Greenline Sports Ltd      1       220200300157       220200300157                  0
------------------------------------------------------------------------------------

Which Supplier_Name has invalid Product_ID values assuming Product_ID must have only twelve digits? ______________________________________________________________

f. Add a PROC UNIVARIATE step with a VAR statement to validate Product_ID of orion.shoes_tracker.

g. Submit the PROC UNIVARIATE step and find the Extreme Observations output.

How many values of Product_ID are too small? _____________________________________

How many values of Product_ID are too large? _____________________________________

Level 3
6. Selecting Only the Extreme Observations Output from the UNIVARIATE Procedure
a. Write a PROC UNIVARIATE step with a VAR statement to validate Product_ID of
orion.shoes_tracker.

b. Before the PROC UNIVARIATE step, add the following ODS statement:
ods trace on;
c. After the PROC UNIVARIATE step, add the following ODS statement:
ods trace off;

d. Submit the program and notice the trace information in the SAS log.
What is the name of the last Output Added in the SAS log? ______________________________
e. Add an ODS SELECT statement immediately before the PROC UNIVARIATE step to select only the Extreme Observations output object.

Documentation on the ODS TRACE and ODS SELECT statements can be found in the SAS Help and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒ SAS 9.2 Output Delivery System User’s Guide ⇒ ODS Language Statements ⇒ Dictionary of ODS Language Statements).

f. Submit the program to create the following PROC UNIVARIATE report:

            The UNIVARIATE Procedure
       Variable: Product_ID (Product ID)

              Extreme Observations

--------Lowest-------      -------Highest------

      Value       Obs           Value       Obs

2.20200E+10         4      2.2020E+11         6
2.20200E+11         1      2.2020E+11         7
2.20200E+11         2      2.2020E+11         9
2.20200E+11         3      2.2020E+11        10
2.20200E+11         5      2.2020E+12         8
8.5 Cleaning Invalid Data

Objectives
• Clean data by using the Viewtable window. (Self-Study)
• Clean data by using assignment statements in the DATA step.
• Clean data by using IF-THEN/ELSE statements in the DATA step.
Invalid Data to Clean

The orion.nonsales data set contains invalid data that needs to be cleaned.

After you validate the data and find the invalid data, the correct data values are needed.

Invalid Data to Clean

Variable      Obs                            Invalid Value   Correct Value
Employee_ID   7                              120108          120109
              14                             .               120116
Gender        12                             G               F
              101                            (blank)         F
Job_Title     10                             (blank)         Security Guard I
Country       2, 84, 87, 125, 197, and 200   au or us        AU or US
Salary        4                              .               26960
              13                             2650            26500
              20                             2401            24015
Hire_Date     5                              21/01/1953      21/01/1995
              9                              .               01/11/1978
              214                            01/01/1968      01/01/1998

Interactively Cleaning Data (Self-Study)

If you are using the SAS windowing environment, the Viewtable window can be used to interactively clean data.

Use the Viewtable window to interactively clean the following five observations:

Variable      Obs   Invalid Value   Correct Value
Employee_ID   7     120108          120109
              14    .               120116
Gender        12    G               F
              101   (blank)         F
Job_Title     10    (blank)         Security Guard I

For z/OS (OS/390), the FSEDIT window is used to clean data. If FSEDIT is not available, the data can be cleaned programmatically instead of interactively. If you use batch mode, cleaning the data interactively is not an option.

Interactively Cleaning Data (Self-Study)

The Viewtable window enables you to browse, edit, or create SAS data sets.

Using the Viewtable Window to Clean Data – Windows (Self-Study)

1. Select the Explorer tab on the SAS window bar to activate the SAS Explorer window or select View ⇒ Contents Only.

2. Double-click Libraries to show all available libraries.

3. Double-click on the Orion library to show all members of that library.

4. Double-click on the Nonsales data set to open the data set in the Viewtable window.

5. Select Edit Mode from the Edit menu.

6. Go to the following observations and make the desired changes:

Obs   Variable      Correct Value
7     Employee_ID   120109
12    Gender        F
14    Employee_ID   120116
101   Gender        F

7. Close the Viewtable window. The changes are saved to the data set.

Using the Viewtable Window to Clean Data – UNIX (Self-Study)

1. Select View ⇒ Contents Only to activate the SAS Explorer.

2. Double-click Libraries to show all available libraries.

3. Double-click on the Orion library to show all members of that library.

4. Double-click on the Nonsales data set to open the data set in the Viewtable window.

5. Select Edit Mode from the Edit menu.

6. Go to the following observations and make the desired changes:

Obs   Variable      Correct Value
7     Employee_ID   120109
12    Gender        F
14    Employee_ID   120116
101   Gender        F

7. Close the Viewtable window. The changes are saved to the data set.

Using the FSEDIT Window to Clean Data – z/OS (OS/390) (Self-Study)

1. Type fsedit orion.nonsales on the command line and press ENTER to activate the FSEDIT window.

2. Go to the following observations and make the desired changes:

Obs   Variable      Correct Value
7     Employee_ID   120109
12    Gender        F
14    Employee_ID   120116
101   Gender        F

Type the observation number on the command line and press ENTER to go to an observation.

3. Type end on the command line and press ENTER to close the FSEDIT window. The changes are saved to the data set.

8.06 Quiz (Self-Study)

Open the VIEWTABLE window for orion.nonsales. Use the VIEWTABLE window to interactively clean the following observation:

Variable    Obs   Invalid Value   Correct Value
Job_Title   10    (blank)         Security Guard I
Programmatically Cleaning Data

The DATA step can be used to programmatically clean the invalid data.

Use the DATA step to clean the following observations:

Variable    Obs                            Invalid Value   Correct Value
Country     2, 84, 87, 125, 197, and 200   au or us        AU or US
Salary      4                              .               26960
            13                             2650            26500
            20                             2401            24015
Hire_Date   5                              21/01/1953      21/01/1995
            9                              .               01/11/1978
            214                            01/01/1968      01/01/1998

The Assignment Statement

The assignment statement evaluates an expression and assigns the resulting value to a variable.

General form of the assignment statement:

variable = expression;

• variable names an existing or new variable.
• expression is a sequence of operands and operators that form a set of instructions that produce a value.

Assignment statements evaluate the expression on the right side of the equal sign and store the result in the variable that is specified on the left side of the equal sign.
The Assignment Statement Expression

Operands are
• character constants
• numeric constants
• date constants
• character variables
• numeric variables.

Operators are
• symbols that represent an arithmetic calculation
• SAS functions.

The Assignment Statement Expression

Examples:

Salary = 26960;               numeric constant
Gender = 'F';                 character constant
Hire_Date = '21JAN1995'd;     date constant
Country = upcase(Country);    function (upcase) applied to a variable (Country)
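The examples above use constants and one function. An expression can also combine operands with an arithmetic operator. The following sketch illustrates this in a stand-alone DATA step. The data set name work.assign_demo and the variables Bonus and Hire_Year are hypothetical names chosen for illustration only, and the 5% rate is an assumption.

```sas
/* Sketch: assignment statements with constants, an operator, and a function. */
/* work.assign_demo, Bonus, and Hire_Year are hypothetical names.             */
data work.assign_demo;
   Salary=26500;                  /* numeric constant                         */
   Bonus=Salary*0.05;             /* arithmetic operator and numeric variable */
   Hire_Date='21JAN1995'd;        /* date constant                            */
   Hire_Year=year(Hire_Date);     /* SAS function applied to a variable       */
   format Hire_Date date9.;
run;

proc print data=work.assign_demo;
run;
```

Because the DATA step has no SET statement, it executes once and writes a single observation, which makes it convenient for trying out expressions.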
SAS Functions

A SAS function is a routine that returns a value that is determined from specified arguments.

The UPCASE function converts all letters in an argument to uppercase.

General form of the UPCASE function:

UPCASE(argument)

The argument specifies any SAS character expression.
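UPCASE has companion functions for other case conversions: LOWCASE converts letters to lowercase, and PROPCASE (used in an earlier exercise of this chapter) converts a value to proper case. The following is a minimal sketch for comparing the three. The data set name work.case_demo and the variables Raw, Upper, Lower, and Proper are hypothetical names, and the sample value is made up for illustration.

```sas
/* Sketch: compare the case-conversion functions on one sample value. */
/* work.case_demo and all variable names are hypothetical.            */
data work.case_demo;
   length Raw Upper Lower Proper $ 20;
   Raw='wareHouse assistant';     /* made-up sample value      */
   Upper=upcase(Raw);             /* all letters uppercase     */
   Lower=lowcase(Raw);            /* all letters lowercase     */
   Proper=propcase(Raw);          /* first letter of each word */
run;

proc print data=work.case_demo;
run;
```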

The Assignment Statement

All the values of Country in the data set orion.nonsales need to be uppercase.

data work.clean;
   set orion.nonsales;
   Country=upcase(Country);
run;
p108d07

For each observation, the assignment statement replaces the value of Country in the program data vector (PDV) with the value that the UPCASE function returns. A value that is already uppercase is unchanged:

PDV
Employee_ID   ...   Job_Title   Country   ...
120101        ...   Director    AU        ...

A lowercase value is converted to uppercase:

PDV
Employee_ID   ...   Job_Title                Country   ...
120104        ...   Administration Manager   au        ...   before the assignment statement
120104        ...   Administration Manager   AU        ...   after upcase(au)

The Assignment Statement

proc print data=work.clean;
   var Employee_ID Job_Title Country;
run;

Partial PROC PRINT Output

       Employee_
Obs       ID        Job_Title                  Country

 84     120695      Warehouse Assistant II       AU
 85     120696      Warehouse Assistant I        AU
 86     120697      Warehouse Assistant IV       AU
 87     120698      Warehouse Assistant I        AU
 88     120710      Business Analyst II          US
 89     120711      Business Analyst III         US
 90     120712      Marketing Manager            US
 91     120713      Marketing Assistant III      US

The assignment statement executed for every observation regardless of whether the value needed to be uppercased or not.
Programmatically Cleaning Data

The DATA step can be used to programmatically clean the invalid data.

• Country (observations 2, 84, 87, 125, 197, and 200): the assignment statement was applied to all observations.
• Salary (observations 4, 13, and 20) and Hire_Date (observations 5, 9, and 214): the assignment statement needs to be applied to specific observations.

8.07 Quiz

Which variable can be used to specifically identify the observations with invalid salary values?

Obs   Employee_ID   Gender   Salary   Job_Title                   Country   Birth_Date   Hire_Date
  2        120104   F         46230   Administration Manager      au        11/05/1954   01/01/1981
  4        120106   M             .   Office Assistant II         AU        23/12/1944   01/01/1974
  5        120107   F         30475   Office Assistant III        AU        01/02/1978   21/01/1953
  9        120111   M         26895   Security Guard II           AU        23/07/1949   .
 10        120112   F         26550                               AU        17/02/1969   01/07/1990
 12        120114   G         31285   Security Manager            AU        08/02/1944   01/01/1974
 13        120115   M          2650   Service Assistant I         AU        08/05/1984   01/08/2005
 14             .   M         29250   Service Assistant II        AU        13/06/1959   01/02/1980
 20        120191   F          2401   Trainee                     AU        17/01/1959   01/01/2003
 84        120695   M         28180   Warehouse Assistant II      au        13/07/1964   01/07/1989
 87        120698   M         26160   Warehouse Assistant I       au        17/05/1954   01/08/1976
101        120723             33950   Corp. Comm. Specialist II   US        10/08/1949   01/01/1974
125        120747   F         43590   Financial Controller I      us        20/06/1974   01/08/1995
197        120994   F         31645   Office Administrator I      us        16/06/1974   01/11/1994
200        120997   F         27420   Shipping Administrator I    us        21/11/1974   01/09/1996
214        121011   M         25735   Service Assistant I         US        11/03/1944   01/01/1968

IF-THEN Statements

The IF-THEN statement executes a SAS statement for observations that meet specific conditions.

General form of the IF-THEN statement:

IF expression THEN statement;

• expression is a sequence of operands and operators that form a set of instructions that define a condition for selecting observations.
• statement is any executable statement such as the assignment statement.

If the condition in the IF clause is met, the IF-THEN statement executes a SAS statement for that observation.
IF-THEN Statements

All the values of Salary must be in the range of 24000 – 500000.

data work.clean;
   set orion.nonsales;
   if Employee_ID=120106 then Salary=26960;
   if Employee_ID=120115 then Salary=26500;
   if Employee_ID=120191 then Salary=24015;
run;
p108d07

For an observation such as Employee_ID 120105, all three IF expressions are FALSE, so Salary is unchanged:

PDV
Employee_ID   ...   Salary   Job_Title     ...
120105        ...   27110    Secretary I   ...

For Employee_ID 120106, the first IF expression is TRUE, so Salary changes from missing (.) to 26960. The second and third IF expressions are still evaluated, and both are FALSE:

PDV
Employee_ID   ...   Salary   Job_Title             ...
120106        ...   .        Office Assistant II   ...   before the first IF-THEN statement
120106        ...   26960    Office Assistant II   ...   after the first IF-THEN statement
IF-THEN Statements

When an IF expression is TRUE in this IF-THEN statement series, there is no reason to check the remaining IF-THEN statements when checking Employee_ID.

data work.clean;
   set orion.nonsales;
   if Employee_ID=120106 then Salary=26960;
   if Employee_ID=120115 then Salary=26500;
   if Employee_ID=120191 then Salary=24015;
run;

The word ELSE can be placed before the word IF, causing SAS to execute conditional statements until it encounters the first true statement.

When a series of IF expressions represent mutually exclusive events, then it is not necessary to check all expressions when one is found to be true.

IF-THEN/ELSE Statements

All the values of Salary must be in the range of 24000 – 500000.

data work.clean;
   set orion.nonsales;
   if Employee_ID=120106 then Salary=26960;
   else if Employee_ID=120115 then Salary=26500;
   else if Employee_ID=120191 then Salary=24015;
run;
p108d07

For Employee_ID 120106, the first IF expression is TRUE, so Salary changes from missing (.) to 26960, and SAS skips the two ELSE IF statements:

PDV
Employee_ID   ...   Salary   Job_Title             ...
120106        ...   .        Office Assistant II   ...   before the IF-THEN statement
120106        ...   26960    Office Assistant II   ...   after the IF-THEN statement

IF-THEN/ELSE Statements

Every value of Hire_Date must be 01/01/1974 or later.

data work.clean;
   set orion.nonsales;
   Country=upcase(Country);
   if Employee_ID=120106 then Salary=26960;
   else if Employee_ID=120115 then Salary=26500;
   else if Employee_ID=120191 then Salary=24015;
   else if Employee_ID=120107 then
      Hire_Date='21JAN1995'd;
   else if Employee_ID=120111 then
      Hire_Date='01NOV1978'd;
   else if Employee_ID=121011 then
      Hire_Date='01JAN1998'd;
run;
p108d07
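After the cleaning step runs, the requirements can be checked again on the new data set. The following is a sketch only: it assumes that work.clean was created by the DATA step above, and it takes the valid values from this section (Salary in the range 24000 – 500000, Hire_Date of 01/01/1974 or later, Country of AU or US). The variables in the VAR statement are an illustrative choice.

```sas
/* Sketch: re-validate the variables that the DATA step cleaned.     */
/* Assumes work.clean was created by the preceding DATA step.        */
proc print data=work.clean;
   /* Missing numeric values compare lower than any number, so the   */
   /* "<" conditions also select missing Salary and Hire_Date.       */
   where Salary < 24000 or Salary > 500000 or
         Hire_Date < '01JAN1974'd or
         Country not in ('AU','US');
   var Employee_ID Salary Hire_Date Country;
   format Hire_Date ddmmyy10.;
run;
```

If the corrections took effect, the report should not list any of the observations that were corrected. Employee_ID, Gender, and Job_Title are not checked here because the DATA step above does not clean them.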

Exercises

Level 1

7. Cleaning Data from orion.qtr2_2007

a. Retrieve the starter program p108e07.

b. Add two conditional statements to the DATA step to correct the following invalid data:

Variable        Obs   Invalid Value   Correct Value   Reference Variable
Delivery_Date   5     12APR2007       12MAY2007       Order_ID=1242012259
Order_Date      22    26JUL2007       26JUN2007       Order_ID=1242449327

c. Submit the program. Verify that zero observations were returned from the PROC PRINT step.

Level 2

8. Cleaning Data from orion.price_current

a. Retrieve the starter program p108e08.

b. Add a DATA step prior to the PROC steps to read orion.price_current to create Work.price_current. In the DATA step, include two conditional IF-THEN statements to correct the following invalid data:

Variable           Obs   Invalid Value   Correct Value   Reference Variable
Unit_Sales_Price   41    5730            57.30           Product_ID=220200200022
Unit_Sales_Price   103   .               41.20           Product_ID=240200100056

c. Submit the program. Verify that Unit_Sales_Price is in the numeric range of 3 – 800.

Level 3

9. Cleaning Data from orion.shoes_tracker

a. Retrieve the starter program p108e09.

b. Add a DATA step prior to the PROC steps to read orion.shoes_tracker to create Work.shoes_tracker. In the DATA step, include statements to correct the following invalid data:

Variable           Obs    Invalid Value     Correct Value          Reference Variable
Supplier_Country          mixed case        upper case
Supplier_Country   5      UT                US                     Supplier_Country='UT'
Product_Category   2      (blank)           Shoes                  Product_Category=' '
Supplier_ID        1      .                 2963                   Supplier_ID=.
Supplier_Name      3, 7   3op Sports        3Top Sports            Supplier_Name='3op Sports'
Product_ID         4      22020030007       220200300079           _N_=4
Product_ID         8      2202003001290     220200300129           _N_=8
Product_Name              not proper case   proper case
Supplier_Name      9      3Top Sports       Greenline Sports Ltd   Supplier_ID=14682 and Supplier_Name='3Top Sports'

c. Submit the program. Verify that the data requirements are all met.
• Product_Category must not be missing.
• Supplier_Country must have a value of GB or US.
• Supplier_Name must be 3Top Sports or Greenline Sports Ltd.
• Supplier_ID must be 2963 or 14682.
• A Supplier_Name of 3Top Sports must have a Supplier_ID of 2963.
• A Supplier_Name of Greenline Sports Ltd must have a Supplier_ID of 14682.
• Product_ID must have only 12 digits.

8.6 Chapter Review

Chapter Review
1. What procedures can be used to detect invalid data?
2. What happens when SAS encounters a data error?
3. Why would you need a SAS date constant?
4. How can you clean invalid data?
5. What symbol is required in an assignment statement?
6. Why would you use IF-THEN/ELSE statements instead of IF-THEN statements?

8.7 Solutions

Solutions to Exercises
1. Validating orion.shoes_tracker with the PRINT and FREQ Procedures

a. Retrieve the starter program.

b. Add a WHERE statement to the PROC PRINT step.
proc print data=orion.shoes_tracker;
   where Product_Category=' ' or
         Supplier_Country not in ('GB','US');
run;

c. Add a VAR statement.
proc print data=orion.shoes_tracker;
   where Product_Category=' ' or
         Supplier_Country not in ('GB','US');
   var Product_Category Supplier_Name Supplier_Country Supplier_ID;
run;

How many observations have missing Product_Category? One (observation 2)

How many observations have invalid values of Supplier_Country? Three (observations 1, 5, and 10)

d. Add a PROC FREQ step with a TABLES statement.
proc freq data=orion.shoes_tracker nlevels;
   tables Supplier_Name Supplier_ID;
run;

What invalid data exists for Supplier_Name and Supplier_ID?
• two invalid values for Supplier_Name (3op Sports)
• one missing value for Supplier_ID
2. Validating orion.qtr2_2007 with the PRINT and FREQ Procedures

a. Write a PROC PRINT step with a WHERE statement.


proc print data=orion.qtr2_2007;
where Order_Date>Delivery_Date or
Order_Date<'01APR2007'd or
Order_Date>'30JUN2007'd;
run;
b. Submit the program.
How many observations have Delivery_Date values occurring before Order_Date values?
One (observation 5)
How many observations have Order_Date values out of the range of April 1, 2007 – June 30,
2007? One (observation 22)

c. Add a PROC FREQ step with a TABLES statement.


proc freq data=orion.qtr2_2007 nlevels;
tables Order_ID Order_Type;
run;
d. Submit the PROC FREQ step.
What invalid data exists for Order_ID and Order_Type?
• two missing values for Order_ID
• one value of 0 for Order_Type
• one value of 4 for Order_Type
3. Using the PROPCASE Function, Two-Way Frequency Table, and MISSING Option

a. Write a PROC PRINT step with a WHERE statement.

proc print data=orion.shoes_tracker;
   where propcase(Product_Name) ne Product_Name;
run;

b. Add a VAR statement.

proc print data=orion.shoes_tracker;
   where propcase(Product_Name) ne Product_Name;
   var Product_ID Product_Name;
run;

c. Add a PROC FREQ step with a TABLES statement.

proc freq data=orion.shoes_tracker;
   tables Supplier_Name*Supplier_ID / missing;
run;

What invalid data exists for Supplier_Name and Supplier_ID?
• two invalid values for Supplier_Name (3op Sports should be 3Top Sports)
• one missing value for Supplier_ID (. should be 2963)
• one wrong value for Supplier_ID (14682 should be 2963)

4. Validating orion.price_current with the MEANS and UNIVARIATE Procedures

a. Retrieve the starter program.


b. Add a VAR statement to the PROC MEANS step.
proc means data=orion.price_current;
var Unit_Cost_Price Unit_Sales_Price Factor;
run;
c. Add statistics to the PROC MEANS statement.


proc means data=orion.price_current n min max;
var Unit_Cost_Price Unit_Sales_Price Factor;
run;
What variables have invalid data?
• The maximum value of Unit_Sales_Price is out of range and one value
of Unit_Sales_Price is missing.
• The minimum and maximum values of Factor are out of range.

d. Add a PROC UNIVARIATE step with a VAR statement.

proc univariate data=orion.price_current;
   var Unit_Sales_Price Factor;
run;

e. Find the Extreme Observations output.

How many values of Unit_Sales_Price are over the maximum of 800? One (5730)

How many values of Factor are under the minimum of 1? One (0.01)

How many values of Factor are over the maximum of 1.05? Two (10.20 and 100.00)

5. Validating orion.shoes_tracker with the MEANS and UNIVARIATE Procedures

a. Write a PROC MEANS step with a VAR statement.

proc means data=orion.shoes_tracker;
   var Product_ID;
run;

b. Add the MIN, MAX, and RANGE statistics to the PROC MEANS statement.

proc means data=orion.shoes_tracker min max range;
   var Product_ID;
run;

c. Add FW=15 to the PROC MEANS statement.

proc means data=orion.shoes_tracker min max range fw=15;
var Product_ID;
run;
d. Add the CLASS statement.
proc means data=orion.shoes_tracker min max range fw=15;
var Product_ID;
class Supplier_Name;
run;
e. Submit the program.
Which Supplier_Name has invalid Product_ID values assuming Product_ID must have
only twelve digits? 3Top Sports
f. Add a PROC UNIVARIATE step with a VAR statement.


proc univariate data=orion.shoes_tracker;
var Product_ID;
run;
g. Submit the PROC UNIVARIATE step.
How many values of Product_ID are too small? One (2.20200E+10)

How many values of Product_ID are too large? One (2.2020E+12)

6. Selecting Only the Extreme Observations Output from the UNIVARIATE Procedure

a. Write a PROC UNIVARIATE step with a VAR statement.

proc univariate data=orion.shoes_tracker;
   var Product_ID;
run;

b. Before the PROC UNIVARIATE step, add an ODS statement.

ods trace on;

proc univariate data=orion.shoes_tracker;
   var Product_ID;
run;

c. After the PROC UNIVARIATE step, add an ODS statement.

ods trace on;

proc univariate data=orion.shoes_tracker;
   var Product_ID;
run;

ods trace off;

d. Submit the program.

What is the name of the last Output Added in the SAS log? ExtremeObs

e. Add an ODS SELECT statement.
ods trace on;

ods select ExtremeObs;


proc univariate data=orion.shoes_tracker;
var Product_ID;
run;

ods trace off;


f. Submit the program.
7. Cleaning Data from orion.qtr2_2007

a. Retrieve the starter program.


b. Add two conditional statements to the DATA step.
data work.qtr2_2007;
set orion.qtr2_2007;
if Order_ID=1242012259 then Delivery_Date='12MAY2007'd;
else if Order_ID=1242449327 then Order_Date='26JUN2007'd;
run;

proc print data=work.qtr2_2007;
   where Order_Date>Delivery_Date or
         Order_Date<'01APR2007'd or
         Order_Date>'30JUN2007'd;
run;

c. Submit the program.

8. Cleaning Data from orion.price_current

a. Retrieve the starter program.

b. Add a DATA step.

data work.price_current;
   set orion.price_current;
   if Product_ID=220200200022 then Unit_Sales_Price=57.30;
   else if Product_ID=240200100056 then Unit_Sales_Price=41.20;
run;

proc means data=work.price_current n min max;
   var Unit_Sales_Price;
run;

proc univariate data=work.price_current;
   var Unit_Sales_Price;
run;

c. Submit the program.
9. Cleaning Data from orion.shoes_tracker

a. Retrieve the starter program.


b. Add a DATA step.
data work.shoes_tracker;
   set orion.shoes_tracker;
   Supplier_Country=upcase(Supplier_Country);
   if Supplier_Country='UT' then Supplier_Country='US';
   if Product_Category=' ' then Product_Category='Shoes';
   if Supplier_ID=. then Supplier_ID=2963;
   if Supplier_Name='3op Sports' then Supplier_Name='3Top Sports';
   if _n_=4 then Product_ID=220200300079;
   else if _n_=8 then Product_ID=220200300129;
   Product_Name=propcase(Product_Name);
   if Supplier_ID=14682 and Supplier_Name='3Top Sports'
      then Supplier_Name='Greenline Sports Ltd';
run;

proc print data=work.shoes_tracker;
   where Product_Category=' ' or
         Supplier_Country not in ('GB','US') or
         propcase(Product_Name) ne Product_Name;
run;

proc freq data=work.shoes_tracker;
   tables Supplier_Name*Supplier_ID / missing;
run;

proc means data=work.shoes_tracker min max range fw=15;
   var Product_ID;
   class Supplier_Name;
run;

proc univariate data=work.shoes_tracker;
   var Product_ID;
run;

c. Submit the program.
Solutions to Student Activities (Polls/Quizzes)

8.01 Quiz – Correct Answer


What problems will SAS have reading the numeric data
Salary and Hire_Date?

Partial nonsales.csv

120101,Patrick,Lu,M,163040,Director,AU,18AUG1976,01JUL2003
120104,Kareen,Billington,F,46230,Administration Manager,au,11MAY1954,01JAN1981
120105,Liz,Povey,F,27110,Secretary I,AU,21DEC1974,01MAY1999
120106,John,Hornsey,M,unknown,Office Assistant II,AU,23DEC1944,01JAN1974
120107,Sherie,Sheedy,F,30475,Office Assistant III,AU,01FEB1978,21JAN1953
120108,Gladys,Gromek,F,27660,Warehouse Assistant II,AU,23FEB1984,01AUG2006
120108,Gabriele,Baker,F,26495,Warehouse Assistant I,AU,15DEC1986,01OCT2006
120110,Dennis,Entwisle,M,28615,Warehouse Assistant III,AU,20NOV1949,01NOV1979
120111,Ubaldo,Spillane,M,26895,Security Guard II,AU,23JUL1949,99NOV1978
120112,Ellis,Glattback,F,26550, ,AU,17FEB1969,01JUL1990
120113,Riu,Horsey,F,26870,Security Guard II,AU,10MAY1944,01JAN1974
120114,Jeannette,Buddery,G,31285,Security Manager,AU,08FEB1944,01JAN1974
120115,Hugh,Nichollas,M,2650,Service Assistant I,AU,08MAY1984,01AUG2005
.,Austen,Ralston,M,29250,Service Assistant II,AU,13JUN1959,01FEB1980
120117,Bill,Mccleary,M,31670,Cabinet Maker III,AU,11SEP1964,01APR1986

8.02 Quiz – Correct Answer

What problems exist with the data in this partial data set?

Hint: There are nine data problems.

8.03 Multiple Choice Poll – Correct Answer


Which statements are used to read a delimited
raw data file and create a SAS data set?
a. DATA and SET only
b. DATA and INFILE only
c. DATA, SET, and INPUT only
d. DATA, INFILE, and INPUT only

8.04 Multiple Choice Poll – Correct Answer

Which statement best describes the invalid data?

a. The data in the raw data file is bad.
b. The programmer incorrectly read the data.

Last was read as numeric but needs to be read as character.

Partial SAS Log
404  input Employee_ID First $ Last;
405  run;

NOTE: Invalid data for Last in line 1 16-17.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6
1         120101,Patrick,Lu,M,163040,Director,AU,18AUG1976,01JUL2003 58
Employee_ID=120101 First=Patrick Last=. _ERROR_=1 _N_=1
NOTE: Invalid data for Last in line 2 15-24.
2         120104,Kareen,Billington,F,46230,Administration Manager,au,1
      61  1MAY1954,01JAN1981 78
Employee_ID=120104 First=Kareen Last=. _ERROR_=1 _N_=2

8.05 Multiple Choice Poll – Correct Answer


Which data requirement cannot be achieved with
the PRINT procedure using a WHERE statement?

a. Employee_ID must be unique and not missing.
b. Gender must have a value of F or M.
c. Salary must be in the numeric range of 24000 – 500000.
d. Job_Title must not be missing.
e. Country must have a value of AU or US.
f. Birth_Date value must occur before Hire_Date value.
g. Hire_Date must have a value of 01/01/1974 or later.

8.06 Quiz – Correct Answer (Self-Study)

Open the VIEWTABLE window for orion.nonsales.
Use the VIEWTABLE window to interactively clean the following observation:

8.07 Quiz – Correct Answer


Which variable can be used to specifically identify
the observations with invalid salary values?
Obs  Employee_ID  Gender  Salary  Job_Title                  Country  Birth_Date  Hire_Date
  2  120104       F        46230  Administration Manager     au       11/05/1954  01/01/1981
  4  120106       M            .  Office Assistant II        AU       23/12/1944  01/01/1974
  5  120107       F        30475  Office Assistant III       AU       01/02/1978  21/01/1953
  9  120111       M        26895  Security Guard II          AU       23/07/1949  .
 10  120112       F        26550                             AU       17/02/1969  01/07/1990
 12  120114       G        31285  Security Manager           AU       08/02/1944  01/01/1974
 13  120115       M         2650  Service Assistant I        AU       08/05/1984  01/08/2005
 14  .            M        29250  Service Assistant II       AU       13/06/1959  01/02/1980
 20  120191       F         2401  Trainee                    AU       17/01/1959  01/01/2003
 84  120695       M        28180  Warehouse Assistant II     au       13/07/1964  01/07/1989
 87  120698       M        26160  Warehouse Assistant I      au       17/05/1954  01/08/1976
101  120723                33950  Corp. Comm. Specialist II  US       10/08/1949  01/01/1974
125  120747       F        43590  Financial Controller I     us       20/06/1974  01/08/1995
197  120994       F        31645  Office Administrator I     us       16/06/1974  01/11/1994
200  120997       F        27420  Shipping Administrator I   us       21/11/1974  01/09/1996
214  121011       M        25735  Service Assistant I        US       11/03/1944  01/01/1968

Employee_ID because the values are unique.

Solutions to Chapter Review

Chapter Review Answers


1. What procedures can be used to detect invalid data?
• PRINT
• FREQ
• MEANS
• UNIVARIATE

2. What happens when SAS encounters a data error?
• A note that describes the error is printed in the SAS log.
• The input record (contents of the input buffer) being read is displayed in the SAS log.
• The values in the SAS observation (contents of the PDV) being created are displayed in the SAS log.
• A missing value is assigned to the appropriate SAS variable.
• Execution continues.

3. Why would you need a SAS date constant?
You use a SAS date constant when you want to
convert a calendar date to a SAS date value.
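
The conversion can be seen directly in the log. The step below is a minimal sketch that reuses the 01APR2007 constant from the exercise WHERE statements; the variable name Cutoff is hypothetical.

```sas
data _null_;
   /* A date constant is a quoted DDMONYYYY string followed by the letter d */
   Cutoff = '01APR2007'd;
   put Cutoff=;           /* stored value: number of days since 01JAN1960 (17257) */
   put Cutoff= date9.;    /* same value displayed with the DATE9. format */
run;
```

Because the constant is stored as a number, it can be compared with other SAS date values such as Order_Date.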

4. How can you clean invalid data?
• the VIEWTABLE window
• assignment statements in the DATA step
• IF-THEN/ELSE statements in the DATA step

5. What symbol is required in an assignment statement?
= (Equal sign)

6. Why would you use IF-THEN/ELSE statements instead of IF-THEN statements?
If a series of IF expressions is mutually exclusive, it is not necessary to check all
expressions when one is found to be true.
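
As an illustration only, the step below sketches a mutually exclusive chain. The variable Region and its values are hypothetical and do not come from the course data.

```sas
data _null_;
   length Region $ 13;      /* hypothetical variable, long enough for every value assigned below */
   Country = 'AU';
   if Country='AU' then Region='Asia Pacific';
   else if Country='US' then Region='North America';   /* not evaluated: the first test was true */
   else Region='Other';                                 /* not evaluated either */
   put Country= Region=;    /* writes Country=AU Region=Asia Pacific */
run;
```

With three separate IF-THEN statements, every condition would be tested for every observation even after a match was found.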

Chapter 9 Manipulating Data

9.1 Creating Variables ................................................. 9-3
    Exercises .......................................................... 9-27
9.2 Creating Variables Conditionally ................................... 9-30
    Exercises .......................................................... 9-41
9.3 Subsetting Observations ............................................ 9-44
    Exercises .......................................................... 9-51
9.4 Chapter Review ..................................................... 9-53
9.5 Solutions .......................................................... 9-54
    Solutions to Exercises ............................................. 9-54
    Solutions to Student Activities (Polls/Quizzes) .................... 9-62
    Solutions to Chapter Review ........................................ 9-66

9.1 Creating Variables

Objectives
• Create SAS variables with the assignment statement in the DATA step.
• Create data values by using operators including SAS functions.
• Subset variables by using the DROP and KEEP statements.
• Examine the compilation and execution phases of the DATA step when you read a SAS data set.
• Subset variables by using the DROP= and KEEP= options. (Self-Study)

Business Scenario

A new SAS data set named Work.comp needs to be created by reading the orion.sales data set.

Work.comp must include the following new variables:
• Bonus, which is equal to a constant 500
• Compensation, which is the combination of the employee's salary and bonus
• BonusMonth, which is equal to the month that the employee was hired

Work.comp must not include the Gender, Salary, Job_Title, Country, Birth_Date, and Hire_Date variables from orion.sales.

Business Scenario
Partial orion.sales
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title      Country  Birth_Date  Hire_Date
120102       Tom         Zhou       M       108255  Sales Manager  AU             3510      10744
120103       Wilson      Dawes      M        87975  Sales Manager  AU            -3996       5114
120121       Irenie      Elvish     F        26600  Sales Rep. II  AU            -5630       5114

Partial Work.comp
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth
120102       Tom         Zhou         500        108755           6
120103       Wilson      Dawes        500         88475           1
120121       Irenie      Elvish       500         27100           1

Assignment Statements (Review)
Assignment statements are used in the DATA step to update existing variables or create new variables.

DATA output-SAS-data-set;
   SET input-SAS-data-set;
   variable = expression;
RUN;

DATA output-SAS-data-set;
   INFILE 'raw-data-file-name';
   INPUT specifications;
   variable = expression;
RUN;

Assignment Statements (Review)


The assignment statement evaluates an expression and assigns the resulting value to a variable.

General form of the assignment statement:

variable = expression;

• variable names an existing or new variable.
• expression is a sequence of operands and operators that form a set of instructions that produce a value.

An assignment statement evaluates the expression on the right side of the equal sign and stores the result
in the variable that is specified on the left side of the equal sign.
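
A short sketch of both uses, creating a new variable and updating an existing one. The values are illustrative only.

```sas
data _null_;
   Salary = 26600;              /* creates Salary and stores a numeric constant */
   NewSalary = 1.1 * Salary;    /* right side is evaluated first, then stored in a new variable */
   Salary = Salary + 500;       /* right side uses the current value, which is then overwritten */
   put Salary= NewSalary=;      /* writes both values to the SAS log */
run;
```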

Operands (Review)

Operands are constants (character, numeric, or date) and variables (character or numeric).

Examples:

Bonus = 500;                  numeric constant
Gender = 'M';                 character constant
Hire_Date = '01APR2008'd;     date constant
NewSalary = 1.1 * Salary;     variable

Operators (Review)
Operators are symbols that represent an arithmetic calculation and SAS functions.

Examples:
Revenue = Quantity * Price;
NewCountry = upcase(Country);

Arithmetic Operators
Arithmetic operators indicate that an arithmetic calculation is performed.

Symbol  Definition       Priority
**      exponentiation   I
-       negative prefix  I
*       multiplication   II
/       division         II
+       addition         III
-       subtraction      III

If a missing value is an operand for an arithmetic operator, the result is a missing value.

Rules for Operators


• Operations of priority I are performed before operations of priority II, and so on.
• Consecutive operations with the same priority are performed in this sequence:
– from right to left within priority I
– from left to right within priorities II and III
• Parentheses can be used to control the order of operations.
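
These rules can be checked with a small DATA _NULL_ step. The step is a sketch with arbitrary numbers; the variable names a through d are hypothetical.

```sas
data _null_;
   a = 6 + 8 / 2;       /* division (priority II) runs before addition (priority III): 6 + 4 */
   b = (6 + 8) / 2;     /* parentheses force the addition first: 14 / 2 */
   c = 2 ** 3 ** 2;     /* priority I groups from right to left: 2 ** (3 ** 2) */
   d = (2 ** 3) ** 2;   /* parentheses override the right-to-left grouping: 8 ** 2 */
   put a= b= c= d=;     /* writes a=10 b=7 c=512 d=64 */
run;
```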
9.01 Quiz
What is the result of the assignment statement?

num = 4 + 10 / 2;

a. . (missing)
b. 0
c. 7
d. 9

9.02 Quiz

What is the result of the assignment statement given the values of var1 and var2?

num = var1 + var2 / 2;

var1  var2
   .    10

a. . (missing)
b. 0
c. 5
d. 10

SAS Functions (Review)


A SAS function is a routine that returns a value that is determined from specified arguments.

Some SAS functions manipulate character values, compute descriptive statistics, or manipulate SAS date values.

General form of a SAS function:

function-name(argument1, argument2, …)

• Depending on the function, zero, one, or many arguments are used.
• Arguments are separated with commas.

Descriptive Statistics Function
The SUM function returns the sum of the arguments.

General form of the SUM function:

SUM(argument1,argument2, ...)

• The arguments must be numeric values.
• Missing values are ignored by some of the descriptive statistics functions.

Example:
Compensation=sum(Salary,Bonus);
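
The difference between the SUM function and the + operator shows up when an argument is missing. A minimal sketch with made-up values (WithPlus and WithSum are hypothetical names):

```sas
data _null_;
   Salary = 26600;
   Bonus  = .;                        /* missing numeric value */
   WithPlus = Salary + Bonus;         /* arithmetic operator: a missing operand gives a missing result */
   WithSum  = sum(Salary, Bonus);     /* SUM ignores the missing argument */
   put WithPlus= WithSum=;            /* writes WithPlus=. WithSum=26600 */
run;
```

The log also carries a note that missing values were generated by the + operation, which is one reason SUM is a common choice for totals such as Compensation.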


Date Functions
SAS date functions can be used to
• extract information from SAS date values
• create SAS date values.

Calendar Date     01JAN1959   01JAN1960   01JAN1961
SAS Date Value         -365           0         366

Date Functions – Extracting Information

YEAR(SAS-date)      extracts the year from a SAS date and returns a four-digit value for year.
QTR(SAS-date)       extracts the quarter from a SAS date and returns a number from 1 to 4.
MONTH(SAS-date)     extracts the month from a SAS date and returns a number from 1 to 12.
DAY(SAS-date)       extracts the day of the month from a SAS date and returns a number from 1 to 31.
WEEKDAY(SAS-date)   extracts the day of the week from a SAS date and returns a number from 1 to 7,
                    where 1 represents Sunday, and so on.

Example:
BonusMonth=month(Hire_Date);
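
To see all five functions on one value, the sketch below uses 10744, the Hire_Date value shown for employee 120102 in the partial orion.sales listing. The variable names other than Hire_Date and BonusMonth are hypothetical.

```sas
data _null_;
   Hire_Date   = 10744;                /* SAS date value for 01JUN1989 */
   HireYear    = year(Hire_Date);
   HireQtr     = qtr(Hire_Date);
   BonusMonth  = month(Hire_Date);
   HireDay     = day(Hire_Date);
   HireWeekday = weekday(Hire_Date);   /* 1=Sunday through 7=Saturday */
   put Hire_Date= date9.;              /* writes Hire_Date=01JUN1989 */
   put HireYear= HireQtr= BonusMonth= HireDay= HireWeekday=;
   /* writes HireYear=1989 HireQtr=2 BonusMonth=6 HireDay=1 HireWeekday=5 */
run;
```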

Date Functions – Creating SAS Dates


TODAY()               returns the current date as a SAS date value.
MDY(month,day,year)   returns a SAS date value from numeric month, day, and year values.

Example:
AnnivBonus=mdy(month(Hire_Date),15,2008);
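
A runnable sketch of the same idea, again assuming the Hire_Date value 10744 from the partial orion.sales listing. RunDate is a hypothetical variable, and its value depends on the day the step is submitted.

```sas
data _null_;
   Hire_Date  = 10744;                              /* 01JUN1989 */
   AnnivBonus = mdy(month(Hire_Date), 15, 2008);    /* the 15th of the hire month in 2008 */
   RunDate    = today();                            /* current date as a SAS date value */
   put AnnivBonus= date9.;                          /* writes AnnivBonus=15JUN2008 */
   put RunDate= date9.;
run;
```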

Business Scenario
Create Bonus, Compensation, and BonusMonth.

data work.comp;
   set orion.sales;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;

1700  data work.comp;
1701     set orion.sales;
1702     Bonus=500;
1703     Compensation=sum(Salary,Bonus);
1704     BonusMonth=month(Hire_Date);
1705  run;

NOTE: There were 165 observations read from the data set ORION.SALES.
NOTE: The data set WORK.COMP has 165 observations and 12 variables.

orion.sales has 9 variables.

p109d01

9.03 Quiz
What statement needs to be added to the DATA step to eliminate six of the 12 variables?

The DROP and KEEP Statements (Review)
The DROP statement specifies the names of the variables to omit from the output data set(s).

DROP variable-list;

The KEEP statement specifies the names of the variables to write to the output data set(s).

KEEP variable-list;

The variable-list specifies the variables to drop or keep,
respectively, in the output data set.
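
Either statement can describe the same output data set. The sketch below lists the variables to keep rather than the ones to drop; it assumes that the course's orion library is assigned and that orion.sales has the nine variables shown in the business scenario.

```sas
data work.comp;
   set orion.sales;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
   /* KEEP names the variables to write; everything else is left out of work.comp */
   keep Employee_ID First_Name Last_Name
        Bonus Compensation BonusMonth;
run;
```

Which statement to use is mostly a matter of which list is shorter.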


Business Scenario
Drop Gender, Salary, Job_Title, Country, Birth_Date, and Hire_Date.

data work.comp;
   set orion.sales;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
run;

Partial SAS Log
NOTE: There were 165 observations read from the data set ORION.SALES.
NOTE: The data set WORK.COMP has 165 observations and 6 variables.

p109d01

Business Scenario

proc print data=work.comp;
run;

Partial PROC PRINT Output
                                                                  Bonus
Obs  Employee_ID  First_Name  Last_Name   Bonus  Compensation    Month
  1  120102       Tom         Zhou          500        108755        6
  2  120103       Wilson      Dawes         500         88475        1
  3  120121       Irenie      Elvish        500         27100        1
  4  120122       Christina   Ngan          500         27975        7
  5  120123       Kimiko      Hotstone      500         26690       10
  6  120124       Lucian      Daymond       500         26980        3
  7  120125       Fong        Hofmeister    500         32540        3
  8  120126       Satyakam    Denny         500         27280        8
  9  120127       Sharryn     Clarkson      500         28600       11
 10  120128       Monica      Kletschkus    500         31390       11

p109d01

Setup for the Poll


Submit program p109a01.
Verify the results.

data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;

9.04 Poll

Are the correct results produced when the DROP statement is placed after the SET statement?

Yes
No

Processing the DROP and KEEP Statements


The DROP and KEEP statements select variables
after they are brought into the program data vector.

Input SAS Data Set  →  PDV  →  DROP and KEEP statements  →  Output SAS Data Set
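
One way to confirm that a dropped variable is still available for processing is a small self-contained step. The data set work.demo and its two in-stream records are made up for this sketch.

```sas
data work.demo;
   drop Salary;                       /* listed before Salary is even read */
   input Employee_ID Salary;
   Compensation=sum(Salary,500);      /* Salary is in the PDV, so the calculation works */
   datalines;
120102 108255
120103 87975
;
run;

proc print data=work.demo;            /* shows Employee_ID and Compensation only */
run;
```

The position of the DROP statement in the step does not change the result, because the selection is applied when observations are written to the output data set.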

Compilation

data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;

PDV
The SET statement adds the nine variables from orion.sales to the PDV. The three assignment statements then add Bonus, Compensation, and BonusMonth, in that order. The variables named in the DROP statement are flagged (D) to be dropped.

Variable       Type and Length   Drop Flag
Employee_ID    N 8
First_Name     $ 12
Last_Name      $ 18
Gender         $ 1               D
Salary         N 8               D
Job_Title      $ 25              D
Country        $ 2               D
Birth_Date     N 8               D
Hire_Date      N 8               D
Bonus          N 8
Compensation   N 8
BonusMonth     N 8

Descriptor Portion Work.comp
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth
N 8          $ 12        $ 18       N 8    N 8           N 8


Execution
data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;

Partial orion.sales
Employee_ID  ...  Hire_Date
120102       ...      10744
120103       ...       5114
120121       ...       5114
120122       ...       6756

PDV during the first iteration (D marks a variable that is flagged to be dropped)
Step                               Employee_ID  ...  D Gender  ...  D Hire_Date  Bonus  Compensation  BonusMonth
Initialize PDV                     .                                .            .      .             .
set orion.sales;                   120102            M              10744        .      .             .
Bonus=500;                         120102            M              10744        500    .             .
Compensation=sum(Salary,Bonus);    120102            M              10744        500    108755        .
BonusMonth=month(Hire_Date);       120102            M              10744        500    108755        6
Implicit OUTPUT; Implicit RETURN;  120102            M              10744        500    108755        6
Reinitialize PDV                   120102            M              10744        .      .             .

Work.comp after the first iteration
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth
120102       Tom         Zhou         500        108755           6

SAS reinitializes variables in the PDV at the start of every DATA step iteration. Variables created by an
assignment statement are reset to missing, but variables that are read with a SET statement are not reset
to missing.
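
This behavior can be observed by writing the PDV to the log at the top and bottom of each iteration. The sketch below uses a hypothetical two-observation data set, work.demo_in, in place of orion.sales.

```sas
data work.demo_in;
   input Employee_ID Salary;
   datalines;
120102 108255
120103 87975
;
run;

data _null_;
   put 'Top of iteration:    ' _all_;   /* PDV before the SET statement executes */
   set work.demo_in;
   Bonus=500;
   put 'Bottom of iteration: ' _all_;   /* PDV after the assignment statement */
run;
```

At the top of the second iteration, the log shows Employee_ID and Salary still holding the first observation's values, while Bonus is back to missing.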

t r i S
r o dis SA
Execution

S P
Employee

o r
Partial orion.sales

o f
Hire_Date
data work.comp;
set orion.sales;
drop Gender Salary Job_Title

f
_ID Country Birth_Date

A
120102

S No tsit 120103
120121
120122 d
...
e 10744
5114
5114
6756
Hire_Date;
Bonus=500;
Compensation=sum(Salary,Bonus);
BonusMonth=month(Hire_Date);
run;

ou
PDV
Employee
... DGender ... DHire_Date Bonus Compensation BonusMonth
_ID
120103 M 5114 . . .

Work.comp
Employee
First_Name Last_Name Bonus Compensation BonusMonth
_ID
120102 Tom Zhou 500 108755 6

49 ...
9-22 Chapter 9 Manipulating Data

Execution
data work.comp;
Partial orion.sales set orion.sales;
Employee drop Gender Salary Job_Title
Hire_Date
_ID Country Birth_Date
120102 10744 Hire_Date;
120103 ... 5114 Bonus=500;
Compensation=sum(Salary,Bonus);
120121 5114 BonusMonth=month(Hire_Date);

l
120122 6756 run;

ia
PDV

r
Employee
... DGender ... DHire_Date Bonus Compensation BonusMonth

e
_ID

t
120103 M 5114 500 . .

Work.comp
Employee

M a
First_Name Last_Name Bonus Compensation BonusMonth

n
_ID

y
120102 Tom Zhou 500 108755 6

50

a r t i o ...

i e t b u
p r
Execution

data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;

Partial orion.sales
Employee_ID  ...  Hire_Date
120102       ...  10744
120103       ...  5114
120121       ...  5114
120122       ...  6756

PDV
Employee_ID  ...  D Gender  ...  D Hire_Date  Bonus  Compensation  BonusMonth
120103       ...  M         ...  5114         500    88475         .

Work.comp
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth
120102       Tom         Zhou       500    108755        6

9.1 Creating Variables 9-23

Execution

data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;

Partial orion.sales
Employee_ID  ...  Hire_Date
120102       ...  10744
120103       ...  5114
120121       ...  5114
120122       ...  6756

PDV
Employee_ID  ...  D Gender  ...  D Hire_Date  Bonus  Compensation  BonusMonth
120103       ...  M         ...  5114         500    88475         1

Work.comp
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth
120102       Tom         Zhou       500    108755        6

Execution

data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;
Implicit OUTPUT;
Implicit RETURN;

Partial orion.sales
Employee_ID  ...  Hire_Date
120102       ...  10744
120103       ...  5114
120121       ...  5114
120122       ...  6756

PDV
Employee_ID  ...  D Gender  ...  D Hire_Date  Bonus  Compensation  BonusMonth
120103       ...  M         ...  5114         500    88475         1

Work.comp
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth
120102       Tom         Zhou       500    108755        6
120103       Wilson      Dawes      500    88475         1

Execution

data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title
        Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;

Continue until EOF

Partial orion.sales
Employee_ID  ...  Hire_Date
120102       ...  10744
120103       ...  5114
120121       ...  5114
120122       ...  6756

PDV
Employee_ID  ...  D Gender  ...  D Hire_Date  Bonus  Compensation  BonusMonth
120103       ...  M         ...  5114         500    88475         1

Work.comp
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth
120102       Tom         Zhou       500    108755        6
120103       Wilson      Dawes      500    88475         1

DROP= and KEEP= Options (Self-Study)
Alternatives to the DROP and KEEP statements are the DROP= and KEEP= data set options
placed in the DATA statement.

The DROP= data set option in the DATA statement excludes the variables for writing to
the output data set.

DATA output-SAS-data-set (DROP = variable-list) ;

The KEEP= data set option in the DATA statement specifies the variables for writing to
the output data set.

DATA output-SAS-data-set (KEEP = variable-list) ;

When specified for a data set named in the DATA statement, the DROP= and KEEP= data set options are
similar to DROP and KEEP statements. However, the DROP= and KEEP= data set options can be used in
situations where the DROP and KEEP statements cannot. For example, the DROP= and KEEP= data set
options can be used in a PROC step to control which variables are available for processing by the
procedure.
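As a hedged illustration of the two placements described above, the sketch below uses a small hypothetical data set, Work.staff_small, built with DATALINES. The variable names follow orion.sales, but the data set names and values are assumptions, not course data.

```sas
/* Hypothetical input; values are made up for illustration. */
data work.staff_small;
   input Employee_ID Gender $ Salary;
   datalines;
120102 M 108255
120103 M 87975
;
run;

/* KEEP= on a PROC step input: only these variables are available to PROC PRINT. */
proc print data=work.staff_small(keep=Employee_ID Salary);
run;

/* DROP= on the DATA statement: Salary can still be used in the step,
   but it is not written to Work.comp_small. */
data work.comp_small(drop=Salary);
   set work.staff_small;
   Compensation=sum(Salary,500);
run;
```

A DROP or KEEP statement could not be used in the PROC PRINT step, which is the situation the data set option covers.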

DROP= and KEEP= Options (Self-Study)
The DROP= and KEEP= data set options can also be placed in the SET statement to control
which variables are read from the input data set.

The DROP= data set option in the SET statement excludes the variables for processing in
the PDV.

SET input-SAS-data-set (DROP = variable-list) ;

The KEEP= data set option in the SET statement specifies the variables for processing in
the PDV.

SET input-SAS-data-set (KEEP = variable-list) ;

DROP= and KEEP= Options (Self-Study)

data work.comp(drop=Salary Hire_Date);
   set orion.sales(keep=Employee_ID First_Name
                        Last_Name Salary Hire_Date);
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;

orion.sales
Employee_ID  First_Name  Last_Name  Gender  Salary  Job_Title  Country  Birth_Date  Hire_Date

PDV
Employee_ID  First_Name  Last_Name  Salary  Hire_Date  Bonus  Compensation  BonusMonth

Work.comp
Employee_ID  First_Name  Last_Name  Bonus  Compensation  BonusMonth

p109d02

DROP= and KEEP= Options (Self-Study)

Input SAS Data Set
   |  DROP= and KEEP= options in the SET statement
PDV
   |  DROP and KEEP statements
   |  DROP= and KEEP= options in the DATA statement
Output SAS Data Set

Exercises

Level 1

1. Creating Two New Variables

a. Retrieve the starter program p109e01.

b. In the DATA step, create two new variables, Increase and NewSalary.
• Increase is the Salary multiplied by 0.10.
• NewSalary is the sum of Salary and Increase.

c. Include only the following variables: Employee_ID, Salary, Increase, and NewSalary.

d. Store formats displaying commas for Salary, Increase, and NewSalary.

e. Submit the program to create the following PROC PRINT report:

Partial PROC PRINT Output (First 10 of 424 Observations)

                   Employee
                     Annual
Obs  Employee_ID     Salary  Increase  NewSalary
  1       120101    163,040    16,304    179,344
  2       120102    108,255    10,826    119,081
  3       120103     87,975     8,798     96,773
  4       120104     46,230     4,623     50,853
  5       120105     27,110     2,711     29,821
  6       120106     26,960     2,696     29,656
  7       120107     30,475     3,048     33,523
  8       120108     27,660     2,766     30,426
  9       120109     26,495     2,650     29,145
 10       120110     28,615     2,862     31,477

Level 2

2. Creating Three New Variables

a. Write a DATA step to read orion.customer to create Work.birthday.

b. In the DATA step, create three new variables, Bday2009, BdayDOW2009, and Age2009.
• Bday2009 is the combination of the month of Birth_Date, the day of Birth_Date, and
  the constant of 2009 in the MDY function.
• BdayDOW2009 is the day of the week of Bday2009.
• Age2009 is the age of the customer in 2009. Subtract Birth_Date from Bday2009 and
  then divide by 365.25.

c. Include only the following variables: Customer_Name, Birth_Date, Bday2009,
   BdayDOW2009, and Age2009.

d. Format Bday2009 to resemble a two-digit day, a three-letter month, and a
   four-digit year. Age2009 should be formatted to appear with no digits after the decimal point.

e. Write a PROC PRINT step to create the following report:

Partial PROC PRINT Output (First 10 of 77 Observations)

Obs  Customer_Name      Birth_Date  Bday2009   BdayDOW2009  Age2009
  1  James Kvarniq      27JUN1974   27JUN2009            7       35
  2  Sandrina Stephano  09JUL1979   09JUL2009            5       30
  3  Cornelia Krahl     27FEB1974   27FEB2009            6       35
  4  Karen Ballinger    18OCT1984   18OCT2009            1       25
  5  Elke Wallstab      16AUG1974   16AUG2009            1       35
  6  David Black        12APR1969   12APR2009            1       40
  7  Markus Sepke       21JUL1988   21JUL2009            3       21
  8  Ulrich Heyde       16JAN1939   16JAN2009            6       70
  9  Jimmie Evans       17AUG1954   17AUG2009            2       55
 10  Tonie Asmussen     02FEB1954   02FEB2009            2       55

Level 3

3. Using the CATX and INTCK Functions to Create Variables

a. Write a DATA step to read orion.sales to create Work.employees.

b. In the DATA step, create the new variable FullName, which is the combination
   of First_Name, a space, and Last_Name. Use the CATX function.

   Documentation on the CATX function can be found in the SAS Help
   and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
   SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
   Functions and CALL Routines ⇒ CATX Function).

c. In the DATA step, create the new variable Yrs2012, which is the number of years between
   January 1, 2012, and Hire_Date. Use the INTCK function.

   Documentation on the INTCK function can be found in the SAS Help
   and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
   SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
   Functions and CALL Routines ⇒ INTCK Function).

d. Format Hire_Date to resemble a two-digit day, a two-digit month, and a
   four-digit year.

e. Give Yrs2012 a label of Years of Employment in 2012.

f. Write a PROC PRINT step with a VAR statement to create the following report:

Partial PROC PRINT Output (First 10 of 165 Observations)

                                     Years of
                                   Employment
Obs  FullName           Hire_Date     in 2012
  1  Tom Zhou           01/06/1989         23
  2  Wilson Dawes       01/01/1974         38
  3  Irenie Elvish      01/01/1974         38
  4  Christina Ngan     01/07/1978         34
  5  Kimiko Hotstone    01/10/1985         27
  6  Lucian Daymond     01/03/1979         33
  7  Fong Hofmeister    01/03/1979         33
  8  Satyakam Denny     01/08/2006          6
  9  Sharryn Clarkson   01/11/1998         14
 10  Monica Kletschkus  01/11/2006          6

9.2 Creating Variables Conditionally

Objectives
• Execute statements conditionally by using IF-THEN and IF-THEN DO statements.
• Give alternate actions if the previous THEN clause is not executed by using the ELSE statement.
• Control the length of character variables by using the LENGTH statement.

Business Scenario
A new SAS data set named Work.bonus needs to be created by reading the orion.sales data set.

Work.bonus must include a new variable named Bonus that is equal to
• 500 for United States employees
• 300 for Australian employees.

9.2 Creating Variables Conditionally 9-31

IF-THEN Statements (Review)
The IF-THEN statement executes a SAS statement for observations that meet specific conditions.
General form of the IF-THEN statement:

IF expression THEN statement;

• expression is a sequence of operands and operators that form a set of instructions that
  define a condition for selecting observations.
• statement is any executable statement such as the assignment statement.

If the condition in the IF clause is met, the IF-THEN statement executes a SAS statement for that
observation.

IF-THEN/ELSE Statements (Review)
The optional ELSE statement gives an alternate action if the previous THEN clause is not executed.
General form of the IF-THEN/ELSE statements:

IF expression THEN statement;
ELSE IF expression THEN statement;

• Using IF-THEN statements without the ELSE statement causes SAS to evaluate all
  IF-THEN statements.
• Using IF-THEN statements with the ELSE statement causes SAS to execute IF-THEN
  statements until it encounters the first true statement.

Conditional logic can include one or more ELSE IF statements.


For greater efficiency, construct your IF-THEN/ELSE statements with conditions of decreasing
probability.
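The difference between a chain of IF-THEN/ELSE statements and a series of independent IF-THEN statements can be seen in a small sketch. The data set Work.grades and the variables Score, WithElse, and NoElse are hypothetical names chosen for this illustration only.

```sas
data work.grades;
   input Score;
   length WithElse NoElse $ 6;
   /* With ELSE: evaluation stops at the first true condition. */
   if Score>=90 then WithElse='High';
   else if Score>=50 then WithElse='Medium';
   else WithElse='Low';
   /* Without ELSE: every IF is evaluated, so a later true condition
      overwrites an earlier assignment. */
   if Score>=90 then NoElse='High';
   if Score>=50 then NoElse='Medium';
   datalines;
95
60
10
;
run;

proc print data=work.grades;
run;
```

For the observation with Score equal to 95, WithElse is High, but NoElse ends up as Medium because the second independent IF statement is also true.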

Business Scenario
Create the new variable Bonus.

data work.bonus;
   set orion.sales;
   if Country='US' then Bonus=500;
   else if Country='AU' then Bonus=300;
run;

1819  data work.bonus;
1820     set orion.sales;
1821     if Country='US' then Bonus=500;
1822     else if Country='AU' then Bonus=300;
1823  run;

NOTE: There were 165 observations read from the data set ORION.SALES.
NOTE: The data set WORK.BONUS has 165 observations and 10 variables.

p109d03

Business Scenario

proc print data=work.bonus;
   var First_Name Last_Name Country Bonus;
run;

Partial PROC PRINT Output

Obs  First_Name  Last_Name  Country  Bonus
 60  Billy       Plested    AU         300
 61  Matsuoka    Wills      AU         300
 62  Vino        George     AU         300
 63  Meera       Body       AU         300
 64  Harry       Highpoint  US         500
 65  Julienne    Magolan    US         500
 66  Scott       Desanctis  US         500
 67  Cherda      Ridley     US         500
 68  Priscilla   Farren     US         500
 69  Robert      Stevens    US         500

p109d03

9.05 Quiz
Why are some of the Bonus values missing in the PROC PRINT output for orion.nonsales?
• Submit program p109a02.
• Review the results.

ELSE Statements
The conditional clause does not have to be in an ELSE statement.
For example:

data work.bonus;
   set orion.sales;
   if Country='US' then Bonus=500;
   else Bonus=300;
run;

All observations with a Country value other than US get a bonus of 300.

p109d03

Business Scenario
A new SAS data set named Work.bonus needs to be created by reading the orion.sales data set.

Work.bonus must include a new variable named Bonus that is equal to
• 500 for United States employees
• 300 for Australian employees.

Work.bonus must include another new variable named Freq that is equal to
• Once a Year for United States employees
• Twice a Year for Australian employees.

IF-THEN/ELSE Statements
Only one executable statement is allowed in IF-THEN/ELSE statements.

IF expression THEN statement;
ELSE IF expression THEN statement;
ELSE statement;

For the given business scenario, two statements need to be executed per each true expression.

if Country='US' then   Bonus=500;
                       Freq='Once per Year';

IF-THEN DO/ELSE DO Statements
Multiple executable statements are allowed in IF-THEN DO/ELSE DO statements.

IF expression THEN DO;
   executable statements        (DO group)
END;
ELSE IF expression THEN DO;
   executable statements        (DO group)
END;

• Each DO group can contain multiple statements that apply to the expression.
• Each DO group ends with an END statement.

Business Scenario
Create another new variable named Freq.

data work.bonus;
   set orion.sales;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';
   end;
   else if Country='AU' then do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;

p109d04

Business Scenario

proc print data=work.bonus;
   var First_Name Last_Name
       Country Bonus Freq;
run;

Partial PROC PRINT Output

Obs  First_Name  Last_Name  Country  Bonus  Freq
 60  Billy       Plested    AU         300  Twice a Yea
 61  Matsuoka    Wills      AU         300  Twice a Yea
 62  Vino        George     AU         300  Twice a Yea
 63  Meera       Body       AU         300  Twice a Yea
 64  Harry       Highpoint  US         500  Once a Year
 65  Julienne    Magolan    US         500  Once a Year
 66  Scott       Desanctis  US         500  Once a Year
 67  Cherda      Ridley     US         500  Once a Year
 68  Priscilla   Farren     US         500  Once a Year
 69  Robert      Stevens    US         500  Once a Year

p109d04

Compilation

data work.bonus;
   set orion.sales;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';
   end;
   else if Country='AU' then do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;

PDV
Employee_ID  First_Name  ...  Hire_Date
N 8          $ 12        ...  N 8

Compilation

data work.bonus;
   set orion.sales;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';
   end;
   else if Country='AU' then do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;

PDV
Employee_ID  First_Name  ...  Hire_Date  Bonus
N 8          $ 12        ...  N 8        N 8

Compilation

data work.bonus;
   set orion.sales;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';          (11 characters)
   end;
   else if Country='AU' then do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;

PDV
Employee_ID  First_Name  ...  Hire_Date  Bonus  Freq
N 8          $ 12        ...  N 8        N 8    $ 11

9.06 Quiz
How would you prevent Freq from being truncated?

The LENGTH Statement (Review)
The LENGTH statement defines the length of a variable explicitly.
General form of the LENGTH statement:

LENGTH variable(s) $ length;

Example:

length First_Name Last_Name $ 12
       Gender $ 1;

Business Scenario
Set the length of the variable Freq to avoid truncation.

data work.bonus;
   set orion.sales;
   length Freq $ 12;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';
   end;
   else if Country='AU' then do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;

p109d04

Business Scenario

proc print data=work.bonus;
   var First_Name Last_Name
       Country Bonus Freq;
run;

Partial PROC PRINT Output

Obs  First_Name  Last_Name  Country  Bonus  Freq
 60  Billy       Plested    AU         300  Twice a Year
 61  Matsuoka    Wills      AU         300  Twice a Year
 62  Vino        George     AU         300  Twice a Year
 63  Meera       Body       AU         300  Twice a Year
 64  Harry       Highpoint  US         500  Once a Year
 65  Julienne    Magolan    US         500  Once a Year
 66  Scott       Desanctis  US         500  Once a Year
 67  Cherda      Ridley     US         500  Once a Year
 68  Priscilla   Farren     US         500  Once a Year
 69  Robert      Stevens    US         500  Once a Year

p109d04

ELSE Statements
The conditional clause does not have to be in an ELSE statement.

data work.bonus;
   set orion.sales;
   length Freq $ 12;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';
   end;
   else do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;

All observations with a Country value other than US execute the statements in the second DO group.

p109d04

Exercises

Level 1

4. Creating Variables Conditionally

a. Retrieve the starter program p109e04.

b. In the DATA step, create three new variables, Discount, DiscountType, and Region.

   If Country is equal to CA or US,
   • Discount is equal to 0.10
   • DiscountType is equal to Required
   • Region is equal to North America.

   If Country is equal to any other value,
   • Discount is equal to 0.05
   • DiscountType is equal to Optional
   • Region is equal to Not North America.

c. Include only the following variables: Supplier_Name, Country, Discount,
   DiscountType, and Region.

d. Submit the program to create the following PROC PRINT report:

Partial PROC PRINT Output (First 10 of 52 Observations)

Obs  Supplier_Name                Country  Region             Discount  DiscountType
  1  Scandinavian Clothing A/S    NO       Not North America      0.05  Optional
  2  Petterson AB                 SE       Not North America      0.05  Optional
  3  Prime Sports Ltd             GB       Not North America      0.05  Optional
  4  Top Sports                   DK       Not North America      0.05  Optional
  5  AllSeasons Outdoor Clothing  US       North America          0.10  Required
  6  Sportico                     ES       Not North America      0.05  Optional
  7  British Sports Ltd           GB       Not North America      0.05  Optional
  8  Eclipse Inc                  US       North America          0.10  Required
  9  Magnifico Sports             PT       Not North America      0.05  Optional
 10  Pro Sportswear Inc           US       North America          0.10  Required

Level 2

5. Creating Variables Unconditionally and Conditionally

a. Write a DATA step to read orion.orders to create Work.ordertype.

b. Create the new variable DayOfWeek, which is equal to the week day of Order_Date.

c. Create the new variable Type, which is equal to
   • Catalog Sale if Order_Type is equal to 1
   • Internet Sale if Order_Type is equal to 2
   • Retail Sale if Order_Type is equal to 3.

d. Create the new variable SaleAds, which is equal to
   • Mail if Order_Type is equal to 1
   • Email if Order_Type is equal to 2.

e. Do not include Order_Type, Employee_ID, and Customer_ID.

f. Write a PROC PRINT step to create the following report:

Partial PROC PRINT Output (First 20 of 490 Observations)

Obs  Order_ID    Order_Date  Delivery_Date  Type           SaleAds  DayOfWeek
  1  1230058123  11JAN2003   11JAN2003      Catalog Sale   Mail             7
  2  1230080101  15JAN2003   19JAN2003      Internet Sale  Email            4
  3  1230106883  20JAN2003   22JAN2003      Internet Sale  Email            2
  4  1230147441  28JAN2003   28JAN2003      Catalog Sale   Mail             3
  5  1230315085  27FEB2003   27FEB2003      Catalog Sale   Mail             5
  6  1230333319  02MAR2003   03MAR2003      Internet Sale  Email            1
  7  1230338566  03MAR2003   08MAR2003      Internet Sale  Email            2
  8  1230371142  09MAR2003   11MAR2003      Internet Sale  Email            1
  9  1230404278  15MAR2003   15MAR2003      Catalog Sale   Mail             7
 10  1230440481  22MAR2003   22MAR2003      Catalog Sale   Mail             7
 11  1230450371  24MAR2003   26MAR2003      Internet Sale  Email            2
 12  1230453723  24MAR2003   25MAR2003      Internet Sale  Email            2
 13  1230455630  25MAR2003   25MAR2003      Catalog Sale   Mail             3
 14  1230478006  28MAR2003   30MAR2003      Internet Sale  Email            6
 15  1230498538  01APR2003   01APR2003      Catalog Sale   Mail             3
 16  1230500669  02APR2003   03APR2003      Retail Sale                     4
 17  1230503155  02APR2003   03APR2003      Internet Sale  Email            4
 18  1230591673  18APR2003   23APR2003      Internet Sale  Email            6
 19  1230591675  18APR2003   20APR2003      Retail Sale                     6
 20  1230591684  18APR2003   18APR2003      Catalog Sale   Mail             6

Level 3

6. Using WHEN Statements in a SELECT Group to Create Variables Conditionally

a. Write a DATA step to read orion.nonsales to create Work.gifts.

b. Create two new variables, Gift1 and Gift2, using a SELECT group with WHEN statements.

   If Gender is equal to F,
   • Gift1 is equal to Perfume
   • Gift2 is equal to Cookware.

   If Gender is equal to M,
   • Gift1 is equal to Cologne
   • Gift2 is equal to Lawn Equipment.

   If Gender is not equal to F or M,
   • Gift1 is equal to Coffee
   • Gift2 is equal to Calendar.

   Documentation on the SELECT group with WHEN statements can be found in the
   SAS Help and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
   SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
   Statements ⇒ SELECT Statement).

c. Include only the following variables: Employee_ID, First, Last, Gift1, and Gift2.

d. Write a PROC PRINT step to create the following report:

Partial PROC PRINT Output (First 15 of 235 Observations)

Obs  Employee_ID  First      Last        Gift1    Gift2
  1       120101  Patrick    Lu          Cologne  Lawn Equipment
  2       120104  Kareen     Billington  Perfume  Cookware
  3       120105  Liz        Povey       Perfume  Cookware
  4       120106  John       Hornsey     Cologne  Lawn Equipment
  5       120107  Sherie     Sheedy      Perfume  Cookware
  6       120108  Gladys     Gromek      Perfume  Cookware
  7       120108  Gabriele   Baker       Perfume  Cookware
  8       120110  Dennis     Entwisle    Cologne  Lawn Equipment
  9       120111  Ubaldo     Spillane    Cologne  Lawn Equipment
 10       120112  Ellis      Glattback   Perfume  Cookware
 11       120113  Riu        Horsey      Perfume  Cookware
 12       120114  Jeannette  Buddery     Coffee   Calendar
 13       120115  Hugh       Nichollas   Cologne  Lawn Equipment
 14            .  Austen     Ralston     Cologne  Lawn Equipment
 15       120117  Bill       Mccleary    Cologne  Lawn Equipment

9.3 Subsetting Observations

Objectives
• Subset observations by using the WHERE statement.
• Subset observations by using the subsetting IF statement.
• Subset observations by using the IF-THEN DELETE statement. (Self-Study)

Business Scenario
A new SAS data set named Work.december needs to be created by reading the orion.sales data set.

Work.december must include the following new variables:
• Bonus, which is equal to a constant 500.
• Compensation, which is the combination of the employee's salary and bonus.
• BonusMonth, which is equal to the month the employee was hired.

Work.december must include only the employees from Australia who have a bonus month in December.

9.3 Subsetting Observations 9-45

The WHERE Statement (Review)
The WHERE statement subsets observations that meet a particular condition.
General form of the WHERE statement:

WHERE where-expression;

The where-expression is a sequence of operands and operators that form a set of
instructions that define a condition for selecting observations.
• Operands include constants and variables.
• Operators are symbols that request a comparison, arithmetic calculation, or logical operation.

Processing the WHERE Statement
The WHERE statement selects observations before they are brought into the program data vector.

Input SAS Data Set
   |  WHERE statement
PDV
   |
Output SAS Data Set

9.07 Quiz
Why does the WHERE statement not work in this DATA step?

data work.december;
   set orion.sales;
   BonusMonth=month(Hire_Date);
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   where Country='AU' and BonusMonth=12;
run;

p109d05

The Subsetting IF Statement
The subsetting IF statement continues processing only those observations that meet the condition.
General form of the subsetting IF statement:

IF expression;

The expression is a sequence of operands and operators that form a set of instructions
that define a condition for selecting observations.
• Operands include constants and variables.
• Operators are symbols that request a comparison, arithmetic calculation, or logical operation.

The Subsetting IF Statement
Examples:

if Salary > 50000;

if Last_Name='Smith' and First_Name='Joe';

if Country not in ('GB', 'FR', 'NL');

if Hire_Date = '15APR2008'd;

if upcase(Gender)='M';

if BirthMonth = 5 or BirthMonth = 6;

if 40000 <= Compensation <= 80000;

if sum(Salary,Bonus) < 43000;

Special WHERE operators such as BETWEEN-AND, IS NULL, IS MISSING, CONTAINS, and LIKE
cannot be used with the subsetting IF statement.

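When a condition would naturally use one of the special WHERE operators, an ordinary DATA step expression can usually express the same test. The sketch below shows one possible set of substitutes; it is an illustration under assumptions, not a list taken from the course. The data sets Work.names and Work.flags and the flag variables are hypothetical.

```sas
/* Hypothetical input; names and values are illustrative only. */
data work.names;
   input Last_Name :$12. Salary;
   datalines;
Smith 52000
Smythe .
Jones 41000
;
run;

data work.flags;
   set work.names;
   InRange  = (40000 <= Salary <= 60000);   /* in place of BETWEEN-AND          */
   NoSalary = missing(Salary);              /* in place of IS MISSING / IS NULL */
   HasMit   = (find(Last_Name,'mit') > 0);  /* in place of CONTAINS             */
   StartsSm = (Last_Name =: 'Sm');          /* in place of LIKE 'Sm%'           */
   if InRange or NoSalary;                  /* subsetting IF on the results     */
run;
```

Each parenthesized comparison evaluates to 1 or 0, so the results can be stored as flags or used directly in a subsetting IF statement.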
Processing the Subsetting IF Statement
The subsetting IF statement determines if observations continue being processed in the
program data vector.

Input SAS Data Set
   |  WHERE statement
PDV
   |  subsetting IF statement
Output SAS Data Set

Processing the Subsetting IF Statement

DATA Statement
   |
Read Observation or Record
   |
IF Expression
   |  TRUE                        FALSE: A false IF expression causes the
Continue Processing Observation          contents of the PDV to not output.
   |
Output Observation to SAS Data Set

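The difference in when the two statements filter can be checked in the SAS log. The following sketch uses a hypothetical five-row data set, Work.emps; all names and values are assumptions made for illustration.

```sas
/* Hypothetical input; names and values are illustrative only. */
data work.emps;
   input Employee_ID Country $;
   datalines;
120102 AU
120103 AU
120121 US
120122 US
120123 US
;
run;

/* WHERE filters before rows reach the PDV. */
data work.au_where;
   set work.emps;
   where Country='AU';
run;

/* Subsetting IF filters after each row is already in the PDV. */
data work.au_if;
   set work.emps;
   if Country='AU';
run;
```

Both output data sets contain the same two observations. The log for the first step should report two observations read from Work.emps, and the log for the second step should report five.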
Business Scenario
Include only the employees from Australia who have a bonus month in December.

data work.december;
   set orion.sales;
   where Country='AU';
   BonusMonth=month(Hire_Date);
   if BonusMonth=12;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
run;

Partial SAS Log
NOTE: There were 63 observations read from the data set ORION.SALES.
      WHERE Country='AU';
NOTE: The data set WORK.DECEMBER has 3 observations and 12 variables.

p109d05

9.08 Quiz
Could you write only an IF statement?
• Yes
• No

data work.december;
   set orion.sales;
   where Country='AU';
   BonusMonth=month(Hire_Date);
   if BonusMonth=12;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
run;

data work.december;
   set orion.sales;
   BonusMonth=month(Hire_Date);
   if BonusMonth=12 and Country='AU';
   Bonus=500;
   Compensation=sum(Salary,Bonus);
run;

p109d05

WHERE Statement versus Subsetting IF Statement

Step and Usage                                WHERE  IF
PROC step                                     Yes    No
DATA step (source of variable)
   INPUT statement                            No     Yes
   assignment statement                       No     Yes
   SET statement (single data set)            Yes    Yes
   SET/MERGE statement (multiple data sets)
      Variable in ALL data sets               Yes    Yes
      Variable not in ALL data sets           No     Yes

The IF-THEN DELETE Statement (Self-Study)
An alternative to the subsetting IF statement is the DELETE statement in an IF-THEN statement.

General form of the IF-THEN DELETE statement:

IF expression THEN DELETE;

The DELETE statement stops processing the current observation.

When the DELETE statement executes, the current observation is not written to a data set, and SAS
returns immediately to the beginning of the DATA step for the next iteration.

The IF-THEN DELETE Statement (Self-Study)

data work.december;
   set orion.sales;
   where Country='AU';
   BonusMonth=month(Hire_Date);
   if BonusMonth ne 12 then delete;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
run;

equivalent

data work.december;
   set orion.sales;
   where Country='AU';
   BonusMonth=month(Hire_Date);
   if BonusMonth=12;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
run;

p109d06

Exercises

Level 1

7. Subsetting Observations Based on Two Conditions

a. Retrieve the starter program p109e07.

b. In the DATA step, write a statement to select only the observations that have Emp_Hire_Date
   greater than or equal to July 1, 2006. Subset the observations as they are being read into the
   program data vector.

c. In the DATA step, write another statement to select only the observations that have an increase
   greater than 3000.

d. Submit the program to create the following PROC PRINT report:

                   Employee
                     Annual  Employee
Obs  Employee ID     Salary  Hire Date  Increase  NewSalary
  1       120128     30,890  01NOV2006     3,089     33,979
  2       120144     30,265  01OCT2006     3,027     33,292
  3       120161     30,785  01OCT2006     3,079     33,864
  4       120264     37,510  01DEC2006     3,751     41,261
  5       120761     30,960  01JUL2006     3,096     34,056
  6       120995     34,850  01AUG2006     3,485     38,335
  7       121055     30,185  01AUG2006     3,019     33,204
  8       121062     30,305  01AUG2006     3,031     33,336
  9       121085     32,235  01JAN2007     3,224     35,459
 10       121107     31,380  01JUL2006     3,138     34,518

Level 2

8. Subsetting Observations Based on Three Conditions

a. Write a DATA step to read orion.orders to create Work.delays.

b. Create the new variable Order_Month, which is equal to the month of Order_Date.

c. Use a WHERE statement and a subsetting IF statement to select only the observations that
   meet all of the following conditions:
   • Delivery_Date values that are more than four days beyond Order_Date
   • Employee_ID values that are equal to 99999999
   • Order_Month values occurring in August

d. Write a PROC PRINT step to create the following report:

Obs  Order_ID    Order_Type  Employee_ID  Customer_ID  Order_Date  Delivery_Date  Order_Month
  1  1231227910           2     99999999        70187  13AUG2003   18AUG2003                8
  2  1231270767           3     99999999           52  20AUG2003   26AUG2003                8
  3  1231305521           2     99999999           16  27AUG2003   04SEP2003                8
  4  1231317443           2     99999999           61  29AUG2003   03SEP2003                8
  5  1233484749           3     99999999         2550  10AUG2004   15AUG2004                8
  6  1233514453           3     99999999        70201  15AUG2004   20AUG2004                8
  7  1236673732           3     99999999            9  10AUG2005   15AUG2005                8
  8  1240051245           3     99999999           71  30AUG2006   05SEP2006                8
  9  1243165497           3     99999999        70201  24AUG2007   29AUG2007                8

Level 3

9. Using an IF-THEN DELETE Statement to Subset Observations

a. Write a DATA step to read orion.employee_donations to create Work.bigdonations.

b. Create the new variable Total, which is equal to the sum of Qtr1, Qtr2, Qtr3, and Qtr4.

c. Create the new variable NoDonation, which is equal to the count of missing values in Qtr1,
   Qtr2, Qtr3, and Qtr4. Use the NMISS function.

   Documentation on the NMISS function can be found in the SAS Help and
   Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
   SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
   Functions and CALL Routines ⇒ NMISS Function).

d. The final data set should contain only observations meeting the following two conditions:
   • Total values greater than or equal to 50
   • NoDonation values equal to 0.
   Use an IF-THEN DELETE statement to eliminate the observations where the conditions are not
   met.

   The IF-THEN DELETE statement is mentioned at the end of this section in a self-study
   section.

e. Write a PROC PRINT step with a VAR statement to create the following report:

Partial PROC PRINT Output (First 7 of 50 Observations)

Obs Employee_ID Qtr1 Qtr2 Qtr3 Qtr4 Total NoDonation

1 120267 15 15 15 15 60 0
2 120269 20 20 20 20 80 0
3 120271 20 20 20 20 80 0
4 120275 15 15 15 15 60 0
5 120660 25 25 25 25 100 0
6 120669 15 15 15 15 60 0
7 120671 20 20 20 20 80 0
9.4 Chapter Review 9-53

9.4 Chapter Review

Chapter Review
1. What is an advantage in using the SUM function instead of an arithmetic operator?

2. Do the DROP/KEEP statements eliminate variables from the input or output data set
   in the DATA step?

3. When would you use DO group statements in the DATA step?

4. What is the default length of a numeric variable created in an assignment statement?

9.5 Solutions

Solutions to Exercises
1. Creating Two New Variables
a. Retrieve the starter program.
b. Create two new variables.
data work.increase;
   set orion.staff;
   Increase=Salary*0.10;
   NewSalary=sum(Salary,Increase);
run;

proc print data=work.increase label;
run;
c. Include only four variables.
data work.increase;
   set orion.staff;
   Increase=Salary*0.10;
   NewSalary=sum(Salary,Increase);
   keep Employee_ID Salary Increase NewSalary;
run;

proc print data=work.increase label;
run;
d. Format three variables.
data work.increase;
   set orion.staff;
   Increase=Salary*0.10;
   NewSalary=sum(Salary,Increase);
   keep Employee_ID Salary Increase NewSalary;
   format Salary Increase NewSalary comma10.;
run;

proc print data=work.increase label;
run;
e. Submit the program.
2. Creating Three New Variables
a. Write a DATA step.
data work.birthday;
set orion.customer;
run;

b. Create three new variables.


data work.birthday;
set orion.customer;
Bday2009=mdy(month(Birth_Date),day(Birth_Date),2009);
BdayDOW2009=weekday(Bday2009);
Age2009=(Bday2009-Birth_Date)/365.25;
run;
c. Include only five variables.
data work.birthday;
   set orion.customer;
   Bday2009=mdy(month(Birth_Date),day(Birth_Date),2009);
   BdayDOW2009=weekday(Bday2009);
   Age2009=(Bday2009-Birth_Date)/365.25;
   keep Customer_Name Birth_Date Bday2009 BdayDOW2009 Age2009;
run;
d. Format two variables.
data work.birthday;
   set orion.customer;
   Bday2009=mdy(month(Birth_Date),day(Birth_Date),2009);
   BdayDOW2009=weekday(Bday2009);
   Age2009=(Bday2009-Birth_Date)/365.25;
   keep Customer_Name Birth_Date Bday2009 BdayDOW2009 Age2009;
   format Bday2009 date9. Age2009 3.;
run;
e. Write a PROC PRINT step.
proc print data=work.birthday;
run;
3. Using the CATX and INTCK Functions to Create Variables
a. Write a DATA step.
data work.employees;
   set orion.sales;
run;
b. Create the new variable FullName.
data work.employees;
set orion.sales;
FullName=catx(' ',First_Name,Last_Name);
run;
c. Create the new variable Yrs2012.
data work.employees;
set orion.sales;
FullName=catx(' ',First_Name,Last_Name);
Yrs2012=intck('year',Hire_Date,'01JAN2012'd);
run;

d. Format Hire_Date.
data work.employees;
set orion.sales;
FullName=catx(' ',First_Name,Last_Name);
Yrs2012=intck('year',Hire_Date,'01JAN2012'd);
format Hire_Date ddmmyy10.;
run;
e. Add a label.
data work.employees;
   set orion.sales;
   FullName=catx(' ',First_Name,Last_Name);
   Yrs2012=intck('year',Hire_Date,'01JAN2012'd);
   format Hire_Date ddmmyy10.;
   label Yrs2012='Years of Employment in 2012';
run;
f. Write a PROC PRINT step.
proc print data=work.employees label;
   var FullName Hire_Date Yrs2012;
run;
4. Creating Variables Conditionally

a. Retrieve the starter program.
b. Create three new variables.
data work.region;
   set orion.supplier;
   length Region $ 17;
   if Country in ('CA','US') then do;
      Discount=0.10;
      DiscountType='Required';
      Region='North America';
   end;
   else do;
      Discount=0.05;
      DiscountType='Optional';
      Region='Not North America';
   end;
run;

proc print data=work.region;
run;
c. Include only five variables.
data work.region;
   set orion.supplier;
   length Region $ 17;
   if Country in ('CA','US') then do;
      Discount=0.10;
      DiscountType='Required';
      Region='North America';
   end;
   else do;
      Discount=0.05;
      DiscountType='Optional';
      Region='Not North America';
   end;
   keep Supplier_Name Country
        Discount DiscountType Region;
run;

proc print data=work.region;
run;
d. Submit the program.
5. Creating Variables Unconditionally and Conditionally

a. Write a DATA step.
data work.ordertype;
   set orion.orders;
run;
b. Create the new variable DayOfWeek.
data work.ordertype;
   set orion.orders;
   DayOfWeek=weekday(Order_Date);
run;
c. Create the new variable Type.
data work.ordertype;
set orion.orders;
length Type $ 13;
DayOfWeek=weekday(Order_Date);
if Order_Type=1 then do;
Type='Catalog Sale';
end;
else if Order_Type=2 then do;
Type='Internet Sale';
end;
else if Order_Type=3 then do;
Type='Retail Sale';
end;
run;
d. Create the new variable SaleAds.
data work.ordertype;
   set orion.orders;
   length Type $ 13 SaleAds $ 5;
   DayOfWeek=weekday(Order_Date);
   if Order_Type=1 then do;
      Type='Catalog Sale';
      SaleAds='Mail';
   end;
   else if Order_Type=2 then do;
      Type='Internet Sale';
      SaleAds='Email';
   end;
   else if Order_Type=3 then do;
      Type='Retail Sale';
   end;
run;
e. Do not include three variables.
data work.ordertype;
   set orion.orders;
   length Type $ 13 SaleAds $ 5;
   DayOfWeek=weekday(Order_Date);
   if Order_Type=1 then do;
      Type='Catalog Sale';
      SaleAds='Mail';
   end;
   else if Order_Type=2 then do;
      Type='Internet Sale';
      SaleAds='Email';
   end;
   else if Order_Type=3 then do;
      Type='Retail Sale';
   end;
   drop Order_Type Employee_ID Customer_ID;
run;
f. Write a PROC PRINT step.
proc print data=work.ordertype;
run;
6. Using WHEN Statements in a SELECT Group to Create Variables Conditionally
a. Write a DATA step.
data work.gifts;
set orion.nonsales;
run;
b. Create two new variables.
data work.gifts;
   set orion.nonsales;
   length Gift1 Gift2 $ 15;
   select(Gender);
      when('F') do;
         Gift1='Perfume';
         Gift2='Cookware';
      end;
      when('M') do;
         Gift1='Cologne';
         Gift2='Lawn Equipment';
      end;
      otherwise do;
         Gift1='Coffee';
         Gift2='Calendar';
      end;
   end;
run;
c. Include only five variables.
data work.gifts;
   set orion.nonsales;
   length Gift1 Gift2 $ 15;
   select(Gender);
      when('F') do;
         Gift1='Perfume';
         Gift2='Cookware';
      end;
      when('M') do;
         Gift1='Cologne';
         Gift2='Lawn Equipment';
      end;
      otherwise do;
         Gift1='Coffee';
         Gift2='Calendar';
      end;
   end;
   keep Employee_ID First Last Gift1 Gift2;
run;
d. Write a PROC PRINT step.
proc print data=gifts;
run;

7. Subsetting Observations Based on Two Conditions


a. Retrieve the starter program.
b. Write a statement to select only the observations based on Emp_Hire_Date.
data work.increase;
   set orion.staff;
   where Emp_Hire_Date>='01JUL2006'd;
   Increase=Salary*0.10;
   NewSalary=sum(Salary,Increase);
   keep Employee_ID Emp_Hire_Date Salary Increase NewSalary;
   format Salary Increase NewSalary comma10.;
run;

proc print data=work.increase label;
run;
c. Write another statement to select only the observations based on Increase.
data work.increase;
   set orion.staff;
   where Emp_Hire_Date>='01JUL2006'd;
   Increase=Salary*0.10;
   if Increase>3000;
   NewSalary=sum(Salary,Increase);
   keep Employee_ID Emp_Hire_Date Salary Increase NewSalary;
   format Salary Increase NewSalary comma10.;
run;

proc print data=work.increase label;
run;
d. Submit the program.
8. Subsetting Observations Based on Three Conditions
a. Write a DATA step.

data work.delays;
set orion.orders;
run;
b. Create a new variable.
data work.delays;
set orion.orders;
Order_Month=month(Order_Date);
run;

c. Use a WHERE statement and a subsetting IF statement.


data work.delays;
set orion.orders;
where Order_Date+4<Delivery_Date
and Employee_ID=99999999;
Order_Month=month(Order_Date);
if Order_Month=8;
run;
d. Write a PROC PRINT step.
proc print data=work.delays;
run;
9. Using an IF-THEN DELETE Statement to Subset Observations
a. Write a DATA step.
data work.bigdonations;
   set orion.employee_donations;
run;
b. Create the new variable Total.
data work.bigdonations;
   set orion.employee_donations;
   Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
run;
c. Create the new variable NoDonation.
data work.bigdonations;
   set orion.employee_donations;
   Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
   NoDonation=nmiss(Qtr1,Qtr2,Qtr3,Qtr4);
run;
d. Use an IF-THEN DELETE statement.
data work.bigdonations;
   set orion.employee_donations;
   Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
   NoDonation=nmiss(Qtr1,Qtr2,Qtr3,Qtr4);
   if Total < 50 or NoDonation > 0 then delete;
run;
e. Write a PROC PRINT step.
proc print data=work.bigdonations;
var Employee_ID Qtr1 Qtr2 Qtr3 Qtr4 Total NoDonation;
run;

Solutions to Student Activities (Polls/Quizzes)

9.01 Quiz – Correct Answer

What is the result of the assignment statement?

num = 4 + 10 / 2;

a. . (missing)
b. 0
c. 7
d. 9

The correct answer is d. Division and multiplication are performed before addition and subtraction, and operators of equal priority are evaluated from left to right, so 10 / 2 is evaluated first and 4 is then added.
Parentheses can be used to control the order of operations.

num = (4 + 10) / 2;
9.02 Quiz – Correct Answer

What is the result of the assignment statement given the values of var1 and var2?

num = var1 + var2 / 2;

var1 var2
.    10

a. . (missing)
b. 0
c. 5
d. 10

The correct answer is a. If an operand is missing for an arithmetic operator, the result is missing.
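The contrast between an arithmetic operator and the SUM function can be checked with a short DATA step. This is a minimal sketch: the data set name work.missing_demo and the variables num_operator and num_function are hypothetical names introduced only for illustration, and the values of var1 and var2 mirror the quiz above.

```sas
/* Sketch only: work.missing_demo, num_operator, and num_function */
/* are illustrative names, not part of the course data.           */
data work.missing_demo;
   var1=.;
   var2=10;
   /* An arithmetic operator propagates the missing operand. */
   num_operator=var1 + var2 / 2;
   /* The SUM function ignores the missing argument.         */
   num_function=sum(var1, var2 / 2);
   /* Write both results to the SAS log.                     */
   put num_operator= num_function=;
run;
```

The log should show a missing value for num_operator and 5 for num_function, along with a note that missing values were generated by an operation on missing values.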

9.03 Quiz – Correct Answer


What statement needs to be added to the DATA step
to eliminate six of the 12 variables?

the DROP or KEEP statement

9.04 Poll – Correct Answer

Are the correct results produced when the DROP statement is placed after the SET statement?

Yes
No

Yes, the DROP statement specifies the names of the variables to omit from the output data set.

9.05 Quiz – Correct Answer

Why are some of the Bonus values missing in the PROC PRINT output for orion.nonsales?

Country has mixed case values in the orion.nonsales data set.

The UPCASE function will correct the issue.

data work.bonus;
   set orion.nonsales;
   if upcase(Country)='US'
      then Bonus=500;
   else if upcase(Country)='AU'
      then Bonus=300;
run;

p109a02s
9.06 Quiz – Correct Answer

How would you prevent Freq from being truncated?

Possible solutions:
Pad the first occurrence of the Freq value with blanks to be the length of the longest possible value.
Switch conditional statements to place the longest value of Freq in the first conditional statement.
Add a LENGTH statement to declare the byte size of the variable up front.
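The third solution, a LENGTH statement, can be sketched as follows. The data set work.freq_demo, the two input rows, and the Freq values 'Once a Year' and 'Twice a Year' are assumptions made for illustration; only the variable name Freq comes from the quiz.

```sas
/* Sketch only: work.freq_demo and the Freq values are assumed. */
data work.freq_demo;
   /* Declare Freq before its first assignment so that the  */
   /* longest value (12 characters) fits without truncation. */
   length Country $ 2 Freq $ 12;
   input Country $;
   if Country='US' then Freq='Once a Year';
   else if Country='AU' then Freq='Twice a Year';
   datalines;
US
AU
;
run;

proc print data=work.freq_demo;
run;
```

Without the LENGTH statement, Freq would take its length (11) from the first assignment, and 'Twice a Year' would lose its last character.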

9.07 Quiz – Correct Answer

Why does the WHERE statement not work in this DATA step?

data work.december;
   set orion.sales;
   BonusMonth=month(Hire_Date);
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   where Country='AU' and BonusMonth=12;
run;

The WHERE statement can only subset variables that are coming from an existing data set.

ERROR: Variable BonusMonth is not on file ORION.SALES.

p109d05
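One way to repair a step like this is to keep the WHERE statement for the variable that exists in the input data set and to use a subsetting IF statement for the variable created in the step. The sketch below uses a small hypothetical data set, work.sales_demo, with made-up rows in place of orion.sales so that it can run on its own.

```sas
/* Sketch only: work.sales_demo and its three rows are made up. */
data work.sales_demo;
   input Country $ Salary Hire_Date :date9.;
   format Hire_Date date9.;
   datalines;
AU 30000 15DEC2005
AU 28000 01JUN2006
US 35000 20DEC2004
;
run;

data work.december;
   set work.sales_demo;
   /* Country is in the input data set, so WHERE can use it. */
   where Country='AU';
   BonusMonth=month(Hire_Date);
   /* BonusMonth is created in this step, so use a subsetting IF. */
   if BonusMonth=12;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
run;

proc print data=work.december;
run;
```

With these rows, only the first observation (AU, hired in December) should remain in work.december.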
9.08 Quiz – Correct Answer

Could you write only an IF statement?

Yes
No

Yes, but the program using both the WHERE and IF statements is more efficient.

Both methods create a data set with three observations. The program using both statements reads 63 observations into the PDV. The program using only the IF statement reads 165 observations into the PDV.

p109d05

Solutions to Chapter Review

Chapter Review Answers

1. What is an advantage in using the SUM function instead of an arithmetic operator?
The SUM function ignores missing values.

2. Do the DROP/KEEP statements eliminate variables from the input or output data set in the DATA step?
the output data set

3. When would you use DO group statements in the DATA step?
Use DO group statements when you want to execute multiple statements as a result of a true IF expression in an IF-THEN statement.

4. What is the default length of a numeric variable created in an assignment statement?
8 bytes
Chapter 10 Combining SAS Data Sets

10.1 Introduction to Combining Data Sets .......... 10-3

10.2 Appending a Data Set (Self-Study) .......... 10-7
Exercises .......... 10-17

10.3 Concatenating Data Sets .......... 10-19
Exercises .......... 10-41

10.4 Merging Data Sets One-to-One .......... 10-44

10.5 Merging Data Sets One-to-Many .......... 10-53
Exercises .......... 10-64

10.6 Merging Data Sets with Nonmatches .......... 10-66
Exercises .......... 10-89

10.7 Chapter Review .......... 10-92

10.8 Solutions .......... 10-93
Solutions to Exercises .......... 10-93
Solutions to Student Activities (Polls/Quizzes) .......... 10-100
Solutions to Chapter Review .......... 10-107

10.1 Introduction to Combining Data Sets

Objectives
Define the methods for combining SAS data sets.


Appending and Concatenating
Appending and concatenating involve combining SAS data sets, one after the other, into a single SAS data set.

Appending adds the observations in the second data set directly to the end of the original data set.

Concatenating copies all observations from the first data set and then copies all observations from one or more successive data sets into a new data set.

Merging
Merging involves combining observations from two or more SAS data sets into a single observation in a new SAS data set.

Observations can be merged based on their positions in the original data sets or merged by one or more common variables.
Example: Appending a Data Set
One data set is appended to a master data set.

Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007

Emps2008
First   Gender  HireYear
Brett   M       2008
Renee   F       2008

Emps (after appending)
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Brett   M       2008
Renee   F       2008

Example: Concatenating Data Sets
Two data sets are concatenated to create a new data set.

EmpsDK
First   Gender  Country
Lars    M       Denmark
Kari    F       Denmark
Jonas   M       Denmark

EmpsFR
First   Gender  Country
Pierre  M       France
Sophie  F       France

EmpsAll1
First   Gender  Country
Lars    M       Denmark
Kari    F       Denmark
Jonas   M       Denmark
Pierre  M       France
Sophie  F       France
Example: Merging Data Sets
Two data sets are merged to create a new data set.

EmpsAU
First   Gender  EmpID
Togar   M       121150
Kylie   F       121151
Birin   M       121152

PhoneH
EmpID   Phone
121150  +61(2)5555-1793
121151  +61(2)5555-1849
121152  +61(2)5555-1665

EmpsAUH
First   Gender  EmpID   Phone
Togar   M       121150  +61(2)5555-1793
Kylie   F       121151  +61(2)5555-1849
Birin   M       121152  +61(2)5555-1665
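As a preview, the merge in this example can be sketched with a MERGE statement and a BY statement on the common variable EmpID. The two input DATA steps below only rebuild the example tables with DATALINES so that the sketch runs on its own; reading EmpID as a numeric variable and Phone with a width of 15 are assumptions.

```sas
/* Sketch only: the input steps recreate the example tables. */
data EmpsAU;
   input First $ Gender $ EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

data PhoneH;
   input EmpID Phone :$15.;
   datalines;
121150 +61(2)5555-1793
121151 +61(2)5555-1849
121152 +61(2)5555-1665
;
run;

/* Both inputs are already in EmpID order, as BY requires. */
data EmpsAUH;
   merge EmpsAU PhoneH;
   by EmpID;
run;

proc print data=EmpsAUH;
run;
```

Match-merging with a BY statement expects each input data set to be sorted (or indexed) by the BY variable; here the rows are entered in EmpID order, so no PROC SORT step is needed.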

10.01 Quiz
Which method (appending, concatenating, or merging) should be used for the given business scenario?

   Business Scenario                                          Method

1  The JanSales, FebSales, and MarSales data sets need
   to be combined to create the Qtr1Sales data set.

2  The Sales data set needs to be combined with the
   Target data set by month to compare the sales data
   to the target data.

3  The OctSales data set needs to be added to the
   YTD data set.

10.2 Appending a Data Set (Self-Study)

Objectives
Append one SAS data set to another SAS data set by using the APPEND procedure.
Append a SAS data set containing additional variables to another SAS data set by using the FORCE option with the APPEND procedure.

Appending and Concatenating
Appending and concatenating involve combining SAS data sets, one after the other, into a single SAS data set.

Appending adds the observations in the second data set directly to the end of the original data set.

Concatenating copies all observations from the first data set and then copies all observations from one or more successive data sets into a new data set.

The APPEND Procedure
The APPEND procedure adds the observations from one SAS data set to the end of another SAS data set.
General form of the APPEND procedure:

PROC APPEND BASE = SAS-data-set
            DATA = SAS-data-set;
RUN;

BASE= names the data set to which observations are added.
DATA= names the data set containing observations that are added to the base data set.

Requirements:
Only two data sets can be used at a time in one step.
The observations in the base data set are not read.
The variable information in the descriptor portion of the base data set cannot change.

Business Scenario
Emps is a master data set that contains employees hired in 2006 and 2007.

Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007

The employees hired in 2008, 2009, and 2010 need to be appended.

Emps2008
First   Gender  HireYear
Brett   M       2008
Renee   F       2008

Emps2009
First   HireYear
Sara    2009
Dennis  2009

Emps2010
First   HireYear  Country
Rose    2010      Spain
Eric    2010      Spain

10.02 Quiz
How many observations will be in Emps after appending the three data sets?

Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007

Emps2008
First   Gender  HireYear
Brett   M       2008
Renee   F       2008

Emps2009
First   HireYear
Sara    2009
Dennis  2009

Emps2010
First   HireYear  Country
Rose    2010      Spain
Eric    2010      Spain

10.03 Quiz
How many variables will be in Emps after appending the three data sets? (The slide shows the same four data sets as 10.02 Quiz.)

Like-Structured Data Sets

Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007

Emps2008
First   Gender  HireYear
Brett   M       2008
Renee   F       2008

The data sets contain the same variables.

proc append base=Emps
            data=Emps2008;
run;

p110d01

84   proc append base=Emps
85               data=Emps2008;
86   run;

NOTE: Appending WORK.EMPS2008 to WORK.EMPS.
NOTE: There were 2 observations read from the data set WORK.EMPS2008.
NOTE: 2 observations added.
NOTE: The data set WORK.EMPS has 5 observations and 3 variables.

Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Brett   M       2008
Renee   F       2008

Unlike-Structured Data Sets

Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Brett   M       2008
Renee   F       2008

Emps2009
First   HireYear
Sara    2009
Dennis  2009

The BASE= data set has a variable that is not in the DATA= data set.

proc append base=Emps
            data=Emps2009;
run;

p110d01

90   proc append base=Emps
91               data=Emps2009;
92   run;

NOTE: Appending WORK.EMPS2009 to WORK.EMPS.
WARNING: Variable Gender was not found on DATA file.
NOTE: There were 2 observations read from the data set WORK.EMPS2009.
NOTE: 2 observations added.
NOTE: The data set WORK.EMPS has 7 observations and 3 variables.

Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Brett   M       2008
Renee   F       2008
Sara            2009
Dennis          2009

Unlike-Structured Data Sets

Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Brett   M       2008
Renee   F       2008
Sara            2009
Dennis          2009

Emps2010
First   HireYear  Country
Rose    2010      Spain
Eric    2010      Spain

The DATA= data set has a variable that is not in the BASE= data set.

proc append base=Emps
            data=Emps2010;
run;

p110d01

96   proc append base=Emps
97               data=Emps2010;
98   run;

NOTE: Appending WORK.EMPS2010 to WORK.EMPS.
WARNING: Variable Country was not found on BASE file. The
         variable will not be added to the BASE file.
WARNING: Variable Gender was not found on DATA file.
ERROR: No appending done because of anomalies listed above.
       Use FORCE option to append these files.
NOTE: 0 observations added.
NOTE: The data set WORK.EMPS has 7 observations and 3 variables.
NOTE: Statements not processed because of errors noted above.
NOTE: The SAS System stopped processing this step because of
      errors.

Unlike-Structured Data Sets

The FORCE option forces the observations to be appended when the DATA= data set contains variables that are not in the BASE= data set.
General form of the FORCE option:

PROC APPEND BASE = SAS-data-set
            DATA = SAS-data-set FORCE;
RUN;

The FORCE option causes the extra variables to be dropped and issues a warning message.

proc append base=Emps
            data=Emps2010 force;
run;

p110d01

The FORCE option is needed when the DATA= data set contains variables that either
• are not in the BASE= data set
• do not have the same type as the variables in the BASE= data set
• are longer than the variables in the BASE= data set.

If the length of a variable is longer in the DATA= data set than in the BASE= data set, SAS truncates values from the DATA= data set to fit them into the length that is specified in the BASE= data set.

If the type of a variable in the DATA= data set is different than in the BASE= data set, SAS replaces all values for the variable in the DATA= data set with missing values and keeps the variable type of the variable specified in the BASE= data set.
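The truncation case for longer variables can be sketched with two small data sets. The names work.base_demo and work.add_demo and the value 'Alexandria' are hypothetical and introduced only for illustration.

```sas
/* Sketch only: both data sets and their values are made up. */
data work.base_demo;
   length First $ 5;      /* BASE= length is 5  */
   First='Sara';
run;

data work.add_demo;
   length First $ 10;     /* DATA= length is 10 */
   First='Alexandria';
run;

/* FORCE is required because First is longer in the DATA= data set. */
proc append base=work.base_demo
            data=work.add_demo force;
run;

proc print data=work.base_demo;
run;
```

After the step runs, the appended value should be stored as 'Alexa', because First keeps the length of 5 from the BASE= data set, and the log should include a warning about the differing lengths.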

Unlike-Structured Data Sets

100  proc append base=Emps
101              data=Emps2010 force;
102  run;

NOTE: Appending WORK.EMPS2010 to WORK.EMPS.
WARNING: Variable Country was not found on BASE file. The
         variable will not be added to the BASE file.
WARNING: Variable Gender was not found on DATA file.
NOTE: FORCE is specified, so dropping/truncating will occur.
NOTE: There were 2 observations read from the data set WORK.EMPS2010.
NOTE: 2 observations added.
NOTE: The data set WORK.EMPS has 9 observations and 3 variables.

Emps
First   Gender  HireYear
Stacey  F       2006
Gloria  F       2007
James   M       2007
Brett   M       2008
Renee   F       2008
Sara            2009
Dennis          2009
Rose            2010
Eric            2010

Unlike-Structured Data Sets

Situation: The BASE= data set contains a variable that is not in the DATA= data set.
Action: The observations are appended, but the observations from the DATA= data set have a missing value for the variable that was not present in the DATA= data set. The FORCE option is not necessary in this case.

Situation: The DATA= data set contains a variable that is not in the BASE= data set.
Action: Use the FORCE option in the PROC APPEND statement to force the concatenation of the two data sets. The statement drops the extra variable and issues a warning message.

10.04 Quiz
How many observations will be in Emps if the program is submitted a second time?

Submitting this program once appends six observations to the Emps data set, which results in a total of nine observations.

proc append base=Emps
            data=Emps2008;         /* 3 obs + 2 obs = 5 obs */
run;
proc append base=Emps
            data=Emps2009;         /* 5 obs + 2 obs = 7 obs */
run;
proc append base=Emps
            data=Emps2010 force;   /* 7 obs + 2 obs = 9 obs */
run;

Exercises

Level 1

1. Appending Like-Structured Data Sets

a. Retrieve the starter program p110e01.

b. Submit the two PROC CONTENTS steps to compare the variables in the two data sets.

How many variables are in orion.price_current? __________

How many variables are in orion.price_new? __________

Does orion.price_new contain any variables that are not in orion.price_current? __________

c. Add a PROC APPEND step after the PROC CONTENTS steps to append orion.price_new to orion.price_current. The FORCE option is not needed.

Why is the FORCE option not needed? __________

d. Submit the program and confirm that 88 observations from orion.price_new were added to orion.price_current, which should now have 259 observations (171 original observations plus 88 appended observations).

Level 2

2. Appending Unlike-Structured Data Sets

a. Write and submit two PROC CONTENTS steps to compare the variables in orion.qtr1_2007 and orion.qtr2_2007.

How many variables are in orion.qtr1_2007? __________

How many variables are in orion.qtr2_2007? __________

Which variable is not in both data sets? __________

b. Write a PROC APPEND step to append orion.qtr1_2007 to a non-existing data set called Work.ytd.

c. Submit the PROC APPEND step and confirm that 22 observations were copied to Work.ytd.

d. Write another PROC APPEND step to append orion.qtr2_2007 to Work.ytd. The FORCE option is needed.

Why is the FORCE option needed? __________

e. Submit the second PROC APPEND step and confirm that 36 observations from orion.qtr2_2007 were added to Work.ytd, which should now have 58 observations.

Level 3

3. Using the APPEND Statement

a. Write and submit three PROC CONTENTS steps to compare the variables in orion.shoes_eclipse, orion.shoes_tracker, and orion.shoes.

b. Write a PROC DATASETS step with two APPEND statements to append orion.shoes_eclipse and orion.shoes_tracker to orion.shoes.

Documentation on the DATASETS procedure can be found in the SAS Help and Documentation from the Contents tab (SAS Products > Base SAS > Base SAS 9.2 Procedures Guide > Procedures > The DATASETS Procedure).

c. Submit the PROC DATASETS step and confirm that orion.shoes contains 34 observations (10 original observations plus 14 observations from orion.shoes_eclipse and 10 observations from orion.shoes_tracker).

10.3 Concatenating Data Sets

Objectives
Concatenate two or more SAS data sets by using the SET statement in a DATA step.
Change the names of variables by using the RENAME= data set option.
Compare the APPEND procedure to the SET statement. (Self-Study)
Interleave two or more SAS data sets by using the SET and BY statements in a DATA step. (Self-Study)

Appending and Concatenating
Appending and concatenating involve combining SAS data sets, one after the other, into a single SAS data set.

Appending adds the observations in the second data set directly to the end of the original data set.

Concatenating copies all observations from the first data set and then copies all observations from one or more successive data sets into a new data set.

The SET Statement

The SET statement in a DATA step reads observations from one or more SAS data sets.

DATA SAS-data-set;
   SET SAS-data-set1 SAS-data-set2 . . .;
   <additional SAS statements>
RUN;

Any number of data sets can be in the SET statement.
The observations from the first data set in the SET statement appear first in the new data set. The observations from the second data set follow those from the first data set, and so on.

You must know your data. By default, a compile-time error occurs if the same variable is not the same type in all SAS data sets in the SET statement.

Like-Structured Data Sets

Concatenate EmpsDK and EmpsFR to create a new data set named EmpsAll1.

EmpsDK                         EmpsFR
First   Gender  Country        First   Gender  Country
Lars    M       Denmark        Pierre  M       France
Kari    F       Denmark        Sophie  F       France
Jonas   M       Denmark

The data sets contain the same variables.

data EmpsAll1;
   set EmpsDK EmpsFR;
run;

p110d02
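
The concatenation above can be tried without the course data. The following is a minimal sketch that rebuilds EmpsDK and EmpsFR from the values on the slide; the LENGTH values are assumptions, because the notes do not show the variable attributes of the course data sets.

```sas
/* Rebuild the two input data sets from the values shown on the slide. */
/* The lengths below are assumed for illustration only.                */
data EmpsDK;
   length First $ 8 Gender $ 1 Country $ 8;
   input First $ Gender $ Country $;
   datalines;
Lars M Denmark
Kari F Denmark
Jonas M Denmark
;
run;

data EmpsFR;
   length First $ 8 Gender $ 1 Country $ 8;
   input First $ Gender $ Country $;
   datalines;
Pierre M France
Sophie F France
;
run;

/* Concatenate: all EmpsDK observations first, then all EmpsFR observations. */
data EmpsAll1;
   set EmpsDK EmpsFR;
run;

/* List the result to confirm the order of the observations. */
proc print data=EmpsAll1;
run;
```

Reversing the order of the names in the SET statement would place the EmpsFR observations first.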
Compilation

EmpsDK                         EmpsFR
First   Gender  Country        First   Gender  Country
Lars    M       Denmark        Pierre  M       France
Kari    F       Denmark        Sophie  F       France
Jonas   M       Denmark

data EmpsAll1;
   set EmpsDK EmpsFR;
run;

PDV:       First  Gender  Country
EmpsAll1:  First  Gender  Country

Execution

Step  Action                             PDV (First Gender Country)  Observations in EmpsAll1
 1    Initialize PDV                     (all missing)               (none)
 2    Read EmpsDK observation 1          Lars M Denmark              (none)
 3    Implicit OUTPUT; Implicit RETURN   Lars M Denmark              Lars
 4    Read EmpsDK observation 2          Kari F Denmark              Lars
 5    Implicit OUTPUT; Implicit RETURN   Kari F Denmark              Lars, Kari
 6    Read EmpsDK observation 3          Jonas M Denmark             Lars, Kari
 7    Implicit OUTPUT; Implicit RETURN   Jonas M Denmark             Lars, Kari, Jonas
 8    EOF in EmpsDK                      Jonas M Denmark             Lars, Kari, Jonas
 9    Reinitialize PDV                   (all missing)               Lars, Kari, Jonas
10    Read EmpsFR observation 1          Pierre M France             Lars, Kari, Jonas
11    Implicit OUTPUT; Implicit RETURN   Pierre M France             Lars, Kari, Jonas, Pierre
12    Read EmpsFR observation 2          Sophie F France             Lars, Kari, Jonas, Pierre
13    Implicit OUTPUT; Implicit RETURN   Sophie F France             Lars, Kari, Jonas, Pierre, Sophie
14    EOF in EmpsFR                      Sophie F France             Lars, Kari, Jonas, Pierre, Sophie

SAS reinitializes variables in the PDV at the start of every DATA step iteration. Variables created by
assignment statements are reset to missing, but variables that are read with a SET statement are not reset
to missing until the input SAS data set changes.

EmpsAll1
First   Gender  Country
Lars    M       Denmark
Kari    F       Denmark
Jonas   M       Denmark
Pierre  M       France
Sophie  F       France

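
One way to watch the execution steps above is to write the PDV to the log on each iteration. The following is a sketch rather than part of the course program: the IN= data set option and the flag names fromDK and fromFR are additions that this section does not cover, and the input data sets are rebuilt from the slide values.

```sas
/* Rebuild the slide data so that the step runs on its own. */
data EmpsDK;
   input First $ Gender $ Country $;
   datalines;
Lars M Denmark
Kari F Denmark
Jonas M Denmark
;
run;

data EmpsFR;
   input First $ Gender $ Country $;
   datalines;
Pierre M France
Sophie F France
;
run;

data EmpsAll1;
   /* IN= creates temporary 0/1 flags (hypothetical names) that show */
   /* which data set supplied the current observation. They are not  */
   /* written to EmpsAll1.                                           */
   set EmpsDK(in=fromDK) EmpsFR(in=fromFR);
   /* Write the PDV values to the log after SET reads an observation. */
   putlog _n_= fromDK= fromFR= First= Gender= Country=;
run;
```

The log should contain one line for each observation that is read, in the same order as the rows of EmpsAll1.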
Unlike-Structured Data Sets
Concatenate EmpsCN and EmpsJP to create a new data set named EmpsAll2.

EmpsCN                         EmpsJP
First   Gender  Country        First   Gender  Region
Chang   M       China          Cho     F       Japan
Li      M       China          Tomi    M       Japan
Ming    F       China

The data sets do not contain the same variables.

data EmpsAll2;
   set EmpsCN EmpsJP;
run;

p110d03

10.05 Quiz
How many variables will be in EmpsAll2 after concatenating EmpsCN and EmpsJP?

EmpsCN                         EmpsJP
First   Gender  Country        First   Gender  Region
Chang   M       China          Cho     F       Japan
Li      M       China          Tomi    M       Japan
Ming    F       China

data EmpsAll2;
   set EmpsCN EmpsJP;
run;

Compilation

data EmpsAll2;
   set EmpsCN EmpsJP;
run;

SAS reads the descriptor portion of EmpsCN first and adds First, Gender, and Country to the PDV.

PDV:  First  Gender  Country

SAS then reads the descriptor portion of EmpsJP. First and Gender are already in the PDV, so only
Region is added.

PDV:  First  Gender  Country  Region

Final Results

EmpsAll2
First   Gender  Country  Region
Chang   M       China
Li      M       China
Ming    F       China
Cho     F                Japan
Tomi    M                Japan

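
The result above can be reproduced with a short program. This is a sketch that rebuilds EmpsCN and EmpsJP from the slide values; the default character length of 8 from list input is an assumption about the course data.

```sas
/* Rebuild the two unlike-structured data sets from the slide values. */
data EmpsCN;
   input First $ Gender $ Country $;
   datalines;
Chang M China
Li M China
Ming F China
;
run;

data EmpsJP;
   input First $ Gender $ Region $;
   datalines;
Cho F Japan
Tomi M Japan
;
run;

/* Concatenate without renaming: the PDV holds the union of the variables. */
data EmpsAll2;
   set EmpsCN EmpsJP;
run;

/* VARNUM lists the variables in their order in the data set. */
proc contents data=EmpsAll2 varnum;
run;

/* Region is missing for the EmpsCN rows and Country is missing for the EmpsJP rows. */
proc print data=EmpsAll2 noobs;
run;
```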

The RENAME= Data Set Option

The RENAME= data set option changes the name of a variable.
General form of the RENAME= data set option:

SAS-data-set (RENAME = (old-name-1 = new-name-1
                        old-name-2 = new-name-2
                        ...
                        old-name-n = new-name-n))

• The RENAME= option must be specified in parentheses immediately after the appropriate
  SAS data set name.
• If the RENAME= option is associated with an input data set in the SET statement, the
  action applies to the data set that is being read.

SET statement examples:

set EmpsCN(rename=(Country=Region))
    EmpsJP;

set EmpsCN(rename=(First=Fname
                   Country=Region))
    EmpsJP(rename=(First=Fname));

set EmpsCN
    EmpsJP(rename=(Region=Country));


10.06 Quiz
Which statement has correct syntax?

a. set EmpsCN(rename(Country=Location))
       EmpsJP(rename(Region=Location));

b. set EmpsCN(rename=(Country=Location))
       EmpsJP(rename=(Region=Location));

c. set EmpsCN rename=(Country=Location)
       EmpsJP rename=(Region=Location);

Compilation

EmpsCN                         EmpsJP
First   Gender  Country        First   Gender  Region
Chang   M       China          Cho     F       Japan
Li      M       China          Tomi    M       Japan
Ming    F       China

data EmpsAll2;
   set EmpsCN EmpsJP(rename=(Region=Country));
run;

p110d03

Because Region is renamed to Country as EmpsJP is read, both input data sets supply the same
three variables, and the PDV contains only First, Gender, and Country.

PDV:  First  Gender  Country

Final Results

EmpsAll2
First   Gender  Country
Chang   M       China
Li      M       China
Ming    F       China
Cho     F       Japan
Tomi    M       Japan

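
A complete version of the RENAME= concatenation might look like the following sketch. The input data sets are rebuilt from the slide values so that the program is self-contained; the actual course data sets might have different variable lengths.

```sas
/* Rebuild the slide data. */
data EmpsCN;
   input First $ Gender $ Country $;
   datalines;
Chang M China
Li M China
Ming F China
;
run;

data EmpsJP;
   input First $ Gender $ Region $;
   datalines;
Cho F Japan
Tomi M Japan
;
run;

/* Rename Region to Country as EmpsJP is read, so that both */
/* inputs feed the same three PDV variables.                */
data EmpsAll2;
   set EmpsCN
       EmpsJP(rename=(Region=Country));
run;

proc print data=EmpsAll2 noobs;
run;
```

The RENAME= option affects only how EmpsJP is read in this step. The stored data set EmpsJP still has a variable named Region.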

APPEND Procedure versus SET Statement (Self-Study)

• The data set that results from concatenating two data sets with the SET statement is
  the same data set that results from concatenating them with the APPEND procedure if
  the two data sets contain the same variables.
• The APPEND procedure concatenates much faster than the SET statement because the
  APPEND procedure does not process the observations from the BASE= data set.
• The two methods are significantly different when the variables differ between data sets.

Criterion: Number of data sets that you can concatenate
   APPEND Procedure:  Uses two data sets.
   SET Statement:     Uses any number of data sets.

Criterion: Handling of data sets that contain different variables
   APPEND Procedure:  Uses all variables in the BASE= data set and assigns missing values
                      to observations from the DATA= data set where appropriate; cannot
                      include variables found only in the DATA= data set.
   SET Statement:     Uses all variables and assigns missing values where appropriate.
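
The difference in how the two methods handle different variables can be sketched as follows. The data set names AllByAppend and AllForced are hypothetical, and the inputs are rebuilt from the EmpsCN and EmpsJP slide values. The FORCE option of PROC APPEND is not discussed in this section; it is shown here only to illustrate what happens to a variable that exists only in the DATA= data set.

```sas
/* Rebuild the slide data. */
data EmpsCN;
   input First $ Gender $ Country $;
   datalines;
Chang M China
Li M China
Ming F China
;
run;

data EmpsJP;
   input First $ Gender $ Region $;
   datalines;
Cho F Japan
Tomi M Japan
;
run;

/* Work on copies so that EmpsCN itself is not modified. */
data AllByAppend AllForced;
   set EmpsCN;
run;

/* RENAME= aligns the variable names, so the append succeeds as is. */
proc append base=AllByAppend
            data=EmpsJP(rename=(Region=Country));
run;

/* Without the rename, Region exists only in the DATA= data set.     */
/* FORCE lets the append proceed: Region is dropped, and Country is  */
/* missing for the two appended observations.                        */
proc append base=AllForced data=EmpsJP force;
run;

proc print data=AllByAppend noobs;
run;

proc print data=AllForced noobs;
run;
```

In both cases the BASE= data set keeps exactly the variables that it started with, which is the behavior that the comparison above describes.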

10.07 Multiple Choice Poll (Self-Study)

Which method would you use if you wanted to create a new variable at the time of
concatenation?
a. APPEND procedure
b. SET statement

Interleaving (Self-Study)
Interleaving intersperses observations from two or more data sets, based on one or more
common variables.

The SET statement with a BY statement in a DATA step interleaves SAS data sets.

DATA SAS-data-set;
   SET SAS-data-set1 SAS-data-set2 . . .;
   BY <DESCENDING> by-variable(s);
   <additional SAS statements>
RUN;

The data sets must be sorted by the BY variable. Use the SORT procedure to sort the data
sets by the BY variable.

Typically, it is more efficient to sort small SAS data sets and then interleave them as opposed
to concatenating several SAS data sets and then sorting the resultant larger file.
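
The sort-then-interleave pattern can be sketched as follows. The data are rebuilt from the EmpsCN and EmpsJP slide values, and the sorted data set names CNsorted and JPsorted are hypothetical.

```sas
/* Rebuild the slide data. */
data EmpsCN;
   input First $ Gender $ Country $;
   datalines;
Chang M China
Li M China
Ming F China
;
run;

data EmpsJP;
   input First $ Gender $ Region $;
   datalines;
Cho F Japan
Tomi M Japan
;
run;

/* Sort each input by the BY variable before interleaving. */
proc sort data=EmpsCN out=CNsorted;
   by First;
run;

proc sort data=EmpsJP out=JPsorted;
   by First;
run;

/* SET with BY interleaves: observations are written in order of First. */
data EmpsAll2;
   set CNsorted
       JPsorted(rename=(Region=Country));
   by First;
run;

proc print data=EmpsAll2 noobs;
run;
```

The slide data happen to be in order of First already, so the PROC SORT steps do not change the order here. They are included because the DATA step stops with an error if an input data set is not in BY-variable order.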

Interleaving (Self-Study)

EmpsCN                         EmpsJP
First   Gender  Country        First   Gender  Region
Chang   M       China          Cho     F       Japan
Li      M       China          Tomi    M       Japan
Ming    F       China

data EmpsAll2;
   set EmpsCN EmpsJP(rename=(Region=Country));
   by First;
run;

p110d03

Step  Which value comes first?  Action                               PDV (First Gender Country)
1     Chang                     Read from EmpsCN                     Chang M China
2     Cho                       Reinitialize PDV; read from EmpsJP   Cho F Japan
3     Li                        Reinitialize PDV; read from EmpsCN   Li M China
4     Ming                      Read from EmpsCN                     Ming F China
5     Tomi                      EOF in EmpsCN; reinitialize PDV;     Tomi M Japan
                                read from EmpsJP

EmpsAll2
First   Gender  Country   Source data set
Chang   M       China     EmpsCN
Cho     F       Japan     EmpsJP
Li      M       China     EmpsCN
Ming    F       China     EmpsCN
Tomi    M       Japan     EmpsJP

In the case where the data values are equal, the observation is always read first
from the first data set listed in the SET statement.


Exercises

Level 1

4. Concatenating Like-Structured Data Sets

a. Write and submit a DATA step to concatenate orion.mnth7_2007, orion.mnth8_2007,
and orion.mnth9_2007 to create a new data set called Work.thirdqtr.

How many observations in Work.thirdqtr are from orion.mnth7_2007? ____________

How many observations in Work.thirdqtr are from orion.mnth8_2007? ____________

How many observations in Work.thirdqtr are from orion.mnth9_2007? ____________

b. Write and submit a PROC PRINT step to create the following report:

Partial PROC PRINT Output (First 10 of 32 Observations)

Obs  Order_ID    Order_Type  Employee_ID  Customer_ID  Order_Date  Delivery_Date
  1  1242691897           2     99999999           90   02JUL2007      04JUL2007
  2  1242736731           1       121107           10   07JUL2007      07JUL2007
  3  1242773202           3     99999999           24   11JUL2007      14JUL2007
  4  1242782701           3     99999999           27   12JUL2007      17JUL2007
  5  1242827683           1       121105           10   17JUL2007      17JUL2007
  6  1242836878           1       121027           10   18JUL2007      18JUL2007
  7  1242838815           1       120195           41   19JUL2007      19JUL2007
  8  1242848557           2     99999999         2806   19JUL2007      23JUL2007
  9  1242923327           3     99999999        70165   28JUL2007      29JUL2007
 10  1242938120           1       120124          171   30JUL2007      30JUL2007

Level 2

5. Concatenating Unlike-Structured Data Sets


a. Retrieve the starter program p110e05.
b. Submit the two PROC CONTENTS steps to compare the variables in the two data sets.
What are the names of the two variables that are different in the two data sets?
orion.sales orion.nonsales

c. Add a DATA step after the PROC CONTENTS steps to concatenate orion.sales and
orion.nonsales to create a new data set called Work.allemployees.

Use a RENAME= data set option to change the names of the different variables in
orion.nonsales.

Include only the following five variables: Employee_ID, First_Name, Last_Name,
Job_Title, and Salary.

d. Add a PROC PRINT step to create the following report:

Partial PROC PRINT Output (First 10 of 400 Observations)

Obs  Employee_ID  First_Name  Last_Name   Salary  Job_Title
  1       120102  Tom         Zhou        108255  Sales Manager
  2       120103  Wilson      Dawes        87975  Sales Manager
  3       120121  Irenie      Elvish       26600  Sales Rep. II
  4       120122  Christina   Ngan         27475  Sales Rep. II
  5       120123  Kimiko      Hotstone     26190  Sales Rep. I
  6       120124  Lucian      Daymond      26480  Sales Rep. I
  7       120125  Fong        Hofmeister   32040  Sales Rep. IV
  8       120126  Satyakam    Denny        26780  Sales Rep. II
  9       120127  Sharryn     Clarkson     28100  Sales Rep. II
 10       120128  Monica      Kletschkus   30890  Sales Rep. IV

Level 3

6. Interleaving Data Sets

Interleaving data sets is mentioned at the end of this section in a self-study section.
Further documentation can be found in the SAS Help and Documentation from the
Index tab by typing interleaving data sets.

a. Retrieve the starter program p110e06.
b. Add a PROC SORT step after the PROC SORT step in the starter program. The PROC SORT
step needs to sort orion.shoes_tracker by Product_Name to create a new data set
called Work.trackersort.

Documentation on the SORT procedure can be found in the SAS Help
and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The SORT Procedure).

c. Add a DATA step after the two PROC SORT steps to interleave the two sorted data sets
by Product_Name to create a new data set called Work.e_t_shoes.

Include only the following three variables: Product_Group, Product_Name, and
Supplier_ID.

d. Add a PROC PRINT step to create the following report:

Partial PROC PRINT Output (First 10 of 24 Observations)

Obs  Product_Group  Product_Name                                  Supplier_ID
  1  Eclipse Shoes  Atmosphere Imara Women's Running Shoes               1303
  2  Eclipse Shoes  Atmosphere Shatter Mid Shoes                         1303
  3  Eclipse Shoes  Big Guy Men's Air Deschutz Viii Shoes                1303
  4  Eclipse Shoes  Big Guy Men's Air Terra Reach Shoes                  1303
  5  Eclipse Shoes  Big Guy Men's Air Terra Sebec Shoes                  1303
  6  Eclipse Shoes  Big Guy Men's International Triax Shoes             1303
  7  Eclipse Shoes  Big Guy Men's Multicourt Ii Shoes                    1303
  8  Eclipse Shoes  Cnv Plus Men's Off Court Tennis                      1303
  9  Tracker Shoes  Hardcore Junior/Women's Street Shoes Large          14682
 10  Tracker Shoes  Hardcore Men's Street Shoes Large                   14682

The order of the observations will be different for z/OS (OS/390).

10.4 Merging Data Sets One-to-One

Objectives
• Define the different types of match-merging.
• Prepare data sets for merging using the SORT procedure.
• Merge SAS data sets one-to-one based on a common variable by using the MERGE and BY
  statements in a DATA step.
• Eliminate duplicate observations using the SORT procedure. (Self-Study)

Merging

Merging involves combining observations from two or more SAS data sets into a single
observation in a new SAS data set.

Observations can be merged based on their positions in the original data sets or merged
by one or more common variables.

Match-Merging
Match-merging combines observations from two or more SAS data sets into a single
observation in a new data set based on the values of one or more common variables.

(Diagram: three pairs of data sets, one with variables A, B, and C and the other with
variables C, D, and E, matched on the common variable C.)

One-to-One
A single observation in one data set is related to one and only one observation from
another data set based on the values of one or more selected variables.

One-to-Many or Many-to-One
A single observation in one data set is related to more than one observation from
another data set based on the values of one or more selected variables and vice versa.

Nonmatches
At least one single observation in one data set is unrelated to any observation from
another data set based on the values of one or more selected variables.

Match-Merging
In order to perform match-merging, the observations in each data set must be sorted by
the one or more common variables that are being matched.

The SORT Procedure
General form of the SORT procedure:

PROC SORT DATA=input-SAS-data-set
          <OUT=output-SAS-data-set>;
   BY <DESCENDING> by-variable(s);
RUN;

The SORT procedure orders SAS data set observations by the values of one or more variables.

The SORT procedure
• rearranges the observations in a SAS data set
• either replaces the original data set or creates a new data set
• can sort on multiple variables
• can sort in ascending (default) or descending order
• does not generate printed output.
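
Because the SORT procedure does not generate printed output, a PROC PRINT step is a convenient way to check the result. The following sketch rebuilds a small EmpsAU data set from values that appear later in this chapter; the output data set name work.EmpsAUsorted is hypothetical.

```sas
/* Small input data set, rebuilt from the EmpsAU values in this chapter. */
data work.EmpsAU;
   input First $ Gender $ EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

/* OUT= names the data set that receives the sorted observations. */
proc sort data=work.EmpsAU
          out=work.EmpsAUsorted;
   by First;
run;

/* PROC SORT prints nothing, so list the sorted data set explicitly. */
proc print data=work.EmpsAUsorted;
run;
```

The report should list Birin, Kylie, and Togar in that order, because the default sort order is ascending.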

10.08 Quiz
Which step is sorting the observations in a SAS data set and overwriting the same
SAS data set?

a. proc sort data=work.EmpsAU
             out=work.sorted;
      by First;
   run;

b. proc sort data=work.EmpsAU
             out=orion.EmpsAU;
      by First;
   run;

c. proc sort data=work.EmpsAU;
      by First;
   run;

The BY Statement
The BY statement specifies the sorting variables.
• PROC SORT first arranges the data set by the values in ascending order, by default,
  of the first BY variable.
• PROC SORT then arranges any observations that have the same value of the first BY
  variable by the values of the second BY variable in ascending order.
• This sorting continues for every specified BY variable.

The DESCENDING option reverses the sort order for the variable that immediately follows
in the statement so that observations are sorted from the largest value to the smallest value.

BY statement examples:

by Last First;

by descending Last First;

by Last descending First;

by descending Last descending First;

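
The effect of placing DESCENDING before only one BY variable can be sketched with a small, made-up data set. The data set work.names and its values are hypothetical and are not part of the course data.

```sas
/* Hypothetical data: two last names, each with two first names. */
data work.names;
   input Last $ First $;
   datalines;
Smith Ann
Jones Zoe
Smith Bob
Jones Al
;
run;

/* Last is sorted in ascending order (the default).                */
/* DESCENDING applies only to First, the variable that follows it. */
proc sort data=work.names out=work.names_sorted;
   by Last descending First;
run;

proc print data=work.names_sorted noobs;
run;
```

The sorted order is Jones Zoe, Jones Al, Smith Bob, Smith Ann: last names ascend, and first names descend within each last name.
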
Setup for the Poll
• Retrieve program p110a01.
• Add a BY statement to the PROC SORT step to sort the observations first by ascending
  Gender and then by descending Employee_ID within the values of Gender.
• Complete the PROC PRINT statement to reference the sorted data set.
• Submit the program and confirm the sort order in the PROC PRINT output.

10.09 Multiple Choice Poll


What is the Employee_ID value for the first observation in the sorted data set?
a. 120102
b. 120121
c. 121144
d. 121145

The MERGE and BY Statements
The MERGE statement in a DATA step joins observations from two or more SAS data sets
into single observations.

DATA SAS-data-set;
   MERGE SAS-data-set1 SAS-data-set2 . . .;
   BY <DESCENDING> by-variable(s);
   <additional SAS statements>
RUN;

A BY statement after the MERGE statement performs a match-merge.

Requirements when two or more SAS data sets are specified in the MERGE statement:
• The variables in the BY statement must be common to all data sets.
• The data sets that are listed in the MERGE statement must be sorted in the order of
  the values of the variables that are listed in the BY statement.

One-to-One Merge
Merge EmpsAU and PhoneH by EmpID to create a new data set named EmpsAUH.

EmpsAU                         PhoneH
First   Gender  EmpID          EmpID   Phone
Togar   M       121150         121150  +61(2)5555-1793
Kylie   F       121151         121151  +61(2)5555-1849
Birin   M       121152         121152  +61(2)5555-1665

The data sets are sorted by EmpID.

data EmpsAUH;
   merge EmpsAU PhoneH;
   by EmpID;
run;

p110d05

Final Results

EmpsAUH
First   Gender  EmpID   Phone
Togar   M       121150  +61(2)5555-1793
Kylie   F       121151  +61(2)5555-1849
Birin   M       121152  +61(2)5555-1665

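
The one-to-one merge can be run end to end with the following sketch. The data sets are rebuilt from the slide values; treating EmpID as numeric and reading Phone with a length of 15 are assumptions about the course data.

```sas
/* Rebuild the slide data. EmpID is assumed to be numeric. */
data EmpsAU;
   input First $ Gender $ EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

/* The :$15. informat reads Phone as a character value of up to 15 characters. */
data PhoneH;
   input EmpID Phone :$15.;
   datalines;
121150 +61(2)5555-1793
121151 +61(2)5555-1849
121152 +61(2)5555-1665
;
run;

/* Both inputs must be in EmpID order before the match-merge. */
proc sort data=EmpsAU;
   by EmpID;
run;

proc sort data=PhoneH;
   by EmpID;
run;

/* Match-merge: one output observation for each EmpID value. */
data EmpsAUH;
   merge EmpsAU PhoneH;
   by EmpID;
run;

proc print data=EmpsAUH noobs;
run;
```
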
10.10 Quiz
• Retrieve program p110a02.
• Complete the program to match-merge the sorted SAS data sets referenced in the
  PROC SORT steps.
• Submit the program. Correct and resubmit, if necessary.

What are the modified, completed statements?


Eliminating Duplicates with the SORT Procedure (Self-Study)

The SORT procedure can be used to eliminate duplicate observations.

PROC SORT Statement Options:
• The NODUPKEY option deletes observations with duplicate BY values.
• The EQUALS option maintains the relative order of the observations within the input
  data set in the output data set for observations with identical BY values.

proc sort data=EmpsDUP
          out=EmpsDUP1 nodupkey equals;
   by EmpID;
run;

EmpsDUP                        EmpsDUP1
First   Gender  EmpID          First   Gender  EmpID
Matt    M       121160         Matt    M       121160
Julie   F       121161         Julie   F       121161
Brett   M       121162         Brett   M       121162
Julie   F       121161         Julie   F       121163
Chris   F       121161
Julie   F       121163

p110d04
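
To see which observations NODUPKEY removes, the deleted observations can be routed to a second data set. The following sketch rebuilds EmpsDUP from the slide values and adds the DUPOUT= option of PROC SORT, which these notes do not cover; the data set name EmpsRemoved is hypothetical.

```sas
/* Rebuild EmpsDUP from the slide values. */
data EmpsDUP;
   input First $ Gender $ EmpID;
   datalines;
Matt M 121160
Julie F 121161
Brett M 121162
Julie F 121161
Chris F 121161
Julie F 121163
;
run;

/* NODUPKEY keeps the first observation for each EmpID value.       */
/* EQUALS preserves the input order within each EmpID value.        */
/* DUPOUT= (not covered in these notes) stores the deleted         */
/* observations in a separate data set.                             */
proc sort data=EmpsDUP
          out=EmpsDUP1
          dupout=EmpsRemoved
          nodupkey equals;
   by EmpID;
run;

proc print data=EmpsDUP1 noobs;
run;

proc print data=EmpsRemoved noobs;
run;
```

EmpsDUP1 should match the table above. EmpsRemoved should hold the two observations for EmpID 121161 that were deleted.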
10.5 Merging Data Sets One-to-Many 10-53

10.5 Merging Data Sets One-to-Many

Objectives
• Merge SAS data sets one-to-many based on a common variable by using the MERGE and BY
  statements in a DATA step.

One-to-Many Merge

Merge EmpsAU and PhoneHW by EmpID to create a new data set named EmpsAUHW.

EmpsAU                         PhoneHW
First   Gender  EmpID          EmpID   Type  Phone
Togar   M       121150         121150  Home  +61(2)5555-1793
Kylie   F       121151         121150  Work  +61(2)5555-1794
Birin   M       121152         121151  Home  +61(2)5555-1849
                               121151  Work  +61(2)5555-1850
                               121152  Home  +61(2)5555-1665
                               121152  Work  +61(2)5555-1666

The data sets are sorted by EmpID.

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

p110d06
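
The one-to-many merge can be tried with the following sketch, which rebuilds both data sets from the slide values. As before, the numeric EmpID and the length of Phone are assumptions about the course data.

```sas
/* Rebuild the slide data. */
data EmpsAU;
   input First $ Gender $ EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

data PhoneHW;
   input EmpID Type $ Phone :$15.;
   datalines;
121150 Home +61(2)5555-1793
121150 Work +61(2)5555-1794
121151 Home +61(2)5555-1849
121151 Work +61(2)5555-1850
121152 Home +61(2)5555-1665
121152 Work +61(2)5555-1666
;
run;

/* Each EmpsAU observation is combined with every PhoneHW */
/* observation that has the same EmpID value.             */
data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

proc print data=EmpsAUHW noobs;
run;
```

The result has six observations, one for each phone number, with First and Gender repeated for both observations of an employee.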

Execution

data EmpsAUHW;
   merge EmpsAU PhoneHW;
   by EmpID;
run;

 1. Initialize PDV.
    PDV (First Gender EmpID Type Phone): all missing
 2. Do the EmpIDs match? Yes
 3. Reads one observation from each matching data set.
    PDV: Togar M 121150 Home +61(2)5555-1793
 4. Implicit OUTPUT; Implicit RETURN.
    PDV: Togar M 121150 Home +61(2)5555-1793
 5. Do the EmpIDs match? No
 6. Is either EmpID the same as the EmpID currently in the PDV? Yes
 7. Reads the observation from the appropriate data set.
    PDV: Togar M 121150 Work +61(2)5555-1794
 8. Implicit OUTPUT; Implicit RETURN.
    PDV: Togar M 121150 Work +61(2)5555-1794
 9. Do the EmpIDs match? Yes
10. Is the EmpID the same as the EmpID currently in the PDV? No
    PDV: Togar M 121150 Work +61(2)5555-1794

SAS reinitializes variables in the PDV at the start of every DATA step iteration. Variables created by
an assignment statement are reset to missing, but variables that are read with a MERGE statement are
not reset to missing.

Before reading additional observations during a match-merge, SAS first determines whether there are
observations remaining for the current BY group.
• If there are observations remaining for the current BY group, they are read into the PDV, processed,
  and written to the output data set.
• If there are no more observations for the current BY group, SAS reinitializes the remainder of the
  PDV, identifies the next BY group, and reads the corresponding observations.

Execution
EmpsAU
t r i S
PhoneHW
EmpID Type Phone

o dis SA
First Gender EmpID 121150 Home +61(2)5555-1793

P r Togar
Kylie
M
F
121150

f
121151
121150
121151
Work
Home
+61(2)5555-1794
+61(2)5555-1849

S
Birin

f o r M

o
121152 121151
121152
Work
Home
+61(2)5555-1850
+61(2)5555-1665

e
121152 Work +61(2)5555-1666

A
S No tsit d
data EmpsAUHW;
merge EmpsAU PhoneHW;
by EmpID;

ou
run; Reinitialize PDV

PDV
First Gender EmpID Type Phone
.
135 ...
10-60 Chapter 10 Combining SAS Data Sets

Execution PhoneHW
EmpsAU EmpID Type Phone
First Gender EmpID 121150 Home +61(2)5555-1793
Togar M 121150 121150 Work +61(2)5555-1794
Kylie F 121151 121151 Home +61(2)5555-1849
Birin M 121152 121151 Work +61(2)5555-1850
121152 Home +61(2)5555-1665

data EmpsAUHW;
121152 Work +61(2)5555-1666

ia l
r
Reads one observation
merge EmpsAU PhoneHW; from each matching
by EmpID;
run;

PDV
a t e data set

Kylie F

y M
First Gender EmpID Type

n
Phone
121151 Home +61(2)5555-1849
136

a r t i o ...

i e t b u
p r
Execution
EmpsAU
t r i S
PhoneHW
EmpID Type Phone

o dis SA
First Gender EmpID 121150 Home +61(2)5555-1793

P r Togar
Kylie
M
F
121150

f
121151
121150
121151
Work
Home
+61(2)5555-1794
+61(2)5555-1849

S f o r
Birin M

o
121152 121151
121152
Work
Home
+61(2)5555-1850
+61(2)5555-1665

e
121152 Work +61(2)5555-1666

A
S No tsit d
data EmpsAUHW;
merge EmpsAU PhoneHW;
by EmpID;

ou
run;
Implicit OUTPUT;
Implicit RETURN;
PDV
First Gender EmpID Type Phone
Kylie F 121151 Home +61(2)5555-1849
137 ...
10.5 Merging Data Sets One-to-Many 10-61

Execution PhoneHW
EmpsAU EmpID Type Phone
First Gender EmpID 121150 Home +61(2)5555-1793
Togar M 121150 121150 Work +61(2)5555-1794
Kylie F 121151 121151 Home +61(2)5555-1849
Birin M 121152 121151 Work +61(2)5555-1850
121152 Home +61(2)5555-1665

data EmpsAUHW;
121152 Work +61(2)5555-1666

ia l
r
Do the EmpIDs match?
merge EmpsAU PhoneHW;
by EmpID;
run;

PDV
a t e No

Kylie F

y M
First Gender EmpID Type

n
Phone
121151 Home +61(2)5555-1849
138

a r t i o ...

i e t b u
p r
Execution
EmpsAU
t r i S
PhoneHW
EmpID Type Phone

o dis SA
First Gender EmpID 121150 Home +61(2)5555-1793

P r Togar
Kylie
M
F
121150

f
121151
121150
121151
Work
Home
+61(2)5555-1794
+61(2)5555-1849

S
Birin

f o r M

o
121152 121151
121152
Work
Home
+61(2)5555-1850
+61(2)5555-1665

e
121152 Work +61(2)5555-1666

A
S No tsit d
data EmpsAUHW;
merge EmpsAU PhoneHW;
by EmpID;
Is either EmpID the
same as the EmpID
currently in the PDV?

ou
run;
Yes
PDV
First Gender EmpID Type Phone
Kylie F 121151 Home +61(2)5555-1849
139 ...
10-62 Chapter 10 Combining SAS Data Sets

Execution PhoneHW
EmpsAU EmpID Type Phone
First Gender EmpID 121150 Home +61(2)5555-1793
Togar M 121150 121150 Work +61(2)5555-1794
Kylie F 121151 121151 Home +61(2)5555-1849
Birin M 121152 121151 Work +61(2)5555-1850
121152 Home +61(2)5555-1665

data EmpsAUHW;
121152 Work +61(2)5555-1666

ia l
r
Reads the observation
merge EmpsAU PhoneHW; from the appropriate
by EmpID;
run;

PDV
a t e data set

Kylie F

y M
First Gender EmpID Type

n
Phone
121151 Work +61(2)5555-1850
140

a r t i o ...

i e t b u
p r
Execution
EmpsAU
t r i S
PhoneHW
EmpID Type Phone

o dis SA
First Gender EmpID 121150 Home +61(2)5555-1793

P r Togar
Kylie
M
F
121150

f
121151
121150
121151
Work
Home
+61(2)5555-1794
+61(2)5555-1849

S f o r
Birin M

o
121152 121151
121152
Work
Home
+61(2)5555-1850
+61(2)5555-1665

e
121152 Work +61(2)5555-1666

A
S No tsit d
data EmpsAUHW;
merge EmpsAU PhoneHW;
by EmpID;

ou
run;
Implicit OUTPUT;
Implicit RETURN;
PDV
First Gender EmpID Type Phone
Kylie F 121151 Work +61(2)5555-1850
141 ...
10.5 Merging Data Sets One-to-Many 10-63

Execution PhoneHW
EmpsAU EmpID Type Phone
First Gender EmpID 121150 Home +61(2)5555-1793
Togar M 121150 121150 Work +61(2)5555-1794
Kylie F 121151 121151 Home +61(2)5555-1849
Birin M 121152 121151 Work +61(2)5555-1850
121152 Home +61(2)5555-1665

data EmpsAUHW;
121152 Work +61(2)5555-1666

ia l
merge EmpsAU PhoneHW;
by EmpID;
run;
t e r
Continue until EOF
on both data sets

PDV

M a
n
First Gender EmpID Type Phone

y
Kylie F 121151 Work +61(2)5555-1850
142

a r t i o
i e t b u
p r
Final Results
EmpsAUHW
t r i S
r o dis SA
First
Togar
Gender
M
EmpID
121150
Type
Home
Phone
+61(2)5555-1793

S P Togar

r
Kylie

o
M
F

o f
121150
121151
Work
Home
+61(2)5555-1794
+61(2)5555-1849

f
Kylie F 121151 Work +61(2)5555-1850

A
S No tsit
Birin
Birin
M
M

d e 121152
121152
Home
Work
+61(2)5555-1665
+61(2)5555-1666

143
ou
10-64 Chapter 10 Combining SAS Data Sets

Exercises

Level 1

7. Merging orion.orders and orion.order_item in a One-to-Many Merge

a. Retrieve the starter program p110e07.

b. Submit the two PROC CONTENTS steps to determine the common variable among the
two data sets.

c. Add a DATA step after the two PROC CONTENTS steps and prior to the PROC PRINT step to
merge orion.orders and orion.order_item by the common variable to create a new
data set called Work.allorders.

d. Submit the program and confirm that Work.allorders was created with 732 observations
and 12 variables.
b u
Level 2

8. Merging orion.product_level and orion.product_list in a One-to-Many Merge

a. Write a PROC SORT step to sort orion.product_list by Product_Level to create
a new data set called Work.product_list.

b. Write a DATA step to merge orion.product_level with the previously sorted data set
by the appropriate common variable. Create a new data set called Work.listlevel.

c. Write a PROC PRINT step with a VAR statement to create the following report:

Partial PROC PRINT Output (First 10 of 556 Observations)

                                                               Product_   Product_Level_
Obs    Product_ID     Product_Name                              Level      Name

  1    210200100009   Kids Sweat Round Neck,Large Logo            1        Product
  2    210200100017   Sweatshirt Children's O-Neck                1        Product
  3    210200200022   Sunfit Slow Swimming Trunks                 1        Product
  4    210200200023   Sunfit Stockton Swimming Trunks Jr.         1        Product
  5    210200300006   Fleece Cuff Pant Kid'S                      1        Product
  6    210200300007   Hsc Dutch Player Shirt Junior               1        Product
  7    210200300052   Tony's Cut & Sew T-Shirt                    1        Product
  8    210200400020   Kids Baby Edge Max Shoes                    1        Product
  9    210200400070   Tony's Children's Deschutz (Bg) Shoes       1        Product
 10    210200500002   Children's Mitten                           1        Product

Level 3

9. Joining orion.product_level and orion.product_list in a One-to-Many Merge

a. Write a PROC SQL step to perform an inner join of orion.product_level and
orion.product_list by Product_Level to create a new data set called
Work.listlevelsql. The new data set should include only Product_ID,
Product_Name, Product_Level, and Product_Level_Name.

Documentation on the SQL procedure can be found in the SAS Help
and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The SQL Procedure).

b. Write a PROC PRINT step to create the following report:

Partial PROC PRINT Output (First 10 of 556 Observations)

                                                              Product_
Obs    Product_ID     Product_Name                             Level     Product_Level_Name

  1    210000000000   Children                                   4       Product Line
  2    210100000000   Children Outdoors                          3       Product Category
  3    210100100000   Outdoor things, Kids                       2       Product Group
  4    210200000000   Children Sports                            3       Product Category
  5    210200100000   A-Team, Kids                               2       Product Group
  6    210200100009   Kids Sweat Round Neck,Large Logo           1       Product
  7    210200100017   Sweatshirt Children's O-Neck               1       Product
  8    210200200000   Bathing Suits, Kids                        2       Product Group
  9    210200200022   Sunfit Slow Swimming Trunks                1       Product
 10    210200200023   Sunfit Stockton Swimming Trunks Jr.        1       Product

10.6 Merging Data Sets with Nonmatches

Objectives
• Control the observations in the output data set by using the IN= option.
• Output observations to multiple data sets using the IN= option and the OUTPUT statement.
  (Self-Study)
• Compare the results of a many-to-many merge based on using the DATA step or the SQL
  procedure. (Self-Study)

Nonmatches Merge

Merge EmpsAU and PhoneC by EmpID to create a new data set named EmpsAUC.

EmpsAU                        PhoneC
First   Gender   EmpID        EmpID    Phone
Togar   M        121150       121150   +61(2)5555-1795
Kylie   F        121151       121152   +61(2)5555-1667
Birin   M        121152       121153   +61(2)5555-1348

The data sets are sorted by EmpID.

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;
                                                                                     p110d07

Execution

EmpsAU                        PhoneC
First   Gender   EmpID        EmpID    Phone
Togar   M        121150       121150   +61(2)5555-1795
Kylie   F        121151       121152   +61(2)5555-1667
Birin   M        121152       121153   +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

Each step below shows the question that SAS asks or the action that it takes, followed by the
contents of the PDV.

 1. Initialize PDV.
    PDV: First=  Gender=  EmpID=.  Phone=

 2. Do the EmpIDs match? Yes. (The first EmpID is 121150 in both data sets.)
    PDV: First=  Gender=  EmpID=.  Phone=

 3. Reads one observation from each matching data set.
    PDV: First=Togar Gender=M EmpID=121150 Phone=+61(2)5555-1795

 4. Implicit OUTPUT; Implicit RETURN;
    PDV: First=Togar Gender=M EmpID=121150 Phone=+61(2)5555-1795

 5. Do the EmpIDs match? No. (The next EmpID is 121151 in EmpsAU and 121152 in PhoneC.)
    PDV: First=Togar Gender=M EmpID=121150 Phone=+61(2)5555-1795

 6. Is either EmpID the same as the EmpID currently in the PDV? No.
    PDV: First=Togar Gender=M EmpID=121150 Phone=+61(2)5555-1795

 7. Reinitialize PDV.
    PDV: First=  Gender=  EmpID=.  Phone=

 8. Which EmpID sequentially comes first? 121151
    PDV: First=  Gender=  EmpID=.  Phone=

 9. Reads the observation from the EmpID that sequentially comes first.
    PDV: First=Kylie Gender=F EmpID=121151 Phone=

10. Implicit OUTPUT; Implicit RETURN;
    PDV: First=Kylie Gender=F EmpID=121151 Phone=

11. Do the EmpIDs match? Yes. (The next EmpID is 121152 in both data sets.)
    PDV: First=Kylie Gender=F EmpID=121151 Phone=

12. Is either EmpID the same as the EmpID currently in the PDV? No.
    PDV: First=Kylie Gender=F EmpID=121151 Phone=

13. Reinitialize PDV.
    PDV: First=  Gender=  EmpID=.  Phone=

14. Reads one observation from each matching data set.
    PDV: First=Birin Gender=M EmpID=121152 Phone=+61(2)5555-1667

15. Implicit OUTPUT; Implicit RETURN;
    PDV: First=Birin Gender=M EmpID=121152 Phone=+61(2)5555-1667

16. EmpsAU is at EOF. Is the EmpID the same as the EmpID currently in the PDV? No.
    (The next EmpID in PhoneC is 121153.)
    PDV: First=Birin Gender=M EmpID=121152 Phone=+61(2)5555-1667

17. Reinitialize PDV.
    PDV: First=  Gender=  EmpID=.  Phone=

18. Reads the observation from the appropriate data set.
    PDV: First=  Gender=  EmpID=121153 Phone=+61(2)5555-1348

19. Implicit OUTPUT; Implicit RETURN;
    PDV: First=  Gender=  EmpID=121153 Phone=+61(2)5555-1348

20. Both data sets are at EOF.
    PDV: First=  Gender=  EmpID=121153 Phone=+61(2)5555-1348

Final Results

EmpsAUC
First   Gender   EmpID    Phone
Togar   M        121150   +61(2)5555-1795
Kylie   F        121151
Birin   M        121152   +61(2)5555-1667
                 121153   +61(2)5555-1348

The final results include matches and nonmatches.

Matches are observations that contain data from both input data sets.

Nonmatches are observations that contain data from only one input data set.
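To reproduce the final results above, the two input data sets can be created with DATALINES. This is a minimal sketch. The informat widths are assumptions, and the course's own program (p110d07) might create the data differently.

```sas
/* Input data from the slides. The informat widths are assumed. */
data EmpsAU;
   input First :$8. Gender :$1. EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

data PhoneC;
   input EmpID Phone :$15.;
   datalines;
121150 +61(2)5555-1795
121152 +61(2)5555-1667
121153 +61(2)5555-1348
;
run;

/* Both data sets are already in EmpID order, so no PROC SORT is needed. */
data EmpsAUC;
   merge EmpsAU PhoneC;
   by EmpID;
run;

proc print data=EmpsAUC noobs;
run;
```

The report should list four observations: two matches (121150 and 121152) and two nonmatches (121151 with a blank Phone, and 121153 with blank First and Gender).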

10.11 Quiz

How many observations in the final data set EmpsAUC are considered nonmatches?

a. 1
b. 2
c. 3
d. 4

EmpsAUC
First   Gender   EmpID    Phone
Togar   M        121150   +61(2)5555-1795
Kylie   F        121151
Birin   M        121152   +61(2)5555-1667
                 121153   +61(2)5555-1348

The IN= Data Set Option


The IN= data set option creates a variable that indicates whether the data set contributed data
to the current observation.

General form of the IN= data set option:

   SAS-data-set (IN = variable)

variable is a temporary numeric variable that has two possible values:

   0   indicates that the data set did not contribute to the current observation.
   1   indicates that the data set did contribute to the current observation.

The variable created with the IN= data set option is temporary. Therefore, the variable
is only available during the execution phase and is not written to the SAS data set.

MERGE statement examples:

   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);

   merge EmpsAU(in=E)
         PhoneC(in=P);

   merge EmpsAU(in=AU)
         PhoneC;

Execution

EmpsAU                        PhoneC
First   Gender   EmpID        EmpID    Phone
Togar   M        121150       121150   +61(2)5555-1795
Kylie   F        121151       121152   +61(2)5555-1667
Birin   M        121152       121153   +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
run;
                                                                                     p110d07

PDV after each of the first three iterations:

First   Gender   EmpID    D Emps   Phone             D Cell
Togar   M        121150   1        +61(2)5555-1795   1
Kylie   F        121151   1                          0
Birin   M        121152   1        +61(2)5555-1667   1

10.12 Quiz

What are the values of Emps and Cell?

EmpsAU                        PhoneC
First   Gender   EmpID        EmpID    Phone
Togar   M        121150       121150   +61(2)5555-1795
Kylie   F        121151       121152   +61(2)5555-1667
Birin   M        121152       121153   +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
run;

PDV
First   Gender   EmpID    D Emps   Phone             D Cell
                 121153            +61(2)5555-1348

PDV Results

PDV
First   Gender   EmpID    D Emps   Phone             D Cell
Togar   M        121150   1        +61(2)5555-1795   1
Kylie   F        121151   1                          0
Birin   M        121152   1        +61(2)5555-1667   1
                 121153   0        +61(2)5555-1348   1

The variables created with the IN= data set option are only available during execution and
are not written to the SAS data set.

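Because the IN= variables are not written to the output data set, one way to inspect them is to copy each into an ordinary variable. The sketch below does this. The names EmpsAUC_Flags, InEmps, and InCell are hypothetical, and the informat widths are assumed.

```sas
/* Input data from the slides. The informat widths are assumed. */
data EmpsAU;
   input First :$8. Gender :$1. EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

data PhoneC;
   input EmpID Phone :$15.;
   datalines;
121150 +61(2)5555-1795
121152 +61(2)5555-1667
121153 +61(2)5555-1348
;
run;

data EmpsAUC_Flags;   /* hypothetical data set name */
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
   /* Emps and Cell are temporary. Copying them into ordinary
      variables keeps their values in the output data set. */
   InEmps = Emps;
   InCell = Cell;
run;

proc print data=EmpsAUC_Flags noobs;
run;
```

The InEmps and InCell columns in the report should agree with the Emps and Cell values in the PDV Results table.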
10.13 Quiz

Which subsetting IF statement can be added to the DATA step to only output the matches?

a. if Emps=1 and Cell=0;
b. if Emps=1 and Cell=1;
c. if Emps=1;
d. if Cell=0;

PDV
First   Gender   EmpID    D Emps   Phone             D Cell
Togar   M        121150   1        +61(2)5555-1795   1
Kylie   F        121151   1                          0
Birin   M        121152   1        +61(2)5555-1667   1
                 121153   0        +61(2)5555-1348   1

Matches Only

data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
   if Emps=1 and Cell=1;
run;
                                                                                     p110d07

EmpsAUC
First   Gender   EmpID    Phone
Togar   M        121150   +61(2)5555-1795
Birin   M        121152   +61(2)5555-1667

The subsetting IF controls which observations are further processed by the DATA step. In this example,
the only processing that remains is the implied output at the bottom of the DATA step. Therefore, if the
condition evaluates to true, the observation is written to the SAS data set. If the condition evaluates
to false, the observation is not written to the SAS data set.

This subsetting IF statement can be rewritten as follows:

   if Emps and Cell;

Nonmatches from EmpsAU Only

data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
   if Emps=1 and Cell=0;
run;
                                                                                     p110d07

EmpsAUC
First   Gender   EmpID    Phone
Kylie   F        121151

This subsetting IF statement can be rewritten as follows:

   if Emps and not Cell;

Nonmatches from PhoneC Only

data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
   if Emps=0 and Cell=1;
run;
                                                                                     p110d07

EmpsAUC
First   Gender   EmpID    Phone
                 121153   +61(2)5555-1348

The subsetting IF statement can be rewritten as follows:

   if not Emps and Cell;

All Nonmatches

data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
   if Emps=0 or Cell=0;
run;
                                                                                     p110d07

EmpsAUC
First   Gender   EmpID    Phone
Kylie   F        121151
                 121153   +61(2)5555-1348

The subsetting IF statement can be rewritten as follows:

   if not Emps or not Cell;

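The all-nonmatches condition is the logical complement of the matches-only condition, so it can also be written as the negation of `Emps and Cell`. The sketch below checks that both forms select the same observations. The data set names NonA and NonB and the informat widths are assumptions made for illustration.

```sas
/* Input data from the slides. The informat widths are assumed. */
data EmpsAU;
   input First :$8. Gender :$1. EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

data PhoneC;
   input EmpID Phone :$15.;
   datalines;
121150 +61(2)5555-1795
121152 +61(2)5555-1667
121153 +61(2)5555-1348
;
run;

/* Form used on the slide */
data NonA;
   merge EmpsAU(in=Emps) PhoneC(in=Cell);
   by EmpID;
   if Emps=0 or Cell=0;
run;

/* Equivalent form: everything that is not a match */
data NonB;
   merge EmpsAU(in=Emps) PhoneC(in=Cell);
   by EmpID;
   if not (Emps and Cell);
run;

/* PROC COMPARE reports any differences between the two data sets. */
proc compare base=NonA compare=NonB;
run;
```

Both data sets should contain the same two observations (EmpID 121151 and 121153), and PROC COMPARE should report no unequal values.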
10.14 Quiz

Write an appropriate IF statement to create the desired data sets.

dataA               dataB
X    Y    Z         X    W
1   10   20         1   50
3   30   40         2   60

data new;
   merge dataA(in=A)
         dataB(in=B);
   by X;
run;

new
X    Y    Z    W
1   10   20   50
2             60
3   30   40

Desired SAS Data Sets

X    Y    Z    W            X    Y    Z    W
3   30   40                 2             60

X    Y    Z    W            X    Y    Z    W
1   10   20   50            1   10   20   50
3   30   40                 2             60

X    Y    Z    W            X    Y    Z    W
1   10   20   50            2             60
                            3   30   40

Example for the first desired data set (X=3 only):

   if A=1 and B=0;
      OR
   if A and not B;

Outputting to Multiple Data Sets (Self-Study)

The DATA statement can specify multiple output data sets.

An OUTPUT statement can be used in a conditional statement to write the current observation
to a specific data set that is listed in the DATA statement.

data EmpsAUC EmpsOnly PhoneOnly;
   merge EmpsAU(in=Emps) PhoneC(in=Cell);
   by EmpID;
   if Emps=1 and Cell=1
      then output EmpsAUC;
   else if Emps=1 and Cell=0
      then output EmpsOnly;
   else if Emps=0 and Cell=1
      then output PhoneOnly;
run;
                                                                                     p110d07

EmpsAUC
First   Gender   EmpID    Phone
Togar   M        121150   +61(2)5555-1795
Birin   M        121152   +61(2)5555-1667

EmpsOnly
First   Gender   EmpID    Phone
Kylie   F        121151

PhoneOnly
First   Gender   EmpID    Phone
                 121153   +61(2)5555-1348

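The three-way split above can be run end to end as follows. This is a sketch under the assumption that the input data sets are built with DATALINES. The informat widths and the report titles are illustrative and do not come from the course program.

```sas
/* Input data from the slides. The informat widths are assumed. */
data EmpsAU;
   input First :$8. Gender :$1. EmpID;
   datalines;
Togar M 121150
Kylie F 121151
Birin M 121152
;
run;

data PhoneC;
   input EmpID Phone :$15.;
   datalines;
121150 +61(2)5555-1795
121152 +61(2)5555-1667
121153 +61(2)5555-1348
;
run;

data EmpsAUC EmpsOnly PhoneOnly;
   merge EmpsAU(in=Emps) PhoneC(in=Cell);
   by EmpID;
   /* An explicit OUTPUT statement turns off the implicit output at the
      bottom of the step, so each observation is written only to the
      data set named in the branch that it satisfies. */
   if Emps=1 and Cell=1
      then output EmpsAUC;
   else if Emps=1 and Cell=0
      then output EmpsOnly;
   else if Emps=0 and Cell=1
      then output PhoneOnly;
run;

title 'Matches (EmpsAUC)';
proc print data=EmpsAUC noobs;
run;

title 'Nonmatches from EmpsAU (EmpsOnly)';
proc print data=EmpsOnly noobs;
run;

title 'Nonmatches from PhoneC (PhoneOnly)';
proc print data=PhoneOnly noobs;
run;

title;   /* clear the title */
```

Because the branches are mutually exclusive, the observation counts of the three output data sets should add up to the four observations of the unrestricted merge.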
Many-to-Many Merge (Self-Study)

Merge EmpsAUUS and PhoneO by Country to create a new data set named EmpsOfc.

EmpsAUUS                        PhoneO
First    Gender   Country       Country   Phone
Togar    M        AU            AU        +61(2)5555-1500
Kylie    F        AU            AU        +61(2)5555-1600
Stacey   F        US            AU        +61(2)5555-1700
Gloria   F        US            US        +1(305)555-1500
James    M        US            US        +1(305)555-1600

The data sets are sorted by Country.

data EmpsOfc;
   merge EmpsAUUS PhoneO;
   by Country;
run;
                                                                                     p110d08

In a many-to-many merge, this note is issued to the log:

NOTE: MERGE statement has more than one data set with repeats of BY values.

This message is meant to be informational.

A DATA step that performs a many-to-many merge does not produce a Cartesian product.

DATA Step Results:

EmpsOfc
First    Gender   Country   Phone
Togar    M        AU        +61(2)5555-1500
Kylie    F        AU        +61(2)5555-1600
Kylie    F        AU        +61(2)5555-1700
Stacey   F        US        +1(305)555-1500
Gloria   F        US        +1(305)555-1600
James    M        US        +1(305)555-1600

The SQL procedure creates different results than the DATA step for a many-to-many merge.

proc sql;
   create table EmpsOfc as
   select First, Gender, PhoneO.Country, Phone
      from EmpsAUUS, PhoneO
      where EmpsAUUS.Country=PhoneO.Country;
quit;
                                                                                     p110d08

The SQL procedure is the SAS implementation of Structured Query Language. PROC SQL is part of Base
SAS software, and you can use it with any SAS data set. Often, PROC SQL can be an alternative to other
SAS procedures or the DATA step.

PROC SQL Results:

EmpsOfc
First    Gender   Country   Phone
Togar    M        AU        +61(2)5555-1500
Togar    M        AU        +61(2)5555-1600
Togar    M        AU        +61(2)5555-1700
Kylie    F        AU        +61(2)5555-1500
Kylie    F        AU        +61(2)5555-1600
Kylie    F        AU        +61(2)5555-1700
Stacey   F        US        +1(305)555-1500
Stacey   F        US        +1(305)555-1600
Gloria   F        US        +1(305)555-1500
Gloria   F        US        +1(305)555-1600
James    M        US        +1(305)555-1500
James    M        US        +1(305)555-1600
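The difference between the two many-to-many results can be reproduced side by side. In this sketch the output names EmpsOfc_Merge and EmpsOfc_SQL are hypothetical (the slides use EmpsOfc for both), and the informat widths are assumed.

```sas
/* Input data from the slides. The informat widths are assumed. */
data EmpsAUUS;
   input First :$8. Gender :$1. Country :$2.;
   datalines;
Togar M AU
Kylie F AU
Stacey F US
Gloria F US
James M US
;
run;

data PhoneO;
   input Country :$2. Phone :$15.;
   datalines;
AU +61(2)5555-1500
AU +61(2)5555-1600
AU +61(2)5555-1700
US +1(305)555-1500
US +1(305)555-1600
;
run;

/* DATA step: observations are paired by position within each BY group. */
data EmpsOfc_Merge;
   merge EmpsAUUS PhoneO;
   by Country;
run;

/* PROC SQL: every employee is paired with every phone in the same country. */
proc sql;
   create table EmpsOfc_SQL as
   select First, Gender, PhoneO.Country, Phone
      from EmpsAUUS, PhoneO
      where EmpsAUUS.Country=PhoneO.Country;
quit;

title 'DATA step merge';
proc print data=EmpsOfc_Merge noobs;
run;

title 'PROC SQL join';
proc print data=EmpsOfc_SQL noobs;
run;

title;   /* clear the title */
```

The DATA step should produce 6 observations and the SQL join 12 (2 x 3 for AU plus 3 x 2 for US), as in the results shown above. The order of the rows in the SQL result is not guaranteed without an ORDER BY clause.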

Exercises

Level 1

10. Merging Using the IN= Option


a. Retrieve the starter program p110e10.

b. Add a DATA step after the PROC SORT step to merge Work.product and
orion.supplier by Supplier_ID to create a new data set called Work.prodsup.

c. Submit the program and confirm that Work.prodsup was created with 556 observations
and 10 variables.

d. Modify the DATA step to output only observations that are in Work.product but
not orion.supplier. A subsetting IF statement that references IN= variables in the MERGE
statement needs to be added.

e. Submit the program and confirm that Work.prodsup was created with 75 observations
and 10 variables. The supplier information will be missing in the PROC PRINT output.

Level 2

11. Merging Using the IN= and RENAME= Options

a. Write a PROC SORT step to sort orion.customer by Country to create a new data set
called Work.customer.

b. Write a DATA step to merge the previously sorted data set with orion.lookup_country
by Country to create a new data set called Work.allcustomer.

In the orion.lookup_country data set, Start needs to be renamed to Country and
Label needs to be renamed to Country_Name.

Include only the following four variables: Customer_ID, Country, Customer_Name, and
Country_Name.

c. Write a PROC PRINT step to create the following report:

Partial PROC PRINT Output (First 15 of 308 Observations)

Obs    Customer_ID    Country    Customer_Name        Country_Name

  1         .         AD                              Andorra
  2         .         AE                              United Arab Emirates
  3         .         AF                              Afghanistan
  4         .         AG                              Antigua/Barbuda
  5         .         AI                              Anguilla
  6         .         AL                              Albania
  7         .         AM                              Armenia
  8         .         AN                              Netherlands Antilles
  9         .         AO                              Angola
 10         .         AQ                              Antarctica
 11         .         AR                              Argentina
 12         .         AS                              American Samoa
 13         .         AT                              Austria
 14        29         AU         Candy Kinsey         Australia
 15        41         AU         Wendell Summersby    Australia

d. Modify the DATA step to store only the observations that contain both customer information
and country information. A subsetting IF statement that references IN= variables in the MERGE
statement needs to be added.

e. Submit the program to create the following report:

Partial PROC PRINT Output (First 7 of 77 Observations)

Obs    Customer_ID    Country    Customer_Name         Country_Name

  1        29         AU         Candy Kinsey          Australia
  2        41         AU         Wendell Summersby     Australia
  3        53         AU         Dericka Pockran       Australia
  4       111         AU         Karolina Dokter       Australia
  5       171         AU         Robert Bowerman       Australia
  6       183         AU         Duncan Robertshawe    Australia
  7       195         AU         Cosi Rimmington       Australia

Level 3

12. Merging and Outputting to Multiple Data Sets


a. Write a PROC SORT step to sort orion.orders by Employee_ID to create a new data set
called Work.orders.

b. Write a DATA step to merge orion.staff and Work.orders by Employee_ID.
Create two new data sets: Work.allorders and Work.noorders.

The data set Work.allorders should include all observations from Work.orders,
regardless of matches or nonmatches from the orion.staff data set.

The data set Work.noorders should include the observations from orion.staff that do
not have a match in Work.orders.

Include only the following six variables: Employee_ID, Job_Title, Gender, Order_ID,
Order_Type, and Order_Date.

Outputting to multiple data sets is mentioned at the end of this section in a self-study
section.

c. Using the new data sets, write two PROC PRINT steps to create two reports.

d. Submit the program and confirm that Work.allorders was created with 490 observations
and 6 variables and Work.noorders was created with 324 observations and 6 variables.

10.7 Chapter Review

Chapter Review
1. What are the three methods for combining SAS data
sets?

2. What data set option enables you to change the name
of a variable?

3. What is a requirement of the input SAS data sets prior
to match-merging?

4. Which three statements must be used in a DATA step
to perform a match-merge?

5. Which data set option can be used to prevent non-
matches from being written to the output data sets
in a match-merge?

10.8 Solutions

Solutions to Exercises
1. Appending Like-Structured Data Sets
a. Retrieve the starter program.
b. Submit the two PROC CONTENTS steps.
proc contents data=orion.price_current;
run;

proc contents data=orion.price_new;
run;

How many variables are in orion.price_current? 6

How many variables are in orion.price_new? 5

Does orion.price_new contain any variables that are not in orion.price_current?
No

c. Add a PROC APPEND step.
proc append base=orion.price_current
            data=orion.price_new;
run;

Why is the FORCE option not needed? The variables in the DATA= data set are all in the
BASE= data set.

d. Submit the program.

2. Appending Unlike-Structured Data Sets
a. Write and submit two PROC CONTENTS steps.
proc contents data=orion.qtr1_2007;
run;

proc contents data=orion.qtr2_2007;
run;
How many variables are in orion.qtr1_2007? 5

How many variables are in orion.qtr2_2007? 6

Which variable is not in both data sets? Employee_ID


b. Write a PROC APPEND step.
proc append base=work.ytd
data=orion.qtr1_2007;
run;
c. Submit the PROC APPEND step.

d. Write another PROC APPEND step.


proc append base=work.ytd
data=orion.qtr2_2007 force;
run;
Why is the FORCE option needed? The variable Employee_ID in the DATA= data set is not
in the BASE= data set.
e. Submit the second PROC APPEND step.
3. Using the APPEND Statement

a. Write and submit three PROC CONTENTS steps.
proc contents data=orion.shoes_eclipse;
run;

proc contents data=orion.shoes_tracker;
run;

proc contents data=orion.shoes;
run;

b. Write a PROC DATASETS step.
proc datasets library=orion nolist;
   append base=shoes data=shoes_eclipse;
   append base=shoes data=shoes_tracker force;
quit;

c. Submit the PROC DATASETS step.

4. Concatenating Like-Structured Data Sets
a. Write and submit a DATA step.
data work.thirdqtr;
   set orion.mnth7_2007 orion.mnth8_2007 orion.mnth9_2007;
run;

How many observations in Work.thirdqtr are from orion.mnth7_2007? 10

How many observations in Work.thirdqtr are from orion.mnth8_2007? 12

How many observations in Work.thirdqtr are from orion.mnth9_2007? 10

b. Write and submit a PROC PRINT step.


proc print data=work.thirdqtr;
run;

5. Concatenating Unlike-Structured Data Sets


a. Retrieve the starter program.
b. Submit the two PROC CONTENTS steps.
proc contents data=orion.sales;
run;

proc contents data=orion.nonsales;


run;
What are the names of the two variables that are different in the two data sets?

orion.sales      orion.nonsales
First_Name       First
Last_Name        Last

c. Add a DATA step.
data work.allemployees;
   set orion.sales
       orion.nonsales(rename=(First=First_Name Last=Last_Name));
   keep Employee_ID First_Name Last_Name Job_Title Salary;
run;

d. Add a PROC PRINT step.
proc print data=work.allemployees;
run;

6. Interleaving Data Sets
a. Retrieve the starter program.
b. Add a PROC SORT step after the PROC SORT step in the starter program.
proc sort data=orion.shoes_eclipse
          out=work.eclipsesort;
   by Product_Name;
run;

proc sort data=orion.shoes_tracker
          out=work.trackersort;
   by Product_Name;
run;
c. Add a DATA step.
data work.e_t_shoes;
   set work.eclipsesort work.trackersort;
   by Product_Name;
   keep Product_Group Product_Name Supplier_ID;
run;

d. Add a PROC PRINT step.


proc print data=work.e_t_shoes;
run;
7. Merging orion.orders and orion.order_item in a One-to-Many Merge

a. Retrieve the starter program.


b. Submit the two PROC CONTENTS steps.
proc contents data=orion.orders;
run;

proc contents data=orion.order_item;
run;

c. Add a DATA step prior to the PROC PRINT step.
data work.allorders;
   merge orion.orders
         orion.order_item;
   by Order_ID;
run;

proc print data=work.allorders;
   var Order_ID Order_Item_Num Order_Type
       Order_Date Quantity Total_Retail_Price;
run;

d. Submit the program.

8. Merging orion.product_level and orion.product_list in a One-to-Many Merge

a. Write a PROC SORT step.
proc sort data=orion.product_list
          out=work.product_list;
   by Product_Level;
run;
b. Write a DATA step.
data work.listlevel;
merge orion.product_level work.product_list;
by Product_Level;
run;
c. Write a PROC PRINT step.
proc print data=work.listlevel;
var Product_ID Product_Name Product_Level Product_Level_Name;
run;

9. Joining orion.product_level and orion.product_list in a One-to-Many Merge

a. Write a PROC SQL step.


proc sql;
create table work.listlevelsql as
select Product_ID, Product_Name,
product_level.Product_Level, Product_Level_Name
from orion.product_level, orion.product_list
where product_level.Product_Level = product_list.Product_Level;
quit;
b. Write a PROC PRINT step.
proc print data=work.listlevelsql;
run;

10. Merging Using the IN= Option

a. Retrieve the starter program.

b. Add a DATA step.
proc sort data=orion.product_list
          out=work.product;
   by Supplier_ID;
run;

data work.prodsup;
   merge work.product
         orion.supplier;
   by Supplier_ID;
run;

proc print data=work.prodsup;
   var Product_ID Product_Name Supplier_ID Supplier_Name;
run;
c. Submit the program.

d. Modify the DATA step.
data work.prodsup;
merge work.product(in=P)
orion.supplier(in=S);
by Supplier_ID;
if P=1 and S=0;
run;
e. Submit the program.

11. Merging Using the IN= and RENAME= Options


a. Write a PROC SORT step.
proc sort data=orion.customer
out=work.customer;
by Country;
run;
b. Write a DATA step.
data work.allcustomer;
   merge work.customer
         orion.lookup_country(rename=(Start=Country
                                      Label=Country_Name));
   by Country;
   keep Customer_ID Country Customer_Name Country_Name;
run;

c. Write a PROC PRINT step.
proc print data=work.allcustomer;
run;

d. Modify the DATA step.
data work.allcustomer;
   merge work.customer(in=Cust)
         orion.lookup_country(rename=(Start=Country
                                      Label=Country_Name)
                              in=Ctry);
   by Country;
   keep Customer_ID Country Customer_Name Country_Name;
   if Cust=1 and Ctry=1;
run;

e. Submit the program.
12. Merging and Outputting to Multiple Data Sets

a. Write a PROC SORT step.
proc sort data=orion.orders
out=work.orders;
by Employee_ID;
run;
b. Write a DATA step.
data work.allorders work.noorders;
merge orion.staff(in=Staff) work.orders(in=Ord);
by Employee_ID;
if Ord=1 then output work.allorders;
else if Staff=1 and Ord=0 then output work.noorders;
keep Employee_ID Job_Title Gender Order_ID Order_Type Order_Date;
run;

c. Write two PROC PRINT steps.


proc print data=work.allorders;
run;

proc print data=work.noorders;


run;
d. Submit the program.


Solutions to Student Activities (Polls/Quizzes)

10.01 Quiz – Correct Answer


Which method (appending, concatenating, or merging)
should be used for the given business scenario?

   Business Scenario                                   Method

1  The JanSales, FebSales, and MarSales data sets      concatenating
   need to be combined to create the Qtr1Sales
   data set.

2  The Sales data set needs to be combined with        merging
   the Target data set by month to compare the
   sales data to the target data.

3  The OctSales data set needs to be added to the      appending
   YTD data set.

10.02 Quiz – Correct Answer

How many observations will be in Emps after appending
the three data sets?

9 observations

Emps
First    Gender   HireYear
Stacey   F        2006
Gloria   F        2007
James    M        2007

Emps2008
First    Gender   HireYear
Brett    M        2008
Renee    F        2008

Emps2009
First    HireYear
Sara     2009
Dennis   2009

Emps2010
First    HireYear   Country
Rose     2010       Spain
Eric     2010       Spain

10.03 Quiz – Correct Answer


How many variables will be in Emps after appending
the three data sets?

3 variables

The base data set variable information cannot change.

Emps
First    Gender   HireYear
Stacey   F        2006
Gloria   F        2007
James    M        2007

Emps2008
First    Gender   HireYear
Brett    M        2008
Renee    F        2008

Emps2009
First    HireYear
Sara     2009
Dennis   2009

Emps2010
First    HireYear   Country
Rose     2010       Spain
Eric     2010       Spain

10.04 Quiz – Correct Answer
How many observations will be in Emps if the program
is submitted a second time?

15 observations (9 + 2 + 2 + 2)

Be careful; observations are added to the BASE= data
set every time that you submit the program.
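
One way to make such a program safe to resubmit is to rebuild the BASE= data set from an untouched copy before the PROC APPEND steps run. The following is only a sketch: work.Emps_original is a hypothetical backup copy of Emps that is not part of the course data, and the use of FORCE on the last step is an assumption based on Emps2010 containing a variable (Country) that is not in Emps.

```sas
/* Sketch only: work.Emps_original is a hypothetical, untouched copy of Emps. */
data Emps;
   set work.Emps_original;        /* restore the three original observations */
run;

/* Emps2008 and Emps2009 contain no variables that are missing from Emps. */
proc append base=Emps data=Emps2008;
run;

proc append base=Emps data=Emps2009;
run;

/* Emps2010 contains Country, which is not in Emps, so FORCE is specified. */
proc append base=Emps data=Emps2010 force;
run;
```

Because the first DATA step resets Emps each time, resubmitting the whole program should give 9 observations again rather than 15.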


10.05 Quiz – Correct Answer


How many variables will be in EmpsAll2
after concatenating EmpsCN and EmpsJP?

EmpsCN                          EmpsJP
First   Gender   Country        First   Gender   Region
Chang   M        China          Cho     F        Japan
Li      M        China          Tomi    M        Japan
Ming    F        China

Four variables
First, Gender, Country, and Region
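
The quiz can be reproduced with a short program. This sketch builds the two small data sets from the slide with DATALINES (the input statements are an assumption about how the data sets were created) and then concatenates them with a SET statement.

```sas
/* Rebuild the two quiz data sets from the values shown on the slide. */
data EmpsCN;
   input First $ Gender $ Country $;
   datalines;
Chang M China
Li M China
Ming F China
;
run;

data EmpsJP;
   input First $ Gender $ Region $;
   datalines;
Cho F Japan
Tomi M Japan
;
run;

/* Concatenate: the output data set contains every variable from both inputs. */
data EmpsAll2;
   set EmpsCN EmpsJP;
run;

proc print data=EmpsAll2;
run;
```

In the printed result, Region is missing for the observations read from EmpsCN and Country is missing for the observations read from EmpsJP, which is why the RENAME= option in the next quiz is useful.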

10.06 Quiz – Correct Answer
Which statement has correct syntax?

a. set EmpsCN(rename(Country=Location))
       EmpsJP(rename(Region=Location));

b. set EmpsCN(rename=(Country=Location))     (correct answer)
       EmpsJP(rename=(Region=Location));

c. set EmpsCN rename=(Country=Location)
       EmpsJP rename=(Region=Location);


10.07 Multiple Choice Poll – Correct Answer


(Self-Study)
Which method would you use if you wanted to create
a new variable at the time of concatenation?
a. APPEND procedure
b. SET statement     (correct answer)

Example:
data EmpsBonus;
   set EmpsDK EmpsFR;
   if Country='Denmark'
      then Bonus=300;
   else Bonus=500;
run;

10.08 Quiz – Correct Answer
Which step is sorting the observations in a SAS data set
and overwriting the same SAS data set?

a. proc sort data=work.EmpsAU
             out=work.sorted;
      by First;
   run;

b. proc sort data=work.EmpsAU
             out=orion.EmpsAU;
      by First;
   run;

c. proc sort data=work.EmpsAU;     (correct answer)
      by First;
   run;

10.09 Multiple Choice Poll – Correct Answer


What is the Employee_ID value for the first
observation in the sorted data set?
a. 120102
b. 120121

c. 121144
d. 121145

proc sort data=orion.sales
          out=work.sortsales;
   by Gender descending Employee_ID;
run;

proc print data=work.sortsales;
   var Gender Employee_ID First_Name
       Last_Name Salary;
run;
p110a01s

10.10 Quiz – Correct Answer
What are the modified, completed statements?

proc sort data=orion.employee_payroll
          out=work.payroll;
   by Employee_ID;
run;

proc sort data=orion.employee_addresses
          out=work.addresses;
   by Employee_ID;
run;

data work.payadd;
   merge work.payroll work.addresses;
   by Employee_ID;
run;

10.11 Quiz – Correct Answer


How many observations in the final data set EmpsAUC
are considered nonmatches?
a. 1
b. 2     (correct answer)
c. 3
d. 4

EmpsAUC
First   Gender   EmpID    Phone
Togar   M        121150   +61(2)5555-1795
Kylie   F        121151
Birin   M        121152   +61(2)5555-1667
                 121153   +61(2)5555-1348

10.12 Quiz – Correct Answer
What are the values of Emps and Cell?

EmpsAU                          PhoneC
First   Gender   EmpID          EmpID    Phone
Togar   M        121150         121150   +61(2)5555-1795
Kylie   F        121151         121152   +61(2)5555-1667
Birin   M        121152         121153   +61(2)5555-1348

data EmpsAUC;
   merge EmpsAU(in=Emps)
         PhoneC(in=Cell);
   by EmpID;
run;

PDV
First   Gender   EmpID    D Emps   Phone             D Cell
                 121153   0        +61(2)5555-1348   1


10.13 Quiz – Correct Answer


Which subsetting IF statement can be added
to the DATA step to only output the matches?
a. if Emps=1 and Cell=0;
b. if Emps=1 and Cell=1;     (correct answer)
c. if Emps=1;
d. if Cell=0;

PDV
First   Gender   EmpID    D Emps   Phone             D Cell
Togar   M        121150   1        +61(2)5555-1795   1
Kylie   F        121151   1                          0
Birin   M        121152   1        +61(2)5555-1667   1
                 121153   0        +61(2)5555-1348   1

10.14 Quiz – Correct Answer

Write an appropriate IF statement to create the desired data sets.

dataA                dataB
X    Y    Z          X    W
1    10   20         1    50
3    30   40         2    60

data new;
   merge dataA(in=A)
         dataB(in=B);
   by X;
run;

new
X    Y    Z    W
1    10   20   50
2    .    .    60
3    30   40   .

Desired SAS Data Sets

if A=1 and B=0;   OR   if A and not B;
X    Y    Z    W
3    30   40   .

if A=0 and B=1;   OR   if not A and B;
X    Y    Z    W
2    .    .    60

if A=1;   OR   if A;
X    Y    Z    W
1    10   20   50
3    30   40   .

if B=1;   OR   if B;
X    Y    Z    W
1    10   20   50
2    .    .    60

if A=1 and B=1;   OR   if A and B;
X    Y    Z    W
1    10   20   50

if A=0 or B=0;   OR   if not A or not B;
X    Y    Z    W
2    .    .    60
3    30   40   .


Solutions to Chapter Review

Chapter Review Answers


1. What are the three methods for combining SAS data
sets?
Append
Concatenate
Merge

2. What data set option enables you to change the name
of a variable?
the RENAME= data set option

3. What is a requirement of the input SAS data sets prior
to match-merging?
The input SAS data sets must be sorted by the
BY variable.

4. Which three statements must be used in a DATA step
to perform a match-merge?
DATA
MERGE
BY


5. Which data set option can be used to prevent non-
matches from being written to the output data sets
in a match-merge?
the IN= data set option
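
The IN= option only creates a temporary 0/1 flag; a subsetting IF statement that tests the flags does the actual filtering. A minimal sketch, assuming two hypothetical data sets (work.customers and work.orders) that are already sorted by a shared variable Customer_ID:

```sas
/* Hypothetical inputs: work.customers and work.orders, both sorted by Customer_ID. */
data work.matches;
   merge work.customers(in=InCust)   /* InCust=1 when work.customers contributes */
         work.orders(in=InOrd);      /* InOrd=1 when work.orders contributes     */
   by Customer_ID;
   if InCust=1 and InOrd=1;          /* keep matches only; nonmatches are not output */
run;
```

The IN= variables are available during DATA step execution but are not written to the output data set.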

Chapter 11 Enhancing Reports

11.1 Using Global Statements
     Exercises

11.2 Adding Labels and Formats
     Exercises

11.3 Creating User-Defined Formats
     Exercises

11.4 Subsetting and Grouping Observations
     Exercises

11.5 Directing Output to External Files
     Demonstration: Creating HTML, PDF, and RTF Files
     Demonstration: Creating Files That Open in Excel
     Demonstration: Using Options with the EXCELXP Destination (Self-Study)
     Exercises

11.6 Chapter Review

11.7 Solutions
     Solutions to Exercises
     Solutions to Student Activities (Polls/Quizzes)
     Solutions to Chapter Review

11.1 Using Global Statements

Objectives
Identify SAS statements that are used with most
reporting procedures.
Enhance reports by using SAS system options.
Enhance reports by adding titles and footnotes.

Add dates and times to titles. (Self-Study)

Creating Reports

A procedure step is a primary method for creating reports.

SAS procedures for creating reports include PROC PRINT, PROC TABULATE,
PROC FREQ, PROC MEANS, and others.


Example of a Basic Report


proc print data=orion.sales;
var Employee_ID First_Name Last_Name Salary;
run;

Partial PROC PRINT Output


                      First_
Obs    Employee_ID    Name         Last_Name     Salary

  1      120102       Tom          Zhou          108255
  2      120103       Wilson       Dawes          87975
  3      120121       Irenie       Elvish         26600
  4      120122       Christina    Ngan           27475
  5      120123       Kimiko       Hotstone       26190
  6      120124       Lucian       Daymond        26480
  7      120125       Fong         Hofmeister     32040
  8      120126       Satyakam     Denny          26780
  9      120127       Sharryn      Clarkson       28100
 10      120128       Monica       Kletschkus     30890
p111d01

Example of an Enhanced Report

options nocenter;
ods html file='enhanced.html' style=sasweb;
proc print data=orion.sales label;
   var Employee_ID First_Name Last_Name Salary;
   title1 'Orion Sales Employees';
   title2 'Males Only';
   footnote 'Confidential';
   label Employee_ID='Sales ID'
         First_Name='First Name'
         Last_Name='Last Name'
         Salary='Annual Salary';
   format Salary dollar8.;
   where Gender='M';
   by Country;
run;
ods html close;
p111d01

Example of an Enhanced Report


Partial PROC PRINT Output (figure not reproduced)

Statements That Enhance Reports
Many statements are used with most reporting
procedures to enhance the report.

options nocenter;
ods html file='enhanced.html' style=sasweb;
proc print data=orion.sales label;
   var Employee_ID First_Name Last_Name Salary;
   title1 'Orion Sales Employees';
   title2 'Males Only';
   footnote 'Confidential';
   label Employee_ID='Sales ID'
         First_Name='First Name'
         Last_Name='Last Name'
         Salary='Annual Salary';
   format Salary dollar8.;
   where Gender='M';
   by Country;
run;
ods html close;

Some of these statements are global statements.


Global Statements
The following are global statements that enhance reports:
OPTIONS
TITLE
FOOTNOTE
ODS
Global statements are specified anywhere in your
SAS program and they remain in effect until canceled,

changed, or your SAS session ends.
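
To illustrate how long a global statement stays in effect, the following sketch sets a title and a footnote once and then runs two procedure steps. The title and footnote text are made up for illustration; orion.sales is the course data set used throughout this chapter.

```sas
/* Global statements: specified once, outside of any step. */
title 'Orion Star Sales Staff';
footnote 'Confidential';

proc print data=orion.sales;
   var Employee_ID Salary;
run;

/* The same title and footnote are still in effect for this step. */
proc means data=orion.sales;
   var Salary;
run;

/* Cancel them so that later output is not affected. */
title;
footnote;
```

Both reports should carry the same title and footnote, because nothing changed or canceled them between the two steps.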

The OPTIONS Statement
The OPTIONS statement changes the value of one
or more SAS system options.

General form of the OPTIONS statement:

OPTIONS option(s);

Some SAS system options change the appearance
of a report.
The OPTIONS statement is not usually included
in a PROC or DATA step.
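
As a sketch of typical placement, the OPTIONS statement below appears before the step whose output it should affect, and a second OPTIONS statement afterward restores the defaults. The choice of options here is only an example.

```sas
/* Set system options before the step that produces the report. */
options nodate nonumber;

proc print data=orion.sales;
   var Employee_ID Salary;
run;

/* Restore the default settings for later output. */
options date number;
```

Because OPTIONS is a global statement, the settings would otherwise stay in effect for every later step in the session.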


SAS System Options for Reporting


Selected SAS System Options:

DATE (default)       displays the date and time that the SAS session began
                     at the top of each page of SAS output.
NODATE               does not display the date and time that the SAS
                     session began at the top of each page of SAS output.
NUMBER (default)     prints page numbers on the first line of each page
                     of SAS output.
NONUMBER             does not print page numbers on the first line of each
                     page of SAS output.
PAGENO=n             defines a beginning page number (n) for the next page
                     of SAS output.
CENTER (default)     centers SAS output.
NOCENTER             left-aligns SAS output.
PAGESIZE=n           defines the number of lines (n) that can be printed
PS=n                 per page of SAS output.
LINESIZE=width       defines the line size (width) for the SAS log and
LS=width             SAS output.


SAS System Options for Reporting


options ls=80 date number;

proc means data=orion.sales;
   var Salary;
run;

                                        09:11 Monday, January 14, 2008  35

                              The MEANS Procedure

                           Analysis Variable : Salary

        N            Mean         Std Dev         Minimum         Maximum
      -------------------------------------------------------------------
      165        31160.12        20082.67        22710.00       243190.00
      -------------------------------------------------------------------

80 characters wide
p111d02

options nodate pageno=1;

proc freq data=orion.sales;
   tables Country;
run;

                                                                         1

                              The FREQ Procedure

                                             Cumulative    Cumulative
        Country    Frequency     Percent     Frequency      Percent
        ------------------------------------------------------------
        AU               63       38.18            63        38.18
        US              102       61.82           165       100.00

80 characters wide
p111d02

Setup for the Poll


Retrieve and submit program p111a01.
Review the results including the date, time, and page
number in the top-right corner of each page of output.
Add the DTRESET system option to the OPTIONS
statement.
Submit the program and review the results.

DTRESET               updates date and time at the top of each page
                      of SAS output.
NODTRESET (default)   does not update date and time at the top of each
                      page of SAS output.

11.01 Poll

Did the date and/or time change?
Yes
No


The TITLE Statement


The TITLE statement specifies title lines for SAS output.
General form of the TITLE statement:

TITLEn 'text ';

Titles appear at the top of the page.


The default title is The SAS System.
The value of n can be from 1 to 10.
An unnumbered TITLE is equivalent to TITLE1.
Titles remain in effect until they are changed,
canceled, or you end your SAS session.
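
A short sketch of numbered titles follows; the title text is made up for illustration. Skipping a title number (here, TITLE2) leaves a blank line between the titles that are defined.

```sas
/* TITLE1 and TITLE3 are defined; TITLE2 is skipped and prints as a blank line. */
title1 'Orion Star Sales Employees';
title3 'Salary Summary';

proc means data=orion.sales;
   var Salary;
run;

/* Cancel all titles so that they do not carry over to later output. */
title;
```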

The FOOTNOTE Statement
The FOOTNOTE statement specifies footnote lines
for SAS output.

General form of the FOOTNOTE statement:

FOOTNOTEn 'text ';

Footnotes appear at the bottom of the page.
No footnote is printed unless one is specified.
The value of n can be from 1 to 10.
An unnumbered FOOTNOTE is equivalent to
FOOTNOTE1.
Footnotes remain in effect until they are changed,
canceled, or you end your SAS session.


The TITLE and FOOTNOTE Statements


footnote1 'By Human Resource Department';
footnote3 'Confidential';

proc means data=orion.sales;
   var Salary;
   title 'Orion Star Sales Employees';
run;
p111d03

                       Orion Star Sales Employees

                          The MEANS Procedure

                       Analysis Variable : Salary

      N            Mean         Std Dev         Minimum         Maximum
    -------------------------------------------------------------------
    165        31160.12        20082.67        22710.00       243190.00
    -------------------------------------------------------------------


                      By Human Resource Department

                              Confidential


Changing Titles and Footnotes


TITLEn or FOOTNOTEn
replaces a previous title or footnote with the same
number
cancels all titles or footnotes with higher numbers.

Canceling All Titles and Footnotes
The null TITLE statement cancels all titles.

title;

The null FOOTNOTE statement cancels all footnotes.

footnote;


Changing and Canceling Titles and Footnotes


PROC PRINT Code                      Resultant Title(s)

proc print data=orion.sales;         The First Line
   title1 'The First Line';          The Second Line
   title2 'The Second Line';
run;

proc print data=orion.sales;         The First Line
   title2 'The Next Line';           The Next Line
run;

proc print data=orion.sales;         The Top Line
   title 'The Top Line';
run;

proc print data=orion.sales;         The Top Line
   title3 'The Third Line';
run;                                 The Third Line

proc print data=orion.sales;
   title;
run;

r i e
11.02 Quiz
Which footnote(s) appear in the second procedure
output?

a. Non Sales Employees

b. Orion Star
   Non Sales Employees

c. Non Sales Employees
   Confidential

d. Orion Star
   Non Sales Employees
   Confidential

footnote1 'Orion Star';
proc print data=orion.sales;
   footnote2 'Sales Employees';
   footnote3 'Confidential';
run;
proc print data=orion.nonsales;
   footnote2 'Non Sales Employees';
run;


Titles with Dates and Times (Self-Study)


The automatic macro variables &SYSDATE9 and
&SYSTIME can be used to add the SAS invocation date
and time to titles and footnotes.
title1 'Orion Star Employee Listing';
title2 "Created on &sysdate9 at &systime";

Double quotation marks must be used when the text references
a macro variable, because macro variable references are not
resolved inside single quotation marks.

Example Title Output:

Orion Star Employee Listing
Created on 11MAR2008 at 15:53
p111d04

The %LET statement can be used with %SYSFUNC and
the TODAY function or the TIME function to create a
macro variable with the current date or time.

%LET macro-variable = %SYSFUNC(today(), date-format);

%LET macro-variable = %SYSFUNC(time(), time-format);

%LET is a macro statement that creates a macro
variable and assigns it a value without leading or
trailing blanks.
%SYSFUNC is a macro function that executes
SAS functions outside of a step.
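
Before using such macro variables in a title, it can help to check their values in the SAS log. The sketch below uses hypothetical macro variable names (rundate and runtime) and the DATE9. and TIME5. formats as examples; %PUT writes text to the log.

```sas
/* Hypothetical macro variable names; any valid date or time format can be used. */
%let rundate=%sysfunc(today(), date9.);
%let runtime=%sysfunc(time(), time5.);

/* Write the resolved values to the SAS log to confirm them. */
%put Report date: &rundate;
%put Report time: &runtime;
```

Unlike &SYSDATE9 and &SYSTIME, which hold the date and time that the SAS session began, these values reflect the moment that the %LET statements execute.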

Titles with Dates and Times (Self-Study)


%let currentdate=%sysfunc(today(),worddate.);
%let currenttime=%sysfunc(time(),timeampm.);

proc freq data=orion.sales;
   tables Gender Country;
   title1 'Orion Star Employee Listing';
   title2 "Created &currentdate";
   title3 "at &currenttime";
run;

Example Title Output:

Orion Star Employee Listing
Created March 11, 2008
at 4:09:43 PM
p111d04


Exercises

Level 1

1. Specifying Titles, Footnotes, and System Options


a. Retrieve the starter program p111e01.

b. Use the OPTIONS statement to establish these system options for the PROC MEANS report:

1) Suppress the page numbers that appear at the top of each output page.

2) Suppress the date and time that appear at the top of each output page.

3) Limit the number of lines per page to 18 for the report. Reset the option value to 52 after the
PROC MEANS step finishes.

c. Specify the following title for the report: Orion Star Sales Report.

d. Specify the following footnote for the report: Report by SAS Programming Student.

e. After the PROC MEANS step finishes, cancel the footnote.

f. Submit the program to create the following PROC MEANS report:

PROC MEANS Output

                         Orion Star Sales Report

                           The MEANS Procedure

  Analysis Variable : Total_Retail_Price Total Retail Price for This Product

      N            Mean         Std Dev         Minimum         Maximum
    -------------------------------------------------------------------
    617     162.2001053     233.8530183       2.6000000         1937.20
    -------------------------------------------------------------------

                     Report by SAS Programming Student


Level 2

2. Specifying Multiple Titles and System Options


a. Retrieve the starter program p111e02.
b. Limit the number of lines per page to 18 and then reset that option to 52 after both reports
are complete.

c. Request that each report contain page numbers starting at 1.

d. Request that the current date and time be displayed at the top of each page; not the date and
time that the SAS session began.
e. Specify the following title to appear in both reports: Orion Star Sales Analysis.

f. Specify a secondary title to appear in the first report with a blank line between the titles:
Catalog Sales Only

g. Specify the following footnote for the first report:
Based on the previous day's posted data

The text specified for a title or footnote can be enclosed in single quotation marks or
double quotation marks. Use double quotation marks when the text contains an
apostrophe.

h. Specify a secondary title to appear in the second report with a blank line between the titles:
Internet Sales Only

i. Cancel all footnotes for the second report.


j. Submit the program to create the following PROC MEANS reports:


PROC MEANS Output

                        Orion Star Sales Analysis                        1
                                            16:30 Monday, January 28, 2008
                           Catalog Sales Only

                           The MEANS Procedure

  Analysis Variable : Total_Retail_Price Total Retail Price for This Product

      N            Mean         Std Dev         Minimum         Maximum
    -------------------------------------------------------------------
    170     199.5961765     282.9680817       2.6000000         1937.20
    -------------------------------------------------------------------

                 Based on the previous day's posted data


                        Orion Star Sales Analysis                        1
                                            16:30 Monday, January 28, 2008
                           Internet Sales Only

                           The MEANS Procedure

  Analysis Variable : Total_Retail_Price Total Retail Price for This Product

      N            Mean         Std Dev         Minimum         Maximum
    -------------------------------------------------------------------
    123     174.7280488     214.3528338       2.7000000         1542.60
    -------------------------------------------------------------------

Level 3

3. Inserting Dates and Times into Titles
a. Use the OPTIONS procedure to verify that the date and time will not be automatically
displayed at the top of each page. If the option is not set correctly, change it.

Documentation about the OPTIONS procedure can be found in the SAS Help
and Documentation from the Contents tab (SAS Products > Base SAS >
Base SAS 9.2 Procedures Guide > Procedures > The OPTIONS Procedure).
Look for an option in the PROC OPTIONS statement that can display the current
setting of a single option.
b. Retrieve the starter program p111e03.

c. Add a title with the following text, substituting the current date and time:
Sales Report as of 4:57 PM on Monday, January 28, 2008

An example of this technique is shown in the self-study material at the end of this
section.
d. Submit the program to create the following report:
PROC MEANS Output
Sales Report as of 4:57 PM on Monday, January 28, 2008

The MEANS Procedure

Analysis Variable : Total_Retail_Price Total Retail Price for This Product

  N            Mean         Std Dev         Minimum         Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
617     162.2001053     233.8530183       2.6000000         1937.20
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

11-20 Chapter 11 Enhancing Reports

11.2 Adding Labels and Formats

Objectives
• Display descriptive column headings using the LABEL statement.
• Display formatted values using the FORMAT statement.

Labels and Formats (Review)

When displaying reports,
• a label changes the appearance of a variable name
• a format changes the appearance of a variable value.

                                           Annual
Obs    Employee_ID    Job_Title            Salary     <-- Label

  1       120102      Sales Manager      $108,255
  2       120103      Sales Manager       $87,975
  3       120121      Sales Rep. II       $26,600
  4       120122      Sales Rep. II       $27,475     <-- Format
  5       120123      Sales Rep. I        $26,190

11.2 Adding Labels and Formats 11-21

The LABEL Statement (Review)


The LABEL statement assigns descriptive labels to variable names.
General form of the LABEL statement:

   LABEL variable = 'label'
         variable = 'label'
         variable = 'label';

• A label can be up to 256 characters.
• Labels are used automatically by many procedures.
• The PRINT procedure uses labels when the LABEL or SPLIT= option
  is specified in the PROC PRINT statement.

Assigning Temporary Labels

PROC FREQ automatically uses labels.

proc freq data=orion.sales;
   tables Gender;
   label Gender='Sales Employee Gender';
run;

The FREQ Procedure

Sales Employee Gender

                                   Cumulative    Cumulative
Gender    Frequency     Percent     Frequency      Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
F               68       41.21            68        41.21
M               97       58.79           165       100.00

p111d05
11-22 Chapter 11 Enhancing Reports

Assigning Temporary Labels


PROC PRINT does not automatically use labels.
proc print data=orion.sales;
   var Employee_ID Job_Title Salary;
   label Employee_ID='Sales ID'
         Job_Title='Job Title'
         Salary='Annual Salary';
run;

Partial PROC PRINT Output
Obs    Employee_ID    Job_Title        Salary

  1       120102      Sales Manager    108255
  2       120103      Sales Manager     87975
  3       120121      Sales Rep. II     26600
  4       120122      Sales Rep. II     27475
  5       120123      Sales Rep. I      26190

p111d05

Assigning Temporary Labels

The LABEL option tells PROC PRINT to use labels.

proc print data=orion.sales label;
   var Employee_ID Job_Title Salary;
   label Employee_ID='Sales ID'
         Job_Title='Job Title'
         Salary='Annual Salary';
run;

Partial PROC PRINT Output
                                       Annual
Obs    Sales ID    Job Title           Salary

  1     120102     Sales Manager       108255
  2     120103     Sales Manager        87975
  3     120121     Sales Rep. II        26600
  4     120122     Sales Rep. II        27475
  5     120123     Sales Rep. I         26190

p111d05

In addition to the PRINT procedure, the FSEDIT procedure also uses the LABEL option.
11.2 Adding Labels and Formats 11-23

Assigning Temporary Labels


Instead of the LABEL option in PROC PRINT, the
SPLIT= option can be used.
The SPLIT= option specifies the split character, which
controls line breaks in column headers.
General form of the SPLIT= option:

   SPLIT='split-character'

Without the SPLIT= option, PROC PRINT can split the headers at special characters such
as the blank or underscore, or in mixed-case values when going from lowercase to uppercase.

Assigning Temporary Labels

The SPLIT= option makes PROC PRINT use labels.

proc print data=orion.sales split='*';
   var Employee_ID Job_Title Salary;
   label Employee_ID='Sales ID'
         Job_Title='Job*Title'
         Salary='Annual*Salary';
run;

Partial PROC PRINT Output
                   Job                 Annual
Obs    Sales ID    Title               Salary

  1     120102     Sales Manager       108255
  2     120103     Sales Manager        87975
  3     120121     Sales Rep. II        26600
  4     120122     Sales Rep. II        27475
  5     120123     Sales Rep. I         26190

p111d05
11-24 Chapter 11 Enhancing Reports

Assigning Permanent Labels (Review)


Using a LABEL statement in a DATA step permanently
associates labels with variables by storing the label in
the descriptor portion of the SAS data set.

data orion.bonus;
   set orion.sales;
   Bonus=Salary*0.10;
   label Salary='Annual*Salary'
         Bonus='Annual*Bonus';
   keep Employee_ID First_Name
        Last_Name Salary Bonus;
run;

proc print data=orion.bonus split='*';
run;

p111d05

Assigning Permanent Labels (Review)

Partial PROC PRINT Output
                      First_                    Annual     Annual
Obs    Employee_ID    Name         Last_Name    Salary      Bonus

  1       120102      Tom          Zhou         108255    10825.5
  2       120103      Wilson       Dawes         87975     8797.5
  3       120121      Irenie       Elvish        26600     2660.0
  4       120122      Christina    Ngan          27475     2747.5
  5       120123      Kimiko       Hotstone      26190     2619.0
  6       120124      Lucian       Daymond       26480     2648.0
  7       120125      Fong         Hofmeister    32040     3204.0
  8       120126      Satyakam     Denny         26780     2678.0
  9       120127      Sharryn      Clarkson      28100     2810.0
 10       120128      Monica       Kletschkus    30890     3089.0

11.2 Adding Labels and Formats 11-25

11.03 Quiz
Which statement is true concerning the
PROC PRINT output for Bonus?

a. Annual Bonus will be the label.
b. Mid-Year Bonus will be the label.

data orion.bonus;
   set orion.sales;
   Bonus=Salary*0.10;
   label Bonus='Annual Bonus';
run;

proc print data=orion.bonus label;
   label Bonus='Mid-Year Bonus';
run;

p111d05

The FORMAT Statement (Review)

The FORMAT statement assigns formats to variable values.
General form of the FORMAT statement:

   FORMAT variable(s) format;

• A format is an instruction that SAS uses to write data values.
• Values in the data set are not changed.

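The point that a format changes only the displayed value can be checked with a small sketch. The data set Work.pay and its two rows are hypothetical values invented for illustration; only the FORMAT statement and the DOLLAR format come from the course material.

```sas
/* Hypothetical data set, created only for this illustration. */
data work.pay;
   input Employee_ID Salary;
   datalines;
120102 108255
120103 87975
;
run;

/* The format affects only how Salary is written in this report. */
proc print data=work.pay;
   format Salary dollar10.;
run;

/* With no FORMAT statement, the same stored values print unformatted, */
/* which shows that the data set itself was not changed.               */
proc print data=work.pay;
run;
```

Comparing the two reports side by side is a quick way to confirm that the temporary format applied in the first step did not alter the stored values.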
11-26 Chapter 11 Enhancing Reports

11.04 Quiz
Which displayed value is incorrect for the given format?

Format       Stored Value    Displayed Value
$3.          Wednesday       Wed
6.1          1234.345        1234.3
COMMAX5.     1234.345        1.234
DOLLAR9.2    1234.345        $1,234.35
DDMMYY8.     0               01/01/1960
DATE9.       0               01JAN1960
YEAR4.       0               1960

Assigning Temporary Formats

proc print data=orion.sales label;
   var Employee_ID Job_Title Salary
       Country Birth_Date Hire_Date;
   . . .
   format Salary dollar10.0
          Birth_Date Hire_Date monyy7.;
run;

Partial PROC PRINT Output
                                       Annual               Date of    Date of
Obs    Sales ID    Job Title           Salary    Country     Birth      Hire

  1     120102     Sales Manager     $108,255      AU       AUG1969    JUN1989
  2     120103     Sales Manager      $87,975      AU       JAN1949    JAN1974
  3     120121     Sales Rep. II      $26,600      AU       AUG1944    JAN1974
  4     120122     Sales Rep. II      $27,475      AU       JUL1954    JUL1978
  5     120123     Sales Rep. I       $26,190      AU       SEP1964    OCT1985

p111d06
11.2 Adding Labels and Formats 11-27

Assigning Temporary Formats


proc freq data=orion.sales;
   tables Hire_Date;
   format Hire_Date year4.;
run;

Partial PROC FREQ Output

The FREQ Procedure

                                      Cumulative    Cumulative
Hire_Date    Frequency     Percent     Frequency      Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
     1974          23       13.94            23        13.94
     1975           2        1.21            25        15.15
     1976           4        2.42            29        17.58
     1977           3        1.82            32        19.39
     1978           7        4.24            39        23.64
     1979           3        1.82            42        25.45

p111d06

Assigning Permanent and Temporary Formats

Using a FORMAT statement in a DATA step permanently
associates formats with variables by storing the format in
the descriptor portion of the SAS data set.

data orion.bonus;
   set orion.sales;
   Bonus=Salary*0.10;
   format Salary Bonus comma8.;
   keep Employee_ID First_Name
        Last_Name Salary Bonus;
run;

proc print data=orion.bonus;
   format Bonus dollar8.;
run;

Temporary formats override permanent formats.

p111d06
11-28 Chapter 11 Enhancing Reports

Assigning Permanent and Temporary Formats


Partial PROC PRINT Output
                      First_
Obs    Employee_ID    Name         Last_Name      Salary      Bonus

  1       120102      Tom          Zhou          108,255    $10,826
  2       120103      Wilson       Dawes          87,975     $8,798
  3       120121      Irenie       Elvish         26,600     $2,660
  4       120122      Christina    Ngan           27,475     $2,748
  5       120123      Kimiko       Hotstone       26,190     $2,619
  6       120124      Lucian       Daymond        26,480     $2,648
  7       120125      Fong         Hofmeister     32,040     $3,204
  8       120126      Satyakam     Denny          26,780     $2,678
  9       120127      Sharryn      Clarkson       28,100     $2,810
 10       120128      Monica       Kletschkus     30,890     $3,089

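One way to confirm which format is stored permanently is to inspect the descriptor portion with PROC CONTENTS. The sketch below is a minimal illustration that uses a hypothetical Work.bonus data set with made-up rows rather than the Orion Star data.

```sas
/* Hypothetical data set, created only for this illustration. */
data work.bonus;
   input Employee_ID Salary;
   Bonus=Salary*0.10;
   /* Permanent formats: stored in the descriptor portion. */
   format Salary Bonus comma8.;
   datalines;
120102 108255
120103 87975
;
run;

/* Temporary format: overrides COMMA8. for Bonus in this step only. */
proc print data=work.bonus;
   format Bonus dollar8.;
run;

/* The descriptor portion still lists COMMA8. for both variables. */
proc contents data=work.bonus;
run;
```

The PROC CONTENTS listing reflects only the DATA step FORMAT statement, because the format named in the PROC PRINT step lasts only for that step.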
11.2 Adding Labels and Formats 11-29

Exercises

Level 1

4. Applying Labels and Formats in Reports
a. Retrieve the starter program p111e04.
b. Modify the column heading for each variable as shown in the sample output that follows.
c. Display all dates in the form ddMONyyyy. If you are running SAS 9.2, specify a width of 11
for the format to obtain the hyphens as shown in the sample output that follows. Otherwise,
use a width of 9; the hyphens will not appear.
d. Display each salary with dollar signs, commas, and two decimal places as shown in the sample
output that follows. No salary in the data set exceeds $500,000.
e. Submit the program to produce the following report:

Partial PROC PRINT Output
Employees with 3 Dependents

       Employee        Annual                                  Termination
Obs     Number         Salary     Birth Date      Hire Date       Date

  9     120109     $26,495.00    15-DEC-1986    01-OCT-2006              .
 11     120111     $26,895.00    23-JUL-1949    01-NOV-1974              .
 12     120112     $26,550.00    17-FEB-1969    01-JUL-1990              .
 14     120114     $31,285.00    08-FEB-1944    01-JAN-1974              .
 18     120118     $28,090.00    03-JUN-1959    01-JUL-1984              .
 20     120120     $27,645.00    05-MAY-1944    01-JAN-1974              .
 23     120123     $26,190.00    28-SEP-1964    01-OCT-1985    31-JAN-2005
 35     120135     $32,490.00    26-JAN-1969    01-OCT-1997    30-APR-2004
 47     120147     $26,580.00    19-JAN-1988    01-OCT-2006              .
 51     120151     $26,520.00    21-NOV-1944    01-JAN-1974              .

Level 2

5. Overriding Existing Labels and Formats
a. Retrieve the starter program p111e05.
b. Display only the year portion of the birth dates.
c. Display only the first initial of each customer's first name. Display the entire last name.
d. Show the customer's ID with exactly six digits, including leading zeros if necessary.

Documentation on SAS formats can be found in the SAS Help and
Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
Formats ⇒ Formats by Category). Look for a numeric format that writes standard
numeric data with leading zeros.
e. Modify the column heading for each variable as shown in the sample output that follows.
Be sure that the column header for the customer's last name is also split into two lines.
f. Submit the program to produce the following report:

Partial PROC PRINT Output
Customers from Turkey

       Customer    First      Last          Birth
Obs       ID       Initial    Name           Year

 47     000544        A       Argac          1964
 48     000908        A       Umran          1979
 49     000928        B       Urfalioglu     1969
 50     001033        S       Okay           1979
 51     001100        A       Canko          1964
 52     001684        C       Aydemir        1974
 55     002788        S       Yucel          1944

Level 3

6. Applying Permanent Labels and Formats
a. Retrieve the starter program p111e06.
b. Add permanent variable labels and formats to the Work.otherstatus data set so that
those attributes need not be repeated in subsequent steps.
1) Variable labels:
• Employee_ID Employee Number
• Employee_Hire_Date Hired
2) Employee_Hire_Date should be displayed in the yyyy.mm.dd form.

Documentation on SAS formats can be found in the SAS Help and
Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
Formats ⇒ Formats by Category). Look for a date format that satisfies the
requirements noted above.
c. Override the permanent attributes within the PROC FREQ step so that the hire dates are
grouped by calendar quarter in the form yyyyQq and the report explicitly states that the
counts are by quarter as shown in the sample output that follows.

Documentation on SAS formats can be found in the SAS Help and
Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
Formats ⇒ Formats by Category). Look for a date format that satisfies the
requirements noted above.
d. Submit the program to produce the following reports. Verify that the variable attributes appear
in the PROC CONTENTS output.

Partial PROC PRINT Output
Employees who are listed with Marital Status=O

       Employee
Obs     Number          Hired

  1     120102     1989.06.01
  2     120117     1986.04.01
  3     120126     2006.08.01
  4     120145     1985.06.01
  5     120149     1993.01.01

Partial PROC CONTENTS Output
Employees who are listed with Marital Status=O

The CONTENTS Procedure

Alphabetic List of Variables and Attributes

#    Variable              Type    Len    Format        Label

2    Employee_Hire_Date    Num       8    YYMMDDP10.    Hired
1    Employee_ID           Num       8    12.           Employee Number

Partial PROC FREQ Output
Employees who are listed with Marital Status=O

The FREQ Procedure

Quarter Hired

Employee_                              Cumulative    Cumulative
Hire_Date    Frequency     Percent      Frequency      Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
   1974Q1           5       12.50              5        12.50
   1976Q3           1        2.50              6        15.00
   1978Q4           1        2.50              7        17.50
   1981Q1           1        2.50              8        20.00
   1981Q3           1        2.50              9        22.50
11-32 Chapter 11 Enhancing Reports

11.3 Creating User-Defined Formats

Objectives
• Create user-defined formats using the FORMAT procedure.
• Apply user-defined formats to variables in reports.

User-Defined Formats

A user-defined format needs to be created for Country.

Current Report (partial output)
                                              Annual               Date of    Date of
Obs    Sales ID    Job Title                  Salary    Country     Birth      Hire

 61     120179     Sales Rep. III            $28,510      AU       MAR1974    JAN2004
 62     120180     Sales Rep. II             $26,970      AU       JUN1954    DEC1978
 63     120198     Sales Rep. III            $28,025      AU       JAN1988    DEC2006
 64     120261     Chief Sales Officer      $243,190      US       FEB1969    AUG1987
 65     121018     Sales Rep. II             $27,560      US       JAN1944    JAN1974
 66     121019     Sales Rep. IV             $31,320      US       JUN1986    JUN2004

Desired Report (partial output)
                                              Annual                     Date of    Date of
Obs    Sales ID    Job Title                  Salary    Country           Birth      Hire

 61     120179     Sales Rep. III            $28,510    Australia        MAR1974    JAN2004
 62     120180     Sales Rep. II             $26,970    Australia        JUN1954    DEC1978
 63     120198     Sales Rep. III            $28,025    Australia        JAN1988    DEC2006
 64     120261     Chief Sales Officer      $243,190    United States    FEB1969    AUG1987
 65     121018     Sales Rep. II             $27,560    United States    JAN1944    JAN1974
 66     121019     Sales Rep. IV             $31,320    United States    JUN1986    JUN2004

11.3 Creating User-Defined Formats 11-33

User-Defined Formats
To create and use your own formats, do the following:

Part 1   Use the FORMAT procedure to create the user-defined format.

Part 2   Apply the format to a specific variable(s) by using a FORMAT
         statement in the reporting procedure.

The FORMAT Procedure

The FORMAT procedure is used to create user-defined formats.

General form of the FORMAT procedure with the VALUE statement:

   PROC FORMAT;
      VALUE format-name range1 = 'label'
                        range2 = 'label'
                        ...;
   RUN;

11-34 Chapter 11 Enhancing Reports

The FORMAT Procedure

A format-name
• names the format that you are creating
• cannot be more than 32 characters in SAS®9
• for character values, must have a dollar sign ($) as the first character,
  and a letter or underscore as the second character
• for numeric values, must have a letter or underscore as the first character
• cannot end in a number
• cannot be the name of a SAS format
• does not end with a period in the VALUE statement.

Format names prior to SAS®9 are limited to 8 characters.

11.05 Multiple Answer Poll

Which user-defined format names are invalid?

a. $stfmt
b. $3levels
c. _4years
d. salranges
e. dollar

11.3 Creating User-Defined Formats 11-35

The FORMAT Procedure

Range(s) can be
• single values
• ranges of values
• lists of values.

Labels
• can be up to 32,767 characters in length
• are typically enclosed in quotation marks, although it is not required.

Character User-Defined Format

In the following VALUE statement, $ctryfmt is the character format name,
'AU' and 'US' are discrete character values, OTHER is a keyword, and the
quoted strings to the right of each equal sign are the labels.

proc format;
   value $ctryfmt 'AU' = 'Australia'
                  'US' = 'United States'
                  other = 'Miscoded';
run;

The OTHER keyword matches all values that do not
match any other value or range.

p111d07
11-36 Chapter 11 Enhancing Reports

Character User-Defined Format


/* Part 1: create the user-defined format */
proc format;
   value $ctryfmt 'AU' = 'Australia'
                  'US' = 'United States'
                  other = 'Miscoded';
run;

/* Part 2: apply the format in the reporting procedure */
proc print data=orion.sales label;
   var Employee_ID Job_Title Salary
       Country Birth_Date Hire_Date;
   label Employee_ID='Sales ID'
         Job_Title='Job Title'
         Salary='Annual Salary'
         Birth_Date='Date of Birth'
         Hire_Date='Date of Hire';
   format Salary dollar10.0
          Birth_Date Hire_Date monyy7.
          Country $ctryfmt.;
run;

Character User-Defined Format

Partial PROC PRINT Output
                                              Annual                     Date of    Date of
Obs    Sales ID    Job Title                  Salary    Country           Birth      Hire

 60     120178     Sales Rep. II             $26,165    Australia        NOV1954    APR1974
 61     120179     Sales Rep. III            $28,510    Australia        MAR1974    JAN2004
 62     120180     Sales Rep. II             $26,970    Australia        JUN1954    DEC1978
 63     120198     Sales Rep. III            $28,025    Australia        JAN1988    DEC2006
 64     120261     Chief Sales Officer      $243,190    United States    FEB1969    AUG1987
 65     121018     Sales Rep. II             $27,560    United States    JAN1944    JAN1974
 66     121019     Sales Rep. IV             $31,320    United States    JUN1986    JUN2004
 67     121020     Sales Rep. IV             $31,750    United States    FEB1984    MAY2002
 68     121021     Sales Rep. IV             $32,985    United States    DEC1974    MAR1994
 69     121022     Sales Rep. IV             $32,210    United States    OCT1979    FEB2002
 70     121023     Sales Rep. I              $26,010    United States    MAR1964    MAY1989
 71     121024     Sales Rep. II             $26,600    United States    SEP1984    MAY2004
 72     121025     Sales Rep. II             $28,295    United States    OCT1949    SEP1975

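The effect of the OTHER keyword is easiest to see on a value that matches none of the listed codes. The sketch below assumes a small hypothetical data set, Work.codes, whose third row holds a lowercase code; because quoted values in a VALUE statement are matched exactly, including case, that row falls through to OTHER.

```sas
proc format;
   value $ctryfmt 'AU' = 'Australia'
                  'US' = 'United States'
                  other = 'Miscoded';
run;

/* Hypothetical test data; the lowercase value is deliberate. */
data work.codes;
   input Country $;
   datalines;
AU
US
au
;
run;

/* 'au' does not match 'AU', so it is displayed with the OTHER label. */
proc print data=work.codes;
   format Country $ctryfmt.;
run;
```

Running a format against a handful of known values like this is a simple way to check the definition before applying it to a full report.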
11.3 Creating User-Defined Formats 11-37

Numeric User-Defined Format

In the following VALUE statement, tiers is the numeric format name, the
values to the left of each equal sign are numeric ranges, and the quoted
strings are the labels.

proc format;
   value tiers 20000-49999   = 'Tier 1'
               50000-99999   = 'Tier 2'
               100000-250000 = 'Tier 3';
run;

p111d07

11.06 Quiz

If you have a value of 99999.87, how will it be displayed
if the TIERS format is applied to the value?

a. Tier 2
b. Tier 3
c. a missing value
d. none of the above

proc format;
   value tiers 20000-49999   = 'Tier 1'
               50000-99999   = 'Tier 2'
               100000-250000 = 'Tier 3';
run;

11-38 Chapter 11 Enhancing Reports

Numeric User-Defined Formats

The less than (<) symbol excludes values from ranges.
• Put < after the value if you want to exclude the first value in a range.
• Put < before the value if you want to exclude the last value in a range.

Range                   First value       Last value
50000 - 100000          Includes 50000    Includes 100000
50000 - < 100000        Includes 50000    Excludes 100000
50000 < - 100000        Excludes 50000    Includes 100000
50000 < - < 100000      Excludes 50000    Excludes 100000

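The exclusion rules above can be tried on boundary values. The following is a minimal sketch with a hypothetical format name (agegrp), hypothetical labels, and a made-up Work.ages data set; none of these names come from the course data.

```sas
/* Each upper bound is excluded, so every value belongs to one range. */
proc format;
   value agegrp  0-<13  = 'Child'
                13-<20  = 'Teen'
                20-120  = 'Adult';
run;

/* Hypothetical test data that sits on or near the range boundaries. */
data work.ages;
   input Age;
   datalines;
12.9
13
20
;
run;

/* 12.9 falls in the first range because 13 is excluded from it;   */
/* 13 starts the second range and 20 starts the third.             */
proc print data=work.ages;
   format Age agegrp.;
run;
```

Writing adjacent ranges so that one boundary is excluded on exactly one side avoids both gaps and overlaps for fractional values.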
11.07 Quiz

If you have a value of 100000, how will it be displayed
if the TIERS format is applied to the value?

a. Tier 2
b. Tier 3
c. 100000
d. a missing value

proc format;
   value tiers  20000-<50000  = 'Tier 1'
                50000- 100000 = 'Tier 2'
               100000<-250000 = 'Tier 3';
run;

11.3 Creating User-Defined Formats 11-39

Numeric User-Defined Format

In the following VALUE statement, LOW and HIGH are keywords.

proc format;
   value tiers    low-<50000  = 'Tier 1'
                50000- 100000 = 'Tier 2'
               100000<-high   = 'Tier 3';
run;

• LOW encompasses the lowest possible value.
• HIGH encompasses the highest possible value.

p111d07

LOW does not include missing values for numeric variables.
LOW does include missing values for character variables.

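Because LOW does not cover numeric missing values, a missing value needs its own range if it should receive a label. The sketch below uses a hypothetical format name (paytier), hypothetical labels, and a made-up Work.pay data set to show one way of handling that case.

```sas
/* The period on the left of the first range stands for numeric missing. */
proc format;
   value paytier  .          = 'Missing'
                  low-<50000 = 'Tier 1'
                  50000-high = 'Tier 2';
run;

/* Hypothetical test data: a missing value, a very low value, a high value. */
data work.pay;
   input Salary;
   datalines;
.
-100
75000
;
run;

/* The missing value is labeled by the explicit range, not by LOW. */
proc print data=work.pay;
   format Salary paytier.;
run;
```

If the explicit missing range were omitted, the missing salary would not match any range in this definition and would be displayed as an ordinary missing value.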
Numeric User-Defined Format

/* Part 1: create the user-defined format */
proc format;
   value tiers    low-<50000  = 'Tier 1'
                50000- 100000 = 'Tier 2'
               100000<-high   = 'Tier 3';
run;

/* Part 2: apply the format in the reporting procedure */
proc print data=orion.sales label;
   var Employee_ID Job_Title Salary
       Country Birth_Date Hire_Date;
   label Employee_ID='Sales ID'
         Job_Title='Job Title'
         Salary='Annual Salary'
         Birth_Date='Date of Birth'
         Hire_Date='Date of Hire';
   format Birth_Date Hire_Date monyy7.
          Salary tiers.;
run;
11-40 Chapter 11 Enhancing Reports

Numeric User-Defined Format


Partial PROC PRINT Output
                                            Annual               Date of    Date of
Obs    Sales ID    Job Title                Salary    Country     Birth      Hire

 60     120178     Sales Rep. II            Tier 1      AU       NOV1954    APR1974
 61     120179     Sales Rep. III           Tier 1      AU       MAR1974    JAN2004
 62     120180     Sales Rep. II            Tier 1      AU       JUN1954    DEC1978
 63     120198     Sales Rep. III           Tier 1      AU       JAN1988    DEC2006
 64     120261     Chief Sales Officer      Tier 3      US       FEB1969    AUG1987
 65     121018     Sales Rep. II            Tier 1      US       JAN1944    JAN1974
 66     121019     Sales Rep. IV            Tier 1      US       JUN1986    JUN2004
 67     121020     Sales Rep. IV            Tier 1      US       FEB1984    MAY2002
 68     121021     Sales Rep. IV            Tier 1      US       DEC1974    MAR1994
 69     121022     Sales Rep. IV            Tier 1      US       OCT1979    FEB2002
 70     121023     Sales Rep. I             Tier 1      US       MAR1964    MAY1989
 71     121024     Sales Rep. II            Tier 1      US       SEP1984    MAY2004
 72     121025     Sales Rep. II            Tier 1      US       OCT1949    SEP1975

Other User-Defined Format Examples

proc format;
   value $grade 'A'     = 'Good'
                'B'-'D' = 'Fair'
                'F'     = 'Poor'
                'I','U' = 'See Instructor'
                other   = 'Miscoded';
run;

proc format;
   value mnthfmt 1,2,3    = 'Qtr 1'
                 4,5,6    = 'Qtr 2'
                 7,8,9    = 'Qtr 3'
                 10,11,12 = 'Qtr 4'
                 .        = 'missing'
                 other    = 'unknown';
run;
11.3 Creating User-Defined Formats 11-41

Multiple User-Defined Formats


Multiple VALUE statements can be in a single
PROC FORMAT step.

proc format;
   value $ctryfmt 'AU' = 'Australia'
                  'US' = 'United States'
                  other = 'Miscoded';
   value tiers    low-<50000  = 'Tier 1'
                50000- 100000 = 'Tier 2'
               100000<-high   = 'Tier 3';
run;

p111d07

Multiple User-Defined Formats

proc print data=orion.sales label;
   . . .
   format Birth_Date Hire_Date monyy7.
          Country $ctryfmt.
          Salary tiers.;
run;

Partial PROC PRINT Output
                                            Annual                     Date of    Date of
Obs    Sales ID    Job Title                Salary    Country           Birth      Hire

 60     120178     Sales Rep. II            Tier 1    Australia        NOV1954    APR1974
 61     120179     Sales Rep. III           Tier 1    Australia        MAR1974    JAN2004
 62     120180     Sales Rep. II            Tier 1    Australia        JUN1954    DEC1978
 63     120198     Sales Rep. III           Tier 1    Australia        JAN1988    DEC2006
 64     120261     Chief Sales Officer      Tier 3    United States    FEB1969    AUG1987
 65     121018     Sales Rep. II            Tier 1    United States    JAN1944    JAN1974
 66     121019     Sales Rep. IV            Tier 1    United States    JUN1986    JUN2004
 67     121020     Sales Rep. IV            Tier 1    United States    FEB1984    MAY2002

p111d07
11-42 Chapter 11 Enhancing Reports

Multiple User-Defined Formats


proc freq data=orion.sales;
   tables Country Salary;
   format Country $ctryfmt. Salary tiers.;
run;

The FREQ Procedure

                                          Cumulative    Cumulative
Country          Frequency     Percent     Frequency      Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Australia               63       38.18            63        38.18
United States          102       61.82           165       100.00

                                   Cumulative    Cumulative
Salary    Frequency     Percent     Frequency      Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Tier 1         159       96.36           159        96.36
Tier 2           4        2.42           163        98.79
Tier 3           2        1.21           165       100.00

11.3 Creating User-Defined Formats 11-43

Exercises

Level 1

7. Creating User-Defined Formats
a. Retrieve the starter program p111e07.
b. Create a character format named $gender that displays gender codes as follows:

   F    Female
   M    Male

c. Create a numeric format named moname that displays month numbers as follows:

   1    January
   2    February
   3    March

d. In the PROC FREQ step, apply these two user-defined formats to the Employee_Gender
and BirthMonth variables, respectively.
e. Submit the program to produce the following report:

PROC FREQ Output
Employees with Birthdays in Q1

The FREQ Procedure

Birth                                Cumulative    Cumulative
Month       Frequency     Percent     Frequency      Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
January           44       38.94            44        38.94
February          34       30.09            78        69.03
March             35       30.97           113       100.00

Employee_                             Cumulative    Cumulative
Gender       Frequency     Percent     Frequency      Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Female             52       46.02            52        46.02
Male               61       53.98           113       100.00
11-44 Chapter 11 Enhancing Reports

Level 2

8. Defining Ranges in User-Defined Formats
a. Retrieve the starter program p111e08.
b. Create a character format named $gender that displays gender codes as follows:

   F                  Female
   M                  Male
   Any other value    Invalid code

c. Create a numeric format named salrange that displays salary ranges as follows:

   At least 20,000 but less than 100,000    Below $100,000
   At least 100,000 and up to 500,000       $100,000 or more
   missing                                  Missing salary
   Any other value                          Invalid salary

d. In the PROC PRINT step, apply these two user-defined formats to the Gender and Salary
variables, respectively.
e. Submit the program to produce the following report:

Partial PROC PRINT Output
Distribution of Salary and Gender Values
for Non-Sales Employees

Obs    Employee_ID    Job_Title                  Salary              Gender

  1       120101      Director                   $100,000 or more    Male
  2       120104      Administration Manager     Below $100,000      Female
  3       120105      Secretary I                Below $100,000      Female
  4       120106      Office Assistant II        Missing salary      Male
  5       120107      Office Assistant III       Below $100,000      Female
  6       120108      Warehouse Assistant II     Below $100,000      Female
  7       120108      Warehouse Assistant I      Below $100,000      Female
  8       120110      Warehouse Assistant III    Below $100,000      Male
  9       120111      Security Guard II          Below $100,000      Male
 10       120112                                 Below $100,000      Female
 11       120113      Security Guard II          Below $100,000      Female
 12       120114      Security Manager           Below $100,000      Invalid code
 13       120115      Service Assistant I        Invalid salary      Male

The PROC PRINT output might not have an invalid Gender value if the data was
previously cleaned.
11.3 Creating User-Defined Formats 11-45

Level 3

9. Creating a Nested Format Definition
a. Retrieve the starter program p111e09.
b. Create a user-defined format that displays date ranges as follows:

   Dates through 31DEC2006     Apply the YEAR4. format.
   Dates starting 01JAN2007    Apply the MONYY7. format.
   missing                     Display the text None.

Documentation about the FORMAT procedure can be found in the SAS Help
and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The FORMAT Procedure).
The documentation for the VALUE statement describes how to use an existing
format as the label for a range.
c. Apply the new format to the Employee_Term_Date variable in the PROC FREQ step.
d. Submit the program to produce the following report:

An option is required in the TABLES statement in order to display missing values as part
of the main frequency report. Documentation about the FREQ procedure can be found in
the SAS Help and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
Base SAS Procedures Guide: Statistical Procedures ⇒ The FREQ Procedure).

PROC FREQ Output
Employee Status Report

The FREQ Procedure

Employee_                             Cumulative    Cumulative
Term_Date    Frequency     Percent     Frequency      Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
None              308       72.64           308        72.64
2002                6        1.42           314        74.06
2003               29        6.84           343        80.90
2004               18        4.25           361        85.14
2005               21        4.95           382        90.09
2006               20        4.72           402        94.81
JAN2007             3        0.71           405        95.52
FEB2007             3        0.71           408        96.23
MAR2007             7        1.65           415        97.88
APR2007             3        0.71           418        98.58
MAY2007             4        0.94           422        99.53
JUN2007             2        0.47           424       100.00
11-46 Chapter 11 Enhancing Reports

11.4 Subsetting and Grouping Observations

Objectives
• Display selected observations in reports by using the WHERE statement.
• Display groups of observations in reports by using the BY statement.

The WHERE Statement (Review)

For subsetting observations in a report, the WHERE
statement is used to select observations that meet a
certain condition.

General form of the WHERE statement:

   WHERE where-expression;

The where-expression is a sequence of operands and
operators that form a set of instructions that define a
condition for selecting observations.
• Operands include constants and variables.
• Operators are symbols that request a comparison,
  arithmetic calculation, or logical operation.
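A single where-expression can combine all three kinds of operators. The sketch below assumes a hypothetical Work.staff data set with made-up rows; the 10 percent raise in the condition is likewise an invented scenario used only to show an arithmetic operator.

```sas
/* Hypothetical data set, created only for this illustration. */
data work.staff;
   input Name $ Country $ Salary;
   datalines;
Tom AU 108255
Irenie AU 26600
Harry US 243190
Dennis US 84260
;
run;

/* Comparison (= and >), arithmetic (*), and logical (and) operators; */
/* the operands are variables (Country, Salary) and constants.        */
proc print data=work.staff;
   where Country = 'AU' and Salary*1.10 > 100000;
run;
```

With these rows, only the first observation satisfies both conditions, so the report contains a single line.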
11.4 Subsetting and Grouping Observations 11-47

11.08 Quiz
Which of the following WHERE statements have
invalid syntax?

a. where Salary ne .;
b. where Hire_Date >= '01APR2008'd;
c. where Country in (AU US);
d. where Salary + Bonus <= 10000;
e. where Gender ne 'M' Salary >= 50000;
f. where Name like '%N';

Subsetting Observations

proc print data=orion.sales;
   var First_Name Last_Name
       Job_Title Country Salary;
   where Salary > 75000;
run;

       First_
Obs    Name      Last_Name      Job_Title               Country    Salary

  1    Tom       Zhou           Sales Manager             AU       108255
  2    Wilson    Dawes          Sales Manager             AU        87975
 64    Harry     Highpoint      Chief Sales Officer       US       243190
163    Louis     Favaron        Senior Sales Manager      US        95090
164    Renee     Capachietti    Sales Manager             US        83505
165    Dennis    Lansberry      Sales Manager             US        84260

p111d08
11-48 Chapter 11 Enhancing Reports

Subsetting Observations
proc means data=orion.sales;
   var Salary;
   where Country = 'AU';
run;

The MEANS Procedure

Analysis Variable : Salary

 N            Mean         Std Dev         Minimum         Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
63        30158.97        12699.14        25185.00       108255.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

p111d08

Setup for the Poll

• Retrieve and submit program p111a02.
• View the log to determine how SAS handles multiple WHERE statements.

proc freq data=orion.sales;
   tables Gender;
   where Salary > 75000;
   where Country = 'US';
run;

11.4 Subsetting and Grouping Observations 11-49

11.09 Multiple Choice Poll


Which statement is true concerning the multiple
WHERE statements?
a. All the WHERE statements are used.
b. None of the WHERE statements is used.
c. The first WHERE statement is used.
d. The last WHERE statement is used.

The BY Statement

For grouping observations in a report, the BY statement
is used to produce separate sections of the report for
each BY group.

General form of the BY statement:

   BY <DESCENDING> by-variable(s);

The observations in the data set must be sorted
by the variables specified in the BY statement.

11-50 Chapter 11 Enhancing Reports

Grouping Observations
proc sort data=orion.sales out=work.sort;
   by Country descending Gender Last_Name;
run;

proc print data=work.sort;
   by Country descending Gender;
run;

p111d09

Grouping Observations

Partial PROC PRINT Output

---------------------- Country=AU Gender=M ----------------------

                      First_
Obs    Employee_ID    Name        Last_Name    Salary

  1       120145      Sandy       Aisbitt       26060
  2       120144      Viney       Barbis        30265
  3       120146      Wendall     Cederlund     25985
  4       120172      Edwin       Comber        28345
  5       120175      Andrew      Conolly       25745
  6       120103      Wilson      Dawes         87975
  7       120124      Lucian      Daymond       26480
  8       120126      Satyakam    Denny         26780

---------------------- Country=AU Gender=F ----------------------

Obs    Employee_ID    First_Name    Last_Name       Salary

 37       120168      Selina        Barcoe           25275
 38       120198      Meera         Body             28025
 39       120149      Judy          Chantharasy      26390
 40       120127      Sharryn       Clarkson         28100
 41       120138      Shani         Duckett          25795
 42       120121      Irenie        Elvish           26600
 43       120154      Caterina      Hayawardhana     30490
 44       120123      Kimiko        Hotstone         26190
11.4 Subsetting and Grouping Observations 11-51

11.10 Quiz
Which is a valid BY statement for the PROC FREQ step?
a. by Country Gender;
b. by Gender Last_Name;
c. by Country;
d. by Gender;

proc sort data=orion.sales out=work.sort;
   by Country descending Gender Last_Name;
run;

proc freq data=work.sort;
   tables Gender;
run;

116
11-52 Chapter 11 Enhancing Reports

Exercises

Level 1

10. Subsetting and Grouping Observations

a. Retrieve the starter program p111e10.
b. Add a PROC SORT step to sort the observations in orion.order_fact based
on the Order_Type variable.

To avoid overwriting the orion.order_fact data set, be sure to use
the OUT= option to create a new data set containing the sorted observations.
Remember to use the new data set in the PROC MEANS step.

c. Restrict the PROC MEANS analysis to two Order_Type values: 2 and 3.
d. Modify the PROC MEANS step to generate the summary analysis separately for each
selected Order_Type value in the sorted data set.
e. Submit the program to produce the following output:

PROC MEANS Output

Orion Star Sales Summary

----------------------------------------- Order Type=2 -----------------------------------------

The MEANS Procedure

Analysis Variable : Total_Retail_Price Total Retail Price for This Product

N Mean Std Dev Minimum Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
170 199.5961765 282.9680817 2.6000000 1937.20
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

----------------------------------------- Order Type=3 -----------------------------------------

Analysis Variable : Total_Retail_Price Total Retail Price for This Product

N Mean Std Dev Minimum Maximum


ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
123 174.7280488 214.3528338 2.7000000 1542.60
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
11.4 Subsetting and Grouping Observations 11-53

Level 2

11. Subsetting and Grouping by Multiple Variables


a. Retrieve the starter program p111e11.
b. Sort the orion.order_fact data set by Order_Type (in ascending sequence)
and Order_Date (in descending sequence).

Create a new data set containing the sorted observations. Do not overwrite
the orion.order_fact data set. Remember to use the new data set in
the PROC PRINT step.

c. Divide the PROC PRINT report based on Order_Type using a BY statement. The
orders for each order type should be displayed in reverse chronological order, that is,
with more recent orders near the top of the report.
d. Limit the observations in the PROC PRINT report based on the following criteria:
1) Orders placed in the first four months of 2005 (January 1 to April 30)
2) Orders that were delivered exactly two days after the order was placed
e. Add a second title to clarify that filters were applied to the data.
f. Submit the program to produce the following report:

PROC PRINT Output
Orion Star Sales Details
2-Day Deliveries from January to April 2005

----------------------------------------- Order Type=2 -----------------------------------------

Order_ Delivery_
Obs Order_ID Date Date

409 1235611754 27APR2005 29APR2005
410 1235611754 27APR2005 29APR2005
411 1235591214 25APR2005 27APR2005
412 1235591214 25APR2005 27APR2005
413 1234972570 24FEB2005 26FEB2005
415 1234659163 24JAN2005 26JAN2005
417 1234588648 17JAN2005 19JAN2005
418 1234588648 17JAN2005 19JAN2005
419 1234538390 12JAN2005 14JAN2005

----------------------------------------- Order Type=3 -----------------------------------------

Order_ Delivery_
Obs Order_ID Date Date

568 1235176942 15MAR2005 17MAR2005


569 1235176942 15MAR2005 17MAR2005
570 1234891576 16FEB2005 18FEB2005
11-54 Chapter 11 Enhancing Reports

Level 3

12. Adding Subsetting Conditions


a. Retrieve the starter program p111e12.
b. Reorder the variables in the PROC PRINT step's BY statement so that the BY-line displays
Supplier_Name, Supplier_ID, and Supplier_Country, in that order. The input
data remains grouped, but not sorted, by these variables.

An option must be added to the BY statement to support the use of grouped,
unsorted data. Documentation about the BY statement can be found in the SAS Help
and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The PRINT Procedure).

c. Augment the existing WHERE criteria by further restricting the report to product names
that contain either the word Street or the word Running.

To add clauses to an existing WHERE statement without retyping or editing it,
use the SAME-AND operator in a separate WHERE statement within the same step.
See the documentation in the SAS Help and Documentation from the Contents tab
(SAS Products ⇒ Base SAS ⇒ SAS 9.2 Language Reference: Concepts ⇒
SAS System Concepts ⇒ WHERE-Expression Processing ⇒
Syntax of WHERE Expression).

d. Submit the program to produce the following report:

Partial PROC PRINT Output

Orion Star Products: Children Sports

---------- Supplier Name=Greenline Sports Ltd Supplier ID=14682 Country=Great Britain ----------

Obs Product_ID Product_Name

50 210200600015 Hardcore Kids Street Shoes

--------------- Supplier Name=3Top Sports Supplier ID=2963 Country=United States ---------------

Obs Product_ID Product_Name

87 210201000169 Children's Street Shoes


88 210201000174 Freestyle Children's Leather Street Shoes
91 210201000179 K Street Shoes
94 210201000187 Mona C- Children's Street Shoes
95 210201000189 Mona J- Children's Street Shoes
104 210201000205 Torino 2000 K Street Shoes
107 210201000209 Universe 4 Children's Running Shoes
11.5 Directing Output to External Files 11-55

11.5 Directing Output to External Files

Objectives
Direct output to ODS destinations by using ODS statements.
Specify a style definition by using the STYLE= option.
Create ODS files that can be opened in Microsoft Excel.

121

Output Delivery System

Output can be sent to a variety of destinations by using
ODS statements.

[Diagram: procedure output flows to the LISTING, HTML, RTF, and PDF destinations.]

122
11-56 Chapter 11 Enhancing Reports

Output Delivery System


Destination    Type of File                  Viewed In

LISTING                                      SAS Output Window or SAS/GRAPH Window
HTML           Hypertext Markup Language     Web Browsers such as Internet Explorer
PDF            Portable Document Format      Adobe Products such as Acrobat Reader
RTF            Rich Text Format              Word Processors such as Microsoft Word

123

Default ODS Destination
The LISTING destination is the default ODS destination.

ods listing;

proc freq data=orion.sales;
   tables Country;
run;

proc gchart data=orion.sales;
   hbar Country / nostats;
run;

124
p111d10
11.5 Directing Output to External Files 11-57

Default ODS Destination


The LISTING destination directs output to the
OUTPUT window and the GRAPH window.

125

Default ODS Destination
The ODS LISTING CLOSE statement stops sending
output to the OUTPUT and GRAPH windows.

ods listing close;

proc freq data=orion.sales;
   tables Country;
run;

proc gchart data=orion.sales;
   hbar Country / nostats;
run;

126
p111d10
11-58 Chapter 11 Enhancing Reports

Default ODS Destination


A warning will appear in the SAS log if the LISTING
destination is closed and no other destinations are active.

Partial SAS Log


23   ods listing close;
24
25   proc freq data=orion.sales;
26      tables Country;
27   run;

WARNING: No output destinations active.
NOTE: There were 165 observations read from the data set ORION.SALES.

127

HTML, PDF, and RTF Destinations
ODS destinations such as HTML, PDF, and RTF
are opened and closed in the following manner:

ODS destination FILE = 'filename.ext' <options>;

   SAS code to generate a report(s)

ODS destination CLOSE;

128

The filename specified in the FILE= option needs to be specific to your operating environment.
11.5 Directing Output to External Files 11-59

HTML Destination
ods html file='myreport.html';
proc freq data=orion.sales;
tables Country;
run;
ods html close;

129
p111d11

Always terminate steps with an explicit step boundary before closing the destination.
Otherwise, the file is closed before the step executes.

PDF Destination

ods pdf file='myreport.pdf';
proc freq data=orion.sales;
   tables Country;
run;
ods pdf close;

p111d11
130
11-60 Chapter 11 Enhancing Reports

RTF Destination
ods rtf file='myreport.rtf';
proc freq data=orion.sales;
tables Country;
run;
ods rtf close;

131
p111d11

The traditional RTF destination does not control vertical measurement, so page breaking is
controlled by the word processor. Starting in SAS 9.2, there is a new destination, TAGSETS.RTF,
which does control vertical measurement. Documentation about the TAGSETS.RTF destination
can be found in the SAS Help and Documentation from the Contents tab (SAS Products ⇒
Base SAS ⇒ SAS 9.2 Output Delivery System User’s Guide ⇒ ODS Language Statements ⇒
Dictionary of ODS Language Statements ⇒ ODS TAGSETS.RTF Statement).
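
The TAGSETS.RTF destination is opened and closed in the same manner as the other
destinations. The following is a minimal sketch, assuming SAS 9.2 or later; the filename
myreport.rtf is illustrative only.

```sas
/* Sketch only: same report as the RTF example, sent to TAGSETS.RTF. */
ods tagsets.rtf file='myreport.rtf';

proc freq data=orion.sales;
   tables Country;
run;

ods tagsets.rtf close;
```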

11.11 Quiz
What is the problem with this program?

ods pdf file='myreport.pdf';

proc print data=orion.sales;
run;

ods close;

133
11.5 Directing Output to External Files 11-61

Single Destination
Output can be sent to only one destination.
ods listing close;

ods html file='example.html';

proc freq data=orion.sales;
   tables Country;
run;

ods html close;

ods listing;

It is a good habit to open the LISTING destination at the end of
a program to guarantee an open destination for the next submission.

135
p111d11

Multiple Destinations
Output can be sent to many destinations.

ods listing;
ods pdf file='example.pdf';
ods rtf file='example.rtf';

proc freq data=orion.sales;
   tables Country;
run;

ods pdf close;
ods rtf close;

To view the results, all destinations except the LISTING
destination must be closed.

136
p111d11
11-62 Chapter 11 Enhancing Reports

Multiple Destinations
Use _ALL_ in the ODS CLOSE statement to close all
open destinations including the LISTING destination.
ods listing;
ods pdf file='example.pdf';
ods rtf file='example.rtf';

proc freq data=orion.sales;
   tables Country;
run;

ods _all_ close;
ods listing;

137
p111d11

Multiple Procedures
Output from many procedures can be sent to ODS
destinations.

ods listing;
ods pdf file='example.pdf';
ods rtf file='example.rtf';

proc freq data=orion.sales;
   tables Country;
run;

proc means data=orion.sales;
   var Salary;
run;

ods _all_ close;
ods listing;

p111d11
138
11.5 Directing Output to External Files 11-63

File Location
A path can be specified to control the location of where
the file is stored.
ods html file='s:\workshop\example.html';

proc freq data=orion.sales;
   tables Country;
run;

proc means data=orion.sales;
   var Salary;
run;

ods html close;

If no path is specified, the file is saved in the current
default directory.

139
p111d11

The path and filename specified in the FILE= option needs to be specific to your operating environment.
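
The directory and the filename can also be supplied separately. The following is a minimal
sketch that assumes the PATH= option of the ODS HTML statement; the directory s:\workshop
is the same illustrative location used in the example above and must be adjusted for your
operating environment.

```sas
/* Sketch only: PATH= names the directory, FILE= names the file in it. */
ods html path='s:\workshop' file='example.html';

proc freq data=orion.sales;
   tables Country;
run;

ods html close;
```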

Operating Environments
The Output Delivery System works on all operating
environments.

z/OS (OS/390) Example:

ods html file='.workshop.report(example)'
         rs=none;

proc freq data=orion.sales;
   tables Country;
run;

ods html close;

Use the RS=NONE option when you create HTML and
RTF files on z/OS (OS/390).
p111d11
140

The RS= option is an alias for the RECORD_SEPARATOR= option.

RS=NONE writes one line of markup output at a time to the file. This enables the file to be read
with a text editor. Without the option, the lines of markup output run together.

RS=NONE is not needed with the PDF destination. It is needed with other destinations such as
CSVALL, MSOFFICE2K, and EXCELXP.
11-64 Chapter 11 Enhancing Reports

Creating HTML, PDF, and RTF Files


p111d12
Submit the following program and view the results in the appropriate application. The HTML file can
be viewed in a Web browser, the PDF file can be viewed in an Adobe product, and the RTF file can be
viewed in a word processor.
ods listing close;
ods html file='myreport.html';
ods pdf file='myreport.pdf';
ods rtf file='myreport.rtf';

proc freq data=orion.sales;
   tables Country;
   title 'Report 1';
run;

proc means data=orion.sales;
   var Salary;
   title 'Report 2';
run;

proc print data=orion.sales;
   var First_Name Last_Name
       Job_Title Country Salary;
   where Salary > 75000;
   title 'Report 3';
run;

ods _all_ close;
ods listing;

For z/OS (OS/390), the following ODS statements are used:

ods html file='.workshop.report(myhtml)' rs=none;
ods pdf file='.workshop.report(mypdf)';
ods rtf file='.workshop.report(myrtf)' rs=none;
11.5 Directing Output to External Files 11-65

If you are using the SAS windowing environment on the Windows operating environment,
the Results Viewer window can be used to view the HTML, PDF, and RTF files.
1. After submitting the program, go to the Results window.
2. Right-click on the word Results within the Results window and select Expand All.

3. Double-click on the HTML file icon, the PDF file icon, or the RTF file icon.

11-66 Chapter 11 Enhancing Reports

4. View the HTML, PDF, or RTF file in the Results Viewer.


HTML File in the Results Viewer

PDF File in the Results Viewer

11.5 Directing Output to External Files 11-67

RTF File in the Results Viewer

11-68 Chapter 11 Enhancing Reports

STYLE= Option
Use a STYLE= option in the ODS destination statement
to specify a style definition.

ODS destination FILE = 'filename.ext'


STYLE = style-definition;

A style definition describes how to display the
presentation aspects such as colors and fonts
of SAS output.

STYLE= cannot be used with the LISTING destination.
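
As a minimal sketch of the general form above, the following step writes an HTML report
that uses the SASWEB style definition. The filename myreport.html and the choice of SASWEB
are illustrative assumptions; any supplied style definition name can be substituted.

```sas
/* Sketch only: the filename and the style choice are assumptions. */
ods html file='myreport.html' style=sasweb;

proc freq data=orion.sales;
   tables Country;
run;

ods html close;
```

If STYLE= is omitted, the destination uses its default style definition.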

143

SAS Supplied Style Definitions
Analysis           Astronomy        Banker           BarrettsBlue
Beige              blockPrint       Brick            Brown
Curve              D3d              Default          Education
EGDefault          Electronics      fancyPrinter     Festival
FestivalPrinter    Gears            Journal          Magnify
Meadow             MeadowPrinter    Minimal          Money
NoFontDefault      Normal           NormalPrinter    Printer
Rsvp               Rtf              sansPrinter      sasdocPrinter
Sasweb             Science          Seaside          SeasidePrinter
serifPrinter       Sketch           Statdoc          Statistical
Theme              Torn             Watercolor
144
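
The set of style definitions can differ between installations and releases, so it can help
to list the ones that are available before choosing a value for STYLE=. The following is a
minimal sketch that uses the LIST statement in PROC TEMPLATE; it assumes that the supplied
style definitions are stored in the default template store, and it writes the list to the
open ODS destination.

```sas
/* List the style definitions that this SAS session can find. */
proc template;
   list styles;
run;
```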
11.5 Directing Output to External Files 11-69

SAS Supplied Style Definitions


The following style definitions are new to SAS 9.2:
grayscalePrinter Harvest HighContrast

Journal2 Journal3 Listing

monochromePrinter Ocean Solutions

145

Some of the style definitions changed between SAS 9.1.3 and SAS 9.2.
• For example, the SAS 9.1.3 Analysis style definition produces different results than
the SAS 9.2 Analysis style definition. The SAS 9.1.3 Analysis style definition is similar
to the SAS 9.2 Ocean style definition.
• The SAS 9.1.3 Statistical style definition produces different results than
the SAS 9.2 Statistical style definition. The SAS 9.1.3 Statistical style definition is similar
to the SAS 9.2 Harvest style definition.

HTML Examples
STYLE=DEFAULT

STYLE=SASWEB

p111d13
146

By default, the HTML destination uses the DEFAULT style definition.


11-70 Chapter 11 Enhancing Reports

PDF Examples
STYLE=PRINTER

STYLE=JOURNAL

147
p111d13

By default, the PDF destination uses the PRINTER style definition.

RTF Examples
STYLE=RTF

STYLE=OCEAN

p111d13
148

By default, the RTF destination uses the RTF style definition.


11.5 Directing Output to External Files 11-71

Setup for the Poll


Retrieve p111a03.
Add a STYLE= option to the first ODS statement, and
select one of the following style definitions:
HighContrast Minimal Listing Journal3

Submit the program and review the results.
Modify the STYLE= option to use one of the
following style definitions:
Education Harvest Rsvp Solutions
Submit the program and review the results.

150

11.12 Poll
Did you notice a difference in the presentation aspects
between the two style definitions?
Yes
No

151
11-72 Chapter 11 Enhancing Reports

Destinations Used with Excel


The following destinations create files that can be opened
in Excel.

[Diagram: procedure output flows to the CSVALL, MSOFFICE2K, and EXCELXP destinations.]

154

Destinations Used with Excel
Destination    Type of File                   Viewed In

CSVALL         Comma-Separated Value          Editor or Microsoft Excel
MSOFFICE2K     Hypertext Markup Language      Web Browser or Microsoft Word or Microsoft Excel
EXCELXP        Extensible Markup Language     Microsoft Excel

155
11.5 Directing Output to External Files 11-73

CSVALL Destination
ods csvall file='myexcel.csv';

proc freq data=orion.sales;
   tables Country;
run;

proc means data=orion.sales;
   var Salary;
run;

ods csvall close;

156
p111d14

CSVALL Destination
CSVALL does not include any style information.

157
11-74 Chapter 11 Enhancing Reports

MSOFFICE2K Destination
ods msoffice2k file='myexcel.html';

proc freq data=orion.sales;
   tables Country;
run;

proc means data=orion.sales;
   var Salary;
run;

ods msoffice2k close;

158
p111d14

Microsoft Excel 97 or greater is needed to open a MSOFFICE2K file.


MSOFFICE2K Destination
MSOFFICE2K keeps the style information including
spanning headers.

159
11.5 Directing Output to External Files 11-75

EXCELXP Destination
ods tagsets.excelxp file='myexcel.xml';

proc freq data=orion.sales;
   tables Country;
run;

proc means data=orion.sales;
   var Salary;
run;

ods tagsets.excelxp close;

160
p111d14

Microsoft Excel 2002 or greater is needed to open an EXCELXP file.


EXCELXP Destination
EXCELXP keeps the style information and each
procedure is a separate sheet.

161
11-76 Chapter 11 Enhancing Reports

Keep in Mind
The file you are creating is not an Excel file.
CSVALL
MSOFFICE2K
EXCELXP

162

11.5 Directing Output to External Files 11-77

Creating Files That Open in Excel


p111d15
Submit the following program and view the results in Microsoft Excel.
ods listing close;
ods csvall file='myexcel.csv';
ods msoffice2k file='myexcel.html';
ods tagsets.excelxp file='myexcel.xml';

proc freq data=orion.sales;
   tables Country;
   title 'Report 1';
run;

proc means data=orion.sales;
   var Salary;
   title 'Report 2';
run;

proc print data=orion.sales;
   var First_Name Last_Name
       Job_Title Country Salary;
   where Salary > 75000;
   title 'Report 3';
run;

ods _all_ close;
ods listing;

For z/OS (OS/390), the following ODS statements are used:

ods csvall file='.workshop.report(mycsv)' rs=none;
ods msoffice2k file='.workshop.report(myhtml)' rs=none;
ods tagsets.excelxp file='.workshop.report(myxml)' rs=none;
11-78 Chapter 11 Enhancing Reports

CSV File in Microsoft Excel 2002

HTML File in Microsoft Excel 2002

11.5 Directing Output to External Files 11-79

XML File in Microsoft Excel 2002

11-80 Chapter 11 Enhancing Reports

Using Options with the EXCELXP Destination (Self-Study)


p111d16
Submit the following program and view the EXCELXP documentation in the SAS log and the XML files
in Microsoft Excel.

********** Documentation Option **********;
ods listing close;
ods tagsets.excelxp file='myexcel1.xml'
    style=sasweb
    options(doc='help');

proc freq data=orion.sales;
   tables Country;
   title 'Report 1';
run;

proc means data=orion.sales;
   var Salary;
   title 'Report 2';
run;

ods tagsets.excelxp close;
ods listing;

********** Other Options **********;
ods listing close;
ods tagsets.excelxp file='myexcel2.xml'
    style=sasweb
    options(embedded_titles='yes'
            sheet_Name='First Report');

proc freq data=orion.sales;
   tables Country;
   title 'Report 1';
run;

ods tagsets.excelxp options(sheet_Name='Second Report');

proc means data=orion.sales;
   var Salary;
   title 'Report 2';
run;

ods tagsets.excelxp close;
ods listing;
11.5 Directing Output to External Files 11-81

For z/OS (OS/390), the following ODS statements are used:


ods tagsets.excelxp file='.workshop.report(myxml1)' rs=none
    style=sasweb
    options(doc='help');

ods tagsets.excelxp file='.workshop.report(myxml2)' rs=none
    style=sasweb
    options(embedded_titles='yes'
            sheet_Name='First Report');

Partial SAS Log

11-82 Chapter 11 Enhancing Reports

XML Files in Microsoft Excel 2002

11.5 Directing Output to External Files 11-83

Exercises

Level 1

13. Directing Output to the PDF and RTF Destinations


a. Retrieve the starter program p111e13.

b. Create the PDF version of the PROC PRINT report by adding ODS statements.
Use the following naming convention when creating the PDF file:

Windows or UNIX    p111s13p.pdf

z/OS (OS/390)      .workshop.report(p111s13p)

c. Submit the program to produce the following report in PDF form as displayed in Adobe Reader:

Partial PROC PRINT Output

Compare this PDF output to the equivalent report that appears in the Output window.
11-84 Chapter 11 Enhancing Reports

d. Modify your ODS statements to create the RTF version of the PROC PRINT report.
Use the following naming convention when creating the RTF file:

Windows or UNIX p111s13r.rtf

z/OS (OS/390) .workshop.report(p111s13r)

e. Suppress the default Output window listing before generating the RTF report, and then
re-establish the Output window as the report destination after the RTF report is complete.
f. Submit the program to produce the following report in RTF form as displayed in Microsoft Word:

Partial PROC PRINT Output

What happens if the RUN statement is moved to the end of the program?

g. Add the STYLE= option to the ODS RTF statement to use a style definition such
as Curve, Gears, Money, or Torn.
h. Submit the program and view the report in RTF form in Microsoft Word.
11.5 Directing Output to External Files 11-85

Level 2

14. Creating ODS Output Compatible with Microsoft Excel


a. Retrieve the starter program p111e14.
b. Add ODS statements to send the report to a file that can be viewed in Microsoft Excel.
Choose the ODS destination (and use the associated file extension) based on whether
you want
1) style information stored in the report output
2) the reports in a single worksheet or multiple worksheets.
If selecting a destination that supports style information, specify the Listing style definition.

Use the following naming convention when creating the file. For Windows or UNIX, choose
an appropriate extension for the file depending on the type of file that is created.

Windows or UNIX    p111s14.xxx

z/OS (OS/390)      .workshop.report(p111s14)

c. Submit the program to produce the output file.
d. Open the file with Microsoft Excel. The report should resemble the following results. Your
output will look different depending on the ODS destination you choose.

Level 3

15. Adding HTML-Specific Features to ODS Output
a. Retrieve the starter program p111e15.
b. Create the HTML version of the PROC PRINT report by adding ODS statements.
Use the following naming convention when creating the HTML file:

Windows or UNIX p111s15.html

z/OS (OS/390) .workshop.report(p111s15)


11-86 Chapter 11 Enhancing Reports

c. Customize the title so that it becomes a clickable hyperlink when displayed in a Web browser.
The hyperlink should point to the URL http://www.sas.com (the SAS home page).

An option must be added to the TITLE statement to make it an active hyperlink.
Documentation about the TITLE statement can be found in the SAS Help and
Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
SAS 9.2 Language Reference: Dictionary ⇒ Dictionary of Language Elements ⇒
Statements ⇒ TITLE Statement).
d. Submit the program to produce the following report in HTML form as displayed in Internet
Explorer:

Partial PROC PRINT Output

16. Implementing Cascading Style Sheets with ODS Output

a. Retrieve the starter program p111e16.
b. Modify the ODS HTML statement so that the output generated by the program uses a cascading
style sheet. The CSS definition applies a yellow background to data cells in the ODS output.

Use the following cascading style sheet:

Windows or UNIX    p111e16c.css

z/OS (OS/390)      .workshop.report(p111e16c)

The syntax required to reference an existing CSS file can be found in the
SAS Help and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
SAS 9.2 Output Delivery System User's Guide ⇒ ODS Language Statements ⇒
Dictionary of ODS Language Statements ⇒ ODS HTML Statement). Follow the
documentation links for the STYLESHEET= option and its URL= suboption.
11.5 Directing Output to External Files 11-87

c. Submit the program to produce the following HTML report as displayed in Internet Explorer:
PROC PRINT Output

11-88 Chapter 11 Enhancing Reports

11.6 Chapter Review

Chapter Review
1. What are some examples of global statements that
enhance reports?

2. What is the maximum number of title or footnote lines?

3. How can you force a line break in a column header in
PROC PRINT?

4. What is the difference between using a FORMAT
statement in a PROC step versus a DATA step?

167
continued...

Chapter Review
5. How can you create a descriptive label for values of
a variable such as a department name instead of a
department code?

6. What are some examples of ODS destinations?

168
11.7 Solutions 11-89

11.7 Solutions

Solutions to Exercises
1. Specifying Titles, Footnotes, and System Options
a. Retrieve the starter program.
b. Use the OPTIONS statement.
options nonumber nodate pagesize=18;

proc means data=orion.order_fact;
   var Total_Retail_Price;
run;
options pagesize=52;
c. Specify a title.
options nonumber nodate pagesize=18;
title 'Orion Star Sales Report';
proc means data=orion.order_fact;
   var Total_Retail_Price;
run;
options pagesize=52;
d. Specify a footnote.
options nonumber nodate pagesize=18;
title 'Orion Star Sales Report';
footnote 'Report by SAS Programming Student';
proc means data=orion.order_fact;
   var Total_Retail_Price;
run;
options pagesize=52;
e. Cancel the footnote.
options nonumber nodate pagesize=18;
title 'Orion Star Sales Report';
footnote 'Report by SAS Programming Student';
proc means data=orion.order_fact;
var Total_Retail_Price;
run;
options pagesize=52;
footnote;
f. Submit the program.
11-90 Chapter 11 Enhancing Reports

2. Specifying Multiple Titles and System Options


a. Retrieve the starter program.
b. Limit the number of lines per page.
options pagesize=18;
proc means data=orion.order_fact;
where Order_Type=2;
var Total_Retail_Price;
run;

proc means data=orion.order_fact;
   where Order_Type=3;
   var Total_Retail_Price;
run;
options pagesize=52;
c. Request page numbers starting at 1.
options pagesize=18 number pageno=1;
proc means data=orion.order_fact;
   where Order_Type=2;
   var Total_Retail_Price;
run;

options pageno=1;
proc means data=orion.order_fact;
   where Order_Type=3;
   var Total_Retail_Price;
run;
options pagesize=52;
d. Request the current date and time.
options pagesize=18 number pageno=1 date dtreset;
e. Specify a title in both reports.
options pagesize=18 number pageno=1 date dtreset;
title1 'Orion Star Sales Analysis';
f. Specify a secondary title in the first report.
options pagesize=18 number pageno=1 date dtreset;
title1 'Orion Star Sales Analysis';
proc means data=orion.order_fact;
where Order_Type=2;
var Total_Retail_Price;
title3 'Catalog Sales Only';
run;
11.7 Solutions 11-91

g. Specify a footnote in the first report.


options pagesize=18 number pageno=1 date dtreset;
title1 'Orion Star Sales Analysis';
proc means data=orion.order_fact;
where Order_Type=2;
var Total_Retail_Price;
title3 'Catalog Sales Only';
footnote "Based on the previous day's posted data";
run;
h. Specify a secondary title in the second report.

ia l
title1 'Orion Star Sales Analysis';

t
proc means data=orion.order_fact; r
options pagesize=18 number pageno=1 date dtreset;

e
where Order_Type=2;
var Total_Retail_Price;

M a
n
title3 'Catalog Sales Only';

y
footnote "Based on the previous day's posted data";
run;

a r t i o
i e t
options pageno=1;

b u
proc means data=orion.order_fact;

p r
where Order_Type=3;

t r i
var Total_Retail_Price;
S
o dis SA
title3 'Internet Sales Only';

r
run;
options pagesize=52;

S P o r o f
i. Cancel all footnotes for the second report.

options pagesize=18 number pageno=1 date dtreset;
title1 'Orion Star Sales Analysis';
proc means data=orion.order_fact;
where Order_Type=2;
var Total_Retail_Price;
title3 'Catalog Sales Only';
footnote "Based on the previous day's posted data";
run;

options pageno=1;
proc means data=orion.order_fact;
where Order_Type=3;
var Total_Retail_Price;
title3 'Internet Sales Only';
footnote;
run;
options pagesize=52;
j. Submit the program.

3. Inserting Dates and Times into Titles


a. Use the OPTIONS procedure.
proc options option=date;
run;
options nodate;
b. Retrieve the starter program.

c. Add a title.
%let currentdate=%sysfunc(today(),weekdate.);
%let currenttime=%sysfunc(time(),timeampm8.);
proc means data=orion.order_fact;
title "Sales Report as of &currenttime on &currentdate";
var Total_Retail_Price;
run;
d. Submit the program.

4. Applying Labels and Formats in Reports
a. Retrieve the starter program.
b. Modify the column heading for each variable.
proc print data=orion.employee_payroll label;
where Dependents=3;
title 'Employees with 3 Dependents';
var Employee_ID Salary
Birth_Date Employee_Hire_Date Employee_Term_Date;
label Employee_ID='Employee Number'
Salary='Annual Salary'
Birth_Date='Birth Date'
Employee_Hire_Date='Hire Date'
Employee_Term_Date='Termination Date';
run;
c. Display all dates in the form ddMONyyyy.
proc print data=orion.employee_payroll label;
where Dependents=3;
title 'Employees with 3 Dependents';
var Employee_ID Salary
Birth_Date Employee_Hire_Date Employee_Term_Date;
label Employee_ID='Employee Number'
Salary='Annual Salary'
Birth_Date='Birth Date'
Employee_Hire_Date='Hire Date'
Employee_Term_Date='Termination Date';
format Birth_Date Employee_Hire_Date Employee_Term_Date date11.;
run;

d. Display each salary with dollar signs, commas, and two decimal places.
proc print data=orion.employee_payroll label;
where Dependents=3;
title 'Employees with 3 Dependents';
var Employee_ID Salary
Birth_Date Employee_Hire_Date Employee_Term_Date;
label Employee_ID='Employee Number'
Salary='Annual Salary'
Birth_Date='Birth Date'
Employee_Hire_Date='Hire Date'
Employee_Term_Date='Termination Date';
format Birth_Date Employee_Hire_Date Employee_Term_Date date11.
Salary dollar11.2;
run;
e. Submit the program.

5. Overriding Existing Labels and Formats
a. Retrieve the starter program.
b. Display only the year portion of the birth dates.
proc print data=orion.customer;
where Country='TR';
title 'Customers from Turkey';
var Customer_ID Customer_FirstName Customer_LastName
Birth_Date;
format Birth_Date year4.;
run;
c. Display only the first initial of each customer’s first name.
proc print data=orion.customer;
where Country='TR';
title 'Customers from Turkey';
var Customer_ID Customer_FirstName Customer_LastName
Birth_Date;
format Birth_Date year4.
Customer_FirstName $1.;
run;
d. Show the customer’s ID with exactly six digits.
proc print data=orion.customer;
where Country='TR';
title 'Customers from Turkey';
var Customer_ID Customer_FirstName Customer_LastName
Birth_Date;
format Birth_Date year4.
Customer_FirstName $1.
Customer_ID z6.;
run;

e. Modify the column heading for each variable.


proc print data=orion.customer split='/';
where Country='TR';
title 'Customers from Turkey';
var Customer_ID Customer_FirstName Customer_LastName
Birth_Date;
label Customer_ID='Customer ID'
Customer_FirstName='First Initial'
Customer_LastName='Last/Name'
Birth_Date='Birth Year';
format Birth_Date year4.
Customer_FirstName $1.
Customer_ID z6.;
run;
f. Submit the program.

6. Applying Permanent Labels and Formats
a. Retrieve the starter program.
b. Add permanent variable labels and formats.
data otherstatus;
set orion.employee_payroll;
keep Employee_ID Employee_Hire_Date;
if Marital_Status='O';
label Employee_ID='Employee Number'
Employee_Hire_Date='Hired';
format Employee_Hire_Date yymmddp10.;
run;

title 'Employees who are listed with Marital Status=O';
proc print data=otherstatus label;
run;

proc contents data=otherstatus;
run;

proc freq data=otherstatus;
tables Employee_Hire_Date;
run;
c. Override the permanent attributes within the PROC FREQ step.
proc freq data=otherstatus;
tables Employee_Hire_Date;
label Employee_Hire_Date='Quarter Hired';
format Employee_Hire_Date yyq6.;
run;
d. Submit the program.

7. Creating User-Defined Formats


a. Retrieve the starter program.
b. Create a character format.
data Q1Birthdays;
set orion.employee_payroll;
BirthMonth=month(Birth_Date);
if BirthMonth le 3;
run;

proc format;
value $gender
'F'='Female'
'M'='Male';
run;
c. Create a numeric format.
data Q1Birthdays;
set orion.employee_payroll;
BirthMonth=month(Birth_Date);
if BirthMonth le 3;
run;

proc format;
value $gender
'F'='Female'
'M'='Male';
value moname
1='January'
2='February'
3='March';
run;
d. In the PROC FREQ step, apply these two user-defined formats.

proc freq data=Q1Birthdays;
tables BirthMonth Employee_Gender;
format Employee_Gender $gender.
BirthMonth moname.;
title 'Employees with Birthdays in Q1';
run;
e. Submit the program.

8. Defining Ranges in User-Defined Formats


a. Retrieve the starter program.
b. Create a character format.
proc format;
value $gender
'F'='Female'
'M'='Male'
other='Invalid code';
run;
c. Create a numeric format.
proc format;
value $gender
'F'='Female'
'M'='Male'
other='Invalid code';
value salrange
.='Missing salary'
20000-<100000='Below $100,000'
100000-500000='$100,000 or more'
other='Invalid salary';
run;
d. In the PROC PRINT step, apply these two user-defined formats.
proc print data=orion.nonsales;
var Employee_ID Job_Title Salary Gender;
format Salary salrange. Gender $gender.;
title1 'Distribution of Salary and Gender Values';
title2 'for Non-Sales Employees';
run;
e. Submit the program.
9. Creating a Nested Format Definition

a. Retrieve the starter program.
b. Create a user-defined format.
proc format;
value dategrp
.='None'
low-'31dec2006'd=[year4.]
'01jan2007'd-high=[monyy7.]
;
run;

c. Apply the new format.


proc freq data=orion.employee_payroll;
tables Employee_Term_Date / missing;
format Employee_Term_Date dategrp.;
title 'Employee Status Report';
run;
d. Submit the program.
10. Subsetting and Grouping Observations
a. Retrieve the starter program.

b. Add a PROC SORT step.
proc sort data=orion.order_fact out=order_sorted;
by order_type;
run;
c. Restrict the PROC MEANS analysis.
proc means data=order_sorted;
where order_type in (2,3);
var Total_Retail_Price;
title 'Orion Star Sales Summary';
run;
d. Modify the PROC MEANS step.
proc means data=order_sorted;
by order_type;
var Total_Retail_Price;
where order_type in (2,3);
title 'Orion Star Sales Summary';
run;
e. Submit the program.
11. Subsetting and Grouping by Multiple Variables

a. Retrieve the starter program.
b. Sort the data set.
proc sort data=orion.order_fact out=order_sorted;
by Order_Type descending Order_Date;
run;
c. Divide the PROC PRINT report.
proc print data=order_sorted;
by Order_Type;
var Order_ID Order_Date Delivery_Date;
title1 'Orion Star Sales Details';
run;

d. Limit the observations in the PROC PRINT report.


proc print data=order_sorted;
by Order_Type;
var Order_ID Order_Date Delivery_Date;
where Delivery_Date - Order_Date = 2
and Order_Date between '01jan2005'd and '30apr2005'd;
title1 'Orion Star Sales Details';
run;
e. Add a second title.
proc print data=order_sorted;
by Order_Type;
var Order_ID Order_Date Delivery_Date;
where Delivery_Date - Order_Date = 2
and Order_Date between '01jan2005'd and '30apr2005'd;
title1 'Orion Star Sales Details';
title2 '2-Day Deliveries from January to April 2005';
run;
f. Submit the program.

12. Adding Subsetting Conditions
a. Retrieve the starter program.
b. Reorder the variables in the PROC PRINT step.
proc format;
value $country
"CA"="Canada"
"DK"="Denmark"
"ES"="Spain"
"GB"="Great Britain"
"NL"="Netherlands"
"SE"="Sweden"
"US"="United States";
run;

proc sort data=orion.shoe_vendors out=vendors_by_country;
by Supplier_Country Supplier_Name;
run;

proc print data=vendors_by_country;
where Product_Line=21;
by Supplier_Name Supplier_ID Supplier_Country notsorted;
var Product_ID Product_Name;
title1 'Orion Star Products: Children Sports';
run;

c. Augment the existing WHERE criteria.


proc format;
value $country
"CA"="Canada"
"DK"="Denmark"
"ES"="Spain"
"GB"="Great Britain"
"NL"="Netherlands"
"SE"="Sweden"
"US"="United States";
run;

proc sort data=orion.shoe_vendors out=vendors_by_country;
by Supplier_Country Supplier_Name;
run;

proc print data=vendors_by_country;
where Product_Line=21;
where same
and Product_Name ? 'Street' or Product_Name ? 'Running';
by Supplier_Name Supplier_ID Supplier_Country notsorted;
var Product_ID Product_Name;
title1 'Orion Star Products: Children Sports';
run;
d. Submit the program.

13. Directing Output to the PDF and RTF Destinations
a. Retrieve the starter program.
b. Create the PDF version of the PROC PRINT report.
ods pdf file='p111s13p.pdf';
proc print data=orion.customer;
title 'Customer Information';
run;
ods pdf close;
For z/OS (OS/390), the following ODS PDF statement is used:
ods pdf file='.workshop.report(p111s13p)';
c. Submit the program.
d. Modify your ODS statements to create the RTF version of the PROC PRINT report.
ods rtf file='p111s13r.rtf';
proc print data=orion.customer;
title 'Customer Information';
run;
ods rtf close;
For z/OS (OS/390), the following ODS RTF statement is used:
ods rtf file='.workshop.report(p111s13r)' rs=none;

e. Suppress the default Output window listing.


ods listing close;
ods rtf file='p111s13r.rtf';
proc print data=orion.customer;
title 'Customer Information';
run;
ods rtf close;
ods listing;
f. Submit the program.

What happens if the RUN statement is moved to the end of the program?
The file is empty.
g. Add the STYLE= option.
ods listing close;
ods rtf file='p111s13r.rtf' style=curve;
proc print data=orion.customer;
title 'Customer Information';
run;
ods rtf close;
ods listing;
h. Submit the program.

14. Creating ODS Output Compatible with Microsoft Excel

a. Retrieve the starter program.
b. Add ODS statements to send the report to a file that can be viewed in Microsoft Excel.
ods csvall file='p111s14.csv';
proc print data=orion.customer_type;
title 'Customer Type Definitions';
run;

proc print data=orion.country;
title 'Country Definitions';
run;
ods csvall close;
For z/OS (OS/390), the following ODS CSVALL statement is used:
ods csvall file='.workshop.report(p111s14)' rs=none;

ods msoffice2k file='p111s14.html' style=Listing;
proc print data=orion.customer_type;
title 'Customer Type Definitions';
run;

proc print data=orion.country;
title 'Country Definitions';
run;
ods msoffice2k close;
For z/OS (OS/390), the following ODS MSOFFICE2K statement is used:
ods msoffice2k file='.workshop.report(p111s14)' rs=none;

ods tagsets.excelxp file='p111s14.xml' style=Listing;
proc print data=orion.customer_type;
title 'Customer Type Definitions';
run;

proc print data=orion.country;
title 'Country Definitions';
run;
ods tagsets.excelxp close;
For z/OS (OS/390), the following ODS TAGSETS.EXCELXP statement is used:
ods tagsets.excelxp file='.workshop.report(p111s14)' rs=none;
c. Submit the program.
d. Open the file with Microsoft Excel.

15. Adding HTML-Specific Features to ODS Output
a. Retrieve the starter program.

b. Create the HTML version of the PROC PRINT report.
ods html file='p111s15.html';
proc print data=orion.customer;
title 'Customer Information';
run;
ods html close;
For z/OS (OS/390), the following ODS HTML statement is used:
ods html file='.workshop.report(p111s15)' rs=none;

c. Customize the title so that it becomes a clickable hyperlink.


ods html file='p111s15.html';
proc print data=orion.customer;
title link='http://www.sas.com' 'Customer Information';
run;
ods html close;
d. Submit the program.

16. Implementing Cascading Style Sheets with ODS Output
a. Retrieve the starter program.
b. Modify the ODS HTML statement.
ods html file='p111s16.html' stylesheet=(url='p111e16c.css');
proc print data=orion.customer_type;
title 'Customer Type Definitions';
run;
ods html close;
For z/OS (OS/390), the following ODS HTML statement is used:
ods html file='.workshop.report(p111s16)' rs=none
stylesheet=(url='.workshop.report(p111e16c)');
c. Submit the program.

Solutions to Student Activities (Polls/Quizzes)

11.01 Poll – Correct Answer


Did the date and/or time change?
Yes
No

The DTRESET option uses the current date and time
rather than the SAS invocation date and time.

options date number pageno=1 ls=100 dtreset;

p111a01s

11.02 Quiz – Correct Answer

Which footnote(s) appear in the second procedure output?

a. Non Sales Employees
b. Orion Star
   Non Sales Employees
c. Non Sales Employees
   Confidential
d. Orion Star
   Non Sales Employees
   Confidential

footnote1 'Orion Star';
proc print data=orion.sales;
footnote2 'Sales Employees';
footnote3 'Confidential';
run;
proc print data=orion.nonsales;
footnote2 'Non Sales Employees';
run;

11.03 Quiz – Correct Answer


Which statement is true concerning the
PROC PRINT output for Bonus?

a. Annual Bonus will be the label.


b. Mid-Year Bonus will be the label.

Temporary labels override permanent labels.
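
The rule can be tried out with a small sketch. It assumes the SASHELP.CLASS sample table is available; the data set name work.class_labeled and both label texts are made up for illustration and are not part of the course data.

```sas
/* Sketch only: work.class_labeled and the label texts are hypothetical. */
data work.class_labeled;
   set sashelp.class;
   label Height='Height (permanent label)';   /* stored in the descriptor portion */
run;

proc print data=work.class_labeled label;
   var Name Height;
   label Height='Height (temporary label)';   /* applies to this step only */
run;

proc contents data=work.class_labeled;        /* still lists the permanent label */
run;
```

The PROC PRINT report should show the temporary label, while PROC CONTENTS continues to report the permanent one.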

p111d05

11.04 Quiz – Correct Answer
Which displayed value is incorrect for the given format?

Format       Stored Value    Displayed Value
$3.          Wednesday       Wed
6.1          1234.345        1234.3
COMMAX5.     1234.345        1.234
DOLLAR9.2    1234.345        $1,234.35
DDMMYY8.     0               01/01/1960
DATE9.       0               01JAN1960
YEAR4.       0               1960

DDMMYY8. produces 01/01/60.
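
A short DATA _NULL_ step is one way to check this in the SAS log. This is a sketch, not part of the course programs; the variable name d is arbitrary.

```sas
/* Sketch: compare the 8- and 10-column widths of the DDMMYY format. */
data _null_;
   d=0;                /* SAS date value 0 represents 01JAN1960 */
   put d ddmmyy8.;     /* eight columns: two-digit year, 01/01/60 */
   put d ddmmyy10.;    /* ten columns: four-digit year, 01/01/1960 */
run;
```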



11.05 Multiple Answer Poll – Correct Answer


Which user-defined format names are invalid?
a. $stfmt
b. $3levels
c. _4years

d. salranges
e. dollar

Character formats must have a dollar sign as the
first character and a letter or underscore as the
second character.
User-defined formats cannot have the name of a
SAS-supplied format.

11.06 Quiz – Correct Answer
If you have a value of 99999.87, how will it be displayed
if the TIERS format is applied to the value?
a. Tier 2
b. Tier 3
c. a missing value
d. none of the above

proc format;
value tiers
20000-49999 = 'Tier 1'
50000-99999 = 'Tier 2'
100000-250000 = 'Tier 3';
run;

11.07 Quiz – Correct Answer


If you have a value of 100000, how will it be displayed
if the TIERS format is applied to the value?
a. Tier 2
b. Tier 3
c. 100000
d. a missing value

proc format;
value tiers
20000-<50000 = 'Tier 1'
50000- 100000 = 'Tier 2'
100000<-250000 = 'Tier 3';
run;

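
The two TIERS definitions in quizzes 11.06 and 11.07 differ only in how the range boundaries are written. A value that falls into none of the ranges of a user-defined numeric format is printed as a plain number rather than a label, and the exclusion operators (-< and <-) close such gaps. The sketch below puts both definitions side by side; the format names tiers_gap and tiers_full are invented here so that both can exist at once.

```sas
/* Sketch: tiers_gap and tiers_full are hypothetical names for the two quiz formats. */
proc format;
   value tiers_gap                    /* leaves gaps, for example 99999 to 100000 */
      20000-49999    = 'Tier 1'
      50000-99999    = 'Tier 2'
      100000-250000  = 'Tier 3';
   value tiers_full                   /* contiguous ranges, no gaps */
      20000-<50000   = 'Tier 1'
      50000-100000   = 'Tier 2'
      100000<-250000 = 'Tier 3';
run;

data _null_;
   x=99999.87;
   y=100000;
   put x tiers_gap12.;    /* x is in no range, so no label applies */
   put x tiers_full12.;   /* x falls in 50000-100000 */
   put y tiers_full12.;   /* 100000 belongs to the range that includes its endpoint */
run;
```

With these definitions the first PUT statement should write the number itself, and the other two should write Tier 2.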
11.08 Quiz – Correct Answer
Which of the following WHERE statements have
invalid syntax?
a. where Salary ne .;
b. where Hire_Date >= '01APR2008'd;
c. where Country in (AU US);
d. where Salary + Bonus <= 10000;
e. where Gender ne 'M' Salary >= 50000;
f. where Name like '%N';


11.09 Multiple Choice Poll – Correct Answer


Which statement is true concerning the multiple
WHERE statements?
a. All the WHERE statements are used.
b. None of the WHERE statements is used.
c. The first WHERE statement is used.
d. The last WHERE statement is used.

1000 proc freq data=orion.sales;
1001 tables Gender;
1002 where Salary > 75000;
1003 where Country = 'US';
NOTE: Where clause has been replaced.
1004 run;

NOTE: There were 102 observations read from the data set ORION.SALES.
      WHERE Country='US';
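
When both conditions are meant to apply, the second WHERE statement can augment the first instead of replacing it, as the solution to exercise 12 does with WHERE SAME AND. The sketch below assumes the SASHELP.CLASS sample table rather than the course data.

```sas
/* Sketch on SASHELP.CLASS: the second WHERE adds to the first. */
proc freq data=sashelp.class;
   tables Sex;
   where Age > 12;
   where same and Height > 60;   /* keeps Age > 12 and adds a second condition */
run;
```

The log should then report that the WHERE clause was augmented and show both conditions joined by AND.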


11.10 Quiz – Correct Answer

Which is a valid BY statement for the PROC FREQ step?
a. by Country Gender;
b. by Gender Last_Name;
c. by Country;
d. by Gender;

proc sort data=orion.sales out=work.sort;
by Country descending Gender Last_Name;
run;

proc freq data=work.sort;
tables Gender;
run;

11.11 Quiz – Correct Answer


What is the problem with this program?

ods pdf file='myreport.pdf';

proc print data=orion.sales;


run;

ods pdf close;

11.12 Poll – Correct Answer
Did you notice a difference in the presentation aspects
between the two style definitions?
Yes
No

The first group of style definitions did not use color.
The second group of style definitions did use color.


Solutions to Chapter Review

Chapter Review Answers


1. What are some examples of global statements that
enhance reports?
OPTIONS
TITLE
FOOTNOTE
ODS

2. What is the maximum number of title or footnote lines?
10

3. How can you force a line break in a column header in
PROC PRINT?
Use the SPLIT= option in the PROC PRINT
statement in combination with the LABEL
statement or a permanent label.

4. What is the difference between using a FORMAT
statement in a PROC step versus a DATA step?
A FORMAT statement in a PROC step defines a
temporary format. A FORMAT statement in a DATA
step defines a permanent format.
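
The difference is easy to see on a small copy of a sample table. This sketch assumes SASHELP.CARS is available; the data set name work.cars_fmt and the chosen formats are illustrative only.

```sas
/* Sketch: work.cars_fmt is a hypothetical scratch data set. */
data work.cars_fmt;
   set sashelp.cars(keep=Make Model MSRP obs=5);
   format MSRP dollar12.2;          /* stored with the data set: permanent */
run;

proc print data=work.cars_fmt;
   format MSRP comma10.;            /* used for this report only: temporary */
run;

proc contents data=work.cars_fmt;   /* MSRP should still be listed with DOLLAR12.2 */
run;
```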



5. How can you create a descriptive label for values of
a variable such as a department name instead of a
department code?
Use a VALUE statement in PROC FORMAT to
create user-defined formats.

6. What are some examples of ODS destinations?
LISTING
HTML
PDF
RTF
CSVALL
MSOFFICE2K
EXCELXP

Chapter 12 Producing Summary Reports

12.1 Using the FREQ Procedure ............................................ 12-3
     Exercises ........................................................... 12-21
12.2 Using the MEANS Procedure ........................................... 12-27
     Exercises ........................................................... 12-42
12.3 Using the TABULATE Procedure (Self-Study) ........................... 12-46
     Exercises ........................................................... 12-60
12.4 Chapter Review ...................................................... 12-65
12.5 Solutions ........................................................... 12-66
     Solutions to Exercises .............................................. 12-66
     Solutions to Student Activities (Polls/Quizzes) ..................... 12-75
     Solutions to Chapter Review ......................................... 12-78

12.1 Using the FREQ Procedure

Objectives
Produce one-way and two-way frequency tables with
the FREQ procedure.
Enhance frequency tables with options.
Produce output data sets by using the OUT= option
in the TABLES and OUTPUT statements. (Self-Study)

The FREQ Procedure

The FREQ procedure can do the following:
produce one-way to n-way frequency and
crosstabulation (contingency) tables
compute chi-square tests for one-way to n-way tables
and measures of association and agreement for
contingency tables
automatically display the output in a report and save
the output in a SAS data set

General form of the FREQ procedure:

PROC FREQ DATA=SAS-data-set <option(s)>;
TABLES variable(s) </ option(s)>;
RUN;

The FREQ Procedure


A FREQ procedure with no TABLES statement generates
one-way frequency tables for all data set variables.
proc freq data=orion.sales;
run;

This PROC FREQ step creates a frequency table for the
following nine variables:

Employee_ID    Job_Title    Birth_Date
First_Name     Country      Hire_Date
Last_Name      Gender       Salary

p112d01

By default, PROC FREQ creates a report on every variable in the data set. For example, the
Employee_ID report displays every unique value of Employee_ID, counts how many observations
have each value, and provides percentages and cumulative statistics. This is not a useful report because
each employee has his or her own unique employee ID.

You do not typically create frequency reports for variables with a large number of distinct values,
such as Employee_ID, or for analysis variables, such as Salary. You usually create frequency reports
for categorical variables, such as Job_Title. You can group the values of a variable into categories
by creating and applying formats.
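
As a sketch of grouping with formats, the step below applies a user-defined format to a continuous variable so that PROC FREQ counts categories instead of individual values. It assumes the SASHELP.CARS sample table; the format name pricegrp and its cutoffs are invented for illustration.

```sas
/* Sketch: pricegrp and its ranges are hypothetical. */
proc format;
   value pricegrp
      low-<20000   = 'Under $20,000'
      20000-<40000 = '$20,000 to under $40,000'
      40000-high   = '$40,000 and above';
run;

proc freq data=sashelp.cars;
   tables MSRP;
   format MSRP pricegrp.;   /* PROC FREQ groups by the formatted values */
run;
```

Without the FORMAT statement, the same step would produce one row for nearly every distinct MSRP value.
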
The TABLES Statement
The TABLES statement specifies the frequency and
crosstabulation tables to produce.
proc freq data=orion.sales;
tables Gender Country;     /* one-way frequency tables */
run;

An asterisk between variables requests an n-way
crosstabulation table.
proc freq data=orion.sales;
tables Gender*Country;     /* two-way frequency table */
run;

p112d01

The TABLES Statement


A one-way frequency table produces frequencies,
cumulative frequencies, percentages, and cumulative
percentages.
proc freq data=orion.sales;
tables Gender Country;
run;
The FREQ Procedure

                                    Cumulative    Cumulative
Gender     Frequency     Percent     Frequency      Percent
------------------------------------------------------------
F                 68       41.21            68        41.21
M                 97       58.79           165       100.00

                                    Cumulative    Cumulative
Country    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
AU                63       38.18            63        38.18
US               102       61.82           165       100.00

The TABLES Statement

An n-way frequency table produces cell frequencies, cell
percentages, cell percentages of row frequencies, and
cell percentages of column frequencies, plus total
frequency and percent.
proc freq data=orion.sales;
tables Gender*Country;     /* rows*columns */
run;


The TABLES Statement


The FREQ Procedure

Table of Gender by Country

Gender     Country

Frequency|
Percent  |
Row Pct  |
Col Pct  |AU      |US      |  Total
---------+--------+--------+
F        |     27 |     41 |     68
         |  16.36 |  24.85 |  41.21
         |  39.71 |  60.29 |
         |  42.86 |  40.20 |
---------+--------+--------+
M        |     36 |     61 |     97
         |  21.82 |  36.97 |  58.79
         |  37.11 |  62.89 |
         |  57.14 |  59.80 |
---------+--------+--------+
Total          63      102      165
            38.18    61.82   100.00

12.01 Multiple Choice Poll
Which of the following statements cannot be added
to the PROC FREQ step to enhance the report?
a. FORMAT
b. SET
c. TITLE
d. WHERE


Additional SAS Statements


Additional statements can be added to enhance the report.
proc format;
value $ctryfmt 'AU'='Australia'
'US'='United States';
run;

options nodate pageno=1;

ods html file='p112d01.html';
proc freq data=orion.sales;
tables Gender*Country;
where Job_Title contains 'Rep';
format Country $ctryfmt.;
title 'Sales Rep Frequency Report';
run;
ods html close;

p112d01

Additional SAS Statements
[Figure: HTML Output]


Options to Suppress Display of Statistics


Options can be placed in the TABLES statement
after a forward slash to suppress the display of the
default statistics.

Option       Description
NOCUM        suppresses the display of cumulative frequency
             and cumulative percentage.
NOPERCENT    suppresses the display of percentage, cumulative
             percentage, and total percentage.
NOFREQ       suppresses the display of the cell frequency and
             total frequency.
NOROW        suppresses the display of the row percentage.
NOCOL        suppresses the display of the column percentage.
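
A minimal sketch of how these options are written, assuming the SASHELP.CLASS sample table in place of the course data:

```sas
/* Sketch on SASHELP.CLASS: options follow the slash in the TABLES statement. */
proc freq data=sashelp.class;
   tables Sex / nocum;                       /* one-way table without cumulative columns */
   tables Sex*Age / nopercent norow nocol;   /* crosstabulation showing cell frequencies only */
run;
```

Each TABLES statement carries its own options, so the two requests can be tuned independently within one step.
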

Options to Suppress Display of Statistics

                                    Cumulative    Cumulative
Country    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
AU                63       38.18            63        38.18
US               102       61.82           165       100.00

NOCUM suppresses the Cumulative Frequency and Cumulative Percent columns.
NOPERCENT suppresses the Percent and Cumulative Percent columns.


Options to Suppress Display of Statistics


Table of Gender by Country

Gender     Country

Frequency|
Percent  |
Row Pct  |
Col Pct  |AU      |US      |  Total
---------+--------+--------+
F        |     27 |     41 |     68
         |  16.36 |  24.85 |  41.21
         |  39.71 |  60.29 |
         |  42.86 |  40.20 |
---------+--------+--------+
M        |     36 |     61 |     97
         |  21.82 |  36.97 |  58.79
         |  37.11 |  62.89 |
         |  57.14 |  59.80 |
---------+--------+--------+
Total          63      102      165
            38.18    61.82   100.00

NOFREQ suppresses the Frequency line of each cell.
NOPERCENT suppresses the Percent line of each cell.
NOROW suppresses the Row Pct line of each cell.
NOCOL suppresses the Col Pct line of each cell.

12.02 Quiz

Which TABLES statement correctly creates the report?
a. tables Gender nocum;
b. tables Gender nocum nopercent;
c. tables Gender / nopercent;
d. tables Gender / nocum nopercent;

The FREQ Procedure

Gender    Frequency
-------------------
F                68
M                97

p112d01

Additional TABLES Statement Options


Additional options can be placed in the TABLES
statement after a forward slash to control the displayed
output.

Option       Description
LIST         displays n-way tables in list format.
CROSSLIST    displays n-way tables in column format.
FORMAT=      formats the frequencies in n-way tables.

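
The three options can be compared on any two-way table. The sketch below assumes the SASHELP.CLASS sample table; the width 12 in FORMAT= is simply the value used in the course example.

```sas
/* Sketch on SASHELP.CLASS: one crosstabulation displayed three ways. */
proc freq data=sashelp.class;
   tables Sex*Age / list;          /* one row per combination, with cumulative statistics */
   tables Sex*Age / crosslist;     /* column layout with row and column percentages */
   tables Sex*Age / format=12.;    /* standard crosstabulation with wider frequency cells */
run;
```
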
LIST and CROSSLIST Options
tables Gender*Country / list;

                                                  Cumulative    Cumulative
Gender    Country          Frequency    Percent    Frequency      Percent
--------------------------------------------------------------------------
F         Australia               27      16.36           27        16.36
F         United States           41      24.85           68        41.21
M         Australia               36      21.82          104        63.03
M         United States           61      36.97          165       100.00

tables Gender*Country / crosslist;

Table of Gender by Country

                                                      Row     Column
Gender    Country          Frequency    Percent    Percent    Percent
----------------------------------------------------------------------
F         Australia               27      16.36      39.71      42.86
          United States           41      24.85      60.29      40.20
          Total                   68      41.21     100.00
----------------------------------------------------------------------
M         Australia               36      21.82      37.11      57.14
          United States           61      36.97      62.89      59.80
          Total                   97      58.79     100.00
----------------------------------------------------------------------
Total     Australia               63      38.18                100.00
          United States          102      61.82                100.00
          Total                  165     100.00
----------------------------------------------------------------------

p112d01

FORMAT= Option
Partial PROC FREQ Outputs

tables Gender*Country;

Frequency|
Percent  |
Row Pct  |
Col Pct  |Australi|United S|  Total
         |a       |tates   |
---------+--------+--------+
F        |     27 |     41 |     68
         |  16.36 |  24.85 |  41.21
         |  39.71 |  60.29 |
         |  42.86 |  40.20 |
---------+--------+--------+

tables Gender*Country / format=12.;

Frequency|
Percent  |
Row Pct  |
Col Pct  |Australia    |United States|  Total
---------+-------------+-------------+
F        |          27 |          41 |     68
         |       16.36 |       24.85 |  41.21
         |       39.71 |       60.29 |
         |       42.86 |       40.20 |
---------+-------------+-------------+

p112d01

PROC FREQ Statement Options
Options can also be placed in the PROC FREQ statement.

Option      Description
NLEVELS     displays a table that provides the number of levels
            for each variable named in the TABLES statement.
PAGE        displays only one table per page.
COMPRESS    begins the display of the next one-way frequency table
            on the same page as the preceding one-way table if
            there is enough space to begin the table.


NLEVELS Option
proc freq data=orion.sales nlevels;
tables Gender Country Employee_ID;
run;

Partial PROC FREQ Output


The FREQ Procedure

Number of Variable Levels

Variable         Levels
-----------------------
Gender                2
Country               2
Employee_ID         165

p112d01

To display the number of levels without displaying the frequency counts, add the NOPRINT option to the
TABLES statement.
proc freq data=orion.sales nlevels;
tables Gender Country Employee_ID / noprint;
run;

To display the number of levels for all variables without displaying any frequency counts, use the _ALL_
keyword and the NOPRINT option in the TABLES statement.
proc freq data=orion.sales nlevels;
tables _all_ / noprint;
run;


PAGE Option
proc freq data=orion.sales;
tables Gender Country Employee_ID;
run;

Gender: page 1     Country: page 1     Employee_ID: pages 2-5

proc freq data=orion.sales page;
tables Gender Country Employee_ID;
run;

Gender: page 1     Country: page 2     Employee_ID: pages 3-6

p112d01

COMPRESS Option
proc freq data=orion.sales;
tables Gender Country Employee_ID;
run;

Gender: page 1     Country: page 1     Employee_ID: pages 2-5

proc freq data=orion.sales compress;
tables Gender Country Employee_ID;
run;

Gender: page 1     Country: page 1     Employee_ID: pages 1-4

p112d01

Output Data Sets (Self-Study)


PROC FREQ produces output data sets using two
different methods.
The TABLES statement with an OUT= option is used
to create a data set with frequencies and percentages.

TABLES variables / OUT=SAS-data-set <options>;

The OUTPUT statement with an OUT= option is used
to create a data set with specified statistics such as
the chi-square statistic.

OUTPUT OUT=SAS-data-set <options>;

r i e i b
TABLES Statement OUT= Option (Self-Study)
The OUT= option in the TABLES statement creates an output data set with the following variables:
   BY variables
   TABLES statement variables
   the automatic variables COUNT and PERCENT
   other frequency and percentage variables requested with options in the TABLES statement

TABLES variables / OUT=SAS-data-set <options>;

If more than one table request appears in the TABLES statement, the contents of the data set correspond to the last table request.
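
Because only the last table request in a TABLES statement reaches the OUT= data set, one way to keep the frequencies for several variables is to give each variable its own TABLES statement. The following is a minimal sketch under that assumption; the data set names work.freq_gender and work.freq_country are illustrative and are not part of the course files.

```sas
proc freq data=orion.sales noprint;
   /* each TABLES statement carries its own OUT= option */
   tables Gender  / out=work.freq_gender;
   tables Country / out=work.freq_country;
run;

/* each data set holds the table variable plus COUNT and PERCENT */
proc print data=work.freq_gender;
run;

proc print data=work.freq_country;
run;
```

Splitting the requests keeps each output data set one-way, so COUNT and PERCENT always refer to a single variable.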
12.1 Using the FREQ Procedure 12-15

TABLES Statement OUT= Option (Self-Study)


proc freq data=orion.sales noprint;
tables Gender Country / out=work.freq1;
run;

proc print data=work.freq1;
run;

PROC PRINT Output

Obs    Country    COUNT    PERCENT
  1      AU          63    38.1818
  2      US         102    61.8182

The NOPRINT option suppresses the display of all output.

p112d02

TABLES Statement OUT= Option (Self-Study)
proc freq data=orion.sales noprint;
   tables Gender*Country / out=work.freq2;
run;

proc print data=work.freq2;
run;

PROC PRINT Output

Obs    Gender    Country    COUNT    PERCENT
  1      F         AU          27    16.3636
  2      F         US          41    24.8485
  3      M         AU          36    21.8182
  4      M         US          61    36.9697

p112d02
12-16 Chapter 12 Producing Summary Reports

TABLES Statement OUT= Option (Self-Study)


Options can be added to the TABLES statement after the
forward slash to control the additional statistics added to
the output data set.

Option    Description

OUTCUM    includes the cumulative frequency and cumulative percentage in the output data set for one-way frequency tables.

OUTPCT    includes the percentage of column frequency and row frequency in the output data set for n-way frequency tables.

TABLES Statement OUT= Option (Self-Study)
proc freq data=orion.sales noprint;
   tables Gender Country / out=work.freq3 outcum;
run;

proc print data=work.freq3;
run;

PROC PRINT Output

Obs    Country    COUNT    PERCENT    CUM_FREQ    CUM_PCT
  1      AU          63    38.1818          63     38.182
  2      US         102    61.8182         165    100.000

p112d02
12.1 Using the FREQ Procedure 12-17

TABLES Statement OUT= Option (Self-Study)


proc freq data=orion.sales noprint;
tables Gender*Country / out=work.freq4
outpct;
run;

proc print data=work.freq4;
run;

PROC PRINT Output

Obs    Gender    Country    COUNT    PERCENT    PCT_ROW    PCT_COL
  1      F         AU          27    16.3636    39.7059    42.8571
  2      F         US          41    24.8485    60.2941    40.1961
  3      M         AU          36    21.8182    37.1134    57.1429
  4      M         US          61    36.9697    62.8866    59.8039

p112d02

OUTPUT Statement OUT= Option (Self-Study)
The OUT= option in the OUTPUT statement creates an output data set with the following variables:
   BY variables
   the variables requested in the TABLES statement
   variables that contain the specified statistics

OUTPUT OUT=SAS-data-set <options>;

If more than one table request appears in the TABLES statement, the contents of the data set correspond to the last table request.

If there are multiple TABLES statements, the contents of the data set correspond to the last TABLES statement.
12-18 Chapter 12 Producing Summary Reports

OUTPUT Statement OUT= Option (Self-Study)


In order to specify that the output data set contain a
particular statistic, you must have PROC FREQ compute
the statistic by using the corresponding option in the
TABLES statement.

proc freq data=orion.sales;
   tables Country / chisq;
   output out=work.freq5 chisq;
run;

proc print data=work.freq5;
run;

CHISQ requests chi-square tests and measures of association based on chi-square.

p112d03

OUTPUT Statement OUT= Option (Self-Study)
PROC FREQ Output

The FREQ Procedure

                                     Cumulative    Cumulative
Country    Frequency     Percent      Frequency       Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU                63       38.18             63         38.18
US               102       61.82            165        100.00

   Chi-Square Test
for Equal Proportions
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Chi-Square     9.2182
DF                  1
Pr > ChiSq     0.0024

Sample Size = 165

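
The chi-square test for equal proportions can be checked by hand from the two frequency counts. The DATA step below is a sketch of that arithmetic, not part of the course programs; the variable names n_au, n_us, expected, chisq, df, and p_value are chosen here for illustration. It assumes the usual Pearson formula, in which each level is expected to hold half of the 165 observations.

```sas
data _null_;
   /* observed counts from the one-way table for Country */
   n_au = 63;
   n_us = 102;

   /* under equal proportions, each level is expected to hold half of the total */
   expected = (n_au + n_us) / 2;

   /* Pearson chi-square: sum of (observed - expected)**2 / expected */
   chisq = ((n_au - expected)**2 + (n_us - expected)**2) / expected;

   df = 1;                             /* number of levels minus one */
   p_value = 1 - probchi(chisq, df);   /* upper-tail probability */

   /* write the results to the SAS log */
   put chisq= 8.4 df= p_value= 8.4;
run;
```

The values written to the log should agree with the Chi-Square, DF, and Pr > ChiSq lines in the PROC FREQ output.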
12.1 Using the FREQ Procedure 12-19

OUTPUT Statement OUT= Option (Self-Study)


PROC PRINT Output

Obs      N     _PCHI_    DF_PCHI        P_PCHI
  1    165    9.21818       1       .002396234

_PCHI_ is the chi-square statistic, DF_PCHI is the degrees of freedom, and P_PCHI is the p-value.

When you request a statistic, the OUTPUT data set contains that test statistic plus any associated standard error, confidence limits, p-values, and degrees of freedom.

12.03 Quiz

Retrieve and submit program p112a01.

proc freq data=orion.sales;
   tables Gender / chisq out=freq6 outcum;
   output out=freq7 chisq;
run;

proc print data=freq6;
run;

proc print data=freq7;
run;

Review the PROC FREQ output.
Review the PROC PRINT output from the TABLES statement OUT= option.
Review the PROC PRINT output from the OUTPUT statement OUT= option.
12-20 Chapter 12 Producing Summary Reports

Output Data Sets (Self-Study)


Program p112d04 is an example of combining multiple
PROC FREQ output data sets into one data set.

                               Percent of
                  Frequency         Total        Chi-
Obs    Value          Count     Frequency      Square     P-Value
  1    F                 68       41.2121         .          .
  2    M                 97       58.7879         .          .
  3    AU                63       38.1818         .          .
  4    US               102       61.8182         .          .
  5    Gender             .             .     5.09697    0.023968
  6    Country            .             .     9.21818    0.002396

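
Program p112d04 itself is not reproduced in these notes. As a rough sketch of how such a combined data set could be built, the steps below create the frequency and chi-square data sets for Gender and Country and then stack them in a DATA step. All data set names (work.gender_freq, work.freq_all, and so on) and the IN= variable names are assumptions made for this illustration, and the actual course program may take a different approach.

```sas
/* one PROC FREQ step per variable: frequencies plus chi-square statistics */
proc freq data=orion.sales noprint;
   tables Gender / out=work.gender_freq chisq;
   output out=work.gender_chi chisq;
run;

proc freq data=orion.sales noprint;
   tables Country / out=work.country_freq chisq;
   output out=work.country_chi chisq;
run;

/* stack the four data sets and build one common Value column */
data work.freq_all;
   length Value $ 8;
   set work.gender_freq(in=inGenderFreq)
       work.country_freq(in=inCountryFreq)
       work.gender_chi(in=inGenderChi)
       work.country_chi(in=inCountryChi);
   if inGenderFreq then Value = Gender;
   else if inCountryFreq then Value = Country;
   else if inGenderChi then Value = 'Gender';
   else if inCountryChi then Value = 'Country';
   keep Value COUNT PERCENT _PCHI_ P_PCHI;
run;

proc print data=work.freq_all;
run;
```

Rows that come from the TABLES data sets have missing chi-square columns, and rows that come from the OUTPUT data sets have missing COUNT and PERCENT values, which matches the pattern of periods in the report.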
12.1 Using the FREQ Procedure 12-21

Exercises

Level 1

1. Counting Levels of a Variable with PROC FREQ


a. Retrieve the starter program p112e01.

b. Modify the program to produce two separate reports:

   1) Display the number of distinct levels of Customer_ID and Employee_ID for retail orders.

      a) Use a WHERE statement to limit the report to retail sales by specifying the condition Order_Type=1.

      b) Display this report title: Unique Customers and Salespersons for Retail Sales.

         If you do not want to see the counts for individual levels of Customer_ID and Employee_ID, add the NOPRINT option to the TABLES statement after a forward slash.

   2) Display the number of distinct levels for Customer_ID for catalog and Internet orders.

      a) Use a WHERE statement to limit the report to catalog and Internet sales by specifying the condition corresponding to Order_Type values other than 1.

      b) Display this report title: Unique Customers for Catalog and Internet.

         If you do not want to see the counts for individual levels of Customer_ID, add the NOPRINT option to the TABLES statement after a forward slash.

12-22 Chapter 12 Producing Summary Reports

c. Submit the program to produce the following reports:


PROC FREQ Output
Unique Customers and Salespersons for Retail Sales

The FREQ Procedure

Number of Variable Levels

Variable       Label          Levels
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Customer_ID    Customer ID        31
Employee_ID    Employee ID       100

Unique Customers for Catalog and Internet Sales

The FREQ Procedure

Number of Variable Levels

Variable       Label          Levels
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Customer_ID    Customer ID        63

Level 2

2. Producing Frequency Reports with PROC FREQ

a. Retrieve the starter program p112e02.

b. Add TABLES statements to the PROC FREQ step to produce three frequency reports:

   1) Number of orders in each year: Apply the YEAR4. format to the Order_Date variable to combine all orders within the same year.

   2) Number of orders of each order type: Apply the ordertypes. format defined in the starter program to the Order_Type variable. Suppress the cumulative frequency and percentages.

   3) Number of orders for each combination of year and order type: Suppress all percentages that normally appear in each cell of an n-way table.
12.1 Using the FREQ Procedure 12-23

c. Submit the program to produce the following output:


PROC FREQ Output
Order Summary by Year and Type

The FREQ Procedure

Date Order was placed by Customer

                                       Cumulative    Cumulative
Order_Date    Frequency     Percent     Frequency       Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
2003               104       21.22           104         21.22
2004                87       17.76           191         38.98
2005                70       14.29           261         53.27
2006               113       23.06           374         76.33
2007               116       23.67           490        100.00

Order Type

Order_
Type        Frequency     Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Retail            260       53.06
Catalog           132       26.94
Internet           98       20.00

Table of Order_Date by Order_Type

Order_Date(Date Order was placed by Customer)
          Order_Type(Order Type)

Frequency‚Retail  ‚Catalog ‚Internet‚  Total
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
2003     ‚     45 ‚     41 ‚     18 ‚    104
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
2004     ‚     51 ‚     20 ‚     16 ‚     87
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
2005     ‚     27 ‚     23 ‚     20 ‚     70
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
2006     ‚     67 ‚     33 ‚     13 ‚    113
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
2007     ‚     70 ‚     15 ‚     31 ‚    116
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
Total         260      132       98      490
12-24 Chapter 12 Producing Summary Reports

Level 3

3. Displaying PROC FREQ Output in Descending Frequency Order


a. Retrieve the starter program p112e03.
b. Submit the program to produce the following report:
PROC FREQ Output

Customer Demographics

(Top two levels for each variable?)

The FREQ Procedure

Customer Country

Customer_                              Cumulative    Cumulative
Country      Frequency     Percent      Frequency       Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU                   8       10.39              8         10.39
CA                  15       19.48             23         29.87
DE                  10       12.99             33         42.86
IL                   5        6.49             38         49.35
TR                   7        9.09             45         58.44
US                  28       36.36             73         94.81
ZA                   4        5.19             77        100.00

Customer Type Name

                                                                    Cumulative    Cumulative
Customer_Type                              Frequency     Percent     Frequency       Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Internet/Catalog Customers                         8       10.39             8         10.39
Orion Club members high activity                  11       14.29            19         24.68
Orion Club members medium activity                20       25.97            39         50.65
Orion Club Gold members high activity             10       12.99            49         63.64
Orion Club Gold members low activity               5        6.49            54         70.13
Orion Club Gold members medium activity            6        7.79            60         77.92
Orion Club members low activity                   17       22.08            77        100.00

Customer Age Group

Customer_                                Cumulative    Cumulative
Age_Group      Frequency     Percent      Frequency       Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
15-30 years           22       28.57             22         28.57
31-45 years           27       35.06             49         63.64
46-60 years           14       18.18             63         81.82
61-75 years           14       18.18             77        100.00
12.1 Using the FREQ Procedure 12-25

c. What are the two most common values for each variable?
1) Country ___________________ ____________________

2) Customer Type ___________________ ____________________

3) Customer Age Group ___________________ ____________________

d. Modify the program to display the frequency counts in descending order.

   Documentation about the FREQ procedure can be found in the SAS Help and Documentation from the Contents tab (SAS Products > Base SAS > Base SAS Procedures Guide: Statistical Procedures > The FREQ Procedure). Look for an option in the PROC FREQ statement that can perform the requested action.

e. Submit the modified program.

f. What are the two most common values for each variable?

1) Country ___________________ ____________________

2) Customer Type ___________________ ____________________

3) Customer Age Group ___________________ ____________________

Do these answers match the previous set of answers?

Which report was easier to use to answer the questions correctly?

4. Creating an Output Data Set with PROC FREQ

a. Retrieve the starter program p112e04.

b. Create an output data set containing the frequency counts based on Product_ID.

   Creating an output data set from PROC FREQ results is discussed in the self-study content at the end of this section.

c. Combine the output data set with orion.product_list to obtain the Product_Name value for each Product_ID code.

d. Sort the merged data so that the most frequently ordered products appear at the top of the resulting data set. Print the first 10 observations, that is, those that represent the 10 products ordered most often.

   To limit the number of observations displayed by PROC PRINT, apply the OBS= data set option, as in the following:

   proc print data=work.mydataset(obs=10);

12-26 Chapter 12 Producing Summary Reports

e. Submit the program to produce the following report:


PROC PRINT Output
Top Ten Products by Number of Orders

                 Product
Obs    Orders    Number          Product

  1       6      230100500056    Knife
  2       6      230100600030    Outback Sleeping Bag, Large,Left,Blue/Black
  3       5      230100600022    Expedition10,Medium,Right,Blue Ribbon
  4       5      240400300035    Smasher Shorts
  5       4      230100500082    Lucky Tech Intergal Wp/B Rain Pants
  6       4      230100600005    Basic 10, Left , Yellow/Black
  7       4      230100600016    Expedition Zero,Medium,Right,Charcoal
  8       4      230100600028    Expedition 20,Medium,Right,Forestgreen
  9       4      230100700008    Family Holiday 4
 10       4      230100700011    Hurricane 4

12.2 Using the MEANS Procedure 12-27

12.2 Using the MEANS Procedure

Objectives
Calculate summary statistics and multilevel summaries with the MEANS procedure.
Enhance summary tables with options.
Produce output data sets by using the OUT= option in the OUTPUT statement. (Self-Study)
Compare the SUMMARY procedure to the MEANS procedure. (Self-Study)

The MEANS Procedure

The MEANS procedure provides data summarization tools to compute descriptive statistics for variables across all observations and within groups of observations.

General form of the MEANS procedure:

PROC MEANS DATA=SAS-data-set <statistic(s)> <option(s)>;
   VAR analysis-variable(s);
   CLASS classification-variable(s);
RUN;

12-28 Chapter 12 Producing Summary Reports

The MEANS Procedure


By default, the MEANS procedure reports the number
of nonmissing observations, the mean, the standard
deviation, the minimum value, and the maximum value
of all numeric variables.

proc means data=orion.sales;
run;

The MEANS Procedure

Variable         N            Mean         Std Dev         Minimum         Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Employee_ID    165       120713.90     450.0866939       120102.00       121145.00
Salary         165        31160.12        20082.67        22710.00       243190.00
Birth_Date     165         3622.58         5456.29        -5842.00        10490.00
Hire_Date      165        12054.28         4619.94         5114.00        17167.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

p112d05

The VAR Statement

The VAR statement identifies the analysis variables and their order in the results.

proc means data=orion.sales;
   var Salary;
run;

The MEANS Procedure

Analysis Variable : Salary

  N            Mean         Std Dev         Minimum         Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
165        31160.12        20082.67        22710.00       243190.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

p112d05
12.2 Using the MEANS Procedure 12-29

The CLASS Statement


The CLASS statement identifies variables whose values
define subgroups for the analysis.

proc means data=orion.sales;
   var Salary;
   class Gender Country;
run;

The MEANS Procedure

Analysis Variable : Salary

                       N
Gender    Country    Obs      N            Mean         Std Dev         Minimum         Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
F         AU          27     27        27702.41         1728.23        25185.00        30890.00
          US          41     41        29460.98         8847.03        25390.00        83505.00
M         AU          36     36        32001.39        16592.45        25745.00       108255.00
          US          61     61        33336.15        29592.69        22710.00       243190.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

p112d05

The CLASS Statement
In the output of the preceding PROC MEANS step, Gender and Country are the classification variables, Salary is the analysis variable, and the N, Mean, Std Dev, Minimum, and Maximum columns are the statistics for the analysis variable.

The CLASS statement adds the N Obs column, which is the number of observations for each unique combination of the class variables.
12-30 Chapter 12 Producing Summary Reports

12.04 Quiz
For a given data set, there are 63 observations with
a Country value of AU. Of those 63 observations,
only 61 observations have a value for Salary.
Which output is correct?

a. Analysis Variable : Salary        b. Analysis Variable : Salary

                N                                    N
   Country    Obs     N                 Country    Obs     N
   ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ                ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
   AU          63    61                 AU          61    63
   ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ                ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Additional SAS Statements
Additional statements can be added to enhance the reports.

proc format;
   value $ctryfmt 'AU'='Australia'
                  'US'='United States';
run;

options nodate pageno=1;
ods html file='p112d05.html';

proc means data=orion.sales;
   var Salary;
   class Gender Country;
   where Job_Title contains 'Rep';
   format Country $ctryfmt.;
   title 'Sales Rep Summary Report';
run;
ods html close;

p112d05
12.2 Using the MEANS Procedure 12-31

Additional SAS Statements


HTML Output

PROC MEANS Statistics
The statistics to compute and the order to display them can be specified in the PROC MEANS statement.

proc means data=orion.sales sum mean range;
   var Salary;
   class Country;
run;

The MEANS Procedure

Analysis Variable : Salary

             N
Country    Obs             Sum            Mean           Range
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU          63      1900015.00        30158.97        83070.00
US         102      3241405.00        31778.48       220480.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

p112d05
12-32 Chapter 12 Producing Summary Reports

PROC MEANS Statistics


Descriptive Statistic Keywords
CLM CSS CV LCLM MAX
MEAN MIN MODE N NMISS
KURTOSIS RANGE SKEWNESS STDDEV STDERR
SUM SUMWGT UCLM USS VAR


Quantile Statistic Keywords
MEDIAN | P50    P1     P5     P10    Q1 | P25
Q3 | P75        P90    P95    P99    QRANGE

Hypothesis Testing Keywords
PROBT    T

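
As an illustration of mixing keywords from these lists, the step below requests one descriptive statistic (N) together with several quantile statistics. It is a minimal sketch that reuses orion.sales, Salary, and Country from the earlier examples; the particular selection of keywords is only an example.

```sas
proc means data=orion.sales n median q1 q3 qrange;
   var Salary;       /* analysis variable */
   class Country;    /* one row of statistics per country */
run;
```

The statistics appear in the report in the order in which the keywords are listed in the PROC MEANS statement.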
PROC MEANS Statement Options
Options can also be placed in the PROC MEANS statement.

Option     Description

MAXDEC=    specifies the number of decimal places to use in printing the statistics.

FW=        specifies the field width to use in displaying the statistics.

NONOBS     suppresses reporting the total number of observations for each unique combination of the class variables.

12.2 Using the MEANS Procedure 12-33

MAXDEC= Option
proc means data=orion.sales maxdec=0;
   var Salary;
   class Country;
run;

The MEANS Procedure

Analysis Variable : Salary

             N
Country    Obs      N        Mean     Std Dev     Minimum     Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU          63     63       30159       12699       25185      108255
US         102    102       31778       23556       22710      243190
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

proc means data=orion.sales maxdec=1;
   var Salary;
   class Country;
run;

The MEANS Procedure

Analysis Variable : Salary

             N
Country    Obs      N        Mean     Std Dev     Minimum     Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU          63     63     30159.0     12699.1     25185.0    108255.0
US         102    102     31778.5     23555.8     22710.0    243190.0
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

p112d05

FW= Option

proc means data=orion.sales;
   var Salary;
   class Country;
run;

The MEANS Procedure

Analysis Variable : Salary

             N
Country    Obs      N            Mean         Std Dev         Minimum         Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU          63     63        30158.97        12699.14        25185.00       108255.00
US         102    102        31778.48        23555.84        22710.00       243190.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

proc means data=orion.sales fw=15;
   var Salary;
   class Country;
run;

The MEANS Procedure

Analysis Variable : Salary

             N
Country    Obs      N               Mean            Std Dev            Minimum    Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU          63     63     30158.96825397     12699.13932690     25185.00000000     108255
US         102    102     31778.48039216     23555.84171928     22710.00000000     243190
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

p112d05
12-34 Chapter 12 Producing Summary Reports

NONOBS Option
proc means data=orion.sales;
   var Salary;
   class Country;
run;

The MEANS Procedure

Analysis Variable : Salary

             N
Country    Obs      N            Mean         Std Dev         Minimum         Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU          63     63        30158.97        12699.14        25185.00       108255.00
US         102    102        31778.48        23555.84        22710.00       243190.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

proc means data=orion.sales nonobs;
   var Salary;
   class Country;
run;

The MEANS Procedure

Analysis Variable : Salary

Country      N            Mean         Std Dev         Minimum         Maximum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU          63        30158.97        12699.14        25185.00       108255.00
US         102        31778.48        23555.84        22710.00       243190.00
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

p112d05

Output Data Sets (Self-Study)
PROC MEANS produces output data sets using the following method:

OUTPUT OUT=SAS-data-set <options>;

The output data set contains the following variables:
   BY variables
   class variables
   the automatic variables _TYPE_ and _FREQ_
   the variables requested in the OUTPUT statement

12.2 Using the MEANS Procedure 12-35

OUTPUT Statement OUT= Option (Self-Study)


The statistics in the PROC MEANS statement impact only the report, not the data set.

proc means data=orion.sales sum mean range;
   var Salary;
   class Gender Country;
   output out=work.means1;
run;

proc print data=work.means1;
run;

p112d06

OUTPUT Statement OUT= Option (Self-Study)
Partial PROC PRINT Output

Obs    Gender    Country    _TYPE_    _FREQ_    _STAT_       Salary
  1                            0        165     N            165.00
  2                            0        165     MIN        22710.00
  3                            0        165     MAX       243190.00
  4                            0        165     MEAN       31160.12
  5                            0        165     STD        20082.67
  6              AU            1         63     N             63.00
  7              AU            1         63     MIN        25185.00
  8              AU            1         63     MAX       108255.00
  9              AU            1         63     MEAN       30158.97
 10              AU            1         63     STD        12699.14
 11              US            1        102     N            102.00
 12              US            1        102     MIN        22710.00
 13              US            1        102     MAX       243190.00
 14              US            1        102     MEAN       31778.48
 15              US            1        102     STD        23555.84
 16      F                     2         68     N             68.00
 17      F                     2         68     MIN        25185.00
 18      F                     2         68     MAX        83505.00
 19      F                     2         68     MEAN       28762.72
 20      F                     2         68     STD         6974.15

The _STAT_ values N, MIN, MAX, MEAN, and STD are the default statistics.

12-36 Chapter 12 Producing Summary Reports

OUTPUT Statement OUT= Option (Self-Study)


The OUTPUT statement can also do the following:
   specify the statistics for the output data set
   select and name variables

proc means data=orion.sales noprint;
   var Salary;
   class Gender Country;
   output out=work.means2
          min=minSalary max=maxSalary
          sum=sumSalary mean=aveSalary;
run;

proc print data=work.means2;
run;

The NOPRINT option suppresses the display of all output.

p112d06

OUTPUT Statement OUT= Option (Self-Study)
PROC PRINT Output

                                                   min       max        sum         ave
Obs    Gender    Country    _TYPE_    _FREQ_    Salary    Salary     Salary      Salary
  1                            0        165      22710    243190    5141420    31160.12
  2              AU            1         63      25185    108255    1900015    30158.97
  3              US            1        102      22710    243190    3241405    31778.48
  4      F                     2         68      25185     83505    1955865    28762.72
  5      M                     2         97      22710    243190    3185555    32840.77
  6      F       AU            3         27      25185     30890     747965    27702.41
  7      F       US            3         41      25390     83505    1207900    29460.98
  8      M       AU            3         36      25745    108255    1152050    32001.39
  9      M       US            3         61      22710    243190    2033505    33336.15

12.2 Using the MEANS Procedure 12-37

OUTPUT Statement OUT= Option (Self-Study)


_TYPE_ is a numeric variable that shows which
combination of class variables produced the summary
statistics in that observation.

PROC PRINT Output

                                                   min       max        sum         ave
Obs    Gender    Country    _TYPE_    _FREQ_    Salary    Salary     Salary      Salary
  1                            0        165      22710    243190    5141420    31160.12    overall summary
  2              AU            1         63      25185    108255    1900015    30158.97    summary by Country only
  3              US            1        102      22710    243190    3241405    31778.48    summary by Country only
  4      F                     2         68      25185     83505    1955865    28762.72    summary by Gender only
  5      M                     2         97      22710    243190    3185555    32840.77    summary by Gender only
  6      F       AU            3         27      25185     30890     747965    27702.41    summary by Country and Gender
  7      F       US            3         41      25390     83505    1207900    29460.98    summary by Country and Gender
  8      M       AU            3         36      25745    108255    1152050    32001.39    summary by Country and Gender
  9      M       US            3         61      22710    243190    2033505    33336.15    summary by Country and Gender

OUTPUT Statement OUT= Option (Self-Study)
For this example, the _TYPE_ values in the PROC PRINT output correspond to the following summaries:

_TYPE_    Type of Summary                  _FREQ_
0         overall summary                  165
1         summary by Country only          63 AU + 102 US = 165
2         summary by Gender only           68 F + 97 M = 165
3         summary by Country and Gender    27 F AU + 41 F US + 36 M AU + 61 M US = 165
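
One way to read these values is as a binary number with one bit per class variable: with CLASS Gender Country, Country contributes 1 and Gender contributes 2, which is consistent with the table (1 = Country only, 2 = Gender only, 3 = both). The sketch below decodes _TYPE_ with the BAND (bitwise AND) function. It assumes that work.means2 from program p112d06 exists; the data set name work.means2_flags and the variables byGender and byCountry are made up for this illustration.

```sas
data work.means2_flags;
   set work.means2;
   /* BAND performs a bitwise AND; each class variable owns one bit of _TYPE_ */
   byGender  = (band(_TYPE_, 2) > 0);   /* 1 when Gender defines the subgroup  */
   byCountry = (band(_TYPE_, 1) > 0);   /* 1 when Country defines the subgroup */
run;

proc print data=work.means2_flags;
   var Gender Country _TYPE_ _FREQ_ byGender byCountry;
run;
```

For the overall summary row both flags are 0, and for the rows that combine Gender and Country both flags are 1.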
12-38 Chapter 12 Producing Summary Reports

OUTPUT Statement OUT= Option (Self-Study)


Options can be added to the PROC MEANS statement to
control the output data set.

Option          Description

NWAY            specifies that the output data set contain only statistics for the observations with the highest _TYPE_ value.

DESCENDTYPES    orders the output data set by descending _TYPE_ value.

CHARTYPE        specifies that the _TYPE_ variable in the output data set is a character representation of the binary value of _TYPE_.

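
These options go in the PROC MEANS statement rather than in the OUTPUT statement. As a minimal sketch, the step below adds NWAY to the earlier work.means2 example so that only the Gender-by-Country rows are written; the data set name work.means_nway is an assumption made for this illustration.

```sas
proc means data=orion.sales noprint nway;
   var Salary;
   class Gender Country;
   /* with NWAY, only the rows with the highest _TYPE_ value are kept */
   output out=work.means_nway
          min=minSalary max=maxSalary
          sum=sumSalary mean=aveSalary;
run;

proc print data=work.means_nway;
run;
```

Replacing NWAY with DESCENDTYPES or CHARTYPE in the same position applies the other two options in the same way.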
OUTPUT Statement OUT= Option (Self-Study)
without options

                                                   min       max        sum         ave
Obs    Gender    Country    _TYPE_    _FREQ_    Salary    Salary     Salary      Salary
  1                            0        165      22710    243190    5141420    31160.12
  2              AU            1         63      25185    108255    1900015    30158.97
  3              US            1        102      22710    243190    3241405    31778.48
  4      F                     2         68      25185     83505    1955865    28762.72
  5      M                     2         97      22710    243190    3185555    32840.77
  6      F       AU            3         27      25185     30890     747965    27702.41
  7      F       US            3         41      25390     83505    1207900    29460.98
  8      M       AU            3         36      25745    108255    1152050    32001.39
  9      M       US            3         61      22710    243190    2033505    33336.15

with NWAY

                                                   min       max        sum         ave
Obs    Gender    Country    _TYPE_    _FREQ_    Salary    Salary     Salary      Salary
  1      F       AU            3         27      25185     30890     747965    27702.41
  2      F       US            3         41      25390     83505    1207900    29460.98
  3      M       AU            3         36      25745    108255    1152050    32001.39
  4      M       US            3         61      22710    243190    2033505    33336.15

p112d06
12.2 Using the MEANS Procedure 12-39

OUTPUT Statement OUT= Option (Self-Study)


with DESCENDTYPES

                                                   min       max        sum         ave
Obs    Gender    Country    _TYPE_    _FREQ_    Salary    Salary     Salary      Salary
  1      F       AU            3         27      25185     30890     747965    27702.41
  2      F       US            3         41      25390     83505    1207900    29460.98
  3      M       AU            3         36      25745    108255    1152050    32001.39
  4      M       US            3         61      22710    243190    2033505    33336.15
  5      F                     2         68      25185     83505    1955865    28762.72
  6      M                     2         97      22710    243190    3185555    32840.77
  7              AU            1         63      25185    108255    1900015    30158.97
  8              US            1        102      22710    243190    3241405    31778.48
  9                            0        165      22710    243190    5141420    31160.12

p112d06

OUTPUT Statement OUT= Option (Self-Study)
with CHARTYPE

                                                   min       max        sum         ave
Obs    Gender    Country    _TYPE_    _FREQ_    Salary    Salary     Salary      Salary
  1                           00        165      22710    243190    5141420    31160.12
  2              AU           01         63      25185    108255    1900015    30158.97
  3              US           01        102      22710    243190    3241405    31778.48
  4      F                    10         68      25185     83505    1955865    28762.72
  5      M                    10         97      22710    243190    3185555    32840.77
  6      F       AU           11         27      25185     30890     747965    27702.41
  7      F       US           11         41      25390     83505    1207900    29460.98
  8      M       AU           11         36      25745    108255    1152050    32001.39
  9      M       US           11         61      22710    243190    2033505    33336.15

p112d06
12-40 Chapter 12 Producing Summary Reports

12.05 Quiz
Retrieve and submit program p112a02.
Review the PROC PRINT output.
Add a WHERE statement to the PROC PRINT step to subset _TYPE_ for observations summarized by Gender only.
Submit the program and verify the results.

OUTPUT Statement OUT= Option (Self-Study)
Program p112d07 is an example of merging a PROC MEANS output data set with a detail data set to create the following partial report.

                                                Comparison        Comparison
       First_                                   to Country        to Gender
Obs    Name         Last_Name     Salary        Salary Average    Salary Average
  1    Tom          Zhou          108255        Above Average     Above Average
  2    Wilson       Dawes          87975        Above Average     Above Average
  3    Irenie       Elvish         26600        Below Average     Below Average
  4    Christina    Ngan           27475        Below Average     Below Average
  5    Kimiko       Hotstone       26190        Below Average     Below Average
  6    Lucian       Daymond        26480        Below Average     Below Average
  7    Fong         Hofmeister     32040        Above Average     Below Average
  8    Satyakam     Denny          26780        Below Average     Below Average
  9    Sharryn      Clarkson       28100        Below Average     Below Average
 10    Monica       Kletschkus     30890        Above Average     Above Average

The First_Name, Last_Name, and Salary columns are detail data. The two comparison columns show detail data compared to summary data.
12.2 Using the MEANS Procedure 12-41

The SUMMARY Procedure (Self-Study)


The SUMMARY procedure provides data summarization
tools to compute descriptive statistics for variables across
all observations and within groups of observations.

General form of the SUMMARY procedure:

PROC SUMMARY DATA=SAS-data-set <statistic(s)> <option(s)>;
   VAR analysis-variable(s);
   CLASS classification-variable(s);
RUN;

The SUMMARY Procedure (Self-Study)
The SUMMARY procedure uses the same syntax as the MEANS procedure.
The only differences between the two procedures are the following:

PROC MEANS                                  PROC SUMMARY

The PRINT option is set by default,         The NOPRINT option is set by default,
which displays output.                      which displays no output.

Omitting the VAR statement analyzes         Omitting the VAR statement produces
all the numeric variables.                  a simple count of observations.

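
The difference in defaults can be seen by running PROC SUMMARY in both modes. The steps below are a sketch that reuses orion.sales, Salary, and Country from the PROC MEANS examples; the output data set name work.summary1 and the variable name aveSalary are illustrative.

```sas
/* PRINT overrides the NOPRINT default so that PROC SUMMARY displays a report */
proc summary data=orion.sales print;
   var Salary;
   class Country;
run;

/* without PRINT, PROC SUMMARY is typically paired with an OUTPUT statement */
proc summary data=orion.sales;
   var Salary;
   class Country;
   output out=work.summary1 mean=aveSalary;
run;

proc print data=work.summary1;
run;
```

The first step should produce the same kind of report as the equivalent PROC MEANS step, and the second writes its results only to the output data set.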
12-42 Chapter 12 Producing Summary Reports

Exercises

Level 1

5. Creating a Summary Report with PROC MEANS

a. Retrieve the starter program p112e05.

b. Display only the SUM statistic for the Total_Retail_Price variable.

c. Display separate statistics for the combination of Order_Date and Order_Type. Apply the ORDERTYPES. format so that the order types are displayed as text descriptions, not numbers. Apply the YEAR4. format so that order dates are displayed as years, not individual dates.

d. Submit the program to produce the following report:

Partial PROC MEANS Output

Revenue (in U.S. Dollars) Earned from All Orders

The MEANS Procedure

Analysis Variable : Total_Retail_Price Total Retail Price for This Product

Date
Order
was
placed
by          Order         N
Customer    Type        Obs             Sum
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
2003        Retail       53         7938.80
            Catalog      52        10668.08
            Internet     23         4124.05

2004        Retail       63         9012.22
            Catalog      23         3494.60
            Internet     22         3275.70

2005        Retail       34         5651.29
            Catalog      33         6569.98
            Internet     23         4626.40
12.2 Using the MEANS Procedure 12-43

Level 2

6. Analyzing Missing Numeric Values with PROC MEANS


a. Retrieve the starter program p112e06.
b. Display the number of missing values and the number of nonmissing values present in the
Birth_Date, Emp_Hire_Date, and Emp_Term_Date variables.

c. Suppress any decimal places in the displayed statistics.
d. Display separate statistics for each value of Gender.
e. Suppress the output column that displays the total number of observations in each classification
group.
f. Submit the program to produce the following report:

PROC MEANS Output

Number of Missing and Non-Missing Date Values

The MEANS Procedure

Employee                                                      N
Gender    Variable         Label                           Miss      N
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
F         Birth_Date       Employee Birth Date                0    191
          Emp_Hire_Date    Employee Hire Date                 0    191
          Emp_Term_Date    Employee Termination Date        139     52

M         Birth_Date       Employee Birth Date                0    233
          Emp_Hire_Date    Employee Hire Date                 0    233
          Emp_Term_Date    Employee Termination Date        169     64
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Level 3

7. Analyzing All Possible Classification Levels with PROC MEANS
a. Retrieve the starter program p112e07.
b. Display the following statistics in the report:
1) Lower Confidence Limit for the Mean
2) Mean
3) Upper Confidence Limit for the Mean

c. Change the value for the confidence limits to 0.10, resulting in a 90% confidence limit.
d. Display all countries stored in the Work.countries data set in the report, even
if there are no customers from that country.

Documentation about the MEANS procedure can be found in the SAS Help and
Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The MEANS Procedure).
Look for options in the PROC MEANS statement that can perform the requested actions.
e. Submit the program to produce the following report:
PROC MEANS Output

Average Age of Customers in Each Country

The MEANS Procedure

Analysis Variable : Customer_Age Customer Age

Customer      N      Lower 90%                        Upper 90%
Country     Obs    CL for Mean            Mean      CL for Mean
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
AU            8     42.4983854      52.3750000       62.2516146
BE            0              .               .                .
CA           15     31.2270622      40.0000000       48.7729378
DE           10     35.2564025      46.6000000       57.9435975
DK            0              .               .                .
ES            0              .               .                .
FR            0              .               .                .
GB            0              .               .                .
IL            5     30.1150331      40.0000000       49.8849669
NL            0              .               .                .
NO            0              .               .                .
PT            0              .               .                .
SE            0              .               .                .
TR            7     30.5050705      39.4285714       48.3520724
US           28     35.6505942      40.4285714       45.2065486
ZA            4     12.1696649      34.7500000       57.3303351
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

8. Creating an Output Data Set with PROC MEANS


a. Retrieve the starter program p112e08.
b. Create an output data set containing the sum of Total_Retail_Price values for each
Product_ID.

Creating an output data set from PROC MEANS results is discussed in the self-study
content at the end of this section.
c. Combine the output data set with orion.product_list to obtain the Product_Name
value for each Product_ID code.

d. Sort the merged data so that the products with higher revenues appear at the top of the resulting
data set. Print the first 10 observations, that is, those that represent the ten products with the most
revenue.

To limit the number of observations displayed by PROC PRINT, apply the
OBS= data set option, as in the following:

proc print data=work.mydataset(obs=10);

e. Display the revenue values with a leading euro symbol (€), a period that separates every three
digits, and a comma that separates the decimal fraction.
f. Submit the program to produce the following report:

PROC MEANS Output

Top Ten Products by Revenue

                     Product
Obs      Revenue     Number          Product

  1    €3.391,80     230100700009    Family Holiday 6
  2    €3.080,30     230100700008    Family Holiday 4
  3    €2.250,00     230100700011    Hurricane 4
  4    €1.937,20     240200100173    Proplay Executive Bi-Metal Graphite
  5    €1.796,00     240200100076    Expert Men's Firesole Driver
  6    €1.561,80     240300300090    Top R&D Long Jacket
  7    €1.514,40     240300300070    Top Men's R&D Ultimate Jacket
  8    €1.510,80     240100400098    Rollerskate Roller Skates Ex9 76mm/78a Biofl
  9    €1.424,40     240100400129    Rollerskate Roller Skates Sq9 80-76mm/78a
 10    €1.343,30     240100400043    Perfect Fit Men's Roller Skates

12.3 Using the TABULATE Procedure (Self-Study)

Objectives
Create one-, two-, and three-dimensional tabular
reports using the TABULATE procedure.
Produce output data sets by using the OUT= option
in the PROC statement.

The TABULATE Procedure
The TABULATE procedure displays descriptive statistics in tabular format.

General form of the TABULATE procedure:

     PROC TABULATE DATA=SAS-data-set <options>;
          CLASS classification-variable(s);
          VAR analysis-variable(s);
          TABLE page-expression,
                row-expression,
                column-expression </ option(s)>;
     RUN;

The TABULATE procedure computes many of the same statistics that are computed by other descriptive
statistical procedures such as PROC MEANS and PROC FREQ.

A CLASS statement or a VAR statement must be specified, but both statements together
are not required.
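
As a minimal sketch of this rule, the two steps below each use only one of the two statements. They assume the orion.sales data set with the variables Salary and Country that appear in this section's examples. These steps are illustrations and are not among the course demonstration programs.

```sas
/* Only a VAR statement: Salary is an analysis variable, so its SUM is reported. */
proc tabulate data=orion.sales;
   var Salary;
   table Salary;
run;

/* Only a CLASS statement: N is reported for each value of Country. */
proc tabulate data=orion.sales;
   class Country;
   table Country;
run;
```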

Dimensional Tables
The TABULATE procedure produces one-, two-, or three-dimensional tables.

                     page         row          column
                     dimension    dimension    dimension
one-dimensional                                X
two-dimensional                   X            X
three-dimensional    X            X            X

One-Dimensional Table
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ Country ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ AU ‚ US ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ 63.00‚ 102.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

Country is in the column dimension.

Two-Dimensional Table
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 27.00‚ 41.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 36.00‚ 61.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

Country is in the column dimension.
Gender is in the row dimension.

Three-Dimensional Table
Job_Title Sales Rep. I
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 8.00‚ 13.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 13.00‚ 29.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

Country is in the column dimension.
Gender is in the row dimension.
Job_Title is in the page dimension.

The TABLE Statement


The TABLE statement describes the structure of the table.

             page              row               column
     table   expression   ,    expression   ,    expression   ;

                       dimension expressions

Commas separate the dimension expressions.
Every variable that is part of a dimension expression must be specified as a
classification variable (CLASS statement) or an analysis variable (VAR statement).

The TABLE Statement
             page              row               column
     table   expression   ,    expression   ,    expression   ;

Examples:

     table Country;

     table Gender , Country;

     table Job_Title , Gender , Country;

The CLASS Statement


The CLASS statement identifies variables to be used
as classification, or grouping, variables.
General form of the CLASS statement:

CLASS classification-variable(s);

N, the number of nonmissing values, is the default statistic for classification variables.
Examples of classification variables:
Job_Title, Gender, and Country

Class variables
• can be numeric or character
• identify classes or categories on which calculations are done
• represent discrete categories if they are numeric (for example, Year).
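
A numeric class variable can be illustrated with a short sketch. It assumes the orion.orders data set, with the numeric variables Order_Date and Order_Type, that is used in the PROC FREQ exercises of this chapter. The step is an illustration and is not one of the course programs.

```sas
/* Order_Date is numeric. The YEAR4. format groups the dates so that */
/* each year, rather than each individual date, is a class level.    */
proc tabulate data=orion.orders;
   class Order_Date Order_Type;
   table Order_Date, Order_Type;
   format Order_Date year4.;
run;
```

Because only class variables appear in the TABLE statement, each cell contains N, the number of orders for that year and order type.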

The VAR Statement


The VAR statement identifies the numeric variables
for which statistics are calculated.
General form of the VAR statement:

VAR analysis-variable(s);

SUM is the default statistic for analysis variables.
Examples of analysis variables:
Salary and Bonus

Analysis variables
• are always numeric
• tend to be continuous
• are appropriate for calculating averages, sums, or other statistics.

One-Dimensional Table
proc tabulate data=orion.sales;
   class Country;
   table Country;
run;

„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ Country ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ AU ‚ US ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ 63.00‚ 102.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

p112d08
95

If there are only class variables in the TABLE statement, the default statistic is N, or number
of nonmissing values.

Two-Dimensional Table
proc tabulate data=orion.sales;
class Gender Country;
table Gender, Country;
run;

„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 27.00‚ 41.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 36.00‚ 61.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

p112d08

Three-Dimensional Table
proc tabulate data=orion.sales;
   class Job_Title Gender Country;
   table Job_Title, Gender, Country;
run;

p112d08

Three-Dimensional Table
Partial PROC TABULATE Output
Job_Title Sales Rep. I
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 8.00‚ 13.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 13.00‚ 29.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

Job_Title Sales Rep. II
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ N ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 10.00‚ 14.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 8.00‚ 14.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

Dimension Expression
Elements that can be used in a dimension expression:
• classification variables
• analysis variables
• the universal class variable ALL
• keywords for statistics

Operators that can be used in a dimension expression:
• blank, which concatenates table information
• asterisk *, which crosses table information
• parentheses (), which group elements

Other operators include


• brackets < >, which name the denominator for row or column percentages
• equal sign =, which changes the label for a variable or a statistic.
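
The operators can be combined in one TABLE statement. The step below is a sketch that assumes the orion.sales data set with the variables Gender, Country, and Salary used in this section. The label text ('Total', 'Annual Salary', 'Average') is chosen only for illustration.

```sas
proc tabulate data=orion.sales;
   class Gender Country;
   var Salary;
   /* Row dimension:    Gender and ALL are concatenated (blank operator). */
   /* Column dimension: Country is crossed (*) with Salary, and Salary    */
   /*                   is crossed with the grouped statistics in ( ).    */
   /* The equal sign relabels ALL, Salary, and the MEAN statistic.        */
   table Gender all='Total',
         Country*Salary='Annual Salary'*(mean='Average' max);
run;
```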

Dimension Expression
proc tabulate data=orion.sales;
class Gender Country;
var Salary;
table Gender all, Country*Salary;
run;
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ Salary ‚ Salary ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ Sum ‚ Sum ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚F ‚ 747965.00‚ 1207900.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 1152050.00‚ 2033505.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚All ‚ 1900015.00‚ 3241405.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

p112d08

If there are analysis variables in the TABLE statement, the default statistic is SUM.

PROC TABULATE Statistics

Descriptive Statistic Keywords

CSS          CV           LCLM          MAX
MEAN         MIN          MODE          N            NMISS
KURTOSIS     RANGE        SKEWNESS      STDDEV       STDERR
SUM          SUMWGT       UCLM          USS          VAR
PCTN         REPPCTN      PAGEPCTN      ROWPCTN      COLPCTN
PCTSUM       REPPCTSUM    PAGEPCTSUM    ROWPCTSUM    COLPCTSUM

Quantile Statistic Keywords

MEDIAN | P50    P1     P5     P10    Q1 | P25
Q3 | P75        P90    P95    P99    QRANGE

Hypothesis Testing Keywords

PROBT    T

PROC TABULATE Statistics


proc tabulate data=orion.sales;
class Gender Country;
var Salary;
table Gender all, Country*Salary*(min max);
run;

„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Country ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ AU ‚ US ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ Salary ‚ Salary ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ Min ‚ Max ‚ Min ‚ Max ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Gender ‚ ‚ ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚
‚F ‚ 25185.00‚ 30890.00‚ 25390.00‚ 83505.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚M ‚ 25745.00‚ 108255.00‚ 22710.00‚ 243190.00‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚All ‚ 25185.00‚ 108255.00‚ 22710.00‚ 243190.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

p112d06

Additional SAS Statements
Additional statements can be added to enhance the report.

proc format;
   value $ctryfmt 'AU'='Australia'
                  'US'='United States';
run;

options nodate pageno=1;

ods html file='p112d08.html';
proc tabulate data=orion.sales;
   class Gender Country;
   var Salary;
   table Gender all, Country*Salary*(min max);
   where Job_Title contains 'Rep';
   label Salary='Annual Salary';
   format Country $ctryfmt.;
   title 'Sales Rep Tabular Report';
run;
ods html close;

p112d08

Additional SAS Statements


HTML Output

Output Data Sets
PROC TABULATE produces output data sets using the following method:

     PROC TABULATE DATA=SAS-data-set
                   OUT=SAS-data-set <options>;

The output data set contains the following variables:
• BY variables
• class variables
• the automatic variables _TYPE_, _PAGE_, and _TABLE_
• calculated statistics

PROC Statement OUT= Option


proc tabulate data=orion.sales
              out=work.tabulate;
   where Job_Title contains 'Rep';
   class Job_Title Gender Country;
   table Country;
   table Gender, Country;
   table Job_Title, Gender, Country;
run;

proc print data=work.tabulate;
run;

p112d09

PROC Statement OUT= Option
Partial PROC PRINT Output

Obs    Job_Title         Gender    Country    _TYPE_    _PAGE_    _TABLE_     N

  1                                AU         001       1         1          61
  2                                US         001       1         1          98
  3                      F         AU         011       1         2          27
  4                      F         US         011       1         2          40
  5                      M         AU         011       1         2          34
  6                      M         US         011       1         2          58
  7    Sales Rep. I      F         AU         111       1         3           8
  8    Sales Rep. I      F         US         111       1         3          13
  9    Sales Rep. I      M         AU         111       1         3          13
 10    Sales Rep. I      M         US         111       1         3          29
 11    Sales Rep. II     F         AU         111       2         3          10
 12    Sales Rep. II     F         US         111       2         3          14
 13    Sales Rep. II     M         AU         111       2         3           8
 14    Sales Rep. II     M         US         111       2         3          14
 15    Sales Rep. III    F         AU         111       3         3           7
 16    Sales Rep. III    F         US         111       3         3           8
 17    Sales Rep. III    M         AU         111       3         3          10
 18    Sales Rep. III    M         US         111       3         3           9

PROC Statement OUT= Option


_TYPE_ is a character variable that shows which
combination of class variables produced the summary
statistics in that observation.

Partial PROC PRINT Output

Obs    Job_Title         Gender    Country    _TYPE_    _PAGE_    _TABLE_     N

  1                                AU         001       1         1          61
  2                                US         001       1         1          98
  3                      F         AU         011       1         2          27
  4                      F         US         011       1         2          40
  5                      M         AU         011       1         2          34
  6                      M         US         011       1         2          58

In the _TYPE_ value 011, the digits are 0 for Job_Title, 1 for Gender, and 1 for Country.

PROC Statement OUT= Option
_PAGE_ is a numeric variable that shows the logical page
number that contains that observation.

Partial PROC PRINT Output

Obs    Job_Title         Gender    Country    _TYPE_    _PAGE_    _TABLE_     N

  7    Sales Rep. I      F         AU         111       1         3           8
  8    Sales Rep. I      F         US         111       1         3          13
  9    Sales Rep. I      M         AU         111       1         3          13
 10    Sales Rep. I      M         US         111       1         3          29
 11    Sales Rep. II     F         AU         111       2         3          10
 12    Sales Rep. II     F         US         111       2         3          14
 13    Sales Rep. II     M         AU         111       2         3           8
 14    Sales Rep. II     M         US         111       2         3          14
 15    Sales Rep. III    F         AU         111       3         3           7
 16    Sales Rep. III    F         US         111       3         3           8
 17    Sales Rep. III    M         AU         111       3         3          10
 18    Sales Rep. III    M         US         111       3         3           9

Observations 7-10:  Page 1 for Sales Rep. I
Observations 11-14: Page 2 for Sales Rep. II
Observations 15-18: Page 3 for Sales Rep. III

PROC Statement OUT= Option


_TABLE_ is a numeric variable that shows the number
of the TABLE statement that contains that observation.

Partial PROC PRINT Output

Obs    Job_Title         Gender    Country    _TYPE_    _PAGE_    _TABLE_     N

  1                                AU         001       1         1          61
  2                                US         001       1         1          98
  3                      F         AU         011       1         2          27
  4                      F         US         011       1         2          40
  5                      M         AU         011       1         2          34
  6                      M         US         011       1         2          58
  7    Sales Rep. I      F         AU         111       1         3           8
  8    Sales Rep. I      F         US         111       1         3          13
  9    Sales Rep. I      M         AU         111       1         3          13
 10    Sales Rep. I      M         US         111       1         3          29

Observations 1-2:  1 for first TABLE statement
Observations 3-6:  2 for second TABLE statement
Observations 7-10: 3 for third TABLE statement
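
The automatic variables can be used to subset the output data set. The steps below are a sketch that assumes the work.tabulate data set created by the preceding PROC TABULATE step (p112d09). They are not part of the course programs.

```sas
/* Keep only the rows produced by the second TABLE statement (Gender, Country). */
/* _TABLE_ is numeric, so the value is not quoted.                              */
proc print data=work.tabulate;
   where _TABLE_=2;
   var Gender Country N;
run;

/* _TYPE_ is a character variable, so its value is quoted.             */
/* _PAGE_=1 selects the first logical page of the three-way table.     */
proc print data=work.tabulate;
   where _TYPE_='111' and _PAGE_=1;
run;
```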

Exercises

Level 1

9. Creating a Simple Tabular Report with PROC TABULATE


a. Retrieve the starter program p112e09.

b. Add a CLASS statement to enable Customer_Group and Customer_Gender
as classification variables.
c. Add a VAR statement to enable Customer_Age as an analysis variable.
d. Add a TABLE statement to create a report with the following characteristics:
1) Customer_Group defines the rows.
2) An extra row that combines all groups appears at the bottom of the table.
3) Customer_Gender defines the columns.
4) The N and MEAN statistics based on Customer_Age are displayed for each
combination of Customer_Group and Customer_Gender.
e. Submit the program to produce the following report:

PROC TABULATE Output

Ages of Customers by Group and Gender

„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Customer Gender ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ F ‚ M ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ Customer Age ‚ Customer Age ‚
‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚ N ‚ Mean ‚ N ‚ Mean ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Customer Group Name ‚ ‚ ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚ ‚ ‚
‚Internet/Catalog ‚ ‚ ‚ ‚ ‚
‚Customers ‚ 4.00‚ 49.25‚ 4.00‚ 54.25‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club Gold ‚ ‚ ‚ ‚ ‚
‚members ‚ 11.00‚ 35.36‚ 10.00‚ 38.90‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club members ‚ 15.00‚ 32.53‚ 33.00‚ 47.03‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚All ‚ 30.00‚ 35.80‚ 47.00‚ 45.91‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

Level 2

10. Creating a Three-Dimensional Tabular Report with PROC TABULATE


a. Retrieve the starter program p112e10.
b. Define a tabular report with the following characteristics:
1) Customer_Gender defines the page dimension.

2) Customer_Group defines the row dimension.

3) The column dimension should display the number of customers and the percentage
of customers in each category (COLPCTN).

Change the headers for the statistic columns with a KEYLABEL statement.
Documentation about the KEYLABEL statement can be found in the SAS Help
and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The TABULATE Procedure).

c. Submit the program to produce the following two-page report:

PROC TABULATE Output

Customers by Group and Gender

Customer Gender F
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Number ‚ Percentage ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Customer Group Name ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚Internet/Catalog ‚ ‚ ‚
‚Customers ‚ 4.00‚ 13.33‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club Gold ‚ ‚ ‚
‚members ‚ 11.00‚ 36.67‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club members ‚ 15.00‚ 50.00‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

Customers by Group and Gender

Customer Gender M
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ†
‚ ‚ Number ‚ Percentage ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Customer Group Name ‚ ‚ ‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚ ‚
‚Internet/Catalog ‚ ‚ ‚
‚Customers ‚ 4.00‚ 8.51‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club Gold ‚ ‚ ‚
‚members ‚ 10.00‚ 21.28‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚Orion Club members ‚ 33.00‚ 70.21‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

Level 3

y o n
11. Creating a Customized Tabular Report with PROC TABULATE

a. Retrieve the starter program p112e11.
b. Modify the label for the Total_Retail_Price variable.
c. Suppress the labels for the Order_Date and Product_ID variables.
d. Suppress the label for the SUM keyword.
e. Insert this text into the box above the row titles: High Cost Products (Unit Cost >
$250). Suppress all titles.
f. Display all calculated cell values with the DOLLAR12. format.
g. Display $0 in all cells that have no calculated value.

Documentation about the TABULATE procedure can be found in the SAS Help
and Documentation from the Contents tab (SAS Products ⇒ Base SAS ⇒
Base SAS 9.2 Procedures Guide ⇒ Procedures ⇒ The TABULATE Procedure).
Look for features of the PROC TABULATE statement, the TABLE statement, and
the KEYLABEL statement that can perform the requested actions.

h. Submit the program to produce the following report:

PROC TABULATE Output
„ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ†
‚High Cost Products ‚ Revenue for Each Product ‚
‚(Unit Cost > $250) ‡ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚ ‚230100700008‚230100700009‚240300100028‚240300100032‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚2003 ‚ $0‚ $0‚ $0‚ $1,200‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚2005 ‚ $2,057‚ $2,256‚ $0‚ $0‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚2006 ‚ $0‚ $1,136‚ $0‚ $0‚
‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒ‰
‚2007 ‚ $519‚ $0‚ $1,066‚ $0‚
Šƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒŒ

12. Creating an Output Data Set with PROC TABULATE

a. Retrieve the starter program p112e12.
b. Create an output data set from the PROC TABULATE results. The output data set should contain
average salaries for each combination of Company and Employee_Gender, plus overall
averages for each Company.

Creating an output data set from PROC TABULATE results is discussed in the self-study
content at the end of this section.

c. Sort the data set by average salary.
d. Print the sorted data set. Assign a format and column header to the average salary column.

e. Submit the program to produce the following report:

PROC PRINT Output

Average Employee Salaries

                              Employee     Average
Obs    Company                Gender        Salary

  1    Orion Australia        F            $27,760
  2    Orion USA              F            $29,167
  3    Orion Australia                     $30,574
  4    Orion USA                           $31,226
  5    Orion USA              M            $32,534
  6    Orion Australia        M            $32,963
  7    Concession             F            $33,375
  8    Purchasing             M            $33,462
  9    Concession                          $33,839
 10    Concession             M            $34,650
 11    Purchasing                          $38,408
 12    Logistics              F            $39,055
 13    Purchasing             F            $41,556
 14    Marketing              M            $42,645
 15    Logistics                           $43,128
 16    Shared Functions       M            $43,428
 17    Marketing                           $44,390
 18    Shared Functions                    $44,631
 19    Shared Functions       F            $46,016
 20    Marketing              F            $47,132
 21    Logistics              M            $47,630
 22    Board of Directors     F            $68,370
 23    Board of Directors                 $134,034
 24    Board of Directors     M           $212,831

12.4 Chapter Review

Chapter Review
1. What statistics are produced by default by PROC FREQ?

2. How can you produce a two-way frequency table using PROC FREQ?

3. What is the purpose of the VAR statement in PROC MEANS?

4. What is the purpose of the CLASS statement in PROC MEANS?

5. How can you change which statistics are displayed in PROC MEANS output?

12.5 Solutions

Solutions to Exercises
1. Counting Levels of a Variable with PROC FREQ
a. Retrieve the starter program.
b. Modify the program to produce two separate reports.
proc freq data=orion.orders nlevels;
   where Order_Type=1;
   tables Customer_ID Employee_ID / noprint;
   title1 'Unique Customers and Salespersons for Retail Sales';
run;

proc freq data=orion.orders nlevels;
   where Order_Type ne 1;
   tables Customer_ID / noprint;
   title1 'Unique Customers for Catalog and Internet Sales';
run;
c. Submit the program.
2. Producing Frequency Reports with PROC FREQ

a. Retrieve the starter program.
b. Add TABLES statements to the PROC FREQ step.
proc format;
   value ordertypes
         1='Retail'
         2='Catalog'
         3='Internet';
run;

proc freq data=orion.orders;
   tables Order_Date;
   tables Order_Type / nocum;
   tables Order_Date*Order_Type / nopercent norow nocol;
   format Order_Date year4. Order_Type ordertypes.;
   title 'Order Summary by Year and Type';
run;
c. Submit the program.

3. Displaying PROC FREQ Output in Descending Frequency Order


a. Retrieve the starter program.
b. Submit the program.
c. What are the two most common values for each variable?
The top two countries are US (United States, 28 customers) and CA (Canada, 15 customers).

The top two customer types are Orion Club members medium activity (20) and
Orion Club members low activity (17).

The top two customer age groups are 31-45 years (27) and 15-30 years (22).

d. Modify the program to display the frequency counts in descending order.
proc freq data=orion.customer_dim order=freq;
   tables Customer_Country Customer_Type Customer_Age_Group;
   title1 'Customer Demographics';
   title3 '(Top two levels for each variable?)';
run;
e. Submit the modified program.
f. What are the two most common values for each variable?
The top two countries are US (United States, 28 customers) and CA (Canada, 15 customers).

The top two customer types are Orion Club members medium activity (20) and
Orion Club members low activity (17).

The top two customer age groups are 31-45 years (27) and 15-30 years (22).

Which report was easier to use to answer the questions correctly?
The ORDER=FREQ option in the PROC FREQ statement sequences the frequency output
in descending count order. Because the levels that occur most often appear near the top of
each report, the most common data values can be identified more easily.

4. Creating an Output Data Set with PROC FREQ
a. Retrieve the starter program.
b. Create an output data set.
proc freq data=orion.order_fact noprint;
   tables Product_ID / out=product_orders;
run;
c. Combine the output data set with orion.product_list.
data product_names;
   merge product_orders orion.product_list;
   by Product_ID;
   keep Product_ID Product_Name Count;
run;
d. Sort the merged data and print the first 10 observations.
proc sort data=product_names;
   by descending Count;
run;

proc print data=product_names(obs=10) label;
   var Count Product_ID Product_Name;
   label Product_ID='Product Number'
         Product_Name='Product'
         Count='Orders';
   title 'Top Ten Products by Number of Orders';
run;
e. Submit the program.
5. Creating a Summary Report with PROC MEANS
a. Retrieve the starter program.
b. Display only the SUM statistic for the Total_Retail_Price variable.
proc format;
   value ordertypes
         1='Retail'
         2='Catalog'
         3='Internet';
run;

proc means data=orion.order_fact sum;
   var Total_Retail_Price;
   title 'Revenue (in U.S. Dollars) Earned from All Orders';
run;
c. Display separate statistics for the combination of Order_Date and Order_Type.
proc means data=orion.order_fact sum;
   var Total_Retail_Price;
   class Order_Date Order_Type;
   format Order_Date year4. Order_Type ordertypes.;
   title 'Revenue (in U.S. Dollars) Earned from All Orders';
run;
d. Submit the program.
6. Analyzing Missing Numeric Values with PROC MEANS
a. Retrieve the starter program.
b. Display the number of missing values and the number of nonmissing values.
proc means data=orion.staff nmiss n;
   var Birth_Date Emp_Hire_Date Emp_Term_Date;
   title 'Number of Missing and Non-Missing Date Values';
run;
c. Suppress any decimal places.
proc means data=orion.staff nmiss n maxdec=0;
   var Birth_Date Emp_Hire_Date Emp_Term_Date;
   title 'Number of Missing and Non-Missing Date Values';
run;
d. Display separate statistics for each value of Gender.
proc means data=orion.staff nmiss n maxdec=0;
   var Birth_Date Emp_Hire_Date Emp_Term_Date;
   class Gender;
   title 'Number of Missing and Non-Missing Date Values';
run;
e. Suppress the output column that displays the total number of observations.
proc means data=orion.staff nmiss n maxdec=0 nonobs;
   var Birth_Date Emp_Hire_Date Emp_Term_Date;
   class Gender;
   title 'Number of Missing and Non-Missing Date Values';
run;
f. Submit the program.
7. Analyzing All Possible Classification Levels with PROC MEANS

p r t r
a. Retrieve the starter program.
S
r o dis SA
b. Display statistics in the report.
data work.countries(keep=Customer_Country);

S P o r
set orion.supplier;

o f
Customer_Country=Country;

f e
run;

A
S No tsit d
proc means data=orion.customer_dim
lclm mean uclm;
class Customer_Country;

ou
var Customer_Age;
title 'Average Age of Customers in Each Country';
run;

c. Change the value.


proc means data=orion.customer_dim
lclm mean uclm alpha=0.10;
class Customer_Country;
var Customer_Age;
title 'Average Age of Customers in Each Country';
run;

d. Display all countries.


proc means data=orion.customer_dim
classdata=work.countries
lclm mean uclm alpha=0.10;
class Customer_Country;
var Customer_Age;
title 'Average Age of Customers in Each Country';
run;
e. Submit the program.
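
The effect of the CLASSDATA= option can be tried on a smaller scale with the following sketch. The data sets work.demo_customers and work.demo_countries are hypothetical, and the sketch assumes that the class variable is defined with the same type and length in both data sets.

```sas
/* Hypothetical analysis data: no customers from DE */
data work.demo_customers;
   length Country $ 2;
   input Country $ Age;
   datalines;
AU 34
AU 51
US 28
;
run;

/* Hypothetical list of every country that should appear in the report */
data work.demo_countries;
   length Country $ 2;
   input Country $;
   datalines;
AU
DE
US
;
run;

/* CLASSDATA= adds the DE level to the report even though no
   observation in work.demo_customers has that value */
proc means data=work.demo_customers classdata=work.demo_countries n mean;
   class Country;
   var Age;
   title 'Average Age, Including Countries with No Customers';
run;
title;
```

The DE row is displayed with an N Obs value of 0, which is how classification levels that are absent from the analysis data show up in the report.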
8. Creating an Output Data Set with PROC MEANS
a. Retrieve the starter program.

b. Create an output data set.
proc means data=orion.order_fact noprint nway;
class Product_ID;
var Total_Retail_Price;
output out=product_orders sum=Product_Revenue;
run;
c. Combine the output data set with orion.product_list.

data product_names;
merge product_orders orion.product_list;
by Product_ID;
keep Product_ID Product_Name Product_Revenue;
run;
d. Sort the merged data and print the first 10 observations.

proc sort data=product_names;
by descending Product_Revenue;
run;

proc print data=product_names(obs=10) label;
var Product_Revenue Product_ID Product_Name;
label Product_ID='Product Number'
Product_Name='Product'
Product_Revenue='Revenue';
title 'Top Ten Products by Revenue';
run;
e. Display the revenue values with a leading euro symbol.
proc print data=product_names(obs=10) label;
var Product_Revenue Product_ID Product_Name;
label Product_ID='Product Number'
Product_Name='Product'
Product_Revenue='Revenue';
format Product_Revenue eurox12.2;
title 'Top Ten Products by Revenue';
run;

f. Submit the program.
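
The same summarize, sort, and print pattern can be sketched with the sashelp.class sample data set that ships with SAS. The use of sashelp.class and the names work.height_by_sex and Avg_Height are assumptions made for illustration.

```sas
/* NOPRINT suppresses the report; NWAY keeps only the rows for the
   full CLASS combination and drops the overall summary row */
proc means data=sashelp.class noprint nway;
   class Sex;
   var Height;
   output out=work.height_by_sex mean=Avg_Height;
run;

/* Put the group with the largest mean first */
proc sort data=work.height_by_sex;
   by descending Avg_Height;
run;

/* OBS=1 limits the listing to the first observation after sorting */
proc print data=work.height_by_sex(obs=1) label;
   var Sex Avg_Height _FREQ_;
   label Avg_Height='Average Height';
   format Avg_Height 6.1;
   title 'Group with the Greatest Average Height';
run;
title;
```

The _FREQ_ variable is created automatically by the OUTPUT statement and holds the number of observations in each group.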


9. Creating a Simple Tabular Report with PROC TABULATE
a. Retrieve the starter program.
b. Add a CLASS statement.
proc tabulate data=orion.customer_dim;
class Customer_Group Customer_Gender;
title 'Ages of Customers by Group and Gender';
run;
c. Add a VAR statement.

proc tabulate data=orion.customer_dim;
class Customer_Group Customer_Gender;
var Customer_Age;
title 'Ages of Customers by Group and Gender';
run;
d. Add a TABLE statement.

proc tabulate data=orion.customer_dim;
var Customer_Age;
class Customer_Group Customer_Gender;
table Customer_Group all,
Customer_Gender*Customer_Age*(n mean);
title 'Ages of Customers by Group and Gender';
run;
e. Submit the program.

10. Creating a Three-Dimensional Tabular Report with PROC TABULATE
a. Retrieve the starter program.
b. Define a tabular report.
proc tabulate data=orion.customer_dim;
class Customer_Gender Customer_Group;
table Customer_Gender, Customer_Group, (n colpctn);
keylabel colpctn='Percentage' N='Number';
title 'Customers by Group and Gender';
run;
c. Submit the program.
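
How the three comma-separated expressions in a TABLE statement map to the page, row, and column dimensions can be sketched with the sashelp.class sample data set. The data set choice and the labels are assumptions for illustration only.

```sas
proc tabulate data=sashelp.class format=8.1;
   class Sex Age;
   var Height;
   /* first expression  = page dimension   (one table per value of Sex) */
   /* second expression = row dimension    (one row per Age, plus a total) */
   /* third expression  = column dimension (statistics for Height) */
   table Sex, Age all, Height*(n mean)
         / misstext='0' box='Height by Age';
   keylabel n='Count' mean='Average';
   title 'Height Statistics, One Page per Value of Sex';
run;
title;
```

With only two comma-separated expressions, the first would define the rows and the second the columns, and a single table would be produced.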

11. Creating a Customized Tabular Report with PROC TABULATE


a. Retrieve the starter program.
b. Modify the label for the Total_Retail_Price variable.
proc tabulate data=orion.order_fact;
where CostPrice_Per_Unit > 250;
class Product_ID Order_Date;
format Order_Date year4.;
var Total_Retail_Price;
table Order_Date, Total_Retail_Price*sum*Product_ID;
label Total_Retail_Price='Revenue for Each Product';
title;
run;

c. Suppress the labels for the Order_Date and Product_ID variables.
proc tabulate data=orion.order_fact;
where CostPrice_Per_Unit > 250;
class Product_ID Order_Date;
format Order_Date year4.;
var Total_Retail_Price;
table Order_Date=' ', Total_Retail_Price*sum*Product_ID=' ';
label Total_Retail_Price='Revenue for Each Product';
title;
run;
d. Suppress the label for the SUM keyword.

proc tabulate data=orion.order_fact;
where CostPrice_Per_Unit > 250;
class Product_ID Order_Date;
format Order_Date year4.;
var Total_Retail_Price;
table Order_Date=' ', Total_Retail_Price*sum*Product_ID=' ';
label Total_Retail_Price='Revenue for Each Product';
keylabel Sum=' ';
title;
run;
e. Insert text into the box above the row titles.
proc tabulate data=orion.order_fact;
where CostPrice_Per_Unit > 250;
class Product_ID Order_Date;
format Order_Date year4.;
var Total_Retail_Price;
table Order_Date=' ', Total_Retail_Price*sum*Product_ID=' '
/ box='High Cost Products (Unit Cost > $250)';
label Total_Retail_Price='Revenue for Each Product';
keylabel Sum=' ';
title;
run;

f. Display all calculated cell values with the DOLLAR12. format.


proc tabulate data=orion.order_fact format=dollar12.;
where CostPrice_Per_Unit > 250;
class Product_ID Order_Date;
format Order_Date year4.;
var Total_Retail_Price;
table Order_Date=' ', Total_Retail_Price*sum*Product_ID=' '
/ box='High Cost Products (Unit Cost > $250)';
label Total_Retail_Price='Revenue for Each Product';
keylabel Sum=' ';
title;
run;
g. Display $0 in all cells that have no calculated value.

proc tabulate data=orion.order_fact format=dollar12.;
where CostPrice_Per_Unit > 250;
class Product_ID Order_Date;
format Order_Date year4.;
var Total_Retail_Price;
table Order_Date=' ', Total_Retail_Price*sum*Product_ID=' '
/ misstext='$0'
box='High Cost Products (Unit Cost > $250)';
label Total_Retail_Price='Revenue for Each Product';
keylabel Sum=' ';
title;
run;
h. Submit the program.

12. Creating an Output Data Set with PROC TABULATE

a. Retrieve the starter program.
b. Create an output data set.
proc tabulate data=orion.Organization_Dim format=dollar12.
out=work.Salaries;
class Employee_Gender Company;
var Salary;
table Company, (Employee_Gender all)*Salary*mean;
title 'Average Employee Salaries';
run;
c. Sort the data set.
proc sort data=work.Salaries;
by Salary_Mean;
run;

d. Print the sorted data set.


proc print data=work.Salaries label;
var Company Employee_Gender Salary_Mean;
format Salary_Mean dollar12.;
label Salary_Mean='Average Salary';
title 'Average Employee Salaries';
run;
e. Submit the program.
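
The structure of a PROC TABULATE output data set can be inspected with a short sketch that uses the sashelp.class sample data set. The data set choice and the name work.class_summary are assumptions for illustration.

```sas
/* OUT= writes the calculated cell values to a data set */
proc tabulate data=sashelp.class out=work.class_summary;
   class Sex;
   var Height;
   table Sex all, Height*mean;
run;

/* Print every variable to see how the output data set is organized */
proc print data=work.class_summary;
   title 'Structure of a PROC TABULATE Output Data Set';
run;
title;
```

The statistic is stored in a variable named with the pattern variable_statistic (Height_Mean here, Salary_Mean in the exercise), alongside the class variables and the automatic variables _TYPE_, _PAGE_, and _TABLE_.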


Solutions to Student Activities (Polls/Quizzes)

12.01 Multiple Choice Poll – Correct Answer


Which of the following statements cannot be added
to the PROC FREQ step to enhance the report?
a. FORMAT
b. SET
c. TITLE
d. WHERE

Correct answer: b. The SET statement is a DATA step statement and cannot be used in a PROC FREQ step.

12.02 Quiz – Correct Answer

Which TABLES statement correctly creates the report?

a. tables Gender nocum;
b. tables Gender nocum nopercent;
c. tables Gender / nopercent;
d. tables Gender / nocum nopercent;

Correct answer: d. Options in the TABLES statement follow a slash, and both NOCUM and NOPERCENT are needed to leave only the Frequency column.

The FREQ Procedure

Gender    Frequency
-------------------
F                68
M                97

p112d01
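
The effect of each option combination in the quiz can be checked with a short sketch. It uses the sashelp.class sample data set and its Sex variable, which are assumptions standing in for the course data.

```sas
proc freq data=sashelp.class;
   /* no options: Frequency, Percent, Cumulative Frequency, Cumulative Percent */
   tables Sex;
   /* NOCUM drops the two cumulative columns */
   tables Sex / nocum;
   /* NOCUM and NOPERCENT together leave only the Frequency column */
   tables Sex / nocum nopercent;
   title 'Effect of the NOCUM and NOPERCENT Options';
run;
title;
```

Each TABLES statement produces its own table, so the three layouts can be compared side by side in one step.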

12.03 Quiz – Correct Answer


The first part of the PROC FREQ output is in the SAS
data set that was created with the TABLES statement.

Obs   Gender   COUNT   PERCENT   CUM_FREQ   CUM_PCT

  1     F         68   41.2121         68    41.212
  2     M         97   58.7879        165   100.000

The second part of the PROC FREQ output is in the
SAS data set that was created with the OUTPUT
statement.

Obs     N    _PCHI_   DF_PCHI     P_PCHI

  1   165   5.09697         1   0.023968

12.04 Quiz – Correct Answer
For a given data set, there are 63 observations with
a Country value of AU. Of those 63 observations,
only 61 observations have a value for Salary.

Which output is correct?

a. Analysis Variable : Salary

   Country   N Obs     N
   ---------------------
   AU           63    61
   ---------------------

b. Analysis Variable : Salary

   Country   N Obs     N
   ---------------------
   AU           61    63
   ---------------------

Correct answer: a. N Obs is the number of observations in the classification level, and N is the number of nonmissing values of the analysis variable.

12.05 Quiz – Correct Answer


proc means data=orion.sales noprint chartype;
var Salary;
class Gender Country;
output out=work.means2
min=minSalary max=maxSalary
sum=sumSalary mean=aveSalary;
run;
proc print data=work.means2;
where _type_= '10';
run;

                                         min     max      sum       ave
Obs  Gender  Country  _TYPE_  _FREQ_  Salary  Salary   Salary    Salary

  4  F                10          68   25185   83505  1955865  28762.72
  5  M                10          97   22710  243190  3185555  32840.77

p112a02s

Solutions to Chapter Review

Chapter Review Answers


1. What statistics are produced by default by PROC FREQ?
Frequency
Percent
Cumulative Frequency
Cumulative Percent

2. How can you produce a two-way frequency table using
PROC FREQ?
By using an asterisk between two variables in a
TABLES statement, you can produce a two-way
frequency table in PROC FREQ.
Example: tables Gender*Country;

3. What is the purpose of the VAR statement in PROC
MEANS?
The VAR statement identifies the analysis
variables and their order in the PROC MEANS
results.

4. What is the purpose of the CLASS statement in PROC
MEANS?
The CLASS statement identifies variables whose
values define subgroups for the PROC MEANS
analysis.


5. How can you change which statistics are displayed in
PROC MEANS output?
The statistics to compute and the order to display
them can be specified as statistical keywords in
the PROC MEANS statement.
Example:
proc means data=orion.sales sum mean;

Chapter 13 Introduction to Graphics
Using SAS/GRAPH (Self-Study)

13.1 Introduction

13.2 Creating Bar and Pie Charts
Demonstration: Creating Bar and Pie Charts

13.3 Creating Plots
Demonstration: Creating Plots

13.4 Enhancing Output
Demonstration: Enhancing Output

13.1 Introduction

What Is SAS/GRAPH Software?


SAS/GRAPH software is a component of SAS software
that enables you to create the following types of graphs:
• bar, block, and pie charts
• two-dimensional scatter plots and line plots
• contour plots
• three-dimensional scatter and surface plots
• maps
• text slides
• custom graphs

Bar Charts (GCHART Procedure)
Pie Charts (GCHART Procedure)
Scatter and Line Plots (GPLOT Procedure)
Bar Charts with Line Plot Overlay (GBARLINE Procedure)
Three-Dimensional Surface and Scatter Plots (G3D Procedure)
Three-Dimensional Contour Plots (GCONTOUR Procedure)
Maps (GMAP Procedure)
Multiple Graphs on a Page (GREPLAY Procedure)

SAS/GRAPH Programs
General form of a SAS/GRAPH program:

GOPTIONS options;
global statements
graphics procedure steps

For example:

goptions cback=white;
title 'Number of Employees by Job Title';
proc gchart data=orion.staff;
vbar Job_Title;
run;
quit;

RUN-Group Processing
Many SAS/GRAPH procedures can use RUN-group
processing, which means that the following are true:
• The procedure executes the group of statements
  following the PROC statement when a RUN
  statement is encountered.
• Additional statements followed by another RUN
  statement can be submitted without resubmitting the
  PROC statement.
• The procedure stays active until a PROC, DATA, or
  QUIT statement is encountered.

Example of RUN-Group Processing
proc gchart data=orion.staff;
vbar Job_Title;
title 'Bar Chart of Job Titles';
run;
pie Job_Title;
title 'Pie Chart of Job Titles';
run;
quit;

13.2 Creating Bar and Pie Charts

Producing Bar and Pie Charts with the GCHART Procedure

General form of the PROC GCHART statement:

PROC GCHART DATA=SAS-data-set;

Use one of these statements to specify the chart type:

HBAR chart-variable . . . </ options>;
HBAR3D chart-variable . . . </ options>;
VBAR chart-variable . . . </ options>;
VBAR3D chart-variable . . . </ options>;
PIE chart-variable . . . </ options>;
PIE3D chart-variable . . . </ options>;

The chart variable determines the number of bars or slices produced within a graph. The chart variable
can be character or numeric. By default, the height or length of each bar, or the size of each slice,
represents a frequency count of the values of the chart variable.

Creating Bar and Pie Charts


p113d01
1. Submit the first PROC GCHART step to create a vertical bar chart representing a frequency count.
The VBAR statement creates a vertical bar chart showing the number of sales representatives for
each value of Job_Title in the orion.staff data set. Job_Title is referred to as the chart
variable. Because the chart variable is a character variable, PROC GCHART displays one bar for
each value of Job_Title.
goptions reset=all;
proc gchart data=orion.staff;
vbar Job_Title;
where Job_Title =:'Sales Rep';
title 'Number of Employees by Job Title';
run;
quit;
The RESET=ALL option resets all graphics options to their default settings and clears any titles
or footnotes that are in effect.

2. Submit the second PROC GCHART step to create a three-dimensional horizontal bar chart.
The HBAR3D statement creates a three-dimensional horizontal bar chart showing the same
information as the previous bar chart. Notice that the HBAR and HBAR3D statements automatically
display statistics to the right of the chart.
goptions reset=all;
proc gchart data=orion.staff;
hbar3d Job_Title;
title 'Number of Employees by Job Title';
where Job_Title =:'Sales Rep';
run;
quit;

3. Submit the third PROC GCHART step to suppress the display of statistics on the horizontal bar chart.
The NOSTATS option in the HBAR3D statement suppresses the display of statistics on the chart.
goptions reset=all;
proc gchart data=orion.staff;
hbar3d Job_Title / nostats;
title 'Number of Employees by Job Title';
where Job_Title =:'Sales Rep';
run;
quit;


4. Submit the fourth PROC GCHART step to use a numeric chart variable.
The VBAR3D statement creates a vertical bar chart showing the distribution of values of the variable
Salary. Because the chart variable is numeric, PROC GCHART divides the values of Salary into
ranges and displays one bar for each range. The value under the bar represents the midpoint of the
range. The FORMAT statement assigns the DOLLAR9. format to Salary.
goptions reset=all;
proc gchart data=orion.staff;
vbar3d salary / autoref;
where Job_Title =:'Sales Rep';
format salary dollar9.;
title 'Salary Distribution Midpoints for Sales Reps';
run;
quit;

5. Submit the fifth PROC GCHART step to specify ranges for a numeric chart variable and add
reference lines.
The HBAR3D statement creates a three-dimensional horizontal bar chart.
The LEVELS= option in the HBAR3D statement divides the values of Salary into five ranges and
displays a bar for each range of values.
The RANGE option in the HBAR3D statement displays the range of values, rather than the midpoint,
under each bar.
The AUTOREF option displays reference lines at each major tick mark on the horizontal (response)
axis.
goptions reset=all;
proc gchart data=orion.staff;
hbar3d salary/levels=5 range autoref;
where Job_Title =:'Sales Rep';
format salary dollar9.;
title 'Salary Distribution Ranges for Sales Reps';
run;
quit;
To display a bar for each unique value of the chart variable, specify the DISCRETE option instead
of the LEVELS= option in the VBAR, VBAR3D, HBAR, or HBAR3D statement. The DISCRETE
option should be used only when the chart variable has a relatively small number of unique values.
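
The difference that the DISCRETE option makes can be sketched with the sashelp.class sample data set, whose numeric Age variable has only a few distinct values. The data set choice is an assumption made for illustration; it is not one of the course data sets.

```sas
goptions reset=all;
proc gchart data=sashelp.class;
   /* Default behavior: the numeric Age values are grouped into ranges,
      and each bar is labeled with the midpoint of its range */
   vbar Age;
   title 'Age Grouped into Ranges (Default)';
run;
   /* DISCRETE: each distinct Age value gets its own bar */
   vbar Age / discrete;
   title 'One Bar for Each Age Value (DISCRETE)';
run;
quit;
```

Both charts come from one PROC GCHART step because RUN-group processing keeps the procedure active until the QUIT statement.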


6. Submit the sixth PROC GCHART step to create bar charts based on statistics.
Specifying Job_Title as the chart variable in the VBAR statement produces a vertical bar chart
with one bar for each value of Job_Title. The height of each bar is based on the mean value of
the variable Salary for that job title.
The SUMVAR= option in the VBAR statement specifies the variable whose values control the
height or length of the bars. This variable (Salary, in this case) is known as the analysis variable.
The TYPE= option in the VBAR statement specifies the statistic for the analysis variable that
dictates the height or length of the bars. Possible values for the TYPE= option are SUM and
MEAN.
A LABEL statement assigns labels to the variables Job_Title and Salary.
goptions reset=all;
proc gchart data=orion.staff;
vbar Job_Title / sumvar=salary type=mean;
where Job_Title =:'Sales Rep';
format salary dollar9.;
label Job_Title='Job Title'
Salary='Salary';
title 'Average Salary by Job Title';
run;
quit;
If the SUMVAR= option is specified and the TYPE= option is not, the default value of the TYPE= option is SUM.

7. Submit the seventh PROC GCHART step to assign a different color to each bar and display the
mean statistic on the top of each bar.
The PATTERNID=MIDPOINT option in the VBAR statement causes PROC GCHART to assign
a different pattern or color to each value of the midpoint (chart) variable.
The MEAN option in the VBAR statement displays the mean statistic on the top of each bar. Other
options such as SUM, FREQ, and PERCENT can be specified to display other statistics on the top
of the bars.
goptions reset=all;
proc gchart data=orion.staff;
vbar Job_Title / sumvar=salary type=mean patternid=midpoint mean;
where Job_Title =:'Sales Rep';
format salary dollar9.;
title 'Average Salary by Job Title';
run;
quit;
Only one statistic can be displayed on top of each vertical bar. For horizontal bar charts, you can
specify multiple statistics, which are displayed to the right of the bars.
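
A horizontal bar chart that requests more than one statistic might look like the following sketch. It uses the sashelp.class sample data set as an assumed stand-in for the course data and requests the FREQ and MEAN statistics together.

```sas
goptions reset=all;
proc gchart data=sashelp.class;
   /* Bar length is the mean of Height. FREQ and MEAN ask for both
      statistics to be listed to the right of each bar. */
   hbar Sex / sumvar=Height type=mean freq mean;
   title 'Average Height by Sex';
run;
quit;
```

Replacing HBAR with VBAR in this sketch would not give the same result, because a vertical bar can display only one statistic.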


8. Submit the eighth PROC GCHART step to divide the bars into subgroups.
A VBAR statement creates a vertical bar chart that shows the number of sales representatives for
each value of Job_Title.

The SUBGROUP= option in the VBAR statement divides the bar into sections, where each section
represents the frequency count for each value of the variable Gender.
goptions reset=all;
proc gchart data=orion.staff;
vbar Job_Title/subgroup=Gender;
where Job_Title =:'Sales Rep';
title 'Frequency of Job Title, Broken Down by Gender';
run;
quit;

9. Submit the ninth PROC GCHART step to group the bars.


A VBAR statement creates a vertical bar chart showing the frequency for each value of Gender.

The GROUP= option in the VBAR statement displays a separate set of bars for each value of
Job_Title.

The PATTERNID=MIDPOINT option in the VBAR statement displays each value of Gender
(the midpoint or chart variable) with the same pattern.
goptions reset=all;
proc gchart data=orion.staff;
vbar gender/group=Job_Title patternid=midpoint;
where Job_Title =:'Sales Rep';
title 'Frequency of Gender, Grouped by Job Title';
run;
quit;
Submitting the following statements (reversing the chart and group variables) would produce a chart
with two groups of four bars:
proc gchart data=orion.staff;
vbar Job_Title/group=Gender patternid=midpoint;

10. Submit the final PROC GCHART step to create multiple pie charts using RUN-group processing.
A PIE statement and a PIE3D statement in the same PROC GCHART step produce both a two-dimensional
pie chart and a three-dimensional pie chart showing the number of sales representatives for each
value of Job_Title.

The TITLE2 statements specify a different subtitle for each chart. The NOHEADING option in
the PIE3D statement suppresses the FREQUENCY of Job_Title heading.
Because RUN-group processing is in effect, note the following:
• It is not necessary to submit a separate PROC GCHART statement for each graph.
• The WHERE statement is applied to both charts.
• The TITLE statement is applied to both charts.
• A separate TITLE2 statement is used for each chart.
goptions reset=all;
proc gchart data=orion.staff;
pie Job_Title;
where Job_Title =:'Sales Rep';
title 'Frequency Distribution of Job Titles';
title2 '2-D Pie Chart';
run;
pie3d Job_Title / noheading;
title2 '3-D Pie Chart';
run;
quit;

13.3 Creating Plots

Producing Plots with the GPLOT Procedure


You can use the GPLOT procedure to plot one variable
against another within a set of coordinate axes.

General form of a PROC GPLOT step:

PROC GPLOT DATA=SAS-data-set;
PLOT vertical-variable*horizontal-variable </ options>;
RUN;
QUIT;

Creating Plots
p113d02
1. Submit the first PROC GPLOT step to create a simple scatter plot.
The PROC GPLOT step creates a plot displaying the values of the variable Yr2007 on the vertical axis
and Month on the horizontal axis. The points are displayed using the default plotting symbol (a plus
sign).
A FORMAT statement assigns the DOLLAR12. format to Yr2007.
goptions reset=all;
proc gplot data=orion.budget;
plot Yr2007*Month;
format Yr2007 dollar12.;
title 'Plot of Budget by Month';
run;
quit;

2. Submit the second PROC GPLOT step to specify plot symbols and interpolation lines.
The SYMBOL statement specifies an alternate plotting symbol and draws an interpolation
line joining the plot points. The options in the SYMBOL statement are as follows:
• V= specifies the plotting symbol (a dot).
• I= specifies the interpolation method to be used to connect the points (join).
• CV= specifies the color of the plotting symbol.
• CI= specifies the color of the interpolation line.
The LABEL statement assigns a label to the variable Yr2007.
goptions reset=all;
proc gplot data=orion.budget;
plot Yr2007*Month / haxis=1 to 12;
label Yr2007='Budget';
format Yr2007 dollar12.;
title 'Plot of Budget by Month';
symbol1 v=dot i=join cv=red ci=blue;
run;
quit;

3. Submit the third PROC GPLOT step to overlay multiple plot lines on the same set of axes.
The PLOT statement specifies two separate plot requests (Yr2006*Month and Yr2007*Month),
which results in an overlay plot with the variables Yr2006 and Yr2007 on the vertical axis and
Month on the horizontal axis.

The options in the PLOT statement are as follows:
• The OVERLAY option causes both plot requests to be displayed on the same set of axes.
• The HAXIS= option specifies the range of values for the horizontal axis. (The VAXIS= option
  can be used to specify the range for the vertical axis.)
• The VREF= option specifies a value on the vertical axis where a reference line should be drawn.
• The CFRAME= option specifies a color to be used for the background within the plot axes.
goptions reset=all;
proc gplot data=orion.budget;
plot Yr2006*Month yr2007*Month/ overlay haxis=1 to 12
vref=3000000
cframe="very light gray";
label Yr2006='Budget';
format Yr2006 dollar12.;
title 'Plot of Budget by Month for 2006 and 2007';
symbol1 i=join v=dot ci=blue cv=blue;
symbol2 i=join v=triangle ci=red cv=red;
run;
quit;
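
The VAXIS= option mentioned above is not used in the demonstration program. A minimal sketch of it, together with HAXIS= and VREF=, follows. The sketch assumes the sashelp.class sample data set in place of orion.budget, and the axis ranges were chosen to cover the Height and Weight values in that data set.

```sas
goptions reset=all;
/* Plot the points as dots without an interpolation line */
symbol1 v=dot i=none cv=blue;
proc gplot data=sashelp.class;
   /* HAXIS= and VAXIS= set the tick marks on each axis;
      VREF= draws a horizontal reference line at Height=60 */
   plot Height*Weight / haxis=50 to 150 by 25
                        vaxis=50 to 75 by 5
                        vref=60;
   title 'Height by Weight';
run;
quit;
```

If an axis range does not cover all of the data, points outside the range are not plotted, so the ranges should be checked against the data before they are fixed.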


13.4 Enhancing Output

Enhancing SAS/GRAPH Output


SAS/GRAPH uses default values for colors, fonts, text
size, and other graph attributes. You can override these
defaults using the following methods:
• specifying an ODS style
• specifying default attributes in a GOPTIONS statement
• specifying attributes and options in global statements
  and procedure statements

Enhancing Output
p113d03
1. Submit the first PROC GPLOT step to use ODS styles to control the appearance of the output.
Specifying a style with the STYLE= option in the ODS LISTING statement applies a different
ODS style to the output.
ods listing style=gears;
goptions reset=all;
proc gplot data=orion.budget;
plot Yr2007*Month;
format Yr2007 dollar12.;
label Yr2007='Budget';
title 'Plot of Budget by Month';
run;
quit;

2. Submit the second PROC GPLOT step to specify the options in the TITLE and FOOTNOTE
statements to control the text appearance.
The TITLE and FOOTNOTE statements override the default fonts, colors, height, and text
justification. The following options are used:
• F= (or FONT=) specifies a font.
• C= (or COLOR=) specifies text color.
• H= (or HEIGHT=) specifies text height. Units of height can be specified as a percent of the
  display (PCT), inches (IN), centimeters (CM), cells (CELLS), or points (PT).
• J= (or JUSTIFY=) specifies text justification. Valid values are LEFT (L), CENTER (C),
  and RIGHT (R).
All options apply to text following the option.
ods listing style=gears;
goptions reset=all;
proc gplot data=orion.budget;
plot Yr2007*Month / vref=3000000;
label Yr2007='Budget';
format Yr2007 dollar12.;
title f=centbi h=5 pct 'Budget by Month';
footnote c=green j=left 'Data for 2007';
run;
quit;

3. Submit the third PROC GPLOT step to use a GOPTIONS statement to control the appearance
of the output.
A GOPTIONS statement specifies options to control the appearance of all text in the graph.
The following options are used:
• FTEXT= specifies a font for all text.
• CTEXT= specifies the color for all text.
• HTEXT= specifies the height for all text.
If a text option is specified both in a GOPTIONS statement and in a TITLE or FOOTNOTE
statement, the option specified in the TITLE or FOOTNOTE statement overrides the value in the
GOPTIONS statement only for that title or footnote.
ods listing style=gears;
goptions reset=all ftext=centb htext=3 pct ctext=dark_blue;
proc gplot data=orion.budget;
plot Yr2007*Month / vref=3000000;
label Yr2007='Budget';
format Yr2007 dollar12.;
title f=centbi 'Budget by Month';
run;
quit;
Chapter 14 Learning More

14.1 SAS Resources

14.2 Beyond This Course

14.1 SAS Resources

Objectives
Identify areas of support that SAS offers.


Education

Comprehensive training to deliver greater value to your
organization
• More than 200 course offerings
• World-class instructors
• Multiple delivery methods: instructor-led and
  self-paced
• Training centers around the world

http://support.sas.com/training/

SAS Publishing
SAS offers a complete selection of publications to help
customers use SAS software to its fullest potential:

• Multiple delivery methods: e-books,
  CD-ROM, and hard-copy books
• Wide spectrum of topics
• Partnerships with outside authors,
  other publishers, and distributors

http://support.sas.com/publishing/

SAS Global Certification Program
SAS offers several globally recognized certifications.
• Computer-based certification exams, typically 60-70 questions and
  2-3 hours in length
• Preparation materials and practice exams available
• Worldwide directory of SAS Certified Professionals

http://support.sas.com/certify/

Support
SAS provides a variety of self-help and assisted-help
resources.

• SAS Knowledge Base
• Downloads and hot fixes
• License assistance
• SAS discussion forums
• SAS Technical Support

http://support.sas.com/techsup/

User Groups
SAS supports many local, regional, international, and
special-interest SAS user groups.
• SAS Global Forum
• Online SAS Community: www.sasCommunity.org

http://support.sas.com/usergroups/

14.2 Beyond This Course

Objectives
Identify the next set of courses that follow this course.

Next Steps
SAS® Programming 1: Essentials is the entry point to most areas of the SAS curriculum.

[Diagram: SAS® Programming 1: Essentials leads to six areas: Applications Development Curriculum, Accessing and Manipulating Data, Web Enablement Curriculum, Data Warehousing Curriculum, Statistical Analysis Curriculum, and Presenting Your Information.]

Next Steps
To learn more about this:                               Enroll in the following:

Reading and manipulating data with the DATA step        SAS® Programming 2: Data Manipulation Techniques
Creating text-based reports                             SAS® Report Writing 1: Using Procedures and ODS
Creating graphic reports with SAS/GRAPH software        SAS® Color Graphics
Processing data with Structured Query Language (SQL)    SAS® SQL 1: Essentials

Next Steps
In addition, there are prerecorded short technical discussions and demonstrations called e-lectures.

http://support.sas.com/training/

Appendix A Index

_
_ALL_ keyword, 8-28, 12-12

A
accessing SAS data library, 4-16–4-32
APPEND procedure, 10-8, 10-34
appending data sets, 10-7–10-16
  like-structured, 10-11
  unlike-structured, 10-12–10-16
arithmetic operators, 5-15–5-16, 9-6
assignment statements, 8-51–8-55, 9-4–9-5

B
bar charts, 13-3, 13-9–13-20
  with line plot overlay, 13-5
batch mode, 2-9
BETWEEN-AND operator, 5-18
BY statement, 10-47–10-50, 11-49

C
character variables, 4-6
charts
  bar, 13-3
  bar with line plot overlay, 13-5
  creating bar and pie charts, 13-9–13-20
  pie, 13-4
CLASS statement, 12-50
  MEANS procedure, 12-29
  specifying classification variables, 12-50
classification variables
  specifying, 12-50
cleaning data, 8-3–8-8
  programmatically, 8-50, 8-55–8-56, 8-63
  UNIX example, 8-47–8-48
  Windows example, 8-44–8-46
  z/OS (OS/390) example, 8-49
cleaning invalid data, 8-41–8-63
combining data sets, 10-3–10-6
comments
  in SAS code, 3-8
comparison operators, 5-14–5-15
COMPRESS option
  FREQ procedure, 12-11
concatenating data sets, 10-19–10-40
constant operand, 5-14
CONTAINS operator, 5-19
CONTENTS procedure, 4-4–4-5, 6-9–6-10
  browsing a data library, 4-23–4-32
  NODS option, 4-23
CROSSLIST option
  TABLES statement, 12-10

D
data
  browsing with the PRINT procedure, 4-10–4-12
  cleaning invalid, 8-41–8-63
  cleaning programmatically, 8-50
  compilation phase, 7-12–7-16
  errors, 8-4
  execution phase, 7-17–7-23
  missing values, 4-8
  nonstandard, 7-9, 7-30–7-31
  reading, 5-3–5-5
  reading nonstandard delimited, 7-30–7-43
  standard, 7-9–7-10, 7-30
  validating with the FREQ procedure, 8-25–8-29
  validating with the MEANS procedure, 8-33–8-35
  validating with the PRINT procedure, 8-20–8-25
  validating with the UNIVARIATE procedure, 8-35–8-37
  validity and cleaning, 8-3–8-8
data access
  data-driven tasks, 1-6
data analysis
  data-driven tasks, 1-6
data errors, 8-9–8-18
data library
  browsing, 4-22–4-32
  browsing in UNIX, 4-28–4-30
  browsing in Windows, 4-24–4-27
  browsing in z/OS (OS/390), 4-31–4-32
data management
  data-driven tasks, 1-6
data presentation
  data-driven tasks, 1-6
data set options
  IN=, 10-78–10-84
data sets
  adding formats, 5-29–5-40
  adding labels, 5-29–5-40
  appending, 10-7–10-16
  combining, 10-3–10-6
  concatenating, 10-19–10-40
  eliminating duplicates, 10-52
  interleaving, 10-35–10-40
  like-structured, 10-20–10-28
  match-merging, 10-45–10-46
  merging, 10-4
  merging one-to-many, 10-53–10-63
  merging one-to-one, 10-44–10-52
  merging with non-matches, 10-66–10-88
  names, 4-9
  output, 12-56–12-59
  outputting to multiple, 8-17–8-18, 10-85–10-86
  renaming with RENAME= option, 10-31
  terminology, 4-10
  unlike-structured, 10-28
DATA statement, 5-9, 7-6
  DROP= option, 9-24–9-26
  KEEP= option, 9-24–9-26
DATA step, 5-6–5-11
  assignment statements, 9-4–9-5
  processing, 7-12
data-driven tasks
  data access, 1-6
  data analysis, 1-6
  data management, 1-6
  data presentation, 1-6
date functions, 9-9–9-10
date values, 4-7
delimiter
  default, 7-9
dimension expression, 12-53–12-54
DO groups, 9-35–9-37
documentation, 2-24
DROP statement, 5-23, 9-11
  processing, 9-14–9-24
  subsetting variables, 5-12–5-25
DROP= option
  DATA statement, 9-24–9-26
DSD option
  INFILE statement, 7-39–7-40

E
Editor window, 2-12
ELSE statements, 9-33, 9-40
Enhanced Editor, 2-12
Excel
  creating files that open in, 11-77–11-79
  Output Delivery System destinations, 11-72–11-77
Excel worksheets
  creating a SAS data set, 6-3–6-16
  creating from SAS data sets, 6-20–6-35
  reading in Windows, 6-15–6-16
EXPORT procedure, 6-23

F
FILE command, 3-14, 3-18
  saving programs, 3-14, 3-18
FOOTNOTE statement
  SAS statements, 11-10–11-13
FORMAT option
  TABLES statement, 12-10
FORMAT procedure
  creating user-defined formats, 11-33–11-42
  VALUE statement, 11-33
FORMAT statement, 5-29–5-40
  SAS statements, 11-25
formats
  assigning permanent, 11-27–11-28
  assigning temporary, 11-26–11-28
  user-defined, 11-32–11-42
FREQ procedure, 8-7, 12-3–12-20
  COMPRESS option, 12-11
  NLEVELS option, 12-11
  PAGE option, 12-11
  TABLES statement, 12-4–12-6
  validating data, 8-25–8-29
frequency reports
  creating, 12-12
  one-way, 12-4
frequency tables
  producing and enhancing, 12-3–12-20
FSEDIT window
  using to clean data, 8-49
functions
  date, 9-9–9-10
  sum, 9-8
fundamentals of SAS statements, 3-3–3-9

G
G3D procedure
  SAS/GRAPH software, 13-5
GBARLINE procedure
  SAS/GRAPH software, 13-5
GCHART procedure
  SAS/GRAPH software, 13-3–13-4, 13-9–13-20
GCONTOUR procedure
  SAS/GRAPH software, 13-6
GMAP procedure
  SAS/GRAPH software, 13-6
GPLOT procedure
  SAS/GRAPH software, 13-4, 13-21–13-24
graph
  multiple on a page, 13-7
GREPLAY procedure
  SAS/GRAPH software, 13-7

H
Help facility, 2-24

I
IF-THEN DELETE statements, 9-50
IF-THEN statements, 8-57–8-63, 9-31
IF-THEN/ELSE DO statements, 9-35
IF-THEN/ELSE statements, 9-31, 9-34
IMPORT procedure, 6-29–6-30
Import wizard, 6-23–6-30
IN= data set option, 10-78–10-84
INCLUDE command, 2-17
INFILE statement, 7-6–7-7
  DSD option, 7-39–7-40
  MISSOVER option, 7-42
informats, 7-31–7-33
INPUT statement, 7-7
interleaving data sets, 10-35–10-40
invalid data
  cleaning, 8-41–8-63
IS MISSING operator, 5-19
IS NULL operator, 5-19

K
KEEP statement, 5-23, 9-11
  processing, 9-14–9-24
  subsetting variables, 5-12–5-25
KEEP= option
  DATA statement, 9-24–9-26

L
LABEL statement, 5-29–5-40, 11-21
labels
  assigning permanent, 11-24
  assigning temporary, 11-21–11-23
LENGTH statement, 7-24, 9-38
LIBNAME statement, 4-35–4-39
  assigning a libref, 4-19
  SAS/ACCESS, 6-7–6-8
libref
  assigning, 4-17, 4-19, 4-36
LIKE operator, 5-20–5-21
line plots, 13-4
list input, 7-8
LIST option
  TABLES statement, 12-10
Log window, 2-12, 2-13
logical operators, 5-16–5-17

M
maps
  GMAP procedure, 13-6
match-merging, 10-45–10-46
MEANS procedure, 8-7, 12-27–12-41
  CLASS statement, 12-29
  output data sets, 12-34–12-39
  OUTPUT statement, 12-35–12-39
  validating data, 8-33–8-35
  VAR statement, 12-28
MERGE statement, 10-49–10-50
merging data sets, 10-4
  one-to-many, 10-53–10-63
  one-to-one, 10-44–10-52
  with non-matches, 10-66–10-88
missing data values, 4-8
MISSOVER option
  INFILE statement, 7-42

N
names
  data set and variable, 4-9
NLEVELS option, 8-28–8-29
  FREQ procedure, 12-11
NOCOL option
  TABLES statement, 12-8
NOCUM option
  TABLES statement, 12-8
NOFREQ option
  TABLES statement, 12-8
noninteractive mode, 2-10
NOPERCENT option
  TABLES statement, 12-8
NOPRINT option
  TABLES statement, 8-28, 12-12
NOROW option
  TABLES statement, 12-8
numeric variables, 4-6

O
observations
  subsetting, 9-44–9-50
  subsetting and grouping, 11-46–11-51
ODS
  See Output Delivery System, 11-55
one-way frequency reports, 12-4
operands, 9-5
  constant, 5-14
  variable, 5-14
operators
  arithmetic, 5-15–5-16, 9-6
  BETWEEN-AND, 5-18
  comparison, 5-14–5-15
  CONTAINS, 5-19
  IS MISSING, 5-19
  IS NULL, 5-19
  LIKE, 5-20–5-21
  logical, 5-16–5-17
  WHERE, 5-18
OPTIONS
  SAS statements, 11-6
OUT= option
  OUTPUT statement, 12-17–12-19, 12-35–12-39
  TABLES statement, 12-14–12-17
  TABULATE procedure, 12-56–12-59
output
  directing to external files, 11-55–11-82
  enhancing with SAS/GRAPH software, 13-25–13-28
output data sets, 12-56–12-59
  MEANS procedure, 12-34–12-39
Output Delivery System, 11-55–11-58
  creating HTML, PDF, and RTF files, 11-64–11-67
  destinations used with Excel, 11-72–11-77
  HTML, PDF, and RTF destinations, 11-58–11-60
  STYLE= option, 11-68
OUTPUT statement
  MEANS procedure, 12-35–12-39
  OUT= option, 12-17–12-19, 12-35–12-39
Output window, 2-12, 2-14, 2-18

P
PAGE option
  FREQ procedure, 12-11
pie charts, 13-4, 13-9–13-20
plots
  creating with the GPLOT procedure, 13-21–13-24
  line, 13-4
  scatter, 13-4
  three-dimensional contour, 13-6
  three-dimensional surface and scatter, 13-5
PRINT procedure, 4-10–4-12, 6-11, 8-6
  validating data, 8-20–8-25
PROC MEANS statement
  computing and displaying statistics, 12-31–12-34
Program Editor, 2-12, 2-17

Q
quotation marks
  balanced, 3-16–3-18
  unbalanced, 3-16

R
raw data files
  errors when reading, 8-9–8-18
reading data, 5-3–5-5
relational database
  accessing, 4-35–4-39
RENAME= option, 10-31
reports
  creating, 11-3–11-15
  enhancing, 11-5–11-15, 11-20–11-28
  tabular, 12-46–12-59
RUN-group processing
  SAS/GRAPH software, 13-8

S
SAS code
  comments, 3-8
SAS data library
  accessing, 4-16–4-32
  permanent, 4-18
  temporary, 4-18
SAS data sets, 4-3
  created from delimited raw data files, 7-3–7-26
  creating with the DATA step, 5-6–5-11
  descriptor portion, 4-4–4-5, 4-6
SAS date formats, 5-36–5-40
SAS date values, 4-7
SAS documentation, 2-24
SAS Enterprise Guide, 2-9
SAS Explorer window
  browsing a data library, 4-22
SAS filename
  temporary, 4-21
SAS formats, 5-33–5-40
SAS functions, 8-52, 9-8
SAS Help facility, 2-24
SAS informats, 7-31–7-33
SAS names
  data set and variable, 4-9
SAS programs
  debugging, 3-12–3-14
  editor windows, 2-13
  examples, 2-4–2-5
  fundamentals, 3-3–3-9
  including, 2-17
  introduction, 2-3–2-9
  noninteractive mode, 2-10
  running in batch mode, 2-9
  saving, 3-14, 3-18
  step boundaries, 2-6–2-7
  submitting, 2-11–2-14
  submitting under Windows, 2-16–2-20
  submitting under z/OS (OS/390), 2-25–2-29
SAS sessions
  starting, 2-16, 2-21, 2-25, 2-30
SAS statements
  BY, 11-49
  DROP, 9-11
  ELSE, 9-33, 9-40
  FOOTNOTE, 11-10–11-13
  FORMAT, 11-25
  global, 11-3–11-15
  IF-THEN, 9-31
  IF-THEN DELETE, 9-50
  IF-THEN/ELSE, 9-31, 9-34
  IF-THEN/ELSE DO, 9-35
  KEEP, 9-11
  LABEL, 11-21
  LENGTH statement, 9-38
  OPTIONS, 11-6
  SET, 10-20
  subsetting IF, 9-46
  TITLE, 11-10–11-13
  WHERE, 9-45, 11-46–11-48
SAS syntax rules, 3-5–3-7
SAS variable values, 4-6
SAS windowing environment, 2-8, 2-12
SAS/ACCESS
  LIBNAME statement, 4-36, 6-7–6-8
SAS/ACCESS Interface for PC File Formats, 6-8
SAS/GRAPH software, 13-3–13-8
  enhancing output, 13-25–13-28
  G3D procedure, 13-5
  GBARLINE procedure, 13-5
  GCHART procedure, 13-3–13-4, 13-9–13-20
  GCONTOUR procedure, 13-6
  GMAP procedure, 13-6
  GPLOT procedure, 13-4, 13-21–13-24
  GREPLAY procedure, 13-7
  RUN-group processing, 13-8
saving programs, 3-14, 3-18
  FILE command, 3-14, 3-18
scatter plots, 13-4
SET statement, 5-9, 10-20, 10-34
SORT procedure, 10-46, 10-52
  BY statement, 10-47–10-48
specifying classification variables
  CLASS statement, 12-50
statistics
  computing and displaying with PROC MEANS, 12-31–12-34
  TABULATE procedure, 12-54–12-55
SUBMIT command, 2-18
subsetting IF statement, 9-46
subsetting observations, 9-44–9-50
  WHERE statement, 5-12–5-25
subsetting variables
  DROP statement, 5-12–5-25
  KEEP statement, 5-12–5-25
SUM function, 9-8
SUMMARY procedure, 12-41
syntax
  diagnosing and correcting errors, 3-10–3-18
syntax rules, 3-5–3-7

T
TABLE statement, 12-49
tables
  creating dimensional reports, 12-46–12-59
  frequency, 12-3–12-20
  suppressing display of statistics, 12-8
TABLES statement
  CROSSLIST option, 12-10
  FORMAT option, 12-10
  FREQ procedure, 12-4–12-6
  LIST option, 12-10
  NOCOL option, 12-8
  NOCUM option, 12-8
  NOFREQ option, 12-8
  NOPERCENT option, 12-8
  NOROW option, 12-8
  OUT= option, 12-14–12-17
TABULATE procedure, 12-46–12-59
  OUT= option, 12-56–12-59
  statistics, 12-54–12-55
terminology
  SAS data sets, 4-10
three-dimensional contour plots, 13-6
three-dimensional surface and scatter plots, 13-5
TITLE statement, 11-10–11-13

U
unbalanced quotation marks, 3-16
UNIVARIATE procedure, 8-8
  validating data, 8-35–8-37
UPCASE function, 8-52
user-defined formats
  creating, 11-32–11-42

V
validating data, 8-3–8-8
VALUE statement
  FORMAT procedure, 11-33
VAR statement, 12-51
  MEANS procedure, 12-28
variable names, 4-9
  assigning descriptive labels, 11-21–11-25
variable operand, 5-14
variables
  character, 4-6
  creating, 9-3–9-26
  creating conditionally, 9-30–9-40
  dropping from output data set, 5-23–5-24
  numeric, 4-6
  writing to output data set, 5-23–5-24
Viewtable window
  cleaning data interactively, 8-42–8-43

W
WHERE operators, 5-18
WHERE statement, 7-36, 9-45, 11-46–11-48
  subsetting observations, 5-12–5-25
  validating data, 8-21–8-23
