Chapter 6

Objectives

In this 6th lesson, you learn to do the following:

use a DATA step to create a SAS data set from an existing SAS data set
subset observations by using the WHERE statement
create a new variable by using the assignment statement
subset variables by using the DROP and KEEP statements
describe the compilation and execution phases of the DATA step
store labels and formats in the descriptor portion of a SAS data set

What types of files can a DATA step read as input data?


a. SAS data sets
b. Microsoft Excel worksheets
c. raw data files
d. all of the above

Incorrect.
The correct answer is d. A DATA step can read a SAS data set, an Excel worksheet, or
a raw data file as input data.

Examine the following program and then decide which statement is true.

data us;
   set orion.sales;
   where Country='US';
run;

a. The program reads a temporary data set and creates a permanent data set.
b. The program reads a permanent data set and creates a temporary data set.
c. The program contains a syntax error and will not execute.
d. The program will not execute because you cannot work with permanent and
temporary data sets in the same step.
Incorrect.
The correct answer is b. The DATA statement doesn't specify a libref, so it's creating the
temporary data set us. The SET statement reads orion.sales, which is a permanent data
set. There are no syntax errors in the program.

Write a statement to specify the data set emp.salary as the data set to be read.
data march.payroll;
   _______________;
run;

The correct answer is set emp.salary;

DATA output-SAS-data-set;
SET input-SAS-data-set;
RUN;

You specify the keyword SET and the two-level name of the data set (libref.filename) to be
read.

What is the result of the following assignment statement?

num=4+10/2;

a. . (missing)
b. 0
c. 7
d. 9

Incorrect.
The correct answer is d. The order of operations is division and multiplication, followed by
addition and subtraction. So 10 divided by 2 equals 5, and 5 plus 4 equals 9.
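
As a minimal sketch of the precedence rule above, the DATA _NULL_ step below evaluates the same expression with and without parentheses and writes both results to the log. The variable names num1 and num2 are illustrative only and are not part of the original question.

```sas
data _null_;
   num1 = 4 + 10 / 2;      /* division is done first: 4 + 5 */
   num2 = (4 + 10) / 2;    /* parentheses are done first: 14 / 2 */
   put num1= num2=;        /* writes both values to the log */
run;
```

In the log, num1 should be 9 and num2 should be 7, because the parentheses force the addition to happen before the division.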

What is the result of this assignment statement given the values of var1 and var2?

num=var1+var2/2;

var1
.

var2
10

a. . (missing)
b. 0
c. 5
d. 10

Incorrect.
The correct answer is a. If an operand in an arithmetic expression has a missing value, the
result is a missing value.
. = . + 10/2
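
A small sketch of how a missing operand propagates, assuming the same values of var1 and var2 as in the question. The SUM function is shown only for contrast; it is not part of the original question, and the variable name total is hypothetical.

```sas
data _null_;
   var1 = .;
   var2 = 10;
   num   = var1 + var2 / 2;       /* a missing operand makes num missing */
   total = sum(var1, var2 / 2);   /* SUM ignores missing arguments */
   put num= total=;
run;
```

Here num should be written as a missing value (.), while total should be 5, since the SUM function skips the missing argument instead of propagating it.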

Suggested search terms: SOUNDS-LIKE operator or syntax of WHERE expression. The
SOUNDS-LIKE operator selects observations that contain a spelling variation of a specified
word or words. The operator uses the Soundex algorithm to compare the variable value and
the operand.
data work.tony;
   set orion.customer_dim;
   where Customer_FirstName=* 'Tony';
run;

The SAS System

Obs  Customer_FirstName  Customer_LastName
  1  Tonie               Asmussen
  2  Tommy               Mcdonald

If you submit the DATA step below, which variables appear in the work.mysubset data set?
Select all that apply.
data work.mysubset;
   set mylib.salesforce;
   drop Gender Salary;
run;

mylib.salesforce

Emp_ID  Name           Gender  Salary  Job_Title
120102  Zhou, Tom              108255  Sales Manager
120103  Dawes, Wilson           87975  Sales Manager

a. Emp_ID
b. Name
c. Gender
d. Salary
e. Job_Title

Correct.
The DATA step omits two variables from the output data set: Gender and Salary. The
remaining three variables (Emp_ID, Name, and Job_Title) are included.
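
The same result could also be produced with a KEEP statement that lists the variables to retain instead of the ones to omit. This is a sketch that assumes the mylib libref is already assigned and that mylib.salesforce contains exactly the five variables shown above.

```sas
data work.mysubset;
   set mylib.salesforce;
   keep Emp_ID Name Job_Title;   /* same output as: drop Gender Salary; */
run;
```

Which statement to use is mostly a matter of which list is shorter and easier to maintain.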

Type the letter of the word or phrase on the right that completes the statements on the left.

When you submit a DATA step, SAS processes the step in the __________________ phase first.

When you submit a DATA step, SAS processes the step in the __________________ phase second.

During the compilation phase, SAS creates the _____________ of the output data set.

During the execution phase, SAS creates the _____________ of the output data set.

a. descriptor portion
b. compilation
c. execution
d. data portion

Correct.
The compilation phase precedes the execution phase. SAS creates the descriptor portion of
the data set during the compilation phase and the data portion during the execution phase.
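
One way to look at the two portions separately, sketched here under the assumption that work.mysubset from the earlier DROP question still exists in the current session: PROC CONTENTS reports the descriptor portion, and PROC PRINT displays the data portion.

```sas
proc contents data=work.mysubset;   /* descriptor portion: variable names and attributes */
run;

proc print data=work.mysubset;      /* data portion: the stored values */
run;
```
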

Copy and paste this program, which includes a WHERE statement to subset on
the Bonus amount, into the editor and submit it.
Reminder: Make sure you've defined the orion library.
data work.subset1;
   set orion.sales;
   Bonus=Salary*.10;
   where Country='AU' and
         Bonus>=3000;
run;
proc print data=work.subset1;
run;

Is the output data set created successfully?


a. yes
b. no

Incorrect.
The correct answer is b. No, the output data set is not created successfully. Because Bonus
is a new variable being created in this DATA step and is not in orion.sales, it cannot be
used in a WHERE statement. The log contains an error message, and SAS stops processing
the DATA step.

When you use the subsetting IF statement, how are observations excluded?
a. If the expression is true, SAS excludes the observation from the input data set.
b. If the expression is false, SAS excludes the observation from the output data set.
c. If the expression is false, SAS excludes the observation from the PDV.
d. If the expression is true, SAS excludes the observation from the PDV.

Correct.
When the expression is false, SAS excludes the observation from the output data set and
continues processing.

Copy and paste this program into the editor. This program is a variation of the program in
the previous demonstration, with both conditions combined into a single subsetting IF.
Submit the program and review the log and results.
Reminder: Make sure you've defined the orion library.
data work.auemps;
   set orion.sales;
   Bonus=Salary*.10;
   if Country='AU' and Bonus>=3000;
run;
proc print data=work.auemps;
run;

Are the results the same as what you saw in the previous demonstration?
a. yes
b. no

Correct.
The results are the same, but the processing isn't as efficient. SAS reads all 165
observations from orion.sales rather than 63 observations in the previous program. You
should subset as early as possible in your program for more efficient processing.
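
A sketch of the more efficient form, assuming the orion library is defined and that orion.sales contains the Country and Salary variables used above. The WHERE statement subsets on a variable that exists in the input data set, and the subsetting IF handles Bonus, which is created in the step.

```sas
data work.auemps;
   set orion.sales;
   where Country='AU';    /* applied as observations are read from orion.sales */
   Bonus=Salary*.10;
   if Bonus>=3000;        /* applied after Bonus has been computed */
run;

proc print data=work.auemps;
run;
```

With this arrangement, only the observations that satisfy the WHERE condition are brought into the DATA step, and the subsetting IF then filters on the new variable.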

Select the situation(s) in which you can use the WHERE statement to subset observations.
Select all that apply.
a. in a PROC step
b. in a DATA step, when the variable in the condition is created in that DATA step
c. in a DATA step, when the variable in the condition is in the input data set

Correct.
You can use a WHERE statement to subset observations in situations a and c. A subsetting
IF statement can be used in situations b and c.

If you submit this program, which of the following column headings will display
for Job_Title in the resulting report?
data work.us;
   set orion.sales;
   where Country='US';
   Bonus=Salary*.10;
   label Job_Title='Sales Title';
   drop Employee_ID Gender Country Birth_Date;
run;
proc print data=work.us label;
   label Job_Title='Title';
run;

a. Sales Title
b. Job_Title
c. Title

Correct.
The column heading will be Title, the label specified in the PROC PRINT step. Labels and
formats that you specify in a PROC step override the permanent labels and formats for that
step only. The permanent labels and formats are not changed.
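
To see that a PROC step override is temporary, a sketch like the following could be run after the DATA step above, assuming work.us was created successfully. The FORMAT statement for Bonus is an illustrative addition that is not in the original program.

```sas
proc print data=work.us label;
   label Job_Title='Title';     /* temporary label for this report only */
   format Bonus dollar10.2;     /* temporary format for this report only */
run;

proc contents data=work.us;     /* shows what is stored in the descriptor portion */
run;
```

The PROC PRINT report should show Title as the column heading and Bonus with a dollar format, while PROC CONTENTS should still list Sales Title as the stored label for Job_Title and no stored format for Bonus.
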

Practice Report

The SAS System

The CONTENTS Procedure

Data Set Name        WORK.INCREASE                Observations          10
Member Type          DATA                         Variables
Engine               V9                           Indexes
Created              05/14/2014 16:12:44          Observation Length    40
Last Modified        05/14/2014 16:12:44          Deleted Observations  0
Protection                                        Compressed            NO
Data Set Type                                     Sorted                NO
Label
Data Representation  WINDOWS_64
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size          65536
Number of Data Set Pages
First Data Page
Max Obs per Page            1632
Obs in First Data Page      10
Number of Data Set Repairs  0
ExtendObsCounter            YES
Filename                    <filepath>
Release Created             9.0401M1
Host Created                X64_7PRO

Alphabetic List of Variables and Attributes

#  Variable       Type  Len  Format     Informat  Label
3  Emp_Hire_Date  Num   8    DATE9.     DATE9.    Hire Date
1  Employee_ID    Num   8    12.                  Employee ID
4  Increase       Num
5  NewSalary      Num                             New Annual Salary
2  Salary         Num   8    DOLLAR12.            Annual Salary

The SAS System

The CONTENTS Procedure

Data Set Name        WORK.INCREASE                Observations          10
Member Type          DATA                         Variables
Engine               V9                           Indexes
Created              05/14/2014 16:22:19          Observation Length    40
Last Modified        05/14/2014 16:22:19          Deleted Observations  0
Protection                                        Compressed            NO
Data Set Type                                     Sorted                NO
Label
Data Representation  WINDOWS_64
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size          65536
Number of Data Set Pages
First Data Page
Max Obs per Page            1632
Obs in First Data Page      10
Number of Data Set Repairs  0
ExtendObsCounter            YES
Filename                    <filepath>
Release Created             9.0401M1
Host Created                X64_7PRO

Alphabetic List of Variables and Attributes

#  Variable       Type  Len  Format      Informat  Label
3  Emp_Hire_Date  Num   8    DATE9.      DATE9.    Hire Date
1  Employee_ID    Num   8    12.                   Employee ID
4  Increase       Num   8    COMMA5.
5  NewSalary      Num   8    DOLLAR10.2            New Annual Salary
2  Salary         Num   8    DOLLAR10.2            Annual Salary

The SAS System

Obs  Employee ID  Annual Salary  Hire Date  Increase  New Annual Salary
  1  120128       $30,890.00     01NOV2010  3,089     $33,979.00
  2  120144       $30,265.00     01OCT2010  3,027     $33,291.50
  3  120161       $30,785.00     01OCT2010  3,079     $33,863.50
  4  120264       $37,510.00     01DEC2010  3,751     $41,261.00
  5  120761       $30,960.00     01JUL2010  3,096     $34,056.00
  6  120995       $34,850.00     01AUG2010  3,485     $38,335.00
  7  121055       $30,185.00     01AUG2010  3,019     $33,203.50
  8  121062       $30,305.00     01AUG2010  3,031     $33,335.50
  9  121085       $32,235.00     01JAN2011  3,224     $35,458.50
 10  121107       $31,380.00     01JUL2010  3,138     $34,518.00

The SAS System

The CONTENTS Procedure

Data Set Name        WORK.DELAYS                  Observations
Member Type          DATA                         Variables
Engine               V9                           Indexes
Created              05/14/2014 16:39:57          Observation Length    40
Last Modified        05/14/2014 16:39:57          Deleted Observations  0
Protection                                        Compressed            NO
Data Set Type                                     Sorted                NO
Label
Data Representation  WINDOWS_64
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size          65536
Number of Data Set Pages
First Data Page
Max Obs per Page            1632
Obs in First Data Page
Number of Data Set Repairs  0
ExtendObsCounter            YES
Filename                    <filepath>
Release Created             9.0401M1
Host Created                X64_7PRO

Alphabetic List of Variables and Attributes

#  Variable       Type  Len  Format  Label
2  Customer_ID    Num   8    12.     Customer ID
4  Delivery_Date  Num   8    DATE9.  Date Delivered
1  Employee_ID    Num   8    12.     Employee ID
3  Order_Date     Num   8    DATE9.  Date Ordered
5  Order_Month    Num                Month Ordered

The SAS System

Obs  Employee_ID  Customer_ID  Order_Date  Delivery_Date  Order_Month
  1  99999999     70187        08/13/2007  08/18/2007
  2  99999999     52           08/20/2007  08/26/2007
  3  99999999     16           08/27/2007  09/04/2007
  4  99999999     61           08/29/2007  09/03/2007
  5  99999999     2550         08/10/2008  08/15/2008
  6  99999999     70201        08/15/2008  08/20/2008
  7  99999999                  08/10/2009  08/15/2009
  8  99999999     71           08/30/2010  09/05/2010
  9  99999999     70201        08/24/2011  08/29/2011

CHAPTER 7

Reading Spreadsheet and Database Data

Objectives

In this lesson, you learn to do the following:

assign a libref to a Microsoft Excel workbook using the SAS/ACCESS LIBNAME statement
access an Excel worksheet as though it is a SAS data set using a SAS two-level name
use the DATA step to create a SAS data set that contains a subset of worksheet data
assign a libref to an Oracle database using the SAS/ACCESS LIBNAME statement
access an Oracle table using a SAS two-level name
create a SAS data set that contains a subset of an Oracle table

The SAS System

The CONTENTS Procedure

Data Set Name        WORK.BIGDONATIONS            Observations          50
Member Type          DATA                         Variables
Engine               V9                           Indexes
Created              05/14/2014 16:45:33          Observation Length    56
Last Modified        05/14/2014 16:45:33          Deleted Observations  0
Protection                                        Compressed            NO
Data Set Type                                     Sorted                NO
Label
Data Representation  WINDOWS_64
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size          65536
Number of Data Set Pages
First Data Page
Max Obs per Page            1167
Obs in First Data Page      50
Number of Data Set Repairs  0
ExtendObsCounter            YES
Filename                    <filepath>
Release Created             9.0401M1
Host Created                X64_7PRO

Alphabetic List of Variables and Attributes

#  Variable     Type  Len  Format  Label
1  Employee_ID  Num   8    12.     Employee ID
7  NumQtrs      Num
2  Qtr1         Num                First Quarter
3  Qtr2         Num                Second Quarter
4  Qtr3         Num                Third Quarter
5  Qtr4         Num                Fourth Quarter
6  Total        Num