
PART 2: DATA WRANGLING

Understanding DATA Step Programming
You will learn:

How the DATA step works


General form of the programming statements
Programming techniques
Reading Materials
Chapter 6 and Chapter 7,
Step-by-Step Programming with Base SAS Software. 2001. Cary,
NC: SAS Institute Inc.

You can find the programs used in these examples and the data at the link given below:
https://documentation.sas.com/?docsetId=basess&docsetTarget=titlepage.htm&docsetVersion=9.4&locale=en

You can also search online for the title of the book and choose a link that provides the online documentation.
Input data used for examples

[Figure: listing of the input data, with a missing value marked]
Assignment statement
The example below shows how a new variable is created and assigned the value of a mathematical expression:

Data Tours1;

Run;
[Figure labels: "New variable created", "Expression"]
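As a minimal sketch of the assignment statement, the DATA step below reads a few made-up tour records and creates a new variable from an arithmetic expression. The data values and the variable Country are assumptions for illustration, not the contents of the original example. Only the data set name Tours1 and the cost variable names come from the slides.

```sas
/* Hypothetical input values, not the data from the original example */
data Tours1;
   input Country $ AirCost LandCost;
   /* Assignment statement: new variable = expression */
   TotalCost = AirCost + LandCost;
   datalines;
France 575 1081
Spain 510 1030
India . 1010
;
run;

proc print data=Tours1;
run;
```

In this sketch the third record has a missing AirCost, so its TotalCost is also missing.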
Adding information to some observations but not others
The use of IF-THEN-ELSE conditions

Note that for the 1st and 3rd observations, the BonusPoints variable has a missing value.
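One way this can happen is sketched below: BonusPoints is assigned only when a condition is true, and there is no final ELSE, so observations that match no condition keep a missing value. The vendor names, the bonus text, and the choice of a character variable are assumptions made for illustration.

```sas
/* Hypothetical data: vendor names and bonus text are illustrative only */
data Tours2;
   input Country $ Vendor $;
   if Vendor = 'Hispania' then BonusPoints = 'For 10+ people';
   else if Vendor = 'Mundial' then BonusPoints = 'Yes';
   /* No final ELSE: other vendors leave BonusPoints missing (blank) */
   datalines;
France Major
Spain Hispania
India Royal
Peru Mundial
;
run;

proc print data=Tours2;
run;
```

With these made-up records, the 1st and 3rd observations match neither condition, so BonusPoints stays blank for them.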
Making uniform changes to data without creating new variables

Notice that AirCost appears on both sides of the equals sign.
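A short sketch of this pattern follows. The increase of 10 and the input values are assumptions chosen only to show the mechanics.

```sas
/* Hypothetical data and adjustment amount */
data Tours3;
   input Country $ AirCost;
   /* Same variable on both sides: the new value replaces the old one */
   AirCost = AirCost + 10;
   datalines;
France 575
Spain 510
;
run;

proc print data=Tours3;
run;
```

Here AirCost becomes 585 and 520. No new variable is added to the data set.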
Efficient use of variables
Inefficient: creating variables that scatter information
Efficient: using variables to contain maximum information (more information packed into one variable)
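The contrast can be sketched with two DATA steps. The variable names HispaniaPoints, MundialPoints, and Remarks, along with the vendors and text values, are hypothetical and chosen only to illustrate the idea.

```sas
/* Inefficient: one variable per vendor scatters the information */
data Scattered;
   input Country $ Vendor $;
   if Vendor = 'Hispania' then HispaniaPoints = 'For 10+ people';
   else if Vendor = 'Mundial' then MundialPoints = 'Yes';
   datalines;
Spain Hispania
Peru Mundial
France Major
;
run;

/* Efficient: a single variable carries every kind of remark */
data Packed;
   input Country $ Vendor $;
   length Remarks $ 30;
   if Vendor = 'Hispania' then Remarks = 'Bonus for 10+ people';
   else if Vendor = 'Mundial' then Remarks = 'Bonus points';
   else if Vendor = 'Major' then Remarks = 'Discount: 30+ people';
   datalines;
Spain Hispania
Peru Mundial
France Major
;
run;
```

The first step leaves most cells of both bonus variables blank, while the second keeps one column that is filled in for every vendor it knows about.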
Defining enough storage space for variables
Use a LENGTH statement if the longest value of the variable is not in the 1st assignment statement.

[Figure labels: "LENGTH statement", "1st assignment statement"]
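The sketch below illustrates why this matters. Without a LENGTH statement, a character variable created by an assignment takes its length from the first assignment statement that mentions it, so longer values assigned later are truncated. The variable Remarks and the data values are assumptions for illustration.

```sas
/* Without LENGTH: Remarks gets length 3 from the first assignment ('Yes') */
data Truncated;
   input Vendor $;
   if Vendor = 'Mundial' then Remarks = 'Yes';
   else if Vendor = 'Hispania' then Remarks = 'For 10+ people'; /* stored as 'For' */
   datalines;
Mundial
Hispania
;
run;

/* With LENGTH placed before the first assignment: room for the longest value */
data FullLength;
   input Vendor $;
   length Remarks $ 14;
   if Vendor = 'Mundial' then Remarks = 'Yes';
   else if Vendor = 'Hispania' then Remarks = 'For 10+ people';
   datalines;
Mundial
Hispania
;
run;
```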
Conditionally deleting observations
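A minimal sketch of conditional deletion, assuming the goal is to drop records with a missing land cost (the condition and the data values are hypothetical):

```sas
/* Hypothetical data: the Brazil record has a missing LandCost */
data Complete;
   input Country $ AirCost LandCost;
   /* DELETE stops processing this observation and does not write it out */
   if LandCost = . then delete;
   datalines;
France 575 1081
Brazil 540 .
Spain 510 1030
;
run;

proc print data=Complete;
run;
```

The output data set in this sketch contains two observations, France and Spain.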
Working with Numeric Variables

Chapter 7

You will learn the following:


Input data set for the examples

Numeric variables
Creating new variables by arithmetic expressions
Understanding how SAS handles missing values
Propagating missing values

When you use a missing value in an arithmetic expression, SAS sets the result of the expression to missing. If you use that result in another expression, the next result is also missing.

In SAS, this method of treating missing values is called propagation of missing values.
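Propagation can be seen in a two-step calculation such as the sketch below. The variable PeakCost, the 1.15 multiplier, and the data values are assumptions used only to show a second expression that depends on the first.

```sas
/* Hypothetical data: the Brazil record has a missing LandCost */
data Propagate;
   input Country $ AirCost LandCost;
   TotalCost = AirCost + LandCost;   /* missing if either cost is missing */
   PeakCost  = TotalCost * 1.15;     /* missing whenever TotalCost is missing */
   datalines;
France 575 1081
Brazil 540 .
;
run;

proc print data=Propagate;
run;
```

For the Brazil record both TotalCost and PeakCost are missing, and the SAS log notes that missing values were generated.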
Calculations Using SAS Functions

Rounding Values

The following assignment statement rounds the value of AirCost to the nearest $50:

RoundAir = round(AirCost, 50);

The following statement calculates the total cost of each tour, rounded to the nearest $100:

TotalCostR = round(AirCost + LandCost, 100);

SAS contains around 280 built-in numeric expressions called functions.
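The two rounding statements above can be tried in a small DATA step. The input values are made up for illustration.

```sas
/* Hypothetical cost values */
data Rounded;
   input Country $ AirCost LandCost;
   RoundAir   = round(AirCost, 50);              /* 578 -> 600, 512 -> 500 */
   TotalCostR = round(AirCost + LandCost, 100);  /* 1659 -> 1700, 1542 -> 1500 */
   datalines;
France 578 1081
Spain 512 1030
;
run;

proc print data=Rounded;
run;
```

The second argument of ROUND is the rounding unit, so results are multiples of 50 and 100 respectively.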


Calculating total cost when some values are missing

An assignment statement that creates the TotalCost variable generates a missing value whenever one of its components is missing, for example:

TotalCost = AirCost + LandCost;

The SUM function, however, bases its calculation only on non-missing values:

CostSum = sum(AirCost, LandCost);
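The difference between the + operator and the SUM function is easiest to see side by side. The sketch below uses made-up records, one of which has a missing LandCost.

```sas
/* Hypothetical data: the Brazil record has a missing LandCost */
data Totals;
   input Country $ AirCost LandCost;
   TotalCost = AirCost + LandCost;      /* missing if either value is missing */
   CostSum   = sum(AirCost, LandCost);  /* adds only the non-missing values */
   datalines;
France 578 1081
Brazil 540 .
;
run;

proc print data=Totals;
run;
```

For Brazil, TotalCost is missing while CostSum is 540. SUM returns a missing value only when all of its arguments are missing.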


Combining functions
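Functions can be nested so that the result of one becomes an argument of another. As a sketch, assuming the aim is a rounded total that tolerates missing components (the variable name RoundSum and the data are hypothetical):

```sas
/* Hypothetical data: the Brazil record has a missing LandCost */
data Combined;
   input Country $ AirCost LandCost;
   /* SUM is evaluated first; ROUND then rounds its result to the nearest 100 */
   RoundSum = round(sum(AirCost, LandCost), 100);
   datalines;
France 578 1081
Brazil 540 .
;
run;

proc print data=Combined;
run;
```

In this sketch RoundSum is 1700 for France and 500 for Brazil.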
Logical operators
Input data set used in examples
Comparing numerical variables using logical operators

Note: Australia tour (>$2000) is deleted
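A comparison of this kind might look like the sketch below. The slide only states that the Australia tour exceeds $2000, so the compared variable (TotalCost here) and all data values are assumptions.

```sas
/* Hypothetical totals; only the $2000 threshold comes from the note above */
data Budget;
   /* :$12. reads up to 12 characters so that 'Australia' is not truncated */
   input Country :$12. TotalCost;
   /* > can also be written as the mnemonic operator GT */
   if TotalCost > 2000 then delete;
   datalines;
France 1659
Australia 2210
Spain 1542
;
run;

proc print data=Budget;
run;
```

Only the France and Spain records remain in the output data set.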
