Chapter 4 - Preparing Data for Analysis

Bradley Paulse
Office 1.87 (Cams)
brpaulse@uwc.ac.za
Labelling Variables

• Assign labels in the data step.

• Make sure that your label is descriptive of the variable.

• If the data set is permanent then the labelling is permanent.

• Assign labels using the LABEL statement in the data step.

• Output the labels in the PROC PRINT step using the LABEL option.

*dlabel.sas

*dlabel2.sas

*dlabel3.sas
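
The labelling steps above can be sketched as follows. This is a minimal illustration, not the contents of the dlabel files: the data set name work.patients and the variables id, age and sbp are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only */
data work.patients;
    input id age sbp;
    /* The LABEL statement attaches a descriptive label to each variable */
    label id  = 'Patient identifier'
          age = 'Age at admission (years)'
          sbp = 'Systolic blood pressure (mmHg)';
    datalines;
1 34 120
2 51 135
3 47 128
;
run;

/* The LABEL option asks PROC PRINT to use the labels as column headings */
proc print data=work.patients label;
run;
```

Because the labels are assigned in the data step, they are stored with the data set and do not need to be repeated in later steps.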
Creating New Variables

• Performed or calculated in the data step.

• Variables are created/assigned when the program is run.

• The order of numeric calculations is important.

• Constant variables can also be created.

*dcalc.sas

*dlabel3
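
A short sketch of creating variables in the data step, showing why the order of calculation matters. The data set work.scores and its variables are hypothetical and are not taken from dcalc.sas.

```sas
/* Hypothetical data, for illustration only */
data work.scores;
    input test1 test2 test3;
    /* Parentheses force the addition to happen before the division */
    average = (test1 + test2 + test3) / 3;
    /* Without parentheses only test3 is divided by 3, because */
    /* division is evaluated before addition                   */
    not_average = test1 + test2 + test3 / 3;
    /* A constant variable: every observation gets the same value */
    max_mark = 100;
    datalines;
60 70 80
55 65 90
;
run;

proc print data=work.scores;
run;
```

For the first observation, average is 70, while not_average is 60 + 70 + 80/3, which is not the mean.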
SAS Functions

• Functions that can be used when assigning variables

• Form: variable = function(argument1, argument2,…);

• Some important functions:


o S = INT(x);
o S = MAX(x1, x2, x3, …);
o S = MIN(x1, x2, x3, …);
o S = SUM(x1, x2, x3, …);
o S = ROUND(x,round);
o S = INTCK('interval', startdate, enddate);

*time.sas
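
The numeric functions listed above might be used as in the following sketch. The values and variable names are made up for illustration and are not from time.sas.

```sas
/* Illustrative values only; no input data set is needed */
data work.funcs;
    x1 = 7.86;
    x2 = 3.2;
    x3 = 12.449;
    whole    = int(x1);          /* integer part of x1: 7 */
    largest  = max(x1, x2, x3);  /* 12.449 */
    smallest = min(x1, x2, x3);  /* 3.2 */
    total    = sum(x1, x2, x3);  /* 23.509 */
    rounded  = round(x3, 0.1);   /* rounded to the nearest 0.1: 12.4 */
run;

proc print data=work.funcs;
run;
```

One practical difference worth knowing: the SUM function ignores missing arguments, whereas adding with the + operator gives a missing result if any operand is missing.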
SAS Functions

• Some important functions:


o S = INTCK('interval', startdate, enddate);
o S = MDY(month, day, year);

*ddates.sas

*ddates2.sas
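
A small sketch of the two date functions, assuming made-up dates; it is not the content of the ddates files. MDY builds a SAS date value from a month, day and year, and INTCK counts the number of interval boundaries between two dates.

```sas
/* Illustrative dates only */
data work.dates;
    startdate = mdy(1, 15, 2020);   /* 15 January 2020 */
    enddate   = mdy(3, 1, 2021);    /* 1 March 2021 */
    /* By default INTCK counts boundaries crossed, not elapsed time */
    months = intck('month', startdate, enddate);   /* 14 */
    years  = intck('year',  startdate, enddate);   /* 1 */
    /* Display the date values in a readable form, e.g. 15JAN2020 */
    format startdate enddate date9.;
run;

proc print data=work.dates;
run;
```

Because boundaries are counted, INTCK('year', ...) returns 1 here as soon as 1 January 2021 has been passed, even though the two dates are more than a year apart by only a few weeks.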
Conditional Statements

• IF-THEN/ELSE is a conditional statement that creates or assigns variables based on the evaluation of an expression.

• If the expression is not true, the ELSE statement is executed.

• See page 86 for a list of SAS comparison operators

• Multiple expressions can be joined using either AND or OR


• Conditional statements can also be chained by adding ELSE IF statements.

• A missing numeric value is written as a period (.).

*dcondition.sas
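
The conditional statements described above can be sketched as follows. The data set, variable names and cut-off marks are hypothetical and are not taken from dcondition.sas.

```sas
/* Hypothetical marks, for illustration only */
data work.grades;
    input student $ mark;
    length result $ 11;   /* long enough for 'Distinction' */
    /* OR joins two expressions; impossible marks are set to missing */
    if mark < 0 or mark > 100 then mark = .;
    /* Check for missing first: a missing value compares as smaller */
    /* than any number and would otherwise be classed as a fail     */
    if mark = . then result = 'No mark';
    else if mark >= 75 then result = 'Distinction';
    else if mark >= 50 and mark < 75 then result = 'Pass';
    else result = 'Fail';
    datalines;
Anna 82
Ben 47
Cara .
Dan 63
;
run;

proc print data=work.grades;
run;
```

Only the first true condition in an IF / ELSE IF chain is acted on, so the order of the conditions matters.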
Subsetting data sets

• Smaller data sets can be created from a larger main data set by using the
SET statement along with the conditional IF statement.

• Observations that do not meet the condition are excluded from the output data set only; they remain in the main data set.
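
A minimal sketch of subsetting with SET and IF, assuming a hypothetical main data set work.all_students with a yearlevel variable.

```sas
/* Hypothetical main data set, for illustration only */
data work.all_students;
    input student $ yearlevel mark;
    datalines;
Anna 1 82
Ben 2 47
Cara 1 55
Dan 3 63
;
run;

/* SET reads the main data set; the subsetting IF keeps only */
/* the observations for which the expression is true         */
data work.first_years;
    set work.all_students;
    if yearlevel = 1;
run;

/* An equivalent form: explicitly delete the unwanted observations */
data work.first_years2;
    set work.all_students;
    if yearlevel ne 1 then delete;
run;
```

Both output data sets contain only Anna and Cara, while work.all_students still holds all four observations.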


Selection of variables

• Variables can be excluded from or included in the output data set using the DROP and KEEP statements in the data step.
• The DROP and KEEP statements apply only to the output data set.
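
A short sketch of DROP and KEEP, using hypothetical data set and variable names. It also illustrates the second point: a dropped variable is still available for calculations inside the data step, because DROP only affects what is written out.

```sas
/* Hypothetical data, for illustration only */
data work.raw;
    input student $ test1 test2;
    datalines;
Anna 40 42
Ben 20 27
;
run;

/* DROP: test1 and test2 are used in the calculation */
/* but are not written to the output data set        */
data work.totals;
    set work.raw;
    total = test1 + test2;
    drop test1 test2;
run;

/* KEEP: only the listed variable is written to the output data set */
data work.names;
    set work.raw;
    keep student;
run;
```

Here work.totals contains student and total, work.names contains only student, and work.raw is unchanged.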
Joining data sets

• Data sets can be joined by either appending or merging them to create a new data set.
• Appending stacks the observations of one data set after those of another in the new data set.
• Merging joins observations side by side, matched on a common variable.

• To APPEND data sets, use the SET statement.

• To MERGE data sets, use the MERGE statement with the BY statement. Each data set must first be sorted by the BY variable.
*dappend1.sas and dmerge.sas
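
Appending and merging might look like the sketch below. The three small data sets are hypothetical and are not the ones used in dappend1.sas or dmerge.sas.

```sas
/* Hypothetical data sets, for illustration only */
data work.class_a;
    input id name $;
    datalines;
1 Anna
2 Ben
;
run;

data work.class_b;
    input id name $;
    datalines;
3 Cara
4 Dan
;
run;

data work.marks;
    input id mark;
    datalines;
1 82
2 47
3 55
4 63
;
run;

/* Append: listing both data sets on SET stacks their observations */
data work.all_classes;
    set work.class_a work.class_b;
run;

/* Merge: sort both data sets by the common variable first */
proc sort data=work.all_classes;
    by id;
run;

proc sort data=work.marks;
    by id;
run;

/* Observations with the same id are combined into one observation */
data work.combined;
    merge work.all_classes work.marks;
    by id;
run;
```

The appended data set has four observations with id and name; the merged data set has the same four observations with id, name and mark.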


Sorting data sets

• Data sets can be sorted in ascending or descending order using PROC SORT.
• Data sets can be sorted by a single variable or by multiple variables.

• If no order is specified, the default sort order is ascending.

• PROC SORT can either write the sorted data to a new data set or replace the current data set with the sorted version.

*dsort.sas
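
A minimal sketch of the sorting options, assuming a hypothetical data set work.students; it is not the content of dsort.sas.

```sas
/* Hypothetical data, for illustration only */
data work.students;
    input student $ yearlevel mark;
    datalines;
Anna 1 82
Ben 2 47
Cara 1 55
Dan 3 63
;
run;

/* OUT= writes the sorted observations to a new data set.       */
/* yearlevel is sorted ascending (the default); the DESCENDING  */
/* keyword applies only to the variable that follows it (mark). */
proc sort data=work.students out=work.students_sorted;
    by yearlevel descending mark;
run;

/* Without OUT=, the current data set is replaced by the sorted one */
proc sort data=work.students;
    by student;
run;
```

In work.students_sorted the order is Anna, Cara, Ben, Dan: year level 1 first, with the higher mark listed first within that year level.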
READ THROUGH CHAPTER 4
