
Chapter 13

Transforming Data with SAS Functions

Objectives
Learn to use a variety of SAS functions to perform the following tasks:
Convert character data to numeric data, and numeric data to character data
Create SAS date values
Extract time intervals from a SAS date value
Perform calculations with date, datetime, and time values
Extract, edit, concatenate, and search the values of character strings
Replace or remove occurrences of a particular word within a character string
General Form of SAS Functions

SAS functions are built-in routines that perform predefined data-manipulation tasks.

General syntax of a SAS function:


Function-name(argument-1 <, ..., argument-n>);
Arguments may be:
Variables
Constants
Expressions
Variable lists or arrays
When the arguments are given as a variable list:
A numbered range list such as Var1-Var5 is the same as Var1, Var2, Var3, Var4, Var5.
A name range list Varx--Vary consists of all variables from Varx to Vary, in the order in which they are stored in the data set.
The syntax of a SAS function that uses a variable list or an array:

Function-name(OF variable-list);


Example:
MEAN(OF Var1-Var4); computes the mean of Var1 to Var4.
MEAN(Var1-Var4); does not compute the mean of Var1 to Var4; instead, it computes the mean of a single value, Var1 MINUS Var4.
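A short DATA step makes the difference visible. The data set name and the values of var1-var4 are made up for illustration; this is a sketch, not one of the course programs.

```sas
/* Compare MEAN with and without the OF keyword (illustrative values). */
data of_demo;
   var1 = 10; var2 = 20; var3 = 30; var4 = 40;
   with_of    = MEAN(OF var1-var4);  /* mean of var1, var2, var3, var4 = 25 */
   without_of = MEAN(var1-var4);     /* one argument, var1 minus var4 = -30  */
run;

proc print data=of_demo; run;
```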
Target Variables for SAS Functions
The target variable is the variable to which the result of a SAS function is assigned. For example:
Avg_score = MEAN(OF Quiz1-Quiz5);
Avg_score is the target variable.

One important property of a target variable is its length.

The length depends on the function.

For a numeric target variable, the default length is typically 8.

For a character target variable, however, the default length varies greatly from function to function; it can be anywhere from 1 to 200. It is therefore important to specify a LENGTH statement before the first appearance of the target variable:

LENGTH char_var $ n num_var n;
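As a minimal sketch (hypothetical variable names), the step below declares one character target with LENGTH and leaves another undeclared, so PROC CONTENTS shows the two different lengths. It assumes the DATA step behavior of CATX, covered later in this chapter, which gives an undeclared target a length of 200.

```sas
data len_demo;
   length full_name $ 30;               /* declared before its first use */
   fname = 'Ben'; lname = 'Curtis';
   full_name = CATX(' ', fname, lname); /* 'Ben Curtis', stored in 30 bytes */
   no_length = CATX(' ', fname, lname); /* length chosen by CATX (200 by default) */
run;

proc contents data=len_demo varnum; run;
```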


Functions for Sample Statistics
Some useful forms of the SUM function for computing a sample statistic:
SUM(x1, x2, x3, x4);
SUM(OF x1-x4);
SUM(OF x--y);
SUM(y, z, OF x1-x4);
SUM(4, 24, 10, 6);

NOTE: Missing values are ignored in the computation.


Other useful functions for computing sample statistics:
MEAN
MEDIAN
MIN
MAX
VAR: variance
STD: standard deviation
N: number of nonmissing values
NMISS: number of missing values
RANGE: MAX - MIN
IQR: Q3 - Q1 (3rd quartile minus 1st quartile)
PCTL(percentile, numeric-list): computes the given percentile of the numeric list.
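The sketch below applies several of these functions to one made-up observation that contains a missing value, so you can see that each function works only with the nonmissing values (the variable names are hypothetical).

```sas
data stats_demo;
   x1 = 4; x2 = 24; x3 = .; x4 = 10; x5 = 6;   /* x3 is missing */
   n_nonmiss = N(OF x1-x5);       /* 4: number of nonmissing values */
   n_miss    = NMISS(OF x1-x5);   /* 1: number of missing values    */
   total     = SUM(OF x1-x5);     /* 44                             */
   average   = MEAN(OF x1-x5);    /* 11 = 44 / 4, not 44 / 5        */
   mid       = MEDIAN(OF x1-x5);  /* 8: halfway between 6 and 10    */
   smallest  = MIN(OF x1-x5);     /* 4                              */
   largest   = MAX(OF x1-x5);     /* 24                             */
   spread    = RANGE(OF x1-x5);   /* 20 = 24 - 4                    */
run;

proc print data=stats_demo; run;
```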
Exercise 1
Run the following program and observe how SAS functions work.

Data quiz;
/* List input: the values on each data line are separated by blanks */
input name $ q1-q5;
/* COMPUTE SUM AND AVERAGE OF QUIZ SCORES FOR EACH STUDENT USING FUNCTIONS */
TOTQUIZ=SUM(Q1,Q2,Q3,Q4,Q5);
AVGQUIZ=MEAN(Q1,Q2,Q3,Q4,Q5);
SUMQUIZ=SUM(OF Q1-Q5);
meanq=sumquiz/5;   /* divides by 5 even when a quiz score is missing */
MEANQUIZ=MEAN(OF Q1-Q5);
/* NOTE: The following statement computes the difference Q1 - Q5, NOT the sum of Q1 to Q5. */
DIFFQ1Q5=SUM(Q1-Q5);
/* NOTE: With ordinary arithmetic in an assignment statement, a missing value makes the result missing as well. */
asum = q1+q2+q3+q4+q5; amean = (q1+q2+q3+q4+q5)/5;
datalines;
AAA 15 16 20 16 20
AAB 12 14 17 13 12
AAC 19 17 16 19 13
AAD 20 20 18 10 17
AAD 17 18 17 18 16
AAE 15 . 20 14 18
AAF 20 15 20 . 15
AAG 18 14 19 12 20
AAH 20 15 18 14 19
AAI 18 15 19 14 20
;
proc print; run;
Convert Character to Numeric Using the INPUT Function
The function used to convert a character value to a numeric value:

INPUT(source, informat);
Source is the character variable, constant, or expression to be converted.
Informat is the informat used to read the character value as a number.

The informat is particularly important when the character variable holds nonstandard numeric data values.
For example, suppose a variable payment is defined as a character variable, and its data values are stored as $4,624.75.

The informat dollar9.2 reads nonstandard numeric data values such as this. The following INPUT function converts payment to a numeric variable:

Num_Pay = INPUT(payment, dollar9.2);


Automatic Conversion from Character to Numeric WITHOUT the INPUT Function
For example: Salary = payrate*hours;
Suppose payrate is read as a character variable:
SAS first creates a temporary numeric value for each character value.
If the character value of payrate can be converted into a valid numeric value, the temporary numeric value is used in the computation.
If the character value of payrate can NOT be converted into a valid numeric value, the result is missing, and the INPUT function with a suitable informat is required to obtain a valid numeric value.
Examples of automatic conversion from character to numeric
Character value    Numeric value after automatic conversion
24.5               24.5
-12.6              -12.6
1.45E2             145
2,595.4            . (missing)
$14,605.34         . (missing)
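To see both outcomes from the table in one place, here is a small sketch with hypothetical variables: a plain number stored as text converts automatically, the dollar-and-comma value does not, and INPUT with the DOLLAR10.2 informat recovers it. Expect conversion and invalid-data notes in the log.

```sas
data auto_conv;
   hours   = 10;
   pay_ok  = '24.5';          /* plain number stored as text        */
   pay_bad = '$14,605.34';    /* nonstandard: dollar sign and comma */
   salary1 = pay_ok  * hours;                    /* automatic conversion: 245    */
   salary2 = pay_bad * hours;                    /* conversion fails: missing (.) */
   salary3 = INPUT(pay_bad, dollar10.2) * hours; /* explicit informat: 146053.4  */
run;

proc print data=auto_conv; run;
```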
Examples using the INPUT function

Character value   INPUT(character,      Converted
of Pay            informat)             numeric value
24.5              INPUT(Pay, 4.1)       24.5
-12.6             INPUT(Pay, 5.1)       -12.6
1.45E2            INPUT(Pay, 6.2)       145
2,595.4           INPUT(Pay, comma7.1)  2595.4
$14,605.34        INPUT(Pay, dollar10.2) 14605.34
Convert Numeric Variable to Character Variable Using the PUT Function
The PUT function performs numeric-to-character conversion:
PUT(source, format);
Source is the numeric variable to be converted to character.
Format is the format used to write the source as a character string.
The format must agree with the source type. Since the source is a numeric variable, the format MUST be a numeric format.
Automatic Numeric-to-Character Conversion
Similar to character-to-numeric conversion, numeric data values are converted to character values when they are used in a character context.
Automatic conversion uses the BEST12. format to write the numeric value as a character value, and the resulting character value is RIGHT-ALIGNED.
A LENGTH Problem with Automatic Numeric-to-Character Conversion
NOTE: If the numeric value has fewer than 12 digits, the right-aligned result has leading blanks.
For example, consider the following raw data:
ZIP (numeric)    address (character)
48859            PE109, CMU

We want to concatenate these together as PE109, CMU 48859 using:

Com_Address = address || ZIP;

NOTE: || is the operator that concatenates strings.

The result will be: PE109, CMU       48859
NOTE: The converted ZIP has 7 leading blanks, so there are 7 blanks between CMU and 48859.
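One way around the blanks is to convert ZIP yourself with PUT and an explicit width, as in this sketch (the data set and variable names are illustrative). The 5. format assumes a five-digit ZIP.

```sas
data address_demo;
   zip = 48859;                                      /* numeric   */
   address = 'PE109, CMU';                           /* character */
   auto_addr  = address || zip;                      /* BEST12. adds 7 leading blanks before 48859 */
   fixed_addr = TRIM(address) || ' ' || PUT(zip, 5.); /* 'PE109, CMU 48859' */
run;

proc print data=address_demo; run;
```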
Exercise 2
Run program 1 below and observe the results and the variable attributes.
data C_N_Conv;
cv1='542.3'; cv2='1.456E2';
cv3='2,368'; cv4='$6,421.5';
N_Cv1 = INPUT(cv1, 5.1); N_cv2 = INPUT(cv2, 7.3);
N_cv3 = INPUT(cv3, comma5.0); N_cv4 = INPUT(cv4, dollar8.1);
proc contents varnum; run; proc print; run;

Run program 2 below and observe the results and the variable attributes.
data N_C_Conv;
var1=245; var2=124.6; var3=1245;
C_var1 = put(var1, 4.); C_var2 = put(var2, 6.); C_var3 = put(var3, 7.1);
proc contents varnum; run; proc print; run;
Manipulating SAS Date Values with Functions
Recall:
A SAS date value is the number of days since January 1, 1960, which has date value 0.
Ex: 1/30/1960 has the date value 29.
A SAS time value is the time within a given day, stored as the number of seconds since midnight (00:00:00 is 0 seconds).
Ex: For any given date, say today, 1:30:25 am has the time value 5425 seconds.
A SAS datetime value is an absolute time, counted in seconds since midnight on 1/1/1960.
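The three kinds of values can be checked with date, time, and datetime literals. This is a small sketch; the variable names are hypothetical.

```sas
data sas_values;
   d    = '30JAN1960'd;             /* days since 01JAN1960: 29              */
   t    = '01:30:25't;              /* seconds since midnight: 5425          */
   dt   = '01JAN1960:01:30:25'dt;   /* seconds since 01JAN1960:00:00:00: 5425 */
   day2 = '02JAN1960:00:00:00'dt;   /* one full day later: 86400             */
   put d= t= dt= day2=;             /* writes the raw numbers to the log     */
run;
```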
SAS Date and Time Functions That Create Numeric SAS Date and Time Values
MDY(month, day, year): returns a SAS date value.
NOTE: If you use a two-digit year, the year cutoff set by the YEARCUTOFF= system option is applied (here, 1920).
Ex: MDY(11, 1, 15); is the date value of Nov 1st, 2015.
MDY(11, 1, 35); is the date value of Nov 1st, 1935.
MDY(11, 1, 1915); is the date value of Nov 1st, 1915.

TODAY(): gives today's date value
DATE(): gives today's date value
TIME(): gives the current time as a SAS time value (in seconds)
DATETIME(): gives the current datetime as a SAS datetime value (in seconds)
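The MDY examples above assume a year cutoff of 1920. A sketch that sets the YEARCUTOFF= option explicitly, so the two-digit results do not depend on the site default, might look like this (variable names are illustrative):

```sas
options yearcutoff=1920;   /* two-digit years 20-99 -> 1920-1999, 00-19 -> 2000-2019 */

data mdy_demo;
   d1 = MDY(11, 1, 15);     /* 01NOV2015                              */
   d2 = MDY(11, 1, 35);     /* 01NOV1935                              */
   d3 = MDY(11, 1, 1915);   /* 01NOV1915: four-digit year used as given */
   y1 = YEAR(d1);
   y2 = YEAR(d2);
   y3 = YEAR(d3);
   format d1-d3 date9.;
run;

proc print data=mdy_demo; run;
```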
SAS Functions to Extract the Day, Quarter, Year, Weekday, and Month from SAS Date Values

DAY(date): gives the day of the month (1 to 31)
QTR(date): gives the quarter of the year (1 to 4)
YEAR(date): gives the year (4-digit year)
WEEKDAY(date): gives the day of the week (1 to 7; 1 is Sunday, 2 is Monday, and so on)
MONTH(date): gives the month (1 to 12)
The SAS Function INTCK Finds the Number of Time Intervals in a Given Time Span
The following function counts the number of time intervals in a given time span:

INTCK(interval, from, to);

The possible time intervals include:

DAY, WEEKDAY, WEEK, TENDAY, SEMIMONTH, MONTH, QTR, SEMIYEAR, YEAR
From: specifies a SAS date, time, or datetime value that identifies the beginning of the time span.
To: specifies a SAS date, time, or datetime value that identifies the end of the time span.
Some rules for using the INTCK function:
It counts the number of interval boundaries crossed between FROM and TO.
Partial intervals are not counted.

INTCK SAS statement                                    Value
Weeks1=INTCK('week', '31DEC2009'd, '01JAN2010'd);      0
Months=INTCK('month', '31DEC2009'd, '01JAN2010'd);     1
Years=INTCK('year', '31DEC2009'd, '01JAN2010'd);       1
Week2=INTCK('week', '31DEC2009'd, '03JAN2010'd);       1
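The table can be reproduced with the sketch below. The extra variable almost_year (a hypothetical name, not in the table) shows the "partial intervals are not counted" rule: a span of nearly a full year that crosses no January 1 boundary counts as 0 years.

```sas
data intck_demo;
   weeks1 = INTCK('week',  '31DEC2009'd, '01JAN2010'd);  /* 0: no Sunday crossed        */
   months = INTCK('month', '31DEC2009'd, '01JAN2010'd);  /* 1: crosses 01JAN2010        */
   years  = INTCK('year',  '31DEC2009'd, '01JAN2010'd);  /* 1: crosses 01JAN2010        */
   weeks2 = INTCK('week',  '31DEC2009'd, '03JAN2010'd);  /* 1: 03JAN2010 is a Sunday    */
   almost_year = INTCK('year', '01JAN2010'd, '31DEC2010'd); /* 0: no year boundary crossed */
run;

proc print data=intck_demo; run;
```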
The INTNX Function Advances a Start Time by a Given Number of Intervals
General syntax:
INTNX(interval, start-from, increment <, alignment>);
The function returns a SAS date, time, or datetime value.
Interval can be: DAY, WEEKDAY, WEEK, TENDAY, SEMIMONTH, MONTH, QTR, SEMIYEAR, YEAR
Start-from: specifies the starting SAS date, time, or datetime value.
Increment: specifies a negative (into the past) or positive (into the future) integer.
Alignment: forces the returned date to the beginning ('b'), middle ('m'), or end ('e') of the time interval. The default is the beginning.
How does INTNX work?

The following shows some examples of using the INTNX function:
SAS INTNX function                        Result
INTNX('month', '01NOV2010'd, 5);          18718 (April 1, 2011)
INTNX('month', '01NOV2010'd, 5, 'b');     18718 (April 1, 2011)
INTNX('month', '01NOV2010'd, 5, 'm');     18732 (April 15, 2011)
INTNX('month', '01NOV2010'd, 5, 'e');     18747 (April 30, 2011)
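The table only moves forward. The sketch below (illustrative variable names) also shows a negative increment and an increment of 0 with end alignment, which returns the last day of the current month.

```sas
data intnx_demo;
   back2  = INTNX('month', '01NOV2010'd, -2);       /* 01SEP2010: two months back       */
   same_e = INTNX('month', '15NOV2010'd, 0, 'e');   /* 30NOV2010: end of the same month */
   fwd5_m = INTNX('month', '01NOV2010'd, 5, 'm');   /* 15APR2011, value 18732           */
   format back2 same_e fwd5_m date9.;
run;

proc print data=intnx_demo; run;
```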
Calculating the Day Difference and Year Difference between Two Dates
DATDIF counts the number of days between two dates.
YRDIF counts the number of years between two dates.
General syntax:
DATDIF(start_date, end_date, basis);
YRDIF(start_date, end_date, basis);
Start_date specifies the starting date as a SAS date value.
End_date specifies the end date as a SAS date value.
Basis is a string that specifies the basis for calculating the day or year difference. The basis has the form n/m, where n is the number of days per month and m is the number of days per year. For example, '30/360' uses 30 days per month to count months and 360 days per year to count years.
Possible basis values for DATDIF and YRDIF
The following basis values can be applied:

Basis      Meaning                                               Valid in  Valid in
(string)                                                         DATDIF    YRDIF
'30/360'   30 days per month, 360 days per year                  YES       YES
'ACT/ACT'  Actual number of days per month and per year          YES       YES
'ACT/360'  Actual number of days per month, 360 days per year    NO        YES
'ACT/365'  Actual number of days per month, 365 days per year    NO        YES
Examples of computing DATDIF and YRDIF

DATA USE_DIF;
DATEDF1=DATDIF('01SEP1984'D,'01NOV2010'D, '30/360');
DATEDF2=DATDIF('01SEP1984'D,'01NOV2010'D, 'ACT/ACT');
YEARDF1=YRDIF('01SEP1984'D,'01NOV2010'D, '30/360');
YEARDF2=YRDIF('01SEP1984'D,'01NOV2010'D, 'ACT/ACT');
PROC PRINT; RUN;

Results:
Obs DATEDF1 DATEDF2 YEARDF1 YEARDF2
1 9420 9557 26.1667 26.1662
Exercise 3
Run the following program and observe how the functions TODAY, YEAR, MONTH, QTR, WEEKDAY, and DAY work.

data datefunctions;
date1='25DEC2010'd;
date2=TODAY();
YEAR_date1=YEAR(date1);
MONTH_Date1=MONTH(Date1);
QTR_Date1=QTR(Date1);
WEEKDAY_Date1=WEEKDAY(Date1);
DAY_Date1=DAY(date1);
proc print; format date1 date2 date9.; run;
Exercise 4
Run the following program and observe how INTCK and INTNX functions work.

data dateFunc2;
date1 = '25DEC2010'd;
date2 = TODAY();
NDAYS=INTCK('DAY', Date1, Date2);
NYEARS=INTCK('YEAR', date1, date2);
NMONTH=INTCK('MONTH', date1, date2);
NQTR=INTCK('QTR',date1, date2);
NWEEK=INTCK('WEEK', date1, date2);
Incmonth1=INTNX('MONTH', today(),6, 'b');
incmonth2=INTNX('MONTH', today(),6, 'm');
incmonth3=intnx('month', today(), 6, 'e');
Datediff=DATDIF(date1, date2,'ACT/ACT');
Yeardiff=YRDIF(date1, date2, 'ACT/ACT');
proc print;
format date1 date2 Incmonth1 incmonth2 incmonth3 date9.;
run;
Modify Character Values Using SAS Functions
This section focuses on manipulating character strings. The objectives include:
Replace the contents of a character value
Trim trailing or leading blanks from a character value
Search a character value and extract a portion of the value
Convert character values to UPPER, lower, and Proper case
SAS Functions for Manipulating Character Values
There are many SAS functions for manipulating character strings. This section discusses the following functions:
Function   Purpose
SCAN       Extract a specific word from a character string
SUBSTR     Extract a substring or replace character values
TRIM       Trim trailing blanks from character values
LEFT       Left-align a value that has leading blanks, so that TRIM can then remove the resulting trailing blanks
UPCASE     Convert the character value to UPPER case
LOWCASE    Convert the character value to lower case
PROPCASE   Convert the character value to Proper case
CATX       Concatenate strings, removing leading and trailing blanks and inserting a separator
INDEX      Search a character value for a specific string
FIND       Search a character string for a substring that the user specifies
TRANWRD    Replace or remove all occurrences of a pattern of characters within a character string
How does the SCAN function work?
SCAN separates a character string into words using delimiters and returns the word you ask for.
General syntax:
SCAN(argument, n <, delimiters>);
Argument is the character variable or expression to be scanned.
n specifies which word to read.
Delimiters are special characters, which must be enclosed in quotation marks. If you do not specify delimiters, the default delimiters are used.
Default delimiters include:
blank . < ( + | & ! $ * ) ; ^ - / , %
The default length of the value returned by the SCAN function is 200. Therefore, it is essential to specify a LENGTH statement before the SCAN function.
Some Examples of Using the SCAN Function

Name = 'CURTIS, BEN MIKE';

To extract the first name, we can use the SCAN function:
Fname1 = SCAN(Name, 2); gives BEN
Fname2 = SCAN(Name, 2, ', '); gives BEN
Fname3 = SCAN(Name, 3); gives MIKE
Fname4 = SCAN(Name, 2, ','); gives BEN MIKE (only the comma is a delimiter, so the blank is part of the word)
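The four calls can be run together in a sketch like this one. The LENGTH values are assumptions chosen to fit this name; the delimiter strings ', ' and ',' are reconstructions of the examples above.

```sas
data scan_demo;
   length Name $ 20 Fname1-Fname4 $ 10;   /* set lengths before SCAN assigns them */
   Name   = 'CURTIS, BEN MIKE';
   Fname1 = SCAN(Name, 2);         /* default delimiters: BEN       */
   Fname2 = SCAN(Name, 2, ', ');   /* comma or blank: BEN           */
   Fname3 = SCAN(Name, 3);         /* MIKE                          */
   Fname4 = SCAN(Name, 2, ',');    /* comma only: the blank after the comma is not a delimiter */
run;

proc print data=scan_demo; run;
```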
SUBSTR Function
SUBSTR serves two purposes:
(1) Extract a portion of a character string, starting at a specified position.
General syntax (right-side SUBSTR):
Target = SUBSTR(string, position <, n>);
(2) Replace the contents of a character string.
General syntax (left-side SUBSTR):
SUBSTR(string, position <, n>) = substring;
The string does not need to be marked by delimiters.
If n is omitted in the SUBSTR function, all remaining characters are included in (or replaced by) the substring.
The result of SUBSTR has the same length as the string. Hence, it is important to define the LENGTH statement as needed before the SUBSTR function.
Examples of Using the Right-Side SUBSTR Function

SUBSTR serves two purposes:

(1) Extract a substring from a character string (right-side SUBSTR). Here is an example of right-side SUBSTR:
NAME = 'CURTIS, BEN MIKE';
To extract the middle initial, one can use SCAN to locate the middle name, MIKE, then use SUBSTR to extract the middle initial, M:

MidName = SCAN(Name, 3);
Midinit = SUBSTR(MidName, 1, 1);
Example of Using the Left-Side SUBSTR Function

The second purpose of SUBSTR is to replace a substring in a string.
For example,
NAME = 'CURTIS, BEN MIKE';
The correct middle name is MICHAEL, not MIKE. One can use the left-side SUBSTR function:
SUBSTR(Name, 13) = 'MICHAEL';

NOTE: The size of the substring is not specified, so everything starting at the 13th position in the string is replaced by MICHAEL. Name must be long enough to hold the result (for example, LENGTH Name $ 20;); otherwise MICHAEL is truncated.
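Both uses of SUBSTR fit in one DATA step. In this sketch the LENGTH of 20 is an assumption that leaves room for the longer middle name.

```sas
data substr_demo;
   length Name $ 20;                          /* room for 'CURTIS, BEN MICHAEL' */
   Name = 'CURTIS, BEN MIKE';
   Midinit = SUBSTR(SCAN(Name, 3), 1, 1);     /* right side: extracts M            */
   SUBSTR(Name, 13) = 'MICHAEL';              /* left side: replaces position 13 on */
run;

proc print data=substr_demo; run;
```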
TRIM Function
The TRIM function removes trailing blanks before strings are concatenated.
The general syntax:

TRIM(variable);
If there are LEADING blanks, we can use the function LEFT(variable), which left-aligns the value and moves the leading blanks to the end as trailing blanks. We can then apply the TRIM function:
TRIM(LEFT(variable));
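The following sketch (hypothetical values) compares concatenation with and without TRIM, and with TRIM(LEFT()) for a value that has leading blanks.

```sas
data trim_demo;
   length city $ 12 state $ 10;
   city = 'Alma'; state = 'MI';
   no_trim   = city || ', ' || state;               /* blanks from city's length 12 appear before the comma */
   with_trim = TRIM(city) || ', ' || state;         /* 'Alma, MI' */
   padded    = '   Alma';                           /* leading blanks */
   fixed     = TRIM(LEFT(padded)) || ', ' || state; /* 'Alma, MI' */
run;

proc print data=trim_demo; run;
```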
Converting Character Values to UPPER, lower, and Proper Case
UPCASE(character value) returns the character value in all UPPER case.
Ex: UPCASE('Mission street') returns MISSION STREET
LOWCASE(character value) returns the character value in all lower case.
Ex: LOWCASE('Mission street') returns mission street
PROPCASE(character value) returns the value with the first letter of each word in upper case and the rest in lower case.
Ex: PROPCASE('MISSION street') returns Mission Street

These functions are very useful when dealing with character values, especially in IF statements that compare character values stored in mixed case.
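Here is a sketch of the IF-statement use: comparing UPCASE(street) with an upper-case constant matches every spelling of the same street, whatever its case. The street values are made up.

```sas
data case_demo;
   infile datalines truncover;
   input street $ 1-20;
   upper  = UPCASE(street);
   lower  = LOWCASE(street);
   proper = PROPCASE(street);
   if UPCASE(street) = 'MISSION STREET' then match = 1;  /* case-insensitive test */
   else match = 0;
   datalines;
Mission street
MISSION STREET
mission Street
Main Street
;
run;

proc print data=case_demo; run;
```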
CATX Function
Concatenating character strings often requires trimming leading and trailing blanks and inserting separators between words to obtain the correct new character string. One can use TRIM, LEFT, and concatenated separators to do the task.
Starting with SAS 9.1, the CATX function handles all of these in a single step.

The general syntax of the CATX function:

CATX(separator, string-1 <, ..., string-n>);
Separator specifies the character string used to separate the concatenated strings. A constant separator must be enclosed in quotation marks.
String-n specifies a SAS character string.
CATX function example
The following data consists of Name (columns 1-20), Jobtype (22-40), City (42-53), State (55-63), and Zipcode (65-71):

AARON, BRAD MAC Network Technician Alma Michgan 48801


FLEMING, TIM WAREN Computer Analyst Mt Pleasant Michgan 48858
CHEN, DAVID MICHAEL Instructor MT PLEASANT MICHGAN 48858

The following program reads the data and creates an address label for each individual using the CATX function:
Data job;
INFILE ;
input name $ 1-20 jobtype $ 22-40 city $ 42-53 State $ 55-63 zipcode 65-71;
/* ', ' separates city, state, and zip code in the label */
Address = CATX(', ', PROPCASE(city), PROPCASE(state), zipcode);
run;
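Because the INFILE path above is not given, here is a self-contained sketch of the same CATX call on one hard-coded record (values assumed for illustration). Note that the numeric zip code is added without the leading blanks that the || operator would produce.

```sas
data catx_demo;
   city = 'MT PLEASANT'; state = 'MICHIGAN'; zipcode = 48858;
   Address = CATX(', ', PROPCASE(city), PROPCASE(state), zipcode);
   /* Address = 'Mt Pleasant, Michigan, 48858' */
run;

proc print data=catx_demo; run;
```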
Exercise 5
Open the c13_1 program.
Run program 1, observe how the SCAN function works, and look at the variable attributes, especially the variable length. Then add a LENGTH statement that gives Fname0 to Fname10 a length of 10. Run the program and check the lengths again.
Run program 2 and observe the results and the variable attributes. Then add a LENGTH statement that gives Name a length of 20. Run the program and see the results.
Run program 3 and observe the results and the variable attributes. Then add a LENGTH statement that gives Name a length of 20. Run the program and see the results.
INDEX Function
The INDEX function searches a character value for a specified string.
It searches from left to right for the first occurrence of the string and returns the POSITION of the string's first character. If the string does not occur, it returns 0.
General syntax:
INDEX(source, excerpt);
Source specifies the character variable or expression to search.
Excerpt is the character string, enclosed in quotation marks, to search for in the source.

Ex: INDEX(UPCASE(jobtype), 'WORD PROCESSING'); returns the position of the W where WORD PROCESSING is first found, or zero if no such string is found.
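A quick sketch with an assumed job title shows both the position returned and the effect of case.

```sas
data index_demo;
   length jobtype $ 30;
   jobtype = 'Senior Word Processing Clerk';
   pos1 = INDEX(UPCASE(jobtype), 'WORD PROCESSING');  /* 8: W is the 8th character */
   pos2 = INDEX(jobtype, 'WORD PROCESSING');          /* 0: case does not match    */
run;

proc print data=index_demo; run;
```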
FIND Function
Another way to search a string is the FIND function, which searches for a specific substring within a character string.
The FIND function searches for the first occurrence of the substring and returns its POSITION. If the substring is not found, it returns zero.
General syntax:
FIND(string, substring <, modifiers> <, startpos>);
String is a character constant, variable, or expression to be searched for the substring.
Substring is a character constant, variable, or expression to search for within the string.
Modifiers is a character constant, variable, or expression that specifies one or more modifiers.
Startpos is an integer that specifies the position at which the search starts and the direction of the search: a positive value searches from left to right, and a negative value searches from right to left. If startpos is not given, FIND searches from left to right starting at the first position.
What Are the Modifiers in the FIND Function, and What Are They For?
NOTE: The FIND function is similar to the INDEX function, with some differences. One is that FIND can start the search at a given position and can search backward or forward.
Another difference is the modifiers, which control how the search is performed under different conditions.
The modifiers include:
Modifier i causes the FIND function to ignore character case during the search.
Modifier t trims trailing blanks from the string and the substring.
If no modifier is specified, FIND searches for the substring with exactly the same case as the characters in the substring.
If the modifier is a constant, enclose it in quotation marks. One can specify more than one modifier in a single quoted string. Ex: To use both the i and t modifiers, use 'it' as the modifiers argument.
Examples of the FIND Function
NOTE: The FIND function without modifiers or startpos behaves the same as the INDEX function.
Like INDEX, FIND is case sensitive by default. Use UPCASE(string) or LOWCASE(string) in the FIND function if the cases may be mixed.
Here is an example using the FIND function:

FIND(LOWCASE(job), 'data mining', 't');

One can combine an IF statement with the FIND function to select observations whose job title contains data mining:

Data dmjob;
Set alljobs;
IF FIND(LOWCASE(job), 'data mining', 't') > 0;
Run;
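The sketch below uses an assumed job value that contains the phrase twice, once in mixed case, to show the i modifier and a positive and a negative startpos.

```sas
data find_demo;
   length job $ 40;
   job = 'Data Mining Analyst, data mining team';
   p1 = FIND(job, 'data mining');            /* case sensitive: 22                     */
   p2 = FIND(job, 'data mining', 'i');       /* ignore case: 1                         */
   p3 = FIND(job, 'data mining', 'i', 2);    /* start at position 2: 22                */
   p4 = FIND(job, 'data mining', 'i', -40);  /* right to left from position 40: 22     */
run;

proc print data=find_demo; run;
```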
TRANWRD Function
The TRANWRD function replaces or removes all occurrences of a pattern of characters within a character string.
A typical use of TRANWRD is to update existing variables in place, such as changing Miss to Ms., changing Doctor to Dr., and so on.

General syntax:
TRANWRD(source, target, replacement);

Source is the source string to be translated or updated.
Target is the string SAS looks for in the source, to be removed or replaced.
Replacement specifies the new string that replaces the target. To remove the target from the source, use a blank (' ') as the replacement.
Examples of Using TRANWRD
Note: The TRANWRD function is case sensitive. Use the UPCASE, LOWCASE, or PROPCASE function as needed.
Example:

TRANWRD(name, 'Miss ', 'Ms. ');

TRANWRD(PROPCASE(name), 'Doctor ', 'Dr. ');
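A minimal sketch of the two examples, with made-up names. In the second call, PROPCASE first turns DOCTOR into Doctor so that the case-sensitive target matches.

```sas
data tranwrd_demo;
   length name1 name2 $ 30;
   name1 = TRANWRD('Miss Jane Smith', 'Miss ', 'Ms. ');                 /* 'Ms. Jane Smith' */
   name2 = TRANWRD(PROPCASE('DOCTOR john lee'), 'Doctor ', 'Dr. ');     /* 'Dr. John Lee'   */
run;

proc print data=tranwrd_demo; run;
```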
Nesting SAS Functions
As you have seen in the previous examples, a SAS function can be nested inside another SAS function.
For example, for name = 'Curtis, Ben, mike':
Midname = TRIM(UPCASE(SUBSTR(SCAN(name, 3), 1, 1))) || '.';

This statement uses SCAN to find the middle name, SUBSTR to take its first character as the middle initial, UPCASE to make it upper case, TRIM to remove the trailing blanks, and || to append a period.
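To follow the nested call from the inside out, this sketch computes the same value once as a single expression and once one function per step (the step variable names are hypothetical).

```sas
data nest_demo;
   length name $ 20;
   name = 'Curtis, Ben, mike';
   /* one nested expression */
   Midname = TRIM(UPCASE(SUBSTR(SCAN(name, 3), 1, 1))) || '.';
   /* the same work, one function per step */
   step1 = SCAN(name, 3);          /* mike */
   step2 = SUBSTR(step1, 1, 1);    /* m    */
   step3 = UPCASE(step2);          /* M    */
   Midname2 = TRIM(step3) || '.';  /* M.   */
run;

proc print data=nest_demo; run;
```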
SAS Functions for Modifying Numeric Values

When manipulating numeric values, one may be interested in only the integer part of a value, may need to round off to a certain number of digits, and so on. SAS has a set of functions to modify numeric values:
INT(argument); returns the integer part of the argument.
ROUND(argument, round-off-unit); returns the value rounded off to the nearest multiple of the round-off unit.
CEIL(argument); returns the smallest integer that is greater than or equal to the argument.
FLOOR(argument); returns the largest integer that is less than or equal to the argument.
Examples of modifying numeric values

Data value   INT   ROUND(value, .1)   CEIL   FLOOR
1.259        1     1.3                2      1
-1.259       -1    -1.3               -1     -2
20.934       20    20.9               21     20
-20.934      -20   -20.9              -20    -21
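The table can be regenerated with a short DATA step; the variable names below are illustrative. Note how INT and CEIL agree for negative values, while FLOOR moves further below zero.

```sas
data numeric_demo;
   input value;
   int_v   = INT(value);        /* drops the fractional part        */
   round_v = ROUND(value, .1);  /* nearest multiple of 0.1          */
   ceil_v  = CEIL(value);       /* smallest integer >= value        */
   floor_v = FLOOR(value);      /* largest integer <= value         */
   datalines;
1.259
-1.259
20.934
-20.934
;
run;

proc print data=numeric_demo; run;
```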
