An Introduction to SAS Character Functions (including some new SAS 9 functions)
Some Functions We Will Discuss
LENGTH, SUBSTR, COMPBL, COMPRESS, VERIFY, INPUT, PUT, TRANWRD, SCAN, TRIM, UPCASE, LOWCASE, INDEX, INDEXC, INDEXW, SPEDIS
Some SAS 9 Functions
CATX and CATS, COMPARE, LENGTHC, LENGTHN, STRIP, COUNT, COUNTC, PROPCASE, FIND, FINDC
Functions That Compute the Length of Strings
Function: LENGTH
Purpose: To determine the length of a character value, not counting trailing blanks. A blank value has a length of 1.
Syntax: LENGTH(character-value)
Examples: For these examples, CHAR = "ABC   " (ABC followed by three blanks).
Function          Returns
LENGTH("ABC")     3
LENGTH(CHAR)      3
LENGTH(" ")       1
Function: LENGTHC
Purpose: To determine the length of a character value, including trailing blanks.
Syntax: LENGTHC(character-value)
Examples: For these examples, CHAR = "ABC   ".
Function          Returns
LENGTHC("ABC")    3
LENGTHC(CHAR)     6
LENGTHC(" ")      1
Function: LENGTHM
Purpose: To determine the amount of memory, in bytes, allocated for a character value (for a variable, its storage length).
Syntax: LENGTHM(character-value)
Examples: For these examples, CHAR = "ABC   ".
Function          Returns
LENGTHM("ABC")    3
LENGTHM(CHAR)     6
LENGTHM(" ")      1
Function: LENGTHN
Purpose: To determine the length of a character value, not counting trailing blanks. Unlike LENGTH, it returns 0 for a blank value.
Syntax: LENGTHN(character-value)
Examples: For these examples, CHAR = "ABC   ".
Function          Returns
LENGTHN("ABC")    3
LENGTHN(CHAR)     3
LENGTHN(" ")      0
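The four length functions can be compared side by side on one variable. This is a minimal sketch, assuming a 6-byte variable CHAR; the result names LEN, LENC, LENM, LENN, BLANK_LEN and BLANK_LENN are illustrative and not from the original slides. The comments give the values implied by the tables above.

```sas
data _null_;
   length char $ 6;
   char = 'ABC';               /* stored as ABC plus 3 trailing blanks   */
   len  = length(char);        /* 3: trailing blanks ignored             */
   lenc = lengthc(char);       /* 6: trailing blanks counted             */
   lenm = lengthm(char);       /* 6: bytes allocated to CHAR             */
   lenn = lengthn(char);       /* 3: trailing blanks ignored             */
   blank_len  = length(' ');   /* 1: a blank value has LENGTH 1          */
   blank_lenn = lengthn(' ');  /* 0: LENGTHN returns 0 for a blank value */
   put len= lenc= lenm= lenn= blank_len= blank_lenn=;
run;
```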
The COMPARE Function
data compare;
   input code $ @@;
   value = 'V30.450';
   c1 = compare(code,value);
   c2 = compare(code,value,':i');
   c3 = compare(trim(code),value,':i');
datalines;
V30 V30.450 v30.4
;

Listing of Data Set COMPARE
code      value     c1   c2   c3
V30       V30.450   -4   -4    0
V30.450   V30.450    0    0    0
v30.4     V30.450    1   -6    0
Character Storage Lengths
data chars1;
   length string $ 7;
   string = 'abc';
   storage_length = lengthc(string);
   length = length(string);
   display = ":" || string || ":";
   put storage_length= /
       length= /
       display=;
run;
SAS Log
11   data chars1;
12      length string $ 7;
13      string = 'abc';
14      storage_length = lengthc(string);
15      length = length(string);
16      display = ":" || string || ":";
17      put storage_length= /
18          length= /
19          display=;
20   run;

storage_length=7
length=3
display=:abc    :
Moving the LENGTH Statement
data chars2;
   string = 'abc';
   length string $ 7;
   storage_length = lengthc(string);
   length = length(string);
   display = ":" || string || ":";
   put storage_length= /
       length= /
       display=;
run;
SAS Log
1    data chars2;
2       string = 'abc';
3       length string $ 7;
WARNING: Length of character variable string has already been set.
         Use the LENGTH statement as the very first statement in the DATA STEP to declare the length of a character variable.
4       storage_length = lengthc(string);
5       length = length(string);
6       display = ":" || string || ":";
7       put storage_length= /
8           length= /
9           display=;
10   run;

storage_length=3
length=3
display=:abc:
Function: SUBSTR
Purpose: To extract part of a string. When the SUBSTR function is used on the left side of the equal sign, it instead replaces the specified characters of the string.
Syntax: SUBSTR(character-value, start <,length>)
Examples: For these examples, let STRING = "ABC123XYZ".
Function              Returns
SUBSTR(STRING,4,2)    "12"
SUBSTR(STRING,4)      "123XYZ"
The INPUT Function
data special;
   ***INPUT is a special function often used for character to numeric conversion;
   length c_date $ 10 numeral $ 3;
   input c_date numeral;
   sas_date = input(c_date,mmddyy10.);
   number = input(numeral,3.);
datalines;
11/12/1950 123
9-15-2004 99
;

Listing of Data Set SPECIAL
c_date       numeral   sas_date   number
11/12/1950   123       -3337      123
9-15-2004    99        16329      99
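When a character value may not be a valid number, INPUT writes an invalid-data note to the log and sets _ERROR_ to 1. Placing the ?? modifier in front of the informat suppresses both and simply returns a missing value. A minimal sketch; the variables GOOD and BAD and the value '12A' are illustrative:

```sas
data _null_;
   good = input('123',8.);      /* 123 */
   bad  = input('12A',?? 8.);   /* missing: no invalid-data note, _ERROR_ stays 0 */
   put good= bad= _error_=;
run;
```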
The PUT Function
data special;
   ***PUT is a special function often used for numeric to character conversion;
   input sas_date number ss;
   c_date = put(sas_date,date9.);
   money = put(number,dollar8.);
   ss_char = put(ss,ssn.);
datalines;
0 1234 123456789
;

Listing of Data Set SPECIAL
sas_date   number   ss          c_date      money    ss_char
0          1234     123456789   01JAN1960   $1,234   123-45-6789
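Whatever format you choose, PUT always returns a character value. A minimal sketch with two other standard formats applied to the same inputs as above (US_DATE and PADDED are illustrative names):

```sas
data _null_;
   sas_date = 0;
   number   = 1234;
   us_date  = put(sas_date,mmddyy10.);   /* '01/01/1960' */
   padded   = put(number,z6.);           /* '001234': Z format adds leading zeros */
   put us_date= padded=;
run;
```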
DATA SUBSTRING;
   INPUT ID $ 1-9;
   LENGTH STATE $ 2;
   STATE = SUBSTR(ID,1,2);
   NUM = INPUT(SUBSTR(ID,7,3),3.);
DATALINES;
NYXXXX123
NJ1234567
;
PROC PRINT DATA=SUBSTRING NOOBS;
   TITLE 'Listing of Data Set SUBSTRING';
RUN;

Output: Listing of Data Set SUBSTRING
ID          STATE   NUM
NYXXXX123   NY      123
NJ1234567   NJ      567
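The LENGTH STATE $ 2 statement in this program matters: when SUBSTR assigns to a variable that has no length yet, that variable takes the length of SUBSTR's first argument. A minimal sketch of the difference; PART1, PART2, LEN1 and LEN2 are illustrative names:

```sas
data _null_;
   length part2 $ 2;             /* declared before PART2 is first used       */
   string = 'ABC123XYZ';         /* STRING gets length 9                      */
   part1 = substr(string,4,2);   /* no LENGTH: PART1 inherits length 9        */
   part2 = substr(string,4,2);   /* PART2 keeps its declared length of 2      */
   len1 = lengthc(part1);        /* 9: '12' padded with 7 trailing blanks     */
   len2 = lengthc(part2);        /* 2                                         */
   put part1= len1= part2= len2=;
run;
```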
Converting Multiple Blanks to a Single Blank
data multiple;
   input #1 @1  Name    $20.
         #2 @1  Address $30.
         #3 @1  City    $15.
            @20 State   $2.
            @25 Zip     $5.;
   name = compbl(name);
   address = compbl(address);
   city = compbl(city);
datalines;
Ron Cody
89 Lazy Brook Rd.
Flemington         NJ   08822
Bill Brown
28 Cathy Street
North City         NY   11518
;

Listing of Data Set MULTIPLE
Name         Address             City         State   Zip
Ron Cody     89 Lazy Brook Rd.   Flemington   NJ      08822
Bill Brown   28 Cathy Street     North City   NY      11518
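COMPBL reduces any run of blanks to a single blank, including a run of leading blanks, so it does not remove a leading blank completely. A minimal sketch, with an illustrative value MESSY, that pairs COMPBL with STRIP to get a fully cleaned name:

```sas
data _null_;
   messy = '   Ron    Cody  ';
   one_blank = compbl(messy);     /* ' Ron Cody' plus trailing blanks */
   clean = strip(compbl(messy));  /* 'Ron Cody'                       */
   put clean=;
run;
```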
How to Remove Characters from a String
data phone;
   input phone $15.;
   phone1 = compress(phone);
   phone2 = compress(phone,'(-) ');
datalines;
(908)235-4490
(201) 555-77 99
;

Listing of Data Set PHONE
phone             phone1          phone2
(908)235-4490     (908)235-4490   9082354490
(201) 555-77 99   (201)555-7799   2015557799
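Listing every unwanted character gets tedious. In SAS 9 releases, COMPRESS also accepts a third argument of modifiers: 'k' keeps the listed characters instead of removing them, and 'd' adds the digits to the list, so 'kd' keeps only the digits. A minimal sketch using one of the phone numbers above (DIGITS is an illustrative name):

```sas
data _null_;
   phone = '(201) 555-77 99';
   digits = compress(phone,,'kd');   /* keep digits only: '2015557799' */
   put digits=;
run;
```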
Another COMPRESS Example
data social;
   input ss_char $11.;
   ss = input(compress(ss_char,'-'),9.);
   easy_ss = input(ss_char,comma11.);
datalines;
123-45-6789
;

ss = 123456789 (numeric)   easy_ss = 123456789 (numeric)
The VERIFY Function
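Purpose: To return the position of the first character in a string that is not in a list of valid characters, or 0 if every character is valid.

data verify;
   input @1 id $3. @5 answer $5.;
   position = verify(answer,'abcde');
datalines;
001 acbed
002 abxde
003 12cce
004 abc e
;

Listing of Data Set VERIFY
id    answer   position
001   acbed    0
002   abxde    3
003   12cce    1
004   abc e    4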
Watch Out for Trailing Blanks
data trailing;
   length string $ 10;
   string = 'abc';
   position = verify(string,'abcde');
run;

string = 'abc       '   position = 4 (the position of the first trailing blank)
Function: TRIM
Purpose: To remove trailing blanks from a character value.
Syntax: TRIM(character-value)
Examples: For these examples, STRING1 = "ABC   " and STRING2 = "   XYZ".
Function                    Returns
TRIM(STRING1)               "ABC"
TRIM(STRING2)               "   XYZ"
TRIM("A B C   ")            "A B C"
TRIM("A ") || TRIM("B ")    "AB"
TRIM(" ")                   " " (length = 1)
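The last row above is where TRIM and TRIMN differ: TRIM of a blank value still returns one blank, while TRIMN returns a null (zero-length) string. A minimal sketch that makes the difference visible by concatenation (BLANK, WITH_TRIM and WITH_TRIMN are illustrative names):

```sas
data _null_;
   blank = ' ';
   with_trim  = ':' || trim(blank)  || ':';   /* ': :' because TRIM leaves one blank    */
   with_trimn = ':' || trimn(blank) || ':';   /* '::'  because TRIMN returns a null string */
   put with_trim= with_trimn=;
run;
```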
Function: STRIP
Purpose: To strip leading and trailing blanks from character variables or strings. STRIP(CHAR) is equivalent to TRIMN(LEFT(CHAR)), but more convenient.
Syntax: STRIP(character-value)
Examples: For these examples, let STRING = "   abc   ".
Function                                 Returns
STRIP(STRING)                            "abc"
STRIP("   LEADING AND TRAILING   ")      "LEADING AND TRAILING"
STRIP Function
data _null_;
   string = ' Testing ';
   try1 = strip(string);
   try2 = trim(left(string));
   put string= $quote12. /
       try1= $quote12. /
       try2= $quote12.;
run;

Partial log:
string=" Testing"
try1="Testing"
try2="Testing"
Watch Out for Trailing Blanks
data trailing;
   length string $ 10;
   string = 'abc';
   position = verify(trim(string),'abcde');
run;

position = 0
Using VERIFY for Data Cleaning
data clean;
   input id $;
   ***Valid IDs contain only the letters X, Y, or Z and digits;
   if verify(trim(id),'XYZ0123456789') eq 0 then valid = 'Yes';
   else valid = 'No';
datalines;
12X67YZ
67WXYZ
;

Listing of Data Set CLEAN
id        valid
12X67YZ   Yes
67WXYZ    No
Substring Example
data pieces_parts;
   input Id $9.;
   length State $ 2;
   state = substr(id,1,2);
   Num = input(substr(id,7,3),3.);
datalines;
NYXXXX123
NJ1234567
;

Listing of Data Set PIECES_PARTS
Id          State   Num
NYXXXX123   NY      123
NJ1234567   NJ      567
Changing Case
data case;
   input name $15.;
   upper = upcase(name);
   lower = lowcase(name);
   proper = propcase(name);
datalines;
gEOrge SMITH
The end
;

Listing of Data Set CASE
name           upper          lower          proper
gEOrge SMITH   GEORGE SMITH   george smith   George Smith
The end        THE END        the end        The End
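PROPCASE also takes an optional second argument listing the characters that start a new word. An apostrophe is not among the default delimiters, so a name such as O'NEIL comes out as O'neil; supplying a blank and an apostrophe fixes that. A minimal sketch with an illustrative name (PROPER1 and PROPER2 are also illustrative):

```sas
data _null_;
   name = "MARY O'NEIL";
   proper1 = propcase(name);        /* "Mary O'neil": default delimiters       */
   proper2 = propcase(name," '");   /* "Mary O'Neil": blank and apostrophe     */
   put proper1= proper2=;
run;
```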
The SUBSTR Function on the Left-Hand Side of the Equal Sign
data pressure;
   input sbp dbp @@;
   length sbp_chk dbp_chk $ 4;
   sbp_chk = put(sbp,3.);
   dbp_chk = put(dbp,3.);
   if sbp gt 160 then substr(sbp_chk,4,1) = '*';
   if dbp gt 90 then substr(dbp_chk,4,1) = '*';
datalines;
120 80 180 92 200 110
;
Listing of Data Set PRESSURE
sbp   dbp   sbp_chk   dbp_chk
120    80   120        80
180    92   180*       92*
200   110   200*      110*
Parsing a String
data take_apart;
   input @1 Cost $10.;
   Integer = input(scan(Cost,1,' /'),8.);
   Num = input(scan(Cost,2,' /'),8.);
   Den = input(scan(Cost,3,' /'),8.);
   if missing(Num) then Amount = Integer;
   else Amount = Integer + Num/Den;
datalines;
1 3/4
12 1/2
123
;

Listing of Data Set TAKE_APART
Cost     Integer   Num   Den   Amount
1 3/4          1     3     4     1.75
12 1/2        12     1     2    12.50
123          123     .     .   123.00
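When the number of pieces varies, SCAN can be driven by a loop. This sketch assumes SAS 9.2 or later, where the COUNTW function returns the number of words for a given set of delimiters; the variables WORD, N_WORDS and I are illustrative:

```sas
data _null_;
   length word $ 10;
   cost = '12 1/2';
   n_words = countw(cost,' /');    /* 3 words when blank and slash are delimiters */
   do i = 1 to n_words;
      word = scan(cost,i,' /');    /* '12', then '1', then '2' */
      put i= word=;
   end;
run;
```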
Using the SCAN Function to Extract a Last Name
data first_last;
   length last_name $ 15;
   input @1 name $20. @22 phone $13.;
   ***extract the last name from name;
   last_name = scan(name,-1,' ');   *** minus value scans from the right;
datalines;
Jeff W. Snoker       (908)782-4382
Raymond Albert       (732)235-4444
Alfred Edward Newman (800)123-4321
Steven J. Foster     (201)567-9876
Jose Romerez         (516)593-2377
;
Names and Phone Numbers in Alphabetical Order (by Last Name)
Name                   Phone Number
Raymond Albert         (732)235-4444
Steven J. Foster       (201)567-9876
Alfred Edward Newman   (800)123-4321
Jose Romerez           (516)593-2377
Jeff W. Snoker         (908)782-4382
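The listing is ordered by last name, so a sort step runs between the DATA step and the report. The slides do not show it; the following is one plausible sketch, assuming PROC SORT and PROC PRINT were used. The output data set name BY_LAST is hypothetical, and the DATA step is repeated so the block runs on its own.

```sas
data first_last;
   length last_name $ 15;
   input @1 name $20. @22 phone $13.;
   last_name = scan(name,-1,' ');   /* last word of the name */
datalines;
Jeff W. Snoker       (908)782-4382
Raymond Albert       (732)235-4444
Alfred Edward Newman (800)123-4321
Steven J. Foster     (201)567-9876
Jose Romerez         (516)593-2377
;

/* Hypothetical output data set BY_LAST, sorted by the extracted last name */
proc sort data=first_last out=by_last;
   by last_name;
run;

title 'Names and Phone Numbers in Alphabetical Order (by Last Name)';
proc print data=by_last noobs label;
   var name phone;
   label name  = 'Name'
         phone = 'Phone Number';
run;
```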
Locating the Position of One String Within Another String
data locate;
   input string $10.;
   first = index(string,'xyz');
   first_c = indexc(string,'x','y','z');   /* equivalent to indexc(string,'xyz') */
datalines;
abczyx1xyz
1234567890
abcx1y2z39
XYZabcxyz
;

Listing of Data Set LOCATE
string       first   first_c
abczyx1xyz     8        4
1234567890     0        0
abcx1y2z39     0        4
XYZabcxyz      7        7
Locating the Position of One String Within Another String
data locate;
   input string $10.;
   first = find(string,'xyz','i');
   first_c = findc(string,'xyz','i');   /* i means ignore case */
datalines;
abczyx1xyz
1234567890
abcx1y2z39
XYZabcxyz
;

Listing of Data Set LOCATE
string       first   first_c
abczyx1xyz     8        4
1234567890     0        0
abcx1y2z39     0        4
XYZabcxyz      1        1
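FIND also accepts a starting position after the modifiers, which makes it easy to look for a second occurrence. A minimal sketch using the last value above (FIRST and SECOND are illustrative names):

```sas
data _null_;
   string = 'XYZabcxyz';
   first  = find(string,'xyz','i');             /* 1                                 */
   second = find(string,'xyz','i',first + 1);   /* 7: search resumes after first hit */
   put first= second=;
run;
```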
Locating One Word in a String Function INDEXW
data _null_;
   string = 'anything goes any where';
   index = index(string,'any');
   indexw = indexw(string,'any');
   put index= indexw=;
run;

index=1 indexw=15
Note: You can specify the word delimiters for INDEXW in an optional third argument.
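By default INDEXW treats only blanks as word boundaries, so in a comma-separated list no item counts as a separate word. A minimal sketch using the third argument; the value LIST is illustrative:

```sas
data _null_;
   list = 'apple,anything,any,pear';
   pos_blank = indexw(list,'any');       /* 0: no blank-delimited word 'any'      */
   pos_comma = indexw(list,'any',',');   /* 16: 'any' bounded by commas           */
   put pos_blank= pos_comma=;
run;
```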
Substituting One Word for Another in a String
data convert;
   input @1 address $20.;
   *** Convert Street, Avenue and Road to their abbreviations;
   Address = tranwrd(address,'Street','St.');
   Address = tranwrd(address,'Avenue','Ave.');
   Address = tranwrd(address,'Road','Rd.');
datalines;
89 Lazy Brook Road
123 River Rd.
12 Main Street
;

Listing of Data Set CONVERT
Obs   Address
1     89 Lazy Brook Rd.
2     123 River Rd.
3     12 Main St.
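Despite its name, TRANWRD replaces every occurrence of the target as a substring, not just whole words, and the match is case-sensitive. A minimal sketch of both pitfalls with illustrative addresses:

```sas
data _null_;
   length address $ 20;
   address = '5 Roadside Avenue';
   address = tranwrd(address,'Road','Rd.');           /* '5 Rd.side Avenue': matched inside a word */
   lower = tranwrd('12 main street','Street','St.');  /* unchanged: 'street' is not 'Street'       */
   put address= lower=;
run;
```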
Spelling Distance
data compare;
   length string1 string2 $ 15;
   input string1 string2;
   points = spedis(string1,string2);
datalines;
same same
same sam
first xirst
last lasx
receipt reciept
;

Listing of Data Set COMPARE
string1   string2   points
same      same       0
same      sam        8
first     xirst     40
last      lasx      25
receipt   reciept    7
The "ANY" Functions
data find_alpha_digit;
   input string $20.;
   first_alpha = anyalpha(string);
   first_digit = anydigit(string);
datalines;
no digits here
the 3 and 4
123 456 789
;

Listing of Data Set FIND_ALPHA_DIGIT
string           first_alpha   first_digit
no digits here        1             0
the 3 and 4           1             5
123 456 789           0             1
The "NOT" Functions: Beware of Trailing Blanks
data _null_;
   length string $ 10;
   string = '123';
   position = notdigit(string);
   pos_trim = notdigit(trim(string));
   put position= pos_trim=;
run;

position = 4 (position of the first trailing blank)   pos_trim = 0
The "NOT" Functions
data data_cleaning;
   input string $20.;
   not_alpha = notalpha(trim(string));
   not_digit = notdigit(trim(string));
datalines;
abcdefg
1234567
abc123
1234abcd
;

Listing of Data Set DATA_CLEANING
string     not_alpha   not_digit
abcdefg        0           1
1234567        1           0
abc123         4           1
1234abcd       1           5
Concatenation Functions
data join_up;
   length cats $ 6 catx $ 17;
   string1 = 'ABC ';
   string2 = ' XYZ ';
   string3 = '12345';
   cats = cats(string1,string2);
   catx = catx('***',string1,string2,string3);
run;

cats = 'ABCXYZ'   catx = 'ABC***XYZ***12345'
Without the LENGTH statement, CATS and CATX would have a length of 200.
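CATX also skips blank (missing) arguments, so it never produces doubled delimiters. A minimal sketch with illustrative values PART1 to PART3:

```sas
data _null_;
   length joined $ 20;
   part1 = 'NJ';
   part2 = ' ';                              /* blank (missing) value            */
   part3 = '08822';
   joined = catx('-',part1,part2,part3);     /* 'NJ-08822': blank part is skipped */
   put joined=;
run;
```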
Some LENGTH Functions
data how_long;
   one = 'ABC   ';
   miss = ' ';                     /* char missing value */
   length_one = length(one);       /* 3 */
   lengthn_one = lengthn(one);     /* 3 */
   lengthc_one = lengthc(one);     /* 6 */
   length_two = length(miss);      /* 1 */
   lengthn_two = lengthn(miss);    /* 0 */
   lengthc_two = lengthc(miss);    /* 1 */
run;
The COMPARE Function
COMPARE(string1, string2 <,'modifiers'>)
Modifiers:
I   ignore case
L   remove leading blanks
:   truncate the longer string to the length of the shorter string (similar to the =: comparison operator). The default is to pad the shorter string with blanks before the comparison.
If string1 and string2 are the same, COMPARE returns 0. If the arguments differ, the sign of the result is negative if string1 precedes string2 in a sort sequence and positive if string1 follows string2. The magnitude of the result is the position of the leftmost character at which the strings differ.
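Modifiers can be combined in a single string. A minimal sketch, with illustrative values A and B, showing how the sign and magnitude rule plays out and how I and L together make two differently formatted values compare equal:

```sas
data _null_;
   a = '  abc';
   b = 'ABC';
   r1 = compare(a,b);        /* -1: at position 1, a blank sorts before 'A' */
   r2 = compare(a,b,'il');   /*  0: case ignored, leading blanks removed    */
   put r1= r2=;
run;
```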
The STRIP Function
data _null_;
   length concat $ 8;
   file print;
   one = ' ABC ';
   two = ' XYZ ';
   one_two = ':' || one || two || ':';
   strip = ':' || strip(one) || strip(two) || ':';
   concat = cats(':',one,two,':');
   put one_two= / strip= / concat=;
run;

one_two=: ABC  XYZ :
strip=:ABCXYZ:
concat=:ABCXYZ:
COUNT and COUNTC Functions
data Dracula;   /* Get it? Count Dracula */
   input string $20.;
   count_abc = count(string,'abc');
   countc_abc = countc(string,'abc');
   count_abc_i = count(string,'abc','i');
datalines;
xxabcxABCxxbbbb
cbacba
;

Listing of Data Set DRACULA
string            count_abc   countc_abc   count_abc_i
xxabcxABCxxbbbb       1            7             2
cbacba                0            6             0
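COUNTC takes the same 'i' modifier as COUNT. With it, the uppercase A, B and C in the first value are counted along with the lowercase letters. A minimal sketch (the result names are illustrative):

```sas
data _null_;
   string = 'xxabcxABCxxbbbb';
   countc_abc   = countc(string,'abc');       /* 7: a, b, c and the four b's    */
   countc_abc_i = countc(string,'abc','i');   /* 10: A, B and C now count too   */
   put countc_abc= countc_abc_i=;
run;
```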
Contact Information
Author: Ron Cody
You may download copies of the PowerPoint presentation from: www2.umdnj.edu/codyweb/biocomputing